
Studies in Big Data 68

Sujata Dash · Biswa Ranjan Acharya · Mamta Mittal · Ajith Abraham · Arpad Kelemen
Editors

Deep Learning Techniques for Biomedical and Health Informatics

Studies in Big Data

Volume 68

Series Editor
Janusz Kacprzyk, Polish Academy of Sciences, Warsaw, Poland
The series “Studies in Big Data” (SBD) publishes new developments and advances
in the various areas of Big Data quickly and with high quality. The intent is to
cover the theory, research, development, and applications of Big Data, as embedded
in the fields of engineering, computer science, physics, economics and life sciences.
The books of the series refer to the analysis and understanding of large, complex,
and/or distributed data sets generated from recent digital sources coming from
sensors or other physical instruments as well as simulations, crowd sourcing, social
networks or other internet transactions, such as emails or video click streams and
others. The series contains monographs, lecture notes and edited volumes in Big
Data spanning the areas of computational intelligence including neural networks,
evolutionary computation, soft computing, fuzzy systems, as well as artificial
intelligence, data mining, modern statistics and operations research, as well as
self-organizing systems. Of particular value to both the contributors and the
readership are the short publication timeframe and the world-wide distribution,
which enable both wide and rapid dissemination of research output.
** Indexing: The books of this series are submitted to ISI Web of Science, DBLP,
Ulrichs, MathSciNet, Current Mathematical Publications, Mathematical Reviews,
Zentralblatt Math: MetaPress and Springerlink.

More information about this series at http://www.springer.com/series/11970


Editors

Sujata Dash
Department of Computer Science
North Orissa University
Takatpur, Odisha, India

Biswa Ranjan Acharya
School of Computer Science and Engineering
KIIT Deemed to be University
Bhubaneswar, Odisha, India

Mamta Mittal
Computer Science and Engineering Department
G. B. Pant Government Engineering College
New Delhi, Delhi, India

Ajith Abraham
Scientific Network for Innovation and Research Excellence
Machine Intelligence Research Labs
Auburn, AL, USA

Arpad Kelemen
Department of Organizational Systems and Adult Health
University of Maryland
Baltimore, MD, USA

ISSN 2197-6503 ISSN 2197-6511 (electronic)


Studies in Big Data
ISBN 978-3-030-33965-4 ISBN 978-3-030-33966-1 (eBook)
https://doi.org/10.1007/978-3-030-33966-1
© Springer Nature Switzerland AG 2020
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part
of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission
or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt from
the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the
authors or the editors give a warranty, expressed or implied, with respect to the material contained
herein or for any errors or omissions that may have been made. The publisher remains neutral with regard
to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface

Overview

Biomedical and health informatics is an emerging field of research at the
intersection of information science, computer science, and health care. Healthcare
informatics and analytics is entering a new era that brings tremendous opportunities
and challenges, owing to the abundance of readily available biomedical data for
analysis. The aim of healthcare informatics is to ensure high-quality, efficient
health care, better treatment, and better quality of life by efficiently analyzing
abundant biomedical and healthcare data, including patient data, electronic health
records (EHRs), and lifestyle data. Earlier, developing a model for biomedical or
healthcare applications commonly required a domain expert; however, recent advances
in representation learning algorithms (deep learning techniques) make it possible to
learn the patterns and representations of the given data automatically for the
development of such models.
Deep learning methods use multiple levels of representation, in which the system
learns a more abstract representation at each successive level. Deep learning based
algorithms have demonstrated strong performance in a variety of areas, including
computer vision, image processing, natural language processing, speech recognition,
video analysis, and biomedical and health informatics. Deep learning approaches
such as deep belief networks, convolutional neural networks, deep auto-encoders,
and deep generative networks have emerged as powerful computational models. These
have shown significant success in dealing with massive data for a large number of
applications, owing to their capability to extract complex hidden features and
learn efficient representations in an unsupervised setting.
The book is intended to contribute to the improvement of human life. Researchers
and practitioners working in the fields of biomedical and health informatics and
deep learning will benefit from it. The book is a collection of state-of-the-art
approaches for deep learning based biomedical and health-related applications. It
will be particularly useful for new researchers and practitioners in the field who
want to identify the best-performing methods quickly. They will be able to compare
different approaches and carry forward their research in an important area that has
a direct impact on the betterment of human life and health. The book is useful
because no book currently on the market provides a comparable collection of
state-of-the-art deep learning based models for biomedical and health informatics,
as deep learning has emerged only recently and is still an immature field of
research in biomedicine and health care.
This book, Deep Learning Techniques for Biomedical and Health Informatics,
aims to present discussions on various applications of deep learning to
biomedical and health informatics problems and to suggest the latest research
methodologies and emerging developments for the benefit of researchers and
practitioners. In this volume, 49 researchers and practitioners of international
repute present the latest research developments, current trends, state-of-the-art
reports, case studies, and suggestions for further development in the fields of
biomedical and health informatics and deep learning.

Objective

The purpose of this book is to report the latest advances and developments in the
field of biomedical and health informatics, and deep learning. The book comprises
the following three parts:
• Deep Learning for Biomedical Engineering and Health Informatics
• Deep Learning and Electronics Health Records
• Deep Learning for Medical Image Processing

Organization

There are 16 chapters in Deep Learning Techniques for Biomedical and Health
Informatics. They are organized into three parts, as follows:
• Part One: Deep Learning for Biomedical Engineering and Health Informatics.
This part focuses on deep learning paradigms and their application in
biomedical and health informatics, clinical decision support systems, disease
diagnosis and monitoring systems, and recommender systems for health
informatics. There are six chapters in this part. The first chapter looks into the
application of deep learning to healthcare data in tasks such as information and
relation extraction. The second and third chapters focus on discovering
biomedical named entities in biomedical text mining tasks by applying
deep learning techniques. The fourth chapter introduces deep learning and
developments in neural networks, and then discusses their applications in health
care and their relevance to biomedical informatics and computational biology
research in the public health domain. The fifth chapter discusses various existing
deep learning techniques and their applications for decision support in clinical
systems. The sixth chapter discusses the challenges and issues of health
recommender systems.
• Part Two: Deep Learning and Electronics Health Records. The second part
comprises seven chapters. The first chapter discusses the design and
implementation of an explainable deep learning system for health care using EHR.
The second chapter reviews deep learning strategies for EHR data analysis and
inference. The third chapter focuses on the extensive application of deep
learning in many domains, including bioinformatics, for the analysis and
classification of biomedical imaging data, sequence data from omics, and
biomedical signal processing. The fourth chapter discusses advanced distributed
security techniques such as blockchain to protect health data from unauthorized
access. The fifth chapter presents CNN based classification for malaria, which
classifies blood films into infected and normal blood films. The sixth chapter
presents a deep reinforcement learning based approach for complete healthcare
recommendations, including medicines to take, doctors to consult, nutrition to
acquire, and activities to perform, consisting of exercises and preferable sports.
The seventh chapter presents the advantages of using deep learning techniques
for text-based extraction and retrieval.
• Part Three: Deep Learning for Medical Image Processing. There are three
chapters in this part. The first chapter discusses several deep learning
architectures that can be used effectively for HRV signal analysis for the purpose
of detecting diabetes. The second chapter discusses the issues and challenges
of DL approaches for analysing biomedical images and their application to
classification, registration, and segmentation. The last chapter gives an overview
of deep learning based segmentation algorithms, with special reference to brain
tumor classification, the various challenges, and the future scope.

Target Audiences

The current volume is a reference text intended to support a number of potential
audiences, including the following:
• Researchers in this field who wish to have up-to-date knowledge of current
practice, mechanisms, and research developments.
• Students and academicians in the biomedical and informatics fields who have an
interest in further enhancing their knowledge of current developments.
• Professionals from industry, technical institutes, and R&D organizations
working in the fields of machine learning, deep learning, biomedical engineering,
health informatics, and related fields.

Baripada, Odisha, India Sujata Dash


Bhubaneswar, Odisha, India Biswa Ranjan Acharya
New Delhi, India Mamta Mittal
Auburn, AL, USA Ajith Abraham
Baltimore, MD, USA Arpad Kelemen
Acknowledgements

The editors would like to acknowledge the help of all the people involved in this
project and, more specifically, the reviewers who took part in the review process.
Without their support, this book would not have become a reality.
First, the editors would like to thank each one of the authors for their time,
contribution, and understanding during the preparation of the book.
Second, the editors wish to acknowledge the valuable contributions of the
reviewers regarding the improvement of the quality, coherence, and content
presentation of the chapters.
Last but not least, the editors wish to acknowledge the love, understanding, and
support of their family members during the preparation of the book.

Baripada, Odisha, India Sujata Dash


Bhubaneswar, Odisha, India Biswa Ranjan Acharya
New Delhi, India Mamta Mittal
Auburn, AL, USA Ajith Abraham
Baltimore, MD, USA Arpad Kelemen

Contents

Deep Learning for Biomedical Engineering and Health Informatics

MedNLU: Natural Language Understander for Medical Texts
H. B. Barathi Ganesh, U. Reshma, K. P. Soman and M. Anand Kumar
Deep Learning Based Biomedical Named Entity Recognition Systems
Pragatika Mishra, Sitanath Biswas and Sujata Dash
Disambiguation Model for Bio-Medical Named Entity Recognition
A. Kumar
Applications of Deep Learning in Healthcare and Biomedicine
Shubham Mittal and Yasha Hasija
Deep Learning for Clinical Decision Support Systems: A Review
from the Panorama of Smart Healthcare
E. Sandeep Kumar and Pappu Satya Jayadev
Review of Machine Learning and Deep Learning Based
Recommender Systems for Health Informatics
Jayita Saha, Chandreyee Chowdhury and Suparna Biswas

Deep Learning and Electronics Health Records

Deep Learning and Explainable AI in Healthcare Using EHR
Sujata Khedkar, Priyanka Gandhi, Gayatri Shinde and Vignesh Subramanian
Deep Learning for Analysis of Electronic Health Records (EHR)
Pawan Singh Gangwar and Yasha Hasija
Application of Deep Architecture in Bioinformatics
Sagnik Sen, Rangan Das, Swaraj Dasgupta and Ujjwal Maulik
Intelligent, Secure Big Health Data Management Using Deep Learning
and Blockchain Technology: An Overview
Sohail Saif, Suparna Biswas and Samiran Chattopadhyay
Malaria Disease Detection Using CNN Technique with SGD,
RMSprop and ADAM Optimizers
Avinash Kumar, Sobhangi Sarkar and Chittaranjan Pradhan
Deep Reinforcement Learning Based Personalized Health
Recommendations
Jayraj Mulani, Sachin Heda, Kalpan Tumdi, Jitali Patel,
Hitesh Chhinkaniwala and Jigna Patel
Using Deep Learning Based Natural Language Processing Techniques
for Clinical Decision-Making with EHRs
Runjie Zhu, Xinhui Tu and Jimmy Huang

Deep Learning for Medical Image Processing

Diabetes Detection Using ECG Signals: An Overview
G. Swapna, K. P. Soman and R. Vinayakumar
Deep Learning and the Future of Biomedical Image Analysis
Monika Jyotiyana and Nishtha Kesswani
Automated Brain Tumor Segmentation in MRI Images Using Deep
Learning: Overview, Challenges and Future
Minakshi Sharma and Neha Miglani
Editors and Contributors

About the Editors

Sujata Dash received her Ph.D. in computational modeling from Berhampur
University, Orissa, India, in 1995. She is Associate Professor in the P.G. Department
of Computer Science and Application, North Orissa University, Baripada, India. She
has published more than 150 technical papers in international journals, conferences,
and chapters of reputed publications. She has guided many scholars for their Ph.D.
in computer science. She is associated with many professional bodies such as IEEE,
CSI, ISTE, OITS, OMS, IACSIT, IMS, and IAENG. She is on the editorial board of
several international journals and is a reviewer for many international journals. Her
current research interests include Machine Learning, Distributed Data Mining,
Bioinformatics, Intelligent Agent, Web Data Mining, Recommender System, and
Image Processing.

Biswa Ranjan Acharya is an academic currently associated with Kalinga Institute
of Industrial Technology Deemed to be University and is pursuing a Ph.D. in
computer application at Veer Surendra Sai University of Technology (VSSUT),
Burla, Odisha, India. He received an MCA in 2009 from IGNOU, New Delhi,
India, and an M.Tech. in Computer Science and Engineering in 2012 from
Biju Patnaik University of Technology (BPUT), Odisha, India. He is also
associated with various educational and research societies such as IEEE, IACSIT,
CSI, IAENG, and ISC. He has 2 years of industry experience as a software
engineer and a total of 10 years of experience across academia, at reputed
universities such as Ravenshaw University, and software development. He currently
works on multiprocessor scheduling, along with other fields such as
Data Analytics, Computer Vision, Machine Learning, and IoT. He has published
research articles in reputed international journals and serves as a reviewer.


Mamta Mittal graduated in computer engineering from Kurukshetra University,
Kurukshetra, in 2001, and received a master's degree (Honors) in computer
engineering from YMCA, Faridabad. She holds a Ph.D. in computer engineering from
Thapar University, Patiala, and has more than 16 years of experience. Presently,
she is working at G. B. Pant Government Engineering College, Okhla, New Delhi
(under Government of NCT Delhi), and supervising Ph.D. candidates of GGSIPU,
New Delhi. She is working on the DST-approved project “Development of IoT-based
hybrid navigation module for mid-sized autonomous vehicles.” She has published
many SCI/SCIE/Scopus-indexed papers and is a book editor for renowned publishers.

Ajith Abraham is currently working as Director of Machine Intelligence Research
Labs (MIR Labs), which has members from more than 100 countries. Dr. Abraham’s
research and development experience includes more than 27 years in industry
and academia. He received an M.S. from Nanyang Technological University (NTU),
Singapore, and a Ph.D. in Computer Science from Monash University, Melbourne,
Australia. He works in a multi-disciplinary environment involving machine
(network) intelligence, cyber security, sensor networks, Web intelligence,
scheduling, and data mining, applied to various real-world problems. He has given
more than 100 conference plenary lectures/tutorials and invited seminars/lectures
in over 100 universities around the globe.

Arpad Kelemen is Professor of informatics at the University of Maryland,
Baltimore. He has expertise in biomedical informatics, human–computer interaction,
game development for education and self-management, data mining, machine
learning, artificial intelligence, intelligent patient care technologies, and software
and healthcare database development. He has published over 60 peer-reviewed research
articles and two books, and has served as PI, Co-PI, and Co-I for multiple grants from
NSF, NIH, HRSA, and New York State Foundation for Science, Technology, and
Innovation. Dr. Kelemen holds a Ph.D. in computer science from the University of
Memphis, and an MS and a BS from the University of Szeged, Hungary.

Contributors

M. Anand Kumar Department of Information Technology, National Institute of


Technology Karnataka, Surathkal, India
H. B. Barathi Ganesh Amrita School of Engineering, Center for Computational
Engineering and Networking (CEN), Amrita Vishwa Vidyapeetham, Coimbatore,
India
Sitanath Biswas North Orissa University, Baripada, India
Suparna Biswas Department of Computer Science and Engineering, Maulana
Abul Kalam Azad University of Technology, Kolkata, West Bengal, India
Samiran Chattopadhyay Department of Information Technology, Jadavpur


University, Kolkata, West Bengal, India
Hitesh Chhinkaniwala Adani Institute of Infrastructure Engineering, Ahmedabad,
India
Chandreyee Chowdhury Computer Science and Engineering, Jadavpur
University, Kolkata, India
Rangan Das Department of Computer Science and Engineering, Jadavpur
University, Jadavpur, Kolkata, India
Swaraj Dasgupta Department of Computer Science and Engineering, Jadavpur
University, Jadavpur, Kolkata, India
Sujata Dash North Orissa University, Baripada, India
Priyanka Gandhi Department of Computer Engineering, VESIT, Mumbai, India
Pawan Singh Gangwar Delhi Technological University, Delhi, India
Yasha Hasija Delhi Technological University, Delhi, India
Sachin Heda Department of Computer Science and Engineering, Institute of
Technology Nirma University, Ahmedabad, India
Jimmy Huang Information Retrieval and Knowledge Management Research Lab,
School of Information Technology, York University, Toronto, Canada
Monika Jyotiyana Central University of Rajasthan, Bandar Sindri, Ajmer, India
Nishtha Kesswani Central University of Rajasthan, Bandar Sindri, Ajmer, India
Sujata Khedkar Department of Computer Engineering, VESIT, Mumbai, India
A. Kumar Department of Computer Science and Engineering, National Institute
of Technology Raipur, Raipur, Chhattisgarh, India
Avinash Kumar School of Computer Engineering, KIIT DU, Bhubaneswar, India
Ujjwal Maulik Department of Computer Science and Engineering, Jadavpur
University, Jadavpur, Kolkata, India
Neha Miglani Department of Computer Engineering, National Institute of
Technology, Kurukshetra, India
Pragatika Mishra Gandhi Institute for Technology, Bhubaneswar, India
Shubham Mittal Delhi Technological University, Delhi, India
Jayraj Mulani Department of Computer Science and Engineering, Institute of
Technology Nirma University, Ahmedabad, India
Jigna Patel Department of Computer Science and Engineering, Institute of
Technology Nirma University, Ahmedabad, India
Jitali Patel Department of Computer Science and Engineering, Institute of


Technology Nirma University, Ahmedabad, India
Chittaranjan Pradhan School of Computer Engineering, KIIT DU,
Bhubaneswar, India
U. Reshma Arnekt Solutions Pvt. Ltd., Magarpatta City, Pune, Maharashtra, India
Jayita Saha Computer Science and Engineering, Jadavpur University, Kolkata,
India
Sohail Saif Department of Computer Science and Engineering, Maulana Abul
Kalam Azad University of Technology, Kolkata, West Bengal, India
E. Sandeep Kumar Department of Telecommunication Engineering, M.S.
Ramaiah Institute of Technology, Bengaluru, India
Sobhangi Sarkar School of Computer Engineering, KIIT DU, Bhubaneswar,
India
Pappu Satya Jayadev Department of Electrical Engineering, IIT Madras,
Chennai, India
Sagnik Sen Department of Computer Science and Engineering, Jadavpur
University, Jadavpur, Kolkata, India
Minakshi Sharma Department of Computer Engineering, National Institute of
Technology, Kurukshetra, India
Gayatri Shinde Department of Computer Engineering, VESIT, Mumbai, India
K. P. Soman Amrita School of Engineering, Center for Computational
Engineering and Networking (CEN), Amrita Vishwa Vidyapeetham, Coimbatore,
India
Vignesh Subramanian Department of Computer Engineering, VESIT, Mumbai,
India
G. Swapna Amrita School of Engineering, Center for Computational Engineering
and Networking (CEN), Amrita Vishwa Vidyapeetham, Coimbatore, India
Xinhui Tu School of Computer Science, Central China Normal University,
Wuhan, China
Kalpan Tumdi Department of Computer Science and Engineering, Institute of
Technology Nirma University, Ahmedabad, India
R. Vinayakumar Amrita School of Engineering, Center for Computational
Engineering and Networking (CEN), Amrita Vishwa Vidyapeetham, Coimbatore,
India
Runjie Zhu Information Retrieval and Knowledge Management Research Lab,


Department of Electrical Engineering and Computer Science, York University,
Toronto, Canada
Abbreviations

AD Alzheimer’s disease
AdaGrad Adaptive gradient algorithm
ADC Analog-to-digital converter
ADHD Attention-deficit hyperactivity disorder
ADNI Alzheimer’s disease neuroimaging initiative
AE Auto-encoders
AES Advanced Encryption Standard
AFLC Adaptive fuzzy leader clustering algorithm
AHE Adaptive histogram equalization
AI Artificial intelligence
AMBE Absolute mean brightness error
ANFIS Adaptive neuro-fuzzy inference system
ANN Artificial neural network
ANS Autonomic nervous system
Anti-CCP Anti-cyclic citrullinated peptide
ApEn Approximate entropy
AR Autoregressive
AUC Area under curve
AV Atrioventricular
BBHE Bi-histogram equalization
BCHC Birmingham Community Healthcare
BERT Bidirectional Encoder Representations from Transformers
BETA Blackbox Explanations Using Transparent Approximations
Bi-LSTM Bidirectional long short-term memory
BiLM Bidirectional language model
Bio-NER Biomedical named entity recognition
BMESO B-Begin M-Middle E-End S-Single O-Outside
BMI Body mass index
BOE Bag of events
BoW Bag of words


BRCA1 Breast cancer gene type 1


BUN Blood urea nitrogen
C4.5 A decision tree algorithm
CAD Computer-aided design
CAMDM Computer-aided medical decision making
CAN Cardiovascular autonomic neuropathy
CAT Computerized axial tomography
CBIR Content-based image retrieval
CBoW Continuous bag of words
CD Correlation dimension
CDBN Convolutional deep belief networks
CDC Centers For Disease Control And Prevention
CDP Code On Dental Procedures And Nomenclature
CDSS Clinical decision support system
CE Character embedding
CEC Constant error carousel
CGMS Continuous glucose monitoring system
CLAHE Contrast-limited adaptive histogram equalization
CM Confusion matrix
CNN Convolutional neural network
CP Clinical predictions
CPI Compound–protein interaction
CPT Current procedural terminology
CRF Conditional random field
CRP C-reactive protein
CRPS Continuous ranked probability score
CSF Cerebrospinal fluid
CSG Continuous skip-gram
CT Computed tomography
CUIs Concept unique identifiers
DAE Denoising auto-encoders
DAG Directed acyclic graph
DBM Deep Boltzmann machine
DBN Deep belief network
DCNNs Deep convolutional neural networks
DDI Drug–drug interaction
DES Data Encryption Standard
DET Determinism
DFA Detrended fluctuation analysis
DIARETDB1 Diabetic Retinopathy Database
DL Deep learning
DM Diabetes mellitus
DNA Deoxyribonucleic acid
DNN Deep neural network
DQN Deep Q network


DRG Diagnostic related grouping
DRL Deep reinforcement learning
DRMM Deep relevance matching model
EBV Epstein–Barr virus
ECG Electrocardiogram
ED Encoder–decoder
EE Energy expenditure
EEG Electroencephalogram
EHR Electronic health records
EI Extended intelligence
ELMO Embeddings from language models
EM Expectation maximization
E-Mail Electronic mail
EMD Empirical mode decomposition
EMR Electronic medical records
EPS Epsilon
ESR Erythrocyte sedimentation rate
ESRD End-stage renal disease
FCM Fuzzy c-means
FDA Food And Drug Administration
FFT Fast Fourier transform
FHE Fuzzy logic-based histogram equalization
FIS Fuzzy inference system
FITBIR The Federal Interagency Traumatic Brain Injury Research
FN False negative
FP False positive
GAN Generative adversarial network
GBDT Gradient boosting decision trees
GBT Gradient boosting tree
GloVe Global vectors
GM Gray matter
GPS Global Positioning System
GPUs Graphics processing units
GRAM Graph-based attention model
GRU Gated recurrent unit
GSN Generative stochastic network
HAR Human activity recognition
HbA1c Hemoglobin A1c
HCPCS Healthcare Common Procedure Coding System
HCUP Healthcare Cost And Utilization Project
HDL High-density lipoproteins
HE Histogram equalization
HER Hindsight experience replay
HF Heart failure
HIN Heterogeneous information network


HMD Human Mortality Database
HMM Hidden Markov model
HOS Higher-order spectrum
HPI History of patient illness
HRS Health recommender systems
HRV Heart rate variability
i2b2 Informatics For Integrating Biology and The Bedside
IBL Instance-based learning
ICD International Classification of Diseases
ICD9 International Classification of Diseases 9
ID Identifier
IDPs Intrinsically disordered proteins
IDRs Intrinsically disordered regions
IE Information extraction
IMA Indian Medical Association
IMF Intrinsic mode function
IOB Inside—Outside—Beginning
IoT Internet of things
IR Information retrieval
JSON JavaScript Object Notation
KNN K-nearest neighbor
KPCA Kernel principal component analysis
LA Left arm
LAM Laminarity
LDA Linear discriminant analysis
LDL Low-density lipoproteins
LIDC Lung Image Database Consortium Dataset
LIME Local interpretable model-agnostic explanations
LL Left leg
LoG Laplacian of Gaussian
LSDB Locus-specific databases
LSTM RNN Long short-term Memory RNN
LSTM Long short-term memory
LV Left ventricle
MCEMJ Medical Concept Embeddings From Medical Journals
MDP Markov decision process
medGAN Medical Generative Adversarial Network
MEMM Maximum entropy Markov model
MICCAI Medical image computing and computer-assisted intervention
MIDAS The Multimedia Medical Archiving System
MIL Multi-instance learning
MILA Montreal Institute For Learning Algorithms
MiME Multilevel Medical Embedding
MIMIC Medical Information Mart For Intensive Care
MinPts Minimum points


miRNA Micro-ribonucleic acid
ML Machine learning
MLEE Multilevel event extraction
MLP Multilayer perceptron
MRI Magnetic resonance imaging
mRNA Messenger ribonucleic acid
MSE Mean square error
MTL Multi-task learning
MTM Multi-task model
NDC National Drug Codes
NDD Neurodegenerative disorders
NEC Named entity classification
NED Named entity detection
NER Named entity recognition
NGS Next-generation sequencing
NIHCC National Institutes of Health Clinical Center
NLM National Library of Medicine
NLP Natural language processing
NLU Natural language understanding
NMS Non-maxima suppression
NN Neural network
NNE Non-named entities
NO Nitric oxide
NP Noun phrases
OAI Osteoarthritis initiative
OASIS Open Access Series of Imaging Studies
OGTT Oral glucose tolerance test
PCA Principal component analysis
PD Parkinson’s disease
PDA Personal digital assistant
PET Positron emission tomography
PHQ Patient Health Questionnaire
PII Personally identifiable information
PINN Pairwise input neural network
PNS Parasympathetic nervous system
POMDP Partially observable Markov decision process
POS Part of speech
PoW Proof of work
PP Prepositional phrases
PPG Photoplethysmography
PPI Protein–protein interaction
PSD Power spectrum density
PSNR Peak signal-to-noise ratio
QoS Quality of service


QSAR Quantitative structure−activity Relationship
RA Right arm
RBM Restricted Boltzmann machine
RCNNs Region convolutional neural networks
RE Relation extraction
ReLU Rectified linear unit
RF Random forest
RL Representation learning
RMSHE Recursive mean separate histogram equalization
RMSProp Root mean square propagation
RNA Ribonucleic acid
RNN Recurrent neural network
RoI Region of interest
RP LIME Random pick local interpretable model-agnostic explanations
RQA Recurrence quantification analysis
RR Recurrence rate
RS Recommender systems
RSA Rivest–Shamir–Adleman
RSNA Radiological Society of North America
SA Sinoatrial
SAE Sparse auto-encoders
SampEn Sample entropy
SBE Surrounding-based embedding feature
SCR Summary care records
SDP Shortest dependency path
SEER Surveillance, Epidemiology, and End Results Program
SGD Stochastic gradient descent
SHMS Smart healthcare monitoring system
SIFT Scale-invariant feature transform
SiRNA Small interfering ribonucleic acid
SMS Short message service
SNS Sympathetic nervous system
SP LIME Selective pick local interpretable model
SPECT Single-photon emission computed tomography
SPPMI Shifted positive pointwise mutual information
SQL Structured query language
SRL-RNN Supervised reinforcement learning with recurrent neural network
SSIM Structural similarity index mean
STARE Structured analysis of the retina
SVD Singular value decomposition
SVM Support vector machine
TCIA The Cancer Imaging Archive
TD Temporal difference
TE Tone entropy
TG Triglycerides
TN True negative
TP True positive
t-SNE T-distributed stochastic neighbor embedding
TT Trapping time
UCI University of California, Irvine
UMLS Unified medical language system
UQI Universal quality index
USF University of South Florida
UTI Urinary tract infection
VAE Variational auto-encoders
VEGF Vascular endothelial growth factor
VGG Visual geometry group
VHL Von-Hippel–Lindau Illness
VIA Visual and image analysis
VP Verb phrases
WBAN Wireless body area network
WBCD Wisconsin Breast Cancer Dataset
WE Word embedding
WHO World Health Organization
WM White matter
XML Extensible Markup Language
Deep Learning for Biomedical
Engineering and Health Informatics
MedNLU: Natural Language
Understander for Medical Texts

H. B. Barathi Ganesh, U. Reshma, K. P. Soman and M. Anand Kumar

Abstract Natural Language Understanding is one of the essential tasks for building
clinical text-based applications. Understanding of these clinical texts can be achieved
through Vector Space Models and Sequential Modelling tasks. This paper focuses
on sequential modelling, i.e. Named Entity Recognition and Part of Speech Tagging,
and attains state-of-the-art performance with an F1 score of 93.8% on the i2b2 clinical
corpus and an F1 score of 97.29% on the GENIA corpus. This paper also reports the
performance of feature fusion, which integrates word embedding, feature embedding
and character embedding for sequential modelling tasks. We also propose a framework
based on a sequential modelling architecture, named MedNLU, which is capable of
performing Part of Speech Tagging, Chunking, and Entity Recognition
on clinical texts. The sequence modeler in MedNLU is an integrated framework
of a Convolutional Neural Network, Conditional Random Fields and a Bi-directional
Long Short-Term Memory network.

H. B. Barathi Ganesh (corresponding author) · K. P. Soman
Amrita School of Engineering, Center for Computational Engineering and Networking (CEN),
Amrita Vishwa Vidyapeetham, Coimbatore, India

U. Reshma
Arnekt Solutions Pvt. Ltd., Pentagon P-3, Magarpatta City, Pune, Maharashtra, India

M. Anand Kumar
Department of Information Technology, National Institute of Technology Karnataka, Surathkal,
India

© Springer Nature Switzerland AG 2020
S. Dash et al. (eds.), Deep Learning Techniques for Biomedical and Health Informatics,
Studies in Big Data 68, https://doi.org/10.1007/978-3-030-33966-1_1

1 Introduction

Medical fields generate digital data in the form of clinical reports (structured and
semi-structured data), raw data, and the data that consumers and patients generate on
social media platforms. No one individual can acquire and maintain the knowledge
needed to comprehend the entirety of these data. This creates the need for Natural
Language Processing (NLP), which is one of the subfields of Artificial Intelligence.
MedNLU is a framework that addresses the challenges involved in understanding the
information hidden in digital health data. This framework will
act as a fundamental component in many health care applications that require natural
language processing and understanding.
MedNLU draws on subfields of Artificial Intelligence such as NLP, Conventional
Machine Learning and Deep Learning. It takes health-care texts or
health-care documents as input and outputs the tokenized text, chunked text,
parsed text, entities and Part of Speech (POS) tags associated with the medical text.
Using these entities and POS tags, a knowledge base can be built, which can
be used for database management and conversational systems. The components of
the MedNLU framework are given in Fig. 1.
Most health documents are produced using Electronic Health Records
(EHRs), which include records of a patient's family history, reason for initial complaint,
diagnosis and treatment, prescription medication, lab tests and results, record of
visits, administrative and billing data, patient demographics, progress notes, vital
signs, medical histories, immunization dates, allergies, radiology images and so on.
Almost all the details of a patient are readily available to clinicians and physicians
at any point of time. With the help of NLP in health care, end systems that required
human assistance to re-check the documents have been replaced by NLP-based systems.
Hospitals around the world have already started using NLP on a
daily basis.

Fig. 1 MedNLU framework
Extracting information from these records helps to develop applications such as
decision support systems, adverse drug reaction identification, pharmacovigilance,
effective management of pharmacokinetics, patient cohort identification, and effective EMR
development and maintenance. The required information is mostly extracted through
NLU tasks like Named Entity Recognition (NER), Part of Speech (POS) tagging and
Chunking [1–4].
So far, the medical domain has mostly used NLU based on rule-based
methodology [5, 6]. A rule-based system is a set of hand-coded rules for extracting
valuable information from medical data. Depending on the knowledge resources
that needed to be extracted from each set of documents, rules were framed
to suit the structure of each document. Simply put, document-specific
rules were used for knowledge extraction. Later, algorithm-driven
models were introduced, which reduced the workload of manually encoding the data [7, 8].
Recently, researchers have moved to applying Deep Learning to health care
data in tasks like NER and relation extraction [9–11].
Motivated by this, we report the performance of feature fusion, which integrates
word embedding, feature embedding and character embedding for sequential
modelling tasks. The sequence modeler in MedNLU is an integrated framework of a
Convolutional Neural Network (CNN), a Long Short-Term Memory (LSTM) network and
Conditional Random Fields (CRF). The proposed framework, named MedNLU, is
capable of performing Named Entity Recognition, Part of Speech Tagging,
Parsing and Chunking on clinical texts. The experiments also show that, without
a domain-specific word embedding model, the sequence model architecture
attains state-of-the-art performance using word embeddings developed from general
English text.

2 Related Works

The efficient computation of dense word matrices with word2vec under different
architectures [13], together with the addition of downstream models [12], has had a large
impact on research groups making use of the Big Data available in the healthcare domain.
Thus, common NLP problems [15] such as text classification (for example sentiment
analysis), text summarization, Information Extraction (IE) [14] and Information
Retrieval (IR) have started using word embeddings. Because of this success, the fields of
health care and bioinformatics use them as well. Healthcare problems such as Relation
Extraction (RE), Named Entity Recognition (NER) [1], drug-disease interaction,
medical synonym extraction [2], and chemical-disease relation are getting special
attention. Most of the time, embedding models have been trained on either a closed-set
small corpus or a general big corpus such as Google News and Wikipedia [16].
These models cannot be used directly, since clinical texts contain more
clinical words than general words and do not follow general grammar
patterns.
After computing the word vectors, different methodologies have been used for
evaluating word embedding models. Context-predicting and context-counting
semantic vectors are among them: the relation between the data and correlation
issues with different parameters are measured on lexical semantic tasks to
evaluate the word embedding model [17]. The context-predicting model is chosen over
the count-based model because it gives better results. Latent Semantic Analysis
was used by Landauer [18] for indirect knowledge acquisition from text, and
similarities in the space were analysed through local co-occurrence. Unsupervised
vectors were used for classification problems in analogy tasks by Turney [19], and
many others have tried to modify this unsupervised way of learning for text
applications [20]. In the bioinformatics domain, an assessment of word embeddings was
done by Pakhomov [12].
Due to restrictions on the use of clinical texts (HIPAA), much less work is available on
clinical POS tagging. POS annotation of 390,000 pediatric sequences from text
at Cincinnati Children's Medical Centre was reported by Pestian et al. [3]. With
the addition of a special lexicon to the tagger wordlist, their tagger, which is comparable
to dTagger, achieved an accuracy of 91.5% after training. However, neither the tagger nor
the corpus was made available. To reduce the amount of clinical text annotation needed
while co-training a POS tagger along with the WSJ corpus, Liu et al. [4, 21] developed
sampling methods. When one of the sampling methods was evaluated on tagging pathology
reports, the training data was reduced by 84% while giving an accuracy of 92.7%.
Due to the domain constraints, the annotated corpus as well as the trained tagger
were not available to the research community. The Mayo Clinic in Rochester, Minnesota
developed the MED corpus [4], which has 100,650 POS-tagged tokens from 273 clinical
notes. An accuracy of 93.6% on the clinical notes was achieved when the annotations
were pooled with GENIA and POS-tagged corpora [22]. Even with the unavailability
of clinical text corpora, the Mayo Clinic built the biomedical NLP package cTAKES [6],
which provides a fully established tagger as a pre-trained reusable model.
The classic methods for NER were dictionary-based and rule-based
approaches [5], which required domain expertise for framing proper rules. Earlier,
most researchers who focused on named entity recognition tasks proposed
conventional machine learning approaches, or a combination of conventional
machine learning and rule-based approaches. In [23], different supervised and
semi-supervised machine learning algorithms were used for NER problems,
concentrating on domain-dependent attributes and specialized text features. Hybrid
models that combine Conditional Random Fields (CRF) and Support Vector
Machine (SVM) algorithms with different pattern matching rules
gave better output, as shown in [7]. In [8], combining pre-processing techniques
like annotation and true casing with CRF-based NER appears to give better concept
extraction performance. The top-performing models in the i2b2 challenge employed CRF and
semi-Markov Hidden Markov Models (HMM), with an F-score of 0.85 in the
shared task.
The Brown clustering method was used to derive unsupervised feature representations
from unlabelled corpora; combined with a semi-supervised HMM algorithm, it was
the best performing system in the 2010 i2b2/VA challenge [24]. The one-hot
unsupervised word feature representation from Brown clustering does not capture
multiple-aspect relations between words. Thus Jonnalagadda [25] proposed improving
clinical entity recognition by including distributional word representations
from a random indexing model. Word embeddings obtained from the
English Wikipedia corpus have also been applied to different NER tasks [25], which
turned out to be a successful approach. A CRF-based concept extraction system [26]
obtained enhanced performance through binarized word embeddings obtained from
domain-specific corpora.
With the advent of deep learning, a subset of machine learning, unparalleled
results were obtained for vision, NER and speech tasks. Neural networks learn
features automatically, which reduces the manual effort that was earlier needed for
machine learning and makes neural networks more advantageous than conventional machine
learning algorithms. Researchers have now started applying Deep Learning algorithms to
health care data in tasks like NER and relation extraction [9–11].

3 Methodology

The neural network architecture used in our implementation has multiple
components. The architecture of the entire process is depicted in Fig. 2.
Text representation is the first and foremost step in any natural language
understanding task. It sets the stage for the performance of the subsequent machine
learning or deep learning algorithm. In our problem, we transform the
input sentence into a vector representation that combines three different attributes:
word embedding, character embedding and feature vector. The character
embeddings are computed through a CNN using the methodology described in [27]. In this
experiment we have used a domain-specific word embedding model developed from the
Journal of Medical Case Reports (Health Embedding), and we have also
experimented with word embeddings from Google (Google Embedding).
These three vectors are fused and fed to a network with a Bi-LSTM followed by a CRF or
SoftMax layer, which makes the final prediction. The developed health embeddings are
evaluated through both qualitative and quantitative methods.

3.1 Word Embedding

Word embedding captures the contextual meaning of words in terms of a low-dimensional
vector. A word vector should clearly represent the distribution of adjacent
words around the current word. This approach of representing words has helped
achieve state-of-the-art performance for many challenging natural language processing
tasks. The two major models for learning word embeddings are the Continuous
Bag-of-Words (CBoW) model, which learns the current word representation based on
adjacent words (or context), and the Continuous Skip-Gram model, which learns by
predicting the adjacent words given the current word [13, 28].

Fig. 2 Sequencing modeler

In our experimentation, we employed the CBoW model for word embeddings. The
input layer consists of context words with a word window of size S and vocabulary
V. This input is passed to the hidden layer h, which is an N-dimensional vector. Finally,
the output y is a one-hot encoded word from the training examples. The input layer is
connected to the hidden layer via a V × N weight matrix W, and the hidden layer is connected
to the output layer via an N × V weight matrix Wt. The forward pass computations are
performed by first computing the output of hidden layer h as follows:

$$h = \frac{1}{S}\, W \sum_{i=1}^{S} x_i \tag{1}$$

Finally, the output is computed as:

$$y_j = P(w_1, \ldots, w_c) = \frac{\exp(u_j)}{\sum_{j'=1}^{V} \exp(u_{j'})} \tag{2}$$

where u_j is the input to the j-th unit of the output layer. This forward pass is followed by
a backward propagation in which the model learns the parameters in terms of the weight
matrices W and Wt. The weight matrices are initialized with random values. The
cost function (E), which is the conditional probability of the output word given the
input words, is computed using the training examples fed to the model. Our objective
is to maximize the conditional probability. Maximizing the conditional probability is
equivalent to minimizing the negative log probability. The final objective function can
be written as:

$$\text{minimize } J = -\log P(w_1, \ldots, w_c) \tag{3}$$

The optimization procedure includes gradient computation of the objective function
with respect to the unknown parameters [14]. The parameters are finally updated
at each iteration using Stochastic Gradient Descent.
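
The forward pass and objective in Eqs. (1) to (3) can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the implementation used in the experiments: the toy sizes, the random initialization range and the function names are assumptions. It also assumes the usual convention that multiplying by a one-hot vector selects a row of the V × N input matrix, so the hidden layer is the average of the context words' rows.

```python
import math
import random

def cbow_forward(context_ids, W_in, W_out):
    """Return (h, y) for one CBoW example.

    context_ids: indices of the context words (each stands for a one-hot x_i)
    W_in:  V x N input weight matrix (list of V rows of length N)
    W_out: N x V output weight matrix
    """
    n_hidden = len(W_in[0])
    vocab_size = len(W_out[0])
    # Eq. (1): a one-hot input selects one row of W_in, so h is the
    # average of the rows belonging to the context words.
    h = [sum(W_in[i][k] for i in context_ids) / len(context_ids)
         for k in range(n_hidden)]
    # Score u_j for every word in the vocabulary.
    u = [sum(h[k] * W_out[k][j] for k in range(n_hidden))
         for j in range(vocab_size)]
    # Eq. (2): softmax. Subtracting max(u) only improves numerical stability.
    m = max(u)
    exp_u = [math.exp(v - m) for v in u]
    total = sum(exp_u)
    y = [e / total for e in exp_u]
    return h, y

def cbow_loss(y, target_id):
    """Eq. (3): negative log probability of the target word."""
    return -math.log(y[target_id])

random.seed(0)
V, N = 6, 3  # toy vocabulary and hidden sizes (illustrative values only)
W_in = [[random.uniform(-0.5, 0.5) for _ in range(N)] for _ in range(V)]
W_out = [[random.uniform(-0.5, 0.5) for _ in range(V)] for _ in range(N)]
h, y = cbow_forward([0, 2, 3, 5], W_in, W_out)
loss = cbow_loss(y, target_id=1)
```

Training would repeat this forward pass over the corpus and update both matrices with the gradient of the loss, which is omitted here for brevity.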

3.2 Character Embedding

The character-level representations of words are extracted using a Convolutional
Neural Network (CNN). The CNN helps extract morphological information from
all the characters in a word and transforms it into neural encodings. Earlier research
has shown that Convolutional Neural Networks are one of the prominent approaches
to mine the prefix (first n characters) and suffix (last n characters) information from
the characters of respective words and represent them as a lower-dimensional vector called
a character embedding. Figure 2 shows the CNN architecture, which is used to mine
the character-level vector representation of a given word. This architecture is similar
to the architecture proposed by Chiu et al. [27]. Unlike [27], we omit the character type
features and use only the character embeddings as inputs to the CNN.
The overview of the architecture for extracting character embeddings using the CNN is
given in Fig. 3.

3.3 Feature Vector

The feature vector is a one-hot encoded vector. It transforms 7 categorical
attributes into a one-hot encoding. The categorical attributes are start
case, uppercase, lowercase, all numeric, partially numeric, contains digit, and others.
The final input vector is the concatenation of the character embedding (vector for
character representation), the word embedding (vector for word representation), and the
feature vector (categorical attributes). We call this concatenation Feature Fusion. The
word embeddings from pre-trained Google News vectors were used in one setup, while
in the other we trained our own embeddings on healthcare data. These healthcare
embeddings seem to work better than the pre-trained embeddings. In our experimentation,
we have observed that the implementation using feature fusion yields better
results than word embeddings alone.
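
A minimal sketch of the 7-way casing feature and the fusion step is given below. The chapter names the seven categories but does not define them precisely, so the order of the checks and the "more than half of the characters are digits" threshold for partially numeric are assumptions, as are all function names. The 30- and 300-dimensional sizes are the ones reported in the experiments section.

```python
CATEGORIES = ["start_case", "uppercase", "lowercase", "all_numeric",
              "partially_numeric", "contains_digit", "other"]

def casing_category(token):
    """Assign one of the seven categories to a token (assumed definitions)."""
    digits = sum(ch.isdigit() for ch in token)
    if token and digits == len(token):
        return "all_numeric"
    if token and digits / len(token) > 0.5:  # assumed threshold
        return "partially_numeric"
    if token.islower():
        return "lowercase"
    if token.isupper():
        return "uppercase"
    if token[:1].isupper():
        return "start_case"
    if digits > 0:
        return "contains_digit"
    return "other"

def casing_one_hot(token):
    """One-hot vector of length 7 for the token's category."""
    vec = [0] * len(CATEGORIES)
    vec[CATEGORIES.index(casing_category(token))] = 1
    return vec

def feature_fusion(char_vec, word_vec, token):
    """Concatenate character embedding, word embedding and feature vector."""
    return list(char_vec) + list(word_vec) + casing_one_hot(token)

# Zero vectors stand in for real embeddings in this illustration.
fused = feature_fusion([0.0] * 30, [0.0] * 300, "Aspirin")
```

With a 30-dimensional character embedding, a 300-dimensional word embedding and the 7-dimensional feature vector, each token is represented by a 337-dimensional input.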
Fig. 3 CNN for Character embedding

3.4 Bidirectional Long Short-Term Memory (Bi-LSTM)

Textual data is a string of words put together according to language-specific
rules. A network architecture that inherently works well
with sequential data is the Long Short-Term Memory (LSTM) network. LSTM
architectures differ in terms of directionality: they can be unidirectional
or bi-directional. The Bi-LSTM has access to information from the past as well as
the future [29, 30].
The LSTM network consists of a set of memory blocks. Each LSTM cell has a
self-connected memory cell and three gates, namely the input, output and forget gates.
The input, output and forget gates correspond to write, read and reset operations
for a single LSTM cell. These memory blocks help LSTM cells retain information
for a longer duration of time and also help solve long-range dependency issues.
In sequential Natural Language Processing (NLP) tasks, it is always better to have
both past and future contexts. However, an LSTM cell retains information
from past values, not future values. An elegant solution to this
problem is to use a Bi-directional LSTM [29]. The idea is to replicate the LSTM
cell and stack the copies side by side. The first cell reads the input as-is and the second
reads a reversed copy of the same input. This has proven to work better in practice
for sequential tasks.
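
The gating and the bidirectional reading described above can be illustrated with a deliberately tiny sketch in which the input and the states are scalars instead of vectors. The parameter values and names are made up for illustration, and a real model would use a deep learning library rather than this code.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h, c, p):
    """One LSTM step with scalar input and state.

    p maps a gate name to (weight on x, weight on h, bias).
    """
    def pre(name):
        w_x, w_h, b = p[name]
        return w_x * x + w_h * h + b
    i = sigmoid(pre("input"))    # write: how much new content enters the cell
    f = sigmoid(pre("forget"))   # reset: how much old content is kept
    o = sigmoid(pre("output"))   # read: how much of the cell is exposed
    g = math.tanh(pre("cand"))   # candidate content
    c = f * c + i * g
    h = o * math.tanh(c)
    return h, c

def run_lstm(xs, p):
    """Run a unidirectional LSTM and return the hidden state at each step."""
    h, c, outputs = 0.0, 0.0, []
    for x in xs:
        h, c = lstm_step(x, h, c, p)
        outputs.append(h)
    return outputs

def run_bilstm(xs, p_fwd, p_bwd):
    """Pair each position's forward state with its backward state."""
    forward = run_lstm(xs, p_fwd)
    backward = run_lstm(xs[::-1], p_bwd)[::-1]  # re-align to input order
    return list(zip(forward, backward))

# Illustrative parameters (not learned).
params = {"input": (0.5, 0.1, 0.0), "forget": (0.3, 0.2, 1.0),
          "output": (0.4, -0.1, 0.0), "cand": (1.0, 0.5, 0.0)}
states_a = run_bilstm([0.5, -1.0, 2.0], params, params)
states_b = run_bilstm([0.5, -1.0, -3.0], params, params)
```

Changing only the last input leaves the forward state at the first position untouched but changes the backward state there, which is exactly the access to future context that the unidirectional cell lacks.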
3.5 Conditional Random Fields

In a sequence labelling task, it is always beneficial to consider the correlation between
adjacent labels. In NLP tasks like Part of Speech (POS) tagging and Named Entity
Recognition (NER), there are multiple labels per sequence. Instead of decoding
individual labels independently, we model the label sequence jointly using a CRF [31].
Let x = x_1, x_2, …, x_n be the input word sequence, where each element is the vector
representation of a word (written z = z_1, z_2, …, z_n in the equations below), and let
y = y_1, y_2, …, y_n be the sequence of labels for the word sequence x. The probabilistic
sequence model defines the conditional probability of a label sequence given the word
sequence, normalized over the set γ(z) of all possible label sequences:

$$p(y \mid z; W, b) = \frac{\prod_{i=1}^{n} \exp\left(W_{y_{i-1}, y_i}^{T} z_i + b_{y_{i-1}, y_i}\right)}{\sum_{y' \in \gamma(z)} \prod_{i=1}^{n} \exp\left(W_{y'_{i-1}, y'_i}^{T} z_i + b_{y'_{i-1}, y'_i}\right)} \tag{4}$$

where (y_{i-1}, y_i) is a pair of adjacent labels, and W and b are the weight vectors and
biases corresponding to the label pair. The CRF is trained using maximum likelihood
estimation. For the training set pairs (x_i, y_i), the log-likelihood is given as:

$$L(W, b) = \sum_{i} \log p(y \mid z; W, b) \tag{5}$$

The objective is to choose the parameters such that the log-likelihood is maximized.
To retrieve the sequence of labels with the highest probability, we use:

$$y^{*} = \operatorname*{argmax}_{y \in \gamma(z)} p(y \mid z; W, b) \tag{6}$$
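
The argmax in Eq. (6) does not require enumerating every label sequence; it is usually computed with the Viterbi algorithm. The sketch below assumes, as is common for BiLSTM-CRF taggers, that the score of a label pair splits into a per-token emission score and a label-to-label transition score. The scores and the three-label set are invented for illustration.

```python
def viterbi(emissions, transitions):
    """Return the highest-scoring label index sequence.

    emissions[i][k]:   score of label k at position i
    transitions[j][k]: score of moving from label j to label k
    """
    n_labels = len(emissions[0])
    score = list(emissions[0])   # best score of a path ending in each label
    backpointers = []
    for i in range(1, len(emissions)):
        new_score, pointers = [], []
        for cur in range(n_labels):
            best_prev = max(range(n_labels),
                            key=lambda p: score[p] + transitions[p][cur])
            pointers.append(best_prev)
            new_score.append(score[best_prev] + transitions[best_prev][cur]
                             + emissions[i][cur])
        score = new_score
        backpointers.append(pointers)
    # Follow the back-pointers from the best final label.
    path = [max(range(n_labels), key=lambda k: score[k])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]

LABELS = ["O", "B-problem", "I-problem"]
FORBIDDEN = -1e4  # an I- tag may not directly follow O
transitions = [[0.0, 0.0, FORBIDDEN],
               [0.0, 0.0, 0.0],
               [0.0, 0.0, 0.0]]
emissions = [[2.0, 0.5, 0.0],
             [0.2, 1.0, 1.2],
             [0.1, 0.2, 1.5]]
decoded = [LABELS[k] for k in viterbi(emissions, transitions)]
greedy = [LABELS[max(range(3), key=lambda k: row[k])] for row in emissions]
```

Here per-token (greedy) decoding yields the invalid sequence O, I-problem, I-problem, while joint decoding returns O, B-problem, I-problem, which illustrates why modelling adjacent labels helps.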

4 Corpora Statistics

The data used for building the distributional representation model (word embedding) was
created from text content drawn from sources such as GENIA [32], the Journal
of Medical Case Reports (BMC) and i2b2 [33]. The closed-set small corpora,
GENIA and i2b2, provide the clinical data for performing POS tagging
and NER. Content from the medical journal is collected by a web crawler, which is a
program used for accumulating relevant data from the internet. The web crawler fetches
the document corresponding to the seed URL, parses the links in the seed page and
places each URL into a queue. These links are used to collect the text data.
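
The queue-based crawl described above can be sketched with the Python standard library. This is not the crawler used for the corpus: the `fetch` callback, the page limit and the example URLs are assumptions, and the sketch runs on an in-memory site so that no network access is needed.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkParser(HTMLParser):
    """Collect the href of every anchor tag in a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seed_url, fetch, max_pages=100):
    """Breadth-first crawl; fetch(url) returns HTML text or None."""
    queue = deque([seed_url])
    seen = {seed_url}
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)   # resolve relative links
            if link not in seen:        # queue each URL only once
                seen.add(link)
                queue.append(link)
    return pages

# Hypothetical in-memory site standing in for the journal pages.
site = {
    "http://example.org/": '<a href="/case1">1</a> <a href="/case2">2</a>',
    "http://example.org/case1": '<p>report one</p> <a href="/">home</a>',
    "http://example.org/case2": '<p>report two</p>',
}
pages = crawl("http://example.org/", site.get)
```

A real crawl would replace `site.get` with a function that downloads the page, and would typically also restrict links to the journal's domain and respect its crawling policy.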
Uniform cleaning is applied to the crawled data (training) as well as the testing data
(GENIA and i2b2). The cleaning removes irrelevant content
from the raw data. It includes handling special characters like ±, Latin alphabets
and others that are not encoded by the UTF-8 encoding scheme. We have also removed
the classes with negligible count: predeterminers (PDT), interjection (UH) and ?/=.
Statistics about the corpora used in creating MedNLU are shown in Table 1.

Table 1 Experimented data statistics

                          Crawled data (BMC)   GENIA corpus   i2b2 clinical
Number of documents       4109                 67             –
Number of sentences       434,099              23,467         16,107
Number of words           7,861,071            439,403        201,015
Average words/sentence    18.10                18.72          12.48

The GENIA corpus [32] is used for creating the part-of-speech tagging model in MedNLU.
The i2b2 clinical corpus [33] is annotated with 3 types of clinical tags:
problem, test and treatment. A tag may span several successive
words. This corpus consists of 16,107 sentences from patient discharge summaries.
The i2b2 clinical corpus follows the Inside-Outside-Beginning (IOB) format.
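
To make the IOB format concrete, the sketch below groups token-level tags back into multi-word entities. The sentence and its tags are a made-up example in the style of the corpus, not an excerpt from it, and the function name is an assumption.

```python
def iob_to_spans(tokens, tags):
    """Group IOB tags into (label, text) entity spans."""
    spans, start, label = [], None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a final span
        continues = tag.startswith("I-") and label == tag[2:]
        if continues:
            continue
        if label is not None:
            spans.append((label, " ".join(tokens[start:i])))
        if tag == "O":
            start, label = None, None
        else:
            # "B-x" opens a span; a stray "I-x" is treated as an opening tag.
            start, label = i, tag[2:]
    return spans

tokens = ["Patient", "denies", "chest", "pain", "after", "starting", "aspirin"]
tags = ["O", "O", "B-problem", "I-problem", "O", "O", "B-treatment"]
entities = iob_to_spans(tokens, tags)
```

The B- tag marks the first word of an entity, I- marks its continuation and O marks words outside any entity, so "chest pain" is recovered as a single problem and "aspirin" as a treatment.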

5 Experiments and Observations

The sequential modeler for MedNLU has been constructed by integrating a CNN, a
BLSTM and a CRF. The schematic diagram is given in Fig. 2. The experiments were
performed on a system with the following configuration: 32 GB RAM, NVIDIA
GeForce GTX 1080, i7 processor, Python 3 and Ubuntu 16.04 LTS.
For every word, a character-level representation (i.e. a 30 × 1 vector) is computed
using the CNN as given in Fig. 2. For each of these embeddings, we fine-tune the initial
embeddings by modifying them during the weight updates of the neural network model
by back-propagation. These character embeddings are concatenated with the
corresponding word embedding (300 × 1) and a feature vector (7 × 1). This concatenated
vector is fed to the BLSTM followed by the CRF layer. For performance
comparison, we also integrated the BLSTM with a typical SoftMax layer.
Dropout is applied at multiple levels: before the input to the CNN during the
computation of character embeddings, as well as on the input and output
vectors of the BLSTM. The dropout rate is fixed at 0.25 for all dropout layers
in all the experiments. This is shown in Fig. 1. The parameters are
optimized with the mini-batch Adam optimizer with batch size 32 and early stopping 5.
We have used pre-trained word embeddings generated from general news text
(Google embedding), as well as the embedding model developed from clinical texts
(Health embedding). The Python Gensim library is used for developing the health
embeddings. From word2vec, the continuous bag-of-words model is used to compute the
health embeddings with the following parameters: minimum word frequency 1,
embedding dimension 300 and window size 4. The corpus used for creating the
health embedding model is described under Corpora Statistics. The schematic
diagram is given in Fig. 4.
The created word embeddings are evaluated through qualitative and quantitative
analysis. For the qualitative evaluation, we have used cosine distance to infer the
similarity among words/phrases. The top five similar words
with respect to the target word were taken for further analysis. The qualitative analysis
results are given in Table 2.
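
The nearest-neighbour lookup behind this qualitative evaluation can be sketched as follows. The three-dimensional vectors are toy values chosen for illustration, not entries from the health embedding, and the function names are assumptions.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def most_similar(target, embeddings, k=5):
    """Return the k words closest to the target, most similar first."""
    scores = [(word, cosine_similarity(embeddings[target], vec))
              for word, vec in embeddings.items() if word != target]
    return sorted(scores, key=lambda item: item[1], reverse=True)[:k]

# Toy embedding table (real vectors have 300 dimensions).
toy = {
    "diabetes": [0.9, 0.1, 0.0],
    "obesity": [0.8, 0.2, 0.1],
    "nausea": [0.1, 0.9, 0.1],
    "aspirin": [0.0, 0.1, 0.9],
}
neighbours = most_similar("diabetes", toy, k=2)
```

Ranking by cosine similarity is equivalent to ranking by smallest cosine distance, so the same routine with k = 5 produces neighbour lists of the kind shown in Table 2.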
In the quantitative analysis, the health embedding (vectors computed from clinical text)
is validated using the data from two sequential modelling tasks: POS tagging and
NER. The qualitative evaluation is performed on three different categories:
disorder, symptom and drug name.
t-distributed Stochastic Neighbour Embedding (t-SNE) is primarily
used for visualizing high-dimensional data. Techniques such as multidimensional
scaling, Sammon mapping and graph-based techniques were developed
before t-SNE. Here, D-dimensional data is visualized in two or three
dimensions.
In t-SNE, the Euclidean distances between vectors are converted into a probability
distribution such that similar vectors have a high probability. The t-SNE map is
generated by keeping the target words in different categories, and it is shown
in Fig. 5. t-SNE maps vectors from the high-dimensional space into a 2-dimensional
space. In this paper, the vectors from
the health embeddings that are close in the vector space can be visualized with the t-SNE
map.

Fig. 4 Model diagram for creation of health care embeddings

Table 2 The performance of health embeddings through qualitative analysis

Category   Target word/phrase   Health embedding   Google word embedding   Wikipedia embedding from GloVe
Disorder   Diabetes             Psoriasis          Diabetics               Hypertension
                                Neutropenia        Diabetic                Obesity
                                Schizophrenia      Hypertension            Arthritis
                                Epilepsy           Diabetes mellitus       Cancer
                                Obesity            Heart                   Alzheimer
Symptom    Dyspnea              Fatigue            Dyspnoea                Shortness
                                Diarrhea           Pruritus                Breathlessness
                                Nausea             Nasopharyngitis         Cyanosis
                                Arthralgia         Symptom severity        Photophobia
                                Dizziness          Rhinorrhea              Faintness
Drug       Aspirin              Azathioprine       Dose aspirin            Ibuprofen
                                Rifampicin         Ibuprofen               Tamoxifen
                                Capecitabine       Statins                 Pills
                                Doxorubicin        Statin                  Statins
                                Fluconazole        Calcium supplements     Medication

In Fig. 5, the data points in orange, blue and green represent
the categories disorder, drug, and symptoms respectively. From Fig. 5, we can
clearly observe that the computed health embedding maps the different categories
(disorder, drug, and symptoms) into different clusters.
For the quantitative analysis, we modelled our analysis as a classification task.
As described in Ghanny et al. [34], we then evaluated the word embedding on a
POS-tagged representation of the GENIA corpus, as given in Fig. 4, to ensure the quality
of the representation.
In the POS tagging task we have 26 classes in total, which were mapped to
12 meta tags. In the entity recognition task we have 7 classes. The
statistics about the classes are given in Tables 3 and 4. The obtained quantitative
results are given in Fig. 6. The results were obtained using 10 × 10
fold cross validation with an LSTM as the classifier.
Finally, we obtained performance results for four architectures, i.e. ([Google
Embedding or Health Embedding] + BLSTM + [CRF or SoftMax]). The observed
results for the POS task for these combinations are given in Fig. 6. The performance of
the sequence modeler on the NER corpus is shown in Fig. 7a, b and the performance on
the POS corpus is shown in Fig. 8a, b.
Fig. 5 t-SNE map of health embeddings computed through word2vec CBOW model

Table 3 POS corpus: target class statistics

POS Tag   Count     POS Tag   Count
SYM       3217      IN        12,414
CC        4122      CD        4672
JJ        8454      VB        11,822
NN        36,085    RB        3034
WDT       442       PRP       807
DT        7171

Table 4 NER corpus: target class statistics

NER Tag       Count
B-problem     19,664
I-problem     27,938
B-test        13,831
I-test        11,898
B-treatment   14,185
I-treatment   12,053
O             291,706

Fig. 6 The performance of health embeddings through quantitative analysis

Chunking and parsing are performed through a regular expression parser. A
set of rules is defined for extracting Clauses (S), Prepositional Phrases (PP), Verb
Phrases (VP) and Noun Phrases (NP). These commonly occurring grammar rules
are extracted from the POS-tagged corpus based on their frequency of occurrence. The
resultant parse tree from the chunking is also a part of MedNLU.
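
The idea of chunking with regular expressions over POS tags can be sketched with the standard `re` module. The three rules below are hypothetical examples, since the chapter does not list the rules it extracted, and the tagged sentence is invented; the tags follow the meta tags of Table 3.

```python
import re

# Hypothetical rules: each tag in the tag string is followed by "|".
RULES = [
    ("NP", r"(?:DT\|)?(?:JJ\|)*(?:NN\|)+"),
    ("VP", r"(?:VB\|)+"),
    ("PP", r"IN\|(?:DT\|)?(?:JJ\|)*(?:NN\|)+"),
]

def chunk(tagged):
    """Return (label, words) chunks for a list of (word, tag) pairs."""
    tag_string = "".join(tag + "|" for _, tag in tagged)
    # Map the character offset of each tag boundary to a token index.
    offsets, pos = {}, 0
    for index, (_, tag) in enumerate(tagged):
        offsets[pos] = index
        pos += len(tag) + 1
    offsets[pos] = len(tagged)
    found = []
    for label, pattern in RULES:
        # The lookbehind forces a match to begin at a tag boundary.
        for match in re.finditer(r"(?<![^|])" + pattern, tag_string):
            start, end = offsets[match.start()], offsets[match.end()]
            found.append((start, label, [word for word, _ in tagged[start:end]]))
    return [(label, words) for _, label, words in sorted(found)]

sentence = [("the", "DT"), ("patient", "NN"), ("developed", "VB"),
            ("severe", "JJ"), ("dyspnea", "NN"), ("after", "IN"),
            ("surgery", "NN")]
chunks = chunk(sentence)
```

In this sketch a prepositional phrase contains the noun phrase that follows the preposition, so "after surgery" is reported as a PP and "surgery" as a nested NP. A library parser such as the one in NLTK would build the full tree, which this flat sketch does not attempt.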
It can be observed that CRF performs better than SoftMax in both the NER
and POS tagging tasks. This confirms the need for a sequence modeler at the output
layer rather than the typical SoftMax layer. The time taken for building the CRF-based and
SoftMax-based models is almost the same. For this reason we have not given details
about the time consumed in building the proposed sequence modeler.
Google embedding attains better results than the health
embeddings in both tasks. This indicates that the sequence modeler does not depend
on domain-specific word embeddings. It can also be inferred that the character
embeddings capture information about medical words that are not present in the
Google embeddings.
We also compared our results with those obtained by other models on the experimented
corpora. The sequence modeler achieves state-of-the-art performance on the
i2b2 clinical corpus. The statistics are given in Table 5. MedNLU
achieves an improvement of nearly 8%. Due to the non-availability of
standard separate train and test files, we have not compared the results obtained for
the GENIA corpus.
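
The F-scores in Table 5 can be checked from the reported precision and recall, assuming the balanced F1 measure (the harmonic mean of precision and recall). The helper name below is an assumption.

```python
def f_score(precision, recall):
    """Balanced F-score (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

mednlu_f1 = f_score(94.60, 93.10)    # MedNLU row of Table 5
baseline_f1 = f_score(83.64, 86.88)  # semi-supervised HMM row [24]
improvement = mednlu_f1 - baseline_f1
```

With these inputs the MedNLU value rounds to 93.8 and the baseline to 85.23, matching the table, and the gap between them is a little over 8 percentage points.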

Fig. 7 a Performance of the Google embedding sequence modeler with CRF and SoftMax on NER.
b Performance of the health embedding sequence modeler with CRF and SoftMax on NER

Fig. 8 a Performance of the Google embedding sequence modeler with CRF and SoftMax on POS
tagging. b Performance of the health embedding sequence modeler with CRF and SoftMax on POS
tagging

Table 5 Comparing obtained results with other systems on NER corpus

Methodology                                   Precision (%)   Recall (%)   F-Score (%)
Semi-supervised hidden Markov models [24]     83.64           86.88        85.23
Distributional semantics and CRF [25]         85.60           82.70        83.70
CRF-neural embedding [26]                     85.10           80.60        82.80
MedNLU                                        94.60           93.10        93.8

6 Conclusion

An integrated framework for Natural Language Understanding of clinical text has
been developed. The proposed sequential modeler attains state-of-the-art performance
on Part of Speech Tagging and Named Entity Recognition, with an F1 score of 93.8%
on the i2b2 clinical corpus and an F1 score of 97.29% on the GENIA
corpus. From the observed results it is clear that the character embedding provides
additional sub-word information about clinical words. Character embedding
along with word embedding (computed from general text) removes the need for
a clinical text-based word embedding model. The sub-features extracted from the
clinical words also contribute towards this objective. The proposed MedNLU is
capable of performing Named Entity Recognition, Part of Speech Tagging,
Parsing and Chunking on clinical texts.
These results are encouraging enough to extend the framework towards relation
extraction and dependency parsing modules. It is also clear that the existing
annotated corpora are not large enough to drive deep learning algorithms. Hence,
future work will also focus on creating large annotated corpora of clinical text.
The framework will be extended further with features that support the detection
of adverse drug reactions and of disability.

References

1. Wang, Y., Wang, L., Rastegar-Mojarad, M., Moon, S., Shen, F., Afzal, N., Liu, S., Zeng, Y.,
Mehrabi, S., Sohn, S., et al.: Clinical information extraction applications: a literature review. J.
Biomed. Inform. (2017)
2. Yogatama, D., Liu, F., Smith, N.A.: Extractive summarization by maximizing semantic volume.
In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing,
pp. 1961–1966 (2015)
3. Pestian, J.P., Itert, L., Duch, W.: Development of a pediatric text-corpus for part-of-speech
tagging. In: Proceedings of the International IIS: IIPWM'04 Conference held in Zakopane,
Poland. Springer, pp. 219–226 (2004)
4. Pakhomov, S.V., Coden, A., Chute, C.G.: Developing a corpus of clinical notes manually
annotated for part-of-speech. Int. J. Med. Inform. 75(6), 418–429 (2006)
5. Hirschman, L., Morgan, A.A., Yeh, A.S.: Rutabaga by any other
name: extracting biological names. J. Biomed. Inform. 35(4), 247–259 (2002)
6. Savova, G.K., Masanz, J.J., Ogren, P.V., Zheng, J., Sohn, S., Kipper-Schuler, K.C., Chute, C.G.:
Mayo clinical text analysis and knowledge extraction system (ctakes): architecture, component
evaluation and applications. J. Am. Med. Inform. Assoc. 17(5), 507–513 (2010)
7. Boag, W., Wacome, K., Naumann, T., Rumshisky, A.: Cliner: a lightweight tool for clinical
named entity recognition. AMIA Joint Summits on Clinical Research Informatics (poster)
(2015)
8. Fu, X., Ananiadou, S.: Improving the extraction of clinical concepts from clinical records. In:
Proceedings of BioTxtM14 (2014)
9. Lv, X., Guan, Y., Yang, J., Wu, J.: Clinical relation extraction with deep learning. International
Journal of Hybrid Information Technology, pp. 237–248 (2016)
10. Wu, Y., Jiang, M., Lei, J., Xu, H.: Named entity recognition in Chinese clinical text using deep
neural networks. Studies in Health Technology and Informatics, p. 624 (2015)
11. Dong, X., Qian, L., Guan, Y., Huang, L., Yu, Q., Yang, J.: A multiclass classification method
based on deep learning for named entity recognition in electronic medical records. In: Scientific
Data Summit (NYSDS), IEEE, pp. 1–10 (2016)
12. Pakhomov, S.V., Finley, G., McEwan, R., Wang, Y., Melton, G.B.: Corpus domain effects on
distributional semantic modeling of medical terms. Bioinformatics 32(23), 3635–3644 (2016)
13. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of
words and phrases and their compositionality. In: Advances in Neural Information Processing
Systems, pp. 3111–3119 (2013)
14. Ganguly, D., Roy, D., Mitra, M., Jones, G.J.: Word embedding based generalized language
model for information retrieval. In: Proceedings of the 38th International ACM SIGIR Con-
ference on Research and Development in Information Retrieval, ACM, pp. 795–798 (2015)
15. Ganesh, H.B., Kumar, M.A., Soman, K.P.: From vector space models to vector space models of
semantics. In: Forum for Information Retrieval Evaluation, Springer, Cham, pp. 50–60 (2018)
16. Tang, B., Cao, H., Wang, X., Chen, Q., Xu, H.: Evaluating word representation features in
biomedical named entity recognition tasks. BioMed Research International 2014 (2014)
17. Jagannatha, A., Chen, J., Yu, H.: Mining and ranking biomedical synonym candidates from
wikipedia. In: Proceedings of the Sixth International Workshop on Health Text Mining and
Information Analysis, pp. 142–151 (2015)
18. Gurulingappa, H., Toldo, L., Schepers, C., Bauer, A., Megaro, G.: Semi-supervised information
retrieval system for clinical decision support. In: TREC (2016)
19. Peter, D.T.: A uniform approach to analogies, synonyms, antonyms, and associations. In: Pro-
ceedings of the 22nd International Conference on Computational Linguistics, Vol. 1. Associa-
tion for Computational Linguistics, pp. 905–912 (2008)
20. Landauer, T.K., Dumais, S.T.: A solution to plato’s problem: the latent semantic analysis theory
of acquisition, induction, and representation of knowledge. Psychol. Rev. 104(2), 211 (1997)
21. Liu, K., Chapman, W., Hwa, R., Crowley, R.S.: Heuristic sample selection to minimize ref-
erence standard training set for a part-of-speech tagger. J. Am. Med. Inform. Assoc. 14(5),
641–650 (2007)
22. Fan, J.W., Prasad, R., Yabut, R.M., Loomis, R.M., Zisook, D.S., Mattison, J.E., Huang, Y.:
Part-of-speech tagging for clinical text: wall or bridge between institutions? In: AMIA Annual
Symposium Proceedings, vol. 2011. American Medical Informatics Association, pp. 382–391
(2011)
23. Lafferty, J., McCallum, A., Pereira, F.: Conditional random fields: probabilistic models for
segmenting and labeling sequence data. In: ICML, pp. 282–289 (2001)
24. de Bruijn, B., Cherry, C., Kiritchenko, S., Martin, J., Zhu, X.: Machine-learned
solutions for three stages of clinical information extraction: the state of the art at i2b2
2010. J. Am. Med. Inform. Assoc. 18(5), 557–562 (2011)
25. Jonnalagadda, S., Cohen, T., Wu, S., Gonzalez, G.: Enhancing clinical concept extraction with
distributional semantics. J. Biomed. Inform. 45(1), 129–140 (2012)
26. Wu, Y., Xu, J., Jiang, M., Zhang, Y., Xu, H.: A study of neural word embeddings for named entity
recognition in clinical text. In: AMIA Annual Symposium Proceedings, vol. 2015, p. 1326.
American Medical Informatics Association (2015)
27. Chiu, J.P.C., Nichols, E.: Named entity recognition with bidirectional lstm-cnns. arXiv preprint
arXiv:1511.08308 (2015)
28. Ganesh, H.B., Kumar, M.A., Soman, K.P.: Distributional semantic representation in health
care text classification. In: International Conference on Forum of Information Retrieval and
Evaluation, pp. 201–204 (2016)
29. Dyer, C., Ballesteros, M., Ling, W., Matthews, A., Smith, N.A.: Transition based dependency
parsing with stack long short-term memory. In: Proceedings of ACL-2015 (Volume 1: Long
Papers), pp. 334–343 (2015)
30. Graves, A., Schmidhuber, J.: Framewise phoneme classification with bidirectional lstm and
other neural network architectures. Neural Networks 18(5–6), 602–610 (2005)
31. Settles, B.: Biomedical named entity recognition using conditional random fields and rich
feature sets. In: Proceedings of the COLING 2004 NLPBA, pp. 104–108 (2004)
32. Verspoor, K., Cohen, K.B., Lanfranchi, A., Warner, C., Johnson, H.L., Roeder, C., Choi, J.D.,
Funk, C., Malenkiy, Y., Eckert, M., et al.: A corpus of full-text journal articles is a robust eval-
uation tool for revealing differences in performance of biomedical natural language processing
tools. BMC Bioinformatics 13(1), 207 (2012)
33. Uzuner, O., South, B.R., Shen, S., DuVall, S.L.: 2010 i2b2/VA challenge on concepts, assertions,
and relations in clinical text. J. Am. Med. Inform. Assoc. 18(5), 552–556 (2011)
34. Ghannay, S., Favre, B., Esteve, Y., Camelin, N.: Word embedding evaluation and combination.
In: Proceedings of the Tenth International Conference on Language Resources and Evaluation
(LREC 2016), pp. 300–305 (2016)
35. Baroni, M., Dinu, G., Kruszewski, G.: Don’t count, predict! a systematic comparison of context-
counting vs. context-predicting semantic vectors. In: Proceedings of the 52nd Annual Meeting
of the Association for Computational Linguistics (Volume 1: Long Papers), vol. 1, pp. 238–247
(2014)

H. B. Barathi Ganesh Current Chief Technology Officer at Arnekt Solutions Pvt Ltd., a pioneer-
ing Artificial Intelligence technologist with 5+ years of experience in implementing AI-enabled
technologies and enterprise systems that facilitate business processes and strategic objectives.
Continuous practitioner in blending of technology and business requirements for defining pow-
erful future business strategies, which were evidenced by cost-effective, high-performance ser-
vices and products. Has broader AI expertise in the domains like Automotive, BFSI, Education,
E-Commerce, Logistics, Manufacturing, and Retail.

U. Reshma Principal Engineer and Researcher in the field of Natural Language Processing,
Conventional Machine Learning and Deep Learning. Has sound fundamental understanding in
sub-fields of Artificial Intelligence.

K. P. Soman Currently serves as Head and Professor at Amrita Center for Computational Engi-
neering and Networking (CEN), Coimbatore Campus. He has 300+ publications in national &
international journals and conference proceedings. He has organized a series of workshops and
summer schools in Advanced signal processing using wavelets, Kernel Methods for pattern clas-
sification, Deep learning, Big-data Analytics etc. for industry and academia. Authored books on
“Insight into Wavelets”, “Insight into Data mining”, “Support Vector Machines and Other Kernel
Methods” and “Signal and Image processing-the sparse way”, published by Prentice Hall, New
Delhi, and Elsevier.

M. Anand Kumar Received his Ph.D. in Machine Translation from Amrita Center for Compu-
tational Engineering and Networking (CEN), Coimbatore Campus. Currently serving as an assis-
tant professor at the Department of Information technology, National Institute of Technology, Kar-
nataka. He has 100+ publications in national and international journals and conference proceed-
ings. His research interests include Natural Language Processing, Text Mining, Deep Learning
and Transfer Learning.
Deep Learning Based Biomedical Named
Entity Recognition Systems

Pragatika Mishra, Sitanath Biswas and Sujata Dash

P. Mishra
Gandhi Institute for Technology, Bhubaneswar, India

S. Biswas · S. Dash (B)
North Orissa University, Baripada, India

© Springer Nature Switzerland AG 2020
S. Dash et al. (eds.), Deep Learning Techniques for Biomedical and Health Informatics,
Studies in Big Data 68, https://doi.org/10.1007/978-3-030-33966-1_2

Abstract In this chapter, we address an important problem: biomedical named
entity recognition. Named entity recognition is a vital task in natural language
processing, related to artificial intelligence, information retrieval and
information extraction. Natural language processing is a subfield of computer
science, artificial intelligence and information engineering that deals with the
interaction between computers and human language. It is concerned with processing
and analysing natural language data: a computational activity in which computers
are made to understand, manipulate and analyse language, which includes the
automation of activities and methods of communication. One of the vital components
of natural language processing (NLP) is named entity recognition (NER), which is
used to find and classify expressions with a specific meaning in texts written in
natural language. The various types of named entities include person names,
organization names, place names, numbers, etc. In this chapter we deal only with
biomedical named entity recognition (Bio-NER), a basic task in handling biomedical
text terms such as RNA, cell type, cell line, protein and DNA. Biomedical NER is
one of the most fundamental and crucial tasks in biomedical information extraction
from documents. Recognizing biomedical named entities appears to be harder than
recognizing ordinary named entities. In this chapter we use deep learning, also
known as deep structured learning or hierarchical learning. It is part of a
broader family of machine learning methods based on learning data representations,
as opposed to task-specific algorithms. Learning can be supervised,
semi-supervised or unsupervised. Deep learning models are largely inspired by
information processing and communication patterns in biological nervous systems,
yet they differ in various ways from the structural and functional properties of
biological brains. For experiments and evaluation, we used the GENIA corpus, which
was created by a group of researchers to support the development and evaluation of
information extraction and text mining systems in molecular biology. It consists
of 1,999 MEDLINE abstracts. The GENIA corpus has been widely used by the natural
language processing community for the improvement of semantic search systems and
the establishment of Bio-NLP tasks. In this work, we propose a multi-task learning
framework for Bio-NER that is based on neural network models in order to save
human effort. The deep neural network architecture has several layers, and each
layer abstracts features based on those generated by the lower layers. We compare
against the results of several earlier systems: Saha et al. (Pattern Recogn. Lett
3:1591–1597, 2010), with a precision of 68.12, recall of 67.66 and F-score of
67.89; Liao et al. (Biomedical Named Entity Recognition Based on Skip-Chain Crfs.
pp. 1495–1498, 2012), with a precision of 72.8, recall of 73.6 and F-score of
73.2; ABNER (A Biomedical Named Entity Recognizer, pp. 46–51, 2013), with a
precision of 69.1, recall of 72.0 and F-score of 70.5; Sasaki et al. (How to Make
the Most of Ne Dictionaries in Statistical Ner. pp. 63–70, 2008), with a precision
of 68.58, recall of 79.85 and F-score of 73.78; and Sun et al. (Comput. Biol. Med
37:1327–1333, 2007), with a precision of 70.2, recall of 72.3 and F-score of 71.2.
Our system achieves a precision of 66.54%, a recall of 76.13% and an F-score of
71.01% on the GENIA standard test corpus. This is close to state-of-the-art
performance while using only the part-of-speech feature, and it shows that deep
learning can be applied efficiently to biomedical named entity recognition. The
chapter is organised into the following sections: Introduction, Literature Review,
Architecture, Experiment, Results and Analysis, Conclusion and Future Work, and
References.

Keywords GENIA corpus · Deep learning · Machine learning · Natural language
processing · Named entity recognition

1 Introduction

In this chapter, we deal with an important problem: biomedical named entity
recognition. Named entity recognition is a vital task in natural language
processing, touching on computational linguistics, information retrieval and
information extraction. Natural language processing is a subfield of computer
science, artificial intelligence and information engineering that deals with the
interaction between computers and human language. It is concerned with processing
and analysing natural language data: a computational activity in which computers
are made to understand, manipulate and analyse language, which includes the
automation of activities and methods of communication. Named entity recognition
(NER) is one of the crucial components of natural language processing (NLP); it is
used to find and categorise expressions with a distinct meaning in texts written
in natural language. The various kinds of named entities include person names,
organization names, place names, numbers, etc. In this chapter we deal exclusively
with biomedical named entity recognition (Bio-NER), a primary task in handling
biomedical terminology such as RNA, cell type, cell line, protein and DNA.
Biomedical NER is the most basic and important task in biomedical information
extraction from text. Recognizing biomedical named entities appears to be harder
than recognizing ordinary named entities. Biomedical named entity recognition
faces five challenges:
• New biomedical terms emerge continually, so it is hard to build a dictionary
that includes the newest terms.
• The same word can belong to different entity categories depending on the context.
• An entity can be quite long and may include special characters such as
hyphens.
• Abbreviations are used frequently in the biomedical domain and are often
ambiguous.
• In biomedical terminology, ordinary terms and functional terms are often combined,
which makes the resulting term very long. It is challenging for Bio-NER to segment
a sentence that contains such named entities.
Recently, deep learning based approaches have been applied to biomedical named
entity recognition (Bio-NER) and have shown promising results. However, deep
learning approaches need a large quantity of training data, and a scarcity of such
data can hamper their performance. Deep learning is also known as deep structured
learning or hierarchical learning. It is part of a broader family of machine
learning methods based on learning data representations, as opposed to
task-specific algorithms. Deep learning models are largely inspired by
communication patterns and information processing in biological nervous systems,
yet they differ in various ways from the functional and structural properties of
the biological (human) brain. Deep learning methods such as deep belief networks,
deep neural networks and recurrent neural networks have been applied to areas such
as audio recognition, computer vision, social network filtering, bioinformatics
and natural language processing, where they have produced results comparable to,
and in certain cases superior to, those of human experts. Learning can be
supervised, semi-supervised or unsupervised. Supervised learning is the machine
learning task of learning a function that maps an input to an output based on
example input–output pairs. Semi-supervised learning is a class of machine
learning tasks and techniques that also make use of unlabelled data for training,
typically a small quantity of labelled data together with a large amount of
unlabelled data. Unsupervised learning is a term used for Hebbian learning,
associated with learning without a teacher; it is also known as
self-organisation and is a method of modelling the probability density of inputs.
In this work, we draw on a method based on the convolutional neural network (CNN).
Named entity recognition (NER) is the automated process of finding and labelling
entities in a given text. In the biomedical domain, typical entity types include
disease, chemical, gene and protein. Biomedical NER (BioNER) is an essential
building block of many downstream text mining applications, such as the extraction
of drug-drug interactions [1] and disease-treatment relations. BioNER is also used
when building a sophisticated biomedical entity search tool [2] that enables users
to pose complex queries to search for bio-entities. NER in biomedical text mining
has focused mainly on dictionary-based, rule-based and machine learning-based
approaches [3–5]. Dictionary-based systems have a simple and intuitive structure,
but they cannot handle unseen entities or polysemous words, which leads to lower
recall [3, 4]. In addition, building and maintaining a comprehensive and
up-to-date dictionary involves a substantial amount of manual work. The rule-based
approach is more scalable; however, it needs manually crafted feature sets to fit a
model to a dataset [5]. These dictionary-based and rule-based approaches can
achieve high precision [2] but produce incorrect predictions when a new word that
is not in the training data appears in a sentence (the out-of-vocabulary problem).
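The limitation of dictionary lookup can be illustrated with a minimal sketch. The dictionary entries, the entity labels and the term "newdrugx" below are made up for illustration; a real system would use a curated terminology resource:

```python
def dictionary_tag(tokens, dictionary):
    """Label each token with its dictionary entity type, or "O" if it is unseen."""
    return [dictionary.get(token.lower(), "O") for token in tokens]


# Hypothetical toy dictionary (assumption, not taken from any real resource).
toy_dictionary = {"brca1": "GENE", "aspirin": "CHEMICAL"}

# "newdrugx" stands in for a newly coined drug name missing from the dictionary.
sentence = ["BRCA1", "interacts", "with", "newdrugx"]
print(dictionary_tag(sentence, toy_dictionary))
# ['GENE', 'O', 'O', 'O']
```

The known gene is labelled correctly, but the unseen term is silently tagged as a non-entity, which is how missing dictionary entries translate into lower recall.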
Habibi et al. [6, 7] used character-level word embeddings to capture
characteristics of biomedical entities, such as orthographic features, and
achieved state-of-the-art performance, demonstrating the effectiveness of
character-level word embeddings in BioNER. Even though these models have shown
promising results, NER remains a very difficult task in the biomedical domain for
the following reasons. First, only a limited amount of training data is available
for BioNER tasks. The JNLPBA corpus [8], for example, contains annotations of only
genes and proteins. Hence, the data for each entity type makes up only a small
part of the total amount of annotated data. Multi-task learning (MTL) is a
technique for training one model for several tasks at the same time. MTL can
leverage different datasets that were collected for different but related tasks
[9]. Although the extraction of genes differs from the extraction of chemicals,
both tasks require learning some general features that help in understanding the
linguistic expressions of biomedical texts. Since MTL-based models are trained on
various types of entities and on more training data, they have broad coverage of
biomedical entities, which as expected results in higher recall. On the other
hand, because MTL models are trained on combinations of different entity types,
they tend to have difficulty differentiating among entity types, which results in
lower precision.
Another reason NER is difficult in the biomedical domain is that an entity can be
labelled as different entity types depending on its textual context. For example,
BiLSTM-CRF based models for disease entities erroneously labelled the gene name
"BRCA1" as a disease entity because the training sets contain disease names such
as "BRCA1 abnormalities" or "Brca1-deficient". In addition, a training set that
annotates "VHL" (Von Hippel-Lindau disease) as a disease entity confuses the model
because VHL is also used as a gene name, since mutation of this gene causes VHL
disease. Therefore, each model is an expert in its own domain and helps improve
accuracy by leveraging the multi-domain information from the other models.
Motivated by the work of Collobert [10], we build a neural network model for the
biomedical NER task. Our work shows that deep learning can be applied efficiently
to biomedical NER. Our architecture approaches state-of-the-art performance on the
GENIA corpus, a popular standard corpus that has been adopted by many research
groups for evaluation.

2 Literature Review

In the biomedical field, data is produced at a rate of gigabytes or even terabytes
each day. The development of the biomedical research area has in many ways been
driven by this enormous quantity of data. Biomedical named entity recognition is
an important first step in biomedical text processing. It is far harder than
general named entity recognition owing to complexities such as class membership
that changes daily, entity boundaries that are hard to distinguish and
irregularity in expression [11–15]. Tasks studied in this area include the
recognition of gene mentions, the extraction of a list of unique identifiers for
human genes (gene normalization) and the extraction of information relevant to the
annotation of physical protein-protein interactions. Balanced precision and recall
were observed for the submissions to the gene mention and gene normalization
tasks. For the protein-protein interaction task, different results were obtained
depending on the annotation extraction process. A general observation was that
combining system outputs gave better results than any single system, which led to
the development of the first text mining meta-server in this context [12].
Numerous supervised techniques have been used to learn biomedical named entity
recognition. The maximum entropy Markov model (MEMM) [16], or conditional Markov
model, is a graphical model for sequence labelling that combines features of HMMs
and maximum entropy models. MEMMs find applications in natural language
processing, specifically in part-of-speech tagging and in information extraction.
The hidden Markov model (HMM) [17] is a statistical Markov model in which the
system being modelled is assumed to be a process with unobserved states. The
conditional random field (CRF) [18] is a class of statistical modelling methods
applied in pattern recognition and machine learning and used for structured
prediction. A CRF is capable of taking context into account; for instance, the
linear-chain CRF predicts a sequence of labels for a sequence of input samples,
which makes it popular in natural language processing. HMM, MEMM and CRF are three
popular statistical modelling methods that are often applied to pattern
recognition and machine learning problems. In the hidden Markov model, the word
"hidden" reflects the fact that only the symbols emitted by the system can be
observed. The advantages of the HMM are its strong foundation and its efficient
learning algorithms. Its disadvantage is that each observation depends only on its
corresponding state, whereas sequence labelling depends not only on individual
words but also on aspects such as sequence length and word context. The maximum
entropy Markov model takes into account the dependencies between neighbouring
states and the entire observed sequence, which gives it better expressive power.
The conditional random field addresses the label bias problem. Compared with the
HMM, the CRF does not make strict independence assumptions and can accommodate
arbitrary context information, so its feature design is flexible. Compared with
the MEMM, the CRF computes the conditional probability of the globally optimal
output labels and thereby overcomes label bias. The CRF was also applied to
biomedical entity recognition by Settles [2], achieving an F-score of 69.9% on the
GENIA corpus in combination with various types of features, whereas the HMM
applied to the GENIA corpus attained a precision of 65.5% and a recall of 66.9%
[2]. Li presented a two-phase biomedical named entity recognition model on the
GENIA corpus, which is split into two components [10, 19]. Named entity detection
(NED) is the first phase, which distinguishes named entities from non-named
entities (NNE) without identifying their type. Named entity classification (NEC)
is the second phase, in which a multi-agent strategy is used, achieving an F-score
of 76.06%.
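Sequence models such as the HMM and the linear-chain CRF select the best label sequence with dynamic programming rather than labelling each word in isolation. The following is a minimal sketch of Viterbi decoding over additive scores; the two labels and all score values are invented for illustration and do not come from any of the cited systems:

```python
def viterbi(emission, transition):
    """Return the highest-scoring label sequence as a list of label indices.

    emission[t][j]   : score of label j at position t
    transition[i][j] : score of moving from label i to label j
    Scores are additive, e.g. log-probabilities.
    """
    n_labels = len(emission[0])
    best = list(emission[0])  # best score of any path ending in each label
    backpointers = []
    for t in range(1, len(emission)):
        new_best, pointers = [], []
        for j in range(n_labels):
            # Best previous label for arriving at label j.
            i_star = max(range(n_labels), key=lambda i: best[i] + transition[i][j])
            new_best.append(best[i_star] + transition[i_star][j] + emission[t][j])
            pointers.append(i_star)
        best = new_best
        backpointers.append(pointers)
    # Trace back from the best final label.
    label = max(range(n_labels), key=lambda j: best[j])
    path = [label]
    for pointers in reversed(backpointers):
        label = pointers[label]
        path.append(label)
    return path[::-1]


# Hypothetical scores for a three-word sentence; label 0 = "O", label 1 = "ENTITY".
emission = [[2.0, 1.0], [0.5, 1.5], [1.0, 1.2]]
transition = [[0.5, -0.5], [-0.5, 0.8]]
print(viterbi(emission, transition))  # [1, 1, 1]
```

In this toy example the first word on its own scores higher as "O", yet the best overall sequence labels all three words as an entity, because the transition scores reward staying inside an entity.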
BioNER is also used when building a sophisticated biomedical entity search tool
[20] that enables users to pose complex queries to search for biomedical entities.
NER in biomedical text mining concentrates mainly on dictionary-based, rule-based
and machine learning-based approaches [3–5, 21–23]. Dictionary-based systems have
a simple and intuitive structure, but they cannot handle unseen entities or
ambiguous words, which leads to low recall [3, 4]. The rule-based and
dictionary-based approaches can achieve high precision [3] but produce incorrect
predictions when a new word that is not in the training data appears in a sentence
(the out-of-vocabulary problem). The out-of-vocabulary problem occurs often,
particularly in the biomedical domain, where new biomedical terms, such as new
drug names, appear frequently. Habibi et al. [24] used character-level word
embeddings to capture characteristics of biomedical entities, such as orthographic
features, and achieved state-of-the-art performance, demonstrating the
effectiveness of character-level word embeddings in BioNER. Although these models
have shown promising results, NER is still a very difficult task in the biomedical
domain for the following reasons. First, only a limited amount of training data is
available for BioNER tasks. The gold-standard datasets contain annotations of only
one or two types of entity. For instance, the NCBI corpus [8] includes annotations
of diseases but not of other types of entities such as proteins and genes. The
JNLPBA corpus [9], by contrast, contains annotations of only proteins and genes.
Hence, the data for each entity type makes up only a small fraction of the total
amount of annotated data. Multi-task learning (MTL) is a method for training one
model for several tasks at the same time. MTL can leverage different datasets that
were collected for different but related tasks [25].
Although the extraction of genes differs from the extraction of chemicals, both
tasks require learning some general features that help in understanding the
linguistic expressions of biomedical text. The MTL models of [26, 27] achieved
performance comparable to that of the state-of-the-art single-task NER models. In
contrast to conventional MTL methods, which use only one static model, CollaboNet
consists of several models trained on different datasets for different tasks. On
the other hand, because MTL models are trained on mixtures of different entity
types, they tend to have difficulty differentiating among entity types, which
results in low precision. A further reason NER is difficult in the biomedical
domain is that an entity can be labelled as different entity types depending on
its textual context.

3 Architecture

In this chapter we use a method based on a convolutional neural network (CNN) that
has been applied to several natural language processing tasks [28–30]. This neural
architecture was proposed by Bengio [30] for probabilistic language modelling;
neural networks were subsequently introduced for more complex natural language
processing tasks. We adopt it for the biomedical named entity recognition task.
The architecture is shown in Fig. 1. Compared with earlier, heavily engineered
systems, the deep learning approach implemented here reduces the dependence on
linguistic expertise. In Fig. 1, the token "beta" is shown at the centre of the
window and is processed at time t. The words within the sliding window,
represented as real-valued vectors, are the inputs to the neural network. A score
for every label of the word "beta" is produced after the transformations of the
linear and sigmoid layers. Finally, the score lattice for the sentence is given as
output at the end of the procedure, and the Viterbi algorithm is applied to obtain
the best label sequence. The length of the input to the CNN is fixed and adapted
to text data. First, a word dictionary S is built from a large amount of text from
biomedical papers. The words in S are transformed into vectors that serve as input
to the CNN. Each word in the dictionary has a vector of a preset dimension. The
words that have been transformed into vectors are stored in the matrix

$$M \in \mathbb{R}^{D \times |S|} \tag{1}$$

where D is the dimension of each word vector and |S| is the size of the
vocabulary (the dictionary of words), which we take to be finite.
M is randomly initialized and trained with a general neural network on a large
number of unlabelled biomedical papers.
The real-valued vector representation M can be obtained in two ways:
• The first method is to initialize the vector of each word i with zero (0) at
all positions and one at the position corresponding to that word in M, and to
optimize these vectors as parameters during the training phase [29].
• The second method is to obtain the word representations as part of training a
neural network language model [31–34].

Fig. 1 Architecture for neural network



In this chapter we optimize the word representation for the Bio-NER task, using
the second method. After comparing several language models [34–37], we chose
the skip-gram neural network language model. This model is not the best model
overall, but it is better suited to training rare words.

3.1 Extraction of Features in Sentence Level

In biomedical NER tasks, every word in a sentence must receive a label that
indicates whether or not it is part of a biomedical named entity. Sentences are
taken as input, and the corresponding label sequence is produced as output for
each sentence. Sentence lengths are not fixed, but the input of the neural
network is; this is why we choose the window approach. The window size k is
fixed at the start, and different values of k may lead to different accuracy.
The window approach takes into account the dependency between the label of each
word and its neighbouring words: the words surrounding the word to be labelled
pass through the layers together with it. When we consider the word at position
C, it is passed to the mapping layer together with the neighbouring words at
positions in the range [C − (k − 1)/2, C + (k − 1)/2]. Since every word is
transformed into a D-dimensional vector by this layer, the input size of linear
layer 1 stays fixed.
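
The window approach described above can be sketched as follows. This is a minimal illustration, not the chapter's implementation: the function names, the `<PAD>` token used at sentence boundaries, and the zero vector for unknown words are all assumptions made for the example.

```python
def context_window(tokens, center, k, pad="<PAD>"):
    """Return the k tokens centred on position `center` (k must be odd)."""
    if k % 2 == 0:
        raise ValueError("window size k must be odd")
    half = (k - 1) // 2
    window = []
    for pos in range(center - half, center + half + 1):
        # Assumption: positions outside the sentence get a padding token.
        window.append(tokens[pos] if 0 <= pos < len(tokens) else pad)
    return window


def map_window(window, embeddings, dim):
    """Mapping layer: concatenate the D-dimensional vectors of the window."""
    unknown = [0.0] * dim  # assumption: unseen words map to a zero vector
    vector = []
    for word in window:
        vector.extend(embeddings.get(word, unknown))
    return vector


sentence = ["IL-2", "gene", "expression", "requires", "NF-kappa", "B"]
toy_embeddings = {"gene": [0.1, 0.2], "expression": [0.3, 0.4], "IL-2": [0.5, 0.6]}
window = context_window(sentence, center=0, k=3)
features = map_window(window, toy_embeddings, dim=2)
print(window)         # ['<PAD>', 'IL-2', 'gene']
print(len(features))  # 6, i.e. D * k, whatever the sentence length
```

Whatever the sentence length, the vector handed to the first linear layer always has D × k entries, which is the point of the window approach.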

3.2 Criteria of Label

A deep neural network is a structure with several layers. Each layer extracts
features from the features produced by the layers below it. Depending on the
design of the neural network, each layer may be either a linear function or
another transformation.
A function $f_\theta(\cdot)$ describes the three layers in our architecture as

$$f_\theta(x) = M_2 \, g(M_1 x + b_1) + b_2 \tag{2}$$

where $M_1 \in \mathbb{R}^{H \times Dk}$, $b_1 \in \mathbb{R}^{1 \times H}$,
$M_2 \in \mathbb{R}^{|L| \times H}$, $b_2 \in \mathbb{R}^{1 \times |L|}$, and
$g(\cdot)$ is the sigmoid function.
H is the number of hidden units.
|L| is the size of the set of possible label tags. Using stochastic gradient
ascent on a training set T, the V-dimensional parameter vector
θ = (θ1, θ2, …, θV) is trained by maximizing the likelihood

$$\sum_{(x, y) \in T} \log p(y \mid x, \theta)$$

This is multi-class classification.


Here f(x, l, θ) denotes the score of the l-th label for example x, that is, for
the training window. This score is interpreted as a conditional probability
P(l | x, θ) by applying the softmax operation:

$$P(l \mid x, \theta) = \frac{e^{f(x, l, \theta)}}{\sum_{j} e^{f(x, j, \theta)}} \tag{3}$$

We define the log-add operation as

$$\operatorname*{logadd}_{i} z_i = \log \Big( \sum_{i} e^{z_i} \Big)$$

Hence, the log-likelihood for the training example (x, y) is

$$\log p(y \mid x, \theta) = f(x, y, \theta) - \operatorname*{logadd}_{j} f(x, j, \theta) \tag{4}$$
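
The relationship between Eqs. (3) and (4) can be checked numerically with a short sketch. The scores below are made-up values standing in for f(x, l, θ); the max-subtraction trick for numerical stability is an implementation choice that the chapter does not mention.

```python
import math


def logadd(values):
    """log(sum_i exp(z_i)), computed stably by factoring out the maximum."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def softmax(scores):
    """Eq. (3): conditional probability of every label."""
    norm = logadd(scores)
    return [math.exp(s - norm) for s in scores]


def log_likelihood(scores, gold):
    """Eq. (4): score of the gold label minus logadd over all label scores."""
    return scores[gold] - logadd(scores)


scores = [2.0, 0.5, -1.0]  # hypothetical f(x, l, theta) for three labels
probs = softmax(scores)
print(probs)
print(log_likelihood(scores, gold=0), math.log(probs[0]))  # the two values agree
```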

Let $f(x_{[1:T]}, l, t, \theta)$ be the network output score for the sentence
$x_{[1:T]}$, where l is the tag, t is the time step and θ are the parameters.
In Bio-NER the score of every label path matters because of the dependency among
the tags within the same sentence. To interpret the output we must therefore
consider the dependencies between the labels. The score of a sentence x along a
label path is the sum of two components: first, the node scores described
above, and second, the transition scores $A_{lj}$, which score the transition
from label l to label j.
Here $\tilde{\theta}$ denotes all the parameters, including $A_{lj}$ and θ.
For the sentence $x_{[1:T]}$, the score of the path with labels $l_{[1:T]}$ is

$$W(x_{[1:T]}, l_{[1:T]}, \tilde{\theta}) = \sum_{t=1}^{T} \Big( A_{l_{[t-1]} l_{[t]}} + f(x_{[1:T]}, l_{[t]}, t, \theta) \Big) \tag{5}$$

The log conditional probability of the true label path $y_{[1:T]}$ is

$$\log P(y_{[1:T]} \mid x_{[1:T]}, \tilde{\theta}) = W(x_{[1:T]}, y_{[1:T]}, \tilde{\theta}) - \operatorname*{logadd}_{\forall l_{[1:T]}} W(x_{[1:T]}, l_{[1:T]}, \tilde{\theta}) \tag{6}$$

During the training stage, all the parameters $\tilde{\theta}$ are trained over
the pairs $(x_{[1:T]}, y_{[1:T]})$ to maximize

$$\sum_{(x, y) \in T} \log P(y_{[1:T]} \mid x_{[1:T]}, \tilde{\theta})$$

In the inference procedure, the Viterbi algorithm is used to find

$$\operatorname*{argmax}_{l_{[1:T]}} W(x_{[1:T]}, l_{[1:T]}, \tilde{\theta}) \tag{7}$$
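
The sentence-level quantities in Eqs. (5) to (7) can be illustrated with a small sketch. It assumes that the node scores have already been computed by the network and are given as a table `node[t][l]`, and it models the transition into the first label with a separate `start` vector; both are assumptions of this example, as are all names and numbers.

```python
import math


def logadd(values):
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def path_score(node, trans, start, path):
    """Eq. (5): sum of transition and node scores along a label path."""
    total = start[path[0]] + node[0][path[0]]
    for t in range(1, len(path)):
        total += trans[path[t - 1]][path[t]] + node[t][path[t]]
    return total


def log_partition(node, trans, start):
    """The logadd over all paths in Eq. (6), via the forward recursion."""
    n = len(start)
    alpha = [start[l] + node[0][l] for l in range(n)]
    for t in range(1, len(node)):
        alpha = [logadd([alpha[i] + trans[i][j] for i in range(n)]) + node[t][j]
                 for j in range(n)]
    return logadd(alpha)


def viterbi(node, trans, start):
    """Eq. (7): the highest-scoring label path and its score."""
    n = len(start)
    best = [start[l] + node[0][l] for l in range(n)]
    back = []
    for t in range(1, len(node)):
        new_best, pointers = [], []
        for j in range(n):
            prev = max(range(n), key=lambda i: best[i] + trans[i][j])
            pointers.append(prev)
            new_best.append(best[prev] + trans[prev][j] + node[t][j])
        best = new_best
        back.append(pointers)
    last = max(range(n), key=lambda l: best[l])
    path = [last]
    for pointers in reversed(back):  # follow back-pointers from the end
        path.append(pointers[path[-1]])
    return list(reversed(path)), best[last]


node = [[1.0, 0.2], [0.3, 0.9], [0.5, 0.4]]  # node[t][l], hypothetical scores
trans = [[0.5, -0.2], [-1.0, 0.8]]           # trans[i][j]: label i -> label j
start = [0.0, 0.0]
best_path, best_score = viterbi(node, trans, start)
gold = [0, 1, 1]
print(best_path, best_score)
print(path_score(node, trans, start, gold) - log_partition(node, trans, start))
```

The forward recursion computes the logadd over all |L|^T paths in O(T·|L|²) time, which is what makes the sentence-level likelihood of Eq. (6) tractable during training.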

3.3 Stochastic Gradient

The simplest optimization algorithms for minimizing a function are gradient
descent algorithms. Considering the large cost of computation, we choose the
stochastic gradient [38] optimization technique. A new value of θ is computed
at each iteration step for an example (x, y):

$$\theta \leftarrow \theta + \varepsilon \, \nabla_{\theta} \log p(y \mid x, \theta) \tag{8}$$

where $\nabla_{\theta} \log p(y \mid x, \theta)$ is the gradient of

$$\log p(y_{[1:T]} \mid x_{[1:T]}, \theta) \tag{9}$$

with respect to θ, and ε is a small positive constant, the chosen learning rate.
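
The update of Eq. (8) can be illustrated on a deliberately simplified model. The sketch below replaces the chapter's network with a plain linear scorer followed by a softmax, so that the gradient can be written in one line; the toy data, the learning rate and all names are assumptions for illustration only.

```python
import math
import random


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scores_for(weights, x):
    """Linear scores f(x, l, theta) = sum_d weights[l][d] * x[d]."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def sgd_step(weights, x, y, rate):
    """One update of Eq. (8): theta <- theta + rate * grad log p(y | x, theta)."""
    probs = softmax(scores_for(weights, x))
    for label, row in enumerate(weights):
        # d log p(y|x) / d score_label = 1[label == y] - p(label | x)
        delta = (1.0 if label == y else 0.0) - probs[label]
        for d in range(len(row)):
            row[d] += rate * delta * x[d]


def total_log_likelihood(weights, data):
    return sum(math.log(softmax(scores_for(weights, x))[y]) for x, y in data)


data = [([1.0, 0.0], 0), ([0.0, 1.0], 1), ([1.0, 1.0], 1)]  # toy (x, y) pairs
weights = [[0.0, 0.0], [0.0, 0.0]]
before = total_log_likelihood(weights, data)
rng = random.Random(0)
for _ in range(200):
    x, y = rng.choice(data)  # "stochastic": one random example per step
    sgd_step(weights, x, y, rate=0.1)
after = total_log_likelihood(weights, data)
print(before, after)  # the log-likelihood increases
```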

4 Experiment

The task of Bio-NER is to recognize entities such as diseases, viruses, proteins
and genes in plain biomedical text and to label them. Figure 2 shows that each
word in a given sentence is taken as a token and associated with a label. The
labels O, B-C and I-C indicate not only the category but also the position of
the token within the named entity, where C is the category, and B and I mark
the beginning and the inside of an entity, respectively. There are five
categories: DNA, RNA, Protein, Cell_type and Cell_line. O indicates a token that
is not part of a named entity. The test file uses the BIO notation of the GENIA
corpus. With this BIO notation there are 11 labels (Fig. 2), and each token is
assigned one of these 11 labels in the result.
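
The count of 11 labels follows directly from the notation: a B- and an I- label for each of the five categories, plus O. A small sketch shows this, together with a helper that turns entity spans into BIO tags; the example sentence and its annotation are invented for illustration.

```python
CATEGORIES = ["DNA", "RNA", "Protein", "Cell_type", "Cell_line"]

# One B- and one I- label per category, plus O: 2 * 5 + 1 = 11 labels.
LABELS = ["O"] + [f"{prefix}-{c}" for c in CATEGORIES for prefix in ("B", "I")]


def spans_to_bio(n_tokens, spans):
    """Convert (start, end, category) spans, end exclusive, to BIO tags."""
    tags = ["O"] * n_tokens
    for start, end, category in spans:
        tags[start] = f"B-{category}"
        for pos in range(start + 1, end):
            tags[pos] = f"I-{category}"
    return tags


tokens = ["IL-2", "gene", "expression", "in", "T", "cells"]
spans = [(0, 2, "DNA"), (4, 6, "Cell_type")]  # hypothetical annotation
for token, tag in zip(tokens, spans_to_bio(len(tokens), spans)):
    print(token, tag)
print(len(LABELS))  # 11
```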

Fig. 2 Biomedical named entity recognition example

5 Result of Experiment and Its Analysis

Unlabelled data were collected from the PubMed database using Biopython; the
search keywords were ‘drug’, ‘protein’, ‘interaction’, ‘cell_type’ and ‘DNA’.
We considered 339,074 papers from the PubMed database, of which 294,893 have
abstracts. The resulting 430 MB file is used as our unlabelled data. We use the
Word2vec tool to train our skip-gram language model. As a result, 205,914 words
with 600-dimensional vectors are included in our word dictionary S. We also use
a POS tagger. This tool is designed specifically for biomedical text, because
the characteristics of biomedical text differ considerably from those of other
articles. The GENIA corpus is used in this experiment, with precision, recall
and F-score selected for evaluation. Precision is the number of named entities
correctly detected divided by the total number of named entities identified by
the system. Recall is the number of named entities correctly detected divided
by the number of named entities contained in the input text. The F-score is the
harmonic mean of the two and summarizes the performance of a system:

$$\text{F-Score} = \frac{2 \, (\text{Precision} \times \text{Recall})}{\text{Precision} + \text{Recall}}$$
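
The three measures can be computed from entity counts as in the following sketch. The counts in the example call are invented and do not come from the experiment; the handling of zero denominators is an assumption of this sketch.

```python
def precision_recall_f(correct, predicted, gold):
    """correct: entities detected correctly; predicted: entities output by
    the system; gold: entities present in the annotated text."""
    precision = correct / predicted if predicted else 0.0
    recall = correct / gold if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


def f_from(precision, recall):
    """F-score computed directly from precision and recall."""
    return 2 * precision * recall / (precision + recall)


print(precision_recall_f(correct=80, predicted=100, gold=125))
print(round(f_from(0.6389, 0.8062), 4))  # precision and recall of the Protein row in Table 1
```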

According to other systems, categories such as Protein and DNA have the
highest F-score. The number of entities of each category in the training data
is shown in Table 1. Comparing Table 1 and Fig. 3, the category ‘Cell_type’ has
the smallest training set but the highest precision and the second-highest
F-score. In Fig. 4 we can see that 12% of the ‘B-DNA’ words are wrongly
labelled as ‘B-Protein’, a larger share than for the other biomedical
categories.
We found two major reasons when examining the training data. First, biomedical
named entities are often composed of several nested named entities. For example,
words such as ‘Viruses’, ‘Epstein-Barr’, ‘protein’, ‘cell’ and ‘EBV’ occur in
both entities but belong to different categories in Fig. 4.

Table 1 Major entity categories and performances

Category    Precision   Recall   F-Score
Protein     0.6389      0.8062   0.7128
DNA         0.6427      0.6761   0.6590
RNA         0.6050      0.6102   0.6076
Cell_type   0.7344      0.7356   0.7351
Cell_line   0.5008      0.6160   0.5524
Overall     0.6486      0.7610   0.7004

Fig. 3 Major entity categories and training data of words contained for each category
These words may appear at different positions depending on the category. The
BMESO notation is applied to exploit this information, since BIO notation
cannot represent it (Fig. 5).
BMESO notation is similar to BIO notation but gives a more detailed description
of the position of each word within an entity. Here B indicates the beginning
of an entity and E its end. Words between B and E are denoted M. If the entity
consists of a single word, it is denoted S. The second reason is the shortage
of training data for some labels, together with entities that do not appear in
the training set at all. The final results on the GENIA file are listed in
Table 2.

Fig. 4 Error distribution

Fig. 5 NERs and labels examples
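
The relationship between the BIO and BMESO notations described above can be made concrete with a small conversion routine. This is a sketch under the assumption that the input is a well-formed BIO sequence; the function name and the example tags are illustrative.

```python
def bio_to_bmeso(tags):
    """Rewrite BIO tags so that entity ends (E) and single-token entities (S)
    are marked explicitly; inner tokens become M."""
    out = []
    for i, tag in enumerate(tags):
        if tag == "O":
            out.append("O")
            continue
        prefix, category = tag.split("-", 1)
        nxt = tags[i + 1] if i + 1 < len(tags) else "O"
        continues = nxt == f"I-{category}"  # does the entity go on?
        if prefix == "B":
            out.append(f"{'B' if continues else 'S'}-{category}")
        else:
            out.append(f"{'M' if continues else 'E'}-{category}")
    return out


bio = ["B-Protein", "I-Protein", "I-Protein", "O", "B-DNA", "O"]
print(bio_to_bmeso(bio))
# ['B-Protein', 'M-Protein', 'E-Protein', 'O', 'S-DNA', 'O']
```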
Table 2 compares our results with those of other systems. Saha et al. [39]
report a precision of 68.12, a recall of 67.66 and an F-score of 67.89; Liao
et al. [40] a precision of 72.8, a recall of 73.6 and an F-score of 73.2;
ABNER [41] a precision of 69.1, a recall of 72.0 and an F-score of 70.5; Sasaki
et al. [42] a precision of 68.58, a recall of 79.85 and an F-score of 73.78;
and Sun et al. [43] a precision of 70.2, a recall of 72.3 and an F-score of
71.2. Our system achieves a precision of 66.54, a recall of 76.13 and an
F-score of 71.01% on the GENIA standard test corpus, which is close to
state-of-the-art performance. However, the biomedical vocabulary changes every
day and differs across tasks and corpora.

Table 2 Comparison with state of the art systems

Teams                Precision   Recall   F-Score
Saha et al. [39]     68.12       67.66    67.89
Liao et al. [40]     72.8        73.6     73.2
ABNER [41]           69.1        72.0     70.5
Sasaki et al. [42]   68.58       79.85    73.78
Sun et al. [43]      70.2        72.3     71.2
Our results          66.54       76.13    71.01

6 Conclusion and Future Scope

In this chapter we have implemented a multi-layer neural network for a
biomedical named entity recognition system. The results achieved are close to
state-of-the-art performance. There is scope for further improvement of the
performance of the neural network. Recognition of the left boundary word is
crucial; if it fails, that word or the following words are tagged incorrectly
too. Combining reverse recognition with forward recognition can be explored in
future for better accuracy of the system.

References

1. Lim, S., Lee, K., Kang, J.: Drug drug interaction extraction from the literature using a recursive
neural network. PLoS ONE 13(1), e0190926 (2018)
2. Lee, K., Hwang, Y., Kim, S., Rim, H.: Biomedical named entity recognition using two-phase
model based on SVMs. J. Biomed. Inform. 37(6), 436–447 (2004)
3. Hettne, K.M., Stierum, R.H., Schuemie, M.J., Hendriksen, P.J., Schijvenaars, B.J., Mulligen,
E.M.V et al.: A dictionary to identify small molecules and drugs in free text. Bioinformatics.
25(22), 2983–2991 (2009)
4. Song, M., Yu, H., Han, W.S.: Developing a hybrid dictionary-based bio-entity recognition
technique. BMC Med. Inform. Decis. Mak. 15(1), S9 (2015)
5. Fukuda, K.I., Tsunoda, T., Tamura, A., Takagi, T. et al.: Toward information extraction:
identifying protein names from biological papers. In: Pac. Symp. Biocomput., pp. 707–718
(1998)
6. Lample, G., Ballesteros, M., Subramanian, S., Kawakami, K., Dyer, C.: Neural architectures for
named entity recognition. In: HLT-NAACL. The Association for Computational Linguistics.
p. 260–270 (2016)
7. Pyysalo, S., Ginter, F., Moen, H., Salakoski, T., Ananiadou, S.: Distributional semantics
resources for biomedical text processing. In: Proceedings of the 5th International Symposium
on Languages in Biology and Medicine, Tokyo, Japan. p. 39–43 (2013)

8. Kim, J.D., Ohta, T., Tsuruoka, Y., Tateisi, Y., Collier, N.: Introduction to the bio-entity recogni-
tion task at JNLPBA. In: Proceedings of the international joint workshop on natural language
processing in biomedicine and its applications. Association for Computational Linguistics.
p. 70–75 (2004)
9. Caruana, R.: Multitask learning. Mach. Learn. 28(1), 41–75 (1997). Available from: https://
doi.org/10.1023/A:1007379606734
10. Collobert, R.: Deep learning for efficient discriminative parsing. In: International Conference
on Artificial Intelligence and Statistics (2011)
11. Dai, H., Chang, Y.C., Tsai, R.T.Z.H., Hsu, W.: New challenges for biological text-mining in
the next decade. J. Comput. Sci. Technol. 25(1), 169–179 (2010)
12. Krallinger, M., Morgan, A., Smith, L., Leitner, F., Tanabe, L., Wilbur, J., Hirschman, L.,
Valencia, A.: Evaluation of text-mining systems for biology: overview of the second biocreative
community challenge. Genome Biol. 9(2) (2008)
13. Dai, H., Huang, C., Lin, R., Tsai, R., Hsu, W.: Biosmile web search: a web application for
annotating biomedical entities and relations. Nucleic Acids Res. 36, 390–397 (2008)
14. Rebholz-Schuhmann, D., Arregui, M., Gaudan, S., Kirsch, H., Jimeno, A.: Text processing
through web services: calling Whatizit. Bioinformatics. 24(2) 296–300 (2008)
15. Si, L., Kanungo, T., Huang, X.: Boosting performance of bio-entity recognition by combining
results from multiple systems. In: Proceedings of the 5th International Workshop on Bioinfor-
matics ACM (2005), pp. 76–83
16. Tsuruoka, Y., Tateishi, Y., Kim, J.-D., Ohta, T., McNaught, J., Ananiadou, S., Tsujii, J.I.:
Developing a robust part-of-speech tagger for biomedical text. In: Advances in Informatics.
Springer (2005), pp. 382–392
17. Vlachos, A.: Evaluating and combining biomedical named entity recognition systems.
In: BioNLP 2007: Biological, Translational, and Clinical Language Processing (2007),
pp. 199–206
18. Li, L., Zhou, R., Huang, D.: Two-phase biomedical named entity recognition using CRFs.
Comput. Biol. Chem. 33(4), 334–338 (2009)
19. Li, L., Fan, W., Huang, D.: A two-phase bio-ner system based on integrated classifiers and
multi-agent strategy. IEEE/ACM Trans. Comput. Biol. Bioinf. 10(4), 897–904 (2013)
20. Lee, S., Kim, D., Lee, K., Choi, J., Kim, S., Jeon, M., et al.: BEST: next-generation biomedical
entity search tool for knowledge discovery from biomedical literature. PLoS ONE 11(10),
e0164680 (2016)
21. Proux, D., Rechenmann, F., Julliard, L., Pillet, V., Jacq, B.: Detecting gene symbols and names
in biological texts. Genome Inform. 9, 72–80 (1998)
22. Tsai, R.T.H., Sung, C.L., Dai, H.J., Hung, H.C., Sung, T.Y., Hsu, W.L.: NERBio: using selected
word conjunctions, term normalization, and global patterns to improve biomedical named entity
recognition. In: BMC bioinformatics. BioMed Central. 7, S11 (2006)
23. Ju, M., Miwa, M., Ananiadou, S.: A neural layered model for nested named entity recognition.
In: Proceedings of the 2018 Conference of the North American Chapter of the Association for
Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). vol. 1,
p. 1446–1459 (2018)
24. Doğan, R.I., Leaman, R., Lu, Z.: NCBI disease corpus: a resource for disease name recognition
and concept normalization. J. Biomed. Inform. 47, 1–10 (2014)
25. Crichton, G., Pyysalo, S., Chiu, B., Korhonen, A.: A neural network multi-task learning
approach to biomedical named entity recognition. BMC Bioinform. 18(1), 368 (2017)
26. Zheng, J.G., Howsmon, D., Zhang, B., Hahn, J., McGuinness, D., Hendler, J et al.: Entity
linking for biomedical literature. In: Proceedings of the ACM 8th International Workshop on
Data and Text Mining in Bioinformatics. ACM. p. 3–4 (2014)
27. Tsutsui, S., Ding, Y., Meng, G.: Machine reading approach to understand Alzheimers disease
literature. In: Proceedings of the Tenth International Workshop on Data and Text Mining in
Biomedical Informatics (DTMBIO) (2016)
28. Bengio, Y., Ducharme, R., Vincent, P.: A neural probabilistic language model. In: NIPS. vol. 13 (2001)

29. Collobert, R., Weston, J.: A unified architecture for natural language processing: deep neural networks
with multitask learning. In: ICML (2008)
30. Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., Kuksa, P.: Natural language
processing (almost) from scratch. JMLR (2011)
31. Bengio, Y., Ducharme, R., Vincent, P., Janvin, C.: A neural probabilistic language model. J. Mach.
Learn. Res. 3, 1137–1155 (2003)
32. Schwenk, H.: Continuous space language models. Comput. Speech Lang. 21(3), 492–518
(2007)
33. Mikolov, T., Karafiat, M., Burget, L., Cernocky, J., Khudanpur, S.: Recurrent neural network
based language model. In: Eleventh Annual Conference of the International Speech Commu-
nication Association (INTERSPEECH) (2010), pp. 1045–1048
34. Mnih, A., Teh, Y.W.: A fast and simple algorithm for training neural probabilistic language
models. In: Proceedings of the 29th International Conference on Machine Learning (ICML-12)
(2012), pp. 1751–1758
35. Collobert, R.: Deep learning for efficient discriminative parsing. In: International Conference
on Artificial Intelligence and Statistics (AISTATS) (2011)
36. Turian, J., Ratinov, L., Bengio, Y.: Word representations: a simple and general method for
semi-supervised learning. In: Proceedings of the 48th Annual Meeting of the Association for
Computational Linguistics (2010), pp. 384–394
37. Mikolov, T., Yih, W.T., Zweig, G.: Linguistic regularities in continuous space word
representations. In: Proceedings of the 2013 Conference of the North American Chapter of the Association
for Computational Linguistics: Human Language Technologies (2013), pp. 746–751
38. Bottou, L.: Stochastic gradient learning in neural networks. In: Proceedings of Neuro-Nimes,
vol. 91 (1991)
39. Saha, S.K., Narayan, S., Sarkar, S., Mitra, P.: A composite kernel for named entity recognition. Pattern
Recogn. Lett. 31, 1591–1597 (2010)
40. Liao, Z., Wu, H.: Biomedical named entity recognition based on skip-chain CRFs. In: 2012
International Conference on Industrial Control and Electronics Engineering (ICICEE). IEEE
(2012), pp. 1495–1498
41. ABNER: A Biomedical Named Entity Recognizer (2013), pp. 46–51
42. Sasaki, Y., Tsuruoka, Y., McNaught, J., Ananiadou, S.: How to make the most of NE dictionaries in
statistical NER. In: Proceedings Workshop Current Trends in Biomedical Natural Language
Processing (2008), pp. 63–70
43. Sun, C., Guan, Y., Wang, X., Lin, L.: Rich features based conditional random fields for bio-
logical named entities recognition. Comput. Biol. Med. 37, 1327–1333 (2007)

Pragatika Mishra holds an M.Tech in Computer Science and Engineering from Biju Patnaik
University of Technology. She has around 2 years of experience in teaching undergraduate students.
Her research interests are Artificial Intelligence and Machine Learning.

Sitanath Biswas received his M.E. (CSE) from Utkal University and is currently pursuing a Ph.D. at
North Orissa University, Baripada, Odisha. He is currently working as an Assistant Professor at Gandhi
Institute for Technology, Bhubaneswar, Odisha. He has over 14 years of experience in teaching and
research. He has published over 18 research papers in various international journals of repute. His
research areas are Artificial Intelligence and Natural Language Processing.

Sujata Dash received her Ph.D. degree in Computational Modelling from Berhampur University,
Orissa, India in 1995. She is an Associate Professor in P.G. Department of Computer Science and
Application, North Orissa University, at Baripada, India. She has published more than 150 tech-
nical papers in international journals, conferences, and book chapters of reputed publications. She
has guided many scholars for their Ph.D. degrees in computer science. She is associated with many

professional bodies like IEEE, CSI, ISTE, OITS, OMS, IACSIT, IMS and IAENG. She is on the
editorial board of several international journals and is also a reviewer for many international journals.
Her current research interests include Machine Learning, Distributed Data Mining, Bioinformat-
ics, Intelligent Agent, Web Data Mining, Recommender System and Image Processing.
Disambiguation Model for Bio-Medical
Named Entity Recognition

A. Kumar

Abstract Discovery of biomedical named entities is one of the preliminary steps
of many biomedical text mining tasks. Typical entities in the biomedical domain
include diseases, chemicals, genes and proteins. Deep learning-based approaches
are currently applied to Biomedical Named Entity Recognition (Bio_NER) and give
prominent results. Although these approaches give satisfactory results, they
still require a tremendous amount of training data, and a lack of data can be a
barrier to Bio_NER performance. A further obstacle for Bio_NER is polysemy, or
entity misclassification: one biomedical entity may have different meanings in
different places, so that, for example, a gene name may be labelled as a disease
name. When a Conditional Random Field is combined with a deep learning-based
approach, i.e. Bidirectional Long Short Term Memory (Bi-LSTM), the model
mistakenly labels the gene entity “BRCA1” as a disease entity, because “BRCA1
abnormality” or “Braca1-deficient” is present in the training dataset.
Similarly, “VHL (Von Hippel-Lindau disease)”, which is a gene name, is labelled
as a disease by the Bi-LSTM CRF model. One more problem is addressed in this
chapter: in the biomedical domain, entities are long and complex, such as the
cell line described by “A375M (B-Raf (V600E)) is a human melanoma cell line”.
Although this entity consists of multiple words, it is still difficult to find
its context information. To address the lack of data and the entity
misclassification problem, this chapter combines multiple Bio_NER models. In the
proposed model, models trained on different datasets are connected so that the
target model obtains information from the other models, which reduces the
false-positive rate. A Recurrent Neural Network (RNN) based on Bi-LSTM gates is
introduced to handle the long and complex range dependencies in biomedical
entities. The BioCreative II GM corpus, PubMed, the Gold-standard dataset and
the JNLPBA dataset are used in this research work.

A. Kumar (✉)
Department of Computer Science and Engineering, National Institute of Technology Raipur,
Raipur, Chhattisgarh 492010, India
e-mail: [email protected]

© Springer Nature Switzerland AG 2020
S. Dash et al. (eds.), Deep Learning Techniques for Biomedical and Health Informatics,
Studies in Big Data 68, https://doi.org/10.1007/978-3-030-33966-1_3

Keywords Information extraction · Bio-Medical Named Entity Recognition
(Bio-NER) · Conditional Random Field (CRF) · Deep learning · Machine
Learning (ML) · Long Short Term Memory Network (LSTM) · Text mining

Abbreviations

CRF       Conditional random field
LSTM      Long short term memory
BILSTM    Bidirectional long short term memory
BioNER    Biomedical named entity recognition
NER       Named entity recognition
MTM       Multi task model
WE        Word embedding
CE        Character embedding

1 Introduction

As the internet grows day by day, the volume of biomedical text data is also
increasing, and a strong technique is required to access the meaningful and
important information it contains. This chapter uses Named Entity Recognition
(NER), which is an information extraction (IE) technique and a part of text
mining. NER is required to label the entities in a text dataset: it
automatically recognizes named entities in natural language text in the domain
of interest. In biomedical text, many entity types need to be labelled, such as
gene names, protein names, diseases, chemicals and medications. In the recent
past, most researchers have focused on extracting protein and gene entities,
whereas some research addresses disease entity extraction. Biomedical NER is
more difficult than general NER because biomedical named entities have a variety
of aliases and abbreviations and varied naming conventions, and because the same
protein or gene name may refer to different biological entities across
organisms. For example, the biomedical entity p53 refers to a protein name in
one context, while p53 can also refer to a protein with a molecular weight of
53 kDa. To tackle this type of problem, different NER approaches have been
applied to biomedical text, namely the rule-based, dictionary-based and machine
learning-based approaches. Thousands of biomedical papers are published in
thousands of journals every day, which gives rise to new terms and to spelling
variations of existing biomedical words. The rule-based and dictionary-based
approaches are not suitable here because of their limited predictive power.
This is where the ML-based approach comes in. The ML-based approach is more
reliable and robust for biomedical NER because it can handle high-dimensional
feature vectors for text processing and can predict new terms or variations
from learned patterns. Training requires a reliable, high-performance NER model
that can fully capture the words in their context. Biomedical NER systems have
been developed using various features: linguistic features of the word such as
lemmatization and stemming; morphological features such as prefixes, suffixes,
word shape and character weight; orthographic features such as word formation,
symbols and digits; and contextual features such as word windows and
conjunctions. A binary encoding of these feature sets is used as input to train
the ML algorithm of the NER model on the named entity annotations in the
training dataset.
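
The kinds of features listed above, and their binary encoding, can be sketched as follows. The specific feature templates (three-character prefixes and suffixes, a one-word context window, the shape alphabet) are assumptions chosen for illustration; the chapter does not specify them.

```python
import re


def word_shape(word):
    """Orthographic shape: uppercase -> A, lowercase -> a, digit -> 0."""
    shape = re.sub(r"[A-Z]", "A", word)
    shape = re.sub(r"[a-z]", "a", shape)
    return re.sub(r"[0-9]", "0", shape)


def token_features(tokens, i, window=1):
    """Morphological, orthographic and contextual features of token i."""
    word = tokens[i]
    features = {
        "lower=" + word.lower(),
        "shape=" + word_shape(word),
        "prefix3=" + word[:3],
        "suffix3=" + word[-3:],
    }
    if any(ch.isdigit() for ch in word):
        features.add("has_digit")
    if "-" in word:
        features.add("has_hyphen")
    for offset in range(-window, window + 1):
        if offset != 0 and 0 <= i + offset < len(tokens):
            features.add(f"ctx[{offset}]={tokens[i + offset].lower()}")
    return features


def binary_encode(features, feature_index):
    """Binary vector with a 1 for every active feature."""
    vector = [0] * len(feature_index)
    for name in features:
        if name in feature_index:
            vector[feature_index[name]] = 1
    return vector


tokens = ["The", "p53", "protein", "binds", "DNA"]
feats = token_features(tokens, 1)
index = {name: pos for pos, name in enumerate(sorted(feats))}
print(sorted(feats))
print(binary_encode(feats, index))
```

In a real system the feature index would be built from the whole training set, so most entries of each vector would be zero.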
In recent years most researchers have worked on a single domain, such as
protein, gene, disease or chemical names, and none of the work described in the
literature covers all four datasets together. This chapter considers multiple
domains (protein, gene, disease and chemical names) together, so that the
correct entity can be recognized and labelled automatically in a given text.
The main goal of this research is to handle polysemous words, which are the
main cause of lower recall. The model presented in this chapter handles all
four types of domain dataset in a single combined model.

1.1 Rule-Based Approach

The rule-based approach deals with orthographic and morphological structures,
and it performs better than the dictionary-based approach. In this approach, a
character string is used to identify a core term, and rules and handcrafted
patterns are then used to concatenate the adjacent words of a named entity. The
drawback of the rule-based approach is that it depends heavily on
domain-specific named entities that share common morphological and orthographic
characteristics. Because it depends on handcrafted features and is unsuitable
for a new domain or naming convention, it is usually replaced by the other
approaches.
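
A toy version of the procedure just described might look like the sketch below. Both the core-term pattern and the list of head nouns are hypothetical rules written for this example; real rule-based systems rely on much larger handcrafted rule sets.

```python
import re

# Hypothetical core-term rule: a token mixing letters and digits, e.g. "p53".
CORE_TERM = re.compile(r"^(?=.*[A-Za-z])(?=.*\d)[A-Za-z0-9-]+$")
# Hypothetical handcrafted list of head nouns that may extend a core term.
HEAD_NOUNS = {"protein", "gene", "receptor", "kinase"}


def rule_based_entities(tokens):
    """Return (start, end) token spans, end exclusive."""
    spans = []
    i = 0
    while i < len(tokens):
        if CORE_TERM.match(tokens[i]):
            end = i + 1
            # Concatenate adjacent words that the patterns allow.
            while end < len(tokens) and tokens[end].lower() in HEAD_NOUNS:
                end += 1
            spans.append((i, end))
            i = end
        else:
            i += 1
    return spans


tokens = ["The", "p53", "protein", "activates", "the", "IL-2", "gene", "."]
print([" ".join(tokens[s:e]) for s, e in rule_based_entities(tokens)])
# ['p53 protein', 'IL-2 gene']
```

The example also shows the weakness noted above: a name that does not fit the handcrafted pattern, such as a purely alphabetic gene symbol, is missed.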

1.2 Dictionary-Based Approach

The dictionary-based approach finds all named entities in a given text by
looking them up in a dictionary, and various terminologies have been applied to
biomedical text mining. For instance, HUGO is a terminology that provides
21,000 human gene entities. The Swiss-Prot section of the UniProt database,
which contains 180,000 protein records, has been used frequently. BioThesaurus
is a compilation of several million gene and protein names mapped to UniProt
entries through cross-references in the iProClass database. The significant
advantage of the dictionary-based approach over the machine learning-based
approach is that each entry is built with an external identifier, which
provides metadata for annotating the extracted names. However, this approach
faces various challenges, including false positives caused by ambiguous names
and false negatives caused by spelling variations and synonyms. It also depends
on the creation and curation of a lexicon for the particular domain, which may
contain millions of entities. To address spelling variation, Tsuruoka et al.
use a variant generator and a string searching method, achieving an improved
F-score on the GENIA corpus compared with exact matching [1, 2].
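
Dictionary lookup with a simple variant generator can be sketched as follows. This is not the method of Tsuruoka et al.: the variant rules (case folding and hyphen or space variation), the greedy longest-match strategy and the placeholder identifiers are all assumptions made for illustration.

```python
def variants(name):
    """Hypothetical variant generator: case and hyphen/space variations."""
    base = name.lower()
    return {base, base.replace("-", " "), base.replace("-", ""), base.replace(" ", "-")}


def build_lexicon(entries):
    """Map every variant (as a token tuple) to (identifier, category)."""
    lexicon = {}
    for name, identifier, category in entries:
        for variant in variants(name):
            lexicon[tuple(variant.split())] = (identifier, category)
    return lexicon


def dictionary_match(tokens, lexicon):
    """Greedy longest match of token n-grams against the lexicon."""
    longest = max(len(key) for key in lexicon)
    lowered = [t.lower() for t in tokens]
    matches, i = [], 0
    while i < len(tokens):
        for n in range(min(longest, len(tokens) - i), 0, -1):
            key = tuple(lowered[i:i + n])
            if key in lexicon:
                matches.append((i, i + n) + lexicon[key])
                i += n
                break
        else:
            i += 1  # no entry starts at this token
    return matches


# Identifiers below are placeholders, not real database accessions.
entries = [("NF-kappa B", "ID:0001", "Protein"), ("IL-2", "ID:0002", "Protein")]
lexicon = build_lexicon(entries)
text = ["Activation", "of", "NF", "kappa", "B", "and", "IL2"]
print(dictionary_match(text, lexicon))
# [(2, 5, 'ID:0001', 'Protein'), (6, 7, 'ID:0002', 'Protein')]
```

Each match carries the external identifier of the lexicon entry, which is the metadata advantage mentioned above; exact matching alone would have missed both spellings in this example.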

1.3 Machine Learning Based Approach

The machine learning-based approach is one of the best and most frequently used
approaches in text mining. The best performance in the BioCreative II
gene/protein task was achieved with a machine learning-based approach.
Different supervised learning methods, such as Support Vector Machines (SVM)
[3], Hidden Markov Models (HMM) [2], CRF [4], MEMMs [5] and case-based methods
[6], have been used in named entity recognition. Supervised learning methods
use only annotated text corpora. To resolve the data sparseness issue that
arises when a large feature set is used with a small training dataset, a few
semi-supervised learning methods that exploit large unannotated text corpora
have recently been used. A vital part of the ML approach is the appropriate
selection of the feature set used to represent a named entity. Commonly used
features are morphological patterns, part-of-speech (POS) tags, orthographic
word patterns, tokenization, lemmatization and conjunctions of contextual
features.
In recent, the importance of deep learning-based methods is demonstrated by the
various studies. The ability of Recurrent Neural Network (RNN) is shown by a Sahu
and Anand [7] for biomedical text named entity recognition. The model proposed
by Sahu and Anand is the combination of Conditional Random Field (CRF) with
Bi-directional Long Short-Term Memory (Bi-LSTM), used character level (Cl) and
word level (WL) embedding but they did not describe the benefits of CL and WL
embedding with Bi-LSTM-CRF model. Habib et al., [8], merged the Bi-LSTM-CRF
model Lample et al., [9] with word embedding of Pyysalo et al., [10]. Habibi used CL
based word embedding for capturing characteristics like an orthographic feature of
bio-medical entities. Habibi et al., illustrate the potentiality of character-level word
embedding in Biomedical Named entity recognition. Although the given models
showed the prominent result, still a very challenging task in the area of biomedical
named entity recognition remains. First, to deal with a small amount of training data,
which is available for Biomed NER task. A Gold Standard datasets are consist of only
one or two types of annotation of the entity. NCBI corpora [11] contain only diseases
annotation only, and this corpus does not contain any other types of an entity like
Disambiguation Model for Bio-Medical Named Entity Recognition 45

genes and proteins, whereas the JNLPBA corpus [12] contains annotations of genes and proteins only. Therefore, only a small amount of annotated data is available for each entity type.
Multitask learning (MTL) trains a single model for multiple tasks at the same time. MTL can leverage distinct datasets collected for different but related tasks [13]. Although extracting gene entities is quite different from extracting chemical entities, both tasks require learning some common features that help capture the linguistic expressions of biomedical text. Crichton et al. [14] developed a multitask learning model trained on various datasets that contain annotations of different entity types. The MTL model proposed by Wang et al. [15] performs better than other state-of-the-art single-task named entity recognition models. This literature inspired the proposed model, which is a combination of multiple models. Unlike conventional multitask learning methods, which use a single model for all tasks, the proposed model uses a separate model for each task, and each model is trained on the annotated dataset for one particular entity type so that it becomes an expert for that type. The major drawback of multitask learning methods is that they produce high recall but low precision. MTL-based models are trained on multiple entity types and therefore have a larger training dataset, so their coverage of biomedical entities is broader, which results in higher recall. On the other hand, because MTL-based models are trained on a combination of entity types, they have difficulty differentiating among those types, which results in lower precision.
Another reason why named entity recognition is difficult in the biomedical domain is that the same mention can be labeled as a different entity type depending on the textual context. In this chapter, it is observed that many false predictions are due to this polysemy problem. For example, a word can be used as both a disease name and a gene name, and a model designed to label disease entities may mistakenly label a gene as a disease, which raises the false positive rate. For instance, BI-LSTM-CRF models for labeling disease entities incorrectly label the gene name “BRCA1” as a disease entity because disease names such as “BRCA1 abnormalities” or “brca1 deficient” exist in the training dataset. Likewise, the training data annotates “VHL” (Von Hippel-Lindau disease) as a disease entity, which confuses the model because “VHL” is also the name of the gene whose mutation leads to the disease. To reduce the false positives that arise from polysemous words, a model is proposed in which the disease model uses the outputs of the chemical and gene models. Once the gene model predicts “BRCA1” as a gene, it informs the disease model, so that the disease model does not predict it as a disease. In the proposed model, each model is first trained individually on its own entity type and then trained further with the outputs of the other models.
The remainder of the chapter is organized as follows: Sect. 2 describes the basic concepts of the Conditional Random Field (CRF), Long Short Term Memory (LSTM) and Bi-directional Long Short Term Memory (BILSTM), which are used in the field of deep learning; Sect. 3 presents the proposed methodology using biomedical datasets;
46 A. Kumar

Sect. 4 describes the datasets and the evaluation metrics; Sect. 5 compares the proposed model with the existing multi-task model (MTM) for biomedical named entity recognition; and finally, Sect. 6 concludes the chapter and outlines future work.

2 Background

This section briefly introduces the deep learning techniques that have been applied to biomedical named entity recognition: the CRF model, the LSTM network, and the BILSTM-CRF network.

2.1 Deep Learning Technique

Deep learning is a family of artificial neural network techniques and a subfield of machine learning. In deep learning, multiple layers are used to extract higher-level features from the input data. Long Short Term Memory (LSTM) is used in deep learning and is a type of recurrent neural network. Unlike feedforward neural networks, an LSTM also contains feedback connections. An LSTM can process single data points as well as entire sequences of data, such as video or speech. An LSTM unit contains a cell, an input gate, an output gate, and a forget gate. The cell remembers values, and the three gates regulate the flow of information.

2.1.1 CRF Model

A Conditional Random Field (CRF) is a probabilistic graphical model that is generally used for sequence tagging tasks such as named entity recognition, object recognition, and part-of-speech (POS) tagging. A CRF is a conditionally trained model that can work with a huge number of non-independent features. Unlike a discrete classifier, which labels each example in isolation, a CRF takes neighboring examples into account.

2.1.2 LSTM Network

Long Short Term Memory (LSTM) is a Recurrent Neural Network (RNN) architecture that efficiently handles variable-length inputs. Research has shown that RNNs are useful in various NLP tasks such as speech recognition, language modeling, and machine translation [16, 17], and RNN-based LSTM variants are the most widely used [18]. The proposed model uses the LSTM framework of Graves et al. [16]. Given the output of the embedding layer, the hidden states are calculated with the following equations.

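$$i_t = \sigma(W_{xi} x_t + W_{hi} h_{t-1} + b_i) \tag{1}$$

$$f_t = \sigma(W_{xf} x_t + W_{hf} h_{t-1} + b_f) \tag{2}$$

$$c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c) \tag{3}$$

$$o_t = \sigma(W_{xo} x_t + W_{ho} h_{t-1} + W_{co} c_t + b_o) \tag{4}$$

$$h_t = o_t \odot \tanh(c_t) \tag{5}$$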

where tanh and σ denote the hyperbolic tangent and logistic sigmoid functions, respectively, and ⊙ denotes the element-wise product. A forward LSTM extracts a representation of the input in the forward direction, and a backward LSTM extracts one in the backward direction. The hidden state is formed by concatenating the forward and backward LSTM outputs, as proposed by Schuster and Paliwal [19]; this construction is frequently used in sequence encoding tasks.
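The recurrence in Eqs. (1)–(5) and the forward/backward concatenation can be sketched in a few lines of NumPy. This is a minimal illustration rather than the chapter's implementation: the parameter dictionary, the helper names (`init_params`, `run_lstm`, `bilstm`), the random initialization, and the use of a full matrix for `W_co` are all assumptions made for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step following Eqs. (1)-(5)."""
    i_t = sigmoid(p["W_xi"] @ x_t + p["W_hi"] @ h_prev + p["b_i"])            # Eq. (1)
    f_t = sigmoid(p["W_xf"] @ x_t + p["W_hf"] @ h_prev + p["b_f"])            # Eq. (2)
    c_t = f_t * c_prev + i_t * np.tanh(
        p["W_xc"] @ x_t + p["W_hc"] @ h_prev + p["b_c"])                      # Eq. (3)
    o_t = sigmoid(p["W_xo"] @ x_t + p["W_ho"] @ h_prev
                  + p["W_co"] @ c_t + p["b_o"])                               # Eq. (4)
    h_t = o_t * np.tanh(c_t)                                                  # Eq. (5)
    return h_t, c_t

def init_params(d_in, d_hid, rng):
    """Hypothetical small random initialization, for illustration only."""
    p = {}
    for g in "ifco":
        p[f"W_x{g}"] = rng.normal(scale=0.1, size=(d_hid, d_in))
        p[f"W_h{g}"] = rng.normal(scale=0.1, size=(d_hid, d_hid))
        p[f"b_{g}"] = np.zeros(d_hid)
    # Cell-to-output-gate weights of Eq. (4); a full matrix is assumed here.
    p["W_co"] = rng.normal(scale=0.1, size=(d_hid, d_hid))
    return p

def run_lstm(xs, p, d_hid):
    """Run the LSTM over a sequence xs of shape (T, d_in); returns (T, d_hid)."""
    h, c = np.zeros(d_hid), np.zeros(d_hid)
    states = []
    for x_t in xs:
        h, c = lstm_step(x_t, h, c, p)
        states.append(h)
    return np.stack(states)

def bilstm(xs, p_fwd, p_bwd, d_hid):
    """Concatenate forward and (re-aligned) backward hidden states per token."""
    fwd = run_lstm(xs, p_fwd, d_hid)
    bwd = run_lstm(xs[::-1], p_bwd, d_hid)[::-1]
    return np.concatenate([fwd, bwd], axis=1)

rng = np.random.default_rng(0)
tokens = rng.normal(size=(5, 4))   # 5 tokens with 4-dimensional embeddings
hidden = bilstm(tokens, init_params(4, 3, rng), init_params(4, 3, rng), d_hid=3)
print(hidden.shape)
```

Each token ends up with a hidden vector of size 2 × `d_hid`, one half summarizing the left context and the other the right context.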

2.1.3 BILSTM-CRF Network

The Bi-directional Long Short Term Memory (BILSTM) network handles backward dependencies and long-term dependencies, and modeling the dependencies between adjacent output tags enhances the performance of sequence labeling models [15]. For this purpose, a Conditional Random Field (CRF) is applied just after the output layer of the BI-LSTM to capture the dependencies between tags. The BILSTM-CRF network architecture is shown in Fig. 1. In the input layer, words are taken in the form of tokens, and these tokens are then passed through the BILSTM layers. The output of the BILSTM layer goes to the CRF

Fig. 1 BILSTM-CRF network model architecture (bottom to top: input layer with Word2Vec representation; bidirectional LSTM layer with forward and backward LSTMs; output layer with the CRF algorithm)



model, where the CRF model tags the input token sequence according to the tagging scheme.
The probability of each label given the sequence S = w1, …, wn is calculated by the following equations.

$$z_t = W_y h_t^{bi} + b_y \tag{6}$$

$$p(y_t \mid w_1, \ldots, w_n; \Theta) = \mathrm{softmax}(z_t) \tag{7}$$

$$\mathrm{softmax}(a_j) = \frac{\exp(a_j)}{\sum_k \exp(a_k)}$$

where W_y and b_y in Eq. (6) are the parameters of the fully connected layer for the BIO tagging scheme, and the softmax(·) function is applied to calculate the probability of each tag. Based on the probability p from Eq. (7), the training objective is to minimize the following losses.


$$L_{LSTM} = -\sum_{t=1}^{N} \log p(y_t \mid w_1, \ldots, w_n; \Theta) \tag{8}$$

$$L_{CRF} = -\sum_{t=1}^{T} \left( A_{y_{t-1}, y_t} + z_{t, y_t} \right) \tag{9}$$

$$Loss = L_{LSTM} + L_{CRF} \tag{10}$$

where L_LSTM is the cross-entropy loss for the labels y_t and L_CRF is the negative sentence-level log likelihood. A_{y_{t-1}, y_t} and z_{t, y_t} are the transition and emission scores, respectively, and their sum gives the tag score.
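The quantities in Eqs. (6)–(10) can be computed directly, as in the following NumPy sketch. It follows the equations as printed: Eq. (9) is taken literally as the negative sum of transition and emission scores along the gold tag path, whereas a full CRF likelihood would normally also include a normalization term over all tag sequences. The function names, the absence of a start-tag transition, and the toy inputs are assumptions made for illustration.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax; subtracting the row maximum avoids overflow."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def emission_scores(h_bi, W_y, b_y):
    """Eq. (6): z_t = W_y h_t^bi + b_y for every token; returns (T, n_tags)."""
    return h_bi @ W_y.T + b_y

def lstm_loss(z, tags):
    """Eq. (8): cross-entropy of the gold tags under the softmax of Eq. (7)."""
    tags = np.asarray(tags)
    probs = softmax(z)
    return -np.sum(np.log(probs[np.arange(len(tags)), tags]))

def crf_loss(z, tags, A):
    """Eq. (9) as printed: negative score of the gold tag path.

    Assumption: no start tag, so transitions are counted from the second token.
    """
    tags = np.asarray(tags)
    emission = z[np.arange(len(tags)), tags].sum()
    transition = A[tags[:-1], tags[1:]].sum()
    return -(emission + transition)

def total_loss(z, tags, A):
    """Eq. (10): sum of the two losses."""
    return lstm_loss(z, tags) + crf_loss(z, tags, A)

# Toy example: 3 tokens, 2 tags, hypothetical scores.
z = emission_scores(np.ones((3, 4)), np.zeros((2, 4)), np.array([0.0, 0.0]))
A = np.array([[1.0, 2.0], [3.0, 4.0]])   # A[i, j]: score of moving from tag i to tag j
print(total_loss(z, [0, 1, 1], A))
```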

3 Methodology

This section describes the architecture of the proposed model. A combination of multiple datasets, namely NCBI [11], BC5CDR [20], JNLPBA [21] and BC4CHEMD [22], is used as the input dataset. Fig. 2 shows the architecture of the proposed model. The following steps describe the proposed model.
1. All the biomedical datasets are first combined and sent to the individual models.
2. Each model is trained on the data for its own bio-entity type, and its output is sent to the max pooling function.
3. Max pooling progressively reduces the size of the representation, which reduces the number of parameters and the amount of computation in the network. The pooling layer operates on each feature map independently.

Fig. 2 Architecture of the proposed model

4. The activation function introduces nonlinearity into the output of the neuron. The result is then sent to the target model.
5. The proposed model is again combined with a Conditional Random Field (CRF) to give the sequentially tagged output.
6. The target output is the annotated, tagged dataset.
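Steps 3 and 4 above can be illustrated with a small NumPy sketch. The chapter does not state the pooling window or which activation function is used, so the non-overlapping window of size 2 and the ReLU activation below are assumptions chosen only to show the mechanics.

```python
import numpy as np

def max_pool_1d(feature_maps, window=2):
    """Max pooling over non-overlapping windows.

    feature_maps has shape (channels, length); each feature map (row) is
    pooled independently, and trailing positions that do not fill a whole
    window are dropped.
    """
    channels, length = feature_maps.shape
    n_out = length // window
    trimmed = feature_maps[:, : n_out * window]
    return trimmed.reshape(channels, n_out, window).max(axis=2)

def relu(x):
    """Assumed activation: introduces nonlinearity by zeroing negative values."""
    return np.maximum(x, 0.0)

feature_maps = np.array([[1.0, -3.0, 2.0, 5.0],
                         [-4.0, -1.0, 0.0, -2.0]])
pooled = max_pool_1d(feature_maps)   # each row shrinks from 4 values to 2
activated = relu(pooled)
print(pooled)
print(activated)
```

Pooling halves the length of each feature map here, which is the sense in which it reduces the size of the representation passed on to later layers.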
This chapter adopts a deep learning approach. Deep learning-based methods require a very large amount of data, and biomedical datasets are large enough to fulfill this requirement.
The advantage of using deep learning-based methods in biomedical named entity recognition is that they reduce the probability of error, as discussed later in the evaluation section.

Table 1 Biomedical dataset description

S. no.  Corpus             Entity type   #annotations  #sentences  Data size
1.      NCBI-disease [11]  Disease       6881          7639        793 abstracts
2.      JNLPBA [21]        Gene/protein  35,336        22,562      2404 abstracts
3.      BC5CDR [14]        Disease       12,852        14,228      1500 articles
4.      BC4CHEMD [15]      Chemical      84,310        86,679      10,000 abstracts

4 Evaluation

4.1 Dataset Description

Four biomedical datasets are considered for the experiments: NCBI [11], BC5CDR [20], JNLPBA [21] and BC4CHEMD [22]. All four datasets were collected by Crichton et al. [14]. They are constructed from MEDLINE abstracts [23], and each dataset focuses on one of three biomedical entity types: gene or protein, disease, and chemical. The cell type entity tags from JNLPBA are not considered in this research. All datasets consist of input sentences containing biomedical entities. JNLPBA contains training and test sets, while the remaining three contain development, training, and test sets. For JNLPBA, a small part of the training set, approximately equal in size to the test set, is used as the development set. The JNLPBA dataset from Crichton et al. [14] contains split sentences, so this chapter uses the original dataset developed by Kim et al. [20], which has more accurate sentence separation. The datasets are described in Table 1.

4.2 Evaluation Metric

To evaluate the performance of the biomedical named entity recognition task, standard Information Extraction (IE) metrics are used. Precision (P), recall (R) and F-score or F-measure (F1) are defined in Eqs. (11), (12) and (13) as follows:

$$\mathrm{Precision}(P) = \frac{TP}{TP + FP} \tag{11}$$

$$\mathrm{Recall}(R) = \frac{TP}{TP + FN} \tag{12}$$

$$F\text{-}measure(F_1) = \frac{2 \times P \times R}{P + R} \tag{13}$$

where:
• TP (true positives) = number of correctly predicted entities in the sequence.
• TP + FP (FP: false positives) = total number of predicted entities in the sequence.
• TP + FN (FN: false negatives) = total number of ground-truth entities in the sequence.
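The definitions in Eqs. (11)–(13) translate into a short entity-level scorer. The sketch below assumes exact matching, where a predicted entity counts as correct only if its span and its type both agree with the ground truth; the tuple layout `(start, end, entity_type)` and the toy annotations are hypothetical.

```python
def precision_recall_f1(tp, fp, fn):
    """Eqs. (11)-(13) from raw counts; an empty denominator yields 0.0 by convention."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1

def entity_counts(gold, predicted):
    """Count TP, FP and FN over sets of (start, end, entity_type) tuples."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)
    return tp, len(predicted) - tp, len(gold) - tp

gold = {(0, 1, "Disease"), (4, 5, "Gene"), (7, 9, "Chemical")}
pred = {(0, 1, "Disease"), (4, 5, "Disease"), (7, 9, "Chemical"), (11, 12, "Gene")}

tp, fp, fn = entity_counts(gold, pred)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
print(tp, fp, fn, precision, recall, f1)
```

In this toy example the gene at positions 4–5 is predicted with the wrong type, so it counts once as a false positive and once as a false negative, which is how a bio-entity error lowers both precision and recall.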

4.3 Post Processing and Parameters Setting

A post-processing step is applied to correct invalid BIOES sequences. This step increases precision by approximately 0.1–0.5% and the F1 score by about 0.04–0.3%. Precision, recall, and F1 scores are used to evaluate the performance of the models.
The AdaGrad optimizer [24] is used, with an initial learning rate of 0.01 that is exponentially decayed by a factor of 0.95 each epoch. The dimension of the character-level embedding (d_char) is 30, and the dimension of the character-level word embedding (d_clwe) is 200 × 3. Both the forward and backward LSTMs have 300 hidden units. Dropout [25] is applied to two parts of the proposed model: the outputs of the CLWE (0.5) and of the BILSTM (0.3). The mini-batch size is 10. The parameter settings are mostly the same as those of Wang et al. [3]; only a few settings, such as the dropout rate, differ. The parameters were tuned on the validation sets only.
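The learning-rate schedule described above amounts to multiplying the initial rate by 0.95 once per epoch. A minimal sketch follows, assuming epochs are counted from zero and that the decay is applied once at the start of each new epoch; the function name is hypothetical.

```python
def decayed_learning_rate(epoch, initial_lr=0.01, decay=0.95):
    """Exponentially decayed learning rate: initial_lr * decay ** epoch."""
    return initial_lr * decay ** epoch

for epoch in range(3):
    print(epoch, decayed_learning_rate(epoch))
```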

5 Result and Discussion

Table 2 compares the experimental results of the multitask-learning model and the proposed model. Wang et al. [26] used the BC5CDR-disease dataset in an experiment. Wang's model was tested repeatedly on the BC5CDR-disease dataset

Table 2 Performance comparison

                           Proposed model                 Wang et al. [26]
S. no.  Datasets           Precision  Recall  F1 score    Precision  Recall  F1 score
1.      NCBI-disease       84.48      87.27   86.36       85.86      86.42   86.14
2.      JNLPBA             74.43      83.22   78.58       70.91      76.34   73.52
3.      BC5CDR-disease     85.61      82.61   84.08       *83.73     *82.93  *83.33
4.      BC4CHEMD           90.78      87.01   88.85       91.30      87.53   89.37
5.      Average            84.07      85.03   84.47       82.95      83.30   83.09
* Result of an experiment conducted for this comparison, not taken from the original paper

to compare it with the other models. The results of these repeated runs are marked with an asterisk in the table. The proposed model was run ten times with ten different initializations, and the arithmetic mean over the runs was taken for each of the four datasets to evaluate the performance of each model.
As shown in Table 2, the proposed model achieves higher precision, recall, and F1 score than the Multi Task Model (MTM) of Wang et al. [15] when averaged over the four datasets. The gains are not uniform: the MTM scores higher on BC4CHEMD, in precision on NCBI-disease, and in recall on BC5CDR-disease. The proposed model consists of an expert model trained for each entity type, which further enhances biomedical named entity recognition performance.
Compared with the baseline models, the proposed model achieves higher precision on average. Even though recall increases only slightly, an increase in precision is more valuable than an increase in recall for practical use of a bio-NER model. Important information is very likely to be repeated in a large corpus, so an entity missed in one place can be recovered in another and does not harm the performance of the overall system. False information and error propagation, however, can affect the entire system.
Recognizing a biomedical entity as a different bio-entity type is called a bio-entity error. For instance, recognizing the gene ‘VHL’ as a disease in a sentence is a bio-entity error. Interestingly, bio-entity errors generally occur when bio-entities are easily confused or when an entity name has multiple meanings (e.g., BRCA1). On the four datasets (BC5CDR-disease, BC4CHEMD, JNLPBA, NCBI), the MTM produces 4334 errors, whereas the proposed model produces 3966, which is 368 fewer. The proposed model thus shows the best performance in the error analysis.
Error analysis of the STM, a single LSTM-CRF model, shows many bio-entity classification errors on JNLPBA; they account for 49.3% of its total errors on JNLPBA. For the MTM, bio-entity errors account for 1333 of its 4334 errors, i.e., about 30.8%. Bio-entity errors therefore make up a large share of the errors, alongside span errors, the most common error type, which account for 38% of the errors. While most span errors stem from subjective annotations or can easily be fixed by non-experts, bio-entity errors are difficult to detect, even for biomedical researchers. Moreover, for biomedical text mining methods such as drug-drug interaction extraction, span errors cause only minor problems, whereas bio-entity errors can lead to entirely different results.

In the proposed model, every expert model is trained on a dataset for a single entity type, and the output of each trained expert is concatenated with the word embedding. The other expert models share their knowledge with the target model, as shown in Fig. 2, so that bio-entity type errors are reduced. Consistent with the results in Table 2, only 736 of the proposed model's errors are bio-entity errors, which is 18.6% of all its errors.
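The idea of concatenating expert outputs with the word embedding can be sketched as follows. The chapter does not specify how the expert outputs are encoded, so the one-hot encoding of each expert's predicted tag, the five-tag (BIOES) label set, and the function names are assumptions made purely for illustration.

```python
import numpy as np

def one_hot(tag_ids, n_tags):
    """Encode a sequence of predicted tag indices as one-hot rows."""
    tag_ids = np.asarray(tag_ids)
    encoded = np.zeros((len(tag_ids), n_tags))
    encoded[np.arange(len(tag_ids)), tag_ids] = 1.0
    return encoded

def build_target_input(word_embeddings, other_expert_tags, n_tags):
    """Concatenate each token's embedding with the other experts' predictions.

    word_embeddings: array of shape (T, d_word)
    other_expert_tags: list of tag-index sequences, one per other expert model
    """
    parts = [word_embeddings] + [one_hot(t, n_tags) for t in other_expert_tags]
    return np.concatenate(parts, axis=1)

# Hypothetical 3-token sentence with 4-dimensional word embeddings.
word_embeddings = np.ones((3, 4))
chemical_tags = [0, 0, 1]   # hypothetical output of the chemical expert
gene_tags = [0, 4, 0]       # hypothetical output of the gene expert (tag 4 on token 1)

disease_input = build_target_input(word_embeddings, [chemical_tags, gene_tags], n_tags=5)
print(disease_input.shape)
```

With such an input, the disease model can see that the gene expert has already claimed the second token, which gives it a signal not to label that token as a disease.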

6 Conclusions

This chapter introduced the proposed model, which consists of multiple bidirectional LSTM-CRF (BILSTM-CRF) models for the recognition of biomedical entities. Most state-of-the-art methods can handle only a single type of entity. The proposed model can handle multiple datasets while achieving higher F1 scores. Unlike multi-task models, the proposed model uses several single-task NER models that relay information to one another to achieve higher precision. To improve on the performance of multi-task models, the proposed model disambiguates biomedical entities that are polysemous or that share the same orthographic features. The results show that the proposed model outperforms the related work on average over four BioNER datasets in terms of precision, recall, and F1 score. Although the proposed model incurs some computational overhead, this cost is outweighed by its more accurate results. In future work, the proposed model will be applied to a geospatial dataset.

Acknowledgements The authors would like to thank the National Institute of Technology Raipur
for providing necessary infrastructure and facility for doing research.

References

1. Zhong, H., Hu, X.: Disease named entity recognition by machine learning using semantic type
of metathesaurus. Int. J. Mach. Learn. Comput. 3(6), 494–498 (2014)
2. Collier, N., Nobata, C., Tsujii, J.: Extracting the names of genes and gene products with a hidden
Markov model, vol. 1. In: Proceedings of the 18th Conference on Computational Linguistics,
pp. 201–207 (2000)
3. Zhou, G.D.: Recognizing names in biomedical texts using mutual information independence
model and SVM plus sigmoid. Int. J. Med. Inf. 75(6), 456–467 (2006)
4. Lafferty, J., McCallum, A., Pereira, F.C.N.: Conditional Random Fields, pp. 282–289
(2001)
5. McCallum, A., Freitag, D., Pereira, F.C.N.: Maximum entropy markov models for information
extraction and segmentation. In: Proceedings of the Seventeenth International Conference on
Machine Learning, Morgan Kaufmann Publishers Inc, San Francisco, CA, USA, pp. 591–598
(2000)
6. Neves, M.L., Carazo, J.-M., Pascual-Montano, A.: Moara: A Java library for extracting and
normalizing gene and protein mentions. BMC Bioinf. 11(1), 157 (2010)
7. Sahu, S.K., Anand, A.: Recurrent neural network models for disease name recognition using
domain invariant features. ArXiv E-Prints. arXiv:1606.09371 (2016)

8. Habibi, M., Weber, L., Neves, M., Wiegandt, D.L., Leser, U.: Deep learning with word
embeddings improves biomedical named entity recognition. Bioinformatics (Oxford, England)
33(14), i37–i48 (2017)
9. Lample, G., Ballesteros, M., Subramanian, S., Kawakami, K., Dyer, C.: Neural architectures
for named entity recognition. ArXiv E-Prints. arXiv:1603.01360 (2016)
10. Pyysalo, S., Ginter, F., Moen, H., Salakoski, T., Ananiadou, S.: Distributional semantics
resources for biomedical text processing. In: Proceedings of the 5th Languages in Biology
and Medicine Conference (LBM’13), pp. 39–44 (2013)
11. Doğan, R.I., Leaman, R., Lu, Z.: NCBI disease corpus: a resource for disease name recognition
and concept normalization. J. Biomed. Inform. 47, 1–10 (2014)
12. Goulart, R.R.V., Strube de Lima, V.L., Xavier, C.C.: A systematic review of named entity
recognition in biomedical texts. J. Braz. Comput. Soc. 17(2), 103–116 (2011)
13. Caruana, R.: Multitask learning. Mach. Learn. 28(1), 41–75 (1997)
14. Crichton, G., Pyysalo, S., Chiu, B., Korhonen, A.: A neural network multi-task learning
approach to biomedical named entity recognition. BMC Bioinf. 18(1), 368 (2017)
15. Wang, X., Zhang, Y., Ren, X., Zhang, Y., Zitnik, M., Shang, J., Langlotz, C., Han, J.: Cross-type
biomedical named entity recognition with deep multi-task learning. ArXiv E-Prints.
arXiv:1801.09851 (2018)
16. Graves, A., Mohamed, A., Hinton, G.: Speech recognition with deep recurrent neural networks.
In: IEEE International Conference, Department of Computer Science, University of Toronto,
no. 3, pp. 6645–6649
17. Kim, Y., Jernite, Y., Sontag, D., Rush, A.M.: Character-aware neural language models. ArXiv
E-Prints. arXiv:1508.06615 (2015)
18. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780
(1997)
19. Song, M., Yu, H., Han, W.-S.: Developing a hybrid dictionary-based bio-entity recognition
technique. BMC Med. Inform. Decis. Mak. 15(1), S9 (2015)
20. Kim, J.-D., Ohta, T., Tsuruoka, Y., Tateisi, Y., Collier, N.: Introduction to the bio-entity recogni-
tion task at JNLPBA. In: Proceedings of the International Joint Workshop on Natural Language
Processing in Biomedicine and Its Applications, Association for Computational Linguistics,
Stroudsburg, PA, USA, pp. 70–75 (2004)
21. Kim, J., Ohta, T., Tsuruoka, Y., Tateisi, Y., Collier, N.: Introduction to the bio-entity recognition
task at JNLPBA, 70–75 (n.d.)
22. Krallinger, M., Rabal, O., Leitner, F., Vazquez, M., Salgado, D., Lu, Z., Leaman, R., Lu, Y.,
Ji, D., Lowe, D.M., Valencia, A.: The CHEMDNER corpus of chemicals and drugs and its
annotation principles. J. Cheminf. 7(Suppl 1 Text mining for chemistry and the CHEMDNER
track), S2–S2 (2015)
23. Ratinov, L., Roth, D.: Design challenges and misconceptions in named entity recognition.
In: Proceedings of the Thirteenth Conference on Computational Natural Language Learning,
Association for Computational Linguistics, Stroudsburg, PA, USA, pp. 147–155 (2009)
24. Campos, D., Matos, S., Oliveira, J.L.: Biomedical named entity recognition: a survey of
machine-learning tools. In: Sakurai, S. (ed.) Theory and Applications for Advanced Text Min-
ing (2012)
25. Fukuda, K., Tamura, A., Tsunoda, T., Takagi, T.: Toward information extraction: identifying
protein names from biological papers. In: Pacific Symposium on Biocomputing, pp. 707–718
(1998)
26. Kim, S., Chen, J.Y., Cutello, V., Lee, D.: DTMBIO 2016: The Tenth International Workshop on
Data and Text Mining in Biomedical Informatics. In: Proceedings of the 25th ACM International
on Conference on Information and Knowledge Management, pp. 2511–2512 (2016)

Ashutosh Kumar completed his Bachelor of Engineering (B.E.) in Computer Science and Engineering in 2014 at Rajiv Gandhi Proudyogiki Vishwavidyalaya, Bhopal. He received his Master of Technology (M.Tech) degree in Computer Science and Engineering in 2017 from the Central University of Rajasthan. Currently, he is a Ph.D. Research Scholar at the National Institute of Technology Raipur. His research interests include “Text mining for biomedical literature” and “Named Entity Recognition.”
Applications of Deep Learning
in Healthcare and Biomedicine

Shubham Mittal and Yasha Hasija

Abstract The advancements in medicine and healthcare over the past few decades have ushered us into a data-driven era in which a huge amount of data is collected and stored. With this change, there is a need for analytical and technological upgrading of existing systems and processes. The data collected is electronic health data taken from individuals or patients, which can be in the form of readings, text, speech or images. Machine learning, a means to artificial intelligence, is the study of models that computer systems use to learn from data, based on the weights of parameters, without being provided explicit instructions. In parallel with biomedical advancements in the past decade, the algorithms and tools of machine learning have been increasingly refined. Deep learning is one of the more promising of these algorithms. It is based on artificial neural networks and builds computational models composed of many processing layers in order to learn data representations with multiple levels of abstraction. Research suggests that deep learning might have benefits over previous machine learning algorithms, and its apparently better predictive performance is garnering significant attention. With its multiple levels of representation and results that in some tasks surpass human accuracy, deep learning has found particularly widespread application in health informatics and biomedicine. These applications are in molecular diagnostics, comprising pharmacogenomics and identification of pathogenic variants; in experimental data interpretation, comprising DNA sequencing and gene splicing; in protein structure classification and prediction; and in biomedical imaging, drug discovery, medical informatics and more. The aim of this chapter is to discuss these applications and to elaborate on how they are instrumental in improving healthcare and medicine in the modern context. Deep learning algorithms show improved potential for learning patterns and extracting attributes from complex datasets. We first introduce deep learning and developments in artificial neural networks, then discuss its applications in healthcare, and finally discuss its relevance in biomedical informatics

S. Mittal · Y. Hasija (B)


Delhi Technological University, Delhi, India
e-mail: [email protected]
S. Mittal
e-mail: [email protected]
© Springer Nature Switzerland AG 2020
S. Dash et al. (eds.), Deep Learning Techniques for Biomedical and Health Informatics,
Studies in Big Data 68, https://doi.org/10.1007/978-3-030-33966-1_4

and computational biology research in the public health domain. Finally, the future scope of deep learning algorithms is discussed from a modern healthcare perspective.

1 Introduction

In the last 10–15 years there has been drastic advancement in data acquisition technologies in the life sciences, together with improvements in computational biology and digital storage techniques, which has transformed modern biology from a data-poor science into a data-rich one. Owing to this development, research today is data-driven, and a biological problem can have multiple potential solutions, unlike before, when one question had one answer. Bioinformatics assists in handling these large datasets in different respects, be it storing, extracting or analyzing data. Data extraction is further handled by computational biology techniques using programming and algorithms. The set of methods used to discover meaningful relationships, patterns and functions in biological data is called ‘data mining’.
In the early 1990s, ‘soft computing’ became a popular area of study within computational science. The term covers all methodologies that provide flexible information processing capability for handling ambiguous real-life situations. While hard computing aims for precision, soft computing works with partial truth, ambiguity, inaccuracy and approximation to obtain a solution to a problem [1, 2]. Previously, only simple systems could be precisely analyzed and modelled by computational approaches, while the more complex systems of medicine, biology, management studies, the humanities and similar fields remained difficult to handle with conventional analytical and mathematical methods. Soft computing techniques complement one another, and they resemble biological processes more closely than traditional techniques, which mostly work on logic, such as predicate logic and sentential logic. The main constituents of soft computing are genetic algorithms, rough sets, fuzzy logic, neural networks, and signal processing tools such as wavelets. Of these, neural networks have wide scope for classifying and representing biological data computationally. Neural networks are robust and exhibit good learning and generalization abilities in data-rich environments [3]. The algorithms used in neural networks are machine learning algorithms.

1.1 Machine Learning

Finding answers to problems with computer programs that use data from past experience is known as learning [4]. Several machine learning algorithms have been proposed in the last few decades. With the vast amount of biological data being generated every day across the world, there is a need for computational techniques to analyze such data. Machine learning techniques provide this capability by extracting useful hidden relationships from large volumes of data. In essence, machine learning algorithms analyze the whole dataset, adjust their internal structure to the given data, and generate hidden layers that yield estimated models from which results can be studied [5]. Bioinformatics applications of machine learning algorithms face two major challenges today. The first is the small number of samples available as data, and the second is that each sample in the life sciences is characterized by thousands of features.
Machine learning algorithms have been categorized in the following manner:
Supervised learning
Model is trained on inputs and desired outputs with an aim to be able to precisely
predict future output which is unknown. Data used is labelled and the job is known as
‘regression’ when the output of target is a continuous variable and as ‘classification’
when the same is a group of discrete values.
Unsupervised learning/clustering
Unlike supervised learning, unsupervised learning uses unlabeled data. Clusters of data are created based on similarity and closeness, which helps in further analysis of the data.
Semi-supervised learning
This type of learning uses both types of input data: the model is trained on a small amount of labelled data and a large amount of unlabeled data.
Reinforcement learning
Unlike the previous learning models, this model aims at improving online performance and does not rely on labelled input and output data.
Optimization
Involves selecting the model that fits the data in the best possible way, that is selecting
among the numerous possible models, that model which gives the most optimal result.

1.2 Artificial Neural Network

Machine learning, an application of Artificial Intelligence (AI), empowers systems with the capability of performing various tasks by learning automatically and

improving with experience, without any specific programming. To do this, the sys-
tem must be made conversant with a dataset, which is called ‘training data’. Majorly
the two machine learning methods—supervised and unsupervised learning train an
algorithm. At the time of training, a certain set of instructions are provided and it is
from them that Supervised learning generates a function that reproduces the output
[2]. The training process is called to ‘regression’ when the data in output has a con-
tinual value and ‘classification’ when a categorical value is present in output data
[6]. Unsupervised learning includes the creation of a function that considers the hid-
den structures from unlabeled input data, unlike supervised data. During the training
phase pre-processing of training data set is done and important features are extracted.
Preprocessing involves noise reduction, feature extraction, image rectification and
similar operations. For every new application, it is necessary to design features in
a new way because feature extraction is a challenging task, especially when it is of
medical importance. This process is frequently called “hand-crafting” of features, in
the deep learning literature. Depending on the feature vector x  Rn, the classifier
must predict the precise class y, which is characteristically assessed by a function ŷ
= f(x) which gives the classification result ŷ directly. The parameter vector θ of the
classifier is obtained during the training phase and later checked on a separate test
data set.
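The roles of f, θ and the separate test set can be sketched with a very simple
classifier. The nearest-centroid rule below is chosen only because it is short; it is
an assumption of this example, not the classifier used in the chapter, and the
feature values and class names are made up.

```python
import math


def train(features, classes):
    """Training phase: theta holds one centroid (mean feature vector) per class."""
    sums, counts = {}, {}
    for x, y in zip(features, classes):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + xi for s, xi in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def f(x, theta):
    """Classifier y_hat = f(x): return the class whose centroid is closest to x."""
    return min(theta, key=lambda y: math.dist(x, theta[y]))


def accuracy(features, classes, theta):
    """Fraction of samples for which the prediction matches the true class."""
    hits = sum(f(x, theta) == y for x, y in zip(features, classes))
    return hits / len(classes)


# Toy feature vectors x in R^2 with two classes (values are made up).
train_x = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [4.0, 4.2], [4.1, 3.9], [3.8, 4.0]]
train_y = ["healthy", "healthy", "healthy", "disease", "disease", "disease"]
test_x = [[1.1, 0.9], [3.9, 4.1]]
test_y = ["healthy", "disease"]

theta = train(train_x, train_y)                   # parameters come from training data only
test_accuracy = accuracy(test_x, test_y, theta)   # checked on separate test data
```

Keeping the test samples out of `train` is what makes the reported accuracy an
estimate of performance on unseen data.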
The Artificial Neural Network (ANN) is a well-known classification and regression
algorithm in machine learning. It organizes its computational units into several layers,
imitating the architecture and signal transmission of neurons and their synapses in the
human brain. An ANN consists of interconnected artificial neurons, each of which
implements a simple classifier model that outputs a decision signal based on a
weighted sum of evidence. A large number of these basic computational elements are
assembled to form the ANN [7]. The parameters of the network are trained by an
algorithm such as ‘back-propagation’, in which input signals and the expected decision
outputs are presented in pairs, mirroring the situation where the brain relies on
external sensory stimuli to learn to perform specific tasks (Fig. 1).
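A single artificial neuron of this kind can be written in a few lines. The sketch
below is illustrative only: the helper names and the hand-picked weights (which make
the neuron act as a logical AND) are assumptions of this example.

```python
import math


def neuron(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


def step(z):
    """Hard decision: the neuron fires (1) when the weighted evidence is positive."""
    return 1 if z > 0 else 0


def sigmoid(z):
    """Smooth decision signal between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


# Hand-picked weights that make the neuron behave like a logical AND.
and_weights, and_bias = [1.0, 1.0], -1.5
and_outputs = [neuron([a, b], and_weights, and_bias, step)
               for a in (0, 1) for b in (0, 1)]
```

In a trained network the weights and bias are not chosen by hand but adjusted by the
training algorithm.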
The features used as input to machine learning can be numerical or nominal values.
Defining logical and powerful features is fundamental to machine learning studies.
ANNs have shown extraordinary performance in numerous areas, but they also have
drawbacks, such as getting trapped in local minima during optimization and
overfitting (over-training). Over the last few years, predictive techniques based on
artificial neural networks have shown remarkable capabilities in solving non-linear
modelling problems [8] in various applications, but most of these methods were
composed of shallow architectures because of problems related to training deep
networks. Owing to recently proposed fast learning algorithms, deep architectures have
attracted a lot of attention lately, especially since deep ANNs have been shown to
outperform conventional methods in pattern recognition, classification and other
machine learning domains. A DNN is composed of a series of stacked layers. The first
layer receives the input on which the prediction is based. The output of the last layer
predicts a class or value. The layers between the input and output layers are called
hidden layers because their state does not correspond to observable data.
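The stacking of input, hidden and output layers can be sketched as a forward pass.
The example below uses hand-set weights (an assumption of this sketch, not learned
values) for a tiny network with one hidden layer that computes XOR, a decision that a
single neuron cannot make.

```python
def relu(z):
    """Rectified linear unit: pass positive values, clip negatives to zero."""
    return max(0.0, z)


def dense(inputs, weights, biases, activation=None):
    """One layer: each row of `weights` feeds one neuron of the layer."""
    outputs = []
    for row, b in zip(weights, biases):
        z = sum(w * x for w, x in zip(row, inputs)) + b
        outputs.append(activation(z) if activation else z)
    return outputs


def forward(x, layers):
    """Pass the input through the stacked layers, first to last."""
    for weights, biases, activation in layers:
        x = dense(x, weights, biases, activation)
    return x


# Hand-set weights (not learned) for a 2-2-1 network that computes XOR.
xor_layers = [
    ([[1.0, 1.0], [1.0, 1.0]], [0.0, -1.0], relu),  # hidden layer
    ([[1.0, -2.0]], [0.0], None),                   # output layer (linear)
]
xor_outputs = [forward([a, b], xor_layers)[0] for a in (0, 1) for b in (0, 1)]
```

The values computed in the hidden layer never appear in the data themselves, which is
the sense in which the layer is "hidden".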

Fig. 1 A conceptual analogy between real neurons (on the left) and artificial neurons (on the right)

The multi-layered construction of neural networks permits them to make more
complicated decisions. To train a model, the weight on each edge must be optimized.
The weights, which combine a large number of features into weighted sums, are
initialized at random and then gradually adjusted by an optimization algorithm such
as ‘gradient descent’. After training samples are applied to the network, a loss
function between the target class and the prediction is evaluated. All weights are then
updated slightly in the direction that reduces the loss function. Numerous classes of
deep learning exist on the basis of these networks, each with a different approach.
Compared with traditional ANNs, DNNs extend the depth of the layers and
demonstrate better performance in recognition and prediction studies as the layers
become more complex.
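The training loop described above (random initialization, loss evaluation, small
updates that reduce the loss) can be sketched for a single linear neuron. The mean
squared error loss, the learning rate, the step count and the toy data are
assumptions chosen for this example.

```python
import random


def mse(w, b, xs, ys):
    """Loss function: mean squared error between predictions and targets."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient_descent(xs, ys, learning_rate=0.05, steps=2000, seed=0):
    """Fit y = w*x + b by repeatedly stepping against the gradient of the loss."""
    rng = random.Random(seed)
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)  # random initialization
    history = [mse(w, b, xs, ys)]
    n = len(xs)
    for _ in range(steps):
        # Gradient of the loss with respect to each parameter.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Small update in the direction that reduces the loss.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
        history.append(mse(w, b, xs, ys))
    return w, b, history


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]          # toy targets generated from y = 2x + 1
w, b, history = gradient_descent(xs, ys)
```

The `history` list records the loss after every update, so it can be inspected to
confirm that training actually reduces the loss.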

1.3 Deep Learning

Deep learning allows computational models composed of many processing layers to
learn representations of data with multiple levels of abstraction. Introduced in the
year 2000 but gaining momentum today owing to advances in technology, deep learning
techniques have drastically improved and have found growing application in speech
recognition, object detection and biological fields such as gene expression studies and
drug discovery [9]. Using the backpropagation algorithm, deep learning discovers
intricate structure in large datasets and indicates how a machine should change its
internal parameters, which are used to compute the representation in each layer from
the representation in the previous layer [10, 11]. Deep convolutional networks have
initiated a revolution in the processing of video, images, audio and speech, while
recurrent nets have stood out in analyzing sequential data such as text and speech.
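How backpropagation obtains the change needed in each internal parameter can be
sketched on a network small enough to differentiate by hand. The two-layer scalar
network, the squared-error loss and the starting values below are assumptions of this
example; the finite-difference estimate is included only as a check.

```python
import math


def forward(params, x):
    """Two-layer network: hidden = tanh(w1*x + b1), output = w2*hidden + b2."""
    w1, b1, w2, b2 = params
    hidden = math.tanh(w1 * x + b1)
    return hidden, w2 * hidden + b2


def loss(params, x, target):
    """Squared error between the network output and the target."""
    _, y = forward(params, x)
    return 0.5 * (y - target) ** 2


def backprop(params, x, target):
    """Gradient of the loss w.r.t. (w1, b1, w2, b2) via the chain rule."""
    w1, b1, w2, b2 = params
    hidden, y = forward(params, x)
    d_y = y - target                       # dL/dy
    d_w2 = d_y * hidden                    # output layer parameters
    d_b2 = d_y
    d_hidden = d_y * w2                    # error passed back to the hidden layer
    d_z = d_hidden * (1.0 - hidden ** 2)   # derivative of tanh
    d_w1 = d_z * x                         # hidden layer parameters
    d_b1 = d_z
    return [d_w1, d_b1, d_w2, d_b2]


def numerical_gradient(params, x, target, eps=1e-6):
    """Finite-difference estimate, used only to check the backprop result."""
    grads = []
    for i in range(len(params)):
        up = list(params)
        up[i] += eps
        down = list(params)
        down[i] -= eps
        grads.append((loss(up, x, target) - loss(down, x, target)) / (2 * eps))
    return grads


params = [0.5, -0.3, 0.8, 0.1]   # arbitrary starting parameters
analytic = backprop(params, x=1.5, target=1.0)
numeric = numerical_gradient(params, x=1.5, target=1.0)
```

The error signal is computed at the output first and then passed backwards layer by
layer, which is where the algorithm gets its name.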

Unlike shallow learning, which is typically supervised, deep learning can be
unsupervised, and with little guidance it learns complex patterns from raw,
high-dimensional data [12]. This optimization is referred to as the trade-off between
breadth and depth. Deep learning has demonstrated its usefulness in language and
image recognition, video games, replication of painting styles and even the composition
of classical music.
Representation learning is the type of learning these tasks require: patterns are
detected or classified from unprocessed raw data, especially when the data in question
are hierarchical in structure. For example, image recognition starts with learning a
hierarchy of sub-images, from pixels to edges and then to motifs, until the final output
is a full object. Being largely unsupervised, deep neural network algorithms can act as
feature detector units at each layer, gradually extracting more sophisticated and
invariant features from the original input signals [12, 13]. Machines can now accurately
identify millions of images, a task that seems impossible by human standards. Using
deep learning, machines are able to learn to differentiate between similar objects or
sentences with high accuracy. These advances have also motivated the machine
learning community to pursue the automation of tasks such as image recognition,
prediction, classification and annotation in biology, where the huge complexity and
vastness of the data now exceed human analytical capabilities [14, 15].

2 Deep Learning: Recent Trends

Deep learning has given rise to a vast range of current and potential applications
across the world, in both biological and non-biological domains.

2.1 In Non-biological Domains

The application of deep learning has progressed steadily ever since the advent of the
Convolutional Neural Network (CNN) in the early 2000s. It has since been used with
wide success in numerous applications such as image segmentation and face
recognition. However, these did not gain much attention in research and industry until
2012, when the open ImageNet competition was held with millions of images for
training and 150,000 pictures exclusively for validation and testing [16]. This
competition created a new field and had an enormous effect, leading researchers to
collaborate and compete without each having to collect a large-scale labelled dataset
[5]. ‘Dropout’, a new regularization technique, and a novel image augmentation
technique were used to improve the results in this competition. Furthermore, giants of
the IT and AI world such as Microsoft, Google and Facebook started to consider image
recognition using deep learning algorithms an important area of research. Deep learning
techniques showed a 16% error rate in 2012, which diminished to 3% and below by
2016, thereby surpassing human performance in object classification. Innovations in
object classification have since been transferred to semantic segmentation and object
localization. An RNN-based language model and a CNN-based image recognition
framework have been integrated to build visual question answering and image
captioning systems.
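The idea behind the dropout technique mentioned above can be sketched briefly. The
version below is the commonly used "inverted" variant, in which surviving units are
rescaled during training; this choice, the function signature and the toy
activations are assumptions of the example rather than details given in the chapter.

```python
import random


def dropout(activations, drop_prob, rng, training=True):
    """Inverted dropout: zero each unit with probability drop_prob while training."""
    if not training or drop_prob == 0.0:
        return list(activations)          # no change at inference time
    keep_prob = 1.0 - drop_prob
    # Surviving units are scaled by 1/keep_prob so the expected value is unchanged.
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]


rng = random.Random(42)
hidden = [1.0] * 10
dropped = dropout(hidden, drop_prob=0.5, rng=rng)
unchanged = dropout(hidden, drop_prob=0.5, rng=rng, training=False)
```

Because a different random subset of units is silenced on every training pass, the
network cannot rely too heavily on any single unit, which is how the technique reduces
overfitting.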
Another important area is speech recognition, where knowledge from computer
science and electrical engineering can be combined with research in linguistics and
health care (including radiology). Many researchers have developed technologies that
enable computational equipment, including robotics and smart technologies, to
recognize speech and translate it into text. Lately, owing to advances in deep learning
and big data, there has been tremendous progress in speech recognition [17]. This is
evident from the numerous speech recognition systems available from multiple
international firms, such as Facebook and Google, and from the numerous scientific
papers that have been published on this topic.

2.2 In Biological Domain

The expression profile of a gene can be considered a snapshot or image of the activities
taking place inside a given cell or tissue, much as a picture is representative of the
objects in an environment. Patterns of gene expression reflect a cell’s physical state in
the same way that objects in a picture are represented by a pixel pattern. This is the
basis for comparing biological data with the kinds of data on which deep learning has
been most successful, particularly audio and image data.
In much the same way that deep learning algorithms must distinguish two very
similar but distinct images regardless of background, two very similar but distinct
disease pathologies must be told apart, which is why discriminating fine differences is
essential. Invariance and selectivity are needed for both gene expression analysis and
image recognition, and they are also two defining properties of CNNs [18].
Similar analogies can be made with other deep learning applications. For example,
language prediction requires sequential learning with RNNs, and this is very similar to
signaling in biology, where one event can be predicted from previous events in the
same way that a word in a sentence can be predicted from the preceding words. Another
similar example is the structural prediction of biological targets such as proteins.
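The reason recurrent networks suit such sequential problems is that their hidden
state carries information from earlier elements forward. The scalar recurrence below
is a deliberately minimal sketch; its weights and the two toy sequences are
assumptions of this example.

```python
import math


def rnn_step(x, h_prev, w_x, w_h, b):
    """One recurrent step: the new state mixes the current input and the old state."""
    return math.tanh(w_x * x + w_h * h_prev + b)


def run_rnn(sequence, w_x=1.0, w_h=0.5, b=0.0):
    """Feed a sequence one element at a time and return every hidden state."""
    h = 0.0
    states = []
    for x in sequence:
        h = rnn_step(x, h, w_x, w_h, b)
        states.append(h)
    return states


# The same elements in a different order lead to different final states,
# because each state depends on everything that came before it.
states_a = run_rnn([1.0, 0.0])
states_b = run_rnn([0.0, 1.0])
```

A prediction layer placed on top of the final state would therefore see the history
of the sequence and not only its last element.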
While these parallels are illustrative, DNNs also have several advantages that
strengthen the case for their use in biological applications. First and foremost, deep
networks require large datasets for successful analysis, which the life sciences provide
in abundance. Also, DNNs are well suited to sparse, noisy and high-dimensional data
with non-linear relationships, which are typical of data extracted in biology [18].
Furthermore, DNNs have the ability to generalize: once a network is trained on one
dataset, it can be applied to various other datasets as well, which is exactly what is
required for the analysis of multi-platform heterogeneous data such as gene expression
data.

Fig. 2 Deep Neural Network Assembly. a Input data: it consists of data from Electronic Health
Records, clinical data, and also molecular data from microarray, MRI, etc. b Data preprocessing: in
this step the source data is preprocessed before analysis by a deep neural network, using techniques
such as standardization, normalization and noise reduction. c Deep Neural Network:
pre-processed data passes through several hidden layers, which extract important features and
lead to an output layer with trained neurons. d Output: this result helps in various biomedical
and healthcare applications such as diagnosis of disease, genotype-phenotype correlation, disease
prediction, and the study of pharmacogenomics and drug response, among many others

Despite the good match between biological data and DNNs, their adoption in biology
has been slow, for several possible reasons. One is that biological data used for training
have a very large number of features, unlike non-biological data. More computational
trials and research are required when applying deep learning to such data. Moreover,
DNNs lack transparency and are not easy to interpret, as they learn only through
relations, associations and patterns. Such models are called ‘black boxes’, and they
therefore also require the support of human beings for interpretation [18]. However,
the benefits of deep learning outweigh its drawbacks, which might even be overcome
with time (Fig. 2).

3 Applications of Deep Learning in Biomedicine

In this section, let us review the current and possible applications of deep learning
in biomedicine.

3.1 Biomarkers

In biomedicine, the conversion of data into biomarkers that reflect physical states
and phenotypes, such as disease, is a valuable task. Biomarkers are highly important
for assessing the outcomes of clinical trials and for identifying and monitoring
diseases, particularly diseases like cancer. For modern translational medicine, the
identification of specific biomarkers with high sensitivity is a big challenge [2, 10].
Computational biology is an essential tool for biomarker development, and it may use
virtually any source of data, from proteomics to genomics.

3.2 Genomic Study

Next-generation sequencing (NGS) technology has helped produce a huge volume of
genomic data. Much of this data can be analyzed computationally using in silico
approaches, such as the structural annotation of genomes, including the prediction of
protein binding sites, splice sites and noncoding regulatory sequences.
A significant branch of genomics to which NGS has brought attention is
environmental genomics, or metagenomics. One of its challenges is the functional
analysis of species diversity and sequence data. Deep belief networks and RNNs have
been used for the phenotypic classification of human microbiome data and pH
metagenomics data [2, 10]. These provided the ability to learn hierarchical
representations of a dataset; however, they could not improve classification accuracy.
Nevertheless, on large datasets and with properly selected network parameters, DNNs
are said to have the potential to greatly improve metagenomics algorithms.

3.3 Transcriptomic Analysis

Various kinds of transcripts, such as miRNA, mRNA and siRNA, are analyzed to
gather functionally important information such as splicing codes and disease
biomarkers. Normalization is required because gene expression data obtained from
different sources depend on numerous factors. Deep Neural Networks are well suited
to cross-platform analysis owing to their strong generalization capacity [10]. They are
also well equipped to handle some of the other major issues with gene expression data,
such as the size of the data sets and the need for dimension reduction and
selectivity/invariance.
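Two common ways of normalizing values measured on different scales are z-score
standardization and min-max rescaling. The sketch below applies them to hypothetical
expression levels from two platforms; the numbers, the platform names and the choice
of these two methods are assumptions of this example, since the chapter does not
specify a normalization procedure.

```python
import statistics


def z_score(values):
    """Standardize to zero mean and unit variance."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]      # constant input carries no contrast
    return [(v - mean) / std for v in values]


def min_max(values):
    """Rescale linearly to the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


# Hypothetical expression levels of the same three genes on two platforms
# that report on different scales.
platform_a = [2.0, 4.0, 6.0]
platform_b = [20.0, 40.0, 60.0]
z_a, z_b = z_score(platform_a), z_score(platform_b)
```

After standardization the two profiles coincide, so a model trained on one platform
sees comparable inputs from the other.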
Deep learning has also proven quite successful in the analysis of transcriptomics
data in the form of high-dimensional matrices. In one approach, deep learning was used
to extract features from cancer datasets and proved more successful than previously
used basic feature selection methods [2]. The results showed high accuracy, with better
classification and selection of cancer features. In another successful instance, Fakoor
et al. applied an autoencoder network to cancer classification using gene expression
data taken from microarrays.

3.4 Medical Image Processing

One of the most successful applications of DNNs across the world has been in image
analysis. Deep learning architectures have proven better at recognizing objects in
pictures than human observers and traditional image recognition methods. As a
result, many advanced software systems around the world use deep learning for image
analysis involving object recognition, retrieval and categorization [2, 14]. Naturally,
this has been of great value in medicine to researchers and technicians in identifying
disease from pictures of symptoms, especially in dermatological disorders, but also
from images showing gene expression and from internal body imaging [2, 10].
Convolutional Neural Networks have proven to be the most useful in this area of
image analysis.
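The core operation of a convolutional network is sliding a small kernel over an
image and computing a weighted sum at each position. The sketch below shows this for
a toy image and a hand-written edge kernel; in a real CNN the kernel values are
learned, and the image and kernel here are assumptions of this example.

```python
def conv2d(image, kernel):
    """'Valid' 2-D convolution as used in CNNs (strictly, cross-correlation)."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    output = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            # Weighted sum of the image patch under the kernel.
            total = sum(kernel[i][j] * image[r + i][c + j]
                        for i in range(k_rows) for j in range(k_cols))
            row.append(total)
        output.append(row)
    return output


# Toy 3x4 "image": dark on the left (0), bright on the right (1).
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
vertical_edge = [[-1, 1]]          # responds where brightness rises left to right
feature_map = conv2d(image, vertical_edge)
```

The feature map is non-zero only along the column where the brightness changes, which
is the kind of local pattern the first layers of a CNN pick up.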

3.5 Splicing

Another area of biomedicine where deep learning is widely used is splicing, which
is indicative of biological activity in eukaryotic organisms. Current techniques prove
insufficient for studying the regulation of splicing, be it the structure of the splice
site, its state, or splicing silencers and enhancers. But the most evident problem is that
of ‘raw reads’ at splice code locations, which are much shorter than actual genes and
have a very high level of duplication [2, 10]. Deep learning comes to the rescue, with
high efficiency in studying the splicing mechanism and understanding splice codes,
and it outperforms Bayesian methods in splicing prediction.

3.6 Proteomic Study

Deep learning represents data hierarchically and extracts and learns from complex
interactions, which is quite beneficial for protein network analysis. For example, a
bimodal deep belief network was built from phosphorylation data to predict the
response of human cells to a stimulus from the response of rat cells to the same
stimulus. The algorithm showed much higher accuracy than traditional approaches
[2, 10]. Also, the analytical approaches (algorithms) of proteomics do not require large
training data, unlike other ML algorithms. It is also true that proteomics is still very
new to research compared to transcriptomics and has much less data available for
analysis.

3.7 Structural Biology and Chemistry

The study of structural biology and chemistry comprises protein modelling, including
protein folding and protein dynamics. Accurate structure determination is important
for good prediction of enzyme function, RNA binding, and substrate and antigen
binding. Diseases such as Alzheimer’s and Parkinson’s result from the accumulation of
abnormal proteins, which are identified through structural biology studies.
Comparative modelling is a technique for predicting the secondary structure of a
protein based on homology, but because of the limited number of well-annotated
compounds this is not easy [2, 10]. Applying deep learning to sequence data has greatly
improved protein structure prediction. Certain proteins are very important despite
lacking a unique structure. These proteins are called intrinsically disordered proteins
(IDPs), and the domains without a stable structure are called intrinsically disordered
regions (IDRs). Deep learning algorithms have been used to separate IDPs/IDRs from
structured proteins. In 2013, Eickholt and Cheng published ‘DNdisorder’, a
sequence-based deep learning predictor that was highly successful at predicting
disordered proteins compared with other advanced predictors. In 2015, ‘DeepCNF’, an
even better predictor, was developed, which could predict IDPs and proteins with IDRs
by obtaining and analyzing data from experiments. It proved to be a better algorithm
than those used in ab initio predictors.

3.8 Drug Discovery

Applying computational techniques to discover drugs and study their biochemistry
has always been an important part of drug research across the world; it not only
reduces time but also saves on cost and resources. Although several approaches exist,
none of them has yet been declared ‘optimal’ because of certain limitations, such as
being restricted to a particular class of protein or being unable to perform
high-throughput screening. Wang et al. used a Pairwise Input Neural Network (PINN)
to study target-ligand interactions by extracting features from target profiles and
protein sequences [2, 10, 19]. Using DNNs and CNNs, it is possible to predict properties
such as drug toxicity and reactivity, which are highly valued aspects of drug design
and discovery.

4 Applications of Deep Learning in Health Care

4.1 Translational Bioinformatics

With the findings of the Human Genome Project, a huge amount of previously
unexplored biological data has been obtained, including genes, proteins and knowledge
of the processes by which genes interact with the external environment to produce
proteins. Developments in life sciences and biotechnology have also drastically
reduced the cost of gene sequencing and have guided disease treatment through
genome and proteome analysis [20]. Translational medicine essentially involves
applying research performed in basic biological laboratories at the clinical level,
making use of inputs from clinical observations. Bioinformatics entails the use of
computational techniques and algorithms to store, represent and analyze biological
data, including metabolites within cells, RNA expression, DNA sequences and proteins
[21]. Translational bioinformatics integrates these two fields, in the sense that it
involves the development of databases and algorithms to study basic cellular and
molecular data with the enhancement of clinical care as the ultimate goal. Simply put,
research in translational bioinformatics unites molecular information (small
molecules, lipids, proteins, RNA and DNA) with knowledge about clinical entities
(patients, symptoms, diseases, pathology reports, laboratory tests, clinical images and
drugs) to improve our biological understanding and, ultimately, patient care. This has
given rise to bioinformatics research in personalized medicine, where treatment is
designed specifically for the individual rather than generally for many.
Machine learning in the field of traditional bioinformatics comprises three research
areas: process prediction, disease prevention and personalized treatment. These
domains are governed by three major areas of the life sciences: genomics, epigenomics
and pharmacogenomics. Genomics is the study of DNA structure, genes, the creation of
proteins and phenotypic expression for the creation of targeted therapies.
Pharmacogenomics aims at creating more effective drugs with minimal side effects
while providing specialized treatment for the individual. Epigenomics is the study of
the effect of environmental factors on the formation of proteins and the interactions
between them [21].
Genetic variants among populations and species are created as a result of alternative
splicing, which is hence one of the popular areas of study involving machine learning.
Understanding them could be a stepping stone to detecting diseases early. Another
application of deep learning and machine learning algorithms in computational biology
is the study of protein-protein interactions using QSAR (Quantitative
Structure-Activity Relationship) and CPI (Compound-Protein Interaction) [21]. These
also help in modelling proteins binding to RNA. Also, for several reasons, such as
transcriptional or translational errors, chromosomal instability, cancer progression or
cell differentiation, DNA methylation affects the expression of DNA; this is an area
requiring more study using deep neural networks.

4.2 Universal Sensing for Health and Wellbeing

One of the most applied fields of deep learning, and one showing great potential for
growth, is biosensing in smart devices used in healthcare. The algorithms may be used
in devices to monitor calorie intake, assist those with partial vision, detect
irregularities in biomedical devices and more. Some of these applications are discussed
in the subsections below.

4.2.1 Recognizing Activity and Expenditure of Energy

Dieticians say that an optimum diet comprising a limited number of calories should be
consumed in order to stay healthy and fit. But today obesity is so rampant that it has
become an epidemic, and it is one of the causes of dangerous chronic diseases such as
heart disease and type II diabetes. This can be overcome by keeping track of the kind
and quantity of food consumed and the duration and type of exercise or physical
activity performed, all of which contribute to good health. However, doing this
requires competent technology that is able to select features that generalize across
the numerous foods and activities [21]. This is achieved using wearable devices and
smartphones that monitor and manage food intake and energy expenditure.
Recently, a calorie measurement system was created that acts as an assistant: it
estimates the number of calories in a food item from a picture, and this information
helps the consumer control or prevent diet-related health problems by controlling the
intake of that food item [21]. The system runs on smartphones and uses Convolutional
Neural Networks (CNNs), together with more advanced techniques such as mobile cloud
computing, size calibration and distance estimation, which help in recognizing the food
type and estimating the calories. CNNs are also used for the classification of human
activities, such as a baby crawling or someone falling (abnormal activities raise an
alarm and inform family members) [21]. Also, when the performance of a CNN-based
method was compared across different human activity recognition datasets, the deep
learning method was found to generalize better, as it had better classification
accuracy. However, low-powered smart wearable devices have limited computing
capacity and cannot handle the greater computational complexity needed for deep
learning. In such situations, pre-processing and standardization techniques are
recommended, as they reduce the variation in the input data caused by sensor
properties such as orientation and position.

4.2.2 Abnormality Detection in Vital Signs

Individuals suffering from prolonged illness and those whose condition is critical need
to be closely monitored, and it is hence important to analyze discrepancies in their
vital signs. Abnormalities, however, vary from patient to patient and are affected by
equipment and noise. Machine learning techniques contribute greatly to detecting such
irregularities [21]. EEG (electroencephalography) is used to record the electrical
activity of the brain; in 2010, Wulsin et al. [22] proposed an approach to detect
anomalies in an EEG using a Deep Belief Network (DBN). On large datasets, the DBN
proved to be a more effective method, even outperforming SVM [21]. In 2015, Wang et
al. [23] created a DBN that compressed the signal, resulting in a 50% energy saving
while keeping the same neural decoding accuracy, which is a breakthrough for
developing low-power implantable and wearable sensors.

4.2.3 Assistive Devices

These are devices that understand the shape and volume of objects and classify them
by operating in three-dimensional space. They can be used for patients suffering from
visual, hearing or speech impairment, with feedback provided in the form of gestures,
tactile feedback or audio feedback. Deep learning greatly helps in enhancing such
devices; for example, in 2016 Poggi et al. [24] proposed a CNN-based wearable device
to help people with impaired vision detect obstacles. Similarly, gesture-based assistive
devices have been proposed for patients with hearing impairment and for highly
sensitive environments, such as surgery, where touch-free human-computer interaction
is preferable [21]. In fact, in 2015 Huang et al. [25, 26] proposed a DNN-based method
for recognizing sign language using real-time data. However, many such applications,
like gesture recognition, are quite challenging because of the great number of possible
variations in hand postures and the resulting algorithmic complexity.

4.3 Informatics in Medicine

This field aims to study the large amount of aggregated data in the medical domain in
order to enhance and extend clinical decision support systems, as well as to improve
the assessment of healthcare data so as to assure good quality and easy access to
medical services. Electronic Health Records (EHRs) are very data-intensive sources of
patient information, including drug prescriptions, recommended treatments,
diagnosed diseases, vaccination records, laboratory test results from machines such
as EEG, and clinical images, both internal and external. Mining this extensive dataset
would certainly provide a greater understanding of disease and eventually improve its
management [21]. However, there are several difficulties. For example, irregular
compilation of information makes the data complex. Similarly, erratic delays between
the recognition of a disease and its diagnosis increase the complexity of learning.
Deep learning learns data representations in both supervised and unsupervised
settings, and its achievements are largely attributed to its capacity to learn distinctive
patterns and features. Deep learning methods also scale exceptionally well to large
datasets. Moreover, deep neural networks can associate several components of a data
architecture, which is why they handle multi-modal information well [21]. For these
reasons and more, deep learning has been widely adopted in medical informatics
research. In 2015, Shin et al. [27] proposed an image-text CNN to identify data linking
radiology images and reports from hospital picture and information databases. In 2014,
Liang et al. [28] used a modified version of CDBN for training on huge hypertension
datasets. In 2016, Putin et al. [29] used deep learning to identify markers that predict
the age of a human being from a blood test. In 2015, Nie et al. [30] suggested a DNN for
automatically inferring disease. Later, more methods were developed to interpret
healthcare data, some of which were based on GBDT (gradient boosting decision
trees), LSTM RNN, etc. Deep learning has therefore greatly increased accessibility to
varied data, be it from clinics, hospitals, data clouds or research organizations.

4.4 Public Health

By analyzing the spread of disease and its interaction with the environment, public
health aims at improving healthcare facilities, preventing diseases and prolonging life.
The domain of public health involves epidemic and pandemic studies, and its
applications include air quality checks, assurance of drug safety, epidemic surveillance
and studies of the effect of environmental factors on lifestyle diseases such as obesity.
Computational methodologies help in creating models for such studies; however, they
are currently limited, as they lack the ability to include real-time data in the analysis.
Deep learning, if incorporated, promises a better and stronger ability to generalize.
This is because deep learning methods are data-driven and are able to keep optimizing
the cost function as new datasets become available [10]. One such optimization
algorithm is ‘stochastic gradient descent’, which is widely used in DNNs. Therefore, for
analyzing public health data, deep learning methods along with network analysis and
recommendation systems are most advisable.
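Stochastic gradient descent updates the model one sample at a time, which is what
lets it keep optimizing the cost function as new data arrive. The sketch below uses a
one-parameter model and a simulated data stream; the model, learning rate and data are
assumptions of this example and stand in for the much larger networks used in
practice.

```python
import random


def sgd_update(w, x, y, learning_rate):
    """One stochastic step on the squared error of a single sample (model: y = w*x)."""
    error = w * x - y
    gradient = 2 * error * x
    return w - learning_rate * gradient


def train_online(stream, w=0.0, learning_rate=0.05):
    """Update the weight as each new sample arrives, without revisiting old data."""
    for x, y in stream:
        w = sgd_update(w, x, y, learning_rate)
    return w


# Simulated stream of incoming measurements generated from y = 3x (toy data).
rng = random.Random(0)
stream = []
for _ in range(2000):
    x = rng.uniform(-1.0, 1.0)
    stream.append((x, 3.0 * x))

w_first = train_online(stream[:1000])                 # data available so far
w_updated = train_online(stream[1000:], w=w_first)    # refined when new data arrive
```

Training resumes from `w_first` when the second batch arrives, so earlier data never
has to be stored or reprocessed.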
Assessing and predicting air pollutant concentration is one such application of
deep learning. Ong et al. designed a system in 2015 [31] that collects data from
sensors in more than 52 cities of Japan and uses it to forecast the air pollution
level in the country [21]. The DNN used is trained in an online manner and
comprises stacked autoencoders. However, it was also found that deep learning
techniques are affected by the incomplete data of the real world. Tracking disease
outbreaks through epidemiological studies, and assessing lifestyle diseases through
social media, is another very interesting application of deep learning in the
health sector. Examples of such diseases are Ebola and influenza. In 2015, Zhao et
al. [32] used Twitter to track the health of the public continuously and quite
accurately [1]. Here, DNNs are used to check for characteristics describing an
epidemic and their changes with the environment in order to track the development
of the disease. Messages on Twitter may also be used to study antibiotics and have
shown a good forecast of intestinal diseases. A DBN was used to classify
antibiotic-related classes, while in 2016 Zou et al. [33] used deep learning to
identify three types of intestinal diseases. Furthermore, in 2016, Garimella et al.
[34] used geographically tagged pictures from Instagram to track drinking, obesity,
smoking and other lifestyle diseases, and compared the classification by users with
deep learning annotations. The study found that deep learning-based algorithmic
annotations were more successful in predicting and categorizing behaviours such as
drug abuse and drinking.
Data from mobile phones, such as texts or phone call locations, can be used to
characterize human behaviour. This technique uses CNNs; it is gaining popularity
and has been found highly accurate in predicting the gender and age of individuals.
Therefore, metadata of individuals, mobile network data, social media data and EHRs
help in forming public health policy. They could also support large-scale disease
surveillance and alert mechanisms that trigger at the onset of a disease or when
symptoms appear. However, collecting such personal data also poses the risk of
intruding on privacy, be it through social media platforms like Facebook, Twitter
and Instagram or through poorly secured databases that contain sensitive data and
are prone to easy exploitation. Hence, the current situation requires that
individuals be able to control access to their private health information, while at
the same time mechanisms are created to gain more information for large-scale study
using deep learning algorithms (Table 1).

Table 1 Summary of applications of deep learning in biomedicine and healthcare

| Applications | Data source | Deep learning algorithm used | References |
|---|---|---|---|
| Cancer diagnosis and classification | Gene expression data | Deep autoencoders | [35] |
| Protein secondary structure prediction | PDB, CASP9, CASP10 | DNSS (multimodal DBNs) | [36] |
| Gene variants | Microarray data | DNN | [37] |
| Annotating gene expression patterns | Gene expression data | CNN | [38] |
| Metagenomic classification | Microbiome sequencing data | RNN, DBN | [39] |
| Target-ligand interaction prediction | sc-PDB | SVD and autoencoder | [40] |
| DNA methylation | DNA, RNA sequencing data | DNN | [41, 42] |
| Identification of Expression Quantitative Trait Loci (eQTL) | RNA-seq and whole genome-wide SNP-array data | DNN | [43] |
| Effects of noncoding variants | Transcription factors binding profiles, histone mark profile from ENCODE and epigenomics project | DeepSEA (CNN) | [44] |
| Modelling structural features of RNA-binding protein targets | doRiNA (database of RNA interactions in post-transcriptional regulation) | DBN (multimodal DBNs) | [45] |
| 3D brain reconstruction | MRI | Deep autoencoders | [46, 47] |
| Alzheimer diagnosis | PET scans | DNN, CNN | [48] |
| Cell clustering | Microscopy | Deep autoencoder | [49, 50] |
| Hemorrhage detection | X-ray images | DNN | [51, 52] |
| Organ segmentation | Endoscopy images | CNN | [53] |
| Human activity recognition | Wearable devices | CNN, DNN | [54, 55] |

5 Challenges of Deep Learning in Biomedicine and Healthcare

Despite having shown an upper hand in feature extraction, recognition and
classification, deep learning faces several challenges compared to previous machine
learning algorithms. Biological data is highly complex and is not easy for humans
to interpret alone. Neither can it be interpreted well, with a controlled level of
quality, by algorithms alone, such as those in deep learning, since DNNs lack the
transparency to uncover biological relationships [21]. This is known as the ‘black
box’ problem, and it means that human input is required alongside computational
analysis to analyze biological data holistically. Also, to achieve high accuracy,
deep learning algorithms require large training data, which in most cases is not
readily available. Without large enough training data there is a risk of
overfitting, i.e., the test error is high in spite of a low training error.
Moreover, it is often difficult to choose the type of DNN that is appropriate for a
task. Although there are tools to help in this selection, such as hyperparameter
optimization techniques, deciding which architecture to use is not always
straightforward, especially when new ones are continuously being added. Although
computational techniques on the whole decrease the cost of analyzing data and save
time, DNNs in particular have a data-intensive training process which is
time-consuming and requires skilled individuals such as GPU programmers. Scientists
also find deep learning unable to answer some important questions and provide
solutions. For one, many high-level visualizations obtained using deep learning are
not easy to interpret. In addition, there is sometimes no provision to apply
changes when a classification goes wrong. Moreover, deep learning is not suitable
for all kinds of diseases, particularly rare diseases. Evidence also suggests that
DNNs can easily be tricked into misclassifying an input by making minute changes to
it.
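
The overfitting risk mentioned in this section (low training error, high test error
when training data are scarce) can be made concrete with a small sketch. It uses
polynomial regression as a stand-in for a high-capacity model rather than an actual
DNN; the signal, noise level, sample sizes and polynomial degrees are illustrative
assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)

def make_data(n):
    """Noisy samples of a smooth hypothetical signal."""
    x = rng.uniform(-1.0, 1.0, size=n)
    y = np.sin(np.pi * x) + 0.2 * rng.normal(size=n)
    return x, y

def mse(coeffs, x, y):
    """Mean squared error of a fitted polynomial on the given points."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

x_train, y_train = make_data(8)      # deliberately small training set
x_test, y_test = make_data(200)      # held-out data from the same source

simple = np.polyfit(x_train, y_train, deg=3)    # low-capacity model
flexible = np.polyfit(x_train, y_train, deg=7)  # enough capacity to fit all 8 points

for name, c in [("degree 3", simple), ("degree 7", flexible)]:
    print(name, "train MSE:", mse(c, x_train, y_train),
          "test MSE:", mse(c, x_test, y_test))
```

The flexible model passes through essentially every training point, including the
noise, so its training error is close to zero while its error on held-out data is
not. Comparing the two errors on a held-out set is the usual way to detect this.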

6 Conclusion

It is no surprise that the tremendous amount of biological data generated today is
simply too large for humans to analyze alone. We therefore require the support of
machine learning, and specifically deep learning algorithms, to effectively
interpret data in biomedicine and healthcare [10]. Recent research and development
in deep learning in biomedicine has advanced us into an era of widespread
applications in drug design, genomic and proteomic analysis, transcriptomic gene
expression analysis, splicing, medical image processing and multi-omics studies,
among others [2, 14]. In healthcare a similar advancement is seen when DNNs are
applied to translational bioinformatics, finding genetic variants, studying target
and ligand interaction, medical imaging, assistive wearable devices, medical
informatics, public health, etc. Although the use of deep learning in the
biological sciences is still in its infancy, it is a novel approach and holds great
potential to drastically change the life sciences with respect to cost, time and
extent of use of technology, both physically and computationally. Granted, there
are a few drawbacks when it comes to applying these algorithms; however, they are
overshadowed by the extent of improvement in the biological domain, even though we
are just at the outset of deep learning applications. With time these methodologies
are bound to improve as a result of the partnerships forged by the thousands of
people brought together by the common goal of researching and developing deep
learning algorithms for life science data. Giants of the IT and pharmaceutical
world across the globe are rapidly investing in research into computational
biological sciences, as the next few years are projected to show immense growth in
this sector.
This book chapter was aimed at achieving two things. The first was making the reader
aware of the recent advancements in neural networks, and particularly deep learning,
along with their current and future applications in the biological sciences. The
second was bridging the gap between pure biologists and the community of
computational biologists [21, 56]. Many research results of DNNs in the biological
sciences have not even been announced due to lack of awareness and are still
awaiting communication. Research progresses exponentially when a large number of
people associate and work together with a common goal, just as deep neural networks
work more effectively on large datasets.

References

1. Angermueller, C., Pärnamaa, T., Parts, L., Stegle, O.: Deep learning for computational biology.
Mol. Syst. Biol. 12(7), 878 (2016)
2. Cao, C., et al.: Deep learning and its applications in biomedicine. Genom. Proteom. Bioinf.
16(1), 17–32 (2018)
3. Rajeswari, K., Vivekanandan, N., Amitaraj, P., Fulambarkar, A.: A study on redesigning modern
healthcare using internet of things, pp. 59–69 (2017)
4. Jiang, F., et al.: Artificial intelligence in healthcare: past, present and future. Stroke Vasc.
Neurol. 2(4), 230–243 (2017)
5. LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436 (2015)
6. Nelson, D., Wang, J.: Introduction to artificial neural systems. Neurocomputing 4(6), 328–330
(2003)
7. Jain, A.K., Mao, J., Mohiuddin, K.M.: Artificial neural networks: a tutorial. Computer 29(3),
31–44 (1996)
8. Pour, M.P., Seker, H., Shao, L.: Automated lesion segmentation and dermoscopic feature
segmentation for skin cancer analysis. In: Proceedings of the Annual International Conference
of the IEEE Engineering in Medicine and Biology Society, EMBS, pp. 640–643 (2017)
9. Norgeot, B., Glicksberg, B.S., Butte, A.J.: A call for deep-learning healthcare. Nat. Med. 25(1),
14–15 (2019)
10. Mamoshina, P., Vieira, A., Putin, E., Zhavoronkov, A.: Applications of deep learning in
biomedicine. Mol. Pharm. 13(5), 1445–1454 (2016)
11. Ching, T., et al.: Opportunities and obstacles for deep learning in biology and medicine. J. R.
Soc. Interface 15(141) (2018)
12. Min, S., Lee, B., Yoon, S.: Deep learning in bioinformatics. Brief. Bioinf. 18(5), 851–869
(2017)
13. Erickson, B.J., Korfiatis, P., Akkus, Z., Kline, T., Philbrick, K.: Toolkits and libraries for deep
learning. J. Digit. Imaging 30(4), 400–405 (2017)
14. Akkus, Z., Galimzianova, A., Hoogi, A., Rubin, D.L., Erickson, B.J.: Deep learning for brain
MRI segmentation: state of the art and future directions. J. Digit. Imaging 30(4), 449–459
(2017)
15. Miotto, R., Wang, F., Wang, S., Jiang, X., Dudley, J.T.: Deep learning for healthcare: review,
opportunities and challenges. Brief. Bioinf. 19(6), 1236–1246 (2017)
16. Hinton, G.E., Osindero, S., Teh, Y.-W.: A fast learning algorithm for deep belief nets. Neural
Comput. 18(7), 1527–1554 (2006)
17. Neapolitan, R.E., Neapolitan, R.E.: Neural networks and deep learning. In: Artificial
Intelligence, pp. 389–411 (2018)
18. Esteva, A., et al.: A guide to deep learning in healthcare. Nat. Med. 25(1), 24–29 (2019)
19. Faust, O., Hagiwara, Y., Hong, T.J., Lih, O.S., Acharya, U.R.: Deep learning for healthcare
applications based on physiological signals: a review. Comput. Methods Programs Biomed.
161, 1–13 (2018)
20. Kim, K.G.: Book review: deep learning. Healthc. Inform. Res. 22(4), 351 (2016)
21. Ravi, D., et al.: Deep learning for health informatics. IEEE J. Biomed. Health Inf. 21(1), 4–21
(2017)
22. Wulsin, D., Blanco, J., Mani, R., Litt, B.: Semi-supervised anomaly detection for EEG
waveforms using deep belief nets. In: Proceedings—9th International Conference on Machine
Learning and Applications, ICMLA 2010, pp. 436–441 (2010)
23. Wang, A., Song, C., Xu, X., Lin, F., Jin, Z., Xu, W.: Selective and compressive sensing for
energy-efficient implantable neural decoding. In: IEEE Biomedical Circuits and Systems
Conference: Engineering for Healthy Minds and Able Bodies, BioCAS 2015—Proceedings (2015)
24. Poggi, M., Mattoccia, S.: A wearable mobility aid for the visually impaired based on embedded
3D vision and deep learning. In: Proceedings—IEEE Symposium on Computers and
Communications, Aug 2016, pp. 208–213
25. Huang, J., Zhou, W., Li, H., Li, W.: Sign language recognition using real-sense. In: 2015 IEEE
China Summit and International Conference on Signal and Information Processing, ChinaSIP
2015—Proceedings, pp. 166–170 (2015)
26. Tang, A., Lu, K., Wang, Y., Huang, J., Li, H.: A real-time hand posture recognition system
using deep neural networks. ACM Trans. Intell. Syst. Technol. 6(2), 1–23 (2015)
27. Shin, H.-C., Lu, L., Kim, L., Seff, A., Yao, J., Summers, R.M.: Interleaved text/image deep
mining on a large-scale radiology database for automated image interpretation. J. Mach. Learn.
Res. 17(1–31), 2 (2015)
28. Liang, Z., Zhang, G., Huang, J.X., Hu, Q.V.: Deep learning for healthcare decision making
with EMRs. In: Proceedings—2014 IEEE International Conference on Bioinformatics and
Biomedicine, IEEE BIBM 2014, pp. 556–559 (2014)
29. Korzinkin, M., et al.: Deep biomarkers of human aging: application of deep neural networks
to biomarker development. Aging (Albany NY) 8(5), 1021–1033 (2016)
30. Nie, L., Wang, M., Zhang, L., Yan, S., Zhang, B., Chua, T.S.: Disease inference from
health-related questions via sparse deep learning. IEEE Trans. Knowl. Data Eng. 27(8),
2107–2119 (2015)
31. Ong, B.T., Sugiura, K., Zettsu, K.: Dynamically pre-trained deep recurrent neural networks
using environmental monitoring data for predicting PM2.5. Neural Comput. Appl. 27(6),
1553–1566 (2016)
32. Zhao, L., Chen, J., Chen, F., Wang, W., Lu, C.T., Ramakrishnan, N.: SimNest: social media
nested epidemic simulation via online semi-supervised deep learning. In: Proceedings—IEEE
International Conference on Data Mining, ICDM, Jan 2016, pp. 639–648
33. Zou, B., Lampos, V., Gorton, R., Cox, I.J.: On infectious intestinal disease surveillance using
social media content, pp. 157–161 (2016)
34. Garimella, K., Alfayad, A., Weber, I.: Social media image analysis for public health. In:
Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, pp. 5543–5547
(2016)
35. Fakoor, R., Ladhak, F., Nazi, A., Huber, M.: Using deep learning to enhance cancer diagnosis
and classification. In: Proceeding of the ICML Work. Role Mach. Learn. Transform. Healthc.
(2013)
36. Spencer, M., Eickholt, J., Cheng, J.: A deep learning network approach to ab initio protein
secondary structure prediction. IEEE/ACM Trans. Comput. Biol. Bioinf. (2014)
37. Quang, D., Chen, Y., Xie, X.: DANN: a deep learning approach for annotating the pathogenicity
of genetic variants. Bioinformatics (2015)
38. Zeng, T., Li, R., Mukkamala, R., Ye, J., Ji, S.: Deep convolutional neural networks for annotating
gene expression patterns in the mouse brain. BMC Bioinf. (2015)
39. Ditzler, G., Polikar, R., Rosen, G.: Multi-layer and recursive neural networks for metagenomic
classification. IEEE Trans. Nanobiosci. (2015)
40. Wang, C., Liu, J., Luo, F., Tan, Y., Deng, Z., Hu, Q.N.: Pairwise input neural network for
target-ligand interaction prediction. In: Proceedings—2014 IEEE International Conference on
Bioinformatics and Biomedicine, IEEE BIBM 2014 (2014)
41. Tian, K., Shao, M., Wang, Y., Guan, J., Zhou, S.: Boosting compound-protein interaction
prediction by deep learning. Methods (2016)
42. Angermueller, C., Lee, H.J., Reik, W., Stegle, O.: DeepCpG: accurate prediction of single-cell
DNA methylation states using deep learning. Genome Biol. (2017)
43. Witteveen, M.J.: Identification and elucidation of expression quantitative trait loci (eQTL) and
their regulating mechanisms using decodive deep learning (2014)
44. Zhou, J., Troyanskaya, O.G.: Predicting effects of noncoding variants with deep learning-based
sequence model. Nat. Methods (2015)
45. Zhang, S., et al.: A deep learning framework for modeling structural features of RNA-binding
protein targets. Nucleic Acids Res. (2015)
46. Mansoor, A., et al.: Deep learning guided partitioned shape model for anterior visual pathway
segmentation. IEEE Trans. Med. Imaging (2016)
47. Shan, J., Li, L.: A deep learning method for microaneurysm detection in fundus images. In:
Proceedings—2016 IEEE 1st International Conference on Connected Health: Applications,
Systems and Engineering Technologies, CHASE 2016 (2016)
48. Fritscher, K., Raudaschl, P., Zaffino, P., Spadea, M.F., Sharp, G.C., Schubert, R.: Deep neural
networks for fast segmentation of 3D medical images. In: Lecture Notes in Computer Science
(including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in
Bioinformatics) (2016)
49. Avendi, M.R., Kheradvar, A., Jafarkhani, H.: A combined deep-learning and deformable-model
approach to fully automatic segmentation of the left ventricle in cardiac MRI. Med. Image Anal.
(2016)
50. Cheng, J.Z., et al.: Computer-aided diagnosis with deep learning architecture: applications to
breast lesions in US images and pulmonary nodules in CT scans. Sci. Rep. (2016)
51. Rose, D.C., Arel, I., Karnowski, T.P., Paquit, V.C.: Applying deep-layered clustering to
mammography image analytics. In: Proceedings of the 2010 Biomedical Science and Engineering
Conference, BSEC 2010: Biomedical Research and Analysis in Neuroscience, BRAiN (2010)
52. Wang, J., MacKenzie, J.D., Ramachandran, R., Chen, D.Z.: A deep learning approach for
semantic segmentation in histology tissue images. In: Lecture Notes in Computer Science
(including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in
Bioinformatics) (2016)
53. Xu, T., Zhang, H., Huang, X., Zhang, S., Metaxas, D.N.: Multimodal deep learning for cervical
dysplasia diagnosis. In: Lecture Notes in Computer Science (including subseries Lecture Notes
in Artificial Intelligence and Lecture Notes in Bioinformatics) (2016)
54. Sun, L., Jia, K., Chan, T.H., Fang, Y., Wang, G., Yan, S.: DL-SFA: deeply-learned slow feature
analysis for action recognition. In: Proceedings of the IEEE Computer Society Conference on
Computer Vision and Pattern Recognition (2014)
55. Ravi, D., Wong, C., Lo, B., Yang, G.Z.: Deep learning for human activity recognition: a
resource efficient implementation on low-power devices. In: BSN 2016—13th Annual Body
Sensor Networks Conference (2016)
56. Dolmans, D.H.J.M., Loyens, S.M.M., Marcq, H., Gijbels, D.: Deep and surface learning in
problem-based learning: a review of the literature. Adv. Health Sci. Educ. 21(5), 1087–1112
(2016)

Shubham Mittal is a dynamic individual currently pursuing his master’s in bioinformatics at
Delhi Technological University. Having completed his bachelor’s in biotechnology, he is highly
motivated towards computational research in life sciences and possesses an in-depth knowledge
of the field. In his free time Shubham likes to play basketball, listen to music and read fiction.

Dr. Yasha Hasija is an Associate Professor at Delhi Technological University. She holds a
bachelor’s and master’s degree in biotechnology and a Ph.D. in bioinformatics. Besides having a
sound academic foundation, Dr. Yasha is a vibrant individual and a very good orator. She
specializes in genome informatics and interaction study with human diseases; her research
interests include genetic analysis of dermatological disorders, tuberculosis study and the role
of human genetic variations in age-related disorders.

Deep Learning for Clinical Decision Support Systems: A Review from the Panorama of Smart Healthcare

E. Sandeep Kumar and Pappu Satya Jayadev

Abstract Innovations in deep learning (DL) have been tremendous in recent years, and
applications of DL techniques are ever expanding, encompassing a wide range of
services across many fields. This is possible primarily for two reasons, viz. the
availability of massive amounts of data for analytics, and advancements in hardware
in terms of storage and computational power. Healthcare is one such field that is
undergoing a major uplift due to the large-scale pervasion of DL. A wide variety of
DL algorithms are being used and further developed to solve different problems in
the healthcare ecosystem. Clinical healthcare is one of the foremost areas in which
learning algorithms have been tried to aid decision making. In this direction,
combining DL with existing areas like image processing, natural language
processing, virtual reality, etc., has further paved the way for automating and
improving the quality of clinical healthcare enormously. Such intelligent decision
making in healthcare and clinical practice is also expected to result in holistic
treatment. In this chapter, we review and collect various existing DL techniques
and their applications for decision support in clinical systems. Three major
application streams of DL, namely image analysis, natural language processing and
wearable technology, are discussed in detail. Towards the end of the chapter, a
section on directions for future research, such as handling class imbalance in
diagnostic data, DL for prognosis leading to preventive care, and data privacy and
security, is included. The chapter should be useful for budding researchers and
engineers who aspire to a career in DL applied to healthcare.

Keywords Machine learning · Deep learning · Smart healthcare · Clinical decision support system

E. Sandeep Kumar (B)
Department of Telecommunication Engineering, M.S. Ramaiah Institute of Technology,
Bengaluru, India
e-mail: [email protected]
P. Satya Jayadev
Department of Electrical Engineering, IIT Madras, Chennai, India
e-mail: [email protected]

© Springer Nature Switzerland AG 2020
S. Dash et al. (eds.), Deep Learning Techniques for Biomedical and Health Informatics,
Studies in Big Data 68, https://doi.org/10.1007/978-3-030-33966-1_5

1 Introduction

Deep learning has shown its potential in recent years for reducing large datasets
to a more abstract representation well suited for classification and prediction
applications, which are at the heart of many smart tech systems. The majority of DL
algorithms comprise a sequence of blocks embedded with primitive linear or
non-linear operations that act on the data flowing from one block to the next,
thereby learning a more condensed representation of the information contained in
the dataset and aiding the decision making process [1].
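
The description of DL algorithms as a sequence of blocks with linear and non-linear
operations can be sketched as a plain forward pass. This is a minimal illustration
only: the layer widths, the ReLU non-linearity and the random, untrained weights are
assumptions chosen for the example.

```python
import numpy as np

def relu(z):
    """Element-wise non-linearity: negative values are set to zero."""
    return np.maximum(z, 0.0)

def forward(x, layers):
    """Pass x through a sequence of (weights, bias) blocks with ReLU in between."""
    h = x
    for i, (W, b) in enumerate(layers):
        h = h @ W + b                 # linear operation of the block
        if i < len(layers) - 1:
            h = relu(h)               # non-linear operation on hidden blocks
    return h

rng = np.random.default_rng(0)
sizes = [64, 16, 4]                   # hypothetical widths: 64 inputs -> 16 -> 4
layers = [(rng.normal(scale=0.1, size=(m, n)), np.zeros(n))
          for m, n in zip(sizes[:-1], sizes[1:])]

batch = rng.normal(size=(5, 64))      # 5 samples with 64 raw features each
codes = forward(batch, layers)
print(codes.shape)                    # (5, 4)
```

Each sample enters with 64 values and leaves with 4, which is the "more condensed
representation" in miniature; in a trained network the weights would be learned so
that these 4 values retain what matters for the decision.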
A healthcare system comprises doctors, nurses, front-line managers, middle-level
managers, senior managers and a board of directors. Decision making is a crucial
aspect of this group, and the decisions they take can be classified as clinical or
non-clinical. Clinical decision support systems (CDSS) are technology-driven
arrangements that assist a physician or any medical practitioner in making better
decisions. Clinical decisions are those taken with respect to diagnosis, therapy,
treatment and medical prescription, while non-clinical decisions include those
taken with respect to resource allocation, budgets, strategic planning, etc. Even
though DL can assist in decisions of all kinds, in this chapter we stick to the
core objective of discussing the use of DL in clinical decision making.
Clinical decision making is among the most complex and challenging settings for
decision support, mainly because of the various measurable and non-measurable
attributes involved and the complex relations that exist between those attributes.
The attributes include patients’ beliefs, lifestyles, experiences, education level,
diagnostic reports, historical health records and so on. DL algorithms can serve as
effective tools for supporting the decision making process; however, the attributes
input to the algorithms must be measurable and quantifiable.
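
What "measurable and quantifiable" means in practice can be sketched by turning a
patient record into a fixed-length numeric vector. The field names, the tiny
vocabularies and the scaling of age below are hypothetical choices made for this
example; they do not come from any of the cited systems.

```python
import numpy as np

# Hypothetical vocabularies; a real system would derive these from its own data.
SEX = ["female", "male"]
DIAGNOSIS_CODES = ["E11", "I10", "J45"]   # example ICD-10 category codes

def encode_patient(age_years, sex, diagnosis_codes):
    """Return a fixed-length numeric vector for one patient record."""
    age = [age_years / 100.0]                                   # crude scaling
    sex_onehot = [1.0 if sex == s else 0.0 for s in SEX]        # one-hot category
    dx_multihot = [1.0 if c in diagnosis_codes else 0.0         # multi-hot codes
                   for c in DIAGNOSIS_CODES]
    return np.array(age + sex_onehot + dx_multihot)

vec = encode_patient(54, "female", {"I10", "E11"})
print(vec)
```

Attributes such as beliefs or experiences have no obvious encoding of this kind,
which is why they are harder to feed into a learning algorithm.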
To understand where DL fits in the bigger picture of CDSS, let us look at a general
block diagram of a CDSS shown in Fig. 1.
Fig. 1 Block diagram of a clinical decision support system

The system has three main blocks. The patient’s primary data [2] comprises observed
symptoms, diagnostic reports, medical records of the patient, etc. Secondary data
comprises data external to the healthcare system, including the patient’s food and
drinking habits, the sensitivity of the body to certain allergens, the patient’s
rights, etc., which are to be considered for more informed decisions regarding
healthcare. The knowledge base refers to the historical medical records of other
patients stored in a database, which can serve as a reference when taking decisions
regarding the current patient. In [3, 4], the authors propose a fuzzy-rule based
system, referred to as a virtual clinic, that automatically assigns doctors to
patients and then assists the doctors in giving prescriptions by using the
historical knowledge base. However, as this database expands, it may contain
thousands of entries, making it almost impossible to search thoroughly for
informative records. This is where DL comes in handy, with the tremendous
representation power of deep neural networks. Machine learning and/or deep learning
models have the potential to compress huge databases into abstract miniature
representations and can almost replace the knowledge base, making it optional.
These models can be used directly as a decision making tool in the CDSS. As an
example, in [5], the authors use deep feed-forward neural networks to predict
inpatient clinical order patterns. The features considered, taken from electronic
health records, were comorbidity, patient sex and race, International
Classification of Diseases (ICD) diagnosis codes and so on. They concluded that the
deep neural network based model outperformed standard-of-care, human-authored order
sets in predicting actual clinical practices. In [6], the authors apply
convolutional neural networks to the electronic health records of patients, extract
high-level semantic information about the diagnosis and generate a report. This
result is used to assist medical practitioners in concluding on the health status.
There are many such applications of ML and DL in different domains of clinical
healthcare, as well as potential applications of DL that can be explored in the
future, all of which are covered in the later parts of this chapter.
The rest of the chapter is organized as follows: in Sect. 2, we review the existing
works on the applications of DL with image processing (computer vision) for CDSS,
Sect. 3 deals with the applications of DL in Natural Language Processing (NLP) for
CDSS while highlighting certain existing challenges, Sect. 4 deals with DL and
wearable device technology based CDSS, Sect. 5 looks at the issues involved in
using DL for CDSS, Sect. 6 discusses future research perspectives on the use of DL
for CDSS and finally, Sect. 7 summarizes and concludes the chapter.

2 Deep Learning and Image Analysis

Image analysis is a well-explored area in smart healthcare. Ever since digital
imaging came into existence, automated analysis of images has been carried out
using naive rule-based architectures. In the early 1990s, the field shifted to the
use of simple machine learning algorithms to extract useful patterns and
information from images. However, this involved a lot of hand engineering: deciding
which features to extract, how to extract them, which algorithm to use for decision
making, and so on. The recent advances in DL come as a big relief, since DL
architectures are powerful enough to extract the features and approximate a
prediction function from the given data seamlessly. This special capability has
made DL algorithms a preferred tool for image analysis and computer vision
applications.
In the existing research works, the blocks involved in image analysis are similar.
They are summarized in Fig. 2. Image acquisition includes the various methods by
which images of an entity are captured. These images are passed through
preprocessing stages, where they are filtered or subjected to manual bounding box
carving, and the modified image is passed into a convolutional neural network (CNN)
block for training. The output of the CNN block is passed to an optional
interpreter, which can be a fully connected network, an autoencoder, and so on,
that converts it into a form that a medical practitioner can understand. We shall
now look at the various applications of DL in the analysis of medical images
proposed by researchers in recent years.
Convolutional neural networks are seen often in works that use DL for image
analysis. There are several reasons for such extensive use of CNNs. A CNN learns
the relevant features in a way similar to how the human brain extracts features
from an image. Another important characteristic of CNNs is weight sharing [7],
where the kernels are shared across an image; this gives the advantages of learning
local patterns efficiently and increasing model efficiency by reducing the number
of parameters involved. Transfer learning [7], which is widely used in image
analysis, is also easier with CNNs than with conventional dense neural networks.
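
The effect of weight sharing on the number of parameters can be checked with a small
sketch. It implements a single-channel convolution without padding in plain NumPy
and compares its parameter count with that of a dense layer producing an output of
the same size; the 6×6 image and the 3×3 edge filter are toy assumptions.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide one shared kernel over the image (cross-correlation, no padding)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # The same kernel weights are reused at every position (i, j).
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(36, dtype=float).reshape(6, 6)   # toy 6x6 "image"
kernel = np.array([[1.0, 0.0, -1.0]] * 3)          # 3x3 vertical-edge filter

feature_map = conv2d_valid(image, kernel)
conv_params = kernel.size                           # 9 weights, shared everywhere
dense_params = image.size * feature_map.size        # one weight per input-output pair
print(feature_map.shape, conv_params, dense_params) # (4, 4) 9 576
```

Even on this toy input the convolution needs 9 weights where a dense layer mapping
the same 36 pixels to the same 16 outputs would need 576, and the gap widens quickly
with image size.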
Fig. 2 General block diagram for image analysis using CNNs

Let us look at a few works that use CNNs for image analysis tasks. In [8], the
authors review various medical imaging applications of DL. They note that image
analysis has been carried out mainly on pathology, lung, brain, cardiac, abdomen,
breast, bone, retina, etc. Various imaging modalities are used, such as MRI, CT,
X-ray, PET, ultrasound and visible range, of which MRI and visible light
microscopic imaging are the most common. In addition, the authors state that image
analysis techniques like segmentation, classification (for medical examination and
inferences, and object detection), and registration are widely studied. Among
these, segmentation of a required region of interest (RoI) and detection of an
object in a given image are the most studied because of their practical
implications.
In [9], the authors proposed a methodology for segmentation of regions of interest,
applied to identifying heart chambers. The methodology has three parts: the first
uses convolutional neural networks (CNNs) [10] to locate the area containing the
left ventricle (LV) in the image frame, the second consists of stacked autoencoders
that infer the shape of the LV from the output of the first part, and the third
comprises a Dense-NN that segments and delivers a binary mask of the LV. The
algorithm was trained and validated on publicly available LV datasets (MRI scans),
obtaining an accuracy of 96.69%.
In [11], the authors propose a scribble-based CNN for the image segmentation task.
As stated by the authors, a completely automated DL algorithm performs poorly on
unseen/test images, and hence a bounding box is needed to narrow the search space
for the algorithm. This bounding box based training method has provided better
segmentation accuracy. The work in [12] proposes a method of multimodal image
segmentation using MRI, PET and CT imaging. The images are passed through three
separate CNNs and the outputs are fused together to get a more precise segmented
image. In [13], the authors propose a novel architecture in which deep CNNs
(DCNNs) work collaboratively towards the segmentation of brain tumors and skin
lesions. The DCNNs are paired, and whenever a DCNN misclassifies a data input, a
synergic error is produced that updates the whole network together with the usual
back-propagated error. Similar works using CNNs are summarized below in Table 1.
Though CNNs have been shown to be very effective in object detection and segmentation,
they require datasets with a large number of samples and correspondingly high
memory and processing power. They also fail to detect the variations
in pixel information at the boundaries. Therefore, the authors in [18] propose a
recurrent neural network (RNN) based architecture that learns the level-set based
deformable models (LDMs, also known as geometric or implicit active contour
models) evolving under constant and mean curvature velocities. The specific tasks
considered in this work were the segmentation of the optic disc and cup in color
fundus images, cell nuclei in histopathology images and the left atrium in cardiac
MRI volumes. The block diagram remains the same as in Fig. 2, except that the CNN
block is replaced by RNNs. Similar works that aim at medical segmentation
using CNNs and RNNs can be seen in [19–21]. Table 2 shows the image datasets
used in the majority of research works related to medical imaging.
84 E. Sandeep Kumar and P. Satya Jayadev

Table 1 DL and image analysis works

Citations | Imaging modality | Remarks
Koitka et al. [14] | X-ray | Uses Faster R-CNN with Inception ResNet V2 as a feature extractor. DL techniques are used to localize the ossification areas in pediatric hand radiographs for bone age assessment. Dataset was taken from the RSNA pediatric bone age challenge
Deniz et al. [15] | MRI (magnetic resonance imaging) | U-Nets (a type of CNN) are applied to MR image slices to segment the proximal-femur RoI for fracture risk assessment
Abd-Ellah et al. [16] | MRI (magnetic resonance imaging) | The work has two stages: the first stage uses CNNs (AlexNet, VGG-16 and VGG-19) with an error-correcting output codes based support vector machine (ECOC-SVM) for tumor detection; the second stage uses R-CNN for tumor localization
Kamnitsas et al. [17] | MRI (magnetic resonance imaging) | The work uses 3D CNNs to segment lesions from brain images taken from patients with brain tumors or ischemic stroke

Summary In this section, various DL algorithms and their use in image analysis tasks
were reviewed. The majority of existing works in image analysis focus on segmentation
and detection of RoIs in images. DL architectures such as CNNs and their variants are
widely used. RNNs have also been applied to a few imaging tasks, and combinations of
deep learning with traditional machine learning techniques such as support vector
machines (SVMs) are also encountered in the literature. Even though image segmentation
using machine learning techniques has been studied for many decades, extracting
meaningful information from images (especially medical images) was tedious because of
the large amount of manual feature engineering involved. DL algorithms reduced this
effort, giving clinical support systems that rely on medical imaging inferences a
tool for taking decisions in a timely manner. It is also observed that supervised deep
learning techniques were employed more often than unsupervised learning techniques
in the existing literature related to image analysis.
Table 2 Image datasets

Dataset | Remarks
Brainweb [22] | Contains simulated brain MRI images of normal and multiple sclerosis cases
MICCAI [23] | Contains brain tumor data
NIHCC [24] | Contains chest X-ray images of 8 kinds of diseases: Atelectasis, Cardiomegaly, Effusion, Infiltration, Mass, Nodule, Pneumonia, Pneumothorax
TCIA [25] | Lung Image Database Consortium (LIDC), Reference Image Database to Evaluate Response (RIDER), and other image datasets related to breast cancer, lung phantom, non-small cell lung cancer, brain cancer, Glioblastoma Multiforme, Squamous Cell Carcinoma, prostate cancer, etc.
OASIS [26] | Contains brain images (MR and PET)
ADNI [27] | Contains data pertaining to Alzheimer's disease
FITBIR [28] | Traumatic brain injury dataset
STARE [29] | Retinal image dataset
Johns Hopkins Medical Institute repository [30] | Brain image database
MIDAS [31] | Brain lesions image dataset
UCI repository [32] | Parkinson's disease dataset
Cornell repository [33] | Chest CT images
USF repository [34] | Mammography and gait baseline
LIDC [35] | Lung image database
SCR database [36] | Chest radiographs
VIA group [37] | Lung image database
Mini-MIAS [38] | Mammogram images
DIARETDB1 [39] | Diabetic retinopathy images
OAI [40] | General image database

3 DL and Natural Language Processing

Natural Language Processing (NLP) is the ability of a computer program to understand,
interpret and manipulate human language. General applications of NLP include
enterprise search, where computer programs extract information from human speech,
search for the relevant records in a database and return an answer. Sentiment
analysis is another powerful application of NLP, where comments and reviews are
analyzed by data scientists to obtain feedback on performance for further improvement.
In healthcare specifically, NLP has a vital role to play beyond these applications.
Firstly, even though many patients can access their electronic health record (EHR),
which is a real-time patient data record, they cannot interpret it. One of the prime
reasons is that medical practitioners lack the time to help patients understand the
EHR data. Using NLP, patients can understand the data and keep their health in check
through suggested medical prescriptions, daily activity charts, and so on. In addition,
converting an image or a PDF into informative text, and then parsing and analyzing it
to extract useful information, is another application of NLP. A prominent example is
IBM Watson [41], where the machine is trained to run on the patient's data, extract
risk features and thereby predict possible diseases that could affect the patient.
Let us go through the existing works in the literature that use NLP for clinical
support systems. The work in [42] presents an approach that uses NLP to extract
potential medical conditions from free-text medical reports. The process is composed
of two main components: the background application and the problem list management
application. The background application extracts information about possible medical
conditions from the medical documents using rule-based NLP and stores it in a central
database. The problem list management application accesses the data stored in the
database and concludes on the medical problem of a patient. The work focused on 80
different types of medical conditions like Arrhythmia or Ischemic heart disease;
Mitral stenosis or Left bundle branch block; Wheeze or Pain. In [43], the authors
propose an NLP-based method to analyze and compare the health records of patients who
are more likely to commit suicide and those who have already attempted suicide. The
work is based on the fact that many patients who are at risk of committing suicide
meet their physicians for consultation. This study used eNQUIRENet, a database that
links EHR data across multiple non-integrated primary care clinical organizations
representing more than 3 million patients and 1700 clinicians. Three sources were used
to confirm that a patient has a suicidal tendency. The first is a search of the EHR
for ICD-9 codes (International Classification of Diseases codes) indicating suicide
attempt or ideation: E950–959 (attempt) and V62.84 (ideation). The second is parsing
the HPI field (History of Present Illness) to recognize entries that are relevant to
symptoms of suicide, such as self harm, hang and cut attempts. The third is the PHQ-9
(Patient Health Questionnaire) examination, where the depression severity is recorded.
The extracted fields indicated that suicide attempts were recorded more often than
ideation alone. A similar work is seen in [44], which infers the presence of acute
bacterial pneumonia from the chest X-ray reports of 292 patients using rule-based NLP.
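The three-source screening described for [43] can be sketched in a few lines. This is only an illustration under stated assumptions, not the implementation used in that study: the record layout (`icd9_codes`, `hpi`, `phq9_score`) and the keyword list are hypothetical, and the keyword match is exact, so word variants such as "hanging" are not handled.

```python
import re

# ICD-9 codes named in the text: E950-E959 (attempt) and V62.84 (ideation).
ATTEMPT_CODE = re.compile(r"^E95[0-9](\.\d+)?$")
IDEATION_CODE = "V62.84"

# Hypothetical keyword list for scanning the HPI free text.
HPI_KEYWORDS = ("self harm", "hang", "cut")


def screen_record(record):
    """Report which of the three sources flag a record.

    `record` is an assumed dict layout, not a real EHR schema.
    """
    codes = [c.strip().upper() for c in record.get("icd9_codes", [])]
    hpi = record.get("hpi", "").lower()
    # Whole-word matching avoids false hits such as "cut" inside "acute".
    hits = [k for k in HPI_KEYWORDS
            if re.search(r"\b" + re.escape(k) + r"\b", hpi)]
    return {
        "attempt_code": any(ATTEMPT_CODE.match(c) for c in codes),
        "ideation_code": IDEATION_CODE in codes,
        "hpi_keywords": hits,
        "phq9_score": record.get("phq9_score"),  # None when not recorded
    }
```

A rule-based pass of this kind only surfaces candidate records; how the three signals are combined into a final decision is left open here.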
However, none of the methods mentioned above uses DL concepts, even though
they are considered to be NLP systems for clinical support. In this context, we
propose a method that uses a CNN for text classification. The method has the
following stages: (i) extracting keywords from the data records, (ii) converting the
word sequences from the text/sentences and medical codes to vector form using
a look-up table/feature mapping process, and (iii) classifying the text into disease
occurrence by feeding the obtained sequence of vectors to a sequence of convolution
and pooling layers. The block diagram shown in Fig. 3 explains this method. The
output of the classification layer can be used for any prediction or identification
purpose in a CDSS.
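Stages (ii) and (iii) can be illustrated with a toy forward pass. The sketch below is a simplified stand-in, assuming a single convolution filter followed by max-over-time pooling and a one-unit classification layer; the look-up table entries, the token `CODE_X` and all weights are made-up values, where a real system would learn them from data.

```python
import math

# Hypothetical look-up table: token -> 2-dimensional vector (stage ii).
EMBEDDINGS = {
    "fever": [1.0, 0.0],
    "cough": [0.8, 0.2],
    "CODE_X": [0.9, 0.9],   # placeholder for a medical code token
    "<unk>": [0.0, 0.0],
}

# One illustrative convolution filter spanning 2 consecutive tokens.
FILTER = [[0.5, -0.1], [0.4, 0.3]]
FILTER_BIAS = 0.0
OUT_WEIGHT, OUT_BIAS = 2.0, -1.0  # toy classification layer


def embed(tokens):
    """Map each token to its vector, falling back to <unk>."""
    return [EMBEDDINGS.get(t, EMBEDDINGS["<unk>"]) for t in tokens]


def conv1d_relu(vectors, kernel, bias):
    """Slide the kernel over the token sequence and apply ReLU."""
    width = len(kernel)
    feats = []
    for i in range(len(vectors) - width + 1):
        s = bias
        for k in range(width):
            s += sum(w * x for w, x in zip(kernel[k], vectors[i + k]))
        feats.append(max(0.0, s))
    return feats


def classify(tokens):
    """Return a score in (0, 1) for disease occurrence (stage iii)."""
    feats = conv1d_relu(embed(tokens), FILTER, FILTER_BIAS)
    pooled = max(feats) if feats else 0.0  # max-over-time pooling
    return 1.0 / (1.0 + math.exp(-(OUT_WEIGHT * pooled + OUT_BIAS)))
```

Calling `classify(["fever", "cough", "CODE_X"])` runs the full chain; practical models use many filters of several widths and a softmax over diseases.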
Fig. 3 Classification from text data using CNNs

In a similar way, we can use RNNs for learning from texts. Figure 4 is a possible
RNN-based architecture that can be used for learning from text data. The
figure shows a series of RNN cells connected sequentially to form a network. The
words in the clinical text or the medical codes are fed as the input to this sequence
learner, and the output can be taken from all the RNN cells or just the last RNN
cell, depending on the requirement. For instance, the text or symptomatic information
extracted from the EHR can be fed to these networks to predict the most probable
disease affecting the patient.
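The choice between reading all RNN cells and reading only the last one is easy to see in code. The following is a minimal sketch of a vanilla RNN forward pass with a tanh activation; the function name and the weight layout (`w_xh`, `w_hh`, `b_h`) are assumptions made for this illustration.

```python
import math


def rnn_forward(inputs, w_xh, w_hh, b_h):
    """Run a vanilla RNN over `inputs`; return the hidden state of every cell.

    inputs: list of input vectors (one per word or medical code)
    w_xh:   hidden_size x input_size weights
    w_hh:   hidden_size x hidden_size recurrent weights
    b_h:    hidden_size biases
    """
    hidden = [0.0] * len(b_h)
    states = []
    for x in inputs:
        new_hidden = []
        for j in range(len(b_h)):
            z = b_h[j]
            z += sum(w_xh[j][i] * x[i] for i in range(len(x)))
            z += sum(w_hh[j][k] * hidden[k] for k in range(len(hidden)))
            new_hidden.append(math.tanh(z))
        hidden = new_hidden
        states.append(hidden)
    return states


# Output from all cells (e.g. per-word tagging) or only the last cell
# (e.g. one disease prediction for the whole text).
all_states = rnn_forward([[1.0], [0.0]], [[1.0]], [[0.5]], [0.0])
last_state = all_states[-1]
```

The last state summarizes the whole sequence, which is why it is the usual input to a classification layer when one label per document is needed.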

Fig. 4 Block diagram of sequential text learning using RNNs

A similar network can be built by replacing the vanilla RNNs in the architecture
with long short-term memory (LSTM) cells. LSTMs have the advantage of
carrying information forward over a longer part of the sequence using a memory
cell and multiple gates. This helps the neural network learn long-range dependencies
in the training data with fewer errors.
The following are a few datasets often used in NLP for clinical support
and healthcare applications.

Dataset | Remarks
MIMIC [45] | Developed by MIT; contains anonymised health records of approx. 40,000 critical care patients
i2b2 [46] | Health records of nearly 1500 patients
HealthData [47] | Health data from the US Federal Government
BCHC data platform [48] | Health data from 26 cities, for 34 health indicators and across 6 demographic indicators
HMD [49] | Human mortality database
MHealth dataset [50] | Database of body motion and physical activities
Medicare [51] | Data on services and procedures that physicians and other healthcare professionals provided to Medicare beneficiaries
LSDB [52] | Data related to life sciences
HCUP-US [53] | Datasets contain encounter-level information on inpatient stays, emergency department visits, and ambulatory surgery in US hospitals
SEER [54] | Data about cancer incidence segmented by demographic groups such as age, race, and gender, provided by the US government
BROAD [55] | Data categorized by project such as brain cancer, leukemia, melanoma, etc.

In general, the overall system block diagram of a DL application on EHR is as
shown in Fig. 5.

Fig. 5 NLP for healthcare decisions [56]

As shown in Fig. 5, deep learning techniques applied to the EHR should perform
three major tasks. The first is single concept extraction, which extracts information
such as possible diseases, treatments and procedures. The second is temporal event
extraction, which assigns a time to events, such as within a few hours, from this
month and so on. The third is relation extraction, such as which treatment affects
what, which test is for what and so on.
In the above discussions, a few DL techniques such as CNNs and RNNs were explained.
However, other DL techniques are also widely used in NLP applications for clinical
decision support, such as Boltzmann machines and their variants like deep belief
networks [57], and autoencoders and their variants like sparse, variational and
denoising autoencoders. In that context, the authors in [58] used deep belief
networks (DBNs), which use restricted Boltzmann machines (RBMs) as building blocks,
for call routing in a call-center customer hotline that gives technical assistance
for a Fortune 500 company. RBMs have the advantage of extracting useful features
from the data using a visible and hidden node architecture. RBMs are stacked, with
the features obtained from one layer fed to the next, to form a DBN, which is trained
using the Kullback–Leibler divergence. In addition, DBNs are used as feature
extractors for traditional machine learning algorithms like SVMs, maximum entropy
and boosting. The results of that work show that combining DBNs with SVMs provides
better accuracy than using those learning models individually for solving the
call-routing problem. The same method can be used to process speech in the medical
domain as well. A few other applications of RBMs are seen in [59–61].

3.1 Challenges for Using DL for NLP in Healthcare

• Data heterogeneity: EHR data is available in different forms, varying from
handwritten text to printed documents. DL algorithms must be able to parse and
understand this data. In particular, clinical texts contain abbreviations and
shorthand notations, and vary from one clinician to another.
• Policy and data privacy issues: Training DL algorithms requires large
datasets. Providing this data to DL researchers is always bound by the policies and
the privacy concerns of the patients.
• Deciding benchmarks: Since many researchers use their own private data, they
are hesitant to share the data with other researchers, and hence setting a common
benchmark for a task in clinical support is difficult.
• Inherent problems of DL: These problems come from the DL algorithms themselves,
such as the choice of the model for a task, data size, tuning of hyperparameters,
high-performance hardware requirements, overfitting and underfitting issues,
generalization issues, flexibility (bias and variance tradeoffs) and multitasking
(learning multiple tasks together, taking advantage of common knowledge) issues.

Summary Natural language processing (NLP) is one of the most sought-after areas
in the deep learning research community. The use of DL to understand and interpret
health records saves clinicians' time while providing timely medication to
patients. NLP applications involve a wide range of algorithms, from simple
rule-based data parsing techniques to convolutional and recurrent neural
networks. There are a few challenges and issues in using DL-based NLP for CDSS, and
addressing these would be an important milestone in the progress of automated
medicine.

4 DL and Wearable Device Technology

Wearable technology is revolutionizing consumer electronics. With the advances in
circuit technology, wearable devices are being widely used to capture the patterns
of patients for clinical support and the decision-making process. Apple's iWatch
[62], with the Mayo Clinic app that captures health conditions of patients such as
heart rate, blood pressure, body temperature and calories burnt, is one of the best
examples of wearable technology. Remote patient monitoring is one of the key focuses
of this technology: a patient can be monitored without re-admission, and the
patient's progress can be tracked remotely, with intervention when there is a sudden
decline in health condition. The devices collect data and transmit it to a centralized
cloud where the clinical support decisions are taken.
In general, the combination of DL with wearable technology is derived from
a comparison of the big data system with the human nervous system [63]. The
human central nervous system comprises the brain and spinal cord as the major
organs. The spinal cord carries the signals picked up from different parts of the
body by the sense organs. The same phenomenon is imitated in wearable-DL technology,
where the cloud supported by DL constitutes the brain of the system, and the sensing
and communication modules are analogous to the sensory organs and the spinal cord
respectively. The complete block diagram of wearable technology with DL is shown
in Fig. 6.
The internal architecture of a wearable sensor is shown in Fig. 7. It contains
an internal battery and a charger unit, or sometimes the module is driven
by an external processor to which it is interfaced. Bio-sensors are used to fetch the
physiological signals from the body. These signals are passed through a pre-amplifier
and a signal conditioning circuit to eliminate noise and minor signal artifacts, and
then through an ADC (Analog to Digital Converter) for digitization. From the ADC, the
signal reaches a controller that transmits the data using a suitable wireless
module. Sensing modules may send data directly to the cloud or to a nearby
aggregator that aggregates the data from many sensors and transmits it to the
cloud.
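The signal chain of the sensing module (pre-amplifier, signal conditioning, ADC) can be mimicked in software to make the data flow concrete. The sketch below is a rough analogy only: a moving average stands in for the conditioning circuit, the ADC is ideal, and the gain, reference voltage, resolution and window size are assumed values rather than figures from any particular device.

```python
def moving_average(samples, window=3):
    """Crude stand-in for the signal conditioning stage (noise smoothing)."""
    out = []
    for i in range(len(samples)):
        chunk = samples[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def adc(voltage, v_ref=3.3, bits=10):
    """Map a voltage in [0, v_ref] to an integer code, as an ideal ADC would."""
    levels = (1 << bits) - 1
    clipped = min(max(voltage, 0.0), v_ref)  # saturate outside the input range
    return round(clipped / v_ref * levels)


def sense(raw_samples, gain=100.0):
    """Pre-amplify, condition and digitise raw sensor voltages."""
    amplified = [gain * s for s in raw_samples]
    return [adc(v) for v in moving_average(amplified)]
```

The list of integer codes returned by `sense` is what the controller would hand to the wireless module for transmission.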

Fig. 6 Block diagram of wearable DL

Fig. 7 Block diagram of sensing module

Wearable technology has evolved through many improvements and innovations, and a
few of them are discussed here. As discussed before, in a traditional wearable
technology system, DL resides in the cloud, since the algorithms require a large
amount of computation power and a battery-driven sensing module cannot afford to
execute them. In this context, the authors in [64] provide an innovative solution
where the DL tasks are allotted not to the cloud but to a local hand-held device
such as a smart phone or a tablet, bringing in the notion of edge computing. This
removes the need for constant internet connectivity and avoids the privacy breach
that can arise from transferring data to an external site (cloud).
In [65], the authors propose a new idea of using a smart phone as the sensing device
with DL programs running on the phone itself. The accelerometer, gyroscope and
magnetometer sensors available on smart phones are used to study human
activity. The work uses SIFT (Scale Invariant Feature Transform) for
feature extraction from the signals picked up by the smart phone sensors, and the
obtained features are passed on to a convolutional neural network that classifies the
signal into a human activity.
In [66], the authors propose a complete architecture for CDSS based on wearable
technology and basic machine learning algorithms. The architecture contains four
tiers. Tier-1 performs pervasive monitoring of physiological signals such as ECG, EEG,
respiratory signals, oxygen and heart rate, body temperature, and ankle and foot motion.
The obtained signals are passed to tier-2, which provides preliminary decision support
to the physicians even though accurate laboratory measurements are not yet available
at this stage. In tier-3, a more detailed analysis of the patient, combined with
the laboratory measurements, is carried out. Finally, tier-4 provides post-diagnostic
suggestions, prescriptions and so on. All these tiers are internally connected to a
diagnosis engine that contains machine learning algorithms providing decisions to every
tier. All the laboratory test and diagnosis data is fed to that engine, which provides
adequate decisions at every point of time. The machine learning assistance block
contains a single learning algorithm or an ensemble of them. The authors have explored
the use of random forest, naive Bayes, K-nearest neighbor, SVM, best-first decision
tree and multilayer perceptron models for diagnosis inference. These models open up
ways of exploring deep learning algorithms in place of traditional
ML algorithms.
An interesting work is presented in [67], where the authors propose a method to
monitor the symptoms of mental health using wearable technology. Locomotion
data is picked up by GPS, accelerometers and gyroscopes; speech by microphones
in smart phones or watches; facial expressions and eye blink patterns by the phone
camera; electrodermal activity by a smart watch; and social interaction
patterns from voice calls, Twitter and other social network data. Though few details
are given on how these signals can be utilized for monitoring mental health,
this opens up a new direction, where a CDSS based on learning from mental health
signals can be designed using the same methodology as in [65].
In [68], the use of wearable technology to remotely monitor elderly citizens is
proposed and referred to as the Smart Healthcare Monitoring System (SW-SHMS). The
architecture of SW-SHMS has three main parts. The first is the patient's environment,
where sensors attached to the body read temperature, blood oxygen level and heart rate;
this sensed data is transmitted to the patient's smart phone or a gateway device,
via which the data reaches the cloud. The corresponding block diagram is shown in
Fig. 8.
The second part, the cloud, performs various analytics on the data using machine
learning and/or DL algorithms to extract useful inferences. These are later sent to
the third part, the monitoring platform, consisting of the doctors who can take
clinical decisions and precautionary measures.

Fig. 8 SW-SHMS system architecture

According to the survey of existing works in [63], the DL algorithms that are
often seen in combination with wearable technology are: deep unsupervised learning
(restricted Boltzmann machines, deep belief networks, deep Boltzmann machines,
autoencoders and variational autoencoders, generative adversarial networks and
sequence learning); deep supervised learning (feed-forward neural networks, deep
neural networks, spiking neural networks, sequence-to-sequence learning, RNNs, LSTMs,
GRUs, convolutional LSTMs); and deep reinforcement learning (deep Q networks and
inverse DRL).
Summary The majority of works that use wearable technology with DL have a
similar architecture, in which analytics and other DL tasks are performed on
the cloud. However, exceptions are being developed, where researchers
propose the use of edge computing to address the issues of privacy breach and the
computation burden of centralized cloud methods. Almost all kinds of DL algorithms
are being used to perform analytics. The results obtained from the analytics are used
to assist clinicians in taking further decisions with respect to a patient.

5 Issues in Using DL for CDSS

The following are a few problems that still hinder the use of DL for CDSS:
1. Regulations and policies: There are no fixed rules and regulations for using
DL in clinical decision support systems. To overcome this difficulty, the US FDA
made the first set of regulations [69] for assessing AI systems in healthcare. The
guidelines issued by the FDA clearly address the use of data and adaptive
designs in clinical trials. In this direction, Arterys' medical imaging platform
became the first FDA-approved DL platform for CDSS.
2. Data sharing: Training and validation of DL systems require a huge amount of
data, and the sharing of it among hospitals and DL experts. Currently
there are no incentives for people to share data, and they are also bound by IP
rights and privacy policies. However, data exchange is now slowly turning
towards a reward-based system; for example, insurance companies
collect data from physicians for data analytics, and crowd sourcing of health
data is slowly growing.
3. Data compatibility: The data obtained from the machines and
procedures adopted in healthcare is often not useful for DL/ML systems due to lack
of compatibility with the algorithms in use.
4. Privacy issues: As already mentioned, health data is personal information of an
individual, and family members, relatives and clinicians may often refuse to
provide the data, viewing it as a privacy breach. To solve this, DL experts came up
with the concept of distributed machine learning, where the training and testing of
the learning algorithm happen at the place where the data is generated, without
transferring it to a centralized cloud. However, the method might still take a
considerable amount of time to become acceptable to medical practitioners and
be regularly used by them.
5. Sociocultural issues: Most patients and clinicians do not trust the use of AI
in healthcare, and in many cases people are reluctant to stake their lives or
careers on AI. Also, people working in the medical domain fear job
insecurity because AI systems are showing higher levels of accuracy than human
experts. In addition, the concept of AI is not understood by the majority of common
people in our society, and there is fear due to unawareness and uncertainty.
6. Transparency: Many DL algorithms behave as black boxes that reveal few inner
details and cannot explain to clinicians why a certain prediction is produced
by the algorithm. This makes clinicians reluctant to trust AI-based
systems.
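The distributed learning idea in item 4 can be made concrete with a small sketch. It uses federated averaging, which is one common way of realizing such a scheme and is chosen here purely for illustration; the text above does not name a specific algorithm. Each site takes a training step on its own data, and only the model weights, never the patient records, are combined. The model (1-D linear regression), learning rate and function names are assumptions.

```python
def local_update(weights, data, lr=0.1):
    """One gradient-descent step of y = w*x + b on a site's local data."""
    w, b = weights
    n = len(data)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
    return (w - lr * grad_w, b - lr * grad_b)


def federated_round(global_weights, sites):
    """Average locally updated weights, weighted by each site's sample count.

    `sites` is a list of local datasets; raw data never leaves this loop body,
    which stands in for computation done at each hospital.
    """
    total = sum(len(d) for d in sites)
    new_w = new_b = 0.0
    for data in sites:
        w, b = local_update(global_weights, data)
        new_w += w * len(data) / total
        new_b += b * len(data) / total
    return (new_w, new_b)
```

Repeating `federated_round` lets the shared model improve while each hospital keeps its records on site; real deployments add secure aggregation and communication handling that this sketch omits.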

6 Future Research Directions

1. Explanatory DL: As already mentioned, many DL algorithms lack explanation
for the predictions made and work like black boxes. This prevents
sufficient trust from developing in DL-based CDSS, where clinicians have to take
life-and-death decisions. This is one prominent issue that needs to be addressed in
future research.
2. Rationality Versus Irrationality: Humans can be extremely irrational with
regard to medical decisions, but ML and DL algorithms are trained to be rational
learners. This issue has to be addressed in future research, and
researchers are coming up with game-theoretic solutions for this purpose.
However, it is still an open issue.
3. Data security and privacy: When pools of databases or crowd-sourced
data pools are used together for training and testing ML/DL algorithms online,
there are always chances of algorithms getting compromised or data getting
leaked. Hence, security is always a concern for these expert systems. In addition,
the privacy of the patients should also be safeguarded. In this context, novel
security and privacy algorithms are always needed.
4. Artificial Intelligence (AI) to Extended Intelligence (EI): The notion of human
intelligence versus artificial intelligence should fade away in upcoming
research. This would also increase trust among patients and clinicians in using AI in
healthcare. This is possible only if AI transforms into EI, where machines
become part of the learning and supporting ecosystem, i.e. machines providing
support to the activities performed by humans in their daily lives.
5. Skewed or Imbalanced datasets: It is extremely important that the datasets
used for training and validating AI algorithms are not
skewed towards a single class. If the data is class
imbalanced, then either the data must be pre-processed or the algorithms modified
so that learning happens in an unbiased manner. This is hardly addressed in many
of the existing works and needs to be looked into in the future.
6. DL for prognosis: The majority of works that focus on the use of deep learning
aim at inferences required for diagnosis. However, there are limited
works that aim at prognosis (knowing beforehand how a health situation
is likely to turn out). Using DL
for prognosis apart from diagnosis will be a good direction for future research.
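For the class imbalance problem in item 5, one simple way of modifying the algorithm is to weight each sample inversely to its class frequency. The sketch below shows this weighting and a weighted error measure; it is an illustrative option, not a method prescribed by the works surveyed here, and the function names are hypothetical.

```python
from collections import Counter


def class_weights(labels):
    """Inverse-frequency weights: N / (K * n_c) for each class c."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def weighted_error(y_true, y_pred, weights):
    """Misclassification rate in which each sample counts by its class weight."""
    total = sum(weights[t] for t in y_true)
    wrong = sum(weights[t] for t, p in zip(y_true, y_pred) if t != p)
    return wrong / total
```

With nine "healthy" labels and one "disease" label, a classifier that always predicts "healthy" has a plain error of 10% but a weighted error of 50%, so the weighting exposes the failure on the minority class.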

7 Conclusions

In this chapter, we discussed three important applications of DL for CDSS:
image analysis, natural language processing and wearable technology.
When disruptive technologies like AI make their way into the world of healthcare and
medicine, the traditional methods of healthcare and means of taking clinical decisions
undergo drastic transformation. This is essential in today's era, where we
are heading towards building smart cities in which smart healthcare holds high
prominence. Sophisticated learning algorithms have been making the tasks
of clinicians easier, saving their time, and increasing the quality of life of patients
and common people. There are a few issues associated with using AI and DL in CDSS,
especially security and privacy concerns, but DL is nevertheless likely to become one
of the most powerful tools for decision making
in clinical diagnosis, leading to smart healthcare.
References

1. Esteva, A., Robicquet, A., Ramsundar, B., Kuleshov, V., DePristo, M., Chou, K., Cui, C.,
Corrado, G., Thrun, S., Dean, J.: A guide to deep learning in healthcare. Nat. Med. 25, 24–29
(2019)
2. Safran, C., Bloomrosen, M., Hammond, W.E., Labkoff, S., Markel-Fox, S., Tang, P.C., Detmer,
D.E.: Toward a national framework for the secondary use of health data: an American medical
informatics association white paper. J. Am. Med. Inf. Assoc. 14(1), 1–9 (2007). https://doi.org/
10.1197/jamia.m2273. ISSN 1067-5027. PMC 2329823. PMID 17077452
3. Atta-ur-Rahman, M.I.B.A: Virtual clinic: a CDSS assisted telemedicine framework. In:
Telemedicine Technologies, chap. 15, 1st edn. Elsevier (2019)
4. Atta-ur-Rahman, S.M.H., Jamil, S.: Virtual clinic: a telemedicine proposal for remote areas
of Pakistan. In: 3rd World Congress on Information and Communication Technologies
(WICT’13), pp. 46–50, 15–18 Dec, Vietnam (2013)
5. Wang, J.X., Sullivan, D.K., Wells, A.J., Wells, A.C., Chen, J.H.: Neural networks for clinical
order decision support. AMIA Jt. Summits Trans. Sci. Proc. 2019, 315–324 (2019)
6. Yang, Z., Huang, Y., Jiang, Y., Sun, Y., Zhang, Y.-J., Luo, P.: Clinical assistant diagnosis for
electronic medical record based on convolutional neural network. Sci. Rep. 8(6329) (2018)
7. Yamashita, R., Nishio, M., Do, R.K.G., Togashi, K.: Convolutional neural networks: an
overview and application in radiology. Insights Imaging 9, 611–629 (2018). https://doi.org/
10.1007/s13244-018-0639-9. Springer Publications
8. Litjens, G., Kooi, T., Bejnordi, B.E., Setio, A.A.A., Ciompi, F., Ghafoorian, M., van der Laak,
J.A.W.M., van Ginneken, B., Sánchez, C.I.: A survey on deep learning in medical image
analysis. Med. Image Anal. 42, 60–88 (2017)
9. Avendi, M., Kheradvar, A., Jafarkhani, H.: A combined deep-learning and deformable-model
approach to fully automatic segmentation of the left ventricle in cardiac MRI. Med. Image
Anal. 30, 108–119 (2016)
10. Szegedy, C., Toshev, A., Erhan, D.: Deep Neural Networks for Object Detection. NIPS (2013)
11. Wang, G., Li, W., Zuluaga, M.A., Pratt, R., Patel, P.A., Aertsen, M., Doel, T., David, A.L.,
Deprest, J., Ourselin, S., Vercauteren, T.: Interactive medical image segmentation using deep
learning with image-specific fine tuning. IEEE Trans. Med. Imaging 37(7), 1562–1573 (2018)
12. Guo, Z., Li, X., Huang, H., Guo, N., Li, Q.: Medical image segmentation based on multimodal
convolutional neural network: study on image fusion schemes. In: IEEE 15th International
Symposium on Biomedical Imaging (ISBI 2018), 4–7 Apr 2018, Washington, D.C., USA,
pp. 903–907
13. Zhang, J., Xie, Y., Wu, Q., Xia, Y.: Medical image classification using synergic deep learning.
Med. Image Anal. 54, 10–19 (2019)
14. Koitka, S., Demircioglu, A., Kim, M.S., Friedrich, C.M., Nensa, F.: Ossification area
localization in pediatric hand radiographs using deep neural networks for object detection.
PLoS One 13(11), e0207496 (2018). https://doi.org/10.1371/journal.pone.0207496
15. Deniz, C.M., Xiang, S., Hallyburton, R.S., Welbeck, A., Babb, J.S., Honig, S., Cho, K., Chang,
G.: Segmentation of the proximal femur from MR images using deep convolutional neural
networks. Sci. Rep. 8(16485) (2018)
16. Abd-Ellah, M.K., Awad, A.I., Khalaf, A.A.M., Hamed, H.F.A.: Two-phase multi-model auto-
matic brain tumour diagnosis system from magnetic resonance images using convolutional
neural networks. EURASIP J. Image Video Process. 2018, 97 (2018)
17. Kamnitsas, K., Ledig, C., Newcombe, V.F.J., Simpson, J.P., Kane, A.D., Menon, D.K., Rueck-
ert, S., Glocker, B.: Efficient multi-scale 3D CNN with fully connected CRF for accurate brain
lesion segmentation. Med. Image Anal. 36, 61–78 (2017)
18. Chakravarty, A., Sivaswamy, J.: RACE-net: a recurrent neural network for biomedical image
segmentation. IEEE J. Biomed. Health Inf.
19. Wang, S., He, K., Nie, D., Zhou, S., Gao, Y., Shen, D.: CT Male pelvic organ segmentation
using fully convolutional networks with boundary sensitive representation. Med. Image Anal.
(2019)
20. Ambellan, F., Tack, A., Ehlke, M., Zachow, S.: Automated segmentation of knee bone and
cartilage combining statistical shape knowledge and convolutional neural networks: data from
the osteoarthritis initiative. Med. Image Anal. 52, 109–118 (2019)
21. Gao, Y., Phillips, J.M., Zheng, Y., Min, R., Fletcher, P.T., Gerig, G.: Fully convolutional
structured LSTM networks for joint 4D medical image segmentation. In: IEEE 15th international
symposium on biomedical imaging (ISBI 2018), Washington, DC, 2018, pp. 1104–1108.
https://doi.org/10.1109/isbi.2018.8363764
22. http://brainweb.bic.mni.mcgill.ca/brainweb/
23. http://braintumorsegmentation.org/
24. https://nihcc.app.box.com/v/ChestXray-NIHCC
25. https://www.cancerimagingarchive.net/
26. http://www.oasis-brains.org/#data
27. http://adni.loni.usc.edu/
28. https://fitbir.nih.gov/
29. http://cecas.clemson.edu/~ahoover/stare/
30. http://lbam.med.jhmi.edu/
31. https://www.insight-journal.org/midas/
32. http://archive.ics.uci.edu/ml/index.php
33. http://www.via.cornell.edu/databases/
34. http://www.eng.usf.edu/cvprg/
35. https://wiki.cancerimagingarchive.net/display/Public/LIDC-IDRI
36. http://www.isi.uu.nl/Research/Databases/SCR/
37. http://www.via.cornell.edu/crpf.html
38. http://peipa.essex.ac.uk/info/mias.html
39. http://www2.it.lut.fi/project/imageret/diaretdb1/
40. https://oai.epi-ucsf.org/datarelease/
41. IBM Watson Clinical Decision support system.
https://www.ibm.com/watson-health/solutions/clinical-decision-support
42. Meystre, S., Haug, P.J.: Natural language processing to extract medical problems from
electronic clinical documents: performance evaluation. J. Biomed. Inf. 39(6), 589–599 (2006).
ISSN 1532-0464
43. Anderson, H.D., Pace, W.D., Brandt, E., Nielsen, R.D., Allen, R.R., Libby, A.M., West, D.R.,
Valuck, R.J.: Monitoring suicidal patients in primary care using electronic health records. J.
Am. Board Fam. Med. 28(1), 65–71 (2015). https://doi.org/10.3122/jabfm.2015.01.140181
44. Fiszman, M., Chapman, W.W., Aronsky, D., Evans, R.S., Haug, P.J.: Automatic detection of
acute bacterial pneumonia from chest X Ray reports. J. Am. Med. Inform. Assoc. 7(6), 593–604
(2000)
45. https://mimic.physionet.org/
46. https://www.i2b2.org/NLP/DataSets/Main.php
47. https://healthdata.gov/search/type/dataset
48. https://bchi.bigcitieshealth.org/indicators/1827/searches/34444
49. https://www.mortality.org/
50. https://archive.ics.uci.edu/ml/datasets/MHEALTH+Dataset
51. https://www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/Medicare-Provider-Charge-Data/Physician-and-Other-Supplier.html
52. https://dbarchive.biosciencedbc.jp/index-e.html
53. https://hcup-us.ahrq.gov/databases.jsp
54. https://seer.cancer.gov/faststats/index.html
55. https://gengo.ai/datasets/18-free-life-sciences-medical-datasets-for-machine-learning/?utm_campaign=c&utm_medium=quora&utm_source=rei
56. Shickel, B., Tighe, P.J., Bihorac, A., Rashidi, P.: Deep EHR: a survey of recent advances in
deep learning techniques for electronic health record (EHR) analysis. IEEE J. Biomed. Health
Inf. 22(5), 1589–1604 (2018). https://doi.org/10.1109/JBHI.2017.2767063
57. Sarikaya, R., Hinton, G.E., Deoras, A.: Application of deep belief networks for natural
language understanding. IEEE/ACM Trans. Audio, Speech, Lang. Process. 22(4), 778–784 (2014).
https://doi.org/10.1109/TASLP.2014.2303296
58. Pan, S.J., Yang, Q.: A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 22(10),
1345–1359 (2010). https://doi.org/10.1109/TKDE.2009.191
59. Jin, Y., Zhang, H., Du, D.: Improving deep belief networks via delta rule for sentiment
classification. In: IEEE 28th international conference on tools with artificial intelligence
(ICTAI), San Jose, CA, pp. 410–414 (2016). https://doi.org/10.1109/ictai.2016.0069
60. Jiang, X., Zhang, H., Duan, F., Quan, X.: Identify Huntington’s disease associated genes based
on restricted Boltzmann machine with RNA-seq data. BMC Bioinf. 18(1), 447 (2017).
https://doi.org/10.1186/s12859-017-1859-6
61. Tomczak, J.M.: Learning informative features from restricted Boltzmann machines. Neural
Process. Lett. 44(3), 735–750 (2016). https://doi.org/10.1007/s11063-015-9491-9. Springer
Publications
62. https://www.apple.com/in/watch/
63. Dargazany, A.R., Stegagno, P., Mankodiya, K.: Wearable DL: wearable internet-of-things and
deep learning for big data analytics—concept, literature, and future. Mob. Inf. Syst. (8125126),
20 (2018). https://doi.org/10.1155/2018/8125126
64. Xu, M., Qian, F., Zhu, M., Huang, F., Pushp, S., Liu, X.: DeepWear: adaptive local
offloading for on-wearable deep learning. IEEE Transactions on Mobile Computing.
https://doi.org/10.1109/tmc.2019.2893250
65. Ravi, D., Wong, C., Lo, B., Yang, G.: Deep learning for human activity recognition: a resource
efficient implementation on low-power devices. In: IEEE 13th international conference on
wearable and implantable body sensor networks (BSN), San Francisco, CA, pp. 71–76 (2016).
https://doi.org/10.1109/bsn.2016.7516235
66. Yin, H., Jha, N.K.: A health decision support system for disease diagnosis based on wearable
medical sensors and machine learning ensembles. IEEE Trans. Multi-Scale Comput. Syst. 3(4),
228–241 (2017). https://doi.org/10.1109/tmscs.2017.2710194
67. Abdullah, S., Choudhury, T.: Sensing technologies for monitoring serious mental illnesses.
IEEE Multimedia 25(1), 61–75 (2018). https://doi.org/10.1109/mmul.2018.011921236
68. Al-khafajiy, M., Baker, T., Chalmers, C., Asim, M., Kolivand, H., Fahim, M., Waraich, A.:
Remote health monitoring of elderly through wearable sensors. Multimed. Tools Appl. 78(17),
24681–24706 (2019). https://doi.org/10.1007/s11042-018-7134-7. Springer Publications
69. Jiang, F., Jiang, Y., Zhi, H., et al.: Artificial intelligence in healthcare: past, present and future.
Stroke Vasc. Neurol. 2 (2017). https://doi.org/10.1136/svn-2017-000101

E. Sandeep Kumar completed his Bachelor of Engineering (B.E.) in Telecommunication Engg.
from Jawaharlal Nehru National College of Engineering (JNNCE), Shimoga, Karnataka, India
with six merit awards and distinction. He completed his Master of Technology (M. Tech) in
Digital Communication Engineering from M.S. Ramaiah Institute of Technology (MSRIT), Bangalore,
India with first rank and a gold medal. Currently, he is a collaborative Ph.D. scholar with MSRIT,
IIT-Madras and FIU, Miami. His area of interest is data and network science, and he has published
many papers in international and national journals and conferences.

Pappu Satya Jayadev earned his Bachelor's degree in Electrical and Electronics Engineering, with
distinction, from Gayatri Vidya Parished College of Engineering, Visakhapatnam. Currently, he is a
graduate scholar (M.S. + Ph.D.) at IIT Madras, working with Dr. Ramkrishna Pasumarthy and Dr.
Nirav Bhatt. He is affiliated with the Robert Bosch Center for Data Science and AI, and the Systems
and Control groups at IIT Madras. His research interests include the analysis, optimization and
control of systems, applying the tools of machine learning and deep learning. His work has been
published in multiple national and international conferences.
Review of Machine Learning and Deep
Learning Based Recommender Systems
for Health Informatics

Jayita Saha, Chandreyee Chowdhury and Suparna Biswas

Abstract Recommender systems have become essential in personalized healthcare
as they provide meaningful information to patients depending on their specific
requirements and the availability of health records. With the improvement of machine
learning techniques, recommender systems bring several opportunities to
medical science. Such systems can perform more efficiently and solve complex problems
using deep learning, even when the data set is diverse and unstructured. Here we
present a comprehensive overview of the challenges associated with existing
recommender systems. Machine learning and deep learning techniques that are generally
applied in health recommender systems are discussed in detail, along with their
application to health informatics.

Keywords Health informatics · Recommender system · Machine learning · Deep
learning · Smart healthcare · Semi supervised learning

1 Introduction to Biomedical and Health Informatics

The world is facing major demographic challenges, such as rising life expectancy
leading to an aging population and a growing prevalence of chronic diseases. Treatment
of such diseases requires daily monitoring, often through hospitalization. These
challenges are compounded by rising healthcare costs. Fortunately, technology has come
a long way in assisting citizens, especially in monitoring health
parameters under free-living conditions. Thus, the period of hospitalization may
be reduced while improving citizens' quality of life. User behavior can be
objectively monitored through non-invasive sensing technologies to shed light on
how physical activity and daily lifestyle affect the health of the
individual.

J. Saha · C. Chowdhury (B)
Computer Science and Engineering, Jadavpur University, Kolkata, India
e-mail: [email protected]
J. Saha
e-mail: [email protected]
S. Biswas
Computer Science and Engineering, Maulana Abul Kalam Azad University of Technology,
Kolkata, India
e-mail: [email protected]
© Springer Nature Switzerland AG 2020
S. Dash et al. (eds.), Deep Learning Techniques for Biomedical and Health Informatics,
Studies in Big Data 68, https://doi.org/10.1007/978-3-030-33966-1_6

Health informatics links technology, communications and healthcare
to improve safety, the quality of a healthy lifestyle and the operation of medical
information systems. Here, informatics refers to the science of how to apply knowledge
extracted from collected data to improve health and the quality of health care services.
Hence, in health informatics, computer and information science principles are applied for
the betterment of patient care and public health.
Such technologies have emerged to assist both health professionals
and citizens, especially in monitoring health conditions through detailed analysis
of various health records, as described in Fig. 1. These assistance applications play
a crucial role in spreading health awareness. For instance, smartphones and wearable
sensing devices are used to collect data on daily human activities and information
related to everyday life [1].

Fig. 1 Components of biomedical and health informatics


Current studies have shown that keeping track of lifestyle related information
such as daily steps, body weight and calories spent is very useful for developing user
awareness that may ultimately lead to a healthy lifestyle, a crucial component in
treating many chronic diseases. In fact, these measurements over time may reveal
interesting insights into the user's efforts and the final outcome. In this context,
recommender systems could be readily utilized for health informatics, which may lead
to the improvement of chronic health conditions. Thus, this chapter presents an overview
of recommender systems. The state-of-the-art applications in which they are used and
the machine learning, especially deep learning, techniques that are applied are
also detailed.

2 Introduction to Recommender System

Recommender systems aim to help users by providing suitable options to execute a
task easily and efficiently. Such systems learn user behavior by filtering through a
large amount of data [2].
Two scenarios for a health recommender system are as follows.
• In the first scenario, the health professionals are considered to be the end-users of
health recommender systems.
• In the second scenario, patients are considered to be the end-users.
In the first scenario, health professionals would be able to retrieve additional
information, such as related research articles or clinical guidelines, through the use of
recommender systems. In the second scenario, the end-users may benefit from evidence
based, high quality health related content.
Recommender Systems (RS) [3] in healthcare are useful in decision making and
in assisting personalized healthcare, as they generate meaningful recommendations
depending on the domain and the particular characteristics of the available health records.
However, a recommender system should also comprehend the patient and the patient's
requirements and attitudes in the context of health and disease management. Thus,
health recommender systems should be more sensitive in these kinds of applications
[2]. In the following subsections, the architecture of such systems and their applications
reported in the literature are discussed.

2.1 Application in Healthcare

Recommendation systems are becoming increasingly popular, especially
for applications in health informatics. Ranging from bioinformatics to predicting
the spread of infectious diseases, such systems are often applied to detect hidden
patterns and hence recommend a suitable solution. These systems are mostly based
on machine learning, especially deep learning techniques, which are detailed in the
subsequent sections. Table 1 summarizes representative health informatics
applications that employ different deep learning techniques.

Table 1 Overview of the applications of recommendations for health informatics

Area of applications | Application | Input parameters | Learning techniques
Bioinformatics [4–6] | Drug design; RNA binding protein; compound protein interaction | Molecule compounds; gene RNA/DNA sequences | Deep neural network; deep belief network
Medical imaging [7–9] | Tissue classification; organ segmentation; tumor detection; hemorrhage detection | MRI/CT images; microscopy; hyperspectral images; endoscopy images | Convolutional neural network; convolutional deep belief network; deep neural network
Pervasive sensing [10–13] | Monitoring of biological parameters; anomaly detection | ECG, EEG; devices implanted | Convolutional neural network
Pervasive sensing [10–13] | Human activity recognition | Wearable sensing devices; smartphones; video | Convolutional neural network; deep belief network
Pervasive sensing [10–13] | Obstacle detection; sign language recognition; hand gesture recognition | RGB-D camera; real-sense camera; depth camera | Convolutional neural network; deep belief network
Public health [14, 15] | Lifestyle diseases; infectious disease epidemics | Text messages; social media data; geo-tagged images | Convolutional neural network; deep belief network; deep neural network

The main application domains are as follows.
• Bioinformatics: It focuses on investigating and understanding biological processes at
the molecular level. Pharmacogenomics is a field of bioinformatics that attempts to analyze
the variable drug response of different subjects caused by their genetic differences.
Hence this field explores the design of more effective drugs for personalized treatment,
thereby reducing side effects. Understanding the influence of environmental
factors on the formation of proteins and their interactions is another interesting
application where deep learning techniques are found to be very useful.
• Medical Imaging: Automated medical image analysis is a crucial requirement
of modern medicine. In recent years, deep learning techniques, especially
convolutional neural networks, have become increasingly popular in the medical
imaging research community. This is because deep learning techniques are found
to perform extremely well in computer vision applications, and their run-time
performance can be improved when parallelized on GPUs.
• Pervasive Sensing: Ambient, wearable and even implantable sensors are used to
monitor vital signs, specifically for the elderly under free-living conditions.
Regular monitoring of a person's energy expenditure throughout the day, along
with food intake, helps to curb obesity and thus improve personal health.
Wearable and ambient sensors that monitor daily activities provide
assistance to elderly patients and improve their quality of life. Human activity
recognition can also be utilized for the rehabilitation of heart and stroke patients and
for post-trauma recovery. Such activity recognition can be performed using wearable
and implantable assistive devices. Continuous monitoring of vital signs
is important for improving the treatment of patients in critical care, as the physical
condition of such patients needs to be carefully analyzed [16].
• Public Health: This has emerged as an important discipline, as it aims to prevent
disease proactively by analyzing the possible spreading patterns of a disease. It also
aims to investigate the influence of environmental factors on social behaviors. The
spread of a disease, or even social habits induced by environmental factors, can be
localized to a small area or a state, or can extend across a country. Public health
applications mostly focus on the different patterns of spread of epidemics and lifestyle
diseases and analyze the inherent factors influencing such behavior. As the data size
increases, scalability becomes a crucial issue that is hard to address with conventional
predictive models. Moreover, performance tuning of these systems is difficult and
can only be done by domain experts. Deep learning algorithm designs mostly
explore online machine learning: cost function optimization takes place
sequentially as new training datasets are fed to the system. Deep learning
techniques therefore play important roles in health recommender
systems for research studies in public health [17].
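
The sequential cost function optimization mentioned for public health applications can be illustrated with a minimal sketch. It assumes a single linear unit with a squared-error cost instead of a deep network, and the class name `OnlineLinearModel`, the learning rate and the data stream are hypothetical choices made only for illustration.

```python
class OnlineLinearModel:
    """Single linear unit trained one example at a time (illustrative only)."""

    def __init__(self, learning_rate=0.05):
        self.weight = 0.0
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        return self.weight * x + self.bias

    def update(self, x, y):
        """One gradient step on the squared error of a single new example."""
        error = self.predict(x) - y
        self.weight -= self.learning_rate * error * x
        self.bias -= self.learning_rate * error
        return 0.5 * error ** 2  # cost of this example before the update


def train_on_stream(model, stream):
    """Consume (x, y) pairs as they arrive; no example is stored."""
    for x, y in stream:
        model.update(x, y)
    return model


# Hypothetical stream generated from y = 2x + 1, revisited many times.
stream = [(x, 2 * x + 1) for _ in range(500) for x in (0.0, 1.0, 2.0, 3.0)]
model = train_on_stream(OnlineLinearModel(), stream)
print(round(model.weight, 2), round(model.bias, 2))
```

Because each update touches only the current example, memory use does not grow with the size of the data, which is the scalability property the text attributes to online learning.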

2.2 System Architecture

A health recommender system has several phases following the basic architecture of
a health informatics system as described in Fig. 2. Publicly available health datasets
and quality metrics are two key concerns for the success of recommender systems
in health informatics.

3 Overview of Health Recommender System

Machine learning algorithms are very useful for recommendation systems in the
application domains stated in the previous section. They can provide better
recommendations than traditional approaches, reduce computational complexity
and work with multi-source data.

Fig. 2 System architecture of health recommender system

Existing Health Recommender Systems (HRS) can
be classified into two categories based on their application.
(a) Disease Diagnosis HRS
People with multiple health conditions may have specific challenges and
co-morbidities. The onset of a challenge could reveal an underlying medical condition.
In this way, medical conditions may be diagnosed early so that early care can be provided
through recommendations, which would otherwise not be possible. Thus, medical conditions
leading to medical emergencies may also be prevented.
Healthcare recommender systems for the diagnosis and monitoring of chronic
diseases play an important role in the continuous monitoring and support of people in
need by extending proper advice and predicting the risks associated with
diagnosed diseases. Such systems may act as management and control tools to assist
physicians and patients. However, providing accurate recommendations from
medical data in real time is a challenging task because medical data are complex:
they are often unbalanced, large, multi-dimensional, noisy and/or incomplete.
Depression and mental disorders are increasingly becoming a major problem in
present society. Depression is usually accompanied by negative effects and an
assortment of physical, emotional, and behavioral symptoms. Hence, an intelligent
smartphone-based health recommender system is proposed in [18] to monitor patients
with a mental disorder (mainly related to anxiety) and provide treatment as
necessary.
Recommender systems are designed exploiting IoT enabled technologies for the
m-Health domain [3] to acquire patient data, based on which proper advice is rendered.
Such systems facilitate the task of caregivers by suggesting suitable advice that may
lead patients towards a better quality of life. Because a sufficiently large dataset was
not available, an existing benchmark dataset was used to experiment with the proposed
system. Various physiological signals are sensed using heterogeneous sensors to analyze
the patient's condition and prescribe personalized solutions. The cloud based architecture
of recommender systems helps in uploading and downloading health data under a
proper access control policy.
In [19], a recommender system for patients suffering from chronic
diseases such as diabetes is designed to improve quality of life by assisting both patients
and caregivers with accurate prediction of disease related risks and trustworthy
health recommendations. An accurate prediction model has been built to diagnose risks
related to chronic diseases by applying multiple classification using decision tree
algorithms, and to prescribe more accurate medical advice by applying unified
collaborative filtering based on patients’ medical history, external features, etc.
The challenges of existing recommender systems are: (i) missing or erroneous data due to
human error or sensor devices, the large size of medical databases, etc.; and
(ii) two-dimensional data problems, where one dimension is based on historical
recommendations and the other is the relation between the patient’s external features
and the practitioner’s advice. The recommendation system presented in [19] is found to
achieve better recall and precision using the random forest algorithm than with other
algorithms such as J48, decision stump, REP tree, etc.
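
Precision and recall, the two metrics used for the comparison above, can be computed for a single list of recommendations as in the following minimal sketch. The function name `precision_recall` and the advice labels are hypothetical; the evaluation protocol of [19] is not reproduced here.

```python
def precision_recall(recommended, relevant):
    """Return (precision, recall) for one set of recommendations.

    precision = fraction of recommended items that are relevant
    recall    = fraction of relevant items that were recommended
    """
    recommended, relevant = set(recommended), set(relevant)
    hits = recommended & relevant
    precision = len(hits) / len(recommended) if recommended else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: four pieces of advice were recommended and
# three were judged appropriate by a practitioner.
p, r = precision_recall(
    ["low_sugar_diet", "daily_walk", "yoga", "eye_checkup"],
    ["daily_walk", "eye_checkup", "foot_care"],
)
print(p, r)
```

Here two of the four recommendations are appropriate (precision 0.5) and two of the three appropriate items are found (recall about 0.67).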
The development of intelligent and accurate recommender systems has attracted funding
because of its relevance to current socio-economic conditions and the support of
enabling technologies such as IoT, machine learning and big data.
In [20], the authors proposed and developed a recommender system for the
personalized care and support of people suffering from dementia, a condition that causes
memory loss and affects an alarmingly growing number of people worldwide. This work
is funded by an EU H2020 project and aims to build a software platform that treats
dementia patients and their caregivers as a dyad. The dependence of recommender
systems on user data creates what is termed the cold-start problem: dealing with a
new user is problematic because sufficient information about that user may not yet be
present in the database. There should be a balance between generalized
solutions based on a general model and over-accuracy/overestimation.

(b) Content based HRS

This type of recommender system is intended to let users semantically explore and
detect their disease related conditions. Such systems often follow a layered
architecture, as in [21]. This comprises (i) a user layer that keeps a record of the
interactions of user agents and their preferences and manages semantic search, data source
access and the ranking of preferences, and (ii) a data layer that stores acquired data with
access control. The performance of this system can be improved further by combining this
semantic based approach with a more structured, medical practitioner based method.
The huge popularity of health related videos on the Internet raises concerns about
video quality and content. To aid people referring to such videos, a content based
recommender system is designed in [22] to link health related videos to content
rich websites. The linking is done by applying NLP: metadata or keywords, such as the
video name, title and topic, are extracted from YouTube videos and
used to search for semantic web based content for reference. The correctness
and effectiveness of such linking are evaluated through several metrics
such as relevance, precision, etc.
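
The idea of linking a video to web content through extracted keywords can be sketched as follows. This is a generic bag-of-words ranking with cosine similarity, not the NLP pipeline of [22]; the stop-word list, the function names and the example pages are all hypothetical.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "of", "for", "and", "to", "in", "on", "with", "how", "is"}


def extract_keywords(text):
    """Lower-case the text and count the words that are not stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def cosine_similarity(a, b):
    """Cosine similarity between two keyword-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_pages(video_metadata, pages):
    """Rank page names by keyword similarity to the video metadata."""
    query = extract_keywords(" ".join(video_metadata.values()))
    scored = [(cosine_similarity(query, extract_keywords(text)), name)
              for name, text in pages.items()]
    return [name for score, name in sorted(scored, reverse=True) if score > 0]


video = {"title": "Managing type 2 diabetes with diet", "topic": "diabetes nutrition"}
pages = {
    "diabetes_diet_guide": "Diet and nutrition advice for people with type 2 diabetes",
    "knee_exercises": "Exercises for knee pain relief",
    "insulin_basics": "How insulin therapy works in diabetes",
}
print(rank_pages(video, pages))
```

Pages that share no keyword with the video metadata are dropped, and the remaining pages are ordered by how strongly their vocabulary overlaps with it.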
Systems are also designed to search for and select trustworthy health related web
based content available on the Internet for recommendation with an individualistic
approach [23]. In this context, recommender systems can be categorized as
collaborative, content based and knowledge based recommender
systems, etc. Profiles of users and items and social media information are generally fed
as input to the recommender systems.

4 Learning Techniques for Health Informatics

Selecting the proper learning technique for analyzing health data is important to
mitigate several challenges of the health recommender system. Such techniques are
applied to build patterns that describe, analyze and predict data and define the current
health status of the users. Several works in medical image processing use different
machine learning techniques for the diagnosis and early detection of diseases.
The existing learning techniques are supervised, semi-supervised and
unsupervised, as shown in Fig. 3.

Fig. 3 Learning techniques for machine learning and deep learning


4.1 Supervised Learning

Supervised learning is a learning technique used to identify objects and diagnose a
disease based on previous related data. It can be applied when sufficient labeled data
are available. Representative supervised classifiers applied in health recommender
systems are detailed as follows.

4.1.1 Instance Based Learning

Instance-based learning (IBL) methods [24] are supervised learning algorithms that
store instances from a database and use them to classify new objects. Each instance can
be described by n attribute-value pairs. In general, a group of related instances is
fetched from memory to classify a new query. The most popular instance-based learning
algorithm is the k-Nearest Neighbor (kNN) algorithm. It is a simple learning algorithm
that can nevertheless solve complex problems and work with little information.
Several distance metrics can be used to define the nearest neighbor of an instance.
For example, the Euclidean distance to all stored instances can be calculated to find the
nearest neighbors in the n-dimensional space, as shown in Fig. 4b.
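
A minimal sketch of kNN classification with Euclidean distance is given below. The feature values (mean and standard deviation of accelerometer magnitude) and the activity labels are hypothetical and serve only to show how stored instances vote on a new query.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two points in n-dimensional space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(query, instances, k=3):
    """Label a query by majority vote among its k nearest stored instances.

    `instances` is a list of (feature_vector, label) pairs.
    """
    neighbors = sorted(instances, key=lambda inst: euclidean(query, inst[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical stored instances: (mean magnitude, standard deviation) -> activity
stored = [
    ((0.10, 0.00), "sitting"),
    ((0.20, 0.10), "sitting"),
    ((0.15, 0.05), "sitting"),
    ((1.20, 0.80), "walking"),
    ((1.00, 0.90), "walking"),
    ((1.10, 0.70), "walking"),
]
print(knn_classify((1.05, 0.75), stored))
```

No model is fitted in advance; all computation happens when a query arrives, which is why the method is called instance based.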
Human Activity Recognition (HAR) systems face a challenge when the training
and test environments are totally different. One of the major issues is device independent
activity monitoring. Smartphone based inertial sensors, such as the accelerometer and
gyroscope, are generally used to collect raw sensory data for several daily activities. kNN
is applied in [25] for device independent activity monitoring and is found to achieve
considerable accuracy.
Researchers have recently introduced the term Energy Expenditure (EE). Although both
HAR and EE have been investigated, certain challenges remain, such as estimating energy
consumption both during human movement and in the absence of movement. In [26], the
authors attempted to solve this problem using an accelerometer and ECG. Data were
collected from thirteen voluntary participants for six daily activities.
Selected Heart-Rate Variability (HRV) parameters are used to analyze the
performance of the HAR system. The activity-specific model with HRV parameters
provides better performance. Their results indicate that human physiological
data have an important effect on HAR and energy expenditure estimation, both of which
are important for assisted living because they support the healthcare system efficiently.

Fig. 4 Example of a logistic regression and b k nearest neighbor (kNN) classification

4.1.2 Decision Tree (J48)

The decision tree (J48) algorithm can be used for classification and regression problems,
which it solves using a tree representation that expresses decisions
explicitly and visually. Each tree contains internal nodes and leaf nodes. An internal
node corresponds to an attribute, and class labels are stored in the leaf nodes. The
tree is easy to understand because it can be read as a set of if-then rules. Trees can
grow arbitrarily deep, so a minimum number of inputs per leaf node
or a maximum depth of the model should be specified. Pruning helps to improve
performance and reduce the complexity of the model by removing branches
of the tree that rely on features of low importance.
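
The two stopping rules mentioned above, a maximum depth and a minimum number of inputs per leaf, can be seen in the following minimal sketch. It is a simplified entropy based tree on numeric features, not J48 itself: gain ratio and post-pruning are omitted, and the data (fasting glucose, BMI) and risk labels are hypothetical. The parameter `min_leaf` is assumed to be at least 1.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def best_split(rows, labels, min_leaf):
    """Return (feature index, threshold) with the lowest weighted entropy, or None."""
    best = None
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue  # stopping rule: each leaf needs at least min_leaf inputs
            score = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, feature, threshold)
    return None if best is None else best[1:]


def build_tree(rows, labels, max_depth=3, min_leaf=1, depth=0):
    """Grow a tree; a leaf is a class label, an internal node is a dict."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(set(labels)) == 1:
        return majority  # stopping rule: depth limit reached or node is pure
    split = best_split(rows, labels, min_leaf)
    if split is None:
        return majority
    feature, threshold = split
    left = [(r, y) for r, y in zip(rows, labels) if r[feature] <= threshold]
    right = [(r, y) for r, y in zip(rows, labels) if r[feature] > threshold]
    return {
        "feature": feature,
        "threshold": threshold,
        "left": build_tree([r for r, _ in left], [y for _, y in left],
                           max_depth, min_leaf, depth + 1),
        "right": build_tree([r for r, _ in right], [y for _, y in right],
                            max_depth, min_leaf, depth + 1),
    }


def predict(tree, row):
    """Follow the if-then rules from the root down to a leaf label."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree


rows = [(85, 22), (90, 24), (95, 30), (130, 31), (140, 27), (150, 35)]
labels = ["low", "low", "low", "high", "high", "high"]
tree = build_tree(rows, labels, max_depth=2, min_leaf=2)
print(tree, predict(tree, (100, 28)))
```

On this toy data a single test on the first feature separates the classes, so the tree reads as one if-then rule: a value of at most 95 gives "low", otherwise "high".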
The authors in [19] proposed a health recommender system for diseases based on
decision trees and collaborative filtering. Disease related data are mostly huge and
collected from multiple sources. Most of the time the data are multi-dimensional, and
some values are missing or noise is present in the dataset, so it becomes difficult to
handle such data using traditional approaches. Filtering techniques are used to remove the
noise and reduce ambiguous labels. A decision tree is applied to build a
model for predicting and diagnosing diseases and their risks. An ensemble model,
Random Forest, is built from several decision trees. The unified collaborative filtering
method helps to achieve better recommendations on the basis of previous records and
other features.
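
Collaborative filtering on the basis of previous records can be sketched with a plain user based variant. This is not the unified collaborative filtering method of [19]; the similarity measure (inverse Euclidean distance over shared items), the function names and the advice ratings are assumptions chosen for illustration.

```python
import math


def similarity(a, b):
    """Similarity of two rating records based on the items they share."""
    shared = a.keys() & b.keys()
    if not shared:
        return 0.0
    distance = math.sqrt(sum((a[i] - b[i]) ** 2 for i in shared))
    return 1.0 / (1.0 + distance)


def predict_score(target, others, item):
    """Similarity-weighted average of other patients' ratings for `item`.

    Returns None when no other patient has rated the item.
    """
    numerator = denominator = 0.0
    for other in others:
        if item in other:
            weight = similarity(target, other)
            numerator += weight * other[item]
            denominator += weight
    return numerator / denominator if denominator else None


# Hypothetical records: how helpful each patient found a piece of advice (1-5).
new_patient = {"diet_plan": 5, "daily_walk": 4}
history = [
    {"diet_plan": 5, "daily_walk": 4, "yoga": 5},
    {"diet_plan": 1, "daily_walk": 2, "yoga": 1},
]
print(predict_score(new_patient, history, "yoga"))
```

The predicted score for "yoga" is pulled towards the rating of the first patient, whose earlier ratings match those of the new patient.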
Decision trees are used either alone or in combination with other supervised
classifiers for HRS. In [27], the authors considered smartphone based and wrist
worn motion sensors (accelerometer, gyroscope and linear acceleration) to identify
several complex activities such as smoking, eating and drinking coffee. Three different
classifiers, Naive Bayes, decision tree and k nearest neighbor (kNN), are used
with different window sizes to recognize simple as well as complex activities.
GENEActiv, a wrist-worn triaxial accelerometer, is used in [28] to classify
walking, running and stationary activities with good accuracy. The authors
in [29] deployed both support vector machines (SVM) and decision trees in their
framework.
Depression prediction and monitoring is a crucial challenge for health recommender
systems. Huge data such as user behavior, daily activities, and mood details are
needed for analyzing and predicting the disease. The heterogeneous data make the
system complex. Hence the authors in [18] proposed an intelligent system to provide
useful recommendations. A combination of decision tree and SVM is used to build
this system. Various external factors related to depression are considered to build
the prediction model.

4.1.3 Logistic Regression (LR)

Logistic regression is a classification technique that applies the sigmoid function to
a linear combination of the input features. It predicts the class from real-valued
inputs that are combined linearly using weights (coefficient values). The sigmoid
output lies between 0 and 1 and is thresholded so that, in general, the predicted
outputs are the binary values 0 or 1. The output of logistic regression classification
when applied on a diabetic dataset with default parameters is shown in Fig. 4a.
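The prediction step can be sketched as follows. The weights, bias and feature values
are hypothetical and are not fitted to any dataset; the 0.5 decision threshold is a
common default and is also an assumption here.

```python
import math

def sigmoid(z):
    """Map any real value to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(features, weights, bias):
    """Sigmoid of the linear combination of the input features."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

def predict(features, weights, bias, threshold=0.5):
    """Binary class label (0 or 1) obtained by thresholding the probability."""
    return 1 if predict_proba(features, weights, bias) >= threshold else 0

# Hypothetical coefficients for two real-valued input features.
weights = [0.8, 0.5]
bias = -1.0

probability = predict_proba([2.0, 1.0], weights, bias)  # z = -1.0 + 1.6 + 0.5 = 1.1
label = predict([2.0, 1.0], weights, bias)
print(round(probability, 3), label)
```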
In [30], the authors proposed device independent activity monitoring with a
minimal number of smartphone inertial sensors. The energy efficient ubiquitous
system is machine learning based and performs well with logistic regression using
inexpensive time domain features.

4.1.4 Multi-layer Perceptron (MLP)

A Multi-layer Perceptron is a feed forward artificial neural network composed of more
than one perceptron. It has at least three layers: (i) the input layer, which is fed
the input patterns, (ii) the output layer, which makes the prediction for the given
input, and (iii) an arbitrary number of hidden layers in between these two layers.
Each node of the network, apart from the input nodes, is a neuron that uses a
nonlinear activation function. MLP utilizes a supervised learning technique called
back-propagation. In the forward pass, the signal moves from the input layer to the
output layer through the hidden layers. The error observed at the output is then
propagated backward through the network by the back-propagation algorithm in order to
adjust the weights and biases. An MLP is easily distinguishable from a linear
perceptron because it has multiple layers and nonlinear activation functions.
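A minimal sketch of one forward pass and one back-propagation update is given below
for a network with a single hidden layer. The tanh hidden activation, sigmoid output,
squared error loss, learning rate and all weight values are assumptions chosen for
illustration only.

```python
import math

def forward(x, W1, b1, W2, b2):
    """Forward pass: input -> tanh hidden layer -> single sigmoid output."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    z = sum(w * hi for w, hi in zip(W2, h)) + b2
    y = 1.0 / (1.0 + math.exp(-z))
    return h, y

def train_step(x, target, W1, b1, W2, b2, lr=0.5):
    """One gradient descent update for the loss 0.5 * (y - target) ** 2."""
    h, y = forward(x, W1, b1, W2, b2)
    # Error signal at the output neuron (derivative of the loss w.r.t. z).
    delta_out = (y - target) * y * (1.0 - y)
    # Error signal propagated back to each hidden neuron (tanh' = 1 - h^2).
    delta_hid = [delta_out * w * (1.0 - hi * hi) for w, hi in zip(W2, h)]
    new_W2 = [w - lr * delta_out * hi for w, hi in zip(W2, h)]
    new_b2 = b2 - lr * delta_out
    new_W1 = [[w - lr * d * xi for w, xi in zip(row, x)] for row, d in zip(W1, delta_hid)]
    new_b1 = [b - lr * d for b, d in zip(b1, delta_hid)]
    return new_W1, new_b1, new_W2, new_b2

# Hypothetical input pattern, target and initial parameters (2 inputs, 2 hidden units).
x, target = [1.0, 0.0], 1.0
W1, b1 = [[0.1, -0.2], [0.4, 0.3]], [0.0, 0.0]
W2, b2 = [0.5, -0.5], 0.0

_, y_before = forward(x, W1, b1, W2, b2)
W1, b1, W2, b2 = train_step(x, target, W1, b1, W2, b2)
_, y_after = forward(x, W1, b1, W2, b2)
print(y_before, y_after)  # the output moves toward the target after the update
```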
MLP is heavily applied in HRS. For HAR, MLP could be utilized to monitor
several detailed daily activities [31]. MLP can also be used in combination with
other classifiers to further boost the accuracy. For instance, in [31], it is also applied
in combination with LogitBoost, and SVM to identify daily activities even when the
smartphone is held by the users in their hands.

4.1.5 Ensemble Model

Sometimes it is hard to detect all individual class labels with appreciable
accuracy using one base classifier. An ensemble of classifiers can be applied instead.
The ensemble model combines the outcomes of different base learners. Every base
learner attempts to classify the test set instances based on the training set instances.
The ensemble model decides the class label of a test instance by combining the
outcomes of all the base learners. This adds generality to the system. Bagging and
boosting are the two ensembling methods that are heavily used in the literature. In
bagging, a number of bags (subsets) are drawn from the training set, typically by
sampling with replacement, and a base classifier is trained on each of these subsets,
forming a set of classification models. In boosting, the same training set is applied
in different iterations, but each instance is assigned a different weight depending on
the ease of classifying the instance in the previous iteration.
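The bagging idea can be sketched as below. The base learner is a deliberately simple
one-dimensional threshold rule that assumes class 1 has the larger feature values; the
data, the number of bags and all function names are hypothetical and are not taken
from the cited works.

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Draw len(data) items with replacement to form one bag."""
    return [rng.choice(data) for _ in data]

def train_threshold(bag):
    """Base learner: threshold halfway between the two class means (1-D feature)."""
    zeros = [x for x, y in bag if y == 0]
    ones = [x for x, y in bag if y == 1]
    if not zeros or not ones:  # degenerate bag that contains a single class
        only_label = bag[0][1]
        return lambda x: only_label
    t = (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2.0
    return lambda x: 1 if x > t else 0

def bagging(data, n_bags, rng):
    """Train one base learner per bootstrap bag."""
    return [train_threshold(bootstrap_sample(data, rng)) for _ in range(n_bags)]

def majority_vote(models, x):
    """Combine the outcomes of all base learners into one class label."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]

# Hypothetical training set of (feature, label) pairs.
data = [(float(v), 0) for v in range(1, 6)] + [(float(v), 1) for v in range(11, 16)]
models = bagging(data, n_bags=11, rng=random.Random(0))
print(majority_vote(models, 2.0), majority_vote(models, 14.0))
```

An odd number of bags is used so that the two-class vote cannot end in a tie.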
An ensemble may also be a combination of different condition based classifiers.
For instance, in [32], a condition based ensemble classifier is formed to address the
effect of different smartphones (having various hardware configurations) and
usage behavior, such as how the smartphone is carried by the user (shirt pocket, right
pants pocket, or right hand), on detailed HAR. It follows the principles of bagging.
Health care recommendation systems for consumers need to make relevant
suggestions on the basis of predicted probability values for different health
conditions. An ensemble model is used in [33] to build this kind of system. A Bayesian
network and Random Forest are combined in the ensemble model, and it provides
better recommendations.

4.2 Semi-supervised Learning

Labeling becomes expensive for various healthcare recommendation problems,
such as gathering enough data for different emergency conditions. Hence,
semi-supervised learning algorithms are designed to deal with a combination of labeled
and unlabeled data. Features are extracted from unlabeled data and are mapped to
determine the dispersion of data in the feature space.

4.2.1 Multi-instance Learning

In multi-instance learning, each object (bag) contains a set of instances and is
associated with only a single label, as shown in Fig. 5a. Thus, the individual
instances need not be labeled; only the bag of instances is assigned a label.

Fig. 5 Semi-supervised learning technique a multi-Instance learning and b multi-label learning


Semi-supervised learning is essential for sparsely labeled data. The authors in [34]
proposed a HAR framework to monitor users' daily activity on a sparsely labeled
dataset. They applied Multi-Instance Learning (MIL) for handling different annotation
strategies. A few novel extensions of MIL are also found in the literature that reduce
the required level of traditional supervision. Classifiers such as MI-SVM and citation
kNN are also designed to deal with multiple instances having a single label. Several
types of bags are used to represent the continuous dataset in MIL. In [34], three types
of labeling (single, multi-labeled and majority voting) for the bag of instances are
considered to represent the entire test and training dataset. The iterative
multi-instance Support Vector Machine (SVM) is found to perform better for single
labeled bags, whereas the standard multi-instance SVM is found to perform better for
multi labeled bags.

4.2.2 Multi-label Learning

In multi-label learning, the training dataset contains instances that are each
associated with a set of labels. The learner predicts the label sets of unseen
instances on the basis of training instances with known label sets. In general, a
multi-label object contains one instance with K class labels associated with it, as
shown in Fig. 5b.
The authors in [35] proposed a HAR system based on Multi-label machine learn-
ing and Expectation-Maximization (EM) algorithm. The system can identify several
activities correctly when there is a time gap between the two actions. The pseudo
sequence data are used for the entire experiment. The multi-label data set is stochas-
tically labeled. EM algorithm is executed and the probability distribution of the data
labels is learned.

4.2.3 Graph Based Learning

Graph based semi-supervised learning is also used for HRS based HAR
systems. A small set of labeled data with few unlabeled data is found to be present
in the experimental dataset reported in [36]. Using experience sampling, the HAR
framework can record long duration activity data without detailed annotations by
propagating the provided labels to the neighboring data.

4.3 Unsupervised Learning

In unsupervised learning, the data sets need not have any labels: the data pattern is
unknown to us and we need to find the hidden patterns in the unlabeled data. It
is useful when the approximation of the data label is poor. Clustering is an
unsupervised learning mechanism for grouping similar data into clusters. Representative
clustering mechanisms that could be applied in HRS are discussed below.
4.3.1 k-Means Clustering

k-means clustering is an iterative clustering algorithm. The total number of clusters,
k, needs to be defined in the initial state. A centroid is the center of a cluster, and
the centroids are randomly chosen in the initial state. All the data points are grouped
into k clusters so that each data point belongs to exactly one of the clusters. The
algorithm starts by choosing initial centroids for the k clusters. Each data point is
assigned to its nearest centroid, and all the points assigned to the same centroid
form a cluster. The centroid of each cluster is then updated to the mean of its points,
and the assignments of the data points are recomputed. These two steps are repeated
until the clusters stabilize.
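The assignment and update steps can be sketched in plain Python as follows. The
function name, the optional `initial` argument and the toy points are assumptions made
for this illustration; when `initial` is omitted the centroids are drawn at random from
the data, and the final clusters can depend on that initial choice.

```python
import random

def kmeans(points, k, initial=None, rng=None, max_iter=100):
    """Plain k-means on tuples of floats; returns (centroids, assignment)."""
    if initial is None:
        rng = rng or random.Random()
        centroids = rng.sample(points, k)  # random initial centroids
    else:
        centroids = list(initial)
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_assignment == assignment:  # clusters have stabilized
            break
        assignment = new_assignment
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignment

# Hypothetical 2-D feature vectors forming two well separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]

# Both initial centroids are deliberately taken from the same group.
centroids, assignment = kmeans(points, k=2, initial=[points[0], points[1]])
print(centroids, assignment)
```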
The authors in [37] proposed a system to monitor daily activity based on skeletal
movement data. Data are collected from an inexpensive RGBD (RGB-Depth) sensor.
The data labeling is not properly maintained, hence unsupervised learning is used
here. Appreciable precision and recall values are reported using k-means clustering.

4.3.2 Density Based Spatial Clustering of Applications with Noise (DBSCAN)

DBSCAN is a clustering technique mainly used in unsupervised learning. There is no
need to set the number of clusters a priori, and it can build arbitrarily shaped
clusters. Two important parameters are used in this algorithm to define the clusters.
One is eps (epsilon), a positive number that denotes the radius of the neighborhood of
a point. The other is MinPts, the minimum number of points required within the
eps-neighborhood of a point. The algorithm identifies the data points that lie in a
dense region of the feature space. A data point with at least MinPts points in its
eps-neighborhood is known as a core point. A border point has fewer than MinPts points
in its eps-neighborhood but is directly reachable from a core point. The algorithm
starts by randomly choosing a data point p from the dataset that is not yet included in
any cluster. The neighborhood of that point is calculated. The point p is considered a
core point when there are at least MinPts points, including p, within its
eps-neighborhood. All the points directly reachable from p within the eps-neighborhood
are included to create the cluster. The cluster is then expanded as necessary to
include all the density reachable points. When there are no more points to include in
the present cluster, the algorithm randomly picks the next unclassified data point from
the dataset. The algorithm continues until all the points are classified or processed.
A few points have fewer than MinPts points in their neighborhood and do not belong to
any cluster. Those points are considered noise points and are discarded. DBSCAN can
separate high density clusters from low density clusters and manage outlier points in a
given dataset. Figure 6a shows the dataset before clustering and Fig. 6b represents the
clusters after applying DBSCAN. DBSCAN does not work well when the data are very high
dimensional or when the density of the clusters varies widely.
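The procedure can be sketched in plain Python as below. This is a simplified
illustration under stated assumptions: points are visited in index order rather than
at random, Euclidean distance is used, noise is marked with the label -1, and the toy
points and parameter values are hypothetical.

```python
def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points)
            if sum((a - b) ** 2 for a, b in zip(points[i], q)) <= eps ** 2]

def dbscan(points, eps, min_pts):
    """Return one cluster label per point; -1 marks noise."""
    NOISE, UNVISITED = -1, None
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:      # not a core point
            labels[i] = NOISE
            continue
        cluster += 1                      # start a new cluster from this core point
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:        # earlier noise becomes a border point
                labels[j] = cluster
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:   # j is a core point: keep expanding
                queue.extend(j_neighbors)
    return labels

# Hypothetical data: two dense groups and one isolated outlier.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0),
          (5.0, 5.0)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)
```

Note that the number of clusters is never passed in; it follows from eps and MinPts.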
Sometimes it is difficult to get the proper labeling of each activity performed by
a user, and instance wise labeling becomes costly. The authors in [39] proposed
a HAR system using unsupervised learning. Here, the number of different activities is
unknown. Data are collected from smartphone inertial sensors, and several features are
extracted. Combining a mixture of Gaussians method with DBSCAN clustering makes this
system more efficient. With proper tuning of MinPts and eps, it achieves good accuracy
for daily living activities when the number of activities is unknown to the system.

Fig. 6 DBSCAN clustering technique a original dataset before clustering, b clustered data [38]

4.3.3 Hierarchical Clustering

Hierarchical clustering is another type of clustering algorithm. The data points are
grouped together to form a tree or hierarchy of clusters. The clusters are graphically
represented using a dendrogram. Initially, all data points are assigned to a cluster.
The algorithm needs a terminating condition to stop. In general, two types of
hierarchical clustering are available: agglomerative (bottom–up) and divisive
(top–down). Agglomerative clustering starts with each cluster representing a single
data point. The most similar pair of clusters is merged in each step. On the other hand,
divisive clustering starts from the top level with a single cluster that includes all
the data points. It splits the top level cluster into child clusters in each step until
the individual child clusters contain only a single data point. The criterion used to
measure the dissimilarity of two clusters while building up the hierarchy is known as
the linkage.
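A minimal sketch of the agglomerative (bottom–up) variant is shown below. The
`linkage` argument decides how the distance between two clusters is derived from the
pairwise point distances: passing `min` uses the smallest pairwise distance and `max`
the largest. The stopping condition (a target number of clusters), the function name
and the toy points are assumptions for illustration.

```python
def agglomerative(points, n_clusters, linkage=min):
    """Merge the closest pair of clusters until n_clusters remain."""
    def dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

    clusters = [[p] for p in points]      # every point starts as its own cluster
    while len(clusters) > n_clusters:     # terminating condition
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(dist(p, q) for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]   # merge the most similar pair
        del clusters[j]
    return clusters

# Hypothetical 2-D points forming two groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0)]
single = agglomerative(points, n_clusters=2, linkage=min)
complete = agglomerative(points, n_clusters=2, linkage=max)
print(single)
print(complete)
```

Recording the order of the merges, instead of stopping at a fixed number of clusters,
would give the information needed to draw the dendrogram.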
Several types of linkage are used in hierarchical clustering. The smallest distance
between two points of two different clusters is known as the single link or min, whereas
the maximum distance between two points of two different clusters is known as the
complete link or max. When the distance between each pair of points drawn from two
different clusters is computed and these distances are then averaged, the result is
known as the average link or group average.
Alternatively, Ward’s method can also be used, which computes the sum of the squared
distances of the individual points of two different clusters.

A few state-of-the-art works are summarized in Table 2. These works are based on
pervasive sensing applications that also have implications in public health. The table
shows how recent works heavily use the different types of machine learning techniques
stated above.

Table 2 Comparison of state-of-the-art works

| Existing work and year | Sensor or device and position | Learning technique | Implemented classifier | Remarks |
|---|---|---|---|---|
| [25] 2013 | Smartphone, accelerometer, gyroscope, magnetic sensor | Supervised, TM | k nearest neighbor (kNN) | Device independent activity monitoring, with expensive frequency domain features |
| [37] 2013 | RGBD (RGB-depth) sensor | Unsupervised learning, clustering | k-means clustering | Unlabeled data are recognized |
| [39] 2014 | Smartphone inertial sensor | Unsupervised learning, clustering | DBSCAN | Unlabeled data are recognized, even when the number of activities is unknown |
| [26] 2017 | IMU sensor accelerometer, wrist, T-Rex TR100A ECG | Supervised, TM | Linear SVM, RBFSVM, kNN, LDA | Recognition of ambulatory activity and energy expenditure during activity. LDA achieves the highest accuracy with IMU and selected heart rate |
| [27] 2017 | Wrist worn motion sensor (accelerometer, gyroscope, barometer) | Supervised, TM | Naive Bayes, decision tree and kNN | Recognition of complex activity, with varying window size |
| [32] 2018 | Smartphone accelerometer, gyroscope | Supervised, ensemble learning (condition based) | LR with parameter tuning | Device and position independent detailed activity (slow/brisk walk, sit on floor/chair) monitoring |

4.4 Performance Metrics

The performance of the system can be measured by several performance metrics [40].
Confusion Matrix: The confusion matrix ($CM_{n \times n}$) represents the classification
results of the different classification algorithms. It specifies the following.
• True Positives (TP): The number of positive instances that were classified as
positive.
• True Negatives (TN): The number of negative instances that were classified as
negative.
• False Positives (FP): The number of negative instances that were classified as
positive.
• False Negatives (FN): The number of positive instances that were classified
as negative.
Sensitivity: Probability that a test result will be positive when the actual label is
positive.

$$\text{Sensitivity} = \frac{TP}{TP + FN}$$

Specificity: Probability that a test result will be negative when the actual label is
negative.

$$\text{Specificity} = \frac{TN}{FP + TN}$$

Accuracy: The overall classification performance for all classes is denoted by the
following equation in the state-of-the-art literature.

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

F-measure (F1): It summarizes a model’s performance by combining precision and recall
as their harmonic mean. If the output has few false positives and few false negatives,
the classifier is correctly identifying real objects and F1 is high. It is defined as
follows.

$$F1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
Precision: Precision indicates how accurate the model is on the data points it
predicts as positive, that is, how many of them are actually positive.

$$\text{Precision} = \frac{TP}{FP + TP}$$

Recall: The completeness of a classifier can be measured using recall. A low recall
indicates many false negatives. Recall indicates how many of the actual positives are
captured by the model, that is, labeled as positive (true positives).

$$\text{Recall} = \frac{TP}{TP + FN}$$

Most of the existing works reported here use one or more of the above mentioned
performance metrics.
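As a worked example, the metrics above can be computed directly from the four counts
of a binary confusion matrix. The counts used here are hypothetical and the function
name is an assumption for this sketch.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute the metrics defined above from binary confusion matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (fp + tn)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (fp + tp)
    recall = tp / (tp + fn)  # same formula as sensitivity
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Hypothetical counts: 50 actual positives and 50 actual negatives.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts the accuracy is 0.85, while precision (40/45) and recall (40/50)
differ, which is why F1 is reported alongside accuracy.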

5 Deep Learning for Health Data

For bioinformatics and medical imaging applications, it is challenging to build a
sufficiently accurate recommender system. The main challenge is that with supervised
learning, the accuracy does not improve appreciably as we add more data to the system
beyond a certain point. Initially the accuracy improves, but it then almost stabilizes
and does not scale well even when we add more data. The performance of such techniques
depends heavily on the handcrafted features that are extracted from the data. Today,
not only are machines powerful enough to execute complicated machine learning
techniques, but a huge amount of data is also available that sufficiently represents
each of the different diseases. Hence, deep learning techniques, which can
automatically extract features from data, can be applied to health records. These
techniques can extract higher dimensional features that are perceived by human
users but are hard for them to define mathematically. With huge data that sufficiently
represent the different labels of the data set, deep learning techniques can extract
useful features so that the accuracy scales with more data points. Deep learning is
useful not only for analyzing medical images or gene sequences; with digitized
societies and the resulting availability of multi-source data, it is also useful for
HRS designed for personal recommendations.
Deep learning techniques are based on neural networks. Multiple hidden layers
are present in a deep network. The input layer receives the data as inputs,
and then the activation functions of the first hidden layer are applied. Those
activations are then passed on to the subsequent hidden layers to achieve the desired
output. With deep learning techniques, health recommender systems could be built based
on recommendations from users and their social relations with the patients. In [41],
a system is developed on the trust and distrust relations of the recommenders. The
node information and structural information are merged with a deep learning method
to achieve better performance.
Some of the deep learning techniques are discussed below.

5.1 Supervised Learning

5.1.1 Convolutional Neural Network

A Convolutional Neural Network (CNN) is a type of multi-layer neural network containing
several hidden layers. Each layer is comprised of neurons, and weights are attached
to those neurons. Convolution and pooling are the two main functions performed by
these hidden layers. Convolution filters are used to generate feature maps. However, a
large set of features increases the computational complexity and can also make the
model prone to overfitting. So, pooling is used after the convolution layers; it
connects to a subset of perceptrons from the previous layer and applies a pooling
function (max, min, etc.). Pooling thus reduces the dimensionality so as not to
overfit. Different combinations of convolution and pooling are used by CNN based HRS.
The Rectified Linear Unit (ReLU) can be used as the activation function for CNNs
rather than sigmoid functions (mostly used in ANNs). This generally achieves better
learning speed for the gradient descent search. The last layer of a CNN is a fully
connected layer similar to ANNs, and its output is typically passed through the
softmax function.
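The building blocks named above can be sketched for a one dimensional signal as
follows. The signal, the filter weights and the class scores are hypothetical; the
convolution is written as a sliding dot product without flipping the kernel, which is
an assumption that matches common practice in CNN libraries.

```python
import math

def conv1d(signal, kernel):
    """Slide the kernel over the signal and take dot products (no padding)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def relu(values):
    """Rectified Linear Unit: negative activations are set to zero."""
    return [max(0.0, v) for v in values]

def max_pool(values, size):
    """Keep the maximum of each non-overlapping window of the given size."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]

def softmax(scores):
    """Turn class scores into probabilities that sum to one."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

signal = [0.0, 1.0, 3.0, 1.0, 0.0, -1.0, -3.0, -1.0, 0.0]
kernel = [-1.0, 0.0, 1.0]   # hypothetical filter that responds to rising slopes

feature_map = relu(conv1d(signal, kernel))
pooled = max_pool(feature_map, size=2)      # fewer values reach the next layer
probabilities = softmax([2.0, 0.5])         # hypothetical scores from a dense layer
print(feature_map, pooled, probabilities)
```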
The authors in [10] proposed a CNN based algorithm to detect Myocardial
Infarction (MI) using the ElectroCardioGram (ECG) signal. The system is capable of
detecting abnormal beats from an unknown ECG signal even with noise. Two datasets
are used for this work. The Daubechies 6 mother wavelet function is used
to remove noise and baseline wander from the ECG record. The R-peaks of the signal are
detected, and the entire signal is segmented using the R-peaks. The segmented signals
are normalized using the Z-score and are passed through the CNN layers. Features are
extracted from the input ECG signal in the convolution layers of the CNN. The Leaky
Rectified Linear Unit activation function is used in several layers of the network, and
the softmax function is used in the last layer. Here the max-pooling function helps to
reduce the size of the feature map and the number of neurons in the next layers. The
backpropagation method is applied in the proposed network. A few parameters need to be
set for training, such as regularization (controls overfitting), momentum (controls how
fast the network learns), and learning rate. The proposed system performs better for
ECG beats without noise and achieves a good accuracy value. The proposed system is
beneficial for earlier diagnosis of cardiovascular diseases.
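The Z-score normalization step mentioned above can be sketched as follows. The sample
values are hypothetical, and the use of the population standard deviation is an
assumption, since the exact variant used in [10] is not stated here.

```python
import statistics

def z_score(segment):
    """Rescale a segment to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(segment)
    std = statistics.pstdev(segment)
    if std == 0.0:  # a constant segment carries no variation to normalize
        return [0.0 for _ in segment]
    return [(x - mean) / std for x in segment]

# Hypothetical amplitudes of one segmented beat.
beat = [0.1, 0.3, 1.2, 0.4, 0.0]
normalized = z_score(beat)
print([round(v, 3) for v in normalized])
```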
Parkinson’s Disease is a neurological movement disorder. Accelerometer signals
captured from wearable sensors attached to the patients could be beneficial for
monitoring Parkinson’s patients. In [42], the authors developed such a mechanism.
Accelerometer signals are fed as input, and several one dimensional kernels are used
to filter the input. A bias is applied on the accelerometer data. Max-pooling is used to
reduce the feature dimensions. The softmax function is used in the last layer for
classification. It provides better results than state-of-the-art classifiers.
Nowadays, deep learning plays a crucial role in HAR. The authors in [43] presented
a deep learning framework based on the operation of CNN, LSTM, and ELM
classifiers. Most existing HAR systems apply handcrafted features that require expert
knowledge, such as statistical methods. Here, a CNN is used in the first stage to
extract features from the accelerometer signals, which are considered a higher-level
abstraction of the raw data. It is difficult to recognize the sequence of activities
from real-time sensor data because temporal dependencies are ignored in the basic
structure of this deep network. Several challenges may occur during human activity
monitoring, such as the similarity between a few activity classes (like normal walk and
slow walk) and variable changes in the accelerometer values over a period of time.
Hence the authors applied Long Short-Term Memory (LSTM) along with the basic CNN to
achieve better recognition accuracy than the basic CNN. In real time, however, it
becomes difficult to achieve a good classification result. So, the Extreme Learning
Machine (ELM) is integrated with CNN-LSTM to improve the performance of the proposed
framework in real time. The parameters of the hidden layers are chosen randomly, and
the weights are calculated using the least squares method. The proposed framework
achieves an F-score of 0.88 with the baseline CNN, whereas the CNN-LSTM-ELM technique
achieves better prediction.
CNN is used in [44] to model intelligent health recommender systems. It works
on supplementary data to find the recommended hospital on the basis of previous
data analysis. The Convolution Restricted Boltzmann Machine (RBM) model is a
combination of RBM and CNN; it works as a two layer model and uses the features
of both learning methods. It can work with big data and helps to build an effective
health recommender system. Two error measures, Root Mean Square Error and Mean
Absolute Error, are used to minimize the system errors.
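For reference, the two error measures can be computed as in the sketch below. The
rating values are hypothetical and the function names are assumptions for this
illustration.

```python
import math

def rmse(actual, predicted):
    """Root Mean Square Error: penalizes large deviations more strongly."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean Absolute Error: average magnitude of the deviations."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical true ratings and the ratings predicted by a recommender.
actual = [4.0, 3.0, 5.0, 2.0]
predicted = [3.0, 3.0, 5.0, 4.0]
print(rmse(actual, predicted), mae(actual, predicted))
```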

5.1.2 Recurrent Neural Network

In real life, it is difficult to represent all problems with fixed length inputs and
outputs. For example, in a time series such as an accelerometer trace of human daily
activity, the continuous data pattern produces new data samples in each time window;
hence a system is required that can store the sequence of data and use the context of
the information. The Recurrent Neural Network (RNN) is a robust neural network that can
utilize the sequential information of the data pattern. This helps to build context
aware recommendation systems. RNNs can capture the information about the previous
computation and use it as input to the next hidden layer. The network is processed
along the time direction in the training phase, and the sequence is processed quickly
in the identification phase. Sometimes, the output of an RNN model depends not only on
the previous outputs in the sequence but also on future elements, as shown in Fig. 7a.
This kind of network is known as a bidirectional RNN. RNNs can work well with inputs of
different sizes and can take historical information into account for computation. The
generated weights are shared across the network. However, computation is slower than in
other networks, and sometimes it is difficult to access the information of the distant
past.

Fig. 7 Deep learning technique of a recurrent neural network and b restricted Boltzmann machine
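A minimal sketch of the recurrence is given below for a single scalar hidden state.
The same weights are reused at every time step, and the state carries context from
earlier inputs. The tanh activation, the weight values and the input sequences are
assumptions chosen for illustration.

```python
import math

def rnn_forward(sequence, w_in, w_rec, bias, h0=0.0):
    """Return the hidden state after each time step of a scalar recurrent unit."""
    h = h0
    states = []
    for x in sequence:
        # The new state depends on the current input and on the previous state.
        h = math.tanh(w_in * x + w_rec * h + bias)
        states.append(h)
    return states

# Hypothetical weights, shared across all time steps.
w_in, w_rec, bias = 1.0, 0.5, 0.0

with_history = rnn_forward([1.0, 0.0], w_in, w_rec, bias)
without_history = rnn_forward([0.0, 0.0], w_in, w_rec, bias)
longer_input = rnn_forward([0.2, 0.4, 0.1, 0.3, 0.5], w_in, w_rec, bias)

# The last input is 0.0 in both short sequences, yet the final states differ
# because the first sequence remembers the earlier input.
print(with_history[-1], without_history[-1], len(longer_input))
```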
The authors in [45] proposed a novel patient monitoring framework using RNN
and a density based clustering method. It can monitor ECG signals and identify ECG
beats at different heart rates of the user. Here, features are extracted automatically,
based on morphology information including the current heartbeat and the T wave of
the former heartbeat. It computes a strong correlation between ECG signals and
considers ECG beats of various lengths. Here, Long Short Term Memory (LSTM),
a variation of RNN, is applied to maintain the details of the previous context.

5.2 Unsupervised Deep Learning

5.2.1 Restricted Boltzmann Machine and Deep Belief Network

The Restricted Boltzmann Machine (RBM) is an important technique in unsupervised
deep learning. It is a probabilistic undirected graphical model with a visible layer
and a hidden layer, as shown in Fig. 7b. It is useful for dimensionality reduction,
classification, regression, feature learning, etc. Two or more RBMs are stacked on each
other to build a Deep Belief Network (DBN). Several low-level features are extracted
from the data points, and the visible nodes capture that information. At each node,
every input is multiplied by a weight, and the sum of all the products is added to a
bias. This result is then fed into an activation function, which generates the output
of the node.
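The weighted sum, bias and activation described above can be sketched for a small RBM
with binary units as follows. The sigmoid activation is the usual choice for binary
RBMs; the layer sizes, weights and biases are hypothetical, and the reconstruction uses
the hidden probabilities directly instead of sampled binary states, which is a
simplification.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def hidden_probabilities(visible, weights, hidden_bias):
    """Activation probability of each hidden node; weights[i][j] links visible i to hidden j."""
    return [
        sigmoid(b + sum(v * row[j] for v, row in zip(visible, weights)))
        for j, b in enumerate(hidden_bias)
    ]

def visible_probabilities(hidden, weights, visible_bias):
    """Reconstruction of the visible layer from the hidden layer using the same weights."""
    return [
        sigmoid(a + sum(h * w for h, w in zip(hidden, row)))
        for a, row in zip(visible_bias, weights)
    ]

# Hypothetical RBM with 3 visible nodes and 2 hidden nodes.
visible = [1.0, 0.0, 1.0]
weights = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.8]]
hidden_bias = [0.0, -0.1]
visible_bias = [0.0, 0.0, 0.0]

p_hidden = hidden_probabilities(visible, weights, hidden_bias)
p_visible = visible_probabilities(p_hidden, weights, visible_bias)
print(p_hidden, p_visible)
```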
RBMs can be trained in a greedy, layer-by-layer manner to build a DBN. Each
intermediate layer of the DBN works in dual mode: as the hidden layer for the preceding
RBM and as the visible (input) layer for the next one.
The authors in [11] proposed a HAR system based on DBN using wearable sensors.
Features are extracted from the raw data set automatically. Linear Discriminant
Analysis (LDA) and Kernel Principal Component Analysis (KPCA) are used
to reduce the dimensionality of the feature space. Several hyperparameters, such as
mini batch size, initial weight values, learning rate, and number of hidden layers and
units, need to be configured for the DBN.

5.2.2 Autoencoder

The autoencoder plays an important role in unsupervised learning of deep networks using
the back propagation technique. An autoencoder network has two parts, an encoder and a
decoder. In general, the encoder converts the input data to a compressed version
without losing relevant information, and thus the overall data size is reduced. This
process is known as dimensionality reduction. The decoder expands the compressed data
back to the original size and produces an output similar to the input. The network thus
has the capability to reduce and reproduce the input features. It follows the
architecture of a traditional neural network in which the input and output layers have
the same number of nodes, while the hidden layers have fewer nodes. A few parameters
are important for autoencoders: (i) total number of nodes in the middle layer, or code
size, (ii) total number of layers, (iii) total number of nodes per layer, and (iv) loss
function according to the range of input values (cross-entropy or mean squared
error). In the literature, different types of autoencoders are mentioned, such as
sparse, regularized, and multilayer autoencoders.
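The encoder, code layer, decoder and reconstruction loss can be sketched as below. For
brevity the layers are linear and the weights are picked by hand rather than learned
with back propagation, so this only illustrates the data flow (4 inputs, code size 2,
4 outputs); all names and values are hypothetical.

```python
def linear(x, weights):
    """Apply one fully connected layer without bias or activation."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def mean_squared_error(original, reconstructed):
    """Reconstruction loss between the input and the decoder output."""
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)

# Hand-picked weights: the encoder averages neighboring inputs (4 -> 2),
# and the decoder copies each code value back to two outputs (2 -> 4).
ENCODER = [[0.5, 0.5, 0.0, 0.0],
           [0.0, 0.0, 0.5, 0.5]]
DECODER = [[1.0, 0.0],
           [1.0, 0.0],
           [0.0, 1.0],
           [0.0, 1.0]]

def autoencode(x):
    code = linear(x, ENCODER)          # compressed representation (code size 2)
    return code, linear(code, DECODER)

code_a, recon_a = autoencode([1.0, 1.0, 3.0, 3.0])   # redundant input
code_b, recon_b = autoencode([1.0, 2.0, 3.0, 4.0])   # input with more detail
loss_a = mean_squared_error([1.0, 1.0, 3.0, 3.0], recon_a)
loss_b = mean_squared_error([1.0, 2.0, 3.0, 4.0], recon_b)
print(code_a, loss_a, code_b, loss_b)
```

The first input is reconstructed without error because its information fits in the
two code values, whereas the second input loses some detail in the compression.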
The authors in [46] proposed a novel deep network based on an unsupervised deep
feature learning autoencoder. Here, patients’ information is retrieved from Electronic
Health Records (EHRs) to represent the general set of information. It helps to manage
and compose the multi domain patient data automatically without user intervention.
It works well for predicting diseases for several patients from a large database. The
preprocessed data are passed through a deep sequence of non-linear transformations,
which helps to capture detailed information about the patients from the EHRs.
Cancer is generally detected from the gene expression of cells. The gene expressions
of normal, non-cancerous tissues and those of cancerous tissues can be differentiated
using an unsupervised deep network [47]. Initially, PCA is considered for representing
the high dimensional raw feature space, along with sparse feature learning techniques.
Then, autoencoders are applied to improve the accuracy of cancer detection from gene
expressions.
It is very difficult to find relevant information about a patient’s current condition
and to understand different medical terms and their relations. Collaborative health
recommender systems provide useful recommendations in such cases, but they face several
issues such as sparse data and the cold start problem. In [48], a recommender system is
developed that incorporates various external factors (current time, weather, etc.) for
monitoring users’ daily behavior. It considers flexible context specific input and
transition matrices in place of constant data. An autoencoder is applied to build such a
collaborative filtering recommendation system. The user preferences are generated
using the encoding and decoding process of the autoencoder, and the data are stored as a
matrix. The parameters are optimized to reduce the reconstruction error.
In [49], the authors proposed a deep learning based collaborative health recommender
system that uses heterogeneous data from multiple sources. A variational
autoencoder neural network is designed to learn the details of primary care of doctors
and to extract various features of the patient for incorporation into the user profile.
It is found to perform appreciably well.

6 Conclusion

Learning from health data to detect and identify a disease or anomalies in the
activities of the user is an important challenge in building a robust health
recommender system. Such systems have various applications, including healthcare, early
diagnosis, elderly care, fitness tracking, activity monitoring, and fall detection.
This chapter provides an insight into the learning techniques used in health
recommender systems. It presents the recent trends and developments in machine learning
techniques as well as deep learning techniques. Deep learning techniques are found to
perform better and make the system more efficient and intelligent due to their
automated feature extraction. We have also discussed several unsupervised learning
techniques and how they are helpful when the data set is completely unknown to the
system.

References

1. Swan, M.: Sensor mania! The internet of things, wearable computing, objective metrics, and
the quantified self 2.0. J. Sens. Actuator Netw. 1(3), 217–253 (2012)
2. Calero Valdez, A., Ziefle, M., Verbert, K., Felfernig, A., Holzinger, A.: Recommender systems
for health informatics: state-of-the-art and future perspectives. In: Holzinger, A. (ed.) Machine
Learning for Health Informatics. Lecture Notes in Computer Science, vol. 9605. Springer,
Cham (2016)
3. Erdeniz, S.P., Maglogiannis, I., Menychtas, A., Felfernig, A., Tran, T.N.T.: Recommender
systems for IoT enabled m-health applications. In: Iliadis, L., Maglogiannis, I., Plagianakos,
V. (eds.) Artificial Intelligence Applications and Innovations. AIAI 2018. IFIP Advances in
Information and Communication Technology, vol. 520. Springer, Cham (2018)
4. Ramsundar, B., Kearnes, S., Riley, P., Webster, D., Konerding, D., Pande, V.: Massively Mul-
titask Networks for Drug Discovery (2015). arXiv:1502.02072
5. Zhang, S., Zhou, J., Hu, H., Gong, H., Chen, L., Cheng, C., Zeng, J.: A deep learning framework
for modeling structural features of RNA-binding protein targets. Nucleic Acids Res. 44 (2015).
https://doi.org/10.1093/nar/gkv1025
6. Tian, K., Shao, M., Wang, Y., Guan, J., Zhou, S.: Boosting compound-protein interaction
prediction by deep learning. Methods 110 (2016). https://doi.org/10.1016/j.ymeth.2016.06.024
7. Xu, T., Zhang, H., Huang, X., Zhang, S., Metaxas, D.N.: Multimodal deep learning for cervical
dysplasia diagnosis. In: Ourselin, S., Joskowicz, L., Sabuncu, M., Unal, G., Wells, W. (eds.)
Medical Image Computing and Computer-Assisted Intervention—MICCAI 2016. MICCAI
2016. Lecture Notes in Computer Science, vol. 9901. Springer, Cham (2016)
8. Brosch, T., Tam, R., The Alzheimer’s Disease Neuroimaging Initiative: Manifold learning of
brain MRIs by deep learning. In: Mori, K., Sakuma, I., Sato, Y., Barillot, C., Navab, N. (eds.)
Medical Image Computing and Computer-Assisted Intervention—MICCAI 2013. MICCAI
2013. Lecture Notes in Computer Science, vol. 8150. Springer, Berlin (2013)
9. Rose, D.C., Arel, I., Karnowski, T.P., Paquit, V.C.: Applying deep-layered clustering to mam-
mography image analytics. In: Biomedical Sciences and Engineering Conference, Oak Ridge,
TN, pp. 1–4 (2010)
10. Acharya, U.R., Fujita, H., Oh, S., Hagiwara, Y., Tan, J.H., Adam, M.: Application of deep
convolutional neural network for automated detection of myocardial infarction using ECG
signals. Inf. Sci. 415–416, 190–198 (2017)
11. Hassan, M.M., Huda, S., Uddin, M.Z., Almogren, A., Alrubaian, M.: Human activity recogni-
tion from body sensor data using deep learning. J. Med. Syst. 42, 99 (2018)
12. Poggi, M., Mattoccia, S.: A wearable mobility aid for the visually impaired based on embedded
3d vision and deep learning. In: Proceeding of IEEE Symposium of Computer and
Communication, pp. 208–213 (2016)
13. Huang, J., Zhou, W., Li, H., Li, W.: Sign language recognition using real-sense. In: Proceeding
of IEEE China, SIP, pp. 166–170 (2015)
14. Garimella, V.R.K., Alfayad, A., Weber, I.: Social media image analysis for public health. In:
Proceeding of CHI Conference Human Factors Computer System, pp. 5543–5547 (2016)
15. Zou, B., Lampos, V., Gorton, R., Cox, I.J.: On infectious intestinal disease surveillance using
social media content. In: Proceeding of 6th International Conference on Digital Health
Conference, pp. 157–161 (2016)
16. Saha, J., Chowdhury, C., Biswas, S.: Two phase ensemble classifier for smartphone based
human activity recognition independent of hardware configuration and usage behavior.
Microsyst. Technol. 24, 2737 (2018)
17. Huang, T., Lan, L., Fang, X., An, P., Min, J., Wang, F.: Promises and challenges of big data
computing in health sciences. Big Data Res. 2(1), 2–11 (2015)
18. Yang, S., Zhou, P., Duan, K., Hossain, M.S., Alhamid, M.F.: emHealth: towards emotion health
through depression prediction and intelligent health recommender system. Mob. Netw. Appl.
23, 216–226 (2018)
19. Hussein, A.S., Omar, W.M., Li, X., Ati, M.: Efficient chronic disease diagnosis prediction and
recommendation system. In: Proceeding of IEEE-EMBS Conference on Biomedical Engineer-
ing and Sciences, Langkawi, pp. 209–214 (2012)
20. Felipe, LO., Barrué, C., Cortés, A., Wolverson, E., Antomarini, M., Landrin, I., Votis, K.,
Paliokas, I., Cortés, U.: Health recommender system design in the context of CAREGIVER-
SPROMMD project. In: Proceeding of PETRA ’18: The 11th PErvasive Technologies Related
to Assistive Environments Conference, June, Corfu, Greece (2018)
21. Morrell, T.G., Kerschberg, I.: Personal health explorer: a semantic health recommendation
system. In: Proceeding of IEEE 28th International Conference on Data Engineering Workshops,
Arlington, VA, pp. 55–59 (2012)
22. Bocanegra, C.L.S., Ramos, J.L.S., Rizo, C., Civit, A., Fernandez-Luque, L.: HealthRecSys: a
semantic content-based recommender system to complement health videos. BMC Med. Inform.
Decis. Mak. 17, 63 (2017)
23. Sanchez-Bocanegra, C.L., Sanchez-Laguna, F., Sevillano, J.L.: Introduction on health recom-
mender systems. Methods Mol. Biol. 1246, 131–146 (2015)
24. Keogh, E.: Instance-based learning. In: Sammut, C., Webb, G.I. (eds.) Encyclopedia of Machine
Learning. Springer, Boston (2011)
25. Ustev, Y.E., Incel, O.D., Ersoy, C.: User, device and orientation independent human activity
recognition on mobile phone challenges and a proposal. In: The ACM Conference on Pervasive
and Ubiquitous Computing Adjunct Publication, Zurich, pp. 1427–1435 (2013)
26. Park, H., Dong, S.Y., Lee, M., Youn, I.: The role of heart-rate variability parameters in activ-
ity recognition and energy-expenditure estimation using wearable sensors. Sensors (Basel)
2017(7), 1698 (2017)
27. Shoaib, M., Bosch, S., Incel, O.D., Scholten, H., Havinga, P.J.M.: Complex human activity
recognition using smartphone and wrist-worn motion sensors. In: Sensors, p. 426 (2016)
28. Zhang, S., Rowlands, A.V., Murray, P., Hurst, T.L.: Physical activity classification using the
GENEA wrist-worn accelerometer. Med. Sci. Sports Exerc. 44, 742–748 (2012)
29. Garcia-Ceja, E., Brena, R.F., Carrasco-Jimenez, J.C., Garrido, L.: Long-term activity recogni-
tion from wristwatch accelerometer data. Sensors 14, 22500–22524 (2014)
30. Saha, J., Chowdhury, C., Biswas, S.: Device independent activity monitoring using smart
handhelds. In: Proceeding of 7th International Conference on Cloud Computing, Data Science
and Engineering—Confluence, Noida, pp. 406–411 (2017)
31. Bayat, A., Pomplun, M., Tran, D.A.: A study on human activity recognition using accelerometer
data from smartphones. Procedia Comput. Sci. 34, 450–457 (2014)
32. Saha, J., Roy Chowdhury, I., Chowdhury, C., Biswas, S., Aslam, N.: An ensemble of condition
based classifiers for device independent detailed human activity recognition using smartphones.
Information 9(4), 94 (2018)
33. Jamshidi, S., Torkamani, M.A., Mellen, J., Jhaveri, M., Pan, P., Chung, J., Kardes, H.: A hybrid
health journey recommender system using electronic medical records. In: The Proceedings
of the 3rd International Workshop on Health Recommender Systems, HealthRecSys 2018,
co-located with the 12th ACM Conference on Recommender Systems (ACM RecSys 2018),
Vancouver, BC, Canada (2018)
34. Stikic, M., Schiele, B.: Activity recognition from sparsely labeled data using multi-instance
learning. In: Proceeding of Location and Context Awareness. LoCA 2009. Lecture Notes in
Computer Science, vol. 5561. Springer, Berlin (2009)
35. Toda, T., Inoue, S., Tanaka, S., Ueda, N.: Training human activity recognition for labels with
inaccurate time stamps. In: Proceeding of UbiComp ’14 Adjunct, pp. 863–872, 13–17 Sept
2014
36. Stikic, M., Larlus, D., Schiele, B.: Multi-graph based semisupervised learning for activity recog-
nition. In: Proceeding of International Symposium on Wearable Computers, Linz, pp. 85–92
(2009)
37. Ong, W.H.: An unsupervised approach for human activity detection and recognition. Int. J.
Simul. Syst. Sci. Technol. 14(5) (2013)
38. https://medium.com/odessa-ml-club/a-journey-to-clustering-introduction-to-dbscan-e724fa899b6f.
Last seen 20/5/2019
39. Kwon, Y., Kang, K., Bae, C.: Unsupervised learning for human activity recognition using
smartphone sensors. Expert Syst. Appl. 41(14), 6067–6074 (2014)
40. Lara, O.D., Labrador, M.A.: A survey of human activity recognition using wearable sensors.
In: IEEE Communication Surveys and Tutorials, vol. 15 (2013)
41. Yuan, W., Li, C., Guan, D., et al.: Socialized healthcare service recommendation using deep
learning. Neural Comput. Appl. 30, 2071–2082 (2018)
42. Eskofier, B.M., et al.: Recent machine learning advancements in sensor-based mobility analysis:
deep learning for Parkinson’s disease assessment. In: Proceeding of 38th Annual International
Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Orlando, FL,
pp. 655–658 (2016)
43. Sun, J., Fu, Y., Li, S., He, J., Xu, C., Tan, L.: Sequential human activity recognition based on
deep convolutional network and extreme learning machine using wearable sensors. Hindawi J.
Sens. (8580959), 10 (2018)
44. Sahoo, A.K., Pradhan, C., Barik, R.K., Dubey, H.: DeepReco: deep learning based health
recommender system using collaborative filtering. Computation 7(25) (2019)
45. Zhang, C., Wang, G., Zhao, J., Gao, P., Lin, J., Yang, H.: Patient-specific ECG classification
based on recurrent neural networks and clustering technique. In: Proceeding of 13th IASTED
International Conference on Biomedical Engineering (BioMed), Innsbruck, Austria, pp. 63–67
(2017)
46. Miotto, R., Li, L., Kidd, A.B., Dudley, J.T.: Deep patient: an unsupervised representation to
predict the future of patients from the electronic health records. Sci. Rep. 6, 1–10 (2016)
47. Fakoor, R., Ladhak, F., Nazi, A., Huber, M.: Using deep learning to enhance cancer diagnosis
and classification. In: Proceedings of the 30th International Conference on Machine Learning,
JMLR: W&CP vol. 28, Atlanta, Georgia, USA (2013)
48. Sedhain, S., Menon, A.K., Xie, L., Sanner, S.: AutoRec: auto encoders meet collaborative
filtering. In: Proceeding of 24th International Conference World Wide Web, Florence, Italy
(2015)
49. Deng, X., Huangfu, F.: Collaborative variational deep learning for healthcare recommendation.
IEEE Access 7, 55679–55688 (2019)

Jayita Saha is currently pursuing the Ph.D. degree in Computer Science and Engineering at
Jadavpur University, India. She received her B. Tech. and M. Tech. degrees in Computer Science
and Engineering from Durgapur Institute of Advanced Technology and Management and Jadavpur
University, India, in 2008 and 2011, respectively. Her research interests include Human Activity
Recognition and machine learning.

Chandreyee Chowdhury has been an Assistant Professor in the Department of Computer Science and
Engineering at Jadavpur University since 2006. She received her Ph.D. in Engineering from Jadavpur
University in 2013 and her M.E. in Computer Science and Engineering from Jadavpur University in
2005. Her research interests include Wireless Sensor Networks and their variants, mobile
crowd-sensing, and human activity recognition. She was awarded a Post Doctoral Fellowship from
Erasmus Mundus in 2014 to carry out research work at Northumbria University, UK. She is a member
of the technical program committees of many international conferences. She is a member of IEEE and
the IEEE Computer Society.

Suparna Biswas has been an Associate Professor in the Department of Computer Science and
Engineering, Maulana Abul Kalam Azad University of Technology (formerly WBUT), India, since 2018.
She obtained her M.E. and Ph.D. from Jadavpur University. She was awarded a Post Doctoral
Fellowship from Erasmus Mundus in 2014 to carry out research work at Northumbria University, UK.
She has co-authored a number of research papers published in conferences and journals of
international repute, and has served as a reviewer for such conferences and journals.
Her areas of research interest are Mobile Computing, Network Security, Wireless Body Area
Networks, and Healthcare Applications.
Deep Learning and Electronics
Health Records
Deep Learning and Explainable AI
in Healthcare Using EHR

Sujata Khedkar, Priyanka Gandhi, Gayatri Shinde
and Vignesh Subramanian

Abstract Artificial Intelligence (AI) has proved to be of great assistance in the
medical field. Rapid advancements have made available technology that can predict
the risk of many different diseases. A patient’s Electronic Health Record (EHR)
contains many kinds of medical data for each patient and each medical visit.
Predictive models such as random forests and boosted trees provide high accuracy
but not end-to-end interpretability, while models such as Naive Bayes, logistic
regression and single decision trees are intelligible but less accurate. The latter
models are interpretable, but they fail to capture the temporal relationships among
the attributes present in EHR data, so their accuracy is compromised. Interpretability
of a model is essential in critical healthcare applications: it provides medical
personnel with explanations that build trust in machine learning systems. This chapter
presents the design and implementation of an Explainable Deep Learning System
for Healthcare using EHR. It discusses the use of an attention mechanism and a
Recurrent Neural Network (RNN) on EHR data for predicting heart failure in patients
and providing insight into the key diagnoses that led to the prediction. The patient’s
medical history is given as a sequential input to the RNN, which predicts the heart
failure risk and provides explainability along with it. This represents an ante-hoc
explainability model. A neural network with a two-level attention model is trained
to detect those visits in the patient’s history that are influential and significant
for understanding the reasons behind a prediction made from the patient’s medical
history. Considering the last visit first proves to be beneficial.
When a prediction is made, the visit-level contribution is

S. Khedkar (B) · P. Gandhi · G. Shinde · V. Subramanian


Department of Computer Engineering, VESIT, Mumbai, India
e-mail: [email protected]
P. Gandhi
e-mail: [email protected]
G. Shinde
e-mail: [email protected]
V. Subramanian
e-mail: [email protected]

© Springer Nature Switzerland AG 2020
S. Dash et al. (eds.), Deep Learning Techniques for Biomedical and Health Informatics,
Studies in Big Data 68, https://doi.org/10.1007/978-3-030-33966-1_7
prioritized, i.e., the model identifies which visit contributes the most to the final
prediction, where each visit consists of multiple codes. This model can help medical
personnel predict the heart failure risk of patients from the diseases recorded in
their EHR. The model is then analyzed with Local Interpretable Model-agnostic
Explanations (LIME), which provide the features that contribute positively and
negatively to heart failure risk.

Keywords Heart failure · Predictive modeling · Deep learning · RNN · LIME ·
Explainability · Interpretability · Attention

1 Introduction

Artificial Intelligence (AI) has a huge impact on the medical domain. Applications
such as managing medical records, assisting physicians during operations, predicting
diseases based on patient history, drug creation, and health monitoring applications
like FitBit can all be achieved using AI. But the number of users of these systems
is still small. The main reason is that it is difficult for a human being to trust a
machine when it comes to their health. When dealing with healthcare, it should be
mandatory to understand what exactly was responsible for the output the model gave,
as suggested in the European General Data Protection Regulation.
Discussions with domain experts revealed that physicians do not prefer using Artificial
Intelligence (AI) systems as an aid to their diagnosis. According to them, these
systems do speed up the diagnosis process but do not provide the reasons behind
their decisions. Thus, they cannot trust the system and have to continue with their
manual method of diagnosis. Even though huge advancements have been made
in the research sector, the lack of trust in these systems prevents them from finding
wide-ranging business applications. This chapter contains the design of an Explainable
AI system based on EHR data [1, 2]. The use of an attention mechanism and a Recurrent
Neural Network (RNN) on EHR data is discussed for predicting heart failure in
patients, and LIME is used to provide insight into the key diagnoses that led to the
prediction.
The chapter is organized as follows.
In Sect. 2, related work is discussed. Section 3 describes the proposed methodology.
Section 4 describes the experiments and evaluation. Conclusions and future work
are described in Sect. 5.

2 Related Work

The following Section describes the related work.


Holzinger et al. [3] discussed the various ways to build an explainable model for the
medical domain. Explanations of predictions can be beneficial in teaching, learning,
research, and even in court. In the medical domain, the demand for interpretable
and explainable models is increasing. It must be possible to re-enact the decision-making
and knowledge extraction process. Explainability is classified into two categories:
ante-hoc and post hoc. Ante-hoc systems incorporate explainability into the model
itself, whereas post hoc systems explain the predictions of a complex model using a
secondary, simpler model. Examples of ante-hoc models are decision trees, linear
regression, and fuzzy inference systems. Examples of post hoc methods are
algorithms like BETA (Blackbox Explanations using Transparent Approximations).
Zhao [4] used Electronic Health Records (EHR) datasets, which hold a wealth of
medical data for each patient and each medical visit. Existing methods of data
analysis struggle with EHR datasets because of their size, dimensionality, and
irregularity. Heart failure (HF) is difficult to predict as it is an overarching
condition rather than a distinct phenotype. Choi et al. [5] use CNNs in the context
of natural language processing to process this data. The MIMIC III dataset has been
used, consisting of 46,520 patients, 651,047 diagnosis events, 240,095 procedures,
and 4,156,450 prescriptions. For each patient, information about the ICD9
(International Classification of Diseases) codes, procedure items, and drug names
is extracted from the EHR records and arranged in a sequence similar to a
“sentence”. A word2vec model is then used to convert these sentences to embeddings,
which are then used to train the CNN. The activation function used in the
convolutional and fully connected layers is the rectified linear unit (ReLU).
Heart failure (HF) is a complex condition whose prediction has proved particularly
difficult due to the various conditions and events that lead to it. Heart failure may
occur due to kidney failure, coronary artery diseases, neural disorders, diabetes,
medications for other conditions, procedures performed, and previous instances of
heart attacks. This complex nature makes it very difficult to predict heart failure.
EHR datasets hold the key to solving this task; however, their size has made them
virtually impenetrable by traditional techniques. Hence, the authors of this paper have
taken the novel approach of using CNNs in the context of NLP (Natural Language
Processing) to process this data efficiently. The data is first concatenated into a sequence form
drawn from diagnoses, procedures and medications. This sequence is then fed to
an embedding layer. Random and word2vec embeddings both have been used for
comparison. Multiple such embedding vectors are stacked and are together fed into
the CNN. The CNN processes this input and produces a binary output (HF or not).
Guestrin et al. [6] discussed in their paper that machine learning models are black
boxes. Trust can be built by understanding the reasons behind predictions. Such
understanding provides insights into the model and can be used to assess model
performance and to build better, more accurate models.
Guestrin et al. [6] introduce an algorithm called LIME for explaining the
predictions of any model. LIME treats the given model as a ‘black box’ and tries
to explain a prediction instance x by learning the behavior of the prediction
function f(x) in the surroundings of x. The instances surrounding x are obtained
by randomly perturbing the input. The random sampling is uniform, so as to ensure
that samples are spread evenly in the surroundings of x. This yields a locally
faithful explanation of the prediction instance. Explaining a single instance is not
enough, so explanations for multiple instances are generated and presented to the
user to explain the model as a whole. Two techniques, SP-LIME (Selective pick LIME)
and RP-LIME (Random pick LIME), are used to pick the instances to be presented to
the user. RP-LIME picks instances randomly; this approach may leave some features
unexplained. SP-LIME picks instances such that all features are covered and as few
redundant instances as possible are picked.
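The local-surrogate idea described above can be sketched in a few lines. This is a simplified illustration and not the full LIME algorithm: the exponential distance kernel, the weighted least-squares surrogate, the parameter names (radius, kernel_width) and the toy black_box function are all assumptions made for the example.

```python
import numpy as np

def explain_locally(predict_fn, x, num_samples=500, radius=0.5, kernel_width=0.75, seed=0):
    """Fit a weighted linear surrogate to predict_fn around the instance x."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    # Uniform random perturbations in a box around x (the surroundings of x).
    samples = x + rng.uniform(-radius, radius, size=(num_samples, x.size))
    targets = predict_fn(samples)
    # Samples closer to x get larger weights (assumed exponential kernel).
    distances = np.linalg.norm(samples - x, axis=1)
    weights = np.exp(-(distances ** 2) / kernel_width ** 2)
    # Weighted least squares with an intercept column.
    design = np.hstack([samples, np.ones((num_samples, 1))])
    sqrt_w = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(design * sqrt_w[:, None], targets * sqrt_w, rcond=None)
    return coef[:-1], coef[-1]  # per-feature weights, intercept

def black_box(batch):
    # Hypothetical black-box model, used only to exercise the sketch.
    return 3.0 * batch[:, 0] - 2.0 * batch[:, 1]

feature_weights, intercept = explain_locally(black_box, [1.0, 2.0])
```

The sign of each entry in feature_weights indicates whether that feature pushes the prediction up or down near x, which is the kind of positive and negative contribution the explanation reports.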
Bengio et al. [7] deal with neural machine translation, but the model they
propose can be used in a host of other applications. Attention models in the context
of EHR data can help in pointing out the features used to generate the prediction.
This serves the dual purpose of adding interpretability to the model and allowing
assessment of whether the model is considering the right features
while making a prediction. At each hidden state of the RNN, a context vector is
formed by considering the attention weights of all the input features with respect to
that hidden state. In the context of EHR data, attention weights can be learned from the
data, and visualization of these attention weights can help us to analyze the prediction
and the reason behind it.
Choi et al. [8] have used the Graph-Based Attention Model (GRAM) for creating
interpretable predictive models based on EHR data. GRAM uses a directed acyclic
graph called the Knowledge DAG along with the predictive NN model, in which each
leaf node represents a medical concept and each non-leaf node represents a more
general concept. It exploits the robust hierarchical ontologies that have been
established in medicine. The process of using the parent nodes (concepts) can be
performed using attention mechanisms and end-to-end training. GRAM is reported to
achieve 10% higher accuracy than the basic RNN, the standard model used with EHR
data, with an AUC of 84.48%, while also being interpretable, unlike RNNs. For
qualitative assessment of the interpretations, a 2-D t-SNE plot of the final
representations of 2000 randomly chosen diseases learned by GRAM is used.
GRAM demonstrates how an auxiliary model can be used to interpret the predic-
tions of any neural network. Knowledge DAG successfully exploits heuristic knowl-
edge of ontologies in medicine to learn interpretations even if the dataset available is
small. The use of RNNs as a predictive neural network is very useful for evaluating
the model, as RNNs are already being used to process EHR data.

3 Proposed Methodology

To achieve explainability, two methodologies are being used, shown in Fig. 1.


Explainability is broadly classified into two types—Ante-hoc explainability and
Post hoc explainability. The explanations are built into the prediction model in the
ante-hoc explanatory model, whereas the explanations are provided after the predic-
tion is made in the post hoc explanatory model.

Fig. 1 Proposed methodology

Electronic Health Records (EHR) data was obtained from the MIMIC III database
[2], access to which required passing an examination conducted by the CITI Program.
The MIMIC III dataset consists of 46,520 patients, 651,047 diagnosis events, and
240,095 procedures. For each patient, there is information about the ICD9
(International Classification of Diseases) codes related to the diagnoses and
procedures conducted on the patient. This dataset was used for the Attention-based
RNN model.
The second dataset used was the Cleveland dataset [9] from the UCI ML repository,
which has been used by many to build heart disease prediction models. This is a small
dataset consisting of 303 patient records. It was used to study the LIME algorithm.
Attention Mechanism:
Attention mechanisms in neural networks work much like the visual attention
mechanism found in humans. Just as human attention tells the brain which parts of
a scene to focus on, an attention mechanism tells the network which parts of the
input to focus on. The attention mechanism allows the network to refer back to the input
sequence while calculating attention values and does not force the network to encode
all input information into a vector of fixed length.

3.1 Conceptual System Design

• Data pre-processing module


The data is processed, and dictionaries are created that map each patient number to
a sequence of admissions and each admission to a sequence of ICD9 codes, along with
other mappings such as timestamps and length of stay. These mappings are used
directly by the model as a representation of the actual data required.
• Extraction of relevant codes

Only patients showing codes relevant to heart failure are considered for training.
Also, the patient must have made a minimum of three visits within twelve months.

• Visit level attention

Attention is given to each of the patient’s visits as an overall feature, considering the
length of stay and the time between visits.

• Variable level attention

Attention is given to the ICD9 codes in each visit, calculating their contributions to
the output.

• Result integration

The results from both the attention levels are integrated, and the final output of the
model is forwarded along with contribution scores for presentation and visualization.

• Visualizations and sentential explanations

Visualizations of the attention scores of each ICD9 code by visit are created to analyze
which diagnoses contributed the most to the output of the model.
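
The first two modules above (building the mappings and selecting the cohort) can be sketched as follows. The row layout (patient_id, admission_id, admit_date, icd9_code), the HF_CODES set and the function names are hypothetical, and reading "three visits within twelve months" as three consecutive visits spanning at most 365 days is an assumption of this sketch.

```python
from collections import defaultdict
from datetime import date

# Hypothetical heart-failure-related ICD9 codes, for illustration only.
HF_CODES = {"428.0", "428.9"}

def build_visit_sequences(rows):
    """Map patient id -> list of (admit_date, [ICD9 codes]) sorted by date."""
    codes_by_admission = defaultdict(list)
    admit_dates = {}
    for patient_id, admission_id, admit_date, code in rows:
        codes_by_admission[(patient_id, admission_id)].append(code)
        admit_dates[(patient_id, admission_id)] = admit_date
    patients = defaultdict(list)
    for key, codes in codes_by_admission.items():
        patients[key[0]].append((admit_dates[key], codes))
    return {pid: sorted(visits) for pid, visits in patients.items()}

def has_visits_in_window(visits, min_visits=3, window_days=365):
    """True if some run of min_visits consecutive visits fits in the window."""
    dates = [d for d, _ in visits]
    return any((dates[i + min_visits - 1] - dates[i]).days <= window_days
               for i in range(len(dates) - min_visits + 1))

def select_cohort(patients):
    """Keep patients with an HF-related code and enough visits in the window."""
    cohort = {}
    for pid, visits in patients.items():
        has_hf = any(code in HF_CODES for _, codes in visits for code in codes)
        if has_hf and has_visits_in_window(visits):
            cohort[pid] = visits
    return cohort

rows = [
    (1, 10, date(2019, 1, 5), "428.0"),
    (1, 10, date(2019, 1, 5), "250.00"),
    (1, 11, date(2019, 4, 2), "401.9"),
    (1, 12, date(2019, 9, 20), "428.0"),
    (2, 20, date(2019, 2, 1), "401.9"),
    (2, 21, date(2019, 3, 1), "250.00"),
]
cohort = select_cohort(build_visit_sequences(rows))
```

In this toy input only patient 1 is retained: patient 2 has two visits and no code from the illustrative heart failure set.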
3.2 Attention Models

Attention models work by “attending” to parts of the input while predicting the
output, instead of processing it sequentially. So, if the attention value of a particular
feature in the input is high, it implies that the feature highly influenced the output.
This gives us the advantage of being able to interpret the model to understand what
part of the input was considered while predicting whether heart failure is present or
not. Visualizing the attention maps helps us understand where the network looks when
trying to make the prediction. The attention mechanism gives the network the
capability to access its internal memory, so the network chooses what to retrieve
from memory. What the network retrieves is a weighted combination of all memory
locations.
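
As a minimal sketch of such a weighted retrieval, the snippet below scores each memory slot against a query, turns the scores into weights with a softmax, and returns the weighted combination. The dot-product scoring and the names soft_attention, query and memory are assumptions for illustration; the chapter does not specify the scoring function.

```python
import numpy as np

def soft_attention(query, memory):
    """Return attention weights over memory slots and the retrieved context."""
    scores = memory @ query                  # one relevance score per memory slot
    scores = scores - scores.max()           # stabilise the exponentials
    weights = np.exp(scores) / np.exp(scores).sum()  # softmax: weights sum to 1
    context = weights @ memory               # weighted combination of all slots
    return weights, context

memory = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
query = np.array([2.0, 0.0])
weights, context = soft_attention(query, memory)
```

Slots that align with the query receive larger weights, so inspecting weights shows which parts of the input the retrieval relied on.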
Sequences are important in many everyday tasks, be it language, where the order of
words matters, or genome data, where every sequence has a different meaning. In time
series data, time defines the occurrence of events. Thus a specific neural network
model is needed, known as the recurrent neural network, designed to work on data that
is defined by time. The medical history of a patient is vital for an accurate medical
diagnosis. RNNs make use of this medical history as sequential information to predict
the patient’s heart failure risk and the explanations behind this prediction.
Recurrent Neural Networks (RNNs) suffer from the vanishing gradient problem,
which causes information from the past to be washed out for long input sequences.
Techniques such as Long Short Term Memory (LSTM) units and Gated Recurrent Units
(GRUs) have been proposed to solve this problem. A GRU addresses it using two
vectors called gates, the reset gate and the update gate. These two vectors determine
what information (values) should be passed to the output (the next time step in the
RNN). GRUs can be trained to preserve information from long ago, without forgetting
it over time.

3.3 GRU: How It Works

Let us look at the workings of GRUs and the mathematics behind them. A GRU can be
represented diagrammatically as shown in Fig. 2.
The notations in the figure are as follows:

The sigmoid functions represent the gates mentioned earlier, one each for the
update and the reset gates.
Fig. 2 Gated recurrent unit

1. Update Gate

The update gate determines how much past information from previous time steps is
passed along to the future.
The update gate value $z_t$ for time step $t$ is calculated using the following
formula:

$$z_t = \sigma\left(W^{(z)} x_t + U^{(z)} h_{t-1}\right)$$

2. Reset Gate

This gate is used to decide how much of the past information to forget.
It is calculated as follows:

$$r_t = \sigma\left(W^{(r)} x_t + U^{(r)} h_{t-1}\right)$$

3. Final memory content

Final memory content, i.e., the output forwarded to the next time step, is calculated
in two steps.
First, the relevant information from the past is stored using the reset gate, in a
variable called the current memory content, $\tilde{h}_t$:

$$\tilde{h}_t = \tanh\left(W x_t + r_t \odot U h_{t-1}\right)$$

Finally, we calculate the $h_t$ vector, which holds the information to be passed on
down the network. The update gate is used for this. The information to be collected
from $\tilde{h}_t$ and $h_{t-1}$ is determined as follows:

$$h_t = z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t$$
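
The update gate, reset gate and memory equations of this section translate directly into a single time step. The sketch below follows those equations without bias terms, as in the text; the parameter dictionary keys, the dimensions and the random initial values are assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, p):
    """One GRU time step following the update/reset/memory equations."""
    z_t = sigmoid(p["W_z"] @ x_t + p["U_z"] @ h_prev)          # update gate
    r_t = sigmoid(p["W_r"] @ x_t + p["U_r"] @ h_prev)          # reset gate
    h_tilde = np.tanh(p["W"] @ x_t + r_t * (p["U"] @ h_prev))  # current memory content
    return z_t * h_prev + (1.0 - z_t) * h_tilde                # final memory content

rng = np.random.default_rng(0)
input_dim, hidden_dim = 3, 2
params = {name: rng.normal(scale=0.5, size=(hidden_dim, input_dim))
          for name in ("W_z", "W_r", "W")}
params.update({name: rng.normal(scale=0.5, size=(hidden_dim, hidden_dim))
               for name in ("U_z", "U_r", "U")})

h = np.zeros(hidden_dim)
for x_t in rng.normal(size=(4, input_dim)):  # a toy sequence of four inputs
    h = gru_step(x_t, h, params)
```

With all weights set to zero, both gates equal 0.5 and the current memory content is zero, so the step returns half of the previous state, which is a quick sanity check on the update rule.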

The attention mechanism-based model has two levels of attention: it first
detects influential past visits and then detects significant clinical variables within
those visits. The attention model tries to imitate a physician’s behavior during an
encounter. Just like a physician, it gives greater attention to recent clinical visits, by
considering the recent visits first and the previous visits later, i.e., in reverse order.
In contrast, stationary models often aggregate all the previous information, ignoring
anything that is time-dependent. This loses the temporal relationships present in the
input data, so inputs with large temporal differences can receive similar predictions.
Considering the last visit first therefore proves to be beneficial: the model learns
which visit is more important and is trained on the visit-specific features that
contribute to the prediction.
When a prediction is made, two kinds of contribution must be known: the visit-level
contribution, i.e., which visit (each consisting of multiple codes) contributes the
most to the final prediction, and the variable-level contribution, i.e., which variable
contributes more to the final prediction. The model can be viewed in three parts. Part 1
is a GRU that generates the visit-level attention weights. Since each visit consists
of multiple variables, Part 2 is a GRU that generates the variable-level attention
weights. Part 3 is a multi-layer perceptron (MLP) that embeds the visit information
into a lower-dimensional space while preserving interpretability. Parts 1 and 2 form
side loops that are later combined with the MLP output for prediction. As there is no
loop in the prediction process itself, the model is interpretable end to end.
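
The three-part structure described above can be sketched as follows. This is a hedged
illustration in the style of reverse time attention models such as RETAIN [5], not the
authors' code. The two GRUs are replaced by simple stand-in transformations, the output
is a single logit, and all names and sizes are assumptions made for the example.

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

rng = np.random.default_rng(1)
n_visits, n_codes, emb = 4, 6, 5

# Multi-hot visits: X[i, k] = 1 if code k was recorded at visit i.
X = (rng.random((n_visits, n_codes)) < 0.4).astype(float)

# Part 3: linear embedding of each visit into a lower-dimensional space.
W_emb = rng.normal(scale=0.3, size=(emb, n_codes))
V = X @ W_emb.T                                    # (n_visits, emb)

# Stand-ins for the hidden states of the two GRUs (Parts 1 and 2).
# A real model would run GRUs over the visits in reverse time order.
G = np.tanh(V @ rng.normal(size=(emb, emb)))
H = np.tanh(V @ rng.normal(size=(emb, emb)))

w_alpha = rng.normal(size=emb)
W_beta = rng.normal(size=(emb, emb))
alpha = softmax(G @ w_alpha)       # visit-level attention: one scalar per visit
beta = np.tanh(H @ W_beta.T)       # variable-level attention: one vector per visit

# Context vector: attention-weighted sum of the visit embeddings.
context = (alpha[:, None] * beta * V).sum(axis=0)
w_out = rng.normal(size=emb)
logit = w_out @ context

# Contribution of code k at visit i to the logit.
contrib = alpha[:, None] * ((beta * w_out) @ W_emb) * X
```

Because the path from the inputs to the logit is linear once the attention weights are
fixed, the per-code contributions add up exactly to the logit, which is what makes the
prediction decomposable by visit and by variable.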
There are two major advantages of the model:
1. Running the GRU in reverse time order is computationally advantageous.
2. Prediction can improve substantially when timestamps are used. Timestamps
provide the duration of the time spent by the patient in the ICU. This parameter
adds to the accuracy of the model, as longer ICU stays can indicate increased
risk.
Patients with heart failure and their qualifying ICD_9 diagnosis codes were extracted
from the MIMIC III dataset to train and test the attention-based explanatory models.
The prepared dataset contains 2349 patients with 9587 admission records, 135,709
diagnosis records, and 2989 unique ICD_9 codes. The extracted patients satisfy two
conditions: they have at least 3 visits, and they have diagnosis codes from a list of
heart failure related diagnosis codes. This list has been compiled from data provided
by the creators of the MIMIC III dataset itself and by some experts in the field.
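
The two inclusion conditions can be illustrated with a small filter. The records, the
patient identifiers, and the one-entry qualifying code list below are hypothetical; the
actual list is the one compiled from the MIMIC III creators and domain experts.

```python
from collections import defaultdict

# Hypothetical admission records: (patient_id, admission_id, ICD-9 codes).
admissions = [
    ("p1", "a1", ["428", "585"]),
    ("p1", "a2", ["427"]),
    ("p1", "a3", ["428", "276"]),
    ("p2", "a4", ["428"]),
    ("p2", "a5", ["305"]),
    ("p3", "a6", ["585"]),
    ("p3", "a7", ["276"]),
    ("p3", "a8", ["305"]),
]

HF_CODES = {"428"}   # hypothetical stand-in for the qualifying code list
MIN_VISITS = 3       # "at least 3 visits"

# Group the code lists by patient.
visits = defaultdict(list)
for patient_id, _admission_id, codes in admissions:
    visits[patient_id].append(codes)

# Keep patients with enough visits and at least one qualifying code.
cohort = sorted(
    pid for pid, vs in visits.items()
    if len(vs) >= MIN_VISITS and any(HF_CODES & set(v) for v in vs)
)
```

Here `p2` is excluded for having only two visits and `p3` for having no qualifying code.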
• Hyperparameter tuning:
It is very easy to achieve a very high training accuracy using dense neural networks,
but such models might not generalize well to the validation and test sets. Conversely,
eschewing deep or complex architectures may lead to low accuracy on the data sets.
Hence, a sweet spot has to be found that generalizes well and has a high accuracy.
Some models fail because saddle points and local minima make the gradients zero.
Hence, hyperparameters such as the learning rate need to be tweaked, and the optimizer
changed to either Adam or Adadelta, so that the model does not get stuck and stop
learning.
Hyperparameters:
• Number of Layers: It must be chosen wisely, as a very high number may introduce
problems such as overfitting and vanishing or exploding gradients, while a lower
number yields a model with high bias and low potential. As the model has two
separate GRU units, one for visit-level codes and one for variable-level codes,
these are trained with 128 hidden alpha layers and 128 hidden beta layers,
respectively. The model's performance metric is accuracy, and changing the number
of hidden layers to 256 produced no significant change in accuracy. Thus the
hidden layer count is kept at 128 to maintain the simplicity of the model. The
size of the linear embedding applied to the initial list of integers was changed
from 128 to 256, as this increased the accuracy of the model by 9% and
substantially decreased the cost function.
• Activation Function: The popular choices are ReLU, Sigmoid, Tanh, and
LeakyReLU. The GRU models, with their update gate (which determines how much of
the past information is passed on) and reset gate (which decides how much of the
past information to forget), use sigmoid and tanh as activation functions.
• Optimizer: It is the algorithm used by the model to update the weights of every
layer after every iteration to minimize the cost function. For this model,
AdaGrad was initially used as the optimizer, but it has concerns of its own, such
as a continually decaying learning rate η and the need to select the learning
rate η manually. To resolve these concerns, the optimizer was switched to Adadelta.
• Initialization: It does not play a very big role, as the defaults work well, but
one must still avoid initializing the weights to zero or to any constant value
(the same across all units). The weights for the linear embedding, the visit
level (alpha), and the variable level (beta) are initialized between −1 and 1.
The biases are initialized with zeros of suitable data type and format.
• Batch Size: It is the number of patterns shown to the network before the weight
matrix is updated. If the batch size is small, patterns repeat less, so the
weights would be all over the place and convergence would become difficult. For
this model, the batch size is set to 100. This was appropriate, as modifying it
any further only increased the execution time.
• Number of Epochs: The number of epochs is the number of times the whole training
dataset is passed through the model. Seventeen epochs are used here, as
increasing or decreasing this number further does not affect the accuracy. The
number of epochs is an important hyperparameter, since an increase might result
in overfitting of the model and a decrease may yield poor results because the
model may not reach its full potential. Overfitting leads to poor generalization
on unseen data.
• Dropout: The keep-probability of the Dropout layer can be thought of as a
hyperparameter that acts as a regularizer to help find the optimum bias-variance
spot. Dropout is applied in two places: (1) to the input embedding, and (2) to
the context vector c_i. Both dropout rates are 0.4. This value was chosen because
it complements the performance metrics of the model. Dropout values affect
performance, so it is recommended to tune them for the data.
• L1/L2 Regularization: Any machine learning model needs to learn from all
features provided to it. L2 regularization is applied to W_emb (weight of the
linear embedding layer), w_alpha (weight of the visit-level GRU model), W_beta
(weight of the variable-level GRU model), and w_output (weight at the output
layer, after the concatenation of the alpha and beta weights with the embedding
of the input vector).
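
The reported settings can be collected in one place, together with a minimal sketch of
the two regularizers. The key names, the L2 coefficient, and the use of inverted dropout
are assumptions for illustration. The value 0.4 is treated here as the fraction of units
dropped, although the text above also refers to a keep-probability.

```python
import numpy as np

# Values reported in the list above; the key names are illustrative.
config = {
    "embedding_size": 256,
    "alpha_hidden": 128,
    "beta_hidden": 128,
    "optimizer": "adadelta",
    "batch_size": 100,
    "epochs": 17,
    "dropout_embedding": 0.4,
    "dropout_context": 0.4,
}

def dropout(x, rate, rng):
    """Inverted dropout: zero a fraction `rate` of units and rescale the rest."""
    keep = 1.0 - rate
    mask = rng.random(x.shape) < keep
    return x * mask / keep

def l2_penalty(weights, coeff):
    """Sum of squared weights scaled by a coefficient (the value is assumed)."""
    return coeff * sum(float(np.sum(w ** 2)) for w in weights)

rng = np.random.default_rng(0)
# Weights initialized between -1 and 1, as described for the embedding.
emb = rng.uniform(-1.0, 1.0, size=(config["batch_size"], config["embedding_size"]))
dropped = dropout(emb, config["dropout_embedding"], rng)
penalty = l2_penalty([emb], coeff=1e-4)
```

In training, the penalty would be added to the cost for W_emb, w_alpha, W_beta, and
w_output, and dropout would be disabled at test time.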
The trained model is evaluated on the test dataset using a performance measure. The
cost function measures the difference between the predicted values and their
corresponding real values. To minimize this cost (train_cost), the Adadelta
optimization algorithm is used. Adadelta is an optimization algorithm from the family
of stochastic gradient descent algorithms. It searches for the minimum cost value by
updating the weights according to the loss, so each iteration tries new weight values.
The model is first run with some initial weights, and the algorithm then updates them,
trying to find the right combination over thousands of iterations. It is important to
note that Adadelta is looking for the minimum cost, not the minimum weights; it only
updates the weights and does not minimize them.
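
The update rule can be sketched on a toy cost function. This is a minimal NumPy version
of Adadelta rather than the training code used for the model. The decay rate `rho`, the
constant `eps`, and the quadratic cost are assumptions chosen for illustration.

```python
import numpy as np

def adadelta_minimize(grad_fn, w0, steps=500, rho=0.95, eps=1e-6):
    """Minimise a function with Adadelta, given its gradient."""
    w = np.asarray(w0, dtype=float).copy()
    avg_sq_grad = np.zeros_like(w)   # running average of squared gradients
    avg_sq_step = np.zeros_like(w)   # running average of squared updates
    for _ in range(steps):
        g = grad_fn(w)
        avg_sq_grad = rho * avg_sq_grad + (1 - rho) * g ** 2
        # The step size comes from the ratio of the two running averages,
        # so no learning rate has to be chosen by hand.
        step = -np.sqrt(avg_sq_step + eps) / np.sqrt(avg_sq_grad + eps) * g
        avg_sq_step = rho * avg_sq_step + (1 - rho) * step ** 2
        w += step
    return w

cost = lambda w: float(np.sum((w - 3.0) ** 2))   # toy cost with minimum at w = 3
grad = lambda w: 2.0 * (w - 3.0)

w_start = np.zeros(2)
w_end = adadelta_minimize(grad, w_start)
```

The weights move toward the values that minimize the cost (here 3), not toward zero,
which matches the distinction drawn above between minimum cost and minimum weights.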

3.4 LIME Algorithm

The general approach LIME takes to achieve the goal is as follows:

1. For each prediction to explain, permute the observation n times.
2. Let the complex model predict the outcome of all permuted observations.
3. Calculate the distance from all permutations to the original observation.
4. Convert the distance to a similarity score.
5. Select m features best describing the complex model outcome from the permuted
data.
6. Fit a simple model to the permuted data, explaining the complex model outcome
with the m features from the permuted data weighted by its similarity to the original
observation.
7. Extract the feature weights from the simple model and use these as explanations
for the complex model's local behavior.
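
The seven steps above can be sketched from scratch as follows. This is an illustration
of the procedure, not the LIME library. The Gaussian perturbation, the exponential
kernel, the kernel width, and selecting features by the size of their weights in a full
fit are all assumptions made for the example.

```python
import numpy as np

def lime_explain(predict_fn, x, n=500, m=2, kernel_width=0.75, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: perturb the observation n times.
    Z = x + rng.normal(scale=1.0, size=(n, x.size))
    # Step 2: let the complex model predict the perturbed observations.
    y = predict_fn(Z)
    # Steps 3-4: distance to the original, converted to a similarity score.
    dist = np.linalg.norm(Z - x, axis=1)
    sim = np.exp(-(dist ** 2) / kernel_width ** 2)

    def weighted_fit(cols):
        # Weighted least squares via row scaling by sqrt(similarity).
        A = np.column_stack([np.ones(n), Z[:, cols]])
        sw = np.sqrt(sim)[:, None]
        coef, *_ = np.linalg.lstsq(A * sw, y * sw[:, 0], rcond=None)
        return coef[1:]   # drop the intercept

    # Step 5: keep the m features with the largest weights in a full fit.
    full = weighted_fit(np.arange(x.size))
    top = np.sort(np.argsort(-np.abs(full))[:m])
    # Steps 6-7: fit the simple model on those features, return its weights.
    return dict(zip(top.tolist(), weighted_fit(top).tolist()))

# A toy "complex model" whose true local behaviour is known.
black_box = lambda Z: 2.0 * Z[:, 0] - 3.0 * Z[:, 1] + 0.0 * Z[:, 2]
explanation = lime_explain(black_box, np.array([1.0, 0.5, -2.0]))
```

For this linear toy model the explanation recovers the weights of features 0 and 1 and
discards feature 2, which has no influence on the prediction.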

4 Results and Discussions

Results for the LIME algorithm using Multilayer Perceptron, Random Forest and Naïve
Bayes are described below.

4.1 Multi-layer Perceptron (MLP)

A multilayer perceptron (MLP) is composed of layers of three types: an input layer
that receives the signal, an output layer that makes a decision or prediction, and
any number of hidden layers between the input and output that perform the actual
processing of the features.
The number of hidden layers used in the MLP was 14, with the ReLU activation function
and the lbfgs solver.
The mean accuracy obtained was 83.11%.
Figure 3 shows the result when the explainer model, LIME, was run on the MLP as a
black-box model.

Fig. 3 Multi-layer perceptron (MLP)



Fig. 4 Random forest algorithm

4.2 Random Forest Algorithm

The Random Forest classifier, an ensemble classifier, is used for classifying objects.
It builds a set of decision trees, each from a selected subset of the training set,
aggregates the decisions from the different decision trees, and calculates the final
class of the test object. The accuracy obtained for the Cleveland dataset was 77%,
as shown in Fig. 4.

4.3 Naive Bayes Algorithm

It is a statistical classifier based on Bayes' theorem. It uses the class conditional
independence assumption, which simplifies the calculation of probabilities, hence the
name Naive Bayes. The simplification makes the calculations more tractable. It gives
an accuracy of 77%, as shown in Fig. 5.
From the above results it can be said that, for this particular patient, the model
predicted the presence of heart disease. On average, the features Thalach (maximum
heart rate), presence of exercise-induced angina, asymptomatic chest pain, ca (number
of major vessels), and resting electrocardiographic results contributed positively to
the prediction of heart disease for the patient. Thus these features must be monitored
in the future to decrease the risk of heart failure. The feature “Thal” has a value
less than 3, indicating it is normal, and thus does not contribute to the indication
of heart disease in the patient, or rather contributes negatively.

Fig. 5 Naive Bayes algorithm

4.4 Results for Attention Mechanisms

Changing the number of input layers from 128 to 256 increased the accuracy from an
initial 73% to 82.6%, as shown in Figs. 6 and 7.
In Figs. 8 and 9, D_428 denotes Congestive Heart Failure, D_427 Tachycardia, and
D_996 Mechanical Complication due to a cardiac pacemaker.
On running the model on test data, a ‘.txt’ file is generated. It contains the
contribution score of each ICD9 code with respect to each visit for an individual
patient. The resulting graph shows the contribution of the ICD9 code at a particular
visit for a particular patient.

Fig. 6 Before proper hyperparameter tuning

Fig. 7 Final hyperparameter tuning
As shown in Fig. 10, the accuracy of the RNN model was 82.5%. Some diseases were
misclassified at this accuracy, but considering that the model is also capable of
interpretation, the accuracy can be considered acceptable. Interpretability also
allows the engineer to see where the model is giving wrong predictions and must be
improved, thus increasing transparency.
Figure 11 plots the number of patients found to have a particular disease. For
example, Chronic Kidney Disease has the highest count of 1670 patients; similarly,
Acidosis is found in 1638 patients, etc.
Figure 12 plots the diseases that, on average, contribute negatively and have the
least effect on heart failure. For example, Tobacco use disorder contributes as low
as −0.0134; similarly, Obesity, with an average of −0.0127, also contributes little.

Fig. 8 Various patients considered for testing

Fig. 9 Textual explanations for predictions

Figure 13 plots the diseases that, on average, contribute positively and have the most
effect on heart failure. For example, Atrial fibrillation contributes the most, with
an average score of 0.0253; similarly, Congestive Heart Failure, with an average of
0.0206, also contributes strongly.
Figure 14 shows the total number of times a particular disease was found in a visit
for a particular patient. For example, Chronic Kidney Disease was found 4029 times in
total; similarly, Congestive Heart Failure was found 3742 times, etc.
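
The averages and counts reported above can be derived from per-visit contribution
scores along the following lines. The rows below are hypothetical, since the layout of
the generated ‘.txt’ file is not specified here, and the code labels are only examples.

```python
from collections import defaultdict

# Hypothetical rows: (patient, visit, ICD-9 code, contribution score).
rows = [
    ("p1", 1, "D_428", 0.030),
    ("p1", 2, "D_427", 0.010),
    ("p2", 1, "D_428", 0.010),
    ("p2", 1, "D_305", -0.020),
    ("p2", 2, "D_305", -0.010),
]

totals = defaultdict(float)   # summed contribution per code
counts = defaultdict(int)     # number of occurrences per code
patients = defaultdict(set)   # distinct patients per code
for patient, _visit, code, score in rows:
    totals[code] += score
    counts[code] += 1
    patients[code].add(patient)

# Average contribution per code, split by sign and ranked.
average = {code: totals[code] / counts[code] for code in totals}
positive = sorted((c for c in average if average[c] > 0), key=average.get, reverse=True)
negative = sorted((c for c in average if average[c] < 0), key=average.get)
n_patients = {code: len(p) for code, p in patients.items()}
```

The occurrence counts correspond to a plot like Fig. 14, the patient counts to Fig. 11,
and the ranked averages to Figs. 12 and 13.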
Fig. 10 Graphical output for a patient using ante-hoc explanatory model: ATTENTION mechanism

Fig. 11 Total patients versus diagnosis


Fig. 12 Negatively contributing ICD_9 codes

Fig. 13 Positively contributing ICD_9 codes

5 Conclusions

Artificial Intelligence (AI), and neural networks in particular, have seen
unprecedented advancement in the last decade, mainly due to constantly improving
computational capabilities. However, these advancements have not been harnessed from
a business and social perspective owing to a lack of trust in the models due to their
black-box nature, with business applications still using relatively simple and less
accurate algorithms.

Fig. 14 Occurrence of ICD_9 codes

This conundrum signals an urgent need to bring explainability and interpretability to
deep neural networks. This chapter addresses this need by describing an explainable
neural network, which can explain its own predictions, while also comparing it with a
post hoc explainer like LIME. Predicting the possibility of heart failure in an
interpretable manner would give doctors an early warning and help reduce readmission
rates. The model using RNN gives 82.5% accuracy. This solution would contribute
towards building trust in AI, and also towards putting neural networks into widespread
and constructive use. In future, the model can be extended to predict other diseases.
The efficacy of ensemble algorithms can be tested for more precise predictions.

References

1. Goldberger, A.L., Amaral, L.A.N., Glass, L., Hausdorff, J.M., Ivanov, P.Ch., Mark, R.G., Mietus,
J.E., Moody, G.B., Peng, C.-K., Stanley, H.E.: PhysioBank, PhysioToolkit, and PhysioNet:
components of a new research resource for complex physiologic signals. Circulation
101(23), e215–e220 (June 13, 2000). (Circulation Electronic Pages;
http://circ.ahajournals.org/content/101/23/e215.full)
2. MIMIC III dataset (Medical Information Mart for Intensive Care III). https://mimic.physionet.org/
3. Holzinger, A., Biemann, C., Pattichis, C.S., Kell, D.B.: What do we need to build explainable
AI systems for the medical domain (2017). arXiv:1712.09923v1
4. Zhao, C., Shen, Y., Yao, L.-P.: Convolutional neural network-based model for patient
representation learning to uncover temporal phenotypes for heart failure (2017)
5. Choi, E., Bahadori, M.T., Kulas, J.A., Schuetz, A., Stewart, W.F., Sun, J.: RETAIN: an
interpretable predictive model for healthcare using reverse time attention mechanism. In: 30th
Conference on Neural Information Processing Systems (NIPS), Barcelona, Spain (2016)
6. Ribeiro, M.T., Singh, S., Guestrin, C.: Why should I trust you? Explaining the predictions of
any classifier (2016). arXiv:1602.04938
7. Bahdanau, D., Cho, K.H., Bengio, Y.: Neural machine translation by jointly learning to align
and translate. In: ICLR (2015)
8. Choi, E., Bahadori, M.T., Song, L., Stewart, W.F., Sun, J.: GRAM: graph-based attention model
for healthcare. In: ACM SIGKDD International Conference on Knowledge Discovery and Data
Mining, pp. 787–795 (2017)
9. Cleveland Heart Disease Dataset (1988). https://archive.ics.uci.edu/ml/datasets/heart+Disease

Prof. Sujata Khedkar is an Associate Professor in the Computer Engineering Department,
Vivekanand Education Society’s Institute of Technology, University of Mumbai, India. Her current
research focuses on Artificial Intelligence and Big Data Analytics. She is a member of the ISTE and
CSI.

Priyanka Gandhi is a Software Developer with active research interests in the field of AI and
Big Data Technologies. She received a Bachelor's degree in Computer Engineering from V.E.S.I.T.,
University of Mumbai in 2019.

Gayatri Shinde is a Software Developer with active research interests in the field of AI and
Deep Learning. She received a Bachelor's degree in Computer Engineering from V.E.S.I.T.,
University of Mumbai in 2019.

Vignesh Subramanian is a Software Developer with active research interests in the
field of AI and Big Data Analytics. He received a Bachelor's degree in Computer Engineering from
V.E.S.I.T., University of Mumbai in 2019.
Deep Learning for Analysis of Electronic
Health Records (EHR)

Pawan Singh Gangwar and Yasha Hasija

Abstract In the current scenario, every piece of medical equipment, clinical instrument,
and lab setup in healthcare centres and hospitals is linked with digital devices, which
has brought about a digital data explosion. As a result, the amount of digital information
generated and stored in Electronic Health Records (EHRs) has increased exponentially.
EHRs have therefore become an area of booming research, as the data contained in them
opens up a host of untapped possibilities. EHRs employ several classification schemata
and controlled vocabularies to record relevant medical information and events. Thus,
harmonizing and analysing data among institutions and across terminologies is an ongoing
field of research. Various deep learning EHR systems have proposed clinical code
representation forms that lend themselves easily to cross-institutional analysis and
applications. The primary use of EHRs is storing patient information such as medical
history, progress, demography, diagnosis and medications. But researchers across the
globe have found secondary uses of EHRs for several clinical and health informatics
applications. Secondary usage of EHRs promises to boost clinical research and result in
better informed clinical decision making. The challenge of summarizing and representing
patient data prevents the widespread practice of predicting the future of patients using
EHRs. Simultaneously, the machine learning field has witnessed widespread advancements
in the area of deep learning. Current research in healthcare informatics focusses on
applying deep learning based on EHRs to clinical tasks. In this context, the deep
learning techniques described here can be applied to various types of clinical
applications such as information extraction, representation learning, outcome
prediction, phenotyping and de-identification. Several limitations of current research
have been identified, such as model interpretability and heterogeneity of data.

P. S. Gangwar · Y. Hasija (B)


Delhi Technological University, Delhi 110042, India
e-mail: [email protected]
P. S. Gangwar
e-mail: [email protected]

© Springer Nature Switzerland AG 2020
S. Dash et al. (eds.), Deep Learning Techniques for Biomedical and Health Informatics,
Studies in Big Data 68, https://doi.org/10.1007/978-3-030-33966-1_8

1 Introduction

Over the past decade, hospital adoption of electronic health record (EHR) systems has
increased many fold, supported by $30 billion in incentives for healthcare
organizations, hospitals and physicians to adopt EHR systems [1]. According to the most
recent report, about 84% of hospitals have adopted at least a basic EHR system, a 9-fold
increase from 2008 [2]. Moreover, office-based physician adoption of basic and certified
EHRs has increased from 42% to 87% [3].
EHR systems store data for every patient encounter, including demographic information,
laboratory tests and results, diagnoses, prescriptions, clinical notes, radiological
images, and so forth [1]. While primarily designed for improving healthcare efficiency
from an operational standpoint, many studies have found secondary uses for clinical
data [4, 5]. In particular, the patient data contained in EHR systems has been used for
tasks such as medical concept extraction [6, 7], disease inference, patient trajectory
modelling, clinical decision support systems, and more (Table 1).
Until the last few years, most techniques for analysing rich EHR data relied on
traditional statistical and machine learning methods such as logistic regression,
support vector machines (SVM), and random forests [13]. Recently, deep learning
techniques have achieved great success in many domains by capturing long-range
dependencies and constructing deep hierarchical features from data in an effective
manner [14]. Given the rise in popularity of deep learning approaches and the
increasingly vast amount of patient data, there has also been an increase in the number
of publications that apply deep learning to EHR data for clinical informatics tasks,
yielding better performance than traditional methods while requiring less
time-consuming pre-processing and feature engineering.
This chapter reviews the specific deep learning techniques employed for EHR data
analysis and inference, and discusses the concrete clinical applications enabled by
these advances. Unlike other recent surveys, which reviewed deep learning in the broad
context of health informatics applications ranging from genome analysis to biomedical
image analysis, this chapter focuses only on deep learning techniques applied to EHR
data [15].

Table 1 Recent deep EHR projects

Project       Deep EHR task                      References
Deepr         Hospital re-admission prediction   [8]
DeepPatient   Multi-outcome prediction           [9]
Doctor AI     Prediction of heart failure        [10]
DeepCare      EHR concept representation         [11]
eNRBM         Stratification of suicide risk     [12]
Med2Vec       EHR concept representation         [10]

Fig. 1 Patterns in the quantity of publications relating to deep EHR [16]

Selection Criteria and Search Strategy

The review covers publications published up to August 2017. All queries include the
term “electronic-health-records” or “EHR” or “electronic-medical-records” or “EMR”,
together with “deep learning” or a specific deep learning technique (Sect. 4). Figure 1
shows the distribution of the number of publications per year in a variety of areas
related to deep EHR. The first plot of Fig. 1 contains the overall results for
“deep-learning” and “electronic-health-records”, which illustrates the general yearly
increase in the number of publications related to deep learning and EHR. The second
plot shows these same terms in conjunction with a variety of specific application
areas. For these queries, variations of the included terms are used, for example:
“recurrent neural network” OR “RNN”, “deep-learning”, “electronic-health-records”. As
the overall number of publications is relatively small, the most prominent and original
deep EHR publications are included in the rest of the chapter.
Section 2 gives an overview of EHR systems. Key machine learning concepts are then
explained in Sect. 3, followed by deep learning frameworks in Sect. 4. Section 5
discusses recent applications of deep learning (DL) to EHR data analysis. Finally, the
chapter concludes by identifying current challenges and future opportunities in Sect. 7.

2 Electronic Health Record (EHR) Systems

Usage of EHR systems has greatly expanded in both hospital and ambulatory care
settings [2]. EHR use at hospitals and clinics can improve patient care by reducing
errors, increasing efficiency, and improving care coordination, while also providing a
rich source of data for researchers. EHR systems can vary in terms of functionality,
and are typically categorized into basic EHR without clinical notes, basic EHR with
clinical notes, and comprehensive systems. While lacking more advanced functionality,
even basic EHR systems can provide information on a patient's medical history,
complications, and medication use. Since EHRs were primarily designed for internal
hospital administrative tasks, several classification schemata exist for recording
relevant medical information and events. Examples include diagnosis codes, procedure
codes, laboratory observations, and medication codes. These codes can vary between
institutions, with partial mappings maintained by resources. Given the wide array of
schemata, harmonizing and analysing data across terminologies and between institutions
is an ongoing area of research. Several of the deep EHR systems in this chapter propose
forms of clinical code representation that lend themselves more easily to
cross-institutional analysis and applications. EHR systems store several types of
patient information, including demographics, diagnoses, physical exams, sensor
measurements, laboratory test results, prescribed or administered medications, and
clinical notes [15]. EHR data is heterogeneous and includes the following data types:
(1) Numerical quantities, for instance, BMI (body mass index),
(2) Datetime objects, for instance, date of birth or time of admission,
(3) Categorical values, and
(4) Natural language free text, for instance, progress notes or discharge summaries.
In addition, these data types can be ordered chronologically to form the basis for
(5) Derived time series, for instance, perioperative vital signs or multimodal
patient history.
While other biomedical data, such as medical images or genomic information, exist and
are covered in recent relevant articles, this review focuses on these five data types
found in most modern EHR systems.
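
To make the five data types concrete, the following sketch shows how a single patient
record might carry them side by side. The class, its field names, and the sample values
are hypothetical and are not taken from any particular EHR system.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Tuple

@dataclass
class PatientRecord:
    """Illustrative container for the five EHR data types listed above."""
    bmi: float                    # (1) numerical quantity
    admitted_at: datetime         # (2) datetime object
    diagnosis_code: str           # (3) categorical value
    discharge_summary: str        # (4) natural language free text
    # (5) derived time series: chronologically ordered (time, value) pairs
    heart_rate: List[Tuple[datetime, float]] = field(default_factory=list)

    def add_heart_rate(self, when: datetime, bpm: float) -> None:
        """Insert a measurement and keep the series in time order."""
        self.heart_rate.append((when, bpm))
        self.heart_rate.sort(key=lambda pair: pair[0])

record = PatientRecord(27.4, datetime(2017, 8, 1, 9, 30), "428", "Stable at discharge.")
record.add_heart_rate(datetime(2017, 8, 1, 11, 0), 82.0)
record.add_heart_rate(datetime(2017, 8, 1, 10, 0), 90.0)
```

The measurements are stored in chronological order regardless of the order in which
they arrive, which is what turns individual observations into a time series.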

3 An Overview of Machine Learning

Machine learning approaches can be broadly divided into two major categories:
supervised and unsupervised learning. Supervised learning techniques involve inferring
a mapping function y = f(x) from inputs x to outputs y. Examples of supervised learning
tasks include regression and classification, with algorithms including logistic
regression and support vector machines. In contrast, the goal of unsupervised learning
techniques is to learn interesting properties of the distribution of x. Examples of
unsupervised learning tasks include clustering and density estimation. The
representation of inputs is a fundamental issue spanning all types of machine learning
frameworks. For each data point, sets of attributes known as features are extracted to
be used as input to machine learning frameworks. In traditional machine learning, these
features were hand-crafted based on domain knowledge. One of the core principles of
deep learning is the automatic, data-driven extraction of features.

4 Deep Learning and Its Approaches

Deep learning encompasses a wide variety of techniques. This section gives a brief
overview of the most common deep learning approaches. For each architecture, a key
equation is included that describes its fundamental mode of operation.
The most central idea in deep learning is that of representation. Traditionally, input
features to a machine learning algorithm must be hand-crafted from raw data, relying on
practitioner expertise and domain knowledge to determine explicit patterns of prior
interest. The engineering process of creating, analysing, selecting, and evaluating
appropriate features can be laborious and time consuming, and is often thought of as a
“black art” requiring creativity, trial and error, and oftentimes luck. In contrast,
deep learning techniques learn optimal features directly from the data itself, without
any human guidance, allowing for the automatic discovery of latent data relationships
that might otherwise be unknown or hidden. Complex data representation in deep learning
is often expressed as compositions of other, simpler representations. For example,
recognizing a person in an image can involve finding representations of edges from
pixels. This notion of unsupervised hierarchical representation of increasing
complexity is a recurring theme in deep learning. The vast majority of deep learning
algorithms and architectures are built upon the framework of the artificial neural
network (ANN). ANNs are composed of a number of interconnected nodes (neurons),
arranged in layers as shown in Fig. 2. The network parameters θ are learned by
minimizing a loss function such as the one in Eq. (1).


$$E(\theta, D) = -\sum_{i=0}^{D} \left[\log P(Y = y_i \mid x_i, \theta)\right] + \lambda \lVert \theta \rVert_p \tag{1}$$

The first term in Eq. (1) minimizes the sum of the log loss over the entire training
dataset D; the second term minimizes the p-norm of the learned model parameters θ and
is controlled by a tunable parameter λ. This second term is known as regularization,
a technique used to prevent a model from overfitting and to increase its ability to
generalize to new, unseen examples. The loss function is typically optimized using
backpropagation, a mechanism for weight optimization that minimizes the loss by
working backwards through the network.
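
Eq. (1) can be evaluated directly for a small model. The sketch below uses a linear
softmax classifier as the model for P(Y | x, θ), which is an assumption made for
illustration, as are the data, λ, and the choice p = 2.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # subtract row maxima for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def regularized_loss(theta, X, y, lam=0.01, p=2):
    """Eq. (1): negative log-likelihood plus lam times the p-norm of theta."""
    probs = softmax(X @ theta)                       # P(Y | x_i, theta)
    log_lik = np.log(probs[np.arange(len(y)), y])    # log P(Y = y_i | x_i, theta)
    return -log_lik.sum() + lam * np.linalg.norm(theta.ravel(), ord=p)

X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])   # three toy inputs
y = np.array([0, 1, 0])                               # their class labels
theta = np.zeros((2, 2))                              # all-zero parameters
loss_at_zero = regularized_loss(theta, X, y)
```

With all-zero parameters each class has probability 0.5 and the penalty vanishes, so
the loss equals 3 log 2. Training would adjust θ to lower the first term while λ keeps
the norm of θ in check.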
The rest of this section reviews several common types of deep learning models used
for deep EHR applications, all of which are based on the ANN's architecture and
optimization strategy. A hierarchical view of these common deep learning models for
analysing EHR data, along with selected works in this survey that implement them, is
shown in Fig. 3.

Fig. 2 A fundamental neural network [16]

4.1 Multilayer Perceptron (MLP)

An MLP is a type of ANN that consists of multiple hidden layers, in which every neuron
in layer i is fully connected to every neuron in layer i + 1. Typically, these networks
are limited to a few hidden layers, and the data flows in one direction only, unlike
recurrent or undirected models. Extending the notion of a single-layer ANN, each hidden
unit computes a weighted sum of the outputs from the previous layer, followed by a
nonlinear activation σ of the calculated sum, as in Eq. (2). Here, d is the number of
units in the previous layer, $x_j$ is the output from the previous layer's jth node,
and $w_{ij}$ and $b_{ij}$ are the weight and bias terms associated with each $x_j$.
Traditionally, sigmoid or tanh were chosen as the nonlinear activation functions, but
modern networks use functions such as rectified linear units (ReLU) [17].
$$h_i = \sigma\left(\sum_{j=1}^{d} x_j w_{ij} + b_{ij}\right) \tag{2}$$

After the hidden layer weights are optimized during training, the network learns an
association between input x and output y. As more hidden layers are added, the input
data is expected to be represented in an increasingly abstract manner owing to each
hidden layer's nonlinear activation. While the MLP is one of the simplest models,
other architectures often incorporate fully connected neurons as well.
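
The layer computation in Eq. (2) can be written in a few lines. This sketch uses ReLU as
the activation and one bias per hidden unit, a common simplification of the per-connection
bias written in the equation. The weights and inputs are made-up values.

```python
import numpy as np

def relu(a):
    return np.maximum(a, 0.0)

def dense(x, W, b, activation=relu):
    """h_i = activation(sum_j x_j * W[i, j] + b[i])."""
    return activation(W @ x + b)

x = np.array([1.0, -2.0, 0.5])        # outputs of the previous layer (d = 3)
W1 = np.array([[0.5, 0.0, 1.0],
               [1.0, 1.0, 0.0]])      # two hidden units, three inputs each
b1 = np.array([0.0, 0.5])
W2 = np.array([[1.0, -1.0]])          # one output unit
b2 = np.array([0.1])

h1 = dense(x, W1, b1)      # hidden layer
out = dense(h1, W2, b2)    # the next layer applies the same rule to h1
```

Stacking calls to `dense` is all that a forward pass through an MLP involves; the
second hidden unit here is switched off by ReLU because its weighted sum is negative.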

4.2 Convolutional Neural Networks (CNN)

CNNs have become a very popular tool in recent years, especially in the image
processing community. CNNs impose local connectivity on the raw data. For instance,
rather than treating a 50 × 50 image as 2500 unrelated pixels, more meaningful
features can be extracted by viewing the image as a collection of local pixel patches.
Similarly, a one-dimensional (1D) time series can be considered as a collection of
local signal segments. The equation for 1-D convolution is shown in Eq. (3), where x
is the input signal and w is the weighting function or convolutional filter.
Fig. 3 The most common deep learning architectures for analysing EHR data [16]

Fig. 4 Convolutional neural network (CNN) for ordering pictures [16]



$$C_{1d} = \sum_{a=-\infty}^{\infty} x(a)\, w(t - a) \tag{3}$$

In the same way, two-dimensional (2D) convolution is given in Eq. (4), where X is
a 2D grid and K is the kernel.

$$C_{2d} = \sum_{m} \sum_{n} X(m, n)\, K(i - m, j - n) \tag{4}$$

CNNs feature sparse interactions, as the filters are typically smaller than
the input, resulting in a relatively small number of parameters. Convolution
also encourages parameter sharing, since each filter is applied across
the entire input. In a CNN, a convolutional layer consists of several of the
convolutional filters described above, all receiving the same input from the
previous layer, which ideally learn to extract different lower-level features.
Following this, a subsampling or pooling layer is typically applied to aggregate
the extracted features (Fig. 4).
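The convolution and pooling operations described in this section can be sketched in plain Python as follows. This is a hedged illustration for finite signals only: the function names, the choice of "full" output for the 1D case, "valid" output for the 2D case, and non-overlapping max pooling are assumptions made for the example.

```python
def conv1d(x, w):
    """Discrete 1D convolution C(t) = sum_a x(a) * w(t - a) for finite lists."""
    out = [0.0] * (len(x) + len(w) - 1)
    for a, x_a in enumerate(x):
        for k, w_k in enumerate(w):
            out[a + k] += x_a * w_k  # contributes to output index t = a + k
    return out

def conv2d_valid(X, K):
    """2D convolution restricted to positions where the kernel fits entirely.

    The kernel is flipped in both directions, as convolution requires.
    """
    kh, kw = len(K), len(K[0])
    out = []
    for i in range(len(X) - kh + 1):
        row = []
        for j in range(len(X[0]) - kw + 1):
            s = 0.0
            for m in range(kh):
                for n in range(kw):
                    s += X[i + m][j + n] * K[kh - 1 - m][kw - 1 - n]
            row.append(s)
        out.append(row)
    return out

def max_pool1d(x, size):
    """Non-overlapping max pooling that summarizes each window by its maximum."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, size)]

signal = [1.0, 2.0, 3.0]
difference_filter = [1.0, -1.0]
c1 = conv1d(signal, difference_filter)

image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 0], [0, -1]]
c2 = conv2d_valid(image, kernel)

pooled = max_pool1d([1, 3, 2, 5, 4], 2)
```

Note that the same small filter is reused at every position, which is the parameter sharing mentioned in the text.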

4.3 Recurrent Neural Networks (RNN)

RNNs are an appropriate choice when data is sequentially ordered (for
instance, time series data or natural language). While 1D (one-dimensional)
sequences can be fed to a CNN, the resulting extracted features are shallow,
in the sense that only closely localized relations between a few neighbours
are factored into the feature representations. RNNs are designed to handle
such long-range temporal dependencies. An RNN operates by sequentially updating
a hidden state h_t based not only on the activation of the current
input x_t at time t, but also on the previous hidden state h_{t-1}, which
in turn was updated from x_{t-1}, h_{t-2}, and so on (Fig. 5). In this way, the final hidden state
after processing an entire sequence contains information from all its previous elements.

Fig. 5 RNN: symbolic representation (left), expanded representation (right) [16]

Popular RNN variants include the long short-term memory (LSTM) and
gated recurrent unit (GRU) models, both referred to as gated RNNs. While standard
RNNs are composed of interconnected hidden units, each unit in a gated
RNN is replaced by a special cell that contains an internal recurrence loop
and a system of gates that controls the flow of information. Gated RNNs
have been shown to model longer-term sequential dependencies, among other
benefits.
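The sequential hidden-state update of a standard (non-gated) RNN can be illustrated with a short Python sketch. The tanh update rule, the parameter names W_x, W_h and b, and the toy values are assumptions chosen for illustration; the chapter itself only states that h_t depends on x_t and h_{t-1}.

```python
import math

def rnn_step(x_t, h_prev, W_x, W_h, b):
    """One update: h_t = tanh(W_x x_t + W_h h_prev + b) (assumed form)."""
    h_t = []
    for i in range(len(b)):
        s = b[i]
        s += sum(W_x[i][j] * x_t[j] for j in range(len(x_t)))
        s += sum(W_h[i][k] * h_prev[k] for k in range(len(h_prev)))
        h_t.append(math.tanh(s))
    return h_t

def run_rnn(sequence, W_x, W_h, b):
    """Process a whole sequence and return the hidden state after every step."""
    h = [0.0] * len(b)  # initial hidden state
    states = []
    for x_t in sequence:
        h = rnn_step(x_t, h, W_x, W_h, b)
        states.append(h)
    return states

# Toy parameters: 1 input feature, 2 hidden units.
W_x = [[1.0], [0.5]]
W_h = [[0.5, 0.0], [0.0, 0.5]]
b = [0.0, 0.0]

# The two sequences differ only in their first element.
final_a = run_rnn([[1.0], [0.0]], W_x, W_h, b)[-1]
final_b = run_rnn([[0.0], [0.0]], W_x, W_h, b)[-1]
```

The two final states differ even though the last inputs are identical, which shows how the final hidden state carries information from earlier elements of the sequence.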

4.4 Auto-encoders (AE)

Autoencoders (AEs) are a class of deep learning models that exemplify the idea of
unsupervised representation learning. They were first popularized as an early tool
for pre-training supervised deep learning models, particularly when labelled data
was scarce, but they remain useful for entirely unsupervised tasks such as phenotype
discovery. Autoencoders are designed to encode the input into a lower-dimensional
space z. The encoded representation is then decoded by reconstructing an approximate
representation x̃ of the input x. W and W′ are the respective encoding and decoding
weights, and as the reconstruction error ‖x − x̃‖ is minimized, the encoded
representation z is considered increasingly reliable.

$$z = \sigma(W x + b) \tag{5}$$

$$\tilde{x} = \sigma(W' z + b') \tag{6}$$

Once an AE is trained, a single input is fed through the network,
with the innermost hidden layer activations serving as the input's encoded representation.
AEs serve to transform the input data into a format where only the most important
derived dimensions are stored. In this way, they are similar to standard dimensionality
reduction techniques such as principal component analysis (PCA) and singular value
decomposition (SVD), but with a significant advantage for complex problems owing
to the nonlinear transformations through each hidden layer's activation functions.

Fig. 6 Two hidden layers independently-trained stacked auto-encoder [16]

Deep AE networks can be constructed and trained in a greedy fashion by a process
referred to as stacking (Fig. 6). Many variants of AEs have been introduced,
including denoising autoencoders (DAE), sparse autoencoders (SAE), and
variational autoencoders (VAE).
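To make the encode and decode steps concrete, the sketch below passes one input through a single-hidden-layer autoencoder and measures the squared reconstruction error. All weights are fixed, untrained, illustrative numbers, and the helper names (`affine`, `encode`, `decode`, `reconstruction_error`) are assumptions made for this example; training would adjust the weights to shrink the error.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def affine(W, v, b):
    """Matrix-vector product plus bias: W v + b."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(w_i, v)) + b_i
            for w_i, b_i in zip(W, b)]

def encode(x, W, b):
    """z = sigma(W x + b): map the input to the lower-dimensional code z."""
    return [sigmoid(a) for a in affine(W, x, b)]

def decode(z, W_prime, b_prime):
    """x_tilde = sigma(W' z + b'): reconstruct an approximation of the input."""
    return [sigmoid(a) for a in affine(W_prime, z, b_prime)]

def reconstruction_error(x, x_tilde):
    """Squared Euclidean distance between the input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_tilde))

# Illustrative, untrained parameters: 4 input dimensions, 2 code dimensions.
x = [1.0, 0.0, 1.0, 0.0]
W = [[0.5, -0.5, 0.5, -0.5], [0.1, 0.2, 0.3, 0.4]]
b = [0.0, 0.0]
W_prime = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
b_prime = [0.0, 0.0, 0.0, 0.0]

z = encode(x, W, b)
x_tilde = decode(z, W_prime, b_prime)
error = reconstruction_error(x, x_tilde)
```

Because of the sigmoid nonlinearity in both steps, the mapping is not restricted to the linear projections that PCA or SVD produce.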

4.5 Restricted Boltzmann Machine (RBM)

Another unsupervised deep learning architecture for learning representations of
input data is the restricted Boltzmann machine (RBM). The purpose of RBMs is similar
to that of autoencoders, but RBMs instead take a stochastic perspective by estimating
the probability distribution of the input data. In this way, RBMs are often viewed as
generative models, attempting to model the underlying process by which the data was
generated. The canonical RBM is an energy-based model with binary visible
units (v) and hidden units (h), with the energy function given in Eq. (7).

$$E(v, h) = -b^{T} v - c^{T} h - v^{T} W h \tag{7}$$

In a Boltzmann machine (BM), all units are fully connected, whereas in an RBM there are
no connections between any two visible units or any two hidden units. Training an RBM is
typically accomplished through stochastic optimization, for instance Gibbs sampling.
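The following Python sketch evaluates the RBM energy and performs one step of Gibbs sampling for binary units. It assumes the interaction term takes the usual bilinear form v^T W h, with W[i][j] connecting visible unit i to hidden unit j, and it uses the standard sigmoid conditionals of a binary RBM. The function names and parameter values are illustrative assumptions.

```python
import math
import random

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def energy(v, h, W, b, c):
    """E(v, h) = -b^T v - c^T h - v^T W h for binary vectors v and h."""
    visible_term = sum(b_i * v_i for b_i, v_i in zip(b, v))
    hidden_term = sum(c_j * h_j for c_j, h_j in zip(c, h))
    interaction = sum(v[i] * W[i][j] * h[j]
                      for i in range(len(v)) for j in range(len(h)))
    return -visible_term - hidden_term - interaction

def sample_hidden(v, W, c, rng):
    """Sample each hidden unit from P(h_j = 1 | v) = sigmoid(c_j + sum_i v_i W_ij)."""
    probs = [sigmoid(c[j] + sum(v[i] * W[i][j] for i in range(len(v))))
             for j in range(len(c))]
    return [1 if rng.random() < p else 0 for p in probs]

def sample_visible(h, W, b, rng):
    """Sample each visible unit from P(v_i = 1 | h) = sigmoid(b_i + sum_j W_ij h_j)."""
    probs = [sigmoid(b[i] + sum(W[i][j] * h[j] for j in range(len(h))))
             for i in range(len(b))]
    return [1 if rng.random() < p else 0 for p in probs]

def gibbs_step(v, W, b, c, rng):
    """One alternating update: sample h given v, then a new v given h."""
    h = sample_hidden(v, W, c, rng)
    v_new = sample_visible(h, W, b, rng)
    return v_new, h

# Illustrative parameters: 2 visible units, 1 hidden unit.
W = [[2.0], [3.0]]
b = [0.5, 0.5]
c = [1.0]

e = energy([1, 0], [1], W, b, c)
rng = random.Random(0)  # fixed seed so the run is repeatable
v_new, h = gibbs_step([1, 0], W, b, c, rng)
```

Because there are no visible-to-visible or hidden-to-hidden connections, all units of one layer can be sampled independently given the other layer, which is what makes this alternating scheme practical.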

5 Deep EHR Learning Applications

In this section, we review the current state of the art in clinical applications
resulting from recent advances in deep EHR learning. A summary of deep EHR learning
projects and their target tasks is shown in Table 2, where we present task and
subtask definitions based on a logical grouping of current research. Many of the
applications and results in the remainder of this section are based on private EHR
datasets belonging to independent healthcare institutions (see Section 7). However,
several studies incorporated a publicly available critical care database, as well as
public clinical note datasets (Table 2).

Table 2 Summary of EHR deep learning tasks

Task                      Subtasks                               Input data
Information extraction    Temporal event extraction              Clinical notes
                          Abbreviation expansion
                          Relation extraction
                          Single concept extraction
Representation learning   Concept representation                 Medical codes
                          Patient representation
Outcome prediction        Static prediction                      Mixed
                          Temporal prediction
Phenotyping               New phenotype discovery                Mixed
                          Improving existing definitions
De-identification         De-identification of clinical text     Clinical notes

5.1 EHR Information Extraction (IE)

In contrast to the structured portions of EHR data typically used for billing and
administrative purposes, clinical notes are more nuanced and are primarily
used by healthcare providers for detailed documentation. Each patient
encounter is associated with several clinical notes, for instance admission notes,
discharge summaries, and transfer orders. Historically, extracting information from
such notes has required a great deal of manual feature engineering and ontology
mapping, which is one reason why these techniques have seen limited adoption.
Consequently, several recent studies have focused on extracting relevant clinical
information using deep learning (Fig. 7).
The major subtasks include:
(1) Single concept extraction,
(2) Temporal event extraction,
(3) Relation extraction, and
(4) Abbreviation expansion.

Evaluation methods
Precision, recall, and F1 score are the primary classification metrics for the
tasks involving single concept extraction, temporal event extraction [18], and

Fig. 7 EHR information-extraction (IE) [16]

clinical relation extraction [19]. The study on clinical abbreviation expansion
used accuracy as its evaluation method. Several studies share similar
tasks and evaluation metrics.
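As a brief illustration of the evaluation metrics named above, the snippet below computes precision, recall, and F1 score for a set of extracted concepts against a gold-standard annotation. The example annotations and the function name are hypothetical and serve only to show the arithmetic.

```python
def precision_recall_f1(predicted, gold):
    """Compare predicted items with gold-standard items using set overlap."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

# Hypothetical (text span, concept type) annotations for one clinical note.
gold = {("aspirin", "drug"), ("chest pain", "symptom"),
        ("warfarin", "drug"), ("anemia", "disease")}
predicted = {("aspirin", "drug"), ("chest pain", "symptom"),
             ("daily", "drug")}

precision, recall, f1 = precision_recall_f1(predicted, gold)
```

Here two of the three predictions are correct (precision 2/3) and two of the four gold concepts are found (recall 1/2), giving an F1 score of 4/7.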

5.2 EHR Representation Learning

Presently, hand-crafted schemas are used for mapping between structured
medical concepts, where each concept is assigned a distinct code by its
relevant ontology. These static hierarchical relationships fail to quantify
the inherent similarities between concepts of different types and coding schemes. Recent
deep learning approaches have been used for more detailed analysis and
more accurate predictive tasks.
In this section, we first describe deep EHR methods for representing discrete medical
codes as real-valued vectors of arbitrary dimension. These efforts are, for the
most part, unsupervised and focus on natural relationships
and clusters.

5.2.1 Concept Representation

Several recent studies have applied deep unsupervised representation learning tech-
niques to derive EHR concept vectors that capture the latent similarities and natural

clusters between medical concepts. We refer to this area as EHR concept
representation, and its primary goal is to derive vector representations from sparse
medical codes such that similar concepts lie close together in vector
space. Latent encoding: Aside from NLP-inspired methods, other common deep
learning representation learning techniques have also been used for representing
EHR concepts. Tran et al. design a modified RBM which uses a structured
training procedure to improve representation interpretability. They assessed the quality of
relationships between different medical concepts, and found that training linear
models on representations obtained through AEs greatly outperformed traditional linear
models alone, achieving state-of-the-art performance.
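The idea that similar concepts should lie close together in vector space can be illustrated with cosine similarity. The three-dimensional concept vectors below are invented for the example and are not learned from any dataset; the function names are likewise assumptions.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1 means the same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest_concepts(query, embeddings, k=2):
    """Rank all other concepts by their similarity to the query concept."""
    scores = [(name, cosine_similarity(embeddings[query], vec))
              for name, vec in embeddings.items() if name != query]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]

# Hypothetical concept vectors, for illustration only.
embeddings = {
    "type 2 diabetes": [0.9, 0.1, 0.0],
    "metformin": [0.8, 0.2, 0.1],
    "ankle fracture": [0.0, 0.1, 0.9],
}

neighbours = nearest_concepts("type 2 diabetes", embeddings)
```

In this toy example the diagnosis and its typical medication are nearest neighbours, while the unrelated injury is far away, which is the kind of structure that learned concept representations aim to capture.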

5.2.2 Patient Representation

Several different deep learning methods for obtaining vector representations of
patients have been proposed in the literature. Most of the techniques are either
inspired by NLP methods, such as distributed word representations, or use
dimensionality reduction techniques, such as autoencoders.

Methods of Assessment for EHR Representation-Learning


Many of the studies involving representation learning evaluate their
representations based on auxiliary classification tasks, with the implicit
assumption that improvements in prediction are attributable to a more robust
representation of either clinical concepts or patients. Evaluation methods are thus
varied and task-dependent, including metrics such as AUC (heart failure
onset prediction, disease prediction, clinical risk group prediction), precision@k
(disease progression, disease tagging), recall@k (medical code prediction,
sequential clinical event prediction), accuracy (unplanned readmission prediction),
and precision, recall, and F1 score (relation extraction, unplanned readmission
prediction, risk stratification). Some studies do not include any secondary
classification tasks, and focus on evaluating the learned representations directly.
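For readers unfamiliar with the ranking metrics mentioned above, the following sketch computes precision@k and recall@k for a ranked list of predicted medical codes. The code identifiers are placeholders, not real coding-scheme entries, and the function names are assumptions made for this example.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k predictions that are actually relevant."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k

def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant items that appear among the top-k predictions."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)

# Placeholder codes: a model's ranked predictions for a patient's next visit,
# and the codes that were actually recorded at that visit.
ranked = ["code_a", "code_b", "code_c", "code_d", "code_e"]
relevant = {"code_b", "code_e", "code_x"}

p_at_3 = precision_at_k(ranked, relevant, 3)
r_at_3 = recall_at_k(ranked, relevant, 3)
p_at_5 = precision_at_k(ranked, relevant, 5)
r_at_5 = recall_at_k(ranked, relevant, 5)
```

Increasing k from 3 to 5 picks up a second relevant code, so recall@k rises from 1/3 to 2/3 while precision@k moves from 1/3 to 2/5.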

5.2.3 Outcome Prediction

A major goal of many deep EHR systems is to predict patient outcomes.
Two types of outcome prediction can be distinguished:
(1) static or one-time prediction (for example, heart failure prediction using data from
a single encounter), and (2) temporal outcome prediction (for example, heart
failure prediction within six months, or disease onset prediction using historical
data from sequential encounters). Many of these prediction systems
use unsupervised data modelling, for example clinical concept representation
(Section 5.2). In many cases, the main contribution is the deep representation
learning itself.
(1) Static Outcome Prediction:

The simplest class of outcome prediction applications is the prediction of a
particular outcome without considering temporal constraints.
(2) Temporal Outcome Prediction:
They also predict future readmission based on these past diagnoses
and interventions. For all tasks, they found that the deep methods
resulted in the best performance. Nickerson et al. forecast postoperative
responses, including postoperative urinary retention (POUR) and temporal patterns of
postoperative pain, using MLP and LSTM networks to suggest more effective
postoperative pain management. The Deepr system of Nguyen et al. uses
a CNN for predicting unplanned readmission following discharge. Like
several other methods, Deepr operates on discrete clinical event codes.

5.2.4 Computational Phenotyping

As the amount and availability of detailed clinical health records has exploded in
recent years, there is a large opportunity for revisiting and refining broad disease
and diagnosis definitions and boundaries. A data-driven approach lets
the data speak for itself by discovering latent relationships
and hierarchical concepts from the raw data, without any human
supervision or prior bias. With the availability of huge amounts of
clinical data, many recent studies have used deep learning
techniques for computational phenotyping. Computational phenotyping has two
main applications:
(1) Discovering and stratifying new subtypes;
(2) Discovering specific phenotypes for improving classification under existing
disease boundaries and definitions.
Both areas seek to discover new data-driven phenotypes; the former is
a largely unsupervised task that is difficult to evaluate quantitatively,
whereas the latter is inherently tied to a supervised learning task with an easily
validated outcome.

5.2.5 Clinical Data De-identification

Clinical notes typically include explicit personal health information (PHI),
which makes it difficult to publicly release many useful clinical datasets
[20]. Dernoncourt et al. [20] created a system for the automatic
de-identification of clinical text, which replaces a traditionally laborious
manual de-identification process for sharing restricted data. Their
framework consists of a bidirectional LSTM network (Bi-LSTM) and both word-level
and character-level embeddings. The authors found their method to be state of the art,
with an ensemble approach with conditional random fields also faring well. In a

similar task, Shweta et al. explore various RNN architectures and word embedding
techniques for identifying potentially identifiable named entities in clinical
text.

6 Interpretability

While deep learning techniques have gained notoriety for producing state-of-the-art
performance on a wide variety of tasks, a major criticism is that the resulting models are
difficult to interpret naturally. Accordingly, many deep learning frameworks are often
referred to as "black boxes", where only the input and output predictions
convey meaning to a human observer. The root cause of this lack
of model transparency is precisely what makes deep learning so effective: the
layers of nonlinear data transformations that uncover hidden factors of entanglement
in the input. This issue presents a trade-off between performance and transparency
(Table 3).
In the clinical domain, model transparency is of utmost importance, given that
predictions might be used to affect patient treatments and real-world medical
decision making. This is the reason why interpretable linear models such as
logistic regression still dominate applied clinical informatics. In this section, we
review attempts to make clinical deep learning more interpretable.

6.1 Maximum Activation

A popular approach within the image processing community is to examine
the classes of inputs that result in the maximum activation of each
hidden unit of a model. This represents an attempt to examine what
exactly the model has learned, and can be used to assign importance to
the raw input features. This approach has been adopted by several studies
included in our overview.

Table 3 Techniques of interpretability for deep EHR systems

Type                        Methods
(1) Maximum activation      • Output activation maximization [21]
                            • Convolutional filter response [8]
                            • Dense top-layer weight maximization [16]
(2) Constraints             • Non-negative matrix factorization [21]
                            • Non-negativity [12]
                            • Ontology smoothing [12]
                            • Sparsity [22]
                            • Regularization [12]
(3) Qualitative clustering  • t-SNE [8]
(4) Mimic learning          • Interpretable mimic learning [23]
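A minimal sketch of the maximum-activation idea follows: for each hidden unit of a small ReLU layer, it finds which inputs from a dataset produce the largest activation. The layer parameters, the toy inputs, and the function names are illustrative assumptions and do not reproduce any of the cited systems.

```python
def relu_layer(x, W, b):
    """Hidden activations of one fully connected ReLU layer."""
    return [max(0.0, sum(w_ij * x_j for w_ij, x_j in zip(w_i, x)) + b_i)
            for w_i, b_i in zip(W, b)]

def top_activating_inputs(inputs, W, b, n_top=1):
    """For each hidden unit, list the indices of the inputs that excite it most."""
    activations = [relu_layer(x, W, b) for x in inputs]
    result = {}
    for unit in range(len(b)):
        ranked = sorted(range(len(inputs)),
                        key=lambda idx: activations[idx][unit],
                        reverse=True)
        result[unit] = ranked[:n_top]
    return result

# Illustrative layer with 2 inputs and 2 hidden units.
W = [[1.0, -1.0], [1.0, 2.0]]
b = [0.0, 0.0]
inputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

top = top_activating_inputs(inputs, W, b)
```

Inspecting the returned inputs for each unit gives a rough indication of the pattern that unit has learned to respond to.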

6.2 Constraints

Others have imposed training constraints specifically aimed at increasing
the interpretability of deep models. In one such approach, the authors take the k
largest values of each column of the resulting code weight matrix as a distinct
disease group that is interpretable upon qualitative inspection. They also perform the
same process on the resulting visit embedding matrix to examine
the types of visits each neuron learns to identify. Similarly, the eNRBM
architecture also enforces non-negativity in the weights of the RBM. In a
phenotype discovery framework for time series data, regularization
and sparsity constraints on the AE resulted in continuous features on the first
layer that were interpretable as detectors of functional elements such as uphill or downhill
signal slopes, which is another example of improving the interpretability of learned
model weights through sparsity.
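The "k largest values per column" inspection described above can be sketched as follows. The weight matrix, the code labels, and the function name are hypothetical; rows are assumed to correspond to medical codes and columns to hidden units, with non-negative weights.

```python
def top_k_per_column(weights, row_labels, k):
    """For each column (hidden unit), return the labels of its k largest weights."""
    groups = []
    for col in range(len(weights[0])):
        ranked_rows = sorted(range(len(weights)),
                             key=lambda r: weights[r][col],
                             reverse=True)
        groups.append([row_labels[r] for r in ranked_rows[:k]])
    return groups

# Hypothetical non-negative code weight matrix: 4 codes (rows), 2 units (columns).
weights = [
    [0.9, 0.0],
    [0.7, 0.1],
    [0.1, 0.8],
    [0.0, 0.6],
]
codes = ["code_a", "code_b", "code_c", "code_d"]

groups = top_k_per_column(weights, codes, k=2)
```

Each returned group lists the codes that contribute most strongly to one hidden unit, which is what makes the unit open to qualitative review as a candidate disease group.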

6.3 Qualitative Clustering

In EHR concept representation and phenotyping studies, several works
point to a more indirect notion of interpretability by examining natural
clusters of the resulting vectorized representations. In a similar manner, Nguyen
et al. project distributed representations of clinical event and patient vectors into two
dimensions via t-SNE, allowing for a qualitative comparison
of similar diagnoses and patient subgroups.

6.4 Mimic Learning

The issue of deep model transparency was tackled directly in the
Interpretable Mimic Learning framework. First, a deep neural network was trained on raw
patient data with associated class labels, which yields a probability vector for every
sample. An additional gradient boosting tree (GBT) was then trained on the raw patient
data, but with the deep network's probability prediction used as the target
label. As GBTs are interpretable models, feature importance can be
assigned to the raw input features while harnessing the power of deep
networks. The mimic learning method has similar or better performance than

both the baseline linear and deep models for several phenotyping and mortality
prediction tasks, while retaining the desired feature transparency.

7 Discussion and Future Prospectus

This chapter provides a brief overview of current deep learning research as it pertains
to EHR analysis. This is an emerging area, as seen by the fact that the majority of the
work covered in this chapter was published in the past two years [1].
Tracing back the deep learning-based advances in image and natural language
processing, we see a clear chronological similarity to the progression of current
EHR-driven deep learning research. In particular, a majority of the studies in this
review are concerned with the idea of representation learning, i.e., how best
to represent the vast amount of raw patient data that has
suddenly become available in the past decade. Fundamental image
processing research is concerned with increasingly complex and hierarchical
representations of images composed of individual pixels. Similarly, NLP focuses on
word-, sentence-, and document-level representations of language composed of individual
words or characters. Likewise, different schemes for representing patient health
data built from individual medical codes, demographics, and
vital signs are being investigated [1].

References

1. Shickel, B., Tighe, P.J., Bihorac, A., Rashidi, P.: Deep EHR: a survey of recent advances in
deep learning techniques for electronic health record (EHR) analysis. IEEE J. Biomed. Health
Inform 22(5), 1589–1604 (2018)
2. Birkhead, G.S., Klompas, M., Shah, N.R.: Uses of electronic health records for public health
surveillance to advance public health. Annu. Rev. Public Health 36(1), 345–359 (2015)
3. Charles, D., Gabriel, M., Searcy, T., Carolina, N., Carolina, S.: Adoption of Electronic Health
Record Systems Among U.S. Non-federal Acute Care Hospitals: 2008–2014. The Health Infor-
mation Technology for Economic and Clinical Health (HITECH) Act of 2009 Directed the
Office of the National Coordinator for Health, vol. 4, no. 23, pp. 2008–2014 (2015)
4. Jamoom, E., Yang, N.: Table of electronic health record adoption and use among office-based
physicians in the U.S., by state. In: 2015 National Electronic Health Records Survey, pp. 1–2
(2016)
5. Botsis, T., Hartvigsen, G., Chen, F., Weng, C.: Secondary use of EHR: data quality issues and
informatics opportunities. In: AMIA Joint Summits Translational Science Proceedings, vol.
2010, pp. 1–5 (2010)
6. Skrøvseth, S.O., Augestad, K.M., Ebadollahi, S.: Data-driven approach for assessing utility of
medical tests using electronic medical records. J. Biomed. Inform. 53, 270–276 (2015)
7. Meystre, S.M., Savova, G.K., Kipper-Schuler, K.C., Hurdle, J.F.: Extracting information from
textual documents in the electronic health record: a review of recent research. In: Yearbook of
Medical Informatics, pp. 128–144 (2008)

8. Ekbal, A., Saha, S., Bhattacharyya, P.: Deep learning architecture for patient data de-
identification in clinical records. In: Proceedings of the Clinical Natural Language Processing
Workshop, pp. 32–41 (2016)
9. Choi, Y., Chiu, C.Y.-I., Sontag, D.: Learning low-dimensional representations of medical con-
cepts. In: AMIA Joint Summits Translational Science Proceedings, vol. 2016, pp. 41–50 (2016)
10. Nguyen, P., Tran, T., Wickramasinghe, N., Venkatesh, S.: Deepr: a convolutional net for medical
records. IEEE J. Biomed. Health Inform. 21(1), 22–30 (2017)
11. Choi, E., et al.: Multi-layer representation learning for medical concepts. In: Proceedings of
the ACM SIGKDD International Conference on Knowledge and Discovery and Data Mining,
pp. 1495–1504, 13–17 Aug 2016
12. Pham, T., Tran, T., Phung, D., Venkatesh, S.: DeepCare: a deep dynamic memory model for
predictive medicine. In: Lecture Notes in Computer Science (including Subser. Lecture Notes
in Artificial Intelligence Lecture Notes Bioinformatics), vol. 9652 LNAI, pp. 30–41, Feb 2016
13. Jiang, M., et al.: A study of machine-learning-based approaches to extract clinical entities and
their assertions from discharge summaries. J. Am. Med. Inform. Assoc. 18(5), 601–606 (2011)
14. Borovcnik, M., Bentz, H.-J., Kapadia, R.: A Probabilistic Perspective (1991)
15. Goodfellow, I., Bengio, Y., Courville, A.: Deep learning. MIT Press (2016)
16. Cheng, Y., Wang, F., Zhang, P., Hu, J.: Risk prediction with electronic health records: a deep
learning approach. In: 16th SIAM International Conference on Data Mining 2016 (SDM 2016),
pp. 432–440 (2016)
17. Wong, C., Deligianni, F., Berthelot, M., Andreu-perez, J., Lo, B., Yang, G.: Deep learning for
health informatics. IEEE J. Biomed. Health Inform. 21(1), 4–21 (2017)
18. Fries, J.A.: Brundlefly at SemEval-2016 task 12: recurrent neural networks vs. joint inference
for clinical temporal information extraction. In: SemEval 2016—10th International Workshop
Semantic Evaluation Proceedings, pp. 1274–1279 (2016)
19. Lv, X., Guan, Y., Yang, J., Wu, J.: Clinical relation extraction with deep learning. Int. J. Hybrid
Inf. Technol. 9(7), 237–248 (2016)
20. Dernoncourt, F., Lee, J.Y., Uzuner, O., Szolovits, P.: De-identification of patient notes with
recurrent neural networks. J. Am. Med. Inform. Assoc. 24(3), 596–606 (2017)
21. Tran, T., Nguyen, T.D., Phung, D., Venkatesh, S.: Learning vector representation of medical
objects via EMR-driven nonnegative restricted Boltzmann machines (eNRBM). J. Biomed.
Inform. 54, 96–105 (2015)
22. Lasko, T.A., Phenotype Discovery from Electronic Medical Records How Do You Perceive a
Chessboard? (2017)
23. Che, Z., Purushotham, S., Khemani, R., Liu, Y.: Interpretable deep models for ICU outcome
prediction. In: AMIA … Annual Symposium Proceedings. AMIA Symposium, vol. 2016,
pp. 371–380 (2016)

Mr. Pawan Singh Gangwar is a dynamic individual currently pursuing his master's in
bioinformatics at Delhi Technological University. Having completed his bachelor's in
biotechnology, he is highly motivated towards computational research in life sciences and
possesses an in-depth knowledge of the field. In his free time, Pawan likes to play
badminton and chess and to listen to music.

Dr. Yasha Hasija, a master of many fields, is an Associate Professor at Delhi
Technological University. She holds bachelor's and master's degrees in biotechnology and a
Ph.D. in bioinformatics. Besides having a sound academic foundation, Dr. Yasha is a vibrant
individual and a very good orator. She specializes in genome informatics and interaction
studies with human diseases, and her research interests include genetic analysis of
dermatological disorders, tuberculosis studies, and the role of human genetic variations in
age-related disorders.
Application of Deep Architecture
in Bioinformatics

Sagnik Sen, Rangan Das, Swaraj Dasgupta and Ujjwal Maulik

Abstract Recent discoveries in the field of biology have transformed it into a
data-rich domain. This has invited multiple machine learning applications, in
particular deep learning, a set of methodologies that has evolved rapidly over the last
couple of decades. Deep learning (DL) is extensively used in many domains, including
bioinformatics, for the analysis and classification of biomedical imaging data,
sequence data from omics, and biomedical signal processing. It has been used to
predict protein structures, uncover gene expression regulation, classify anomalies
and understand functionalities of the brain. Basic deep neural networks, which
contain stacked columns of non-linear processing units, are quite versatile and have been
extensively used in almost every domain of bioinformatics. Convolutional neural
networks have proved to be quite effective when working with image data and are used
in classifying biomedical images such as histopathology images, cell images, X-ray
images, magnetic resonance images and so on. They have been used for anomaly
classification, recognition, and segmentation. For areas that require dealing with
sequential data, such as protein structure prediction and brain decoding, recurrent
neural networks have been used extensively. Besides these, many new
architectures are currently being explored to address some of the common drawbacks of
deep learning. Fuzzy systems have been incorporated into deep learning in an
attempt to improve the performance of such models. Multimodal learning in deep
learning is enabling modern architectures to work with heterogeneous data.

Keywords Deep architecture · Bioinformatics · Biomedical images ·
Convolutional neural network · Recurrent neural network

S. Sen (B) · R. Das · S. Dasgupta · U. Maulik
Department of Computer Science and Engineering, Jadavpur University, Jadavpur, Kolkata
700032, India

© Springer Nature Switzerland AG 2020
S. Dash et al. (eds.), Deep Learning Techniques for Biomedical and Health Informatics,
Studies in Big Data 68, https://doi.org/10.1007/978-3-030-33966-1_9
168 S. Sen et al.

1 Introduction

Deep learning (DL) is an important area in machine learning that has received a lot
of attention recently. Deep learning methods work by progressively extracting
complex features from the input data and mapping those features to the output. The
learning algorithm can efficiently build up complex relationships between an input
and the desired output. Therefore, it has been used extensively in areas like
computer vision and pattern recognition, self-driving cars, robotics, weather
forecasting, earthquake prediction, and even the generation of deep neural networks. These
innovations were fueled not only by the recent algorithmic advances in deep architectures
but also by the availability of high-throughput data. For training deep learning models
effectively, copious amounts of data are necessary. With modern devices, sequencing
techniques and improved imaging technologies, biology has become a data-rich
field. Omics data itself is a major fraction of the accumulated data. There are also
vast repositories of image data and signal data available. Deep learning provides a
well-suited set of tools for making good use of this vast amount of available data.
Contemporary deep learning models are used to solve diverse biological problems such
as protein structure prediction, protein-protein interaction analysis, protein function
prediction, bioimage analysis, brain signal analysis and so on.
Previously, to make sense of biological data, many other well-known algorithms
were used, such as support vector machines (SVMs), hidden Markov models
(HMMs), random forests, Gaussian networks, Bayesian networks, and so on. These
have been heavily implemented in proteomics, genomics, systems biology, and many
other related domains. No matter what algorithm is used, the performance of the
method depends extensively on what features are presented as the input. Features
represent the input data which, when subsequently processed by machine learning
algorithms, provide the relevant output. But selecting the right features can be
quite a difficult task, especially in the domain of omics. This is where deep learning
has made a great contribution, helping to make massive progress not only in other
domains but in bioinformatics too.

1.1 Deep Learning: An Overview

Deep learning has shown great promise in real-world applications where the majority
of machine learning algorithms have failed. Most early machine learning approaches
relied heavily on the knowledge of domain experts for feature engineering, that is, the
task of crafting the inputs for the machine learning model. This was a great limitation,
since processing raw data to create features was tedious. This is
something that deep learning takes care of, by generating the features on its own and
mapping them to the outputs. This is done by multiple layers of nonlinear processing
units, called artificial neurons, or perceptrons. Each neuron of a layer is connected
with all the neurons of the preceding and the succeeding layer, but with none

of the neurons in its own layer. Layers of these neurons stacked one after the other
form a deep neural network. Each neuron can be tweaked using a set of parameters,
called weights and biases. As the deep neural network is trained, these parameters
are adjusted by the learning algorithm so that the error is minimized. As each layer
of neurons gets trained over each iteration, it gets better at extracting the relevant
features from the input data. Most deep learning models are built on this
technique. Deep learning architectures can be broadly classified into three
groups: deep neural networks (DNN), convolutional neural networks (CNN) and
recurrent neural networks (RNN). The term deep neural network is a very generic
term that is often used to refer to all deep learning architectures, but in this case it
refers to multilayer perceptrons (MLPs), restricted Boltzmann machines (RBMs) and
stacked autoencoders (SAEs). CNNs have long been used for computer vision problems,
and in this context medical and microscopic images are extensively analyzed with
CNNs. RNNs are used for predicting or analyzing sequence data, such as
biomedical signal data and biological sequence data.
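The statement that weights and biases are adjusted so that the error is minimized can be illustrated with a small gradient descent sketch. It trains a single sigmoid neuron on a toy logical-OR dataset using mean squared error; the dataset, learning rate, number of epochs, and function names are all illustrative assumptions, and real deep networks repeat the same idea across many layers via backpropagation.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def predict(x, w, b):
    """Output of one artificial neuron: sigmoid(w . x + b)."""
    return sigmoid(sum(w_j * x_j for w_j, x_j in zip(w, x)) + b)

def mean_squared_error(data, w, b):
    return sum((predict(x, w, b) - y) ** 2 for x, y in data) / len(data)

def train(data, w, b, learning_rate=0.5, epochs=2000):
    """Full-batch gradient descent on the mean squared error."""
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in data:
            p = predict(x, w, b)
            # Derivative of (p - y)^2 with respect to the pre-activation.
            delta = 2.0 * (p - y) * p * (1.0 - p)
            for j, x_j in enumerate(x):
                grad_w[j] += delta * x_j
            grad_b += delta
        w = [w_j - learning_rate * g / len(data) for w_j, g in zip(w, grad_w)]
        b = b - learning_rate * grad_b / len(data)
    return w, b

# Toy dataset: logical OR of two binary inputs.
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]

w0, b0 = [0.0, 0.0], 0.0
loss_before = mean_squared_error(data, w0, b0)
w, b = train(data, w0, b0)
loss_after = mean_squared_error(data, w, b)
```

Comparing `loss_before` with `loss_after` shows the error shrinking as the parameters are adjusted, which is the training behaviour described in the text.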

1.2 An Overview of Protein Structures

Proteins are essentially polymers of amino acids [1, 2]. After an amino acid sequence
is created through transcription and translation, the chain of amino acids takes up different
shapes as it folds onto itself. The sequence of the chain of amino acids represents the
primary structure of a protein. This chain is formed by peptide bonds during the
synthesis of the protein. Hence, amino acid sequences are also called polypeptides.
These polypeptides fold into simple structures such as loops, sheets or helices. These
structures are known as secondary structures. Secondary structures are regular local
sub-structures on the polypeptide backbone chain. Different structures are formed
depending on the amino acids that are present in the chain. Formally, the structures
are mainly of three types: α-helices, β-sheets and loops [1, 2]. The hydrogen bonds
that exist between the carbonyl oxygen and amine hydrogen in the peptide backbone
determine this. Subsequently, the tertiary structure of a protein is its 3D structure,
which is formed as the secondary structure folds onto itself. The tertiary structure
has a single polypeptide chain backbone with one or more protein secondary
structures (PSS). There are different bonding and non-bonding types of energy at work
that determine the tertiary structure [1, 2]. These include the covalent bond energy,
Hydrogen-Hydrogen (H–H) interaction energy, electrostatic energy, van der Waals
forces and other intra-molecular forces. Many tertiary structures combine
to form the quaternary structure. This happens when multiple protein structures bind
together to reach a minimum global energy state [3].
Protein structure prediction is a difficult task because of the many parameters
at play. Predicting secondary structure from the primary sequence is no longer that
difficult with contemporary methodologies. The Chou-Fasman algorithm is a statistical
method that was initially used to predict the secondary structure of a protein from
its polypeptide sequence [4]. Multiple methods are now available that can perform this prediction
with a higher accuracy. For instance, an RNN can outperform the Chou-Fasman
algorithm. Predicting the 3D structure, however, remains a challenge. Two types of
computational techniques, template-based and ab initio, are used to predict the
three-dimensional structure. The template-based technique is the older of the two and
depends on sequence similarity with a protein of known structure, as in homology
modelling. The ultimate goal, however, is to find a structure with a global minimum
energy. To date, multiple machine learning based algorithms have been designed
[5, 6]. Most of them approximate the result with multiple candidate structures and
then optimize it further.
The function of a protein depends on its structure. A protein interacts with other
molecules (mostly proteins, except for DNA-binding proteins) through its binding site.
The interaction partners of a protein are determined by compatible binding sites among
proteins. The protein-protein interaction (PPI) network is in turn derived from
multiple interaction partners [7, 8]. PPI networks are important at the biological
level because proteins are the main functional elements of any biological system.
Their behaviour depends on their functions and interaction partners, which help to
define the position of a protein in a biological pathway [7, 8]. Predicting the
functions and interaction partners of proteins is a known challenge in computational
biology. Hidden Markov models [9] and genetic algorithms have already been applied to
this type of problem with some success, but their processing time is high. There is
therefore scope for algorithmic improvement in this field, where a few deep
architectures have been applied; for example, Zhao and Gong [10] describe a deep model
to predict protein-protein interaction pairs.
Deep architectures have had a great impact on image processing [11]. Deep learning
approaches are therefore applied to medical images to diagnose disease conditions
[12, 13]. Recent research has applied different deep architectures to MRI data [13],
hyperspectral images [14] and so on.

2 Deep Learning Approaches for Predicting Protein Structures

Predicting protein structures is one of the oldest problems in bioinformatics.
Since structures are determined by the sequence, recurrent neural networks
intuitively appear to be a suitable choice. However, other networks such as CNNs
and generative stochastic networks (GSN) have also been used. These are discussed
below.

2.1 Predicting with Long Short Term Memory (LSTM) Network

An RNN, unlike a DNN or CNN, has a cyclical computational graph. Traditional
DNNs and CNNs are modeled as feed-forward networks. RNNs are commonly
implemented using Long Short Term Memory (LSTM) units [15–17]. In a feed-forward
neural network [18–21], the connections between neurons do not form any cycle,
whereas a recurrent neural network (RNN) [22, 23] allows cyclic connections.
The main difference between an RNN and a multilayer perceptron (MLP) is that an MLP
can only map an input to an output, while an RNN can map the entire history of
preceding inputs to each output. An LSTM recalls values over arbitrary intervals.
An LSTM network is similar to a standard RNN, except that the hidden layer has
memory cells instead of summation units. A schematic of the workflow is shown in
Fig. 1. These memory cells can retain patterns seen earlier in the sequence.
The same output layer that is used for an RNN can also be used for an LSTM [15].
LSTMs can be applied to protein structure prediction [24–26]. A plain RNN can also be
used to predict the secondary structure of a protein [27]; however, one disadvantage
of RNNs is the problem of vanishing gradients [27, 28], which the LSTM is designed to
mitigate. A simple RNN is also unsuitable for predicting the secondary structure of a
protein because it only considers the preceding part of the sequence, whereas the
entire protein sequence needs to be taken into account. This problem can be
solved by a bidirectional RNN [29], which processes the data in both directions with
two separate hidden layers that are then fed forward to the output layer [29]. The
combination of a bidirectional RNN and LSTM is the bidirectional LSTM [30]. The two
layers are combined by normalizing the activations from each layer in a softmax layer
[22]. The LSTM

Fig. 1 A schematic diagram to show the workflow of recurrent neural network


uses a feed-forward network for PSS prediction with a softmax output function
[24]. Equations 1–8 give a detailed description of the LSTM architecture that
is used for protein secondary structure prediction [24].

$$F_t = \sigma(W_F h_{t-1} + W_F a_t + b_F) \tag{1}$$

$$I_t = \sigma(W_I h_{t-1} + W_I a_t + b_I) \tag{2}$$

$$\tilde{G}_t = \tanh(W_G h_{t-1} + W_G a_t + b_G) \tag{3}$$

$$M_t = F_t \odot M_{t-1} + I_t \odot \tilde{G}_t \tag{4}$$

$$Y_t = \sigma(W_Y h_{t-1} + W_Y a_t + b_Y) \tag{5}$$

$$h_t = Y_t \odot \tanh(M_t) \tag{6}$$

$$h_{t\text{-rec}} = h_t + \mathrm{FeedForward}(h_t) \tag{7}$$

$$\sigma(x) = \frac{1}{1 + \exp(-x)} \tag{8}$$

where
$a_t$: input from the previous layer, $h_t^{l-1}$
$F_t$: forget gate
$I_t$: input gate
$\tilde{G}_t$: new memory cell
$M_t$: final memory cell
$Y_t$: output gate
$h_t$: final hidden state
$h_{t\text{-rec}}$: forward recursion.
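To make the gate computations concrete, the following is a minimal NumPy sketch of a single LSTM step following Eqs. 1–6, using the standard logistic sigmoid 1/(1 + exp(−x)). It assumes that each gate has separate recurrent and input weight matrices (named `W_*` and `U_*` here, a naming introduced only for this sketch), and it omits the feed-forward term of Eq. 7. The dimensions and the random initialization are illustrative and are not those of the model in [24].

```python
import numpy as np

def sigmoid(x):
    # Standard logistic function
    return 1.0 / (1.0 + np.exp(-x))

def init_params(n_in, n_hid, rng):
    # Hypothetical naming: W_* act on h_{t-1}, U_* act on the input a_t
    params = {}
    for gate in "FIGY":
        params[f"W_{gate}"] = rng.normal(scale=0.1, size=(n_hid, n_hid))
        params[f"U_{gate}"] = rng.normal(scale=0.1, size=(n_hid, n_in))
        params[f"b_{gate}"] = np.zeros(n_hid)
    return params

def lstm_step(a_t, h_prev, m_prev, p):
    f_t = sigmoid(p["W_F"] @ h_prev + p["U_F"] @ a_t + p["b_F"])  # forget gate
    i_t = sigmoid(p["W_I"] @ h_prev + p["U_I"] @ a_t + p["b_I"])  # input gate
    g_t = np.tanh(p["W_G"] @ h_prev + p["U_G"] @ a_t + p["b_G"])  # new memory
    m_t = f_t * m_prev + i_t * g_t                                # final memory cell
    y_t = sigmoid(p["W_Y"] @ h_prev + p["U_Y"] @ a_t + p["b_Y"])  # output gate
    h_t = y_t * np.tanh(m_t)                                      # hidden state
    return h_t, m_t

def run_lstm(inputs, n_hid, p):
    # Unroll the cell over a sequence, starting from zero state
    h, m = np.zeros(n_hid), np.zeros(n_hid)
    states = []
    for a_t in inputs:
        h, m = lstm_step(a_t, h, m, p)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
n_in, n_hid, seq_len = 20, 8, 5
params = init_params(n_in, n_hid, rng)
inputs = rng.normal(size=(seq_len, n_in))
hidden = run_lstm(inputs, n_hid, params)
print(hidden.shape)  # (5, 8)
```

A bidirectional variant would run a second cell over the reversed sequence and combine the two sets of hidden states at each position.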

Sonderby and Winther [24] modified the LSTM architecture for protein
secondary structure prediction by introducing a feed-forward network between
recurrent hidden states, as in Eq. 7. This approach mainly focuses on 8-class
secondary structure prediction [31], which is more informative than the traditional
3-class prediction. The 8-class secondary structure labels were assigned using the
DSSP program [32]. DSSP classifies each residue into one of eight classes (C: loops
and irregular elements (corresponding to the blank characters output by DSSP),
E: β-strand, H: α-helix, B: β-bridge, G: 3_{10} helix, I: π-helix, T: turn, S: bend).
The model uses 3 layers with 300 or 500 LSTM units per layer. The feed-forward (FF)
network is implemented using two ReLU-activated layers with a similar number of units
per layer. The outputs of the forward and backward directions of the bidirectional
network are concatenated into a vector that is passed through two ReLU-activated
layers with 300 or 500 hidden units. This approach achieved an accuracy of
67.40% [24], better than the GSN approach [33], which achieved an accuracy of 66.40%.
The LSTM network also performs much better than the bidirectional RNN approach [34],
which achieved an accuracy of 51.10%.
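The per-residue accuracies quoted above are fractions of residues whose predicted class matches the DSSP label. The sketch below computes such an accuracy for 8-class strings and, for comparison, after reducing both strings to 3 classes. The reduction table (H, G, I to helix; E, B to strand; everything else to coil) is one commonly used convention and is an assumption here, as are the toy label strings.

```python
# One common 8-to-3 class reduction of DSSP labels (an assumed convention;
# other reductions are also in use)
DSSP8_TO_3 = {
    "H": "H", "G": "H", "I": "H",   # helices
    "E": "E", "B": "E",             # strands and bridges
    "T": "C", "S": "C", "C": "C",   # turns, bends, loops
}

def reduce_to_3(ss8):
    """Map an 8-class DSSP string to the 3-class alphabet H/E/C."""
    return "".join(DSSP8_TO_3[c] for c in ss8)

def q_accuracy(pred, true):
    """Fraction of residues whose predicted class equals the true class."""
    if len(pred) != len(true) or not true:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(p == t for p, t in zip(pred, true)) / len(true)

true_ss8 = "CHHHHGGTEEC"   # toy labels, not real data
pred_ss8 = "CHHHHHHTEEC"

q8 = q_accuracy(pred_ss8, true_ss8)
q3 = q_accuracy(reduce_to_3(pred_ss8), reduce_to_3(true_ss8))
print(round(q8, 4), q3)  # 0.8182 1.0
```

The toy example shows why 8-class accuracy is the stricter measure: confusing a 3_{10} helix with an α-helix costs accuracy in the 8-class setting but not after reduction to 3 classes.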

2.2 Deep Supervised and Convolutional Generative Stochastic Networks

The Generative Stochastic Network (GSN) [35] has recently been used [36] to learn
generative models of the data distribution without specifying an explicit
probabilistic graphical model. A GSN is trained by backpropagation [35, 37]. Rather
than directly parameterizing P(X), a GSN estimates the data distribution through the
transition operator of a Markov chain [38]. It trains a stochastic computational
graph to reconstruct the input X [39]. The primary advantage of a GSN is that the
computational graph may have latent states, similar to generative models such as the
Deep Boltzmann Machine (DBM) [40]. The architecture is described below.
A convolutional GSN takes two inputs, a feature channel X and a label channel y.
Figure 2 shows the architecture of a convolutional GSN model. For the supervised
convolutional GSN, the computational graph corrupts the label channel and then
reconstructs it. The feature map is given as input to the first hidden layer to
compute the activations [33]. The convolutional GSN includes an input layer and a
convolutional layer. The computational graph of the convolutional GSN uses layer-wise
sampling, similar to the DBM [40]. Each convolutional GSN layer in the computational
graph must contain a convolutional layer, while the pooling layer is optional.
Convolutional layers can be stacked to build deeper architectures [36]. The
convolutional GSN approach to PSS prediction mainly focuses on 8-state secondary
structure prediction [31], which gives more structural information than the 3-state
secondary structure. Unlike the 3-class secondary structure, the 8-class labels can
distinguish between the 3-helix and the 4-helix, and can describe different types of
loop regions. A position-specific scoring matrix (PSSM) is also used for predicting
the secondary structure of a protein [41]. A PSSM is a matrix of size n × b where n is

Fig. 2 Architecture of a convolutional GSN with 2 convolutional GSN layers

the protein length and b is the number of amino acid types. The PSSM is generated
using the UniRef90 data set and is used as input to the convolutional GSN model [33].
The PSSM scores are transformed into the range 0–1 using the sigmoid function [42].
The protein data set is generated by the PISCES Cull PDB server [43]. It consists of
6128 proteins, divided randomly into a training set of 5600 proteins, a validation
set of 256 proteins and a test set of 272 proteins [33]. The 8-state secondary
structure labels are determined from the 3D Protein Data Bank (PDB) structure by the
DSSP (database of secondary structure assignments) program [32]. The training data
contain both labels and features. To inject noise into the input labels, half of the
input labels were randomly set to zero. The convolutional GSN is trained globally
by backpropagation [35]. Sigmoid activation is used in the visible layer, while the
tanh activation function is used for all other layers. This convolutional GSN
approach to protein secondary structure prediction [33] achieved a Q8
accuracy of 66.40%, better than CNF/Raptor-SS8 [44], which achieved a Q8 accuracy
of 64.90%. The main disadvantage of the convolutional GSN approach is that the
convolutional structure is hard-coded, so it may sometimes fail to capture the
spatial organization of the protein sequence.
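Two of the preprocessing steps mentioned above, squashing PSSM scores into the range 0–1 with a sigmoid and corrupting the label channel by zeroing half of the labels, can be sketched as follows. The sketch assumes that corruption is applied per residue position (the whole one-hot label of a selected residue is zeroed); the source does not state the exact granularity. The random PSSM and labels are placeholders, not real data.

```python
import numpy as np

def scale_pssm(pssm):
    """Map raw PSSM scores into (0, 1) with the logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-pssm))

def corrupt_labels(labels_onehot, rng, drop_fraction=0.5):
    """Zero the one-hot label of a random subset of residues.

    Assumption: noise is injected per residue position.
    """
    n = labels_onehot.shape[0]
    n_drop = int(round(drop_fraction * n))
    drop_idx = rng.choice(n, size=n_drop, replace=False)
    corrupted = labels_onehot.copy()
    corrupted[drop_idx] = 0.0
    return corrupted

rng = np.random.default_rng(0)
n_residues, n_amino, n_classes = 12, 20, 8

# Placeholder PSSM (n x b) and random 8-class one-hot labels
pssm = rng.integers(-10, 11, size=(n_residues, n_amino)).astype(float)
labels = np.eye(n_classes)[rng.integers(0, n_classes, size=n_residues)]

scaled = scale_pssm(pssm)
noisy_labels = corrupt_labels(labels, rng)
print(scaled.shape, int((noisy_labels.sum(axis=1) == 0).sum()))  # (12, 20) 6
```

During training the network would receive the scaled features together with the corrupted labels and be asked to reconstruct the clean labels.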

2.3 Latent Convolutional Neural Networks

A deep architecture based on CNNs was used to implement a latent deep
learning system for predicting protein structure. This architecture has two levels.
First, a stacked sparse autoencoder is used to extract initial protein features;
the extracted features are then used as input to the latent CNN
architecture. A detailed description of the two levels is given below.
Stacked Sparse Autoencoder Approach to Extract Initial Protein Features An
autoencoder is an unsupervised feature extraction model. It consists of
three layers of artificial neurons, where the intermediate hidden layer has fewer nodes
than the input layer, while the output layer and the input layer have the same number
of nodes. The goal of an autoencoder is to replicate the input at the output. Since the
data is passed through a smaller number of intermediate nodes, the features are
compressed and the output is represented by only the most dominant features of the
input. This is how the important features are automatically extracted by the
autoencoder. A stacked autoencoder is made of multiple consecutive autoencoders, where
the features extracted by one autoencoder are passed as the input to the next [45].
The sparse autoencoder is used to extract the initial level of protein features.
Used in conjunction with a CNN, it yields a better set of features. In the
architecture, the sparse autoencoder works as a preprocessing and feature extraction
unit. For preprocessing, the available protein dataset is separated into
two parts, training data and validation data. The sequence string is mapped to a
binary representation. For each of the 20 amino acids, the position of that amino
acid is coded with 1 and all other positions are set to 0. Twenty binary strings are
needed, where
each string represents one amino acid. The size of the input data is therefore
20 × M, where M is the number of amino acids in the chain [5]. The same procedure is
used for the output. The α-helix is represented as [1 0 0], whereas the β-sheet and
the loop are represented as [0 1 0] and [0 0 1] respectively [5]. The input data of
dimension 20 × M is fed into the autoencoder to learn the initial features from the
training data. Using these features, a softmax classifier is trained [46] to predict
the secondary structure [47].
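The binary encoding described above can be sketched in a few lines of NumPy. The ordering of the 20 amino acids (alphabetical by one-letter code) and the use of the letters H, E and C for helix, sheet and loop are assumptions made for this illustration; the toy sequence is not real data.

```python
import numpy as np

# Assumed ordering: the 20 standard amino acids by one-letter code
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumed class letters: H = alpha-helix, E = beta-sheet, C = loop
SS3 = "HEC"

def encode_sequence(seq):
    """Return a 20 x M binary matrix with one 1 per column."""
    X = np.zeros((len(AMINO_ACIDS), len(seq)))
    for j, aa in enumerate(seq):
        X[AMINO_ACIDS.index(aa), j] = 1.0  # raises ValueError for unknown residues
    return X

def encode_structure(ss):
    """Return an M x 3 matrix: H -> [1 0 0], E -> [0 1 0], C -> [0 0 1]."""
    Y = np.zeros((len(ss), len(SS3)))
    for j, s in enumerate(ss):
        Y[j, SS3.index(s)] = 1.0
    return Y

X = encode_sequence("ACDKW")
Y = encode_structure("HHECC")
print(X.shape, Y.shape)  # (20, 5) (5, 3)
print(Y[0])              # [1. 0. 0.]
```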
Deep Learning Implemented Using Latent CNN Structure The CNN-based deep
architecture is motivated by the animal visual cortex [48]. Al-Azzawi [47] describes
a latent deep learning architecture built on the stacked sparse autoencoder.
A CNN is a neural network composed of layers of artificial neurons
that learn shared weights and biases. A CNN is trained with the backpropagation
algorithm [49, 50]. A local receptive field, or local filter, scans the entire input.
The units of this local filter share the same weights and biases, which means that all
the neurons in the initial hidden layer learn the same feature [49, 50]. Feature
extraction is done by convolving the input data with the filter, adding a
bias term, and then passing the result through an activation function. A CNN applies
learned filters to convolve the feature maps from the previous layer. The second
operation is pooling. The pooling layer performs subsampling to decrease the size
of the output. Max-pooling is a common method of subsampling: the entire map is
divided into small, equally sized regions and the maximum value from each region is
taken [49, 50]. The final layer is a fully connected layer. As mentioned before, the
latent CNN structure combines the stacked sparse autoencoder with a deep CNN.
Al-Azzawi [47] used the SCRATCH protein dataset, which contains primary and secondary
structures with their three-class descriptions. The performance of the PSS prediction
system is measured by the ratio of the number of correct predictions (true positives)
to the total number of predictions [47]. Using the stacked sparse autoencoder,
the training accuracy is 62.67% and the testing accuracy is 61.04% [47]. The
latent deep learning approach to protein secondary structure prediction
achieves an accuracy of 90.31% on the SCRATCH protein dataset [47], while
the machine learning approach proposed by Christophe achieves 84.51% [51].
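The two basic CNN operations described above, a shared filter sliding over the input followed by max-pooling, can be illustrated on a one-dimensional signal. This is a minimal sketch: the filter values are fixed by hand rather than learned, ReLU is assumed as the activation, and the sliding dot product is the cross-correlation form commonly used in CNN implementations.

```python
import numpy as np

def conv1d_valid(x, kernel, bias=0.0):
    """Slide one shared filter over x (no padding) and add a bias."""
    k = len(kernel)
    return np.array([np.dot(x[i:i + k], kernel) + bias
                     for i in range(len(x) - k + 1)])

def relu(z):
    return np.maximum(z, 0.0)

def max_pool1d(x, size):
    """Split x into equally sized regions and keep the maximum of each."""
    n = len(x) // size
    return x[:n * size].reshape(n, size).max(axis=1)

signal = np.array([0.0, 1.0, 3.0, 2.0, 5.0, 4.0, 1.0])
kernel = np.array([1.0, -1.0])   # hand-picked filter: responds to decreases

feature_map = relu(conv1d_valid(signal, kernel))
pooled = max_pool1d(feature_map, size=2)
print(feature_map)  # [0. 0. 1. 0. 1. 3.]
print(pooled)       # [0. 1. 3.]
```

Because the same `kernel` is applied at every position, every unit of the feature map detects the same feature, which is the weight sharing referred to in the text.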

3 Deep Learning Approach for Protein–Protein Interaction and Protein Function Prediction

Protein function prediction is key to understanding molecular mechanisms.
Under the structure-function paradigm, the functional dependencies of proteins are
associated with the structures of proteins at the cellular and subcellular level.
Organism-specific prediction of protein function from structure or biophysical
properties is a machine learning based modeling problem. Following that, the interaction
partners are determined by the functional classification of the proteins. Predicting
interaction partners from the Protein–Protein Interaction (PPI) network is also one of
the computational challenges. These issues can be addressed by applying
deep architectures. Some recent research on this topic is discussed
below.

3.1 Identification of Protein Function Based on Its Structure Using Deep CNN

Protein function prediction methods are techniques used to determine the biological
and biochemical roles of proteins. The deep CNN (DCNN), a high-performance machine
learning model [49, 50], is used to build a predictive model for protein function
prediction (Fig. 3). A DCNN consists of convolutional and pooling layers. The depth
of the filters increases from the start to the end of the network. The last stage
consists of one or more fully connected layers. The DCNN architecture can be
used to predict the function of a protein [52]. Protein function is associated
with the 3D structure of the protein. The binding site also influences the function. A
domain in a protein is a structural motif that folds into a definite structure. CATH
is a hierarchical classification of the structures of protein domains [53]. SCOP was
introduced to provide a detailed description of the structures of known proteins and
the relationships between them [54]. For tertiary structure recognition of
proteins, feature extraction is a vital step. One conventional method for identifying
the 3D structure of a protein is to extract feature vectors and then compare
them using some distance measure [55]. However, this distance-based method is very
sensitive and may fail to identify similar structures of certain types. The 3D
structure is based

Fig. 3 The DCNN architecture for tertiary protein structure prediction [52]

on a backbone polypeptide chain that is flanked by one or more protein secondary
structures or domains. The bonding and the interactions of side chains with
the subunits of the protein define the complete 3D structure. The protein tertiary
structure is represented by the positions of the atoms in 3D space [2]. Protein
function prediction using a DCNN requires a three-dimensional (3D) array that
represents the tertiary structure of the protein. Visualization tools [56] can render
the protein structure in 3D form. However, the DCNN requires coordinates that show the
connections between atoms in a 3D array. Virtual Reality Modeling Language (VRML)
can be used for better visualization and pixel coordinates. PDB files can be
converted to VRML format with the MolScript tool [57]. Binvox, a 3D mesh
voxelization tool, is then used to generate a 3D array of 0s and 1s, in which a
1 represents the presence of an object [52, 58]. After this pre-processing, the
tertiary protein structure is represented as a 3D array. This 3D array is projected
onto the three perpendicular planes XY, XZ and YZ of the feature space [52]. Each of
these projected 2D images is provided as input to the DCNN. In the deep CNN, a
separate feature extraction layer is implemented for each projected view. The last
layer applies the Rectified Linear Unit (ReLU) [11] for classification. The DCNN
extracts features from the three separately projected images and classifies them
using the fully connected neural network. Because the functional properties of a
protein depend on the shape and size of the binding site region, the proposed approach
[52], applying a DCNN, can classify proteins based on their active domains and thus
their functionality. The data set is divided into 5 non-overlapping subsets, and each
subset is further divided into a training set and a validation set [52]. The proposed
model [52] achieves a maximum accuracy of 88% and an average accuracy of 81%, which is
close to recently developed methods based on protein fold recognition
[59–62], with accuracies of 89%, 74%, 80% and 84.04% respectively (Fig. 4).
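The projection of a binary 3D occupancy array onto the XY, XZ and YZ planes can be sketched with NumPy as follows. The sketch assumes that a pixel of a projected image is 1 whenever any voxel along the projection axis is occupied (a maximum projection); the source does not specify the projection rule, and a sum along the axis would be another reasonable choice. The small hand-filled array stands in for the output of a voxelization tool.

```python
import numpy as np

def project_voxels(vox):
    """Project a binary (X, Y, Z) occupancy array onto three planes.

    Assumption: maximum projection, so each image is a binary silhouette.
    """
    xy = vox.max(axis=2)  # collapse Z
    xz = vox.max(axis=1)  # collapse Y
    yz = vox.max(axis=0)  # collapse X
    return xy, xz, yz

# Placeholder voxel grid with a few occupied cells
vox = np.zeros((4, 5, 6), dtype=np.uint8)
vox[1, 2, 3] = 1
vox[1, 2, 4] = 1
vox[3, 0, 3] = 1

xy, xz, yz = project_voxels(vox)
print(xy.shape, xz.shape, yz.shape)               # (4, 5) (4, 6) (5, 6)
print(int(xy.sum()), int(xz.sum()), int(yz.sum()))  # 2 3 3
```

Each of the three 2D images would then be passed to its own feature extraction branch before the fully connected layers.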

Fig. 4 A schematic diagram to show the workflow of deep convolution neural network

3.2 DL Based PPI Interface Residue Pair Prediction

PPIs are biochemical events that involve two or more protein molecules. PPIs play
an important role in the functioning of the cell. It is important to identify PPI
sites, as they show which amino acid residues contribute most to the protein–protein
interactions. This also makes them potential drug targets. Furthermore, they give
insight into metabolic and signal transduction networks. Domains such as
protein engineering, protein design and drug design rely heavily
on the understanding of PPIs. To predict PPIs correctly, methods have been
designed to predict the binding sites of monomeric proteins [63]. There are four
main kinds of approaches: machine learning [64], template-based [65], correlated
mutations [66] and structural models [67]. The last is widely used but
has some limitations [68]. Deep architectures have recently become one of the popular
approaches [69–71] for PPI interface residue pair prediction [10]. More
precisely, LSTM is applied [15–17]. As mentioned before, LSTM is an RNN
architecture that remembers values over arbitrary intervals. Unlike in a standard
RNN, the summation units are replaced by memory cells in an LSTM. The memory cells can
remember previously seen information. The output layer that is used for an RNN can
also be used for an LSTM. The RNN consists of an input, a hidden and an output layer.
The input propagates through these layers in order. This is the forward pass. There
are two main methods for adjusting the weights of the network: real-time
recurrent learning and backpropagation. For backpropagation,
the partial derivative of the loss (or error) function with
respect to the output of the network is calculated first. To change the weights, the
partial derivatives of the loss with respect to the weights are then calculated by
applying the chain rule, which gives the direction in which each weight should be
adjusted. This procedure is called the backward pass. The output of the LSTM depends
on the cell state; a sigmoid layer determines which parts of the cell state are
output. The cell state is then passed through a tanh activation function to
normalize the values to the range −1 to 1, and the result is multiplied by the
output of the sigmoid gate to give the final output. This
LSTM model was trained, validated and tested on data from the International Critical
Assessment of Protein–protein Interaction Prediction [72, 73]. The method achieved an
accuracy of 90% for the prediction of protein–protein interaction interface residue
pairs.
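The backward pass described above can be illustrated on the smallest possible case, a single sigmoid neuron with a squared-error loss. This is a deliberately simplified sketch of the chain rule and of one gradient descent update, not of the LSTM training procedure in [10]; the weights, input, target and learning rate are arbitrary illustrative values.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(w, b, x):
    # Forward pass: weighted sum followed by the activation
    return sigmoid(np.dot(w, x) + b)

def loss_fn(w, b, x, target):
    return 0.5 * (forward(w, b, x) - target) ** 2

def backward(w, b, x, target):
    """Gradients of the loss with respect to w and b via the chain rule."""
    y = forward(w, b, x)
    dL_dy = y - target          # derivative of the loss w.r.t. the output
    dy_dz = y * (1.0 - y)       # derivative of the sigmoid
    dL_dz = dL_dy * dy_dz       # chain rule
    return dL_dz * x, dL_dz     # dz/dw = x, dz/db = 1

w = np.array([0.5, -0.3])
b = 0.1
x = np.array([1.0, 2.0])
target = 1.0

grad_w, grad_b = backward(w, b, x, target)

# One gradient descent step: move against the gradient
lr = 0.5
w_new = w - lr * grad_w
b_new = b - lr * grad_b
print(loss_fn(w_new, b_new, x, target) < loss_fn(w, b, x, target))  # True
```

In a recurrent network the same rule is applied through every time step of the unrolled sequence, which is where the vanishing gradient problem mentioned earlier arises.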

4 DL in Medical Imaging and Disease Diagnosis

CNNs are a powerful tool for solving problems in computer vision. A DCNN can
automatically learn mid-level and high-level abstractions from raw
input data. Accurate disease diagnosis depends heavily on both image acquisition
and image interpretation. In 1996, a CNN was applied to medical image processing
for breast cancer detection [74].

4.1 Patch-Based CNN Approach for Brain MRI Segmentation

Magnetic Resonance Imaging (MRI) plays a crucial role in medical diagnosis,
especially when diagnosing conditions of the brain. Structural variation of the brain
may be a symptom of many diseases. Medical image segmentation is the
process of automatic or semi-automatic recognition of boundaries within a 2D or
3D image. The high variability of such images is the biggest challenge. Not only is
there huge variation in the anatomy of different humans, but the different medical
imaging methods, such as CT, PET and X-ray, have their own distinguishing
characteristics. MRI provides quite detailed images and has therefore
been used for automatic segmentation. In modern medical research,
segmentation of brain MRI plays an important role. The severity of some diseases of
the brain can be evaluated by observing structural variation and
measuring the volumes of regions of interest [75]. Several segmentation
methods are available, which are basically edge-based and contour-based [12].
However, it is quite challenging to achieve good accuracy on brain MRI segmentation
using these methods. A CNN is a suitable method because it can work with
multi-dimensional arrays, so both gray-scale and color images can be processed
[76–78]. Conventional methods of brain MRI segmentation have other
limitations too: they are very time consuming, and the availability of training data
is a major problem. To overcome these difficulties, Cui et al. [79] implemented
brain MRI segmentation using a patch-based CNN architecture. They used a public data
set from the CANDI neuroimaging access point. The dataset contains 103 MRI scans from
four diagnostic groups: bipolar
disorder with and without psychosis, schizophrenic spectrum and a healthy
control group [80]. Cui et al. [79] extract a few sets of MRI data, where each
set consists of 4 to 5 MRI scans. The 256 × 256 images are divided into 32 × 32
and 13 × 13 patches. The training set has nearly a hundred thousand image
patches. This method uses a CNN for pixel-based automatic segmentation of brain
MRI [81]. In image segmentation tasks, each image patch has a label. The labels of
these patches are used to create a new segmented MRI image. The proposed CNN
architecture achieved an accuracy of 90.83% [79]. It makes use of multiple 5 × 5
kernels. The proposed CNN architecture is compared with five other deep learning
architectures: three CNNs (CNN1, CNN2 and CNN3) and two artificial neural
networks (ANN1 and ANN2). The layered architecture of CNN1 and CNN2 is identical to
that of the proposed CNN; the only difference is that
CNN1 and CNN2 use fewer feature maps than the proposed CNN. The input patch size
for both CNN1 and CNN2 is 32 × 32. The activation function is replaced by a
sigmoid function in the convolutional layer. The third CNN architecture, CNN3, uses
an input patch size of 13 × 13. CNN3 contains 4 convolutional layers and a fully
connected layer. It has no max-pooling layer. The two ANNs are structured as follows:
ANN1 is a 3-layer architecture and ANN2 is a 5-layer architecture. In ANN1,

Table 1 A list of applied machine learning approaches for different biological problems along with their performance

| Biological problem | Applied machine learning approach | Accuracy (%) |
|---|---|---|
| Protein secondary structure prediction | Latent CNN [47] | 90.3126 |
| | Machine learning and structural similarity [51] | 84.51 |
| | LSTM [24] | 67.4 |
| | GSN [33] | 66.4 |
| | Stacked sparse autoencoder [47] | 62.674 |
| | CNF/Raptor-SS8 [44] | 64.9 |
| | RNN [34] | 51.1 |
| Protein function prediction | Protein folding [59] | 89 |
| | Deep CNN [52] | 88 |
| | Graph kernel [62] | 84.04 |
| | Hierarchical classification [61] | 80 |
| Protein–protein interaction interface residue pair prediction | LSTM [10] | 90 |
| Brain MRI segmentation | Proposed CNN ((conv, pool) = 48, (conv, pool) = 96, conv = 700, conv = 19, softmax = 19) [79] | 90 |
| | CNN2 ((conv, pool) = 20, (conv, pool) = 50, conv = 500, conv = 19, softmax = 19) [79] | 90.83 |
| | CNN1 ((conv, pool) = 20, (conv, pool) = 50, conv = 500, conv = 19, softmax = 19) [79] | 90.81 |
| | CNN3 (conv = 40, conv = 160, conv = 500, conv = 19, softmax = 19) [79] | 89.97 |
| | ANN1 (3 layers, 1024-150-10) [79] | 86.25 |
| | ANN2 (5 layers, 1024-800-400-150-10) [79] | 74.94 |
| Cell classification | Deep CNN (Bloodcell-3, size 973 × 799 × 33) [14] | 93 |
| | Deep CNN (Bloodcell-2, size 462 × 451 × 33) [14] | 89.92 |
| | SVM (Bloodcell-2) [14] | 63.11 |
| | SVM (Bloodcell-3) [14] | 56.35 |
| Alzheimer's disease recognition | Deep CNN [13] | 96.8588 |
| | SVM [83] | 84 |
| Identifying metastatic breast cancer | Deep CNN [84] | 98.4 |
| Annotating the pathogenicity of genetic variants | DNN [85] | 66.1 |
| Classifying and segmenting microscopy images | DCNN [86] | 72.3 |

the first, second and third layers contain 1024, 150 and 10 neurons
respectively. In ANN2, the first to fifth layers contain 1024, 800, 400, 150 and 10
neurons respectively. The
accuracies achieved by the five architectures CNN1, CNN2, CNN3, ANN1
and ANN2 are 89.97%, 90.18%, 86.25%, 76.68% and 74.94% respectively [79]. The
proposed CNN performs best because of its larger number of feature maps. The Dice
ratio (DR) [82] is also used to measure the segmentation accuracy; a larger value
indicates a higher segmentation accuracy. The proposed CNN achieved a DR of 95.19%,
while CNN1, CNN2 and CNN3 achieved DRs of 94.12%, 94.83% and 92.62% respectively [79].
The proposed CNN can segment complex edge pixels successfully. However, some pixels
are still wrongly classified (Table 1).
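Two building blocks of the patch-based approach, cutting an image into fixed-size patches and scoring a segmentation with the Dice ratio, can be sketched as follows. The non-overlapping stride and the binary form of the Dice ratio are assumptions made for this illustration (the source does not state the stride, and a multi-label segmentation would compute the ratio per label). The blank image and tiny masks are placeholders.

```python
import numpy as np

def extract_patches(image, size, stride):
    """Cut a 2D image into square patches of the given size and stride."""
    rows, cols = image.shape
    patches = [image[r:r + size, c:c + size]
               for r in range(0, rows - size + 1, stride)
               for c in range(0, cols - size + 1, stride)]
    return np.stack(patches)

def dice_ratio(pred, truth):
    """Dice ratio 2|A ∩ B| / (|A| + |B|) for two binary masks."""
    pred = np.asarray(pred).astype(bool)
    truth = np.asarray(truth).astype(bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * np.logical_and(pred, truth).sum() / denom

# Placeholder 256 x 256 slice cut into non-overlapping 32 x 32 patches
image = np.zeros((256, 256))
patches = extract_patches(image, size=32, stride=32)
print(patches.shape)  # (64, 32, 32)

pred_mask = np.array([[1, 1], [0, 0]])
true_mask = np.array([[1, 0], [0, 0]])
print(round(dice_ratio(pred_mask, true_mask), 4))  # 0.6667
```

A stride smaller than the patch size would produce overlapping patches and a much larger training set from the same number of scans.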

References

1. Pauling, L., Corey, R.B., Branson, H.R.: The structure of proteins: two hydrogen-bonded helical
configurations of the polypeptide chain. Proc. Natl. Acad. Sci. 37(4), 205–211 (1951)
2. Branden, C.I., Tooze, J.: Introduction to Protein Structure. Garland Publishing, New York (1999)
3. Patel, M., Shah, H.: Protein secondary prediction using support vector machine. In: International
Conference on Machine Intelligence and Research Advancement, pp. 594–598 (2013)
4. Chou, P.Y., Fasman, G.D.: Prediction of the secondary structure of proteins from their amino
acid sequence. Trends Biochem. Sci. 2, 128–131 (1977)
5. Hasic, H., Buza, E., Akagic, A.: A hybrid method for prediction of protein secondary structure
based on multiple artificial neural networks, pp. 1195–1200. MIPRO, Opatija (2017)
6. Cheng, J., Tegge, A.N., Baldi, P.: Machine learning method for protein structure prediction.
IEEE Rev. Biomed. Eng. 1, 41–49 (2008)
7. Andreopoulos, W., Labudde, D.: Protein-protein interaction networks. In: Protein Purification
and Analysis I: Methods and Applications. iConcept Press (2013)
8. Jaimovich, A.: Understanding protein-protein interaction network. Ph.D. Thesis. Hebrew
University (2010)
9. Asai, K., Hayamizu, S., Handa, K.I.: Prediction of protein secondary structure by the hidden
Markov model. Bioinformatics 9(2), 141–146 (1993)
10. Zhao, Z., Gong, X.: Protein-protein interaction interface residue pair prediction based on deep
learning architecture. IEEE/ACM Trans. Comput. Biol. Bioinform. (2017)
11. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification using deep convolutional
neural network. In: Advances in Neural Information Processing System, pp. 1097–1105 (2012)
12. Cireşan, D.C., et al.: Mitosis detection in breast cancer histology images with deep neural
networks. In: International Conference on Medical Image Computing and Computer-assisted
Intervention. Springer, Berlin, Heidelberg (2013)
13. Sarraf, S., Tofighi, G.: Deep learning-based pipeline to recognize Alzheimer's disease using
fMRI data. In: IEEE Future Technologies Conference, pp. 816–820 (2016)
14. Li, X., Li, W., Xu, X., Hu, W.: Cell classification using convolutional neural networks in medical
hyperspectral imagery. In: 2nd International Conference on Image, Vision and Computing,
pp. 501–504 (2017)
15. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780
(1997)
16. Greff, K., Srivastava, R.K., Koutník, J., Steunebrink, B.R., Schmidhuber, J.: LSTM: a
search space odyssey (2017). arXiv:1503.04069v1
17. Gers, F.A., Schraudolph, N.N., Schmidhuber, J.: Learning precise timing with LSTM recurrent
networks. J. Mach. Learn. Res. 3, 115–143 (2002)
18. Svozil, D., Kvasnicka, V., Pospichal, J.: Introduction to multi-layer feed forward neural
network. Chemom. Intell. Lab. Syst. 39, 43–62 (1997)
19. Toh, K.-A., Lu, J., Yau, W.-Y.: Global feedforward neural network learning for classification
and regression. In: International Workshop on Energy Minimization Methods in Computer
Vision and Pattern Recognition, pp. 407–422 (2001)
20. Bishop, C.M.: Neural network for pattern recognition. Oxford University Press Inc., New York
(1995)
21. Schmidt, W.F., Kraaijveld, M.A., Duin, R.P.W.: Feed forward neural networks with
random weights. In: 11th IAPR International Conference on Pattern Recognition, Conference B:
Pattern Recognition Methodology and Systems, Proceedings, vol. 2, pp. 1–4 (1992)
22. Graves, A.: Supervised Sequence Labelling with Recurrent Neural Networks. Springer, Berlin,
pp. 5–13 (2012)
23. Pascanu, R., Gulcehre, C., Cho, K., Bengio, Y.: How to construct deep recurrent neural networks
(2013). arXiv preprint arXiv:1312.6026
24. Sonderby, S.K., Winther, O.: Protein secondary structure prediction with long short term
memory networks (2015). arXiv:1412.7828v2
25. Hochreiter, S., Heusel, M., Obermayer, K.: Fast model-based protein homology detection
without alignment. Bioinformatics 23(14), 1728–1736 (2007)
26. Min, S., Lee, B., Yoon, S.: Deep learning in bioinformatics. Brief. Bioinf. 18(5), 851–869
(2017)
27. Baldi, P., Brunak, S., Frasconi, P., Soda, G., Pollastri, G.: Exploiting the past and the future in
protein secondary structure prediction. Bioinformatics 15(11), 937–946 (1999)
28. Bengio, Y., Simard, P., Frasconi, P.: Learning long-term dependencies with gradient descent is
difficult. IEEE Trans. Neural Netw. 5(2), 157–166 (1994)
29. Schuster, M., Paliwal, K.K.: Bidirectional recurrent neural networks. IEEE Trans. Signal
Process. 45(11), 2673–2681 (1997)
30. Graves, A., Schmidhuber, J.: Framewise phoneme classification with bidirectional LSTM and
other neural network architectures. Neural Netw. 18(5–6), 602–610 (2005)
31. Yaseen, A., Li, Y.: Template-based prediction of protein 8-state secondary structures. In:
IEEE 3rd International Conference on Computational Advances in Bio and Medical Sciences
(ICCABS), pp. 1–2 (2013)
32. Kabsch, W., Sander, C.: Dictionary of protein secondary structure: pattern recognition of
hydrogen-bonded and geometrical features. Biopolymers 22(12), 2577–2637 (1983)
33. Zhou, J., Troyanskaya, O.G.: Deep supervised and convolutional generative stochastic network
for protein secondary structure prediction. In: Proceeding of the 31st International Conference
on Machine Learning, Beijing, China, JMLR: W&CP, vol. 32, pp. 745–753 (2014)
34. Pollastri, G., Przybylski, D., Rost, B., Baldi, P.: Improving the prediction of protein secondary
structure in three and eight classes using recurrent neural networks and profiles. Proteins:
Struct. Funct. Genet. 47(2), 228–235 (2002)
35. Bengio, Y., Thibodeau-Laufer, E., Alain, G.: Deep generative stochastic networks trainable by
backprop. In: International Conference on Machine Learning, pp. 226–234 (2014)
36. Bengio, Y.: Learning deep architectures for AI. Found. Trends Mach. Learn. 2(1), 1–127 (2009)
37. Du, C., Zhu, J., Zhang, B.: Learning deep generative models with doubly stochastic gradient
MCMC. IEEE Trans. Neural Netw. Learn. Syst. (2017)
38. Ozair, S., Yao, L., Bengio, Y.: Multimodal transitions for generative stochastic network.
arXiv:1312.5578v4 (2014)
39. Bengio, Y., Yao, L., Alain, G., Vincent, P.: Generalized denoising auto-encoders as generative
models. In: Advances in Neural Information Processing Systems, pp. 899–907 (2013)
40. Salakhutdinov, R., Hinton, G.: Deep Boltzmann machines. Appearing in Proceedings of the
12th International Conference on Artificial Intelligence and Statistics (AISTATS), Clearwater
Beach, Florida, USA, vol. 5 of JMLR: W&CP 5 (2009)
41. Jones, D.T.: Protein secondary structure prediction based on position-specific scoring matrices.
J. Mol. Biol. 292(2), 195–202 (1999)
42. Jamel, T.M., Khammas, B.M.: Implementation of sigmoid activation function for neural
network using FPGA. In: 13th Scientific Conference of Al-Ma’moon University College (2012)
43. Wang, G., Dunbrack Jr., R.L.: PISCES: a protein sequence culling server. Bioinformatics 19,
1589–1591 (2003)
44. Wang, Z., Zhao, F., Peng, J., Xu, J.: Protein 8-class secondary structure prediction using
conditional neural fields. Proteomics 11(19), 3786–3792 (2011)
45. Ng, A.: Sparse Autoencoder. CS294A Lecture notes, vol. 72 (2011)
46. Ng, A.: Supervised learning. CS229 Lecture Notes, pp. 1–3 (2000)
47. Al-Azzawi, A.: Deep learning approach for secondary structure protein prediction based on
first level features extraction using a latent cnn structure. Int. J. Adv. Comput. Sci. Appl. 8(4),
5–12 (2017)
48. Hubel, D.H., Wiesel, T.N.: Receptive fields and functional architecture of monkey striate cortex.
J. Physiol. 195, 215–243 (1967)
49. LeCun, Y., Bengio, Y.: Convolutional Networks for Image, Speech and Time-Series. AT and
T Bell Laboratories, Dept Imformatique Recherche (1995)
50. LeCun, Y., Boser, B., Denker, J.S., Henderson, D., Howard, R.E., Hubbard, W., Jackel, L.D.:
Handwritten digit recognition with a back-propagation network. Adv. Neural Inf. Process. Syst.
396–404 (1990)
51. Magnan, C.N., Baldi, P.: SSpro/ACCpro 5: almost perfect prediction of protein secondary
structure and relative solvent accessibility using profiles, machine learning and structural
similarity. Bioinformatics 30(18), 2592–2597 (2014)
52. Tavanaei, A., Maida, A.S., Kaniymattam, A., Loganantharaj, R.: Towards recognition of
protein function based on its structure using deep convolutional network. In: IEEE International
Conference on Bioinformatics and Biomedicine (BIBM), pp. 145–149 (2016)
53. Orengo, C.A., Michie, A.D., Jones, S., Jones, D.T., Swindells, M.B., Thornton, J.M.: CATH a
hierarchic classification of protein domain structures. Structure 5(8), 1093–1109 (1997)
54. Murzin, A.G., Brenner, S.E., Hubbard, T., Chothia, C.: SCOP: a structural classification of
proteins database for the investigation of sequences and structures. J. Mol. Biol. 247(4), 536–540
(1995)
55. Karim, R., Al-Aziz, M.M., Shatabda, S., Rahman, M.S., Mia, M.A.K., Zaman, F., Rakin, S.:
CoMOGrad and PHOG: from computer vision to fast and accurate protein tertiary structure
retrieval. Sci. Rep. 5, 1–11 (2015)
56. Pettersen, E.F., Goddard, T.D., Huang, C.C., Couch, G.S., Greenblatt, D.M., Meng, E.C., Ferrin,
T.F.: UCSF chimera a visualization system for exploratory research and analysis. J. Comput.
Chem. 25(13), 1605–1612 (2004)
57. Kraulis, P.K.: MOLSCRIPT: a program to produce both detail and semantic plots of protein
structures. J. Appl. Crystallogr. 24, 946–950 (1991)
58. Nooruddin, F., Turk, G.: Simplification and repair of polygonal models using volumetric
techniques. IEEE Trans. Vis. Comput. Graph. 9(2), 191–205 (2003)
59. Zakeri, P., Jeuris, B., Vandebril, R.: Protein fold recognition using geometric kernel data fusion.
Bioinformatics 30(13), 1850–1857 (2014)
60. Brylinski, M., Lingam, D.: eThread: a highly optimized machine learning based approach to
meta threading and the modeling of protein tertiary structure. PLoS One 7(11), e50200 (2012)
61. Lin, C., Zou, Y., Qin, J., Jiang, Y., Ke, C., Zou, Q.: Hierarchical classification of protein folds
using a novel ensemble classifier. PLoS One 8(2), e56499 (2013)
62. Borgwardt, K.M., Ong, C.S., Schonauer, S., Vishwanathan, S.V.N., Smola, A.J., Kriegel, H.-P.:
Protein function prediction via graph kernels. Bioinformatics 21, i47–i56 (2005)
63. Giard, J., Ambroise, J., Gala, L.J.: Regression applied to protein binding site prediction and
comparison with classification. BMC Bioinform. 10(1), 1–12 (2009)
64. Cheng, J., Baldi, P.: Improved residue contact prediction using support vector machines and a
large feature set. BMC Bioinform. 8(2), 1–9 (2007)
65. Ohue, M., Matsuzaki, Y., Shimoda, T.: Highly precise protein-protein interaction prediction
based on consensus between template-based and de novo docking methods. BMC Proc. 7(7),
S6 (2013)
66. Göbel, U., Sander, C., Schneider, R., Valencia, A.: Correlated mutations and residue contacts
in proteins. Proteins 18(4), 309–317 (1994)
67. Singh, R., Park, D., Xu, J., Hosur, R., Berger, B.: Struct2Net: a web service to predict
protein–protein interactions using structure based approach. Nucleic Acids Res. 38(2), 508–515
(2010)
68. Moult, J.B., Fidelis, K., Rost, B.: Critical assessment of methods of protein structure prediction,
CASP, Round 6. Proteins (2010)
69. Lena, D.P., Nagata, K., Baldi, P.: Deep architectures for protein contact map prediction.
Bioinformatics 28(19), 2449–2457 (2012)
70. Larochelle, H., Bengio, Y., Louradour, J.: Exploring strategies for training deep neural
networks. J. Mach. Learn. Res. 10, 1–40 (2009)
71. Lusci, A., Pollastri, G., Baldi, P.: Deep architectures and deep learning in chemoinformatics:
the prediction of aqueous solubility for drug-like molecules. J. Chem. Inf. Model.
53(7), 1563–1575 (2013)
72. Vreven, T., Moal, H.I., Vangone, A.: Updates to the integrated protein–protein interaction
benchmarks: docking benchmark version 5 and affinity benchmark version 2. J. Mol. Biol.
427(19), 3031–3041 (2015)
73. Janin, J., Henrick, K., Moult, J.: CAPRI: a critical assessment of predicted interactions.
Proteins Struct. Funct. Bioinform. 52(1), 2–9 (2003)
74. Sahiner, B.: Classification of mass and normal breast tissue: a convolution neural network
classifier with spatial domain and texture images. IEEE Trans. Med. Imag. 15(5), 598–610
(1996)
75. Shaun, P.: Brain MRI Segmentation, Computational Surgery and Dual Training, pp. 45–73.
Springer, US (2010)
76. Yue-Hei Ng, J., Hausknecht, M., Vijayanarasimhan, S., Vinyals, O., Monga, R., Toderici,
G.: Beyond short snippets: deep networks for video classification. In: IEEE Conference on
Computer Vision and Pattern Recognition (CVPR), pp. 4694–4702 (2015)
77. Ye, H., Wu, Z., Zhao, R.-W., Wang, X., Jiang, Y.-G., Xue, X.: Evaluating two-stream CNN for
video classification. In: Proceedings of the 5th ACM on International Conference on Multimedia
Retrieval, pp. 435–442 (2015)
78. Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., Fei-Fei, L.: Large-scale video
classification with convolutional neural networks. In: Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition, pp. 1725–1732 (2014)
79. Cui, Z., Yang, J., Qiao, Y.: Brain MRI segmentation with patch-based CNN approach. In:
Proceedings of the 35th Chinese Control Conference, pp. 27–29 (2016)
80. Kennedy, N.D., Haselgrove, C., Hodge, M.S.: CANDIShare: a resource for pediatric
neuroimaging data. Neuroinformatics 10(3), 319–322 (2012)
81. Leena Silvoster, M., Govindan, V.K.: Convolutional neural network based segmentation. In:
Computer Networks and Intelligent Computing: 5th International Conference on Information
Processing, ICIP, vol 157, pp. 190 (2011)
82. Zhang, W., Li, R., Deng, H., Wang, L., Lin, W., Ji, S., Shen, D.: Deep convolutional neural
networks for multi-modality isointense infant brain image segmentation. NeuroImage 108, 214–224
(2015)
83. Tripoliti, E.E., Fotiadis, D.I., Argyropoulou, M.: A supervised method to assist the diagnosis
and classification of the status of Alzheimer's disease using data from an fMRI experiment.
In: Engineering in Medicine and Biology Society. EMBS 2008. 30th Annual International
Conference of the IEEE, pp. 4419–4422 (2008)
84. Wang, D., Khosla, A., Gargeya, R., Irshad, H., Beck, A.H.: Deep learning for identifying
metastatic breast cancer (2016). arXiv preprint arXiv:1606.05718
85. Quang, D., Chen, Y., Xie, X.: DANN: a deep learning approach for annotating the pathogenicity
of genetic variants. Bioinformatics 31(5), 761–763 (2014)
86. Kraus, O.Z., Grys, B.T., Ba, J., et al.: Automated analysis of high-content microscopy data
with deep learning. Mol. Syst. Biol. 13(4), 924 (2017)

Sagnik Sen is currently a doctoral researcher at the Department of Computer Science and
Engineering, Jadavpur University, Kolkata. His expertise lies in Computational Biology,
Bioinformatics and Structural Biology. A gold medallist in his Masters, Sagnik has been
awarded the Department of Science and Technology (Govt. of India) INSPIRE fellowship
for his research. His work has been published in many highly esteemed peer-reviewed
international journals. In the last few years, Sagnik has done insightful work on intrinsically
disordered proteins and their functional and structural dynamics. His work provides a strong
blend of biology and computer science.

Rangan Das is a Master’s student in the Department of Computer Science and Engineering at
Jadavpur University, Kolkata, India. His current research interests encompass the area of deep
learning.

Swaraj Dasgupta is a Master’s student in the Department of Computer Science and Engineering
at Jadavpur University, Kolkata, India. His current research interests encompass the area
of advanced machine learning.

Dr. Ujjwal Maulik has been a Professor in the Department of Computer Science and
Engineering, Jadavpur University, Kolkata, India since 2004. He did his Bachelors in Physics
and Computer Science in 1986 and 1989 respectively. Subsequently, he did his Masters and
Ph.D. in Computer Science in 1992 and 1997 respectively. Dr. Maulik has worked at Los Alamos
National Laboratory, Los Alamos, New Mexico, USA in 1997; the University of New South Wales,
Sydney, Australia in 1999; the University of Maryland Baltimore County, USA in 2004; the
University of Heidelberg, Germany in 2009; the German Cancer Research Center (DKFZ) in 2010,
2011 and 2012; the International Center of Theoretical Physics (ICTP), Trieste, Italy in 2014
and 2017; and the University of Padova in 2014 and 2016.
Intelligent, Secure Big Health Data
Management Using Deep Learning
and Blockchain Technology:
An Overview

Sohail Saif, Suparna Biswas and Samiran Chattopadhyay

Abstract Sensor-based health data collection and remote access to health data for rendering
real-time advice are the key advantages of smart and remote healthcare. Such
health monitoring and support are becoming immensely popular among both patients
and doctors, as they do not require physical movement, which is not always possible for
elderly people who mostly live alone in the current socio-economic situation. Healthcare
informatics plays a key role in such circumstances. The huge amount of raw
data emanating from sensors needs to be processed with machine learning and
deep learning algorithms to extract useful information and build an intelligent
knowledge base that can provide an appropriate solution as and when required. The
real challenge lies in storing and retrieving data while preserving security, privacy,
reliability and availability requirements. Health data in an Electronic Medical Record
(EMR) are generally saved in a client-server database, where a central coordinator
controls access, i.e., the creation, access, update, or deletion of health records. However,
in smart and remote healthcare supported by enabling technologies such as sensors, the
Internet of Things (IoT), cloud, deep learning, and big data, the EMR needs to be accessed
in a distributed manner by the multiple stakeholders involved, such as hospitals, doctors,
research labs, patients’ relatives, and insurance providers. Hence, health data must be
protected from unauthorized access, specifically to maintain data
integrity, using advanced distributed security techniques such as blockchain.

Keywords Electronic medical record · Big data · Security · Blockchain · Deep learning

S. Saif · S. Biswas (B)


Department of Computer Science and Engineering, Maulana Abul Kalam Azad
University of Technology, Kolkata, West Bengal, India
e-mail: [email protected]
S. Saif
e-mail: [email protected]
S. Chattopadhyay
Department of Information Technology, Jadavpur University, Kolkata, West Bengal, India
e-mail: [email protected]

© Springer Nature Switzerland AG 2020
S. Dash et al. (eds.), Deep Learning Techniques for Biomedical and Health Informatics,
Studies in Big Data 68, https://doi.org/10.1007/978-3-030-33966-1_10

1 Introduction

Smart and remote healthcare for elderly care [1] and patient monitoring is becoming
increasingly popular among researchers because of its applicability and acceptance
in today’s socio-economic scenario, where average human life expectancy has
increased, leaving many people to live with age-related ailments and without personalized care.
Electronic Medical Records (EMRs) [2] have traditionally been saved in distributed
databases that mostly follow a client-server architecture. A central controller,
called the administrator, supervises and manages the permissions of end-users to create,
access, update, or delete health records. Health data sensed by the sensors are prone
to security attacks and vulnerabilities [3]. In the sensing unit, several tiny sensors
(wearable, implanted, or ambient) acquire data. These devices may be damaged
by a fall during manual handling, leading to lost or erroneous data; they may also
be compromised by an adversary to steal or tamper with data, or be replaced
by illegitimate devices. In the communication unit, sensor data travel
through heterogeneous communication links, e.g., Bluetooth, Zigbee, Wi-Fi, and WiMax,
with varying range, link quality, and security measures. In the storage and processing
unit, data are stored in cloud servers for further access, processing, knowledge
building, and feedback or advice generation [4]. The first phase of this process
comprises sensing the data, transmitting them, and saving them in the cloud. The second
phase comprises accessing the data from the cloud, analyzing or modifying them, and
updating or deleting them in the cloud by multiple
stakeholders in healthcare (Fig. 1). Now, for healthcare data, privacy and integrity are
two important properties to be ensured so that people do not worry about their
sensitive data being revealed through unauthorized access. Integrity of health data is
important because advice generated from incorrect data is inaccurate. Relying on the
underlying security measures and principles of the IoT and cloud enabled healthcare
framework helps to avoid the additional computational complexity, and hence the extra
resource consumption, of implementing encryption algorithms separately. Because
multiple parties are involved in health data access, encrypting data at the sender
and decrypting them at the corresponding receiver would increase latency, which
is undesirable in a time-critical application like healthcare. Moreover, devices in
IoT enabled smart healthcare systems are heterogeneous and have varying resource
levels, ranging from sensors to smartphones, tablets, and laptops to high-end workstations
and servers. So, a single encryption-decryption algorithm such as the Data Encryption
Standard (DES), Advanced Encryption Standard (AES) or Rivest-Shamir-Adleman (RSA)
cannot be applied at all levels of the smart healthcare architecture. Thus, enforcing
confidentiality directly through encryption may be difficult to implement, but
confidentiality may still be ensured by implementing
authentication and integrity. In blockchain [5], there is a decentralized network
where stakeholders (hospitals, doctors, research labs, and insurance providers) are
connected to each other as blockchain nodes. Health sensors collect data and
send them to a Personal Digital Assistant (PDA), which then forwards the data to the
blockchain network through access points. The data forwarded to the blockchain network
in one session is called a block. The hash of the previous data is bound to the new data so
that the blockchain network can validate the new block of data; once validated, the hash
of the data is stored in the blockchain nodes and the health information is stored in
the cloud database in encrypted format.

Fig. 1 The traditional 3-tier architecture of wireless body area network (WBAN) based healthcare [8]
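
The hash binding described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the scheme of [5]: the helper names (`block_hash`, `append_block`, `is_valid`), the all-zero genesis hash, and the sample readings are hypothetical, and for brevity the payload is kept inside the block rather than in a separate encrypted cloud database.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed placeholder for the first block's predecessor


def block_hash(prev_hash, payload):
    """SHA-256 over the previous block's hash and this session's payload."""
    body = json.dumps({"prev": prev_hash, "data": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_block(chain, payload):
    """Bind the new payload to the hash of the previous block."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    chain.append({"prev": prev_hash, "data": payload,
                  "hash": block_hash(prev_hash, payload)})


def is_valid(chain):
    """Recompute every hash and check that each block links to its predecessor."""
    prev_hash = GENESIS_HASH
    for block in chain:
        if block["prev"] != prev_hash:
            return False
        if block["hash"] != block_hash(block["prev"], block["data"]):
            return False
        prev_hash = block["hash"]
    return True


chain = []
append_block(chain, {"patient": "P-001", "heart_rate": 72})  # hypothetical session 1
append_block(chain, {"patient": "P-001", "heart_rate": 75})  # hypothetical session 2
valid_before = is_valid(chain)

chain[0]["data"]["heart_rate"] = 120  # simulated tampering with stored data
valid_after = is_valid(chain)
print(valid_before, valid_after)
```

Because each block's hash covers the previous block's hash, altering any earlier record changes its recomputed hash and the validation fails, which is the integrity property the text relies on.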
Big health data [6] stored in a cloud database require further analysis using
machine learning techniques for knowledge extraction. Deep learning techniques
are now widely used in healthcare; popular applications include early
disease detection, DNA (deoxyribonucleic acid) analysis, prediction of new drug
effectiveness, and personalized treatment. One of the big challenges of using deep
learning techniques in health informatics is the need for a huge amount of labeled
data. An EMR, however, may contain unlabeled data, for example, X-ray images
not annotated with a medical condition such as cancer or fibrosis. In such cases,
unsupervised learning techniques can be used to label the data using data mining. For
labeled data, supervised learning can be used. For a combination of labeled and unlabeled
data, semi-supervised learning is to be applied. The Convolutional Neural Network (CNN)
is the most impactful deep learning technique among others such as the Deep Neural
Network (DNN), Deep Autoencoder, Deep Belief Network (DBN), and Recurrent Neural
Network (RNN), as health data are predominantly image-based nowadays. DNNs have
been successfully implemented in real-time applications such as healthcare with the
parallelism support of Graphics Processing Units (GPUs) [7].

2 Related Works

This section describes some works related to big health data and the issues involved
in processing them intelligently with deep learning techniques. Also, because the security
of health data in terms of privacy, authentication, and integrity is of utmost importance,
efficient techniques for securing big health data, together with their
related issues and challenges, necessarily need to be addressed.
In [7], the authors present a rigorous review of deep learning techniques and
of implementation issues in handling big health data, in terms of advantages,
drawbacks and future scope. The application of deep learning to both sensor-based
and image-based health data is also covered. In health informatics, an EMR contains the
medical history of a patient, such as diagnoses with and without medical tests,
prescription advice and follow-up, vaccination data, proneness to allergy, time-varying
signals such as Electroencephalogram (EEG), Electrocardiogram (ECG) or
Electromyogram (EMG) signals, and data from pervasive sensors such as pressure,
temperature, pulse rate, and heart rate. Health data in health informatics are not always
complete and labeled, and may also be erroneous. Moreover, as sensing or data-acquiring
devices are heterogeneous, the data do not share the same format or size. The rate and
sequence of data acquisition also vary, posing a great challenge to processing such
datasets. The genuine challenges in applying deep learning techniques such as
DNN and CNN to big health data processing lie in the nature of the techniques
themselves. Inadequate correct and complete training data may lead to poor training of
a DNN model. Also, a limited labeled training set may be enough to reach a very low
training error yet lead to a huge error when testing with a new dataset. Researchers often
apply deep learning techniques such as a CNN model as a black box, without a proper
interpretation of hidden layers, weights, etc. Correct and efficient preprocessing,
filtering, and normalization of the training dataset is of utmost importance, as noise may
lead to misclassification of data in machine learning techniques such as logistic regression.
In spite of all this, if insight into and an appropriate interpretation of the hyperparameters
can be built up, the structure of a DNN and the number of filters in a CNN can be controlled
and predefined. In this context, a blockchain-based scheme for sharing medical data among untrusted
parties, ensuring confidentiality, authentication and access control, has been
proposed [9]. To eliminate illegal modifications by intruders, transaction requests among
legitimate parties to access data from the cloud are secured with cryptographic keys.
A threat model has been developed that identifies security attacks on and threats to health
data as well as medical reports. When a request for data access comes from an entity,
signature-based authentication is performed first and then the data are retrieved,
encrypted, and sent to the requestor. The purpose of the data access request is also
considered. Performance, in terms of latency as the number of requestors increases,
is evaluated on real test scenarios. In [10], the authors present a rigorous and
logical review of related works on blockchain and its application in healthcare. The
approach systematically identifies the characteristics of blockchain technology that make it
suitable for ensuring secure and trusted transactions among many stakeholders on the
shared medical records of patients. A large number of publications have been
analyzed and evaluated based on relevant parameters such as the blockchain platform used,
the consensus algorithm implemented, the type of blockchain network, smart contracts, etc.
This work involves a rigorous search of blockchain papers in healthcare published
between 2008 and 2019, and finds that work of good quality worthy of analysis
has been published only from 2015 onwards and that implemented works have been
published only in 2017 and 2018, showing that interest in blockchain-based secure
healthcare is increasing. Though blockchain is a very promising and efficient technique
for handling medical records shared among multiple interested parties while ensuring
authentication, confidentiality, integrity, access control, etc., it has several
limitations and challenges in real healthcare scenarios. In [11], the authors
identify many such challenges, e.g., additional overheads in terms of communication
and storage, delay in executing requests to access data, and scalability issues. Moreover,
the performance of the proposed blockchain-based algorithm has been evaluated.
An interesting review of the application of deep learning techniques in healthcare
is presented in [12]. The authors categorize deep learning techniques by the
specific physiological signals, such as ECG, EMG, and EEG, to which they are applied.
The detailed and insightful illustration of deep learning methods, both mathematically
and architecture-wise, will attract budding and existing researchers in related areas.
The authors of [13] propose a secure remote healthcare system that uses smart contracts
on a private blockchain based on the Ethereum protocol. The Ethereum-based private
blockchain ensures that only authenticated users can access patients’ health data. In this
work, only events, such as data sensing by sensors, processing in smart devices using
smart contracts, and alert or alarm generation and delivery to caregivers, are stored
in the blockchain ledger. The actual confidential information related to medical records
is saved in the EMR, and a mapping is maintained between the EMR and the blockchain
ledger to access or retrieve data. Some limitations encountered while developing and
implementing this work are also identified, e.g., efficient key management and latency.
In IoT enabled systems the number of sensors is high and will increase rapidly over time;
hence key generation and distribution is a non-trivial issue. Supporting emergency
health scenarios with such a system by reducing the latency of blockchain ledger processing
is also a real challenge. A recent work [14] proposes a CNN-based
approach to predict chronic disease risk. The authors collected real-life hospital
data for a regional chronic disease from central hospitals of China; since health data
are mostly unlabeled and contain missing values, they used a latent factor model
to reconstruct the missing information. They implemented their proposal and
compared it with other disease prediction algorithms. Their experimental results show
an accuracy of 94.8% and a comparatively better convergence speed. In
[15], the authors show that some machine learning algorithms are prone
to a security attack known as a poisoning attack. In this type of attack, the attacker
augments the training dataset with malicious data, which causes the model to produce
wrong results. This can be life-threatening when diagnosing a patient. Finally, the authors
present prevention techniques for these types of attacks.
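
To make the poisoning attack described above concrete, the toy sketch below trains a one-feature threshold classifier on clean data and again after mislabeled points are injected. It is only an illustration under stated assumptions: the classifier, the function names (`train_threshold`, `predict`), and all readings are hypothetical and are not the algorithms or data studied in [15].

```python
def train_threshold(samples):
    """Fit a 1-D classifier: threshold midway between the two class means."""
    normal = [x for x, label in samples if label == 0]
    at_risk = [x for x, label in samples if label == 1]
    return (sum(normal) / len(normal) + sum(at_risk) / len(at_risk)) / 2


def predict(threshold, reading):
    """Return 1 (at risk) if the reading reaches the threshold, else 0 (normal)."""
    return 1 if reading >= threshold else 0


# Hypothetical clean training data: (reading, label), label 1 = at risk.
clean = [(90, 0), (95, 0), (100, 0), (160, 1), (170, 1), (180, 1)]
# Hypothetical injected points: high readings deliberately labeled as normal.
poison = [(200, 0), (210, 0), (220, 0)]

clean_threshold = train_threshold(clean)
poisoned_threshold = train_threshold(clean + poison)

patient_reading = 150
print(predict(clean_threshold, patient_reading))     # flagged as at risk
print(predict(poisoned_threshold, patient_reading))  # missed after poisoning
```

The injected points pull the mean of the normal class upward, so the decision threshold moves past the patient's reading and an at-risk case is classified as normal.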
In Table 1, we have presented some works published in the last few years applying
Deep Learning based techniques in healthcare applications. Table 2 shows works for
sharing health data using blockchain.

Table 1 Application of various deep learning methods in health informatics

Authors, year | Application | Input data | Deep learning method
Sun et al. [16], 2016 | Lung cancer diagnosis | Lung Image Database Consortium (LIDC) dataset | CNN
Esteva et al. [17], 2017 | Skin cancer classification | Clinical imaging | CNN
Ahmed et al. [18], 2016 | Breast cancer classification | Wisconsin breast cancer dataset (WBCD) | DBN
Fakoor et al. [19], 2013 | Cancer diagnosis and classification | Gene expression | Deep Auto encoder
Ramsundar et al. [20], 2015 | Drug discovery | Molecular compound | DNN
Rongjian et al. [21], 2014 | Brain disease diagnosis | Alzheimer’s disease neuroimaging initiative (ADNI) dataset | CNN
Mohsen et al. [22], 2017 | Brain tumor classification | Magnetic resonance imaging (MRI) images | DNN
Amin et al. [23], 2018 | Brain tumor detection | Magnetic resonance imaging (MRI) images | DNN
Yaniv et al. [24], 2015 | Chest pathology detection | X-ray images | CNN
Charissa et al. [25], 2016 | Human activity recognition | Raw sensor data | CNN

Table 2 Use of blockchain technology in health data sharing

Security requirements considered: Confidentiality, Authentication, Access control, Integrity

Authors, year | Confidentiality | Authentication | Access control | Integrity | Implementation
Zhang et al. [26], 2018 | ✓ | ✓ | × | ✓ | ✓
Azaria et al. [27], 2016 | ✓ | ✓ | × | × | ×
Peterson et al. [28], 2016 | ✓ | ✓ | × | × | ×
Patel et al. [29], 2018 | ✓ | ✓ | × | × | ×
Xia et al. [9], 2017 | ✓ | ✓ | ✓ | × | ✓

3 Preliminaries

In this section, we briefly discuss the Internet of Things (IoT), Big Data, various
deep learning techniques, and blockchain technology. Then, the proposed
architecture is discussed in detail.

3.1 Internet of Things (IoT)

The concept behind IoT is to connect humans with the internet, which can be achieved
by connecting machines and other physical things to the internet [30]. This
technology is growing rapidly and is being adopted in healthcare. The use of IoT-based
technologies has greatly helped physicians and patients. For example, a patient can take
advice from doctors without physically visiting a clinic, and patients who need real-time
monitoring do not need to visit a hospital. Using biological sensors and the internet,
doctors can observe the physiological parameters of patients. The wireless body area
network (WBAN) is one of the core technologies supporting remote healthcare. It basically
consists of battery-powered, lightweight wireless sensors that can be wearable
or implantable. These sensors are connected to an access point using short-range
communication, and the access point forwards the data to a medical facility such as a
clinic or hospital. These IoT systems produce massive data that qualify as
“Big Data”. These data need to be handled in a secure and efficient way so that they
can be accessed by all stakeholders.

3.2 Big Data

Big Data refers to large datasets that may contain data in structured, semi-structured
and unstructured formats. Structured data are typically stored in databases
or spreadsheets in a tabular format. Images, video and audio belong to the unstructured
category, and these data are very difficult to analyze. Semi-structured data, such as
XML documents, do not follow a strict tabular structure. These data can be used in
emerging applications such as clinical decision support and disease prediction through
various Machine Learning technologies. The healthcare sector produces a huge amount of
data, such as sensor data, previous health records and drug records. These enormous data
are difficult to manage using traditional software or hardware systems. Using a cloud
platform reduces the cost of efficient storage and sharing.
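
The three data categories above can be illustrated with a minimal sketch that uses only the Python standard library. The column names, XML tags and byte payload are hypothetical and chosen only for illustration; they do not come from any dataset discussed in this chapter.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Structured: fixed columns, one row per reading (hypothetical column names).
structured_csv = "patient_id,heart_rate,systolic_bp\nP001,72,118\nP002,95,141\n"
rows = list(csv.DictReader(io.StringIO(structured_csv)))

# Semi-structured: tagged fields, but records need not share the same fields.
semi_structured_xml = """
<records>
  <record id="P001"><drug>aspirin</drug><dose unit="mg">75</dose></record>
  <record id="P002"><note>No medication prescribed</note></record>
</records>
"""
root = ET.fromstring(semi_structured_xml)
fields_per_record = {
    rec.get("id"): sorted(child.tag for child in rec) for rec in root
}

# Unstructured: an opaque byte payload (a stand-in for an image or audio file).
unstructured_blob = bytes([0x89, 0x50, 0x4E, 0x47]) + bytes(12)


def describe(blob: bytes) -> str:
    """Without a decoder or a model, only the size of the payload is known."""
    return f"{len(blob)} bytes, no schema"


print(rows[1]["heart_rate"])
print(fields_per_record)
print(describe(unstructured_blob))
```

The structured rows can be queried by column name, the semi-structured records have to be inspected tag by tag, and the unstructured payload exposes nothing until it is decoded or passed to a learning model.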
194 S. Saif et al.

3.3 Deep Learning

Deep learning (DL) is a prominent feature learning method, often used in an unsupervised
manner, that extracts high-level features from low-level data. Since manual feature
identification is time-consuming and expensive, DL is used to learn the features instead.
The main advantage of unsupervised learning is that it does not need labeled data for
learning. In most cases, medical health data does not carry a label; for example, X-ray
images are often stored without any associated medical condition. Labeled data can also
be used in DL techniques, in which case the approach is called supervised learning.
There are various types of DL techniques. In this section, we discuss some of the popular
ones.
One of the most popular deep learners is the Artificial Neural Network (ANN). It
consists of perceptrons (neurons) that are organized in layers: an input layer, one or
more hidden layers and an output layer. The hidden layers are where the features are
learned, but increasing the number of hidden layers does not guarantee improved results.
Overfitting may occur if too many layers or perceptrons are added; as a result, noise
is captured instead of the actual features, which decreases the accuracy. The
architecture of an Artificial Neural Network is shown in Fig. 2. The Convolutional
Neural Network (CNN) is the most helpful in healthcare; a fixed-size vector is given as
input. For example, an array containing the pixel values of a pathological image is the
input, and it is mapped to an output such as a type of tumor. In this case, images of
different types of tumor are given as input for training. In a CNN, perceptrons are
connected, and during training the weights are assigned and then adjusted in every
iteration. After each iteration, a loss function is used to determine the error, and
backpropagation is then performed to adjust the weights. The loss can be decreased by
using gradient descent. Since signals are passed in one direction, i.e. from the input
layer to the output layer, this is called a feed-forward network. In a Recurrent Neural
Network, both forward and backward (feedback) connections are present.
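
The training loop described above (feed-forward pass, loss, backpropagation, gradient descent) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the implementation used in any of the cited works: the network has one hidden layer with sigmoid units, the loss is the squared error, the weights are updated after every sample, and the toy data (the logical OR function) and all hyperparameter values are hypothetical.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Feed-forward pass: input layer -> hidden layer -> output unit."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
              for ws, b in zip(w_hidden, b_hidden)]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)
    return hidden, output


def train(samples, n_hidden=3, lr=0.5, epochs=2000, seed=0):
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    b_hidden = [0.0] * n_hidden
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    b_out = 0.0
    losses = []
    for _ in range(epochs):
        total = 0.0
        for x, target in samples:
            hidden, y = forward(x, w_hidden, b_hidden, w_out, b_out)
            total += 0.5 * (y - target) ** 2  # squared-error loss
            # Backpropagation: error signal at the output, then at each hidden unit.
            delta_out = (y - target) * y * (1.0 - y)
            delta_hidden = [delta_out * w_out[j] * hidden[j] * (1.0 - hidden[j])
                            for j in range(n_hidden)]
            # Gradient descent: move every weight against its gradient.
            for j in range(n_hidden):
                w_out[j] -= lr * delta_out * hidden[j]
                for i in range(n_in):
                    w_hidden[j][i] -= lr * delta_hidden[j] * x[i]
                b_hidden[j] -= lr * delta_hidden[j]
            b_out -= lr * delta_out
        losses.append(total / len(samples))
    return (w_hidden, b_hidden, w_out, b_out), losses


# Toy training set: the logical OR of two binary inputs.
or_samples = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
params, losses = train(or_samples)
print(f"mean loss, first epoch: {losses[0]:.4f}; last epoch: {losses[-1]:.4f}")
```

Tracking the mean loss per epoch makes it easy to see whether gradient descent is actually reducing the error, which is the first thing to check before adding more layers or perceptrons.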

Fig. 2 The architecture of an artificial neural network

3.4 Popular Deep Learners

Various deep learning techniques are available, and the best technique for a specific
problem has to be chosen wisely. Table 3 shows some popular methods that have
been used in health informatics.

Table 3 Summary of popular deep learning architectures


Deep neural network
Description
• A type of neural network that contains multiple hidden layers
• In general, used for classification or regression
• Non-linear hypotheses can be expressed
Advantage
• Widely used because of its success rate in different applications
Disadvantage
• The training process can be very slow if the computational power of the CPU is limited

Deep autoencoder
Description
• A fully connected neural network that has two phases, known as the encoder and the decoder
• It is well suited to feature extraction
• The encoder transforms a high-dimensional input vector into a low-dimensional feature
Advantage
• Labeled data is not mandatory for the learning process. Many variations are available
Disadvantage
• Requires high processing time

Deep belief network
Description
• It is composed of RBMs; each RBM has a visible layer and a hidden layer
• The hidden layer of each sub-network acts as the visible layer for the next RBM
Advantage
• Both supervised and unsupervised training are possible
Disadvantage
• Due to initialization and sampling, the learning process is expensive

Deep Boltzmann machine
Description
• It consists of multiple hidden layers that are connected to each other in an undirected manner
• Nodes within a layer are independent of each other but depend on the other layers
Advantage
• Both supervised and unsupervised training are possible
Disadvantage
• Approximate inference based on mean-field is slower than in deep belief networks

Recurrent neural network
Description
• It can analyze streaming data. Useful for applications where the output depends on previous inputs
• Each hidden layer has its own weights and biases
Advantage
• It can store sequential events in the form of activations if a feedback connection is present
Disadvantage
• Training can be difficult if tanh or ReLU activation functions are used

Convolutional neural network
Description
• It consists of one or more convolutional layers followed by one or more fully connected layers
• Cross-entropy and squared-error losses are popular functions for calculating the error
Advantage
• It can take images as input, which is very helpful for medical applications
Disadvantage
• A large labeled dataset is required

Convolutional Neural Network

The Convolutional Neural Network (CNN) is one of the most popular deep learning
methods and is inspired by the human visual cortex. It is a feed-forward network
made up of many interleaved layers that apply convolutional filters. As the input data
pass through the layers, progressively higher-level features are extracted in each
layer. This technique is highly helpful in medical imaging. For example, tumors can be
classified from the irregularities in tissue morphology. A CNN can be applied to read
patterns that are difficult for human experts to detect. For example, early stages of
many diseases can be detected from tissue samples.
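
As a minimal sketch of what a single convolutional filter does, the following plain-Python example slides a small kernel over a toy image and applies a ReLU. The 4x4 image and the hand-set edge filter are hypothetical; in a real CNN the filter values are learned during training, and many filters are stacked in several layers.

```python
def conv2d(image, kernel):
    """Valid (no padding) 2D cross-correlation, the operation used in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def relu(feature_map):
    """Keep positive responses and set negative ones to zero."""
    return [[max(0, v) for v in row] for row in feature_map]


# Toy 4x4 "image": dark left half, bright right half.
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]
# Hypothetical hand-set filter that responds to a dark-to-bright vertical edge.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]
feature_map = relu(conv2d(image, vertical_edge))
for row in feature_map:
    print(row)
```

The feature map is non-zero only in the column where the intensity changes, which is the sense in which a convolutional layer turns raw pixel values into a higher-level feature such as an edge.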

Recurrent Neural Networks

Recurrent Neural Networks (RNNs) are another useful technique for healthcare because
they support streaming data, which can then be analyzed further. Fixed-size input
vectors are used here as well, and data such as speech, text or DNA sequences can be
provided as input where the output depends on previous inputs. In the RNN architecture,
perceptrons are connected to themselves, which acts as a memory for consecutive
inputs. In a healthcare scenario, an RNN can be applied to the analysis of medical text
such as anamnesis. For instance, a pool of patients may have the same disease with
different symptoms. An RNN can scan a set of text files to find the similarities; this
can help a physician diagnose an illness.
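
The memory effect of the self-connection can be shown with a minimal sketch of a single recurrent unit. The weight values and input sequences are hypothetical, and a practical RNN would use vectors of units and learned weights rather than the scalars used here.

```python
import math


def rnn_step(x_t, h_prev, w_in, w_rec, bias):
    """One recurrent step: the new state mixes the current input with the old state."""
    return math.tanh(w_in * x_t + w_rec * h_prev + bias)


def run_rnn(sequence, w_in=0.8, w_rec=0.5, bias=0.0):
    h = 0.0  # initial hidden state (empty memory)
    states = []
    for x_t in sequence:
        h = rnn_step(x_t, h, w_in, w_rec, bias)
        states.append(h)
    return states


# Both sequences end with the same inputs, but only the first one starts with a pulse.
with_pulse = run_rnn([1.0, 0.0, 0.0])
without_pulse = run_rnn([0.0, 0.0, 0.0])
print(with_pulse)
print(without_pulse)
```

Although the last two inputs are identical in both sequences, the final states differ, because the state of the first sequence still carries a decaying trace of the earlier pulse.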

Deep Autoencoders

Recent studies show that there is no universal set of features that works accurately
on various datasets. Feature extraction using a data-driven learning method is more
accurate, which motivates the autoencoder neural network. In this case, the numbers of
inputs and outputs are the same, so that the network learns to recreate the input vector
instead of assigning a class label. This is an unsupervised technique. Typically, the
hidden layer is smaller than the input/output layers. To extract the relevant features,
the autoencoder encodes the data in a lower-dimensional space, but if the input data has
high dimensionality then a single hidden layer is not sufficient.
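
The idea of recreating the input through a smaller hidden layer can be sketched with a deliberately tiny linear autoencoder. This is only an illustration under stated assumptions: three inputs, a single hidden unit, no activation function, squared reconstruction error and per-sample gradient descent. The data points, which all lie on one line in 3D, and the hyperparameters are hypothetical.

```python
def encode(x, w_enc):
    """Compress an n-dimensional input to a single hidden feature."""
    return sum(w * xi for w, xi in zip(w_enc, x))


def decode(h, w_dec):
    """Expand the hidden feature back to n dimensions."""
    return [w * h for w in w_dec]


def train_autoencoder(data, lr=0.1, epochs=200):
    n = len(data[0])
    w_enc = [0.1] * n
    w_dec = [0.1] * n
    losses = []
    for _ in range(epochs):
        total = 0.0
        for x in data:
            h = encode(x, w_enc)
            x_hat = decode(h, w_dec)
            err = [xh - xi for xh, xi in zip(x_hat, x)]
            total += 0.5 * sum(e * e for e in err)  # reconstruction error
            # Gradient of the error with respect to the hidden feature.
            grad_h = sum(e * w for e, w in zip(err, w_dec))
            for i in range(n):
                w_dec[i] -= lr * err[i] * h
                w_enc[i] -= lr * grad_h * x[i]
        losses.append(total / len(data))
    return w_enc, w_dec, losses


# Unlabeled toy data: 3D points that all lie along one direction.
direction = [1 / 3, 2 / 3, 2 / 3]
data = [[t * d for d in direction] for t in (-1.0, -0.5, 0.5, 1.0)]
w_enc, w_dec, losses = train_autoencoder(data)
print(f"reconstruction error, first epoch: {losses[0]:.4f}; last epoch: {losses[-1]:.6f}")
```

No class label appears anywhere in the training loop; the target of each sample is the sample itself, and the single hidden value is the learned low-dimensional feature.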

Deep Boltzmann Machine

It is an unsupervised learning technique that consists of multiple hidden layers, where
the connections between the different layers are undirected. If the odd layers are placed
on one side and the even layers on the other, the network can be treated as a bipartite
graph. No intralayer connections exist in a Deep Boltzmann Machine (DBM); only the units
of neighboring layers are connected. Markov chains are used to estimate the gradient
of the likelihood function, but in practice this is slow.

Restricted Boltzmann Machine

One of the popular variants of the Boltzmann Machine is the Restricted Boltzmann Machine
(RBM), which is stochastic in nature. The network is modeled with stochastic units that
follow a specific distribution function. The learning process includes steps called
Gibbs sampling, and the weights are adjusted so that the reconstruction error
is minimized. Connections are undirected in an RBM and, as a result, values can be
propagated in both directions. A common method to train an RBM is the
Contrastive Divergence (CD) algorithm, which is an unsupervised learning
technique. The CD algorithm has two phases, referred to as the positive and the negative
phase. In the positive phase, the network configuration is changed so that the training
set is replicated; in the negative phase, data is recreated based on the current network
configuration.
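
A single step of Contrastive Divergence with one Gibbs sampling step (commonly called CD-1) can be sketched as follows. This is a minimal plain-Python illustration with binary units, not a tuned implementation; the layer sizes, learning rate and the two training vectors are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sample(probs, rng):
    """Draw binary states from per-unit activation probabilities."""
    return [1 if rng.random() < p else 0 for p in probs]


def hidden_probs(v, w, b_hid):
    return [sigmoid(b_hid[j] + sum(v[i] * w[i][j] for i in range(len(v))))
            for j in range(len(b_hid))]


def visible_probs(h, w, b_vis):
    return [sigmoid(b_vis[i] + sum(h[j] * w[i][j] for j in range(len(h))))
            for i in range(len(b_vis))]


def cd1_update(v0, w, b_vis, b_hid, lr, rng):
    """One CD-1 step for a single binary training vector (updates in place)."""
    # Positive phase: hidden activations driven by the training data.
    ph0 = hidden_probs(v0, w, b_hid)
    h0 = sample(ph0, rng)
    # Negative phase: one Gibbs step recreates the data from the current network.
    pv1 = visible_probs(h0, w, b_vis)
    v1 = sample(pv1, rng)
    ph1 = hidden_probs(v1, w, b_hid)
    for i in range(len(v0)):
        for j in range(len(ph0)):
            w[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
        b_vis[i] += lr * (v0[i] - v1[i])
    for j in range(len(ph0)):
        b_hid[j] += lr * (ph0[j] - ph1[j])
    # Squared reconstruction error for monitoring.
    return sum((a - b) ** 2 for a, b in zip(v0, pv1))


rng = random.Random(0)
n_vis, n_hid = 4, 2
w = [[rng.uniform(-0.1, 0.1) for _ in range(n_hid)] for _ in range(n_vis)]
b_vis, b_hid = [0.0] * n_vis, [0.0] * n_hid
training_set = [[1, 1, 0, 0], [0, 0, 1, 1]]
errors = []
for epoch in range(200):
    errors.append(sum(cd1_update(v, w, b_vis, b_hid, 0.1, rng) for v in training_set))
print(f"reconstruction error, first epoch: {errors[0]:.3f}; last epoch: {errors[-1]:.3f}")
```

Because the same weight matrix is used to go from visible to hidden units and back, the sketch also shows the bidirectional propagation that follows from the undirected connections.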

Deep Belief Network

A Deep Belief Network (DBN) can be treated as a composition of Restricted Boltzmann
Machines (RBMs). In a DBN, the hidden layer of every sub-network is connected to the
visible layer of the next RBM. The connections between the top two layers are undirected,
and the connections between the lower layers are directed. A greedy layer-by-layer
learning technique is used to initialize the DBN, and the network is then gradually
modified to achieve the target outputs.
Some popular Deep Learning techniques are summarized in Table 3.
Table 4 presents popular software packages in which Neural Networks can be implemented.

Table 4 Software tools for implementation of neural networks


| Software | Developer | Platform | CNN | RNN | DBN | RBM | Cloud support |
|---|---|---|---|---|---|---|---|
| Neural Designer [31] | Artelnics | Microsoft Windows, Linux | ✓ | ✓ | ✓ | × | ✓ |
| Keras [32] | François Chollet | Microsoft Windows, Linux | ✓ | ✓ | ✓ | × | × |
| Apache SINGA [33] | Apache Software Foundation | Linux, macOS, Windows | ✓ | ✓ | × | ✓ | × |
| Deeplearning4j [34] | Adam Gibson, Josh Patterson | Linux, macOS, Windows, Android | ✓ | ✓ | ✓ | ✓ | × |
| Microsoft Cognitive Toolkit [35] | Microsoft Research | Windows, Linux | ✓ | ✓ | × | × | × |
| Apache MXNet [36] | Apache Software Foundation | Windows, macOS, Linux | ✓ | ✓ | × | ✓ | ✓ |
| OpenNN [37] | Artelnics | Microsoft Windows, Linux | ✓ | ✓ | × | × | × |
| PyTorch [38] | Adam Paszke | Linux, macOS, Windows | ✓ | ✓ | ✓ | ✓ | × |
| TensorFlow [39] | Google Brain | Linux, macOS, Windows, Android | ✓ | ✓ | ✓ | ✓ | × |
| Theano [40] | Montreal Institute for Learning Algorithms (MILA) | Linux, macOS, Windows | ✓ | ✓ | × | ✓ | × |

The CNN, RNN, DBN and RBM columns indicate the supported techniques.

3.5 Applications and Challenges of Deep Learners

Machine Learning (ML) has many successful applications in the area of health
informatics, whereas Deep Learning (DL) techniques are more recent and their adoption
is slower. However, DL is progressing rapidly, and the results can be promising in spite
of the challenges. We can divide the medical applications of DL into three categories.

• Predictive healthcare, e.g., predicting the effectiveness of treatment for various
diseases.
• Medical decision support, e.g., various diseases can be detected and diagnosed using
the physiological information of the patient.
• Personalized treatments, e.g., personalized drugs can be designed according to the
needs of individual patients.

Predictive Healthcare

Applications of this type are designed to detect diseases at an early stage so
that treatment can be started before the patient reaches a critical state. For example,
detecting Alzheimer’s disease at an early stage is generally very difficult. Other areas
of predictive healthcare include predicting the effectiveness of treatments. Deep
learning (DL) can be used to detect anomalies that are difficult for the human eye to
detect, for example in computerized axial tomography (CAT) scans or radiographs. DL can
be very effective in anomaly detection, since it can detect small variations that
may go unnoticed by humans in the early stages. Medical images are easy to
obtain and can be used as training data, which can solve the sparse data problem.
Behavioral data of patients can also be used for the early detection of illness. Using
these different medical data, DL can build a prediction model. Another important
application of predictive healthcare is predicting the efficiency of new drugs. So far
the results are not promising, but new approaches may be developed.

Medical Decision Support

One of the important applications of Deep Learning in health informatics is medical
decision support, which is currently an active trend. Deep Learning techniques
can help doctors at every stage of a medical diagnosis, such as detecting the disease,
proposing personalized treatment and planning post-treatment therapy. In the case of
disease prediction from image analysis, Deep Learning techniques can be more accurate
than humans. Biomedical text analysis can also be done through DL. Owing to its
domain-independent nature, DL can analyze and correlate any kind of data.
Correlation analysis can be performed across different kinds of electronic health records
of patients to provide a better diagnosis. Correlation analysis can also be done within
a single data set; for example, brain regions can be correlated across different
MRI images. For correlation analysis, CNN techniques are widely used. A CNN can
create abstractions of the input data even when the data are collected from heterogeneous
sensors. A medical practitioner may not be able to go through the lengthy medical history
of a patient; DL can do that task and provide medical decision support.

Personalized Treatments

Personalized treatments are closely related to medical decision support. Based on
their predictions, Deep Learning techniques can support decision making, and hence
personalized treatments can be provided and drugs can be designed. Electronic
health records stored in cloud databases are mostly multimodal and unstructured, and
owing to recent advances in technology, DL can offer a diagnosis based on
these data. Personalized treatments can be offered based on various data. For example,
biomarkers can be determined by DNA analysis and genome mining. A biomarker is
a measurable indicator of a biological state (disease). Every disease
develops within the human body itself. Biomarkers can indicate the probability
of this development, which can help medical experts provide better prediction
and diagnosis. Genomics helps to identify the gene allele that is responsible for the
development of an illness. Drug effectiveness can be determined by evaluating the
differences in genes when the drug is applied; this is called pharmacogenomics. This
helps to reduce dosage levels as well as the side effects of the drug. Deep Learning
techniques perform very well in cancer classification from gene expression data.
For example, to predict splicing patterns, features can be extracted efficiently from
ribonucleic acid (RNA) and micro ribonucleic acid (miRNA) data using DL.
So, DL can help us analyze data from EMRs and offer personalized medicines.

Challenges

There are many challenges for Deep Learning in the domain of health data. Given
the nature of medicine, there are requirements of security, availability, reliability
and efficiency. For example, a health sensor must work continuously without any
interruption so that emergencies can be handled. Some recent works show that weight
filters can be used in a CNN to extract high-level features, but the entire learning
module may become non-interpretable. Many researchers use DL techniques
without knowing the likelihood of success; if misclassification problems occur, they
are not able to modify the model. We discussed in the previous sections
that large datasets are required to train an effective and reliable model. Nowadays
an enormous amount of healthcare data is available, but disease-specific data is still
limited. So DL is not well suited to applications involving rare diseases. Another
common issue in the training of a Deep Neural Network (DNN) is overfitting when a small
training dataset is used. This happens when the number of parameters in the network
is proportional to the total number of samples in the training set. Overfitting
can be avoided by using regularization techniques such as dropout during the
training process. A DNN often cannot take raw data directly as input, so some
preprocessing is needed or the input domain needs to be changed. Choosing the
hyperparameters that control the architecture of a DNN, for example the number of filters
in a CNN, is a blind exploration process, and accurate validation is very much required.
Finding an optimal set of hyperparameters and preprocessing the raw data correctly is a
challenging task and can lead to a long training process. Another important issue in
DL is that many DNNs can be fooled easily; if a minor change is made to the input data
(such as adding imperceptible noise to an image), the samples may be misclassified [41].
It can be noted that most machine learning algorithms can be affected by this
issue. If the value of a particular feature is set very high or very low, a
misclassification problem will arise in logistic regression. In decision trees, if a
single binary feature is switched in the final layer, the tree will produce incorrect
results. So, we can say that any machine learning technique is also vulnerable to
security attacks, as a simple alteration can lead the system to produce wrong results.
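
The logistic regression case mentioned above can be made concrete with a minimal sketch. The model weights, the bias and the three features are hypothetical and hand-set for illustration; they are not fitted to any clinical data.

```python
import math


def predict_proba(features, weights, bias):
    """Logistic regression: sigmoid of a weighted sum of the features."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical hand-set model over three features
# (for example heart rate, systolic blood pressure and body temperature).
weights = [0.04, 0.03, -0.02]
bias = -7.0

patient = [80.0, 120.0, 36.0]
tampered = [10000.0, 120.0, 36.0]  # the first feature is set to an extreme value

p_original = predict_proba(patient, weights, bias)
p_tampered = predict_proba(tampered, weights, bias)
print(f"original: {p_original:.3f}, tampered: {p_tampered:.3f}")
```

With these assumed values the original record falls below the 0.5 decision threshold and the tampered record falls above it, so changing a single feature is enough to flip the predicted class.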

3.6 Blockchain Technology

A blockchain is a collection of decentralized CPUs/nodes in which data are stored
in blocks. It is also known as a decentralized ledger in which data blocks are updated
continuously (Fig. 3). Data blocks may contain agreements, contracts, sales, financial
transactions, health data, etc. Blockchain was introduced by Satoshi Nakamoto
in 2008. It was originally developed to secure cryptocurrency (Bitcoin)
transactions, but nowadays this peer-to-peer technology has been adopted by various
sectors such as finance, transportation, education, healthcare and governance.
Cryptographic algorithms make the system tamper-proof; these algorithms make it
computationally infeasible to alter the data/transactions stored in the blockchain. An
intruder needs to compromise 51% of the CPUs/nodes to overcome the hashing power of
the targeted blockchain network.
A block contains a header and a message. The header consists of the following
parameters: (i) Timestamp, which records the exact time of creation of the block; (ii)
Previous block hash, which refers to the hash of the previous block in the chain; (iii)
Merkle root, which contains the hash of the root of the tree built from the SHA-256
hashes of the transactions; (iv) Difficulty target, a target that is difficult to
achieve and is set by the Proof of Work (PoW) algorithm; (v) Nonce, which is required to
achieve the difficulty target and also defends against replay attacks. Each block is
linked to its predecessor through the hash of the previous block. The message, on the
other hand, contains the hash of the previous transaction, the transaction/message to be
sent and a digital signature of the owner. This digital signature is treated as a proof
of ownership of the transaction/message and can be verified with the public key of the
owner. Once a block is approved and added to the blockchain, it becomes immutable and
cannot be tampered with or altered.
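
The header fields listed above can be sketched with the Python standard library. This is a simplified illustration and not the Bitcoin block format: the header is serialized as JSON, the difficulty target is modeled as a required number of leading zero hex digits, digital signatures are omitted, and the transaction strings are hypothetical.

```python
import hashlib
import json
import time


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def merkle_root(transactions):
    """Hash the transactions pairwise, level by level, until one hash remains."""
    level = [sha256_hex(tx.encode()) for tx in transactions]
    if not level:
        return sha256_hex(b"")
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last hash on odd-sized levels
        level = [sha256_hex((level[i] + level[i + 1]).encode())
                 for i in range(0, len(level), 2)]
    return level[0]


def header_hash(header: dict) -> str:
    return sha256_hex(json.dumps(header, sort_keys=True).encode())


def mine_block(prev_hash, transactions, difficulty=2, timestamp=None):
    """Search for a nonce so that the header hash starts with `difficulty` zeros."""
    header = {
        "timestamp": time.time() if timestamp is None else timestamp,
        "previous_hash": prev_hash,
        "merkle_root": merkle_root(transactions),
        "difficulty": difficulty,
        "nonce": 0,
    }
    while not header_hash(header).startswith("0" * difficulty):
        header["nonce"] += 1
    return {"header": header, "transactions": list(transactions)}


def chain_is_valid(chain):
    for i, block in enumerate(chain):
        h = block["header"]
        if h["merkle_root"] != merkle_root(block["transactions"]):
            return False  # transactions were altered after mining
        if not header_hash(h).startswith("0" * h["difficulty"]):
            return False  # proof of work is missing
        if i > 0 and h["previous_hash"] != header_hash(chain[i - 1]["header"]):
            return False  # link to the previous block is broken
    return True


genesis = mine_block("0" * 64, ["genesis"], timestamp=0)
block1 = mine_block(header_hash(genesis["header"]), ["hr=72", "bp=118/76"], timestamp=1)
chain = [genesis, block1]
print(chain_is_valid(chain))
block1["transactions"][0] = "hr=60"  # tamper with a stored transaction
print(chain_is_valid(chain))
```

After the tampering step the recomputed Merkle root no longer matches the value stored in the header, so the validity check fails; repairing the header would change its hash and break the link from every later block.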

Fig. 3 Structure of a block in blockchain [42]



3.7 Types of Blockchain

In general, there are various types of blockchains, which depend on the managed
data, the availability of the data and the actions that a user can perform. We can
categorize them into three types:
• Public Blockchain (permissionless)
• Consortium (public permissioned)
• Private Blockchain.
As the names suggest, blockchains that are accessible
and visible to the public are public blockchains. However, the entire data may not be
accessible to the public, since part of the data can be kept in an encrypted format
to preserve participants’ anonymity [43]. In public blockchains, anyone can join the
blockchain and act as a node or become a miner; hence, no approval is required.
Cryptocurrency networks fall into this category, where a miner gains some economic
incentive. For instance, Bitcoin, Ethereum and Litecoin are cryptocurrency networks
based on public blockchains.
In consortium blockchains, only selected nodes are allowed to participate
in the distributed consensus process [43]. Any kind of industry can use this kind of
blockchain. Sometimes consortium blockchains are developed for a particular industry
(e.g., the healthcare sector) but are open for public use subject to approval.
Private blockchains are decentralized networks [43] in which only permissioned
nodes can join the network. The tasks of the nodes, such as performing transactions,
executing smart contracts or acting as a miner, are controlled in private blockchains.
Typically, a trusted organization manages the blockchain. Platforms like Ripple [44] and
Hyperledger Fabric [45] support only private blockchain networks.

3.8 Challenges of Blockchain in Healthcare

Integrating blockchain into healthcare systems is very challenging. Challenges such as
management and technical problems need to be taken care of. Here we discuss
several of these challenges.

Interoperability

Sharing data among different healthcare providers quickly and effectively is a
challenging task. Non-collaboration and lack of coordination become a
barrier to effective data sharing [27]. Patients and other healthcare stakeholders
may face problems in the data sharing and retrieval process.

Management, Anonymity, and Privacy of Data

Managing large health records and sharing them among healthcare providers is
not an easy task when integrating blockchain. Since health data is sensitive in nature,
it should be shared only with trusted parties. It must be ensured that no unauthorized
entity gets access to the data. National regulations and data privacy requirements must
be adhered to when adopting blockchain in healthcare.

Quality of Service (QoS)

One of the big concerns in adopting blockchain in healthcare is delivery time: data
must be delivered within the required time. Patients’ lives can be in danger if the
required data is not delivered on time. Since the blockchain architecture is complex in
nature, incorporating blockchain may create computational delays. A lot of research
needs to be done to reduce delay and maintain QoS in terms of reliability before
incorporating blockchain in healthcare.

Heterogeneous Devices and Traffic

Biological sensors are important parts of healthcare, and these sensor devices
generate various kinds of traffic. In general, data traffic is classified into two
categories: emergency traffic and normal traffic. Traffic generated from the data
gathered from patients in an emergency situation is emergency traffic, and the data
gathered by sensors during regular monitoring are known as normal traffic. So, when
implementing blockchain in healthcare, a priority mechanism is very much required so that
emergency traffic experiences less delay than regular traffic.
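
One possible form of such a priority mechanism is a priority queue at the forwarding device, sketched below with the standard `heapq` module. The class name, the two priority levels and the payload strings are hypothetical; the chapter does not prescribe a particular scheduling scheme.

```python
import heapq
import itertools

EMERGENCY, NORMAL = 0, 1  # a lower value is served first


class TrafficQueue:
    """Serve emergency readings before normal ones, first-in first-out within a class."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()  # tie-breaker that preserves arrival order

    def push(self, priority, payload):
        heapq.heappush(self._heap, (priority, next(self._arrival), payload))

    def pop(self):
        _priority, _order, payload = heapq.heappop(self._heap)
        return payload

    def __len__(self):
        return len(self._heap)


queue = TrafficQueue()
queue.push(NORMAL, "routine pulse reading 1")
queue.push(EMERGENCY, "ECG alarm")
queue.push(NORMAL, "routine pulse reading 2")
queue.push(EMERGENCY, "blood pressure alarm")

served = [queue.pop() for _ in range(len(queue))]
print(served)
```

The arrival counter ensures that two packets of the same class are forwarded in the order in which they arrived, so normal traffic is delayed but never reordered.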

Latency

Latency is an important parameter in healthcare. Some healthcare applications rely
on real-time monitoring to make the diagnosis process faster. In a blockchain,
blocks are verified before they are added and shared with the different stakeholders;
this process delays data access and analysis. So, this delay must be considered when
designing a blockchain-based healthcare system.

Resource Constraints and Energy efficiency

Since blockchains add computational complexity, cryptographic approaches can
be a burden for the sensors [46]. Biological sensors are resource-constrained devices
in terms of computational power, battery backup, etc., so this high computational load
may cause the temperature of the sensors to rise, which creates discomfort for the
patient. Energy efficiency is another challenge, since the sensors are battery-powered.

Storage Capacity and Scalability

Since the data generated by health sensors is huge in volume, the nodes of the blockchain
should be capable of storing this data. Health data may consist of medical images,
laboratory reports and drug history records, all of which require a large amount of
storage space. This issue could be solved by using cloud storage platforms.

Security
Another important issue in incorporating blockchain in healthcare is the reliability of
the data gathered. Although blockchain is popular because data stored in the blockchain
is immutable, data coming from the sensors may already be corrupted, in which case
the stored data will remain corrupt. Data received by the blockchain nodes may also have
been altered by security attacks such as fake data injection,
eavesdropping, etc. So, an effective security mechanism must be put in place to ensure
the integrity of the data.
Data Mining
Blockchain is based on the validation of data blocks; each piece of data that comes from
the sensors is considered a block of data, and data sent from the sensors needs to be
validated each time before it is added to the chain. So, a problem arises when the number
of patients increases; in that case, mining takes more time because
the computational load increases. So, efficient mining is also a very challenging
issue when integrating blockchain in healthcare.

4 System Model

A typical IoT-based healthcare system consists of three layers. The first layer consists
of different health sensors, such as ECG, pulse and blood pressure sensors. Usually, these
sensors are placed on the body of a patient. They are responsible for sensing different
physiological parameters from the patient’s body, and this information is then sent to
the PDA device. In the second layer, the PDA device forwards the data to the medical
server through an internet connection, and in the third layer the doctor/medical facility
gets access to the data. However, data in transit is vulnerable to various cyber-attacks.
So, we need to adopt security measures that provide confidentiality, authentication,
integrity and access control. These four properties are well-known security requirements
for healthcare applications [47].
Attack Model
Traditional IoT-based applications mainly face two types of attacks: attacks against
confidentiality and attacks against integrity. Confidentiality means non-disclosure of
the private information of a patient, which is prone to different threats. Some common
security attacks on confidentiality are eavesdropping, impersonation attacks,
side-channel attacks, packet sniffing, etc. Therefore, it is very important to handle
security attacks against confidentiality. Integrity ensures that data remain intact
during communication. Nowadays, IoT-based biological sensors gather physiological
information from a patient, and that information is sent to medical facilities. Since
these data are sent through insecure wired/wireless links, it is easy for an adversary to
physically/remotely capture the forwarding device and manipulate the information
gathered by the sensors. As a result, this may lead to a wrong diagnosis. Some of the
common attacks against integrity are data modification attacks, fake data injection
attacks, replay attacks, etc. In our proposed framework we use blockchain, which
can defend against attacks on integrity owing to the way it works. To handle attacks
against confidentiality, various cryptographic schemes can be used to encrypt
data at the forwarder device, and the data can be decrypted using the secret key of the
medical service providers.
Proposed Architecture
Here, we propose a secure and smart framework to share data with different
medical facilities in an effective manner. The overall blockchain-based architecture
is shown in Fig. 4, where cloud storage is used to store electronic medical records
(EMRs).
In our proposed framework, data gathered through sensors are first sent to a PDA
device. This device generates the hash of the health data using a standard hash
algorithm, and the hash is then forwarded to a private blockchain network
through the internet. In the blockchain, each medical facility, such as a hospital, lab,
insurance company or research lab, acts as a blockchain node. The hash sent from
the PDA device is received by each node, and that data block needs to be validated
and verified by the nodes. Verification is based on the received hash, which is
compared with the hash of the previously received data block. This is possible because
the data block generated by the PDA device also contains the previous block hash.
A majority of the blockchain nodes need to verify the block. Once verified, the block
is added to the chain, and a unique secret key and an identifier (ID number) are
generated. The key and the ID are sent back to the PDA device. The PDA device encrypts
the actual health data using the key, and the encrypted data, the hash of the health data
and the ID are sent to the cloud-based database server. If someone tries to tamper with
the data of one block, the subsequent blocks are also affected. Whenever a medical
facility needs to access health data stored in the cloud, the data is first identified
through the ID and then decrypted using the secret key. Once decryption is done, the
health data becomes available and can be given as input to the various Deep
Learning-based healthcare applications. We discussed earlier the applications of Deep
Learning techniques in healthcare, such as predictive healthcare,
medical decision support and personalized treatments. The main advantage of the
proposed architecture is that data is shared among the different medical facilities in a
secure manner and can be used for various healthcare applications. In our
architecture, the security requirements are maintained. Since a private blockchain is
used, data is stored/accessed by authenticated users only. Cryptographic algorithms help
to maintain the confidentiality of the data. Integrity is maintained by the working
nature of the blockchain, and access control is based on the secret key. Since the secret
key is generated by the blockchain nodes, only they have permission to decrypt the
data.

Fig. 4 Proposed blockchain-based health data-sharing framework
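
The data flow of the proposed framework can be sketched end to end with the Python standard library. This is a rough illustration under several assumptions and not the authors' implementation: the `Node` class, the `submit` function and the in-memory `ledger`, `keys` and `cloud` structures are hypothetical, nodes only check the previous-block hash, and `toy_cipher` is an insecure placeholder for whichever encryption scheme a real deployment would use.

```python
import hashlib
import secrets


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def toy_cipher(data: bytes, key: bytes) -> bytes:
    """XOR with a SHA-256-derived keystream. Placeholder only, NOT secure;
    a real system would use a vetted authenticated cipher."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


class Node:
    """A medical facility acting as a blockchain node (hypothetical model)."""

    def __init__(self):
        self.last_hash = "0" * 64

    def verify(self, block) -> bool:
        # Compare the block's previous hash with the last hash this node accepted.
        return block["previous_hash"] == self.last_hash

    def accept(self, block):
        self.last_hash = block["data_hash"]


def submit(nodes, ledger, keys, data: bytes):
    """PDA side: send the hash and, on majority approval, receive (ID, key)."""
    prev = ledger[-1]["data_hash"] if ledger else "0" * 64
    block = {"data_hash": sha256_hex(data), "previous_hash": prev}
    votes = sum(node.verify(block) for node in nodes)
    if votes * 2 <= len(nodes):
        raise ValueError("block rejected: no majority")
    for node in nodes:
        node.accept(block)
    ledger.append(block)
    record_id = len(ledger)
    keys[record_id] = secrets.token_bytes(32)
    return record_id, keys[record_id]


nodes = [Node() for _ in range(5)]
ledger, keys, cloud = [], {}, {}

# PDA device: hash, submit, encrypt with the returned key, upload to the cloud.
reading = b'{"patient": "P001", "heart_rate": 72}'
record_id, key = submit(nodes, ledger, keys, reading)
cloud[record_id] = {
    "ciphertext": toy_cipher(reading, key),
    "data_hash": sha256_hex(reading),
}

# Medical facility: locate the record by ID, decrypt, check against the ledger.
stored = cloud[record_id]
plaintext = toy_cipher(stored["ciphertext"], keys[record_id])
intact = sha256_hex(plaintext) == ledger[record_id - 1]["data_hash"]
print(plaintext, intact)
```

Only the hash travels to the blockchain nodes, while the encrypted record goes to cloud storage; the hash kept in the ledger lets any facility check that the decrypted record has not been modified.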

5 Open Research Issues

In this paper, we have described the role of blockchain and deep learning in health
informatics. Both of these emerging technologies face challenges that remain open
research areas; proper research can mitigate these issues. As discussed earlier,
conventional cryptographic algorithms such as DES, AES and 3-DES are not a good choice
for a blockchain for healthcare applications, since applying these algorithms
increases the latency of data sharing. So the design of low-complexity
encryption-decryption algorithms is an important research area. Key generation and key
sharing should be done in an efficient way so that they do not increase the complexity of
a blockchain-based health data-sharing platform. Moreover, as the number of stakeholders
may keep increasing in IoT-enabled smart and remote healthcare, communication overhead
and storage overhead need to be taken care of while designing a blockchain-based
secure healthcare system. Health data stored in EMRs are largely unlabeled and contain
missing and noisy values. So researchers should consider reconstructing
data where values are missing, and data filtration is needed to remove the noise. Also,
health data is big data owing to the large sample size and volume of data. Budding
researchers can explore preparing their own databases based on their research context,
besides using existing benchmark databases, considering the demography, geographical
location, disease concerned, etc. of the target subjects, to achieve more realistic
intelligent data processing results. There are many research issues in this field, and
proper exploration is needed to adopt deep learning and blockchain technology in health
informatics in an efficient way.

6 Conclusion

The present era is the era of smart and remote applications in various areas, healthcare
in particular, where multiple stakeholders are involved with big health data that
needs to be acquired, stored, and retrieved in a distributed manner using security
techniques such as blockchain, and processed intelligently by applying deep learning
techniques. Issues and challenges remain in applying deep learning techniques, as
medical health records are not always complete, may be erroneous, and may not be labeled.
Intelligent, Secure Big Health Data Management … 207

Also, as health records are huge in size and multiple parties are involved, executing all
steps of the blockchain method may lead to additional storage overhead, communication
overhead, and latency in processing a submitted request to access data, thus making
IoT-enabled real-time healthcare support unrealistic. This book chapter discusses all
relevant deep learning algorithms and tools, presents basic and fundamental concepts
related to big data, healthcare, security, IoT, etc., illustrates the blockchain-based
architecture, and defines an attack model to give researchers in this domain a complete
view for further exploration.

Acknowledgements This work has been carried out as part of a sanctioned research project of the
Government of West Bengal, Department of Science & Technology and Biotechnology, project
sanction no. 230(Sanc)/ST/P/S&T/6G-14/2018.

References

1. Majumder, S., Aghayi, E., Noferesti, M., Memarzadeh-Tehran, H., Mondal, T., Pang, Z., Deen,
M.J.: Smart homes for elderly healthcare—Recent advances and research challenges. Sensors
17, 2496 (2017)
2. Bahga, A., Madisetti, V.K.: Healthcare data integration and informatics in the cloud. Computer
48(2), 50–57 (2015)
3. Movassaghi, S., Abolhasan, M., Lipman, J., Smith, D., Jamalipour, A.: Wireless body area
networks: a survey. IEEECommun. Surv. Tutor. 1–29 (2013)
4. Zhang, Y., Qiu, M., Tsai, C., Hassan, M.M., Alamri, A.: Health-CPS: healthcare cyber-physical
system assisted by cloud and big data. IEEE Syst. J. 11(1), 88–95 (2017)
5. Nakamoto, S.: Bitcoin: a peer-to-peer electronic cash system (2008)
6. Andreu-Perez, J., Poon, C.C.Y., Merrifield, R.D., Wong, S.T.C., Yang, G.: Big data for health.
IEEE J. Biomed. Health Inf. 19(4), 1193–1208 (2015)
7. Ravi, D., Wong, C., Deligianni, F., Berthelot, M., Andreu-Perez, J., Lo, B., Yang, G.: Deep
learning for health informatics. IEEE J. Biomed. Health Inf. 21(1), 2–41 (2017)
8. Karmakar, K., Saif, S., Biswas, S., Neogy, S.: WBAN security: study and implementation
of a biological key based framework. Inb: 2018 Fifth International Conference on Emerging
Applications of Information Technology (EAIT), pp. 1–6 (2018)
9. Xia, Q., Sifah, E.B., Asamoah, K.O., Gao, J., Du, X., Guizani, M.: MeDShare: trust-less medical
data sharing among cloud service providers via blockchain. IEEE Access 5, 14757–14767
(2017)
10. Hölbl, M., Kompara, M., Kamišalić, A., NemecZlatolas, L.: A systematic review of the use of
blockchain in healthcare. Symmetry 10, 470 (2018)
11. Shen, B., Guo, J., Yang, Y.: MedChain: efficient healthcare data sharing via blockchain. Appl.
Sci. 9, 1207 (2019)
12. Faust, O., Hagiwara, Y., Hong, T.J., Lih, O.S., Rajendra Acharya, U.: Deep learning for health-
care applications based on physiological signals: a review. Comput. Methods Progr. Biomed.
161, 1–13 (2018)
13. Griggs, K.N., Ossipova, O., Kohlios, C.P., et al.: Healthcare blockchain system using smart
contracts for secure automated remote patient monitoring. J. Med. Syst. 42, 130 (2018)
14. Chen, M., Hao, Y., Hwang, K., Wang, L., Wang, L.: Disease prediction by machine learning
over big data from healthcare communities. IEEE Access 5, 8869–8879 (2017)
15. Mozaffari-Kermani, M., Sur-Kolay, S., Raghunathan, A., Jha, N.K.: Systematic poisoning
attacks on and defenses for machine learning in healthcare. IEEE J. Biomed. Health Inf. 19(6),
1893–1905 (2015)
208 S. Saif et al.

16. Sun, W., Zheng, B., Qian, W.: Computer aided lung cancer diagnosis with deep learning algo-
rithms. In: Proceedings of SPIE 9785, Medical Imaging 2016: Computer-Aided Diagnosis,
97850Z, 24 Mar 2016
17. Esteva, A., Kuprel, B., Novoa, R.A., Ko, J., Swetter, S.M., Blau, H.M., Thrun, S.:
Dermatologist-level classification of skin cancer with deep neural networks. Nature 542(7639),
115–118 (2017)
18. Abdel-Zaher, A.M., Eldeib, A.M.: Breast cancer classification using deep belief networks.
Expert Syst. Appl. 46, 139–144 (2016)
19. Fakoor, R., Ladhak, F., Nazi, A., Huber, M.: Using deep learning to enhance cancer diagnosis
and classification. In: Proceedings of the ICML Workshop on the Role of Machine Learning
in Transforming Healthcare, June 2013
20. Ramsundar, B., Kearnes, S., Riley, P., Webster, D., Konerding, D., Pande, V.: Massively mul-
titask networks for drug discovery. arXiv preprint arXiv:1502.02072 (2015)
21. Li, R., Zhang, W., Suk, H., Wang, L.: Deep learning based imaging data completion for
improved brain disease diagnosis. In: Proceedings of MICCAI 2014, pp. 305–312, Sept 2014
22. Mohsen, H., El-Dahshan, E.S.A., El-Horbaty, E.S.M., Salem, A.: Classification using deep
learning neural networks for brain tumors. Fut. Comput. Inf. J. 3(1), 68–71 (2018)
23. Amin, J., Sharif, M., Yasmin, M., Fernandes, S.: Big data analysis for brain tumor detection:
deep convolutional neural networks. Fut. Gener. Comput. Syst. 87, 290–297 (2018)
24. Bar, Y., Diamant, I., Wolf, L., Lieberman, S., Konen, E., Greenspan, H.: Chest pathology
detection using deep learning with non-medical training. In: 2015 IEEE 12th International
Symposium on Biomedical Imaging (ISBI), New York, pp. 294–297 (2015)
25. Ronao, C.A., Cho, S.B.: Human activity recognition with smartphone sensors using deep
learning neural networks. Expert Syst. Appl. 59, 235–244 (2016)
26. Zhang, P., White, J., Schmidt, D.C., Lenz, G., Rosenbloom, S.T.: FHIRChain: applying
blockchain to securely and scalably share clinical data. Comput. Struct. Biotechnol. J. 16,
267–278 (2018)
27. Azaria, A., Ekblaw, A., Vieira, T., Lippman, A.: MedRec: using blockchain for medical data
access and permission management. In: 2016 2nd International Conference on Open and Big
Data (OBD), Vienna, pp. 25–30 (2016)
28. Peterson, K., Deeduvanu, R., Kanjamala, P., Boles, K.: A blockchain based approach to health
information exchange networks. In: Proceedings of NIST Workshop Blockchain Healthcare,
vol. 1, pp. 110 (2016)
29. Patel, V.: A framework for secure and decentralized sharing of medical imaging data via
blockchain consensus. Health Inf. J. 1–14 (2018)
30. Chun-Wei, T., Chin-Feng, L., Ming-Chao, C., Yang, L.T.: Data mining for internet of things:
a survey. IEEE Commun. Surv. Tutor. 16(1), 77–97 (2014)
31. Artelnics: Neural designer (2015). Available online: https://www.neuraldesigner.com
32. Chollet, F.: Keras (2016). Available online: https://keras.io/
33. Apache Software Foundation: Apache Singa (2016). Available online: https://singa.incubator.
apache.org
34. Skymind: Deeplearning4j (2016). Available online: http://deeplearning4j.org
35. Microsoft: Microsoft cognitive toolkit (2016). Available Online: https://github.com/microsoft/
cntk
36. Apache Software Foundation: Apache MXNet (2016). Available Online: https://mxnet.apache.
org/
37. Artelnics: OpenNN (2014). Avaiable Online: http://www.opennn.net
38. Paszke, A, Gross, S., Chintala, S., Chanan, G.: PyTorch (2016). Avaiable Online: https://
pytorch.org
39. Google: Tensorflow (2016). Available Online: https://www.tensorflow.org
40. Universite de Montreal: Theano (2019). Available Online: http://deeplearning.net/software/
theano/
41. Ismail Fawaz, H., Forestier, G., Weber, J., Idoumghar, L., Muller, P.-A.: Adversarial attacks on
deep neural networks for time series classification. In: IEEE International Joint Conference on
Neural Networks (2019)
Intelligent, Secure Big Health Data Management … 209

42. Bahga, A., Madisetti, V.K.: Blockchain platform for industrial internet of things. J. Softw. Eng.
Appl. 09, 533–546 (2016)
43. Zheng, Z., Xie, S., Dai, H., Chen, X., Wang, H.: An overview of blockchain technology:
architecture, consensus, and future trends. In: Proceedings of the 2017 IEEE International
Congress on Big Data (BigData Congress), Boston, MA, USA, 11–14 Dec 2017, pp. 557–564
(2017)
44. Ripple: Ripple—one frictionless experience to send money globally (2018). Available online:
https://ripple.com
45. Androulaki, E., Manevich, Y., Muralidharan, S., Murthy, C., Nguyen, B., Sethi, M., Singh, G.,
Smith, K., Sorniotti, A., Stathakopoulou, C., et al.: Hyperledger fabric: a distributed operating
system for permissioned blockchains. In: Proceedings of the Thirteenth EuroSys Conference,
Porto, Portugal, 23–26 Apr 2018
46. Zhang, J., Xue, N., Huang, X.: A secure system for pervasive social network-based healthcare.
IEEE Access 4, 9239–9250 (2016)
47. Saif, S., Gupta, R., Biswas, S.: Implementation of cloud assisted secure data transmission in
WBAN for healthcare monitoring. In: Proceedings of International Conference on Advanced
Computational and Communication Paradigms (ICACCP 2017), Advances in Intelligent Sys-
tems and Computing, vol. 705, pp. 665–674 (2018)

Sohail Saif is working as a Full Time Ph.D. Research Scholar at Maulana Abul Kalam Azad Uni-
versity of Technology, West Bengal, India. He completed his B.Tech in Computer Science and
Engineering and M.Tech in Software Engineering from Maulana Abul Kalam Azad University of
Technology, WB in 2014 and 2018, respectively. His areas of research interests are internet of
things, network security and remote healthcare.

Suparna Biswas is an Associate Professor in the Department of Computer Science and Engineer-
ing in Maulana Abul Kalam Azad University of Technology, WB. She completed her ME and
Ph.D. from Jadavpur University, West Bengal in 2004 and 2013 respectively. She was an ERASMUS
MUNDUS Post Doctoral Research Fellow in the cLINK project at Northumbria University,
Newcastle, UK during 2014–2015. Her areas of research interests are internet of things, wireless
body area network, machine learning, network security and remote healthcare. She has authored
a number of research papers published in peer reviewed international journals and conferences
of repute. She is currently PI of a WB DST funded major research project on IoT based secure
remote healthcare.

Samiran Chattopadhyay is a professor in the Department of Information Technology, Jadavpur
University. He has served as the head of the department for more than twelve years and as the Joint
Director of the School of Mobile Computing and Communication since its inception. A gradu-
ate, post graduate and gold medalist from Indian Institute of Technology, Kharagpur he received
his Ph.D. Degree from Jadavpur University. He has two decades of experience of serving reputed
Industry houses such as Computer Associates, Interra Systems India, Agilent, Motorola in the
capacity of technical consultant. He led the development of an open-source C++ infrastructure
and tool set for reconfigurable computing, released under the GNU GPL 3.0 license. He has visited
several Universities in the United Kingdom as a visiting professor. He has been working on Algo-
rithms for Security, Bio Informatics, Distributed and Mobile Computing, and Middleware. He has
authored, edited several books and book chapters. Prof. Chattopadhyay acted as a program chair,
organizing chair and IPC member of over 30 international conferences. He has published more
than 180 papers in reputed journals and international peer reviewed conferences.
Malaria Disease Detection Using CNN
Technique with SGD, RMSprop
and ADAM Optimizers

Avinash Kumar, Sobhangi Sarkar and Chittaranjan Pradhan

Abstract Malaria is a life-threatening disease that spreads when an infected female
Anopheles mosquito bites a person. Malaria is one of the predominant diseases
in the world. Many drugs exist that make malaria a curable disease, but
due to inadequate technologies and equipment, it often goes undetected and untreated.
Diagnosing malaria involves counting parasites and red blood cells manually,
which is a labor-intensive and error-prone process, especially
if patients have to be tested several times a day. This issue can be addressed by training
machines to do the work of pathologists. We can train the machine using many
deep learning algorithms. Our model uses CNN-based classification to classify
blood films into infected and normal blood films. The experimental results show that our
model works well on microscopic images and achieves an accuracy of 96.62%.
The model also has a lower model complexity and requires less computation time, thus
outperforming the previously used state of the art.

Keywords Malaria · CNN · SGD · ADAM · RMSprop · Deep learning

1 Introduction

Malaria is among the world's leading causes of death. It
is spread when an infected female Anopheles mosquito bites a person. It is one of the
predominant life-threatening diseases in the world and increases
the mortality rate in countries like India. Different kinds of malaria parasites,
including P. falciparum, P. ovale, P. vivax and P. malariae, can cause disease in
humans, of which P. falciparum is the deadliest. As per the WHO Malaria Report of

A. Kumar (B) · S. Sarkar · C. Pradhan
School of Computer Engineering, KIIT DU, Bhubaneswar, India
e-mail: [email protected]
S. Sarkar
e-mail: [email protected]
C. Pradhan
e-mail: [email protected]

© Springer Nature Switzerland AG 2020
S. Dash et al. (eds.), Deep Learning Techniques for Biomedical and Health Informatics,
Studies in Big Data 68, https://doi.org/10.1007/978-3-030-33966-1_11

2015 [1], roughly 3.3 billion people in different countries are estimated to be at
risk of being affected by malaria. The report also mentions that
around 1.2 billion people are at higher risk. It was estimated that there were around
214 million cases of malaria all over the world in 2015 and about 438,000 deaths
due to malaria. The impact was greatest in Africa, where
approximately 91% [2] of the total deaths due to malaria occurred; about two
thirds of all deaths were of children below 5 years of age. Some signs of malaria include
muscle pain, vomiting, and chills; in some critical cases it leads to coma, which
can result in the person's death.
Many medicines exist that make malaria a curable disease, but due to a
lack of modern equipment and the manual counting of blood cells, the death rate is
increasing rapidly. The standard method used worldwide for the diagnosis of malaria is
light microscopy of blood films. Although frequently used, this method comes with
heavy drawbacks. It depends heavily on the expertise of the pathologist and is
affected by the burden imposed by large-scale analysis, which is common
in malaria-prone areas. The method involves counting parasites and red blood cells
manually, which is a labor-intensive and error-prone process, especially if patients
have to be tested several times a day. However, accurate counts are essential for
diagnosing malaria accurately, and are an important part of testing for drug effectiveness
and drug resistance and of estimating disease severity. This issue can be addressed by
training machines to do the work of pathologists. We can train the machine using many
deep learning algorithms [3–5].
Deep learning, which is a fast-growing area, has been performing exceptionally
well in the medical field in recent years. In our model, we use a deep learning model
popularly known as a Convolutional Neural Network (CNN) [4].
The main feature of a CNN model is that it can automatically detect the important
features without any human supervision: the learning layers are trained as the
model is fitted to the input. A CNN model also lends itself to visualization,
which helps us in understanding the relations it has learned. CNNs are
computationally more efficient than many other models. Another advantage of CNNs is that
they are easy to train and have fewer parameters than fully connected networks
with the same number of hidden units [6–8].

2 Background

CNNs are mainly used to categorize images, group images by their similarity, and
carry out recognition tasks such as image or object recognition. The
application of CNNs is not limited to any one field. Among other applications,
they can detect anomalies in medical images, generate character text, and
automate many devices [9].

2.1 Convolutional Neural Network (CNN)

Nowadays we can see applications of CNNs everywhere. The CNN is one of the most
sought-after deep learning architectures. The popularity and effectiveness of ConvNets
increased the interest in Deep Learning. Since AlexNet in 2012, the interest in CNNs
has increased rapidly and has kept growing to date.
CNNs are a strong choice for image-related problems. When it comes to
an image-related problem statement, the CNN is usually the go-to model because of its
accuracy. CNNs can also be applied to other tasks, such as recommendation systems,
natural language processing and many more. The main advantage a CNN has over
other algorithms is that it automatically detects the features that are essential for
classification, without these features being specified by hand. For example, given pictures of
two different objects, it automatically detects the features that differentiate the two
classes. A CNN model follows the architecture shown in Fig. 1.
First, the input image on which the operations will be performed is taken. Convolution
and pooling are performed on the input image, followed by a number of fully
connected layers. For multiclass classification, the output is obtained through
a softmax layer.
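
The softmax output mentioned above turns the raw scores of the last layer into class probabilities. A minimal sketch in plain Python follows; the three scores are made-up values and the function is only an illustration, not code from the chapter.

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])   # hypothetical scores for three classes
print(probs)
print(probs.index(max(probs)))     # index of the predicted class
```

The class with the largest score always receives the largest probability, so the prediction is simply the index of the maximum.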

2.1.1 Layers of CNN

Convolution

The fundamental layer of a CNN is the convolution layer. Convolution is a mathematical
operation that merges two sets of information. In Fig. 2, the input image
is on the left side and the convolution filter is on the right side [11]. The convolution
filter is also called the kernel. Figure 2 shows the convolution operation.

Fig. 1 Neural network with many convolutional layers [10]

Fig. 2 The convolution operation [10]

Convolution is performed by sliding the filter window over the input. At each position,
the elements of the filter and of the underlying image patch are multiplied element by
element and the results are added. This sum is written to the feature map. The receptive
field is the region of the input where this operation takes place. The size of the
receptive field is the same as that of the filter. Figure 3 shows the convolutional layer.
For image-related problems, the convolution is performed in 3D. An image
has three dimensions, namely height, width and depth. The colour of an image, i.e.
its RGB channels, is represented by the depth. In practice,
multiple convolutions are performed on the input, each using a different filter;
the outcomes of these convolutions are then stacked together to form the actual
output of the convolution layer.
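
The sliding-window computation described above can be sketched in a few lines of plain Python. The example assumes a single-channel image, a square filter, a stride of 1 and no padding; the image and filter values are made up for illustration. As in most deep learning libraries, the filter is not flipped, so strictly speaking this is a cross-correlation.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (stride 1, no padding)."""
    k = len(kernel)                      # square kernel assumed
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # multiply the receptive field with the kernel element-wise, then sum
            total = sum(
                image[i + a][j + b] * kernel[a][b]
                for a in range(k)
                for b in range(k)
            )
            row.append(total)
        feature_map.append(row)
    return feature_map

image = [
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
]
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]
print(convolve2d(image, kernel))  # [[4, 3, 4], [2, 4, 3], [2, 3, 4]]
```

A 5 × 5 input and a 3 × 3 filter give a 3 × 3 feature map, because the filter fits into the image at only three positions along each axis.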

Fig. 3 The convolutional layer [10]


Fig. 4 Relu activation function [10]

Non Linearity

Neural networks such as ANNs and autoencoders are powerful because of their non-linearity.
There, the weighted sum of the inputs is passed through an activation function to obtain
the output. A CNN uses a similar technique: the output obtained from the
convolution layer is passed through the ReLU activation function. This means that the
values in the feature map are not just the sums of the element-wise
products but have ReLU applied to them. ReLU is applied after every convolution
in the network, because without a non-linearity
the network would not be powerful [11].
Equation 1 defines the ReLU activation function mathematically.

y = max(0, x) (1)

Figure 4 shows the ReLU curve.


We can add a ReLU layer to our model using Keras through the following code:
from keras.models import Sequential
from keras.layers import Dense
model = Sequential()
model.add(Dense(64, activation='relu'))
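
To make Eq. 1 concrete, the following plain-Python sketch applies ReLU to every entry of a small feature map. The values are made up for illustration.

```python
def relu(x):
    """Rectified linear unit: y = max(0, x)."""
    return max(0.0, x)

feature_map = [[-2.0, 0.5], [3.0, -0.1]]   # made-up convolution outputs
activated = [[relu(v) for v in row] for row in feature_map]
print(activated)  # [[0.0, 0.5], [3.0, 0.0]]
```

Negative entries are set to zero and positive entries pass through unchanged.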

Stride and Padding

Stride is the number of positions by which the convolution filter is moved at each step
of the convolution. The default value of the stride is 1. The bigger the stride,
the smaller the feature map.
The feature map is smaller than the input image because the convolution filter must be
contained entirely within the image, and increasing the stride reduces its size further.
To keep the dimensions of the feature map the same as those of the image,
we add padding around the image [11].
The padding can consist of zeros or of the values on the
edges of the input image. With padding, we can obtain a feature map of the same
size as the image. Padding is therefore used in a CNN to maintain the size of the
feature map, which would otherwise shrink with each convolution layer.
Figure 5 illustrates how full padding and same padding are applied in a CNN.
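
The effect of stride and padding on the feature-map size can be checked with the standard relation out = floor((n + 2p − f) / s) + 1, where n is the input size, f the filter size, p the padding on each side and s the stride. This formula is not stated in the chapter; it is added here as a sketch, and the function name is hypothetical.

```python
def conv_output_size(n, f, stride=1, padding=0):
    """Feature-map width (or height) for input size n and filter size f."""
    return (n + 2 * padding - f) // stride + 1

print(conv_output_size(5, 3))              # 3: no padding, the map shrinks
print(conv_output_size(5, 3, padding=1))   # 5: padding keeps the input size
print(conv_output_size(5, 3, stride=2))    # 2: a larger stride shrinks it further
```

For a 3 × 3 filter, one pixel of padding on each side is enough to keep the feature map the same size as the input when the stride is 1.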
Pooling
We perform pooling after convolution to reduce the size of the feature maps. It also
reduces the number of parameters, which in turn reduces the training time. Pooling
downsamples the feature maps by reducing their height and width while keeping the depth
(the number of channels) unchanged.
Max pooling is the most commonly used pooling technique. It slides a window over its
input and takes the maximum value in each window. Pooling has no trainable parameters.
As with convolution, the window size and the stride are specified [11].
Figure 6 shows max pooling, in which a window slides over the input, as in a normal
convolution, and the biggest value in each window is taken as the output.
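
A minimal plain-Python sketch of max pooling on a single-channel feature map follows. The 2 × 2 window, the stride of 2 and the input values are assumptions chosen for illustration.

```python
def max_pool2d(feature_map, size=2, stride=2):
    """Take the maximum of each size x size window."""
    pooled = []
    for i in range(0, len(feature_map) - size + 1, stride):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, stride):
            window = [
                feature_map[i + a][j + b]
                for a in range(size)
                for b in range(size)
            ]
            row.append(max(window))
        pooled.append(row)
    return pooled

fm = [
    [1, 1, 2, 4],
    [5, 6, 7, 8],
    [3, 2, 1, 0],
    [1, 2, 3, 4],
]
print(max_pool2d(fm))  # [[6, 8], [3, 4]]
```

The 4 × 4 input is reduced to 2 × 2, and no weights are learned in the process.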

Fig. 5 Full padding and same padding [10]

Fig. 6 Max pooling layer [10]


Hyper parameters

If only convolution is considered and pooling is ignored, we have to take
four important hyperparameters into consideration:
• Filter size: a filter size of 3 × 3, 5 × 5 or 7 × 7 is generally used.
• Filter count: this is variable, generally within the range of 32–1024. The
more filters are used, the more powerful the network becomes. This has
a limitation, however: when the number of filters is increased, the risk of overfitting
increases because of the increase in the number of parameters.
• Stride: the stride is usually kept at 1.
• Padding: padding is generally used.

Fully Connected

After performing convolution and pooling, we add fully connected layers
to complete the CNN architecture. The output obtained after convolution and pooling
is a 3D volume, but a fully connected layer expects its
input to be a 1D vector. Therefore we need to flatten the
3D volume output by the last pooling layer into a 1D vector so that it can be the input
to the fully connected layer. Flattening simply rearranges a 3D volume into a 1D
vector. Figure 7 shows the fully connected layer of a convolutional neural network.
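
Flattening can be illustrated with a nested list standing in for the pooled 3D volume. The 2 × 2 × 3 volume below is hypothetical, and the height × width × depth ordering is an assumption.

```python
def flatten(volume):
    """Flatten a height x width x depth nested list into a 1D list."""
    return [value for row in volume for pixel in row for value in pixel]

# Hypothetical 2 x 2 x 3 output of the last pooling layer
volume = [
    [[1, 2, 3], [4, 5, 6]],
    [[7, 8, 9], [10, 11, 12]],
]
vector = flatten(volume)
print(len(vector))  # 12 = 2 * 2 * 3
```

The number of values is unchanged; only the arrangement differs, so the fully connected layer receives one input per value of the volume.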
Training

A CNN is trained in the same way as an ANN, using backpropagation
with gradient descent. The mathematics is more involved because of the
convolution operations.

Fig. 7 Fully connected layer [10]


Fig. 8 Implementation architecture of convolutional neural network [10]

2.1.2 Intuition and Implementation

To implement a CNN architecture we need to implement two stages,
namely feature extraction and feature classification. Feature extraction is performed
by the convolution and pooling layers. In this phase, the important features are
extracted from an unknown input picture. For example, from a given picture of a human being
it extracts the number of legs, the number of hands, the eyes, etc. The convolution layers
train themselves to identify these features by stacking several layers one upon another. For
example, the 1st layer detects the outline of the image, the 2nd layer detects the size,
and the 3rd layer detects the color; combined, and compared across many images, these
yield a particular feature [12].
The architecture used by the model is a combination of four blocks, each consisting of
one convolution layer and one pooling layer, followed by two fully connected layers and
an output layer. The implementation architecture of the convolutional neural network is
shown in Fig. 8.
The basic code for implementing the CNN is given below:
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dropout, Dense

model = Sequential()
# Block 1: the first layer also declares the input shape (150 x 150 RGB images)
model.add(Conv2D(32, (3, 3), activation='relu', padding='same',
                 name='conv_1', input_shape=(150, 150, 3)))
model.add(MaxPooling2D((2, 2), name='maxpool_1'))
# Block 2
model.add(Conv2D(64, (3, 3), activation='relu', padding='same', name='conv_2'))
model.add(MaxPooling2D((2, 2), name='maxpool_2'))
# Block 3
model.add(Conv2D(128, (3, 3), activation='relu', padding='same', name='conv_3'))
model.add(MaxPooling2D((2, 2), name='maxpool_3'))
# Block 4
model.add(Conv2D(128, (3, 3), activation='relu', padding='same', name='conv_4'))
model.add(MaxPooling2D((2, 2), name='maxpool_4'))
# Classifier: flatten, dropout, two dense layers and a sigmoid output
model.add(Flatten())
model.add(Dropout(0.5))
model.add(Dense(512, activation='relu', name='dense_1'))
model.add(Dense(128, activation='relu', name='dense_2'))
model.add(Dense(1, activation='sigmoid', name='output'))

2.2 Stochastic Gradient Descent (SGD)

In SGD, the word stochastic refers to a system or process that involves random
chance. In this method, instead of the whole data set, a few samples are selected randomly
from the data set. SGD computes the gradient of the parameters using only a single or a few
training examples [12]. Equation 2 shows the update for each training example, where w
denotes the parameters, η the learning rate and Q_i(w) the loss on the i-th training example.

\[ w := w - \eta \nabla Q_i(w) \tag{2} \]
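
The update of Eq. 2 can be tried out on a toy problem. The sketch below fits a single weight w so that y = w · x, using one randomly chosen example per step; the data, the squared-error loss Q_i(w) = (w·x_i − y_i)² / 2, the learning rate and the function name are all assumptions made for illustration.

```python
import random

def sgd_fit(samples, eta=0.1, steps=200, seed=0):
    """Fit y = w * x by stochastic gradient descent, one sample per step."""
    rng = random.Random(seed)            # fixed seed for reproducibility
    w = 0.0
    for _ in range(steps):
        x, y = rng.choice(samples)       # pick one training example at random
        grad = (w * x - y) * x           # gradient of Q_i(w) = (w*x - y)^2 / 2
        w = w - eta * grad               # Eq. 2
    return w

samples = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]   # generated from y = 3x
print(sgd_fit(samples))                          # close to 3.0
```

Each step looks at only one example, which keeps the cost per update small at the price of a noisier path to the minimum.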

2.3 RMSprop

The RMSprop optimizer is similar to the gradient descent algorithm with momentum.
The RMSprop optimizer limits the oscillations in the vertical direction. Therefore,
we can increase the learning rate, and the algorithm can take larger steps
in the horizontal direction, converging quickly. The difference between RMSprop
and gradient descent lies in how the gradients are used [13]: RMSprop keeps a
running average of the squared gradients, as shown in Eq. 3,

\[ v(w, t) = \gamma\, v(w, t-1) + (1 - \gamma)\,\bigl(\nabla Q_i(w)\bigr)^2 \tag{3} \]

In Eq. 3, γ is the forgetting factor.

\[ w := w - \frac{\eta}{\sqrt{v(w, t)}}\, \nabla Q_i(w) \tag{4} \]

Equation 4 shows the parameter update.
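
Equations 3 and 4 translate directly into code. The sketch below minimizes the toy function f(w) = (w − 3)², whose gradient is 2(w − 3). The function, the hyperparameter values and the small constant `eps` (added to the denominator to avoid division by zero, and not part of Eq. 4) are assumptions made for illustration.

```python
import math

def rmsprop_minimize(grad_fn, w=0.0, eta=0.01, gamma=0.9, eps=1e-8, steps=2000):
    """Minimize a 1D function with RMSprop, given its gradient."""
    v = 0.0
    for _ in range(steps):
        g = grad_fn(w)
        v = gamma * v + (1 - gamma) * g * g        # Eq. 3: running average of g^2
        w = w - eta * g / (math.sqrt(v) + eps)     # Eq. 4, with eps added
    return w

w_final = rmsprop_minimize(lambda w: 2.0 * (w - 3.0))
print(w_final)   # near the minimum at w = 3
```

Because the gradient is divided by the root of its running mean square, the step length stays on the order of the learning rate whether the gradient is large or small.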

2.4 Adaptive Moment Estimation (ADAM)

We can use ADAM, an optimization algorithm, as a substitute for the classical
stochastic gradient descent procedure to update the network weights based on the training
data. It is considered one of the best optimizers at present. ADAM
is derived from AdaGrad and is a more adaptive approach. It can be seen as a
combination of AdaGrad and momentum [14].

Given the parameters \(w^{(t)}\) and a loss function \(L^{(t)}\), where the index t indicates
the current training iteration, the parameter update in ADAM is given by:

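\[ m_w^{(t+1)} \leftarrow \beta_1 m_w^{(t)} + (1 - \beta_1)\,\nabla_w L^{(t)} \tag{5} \]

\[ v_w^{(t+1)} \leftarrow \beta_2 v_w^{(t)} + (1 - \beta_2)\left(\nabla_w L^{(t)}\right)^2 \tag{6} \]

\[ \hat{m}_w = \frac{m_w^{(t+1)}}{1 - \beta_1^{\,t+1}} \tag{7} \]

\[ \hat{v}_w = \frac{v_w^{(t+1)}}{1 - \beta_2^{\,t+1}} \tag{8} \]

\[ w^{(t+1)} \leftarrow w^{(t)} - \eta\, \frac{\hat{m}_w}{\sqrt{\hat{v}_w} + \epsilon} \tag{9} \]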

In Eqs. 5 and 6, β1 and β2 are the forgetting factors for the gradients and for the second
moments of the gradients, respectively. In Eq. 9, ε is a small scalar used to prevent
division by 0.
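
The ADAM update of Eqs. 5 to 9 can be sketched in plain Python on the toy function f(w) = (w − 3)², whose gradient is 2(w − 3). The function and the hyperparameter values (η = 0.1, β1 = 0.9, β2 = 0.999, ε = 1e-8) are assumptions chosen for illustration; the loop counter starts at 1, which corresponds to the exponent t + 1 in Eqs. 7 and 8.

```python
import math

def adam_minimize(grad_fn, w=0.0, eta=0.1, beta1=0.9, beta2=0.999,
                  eps=1e-8, steps=2000):
    """Minimize a 1D function with ADAM, given its gradient."""
    m, v = 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad_fn(w)
        m = beta1 * m + (1 - beta1) * g                   # Eq. 5: first moment
        v = beta2 * v + (1 - beta2) * g * g               # Eq. 6: second moment
        m_hat = m / (1 - beta1 ** t)                      # Eq. 7: bias correction
        v_hat = v / (1 - beta2 ** t)                      # Eq. 8: bias correction
        w = w - eta * m_hat / (math.sqrt(v_hat) + eps)    # Eq. 9
    return w

w_final = adam_minimize(lambda w: 2.0 * (w - 3.0))
print(w_final)   # near the minimum at w = 3
```

The bias-correction terms matter mostly in the first iterations, when m and v are still close to their initial value of zero.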

3 Automated Diagnosis of Malaria

Deep learning can be instrumental in preventing wrong diagnostic decisions by
classifying cell images. Deep Learning, an area of machine learning,
has performed outstandingly well in fields other than medicine;
its applications have been implemented less in the medical area due
to a lack of expertise in that area and due to privacy concerns.
In the last few years, however, the medical sector has started using deep learning [15].
The Convolutional Neural Network (CNN), a well-known class of artificial neural networks,
has become highly influential in diverse computer vision tasks and
has gained recognition across a variety of domains, including
medical science. A CNN model learns spatial features through
backpropagation, using different building blocks. Figure 9 depicts an example
of a CNN model.
The CNN is a deep learning model specially designed for 2-dimensional data
such as images and video frames. A CNN model lends itself to visualization,
which helps us in understanding the relations it has learned. The main feature of a CNN
model is that it can automatically detect the important features without any human
supervision: the learning layers are trained as the model is fitted to the input. CNNs are
computationally more efficient than many other models. Another
advantage of CNNs is that they are easy to train and have fewer parameters
than fully connected networks with a similar number of hidden
units [17–20].

Fig. 9 Example of CNN Model [16]

Fig. 10 Sample image of dataset

3.1 Image Acquisition

The data used in the development of the system were taken from the official
website of the National Library of Medicine (NLM). The data set contains 27,558 images of
cells, which are divided into infected and uninfected cells. Figure 10 shows
sample images from the data set.

3.2 Data Visualization

Data visualization is the technique in which an array of static and interactive graphics
within a specific context is used to help us understand and interpret a large amount of
data. We randomly plotted parasitized and uninfected cells, labeled
as 1 and 0 respectively, as shown in Fig. 11.

Fig. 11 Labeled image of infected and uninfected cells

3.3 Data Preprocessing

Preprocessing is the process of transforming the raw data before a machine learning or
deep learning algorithm is applied to it. It is an essential stage in machine learning
because the quality of the data, and of the information that can be extracted from it,
directly affects the quality and accuracy of the model.
For example, training a convolutional neural network on raw images tends to give poor
results. Preprocessing also helps to speed up training. In our model, the images are
processed in a Jupyter Notebook. Before feeding an image to the CNN for training, we
normalize it by dividing its pixel values by 255.
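The normalization step can be sketched as follows. This is a minimal illustration, assuming 8-bit RGB pixel values stored in nested lists; the function name and the tiny sample image are hypothetical, and with an array library the same effect is usually obtained by dividing the whole array by 255 at once.

```python
def normalize_image(image):
    """Scale 8-bit pixel intensities from [0, 255] to [0.0, 1.0]."""
    return [[[channel / 255.0 for channel in pixel] for pixel in row] for row in image]


# Hypothetical 1 x 2 RGB image, used only for illustration.
tiny_image = [[[0, 128, 255], [255, 255, 255]]]
normalized = normalize_image(tiny_image)
```

After this step every value lies between 0 and 1, which keeps the inputs of the first convolution layer on a small, comparable scale.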

4 Proposed Model

The Convolutional Neural Network is one of the most effective neural networks to
work with images and make classifications. In our model we have used Keras to
create the CNN model. Figure 12 depicts the basic flow of our model.

Fig. 12 Flow chart of proposed model



Convolution 2D

This creates a convolution kernel. We set the following properties:

• Filters: The first parameter defines the number of output filters, i.e. the depth of
the layer's output. For the successive layers we used 16, 32 and 64.
• Kernel Size: It defines the size of the window that traverses the image. We set it
to 2.
• Input Shape: It defines the size of each input image. The input shape is (50, 50, 3).
It needs to be defined only for the first layer.
• Activation: The activation function is defined in this parameter. We used relu, the
Rectified Linear Unit.
• Padding: It controls how the borders of the input are handled. With "same" padding,
the output feature maps have the same spatial size as the input feature maps, which
matches the output shapes listed in Table 1.

MaxPool 2D
Pool_Size: It defines the size of the window whose pixel values are reduced to a single
value, their maximum. We used a pool_size of 2.
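A small sketch of 2 × 2 max pooling on a single feature map may help. It is written in plain Python for illustration only; the function name and the sample values are hypothetical, and it assumes non-overlapping windows where an incomplete trailing row or column is dropped (which is consistent with the 25 → 12 reduction in Table 1).

```python
def max_pool_2d(feature_map, pool_size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    rows = len(feature_map) // pool_size
    cols = len(feature_map[0]) // pool_size
    return [
        [
            max(
                feature_map[r * pool_size + i][c * pool_size + j]
                for i in range(pool_size)
                for j in range(pool_size)
            )
            for c in range(cols)
        ]
        for r in range(rows)
    ]


feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [3, 6, 1, 1],
]
pooled = max_pool_2d(feature_map)  # each 2 x 2 block is replaced by its maximum
```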

Dropout

It sets a randomly selected fraction of the values to 0 in order to prevent overfitting
in the model. We used only the rate parameter and set it to 0.2.
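The effect of a dropout rate of 0.2 can be sketched as below. This is an assumption-laden illustration of the common "inverted dropout" variant, in which the surviving values are rescaled by 1 / (1 - rate) during training; the function name is hypothetical and the chapter does not state which variant was used.

```python
import random


def dropout(values, rate=0.2, rng=None):
    """Zero each value with probability `rate`; rescale survivors by 1 / (1 - rate)."""
    rng = rng or random.Random()
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else v * keep_scale for v in values]


activations = [1.0] * 1000
dropped = dropout(activations, rate=0.2, rng=random.Random(0))
zero_fraction = dropped.count(0.0) / len(dropped)  # close to the requested rate
```

Dropout is only active during training; at prediction time the values pass through unchanged.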

Flatten
It flattens the complete n-dimensional matrix to a single array.

Dense

It defines a densely connected neural network layer. We set the following parameters:
• Activation: It defines the activation function, which we set to relu.
• Units: The number of neurons in the layer.
Model Training and Result Analysis

Using the fit method, we trained the model with x_train and y_train for 50 epochs,
i.e. 50 passes over the complete training set, with a batch size of 50. We also used a
validation split of 0.1, so the model was trained on 90% of the training data and
validated on the remaining 10%. A summary of our model is shown in Table 1.
We evaluated the model with different optimizers and obtained different accuracies.

Table 1 Summary of model

Layer (type)                      Output shape          Param #
conv2d_1 (Conv2D)                 (None, 50, 50, 16)    208
max_pooling2d_1 (MaxPooling2D)    (None, 25, 25, 16)    0
conv2d_2 (Conv2D)                 (None, 25, 25, 32)    2080
max_pooling2d_2 (MaxPooling2D)    (None, 12, 12, 32)    0
conv2d_3 (Conv2D)                 (None, 12, 12, 64)    8256
max_pooling2d_3 (MaxPooling2D)    (None, 6, 6, 64)      0
dropout_1 (Dropout)               (None, 6, 6, 64)      0
flatten_1 (Flatten)               (None, 2304)          0
dense_1 (Dense)                   (None, 500)           1,152,500
dropout_2 (Dropout)               (None, 500)           0
dense_2 (Dense)                   (None, 2)             1002

Total parameters: 1,164,046
Trainable parameters: 1,164,046
Non-trainable parameters: 0
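The numbers in Table 1 can be cross-checked by hand. The sketch below assumes 2 × 2 kernels with one bias per filter, convolutions that keep the spatial size, and 2 × 2 pooling that rounds odd sizes down; these settings are inferred from the table and the layer descriptions, and the helper names are hypothetical.

```python
def conv2d_params(kernel, in_channels, filters):
    """Weights (kernel x kernel x in_channels per filter) plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters


def dense_params(inputs, units):
    """One weight per input-unit pair plus one bias per unit."""
    return inputs * units + units


height, width, channels = 50, 50, 3
total_params = 0
for filters in (16, 32, 64):
    total_params += conv2d_params(2, channels, filters)
    channels = filters
    height, width = height // 2, width // 2  # 2 x 2 max pooling halves each side

flattened = height * width * channels        # input size of the first dense layer
total_params += dense_params(flattened, 500) + dense_params(500, 2)
```

Almost all of the parameters sit in the first dense layer, which is typical for a small CNN followed by a wide fully connected layer.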

4.1 Malaria Detection Using SGD Optimizer

Here, stochastic refers to a system or process that involves randomness. Instead of
using the whole dataset for each update, SGD selects a few samples at random and
computes the gradient of the parameters using only a single or a few training examples.
When we applied the SGD optimizer in our model, it gave an accuracy of 95.54% on the
test set and 95.33% on the train set.
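The idea of updating from a few random samples at a time can be sketched on a toy problem. The example below fits a single weight w in y = w · x with mini-batch SGD; the data, learning rate and function name are illustrative assumptions and are unrelated to the malaria model.

```python
import random


def sgd_fit(xs, ys, lr=0.05, epochs=200, batch_size=2, seed=0):
    """Fit y = w * x by mini-batch stochastic gradient descent on squared error."""
    rng = random.Random(seed)
    w = 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # a new random order of the samples in every epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of the mean squared error over this mini-batch only.
            grad = sum(2 * (w * xs[i] - ys[i]) * xs[i] for i in batch) / len(batch)
            w -= lr * grad
    return w


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = sgd_fit(xs, ys)  # approaches the true slope of 2
```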
Accuracy and log-loss (the cost function) were recorded during the training of our
model and are plotted in Fig. 13.
The classification report obtained with the SGD optimizer is given in Table 2.

Fig. 13 Graph of log-loss and accuracy while using SGD technique



Table 2 Classification report using SGD

Class        Precision   Recall   F1-score   Support
0            0.98        0.94     0.96       1408
1            0.94        0.98     0.96       1347
Avg/total    0.96        0.96     0.96       2755
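The F1-score column is the harmonic mean of precision and recall. A quick check against the rounded values in Table 2, using a hypothetical helper function:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


f1_class0 = f1_score(0.98, 0.94)
f1_class1 = f1_score(0.94, 0.98)
```

Both values round to 0.96, in agreement with the table.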

4.2 Malaria Detection Using RMSprop Optimizer

The RMSprop optimizer is similar to the gradient descent algorithm with momentum. It
limits the oscillations in the vertical direction. Therefore, we can increase the
learning rate, and the algorithm can take larger steps in the horizontal direction,
converging more quickly. The difference between RMSprop and plain gradient descent lies
in how the update is computed from the gradients: each step is scaled by a running
average of recent squared gradients.
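A single RMSprop update can be sketched as follows. The hyperparameter values and the toy objective f(w) = w² are illustrative assumptions, not the settings used for the malaria model.

```python
import math


def rmsprop_step(param, grad, avg_sq, lr=0.001, rho=0.9, eps=1e-7):
    """One RMSprop update for a scalar parameter."""
    # Running average of squared gradients.
    avg_sq = rho * avg_sq + (1 - rho) * grad * grad
    # The step is divided by the root of that average, which damps large gradients.
    param -= lr * grad / (math.sqrt(avg_sq) + eps)
    return param, avg_sq


# Toy problem: minimise f(w) = w^2, whose gradient is 2w.
w, avg_sq = 5.0, 0.0
for _ in range(2000):
    w, avg_sq = rmsprop_step(w, 2 * w, avg_sq, lr=0.01)
```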
When we applied the RMSprop optimizer in our model, it gave an accuracy of 95.54% on
the test set and 95.32% on the train set. Accuracy and log-loss (the cost function)
were recorded during the training of our model and are plotted in Fig. 14.
The classification report obtained with the RMSprop optimizer is given in Table 3.

Fig. 14 Graph of log-loss and accuracy while using RMSprop technique

Table 3 Classification report using RMSprop

Class        Precision   Recall   F1-score   Support
0            0.96        0.96     0.96       1408
1            0.95        0.95     0.95       1347
Avg/total    0.96        0.96     0.96       2755

Fig. 15 Graph of log-loss and accuracy while using ADAM technique

Table 4 Classification report using ADAM

Class        Precision   Recall   F1-score   Support
0            0.95        0.96     0.95       1408
1            0.96        0.96     0.96       1347
Avg/total    0.96        0.96     0.96       2755

4.3 Malaria Detection Using ADAM Optimizer

ADAM is an optimization algorithm that can be used as a substitute for classical
stochastic gradient descent to update the network weights during training. It is one
of the most widely used optimizers at present. ADAM builds on the adaptive learning
rates of AdaGrad and is the more flexible approach: it combines adaptive per-parameter
step sizes with momentum.
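The combination of momentum and adaptive scaling can be sketched in a single Adam update. The hyperparameters below are the commonly quoted defaults from the Adam literature and the toy objective is f(w) = w²; both are assumptions for illustration, not the authors' configuration.

```python
import math


def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; t is the 1-based step counter."""
    m = beta1 * m + (1 - beta1) * grad           # momentum-like average of gradients
    v = beta2 * v + (1 - beta2) * grad * grad    # average of squared gradients
    m_hat = m / (1 - beta1 ** t)                 # bias correction for the zero start
    v_hat = v / (1 - beta2 ** t)
    param -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# Toy problem: minimise f(w) = w^2, whose gradient is 2w.
w, m, v = 5.0, 0.0, 0.0
for t in range(1, 3001):
    w, m, v = adam_step(w, 2 * w, m, v, t, lr=0.01)
```

Because of the bias correction, the very first step has a size of roughly the learning rate regardless of the gradient's magnitude.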
In our model, the Adam optimizer gave an accuracy of 96.88% on the train set and
96.62% on the test set. Accuracy and log-loss (the cost function) were recorded during
the training of our model and are plotted in Fig. 15.
The classification report obtained with the Adam optimizer is given in Table 4.

5 Comparison of Different Techniques

After evaluating the different optimizers on our dataset, we obtained different
accuracies on the train and test sets, which are plotted in Figs. 16 and 17.
On both sets, the ADAM optimizer worked best with our dataset, giving an accuracy of
96.62% on the test set and 96.88% on the train set.

Fig. 16 Accuracy on test set (bar chart of accuracy in % by optimizer: SGD 95.54, RMSProp 95.54, ADAM 96.62)

Fig. 17 Accuracy on train set (bar chart of accuracy in % by optimizer: SGD 95.33, RMSProp 95.32, ADAM 96.88)

6 Conclusion and Future Work

The purpose of the proposed method is to improve the quality of malaria detection,
helping microscopists to detect malaria easily and accurately so that proper medication
can be started as soon as possible. Future work is directed towards improving the
performance of the algorithm and denoising the blood cell images for better detection
of malaria. Another direction is to implement this model in an application that can
run on any smartphone to detect malaria easily.

References

1. Malaria Microscopy Quality Assurance Manual, version 2. World Health Organization (2016)
2. World Malaria Report. World Health Organization (2016)
3. O’Meara, W.P., Mckenzie, F.E., Magill, A.J., Forney, J.R., Permpanich, B., Lucas, C., Gasser,
R.A., Wongsrichanalai, C.: Sources of variability in determining malaria parasite density by
microscopy. Am. J. Trop. Med. Hyg. 73(3), 593–598 (2005)
4. Rajaraman, S., Antani, S.K., Xue, Z., Candemir, S., Jaeger, S., Thoma, G.R.: Visualizing
abnormalities in chest radiographs through salient network activations in deep learning. In:
Life Sciences Conference, IEEE, Australia, pp. 71–74 (2017)

5. Liang, Z., Powell, A., Ersoy, I., Poostchi, M., Silamut, K., Palaniappan, K., Guo, P., Hossain,
M.A., Sameer, A., Maude, R.J., Huang, J.X., Jaeger, S., Thoma, G.: CNN-based image analysis
for malaria diagnosis. In: International Conference on Bioinformatics and Biomedicine, IEEE,
China, pp. 493–496 (2016)
6. Dong, Y., Jiang, Z., Shen, H., Pan, W.D., Williams, L.A., Reddy, V.V.B., Benjamin, W.H.,
Bryan, A.W.: Evaluations of deep convolutional neural networks for automatic identification
of malaria infected cells. In: International Conference on Biomedical and Health Informatics,
IEEE, USA, pp. 101–104 (2017)
7. Shang, W., Sohn, K., Almeida, D., Lee, H.: Understanding and improving convolutional
neural networks via concatenated rectified linear units. In: International Conference
on Machine Learning, ACM, USA, pp. 2217–2225 (2016)
8. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V.,
Rabinovich, A.: Going deeper with convolutions. In: International Conference on Computer
Vision and Pattern Recognition, IEEE, USA, pp. 1–9 (2015)
9. Bergstra, J., Bengio, Y.: Random search for hyper-parameter optimization. J. Mach. Learn.
Res. 13(1), 281–305 (2012)
10. Saha, S. https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53
11. Majumdar, S.: DenseNet Implementation in Keras. GitHub
12. Shang, F., Zhou, K., Liu, H., Cheng, J., Tsang, I.W., Zhang, L., Tao, D., Jiao, L.: VR-SGD: a
simple stochastic variance reduction method for machine learning. IEEE Trans. Knowl. Data
Eng. (2018)
13. Yazan, E., Talu, M.F.: Comparison of the stochastic gradient descent based optimization
techniques. In: International Artificial Intelligence and Data Processing Symposium.
IEEE, Turkey (2017)
14. Zhang, Z.: Improved Adam optimizer for deep neural networks. In: International Symposium
on Quality of Service. IEEE, Canada (2018)
15. Gopakumar, G.P., Swetha, M., Sai Siva, G., Sai Subrahmanyam, G.R.K.: Convolutional
neural network-based malaria diagnosis from focus stack of blood smear images acquired
using custom-built slide scanner. J. Biophotonics 11(3) (2017)
16. Saha, S.: A comprehensive guide to convolutional neural networks—the ELI5 way. Towards
Data Science
17. Prabhu, R.: Understanding of convolutional neural network (CNN)—deep learning
18. Bibin, D., Nair, M.S., Punitha, P.: Malaria parasite detection from peripheral blood smear
images using deep belief networks. IEEE, pp. 9099–9108 (2017)
19. Das, D.K., Maiti, A.K., Chakraborty, C.: Automated system for characterization and
classification of malaria-infected stages using light microscopic images of thin blood
smears. J. Microsc. 257(3), 238–252 (2015)
20. Kumar, A., Sarkar, S., Pradhan, C.: Recommendation system for crop identification and pest
control technique in agriculture. In: International Conference of Communication and Signal
Processing. IEEE, India, pp. 185–189 (2019)

Avinash Kumar is a final year student at KIIT DU, Bhubaneswar, India. His research
interests include Image Processing, Deep Learning and Machine Learning, and he is
currently working in different research domains.

Sobhangi Sarkar is a final year student at KIIT DU, Bhubaneswar, India. Her research
interests include Deep Learning, Image Processing and Machine Learning, and she is
currently working in different research domains.

Chittaranjan Pradhan is working at the School of Computer Engineering, KIIT DU,
Bhubaneswar, India. His research includes Information Security, Image Processing, Data
Analytics and Multimedia Systems. Dr. Pradhan has published more than 50 articles in
national and international journals and conferences.
Deep Reinforcement Learning Based
Personalized Health Recommendations

Jayraj Mulani, Sachin Heda, Kalpan Tumdi, Jitali Patel,
Hitesh Chhinkaniwala and Jigna Patel

Abstract In this age of informatics, it has become paramount to provide personalized
recommendations in order to mitigate the effects of information overload. The domain
of biomedical and health care informatics is still untapped as far as personalized
recommendations are concerned. Most existing recommender systems have, to some extent,
not been able to address sparsity of data and non-linearity of user-item relationships,
among other issues. Deep reinforcement learning systems can revolutionize
recommendation architectures because of their ability to use non-linear transformations,
representation learning and sequence modelling, and their flexibility of implementation.
In this paper, we present a deep reinforcement learning based approach for complete
health care recommendations, including medicines to take, doctors to consult, nutrition
to acquire and activities to perform, consisting of exercises and preferable sports. We
try to exploit an “Actor-Critic” model to enhance the ability of the model to
continuously update its information seeking strategies based on the user’s real-time
feedback. The health industry usually deals with long-term issues. Traditional
recommender systems fail to consider long-term effects, hence failing to capture the
dynamic sentiments of people. This approach treats the process of recommendation as a
sequential decision process, which addresses the

J. Mulani · S. Heda · K. Tumdi · J. Patel (B) · J. Patel


Department of Computer Science and Engineering, Institute of Technology
Nirma University, Ahmedabad, India
e-mail: [email protected]
J. Mulani
e-mail: [email protected]
S. Heda
e-mail: [email protected]
K. Tumdi
e-mail: [email protected]
J. Patel
e-mail: [email protected]
H. Chhinkaniwala
Adani Institute of Infrastructure Engineering, Ahmedabad, India
e-mail: [email protected]

© Springer Nature Switzerland AG 2020 231


S. Dash et al. (eds.), Deep Learning Techniques for Biomedical and Health Informatics,
Studies in Big Data 68, https://doi.org/10.1007/978-3-030-33966-1_12
232 J. Mulani et al.

aforementioned issues. It is estimated that over 700 million people will possess
wearable devices that will monitor every step they take. Data collected with these
smart devices, combined with other sources such as Electronic Health Records, nutrition
data and data collected from surveys, can be processed using Big Data analysis tools
and fed to recommendation systems to generate desirable recommendations. These data,
after being encoded into an appropriate format (the state), are fed to the Actor
network, which learns a policy for prioritizing a particular recommendation (the
action). The action-state pair is fed to the Critic network, which generates a reward
associated with that pair. This reward is used to update the policy of the Actor
network. The Critic network learns using a pre-defined expected reward. Hence, we find
that using tools for Big Data analytics together with intelligent approaches like deep
reinforcement learning can significantly improve recommendation results for health
care, aiding in creating seamlessly personalized systems.

Keywords Big data · Deep reinforcement learning · Recommendation systems ·
Biomedical and health informatics · Actor critic model · Electronic health records

1 Introduction

Recommender systems have begun to play a key role in industries including
entertainment, retail, education, tourism and many more. However, one of the largest
industries exploiting the potential of effective and accurate recommendation systems
is the healthcare industry. The time has come when people have started to unleash the
potential of the state-of-the-art technologies that we have and can develop to improve
the most important aspect of their lives, i.e. health. The facts and figures shown in
Sect. 2.2 are surprising. Studies by the World Health Organization conclude that there
are over 12,400 diseases, disorders and health related ailments that can potentially
strike us on any given day. On one hand, around half the population of the United
States is not aware of potential health related threats, be it obesity, diabetes,
heart failure, etc.; on the other, quintillions of bytes of data related to health and
fitness alone are generated on a daily basis. Personalized health recommendation
systems are meant to bridge this gap.
In this chapter, we propose a deep reinforcement learning based framework for
generating personalized health recommendations. Sections 2 and 3 aim to provide some
basic, yet important literature concerning the topic at hand. They discuss various
facts and figures and the role of big data and reinforcement learning in
recommendation systems, elaborate on existing recommendation systems and motivate the
need for reinforcement learning based recommendations. They also discuss the various
problems that we are trying to address, which include awareness, data not harnessed to
its potential, some security issues and the low doctor to patient ratio. The next
section deals with some of the limitations of the existing solutions to the said
problems, which include the lack of an all-round recommendation system, system biases
and myopic recommendations.
Deep Reinforcement Learning Based Personalized … 233

Next, we try to use some of the features of deep reinforcement learning, combined with
standard machine learning and data mining algorithms and techniques, along with the
potential of Big Data, to help address the problems by overcoming the said limitations.
Thereafter, the three-layer deep reinforcement learning based framework is discussed
to support our claims. The first layer is the data integration and preprocessing
layer. In this layer, we integrate the data collected from various sources and process
it, using big data [1] and data mining techniques, so that it can be fed to the second
layer. The second layer is the disease probability prediction layer. It consists of 10
legacy machine-learning and deep-learning algorithms that predict the probability of
10 commonly occurring diseases for which some recommendations can be made. Finally,
the third layer is the recommendation generation layer. It consists of an actor-critic
model that helps us make sequential decisions and hence desirable recommendations.
Towards the end, the method to process the outputs of the actor-critic model and put
them to use is described. Lastly, some sample recommendations prescribed by a medical
practitioner are provided for ready perusal.

2 Background

In this era of internet and informatics, the amount of information being consumed and
generated has grown exponentially. Before the advent of this myriad of applications
generating and using information, it was not so difficult to manage information, and
it was fairly possible to deliver the right information to the right person. In the
present scenario, however, it has become extremely difficult to deliver personalized
information to a targeted audience. This is where powerful recommendations come into
the picture. Previous work on recommendation systems primarily includes content-based
and collaborative filtering techniques, deep learning models, factorization machines,
regression models, hybrid mechanisms, etc.

2.1 Recommendation Systems

A content-based recommendation system works on the user’s profile, which contains
information about the user as well as a gist of the types of items liked by the user.
Using this information, a model can be built to identify and recommend other similar
items that the user is likely to prefer. However, this model fails to identify any
other domain that the user might be interested in. To solve this issue, a new approach
called collaborative filtering came into existence. The basic idea behind collaborative
filtering is that “similar users share appreciations”. Collaborative filtering [2]
exploits the fact that people with similar choices tend to prefer similar products.
Moreover, items preferred by peers can be lucrative to recommend to other people in
the same neighborhood. Collaborative filtering can be implemented in two ways: the
first is called user-user collaborative filtering, and the other is called item-item
collaborative filtering.
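User-user collaborative filtering can be illustrated with a very small sketch. The users, activities, ratings and function names below are all hypothetical; cosine similarity is one common choice of similarity measure and is assumed here, not prescribed by the chapter.

```python
import math

# Hypothetical ratings of health activities on a 1-5 scale.
ratings = {
    "alice": {"yoga": 5, "running": 4, "swimming": 1},
    "bob": {"yoga": 4, "running": 5, "cycling": 5},
    "carol": {"swimming": 5, "cycling": 2},
}


def cosine_similarity(a, b):
    """Cosine similarity between two sparse rating dictionaries."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[item] * b[item] for item in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def recommend(user, ratings, top_n=1):
    """Rank items the user has not rated by similarity-weighted ratings of other users."""
    scores, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        if sim <= 0:
            continue
        for item, rating in other_ratings.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * rating
                weights[item] = weights.get(item, 0.0) + sim
    ranked = sorted(scores, key=lambda item: scores[item] / weights[item], reverse=True)
    return ranked[:top_n]


suggestions = recommend("alice", ratings)
```

Item-item filtering follows the same pattern with the roles swapped: similarities are computed between items, based on the users who rated them.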
Furthermore, a recommender system can also be built using a combination of the
aforementioned content-based and collaborative filtering approaches, called a hybrid
recommendation system [3]. A third approach, called property based collaborative
filtering, can be used to solve some of the persistent issues of data sparsity,
over-specification, slow start, etc. The Health Aware REcommendation System, or HARE,
is an ontology-based model that uses levels of appeal as a basis for providing
recommendations.
All of the above-mentioned approaches are built from the perspective of customers, to
help them choose effectively something that they may or may not be looking for.
However, we could not find efficient models to provide recommendations in the
discipline of health and bioinformatics. Moreover, there are some issues with these
methods that need to be fixed. They fail to consider the long-term effects of the
recommendations they make, whereas in the health domain the long-term results far
outweigh the short-term successes. Reinforcement learning based models, working
together with the power of neural networks, stand apart because they can operate in a
dynamic environment and hence overcome the shortcomings of traditional recommendation
systems. Healthcare recommendations have a dire need for sequential decisions rather
than spontaneous decisions [4]. The proposed framework helps resolve such issues and
provides a unique all-around perspective on effective health recommendations.

2.2 Facts and Figures

For machines to learn and perform well, what they need is data; not just any data, but
relevant, complete, formatted and consistent data. According to the International Data
Corporation, it is estimated that 2314 exabytes (1 exabyte = 1 billion gigabytes) of
data relating to the healthcare industry alone will be produced annually by 2020,
growing at a rate of 48% per annum. Given this, and the highly advanced algorithms to
extract valuable information from this data, coupled with compatible sophisticated
hardware, we have an opportunity to give something back to society. Having mentioned
this, the biggest question that arises concerns the sources of this data and their
authenticity.
We were not surprised to learn some of the following facts from the Stanford Medicine
2017 Health Trends Report, titled “Harnessing the Power of Data in Health”. 84% of
patients are ready to share vital statistics like blood pressure or basic lab test
results, and 75% of people are willing to share information about the health of
internal organs. We have been hammered with buzzwords like IoT, Big Data, Machine
Learning, Deep Learning and what not. Statistically, wearable technology is going to
be a $34 billion market, generating quintillions of bytes of research-usable data
every day. The exponentially growing pace of research in the health domain motivates
many researchers to make significant contributions. Apart from wearable devices, a
substantial amount of data is available for public and research use on platforms like
Kaggle, World Health Organization datasets, data.gov, etc. Big Data tools assist
intelligent systems in gathering, managing and processing the data effectively.

2.3 Big Data

A few buzzwords have gained momentum in the past few decades; one of them is Big Data.
Before defining it technically, let us give some reasons for raising this topic.
Forbes has reported that, every minute, approximately 4.15 million YouTube videos are
watched, 456,000 tweets are sent on Twitter, 46,740 photos are posted on Instagram,
and 510,000 comments are posted and 293,000 statuses are updated on Facebook. Forbes
has also reported that, at our current pace, we are creating 2.5 quintillion bytes of
data each day, and this pace is only accelerating. The Internet of Things (IoT) is one
of the major technologies playing a vital role in this acceleration. Just imagine the
volume of data being produced by these activities. This rapid creation of data by
social media, telecom, business applications and various other domains is leading to
the formation of Big Data.
‘Big Data is all about the size and volume of data’ is the biggest myth that people
hold about Big Data. In reality, it is not limited to the huge volume of data being
collected; it is a collection of large volumes of data coming from various sources in
different formats. Data was generated previously as well, but it was in well-defined
formats, which is why relational databases were capable of storing it. Due to the
varied nature of today's data, it is no longer possible to store it in traditional
formats. Big Data comes in three formats: structured, unstructured and
semi-structured.

2.3.1 Characteristics

The following list explains the five V’s of Big Data [5]:
1. Volume: Huge amount of data
2. Variety: Different formats of data from various resources, being integrated
3. Velocity: Pace of generation of data
4. Value: Extraction of useful data
5. Veracity: Inconsistencies and uncertainty in data (Fig. 1).

2.3.2 Big Data Analytics

Apart from storing this huge amount of data, there is another vital problem associated
with it: finding useful information (knowledge) in this data collection. This gives
birth to Big Data Analytics. It is the complex process of examining big data in order
to uncover hidden information, interesting patterns, market trends and customer
preferences, which can help organizations shape their marketing strategies. It is a
process of refining the raw, unstructured data retrieved from various sources into
useful information. Various tools are available for performing this task, such as
Hadoop, Spark, Hive, Pig, etc.

Fig. 1 Big Data Characteristics

Present-day organizations realize that Big Data is ground-breaking; however, they are
beginning to understand that it is most valuable when it is paired with intelligent
automation. With enormous computational power, Machine Learning (ML) and Reinforcement
Learning (RL) frameworks help organizations manage, analyze and use their data far
more effectively than ever before. Machine Learning and Reinforcement Learning are
also used to find hidden information and patterns in huge amounts of data, using
complex algorithms, quickly and accurately.
Their capabilities are impacting almost every field. They have a profound effect on
healthcare by providing personalized treatment plans and improving diagnostics.
Predictive analytics enables doctors and clinicians to concentrate on providing better
service and patient care, creating a proactive system for addressing patient needs
before they become ill.

2.4 Reinforcement Learning

Reinforcement Learning is an area of machine learning in which an agent learns how to
behave in a given environment by taking various actions and observing the rewards
obtained after taking those actions. It essentially maps situations, or states, to the
actions to be taken. Concretely, a learner or agent takes actions, interacts with the
environment and aims to maximize its expected reward by acting optimally. The reward
is the feedback that the agent receives from the environment after taking a particular
action. However, to maximize the total expected reward the agent cannot always act
greedily and maximize the immediate reward; reinforcement learning algorithms try to
maximize the reward in the long run. A policy can be defined as the plan of action of
an agent (Fig. 2).

Fig. 2 Reinforcement learning elements

2.4.1 Markov Decision Process

A Markov Decision Process (MDP) is used to model sequential decision problems
mathematically in reinforcement learning.
The environment in a reinforcement learning problem consists of a set of states $S$, a
set of actions $A$, transition probabilities $p(s_{t+1} \mid s_t, a_t)$, a probability
distribution of initial states $p(s_0)$, a reward function $r: S \times A \to \mathbb{R}$
(where $\mathbb{R}$ is the set of real numbers) and a discount factor $\gamma \in [0, 1]$.
These components are used to formulate the Markov Decision Process, which is defined
as a tuple $(S, A, p, r, \gamma)$. A policy $\pi: S \to A$ maps each state to a
corresponding action. A discounted reward with discount factor $\gamma$ can also be
used; the goal of the agent is then to maximize the expected value of the return $G_t$
given in Eq. 1.

$$G_t = \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1} \tag{1}$$
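For a finite sequence of rewards, the discounted return of Eq. 1 can be computed by working backwards through the sequence. The function name and the reward values below are illustrative only.

```python
def discounted_return(rewards, gamma):
    """Return G_t for rewards R_{t+1}, R_{t+2}, ... under discount factor gamma."""
    g = 0.0
    for reward in reversed(rewards):
        g = reward + gamma * g  # G_t = R_{t+1} + gamma * G_{t+1}
    return g


g0 = discounted_return([1.0, 0.0, 2.0], gamma=0.5)  # 1 + 0.5 * 0 + 0.25 * 2
```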

2.4.2 Q Learning

Q learning uses the action-value function of a policy $\pi$, which denotes how good it
is for an agent to take an action $a$ while in state $s$. Equation 2 gives the Q-value
function.

$$Q^{\pi}(s, a) = \mathbb{E}_{\pi}\left[G_t \mid S_t = s, A_t = a\right] \tag{2}$$

The basic version of Q learning maintains a table of Q values, one for each
state-action pair. The Bellman equation (Eq. 3) is used to learn the optimal Q-value
function over multiple iterations. The optimal Q-value function, from which the
optimal policy is obtained, is denoted $Q^{*}(s, a)$.

$$Q^{*}(s, a) = \mathbb{E}\left[R_{t+1} + \gamma \max_{a'} Q^{*}(s', a')\right] \tag{3}$$

Here $(s', a')$ denotes a possible next state-action pair.
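Tabular Q learning can be sketched on a toy problem. The four-state chain environment below, the purely random exploration and all hyperparameters are assumptions made for illustration; they are not part of the proposed framework.

```python
import random

N_STATES, GOAL = 4, 3
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Deterministic chain: reaching the right-most state gives reward 1 and ends the episode."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL


def q_learning(episodes=500, alpha=0.1, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = rng.choice(ACTIONS)  # random exploration; Q learning is off-policy
            next_state, reward, done = step(state, action)
            # Sample-based Bellman target: r + gamma * max_a' Q(s', a').
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = q_learning()
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(GOAL)]
```

Reading the greedy action from each row of the learned table yields the policy; in this toy chain it is to move right in every state.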

2.4.3 Deep Q Learning

Finding Q values for every state-action pair is not feasible when the actions and
states are continuous and high-dimensional. Moreover, in recommender systems the
number of states is very large, so learning a Q value for each state-action pair
becomes very slow as the state space grows. Therefore, a parameterized value function
$Q(s, a; \theta)$ is required to approximate the Q values. Here, $\theta$ denotes the
parameter vector that defines the Q values. Various function approximators, such as a
linear combination of features, a neural network, nearest neighbor or Fourier/wavelet
bases, can be used.
Deep Q Network (DQN) [6], an algorithm used in Deep Q Learning, uses a neural network
as the value function approximator. The DQN outputs the Q value $Q(s, a)$ for each
action $a$ that can be taken from the given state $s$. The dataset is generated from
tuples of the form $\langle s, a, r, s' \rangle$, where an action $a$ is taken in
state $s$ and the immediate reward $r$ is observed on reaching the new state $s'$.
Experience replay is performed by sampling random tuples from the memory once a
sufficient number of iterations has been completed. DQN uses an ε-greedy policy to
collect experience from various states in the memory. The weights of the neural
network are updated based on the loss function given below.

$$\text{loss} = \mathbb{E}\left[\left(Q(s, a; \theta) - \left(r(s, a) + \gamma \max_{a'} Q(s', a'; \theta^{-})\right)\right)^{2}\right] \tag{4}$$

Here $\theta^{-}$ denotes the previously stored (frozen) parameters and $\theta$ the
newly derived parameters. An improvement of DQN called Dueling DQN estimates the
state-value function $V(s)$ and the advantage function $A(s, a)$ with shared network
parameters [7].
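The supporting pieces of DQN (replay memory, ε-greedy action selection and the target built from the frozen parameters) can be sketched without a neural network. In the sketch below the Q functions are passed in as plain callables; all class and function names, the added `done` flag and the sample transitions are hypothetical.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size memory of (s, a, r, s', done) tuples with uniform random sampling."""

    def __init__(self, capacity, seed=0):
        self.memory = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        return self.rng.sample(list(self.memory), batch_size)


def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random action with probability epsilon, else the highest-valued action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def td_targets(batch, target_q, gamma=0.99):
    """Targets r + gamma * max_a' Q(s', a'; frozen parameters); no bootstrap at episode end."""
    return [r if done else r + gamma * max(target_q(s2)) for (_, _, r, s2, done) in batch]


def squared_loss(batch, online_q, targets):
    """Mean squared difference between the online Q values and the targets."""
    errors = [(online_q(s)[a] - y) ** 2 for (s, a, _, _, _), y in zip(batch, targets)]
    return sum(errors) / len(errors)


def frozen_q(state):
    # Stand-in for the frozen target network: the same two Q values for every state.
    return [0.5, 2.0]


buffer = ReplayBuffer(capacity=100)
buffer.push(0, 1, 1.0, 1, False)
buffer.push(1, 0, 0.0, 2, True)
targets = td_targets(list(buffer.memory), frozen_q, gamma=0.9)
loss = squared_loss(list(buffer.memory), frozen_q, targets)
```

In a full implementation the online callable would be a trainable network, and the frozen copy would be refreshed from it at intervals.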

2.4.4 Policy Gradient

The DQN method tries to learn the state-action value function through the neural network
and then selects actions accordingly. The policy gradient method instead learns the
policy directly as a parameterized function πθ(a|s) [8]. The value of the reward function
depends on this policy, and various algorithms can be applied to maximize the
reward. The reward function for a continuous space can be defined as follows:
 
$$J(\theta) = \sum_{s \in S} d^{\pi}(s) \sum_{a \in A} \pi_{\theta}(a \mid s)\, Q^{\pi}(s, a) \tag{5}$$

Here d^π is the stationary distribution of the Markov chain induced by πθ. The equation
shows that the reward function depends on action selection as well as on the stationary
distribution of states. The policy gradient theorem uses likelihood ratios to compute the
policy gradient as follows.
$$\nabla_{\theta} J(\theta) = \mathbb{E}_{\pi_{\theta}}\left[\nabla_{\theta} \log \pi_{\theta}(a \mid s)\, Q^{\pi_{\theta}}(s, a)\right] \tag{6}$$
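To make Eq. (6) concrete, the sketch below assumes a softmax policy over a small discrete action set in a single state, parameterized by one preference value per action. For this particular policy the gradient of log πθ(a|s) with respect to the preferences is the one-hot vector of the chosen action minus the action probabilities. The softmax parameterization, the step size `lr` and the function names are assumptions made for illustration.

```python
import math


def softmax(prefs):
    """Turn action preferences into probabilities (numerically stable)."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]


def grad_log_policy(prefs, action):
    """Gradient of log pi(action) with respect to each preference."""
    probs = softmax(prefs)
    return [(1.0 if i == action else 0.0) - p for i, p in enumerate(probs)]


def policy_gradient_step(prefs, action, q_value, lr=0.1):
    """One sampled ascent step along grad log pi(a|s) * Q(s, a)."""
    grad = grad_log_policy(prefs, action)
    return [p + lr * q_value * g for p, g in zip(prefs, grad)]


# The sampled action 0 received a positive Q value, so it becomes more likely.
new_prefs = policy_gradient_step([0.0, 0.0], action=0, q_value=1.0)
new_probs = softmax(new_prefs)
```

Averaging such sampled steps over many state-action pairs visited under πθ approximates the expectation in Eq. (6).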

2.4.5 Actor-Critic Model

Policy-based methods and value-based methods (Deep Q Learning) both have certain
drawbacks. The problem with policy-based methods is that it is very hard to find a good
score function to evaluate the policy generated by the algorithm. Similarly, for
value-based methods, the policy is implicit in the value function approximation.
Hence, it is hard to evaluate the behavior of the model.
The Actor-Critic model is a hybrid method that incorporates features of both
policy-based and value-based methods. Two neural networks are used in this model: an
actor network that controls the behavior of the agent (policy based) and a critic
network that evaluates the actions taken by the actor (value based).
Figure 3 shows the architecture of the Actor-Critic model. The actor interacts with the
environment and updates the parameters θ of the actor network, which estimates
the policy. The critic evaluates the actions of the actor and updates the parameters of
the value function approximation based on the reward obtained.

Fig. 3 Actor-critic model
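The interplay of the two components can be sketched with tables in place of neural networks. This is a simplified illustration under stated assumptions: the actor is a softmax over per-state action preferences, the critic is a state-value table V(s), and the critic's one-step temporal-difference error is the signal that scales the actor update. The class name, learning rates and state labels are hypothetical.

```python
import math
from collections import defaultdict


class TabularActorCritic:
    def __init__(self, n_actions, actor_lr=0.1, critic_lr=0.5, gamma=0.9):
        self.n_actions = n_actions
        self.actor_lr = actor_lr
        self.critic_lr = critic_lr
        self.gamma = gamma
        # Actor parameters: one preference per action in each state.
        self.prefs = defaultdict(lambda: [0.0] * n_actions)
        # Critic parameters: estimated value of each state.
        self.values = defaultdict(float)

    def policy(self, state):
        prefs = self.prefs[state]
        m = max(prefs)
        exps = [math.exp(p - m) for p in prefs]
        z = sum(exps)
        return [e / z for e in exps]

    def update(self, state, action, reward, next_state):
        # Critic: evaluate the action through the TD error.
        td_error = (reward + self.gamma * self.values[next_state]
                    - self.values[state])
        self.values[state] += self.critic_lr * td_error
        # Actor: shift probability toward actions with positive TD error.
        probs = self.policy(state)
        for a in range(self.n_actions):
            grad = (1.0 if a == action else 0.0) - probs[a]
            self.prefs[state][a] += self.actor_lr * td_error * grad
        return td_error


agent = TabularActorCritic(n_actions=2)
td = agent.update("s0", action=1, reward=1.0, next_state="s1")
```

After this single rewarded step the actor already assigns action 1 a probability slightly above one half in state "s0".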

3 Problems

The problems that we are trying to address are four-fold:

3.1 Data Utilization

First, despite having so much information about people’s previous health records, and
knowledge of how these records, the environmental conditions and the medication a person
is undergoing can affect his or her present health, we are not able to use it to its
full potential. Apart from this, the data that we have may be highly time critical:
data that is useful now may become obsolete at any point in time. Hence, it is
important to make the right use of the data and generate useful insights from it.

3.2 Health Awareness

The second, and the most important, perspective is that even with the advancements
in technology, most people are not fully aware that they are even suffering
from a disease. Apart from this, even if they undergo tests, they often do not track
the results afterwards, primarily because of medical jargon. Caught up in busy
schedules, many people overlook the health threats right in front of them. Our
approach provides an end-to-end solution to collect the data, interpret it, and make
people more aware of their own health and health issues.

3.3 Doctor to Patient Ratio

A third perspective is that, even with high-end, state-of-the-art medicinal treatment
techniques and technology, doctors cannot attend to so many patients.
We have a doctor to patient ratio of less than 1:1000, making it quite difficult for
doctors to handle such a huge volume of patients in time. If we can develop
smart machines that may substitute for a doctor for diseases that are not high-risk,
people will have access to data-driven intelligent decision systems that recommend
methods to mitigate diseases, or in some cases even prevent them by predicting an
illness that may strike, based on the available data and the history of
similar patients.

3.4 Information Security

Finally, the fourth perspective concerns security. Data collected from
various healthcare-related sources can be used to provide people with better solutions
to their health-related problems. However, the security of the
data should be of prime importance. One must ensure that health-related
data is used for the benefit and betterment of society by providing health-related
suggestions. It should not be misused for the financial benefit of a company. The
framework that we have proposed ensures the security of the data. The health-related
datasets that we have collected are only used for giving recommendations to improve
people’s health. We have tried to avoid recommendations that
involve financial benefits for companies in the health sector, doctors and hospitals.
The sole purpose of the framework is to use health-related data and the knowledge
of various intelligent algorithms of Machine Learning and AI for the betterment of
society.
It is true that every individual is different, and that no two people can have the same
medication even if they are suffering from the same disease. However,
steps other than medication, such as a good diet or a better exercise routine, can also
help conquer the disease. Our objective here is to provide better recommendations on
those aspects that can be generalized and are beneficial to everyone irrespective
of metabolic differences.

4 The Limitations of Existing Solutions

There are many solutions to the aforementioned problems. However, no problem is
completely solved. We found that the following are some of the many
limitations that they have:

4.1 Lack of an All-round Solution

Many existing systems for health-related recommendations lack all-round solutions
and suggestions. Various online chatbot systems (WebMD) are available that
ask the user various questions related to his or her health and suggest medicine to
be taken. However, these systems only take the present scenario into consideration.
The diseases that a person may face in the future cannot be identified by the currently
existing solutions. Moreover, the medical history and family details of a particular
person are also not considered for medicine recommendation and disease identification.
Some diseases are hereditary. Hence, if family history is
not taken into consideration, the disease prediction can be wrong.
Similarly, there are various online systems available that suggest a person the
food to be taken and diet to be followed after getting the information of the person’s
age, sex, weight, height, and other required details. However, these systems lack the
feature of identification of potential diseases.

4.2 System Bias

The existing recommender systems deal with items and the feedback provided by
users for those items. For building a model, these systems only take into consideration
the users’ feedback on items that the system has already recommended.
This problem is called System Bias. In our case, the system
has to recommend content and detailed information based on the given situation.
Moreover, the system need not take only the already recommended
content into account.

4.3 Myopic Recommendation

The recommendation systems are trained to optimize the immediate response. Hence,
they tend to recommend content or items that are catchy in nature or that users
are highly familiar with. These systems avoid exploring new things that could
give higher long-term benefits. Reinforcement Learning, however, takes long-term
rewards into account and also incorporates a facility to control exploration and
exploitation. Hence, Reinforcement Learning based algorithms can be used for solving
the problems with existing recommender systems.

5 Features of RL that Can Help Solve the Problems

5.1 Discounted Future Rewards

Reinforcement Learning based algorithms use discounted future rewards. The
rewards that an agent will receive after a few actions are also considered while
deciding the action for the current state. Hence, long-term benefits can be achieved
with the help of RL based methods [9].
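A short sketch shows how discounting lets a delayed benefit outweigh an immediate one. The reward sequences and the discount factor γ = 0.9 are hypothetical values chosen for illustration.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards, each weighted by gamma raised to its delay."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# A recommendation that pays off immediately versus one that pays off later.
immediate = discounted_return([1.0, 0.0, 0.0])
delayed = discounted_return([0.0, 0.0, 5.0])
```

Here the delayed sequence has the larger return, so an agent that maximizes the discounted return prefers it, whereas a system that optimizes only the immediate response would not.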

5.2 Exploration-Exploitation Control

Reinforcement Learning based algorithms provide the facility to control
exploration (taking a random action) and exploitation (taking the greedy action). The
ε-greedy policy allows us to control the exploration of an agent. Moreover, an exploration
decay parameter is also used while building Deep Reinforcement Learning models
in order to reduce the exploration of the agent after a certain number of iterations or actions.
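The ε-greedy rule and the decay of ε can be sketched as follows. The multiplicative decay schedule, the floor value and the function names are assumptions; other schedules are equally possible.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))          # exploration
    return max(range(len(q_values)), key=q_values.__getitem__)  # exploitation


def decay_epsilon(epsilon, decay=0.995, minimum=0.01):
    """Shrink epsilon after each step, but never below the minimum."""
    return max(minimum, epsilon * decay)


epsilon = 1.0
for _ in range(1000):
    action = epsilon_greedy([0.1, 0.9, 0.3], epsilon)
    epsilon = decay_epsilon(epsilon)
```

Early in training ε is close to 1 and almost every action is random; after many steps ε approaches the floor and the agent mostly exploits what it has learned.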

5.3 Ability to Learn in Dynamic Environments

Reinforcement Learning and Deep Reinforcement Learning algorithms are being
used for training robots to work in dynamic and real-time environments. Besides,
these algorithms are also being used for training agents to play various games.
The agent gathers information about the environment by taking various actions and
then behaves optimally once sufficient training has been done [DeepMind]. Hence,
RL based recommendation systems can also work efficiently in a dynamic
environment.

6 The Proposed Framework

In the preceding section, we saw various diseases, along with some shocking
statistics. It is very clear that the problem persists. As a contribution towards a
possible solution, we propose a three-layer framework named “Deep
Reinforcement Learning based Personalized Health Recommendation Framework”.
The first layer is the data preprocessing layer, the second layer is the disease
identification and prediction layer, and the third layer is the recommendation generation
layer. An overview is shown in Fig. 4.
We discussed earlier the fact that millions of gigabytes of data are being
generated every second. Websites, smartphones, wearable devices, hospital reports,
etc. are the key contributors. The accumulation of such a huge
amount of data, which is varied, versatile, volumetric, velocious and veracious in
nature, is nowadays referred to as Big Data.
As shown in Fig. 4, data collected from various sources have to be integrated first.
The process of integration is cumbersome because of the irregular and inconsistent
structure and format of data gathered from a variety of sources. However, it is a
necessary step. In the proposed framework, the integration is done keeping patients
as subjects. Each patient can be assigned a patient ID, unique worldwide, and all
the data concerning that patient can be stored in a semi-structured format. This gives
us the flexibility to accommodate structured, semi-structured and unstructured
data, which may be obtained from reports, wearable devices, health records, hospital
patient records, etc.

Fig. 4 Framework overview

Moreover, integration alone is not sufficient. Quality data mining techniques have
to be employed for preprocessing the data before actually using it. The preprocessing
and the usage of the data are described in detail below.
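The patient-centric integration can be illustrated with a small sketch that keeps all data of one patient in a single semi-structured record. The field names, the use of a UUID as a globally unique patient ID and the sample values are assumptions made for this example.

```python
import json
import uuid


def new_patient_record(demographics):
    """Create an empty record keyed by a globally unique patient ID."""
    return {
        "patient_id": str(uuid.uuid4()),
        "demographics": demographics,
        "observations": [],
    }


def add_observation(record, source, payload):
    """Attach data from any source; the payload may have any structure."""
    record["observations"].append({"source": source, "payload": payload})
    return record


# Hypothetical structured (wearable) and unstructured (report) inputs.
record = new_patient_record({"age": 45, "sex": "F"})
add_observation(record, "wearable", {"heart_rate_bpm": [72, 75, 71]})
add_observation(record, "lab_report",
                {"text": "Fasting glucose within normal range."})
serialized = json.dumps(record, indent=2)
```

Because each observation carries its own payload, new sources can be added later without changing the layout of existing records.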

6.1 The Data Preprocessing Layer

As discussed earlier, we can collect huge volumes of data from a myriad of
sources. All these data, however, are raw and cannot be used directly. The data that
we have consist of many heterogeneous parameters. Some of the common issues
with the raw data are:
1. Missing Data: It is not possible to get all the details about all the people,
especially patients, so we have to deal with missing data. There are several
alternatives for dealing with it. Some of them are:
Replace with a constant: If we think about the reason behind the missing data,
there is a high probability that the person is not suffering from the disease in
question, and hence no test results for that particular attribute are available.
The case may also be completely the opposite: the person may not be aware of any
such test, or even of the possibility that he or she may suffer from such a disease
in the foreseeable future. Considering both scenarios, we can convert the record
into two records by duplicating it. In the first record, we replace the missing value
with the value of that attribute for a normal person and use this record to predict
the disease. In the second record, we ignore that parameter or, if possible, use an
alternative, perhaps less correlated, parameter for the prediction of the
disease. Finally, either the maximum of the two predicted probabilities is chosen,
or their mean is taken as the final result.
Interpolation: Another possible reason for the missing data is that the person did
not undergo a particular test in a particular year, while the data for the preceding
and succeeding time periods are available. Different types of interpolation techniques
can be employed to fill in the missing information.
2. Data formats: The data that we plan to collect come from different sources, are
collected and maintained by different organizations and hospitals, concern different
diseases, and are stored under different models (unstructured, structured or
semi-structured). The best way to deal with such data is to convert them to a
format that accommodates not just the presently available data but also
the data generated and collected in the years to come. XML and JSON are
well suited for this purpose. Many document-based databases help in converting data
from different formats to these formats.
3. Normalization: Deep learning and machine learning algorithms require data in
normalized form.
4. Data Integration: The data that we collect and preprocess have to be integrated
into a format that is compliant with the input formats of machine learning and deep
learning algorithms. Hence, integration of the data is also an important step before
using the data.
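Some of the preprocessing steps listed above can be sketched as follows. The sketch assumes linear interpolation and min-max scaling, which are only two of several possible choices, and all function names and sample values are hypothetical.

```python
def interpolate_missing(series):
    """Linearly fill None gaps that have known values on both sides."""
    filled = list(series)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    return filled  # leading or trailing gaps are left as None


def min_max_normalize(values):
    """Rescale values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def combine_predictions(p_with_default, p_without_feature, mode="mean"):
    """Merge the predictions of the two duplicated records."""
    if mode == "max":
        return max(p_with_default, p_without_feature)
    return (p_with_default + p_without_feature) / 2


# Hypothetical yearly test values with two missing years.
yearly = interpolate_missing([100, None, None, 130])
scaled = min_max_normalize([10, 20, 30])
merged = combine_predictions(0.25, 0.75)
```

`combine_predictions` corresponds to the "replace with a constant" alternative: one prediction comes from the record with the default value filled in, the other from the record that ignores the attribute.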

6.2 The Disease Prediction Layer

After preprocessing the data, we move to the disease prediction module. The pro-
cessed and integrated data are now fed to the disease prediction layer. In this layer,
we try to employ the most accurate existing machine learning based algorithms to
predict the chances of occurrences of some of the common diseases that we target
for recommendation generation.

6.2.1 Diseases

1. Obesity
Obesity has increased at an alarming rate over the last few decades. A survey
conducted by the Centers for Disease Control and Prevention (CDC)
reveals that around 39.8% of the population in the US is obese. Obesity can
lead to heart attack, Type-2 diabetes and certain types of cancer [10]. The CDC has
initiated many campaigns to make people aware of it. Research shows
that if obesity is detected before the age of 5, necessary steps can be taken
to prevent it.
The SVM (Support Vector Machine) serves best in determining whether a
person is suffering or will suffer from obesity [11]. It is tested upon the National
Human Genome Research Institute Catalog, which is a manually curated and
publicly available database.
2. Heart Disease
Next, we take up heart disease prediction. Recent reports reveal that
heart attacks have become a major cause of death, especially among deaths due to
medical illness. Many researchers have contributed optimized
solutions for predicting whether a person is suffering from any heart disease.
Once again, the Support Vector Machine [12] algorithm is preferred over others.
A typical heart disease-related dataset contains a total of 13 features and 1
target variable. Wearable devices are among the other data contributors.
An accuracy of up to 87% can be achieved.
3. Diabetes
Diabetes is one of the most common chronic diseases. As of
2015, 30.3 million people in the USA, or 9.4 percent of the population, suffered
from diabetes. More shocking is that 1 in 4 of them did not
know that they had it. Advances in Machine Learning can help address
this problem.
It is recommended to use a Naive Bayes Classifier to predict the presence
of diabetes. It outperformed other classifiers, giving an accuracy
of 76.3% [13].
4. Rheumatoid Arthritis
Rheumatism is a painful condition of the musculoskeletal system that lowers
patients’ quality of life. It is therefore important to predict which patients
will develop rheumatic illnesses. Some of the
common symptoms are fatigue, vague pain in muscles
and joints, anorexia, etc. These are difficult to diagnose unless the patient is
aware of them. Some of the other symptoms that can be identified are
morning stiffness, inflammation in the hands and wrists, etc.
Some features that can be used for early prediction are rheumatoid factor,
anti-CCP, BUN, total cholesterol, LDL, HDL, TG, glucose, ESR and CRP. The
best proposed algorithm for prediction of this disease is the K-Means Clustering
algorithm, giving an accuracy of 84% with k = 4.
5. Liver Disease
Any disturbance in the functioning of the liver that
can lead to illness is termed liver disease. It is also known as hepatic disease.
According to a World Health Organization (WHO) report, around 3% of the
world’s population is infected with hepatitis C. Out of every 6 infected people,
5 are unaware of their disease. It is very important for people to know about
it, as the liver coordinates some critical activities within the body.
We try to predict the disease and help the person know about it. We use
the C4.5 decision tree algorithm to predict the disease [14]. We have tested it
on the UCI Liver dataset.
6. Asthma
It is a chronic disease in which the bronchial tubes inside the airways of the
lungs become swollen or inflamed, making them more susceptible to allergic
reactions. Moreover, the swelling makes the movement of air to and from the
lungs difficult, causing trouble while breathing. Annual U.S. expenditures for
asthma are $56 billion. Around 8.3% of people suffer from one form of asthma or
another. These numbers clearly justify the need for an effective
method to identify severe exacerbations before the patient actually experiences
them, so that an intervention can be delivered.
The authors of [15] built an efficient prediction system that helps
address this alarming issue by applying an Adaptive Bayesian Network algorithm to
data collected through a Daily Asthma Diary, achieving a sensitivity of 73.8% and
a specificity of 71.4%.
7. Dementia
Dementia is a neurodegenerative brain condition that causes the death
of nerve cells. The damage to nerve cells interferes with the ability of the cells to
communicate with each other. Dementia is not a specific disease;
rather, the term describes a group of symptoms
associated with a decline in memory or other skills that hinders the person’s
ability to perform daily tasks. Alzheimer’s disease accounts for 60–80% of the
cases, followed by vascular dementia. A Naive Bayes Classifier [16] is advised
for predicting the disease using the available data.
8. Thyroid
In India, around 42 million people suffer from thyroid disorders, mainly
hypothyroidism. About 1 in 10 adults suffers from it. Most of the
patients are women. About 3 out of 10 women suffering from this disease
are unaware of it. It is often confused with obesity.
With the help of an Artificial Neural Network (ANN), we try to figure out
whether the person is suffering from it or not. We have used the Thyroid Disease
Dataset from the UCI Machine Learning Repository for our framework.
9. Urinary Tract Infection
Around 150 million people are reported to be diagnosed with Urinary Tract
Infection (UTI) per year. It is a common disease among women, due to their
urethral anatomy. It can pose a serious danger to life. People suffering
from it should be examined frequently.
It is very hard to diagnose a person suffering from UTI, as most of its symptoms
are similar to those caused by inflammation, etc. Using a Back Propagation
Neural Network, we try to predict this disease with its complex symptoms [17].
The algorithm works on the following parameters:
Anamnesis: Gender, Age, Fever, Sudation, Chill, Low back pain, Suprapubic
Pain, Malaise, Dysuria, Pollakiuria
Full urine analysis: Urine culture, Glucose, Leucocyte, Erythrocyte, Protein
Ultrasound: Renal ultrasound, Bladder ultrasound.
10. Infectious Diseases
Infectious diseases tend to be visible enough. Still, some of them require
care to be taken so as to improve health conditions. As they are infectious, they
tend to spread quickly and need to be addressed on time. Due to
their vast variety, LSTM (Long Short-Term Memory) and DNN (Deep Neural
Network) models are the preferred algorithms, as used by Chae et al. [18]. Infectious
diseases may be the easiest to notice, but getting to the root cause is really
difficult. Some basic care can be taken so as to mitigate their harmful effects on
a person as well as the people around him.
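As an illustration of how one of the classifiers named above produces a probability of occurrence, the following sketch implements a small Gaussian Naive Bayes model in plain Python. It is not the setup of the cited studies: the two features, the six training records and the query record are toy values invented for this example, and the Gaussian likelihood is an assumption.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(rows, labels):
    """Estimate the prior and per-feature mean/variance of each class."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    model = {}
    for label, members in grouped.items():
        stats = []
        for column in zip(*members):
            mean = sum(column) / len(column)
            # A small constant keeps the variance strictly positive.
            var = sum((x - mean) ** 2 for x in column) / len(column) + 1e-6
            stats.append((mean, var))
        model[label] = (len(members) / len(rows), stats)
    return model


def predict_proba(model, row):
    """Posterior probability of each class for one record."""
    log_scores = {}
    for label, (prior, stats) in model.items():
        score = math.log(prior)
        for x, (mean, var) in zip(row, stats):
            score += (-0.5 * math.log(2 * math.pi * var)
                      - (x - mean) ** 2 / (2 * var))
        log_scores[label] = score
    top = max(log_scores.values())
    exps = {k: math.exp(v - top) for k, v in log_scores.items()}
    total = sum(exps.values())
    return {k: v / total for k, v in exps.items()}


# Toy records: [fasting glucose (mg/dL), body-mass index]; label 1 = diabetic.
rows = [[85, 22], [90, 24], [95, 23], [150, 31], [160, 33], [170, 30]]
labels = [0, 0, 0, 1, 1, 1]
model = fit_gaussian_nb(rows, labels)
probs = predict_proba(model, [155, 32])
```

The value `probs[1]` plays the role of the probability of occurrence that the disease prediction layer passes on to the next module.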

State Representation Module


The actor network of the Actor-Critic model takes a state, or the features of a state,
as input and gives recommendations based on it [19]. The output obtained
from the Disease Prediction Layer contains the probability of occurrence of
each of the mentioned diseases. These probabilities, along with the patient’s general
information, are given as input to the State Representation Module. As the name
suggests, the state representation module represents this information in the
format required by the actor model. Basically, it represents the values as a
finite number of states that are used to predict the actions in the actor model.
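One possible state representation is a fixed-length feature vector. In the sketch below, the list of diseases is shortened to three for readability, and the age scaling and the encoding of sex are assumptions; the chapter does not prescribe a particular encoding.

```python
# Subset of the targeted diseases, in a fixed order (illustrative only).
DISEASES = ["obesity", "heart_disease", "diabetes"]


def build_state(disease_probs, age, sex, max_age=100):
    """Concatenate disease probabilities with scaled general information."""
    features = [disease_probs.get(name, 0.0) for name in DISEASES]
    features.append(min(age, max_age) / max_age)   # age scaled to [0, 1]
    features.append(1.0 if sex == "F" else 0.0)    # binary encoding of sex
    return features


state = build_state({"diabetes": 0.7}, age=50, sex="M")
```

A fixed feature order ensures that each input neuron of the actor network always receives the same kind of information.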

6.3 The Recommendation Generation Layer

The recommendation generation layer consists of an Actor-Critic model.
Figure 6 shows the architecture of the Actor-Critic model. The module can be divided
into the following two parts.

6.3.1 Actor Network

The actor network, also called the policy network, is shown on the left part of Fig. 6.
The actor network generates the action a based on the given state s and tries to learn
the policy by adjusting the parameters of the neural network. Here, the actor tries to
approximate the policy to give recommendations related to health. The details
of the generated recommendations are specified in Sect. 6.3.3. The actor network
receives a state as input from the State Representation Module. Based on this input,
the actor predicts the corresponding recommendations to be given. The parameters of
the actor network are updated using the Q values of each state-action pair (s,
a) produced by the critic network. The Policy Gradient algorithm is used for updating
the parameters of the actor network (Fig. 6).

Fig. 5 Disease prediction layer

Fig. 6 Recommendation Generation Layer

6.3.2 Critic Network

The critic network, also called the target network, is shown on the right part of Fig. 6.
The critic network tries to approximate the value function for the system based on the
rewards r(s, a) obtained from the environment after the actor takes an action a from
the current state s. The reward function mainly depends on the environment in which
the system is implemented. A positive reward can be given by the environment for
pertinent recommendations and a negative reward for irrelevant ones.

The parameters of the critic network are updated using Temporal Difference (TD)
learning: the TD error is calculated from the reward obtained and the Q
values predicted. The output Q(s, a) generated by the critic network is also used for
evaluating the actions of the actor and updating the parameters of the actor network, as
mentioned earlier.
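The reward signal and the TD error can be sketched as follows. The reward values of +1 and −1, the learning rate and the scalar form of the update are assumptions for illustration; in the framework the update would be applied to the parameters of the critic network.

```python
def feedback_reward(relevant):
    """Positive reward for a pertinent recommendation, negative otherwise."""
    return 1.0 if relevant else -1.0


def td_error(q_sa, reward, q_next, gamma=0.9):
    """Difference between the bootstrapped target and the current estimate."""
    return reward + gamma * q_next - q_sa


def critic_update(q_sa, reward, q_next, lr=0.1, gamma=0.9):
    """Move the estimate of Q(s, a) a small step along the TD error."""
    return q_sa + lr * td_error(q_sa, reward, q_next, gamma)


# An irrelevant recommendation lowers the critic's estimate for that pair.
new_q = critic_update(0.5, feedback_reward(False), q_next=0.0)
```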

6.3.3 Interpreting the Outputs

Because the activation function is tanh(x), the outputs range from −1 to 1. The
output is an array whose size equals the number of output neurons. Each output
neuron represents a health-related recommendation, for example walking, jogging,
playing a particular sport, a recommended diet, or a specific set of physical activities
to be carried out. The number of output neurons depends on the scale of the application
and the number of diseases that are targeted. The question that arises is how this
array of numbers helps in generating actual recommendations.
The following steps are suggested to generate apt recommendations:
Step 1: Categorize
The probabilities and age groups may be categorized as shown in Table 1.
These categories can be altered and adjusted as per the requirements. They are
made to help the end users (the ones for whom the recommendations are generated).
Then we generate the recommendations based on these categories.
Step 2: Generate
After categorizing, recommendations can be generated by combining the outputs
of the Disease Prediction Layer, the age groups and the probabilities of occurrence, as
shown in Table 2. The next question is how these recommendations are communicated
to the target user. If we have a dedicated medical portal, we can show pop-ups
when the user is logged in. In the absence of such a facility, we can use
push notification services over media such as e-mail, SMS, etc.
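The two steps can be sketched in code. The category boundaries follow Table 1 and the example outputs follow the second row of Table 2; the rule of keeping only activities with a positive output and ranking them by value is an assumption made for this illustration.

```python
AGE_GROUPS = [
    (10, 20, "Adolescent"), (21, 30, "Young"), (31, 40, "Adult"),
    (41, 50, "Middle aged"), (51, 60, "Old"), (61, 80, "Veteran"),
]


def probability_category(percent):
    """Map a probability of occurrence (in %) to its Table 1 category."""
    if percent <= 25:
        return "Very low"
    if percent <= 50:
        return "Low"
    if percent <= 75:
        return "High"
    return "Very high"


def age_category(age):
    """Map an age to its Table 1 group; None if outside all groups."""
    for low, high, name in AGE_GROUPS:
        if low <= age <= high:
            return name
    return None


def rank_recommendations(outputs, threshold=0.0):
    """Keep activities whose tanh output exceeds the threshold, best first."""
    kept = [(name, value) for name, value in outputs.items()
            if value > threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


actor_outputs = {"Walking": 0.8, "Jogging/Running": -0.4, "Yoga": 0.5}
context = (probability_category(60), age_category(70))
ranked = rank_recommendations(actor_outputs)
```

The resulting context and ranked list can then be turned into the text shown in the Recommendation column of Table 2.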

7 Future Improvements

The system proposed here has considerable scope for improvement in the future. Some
of the possible improvements that we aim to make are mentioned below.

7.1 Actor-Critic Recommendation System

The approach presented here uses the Actor-Critic model for generating recommendations
related to health. However, with advancements in Deep Reinforcement Learning
based algorithms, various improved and more efficient algorithms can be used for
generating recommendations by interacting with the environment. For example, Hindsight
Experience Replay (HER) uses the mistakes committed by the model to learn a better
policy [20]. Hence, the current model can be combined with HER to achieve
higher accuracy by learning from the negative rewards obtained for wrong
recommendations.

Table 1 Categories of probabilities and age

Categories of probability of occurrence of disease

| Probability of occurrence of disease (%) | Category |
|---|---|
| 0–25 | Very low |
| 26–50 | Low |
| 51–75 | High |
| 76–100 | Very high |

Categories of age group

| Age group | Category |
|---|---|
| 10–20 | Adolescent |
| 21–30 | Young |
| 31–40 | Adult |
| 41–50 | Middle aged |
| 51–60 | Old |
| 61–80 | Veteran |

Table 2 Generated recommendations

Disease: diabetes

| Probability of occurrence | Age group | Output of actor network: Context | Output of actor network: Value | Recommendation |
|---|---|---|---|---|
| Low | Young | Jogging | 0.5 | Jogging is more recommended than brisk walking for young people, along with exercises that include swimming, aerobics, etc. |
| | | Walking | 0.3 | |
| | | Swimming | 0.7 | |
| | | Aerobics | 0.8 | |
| High | Veteran | Walking | 0.8 | For elderly people with a high probability of occurrence of diabetes, walking is better recommended than running. Moreover, instead of vigorous exercises, yoga is recommended, considering the age range |
| | | Jogging/Running | −0.4 | |
| | | Yoga | 0.5 | |

Related diseases: Obesity

7.2 Recommendations

The proposed system mainly gives general recommendations for physical activities
and food. However, various other health-related recommendations can be incorporated
in order to give efficient and personalized recommendations to each
patient.

7.3 Data Preprocessing

The system aims to collect readily available data from hospitals, wearable devices
and laboratories. However, the availability of more specific data, such as the family
health history of a particular patient, the type of physical activities that a patient
performs daily and other details about the patient’s daily routine, can significantly
improve the disease prediction accuracy of the model. Hence, more specific and
accurate recommendations can be generated based on the disease and the daily routine
of the patient. Information retrieval systems can be employed for collecting data in
a much more efficient manner.

7.4 Disease Prediction

A generalized approach can be developed for the prediction of various diseases based on
the details of the patient provided. This can help us achieve the goal of “Personalized
Health Recommendations” in an efficient and robust way.

8 Conclusion

The road to health is paved with good recommendations. Healthcare development
has been found to be one of the fastest growing fields, alongside others such as space
exploration and software development. With the advancement in technology, it is
now possible to build and achieve what could not even be imagined a few years
back. The proposed approach is a step towards building such systems by
exploiting the technology we have. In this chapter, we have tried to use the concepts
of Recommendation Systems, Reinforcement Learning, Machine Learning and Big
Data for this purpose. The increasing health consciousness among people, along with
the gigantic growth of data and improvements in technology, makes this framework
promising for the future. Because of its personalized solutions, it can even
be a propitious business model.

References

1. Elgendy, N., Elragal, A.: Big data analytics: a literature review paper. In: Industrial Conference
on Data Mining, pp. 214–227. Springer, Cham (2014, July)
2. Pan, C., Li, W.: Research paper recommendation with topic analysis. In: 2010 International
Conference On Computer Design and Applications, vol. 4, pp. V4–264. IEEE (2010, June)
3. Han, Q., Ji, M., de Troya, I.M.D.R., Gaur, M., Zejnilovic, L.: A hybrid recommender system
for patient-doctor matchmaking in primary care. In: 2018 IEEE 5th International Conference
on Data Science and Advanced Analytics (DSAA), pp. 481–490. IEEE (2018, Oct)
4. Wiesner, M., Pfeifer, D.: Health recommender systems: concepts, requirements, technical
basics and challenges. Int. J. Environ. Res. Public Health 11(3), 2580–2607 (2014).
https://doi.org/10.3390/ijerph110302580
5. Patgiri, R., Ahmed, A.: Big data: the v’s of the game changer paradigm. In: 2016 IEEE 18th
International Conference on High Performance Computing and Communications; IEEE 14th
International Conference on Smart City; IEEE 2nd International Conference on Data Science
and Systems (HPCC/SmartCity/DSS), pp. 17–24. IEEE (2016, Dec)
6. Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., Riedmiller,
M.: Playing atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602. (2013)
7. Wang, Z., Schaul, T., Hessel, M., Van Hasselt, H., Lanctot, M., De Freitas, N.: Dueling network
architectures for deep reinforcement learning. arXiv preprint arXiv:1511.06581. (2015)
8. Sutton, R.S., McAllester, D.A., Singh, S.P., Mansour, Y.: Policy gradient methods for reinforce-
ment learning with function approximation. In: Advances in Neural Information Processing
Systems, pp. 1057–1063. (2000)
9. Zhao, X., Zhang, L., Ding, Z., Yin, D., Zhao, Y., Tang, J.: Deep reinforcement learning for
list-wise recommendations. CoRR, vol. abs/1801.00209. (2018)
10. Mokdad, A.H., Ford, E.S., Bowman, B.A., Dietz, W.H., Vinicor, F., Bales, V.S., Marks, J.S.:
Prevalence of obesity, diabetes, and obesity-related health risk factors, 2001. JAMA 289(1),
76–79 (2003)
11. Montañez, C.A.C., Fergus, P., Hussain, A., Al-Jumeily, D., Abdulaimma, B., Hind, J., Radi,
N.: Machine learning approaches for the prediction of obesity using publicly available genetic
profiles. In: 2017 International Joint Conference on Neural Networks (IJCNN), pp. 2743–2750.
IEEE (2017, May)
12. Sharmila, R., Chellammal, S.: A conceptual method to enhance the prediction of heart diseases
using the data techniques. Int. J. Comput. Sci. Eng. (2018, May)
13. Sisodia, D., Sisodia, D.S.: Prediction of diabetes using classification algorithms. Proc. Comput.
Sci. 132, 1578–1585 (2018)
14. Sindhuja, D., Priyadarsini, R.J.: A survey on classification techniques in data mining for ana-
lyzing liver disease disorder. Int. J. Comput. Sci. Mob. Comput. 5(5), 483–488 (2016)
15. Finkelstein, J.: Machine learning approaches to personalize early prediction of asthma exacer-
bations. Ann. New York Acad. Sci. 1387(1), 153–165 (2017)
16. Jammeh, E.A., Camille, B.C., Stephen, W.P., Escudero, J., Anastasiou, A., Zhao, P., Chenore,
T., Zajicek, J., Ifeachor, E.: Machine-learning based identification of undiagnosed dementia in
primary care: a feasibility study. BJGP open 2(2). bjgpopen18X101589. (2018)

17. Ozkan, I.A., Koklu, M., Sert, I.U.: Diagnosis of urinary tract infection based on artificial
intelligence methods. Comput. Methods Progr. Biomed. 166, 51–59 (2018)
18. Chae, S., Kwon, S., Lee, D.: Predicting infectious disease using deep learning and big data.
Int. J. Environ. Res. Public Health 15(8), 1596 (2018)
19. Liu, F., Tang, R., Li, X., Zhang, W., Ye, Y., Chen, H., et al.: Deep reinforcement learning
based recommendation with explicit user-item interactions modeling. arXiv preprint arXiv:
1810.12027. (2018)
20. Andrychowicz, M., Wolski, F., Ray, A., Schneider, J., Fong, R., Welinder, P., et al.: Hindsight
experience replay. In: Advances in Neural Information Processing Systems, pp. 5048–5058.
(2017)

Jayraj Mulani is pursuing a B. Tech at the Institute of Technology, Nirma University, and is
currently in his penultimate year. He was a versatile student at high school and received
the Best Student award for all-round performance at Divine Child School. Passionate
about learning new things, he has always looked to explore and work on technologies ranging from
development to modelling to machine learning. His areas of interest include data science, image
processing, recommendation techniques, machine learning, deep learning, reinforcement learning
and information retrieval. He is a computer science enthusiast and aims to pursue higher
education at one of the best universities across the globe. His interest in recommendation systems,
combined with the alarming health-related issues people face, has led him and his friends to think
of this personalized health recommendation system. He hopes that it will change the way
personalized recommendations are generated and that this technology becomes domain independent.
He strongly believes in solving data-driven problems using the state-of-the-art tools ruling the
Indian market.

Sachin Heda is a third-year undergraduate pursuing a B. Tech at the Institute of Technology, Nirma
University. Since childhood he has engaged himself in solving real-life problems. He is an
enthusiastic learner with high conceptual clarity. He was appointed Head Boy of his school.
A computer science enthusiast, he has explored and worked on many technologies, including
data science, recommendation techniques and big data. He is an extrovert and believes that
people are now becoming more aware of their health. He and his friends think that this personalized
health system will bring a huge positive change to people's lives.

Kalpan Tumdi is a third-year undergraduate pursuing a B. Tech at the Institute of Technology,
Nirma University. His enthusiasm and passion for learning and developing new things, and his keen
interest in machine learning, led him to get involved in various machine learning research projects.
His areas of interest include data science, image processing, reinforcement learning, machine
learning and computer vision. As people are becoming more and more health conscious, he aims
to use machine learning and deep reinforcement learning algorithms to help people learn how to
improve their health and fitness.

Prof. Jitali Patel is working as an Assistant Professor in the Computer Science and Engineering
Department at the Institute of Technology, Nirma University. She obtained her postgraduate degree,
ME(CE), from Dharmsinh Desai University in 2010. She has more than
10 years of teaching experience. She has taught Artificial Intelligence, Information Retrieval, Data
Mining, Object Oriented Programming and Data Structures. Her areas of interest and research are
Machine Learning and its applications. She has published more than 10 peer-reviewed research
articles.

Hitesh Chhinkaniwala is an Associate Professor and Head of the Department of Information
and Communication Technology, Adani Institute of Infrastructure Engineering, India. His areas
of interest are Data Mining and Knowledge Discovery, Privacy Preserving, Text Mining, Text
Summarization, Information Extraction, Sentiment Analysis, Statistical Data Analysis and Ontology
Learning. He has published more than 20 peer-reviewed research articles and a book. He is a
reviewer for Transactions on Knowledge Discovery from Data (TKDD).

Prof. Jigna Patel is working as an Assistant Professor in the Computer Science and Engineering
Department at the Institute of Technology, Nirma University. She obtained her postgraduate degree,
ME, from Dharmsinh Desai University in 2008. She has more than 10 years of
teaching experience. She has taught Theory of Computation, Cyber Security, Artificial
Intelligence, Big Data Analytics, Principles of Programming Languages, Mathematical Foundations for
Computer Science and C Programming. Her areas of interest and research are Data Warehousing,
Data Mining and Big Data Analytics.
Using Deep Learning Based Natural
Language Processing Techniques
for Clinical Decision-Making with EHRs

Runjie Zhu, Xinhui Tu and Jimmy Huang

Abstract Natural language processing (NLP) is an interdisciplinary domain of
research that focuses on the interactions between human languages and computers.
There has been a recent trend of solving NLP problems with deep learning
approaches. Applications of deep learning in the healthcare sector are mostly
associated with canonical examples of applying image processing and computer
vision techniques to medical scans for disease diagnosis. The Electronic Health
Record (EHR) is another, often neglected, source of data that is equally if not more
important than medical scans and that can change the way we learn useful features
and information from the medical records of patients. The text-based information
stored within EHRs is data-rich by nature, but it is often not well understood because
of its high volume, variety, velocity and complexity. However, these
specific characteristics suit the nature of deep learning well. Therefore, we believe
it is the right time to summarize the current status and to review and learn from the
state-of-the-art medical NLP techniques. Unlike existing reviews, we
examine and categorize the current deep learning-based NLP techniques in the medical
domain according to three major purposes: representation learning, information extraction
and clinical predictions. Meanwhile, we discuss whether the application of deep
learning methods has tackled these problems differently and transformed these tasks
fundamentally. Based on the results, we find that there is still a long way to go before
deep learning methods revolutionize the existing healthcare sector. However,
the recent progress made by these proposed methods is already a promising
start. Furthermore, we state some of the legal and ethical considerations,

R. Zhu (B)
Information Retrieval and Knowledge Management Research Lab, Department of Electrical
Engineering and Computer Science, York University, Toronto, Canada
e-mail: [email protected]
X. Tu (B)
School of Computer Science, Central China Normal University, Wuhan, China
e-mail: [email protected]
J. Huang (B)
Information Retrieval and Knowledge Management Research Lab, School of Information
Technology, York University, Toronto, Canada
e-mail: [email protected]

© Springer Nature Switzerland AG 2020
S. Dash et al. (eds.), Deep Learning Techniques for Biomedical and Health Informatics,
Studies in Big Data 68, https://doi.org/10.1007/978-3-030-33966-1_13

present the status quo of the healthcare industry applications, and provide several
possible directions of future research.

Keywords Deep learning · Electronic health records · Natural language
processing · Representation learning · Information extraction · Clinical predictions

1 Introduction

Health problems remain a central issue in human lives. Healthcare usually refers
to the set of services provided by health professionals to help patients
improve or maintain their health through check-ups, diagnosis, treatment,
restoration and prevention. By its nature, the healthcare sector
produces data in many different forms and structures, such as DNA sequences,
medical scans and electronic health records (EHRs), at a large scale and an
unprecedented speed, driven by the continuously growing number of patients, medical
facilities and healthcare providers. However, the data provided by these healthcare units
are often neither well processed nor well understood because of their high volume, velocity,
variety and complexity. Thus, the adoption of deep learning-based natural language
processing technologies in healthcare studies has become increasingly common in
recent years.
Deep learning, a subfield of machine learning, is a class of algorithms that
can extract features directly from raw inputs using hidden layers,
without human intervention. In the past few years, this class of algorithms has
caught the attention of researchers because of its promising and robust results across a wide
variety of tasks and domains. It is now widely accepted that deep learning approaches,
such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs),
can perform well and achieve robust results given different data structures and under
different circumstances. In healthcare, for example, convolutional neural networks are
more commonly applied to radiology images, whereas recurrent neural networks are
more commonly applied to text-based medical notes.
The Electronic Health Record (EHR) system, as a component of the healthcare
industry, has gradually drawn more attention recently. An EHR system
collects patients' health information and stores it in digital format. Researchers
and healthcare professionals consider EHRs to be a very important source of medical
data that can provide insights into domain problems. However, because the data types in
EHRs vary extensively and the system contains a great amount of free text, it has been
challenging for traditional models to tackle these problems. Deep learning models,
which can extract features without human intervention, are therefore particularly well
suited to EHR problems. The useful information in an EHR system includes, but is
not limited to, patient demographics, lab results, medical scan images, prescriptions,
medical history and clinical notes. Hospitals initially adopted EHR systems to
store patients' data for tracking care, as well as for administrative and
billing purposes. In fact, these EHRs can also provide otherwise undiscovered insights
that help to capture disease trends, make clinical decisions and draw medical conclusions.
Among all the information that an EHR system contains, text-based clinical notes
are one of the most important resources in a patient's EHR. However, doctors,
nurses, physiotherapists and pharmacists usually complete each section separately,
using different medical wordings and representations, which creates difficulties
for information alignment. Hence, the biggest challenge in current EHR systems
is that EHRs are structurally different. It is already time consuming to align
the information in medical profiles, let alone to create a pipeline for processing
these EHR notes, extracting terms, and computing and aligning embeddings over years and
across hospitals or other healthcare providers. Therefore, the current goal of EHR
management is to increase clinical efficiency by empowering physicians with
better and more user-friendly EHR systems. Moreover, such systems have to help
lower the cost of clinical diagnosis and minimize the possibility of medical
misdiagnosis.
In this chapter, we present the current status of deep learning-based methods
adopted in the healthcare sector. We then discuss the unique challenges and give
directions for both clinical and technical opportunities for future work. In Sect. 2,
we lay out an overview of the existing deep learning methods and cover
the background and motivation for applying deep learning approaches to the medical
domain. In Sect. 3, we examine and categorize the current deep learning-based NLP
techniques according to three major purposes: representation learning, information extraction
and clinical predictions. Meanwhile, we compare the experimental results of these
deep learning-based NLP models in the medical domain and focus on the novelty and
diversity of these techniques, as well as their evaluation metrics. In Sect. 4, we present
some other related application themes and issues of deep learning in healthcare,
followed by a discussion of legal and ethical considerations, as well as industry
applications. Finally, in Sect. 5, we acknowledge that the recent progress made by
these proposed methods is already a promising start, and
we provide several possible and promising directions for future research.

2 Deep Learning for Natural Language Processing

Natural language processing is an active research field in computer science and artificial
intelligence that focuses on the interactions between human languages and computers.
Specifically, it studies how to represent, process and analyze large amounts of
natural language data.
Neural networks are powerful learning models in general. Deep learning
approaches have achieved impressive successes in image and speech recognition. In
the past few years, natural language processing has also taken great advantage of
deep learning algorithms and methods to achieve great advances. Attention has
shifted from traditional machine learning models, such as Support Vector
Machines (SVM) and logistic regression, towards deep neural network
models such as CNNs and RNNs. These deep learning approaches eliminate the
time-consuming work on hand-crafted features and replace it with automatic feature
learning.
In this section, we present the main deep learning approaches and architectures
applied in the natural language processing research field. The major approaches include
distributed representations, which are the foundation of deep learning models, CNNs,
RNNs and Transformer-based neural networks.

2.1 Distributed Representation

In the past, local representations were often used to store memories and to represent each
entity directly with a single element. Such a structure is easy to understand and easy to
implement but very inefficient, as each unit is associated with only one represented
thing [1]. A distributed representation, by contrast, provides an effective and efficient
way of using more than one representational element to represent each entity. As the
representations of different entities overlap in the neural network, the
network can respond to a new input based on its generalization capability. The network
is able to output significant features automatically after pretraining on raw data inputs.
In most cases, researchers do not have enough annotated data to
derive features for classification tasks. Therefore, we need an unsupervised approach like
distributed representation to pretrain on the data and to embed words with similar meanings
into similar vectors.

2.1.1 Word Embeddings

Word embeddings follow the distributional hypothesis, which states that words or terms
with similar meanings are more likely to appear in similar contexts. A word embedding
typically serves as a data preprocessing function in the first layer of a deep learning
architecture. It can also capture the neighboring context of a target term in order to
calculate the degree of similarity between words. In most cases, word embeddings are
pretrained on a large text corpus to capture the syntactic and
semantic features of the document collection. The resulting embeddings are invariable and
reusable, and downstream models apply them to represent words in context.
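
As a minimal sketch of how the degree of similarity between embedded words can be calculated, the Python snippet below applies cosine similarity to three toy vectors. The vectors, the word choices and the use of cosine similarity are assumptions made for illustration; they are not taken from a trained model.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical 3-d embeddings, hand-picked for illustration only.
embeddings = {
    "fever": [0.9, 0.1, 0.0],
    "pyrexia": [0.8, 0.2, 0.1],
    "invoice": [0.0, 0.1, 0.9],
}

sim_related = cosine_similarity(embeddings["fever"], embeddings["pyrexia"])
sim_unrelated = cosine_similarity(embeddings["fever"], embeddings["invoice"])
```

With these toy values the two clinical terms score much closer to 1 than the unrelated pair, which is the behaviour a pretrained embedding is expected to show for words that occur in similar contexts.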

2.1.2 Word2Vec

Word2Vec, created by Tomas Mikolov et al. [2–4], refers to a group of models that take a large
text corpus as input and produce word embeddings, typically with hundreds of dimensions. It
maps each unique word in the text corpus to a corresponding vector in the space,
and these word vectors are positioned so that words sharing similar contexts
are located near each other. These models are structured as two-layer neural
networks that aim to reconstruct the contexts of words across the entire
text corpus. Although Word2Vec is not a deep neural network (DNN), it serves
to transform text into numerical values that a DNN can process, without human
intervention.
In general, Word2Vec trains each input word against its neighboring words in
context in one of two ways: Skip-Gram or Continuous Bag of Words. Skip-Gram
predicts the surrounding context from a given word. Specifically, the goal of the
Skip-Gram model is to maximize the average log probability

$$\frac{1}{T}\sum_{t=1}^{T}\ \sum_{-c \le j \le c,\ j \ne 0} \log p(w_{t+j} \mid w_t) \tag{1}$$

given a sequence of training words $w_1, w_2, \ldots, w_T$, where $c$ is the size of the
training context. A larger $c$ yields more training examples and can therefore lead to
higher accuracy, at the expense of training time. The basic Skip-Gram model defines
$p(w_{t+j} \mid w_t)$ with the softmax function as follows:
  
$$p(w_O \mid w_I) = \frac{\exp\left({v'_{w_O}}^{\top} v_{w_I}\right)}{\sum_{w=1}^{W} \exp\left({v'_{w}}^{\top} v_{w_I}\right)} \tag{2}$$

where $v_w$ and $v'_w$ are the input and output vector representations of word $w$, and $W$ is
the number of words in the vocabulary.
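
The Skip-Gram objective of Eq. (1) and the softmax of Eq. (2) can be sketched in a few lines of plain Python. This is an illustrative sketch rather than the Word2Vec implementation: the toy vectors are hand-picked, the function names are hypothetical, and context positions that fall outside the sequence are simply skipped, which is an assumption that Eq. (1) does not spell out.

```python
import math

def skipgram_softmax(center, in_vecs, out_vecs):
    """Return p(w_O | w_I = center) for every word w_O in the vocabulary (Eq. 2)."""
    v_in = in_vecs[center]
    scores = {w: sum(a * b for a, b in zip(v_out, v_in))
              for w, v_out in out_vecs.items()}
    top = max(scores.values())  # subtract the maximum for numerical stability
    exps = {w: math.exp(s - top) for w, s in scores.items()}
    total = sum(exps.values())
    return {w: e / total for w, e in exps.items()}

def average_log_probability(tokens, c, in_vecs, out_vecs):
    """Average log probability of Eq. 1 for context size c."""
    log_sum = 0.0
    for t, w_t in enumerate(tokens):
        probs = skipgram_softmax(w_t, in_vecs, out_vecs)
        for j in range(-c, c + 1):
            # Skip j = 0 and positions outside the sequence (an assumption).
            if j != 0 and 0 <= t + j < len(tokens):
                log_sum += math.log(probs[tokens[t + j]])
    return log_sum / len(tokens)

# Toy 2-d input (v) and output (v') vectors; hypothetical values, not trained.
in_vecs = {"fever": [1.0, 0.0], "cough": [0.8, 0.2], "invoice": [0.0, 1.0]}
out_vecs = {"fever": [0.9, 0.1], "cough": [1.0, 0.0], "invoice": [0.0, 1.0]}

probs = skipgram_softmax("fever", in_vecs, out_vecs)
objective = average_log_probability(["fever", "cough", "fever"], 1, in_vecs, out_vecs)
```

Training would adjust both sets of vectors to increase `objective`. The full softmax shown here sums over the whole vocabulary, which is why practical implementations replace it with cheaper approximations.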
Continuous Bag of Words (CBOW) predicts a target word from a
given context. It is built on top of the bag-of-words concept. Bag of Words (BoW)
is a simplifying representation in NLP: it represents a text as the bag of
its words, disregarding word order and grammar. It is often used to
train a classifier on documents or texts, with the frequency of each
word counted as a feature. CBOW is a way of representing an unbounded number of features
with a fixed-size vector when the number of features is not known in advance.
CBOW works in a very similar way to BoW: it sums or averages
the embedding vectors of the corresponding features while ignoring word order
information:

$$\mathrm{CBOW}(f_1, f_2, \ldots, f_k) = \frac{1}{k}\sum_{i=1}^{k} v(f_i) \tag{3}$$

Weighted CBOW is a simple variation of CBOW in which each vector
receives a different weight, by associating a weight $a_i$ with feature $f_i$:

$$\mathrm{WCBOW}(f_1, f_2, \ldots, f_k) = \frac{1}{\sum_{i=1}^{k} a_i}\sum_{i=1}^{k} a_i\, v(f_i) \tag{4}$$
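
Equations (3) and (4) translate directly into code. The sketch below uses plain Python lists as feature vectors; the function names and the toy values are assumptions made for illustration.

```python
def cbow(vectors):
    """Average the feature vectors v(f_1), ..., v(f_k) as in Eq. 3."""
    k = len(vectors)
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / k for d in range(dim)]

def wcbow(vectors, weights):
    """Weighted average of the feature vectors as in Eq. 4."""
    total = sum(weights)
    dim = len(vectors[0])
    return [sum(a * v[d] for a, v in zip(weights, vectors)) / total
            for d in range(dim)]

# Three hypothetical 2-d feature vectors.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

plain = cbow(features)                        # every feature counts equally
weighted = wcbow(features, [1.0, 1.0, 2.0])   # third feature counts twice
```

Both functions return a vector of the same fixed size regardless of how many features are passed in, and with equal weights `wcbow` reduces to `cbow`.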

Chalapathy et al. [5] propose an RNN approach that combines a bidirectional long short-term
memory (LSTM) model with conditional random field (CRF) decoding and builds on word
embeddings, namely GloVe [6] and Word2Vec [3]. For concept representation and
extraction, this bidirectional LSTM-CRF model first maps every word of an input sentence
to a word embedding vector, which may be initialized randomly. The model then applies
the GloVe, Skip-Gram and CBOW embedding training methods over the
entire data collection to generate vector representations. The resulting sequences of
vectors are fed into the RNN-based LSTM model, which is good at processing
sequential data, to produce a class of medical concepts. As an LSTM tends to favor
the most recent inputs, Chalapathy et al. compute both the forward and the backward
hidden states to eliminate this possible bias.
Besides GloVe, the other most popular way of learning medical concept representations
among current researchers is to generate distributed embeddings
with Skip-Gram [4, 7]. Distributed representations of words in a vector space
group similar words together effectively, which improves the performance of
learning algorithms in natural language processing. The Skip-Gram technique [4]
introduced by Mikolov et al. is an efficient method for learning vector representations
of words from large quantities of unstructured text data. Because it learns to predict the
context of a word, it captures relations between words.

2.1.3 Contextualized Word Embeddings

Learning high-quality representations has never been an easy task. In the past few
years, contextualized word embeddings have achieved impressive results and have
been adopted in many recent deep learning NLP models. These contextualized
word embedding approaches are considered ideal pretrained word representation
models, because they can capture complicated characteristics of syntactic and
semantic word use and model how these uses change across linguistic
contexts.
ELMo, or Embeddings from Language Models, is one of the deep contextualized
word representation models. The word vectors are learned functions of the internal
states of a deep bidirectional language model (biLM) that is pretrained on a large text
corpus. In other words, the vectors stacked above each input word are
linearly combined, with weights learned for each end task, to produce the final
representation. This design can boost model performance significantly: when these
representations are added to existing models, they can bring significant improvements
on existing NLP problems. The biggest difference between commonly seen word
embeddings and ELMo is that ELMo word representations take the entire
sentence as input. The embeddings are computed on top of biLMs with character
convolutions, as a linear function of the internal network states. The
representations generated by the model are contextual, deep and character-based,
meaning that each word representation depends on the entire context, combines all
layers of the deep neural network, and allows the network to form robust representations
for out-of-vocabulary tokens. The model is formulated as follows:

  
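$$\mathrm{ELMo}_k^{task} = E(R_k; \Theta^{task}) = \gamma^{task} \sum_{j=0}^{L} s_j^{task}\, h_{k,j}^{LM} \tag{5}$$

$$R_k = \left\{ x_k^{LM}, \overrightarrow{h}_{k,j}^{LM}, \overleftarrow{h}_{k,j}^{LM} \mid j = 1, \ldots, L \right\} = \left\{ h_{k,j}^{LM} \mid j = 0, \ldots, L \right\} \tag{6}$$

where $R_k$ is the set of $2L + 1$ representations that an $L$-layer biLM computes for token $k$,
$h_{k,0}^{LM}$ is the token layer, $h_{k,j}^{LM}$ combines the forward and backward states of
layer $j$, $\gamma^{task}$ is a scalar that rescales the whole ELMo vector to aid the optimization
process, and $s^{task}$ are softmax-normalized weights. $\mathrm{ELMo}_k = E(R_k; \Theta_e)$
denotes the case in which ELMo collapses all layers in $R_k$ into a single vector.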
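
The layer combination of Eq. (5) can be sketched as follows. The snippet only illustrates the weighted sum for a single token; the layer vectors, the raw weights and the function name are hypothetical, and no biLM is actually run.

```python
import math

def elmo_combine(layers, s_raw, gamma):
    """Collapse the L + 1 layer vectors of one token into a single vector (Eq. 5).

    layers: list of L + 1 vectors h_{k,j} (index 0 is the token layer)
    s_raw:  L + 1 unnormalized task weights, softmax-normalized below
    gamma:  task-specific scalar that rescales the whole vector
    """
    top = max(s_raw)
    exps = [math.exp(s - top) for s in s_raw]
    total = sum(exps)
    s = [e / total for e in exps]
    dim = len(layers[0])
    return [gamma * sum(s_j * h[d] for s_j, h in zip(s, layers))
            for d in range(dim)]

# Hypothetical layer outputs for one token of a 2-layer biLM (L = 2).
layers = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

uniform = elmo_combine(layers, [0.0, 0.0, 0.0], gamma=1.0)  # plain average
scaled = elmo_combine(layers, [0.0, 0.0, 0.0], gamma=2.0)   # same mix, rescaled
```

Equal raw weights reduce the combination to a plain average of the layers, while a downstream task is free to learn weights that favour lower or higher layers.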

2.2 Convolutional Neural Networks (CNN)

A convolutional neural network is a class of deep neural networks
that is commonly applied in computer vision [8] and natural language processing
(NLP) studies. Its design is inspired by the connectivity pattern of neurons in the brain,
and it is a regularized version of the multilayer perceptron, which is a fully connected
network. Specifically, a CNN is made up of an input layer, multiple hidden
layers and an output layer. The hidden layers typically include convolutional layers,
ReLU (activation function) layers, pooling layers, fully connected layers and
normalization layers. Compared to other classification algorithms, a CNN requires much
less preprocessing, and its results improve as the amount of training increases.
In natural language processing, a CNN is adopted to identify locally predictive features
in large text corpora. The extracted features are then combined to
generate a fixed-size vector representation of the entire structure. Thus, in essence,
a CNN is an effective feature extraction architecture that can automatically identify the
predictive n-grams in a sentence. The basic convolution-and-pooling
model for NLP applies a learned nonlinear function to every instance of a window of
k words sliding over the sentence. Each such function acts as a filter
that transforms a k-word window into a scalar value. Applying l such filters
yields an l-dimensional vector that captures important characteristics of the
words in the window. After that, a pooling layer
combines the vectors from the different windows into a single l-dimensional vector by taking
the maximum or the average value observed in each of the l dimensions over the windows. The
purpose is to extract the most prominent features in the context regardless of their
location. The combined vector is then fed into a prediction network.
The gradients propagated back from the task loss tune the parameters of the filter functions
so that they emphasize the aspects of the data that matter for the given task. In general, as
the window of size k slides over the sentence or text corpus, the filter functions learn to
extract informative k-gram features.
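
The window, filter and pooling steps described above can be sketched as follows. The tanh nonlinearity, the hand-set filter weights and the function name are assumptions made for illustration; in a real model the filter weights are learned by backpropagation.

```python
import math

def conv_max_pool(sentence, filters, k):
    """Apply l filters to every k-word window, then max-pool over the windows.

    sentence: list of word vectors
    filters:  list of l (weights, bias) pairs, each with k * dim weights
    """
    # Concatenate the k word vectors of each sliding window.
    windows = [sum(sentence[i:i + k], []) for i in range(len(sentence) - k + 1)]
    # Each filter maps a window to a scalar, giving an l-dimensional vector per window.
    feats = [[math.tanh(sum(w * x for w, x in zip(weights, win)) + bias)
              for weights, bias in filters]
             for win in windows]
    # Max pooling keeps, per dimension, the largest value seen in any window.
    return [max(f[d] for f in feats) for d in range(len(filters))]

# Four hypothetical 2-d word vectors and two hand-set filters over 2-word windows.
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
filters = [([1.0, 0.0, 0.0, 0.0], 0.0),   # reacts to the first word of a window
           ([0.0, 0.0, 0.0, 1.0], 0.0)]   # reacts to the second word of a window

pooled = conv_max_pool(sentence, filters, k=2)
```

The pooled vector has one entry per filter no matter how long the sentence is, and it does not record which window produced each maximum, which is the location independence mentioned above.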

2.3 Recurrent Neural Networks

The connections between nodes in an RNN form a directed graph along
a temporal sequence, which allows the network to exhibit temporal dynamic behavior.
Because RNNs can process sequences of inputs, RNN architectures are commonly used
for NLP tasks such as information extraction and speech recognition.

2.3.1 Recurrent Neural Network

The simple recurrent neural network architecture is sensitive to the sequential ordering
of elements. Mikolov explored it in 2012 and applied it to language
modeling [9]. As suggested in that work, a basic RNN model is a network of nodes organized
in successive layers, where each node in a layer has
a one-way directed connection to every node in the next layer.
The nodes consist of input nodes, output nodes and hidden
nodes. Each neuron has a real-valued activation that
changes over time, and each connection has a modifiable real-valued weight.
By forming a directed graph along a temporal
sequence with the connected nodes, the RNN model can exhibit dynamic behavior,
using its internal memory state to process input sequences.
It is worth noting that the basic RNN structure introduced above is hard to train
effectively because of the vanishing gradient problem. As the gradients from later steps
diminish quickly during backpropagation and cannot reach earlier input signals,
the basic RNN model can hardly capture long-range dependencies.
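
A minimal sketch of the recurrence in a simple RNN is given below. It assumes the common Elman-style update h_t = tanh(W_x x_t + W_h h_{t-1} + b), which this chapter does not write out; the weights and the function name are hypothetical.

```python
import math

def rnn_forward(inputs, w_x, w_h, b, h0):
    """Run h_t = tanh(W_x x_t + W_h h_{t-1} + b) over a sequence of input vectors."""
    h = h0
    states = []
    for x in inputs:
        # The comprehension reads the previous state h before h is reassigned.
        h = [math.tanh(sum(wx * xi for wx, xi in zip(w_x[i], x))
                       + sum(wh * hj for wh, hj in zip(w_h[i], h))
                       + b[i])
             for i in range(len(h0))]
        states.append(h)
    return states

# Hypothetical weights: 1-d inputs, 2-d hidden state.
w_x = [[1.0], [0.5]]
w_h = [[0.5, 0.0], [0.0, 0.5]]
b = [0.0, 0.0]
h0 = [0.0, 0.0]

final_ab = rnn_forward([[1.0], [0.0]], w_x, w_h, b, h0)[-1]
final_ba = rnn_forward([[0.0], [1.0]], w_x, w_h, b, h0)[-1]
```

The two runs see the same inputs in a different order and end in different hidden states, which illustrates the sensitivity to sequential ordering. Because each step multiplies the previous state by `w_h` and squashes it with tanh, the influence of early inputs shrinks step by step, which is the intuition behind the vanishing gradient problem.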

2.3.2 Long Short Term Memory

LSTM was the first architecture to introduce a gating mechanism to solve the vanishing
gradient problem, and it is one of the most successful types of RNN architecture.
Feedback connections distinguish LSTM models from standard feedforward
neural networks and allow them to process not only single data points but also entire
sequences of data. A classic LSTM unit consists of a memory cell
and three regulating gates, namely an input gate, a forget gate and an output gate.
The memory cell keeps track of the dependencies between the elements of the
input sequence. The input gate controls the extent to which
a new value flows into the memory cell. The forget gate controls
the extent to which an existing value remains in the memory cell.
And the output gate controls the extent to which the value in the memory
cell is used to compute the output activation of the unit. The activation function of the
gates is often the logistic function.
In other words, LSTM splits the state vector into two parts: memory
cells, which preserve memory and error gradients across time, and working
memory. The memory cells are controlled by smooth mathematical functions that
simulate logical gates. At each input state,
a gate decides how much of the new input should be
written to the memory cell and how much of the current
content of the memory cell should be forgotten.
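
The interplay of the three gates can be sketched with one step of an LSTM unit. The snippet assumes the widely used formulation without peephole connections, which this chapter does not write out; the parameter layout, helper names and hand-set weights are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, U, b, x, h):
    """Compute W x + U h + b for small list-based matrices."""
    return [sum(w * xi for w, xi in zip(W[i], x))
            + sum(u * hj for u, hj in zip(U[i], h)) + b[i]
            for i in range(len(b))]

def lstm_step(x, h_prev, c_prev, p):
    """One LSTM step; p maps 'i', 'f', 'o', 'g' to (W, U, b) triples."""
    i = [sigmoid(z) for z in affine(*p["i"], x, h_prev)]    # input gate
    f = [sigmoid(z) for z in affine(*p["f"], x, h_prev)]    # forget gate
    o = [sigmoid(z) for z in affine(*p["o"], x, h_prev)]    # output gate
    g = [math.tanh(z) for z in affine(*p["g"], x, h_prev)]  # candidate value
    # Memory cell: keep part of the old content, write part of the candidate.
    c = [f_k * c_k + i_k * g_k for f_k, c_k, i_k, g_k in zip(f, c_prev, i, g)]
    # Working memory / output: gated view of the memory cell.
    h = [o_k * math.tanh(c_k) for o_k, c_k in zip(o, c)]
    return h, c

one, zero = [[1.0]], [[0.0]]
# Hand-set biases: forget gate wide open, input gate shut, so the cell is preserved.
remember = {"i": (zero, zero, [-20.0]), "f": (zero, zero, [20.0]),
            "o": (zero, zero, [20.0]), "g": (one, one, [0.0])}
# The opposite setting: old content is dropped and the new candidate is written.
overwrite = dict(remember, i=(zero, zero, [20.0]), f=(zero, zero, [-20.0]))

h_keep, c_keep = lstm_step([1.0], [0.0], [0.7], remember)
h_new, c_new = lstm_step([1.0], [0.0], [0.7], overwrite)
```

In the first call the cell value 0.7 passes through almost unchanged, and in the second it is replaced by the candidate computed from the new input. A trained LSTM learns gate weights that move between these two extremes depending on the input.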

2.3.3 Gated Recurrent Unit (GRU)

To address vanishing gradients and long-range dependencies, Hochreiter and
Schmidhuber [11] proposed LSTM, and Cho et al. [10] later proposed GRU, as gating-based
architectures. The GRU architecture uses two gate vectors, an update gate and a reset gate,
to decide which information should be passed on to the output. Like LSTM, a GRU
can be trained to keep information from long ago, without washing it out
over time, and to discard information that is irrelevant
to future predictions. Unlike LSTM, it does not involve a separate memory component and
it has fewer gates. In fact, the gated recurrent unit
is similar to a long short-term memory unit with a forget gate.
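
One GRU step can be sketched in the same style. The snippet follows the convention of Cho et al., in which an update gate close to 1 keeps the previous state; other libraries flip this convention. The parameter layout, helper names and hand-set weights are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, U, b, x, h):
    """Compute W x + U h + b for small list-based matrices."""
    return [sum(w * xi for w, xi in zip(W[i], x))
            + sum(u * hj for u, hj in zip(U[i], h)) + b[i]
            for i in range(len(b))]

def gru_step(x, h_prev, p):
    """One GRU step; p maps 'z', 'r', 'h' to (W, U, b) triples."""
    z = [sigmoid(a) for a in affine(*p["z"], x, h_prev)]   # update gate
    r = [sigmoid(a) for a in affine(*p["r"], x, h_prev)]   # reset gate
    gated = [r_k * h_k for r_k, h_k in zip(r, h_prev)]     # reset old state
    cand = [math.tanh(a) for a in affine(*p["h"], x, gated)]
    # No separate memory cell: the hidden state itself is interpolated.
    return [z_k * h_k + (1.0 - z_k) * c_k
            for z_k, h_k, c_k in zip(z, h_prev, cand)]

one, zero = [[1.0]], [[0.0]]
keep = {"z": (zero, zero, [20.0]), "r": (zero, zero, [0.0]), "h": (one, one, [0.0])}
overwrite = dict(keep, z=(zero, zero, [-20.0]))

h_kept = gru_step([1.0], [0.5], keep)       # update gate ~1: state is preserved
h_new = gru_step([1.0], [0.5], overwrite)   # update gate ~0: candidate replaces it
```

Compared with the LSTM step, the GRU has two gates instead of three and keeps a single state vector, which matches the description above.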

2.4 Transformer-Based Neural Networks

Bidirectional Encoder Representations from Transformers (BERT) was proposed by
Google in late 2018. Upon release, it attracted wide attention from academia and
industry and prompted much further research. BERT is built from word embeddings
and the Transformer. The word embeddings in BERT are simply
low-dimensional vector representations onto which words are projected from a
high-dimensional space. Compared to conventional sequential models such as RNN, LSTM
and GRU, the Transformer, presented by Google, is a neural network architecture that is
more effective at modeling long-term dependencies between tokens in a sequence.
It is also more efficient to train, because it removes the sequential dependency on
previous tokens. Instead of feeding in results sequentially, the Transformer adopts
an encoder-decoder architecture in which an attention mechanism
passes a view of the whole sequence to the decoder.
BERT incorporates the features above but uses the encoder only.
Google developed two versions of the BERT model, namely $\mathrm{BERT}_{\mathrm{Base}}$ and
$\mathrm{BERT}_{\mathrm{Large}}$. $\mathrm{BERT}_{\mathrm{Base}}$ consists of 12 Transformer blocks,
a hidden size of 768 and 12 attention heads, while $\mathrm{BERT}_{\mathrm{Large}}$ is a much
bigger model consisting of 24 Transformer blocks, a hidden size of 1024 and 16 attention
heads [12].
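
The attention mechanism at the core of the Transformer can be sketched as scaled dot-product self-attention over a short sequence. The snippet assumes a single head and omits the learned projection matrices, positional encodings and multi-head structure of the full model; the toy vectors and function names are hypothetical.

```python
import math

def softmax(xs):
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(q . k / sqrt(d_k)) weighted sum of values."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[d] for w, v in zip(weights, values))
                        for d in range(len(values[0]))])
    return outputs

# Three hypothetical 2-d token vectors used as queries, keys and values.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

contextual = attention(tokens, tokens, tokens)
uniform = attention([[0.0, 0.0]], tokens, tokens)[0]  # zero query: equal weights
```

Every output position is computed from all positions of the sequence at once, without stepping through the tokens in order, which is what removes the sequential dependency discussed above.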
GPT-2, proposed by OpenAI, stole BERT's thunder shortly afterwards.
The model is a large Transformer-based language model with 1.5 billion parameters,
trained on a dataset of 8 million web pages [13]. It is trained with a simple objective:
predicting the next word, given all previous words, within 40 GB of Internet
text. It generates astonishing and promising results that have
advanced NLP research further. In particular, it demonstrates an unprecedented
capability for generating synthetic text samples, and it
generally outperforms language models trained on specific domains
without being trained on those domain-specific datasets. The developers decided not to
release the data or the parameters of the biggest model, so it is not elaborated
further here.
It is worth noting that, compared to RNN and LSTM models, Transformer-based
neural networks (NN) are more hardware friendly. A known problem with RNN and
LSTM models is that they are difficult to train, because their computation is
bound by memory bandwidth. This is a headache for many hardware designers, since
training the network takes up a lot of cloud resources, which do not scale easily.
Thus, the applicability of these solutions is limited. For example, an LSTM requires
four linear layers per cell, which must be computed at every time step of the
sequence and take up a great amount of memory bandwidth. By contrast,
Transformer-based approaches avoid this step-by-step recurrence, so the whole
sequence can be processed in parallel, and the generated results can be even better.

2.5 Generative Adversarial Network

Generative Adversarial Network (GAN) was proposed by Goodfellow et al. in 2014.
It is a class of deep neural network architectures in which two neural networks
compete with each other, one acting as a generator and the other as a
discriminator. Specifically, the generator is in charge of generating data of
various kinds, including text, images, music and speech, that look close enough to
the original training set, while the discriminator identifies whether the input is
authentic training data or made-up data. The potential of GAN models is huge, as
they are capable of learning to mimic any type and distribution of data, including
EHR data in the medical domain. When a GAN is applied to EHR systems, the generator
creates new, synthetic patient records and passes them on to the discriminator. The
mission of the generator is to produce fake clinical data that look authentic enough
that the discriminator cannot catch them. The discriminator's goal, in turn, is to
take in both real and fake medical data and return one or more probability values
between 0 and 1 (0 meaning fake and 1 meaning authentic) representing how likely
the given data are to be real. Many recent papers use GAN as their architecture for
EHR studies, which will be further discussed in the next section.
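The two competing goals can be written as two loss functions over the
discriminator's output probabilities. The sketch below is a minimal illustration
under our own assumptions: the function names are hypothetical, the probabilities
are assumed to lie strictly between 0 and 1, and the generator loss shown is the
commonly used non-saturating variant rather than a form taken from any of the cited
EHR papers.

```python
import math


def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy: real records are labelled 1, synthetic ones 0.

    d_real / d_fake are the discriminator's probabilities for real and
    generated records, each assumed to be strictly between 0 and 1.
    """
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake):
    """Low when the discriminator scores generated records close to 1."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)


# A confident, correct discriminator has a lower loss than an unsure one.
print(discriminator_loss([0.9, 0.8], [0.1, 0.2]))
print(discriminator_loss([0.6, 0.5], [0.4, 0.5]))
```

Training alternates between lowering the first loss by updating the discriminator
and lowering the second by updating the generator.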

3 Major Applications of Deep Learning in Medical Information Processing

In the past few years, we have seen a rising trend of applying deep learning-based NLP
techniques to medical information processing. The current goal of EHR management
is to increase clinical efficiency by empowering physicians with better and more
user-friendly EHR systems. Moreover, such systems have to help lower the cost of
clinical diagnosis and minimize the possibility of misdiagnosis. Over the same
period, NN-based representation learning has achieved promising results in many
fields, and many natural language processing applications of representation
learning have been developed. Hinton [14] introduced distributed representations
for symbolic data in 1986. The idea is to form a word embedding layer by learning a
distributed representation for each word in the given text. Subsequently, Bengio
et al. [15] developed the idea in the context of statistical language modeling,
under the name neural net language models [16]. How good a learnt representation
is normally depends on how expressively it captures the features of the huge number
of possible inputs [17]. In 2006, Hinton [18] initiated a breakthrough in
representation learning with greedy layerwise unsupervised pretraining, and many
other scholars quickly followed the same track [19–23]. The proposed method uses
unsupervised feature learning to learn each level of features separately, with each
level building on the output of the previous layer. Specifically, by stacking the
learnt layers one on top of another, the model builds a deep neural network whose
representations are learnt in an unsupervised way. A final deep supervised
predictor is then generated directly from the raw data inputs.
In order to study the current techniques that serve the purposes stated above, we
classify the existing deep learning-based NLP techniques into three major groups:
representation learning, information extraction and clinical prediction. These
three are considered to be the key technologies and applications adopted in
current EHR systems.

3.1 Representation Learning (RL)

The performance of deep learning models depends primarily on data representations
and feature selection. Representation learning, also known as feature learning,
refers to a group of techniques that allow systems or models to learn
representations of the raw data inputs for feature extraction, and to build
predictors or classifiers. Representation learning tasks can be supervised or
unsupervised. Supervised tasks involve feature learning with labeled inputs,
whereas unsupervised tasks use unlabeled input data. In general, as studies in both
academia and industry have grown rapidly, representation learning has benefited
from new discoveries and achieved broad empirical success.
Generally, there are three different ways of using learnt word embeddings in the
existing literature. First, some scholars train the entire model directly as a
supervised task, end to end, starting from a randomly initialized embedding matrix.
This method is easy to adopt; however, it skips the word embedding learning process
completely and can therefore cause problems such as overfitting. Second, some
scholars use part of their data to learn word embeddings and freeze them while
training the rest of the model. Third, most of the research conducted in the past
few years uses the learnt word embeddings and trains them together with the rest of
the model from end to end. As deep learning approaches perform better with larger
amounts of raw input data, and most recent studies fall into this category, we
focus on the literature of the third approach only in this chapter.
In the medical domain, representation learning usually involves learning from a list
of medical codes or notes that serve administrative purposes or diagnosis and
medication needs in a patient's EHR record. Unlike sentences, which contain an
ordered sequence of words, the medical codes in a patient's profile have no
inherent order. In order to use these codes and notes as inputs to machine learning
models, representation learning is adopted to turn them into meaningful
representations. Skip-gram, GloVe, CBOW, stacked autoencoders and BERT are commonly
used NLP techniques for learning distributed embeddings nowadays. In this section,
the trending deep learning-based NLP methods are discussed in the following three
subcategories: representation learning for medical concepts, representation
learning for patients, and representation learning for clinical abbreviation
disambiguation and expansion.

3.1.1 Medical Concept

Both medical codes and clinical notes contain plenty of valuable information for
physicians to make medical predictions and decisions. In a regular patient's EHR
profile, unstructured data take up a considerable proportion of the file. Doctors,
nurses, physiotherapists and pharmacists are each in charge of one section of the
general profile and fill in the relevant information in unstructured format, known
as free text. For patients, the difficulty in approaching these notes lies in
understanding the medical jargon and medical instructions. For researchers, these
free texts are valuable information for producing effective clinical predictions,
but they are also difficult to process. In reality, due to the different structures
of EHR systems across institutions, as well as the wide variety of medical jargon
used by different healthcare providers, extracting useful information from the
large pool of clinical notes remains an unsolved problem.
The heterogeneous nature of medical data elements and the high volume of
unstructured data make clinical care and medical analytics studies difficult. Most
of the existing literature learns features and representations by applying ontology
mappings, or by exploiting information directly from the raw data inputs, for
example medical notes or codes. Although higher-level medical features such as
disease phenotypes can reduce the feature space to some extent, they may still fail
to capture the meaningful information embedded in patient data across the entire
EHR system.
Medical concept learning from patients' medical notes is a dominant research
subfield. It is widely recognized that many existing approaches to concept
representation in the medical domain still face data inefficiency challenges. They
still depend heavily on hand-crafted features and extensive domain knowledge that
are difficult to define. To solve the problem, Choi et al. [24] take advantage of
the relationships encoded among medical codes, which are inherently multilevel in
EHR systems, to construct their novel approach. Specifically, they propose a
Multilevel Medical Embedding (MiME) architecture to learn embeddings of the EHR
data at multiple levels, and to make clinical predictions based on the inherent EHR
structure without the help of external labels. The prediction function is evaluated
on two separate tasks, namely the prediction of heart failure and sequential
disease prediction. The proposed MiME consistently outperforms all other baseline
models by a significant margin.
Escudie et al. [25] demonstrate a way of learning low-dimensional representations
of patient visits with a deep neural network, in order to predict International
Classification of Diseases (ICD) diagnosis categories when these codes are not
provided. The deep neural network adopted in this paper takes both
structured/semi-structured data and unstructured free-text notes in MIMIC-III as
inputs. The learnt representations are pertinent to the medical domain and can be
used directly as inputs to DL or ML algorithms for future prediction and prevention
of changes in a patient's health status.
Choi et al. [26] proposed a different data-driven approach that leverages EHR data
directly for medical concept learning. Specifically, the method maps similar
medical concepts to vectors that lie close to each other, based on temporal
co-occurrence relationships in the raw data inputs. Furthermore, it is capable of
transforming heterogeneous patient data in an EHR system into clinically meaningful
features. Patient vectors are constructed at the same time from the related
clinical concept vectors, so the method generates patient representations by
learning representations of medical concepts. In their paper [26], the authors
present a method based on Skip-gram [4, 7] to learn multi-dimensional real-valued
vectors that capture the latent relationships between diagnoses, medications and
procedures.
De Vine et al. [27] utilize UMLS concepts to learn representations from free-text
patient records and journal abstracts. Rather than learning representations
directly from terms in free text, they propose a variation of neural language
modeling that learns concepts from structured ontologies and extracts information
from free text by preprocessing the medical texts to map words to medical concepts
in the UMLS. The Skip-gram model is then adopted to learn representations of these
medical concepts. The empirical findings suggest that the proposed model correlates
more strongly with expert judgements of semantic similarity than existing
benchmarks in the medical domain.
In 2016, Choi et al. [28] also demonstrated how to learn low-dimensional
representations of medical concepts using neural language modeling. The novelty of
this method is that it learns representations not only from texts, but also from
abundant claims data. Besides the most direct way of learning medical concept
embeddings from medical journals (MCEMJ), Choi et al. introduced two novel sets of
medical concept embeddings: one for temporal data, learnt from medical claims, and
one learnt from a graph of medicine constructed from word co-occurrences in medical
corpora. The first set of embeddings, MCEMJ, is the one introduced in [27]. The
second set is learnt from the medical claims datasets of a private health insurance
company. As the data contain many duplicate codes and multiple events happening in
a short period of time, the authors apply partitioning and random shuffling to the
data before feeding them into the Skip-Gram model. The medical claims data are
first partitioned into intervals; the duplicates within each interval are then
removed and the remaining codes are randomly shuffled into a sequence of concepts.
Finally, each resulting sequence is fed as a sentence into word2vec, which trains
the Skip-Gram model by stochastic gradient descent. The last set of embeddings
comes from an open-source EHR data collection. The authors learn these
representations in two ways: (1) based on the co-occurrence counts, they sample the
graph edges proportionally to the edge weights and feed the resulting word pairs to
the Skip-Gram model; (2) they exploit the fact that Word2Vec implicitly factorizes
the shifted positive pointwise mutual information (SPPMI) matrix of words and
contexts [28].
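The partition, de-duplicate and shuffle preprocessing described above can be
sketched as follows. This is a minimal illustration under our own assumptions, not
the authors' implementation: the function name, the (day, code) input format and
the fixed-length interval are hypothetical choices.

```python
import random


def claims_to_sentences(events, interval_days, seed=0):
    """Turn one patient's claims into shuffled code 'sentences'.

    events: list of (day, code) pairs, where day is an integer offset.
    Returns one list of unique codes per time interval, in random order.
    """
    rng = random.Random(seed)
    buckets = {}
    for day, code in events:
        # Partition by interval; a set drops duplicate codes in the interval.
        buckets.setdefault(day // interval_days, set()).add(code)
    sentences = []
    for key in sorted(buckets):
        codes = sorted(buckets[key])  # fixed order so the seed is reproducible
        rng.shuffle(codes)            # no meaningful order inside an interval
        sentences.append(codes)
    return sentences


events = [(0, "code_A"), (1, "code_A"), (2, "code_B"), (40, "code_C")]
print(claims_to_sentences(events, interval_days=30))
```

Each returned list can then play the role of a sentence for a Skip-Gram
implementation.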
Minnaro-Gimenez et al. [29] apply Skip-gram to medical text collections from
PubMed, Merck Manuals, Medscape and Wikipedia in order to learn representations of
medical terms. However, the experiments show a low hit rate for the word2vec
methods. The authors conclude that the methods are not suitable for tasks that
require high precision, such as retrieving medical concepts from a restricted
medical text collection.
Liu et al. [30] propose a multi-task framework for disease prediction that
integrates structured information, such as medical codes, with information-rich
free-text medical notes. The proposed model is flexible enough to use both
structured data with numerical values and unstructured free text to generate vector
representations of words or texts. In their paper, they evaluate the deep learning
methods CNN, LSTM and hierarchical models on their performance in processing
clinical notes in EHR systems. They also propose a novel approach that takes
negations in free text into account for clinical predictions. The results suggest
that their approach not only predicts disease effectively within a prediction
window, but also requires no disease-specific feature engineering, which affirms
the benefit of deep learning approaches.
Besides learning medical concepts from patients' medical notes, the medical codes
within text-based patient encounters are also useful for concept representation.
Medical codes are strings of numbers and/or characters used by health providers to
symbolize or describe diagnoses, disease types, treatments given, bills and costs,
medicines applied, and so on. A patient usually receives his/her own EHR report
with a list of demographic codes serving each hospital's administrative purposes,
medical jargon with medical codes, and lab test values. The most common medical
codes include but are not limited to CPT Codes (Current Procedural Terminology),
HCPCS Codes (Healthcare Common Procedure Coding System), ICD Codes (International
Classification of Diseases), ICF Codes for Disabilities, DRG Codes
(Diagnosis-Related Groups), NDC Codes (National Drug Codes), CDT Codes (Code on
Dental Procedures and Nomenclature), and DSM-IV-TR Codes for Psychiatric Illnesses.
However, medical codes that are common knowledge for health providers are difficult
for the public to understand. Hence, researchers and scholars need to feed these
codes into models both to generate information the public can understand and to
produce credible clinical predictions.
In general, there are two approaches to supporting physicians' clinical decisions
with medical codes extracted from patients' medical profiles. The more
straightforward approach is static: it predicts the medical outcome by feeding the
model a single set of inputs once. For example, Choi et al. [26] propose to feed
the EHR data directly into models that learn heterogeneous concept and patient
representations based on co-occurrence patterns. This method of learning medical
concept and patient representations uses a single set of inputs to predict possible
heart failure (HF). It also links up relevant concepts and boosts the performance
of predictive modeling.
The more complicated approach is dynamic: it predicts the medical outcome by
feeding the model a sequence of inputs. Such models can produce clinical decisions
after each raw EHR input is fed in, or after the entire sequence of EHR data points
has been learnt. For example, Choi et al. [31] leveraged a large EHR dataset to
develop a temporal predictive model, Doctor AI, which learns from observed medical
conditions and medication uses and will be discussed further in Sect. 3.3.
In 2016, Choi et al. [32] approached this issue by proposing an algorithm named
Med2Vec and structuring a dataset that consists of patient visit records, diagnosis
codes (ICD9), lab test results (LOINC) and drug usage (NDC). Since Skip-Gram
predicts the context and captures the relationships between words by learning word
representation vectors, the medical codes used in the study have to be converted
into (target, context) pairs. They therefore define the (target, context) pairs at
the level of each patient's profile, instead of at the level of the sequence of
medical codes. By doing so, Choi et al. aim to learn medical concept
representations effectively and efficiently. In addition, the authors are able to
predict a patient's neighboring visits by representing his/her medical records as
binary vectors and feeding them into a two-layer neural network.
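The two ingredients mentioned above, binary visit vectors and neighboring-visit
pairs, can be sketched as follows. This is a minimal illustration with hypothetical
function names and placeholder codes; the window size and the pair format are our
assumptions rather than details taken from [32].

```python
def multi_hot(visit_codes, vocabulary):
    """Represent one visit as a binary vector over the code vocabulary."""
    index = {code: i for i, code in enumerate(vocabulary)}
    vector = [0] * len(vocabulary)
    for code in visit_codes:
        vector[index[code]] = 1
    return vector


def neighbouring_visit_pairs(visits, window=1):
    """List (target, context) index pairs of visits within the window."""
    pairs = []
    for t in range(len(visits)):
        for u in range(max(0, t - window), min(len(visits), t + window + 1)):
            if u != t:
                pairs.append((t, u))
    return pairs


vocabulary = ["dx_A", "lab_B", "rx_C"]  # placeholder codes, not real ones
visits = [["dx_A"], ["dx_A", "rx_C"], ["lab_B"]]
print([multi_hot(v, vocabulary) for v in visits])
print(neighbouring_visit_pairs(visits, window=1))
```

A predictive model would take the binary vector of the target visit as input and
be trained to predict the binary vectors of the paired context visits.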
Another popular representation learning technique applied to medical concept and
event extraction is bag of words (BOW). Li et al. [33] propose an embedding
learning method that incorporates the distributional characteristics of words into
medical event extraction. Their model uses BOW features as the baseline, and the
results obtained with word embedding features are promising, since the n-gram
effectively enriches the context information.
Tang et al. [34] apply feature learning procedures such as bag-of-words (BOW) and
part-of-speech (POS), which differ from the GloVe and Skip-Gram approaches. In
their experiment, Tang et al. adopt a neural language model to generate word
embedding vectors from a biomedical corpus, and the experimental results are
slightly better than those of existing works.
Gong et al. [23] extend the BOW representation learning method to a bag of events
(BOE) in their study. The BOE counts the events that occur in the first 24 h of a
patient's stay. In their paper, the authors aim to map database-specific
representations to a shared list of medical concepts, so that the model can be
transferred across databases. Meanwhile, the Item ID feature is constructed as a
new identifier consisting of each patient's unique (ID, text value) pair. Lastly,
the representations are converted to UMLS concepts by a frequently used tool for
identifying UMLS concepts.
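As a minimal sketch of the bag-of-events idea, the following counts how often each
event occurs within the first 24 h. The function name, the (hour, event_name) input
format and the example event names are our own assumptions for illustration.

```python
from collections import Counter


def bag_of_events(events, admit_hour=0.0, window_hours=24.0):
    """Count events per type inside [admit_hour, admit_hour + window_hours).

    events: iterable of (hour, event_name) pairs for one hospital stay.
    """
    return Counter(name for hour, name in events
                   if admit_hour <= hour < admit_hour + window_hours)


stay = [(1.0, "heart_rate"), (5.0, "heart_rate"),
        (23.5, "lactate"), (30.0, "heart_rate")]
print(bag_of_events(stay))
```

The event at hour 30 falls outside the window and is therefore not counted.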
Whether medical codes are fed in as a single set of inputs or as a sequence of
inputs, they are, like medical notes, a common source of data for medical concept
representation and for the final clinical decision-making process. However, since
medical decision-making is complicated, researchers and scholars should never rely
on a single type of data as input when generating clinical predictions.

3.1.2 Patient Representation

The purpose of learning patient representations is to map the raw information
contained in a patient's medical notes to a dense vector that can be used for
future clinical predictions and analytics, such as disease phenotyping or
clustering tasks.
Miotto et al. [35] present a study based on EHR datasets that uses unsupervised
deep representation learning to support clinical predictions and to derive a
general-purpose patient representation. The study used data from over 700,000
patients in the Mount Sinai data warehouse and designed a three-layer stack of
denoising autoencoders, named "deep patient", to capture the hierarchical
regularities and dependencies in the EHR data. The experimental results show that
the proposed design performs significantly better than representations generated
directly from raw EHR data inputs. In particular, Miotto et al. use a multi-layer
deep neural network to discover patient representations. Each layer of the DNN
serves the next by producing higher-level representations of the patterns observed
in the output of the previous layer. As each layer generates a more abstract
feature than the layer before, the last layer of the network outputs the final
patient representation by consolidating everything learnt in the previous layers.
To continue the discussion of the work of Choi et al. [26] in Sect. 3.1.1, because
of the properties of Skip-Gram [4], the word vectors here support word analogy
calculations that capture both the syntactic and the semantic regularities of the
corpus. Thus, the patient representation is generated directly by converting all
medical concepts in his/her profile to medical concept vectors and summing these
vectors into a single representation vector.
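The summation step can be sketched in a few lines. The function name, the
placeholder codes and the toy two-dimensional vectors are hypothetical; in
practice the concept vectors would come from a trained Skip-Gram model.

```python
def patient_vector(concept_codes, concept_vectors):
    """Sum the vectors of all concepts found in one patient's profile."""
    dim = len(next(iter(concept_vectors.values())))
    total = [0.0] * dim
    for code in concept_codes:
        vec = concept_vectors[code]
        for i in range(dim):
            total[i] += vec[i]
    return total


# Toy concept vectors; a concept that occurs twice is counted twice.
concept_vectors = {"dx_A": [1.0, 0.0], "rx_C": [0.5, 2.0]}
print(patient_vector(["dx_A", "rx_C", "dx_A"], concept_vectors))
```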
Dligach et al. [36] consider an alternative way of learning patient representations
that uses text variables only, with a deep neural network. In their proposed work,
Dligach et al. first use billing codes, for example ICD-9 or CPT, as a source of
supervision to learn patient vectors. They then train the proposed model on a set
of UMLS concept unique identifiers (CUIs) generated from the clinical notes in the
patient's profile to predict all billing codes that might be associated with the
patient. The experimental results show that the representations learnt with the new
method are good enough to match existing performance on standard comorbidity
detection tasks.
Zhang et al. [37] propose a computational framework named Patient2Vec to learn
patient representations while overcoming the interpretability problem of deep
learning architectures. It learns a personalized deep representation of each
patient's longitudinal EHR data. To evaluate the proposed method, they use it to
predict future hospitalizations with EHR data from real hospitals, and compare its
performance on clinical predictions with that of other baseline models. The
proposed architecture consists of five parts: learning vector representations of
medical codes with Skip-Gram; learning within-subsequence self-attention with a
one-side convolution operation with a filter and a nonlinear activation function;
learning subsequence-level self-attention with a bidirectional GRU-based RNN;
constructing the aggregated deep representation by adding patient characteristics
such as demographic information and static medical conditions; and predicting the
outcome with a linear layer and a softmax layer. The Patient2Vec model is able to
produce meaningful structures in the vector space and to outperform baseline models
by a significant margin.
Denaxas et al. [38] propose a method of learning word embeddings for disease
diagnoses and medical procedures using global vectors (GloVe), based on the
national UK EHR system. They use the learnt patient representations to identify
patients who are more likely to be hospitalized due to congestive heart failure.
Specifically, they apply the GloVe model to four different corpora that they
created themselves to learn word embeddings for concepts. After that, they
evaluate the learnt and normalized patient-level embeddings by framing the
prediction of heart failure onset as a supervised binary classification task with
linear SVM classifiers. The experimental results show marginally improved
performance on clinical predictions compared to conventional one-hot models. The
approach can thus potentially help build robust EHR-based disease risk prediction
models in the near future.
Wei et al. [39] propose an end-to-end clinical decision support system that can
generate and retrieve relevant information and literature for target patients with
distant supervision. The experiment trains GloVe on Wikipedia texts and Word2Vec on
biomedical texts in order to train a model for ICD code prediction from raw text
inputs. All the raw input data are drawn from the MIMIC-III data collection. Then,
the Deep Relevance Matching Model (DRMM) is adopted as a semantic matching model to
learn the terms. After that, the user's query and the candidate documents are split
into paragraphs, and the word embeddings are replaced with paragraph-level
convolution embeddings. Lastly, cosine similarity is computed as a direct semantic
similarity score to produce the final results. Their experiment shows promising
results, with substantial improvement in information retrieval tasks.
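The final scoring step can be illustrated with a short sketch of cosine
similarity and ranking. The function names and the toy two-dimensional vectors are
our own; in the system described above the vectors would be learnt paragraph-level
embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return document ids ordered from most to least similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), doc_id)
              for doc_id, vec in doc_vecs.items()]
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]


docs = {"doc_a": [1.0, 0.0], "doc_b": [0.0, 1.0], "doc_c": [1.0, 1.0]}
print(rank_documents([1.0, 0.1], docs))
```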
Zhu et al. [40] introduce both supervised and unsupervised methods for evaluating
patient similarity that also match the temporal properties of patients'
longitudinal data in the EHR system. The authors define the context of a medical
event by the medical events that happen before and after it, and use fixed-length
vector representations to express medical concept embeddings and to make further
predictions. Given the embedded matrix of patient representations, the supervised
and unsupervised methods are then used to measure similarity. Specifically, the
supervised approach adopts a CNN architecture to learn an optimal representation of
a patient's medical record in the EHR system, mapping the outputs of the
convolutional filters to a fixed-length feature vector, whereas the unsupervised
approach applies the RV and dCov coefficients to capture the linear and nonlinear
relationships between patients, respectively. In the experiments on test data, the
proposed methods outperform the baseline models significantly, and the results
suggest possibilities for future study in the same direction.
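For readers unfamiliar with the RV coefficient, the following is a minimal sketch
of its standard definition for two data matrices with the same number of rows. It
is not taken from [40]: the function names are ours, and how patient records are
arranged into matrices is an assumption left to the caller. Both matrices must
contain some variation, otherwise the denominator is zero.

```python
import math


def center_columns(m):
    """Subtract each column's mean so that every column averages to zero."""
    n = len(m)
    means = [sum(row[j] for row in m) / n for j in range(len(m[0]))]
    return [[row[j] - means[j] for j in range(len(row))] for row in m]


def gram(m):
    """Row-by-row inner products, i.e. the n x n matrix M M^T."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in m] for r1 in m]


def trace_of_product(a, b):
    """tr(A B) for two square matrices of the same size."""
    n = len(a)
    return sum(a[i][j] * b[j][i] for i in range(n) for j in range(n))


def rv_coefficient(x, y):
    """RV = tr(Gx Gy) / sqrt(tr(Gx Gx) * tr(Gy Gy)), a value in [0, 1]."""
    gx, gy = gram(center_columns(x)), gram(center_columns(y))
    return trace_of_product(gx, gy) / math.sqrt(
        trace_of_product(gx, gx) * trace_of_product(gy, gy))


x = [[1.0, 2.0], [3.0, 1.0], [0.0, 5.0]]
y = [[2.0 * v for v in row] for row in x]  # same structure, rescaled
print(rv_coefficient(x, y))
```

The coefficient is unchanged by rescaling, which is why it measures the linear
relationship between the two matrices rather than their magnitudes.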
Liu et al. [41] tackle the problem of medical event and patient representation
differently, by distinguishing long time-scale medical events with strong temporal
patterns from short time-scale medical events with disordered co-occurrences. To
accommodate clinical events that happen on different time scales, Liu et al.
propose a model that learns hierarchical representations of the event sequence,
which adapt to events of different time ranges and capture the core temporal
dependencies. In detail, the proposed model first splits the entire sequence of
medical events into several groups of events with an adaptive, RNN-based event
sequence segmentation module. Second, the model learns hierarchical representations
of the event sequences with two different mechanisms, namely event attention with a
GRU-based event group aggregation function, and temporal attention with a GRU-based
sequence representation. The experimental results outperform most of the existing
models and suggest promising prediction performance for deaths and ICU admissions.

3.1.3 Clinical Abbreviation Representation

Abbreviations appear frequently in EHR systems. Figures from a popular online
knowledge base show that, among its 3,096,346 stored abbreviations, 197,787 records
are in the medical domain, the highest count among all ten domains
(www.allacronyms.com) [42]. Disambiguation of clinical abbreviations is a special
case of word sense disambiguation. The disambiguation and expansion of clinical
abbreviations in medical texts have been important tasks in medical research. Since
there is no universal dictionary or set of rules for recording clinical
abbreviations, healthcare providers such as doctors, pharmacists and nurses tend to
use their own abbreviations to denote certain diagnoses, treatments or medications
in a patient's medical profile. Thus, studying ambiguous abbreviations becomes one
of the most difficult tasks in EHR systems, especially in the intensive care unit
(ICU), where medical notes are taken under a high workload and limited time.
Therefore, a deeper understanding of the abbreviations in clinical notes would not
only help medical researchers understand diseases better, but also enhance the
quality of healthcare services more effectively and efficiently.
In [42], Liu et al. learn word embeddings for clinical abbreviation expansion by
exploiting task-oriented resources. They explore the domain for two purposes:
(1) to effectively reduce misinterpretation of clinical abbreviations by
normalizing all abbreviations used in ICU documentation; (2) to allow the public to
better understand the abbreviations in medical free text. Specifically, based on
the intuition introduced by Harris in 1954 [2], Liu et al. use word embeddings, or
distributional semantic representations, to learn the meaning of an abbreviation in
the given medical context without labelled input data. They first use Word2Vec [3]
to learn word embeddings. After that, they follow the traditional approach of using
regular expressions to detect all medical abbreviations in the ICU notes, and the
possible candidate expansions are then generated from a domain-specific knowledge
base [42]. Since an expansion is a multi-word phrase in most cases, they define
each candidate Ci as the group of words in that candidate, and then compute its
similarity to the abbreviation. Although Liu et al. did not apply deep learning in
their experiment, their method still significantly outperforms all baseline methods
and achieves 82.27% accuracy.
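The pipeline described above (detect abbreviations with a regular expression,
represent each multi-word candidate, rank by similarity) can be sketched as
follows. This is a minimal illustration, not the method of [42]: the regular
expression, the averaging of word vectors to represent a candidate phrase, the
function names and the toy embeddings are all our own assumptions.

```python
import math
import re

# Assumed pattern: a standalone token of two to five capital letters.
ABBREVIATION = re.compile(r"\b[A-Z]{2,5}\b")


def find_abbreviations(note):
    """Return all tokens in the note that look like abbreviations."""
    return ABBREVIATION.findall(note)


def phrase_vector(phrase, word_vectors):
    """Average the vectors of the known words in a candidate expansion."""
    dim = len(next(iter(word_vectors.values())))
    words = [w for w in phrase.lower().split() if w in word_vectors]
    if not words:
        return [0.0] * dim
    return [sum(word_vectors[w][i] for w in words) / len(words)
            for i in range(dim)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def best_expansion(abbrev_vec, candidates, word_vectors):
    """Pick the candidate whose phrase vector is closest to the abbreviation."""
    return max(candidates,
               key=lambda c: cosine(abbrev_vec, phrase_vector(c, word_vectors)))


# Toy embeddings, invented for illustration only.
word_vectors = {"shortness": [1.0, 0.0], "of": [0.5, 0.5], "breath": [1.0, 0.2],
                "small": [0.0, 1.0], "bowel": [0.1, 1.0]}
print(find_abbreviations("Pt admitted to ICU with SOB"))
print(best_expansion([1.0, 0.1], ["shortness of breath", "small bowel"],
                     word_vectors))
```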
Wu et al. [43] examine the use of neural word embeddings in clinical abbreviation
disambiguation and develop two new word embedding methods, named LR_SBE and MAX_SBE,
to generate word sense disambiguation representations from a large unlabelled medical
corpus. Li et al. [44] proposed the surrounding-based embedding feature (SBE) in 2014,
which serves as the foundation for Wu et al.’s study. The SBE representation of a
target word is learnt by consolidating the embedding row vectors of all neighboring
words that fall within a window of size k. Building on SBE, Wu et al. assume that the
direction of the surrounding words (left or right of the target) would help to learn
better word representations. Similarly, the MAX_SBE representation follows the same
approach but takes the maximum value of each embedding dimension over the surrounding
words. The authors’ intuition is that the higher the score of a latent feature, the
more important the word.
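The three surrounding-based representations can be illustrated with a short sketch. It
assumes that "consolidating" means summing the neighbors' vectors and that LR_SBE
concatenates a left-side sum with a right-side sum; both are assumptions for illustration,
and the function names are hypothetical rather than taken from [43] or [44].

```python
import numpy as np


def neighbours(tokens, position, k):
    """Return the up-to-k words on each side of the target position."""
    left = tokens[max(0, position - k):position]
    right = tokens[position + 1:position + 1 + k]
    return left, right


def total(emb, words):
    """Sum the embeddings of the given words (zeros if there are none)."""
    dim = len(next(iter(emb.values())))
    return np.sum([emb[w] for w in words], axis=0) if words else np.zeros(dim)


def sbe(emb, tokens, position, k):
    """SBE: sum of the embeddings of all words in the window."""
    left, right = neighbours(tokens, position, k)
    return total(emb, left + right)


def lr_sbe(emb, tokens, position, k):
    """LR_SBE (assumed form): keep direction by concatenating left and right sums."""
    left, right = neighbours(tokens, position, k)
    return np.concatenate([total(emb, left), total(emb, right)])


def max_sbe(emb, tokens, position, k):
    """MAX_SBE: element-wise maximum over the neighbouring embeddings.

    Assumes the target has at least one neighbour in the window.
    """
    left, right = neighbours(tokens, position, k)
    return np.max([emb[w] for w in left + right], axis=0)
```

For a target word with neighbors [1, 0], [0, 2] on the left and [3, -1] on the right, SBE
gives [4, 1], MAX_SBE gives [3, 2], and the assumed LR_SBE gives [1, 2, 3, -1].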

3.2 Information Extraction (IE)

Information extraction is a classical and widely used task in NLP. It refers to a set
of techniques for extracting target structured information from unstructured natural
(human) language texts. Although studies of and attention to information extraction
have increased in the past owing to the growing volume of available data, its
development is still confined to narrow and restricted domains because of its
relatively high degree of difficulty.
Indeed, information extraction has always been one of the most important tasks in the
medical domain, especially since the adoption of electronic health record systems. In
healthcare, extracting information from a patient’s EHR profile involves learning and
extracting medical information from doctors’ notes, ambulance records, prescriptions
and so on, to be used in machine learning or deep learning models. This is necessary
not only for health practitioners to learn a patient’s health conditions more
efficiently and thoroughly, but also for researchers, scholars and policy makers in
the healthcare sector to understand diseases and patient groups better in order to
provide better diagnosis, treatment, intervention and even prevention.
In the EHR system, text-based clinical notes are one of the most important and
informative resources for studying patients. However, as mentioned before, the biggest
challenge is that doctors, nurses, physiotherapists and pharmacists normally complete
each section of the same patient’s file separately, without referring to each other’s
notes. On the one hand, it is time consuming and unnecessary for them to flip through
the documents; on the other hand, each health practitioner has his or her own
preferred medical jargon, which is inherently hard to align. These different medical
wordings and representations therefore create difficulties for information alignment
in the patient’s EHR. In other words, the most challenging task in the current EHR
system is to align EHRs that are structurally different. In fact, aligning the
information in medical profiles is already time consuming, let alone creating a
pipeline for processing these EHR notes, extracting terms, and building or aligning
embeddings over years and across hospitals or other health institutions.
Thus, current research needs to determine what entities mean in different contexts. In
general, there are three ways of performing information extraction on medical corpora.
The traditional way is to follow a rule-based approach with manually written rules.
However, this method is both financially costly and time consuming. The second way is
to use traditional machine learning approaches. These methods are more efficient than
the rule-based approach but still involve a certain degree of human involvement, so
they can be costly as well. A more effective and efficient approach, based on deep
learning, has therefore been developed recently. Indeed, many recent studies on medical
information extraction have started using this third approach, as it offers lower cost
and requires little or no human intervention.

3.2.1 Named Entity Recognition

The current approaches to entity recognition are categorized into three major groups:
the classical rule-based approach, which usually applies keyword matching and assigns
document-level labels; the RNN approach, which requires large datasets of annotated
entities; and the transfer learning approach, which uses language modeling for
biomedical named entity recognition (NER). To reduce the reliance on large amounts of
annotated entities as a prerequisite for training, most recent studies adopt the second
and the third approaches for entity recognition.
Gligic et al. [45] introduce a transfer learning approach that overcomes neural network
models’ dependency on large labelled datasets and the data scarcity issue in named
entity recognition. Specifically, neural networks are adopted as a more robust
structure and evaluated on all datasets released by I2B2 (2007–2012). Both CBOW and
Continuous Skip-Gram (CSG) are used to train embeddings that feed three term
classification architectures, namely a context-free feedforward NN, a context-aware
feedforward NN, and an RNN-based LSTM model. The method first extracts information on
medications, dosages, modes, frequencies, durations and reasons individually, and then
studies the relationships between them, comparing a sequence-to-sequence bidirectional
RNN model comprising one hundred GRUs with a bidirectional LSTM encoder-decoder
framework.
Sachan et al. [46] tackle the problem by using unlabelled text data to improve the
results of NER models. Specifically, they train a bidirectional language model (BiLM)
on unannotated PubMed abstracts and, as a transfer learning step, use it to pretrain
the weights of an NER model that shares the BiLM architecture. This pretraining
provides a better parameter initialization for the NER model, improving F1 scores as
well as the speed of convergence with less training data. Hence, the transferred
weights together with the pretrained word embeddings allow the authors to practise
end-to-end learning on the supervised NER tasks.
Gorinski et al. [47] take a different perspective by comparing three systems: (1) the
rule-based EdIE-R; (2) EdIE-N, a bidirectional Long Short-Term Memory network combined
with a conditional random field; and (3) a transfer learning based SemEHR with GATE
Bio-YODIE. They evaluate the three architectures on named entity recognition in brain
imaging reports of stroke patients. By training on a common data set, the experiment
identifies the advantages and disadvantages of the three systems. It also allows rules
to be constructed and the performance of each system to be evaluated empirically. The
authors conclude that, although machine learning approaches can be easier to adopt, the
handcrafted rule-based system remains the most accurate and trustworthy way of labeling
EHR contents automatically.
Other related research on entity extraction from genomic data includes Yin et al.
[48], Huang et al. [49, 50], and An et al. [51]. An et al.’s work [51], built on top
of [48–50], proposes a new metric to evaluate the novelty and relevancy of a medical
term in information retrieval, based on the aspect-level performance measure provided
by the TREC Genomics Track. The experimental results show that the proposed geNov
metric is superior to the existing metrics in discovering novelty, redundancy and
relevancy in the ranking process. Moreover, it is considerably sensitive to the novelty
and relevancy of a medical term, and its three parameters are highly tunable according
to different evaluation requirements.

3.2.2 Relationship Extraction

Both entity recognition and relationship extraction are standard tasks in natural
language processing. In biomedical research, besides named entity recognition, it is
also necessary to extract relationships between biomedical entities from texts. Much of
the existing literature applies feature-based pipeline models to relationship
extraction, which can cause problems such as error propagation, a lack of interaction
between subtasks, and heavy feature engineering work. To overcome these issues, deep
learning based natural language processing techniques are commonly applied.
Li et al. [52] present a competitive and effective neural joint model for relationship
extraction that minimizes the work on feature engineering. The approach first uses a
CNN to encode the characters of each word into a character-level representation. Then
the character-level representations, word embeddings and part-of-speech embeddings are
fed into an RNN-based BiLSTM model to learn entity representations and the related
context for medical entity recognition. After that, a second BiLSTM model learns the
relationship representation along the shortest dependency path (SDP) between the two
target entities to obtain the relationship classifier. The parameters of the LSTM units
in the two BiLSTM models are shared, so the entity recognition and relation
classification tasks influence each other during training. Mehryary et al. [53] also
propose an extraction approach based on an LSTM model over syntactic dependency graphs,
with a Skip-Gram model to obtain word embeddings.
Similar to Li et al.’s work, Quan et al. [54] presented a multichannel CNN for
automatic relation extraction in the medical domain in 2016, tackling the drug-drug
interaction (DDI) and protein-protein interaction (PPI) extraction problems. The
method also eases the complicated feature engineering work through CNN-based automated
feature learning. CBOW was used to capture information from the entire Medline corpus,
while all other word embeddings are borrowed from Pyysalo et al.’s study [55]. As a
next step, five versions of the word embeddings, from PubMed, PMC, MedLine and
Wikipedia, are consolidated within the multichannel word embedding input layer. With
the multichannel word embeddings, the model outperformed the previous best DDI
extraction models by 5.1%. In the convolutional layer, filters of adjustable window
size are applied to the embeddings to extract n-gram features. The max pooling layer
then selects the most important local features while effectively reducing the feature
dimensions. Finally, the softmax layer performs the final relationship classification
based on the consolidated information.
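The convolution, max pooling and softmax stages described above can be sketched in a few
lines of NumPy. The sketch uses a single embedding channel, random weights and a ReLU
non-linearity, all of which are simplifying assumptions; it is not the multichannel model
of [54], and the shapes are chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)


def conv_max_pool(sentence, filters):
    """Slide each filter over n-gram windows and keep its maximum response.

    sentence: (num_words, dim) word-embedding matrix
    filters:  (num_filters, window, dim) convolution filters
    """
    num_filters, window, _ = filters.shape
    num_windows = sentence.shape[0] - window + 1
    responses = np.empty((num_filters, num_windows))
    for i in range(num_windows):
        ngram = sentence[i:i + window]  # (window, dim) slice of the sentence
        # Dot product of every filter with the current n-gram.
        responses[:, i] = np.tensordot(filters, ngram, axes=([1, 2], [0, 1]))
    responses = np.maximum(responses, 0.0)  # ReLU non-linearity (assumed)
    return responses.max(axis=1)            # max-over-time pooling


def softmax(scores):
    """Numerically stable softmax over a 1-D score vector."""
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()


sentence = rng.normal(size=(7, 4))    # 7 words, 4-dimensional embeddings
filters = rng.normal(size=(5, 3, 4))  # 5 filters over trigram windows
weights = rng.normal(size=(2, 5))     # 2 relation classes (e.g. DDI / no DDI)

features = conv_max_pool(sentence, filters)   # one pooled value per filter
probabilities = softmax(weights @ features)   # class probabilities
```

Pooling reduces each filter's variable-length response to a single number, which is why
the classifier input has a fixed size regardless of sentence length.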
Cheng et al. [56] focus on extracting medical information from patients’ EHRs for
disease phenotyping. The authors construct a temporal matrix representation for each
patient in the EHR system, with time on one dimension and events on the other. The
deep learning approach then adopts a four-layer CNN to extract phenotypes and predict
future medical events. Specifically, the first layer of the framework consists of the
temporal EHR matrices. The second layer performs a one-side convolution to extract
features from the first layer. Similar to the works presented above, the max pooling
layer at the third level discards sparse data points and keeps the most important
ones. Lastly, a fully connected layer with a softmax activation function outputs the
predictions.
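As a rough illustration of the first two layers, the sketch below builds a binary
event-by-time matrix for one patient and convolves it along the time axis only. It
assumes that "one-side convolution" means a kernel that spans all event types and slides
over time; the function names and the toy events are hypothetical and not taken from [56].

```python
import numpy as np


def temporal_matrix(events, event_types, num_steps):
    """Build an (event type x time step) binary matrix for one patient.

    events: iterable of (time_step, event_type) pairs
    """
    row = {name: i for i, name in enumerate(event_types)}
    matrix = np.zeros((len(event_types), num_steps), dtype=int)
    for step, name in events:
        matrix[row[name], step] = 1
    return matrix


def one_side_convolution(matrix, kernel):
    """Convolve along the time axis only, with a kernel covering all event rows."""
    width = kernel.shape[1]
    return np.array([
        np.sum(matrix[:, t:t + width] * kernel)
        for t in range(matrix.shape[1] - width + 1)
    ])


events = [(0, "diabetes"), (1, "insulin"), (3, "diabetes")]
matrix = temporal_matrix(events, ["diabetes", "insulin"], num_steps=4)
response = one_side_convolution(matrix, np.ones((2, 2)))
```

Here the matrix is [[1, 0, 0, 1], [0, 1, 0, 0]], and an all-ones kernel of width 2 simply
counts the events in each pair of adjacent time steps, giving [2, 1, 1].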
Zeng et al. [57] study the identification of distant recurrences of breast cancer from
medical notes in the EHR. To overcome the reliance on manual chart review for
discovering distant recurrences of breast cancer, they design a hybrid model that works
only with clinical narratives and structured data from the EHR system. Specifically,
the model first extracts features from medical narratives with MetaMap while retrieving
the structured EHR data from the system directly. Second, a support vector machine with
a linear kernel is adopted as the prediction model to identify patients who potentially
have distant recurrences of breast cancer. Four baseline classifiers are used to learn
the different types of features, both structured and unstructured, and to find the best
results. Overall, the model gives promising results by combining features extracted
from unstructured clinical notes with structured EHR data to identify distant
recurrences of breast cancer.
Galko et al. [58] study relationship extraction on a broad scale by retrieving relevant
passages from publicly available data and BioASQ tasks. In other words, they retrieve
passages in a question answering setting. In detail, they use neural word embeddings to
propose a weighting scheme for cosine distance retrieval. The paper first projects
terms into semantically meaningful vector spaces learnt with Word2Vec or GloVe, so that
both the query questions and the candidate passages are represented as fixed-length
vectors. Terms in the space can then be related to each other with a cosine distance
function. Lastly, with the given representation and similarity measure, the passages
are clustered and ranked in a list to generate the final results. The proposed cosine
distance text matching scheme is reported to outperform traditional models
significantly, and future work in this direction could be applied to a broader range of
topical domains.
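A minimal sketch of weighted-embedding retrieval with cosine similarity is given below. It
assumes that each text is represented by a weighted average of its word vectors and that
the per-word weights play an IDF-like role; the actual weighting scheme of [58] is not
reproduced here, and the function names are hypothetical.

```python
import numpy as np


def embed(text, vectors, weights):
    """Weighted average of the vectors of the known words in a text."""
    words = [w for w in text.lower().split() if w in vectors]
    if not words:
        return np.zeros(len(next(iter(vectors.values()))))
    w = np.array([weights.get(word, 1.0) for word in words])
    stacked = np.array([vectors[word] for word in words])
    return (w[:, None] * stacked).sum(axis=0) / w.sum()


def rank(query, passages, vectors, weights):
    """Sort passages by cosine similarity to the query, most similar first."""
    q = embed(query, vectors, weights)

    def score(passage):
        v = embed(passage, vectors, weights)
        denom = np.linalg.norm(q) * np.linalg.norm(v)
        return float(q @ v / denom) if denom else 0.0

    return sorted(passages, key=score, reverse=True)
```

Down-weighting frequent, uninformative words (such as "treats" in a toy vocabulary) keeps
them from dominating the averaged vector, which is the motivation for a weighted scheme.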
Li et al. [59] also use a convolutional neural network and distributed semantic
representations for binary event relation extraction. Specifically, the study employs
a CNN with a convolutional layer and a max pooling layer to model raw inputs
represented as word embeddings from medical texts. As a result, the most important
features are generated automatically by the max pooling layer and contribute directly
to the relation extraction task.

3.2.3 Medical Event Extraction

In the existing literature on deep learning NLP techniques applied to medical
information extraction, event extraction has an important standing. A medical event
refers to a change that has been made in a patient’s medical records. Such events can
be insightful and useful for discovering abnormal clinical decisions and applications
that could cause serious problems, such as a patient’s negative reactions to certain
treatments and medications.
Rahul et al. [60] apply a bidirectional recurrent neural network (RNN) to sequence
labeling for medical event extraction and the understanding of unstructured clinical
texts in the EHR system. The proposed RNN model avoids the time-consuming handcrafted
features generated by NLP toolkits and extracts higher-level features directly from
the sentences in the text corpus, achieving comparable F1-scores on the Multi-Level
Event Extraction (MLEE) corpus. Starting from embedded representations of words, the
network learns higher-level feature representations layer by layer until it reaches
the final classification. Specifically, in the input feature layer, the method extracts
two types of features for each word in the text corpus, namely the word and the entity.
In the embedding layer, each input feature is mapped to a dense feature vector for the
next layer to use. In the bidirectional RNN layer, each word is processed by both a
forward and a backward RNN to capture representations of the past and of the future.
In this way, the entire context is learnt within the neural network.
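The bidirectional layer can be sketched as two simple tanh RNNs run in opposite directions
whose hidden states are concatenated per word. The plain tanh cell, the random weights and
the dimensions below are assumptions made for illustration; [60] may use a different
recurrent cell and training setup.

```python
import numpy as np


def rnn_pass(inputs, w_in, w_rec):
    """Run a simple tanh RNN over a sequence and return every hidden state."""
    hidden = np.zeros(w_rec.shape[0])
    states = []
    for x in inputs:
        hidden = np.tanh(w_in @ x + w_rec @ hidden)
        states.append(hidden)
    return np.array(states)


def bidirectional(inputs, forward_params, backward_params):
    """Concatenate forward states with time-aligned backward states."""
    forward = rnn_pass(inputs, *forward_params)
    # Run over the reversed sequence, then flip back so row t matches word t.
    backward = rnn_pass(inputs[::-1], *backward_params)[::-1]
    return np.concatenate([forward, backward], axis=1)


rng = np.random.default_rng(0)
embedded = rng.normal(size=(6, 8))  # 6 words, 8-dim (word + entity) features
fwd = (rng.normal(size=(5, 8)), rng.normal(size=(5, 5)))
bwd = (rng.normal(size=(5, 8)), rng.normal(size=(5, 5)))
context = bidirectional(embedded, fwd, bwd)  # shape (6, 10)
```

Each row of the result combines what precedes a word (forward half) with what follows it
(backward half), which is the sense in which the whole sentence context is available to
the classifier.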
Similarly, Jagannatha et al. [61] in 2016 tackle the EHR semantic understanding problem
through sequence labeling for medical event extraction. Conditional Random Fields (CRF)
are used as a baseline model against which the experimental results are compared.
Initially, to ensure unbiased representations of infrequent words, the system trains
word vectors on a large data corpus with the Skip-Gram technique in the embedding
layer. Once the words are mapped to their corresponding vectors, they are fed into a
bidirectional long short-term memory (LSTM) model for training in both the forward and
backward directions. The output of the bidirectional LSTM layer, a combined
representation of each word and its context, is then fed into a feedforward layer with
a softmax function that produces the label probabilities. The paper also trains another
recurrent architecture, the GRU [62], in the same structure as the LSTM. The
experiments show that RNN models in general are valuable techniques for extracting
medical events from large EHR corpora. The improvements achieved by these models,
especially the GRU, suggest that the capability of RNN models to remember information
across different ranges and dependencies of context is very important for effective
information extraction.

3.2.4 Generation and Summarization

Natural language generation is the NLP task of producing natural language from
structured data sources such as a knowledge base or a logical form. The technique can
be applied to either long or short texts, for example content summaries and news
reports, or product descriptions on online shopping websites, respectively. In the past
few years, DL approaches have made huge progress on language generation tasks.
Typically, a natural language generation system is trained as an end-to-end NN model
consisting of an encoder and a decoder. The encoder produces a hidden representation of
the source text, while the decoder generates the target text.
Choi et al. [63] apply generation techniques to tackle the problem of data scarcity.
They propose a deep learning based generative adversarial network (GAN) model to
synthesize data in EHR systems. The model learns the distributions of count-valued and
binary-valued variables with two neural networks: the first generates fake records,
while the second distinguishes real records from fake ones. The advantage of this
system is that the GAN can generate the patient-level records needed for a study while
keeping patients’ personal information private. However, the GAN system proposed by
Choi et al. can only produce discrete data, and cannot produce the free-text EHR
records that are valuable for the research community.
Lee [64] introduces an end-to-end DL encoder-decoder algorithm to build synthetic chief
complaints from discrete variables in the electronic health record, including age,
gender and diagnosis. The generated synthetic chief complaints benefit from the
optimization process of the model, which eliminates comparatively uncommon medical
abbreviations and misspellings, while protecting patients’ privacy through
de-identification by preserving no personally identifiable information (PII). The chief
complaints are preprocessed with an LSTM model to downsize the word embedding matrix.
The encoder is a single feedforward layer that compresses records into the LSTM cell,
while the decoder is a single-layer LSTM model following Vinyals et al. [65]. Following
the same concept, the word embedding matrix transforms the complaints from integer
sequences into dense vector sequences, while a feedforward layer with a softmax
activation function delivers the final output of predicted word probabilities.
Besides natural language generation, text summarization is also a common problem and
research subfield in natural language processing. Text summarization refers to the
creation of a brief, accurate and fluent summary of a longer piece of text. Being able
to summarize text automatically helps not only to discover and extract relevant
information more easily, but also to consume it more efficiently. In general, there are
two ways of summarizing texts. The first is extraction-based summarization, which
refers to algorithms and models that pull key terms and phrases out of the source
document and join them fluently into a summary. The second is abstraction-based
summarization, which is based on paraphrasing and shortening the information contained
in the source documents. In other words, abstractive summarization methods create or
rewrite terms, phrases and sentences as human beings do to relay the most important
information from the source documents. Extractive summarization algorithms nevertheless
remain relatively more popular, as abstraction-based ones are more difficult to develop
and adopt.
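To make the extraction-based idea concrete, the sketch below scores each sentence by the
average corpus frequency of its words and returns the top-scoring sentences verbatim. This
frequency heuristic is a generic textbook baseline chosen as an assumption for
illustration; it is not a method from any of the papers surveyed here.

```python
import re
from collections import Counter


def summarize(text, num_sentences=1):
    """Return the highest-scoring sentences, kept in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    frequency = Counter(re.findall(r"[a-z]+", text.lower()))

    def score(sentence):
        # Average word frequency, so long sentences are not favoured by length.
        tokens = re.findall(r"[a-z]+", sentence.lower())
        return sum(frequency[t] for t in tokens) / max(len(tokens), 1)

    top = sorted(sentences, key=score, reverse=True)[:num_sentences]
    return " ".join(s for s in sentences if s in top)


note = "Patient reports chest pain. Chest pain worsens on exertion. Family visited today."
print(summarize(note))
```

Because the output is assembled only from sentences already in the source, nothing is
paraphrased; an abstractive system would instead generate new wording.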
Liu et al. [66] apply extractive summarization to EHR data in the medical domain. They
use an unsupervised pseudo-labeling approach to study how to exploit the intrinsic
correlation between different data in the EHR. Their method generates pseudo-labels and
trains supervised models without any external source of annotated data. To find a
subset that gives the best summary of the entire document of a patient’s information,
they train a supervised model without direct human annotations. The intrinsic
correlation between medical notes and the patient is used to find pseudo-labels and
produce summaries that answer the three research questions they propose [66]. For RQ1,
the system learns which clinical entities relate to a specific disease. For RQ2, the
model generates binary label vectors for notes and applies Integer Linear Programming
to train on data with pseudo-labels while optimizing the results. For RQ3, the medical
records are summarized by a supervised neural model, a two-layer bidirectional GRU.
Overall, the study confirms the effectiveness of the proposed model for text
summarization by showing that it outperforms existing unsupervised baselines. It could
also be improved in future to further help physicians understand patients’ medical
histories while reducing clinical costs.

3.2.5 Information Extraction on Specific Disease

Datta et al. [67] released a scoping review of medical NLP techniques applied to cancer
studies in 2019. It aims to provide a valuable resource of cancer frame annotations as
well as related general-purpose natural language processing tools. The paper summarizes
the trending NLP techniques that can learn useful cancer-related features from the EHR
system across a wide range of data collections. By reviewing 79 papers, the authors
create frame semantic principles with pertinent information including cancer diagnosis,
tumor descriptions, cancer procedure, breast cancer diagnosis, prostate cancer
diagnosis and pain in prostate cancer patients [67]. The study finds that most recent
work specializes in information extraction for treatment and breast cancer diagnosis,
while cancer diagnosis is the top focus of all the reviewed papers, accounting for 36
out of 79.

3.3 Clinical Predictions (CP)

Clinical prediction is uncertain by nature, as it is a probability problem. Clinical
predictions are supervised tasks performed by researchers, physicians or other
healthcare providers to predict future outcomes. Specifically, they go through both
structured and unstructured data in a patient’s EHR profile and identify the
probability of a specific disease or outcome by leveraging the learnt representations,
signs, symptoms and codes. A calculated probability score is then given to the patient
to predict the likelihood of a certain disease, diagnosis or outcome. The probability
scores provided by physicians are heavily correlated with their clinical experience.
Computer-based technologies that provide data to physicians therefore act as a human
physician’s assistant in the prediction process. These technologies include but are not
limited to understanding medical codes, reading clinical notes, interpreting
time-series data, and handling medical scans. In the existing literature, clinical
prediction tasks are split into two subfields: general clinical predictions and
predictions targeted at specific diseases.

3.3.1 General Clinical Predictions

Zeng et al. [68] review and compare the traditional NLP methodologies and the DL-based
NLP techniques used for disease phenotyping in the EHR system in the past few years.
The paper gives a thorough review of the current applications of EHR-based
computational phenotyping, as well as of NLP-based computational phenotyping methods.
On the one hand, traditional keyword search and rule-based approaches give promising
results for the prediction task, but they require manual human effort, which is very
costly. Supervised machine learning models that can classify data patterns and
structures are instead popular among researchers. On the other hand, as DL methods grow
more important in natural language processing, more studies adopt deep learning
approaches because of their power to generate novel phenotypes. Besides outlining the
opportunities and challenges that remain in the field, the paper concludes that
combining multiple sources of data from the EHR system generally produces better
performance.
Rajkomar et al. [69] present a deep learning-based representation of a patient’s entire
medical record in the EHR system using the Fast Healthcare Interoperability Resources
(FHIR) format. With this sequential format and a patient de-identification procedure,
they study 46,864,534,945 data points from 216,221 adult patients who had been
hospitalized for at least 24 h in two American academic medical centers. Since patient
records differ in length and density, the authors propose three different deep learning
models to tackle the issue: the first is an RNN-based LSTM; the second is an
attention-based time-aware NN model; and the third is a NN built with boosted
time-based decision stumps. The experimental results show that the proposed method
outperforms the traditional predictive models used in current clinical studies, and can
accurately predict multiple clinical events across multiple medical centers without
site-specific data harmonization. In future, this method has the potential to extend to
a variety of scenarios owing to its accuracy and scalability.
Zhang et al. [70] present a novel meta-learning approach, named MetaPred, to predict
clinical risks from longitudinal patient EHRs. MetaPred uses a set of related risk
prediction tasks to train a meta-learner to learn a good predictor for target risks
where patient data is limited. Specifically, the MetaPred framework builds on the
model-agnostic meta-learning strategy to generate a risk predictor for a specific
domain. The meta-learner can directly serve as input to the risk prediction function,
while the limited data helps to boost model performance further through fine-tuning.
The risk prediction models adopted in the experiments are based on either a CNN or an
LSTM-based RNN. Experiments on real patient data provided by Oregon Health and Science
University show that the CNN and RNN based MetaPred predictors significantly outperform
the other predictors trained with limited samples.
Hosseini et al. [71] introduce a heterogeneous information network model named
HeteroMed to make accurate and robust clinical diagnosis predictions from
high-dimensional EHR data with abundant relationships. With heterogeneous network
embedding, the model captures higher-level semantic relationships between words and
terms in the EHR system for disease diagnosis, while handling missing values and
heterogeneous data directly. Furthermore, its joint embedding framework adapts the
representations of medical events to the goal of disease diagnosis. As the first study
to model clinical data and disease diagnosis with a Heterogeneous Information Network
(HIN), HeteroMed achieves significantly better results than the existing literature in
diagnosis code extraction and disease prediction.
Avati et al. [72] present Survival-CRPS, a scoring rule that generalizes the continuous
ranked probability score (CRPS) to survival prediction, together with two variants for
right-censored and interval-censored data. In addition, to evaluate the quality of
event predictions over time, they propose the Survival-AUPRC evaluation metric, which
computes a precision-recall-like curve. To demonstrate the effectiveness of these
techniques, the paper runs experiments on EHR data with a multilayer deep RNN as the
prediction model. The model takes in a sequence of features and predicts the mortality
probability over time. The extensive experiments indicate that the proposed RNN method
with log-normal parameterization performs well on large-scale survival prediction.
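For intuition, the standard CRPS of a predicted time-to-event CDF F against an observed
time y is the integral of (F(t) - 1[t >= y])^2 over t. The sketch below evaluates it
numerically for a log-normal prediction and, as an assumption about the right-censored
variant, simply drops the part of the integral beyond the censoring time. The exact
definitions in [72] should be checked against the paper; the function names, grid and
horizon are hypothetical.

```python
import math
import numpy as np


def lognormal_cdf(t, mu, sigma):
    """CDF of a log-normal distribution evaluated at times t > 0."""
    z = (np.log(t) - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + np.vectorize(math.erf)(z))


def integrate(values, grid):
    """Trapezoidal rule on a fixed grid (0.0 if fewer than two points)."""
    return float(np.sum((values[1:] + values[:-1]) * np.diff(grid)) / 2.0)


def survival_crps_right(mu, sigma, time, censored, horizon=200.0, steps=20001):
    """CRPS-style score for a log-normal prediction with optional right-censoring.

    time:     observed event time, or censoring time if censored is True
    censored: if True, only probability mass placed before `time` is penalised
    """
    grid = np.linspace(1e-6, horizon, steps)
    cdf = lognormal_cdf(grid, mu, sigma)
    before = grid <= time
    # Penalise mass predicted before the observed (or censoring) time.
    score = integrate(cdf[before] ** 2, grid[before])
    if not censored:
        # For an observed event, also penalise mass predicted after it.
        after = ~before
        score += integrate((1.0 - cdf[after]) ** 2, grid[after])
    return score
```

Lower scores are better: a sharp prediction centred on the observed time scores lower than
one centred far from it, and a censored record is never penalised for predicting survival
beyond the censoring time.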
Chung et al. [73] narrow the scope of research from large populations to an
individualized and reliable patient-centric prediction model. The proposed framework
aims to extract useful information from the EHR system to make predictions and to
provide tailored clinical services for disease diagnosis, treatment, intervention and
prevention using time-series data. The framework consists of two parts: (1) a global
section that captures trends across various groups of patients; (2) an individualized
section that models tailored services for each patient. To combine the two sections, an
RNN model that captures global patient trends and a Gaussian Process probability model
that captures an individual patient’s characteristics are built together on top of a
deep RNN foundation to make clinical predictions more accurate and credible.
Heo et al. [74] introduce the notion of input-dependent uncertainty into the attention
mechanism, observing that attention can sometimes be unreliable when it is generated by
weakly supervised networks. The proposed notion builds attention to each feature by
effectively learning the input noise level. The general framework of the study is based
on a stochastic attention mechanism: the attentions are generated with input-adaptive
Gaussian noise and variational inference. After that, an attentional RNN model with
both timestep attention and feature attention is adopted to learn the prediction
probabilities. As a result, the uncertainty-aware attention mechanism shows
significantly better performance on the datasets than baseline models.
Wang et al. [75] combine supervised learning and reinforcement learning to generate
treatment recommendations for patients. They present Supervised Reinforcement Learning
with Recurrent Neural Network (SRL-RNN), an off-policy actor-critic framework designed
to handle the complicated relationships between different types of data in the EHR
system. The indicator signal and the evaluation signal jointly supervise the actor in
SRL-RNN so that it generates effective prescriptions with a low mortality rate. In
real-world medical practice, the patient's state is never fully observed, so the RNN is
adopted to address this Partially Observable Markov Decision Process (POMDP) setting.
Experiments on MIMIC-III data show that the proposed architecture matches doctors'
prescriptions with high accuracy while lowering the estimated mortality rate.
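The way two supervision signals can steer a single actor may be sketched as a weighted
loss. In this minimal sketch the indicator signal is a cross-entropy against the recorded
prescription and the evaluation signal is a critic's value estimate. The function names,
the weight `epsilon` and the exact form of the blend are assumptions made for illustration
and are not taken from the paper's code.

```python
import math


def cross_entropy(predicted_probs, doctor_prescription):
    """Indicator signal: multi-label cross-entropy against the recorded prescription."""
    total = 0.0
    for p, y in zip(predicted_probs, doctor_prescription):
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(predicted_probs)


def actor_loss(predicted_probs, doctor_prescription, critic_value, epsilon=0.5):
    """Blend both signals into one loss to minimise.

    critic_value is the critic's estimate of long-term outcome (higher is
    better), so it enters with a negative sign. epsilon trades off imitation
    of the doctor against the critic's evaluation.
    """
    supervised = cross_entropy(predicted_probs, doctor_prescription)
    reinforcement = -critic_value
    return epsilon * supervised + (1.0 - epsilon) * reinforcement


# One time step: two candidate medications, the doctor prescribed the first.
loss = actor_loss([0.9, 0.2], [1, 0], critic_value=2.0, epsilon=0.5)
```

Setting `epsilon` to 1 reduces the actor to pure imitation of the recorded prescriptions,
while setting it to 0 leaves only the critic's evaluation.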
Pham et al. [76] present DeepCare, an end-to-end deep neural network that reads clinical
records, stores a patient's full medical history, infers the current medical condition
and makes clinical predictions based on the given information. DeepCare is built on an
LSTM recurrent neural network, which can store memories of relevant past experiences.
At the micro level of individual records, DeepCare uses the LSTM to read the input,
update the memory cell and represent each care episode as output. It can also suggest
medical interventions that address a patient's current illness and future clinical
risks. At the macro health-state level, DeepCare aggregates the recorded health states
with multiscale temporal pooling and feeds them into a deep dynamic neural network for
future estimation. Experiments on two chronic conditions, diabetes and mental health,
indicate that the proposed method is capable of modeling disease progression,
recommending possible clinical interventions, improving general modeling and making
accurate clinical predictions.
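Multiscale temporal pooling can be pictured as summarising the same sequence of health
states over several look-back windows and concatenating the summaries. The sketch below
uses mean pooling; the window lengths, the choice of the mean and the `(time, vector)`
input format are assumptions for illustration only.

```python
def mean_pool(states):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    dim = len(states[0])
    return [sum(s[i] for s in states) / len(states) for i in range(dim)]


def multiscale_pool(timed_states, now, horizons):
    """Pool health states over several look-back windows and concatenate.

    timed_states: list of (time, vector) pairs, e.g. one per care episode.
    horizons: look-back lengths in the same unit as `time`.
    """
    dim = len(timed_states[0][1])
    pooled = []
    for horizon in horizons:
        window = [vec for t, vec in timed_states if 0 <= now - t <= horizon]
        # An empty window contributes zeros so the output size stays fixed.
        pooled.extend(mean_pool(window) if window else [0.0] * dim)
    return pooled


# Three episodes at months 0, 10 and 22, pooled at month 24.
episodes = [(0, [1.0, 0.0]), (10, [3.0, 2.0]), (22, [5.0, 4.0])]
summary = multiscale_pool(episodes, now=24, horizons=[12, 24])
# summary == [5.0, 4.0, 3.0, 2.0]: the last 12 months, then the last 24 months.
```

The fixed-length `summary` vector is the kind of input a downstream feed-forward network
could consume regardless of how many episodes a patient has.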
Ma et al. [77] stress the importance of incorporating prior medical knowledge into risk
prediction tasks. They introduce PRIME, a deep learning framework that uses posterior
regularization to incorporate prior knowledge into the predictive models. Specifically,
the paper introduces the PRIMEr and PRIMEc models, based on LSTM and CNN respectively,
to carry out the prediction step. In addition, the prior knowledge is applied to the
risk prediction model without human intervention when modeling the disease
distribution; in other words, the knowledge does not need to be processed by hand to
set boundaries. By modeling the prior knowledge with a log-linear model, the PRIME
framework can also learn the importance of each piece of prior knowledge automatically.
Suresh et al. [78] focus on real-time clinical prediction using intensive care unit
(ICU) data from MIMIC-III. Unlike previous studies, this work integrates ICU data from
all available sources to learn informative representations for predicting clinical
interventions. In particular, the authors compare two of the most commonly used
architectures, LSTM and CNN, on five intervention tasks: invasive ventilation,
non-invasive ventilation, vasopressors, colloid boluses and crystalloid boluses [78].
The reported results compare favorably with other state-of-the-art work.
Choi et al. [31] propose an RNN model for clinical prediction. The model takes the
historical ICD codes in the EHR system as raw inputs and performs multilabel prediction
over time. Specifically, it predicts the patient's possible future visits, the diagnoses
physicians may make and the medications that may be prescribed. Choi et al. apply
skip-gram embedding to the ICD codes to initialize the inputs of the recurrent neural
network. These high-dimensional input vectors are projected to a lower-dimensional
space and passed through an RNN with gated recurrent units. Finally, the timing of the
patient's next visit is predicted by a rectified linear unit (ReLU), while diagnosis
codes and medication codes are predicted by a softmax layer. Medical concepts and
patients are thus better represented in the proposed architecture. The experiments in
the paper also suggest that RNN-based models can be adapted to other medical systems
through transfer learning, which gives institutions with insufficient patient data an
opportunity to improve clinical predictions for their smaller patient base.
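The two output heads described above can be sketched on top of a single hidden state: a
softmax over candidate codes and a ReLU for the non-negative time until the next visit.
The hidden state, weight matrices and dimensions below are made-up placeholders, and the
recurrent (GRU) part that would produce the hidden state is omitted.

```python
import math


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def relu(x):
    return max(0.0, x)


def linear(weights, bias, hidden):
    """Dense layer: one output per row of `weights`."""
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(weights, bias)]


def predict_next_visit(hidden, code_weights, code_bias, time_weights, time_bias):
    """Return (probability of each code, time until next visit)."""
    code_probs = softmax(linear(code_weights, code_bias, hidden))
    time_to_next = relu(linear([time_weights], [time_bias], hidden)[0])
    return code_probs, time_to_next


hidden = [0.5, -1.0]                                # stand-in for a GRU hidden state
code_weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three hypothetical codes
code_probs, days = predict_next_visit(
    hidden, code_weights, [0.0, 0.0, 0.0], time_weights=[2.0, 0.5], time_bias=1.0
)
```

The ReLU guarantees that the predicted duration is never negative, and the softmax yields
a distribution over codes that can be ranked to propose the most likely ones.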
Lasko et al. [79] propose a computational phenotype discovery method for EHR clinical
data. Because medical data in EHR systems are by nature unstructured, noisy and sparse,
the method combines a deep learning approach with longitudinal probability densities
inferred by Gaussian process regression. As a result, the study produces continuous
phenotypic features that distinguish multiple population subtypes within the data
collection.
In 2019, Liang et al. [80] propose a model based on deep belief networks to tackle
computer-aided medical decision making (CAMDM), with a focus on clinical decision
support and medical data analysis in traditional Chinese medicine. The model uses an
unsupervised seven-layer deep belief network (DBN) to learn feature representations,
followed by a supervised support vector machine (SVM) on top of the DBN. The
experimental results suggest that the DBN + SVM model outperforms simple decision tree
and SVM models in computer-aided medical decision-making tasks.
Earlier, in 2014, Liang et al. [81] used a convolutional deep belief network trained on
electronic medical records to support clinical decision making. The experiments were
run on a hypertension dataset retrieved from an HIS system and on a dataset of Chinese
medical diagnoses and treatment prescriptions from a manually converted EHR system.
The model performs significantly better than conventional shallow models in
discovering previously unknown medical concepts.

3.3.2 Specific Disease Predictions

Beyond the studies on general disease, clinical or patient trend prediction, much of
the existing literature narrows its focus to the prediction of specific diseases. Among
these, diabetes has been a particularly popular subject in the past few years.
Mei et al. [82] take raw data from the EHR system as input to their “Deep Diabetologist”
model, which uses RNNs to model sequential EHR data. The goal of the study is to generate
personalized predictions for diabetic patients, specifically of hypoglycemic
medications. Data preprocessing links patient IDs with event IDs. An RNN medication
prediction model then generates the probabilities, followed by a hierarchical RNN
medication prediction model that follows the same time steps. The hierarchical RNN
model significantly outperforms the baseline models, while providing useful insights
for physicians and researchers who conduct secondary studies.
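The preprocessing step of linking patient IDs with event IDs can be pictured as grouping
event records by patient and ordering them in time, so that each patient becomes one
sequence for the RNN. The record fields `patient_id`, `event_id` and `time` are
hypothetical names chosen for this sketch.

```python
from collections import defaultdict


def build_patient_sequences(events):
    """Group event records by patient and order each group by time.

    events: iterable of dicts with hypothetical keys
            'patient_id', 'event_id' and 'time'.
    Returns {patient_id: [event_id, ...]} in chronological order.
    """
    by_patient = defaultdict(list)
    for event in events:
        by_patient[event["patient_id"]].append(event)
    return {
        pid: [e["event_id"] for e in sorted(evts, key=lambda e: e["time"])]
        for pid, evts in by_patient.items()
    }


events = [
    {"patient_id": "p1", "event_id": "e2", "time": 2},
    {"patient_id": "p2", "event_id": "e3", "time": 1},
    {"patient_id": "p1", "event_id": "e1", "time": 1},
]
sequences = build_patient_sequences(events)
# sequences == {"p1": ["e1", "e2"], "p2": ["e3"]}
```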
While most existing work takes raw EHR data as model input, Sousa et al. [83] study
chronic diabetes using only financial records from health plan providers to predict the
evolution of the disease. They argue that financial data align with international
standards in which records encode medical procedures. The experiment uses a
self-attentive RNN model, in which the most relevant sentences are expected to be
selected. Specifically, the input embedding layer is pretrained with Word2Vec
skip-grams. The embedding layer is then connected to two fully connected BiLSTM layers
together with a self-attention mechanism. The experimental results suggest that the
approach is effective for predicting diabetes; however, a full paper on the task has
yet to be published.
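Skip-gram pretraining learns an embedding for each token by predicting its neighbours, so
the training data are (target, context) pairs drawn from a sliding window. The sketch
below only generates those pairs from a sequence of codes; the window size and the example
codes are assumptions, and the embedding training itself is left out.

```python
def skipgram_pairs(sequence, window=2):
    """Return (target, context) pairs for every token and its neighbours
    within `window` positions on either side."""
    pairs = []
    for i, target in enumerate(sequence):
        lo = max(0, i - window)
        hi = min(len(sequence), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, sequence[j]))
    return pairs


# A hypothetical sequence of billed procedure codes for one patient.
codes = ["A", "B", "C"]
pairs = skipgram_pairs(codes, window=1)
# pairs == [("A", "B"), ("B", "A"), ("B", "C"), ("C", "B")]
```

Codes that tend to be billed close together end up sharing many context pairs, which is
what pulls their embeddings together during training.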

4 Challenges and Remaining Problems

Section 3 gives a clear picture of the current state of the published literature that
applies deep learning-based natural language processing techniques to the medical
domain. It is also worth reviewing the problems and challenges in this research that
remain to be solved. The challenges include, but are not limited to:
(1) Data Volume: Deep learning-based techniques need a very large amount of data to
perform well. In the healthcare sector, the volume of data available for research is
reduced by limited access to primary healthcare in certain areas and by the fact that
most patients regard medical records as among their most private information and are
unwilling to share them.
(2) Data Variability: Collecting data that are both widely varied and unbiased is
difficult.
(3) Data Quality: Unlike data generated in other domains, healthcare data are by nature
“dirty”: heterogeneous, unstructured, noisy, incomplete and ambiguous. Preprocessing
such data for deep learning models is challenging and sometimes very time-consuming.
(4) Uncertainty: Diseases and viruses develop and evolve in uncertain patterns all the
time. Deep learning-based natural language processing techniques therefore need to be
designed to accommodate this temporal characteristic of the data.
(5) Causal Inference: Identifying a reasonable causal relationship between viruses and
diseases, or between treatments and patients' bodily reactions, is never easy. Kale
et al. [84] proposed a DNN-based approach to causal inference that analyzes the
relationships between hidden feature representations and the generated outputs.
(6) Interpretability: Interpretability remains one of the biggest challenges for DL
approaches in the medical domain, even though they have delivered promising results.
Scholars often describe deep learning methods as a black box, because it is hard to
interpret how and why the proposed algorithms perform so well. Since medical results
can be matters of life and death, it is still hard to convince healthcare providers to
do exactly what the machines recommend.
(7) Legal and Ethical Issues: As Choi et al. [63] discuss in their 2017 paper, data
privacy and synthetic patient EHRs are emerging issues in the medical domain. In many
countries, patients' EHRs are confidential and may not be shared across health
institutions or across industries. Research institutions, including government bodies,
usually find it hard to study ongoing diseases because the real patient data are in the
hands of the hospitals. As the owners of the data, hospitals therefore usually need to
form their own teams to conduct research in specific medical domains. However, these
legal and ethical restrictions on data sharing limit both the scale of a study and the
diversity of its data, leading to weaker experimental results. A few papers have
focused on this issue. For example, Choi et al. [63] address limited data availability
by proposing a novel deep learning approach, medical GAN (medGAN), to generate
synthetic patient records for medical research. Given real patients' medical records,
the proposed model generates high-dimensional discrete variables by combining
autoencoders and GANs. Furthermore, they use minibatch averaging to avoid mode
collapse, and batch normalization and shortcut connections to make learning more
efficient. In summary, the approach can produce synthetic patient records that perform
comparably to real data on many medical prediction tasks.
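Minibatch averaging, mentioned under item (7), can be sketched as follows: before a batch
of records is shown to the discriminator, the mean record of the batch is appended to every
row, so the discriminator can notice when all generated samples look alike. The binary
record encoding and the function names are assumptions for illustration, and the
discriminator itself is not shown.

```python
def minibatch_average(batch):
    """Per-feature mean over a batch of equal-length records."""
    n = len(batch)
    dim = len(batch[0])
    return [sum(row[i] for row in batch) / n for i in range(dim)]


def with_minibatch_average(batch):
    """Append the batch mean to every record.

    A collapsed generator yields near-identical rows, which makes each row
    nearly equal to the appended mean and therefore easy to flag as synthetic.
    """
    avg = minibatch_average(batch)
    return [list(row) + avg for row in batch]


# Two hypothetical patient records over three binary codes.
batch = [[1, 0, 0], [1, 1, 0]]
augmented = with_minibatch_average(batch)
# augmented == [[1, 0, 0, 1.0, 0.5, 0.0], [1, 1, 0, 1.0, 0.5, 0.0]]
```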

5 Conclusion and Direction of Future Research

This book chapter provides a thorough review of deep learning-based NLP techniques
applied to clinical research. It presents the current status of deep learning-based NLP
models and their recent changes, and reviews how these technologies have been adopted
in specific medical NLP tasks. Unlike other existing surveys, we use a novel
categorization to split the published deep learning-based NLP techniques for clinical
decision making into three major task-oriented groups: representation learning,
information extraction and clinical prediction. From the experimental results
presented in this literature, we believe that it is still too early for deep learning
approaches to revolutionize the healthcare industry, owing to inherent challenges such
as uncertainty, interpretability and ethical issues. However, recent advances made by
the proposed deep learning-based NLP models suggest a promising start. Further research
in various directions in the medical domain is therefore necessary. Some possible
directions for future work include, but are not limited to:
(1) Feature enrichment: To address the data volume and variability problems, we should
enrich models by capturing as many features as possible to obtain a good representation
of patients, and build more robust models to process the growing number of features.
(2) Privacy control: Governments could collect medical data and provide inference over
them at the federal level, protecting patients' privacy while allowing necessary
research projects to be conducted efficiently.
(3) Incorporating expert knowledge into current deep learning approaches: Because of
the challenges presented above and the limited amount of diverse data, human experts
will continue to play a dominant role in the healthcare sector in the near future.
Incorporating their invaluable knowledge into current deep learning processes can
therefore not only produce better results, but also train the machines to learn more
accurately.
(4) Improving model interpretability: The performance of DL models and the
interpretability of that performance are equally important in the healthcare sector.
It is a serious ethical problem for healthcare providers to adopt a system that they do
not understand. Future studies therefore need to find logical and reasonable
explanations of how and why the black box of the DNN performs well on given tasks.

Acknowledgements This work is supported by the Natural Sciences and Engineering Research
Council (NSERC) of Canada, an NSERC CREATE award in ADERSIM,1 the York Research Chairs
(YRC) program and an ORF-RE (Ontario Research Fund-Research Excellence) award in BRAIN
Alliance.2

1 http://www.yorku.ca/adersim.
2 http://brainalliance.ca.

References

1. Hinton, G.E., McClelland, J.L., Rumelhart, D.E.: Distributed representation.
https://web.stanford.edu/jlmcc/papers/PDP/Chapter3.pdf
2. Harris, Z.S.: Distributional structure. Word (1954)
3. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of
words and phrases and their compositionality. In: Advances in Neural Information Processing
Systems, pp. 3111–3119 (2013)
4. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in
vector space. In: arXiv preprint. arXiv:1301.3781 (2013)
5. Chalapathy, R., Borzeshi, E.Z., Piccardi, M.: Bidirectional LSTM-CRF for clinical concept
extraction. arXiv. https://arxiv.org/abs/1611.08373v1 (2016)
6. Pennington, J., Socher, R., Manning, C.D.: GloVe: global vectors for word representation. In:
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing
(EMNLP), pp. 1532–1543 (2014)
7. Kiela, D., Grace, E., Joulin, A., Mikolov, T.: Efficient large scale multi-modal classification.
arXiv. http://arxiv.org/pdf/1802/02892.pdf
8. Krizhevsky, A., Sutskever, I., Hinton, G.: ImageNet classification with deep convolutional
neural networks. NIPS.
http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
9. Mikolov, T.: Statistical language models based on neural networks. Ph.D. thesis, Brno Uni-
versity of Technology (2012)
10. Cho, K., van Merrienboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H.,
Bengio, Y.: Learning phrase representations using RNN encoder-decoder for statistical
machine translation. In: Proceedings of the 2014 Conference on Empirical Methods in
Natural Language Processing (EMNLP), pp. 1724–1734, Doha, Qatar. Association for
Computational Linguistics, Oct 2014. https://doi.org/10.3115/v1/d14-1179 (2014)
11. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780.
https://doi.org/10.1162/neco.1997.9.8.1735 (1997)
12. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional
Transformers for Language Understanding. arXiv. https://arxiv.org/pdf/1810.04805.pdf
13. Better Language Models and Their Implications.
https://openai.com/blog/better-language-models/
14. Hinton, G.E.: Learning distributed representations of concepts. In: Proceedings of the Eighth
Conference Cognitive Science Society, pp. 1–12. (1986)
15. Bengio, Y., Ducharme, R., Vincent, P., Jauvin, C.: A neural probabilistic language model. J.
Mach. Learn. Res. 3, 1137–1155 (2003)
16. Bengio, Y.: Neural net language models. Scholarpedia 3(1), 3881 (2008)
17. Bengio, Y., Courville, A., Vincent, P.: Representation learning: a review and new perspectives.
IEEE Trans. Patt. Anal. Mach. Intell. 35(8), 1798–1828 (2013)
18. Hinton, G.E., Osindero, S., Teh, Y.: A fast learning algorithm for deep belief nets. Neural
Comput. 18, 1527–1554 (2006)
19. Bengio, Y., Lamblin, P., Popovici, D., Larochelle, H.: Greedy layer-wise training of deep
networks. In: Proceedings of the Neural Information and Processing Systems (2006)
20. Ranzato, M., Poultney, C., Chopra, S., LeCun, Y.: Efficient learning of sparse representa-
tions with an energy-based model. In: Proceedings of the Neural Information and Processing
Systems (2006)
21. Lee, H., Ekanadham, C., Ng, A.: Sparse deep belief net model for visual area V2. In: Pro-
ceedings of the Neural Information and Processing Systems (2007)
22. Bengio, Y.: Learning Deep Architectures for AI. Foundations and Trends in Machine Learning
2(1), 1–127 (2009)
23. Gong, J.J., Naumann, T., Szolovits, P., Guttag, J.V.: Predicting clinical outcomes across chang-
ing electronic health record systems. In: International Conference on Knowledge Discovery
and Data Mining (KDD). ACM, pp. 1497–1505 (2017)
24. Choi, E., Xiao, C., Stewart, W.F., Sun, J.: MiME: multilevel medical embedding of electronic
health records for predictive healthcare. arXiv. https://arxiv.org/pdf/1810.09593.pdf
25. Escudie, J.-B., Saade, A., Coucke, A., Lelarge, M.: Deep representation for patient visits from
electronic health records. arXiv. https://arxiv.org/pdf/1803.09533.pdf
26. Choi, E., Schuetz, A., Stewart, W.F., Sun, J.: Medical concept representation learning from
electronic health records and its application on heart failure prediction. arXiv.
https://arxiv.org/abs/1602.03686 (2017)
27. De Vine, L., Zuccon, G., Koopman, B., Sitbon, L., Bruza, P.: Medical semantic similarity
with a neural language model. In: Proceedings of the 23rd ACM International conference
on Information and Knowledge Management-CIKM ‘14, 3–7 Nov 2014, Shanghai, China,
pp. 1819–1822. ACM, New York, NY, USA
28. Choi, E., Chiu, C.Y., Sontag, D.: Learning low-dimensional representations of medical
concepts. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5001761/pdf/2381736.pdf (2016)
29. Miñarro-Giménez, J.A., Marín-Alonso, O., Samwald, M.: Exploring the application
of deep learning techniques on medical text corpora. Studies in health technology and
informatics (2013)
30. Liu, J., Zhang, Z., Razavian, N.: Deep EHR: chronic disease prediction using medical notes.
arXiv. https://arxiv.org/pdf/1808.04928.pdf (2018)
31. Choi, E., Bahadori, M.T., Schuetz, A., Stewart, W.F., Sun, J.: Doctor AI: predicting clinical
events via recurrent neural networks. arXiv. https://arxiv.org/abs/1511.05942 (2016)
32. Choi, E., Bahadori, M.T., Searles, E., Coffey, C., Thompson, M., Bost, J., Tejedor-Sojo, J.,
Sun, J.: Multi-layer representation learning for medical concepts. In: Proceedings of the 22nd
ACM SIGKDD International Conference Knowledge Discovery and Data Mining—KDD
’16’, 13–17 Aug 2016, San Francisco, CA, USA, pp. 1495–1504. ACM, New York, NY, USA
(2016)
33. Li, C., Song, R., Liakata, M., Vlachos, A., Seneff, S., Zhang, X.: Using word embedding
for bio-event extraction. In: Proceedings of the 2015 Workshop on Biomedical Natural Lan-
guage Processing (BioNLP 2015), Beijing, China, 30 July 2015, pp. 121–126. Association
for Computational Linguistics, Stroudsburg, PA (2015)
34. Tang, B., Cao, H., Wang, X., Chen, Q., Xu, H.: Evaluating word representation features in
biomedical named entity recognition tasks. Biomed. Res. Int. 2014, 1–6 (2014).
https://doi.org/10.1155/2014/240403
35. Miotto, R., Li, L., Kidd, B.A., Dudley, J.T.: Deep patient: an unsupervised representation to
predict the future of patients from the electronic health records. Sci. Rep. 6, 26094 (2016).
https://doi.org/10.1038/srep26094
36. Dligach, D., Miller, T.: Learning patient representations from text. arXiv.
https://arxiv.org/pdf/1805.02096.pdf
37. Zhang, Z., Kowsari, K., Harrison, J.H., Lobo, J.M., Barnes, L.E.: Patient2Vec: a personalized
interpretable deep representation of the longitudinal electronic health record. arXiv.
https://arxiv.org/pdf/1810.04793.pdf
38. Denaxas, S., Stenetorp, P., Riedel, S., Pikoula, M., Dobson, R., Hemingway, H.: Application
of clinical concept embeddings for heart failure prediction in UK EHR data. arXiv.
https://arxiv.org/pdf/1811.11005.pdf
39. Wei, X., Eickhoff, C.: Embedding electronic health records for clinical information retrieval.
arXiv. https://arxiv.org/pdf/1811.05402.pdf
40. Zhu, Z., Yin, C., Qian, B., Cheng, Y., Wei, J., Wang, F.: Measuring patient similarities via a
deep architecture with medical concept embedding. arXiv.
https://arxiv.org/pdf/1902.03376.pdf
41. Liu, L., Li, H., Hu, Z., Shi, H., Wang, Z., Tang, Z., Zhang, M.: Learning hierarchical
representations of electronic health records for clinical outcome prediction. arXiv.
https://arxiv.org/pdf/1903.08652.pdf
42. Liu, Y., Ge, T., Mathews, K., Ji, H., McGuinness, D.: Exploiting task-oriented resources
to learn word embeddings for clinical abbreviation expansion. In: Proceedings of the 2015
Workshop on Biomedical Natural Language Processing (BioNLP 2015), Beijing, China, 30
July 2015. Association for Computational Linguistics, Stroudsburg, PA, pp. 92–97 (2015)
43. Wu, Y., Xu, J., Zhang, Y., Xu, H.: Clinical abbreviation disambiguation using neural word
embeddings. In: Proceedings of the 2015 Workshop on Biomedical Natural Language Process-
ing (BioNLP 2015), Beijing, China, 30 July 2015. Association for Computational Linguistics,
Stroudsburg, PA, pp. 171–176 (2015)
44. Li, C., Ji, L., et al.: Acronym disambiguation using word embedding. In: Proceedings of the
29th AAAI Conference on Artificial Intelligence (2014)
45. Gligic, L., Kormilitzin, A., Goldberg, P., Nevado-Holgado, A.: Named entity recognition in
electronic health records using transfer learning bootstrapped neural networks. arXiv.
https://arxiv.org/pdf/1901.01592.pdf
46. Sachan, D.S., Xie, P., Sachan, M., Xing, E.P.: Effective use of bidirectional language modeling
for transfer learning in biomedical named entity recognition. arXiv.
https://arxiv.org/pdf/1711.07908.pdf (2018)
47. Gorinski, P.J., Wu, H., Grover, C., Tobin, R., Talbot, C., Whalley, H., Sudlow, C., Whiteley, W.,
Alex, B.: Named entity recognition for electronic health records: a comparison of rule-based
and machine learning approaches. arXiv. https://arxiv.org/pdf/1903.03985.pdf
48. Yin, X., Huang, X.J., Li, Z., Zhou, X.: A survival modeling approach to biomedical
search result diversification using wikipedia. IEEE Trans. Knowl. Data Eng. (TKDE) 25(6),
1201–1212
49. Huang, X., Zhong, M., Si, X.: York University at TREC 2005: genomics track. In: Proceedings
of the Fourteenth Text REtrieval Conference (TREC), Gaithersburg, Maryland, USA, 15–18
Nov (2005)
50. Huang, X., Hu, Q.: A bayesian learning approach to promoting diversity in ranking for biomed-
ical information retrieval. In: Proceedings of the 32nd Annual International ACM SIGIR
Conference on Research and Development in Information Retrieval (SIGIR), pp. 307–314.
Boston, MA, USA, 19–23 July (2009)
51. An, X., Huang, X., geNov: a new metric for measuring novelty and relevancy in biomedical
information retrieval (Special Issue on Biomedical Information Retrieval). Nov 2017, 68(11),
2620–2635 (2017)
52. Li, F., Zhang, M., Fu, G., Ji, D.: A neural joint model for entity and relation extraction
from biomedical text. BMC Bioinformatics 18, 1 (2017).
https://doi.org/10.1186/s12859-017-1609-9
53. Mehryary, F., Björne, J., Pyysalo, S., Salakoski, T., Ginter, F.: Deep learning with minimal
training data: TurkuNLP entry in the BioNLP shared task 2016. In: Proceedings of the 4th
BioNLP Shared Task Workshop, 13 Aug 2016, Berlin, Germany, pp. 73–81. Association for
Computational Linguistics, Stroudsburg, PA (2016)
54. Quan, C., Hua, L., Sun, X., Bai, W.: Multichannel convolutional neural network for
biological relation extraction. Biomed. Res. Int. 2016, 1–10 (2016).
https://doi.org/10.1155/2016/1850404
55. Pyysalo, S., Ginter, F., Moen, F., Salakoski, T.: Distributional semantics resources for biomed-
ical text processing. In: Proceedings of the Languages in Biology and Medicine (LBM ’13),
pp. 39–44, Tokyo, Japan, Dec 2013 (2013)
56. Cheng, Y., Wang, F., Zhang, P., Hu, J.: Risk prediction with electronic health records: a deep
learning approach. SDM 2016. https://astro.temple.edu/tua87106/sdm16.pdf (2016)
57. Zhang, Z., Roy, A., Li, X., Espino, S., Clara, S., Khan, S., Luo, Y.: Using clinical narratives
and structured data to identify distant recurrences in breast cancer. arXiv.
https://arxiv.org/pdf/1806.04818.pdf
58. Galkó, F., Eickhoff, C.: Biomedical question answering via weighted neural network passage
retrieval. arXiv. https://arxiv.org/pdf/1801.02832.pdf
59. Li, H., Zhang, J., Wang, J., Lin, H., Yang, Z.: DUTIR in BioNLP-ST 2016: utilizing convolu-
tional network and distributed representation to extract complicate relations. In: Proceedings
of the 4th BioNLP Shared Task Workshop, 13 Aug 2016, Berlin, Germany, pp. 93–100.
Association for Computational Linguistics, Stroudsburg, PA (2016)
60. Rahul, P.V.S.S., Sahu, S.K., Anand, A.: Biomedical event trigger identification using
bidirectional recurrent neural network based models. arXiv.
https://arxiv.org/abs/1705.09516v1 (2017)
61. Jagannatha, A.N., Yu, H.: Bidirectional RNN for medical event detection in electronic health
records. In: Proceedings of the Conference Association for Computational Linguistics. North
American Chapter. Meeting. See https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5119627/
62. Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine
translation: encoder-decoder approaches. arXiv preprint. arXiv:1409.1259 (2014)
63. Choi, E., Biswal, S., Malin, B., Duke, J., Stewart, W.F., Sun, J.: Generating multi-label discrete
electronic health records using generative adversarial networks. arXiv.
https://arxiv.org/abs/1703.06490v1 (2017)
64. Lee, S.: Natural language generation for electronic health records. arXiv.
https://arxiv.org/pdf/1806.01353.pdf
65. Vinyals, O., Toshev, A., Bengio, S., Erhan, D.: Show and tell: a neural image caption generator.
In: Computer Vision and Pattern Recognition (CVPR), 2015 IEEE Conference on 2015 Jun
7, pp. 3156-3164. IEEE (2015)
66. Liu, X., Xu, K., Xie, P., Xing, E.: Unsupervised pseudo-labeling for extractive summarization
on electronic health records. arXiv. https://arxiv.org/pdf/1811.08040.pdf
67. Datta, S., Bernstam, S.V., Roberts, K.: A frame semantic overview of NLP-based information
extraction for cancer-related EHR notes. arXiv. https://arxiv.org/pdf/1904.01655.pdf
68. Zeng, Z., Deng, Y., Li, X., Naumann, T., Luo, Y.: Natural language processing for EHR-based
computational phenotyping. arXiv. https://arxiv.org/pdf/1806.04820.pdf
69. Rajkomar, A., Oren, E., Chen, K., Dai, A.M., Hajaj, N., Liu, P.J., Liu, X., Sun, M., Sundberg,
P., Yee, H., et al.: Scalable and accurate deep learning for electronic health records. arXiv
preprint. arXiv:1801.07860 (2018)
70. Zhang, X.S., Tang, F., Dodge, H., Zhou, J., Wang, F.: MetaPred: meta-learning for clinical
risk prediction with limited patient electronic health records. arXiv.
https://arxiv.org/pdf/1905.03218.pdf
71. Hosseini, A., Chen, T., Wu, W., Sun, Y., Sarrafzadeh, M.: HeteroMed: heterogeneous
information network for medical diagnosis. arXiv. https://arxiv.org/pdf/1804.08052.pdf
72. Avati, A., Duan, T., Jung, K., Shah, N.H., Ng, A.: Countdown regression: sharp and calibrated
survival predictions. arXiv. https://arxiv.org/pdf/1806.08324.pdf
73. Chung, I., Kim, S., Lee, J., Hwang, S.J., Yang, E.: Mixed effect composite RNN-GP: a
personalized and reliable prediction model for healthcare. arXiv.
https://arxiv.org/pdf/1806.01551.pdf
74. Heo, J., Lee, H.B., Kim, S., Lee, J., Kim, K.J., Yang, K., Hwang, S.J.: Uncertainty-aware
attention for reliable interpretation and prediction. arXiv.
https://arxiv.org/pdf/1805.09653.pdf
75. Wang, L., Zhang, W., He, X., Zha, H.: Supervised reinforcement learning with recurrent neural
network for dynamic treatment recommendation. arXiv. https://arxiv.org/pdf/1807.01473.pdf
76. Pham, T., Tran, T., Phung, D., Venkatesh, S.: DeepCare: a deep dynamic memory model for
predictive medicine. arXiv. https://arxiv.org/abs/1602.00357v2 (2016)
77. Ma, F., Gao, J., Suo, Q., You, Q., Zhou, J., Zhang, A.: Risk prediction on electronic health
records with prior medical knowledge. In: KDD ’18: The 24th ACM SIGKDD International
Conference on Knowledge Discovery & Data Mining, 19–23 Aug 2018, London, United
Kingdom. ACM, New York, NY, USA, p. 10. https://doi.org/10.1145/3219819.3220020 (2018)
78. Suresh, H., Hunt, N., Johnson, A., Celi, L.A., Szolovits, P., Ghassemi, M.: Clinical intervention
prediction and understanding with deep neural networks. In: Machine Learning for Healthcare
Conference, pp. 322–337 (2017)
79. Lasko, T.A., Denny, J.C., Levy, M.A.: Computational phenotype discovery using unsupervised
feature learning over noisy, sparse, and irregular clinical data. PLoS ONE 8, e66341 (2013).
https://doi.org/10.1371/journal.pone.0066341
80. Liang, Z., Liu, J., Ou, A., Zhang, H., Li, Z., Huang, X.: Deep generative learning for automated
EHR diagnosis of traditional Chinese medicine. Comput. Methods Progr. Biomed. 174, 17–23
(2019)
294 R. Zhu et al.

81. Liang, Z., Zhang, G., Huang, X., Hu, Q.: Deep learning for healthcare decision making
with EMRs. In: Proceedings of 2014 IEEE International Conference on Bioinformatics and
Biomedicine (BIBM), pp. 556–559
82. Mei, j., Zhao, S., Jin, F., Xia, E., Liu, H., Li, X.: Deep diabetologist: learning to prescribe
hypoglycemia medications with hierarchical recurrent neural networks. arXiv. https://arxiv.
org/pdf/1810.07692.pdf
83. Sousa, R.T., Pereira, L.A., Soares, A.S.: Predicting diabetes disease evolution using financial
records and recurrent neural networks. arXiv. https://arxiv.org/pdf/1811.09350.pdf
84. Kale, D.C, Che, Z., Bahadori, M.T., Li, W., Liu, Y., Wetzel, R.: Causal phenotype discovery
via deep networks. AMIA Annual Symposium Proceedings https://www.ncbi.nlm.nih.gov/
pmc/articles/PMC4765623/ (2015)
85. Ghassemi, M., Naumann, T., Schulam, P., Beam, A.L., Ranganath, R.: Opportunities in
machine learning for healthcare. arXiv. https://arxiv.org/pdf/1806.00388.pdf (2018)
86. Lyu, X., Huser, M., Hyland, S.L., Zerveas, G., Ratsch, G.: Improving clinical predictions
through unsupervised time series representation learning. arXiv https://arxiv.org/pef/1812.
00490.pdf (2018)
87. Nickel, M., Kiela, D.: Poincar\’e embeddings for learning hierarchical representations. arXiv
preprint arXiv:1705.08039 (2017)
88. Greenland, S., Robins, J.M., Pearl, J.: Confounding and collapsibility in causal inference.
Stat. Sci., pp. 29–46 (1999)
89. Miotto, R., Wang, F., Wang, S., Jiang, Z., Dudley, J.T.: Deep learning for healthcare: review,
opportunities and challenges. Brief. Bioinform. 375, 4 (2017). https://doi.org/10.1093/bib/
bbx044
90. Wei, C.-H., Harris, B.R., Kao, H.-Y., Lu, Z.: tmVar: a text mining approach for extracting
sequence variants in biomedical literature. Bioinformatics 29, 1433–1439 (2013). https://doi.
org/10.1093/bioinformatics/btt156
91. Liu, S., Tang, B., Chen, Q., Wang, X.: Effects of semantic features on machine learning-based
drug name recognition systems: word embeddings vs. manually constructed dictionaries.
Information 6, 848–865 (2015). https://doi.org/10.3390/info6040848
92. Mohan, S., Fiorini, N., Kim, S., Lu, Z.: Deep learning for biomedical information retrieval:
learning textual relevance from click logs. In: Proceedings of the BioNLP 2017 Workshop,
Vancouver, Canada, 4 Aug 2017, pp. 222–231. Association for Computational Linguistics
Stroudsburg, PA (2017)
93. Ohno-Machado, L.: Realizing the full potential of electronic health records: the role of natural
language processing. J. Am. Med. Inform. Assoc. 18, 539 (2011). https://doi.org/10.1136/
amiajnl-2011-000501
94. Bruijn, Bd, Cherry, C., Kiritchenko, S., Martin, J., Zhu, X.: Machine-learned solutions for
three stages of clinical information extraction: the state of the art at i2b2 2010. J. Am. Med.
Inform. Assoc. 18, 557–562 (2011). https://doi.org/10.1136/amiajnl-2011-000150
95. Yoon, H.-J., Ramanathan, A., Tourassi, G.: Multi-task deep neural networks for automated
extraction of primary site and laterality information from cancer pathology reports. In:
Advances in big data, INNS 2016, 23–25 Oct 2016, Thessaloniki, Greece; Angelov, P.,
Manolopoulos, Y., Iliadis, L., Roy, A., Vellasco, M. (eds.)Advances in Intelligent Systems
and Computing, vol. 529. Springer, Cham (2016)
96. Beaulieu-Jones, B.K., Greene, C.S.: Semi- supervised learning of the electronic health record
for phenotype stratification. J. Biomed. Inform. 64, 168–178 (2016). https://doi.org/10.1016/
j.jbi.2016.10.007
97. Bowman, S.: Impact of electronic health record systems on information integrity: quality and
safety implications. Perspect. Health Inf. Manag. 10, 1c (2013)
98. Beaulieu-Jones, B.K., Wu, Z.S., Williams, C., Byrd, J.B., Greene, C.S.: Privacy-preserving
generative deep neural networks support clinical data sharing. bioRxiv https://doi.org/10.
1101/159756 (2017)
99. Letham, B., Rudin, C., McCormick, T.H., Madigan, D., et al.: Interpretable classifiers using
rules and bayesian analysis: building a better stroke prediction model. Ann. Appl. Stat. 9(3),
1350–1371 (2015)
Using Deep Learning Based Natural Language … 295

100. Robins, J.M.: Robust estimation in sequentially ignorable missing data and causal inference
models. Proc. Am. Stat. Assoc. 1999, 6–10 (2000)
101. Robins, J.M., Rotnitzky, A., Scharfstein, D.O.: Sensitivity analysis for selection bias and
unmeasured confounding in missing data and causal inference models. In: Statistical models
in epidemiology, the environment, and clinical trials. Springer, pp 1–94 (2000)
102. Papernot, N., McDaniel, P., Sinha, A., Wellman, M.: Towards the science of security and
privacy in machine learning. arXiv. https://arxiv.org/abs/1611.03814v1 (2016)
103. Xu, Z., Chou, J., Zhang, X.S., Luo, Y., Isakova, T., et al.: Identification of predictive sub-
phenotypes of acute kidney injury using structured and unstructured electronic health record
data with memory networks. arXiv. https://arxiv.org/pdf/1904.04990.pdf
104. Chou, E., Nguyen, T., Beal, J., Haque, A., Fei-Fei, L.: A fully private pipeline for deep learning
on electronic health records. arXiv. https://arxiv.org/pdf/1811.09951.pdf
105. Banerjee, I., Gensheimer, M.F., Wood, D.J., Henry, S., Chang, D., Rubin, D.L.: Probabilistic
prognostic estimates of survival in metastatic cancer patients (PPES-Met) utilizing free-text
clinical narratives. arXiv. https://arxiv.org/pdf/1801.03058.pdf
106. Kayali, I.: Expert system for diagnosis of chest diseases using neural networks. arXiv. https://
arxiv.org/pdf/1802.06866.pdf
107. de la Torre, J., Valls, A., Puig, D.: A deep learning interpretable classifier for diabetic retinopa-
thy disease grading. arXiv. https://arxiv.org/pdf/1712.08107.pdf
108. Holzinger, A., Malle, B., Kieseberg, P., Roth, P.M., M¨uller, H., Reihs, R., Zatloukal, K.:
Towards the augmented pathologist: challenges of explainable-ai in digital pathology. arXiv.
https://arxiv.org/pdf/1712.06657.pdf

Runjie Zhu is currently a Ph.D. student at Electrical Engineering and Computer Science Program
at Lassonde School of Engineering, York University. Her research interests are in information
retrieval, natural language processing, with a specialization in biomedical information retrieval,
Electronic Health Records, Clinical Decisions and Predictions.

Xinhui Tu is currently an Associate Professor at the School of Computer, Central China Normal
University. He received his Ph.D., master’s and bachelor’s degrees from Central China Normal
University in 2012, 2006 and 2001, respectively. His current research interests include information
retrieval and natural language processing. He has published more than 30 papers in the leading
journals and conferences, such as SIGIR, CIKM, etc.

Jimmy Huang holds a York Research Chair Professorship at the School of Information
Technology. His research focuses on information retrieval, AI and big data analytics with
complex structures and their applications to the Web and healthcare. He has published 230+
papers in top-tier venues (e.g. ACM Transactions on Information Systems, IEEE Transactions
on Knowledge & Data Engineering, ACM SIGIR, CIKM, KDD, ACL, IJCAI and AAAI). His
contributions in developing and applying probabilistic modeling techniques to large-scale
data analysis have had significant impact on both academia and industry. He has served or
will serve as General Chair for the 19th CIKM and the 43rd SIGIR.
Deep Learning for Medical Image
Processing
Diabetes Detection Using ECG Signals:
An Overview

G. Swapna, K. P. Soman and R. Vinayakumar

Abstract Diabetes mellitus (or diabetes) is a clinical condition marked by hyperglycaemia,
and it affects a large number of people worldwide. Hyperglycaemia is the condition in which
a high amount of glucose is present in the blood along with a lack of insulin. The incidence
of diabetes is increasing every year. Diabetes cannot be cured; it can only be managed. If
not managed properly, it can lead to serious complications which can be fatal. Therefore,
timely diagnosis of diabetes is of great importance. In this chapter, we examine the effect
of diabetes on cardiac health and how heart rate variability (HRV) signals indicate the
existence and severity of diabetes by measuring diabetes-induced cardiac impairments.
Extracting useful information from the nonstationary and nonlinear HRV signal is extremely
challenging. We review how deep learning methods perform this extraction effectively,
identifying the correlation between the presence of diabetes and HRV signal variations
accurately and quickly. We discuss several deep learning architectures which can be used
for HRV signal analysis for the purpose of detecting diabetes. Deep learning methods are
the state of the art for analysing the fine deviations from normal in HRV signals. Deep
learning networks can be developed into a scalable framework which can process large
amounts of data in a distributed manner. This can be followed by the application of
distributed deep learning algorithms to learn patterns and so predict the future progress
of the disease. Presently, there is no publicly available dataset of normal and diabetic
HRV. If large amounts of private diabetic and normal HRV data can be made available, deep
learning networks have the capability to give the authorities different kinds of statistics
from the stored data and projections of the future prognosis of diabetes.

G. Swapna (B) · K. P. Soman · R. Vinayakumar
Amrita School of Engineering, Center for Computational Engineering and Networking (CEN),
Amrita Vishwa Vidyapeetham, Coimbatore, India

© Springer Nature Switzerland AG 2020
S. Dash et al. (eds.), Deep Learning Techniques for Biomedical and Health Informatics,
Studies in Big Data 68, https://doi.org/10.1007/978-3-030-33966-1_14

Keywords ECG · Diabetes · Machine learning · Heart rate variability · Deep learning ·
Cardiovascular autonomic neuropathy

1 Introduction

Biosignals are biological signals recorded from the human body. The commonly used
biosignals are electrical in nature, but there are nonelectrical biosignals also. Some
examples of biosignals are electrocardiography (ECG), which measures the electrical
activity of the heart, electroencephalography (EEG), which measures brain activity, and
photoplethysmography (PPG), which depicts the volumetric changes of an organ. The ECG
signal is employed for the noninvasive diagnosis of diabetes. Clinicians use ECG to measure
the rhythm of the heart electrically by attaching electrodes to the skin surface. ECG
depicts the complete electrical pattern of the heart, including atrial depolarization and
ventricular repolarization. Heart rate variability (HRV) data is extracted from the ECG
signal. HRV is a simple but powerful signal which clearly reflects the condition of the
cardiovascular system.
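To make the step from ECG to HRV concrete, the following minimal sketch derives RR intervals from R-peak times and computes two standard time-domain HRV statistics, SDNN and RMSSD. The R-peak times are hypothetical values, the function names are illustrative, and R-peak detection itself is assumed to have been done already; the chapter does not prescribe these particular statistics.

```python
import math
import statistics


def rr_intervals_ms(r_peak_times_s):
    """RR intervals (ms) between successive R-peak times given in seconds."""
    return [(b - a) * 1000.0 for a, b in zip(r_peak_times_s, r_peak_times_s[1:])]


def sdnn(rr_ms):
    """Sample standard deviation of the RR intervals."""
    return statistics.stdev(rr_ms)


def rmssd(rr_ms):
    """Root mean square of successive RR-interval differences."""
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Hypothetical R-peak times in seconds (not taken from any dataset).
r_peaks = [0.00, 0.80, 1.62, 2.40, 3.22]
rr = rr_intervals_ms(r_peaks)
print(rr, sdnn(rr), rmssd(rr))
```

The RR-interval series `rr` is the one-dimensional HRV signal that the rest of the chapter treats as input to the classifiers.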
Since the early days, biosignals have been processed and analysed mainly by extracting
features and then classifying them. These processes are performed by computer-aided
diagnosis (CAD) systems. The features are selected manually, and identifying suitable,
near-optimal features requires domain knowledge. The performance of such approaches becomes
unsatisfactory as the complexity of the data increases. Complex, high-dimensional,
real-world data can be analysed effectively using deep learning. Deep learning
architectures are typically made up of a very large number of hidden layers and contain
millions of neurons interconnected in a structure similar to a 2D matrix. These networks
are capable of handling and analysing complex, very large and very high-dimensional data.
Raw data (or data that has undergone very little signal processing) can be fed directly
into these networks. Each layer of the network produces at its output representations
which are designed automatically by the network using a general learning method. This
replaces the manually decided feature extraction of typical machine learning based neural
networks, which are very small and of very simple structure compared to deep learning
networks. Though deep learning networks are commonly used for two-dimensional image
analysis problems, they can be used very effectively for one-dimensional data also. We
review the application of deep learning based methods to one-dimensional HRV data. The main
bottleneck in applying deep learning to biosignals in general is the present
non-availability of very large training datasets from the medical domain, which are
required for training deep learning networks having a gigantic number of parameters.
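The idea that each layer turns its input into a new representation can be illustrated on one-dimensional data with a toy forward pass. This is only a sketch: the kernels below are hand-picked stand-ins for weights that a real network would learn during training, the input values are hypothetical, and the function names are not from any library.

```python
def conv1d(signal, kernel):
    """'Valid' 1D cross-correlation, the core operation of a convolutional layer."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def relu(values):
    """Element-wise rectified linear activation."""
    return [max(0.0, v) for v in values]


def layer(signal, kernel):
    """One toy layer: convolution followed by a nonlinearity."""
    return relu(conv1d(signal, kernel))


# Hypothetical raw HRV samples (RR intervals in ms).
rr_ms = [800.0, 820.0, 780.0, 820.0, 810.0]

# Hand-picked kernels standing in for learned weights (assumption).
h1 = layer(rr_ms, [-1.0, 1.0])  # responds to beat-to-beat increases
h2 = layer(h1, [0.5, 0.5])      # smooths the first representation
print(h1, h2)
```

Stacking many such layers, with kernels fitted to labelled data rather than chosen by hand, is what removes the need for manual feature selection.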
Diabetes mellitus, usually called diabetes, is a long-term metabolic disorder wherein the
body is incapable of metabolizing glucose (sugar) properly. This creates a very high level
of glucose in the blood, a condition known as hyperglycaemia. Insulin is a hormone that is
necessary for the body cells to absorb blood glucose (produced from the carbohydrates in
the food we eat) and to store glucose for future needs. Diabetes arises either because the
body is incapable of generating sufficient insulin or because the body cells do not react
to the insulin generated. Medically, there is no cure for diabetes; hence it should be
properly controlled. The different types of diabetes are as follows. Type 1 diabetes is
mainly found in children. In type 1 diabetes, the immune system of the body destroys its
own beta cells, resulting in a deficiency of insulin. Type 2 diabetes is the common type
of diabetes that develops in adults, usually above the age of 40. The cells generally
become insensitive to the insulin produced or are unable to use it properly. This is known
as insulin resistance. Gestational diabetes is the glucose intolerance developed during
pregnancy. Of these three types, type 2 diabetes is the most prevalent. In this chapter,
the word diabetes refers to type 2 diabetes.
A 2017 estimate puts the share of people worldwide who have diabetes at 8.8%. It is rising
more alarmingly in underdeveloped countries. According to the National Diabetes Statistics
Report 2017 (pertaining to the United States), about 9.4% of the U.S. population had
diabetes in 2015. Of these, about 23% were not aware of or did not report having diabetes
(their diabetes was undiagnosed).
According to the statistics of the International Diabetes Federation, India has a diabetic
population of 6.9 crore (69 million), the second largest in the world. Kerala is among the
Indian states with the largest number of people affected by diabetes. According to recent
statistics of the Indian Medical Association (IMA), 138 out of every 1000 people in Kerala
are newly diagnosed with diabetes every year. Some of the consequences of diabetes have
been summarized by the World Health Organization (WHO) as follows. In 2015, approximately
1.6 million deaths globally were directly caused by diabetes. Almost 50% of these deaths,
caused by increased blood sugar, occur before 70 years of age.
Diabetes causes damage to nerves, known as diabetic neuropathy. Diabetes increases the
risk of heart ailments and stroke. About 50% of people with diabetes die of heart-related
complications. Neuropathy in the feet can lead to amputation of a limb. Another problem is
diabetic retinopathy, wherein the nerve problem caused by diabetes can heavily damage the
blood vessels in the retina; this may affect vision (10% of diabetic people) and may also
lead to blindness (2% of diabetic people). Kidney failure is the cause of death in, on
average, 15% of diabetic people. Thus, over time, uncontrolled diabetes leads to serious
damage to many vital organs of the body such as the heart, blood vessels, kidneys
(nephropathy), nerves, feet and eyes. Deaths from diabetes are mainly due to the
complications caused by the disease.
Hyperglycaemia in a less severe form is known as impaired glucose tolerance. This
condition is characterised by a high risk of large blood vessel disease and may lead to
complications like myocardial infarction. Unlike diabetes-induced hyperglycaemia, impaired
glucose tolerance does not lead to microvascular disease to any considerable degree.
All the above data and reports underline the necessity of, and the challenges in,
developing effective methods for detecting and managing diabetes. Some of the symptoms of
hyperglycaemia due to diabetes are excessive urine excretion and high levels of thirst,
hunger and fatigue. Weight loss and impaired vision are also likely. In terms of diagnosis,
a major challenge is that these symptoms are not very marked at the onset of diabetes.
Symptoms become pronounced only after diabetes worsens to the extent of causing
complications. To minimize such complications, early detection of diabetes is important.
Methods should be developed that help to prevent or delay diabetes, and effective ways
should be developed for the diagnosis and treatment of this disease. A further challenge
is to develop methods that can predict diabetes well in advance and in a cost-effective
way, so that corrective steps and treatment can be given in time to avert diabetes. This
would also save the person from the serious complications to which undetected or poorly
managed diabetes can lead.
Here, we review methods for the non-invasive, high-accuracy diagnosis of diabetes using
HRV signals derived from ECG signals. Heart rate based diabetes detection has been
observed to be more computationally efficient than the decision theoretic approach and
hence has been heavily explored. Initially, machine learning techniques were extensively
used for HRV based diabetes detection. Deep learning methods are now being increasingly
used in healthcare analytics. Deep learning architectures have the potential to improve
the accuracy of diabetes detection by capturing minute variations in the ECG. A further
big stride possible in the future is the prediction of diabetes, if a sufficiently large
amount of training and testing data is made available.
In this chapter, Sects. 2 and 3 provide discussion of the relevant medical aspects of
diabetes and its detection methods. Sections 4 and 5 detail the machine learning and
deep learning methods used by researchers for diabetes detection. Section 6 gives
the detailed literature survey of works using ECG-derived-HRV as input for diabetes
detection. A sample architecture and implementation details are described in Sect. 7.
The limitations and challenges of deep learning methods are discussed in Sect. 8.
The chapter concludes with Sect. 9.

2 Diabetes

2.1 Diabetes and Its Associated Mechanism

Glucose homeostasis is the natural regulation mechanism of the body by which blood glucose
(blood sugar) levels are maintained within a narrow range. Diabetes refers to a group of
conditions in which the blood glucose balance of the body has gone out of control. For
proper functioning of the body, blood glucose values have to stay strictly within a very
narrow range (70 mg/dl to 110 mg/dl, where mg is milligram and dl is decilitre). The
pancreatic endocrine hormones insulin and glucagon make this happen. Insulin and glucagon
are the vital hormones secreted by pancreatic islet cells in response to the level of
blood sugar, but they act in opposite ways.
The beta cells of the pancreas secrete insulin. Glucose is the main source of energy for
the body cells. But glucose is a large molecule which cannot pass through the cell
membrane by simple diffusion. Insulin enables glucose transport into the cells. A very low
base level of insulin is always secreted. When we eat, carbohydrates are converted to
glucose and most of it is sent to the blood. When blood glucose is high, a proportional
amount of insulin is produced. When insulin is present, the cells of the body can absorb
glucose from the blood, which reduces the blood glucose level. The cells use the absorbed
glucose as energy for carrying out their assigned functions. When the blood glucose
decreases to the normal level, the amount of insulin secreted also goes down to the base
minimum. Thus high blood glucose serves as a signal to the pancreas to release insulin
into the blood. If the level of blood glucose remains high even after cell absorption,
insulin facilitates the storage of the excess glucose in the cells of the liver in the
form of a substance known as glycogen, by the process called glycogenesis.
The alpha cells of the pancreas secrete glucagon, whose action is opposite to that of
insulin. Glucagon production is inversely related to the amount of blood glucose. If blood
glucose is high, no glucagon is produced. If blood glucose is low (for example, when there
is a long gap after eating), a large amount of glucagon is secreted. Glucagon induces the
liver to release its stored glucose by converting glycogen to glucose, by the process
called glycogenolysis. Thus, the level of blood glucose is increased. Glucagon also
induces the liver and some muscle cells to produce glucose from other nutrients such as
protein. The above-mentioned processes are summarized in Fig. 1.
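The opposing actions of insulin and glucagon form a negative feedback loop, which can be illustrated with a toy model. This sketch is not a physiological model: the function name `regulate_step`, the `gain` parameter and the linear correction are assumptions made only for illustration, while the 70 to 110 mg/dl range is the one quoted in this section.

```python
def regulate_step(glucose_mg_dl, low=70.0, high=110.0, gain=0.5):
    """One toy regulation step; returns (new glucose level, dominant hormone)."""
    if glucose_mg_dl > high:
        # Insulin dominates: cells absorb glucose, excess is stored as glycogen.
        return glucose_mg_dl - gain * (glucose_mg_dl - high), "insulin"
    if glucose_mg_dl < low:
        # Glucagon dominates: the liver releases glucose (glycogenolysis).
        return glucose_mg_dl + gain * (low - glucose_mg_dl), "glucagon"
    # Within range: only the base level of insulin is secreted.
    return glucose_mg_dl, "basal"


glucose = 180.0  # hypothetical level after a meal
for _ in range(5):
    glucose, hormone = regulate_step(glucose)
    print(round(glucose, 2), hormone)
```

Each step moves the level part of the way back toward the normal range, which is the qualitative behaviour described above.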

2.2 Types of Diabetes

Type 1, type 2 and gestational diabetes are the commonly seen categories of diabetes.
Type 1 is mainly found in children. It is characterized by the incapability of the body
to generate insulin, mainly because of autoimmune damage to the insulin-producing beta
cells in the pancreas. People with this type of diabetes have to live their whole life
with the support of insulin injections; otherwise complications will occur due to the
increased blood glucose. People with type 1 diabetes commonly show symptoms of fast weight
loss, polydipsia (abnormally high thirst), polyuria (production of a large amount of
urine) and the associated nocturia (tendency to urinate more often during the night).
Ketone bodies will be present in the urine (a condition known as ketonuria).
Fig. 1 Mechanism of maintaining desired blood glucose levels

Table 1 Important distinguishing features of type 1 and 2 diabetes

Feature                                  Type 1                    Type 2
Age at the start of disease              <40 years                 >50 years
Duration of symptoms                     Weeks                     Months to years
Body weight                              Normal or low             Above normal
Ketonuria                                Present                   Absent
If insulin treatment is not given        Can lead to rapid death   Does not pose immediate threat to life
Complications at the time of diagnosis   No                        Around 25%
Family history of diabetes               Need not be there         More likely to be there

Type 2 diabetes is the state of decreased sensitivity to the action of insulin. Diabetic
patients need external insulin support for maintaining the proper balance of blood
glucose. If not treated properly, the diabetes is likely to progress. This is the most
prevalent type of diabetes (Table 1).
Gestational diabetes develops during the pregnancy (gestation) period. The blood sugar
levels, which are normal before pregnancy, increase beyond allowable ranges. If not
properly managed, it will affect the pregnancy and the baby's health.
There is another term related to diabetes known as prediabetes. It is the condition
where sufficient insulin is produced in the body, but the body doesn’t make use of
it properly. The blood glucose levels are high in the case of prediabetes, but not as
high as found in type 2 diabetes. Prediabetes is an indicator of the future high risk
of developing type 2 diabetes.
Diabetes, if not treated properly, results in excessively increased blood glucose
(hyperglycaemia), leading to complications. If people with diabetes take too much insulin
or exercise without sufficient food, it can lead to a low blood sugar condition known as
hypoglycaemia, which is highly life threatening.

2.3 Complications Due to Diabetes

Uncontrolled diabetes over a long duration can lead to many complications. Type 2
diabetes does not show noticeable symptoms at the initial stage. Because of this, about
25% of people already show evidence of diabetic complications at the time of diagnosis.
70% of the deaths in diabetes are due to cardiovascular diseases. Statistics from the USA
indicate that, among people aged 20 and above, diabetic people have 1.7 times higher
cardiovascular death rates than their non-diabetic counterparts. The chances of diabetic
people being affected by myocardial infarction and stroke are 1.8 and 1.5 times higher,
respectively, than those of non-diabetic people. The effects of cardiovascular risk
factors like smoking and hypertension are magnified by the presence of diabetes.
Macrovascular (large blood vessel) disease caused by diabetes leads to fatal complications
like angina, stroke, myocardial infarction, cardiac failure and intermittent claudication
(cramping pain in the leg). Diabetic people suffer from atherosclerosis (deposit of fatty
material on the inner walls of the arteries) much earlier and with much greater severity
than non-diabetic people. Diabetes also affects the small blood vessels in the body. This
condition is known as microvascular disease (or diabetic microangiopathy). It leads to
thickening of the basement membrane of the capillaries and further to an increase in
vascular permeability throughout the body.
Retinopathy induced by diabetes is the most common form of vision-related impairment in
adults. Capillary occlusion (blockage) due to hyperglycaemia increases local vascular
endothelial growth factor (VEGF) in the retina. The occlusion of many capillaries leads to
the growth of new vessels in the retina. Swellings called microaneurysms form in the
capillary vessels of the retina and leak fluid and blood, resulting in retinal
haemorrhages. The most serious form of diabetic retinopathy is called proliferative
retinopathy which, if left untreated, causes extensive visual damage in the form of
retinal detachment and frequent haemorrhages.
Diabetic nephropathy refers to the damage caused to the kidneys, which may finally lead to
kidney failure. The kidney is made up of microscopic units called nephrons which filter
impurities out of the blood. Diabetes-induced hyperglycaemia affects the filtering
functions performed by the nephrons. Diabetic nephropathy is a prominent cause of
long-term kidney disease and end-stage renal disease (ESRD), wherein the kidneys do not
work properly. ESRD is the last stage of diabetic nephropathy, in which the person cannot
survive without dialysis.
Diabetic neuropathy is an important cause of morbidity and mortality in diabetes. In
peripheral neuropathy, peripheral nerves are affected, resulting in problems like
deficiencies in motor and sensory functions. Weakening of the proximal muscles (muscles
close to the body's midline), abnormality in gait, and pain in the limbs and feet can
occur. In autonomic neuropathy, parasympathetic or sympathetic nerves may be affected in
many visceral systems. There are innumerable clinical features of autonomic neuropathy
affecting different systems of the body, such as the cardiovascular system (e.g. resting
tachycardia), the gastrointestinal system (e.g. constipation, abdominal fullness,
nocturnal diarrhoea) and the pupillary system (e.g. reduced reflexes to light, reduction
in pupil size). All the complications described above are shown in Fig. 2.

2.4 Causes (Risk Factors) of Diabetes

According to epidemiological studies, overeating, physical inactivity and obesity may lead
to diabetes in middle-aged people. People with a body mass index (BMI) larger than
30 kg/m² are 10 times more prone to getting type 2 diabetes. Middle-aged and elderly
people are also at greater risk of diabetes.
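BMI is body weight in kilograms divided by the square of height in metres. The short sketch below computes it and checks it against the 30 kg/m² level mentioned above; the function names are illustrative, and the check is only a flag for that one threshold, not a risk model.

```python
def bmi(weight_kg, height_m):
    """Body mass index in kg/m^2."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


def exceeds_bmi_30(weight_kg, height_m):
    """True if BMI is above the 30 kg/m^2 level quoted in the text."""
    return bmi(weight_kg, height_m) > 30.0


# Hypothetical example values.
print(bmi(90.0, 1.5), exceeds_bmi_30(90.0, 1.5))
print(round(bmi(70.0, 1.75), 2), exceeds_bmi_30(70.0, 1.75))
```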
Ethnic origin is another major risk factor for diabetes. In the USA, only 5.5% of the
Alaskan people are affected by diabetes, while the figure is 7.1% for non-Hispanic white
people and 13% for non-Hispanic black people. The highest value, 33%, is for Native
Americans. These disparities based on ethnicity may be due to a variety of known and
unknown factors, such as lifestyle and BMI.

2.5 Treatment and Management of Diabetes

Proper treatment and effective blood glucose monitoring and control are essential in
preventing the complications of diabetes. A popular treatment is the oral intake of drugs
that maintain a proper blood glucose level. Another mode of treatment is insulin
injection, applied subcutaneously, commonly to the upper arms, thighs and buttocks, with a
disposable plastic syringe and a sharp needle. Injections are normally given in multiple
doses several times a day. In acute cases, especially in type 1 diabetes, continuous
subcutaneous insulin therapy (an insulin pump) is administered. A further improvement of
the insulin pump, which incorporates a closed-loop system, is known as the artificial
pancreas. The artificial pancreas is an integrated closed-loop system consisting of
insulin pumps along with continuous glucose monitoring systems (CGMS). The CGMS can be
considered to include an interstitial glucose measurement made every 5–15 min, a personal
glucose monitor which uses the glucose information to calculate the amount of insulin to
be delivered into the body, and finally the insulin pump that delivers the insulin.

Fig. 2 Complications of diabetes

It is important to adopt a healthy lifestyle through regular physical activity and by
maintaining a proper BMI. A healthy diet is very important. Alcohol consumption, smoking
and stress have to be avoided. Many of the important medical aspects discussed in this
chapter are taken from Davidson's Principles and Practice of Medicine [1].

3 Common Methods of Diabetes Detection

3.1 Invasive Methods of Diabetes Detection (Blood Testing)

As stated earlier, the blood glucose level has to be maintained between 70 and 110 mg/dl
in the fasting condition. If it is below 70 mg/dl, the condition is hypoglycaemia. If food
has been taken within the previous two or three hours, the glucose level can exceed
110 mg/dl. Irrespective of the amount of food taken, blood sugar should not exceed
180 mg/dl in the normal case. If it is more than 180 mg/dl, the condition is
hyperglycaemia, indicative of diabetes.
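The ranges in the preceding paragraph can be written as a small rule-based check. This is a sketch of those stated limits only; the function name and the label "above fasting range" are hypothetical, and the thresholds are taken directly from the text.

```python
def glucose_state(glucose_mg_dl, fasting):
    """Label a blood glucose reading using the limits quoted in the text."""
    if glucose_mg_dl < 70:
        return "hypoglycaemia"
    if glucose_mg_dl > 180:
        return "hyperglycaemia"
    if fasting and glucose_mg_dl > 110:
        # Above the 70-110 mg/dl fasting range, but not above 180 mg/dl.
        return "above fasting range"
    return "normal"


# Hypothetical readings.
for value, fasting in [(65, True), (95, True), (120, True), (150, False), (190, False)]:
    print(value, fasting, glucose_state(value, fasting))
```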
All the commonly used methods for detecting diabetes are invasive in nature. They
generally involve extracting a blood sample from the person and testing it for possible
anomalies. Popular invasive tests for detecting diabetes and its severity are explained
below. Table 2 also highlights the importance of these tests in diabetes detection.

3.1.1 Oral Glucose Tolerance Test (OGTT)

OGTT is mainly done to check for gestational diabetes in pregnant women. A
prescribed amount of a sugar-containing drink is given to the person under test.
Blood samples are tested at the prescribed time intervals. A blood glucose
measurement greater than 200 mg/dl indicates the presence of diabetes. If diabetes
goes undetected in a pregnant woman, it may lead to complications.

3.1.2 HaemoglobinA1c (HbA1c)

The HbA1c blood test gives the average blood sugar value for the past three months.
HbA1c stands for glycated haemoglobin. Haemoglobin is a protein contained in red
blood cells whose task is to carry oxygen throughout the body. Haemoglobin becomes
glycated when it combines with blood glucose. HbA1c greater than 6.5%
indicates diabetes.

Table 2 Indication of diabetes and prediabetes

Test                                                   Indication of diabetes   Indication of prediabetes
Fasting blood sugar                                    ≥126 mg/dl               ≥110 mg/dl and <126 mg/dl
Blood sugar two hours after a 75 g oral glucose drink  ≥200 mg/dl               140–200 mg/dl
HbA1c                                                  ≥6.5%                    5.7–6.4%
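The thresholds of Table 2 can be read as a simple decision rule. The following is a minimal sketch of that rule, not a diagnostic tool; the function name and its arguments are hypothetical, and it assumes that the most severe indication among the available measurements decides the outcome.

```python
def classify_glycaemic_status(fasting_mg_dl=None, ogtt_2h_mg_dl=None, hba1c_percent=None):
    """Return 'diabetes', 'prediabetes' or 'normal' using the Table 2 thresholds.

    Any measurement may be omitted (None). Assumption: the most severe
    indication among the supplied measurements wins.
    """
    if all(v is None for v in (fasting_mg_dl, ogtt_2h_mg_dl, hba1c_percent)):
        raise ValueError("at least one measurement is required")

    # Diabetes column of Table 2
    diabetes = (
        (fasting_mg_dl is not None and fasting_mg_dl >= 126)
        or (ogtt_2h_mg_dl is not None and ogtt_2h_mg_dl >= 200)
        or (hba1c_percent is not None and hba1c_percent >= 6.5)
    )
    if diabetes:
        return "diabetes"

    # Prediabetes column of Table 2 (values already known to be below the diabetes limits)
    prediabetes = (
        (fasting_mg_dl is not None and fasting_mg_dl >= 110)
        or (ogtt_2h_mg_dl is not None and ogtt_2h_mg_dl >= 140)
        or (hba1c_percent is not None and hba1c_percent >= 5.7)
    )
    return "prediabetes" if prediabetes else "normal"
```

For example, a fasting value of 130 mg/dl alone is classified as diabetes, while an HbA1c of 6.0% alone is classified as prediabetes.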

3.1.3 Interstitial Glucose Monitoring

This is a recently developed test to detect diabetes through interstitial continuous
glucose monitoring (CGM). The test involves insertion of a tiny sensor under the
skin in order to measure the glucose level in the interstitial fluid. One sensor can
remain in place for two weeks, after which it has to be replaced by a new sensor.
The sensor measures the glucose level every one or five minutes in real time. In a span
of two weeks, the sensor collects a substantial amount of data which can be analysed
to get a variety of information like the daily glucose profile, night-time glucose
profile etc. It is possible to incorporate alarms into the sensor so that it can warn
the individual who wears it in case hypoglycaemia occurs.

3.2 Non-invasive Methods of Diabetes Detection (Using ECG Analysis)

3.2.1 Diabetes and Associated Cardiac Changes

Diabetes can cause severe autonomic impairments. Diabetes-induced high blood
glucose (hyperglycaemia) causes cardiovascular malfunction and precapillary
damage. This damage affects the normal working of the endothelial cells and blocks
the normal route of passage of nitric oxide (NO) [2]. NO is essential for vasodilation.
Diabetes-induced hyperglycaemia causes reduced activation of the phosphorylation
cascade, leading to less endothelial NO synthase, which is required to synthesize NO.
Diabetes thus leads to a reduction in the availability of NO. The endothelial cell
damage due to diabetes causes the blood vessels to be vasoconstricted, which affects
the normal blood circulation.
Hyperglycaemia results in the production of free oxygen radicals which inactivate
endothelium-derived NO and activate protein kinase C, which boosts vasoconstrictive
prostanoid production [3]. Hyperglycaemia leads to endothelial damage and increases
the activity and aggregability of the platelets [3, 4]. Eventually, monocytes,
leukocytes and platelets adhere strongly to the endothelium. Blood coagulability is
increased and fibrinolytic activity is decreased.
Thus, fatty material is increasingly deposited on the inner side of the blood vessel
wall due to the high blood glucose condition. The deposit leads to the formation of
blockages and hardening of blood vessels (atherosclerosis), obstructing the flow of
blood through the blood vessels. Two major types of cardiovascular disease are
coronary artery disease and cerebral vascular disease. Coronary artery disease
(ischemic heart disease) is caused by thickening of the blood vessels that go to the
heart by deposits of fatty material. The heart’s blood flow is thus decreased or
blocked, leading to a heart attack. Increased blood sugar levels not only damage blood
vessels, but also change the level of blood lipids. Diabetic people are at least twice
as likely to develop heart disorders or stroke as non-diabetic people. Heart attacks
in people with diabetes are more serious (more likely to result in death).
60–70% of diabetic patients have some form of neuropathy caused by diabetes.
Diabetic neuropathy can be further grouped as autonomic, focal, peripheral and
proximal neuropathy. Our focus is on the diabetic neuropathy affecting the nerves
connected with the functioning of the heart, known as cardiovascular autonomic
neuropathy (CAN). Heart rate and blood pressure are affected by CAN. The high glucose
level associated with diabetes causes serious problems in different organs of the
body. The autonomic microvascular damage also causes a decrease in local reflexes.
CAN leads to diminished heart rate variability (HRV), which is indicative of diabetic
neuropathy [5]. Diabetes-induced CAN may cause ECG alterations like ST-T changes,
sinus tachycardia, HRV changes, long QTc etc. It was also confirmed that QT, QTc
and ST dispersions are predictors of death in diabetic patients [6, 7]. Among these
ECG alterations, we concentrate on the HRV signal, which can be used for diabetes
diagnosis since HRV is indicative of cardiac disorders developed due to diabetes.

3.2.2 ECG Changes Due to Diabetes

ECG represents the role of the autonomic nervous system (ANS) in regulating the heart’s
natural rhythm. The ECG signal is generated as follows. The heartbeat originates
as an electric impulse from the sino-atrial (SA) node. The impulse contracts both
atria, then activates the atrioventricular (AV) node and spreads through both
ventricles. The complete activity is represented in the ECG waveform (Fig. 3).

Fig. 3 Conducting system of the heart [8]



P, QRS, T and U are the prominent electrocardiographic deflections in the ECG
signal. The activation or depolarization of the atria is represented by the P wave.
Depolarization of the ventricles is represented by the QRS complex. The repolarization
of the ventricles is represented by the T wave. The U wave represents the papillary
muscle repolarization. The normal duration of the P wave is 0.11 s while that of the
QRS complex is 0.10 s. The normal range of the QT interval is 0.35–0.43 s while the
normal PR interval is 0.12–0.20 s.
The widely used configuration for ECG measurements consists of 5 electrodes.
One electrode each is positioned on the left arm (LA), right arm (RA), left leg (LL),
right leg (RL) and chest to the right of the reference electrode. Another widely used
ECG capture system consists of 10 electrodes (12 leads).
Stern et al. found from ECG that a diabetes-affected person who showed no
indications of CAN developed left ventricular hypertrophy [9]. This shows the high
risk of a diabetic patient developing cardiovascular disease in future. The work by
Stern et al. did not stop there. Diet was strictly monitored and proper measures were
taken to ensure cardiac health for the patient. Under these conditions, a six-year
follow-up was performed. Their observation was that the diabetes of this person
remained well controlled, his ECG did not change further and he did not show any
further clinical or ECG signs of neuropathy.
The shape of the ECG indicates the cardiac health of the person [10]. The difficulty
in using ECG for analysis is that the delicate variations in the ECG waveform are
extremely difficult to differentiate by human perception. The performance of the
usual biosignal analysis methods is thus not up to the mark on ECG signals.

3.2.3 Heart Rate Variability

SA node functions as the heart’s pacemaker. The cardiac impulse generated here
is influenced by the parasympathetic and sympathetic nervous systems. Cardio-
acceleration is caused by enhanced activity of sympathetic nervous system (SNS)
or decreased parasympathetic nervous system (PNS) activity. Cardio-deceleration is
caused by decreased SNS or increased PNS activity. Thus the status of the ANS is
clearly understood from HRV signals. The SNS and PNS are the two branches of
the ANS which together control the heart rate. Thus HRV can give a clear picture
about sympathetic-parasympathetic balance. The instantaneous heart rate, together
decided by the SNS and PNS, is strongly influenced by different kinds of neural,
myocardial and hormonal factors [11].
The analysis of non-invasive HRV data has numerous applications in the clinical
areas of cardiology, physiology and pharmacology. HRV-related cardiological
impairment analysis is of real significance. HRV measurements are simple and
non-invasive, and can detect impairments which have not yet reached the stage of
showing clear symptoms. If an impairment is detected, the patient can go in for
detailed clinical tests. Research showed that the non-invasive HRV measurements are
also reproducible if done under standard conditions [12, 13].

Heart rate signal contains the RR interval information ordered in time. The vari-
ation of RR intervals is known as HRV. The variations in the ANS due to hypergly-
caemia can be represented well by HRV signals. Shape is an irrelevant feature for the
discrete HRV signal. The HRV data available (i.e. instantaneous heart rate against
time axis) can be analysed by different methods. It can serve as an excellent and
accurate non-invasive technique to understand the state of the ANS which regulates
the cardiac activity and heart rate.

4 Machine Learning for Diabetes Detection

Before deep learning techniques emerged, biosignals were analysed mainly using
machine learning (ML) techniques. ML applies artificial intelligence (AI) to systems
to make them capable of automatic learning without explicit rule-based programming
and without human assistance. In the anomaly detection case, the ML algorithm finds
by itself a mathematical function that produces the correct outcome (anomaly present
or absent) from the input training data (data from diagnostic tests like ECG, HRV)
by learning the hidden patterns in the input data. With this learned mathematical
function, it should be able to predict the output state for a new set of input data with
high accuracy.
Extensive domain knowledge of the human system and its intricate mechanisms,
coupled with a deep understanding of the biosignal variations happening during the
anomaly, is imperative to decide what type of features have to be extracted from the
biosignal and analysed. So the initial step required is the selection of desirable
features which can be effectively used for the purpose of anomaly detection. These
features are then extracted and fed to classifiers to detect the presence of anomalies.
In the case of diabetes detection using HRV, the initial research used different
methods such as time domain, frequency domain and nonlinear methods. All these
methods gave different ranges for the parameters for the normal and abnormal signals.
These distinctive ranges enabled classifiers to classify with accuracy above 85%. The
nonlinear methods were specifically suited to biosignals like ECG which are inherently
nonlinear and nonstationary in nature. The important methods of HRV analysis for
diabetes detection using ML techniques are discussed briefly below. The features
belonging to the domains described below are then passed through suitable classifiers.

4.1 Time Domain Methods

Time domain measures involve statistical operations such as calculating the
mean and variance of the RR intervals of HRV data. Important time domain
parameters are the average heart rate, RMSSD (root mean square of successive RR
interval differences) and SDNN (standard deviation of the normal-to-normal
intervals). Parameters like RMSSD are indicators of high frequency changes affecting
heart rate and thus reflect the state of parasympathetic activity. The shortcoming of
time domain measurements is that they are very prone to outliers and artifacts. Hence,
these artifacts must be eliminated before the data is analysed.
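The time domain parameters named above can be computed directly from the RR series. The following is a minimal sketch assuming that R-peak times are already available in seconds and that artifacts have been removed; the function names are hypothetical, SDNN is taken here as the sample standard deviation, and the mean heart rate is derived from the mean RR interval.

```python
import math


def rr_intervals_ms(r_peak_times_s):
    """Convert R-peak times (seconds) into successive RR intervals (milliseconds)."""
    return [(t1 - t0) * 1000.0 for t0, t1 in zip(r_peak_times_s, r_peak_times_s[1:])]


def time_domain_features(rr_ms):
    """Mean heart rate (beats/min), SDNN (ms) and RMSSD (ms) of an RR series."""
    if len(rr_ms) < 3:
        raise ValueError("need at least three RR intervals")
    n = len(rr_ms)
    mean_rr = sum(rr_ms) / n

    # SDNN: sample standard deviation of all RR (NN) intervals
    sdnn = math.sqrt(sum((x - mean_rr) ** 2 for x in rr_ms) / (n - 1))

    # RMSSD: root mean square of the differences between successive intervals
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))

    # Heart rate corresponding to the mean RR interval (60000 ms per minute)
    mean_hr = 60000.0 / mean_rr
    return {"mean_hr_bpm": mean_hr, "sdnn_ms": sdnn, "rmssd_ms": rmssd}
```

Because RMSSD depends only on beat-to-beat differences, a single ectopic beat or missed R peak changes it noticeably, which illustrates why artifact removal has to precede the analysis.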

4.2 Frequency Domain Methods

Frequency domain measures analyse all available frequency components present in
the HRV. Power spectral density (PSD) can give valuable information about the
neurogenic heart rhythms [14]. The high frequency region (0.15–0.5 Hz) is an indicator
of the parasympathetic activity, while the low frequency region (0.04–0.15 Hz)
indicates the combined sympathetic and parasympathetic activities. The fast Fourier
transform (FFT) is generally used for the estimation of PSD. The autoregressive (AR)
model is another popular frequency domain representation well suited to the analysis
of biosignals like ECG and EEG. The reliability of frequency domain based methods
decreases as the signal-to-noise ratio decreases.
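The band powers described above can be estimated with a plain periodogram. The sketch below is only illustrative: it assumes RR intervals in seconds, resamples them to a uniform 4 Hz grid by linear interpolation (an assumed preprocessing choice) and evaluates the discrete Fourier transform directly instead of using an FFT library. The powers are correct up to a constant factor, which is sufficient for comparing the two bands. The function names and the synthetic test signal are hypothetical.

```python
import cmath
import math


def synthetic_rr(freq_hz, n_beats=120, mean_rr=0.8, depth=0.05):
    """Hypothetical test signal: RR intervals (s) modulated at freq_hz."""
    rr, t = [], 0.0
    for _ in range(n_beats):
        interval = mean_rr + depth * math.sin(2 * math.pi * freq_hz * t)
        rr.append(interval)
        t += interval
    return rr


def resample_rr(rr_s, fs=4.0):
    """Linearly interpolate an RR series (seconds) onto a uniform grid of fs Hz."""
    times, t = [], 0.0
    for rr in rr_s:                      # beat times are cumulative sums of RR
        t += rr
        times.append(t)
    n = int((times[-1] - times[0]) * fs) + 1
    out, j = [], 0
    for i in range(n):
        ti = times[0] + i / fs
        while j < len(times) - 2 and times[j + 1] < ti:
            j += 1
        frac = (ti - times[j]) / (times[j + 1] - times[j])
        out.append(rr_s[j] + frac * (rr_s[j + 1] - rr_s[j]))
    return out


def band_powers(rr_s, fs=4.0, lf=(0.04, 0.15), hf=(0.15, 0.5)):
    """Periodogram power in the LF and HF bands (up to a constant factor)."""
    x = resample_rr(rr_s, fs)
    n = len(x)
    mean = sum(x) / n
    x = [v - mean for v in x]            # remove the DC component
    powers = {"lf": 0.0, "hf": 0.0}
    for k in range(1, n // 2 + 1):
        f = k * fs / n                   # frequency of the k-th DFT bin
        band = "lf" if lf[0] <= f < lf[1] else "hf" if hf[0] <= f < hf[1] else None
        if band is None:
            continue
        coeff = sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(x))
        powers[band] += abs(coeff) ** 2 / (n * n)
    return powers
```

With the synthetic signal, a modulation at 0.25 Hz (a typical respiratory frequency) places most of the power in the HF band, while a modulation at 0.1 Hz places it in the LF band.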

4.3 Wavelet Transform

The traditional frequency domain techniques cannot provide exact time
localization in a typical nonstationary biosignal. To overcome this limitation, better
techniques were developed. Wavelet analysis, which shows very good performance,
involves comparing the signal with a selected wavelet of limited duration and
finding the corresponding parameters. HRV analysis can thus be effectively performed
using the wavelet transform, which can also be used to obtain the time-related
information of various frequency bands [15].

4.4 Nonlinear Methods

Nonlinear methods are much suited for analysing the nonlinear and nonstationary
biosignals like ECG. Some of the important nonlinear parameters used for HRV
analysis are approximate entropy (ApEn), higher order spectrum (HOS), detrended
fluctuation analysis (DFA), correlation dimension (CD), recurrence quantification
analysis (RQA) features and empirical mode decomposition (EMD) features.

4.4.1 Detrended Fluctuation Analysis (DFA)

DFA (Peng et al.) is very useful in assessing the fractal scaling characteristics of
HRV data [16]. The fluctuation inherent in the data is represented by the parameter α
(which indicates the irregularity of the input data). Typically, α is closer to 1 for
normal (young and healthy) people. α varies according to different cardiac disorders.
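The scaling exponent α can be estimated as follows. This is a minimal sketch of first-order DFA under stated assumptions: the series is integrated after mean removal, split into non-overlapping boxes, linearly detrended in each box, and α is taken as the slope of log F(n) against log n. The box sizes and function names are illustrative choices, not values from the cited work.

```python
import math


def _linfit(xs, ys):
    """Least-squares slope and intercept of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def dfa_alpha(series, box_sizes=(4, 8, 16, 32, 64)):
    """Estimate the DFA scaling exponent alpha of a series."""
    mean = sum(series) / len(series)
    # Step 1: integrate the mean-subtracted series (the "profile")
    profile, acc = [], 0.0
    for v in series:
        acc += v - mean
        profile.append(acc)

    log_n, log_f = [], []
    for n in box_sizes:
        n_boxes = len(profile) // n
        if n_boxes < 2:
            continue
        xs = list(range(n))
        sq_sum = 0.0
        # Step 2: remove the local linear trend in every box
        for b in range(n_boxes):
            seg = profile[b * n:(b + 1) * n]
            slope, icpt = _linfit(xs, seg)
            sq_sum += sum((y - (slope * x + icpt)) ** 2 for x, y in zip(xs, seg))
        # Step 3: root-mean-square fluctuation for this box size
        f_n = math.sqrt(sq_sum / (n_boxes * n))
        log_n.append(math.log(n))
        log_f.append(math.log(f_n))

    # Step 4: alpha is the slope of log F(n) versus log n
    alpha, _ = _linfit(log_n, log_f)
    return alpha
```

As a sanity check, uncorrelated white noise gives α close to 0.5 and a random walk gives α close to 1.5; the value near 1 reported for healthy subjects lies between these two extremes.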

4.4.2 Correlation Dimension (CD)

CD is a nonlinear feature which can be effectively used for detecting anomalies. CD
is a type of fractal dimension. A popular technique for finding CD (proposed by
Grassberger et al.) constructs a function C(r) by finding the distances among all
data points and then grouping them [17].
CD is given by the expression

$$CD = \lim_{r \to 0} \frac{\log[C(r)]}{\log(r)} \qquad (1)$$

Normal people produce a higher CD value than diabetic patients because
the normal RR signal has higher RR variability.
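In practice the limit in Eq. (1) is approximated by the slope of log C(r) against log r over a range of small radii. The sketch below assumes a time-delay embedding of the series and the maximum norm as distance; the embedding dimension, delay, radii and function names are illustrative assumptions rather than the settings of the cited work.

```python
import math


def embed(series, dim=2, delay=1):
    """Time-delay embedding: each point is (x[i], x[i+delay], ...)."""
    n = len(series) - (dim - 1) * delay
    return [tuple(series[i + k * delay] for k in range(dim)) for i in range(n)]


def correlation_sum(points, r):
    """C(r): fraction of point pairs closer than r (maximum norm)."""
    n = len(points)
    close = 0
    for i in range(n):
        for j in range(i + 1, n):
            if max(abs(a - b) for a, b in zip(points[i], points[j])) < r:
                close += 1
    return 2.0 * close / (n * (n - 1))


def correlation_dimension(series, radii, dim=2, delay=1):
    """Slope of log C(r) versus log r, a finite-r estimate of Eq. (1)."""
    points = embed(series, dim, delay)
    xs, ys = [], []
    for r in radii:
        c = correlation_sum(points, r)
        if c > 0:                         # log is undefined for empty neighbourhoods
            xs.append(math.log(r))
            ys.append(math.log(c))
    if len(xs) < 2:
        raise ValueError("need at least two radii with non-zero C(r)")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx
```

For points lying on a straight line the estimate is close to 1, as expected for a one-dimensional set. The pairwise distance computation is quadratic in the number of points, so long recordings are usually analysed in segments.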

4.4.3 Approximate Entropy (ApEn)

ApEn is a measure of disorder in HR signal [18]. The value of ApEn is larger for more
complex or irregular data (the normal case) and vice versa for cardiac impairment
(diabetic) cases.
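A minimal sketch of the ApEn computation is given below. It assumes the usual parameters of template length m = 2 and tolerance r = 0.2 times the standard deviation of the series, which are common defaults and not necessarily the values used in the cited studies; the function name is hypothetical.

```python
import math


def approximate_entropy(series, m=2, r=None):
    """Approximate entropy ApEn(m, r) of a series (self-matches are counted)."""
    n = len(series)
    if r is None:
        mean = sum(series) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in series) / n)
        r = 0.2 * sd                      # assumed default tolerance

    def phi(length):
        templates = [series[i:i + length] for i in range(n - length + 1)]
        total = 0.0
        for a in templates:
            # count templates within tolerance r of template a (maximum norm)
            count = sum(
                1 for b in templates
                if max(abs(x - y) for x, y in zip(a, b)) <= r
            )
            total += math.log(count / len(templates))
        return total / len(templates)

    return phi(m) - phi(m + 1)
```

A strictly alternating series yields an ApEn close to zero, whereas an irregular series yields a clearly larger value, in line with the interpretation given above.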

4.4.4 Recurrence Quantification Analysis (RQA)

The recurrence plot (by Eckmann et al.) is a graphical aid to identify concealed
recurrences in a time domain signal which may not be pronounced [19]. It measures the
nonstationarity of the time series. Several important parameters can be calculated
from the recurrence plot. Examples of these parameters are laminarity (LAM), mean
diagonal line length, recurrence rate (RR), determinism (DET), entropy and trapping
time (TT).

4.4.5 Higher Order Spectrum (HOS)

HOS is very useful in the dynamical analysis of nonlinear, nonstationary and non-
gaussian biosignals. HOS (also called polyspectra) represents the cumulants and
moments of order three and above. HOS can be effectively used for the analysis of
HRV signals. Several useful HOS features can be extracted from HRV data and fed
to different classifiers for the purpose of diabetes detection.

4.4.6 Empirical Mode Decomposition (EMD)

EMD splits the input signal into intrinsic mode functions (IMFs). The IMF-generated
features are well suited to capturing the nonlinearity and nonstationarity
characteristics of biosignals like HRV.

5 Methodology of Deep Learning Techniques

A variety of time, frequency, wavelet and nonlinear features along with classifiers
have been used for detecting diabetes in previous works. Our focus in this
chapter is on deep learning. Deep learning is a refinement of machine learning
that is particularly suited to high dimensional data and to complex artificial
intelligence problems. The shortcomings of machine learning led to the development of
deep learning [20].
All the explicit feature-related processes found in conventional machine learning
networks are implicitly performed in deep learning networks. Deep networks
self-learn from the data and their efficiency is much better compared to the traditional
feature extraction networks.
Deep learning networks use cascaded layers of nonlinear processing units. These
units do the task of feature extraction and transformation. The output of one unit is
fed as input to the succeeding unit. The learning can be performed in a supervised or
unsupervised manner. The networks are normally trained with some kind of gradient
descent method using back propagation. Popular deep learning networks are briefly
explained below.

5.1 Autoencoder (AE)

AE is a type of neural network using unsupervised learning techniques and back
propagation methods. Its target values are set to be equal to the inputs [21]. AE is
built up of two symmetrical deep networks (typically four or five layers deep), one
for encoding and the other for decoding. AE is thus implemented very similarly
to conventional neural networks, except that its goal is to recreate the
input by learning the input data [22, 23].

5.2 Convolutional Neural Network (CNN)

CNN is a modified multilayer perceptron (MLP) employing the convolution operation in
one of its layers. CNN is basically built of three types of layers: a convolutional
layer followed by pooling and fully connected layers. CNN resembles neural networks in
many of its characteristics. In a conventional neural network, y = f(x·w), where x and
y denote the input vector and output vector respectively and w the set of weights. But
in the convolutional layer of CNN, y = f(s(x·w)), where s indicates the convolution
operation between inputs and weights. CNN can be applied to time series input
data (1D) or to an image (2D).
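The relation y = f(s(x·w)) for a 1D input can be illustrated with a few lines of code. The sketch below assumes that f is the ReLU activation (the text does not fix a particular f) and uses the sliding dot product without kernel flipping, which is what deep learning libraries conventionally call convolution. The function names are hypothetical.

```python
def conv1d(x, w, bias=0.0, stride=1):
    """Slide the kernel w over x and take the dot product at each position."""
    k = len(w)
    return [
        sum(x[i + j] * w[j] for j in range(k)) + bias
        for i in range(0, len(x) - k + 1, stride)
    ]


def relu(values):
    """Assumed activation f: keep positive values, set negative values to zero."""
    return [max(0.0, v) for v in values]


def max_pool1d(x, pool_size=2):
    """Keep the maximum of each non-overlapping window of pool_size samples."""
    return [
        max(x[i:i + pool_size])
        for i in range(0, len(x) - pool_size + 1, pool_size)
    ]


def conv_layer(x, w, bias=0.0):
    """y = f(s(x, w)): convolution followed by the activation."""
    return relu(conv1d(x, w, bias))
```

With x = [1, 2, 3, 4, 5] and the difference kernel w = [-1, 0, 1], every output of the convolution equals 2, so the layer responds to the constant upward slope of the input regardless of position.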

5.3 Recurrent Structures (RNN, LSTM and GRU)

5.3.1 Recurrent Neural Network (RNN)

RNN is an improvement on the feedforward network. RNNs contain feedback loops
(Fig. 4) which serve as short-term memory with which past information (on the time
scale) can be stored and retrieved. Temporal tasks can be adeptly executed with this
modification. Unlike MLP, there is no constraint on the permitted length of temporal
sequences in RNN. Parameters can also be shared across time-steps in RNN. The
storage of an RNN can also be replaced by another model incorporating feedback loops;
such controlled states are named gated memory. RNN is widely used in the
areas of speech recognition, language modelling and machine translation.
The cyclic connections present in the RNN architecture make it difficult to understand
the working of RNNs in their entirety. For better understanding and analysis,
the intricate network structure of an RNN can be converted to feedforward network
(FFN) form by unfolding it on the time scale (Fig. 4).
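Unfolding in time can be made concrete with a single recurrent unit. The sketch below assumes the simple recurrence h_t = tanh(w_x·x_t + w_h·h_(t-1) + b) with scalar weights, which is one common formulation chosen here only for illustration; the loop over time steps is exactly the unfolded feedforward form, with the same weights reused at every step.

```python
import math


def rnn_forward(xs, w_x, w_h, b=0.0, h0=0.0):
    """Run a single-unit recurrent network over a sequence and return all states.

    Each loop iteration corresponds to one copy of the unit in the unfolded
    network; the weights w_x, w_h and b are shared across all time steps.
    """
    h, states = h0, []
    for x in xs:
        h = math.tanh(w_x * x + w_h * h + b)   # new state depends on the previous state
        states.append(h)
    return states
```

For the input sequence [1, 0] with w_x = w_h = 1, the second state is tanh(tanh(1)), which is non-zero although the second input is zero: the feedback loop has carried information from the first time step. Setting w_h = 0 removes this memory and the second state becomes zero.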

5.3.2 Long Short-Term Memory (LSTM)

LSTM (Hochreiter et al.) is an enhanced model of RNN, developed in order to model
long-range dependencies of temporal sequences more accurately than conventional
RNNs [24]. LSTM contains memory blocks in place of the simple memory units of RNN
(Fig. 5). This property has made LSTM widely used in complex tasks like language
modelling. Generally, it is widely used in areas where analysis of long time series
data is required.
A memory block in LSTM can be considered as a complex processing centre built of
memory cells. The input and output gates are multiplicative gates which can permit
or block the flow of cell activation through the memory unit to nodes further along
the path. A set of modifiable multiplicative gates manages all the processes
happening in the memory block. Peephole connections and the forget gate were added
to the LSTM architecture as research progressed. The forget gate can be
used in place of the CEC (constant error carousel). The three gates (input, output and
forget) also assist the memory cell in storing information across many time steps.

Fig. 4 Schema of RNN and unfolded RNN in time (t = 1, t = 2) in onward path

Fig. 5 Memory blocks in RNN (left) and LSTM (right)

5.3.3 Gated Recurrent Unit (GRU)

GRU is a variant of LSTM with fewer parameters. GRU enables
each recurrent unit to capture dependencies corresponding to different time scales
in an adaptive manner. GRU has gating units that modulate the flow of information
inside its memory but, unlike LSTM, it does not have separate memory cells. The
memory consumption and computational cost of GRU are lower than those of
LSTM.
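The difference in parameter count can be worked out from the number of weight sets in each unit. The sketch below assumes the standard textbook formulations, in which an LSTM layer has four sets of input weights, recurrent weights and biases (three gates plus the cell candidate) and a GRU layer has three (two gates plus the candidate state); some framework implementations add extra bias terms, so their reported counts can differ slightly. The function names are hypothetical.

```python
def recurrent_block_params(input_dim, hidden_units):
    """Parameters of one weight set: input weights, recurrent weights and biases."""
    return hidden_units * (input_dim + hidden_units) + hidden_units


def lstm_params(input_dim, hidden_units):
    """LSTM: input, forget and output gates plus the cell candidate (4 sets)."""
    return 4 * recurrent_block_params(input_dim, hidden_units)


def gru_params(input_dim, hidden_units):
    """GRU: update and reset gates plus the candidate state (3 sets)."""
    return 3 * recurrent_block_params(input_dim, hidden_units)
```

For a one-dimensional input and 64 hidden units this gives 16896 parameters for the LSTM and 12672 for the GRU, that is, the GRU needs three quarters of the LSTM parameters under these assumptions.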

5.4 Hybrid of CNN-RNN, CNN-LSTM, CNN-GRU

A hybrid deep neural network, in general, is a fusion of generative and discriminative
neural networks so that the advantages of both can be combined effectively. Hybrid
deep learning networks can be built by cascading heterogeneous networks, as in
CNN-LSTM. CNN extracts the spatial features and LSTM extracts the sequential
information. This means that CNN-LSTM collectively helps to extract the
spatio-temporal information of signals like ECG (the details of experimental analysis
and topology of work using CNN and CNN-LSTM are explained in Sect. 7).
In the case of hybrid architectures like CNN-LSTM, the CNN is made up of
convolutional1D and maxpooling1D layers alone. The maxpooling layer’s output is
passed as input to the subsequent network.

$$y_i = \mathrm{CNN}(x_i) \qquad (2)$$

The input and output of the CNN are x_i and y_i respectively. Each input x_i has
an associated class label. y_i is the output vector of the maxpooling layer in the
CNN. y_i is fed to the next deep learning network placed after the CNN. This deep
learning network can be an RNN, LSTM or GRU.

6 Literature Survey

6.1 Earlier Methods of Analysis of HRV Signals

HRV signals were earlier analysed using the time, frequency and nonlinear
parameters described above. Evidence suggests that the heart does not oscillate
periodically under normal conditions [25]. Thus, nonlinear techniques, capable of
extracting and analysing nonlinear features from HRV signals, are also widely used.
Nonlinear features like the Lyapunov exponent (Rosenstein et al.), 1/f slope
(Kobayashi et al.), approximate entropy (ApEn) (Pincus) and detrended fluctuation
analysis (DFA) (Peng et al.) can be extracted from the HRV signals for further
analysis [16, 18, 26, 27]. The range of the feature values gives an indication of the
possible anomaly. HRV signal classification is also done by nonlinear techniques
[28, 29]. Nonlinear techniques are employed in cardiac signal analysis for developing
cardiac arrhythmia detection algorithms [30, 31].

6.2 Previous Works of Diabetes Detection Using Heart Rate (Including Machine Learning Based)

Wheeler et al. first reported that diabetic neuropathy causes a reduced beat-to-beat
variation during deep breathing [32]. The works of Pfeifer, Singh and Villareal
confirmed that parasympathetic autonomic activity was reduced in diabetes-affected
people well before neuropathic symptoms became clinically visible [5, 33, 34].
Researchers have found that diabetes patients who produced negative results
after undergoing traditional cardiac function tests showed a decreased HRV.
Correlation between fasting blood sugar and cardiovascular complications has been
clearly established by many works [35, 36]. About one-fourth of the patients with
serious coronary disorder turned out to be diabetic patients too [37, 38]. This is
because diabetes results in early development of coronary disease and atherosclerosis.
All these results indicate that HRV analysis can be used to identify diabetes.
Diabetes-induced CAN can be very damaging. Hence, early detection of CAN
due to diabetes is very important. Ahsan et al. showed that HRV analysis using features
like sample entropy (SampEn) and Poincaré plots is very useful in detecting CAN
present in diabetic people [39].
Kirvela et al. performed frequency and time domain analysis of HRV (extracted
from 24 h duration ECG recordings) [40]. All analysis parameters (both time and
frequency) were significantly reduced in diabetic HRV samples compared to those
from normal people. Mackay measured heart rate variation at different levels of
breathing modes for normal and diabetic patients. It was observed that heart rate
variation was markedly lower in diabetic people [41].
Jelinek et al. researched the consequences of QT dispersion in normal and
diabetic people, ensuring that people belonging to both classes had no previous
history of cardiac diseases [42]. Heart rate variability was measured through a
parameter named tone-entropy (T-E), where tone (T) represents the sympatho-vagal
balance and entropy (E) represents the autonomic regularity. T-E was
observed to be reduced in diabetic people. On a similar group of people under similar
conditions, Awdah et al. observed that time domain parameters like St. George
index, RMSSD, SDRR etc. were reduced in diabetic cases in comparison to normal
cases [43]. Chemla et al. used the method of autoregressive frequency modelling for
studying HRV signals in diabetes-affected people [44].
Schroeder et al. found that the time domain parameters RMSSD, SDNN and RR
interval were lower in diabetic people. They also observed that as diabetes progresses,
the proper autonomic function of the body is badly affected [45]. Seyd et al. did
time and frequency analysis of HRV [46]. Time domain parameters like mean RR
interval, TINN, RMSSD, SDNN, NN50 count and HRV triangular index were lower
in diabetic patients than in normal people. Frequency domain analysis showed a
considerable difference in power across different frequency ranges between diabetic
people and normal people.
Trunkvalterova et al. showed that multiscale entropy (MSE) is capable of detecting
very small aberrations in the cardiovascular systems of patients having type 1
diabetes. In their work, they used the SampEn estimator and linear measures
like RMSSD [47]. Faust et al. analysed time, frequency and nonlinear features
derived from HRV signals and showed that nonlinear methods gave better results in
the diagnosis of diabetes compared to time domain and frequency domain methods
[48]. Jian et al. applied principal component analysis (PCA) to HOS bispectrum
magnitude plots obtained from HRV signals. These were fed to an SVM classifier to
obtain a diabetes detection accuracy of 79.93% [49].
Acharya et al. arrived at an innovative diabetic integrated index (DII) making use
of nonlinear features derived from the HRV signal [50]. They obtained a diabetes
detection accuracy of 86% using an AdaBoost classifier. Swapna et al. used HOS based
features for diabetes detection with an accuracy of 90.5% [51]. Acharya et al. obtained
an accuracy of 90% by extracting four nonlinear features and using an AdaBoost
classifier [52]. Acharya et al. used entropies, energy, skewness and kurtosis to achieve
a diabetes detection accuracy of 92.02% employing a decision tree (DT) classifier [53].
Pachori et al. used EMD on HRV signals along with a Morlet wavelet kernel function to
achieve the very high accuracy of 95.63% [54]. Table 3 summarises all the above works.

Table 3 A summary of machine learning methods used for detecting HRV parameters that were
significantly different in diabetic patients (DM = Diabetes Mellitus)

Authors                     Methods/features                              Observed activity of extracted features for DM
Pfeifer et al. [5]          Time domain
Kirvela et al. [40]         Frequency domain, time domain                 HRV reduced
Singh et al. [33]           Frequency domain, time domain                 Reduced LF power
Awdah et al. [43]           Time domain                                   Reduced
Flynn et al. [55]           DFA                                           Reduced short-term correlation in DM
Chemla et al. [44]          FFT, autoregressive spectral analysis         Decreased
Schroeder et al. [45]       Time domain                                   Decreased
Seyd et al. [46]            Time, frequency domain                        Decreased
Trunkvalterova et al. [47]  Nonlinear methods (multiscale entropy (MSE))  Decreased MSE
Faust et al. [48]           Time, frequency, nonlinear                    Decreased
Acharya et al. [50]         Nonlinear (RQA, CD)                           Accuracy is 86%
Swapna et al. [51]          HOS                                           Accuracy is 90.5%
Jian et al. [49]            HOS                                           Accuracy is 79.93%
Acharya et al. [52]         Nonlinear features                            Accuracy is 90.0%
Acharya et al. [53]         DWT                                           Accuracy is 92.02%
Pachori et al. [54]         EMD related features                          Accuracy is 95.63%

6.3 Deep Learning Based Diabetes Detection Works Using HRV

The following are some of the works connecting deep learning analysis methods and ECG.
CNN based deep learning methods were used to analyse ECG to detect coronary
artery disease (Acharya et al.) and myocardial infarction (Acharya et al.) and to
classify heartbeats (Acharya et al.) [56–58]. Sujadevi et al. analysed ECG to detect
atrial fibrillation [59].
Regarding diabetes detection using ECG signals, Swapna et al. employed a hybrid
deep learning CNN-LSTM network with HRV as input to achieve a very high accuracy
of 95.1%, which is comparable to the maximum accuracy achieved so far [60].
Swapna et al. improved this diabetes detection accuracy to 95.7% by adding an
SVM classifier after the CNN-LSTM network [61]. Accuracy details are given in
Table 4.

Table 4 Deep learning methods used for diabetes detection (with HRV as input)

Authors             Methods/features                           Accuracy
Swapna et al. [60]  Deep learning (CNN-LSTM)                   95.1%
Swapna et al. [61]  Deep learning (CNN-LSTM) followed by SVM   95.7%

7 Architecture and Implementation of Deep Learning Architecture: Sample Study

Fig. 6 The architecture of proposed system of [60, 61] (with and without SVM)

The hybrid architecture for diabetes detection is discussed in detail in [60, 61]. The
workflow of the hybrid architecture is shown in Fig. 6. In [60, 61], the deep learning
architecture is implemented using the TensorFlow software framework [62]. TensorFlow
is Google’s open-source software library. TensorFlow allows modelling of numerical
computations as unified data flow graphs, which in turn are expressed as mathematical
operations using tensors, nodes and edges. Heterogeneous platforms like CPUs, GPUs
and mobile devices can be used for performing the computations.
Regarding the work of Swapna et al. [60], the following network structure was
implemented. The input layer was made of 1000 neurons (the number of samples in
the input data set was 1000). The values of the input data were normalized to fall
between 0 and 1. The hidden layers consisted of a CNN layer (pool size 2, stride 1,
64 filters, kernel size 3), followed by max-pooling, flatten and drop-out (0.5) layers,
and ended with a fully connected layer with a sigmoid activation function. There was
full connectivity between the input, hidden and output layers. Three trials were run
for 300 epochs (learning rate 0.001, batch size 16). The final values of the above
hyperparameters were fixed after experimenting with different values and choosing
those that gave the best performance of the deep learning network. These
hyperparameters correspond to the first network architecture we tried, CNN1 (one
CNN layer in the topology). The number of CNN layers was then increased one by one
up to five, after which we moved to the hybrid architecture by attaching an LSTM to
each of the five configurations. The maximum accuracy of diabetes detection, 95.1%,
was obtained with CNN5-LSTM.
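As a rough check on the CNN1 configuration described above, the layer output sizes
and trainable parameter counts can be worked out by hand. The sketch below is only
an illustration and assumes details that the text does not state: a single input
channel, no padding, a pooling stride equal to the pool size, and a single sigmoid
output unit. The function names are hypothetical and are not taken from [60].

```python
def conv1d_out_len(n, kernel, stride=1):
    """Output length of a 1D convolution without padding (assumption)."""
    return (n - kernel) // stride + 1


def pool_out_len(n, pool, stride=None):
    """Output length of 1D max-pooling; stride defaults to the pool size (assumption)."""
    stride = pool if stride is None else stride
    return (n - pool) // stride + 1


def cnn1_summary(input_len=1000, filters=64, kernel=3, conv_stride=1,
                 pool=2, in_channels=1, out_units=1):
    """Layer sizes and parameter counts for a CNN1-like stack:
    conv -> max-pool -> flatten -> drop-out -> fully connected (sigmoid)."""
    conv_len = conv1d_out_len(input_len, kernel, conv_stride)
    # Each filter has kernel * in_channels weights plus one bias.
    conv_params = (kernel * in_channels + 1) * filters
    pool_len = pool_out_len(conv_len, pool)
    flat_features = pool_len * filters
    # Drop-out has no trainable parameters; the dense layer has one weight
    # per flattened feature plus one bias for each output unit.
    dense_params = (flat_features + 1) * out_units
    return {
        "conv_len": conv_len,
        "conv_params": conv_params,
        "pool_len": pool_len,
        "flat_features": flat_features,
        "dense_params": dense_params,
        "total_params": conv_params + dense_params,
    }


summary = cnn1_summary()
for name, value in summary.items():
    print(f"{name}: {value}")
```

Under these assumptions almost all trainable parameters sit in the final fully
connected layer, which is one reason a drop-out layer is commonly placed before it.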
In the second work of Swapna et al. [61], the same configuration as in the above
work was used, with the modification that the features extracted by the
CNN/CNN-LSTM architecture were passed to an SVM classifier. This improved
the accuracy of diabetes detection. Figure 6 compares the network topologies
of the two works [60, 61].

8 Deep Learning in Big Data Analysis: Limitations and Challenges

The amount of data being handled today is growing to unimaginable proportions, in
size as well as in dimensions; applications like Twitter and Facebook are cases in
point. In the biomedical area, if data is taken continuously from patients in real
time, the collected data can be viewed as big data, and big data analytics can play
a remarkable role. For big data analytics, the traditional machine learning
based analysis techniques are inadequate.
In big data, the volume of data handled is very high. In the vertical dimension,
volume is the number of records or samples present in the dataset; in the horizontal
dimension, it is the number of features or parameters. This data volume explosion
has brought with it huge challenges in analysing the data. The time and memory
needed for computations can increase exponentially with dataset size.
The solution to this challenge is to develop architectures capable of parallel
processing of data. Another issue is that, as the data volume is very high, it may
not be possible to store the entire data in memory or on disk. Many training/testing
algorithms are designed assuming that the data is available in its entirety in
memory, so such algorithms cannot be run successfully. This is known as the curse of
modularity. Distributed computing and parallelization can be used to tackle this
challenge. Further, there are the challenges of high dimensionality of the data,
its highly diverse nature, and high variation in the probability of occurrence of
classes (class imbalance), which, if not handled, will degrade the performance of
the machine learning network. In machine learning, proper selection of features
using domain knowledge is crucial. As the dataset grows in dimension as well as in
sample size, it becomes extremely difficult to create relevant features. Feature
selection is also very difficult in high dimensional data. These issues in handling
and analysing big data have led deep learning networks to take the stage in place
of traditional machine learning networks.
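One common way to work with data that cannot be held in memory at once is to read
it in chunks and keep only running summaries. The sketch below illustrates the idea
with Welford's online algorithm for the mean and variance. It is not a method taken
from [60, 61]; the class and function names are hypothetical, and the list of values
stands in for a data source that would normally be read from disk or a network
stream.

```python
class RunningStats:
    """Mean and variance of a stream, updated one value at a time (Welford)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def variance(self):
        """Population variance of the values seen so far."""
        return self.m2 / self.n if self.n else 0.0


def chunks(values, size):
    """Yield consecutive chunks; a stand-in for reading blocks from storage."""
    for start in range(0, len(values), size):
        yield values[start:start + size]


def summarize(chunk_iter):
    """Consume chunks one by one without ever holding the full dataset."""
    stats = RunningStats()
    for chunk in chunk_iter:
        for x in chunk:
            stats.update(x)
    return stats


rr_like_values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up numbers
stats = summarize(chunks(rr_like_values, size=3))
print(stats.n, stats.mean, stats.variance())
```

Only three numbers are kept in memory regardless of how long the stream is, and
separate workers could each summarize their own part of the data in the same way.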
Concerning the application of deep learning techniques to ECG-derived HRV data
for diabetes detection, the best performing models [60, 61] were applied to
real-time data, and these works can be considered a foundation stone for
future work in this direction. Further improvements in accuracy can be attempted by
feeding larger input datasets into the developed architecture than those used in
the above works.
Present-day advanced ECG measurement equipment takes very little time (less than
5 min) to acquire an ECG signal for analysis. On the other hand, Holter monitors
record the ECG signal of a person continuously (for at least 24–48 h)
to check for possible abnormalities that cannot be detected by short-term ECG
monitoring. Machine learning techniques are sufficient to handle short-term ECG
data. Deep learning networks and algorithms are also suitable for relatively
short-term data, considering that analysis results can be obtained very quickly in
real time. The second case, analysis of a large amount of data (a continuous ECG
signal lasting more than 24 h, say from Holter monitors), requires big
data analytics and deep learning algorithms. If long-duration ECG data are made
available to researchers, deep learning architectures like LSTM and hybrid systems
like CNN-LSTM are capable of analysing this non-invasive data for the
future possibility of a person being affected by diabetes. Hence, if real-time big
data is made available to deep learning networks, the focus will shift quickly from
detecting a disease to predicting it in the near future.

9 Conclusion

The body of a person affected by diabetes is either incapable of producing
sufficient insulin or resistant to the insulin produced, leading to abnormally high
blood sugar. Autonomic impairments, which are asymptomatic and can only be detected
clinically, become evident only many years after the onset of diabetes.
Thus, HRV can be used as an early sign of impending diabetic neuropathy and
can be used for diabetes detection with high accuracy. HRV analysis is thus a simple,
non-invasive and reproducible method of detecting diabetes. Deep learning methods
can be used to detect diabetes with very high accuracy. Distributed deep learning
systems can deliver results fast enough to make real-time analysis of biosignals a
reality. It can therefore be said that the future of biomedical engineering belongs
to featureless, deep learning based systems that can perform big data analytics with
no need for domain knowledge.

References

1. Ralston, S.H., Penman, I.D., Strachan, M.W., Hobson, R.P.: Davidson’s Principles and Practice
of Medicine, 23rd edn. Elsevier
2. Viktor, S., Steven, I., Marina, D.I., Aleksander, N., Vojislava, M.: Impact of diabetes on heart
rate variability and left ventricular function in patients after myocardial infarction. Facta Univ.
Ser.: Med. Biol. 12(3), 130–134 (2005)
3. Di Carli, M.F., Janisse, J., Grunberger, G., Ager, J.: Role chronic hyperglycemia in the patho-
genesis of coronary microvascular dysfunction in diabetes. J. Am. Coll. Cardiol. 41, 1387–1393
(2003)
4. Gresele, P., Guglielmini, G., Deangelis, M., et al.: Acute short-term hyperglycemia enhances
heart stress-induced platelet activation in patients with type 2 diabetes mellitus. J. Am. Coll.
Cardiol. 41, 1013–1020 (2003)
5. Pfiefer, M.A., Cook, D., Brodsky, J., Tice, D., Reenan, A., Swedine, S., et al.: Quantitative
evaluation of cardiac parasympathetic activity in normal and diabetic man. Diabetes 339–345
(1982)
6. Sawicki, P.T., Dahne, R., Bender, R., Berger, M.: Prolonged QT interval as a predictor of
mortality in diabetic nephropathy. Diabetologia 39(1), 77–81 (1996)
7. Okin, P.M., Devereaux, R.B., Howard, B.V., Welty, T.K.: Assessment of QT interval and QT
dispersion for prediction of all-cause mortality and cardiovascular mortality in American Indi-
ans: the Strong Heart Study. Circulation 101, 61–66 (2000)
8. Barrett, K.E., Barman, M.S., Boitano, S., Brooks, H.: Ganong’s Review of Medical Physiology.
McGraw-Hill Companies
9. Stern, S., Sclarowsky, S.: The ECG in diabetes mellitus. Am. Heart Assoc. (AHA) J. (2009)
10. Sokolow, M., Mcllroy, M.B., Chiethin, M.D.: Clinical Cardiology. VLANGE Medical Book
(1990)
11. Constant, I., Laude, D., Murat, I., Elghozi, J.L.: Pulse rate variability is not a surrogate for
heart rate variability. Clin. Sci. 97, 391–397 (1999)
12. Kleiger, R.E., Bigger, J.T., Bosner, M.S., Chung, M.K., Cook, J.R., Rolnitzky, L.M., et al.:
Stability over time of variables measuring heart rate variability in normal subjects. Am. J.
Cardiol. 68, 626–630 (1991)
13. Ge, D., Srinivasan, N., Krishnan, S.M.: Cardiac arrhythmia classification using autoregressive
modeling. Biomed. Eng. Online 1(1), 5 (2002)
14. Akselrod, S., Gordon, D., Madwed, J.B., Snidman, N.C., Shannon, D.C., Cohen, R.J.: Hemo-
dynamic regulation: investigation by spectral analysis. Am. J. Physiol. 249(4 Pt 2), H867–H875
(1985)
15. Gamero, L.G., Vila, J., Palacios, F.: Wavelet transform analysis of heart rate variability during
myocardial ischaemia. Med. Biol. Eng. Comput. 40, 72–78 (2002)
16. Peng, C.K., Havlin, S., Hausdorf, J.M., Mietus, J.E., Stanley, H.E., Goldberger, A.L.: Fractal
mechanisms and heart rate dynamics. J. Electrocardiol. 28(Suppl), 59–64 (1996)
17. Grassberger, P., Procassia, I.: Measuring the strangeness of strange attractors. Phys. D 9,
189–208 (1983)
18. Pincus, S.M.: Approximate entropy as a measure of system complexity. Proc. Natl. Acad. Sci.
U.S.A. 88, 2297–2301 (1991)
19. Eckmann, J.P., Kamphorst, S.O., Ruelle, D.: Recurrence plots of dynamical systems. Europhys.
Lett. 4, 973–977 (1987)
20. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press.
http://www.deeplearningbook.org (2016)
21. Poultney, C., Chopra, S., Cun, Y.L., et al.: Efficient learning of sparse representations with an
energy-based model. In: Advances in Neural Information Processing Systems, pp. 1137–1144
(2006)
22. Hinton, G.E., Salakhutdinov, R.R.: Reducing the dimensionality of data with neural networks.
Science 313(5786), 504–507 (2006)
23. Vincent, P., Larochelle, H., Bengio, Y., Manzagol, P.A.: Extracting and composing robust
features with denoising autoencoders. In: Proceedings of the 25th International Conference on
Machine Learning, pp. 1096–1103. ACM (2008)
24. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780
(1997)
25. Goldberger, A.L., West, B.J.: Application of non-linear dynamics to clinical cardiology. Ann.
N. Y. Acad. Sci. 504, 195–213 (1987)
26. Rosenstien, M., Collins, J.J., De Luca, C.J.: A practical method for calculating largest Lyapunov
exponents from small data sets. Phys. D 65, 117–134 (1993)
27. Kobayashi, M., Musha, T.: 1/f fluctuation of heart beat period. IEEE Trans. Biomed. Eng. 29,
456–457 (1982)
28. Acharya, U.R., Kannathal, N., Krishan, S.M.: Comprehensive analysis of cardiac health using
heart rate signals. Physiol. Meas. J. 25, 1130–1151 (2004)
29. Acharya, U.R., Paul Joseph, K., Kannathal, N., Lim, C.M., Suri, J.S.: Heart rate variability: a
review. Med. Biol. Eng. Comput. 44(12), 1031–1051 (2006)
30. Chua, K.C., Chandran, V., Acharya, U.R., Lim, C.M.: Computer-based analysis of cardiac state
using entropies, recurrence plots and Poincare geometry. J. Med. Eng. Technol. 2(4), 263–272
(2008)
31. Acharya, U.R., Suri, J.S., Spaan, J.A.E., Krisnan, S.M.: Advances in Cardiac Signal Processing.
Springer Verlag GmbH Berlin Heidelberg (2007)
32. Wheeler, T., Watkins, P.J.: Cardiac denervation in diabetes. Br. Med. J. 4, 584–586 (1973)
33. Singh, J.P., Larson, M.G., O’Donell, C.J., Wilson, P.F., Tsuji, H., Lyod-Jones, D.M., Levy, D.:
Association of hyperglycemia with reduced heart rate variability: the Framingham heart study.
Am. J. Cardiol. 86, 309–312 (2000)
34. Villareal, R.P., Liu, B.C., Massumi, A.: Heart rate variability and cardiovascular mortality.
Curr. Atheroscler. Rep. 4(2), 120–127 (2002)
35. Stamler, J., Vaccaro, D., Neaton, J.D., Wentworth, D.: Diabetes, other risk factors, and 12-year
cardiovascular mortality for men screened in the multiple risk factor intervention trial. Diabetes
Care 16, 434–444 (1993)
36. Coutinho, M., Gerstein, H.C., Wang, Y., Yusuf, S.: The relationship between glucose and
incidence cardiovascular events: a meta-regression analysis of published data from 20 studies
of 95783 individuals followed for 12.4 years. Diabetes Care 22, 233–240 (1999)
37. Melchior, T., Kober, L., Madsen, C.R., et al.: Accelerating impact of diabetes mellitus on
mortality in the years following an acute myocardial infarction. Eur. Heart J. 20, 973–978
(1999)
38. Braunwald, E., Antman, E., Beasley, J.W., et al.: ACC/AHA guidelines for the management
of patients with unstable angina and non-ST-segment elevation myocardial infarction. J. Am.
Coll. Cardiol. 36, 970–1062 (2000)
39. Khandoker, A.H., Jelinek, H.F., Palaniswami, M.L: Identifying diabetic patients with cardiac
autonomic neuropathy by heart rate complexity analysis. Biomed. Eng. Online 8, 1–12 (2009)
40. Kirvela, M., Salmela, K., et al.: Heart rate variability in diabetic and non-diabetic renal trans-
plant patients. Acta Anaesthesiol. Scand. 40(7), 804–808 (1996)
41. Mackay, J.D.: Respiratory sinus arrhythmia in diabetic neuropathy. Diabetologia 24(4),
253–256 (1983). https://doi.org/10.1007/BF00282709
42. Jelinek, H.F., Flynn, A., Warner, P.: Automated assessment of cardiovascular disease associated
with diabetes in rural and remote health practice. In: The National SARRAH Conference,
pp. 1–7 (2004)
43. Awdah, A., Nabil, A., Ahmad, S., Reem, Q., Khidir, A.: Time-domain analysis of heart rate
variability in diabetic patients with and without autonomic neuropathy. Ann. Saudi Med. 22,
5–6 (2002)
44. Chemla, D., Young, J., Badilini, F., Maison, B.P., Affres, H., Lecarpentier, Y., Chanson, P.:
Comparison of fast Fourier transform and autoregressive spectral analysis for the study of
heart rate variability in diabetic patients. Int. J. Cardiol. 104(3), 307–313 (2005)
45. Schroeder, E.B., Chambless, L.E., Liao, D., Prineas, R.J., Evans, G.W., Rosamond, W.D., et al.:
Diabetes, glucose, insulin, and heart rate variability: the Atherosclerosis Risk in Communities
(ARIC) study. Diabetes Care 28(3), 668–674 (2005)
46. Seyd, P.T.A., Ahamed, V.T., Jacob, J., Joseph, P.: Time and frequency domain analysis of heart
rate variability and their correlations in diabetes mellitus. World Acad. Sci. Eng. Technol. 2(3)
(2008)
47. Trunkvalterova, Z., Javorka, M., Tonhajzerova, I., Javorkova, J., Lazarova, Z., Javorka, K.,
Baumert, M.: Reduced short-term complexity of heart rate and blood pressure dynamics in
patients with diabetes mellitus type 1: multiscale entropy analysis. J. Physiol. Meas. 29(7)
(2008)
48. Faust, O., Acharya, U.R., Molinari, F., Chattopadhyay, S., Tamura, T.: Linear and non-linear
analysis of cardiac health in diabetic subjects. Biomed. Signal Process. Control 7(3), 295–302
(2012)
49. Jian, L.W., Lim, T.C.: Automated detection of diabetes by means of higher order spectral
features obtained from heart rate signals. J. Med. Imaging Health Inform. 3, 440–447 (2013)
50. Acharya, U.R., Faust, O., VinithaSree, S., Ghista, D.N., Dua, S., Joseph, P., Thajudin, A.V.I.,
Janarthanan, N., Tamura, T.: An integrated diabetic index using heart rate variability signal
features for diagnosis of diabetes. Comput. Methods Biomech. Biomed. Eng. 16, 222–234
(2013)
51. Swapna, G., Acharya, U.R., VinithaSree, S., Suri, J.S.: Automated detection of diabetes using
higher order spectral features extracted from heart rate signals. Intell. Data Anal. 17(2), 309–326
(2013)
52. Acharya, U.R., Faust, O., Kadri, N.A., Suri, J.S., Yu, W.: Automated identification of normal and
diabetes heart rate signals using nonlinear measures. Comput. Biol. Med. 43(10), 1523–1529
(2013)
53. Acharya, U.R., Vidya, S., Ghista, D.N., Lim, W.J.E., Molinari, F., Sankaranarayanan, M.:
Computer-aided diagnosis of diabetic subjects by HRV signals using discrete wavelet transform
method. Knowl.-Based Syst. 42, 4567–4581 (2015)
54. Pachori, R.B., Kumar, M., Avinash, P., Shashank, K., Acharya, U.R.: An improved online
paradigm for screening of diabetic patients using RR-interval signals. J. Mech. Med. Biol. 16,
1640003 (2016)
55. Flynn, A.C., Jelinek, A.F., Smith, M.: Heart rate variability analysis: a useful assessment tool
for diabetes associated cardiac dysfunction in rural and remote areas. Aust. J. Rural Health
13(2), 77–82 (2005)
56. Acharya, U.R., Fujita, H., Oh, S.L., Adam, M., Tan, J.H., Chua, C.K.: Automated detection of
coronary artery disease using different durations of ECG segments with convolutional neural
network. Knowl.-Based Syst. 132, 62–71 (2017)
57. Acharya, U.R., Fujita, H., Oh, S.L., Hagiwara, Y., Tan, J.H., Adam, M.: Application of deep
convolutional neural network for automated detection of myocardial infarction using ECG
signals. Inf. Sci. 415, 190–198 (2017)
58. Acharya, U.R., Oh, S.L., Hagiwara, Y., Tan, J.H., Adam, M., Gertych, A., Tan, R.S.: A deep
convolutional neural network model to classify heartbeats. Comput. Biol. Med. 89, 389–396
(2017)
59. Sujadevi, V.G., Soman, K.P., Vinayakumar, R.: Real-time detection of atrial fibrillation from
short time single lead ECG traces using recurrent neural networks. In: The International Sympo-
sium on Intelligent Systems Technologies and Applications, pp. 212–221, Sept 2017. Springer
60. Swapna, G., Soman, K.P., Vinayakumar, R.: Automated detection of diabetes using CNN and
CNN-LSTM network and heart rate signals. Procedia Comput. Sci. 132, 1253–1262 (2018)
61. Swapna, G., Vinayakumar, R., Soman, K.P.: Diabetes detection using deep learning algorithms.
ICT Express 4, 243–246 (2018)
62. Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving,
G., Isard, M., et al.: Tensorflow: a system for large-scale machine learning. OSDI 16, 265–283
(2016)

G. Swapna is a Ph.D. student in the Computational Engineering and Networking, Amrita School
of Engineering, Coimbatore, Amrita Vishwa Vidyapeetham, India since July 2015. She is also a
faculty at Government Engineering College, Kozhikode, Kerala, India.

K. P. Soman has 25 years of research and teaching experience at Amrita School of
Engineering, Coimbatore. He has around 150 publications in national and international
journals and conference proceedings. He has organized a series of workshops and summer
schools in Advanced signal processing using wavelets, Kernel Methods for pattern
classification, Deep learning, and Big-data Analytics for industry and academia. He
authored books on “Insight into Wavelets”, “Insight into Data mining”, “Support Vector
Machines and Other Kernel Methods” and “Signal and Image processing-the sparse way”,
published by Prentice Hall, New Delhi, and Elsevier.

R. Vinayakumar is a Ph.D. student in the Computational Engineering and Networking, Amrita
School of Engineering, Coimbatore, Amrita Vishwa Vidyapeetham, India since July 2015. He has
several papers in Machine Learning applied to Cyber Security. His Ph.D. work centers on
Application of Machine learning (sometimes Deep learning) for Cyber Security and discusses
the importance of Natural language processing, Image processing and Big data analytics for
Cyber Security. He has participated in several international shared tasks and organized a
shared task on detecting malicious domain names (DMD 2018) as part of SSCC’18 and
ICACCI’18. More details available at https://vinayakumarr.github.io/.
Deep Learning and the Future
of Biomedical Image Analysis

Monika Jyotiyana and Nishtha Kesswani

Abstract Deep Learning (DL) is popular among researchers and academicians
due to its reliability and accuracy, especially in the fields of engineering and
medical sciences. In medical imaging for the diagnosis of disease, DL techniques
are very helpful for early detection. An important feature of DL techniques is
that they are uncomplicated, with low complexity, which ultimately saves time
and money, and that they can tackle many tough tasks simultaneously. Artificial
Intelligence (AI) and Deep Learning (DL) technologies have improved rapidly in
recent years. These techniques play an important role in many fields of
application, especially in the medical field, in areas such as image processing,
image fusion, image segmentation, image retrieval, image analysis, computer-aided
diagnosis (CAD), image registration, image-guided therapy and many more. The aim
of this chapter is to describe DL methods and the future of biomedical imaging
using DL in detail, and to discuss the issues and challenges.

Keywords Machine Learning · Deep Learning · Convolutional Neural Networks ·
Recurrent Neural Network · Computer-Aided Diagnosis

1 Introduction

Currently, DL techniques are among the most frequently used algorithms for obtaining
better, more scalable and more accurate results from data than other Machine
Learning (ML) methods. DL is also applied to biomedical images to detect (diagnose)
diseases and to support precisely tailored treatment plans that improve
the patient’s health. EEG, ECG, MEG, MRI, etc. are widely used biomedical modalities
for diagnosing patients with minimal human intervention. These medical
images may also contain noise, which makes it difficult to analyse them accurately.
Deep Learning has the potential to give reliable and precise results with higher
accuracy. Like every technology, DL has drawbacks as well as advantages: it gives
promising outcomes only when the dataset is large, and it needs GPUs and high
system configurations to process medical images. Despite these disadvantages,
Deep Learning is currently popular because of its capability to process huge
amounts of data. This chapter discusses the state-of-the-art DL approaches for
biomedical images. We also discuss the application of Deep Learning (DL) to
classification, registration and segmentation, the issues and challenges of DL
approaches, and the future of Deep Learning in biomedical imaging.

M. Jyotiyana (B) · N. Kesswani
Central University of Rajasthan, Bandar Sindri, Ajmer, India
e-mail: [email protected]
N. Kesswani
e-mail: [email protected]
© Springer Nature Switzerland AG 2020
S. Dash et al. (eds.), Deep Learning Techniques for Biomedical and Health Informatics,
Studies in Big Data 68, https://doi.org/10.1007/978-3-030-33966-1_15

1.1 Deep Learning

Within the wide assortment of Machine Learning (ML) approaches, Deep
Learning has truly marked its presence with its excellent performance, particularly
in the area of medical image processing.
Deep Learning (DL) belongs to the area of ML, which in turn is a branch of AI.
It deals with algorithms motivated by the structure and functioning of the brain. It
permits computational models to learn representations of a dataset with the aid
of numerous hidden processing layers [1]. These layers perform feature
extraction and transformation. The output of each layer is fed into the
subsequent one. DL is a way to automate predictive analysis. Moreover, it can
perform well in both supervised and unsupervised settings (Fig. 1).
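The layer-by-layer flow described above can be illustrated with a very small fully
connected network. This is a minimal sketch with hand-picked weights: all numerical
values and function names are made up for illustration, and no training is
performed. It only shows how the output of each layer becomes the input of the next.

```python
import math


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases, activation):
    """One fully connected layer; weights[j] holds the weights of neuron j."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(x, layers):
    """Feed the input through every layer in turn."""
    for weights, biases, activation in layers:
        x = dense(x, weights, biases, activation)  # output feeds the next layer
    return x


# Hypothetical 2 -> 3 -> 1 network (one hidden layer, one output neuron).
layers = [
    ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1], relu),
    ([[0.7, -0.5, 0.2]], [0.05], sigmoid),
]

output = forward([1.0, 2.0], layers)
print(output)
```

Adding further entries to the list of layers makes the network deeper without any
change to the forward function.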
The working of the Deep Learning approach is shown in Fig. 2. First, the dataset
and the particular Deep Learning algorithm for which the model is to be designed
are chosen. In the subsequent steps, comprehensive experiments are performed, and
thereafter the results are generated and analyzed.
Numerous Deep Learning models have been developed, such as Convolutional
Neural Networks (CNN), Deep Belief Networks (DBN), Recurrent Neural Network
(RNN) etc. which are discussed in the following subsections:

Convolutional Neural Networks (CNN)


As a DL technique, a CNN can take an image as input and assign learnable weights
and biases to various objects in the image so that they can be distinguished from
one another. A CNN requires less pre-processing than other classification techniques.
The structure of a CNN is inspired by the architecture of the visual cortex, in
which a single neuron responds to stimuli only in a restricted region of the visual
field, known as its receptive field. A collection of such fields overlaps to cover
the full visual area. A CNN has multiple convolutional layers. The first layer
captures low-level features, while additional layers build up an overall
understanding of the images in the dataset (Fig. 3).
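The idea of a small receptive field that captures a low-level feature can be
sketched as follows. The example slides a 2 x 2 kernel over a tiny image and
responds only where a vertical edge is present. The image, the kernel values and
the function name are illustrative assumptions; in a real CNN the kernel values are
learned during training. As in most deep learning libraries, the operation shown
is cross-correlation, that is, the kernel is not flipped.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image without padding, with a stride of 1."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    return [
        [
            sum(
                image[i + a][j + b] * kernel[a][b]
                for a in range(k_rows)
                for b in range(k_cols)
            )
            for j in range(out_cols)
        ]
        for i in range(out_rows)
    ]


# Toy 4 x 4 image: dark on the left half, bright on the right half.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-written kernel that responds to a dark-to-bright vertical edge.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

feature_map = conv2d(image, vertical_edge)
for row in feature_map:
    print(row)
```

Each value in the feature map depends only on a 2 x 2 patch of the image, and the
response is non-zero only in the column where the brightness changes.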

Fig. 1 Deep Neural Networks (DNN) architecture

Fig. 2 Working of the Deep Learning model

Fig. 3 Working model of Convolutional Neural Networks

Recurrent Neural Network (RNN)

RNNs have been successfully deployed on sequential data, for example in Google
Voice Search and Apple Siri. An RNN has the distinctive ability to remember its
input thanks to its internal memory. It considers the current input together with
previously received inputs, which are fed back into the network, whereas a
Convolutional Neural Network works as a Feed-Forward Neural Network (FFNN),
which is only concerned with the current input. RNNs are used for time series, text
classification, audio and video, as they develop a deeper understanding of sequence
and context. In an RNN, information passes through a loop. Decisions are based on
the current input and on what was learned from previous inputs (Fig. 4).
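The loop described above can be sketched with a single recurrent unit. This is a
minimal illustration with scalar inputs and hand-picked weights (the weight values
and function names are assumptions, and no training is involved). The hidden state
carries information from earlier inputs forward, so the same current input can
lead to different outputs depending on what came before.

```python
import math


def rnn_step(x, h_prev, w_x, w_h, b):
    """One time step: combine the current input with the previous hidden state."""
    return math.tanh(w_x * x + w_h * h_prev + b)


def run_rnn(sequence, w_x=0.5, w_h=0.8, b=0.0):
    """Return the hidden state after every element of the sequence."""
    h = 0.0  # initial hidden state
    states = []
    for x in sequence:
        h = rnn_step(x, h, w_x, w_h, b)  # the new state is fed back at the next step
        states.append(h)
    return states


with_history = run_rnn([1.0, 0.0, 0.0])
without_history = run_rnn([0.0, 0.0, 0.0])

# The last input is 0.0 in both cases, yet the final states differ
# because the first sequence started with a non-zero value.
print(with_history[-1], without_history[-1])
```

A feed-forward network given only the last input would produce the same result for
both sequences.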

Long-Short Term Memory (LSTM)


LSTM is a version of RNN capable of handling long input sequences through the
operation of gates. It has three gates: an input gate, a forget gate and an output
gate. These gates respectively add new input, forget the least important
information, and control the impact of the stored information on the output at the
current time step. An LSTM has a memory on which read, write and delete operations
can be performed. It mitigates the vanishing gradient problem and achieves
relatively high training accuracy.
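The three gates can be written out for a single LSTM unit with scalar inputs. The
sketch below follows the commonly used LSTM formulation; the parameter names and
values are illustrative assumptions and are not taken from any work cited in this
chapter.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, p):
    """One time step of a scalar LSTM unit with parameters in the dict p."""
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate: write
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate: delete
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate: read
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # candidate memory
    c = f * c_prev + i * g   # keep part of the old memory, add part of the new
    h = o * math.tanh(c)     # expose part of the memory as the output
    return h, c


# Hypothetical parameters: forget gate almost fully open, input gate almost closed.
params = {
    "wi": 0.0, "ui": 0.0, "bi": -20.0,
    "wf": 0.0, "uf": 0.0, "bf": 20.0,
    "wo": 0.0, "uo": 0.0, "bo": 0.0,
    "wg": 1.0, "ug": 0.0, "bg": 0.0,
}

h, c = 0.0, 0.7
for x in [5.0, -3.0, 2.0]:
    h, c = lstm_step(x, h, c, params)

print(h, c)
```

With these settings the cell state stays close to its initial value of 0.7 over the
whole sequence, which illustrates how the gates let an LSTM preserve information
across many time steps.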

Encoder-Decoder (ED)
The Encoder-Decoder architecture goes beyond traditional ML methods. It has
become a core technology for prediction with neural networks and for
sequence-to-sequence techniques. It is able to handle variable-length input and
output. The encoder reads the input sequence and maps it to an encoded
representation. The decoder then uses this encoded representation to produce the
output (Fig. 5).

Fig. 4 Represents the RNN architecture

Fig. 5 Basic architecture of Encoder and Decoder

1.2 Biomedical Imaging

Deep Learning is a growing and popular area of medical research for the
diagnosis of diseases. Today, people suffer primarily from lifestyle diseases
like type-2 diabetes, obesity, heart disease and neurodegenerative
diseases due to the consumption of drugs and alcohol, smoking and unhealthy diet.
Deep Learning is playing a vital role in the prediction of such diseases. In
day-to-day practice, Computer-Aided Diagnosis (CAD) is preferred for testing and
diagnosing diseases via Computerised Tomography (CT), Single Photon Emission
Computed Tomography (SPECT), Positron Emission Tomography (PET), Magnetic Resonance
Imaging (MRI) and other modalities. Deep Learning accelerates the diagnosis and
can also exploit 2D and 3D parameters for further detail.
It can also resolve issues regarding data labeling and overfitting to some extent.

1.3 Role of Deep Learning in Diagnosis from Various Medical Images

Many diseases can be classified or diagnosed using DNNs, such as breast
cancer, aphasia, attention deficit hyperactivity disorder (ADHD) and many more.
Deep Learning is a very popular research area that can help in diagnosing many
types of disease. For example, ADHD is a very common mental disorder
among children. A child suffering from ADHD may face problems like
poor concentration, distractibility, weakness and excessive activity. Similarly,
Deep Learning is also used for the detection of cancer, Alzheimer’s disease,
Parkinson’s disease, brain tumors and many more.

1.4 Applications

Deep Learning is prevalent nowadays not only in the field of health informatics
but in daily life too. Many prediction and classification tasks are handled
by Deep Learning because of its promising results, accuracy, and faster processing
with less complexity.
There are many applications of Deep Learning; some typical, popular health
informatics applications are:
1. Content-based image retrieval
2. Object detection
    Face detection
    Disease diagnosis
    Lesion detection
3. Machine vision and medical imaging
    Tumor detection
    Tumor stage
    Surgery planning
    Remote surgery
    Intra-surgery navigation
    Virtual surgery simulation
4. Recognition tasks
    Iris recognition
    Pattern recognition.

2 Deep Learning in Medical Imaging

Machines are often faster and more consistent than humans at repetitive analysis
tasks, so such tasks are increasingly handed over to computers. In medical sciences,
Computer-Aided Diagnosis (CAD) and automatic medical image analysis are preferred,
and arguably crucial, choices. CAD also plays an important role in modeling disease
progression [2, 3]. For example, in many neurodegenerative disorders (NDD) such as
strokes, Parkinson’s disease (PD), Alzheimer’s disease (AD) and other types of
dementia, brain scans are crucial, and detailed maps of brain regions are available
for analysis and prediction of the disease. Cancer diagnosis and measuring the
intensity of lesions are among the most popular CAD tasks in medical imaging. In
recent years, CNNs have become more popular because of their spectacular
performance and reliability. The efficiency and performance of CNNs are shown in a
survey of CNN methods for brain pathology segmentation [4], and Deep Learning
approaches are used in CAD, shape prediction, and segmentation [2].
A major challenge in CAD lies in distinguishing the intensity and shape of tumors,
and in the variations in imaging protocols within the same neuro-imaging modality.
In various cases, it has been observed that the intensity of pathological tissues
may overlap with that of healthy tissues, and that different types of noise in MRI,
such as Rician noise, intensity-based noise and non-isotropic resolution effects,
cannot be handled easily by elementary Machine Learning (ML) approaches. To handle
such complications in the data, hand-crafted features are extracted and
well-established ML methods are then used to classify them in an entirely
separate step.
Deep Learning approaches can automate feature extraction and unite it with
classification [5, 6]. CNNs are capable of learning more complex features and can
therefore handle image patches centered on unhealthy tissue. In medical imaging,
CNNs have been used to classify tuberculosis manifestation based on X-ray
images [7] and lung disease based on CT images [8]. In addition to hemorrhage
detection in color fundus images [9], CNNs can extract the least discriminative
and the most discriminative patches in the pre-training stage. CNN-based methods
have also been proposed for segmentation of iso-intense stage brain cells [10] and
for extraction of different brain regions from multi-modality Magnetic Resonance
Images (MRI) [11]. Many hybrid approaches have been proposed in which a CNN is
combined with other architectures. For example, in [12] a DL approach is proposed
to encode the parameters of a distorted model and to segment the heart’s left
ventricle from short-axis Magnetic Resonance Imaging. The CNN detects the left
ventricle, while a Deep Auto-Encoder (DAE) is employed to infer its shape.

2.1 Classification

Classification assigns data to classes defined according to the needs of the task.
There are many well-established techniques for classification, such as Support
Vector Machine (SVM), K-Nearest Neighbor (KNN), Random Forest (RF) and Neural
Networks (NN); the most recent is Deep Learning (DL), which offers several
approaches to classification. CNN is a popular method for classification in the
field of biomedical imaging and health informatics. Details of image classification
are discussed in the next section.

2.1.1 Image Classification

Image classification is a broad area to which Deep Learning has contributed
immensely. In classification, one or more images are the input and a single variable
is the output; that output is compared with the desired output to check whether the
disease is correctly diagnosed. Different classifiers can be used, such as Support
Vector Machine (SVM), Random Forest (RF), and Artificial Neural Networks (ANN).
Medical image classification is crucial in image recognition; its prime focus is to
classify medical images into categories to support the diagnosis of a disease or to
help researchers in further research. It can be performed by extracting useful
features from an image and using those features to build classification models that
assign each image in the dataset to a class.
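The two-step pipeline described above (extract features, then classify) can be
illustrated with a minimal sketch. It is not taken from any cited study: the two
features (mean and standard deviation of intensity), the toy 2x2 "images", the labels,
and the function names extract_features and knn_predict are all assumptions chosen for
illustration, with K-Nearest Neighbor standing in for the classifier.

```python
import math
from collections import Counter

def extract_features(image):
    """Return (mean, standard deviation) of pixel intensities of a 2D list image."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    var = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    return (mean, math.sqrt(var))

def knn_predict(train_features, train_labels, query, k=3):
    """Label the query by majority vote among its k nearest training samples."""
    distances = sorted(
        (math.dist(f, query), label)
        for f, label in zip(train_features, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data: bright patches are labeled "lesion", dark ones "normal".
train_images = [
    [[200, 210], [190, 220]],
    [[180, 230], [205, 195]],
    [[20, 30], [25, 15]],
    [[10, 40], [35, 5]],
]
train_labels = ["lesion", "lesion", "normal", "normal"]
train_features = [extract_features(img) for img in train_images]

query_image = [[185, 215], [200, 190]]
prediction = knn_predict(train_features, train_labels, extract_features(query_image))
print(prediction)
```

In a DL pipeline the hand-written extract_features step is what the network learns
from data; the classifier stage stays conceptually the same.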
Before CAD became as popular as it is today, doctors commonly relied on their
experience to extract features from a medical image and then classify the image.
This is usually a complicated, tedious, and time-consuming job. Deep Learning
addresses the problem of accurate prediction: DL is reported to give more precise
results than humans, and it predicts faster. It can also process many datasets from
different patients. In recent years, medical imaging applications have proved their
merit not only in solving doctors' problems but in research too. However,
researchers have not yet accomplished this task efficiently. If classification could
be performed efficiently and accurately, it would be a great help to doctors in
diagnosing diseases.

2.1.2 Object or Lesion Classification

In medical image analysis and diagnosis, CAD acts as an assistant that provides an
additional, objective second opinion. In recent years many studies have shown that
incorporating a CAD system makes the diagnosis process faster and more accurate by
lessening inter-observer variation [13, 14]. CAD also provides quantitative support
for clinical recommendations such as biopsy [15]. For tumor identification, a CAD
system is typically built from the following steps: feature selection, feature
extraction, and classification [16–19]. Various ML and DL classification techniques
[20] have been proposed to classify cancerous and healthy cells [21]. The main
challenge is to reduce the dimensionality of the features without losing significant
information. In Deep Learning, dataset size is a major issue: with a small dataset it
is harder to make predictions without risking over-fitting [21]. Researchers have
proposed many solutions for lesion classification, but most of them reduce the
feature space by deriving short feature sets, either selecting features or
constructing new features in supervised ways [21].

2.2 Detection

Detecting an organ or tissue involves image preprocessing and segmentation,
followed by classification to detect a disease or classify the subject. Detection of
basal-cell carcinoma (BCC) using Deep Learning consists of a few steps: unsupervised
or supervised feature learning, image representation using a CNN, automatic
detection of BCC, and finally visual interpretation [4, 5, 22].

2.2.1 Organ, Region, Landmark Recognition

In medical imaging, organ and region detection is an important task, especially in
cancer and neurodegenerative diseases. When organ deformation is recorded in MRI or
another modality, it becomes easier to diagnose the type and stage of the disease
the subject is suffering from [23]. In cancer diagnosis, for example of a brain
tumor, it plays a vital role in treatment planning. A prime challenge in microscopic
image analysis is to analyze every individual cell for precise detection, because
the distinction between most disease grades depends on cell-level information [24].
To meet this challenge, researchers have used CNNs for robust detection and
segmentation of cells in histopathological images [24, 25], notably for cancer
diagnosis.

2.2.2 Object and Lesion Detection

As discussed in Sect. 2.1.2, object and lesion detection is similar to classification.
The only difference is that to detect a lesion we must first perform segmentation
and then classification or prediction for the diagnosis of disease [23, 26]. Deep
Learning currently provides promising results, so that disease can be identified at
an early stage and the patient treated at the right time. For example, in 2018
Abraham and Khan proposed a lesion segmentation method based on the U-Net Deep
Learning architecture to improve segmentation accuracy and disease diagnosis or
prediction [27].

2.3 Segmentation

Segmentation plays a significant role in predicting a disease or disorder by dividing
an image into multiple segments and then comparing the segments with testing data
[3]. Among the CNN architectures used for segmentation, U-Net is currently among the
most used for 3D image segmentation. In medical image analysis, the image is divided
based on similar properties such as color, contrast, brightness, and grey level.

Fig. 6 Image segmentation methods

Image segmentation methods include threshold-based, edge-based, and region-based
methods, ANN-based methods, unsupervised learning methods, and many more. For the
sake of brevity, the details are not given here (Fig. 6).
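As an illustration of the simplest family named above, threshold-based segmentation,
the following sketch picks a global threshold with Otsu's method and turns a grey-level
image into a binary mask. The chapter does not say which thresholding rule it has in
mind, so the choice of Otsu's method, the function names, and the toy 4x4 image are
assumptions made here for illustration.

```python
def otsu_threshold(image, levels=256):
    """Return a grey level t that maximizes between-class variance (<= t vs > t)."""
    hist = [0] * levels
    for row in image:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0
    for t in range(levels):
        w_bg += hist[t]          # pixel count at or below t
        sum_bg += t * hist[t]    # intensity sum at or below t
        if w_bg == 0:
            continue
        w_fg = total - w_bg
        if w_fg == 0:
            break
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t

def segment(image, threshold):
    """Binary mask: 1 where the pixel is brighter than the threshold, else 0."""
    return [[1 if p > threshold else 0 for p in row] for row in image]

# Hypothetical toy image: a bright region on a dark background.
image = [
    [10, 12, 11, 200],
    [9, 210, 205, 198],
    [11, 208, 202, 13],
    [10, 12, 14, 11],
]
t = otsu_threshold(image)
mask = segment(image, t)
for row in mask:
    print(row)
```

A single global threshold works only when foreground and background intensities are
well separated, which is one reason the edge-based, region-based, and learning-based
methods listed above exist.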

2.3.1 Organ and Substructure Segmentation

Researchers and medical practitioners perform segmentation to diagnose the stage
and severity of a disease. It is widely used in cancer diagnosis. Cancer is prevalent
nowadays; for example, in the US alone 23,000 cases of brain tumor were reported in
2015, and the number is increasing. The usual treatment for brain tumors is surgery,
but other treatments, including chemotherapy and radiotherapy, slow the rate of
tumor growth. MRI gives full structural and functional details of the brain, and
tumor segmentation from MR images, CT images, or other medical imaging modalities
can improve diagnosis, the estimation of tumor growth rate and size, and treatment
planning [28]. Tumors such as meningiomas can be segmented easily, while gliomas are
difficult to segment because of poor contrast and extended tentacle-like structures
[28]. The prime objective of tumor segmentation is to mark the location of the
tumor, detect the extended region (where cancer cells are present), and compare the
affected tissue with healthy tissue for diagnosis [28].

2.3.2 Lesion Segmentation

There are many leading-edge approaches for lesion segmentation, but CNNs give the
most promising results on 2D as well as 3D biological data [29]. Yuan proposed a
lesion segmentation method that automatically separates melanoma from the
surrounding skin using convolutional and deconvolutional networks [30]. CNNs and
other DL methods are used to diagnose various types of cancerous cells because they
give more accurate results in less time.
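Segmentation results of this kind are commonly scored by the overlap between the
predicted mask and a reference mask; the Dice coefficient, on which the loss function
in [29] builds, is one such score. The sketch below computes the plain (not
generalised) Dice coefficient for two binary masks. The masks and the convention of
returning 1.0 for two empty masks are assumptions made for illustration.

```python
def dice_coefficient(pred, truth):
    """Dice = 2 * |A and B| / (|A| + |B|) for two binary masks of equal shape."""
    intersection = sum(
        p * t for pred_row, truth_row in zip(pred, truth)
        for p, t in zip(pred_row, truth_row)
    )
    size_sum = sum(map(sum, pred)) + sum(map(sum, truth))
    if size_sum == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / size_sum

# Hypothetical masks: the prediction misses one lesion pixel.
predicted = [[0, 1, 1], [0, 1, 0], [0, 0, 0]]
reference = [[0, 1, 1], [0, 1, 1], [0, 0, 0]]
score = dice_coefficient(predicted, reference)
print(round(score, 3))
```

Here three pixels overlap and the masks contain three and four pixels, so the score
is 6/7. Because the score ignores true-negative background pixels, it stays
informative when the lesion occupies only a small part of the image.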

2.4 Registration

Registration is an image analysis task in which a coordinate transform is calculated
from one image to another. It is often performed in an iterative framework in which
a specific type of parametric or non-parametric transformation is assumed and a
predetermined metric is optimized [31]. Registration matters in MRI analysis because
multi-parametric tissue information is gathered within shorter acquisition times,
from larger cohorts, at higher spatial and temporal resolution, and with the help of
atlases. Mathematically, image registration is a challenging mix of geometry,
analysis, optimization strategies, and numerical schemes [32–34].
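To make the idea of computing a coordinate transform from one image to another
concrete, the sketch below registers two tiny images under strong simplifying
assumptions that are not from the cited works: the transform is an integer
translation only, the similarity measure is the sum of squared differences (SSD), and
the search is exhaustive rather than iterative. All function names are hypothetical.

```python
def shift_image(image, dy, dx, fill=0):
    """Translate a 2D list image by (dy, dx), padding uncovered pixels with fill."""
    h, w = len(image), len(image[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                out[ny][nx] = image[y][x]
    return out

def ssd(a, b):
    """Sum of squared intensity differences between two equally sized images."""
    return sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb))

def register_translation(fixed, moving, max_shift=2):
    """Return the integer shift (dy, dx) of moving that minimizes SSD to fixed."""
    candidates = [
        (ssd(fixed, shift_image(moving, dy, dx)), dy, dx)
        for dy in range(-max_shift, max_shift + 1)
        for dx in range(-max_shift, max_shift + 1)
    ]
    _, best_dy, best_dx = min(candidates)
    return best_dy, best_dx

# Hypothetical data: the moving image is the fixed image displaced by (-1, -1).
fixed = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 9, 7, 0],
    [0, 0, 5, 3, 0],
    [0, 0, 0, 0, 0],
]
moving = shift_image(fixed, -1, -1)
estimated = register_translation(fixed, moving)
print(estimated)
```

Practical registration replaces the exhaustive search with an optimizer and the
translation with richer transforms (rigid, affine, or deformable), but the structure
of assuming a transform family and optimizing a similarity measure is the same.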

2.5 Other Tasks in Medical Imaging

There are many other tasks in medical imaging for improving image quality and
supporting the diagnosis of disease. We describe them in the following subsections:

2.5.1 Content-Based Image Retrieval

The prime goal of Content-Based Image Retrieval (CBIR) is to assist the physician's
decision making by retrieving medical cases similar to a given image. With DL, it
requires massive datasets, discriminative image representations, and algorithms that
reliably retrieve the most similar images and their interpretations. The first
application of DL to CBIR came in 2015 [35]. In 2019, Pizarro et al. [36] designed a
CNN architecture that automatically infers the contrast of MRI scans from the pixel
intensities of multiple slices of the MR images [37].
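The retrieval step of CBIR can be sketched as ranking stored cases by the similarity
of their descriptors to the query descriptor. In the sketch below a coarse intensity
histogram stands in for the learned image representation, and cosine similarity is
used for ranking; both choices, the case names, and the toy images are assumptions
made for illustration, not the method of any cited work.

```python
import math

def intensity_histogram(image, bins=4, max_value=256):
    """Normalized grey-level histogram used here as a stand-in image descriptor."""
    hist = [0] * bins
    count = 0
    for row in image:
        for p in row:
            hist[min(p * bins // max_value, bins - 1)] += 1
            count += 1
    return [h / count for h in hist]

def cosine_similarity(u, v):
    """Cosine of the angle between two descriptor vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def retrieve(query, database, top_k=2):
    """Return the names of the top_k database images most similar to the query."""
    q = intensity_histogram(query)
    ranked = sorted(
        database.items(),
        key=lambda item: cosine_similarity(q, intensity_histogram(item[1])),
        reverse=True,
    )
    return [name for name, _ in ranked[:top_k]]

# Hypothetical archive of three stored cases.
database = {
    "case_dark": [[10, 20], [30, 40]],
    "case_mixed": [[10, 200], [30, 220]],
    "case_bright": [[200, 210], [220, 230]],
}
query = [[15, 25], [35, 210]]
matches = retrieve(query, database)
print(matches)
```

In a DL-based system the histogram would be replaced by features taken from a trained
network, while the ranking step stays essentially unchanged.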

2.5.2 Image Generation and Enhancement

DL in medical imaging has usually focused on classification, prediction, and
segmentation of reconstructed images. Recently, Deep Learning has also penetrated
the lower levels of MR measurement techniques, from MR image acquisition to
denoising and super-resolution [32].

2.5.3 Combining Image Data with Reports

Because Deep Learning pre-processes massive amounts of data, it gives better results,
which help the radiologist in disease diagnosis and further research. The report of
each subject includes nearby instances and the probabilities of the disease's
symptoms occurring, which supports sound decision making.

3 Future of Deep Learning in Biomedical Imaging

A new era is coming in the health sector, in which medical imaging and data will
play a vital role. As the human population grows, the number of cases will also
grow. Because Deep Learning is applied to massive datasets, a larger number of
recorded cases will ease the need for large datasets. The fundamental requirement
for any subject is that the right treatment be given to the right subject in limited
time. In this context, the availability of massive datasets brings immense
opportunities as well as challenges.
Many studies report that CAD is more accurate than humans in disease diagnosis and
that it can handle many cases simultaneously. Thus the availability and reliability
of CAD are no longer an issue. In recent years, Deep Learning has been replacing
traditional ML and pattern recognition, because it offers a great number of
data-driven solutions in medical imaging, permits automatic feature creation, and
lessens human intervention during the procedure [20]. It is favorable in many health
informatics problems, and it is advancing rapidly for unstructured data originating
from biomedical imaging, bioinformatics, and medical informatics. Most applications
of DL to medical imaging process health data that is unstructured [20]. However,
plenty of information is also encoded in structured data [20], which gives complete
information about the subject's history, treatment, diagnosis, and pathology. In
tumor detection, for example, the cytological notes include information about the
tumor stage and its spread [20]. Such information is crucial for judging the
patient's condition or disease. Deep Learning boosts the reliability of clinical
decision support systems with artificial intelligence (AI).

3.1 Recent Methods and Predictive Models

As the popularity of Deep Learning grows because of its reliability and flexibility,
many approaches and frameworks have become popular in the field of biomedical
imaging. Recently, CNNs have been combined with other techniques, for example CNN
with Auto-Encoder, CNN with SVM for classification, and CNN with the K-Means
algorithm for image segmentation; similarly, various other methods and architectures
are available for solving real-life and research problems.
Several deep models with different layers and structures are available, such as
VGG [38], AlexNet [39], GoogLeNet [40], ResNet [41], Highway nets [42],
DenseNet [43], ResNext [44], SENets [45], NASNet [46], YOLO [47], GANs [48],
Siamese nets [49], U-net [50], V-net [51], and many more.

4 Challenges and Issues

There are various issues and challenges across application domains, and in medical
applications in particular, that need to be solved:
• Data volume: Deep Learning is computationally intensive and needs to process large
amounts of data. No specific number of training samples is required in general, but
a common rule of thumb is to have at least about 10 times as many samples as
parameters in the network. Large volumes of data are available in application
domains such as computer vision, speech, and natural language. Because the
population of the earth is increasing, the number of disease cases also increases,
which makes data collection easier.
• Data quality: Data quality is another pertinent issue in Deep Learning. In some
application domains the data is heterogeneous, raw, noisy, and incomplete, which may
lead to wrongly interpreted results. Training a good DL model on such a huge and
heterogeneous raw database therefore requires attention to several issues, such as
data scarcity, repetition of data, and missing values.
• Interpretability: Despite the successful application of Deep Learning models in
several domains, they are still treated as black boxes, whereas interpretability is
crucial for predictive systems in many application domains.
• Domain complexity: Domain complexity is another issue. In the medical domain,
datasets are highly heterogeneous, and knowledge of the causes of diseases and their
progression is incomplete. Designing and developing Deep Learning models that
account for this domain complexity is therefore an important aspect of training.
• Temporality: In application domains such as medicine, datasets change over time in
a nondeterministic way because diseases progress, whereas Deep Learning models are
typically trained on static vector-based inputs and are not designed to handle the
time factor. Designing and developing DL models that take temporal data into
consideration is therefore another important aspect of Deep Learning. These
challenges and issues open the door to future research directions.
• Feature enrichment: Limited data is available because only a small number of
patients characterize each disease. The data needed for generating features should
not be limited to a specific source such as social media; it can be collected from
wearable devices, surveys, social communities, and other sources. Integrating these
data sources with Deep Learning models is another research challenge facing the
research community.
• Temporal modeling: Time is crucial in the health sector and in real-life problems.
When machines such as CAD systems, EHR, and other monitoring devices are involved,
timing is very sensitive, and training with Deep Learning should be fast, accurate,
and reliable for understanding the subject's condition and detecting the stage of
the disease. RNNs and architectures coupled with memory can be relied on to address
this issue.
• Interpretable modeling: In Deep Learning, model performance is important, but so
is the reliability or interpretability of the model. Deep Learning is popular because
of its promising results and strong performance, yet making the results more
explainable remains a task. Researchers should focus on both model performance and
the algorithms themselves in order to develop systems with better predictive
ability.
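The data-volume rule of thumb in the list above can be turned into a quick estimate.
The sketch below reads the rule as roughly ten training samples per trainable
parameter, which is an interpretation rather than something the chapter states
precisely. The small network (two 3x3 convolution layers and one dense layer for 28x28
single-channel patches, with two 2x2 poolings assumed) is hypothetical, as are the
helper names.

```python
def conv2d_params(in_channels, out_channels, kernel_size):
    """Weights plus one bias per output channel for a square-kernel conv layer."""
    return out_channels * (in_channels * kernel_size * kernel_size + 1)

def dense_params(in_features, out_features):
    """Weights plus one bias per output unit for a fully connected layer."""
    return out_features * (in_features + 1)

# Hypothetical small classifier; 28x28 input pooled twice gives 7x7 feature maps.
layers = [
    conv2d_params(1, 8, 3),       # 8 * (1*9 + 1)  = 80
    conv2d_params(8, 16, 3),      # 16 * (8*9 + 1) = 1168
    dense_params(16 * 7 * 7, 2),  # 2 * (784 + 1)  = 1570
]
total_params = sum(layers)
suggested_samples = 10 * total_params  # assumed reading of the rule of thumb

print(total_params, suggested_samples)
```

Even this very small network suggests tens of thousands of labeled samples under the
rule, which illustrates why data volume is listed first among the challenges for
medical applications.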

References

1. LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436 (2015)
2. Greenspan, H., Van Ginneken, B., Summers, R.M.: Guest editorial deep learning in medical
imaging: overview and future promise of an exciting new technique. IEEE Trans. Med. Imaging
35(5), 1153–1159 (2016)
3. Stoyanov, D., Taylor, Z., Sarikaya, D., McLeod, J., Ballester, M.A.G., Codella, N.C., De Rib-
aupierre, S. (eds.): OR 2.0 Context-Aware Operating Theaters, Computer Assisted Robotic
Endoscopy, Clinical Image-Based Procedures, and Skin Image Analysis: First International
Workshop, OR 2.0 2018, 5th International Workshop, CARE 2018, 7th International Work-
shop, CLIP 2018, Third International Workshop, ISIC 2018, Held in Conjunction with MICCAI
2018, Granada, Spain, September 16 and 20, 2018, Proceedings, vol. 11041. Springer (2018)
4. Havaei, M., Guizard, N., Larochelle, H., Jodoin, P.M.: Deep learning trends for focal brain
pathology segmentation in MRI. In: Machine Learning for Health Informatics, pp. 125–148.
Springer, Cham (2016)
5. Nie, D., Zhang, H., Adeli, E., Liu, L., Shen, D.: 3D deep learning for multi-modal imaging-
guided survival time prediction of brain tumor patients. In: International Conference on Medi-
cal Image Computing and Computer-Assisted Intervention, pp. 212–220, Oct 2016. Springer,
Cham
6. Xu, T., Zhang, H., Huang, X., Zhang, S., Metaxas, D.N.: Multimodal deep learning for cervical
dysplasia diagnosis. In: International Conference on Medical Image Computing and Computer-
Assisted Intervention, pp. 115–123, Oct 2016. Springer, Cham
7. Cao, Y., Liu, C., Liu, B., Brunette, M.J., Zhang, N., Sun, T., Curioso, W.H.: Improving tuber-
culosis diagnostics using deep learning and mobile health technologies among resource-poor
and marginalized communities. In: 2016 IEEE First International Conference on Connected
Health: Applications, Systems and Engineering Technologies (CHASE), pp. 274–281, June
2016. IEEE
8. Anthimopoulos, M., Christodoulidis, S., Ebner, L., Christe, A., Mougiakakou, S.: Lung pattern
classification for interstitial lung diseases using a deep convolutional neural network. IEEE
Trans. Med. Imaging 35(5), 1207–1216 (2016)
9. van Grinsven, M.J., van Ginneken, B., Hoyng, C.B., Theelen, T., Sánchez, C.I.: Fast con-
volutional neural network training using selective data sampling: application to hemorrhage
detection in color fundus images. IEEE Trans. Med. Imaging 35(5), 1273–1284 (2016)
10. Zhang, W., Li, R., Deng, H., Wang, L., Lin, W., Ji, S., Shen, D.: Deep convolutional neural
networks for multi-modality isointense infant brain image segmentation. NeuroImage 108,
214–224 (2015)
11. Kleesiek, J., Urban, G., Hubert, A., Schwarz, D., Maier-Hein, K., Bendszus, M., Biller, A.:
Deep MRI brain extraction: a 3D convolutional neural network for skull stripping. NeuroImage
129, 460–469 (2016)
12. Avendi, M.R., Kheradvar, A., Jafarkhani, H.: A combined deep-learning and deformable-model
approach to fully automatic segmentation of the left ventricle in cardiac MRI. Med. Image Anal.
30, 108–119 (2016)
13. Singh, S., Maxwell, J., Baker, J.A., Nicholas, J.L., Lo, J.Y.: Computer-aided classification of
breast masses: performance and interobserver variability of expert radiologists versus residents.
Radiology 258(1), 73–80 (2011)
14. Sahiner, B., Chan, H.P., Roubidoux, M.A., Hadjiiski, L.M., Helvie, M.A., Paramagul, C., Blane,
C.: Malignant and benign breast masses on 3D US volumetric images: effect of computer-aided
diagnosis on radiologist accuracy. Radiology 242(3), 716–724 (2007)
15. Joo, S., Yang, Y.S., Moon, W.K., Kim, H.C.: Computer-aided diagnosis of solid breast nodules:
use of an artificial neural network based on multiple sonographic features. IEEE Trans. Med.
Imaging 23(10), 1292–1300 (2004)
16. Chen, C.M., Chou, Y.H., Han, K.C., Hung, G.S., Tiu, C.M., Chiou, H.J., Chiou, S.Y.: Breast
lesions on sonograms: computer-aided diagnosis with nearly setting-independent features and
artificial neural networks. Radiology 226(2), 504–514 (2003)
17. Sun, T., Zhang, R., Wang, J., Li, X., Guo, X.: Computer-aided diagnosis for early-stage lung
cancer based on longitudinal and balanced data. PLoS ONE 8(5), e63559 (2013)
18. Newell, D., Nie, K., Chen, J.H., Hsu, C.C., Hon, J.Y., Nalcioglu, O., Su, M.Y.: Selection
of diagnostic features on breast MRI to differentiate between malignant and benign lesions
using computer-aided diagnosis: differences in lesions presenting as mass and non-mass-like
enhancement. Eur. Radiol. 20(4), 771–781 (2010)
19. Tourassi, G.D., Frederick, E.D., Markey, M.K., Floyd, C.E.: Application of the mutual informa-
tion criterion for feature selection in computer-aided diagnosis. Med. Phys. 28(12), 2394–2402
(2001)
20. Ravì, D., Wong, C., Deligianni, F., Berthelot, M., Andreu-Perez, J., Lo, B., Yang, G.Z.: Deep
learning for health informatics. IEEE J. Biomed. Health Inform. 21(1), 4–21 (2017)
21. Lu, J., Getz, G., Miska, E.A., Alvarez-Saavedra, E., Lamb, J., Peck, D., Downing, J.R.:
MicroRNA expression profiles classify human cancers. Nature 435(7043), 834 (2005)
22. Cruz-Roa, A.A., Ovalle, J.E.A., Madabhushi, A., Osorio, F.A.G.: A deep learning architecture
for image representation, visual interpretability and automated basal-cell carcinoma cancer
detection. In: International Conference on Medical Image Computing and Computer-Assisted
Intervention, pp. 403–410, Sept 2013. Springer, Berlin, Heidelberg
23. Bowles, C., Qin, C., Guerrero, R., Gunn, R., Hammers, A., Dickie, D.A., Rueckert, D.: Brain
lesion segmentation through image synthesis and outlier detection. NeuroImage Clin. 16,
643–658 (2017)
24. Shen, D., Wu, G., Suk, H.I.: Deep learning in medical image analysis. Annu. Rev. Biomed.
Eng. 19, 221–248 (2017)
25. Chen, H., Dou, Q., Wang, X., Qin, J., Heng, P.A.: Mitosis detection in breast cancer histology
images via deep cascaded networks. In: Thirtieth AAAI Conference on Artificial Intelligence,
Feb 2016
26. Van Leemput, K., Maes, F., Vandermeulen, D., Colchester, A., Suetens, P.: Automated seg-
mentation of multiple sclerosis lesions by model outlier detection. IEEE Trans. Med. Imaging
20(8), 677–688 (2001)
27. Abraham, N., Khan, N.M.: A novel focal Tversky loss function with improved attention U-Net
for lesion segmentation. arXiv preprint arXiv:1810.07842 (2018)
28. Havaei, M., Davy, A., Warde-Farley, D., Biard, A., Courville, A., Bengio, Y., Larochelle, H.:
Brain tumor segmentation with deep neural networks. Med. Image Anal. 35, 18–31 (2017)
29. Sudre, C.H., Li, W., Vercauteren, T., Ourselin, S., Cardoso, M.J.: Generalised dice overlap
as a deep learning loss function for highly unbalanced segmentations. In: Deep Learning in
Medical Image Analysis and Multimodal Learning for Clinical Decision Support, pp. 240–248.
Springer, Cham (2017)
30. Yuan, Y.: Automatic skin lesion segmentation with fully convolutional-deconvolutional net-
works. arXiv preprint arXiv:1703.05165 (2017)
31. Litjens, G., Kooi, T., Bejnordi, B.E., Setio, A.A.A., Ciompi, F., Ghafoorian, M., Sánchez, C.I.:
A survey on deep learning in medical image analysis. Med. Image Anal. 42, 60–88 (2017)
32. Lundervold, A.S., Lundervold, A.: An overview of deep learning in medical imaging focusing
on MRI. Z. Med. Phys. (2018)
33. Maclaren, J., Herbst, M., Speck, O., Zaitsev, M.: Prospective motion correction in brain imag-
ing: a review. Magn. Reson. Med. 69(3), 621–636 (2013)
34. Zaitsev, M., Akin, B., LeVan, P., Knowles, B.R.: Prospective motion correction in functional
MRI. NeuroImage 154, 33–42 (2017)
35. Juneja, K., Verma, A., Goel, S., Goel, S.: A survey on recent image indexing and retrieval
techniques for low-level feature extraction in CBIR systems. In: 2015 IEEE International
Conference on Computational Intelligence & Communication Technology, pp. 67–72, Feb
2015. IEEE
36. Pizarro, R., Assemlal, H.E., De Nigris, D., Elliott, C., Antel, S., Arnold, D., Shmuel, A.: Using
deep learning algorithms to automatically identify the brain MRI contrast: implications for
managing large databases. Neuroinformatics 17(1), 115–130 (2019)
37. Sklan, J.E., Plassard, A.J., Fabbri, D., Landman, B.A.: Toward content-based image retrieval
with deep convolutional neural networks. In: Medical Imaging 2015: Biomedical Applications
in Molecular, Structural, and Functional Imaging, vol. 9417, p. 94172C, Mar 2015. International
Society for Optics and Photonics
38. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recog-
nition. arXiv preprint arXiv:1409.1556 (2014)
39. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional
neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105
(2012)
40. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Rabinovich, A.: Going
deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and
Pattern Recognition, pp. 1–9 (2015)
41. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Pro-
ceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778
(2016)
42. Srivastava, R.K., Greff, K., Schmidhuber, J.: Training very deep networks. In: Advances in
Neural Information Processing Systems, pp. 2377–2385 (2015)
43. Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.: Densely connected convolutional
networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recogni-
tion, pp. 4700–4708 (2017)
44. Xie, S., Girshick, R., Dollár, P., Tu, Z., He, K.: Aggregated residual transformations for deep
neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition, pp. 1492–1500 (2017)
45. Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. In: Proceedings of the IEEE Con-
ference on Computer Vision and Pattern Recognition, pp. 7132–7141 (2018)
46. Zoph, B., Vasudevan, V., Shlens, J., Le, Q.V.: Learning transferable architectures for scalable
image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition, pp. 8697–8710 (2018)
47. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time
object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition, pp. 779–788 (2016)
48. Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Bengio,
Y.: Generative adversarial nets. In: Advances in Neural Information Processing Systems,
pp. 2672–2680 (2014)
49. Koch, G., Zemel, R., Salakhutdinov, R.: Siamese neural networks for one-shot image recogni-
tion. In: ICML Deep Learning Workshop, vol. 2, July 2015
50. Ronneberger, O., Fischer, P., Brox, T.: U-net: convolutional networks for biomedical image seg-
mentation. In: International Conference on Medical Image Computing and Computer-Assisted
Intervention, pp. 234–241, Oct 2015. Springer, Cham
51. Milletari, F., Navab, N., Ahmadi, S.A.: V-net: Fully convolutional neural networks for volu-
metric medical image segmentation. In: 2016 Fourth International Conference on 3D Vision
(3DV), pp. 565–571, Oct 2016. IEEE

Monika Jyotiyana is currently a doctoral student in the Department of Computer Science at
the Central University of Rajasthan, Rajasthan, India. She received her Master's degree in 2012
and completed her Undergraduate degree in 2010 from Aryan International College affiliated to
MDS University Ajmer, Rajasthan, India. Her research interests include medical image
processing, Machine Learning, Deep Learning and Neural Networks. She has published papers in
various conferences and book chapters.

Nishtha Kesswani is currently an assistant professor in the Department of Computer Science at
the Central University of Rajasthan. She did her postdoctoral research at California State
University, San Bernardino, USA and received her doctorate from the University of Rajasthan,
Rajasthan, India. She received her Master's degree from Malaviya National Institute of
Technology, Rajasthan, India. Her research interests include Algorithms, Human-Computer
Interaction and Wireless networks. She has publications in various international journals,
conferences, books and book chapters.
Automated Brain Tumor Segmentation
in MRI Images Using Deep Learning:
Overview, Challenges and Future

Minakshi Sharma and Neha Miglani

Abstract Brain tumor segmentation of MRI images is a crucial task in medical image
processing. It is very important that a brain tumor be diagnosed in its initial
stages, which improves treatment as well as the patient's chances of survival.
Manual segmentation depends heavily on the doctor; it may vary from one expert to
another, and it is very time-consuming. Automated segmentation, on the other hand,
helps a doctor make quick decisions; results can be reproduced and records can be
maintained electronically, which improves diagnosis and treatment planning. Numerous
automated approaches for brain tumor detection have been popular over the last few
decades, notably Neural Networks (NN) and Support Vector Machine (SVM). Recently,
however, Deep Learning has taken center stage in the automation of brain tumor
segmentation, because deep architectures can represent complex structures, learn by
themselves, and efficiently process large amounts of MRI-based image data. The
chapter starts with an introduction to brain tumors and their types. The next
section discusses preprocessing techniques; preprocessing is a crucial step for the
correctness of an automated system. Feature extraction and feature reduction
techniques are discussed next. Conventional methods of image segmentation are then
covered, followed by the deep learning algorithms that are relevant in this domain.
The following section discusses the challenges faced when applying deep learning to
medical image segmentation. In the last section, existing algorithms are compared in
terms of accuracy, specificity, and sensitivity on about 200 brain images. The
motivation of this chapter is to give an overview of deep learning-based
segmentation algorithms in terms of existing work, challenges, and future scope. The
chapter presents the crux of the algorithms involved in brain tumor classification,
and a comparative analysis is carried out to determine which algorithm performs
best.

M. Sharma (B) · N. Miglani


Department of Computer Engineering, National Institute of Technology, Kurukshetra, India
e-mail: [email protected]
N. Miglani
e-mail: [email protected]

© Springer Nature Switzerland AG 2020
S. Dash et al. (eds.), Deep Learning Techniques for Biomedical and Health Informatics,
Studies in Big Data 68, https://doi.org/10.1007/978-3-030-33966-1_16

Keywords Convolution neural networks · Brain tumor segmentation · Deep
learning · Magnetic resonance images · Support vector machine · Medical image
processing

1 Introduction

In earlier times, one could not imagine having access to a huge amount of health
care data, whereas an enormous amount of data (precisely, big data) is available
today, owing to enhancements in image acquisition devices and tools. This is
engaging, but it also leads to varied challenges in the domain of image analysis.
The growth and widening extent of medical data, such as images and techniques,
demands exhaustive and tedious effort from medical professionals, which is not only
error-prone but also variable across professionals. Automating the diagnostic
process is therefore an important concern. Although machine learning could help with
such automation, the traditional approach does not work efficiently for complicated
problems. Thus, some combination of techniques could be considered to raise the
level of accuracy and precision in such fields. Machine learning together with
high-performance computing might help to handle complex medical images and deliver
reliable and adequate diagnostic outcomes. Similarly, feature extraction could be
more powerful if done with the help of deep learning, which could help to build new
images as well. The results obtained by deep learning would benefit many domains,
namely diagnosing disease, providing accurate measurement of targets, and providing
predictive models that suggest what actions could be performed, eventually guiding
the field experts.
In the past few years, fields such as Artificial Intelligence, Machine Learning,
and Deep Learning have evolved rapidly. These techniques have played a crucial
role in medical tasks such as image segmentation, image registration and
interpretation, automated diagnosis, image processing, and the analysis and
retrieval of image data. Machine learning assists in extracting image data and
features and in presenting this information in an organized way. Artificial
Intelligence and Machine Learning techniques can help medical experts predict the
likelihood of diseases in a more detailed and precise manner and, eventually, help
to prevent them beforehand. Specialists, experts, and researchers in the medical
fields gain a clearer view for analyzing the genetic variations that are actually
responsible for disease manifestation. Numerous traditional algorithms form the
core of these techniques, namely the K Nearest Neighbors algorithm, Neural
Networks, etc. [1]. Though these approaches are efficient, they have their own
shortcomings in terms of processing power and time consumption. They can process
images in their raw form, but they need more time for feature extraction as well
as expert knowledge.
Automated Brain Tumor Segmentation in MRI Images Using … 349

Along with these conventional approaches, many other approaches have started to
empower the domain, namely Long Short Term Memory, Extreme Learning Model,
Recurrent Neural Network, Convolution Neural Network, and many more. These
techniques overcome the limitations of conventional approaches in that feature
extraction is automated and learning becomes fast. They tend to automate the
representation of information by training multiple levels of abstraction from a
broad set of images that exhibit the required data behavior [2]. Although
conventional approaches have been shown to return significantly precise results in
the medical fields, emerging technologies and advancements have helped to derive
accurate solutions for complex problems as well. Numerous deep learning
algorithms have produced significant performance and speed improvements in major
areas such as drug discovery, text and speech recognition, and facial recognition.
The motivation of this chapter lies in an extensive retrospection of deep learning
algorithms in the medical fields, particularly medical image analysis, marking the
future perspective while considering past work as well. The chapter aims to
provide the elementary information and the state of the art of deep learning
approaches in the context of the medical domain.

2 What Is Brain Tumor?

The brain controls all vital functions of the human body. It is one of the most
crucial and complicated organs of the human body and is the dominant part of the
central nervous system (CNS). The skull encloses the human brain, which consists
of “gray matter (GM), white matter (WM) and cerebrospinal fluid (CSF)”. CSF is a
translucent liquid that sheathes the brain as well as the spinal cord. It serves
several functions for the CNS: it cushions shocks and carries ions, oxygen, and
glucose, which are distributed throughout the nervous tissue. CSF also helps to
remove waste from nervous tissue [3–5].

2.1 Types of Brain Tumor [6–9]

As per the World Health Organization (WHO), approximately 120 different types of
brain tumor have been identified so far. WHO's classification criterion is the
cell of origin. Broadly, brain tumors can be classified into two categories
(as shown in Fig. 1):
a. Primary brain tumors: These are tumors that originate in the brain. The
name of a tumor is determined by where it originates. Tumors can be
categorized in two ways: benign and malignant. Benign tumors are also
known as non-cancerous and do not affect other parts, whereas malignant brain
tumors start in the brain itself and affect other parts of the body, such as
the spine. Malignant tumors grow fast. Benign brain tumors are easier to treat
than malignant tumors because they are not deeply buried in the brain and have
defined boundaries. Also, if a benign tumor is removed successfully, there is
very little chance that it will come back. That is not true of malignant
tumors: even after removal, there is still a chance of recurrence.
b. Secondary brain tumors: These are tumors that spread from other body parts
(metastasis). This type of tumor is named according to the body part from which
it originally spread. If a brain tumor develops from the lung, it is known as
a metastatic lung tumor, because it develops from the abnormal growth of lung
cells [10].

Fig. 1 Brain tumor classification based on their originating cell. Primary brain
tumor: glioma (45%), comprising astrocytoma (34%), ependymoma (2%), and
oligodendroglioma (3%). Secondary brain tumor: metastasis. Subtypes shown:
pilocytic, low grade, anaplastic, glioblastoma, subependymoma; Grades 1 to 4

By Grade

In medical terms, brain tumors are classified into four grades. Grade 1 tumors are
in the initial phase and grow slowly. Grade 2 is known as a benign tumor. Grade 1
and Grade 2 fall under the low-grade tumor category. Grade 3 and Grade 4 are
categorized as high-grade (malignant) tumors and need urgent treatment (refer to
Fig. 2).

3 What Is Deep Learning?

Deep learning is also known as structured learning or hierarchical learning. It is a
subtype of machine learning that mimics the human brain. A simple neural
architecture has only three types of layers (input layer, hidden layer, output layer);
a deep neural network, by contrast, contains many hidden layers for multiprocessing.

Fig. 2 Brain tumor classification according to grade
Benign (Grade 1/Grade 2):
• Lethargic growth
• Little chances of reverting back
• No proliferation in other parts
• Surgery is enough to treat
• Radiotherapy or chemotherapy is not required
Malignant (Grade 3/Grade 4):
• Fast growth rate
• May recur even after treatment (surgery)
• Infect other parts as well
• Radiotherapy/chemotherapy is required

3.1 Deep Learning Architecture and Neural Network

An Artificial Neural Network (ANN) is an attempt to build a computer model of the
human brain, with the aim of creating a system that can perform computational work
faster than traditional approaches. An ANN consists of basic units that are
interconnected in some way to allow communication between them [11]. The earliest
known neural network built on the analogy of the biological neural network was
the perceptron. In the perceptron, the nodes of an input layer are linked directly
to the nodes of an output layer, which is sufficient to handle linearly separable
patterns. This simple network was not enough for complex problems, so a layered
structure was proposed to capture complex patterns; it comprises an input layer
and an output layer and, along with these, n hidden layers. In the basic neural
network approach, interconnected neurons receive input, perform operations on the
received data, and forward the result to the next layer, which may be a hidden
layer or the output layer depending on the structure created, as shown in Fig. 3.
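
To make the layered structure concrete, the following is a minimal Python sketch, not the chapter's implementation. It assumes a threshold activation and hand-picked illustrative weights (the names `step`, `layer`, and `xor_network` are hypothetical), and shows how one hidden layer lets a network compute XOR, a pattern that is not linearly separable and therefore out of reach for a single perceptron.

```python
def step(value, threshold):
    """Threshold activation: the neuron fires (1) only above its threshold."""
    return 1 if value > threshold else 0


def layer(inputs, weights, thresholds):
    """One layer: every neuron thresholds a weighted sum of all inputs."""
    return [
        step(sum(w * x for w, x in zip(neuron_weights, inputs)), t)
        for neuron_weights, t in zip(weights, thresholds)
    ]


# Illustrative, hand-picked parameters (an assumption, not learned values):
# hidden neuron 1 behaves like OR, hidden neuron 2 behaves like AND.
HIDDEN_WEIGHTS, HIDDEN_THRESHOLDS = [[1, 1], [1, 1]], [0.5, 1.5]
# The output neuron fires when OR is on and AND is off, which is XOR.
OUTPUT_WEIGHTS, OUTPUT_THRESHOLDS = [[1, -1]], [0.5]


def xor_network(x1, x2):
    """Forward pass: input layer -> hidden layer -> output layer."""
    hidden = layer([x1, x2], HIDDEN_WEIGHTS, HIDDEN_THRESHOLDS)
    return layer(hidden, OUTPUT_WEIGHTS, OUTPUT_THRESHOLDS)[0]
```

In practice the weights are learned from data rather than chosen by hand; the sketch only illustrates how values flow from one layer to the next.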

Fig. 3 Neural network structure [70]: input values, input layer, hidden layer 1,
hidden layer 2, output layer

An activation function determines what value is passed to the next layer.
Depending on the threshold value selected, a neuron is either excited or
inhibited. Increasing the number of hidden layers allows more complex problems to
be solved, because hidden layers capture non-linear relationships. Although hidden
layers address complex problems, it is important to use the minimum number of
hidden layers, because each additional layer also complicates the structure. Such
types of neural networks are called Deep Neural Networks. Training and learning of
data is cost-effective
in such networks. These extra layers, precisely the hidden layers, compose
features that originate in the lower layers and move towards the upper layers,
which makes it possible to design complex architectures. For designing and
developing automated applications, deep learning has emerged as a promising
approach and has set a benchmark as well. Results obtained by automation have
outperformed manual observation; for example, in medical domains, computer vision
applications based on deep learning have provided accurate and precise results in
capturing cancer indicators in tumors and blood in MRIs. Deep learning can be
regarded as an augmentation of the artificial neural network comprising numerous
hidden layers, which permits an abstract view and refined image analysis. The
approach has attracted the attention of researchers in varying fields because of
the results obtained in applications such as facial recognition, object detection,
and medicine. A Deep Neural Network assembles many layers of nodes (neurons) into
a hierarchical structure. The layer count has even exceeded a thousand layers in a
single network. With such modeling capacity, a network trained rigorously on a
huge database can memorize a very wide range of mappings and make apt predictions
for unseen cases. It can therefore be concluded that this approach has an
empowering effect on the fields of medical imaging and computer vision, and its
influence can also be seen in the fields of voice and text. Researchers are
exploring different domains and extensions of the deep neural network; one such
example is the Convolution Neural Network, which is very much in trend nowadays.
Along with it, many more architectures have become of interest to researchers,
such as the deep Boltzmann machine (DBM), deep neural network (DNN), deep
autoencoder (dA), deep belief network (DBN), and recurrent neural network (RNN)
and its variants such as MDLSTM or BLSTM (depicted with their advantages and
disadvantages in Table 1). The CNN model is attracting attention in the fields of
digital image processing and vision.
Basic Working
The working of a deep neural network is divided into five steps, as depicted in
Fig. 4. The first step is to identify the problem and carry out a feasibility
study; it is crucial to know whether deep learning can solve the given problem or
not. In the second step, relevant data is collected. Various deep learning
algorithms are available, so in the third step an appropriate algorithm is
selected. Finally, the fourth and fifth steps deal with training and testing,
respectively.

4 Benefits of Deep Learning Over Machine Learning

Image acquisition and image interpretation are the two steps on which correct
disease diagnosis depends. In the past few years, tools and devices for acquiring
images have improved considerably, so that high-resolution radiological images,
namely CT scans, X-rays, and MRI, are now available for further analysis.
Nonetheless, this is just the beginning of reaping the benefits of automated image
interpretation. Numerous applications of machine learning, such as computer
vision, already exist, yet the conventional machine learning approaches used for
image interpretation depend strongly on experts for feature extraction; an
instance is the detection of brain tumors, which entails structural feature
extraction. Conventional approaches, though efficient, can yield inaccurate and
unreliable results because of the large dissimilarities between patients' data.
Hence, machine learning algorithms play a crucial role in handling disordered
and convoluted data [12].
Furthermore, deep learning, being a more specialized and precise approach, has
attracted much interest in every field, particularly in the medical field for
analyzing images; around $300 million of the medical imaging market is expected to
be held by deep learning by 2021. It is expected to attract substantial investment
in medical domains on its own, since it provides better accuracy and results for
complex data as well. Deep learning has grown tremendously over the years (as
shown in Fig. 5). The approach falls in the class of supervised machine learning
methods. Deep learning has targeted many varying fields, one of which is computer
vision. Yet its main success lies in reducing human involvement in disease
diagnosis and relying on automated results to achieve a high level of accuracy,
specifically in the field of brain tumors, where an infinitesimal mistake in
analysis could cause a blunder. In such cases, the deep learning approach provides
significant results, as it can better approximate and mimic the human brain by
using leading methodologies and technologies in comparison to the basic neural
network approach. Deep learning exploits the utility of

Table 1 Comparative analysis of deep learning architecture

| Type of network | Description | Advantages | Disadvantages |
|---|---|---|---|
| Deep Neural Network [1, 72] | It contains more than two hidden layers which can be applied to more complex relationship. It is mainly used in the field of classification and regression | Adapt to new problem very easily. It does not require feature engineering which consumes lot of time and efforts | It requires large amount of data. Requires more computational time to train. It cannot summarize classification process |
| Convolution Neural Network [62] | Convolution neural network comprises n basic building blocks, namely pooling layers, convolution layers, and fully inter-connected layers, and is developed to yield an automation of training features spatial hierarchies using back-propagation approach | CNN have relatively less preprocessing of image. CNN shows good result for 2D data | Requires lots of labeled data for classification |
| Recurrent Neural Network (RNN) [73] | In RNN weights are shared across sequences | RNN architecture helps in time dependent events. It also play major role in speech recognition, natural language processing | RNN has the disadvantage that it needs dataset in large number |
| Deep Boltzmann Machine (DBM) [74] | It maintains only unidirectional connections between hidden layers | Helpful in ambiguous dataset | Optimization is very difficult for such a large dataset |
| Deep Auto-encoder (DA) [47] | Unsupervised learning uses Deep Auto encoder and helps in feature or dimensionality reduction | DA does not require labeled data | DA suffers from the problem of vanishing. Processing time is more due pre-training step |
| Deep Belief Network (DBN) [75, 76] | The model comprises connection which is unidirectional. It makes use of supervised as well as unsupervised machine learning approaches. Every sub network consisting of hidden layer remains visible to its next layer | Greedy approach being used in every respective layer as well as inference compliant enhances the plausibility | An initialization phase makes learning process computationally exhaustive and expensive |

Fig. 4 Basic working of deep neural network: (1) understand problem and check for
feasibility; (2) collect data; (3) select deep learning algorithm according to
requirement; (4) training algorithm; (5) testing for performance

Fig. 5 Growth of deep neural network [12]

deep and insightful models of neural networks. The technique proves its worth when
the available knowledge is limited and the problem at hand is complicated and
realistic. The crux of a neural network is its basic unit, the neuron, inspired by
the working of the human brain: multiple signals act as inputs and are passed from
one layer to another, the layers being linked together by inter-connection
weights. Eventually, the combined signals pass through different non-linear
operations, resulting in an output signal.

4.1 Comparison of Different Architecture of Deep Learning Models

A comparison of the different deep learning architectures is given in Table 1.

5 Brain Tumor Classification Steps

Brain tumor classification consists of seven stages, from data collection to tumor
detection, as shown in Fig. 6:

5.1 MR Image Acquisitions

The first step is to build an MR image database. Images are collected from a 1.5 T
MRI machine, and the images generally used are of size 256 × 256. The intensity of
a grey scale image lies in the range [0, 255], where 0 represents black and 255
represents white. The database is divided into two parts: a training database and
a testing database. The images are stored in JPEG format. Examples of brain images
are shown in Fig. 7.

5.2 Image Preprocessing

Image preprocessing is crucial for the accuracy of the subsequent steps. It
removes image noise, detects edges, and enhances contrast, and it is also the
stage at which the input image is loaded into the MATLAB environment [13–17].

Fig. 6 Seven steps involved in the brain classification: (1) MR image acquisition;
(2) image pre-processing; (3) feature extraction (GLCM); (4) feature reduction
(genetic algorithm); (5) image classification and image segmentation using ANFIS
and FCM, supported by a knowledge base; (6) result (type of tumor, area of tumor);
(7) performance (evaluation and comparisons)

Fig. 7 Brain images having tumor [4]

The techniques used by the proposed system for preprocessing are:
(a) Histogram Equalization
(b) Conversion of colored images (RGB) to grey scale images
(c) Morphological Operations
(d) Edge Detection
(a) Histogram Equalization: Histogram equalization is applied to the image to
enhance its contrast, which benefits the subsequent steps. The image histogram
is a graph of the grey-level distribution of the image. To produce a uniform
histogram, the intensity values are spread over the entire scale. Generally,
CLAHE (contrast limited adaptive histogram equalization) is more popular due
to its high accuracy [10, 17–21] (Fig. 8).
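
The idea of spreading the intensity values over the entire scale can be sketched in a few lines of Python. This is an illustrative sketch of plain global histogram equalization, not CLAHE and not the chapter's MATLAB code; the function name `equalize_histogram` and the list-of-lists image representation are assumptions made for the example.

```python
def equalize_histogram(image, levels=256):
    """Global histogram equalization of a 2D list of integer grey levels."""
    pixels = [p for row in image for p in row]

    # Histogram: how many pixels take each grey level.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    # Cumulative distribution of the grey levels.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)

    cdf_min = next(c for c in cdf if c > 0)
    total = len(pixels)
    if total == cdf_min:
        # Only one grey level is present, so there is nothing to stretch.
        return [row[:] for row in image]

    # Map each grey level so that the cumulative distribution becomes
    # approximately linear over [0, levels - 1].
    lut = [
        max(0, round((c - cdf_min) / (total - cdf_min) * (levels - 1)))
        for c in cdf
    ]
    return [[lut[p] for p in row] for row in image]
```

For a low-contrast image whose grey levels sit in a narrow band, the darkest level present is mapped to 0 and the brightest to 255, which is the contrast stretch the text describes.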
(b) Conversion of colored images (RGB) to grey scale images: This technique
converts colored images into grey scale images in order to reduce complexity
for later steps (as shown in Fig. 9). A grey scale image contains only one
image plane instead of the three planes of an RGB image. The conversion
therefore reduces the data to be maintained to one third, which results in
faster processing. It is a crucial step before any further processing [22, 23].

Fig. 8 Brain image before and after histogram equalization; the contrast of the
image is enhanced, which is beneficial for later steps [71]. a The original
MRI [5]. b Histogram equalized MRI

Fig. 9 a Colored MRI [4]. b Grey scale MRI

This can be done using the MATLAB function

I = rgb2gray(RGB)

where RGB is the image to be converted to grey scale and I is the resulting
image.
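
As an illustration of what such a conversion does, the Python sketch below collapses the three colour planes into a single plane with a weighted sum. The weights 0.2989, 0.5870, and 0.1140 are the standard luminance weights commonly documented for this kind of conversion; the function name `rgb_to_gray` and the tuple-based image layout are assumptions for the example, not part of the chapter.

```python
def rgb_to_gray(rgb_image):
    """Convert rows of (R, G, B) tuples (0-255) into rows of grey levels."""
    return [
        [round(0.2989 * r + 0.5870 * g + 0.1140 * b) for (r, g, b) in row]
        for row in rgb_image
    ]
```

Each output pixel stores one value instead of three, which is where the reduction in data comes from.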
(c) Morphological Operations: Morphological operations are applied to images
to sharpen regions and to fill gaps. There are four basic operations:
dilation, erosion, opening, and closing. Figure 10 shows the result before and
after the morphological operation [24, 25].

MATLAB: SE0 = strel('disk', 8);
        Image = imerode(Image1, SE0);

where SE0 is a structuring element and strel creates a structuring element
whose shape is a disk of radius 8.
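
The effect of erosion with a disk-shaped structuring element can be sketched in Python as follows. This is a simplified illustration for binary images, not a reimplementation of the MATLAB functions: the helper names `disk` and `erode` are hypothetical, the disk is a plain Euclidean disk, and pixels outside the image are treated as background, which is an assumption of this sketch.

```python
def disk(radius):
    """Offsets (dy, dx) covered by a disk-shaped structuring element."""
    return [
        (dy, dx)
        for dy in range(-radius, radius + 1)
        for dx in range(-radius, radius + 1)
        if dy * dy + dx * dx <= radius * radius
    ]


def erode(image, offsets):
    """Binary erosion: a pixel stays 1 only if the whole element fits on 1s."""
    height, width = len(image), len(image[0])

    def at(y, x):
        # Assumption of this sketch: outside the image counts as background.
        return image[y][x] if 0 <= y < height and 0 <= x < width else 0

    return [
        [
            1 if all(at(y + dy, x + dx) for dy, dx in offsets) else 0
            for x in range(width)
        ]
        for y in range(height)
    ]
```

Eroding a 3 × 3 block of ones with `disk(1)` leaves only its centre pixel, which shows how erosion shrinks regions and removes thin structures.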

Fig. 10 a Original image. b After morphological operation


Fig. 11 Masks used for fuzzy edge detection (3 × 3 window):
Pi1 Pi2 Pi3
Pi4 Pi5 Pi6
Pi7 Pi8 Pi9

(d) Edge Detection [26–48]


Edge detection algorithms find sudden changes in the intensity of an image,
so that useless information can be filtered out. They find application in
many areas, such as computer vision, image enhancement, security during
multimedia communication, medical diagnosis, image encryption, image
compression, and image segmentation. Various edge detection algorithms exist,
such as the Sobel, Roberts, and Prewitt edge detectors, the Laplacian of
Gaussian (LoG) detector, the Canny edge detector, and the fuzzy based edge
detector. Generally, the fuzzy based edge detector is used because medical
images contain more vagueness and uncertainty, and other standard edge
detection algorithms fail to determine the true edges correctly. The fuzzy
based edge detection algorithm has further advantages: no parameter setting is
needed, and, unlike other edge detection algorithms, it needs no noise
filtering, because noise does not affect the edge detection process. The
working of the fuzzy based edge detection algorithm is elaborated in the
following steps:
(1) The first step converts the colored image into a grey scale image.
(2) The second step scans the whole image with a 3 × 3 mask (Fig. 11), one
3 × 3 block at a time, until the complete image has been scanned.
(3) The crisp inputs p1 to p9 are then fuzzified using the defined membership
functions. Fuzzy input values:
Black: [0 0 255], triangular membership function
White: [0 255 255], triangular membership function
(4) The output classifies each pixel as white, black, or an edge according to
the fuzzy output values (as shown in Fig. 12):
Black: [0 2 4], triangular membership function
Edge: [133 131 122], triangular membership function
White: [209 232 235], triangular membership function
(5) A total of 81 rules are defined. A sample of the fuzzy rules is shown in Table 2.
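
The fuzzification and rule evaluation in steps (3) to (5) can be sketched in Python as below. This is a simplified illustration under stated assumptions: it uses only the five sample rules of Table 2 rather than the full rule base, combines antecedents with the minimum operator, picks the output label of the strongest rule, and omits defuzzification with the output membership functions. The names `trimf`, `RULES`, and `classify_window` are hypothetical.

```python
def trimf(x, a, b, c):
    """Triangular membership with feet a and c and peak b (a <= b <= c)."""
    if x < a or x > c:
        return 0.0
    if x == b:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Input membership functions with the parameters given in step (3).
MEMBERSHIP = {
    "BK": lambda x: trimf(x, 0, 0, 255),    # black
    "WT": lambda x: trimf(x, 0, 255, 255),  # white
}

# The five sample rules of Table 2: labels for PIX1..PIX9, then the output.
RULES = [
    ("BK BK BK BK BK BK BK BK WT".split(), "EDGE"),
    ("BK BK BK BK BK BK BK BK BK".split(), "BK"),
    ("WT WT WT WT WT WT WT WT WT".split(), "WT"),
    ("BK WT WT WT WT WT WT WT WT".split(), "WT"),
    ("WT WT WT WT WT WT WT WT BK".split(), "EDGE"),
]


def classify_window(window):
    """Classify the nine grey levels of one 3 x 3 mask as BK, WT, or EDGE."""
    strengths = {}
    for labels, output in RULES:
        # Fuzzy AND over the nine antecedents: take the minimum membership.
        strength = min(MEMBERSHIP[l](p) for l, p in zip(labels, window))
        strengths[output] = max(strengths.get(output, 0.0), strength)
    return max(strengths, key=strengths.get)
```

Sliding this classifier over every 3 × 3 block of the grey scale image corresponds to the scanning described in step (2).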

Fig. 12 Output membership function

5.3 Feature Extraction

Features are important attributes of an image, and one of the most vital is its
texture. Extracting different features from a pre-processed image is known as
feature extraction. Such features are used in classifying images [49–51]. There
are two different approaches to segmenting an image: the structured approach and
the statistical approach. The proposed study deals with the statistical approach.
Techniques used for texture measurement include Gabor filters, the co-occurrence
matrix, the wavelet transform, and fractals. The technique used in this study is
the Gray Level Co-occurrence Matrix (GLCM), which captures feature values
numerically by making use of the spatial relationships among neighboring pixels.
These numerical values further aid in classifying and comparing features. The
function used to compute the GLCM for a given image is available in MATLAB:

GLCM2 = graycomatrix(image, 'Offset', [0 1; -1 1; -1 0; -1 -1])


Table 2 Sample of fuzzy rules

| PIX1 | PIX2 | PIX3 | PIX4 | PIX5 | PIX6 | PIX7 | PIX8 | PIX9 | PIX_OUTPUT |
|---|---|---|---|---|---|---|---|---|---|
| BK | BK | BK | BK | BK | BK | BK | BK | WT | EDGE |
| BK | BK | BK | BK | BK | BK | BK | BK | BK | BK |
| WT | WT | WT | WT | WT | WT | WT | WT | WT | WT |
| BK | WT | WT | WT | WT | WT | WT | WT | WT | WT |
| WT | WT | WT | WT | WT | WT | WT | WT | BK | EDGE |


where image is the input image and the offset selects the four directions in which
features are measured: 0°, 45°, 90°, and 135°, with offset values [0 1], [−1 1],
[−1 0], and [−1 −1], respectively.

(−1, −1) 135°   (−1, 0) 90°   (−1, 1) 45°
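
To show what a co-occurrence matrix contains, here is a minimal Python sketch that counts grey-level pairs for a single (row, column) offset and normalizes the counts into probabilities. It is an illustration rather than a reimplementation of the MATLAB function; the names `glcm` and `OFFSETS` are hypothetical, and the image is assumed to be a 2D list whose values are already quantized to the range 0 to levels − 1.

```python
# The four directions and their (row, column) offsets, as listed in the text.
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def glcm(image, offset, levels):
    """Normalized grey-level co-occurrence matrix for one offset."""
    d_row, d_col = offset
    rows, cols = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]

    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + d_row, c + d_col
            # Count the pair only if the neighbour lies inside the image.
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1

    total = sum(map(sum, counts))
    if total == 0:
        return counts
    return [[n / total for n in row] for row in counts]
```

Entry (a, b) of the result is the fraction of pixel pairs, separated by the chosen offset, whose grey levels are a and b; texture features are then computed from these entries.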

These features are used for segmenting the image. Image segmentation can be done
in two ways, the statistical approach and the structured approach, and most
researchers make use of the statistical approach. There are several statistical
techniques for measuring texture, such as the co-occurrence matrix, fractals,
Gabor filters, and the wavelet transform. The proposed research work uses the Gray
Level Co-occurrence Matrix (GLCM), which captures numerical feature values using
the spatial relationships among neighboring pixels. These numerical feature values
are used for comparing and classifying features. GLCM extracts 20 texture
features: “Autocorrelation, Contrast, Correlation, Cluster Prominence, Cluster
Shade, Dissimilarity, Energy, Entropy, Homogeneity, Maximum probability, Sum of
squares, Variance, Sum average, Sum variance, Sum entropy, Difference variance,
Difference entropy, Information measure of correlation, Information measure of
correlation 2, Inverse difference (INV), Inverse difference normalized (INN),
Inverse difference moment normalized” (as shown in Fig. 13) [52, 53].
GLCM features were extracted for three different brain images, namely Brain
image 1, Brain image 2, and Brain image 3, as depicted in Fig. 14; Table 3
presents the results obtained from these images.
1. Contrast (contdr): It measures the variation between a pixel and its adjoining
pixel in terms of grey-scale change. Contrast can be computed using the
formula below:

$$\mathrm{Contdr} = \sum_{a,b} |a - b|^{2}\, P_i(a, b) \tag{1}$$

where $P_i(a, b)$ represents the entry of the matrix at position (a, b).

2. Energy (energd): It measures how uniform an image is.

$$a = \sum_{i,j} P_i^{2}(i, j) \tag{2}$$

Fig. 13 List of features that can be extracted from an image
Shape based features: 1. Area; 2. Perimeter; 3. Circularity; 4. Irregularity;
5. Shape Index
Intensity based features: 1. Mean; 2. Variance; 3. S.D; 4. Median; 5. Skewness;
6. Kurtosis; 7. Range; 8. Pixel Orientation
Texture based features: 1. Autocorrelation (autoc); 2. Contrast (contrd);
3. Co-relation1 (corrpd); 4. Co-relation2 (cpromd); 5. Cluster shade (cshad1);
6. Energy (energd); 7. Dissimilarity (Dissid); 8. Entropy (entrod);
9. Entropy (entrod); 10. Homogeneity (homopd); 11. Maximum probability (maxprd);
12. Sum of Squares (sosvhd); 13. Sum Average (savghd); 14. Sum Variance (svarhd);
15. Sum entropy (senthd); 16. Difference Variance (dvhd);
17. Difference entropy (denthd); 18. Information measure of Co-relation1 (inf1hd);
19. Information measure of Co-relation2 (inf2h); 20. Inverse difference (indncd);
21. Inverse difference moment normalized (idmncd)

Fig. 14 a Brain image 1. b Brain image 2. c Brain image 3



Table 3 GLCM features for brain image 1, brain image 2, brain image 3

| Feature no | Feature name | Feature value image 1 | Feature value image 2 | Feature value image 3 |
|---|---|---|---|---|
| 1 | Autocorrelation (autoc) | 0.07978 | 0.152848 | 43.1530 |
| 2 | Contrast (contrd) | 0.95866 | 0.919698 | 1.8692 |
| 3 | Co-relation1 (corrpd) | 295.685 | 294.6303 | 0.1392 |
| 4 | Co-relation2 (cpromd) | 30.1227 | 30.70965 | 34.6933 |
| 5 | Cluster shade (cshad1) | 0.06146 | 0.093196 | 5.2662 |
| 6 | Energy (energd) | 0.83309 | 0.787922 | 0.1233 |
| 7 | Dissimilarity (Dissid) | 0.5369 | 0.672409 | 0.6877 |
| 8 | Entropy (entrod) | 0.97182 | 0.960849 | 2.6980 |
| 9 | Homogeneity (homopd) | 0.97524 | 0.959027 | 0.65645 |
| 10 | Maximum probability (maxprd) | 0.91314 | 0.886946 | 0.6411 |
| 11 | Sum of Squares (sosvhd) | 2.31867 | 2.48507 | 0.1973 |
| 12 | Sum Average (savghd) | 2.31703 | 2.447587 | 44.9329 |
| 13 | Sum Variance (svarhd) | 0.16651 | 0.622724 | 13.2626 |
| 14 | Sum entropy (senthd) | 0.53192 | 0.96682 | 133.5676 |
| 15 | Difference Variance (dvhd) | 10.9731 | 0.965064 | 1.8188 |
| 16 | Difference entropy (denthd) | 1.85308 | 0.895411 | 1.8927 |
| 17 | Information measure of Co-relation1 (inf1hd) | 0.15269 | 2.227421 | 1.2145 |
| 18 | Information measure of Co-relation2 (inf2h) | 0.65648 | 0.886946 | −0.0322 |
| 19 | Inverse difference (indncd) | 0.96834 | 2.48507 | 0.2863 |
| 20 | Inverse difference moment | 0.62785 | 2.447587 | 0.9107 |



$$\mathrm{energd} = a \tag{3}$$

3. Homogeneity (HOM): It measures how close the grey values of neighboring pixels
are. If the variation in grey values is large, homogeneity is small, and vice
versa.

$$\mathrm{HOM} = \sum_{a,b} \frac{P_i(a, b)}{1 + |a - b|} \tag{4}$$

4. Energy (E): It yields the sum of squared elements in the GLCM. If an image is
constant, the value of Energy becomes one.

$$E = \sum_{a,b} P_i(a, b)^{2} \tag{5}$$

5. Entropy (Entrod): It measures the extent of disorder in an image.

$$\mathrm{Entrod} = -\sum_{a,b} P_i(a, b)\, \log_2 \{P_i(a, b)\} \tag{6}$$

6. Variance (VAR): It predicts the difference between gray levels and the mean
value obtained

$$\mathrm{VAR} = \sum_{a} \sum_{b} P_i(a, b)\, P_i(a, b) - \mu^{2} \tag{7}$$

7. Maximum Probability (MAX): The max value represents the largest value of Pi in
the matrix.
8. Cluster Shade: It calculates the skewness of the matrix.
9. Information measure of correlation 1

$$\mathrm{IMC1} = \frac{H_{xy} - H_{xy1}}{\max(H_x, H_y)} \tag{8}$$

where $H_{xy}$ represents the entropy of the matrix.
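
Several of the measures listed above can be computed directly from a normalized co-occurrence matrix. The Python sketch below covers contrast, energy, homogeneity, and entropy, following Eqs. (1), (4), (5), and (6). It is illustrative only: the function name `glcm_features` is hypothetical, the matrix is assumed to be a list of rows whose entries sum to one, and entropy is taken with the usual leading minus sign so that it is non-negative.

```python
import math


def glcm_features(p):
    """Contrast, energy, homogeneity, and entropy of a normalized GLCM."""
    contrast = energy = homogeneity = entropy = 0.0
    for a, row in enumerate(p):
        for b, value in enumerate(row):
            contrast += abs(a - b) ** 2 * value        # Eq. (1)
            homogeneity += value / (1 + abs(a - b))    # Eq. (4)
            energy += value ** 2                       # Eq. (5)
            if value > 0:                              # skip log2(0)
                entropy -= value * math.log2(value)    # Eq. (6)
    return {
        "contrast": contrast,
        "energy": energy,
        "homogeneity": homogeneity,
        "entropy": entropy,
    }
```

For a constant image the matrix has a single entry equal to one, so energy and homogeneity are one while contrast and entropy are zero, in line with the remark on Energy above.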

5.4 Feature Reduction Using Genetic Algorithm

Feature reduction selects a minimal subset of the available features in order to
enhance the accuracy and precision of segmentation while also reducing time
complexity. The key idea is to retain only those features that are most relevant.
The most popular feature reduction algorithms are “Sequential Forward Selection,
Sequential Backward Selection, Genetic Algorithm, Particle Swarm Optimization, and
Principal Component Analysis [54, 55]”. The Genetic Algorithm was developed by
John Holland in 1975 and relies on the biological concept that only the fittest
survive [56, 57]: only the best parents produce offspring and, in the same manner,
only the best solutions lead to further good solutions. The Genetic Algorithm
finds application in many areas, such as optimization problems, machine learning,
and pattern recognition.

Fig. 15 General steps of genetic algorithm: initialization and representation;
selection based on fitness value; mutation and cross-over; stopping criteria check
(false: continue iterating; true: exit)
Generally, Genetic algorithm has following steps (as shown in Figs. 15 and 16):
(1) Initialization and representation: In the first phase, initial population is gen-
erated. This initial population is randomly generated out of available search
space. Genetic algorithm uses binary coding scheme for representation where
1 shows gene is present and 0 shows gene is absent.
(2) Selection: Selection is also known as “survival of the test operator”. In this
phase, worst solutions are removed from the population while best items are
duplicated. A fitness function is used to decide whether an item is best or whether
it is worst.
(3) Cross Over and Mutation: In mutation, a position in string is chosen at random
and flips that value of that bit i.e. 1–0 or 0–1. Whereas, in crossover two best
chromosome joins at some point to generate new population.
(4) Stopping criteria: The feature selection process needs a stopping criterion,
otherwise it would continue indefinitely. There are various ways to stop it:
(1) a pre-defined number of features, which depends on the user's requirement,
has been selected; (2) a set number of iterations has been reached; or (3) the
fitness function value no longer changes.

Chromosome encoding of the 20 features (bit i is 1 if feature fi is selected):
Position  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Bit       1 0 1 1 0 0 0 0 0 0  0  0  0  0  0  0  1  1  1  1
Features  f1 f2 … f20

[Flowchart: Initialization of parameters (chromosome length, population size) →
Generate temporary feature subset → ANFIS calculates fitness in terms of accuracy →
Accuracy > threshold? (yes: feature subset is promising) → New feature subset is
generated by mutation and single-point crossover → Maximum iteration reached?
(yes: exit; no: repeat)]

Fig. 16 Different steps involved in the process of feature reduction
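
The four steps above can be sketched in a few lines of Python. This is only a minimal illustration, not the authors' implementation: the function name `run_ga`, the mutation rate, the "keep the better half" selection rule and the toy fitness function are all assumptions made for the example. In the chapter, the fitness would come from ANFIS classification accuracy instead.

```python
import random

def run_ga(fitness, n_features=20, pop_size=20, max_iter=30,
           mutation_rate=0.05, seed=0):
    """Return (best_chromosome, best_fitness) after max_iter generations."""
    rng = random.Random(seed)
    # (1) Initialization: random binary chromosomes, 1 = feature present.
    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]
    best = max(pop, key=fitness)
    # (4) Stopping criterion: a fixed number of iterations.
    for _ in range(max_iter):
        # (2) Selection: keep the better half, drop the worse half (assumed rule).
        pop.sort(key=fitness, reverse=True)
        parents = pop[:pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # (3) Single-point crossover of two good chromosomes ...
            cut = rng.randint(1, n_features - 1)
            child = a[:cut] + b[cut:]
            # ... followed by bit-flip mutation (1 to 0 or 0 to 1).
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        pop = parents + children
        best = max(pop + [best], key=fitness)
    return best, fitness(best)

# Hypothetical stand-in for the ANFIS accuracy: reward a fixed set of
# "useful" feature indices and penalize every other selected feature.
USEFUL = {0, 3, 5, 6, 8, 10, 12}

def toy_fitness(chromosome):
    selected = {i for i, gene in enumerate(chromosome) if gene}
    return len(selected & USEFUL) - 0.5 * len(selected - USEFUL)
```

Calling `run_ga(toy_fitness)` returns a 20-bit chromosome and its score. Because the parents are carried over unchanged, the best score never decreases from one generation to the next.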
1. Initially, the grey level co-occurrence matrix (GLCM) extracts twenty texture
features from each image. Each feature is assigned a number from 1 to 20,
for example f1, f2, …, f20.
2. These 20 features are passed to the genetic algorithm, which reduces them
to Y features. Before the genetic algorithm starts, the following parameters
should be set:
Population Size: 20 Maximum Chromosomal Length: Y
3. After the feature reduction phase, only the Y most promising of the 20 features
are selected. For initialization, Y features are randomly selected and assigned
to a temporary feature set. Each time the algorithm is executed, a different
feature set is selected. Features are shown in the form of a binary string such as
“10010110101010000000”, where 1 signifies that the respective feature is
selected and 0 signifies that the corresponding feature is absent.
In this example, the stopping criterion chosen is the maximum number of
iterations: when it is reached, the algorithm stops.
4. Fitness function: For the genetic algorithm to succeed, a fitness function must
be defined to determine whether a particular feature subset is promising or
not. In the proposed research work, ANFIS is used to calculate the fitness
value using Eq. (9).

$$\mathrm{Fitness}(f_i) = \sum_{i=1}^{20} \left( \frac{TrP}{TrP + FaN} + \frac{TrN}{TrN + FaP} + \frac{TrP + TrN}{TrP + FaN} \right) \qquad (9)$$

where Fitness(fi) represents the fitness value corresponding to a particular feature
subset. In Eq. (9), “True Positive (TrP) means both the training algorithm and testing
algorithm results are positive, True Negative (TrN) means both the training algorithm
and testing algorithm results are negative, False Positive (FaP) signifies that the
training algorithm result is positive and the testing algorithm result is negative, and
False Negative (FaN) signifies that the training algorithm result is negative and the
testing algorithm result is positive”. Thus, two sets are formed, A and B: A contains
the features which are selected and B contains those which are not selected.

Total features = A ∪ B
Fitness(A) = F(i) − penalty(A)

where A is the subset of selected features, and penalty(A) = w * (|A| − d), where
w is the penalty coefficient. On the basis of the fitness function, the next-generation
features are selected (as shown in Table 4). The fitness value helps in deciding
whether a selected feature subset is good or not.
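
As a small worked illustration of the two formulas above, the sketch below evaluates one summand of Eq. (9) from confusion-matrix counts and then applies the penalty. The function names are hypothetical, the third term follows the equation exactly as printed, and d is not defined in the text, so it is assumed here to be the target subset size.

```python
def eq9_term(tr_p, tr_n, fa_p, fa_n):
    """One summand of Eq. (9), computed from confusion-matrix counts."""
    first = tr_p / (tr_p + fa_n)            # TrP / (TrP + FaN)
    second = tr_n / (tr_n + fa_p)           # TrN / (TrN + FaP)
    third = (tr_p + tr_n) / (tr_p + fa_n)   # third term as printed in Eq. (9)
    return first + second + third

def penalized_fitness(raw_fitness, n_selected, d, w):
    """Fitness(A) = F - w * (|A| - d); d is assumed to be the target subset size."""
    return raw_fitness - w * (n_selected - d)

raw = eq9_term(tr_p=8, tr_n=9, fa_p=1, fa_n=2)
score = penalized_fitness(raw, n_selected=9, d=7, w=0.1)
```

With these example counts the raw value is 0.8 + 0.9 + 1.7, and selecting two features more than d = 7 costs 2w, so larger subsets need a correspondingly higher raw fitness to be preferred.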

Table 4 Best chromosome selected by Genetic Algorithm with their classification accuracy

Sr. no   Feature number selected   Classification accuracy
1        14,16,6,12,9,7,20         74.0331
2        13,19,20,4,6,20,7         74.0222
3        15,19,17,12,14,7,20       73.3425
4        20,17,9,18,3,15,19        73.2432
5        9,20,15,5,19,18,3         73.2044
6        1,2,1617,16,1,11          73.0
7        3,7,16,20,19,6,15         75
8        9,16,3,13,14,6,3          72.0994
9        9,20,15,5,19,18,3         74.7238
10       18,20,19,6,10,5,13        71.2707

5.5 Neuro Fuzzy Modeling

The neuro-fuzzy concept was developed in 1995 by J.S.R. Jang. The hybridization of
neural networks and fuzzy logic is among the most fruitful integrations of soft
computing techniques. A neuro-fuzzy system combines the benefits of both a fuzzy
system and a neural network. Fuzzy logic is capable of modeling vagueness, handling
uncertainty and supporting human-type reasoning.
The Adaptive Network based Fuzzy Inference System (ANFIS) uses a Takagi-Sugeno
fuzzy inference system and has five layers, as shown in Fig. 17. The first hidden
layer maps each input variable to its corresponding membership functions. In the
second hidden layer, a T-norm is applied to calculate the antecedent of each rule;
the final shape of the membership functions is also tuned in this layer. The third
hidden layer normalizes the rule strengths.
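
The layer structure described above can be illustrated with a forward pass through a small Takagi-Sugeno system. This is a sketch under stated assumptions rather than the chapter's model: Gaussian membership functions, the product T-norm, linear rule consequents for layers four and five, and the names `gaussian_mf` and `sugeno_forward` are all choices made for the example.

```python
import math

def gaussian_mf(x, center, sigma):
    """Membership degree of x in a Gaussian fuzzy set (assumed MF shape)."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)

def sugeno_forward(inputs, rules):
    """Forward pass of a first-order Takagi-Sugeno system.

    rules: list of (mf_params, (coeffs, bias)), where mf_params holds one
    (center, sigma) pair per input.
    """
    strengths = []
    for mf_params, _ in rules:
        w = 1.0
        for x, (center, sigma) in zip(inputs, mf_params):
            # Layer 1: membership degree; Layer 2: product T-norm.
            w *= gaussian_mf(x, center, sigma)
        strengths.append(w)
    # Layer 3: normalize the rule firing strengths.
    total = sum(strengths)
    normalized = [w / total for w in strengths]
    # Layer 4: weighted linear consequents; Layer 5: overall sum.
    output = 0.0
    for w_bar, (_, (coeffs, bias)) in zip(normalized, rules):
        output += w_bar * (sum(a * x for a, x in zip(coeffs, inputs)) + bias)
    return normalized, output

# Two symmetric rules on one input: both fire equally at x = 0.
rules = [([(-1.0, 1.0)], ([0.0], 1.0)),
         ([(1.0, 1.0)], ([0.0], 3.0))]
normalized, y = sugeno_forward([0.0], rules)
```

In the example both rules fire with the same strength at x = 0, so each normalized strength is one half and the output is the average of the two consequents. Training, which adjusts the membership and consequent parameters, is not shown.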

Fig. 17 Layered architecture of ANFIS


Fig. 18 GUI of ANFIS editor

5.6 ANFIS Editor

The ANFIS Editor GUI can be used to initialize FIS properties. To start the ANFIS
editor in MATLAB, type:
anfisedit
Figure 18 shows the GUI of the ANFIS editor, in which there are 7 inputs and 1
corresponding output. Each input has two membership functions of custom type.
The GUI has different panels for loading the data, generating the FIS, training the
FIS and testing the FIS; loading the data is the first step. Data should be in matrix
form and can be taken either from a file or from the workspace.

5.7 Training and Testing Phase

The proposed system comprises two steps: training is done in the first step and
testing in the second, as shown in Fig. 19.

[Schematic: in training, a feature extractor maps each image of the training image
data set to a feature vector [X1 X2 … X7], which is stored in the database; in testing,
the feature extractor maps the test image to [Y1, Y2 … Y7], and the best match is
searched for in the database.]

Fig. 19 Schematic diagram for MRI training and testing

In the training phase, features are extracted from the different images using GLCM,
reduced to a 7-feature subset using the genetic algorithm, and then stored in the
database along with the corresponding output. In total, 57 images are used to train
the proposed system. When a query image arrives for tumor identification, its
GLCM image features are first extracted and then sent to the recognizer of the
proposed system, which finds the best match. Once a suitable match is found, the
corresponding output is generated. The output gives the type of tumor and its
grade [58].
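
The database lookup in the testing phase can be pictured as a nearest-neighbour search over the stored 7-feature vectors. The text does not say how the "best match" is measured, so the Euclidean distance used below is an assumption, and the function name, feature values and labels are hypothetical.

```python
import math

def best_match(query, database):
    """Return the stored output whose feature vector is closest to the query.

    database: list of (feature_vector, output) pairs built during training.
    Closeness is measured by Euclidean distance (an assumed choice).
    """
    _, output = min(database, key=lambda entry: math.dist(query, entry[0]))
    return output

# Hypothetical database of 7-feature vectors with made-up output labels.
database = [
    ([0.1, 0.2, 0.1, 0.3, 0.2, 0.1, 0.2], "type A, grade I"),
    ([0.8, 0.9, 0.7, 0.8, 0.9, 0.8, 0.7], "type B, grade III"),
]
result = best_match([0.7, 0.8, 0.8, 0.9, 0.8, 0.7, 0.8], database)
```

The query vector here lies much closer to the second stored vector, so its output is returned.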

[Diagram: Image Segmentation and Classification Methods
Edge Based Methods: 1. Gradient based methods, 2. Gray Histogram method
Region Based: 1. Region growing and splitting, 2. Region merging, 3. Watershed
segmentation, 4. Level set method, 5. Active Contour
Supervised Methods: 1. KNN, 2. SVM, 3. PCA
Unsupervised Methods: 1. K means, 2. FCM, 3. ANT Tree Algorithm
Neural network Based: Feed Forward learning (1. Single layer, 2. Multi-Layer);
Feed Back Learning (1. ART models)]

Fig. 20 Classification of MRI brain image segmentation methods

5.8 Image Segmentation Methods [59–61]

There are various image segmentation methods, as shown in Fig. 20.

5.9 Fuzzy C-Means Segmentation

Fuzzy C-Means (FCM) segmentation is a well-known clustering algorithm used in
pattern recognition [62–68]. An advantage of FCM is that a data point need not
belong to only one cluster; it can belong to more than one cluster at the same time.
The basic FCM steps are shown in Fig. 21.
The FCM algorithm partitions a finite collection of n elements X = {x1, …, xn}
into a collection of c fuzzy clusters with respect to some given criterion.
Step 1: Initialization
Initialize the membership matrix, i.e. assign to each element a degree of membership
in every cluster. For example, four clusters (C1, C2, C3, C4) have been used for
detecting four types of brain tumor. The memberships of each element sum to one:

$$\sum_{j=1}^{C} \mu_j(x_i) = 1 \qquad (10)$$

where
i = 1, 2, 3, …, n
n represents the number of elements to be partitioned into clusters
j = 1, 2, 3, …, C
C represents the number of clusters into which the elements are to be partitioned
µj(xi) represents the degree to which element xi belongs to cluster Cj

[Flowchart: Start → Initialize membership matrix → Calculate centroids →
Calculate dissimilarity between the data points and centroids using Euclidean
distance → Update membership matrix → Is the previous cluster center the same as
the new cluster center? (no: go back to calculating centroids; yes: stop)]

Fig. 21 Flow chart of FCM algorithm

Step 2: Calculate centroids

$$c_j = \frac{\sum_i \left[\mu_j(x_i)\right]^m x_i}{\sum_i \left[\mu_j(x_i)\right]^m} \qquad (11)$$

where m is the fuzzification parameter, whose value generally lies between 1.25 and 2

Step 3: Calculate dissimilarity between the data points and centroid using
Euclidean distance

$$D_i = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2} \qquad (12)$$

Step 4: Update the membership matrix using the equation

$$\mu_j(x_i) = \frac{\left(1/d_{ji}\right)^{\frac{1}{m-1}}}{\sum_{k=1}^{c} \left(1/d_{ki}\right)^{\frac{1}{m-1}}} \qquad (13)$$

Step 5: Go back to Step 2 until the centroids no longer change

In Fig. 22, four clusters are represented by four colors-red, blue, purple and green
and cluster center is represented by “X”.
• Shape features can also be used to increase classification accuracy, as can extra
information from the patient, such as history and age.
• A modified Sugeno-type ANFIS can be used.

Fig. 22 Output after FCM segmentation
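
Steps 1 to 5 of the FCM algorithm can be sketched in plain Python as follows. This is an illustrative sketch, not the authors' code: the function name `fcm`, the random initialization, the tolerance and the toy 2-D points are assumptions. The membership update follows Eq. (13) as printed, with exponent 1/(m − 1) applied to the Euclidean distance; many textbook formulations use 2/(m − 1) instead, which would be a one-line change.

```python
import math
import random

def fcm(points, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Fuzzy C-Means on a list of equal-length tuples.

    Returns (centroids, u), where u[j][i] is the membership of point i
    in cluster j.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Step 1: random membership matrix, each column normalized (Eq. 10).
    u = [[rng.random() for _ in range(n)] for _ in range(c)]
    for i in range(n):
        col = sum(u[j][i] for j in range(c))
        for j in range(c):
            u[j][i] /= col
    centroids = None
    for _ in range(max_iter):
        # Step 2: centroids as membership-weighted means (Eq. 11).
        new_centroids = []
        for j in range(c):
            weights = [u[j][i] ** m for i in range(n)]
            total = sum(weights)
            new_centroids.append(tuple(
                sum(w * p[k] for w, p in zip(weights, points)) / total
                for k in range(dim)))
        # Step 5: stop once the centroids no longer move.
        converged = centroids is not None and all(
            math.dist(a, b) < tol for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if converged:
            break
        for i, p in enumerate(points):
            # Step 3: Euclidean distance to every centroid (Eq. 12).
            d = [math.dist(p, cj) for cj in centroids]
            hits = [j for j in range(c) if d[j] == 0.0]
            if hits:
                # A point sitting exactly on a centroid belongs fully to it.
                for j in range(c):
                    u[j][i] = 1.0 / len(hits) if j in hits else 0.0
                continue
            # Step 4: membership update (Eq. 13).
            inv = [(1.0 / dj) ** (1.0 / (m - 1.0)) for dj in d]
            s = sum(inv)
            for j in range(c):
                u[j][i] = inv[j] / s
    return centroids, u

# Two well-separated toy groups of 2-D points (made-up data).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, u = fcm(points, c=2)
```

For image segmentation, the points would be pixel intensities or feature vectors rather than 2-D coordinates, and each pixel would be assigned to the cluster with the highest membership.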



6 Various Challenges Faced by Deep Learning

Although deep learning offers numerous benefits and has a large number of
practical applications, attaining those benefits involves some challenges, as
discussed below:

6.1 Huge Amount of Data

The human brain requires a lot of information and experience to reach any conclusion.
In a similar way, artificial neural networks demand a huge amount of data for
training and learning. A huge dataset helps to obtain accurate and precise results,
and a deep learning classifier relies heavily on the size and quality of the dataset
available. If only limited data is available, the success of deep learning can be
directly hampered, specifically in medical domains [69]. Besides the need for a huge
dataset, another challenge lies in generating such data for medical imaging, as it
depends on the observations and interpretations provided by experts in the field.
To minimize inaccuracies and human errors, it is important to consider the opinions
of multiple experts, which becomes difficult if field experts are not available.
Moreover, for rare diseases, sufficient cases might not be available. A further issue
is class imbalance: for a rare disease, the dataset may contain very few positive
cases, in which case an imbalance arises.

6.2 Domain Specific and Multi-tasking

In deep learning, training can yield productive and precise results, but only
for a specific problem. At present, the deep learning approach is highly
domain-specific: even for a similar kind of problem or pattern, one has to re-assess
and re-train the model all over again. Although the approach is efficient for solving
a specific problem, it is too inflexible to accommodate multi-tasking. Research is
ongoing into multi-tasking without the need to revise the complete architecture;
Multi-Task Learning (MTL) and Progressive Neural Networks are being explored to
bring improvement in this respect.

6.3 Deep Learning Is Intrinsically a Black Box

Deep learning algorithms brought new hope to the field of medical imaging and
opened up new opportunities. They provided solutions to problems that were
previously considered unsolvable by conventional approaches. Still, deep learning
has its own shortcomings, one of which is the black-box problem. Although it is
clear what input has been fed to the network and how the inputs are combined, the
generation of the output is complex, and there is no clear understanding of how it
was produced. Identifying inputs, applying model parameters and building the model
are all accessible, but how the model actually works is difficult to understand. For
this reason, the approach is weak in situations where verification is the foremost
requirement, as the internal manipulations are hidden from the user.

6.4 Optimizing Hyper-parameters

Parameters whose values are set before the learning process begins are called
hyper-parameters. A small change in these values can largely affect model
performance. For real-life problems, the default parameter values rarely give
accurate results and can hamper system performance significantly. Tuning even a
small number of hyper-parameters manually, instead of optimizing them with
standard methods, can also cause performance issues.

6.5 Requires High Performance Hardware

Deep learning requires high-capacity hardware, which is costly and consumes a
great deal of power.

6.6 Less Flexibility

A deep neural network can be trained for one domain only and cannot adapt to
another. For a different problem, the network must be trained again.

7 Research Issues and Future Perspectives

Processing power, big data and deep learning algorithms based on the human brain
are the three key factors stimulating the deep learning revolution. Undoubtedly,
the benefits achieved by deep learning are remarkable, but the human effort and cost
incurred in attaining them are also high. Large companies, research laboratories and
prominent hospitals are working together towards the most favorable solutions in
medical fields. Numerous companies, such as Hitachi and Siemens, have already
stepped forward to invest heavily in the domain. For the detection of pediatric brain
disorders, GE Healthcare is developing smart imaging technology with Boston
Children's Hospital. Research labs are also spending money on delivering potent
image-based applications.

7.1 Enhancements in Deep Learning Approach

Deep learning technology relies on a supervised learning approach. However,
annotations of medical data, in particular of medical images, are often not available,
either because the disease is rare or because no field expert is available. To
overcome this issue of data unavailability, it is crucial to switch from supervised
to unsupervised or semi-supervised learning. However, if the training approach is
shifted in this way, specifically in medical fields, the accuracy and precision of the
final results might be at stake. Although efforts are being made in this direction,
no robust solution to the problem of inaccuracy has yet been found. There remains
considerable scope for improvements and modifications.

7.2 Big Image Data Exploitation

Applying deep learning methods requires a huge dataset, and obtaining such data
is itself a crucial and difficult task. Annotating real-world data is easy in
comparison with medical image data. For instance, labelling objects or distinguishing
men from women in real-world images is a trivial task, whereas interpreting medical
images requires field expertise and is a costly, time-consuming affair. In fact, the
opinions of multiple experts on the same data, not just of a single expert, are
required for accurate annotation of image data. A further issue is whether data is
available at all when diseases are rare; in such cases, it becomes even more
difficult to obtain a large dataset. A solution to this problem could be for different
healthcare service providers to share data as far as possible, which would minimize
the problem of data access.

7.3 Pervasive Inter-organization Collaboration

Even though stakeholders are making numerous predictions about the benefits and
growth of deep learning in medical imaging, the replacement of humans with
machines or tools will always remain a debatable issue. The significant improvements
that deep learning has brought to the accuracy of analysis and prediction in disease
diagnosis cannot be ignored. However, some issues persist which need the immediate
attention of researchers. Collaboration between vendors, field experts and hospitals
is essential to realize exceptional benefits for health quality. It would resolve the
problem of data availability for field experts and researchers. Another issue
concerns the advanced tools and equipment needed to handle exhaustive and
unlimited healthcare data. These would be especially helpful where sensor networks
are increasing the volume of data exponentially.

7.4 Privacy and Judicial Concerns

Both technical and sociological issues can affect data confidentiality, so it needs
to be addressed from both perspectives. As far as privacy in the medical field is
concerned, HIPAA comes to mind. HIPAA, the Health Insurance Portability and
Accountability Act of 1996, is a US law. It gives patients legal rights over their
individually identifiable information and provides standards and protocols to secure
their personal details and their use in any form. Protecting privacy is an absolute
need in the current scenario, yet securing and hiding patients' personal information
to prevent its misuse is challenging. Restrictions on data limit the availability of
content, which in turn leads to limited datasets and hence to inaccurate results.
Although compliance with HIPAA is not mandatory, secure health information
can be stored and maintained as a HIPAA covered entity. HIPAA applies
only if Protected Health Information for transactions is transmitted electronically.
Indian organizations and companies are also being assisted with HIPAA compliance
in order to stay ahead in the world of data protection. Moreover, health care data is
dynamic in nature, so existing methodologies are insufficient to tackle the problem.

8 Performance Comparison

The diagnostic accuracy of different image segmentation algorithms can be analyzed
(as shown in Fig. 23 and Table 5) in terms of the following parameters:

Sensitivity = True Positive/(True Positive + False Negative) × 100%
Specificity = True Negative/(True Negative + False Positive) × 100%
Accuracy = (True Positive + True Negative)/(True Positive
+ True Negative + False Positive + False Negative) × 100%
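
The three measures translate directly into code. The sketch below is only a worked example of the formulas above; the function name and the confusion-matrix counts are made up and do not come from the chapter's experiments.

```python
def diagnostic_metrics(tp, tn, fp, fn):
    """Return (sensitivity, specificity, accuracy) in percent."""
    sensitivity = tp / (tp + fn) * 100
    specificity = tn / (tn + fp) * 100
    accuracy = (tp + tn) / (tp + tn + fp + fn) * 100
    return sensitivity, specificity, accuracy

# Made-up counts: 90 true positives, 80 true negatives,
# 20 false positives and 10 false negatives.
sensitivity, specificity, accuracy = diagnostic_metrics(90, 80, 20, 10)
```

With these counts, 90 of 100 actual positives and 80 of 100 actual negatives are identified correctly, and 170 of the 200 cases overall.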

Fig. 23 Comparative analysis between deep learning and other segmentation methods

Table 5 Comparative analysis between deep learning and other segmentation methods (also, refer Fig. 23)

Algorithms                     Sensitivity (%)   Specificity (%)   Accuracy (%)
Fuzzy C means segmentation     96.1              93.4              86.16
ANFIS + Genetic                95.1              93.1              90.1
K-Mean + FCM                   80.1              93.32             83.4
Deep learning (CNN)            97.01             96.1              97.17

9 Conclusion

Deep learning has gained much popularity in recent years for the automation of
daily-life tasks. In the coming years, most routine jobs are expected to be performed
by automatic devices rather than manually. This chapter gives an overview of
different segmentation methods for images. Deep learning methods are more efficient
and can address the problem better than other algorithms, and they provide improved
results in comparison to conventional machine learning approaches. In this chapter
we discussed the various phases of brain tumor segmentation, each in brief. Various
deep learning algorithms have been compared with their relevant advantages and
disadvantages. The chapter also discussed the reasons behind the slow growth of deep
learning in the medical field, along with solutions proposed by different researchers.
In the last section, various open research issues and future directions were addressed.

10 Future Scope

(1) More features can be embedded to enhance classification precision.
Shape features, for example, can help raise the accuracy of classification.
Extra information from the patient, such as history and age, can also increase
classification accuracy.
(2) More efficient deep learning model.
A major problem in automatic brain tumor segmentation is the similarity between
background and tumor pixels: some background pixels are misclassified as brain
tumor pixels. In future, a more efficient deep learning model can be developed
that differentiates between tumor and background pixels more accurately.
(3) A more efficient loss function can be chosen to train the deep CNN. A more
effective loss function helps in differentiating between background and tumor
pixels with improved accuracy.
(4) Colored images may also be considered. This study targets only grey-scale
images; it could be extended to handle colored images.

References

1. Zikic, D., Ioannou, Y., Brown, M., Criminisi, A.: Segmentation of brain tumor tissues with
convolutional neural networks. In: Proceedings of MICCAI workshop on Multimodal Brain
Tumor Segmentation Challenge (BRATS), pp. 36–39 (2014)
2. Pereira, S., Pinto, A., Alves, V., Silva, C.A.: Brain tumor segmentation using convolutional
neural networks in MRI images. IEEE Trans. Med. Imaging 35(5), 1240–1251 (2016)
3. Central Brain Tumor Registry of the United States (CBTRUS): Fact sheet (2011).
http://www.cbtrus.org/factsheet.html
4. Christ, J.M., Parvathi, R.M.S.: Brain tumors: an engineering perspective. IJCSI 9(4), 392–396
(2012)
5. Schmidt, F.E.W.: Development of a time-resolved optical tomography system for neonatal
brain imaging. Ph.D. thesis, Chapter-2, pp. 25–34 (1999)
6. Thurnher, M.M., Thurnher, S.A., Fleischmann, D., Steuer, A., Rieger, A., Helbich, T., Trattnig,
S., Schindler, E., Hittmair, K.: Comparison of T2-weighted and fluid-attenuated inversion-
recovery. Am. Soc. Neuroradiol. 1601–1609 (1997)
7. Doolittle, N.D.: State of the science in brain tumor classification. Semin. Oncol. Nurs. 20,
224–230 (2004)
8. Wen, P.Y., Teoh, S.K., Black, P.M.: Brain tumors: an encyclopedic approach. Cancer Neurol.
Clin. Pract. 217–248 (2001)
9. Chandrasoma, P.C.P.: Stereotactic brain biopsy. W. J. Med. 1–5 (1991)
10. Kong, N.S.P., Ibrahim, H., Hoo, S.C.: A literature review on histogram equalization and its
variations for digital image enhancement. Int. J. Softw. Eng. Res. Pract. 1(2), 386–389 (2013)
11. Singaravel, S., Suykens, J., Geyer, P.: Deep-learning neural-network architectures and methods:
Using component-based models in building-design energy prediction. Adv. Eng. Inform. 38,
81–90 (2018)
12. Du, X., Cai, Y., Wang, S., Zhang, L.: Overview of deep learning. In: 31st Youth Academic
Annual Conference of Chinese Association of Automation Wuham, China, 11–13 Nov 2016,
pp. 159–164
13. Ishak, N.F., Logeswaran, R., Tan, W.H.: Artifact and noise stripping on low-field brain mri.
Int. J. Biol. Biomed. Eng. 2(2), 59–68
14. Nobi, M.N., Yousuf, M.A.: A new method to remove noise in magnetic resonance and ultra-
sound images. J. Sci. Res. 3(1), 81–89 (2011)
15. Devasena, C.L., Hemalatha, M.: Noise removal in magnetic resonance images using hybrid
KSL filtering technique. Int. J. Comput. Appl. 27(8), 1–4 (2011)
16. Kumar, S., Kumar, P., Gupta, M., Nagawat, A.K.: Performance comparison of median and
wiener filter in image de-noising. Int. J. Comput. Appl. 12(4), 27–31 (2010)
17. Bhatia, A., Kulkarni, R.K.: High density salt and pepper noise removal through improved
adaptive median filter. Int. Conf. Comput. Sci. Inform. Technol. (CSIT-2012). 197–200 (2012)
18. Bagade, S.S., Shandilya, V.K.: Use of histogram equalization in image processing for image
enhancement. Int. J. Softw. Eng. Res. Pract. 6–10 (2011)
19. Chen, S.D.: Contrast enhancement using brightness preserving bi-histogram equalization. IEEE
Trans. Consum. Electron. 1, 1–8 (1997)
20. Wang, C., Zhongfu, Y.: Brightness preserving histogram equalization with maximum entropy:
a variational perspective. IEEE Trans. Consum. Electron. 51(4), 1326–1334 (2005)
21. Ning, C.Y., Liu S.F., Qu, M.: Research on removing noise in medical image based on median
filter method. IEEE Explore. 384–388 (2009)
22. Sawant, H.K., Deore, M.: A comprehensive review of image enhancement techniques. Int. J.
Comput. Technol. Electron. Eng. 1(2), 34–38 (2012)
23. Gonzalez, R.C., Woods, R.E.: Digital image processing, 2nd edn. Prentice Hall (2002)
24. Chen, S.D., Ramli, R.: Contrast enhancement using recursive mean-separate histogram equal-
ization for scalable brightness preservation. IEEE Xplore, 1301–1309 (2001)
25. Dykstra, C., Das, M.: The use of image morphing to improve the detection of tumors in emission
imaging. Nucl. Sci. Symp. 3, 1781–1785 (1998)
26. Marr, D., Hildreth, E.: Theory of edge detection. Proc. Roy. Soc. Lond. B. 187–217 (1980)
27. Canny, J.: A computational approach to edge detection. IEEE Trans. Pattern Anal. Mach. Intell.
6, 679–698 (1986)
28. Schunck, B.G.: Edge detection with Gaussian filters at multiple scales. IEEE Comput. Soc.
Work. Comp. Vis.208–210 (1987)
29. Bergholm, F.: Edge focusing. IEEE Trans. Pattern Anal. Mach. Intell. PAMI-09, 726–741
(1987)
30. Lacroix, V.: The primary raster: A Multiresolution Image Description. In: 10th International
Conference on Pattern Recognition, pp. 903–907 (1990)
31. Williams, D.J., Shah, M.: Edge contours using multiple scales. Comput. Vis. Graph Image
Process. 51, 256–274 (1990)
32. Goshtasby, A., Marr, D.: On edge focusing. Image visualization. Computer. 12, 247–256
33. Deng, G., Cahill, L.W.: An adaptive Gaussian filter for noise reduction and edge detection. In:
Proceedings IEEE Nuclear Science Symposium, pp. 1615–1619 (1994)
34. Bennamoun, M., Boashash, B., Koo, J.: Optimal parameters for edge detection. Proc. IEEE
Int. Conf. SMC. 2, 1482–1488 (1995)
35. Heric, D., Zazula, D.: Combined edge detection using wavelet transform and signal registration.
Elsevier J. Image Vis. Comput. 25, 652–662 (2007)
36. Shih, M.Y., Tseng, D.C.: A wavelet based multi resolution edge detection and tracking. Elsevier
J. Image Vis. Comput. 23, 441–451 (2005)
37. Bezdek, J.C., Chandrasekhar, R., Attikiouzel, Y.: A geometric approach to edge detection.
IEEE Trans. Fuzzy Syst. 6(1), 52–75 (1998)
38. Wu, J., Yin, Z., Xiong, Y.: The fast multilevel fuzzy edge detection of blurry images. IEEE
Signal Process. Lett. 14(5), 344–347 (2007)
39. Lu, S., Wang, Z., Shen, J.: Neuro-fuzzy synergism to the intelligent system for edge detection
and enhancement. Elsevier J. Pattern Recogn. 36, 2395–2409 (2003)
40. Shrivakshan, G.T., Chandrasekar, C., Bhandarkar, S.M.: An edge detection technique using
genetic algorithm-based optimization. Pattern Recogn. 27(9), 1159–1180 (1994)
41. Zhang, Y., Potter, W.D.: Comparison of various edge detection techniques used in image pro-
cessing. IJCSI Int. J. Comput. Sci. Issues 9(5), 269–276 (2012)
42. Becerikli, Y., Karan, T.M., Cabestany, J., Prieto, A., Sandoval, D.F.: A new fuzzy approach for
edge detection. IWANN 2005, 943–951 (2005)
43. Anver, M.M., Stonie, R.J.: Evolutionary learning of a fuzzy edge detection algorithm based on
multiple masks. Springer, vol. 12, pp. 1–13 (2005)
44. Suliman, C., Boldişor, C., Băzăvan, R., Moldoveanu, F.: A fuzzy logic based method for edge
detection. Eng. Sci. 4, 159–164 (2011)
45. Sharifi, M., Fathy, M., Mahmoudi, M.T.: A classified and comparative study of edge detec-
tion algorithms. In: Proceedings of the International Conference on Information Technology:
Coding and Computing (ITCC.02) IEEE, pp 1–4 (2002)
46. Yu-Qian, Z., Wei-Hua, G., Zhen-Cheng, C., Jing-Tian, T., Ling-Yun, L.: Medical images edge
detection based on mathematical morphology. In: Proceedings of the 2005 IEEE Engineering
in Medicine and Biology 27th Annual Conference Shanghai, China, pp. 6492–6495 (2005)
47. Saxena, S., Kumar, S., Sharma, V.K.: Comparative analysis of various edge detection tech-
niques. Int. J. Adv. Res. Comput. Sci. Softw. Eng. 3(6), 758–761 (2013)
48. Haralick, R.M., Shanmugam, K., Dinstein, I.H.: Textural Features for Image Classification.
IEEE Trans. Syst. Man Cybern. 610–621 (1973)
49. Prasetiyo, Khalid, M., Yusof, R., Meriaudean, F.: A comparative study of feature extraction
methods for wood texture classification. SITIS, IEEE Conf.. 23–29 (2010)
50. Nithya, R., Santhi, B.: Comparative study on feature extraction method for breast cancer clas-
sification. J. Theor. Appl. Inf. Technol. 33(2), 220–226 (2011)
51. Chadha, A., Mallik, S., Johar, R.: Comparative study and optimization of feature-extraction
techniques for content based image retrieval. Int. J. Comput. Appl. 52(20), 35–42 (2012)
52. Ramamurthy, B., Chandran, K.R., Aishwarya, S., Janaranjani, P.: CBMIR: content based image
retrieval using invariant moments, GLCM and grayscale resolution for medical images. Eur. J.
Sci. Res. 460–471 (2010)
53. Hamza, R.M., Al-Assadi, T.A.: Genetic algorithm to find optimal GLCM features. Inf. Technol.
Univ. Babylon Iraq. pp. 1–16 (2012)
54. Jolliffe, I.T., Potter, W.D.: Principal Component Analysis, 2nd edn, pp. 1–5. Springer, New
York (2002)
55. Scholkopf, B., Smola, A., Muller, K.R.: Kernel Principal Component Analysis, pp. 327–352.
IT Press, Cambridge, MA (1999)
56. Shapiro, V.A., Veleva, P.K., Sgurev, V.S.: An adaptive method for image thresholding. In: 11th
IAPR International Conference on Image, Speech and Signal Analysis, pp. 696–699 (1992)
57. Sezgin, Mehmet, Sankur, Bulent: Survey over image thresholding techniques and quantitative
performance evaluation. J. Electron. Imaging 13, 146–165 (2004)
58. Elaiza, N., Khalid, A., Ibrahim, S., Manaf, M.: Comparative study of adaptive network-based
fuzzy inference system (ANFIS), k-nearest neighbors (k-NN) and fuzzy c-means (FCM) for
brain abnormalities segmentation. Int. J. Comput. 5(4), 513–524 (2011)
59. Zhang, J., Morgan, N.: Stochastic model based image segmentation using Markov random
fields and multi-layerperceptrons. IEEE Signal Process. 1–8 (1990)
60. Azmi, R., Norozi, N.: A new markov random field segmentation method for breast lesion
segmentation in MR images. J. Med. Signals Sens. 1(3), 156–164 (2011)
61. Prastawa, M., Bullitt, E., Gerig, G.: A brain tumor segmentation framework based on outlier
detection. Med. Image Anal. 18, 217–231 (2004)
62. Dipali, B.B., Patil, S.N.: Brain tumor mri image segmentation using FCM and SVM techniques.
Int. J. Eng. Sci. Comput. 3939–3942 (2016)
63. Kannan, S.R., Ramathilagam, S., Devia, R., Hines, E.: Strong fuzzy C-means in medical image
data analysis. J. Syst. Softw. 2425–2438 (2012)
64. Zhang, J.G., Ma, K.K., Chong, V.: Tumor segmentation from magnetic resonance imaging by
learning via one-class support vector machine. IWAIT. 207–21 (2004)
65. Garcia, C., Moreno, J.: Kernel based method for segmentation and modeling of magnetic
resonance images. LNCS. 636–645 (2004)
66. Lee, C.H., Schmidt, M., Murtha, A., Bistritz, A., Sander, J., Greiner, R.: Segmenting brain
tumors with conditional random fields and support vector machines. LNCS 3765, 469–478
(2005)
67. Gibbs, P., Buckley, D.L., Blackband, S.J., Horsman, A.: Tumor volume determination from
MR images by morphological segmentation. Phys. Med. Biol. 2437–2446 (1996)
68. Letteboer, M., Olsen, O., Dam, E., Willems, P., Viergever, M., Niessen, W.: Segmentation of
tumors in magnetic resonance brain images using an interactive multiscale watershed algorithm.
Acad. Radiol. 11, 1125–1138 (2011)
69. Havaei, M., Davy, A., Warde-Farley, D., Biard, A., Courville, A., Bengio, Y., Pal, C., Jodoin,
P.-M., Larochelle, H.: Brain tumor segmentation with deep neural networks. Med. Image Anal.
35(2017), 18–31 (2017)
70. Web Source: https://www.oreilly.com/library/view/deep-learning/9781491924570/ch04.html
71. Magudeeswaran, V., Ravichandran, C.G.: Fuzzy logic-based histogram equalization for image
contrast enhancement. Math. Eng. 1–10 (2013)
72. Vorontsov, A.O., Averkin, A.N.: Comparison of different convolution neural network architec-
tures for the solution of the problem of emotion recognition by facial expression. In: Proceedings
of the VIII International Conference “Distributed Computing and Grid-technologies in Science
and Education” (GRID 2018), Dubna, Moscow region, Russia, Sep 10–14 2018, pp. 35–40
73. Agarwal, V.: Analysis of histogram equalization in image preprocessing. BIOINFO Hum.
Comput. Interact. 1(1), 04–07
74. Yang, Y., Huang, S.: Novel statistical approach for segmentation of brain magnetic resonance
imaging using an improved expectation maximization algorithm. Optica Appl. 125–36 (2006)
75. Vinitski, S., Iwanaga, T., Gonzalez, C.F., Andrews, D., Knobler, R., Curtis, M.: Fast tissue
segmentation based on a 4D feature map. In: 9th International Conference (ICIAP 97), vol. 2,
pp. 445–452 (1997)
76. Revathy, M., Hemalataha, M.: Efficient method for feature extraction on video processing. In:
CCSEIT 2012 ACM International Conference, pp. 539–543 (2012)

Minakshi Sharma received the Ph.D. degree in Computer Science from Banasthali University,
Rajasthan, India, in 2015. In 2017, she joined NIT Kurukshetra as an Assistant Professor in the
Department of Computer Engineering. She has more than 10 papers to her credit in national and
international conferences and journals. Her research interests include deep learning, artificial
intelligence, neural networks and fuzzy logic based systems.

Neha Miglani received her Master's degree in Computer Science from Kurukshetra University,
India, in 2012. Currently, she is working as an Assistant Professor at the National Institute
of Technology, Kurukshetra, India. Her research interests include cloud computing, neural
networks and software reliability, ranging from cost models and software reliability growth
models to reliability metrics.
