Computational Intelligence in Medical Informatics
Naresh Babu Muppalaneni
Vinit Kumar Gunjan
Editors
Series editors
Amit Kumar, Hyderabad, India
Allam Appa Rao, Hyderabad, India
Editors
Naresh Babu Muppalaneni
C.R. Rao Advanced Institute
of Mathematics, Statistics
and Computer Science
Hyderabad
India
ISSN 2191-530X
ISBN 978-981-287-259-3
DOI 10.1007/978-981-287-260-9
Foreword
This book is an enthusiastic contribution of the best research work in the field of
bioinformatics, biotechnology, and allied domains to the International Conference
on Computational Intelligence: Health and Disease (CIHD 2014) to be held at
Visakhapatnam, India during December 27–28, 2014. The main objective of this
conference is to create an environment for (1) cross-disseminating state-of-the-art
knowledge to CI researchers, doctors, and computational biologists; (2) creating a
common substrate of knowledge that CI researchers, doctors, and computational
biologists can all understand; (3) stimulating the development of specialized CI
techniques, keeping in mind the application to computational biology; and (4) fostering
new collaborations among scientists having similar or complementary backgrounds.
Yet another element is provided by many interesting historical data on diabetes
and an abundance of colorful illustrations. On top of that, there are innumerable
historical vignettes that interweave computer science and biology in a very
appealing way.
Although the emphasis of this work is on diabetes and other diseases, it contains
much that will be of interest to those outside this field and to students of
Biotechnology, Bioinformatics, Chemistry, and Computer Science, indeed to
anyone with a fascination for the world of molecules. The authors have selected a
good number of prominent molecules as the key subjects of their essays. Although
these represent only a small sample of the world of biologically related molecules
and their impact on our health, they amply illustrate the importance of this field of
science to humankind and the way in which the field has evolved.
I think that the contributors can be confident that there will be many grateful
readers who will have gained a broader perspective on diabetes
and its remedies as a result of their efforts.
Hyderabad, India
Preface
This volume contains a selection of the best contributions delivered at the International Conference on Computational Intelligence: Health and Disease (CIHD
2014) held at Visakhapatnam, India during December 27–28, 2014. This conference was organized by the Institute of Bioinformatics and Computational Biology
(IBCB), Visakhapatnam, India jointly with Andhra University and JNTU Kakinada.
The IBCB is a research organization established in Visakhapatnam, India. It is a
community of scholars devoted to the understanding of mysteries that remain in the
catalogue of human genes through intellectual inquiry. The Institute encourages and
supports curiosity-driven research in the fields of Bioinformatics and Computational Biology. The institute nurtures speculative thinking that produces advances in
knowledge that change the way we understand the world. It provides for the
mentoring of scholars, and ensures the freedom to undertake research that will make
significant contributions in any of the broad range of fields in Bioinformatics and
Computational Biology.
CIHD 2014 aimed to bring together computer professionals, doctors, academicians, and researchers to share their experience and expertise in Computational
Intelligence. The goal of the conference is to provide computer science professionals,
engineers, medical doctors, bioinformatics researchers, and other interdisciplinary
researchers a common platform to explore research opportunities.
A rigorous peer-review selection process was applied to ultimately select the
papers included in the program of the conference. This volume collects the best
contributions presented at the conference.
The success of this conference is to be credited to the contributions of many people.
First, we would like to thank Prof. Allam Appa Rao, Director, C.R. Rao
AIMSCS, who motivated and guided us in making this conference a grand success.
Our sincere thanks to Dr. Amit Kumar, Editor for Springer Briefs in Applied Sciences
and Technology, who helped us in bringing out this series. Moreover, special thanks are
due to the Program Committee members and reviewers for their commitment to the
task of providing high-quality reviews. We thank Prof. B.M. Hegde (Padma Bhushan
Awardee, Cardiologist and Former Vice Chancellor, Manipal University) who
delivered the keynote address. Last but not least, we thank the speakers Grady
Hanrahan (California Lutheran University, USA), Jayaram B. (Coordinator, Supercomputing Facility for Bioinformatics and Computational Biology, IIT Delhi),
Jeyakanthan J. (Professor and Head, Structural Biology and Biocomputing Lab,
Alagappa University), Nita Parekh (International Institute of Information Technology
Hyderabad (IIIT-H), Hyderabad, India), Pinnamaneni Bhanu Prasad (Advisor,
Kelenn Technology, France), Rajasekaran E. (Dhanalakshmi-Srinivasan Institute of
Technology, Tiruchirappalli), and Sridhar G.R. (Endocrine and Diabetes Centre,
Krishnanagar Visakhapatnam, India).
December 2014
Committees
General Chair
Dr. Allam Appa Rao, SDPS Fellow, Director, C.R. Rao AIMSCS, UoH,
Hyderabad, India
Conference Secretary
Dr. P. Sateesh, Associate Professor, MVGR College of Engineering
Organizing Committee
Dr. Ch. Divakar, Secretary, IBCB
Prof. P.V. Nageswara Rao, Head of the Department, Department of CSE, GITAM
University
Prof. P.V. Lakshmi, Head of the Department, Department of IT, GITAM
University
Prof. P. Krishna Subba Rao, Professor, Department of CSE, GVP College of
Engineering (Autonomous)
Dr. G. Satyavani, Assistant Professor, IIIT Allahabad
Dr. Akula Siva Prasad, Lecturer, Dr VS Krishna College
Shri. Kunjam Nageswara Rao, Assistant Professor, AU College of Engineering
Shri. D. Dharmayya, Associate Professor, Vignan Institute of Information
Technology
Shri. T.M.N. Vamsi, Associate Professor, GVP PG College
Advisory Committee
Prof. P.S. Avadhani, Professor, Department of CS and SE, AU College of
Engineering
Prof. P. Srinivasa Rao, Head of the Department, Department of CS and SE, AU
College of Engineering
Dr. Raghu Korrapati, Professor, Walden University, USA
Prof. Ch. Satyanarayana, Professor, Department of CSE, JNTU Kakinada
Prof. C.P.V.N.J. Mohan Rao, Professor and Principal, Avanthi Institute of
Engineering and Technology
Dr. Anirban Banerjee, Assistant Professor, IISER Kolkata
Dr. Raghunath Reddy Burri, Scientist, GVK Bio Hyderabad
Dr. L. Sumalatha, Professor and Head, Department of CSE, JNTU Kakinada
Dr. D. Suryanarayana, Principal, Vishnu Institute Technology, Bhimavaram
Dr. A. Yesu Babu, Professor and Head, Department of CSE, Sir C.R. Reddy
College of Engineering, Eluru
Dr. T.K. Rama Krishna, Principal, Sri Sai Aditya Institute of Science and
Technology
Finance Committee
Shri. B. Poorna Satyanarayana, Professor, Department of CSE, Chaitanya
Engineering College
Dr. T. Uma Devi, GITAM University
Dr. R. Bramarambha, Associate Professor, Department of IT, GITAM University
Smt. P. Lakshmi Jagadamba, Associate Professor, GVP
Smt. Amita Kasyap, Women Scientist, C.R. Rao AIMSCS
Publication Committee
Dr. Amit Kumar, Publication Chair, Director, BDRC
Dr. Kudipudi Srinivas, Co-chair, Professor, V.R. Siddhartha Engineering College
Dr. G. Lavanya Devi, Assistant Professor, Department of CS and SE, AU College
of Engineering
Dr. P. Sateesh, Associate Professor, MVGR College of Engineering
Dr. A. Chandra Sekhar, Principal, Sankethika Institute of Technology
Dr. K. Karthika Pavani, Professor, RVR and JC College of Engineering
Website Committee
Dr. N.G.K. Murthy, Professor of CSE, GVIT Bhimavaram
Dr. Suresh Babu Mudunuri, Professor of CSE, GVIT Bhimavaram
Shri. Y. Ramesh Kumar, Head of the Department, CSE, Avanthi Institute of
Engineering and Technology
Financing Institutions
Department of Science and Technology, Government of India
Abstract Sulfur oxidation is one of the oldest known redox processes in our
environment mediated by phylogenetically diverse sets of microorganisms. The
sulfur oxidation process is mediated mainly by dsr operon which is basically
involved in the balancing and utilization of environmental sulfur compounds.
DsrMKJOP complex from the dsr operon is the central player of this operon. DsrO
is a periplasmic protein which binds FeS clusters responsible for electron transfer to
DsrP protein from the dsr operon. DsrP protein is known to be involved in electron
transfer to DsrM protein. DsrM protein would then donate the electrons to DsrK
protein, the catalytic subunit of this complex. In the present work, we tried to
analyze the role of DsrO protein of the dsr operon from the ecologically and
industrially important organism Allochromatium vinosum. There are no previous
reports that deal with the structural details of the DsrO protein. We predicted the
structure of the DsrO protein obtained by homology modeling. The structure of the
modeled protein was then docked with various sulfur anion ligands to understand
the molecular mechanism of the transportation process of sulfur anion ligands by
this DsrMKJOP complex. This study may therefore be considered the first report of
its kind, and it paves the way for analysis of the biochemical
mechanism of the sulfur oxidation reaction cycle mediated by the dsr operon.
1 Introduction
The sulfur oxidation reaction cycle is one of the important biogeochemical cycles in the
world. Sulfur has a wide range of oxidation states, viz., +6 to −2. This makes the
element capable of taking part in a number of different biological processes. Sulfur-based chemo- or photolithotrophy is one such process, involving the transfer of
electrons from reduced sulfur compounds like sulfite, thiosulfate, elemental sulfur,
etc. The sulfur oxidation process is mediated by a diverse set of microorganisms.
Little is known about the molecular mechanisms of this sulfur oxidation
process in these microorganisms. One of the sulfur oxidizers is Allochromatium
vinosum (A. vinosum), a dominant member of purple sulfur bacteria. This bacterium
uses reduced sulfur compounds as electron donor for anoxygenic photosynthesis
[1]. Recent studies with A. vinosum revealed that a multiple gene cluster comprising
genes dsrA, dsrB, dsrE, dsrF, dsrH, dsrC, dsrM, dsrK, dsrL, dsrJ, dsrO, dsrP,
dsrN, dsrS, and dsrR is involved in the sulfur oxidation process [2]. The organism
A. vinosum has a wide range of applications in different industrial processes like
waste remediation and removal of toxic compounds, e.g., odorous sulfur
compounds like sulfide and explosives, and production of industrially relevant
organo-chemicals such as vitamins, bio-polyesters, and biohydrogen [1]. It is well
known that A. vinosum uses the DsrMKJOP protein complex to carry out the sulfur
oxidation process [3]. DsrJ, a periplasmic protein, may be involved in the oxidation
of a putative sulfur substrate in the periplasm and the released electrons would be
transported across the membrane via the other components, viz., DsrO, DsrP,
DsrM, DsrK successively, of the DsrMKJOP complex [3]. DsrO is a periplasmic
iron-sulfur protein and is known to be involved in the electron transfer process [2–4]. It
is not yet known which amino acids of DsrO are involved in the electron transport
or interact with sulfur anions. So, in this work, we have attempted to characterize
DsrO protein at the structural level. We have predicted the putative active site
geometry of the DsrO protein. In order to predict the molecular mechanism of the
electron transport through DsrO protein we have docked the different sulfur anions
present in the environment with DsrO protein. To date, there are no reports that deal
with the analysis of detailed structural information of DsrO or with the binding
of sulfur anions to this protein to predict the mechanism of electron transport.
This work is therefore the first of its kind. Since there are no previous reports
regarding the molecular and structural details of the DsrO protein, our work should
help in analyzing the biochemical mechanism of the sulfur oxidation
process in this ecologically and industrially important microbial species.
PLMCQHCEHPPCVDVCPTGASFKRADGIVMVDRHLCIGCRYCMMACPYKARSFIHQPTTG 187
Fig. 1 Sequence alignment of the DsrO functional domain with PsrABC (PDB code: 2VPW) as
template
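Since DsrO is described as an FeS cluster binding protein, the DsrO fragment in the alignment row above can be scanned for cysteine-rich stretches. The following is only a minimal sketch: the relaxed pattern `C.{2}C.{2,4}C.{3}CP` and the helper name `find_motifs` are assumptions made for illustration, not the criteria used by the authors.

```python
import re

# DsrO fragment copied from the alignment row printed with Fig. 1.
DSRO_FRAGMENT = "PLMCQHCEHPPCVDVCPTGASFKRADGIVMVDRHLCIGCRYCMMACPYKARSFIHQPTTG"

# Hypothetical relaxed pattern (an assumption, not the authors' rule):
# four cysteines followed by a proline, separated by 2, 2-4, and 3 residues.
FES_MOTIF = re.compile(r"C.{2}C.{2,4}C.{3}CP")


def find_motifs(sequence):
    """Return (0-based start, matched text) for each non-overlapping hit."""
    return [(m.start(), m.group()) for m in FES_MOTIF.finditer(sequence)]


if __name__ == "__main__":
    print("cysteine count:", DSRO_FRAGMENT.count("C"))
    for start, text in find_motifs(DSRO_FRAGMENT):
        print(start, text)
```

The start positions are offsets within the fragment only; mapping them to DsrO residue numbers would need the full alignment, which is not reproduced here.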
Fig. 2 Superimposition of the α-carbon backbone of DsrO on 2VPW (B). DsrO is presented in red and 2VPW in green
properties of the modeled protein, neither considerable bad contacts nor Cα tetrahedron distortion nor hydrogen bond energy problems have been found. Moreover,
the average G factor, a measure of how normal the stereochemical properties of the protein are,
has been found to be 0.10, which is inside the permitted values for homology
models. Furthermore, no distortions of the side chain torsion angles are found. The
Ramachandran plot [13] has been drawn. No residues are found to be present in the
disallowed regions of the Ramachandran plot. The residue profile of the model has
been checked by VERIFY3D and it indicates a good model quality.
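A Ramachandran plot is built from the backbone dihedral angles phi and psi of each residue. As a rough illustration of how one such angle can be computed from four atomic positions, here is a minimal sketch; the function name `dihedral` and the toy coordinates in the usage note are hypothetical, and this is not the validation software used in the study.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))
```

For a residue i, phi would be obtained from the atoms C(i-1), N(i), Cα(i), C(i) and psi from N(i), Cα(i), C(i), N(i+1). With toy points, `dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))` describes a quarter turn about the z axis.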
and thiosulfate (S₂O₃²⁻) using the program GOLD [14]. The docked complexes that
have yielded the best GoldScore and ChemScore are selected and analyzed to study
the interactions.
Fig. 3 Three-dimensional model of DsrO protein. The helices are shown in red. The strands are presented in yellow. The remaining coil regions are shown in green
Complex            GoldScore fitness   ChemScore DG   ChemScore H-bond weighted
DsrO-thiosulfate   36.9534             16.0675        10.5875
DsrO-sulfate       30.7964             13.9717        8.4917
DsrO-sulfite       26.9615             17.7079        12.2279
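The selection of the best-scoring complexes can be illustrated with the GoldScore fitness values listed above. This is only a sketch: it assumes the three values appear in the same order as the complexes (thiosulfate, sulfate, sulfite), and the helper name `rank_by_fitness` is hypothetical. Higher GoldScore fitness is treated as better.

```python
# GoldScore fitness values transcribed from the docking table.
# Assumption: values are listed in the same order as the complexes.
GOLDSCORE_FITNESS = {
    "DsrO-thiosulfate": 36.9534,
    "DsrO-sulfate": 30.7964,
    "DsrO-sulfite": 26.9615,
}


def rank_by_fitness(scores):
    """Return (complex, score) pairs sorted from highest to lowest score."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, score in rank_by_fitness(GOLDSCORE_FITNESS):
        print(f"{name:18s} {score:.4f}")
```

Only the GoldScore column is ranked here because the sign convention of the ChemScore columns is not recoverable from the table as printed.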
Entropy
DsrO-thiosulfate
DsrO-sulfite
DsrO-sulfate
0.0975
0.55491
0.31514
3717.26643
3671.98443
3655.62403
17.03740
17.01440
16.59680
Fig. 4 Interactions of the DsrO protein with thiosulfate. Cys142, Gly164, Cys165, Tyr167,
Cys168 amino acids from DsrO protein are involved in binding with thiosulfate
Fig. 5 DsrO-sulfite complex; the amino acids Gly164, Cys165, Arg166 are involved in binding
anion ligands used has the maximum chance of interactions with the protein. On the
other hand in sulfate, the sulfur atom has the highest oxidation state of +6. Hence,
sulfate has no ability to be oxidized and is not used in the oxidation process.
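The oxidation-state argument can be checked with simple bookkeeping. The sketch below assumes every oxygen counts as −2 and reports the mean formal oxidation state of sulfur in each anion; the function name and the anion table are illustrative only. For thiosulfate the result is an average over its two sulfur atoms.

```python
from fractions import Fraction


def mean_sulfur_oxidation_state(n_sulfur, n_oxygen, charge):
    """Mean formal oxidation state of S, assuming each oxygen is -2."""
    return Fraction(charge - n_oxygen * (-2), n_sulfur)


# (number of S atoms, number of O atoms, net charge)
ANIONS = {
    "sulfate": (1, 4, -2),
    "sulfite": (1, 3, -2),
    "thiosulfate": (2, 3, -2),
    "sulfide": (1, 0, -2),
}

if __name__ == "__main__":
    for name, composition in ANIONS.items():
        print(name, mean_sulfur_oxidation_state(*composition))
```

Sulfate comes out at +6, the top of the range quoted in the Introduction, which is why it cannot be oxidized further.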
Fig. 6 DsrO-sulfate complex; the amino acid residues Cys162 and Gly164 are involved in
binding
4 Conclusions
In this study we elucidate the structural basis of the involvement of DsrO protein
from A. vinosum in electron transport during the oxidation of sulfur compounds. We
built the three-dimensional structure of the DsrO protein using a comparative modeling
technique. Docking of sulfur anions with DsrO allowed us to identify the details
of their mode of interaction. We identified the amino acid residues of the DsrO
protein that are involved in binding sulfur anions. Results
from this study will be important for understanding the pathway of electron transport
via this protein in the global sulfur cycle. This homology model of DsrO provides a
rational framework for designing experiments aimed at determining the contribution
of amino acid residues responsible for electron transport via dsr operon.
Acknowledgments Ms. Semanti Ghosh is thankful to the University of Kalyani, Govt. of West
Bengal, India, and UGC for the financial support. We would like to thank the Bioinformatics Infrastructure Facility and the DST-PURSE program 2012–2015 ongoing in the Department of Biochemistry and Biophysics, University of Kalyani for the support.
References
1. Weissgerber T, Zigann R, Bruce D, Chang Y, Detter JC, Han C, Hauser L, Jeffries CD, Land
M, Munk AC, Tapia R, Dahl C (2011) Complete genome sequence of Allochromatium
vinosum DSM 180T. Stand Genomic Sci 5:311–330
2. Grein F (2010) Biochemical, biophysical and functional analysis of the DsrMKJOP
transmembrane complex from Allochromatium vinosum. PhD thesis, Rhenish Friedrich
Wilhelm University, Germany
S. Bhattacharya et al.
Abbreviations
DG        Dystroglycan
MDDGC9    Muscular Dystrophy, Dystroglycanopathy, Type C9
OMIM      Online Mendelian Inheritance in Man
PDB       Protein Data Bank
RMSD      Root Mean Square Deviation
WT        Wildtype
MT        Mutated
Highlights
1. T192M mutation causes changes in alpha-dystroglycan structure.
2. Interaction between Lymphocytic Choriomeningitis Virus (LCMV) with its
receptor alpha-dystroglycan changes due to the mutation.
3. Molecular dynamics run with temperature variation shows interaction of
LCMV with mutated receptor is stronger than the wild type receptor.
1 Introduction
Dystroglycan (DG) is a cell surface receptor belonging to the dystrophin-dystroglycan complex, which links extracellular matrices with the cellular actin cytoskeleton [1].
The cellular actin cytoskeleton plays a crucial role in the development of tissues and organs, not
only by providing mechanical strength to cells but also by taking an active part in
regulating macromolecular events [2] required for proper development [3]. Mature
DG protein is composed of two subunits: alpha-dystroglycan (α-DG), the extracellular
part of DG that interacts with and receives signals from extracellular proteins like laminin,
perlecan, agrin, etc., and beta-dystroglycan (β-DG), which remains bound to the cell
membrane [1]. β-DG is linked with cellular dystrophin, which is an active actin-binding
protein [4]. α-DG contains a mucin-like region sandwiched
between its N-terminal and C-terminal globular parts. This mucin-rich region is
essential for the proper glycosylation and functioning of α-DG [1]. Reportedly, a point
mutation T192M in α-DG causes limb girdle muscular dystrophy, MDDGC9 (OMIM:
613818), affecting brain and nervous system growth and resulting in severe cognitive
impairments, mental retardation, and delayed motor development [5, 6]. This mutation
hampers glycosylation mediated by LARGE (a glycosyltransferase), leading to
disease onset. This signifies the role of α-DG in development and muscle strength.
Interestingly, α-DG also serves as a receptor for a large class of Arenaviridae (AV), the
Old World viruses (OWV) [7]. OWVs invade the host system through their preglycoprotein
complex, which has three major components: an N-terminal stable signal peptide (SSP),
followed by the glycoprotein G1 (GP1, ~42 kDa) and glycoprotein G2 (GP2, ~35 kDa)
domains [7]. These viruses interact with the first few N-terminal amino acids and the mucin-rich
region of α-DG via their GP1 for attachment and fusion to the host system. Some
of these AVs are also causative agents of diseases like Lassa fever, Bolivian hemorrhagic fever, lymphocytic choriomeningitis (LCM), etc., causing significant mortality
in affected human populations [8]. The prototype OWV, Lymphocytic Choriomeningitis Virus (LCMV), causes LCM, a chronic, asymptomatic, lifelong infection
of the meninges, the membranes that protect the central nervous system [9, 10]. This
prompted us to study the molecular interaction pattern of α-DG and LCMV GP1.
Although detailed studies have been carried out on LCMV regarding its global
distribution, pathogenicity, and the residues essential for its interaction with α-DG [8],
its molecular interaction with MT α-DG has not yet been deciphered. Therefore,
we have ventured into the prediction of molecular aspects of disease propagation. We
have taken a homology modelling approach to construct the model of the LCMV GP1
domain and thereafter have docked this model with the wild type (WT α-DG) and T192M
mutant (MT α-DG) forms of α-DG separately. Our initial studies [6] with α-DG
structures have also revealed that the mutant structure differs significantly in terms of
its surface hydrophobicity in the vicinity of the mutation site, 192. That is why we have
focused on the structural changes occurring in the vicinity of the mutation site and their
effect on the binding orientation of LCMV GP1 with MT α-DG to elucidate the effect
of the above-mentioned mutation. Molecular docking and subsequent molecular
mechanistic simulations have revealed for the first time that MT α-DG has an increased
number of intermolecular hydrogen bonds and hydrophobic interactions with LCMV GP1,
establishing a stable interaction. Further, we have employed molecular dynamics (MD) with elevated temperature to monitor the changes in this interaction pattern. This study therefore
sheds light for the first time on the scenario where MT α-DG interacts with LCMV,
and we have observed that the interaction pattern indeed changed with the mutated receptor protein. In
future, this work will be helpful in designing effective therapeutic approaches to
defeat the virus-borne disease.
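The notation T192M means that threonine (T) at position 192 is replaced by methionine (M). A minimal sketch of applying such a substitution to a sequence is given below; the helper `apply_point_mutation` and the toy sequence are hypothetical and do not represent the real dystroglycan sequence.

```python
import re


def apply_point_mutation(sequence, mutation):
    """Apply a substitution such as 'T192M' (1-based position) to a sequence."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation string: {mutation!r}")
    wild, position, mutant = match.group(1), int(match.group(2)), match.group(3)
    index = position - 1
    if not 0 <= index < len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    if sequence[index] != wild:
        raise ValueError(
            f"expected {wild} at position {position}, found {sequence[index]}")
    return sequence[:index] + mutant + sequence[index + 1:]


# Toy 200-residue placeholder with T at position 192 (not the real sequence).
TOY_SEQUENCE = "A" * 191 + "T" + "A" * 8

if __name__ == "__main__":
    mutated = apply_point_mutation(TOY_SEQUENCE, "T192M")
    print(TOY_SEQUENCE[189:194], "->", mutated[189:194])
```

Checking the wild-type residue before substituting guards against off-by-one numbering errors, which are common when sequence and structure files use different residue offsets.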
3 Results
3.1 The Conformation of MT α-DG Contributes
to Altered Interaction with LCMV GP1
Our studies [6] with the models of both WT and MT α-DG initially showed
that the mutant structure has more buried surfaces than its wild type form. In
order to investigate the nature of the interactions further, we have docked LCMV GP1 with
WT and MT α-DG separately. Analyses of the docked models clearly
show that there are indeed significant differences both in the amino
acid residues involved in the interaction and in the spatial orientation of
LCMV GP1 when docked with either WT or MT α-DG (Fig. 1a). As mentioned earlier,
the focus of our study was the immediate vicinity of the mutation site at amino acid
residue position 192. We have found that amino acid residues 88–96 of
LCMV GP1 interact with a groove formed by amino acid
residues 142–192 of α-DG (Fig. 1b). We have analysed the total number
of intermolecular hydrogen bonds (H-bonds) and hydrophobic interactions (HPI)
formed between LCMV GP1 and α-DG in both cases using two different systems,
i.e. Discovery Studio 2.5 and Protein Interactions Calculator. Normalized average
values of the bonds formed have been used to generate a graphical view (Fig. 1c),
which shows that LCMV GP1 docked with MT α-DG had more
H-bonds and HPIs than its docked model with
WT α-DG (left panel, Fig. 1c). Hydrogen bonds and hydrophobic interactions are
non-covalent interactions that stabilize the overall interaction
between two proteins [22]. Disturbance of these bonds will definitely
affect the binding. More detailed study of the structures has attributed this
difference largely to a substantial structural change in the mutant form of
α-DG in the vicinity of the mutation site (Met 192) that has resulted in a new space for
LCMV GP1 interaction and has also created potentially altered kinks among a few
residues (arrows, Fig. 1d, e). Superimpositions of the models of WT and MT α-DG,
minimized in either vacuum [6] or explicit solvent conditions (Supplementary
File 3), have revealed that there indeed exist structural differences between the two
structures.
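Counting intermolecular hydrogen bonds can be approximated, very roughly, by counting donor and acceptor atoms on the two proteins that lie within a distance cutoff. The sketch below is an assumption-laden illustration and not the algorithm of either tool named above: it uses only a 3.5 Å distance cutoff (an assumed value), ignores angle criteria, and the function name and coordinates are hypothetical.

```python
import math


def count_close_pairs(donors, acceptors, cutoff=3.5):
    """Count donor-acceptor pairs whose distance is within cutoff (in angstroms)."""
    return sum(
        1
        for donor in donors
        for acceptor in acceptors
        if math.dist(donor, acceptor) <= cutoff
    )


if __name__ == "__main__":
    # Toy coordinates only; real input would come from the docked complex.
    donors = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
    acceptors = [(3.0, 0.0, 0.0), (10.0, 0.0, 3.4), (20.0, 0.0, 0.0)]
    print(count_close_pairs(donors, acceptors))
```

Running the same count on the WT and MT complexes and comparing the totals mirrors the kind of comparison summarized in Fig. 1c.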
Fig. 1 LCMV GP1 exhibited a different spatial orientation when interacting with MT α-DG.
a Docked structures of LCMV GP1 (red backbone, yellow surface) with WT α-DG (blue backbone,
left panel) and with MT α-DG (blue backbone, right panel). The common interacting portion is
shown in red backbone representation. b Amino acids in the vicinity of the mutation site at position 192
and the different orientation of the docked complex. c Differences in intermolecular
hydrogen bonds and hydrophobic interactions. d, e Structural differences between
WT and MT α-DG that open up an alternative binding site in the mutated structure
Fig. 2 Unusual kinks in amino acid pairs burying the surface in MT α-DG. a Different surface
hydrophobicity between WT and MT α-DG making the interaction groove more buried. b The
change in dihedral angles contributing to altered kinks in MT α-DG
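The text does not state which hydrophobicity measure underlies this comparison. Purely as an illustration, the sketch below uses the Kyte-Doolittle hydropathy scale (an assumption) to show how a threonine to methionine substitution shifts the residue-level hydropathy; the function name is hypothetical.

```python
# Kyte-Doolittle hydropathy values (higher means more hydrophobic).
# Assumption: this scale is used only for illustration, not by the authors.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_change(wild_type, mutant):
    """Hydropathy of the mutant residue minus that of the wild-type residue."""
    return KYTE_DOOLITTLE[mutant] - KYTE_DOOLITTLE[wild_type]


if __name__ == "__main__":
    print(f"T -> M change: {hydropathy_change('T', 'M'):+.1f}")
```

A positive change means the substituted residue is more hydrophobic on this scale; a single-residue value like this says nothing by itself about how buried the surrounding groove is in the folded structure.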
4 Discussions
LCM, an infection of the meninges, which serve as a protective membrane for the central
nervous system, develops when its causative agent LCMV invades the
host system via α-DG, a host cell surface receptor, and this interaction
depends largely on LARGE-mediated proper glycosylation of mature α-DG. Additionally,
Fig. 3 Effect of temperature on the interactions. a, b RMSD versus time plots show that the
complex of LCMV GP1 with the wild-type receptor has greater deviation in RMSD across
the conformations (a). In contrast, the complex of LCMV GP1 with the mutated receptor shows fewer
conformational changes (b) with change in temperature
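The RMSD plotted against time measures how far a conformation has moved from a reference structure. A minimal sketch of the calculation is given below, assuming the two coordinate sets list the same atoms in the same order and are already superimposed; the function name `rmsd` and the toy coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equal-length coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


if __name__ == "__main__":
    # Toy reference frame and a copy shifted by 1 angstrom along x.
    reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
    shifted = [(x + 1.0, y, z) for x, y, z in reference]
    print(rmsd(reference, shifted))
```

Computing this value for every saved frame of a trajectory against the starting structure gives a curve of the kind shown in the figure. No optimal superposition is performed here, so rigid-body motion would inflate the value.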
therapeutic strategy should be developed against the viral protein to block the
interaction of LCMV GP1 with the host receptor protein. Studying the particular
folding pattern(s) of LCMV GP1 in the near future may pave the way to new
potential therapeutic approaches to bar the spread of this virus.
The only licensed drug available for certain diseases caused by arenaviruses is the
nucleoside analogue ribavirin [25]. A detailed analysis of this interaction with the total
protein model will generate deeper insight into the molecular basis of viral
attachment to the mutant protein. Our future study therefore includes (i) homology
modelling of total α-DG with its mutant form as well as that of LCMV GP1, (ii)
docking simulations followed by analyses of the interactions, and (iii)
small-molecule library screening to find effective molecules against the viral GP1.
Acknowledgments Authors are thankful to Dept of Biochemistry and Biophysics, University of
Kalyani for their continuous support and for providing the necessary instruments to carry out the
experiments. The authors would like to thank the ongoing DST-PURSE programme. SB and AD
also are thankful to UGC, India and CSIR, India for their respective fellowships, and the DBT
(project no. BT/PR6869/BID/7/417/2012) for the necessary infrastructural support.
Conflict of Interest
The authors declare no conflict of interest.
Appendix
Supplementary File 1 Superimposed models built from Modbase, RaptorX and I-TASSER
Supplementary File 3 Superimposed WT and MT alpha-dystroglycan minimized with explicit solvent system
References
1. Henry MD, Campbell KP (1999) Dystroglycan inside and out. Curr Opin Cell Biol 11:602–607. http://dx.doi.org/10.1016/S0955-0674(99)00024-1
2. Pollard TD (1986) Mechanism of actin filament self-assembly and regulation of the process by actin-binding proteins. Biophys J 49:149–151. http://dx.doi.org/10.1016/S0006-3495(86)83630-X
3. Cheever TR, Ervasti JM (2013) Actin isoforms in neuronal development and function. Int Rev Cell Mol Biol 301:157–213. http://dx.doi.org/10.1016/B978-0-12-407704-1.00004-X
4. Henry MD, Campbell KP (1996) Dystroglycan: an extracellular matrix receptor linked to the cytoskeleton. Curr Opin Cell Biol 8:625–631. http://dx.doi.org/10.1016/S0955-0674(96)80103-7
5. Dinçer P, Balci B, Yuva Y, Talim B et al (2003) A novel form of recessive limb girdle muscular dystrophy with mental retardation and abnormal expression of alpha-dystroglycan. Neuromuscul Disord 13:771–778. http://dx.doi.org/10.1016/S0960-8966(03)00161-5
6. Bhattacharya S, Das A, Ghosh S, Dasgupta R, Bagchi A (2014) Hypoglycosylation of dystroglycan due to T192M mutation: a molecular insight behind the fact. Gene 537:108–114. http://dx.doi.org/10.1016/j.gene.2013.11.071
7. Spiropoulou CF, Kunz S, Rollin PE et al (2002) New world arenavirus clade C, but not clade A and B viruses, utilizes alpha-dystroglycan as its major receptor. J Virol 76:5140–5146
8. Oldstone MB, Campbell KP (2011) Decoding arenavirus pathogenesis: essential roles for alpha-dystroglycan-virus interactions and the immune response. Virology 411:170–179. http://dx.doi.org/10.1016/j.virol.2010.11.023
9. Lapošová K, Pastoreková S, Tomášková J (2013) Lymphocytic choriomeningitis virus: invisible but not innocent. Acta Virol 57:160–170. http://dx.doi.org/10.4149/av_2013_02_160
10. Kunz S, Sevilla N, McGavern DB et al (2001) Molecular analysis of the interaction of LCMV with its cellular receptor [alpha]-dystroglycan. J Cell Biol 155:301–310. http://dx.doi.org/10.1083/jcb.200104103
11. Shi J, Blundell TL, Mizuguchi K (2001) FUGUE: sequence-structure homology recognition using environment-specific substitution tables and structure-dependent gap penalties. J Mol Biol 310:243–257. http://dx.doi.org/10.1006/jmbi.2001.4762
12. Berman HM (2008) The protein data bank: a historical perspective. Acta Crystallogr A 64:88–95. http://dx.doi.org/10.1107/S0108767307035623
13. Altschul SF, Gish W, Miller W et al (1990) Basic local alignment search tool. J Mol Biol 215:403–410. http://dx.doi.org/10.1006/jmbi.1990.9999
14. Pieper U, Webb BM, Barkan DT et al (2011) ModBase, a database of annotated comparative protein structure models, and associated resources. Nucleic Acids Res 39(Database issue):D465–D474. http://dx.doi.org/10.1093/nar/gkq1091
15. Eisenberg D, Lüthy R, Bowie JU (1997) VERIFY3D: assessment of protein models with three-dimensional profiles. Methods Enzymol 277:396–404. http://dx.doi.org/10.1016/S0076-6879(97)77022-8
16. Laskowski RA, MacArthur MW, Moss DS et al (1993) PROCHECK: a program to check the stereochemical quality of protein structures. J Appl Crystallogr 26:283–291. http://dx.doi.org/10.1107/S0021889892009944
17. Chen R, Li L, Weng Z (2003) ZDOCK: an initial-stage protein-docking algorithm. Proteins 52:80–87. http://dx.doi.org/10.1002/prot.10389
18. Jiménez-García B, Pons C, Fernández-Recio J (2013) pyDockWEB: a web server for rigid-body protein-protein docking using electrostatics and desolvation scoring. Bioinformatics 29:1698–1699. http://dx.doi.org/10.1093/bioinformatics/btt262
19. Schneidman-Duhovny D, Inbar Y, Nussinov R, Wolfson HJ (2005) PatchDock and SymmDock: servers for rigid and symmetric docking. Nucleic Acids Res 33:W363–W367
20. Brooks BR, Bruccoleri RE, Olafson BD et al (1983) CHARMM: a program for macromolecular energy, minimization, and dynamics calculations. J Comput Chem 4:187–217. http://dx.doi.org/10.1002/jcc.540040211
21. Tina KG, Bhadra R, Srinivasan N (2007) PIC: Protein interactions calculator. Nucleic Acids Res 35(Web Server issue):W473–W476. http://dx.doi.org/10.1093/nar/gkm423
22. Jaenicke R (1991) Protein stability and molecular adaptation to extreme conditions. Eur J Biochem 202:715–728
23. The Centre for Food Security and Public Health (2010) Institute for International Cooperation in Animal Biologics. Iowa State University, Ames. http://www.cfsph.iastate.edu/Factsheets/pdfs/lymphocytic_choriomeningitis.pdf
24. Pasqual G, Rojek JM, Masin M, Chatton JY, Kunz S (2011) Old world arenaviruses enter the host cell via the multivesicular body and depend on the endosomal sorting complex required for transport. PLoS Pathog 7:e1002232. doi:10.1371/journal.ppat.1002232
25. Lee AM, Pasquato A, Kunz S (2011) Novel approaches in anti-arenaviral drug development. Virology 411:163–169
Keywords WW domain · Arabidopsis thaliana · Homology modeling · Molecular docking · Transcription · Splicing
1 Introduction
Different cellular phenomena like protein ubiquitination, splicing, organ development, tumor progression, and suppression are associated with large multi-protein
interactions. Conserved patches of proteins, known as domains, often assign the
structural and functional properties of a protein. Due to their functional importance,
A. Das S. Bhattacharya A. Bagchi (&) R. Dasgupta (&)
Department of Biochemistry and Biophysics, University of Kalyani,
Kalyani, Nadia 741235, WB, India
e-mail: [email protected]
R. Dasgupta
e-mail: [email protected]
The Author(s) 2015
N.B. Muppalaneni and V.K. Gunjan (eds.), Computational Intelligence
in Medical Informatics, Forensic and Medical Bioinformatics,
DOI 10.1007/978-981-287-260-9_3
A. Das et al.
diverse cellular events like inhibition of plant virus replication, cuticle development, plant flowering time control, morphological development, RNA processing, and prolyl isomerization [9-14]. While the last two functions are quite common to the WW domain family, the others are specific to plants. Considering the conservation of the different pathways involving RNA processing and prolyl isomerization, it would not be surprising to find WW domains involved in similar events in plants. In this context, it is surprising that the functional diversity of the plant WW domains has not yet been well addressed. While plants are much less diversified than animals in terms of their WW proteome (Table 1), WW domain-related studies of model plants like Arabidopsis thaliana (A. thaliana) and Oryza sativa (O. sativa) are also significantly limited. Without a better understanding of plant WW domain functions and their involvement in different cellular events, it would not be possible to specifically categorize the similarities and differences between plant and animal WW domains, their PRL/PRP preferences, and their respective protein interactions. In particular, characterization of those WW domains and WW-CPs that are involved in evolutionarily conserved processes like RNA processing, prolyl isomerization, and protein turnover would provide significant insight into these essential cellular processes, which are shared by all kinds of eukaryotes. In this context, we have undertaken homology modeling- and docking-based studies to characterize and functionally annotate the uncharacterized WW-CPs of model plants like A. thaliana.
As described in Table 1, the A. thaliana WW proteome consists of only 28 proteins (22 unique sequences and six redundant sequences), which together contain a total of 41 WW domain sequences. Among these 22 unique sequences, eight are yet to be functionally annotated, while the others have been reported to be involved in most of the above-mentioned plant WW domain functions. Among these eight uncharacterized proteins, this study reports the structural and functional characterization of
Table 1
Serial no.  Species name             Number of WW domain-containing proteins  Number of total WW domains
1           Homo sapiens             204                                      314
2           Mus musculus             139                                      243
3           Danio rerio              108                                      174
4           Drosophila melanogaster  76                                       139
5           Arabidopsis thaliana     28                                       41
6           Oryza sativa             16                                       25
According to the Pfam database, WW domain-containing proteins are more numerous in animals than in plants. While vertebrates (like human, mouse, or zebrafish) contain more than 100 WW domain-containing proteins and even larger numbers of WW domain sequences (on average 1.5 WW domains per protein), the variation of plant (A. thaliana and O. sativa) WW domains is much less, even in comparison to arthropods (fruit fly). However, the functional diversity of plant and animal WW domains is comparable (details in text)
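The ratio of WW domains to WW domain-containing proteins quoted in the table note can be checked directly from the counts in Table 1. The short Python sketch below is only an illustration: the dictionary and function names are hypothetical, and the numbers are simply transcribed from the table.

```python
# Counts transcribed from Table 1:
# species -> (WW domain-containing proteins, total WW domains)
ww_counts = {
    "Homo sapiens": (204, 314),
    "Mus musculus": (139, 243),
    "Danio rerio": (108, 174),
    "Drosophila melanogaster": (76, 139),
    "Arabidopsis thaliana": (28, 41),
    "Oryza sativa": (16, 25),
}


def domains_per_protein(counts):
    """Return {species: total WW domains / WW domain-containing proteins}."""
    return {species: domains / proteins
            for species, (proteins, domains) in counts.items()}


ratios = domains_per_protein(ww_counts)
for species, ratio in ratios.items():
    print(f"{species}: {ratio:.2f} WW domains per protein")
```

For every species in the table the ratio falls between roughly 1.4 and 1.9, which is in line with the figure of about 1.5 WW domains per protein given in the note.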
DALI server [21]. Only the outputs with an RMSD of <1.5 Å were used to assign functions to our domains of interest.
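The RMSD cutoff applied to the structural comparison output can be illustrated with a minimal Python sketch. It assumes two equal-length lists of already superposed atomic coordinates in Å; the function names, the hit records and the toy values are hypothetical and are not part of the DALI server, which performs its own structural alignment.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two pre-superposed coordinate lists."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def filter_hits(hits, cutoff=1.5):
    """Keep only structural hits whose RMSD is strictly below the cutoff (in Angstrom)."""
    return [hit for hit in hits if hit["rmsd"] < cutoff]


# Hypothetical hit list; names and values are illustrative only.
example_hits = [
    {"name": "hit_A", "rmsd": 1.2},
    {"name": "hit_B", "rmsd": 1.5},
    {"name": "hit_C", "rmsd": 2.3},
]
kept = filter_hits(example_hits)
print([hit["name"] for hit in kept])
```

With a strict cutoff, a hit at exactly 1.5 Å is rejected, which matches the "<1.5" criterion stated in the text.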
The central groove created by the three β-sheets forms the PRL/PRP binding region of any known WW domain [1], and to date all the known PRLs/PRPs of WW domains have been reported to bind to this groove. In our docking studies, only those docked complexes in which the PRL/PRP was docked into this groove were considered for further analysis. Any other docking poses of the ligand-receptor complex were rejected.
We extracted 17 different PRL/PRP structures from known complexes of WW domain receptors bound to different ligand molecules or polypeptides, which are available at the Protein Data Bank (http://www.rcsb.org). These ligand structures were heated to 373 K and then cooled to 273 K to release any structural constraints that might be present in them due to crystallization conditions or other structural limitations of the reported structural studies. These heating and cooling procedures were performed using the Accelrys Discovery Studio 2.5 package (Accelrys, USA; presently BIOVIA, Dassault Systèmes, France). The required energy minimization of these ligand structures was performed in Discovery Studio 2.5 using the Conjugate Gradient algorithm. All the docking studies were performed through the Z-DOCK server [23] (for rigid docking). Only those docked complexes that had the least energy and were stereochemically fit were selected for further refinement. These best outputs of the Z-DOCK server were subjected to further refinement via the FlexPepDock server [24]. Since FlexPepDock generates 10 different output models, the criteria used for selecting the Z-DOCK output were also applied to choose the output of the FlexPepDock server. Validations of the docked complexes were performed through the SAVES server using the previously mentioned analytical tools. Energy minimizations of the docked complexes were performed as mentioned above, with backbone fixation, to ensure proper interactions of the docked complexes. After each minimization, the structures of the complexes were checked through Verify_3D and Procheck to determine their stereochemical fitness.
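The selection rules described above (ligand docked in the binding groove, stereochemically acceptable, lowest energy) can be summarized in a small Python sketch. This is not the authors' workflow code: the `Pose` record, its field names and the example values are hypothetical, and it assumes that a lower energy value means a better pose.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Pose:
    name: str          # hypothetical label of a docked model
    energy: float      # docking energy; lower is assumed to be better
    in_groove: bool    # ligand sits in the central PRL/PRP binding groove
    stereo_ok: bool    # passed the stereochemical quality checks


def select_pose(poses: List[Pose]) -> Optional[Pose]:
    """Return the lowest-energy pose among those passing both filters, or None."""
    candidates = [p for p in poses if p.in_groove and p.stereo_ok]
    if not candidates:
        return None
    return min(candidates, key=lambda p: p.energy)


# Illustrative values only.
poses = [
    Pose("model_1", -12.4, in_groove=False, stereo_ok=True),
    Pose("model_2", -10.1, in_groove=True, stereo_ok=True),
    Pose("model_3", -11.0, in_groove=True, stereo_ok=False),
    Pose("model_4", -9.7, in_groove=True, stereo_ok=True),
]
best = select_pose(poses)
print(best.name if best else "no acceptable pose")
```

In this toy case the pose with the lowest energy overall is discarded because the ligand is not in the groove, so the filter order matters: the groove and quality checks are applied before energies are compared.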
Fig. 1 Domain organization of At F4JC80 WW domains and structural properties of these WW domains (left panel for 1st WW domain and right panel for 2nd WW domain). a The 892 amino acid long At F4JC80 harbors two WW domains (marked with green boxes, with their amino acid positions marked on each side). The Pfam match qualities of these two WW domains are shown in detail. b The triple β-sheet structure of the generated models of these two WW domains, with sheet numbering starting from the N terminal. c Side view of the models shows the difference in the upper concave ligand binding groove. d ProSA plot of the energy functions of these two WW domains validates their structural properties. e and f Structural alignment of these two models with their templates shows the areas of difference (template shown in yellow, major differences marked with red arrowheads). g Sequence view of the alignment of these model and template structures
(Fig. 1a). The level of confidence in identifying these two regions as WW domains was very high (2.4e-07 for the 1st WW domain and 8.9e-06 for the 2nd WW domain of At F4JC80). Since protein domains are the most important regions of any protein in terms of mediating its interactions with ligands, substrates, or other proteins, the presence of only WW domains in At F4JC80 renders these two WW domains the sole functional motifs of At F4JC80. So, to further understand the functional properties of At F4JC80, these two WW domains were subjected to homology modeling.
Another important feature of At F4JC80 is the presence of polyproline-rich stretches in this protein. PRLs are known as the ligands for WW domains and, based on the composition of PRLs/PRPs, WW domains are classified into five classes [1]. At F4JC80 contains four polyproline stretches, of which three are four proline residues long (537-540 aa, 546-549 aa, and 568-571 aa) while one is five proline residues long (556-560 aa). The presence of both ligand and receptor domains (i.e., polyproline stretches and WW domains, respectively) could have a significant impact in determining the state of activation (through intra-molecular interactions of these ligands and domains), oligomerization (through mediating inter-molecular interactions), or binding partners (through interaction with other WW domains or polyproline stretches of other proteins) of At F4JC80.
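Locating polyproline stretches of the kind listed above is a simple pattern search. The Python sketch below shows one possible way to do it, assuming a one-letter amino acid sequence as input; the function name and the toy sequence are hypothetical, and the real At F4JC80 sequence is not reproduced here.

```python
import re


def polyproline_stretches(sequence, min_len=4):
    """Return (start, end, stretch) for every run of at least min_len prolines.

    Positions are 1-based and inclusive, as in the residue ranges quoted in the text.
    """
    pattern = re.compile(r"P{%d,}" % min_len)
    return [(m.start() + 1, m.end(), m.group())
            for m in pattern.finditer(sequence.upper())]


# Toy sequence for illustration only (not At F4JC80).
toy_sequence = "MKAPPPPSGTPPPPPLE"
stretches = polyproline_stretches(toy_sequence)
for start, end, stretch in stretches:
    print(f"{start}-{end} aa: {stretch}")
```

Because the pattern is greedy, a run of five prolines is reported once as a single five-residue stretch rather than as overlapping four-residue stretches.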
Serial no.  Ligand PDB ID  Receptor     Ligand sequence        At 1st WW domain  At 2nd WW domain
1           1I5H           NEDD4-WW1    GSTILPIPGTPPPNYDSL     N/A               N
2           1K5R           YAP1         GTPPPPYTVG             N/A               Y
3           2DJY           SMURF2-WW3   GPLGSELESPPPPYSRYPMD   N/A               Y
4           2DYF           FBP11-WW1    GSTAPPLPR              N/A               N
5           2EZ5           NEDD4-WW3    TGLPSYDEALH            N/A               Y
6           2HO2           FE65         PPPPPPPPPL             N/A               Y
7           2JMF           Su(dx)-WW4   GPLGSPNTGAKQPPSYEDCIK  N/A               Y
8           2JO9           ITCH-WW3     EEPPPPYED              Y                 Y
9           2JUP           FBP28-WW2    PPLIPPPP               Y                 Y
10          2KQ0           NEDD4-WW3    ILPTAPPEYMEA           Y                 Y
11          2KXQ           SMURF2-WW2   GPLGSELESPPPPYSRYPMD   N/A               Y
12          2LAW           YAP1         TPPPAYLPPEDP           N/A               Y
13          2LB1           SMURF2-WW2   DTPPPAYLPPEDP          Y                 Y
14          2LB2           NEDD4L-WW2   ETPPPGYLSEDG           N/A               Y
15          2RLY           FBP28-WW2    PTPPPLPP               N/A               Y
16          2RM0           FBP28-WW2    PPPLIPPPP              N/A               Y
17          2V8F           PROFILIN     IPPPPPLPGV             N/A               Y
Although WW domains form a triple β-sheet structure connected by turns, variation of their sequences results in variable ligand choices. We used the online docking server Z-DOCK to find the ligand binding ability of both WW domains of At F4JC80 with 17 different ligand sequences extracted from already reported structures of different WW domain-ligand complexes (taken from the Protein Data Bank: http://www.rcsb.org). The result clearly shows that the 1st WW domain of At F4JC80 can bind to only a handful of ligands, while the 2nd WW domain binds to almost all the types of proline-rich ligands that were taken into consideration in this study. Since WW domains bind to their ligands only through their well-known ligand binding groove (detailed in text), the deep groove that we found in the 1st WW domain of F4JC80 could well be the reason for the inability of this domain to bind to different ligands, as such deep grooves impose a distance constraint on the formation of successful atomic interactions
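The docking outcomes in the table can be tallied per WW domain to make the "handful of ligands" versus "almost all ligands" contrast explicit. The Python sketch below transcribes the Y / N / N/A entries from the table; the variable and function names are hypothetical.

```python
from collections import Counter

# (ligand PDB ID, result for 1st WW domain, result for 2nd WW domain),
# transcribed from the docking table.
docking_results = [
    ("1I5H", "N/A", "N"), ("1K5R", "N/A", "Y"), ("2DJY", "N/A", "Y"),
    ("2DYF", "N/A", "N"), ("2EZ5", "N/A", "Y"), ("2HO2", "N/A", "Y"),
    ("2JMF", "N/A", "Y"), ("2JO9", "Y", "Y"), ("2JUP", "Y", "Y"),
    ("2KQ0", "Y", "Y"), ("2KXQ", "N/A", "Y"), ("2LAW", "N/A", "Y"),
    ("2LB1", "Y", "Y"), ("2LB2", "N/A", "Y"), ("2RLY", "N/A", "Y"),
    ("2RM0", "N/A", "Y"), ("2V8F", "N/A", "Y"),
]


def tally(results, column):
    """Count the outcomes in one result column (1 = 1st domain, 2 = 2nd domain)."""
    return Counter(row[column] for row in results)


first_domain = tally(docking_results, 1)
second_domain = tally(docking_results, 2)
print("1st WW domain:", dict(first_domain))
print("2nd WW domain:", dict(second_domain))
```

By this count the 1st WW domain is marked Y for 4 of the 17 ligands, whereas the 2nd WW domain is marked Y for 15 of them.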
For better insight into the functional properties of the At F4JC80 WW domains, we used the DALI server for functional assessment by structural comparison. The DALI output (only those outputs with an RMSD of <1.5 Å were considered) showed that both the At F4JC80 WW domains are structurally similar to the WW domains of the NEDD4 family, which is consistent with our findings from the docking studies.
Overall, this study on the in silico characterization of At F4JC80 has identified the presence of two WW domains in this protein and assigned functional annotations to these domains by molecular docking techniques and structure-based functional assessment studies. Such computational studies, which are the first of their kind on plant WW domains, would provide a future direction for further characterization of this or similar types of proteins. Future studies addressing these proteins with experimental approaches will prove to be significant in understanding the overall interactome of these proteins.
4 Conclusions
This study on the structural and functional properties of At F4JC80 identified the presence of two WW domains and four polyproline-rich stretches in the protein. Both models of the WW domains were found to contain the standard triple β-sheet structure, but there was a significant difference in the PRL/PRP binding groove of the two WW domains. This difference gave the 2nd WW domain a better ability to bind PRL/PRPs, as the deep groove was found to have a negative impact on the 1st WW domain's PRL/PRP binding capabilities. However, docking and structural comparison-based function annotation studies found that both WW domains belong to the Class I WW domains (specifically related to the NEDD4 and SMURF families). Based on the well-characterized roles of NEDD4 and SMURF E3 ubiquitin ligases [4], it can be hypothesized that the At F4JC80 WW domains are most probably involved in gene expression control by regulating protein turnover during TGF-β signaling. Also, the presence of nuclear localization signals (regions 746-KRTKKK-751, 795-KRKR-798, and 826-WREKVKRKRERAEKSQKKDPE-846, identified through the NLStradamus server [25]) makes this protein an important player in the nuclear context. Moreover, the ability of the 2nd WW domain to bind to ligands of WW domains belonging to Prp40 (PDB ID: 2DYF) and ca150 (PDB ID: 2RM0), as well as to the formin polyproline region (PDB ID: 2V8F), shows a wider arena of functional interactions of At F4JC80 in the cellular context, where it has the possibility to directly regulate the cellular actin cytoskeleton [26] and gene transcription.
Acknowledgments The authors would like to thank the members of the RD and AB laboratories for their continuous support and critical assessments. AD and SB would like to thank CSIR (India) and UGC (India), respectively, for their Ph.D. fellowships.
Author Contributions AD and SB performed the modeling and docking studies and drafted the manuscript. RD and AB analyzed the results and prepared the final version of the manuscript. All the authors have read and approved the final version of the manuscript.
Appendix
Details of At F4JC80 1st WW domain model template, secondary structure
and model quality
fbp21ww2    LLSKCPWKEYKSDSGKPYYY-NSQTKESRWAKP 32
f4jc80ww1   -----QWKMILHEESNQYYYWNTETGETSWELP 28
                  **     :.. *** *::* *: *  *

sav1ww1     LPPGWSVDWTMRGRK-YYIDHNTNTTHWSHP 30
f4jc80ww2   LPSEWQAYWDESTKKVYYGNTSTSQTSWTRP 31
            **. *.. *    :* ** : .*. * *::*
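The asterisks in the consensus lines mark identical columns, so the sequence identity between each model and its template can be counted from the aligned strings. The Python sketch below is a minimal illustration using the first alignment (fbp21ww2 versus f4jc80ww1) as transcribed above; the function name is hypothetical, and identity is computed here over columns where neither sequence has a gap, which is only one of several common conventions.

```python
def alignment_identity(seq_a, seq_b, gap="-"):
    """Return (identical, aligned) column counts for two aligned sequences.

    Columns containing a gap in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    aligned = 0
    identical = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        aligned += 1
        if a == b:
            identical += 1
    return identical, aligned


template = "LLSKCPWKEYKSDSGKPYYY-NSQTKESRWAKP"  # fbp21ww2
model = "-----QWKMILHEESNQYYYWNTETGETSWELP"     # f4jc80ww1

identical, aligned = alignment_identity(template, model)
print(f"{identical} identical of {aligned} aligned positions "
      f"({100 * identical / aligned:.1f} %)")
```

The count of identical columns agrees with the number of asterisks in the consensus line of the first alignment.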
Supplementary Table 1 Detailed list of ligands used for docking studies from reported WW domain-ligand complex structures
Serial no.  PDB ID  Ligand sequence
1           1I5H    GSTILPIPGTPPPNYDSL
2           1JMQ    GTPPPPYTVG
3           1K5R    GTPPPPYTVG
4           2DJY    GPLGSELESPPPPYSRYPMD
5           2EZ5    TGLPSYDEALH
6           2HO2    PPPPPPPPPL
7           2JMF    GPLGSPNTGAKQPPSYEDCIK
8           2JO9    EEPPPPYED
9           2JUP    PPLIPPPP
10          2KQ0    ILPTAPPEYMEA
11          2KXQ    GPLGSELESPPPPYSRYPMD
12          2LAW    TPPPAYLPPEDP
13          2LB1    DTPPPAYLPPEDP
14          2LB2    ETPPPGYLSEDG
15          2RLY    PTPPPLPP
16          2RM0    PPPLIPPPP
17          2V8F    IPPPPPLPGV
18          2V8F    IPPPPPLP
References
1. Salah Z, Alian A, Aqeilan RI (2012) WW domain-containing proteins: retrospectives and the future. Front Biosci (Landmark Ed) 17:331-348. doi:10.2741/3930
2. Sudol M (1996) Structure and function of the WW domain. Prog Biophys Mol Biol 65:113-32. doi:10.1016/S0079-6107(96)00008-9
3. Mouchantaf R, Azakir BA, McPherson PS, Millard SM, Wood SA, Angers A (2006) The ubiquitin ligase itch is auto-ubiquitylated in vivo and in vitro but is protected from degradation by interacting with the deubiquitylating enzyme FAM/USP9X. J Biol Chem 281:38738-38747. doi:10.1074/jbc.M605959200
4. Yang B, Kumar S (2010) Nedd4 and Nedd4-2: closely related ubiquitin-protein ligases with distinct physiological functions. Cell Death Differ 17:68-77. doi:10.1038/cdd.2009.84
5. Del Mare S, Salah Z, Aqeilan RI (2009) WWOX: its genomics, partners, and functions. J Cell Biochem 108:737-745. doi:10.1002/jcb.22298
6. Sudol M, Shields DC, Farooq A (2012) Structures of YAP protein domains reveal promising targets for development of new cancer drugs. Semin Cell Dev Biol 23:827-833. doi:10.1016/j.semcdb.2012.05.002
7. Kato Y, Miyakawa T, Kurita J, Tanokura M (2006) Structure of FBP11 WW1-PL ligand complex reveals the mechanism of proline-rich ligand recognition by group II/III WW domains. J Biol Chem 281:40321-40329. doi:10.1074/jbc.M609321200
8. Lippens G, Landrieu I, Smet C (2007) Molecular mechanisms of the phospho-dependent prolyl cis/trans isomerase Pin1. FEBS J 274:5211-5222. doi:10.1111/j.1742-4658.2007.06057.x
Abstract Cells respond to stress conditions. As a result of stress, most genes are deactivated, while a few are activated as part of an antistress response. The latter involves a variety of molecules, including molecular chaperones or heat shock proteins (Hsps), whose levels increase in stressed conditions, particularly at elevated temperatures. Heat shock proteins help other cellular proteins to achieve their native states, i.e. correct folding or functional conformations. Thus, heat shock proteins play a major role in the protein homeostasis network of the cell. Small heat shock proteins (sHsps) are one of the families of molecular chaperones that prevent irreversible aggregation and assist in the refolding of denatured proteins. Two members of the sHsp family, IbpA and IbpB, are present in Escherichia coli. The IbpA and IbpB proteins are 48 % identical at the amino acid sequence level and have the characteristic α-crystallin domain. It is known that the cooperation between IbpA and IbpB is crucial for their chaperone activity under heat-stressed conditions. So far, the molecular mechanisms of the stress response of the IbpA/IbpB protein system have not been well understood. In the present work, an attempt has been made to identify the amino acid residues of the IbpA and IbpB proteins that are involved in protein-protein interactions. The interactions between IbpA and IbpB are studied with and without the substrate lactate dehydrogenase (LDH) at cold shock, physiological and heat shock temperatures to observe the changes in the pattern of interaction. This study is the first report to elucidate the mechanism of interactions between the proteins.
Keywords Small heat shock proteins · IbpA-IbpB interaction
S. Bhattacharjee et al.
1 Introduction
Cells react to various types of physical (e.g. heat) or chemical (anoxia, low pH) stresses. Cell stresses are frequent and recurring, and they challenge the cells, ultimately leading to a stress response. Most genes are deactivated as a result of the cellular stress response, but a few become activated to combat the stress. The antistress response mechanism involves a variety of molecules, including molecular chaperones and proteases or heat shock proteins (Hsps), whose levels increase particularly at elevated temperatures [1]. Hsps help other cellular proteins to achieve their native folding conformations, to localize to their destined sub-cellular organelles, to resist denaturation by heat stress, and to regain their native folding states after partial denaturation. Thus, Hsps play a vital role in the cellular protein homeostasis network [2].
Small heat shock proteins (sHsps) belong to the families of molecular chaperones that assist in the refolding of denatured proteins by resisting irreversible aggregation [3]. sHsps are widely distributed. They are characterized by their low molecular weight (12-30 kDa) and the presence of a conserved α-crystallin domain [4]. Their robust upregulation under temperature-stressed conditions makes them among the most abundant cellular proteins [5].
Two members of the sHsp family, IbpA and IbpB, are found in Escherichia coli. These two proteins are 48 % identical at the amino acid sequence level [6] and have the characteristic α-crystallin domain flanked by N- and C-terminal ends [7]. Initially, both were identified as inclusion body proteins, but later they were found in protein aggregates in temperature-stressed cells [8]. The efficiency of IbpA/IbpB depends on increased temperature, with protein reactivation and protection from degradation found at elevated temperatures [9]. It is reported that the presence of IbpA during heat denaturation of a substrate is sufficient to change the macroscopic properties of aggregates, yet this alone does not increase the efficiency of the subsequent reactivation of such aggregated polypeptides. The presence of IbpB is required to increase the efficiency of disaggregation and refolding [10]. This observation points to a cooperative mode of interaction between IbpA and IbpB, in which IbpA associates first with the aggregating protein or substrate and then attracts IbpB to the complex [11]. So the cooperation between IbpA and IbpB is crucial for their chaperone activity under temperature-stressed conditions. So far, the molecular mechanisms of the stress response of the IbpA/IbpB protein system have not been well understood. In the present work, we tried to elucidate the interaction profiles between the proteins at the molecular level. We identified the amino acid residues of the IbpA and IbpB proteins that are involved in protein-protein interactions. The interactions between IbpA and IbpB were studied with and without the substrate lactate dehydrogenase (LDH) [12] at cold shock, physiological and heat shock temperatures to observe the changes in the pattern of interaction following heat and cold stress. This study is the first report to elucidate the mechanism of interactions between the proteins.
Next, the crystal structure of the substrate, i.e. LDH, was retrieved from the PDB (PDB code 3H3F) [33], and its stereochemical qualities were tested with PROCHECK and Verify3D. This structure was then docked with the docked complex of IbpAB using ZDOCK, pyDock, PatchDock, GRAMM-X and ClusPro. The stereochemical qualities of the docked complexes so generated were tested with PROCHECK and Verify3D. The docked complex with the best score was selected as the final working model and was then energy minimized to ensure proper interactions, using the steepest descent (SD) algorithm in an explicit solvent system with the CHARMM force field in GROMACS version 4.6.5, until the structure reached a final derivative of 0.001 kcal/mol. The docked complex was minimized in 875 steps using the SD algorithm.
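The idea behind steepest descent minimization with a gradient-based stopping criterion can be shown on a toy problem. The Python sketch below is not the GROMACS implementation and does not use a molecular force field: the quadratic "energy" function, the fixed step size and all names are assumptions chosen purely for illustration.

```python
def steepest_descent(grad, x0, step=0.1, tol=1e-3, max_steps=10000):
    """Minimize by repeatedly stepping against the gradient.

    Stops when the largest absolute gradient component drops below tol,
    loosely analogous to minimizing until a target derivative is reached.
    Returns the final point and the number of steps taken.
    """
    x = list(x0)
    for n in range(max_steps):
        g = grad(x)
        if max(abs(gi) for gi in g) < tol:
            return x, n
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x, max_steps


def toy_gradient(point):
    """Gradient of the toy energy E(x, y) = (x - 1)^2 + 2 (y + 2)^2."""
    x, y = point
    return [2.0 * (x - 1.0), 4.0 * (y + 2.0)]


minimum, steps = steepest_descent(toy_gradient, [5.0, 5.0])
print(f"minimum near ({minimum[0]:.3f}, {minimum[1]:.3f}) after {steps} steps")
```

Production minimizers usually adapt the step size as they go; the fixed step is kept here only so that the stopping rule is easy to follow.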
3 Results
Analyses of the docked complexes revealed that, irrespective of the presence or absence of substrate, IbpA and IbpB follow a definite pattern of interactions at cold shock and heat shock temperatures, in contrast to physiological temperature. In the IbpA-IbpB docked complex, 12 amino acid residues are involved in forming H-bond interactions at cold shock temperature (300 K) (Fig. 1). The interactions decrease at the physiological temperature (310 K), where only 10 amino acid residues are involved (Fig. 2). However, when the docked complex is raised from 310 K to heat shock temperature (318 K), the 12 amino acid residues that were involved in H-bond interactions at cold shock temperature are restored (Fig. 3) (Table 1).
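The comparison across the three temperatures amounts to comparing sets of donor-acceptor pairs. The Python sketch below illustrates this with the entries of Table 1; the pairing of donors with acceptors is our reading of the table layout and should be treated as an assumption, and all variable names are hypothetical.

```python
# Donor-acceptor pairs as read from Table 1 (A = IbpA, B = IbpB).
hbonds_300k = {
    ("B:ARG131:HH22", "A:GLY32:O"), ("B:ARG131:HH12", "A:GLY32:O"),
    ("B:ARG131:HE", "A:GLY33:O"), ("B:ASN36:HD22", "A:GLN88:OE1"),
    ("B:TYR35:HN", "A:TYR85:O"), ("B:ASN25:HD21", "A:GLU82:OE2"),
    ("B:GLN24:HE22", "A:GLU82:O"), ("B:ALA22:HN", "A:GLU92:OE2"),
    ("B:ALA22:HN", "A:ALA54:O"), ("B:ALA20:HN", "A:ALA91:O"),
    ("A:GLN88:HN", "B:TYR35:O"), ("A:LYS81:HN", "B:GLN24:OE1"),
}
hbonds_310k = {
    ("B:ARG131:HH21", "A:GLY32:O"), ("B:ARG131:HE", "A:GLY32:O"),
    ("B:TYR35:HN", "A:LEU86:O"), ("B:ASN25:HD21", "A:GLU82:OE2"),
    ("B:ASN25:HN", "A:THR84:OG1"), ("B:GLN24:HE22", "A:GLU82:O"),
    ("B:ALA22:HN", "A:GLU92:OE2"), ("A:GLN88:HE21", "B:ASN36:OD1"),
    ("A:GLN88:HN", "B:TYR35:O"), ("A:TYR34:HH", "B:ASN21:OD1"),
}
# The 318 K column of Table 1 lists the same atoms as the 300 K column.
hbonds_318k = set(hbonds_300k)

conserved = hbonds_300k & hbonds_310k & hbonds_318k
print("H-bonds at 300 / 310 / 318 K:",
      len(hbonds_300k), len(hbonds_310k), len(hbonds_318k))
print("identical at 300 K and 318 K:", hbonds_300k == hbonds_318k)
print("conserved at all three temperatures:", sorted(conserved))
```

Under this reading, the cold shock and heat shock patterns coincide, while only a subset of the pairs is also present at physiological temperature.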
In the presence of the substrate LDH, the IbpAB-LDH docked complex has 18 amino acid residues forming H-bond interactions at cold shock temperature, among which 8 amino acid residues from IbpA/B interact with the substrate LDH and the remaining 10 residues show inter-protein H-bond
Fig. 1 Amino acid residues from IbpA and IbpB proteins involved in intermolecular H-bond
interaction in IbpAB protein complex in absence of substrate at cold shock temperature (300 K)
[backbone of IbpA is marked in orange and backbone of IbpB is marked in red. H-bonds are
marked in green dashed lines] (the relevant interacting residues that are unique to this
conformation are shown)
Fig. 2 Amino acid residues from IbpA and IbpB proteins involved in intermolecular H-bond
interaction in IbpAB protein complex in absence of substrate at physiological temperature (310 K)
[backbone of IbpA is marked in orange and backbone of IbpB is marked in red. H-bonds are
marked in green dashed lines] (the relevant interacting residues that are unique to this
conformation are shown)
Fig. 3 Amino acid residues from IbpA and IbpB proteins involved in intermolecular H-bond
interaction in IbpAB protein complex in absence of substrate at heat shock temperature (318 K)
[backbone of IbpA is marked in orange and backbone of IbpB is marked in red. H-bonds are
marked in green dashed lines] (the relevant interacting residues that are unique to this
conformation are shown)
interactions (Fig. 4). When this docked complex (IbpAB with substrate LDH) is brought to physiological temperature, only 15 amino acid residues are involved in H-bond interactions. Of these 15 amino acid residues, 6 interact with LDH and the rest show inter-protein interactions (Fig. 5). When the complex is then raised to heat shock temperature, the docked complex restores the earlier interaction pattern: 18 amino acid residues participate in H-bond interactions, among which 8 interact with the substrate LDH and the remaining 10 have inter-protein interactions (Fig. 6) (Table 2).
Table 1 Amino acid residues from IbpA and IbpB proteins involved in interactions in the IbpAB protein complex in the absence of substrate at cold shock, physiological and heat shock temperatures [interacting residues that remain unchanged at these three temperatures are marked in bold pattern. IbpA is denoted as A, IbpB is denoted as B]
300 K                          310 K                          318 K
Donor          Acceptor        Donor          Acceptor        Donor          Acceptor
B:ARG131:HH22  A:GLY32:O       B:ARG131:HH21  A:GLY32:O       B:ARG131:HH22  A:GLY32:O
B:ARG131:HH12  A:GLY32:O       B:ARG131:HE    A:GLY32:O       B:ARG131:HH12  A:GLY32:O
B:ARG131:HE    A:GLY33:O       B:TYR35:HN     A:LEU86:O       B:ARG131:HE    A:GLY33:O
B:ASN36:HD22   A:GLN88:OE1     B:ASN25:HD21   A:GLU82:OE2     B:ASN36:HD22   A:GLN88:OE1
B:TYR35:HN     A:TYR85:O       B:ASN25:HN     A:THR84:OG1     B:TYR35:HN     A:TYR85:O
B:ASN25:HD21   A:GLU82:OE2     B:GLN24:HE22   A:GLU82:O       B:ASN25:HD21   A:GLU82:OE2
B:GLN24:HE22   A:GLU82:O       B:ALA22:HN     A:GLU92:OE2     B:GLN24:HE22   A:GLU82:O
B:ALA22:HN     A:GLU92:OE2     A:GLN88:HE21   B:ASN36:OD1     B:ALA22:HN     A:GLU92:OE2
B:ALA22:HN     A:ALA54:O       A:GLN88:HN     B:TYR35:O       B:ALA22:HN     A:ALA54:O
B:ALA20:HN     A:ALA91:O       A:TYR34:HH     B:ASN21:OD1     B:ALA20:HN     A:ALA91:O
A:GLN88:HN     B:TYR35:O                                      A:GLN88:HN     B:TYR35:O
A:LYS81:HN     B:GLN24:OE1                                    A:LYS81:HN     B:GLN24:OE1
Fig. 4 Amino acid residues from IbpA and IbpB proteins involved in intermolecular H-bond
interaction in IbpAB protein complex in presence of substrate LDH at cold shock temperature
(300 K) [backbone of IbpA is marked in orange, backbone of IbpB is marked in red and backbone
of substrate LDH is marked in blue. H-bonds are marked in green dashed lines] (the relevant
interacting residues that are unique to this conformation are shown)
Fig. 5 Amino acid residues from IbpA and IbpB proteins involved in intermolecular H-bond
interaction in IbpAB protein complex in the presence of substrate at physiological temperature
(310 K) [backbone of IbpA is marked in orange, backbone of IbpB is marked in red and backbone
of substrate LDH is marked in blue. H-bonds are marked in green dashed lines] (the relevant
interacting residues that are unique to this conformation are shown)
4 Discussions
Escherichia coli sHsps, IbpA and IbpB, function as molecular chaperones: they protect misfolded proteins against irreversible aggregation and help unfolded proteins to restore their native conformations following heat stress.
In our study, it is observed that, irrespective of the presence or absence of the substrate LDH, the number of interactions increases at both heat shock and cold shock temperatures in comparison to physiological temperature. In the presence of the substrate LDH, at cold shock temperature (300 K), 8 amino acid residues of IbpA and IbpB interact with the substrate LDH and 10 residues show inter-protein interactions. Among these 8
Fig. 6 Amino acid residues from IbpA and IbpB proteins involved in intermolecular H-bond
interaction in IbpAB protein complex in the presence of substrate at heat shock temperature
(318 K) [backbone of IbpA is marked in orange, backbone of IbpB is marked in red and backbone
of substrate LDH is marked in blue. H-bonds are marked in green dashed lines] (the relevant
interacting residues that are unique to this conformation are shown)
A:GLN88:OE1
A:GLY32:O
A:GLY32:O
B:ASN36:HD22
B:ARG131:HE
B:ARG131:HH21
B:ARG131:HH21
B:ARG131:HE
A:GLY32:O
A:GLY32:O
A:THR84:OG1
A:LEU86:O
B:ASN25:HN
A:GLU82:O
A:GLU92:OE2
A:TYR34:OH
B:TYR35:O
B:GLN24:OE1
C:PHE70:O
C:ASP63:O
A:GLN100:OE1
B:TYR35:HN
B:GLN24:HE22
B:ALA22:HN
B:ASN21:HD21
A:GLN88:HN
A:GLU82:HN
A:ARG48:HH22
A:ASN38:HD22
C:TYR246:HH
A:LEU101:O
B:ALA100:O
B:SER135:O
Acceptor
B:ARG131:HH21
B:ARG131:HE
B:TYR35:HN
B:ASN25:HN
B:GLN24:HE22
B:ALA22:HN
B:ASN21:HD22
B:MET1:H2
A:ARG123:HH12
A:GLU82:HN
A:LYS81:HN
A:ARG48:HH22
A:ASP43:HN
C:TYR246:HH
C:LYS244:HZ2
C:HIS66:HE2
C:LYS58:HZ3
A:GLY32:O
A:GLY32:O
A:LEU86:O
A:THR84:OG1
A:GLU82:O
A:GLU92:OE2
A:TYR34:OH
A:ASN111:OD1
C:TRP249:O
B:GLN24:OE1
B:GLN24:OE1
C:PHE70:O
C:MET40:O
A:GLN100:OE1
A:LEU101:O
B:ALA100:O
B:SER135:O
Acceptor
B:SER135:O
Donor
C:LYS58:HZ1
318 K
Interacting residues that remain unchanged at these three temperatures are marked in bold pattern. Amino acid residues from IbpA and IbpB of IbpAB protein complex that interact
with the substrate are marked in blue. IbpA is denoted as A, IbpB is denoted as B and substrate is denoted as C
A:GLU82:O
A:THR84:OG1
B:ASN25:HN
A:GLU92:OE2
B:GLN24:HE22
A:TYR34:OH
B:ALA22:HN
C:TRP249:O
A:ARG123:HH12
B:ASN21:HD21
C:LYS244:O
A:GLN100:HE22
A:ASN111:OD1
B:GLN24:OE1
A:LYS81:HN
A:ASN111:OD1
C:PHE70:O
A:ARG48:HH22
B:MET1:H1
A:GLN100:O
C:TRP249:HE1
B:MET1:H3
A:GLN100:OE1
C:TYR246:HH
C:LYS244:HZ2
C:HIS66:HE2
A:PRO126:O
A:LEU101:O
C:LYS58:HZ1
B:ALA100:O
C:HIS66:HE2
C:HIS180:HE2
Donor
Donor
C:LYS244:HZ2
310 K
Acceptor
300 K
Table 2 Amino acid residues from IbpA and IbpB proteins involved in interactions in the IbpAB protein complex in the presence of substrate at cold shock,
physiological and heat shock temperature
Acknowledgments The authors are thankful to the Department of Biochemistry and Biophysics and the BIF centre, University of Kalyani, for their continuous support and for providing the necessary instruments to carry out the experiments. The authors would like to acknowledge the ongoing DST-PURSE programme (2012-2015) and DBT (project no. BT/PR6869/BID/7/417/2012) for support.
Conflict of Interest
The authors declare no conflict of interests.
References
1. Gill RT, Valdes JJ, Bentley WE (2000) A comparative study on global stress gene regulation in response to over-expression of recombinant proteins in Escherichia coli. Metab Eng 2:178-189
2. Thomas JG, Baneyx F (1998) Roles of the Escherichia coli small heat shock proteins IbpA and IbpB in thermal stress management: comparison with ClpA, ClpB, and HtpG in vivo. J Bacteriol 180:5165-5172
3. Kitagawa M, Matsumura Y, Tsuchido T (2000) Small heat shock proteins, IbpA and IbpB, are involved in resistances to heat and superoxide stresses in Escherichia coli. FEMS Microbiol Lett 184:165-171
4. Kuczyńska-Wiśnik D, Kędzierska S, Matuszewska E, Lund P, Taylor A, Lipińska B, Laskowska E (2002) The Escherichia coli small heat shock proteins IbpA and IbpB prevent the aggregation of endogenous proteins denatured in vivo during extreme heat shock. Microbiology 148:1757-1765
5. Carrió MM, Villaverde A (2003) Role of molecular chaperones in inclusion body formation. FEBS Lett 537:215-221
6. Stróżecka J, Chruściel E, Górna E, Szymańska A, Ziętkiewicz S, Liberek K (2012) Importance of N- and C-terminal regions of IbpA, Escherichia coli small heat shock protein, for chaperone function and oligomerization. J Biol Chem 287:2843-2853. doi:10.1074/jbc.M111.273847
7. Van Montfort R, Slingsby C, Vierling E (2001) Structure and function of the small heat shock protein/α-crystallin family of molecular chaperones. Adv Protein Chem 59:105-156
8. Allen SP, Polazzi JO, Gierse JK, Easton AM (1992) Two novel heat shock genes encoding proteins produced in response to heterologous protein expression in Escherichia coli. J Bacteriol 174:6938-6947
9. Lee GJ, Roseman AM, Saibil HR, Vierling E (1997) A small heat shock protein stably binds heat-denatured model substrates and can maintain a substrate in a folding-competent state. EMBO J 16:659-671
10. Jiao W, Qian M, Li P, Zhao L, Chang Z (2005) The essential role of the flexible termini in the temperature-responsiveness of the oligomeric state and chaperone-like activity for the polydisperse small heat shock protein IbpB from Escherichia coli. J Mol Biol 347:871-884
11. Kuczyńska-Wiśnik D, Kędzierska S, Matuszewska E, Lund P, Taylor A, Lipińska B, Laskowska E (2002) The Escherichia coli small heat shock proteins IbpA and IbpB prevent the aggregation of endogenous proteins denatured in vivo during extreme heat shock. Microbiology 148:1757-1765
12. Motohashi K, Watanabe Y, Yohda M, Yoshida M (1999) Heat-inactivated proteins are rescued by the DnaKJ-GrpE set and ClpB chaperones. PNAS 96(13):7184-7189. doi:10.1073/pnas.96.13.7184
13. Leinonen R, Diez FG, Binns D, Fleischmann W, Lopez R, Apweiler R (2004) UniProt archive. Bioinformatics 20(17):3236-3237. doi:10.1093/bioinformatics/bth191, PMID 15044231
14. Berman HM (2008) The Protein Data Bank: a historical perspective. Acta Crystallogr Sect A Found Crystallogr A64(1):88-95. doi:10.1107/S0108767307035623, PMID 18156675
1 Introduction
Multi-parameter patient monitoring (MPM) plays an important role in ensuring
quality health care in intensive care units (ICUs) and general wards, where it is used to continuously monitor patients' vital parameters, heart rate, blood pressure, respiratory rate
S. Premanand C. Santhosh Kumar
Machine Intelligence Research Laboratory, Department of Electronics
and Communication Engineering, Amrita Vishwa Vidyapeetham, Ettimadai 641112, India
A. Anand Kumar (&)
Department of Neurology, Amrita Institute of Medical Sciences, Cochin 682041, India
e-mail: [email protected]
The Author(s) 2015
N.B. Muppalaneni and V.K. Gunjan (eds.), Computational Intelligence
in Medical Informatics, Forensic and Medical Bioinformatics,
DOI 10.1007/978-981-287-260-9_5
and oxygen saturation (SpO2), and to alert staff when the condition of the patient
deteriorates. Studies show that vital parameters give evidence of physiological deterioration
even before the patient's condition becomes overtly abnormal, and acting on this evidence
improves mortality rates [1]. Current early warning score (EWS) systems [2] assign
a score to each vital sign based on clinical experience, depending on its deviation
from an assumed normal range. If the score for a vital parameter exceeds a
threshold, a clinical review of the patient is prompted. Manual scoring has a substantial
error rate, and EWS systems do not consider the intrinsic
relationship between the vital parameters that exists in healthy people [3–6].
Machine learning techniques can detect physiological deterioration
in a patient's health with higher accuracy than the EWS system [7]. MPM systems
that use support vector machine (SVM) classification have been found to
provide state-of-the-art performance [8, 9]. The vital parameter data gathered
from bedside monitors can be used to train an MPM system that
effectively predicts the condition of a patient under observation. One-class
SVMs [10], which use prior knowledge gathered from assumed normal
behaviour, were found to be effective in the development of MPMs. However, when
sufficient samples from the abnormal condition are available,
two-class SVMs [11] outperform one-class SVMs.
Studies of the vital parameters in healthy people suggest that there exists a
well-established intrinsic relationship [3–5] between the four vital parameters; for
example, when heart rate tends to increase, blood pressure is expected to decrease,
and vice versa.
In this work, we capture the correlation between human vital parameters, and
thereby the intrinsic relationship between them, in order to diagnose more
accurately and to achieve higher sensitivity, specificity and overall classification
accuracy. We experimented with the SVM algorithm [8] using linear, non-linear and
homogeneous kernels to construct a model, to verify the effectiveness of the
proposed approach and to enhance the operation of MPMs. The system
developed in the present work uses the MIMIC-II database to determine the trends in the vital parameters.
$$\sum_{i=1}^{M} \alpha_i \, k(x, sv_i)$$
where sv_i denotes the ith support vector, M is the number of support vectors and α is
the dual representation of the separating hyperplane's normal vector.
In this work, we experiment with basic kernels, linear, polynomial and RBF
kernels and the additive kernels [15] like intersection [16, 17], Chi-square [15] and
JS kernels [18], in the SVM backend used in the MPM system.
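1. Linear kernel: $K_{\mathrm{Linear}}(x, y) = x^{T} y$
2. Polynomial kernel: $K_{\mathrm{Polynomial}}(x, y) = (1 + x^{T} y)^{d}$ for any $d > 0$
3. RBF kernel: $K_{\mathrm{RBF}}(x, y) = \exp\left(-\frac{\|x - y\|^{2}}{2\sigma^{2}}\right)$
4. Intersection kernel: $K_{\mathrm{Intersection}}(x, y) = \sum_{i=1}^{N} \min(x_i, y_i)$
5. Chi-square kernel: $K_{\mathrm{Chi\text{-}square}}(x, y) = \sum_{i=1}^{N} \frac{x_i \, y_i}{x_i + y_i}$
6. JS kernel: $K_{\mathrm{JS}}(x, y) = \sum_{i=1}^{N} \left[ \frac{x_i}{2} \log_2 \frac{x_i + y_i}{x_i} + \frac{y_i}{2} \log_2 \frac{x_i + y_i}{y_i} \right]$
The kernel K(x, y) is interpreted as a measure of similarity [19] between the two
examples x and y. N is the dimension of the feature vector.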
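The six kernels can be sketched in plain Python as follows. This is an illustrative sketch, not the LIBSVM backend used in the experiments: the function names, the default values of d and sigma, and the chi-square form without a factor of 2 (some references include one) are assumptions. The additive kernels assume non-negative inputs, and zero-valued terms are skipped to avoid division by zero.

```python
import math

def linear(x, y):
    # Inner product x^T y.
    return sum(a * b for a, b in zip(x, y))

def polynomial(x, y, d=2):
    # (1 + x^T y)^d; d=2 is an assumed default.
    return (1.0 + linear(x, y)) ** d

def rbf(x, y, sigma=1.0):
    # exp(-||x - y||^2 / (2 sigma^2)); sigma=1 is an assumed default.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def intersection(x, y):
    # Sum of element-wise minima.
    return sum(min(a, b) for a, b in zip(x, y))

def chi_square(x, y):
    # Sum of x_i * y_i / (x_i + y_i), skipping terms where both are zero.
    return sum(a * b / (a + b) for a, b in zip(x, y) if a + b > 0)

def jensen_shannon(x, y):
    # Sum of (x_i/2) log2((x_i+y_i)/x_i) + (y_i/2) log2((x_i+y_i)/y_i).
    total = 0.0
    for a, b in zip(x, y):
        if a > 0:
            total += (a / 2.0) * math.log2((a + b) / a)
        if b > 0:
            total += (b / 2.0) * math.log2((a + b) / b)
    return total
```

For two identical vectors, the intersection and JS kernels both return the sum of the components, which is a quick sanity check on an implementation.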
parameters exposes the risk factor for problems like cardiovascular abnormalities,
hyperoxia (high oxygen content in the blood), hypoxia (low oxygen content in the
blood), etc. The baseline system feature set [6] is given by
$$F = \left\{ x_1, x_2, x_3, x_4, \sqrt{x_1 x_2}, \sqrt{x_1 x_3}, \sqrt{x_1 x_4}, \sqrt{x_2 x_3}, \sqrt{x_2 x_4}, \sqrt{x_3 x_4} \right\}$$
where x1, x2, x3 and x4 are the human vital parameters heart rate, blood pressure,
respiration rate and oxygen saturation (SpO2), respectively. The
remaining correlation features help to capture the intrinsic relationship between the
vital parameters, together with explicit feature maps for additive kernels
such as the intersection, Chi-square and JS kernels.
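A minimal sketch of the 10-feature baseline set: the four vital parameters followed by the geometric mean of each of the six pairs, in the order given above. The function name is hypothetical, and the inputs are assumed to be positive.

```python
import math
from itertools import combinations

def expand_features(vitals):
    """Return [x1, x2, x3, x4] plus sqrt(xi * xj) for every pair i < j."""
    pairwise = [math.sqrt(a * b) for a, b in combinations(vitals, 2)]
    return list(vitals) + pairwise
```

`itertools.combinations` yields the pairs in the order (x1, x2), (x1, x3), (x1, x4), (x2, x3), (x2, x4), (x3, x4), which matches the feature set F.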
5 Experimental Results
The MIMIC-II database consists of four vital parameters collected from 413
patients. Data from 12 patients were not suitable, and data from the remaining
401 patients are used for the experiment. From the 401 patients, 1,454,010
samples are separated into 1,100,510 samples (300 patients) as training data and
311,423 samples (101 patients) as testing data. Afterwards, the complete training and
testing sets are shuffled randomly, and 50,000 training and 20,000 testing samples are
drawn with the corresponding labels. The process is repeated to generate samples for seven independent
trials, and the results obtained from these trials are averaged to obtain the final
result. We used LIBSVM for all our experiments. Table 1 shows the performance
of the SVM MPM system using the four vital parameters. Table 2 shows the
performance with six additional correlation features, computed as the geometric
mean of each pair of parameters, which increases the
efficiency. Table 3 shows the effect of adding six further features,
computed as the ratio of the geometric mean to the arithmetic mean of each pair,
which gives better efficiency than the 10-feature system. The results show
that using the correlation features (16 features in total) along with the four vital parameters
enhances the MPM system performance compared to the baseline
system with ten features.
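The 16-feature set can be sketched as below, reading "the proportion of geometric mean to the arithmetic mean" as the ratio GM/AM of each pair. That reading, the function names and the feature ordering are assumptions made for illustration.

```python
import math
from itertools import combinations

def gm_am_ratio(a, b):
    """Geometric mean divided by arithmetic mean; equals 1 only when a == b."""
    return math.sqrt(a * b) / ((a + b) / 2.0)

def expand_to_16(vitals):
    """4 vital parameters + 6 pairwise geometric means + 6 pairwise GM/AM ratios."""
    pairs = list(combinations(vitals, 2))
    geometric = [math.sqrt(a * b) for a, b in pairs]
    ratios = [gm_am_ratio(a, b) for a, b in pairs]
    return list(vitals) + geometric + ratios
```

Because the vital parameters are positive, the ratio always lies in (0, 1], so these six features are bounded regardless of the units of the raw measurements.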
Table 1 Results (%) for 4 features using baseline kernels along with homogeneous kernels
Kernel        Overall accuracy  Sensitivity  Specificity
Linear        77.14             1.55         100
Polynomial    92.62             79.54        96.58
RBF           97.38             95.54        97.93
Intersection  99.61             98.36        99.99
Chi-squared   93.56             79.65        97.77
JS            92.76             77.59        97.35
Table 2 Results (%) for 10 features using baseline kernels along with homogeneous kernels
Kernel        Overall accuracy  Sensitivity  Specificity
Linear        91.94             77.09        96.43
Polynomial    95.06             86.48        97.72
RBF           96.82             95.86        97.67
Intersection  99.02             97.84        99.56
Chi-squared   93.08             80.09        97.16
JS            91.92             77.48        96.37
Table 3 Results (%) for 16 features using baseline kernels along with homogeneous kernels
Kernel        Overall accuracy  Sensitivity  Specificity
Linear        92.61             79.44        96.60
Polynomial    95.19             86.57        97.76
RBF           97.91             96.25        98.41
Intersection  99.87             98.39        99.97
Chi-squared   93.84             80.84        97.82
JS            92.95             78.61        97.29
6 Conclusion
We note that the values of the four vital parameters, heart rate, blood pressure,
respiration rate and oxygen saturation, are always positive quantities, and thus so are the
correlation features derived from these four vital parameters. In this work, we
proposed a novel approach to improve the performance of MPMs by taking advantage
of the intrinsic relationship between the vital parameters. It uses six additional
features, each the ratio of the geometric mean to the arithmetic mean of a
pair of vital parameters, making the total number of features in the
proposed system 16. In our experiments with this feature expansion for MPM,
the proposed system improves on the baseline. We measured the
baseline system and the proposed system using sensitivity, specificity and overall
classification accuracy.
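The three measures can be computed from true and predicted labels as in the sketch below. It assumes that label 1 marks the abnormal (positive) class and 0 the normal class; the function name and label convention are illustrative.

```python
def evaluate(y_true, y_pred):
    """Return (sensitivity, specificity, accuracy) for binary labels 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # true positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0  # true negative rate
    accuracy = (tp + tn) / len(y_true) if y_true else 0.0
    return sensitivity, specificity, accuracy
```

A classifier that labels almost everything as normal shows the pattern of the linear kernel in Table 1: specificity near 100 % with very low sensitivity, which is why accuracy alone is not reported.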
References
1. National Patient Safety Association and others (2007) Safer care for acutely ill patients:
learning from serious accidents. Technical report, NPSA
2. Tarassenko L, Clifton DA, Pinsky MR, Hravnak MT, Woods JR, Watkinson PJ (2011)
Centile-based early warning scores derived from statistical distributions of vital signs.
Resuscitation 82(8):1013–1018
3. Dornhorst AC, Howard P (1952) Respiratory variations in blood pressure. Circulation
6:553–558
4. Yasuma F, Hayano J (2004) Respiratory sinus arrhythmia: why does the heart beat
synchronize with respiratory rhythm? Chest 2:683–690
5. Beata G, Anna S et al (2013) Relationship between heart rate variability, blood pressure and
arterial wall properties during air and oxygen breathing in healthy subjects. Auton Neurosci
178(1–2):60–66
6. Vishnuprasad K, Santhosh Kumar C, Ramachandran KI, Vaijeyanthi V, Anand Kumar A
(2014) Towards building low cost multi-parameter patient monitors, ICC, 10 Apr 2014
7. Chan AB, Vasconcelos N, Moreno PJ (2004) A family of probabilistic kernels based on
information divergence, Statistical Visual Computing Laboratory, SVCL-TR 2004/01, June
2004
8. Vapnik VN (1999) The nature of statistical learning theory, 2nd edn. Springer, Berlin,
pp 237–240, 263–265, 291–299
9. Chang C-C, Lin C-J (2013) LIBSVM: a library for support vector machines. Initial version:
2001, last updated: March 2013, pp 553–558
10. Clifton L, Clifton DA, Watkinson PJ, Tarassenko L (2011) Identification of patient deterioration
in vital-sign data using one-class support vector machines. In: IEEE federated conference
computer science and information systems (FedCSIS), pp 125–131
11. Khalid S, Clifton DA, Clifton L, Tarassenko L (2012) A two-class approach to the detection of
physiological deterioration in patient vital signs, with clinical label refinement. IEEE Trans Inf
Technol Biomed 16(6):1231
12. Lee J, Scott DJ et al (2011) Open-access MIMIC-II database for intensive care research. In:
33rd annual international conference of the IEEE EMBS
13. MIMIC II Database. www.physionet.org/physiobank/database/mimic2db/
14. Ruiz-Llata M, Guarnizo G, Yebenes-Calvino M (2010) FPGA implementation of a support
vector machine for classification and regression. In: The 2010 international joint conference on
IEEE neural networks (IJCNN), pp 1–5
15. Vedaldi A, Zisserman A (2011) Efficient additive kernels via explicit feature maps. IEEE
Trans Pattern Anal Mach Intell, 1–14
16. Barla A, Odone F, Verri A (2003) Histogram intersection kernel for image classification. In:
Proceedings of ICIP
17. Sharma G, Jurie F (2013) A novel approach for efficient SVM classification with histogram
intersection kernel. Oral presentation at the British Machine Vision Conference (BMVC),
pp 1–11
18. Callut J, Dupont P, Saerens M (2011) Sequence classification in the Jensen-Shannon
embedding. In: International conference on machine learning
19. Schölkopf B, Smola AJ (2001) Learning with kernels: support vector machines, regularization,
optimization and beyond. MIT Press, Cambridge
Abstract Malaria is a major tropical public health problem in the world.
The diagnosis of this type of tropical disease involves several levels of uncertainty
and imprecision. It causes severe infection to the brain and prevents the brain from
functioning properly. Hence early detection of malaria is essential. Soft
computing techniques provide excellent methodologies to process medical
data and help medical experts determine the nature of an illness and take
decisions. True data set collection, feature squeezing, and classification are the basic
steps followed in designing an expert system. The designed expert system acts with
intelligence, prevents erroneous decisions, and produces sharp results in time. This
paper discusses malaria investigation with missing data using a rough set rule-based soft computing technique.
Keywords Accuracy · Rule set
1 Introduction
In India, malaria constitutes a great threat to the health of many communities. The
harmful effects of malaria parasites on the human body cannot be underestimated.
Malaria is a parasitic disease caused by Plasmodium species and transmitted mainly by Anopheles mosquitoes.
B.S. Panda (&)
MITS, Rayagada, Odisha, India
e-mail: [email protected]
S.S. Gantayat
GMRIT, Rajam, Andhra Pradesh, India
e-mail: [email protected]
A. Misra
CUTM, Paralakhemundi, Odisha, India
e-mail: [email protected]
The Author(s) 2015
N.B. Muppalaneni and V.K. Gunjan (eds.), Computational Intelligence
in Medical Informatics, Forensic and Medical Bioinformatics,
DOI 10.1007/978-981-287-260-9_6
2 Literature Review
Missing data are questions without answers or variables without observations. Even
a small percentage of missing data can cause serious problems with the analysis,
leading to wrong conclusions and imperfect knowledge. Many
techniques have been developed in the literature to manipulate knowledge with uncertainty
and to manage data with incomplete items, but their results are not of the
same type and no technique is absolutely better than the others [4–6].
To handle such problems, researchers have tried different approaches,
each proposing its own way to handle the information system. Attribute
values are important for information processing in a data set or information
table. In the field of databases, various efforts have been made to improve
and enhance the query process used to retrieve data from a database or information table. The
methodologies followed include fuzzy sets [7, 8], rough sets
[9, 10], Boolean logic, possibility theory, statistical similarity [11], etc.
3 Rough Sets
3.1 Definition and Notations
The concept of a rough set is another approach to dealing with imperfect knowledge. It
was introduced by Pawlak in 1982 [10]. From a philosophical point of view, rough
set theory is a new approach to vagueness and uncertainty, and from a practical
point of view, it is a new method of data analysis.
This method has the following important advantages:
We define
$$\underline{R}A = \bigcup \{ Y \in U/R : Y \subseteq A \}$$
and
$$\overline{R}A = \bigcup \{ Y \in U/R : Y \cap A \neq \emptyset \}.$$
$\underline{R}A$ and $\overline{R}A$ are respectively called the R-lower and R-upper approximations of
A with respect to R.
It can be noted that
$$\underline{R}A = \{ x \in U : [x]_R \subseteq A \}$$
and
$$\overline{R}A = \{ x \in U : [x]_R \cap A \neq \emptyset \}.$$
The set $BN_R(A) = \overline{R}A - \underline{R}A$ is called the R-boundary of A. The set $\underline{R}A$ consists of
all those elements of U which can with certainty be classified as elements of A,
employing the knowledge R. The set $\overline{R}A$ consists of all those elements of U which
can possibly be classified as elements of A, employing the knowledge R. The set
$BN_R(A)$ is the set of elements which cannot be classified as either belonging to A or
belonging to the complement of A, having the knowledge R. We say that a set A is R-definable if and
only if $\underline{R}A = \overline{R}A$. Otherwise A is said to be R-rough.
The borderline region is the undecidable area of the universe. We say that X is
rough with respect to R if and only if $\underline{R}X \neq \overline{R}X$,
equivalently $BN_R(X) \neq \emptyset$. X is
said to be R-definable if and only if $\underline{R}X = \overline{R}X$,
or $BN_R(X) = \emptyset$.
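The approximations can be computed directly from the partition U/R. The sketch below assumes the equivalence classes are given as Python sets; the function names are hypothetical.

```python
def lower_approximation(partition, target):
    """Union of the equivalence classes fully contained in the target set."""
    result = set()
    for block in partition:
        if block <= target:
            result |= block
    return result

def upper_approximation(partition, target):
    """Union of the equivalence classes that intersect the target set."""
    result = set()
    for block in partition:
        if block & target:
            result |= block
    return result

def boundary(partition, target):
    """R-boundary: upper approximation minus lower approximation."""
    return upper_approximation(partition, target) - lower_approximation(partition, target)

def is_definable(partition, target):
    """A set is R-definable exactly when its boundary is empty."""
    return not boundary(partition, target)
```

For example, with U/R = {{1, 2}, {3, 4}, {5}} and A = {1, 2, 3}, the lower approximation is {1, 2}, the upper approximation is {1, 2, 3, 4}, and the boundary {3, 4} shows that A is R-rough.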
5 Proposed System
The proposed system for malaria diagnosis uses a rough set rule-based system. The
steps of the proposed system are given below and in Fig. 2.
Algorithm:
Step 1: Malaria Data Set. Collect the malaria data set, which includes the relevant
objects and attributes.
Step 2: Divide Data Set. Divide the malaria data set into training and testing sets
using a split of 60 % and 40 %.
Step 3: Cut Set. Form the cut set for the training set. It represents
non-overlapping subsets and thus divides the attributes into a set of intervals.
Cuts act as the boundary values of those intervals.
Case#  Temperature  Headache  Nausea    Vomiting  Joint_Pain  Body_Weakness  Malaria
C01    Mild         Mild      Mild      Mild      Mild        ?              No
C02    Moderate     Mild      Mild      ?         Moderate    Moderate       Yes
C03    Severe       Moderate  Mild      Mild      Mild        Severe         Yes
C04    Very severe  Mild      ?         Mild      Severe      Severe         Yes
C05    Moderate     Mild      Mild      Moderate  Moderate    ?              No
C06    Mild         Moderate  Moderate  Mild      ?           Mild           No
C07    Mild         Mild      Moderate  Moderate  Severe      Severe         Yes
C08    Moderate     ?         ?         Moderate  Moderate    Moderate       Yes
C09    Moderate     Mild      Moderate  Moderate  Moderate    Moderate       Yes
C10    Mild         Mild      Moderate  Moderate  ?           Mild           No
C11    ?            Severe    Severe    Severe    Severe      Very severe    Yes
C12    Moderate     Severe    Moderate  Severe    Moderate    Severe         Yes
C13    Mild         Moderate  Moderate  Moderate  ?           Moderate       No
C14    Severe       Severe    Moderate  Severe    Severe      Severe         Yes
C15    Mild         Mild      Mild      Moderate  Mild        Severe         No
Step 4: Rule Set. Rules are formed from the attributes by combining common attribute values
through logical AND. Each rule also states the result class to which it
points.
Step 5: Substitute Common Attribute. In one of the simplest methods to handle
missing attribute values, such values are replaced by the most common value
of the attribute. In other words, a missing attribute value is replaced by the
most probable known attribute value, where such probabilities are
represented by the relative frequencies of the corresponding attribute values.
Note: The rules generated here can be extended with the logical operators OR and
NOT for a large dataset.
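Step 5 in its simplest form can be sketched as follows. The sketch assumes each case is a dictionary of attribute values with "?" marking a missing value; the function name and the toy rows are hypothetical, and ties between equally frequent values are resolved by first occurrence, which is a choice of this sketch rather than something stated in the text.

```python
from collections import Counter

MISSING = "?"

def impute_most_common(rows, attributes):
    """Replace each missing value by the most frequent known value of that attribute."""
    filled = [dict(row) for row in rows]  # copy so the input is left untouched
    for attr in attributes:
        known = [row[attr] for row in rows if row[attr] != MISSING]
        if not known:
            continue  # nothing to learn from; leave the column as it is
        most_common_value = Counter(known).most_common(1)[0][0]
        for row in filled:
            if row[attr] == MISSING:
                row[attr] = most_common_value
    return filled

# Hypothetical toy rows, not the malaria data set.
rows = [
    {"temperature": "Mild", "headache": "Severe"},
    {"temperature": "?", "headache": "Mild"},
    {"temperature": "Mild", "headache": "?"},
    {"temperature": "Severe", "headache": "Mild"},
]
completed = impute_most_common(rows, ["temperature", "headache"])
```

The relative frequency of each value plays the role of its probability, so choosing the most frequent value is the same as choosing the most probable known value.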
Case#  Temperature  Headache  Nausea    Vomiting  Joint_Pain  Body_Weakness  Malaria
C01    Mild         Mild      Mild      Mild      Mild        ?              No
C02    Moderate     Mild      Mild      ?         Moderate    Moderate       Yes
C04    Very severe  Mild      ?         Mild      Severe      Severe         Yes
C05    Moderate     Mild      Mild      Moderate  Moderate    ?              No
C06    Mild         Moderate  Moderate  Mild      ?           Mild           No
C08    Moderate     ?         ?         Moderate  Moderate    Moderate       Yes
C10    Mild         Mild      Moderate  Moderate  ?           Mild           No
C11    ?            Severe    Severe    Severe    Severe      Very severe    Yes
C13    Mild         Moderate  Moderate  Moderate  ?           Moderate       No
The cut set states the result attribute in terms of the other attributes, which is a feature
reduction based on Boolean logic.
$$\mathrm{Coverage}(R) = \frac{n_{\mathrm{covers}}}{|D|}, \qquad \mathrm{Accuracy}(R) = \frac{n_{\mathrm{correct}}}{n_{\mathrm{covers}}}$$
These formulas are determined with the aid of both expert doctors in the field of
tropical medicine and the literature. Some of the rules (Rule 1, Rule 2, Rule 5, Rule
8, Rule 15) can be interpreted as follows:
Rule 1:
IF temperature = mild and headache = mild and nausea = mild and
vomiting = mild and joint pain = mild and body weakness = ? THEN
malaria = No.
Rule 2:
IF temperature = moderate and headache = mild and nausea = mild
and vomiting = ? and joint pain = moderate and body weakness = moderate THEN malaria = Yes.
Rule 3:
IF temperature = severe and headache = moderate and nausea = mild
and vomiting = mild and joint pain = mild and body weakness = severe
THEN malaria = Yes.
Rule 5:
Rule 8:
Rule 15:
The notion of rough sets [15–17] is used to classify patients as having malaria or
not, and also to group the other attributes that give similar information in the
data set. The other rules are not listed here, to avoid complexity and
confusion with the given rules.
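An IF-THEN rule of this kind, together with its coverage and accuracy, can be sketched as below. The sketch assumes that n_covers is the number of cases matching every condition of the rule, n_correct is the number of those cases whose decision equals the rule's conclusion, and |D| is the number of cases in the data set. The dictionary layout, the names and the toy cases are hypothetical.

```python
def matches(rule, case):
    """True when the case satisfies every condition joined by logical AND."""
    return all(case.get(attr) == value for attr, value in rule["if"].items())

def coverage_and_accuracy(rule, cases, decision_attr="malaria"):
    """Coverage = n_covers / |D|; accuracy = n_correct / n_covers."""
    covered = [case for case in cases if matches(rule, case)]
    n_covers = len(covered)
    n_correct = sum(1 for case in covered if case[decision_attr] == rule["then"])
    coverage = n_covers / len(cases) if cases else 0.0
    accuracy = n_correct / n_covers if n_covers else 0.0
    return coverage, accuracy

# Hypothetical toy cases, not the malaria data set.
cases = [
    {"temperature": "mild", "headache": "mild", "malaria": "No"},
    {"temperature": "mild", "headache": "mild", "malaria": "Yes"},
    {"temperature": "severe", "headache": "moderate", "malaria": "Yes"},
    {"temperature": "moderate", "headache": "mild", "malaria": "Yes"},
]
rule = {"if": {"temperature": "mild", "headache": "mild"}, "then": "No"}
coverage, accuracy = coverage_and_accuracy(rule, cases)
```

Here the rule covers two of the four cases and is right for one of them, so both coverage and accuracy are 0.5.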
Case#  Common attribute  Actual attribute  Diagnosis
C01    Mild              Mild              No
C02    Moderate          Mild              Yes
C04    Severe            Mild              Yes
C05    Moderate          Moderate          No
C06    Mild              Mild              No
C08    Moderate          Moderate          Yes
C10    Mild              Mild              No
C11    Severe            Severe            Yes
C13    Moderate          Mild              No
Case#  Temperature  Headache  Nausea    Vomiting  Joint_Pain  Body_Weakness  Diagnosis
C01    Mild         Mild      Mild      Mild      Mild        Mild           No
C02    Moderate     Mild      Mild      Moderate  Moderate    Moderate       Yes
C03    Severe       Moderate  Mild      Mild      Mild        Severe         Yes
C04    Very severe  Mild      Severe    Mild      Severe      Severe         Yes
C05    Moderate     Mild      Mild      Moderate  Moderate    Moderate       No
C06    Mild         Moderate  Moderate  Mild      Mild        Mild           No
C07    Mild         Mild      Moderate  Moderate  Severe      Severe         Yes
C08    Moderate     Moderate  Moderate  Moderate  Moderate    Moderate       Yes
C09    Moderate     Mild      Moderate  Moderate  Moderate    Moderate       Yes
C10    Mild         Mild      Moderate  Moderate  Mild        Mild           No
C11    Severe       Severe    Severe    Severe    Severe      Very severe    Yes
C12    Moderate     Severe    Moderate  Severe    Moderate    Severe         Yes
C13    Mild         Moderate  Moderate  Moderate  Moderate    Moderate       No
C14    Severe       Severe    Moderate  Severe    Severe      Severe         Yes
C15    Mild         Mild      Mild      Moderate  Mild        Severe         No
The bold words in the above table show the approximated values that replace
the missing data in Table 4.
After replacing the missing data with proper values, we get the following classes
from the above table.
Temperature = {{C01, C06, C07, C10, C13, C15}, {C02, C05, C08, C09, C12},
{C03, C11, C14}, {C04}}
Headache = {{C01, C02, C04, C05, C07, C09, C10, C15}, {C03, C06, C08,
C13}, {C11, C12, C14}}
Nausea = {{C01, C02, C03, C05, C15}, {C06, C07, C08, C09, C10, C12, C13,
C14}, {C04, C11}}
Vomiting = {{C01, C03, C04, C06}, {C02, C05, C07, C08, C09, C10, C13,
C15}, {C11, C12, C14}}
Joint_Pain = {{C01, C03, C06, C10, C15}, {C02, C05, C08, C09, C12, C13},
{C04, C07, C11, C14}}
Body_Weakness = {{C02, C05, C08, C09, C13}, {C03, C04, C07, C12, C14,
C15}, {C01, C06, C10}, {C11}}
Malaria = {{C01, C05, C06, C10, C13, C15}, {C02, C03, C04, C07, C08, C09,
C11, C12, C14}}
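The classes listed above are the groups of cases that share the same value of an attribute. A minimal sketch of that grouping, using the completed Temperature column as input (the function name is hypothetical):

```python
from collections import defaultdict

def equivalence_classes(column):
    """Group case identifiers by attribute value."""
    classes = defaultdict(set)
    for case_id, value in column.items():
        classes[value].add(case_id)
    return dict(classes)

# Temperature values after imputation, one entry per case.
temperature = {
    "C01": "Mild", "C02": "Moderate", "C03": "Severe", "C04": "Very severe",
    "C05": "Moderate", "C06": "Mild", "C07": "Mild", "C08": "Moderate",
    "C09": "Moderate", "C10": "Mild", "C11": "Severe", "C12": "Moderate",
    "C13": "Mild", "C14": "Severe", "C15": "Mild",
}
temperature_classes = equivalence_classes(temperature)
```

Before imputation the value "?" would form a class of its own, which is why every attribute has one class fewer once the missing values are filled in.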
After the imputation, the classes of the individual attributes are reassigned
using the newly added data.
From the above table, it is clear that the number of classes generated for decision
making is reduced after imputing the missing values using the rule generation
method (Table 5).
The analysis of the classification shows the following result.
Attribute      Classes before imputation (A)  Classes after imputation (B)  No. of changes in classes |(A-B)|
Temperature    5                              4                             1
Headache       4                              3                             1
Nausea         4                              3                             1
Vomiting       4                              3                             1
Joint_Pain     4                              3                             1
Body_Weakness  5                              4                             1
9 Conclusion
Malaria investigation using the rough set rule-based soft computing technique shows
good imputation and coverage. It has handled 15 cases and 6 attributes. The
method handles attributes even in the presence of missing values. In future, the same
method can be used to investigate other diseases by replacing the missing values with the
common attribute value. This approach can be combined with other classifiers to
further improve the recovery of missing data in a data set or information table. The
technique can also be enhanced by using rough logic, which is a future direction.
References
1. Uzoka FME, Osuji J, Obot O (2010) Clinical decision support system (DSS) in the diagnosis
of malaria: a case comparison of two soft computing methodologies. Expert Syst Appl
38:1537–1553
2. Szolovits P, Patil RS, Schwartz WB (1988) Artificial intelligence in medical diagnosis. J Intern
Med 108:80–87
3. Szolovits P (1995) Uncertainty and decision in medical informatics. Methods Inf Med
34:111–121
4. Little RJ, Rubin DB (2002) Statistical analysis with missing data, 2nd edn. Wiley, New York
5. Kantardzic M (2003) Data mining: concepts, models, methods and algorithms. Wiley, New
York
6. Gantayat SS, Misra A, Panda BS (2013) A study of incomplete data: a review. In: LNCS
Springer FICTA-2013, pp 401–408. ISBN: 978-3-319-02930-6
7. Zadeh LA (1965) Fuzzy sets. Inf Control 8:338–353
8. Zadeh LA (1973) Outline of a new approach to the analysis of complex system and decision
processes. IEEE Trans Syst Man Cybern 3:28–44
9. Grzymala-Busse J (1988) LERS: a system for learning from examples based on rough sets.
J Intell Rob Syst 1:3–16
10. Pawlak Z (1982) Rough sets. J Inf Comp Sci 11:341–356
11. Devlin H, Devlin JK (2007) Decision support system in patient diagnosis and treatment.
Future Rheumatol 2:261–263
12. Panda BS, Abhishek R, Gantayat SS (2012) Uncertainty classification of expert systems: a
rough set approach. In: ISCON proceedings with IJCA. ISBN: 973-93-80867-87-0
13. Grzymala-Busse J (1988) Knowledge acquisition under uncertainty: a rough set approach.
J Intell Rob Syst 1:3–16
14. Panda BS, Gantayat SS, Misra A (2013) Rough set approach to development of a knowledge-based expert system. Int J Adv Res Sci Technol (IJARST) 2(2):74–78. ISSN: 2319-1783
15. Pawlak Z (1991) Rough sets: theoretical aspects of reasoning about data. Kluwer Academic
Publishing, Boston
16. Pawlak Z, Skowron A (2007) Rough sets: some extensions. Inf Sci 177(1):28–40
17. Pawlak Z (1996) Why rough sets, fuzzy systems. In: Proceedings of the fifth IEEE international
conference, vol 2
1 Introduction
The oesophageal cancer rate in India is one of the highest in the world. Although low
and stable incidence and mortality rates from colorectal cancers were observed in
India [1], these rates were associated with a low 5-year relative survival rate [2].
This low survival rate suggests severe deficiencies in early diagnosis and effective
treatment in India. Since, as noted in [2], population-based screening of colorectal
V.B. Surya Prasath (&)
University of Missouri-Columbia, Columbia, MO 65211, USA
e-mail: [email protected]
R. Delhibabu
Cognitive Modeling Lab, IT University Innopolis, Kazan, Russia
e-mail: [email protected]
R. Delhibabu
Department of CSE, SSN Engineering College, Chennai, India
R. Delhibabu
Machine cognition lab, Kazan Federal University, Kazan, Russia
The Author(s) 2015
N.B. Muppalaneni and V.K. Gunjan (eds.), Computational Intelligence
in Medical Informatics, Forensic and Medical Bioinformatics,
DOI 10.1007/978-981-287-260-9_7
Fig. 1 Wireless capsule endoscopy: a The patient swallows the capsule and capsule travels
through the tubular intestinal path. The imaging is done through a circular lens and at periodic
intervals. b Pillcam Capsule parts (used with permission from). 1 Optical dome. 2 Lens holder. 3
Lens. 4 Illuminating LEDs. 5 CMOS imager. 6 Battery. 7 ASIC transmitter. 8 Antenna
cancer is not cost-effective given the low burden of colorectal cancer, early
diagnosis and adequate treatment using imaging techniques are important.
Video capsule endoscopy (VCE), introduced at the turn of this millennium [3], paved the way for a painless,
non-invasive way to image the gastrointestinal tract, see Fig. 1. A wireless capsule
consisting of a tiny imaging device provides a continuous video stream of the inner
mucosa-lumen tubular structure. Typically, a VCE exam for each patient
consists of 8 h of video, which is around 55,000 frames. This puts an enormous
burden on gastroenterologists, since examining and reviewing the video from
capsule endoscopy requires concentration for long durations. Thus, automatic
computer-aided diagnosis (CAD) methods are required which can help in analysis and diagnosis [4–6], see [7]
for a recent review. Computerized video analysis algorithms can reduce the time
required to review VCE exams and can augment the decision making processes.
Lumen detection in other imaging modalities has been studied by some
researchers. In [8, 9], colonoscopy images are considered and adaptive thresholding
type techniques were utilized. The wireless capsule optical system usually consists
of flashing LEDs, as opposed to fixed lighting in colonoscopy, and an imaging
sensor which captures the images at predefined time intervals. Thus, thresholding
type methods may not capture strong intensity variations between frames. In this
work, we study an automatic image segmentation method for VCE imagery to
segment lumen boundaries without any manual supervision. For this purpose we
start with the well-known image segmentation method based on active contours
without edges (ACWE) [10]. This method involves level sets, which are known to handle
topological changes, and thus is suitable for the mucosa deformations which occur in
VCE data [11]. Though efficient, the traditional implementation was not fast enough for
real-time imaging and takes 0.32 s/frame (of size 512 × 512). To
speed up the active contour segmentation further, in this paper we propose a fast
and efficient implementation of the active contour model which obtains good
segmentations. Compared with some of the classical and contemporary image
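$$\min_{\phi, c_1, c_2} \mathcal{E}(\phi, c_1, c_2) = \mu \int_{\Omega} \delta(\phi) |\nabla \phi| \, dx + \lambda_1 \int_{\Omega} |u - c_1|^2 H(\phi) \, dx + \lambda_2 \int_{\Omega} |u - c_2|^2 (1 - H(\phi)) \, dx \qquad (1)$$
where $\lambda_1, \lambda_2 > 0$ and $\mu \geq 0$ are given fixed parameters, and $c_1$, $c_2$ represent the average
intensities of u in the regions inside and outside the zero level set of $\phi$ respectively.
The function $H(z) := 1$ if $z \geq 0$, $H(z) := 0$ if $z < 0$ is the Heaviside function, and
$\delta(z) := \frac{d}{dz} H(z)$ is the Dirac delta function
in the sense of distributions. In [10], the Euler-Lagrange equation of the functional (1),
which is a necessary condition for a minimizer triplet $(\phi, c_1, c_2)$ to satisfy, is
implemented. That is, the following nonlinear PDE is solved for $\phi$ and $c_1$, $c_2$:
$$\frac{\partial \phi}{\partial t} = \delta_\epsilon(\phi) \left[ \mu \, \mathrm{div}\left( \frac{\nabla \phi}{|\nabla \phi|} \right) - \lambda_1 (u - c_1)^2 + \lambda_2 (u - c_2)^2 \right] \qquad (2)$$
with
$$c_1 = \frac{\int_{\Omega} u \, H_\epsilon(\phi) \, dx}{\int_{\Omega} H_\epsilon(\phi) \, dx}, \qquad c_2 = \frac{\int_{\Omega} u \, (1 - H_\epsilon(\phi)) \, dx}{\int_{\Omega} (1 - H_\epsilon(\phi)) \, dx} \qquad (3)$$
Here, $\delta_\epsilon$ and $H_\epsilon$ represent the regularized versions of the Dirac delta and Heaviside
functions respectively.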
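Then the algorithm for solving the ACWE PDE (2) is an alternating iterative
scheme:
1. Fix $\mu$, $\lambda_1$, $\lambda_2$, $\epsilon$. Knowing $\phi^k$, compute $c_1(\phi^k)$, $c_2(\phi^k)$ from (3).
2. Compute $\phi^{k+1}$ by the following discretization and linearization of (2) in $\phi$:
$$\frac{\phi^{k+1}_{i,j} - \phi^{k}_{i,j}}{\Delta t} = \delta_h(\phi^k_{i,j}) \left[ \frac{\mu}{h^2} \Delta^x \left( \frac{\Delta^x \phi^{k+1}_{i,j}}{\sqrt{(\Delta^x \phi^k_{i,j})^2 / h^2 + (\phi^k_{i,j+1} - \phi^k_{i,j-1})^2 / (2h)^2}} \right) + \frac{\mu}{h^2} \Delta^y \left( \frac{\Delta^y \phi^{k+1}_{i,j}}{\sqrt{(\phi^k_{i+1,j} - \phi^k_{i-1,j})^2 / (2h)^2 + (\Delta^y \phi^k_{i,j})^2 / h^2}} \right) - \lambda_1 \left( u_{i,j} - c_1(\phi^k) \right)^2 + \lambda_2 \left( u_{i,j} - c_2(\phi^k) \right)^2 \right] \qquad (4)$$
The modified finite difference scheme (4) involves only first order discrete
derivatives, and the theoretical convergence of the above discretization will be reported
elsewhere.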
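The alternating structure (update c1 and c2, then update the level set, then check a tolerance) can be sketched in plain Python as below. This is a simplified explicit scheme for illustration, not the semi-implicit discretization (4) or the authors' fast C/mex implementation. The arctan-regularized Heaviside function, ε = 1, central-difference curvature with replicated borders, and the toy 12 × 12 image are all assumptions of this sketch.

```python
import math

def heaviside(z, eps):
    # Regularized Heaviside H_eps (arctan form, an assumed choice).
    return 0.5 * (1.0 + (2.0 / math.pi) * math.atan(z / eps))

def delta(z, eps):
    # Derivative of the regularized Heaviside above.
    return (eps / math.pi) / (eps * eps + z * z)

def region_means(u, phi, eps):
    # c1, c2 as in Eq. (3): H-weighted means of u inside and outside.
    n1 = d1 = n2 = d2 = 0.0
    for row_u, row_p in zip(u, phi):
        for uv, pv in zip(row_u, row_p):
            h = heaviside(pv, eps)
            n1 += uv * h
            d1 += h
            n2 += uv * (1.0 - h)
            d2 += 1.0 - h
    return n1 / (d1 + 1e-12), n2 / (d2 + 1e-12)

def curvature(phi, i, j):
    # div(grad phi / |grad phi|) by central differences, borders replicated.
    rows, cols = len(phi), len(phi[0])
    def p(a, b):
        return phi[min(max(a, 0), rows - 1)][min(max(b, 0), cols - 1)]
    px = (p(i, j + 1) - p(i, j - 1)) / 2.0
    py = (p(i + 1, j) - p(i - 1, j)) / 2.0
    pxx = p(i, j + 1) - 2.0 * p(i, j) + p(i, j - 1)
    pyy = p(i + 1, j) - 2.0 * p(i, j) + p(i - 1, j)
    pxy = (p(i + 1, j + 1) - p(i + 1, j - 1)
           - p(i - 1, j + 1) + p(i - 1, j - 1)) / 4.0
    g2 = px * px + py * py
    return (pxx * py * py - 2.0 * px * py * pxy + pyy * px * px) / (g2 + 1e-8) ** 1.5

def acwe(u, phi, mu=0.2, lam1=1.0, lam2=1.0, dt=0.5, eps=1.0,
         max_iter=40, tol=1e-6):
    # Alternate between the region means and an explicit step of PDE (2).
    c1 = c2 = 0.0
    for _ in range(max_iter):
        c1, c2 = region_means(u, phi, eps)
        new_phi = [[phi[i][j] + dt * delta(phi[i][j], eps) * (
                        mu * curvature(phi, i, j)
                        - lam1 * (u[i][j] - c1) ** 2
                        + lam2 * (u[i][j] - c2) ** 2)
                    for j in range(len(phi[0]))] for i in range(len(phi))]
        change = max(abs(a - b) for ra, rb in zip(new_phi, phi)
                     for a, b in zip(ra, rb))
        phi = new_phi
        if change < tol:  # tolerance check instead of a fixed iteration count
            break
    return phi, c1, c2

# Toy image: a bright 4 x 4 square on a dark background, cone-shaped initial phi.
size = 12
image = [[1.0 if 4 <= i <= 7 and 4 <= j <= 7 else 0.0 for j in range(size)]
         for i in range(size)]
phi0 = [[3.0 - math.hypot(i - 5.5, j - 5.5) for j in range(size)]
        for i in range(size)]
phi, c1, c2 = acwe(image, phi0)
```

Pixels with positive level set values are taken as the inside region. An explicit step like this one is simple to write but is limited by the time step, which is one reason semi-implicit schemes such as (4) are used in practice.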
3 Experimental Results
3.1 Setup and Parameters
The core operations of the schemes are implemented in C with a mex interface, and
MATLAB R2012a is used for visualization purposes. Our fast scheme takes about
0.1 s (40 iterations) for a 512 × 512 image, compared to 0.32 s in a previous work
[11]. This means that on average the segmentation can run at 10 frames per second
(fps), making it very attractive for endoscopic image analysis and diagnosis.
Further code optimization via GPU may significantly speed up the main
loop of the program. Moreover, instead of having a fixed number of iterations for
segmentation, we devised a tolerance check to stop the scheme at convergence. For
example, with the tolerance $|\phi^{k+1} - \phi^k| < 10^{-6}$, we observed convergence of the iterative
scheme (4) in less than 40 iterations for most of the images. Thus, our scheme takes
an average runtime of 0.1 s/frame using non-optimized MATLAB code on a 2.3 GHz
Intel Core i7, 8 GB RAM laptop to obtain a segmentation of lumen and mucosa.
The ACWE parameters $\lambda_1 = \lambda_2 = 1$ and $\epsilon = 10^{-6}$ are fixed for all the experiments reported here. This gives equal weight to both the inside and outside regions of
the zero level set. The parameter $\mu$ in Eq. (4) controls how regular the final active
contour is, and hence affects the final segmentation in the experiments. After
conducting extensive experiments with respect to this parameter, we fixed it at
$\mu = 0.2$, which seems to work well for most of the images in VCE data. The time
step parameter $\Delta t = 0.5$ and the step size for the finite difference grid $h = 1$ are fixed
throughout all the experiments.
Fig. 2 Endoscopic image mucosa segmentation. a Input image and initialization of the modified
ACWE scheme, see Sect. 2.2. b–d Intermediate segmentation results at time t = 20, 40 and 60,
curve laid on top of the original image for visualization. e Final segmentation at T = 80. The
segmentation algorithm usually converges within t = 40
Fig. 3 Comparison with other segmentation schemes: a Otsu thresholding based scheme [8].
b Adaptive progressive thresholding scheme [9]. c Mean shift [17]. d ACWE [10]. e Texture-ACWE [18]. f Global minimization model [19] with Chan-Vese energy. g Our fast active contour.
h Expert (gastroenterologist) segmentation. Dice values (κ) are given for each result and our fast
implementation obtained better agreement with ground-truth (GT) boundaries
Table 1
Scheme                                       Ref.  Figure
(a) Otsu thresholding based scheme (OT)      [8]   Fig. 3a
(b) Adaptive progressive thresholding (APT)  [9]   Fig. 3b
(c) Mean shift (MS)                          [17]  Fig. 3c
(d) Chan-Vese scheme (ACWE)                  [10]  Fig. 3d
(e) Texture based active contour (TAC)       [18]  Fig. 3e
(f) Fast global minimization (FGM)           [19]  Fig. 3f
Table 2 Dice similarity coefficient (κ, Eq. (5)) values for automatic segmentations when
compared with manual ground truth for different schemes in Fig. 3
Sub-figure  Ref.  κ
(a)         [8]   0.6592
(b)         [9]   0.4987
(c)         [17]  0.8202
(d)         [10]  0.8182
(e)         [18]  0.3017
(f)         [19]  0.7444
Our result        0.8793
results of other schemes such as APT and ACWE give inaccurate segmentations of the
mucosa folds. We use the Dice similarity coefficient,
\[ \mathrm{Dice}(A, B) = \frac{2\,|A \cap B|}{|A| + |B|}, \tag{5} \]
where A and B are the automatic and manual segmentations respectively, and |X|
denotes the total number of pixels in a set X. Dice values lie in [0, 1], with values
closer to one indicating better overlap between manual and automatic segmentations.
Table 2 provides Dice values for the different schemes corresponding to Fig. 3. As can
be seen, our method obtains the highest Dice value, which confirms the visual comparison
with the GT image. Similar active contour methods utilized in other imaging
modalities [14–16] can also be adapted for VCE imagery, which defines our future
work in this direction.
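The Dice overlap defined above can be illustrated with a minimal sketch. It assumes the automatic and manual segmentations are available as binary masks stored as nested lists of 0/1 values; the function name, the toy masks and the convention for two empty masks are illustrative assumptions, not part of the original implementation.

```python
def dice_coefficient(auto_mask, manual_mask):
    """Dice overlap 2|A n B| / (|A| + |B|) for two binary masks of equal shape."""
    if len(auto_mask) != len(manual_mask):
        raise ValueError("masks must have the same number of rows")
    intersection = size_a = size_b = 0
    for row_a, row_b in zip(auto_mask, manual_mask):
        if len(row_a) != len(row_b):
            raise ValueError("masks must have the same number of columns")
        for a, b in zip(row_a, row_b):
            a, b = bool(a), bool(b)
            size_a += a                # |A|: foreground pixels in the automatic mask
            size_b += b                # |B|: foreground pixels in the manual mask
            intersection += a and b    # |A n B|: pixels marked in both masks
    if size_a + size_b == 0:
        return 1.0  # assumption made here: two empty masks agree perfectly
    return 2.0 * intersection / (size_a + size_b)


# Hypothetical 3 x 4 masks, for illustration only.
auto = [[0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0]]
manual = [[0, 1, 1, 1],
          [0, 1, 0, 0],
          [0, 0, 0, 0]]

print(dice_coefficient(auto, manual))  # 0.75
```

Here both masks contain four foreground pixels and share three of them, so the value is 2 x 3 / (4 + 4) = 0.75.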
4 Conclusion
In this paper, we consider automatic segmentation of video capsule endoscopy
imagery. By utilizing an active contour without edges with a fast implementation,
we obtain meaningful segmentations of the gastrointestinal tract imaged by
capsule endoscopy. The numerical implementation of the proposed scheme is carried
out with finite differences and provides efficient segmentation results. Compared
with other related image segmentation methods, our fast implementation agrees
better with the ground truth marked by gastroenterologists. We
believe automatic computer-aided diagnostic methods can relieve the
big-data handling burden associated with reading video capsule endoscopy imagery, and
further methods for augmenting diagnostic capabilities are currently required.
References
1. Mohandas KM (2011) Colorectal cancer in India: controversies, enigmas and primary
prevention. Indian J Gastroenterol 30(1):3–6
2. Pathy S, Lambert R, Sauvaget C, Sankaranarayanan R (2012) The incidence and survival rates
of colorectal cancer in India remain low compared with rising rates in East Asia. Dis Colon
Rectum 55(8):900–906
3. Iddan G, Meron G, Glukhovsky A, Swain P (2000) Wireless capsule endoscopy. Nature
405(6785):417
4. Figueiredo PN, Figueiredo IN, Prasath S, Tsai R (2011) Automatic polyp detection in PillCam
Colon 2 capsule images and videos: preliminary feasibility report. Diagn Ther Endosc
2011:7 pp, Article ID 182435
5. Figueiredo IN, Moreno JC, Prasath VBS, Figueiredo PN (2012) A segmentation model and
application to endoscopic images. In: Campilho A, Kamel M (eds) International conference on
image analysis and recognition (ICIAR 2012). Springer LNCS, vol 7325. Aveiro, Portugal,
pp 164–171 (June 2012)
6. Prasath VBS, Pelapur R, Palaniappan K (2014) Multi-scale directional vesselness stamping
based segmentation for polyps from wireless capsule endoscopy. Figshare (June 2014)
7. Karargyris A, Bourbakis N (2010) A survey on wireless capsule endoscopy and endoscopic
imaging. A survey on various methodologies presented. IEEE Eng Med Biol Mag 29(1):72–83
8. Asari KV (2000) A fast and accurate segmentation technique for the extraction of
gastrointestinal lumen from endoscopic images. Med Eng Phys 22(2):89–96
9. Asari KV, Srikanthan T (2002) Segmenting endoscopic images using adaptive progressive
thresholding: a hardware perspective. J Syst Architect 47(9):759–761
10. Chan TF, Vese LA (2001) Active contours without edges. IEEE Trans Image Process
10(2):266–277
11. Prasath VBS, Figueiredo IN, Figueiredo PN, Palaniappan K (2012) Mucosal region detection
and 3D reconstruction in wireless capsule endoscopy videos using active contours. In: 34th
IEEE/EMBS international conference, San Diego, USA, pp 4014–4017 (September 2012)
12. Mumford D, Shah J (1989) Optimal approximations by piecewise smooth functions and
associated variational problems. Commun Pure Appl Math 42(5):577–685
13. Osher S, Sethian JA (1988) Fronts propagating with curvature-dependent speed: algorithms
based on Hamilton-Jacobi formulations. J Comput Phys 79(1):12–49
14. Prasath VBS (2009) Color image segmentation based on vectorial multiscale diffusion with
inter-scale linking. In: Chaudhury S, Mitra S, Murthy CA, Sastry PS, Sankar KP (eds) Third
international conference on pattern recognition and machine intelligence (PReMI-09).
Springer LNCS, vol 5909. Delhi, India, pp 339–344 (December 2009)
15. Moreno JC, Prasath VBS, Proenca H, Palaniappan K (2014) Brain MRI segmentation with fast
and globally convex multiphase active contours. Comput Vis Image Underst 125:237–250
16. Prasath VBS, Pelapur R, Palaniappan K, Seetharaman G (2014) Feature fusion and label
propagation for textured object video segmentation. In: SPIE Defense + Security (DSS),
Geospatial Info Fusion and Video Analytics IV. Baltimore, MD, USA (May 2014)
17. Comaniciu D, Meer P (2002) Mean shift: a robust approach toward feature space analysis.
IEEE Trans Pattern Anal Mach Intell 24(5):603–619
18. Sandberg B, Chan TF, Vese L (2002) A level-set and Gabor-based active contour algorithm
for segmenting textured images. Technical Report 02-39, UCLA CAM (2002)
19. Bresson X, Esedoglu S, Vandergheynst P, Thiran J, Osher S (2007) Fast global minimization
of the active contour/snake model. J Math Imaging Vis 28(2):151–167
P. Purkayastha et al.
1 Introduction
Kinases have become the most studied class of drug targets and play an important role
in metabolism, cell signaling, cellular transport, protein regulation, secretory
processes, and many other cellular pathways [1]. Hence, kinase annotations are
necessary for understanding the pathways related to signal transduction and
the disease pathways associated with impaired kinase activity
[2]. Classification of kinases therefore allows related human kinases to be compared
and gives insight into kinase function and evolution. In this work, the protr package in
R was used to extract features for the kinases. With a wide range of features available, choosing an
accurate feature subset for classification is not an easy task, and feature selection
techniques are employed in such cases. A feature selection method identifies the subset of
features on which a classifier will be trained, instead of training the
classifier on all the features of a dataset [3, 4].
Feature selection is beneficial in several ways. First, a compact subset of features
can alleviate the curse of dimensionality and mitigate the overfitting problem that
is usually encountered when training a classifier. Second, removing noisy features
and preprocessing the dataset makes the model more accurate. Third, an accurate
and essential subset of the dataset can be used at
significantly reduced computational cost. Finally, an illustrative feature subset
can make the model output more understandable and reasonable. Computational
cost alone could be reduced by eliminating features at random; the major
challenge lies in finding a feature subset that enhances the performance of a
classifier by removing redundant and unnecessary features. Consequently, the
efficacy of a feature selection method is commonly assessed by the performance of
the final model trained with the feature subset [4].
Researchers working on this problem have explored building classification
models using influential features, with classification accuracy regarded as
the best measure for assessing the performance of a classifier. However, it has been
pointed out that accuracy is not always a suitable assessment metric, and the Area
Under the ROC Curve (AUC) has been shown to be a better performance metric
than classification accuracy [5]. A classifier with minimum cost is often more
desirable than a classifier with high accuracy [6]. In this work, we have used a forward
greedy feature selection algorithm along with the random forest classification algorithm
and evaluated the performance based on the feature subset that maximizes AUC.
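Why accuracy can mislead is easiest to see on an imbalanced class. With 4 RGC sequences among 530, for example, a classifier that never predicts RGC is still right on 526 of 530 sequences. The sketch below is illustrative only: the labels and scores are hypothetical, and AUC is computed from its rank interpretation (the fraction of positive/negative pairs in which the positive receives the higher score, ties counting one half) rather than with any particular library.

```python
def auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of correct predictions when scores >= threshold are called positive."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# Hypothetical imbalanced problem: 2 positives, 8 negatives.
labels = [1, 1] + [0] * 8
# "ranker" scores every positive above every negative, but all scores are below 0.5.
ranker = [0.45, 0.40] + [0.10, 0.15, 0.20, 0.25, 0.30, 0.12, 0.18, 0.22]
# "constant" gives the same score to every example and so carries no information.
constant = [0.10] * 10

for name, scores in [("ranker", ranker), ("constant", constant)]:
    print(name, accuracy(labels, scores), auc(labels, scores))
# ranker 0.8 1.0
# constant 0.8 0.5
```

Both classifiers have the same accuracy, yet only the first separates the classes, which is what AUC captures.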
Kinases families    Number of sequences
AGC                 63
Atypical            43
CAMK                79
CK1                 11
CMGC                63
OTHERS              80
RGC                 4
STE                 47
TK                  89
TKL                 42
Total               530
empirical studies revealed that a classifier with the highest accuracy might not
be ideal in real-world problems. Instead, the AUC has been demonstrated to be an
alternative measure for evaluating the performance of any classifier.
Therefore, we developed classification models using the Random Forest
classifier [10]. The model was built on two-thirds of the dataset (the training set)
and tested on the remaining one-third (the test set); the partition was made
at random.
The performance of the classifier was evaluated using AUC. Suppose we need to
select a subset of k features from a feature set $F = \{f_1, f_2, \ldots, f_m\}$. Forward greedy
search builds the model by considering one feature at a time and calculating the AUC
for each of them [11]. The candidate feature subsets are then ranked in
descending order of AUC, and the combination of features with the maximum AUC value is
selected for classification of kinases.
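The forward greedy search described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: `forward_greedy_select` and `toy_score` are hypothetical names, the stopping rule (stop when adding a feature no longer raises the AUC) is an assumption, and the scoring function is a deliberately simple stand-in (AUC of the summed feature values on a tiny invented dataset) in place of training a random forest on the training split and measuring AUC on the test split.

```python
def auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties = 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def forward_greedy_select(n_features, score_subset, max_features=None):
    """Grow a feature subset one feature at a time, keeping the best-scoring addition."""
    limit = n_features if max_features is None else max_features
    selected, best_score = [], float("-inf")
    remaining = list(range(n_features))
    while remaining and len(selected) < limit:
        # Score every candidate subset obtained by adding one remaining feature.
        candidates = [(score_subset(selected + [f]), f) for f in remaining]
        # Highest score wins; ties go to the lowest feature index.
        top_score, top_feature = max(candidates, key=lambda t: (t[0], -t[1]))
        if top_score <= best_score:
            break  # assumed stopping rule: no candidate improves the score
        selected.append(top_feature)
        remaining.remove(top_feature)
        best_score = top_score
    return selected, best_score


# Hypothetical held-out examples: (label, (feature 0, feature 1, feature 2)).
ROWS = [
    (1, (90, 20, 60)), (1, (80, 90, 25)), (1, (40, 50, 90)),
    (0, (50, 80, 30)), (0, (20, 10, 50)), (0, (10, 60, 20)),
]


def toy_score(subset):
    """Stand-in for 'train a classifier on this subset and return its test AUC'."""
    labels = [y for y, _ in ROWS]
    scores = [sum(x[f] for f in subset) for _, x in ROWS]
    return auc(labels, scores)


selected, best = forward_greedy_select(3, toy_score)
print(selected, best)  # [0, 2] 1.0
```

In practice `score_subset` would wrap the actual model: fit the classifier on the selected columns of the training partition and return the AUC on the test partition, once per kinase class.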
The performance of the feature subsets from the amino acid, dipeptide, and
pseudo amino acid compositions is shown in Figs. 1, 2 and 3, respectively. As
these figures show, the classifier performed better
after applying the forward greedy feature selection algorithm than with all the features.
The feature selection maximizes the AUC measure for all 10 classes of kinases.
For all three compositions and all 10 classes, the AUC obtained with the complete
feature set differs markedly from the AUC obtained with the selected subsets.
Fig. 1 shows the AUC measure for amino acid
composition, which has 20 features in total. The largest difference in AUC was found for the
RGC class compared with the other kinase classes: with the forward greedy algorithm,
the feature subset for the RGC class was reduced to three features instead of all 20.
The difference was negligible for the
CK1 and TK classes, which could be studied further to obtain a significant
difference in all the kinase classes.
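The feature counts quoted above (20 for amino acid composition, 400 for dipeptide composition) follow directly from the definitions: one fraction per residue type, and one fraction per ordered residue pair. The sketch below uses the standard definitions and is not the protr implementation; the function names and the example sequence are hypothetical, and pseudo amino acid composition is omitted because it additionally depends on physicochemical property tables.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes
DIPEPTIDES = [a + b for a, b in product(AMINO_ACIDS, repeat=2)]  # 20 x 20 = 400


def amino_acid_composition(sequence):
    """20 features: fraction of the sequence made up by each residue type."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("empty sequence")
    return [sequence.count(aa) / len(sequence) for aa in AMINO_ACIDS]


def dipeptide_composition(sequence):
    """400 features: fraction of adjacent residue pairs equal to each dipeptide."""
    sequence = sequence.upper()
    if len(sequence) < 2:
        raise ValueError("need at least two residues")
    pairs = [sequence[i:i + 2] for i in range(len(sequence) - 1)]
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for pair in pairs:
        if pair in counts:  # pairs containing non-standard letters are skipped
            counts[pair] += 1
    return [counts[dp] / len(pairs) for dp in DIPEPTIDES]


seq = "MKKLLA"  # hypothetical peptide, for illustration only
aac = amino_acid_composition(seq)
dpc = dipeptide_composition(seq)
print(len(aac), len(dpc))               # 20 400
print(aac[AMINO_ACIDS.index("K")])      # 2 of 6 residues are K
print(dpc[DIPEPTIDES.index("KK")])      # 1 of 5 adjacent pairs is KK -> 0.2
```

Each kinase sequence thus becomes a fixed-length vector regardless of its length, which is what allows sequences of different sizes to be fed to the same classifier.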
Fig. 1 The performance of the classifier for all 10 kinase classes using amino acid composition,
with the feature subset (forward greedy) and with all features (y-axis: AUC measure)
Fig. 2 The performance of the classifier for all 10 kinase classes using dipeptide composition,
with the feature subset (forward greedy) and with all features (y-axis: AUC measure)
Fig. 3 The performance of the classifier for all 10 kinase classes using pseudo amino acid
composition, with the feature subset (forward greedy) and with all features (y-axis: AUC measure)
Similarly, the AUC obtained using all 400 features was compared with that obtained
using the selected subset of features for dipeptide composition, as shown
in Fig. 2. The largest difference between all 400 features and the subset of
4 features (selected by forward greedy search) was found for the atypical class of kinases, and the
difference was negligible for the TK class. For pseudo amino
acid composition, the difference was largest for the
atypical class compared with all other classes and smallest for the RGC class, as
shown in Fig. 3. The feature subset selected by forward greedy search
contained six features with the highest AUC measure. This brings us to the
hypothesis that kinases can be classified with maximum AUC if good
subsets of features are used.
4 Conclusion
In this paper, we have shown the advantages of a feature selection method for identifying the
feature subset for classification of kinases. The performance of the classification
model is shown using the feature subset and using all the features, and it was
evaluated by measuring AUC. The random forest classifier is able
to classify kinase groups with a better AUC measure for feature subsets than
with all the features. The difference in AUC measure was most pronounced
for a few classes of kinase, such as the RGC class using amino acid composition and the
atypical class using dipeptide and pseudo amino acid composition, which indicates
that groups of kinases are classifiable with maximum AUC if a good subset
of features is used. Further, the feature selection method could be useful for classifying large
sets of biological data and for dimensionality reduction.
References
1. Cohen P (2002) Protein kinases – the major drug targets of the twenty-first century? Nat Rev
Drug Discov 1(4):309–315
2. Zhang J, Yang PL, Gray NS (2009) Targeting cancer with small molecule kinase inhibitors.
Nat Rev Cancer 9(1):28–39
3. Ding C, Peng H (2003) Minimum redundancy feature selection from microarray gene
expression data. In: Proceedings of the IEEE computer society conference on bioinformatics,
pp 523–528. Washington, DC
4. Tang K, Suganthan P, Yao X (2006) Gene selection algorithms for microarray data based on
least squares support vector machine. BMC Bioinform 7:95
5. Huang J, Ling CX (2005) Using AUC and accuracy in evaluating learning algorithms. IEEE
Trans Knowl Data Eng 17(3):299–310
6. Rui W, Tang K (2009) Feature selection for maximizing the area under the ROC curve. In:
Data mining workshops, 2009. ICDMW'09. IEEE international conference on. IEEE
7. Manning G et al (2002) The protein kinase complement of the human genome. Science
298(5600):1912–1934
8. Bhasin M, Raghava GP (2004) Classification of nuclear receptors based on amino acid
composition and dipeptide composition. J Biol Chem 279(22):23262–23266
9. Krajewski Z, Tkacz E (2013) Protein structural classification based on pseudo amino acid
composition using SVM classifier. Biocybern Biomed Eng 33(2):77–87
10. Breiman L (2001) Random forests. Mach Learn 45(1):5–32
11. Bradley AP (1997) The use of the area under the ROC curve in the evaluation of
machine learning algorithms. Pattern Recogn 30(7):1145–1159
S. Singh et al.
1 Introduction
Network biology and gene expression profiling with microarrays have become a
typical approach for finding genes and biological pathways that are correlated
with diverse complex diseases [3, 5, 12]. Biological network analysis combines
coexpressed gene networks to predict novel associations and key regulatory
molecules. It integrates independent data as well as biologically relevant information, and has emerged as a reliable method for finding meaningful novel biological
network(s) [4, 24, 39].
Rheumatoid arthritis (RA) is a systemic autoimmune disease characterized by
inflammation, synovial hyperplasia, cartilage destruction, and bone erosion [31].
The involvement of immune cells and inflammatory molecules is a basic hallmark
of RA [13]. The cells of the synovium are broadly divided into synovial fibroblasts
(SF) and macrophages. The primary functions of SF and macrophages are secretion
of hyaluronic acid and phagocytosis, respectively [1, 23]. They are involved in the
production of various inflammatory cytokines and chemokines, which in turn attract
more inflammatory molecules to the synovium [16, 35]. Together they also
initiate the production of Matrix Metalloproteinase (MMP) and Vascular Endothelial
Growth Factor (VEGF), the formation of ectopic germinal layers, and the overexpression
of Major Histocompatibility Complex Class II (MHC II) [6, 15, 18]. Recent studies
have reported that both SF and macrophages play a crucial role in progressive joint
inflammation and destruction, as they are found in larger numbers in the pannus and inflamed
synovial membrane than in normal joints [2]. The current research work focuses on the
identification of key regulatory molecules from synovial cell layers.
Microarray technology has been used to measure gene expression levels,
which can further be used to detect transcriptionally altered key signature molecules
involved in the pathophysiology of RA. Recently, various studies have attempted
to predict differentially expressed genes (DEGs) in RA using multiple gene
expression datasets [17]. In the current study, molecular expression profiles of human
macrophages and SF from the Gene Expression Omnibus (GEO) were analyzed to
identify promising genes for RA. Our investigation provides a valuable methodology for analyzing these novel genes, which are involved in the pathophysiology of RA.
Fig. 1 Box plot of normalized values for series GSE7669, GSE8286, and GSE10500
Fig. 2 Profile plot of group for series GSE7669, GSE8286, and GSE10500
through which they are associated [36]. Basically, biological networks are scale-free
networks with statistically and functionally significant interacting molecules [33].
The Cytoscape plugins NetworkAnalyzer and MCODE were used to identify
the topology of the network, taking the K-core into consideration to identify
promising hubs [30]. The results were further validated by biological analysis with Biointerpreter
(http://www.biointerpreter.com/biointerpreterv3/) [42].
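The K-core criterion mentioned above can be illustrated independently of Cytoscape. A k-core is the largest subgraph in which every node has at least k neighbours inside the subgraph; it is found by repeatedly deleting nodes of lower degree. The sketch below is a plain-Python illustration of that definition, not the plugin's code: the function name and the toy network with nodes "a" to "e" are hypothetical and do not represent real gene interactions.

```python
def k_core(adjacency, k):
    """Return the node set of the k-core of an undirected graph.

    adjacency maps each node to the set of its neighbours; every edge is
    assumed to be listed in both directions.
    """
    graph = {node: set(neighbours) for node, neighbours in adjacency.items()}
    changed = True
    while changed:
        changed = False
        # Nodes whose current degree is below k cannot belong to the k-core.
        for node in [n for n, neighbours in graph.items() if len(neighbours) < k]:
            for other in graph.pop(node):
                # Removing a node lowers the degree of each of its neighbours.
                graph.get(other, set()).discard(node)
            changed = True
    return set(graph)


# Hypothetical network: a triangle (a, b, c) with a tail c - d - e.
toy = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}

print(sorted(k_core(toy, 2)))  # ['a', 'b', 'c']
```

Removing "e" (degree 1) drops "d" to degree 1, so the tail peels away and only the densely connected triangle survives; densely connected regions of this kind are where candidate hubs are looked for.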
Downregulated   54
Downregulated   267
0–168 h
Upregulated     51
Downregulated   90
Table 2 K-core analysis with highly clustered seed genes for down- and upregulated networks
Clusters   Node   Score   Edges      Seed
                          69,984     ZNF516
                          7,309      ACP2
                          2,351      OLFML2B
                          179,358    OAS2
                          2,440      VCAN
                          991        CPB1
Common biological properties such as angiogenesis, antigen processing and presentation, immune response, vasculogenesis, chemotaxis, and inflammation
confirmed that these nodes may play a role in the pathophysiology of RA at the
cytoplasmic and nuclear level. Of the genes identified under different categories of
synovial membrane (Ranks 1–3 for up and Ranks 1–3 for down), 40 % showed a
correlation with hyperlipidemia, a genetic disorder of increased
blood fats that causes cardiovascular risk; it is further connected with RA, as this
risk is doubled in RA patients [11, 26]. All the ranks from both networks
showed their influence in cytokine–cytokine interaction, TGF beta signaling, ECM
receptor interaction, osteoclast differentiation, and Wnt receptor signaling pathways. Under upregulated networks, the seed gene Chondroitin Sulfate Proteoglycan
(VCAN) showed its involvement in glycosaminoglycan and hyaluronic acid binding
[29, 43]. Being a major component of cartilage, it is the main point of attraction for
CD44 for initiating inflammation in synovial cells [8]. Oligoadenylate Synthetase
2 (OAS2) and Carboxypeptidase B1 (CPB1) showed crossregulation and downregulation with autoimmune diseases, respectively [37]. OAS2 was involved in the IFN
beta signaling pathway, whereas CPB1, which is present in plasma, helps in fibrin clot formation
by acting as a procoagulant [20, 32, 40]. CPB1 also initiates osteopontin, C5a, and
bradykinin-like proinflammatory molecules [7]. Under downregulated networks,
Zinc Finger Protein 516 (ZNF516), Acid phosphatase 2, lysosomal (ACP2), and
Olfactomedin-like 2B (OLFML2B) showed a specific response in stress activity,
particularly in the immune response [19, 28]. These genes have already been shown to be
upregulated in cardiovascular and hormone regulation. ACP2 and OLFML2B also
showed a response to hyperlipidemia, which is connected to RA [41].
All the protein molecules reported through network analysis showed an effect on the
pathophysiology of RA. There is still wide scope for further investigation of these networks,
which may reveal more genes/proteins related to RA.
4 Conclusion
The proposed research work is an expression network study framework for RA
synovial cells, which may play a significant role in its pathophysiology. The study
yielded six candidate seed genes, which give a better understanding of the
progression of RA through the involvement of different pathways. This information may
lead to better understanding and management of RA in the future by considering these potential
targets in therapeutics.
Acknowledgments This research work was supported by Science Engineering and Research
Board-Department of Science and Technology, New Delhi and Karunya University, Coimbatore,
Tamil Nadu.
References
1. Anitua E, Sánchez M, Nurden AT, Zalduendo MM, De La Fuente M, Azofra J, Anda I (2007)
Platelet-released growth factors enhance the secretion of hyaluronic acid and induce
hepatocyte growth factor production by synovial fibroblasts from arthritic patients.
Rheumatology 46(12):1769–1772
2. Athanasou NA (1995) Synovial macrophages. Ann Rheum Dis 54(5):392
3. Bauer JW, Bilgic H, Baechler EC (2009) Gene-expression profiling in rheumatic disease: tools
and therapeutic potential. Nat Rev Rheumatol 5(5):257–265
4. Begley TJ, Rosenbach AS, Ideker T, Samson LD (2002) Damage recovery pathways in
Saccharomyces cerevisiae revealed by genomic phenotyping and interactome mapping.
Mol Cancer Res 1(2):103–112
5. Chand Y, Alam MA (2012) Network biology approach for identifying key regulatory genes by
expression based study of breast cancer. Bioinformation 8(23):1132
6. Cho CH, Koh YJ, Han J, Sung HK, Lee HJ, Morisada T, Koh GY (2007) Angiogenic role of
LYVE-1-positive macrophages in adipose tissue. Circ Res 100(4):e47–e57
7. Du XY, Zabel BA, Myles T, Allen SJ, Handel TM, Lee PP, Leung LL (2009) Regulation of
chemerin bioactivity by plasma carboxypeptidase N, carboxypeptidase B (activated thrombin-activable fibrinolysis inhibitor), and platelets. J Biol Chem 284(2):751–758
8. Fujimoto T, Kawashima H, Tanaka T, Hirose M, Toyama-Sorimachi N, Matsuzawa Y,
Miyasaka M (2001) CD44 binds a chondroitin sulfate proteoglycan, aggrecan. Int Immunol
13(3):359–366
30. Raman MP, Singh S, Devi PR, Velmurugan D (2012) Uncovering potential drug targets for
tuberculosis using protein networks. Bioinformation 8(9):403
31. Rossol M, Schubert K, Meusch U, Schulz A, Biedermann B, Grosche J, Wagner U (2013)
Tumor necrosis factor receptor type I expression of CD4+ T cells in rheumatoid arthritis
enables them to follow tumor necrosis factor gradients into the rheumatoid synovium. Arthritis
Rheum 65(6):1468–1476
32. Sadler AJ, Williams BR (2008) Interferon-inducible antiviral effectors. Nat Rev Immunol
8(7):559–568
33. Sengupta U, Ukil S, Dimitrova N, Agrawal S (2009) Expression-based network biology
identifies alteration in key regulatory pathways of type 2 diabetes and associated risk/
complications. PLoS ONE 4(12):e8100
34. Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Ideker T (2003) Cytoscape:
a software environment for integrated models of biomolecular interaction networks. Genome
Res 13(11):2498–2504
35. Smith RS, Smith TJ, Blieden TM, Phipps RP (1997) Fibroblasts as sentinel cells. Synthesis of
chemokines and regulation of inflammation. Am J Pathol 151(2):317
36. Snijesh VP, Singh S (2014) Molecular modeling and network based approach in explaining
the medicinal properties of nyctanthes arbortristis, lippia nodiflora for rheumatoid arthritis.
J Bioinform Intell Control 3(1):31–38
37. Song JJ, Hwang I, Cho KH, Garcia MA, Kim AJ, Wang TH, Robinson WH (2011) Plasma
carboxypeptidase B downregulates inflammatory responses in autoimmune arthritis. J Clin
Invest 121(9):3517–3527
38. Szekanecz Z, Koch AE (2007) Macrophages and their products in rheumatoid arthritis. Curr
Opin Rheumatol 19(3):289–295
39. Tornow S, Mewes HW (2003) Functional modules by relating protein interaction networks
and gene expression. Nucleic Acids Res 31(21):6283–6289
40. van Baarsen LG, Wijbrandts CA, Rustenburg F, Cantaert T, van der Pouw TC (2010)
Regulation of IFN response gene activity during infliximab treatment in rheumatoid arthritis is
associated with clinical response to treatment. Arthritis Res Ther 12(1):11
41. van Oorschot RA, Birmingham V, Porter PA, Kammerer CM, VandeBerg JL (1993) Linkage
between complement components 6 and 7 and glutamic pyruvate transaminase in the
marsupial Monodelphis domestica. Biochem Genet 31(5–6):215–222
42. Varatharajan S, Karathedath S, Velayudhan SR, Srivastava A, Mathews V, Balasubramanian P
(2013) Harnessing gene expression profiling in search of new candidate genes for Ara-C
resistance in acute myeloid leukemia. Blood 122(21):1299
43. Westling J, Gottschall P, Thompson V, Cockburn A, Perides G, Zimmermann D, Sandy J
(2004) ADAMTS4 (aggrecanase-1) cleaves human brain versican V2 at Glu405-Gln406 to
generate glial hyaluronate binding protein. Biochem J 377:787–795
44. Yarilina A, Park-Min KH, Antoniv T, Hu X, Ivashkiv LB (2008) TNF activates an
IRF1-dependent autocrine loop leading to sustained expression of chemokines and
STAT1-dependent type I interferon-response genes. Nat Immunol 9(4):378–387