Bioinformatics and Biomedical Engineering Proceedings of the 9th International Conference on Bioinformatics and Biomedical Engineering 1st Edition James Chou 2024 Scribd Download
Author(s): James Chou
ISBN(s): 9781315683010, 1315683016
Edition: 1
Year: 2015
Language: english
BIOINFORMATICS AND BIOMEDICAL ENGINEERING
Editors
James J. Chou
Harvard Medical School, USA
Huaibei Zhou
Wuhan University, China
All rights reserved. No part of this publication or the information contained herein may
be reproduced, stored in a retrieval system, or transmitted in any form or by any means,
electronic, mechanical, by photocopying, recording or otherwise, without written prior
permission from the publisher.
Although all care is taken to ensure integrity and the quality of this publication and the
information herein, no responsibility is assumed by the publishers nor the author for any
damage to the property or persons as a result of operation or use of this publication and/or
the information contained herein.
Table of contents
Preface
Organization
Acknowledgements
About the editors
Biomechanics
In vitro and in vivo biomechanical research on cervical arthroplasty and fusion
Z.H. Liao & W.Q. Liu
Mechanical conditions affect intervertebral disc degeneration concerning its water retention
W. Weina, W.Q. Liu & Z.H. Liao
Viscoelasticity of the intervertebral segments after fusion under continuous compression
B.Q. Pei, Z.Y. Liu, H. Li & Y.Y. Pei
Biomedical imaging
The application of photoacoustic tomography in joint tissues
X.C. Zhong, X.Y. Jing, S.Q. Jing, N.Z. Zhang & J. Rong
Frequency response mismatch correction in multichannel time interleaved analog
beamformers for ultrasound medical imaging
A. Zjajo & R. van Leuken
HIFU based photoacoustic tomography
X.C. Zhong, W.Z. Qi, S.Q. Jing, N.Z. Zhang & J. Rong
Analyzer-Based Phase Contrast X-ray Imaging for mouse tissues
H. Li, M. Wang, Z. Wang & S.Q. Luo
Rehabilitation engineering
The research on motion recognition based on EMG of residual thigh
T.Y. Zhang
Preface
It is our great pleasure to present the proceedings of The 9th International Conference on
Bioinformatics and Biomedical Engineering (iCBBE 2015), held September 18–20, 2015 in
Shanghai, China. We would like to take this opportunity to express our sincere gratitude and
appreciation to all the authors and participants for their support of this conference.
Research in Bioinformatics and Biomedical Engineering has an enormous impact on science,
education, culture and society. The discipline has become a new focus of life science,
mathematical science, computer science and electronic information science. More and more
scientists around the world are dedicating themselves to this interdisciplinary area and
producing many interesting results.
We are proud to see that the previous iCBBE conferences were successful in providing an
ideal platform for these scientists to exchange their exciting findings, to stimulate the
further development of Bioinformatics and Biomedical Engineering, and to enhance its impact
on various areas of both science and medicine (see, e.g., Medicinal Chemistry, 2015, 11,
218–234). We believe that the 2015 iCBBE will do even better in this regard.
On behalf of the organizing committee, we would like to take this opportunity to express
our gratitude to the conference’s sponsors: Wuhan University and The Gordon Life Science
Institute.
Our appreciation and gratitude are also extended to all the reviewers of the papers and to
the members of the Conference Organization Committee. It would have been impossible to hold
such a grand conference without their help and support.
The papers collected in the “2015 iCBBE Proceedings” provide the detailed results of some
oral presentations that will be of use to the readership.
Editors
Prof. James J. Chou
Harvard Medical School, USA
Organization
This volume contains the Proceedings of the 9th International Conference on Bioinformatics
and Biomedical Engineering (iCBBE 2015), held September 18–20, 2015 in Shanghai, China.
iCBBE 2015 was organised by Wuhan University and The Gordon Life Science Institute.
General Chair
Prof. James J. Chou, Harvard Medical School, USA
Acknowledgements
The Organising Committee members wish to express their sincere gratitude for the financial
assistance from the following organisations: Wuhan University, the Gordon Life Science
Institute and the 1000 Think Tank.
The technical assistance of all paper peer reviewers and the publisher CRC Press/Balkema
is gratefully acknowledged. We are also thankful to the International Programme Committee
as well as the members of the Local Organising Committee. Finally, the editors want to
acknowledge all peer reviewers for their great efforts and contributions.
Editors
Prof. James J. Chou
Harvard Medical School, USA
ABSTRACT: In this study, we searched for rules relating the mRNA sequences of the Homo
sapiens histamine receptors H1, H2, H3 and H4 using two data mining algorithms, Apriori
and Decision Tree. For the Apriori algorithm, we split the sequences into windows of 5, 7
and 9. The results showed a strong relationship between the H1 and H4 receptors, and also
between the H2 and H3 receptors, so the receptors fall into two groups according to their
components. We leave the relationship between the H2 and H3 receptors to another study
with a different data mining algorithm. For the H1 and H4 receptors, we found that the
amino acid F (phenylalanine) can serve as a criterion to distinguish the two. We suggest
that H4 could be a mutated form of H1. To support this hypothesis, we conducted an
additional experiment with the Decision Tree algorithm, focusing on the presence of amino
acid F. The data showed the difference between the H1 and H4 receptors. In conclusion, the
H4 and H1 receptors are related to each other by mutation.
1 INTRODUCTION
The immune system protects the animal body against pathogens from the outer environment.
However, several diseases, such as allergies, are caused by disruptions in immune system
functions. Allergies result from exaggerated, hypersensitive responses to certain antigens
called allergens. Antibodies of the IgE class are involved in allergic reactions. IgE
antibodies are attached by their base to mast cells in connective tissues. When antigens
enter the body, they attach to the antigen-binding sites of the IgE antibodies. This
cross-links neighbouring IgE antibodies, which triggers the mast cells to secrete histamine
and other inflammation-inducing substances from their granules. This process is called
degranulation. Histamine dilates blood vessels and increases the permeability of
capillaries, which causes typical allergy symptoms such as sneezing, runny nose, watery
eyes and smooth muscle contractions that may lead to breathing difficulty. Antihistamines
block histamine from binding to its receptors in order to diminish allergy symptoms
(Jane. 2011). Histamine is a type of amine that is produced and secreted in the animal
immune system. It is used in local immune responses to cause inflammation. Histamine
secreted from mast cells binds to its receptor (histamine receptor). Four kinds of
histamine receptor are known in the human body, named H1, H2, H3 and H4. Previously,
histamine receptor 1 (H1) was known to be highly involved in allergic reactions; however,
recent studies have shown that histamine receptor 4 (H4) is also involved. It is supposed
that there may be some kind of relationship between these two receptors (Thurmond. 2008)
(Fung-Leung. 2004). In this study, we compare the amino acid sequences of the four types
of histamine receptor, H1, H2, H3 and H4, using an Apriori algorithm and a Decision Tree
algorithm in order to examine relatedness and isoforms among the receptors.
2.1 Materials
For the experiment, we collected the mRNA sequences of the Homo sapiens histamine receptors
H1 (HRH1)*, H2 (HRH2)**, H3 (HRH3)*** and H4 (HRH4)****, including their transcript
variants. The mRNA sequences used can be found in the NCBI database.
2.2 Methods
To process the data, we used two kinds of algorithms: Apriori and Decision Tree (Lee. 2014)
(Lim. 2014) (Go. 2014).
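The windowing and rule-mining step can be sketched as follows. This is only a minimal illustration, assuming that each sequence is cut into consecutive, non-overlapping windows of 5, 7 or 9 residues and that each window is treated as a transaction of (position, residue) items. The function names, the toy sequence and the support threshold are hypothetical and are not taken from the study.

```python
from collections import Counter
from itertools import combinations


def split_into_windows(sequence, size):
    """Cut a sequence into consecutive, non-overlapping windows of `size`.
    A trailing fragment shorter than `size` is dropped (an assumption)."""
    return [sequence[i:i + size] for i in range(0, len(sequence) - size + 1, size)]


def window_to_items(window):
    """Represent a window as a set of (position, residue) items, 1-based."""
    return frozenset((pos, res) for pos, res in enumerate(window, start=1))


def frequent_itemsets(transactions, min_support, max_len=2):
    """Level-wise Apriori: keep itemsets whose support is >= min_support."""
    n = len(transactions)
    result = {}
    # Level 1: count single items.
    counts = Counter(item for t in transactions for item in t)
    current = {frozenset([i]) for i, c in counts.items() if c / n >= min_support}
    for s in current:
        result[s] = counts[next(iter(s))] / n
    k = 2
    while current and k <= max_len:
        # Candidates are unions of frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a, b in combinations(current, 2) if len(a | b) == k}
        # Apriori pruning: every (k-1)-subset must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        current = set()
        for c in candidates:
            support = sum(1 for t in transactions if c <= t) / n
            if support >= min_support:
                result[c] = support
                current.add(c)
        k += 1
    return result


# Toy example (hypothetical sequence, not real receptor data).
toy = "MFLAGMFLSGMALAGMFLAG"
windows = split_into_windows(toy, 5)
rules = frequent_itemsets([window_to_items(w) for w in windows], min_support=0.75)
```

Here `rules` maps each frequent itemset to its support, so an entry such as `{(2, 'F')}` with support 0.75 reads as "position 2 is F in 75% of the windows".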
*“Homo sapiens histamine receptor H1 (HRH1), transcript variant 1, mRNA”, NCBI Reference
Sequence: NM_001098213.1 (4,578 bp linear mRNA).
“Homo sapiens histamine receptor H1 (HRH1), transcript variant 2, mRNA”, NCBI Reference
Sequence: NM_001098212.1 (4,298 bp linear mRNA).
“Homo sapiens histamine receptor H1 (HRH1), transcript variant 3, mRNA”, NCBI Reference
Sequence: NM_001098211.1 (4,348 bp linear mRNA).
“Homo sapiens histamine receptor H1 (HRH1), transcript variant 4, mRNA”, NCBI Reference
Sequence: NM_000861.3 (4,427 bp linear mRNA).
**“Homo sapiens histamine receptor H2 (HRH2), transcript variant 1, mRNA”, NCBI Reference
Sequence: NM_001131055.1 (2,624 bp linear mRNA).
“Homo sapiens histamine receptor H2 (HRH2), transcript variant 2, mRNA”, NCBI Reference
Sequence: NM_022304.2 (3,095 bp linear mRNA).
***“Homo sapiens histamine receptor H3 (HRH3), mRNA”, NCBI Reference Sequence: NM_007232.2
(2,680 bp linear mRNA).
****“Homo sapiens histamine receptor H4 (HRH4), transcript variant 1, mRNA”, NCBI Reference
Sequence: NM_021624.3 (3,686 bp linear mRNA).
“Homo sapiens histamine receptor H4 (HRH4), transcript variant 2, mRNA”, NCBI Reference
Sequence: NM_001143828.1 (3,422 bp linear mRNA).
“Homo sapiens histamine receptor H4 (HRH4), transcript variant 3, mRNA”, NCBI Reference
Sequence: NM_001160166.1 (3,522 bp linear mRNA).
3 DISCUSSION
acid makes a small difference between the H1 and H4 amino acid sequences, while they still
share a larger common part. This explains why the H1 and H4 receptors share a common
position in the immune system even though they are named differently. Also, the difference
between the H2 and H3 receptors would arise from the amounts of the additional amino acids
P and A. Although the H2 and H3 receptors share the additional amino acids A, P and G, the
H2 receptor showed a relatively high peak for amino acid P, while the H3 receptor had a
peak for amino acid A. As this difference cannot be characterized by the Apriori algorithm,
we believe that we have reached the limit of that algorithm; another data mining algorithm
may be able to characterize it.
Through two experiments using the Apriori algorithm and the Decision Tree algorithm, we
made several findings, from which we drew deductions about the relationship between the H2
and H3 receptors and about the relationship between the H1 and H4 receptors. The first
finding concerns the H2 and H3 receptors. In the experiment with the Apriori algorithm, we
found a strong correlation between them. Both receptors are composed of the same types and
frequencies of amino acids; the only difference is a small change in the frequency of the
“Additional” amino acids. Another type of data mining algorithm and experiment is therefore
required to specify the difference. We leave this as a possibility for a future study and
experiment.
The second finding concerns the H1 and H4 receptors and is the most important point of
our experiment. In the experiment using the Apriori algorithm, we found a large correlation
and a small difference between the H1 and H4 receptors, namely whether or not they have
amino acid F. We decided that an additional experiment was needed to find more information,
and chose the Decision Tree algorithm for it. The results did not differ much from those of
the Apriori experiment, which gives strong grounds for our hypothesis of evolutionary
variation and supports our suggestion.
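The idea of using the presence of one amino acid as a classification criterion can be sketched with a one-level decision tree (a decision stump) chosen by information gain. This is an illustrative sketch only: the windows and the class labels below are hypothetical toy data, the neutral labels `class_a` and `class_b` deliberately do not say which receptor carries F, and the study's actual Decision Tree settings are not given in the text.

```python
import math


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((labels.count(c) / total) * math.log2(labels.count(c) / total)
                for c in set(labels))


def information_gain(windows, labels, residue):
    """Gain from splitting windows on 'contains residue' versus 'does not'."""
    with_res = [l for w, l in zip(windows, labels) if residue in w]
    without = [l for w, l in zip(windows, labels) if residue not in w]
    remainder = sum(len(part) / len(labels) * entropy(part)
                    for part in (with_res, without) if part)
    return entropy(labels) - remainder


# Hypothetical toy windows and labels (not measured data).
windows = ["MFLAG", "AFLSG", "MALAG", "MSLAG"]
labels = ["class_a", "class_a", "class_b", "class_b"]

# Pick the residue whose presence/absence best separates the two classes.
best = max("ACDEFGHIKLMNPQRSTVWY",
           key=lambda r: information_gain(windows, labels, r))
```

In this toy set, the presence of F separates the two classes perfectly, so it is selected as the root test of the stump.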
In conclusion, we have analyzed the mRNA sequences of the Homo sapiens histamine receptors,
which are related to the cause of allergy. The results showed considerable relatedness
among the H1, H2, H3 and H4 receptors. We also characterized some of the receptors in order
to verify our hypothesis. As the results are consistent with the experimental data, we
consider our study meaningful. Our future task is to develop an integrated system that
provides a better environment for studies in the field of bioinformatics.
REFERENCES
Fung-Leung W.P. 2004. Histamine H4 receptor antagonists: the new antihistamines? Current Opinion
in Investigational Drugs.
Go E.B. 2014. Analysis of Ebolavirus. International Journal of Machine Learning and Computing
(IJMLC).
Jane B.R. 2011. BIOLOGY, 9th Edition. California, CA: Pearson Education Inc.
Kim D.Y. 2014. Comparison of Hemagglutinin and Neuraminidase of Influenza A Virus Subtype H1N1,
H5N1, H5N2, and H7N9 using Apriori Algorithm. Lecture Notes in Computer Science (LNCS).
Kwon J.W. 2014. Comparison of HTLV and STLV. APCBEE Procedia.
Lee J.H. 2014. Analysis of Malaria Inducing P. Falciparum, P. Ovale, and P. Vivax. APCBEE Procedia.
Lim S.J. 2014. Analyzing Patterns of Various Avian Influenza Virus by Decision Tree. International
Journal of Computer and Electrical Engineering (IJCEE).
Lim S.J. 2014. rRNA of Alphaproteobacteria Rickettsiales and mtDNA Pattern Analyzing with Apriori &
SVM. Lecture Notes in Computer Science (LNCS).
Thurmond R.L. 2008. The role of histamine H1 and H4 receptors in allergic inflammation: the search
for new antihistamines. Nat Rev Drug Discov.
1 INTRODUCTION
Filoviridae is a family of RNA viruses belonging to the order Mononegavirales. Because
they cause hemorrhagic fever in primates, both human and non-human, members of the family
(called filoviruses or filovirids) are identified as hazardous by organizations.
The family Filoviridae contains three genera: Cuevavirus, Ebolavirus and Marburgvirus.
Ebolavirus can be further divided into five species: Bundibugyo ebolavirus (BEBOV), Reston
ebolavirus (REBOV), Sudan ebolavirus (SEBOV), Tai Forest ebolavirus (CIEBOV/TAFV),
and Zaire ebolavirus (ZEBOV) (Henzy, 2014). Ebolavirus and Marburgvirus are thought
to be zoonotic, passed from animals to humans. Fruit bats of the Pteropodidae family are
generally considered to be the hosts of both viruses; however, apes and chimpanzees are also
regarded as possible hosts. More specifically, scientists infer that Rousettus aegyptiacus,
a fruit bat of the Pteropodidae family, is the natural host of Marburgvirus, transmitting
the disease to people, after which it spreads to other humans. Both viruses cause symptoms
such as malaise, muscle pain, sudden fever and headache, and at the moment there is no
standardized cure for filovirus diseases (WHO, 2014). The name Filoviridae comes from the
Latin term filum, meaning “threadlike”, which accurately describes the slender structure of
filovirions. Filoviruses contain linear, non-segmented, single-stranded, antisense (often
called negative-sense) RNA genomes about 19 kb in length. Proteins NP, VP30, VP35 and
L form the nucleocapsid; proteins VP40 and VP24 form the viral matrix; and protein GP
forms the surface of the particle. In total, filoviruses contain seven proteins, each of
which functions in its own distinct way.
As mentioned above, Filoviridae can be classified into three genera: Cuevavirus,
Ebolavirus and Marburgvirus. This research focuses on the latter two, which are the better
known of the three. Using the decision tree and Apriori algorithms, we sought the
similarities and differences between the protein sequences of Ebolavirus and Marburgvirus
by analyzing the proteins VP24, VP35 and GP encoded by the RNA genome.
2.2 GP (Glycoprotein)
The glycoprotein (GP) is responsible for the entry of the virus into target host cells.
It is located on the surface of the particle, where it mediates this entry.
3 EXPERIMENT
relationships are proven. No rules can be extracted from VP24 of either virus under
windows 7 and 9.
Table 4 shows that VP35 of Ebola virus and VP35 of Marburg virus share the rule pos1 = D.
Although the frequencies differ (0.833 versus 0.800), sharing this amino acid at this
position might contribute strongly to the similarity of the two viruses. According
to Table 5, rules are extracted only for position 1 under the 7-window. We therefore assume
that position 1 is the crucial factor that makes Ebola and Marburg virus different from
each other. Lastly, no rules can be extracted under the 9-window.
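A frequency attached to a rule such as pos1 = D can be read as the fraction of windows in which the rule holds. The sketch below assumes that reading. The window lists are hypothetical and were chosen only so that the fractions come out as 5/6 and 4/5; the real window counts behind 0.833 and 0.800 are not given in the text.

```python
def rule_support(windows, position, residue):
    """Fraction of windows whose 1-based `position` holds `residue`."""
    hits = sum(1 for w in windows if w[position - 1] == residue)
    return hits / len(windows)


# Hypothetical 5-residue windows (not real VP35 data).
group_a = ["DLKQA", "DMKQA", "DLRQA", "DLKQS", "ELKQA", "DLKHA"]  # 5 of 6 start with D
group_b = ["DIKQA", "DLKQA", "NLKQA", "DLKEA", "DLKQT"]           # 4 of 5 start with D

support_a = round(rule_support(group_a, 1, "D"), 3)
support_b = round(rule_support(group_b, 1, "D"), 3)
```

A rule counts as shared when it passes the support threshold in both groups, even if the two support values are not identical.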
Figure 5, Figure 6, and Figure 7 show the analysis of VP24 of both viruses under windows 5,
7, and 9. 5-window: comparison between Ebolavirus amino1 Leucine and Marburgvirus
amino2 Leucine. 7-window: Ebolavirus amino3 Leucine and Marburgvirus amino1 Leucine.
9-window: Ebolavirus amino3 Leucine and Marburgvirus amino3 Leucine. From this analysis,
we concluded that VP24 of Ebolavirus has a higher Leucine content than VP24 of Marburgvirus.
Figure 8 and Figure 9 show the analysis of VP35 of both viruses under the 7- and 9-windows.
Under the 5-window, the two viruses shared no amino acids to compare, so we chose only
the 7- and 9-windows. 7-window: comparison between Ebolavirus amino4 Leucine and
Marburgvirus amino5 Leucine. 9-window: Ebolavirus amino6 Leucine and Marburgvirus
amino3 Leucine. In this case, the figures show that the Leucine content of VP35 is
generally higher in Marburgvirus than in Ebolavirus.
4 CONCLUSION
From the analysis with the Apriori algorithm, we found that the larger the windows are, the
more detailed and diverse the DNAs were. After using the Decision Tree, we were able to
identify similarities between Ebola virus and Marburg virus. Although rules were not
extracted from all windows, it was clear that Ebolavirus and Marburgvirus share similar
traits for GP,
REFERENCES
Henzy, Jamie. 7 Sept. 2014. “Five Questions about Filoviruses.” ‘Small Things Considered’ N.p., Web.
12 Mar. 2015.
Prins, K.C., W.B. Cardenas, and C.F. Basler. “Ebola Virus Protein VP35 Impairs the Function of
Interferon Regulatory Factor-Activating Kinases IKK and TBK-1.” 2009. Journal of Virology 83.7:
3069–077. Web.
Rokach, Lior, and Oded Maimon. 2010. “9. Decision Trees.” Data Mining and Knowledge Discovery
Handbook. N.p.: n.p., n.d. N. pag. Web.
Shi, Zhongzhi. 2011. Advanced Artificial Intelligence. Singapore: World Scientific.
World Health Organization. Sept. 2014. Ebola Virus Disease. WHO, Web. 11 Feb. 2015.
Centers for Disease Control and Prevention. Centers for Disease Control and Prevention, Web.
12 Apr. 2015.
Xu, Wei, Megan R. Edwards, Dominika M. Borek, Alicia R. Feagins, Anuradha Mittal, Joshua B.
Alinger, Kayla N. Berry, Benjamin Yen, Jennifer Hamilton, Tom J. Brett, Rohit V. Pappu, Daisy W.
Leung, Christopher F. Basler, and Gaya K. Amarasinghe. 2014. “Ebola Virus VP24 Targets a Unique
NLS Binding Site on Karyopherin Alpha 5 to Selectively Compete with Nuclear Import of
Phosphorylated STAT1.” Cell Host & Microbe 16.2: 187–200. Web.
Q.W. Dong
School of Computer Science, Fudan University, Shanghai, China
Department of Computational Medicine and Bioinformatics, University of Michigan,
Ann Arbor, Michigan, USA
Shanghai Key Laboratory of Intelligent Information Processing, Shanghai, China
K. Wang
College of Animal Science and Technology, Jilin Agricultural University, Changchun, China
ABSTRACT: Since the completion of the Human Genome Project, proteome research has
become one of the central problems of the post-genomics era. The Human Proteome Project aims
to identify at least one protein product from each of the human protein-coding genes by
experimental methods. However, many proteins still lack experimental evidence, which is one
of the major challenges of the chromosome-centric Human Proteome Project. Considering the
complexity of detecting these missing proteins by proteomics approaches, we predict their
structure and function using bioinformatics methods. 616 “uncertain” missing proteins are
extracted from the neXtProt database, and their structure and function are predicted
with the state-of-the-art software I-TASSER and COFACTOR respectively. A comprehensive
evaluation shows that the results are in good agreement with many manually curated
annotations from well-established databases and with mass spectrometry datasets. There are
188 foldable proteins (I-TASSER C-score larger than −1.5) without using any homologous
template, which may be native gene-coding proteins. The Gene Ontology function prediction
results are in good agreement with the manual annotation from the neXtProt database,
and the confidence scores are well correlated with the evaluation metrics, with a Pearson
correlation coefficient of 0.65. The data are deposited in the Human Proteome Structure and
Function database (HPSF), which can provide valuable references about the missing proteins.
The HPSF database is publicly available at http://zhanglab.ccmb.med.umich.edu/HPSF/.
1 INTRODUCTION
Proteins play an important role in biological activities. The successful completion of the
Human Genome Project (Venter et al., 2001) provided a valuable blueprint of all the genes
encoding the entire set of human proteins. However, due to the complexity of proteins and
currently underdeveloped proteomics techniques, many of the proteins have not been detected
and annotated. To explore the universal space of the human proteome, the Human Proteome
Organization (HUPO) has recently launched the Human Proteome Project (HPP) (Legrain et al.,
2011), including the Chromosome-centric Human Proteome Project (C-HPP) (Paik et al.,
2012) and the Biology/Disease-Driven HPP (B/D-HPP) (Aebersold et al., 2013). The primary
goal of the C-HPP is to identify at least one representative protein product, and as many
post-translational modifications and splice variant isoforms as possible, for each of the
human genes. This ambitious goal is pursued collaboratively by a 25-member international
consortium covering the 24 chromosomes and the mitochondria (Marko-Varga et al., 2013). The
HPP executive committee has established five baseline metrics for C-HPP (Marko-Varga et al.,
2013): the Ensembl database (Flicek et al., 2014) provides the number of protein-coding
genes; Peptide Atlas
2.2 Pipeline of structure and function prediction for the missing proteins
The procedure for the structure and function prediction is illustrated in Figure 1. For a
given protein sequence, the LOMETS program (Wu and Zhang, 2007) is first run to get the
possible templates from the PDB library, and then the ThreaDom program (Xue et al., 2013) is
used to get the domain information. If the result is a single domain, the I-TASSER software
(Roy et al., 2010) is used to obtain the final 3-D structure. Otherwise, the 3-D structure
of each domain is obtained by I-TASSER (Roy et al., 2010), and these structures are then
assembled into a single structure, which is refined with the FG-MD program (Zhang et al.,
2011) to obtain the final 3-D structure. The COFACTOR program (Roy et al., 2012) is then run
to get the function information, including ligand-binding sites, Gene Ontology terms and
Enzyme Classification. All the programs have been extensively tested and have achieved good
assessments in many community-wide experiments. For example, I-TASSER was ranked as the
No. 1 server for protein structure prediction in the recent CASP7 (Moult et al., 2007),
CASP8 (Moult et al., 2009), CASP9 (Moult et al., 2011) and CASP10 (Moult et al., 2014)
experiments. The COFACTOR algorithm was ranked as the best method for function prediction
in the CASP9 experiments (Moult et al., 2011).
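The branching in this procedure (model a single domain directly, or model each domain, assemble and refine) can be sketched as plain control flow. This is a sketch under stated assumptions: every callable below is a hypothetical stand-in for an external program and does not reflect the real command-line or programming interface of LOMETS, ThreaDom, I-TASSER or FG-MD. The function annotation step would then be run on the returned model.

```python
from typing import Callable, List


def predict_structure(sequence: str,
                      find_templates: Callable[[str], List[str]],
                      split_domains: Callable[[str, List[str]], List[str]],
                      model_domain: Callable[[str, List[str]], str],
                      assemble: Callable[[List[str]], str],
                      refine: Callable[[str], str]) -> str:
    """Control flow only; every callable stands in for an external program."""
    templates = find_templates(sequence)            # threading step
    domains = split_domains(sequence, templates)    # domain boundary step
    if len(domains) == 1:
        # Single domain: the domain model is the final structure.
        return model_domain(domains[0], templates)
    # Multiple domains: model each one, assemble, then refine.
    models = [model_domain(d, templates) for d in domains]
    return refine(assemble(models))


# Toy stand-ins so the sketch runs; they do not call the real tools.
stubs = dict(
    find_templates=lambda seq: ["template_1"],
    model_domain=lambda dom, tmpl: f"model({dom})",
    assemble=lambda models: "+".join(models),
    refine=lambda model: f"refined[{model}]",
)
two_domains = predict_structure("AAAABBBB",
                                split_domains=lambda s, t: [s[:4], s[4:]], **stubs)
one_domain = predict_structure("AAAA", split_domains=lambda s, t: [s], **stubs)
```

Passing the tools in as callables keeps the branching logic testable without any of the external programs installed.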
Figure 3. The distribution of COFACTOR C-score for PE1 (A) and PE5 (B) proteins respectively.
The predicted function is assessed based on the annotation in the neXtProt database and
the COFACTOR C-score. The COFACTOR C-score is defined based on the C-score of the
structure prediction and the global and/or local structural similarity between the predicted
models and their structural analogs in the PDB. The COFACTOR C-score is normalized to the
range from 0 to 1, where a high value indicates a high-confidence prediction. Among the 625
highly confident proteins, 592 proteins have annotations in the neXtProt database and one
protein has no COFACTOR prediction result, so the evaluation is carried out on 591 proteins.
The evaluation metrics used here are the protein-centric metrics used by the Critical
Assessment of Function Annotation (Radivojac et al., 2013): precision is defined as the
number of correctly predicted functional terms divided by the total number of predicted
terms, recall is defined as the number of correctly predicted functional terms divided by
the total number of annotated terms, and the F-score is the harmonic mean of precision and
recall. As shown in Figure 3(A), many of the proteins are predicted with a high COFACTOR
C-score. The Pearson correlation coefficient between the COFACTOR C-score and the F-score is
0.53, which means that the COFACTOR C-score is also a good indicator of prediction quality.
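The metrics defined above can be written out directly. The sketch below follows the per-protein definitions given in the text and adds a plain Pearson correlation; the function names and the term labels in the toy example are hypothetical, and any threshold sweeping or averaging used in the actual evaluation is not reproduced here.

```python
import math


def precision_recall_f(predicted, annotated):
    """Precision, recall and F-score for one protein's functional terms."""
    predicted, annotated = set(predicted), set(annotated)
    correct = len(predicted & annotated)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(annotated) if annotated else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


# Toy example with placeholder term labels (not real annotations).
p, r, f = precision_recall_f(["t1", "t2", "t3", "t4"], ["t1", "t2"])
```

With four predicted terms of which two are correct and two annotated terms, precision is 0.5, recall is 1.0 and the F-score is 2/3. Correlating per-protein confidence scores with per-protein F-scores via `pearson` gives a coefficient of the kind reported in the text.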
3.2 Summary of the predicted structure and function of the missing proteins
To provide the most comprehensive information, the structure and function of the missing
proteins are predicted in both homology and non-homology mode. In homology mode, all the
possible templates from the PDB library are used, regardless of whether they are homologous
to the missing proteins. In non-homology mode, only non-homologous templates are used, that
is, templates whose sequence identity to the target missing protein is below 30%.
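The difference between the two modes amounts to a filter on sequence identity. The sketch below is illustrative only: it assumes identity is computed over aligned columns in which neither sequence has a gap (other denominators are in common use), and the template names and identity values are hypothetical.

```python
def sequence_identity(aligned_a, aligned_b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


def select_templates(identities, mode, cutoff=30.0):
    """`identities` maps a template name to its percent identity to the target."""
    if mode == "homology":
        return sorted(identities)              # keep every template
    if mode == "non-homology":
        return sorted(name for name, ident in identities.items() if ident < cutoff)
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical templates and identities (not taken from the study).
identities = {"t1": sequence_identity("ACDE-F", "ACDQGF"), "t2": 12.5, "t3": 29.9}
all_templates = select_templates(identities, "homology")
distant_only = select_templates(identities, "non-homology")
```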
Table 2. Distribution of missing proteins in different gene loci types after HGNC
mapping.
In homology mode, the Pearson correlation coefficient between the COFACTOR C-score and the
F-score is 0.69, whereas in non-homology mode it is 0.65.
4 CONCLUSION
In this study, 616 missing proteins, which have the lowest confidence (PE5) in the neXtProt
database, have been investigated using bioinformatics methods. The structure and function of
these proteins have been predicted with cutting-edge software, I-TASSER and COFACTOR
respectively. The prediction is extensively evaluated against well-established evidence
about missing proteins, such as neXtProt annotation, HGNC gene loci annotation and mass
spectrometry datasets. The results show good consistency between the prediction
and the evidence for the proteins in PE5. There are 188 foldable proteins with a high
confidence score in the I-TASSER structure simulation without using any homologous
templates, indicating that these proteins may be native gene-coding proteins. Both the
structure and the function evaluations show that membrane proteins are over-represented
among the missing proteins in comparison with the highly confident proteins. Since membrane
proteins are hard to separate and purify, their detection is more difficult than that of
other proteins. The results indicate that there may be more membrane proteins among the
missing proteins.
REFERENCES
Aebersold, R., Bader, G.D., Edwards, A.M., et al. 2013. The biology/disease-driven human proteome
project (B/D-HPP): enabling protein research for the life sciences community. J Proteome Res, 12, 23–7.
Ashburner, M., Ball, C.A., Blake, J.A., et al. 2000. Gene ontology: tool for the unification of biology.
The Gene Ontology Consortium. Nat Genet, 25, 25–9.
S.P. Jang, S.H. Lee, S.M. Choi, H.S. Choi & T.S. Yoon
Natural Science, Hankuk Academy of Foreign Studies, HAFS, Yongin-si, Republic of Korea
ABSTRACT: The SARS (Severe Acute Respiratory Syndrome) coronavirus has hugely affected
humans for more than ten years. The virus's RNA replicase gene encodes the polyproteins
1a and 1ab, and the sequence of the polyproteins contains functional proteins that are
important for replication. The experiment was performed based on the “distorted key
theory”, with the aim of preventing the SARS coronavirus from replicating by inactivating
its main protease (also called CoV Mpro). After the experiment, a Neural Network (NN) was
used to reanalyze the results for the polypeptides of the virus. The NN approach
successfully distinguished fixed patterns in the sequences of the cleavage sites and
improved the understanding of the protease structure. Analyzing these patterns could lead
to a method of preventing virus replication using a competitive inhibitor.
1 INTRODUCTION
Severe Acute Respiratory Syndrome (SARS) is a respiratory disease that has strongly
affected both humans and animals. After the outbreak of SARS in Asia, the WHO declared
a coronavirus, classified as a single-stranded RNA virus of zoonotic origin, to be the
main cause of SARS. Between November 2002 and July 2003, SARS affected
8,273 individuals and caused 775 deaths (9.6% mortality rate) in multiple countries, with most
cases in Hong Kong (Chou K.C. 1996). The initial symptoms were a fever above
38 °C (100.4 °F) with nonspecific flu-like symptoms, including breathing difficulties.
The SARS coronavirus main protease (CoV Mpro) is an enzyme that processes the RNA replicase
polyprotein of the virus, a step that is essential for the virus to survive
(Marra Marco A. et al. 2003). Because the virus is so harmful to humans,
we decided to study the cleavage sites of CoV Mpro, which are regarded as a key
to developing drugs against SARS.
Based on the “distorted key theory”, we performed an experiment using the NN algorithm to
analyze the sequences of the cleavage sites and to increase the analytic accuracy, in order
to develop an effective amino acid sequence, which is a key factor in preventing viral
replication.
Figure 1. Each circle represents an individual node; the nodes are connected together, and
each arrow represents the output of one node and the input of another.
Figure 2. Chou’s distorted key theory: (a) the peptide is both cleaved by CoV Mpro and
effectively bound to the active site of the protease, while (b) the peptide becomes
non-cleavable through modification but is still bound to the active site. The modified
peptide, also called a “distorted key”, can become an effective inhibitor against SARS.
A Neural Network (NN), as the name suggests, is a computer program that imitates the
information processing of the biological nervous system. The nervous system consists of
basic units called neurons. Neurons are linked to each other, forming a huge network as a
whole. An individual neuron works as follows: a new electrical signal is received
through the dendrites, transferred down the axon, and finally sent to neighboring
neurons. The signal is received and passed on only when it exceeds a given threshold.
Also, a single neuron can be linked with multiple neurons at both ends.
Correspondingly, an NN is a set of nodes linked to each other in a given topology. Each
node has input and output links to other nodes. These links allow every node to interact
with the others, and the output value depends on whether the signal surpasses the
threshold, which corresponds to the processing in biological neurons. Reflecting this
correspondence, the units of an NN that actually process the given information are called
“neurons.”
Figure 3. The perceptron is the basic principle of the NN (Neural Network): it multiplies each input
by its weight value and sums the products, and the result is obtained from this sum.
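The weighted sum and threshold described above can be sketched as follows. This is a minimal illustration, not the network used in the experiment: the function name `perceptron`, the example weights, and the threshold value are all assumptions chosen for clarity.

```python
def perceptron(inputs, weights, threshold=0.0):
    """Return 1 if the weighted sum of the inputs exceeds the threshold, else 0.

    Illustrative single unit only; weights and threshold are hypothetical.
    """
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    # Multiply each input by its weight and add up the products.
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    # The unit "fires" only when the sum surpasses the threshold.
    return 1 if weighted_sum > threshold else 0


if __name__ == "__main__":
    print(perceptron([1, 0, 1], [0.5, 0.5, 0.5], threshold=0.7))
    print(perceptron([1, 0, 0], [0.5, 0.5, 0.5], threshold=0.7))
```

In the first call the weighted sum is 1.0, which exceeds 0.7, so the unit outputs 1. In the second call the sum is 0.5, so the output is 0.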
In our experiment, we labeled the 20 amino acids with numbers ranging from 1 to 20. Next,
these numbers were given as inputs to the NN. The products of the inputs and the weight
factors were fed into the neuron, and the output value was obtained from the previously designed NN.
By analyzing the frequency of particular amino acids at specific positions, it was possible to identify
the sequence patterns.
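The encoding and frequency analysis described above might look like the sketch below. The text does not say which number was assigned to which amino acid, so the alphabetical ordering of one-letter codes is an assumption, as are the helper names `encode_peptide` and `position_frequencies` and the two example peptides.

```python
from collections import Counter

# The 20 standard amino acids as one-letter codes.
# ASSUMPTION: alphabetical order; the source does not give the mapping.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_TO_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS, start=1)}


def encode_peptide(peptide):
    """Map each residue of a peptide to its number in the range 1..20."""
    return [AA_TO_INDEX[aa] for aa in peptide]


def position_frequencies(peptides):
    """Count how often each amino acid occurs at each position.

    All peptides must have the same length (e.g. all 8-mers).
    """
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("all peptides must have the same length")
    return [Counter(p[i] for p in peptides) for i in range(length)]


if __name__ == "__main__":
    # Hypothetical example peptides, for illustration only.
    examples = ["AVLQSGFR", "TVLQAGFK"]
    print(encode_peptide(examples[0]))
    for pos, counts in enumerate(position_frequencies(examples), start=1):
        print(pos, dict(counts))
```

Positions where one residue dominates the counts are the ones most likely to yield a rule.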
The polyprotein sequences related to CoV Mpro can be divided into two groups: 8-mers and
12-mers. The rules found for the two groups were considerably different. The sequences came from
the following genomes: NC_004718
(TOR2), NC_002645 (HCoV 229E), NC_001846 (MHV), NC_003045 (BCoV), NC_001451
(IBV), NC_002306 (TGEV), NC_003436 (PEDV), AF208067 (MHVM), AF201929 (MHV2),
AF208066 (MHVP), AY278741 (Urbani), AY278488 (BJ01), AY278554 (CUHK-W1),
AY282752 (CUHK-su10), and AY291451 (TW1). The experiment was performed with the data
of 154 8-mers and 45 12-mers within 14 cleavage sites. The data were obtained from ‘Mining
SARS-CoV protease cleavage data using non-orthogonal decision trees, a novel method for
decisive template selection’ by Zheng Rong Yang (Zheng Rong Yang 2005).
3 RESULTS
The NN produced a strong rule in the 8-mers experiment. In the 12-mers experiment, however, we
could not find a specific rule, because the results showed neither high accuracy nor a strong rule. The
results for 8-mers are given in Table 1.
3.1 8-mers
The amino acids found at each position when applying the neural network algorithm to the 8-mers are as follows:
amino1 in {A,I,K,N,P,S,T,V}
amino2 in {A,D,F,G,I,K,L,M,N,Q,R,S,T,V,B}
amino3 in {F,I,L,M,V}
amino4 in {Q}
amino5 in {A,C,G,N,S}
amino6 in {A,C,E,F,G,I,K,L,N,R,S,T,V}
amino7 in {A,D,E,F,G,I,K,L,M,N,P,Q,R,S,T,W,Y}
amino8 in {A,D,E,G,H,I,K,L,M,N,P,Q,R,S,T,V,W}
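One way to apply the position sets listed above is to check whether a candidate 8-mer has a listed residue at every position. The sketch below reads the sets as allowed residues per position, which is an interpretation of the listing rather than a method stated in the text. The names `RULE_8MER` and `matches_rule` and the test peptides are hypothetical.

```python
# Position-wise residue sets copied from the 8-mers listing above.
RULE_8MER = [
    set("AIKNPSTV"),            # amino1
    set("ADFGIKLMNQRSTVB"),     # amino2
    set("FILMV"),               # amino3
    set("Q"),                   # amino4
    set("ACGNS"),               # amino5
    set("ACEFGIKLNRSTV"),       # amino6
    set("ADEFGIKLMNPQRSTWY"),   # amino7
    set("ADEGHIKLMNPQRSTVW"),   # amino8
]


def matches_rule(peptide, rule=RULE_8MER):
    """Return True if every residue is in the set listed for its position."""
    if len(peptide) != len(rule):
        return False
    return all(aa in allowed for aa, allowed in zip(peptide, rule))


if __name__ == "__main__":
    # Hypothetical peptides, for illustration only.
    print(matches_rule("AVLQSGFR"))  # Q at position 4
    print(matches_rule("AVLASGFR"))  # A at position 4, not in {Q}
```

Because position 4 admits only Q, any peptide without Q there fails the check regardless of its other residues.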
3.2 12-mers
Unexpectedly, no key rule was found in the experiment involving the 12-mers. The amino acids
present at the individual positions did not differ enough to be
considered a key factor.
4 CONCLUSIONS
REFERENCES
Chou K.C. (1996). “Review: Prediction of human immunodeficiency virus protease cleavage sites in
proteins”. Analytical Biochemistry 233: 1–14. doi:10.1006/abio.1996.0001. PMID 8789141.
Hansen, Lars Kai, and Peter Salamon. “Neural network ensembles.” IEEE transactions on pattern
analysis and machine intelligence 12.10 (1990): 993–1001.
Marra, Marco A., et al. “The genome sequence of the SARS-associated coronavirus.” Science 300.5624
(2003): 1399–1404.
Qi-Shi Du, Hao Sun and Kuo-Chen Chou, “Inhibitor Design for SARS Coronavirus Main Protease
Based on Distorted Key Theory,” Medicinal Chemistry, 2007, 3, 1–6.
Shinyoung Lee, Jisue Kang, Jiwoo Oh, Yoonjoo Kim, Jungwon Baek and Taeseon Yoon, “Prediction of
SARS Coronavirus Main Protease by Support Vector Machine”, IACSIT Press Vol. 59, 2014.
Zheng Rong Yang, “Mining SARS-CoV protease cleavage data using non-orthogonal decision trees;
a novel method for decisive template selection,” Bioinformatics, Vol. 21 no. 11 2005, pages 2644–2650.
doi:10.1093/bioinformatics/bti404.