An Ensemble of Support Vector Machines For Predicting Virulent Proteins

This document presents an ensemble machine learning approach for predicting bacterial virulent proteins based on features extracted directly from amino acid sequences. The proposed method trains an ensemble of support vector machine classifiers on different feature subsets and representations, including amino acid indices, substitution matrices, and clustering-based groupings. An evaluation using three independent test datasets demonstrated the validity and improved performance of this ensemble approach compared to single classifier methods based only on amino acid sequence features.


Expert Systems with Applications 36 (2009) 7458-7462

Contents lists available at ScienceDirect

Expert Systems with Applications

journal homepage: www.elsevier.com/locate/eswa

An ensemble of support vector machines for predicting virulent proteins

Loris Nanni *, Alessandra Lumini
DEIS, University of Bologna, via Venezia 52, 47023 Cesena, Italy

Abstract

It is important to develop a reliable system for predicting bacterial virulent proteins, both for finding novel drugs and vaccines and for understanding virulence mechanisms in pathogens. In this work we propose a bacterial virulent protein prediction method based on an ensemble of classifiers in which the features are extracted directly from the amino acid sequence of a given protein. It is well known in the literature that features extracted from the evolutionary information of a protein perform better than features extracted from the amino acid sequence. Our method tries to fill the gap between the amino acid sequence based approaches and the evolutionary information based approaches. An extensive evaluation according to a blind testing protocol, where the parameters of the system are calculated using the training set and the system is validated on three different independent datasets, has demonstrated the validity of the proposed method. © 2008 Elsevier Ltd. All rights reserved.

Keywords: Virulent proteins; Machine learning; Ensemble of classifiers; Support vector machines

1. Introduction

The aim of this paper is to propose a novel ensemble of classifiers for predicting virulent proteins (Hastings, Paget-McNicol, & Saul, 2004; Weiss, 2002). The virulence factors of bacteria are typically proteins that are coded by genes in the chromosomal DNA. An example of a virulent protein is shown in Fig. 1 (from Lilic, Vujanac, and Stebbins (2006)). In recent years there has been an increasing threat from drug-resistant strains of infectious agents (Morens, Folkers, & Fauci, 2004). The first pathogen genome to be sequenced, in 1995, was that of Haemophilus influenzae (Fleischmann et al., 1995). Nowadays, 532 microbial genomes have been completely sequenced (Liolios, Tavernarakis, Hugenholtz, & Kyrpides, 2006) and a large number of virulent proteins have been discovered.

Several methods for predicting virulent proteins have been proposed in the literature. The first methods developed were similarity search methods such as BLAST (Altschul, Gish, Miller, Myers, & Lipman, 1990) and PSI-BLAST (Altschul et al., 1997). More recently, machine learning algorithms for predicting virulent proteins have been proposed: Sachdeva, Kumar, Jain, and Ramachandran (2005) propose a neural network based prediction of virulence factors; Garg and Gupta (2008) propose an ensemble of support vector machines (SVMs) in which the different SVM classifiers were trained with sequence features of bacterial virulent proteins such as amino acid composition, 2-gram composition, higher order dipeptide composition and evolutionary information.

Several methods have been proposed in the literature to extract a feature vector from the primary sequence of a protein. Most of these methods were proposed for subcellular location prediction (Chou & Shen, 2007a). Garg and Gupta (2008) show that prediction methods for virulent proteins based on features extracted directly from the amino acid sequence do not perform as well as feature extraction methods based on evolutionary information.

In this paper, we address the virulent protein prediction problem using an ensemble of support vector machines trained on features extracted directly from the amino acid sequence. We show that the ensemble of classifiers boosts the performance of a system based on the amino acid sequence. The good performance of ensembles of classifiers, with respect to that obtained by stand-alone methods, is well known, and several examples have been published in the bioinformatics literature. Ensemble methods have been applied to protein secondary structure prediction (Riis & Krogh, 1996), protein fold pattern prediction (Shen & Chou, 2006), protein subcellular localization prediction (Chou & Shen, 2007d), membrane protein type prediction (Chou & Shen, 2007b), and signal peptide prediction (Chou & Shen, 2007c). To build the ensemble of classifiers we select a set of amino acid indices, where each amino acid index is used to train a random subspace of radial basis function support vector machines.

* Corresponding author. E-mail addresses: [email protected], [email protected] (L. Nanni). 0957-4174/$ - see front matter © 2008 Elsevier Ltd. All rights reserved. doi:10.1016/j.eswa.2008.09.036


Fig. 2. Scheme of a genetic algorithm (from http://www.nitrogen.za.org/viewtutorial.asp?id=4).

Fig. 1. The InvB molecule in Salmonella (blue and yellow) binds (purple arrow) the Salmonella invasion protein (red). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

2. Proposed algorithm

The ensemble proposed here is based on the perturbation of the AAIndexLoc system proposed in Tantoso and Li (in press). For each protein P, AAIndexLoc extracts a feature vector obtained by concatenating the following features:

- Amino acid (AA) composition: the percentage or fraction of amino acid y in P.
- Weighted AA composition: for amino acid y, this is defined as (amino acid composition of y) × (index value a for amino acid y).
- Five-level grouping composition: the amino acids are classified into five groups, based on their amino acid index values, by k-means clustering, and then the five-level dipeptide composition is computed. The five-level dipeptide composition is defined as the composition of the occurrences of two consecutive groups (see Tantoso and Li (in press) for more details). Since there are five groups, there are 25 combinations of two consecutive groups.

Each protein is thus represented by 70 features: 20 input features for AA composition; 20 input features for weighted AA composition; 25 input features for the five-level grouping composition. Moreover, in the original work each protein sequence is divided into three parts and a feature vector is extracted from each part. In the problem studied in this paper we obtained the same performance with the features extracted from the whole protein sequence as with the features extracted from the three parts of the protein.

We propose the following modifications to the original method of Tantoso and Li (in press) for building an ensemble of classifiers:

- A different classifier is trained using a different amino acid index to calculate the weighted AA composition. In this work 494 amino acid indices and 83 substitution matrices from Kawashima and Kanehisa (2000) have been encoded. To reduce the number of amino acid indices/substitution matrices, a sequential forward floating selection (SFFS) (Pudil, Novovicova, & Kittler, 1994) feature selection approach has been adopted (as in Nanni and Lumini (2006)), where the objective function is the maximization of the area under the receiver operating characteristic (ROC) curve (Fawcett, 2004) on the training set; in this way the number of features used for classification is reduced to K (different values of K ranging from 1 to 15 have been tested in the experiments). Each feature (i.e. an amino acid index or a substitution matrix) selected by SFFS is used to build a different weighted AA composition. Each weighted AA composition (concatenated with the AA composition and the five-level grouping composition) is used to train a different classifier.
- A genetic algorithm¹ is used for clustering the amino acids into five groups. The scheme of a genetic algorithm is shown in Fig. 2. Genetic algorithms (implemented as in the GAOT MATLAB toolbox) are a class of optimization methods inspired by the process of natural evolution (Goldberg, 1989; Goldberg, 2002). The objective function of the genetic algorithm is the maximization of the area under the receiver operating characteristic curve on the training set. In the encoding scheme, the chromosome is a string of length 20 (the number of amino acids). Each value in the chromosome specifies the group to which a given amino acid belongs (as in Nanni and Lumini (2008)).
- A random subspace (Ho, 1998) of radial basis function support vector machines (Cristianini & Shawe-Taylor, 2000) is used in the classification step. The random subspace method (Ho, 1998) creates ensembles by selecting a different subset of the features to train each classifier of the ensemble. Given a training set T containing patterns represented by Q features, the random subspace method generates M (M = 50 in this paper) new training sets T1, ..., TM, each containing Q × K features (0 < K < 1, K = 0.5 in this paper; this K is the fraction of retained features, not the number of features selected by SFFS). Then M classifiers are trained using these generated training sets and combined by the sum rule (Kittler, 1998).

The classifier is the radial basis function support vector machine² with the default parameters (C = 1, Gamma = 1). SVMs are widely considered the state of the art among machine learning classifiers.
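The random subspace construction and the sum rule described above can be illustrated with a short sketch. It is not the authors' MATLAB implementation: the function names are hypothetical, and because the Python standard library has no SVM, a nearest-centroid scorer is used as a loudly labelled stand-in for the RBF SVM. Only the values M = 50 and the 0.5 feature fraction come from the text.

```python
import random

def random_subspaces(n_features, n_members=50, fraction=0.5, seed=0):
    """Draw n_members feature-index subsets, each with fraction * n_features indices."""
    rng = random.Random(seed)
    size = max(1, round(n_features * fraction))
    return [sorted(rng.sample(range(n_features), size)) for _ in range(n_members)]

def project(rows, subset):
    """Keep only the columns listed in subset."""
    return [[row[j] for j in subset] for row in rows]

def train_centroid_scorer(rows, labels):
    """Stand-in base learner (NOT an SVM): positive score means closer to class 1."""
    def centroid(cls):
        members = [r for r, y in zip(rows, labels) if y == cls]
        return [sum(col) / len(members) for col in zip(*members)]
    pos, neg = centroid(1), centroid(0)
    def score(x):
        d_pos = sum((a - b) ** 2 for a, b in zip(x, pos))
        d_neg = sum((a - b) ** 2 for a, b in zip(x, neg))
        return d_neg - d_pos
    return score

def fit_ensemble(rows, labels, train_fn, n_members=50, fraction=0.5, seed=0):
    """Train one base classifier per random feature subset."""
    subsets = random_subspaces(len(rows[0]), n_members, fraction, seed)
    return [(s, train_fn(project(rows, s), labels)) for s in subsets]

def sum_rule(ensemble, x):
    """Combine the members by summing their scores on the projected pattern."""
    return sum(scorer([x[j] for j in subset]) for subset, scorer in ensemble)

# Toy data (assumed for illustration): class 1 has larger values in every feature.
X = [[0.0, 0.1, 0.0, 0.2], [0.1, 0.0, 0.2, 0.1],
     [1.0, 0.9, 1.1, 1.0], [0.9, 1.0, 1.0, 1.2]]
y = [0, 0, 1, 1]
ensemble = fit_ensemble(X, y, train_centroid_scorer)
print(sum_rule(ensemble, [1.0, 1.0, 1.0, 1.0]) > 0)
```

Passing the base learner as `train_fn` keeps the subspace and fusion logic independent of the classifier, so an actual RBF SVM could be substituted where one is available.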
The goal of SVMs is to establish the equation of a hyperplane that divides the feature space (as shown in Fig. 3), leaving all the points of the same class on the same side, while maximizing the distance between the two classes and the hyperplane.
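The sequence-derived feature vector described at the start of this section can be sketched as follows. This is a minimal illustration, not the AAIndexLoc code: it builds only the three blocks as itemized (20 + 20 + 25 values), the function names are hypothetical, and the index values and the five-group assignment are placeholders rather than a real AAindex entry or a k-means result.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def aa_composition(seq):
    """Fraction of each of the 20 standard amino acids in the sequence."""
    counts = Counter(seq)
    total = max(1, sum(counts[a] for a in AMINO_ACIDS))
    return [counts[a] / total for a in AMINO_ACIDS]

def weighted_composition(seq, index):
    """Composition of each amino acid multiplied by its index value."""
    return [c * index[a] for a, c in zip(AMINO_ACIDS, aa_composition(seq))]

def five_level_dipeptide(seq, group_of):
    """Fractions of the 25 ordered pairs of consecutive groups (group ids 0..4)."""
    groups = [group_of[a] for a in seq if a in group_of]
    pairs = Counter(zip(groups, groups[1:]))
    n = max(1, len(groups) - 1)
    return [pairs[(g, h)] / n for g in range(5) for h in range(5)]

def feature_vector(seq, index, group_of):
    """Concatenate the three feature blocks."""
    return (aa_composition(seq)
            + weighted_composition(seq, index)
            + five_level_dipeptide(seq, group_of))

# Placeholder inputs (assumptions): evenly spaced index values and a
# round-robin assignment of amino acids to five groups.
toy_index = {a: i / 19 for i, a in enumerate(AMINO_ACIDS)}
toy_groups = {a: i % 5 for i, a in enumerate(AMINO_ACIDS)}

fv = feature_vector("MKTAYIAKQR", toy_index, toy_groups)
print(len(fv))
```

In the ensemble, one such vector would be built per selected amino acid index, changing only the `index` argument.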

¹ The initial population is a randomly generated set of chromosomes; then a fixed number E (in this paper E = 5) of generation steps is performed by applying the following basic operators: selection, crossover and mutation. Selection: the selection strategy is cross-generational. Assuming a population of size D (in this paper D = 10), the offspring doubles the size of the population and the best D individuals from the combined parent-offspring population are retained. Crossover: uniform crossover is used; the crossover probability is fixed to 0.96 in the experiments. Mutation: the mutation probability is 0.02.

² Implemented as in the OSU SVM Matlab Toolbox.
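The genetic algorithm settings given in the first footnote can be put together in a small sketch. It is an illustration under stated assumptions, not the GAOT implementation: parents are paired at random and the mutation probability is applied per gene, neither of which is specified in the text, and the fitness function is a toy placeholder (in the paper the objective is the area under the ROC curve on the training set).

```python
import random

def evolve_grouping(fitness, n_genes=20, n_groups=5, pop_size=10,
                    generations=5, p_cross=0.96, p_mut=0.02, seed=0):
    """Evolve a chromosome assigning each of 20 amino acids to one of 5 groups."""
    rng = random.Random(seed)
    pop = [[rng.randrange(n_groups) for _ in range(n_genes)]
           for _ in range(pop_size)]
    for _ in range(generations):
        offspring = []
        while len(offspring) < pop_size:
            a, b = rng.sample(pop, 2)  # random parent pairing (assumption)
            if rng.random() < p_cross:
                # Uniform crossover: each gene comes from either parent.
                child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            else:
                child = list(a)
            # Per-gene mutation (assumption): reassign the gene to a random group.
            child = [rng.randrange(n_groups) if rng.random() < p_mut else g
                     for g in child]
            offspring.append(child)
        # Cross-generational selection: best pop_size of parents plus offspring.
        pop = sorted(pop + offspring, key=fitness, reverse=True)[:pop_size]
    return pop[0]

def toy_fitness(chromosome):
    """Placeholder objective: agreement with an arbitrary target grouping."""
    return sum(1 for i, g in enumerate(chromosome) if g == i % 5)

best = evolve_grouping(toy_fitness)
print(best, toy_fitness(best))
```

Because the best individuals of the combined population are always retained, the best fitness never decreases from one generation to the next.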

Fig. 3. Scheme of the SVM (from http://www.cac.science.ru.nl/people/ustun/SVM.JPG).

3. Experiments

We have used the same datasets used in Garg and Gupta (2008):

- Training set: the bacterial virulent protein sequences were retrieved from SWISS-PROT (Bairoch & Apweiler, 2000) and VFDB, an integrated and comprehensive database of virulence factors of bacterial pathogens (Chen et al., 2005). It consists of 1025 virulent and 1030 non-virulent sequences and is freely available at the VirulentPred web server site at http://bioinfo.icgeb.res.in/virulent. This dataset was used for training the SVM classifiers and for building the ensemble of classifiers.
- Independent dataset 1: the SPAAN dataset (Sachdeva et al., 2005), which consists of 469 adhesin and 703 non-adhesin proteins.
- Independent dataset 2: this independent dataset consists of 83 SWISS-PROT sequences (40 virulent and 43 non-virulent protein sequences); no two sequences in this dataset are more than 40% similar.
- Independent dataset 3: this dataset consists of 141 virulent and 143 non-virulent sequences from bacterial pathogens: Campylobacter (39 virulent and 40 non-virulent protein sequences); Neisseria (25 virulent and 24 non-virulent); Bordetella (27 virulent and 27 non-virulent sequences); Haemophilus (35 virulent and 35 non-virulent); Listeria (15 virulent and 17 non-virulent).

A summary of the characteristics of these datasets is reported in Table 1.

Table 1
Characteristics of the datasets used in the experimentation.

Dataset                  Virulent proteins   Non-virulent proteins
Training set             1025                1030
Independent dataset 1    469                 703
Independent dataset 2    40                  43
Independent dataset 3    141                 143

As performance indicator we have used the error under the ROC curve (EUC)³ (Fawcett, 2004). The EUC is a scalar measure of performance which can be interpreted as the probability that the classifier will assign a lower score to a randomly picked virulent protein sample than to a randomly picked non-virulent protein sample.

In Fig. 4 we plot the performance on the training set varying the number K of classifiers selected by SFFS; in this test a stand-alone support vector machine is used as the classifier and the amino acids are clustered by k-means. Moreover, we want to stress that SFFS is run considering only the training data; the features selected on the training data are then used to classify the test data.

Fig. 4. EUC in the training set varying the number K of classifiers selected by SFFS. [Plot: EUC versus K, for K = 1 to 15.]

In Fig. 5 we report, for independent dataset 1, the EUC obtained by:

- Two-gram, the best method used in Garg and Gupta (2008) for extracting features from the amino acid sequence;
- SA, varying the number K of features selected by SFFS; in this test a stand-alone support vector machine is used as the classifier and the amino acids are clustered by k-means;
- RS, varying the number K of features selected by SFFS; in this test a random subspace of support vector machines is used as the classifier and the amino acids are clustered by k-means;
- GA, varying the number K of features selected by SFFS; in this test a random subspace of support vector machines is used as the classifier and the amino acids are clustered by the genetic algorithm.

Fig. 5. EUC in the independent dataset 1 varying the number K of classifiers selected by SFFS. [Plot: EUC versus K, for K = 1 to 10; series SA, RS, GA and 2-Gram.]

³ Implemented as in DDtool 0.95 Matlab Toolbox.
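The EUC used as performance indicator can be computed directly from its probabilistic interpretation, that is, as one minus the area under the ROC curve. The sketch below is a plain pairwise implementation with a hypothetical function name, not the DDtool routine; counting tied scores as half an error is an assumption that follows the usual AUC convention.

```python
def euc(scores, labels):
    """Error under the ROC curve: fraction of (virulent, non-virulent) pairs
    in which the virulent sample gets the lower score; ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wrong = sum(1.0 if p < n else 0.5 if p == n else 0.0
                for p in pos for n in neg)
    return wrong / (len(pos) * len(neg))

# One of the four (virulent, non-virulent) pairs is ranked in the wrong order.
print(euc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))  # 0.25
```

The pairwise loop is quadratic in the number of samples, which is fine for datasets of the sizes listed in Table 1; a rank-based formula would be preferable for much larger sets.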


Notice that the performance obtained by SA with K = 1 is the performance obtained using the stand-alone AAIndexLoc (without the multi-classification method proposed in this paper). It is clear that the ensembles proposed here outperform the stand-alone method. In Figs. 6 and 7, the EUC obtained by the baseline 2-gram and by GA is also reported for independent dataset 2 and independent dataset 3.

The following conclusions can be drawn from the results reported in this section:

- The best proposed system (named GA in the previous figures) performs better than 2-gram (the best system based on the amino acid sequence used in Garg and Gupta (2008)).
- Our method tries to fill the performance gap between the amino acid sequence based features and the evolutionary information based features. Garg and Gupta (2008) show that the fusion of sequence based methods with the evolutionary information based method outperforms the evolutionary information based method alone. Since the proposed method outperforms 2-gram, we hope that the fusion of our best proposed system with the evolutionary information based method will outperform the fusion of the 2-gram method with the evolutionary information based method.

Fig. 6. EUC in the independent dataset 2 varying the number K of classifiers selected by SFFS. [Plot: EUC versus K, for K = 1 to 10; series GA and 2-Gram.]

Fig. 7. EUC in the independent dataset 3 varying the number K of classifiers selected by SFFS. [Plot: EUC versus K, for K = 1 to 6; series GA and 2-Gram.]

4. Conclusions

In this paper, we have presented methods based on ensembles of classifiers for virulent protein prediction. An extensive evaluation on a large dataset according to a blind testing protocol has demonstrated the superiority of these ensembles with respect to the stand-alone approaches. We have demonstrated that our system, based on features extracted from the amino acid sequence, efficiently classifies sequences not used in training, including those from organisms not present in the training set. Please note that all the reported results have been obtained without any kind of parameter optimization for the SVMs used in the ensemble; we have simply used the default parameters.

Acknowledgment

The authors would like to thank Aarti Garg and Dinesh Gupta for sharing the datasets used in this paper.

References

Altschul, S. F., Gish, W., Miller, W., Myers, E. W., & Lipman, D. J. (1990). Basic local alignment search tool. Journal of Molecular Biology, 215, 403-410.
Altschul, S. F., Madden, T. L., Schaffer, A. A., Zhang, J., Zhang, Z., Miller, W., et al. (1997). Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Research, 25, 3389-3402.
Bairoch, A., & Apweiler, R. (2000). The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Research, 28, 45-48.
Chen, L., Yang, J., Yu, J., Yao, Z., Sun, L., Shen, Y., et al. (2005). VFDB: A reference database for bacterial virulence factors. Nucleic Acids Research, 33, D325-D328.
Chou, K. C., & Shen, H. B. (2007a). Review, recent progresses in protein subcellular location prediction. Analytical Biochemistry, 370, 1-16.
Chou, K. C., & Shen, H. B. (2007b). MemType-2L, a web server for predicting membrane proteins and their types by incorporating evolution information through Pse-PSSM. Biochemical and Biophysical Research Communications, 360, 339-345.
Chou, K. C., & Shen, H. B. (2007c). Signal-CF, a subsite-coupled and window-fusing approach for predicting signal peptides. Biochemical and Biophysical Research Communications, 357, 633-640.
Chou, K. C., & Shen, H. B. (2007d). Euk-mPLoc: A fusion classifier for large-scale eukaryotic protein subcellular location prediction by incorporating multiple sites. Journal of Proteome Research, 6, 1728-1734.
Cristianini, N., & Shawe-Taylor, J. (2000). An introduction to support vector machines and other kernel-based learning methods. Cambridge University Press.
Fawcett, T. (2004). ROC graphs: Notes and practical considerations for researchers. Technical report. Palo Alto, USA: HP Laboratories.
Fleischmann, R. D., Adams, M. D., White, O., Clayton, R. A., Kirkness, E. F., Kerlavage, A. R., et al. (1995). Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science, 269, 496-512.
Garg, A., & Gupta, D. (2008). VirulentPred: A SVM based prediction method for virulent proteins in bacterial pathogens. BMC Bioinformatics, 9, 62. doi:10.1186/1471-2105-9-62.
Goldberg, David E. (1989). Genetic algorithms in search, optimization and machine learning. Boston, MA: Kluwer Academic Publishers.
Goldberg, David E. (2002). The design of innovation: Lessons from and for competent genetic algorithms. Reading, MA: Addison-Wesley.
Hastings, I. M., Paget-McNicol, S., & Saul, A. (2004). Can mutation and selection explain virulence in human P. falciparum infections? Malaria Journal, 2, 3.
Ho, T. K. (1998). The random subspace method for constructing decision forests. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(8), 832-844.
Kawashima, S., & Kanehisa, M. (2000). AAindex: Amino acid index database. Nucleic Acids Research, 28, 374.
Kittler, J. (1998). On combining classifiers. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(3), 226-239.
Lilic, M., Vujanac, M., & Stebbins, C. E. (2006). A common structural motif in the binding of virulence factors to bacterial secretion chaperones. Molecular Cell, 21, 653-664.
Liolios, K., Tavernarakis, N., Hugenholtz, P., & Kyrpides, N. C. (2006). The Genomes on line database (GOLD) v.2: A monitor of genome projects worldwide. Nucleic Acids Research, 34, D332-D334.
Morens, D. M., Folkers, G. K., & Fauci, A. S. (2004). The challenge of emerging and reemerging infectious diseases. Nature, 430, 242-249.
Nanni, L., & Lumini, A. (2006). An ensemble of K-local hyperplane for predicting protein-protein interactions. Bioinformatics, 10(22), 1207-1210.
Nanni, L., & Lumini, A. (2008). A genetic approach for building different alphabets for peptide and protein classification. BMC Bioinformatics, 9, 45.
Pudil, P., Novovicova, J., & Kittler, J. (1994). Floating search methods in feature selection. Pattern Recognition Letters, 15, 1119-1125.
Riis, S. K., & Krogh, A. (1996). Improving prediction of protein secondary structure using neural networks and multiple sequence alignments. Journal of Computational Biology, 3, 163-183.
Sachdeva, G., Kumar, K., Jain, P., & Ramachandran, S. (2005). SPAAN: A software for prediction of adhesins and adhesin-like proteins using neural networks. Bioinformatics, 21, 483-491.
Shen, H. B., & Chou, K. C. (2006). Ensemble classifier for protein fold pattern recognition. Bioinformatics, 22, 1717-1722.
Tantoso, E., & Li, K.-B. (in press). AAIndexLoc: Predicting subcellular localization of proteins based on a new representation of sequences using amino acid indices. Amino Acids. On-line version doi:10.1007/s00726-007-0616-y.
Weiss, R. A. (2002). Virulence and pathogenesis. Trends in Microbiology, 10, 314-317.
