Automated alphabet reduction with evolutionary algorithms for protein structure prediction

May 16, 2007 · 2 likes · 813 views

This paper focuses on automated procedures for reducing the dimensionality of protein structure prediction datasets by simplifying the way in which the primary sequence of a protein is represented. The potential benefits of this procedure are a faster and easier learning process and the generation of more compact, human-readable classifiers. The dimensionality reduction procedure we propose consists of reducing the 20-letter amino acid (AA) alphabet, which is normally used to specify a protein sequence, to a lower-cardinality alphabet. This reduction is obtained by clustering AA types according to their physical and chemical similarity. Our automated reduction procedure is guided by a fitness function based on the Mutual Information between the AA-based input attributes of the dataset and the protein structure feature that is being predicted. To search for the optimal reduction, the Extended Compact Genetic Algorithm (ECGA) was used, and afterwards the results of this process were fed into (and validated by) BioHEL, a genetics-based machine learning technique. BioHEL used the reduced alphabet to induce rules for protein structure prediction features. BioHEL results are compared to two standard machine learning systems. Our results show that it is possible to reduce the size of the alphabet used for prediction from twenty to just three letters, resulting in more compact, i.e. interpretable, rules. Also, a protein-wise accuracy performance measure suggests that the loss of accuracy accrued by this substantial alphabet reduction is not statistically significant when compared to the full alphabet.
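As a rough illustration of the idea in the abstract, the sketch below maps a sequence onto a reduced alphabet and scores the mapping by the empirical mutual information between the reduced letters and a per-residue label. The toy sequence, the labels, the two-group alphabet, and the function names are all assumptions made for illustration; the paper's actual fitness function, datasets, and ECGA search are not reproduced here.

```python
from collections import Counter
from math import log2

def reduce_sequence(seq, groups):
    """Map each amino acid letter to the index of the group that contains it."""
    letter_to_group = {aa: i for i, group in enumerate(groups) for aa in group}
    return [letter_to_group[aa] for aa in seq]

def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two equal-length label lists."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )

# Hypothetical toy data: ten residues and a made-up binary structural label per residue.
residues = "AVLKRDEAVL"
labels = [1, 1, 1, 0, 0, 0, 0, 1, 1, 1]

# Hypothetical two-letter alphabet: one group of hydrophobic and one of charged residues.
groups = ["AVL", "KRDE"]
reduced = reduce_sequence(residues, groups)
score = mutual_information(reduced, labels)
```

A search procedure would compare such scores across candidate groupings; a grouping that merges everything into a single letter carries no information about the label and scores zero.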

Automated alphabet reduction with evolutionary algorithms for protein structure prediction

This paper analyzes the relative advantages of crossover and mutation on a class of deterministic and stochastic additively separable problems with substructures of non-uniform salience. The study assumes that the recombination and mutation operators have knowledge of the building blocks (BBs) and effectively exchange or search among competing BBs. Facetwise models of convergence time and population sizing are used to determine the scalability of each algorithm. The analysis shows that for deterministic exponentially-scaled additively separable problems, BB-wise mutation is more efficient than crossover, yielding a speedup of Θ(l log l), where l is the problem size. For noisy exponentially-scaled problems, the outcome depends on whether scaling or noise is dominant. When scaling dominates, mutation is more efficient than crossover, yielding a speedup of Θ(l log l). On the other hand, when noise dominates, crossover is more efficient than mutation, yielding a speedup of Θ(l).

Modeling selection pressure in XCS for proportionate and tournament selection kknsastry

In this paper, we derive models of the selection pressure in XCS for proportionate (roulette wheel) selection and tournament selection. We show that these models can explain the empirical results that have been previously presented in the literature. We validate the models on simple problems showing that, (i) when the model assumptions hold, the theory perfectly matches the empirical evidence; (ii) when the model assumptions do not hold, the theory can still provide qualitative explanations of the experimental results.

Modeling XCS in class imbalances: Population size and parameter settings kknsastry

This paper analyzes the scalability of the population size required in XCS to maintain niches that are infrequently activated. Facetwise models have been developed to predict the effect of the imbalance ratio (the ratio between the number of instances of the majority class and the minority class that are sampled to XCS) on population initialization, and on the creation and deletion of classifiers of the minority class. While theoretical models show that, ideally, XCS scales linearly with the imbalance ratio, XCS with standard configuration scales exponentially. The causes that are potentially responsible for this deviation from the ideal scalability are also investigated. Specifically, the inheritance procedure of classifiers’ parameters, mutation, and subsumption are analyzed, and improvements in XCS’s mechanisms are proposed to effectively and efficiently handle imbalanced problems. Once the recommendations are incorporated into XCS, empirical results show that the population size in XCS indeed scales linearly with the imbalance ratio.

Analyzing probabilistic models in hierarchical BOA on traps and spin glasses kknsastry

The hierarchical Bayesian optimization algorithm (hBOA) can solve nearly decomposable and hierarchical problems of bounded difficulty in a robust and scalable manner by building and sampling probabilistic models of promising solutions. This paper analyzes probabilistic models in hBOA on two common test problems: concatenated traps and 2D Ising spin glasses with periodic boundary conditions. We argue that although Bayesian networks with local structures can encode complex probability distributions, analyzing these models in hBOA is relatively straightforward and the results of such analyses may provide practitioners with useful information about their problems. The results show that the probabilistic models in hBOA closely correspond to the structure of the underlying optimization problem, the models do not change significantly in subsequent iterations of BOA, and creating adequate probabilistic models by hand is not straightforward even with complete knowledge of the optimization problem.

cec01.doc.doc butest

This document presents a genetic algorithm approach for learning classification rules from data. The key aspects of the approach are: 1. Binary encoding is used to represent classification rules, with bits indicating attribute values. Rule consequents are determined by the majority class of training examples matched. 2. The fitness function considers error rate, entropy measure, rule consistency, and hole ratio to evaluate rule sets. Error rate measures accuracy, entropy favors homogeneous rule matches, consistency penalizes ambiguous rules, and hole ratio measures coverage. 3. Adaptive asymmetric mutation is applied, with the mutation probability self-adjusting during the algorithm run. Crossover also utilizes two-point crossover of rules. 4. The approach is

B017410916 IOSR Journals

This document describes a genetic algorithm approach for automatically generating fuzzy rules for classification problems. The genetic algorithm uses rule importance as the fitness criteria, calculated as the rule's support for each class. The algorithm encodes rules using fuzzy membership set numbers for antecedents and consequents. It iterates for a set number of generations or until a minimum number of rules are fired, selecting high-fitness rules to generate offspring via crossover. The offspring replace lower-fitness rules, and the new rule population is evaluated in the next generation. The approach aims to consistently generate optimal fuzzy rules for classification using genetic search.

Arabidopsis thaliana Inspired Genetic Restoration Strategies CSCJournals

A controversial genetic restoration mechanism has been proposed for the model organism Arabidopsis thaliana. This theory proposes that genetic material from non-parental ancestors is used to restore genetic information that was inadvertently corrupted during reproduction. We evaluate the effectiveness of this strategy by adapting it to an evolutionary algorithm solving two distinct benchmark optimization problems. We compare the performance of the proposed strategy with a number of alternative strategies, including the Mendelian alternative. Included in this comparison are a number of biologically implausible templates that help elucidate likely reasons for the relative performance of the different templates. Results show that the proposed non-Mendelian restoration strategy is highly effective across the range of conditions investigated, significantly outperforming the Mendelian alternative in almost every situation.

F043046054 inventy

The document describes using a genetic algorithm to find the maximum values of single-variable functions. It discusses: 1) How genetic algorithms work by simulating biological evolution to optimize solutions. 2) Testing the genetic algorithm on continuous and non-continuous functions that are difficult to optimize with traditional methods, such as multimodal, non-differentiable functions. 3) The genetic algorithm was able to find maximum values close to the real maximum for complex test functions, demonstrating its effectiveness at optimizing these difficult single-variable functions.
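A minimal sketch of a genetic algorithm maximizing a single-variable function is given below. The test function, the operators (binary tournament selection, arithmetic crossover, Gaussian mutation, elitism), and all parameter values are assumptions chosen for illustration and are not taken from the document described above.

```python
import math
import random

def fitness(x):
    # Hypothetical multimodal test function on [-1, 2]; not from the document.
    return x * math.sin(10 * math.pi * x) + 1.0

def genetic_maximize(f, lo, hi, pop_size=30, generations=60, sigma=0.05, seed=0):
    rng = random.Random(seed)
    pop = [rng.uniform(lo, hi) for _ in range(pop_size)]
    for _ in range(generations):
        elite = max(pop, key=f)
        children = [elite]  # elitism: the best individual always survives
        while len(children) < pop_size:
            # Binary tournament selection of two parents.
            p1 = max(rng.sample(pop, 2), key=f)
            p2 = max(rng.sample(pop, 2), key=f)
            # Arithmetic crossover, then Gaussian mutation, clipped to [lo, hi].
            w = rng.random()
            child = w * p1 + (1 - w) * p2 + rng.gauss(0.0, sigma)
            children.append(min(hi, max(lo, child)))
        pop = children
    best = max(pop, key=f)
    return best, f(best)

best_x, best_f = genetic_maximize(fitness, -1.0, 2.0)
```

Because the elite is copied unchanged into each new generation, the best fitness found never decreases, which makes the run easy to reason about even on a non-differentiable or multimodal function.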

Jcb 2005-12-1103 Farah Diba

The document summarizes a study that uses an information-based similarity index to classify the SARS coronavirus. Key points: 1) The study develops a novel alignment-free method to measure genetic sequence similarity based on word frequencies and information content. 2) The method is first validated on human influenza and mitochondrial DNA, correctly reconstructing known phylogenies. 3) The method is then applied to classify SARS coronavirus, finding it is most closely related to group 1 coronaviruses, with some matches to groups 2 and 3. 4) The information-based similarity index provides a new tool for large-scale genomic analysis without sequence alignment.

Exploring the Solution Space of Sorting by Reversals: A New Approach IDES Editor

Analysing genome rearrangements is a problem from the vast domain of comparative genomics and computational biology. Several studies have shown that closely related species have essentially the same set of genes, but their gene orders differ. The differences in gene order are the results of various large-scale evolutionary events, of which reversal is the most common rearrangement event. The problem of finding the shortest sequence of reversals that can transform one genome into another is called the sorting by reversals problem. The length of such a sequence is the reversal distance between the two genomes. In comparative genomics, sorting by reversals algorithms are often used to propose evolutionary scenarios of large-scale genomic mutations between species. Following the first polynomial-time solution of this problem, several improvements have been published on the subject. In 2008, Braga et al. proposed an algorithm to enumerate the traces that sort a signed permutation by reversals. This algorithm has exponential complexity in both time and space. To handle the traces efficiently, Baudet and Dias proposed a depth-first approach in 2010. However, one limitation of that algorithm is that it cannot provide the number of solutions in each trace. In this paper we present an algorithm that lists the normal forms of each trace in a depth-first manner and provides the total number of solutions in the solution space.
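To make the notion of a reversal on a signed permutation concrete, here is a small sketch. It only applies a single reversal and counts breakpoints against the identity; it does not enumerate traces or compute the reversal distance, and the function names are hypothetical.

```python
def apply_reversal(perm, i, j):
    """Reverse perm[i..j] (inclusive) and flip the sign of each reversed element."""
    return perm[:i] + [-x for x in reversed(perm[i:j + 1])] + perm[j + 1:]

def count_breakpoints(perm):
    """Count adjacent pairs (a, b) with b - a != 1 in the framed permutation 0, perm, n+1."""
    framed = [0] + perm + [len(perm) + 1]
    return sum(1 for a, b in zip(framed, framed[1:]) if b - a != 1)

genome = [1, -3, -2, 4]
sorted_genome = apply_reversal(genome, 1, 2)
```

In this toy case one reversal of the middle segment sorts the permutation, so its reversal distance to the identity is 1.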

Review on Computational Bioinformatics and Molecular Modelling Novel Tool for... ijtsrd

Advancement in science and technology has brought a remarkable change in the field of drug discovery. Earlier it was very difficult to predict the target for a receptor, but nowadays docking the target protein with a ligand and calculating the binding affinity is an easy and robust task. Docking helps in the virtual screening of drugs along with hit identification. There are two approaches through which docking can be carried out: the shape complementarity approach and the simulation approach. Many procedures are involved in carrying out docking, and all require different software and algorithms. Molecular docking serves as a good platform to screen a large number of ligands and is useful in drug-DNA studies. This review mainly focuses on the general idea of molecular docking and discusses its major applications, the different types of interaction involved, and the types of docking. Rishabh Jain "Review on Computational Bioinformatics and Molecular Modelling: Novel Tool for Drug Discovery" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-1, December 2018, URL: http://www.ijtsrd.com/papers/ijtsrd18914.pdf http://www.ijtsrd.com/pharmacy/pharmacoinformatics/18914/review-on-computational-bioinformatics-and-molecular-modelling-novel-tool-for-drug-discovery/rishabh-jain

Using STELLA to Explore Dynamic Single Species Models: The Magic of Making Hu... Lisa Jensen

The use of formal, mathematical models allows stakeholders, decision makers and scientists to better visualize interactions and relationships within ecological systems. This study uses STELLA, a modeling tool, to simulate simple population dynamics for the humpback whale (Megaptera novaeangliae) to better understand the impacts of reproductive and mortality rates as well as alternative solution algorithms used to drive the model. A wide range of population dynamics occurred as a result of varying time increments for calculating populations and use of available solution algorithms. Populations are most likely to achieve equilibrium when reproduction and mortality result in approximately the same number of individuals.

Comparison Tomasz Waszczyk

The document compares five evolutionary optimization algorithms: genetic algorithms, memetic algorithms, particle swarm optimization, ant colony systems, and shuffled frog leaping. It provides a brief description of each algorithm, including how they are inspired by natural processes and behaviors. It also includes pseudocode to facilitate implementing each algorithm. The document then presents benchmark comparisons of the five algorithms on continuous and discrete optimization problems in terms of processing time, convergence speed, and solution quality. It discusses the performance of evolutionary algorithms and provides guidelines for determining the best parameters for each.

ORDINARY LEAST SQUARES REGRESSION OF ORDERED CATEGORICAL DATA- IN Beth Larrabee

- The document describes a simulation study that evaluated the performance of ordinary least squares regression (OLSLR) for analyzing ordered categorical response (OCR) variables. - Across different frequency distributions and numbers of categories for the OCR, the empirical type I error rate for OLSLR was close to the nominal 0.05 level. - Empirical power for OLSLR increased as the number of categories in the OCR increased, but this trend slowed for OCRs with 5 or more categories. For most scenarios, OLSLR power was similar to probit regression power.

Trust Enhanced Role Based Access Control Using Genetic Algorithm IJECEIAES

Improvements in technological innovations have become a boon for business organizations, firms, institutions, etc. System applications are being developed for organizations whether small-scale or large-scale. Taking into consideration the hierarchical nature of large organizations, security is an important factor which needs to be taken into account. For any healthcare organization, maintaining the confidentiality and integrity of the patients’ records is of utmost importance while ensuring that they are only available to the authorized personnel. The paper discusses the technique of Role-Based Access Control (RBAC) and its different aspects. The paper also suggests a trust enhanced model of RBAC implemented with selection and mutation only ‘Genetic Algorithm’. A practical scenario involving healthcare organization has also been considered. A model has been developed to consider the policies of different health departments and how it affects the permissions of a particular role. The purpose of the algorithm is to allocate tasks for every employee in an automated manner and ensures that they are not over-burdened with the work assigned. In addition, the trust records of the employees ensure that malicious users do not gain access to confidential patient data.

Association mapping Preeti Kapoor

This document summarizes an association mapping study of seed oil and protein contents in upland cotton. 180 cotton accessions were genotyped using 228 SSR markers and phenotyped for oil and protein content over multiple locations and years. Population structure analysis identified two subpopulations. Association analysis identified 86 marker-trait associations between 58 SSR markers and the two traits, with 15 and 12 markers associated with oil and protein content respectively. 18 markers were significantly associated with the traits in more than one environment, with 9 markers associated with both oil and protein content simultaneously and stably across locations.

Stable Drug Designing by Minimizing Drug Protein Interaction Energy Using PSO csandit

1. The document proposes using a particle swarm optimization (PSO) algorithm to design stable drug molecules that minimize interaction energy with target proteins. 2. In the algorithm, drugs are represented as variable-length trees containing functional groups, and PSO is used to optimize van der Waals and electrostatic interaction energies. 3. Results show that PSO performs better than previous fixed-length tree methods at designing drugs that stably bind to active sites of human rhinovirus, malaria, and HIV proteins.

Comparison of Biological Significance of Biclusters of SIMBIC and SIMBIC+ Bic... IDES Editor

The query-driven biclustering model refers to the problem of extracting biclusters based on a query gene or query condition. The extracted biclusters consist of a set of genes and a subset of conditions that are similar to the query gene or query condition, and they also include the query input. Two approaches applied to biclustering problems are top-down and bottom-up, depending on how they tackle the problem. Top-down techniques [3, 4] start with the entire gene expression matrix and iteratively partition it into smaller sub-matrices. Bottom-up approaches, on the other hand, start with a randomly chosen set of biclusters that are iteratively modified, usually enlarged, until no local improvement is possible. In this paper, the biological significance of biclusters extracted using two query-driven models, SIMBIC and SIMBIC+, is compared. The paper is organized as follows. Section 2 analyzes the popular MSB algorithm, Section 3 introduces SIMBIC, an improved version of MSB, and Section 4 presents SIMBIC+, an enhanced version of SIMBIC. The experimental analysis and the biological significance are illustrated in Section 5.

output Denis Malyshev

This document is Denis Malyshev's PhD thesis presented to The Scripps Research Institute in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Chemical Biology. The thesis is dedicated to Sergey Semenov, the principal of the Moscow Chemical Lyceum and Malyshev's teacher. It acknowledges the contributions of his research advisor Floyd Romesberg and many colleagues who assisted with his research projects, which focused on expanding the genetic alphabet by developing unnatural base pairs that can be replicated in vitro and in vivo.

Jessica torresanaelishockey

This document discusses incorporating unnatural amino acids into proteins to expand their functions. It describes how pyrrolysine is used as an orthogonal system to incorporate unnatural amino acids specified by an amber stop codon. Various unnatural amino acids containing alkene and norbornene functional groups were successfully incorporated into GFP and other proteins in E. coli and mammalian cells. These modified proteins could then be site-specifically labeled using bioorthogonal reactions like the thiol-ene and Diels-Alder reactions. This allows proteins to be labeled and studied in live cells with new functions.

Xenobiology askhambhati

Genetic code2piya1apiya

1. The genetic code is composed of nucleotide triplets called codons that specify individual amino acids. 2. Experiments confirmed that the genetic code is a triplet code and that each codon corresponds to a specific amino acid, with some codons coding for the same amino acid (degenerate). 3. Key properties of the genetic code include it being triplet-based, non-overlapping, unambiguous, degenerate, and nearly universal across organisms.

DNA new alphabet Arijit Shome

Toward the expansion of the genetic alphabet of DNA, several artificial third base pairs (unnatural base pairs) have been created. Organisms are defined by the information encoded in their genomes, and since the origin of life this information has been encoded using a two-base-pair genetic alphabet (A–T and G–C). In vitro, the alphabet has been expanded to include several unnatural base pairs (UBPs). A class of UBPs formed between nucleotides bearing hydrophobic nucleobases, exemplified by the pair formed between d5SICS and dNaM (d5SICS–dNaM), was developed; it is efficiently PCR-amplified and transcribed in vitro, and its unique mechanism of replication has been characterized. However, expansion of an organism’s genetic alphabet presents new and unprecedented challenges: the unnatural nucleoside triphosphates must be available inside the cell; endogenous polymerases must be able to use the unnatural triphosphates to faithfully replicate DNA containing the UBP within the complex cellular milieu; and finally, the UBP must be stable in the presence of pathways that maintain the integrity of DNA. In a major breakthrough, it was reported that an exogenously expressed algal nucleotide triphosphate transporter efficiently imports the triphosphates of both d5SICS and dNaM (d5SICSTP and dNaMTP) into Escherichia coli, and that the endogenous replication machinery uses them to accurately replicate a plasmid containing d5SICS–dNaM. Neither the presence of the unnatural triphosphates nor the replication of the UBP introduces a notable growth burden. Thus, the resulting bacterium is the first semi-synthetic organism to stably propagate an expanded genetic alphabet. Unnatural base pair systems now have high potential to open the door to next-generation biotechnology.

The French Revolution of 1789 Tom Richey

Criterion based Two Dimensional Protein Folding Using Extended GA IJCSEIT Journal

In the dynamic field of biological and protein research, protein fold recognition for long protein sequences has been a great challenge for many years. With that in mind, this paper contributes to protein folding research and presents a novel procedure for mapping a protein structure to its correct 2D fold using a concrete model based on swarm intelligence. The model combines an Extended Genetic Algorithm (EGA) with a concealed Markov model (CMM) to fold protein sequences with long chain lengths effectively. The protein sequences are preprocessed, classified, and then analyzed with criteria such as fitness, similarity, and sequence gaps for optimal formation of protein structures. Fitness correlation is evaluated to determine the bonding strength of molecules, which contributes to efficient fold recognition. Experimental results show that the proposed method is more adept at 2D protein folding and outperforms the existing algorithms.

Protein structure prediction by means ijaia

Mining frequent patterns is an NP-hard problem and has become a hot topic in recent research. Moreover, protein datasets contain distinct patterns that can be used in many areas such as drug discovery and disease prediction. In earlier decades, pattern discovery and protein fold recognition relied on biophysical and biochemical approaches; X-ray and NMR, which have been used for protein structure prediction, are very expensive and time-consuming, whereas a mathematical approach can reduce the cost of such laboratory experiments. Many computer-based methods have been applied to protein fold detection, such as graph-based algorithms and data mining approaches like classification or clustering, and all have their advantages and drawbacks. Pattern matching in protein sequential datasets for fold recognition plays a meaningful role in bioinformatics since it involves prediction of unknown protein function. There are many pattern recognition algorithms, but in this work we used PrefixSpan. The reasons for selecting this algorithm are discussed in Section 2. To evaluate the experimental results we used the SCOPE dataset, which is a classified protein dataset, and ASTRAL, a discriminative sequential dataset of SCOPE.

A family of global protein shape descriptors using gauss integrals, christian... pfermat

The document proposes a new method for classifying protein structures using Gauss integrals. It discusses current methods for protein classification that have limitations. The proposal focuses on developing a "family of global protein shape descriptors" using concepts from knot theory, including the writhing number. It aims to provide a fully automated, efficient method for protein structure comparison that overcomes current method limitations.

Research Inventy : International Journal of Engineering and Science researchinventy

Research Inventy : International Journal of Engineering and Science is published by the group of young academic and industrial researchers with 12 Issues per year. It is an online as well as print version open access journal that provides rapid publication (monthly) of articles in all areas of the subject such as: civil, mechanical, chemical, electronic and computer engineering as well as production and information technology. The Journal welcomes the submission of manuscripts that meet the general criteria of significance and scientific excellence. Papers will be published by rapid process within 20 days after acceptance and peer review process takes only 7 days. All articles published in Research Inventy will be peer-reviewed.

Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto... IJRTEMJOURNAL

BLAST is the most popular sequence alignment tool for aligning bioinformatics patterns. It uses a local alignment process in which, instead of comparing the whole query sequence with a database sequence, it breaks the query sequence into small words and uses these words to align patterns. Its heuristic method makes it faster than the earlier Smith-Waterman algorithm. However, because small query words are used for alignment, it may perform poorly on very large databases with complex queries. To remove this drawback, we suggest using MSA tools to filter the database by removing unnecessary sequences. The filtered dataset is then passed to BLAST, which can identify relationships among the sequences, i.e. homologs, orthologs, and paralogs. The proposed system can further be used to find the relation between two persons or to create a family tree. Orthology is of interest for a wide range of bioinformatics analyses, including functional annotation, phylogenetic inference, and genome evolution. This system describes and motivates the algorithm for predicting orthologous relationships among complete genomes. The algorithm takes a pairwise approach, thus requiring neither tree reconstruction nor reconciliation.
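The word-based seeding step described above can be sketched as follows. This is a simplified illustration using exact word matches only; it is not BLAST's actual implementation, which also scores neighbouring words and extends seeds into alignments. The sequences and function names are hypothetical.

```python
from collections import defaultdict

def words(seq, k):
    """Return every overlapping length-k word of seq with its start position."""
    return [(seq[i:i + k], i) for i in range(len(seq) - k + 1)]

def build_index(db_seq, k):
    """Index a database sequence by the positions of each length-k word."""
    index = defaultdict(list)
    for w, pos in words(db_seq, k):
        index[w].append(pos)
    return index

def find_seeds(query, db_seq, k=3):
    """Return exact word matches as (query_pos, db_pos) pairs."""
    index = build_index(db_seq, k)
    return [(qpos, dpos) for w, qpos in words(query, k) for dpos in index.get(w, [])]

seeds = find_seeds("GATTACA", "TTGATTAC", k=3)
```

Seeds that lie on the same diagonal (equal difference between database and query positions) are the natural candidates for extension into a longer local alignment.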

More Related Content

What's hot (10)

Jcb 2005-12-1103 Farah Diba

Exploring the Solution Space of Sorting by Reversals: A New Approach IDES Editor

Review on Computational Bioinformatics and Molecular Modelling Novel Tool for... ijtsrd

Using STELLA to Explore Dynamic Single Species Models: The Magic of Making Hu... Lisa Jensen

Comparison Tomasz Waszczyk

ORDINARY LEAST SQUARES REGRESSION OF ORDERED CATEGORICAL DATA- IN Beth Larrabee

Trust Enhanced Role Based Access Control Using Genetic Algorithm IJECEIAES

Association mapping Preeti Kapoor

Stable Drug Designing by Minimizing Drug Protein Interaction Energy Using PSO csandit

Comparison of Biological Significance of Biclusters of SIMBIC and SIMBIC+ Bic... IDES Editor

Viewers also liked (6)

output Denis Malyshev

Jessica torresanaelishockey

Xenobiology askhambhati

Genetic code2piya1apiya

DNA new alphabet Arijit Shome

The French Revolution of 1789 Tom Richey

Similar to Automated alphabet reduction with evolutionary algorithms for protein structure prediction (20)

Criterion based Two Dimensional Protein Folding Using Extended GA IJCSEIT Journal

Protein structure prediction by means ijaia

A family of global protein shape descriptors using gauss integrals, christian... pfermat

Research Inventy : International Journal of Engineering and Science researchinventy

Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto... journal ijrtem

Performance Improvement of BLAST with Use of MSA Techniques to Search Ancesto... IJRTEMJOURNAL

Qsar Studies on Gallic Acid Derivatives and Molecular Docking Studies of Bace... bioejjournal

Alzheimer's disease is reportedly linked with hypertension, type 2 diabetes, and high cholesterolemia. The underlying genetic causes relating these diseases have not been well studied clinically, but it is widely accepted that beta-secretase (BACE1), an enzyme of the peptidase A1 family, is the main culprit in Alzheimer's disease. The present work reports ligand-based and structure-based drug design. QSAR studies were carried out on a dataset of 21 gallic acid derivatives to develop a good predictive model of biological activity, and certain descriptors are reported to further enhance the analgesic activity of gallic acid derivatives. Molecular docking studies were performed for structure-based drug design. Two natural gallic acid derivatives are reported as potent inhibitors of the beta-secretase enzyme.

Principal component analysis (PCA) to analyze dataTakmamama

Protein structure prediction and classification.pptxDr Vardhana Janakiraman, VISTAS

Deep Learning Meets Biology: How Does a Protein Helix Know Where to Start and...Melissa Moody

This summarizes a document describing research using machine learning to classify protein helix capping motifs. The researchers: 1) Used structural data from protein databases and helix cap classifications to train machine learning models, including bidirectional LSTM and SVC models, to predict helix cap positions in proteins. 2) Engineered features for the models including backbone torsion angles, residue properties, and additional physicochemical descriptors. 3) Evaluated the models using accuracy, balanced accuracy, and F1 score since the dataset was imbalanced between cap and non-cap residues. 4) Achieved 85% balanced accuracy classifying helix caps using a deep bidirectional LSTM model, offering an objective way to classify this important

Automatic Parallelization for Parallel Architectures Using Smith Waterman Alg... | International Journal of Engineering Inventions www.ijeijournal.com

This paper presents a literature survey of research-oriented developments made to date. Its significance is to provide a deep-rooted understanding of existing approaches to gene sequencing and alignment using Smith Waterman algorithms, and of their respective strengths and weaknesses. To perform quality research it is always advised to conduct a goal-oriented literature survey that facilitates an in-depth understanding of the research work, so that an objective can be formulated on the basis of the gaps between present requirements and existing approaches. Gene sequencing problems are one of the predominant issues for researchers seeking an optimized system model that facilitates optimum processing and efficiency without introducing overheads in memory and time. This research is oriented towards developing such a system using the dynamic programming approach called the Smith Waterman algorithm, in an enhanced form combined with other supporting and optimized techniques. This paper provides a brief introduction to the research domain, the research gap and motivations, the objective formulated, and the systems proposed to accomplish the ultimate objectives.

Thesis def | Jay Vyas

Cadd and molecular modeling for M.Pharm | Shikha Popali

Stock markets and_human_genomics | Shyam Sarkar

The document proposes a new approach to compare stock market patterns to DNA sequences using compression techniques. Stock market data is converted to binary sequences representing increases and decreases, which are then encoded into DNA nucleotides. These nucleotide sequences are divided and matched against human genome sequences using BLAST. The analysis found certain sub-sequences of the stock market patterns matched 100% to the human genome, suggesting this approach could potentially predict stock market behavior.

Delineation of techniques to implement on the enhanced proposed model using d... | IJDMS

In the post-genomic era, with the advent of new technologies, a huge amount of complex molecular data is generated with high throughput. Managing this biological data to discover new knowledge is a challenging task because of its complexity and heterogeneity. Issues such as noisy and incomplete data need to be dealt with. Data mining has been used successfully in the biological domain, but discovering new knowledge from biological data remains a major challenge for data mining techniques. The novelty of the proposed model is its combined use of intelligent techniques to classify protein sequences faster and more efficiently. Using FFT, a fuzzy classifier, a string weighted algorithm, a gram encoding method, a neural network model and a rough set classifier in a single model, each in an appropriate place, can enhance the quality of the classification system. Thus the primary challenge is to identify and classify large protein sequences in a fast, easy and intelligent way that reduces time complexity and space complexity.

Main bioinfomatics alignment tools.pptx | khadijarafiq2012

Determining stable ligand orientation | ijaia

Every biological function in a living organism occurs due to protein-protein interactions, and diseases are no exception. Identifying one or more proteins for a particular disease and then designing a suitable chemical compound (known as a drug or ligand) to destroy those proteins is a challenging research topic in computational biology. In earlier methods, drugs were designed using only a few chemical components and were represented as a fixed-length tree. In reality, a drug contains many chemical groups collectively known as a pharmacophore, and the chemical length of the drug cannot be determined before designing it. In the present work, a Particle Swarm Optimization (PSO) based methodology is proposed to find a suitable drug for a particular disease such that the drug-target protein interaction energy is minimized. In the proposed algorithm, the drug is represented as a variable-length tree and essential functional groups are arranged in different positions of that drug. The structure of the drug is obtained and its docking energy is minimized simultaneously. The orientation of chemical groups in the drug is also tested so that it can bind to a particular active site of a target protein and fit well inside that active site. Several inter-molecular forces are considered for accuracy of the docking energy. Results are demonstrated for three different target proteins both numerically and pictorially, and show that PSO performs better than the earlier methods.

More from kknsastry (20)

Genetic Algorithms and Genetic Programming for Multiscale Modeling | kknsastry

Effective and efficient multiscale modeling is essential to advance both the science and synthesis in a wide array of fields such as physics, chemistry, materials science, biology, biotechnology and pharmacology. This study investigates the efficacy and potential of using genetic algorithms for multiscale materials modeling and addresses some of the challenges involved in designing competent algorithms that solve hard problems quickly, reliably and accurately.

Towards billion bit optimization via parallel estimation of distribution algo... | kknsastry

The document describes research into using efficient estimation of distribution algorithms (EDAs) like the compact genetic algorithm (cGA) to solve optimization problems involving billions of variables. Key aspects discussed include the cGA's memory and computational efficiency through techniques like parallelization and vectorization. The researchers were able to solve a noisy OneMax problem involving over 33 million variables to optimality and a problem with 1.1 billion variables with relaxed convergence. The document argues this research is important because many real-world problems involving nanotechnology, biology, and information systems require solving optimization problems at massive scales.

Empirical Analysis of ideal recombination on random decomposable problems | kknsastry

This paper analyzes the behavior of a selectorecombinative genetic algorithm (GA) with an ideal crossover on a class of random additively decomposable problems (rADPs). Specifically, additively decomposable problems of order k whose subsolution fitnesses are sampled from the standard uniform distribution U[0,1] are analyzed. The scalability of the selectorecombinative GA is investigated for 10,000 rADP instances. The validity of facetwise models in bounding the population size, run duration, and the number of function evaluations required to successfully solve the problems is also verified. Finally, rADP instances that are easiest and most difficult are also investigated.

Population sizing for entropy-based model building in genetic algorithms | kknsastry

This paper presents a population-sizing model for the entropy-based model building in genetic algorithms. Specifically, the population size required for building an accurate model is investigated. The effect of the selection pressure on population sizing is also incorporated. The proposed model indicates that the population size required for building an accurate model scales as Θ(m log m), where m is the number of substructures and proportional to the problem size. Experiments are conducted to verify the derivations, and the results agree with the proposed model.

Let's get ready to rumble redux: Crossover versus mutation head to head on ex... | kknsastry

This paper analyzes the relative advantages of crossover and mutation on a class of deterministic and stochastic additively separable problems with substructures of non-uniform salience. This study assumes that the recombination and mutation operators have knowledge of the building blocks (BBs) and effectively exchange or search among competing BBs. Facetwise models of convergence time and population sizing have been used to determine the scalability of each algorithm. The analysis shows that for deterministic exponentially-scaled additively separable problems, the BB-wise mutation is more efficient than crossover, yielding a speedup of Θ(l log l), where l is the problem size. For the noisy exponentially-scaled problems, the outcome depends on whether scaling or noise is dominant. When scaling dominates, mutation is more efficient than crossover, yielding a speedup of Θ(l log l). On the other hand, when noise dominates, crossover is more efficient than mutation, yielding a speedup of Θ(l).

Modeling selection pressure in XCS for proportionate and tournament selection | kknsastry

Modeling XCS in class imbalances: Population sizing and parameter settings | kknsastry

Substructural surrogates for learning decomposable classification problems: i... | kknsastry

This paper presents a learning methodology based on a substructural classification model to solve decomposable classification problems. The proposed method consists of three important components: (1) a structural model that represents salient interactions between attributes for a given data set, (2) a surrogate model which provides a functional approximation of the output as a function of attributes, and (3) a classification model which predicts the class for new inputs. The structural model is used to infer the functional form of the surrogate, and its coefficients are estimated using linear regression methods. The classification model uses a maximally-accurate, least-complex surrogate to predict the output for given inputs. The structural model that yields an optimal classification model is searched using an iterative greedy search heuristic. Results show that the proposed method successfully detects the interacting variables in hierarchical problems, groups them into linkage groups, and builds maximally accurate classification models. The initial results on non-trivial hierarchical test problems indicate that the proposed method holds promise and have also shed light on several improvements to enhance the capabilities of the proposed method.

Fast and accurate reaction dynamics via multiobjective genetic algorithm opti... | kknsastry

The document describes research into using multiobjective genetic algorithms to optimize semiempirical potentials for fast and accurate reaction dynamics simulations. The researchers developed a method to tune semiempirical parameters using a limited set of ab initio calculations to better describe excited state potential energy surfaces. They found that multiobjective optimization was able to find globally accurate potential energy surfaces more efficiently than weighted single-objective optimizations. Analysis of the optimized parameter sets showed they produced stable and physically reasonable results.

On Extended Compact Genetic Algorithm | kknsastry

In this study we present a detailed analysis of the extended compact genetic algorithm (ECGA). Based on the analysis, empirical relations for population sizing and convergence time have been derived and are compared with the existing relations. We then apply ECGA to a non-azeotropic binary working fluid power cycle optimization problem. The optimal power cycle obtained improved the cycle efficiency by 2.5% over that of existing cycles, thus illustrating the capabilities of ECGA in solving real-world problems.

Silicon Cluster Optimization Using Extended Compact Genetic Algorithm | kknsastry

This paper presents an efficient cluster optimization algorithm. The proposed algorithm uses the extended compact genetic algorithm (ECGA), one of the competent genetic algorithms (GAs), coupled with Nelder-Mead simplex local search. The lowest energy structures of silicon clusters with 4-11 atoms have been successfully predicted. The minimum population size and total number of function (potential energy of the cluster) evaluations required to converge to the global optimum with a reliability of 96% have been empirically determined and are O(n^4.2) and O(n^8.2) respectively. The results obtained indicate that the proposed algorithm is highly reliable in predicting globally optimal structures. However, certain efficiency techniques have to be employed for predicting structures of larger clusters to reduce the high computational cost due to function evaluation.

A Practical Schema Theorem for Genetic Algorithm Design and Tuning | kknsastry

This paper develops theory that enables the design of genetic algorithms and the choice of parameters such that the proportion of the best building blocks grows. A practical schema theorem has been used for this purpose and its ramifications for the choice of selection operator and parameterization of the algorithm are explored. In particular, stochastic universal selection, tournament selection, and truncation selection schemes are employed to verify the results. Results agree with the schema theorem and indicate that it must be obeyed in order to ascertain sustained growth of good building blocks. The analysis suggests that the schema theorem alone is insufficient to guarantee the success of a selectorecombinative genetic algorithm.

On the Supply of Building Blocks | kknsastry

This study addresses the issue of building-block supply in the initial population. Facetwise models for supply of a single building block as well as for supply of all schemata in a partition have been developed. An estimate for the population size required to ensure the presence of all raw building blocks has been derived using these facetwise models. The facetwise models and the population-sizing estimate are verified with computational results.

Don't Evaluate, Inherit | kknsastry

This paper studies fitness inheritance as an efficiency enhancement technique for genetic and evolutionary algorithms. Convergence and population-sizing models are derived and compared with experimental results. These models are optimized for greatest speed-up and the optimal inheritance proportion to obtain such a speed-up is derived. Results on OneMax problems show that when the inheritance effects are considered in the population-sizing model, the number of function evaluations is reduced by 20% with the use of fitness inheritance. Results indicate that for a fixed population size, the number of function evaluations can be reduced by 70% using a simple fitness inheritance technique.

Efficient Cluster Optimization Using A Hybrid Extended Compact Genetic Algori... | kknsastry

A recent study (Sastry and Xiao, 2001) proposed a highly reliable cluster optimization algorithm which employed the extended compact genetic algorithm (ECGA) along with Nelder-Mead simplex search. This study utilizes an efficiency enhancement technique for the ECGA-based cluster optimizer to reduce the population size and the number of function evaluations required, while retaining the high reliability of predicting the lowest energy structure. Seeding the initial population with the lowest energy structures of smaller clusters has been employed as the efficiency enhancement technique. Empirical results indicate that the scaling of the population size and total number of function evaluations with the cluster size is reduced from O(n^4.2) and O(n^8.2) to O(n^0.83) and O(n^2.45) respectively.

Modeling Tournament Selection with Replacement Using Apparent Added Noise | kknsastry

This paper analyzes the effects of tournament selection with replacement on the convergence time and population sizing for selectorecombinative genetic algorithms. This paper empirically demonstrates that the run duration remains the same and is not affected whether the tournament selection is performed with or without replacement. However, the population size required is more if tournament selection is performed with replacement rather than without replacement to attain the same level of accuracy. An approximate population-sizing model is derived based on apparent added noise for the case of tournament selection with replacement. The proposed model is verified with experimental results.

Evaluation Relaxation as an Efficiency-Enhancement Technique: Handling Varian... | kknsastry

This study develops a decision-making strategy for deciding between fitness functions with differing bias values. Simple, yet practical facetwise models are derived to aid the decision-making process. The decision making strategy is designed to provide maximum speed-up and thereby enhance the efficiency of GA search processes. Results indicate that bias can be handled temporally and that significant speed-up values can be obtained.

Analysis of Mixing in Genetic Algorithms: A Survey | kknsastry

Ensuring building-block (BB) mixing is critical to the success of genetic and evolutionary algorithms. There has been a growing interest in analyzing and understanding BB mixing and it is necessary to organize and categorize representative literature. This paper presents an exhaustive survey of studies on one or more aspects of mixing. In doing so, a classification of the literature based on the role of recombination operators assumed by those studies is developed. Such a classification not only highlights the significant results and unifies existing work, but also provides a foundation for future research in understanding mixing in genetic algorithms.

How Well Does A Single-Point Crossover Mix Building Blocks with Tight Linkage? | kknsastry

Ensuring building-block (BB) mixing is critical to the success of genetic and evolutionary algorithms. This study develops facetwise models to predict the BB mixing time and the population sizing dictated by BB mixing for single-point crossover. Empirical results are used to validate these models. The population-sizing model suggests that for moderate-to-large problems, BB mixing, instead of BB decision making and BB supply, bounds the population size required to obtain a solution of constant quality. Furthermore, the population sizing for single-point crossover scales as O(2^k m^1.5), where k is the BB size and m is the number of BBs.

Scalability of Selectorecombinative Genetic Algorithms for Problems with Tigh... | kknsastry