Bioinfo Lab Manual
LAB MANUAL
Offered to
I YEAR M.TECH BIOINFORMATICS
DEPARTMENT OF BIOINFORMATICS
SCHOOL OF BIOENGINEERING
SRM UNIVERSITY
KATTANGULATHUR
BI0505 LAB MANUAL
Aim:
To view and use the various biological databases available on the World Wide Web.
Description:
Biological data are highly complex and interrelated. The vast amount of biological information needs
to be stored, organized and indexed so that it can be retrieved and used. There are
five major types of databases: nucleotide databases, protein databases, protein structure
databases, metabolic pathway databases and bibliographic databases.
Procedure:
1. Open your web browser and type the web address of the required database.
2. Explore the database and examine the information available in it.
Introduction:
Expasy is maintained by the SIB Swiss Institute of Bioinformatics and can be reached at
www.expasy.org. The website has a navigation column on the left side of the window,
where the whole web server is categorized into fields such as proteomics, genomics,
phylogeny and systems biology. Each category is divided into
two sections, Databases and Tools. Since Expasy serves as a gateway to many other
databases and tools, all of them are listed under these
categories.
The home page provides a query search box with which the user can search for biological
information across Expasy. A drop-down menu is also provided to narrow down the
search. The results show the number of hits for the query in each database,
from which the user can navigate to the relevant information.
Proteomics
• Protein Structure
• Genomics
• Structural Bioinformatics
• Systems Biology
• Phylogeny/evolution
Services :
NCBI (National Center for Biotechnology Information) is one of the leading online resources for
biological sequence information. NCBI is a division of the US National Library of Medicine (NLM),
which is part of the National Institutes of Health (NIH). As a national resource for molecular biology
information, NCBI's mission is to develop new information technologies to aid in the
understanding of fundamental molecular and genetic processes that control health and disease.
More specifically, the NCBI has been charged with creating automated systems for storing and
analyzing knowledge about molecular biology, biochemistry, and genetics.
Home Page:
NCBI has a simplified homepage from which the user can navigate to different resources. The
left side pane of the homepage has a site map followed by different categories, which help narrow
down the search for the right resource. On the right side is a list of
popular resources, which is very useful for first-time users.
GenBank
The GenBank sequence database is an open access, annotated collection of all publicly
available nucleotide sequences and their protein translations. This database is produced and
maintained by the National Center for Biotechnology Information (NCBI) as part of
the International Nucleotide Sequence Database Collaboration (INSDC). The National Center for
Biotechnology Information is a part of the National Institutes of Health in the United States.
GenBank and its collaborators receive sequences produced in laboratories throughout the world
from more than 100,000 distinct organisms. In more than 20 years since its establishment,
GenBank has become the most important and most influential database for research in almost all
biological fields, whose data were accessed and cited by millions of researchers around the
world. GenBank continues to grow at an exponential rate, doubling every 18 months.
Entrez:
NCBI accepts queries and delivers data through its own search engine, called
Entrez. The search box on the NCBI home page directs the user to Entrez. Entrez is
internally connected to various biological databases, which increases the probability of finding the
relevant information.
BLAST:
BLAST stands for Basic Local Alignment Search Tool. BLAST is a tool used to find
sequences similar (and potentially homologous) to a query sequence. It compares the query with the
sequences in the database and returns hits, which are usually listed in
decreasing order of score.
For protein alignments, BLAST uses substitution matrices such as PAM and BLOSUM to score the alignment.
PubMed :
This is an online bibliographic database that holds citations and abstracts of research papers, journals
and other literature. The database is linked to other bibliographic
resources such as MEDLINE and BioMed Central.
PubChem :
This database contains data about chemical compounds, which can be used for in silico analysis.
Database of SNPs (dbSNP):
OMIM:
OMIM stands for Online Mendelian Inheritance in Man. This database contains information about
genetic disorders. For each disease, OMIM describes the genetic background
and lists the corresponding literature references.
OMIA:
This database (Online Mendelian Inheritance in Animals) is similar to OMIM, but covers genetic
diseases of animals other than humans.
Output:
The file format of the protein keratin is shown as follows:
Introduction:
Biological databases are libraries of life sciences information, collected from scientific
experiments, published literature, high-throughput experiment technology, and computational
analyses. They contain information from research areas
including genomics, proteomics, metabolomics, microarray gene expression,
and phylogenetics. Information contained in biological databases includes gene function,
structure, localization (both cellular and chromosomal), clinical effects of mutations as well as
similarities of biological sequences and structures.
Biological databases are an important tool in assisting scientists to understand and explain a host
of biological phenomena from the structure of biomolecules and their interaction, to the whole
metabolism of organisms and to understanding the evolution of species. This knowledge helps
facilitate the fight against diseases, assists in the development of medications and in discovering
basic relationships amongst species in the history of life.
Biological knowledge is distributed amongst many different general and specialized databases.
This sometimes makes it difficult to ensure the consistency of information. Biological databases
cross-reference other databases with accession numbers as one way of linking their related
knowledge together.
An important resource for finding biological databases is a special yearly issue of the journal
Nucleic Acids Research (NAR). The Database Issue of NAR is freely available, and categorizes
many of the publicly available online databases related to biology and bioinformatics.
Introduction:
A gene is a molecular unit of heredity of a living organism. The name is given to stretches
of DNA (or RNA) that code for a protein or for an RNA chain that has a function in the
organism. Knowledge of gene sequences has become indispensable for basic biological research,
other research branches utilizing sequencing, and numerous applied fields such as
diagnostics, biotechnology, forensic biology and biological systematics.
The simplicity of the FASTA format makes it easy to manipulate and parse sequences using
text-processing tools and scripting languages.
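As an illustration of how little code FASTA parsing needs, here is a minimal sketch in Python. The function name parse_fasta and the two example records are hypothetical and only serve the demonstration; real files may need extra handling (comments, lower-case letters, very large inputs).

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            # Sequence lines are concatenated without the line breaks.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical two-record example, not taken from a database.
example = """>seq1 hypothetical example record
MASMLLAQRL
ACSFQHSYRL
>seq2 second hypothetical record
QLNIIQNAEA
"""

for name, seq in parse_fasta(example):
    print(name, len(seq))
```

Each record is a header line starting with ">" followed by one or more sequence lines, so the parser only has to track the current header and join the lines beneath it.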
Method:
ABL1_HUMAN
4. Obtain relevant information about protein and retrieve FASTA format of its sequence by
clicking on the FASTA tab at the right corner.
ABL1_HUMAN
Function of Protein:
Non-receptor tyrosine-protein kinase that plays a role in many key processes linked to cell
growth and survival such as cytoskeleton remodeling in response to extracellular stimuli, cell
motility and adhesion, receptor endocytosis, autophagy, DNA damage response and apoptosis.
Coordinates actin remodeling through tyrosine phosphorylation of proteins controlling
cytoskeleton dynamics.
Enzyme regulation: Stabilized in the inactive form by an association between the SH3 domain
and the SH2-TK linker region, interactions of the N-terminal cap, and contributions from an N-
terminal myristoyl group and phospholipids.
Protein attributes
2. Write about PTM involved in P53355 and comment on the residues involved in it.
Aim: To determine the Post Translational Modifications involved in P53355 and to determine
the residues involved in PTM.
Introduction:
Also, enzymes may remove amino acids from the amino end of the protein, or cut the peptide
chain in the middle. For example, most nascent polypeptides start with the amino
acid methionine because the "start" codon on mRNA also codes for this amino acid. This amino
acid is usually removed during post-translational modification.
Other modifications, like phosphorylation, are part of common mechanisms for controlling the
behavior of a protein, for instance activating or inactivating an enzyme.
Method:
3. Retrieve any one FASTA sequence of GABA transaminase in Human, mouse, pig
and chick.
Aim: To retrieve any one FASTA sequence of GABA transaminase in Human, mouse,
pig and chick
Introduction:
4-aminobutyrate aminotransferase (or GABA transaminase) is an enzyme which
catalyzes the conversion of 4-aminobutanoic acid (GABA) and 2-
oxoglutarate into succinic semialdehyde and glutamate.
Method:
1. Open NCBI http://www.ncbi.nlm.nih.gov/
2. Choose the Protein database and enter GABA transaminase in the search box.
3. Click on the Advanced Search tab and choose the Organism option from the drop-down
menu.
4. Enter Homo sapiens; the results are displayed.
5. Repeat the above steps with the organism names Sus scrofa (pig), Mus
musculus (mouse) and Gallus gallus (chick).
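The same record can also be requested programmatically. The sketch below is not part of the manual's procedure: it assumes NCBI's E-utilities efetch service and only builds the request URL for a protein accession, without contacting the server. The helper name fasta_url is hypothetical.

```python
from urllib.parse import urlencode

# Assumption: the NCBI E-utilities efetch endpoint (not mentioned in the manual).
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def fasta_url(accession, db="protein"):
    """Build an efetch URL that asks for one record in FASTA text format."""
    params = {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    return EFETCH + "?" + urlencode(params)


# Accession of the human GABA transaminase record used in this exercise.
print(fasta_url("NP_001120920.1"))
```

Opening the printed URL in a browser, or passing it to urllib.request.urlopen, should return the FASTA text; check NCBI's usage guidelines before sending many requests.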
Human
>gi|188536080|ref|NP_001120920.1| 4-aminobutyrate aminotransferase, mitochondrial
precursor [Homo sapiens]
MASMLLAQRLACSFQHSYRLLVPGSRHISQAAAKVDVEFDYDGPLMKTEVPGP
RSQELMKQLNIIQNAEA
VHFFCNYEESRGNYLVDVDGNRMLDLYSQISSVPIGYSHPALLKLIQQPQNASMF
VNRPALGILPPENFV
EKLRQSLLSVAPKGMSQLITMACGSCSNENALKTIFMWYRSKERGQRGFSQEEL
ETCMINQAPGCPDYSILSFMGAFHGRTMGCLATTHSKAIHKIDIPSFDWPIAPFPR
LKYPLEEFVKENQQEEARCLEEVEDLIVKYRKKKKTVAGIIVEPIQSEGGDNHAS
DDFFRKLRDIARKHGCAFLVDEVQTGGGCTGKFWAHEHWGLDDPADVMTFSK
KMMTGGFFHKEEFRPNAPYRIFNTWLGDPSKNLLLAEVINIIKREDLLNNAAHAG
KALLTGLLDLQARYPQFISRVRGRGTFCSFDTPDDSIRNKLILIARNKGVVLGGCG
DKSIRFRPTLVFRDHHAHLFLNIFSDILADFK
Pig
4. Find out the number of entries in SWISSPROT for Serine kinase in PIG.
Aim: To determine the number of entries in SWISSPROT for Serine kinase in PIG.
Introduction:
Serine/threonine protein kinases (EC 2.7.11.1) phosphorylate the OH group
of serine or threonine (which have similar side chains). Serine/threonine kinase receptors
play a role in the regulation of cell proliferation, programmed cell death (apoptosis), cell
differentiation, and embryonic development.
Method:
1. Open the following URL: http://www.ebi.ac.uk/uniprot/
2. Enter the query as follows:
Serine kinase AND organism:"Sus scrofa [9823]"
water), but the structure is stable only when the parts of a protein domain are locked into
place by specific tertiary interactions, such as salt bridges, hydrogen bonds, and the tight
packing of side chains and disulfide bonds.
Quaternary structure is a larger assembly of several protein molecules or polypeptide
chains, usually called subunits in this context. The quaternary structure is stabilized by
the same non-covalent interactions and disulfide bonds as the tertiary structure.
Complexes of two or more polypeptides (i.e. multiple subunits) are called multimers.
Specifically it would be called a dimer if it contains two subunits, a trimer if it contains
three sub-units, and a tetramer if it contains four subunits. The subunits are frequently
related to one another by symmetry operations, such as a 2-fold axis in a dimer.
Multimers made up of identical subunits are referred to with a prefix of "homo-" (e.g. a
homotetramer) and those made up of different subunits are referred to with a prefix of
"hetero-" (e.g. a heterotetramer, such as the two alpha and two beta chains
of hemoglobin).
Method:
1. Open UniProt http://www.uniprot.org/
2. Enter the protein ID.
Function: Involved in oxygen transport from the lung to the various peripheral tissues and
is specific to the red blood cells.
The buffer capacity of a protein will affect the accuracy of its predicted pI, with poor buffer
capacity leading to greater error in prediction. Because of this, pI predictions for small
proteins can be problematic.
Protein Mw is calculated by the addition of average isotopic masses of amino acids in the
protein and the average isotopic mass of one water molecule.
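The Mw rule described above can be sketched in a few lines of Python. The residue masses below are commonly tabulated average values and are an assumption here; they should be checked against the documentation of the tool actually used. The function name molecular_weight is hypothetical.

```python
# Average (isotopic) residue masses in daltons; assumed standard table values.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01524  # average mass of one water molecule


def molecular_weight(sequence):
    """Sum the residue masses and add one water for the free N- and C-termini."""
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER


print(round(molecular_weight("MASMLLAQRL"), 2))
```

Only the 20 standard one-letter codes are handled; any other character raises a KeyError, and post-translational modifications are ignored, as in the description.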
This program does not account for the effects of post-translational modifications, thus
modified proteins on a 2-D gel may migrate to a position quite different to that predicted.
Protein glycosylation in particular can affect protein migration in both pI and Mw
dimensions. In addition to the standard one-letter-codes for the 20 amino acids, the 2 non-
standard amino acids (Selenocysteine and Pyrrolysine), the characters B, Z and X are
accepted:
Method:
LIRB2_HUMAN (Q8N423)
Q9H6F5 (CCD86_HUMAN)
Source: Human
Description:
Coiled-coil domain-containing protein 86
Supersecondary Structure:
It contains one coiled coil domain, a type of secondary structure composed of two or
more alpha helices which entwine to form a cable structure.
1-360 residues
8. Find the number of proteins which are having isoelectric point value between 5 and 5.5.
Comment on the result.
Aim: To find the number of proteins which have an isoelectric point value between 5
and 5.5
Introduction:
TagIdent is a tool which allows the generation of a list of proteins close to a given pI and
Mw, the identification of proteins by matching a short sequence tag of up to 6 amino
acids against proteins in the UniProt Knowledgebase (Swiss-Prot and TrEMBL)
databases close to a given pI and Mw and the identification of proteins by their mass, if
this mass has been determined by mass spectrometric techniques for one or more species
and with an optional keyword. When searching in UniProtKB/Swiss-Prot, TagIdent
removes signal sequences and/or propeptides (as documented in the UniProtKB/Swiss-
Prot feature table (FT lines)) before computing pI and Mw for each of the resulting
chains.
The annotation in UniProtKB/TrEMBL is done automatically; it is incomplete and not
always correct. Thus information on UniProtKB/TrEMBL FT lines is not used to process
UniProtKB/TrEMBL proteins into mature chains or peptides (i.e. pI and Mw are always
computed for the whole sequence), and the use of a keyword is not allowed for searches
in UniProtKB/TrEMBL.
Method:
TagIdent identifies proteins by isoelectric point (pI), molecular weight (Mw)
and sequence tag, or generates a list of proteins close to a given pI and Mw.
The number of proteins found in the specified pI range (5 to 5.5) is 96933.
Introduction
Basic local alignment search tool (BLAST) is a sequence similarity search program that can be
used via a web interface or as a stand-alone tool to compare a user’s query to a database of
sequences. BLAST is a heuristic that finds short matches between two sequences and attempts to
start alignments from these ‘hot spots’. In addition to performing alignments, BLAST provides
statistical information about an alignment; this is the ‘expect’ value (E value), the number of
hits with an equal or better score expected to occur by chance in a search of that database. The
National Center for Biotechnology Information (NCBI) maintains a BLAST server with a
homepage at http://www.ncbi.nlm.nih.gov/BLAST/. On the homepage the different BLAST
searches are listed by type: nucleotide, protein, translated and genomes.
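The 'hot spot' idea can be illustrated with a small Python sketch that lists the positions where two sequences share an identical short word. This is a simplification and an assumption for teaching purposes: BLAST itself also considers similar (not only identical) words for proteins and then extends each seed into an alignment. The function name find_seeds and the word size of 3 are illustrative choices.

```python
from collections import defaultdict


def find_seeds(query, subject, word_size=3):
    """Return (query_pos, subject_pos) pairs where identical words start."""
    # Index every word of the subject by its starting position.
    index = defaultdict(list)
    for j in range(len(subject) - word_size + 1):
        index[subject[j:j + word_size]].append(j)
    # Look up every word of the query in that index.
    seeds = []
    for i in range(len(query) - word_size + 1):
        for j in index.get(query[i:i + word_size], []):
            seeds.append((i, j))
    return seeds


print(find_seeds("ACDEFG", "XXDEFGYY"))
```

Seeds that fall on the same diagonal (equal difference between the two positions) are the natural starting points for extension into a longer local alignment.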
Introduction:
Conserved domains (CD) in proteins play a crucial role in protein interactions, DNA binding,
enzyme activity, and other important cellular processes. With recently released gene number
predictions in the human genome being less than many previous predictions, interactions among
these domains may prove to be central to proteome complexity. Protein domains are often
conserved across many species, and as such, they offer an interesting dataset in how genomes
maintain them with relationship to other conserved domains, as well as to proteome size.
Method:
Query ID
gi|25008336|sp|Q8NFM4.1|ADCY4_HUMAN
Description
Blast Results:
Max score = 2214
Total score = 2214
Query coverage = 100%
E value = 0.0
Similar Sequence:
NM_172961.3
Mus musculus 4-aminobutyrate aminotransferase (Abat), nuclear
gene encoding mitochondrial protein, transcript variant 1,
mRNA
Length=4653
GENE ID: 268860 Abat | 4-aminobutyrate aminotransferase [Mus musculus]
Score = 1817 bits (2014),
Expect = 0.0
Identities = 1305/1501 (87%), Gaps = 2/1501 (0%)
Strand=Plus/Plus
C7AE31
O90371 POLS_ONNVI
Structural polyprotein (O'nyong-nyong virus (strain Igbo Ora))
O90369 POLS_ONNVS
Structural polyprotein (O'nyong-nyong virus (strain SG650))
P22056 POLS_ONNVG
Structural polyprotein (O'nyong-nyong virus (strain Gulu))
Query ID
gi|48429239|sp|P80404.3|GABT_HUMAN
Description
4-aminobutyrate aminotransferase, mitochondrial precursor [Homo sapiens]
Paralogous protein:
AAB38510.1
gamma-aminobutyric acid transaminase [Homo sapiens]
Score = 1000 bits (2585), Expect = 0.0, Method: Compositional matrix adjust.
Identities = 482/500 (96%), Positives = 483/500 (97%), Gaps = 0/500 (0%)
5. Find whether the given pattern is present in the following protein. Also find its homologous
proteins in the SWISSPROT database that possess the same pattern.
Aim: To find whether the given pattern is present in the following protein, and to find its
homologous proteins in the SWISSPROT database that possess the same pattern.
Introduction:
By filling in the "regular expression" box on the PSI-blast page, you can execute a PHI-blast
search. PHI-blast enforces the presence of a motif in addition to the usual PSI-blast criteria
for matching. Regular expressions can be used to confine the results to a formally defined
family. The syntax for patterns in PHI-BLAST follows the conventions of PROSITE.
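To see what a PROSITE-style pattern means, it can help to translate it into an ordinary regular expression. The Python sketch below is a minimal converter under stated assumptions: it supports only single residues, x, [..] sets, {..} exclusions and (n) or (n,m) repeats, and it ignores the anchors "<" and ">". The function name prosite_to_regex and the short test pattern are hypothetical.

```python
import re

# One pattern element: x, a residue, a [set] or an {exclusion}, optionally
# followed by a repeat count such as (3) or (2,4).
ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Convert a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            core = "."                        # x means any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"    # {P} means any residue except P
        if low and high:
            core += "{%s,%s}" % (low, high)   # x(2,3) becomes .{2,3}
        elif low:
            core += "{%s}" % low              # x(2) becomes .{2}
        parts.append(core)
    return "".join(parts)


regex = prosite_to_regex("[LIV]-x(2,3)-D-{P}")
print(regex)
print(re.search(regex, "AALKKDG").span())
```

A pattern that is wrapped over several lines has to be joined into a single string before conversion.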
Pattern:
[LIVMFYWCS]-[LIVMFYWCAH]-x-D-[ED]-[IVA]-x(2,3)-[GAT]-
[LIVMFAGCYN]-x(0,1)-[RSACLIH]-x-[GSADEHRM]-x(10,16)-
[DH]-[LIVMFCAG]-[LIVMFYSTAR]-x(2)-[GSA]-K-x(2,3)-
[GSTADNV]-[GSAC]
Protein:
>
MASMLLAQRLACSFQHSYRLLVPGSRHISQAAAKVDVEFDYDGPLMKTEVPGPRSQELMK
QLNIIQNAEAVHFFCNYEESRGNYLVDVDGNRMLDLYSQISSVPIGYSHPALLKLIQQPQ
NASMFVNRPALGILPPENFVEKLRQSLLSVAPKGMSQLITMACGSCSNENALKTIFMWYR
SKERGQRGFSQEELETCMINQAPGCPDYSILSFMGAFHGRTMGCLATTHSKAIHKIDIPS
FDWPIAPFPRLKYPLEEFVKENQQEEARCLEEVEDLIVKYRKKKKTVAGIIVEPIQSEGG
DNHASDDFFRKLRDIARKHGCAFLVDEVQTGGGCTGKFWAHEHWGLDDPADVMTFSKKMM
TGGFFHKEEFRPNAPYRIFNTWLGDPSKNLLLAEVINIIKREDLLNNAAHAGKALLTGLL
DLQARYPQFISRVRGRGTFCSFDTPDDSIRNKLILIARNKGVVLGGCGDKSIRFRPTLVF
RDHHAHLFLNIFSDILADFK
Method:
1. Retrieve the sequence from NCBI.
2. Paste the sequence in the Query box in blastp and choose PHI-BLAST as the algorithm option.
3. Run against a non-redundant database (nr).
PHI-BLAST Results:
6. Identify the given sequence and also find the similar sequences present in SWISSPROT
database for the following query.
>
AGAGCGCAGCGGCGAGCGTGACTCCGCCATCAGGTCCCCGGCTCCCTCCCCGGACCTAGCCCACTCCGCT
GCGCCAGCGCCGCGGGCACCCCGGGGCTCGGGGCTGGGGAGATCATGGCCCGCCTCTTCAGCCCCCGGCC
GCCCCCCAGCGAAGACCTCTTCTACGAGACCTACTACAGCCTGAGCCAGCAGTACCCGCTGCTGCTGCTG
CTGCTGGGGATCGTGCTCTGTGCGCTCGCGGCGCTGCTCGCAGTGGCCTGGGCCAGCGGCAGGGAGCTGA
CCTCAGACCCGAGCTTCCTGACCACTGTGCTGTGCGCGCTGGGCGGCTTCTCGCTGCTGCTGGGCCTCGC
TTCCCGGGAGCAGCGACTGCAGCGCTGGACGCGTCCCCTGTCCGGCTTGGTATGGGTCGCGCTGCTAGCG
CTAGGCCACGCCTTCCTGTTCACCGGGGGCGTGGTGAGCGCCTGGGACCAGGTGTCCTATTTTCTCTTCG
TCATCTTCACGGCGTATGCCATGCTGCCCTTGGGCATGCGGGACGCCGCCGTCGCGGGCCTCGCCTCCTC
ACTCTCGCATCTGCTGGTCCTCGGGCTGTATCTTGGGCCACAGCCGGACTCACGGCCTGCACTGCTGCCG
CAGTTGGCAGCAAACGCAGTGCTGTTCCTGTGCGGGAACGTGGCAGGAGTGTACCACAAGGCGCTGATGG
AGCGCGCCCTGCGGGCCACGTTCCGGGAGGCACTCAGCTCCCTGCACTCACGCCGGCGGCTGGACACCGA
GAAGAAGCACCAGGAACACCTTCTCTTGTCCATCCTTCCTGCCTACCTGGCCCGAGAGATGAAGGCAGAG
ATCATGGCACGGCTGCAGGCAGGACAGGGGTCACGGCCAGAGAGCACTAACAATTTCCACAGCCTCTATG
TCAAGAGGCACCAGGGAGTCAGCGTGCTGTATGCTGACATCGTGGGCTTCACGCGGCTGGCCAGCGAGTG
TTCCCCTAAGGAGCTGGTGCTCATGCTCAATGAGCTCTTTGGCAAGTTCGACCAGATTGCCAAGGAGCAT
GAATGCATGCGGATCAAGATCCTGGGGGACTGTTACTACTGTGTCTCTGGGCTGCCACTCTCACTGCCAG
ACCATGCCATCAACTGCGTGCGCATGGGCCTGGACATGTGCCGGGCCATCAGGAAACTGCGGGCAGCCAC
TGGCGTGGACATCAACATGCGTGTGGGCGTGCACTCAGGCAGCGTACTGTGTGGAGTCATCGGGCTGCAG
AAGTGGCAGTACGACGTTTGGTCACATGATGTCACACTGGCTAACCACATGGAGGCAGGCGGTGTACCAG
GGCGAGTGCACATCACAGGGGCTACCCTGGCCCTGCTGGCAGGGGCTTATGCTGTGGAGGACGCAGGCAT
GGAGCATCGGGACCCCTACCTTCGGGAGCTAGGGGAGCCTACCTATCTGGTCATCGATCCACGGGCAGAG
GAGGAGGATGAGAAGGGCACTGCAGGAGGCTTGCTGTCCTCGCTTGAGGGCCTCAAGATGCGTCCATCAC
TGCTGATGACCCGTTACCTGGAGTCCTGGGGCGCAGCCAAGCCTTTTGCCCACCTGAGCCACGGAGACAG
CCCTGTGTCCACCTCCACCCCTCTCCCGGAGAAGACCCTGGCTTCCTTCAGCACCCAGTGGAGCCTGGAT
CGGAGCCGTACCCCCCGGGGACTAGATGATGAACTGGACACCGGGGATGCCAAGTTCTTCCAGGTCATTG
AGCAGCTCAACTCGCAGAAACAGTGGAAGCAGTCGAAGGACTTCAACCCACTGACACTGTACTTCAGAGA
GAAGGAGATGGAGAAAGAGTACCGACTCTCTGCAATCCCCGCCTTCAAATACTATGAAGCCTGCACCTTC
CTGGTTTTTCTCTCCAACTTCATCATCCAGATGCTAGTGACAAACAGGCCCCCAGCTCTGGCCATCACGT
ATAGCATCACCTTCCTCCTCTTCCTCCTCATCCTTTTTGTCTGCTTCTCAGAGGACCTGATGAGGTGTGT
CCTGAAAGGCCCCAAGATGCTGCACTGGCTGCCTGCACTGTCTGGCCTGGTGGCCACACGACCAGGACTG
AGAATAGCCTTGGGCACCGCCACCATCCTCCTTGTCTTTGCCATGGCCATTACCAGCCTGTTCTTCTTCC
CAACATCATCAGACTGCCCTTTCCAAGCTCCCAATGTGTCCTCCATGATTTCCAACCTCTCCTGGGAGCT
CCCTGGGTCTCTGCCTCTCATCAGTGTCCCATACTCCATGCACTGCTGCACGCTGGGCTTCCTCTCCTGC
TCCCTCTTTCTGCACATGAGCTTCGAGCTGAAGCTGCTGCTGCTCCTGCTGTGGCTGGCGGCATCCTGCT
CCCTCTTCCTGCACTCCCATGCCTGGCTGTCGGAATGCCTCATCGTCCGCCTCTATCTGGGCCCCTTGGA
CTCCAGGCCCGGAGTGCTGAAGGAGCCCAAACTGATGGGTGCTATCTCCTTCTTCATCTTCTTCTTCACC
CTCCTTGTCCTGGCTCGCCAGAATGAGTACTACTGCCGCCTGGACTTCCTGTGGAAGAAGAAGCTGAGGC
AGGAGAGGGAGGAGACAGAGACGATGGAGAACCTGACTCGGCTGCTCTTGGAGAACGTGCTCCCTGCACA
CGTGGCCCCCCAGTTCATTGGCCAGAACCGGCGCAACGAGGATCTCTACCACCAGTCCTATGAATGCGTT
TGTGTCCTCTTCGCCTCAGTCCCAGACTTCAAGGAGTTCTACTCTGAATCCAACATCAATCATGAGGGCC
TAGAGTGTCTGAGGCTGCTCAATGAGATAATTGCTGATTTTGATGAGCTGCTCTCCAAGCCCAAGTTCAG
TGGGGTGGAGAAGATCAAGACCATCGGCAGCACCTACATGGCAGCCACAGGCTTAAATGCCACCTCTGGA
CAGGATGCACAACAGGATGCTGAACGGAGCTGCAGCCACCTTGGCACTATGGTGGAATTTGCCGTGGCCC
TGGGGTCTAAGCTGGACGTCATCAACAAGCATTCATTCAACAACTTCCGCCTGCGAGTGGGGTTGAACCA
TGGACCCGTAGTAGCTGGAGTTATTGGGGCCCAGAAGCCGCAATATGACATTTGGGGCAACACAGTGAAC
GTGGCCAGCCGCATGGAGAGTACAGGAGTCCTTGGCAAAATCCAAGTGACTGAGGAGACAGCATGGGCCC
TACAGTCCCTGGGCTACACCTGCTACAGCCGGGGTGTCATCAAGGTGAAAGGCAAAGGGCAGCTCTGCAC
CTACTTCCTGAACACAGACTTGACACGAACTGGACCTCCTTCAGCTACCCTAGGCTGAGATTGCACTCGC
CTTCTAAGAACCTCAATAAAGAGACTCTGGGGTGTCTGGAGCCCATTGATGTCTG
Method:
1. Run the query in Blastn to identify the sequence.
2. Then, run Blastx to determine the similar proteins.
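Blastx compares a nucleotide query with a protein database by first translating the query in all six reading frames. The Python sketch below shows that translation step only, using the standard genetic code; it is an illustration, not the BLAST implementation, and the function names translate and six_frames are hypothetical.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate complete codons from position 0; unknown codons become X."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frames(dna):
    """Return translations of the three forward and three reverse frames."""
    reverse = dna.upper().translate(COMPLEMENT)[::-1]  # reverse complement
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


# The first codons after the ATG in the query sequence of this exercise.
print(translate("ATGGCCCGCCTCTTC"))
```

Stop codons are written as "*"; the frame that yields a long stretch without stops is usually the one that matches a protein in the database.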
Results and inference:
Blastn results-
Homo sapiens adenylate cyclase 4 (ADCY4), transcript variant 2, mRNA
Score = 6159 bits (6830), Expect = 0.0
Identities = 3415/3415 (100%), Gaps = 0/3415 (0%)
Strand=Plus/Plus
Blastx results-
Similar protein:
sp|Q8NFM4.1|ADCY4_HUMAN
Score = 1926 bits (4990),Expect = 0.0
Identities = 1077/1077 (100%), Positives = 1077/1077 (100%), Gaps = 0/1077 (0%)
NP_640340.2
adenylate cyclase type 4 [Homo sapiens]
7. Find the structurally solved homologous proteins for P80404. Comment on the results.
Aim: To find structurally solved homologous proteins for P80404
Method:
1. Run Blastp for the query against PDB.
2. Observe results.
pdb|1OHV|A
Length=472
Score = 959 bits (2479), Expect = 0.0, Method: Compositional matrix adjust.
Identities = 453/472 (96%), Positives = 464/472 (98%), Gaps = 0/472 (0%)
Introduction:
A sequence alignment is a way of arranging the sequences of DNA, RNA, or protein to identify
regions of similarity that may be a consequence of functional, structural, or
evolutionary relationships between the sequences. Aligned sequences of nucleotide or amino
acid residues are typically represented as rows within a matrix. Gaps are inserted between
the residues so that identical or similar characters are aligned in successive columns. Pairwise
sequence alignment methods are used to find the best-matching piecewise (local) or global
alignments of two query sequences. Pairwise alignments can only be used between two
sequences at a time, but they are efficient to calculate and are often used for methods that do not
require extreme precision (such as searching a database for sequences with high similarity to a
query). The three primary methods of producing pairwise alignments are dot-matrix methods,
dynamic programming, and word methods; however, multiple sequence alignment techniques
can also align pairs of sequences.
1. Perform the local alignment between following sequences using any two variants of
BLOSUM. Comment on the result.
Aim: To perform the local alignment between the given sequences using any two variants of
BLOSUM
Introduction:
The BLOSUM (BLOcks SUbstitution Matrix) matrix is a substitution
matrix used for sequence alignment of proteins. BLOSUM matrices are used to score
alignments between evolutionarily divergent protein sequences. They are based on local
alignments. BLOSUM matrices were first introduced in a paper by Henikoff and
Henikoff. They scanned the BLOCKS database for very conserved regions of protein
families (that do not have gaps in the sequence alignment) and then counted the relative
frequencies of amino acids and their substitution probabilities. Then, they calculated a log-
odds score for each of the 210 possible substitutions of the 20 standard amino acids. All
BLOSUM matrices are based on observed alignments; they are not extrapolated from
comparisons of closely related proteins like the PAM Matrices. Several sets of BLOSUM
matrices exist using different alignment databases, named with numbers. BLOSUM matrices
with high numbers are designed for comparing closely related sequences, while those with
low numbers are designed for comparing distantly related sequences. For example,
BLOSUM80 is used for less divergent alignments, and BLOSUM45 is used for more
divergent alignments. The matrices were created by merging (clustering) all sequences that
were more similar than a given percentage into one single sequence and then comparing
those sequences (that were all more divergent than the given percentage value) only; thus
reducing the contribution of closely related sequences. The percentage used was appended to
the name, giving BLOSUM80 for example where sequences that were more than 80%
identical were clustered.
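The log-odds calculation mentioned above can be sketched as follows. This is a simplified illustration with made-up frequencies: it assumes q_ij is the observed probability of seeing residues i and j aligned (ordered pairs), p_i and p_j are the background frequencies, and scores are expressed in half-bit units and rounded to integers. The function name log_odds_score is hypothetical.

```python
import math


def log_odds_score(q_ij, p_i, p_j, scale=2):
    """Rounded log-odds score: scale * log2(observed / expected by chance)."""
    expected = p_i * p_j  # chance of the pair if residues were independent
    return round(scale * math.log2(q_ij / expected))


# Hypothetical numbers: a pair seen four times more often than chance
# scores positively, a pair seen four times less often scores negatively.
print(log_odds_score(0.04, 0.1, 0.1))
print(log_odds_score(0.0025, 0.1, 0.1))
```

A positive score therefore means the substitution is observed more often than expected by chance in the aligned blocks, and a negative score means it is observed less often.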
Method:
1. Enter the Given Sequences in Blastp.
MASMLLAQRLACSFQHSYRLLVPGSRHISQAAAKVDVEFDYDGPLMKTEVPGPRSQELMKQLNIIQNAEAVHFFCNYEESRGNYLV
DVDGNRMLDLYSQISSVPIGYSHPALLKLIQQPQNASMFVNRPALGILPPENFVEKLRQSLLSVAPKGMSQLITMACGSCSNENALK
TIFMWYR
QLNIIQNAEAVHFFCNYEESRGNYLVDVDGNRMLDLYSQISSVPIGYSHPALLKLIQQPQ
NASMFVNRPALGILPPENFVEKLRQSLLSVAPKGMSQLITMACGSCSNENALKTIFMWYR
>lcl|16131
QLNIIQNAEAVHFFCNYEESRGNYLVDVDGNRMLDLYSQISSVPIGYSHPALLKLIQQPQ
Length=60
Score = 127 bits (318), Expect = 3e-35, Method: Compositional matrix adjust.
Identities = 60/60 (100%), Positives = 60/60 (100%), Gaps = 0/60 (0%)
Matrix: BLOSUM 80
>lcl|60077
QLNIIQNAEAVHFFCNYEESRGNYLVDVDGNRMLDLYSQISSVPIGYSHPALLKLIQQPQ
Length=60
Score = 131 bits (297), Expect = 1e-36, Method: Compositional matrix adjust.
Identities = 60/60 (100%), Positives = 60/60 (100%), Gaps = 0/60 (0%)
The bit score and Expect value differ between the two BLOSUM scoring
matrices (127 bits and 3e-35 versus 131 bits and 1e-36), while the alignment itself is identical
(60/60 identities, no gaps).
2. Perform the local alignment between following sequences. Comment on the result.
QLNIIQNAEAVHFFCNYEESRGNYLVDVDGNRCGSCSNENALKTIF
Introduction:
Local Alignment is an alignment that searches for segments of the two sequences that
match well. There is no attempt to force entire sequences into an alignment, just those
parts that appear to have good similarity, according to some criterion.
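A local alignment score can be computed with the Smith-Waterman dynamic programming recurrence. The Python sketch below is a minimal version under simplifying assumptions: a fixed match/mismatch score instead of a substitution matrix such as BLOSUM62, a linear gap penalty, and only the best score is returned (no traceback). The function name and the scoring values are illustrative.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b."""
    previous = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below zero, so an alignment can start anywhere.
            current[j] = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            best = max(best, current[j])
        previous = current
    return best


print(smith_waterman_score("AAAGGGTTT", "CCGGGCC"))
```

Because the score is reset to zero whenever it would become negative, poorly matching flanks do not penalize a well-matching segment.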
Method:
1. Enter the 2 sequences in Blastp with BLOSUM62 as the matrix.
2. Observe results.
Query 61 QLNIIQNAEAVHFFCNYEESRGNYLVDVDGNR 92
QLNIIQNAEAVHFFCNYEESRGNYLVDVDGNR
Sbjct 1 QLNIIQNAEAVHFFCNYEESRGNYLVDVDGNR 32
CGSCSNENALKTIF
Sbjct 33 CGSCSNENALKTIF 46
The first 32 residues of the second sequence are found in the first (query) sequence at positions 61 to
92, and the remaining 14 residues are found at positions 163 to 176.
3. Obtain the global alignment between the following sequences. Comment on the result.
QLNIIQNAEAVHFFCNYEESRGNYLVDVDGNRMLDLYSQISSVPIGYSHPALLKLIQQPQ
NASMFVNRPALGILPPENFVEKLRQSLLSVAPKGMSQLITMACGSCSNENALKTIFMWYR
QLNIIQNAEAVHFFCNYEESRGNYLYSQISSVPASMFVNRPALGILPPENFVSCSNENALKTIFMWY
Introduction:
Global Alignment is an alignment that assumes that the two proteins are basically similar
over the entire length of one another. The alignment attempts to match them to each other
from end to end, even though parts of the alignment are not very convincing.
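For comparison, a global alignment score can be computed with the Needleman-Wunsch recurrence. The sketch below makes the same simplifying assumptions as a textbook example: fixed match/mismatch scores, a linear gap penalty and no traceback, so its numbers are not comparable with the NW scores reported by the web tool. The function name and scoring values are illustrative.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score between sequences a and b."""
    # First row: aligning a prefix of b against nothing costs one gap each.
    previous = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # First column: aligning a prefix of a against nothing.
        current = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            current[j] = max(diagonal, previous[j] + gap, current[j - 1] + gap)
        previous = current
    # The score of aligning both sequences end to end is in the last cell.
    return previous[-1]


print(needleman_wunsch_score("ACGT", "ACT"))
```

Unlike a local alignment, every residue of both sequences must be placed, so unmatched regions contribute gap penalties to the final score.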
Method:
NW Score = 260
Identities = 67/120 (56%), Positives = 67/120 (56%), Gaps = 53/120 (44%)
4. Compare the local and global alignments between the given sequences. Comment on the
results.
MASMLLAQRLACSFQHSYRLLVPGSRHISQAAAKVDVEFDYDGPLMKTEVPGPRSQELMKQLNIIQNAEAVHFFCNYEESRGNYLV
DVDGNRMLDLYSQISSVPIGYSHPALLKLIQQPQNASMFVNRPALGILPPENFVEKLRQSLLSVAPKGMSQLITMACGSCSNENALK
TIFMWYR
MASMLLAQRLACSFQHSYRLLVPGSRHISQAALVDVDGNRMLDLYSQISSVPIGYSHPALLKLIQQPQNASMFVNRPAL
GILPPENFVQLITMACGSCSNENALKTIFMWYR
Aim: To compare the local and global alignments between the given sequences
Introduction:
Global alignments, which attempt to align every residue in every sequence, are most useful when
the sequences in the query set are similar and of roughly equal size. (This does not mean global
alignments cannot end in gaps.) A general global alignment technique is the Needleman–Wunsch
algorithm, which is based on dynamic programming. Local alignments are more useful for
dissimilar sequences that are suspected to contain regions of similarity or similar sequence
motifs within their larger sequence context. The Smith–Waterman algorithm is a general local
alignment method also based on dynamic programming. With sufficiently similar sequences,
there is no difference between local and global alignments.
Method:
1. For local alignment, run the two query sequences in Blastp.
2. For global alignment, run the two query sequences in Needleman-Wunsch Global
Sequence Alignment Tool.
Result and inference:
Local Alignment:
Score = 198 bits (504), Expect = 2e-56, Method: Compositional matrix adjust.
Identities = 112/180 (62%), Positives = 112/180 (62%), Gaps = 68/180 (38%)
Global Alignment:
NW Score = 487
Identities = 112/180 (62%), Positives = 112/180 (62%), Gaps = 68/180 (38%)
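Difference: The identities, positives and gaps are the same in both alignments; only the reported
scores differ (198 bits, raw score 504, for the local alignment versus an NW score of 487 for the
global alignment).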
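The identity and gap figures in such reports can be recomputed from the two rows of an alignment. The Python sketch below assumes the rows are equal-length strings in which "-" marks a gap; the function names and the short example rows are hypothetical.

```python
def alignment_stats(row_a, row_b):
    """Return (identities, gaps, length) for two aligned, equal-length rows."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    pairs = list(zip(row_a, row_b))
    identities = sum(1 for x, y in pairs if x == y and x != "-")
    gaps = sum(1 for x, y in pairs if x == "-" or y == "-")
    return identities, gaps, len(pairs)


def percent(count, length):
    """Percentage rounded to the nearest integer."""
    return round(100 * count / length)


print(alignment_stats("ACGT-A", "AC-TTA"))
# Counts reported in the result above: 112 identities and 68 gaps in 180 columns.
print(percent(112, 180), percent(68, 180))
```

With the counts reported above, the percentages come out as 62 and 38, matching the values in the result.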
5. Perform the alignment using Needleman Wunsch algorithm between P80404 and
P80147.
Aim: To perform the alignment using Needleman Wunsch algorithm between P80404
and P80147
Introduction:
The Needleman–Wunsch algorithm performs a global alignment on two sequences
(called A and B here). It is commonly used in bioinformatics to
align protein or nucleotide sequences.
Method:
1. Retrieve the sequences from NCBI and run the Needleman-Wunsch Global
Sequence Alignment Tool.
2. Observe results.
NW Score = 2536
Identities = 474/500 (95%), Positives = 490/500 (98%), Gaps = 0/500 (0%)
Query ID
lcl|39669
Description
gi|48429239|sp|P80404.3|GABT_HUMAN
Subject ID
39671
Description
gi|120968|sp|P80147.2|GABT_PIG
Aim: To identify 10 homologous sequences of P68871 from various organisms, find the
conserved regions among them and comment on these, and comment on the evolutionary
relationship between the sequences.
Introduction:
Phylogeny.fr has been designed to provide a high performance platform that transparently
chains programs relevant to phylogenetic analysis in a comprehensive, and flexible pipeline.
Although phylogenetic aficionados will be able to find most of their favorite tools and run
sophisticated analysis, the primary philosophy of Phylogeny.fr is to assist biologists with no
experience in phylogeny in analyzing their data in a robust way. The Phylogeny.fr platform
offers a phylogeny pipeline which can be executed through three main modes:
The "One Click mode" targets users that do not wish to deal with program and parameter
selection. By default, the pipeline is already set up to run and connect programs recognized for
their accuracy and speed (MUSCLE for multiple alignment and PhyML for phylogeny) to
reconstruct a robust phylogenetic tree from a set of sequences.
In the "Advanced mode", the Phylogeny.fr server proposes the succession of the same programs
but users can choose the steps to perform (multiple sequence alignment, phylogenetic
reconstruction, tree drawing) and the options of each program.
The "A la carte mode" offers the possibility of running and testing more alignment and
phylogeny programs, such as MUSCLE, ClustalW, T-Coffee, PhyML, BioNJ and TNT.
Method:
[Figures: BLAST output; ClustalW alignment]
One-Click Mode:
[Figures: phylogram and cladogram]
A la Carte Result (multiple alignment: ClustalW; tree view: TreeDyn):
[Figures: phylogram and cladogram]
The distance between the above cluster and the cluster formed by
gi_6003534_gb_AAF00489.1_AF181989_1_hemoglobin_beta_subunit_vari,
gi_56749856_sp_P68871.2_HBB_HUMAN_RecName_Full=Hemoglobin_subunit,
gi_26892090_gb_AAN84548.1_beta_globin_chain_variant_Homo_sapiens and
gi_4378804_gb_AAD19696.1_hemoglobin_beta_chain_Homo_sapiens is 1.0.
gi_4504349_ref_NP_000509.1_hemoglobin_subunit_beta_Homo_sapiens is distantly related to the above
clusters.
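Once a multiple alignment has been obtained, the conserved positions and a simple pairwise distance can be computed directly from the aligned strings. The following is a minimal sketch under stated assumptions: the function names are made up for this example, the toy alignment consists of short invented strings (not the actual ClustalW output), and the p-distance shown here is a plain fraction of differing sites, not the distance model used by PhyML or BioNJ.

```python
def conserved_columns(aligned):
    """Return 0-based indices of gap-free columns where all sequences agree."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("aligned sequences must have equal length")
    first = aligned[0]
    return [i for i in range(length)
            if first[i] != "-" and all(s[i] == first[i] for s in aligned)]


def p_distance(a, b):
    """Fraction of differing residues over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in pairs) / len(pairs)


# Toy alignment (illustrative strings only).
toy = ["MVHLTPEEK",
       "MVHLTGEEK",
       "MV-LTPEEK"]
print(conserved_columns(toy))   # [0, 1, 3, 4, 6, 7, 8]
print(p_distance(toy[0], toy[2]))  # 0.0
```

Runs of consecutive indices in the output correspond to conserved regions; a small p-distance between two sequences is consistent with them clustering together in the tree.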
Aim:
Identify the genes present, if any, in the given genomic sequence NC_010456.
Introduction:
Under the probabilistic model of gene structural and compositional properties used by
GENSCAN, each possible "parse" (gene structure description) which is compatible with the
sequence is assigned a probability. The default output of the program is simply the "optimal"
(highest probability) parse of the sequence. The exons in this optimal parse are referred to as
"optimal exons" and the translation products of the corresponding "optimal genes" are printed as
GENSCAN predicted peptides. Of course, the optimal parse does not always correspond to the
actual (biological) parse of the sequence, that is, the actual set of exons/genes present. In
addition, there may be more than one parse which can be considered "correct", for example, in
the case of a gene which is alternatively transcribed, translated or spliced. For both of these
reasons, it may be of interest to consider "suboptimal" ("near-optimal") exons as well, i.e. exons
which have reasonably high probability but are not present in the optimal parse. Specifically, for
every potential exon E in the sequence, the probability P(E) is defined as the sum of the
probabilities under the model of all possible "parses" (gene structures) which contain the exact
exon E in the correct reading frame.
Given a probability cutoff C, suboptimal exons are those potential exons with P(E) > C which
are not present in the optimal parse.
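The definition above amounts to a simple filter: keep the potential exons whose probability exceeds the cutoff and which are not part of the optimal parse. A minimal sketch follows; the function name, the exon labels and the probabilities are hypothetical and serve only to illustrate the rule.

```python
def suboptimal_exons(candidates, optimal, cutoff):
    """Return labels of exons with P(E) > cutoff that are not in the optimal parse.

    candidates: dict mapping exon label -> P(E)
    optimal:    set of exon labels present in the optimal parse
    """
    return sorted(e for e, p in candidates.items()
                  if p > cutoff and e not in optimal)


# Hypothetical probabilities for four potential exons.
candidates = {"E1": 0.95, "E2": 0.40, "E3": 0.05, "E4": 0.12}
optimal = {"E1"}
print(suboptimal_exons(candidates, optimal, cutoff=0.10))  # ['E2', 'E4']
print(suboptimal_exons(candidates, optimal, cutoff=1.00))  # []
```

With a cutoff of 1.00 no exon can satisfy P(E) > C, which is why no suboptimal exons are printed at that setting.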
Suboptimal exons have a variety of potential uses. First, suboptimal exons sometimes correspond
to real exons which were missed for whatever reason by the optimal parse of the sequence.
Second, regions of a prediction which contain multiple overlapping and/or incompatible optimal
and suboptimal exons may in some cases indicate alternatively spliced regions of a gene (Burge
& Karlin, in preparation). The probability cutoff C used to determine which potential exons
qualify as suboptimal exons can be set to any of a range of values between 0.01 and 1.00. The
default value on the web page is 1.00, meaning that no suboptimal exons are printed. For most
applications, a cutoff value of about 0.10 is recommended. Setting the value much lower than
0.10 will often lead to an explosion in the number of suboptimal exons, most of which will
probably not be useful. On the other hand, if the value is set much higher than 0.10, then
potentially interesting suboptimal exons may be missed.
Gene: is a locatable region of genomic sequence, corresponding to a unit of inheritance, which is
associated with regulatory regions, transcribed regions, and/or other functional sequence regions.
Exon: is a nucleic acid sequence that is represented in the mature form of an RNA molecule
either after portions of a precursor RNA (introns) have been removed by cis-splicing or when
two or more precursor RNA molecules have been ligated by trans-splicing. The mature RNA
Intron: is any nucleotide sequence within a gene that is removed by RNA splicing to generate
the final mature RNA product of a gene. The term intron refers to both the DNA sequence within
a gene, and the corresponding sequence in RNA transcripts.
Intergenic region (IGR): is a stretch of DNA sequence located between clusters of genes and
containing few or no genes. Occasionally some intergenic DNA acts to control genes nearby, but
most of it has no currently known function. It is one of the classes of DNA sequence collectively
referred to as junk DNA, although that term is now used less often in scientific studies.
In humans, intergenic regions comprise a large percentage of the genome.
Isochore: is a large region of DNA (greater than 300 kb) with a high degree of uniformity in
guanine (G) and cytosine (C) content (collectively GC content). GC-rich isochores tend to have
more genes, higher local melting or denaturation temperatures, and different flexibility. Overall,
isochores are largely homogeneous in GC content, in contrast to the heterogeneity of the entire genome.
Method:
1. Open http://genes.mit.edu/GENSCAN.html
2. Enter the sequence for the given ID.
3. Choose the organism (Vertebrate, Arabidopsis or Maize) and set the suboptimal exon cutoff to 1.00.
4. Run GENSCAN.
NC_010456:
GENSCAN Output
GENSCAN 1.0 Date run: 28-Oct-111 Time: 01:35:09
Predicted genes/exons:
Gn.Ex Type S .Begin ...End .Len Fr Ph I/Ac Do/T CodRg P.... Tscr...
----- ---- - ------ ------ ---- -- -- ---- ---- ----- ----- ------
11.01 Intr - 135637 135458 180 2 0 122 -15 111 0.288 4.04
Exnum Type S .Begin ...End .Len Fr Ph B/Ac Do/T CodRg P.... Tscr..
----- ---- - ------ ------ ---- -- -- ---- ---- ----- ----- ------
S.137 Intr - 51745 51548 198 1 0 111 -21 144 0.011 5.35
S.171 Intr - 64008 63824 185 2 2 123 -29 126 0.137 3.83
S.214 Intr + 78635 78821 187 2 1 126 -24 112 0.015 3.49
S.397 Intr - 135637 135450 188 2 2 122 -46 102 0.015 -0.39
>/tmp/10_28_11-01:35:07.fasta|GENSCAN_predicted_peptide_1|281_aa
MEQYCILGRIGEGAHGIVFKAKHVETGEIVALKKVALRRLEDGIPNQALREIKALQEIED
SQYVVQLKAVFPHGAGFVLAFEYMLSDLAEVLRHAQRPLAQAQVKSYLQMLLKGVAF
CHA
NNIVHRDLKPANLLISASGQLKIADFGLARVFSPDGSRLYTHQVATRWYRAPELLYGAR
Q
YNQGVDLWAVGCILGELLNGSPLFPGENDIEQLCCVLRILGTPSPQVWPEITELPDYNKI
SFKEQAPVPLEEVLPDASPQALDLLGRFLLYPPLQRIAASQ
>/tmp/10_28_11-01:35:07.fasta|GENSCAN_predicted_peptide_2|422_aa
MARPDPSHICDLHHGSWQCQILNPLSEARNRTCILMFPSRIQFHCARMSPKMAQSPKMA
Q
SFKMAQSPKMAQSPKMAQSPKMAQSFKMAQSPKMAQSFKMAQSPKMAQSPKMAQSP
KMAQ
SFKMAQSFKMAQSPNKAGGAREPAGPIKRLLLCSWRAKVWFQNRRARISMKNAQKMK
KPH
VVPGPTQGEVGAPVQRGNGDPVPGSSQGGLYAPVPGSSQGGLYVPVPGSSQGGLYAPV
PE
SNQDGLYAPIPGPNQGKFCAPSFSCQQGPLAAQRDASHFWNPEDIPGQAIFLGYFGNRG
V
NIALPNTEIPAEEPSGNPNCSFFSGFSPTFLTTSQQPFSWAEAGSADLGGLGMPLAGNQA
LQDWRQHPPSGEQQSWWNQQPSPPPLPTAVLEPQLQGQILNPLREVRDRIRILMDTSRV
G
YR
>/tmp/10_28_11-01:35:07.fasta|GENSCAN_predicted_peptide_3|96_aa
MAERREKKNEKKEKKKKKKKRGRGRERRGVEIGSLKSSKGPQPRHMEVPRLGVELELQ
LLLAYATAAALRDLSLICDIHHNSQQYWVLNPLMETRD
Function:
Probable serine/threonine-protein kinase mkcC
Length=891
>/tmp/10_28_11-01:35:07.fasta|GENSCAN_predicted_peptide_4|204_aa
MLPPVMLPEPQGLGSSDEAWEQSGMWKFPSQEWNLCHSSNLSHCSDKARSLICGATGE
LLELVMSENDGRDVGLYLRRMEVPRRGVELELLAYAMATATQDPSHVFDLHHSSWQR
WILNPLSQARDQIRVLVDTSCVRYLCATTGAPLFLFFIVWLHPQHMEVPRLGVQSELHL
LATPQPQQRGIQAVSGTYTTVHGNAGSLTH
>/tmp/10_28_11-01:35:07.fasta|GENSCAN_predicted_peptide_5|106_aa
MGPAPEAYGSSGLGVKSELQLPAYATAKATPDPSHVCHLQHSSGHPLNGTRAHLKVSQ
PQYNPVTTCDHQLPFGTVQNEGHPQRWMPPPGGALAWVTPYQARYAPY
>/tmp/10_28_11-01:35:07.fasta|GENSCAN_predicted_peptide_6|65_aa
MSSGLMHPGLLYVDRKCLSEEQLLSATVGTWLLPWYMEVPRLGVESELQPPAYTTATA
MQDLSCI
>/tmp/10_28_11-01:35:07.fasta|GENSCAN_predicted_peptide_7|122_aa
MGLYTPWLPELASGNTRLGVELTQQLPAYTTATAMQDPSCVCDLPHISWQCQILSPLSK
ARDRTRNLVDTSQGCTCGIMEVLRLRIQSELQLLATATSTAMVDPGCICKLQYSSQQCR
SLTH
>/tmp/10_28_11-01:35:07.fasta|GENSCAN_predicted_peptide_8|95_aa
MRLQFRSLPLLGPHPQHMEVPRLGLTLELQLPGYATATTMWDPSHDYTCAMWRFPGY
GSNWSCSRRPTPQPQQHQIRAASANYTTSHGNTGSLTH
>/tmp/10_28_11-01:35:07.fasta|GENSCAN_predicted_peptide_9|111_aa
MSAAAPAWERGNSTDLTLAKRAEARPTLMVHGGSQARGQIGAVAPKPQSQPQQSQIQT
VAPTYTRGPHLQHMEVPRMGRIRGAAASPRHSHSNAESERSLRPTSQLMAAPH
>/tmp/10_28_11-01:35:07.fasta|GENSCAN_predicted_peptide_10|108_aa
MRLRVRSLPLLSGLTIRRCRIHWQALVSVGLLKAESSVLAGNEESALRKQMFPKSQTRR
PHWQHMEVSGLGVELELQLLAYTTAKATTTPDLSRICYLHPHLVATPDP
>/tmp/10_28_11-01:35:07.fasta|GENSCAN_predicted_peptide_11|186_aa
XPHPLHMQVPRLGITLDLQLPTTTTAPDPSHIGNPCCSLWQCKILNPLSKARDRIHILMN
RLHLWHMEVPRLGVESELQLPAYTAATATPDLSRIFYLYCSLWQHLILNLLSEATDGTR
NGTRILMNATRPNLQHMEVPRRGVKLELQLLAYPTATETPDPSHICSLHCSSRQCCILNP
LREAEE
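The exon rows in the GENSCAN output above are whitespace-separated, so they can be read into a small record for further checks. The sketch below is an assumption-laden illustration: the `Exon` record and function names are invented for this example, and the parser only handles 13-column rows of the kind shown above (other row types in a full GENSCAN report may have a different number of columns).

```python
from collections import namedtuple

Exon = namedtuple("Exon", "label type strand begin end length prob score")


def parse_exon_line(line):
    """Parse one 13-column exon row (Gn.Ex ... P Tscr) into an Exon record."""
    f = line.split()
    if len(f) != 13:
        raise ValueError("expected 13 fields, got %d" % len(f))
    return Exon(f[0], f[1], f[2], int(f[3]), int(f[4]), int(f[5]),
                float(f[11]), float(f[12]))


def length_consistent(exon):
    """True if the reported length matches the begin/end coordinates."""
    return abs(exon.end - exon.begin) + 1 == exon.length


row = "11.01 Intr - 135637 135458 180 2 0 122 -15 111 0.288 4.04"
exon = parse_exon_line(row)
print(exon.strand, exon.length, exon.prob)  # - 180 0.288
print(length_consistent(exon))              # True
```

On the minus strand the begin coordinate is larger than the end coordinate, which is why the length check uses the absolute difference.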
Introduction:
Protein secondary structure includes the regular polypeptide folding patterns such as helices,
sheets, and turns. The backbone or main chain of a protein refers to the atoms that participate in
peptide bonds, ignoring the side chains of the amino acids. The conformation of the backbone can
therefore be described by the torsion angles (also called dihedral angles or rotation angles)
phi (φ) and psi (ψ) of each residue. The helix structure looks like a spring. The most
common shape is a right-handed α-helix, defined by a repeat of 3.6 amino acid residues per turn
and a rise of 5.4 Å per turn.
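The two helix parameters quoted above fix the geometry of an ideal helix, and the derived quantities follow by simple arithmetic. A short sketch (the constant and function names are chosen for this example only):

```python
RESIDUES_PER_TURN = 3.6
PITCH_ANGSTROM = 5.4   # advance along the helix axis per full turn


def rise_per_residue():
    """Axial advance per residue, in angstroms."""
    return PITCH_ANGSTROM / RESIDUES_PER_TURN


def rotation_per_residue():
    """Rotation about the helix axis per residue, in degrees."""
    return 360.0 / RESIDUES_PER_TURN


def helix_length(n_residues):
    """Approximate axial length (angstroms) of an ideal helix of n residues."""
    return n_residues * rise_per_residue()


print(round(rise_per_residue(), 2))      # 1.5
print(round(rotation_per_residue(), 1))  # 100.0
print(round(helix_length(18), 1))        # 27.0
```

So each residue advances the helix by about 1.5 Å and turns it by about 100 degrees; an 18-residue helix (five full turns) is roughly 27 Å long.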
Secondary structure in proteins consists of local inter-residue interactions, which may or may
not be mediated by hydrogen bonds. The most common secondary structures are alpha helices and beta
sheets. Other helices, such as the 3₁₀ helix and π helix, are calculated to have energetically favorable
hydrogen-bonding patterns but are rarely if ever observed in natural proteins except at the ends
of α helices due to unfavorable backbone packing in the center of the helix. Other extended
structures such as the polyproline helix and alpha sheet are rare in native state proteins but are
often hypothesized as important protein folding intermediates. Tight turns and loose, flexible
loops link the more "regular" secondary structure elements. The random coil is not a true
secondary structure, but is the class of conformations that indicate an absence of regular
secondary structure.
Amino acids vary in their ability to form the various secondary structure elements. Proline and
glycine are sometimes known as "helix breakers" because they disrupt the regularity of the α
helical backbone conformation; however, both have unusual conformational abilities and are
commonly found in turns. Amino acids that prefer to adopt helical conformations in proteins
include methionine, alanine, leucine, glutamate and lysine ("MALEK" in amino-acid 1-letter
codes); by contrast, the large aromatic residues (tryptophan, tyrosine and phenylalanine) and Cβ-
branched amino acids (isoleucine, valine, and threonine) prefer to adopt β-strand conformations.
However, these preferences are not strong enough to produce a reliable method of predicting
secondary structure from sequence alone.
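The residue preferences listed above can be tallied for any sequence. The sketch below only counts the fraction of residues in each group; it is not a secondary structure predictor (as the text notes, these preferences are too weak for that), and the set and function names are assumptions made for the example.

```python
HELIX_FAVOURING = set("MALEK")    # Met, Ala, Leu, Glu, Lys
STRAND_FAVOURING = set("WYFIVT")  # Trp, Tyr, Phe, Ile, Val, Thr


def propensity_fractions(seq):
    """Return (helix-favouring fraction, strand-favouring fraction) of a sequence."""
    seq = seq.upper()
    n = len(seq)
    helix = sum(r in HELIX_FAVOURING for r in seq) / n
    strand = sum(r in STRAND_FAVOURING for r in seq) / n
    return helix, strand


print(propensity_fractions("MALEKWYFIV"))  # (0.5, 0.5)
print(propensity_fractions("GGPP"))        # (0.0, 0.0)
```

A sequence rich in the first group is only more likely, not certain, to be helical, so the result is best read alongside a proper prediction.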
There are several methods for defining protein secondary structure (e.g. DEFINE, DSSP,
STRIDE).
1. Compare the secondary structures of the following sequences and comment on the
result.
>1
MGLSDGEWQLVLNVWGKVEADIPGHGQEVLIRLFKGHPETLEKFDKFKHLKSEDEMKA
SEDLKKHGATVLTALGGILKKKGHHEAEIKPLAQSHATKHKIPVKYLEFISECIIQVLQSK
HPGDFGADAQGAMNKALELFRKDMASNYKELGFQG
>2
MDPKQTTLLCLVLCLGQRIQAQEGDFPMPFISAKSSPVIPLDGSVKIQCQAIREAYLTQL
MIIKNSTYREIGRRLKFWNETDPEFVIDHMDANKAGRYQCQYRIGHYRFRYSDTLELVV
TGLYGKPFLSADRGLVLMPGENISLTCSSAHIPFDRFSLAKEGELSLPQHQSGEHPANFSL
GPVDLNVSGIYRCYGWYNRSPYLWSFPSNALELVVTDSIHQDYTTQNLIRMAVAGLVL
VALLAILVENWHSHTALNKEASADVAEPSWSQQMCQPGLTFARTPSVCK
Methods:
1. Take the sequence from UniProt, or copy the sequence if it is already given.
2. Go to http://www.compbio.dundee.ac.uk/www-jpred/
3. Paste the sequence and click on Make Prediction.
4. Wait for the software to predict the structure.
5. Once the job is done, save the output.
Results:
Sequence 1:
Jnet : ----HHHHHHHHHHHHHH---HHHHHHHHHHHHHHH-HHHHHHH----------
-----HHHHHHHHHHHHHHHHHHH----HHHHHHHHHHHHHH--------
HHHHHHHHHHHHHHH------HHHHHHHHHHHHHHHHHHHHHHHH----- : Jnet
jhmm : ----HHHHHHHHHHHHHHH--HHHHHHHHHHHHHHHHHHHHHHHH---
-----------HHHHHHHHHHHHHHHHHHH----HHHHHHHHHHHHHH--------
HHHHHHHHHHHHHHH------HHHHHHHHHHHHHHHHHHHHHHHH----- : jhmm
jpssm : ----HHHHHHHHHHHHH----HHHHHHHHHHHHHHH--HHHHH-------------
---HHHHHHHHHHHHHHHHHHHH---HHHHHHHHHHHHHHH------
HHHHHHHHHHHHHHHH------HHHHHHHHHHHHHHHHHHHHHHHHH---- : jpssm
Lupas 14 : ---------------------------------------------------------------------------------------------
-------------------------------------------------------------
Lupas 21 : ---------------------------------------------------------------------------------------------
-------------------------------------------------------------
Lupas 28 : ---------------------------------------------------------------------------------------------
-------------------------------------------------------------
Jnet_25 : --B---BB-BB--BB--B---B--BBBBBBBBBB--BB-B--BB--B-----------B--BB-
BB-BBB-BBB-BB--B--B--BB--BB--BB--B-B---BB--BB-BBBBBBB-BB---B---BB-BB--BB-
BBB-BBB--B----B--
Jnet_5 : ----------B--BB--B-------B--BB--BB------------B--------------B------BB-BB--
BB---------B--BB----------------B---BB-BB-----------B--BB--BB--BB-BB---B-------
Jnet_0 : ----------B-----------------B------------------------------------------BB---B------------
B---------------------B---B--------------B---B---B---------
Jnet Rel :
998468999999998620086689999999999884004466400577776533577753489999999999999987
0687589999999998750147876088999999999987447887468999999999999999999998604899
Inference:
The positions of the proline residues are identical in the template and query sequences. Histidine
is highly conserved.
Less than 50% of the residues are predicted as coil.
Sequence 2:
Jnet : --HHHHHHHHHHHHH----EEEE-----EEEEEE---------EEEEEEEE-----
EEEEEEE-----------EEE------EEEEEE------EEEEEEEEE---------EEEEEEE------EEEE----EEE---
-EEEEEEE-----EEEEEEE---------------EEEEE--------EEEEEEEEE-----EEE----EEEEEEE---------
------EEE----EEEEEEE------EEEEEE------------------------------ : Jnet
jhmm : --HHHHHHHHHHHH-----EEEE-----EEEEE----EEE---EEEEEEEE-----
EEEEEEE----E-----EEEEE-----EEEEEEEEE---EEEEEEEEE---------EEEEEEE------EEE-----
EEE---EEEEEEEE-----EEEEEEE----------------EEEE--------EEEEEEEEE-----EEE----
EEEEEEE-------E-------EEE----EEEEEEE------EEEEEE--------------------EE-------- : jhmm
jpssm : --HHHHHHHHHHHHH----EEE-------EEEEE---------EEEEEEE------
EEEEEEE-------------------EEEE----------EEEEEEE-----------EEEEEE-------EEE-----------
EEEEEE-------EEEEEE--------------EEEEEE---------EEEEEEEE----EEE------EEEEEE------------
----------EEEEEE-------EEEE-------------------------------- : jpssm
Lupas 14 : ---------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------- : Lupas 14
Lupas 21 : ---------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------- : Lupas 21
Lupas 28 : ---------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------- : Lupas 28
Jnet_25 : ---BBBBBBBBBBBBBB-B-BB---B--BBBBB-----B-----BBBBB-B---B--
BBBBB--------------------B-BBB--B---B-BBBBB-B-B--B-B--BB-B-BBB--------B-B-----B-----
BBBBB-B---B-BBBBBBB---------------B-B-B--B-----BBBBBBBBB--B-B-BB--B--BBB-B------
B-BB-BBB-BBB----BBBBBBBBB-BBB-BBBBB--B------B---------B-BBB-B--BB- : Jnet_25
Jnet_5 : -----B--BB--B---------------B-B-------------B-B-B--------B-B-----------------------
-B-B----------B-B-B------------B-B-B----------------------B-B-B---------B-B-------------------B-B-----
-----B-B-B---------------B-B-B-------------------------B-B---------B-B------------------------B------- :
Jnet_5
Jnet_0 : --------B-----------------------------------B-----------------------------------------------
-------B-B--------------B-B--------------------------B----------------------------------------------B----------
-----------B------------------------------------------------------------------------ : Jnet_0
Jnet Rel :
975899999999880056211106788705887036700077506888850788846888861577705356700000
777506640000006770588986007777777870588885378876056026750007870588885078887058
888436777767777875008884366467770488998616774065068880688884367777707777765000
77724676630788873576400246665667777777777770077777888 : Jnet Rel
Inference:
Less than 50% of the residues are predicted as strand.
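Statements such as the percentage of coil or strand can be checked by counting the states in a Jnet prediction string. A minimal sketch, assuming the three-state alphabet seen in the output above (H for helix, E for strand, - for coil); the function name is made up for this example and the string used is a short illustrative one, not the full prediction.

```python
def structure_composition(prediction):
    """Return the percentage of helix (H), strand (E) and coil (-) states."""
    states = [c for c in prediction if c in "HE-"]   # ignore spaces and labels
    n = len(states)
    return {s: 100.0 * states.count(s) / n for s in "HE-"}


comp = structure_composition("--HHHH--EE")
print(comp["H"], comp["E"], comp["-"])  # 40.0 20.0 40.0
```

To apply it to a real result, join the wrapped lines of one prediction row (for example the Jnet row) into a single string before counting.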
Results :
O13837: 4-aminobutyrate aminotransferase
>gi|6016100|sp|O13837.1|GABAT_SCHPO RecName: Full=4-aminobutyrate aminotransferase;
AltName: Full=GABA aminotransferase; Short=GABA-AT; AltName: Full=Gamma-amino-N-
butyrate transaminase; Short=GABA transaminase
MSSTATVTESTHFFPNEPQGPSIKTETIPGPKGKAAAEEMSKYHDISAVKFPVDYEKSIGN
YLVDLDGNVLLDVYSQIATIPIGYNNPTLLKAAKSDEVATILMNRPALGNYPPKEWARV
AYEGAIKYAPKGQKYVYFQMSGSDANEIAYKLAMLHHFNNKPRPTGDYTAEENESCLN
NAAPGSPEVAVLSFRHSFHGRLFGSLSTTRSKPVHKLGMPAFPWPQADFPALKYPLEEH
VEENAKEEQRCIDQVEQILTNHHCPVVACIIEPIQSEGGDNHASPDFFHKLQATLKKHDV
KFIVDEVQTGVGSTGTLWAHEQWNLPYPPDMVTFSKKFQAAGIFYHDLALRPHAYQHF
NTWMGDPFRAVQSRYILQEIQDKDLLNNVKSVGDFLYAGLEELARKHPGKINNLRGKG
KGTFIAWDCESPAARDKFCADMRINGVNIGGCGVAAIRLRPMLVFQKHHAQILLKKIDE
LI
Structure Prediction:
Jnet : --------------------------------HHHHHHHHHHH---------EEEEE---EEEEE----
EEEHHH--HHHHH-----HHHHHHHHHHHHH--------------HHHHHHHHHHHHHH------EEEE-
----HHHHHHHHHHHHHHHH----------HHHHHHHHH---------EEEEEE----------------------------
---E------------------HHHHHHHHHHHHHHH-----EEEEEE------------HHHHHHHHHHHHH---
EEEEE-------------------------HHHHHHHHH---HHHHHHH----------------HHHHHHHHHHHHH-
---HHHHHHHHHHHHHHHHHHHHH-------------EEEEEEEE---HHHHHHHHHHHHH--
EEEEE----EEEEE------HHHHHHHHHHHHH-- : Jnet
jhmm : --------------------------------HHHHHHHHHH----------EEEEE---EEEEE----
HHHHHH--HHHHH-----HHHHHHHHHHHH--------------HHHHHHHHHHHHHHH------
EEEE-----HHHHHHHHHHHHHHHH----------HHHHHHHH----------EEEEEE-------HHHH------
-------------EE--------------------HHHHHHHHHHHHH-----EEEEEE------------
HHHHHHHHHHHHH---EEEEEHHHH---------------------EEEEE------HHHHHHH----------------
HHHHHHHHHHHHHHH---HHHHHHHHHHHHHHHHHHHHH------------EEEEEEEEE-----
HHHHHHHHHHH--EEEEE----EEEEE------HHHHHHHHHHHHH-- : jhmm
jpssm : -------------------------------HHHHHHHHHHHH---------EEEEE---EEEEE----
EEEEE---HHHHH------HHHHHHHHHHHH---------------HHHHHHHHHHHHH------EEEEE---
-HHHHHHHHHHHHHHHH----------HHHHHHHHH---------EEEEEE-----------E-------------------
-EE--------------HHHHHHHHHHHHHHHHH-----EEEEEEE-----------HHHHHHHHHHHHH---
EEEEE-------------------------HHHHHHHHHH----HHHHHH-----------------HHHHHHHHHHH---
-HHHHHHHHHHHHHHHHHHHHH----EEE-------EEEEEEE---HHHHHHHHHHHHH---
EEEE----EEEE-------HHHHHHHHHHHHH-- : jpssm
Lupas 14 : ---------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------
------------------------------ : Lupas 14
Lupas 21 : ---------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------
------------------------------ : Lupas 21
Lupas 28 : ---------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------
------------------------------ : Lupas 28
Jnet_25 : ----B-------BBB-----B-B-B--B-B--B--BB--B--BBBBBB--BBBBB--B--
BBBBB-----BBBBBBBBBBBBBBBBB--BB-BBB-BB--BBBBBBBBBBBBBBB-BB--BB-
BBB-BB---B-BBBBBBBBBBBBBBBBBBBBBBBB---------B--BBBBBBB-BB-----B-
BBBBBBBBBBBBBBBBBBBBB---B---B-BBB-BBBBBBBBBBBB----B--BB-BB--BB--B--
BB------BBBBBBBBBBBBBBBBBB---BB--BB-BB--B-
BBBBBBBBBBBBBBBBBBBBBBBB-B-BBBBBBBBBBBBBBBBBBBBBBBB-B--
BBBBBBBBBBBBBBBBBBBBB-BB----BB--B--BB--BB--B--BB--B--BB-
BBBBBBBBBBBBBBB----BB--BB-BBB-BBBBBBBBB--BBBBBBBBBB---BB--BB-BB---- :
Jnet_25
Jnet_5 : --------------------------------B--B-----------B----BBB--B--B-B-------
BBBBBBBBBBBBBB-----B---B---B---B-BBBBBB-------BB--BB--B----------BBBB-BBB-
BBBBBB-BB-----------------------B-----------BBBB-------BBBBBBBB------------B--B--B-BB--B--
----------B--B---B--B-------BBBBBBBBBBB--B-------BB--B---B----BBBBBB-B-BBB-BBB--
BBBB-------B-BBBBB--BBBBBBBBBBBBB----B-B-BB-BB-BBBBBBB-BBB--B--------B-----
-B---B--B-------B----B-B-BBBBBB-------B--BB--B---BBBB------BBBBBBBBBB----B--BB--B--
-- : Jnet_5
Jnet_0 : --------------------------------------------------------------B--------BBBB----B--------
----------------B-------------------B------------B------B---BB-BB-----------------------------------BB-----
------BB------------------------------------------------------------BBBBB-B-----------------B---B------
BBB--B--B---BB---B-----------B-B-B-------B-----------------------B-BBB--B------------B------B-----
------------------B---------------B---B--------------B-B----------B--B-------- : Jnet_0
Jnet Rel :
998654677777776777777642577877507899999874036787887458985584689954884000000123
334148887038999999987036656776666770068999999998744788835776067756899999999998
610467787776517888887026787787258886057777770000036777777777777777700001577777
777777500009999999999861588854888860057777777636899999999874586478850000777777
776542246777775000000000050004544000467645677778700016789999873088668999999999
999999987437840002688800588988506500999999988725706872588637750578764689999999
998607 : Jnet Rel
Inference:
P: blue
H: red
C: highly conserved.
Structure: Consists of both alpha helices and extended beta sheets
3. Find the secondary structure of the given sequence and compare with the output of 2.
Method:
1. Run BLASTX to determine the protein.
2. Predict the secondary structure.
3. Go to http://www.compbio.dundee.ac.uk/www-jpred/
4. Paste the sequence and click on Make Prediction.
5. Wait for the software to predict the structure.
6. Once the job is done, save the output.
>
ATGTCTTCTACTGCCACCGTTACTGAAAGCACTCATTTTTTTCCCAATGAGCCTCAAGGCCCTAGCATTA
AGACCGAAACTATTCCCGGTCCCAAAGGTAAGGCCGCTGCTGAAGAAATGTCCAAATACCACGACATCAG
CGCTGTCAAGTTTCCTGTAGACTATGAAAAGTCCATTGGTAACTATCTTGTCGACTTGGATGGTAACGTT
CTCTTGGATGTTTACTCTCAAATCGCTACTATCCCCATTGGCTACAACAATCCTACTCTCCTCAAGGCTG
CCAAGTCGGACGAAGTCGCTACCATTTTAATGAACCGTCCTGCTTTGGGAAATTACCCTCCTAAGGAATG
GGCTCGTGTCGCTTATGAGGGTGCCATCAAATATGCCCCCAAGGGTCAAAAGTATGTTTACTTTCAAATG
AGTGGAAGTGATGCCAACGAGATTGCTTACAAGCTTGCTATGCTTCATCATTTCAACAACAAGCCTAGAC
CTACTGGTGATTACACTGCTGAAGAGAACGAGAGCTGCTTAAACAACGCTGCTCCTGGATCTCCCGAAGT
TGCTGTTCTCTCTTTCCGTCACTCTTTCCACGGACGTCTCTTTGGTTCTCTTTCCACTACTCGCTCCAAG
CCTGTTCACAAGCTTGGTATGCCTGCTTTCCCATGGCCTCAAGCTGATTTCCCTGCTTTGAAGTATCCTT
TGGAAGAGCACGTCGAAGAGAATGCAAAGGAGGAGCAACGCTGCATTGACCAGGTCGAGCAAATTTTAAC
TAACCACCATTGCCCTGTCGTTGCCTGTATCATTGAGCCCATTCAATCTGAGGGTGGTGACAACCATGCC
TCTCCTGACTTTTTCCACAAGCTTCAAGCTACTTTGAAGAAGCATGATGTCAAGTTTATCGTCGATGAAG
TCCAAACTGGTGTCGGCTCTACCGGTACTTTATGGGCTCACGAGCAATGGAATTTACCCTATCCTCCTGA
CATGGTTACCTTTTCCAAGAAATTCCAGGCTGCCGGTATTTTCTATCATGATTTGGCTCTTCGTCCTCAT
GCTTATCAGCACTTCAATACTTGGATGGGTGACCCATTCCGTGCTGTTCAATCTAGATATATTCTTCAAG
AAATTCAAGACAAGGATCTCCTTAATAACGTCAAGTCTGTTGGCGATTTCTTGTATGCTGGACTTGAAGA
GCTTGCTCGTAAGCACCCTGGCAAAATCAACAACCTCCGCGGTAAGGGAAAGGGTACTTTTATCGCTTGG
GATTGTGAGTCTCCTGCAGCCCGTGACAAATTCTGTGCTGACATGAGAATTAATGGTGTCAACATTGGTG
GCTGTGGTGTAGCTGCTATTCGTCTTCGTCCTATGCTTGTATTCCAAAAGCACCATGCTCAAATCCTTCT
CAAGAAGATTGACGAATTGATTTA
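BLASTX compares the translated nucleotide query against protein sequences, so it helps to see what the translation step looks like. The following sketch translates a DNA string in one forward reading frame using the standard genetic code; it is an illustration only (BLASTX itself considers all six frames), and the names `CODON_TABLE` and `translate` are chosen for this example.

```python
BASES = "TCAG"
# Standard genetic code, listed in TCAG order for first, second and third base.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna, frame=0):
    """Translate dna in the given forward frame (0, 1 or 2); '*' marks a stop."""
    dna = dna.upper()
    protein = []
    for pos in range(frame, len(dna) - 2, 3):
        # Codons containing anything other than A, C, G, T are reported as X.
        protein.append(CODON_TABLE.get(dna[pos:pos + 3], "X"))
    return "".join(protein)


# First 30 bases of the query sequence given above.
print(translate("ATGTCTTCTACTGCCACCGTTACTGAAAGC"))  # MSSTATVTES
```

The translated start, MSSTATVTES, agrees with the beginning of the O13837 protein sequence listed in the previous exercise.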
Results:
Inference:
Schizosaccharomyces pombe chromosome I, complete replicon
Length=5579133
Aim: To determine the 3D structure of human GABA transaminase using homology modeling.
Introduction:
The tertiary structure of a protein or any other macromolecule is its three-dimensional structure,
as defined by the atomic coordinates. Tertiary structure is considered to be largely determined by
the protein's primary structure - the sequence of amino acids of which it is composed. Efforts to
predict tertiary structure from the primary structure are known generally as protein structure
prediction. However, the environment in which a protein is synthesized and allowed to fold is a
significant determinant of its final shape and is usually not directly taken into account by
current prediction methods. Most such methods do rely on comparisons between the sequence to
be predicted and sequences of known structure in the Protein Data Bank and thus account for
environment indirectly, assuming the target and template sequences share similar cellular
contexts. In globular proteins, tertiary interactions are frequently stabilized by the sequestration
of hydrophobic amino acid residues in the protein core, from which water is excluded, and by the
consequent enrichment of charged or hydrophilic residues on the protein's water-exposed
surface. In secreted proteins that do not spend time in the cytoplasm, disulfide bonds between
cysteine residues help to maintain the protein's tertiary structure. A variety of common and stable
tertiary structures appear in a large number of proteins that are unrelated in both function and
evolution - for example, many proteins are shaped like a TIM barrel, named for the enzyme
triosephosphate isomerase. Another common structure is a highly stable dimeric coiled coil
structure composed of 2-7 alpha helices. Proteins are classified by the folds they represent in
databases like SCOP and CATH.
biological applications. Typically, the computational effort for a modeling project is less than 2
h. However, this does not include the time required for visualization and interpretation of the
model, which may vary depending on personal experience working with protein structures.
Swiss-PdbViewer and SWISS-MODEL are used as the homology modeling software and
workspace.
Swiss-PdbViewer provides a user-friendly interface that allows several proteins to be analyzed at the
same time.
1. Superimposition - structural alignments and compare their active sites or any other relevant
parts
10. POV-Ray scenes can be generated for stunning ray-traced quality images
SWISS-MODEL
Protein sequence and structure databases necessary for modelling are accessible from the
workspace and are updated in regular intervals. Software tools for template selection, model
building, and structure quality evaluation can be invoked from within the workspace. A personal
working environment (workspace), where several modelling projects can be carried out in
parallel, is provided for each user.
Methods:
2. Select chain A in the Control Panel, then click the Build option in the menu bar, select the
inverse selection and click on Remove Selected Residues.
4. Open the empty window again and click SwissModel to load the raw sequence.
5. Open the PDB file through the import structures option in the "File" menu.
6. Click Magic Fit and then Iterative Magic Fit from the Fit option in the menu bar.
7. Open the alignment window from the Wind menu and select the residues which are not aligned.
8. Delete the residues which are not aligned using the Build option in the menu bar, click
Remove Residues and save the result.
9. Now submit this to the swiss modelling request for the raw
10. Download the modelled protein and open it in the Swiss viewer.
12. Open the sequence-structure aligned protein (step 8) and the energy-minimized protein in the viewer
and click Improve Fit.
15. Use Protein Structure & Model Assessment Tools for analyzing the protein.
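Keeping only chain A of the template, which the steps above do interactively in the viewer, can also be done on the PDB file itself. The sketch below is a rough alternative under stated assumptions: it relies on the fixed-column PDB format, where the chain identifier of ATOM, HETATM and TER records is in column 22, and the function name `keep_chain` is invented for this example.

```python
def keep_chain(pdb_lines, chain_id="A"):
    """Keep coordinate records of one chain; pass all other record types through."""
    kept = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM", "TER")):
            # Column 22 of the record (string index 21) holds the chain identifier.
            if len(line) > 21 and line[21] == chain_id:
                kept.append(line)
        else:
            kept.append(line)
    return kept


def write_chain(in_path, out_path, chain_id="A"):
    """Read a PDB file and write a copy containing only the requested chain."""
    with open(in_path) as handle:
        lines = handle.read().splitlines()
    with open(out_path, "w") as handle:
        handle.write("\n".join(keep_chain(lines, chain_id)) + "\n")
```

For example, `write_chain("1OHV.pdb", "1OHVA.pdb")` would produce a chain-A-only file, assuming a local copy of the template is available under that (hypothetical) file name.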
MASMLLAQRLACSFQHSYRLLVPGSRHISQAAAKVDVEFDYDGPLMKTEVPGPRSQEL
MKQLNIIQNAEAVHFFCNYEESRGNYLVDVDGNRMLDLYSQISSVPIGYSHPALLKLIQQ
PQNASMFVNRPALGILPPENFVEKLRQSLLSVAPKGMSQLITMACGSCSNENALKTIFM
WYRSKERGQRGFSQEELETCMINQAPGCPDYSILSFMGAFHGRTMGCLATTHSKAIHKI
DIPSFDWPIAPFPRLKYPLEEFVKENQQEEARCLEEVEDLIVKYRKKKKTVAGIIVEPIQSE
GGDNHASDDFFRKLRDIARKHGCAFLVDEVQTGGGCTGKFWAHEHWGLDDPADVMT
FSKKMMTGGFFHKEEFRPNAPYRIFNTWLGDPSKNLLLAEVINIIKREDLLNNAAHAGK
ALLTGLLDLQARYPQFISRVRGRGTFCSFDTPDDSIRNKLILIARNKGVVLGGCGDKSIRF
RPTLVFRDHHAHLFLNIFSDILADFK
Gabat.txt and 1OHVA.pdb: modelled structure in Swiss-PdbViewer and SWISS-MODEL
RMSD: 0.07 Å
The QMEAN score (-1.129) and the PROCHECK result (Ramachandran plot: 99.5% of residues in allowed
regions) were within acceptable ranges, supporting the quality and stability of the modelled structure.
Introduction:
Basic Modeling. Model a sequence with high identity to a template. This exercise introduces the
use of MODELLER in a simple case where the template selection and target-template
alignments are not a problem.
Advanced Modeling. Model a sequence based on multiple templates and bound to a ligand. This
exercise introduces the use of multiple templates, ligands and loop refinement in the process of
model building with MODELLER.
Iterative Modeling. Increase the accuracy of the modeling exercise by iterating the 4 step
process. This exercise introduces the concept of MOULDING to improve the accuracy of
comparative models.
Difficult Modeling. Model a sequence based on a low identity to a template. This exercise uses
resources external to MODELLER in order to select a template for a difficult case of protein
structure prediction.
Modeling with cryo-EM. Model a sequence using both template and cryo-EM data. This
exercise assesses the quality of generated models and loops by rigid fitting into cryo-EM maps,
and improves them with flexible EM fitting.
Method:
1. Take the query sequence whose structure is to be modelled (e.g. gabat) in PIR format.
2. Save the file with the .ali extension in the bin folder of MODELLER.
3. Open the build_profile.py file. Change the filename in the append call to that of the query
sequence (gabat.ali).
4. Open the command line by clicking the 'Modeller' link in the Windows Start Menu.
5. Run build_profile.py. This searches for potentially related sequences of known
structure. Two files are created: build_profile_gabat.ali and
build_profile_gabat.prf.
6. Open the build_profile.prf file and select the sequences that have an e-value of 0.0.
7. Download the structures of the selected proteins from the PDB and save them in the bin folder of
MODELLER.
8. Open the compare.py file. Enter the names of the selected proteins.
9. Run compare.py to compare the selected structures.
10. Choose the structure with high resolution and moderate identity as the template.
11. Align the query sequence with the template using the align2d command.
12. Two output files are created: a .pap file and an .ali file.
13. Open the model_single.py file and use the .ali file created above. Run model_single.py
from the command line.
14. Five candidate models are generated. Select the model with the lowest DOPE score.
15. Run evaluate_model.py to evaluate the selected model. Note the DOPE score.
16. Run evaluate_template.py to evaluate the template. Note the DOPE score.
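Steps 1 and 2 require the query sequence as a PIR entry in an .ali file. The sketch below shows one way to produce such an entry in plain Python, without calling MODELLER. The helper name to_pir and the 70-character line width are assumptions made for illustration; the header layout follows the gabat entry shown later in this exercise.

```python
def to_pir(code, sequence, width=70):
    """Return a PIR entry for a sequence of unknown structure.

    'code' is the alignment code (e.g. "gabat"); 'sequence' is the
    one-letter amino acid sequence. Whitespace in the sequence is ignored.
    """
    residues = "".join(sequence.split()).upper() + "*"  # '*' ends the entry
    header = f">P1;{code}"
    # Ten colon-separated fields; only the type and the code are filled in.
    fields = f"sequence:{code}:::::::0.00: 0.00"
    lines = [residues[i:i + width] for i in range(0, len(residues), width)]
    return "\n".join([header, fields] + lines) + "\n"


# Illustration with the first residues of the gabat query sequence
entry = to_pir("gabat", "MASMLLAQRLACSFQHSYRLLVPGSRHISQAAAKVDVEF")
print(entry)
```

The returned text can then be written to gabat.ali with an ordinary file write before build_profile.py is run.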
>P1;gabat
sequence:gabat: 0: : 0: :::-1.00:-1.00
MASMLLAQRLACSFQHSYRLLVPGSRHISQAAAKVDVEFDYDGPLMKTEVPGPRSQELMKQLNIIQNAEA
VHFFC
NYEESRGNYLVDVDGNRMLDLYSQISSVPIGYSHPALLKLIQQPQNASMFVNRPALGILPPENFVEKLRQ
SLLSV
APKGMSQLITMACGSCSNENALKTIFMWYRSKERGQRGFSQEELETCMINQAPGCPDYSILSFMGAFHGR
TMGCL
ATTHSKAIHKIDIPSFDWPIAPFPRLKYPLEEFVKENQQEEARCLEEVEDLIVKYRKKKKTVAGIIVEPI
QSEGG
DNHASDDFFRKLRDIARKHGCAFLVDEVQTGGGCTGKFWAHEHWGLDDPADVMTFSKKMMTGGFFHKEEF
RPNAP
YRIFNTWLGDPSKNLLLAEVINIIKREDLLNNAAHAGKALLTGLLDLQARYPQFISRVRGRGTFCSFDTP
DDSIR
NKLILIARNKGVVLGGCGDKSIRFRPTLVFRDHHAHLFLNIFSDILADFK*
>P1;2oatA
structure:2oatA: 28: : 404: :::-1.00:-1.00
----------------------------------------------------------------------
-----
--ERGKGIYLWDVEGRKYFDFLSSYSAVNQGHCHPKIVNALKSQVDKLTLTSRAVLG--EYEEYITKL--
-----
--FNYHKVLPMNTGVEAGETACKLARKW---------GYTVKGIQKYKA---------
KIVFAAGNFWGRTLSAI
SSS-------TDPTSYD-GFGPF----MPGFDIIPYND------LPALERAL-----
QDPNVAAFMVEPIQGEAG
VVVPDPGYLMGVRELCTRHQVLFIADEIQTGLARTGRWLAVDYENV--RPDIVLLG-
KALSGGLYDDDIMLTIKP
GEHGSTYGGNPLGCRVAIAALEVLEEENLAENADKLGIILRNELMKLPS---
DVVTAVRGKGLLNAIVIKEDWDA
WKVCLRLRDNGLLAKPTHGDIIRFAPPLVIKEDELRESIEIINKTILSF-*
>P1;1d7uA
structure:1d7uA: 28: : 427: :::-1.00:-1.00
----------------------------------------------------------------------
-----
--ERAKGSFVYDADGRAILDFTSGQMSAVLGHCHPEIVSVIGEYAGKSGMLSRP----------
VVDLATRLANI
TPPGLDRALLLSTGAESNEAAIR------------------------MAKLVTG--
KYEIVGFAQSWHGMTGAAA
SATYSKGVGPAAVGSFAIP-APFPR-------FERNGAYDYLAELDYAFDLI--
DRQSSGNLAAFIAEPILSSGG
IIELPDGYMAALKRKCEARGMLLILDEAQTGVGRTGTMFACQRDGV--
TPDILTLSKTLGAGTSAAIEERAHELG
YLFYTTHVSDPLPAAVGLRVLDVVQRDGLVARANVMGDRLRRGLLDLMERF-
DCIGDVRGRGLLLGVEEPADGLG
AKITRECMNLGVQLPGMGG-VFRIAPPLTVSEDEIDLGLSLLGQAI----*
>P1;1s0aA
structure:1s0aA: 32: : 261: :::-1.00:-1.00
----------------------------------------------------------------------
-----
----AEGCELILSDGRRLVDGMSSWWAAIHGYNHPQLNAAMKSQIDAMSHVMFGGITHAP----
AIELCRKLVAM
TPQPLECVFLADSGSVAVEVAMKMALQYWQAKGEARQRF---------------------
LTFRNGYHGDTFGAM
SVCDDNSMHSL------WKFAPAPQSR--MGEWDERDMVGFAR-------LMAAHRHE---
IAAVIIEPIQGAGG
MRMYHPEWLKRIRKICDREGILLIADEIATGFGRTGKLFACEH---------------------------
-----
----------------------------------------------------------------------
-----
--------------------------------------------------*
>P1;2gsaA
structure:2gsaA: 38: : 338: :::-1.00:-1.00
----------------------------------------------------------------------
-----
-FDRVKDAYAWDVDGNRYIDYVGTWGPAICGHAHPEVIEALKVAMEKGTSFGAPC----
ALENLAEMVNDAVPSI
E---MVRFVNSGTEACM---AVLRLMRAYTGRDK-------------------------
IIKFEGCYHGHADMFL
VKAGS-GVATLGLPSS--PGVP-----------
KKTTANTLTTPYNDLEAVKALFAENPGEIAGVILEPIVGNSG
FIVPDAGFLEGLREITLEHDALLVFDEVMTGGGVQEKFGV--------
TPDLTTLGKGLPVGAYGGKREIAPAGP
MYQAGTLSGNPLAMTAGIKTLELLRQPGTYEYLDQITKRLSDGLL-------------------------
-----
--------------------------------------------------*
>P1;1ohvA
structure:1ohvA: 1: : 461: :::-1.00:-1.00
--------------------------------------
FDYDGPLMKTEVPGPRSRELMKQLNIIQNAEAVHFFC
NYEESRGNYLVDVDGNRMLDLYSQISSIPIGYSHPALVKLVQQPQNVSTFINRPALGILPPENFVEKLRE
SLLSV
APKGMSQLITMACGSCSNENAFKTIFMWYRSKERGQSAFSKEELETCMINQAPGCPDYSILSFMGAFHGR
TMGCL
ATTHSKAIHKIDIPSFDWPIAPFPRLKYPLEEFVKENQQEEARCLEEVEDLIVKYRKKKKTVAGIIVEPI
QSEGG
DNHASDDFFRKLRDISRKHGCAFLVDEVQTGGGSTGKFWAHEHWGLDDPADVMTFSKKMMTGGFFHKEEF
RPNAP
YRIFNTWLGDPSKNLLLAEVINIIKREDLLSNAAHAGKVLLTGLLDLQARYPQFISRVRGRGTFCSFDTP
DESIR
NKLISIARNKGVMLGGCGDKSIRFRPTLVFRDHHAHLFLNIFSDILADF-*
>P1;1sffA
structure:1sffA: 36: : 424: :::-1.00:-1.00
----------------------------------------------------------------------
-----
-----------DVEGREYLDFAGGIAVLNTGHLHPKVVAAVEAQLKK---
LSHTCFQVLAYEPYLELCEIMNQKV
PGDFAKKTLLVTTGSEAVENAVKI------ARAATKRS--------------------
GTIAFSGAYHGRTHYTL
ALT-----GKVNPYSAGMGLMPVYRALYPCP--LHGISEDDA--IASIH-
RIFKNDAAPEDIAAIVIEPVQGEGG
FYASSPAFMQRLRALCDEHGIMLIADEVQSGAGRTGTLFAMEQMGV--APDLTTFAKS-
IAGGFGRAEVMDAVAP
GGLGGTYAGNPIACVAALEVLKVFEQENLLQKANDLGQKLKDGLLAIAEKHPE-
IGDVRGLGAMIAIELFEDGDH
NKIVARARDKGLILLSCGPNVLRILVPLTIEDAQIRQGLEIISQCFDEAK*
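When choosing a template from aligned entries such as those above, the sequence identity to the query is one of the quantities considered. The following is a minimal sketch of a pairwise identity calculation over two gapped, equally long alignment rows. The function name is hypothetical, and normalizing by the number of columns without a gap is an assumption; other tools may normalize differently and report slightly different percentages.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over columns where both rows contain a residue.

    Both arguments are rows of the same alignment ('-' marks a gap);
    a trailing '*' terminator, as used in PIR files, is ignored.
    """
    a_row = aligned_a.rstrip("*")
    b_row = aligned_b.rstrip("*")
    if len(a_row) != len(b_row):
        raise ValueError("aligned rows must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(a_row, b_row):
        if a == "-" or b == "-":
            continue  # skip columns with a gap in either row
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Short hypothetical rows, for illustration only
print(percent_identity("FDYDGPLMK-TEV*", "FDYEGPLMKATEV*"))
```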
Align2d.ali
>P1;1ohvA
structureX:1ohv.pdb: 11 :A:+461 :A:MOL_ID 1; MOLECULE 4-
AMINOBUTYRATE AMINOTRANSFERASE; CHAIN A, B, C, D; FRAGMENT RESIDUES
29-500; SYNONYM GAMMA-AMINO-N-BUTYRATE TRANSAMINASE, GABA TRANSAMI
GABA AMINOTRANSFERASE, GABA-AT, GABA-T; EC 2.6.1.19:MOL_ID 1;
ORGANISM_SCIENTIFIC SUS SCROFA; ORGANISM_COMMON PIG; ORGANISM_TAXID
9823; ORGAN LIVER: 2.30:-1.00
--------------------------------------
FDYDGPLMKTEVPGPRSRELMKQLNIIQNAEAVHFFCNYEESRGNYLVDVDGNRMLDLYSQISSIPIGYS
HPALVKLVQQPQNVSTFINRPALGILPPENFVEKLRESLLSVAPKGMSQLITMACGSCSNENAFKTIFMW
YRSKERGQSAFSKEELETCMINQAPGCPDYSILSFMGAFHGRTMGCLATTHSKAIHKIDIPSFDWPIAPF
PRLKYPLEEFVKENQQEEARCLEEVEDLIVKYRKKKKTVAGIIVEPIQSEGGDNHASDDFFRKLRDISRK
HGCAFLVDEVQTGGGSTGKFWAHEHWGLDDPADVMTFSKKMMTGGFFHKEEFRPNAPYRIFNTWLGDPSK
NLLLAEVINIIKREDLLSNAAHAGKVLLTGLLDLQARYPQFISRVRGRGTFCSFDTPDESIRNKLISIAR
NKGVMLGGCGDKSIRFRPTLVFRDHHAHLFLNIFSDILADF-*
>P1;gabat
sequence:gabat: : : : ::: 0.00: 0.00
MASMLLAQRLACSFQHSYRLLVPGSRHISQAAAKVDVEFDYDGPLMKTEVPGPRSQELMKQLNIIQNAEA
VHFFCNYEESRGNYLVDVDGNRMLDLYSQISSVPIGYSHPALLKLIQQPQNASMFVNRPALGILPPENFV
EKLRQSLLSVAPKGMSQLITMACGSCSNENALKTIFMWYRSKERGQRGFSQEELETCMINQAPGCPDYSI
LSFMGAFHGRTMGCLATTHSKAIHKIDIPSFDWPIAPFPRLKYPLEEFVKENQQEEARCLEEVEDLIVKY
RKKKKTVAGIIVEPIQSEGGDNHASDDFFRKLRDIARKHGCAFLVDEVQTGGGCTGKFWAHEHWGLDDPA
DVMTFSKKMMTGGFFHKEEFRPNAPYRIFNTWLGDPSKNLLLAEVINIIKREDLLNNAAHAGKALLTGLL
DLQARYPQFISRVRGRGTFCSFDTPDDSIRNKLILIARNKGVVLGGCGDKSIRFRPTLVFRDHHAHLFLN
IFSDILADFK*
Evaluate_template.py
Evaluate_model.py
Aim:
To quantify the interaction of the ligand with the protein target using the Glide protocol of
the Schrödinger package.
Introduction:
Glide (Grid-based Ligand Docking with Energetics) searches for favorable
interactions between one or more ligand molecules and a receptor molecule, usually a protein.
Each ligand must be a single molecule, while the receptor may include more than one molecule,
e.g., a protein and a cofactor. Glide can be run in rigid or flexible docking modes; the latter
automatically generates conformations for each input ligand. The combination of position and
orientation of a ligand relative to the receptor, along with its conformation in flexible docking, is
referred to as a ligand pose.
The ligand poses that Glide generates pass through a series of hierarchical filters that
evaluate the ligand’s interaction with the receptor. The initial filters test the spatial fit of the
ligand to the defined active site, and examine the complementarity of ligand-receptor interactions
using a grid-based method patterned after the empirical ChemScore function. Poses that pass
these initial screens enter the final stage of the algorithm, which involves evaluation and
minimization of a grid approximation to the OPLS-AA nonbonded ligand-receptor interaction
energy.
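The idea of passing poses through successive filters can be illustrated with a toy funnel: a cheap score discards most candidates, and a more accurate, more expensive score is applied only to the survivors. The sketch below is not Glide's algorithm and does not use any Schrödinger API. The function names, the one-dimensional "poses" and both scoring functions are hypothetical and chosen only to show the control flow.

```python
def hierarchical_filter(poses, stages):
    """Apply scoring stages in order, keeping the best poses at each stage.

    'stages' is a list of (score_function, keep_count) pairs; lower scores
    are treated as better, as with an interaction energy.
    """
    survivors = list(poses)
    for score, keep in stages:
        survivors = sorted(survivors, key=score)[:keep]
    return survivors


def rough_score(x):
    """Cheap, coarse score (hypothetical): integer distance from the origin."""
    return int(abs(x))


def fine_score(x):
    """Costlier, finer score (hypothetical): squared distance from 1.0."""
    return (x - 1.0) ** 2


# Toy poses reduced to a single coordinate, for illustration only
poses = [-3.0, -1.0, 0.2, 0.9, 2.5, 4.0]
best = hierarchical_filter(poses, [(rough_score, 3), (fine_score, 1)])
print(best)
```

The design point is that the expensive score is evaluated on three poses rather than six; in a real docking run the reduction at each stage is far larger.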
Final scoring is then carried out on the energy-minimized poses. By default, Schrödinger’s
proprietary GlideScore multi-ligand scoring function is used to score the poses. If GlideScore
was selected as the scoring function, a composite Emodel score is then used to rank the poses of
each ligand and to select the poses to be reported to the user. Emodel combines GlideScore, the
nonbonded interaction energy, and, for flexible docking, the excess internal energy of the
generated ligand conformation. Glide uses a hierarchical series of filters to search for possible
locations of the ligand in the active-site region of the receptor. The shape and properties of the
receptor are represented on a grid by several different sets of fields that provide progressively
more accurate scoring of the ligand poses. Conformational flexibility is handled in Glide by an
Method:
Receptor
Define receptor:
Select ligand from workspace to exclude it from the grid
Site
Default setting
Settings
Ligands
Default settings
Output
View Results
Project → Show table
Result: