Slides 3

Uploaded by

Phlip Ong


So Many Choices, So Little Money:

systematic assignment of proteins to functional classes

With the human genome finished, it is unclear where to focus our efforts.

We have been here. We are heading here. Where next?

The three layers of genome annotation: where, what
and how?
If gene hunting is easy (which is not universally
accepted) then assignment of gene/protein function
is not!
A number of observations may make the job easier.

• Proteins contain a variety of different functional domains
• The evolution of genes can add different domains and result in the generation of novel functional units
• Proteins exist in families which are conserved amongst species
Protein Domains

• A domain is an independent structural unit which can be found alone or in conjunction with other domains or repeats.
• Module = mobile domain.
• Different domains have distinct functions.
• Many eukaryotic proteins have multiple domains.
Protein Domains

Figure: PX domain with ligand; SH3 domain with ligand
• Rapidly growing databases of protein
sequences due to genome sequencing
projects.
• Many new proteins belong to protein families with known functions (they show significant sequence similarity to known members).
• Only a small fraction of known proteins have
functions determined by experiment.
• Databases providing computational sequence
analysis allow us to classify new proteins to
known families, and thus (potentially)
determine their function.
There are multiple tools to allow such
analysis
SMART (Simple Modular Architecture
Research Tool)
• There are over 600 domain families.
• Provides information about:
– function.
– subcellular localization.
– phyletic distribution.
– tertiary structure.
• Based on HMMs (Hidden Markov Models).
Domain Architecture
Protein: PA-3427CG
Species: Drosophila melanogaster

Protein: ENSMUSP00000023109
Species: Mus musculus

Protein: ENSANGP00000009529
Species: Anopheles gambiae
PROSITE - database of protein families
and domains
• Database of biologically significant sites and patterns.
Contains 1,609 profiles.
• Pattern – conserved sequence of a few amino acids.
• Identifies to which known family of proteins (if any) the
new sequence belongs.
• Used to determine the function of uncharacterized proteins
translated from genomic or cDNA sequences.
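A pattern of this kind can be scanned against a new sequence by translating it into a regular expression. The sketch below is a minimal illustration, not PROSITE's own software: it assumes the standard pattern syntax (`x` for any residue, `[...]` for allowed residues, `{...}` for forbidden residues, `(n)` or `(n,m)` for repeats, `<` and `>` for terminal anchors) and uses the well-known N-glycosylation site pattern `N-{P}-[ST]-{P}` with a made-up test sequence. The function names are hypothetical.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # '<' pins the match to the N-terminus, '>' to the C-terminus.
        prefix = "^" if element.startswith("<") else ""
        suffix = "$" if element.endswith(">") else ""
        element = element.strip("<>")
        # Split off an optional repeat count such as (3) or (2,4).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":
            core = "."                        # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"    # forbidden residues
        # "[ABC]" and single letters are already valid regex syntax.
        repeat = ""
        if low is not None:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


def scan(sequence: str, pattern: str):
    """Return (1-based start, matched text) for every hit, overlaps included."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


# Illustrative input only: the sequence is invented for this example.
N_GLYCOSYLATION = "N-{P}-[ST]-{P}"
toy_sequence = "MKNGSAANPTQNVTL"
hits = scan(toy_sequence, N_GLYCOSYLATION)
```

Here `hits` lists the two sites in the toy sequence that satisfy the pattern; the `NPTQ` stretch is rejected because proline is forbidden in the second position. Short patterns like this one match by chance fairly often, so a hit is a hint rather than proof of function.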
The evolution of genes can add different domains and
result in the generation of novel functional units.
Direct comparisons of homologous sequences
between different species can aid in the understanding
of protein classes.

• One of the most powerful approaches to predicting the exact function of a protein is to find its characterised ortholog in a different species.
• An “ortholog” is a homologous sequence from a different species that arose from a common ancestral gene by speciation; it may or may not have a similar function.
• A “paralog” is a homologous sequence that diverged within a single species by gene duplication.
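One widely used heuristic for proposing ortholog pairs is the reciprocal best hit: protein a in species A and protein b in species B are paired when each is the other's top-scoring match. The slides do not say which method was used, so the sketch below is only an illustration of that heuristic. The protein identifiers and similarity scores are invented, and in practice the scores would come from a sequence search tool.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject.

    scores: dict of {(query, subject): similarity score}
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical scores: mouse proteins searched against human, and the reverse.
mouse_vs_human = {
    ("mA", "hA"): 950, ("mA", "hB"): 400,
    ("mB", "hB"): 700,
    ("mC", "hB"): 720,
}
human_vs_mouse = {
    ("hA", "mA"): 950,
    ("hB", "mC"): 720, ("hB", "mB"): 700,
}
pairs = reciprocal_best_hits(mouse_vs_human, human_vs_mouse)
```

In this toy data `mB` and `mC` both hit `hB` best, but `hB` prefers `mC`, so only `mC` is paired with it. `mB` is then a candidate paralog (or an "other homologue") rather than a 1:1 ortholog.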
For example most rodent genes have a
human counterpart

1:1 orthologues: ~80%
Other homologues: ~20%
Non-homologues: <1%
The number of proteins varies between species
The transcription factor families shown are the largest of their category among the 1,502 human protein families
In order to extract the maximum amount of information
from the rapidly accumulating genome sequences, all
conserved genes need to be classified according to their
homologous relationships.

• Comparison of proteins encoded in complete genomes allowed the delineation of clusters of orthologous groups (COGs).
• Each COG consists of individual orthologous proteins or orthologous
sets of paralogs from at least three lineages. Orthologs typically have the
same function, allowing transfer of functional information from one
member to an entire COG.
• This relation automatically yields a number of functional predictions for
poorly characterized genomes.
• The COGs comprise a framework for functional and evolutionary
genome analysis.
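The transfer of functional information within a COG can be sketched as follows. This is a simplified illustration rather than the COG database's own procedure: it assumes cluster membership is already known, keeps only clusters spanning at least three lineages (as stated above), and copies an annotation to uncharacterized members only when the characterized members agree. All identifiers, lineages and functions are invented.

```python
def transfer_annotations(cogs, lineage_of, known_function, min_lineages=3):
    """Predict functions for uncharacterized COG members.

    cogs: {cog_id: [protein ids]}
    lineage_of: {protein id: lineage name}
    known_function: {protein id: experimentally supported function}
    """
    predictions = {}
    for members in cogs.values():
        lineages = {lineage_of[protein] for protein in members}
        if len(lineages) < min_lineages:
            continue  # too few lineages to count as a COG
        functions = {known_function[p] for p in members if p in known_function}
        if len(functions) != 1:
            continue  # nothing to transfer, or conflicting annotations
        function = next(iter(functions))
        for protein in members:
            if protein not in known_function:
                predictions[protein] = function
    return predictions


# Hypothetical data for illustration.
toy_cogs = {
    "COG_A": ["eco_1", "bsu_1", "mja_1"],
    "COG_B": ["eco_2", "bsu_2"],
}
toy_lineages = {
    "eco_1": "Proteobacteria", "bsu_1": "Firmicutes", "mja_1": "Archaea",
    "eco_2": "Proteobacteria", "bsu_2": "Firmicutes",
}
toy_known = {"eco_1": "example kinase", "eco_2": "example transporter"}
predicted = transfer_annotations(toy_cogs, toy_lineages, toy_known)
```

Only the two uncharacterized members of `COG_A` receive a prediction; `COG_B` is skipped because it covers just two lineages. Such predictions remain hypotheses until tested experimentally.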
A functional and phylogenetic
breakdown of the COGs.
Each column shows a COG; a
double streak indicates that two or
more paralogs from the given
species belong to the particular
COG.
Better (and improving) organisation of
data from multiple sources allows a more
complete understanding of genomic
information.

This will better allow for functional analysis

The Gene Ontology (GO) project is a collaborative effort to
address the need for consistent descriptions of gene
products in different databases.

The GO project has developed three structured controlled vocabularies (ontologies) that describe gene products in terms of their associated biological processes, cellular components and molecular functions in a species-independent manner.

The use of GO terms by collaborating databases facilitates uniform queries across them.

As an example, you can use GO to find all the gene products in the mouse genome that are involved in signal transduction, or you can zoom in on all the receptor tyrosine kinases.
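The "broad query, then zoom in" idea works because GO terms form a hierarchy: a query for a general term also collects products annotated to its more specific descendants. The sketch below illustrates this with a small, simplified hierarchy and invented gene labels. It is not the GO project's software or data, and the term names are used only as examples.

```python
def descendants(term, parents):
    """Return the term plus every term below it.

    parents: {child term: [parent terms]}
    """
    children = {}
    for child, parent_terms in parents.items():
        for parent in parent_terms:
            children.setdefault(parent, set()).add(child)
    found, stack = {term}, [term]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def products_for(term, parents, annotations):
    """Return sorted gene products annotated to the term or any descendant."""
    terms = descendants(term, parents)
    return sorted(gene for gene, gene_terms in annotations.items()
                  if terms & set(gene_terms))


# Simplified, illustrative hierarchy and hypothetical annotations.
toy_parents = {
    "cell surface receptor signaling": ["signal transduction"],
    "receptor tyrosine kinase signaling": ["cell surface receptor signaling"],
    "intracellular signal transduction": ["signal transduction"],
    "DNA repair": [],
}
toy_annotations = {
    "geneA": ["receptor tyrosine kinase signaling"],
    "geneB": ["intracellular signal transduction"],
    "geneC": ["DNA repair"],
}
broad = products_for("signal transduction", toy_parents, toy_annotations)
narrow = products_for("receptor tyrosine kinase signaling",
                      toy_parents, toy_annotations)
```

The broad query returns `geneA` and `geneB`, while the narrow one returns only `geneA`. Because every database uses the same terms, the same query can be run unchanged against annotations from any species.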
There are many practical examples of the
use of such analysis
The malaria genome — and beyond
Nature 419, 512 - 519 (2002)
Species of malaria parasite that infect rodents have long been used
as models for malaria disease research. This study reported the
whole-genome shotgun sequence of one species, Plasmodium yoelii
yoelii, and comparative studies with the genome of the human malaria
parasite Plasmodium falciparum clone 3D7. A synteny map of 2,212 P.
y. yoelii contiguous DNA sequences (contigs) aligned to 14 P.
falciparum chromosomes reveals marked conservation of gene
synteny within the body of each chromosome.
Of about 5,300 P. falciparum genes, more than 3,300 P. y. yoelii
orthologues of predominantly metabolic function were identified.

This was the first genome sequence of a model eukaryotic parasite, and it provides insight into the use of such systems in the modelling of Plasmodium biology and disease.
A proteomic view of the Plasmodium falciparum life cycle
Nature 419, 520–526 (2002)
Sporozoite, merozoite, trophozoite and gametocyte preparations were lysed, digested and analysed independently by tandem mass spectrometry (MS/MS).
Data sets from blood stages were searched against a database
containing both P. falciparum protein sequences and 24,006 ORFs
from the human, mouse and rat RefSeq NCBI databases.

Functional profiles of expressed proteins.

Functional classification comparison between P.
falciparum and P. y. yoelii proteins.
Protein Structure Prediction and Structural Genomics

Understanding the biological role of proteins will require knowledge of their structure and function.

Although experimental structure determination methods are providing high-resolution structural information about a subset of proteins, computational structure prediction methods will provide valuable information for the large fraction of sequences whose structures will not be determined experimentally.
Experimental structure determination can be done in a high-throughput manner

http://protein.gsc.riken.go.jp/

Aim
The aim of the PRG research is to experimentally obtain three-dimensional (3D) protein structures and their molecular functions on an equivalent scale to the genome sequencing projects. In our project, the research focus will be shifted back to biologists, to elucidate the cellular functions, or to chemists, who will promote drug discovery programs based on information regarding the active-site geometries for drug design.
Protein Structure Prediction and Structural Genomics

The first class of protein structure prediction methods, including threading and comparative modelling, relies on detectable similarity between the modelled sequence and at least one known structure, spanning most of the modelled sequence.

The second class of methods, de novo or ab initio methods, predicts the structure from sequence alone, without relying on similarity at the fold level between the modelled sequence and any of the known structures.
Modelling protein structures as a functional genomics tool

The first step in modelling a protein sequence is to attempt to find related known protein structures in the Protein Data Bank for as many domains in the modelled sequence as possible (fold recognition or fold assignment).
The folds of domains in the target sequence can be assigned by pairwise and
multiple sequence similarity searches as well as by threading methods that
rely explicitly on the known structures of the candidate template proteins.
We used a structure prediction service to analyse the PLUNC family

Fold Library Last Updated: Wed Oct 9 06:00:00 2002: [7733] Structures


Welcome to the 3D-PSSM Web Server V 2.6.0
A Fast, Web-based Method for Protein Fold Recognition using 1D and 3D
Sequence Profiles coupled with Secondary Structure and Solvation
Potential Information.

http://www.sbg.bio.ic.ac.uk/~3dpssm/
PLUNC proteins are
predicted to be
structurally similar to
BPI and LBP
All PLUNCs on the BPI x-ray structure

This analysis suggests that PLUNCs retain the hydrophobic pockets seen in BPI and may therefore be able to interact with bacterial lipid ligands, for example lipopolysaccharide (LPS). They may be either pro- or anti-inflammatory.
Applications of comparative
modeling.

The potential uses of a comparative model depend on its accuracy. This in turn depends significantly on the sequence identity between the modeled sequence and the known structure on which the model was based. Sample models and corresponding experimental structures are shown on the right.
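Sequence identity between a target and its template is computed from their pairwise alignment. The sketch below shows one common convention, assumed here for illustration: identical residues divided by the number of aligned (non-gap) columns. Other denominators, such as the length of the shorter sequence, are also in use and give different values. The aligned sequences are invented.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


# Hypothetical alignment of a target sequence to a template of known structure.
target = "ACDE-FGH"
template = "ACDQWFGH"
identity = percent_identity(target, template)
```

For this toy alignment, six of the seven aligned columns are identical, giving about 86% identity. The gapped column is ignored under the convention assumed here.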
Multiple types of interacting technologies can be
used to practically assign potential function
Systematic assignment of gene function is still an
evolving art form.

All of the computational techniques will still only give an indication of the function of a gene.

At the end of the day, it is still an absolute requirement to show directly that an individual protein exhibits the expected or predicted function.
