Gene Ontology and Functional Enrichment: Genome 559: Introduction To Statistical and Computational Genomics
Functional Enrichment
Genome 559: Introduction to Statistical and
Computational Genomics
Elhanan Borenstein
A quick review
The parsimony principle:
Find the tree that requires the
fewest evolutionary changes!
Parsimony algorithm
1. Construct all possible trees
2. For each site in the alignment and for each tree count the
minimal number of changes required
3. Add sites to obtain the total number of changes required
for each tree
4. Pick the tree with the lowest score
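The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not the course's reference code: it assumes rooted binary trees written as nested tuples, uses Fitch's counting rule for step 2, and lists the three topologies for four taxa by hand instead of enumerating trees in general. The alignment and the names fitch_site and parsimony_score are made up for the example.

```python
def fitch_site(tree, states):
    """Return (possible_states, min_changes) for one alignment column.

    tree   -- a leaf name (str) or a 2-tuple (left, right) of subtrees
    states -- dict mapping leaf name -> character observed at this site
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    common = left_set & right_set
    if common:
        # The children can agree on a state: no extra change needed here.
        return common, left_cost + right_cost
    # The children disagree: one change is required on this branch pair.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Step 3: add the per-site minimal change counts for one tree."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        column = {name: seq[i] for name, seq in alignment.items()}
        total += fitch_site(tree, column)[1]
    return total


# Hypothetical toy alignment (4 taxa, 3 sites).
alignment = {"A": "AAG", "B": "AAA", "C": "GGA", "D": "GAA"}
alignment["D"] = "AGA"  # keep the toy data small and easy to check by hand

# Step 1, done by hand: the three possible topologies for four taxa.
candidate_trees = [
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
    (("A", "D"), ("B", "C")),
]

# Steps 2-4: score every tree and keep the one with the fewest changes.
scores = {tree: parsimony_score(tree, alignment) for tree in candidate_trees}
best_tree = min(scores, key=scores.get)
```

Enumerating every tree is only feasible for a handful of taxa, which is why the hand-written list stands in for step 1 here.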
Experimental conditions:
[Figure: data matrix with axes labeled "genes" and "conditions"]
What do we need?
A shared functional vocabulary
Systematic linkage between genes and functions
A way to identify genes relevant to the condition under
study
Statistical analysis
(combining all of the above to identify cellular
functions that contributed to the disease or
condition under study)
What do we need?
Gene Ontology
Annotation
Statistical analysis
Enrichment analysis, GSEA
GO terms
The Gene Ontology (GO) is a controlled vocabulary,
a set of standard terms (words and phrases) used for
indexing and retrieving information.
Ontology structure
GO also defines the relationships between
the terms, making it a structured vocabulary.
GO is structured as a directed acyclic graph,
and each term has defined relationships to
one or more other terms.
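To make the "directed acyclic graph" idea concrete, here is a small sketch that stores each term with its list of parents and walks upward to collect all ancestors. The term names follow the examples used in these slides, but the edges are a simplified assumption for illustration and are not copied from the actual GO release; PARENTS and ancestors are hypothetical names.

```python
# Toy child -> parents map (edges are illustrative assumptions, not real GO data).
PARENTS = {
    "biological process": [],
    "cellular process": ["biological process"],
    "response to stimulus": ["biological process"],
    "immune response": ["response to stimulus"],
    # A term may have more than one parent, so this is a graph, not a tree.
    "signal transduction": ["cellular process", "response to stimulus"],
}


def ancestors(term, parents=PARENTS):
    """Collect every term reachable by following parent links upward."""
    found = set()
    stack = list(parents[term])
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(parents[current])
    return found


# "Acyclic" means no term can be reached from itself.
is_acyclic = all(term not in ancestors(term) for term in PARENTS)
signal_ancestors = ancestors("signal transduction")
```

Storing parents rather than children is a convenience choice: most questions asked of the ontology ("which broader terms cover this one?") move from specific terms toward the root.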
GO domains
Three ontology domains:
1. Molecular function: basic activity or task
e.g. catalytic activity, calcium ion binding
2. Biological process: broad objective or goal
e.g. signal transduction, immune response
3. Cellular component: location or complex
e.g. nucleus, mitochondrion
GO domains
Biological process
Molecular function
Cellular component
eggNOG
Clusters of Orthologous
Groups (COG)
What do we need?
A shared functional vocabulary
Systematic linkage between genes and functions
A way to identify genes relevant to the condition under
study
GO annotation
Statistical analysis
(combining all of the above to identify cellular
functions that contributed to the disease or
condition under study)
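The "systematic linkage between genes and functions" can be pictured as a lookup table from genes to terms. The sketch below assumes the usual GO convention that a gene annotated to a term is also counted under that term's ancestors. The gene names, the toy edges, and the helper names are all hypothetical.

```python
from collections import defaultdict

# Toy child -> parents map (illustrative edges, not real GO data).
PARENTS = {
    "biological process": [],
    "cellular process": ["biological process"],
    "response to stimulus": ["biological process"],
    "immune response": ["response to stimulus"],
    "signal transduction": ["cellular process", "response to stimulus"],
}

# Hypothetical direct annotations: gene -> most specific terms assigned to it.
DIRECT_ANNOTATIONS = {
    "geneA": {"signal transduction"},
    "geneB": {"immune response"},
    "geneC": {"cellular process"},
}


def with_ancestors(term):
    """Return the term itself plus every term above it in the graph."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def genes_by_term(annotations):
    """Invert gene -> terms into term -> genes, propagating to ancestors."""
    index = defaultdict(set)
    for gene, terms in annotations.items():
        for term in terms:
            for covered in with_ancestors(term):
                index[covered].add(gene)
    return dict(index)


term_to_genes = genes_by_term(DIRECT_ANNOTATIONS)
```

The inverted index (term to genes) is the form that enrichment analysis needs, since each test asks how many study genes fall under one term.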
Enrichment analysis
Functional category    # of genes in the study set    % of study set
Signaling              82                             27.6
Metabolism             40                             13.5
Others                 31                             10.4
Trans factors          28                             9.4
Transporters           26                             8.8
Proteases              20                             6.7
Protein synthesis      19                             6.4
Adhesion               16                             5.4
Oxidation              13                             4.4
Cell structure         10                             3.4
Secretion              (not given)                    2.0
Detoxification         (not given)                    2.0
Functional category    # of genes in the study set    % of study set    % on array
Signaling              82                             27.6%             26%
Metabolism             40                             13.5%             15%
Others                 31                             10.4%             11%
Trans factors          28                             9.4%              10%
Transporters           26                             8.8%              2%
Proteases              20                             6.7%              7%
Protein synthesis      19                             6.4%              7%
Adhesion               16                             5.4%              6%
Oxidation              13                             4.4%              4%
Cell structure         10                             3.4%              8%
Secretion              (not given)                    2.0%              2%
Detoxification         (not given)                    2.0%              2%
Arbitrary!
Limited hypotheses
Statistical analysis
(combining all of the above to identify cellular
functions that contributed to the disease or
condition under study)
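One standard way to make the study-set versus array comparison less arbitrary is a hypergeometric (one-sided Fisher) test: how likely is it to see at least this many genes from a category when the study set is drawn at random from the array? The sketch below is one possible approach and not necessarily the test the lecture goes on to use. The counts and percentages come from the table; STUDY_SIZE = 297 is inferred from 82 genes being 27.6% of the study set; ARRAY_SIZE = 10,000 is a purely hypothetical value, since the slides do not give the array size.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k): draw n genes from N without replacement, K of N in the category."""
    upper = min(K, n)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / comb(N, n)


# (category, genes in study set, % of study set, % on array), taken from the table.
rows = [
    ("Signaling", 82, 27.6, 26),
    ("Transporters", 26, 8.8, 2),
    ("Cell structure", 10, 3.4, 8),
]

STUDY_SIZE = 297      # inferred: 82 / 0.276 rounds to 297
ARRAY_SIZE = 10_000   # HYPOTHETICAL: the slides do not state the array size

results = {}
for name, k, pct_study, pct_array in rows:
    fold = pct_study / pct_array                      # >1 over-represented, <1 under-represented
    K = round(ARRAY_SIZE * pct_array / 100)           # category size on the array
    p_value = hypergeom_sf(k, ARRAY_SIZE, K, STUDY_SIZE)
    results[name] = (fold, p_value)
    print(f"{name:15s} fold={fold:.2f}  p(enriched)={p_value:.3g}")
```

Under these assumptions only Transporters stands out as enriched, while Cell structure is depleted. When many categories are tested at once, the p-values also need a multiple-testing correction.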