DAVID Tutorial
Rossella Melchiotti
22/01/2015
Contents
1. Mount the local drive on the Windows machine
2. Take a look at the datasets that will be used for the analysis
3. Select genes of interest
4. Perform gene enrichment using DAVID (on the Windows machine)
5. Perform KEGG enrichment using GSEA (on the Windows machine)
6. Visualize a pathway using PathVisio (on the Windows machine)
7. Perform gene enrichment using topGO (on the cluster, will be presented only if time permits)
8. Help Links
In this practical you will learn how to perform pathway analysis on a real microarray expression dataset
using overrepresentation analysis, rank-based methods and conditional enrichment. You will also learn how
to overlay expression datasets on pathways for further understanding the effects of a perturbation.
1. Mount the local drive on the Windows machine
This technical step is required to access the datasets and tools used in this practical. It is specific to this
tutorial and not required for pathway analysis in other contexts.
1. Go to the Desktop
2. Double click on Computer
3. Select the tab Computer
4. Click on Map Network Drive
5. Select F: as Drive
6. Type \\kclad\groups\BioinformaticsWorkshop\ in Folder
7. Click on Finish
2. Take a look at the datasets that will be used for the analysis
As previously mentioned by Dr. Filipe Gracio in his presentation on RNA-Seq, the analysis of expression
datasets often produces files containing gene names, their p-value for differential expression across two or
more phenotypic groups and, in some cases, associated fold changes. For this practical we will analyse a
similar file. The file to use for the analysis can be found at:
\\kclad\groups\BioinformaticsWorkshop\practicals\Thursday\Melchiotti\Data\
deg_GSE2706_8h.txt (on the Windows shared drive)
This dataset was generated by comparing cells isolated from the same individuals before (3 samples) and
after (3 samples) stimulation with a receptor activator. Differential expression was estimated using the tool
Limma as implemented in GEO2R. Expression was measured using the Affymetrix Human Genome U133
Plus 2.0 Array.
Open this file and inspect it. It should contain the following fields:
The same directory contains the corresponding expression dataset (GSE2706_series_matrix.txt). Each
row represents a probe and each column represents a sample. Samples identified by the prefix PBS are baseline
samples. Samples identified by the prefix LPS are samples measured after a perturbation. This dataset will
be described in more detail later on in the practical. Familiarize yourself with the structure of both files.
IMPORTANT!!!
Please do not save any changes made to the files, or all the other students will be affected as well.
3. Select genes of interest
The first step in a pathway enrichment analysis is the selection of genes of interest to use for overrepresentation
analysis. One option is to look only at genes which are significantly different across two or more phenotypic
groups after multiple testing correction (adj.P.Val<0.05). This can be easily done in Excel. Because of time
constraints this step has already been performed, and the corresponding list of differentially expressed genes
filtered by p-value can be found at:
\\kclad\groups\BioinformaticsWorkshop\practicals\Thursday\Melchiotti\Data\
deg_GSE2706_8h_filtered_by_pval.txt (on the Windows shared drive)
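The same filter can also be written in a few lines of R. The sketch below is only an illustration: it assumes a table with the columns Gene.symbol, adj.P.Val and logFC (the column names used later in this practical), and the rows are invented values, not taken from the dataset.

```r
# Toy table mimicking the layout of the differential expression file (invented values)
deg <- data.frame(
  Gene.symbol = c("GENE_A", "GENE_B", "GENE_C", "GENE_D"),
  adj.P.Val   = c(0.001, 0.20, 0.049, 0.05),
  logFC       = c(2.5, 0.1, -1.8, 0.7),
  stringsAsFactors = FALSE
)

# Keep only rows whose adjusted p-value is strictly below the threshold
threshold <- 0.05
deg_filtered <- deg[!is.na(deg$adj.P.Val) & deg$adj.P.Val < threshold, ]
print(deg_filtered$Gene.symbol)
```

Note that GENE_D (adj.P.Val exactly 0.05) is excluded because the comparison is strict.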
How do we make biological sense out of this? Due to the large number of genes significantly perturbed we
cannot analyse them one by one.
4. Perform gene enrichment using DAVID (on the Windows machine)
We will start by running a simple overrepresentation analysis as implemented by the tool DAVID (Database
for Annotation, Visualization and Integrated Discovery). Go to the following website:
http://david.abcc.ncifcrf.gov/
4. Move to the tab Background and under Affymetrix 3’ IVT Backgrounds choose Human Genome U133
Plus 2 Array as a background since the experiment was run using this chip
Another option would be to use only expressed genes as a background.
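To see why the choice of background matters, the following sketch runs a plain one-sided Fisher's exact test with two different background sizes. All counts are invented for illustration. DAVID itself reports a modified Fisher score (EASE), so these numbers are not meant to reproduce its output.

```r
# Hypothetical counts (not from the tutorial dataset)
genes_in_list   <- 300   # genes of interest
list_in_pathway <- 30    # genes of interest annotated to the pathway
pathway_size    <- 100   # pathway genes present in the background

ora_pvalue <- function(background_size) {
  # 2x2 table, filled column by column:
  # rows = in pathway / not in pathway, columns = in list / not in list
  tab <- matrix(c(list_in_pathway,
                  genes_in_list - list_in_pathway,
                  pathway_size - list_in_pathway,
                  background_size - genes_in_list - pathway_size + list_in_pathway),
                nrow = 2)
  # One-sided test: is the pathway overrepresented in the list?
  fisher.test(tab, alternative = "greater")$p.value
}

p_whole_chip     <- ora_pvalue(20000)  # e.g. every gene on the array
p_expressed_only <- ora_pvalue(8000)   # e.g. only genes considered expressed
print(c(whole_chip = p_whole_chip, expressed_only = p_expressed_only))
```

With the smaller background the same overlap is less surprising, so the p-value is larger. A background that is too broad can therefore overstate enrichment.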
Ontologies and Pathway Databases of interest can be selected in the right hand side of the webpage.
As mentioned in the theoretical session, DAVID provides three distinct tools to perform pathway enrichment:
Choose the right analysis and the right annotation to answer the following questions:
Q: What are the most enriched GO terms for Biological Process (GOTERM_BP_FAT)?
Q: Are the results redundant? (suggestion, use the Functional Annotation Clustering option)
Q: What are the most enriched KEGG pathways?
Q: Take a look at the description of this dataset (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE2706).
Do the results make biological sense? (Only untreated samples and samples treated with LPS at 8h
were used for the analysis.)
Most of the enriched functions and pathways seem to revolve around inflammation. For this experiment
unstimulated human dendritic cells (DCs) were compared with DCs stimulated with lipopolysaccharides (LPS)
to induce TLR4-pathway activation. Expression after 8h stimulation was compared to baseline. It therefore
makes biological sense that most enriched pathways and functions are linked to the immune system. LPS, the
molecule used to activate dendritic cells, is in fact normally found in the outer membrane of Gram-negative
bacteria and should therefore be recognized as a threat by the immune system.
OPTIONAL
Q: What are the GO terms for Molecular Function (GOTERM_MF_FAT) most enriched in upregulated genes?
What about downregulated genes? (You can use the files deg_GSE2706_8h_filtered_by_pval_upregulated.txt
and deg_GSE2706_8h_filtered_by_pval_downregulated.txt, which contain only genes upregulated and
downregulated by the perturbation respectively.)
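Lists like these can be derived from the filtered table by splitting on the sign of the fold change. This is a minimal sketch with invented rows. It assumes that a positive logFC means higher expression after the perturbation, which depends on how the two groups were defined when the comparison was run.

```r
# Toy filtered table (invented values)
deg_filtered <- data.frame(
  Gene.symbol = c("GENE_A", "GENE_C", "GENE_E"),
  logFC       = c(2.5, -1.8, 0.9),
  stringsAsFactors = FALSE
)

# Assumption: logFC > 0 means upregulated after the perturbation
upregulated   <- deg_filtered$Gene.symbol[deg_filtered$logFC > 0]
downregulated <- deg_filtered$Gene.symbol[deg_filtered$logFC < 0]

print(upregulated)
print(downregulated)
```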
5. Perform KEGG enrichment using GSEA (on the Windows machine)
To avoid the arbitrary choice of a threshold for selecting genes of interest to test for enrichment, we can
use a rank-based approach. In this practical we will focus on GSEA (Gene Set Enrichment Analysis).
The version of GSEA we will use today is the standalone Java application that can be found at
http://www.broadinstitute.org/gsea/index.jsp. The software can be launched by double clicking on gsea2-2.1.0.jar
located in the directory:
\\kclad\groups\BioinformaticsWorkshop\practicals\Thursday\Melchiotti\Software\GSEA\
(on the Windows shared drive)
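The idea behind a rank-based method can be sketched with a running sum over a ranked gene list. The code below is a simplified, unweighted version written for illustration only: GSEA additionally weights each step by the ranking metric and assesses significance by permutation. Gene names and gene sets are invented.

```r
# Ranked gene list (most upregulated first); names are invented
ranked_genes <- c("G1", "G2", "G3", "G4", "G5", "G6", "G7", "G8", "G9", "G10")

enrichment_score <- function(ranked, set) {
  in_set <- ranked %in% set
  n_hit  <- sum(in_set)
  n_miss <- length(ranked) - n_hit
  # Walk down the list: step up at each set member, step down otherwise
  steps <- ifelse(in_set, 1 / n_hit, -1 / n_miss)
  running_sum <- cumsum(steps)
  # The score is the running-sum value furthest from zero
  running_sum[which.max(abs(running_sum))]
}

# A set concentrated at the top of the list versus a set spread along it
es_top    <- enrichment_score(ranked_genes, c("G1", "G2", "G4"))
es_spread <- enrichment_score(ranked_genes, c("G2", "G6", "G9"))
print(c(top = es_top, spread = es_spread))
```

The set concentrated at the top of the ranking obtains a score close to 1, while the set spread along the list stays close to 0. No threshold on the p-value was needed to obtain either score.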
In order to run a GSEA analysis we need to download the expression dataset (instead of simply using a list
of genes). This file can be accessed at:
\\kclad\groups\BioinformaticsWorkshop\practicals\Thursday\Melchiotti\Data\
GSE2706_series_matrix.txt (on the Windows shared drive)
The file, which can be opened using Excel (tab delimited), contains an identifier (Affymetrix probe ID)
followed by the expression of six samples: 3 controls (PBS.n) and 3 stimulated samples (LPS.8h.n).
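The layout described above can be checked quickly in R. The sketch below reads a two-probe excerpt written inline, so that it runs without access to the shared drive. The values and probe names are invented, and the identifier column name ID_REF is an assumption.

```r
# Invented excerpt with the column layout described above (tab delimited)
toy_matrix <- paste(
  c("ID_REF\tPBS.1\tPBS.2\tPBS.3\tLPS.8h.1\tLPS.8h.2\tLPS.8h.3",
    "probe_1\t5.1\t5.3\t5.0\t9.2\t9.0\t9.4",
    "probe_2\t7.7\t7.5\t7.9\t7.6\t7.8\t7.4"),
  collapse = "\n"
)
expr <- read.delim(text = toy_matrix, sep = "\t", header = TRUE, check.names = FALSE)

# Assign each sample to a group based on the prefix of its name
sample_names <- colnames(expr)[-1]
group <- ifelse(grepl("^PBS", sample_names), "control", "stimulated")
print(table(group))

# Difference between group means for each probe (stimulated minus control)
mean_diff <- rowMeans(expr[, sample_names[group == "stimulated"]]) -
             rowMeans(expr[, sample_names[group == "control"]])
print(mean_diff)
```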
To run the GSEA software:
4. Choose the loaded file as expression dataset
5. Choose c2.cp.kegg.v4.0.symbols.gmt as Gene sets database
6. Choose Create an on-the-fly phenotype as Phenotype labels
7. Write PBS.1, PBS.2, PBS.3 under Samples for class A (one per line, these names correspond to the
sample names in the header of the expression file)
8. Write LPS.8h.1, LPS.8h.2, LPS.8h.3 under Samples for class B (one per line, these names correspond
to the sample names in the header of the expression file)
9. Click on Apply to dataset
10. Choose gene_set as Permutation type (phenotype is usually preferred but since our dataset contains
only 6 samples, the number of possible permutations is not enough to estimate a reliable FDR q-val)
11. Choose HG_U133_Plus_2.chip as Chip platform(s)
12. Expand Basic Fields and choose a meaningful name as Analysis name
13. Use default values for the other parameters
14. Click on Run
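The remark in step 10 about the number of possible permutations can be verified with a short calculation. With six samples split into two classes of three there are only choose(6, 3) distinct ways to assign the labels, so a null distribution built by shuffling phenotype labels would be very coarse. The sketch below uses the sample names from the steps above.

```r
samples   <- c("PBS.1", "PBS.2", "PBS.3", "LPS.8h.1", "LPS.8h.2", "LPS.8h.3")
n_class_a <- 3

# Number of distinct ways to choose which samples are labelled as class A
n_labelings <- choose(length(samples), n_class_a)

# Enumerate them explicitly: one column per possible class A
labelings <- combn(samples, n_class_a)

# An empirical p-value based on these labelings cannot be smaller than this
min_p <- 1 / n_labelings

print(n_labelings)
print(min_p)
```

Permuting gene sets instead does not have this limitation, because there are many more genes than samples.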
Pre-computed results can be found at:
\\kclad\groups\BioinformaticsWorkshop\practicals\Thursday\Melchiotti\Results\GSEA\
effect_of_stimulation_on_DCs.Gsea.1418825504824\ (on the Windows shared drive)
Load the index.html file containing the summary of the results using a browser.
Results computed by GSEA can also be accessed directly from the software window by clicking on Success 5.
Enriched pathways for upregulated and downregulated genes can be found by clicking on Detailed enrichment
results in html format. Clicking on a pathway name leads to a description of the pathway. Clicking
on Details... leads to the list of genes in the pathway and the corresponding heatmap. Plots explaining
how well a pathway was enriched at the top of the list can be found by clicking on Snapshot of enrichment
results.
Q: What are the most enriched KEGG pathways for upregulated genes in class B?
Q: What are the most enriched KEGG pathways for downregulated genes in class B?
6. Visualize a pathway using PathVisio (on the Windows machine)
gene expression fold changes on this enriched pathway to better understand how the pathway is perturbed
after activation.
PathVisio is a free open-source tool for the visualization of biological pathways. Open PathVisio by double
clicking on the executable file which can be found at:
\\kclad\groups\BioinformaticsWorkshop\practicals\Thursday\Melchiotti\Software\
pathvisio-3.1.3\ (on the Windows shared drive)
Here are the steps required to plot the Toll-like receptor signaling pathway and to colour it according to the
fold changes in our dataset for the genes in the pathway.
\\kclad\groups\BioinformaticsWorkshop\practicals\Thursday\Melchiotti\Pathways\
(on the Windows shared drive)
2. Click on Data > Select Gene Database and select the file Hs_Derby_20130701.bridge stored at:
\\kclad\groups\BioinformaticsWorkshop\practicals\Thursday\Melchiotti\Pathways\
(on the Windows shared drive)
3. Click on Data > Import expression data and select the expression matrix deg_GSE2706_8h_filtered_by_pval.txt
as Input file. This file is stored at:
\\kclad\groups\BioinformaticsWorkshop\practicals\Thursday\Melchiotti\Data\Pathvisio\
(on the Windows shared drive)
4. To choose the Output file click on Browse and select your home directory (the one with your username
as a name, e.g. a1102248). You can give the file the name you prefer. Click on Choose filename for
database. Click on Next
IMPORTANT!!!
Do not save the output file in the default directory given by the software
5. Choose tab as a data delimiter. Click on Next
6. Choose Gene.symbol as primary identifier column. Select Use the same system code for all rows and
choose HGNC (Hugo Gene Symbols). Click on Next
7. Click on Finish
8. Choose Data > Visualization options
9. Tick Expression as Color, Tick Basic and select only logFC as the metric to use to color nodes
10. Click on Modify and change the scale of the color set so that the gradient goes from -10 to 10
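The effect of the colour scale chosen in step 10 can be sketched in R: fold changes are clamped to the range -10 to 10 and then mapped onto a gradient. The blue-white-red gradient and the gene names below are arbitrary choices for illustration and are not PathVisio's own settings.

```r
# Invented log fold changes
logfc <- c(GENE_A = -12, GENE_B = -3, GENE_C = 0, GENE_D = 4.5, GENE_E = 10)

# Values outside the scale are drawn with the colour of the nearest end
clamped <- pmax(pmin(logfc, 10), -10)

# Gradient with one colour per 0.1 step between -10 and 10
palette_fn <- colorRampPalette(c("blue", "white", "red"))
colours    <- palette_fn(201)

# Position of each value on the gradient (1 = -10, 201 = +10)
idx <- round((clamped + 10) * 10) + 1
node_colours <- colours[idx]
names(node_colours) <- names(logfc)
print(node_colours)
```

A scale much wider than the observed fold changes makes most nodes look almost white, so the range should be chosen with the data in mind.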
Q: Are genes mostly upregulated or downregulated?
Q: Are perturbed genes concentrated upstream or downstream in the pathway?
IMPORTANT!!!
Please do not save the changes made to the pathway or all the other students will be affected
as well
7. Perform gene enrichment using topGO (on the cluster, will be presented only if time permits)
As mentioned in the theoretical lecture, GO has a hierarchical nature which can sometimes lead to redundant
enriched functions (see DAVID results in Section 4). It is therefore interesting to compare results obtained
with traditional overrepresentation and rank-based analyses with results obtained by conditional enrichment.
This can be performed programmatically for the GO ontology using the R package topGO. A description of
the package and of all functions contained in the package can be found at
http://www.bioconductor.org/packages/release/bioc/html/topGO.html.
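The idea behind conditional enrichment can be illustrated with a toy hierarchy made of one parent term and one child term. The sketch below is not topGO's implementation: it only shows the principle that, once a child term is significant, its genes are removed from the parent before the parent is tested. Gene names, term sizes and the cutoff are illustrative assumptions.

```r
# One-sided Fisher's exact test for one term
fisher_p <- function(term_genes, sig_genes, all_genes) {
  in_term <- factor(all_genes %in% term_genes, levels = c(TRUE, FALSE))
  is_sig  <- factor(all_genes %in% sig_genes,  levels = c(TRUE, FALSE))
  fisher.test(table(in_term, is_sig), alternative = "greater")$p.value
}

all_genes <- paste0("g", 1:200)
child     <- all_genes[1:20]             # specific term
parent    <- all_genes[1:60]             # general term, includes all child genes
sig_genes <- all_genes[c(1:15, 61:65)]   # significant genes, mostly in the child

cutoff <- 0.01

# Classic approach: every term is tested independently
p_child_classic  <- fisher_p(child,  sig_genes, all_genes)
p_parent_classic <- fisher_p(parent, sig_genes, all_genes)

# Conditional approach: test the child first, then remove its genes from the
# parent if the child is significant
parent_tested <- if (p_child_classic < cutoff) setdiff(parent, child) else parent
p_parent_conditional <- fisher_p(parent_tested, sig_genes, all_genes)

print(c(child = p_child_classic,
        parent_classic = p_parent_classic,
        parent_conditional = p_parent_conditional))
```

In the classic test the parent is significant only because it inherits the genes of its child. Once those genes are removed the parent is no longer enriched, which removes the redundancy.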
Please open MobaXterm, log in to the cluster, add the required modules and open RStudio as explained in
the previous practical (see handout).
You can follow this section by copying and pasting the code from this PDF or by running, in RStudio, the
script topGOAnalysis.R stored at:
rm(list = ls())
# Load packages
library(topGO)
library(org.Hs.eg.db)
library(biomaRt)
library(reshape2)
IMPORTANT!!!
working_dir should be changed to the directory in which to store results
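# working_dir (where results are stored) and input_file_dir (where deg_GSE2706_8h.txt is located)
# must both be defined and end with "/", because paths are built with paste(..., sep="")
prefix <- "GSE2706_8h"
deg_filename <- paste(input_file_dir, "deg_GSE2706_8h.txt", sep="")
significance_threshold_pvalue <- 0.05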
# Load the list of genes with their corresponding p-value for differential expression
# (all genes regardless of p-value)
genes <- read.delim(deg_filename,sep="\t",header=TRUE,na.strings="")
print(head(genes))
# Collapse probes with the same Gene Symbol
genes_collapsed<-dcast(genes[,c("Gene.symbol","adj.P.Val")],Gene.symbol~.,
median,value.var="adj.P.Val")
colnames(genes_collapsed)<-c("Gene.symbol","adj.P.Val")
# Create a vector containing the scores that will be used to rank the list, each vector
# element should be named with its gene symbol
all_genes <- genes_collapsed$adj.P.Val
names(all_genes) <- genes_collapsed$Gene.symbol
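The collapsing step above can be tried on a small example. The sketch below uses base R's aggregate instead of dcast, which gives the same kind of result for this purpose. The probes and p-values are invented.

```r
# Three probes for GENE_A and one for GENE_B (invented values)
probes <- data.frame(
  Gene.symbol = c("GENE_A", "GENE_A", "GENE_A", "GENE_B"),
  adj.P.Val   = c(0.01, 0.03, 0.20, 0.50),
  stringsAsFactors = FALSE
)

# One row per gene symbol, keeping the median adjusted p-value of its probes
collapsed <- aggregate(adj.P.Val ~ Gene.symbol, data = probes, FUN = median)

# Named vector of scores, as expected by topGO for its allGenes argument
scores <- collapsed$adj.P.Val
names(scores) <- collapsed$Gene.symbol
print(scores)
```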
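# Function used by topGO to select the genes of interest from the scores in all_genes.
# Its definition is not shown in this handout; this is a minimal version, assuming that
# genes are selected when their adjusted p-value is below significance_threshold_pvalue
top_diff_genes_function <- function(all_scores) {
  return(all_scores < significance_threshold_pvalue)
}
# Enrichment for biological processes (the package org.Hs.eg.db contains the mapping
# between gene symbols and GO terms)
GO_data_BP <- new("topGOdata",
                  ontology = "BP",
                  allGenes = all_genes,
                  geneSel = top_diff_genes_function,
                  nodeSize = 10,
                  annot = annFUN.org,
                  mapping = "org.Hs.eg.db",
                  ID = "symbol"
)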
##
## Building most specific GOs ..... ( 9342 GO terms found. )
##
## Build GO DAG topology .......... ( 12580 GO terms and 28930 relations. )
##
## Annotating nodes ............... ( 14149 genes annotated to the GO terms. )
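# Run enrichment using both Fisher and Kolmogorov-Smirnov tests and both the classic and
# the elim methods provided by topGO
results_Fisher_BP <- runTest(GO_data_BP, algorithm = "classic", statistic = "fisher")
results_Fisher_elim_BP <- runTest(GO_data_BP, algorithm = "elim", statistic = "fisher")
results_KS_BP <- runTest(GO_data_BP, algorithm = "classic", statistic = "ks")
results_KS_elim_BP <- runTest(GO_data_BP, algorithm = "elim", statistic = "ks")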
##
## -- Classic Algorithm --
##
## the algorithm is scoring 3620 nontrivial nodes
## parameters:
## test statistic: fisher
##
## -- Elim Algorithm --
##
## the algorithm is scoring 3620 nontrivial nodes
## parameters:
## test statistic: fisher
## cutOff: 0.01
##
## Level 18: 2 nodes to be scored (0 eliminated genes)
##
## Level 17: 5 nodes to be scored (0 eliminated genes)
##
## Level 16: 10 nodes to be scored (12 eliminated genes)
##
## Level 15: 16 nodes to be scored (32 eliminated genes)
##
## Level 14: 45 nodes to be scored (42 eliminated genes)
##
## Level 13: 99 nodes to be scored (208 eliminated genes)
##
## Level 12: 151 nodes to be scored (498 eliminated genes)
##
## Level 11: 250 nodes to be scored (741 eliminated genes)
##
## Level 10: 381 nodes to be scored (1211 eliminated genes)
##
## Level 9: 485 nodes to be scored (1374 eliminated genes)
##
## Level 8: 536 nodes to be scored (2086 eliminated genes)
##
## Level 7: 555 nodes to be scored (3037 eliminated genes)
##
## Level 6: 490 nodes to be scored (3372 eliminated genes)
##
## Level 5: 343 nodes to be scored (4412 eliminated genes)
##
## Level 4: 183 nodes to be scored (5318 eliminated genes)
##
## Level 3: 50 nodes to be scored (6462 eliminated genes)
##
## Level 2: 18 nodes to be scored (6829 eliminated genes)
##
## Level 1: 1 nodes to be scored (6829 eliminated genes)
##
## -- Classic Algorithm --
##
## the algorithm is scoring 5131 nontrivial nodes
## parameters:
## test statistic: ks
## score order: increasing
##
## -- Elim Algorithm --
##
## the algorithm is scoring 5131 nontrivial nodes
## parameters:
## test statistic: ks
## cutOff: 0.01
## score order: increasing
##
## Level 19: 1 nodes to be scored (0 eliminated genes)
##
## Level 18: 2 nodes to be scored (0 eliminated genes)
##
## Level 17: 7 nodes to be scored (0 eliminated genes)
##
## Level 16: 18 nodes to be scored (12 eliminated genes)
##
## Level 15: 32 nodes to be scored (24 eliminated genes)
##
## Level 14: 75 nodes to be scored (24 eliminated genes)
##
## Level 13: 158 nodes to be scored (45 eliminated genes)
##
## Level 12: 249 nodes to be scored (180 eliminated genes)
##
## Level 11: 450 nodes to be scored (526 eliminated genes)
##
## Level 10: 599 nodes to be scored (1016 eliminated genes)
##
## Level 9: 726 nodes to be scored (1461 eliminated genes)
##
## Level 8: 756 nodes to be scored (2496 eliminated genes)
##
## Level 7: 744 nodes to be scored (3114 eliminated genes)
##
## Level 6: 615 nodes to be scored (4000 eliminated genes)
##
## Level 5: 423 nodes to be scored (4349 eliminated genes)
##
## Level 4: 206 nodes to be scored (6085 eliminated genes)
##
## Level 3: 51 nodes to be scored (6848 eliminated genes)
##
## Level 2: 18 nodes to be scored (6968 eliminated genes)
##
## Level 1: 1 nodes to be scored (6968 eliminated genes)
# Substitute zero p-values with a very small number so that it is possible to compute the
# log10
results_Fisher_BP@score[which(results_Fisher_BP@score==0)]=1e-300
results_Fisher_elim_BP@score[which(results_Fisher_elim_BP@score==0)]=1e-300
results_KS_BP@score[which(results_KS_BP@score==0)]=1e-300
results_KS_elim_BP@score[which(results_KS_elim_BP@score==0)]=1e-300
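The reason for this substitution is that the logarithm of zero is not finite, which would break any later plot or ranking based on -log10 of the p-value. A small sketch with invented p-values:

```r
pvals <- c(0.05, 1e-20, 0)

# Without the substitution the last value becomes infinite
minus_log_before <- -log10(pvals)
print(minus_log_before)

# Replace exact zeros with a very small positive number
pvals[pvals == 0] <- 1e-300
minus_log_after <- -log10(pvals)
print(minus_log_after)
```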
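# Combine the results of the four tests in a single table, ranked by classic Fisher p-value
all_res_BP_just_pval <- GenTable(GO_data_BP,
                                 classicFisher = results_Fisher_BP,
                                 elimFisher = results_Fisher_elim_BP,
                                 classicKS = results_KS_BP,
                                 elimKS = results_KS_elim_BP,
                                 ranksOf = "classicFisher",
                                 topNodes = length(score(results_Fisher_BP)))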
# Save the table in the working directory (assumed call; the handout only names the output file)
write.csv(all_res_BP_just_pval, paste(working_dir, prefix, "_topGO_results.csv", sep=""))
Open the file you just created in your working directory (GSE2706_8h_topGO_results.csv). If the
analysis is taking too long, pre-computed results can be found at:
~/practicals/Thursday/Melchiotti/Results/topGO/GSE2706_8h_topGO_results.csv
(on the cluster computer)
OR
\\kclad\groups\BioinformaticsWorkshop\practicals\Thursday\Melchiotti\Results\topGO\
GSE2706_8h_topGO_results.csv
(on the Windows shared drive)
Q: How many functions are significantly enriched according to classic Fisher? How many according to elim
Fisher?
Q: How many functions are significantly enriched according to classic KS? How many according to elim KS?
This can be done directly in R by examining the data frame all_res_BP_just_pval.
## [1] 783
## [1] 540
print(length(intersect(classicFisher,elimFisher)))
## [1] 437
## [1] 837
## [1] 536
print(length(intersect(classicKS,elimKS)))
## [1] 494
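Counts like the ones above can be obtained by selecting the GO IDs below the significance threshold for each test and intersecting the two sets. The sketch below shows this on a toy table with the same column names as the results table. The terms and p-values are invented, and the p-values are stored as text to mirror the as.numeric conversion used in this section.

```r
# Toy results table (invented terms and p-values)
res <- data.frame(
  GO.ID         = c("TERM_1", "TERM_2", "TERM_3", "TERM_4"),
  classicFisher = c("0.001", "0.020", "0.300", "0.040"),
  elimFisher    = c("0.002", "0.600", "0.400", "0.030"),
  stringsAsFactors = FALSE
)

# Terms significant according to each test
classicFisher <- res$GO.ID[as.numeric(res$classicFisher) < 0.05]
elimFisher    <- res$GO.ID[as.numeric(res$elimFisher) < 0.05]

print(length(classicFisher))
print(length(elimFisher))
# Terms significant according to both tests
print(length(intersect(classicFisher, elimFisher)))
```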
Q: Can you find an example of a function that is enriched when running KS but not when running Fisher?
results <- all_res_BP_just_pval[which(as.numeric(all_res_BP_just_pval$classicKS) < 0.05
                                      & as.numeric(all_res_BP_just_pval$classicFisher) > 0.05), ]
print(tail(results))
## 5116 GO:2000696 regulation of epithelial cell differenti... 13
## 5129 GO:2001251 negative regulation of chromosome organi... 41
## Significant Expected Rank in classicFisher classicFisher elimFisher
## 4960 0 0.28 4960 1.00000 1.00000
## 5024 0 0.38 5024 1.00000 1.00000
## 5080 0 0.38 5080 1.00000 1.00000
## 5084 0 0.41 5084 1.00000 1.00000
## 5116 0 0.33 5116 1.00000 1.00000
## 5129 0 1.05 5129 1.00000 1.00000
## classicKS elimKS
## 4960 0.01102 0.01102
## 5024 0.00068 1.00000
## 5080 0.03267 0.03267
## 5084 0.01886 0.01886
## 5116 0.04885 0.04885
## 5129 0.04448 0.04448
[Figure: GO graph of the most significant Biological Process terms according to the classic Fisher test. Each node shows the GO ID, the truncated term name, the p-value and the number of significant / annotated genes.]
## $dag
## A graphNEL graph with directed edges
## Number of Nodes = 15
## Number of Edges = 20
##
## $complete.dag
## [1] "A graph with 15 nodes."
pdf(paste(working_dir,prefix,"_Fisher_BP_just_pval.pdf",sep=""))
showSigOfNodes(GO_data_BP, score(results_Fisher_BP), firstSigNodes = 10, useInfo = "all")
## $dag
## A graphNEL graph with directed edges
## Number of Nodes = 15
## Number of Edges = 20
##
## $complete.dag
## [1] "A graph with 15 nodes."
dev.off()
## pdf
## 2
[Figure: GO graph of the most significant Biological Process terms according to the elim Fisher test. Each node shows the GO ID, the truncated term name, the p-value and the number of significant / annotated genes.]
## $dag
## A graphNEL graph with directed edges
## Number of Nodes = 55
## Number of Edges = 95
##
## $complete.dag
## [1] "A graph with 55 nodes."
pdf(paste(working_dir,prefix,"_Fisher_BP_elim_just_pval.pdf",sep=""))
showSigOfNodes(GO_data_BP, score(results_Fisher_elim_BP), firstSigNodes = 5,
useInfo = "all")
## $dag
## A graphNEL graph with directed edges
## Number of Nodes = 55
## Number of Edges = 95
##
## $complete.dag
## [1] "A graph with 55 nodes."
dev.off()
## pdf
## 2
[Figure: GO graph of the most significant Biological Process terms according to the classic KS test. Each node shows the GO ID, the truncated term name, the p-value and the number of significant / annotated genes.]
## $dag
## A graphNEL graph with directed edges
## Number of Nodes = 18
## Number of Edges = 23
##
## $complete.dag
## [1] "A graph with 18 nodes."
pdf(paste(working_dir,prefix,"_KS_BP_just_pval.pdf",sep=""))
showSigOfNodes(GO_data_BP, score(results_KS_BP), firstSigNodes = 10, useInfo = "all")
## $dag
## A graphNEL graph with directed edges
## Number of Nodes = 18
## Number of Edges = 23
##
## $complete.dag
## [1] "A graph with 18 nodes."
dev.off()
## pdf
## 2
[Figure: GO graph of the most significant Biological Process terms according to the elim KS test. Each node shows the GO ID, the truncated term name, the p-value and the number of significant / annotated genes.]
## $dag
## A graphNEL graph with directed edges
## Number of Nodes = 56
## Number of Edges = 96
##
## $complete.dag
## [1] "A graph with 56 nodes."
pdf(paste(working_dir,prefix,"_KS_BP_elim_just_pval.pdf",sep=""))
showSigOfNodes(GO_data_BP, score(results_KS_elim_BP), firstSigNodes = 5, useInfo = "all")
## $dag
## A graphNEL graph with directed edges
## Number of Nodes = 56
## Number of Edges = 96
##
## $complete.dag
## [1] "A graph with 56 nodes."
dev.off()
## pdf
## 2
The following code contains two useful functions to retrieve the full name of a GO term and all genes belonging
to a particular GO category since topGO tends to truncate GO term full names in the tables and graphs.
# Retrieve full name of a GO term and all genes belonging to that category
go_id<-"GO:0045087"
print(paste("Full name of term ",go_id,": ",Term(go_id),sep=""))
print(head(gene.data))
## hgnc_symbol go_id
## 1 IPO7 GO:0045087
## 2 TRIM27 GO:0045087
## 3 IFNW1 GO:0045087
## 4 PIK3C3 GO:0045087
## 5 TOLLIP GO:0045087
## 6 DUSP7 GO:0045087
Q: Compare the results of the two Fisher’s tests (classic and elim) using the PDF files just generated. Which
nodes lose significance using the elim version?
Q: Compare the results of the classic Fisher test with the results of the classic KS test using the PDF files
just generated. Are there any differences?
Close RStudio and MobaXterm.
8. Help Links
1. DAVID: http://david.abcc.ncifcrf.gov/helps/functional_annotation.html
2. GSEA: http://www.broadinstitute.org/gsea/doc/desktop_tutorial.jsp
3. PathVisio: http://www.pathvisio.org/documentation/
4. topGO: http://www.bioconductor.org/packages/release/bioc/vignettes/topGO/inst/doc/topGO.pdf