DAVID Tutorial

This document describes a hands-on session for performing pathway and network analysis. It discusses mounting a local drive to access necessary datasets and tools. It then reviews the datasets that will be used, including a file of differentially expressed genes from stimulated vs unstimulated dendritic cells. The document guides running gene enrichment analysis using DAVID and GSEA, looking at enriched Gene Ontology terms and KEGG pathways. It emphasizes selecting appropriate analysis options and interpreting results in a biologically meaningful way for the experimental system of lipopolysaccharide-stimulated dendritic cells.


Pathway and Network Analysis Hands-On Session

Rossella Melchiotti
22/01/2015

Contents
1. Mount the local drive on the Windows machine
2. Take a look at the datasets that will be used for the analysis
3. Select genes of interest
4. Perform gene enrichment using DAVID (on the Windows machine)
5. Perform KEGG enrichment using GSEA (on the Windows machine)
6. Visualize a pathway using PathVisio (on the Windows machine)
7. Perform gene enrichment using topGO (on the cluster, will be presented only if time permits)
8. Help Links

In this practical you will learn how to perform pathway analysis on a real microarray expression dataset
using overrepresentation analysis, rank-based methods and conditional enrichment. You will also learn how
to overlay expression datasets on pathways for further understanding the effects of a perturbation.

1. Mount the local drive on the Windows machine

This technical step is required to access the datasets and tools used in this practical. It is specific to this
tutorial and not required for pathway analysis in other contexts.

1. Go to the Desktop
2. Double click on Computer
3. Select the tab Computer
4. Click on Map Network Drive
5. Select F: as Drive
6. Type \\kclad\groups\BioinformaticsWorkshop\ in Folder
7. Click on Finish

2. Take a look at the datasets that will be used for the analysis

As previously mentioned by Dr. Filipe Gracio in his presentation on RNA-Seq, the analysis of expression
datasets often produces files containing gene names, their p-value for differential expression across two or
more phenotypic groups and, in some cases, associated fold changes. For this practical we will analyse a
similar file. The file to use for the analysis can be found at:

\\kclad\groups\BioinformaticsWorkshop\practicals\Thursday\Melchiotti\Data\
deg_GSE2706_8h.txt (on the Windows shared drive)

This dataset was generated by comparing cells isolated from the same individuals before (3 samples) and
after (3 samples) stimulation with a receptor activator. Differential expression was estimated using the tool
Limma as implemented in GEO2R. Expression was measured using the Affymetrix Human Genome U133
Plus 2.0 Array.
Open this file and inspect it. It should contain the following fields:

• Affymetrix Probe ID (ID)

• Benjamini Hochberg corrected p-value (adj.P.Val)
• Uncorrected p-value (P.Value)
• Log Fold Change of stimulated versus unstimulated cells (logFC)
• HUGO Gene Symbol (Gene.symbol)
• Gene full name (Gene.title)

The same directory contains the corresponding expression dataset (GSE2706_series_matrix.txt). Each
row represents a probe and each column represents a sample. Samples identified by the prefix PBS are baseline
samples. Samples identified by the prefix LPS are samples measured after a perturbation. This dataset will
be described in more detail later on in the practical. Familiarize yourself with the structure of both files.
IMPORTANT!!!
Please do not save any changes made to the files or all the other students will be affected as
well
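
To make the adj.P.Val field more concrete, here is a minimal sketch of the Benjamini-Hochberg correction in base R. The five p-values are made up for illustration and are not taken from deg_GSE2706_8h.txt. p.adjust() is the standard base R function; GEO2R and Limma may reach the same column through their own code.

```r
# Hypothetical uncorrected p-values (illustration only, not from the dataset)
p_raw <- c(0.001, 0.008, 0.039, 0.041, 0.6)

# Benjamini-Hochberg adjustment as provided by base R
p_bh <- p.adjust(p_raw, method = "BH")

# The same adjustment written out: p * n / rank, then made monotone by
# taking the running minimum from the largest p-value downwards
n <- length(p_raw)
ord <- order(p_raw, decreasing = TRUE)
p_manual <- numeric(n)
p_manual[ord] <- pmin(1, cummin(p_raw[ord] * n / (n:1)))

print(data.frame(P.Value = p_raw, adj.P.Val = p_bh))
print(sum(p_raw < 0.05))  # entries below 0.05 before correction
print(sum(p_bh < 0.05))   # entries below 0.05 after correction
```

In this toy example fewer entries stay below 0.05 after correction, which is why the practical filters on adj.P.Val rather than on P.Value.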

3. Select genes of interest

The first step in a pathway enrichment analysis is the selection of genes of interest to use for overrepresentation
analysis. One option is to look only at genes which are significantly different across two or more phenotypic
groups after multiple testing correction (adj.P.Val<0.05). This can be easily done in Excel. Because of time
constraints this step has already been performed and the corresponding list of differentially expressed genes
filtered by p-value can be found at:

\\kclad\groups\BioinformaticsWorkshop\practicals\Thursday\Melchiotti\Data\
deg_GSE2706_8h_filtered_by_pval.txt (on the Windows shared drive)

Q: How many genes are significantly differentially expressed?

When the list of significant genes is too long, additional filters can be introduced (for example on fold
change) or the significance threshold can be lowered to make it more stringent.
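
The same filtering can be done in R instead of Excel. The sketch below uses a small made-up table with the column names of deg_GSE2706_8h.txt; the probe IDs, gene symbols, values and the fold-change cut-off of 1 are all hypothetical. It also shows why the number of significant probes and the number of significant genes can differ.

```r
# Hypothetical table mimicking the columns of the differential expression file
deg <- data.frame(
  ID          = c("probe_1", "probe_2", "probe_3", "probe_4", "probe_5"),
  adj.P.Val   = c(0.0005, 0.002, 0.03, 0.20, 0.04),
  logFC       = c(12.7, 9.4, 0.4, 5.0, -2.1),
  Gene.symbol = c("GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_A"),
  stringsAsFactors = FALSE
)

# Keep probes that are significant after multiple testing correction
significant <- deg[deg$adj.P.Val < 0.05, ]
n_probes <- nrow(significant)
n_genes  <- length(unique(significant$Gene.symbol))

# Optional additional filter on the absolute log fold change
# (the cut-off of 1 is an arbitrary choice for this illustration)
strict <- significant[abs(significant$logFC) >= 1, ]

print(n_probes)      # significant probes
print(n_genes)       # distinct gene symbols among them
print(nrow(strict))  # probes left after the fold-change filter
```

Several probes can map to the same gene, so the question about the number of genes is best answered on unique gene symbols.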
Here is a heatmap representing the expression of differentially expressed genes across samples. Each row
represents a probe while each column represents a sample. Samples colored in green are before treatment
samples while samples colored in brown are after treatment samples.

How do we make biological sense out of this? Due to the large number of genes significantly perturbed we
cannot analyse them one by one.

4. Perform gene enrichment using DAVID (on the Windows machine)

We will start by running a simple overrepresentation analysis as implemented by the tool DAVID (Database
for Annotation, Visualization and Integrated Discovery). Go to the following website:
http://david.abcc.ncifcrf.gov/

1. Click on Functional Annotation
2. Upload the list deg_GSE2706_8h_filtered_by_pval.txt choosing Affymetrix_3PRIME_IVT_ID as
identifier and Gene List as List Type
3. Click on Submit List
4. Move to the tab Background and under Affymetrix 3' IVT Backgrounds choose Human Genome U133
Plus 2 Array as a background since the experiment was run using this chip

Another option would be to use only expressed genes as a background.
Ontologies and Pathway Databases of interest can be selected in the right hand side of the webpage.
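
The choice of background matters because it sets how many hits are expected by chance. The sketch below illustrates this with a plain hypergeometric test in base R. All counts are hypothetical, and the category size is kept fixed across backgrounds for simplicity, which would not be true in practice. DAVID itself reports a modified Fisher exact p-value (the EASE score), so these numbers are not what DAVID would show.

```r
# Hypothetical counts (illustration only)
list_size <- 300   # genes in the submitted list
list_hits <- 30    # genes in the list annotated to the category
category  <- 500   # background genes annotated to the category

# Probability of seeing at least `hits` annotated genes in the list by chance
ora_pvalue <- function(hits, category_size, background_size, list_size) {
  phyper(hits - 1, category_size, background_size - category_size,
         list_size, lower.tail = FALSE)
}

# Whole-chip background versus a smaller background of expressed genes
p_whole_chip <- ora_pvalue(list_hits, category, 20000, list_size)
p_expressed  <- ora_pvalue(list_hits, category, 10000, list_size)

print(c(whole_chip = p_whole_chip, expressed_only = p_expressed))
```

With the larger background the same 30 hits look more surprising, so a background that includes genes that could never have been detected tends to give optimistic p-values.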

As mentioned in the theoretical session, DAVID provides three distinct tools to perform pathway enrichment:

• Functional Annotation Clustering (similar enriched categories are clustered together)

• Functional Annotation Chart (enriched categories are independently reported)
• Functional Annotation Table (each probe is independently annotated)

Choose the right analysis and the right annotation to answer the following questions:
Q: What are the most enriched GO terms for Biological Process (GOTERM_BP_FAT)?
Q: Are the results redundant? (suggestion, use the Functional Annotation Clustering option)
Q: What are the most enriched KEGG pathways?
Q: Take a look at the description of this dataset (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE2706).
Do the results make biological sense? (only untreated samples and samples treated with LPS at 8h
were used for the analysis)

Most of the enriched functions and pathways seem to revolve around inflammation. For this experiment
unstimulated human dendritic cells (DCs) were compared with DCs stimulated with lipopolysaccharides (LPS)
to induce TLR4-pathway activation. Expression after 8h stimulation was compared to baseline. It therefore
makes biological sense that most enriched pathways and functions are linked to the immune system. LPS, the
molecule used to activate dendritic cells, is in fact normally found in the outer membrane of Gram-negative
bacteria and should therefore be recognized as a threat by the immune system.
OPTIONAL
Q: What are the GO terms for Molecular Function (GOTERM_MF_FAT) most enriched in upregulated genes?
What about downregulated genes? (You can use the files deg_GSE2706_8h_filtered_by_pval_upregulated.txt
and deg_GSE2706_8h_filtered_by_pval_downregulated.txt which contain only genes upregulated and
downregulated by the perturbation respectively)

5. Close the browser.

5. Perform KEGG enrichment using GSEA (on the Windows machine)

To avoid the arbitrary choice of a threshold for selecting genes of interest to test for enrichment, we can
use a rank-based approach. In this practical we will focus on GSEA (Gene Set Enrichment Analysis).
The version of GSEA we will use today is the standalone Java application that can be found at
http://www.broadinstitute.org/gsea/index.jsp. The software can be launched by double clicking on gsea2-2.1.0.jar
located in the directory:

\\kclad\groups\BioinformaticsWorkshop\practicals\Thursday\Melchiotti\Software\GSEA\
(on the Windows shared drive)
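
To give an intuition for what a rank-based method measures, here is a minimal sketch of a running-sum enrichment score in base R. It is the unweighted, Kolmogorov-Smirnov-like variant; GSEA itself weights each step by the gene's correlation with the phenotype and assesses significance by permutation, so this is not the GSEA implementation. The gene names and the function enrichment_score are hypothetical.

```r
# Walk down a ranked gene list: step up when a gene belongs to the set,
# step down otherwise, and report the largest deviation from zero
enrichment_score <- function(ranked_genes, gene_set) {
  in_set <- ranked_genes %in% gene_set
  n_hit  <- sum(in_set)
  n_miss <- length(ranked_genes) - n_hit
  stopifnot(n_hit > 0, n_miss > 0)
  running <- cumsum(ifelse(in_set, 1 / n_hit, -1 / n_miss))
  running[which.max(abs(running))]
}

# Ten hypothetical genes ranked from most upregulated to most downregulated
ranked <- paste0("g", 1:10)

es_top    <- enrichment_score(ranked, c("g1", "g2", "g3"))   # set at the top
es_bottom <- enrichment_score(ranked, c("g8", "g9", "g10"))  # set at the bottom

print(c(top = es_top, bottom = es_bottom))
```

A set concentrated at the top of the ranking gets a positive score and a set concentrated at the bottom gets a negative one, without any significance threshold being applied to individual genes.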

In order to run a GSEA analysis we need to download the expression dataset (instead of simply using a list
of genes). This file can be accessed at:

\\kclad\groups\BioinformaticsWorkshop\practicals\Thursday\Melchiotti\Data\
GSE2706_series_matrix.txt (on the Windows shared drive)

The file, which can be opened using Excel (tab delimited), contains an identifier (Affymetrix probe ID)
followed by the expression of six samples: 3 controls (PBS.n) and 3 stimulated samples (LPS.8h.n).
To run the GSEA software:

1. Click on Load data
2. Browse to the directory containing the expression file (GSE2706_series_matrix.txt)
3. Click on Run GSEA
4. Choose the loaded file as expression dataset
5. Choose c2.cp.kegg.v4.0.symbols.gmt as Gene sets database
6. Choose Create an on-the-fly phenotype as Phenotype labels
7. Write PBS.1, PBS.2, PBS.3 under Samples for class A (one per line, these names correspond to the
sample names in the header of the expression file)
8. Write LPS.8h.1, LPS.8h.2, LPS.8h.3 under Samples for class B (one per line, these names correspond
to the sample names in the header of the expression file)
9. Click on Apply to dataset
10. Choose gene_set as Permutation type (phenotype is usually preferred but since our dataset contains
only 6 samples, the number of possible permutations is not enough to estimate a reliable FDR q-val)
11. Choose HG_U133_Plus_2.chip as Chip platform(s)
12. Expand Basic Fields and choose a meaningful name as Analysis name
13. Use default values for the other parameters
14. Click on Run
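
The remark on the permutation type in step 10 can be checked with a one-line calculation. This sketch counts the distinct ways of assigning 3 of the 6 samples to class A, which is the number of different label arrangements a phenotype permutation could ever produce for this design.

```r
# Design used in this practical: 6 samples, 3 per class
n_samples <- 6
n_class_a <- 3

# Number of distinct ways to assign the class A label to 3 of the 6 samples
n_labelings <- choose(n_samples, n_class_a)

print(n_labelings)      # 20
print(1 / n_labelings)  # 0.05
```

With only 20 distinct arrangements an empirical p-value cannot go below 0.05, which is too coarse for a reliable FDR estimate; permuting gene sets does not have this limitation.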

Pre-computed results can be found at:

\\kclad\groups\BioinformaticsWorkshop\practicals\Thursday\Melchiotti\Results\GSEA\
effect_of_stimulation_on_DCs.Gsea.1418825504824\ (on the Windows shared drive)

Load the index.html file containing the summary of the results using a browser.
Results computed by GSEA can also be accessed directly from the software window by clicking on Success 5.

Enriched pathways for upregulated and downregulated genes can be found by clicking on Detailed enrichment
results in html format. Clicking on a pathway name leads to a description of the pathway. Clicking
on Details ... leads to the list of genes in the pathway and the corresponding heatmap. Plots showing
how strongly a pathway was enriched at the top of the list can be found by clicking on Snapshot of enrichment
results.

Q: What are the most enriched KEGG pathways for upregulated genes in class B?
Q: What are the most enriched KEGG pathways for downregulated genes in class B?

15. Close GSEA.

6. Visualize a pathway using PathVisio (on the Windows machine)

One of the most enriched pathways in our dataset according to GSEA is, as expected, the
KEGG_TOLL_LIKE_RECEPTOR_SIGNALING_PATHWAY. It would be interesting to overlay
gene expression fold changes on this enriched pathway to better understand how the pathway is perturbed
after activation.
PathVisio is a free open-source tool for the visualization of biological pathways. Open PathVisio by double
clicking on the executable file which can be found at:

\\kclad\groups\BioinformaticsWorkshop\practicals\Thursday\Melchiotti\Software\
pathvisio-3.1.3\ (on the Windows shared drive)

Here are the steps required to plot the toll like receptor signaling pathway and to colour it according to the
fold changes in our dataset for the genes in the pathway.

1. Click on File > Open and select the Hs_Toll-like_receptor_signaling_pathway_WP75_72133.gpml
pathway stored at:

\\kclad\groups\BioinformaticsWorkshop\practicals\Thursday\Melchiotti\Pathways\
(on the Windows shared drive)

2. Click on Data > Select Gene Database and select the file Hs_Derby_20130701.bridge stored at:

\\kclad\groups\BioinformaticsWorkshop\practicals\Thursday\Melchiotti\Pathways\
(on the Windows shared drive)

This file is an annotation file to map gene IDs to pathway components.

3. Click on Data > Import expression data and select the expression matrix deg_GSE2706_8h_filtered_by_pval.txt
as Input file. This file is stored at:

\\kclad\groups\BioinformaticsWorkshop\practicals\Thursday\Melchiotti\Data\Pathvisio\
(on the Windows shared drive)

4. To choose the Output file click on Browse and select your home directory (the one with your username
as a name, ex. a1102248). You can give the file the name you prefer. Click on Choose filename for
database. Click on Next

IMPORTANT!!!
Do not save the output file in the default directory given by the software

5. Choose tab as a data delimiter. Click on Next
6. Choose Gene.symbol as primary identifier column. Select Use the same system code for all rows and
choose HGNC (Hugo Gene Symbols). Click on Next

7. Click on Finish
8. Choose Data > Visualization options

9. Tick Expression as Color, Tick Basic and select only logFC as the metric to use to color nodes

10. Click on Modify and change the scale of the color set so that the gradient goes from -10 to 10

Q: Are genes mostly upregulated or downregulated?
Q: Are perturbed genes concentrated upstream or downstream in the pathway?

11. Close PathVisio.

IMPORTANT!!!
Please do not save the changes made to the pathway or all the other students will be affected
as well
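
The colour gradient set in step 10 maps each logFC value onto a scale running from -10 to 10. The sketch below reproduces that idea in base R. The blue-white-red palette and the function logfc_to_colour are assumptions made for illustration; the colours PathVisio actually uses depend on the colour set you configure.

```r
# Map log fold changes onto a three-colour gradient between -limit and +limit
logfc_to_colour <- function(logfc, limit = 10) {
  # Values outside the scale are clamped to its ends
  clamped  <- pmax(pmin(logfc, limit), -limit)
  # Rescale to the 0..1 range expected by the colour ramp
  position <- (clamped + limit) / (2 * limit)
  ramp <- colorRamp(c("blue", "white", "red"))
  rgb(ramp(position), maxColorValue = 255)
}

# -10 and 10 are the ends of the scale, 0 is the midpoint and 12.7 is a
# logFC larger than the upper limit (such values occur in this dataset)
colours <- logfc_to_colour(c(-10, 0, 10, 12.7))
print(colours)
```

Any gene with a logFC beyond the limits gets the same colour as the limit itself, so the range should be chosen with the spread of the data in mind.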

7. Perform gene enrichment using topGO (on the cluster, will be presented only
if time permits)

As mentioned in the theoretical lecture, GO has a hierarchical nature which can sometimes lead to redundant
enriched functions (see DAVID results in Section 4). It is therefore interesting to compare results obtained
with traditional overrepresentation and rank-based analyses with results obtained by conditional enrichment.
This can be performed programmatically for the GO ontology using the R package topGO. A description of
the package and of all functions contained in the package can be found at
http://www.bioconductor.org/packages/release/bioc/html/topGO.html.
Please open MobaXterm, log in to the cluster, add the modules required and open RStudio as explained in
the previous practical (see handout).
You can follow this section by copying and pasting the code from this PDF or by running, in RStudio, the
script topGOAnalysis.R stored at:

~/practicals/Thursday/Melchiotti/Scripts/topGOAnalysis.R (on the cluster)

rm(list = ls())
# Load packages
library(topGO)
library(org.Hs.eg.db)
library(biomaRt)
library(reshape2)

# Set analysis parameters

input_file_dir <- "~/practicals/Thursday/Melchiotti/Data/"
working_dir <- "~/practicals/Thursday/Melchiotti/Results/topGO/"

IMPORTANT!!!
working_dir should be changed to the directory in which to store results

prefix<-"GSE2706_8h"
deg_filename <- paste(input_file_dir,"deg_GSE2706_8h.txt",sep="")
significance_threshold_pvalue <- 0.05

# Load the list of genes with their corresponding p-value for differential expression
# (all genes regardless of p-value)
genes <- read.delim(deg_filename,sep="\t",header=TRUE,na.strings="")
print(head(genes))

## ID adj.P.Val P.Value logFC Gene.symbol
## 1 204698_at 0.000489 8.94e-09 12.70 ISG20
## 2 1405_i_at 0.002017 9.94e-08 9.37 CCL5
## 3 33304_at 0.002017 1.20e-07 10.90 ISG20
## 4 204655_at 0.002017 1.48e-07 8.85 CCL5
## 5 210163_at 0.002811 2.57e-07 12.40 CXCL11
## 6 207901_at 0.002991 3.35e-07 9.26 IL12B
## Gene.title
## 1 interferon stimulated exonuclease gene 20kDa
## 2 chemokine (C-C motif) ligand 5
## 3 interferon stimulated exonuclease gene 20kDa
## 4 chemokine (C-C motif) ligand 5
## 5 chemokine (C-X-C motif) ligand 11
## 6 interleukin 12B

# Collapse probes with the same Gene Symbol
genes_collapsed<-dcast(genes[,c("Gene.symbol","adj.P.Val")],Gene.symbol~.,
median,value.var="adj.P.Val")
colnames(genes_collapsed)<-c("Gene.symbol","adj.P.Val")

# Create a vector containing the scores that will be used to rank the list, each vector
# element should be named with its gene symbol
all_genes <- genes_collapsed$adj.P.Val
names(all_genes) <- genes_collapsed$Gene.symbol
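
To see what the collapsing step does, here is a base R sketch on the six probes printed by head(genes) above. It uses tapply() instead of reshape2::dcast(), which is a swapped-in technique for illustration and not the code of the script; the object names toy and toy_scores are hypothetical.

```r
# The six probes shown in the head(genes) output above
toy <- data.frame(
  Gene.symbol = c("ISG20", "CCL5", "ISG20", "CCL5", "CXCL11", "IL12B"),
  adj.P.Val   = c(0.000489, 0.002017, 0.002017, 0.002017, 0.002811, 0.002991),
  stringsAsFactors = FALSE
)

# One score per gene: the median adjusted p-value across its probes.
# tapply() returns a named vector, which is the shape expected for allGenes
toy_scores <- tapply(toy$adj.P.Val, toy$Gene.symbol, median)

print(toy_scores)
print(sum(toy_scores < 0.05))  # genes that the selection function would keep
```

ISG20 and CCL5 each have two probes, so the six rows collapse into four named scores.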

# Define function to select significant genes for Fisher's test
top_diff_genes_function <- function(scores) {
  return(scores < significance_threshold_pvalue)
}

# Enrichment for biological processes (the package org.Hs.eg.db contains the mapping
# between gene symbols and GO terms)
GO_data_BP<-new("topGOdata",
ontology = "BP",
allGenes = all_genes,
geneSel = top_diff_genes_function,
nodeSize = 10,
annot = annFUN.org,
mapping = "org.Hs.eg.db",
ID = "symbol"
)

##
## Building most specific GOs ..... ( 9342 GO terms found. )
##
## Build GO DAG topology .......... ( 12580 GO terms and 28930 relations. )
##
## Annotating nodes ............... ( 14149 genes annotated to the GO terms. )

# Run enrichment using both Fisher and Kolmogorov-Smirnov tests and both the classic and
# the elim methods provided by topGO
results_Fisher_BP <- runTest(GO_data_BP, algorithm = "classic", statistic = "fisher")

##
## -- Classic Algorithm --
##
## the algorithm is scoring 3620 nontrivial nodes
## parameters:
## test statistic: fisher
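
For a single GO term, the classic Fisher test boils down to a 2x2 table. The sketch below rebuilds that table in base R from counts topGO reports for this dataset: 363 significant genes among 14149 annotated genes overall, and 125 significant genes among the 1274 annotated to immune response (GO:0006955). It is meant as an illustration of the test rather than a copy of topGO's internal code.

```r
n_annotated_all <- 14149  # genes annotated to any BP term
n_sig_all       <- 363    # significant genes among them
n_annotated     <- 1274   # genes annotated to GO:0006955
n_sig           <- 125    # significant genes annotated to GO:0006955

# Rows: in the term / not in the term. Columns: significant / not significant
counts <- matrix(
  c(n_sig,                   n_sig_all - n_sig,
    n_annotated - n_sig,     n_annotated_all - n_annotated - (n_sig_all - n_sig)),
  nrow = 2,
  dimnames = list(c("in_term", "not_in_term"), c("significant", "not_significant"))
)

# One-sided test for overrepresentation of significant genes in the term
p_term <- fisher.test(counts, alternative = "greater")$p.value

# Number of significant genes expected by chance for a term of a given size
expected_hits <- function(annotated) annotated * n_sig_all / n_annotated_all

print(counts)
print(p_term)
print(round(expected_hits(c(11, 41)), 2))  # 0.28 1.05
```

The last line reproduces the Expected values that GenTable reports later in this practical for terms with 11 and 41 annotated genes.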

results_Fisher_elim_BP <- runTest(GO_data_BP, algorithm = "elim", statistic = "fisher")

##
## -- Elim Algorithm --
##
## the algorithm is scoring 3620 nontrivial nodes
## parameters:
## test statistic: fisher

## cutOff: 0.01
##
## Level 18: 2 nodes to be scored (0 eliminated genes)
##
## Level 17: 5 nodes to be scored (0 eliminated genes)
##
## Level 16: 10 nodes to be scored (12 eliminated genes)
##
## Level 15: 16 nodes to be scored (32 eliminated genes)
##
## Level 14: 45 nodes to be scored (42 eliminated genes)
##
## Level 13: 99 nodes to be scored (208 eliminated genes)
##
## Level 12: 151 nodes to be scored (498 eliminated genes)
##
## Level 11: 250 nodes to be scored (741 eliminated genes)
##
## Level 10: 381 nodes to be scored (1211 eliminated genes)
##
## Level 9: 485 nodes to be scored (1374 eliminated genes)
##
## Level 8: 536 nodes to be scored (2086 eliminated genes)
##
## Level 7: 555 nodes to be scored (3037 eliminated genes)
##
## Level 6: 490 nodes to be scored (3372 eliminated genes)
##
## Level 5: 343 nodes to be scored (4412 eliminated genes)
##
## Level 4: 183 nodes to be scored (5318 eliminated genes)
##
## Level 3: 50 nodes to be scored (6462 eliminated genes)
##
## Level 2: 18 nodes to be scored (6829 eliminated genes)
##
## Level 1: 1 nodes to be scored (6829 eliminated genes)
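
The "eliminated genes" in the log above are genes removed from the ancestors of terms that were already found significant. The sketch below illustrates the idea on one hypothetical parent-child pair using a hypergeometric test in base R. All counts are made up, and the real elim algorithm processes the whole GO graph level by level, so this is a simplification and not topGO's implementation.

```r
# Hypothetical universe (illustration only)
universe_size <- 1000  # annotated genes
n_significant <- 30    # significant genes in the universe

# Upper-tail hypergeometric p-value for a term
upper_tail <- function(hits, annotated) {
  phyper(hits - 1, annotated, universe_size - annotated,
         n_significant, lower.tail = FALSE)
}

# Child term: 10 genes, 8 of them significant
p_child <- upper_tail(8, 10)

# Parent term: the 10 child genes plus 20 others, of which 1 is significant
p_parent_classic <- upper_tail(8 + 1, 10 + 20)

# elim idea: the child is significant (cutOff 0.01 in the log above), so its
# genes are removed from the parent before the parent is tested
p_parent_elim <- upper_tail(1, 20)

print(c(child = p_child, parent_classic = p_parent_classic,
        parent_elim = p_parent_elim))
```

The parent looks enriched under the classic test only because it inherits the child's genes; once they are removed, no evidence remains. This is the redundancy that conditional enrichment is designed to reduce.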

results_KS_BP <- runTest(GO_data_BP, algorithm = "classic", statistic = "ks")

##
## -- Classic Algorithm --
##
## the algorithm is scoring 5131 nontrivial nodes
## parameters:
## test statistic: ks
## score order: increasing

results_KS_elim_BP <- runTest(GO_data_BP, algorithm = "elim", statistic = "ks")

##
## -- Elim Algorithm --
##

## the algorithm is scoring 5131 nontrivial nodes
## parameters:
## test statistic: ks
## cutOff: 0.01
## score order: increasing
##
## Level 19: 1 nodes to be scored (0 eliminated genes)
##
## Level 18: 2 nodes to be scored (0 eliminated genes)
##
## Level 17: 7 nodes to be scored (0 eliminated genes)
##
## Level 16: 18 nodes to be scored (12 eliminated genes)
##
## Level 15: 32 nodes to be scored (24 eliminated genes)
##
## Level 14: 75 nodes to be scored (24 eliminated genes)
##
## Level 13: 158 nodes to be scored (45 eliminated genes)
##
## Level 12: 249 nodes to be scored (180 eliminated genes)
##
## Level 11: 450 nodes to be scored (526 eliminated genes)
##
## Level 10: 599 nodes to be scored (1016 eliminated genes)
##
## Level 9: 726 nodes to be scored (1461 eliminated genes)
##
## Level 8: 756 nodes to be scored (2496 eliminated genes)
##
## Level 7: 744 nodes to be scored (3114 eliminated genes)
##
## Level 6: 615 nodes to be scored (4000 eliminated genes)
##
## Level 5: 423 nodes to be scored (4349 eliminated genes)
##
## Level 4: 206 nodes to be scored (6085 eliminated genes)
##
## Level 3: 51 nodes to be scored (6848 eliminated genes)
##
## Level 2: 18 nodes to be scored (6968 eliminated genes)
##
## Level 1: 1 nodes to be scored (6968 eliminated genes)

# Substitute zero p-values with a very small number so that it is possible to compute the
# log10
results_Fisher_BP@score[which(results_Fisher_BP@score==0)]=1e-300
results_Fisher_elim_BP@score[which(results_Fisher_elim_BP@score==0)]=1e-300
results_KS_BP@score[which(results_KS_BP@score==0)]=1e-300
results_KS_elim_BP@score[which(results_KS_elim_BP@score==0)]=1e-300

# Convert results into a table
all_res_BP_just_pval <- GenTable(GO_data_BP,
                                 classicFisher = results_Fisher_BP,
                                 elimFisher = results_Fisher_elim_BP,
                                 classicKS = results_KS_BP,
                                 elimKS = results_KS_elim_BP,
                                 ranksOf = "classicFisher",
                                 topNodes = length(score(results_Fisher_BP)))

# Write results to a table

write.csv(all_res_BP_just_pval,paste(working_dir,prefix,"_topGO_results.csv",sep=""),quote=FALSE)

Open the file you just created in your working directory (GSE2706_8h_topGO_results.csv). In case the
analysis is taking too long pre-computed results can be found at:

~/practicals/Thursday/Melchiotti/Results/topGO/GSE2706_8h_topGO_results.csv
(on the cluster computer)

\\kclad\groups\BioinformaticsWorkshop\practicals\Thursday\Melchiotti\Results\topGO\
GSE2706_8h_topGO_results.csv
(on the Windows shared drive)

Q: How many functions are significantly enriched according to classic Fisher? How many according to elim
Fisher?
Q: How many functions are significantly enriched according to classic KS? How many according to elim KS?
This can be done directly in R by examining the data frame all_res_BP_just_pval.

print("Number of significant functions according to classic Fisher")

## [1] "Number of significant functions according to classic Fisher"

classicFisher <- which(all_res_BP_just_pval$classicFisher<0.05)

print(length(classicFisher))

## [1] 783

print("Number of significant functions according to elim Fisher")

## [1] "Number of significant functions according to elim Fisher"

elimFisher <- which(all_res_BP_just_pval$elimFisher<0.05)

print(length(elimFisher))

## [1] 540

print("Overlap between classic Fisher and elim Fisher")

## [1] "Overlap between classic Fisher and elim Fisher"

print(length(intersect(classicFisher,elimFisher)))

## [1] 437

print("Number of significant functions according to classic KS")

## [1] "Number of significant functions according to classic KS"

classicKS <- which(all_res_BP_just_pval$classicKS<0.05)

print(length(classicKS))

## [1] 837

print("Number of significant functions according to elim KS")

## [1] "Number of significant functions according to elim KS"

elimKS <- which(all_res_BP_just_pval$elimKS<0.05)

print(length(elimKS))

## [1] 536

print("Overlap between classic KS and elim KS")

## [1] "Overlap between classic KS and elim KS"

print(length(intersect(classicKS,elimKS)))

## [1] 494
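
One caveat when counting in this way: the p-value columns of the results table appear to be stored as formatted text rather than numbers (the as.numeric() call and the coercion warning in the next example point in that direction). The base R sketch below shows why comparing such text directly with 0.05 can miscount. The four strings are made-up examples in the general format used by the table, and parse_pvalue is a hypothetical helper.

```r
# Hypothetical p-values stored as text
pvals_text <- c("< 1e-20", "1.2e-05", "0.00068", "1.00000")

# Comparing text with a number silently compares character strings
naive <- pvals_text < 0.05

# Strip a leading "<" and convert to numbers before comparing; a value
# reported as "< 1e-20" is then treated as 1e-20
parse_pvalue <- function(x) as.numeric(sub("^< *", "", x))
parsed <- parse_pvalue(pvals_text)

print(naive[2])       # FALSE, although 1.2e-05 is well below 0.05
print(parsed < 0.05)  # TRUE TRUE TRUE FALSE
```

If this applies to your table, the counts obtained from direct comparisons should be read with caution; working from the numeric vectors returned by score() avoids the conversion altogether.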

Q: Can you find an example of a function that is enriched when running KS but not when running Fisher?

results <- all_res_BP_just_pval[which(as.numeric(all_res_BP_just_pval$classicKS) < 0.05 &
                                      as.numeric(all_res_BP_just_pval$classicFisher) > 0.05), ]

## Warning in which(as.numeric(all_res_BP_just_pval$classicKS) < 0.05 &

## as.numeric(all_res_BP_just_pval$classicFisher) > : NAs introduced by
## coercion

print(tail(results))

##           GO.ID                                        Term Annotated
## 4960 GO:0072182 regulation of nephron tubule epithelial ...        11
## 5024 GO:0097031 mitochondrial respiratory chain complex ...        15
## 5080 GO:1902186 regulation of viral release from host ce...        15
## 5084 GO:1902253 regulation of intrinsic apoptotic signal...        16
## 5116 GO:2000696 regulation of epithelial cell differenti...        13
## 5129 GO:2001251 negative regulation of chromosome organi...        41
##      Significant Expected Rank in classicFisher classicFisher elimFisher
## 4960           0     0.28                  4960       1.00000    1.00000
## 5024           0     0.38                  5024       1.00000    1.00000
## 5080           0     0.38                  5080       1.00000    1.00000
## 5084           0     0.41                  5084       1.00000    1.00000
## 5116           0     0.33                  5116       1.00000    1.00000
## 5129           0     1.05                  5129       1.00000    1.00000
##      classicKS  elimKS
## 4960   0.01102 0.01102
## 5024   0.00068 1.00000
## 5080   0.03267 0.03267
## 5084   0.01886 0.01886
## 5116   0.04885 0.04885
## 5129   0.04448 0.04448
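
The terms in this table have no gene below the significance threshold (Significant = 0), yet the KS test flags them. The sketch below shows how this can happen, using base R and made-up scores: a gene set whose scores are all just above 0.05 but clearly lower than most of the background. The two-sample ks.test() call stands in for topGO's KS statistic and is not its implementation.

```r
# Hypothetical adjusted p-values (illustration only)
set_scores   <- c(0.06, 0.07, 0.08, 0.09, 0.10)              # genes in the term
other_scores <- c(0.001, 0.01, 0.02, seq(0.15, 0.95, 0.05))  # all other genes
threshold    <- 0.05

# Fisher's test only sees which side of the threshold each gene falls on
counts <- matrix(
  c(sum(set_scores < threshold),  sum(other_scores < threshold),
    sum(set_scores >= threshold), sum(other_scores >= threshold)),
  nrow = 2
)
p_fisher <- fisher.test(counts, alternative = "greater")$p.value

# The KS test compares the whole score distributions. In R,
# alternative = "greater" asks whether the first sample tends to be smaller
p_ks <- ks.test(set_scores, other_scores, alternative = "greater")$p.value

print(c(fisher = p_fisher, ks = p_ks))
```

Threshold-based tests discard the information carried by genes that narrowly miss the cut-off, whereas rank-based tests keep it.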

Plot the results on the Gene Ontology tree

# Plot and save results

showSigOfNodes(GO_data_BP, score(results_Fisher_BP), firstSigNodes = 10, useInfo = "all")

[Figure: GO graph induced by the 10 most significant Biological Process terms under the classic Fisher test, rooted at GO:0008150 biological_process (363 significant genes out of 14149 annotated). Each node reports the GO ID, the term name, the p-value and the significant / annotated gene counts. The most significant terms, all with p < 1e-20, include immune response (GO:0006955), defense response (GO:0006952) and response to virus (GO:0009615).]

## $dag
## A graphNEL graph with directed edges
## Number of Nodes = 15
## Number of Edges = 20
##
## $complete.dag
## [1] "A graph with 15 nodes."

pdf(paste(working_dir,prefix,"_Fisher_BP_just_pval.pdf",sep=""))
showSigOfNodes(GO_data_BP, score(results_Fisher_BP), firstSigNodes = 10, useInfo = "all")

## $dag
## A graphNEL graph with directed edges
## Number of Nodes = 15
## Number of Edges = 20
##
## $complete.dag
## [1] "A graph with 15 nodes."

dev.off()

## pdf
## 2

showSigOfNodes(GO_data_BP, score(results_Fisher_elim_BP), firstSigNodes = 5,
               useInfo = "all")

[Figure: GO graph induced by the 5 most significant Biological Process terms under the elim Fisher test, rooted at GO:0008150 biological_process (363 significant genes out of 14149 annotated). Each node reports the GO ID, the term name, the p-value and the significant / annotated gene counts. The most significant terms are GO:0060337 (type I interferon si..., p < 1e-20), GO:0051607 (defense response to ..., p = 5.84e-19), GO:0045071 (negative regulation ..., p = 2.00e-16), GO:0006954 (inflammatory respons..., p = 8.09e-14) and GO:0006955 (immune response, p = 4.30e-09).]

## $dag
## A graphNEL graph with directed edges
## Number of Nodes = 55
## Number of Edges = 95
##
## $complete.dag
## [1] "A graph with 55 nodes."

pdf(paste(working_dir,prefix,"_Fisher_BP_elim_just_pval.pdf",sep=""))
showSigOfNodes(GO_data_BP, score(results_Fisher_elim_BP), firstSigNodes = 5,
useInfo = "all")

## $dag
## A graphNEL graph with directed edges
## Number of Nodes = 55
## Number of Edges = 95
##
## $complete.dag
## [1] "A graph with 55 nodes."

dev.off()

## pdf
## 2

showSigOfNodes(GO_data_BP, score(results_KS_BP), firstSigNodes = 10, useInfo = "all")

[Figure: GO graph induced by the 10 most significant Biological Process terms under the classic KS test, rooted at GO:0008150 biological_process (363 significant genes out of 14149 annotated). Each node reports the GO ID, the term name, the p-value and the significant / annotated gene counts. The most significant terms include immune response (GO:0006955, p = 1.65e-18), defense response (GO:0006952, p = 3.78e-17) and response to virus (GO:0009615, p = 8.98e-15).]

## $dag
## A graphNEL graph with directed edges
## Number of Nodes = 18
## Number of Edges = 23
##
## $complete.dag
## [1] "A graph with 18 nodes."

pdf(paste(working_dir,prefix,"_KS_BP_just_pval.pdf",sep=""))
showSigOfNodes(GO_data_BP, score(results_KS_BP), firstSigNodes = 10, useInfo = "all")

## $dag
## A graphNEL graph with directed edges
## Number of Nodes = 18
## Number of Edges = 23
##
## $complete.dag
## [1] "A graph with 18 nodes."

dev.off()

## pdf
## 2

showSigOfNodes(GO_data_BP, score(results_KS_elim_BP), firstSigNodes = 5, useInfo = "all")

[Figure: GO subgraph for the elim KS test (Biological Process), induced by the 5 most significant terms. The node labels are listed below.]

GO ID       Term (truncated)         p-value   Genes (significant / annotated)
GO:0008150  biological_process       1.00000   363 / 14149
GO:0002376  immune system proces...  0.05871   153 / 2037
GO:0050896  response to stimulus     0.39634   258 / 6776
GO:0023052  signaling                0.34824   206 / 4941
GO:0044699  single-organism proc...  0.77564   326 / 11446
GO:0009987  cellular process         0.81331   337 / 12644
GO:0051704  multi-organism proce...  0.79073   110 / 1900
GO:0065007  biological regulatio...  0.88152   271 / 8802
GO:0009607  response to biotic s...  0.62871   84 / 629
GO:0009605  response to external...  0.59938   112 / 1778
GO:0006950  response to stress       0.15545   173 / 3100
GO:0006955  immune response          5.13e-05  125 / 1274
GO:0042221  response to chemical     0.50310   141 / 3044
GO:0051716  cellular response to...  0.20136   222 / 5376
GO:0044700  single organism sign...  0.34824   206 / 4941
GO:0044763  single-organism cell...  0.42789   307 / 10311
GO:0044764  multi-organism cellu...  0.08659   38 / 678
GO:0044419  interspecies interac...  0.23707   40 / 731
GO:0050789  regulation of biolog...  0.53550   262 / 8309
GO:0043207  response to external...  1.00000   84 / 602
GO:0009611  response to wounding     0.39213   80 / 1129
GO:0006952  defense response         0.28342   125 / 1333
GO:0010033  response to organic ...  0.41982   116 / 2173
GO:0070887  cellular response to...  0.45791   109 / 2008
GO:0007154  cell communication       0.41160   205 / 5009
GO:0044403  symbiosis, encompass...  0.23707   40 / 731
GO:0050794  regulation of cellul...  0.53284   247 / 7865
GO:0043900  regulation of multi-...  0.03474   34 / 248
GO:0048519  negative regulation ...  0.04207   147 / 3346
GO:0051707  response to other or...  0.00687   84 / 602
GO:0006954  inflammatory respons...  2.39e-07  58 / 520
GO:0045087  innate immune respon...  0.00443   73 / 786
GO:0034097  response to cytokine     0.12689   67 / 555
GO:0071310  cellular response to...  0.32227   93 / 1605
GO:0007165  signal transduction      0.36178   196 / 4434
GO:0016032  viral process            0.05393   36 / 668
GO:0043903  regulation of symbio...  0.72331   21 / 157
GO:0048523  negative regulation ...  0.02574   128 / 3054
GO:0043901  negative regulation ...  0.16752   22 / 93
GO:0002252  immune effector proc...  0.80260   69 / 544
GO:0009615  response to virus        0.02304   52 / 273
GO:0098542  defense response to ...  0.57644   55 / 325
GO:0034340  response to type I i...  0.17964   24 / 76
GO:0035456  response to interfer...  5.16e-07  7 / 14
GO:0071345  cellular response to...  0.55731   57 / 461
GO:0007166  cell surface recepto...  0.50854   124 / 2544
GO:0019058  viral life cycle         0.21865   21 / 285
GO:0050792  regulation of viral ...  0.63408   18 / 135
GO:0051607  defense response to ...  1.10e-11  43 / 195
GO:0071357  cellular response to...  1.00000   24 / 75
GO:0019221  cytokine-mediated si...  2.05e-05  52 / 358
GO:0019079  viral genome replica...  0.12314   16 / 67
GO:0048525  negative regulation ...  0.11511   17 / 63
GO:0060337  type I interferon si...  2.19e-10  24 / 75
GO:0045069  regulation of viral ...  0.52987   16 / 53
GO:0045071  negative regulation ...  2.60e-09  16 / 37

## $dag
## A graphNEL graph with directed edges
## Number of Nodes = 56
## Number of Edges = 96
##
## $complete.dag
## [1] "A graph with 56 nodes."

pdf(paste(working_dir,prefix,"_KS_BP_elim_just_pval.pdf",sep=""))
showSigOfNodes(GO_data_BP, score(results_KS_elim_BP), firstSigNodes = 5, useInfo = "all")

## $dag
## A graphNEL graph with directed edges
## Number of Nodes = 56
## Number of Edges = 96
##
## $complete.dag
## [1] "A graph with 56 nodes."

dev.off()

## pdf
## 2

topGO truncates long GO term names in its tables and graphs. The following code shows how to retrieve the
full name of a GO term and all genes annotated to a particular GO category.

# Retrieve the full name of a GO term and all genes belonging to that category
go_id <- "GO:0045087"
print(paste("Full name of term ", go_id, ": ", Term(go_id), sep=""))

## [1] "Full name of term GO:0045087: innate immune response"

ensembl <- useMart("ensembl",dataset="hsapiens_gene_ensembl")

gene.data <- getBM(attributes=c('hgnc_symbol', 'go_id'),
filters = 'go_id', values = go_id, mart = ensembl)
# Count distinct gene symbols; length() on a data frame would return the number of columns
print(paste("Number of genes contained in GO term ", go_id, ": ",
length(unique(gene.data$hgnc_symbol)), sep=""))

print(head(gene.data))

## hgnc_symbol go_id
## 1 IPO7 GO:0045087
## 2 TRIM27 GO:0045087
## 3 IFNW1 GO:0045087
## 4 PIK3C3 GO:0045087
## 5 TOLLIP GO:0045087
## 6 DUSP7 GO:0045087
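When counting the genes in such a result, note that length() applied to a data frame returns the number of columns, not the number of rows. The following base-R sketch illustrates the difference on a stand-in data frame built from the six rows printed above, not on a live biomaRt query; the variable names n_columns, n_rows and n_genes are illustrative.

```r
# Stand-in for the getBM() result: the six rows shown by head(gene.data)
gene.data <- data.frame(
  hgnc_symbol = c("IPO7", "TRIM27", "IFNW1", "PIK3C3", "TOLLIP", "DUSP7"),
  go_id       = rep("GO:0045087", 6),
  stringsAsFactors = FALSE
)

n_columns <- length(gene.data)  # number of columns in the data frame
n_rows    <- nrow(gene.data)    # number of gene/GO annotation rows

# Count distinct symbols, dropping missing or blank entries if any are present
symbols <- unique(gene.data$hgnc_symbol)
n_genes <- length(symbols[!is.na(symbols) & symbols != ""])

print(paste("Columns:", n_columns, "Rows:", n_rows, "Unique genes:", n_genes))
```

On the full query result, the unique-symbol count is the figure that answers "how many genes belong to this GO term".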

Q: Compare the results of the two Fisher's tests (classic and elim) using the PDF files just generated. Which
nodes lose significance with the elim version?
Q: Compare the results of the classic Fisher's test with the results of the classic KS test using the PDF files
just generated. Are there any differences?
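A comparison of this kind can also be made numerically. As a sketch, the data frame below holds a few p-values read off the two KS graphs plotted above (classic and elim), not the Fisher results that the first question asks about. The object names ks, lost and kept are illustrative, and the 0.05 cutoff is the one used earlier in the script.

```r
# p-values copied from the classic and elim KS graphs for the same GO terms
ks <- data.frame(
  go_id   = c("GO:0006955", "GO:0009607", "GO:0006952", "GO:0043207",
              "GO:0051707", "GO:0009615", "GO:0051607"),
  classic = c(1.65e-18, 1.21e-16, 3.78e-17, 9.87e-18,
              9.87e-18, 8.98e-15, 1.57e-14),
  elim    = c(5.13e-05, 0.62871, 0.28342, 1.00000,
              0.00687, 0.02304, 1.10e-11),
  stringsAsFactors = FALSE
)

alpha <- 0.05

# Terms significant with the classic algorithm but not with elim
lost <- ks$go_id[ks$classic < alpha & ks$elim >= alpha]
# Terms significant with both algorithms
kept <- ks$go_id[ks$classic < alpha & ks$elim < alpha]

print(lost)
print(kept)
```

Because elim removes the genes of significant child terms before testing their ancestors, broad parent terms tend to lose significance while more specific terms keep it.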
Close RStudio and MobaXterm.

8. Help Links

1. DAVID: http://david.abcc.ncifcrf.gov/helps/functional_annotation.html
2. GSEA: http://www.broadinstitute.org/gsea/doc/desktop_tutorial.jsp
3. PathVisio: http://www.pathvisio.org/documentation/
4. topGO: http://www.bioconductor.org/packages/release/bioc/vignettes/topGO/inst/doc/topGO.pdf
