Gene Expression Databases - 525 - 2016
expression databases
• Integrated databases
– Contain array data and additional data of the samples
– Array data tends to be more annotated
– More analytical tools
– Smaller (more QC and curation needed)
– Often no direct data access
Why do they exist
• Transparency/reproducibility of publications
– Journals require data to be available for analysis
– Nowadays raw data is required
– Databases offer single resource and standardized
access
• Data was generated for a specific purpose, but is
not limited to that purpose
– Can be reanalyzed in a different context
– Can be combined with other datasets
– Can be used as independent validation
Gene expression repository
examples
• Gene Expression Omnibus (www.ncbi.nlm.nih.gov/geo/)
– 1,117,462 samples, 3848 datasets
Coexpression analysis
Differential analysis
Outlier analysis
Two Search Options
• Gene specific search:
– Gene
• Dataset search:
– Specific conditions/diseases
Gene Search
NPHS2: encodes podocin, a podocyte-specific protein
Gene summary view
Demographics
Dataset/Disease type
Gene Search
Gene summary view
Reference
Tubulointerstitial
Glomeruli
Gene Search
Correlation with clinical continuous variation
Gene: VCAN
Analysis type: GFR
Dataset Type: Diabetes
P=7.71E-7
Correlation: -0.832
Legend
1. < 15 ml/min/1.73m2 (3)
2. 15 - 29 ml/min/1.73m2 (4)
3. 30 - 59 ml/min/1.73m2 (7)
4. 60 - 89 ml/min/1.73m2 (5)
5. > 90 ml/min/1.73m2 (3)
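A minimal sketch of how a correlation between a gene's expression and a continuous clinical variable such as GFR could be computed. The slide does not say which correlation statistic the tool uses, so Pearson correlation is an assumption here, and the numbers are toy values, not the data behind the VCAN plot.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)

# Toy values (hypothetical): GFR in ml/min/1.73m2 and log2 expression per sample.
gfr = [12.0, 25.0, 48.0, 75.0, 95.0]
expression = [9.1, 8.4, 7.9, 7.0, 6.2]
r = pearson_r(gfr, expression)  # strongly negative: expression falls as GFR rises
```

A negative coefficient, as reported for VCAN, means expression is higher in samples with lower GFR.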
Gene Search
Outlier analysis
Outlier analysis helps to identify an expression profile in which a differential
pattern is seen in only a fraction of the patient samples within a disease
type.
Why do we need it: if only 25% of patients over-express a gene, that
gene may not reach a significant p-value in a t-test comparing DN relative
to normal kidney.
How to do it: Transform all samples within a dataset so that genes can be
ranked by their expression from high to low. The data transformation is
performed at fixed percentile bins (75, 90 & 95%), and a line is drawn at the
percentile of that analysis to define outliers.
For example, in an outlier analysis at the 75th percentile, the system draws a
line at the point above which only the top 25% of samples extend.
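The idea can be sketched in a few lines. The slides do not specify the exact transformation, so this sketch assumes a median/MAD scaling per gene (in the style of outlier profile analyses) followed by a percentile cutoff; the function names and the toy values are hypothetical.

```python
from statistics import median

def percentile(values, pct):
    """Percentile with linear interpolation (pct between 0 and 100)."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * pct / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

def robust_transform(values):
    """Center on the median and scale by the median absolute deviation (assumed transform)."""
    med = median(values)
    mad = median(abs(v - med) for v in values) or 1.0  # avoid division by zero
    return [(v - med) / mad for v in values]

def outlier_samples(values, pct=75):
    """Return indices of samples above the chosen percentile line, and the line itself."""
    scaled = robust_transform(values)
    cutoff = percentile(scaled, pct)
    return [i for i, v in enumerate(scaled) if v > cutoff], cutoff

# Toy gene: 9 samples near baseline and 3 over-expressing samples (25% of 12).
toy_gene = [5.0, 5.1, 4.9, 5.2, 5.0, 4.8, 5.1, 5.0, 4.9, 9.0, 9.5, 10.0]
outliers, line = outlier_samples(toy_gene, pct=75)  # outliers are samples 9, 10, 11
```

Under this assumption, the height of the line at a given percentile can also serve as a per-gene score: genes with a strongly over-expressing subgroup get a high value at the 75th, 90th, or 95th percentile and rank near the top.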
Gene Search
Outlier analysis
Controls Diabetic
Differential expression – Dataset search
Export
Differential expression – dataset
search – compare analysis
• Compare different analyses
• Data is standardized on upload (centered to 0 and scaled to unit variance)
• All features are mapped to a common identifier (EntrezGeneID)
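The two preprocessing steps above can be sketched as follows. This assumes that standardization is done per sample (mean 0, standard deviation 1) and that multiple features mapping to the same gene are averaged; the slides state neither detail. The probe names and gene IDs are made up for illustration.

```python
from statistics import mean, stdev

def standardize(profile):
    """Center one sample's values to mean 0 and scale to unit variance."""
    values = list(profile.values())
    mu = mean(values)
    sd = stdev(values) or 1.0  # guard against a constant profile
    return {feature: (v - mu) / sd for feature, v in profile.items()}

def collapse_to_genes(profile, feature_to_gene):
    """Map platform features to gene IDs, averaging features that share a gene.

    Features without a mapping are dropped (an assumption of this sketch).
    """
    grouped = {}
    for feature, value in profile.items():
        gene_id = feature_to_gene.get(feature)
        if gene_id is not None:
            grouped.setdefault(gene_id, []).append(value)
    return {gene_id: mean(vals) for gene_id, vals in grouped.items()}

# Hypothetical sample and mapping; IDs stand in for Entrez Gene IDs.
sample = {"probe_a": 2.0, "probe_b": 4.0, "probe_c": 6.0, "probe_d": 8.0}
mapping = {"probe_a": 1001, "probe_b": 1001, "probe_c": 1002}

z = standardize(sample)
genes = collapse_to_genes(z, mapping)  # keys: 1001 and 1002
```

Once every dataset is on the same scale and keyed by the same gene identifier, analyses from different platforms can be lined up side by side.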
Meta analysis
• Find out which genes are significantly more
expressed in glomeruli compared to
tubulointerstitium
• Can you verify that with another dataset?
• Or with more than one other dataset?
• Does it matter if the datasets are different?
• Can you imagine a use of this functionality for
an exclusive filter (NOT)?
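One way to think about these questions is in terms of set operations on the significant-gene lists from each analysis. The sketch below is an illustration only, not the tool's implementation: validation across datasets is an intersection, and the exclusive (NOT) filter is a set difference. The gene labels are placeholders.

```python
def consistent_genes(include, exclude=()):
    """Genes significant in every 'include' analysis and in none of the 'exclude' analyses."""
    include = [set(s) for s in include]
    if not include:
        return set()
    shared = set.intersection(*include)
    for signature in exclude:
        shared -= set(signature)  # the NOT filter
    return shared

# Placeholder gene lists (hypothetical results of three analyses).
glom_vs_tub_dataset1 = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
glom_vs_tub_dataset2 = {"GENE_A", "GENE_B", "GENE_D", "GENE_E"}
changed_in_disease = {"GENE_D"}

validated = consistent_genes([glom_vs_tub_dataset1, glom_vs_tub_dataset2])
specific = consistent_genes(
    [glom_vs_tub_dataset1, glom_vs_tub_dataset2],
    exclude=[changed_in_disease],
)
```

Here `validated` holds genes confirmed in both datasets, and `specific` additionally removes genes that also change in the excluded analysis.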
Example
Concepts Analysis
Concepts are sets of genes representing some aspect
of biology.
Concepts are derived from both Nephromine gene
expression signatures as well as third-party sources
such as Gene Ontology, KEGG Pathways, Human
Protein Reference Database, etc.
Users can upload a self-defined custom concept (a set
of genes) to Nephromine to explore its association
with Nephromine and third-party concepts.
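A common way to quantify the association between two gene sets is an overlap test. The slides do not say which statistic Nephromine uses, so the one-sided hypergeometric test below is an assumption, and the function name and toy sets are hypothetical.

```python
from math import comb

def overlap_pvalue(concept_a, concept_b, universe_size):
    """P(overlap >= observed) when concept_b is drawn at random from the gene universe."""
    a, b = set(concept_a), set(concept_b)
    if universe_size < max(len(a), len(b)):
        raise ValueError("universe must be at least as large as each concept")
    observed = len(a & b)
    total = comb(universe_size, len(b))
    # Sum the hypergeometric probabilities for overlaps of size observed or larger.
    tail = sum(
        comb(len(a), k) * comb(universe_size - len(a), len(b) - k)
        for k in range(observed, min(len(a), len(b)) + 1)
    )
    return tail / total

# Toy example: a custom concept compared with a reference concept in a 10-gene universe.
custom = {1, 2, 3, 4, 5}
reference = {1, 2, 3, 4, 5}
p_identical = overlap_pvalue(custom, reference, universe_size=10)    # 1/252
p_disjoint = overlap_pvalue(custom, {6, 7, 8, 9, 10}, universe_size=10)  # 1.0
```

A small p-value indicates that the two concepts share more genes than expected by chance, given the size of the measured gene universe.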
Concepts Analysis
Upload Custom Concept
Manage My Concepts
Change password
Podo-50-symbol
Download list from C-tools
to the desktop, then upload
tranSMART – A platform and community
• Open-source and open-data translational biomedical research community
• Biomedical Researchers, Developers, Service Providers
• Clinician Researchers
tranSMART Platform:
Academics and industry
– 2009: Johnson and Johnson
– 2010: Thomson Reuters
– 2012: One Mind for Research
– 2012: St. Jude, Harvard, Johns Hopkins Univ.
Can further specify with AND or exclusion
Subset 1 Subset 2
Summary statistics 1
Differentially expressed genes
Gene symbols P-values Fold change
Enlarged:
Comparisons can be saved/emailed
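To make the subset comparison concrete, here is a minimal sketch of how a fold change and a p-value for one gene could be computed between two subsets. The slides do not state which test tranSMART applies, so an exact permutation test on the difference in means is swapped in here because it needs only the standard library; the values are toy data.

```python
from itertools import combinations
from math import log2
from statistics import mean

def compare_subsets(subset1, subset2):
    """Return (log2 fold change, exact two-sided permutation p-value) for one gene.

    Inputs are linear-scale expression values; only practical for small subsets,
    since every possible regrouping of the samples is enumerated.
    """
    log_fc = log2(mean(subset2) / mean(subset1))
    observed = abs(mean(subset2) - mean(subset1))
    pooled = list(subset1) + list(subset2)
    n1, n2 = len(subset1), len(subset2)
    total = sum(pooled)
    hits = count = 0
    for idx in combinations(range(len(pooled)), n1):
        s1 = sum(pooled[i] for i in idx)
        diff = abs((total - s1) / n2 - s1 / n1)
        count += 1
        if diff >= observed - 1e-12:  # tolerance for floating-point ties
            hits += 1
    return log_fc, hits / count

# Toy values for one gene in two subsets of three samples each.
log_fc, p_value = compare_subsets([1.0, 2.0, 3.0], [7.0, 8.0, 9.0])
```

With three samples per subset there are only 20 possible regroupings, so the smallest achievable two-sided p-value is 0.1; larger subsets are needed before small p-values become possible.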
tranSMART – why do we care?
• Enables data exploration with low hurdles
• Integrates many different data types
• Has interfaces to real analysis tools
• Provides a consistent data set
• Can be run locally/institutionally, etc.
• Can possibly be “shared” across institutions
– McMurry et al., PLoS ONE: SHRINE: Enabling Nationally Scalable Multi-Site
Disease Studies
• Go to: http://transmartfoundation.org/
Acknowledgements