Statistics For Microarrays: Normalization

This document discusses normalization techniques for microarray data. It begins by introducing the need for normalization to correct for systematic differences between samples not due to biological variation. It then describes several common normalization methods including global adjustment, intensity-dependent normalization using LOWESS, and within print-tip group normalization. The document compares different normalization schemes and shows their impact on microarray data, noting that normalization reduces systematic effects but can increase variability. It emphasizes choosing normalization based on the experimental design and examining data before and after normalization.

Uploaded by

Karthi Keyan

Statistics for Microarrays

Normalization

Class web site:

http://statwww.epfl.ch/davison/teaching/Microarrays/ETHZ/
Biological question (differentially expressed genes, sample class prediction, etc.)
-> Experimental design
-> Microarray experiment (output: 16-bit TIFF files)
-> Image analysis (output: (Rfg, Rbg), (Gfg, Gbg))
-> Normalization (output: R, G)
-> Estimation, testing, clustering, discrimination
-> Biological verification and interpretation
Preprocessing: Data Visualization

• Was the experiment a success?

• Are there any specific problems?

• What analysis tools should be used?

Tools for Microarray
Normalization and Analysis

• Both commercial and free software

• R (use the sma package or Bioconductor: http://www.bioconductor.org/)
Red/Green overlay images
Co-registration and overlay offers a quick
visualization, revealing information on color
balance, uniformity of hybridization, spot
uniformity, background, and artefacts
such as dust or scratches
Good: low bg, detectable d.e.
Bad: high bg, ghost spots, little d.e.
Scatterplots: always log*, always rotate

Panels: log2 R vs log2 G; M = log2(R/G) vs A = log2 √(RG)

* Other transformations can provide improvement
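
The rotation from (log2 R, log2 G) to (A, M) can be sketched in a few lines. This is a minimal illustration only: the function name `ma_values` and the toy intensities are assumptions, and the intensities are taken to be positive, background-corrected values.

```python
import math

def ma_values(red, green):
    """Return (M, A) for paired, strictly positive red/green intensities.

    M = log2(R/G) is the log ratio; A = log2(sqrt(R*G)) is the average
    log intensity, i.e. 0.5 * (log2 R + log2 G).
    """
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    return m, a

# Toy intensities (hypothetical): three spots with the same average intensity.
red = [1024.0, 512.0, 256.0]
green = [256.0, 512.0, 1024.0]
M, A = ma_values(red, green)
print(M)  # [2.0, 0.0, -2.0]
print(A)  # [9.0, 9.0, 9.0]
```

Plotting M against A is the 45-degree rotation (with rescaling) of the log2 R vs log2 G scatterplot, which is why dye bias shows up as curvature around M = 0.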

Histograms

Signal/Noise = log2(spot intensity/background intensity)

Boxplots of log2R/G

Liver samples from 16 mice: 8 WT, 8 ApoAI KO

Spatial plots: background from the two slides
Highlighting extreme log ratios

Top (black) and bottom (green) 5% of log ratios
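
Selecting the spots to highlight might look like the sketch below. The function name `extreme_spots` and the rounding rule for the cut-off are assumptions made for illustration; the slide only states that the top and bottom 5% of log ratios are marked.

```python
def extreme_spots(m_values, fraction=0.05):
    """Return (bottom, top) lists of spot indices with the most extreme M.

    `fraction` is the share of spots taken from each tail; at least one
    spot is returned per tail.
    """
    n = len(m_values)
    k = max(1, round(n * fraction))
    order = sorted(range(n), key=lambda i: m_values[i])
    return order[:k], order[-k:]

# Toy log ratios (hypothetical): 40 spots, so 5% is two spots per tail.
m = [i - 20 for i in range(40)]
bottom, top = extreme_spots(m)
print(bottom, top)  # [0, 1] [38, 39]
```

The returned indices can then be mapped back to spot positions on the slide to draw the spatial plot, with the top set in black and the bottom set in green as in the figure.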

Pin group (sub-array) effects

Panels: lowess lines through points from pin groups; boxplots of log ratios by pin group

Boxplots and highlighting pin group effects

[Figure: log ratios by print-tip group]

Clear example of spatial bias

Plate effects
Clearly visible plate effects

KO #8

Probes: ~6,000 cDNAs, including 200 related to lipid metabolism.

Arranged in a 4x4 array of 19x21 sub-arrays.
Time of printing effects

spot number

Panels: false color overlay; boxplots within pin-groups; scatter (MA-) plots

Similar patterns apparent in non self-self hybridizations

From the NCI60 data set (Stanford web site)

From Lawrence Berkeley National Laboratory
Normalization Methods (I)
• Normalization based on a global adjustment
log2(R/G) -> log2(R/G) - c = log2(R/(kG))
Choices for k or c = log2(k) are c = median or mean of the log
ratios for a particular gene set (e.g. all genes, or control
or housekeeping genes). Or, total intensity
normalization, where k = ∑Ri / ∑Gi.
• Intensity-dependent normalization
Here, run a line through the middle of the MA plot,
shifting the M value of the pair (A, M) by c = c(A), i.e.
log2(R/G) -> log2(R/G) - c(A) = log2(R/(k(A)G)).
One estimate of c(A) is made using the LOWESS
function of Cleveland (1979): LOcally WEighted
Scatterplot Smoothing.
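
The global adjustment in the first bullet can be sketched as follows. The function names and toy values are assumptions for illustration; the `reference` argument stands for whichever gene set is chosen (all genes, controls, or housekeeping genes).

```python
import math
import statistics

def median_shift(m_values, reference=None):
    """Subtract c = median of the log ratios of a reference gene set.

    `reference` is an optional list of spot indices; by default all
    spots are used. Returns the shifted log ratios and c.
    """
    ref = m_values if reference is None else [m_values[i] for i in reference]
    c = statistics.median(ref)
    return [m - c for m in m_values], c

def total_intensity_shift(red, green):
    """Return c = log2(k) with k = sum(R_i) / sum(G_i)."""
    return math.log2(sum(red) / sum(green))

# Toy log ratios (hypothetical): most spots sit at M = 1 before normalization.
normalized, c = median_shift([1.0, 1.0, 3.0])
print(normalized, c)  # [0.0, 0.0, 2.0] 1.0

# Toy intensities (hypothetical): red is four times brighter overall.
print(total_intensity_shift([300.0, 500.0], [100.0, 100.0]))  # 2.0
```

Both variants shift every log ratio by the same constant, so they cannot remove curvature in the MA plot; that is what the intensity-dependent c(A) is for.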
Normalization Methods (II)
• Within print-tip group normalization
In addition to intensity-dependent variation in log ratios,
spatial bias can also be a significant source of systematic
error.
Most normalization methods do not correct for spatial
effects produced by hybridization artefacts or print-tip or
plate effects during the construction of the microarrays.

It is possible to correct for both print-tip and intensity-dependent
bias by performing LOWESS fits to the data within print-tip groups, i.e.
log2(R/G) -> log2(R/G) - ci(A) = log2(R/(ki(A)G)),
where ci(A) is the LOWESS fit to the MA-plot for the ith
grid only.
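
A rough sketch of the within-print-tip-group fit is given below. It is a simplified locally weighted linear fit with tricube weights and omits the robustness iterations of Cleveland's full LOWESS, so it should be read as an illustration of the idea rather than a replacement for a library routine. The names `local_fit` and `print_tip_normalize`, the span `frac=0.4`, and the toy data are all assumptions.

```python
import math

def tricube(u):
    """Tricube kernel: weight 1 at distance 0, falling to 0 at u >= 1."""
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0

def local_fit(a, m, frac=0.4):
    """Locally weighted linear fit of M on A, evaluated at every A.

    Simplified LOWESS: one pass, no robustness iterations.
    """
    n = len(a)
    k = min(n, max(2, math.ceil(frac * n)))  # neighbours in each window
    fitted = []
    for x0 in a:
        dists = sorted(abs(x - x0) for x in a)
        h = dists[k - 1] or 1e-12  # bandwidth: distance to k-th nearest point
        w = [tricube(abs(x - x0) / h) for x in a]
        sw = sum(w)
        xbar = sum(wi * x for wi, x in zip(w, a)) / sw
        ybar = sum(wi * y for wi, y in zip(w, m)) / sw
        sxx = sum(wi * (x - xbar) ** 2 for wi, x in zip(w, a))
        sxy = sum(wi * (x - xbar) * (y - ybar) for wi, x, y in zip(w, a, m))
        slope = sxy / sxx if sxx > 1e-12 else 0.0  # fall back to local mean
        fitted.append(ybar + slope * (x0 - xbar))
    return fitted

def print_tip_normalize(a, m, group, frac=0.4):
    """Subtract c_i(A), fitted separately within each print-tip group i."""
    out = list(m)
    for g in set(group):
        idx = [i for i, gi in enumerate(group) if gi == g]
        fit = local_fit([a[i] for i in idx], [m[i] for i in idx], frac)
        for i, c in zip(idx, fit):
            out[i] = m[i] - c
    return out

# Toy data (hypothetical): two groups with different linear dye bias.
a = [6.0 + i for i in range(8)] * 2
group = [0] * 8 + [1] * 8
m = [0.5 * x - 3.0 for x in a[:8]] + [-0.2 * x + 1.0 for x in a[8:]]
normalized = print_tip_normalize(a, m, group)
print(max(abs(v) for v in normalized) < 1e-9)  # True
```

Passing a single group label for every spot gives the global intensity-dependent normalization as a special case. In practice a tested implementation, such as the lowess function in R, would normally be used for the fit.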
Normalization: Which Spots to use?
The LOWESS lines can be run through many different
sets of points, and each strategy has its own implicit set
of assumptions justifying its applicability.
For example, the use of a global LOWESS approach can
be justified by supposing that, when stratified by mRNA
abundance,
a) only a minority of genes are expected to be
differentially expressed, or
b) any differential expression is as likely to be
up-regulation as down-regulation.
Pin-group LOWESS requires stronger assumptions: that
one of the above applies within each pin-group.
The use of other sets of genes, e.g. control or
housekeeping genes, involves similar assumptions.
Normalization makes a difference

Global scale, global lowess, pin-group lowess; spatial plot after, smooth histograms of M after
Normalization by controls:
Microarray Sample Pool titration series

Pool the whole library

Control set to aid intensity-dependent normalization

Different concentrations in titration series
Spotted evenly spread across the slide in each pin-group
Comparison of Normalization
Schemes
(courtesy of Jason Goncalves)

• No consensus on best normalization method

• Experiment done to assess the common
normalization methods
• Based on reciprocal labeling experimental
data for a series of 140 replicate
experiments on two different arrays each
with 19,200 spots
DESIGN OF RECIPROCAL
LABELING EXPERIMENT

• Replicate experiment
with same mRNA pools
but invert fluors (dye
swap)
• Replicates are
independent experiments
• Scan, quantify,
normalize as usual
[Figure: Comparison of Normalization Methods - Using 140 19K Microarrays. y-axis: Average Mean Deviation Value (0.30 to 0.46). x-axis: Normalization Method (Pre Normalized, Global Intensity, Subarray Intensity, Global Ratio, Sub-Array Ratio, Global LOWESS, Subarray LOWESS).]
Scale normalization: between slides

Boxplots of log ratios from 3 replicate self-self hybridizations
Left panel: before normalization
Middle panel: after within print-tip group normalization
Right panel: after a further between-slide scale normalization
The “NCI 60” experiments (no bg)

Some scale normalization seems desirable

Scale normalization: another data set

Log-ratios

Only small differences in spread apparent; no action required.
One way of taking scale into account

Assumption: All slides have the same spread in M

True log ratio is mij, where i represents different slides and j represents different spots.

Observed is Mij, where

Mij = ai mij

Robust estimate of ai is

MADi = medianj { |Mij - medianj(Mij)| }
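
The scale adjustment can be sketched as below. One detail is an assumption not stated on the slide: the MADs are divided by their geometric mean so that the scale factors multiply to one, which is a common convention for this kind of normalization. The function names and toy data are also hypothetical, and every slide is assumed to have a non-zero MAD.

```python
import math
import statistics

def mad(values):
    """Median absolute deviation from the median."""
    med = statistics.median(values)
    return statistics.median(abs(v - med) for v in values)

def scale_normalize(slides):
    """Rescale each slide's M values so that all slides share one spread.

    `slides` is a list of lists of log ratios, one inner list per slide.
    Assumption: a_i is estimated as MAD_i divided by the geometric mean
    of all MADs. Returns the rescaled slides and the estimated factors.
    """
    mads = [mad(s) for s in slides]
    geo_mean = math.exp(sum(math.log(d) for d in mads) / len(mads))
    factors = [d / geo_mean for d in mads]
    scaled = [[v / f for v in s] for s, f in zip(slides, factors)]
    return scaled, factors

# Toy data (hypothetical): the second slide has twice the spread of the first.
slides = [[-2.0, -1.0, 0.0, 1.0, 2.0], [-4.0, -2.0, 0.0, 2.0, 4.0]]
scaled, factors = scale_normalize(slides)
print([round(mad(s), 6) for s in scaled])  # [1.414214, 1.414214]
```

The same function applies to print-tip groups within a slide if each inner list holds the log ratios of one group instead of one slide.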

A slightly harder normalization problem

Global lowess doesn’t do the trick here

Print-tip-group normalization helps
But not completely

Still a lot of scatter in the middle in a WT vs KO comparison

Effects of previous normalization

Panels: before normalization; after print-tip-group normalization

Within print-tip-group box plots of M after print-tip-group normalization
Taking scale into account, cont.

Assumption: All print-tip-groups have the same spread in M

True log ratio is mij, where i represents different print-tip-groups and j represents different spots.

Observed is Mij, where

Mij = ai mij

Robust estimate of ai is

MADi = medianj { |Mij - medianj(Mij)| }

Effect of location & scale
normalization

Clearly care is needed in making decisions like this

A comparison of three M v A plots

Panels: Unnormalized; Print-tip normalization; Print tip & scale n

The same normalization on another data set

Panels: before; after
Normalization: Summary
• Reduces systematic (not random) effects
• Makes it possible to compare several arrays

• Use logratios (M vs A plots)

• Lowess normalization (dye bias)
• MSP titration series – composite normalization
• Pin-group location normalization
• Pin-group scale normalization
• Between slide scale normalization

• Control Spots
• Normalization introduces more variability
• Outliers (bad spots) are handled with replication
Affymetrix Oligo Chips

• Only one “color”

• Different technology, different
normalization issues
• Affy chip normalization is an active
research area – see
http://www.stat.berkeley.edu/users/terry/zarray/Affy/affy_index.html
Pre-processed cDNA Gene
Expression Data
On p genes for n slides: p is O(10,000), n is O(10-100), but growing,

Rows are genes, columns are slides:

          slide 1   slide 2   slide 3   slide 4   slide 5   …
gene 1      0.46      0.30      0.80      1.51      0.90   ...
gene 2     -0.10      0.49      0.24      0.06      0.46   ...
gene 3      0.15      0.74      0.04      0.10      0.20   ...
gene 4     -0.45     -1.03     -0.79     -0.56     -0.32   ...
gene 5     -0.06      1.06      1.35      1.09     -1.09   ...

Gene expression level of gene 5 in slide 4 = (normalized) log2(Red / Green)

These values are conventionally displayed on a red (>0) yellow (0) green (<0) scale.
Acknowledgments
Terry Speed (UCB and WEHI)
Matt Callow (LLNL)
Percy Luu (UCB)
Jean Yee Hwa Yang (UCB)
Sandrine Dudoit (UCB)
John Ngai (UCB)
Ben Bolstad (UCB)
Vivian Peng (UCB)
Natalie Thorne (WEHI)
Ingrid Lönnstedt (Uppsala)
Dave Lin (Cornell)
Henrik Bengtsson (Lund)
Jason Goncalves (Iobion)

Time of printing effects (figure caption): green channel intensities (log2 G); printing over 4.5 days.