Assessing the efficiency of dye-swap normalization to remove systematic bias from two-color microarray data

Fatima Sanchez-Cabo
Department of Biomolecular Sciences, UMIST, P.O. Box 88, Manchester M60 1QD, U.K.
Institute for Genomics and Bioinformatics, Christian Doppler Laboratory for Genomics and Bioinformatics, Graz University of Technology, 8010 Graz, Austria

Andreas Prokesch, Gerhard G. Thallinger, Roland Pieler and Zlatko Trajanoski
Institute for Genomics and Bioinformatics, Christian Doppler Laboratory for Genomics and Bioinformatics, Graz University of Technology, 8010 Graz, Austria

Philip D. Butcher and Jason Hinds
Bacterial Microarray Group, St. George's Hospital Medical School, Cranmer Terrace, London, U.K.

Leah E. A. Holmes, Susan G. Campbell, Mark P. Ashe and Simon Hubbard
Department of Biomolecular Sciences, UMIST, P.O. Box 88, Manchester M60 1QD, U.K.

Kwang-Hyun Cho
School of Electrical Engineering, University of Ulsan, Ulsan, 680-749, Korea

Olaf Wolkenhauer
Department of Computer Science, University of Rostock, Rostock, Germany
Address: Albert Einstein Str. 21, 18051 Rostock, Germany. E-mail: [email protected], Tel./Fax: +49 (0)381 498 33 35/99.

March 24, 2004

To whom correspondence should be addressed.

Abstract
Microarrays are a powerful tool in functional genomics that allow the simultaneous analysis of the expression level of thousands of genes under different conditions. In order to compare measurements within and across arrays and to correct non-biological variation masking meaningful information, normalization is an essential task prior to any further analysis. Among all the available normalization techniques, LOWESS has proved useful for the normalization of data generated from the two main microarray platforms (two-color arrays and Affymetrix chips) due to its ability to remove intensity dependent effects. However, the use of this robust estimator to correct the data without taking account of biological characteristics is a concern often raised by microarray analysts. In addition, powerful software packages are needed to perform such computationally expensive normalization and several parameters need to be fixed in advance, resulting in differently corrected data sets for the different sets of parameters used. Reverse labeling designs are common setups in two-color microarray experiments if comparison between the co-hybridized samples is of interest. Using three different data sets, this paper assesses the effectiveness of dye-swap normalization, a method that makes use of the intrinsic information provided by this type of experimental design. The results show how dye-swap normalization corrects the bias introduced by the different properties of the dyes, removing intensity dependent effects. Furthermore, dye-swap normalization corrects the data accounting for gene-dye effects and the transformation of the data is justified on a biological basis. The results present dye-swap normalization as a valid alternative to normalize two-color microarray data. The paper also reviews the assumptions made and the formulas applied to correct the data using dye-swap normalization. In addition, a generalization of the dye-swap normalization formula is implemented to normalize data generated from microarray experiments for which a large proportion of genes are expected to be differentially expressed.

The figures and results presented in this paper have been implemented using several Bioconductor packages from R, MATLAB (Mathworks Inc.) and ArrayNorm. A collection of files and supplementary material is available from http://www.sbi.uni-rostock.de

Keywords: Two-color microarrays, normalization, LOWESS, dye-swap, replicates.

Introduction

Two-color microarray experiments estimate simultaneously the relative expression level of a set of genes in two biological samples. To allow such a comparison, messenger RNA (mRNA) from the populations of interest is reverse transcribed and labeled using two different fluorescent dyes (usually the cyanine dyes Cy3 and Cy5) (Schena 2002). Afterwards, both samples (related to the channels of the scanner used to read the array) are hybridized onto the microarray, where Polymerase Chain Reaction (PCR) products that represent all or part of the genes in the genome were spotted (Eisen and Brown 1999, Schulze and Downward 2001). The slide is then scanned at two different wavelengths corresponding to the range of the emission spectra of the fluorophores. For each channel a high resolution image is generated, which is then analyzed in a process referred to as spot finding. The spots are quantified into single intensity values for each channel. These two intensity values are the estimators of the relative expression level of the gene in the two samples. The spot finding or scanning software (e.g. GenePix, Imagene) also provides an estimator of the background intensity for a given spot in both channels. The data analyst then has the option to correct the data by, for example, subtracting the background from the spot intensity.

In microarrays, the process of removing non-biological variation that is masking meaningful information is known as normalization. The correction of the data, according to those factors introducing systematic errors, is an essential stage prior to the analysis and biological interpretation of the data. In two-color microarray experiments, the so-called dye effect is one of the most important sources of systematic errors. Several properties are different for the two dyes, i.e. their quantum efficiency and their gene specific incorporation properties (Dobbin et al. 2003, Tseng et al. 2001). These differences make it necessary to balance the intensities of both channels before further analysis: to compare two measurements that are actually read on different scales, they must be brought into the same range. There are two non-exclusive strategies that can be employed to normalize microarray data:

Normalization by self-consistency (Kepler et al. 2002) using all genes: There are three main methods based on the assumption that the overall intensity should be the same for both channels, i.e., most of the genes should be equally expressed in both compared samples. These methods are the global method (Yang et al. 2001, Yang et al. 2002), the use of a LOWESS function (Cleveland 1979) correcting intensity-dependent data (Yang et al. 2001, Yang et al. 2002) and the use of the regression line (Quackenbush 2001). Of these, the use of a LOWESS function to normalize within a slide is the most robust and popular.

Normalization using the quality control elements introduced in the experiment: This refers to the intrinsic and extrinsic controls, the use of replicated genes within the array to correct spatial effects, the use of replicated arrays and the swapping of the dyes for replicated arrays. The latter is a requirement to apply dye-swap normalization.

Dye-swap experiments are widespread and well established in the microarray community (Dobbin et al. 2003, Kerr and Churchill 2001). This paper studies the effectiveness of dye-swap normalization, which makes use of the information provided by this type of setup. Data from three different experiments were normalized using dye-swap and compared against the standard LOWESS normalized data. These experiments were chosen in order to test two different features:

(1) A self-self hybridization experiment was conducted to assess the efficiency of dye-swap normalization in the correction of intensity dependent systematic bias, i.e. the different properties of the two dyes used to label the co-hybridized samples.

(2) A good normalization method should preserve the biological characteristics of the data. For this reason the correlation of technical and biological replicates after normalization was analyzed for two experiments, a growth curve experiment for M.tuberculosis and an experiment to study the yeast transcriptional repressors Mig1p and Mig2p (Rolland et al. 2002, Campbell et al. Manuscript in preparation, 2004).

The paper is organized as follows. Firstly, the three main self-consistency methods are discussed. These are the global approach, LOWESS (Yang et al. 2001, Yang et al. 2002) and the linear regression approach (Quackenbush 2001). In Section 3, the most important quality control elements in microarrays are briefly described and dye-swap normalization is explained in detail. The different formulas proposed in the literature and their equivalence are also discussed. To conclude, in Section 4, LOWESS and dye-swap normalization are applied to three different data sets. The results are discussed in the following section.

Within array normalization by self-consistency: LOWESS correction

Microarrays allow us to simultaneously measure the response of thousands of genes to specific biological conditions. Due to the large number of genes spotted onto an array, one might think that, on the whole, most genes will not show a significant change in the expression level between the two samples being compared. Under this premise, differences among the overall intensity of both channels would be the consequence of non-biological variation. An important source of systematic errors in two-color microarray experiments is the different properties of the dyes used to label the two samples (Yang et al. 2001, Dobbin et al. 2003). Under the assumption that most of the genes should be equally expressed in both samples, we ought to correct the data so that the distribution of the expression ratios has a central value of one. Choosing the median as an estimator of the central tendency of the distribution, the data are corrected to accomplish

$$\operatorname{median}_{i=1,\dots,n_g}\left(\frac{R_i}{G_i}\right) = 1 \iff \log_2 \operatorname{median}_{i=1,\dots,n_g}\left(\frac{R_i}{G_i}\right) = 0,$$

where $R_i$ represents the intensity of the red channel for gene $i$, $G_i$ the same for the green one, and $n_g$ indicates the number of genes spotted on the array. This transformation can be achieved by estimating an expression $k$ (Yang et al. 2001, Yang et al. 2002) such that $R = kG$. The different estimators of $k$ result in the three different within-array normalization methods:

The global method looks for a constant $k$ which relates the overall intensity of both channels. A common choice is (Yang et al. 2001)

$$k = 2^{\operatorname{median}_{i=1,\dots,n_g} \log_2 (R_i/G_i)}.$$

The linear regression method (Quackenbush 2001) fits a regression line to the scatter plot $(G,R)$. Under the assumption that most of the genes should be equally expressed for both channels, the regression line should have a slope of one. Hence,

$$R = mG + n \implies \frac{R}{m} - \frac{n}{m} = G.$$

From that follows $k \approx m$, where $m$ is the slope of the regression line fitted to the scatter plot and $n$ is the intercept with the ordinate.

The LOWESS function was first introduced by Cleveland (1979). This function is estimated through a locally weighted polynomial regression for a fixed subset of genes in the neighborhood of every gene $i$. As a tool to normalize microarray data, it first appeared in Yang et al. (2001). From the scatter plot $(A,M)$, where

$$M = \log_2\frac{R}{G} \quad\text{and}\quad A = \frac{1}{2}\left(\log_2 G + \log_2 R\right),$$

the LOWESS function $c(A_i)$ can be calculated: $c(A_i): I \to \mathbb{R}$, where the set of indexes $I$ denotes all genes spotted on the array. Under the assumption that most of the genes are equally expressed in both channels, $A$ is the overall intensity level measured in the array, as can be observed from

$$\log_2 R \approx \log_2 G \implies A = \frac{1}{2}\left(\log_2 G + \log_2 R\right) \approx \log_2 G \approx \log_2 R.$$

The fitting of the LOWESS function $c(A)$ from the $(A,M)$ scatterplot leads to

$$M = \log_2\frac{R}{G} \approx c(A) \implies k(A) = 2^{c(A)}.$$

Regardless of the method used to estimate $k$, the data will be corrected as follows:

$$\log_2\frac{R}{G} \longrightarrow \log_2\frac{R}{kG} = \log_2\frac{R}{G} - c \approx 0,$$

where $c = \log_2(k)$. Denoting the corrected data by the superscript $c$, it follows that

$$M_i^c = M_i - c_i, \quad \text{for all } i.$$

This is equivalent to correcting the intensity values of both channels, for every spotted gene $i$, as

$$R_i^c = R_i, \qquad G_i^c = k_i G_i.$$
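The global and linear regression estimators described above can be sketched in a few lines. This is a minimal illustration in plain Python, not the code used for the paper: the function names and the toy intensities are invented for the example, and the regression is an ordinary least-squares fit of R on G.

```python
import math
from statistics import median


def log_ratios(red, green):
    """Return M_i = log2(R_i / G_i) for paired channel intensities."""
    return [math.log2(r / g) for r, g in zip(red, green)]


def global_factor(red, green):
    """Global method: k = 2 ** median_i(log2(R_i / G_i))."""
    return 2.0 ** median(log_ratios(red, green))


def regression_factor(red, green):
    """Least-squares line R = m*G + n; returns (slope m, intercept n)."""
    n_pts = len(red)
    mean_g = sum(green) / n_pts
    mean_r = sum(red) / n_pts
    sxy = sum((g - mean_g) * (r - mean_r) for g, r in zip(green, red))
    sxx = sum((g - mean_g) ** 2 for g in green)
    m = sxy / sxx
    return m, mean_r - m * mean_g


def normalize_global(red, green):
    """Apply R^c = R, G^c = k*G and return the corrected log-ratios."""
    k = global_factor(red, green)
    return [math.log2(r / (k * g)) for r, g in zip(red, green)]


# Toy data (invented): the red channel reads 1.5x brighter for every gene,
# and the last gene is additionally four-fold up-regulated.
green = [100.0, 200.0, 400.0, 800.0, 1600.0]
red = [150.0, 300.0, 600.0, 1200.0, 9600.0]

k = global_factor(red, green)
corrected = normalize_global(red, green)
```

In this toy case the median log-ratio recovers the constant dye factor, so the unchanged genes end up at a log-ratio of zero while the regulated gene keeps its two-unit log2 difference. A single outlier like this one would pull a least-squares slope much more strongly, which is the sensitivity to outliers noted for the regression method.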

LOcally WEighted leaSt Squares (LOWESS)

Because the dye effect appears to be intensity dependent in most cases (Yang et al. 2001, Yang et al. 2002, Workman et al. 2002), LOWESS has become a popular method for within-array normalization. The global dye correction method transforms all of the genes using a single value for the whole slide, whilst the regression method is highly sensitive to outliers. In consequence, the LOWESS approach appears to be the most suitable option among the three for reducing the differential effects of the dyes.
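To make the intensity-dependent correction concrete, the sketch below fits a simplified locally weighted line to the (A, M) scatter plot and subtracts it from M. It is an assumption-laden illustration rather than the LOWESS implementation of any package: it uses tricube weights and a local straight line, omits Cleveland's robustness iterations, and the names `lowess_fit`, `lowess_normalize` and the `frac` parameter are chosen for this example only.

```python
import math


def lowess_fit(a, m, frac=0.4):
    """Locally weighted linear fit of M on A, evaluated at every A_i.

    Simplified sketch: tricube weights over the nearest `frac` share of
    the points, one weighted straight line per neighborhood, and no
    robustness iterations.
    """
    n = len(a)
    span = min(n, max(2, math.ceil(frac * n)))
    fitted = []
    for x0 in a:
        # Bandwidth: distance from x0 to its span-th nearest point.
        dists = sorted(abs(x - x0) for x in a)
        h = dists[span - 1] or 1e-12
        w = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in a]
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, a)) / sw
        my = sum(wi * y for wi, y in zip(w, m)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, a))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, a, m))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


def lowess_normalize(red, green, frac=0.4):
    """Return corrected log-ratios M_i - c(A_i)."""
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * (math.log2(r) + math.log2(g)) for r, g in zip(red, green)]
    c = lowess_fit(a, m, frac)
    return [mi - ci for mi, ci in zip(m, c)]


# Toy check (invented data): a dye bias that drifts linearly with intensity.
a_toy = [6 + 0.5 * i for i in range(20)]
m_toy = [0.5 - 0.1 * x for x in a_toy]
c_toy = lowess_fit(a_toy, m_toy)
```

The choice of `frac` (the width of the neighborhood) changes the fitted curve, which is the parameter dependence mentioned in the abstract.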

Within array normalization using quality control elements: Dye-swap normalization

The three self-consistency methods described above provide a general approach to correct the dye effect. However, as shown by van de Peppel et al. (2003), they are not suitable for experiments in which a global shift in mRNA expression occurs. In those situations, the intrinsic information of the experiment must be used to normalize the data. To this end, a good experimental design should provide quality control elements, including control spots, replicated genes within the array or replicated arrays for which the dyes are swapped.

Different material can be spotted as controls in the microarray, for example, genomic DNA (gDNA), spiked genes (van de Peppel et al. 2003, Benes and Muckenthaler 2003), or a Microarray Sample Pool (MSP) (Yang et al. 2002). For the controls to be useful in the normalization, their intensities should cover the whole intensity range. In that case, the LOWESS function or any other non-linear function fitted to the controls (using for example the Levenberg-Marquardt algorithm (Marquardt 1963)) can be used to determine the relationship between both channels, and this function can then be used to correct the whole data set.

The use of replicates meets a double target in microarray experiments (Churchill 2002): Biological replicates are a requirement to provide statistical significance measures for differences in gene expression (Black and Doerge 2002) and to average out the differences among individuals. Technical replicates remove random errors introduced in the experiment and replicated slides can be used to normalize the data if the dyes are swapped.

Let us consider a particular gene $i$ for which the expression level in two samples of mRNA is measured. We will refer to the two biological samples to be compared as $s$ and $r$. Let us suppose that during the reverse transcription of mRNA into cDNA the sample denoted by $s$ was labelled with Cy5 (red) and the sample denoted by $r$ with Cy3 (green). For every spotted gene $i$ the following expression is considered:

$$M_i = \log_2\frac{R_i}{G_i}.$$

Using the same material, the reverse transcription process and labelling are repeated, but in this case the dyes are swapped so that the sample $s$ is labelled with Cy3 (green) and the sample $r$ with Cy5 (red). For the same gene $i$ we thus have

$$M'_i = \log_2\frac{R'_i}{G'_i}.$$

From these two equations, we obtain

$$M_i = \log_2\frac{R_i}{G_i} = \log_2\frac{s_i k_i}{r_i} = \log_2\frac{s_i}{r_i} + \log_2 k_i = \log_2\frac{s_i}{r_i} + c_i, \qquad (1)$$

$$M'_i = \log_2\frac{R'_i}{G'_i} = \log_2\frac{r_i k'_i}{s_i} = -\log_2\frac{s_i}{r_i} + \log_2 k'_i = -\log_2\frac{s_i}{r_i} + c'_i, \qquad (2)$$

where $r_i$ stands for the intensity of the gene $i$ in sample $r$ and $s_i$ for the same value in sample $s$. The target is to estimate $\log_2(s_i/r_i)$ from $M_i$ and $M'_i$. Hence, it follows that

$$M_i - c_i = \log_2\frac{s_i}{r_i}, \qquad -M'_i + c'_i = \log_2\frac{s_i}{r_i}.$$

In these expressions, $c_i$ and $c'_i$ account for the different properties of the dyes. As suggested in Yang et al. (2001) under the name self normalization, if $c_i \approx c'_i$ for all $i$ (see Appendix A for an explanation), adding both equations gives

$$M_i - M'_i \approx 2\log_2\frac{s_i}{r_i} \implies \log_2\frac{s_i}{r_i} \approx \frac{1}{2}\left(M_i - M'_i\right). \qquad (3)$$

Earlier, Kerr et al. (2000) had proposed a formula to estimate the corrected logged expression ratio for a dye-swap experiment, under the assumption of normality of the logged intensities:

$$\log_2\frac{s_i}{r_i} = \frac{1}{2}\left[\left(M_i - M'_i\right) - \left(\operatorname{mean}_i(M_i) - \operatorname{mean}_i(M'_i)\right)\right]. \qquad (4)$$

(3) and (4) are equivalent if $c_i \approx c'_i$ for all $i$. In that case, $\operatorname{mean}_i(M_i) \approx \operatorname{mean}_i(M'_i)$ and (3) and (4) are the same formula. (4) was also used by Tseng et al. (2001) and it is equivalent to performing a global normalization to correct possible global shifts weakening the reproducibility of the technical replicates. This can be due to different PMT (Photomultiplier Tube) settings, differences in labeling or hybridization between the two replicates, etc. Once these errors are corrected, the conventional dye-swap normalization (3) is performed to correct the different properties of the dyes on a gene by gene basis. Throughout the text, dye-swap normalization was calculated using (4).

The main advantage of dye-swap normalization is the correction of the data preserving the characteristics of every gene. In addition, it accounts for the different incorporation rate of the two dyes to different sequences. Note also that the implementation of the method is straightforward and the computational cost very low. The main apparent disadvantage is the need for an additional microarray slide to allow dye reversal. However, the inaccuracy of two-color microarrays makes it necessary to provide technical replicates in every condition. Hence, there is no real extra cost in performing the technical replicates with a swap of the dyes.
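Formulas (3) and (4) translate almost directly into code. The sketch below is a minimal illustration with invented numbers, not the implementation used in the paper: the function names are hypothetical, the gene-specific dye bias is assumed identical on both arrays, and the toy true log-ratios are chosen to average zero, which is the situation in which subtracting the array means in (4) removes only the global shift.

```python
from statistics import mean


def dye_swap_simple(m_fwd, m_rev):
    """Formula (3): 0.5 * (M_i - M'_i) for every gene i."""
    return [0.5 * (a - b) for a, b in zip(m_fwd, m_rev)]


def dye_swap_centered(m_fwd, m_rev):
    """Formula (4): 0.5 * [(M_i - M'_i) - (mean(M) - mean(M'))]."""
    shift = mean(m_fwd) - mean(m_rev)
    return [0.5 * ((a - b) - shift) for a, b in zip(m_fwd, m_rev)]


# Toy example (invented values).
true_lr = [1.0, -1.0, 0.5, -0.5]   # log2(s_i / r_i), averaging zero
dye_bias = [0.3, 0.1, 0.4, 0.2]    # gene-specific c_i, assumed equal to c'_i
global_shift = 0.6                 # e.g. a different PMT setting on the swapped array

m_forward = [t + c for t, c in zip(true_lr, dye_bias)]
m_reverse = [-t + c + global_shift for t, c in zip(true_lr, dye_bias)]

est_simple = dye_swap_simple(m_forward, m_reverse)
est_centered = dye_swap_centered(m_forward, m_reverse)
```

Here the gene-specific bias cancels in both estimators, but only the centered version (4) also removes the global shift between the two arrays; the simple version (3) is off by half the shift for every gene.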

4 Results

4.1 Self-self hybridization

The scope of the experiment was to test the influence of different surface coatings on the quality of in-house spotted mouse cDNA microarrays. The microarrays comprise several cDNA libraries: NIA (National Institute of Aging), BMAP (Brain Molecular Anatomy Project) and custom libraries. They were PCR amplified and spotted without purification in 3xSSC/1.5M Betaine spotting buffer. Including controls and duplicates, 36672 features were spotted on each array in one spotting run. Four surfaces with different coatings were compared: epoxy, amino and aldehyde surfaces from Schott-Nexterion and GAPS II coated slides from Corning. In order to be able to analyze as many spots as possible, RNA had to be chosen from a transcriptionally active source. Therefore, murine mesenchymal stem cells were induced to adipogenesis and eight time point samples were taken over 14 days of differentiation. These samples were pooled together and self versus self hybridizations were performed on two slides per surface.

Data Preprocessing

The slides were scanned with the GenePix 4000B scanner (AxonInstruments 1995-2004) and the image quantification software used was GenePix 4.0.1. After grid alignment, obvious areas of background artifacts were manually flagged and genes were filtered out according to the following two filter criteria: number of saturated pixels in at least one of the channels greater than 50, and sum of medians smaller than 1500. The latter removes low intensity spots for which expression levels cannot be reliably measured with the scanner. Dye-swap and LOWESS normalization were performed using the marrayNorm package from Bioconductor. The filtered data was normalized using ArrayNorm (Pieler et al. 2004). The commands used in Bioconductor can be found in the Supplementary Material.

Bias removal

Although no biological replicates were available, the experiment was interesting to test how well dye-swap normalization and LOWESS normalization are able to remove the bias and whether dye-swap normalization can correct the intensity dependent effect. Because the same material was labelled with the two dyes and hybridized together onto the same slide type twice, the pair of technical replicates can also be considered as a dye-swap pair. The distribution of the corrected log ratios should be centered around 0 with very small dispersion. Figures 1 and 2 show the normalized data after dye-swap and LOWESS normalization. For LOWESS normalization the eight slides were normalized independently and the average of the technical replicates was calculated, resulting in a single value per surface coating to be tested. In addition, we calculated the standard deviation of the filtered normalized data. Table 1 shows the results.
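The two filter criteria and the dispersion measure can be illustrated as follows. This is a hedged sketch, not the ArrayNorm or marrayNorm code: the dictionary keys are hypothetical field names, the spot values are invented, the example assumes that a spot is discarded when either criterion is met, and it uses the sample standard deviation because the text does not say which estimator was used.

```python
import math
from statistics import stdev

MAX_SATURATED_PIXELS = 50    # threshold quoted in the text
MIN_SUM_OF_MEDIANS = 1500    # threshold quoted in the text


def keep_spot(spot):
    """Return True if the spot passes both filter criteria.

    Assumption: a spot is removed if EITHER criterion is violated.
    """
    saturated = max(spot["sat_red"], spot["sat_green"]) > MAX_SATURATED_PIXELS
    too_dim = spot["median_red"] + spot["median_green"] < MIN_SUM_OF_MEDIANS
    return not (saturated or too_dim)


# Invented spots; the keys are hypothetical, not GenePix column names.
spots = [
    {"median_red": 5000, "median_green": 4800, "sat_red": 0, "sat_green": 0},
    {"median_red": 400, "median_green": 500, "sat_red": 0, "sat_green": 0},
    {"median_red": 60000, "median_green": 30000, "sat_red": 120, "sat_green": 3},
    {"median_red": 2000, "median_green": 2100, "sat_red": 10, "sat_green": 50},
]

kept = [s for s in spots if keep_spot(s)]
log_ratios = [math.log2(s["median_red"] / s["median_green"]) for s in kept]
spread = stdev(log_ratios)   # dispersion of the retained log-ratios
```

For a self-self hybridization the retained log-ratios should scatter tightly around zero after normalization, so a smaller `spread` indicates better bias removal.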

(a) Dye-swap normalized data: epoxy surface coating.

(b) Dye-swap normalized data: amino surface coating from Schott-Nexterion.

(c) Dye-swap normalized data: aldehyde surfaces from Schott-Nexterion.

(d) Dye-swap normalized data: GAPS II coated slides from Corning.

Figure 1: Dye-swap non-filtered normalized data

(a) LOWESS normalized data: epoxy surface coating.

(b) LOWESS normalized data: amino surface coating from Schott-Nexterion.

(c) LOWESS normalized data: aldehyde surfaces from Schott-Nexterion.

(d) LOWESS normalized data: GAPS II coated slides from Corning.

Figure 2: LOWESS non-filtered normalized data

Table 1: Standard deviation of the filtered data after dye-swap normalization and after LOWESS normalization. The standard deviation after LOWESS is always larger than that of the filtered log ratios normalized using dye-swap, by a factor of roughly 1.6 to 2.6.

condition   std. filtered log ratios dye-swap   std. filtered log ratios LOWESS
1           0.1058109                           0.2002904
2           0.0969764                           0.1548934
3           0.1015136                           0.2620740
4           0.1089701                           0.2671140

4.2 Study of the M.tuberculosis growth curve

This experiment aimed to understand the growth curve for M.tuberculosis, taking measurements after 6, 14, 20 and 30 days. Four replicated arrays of RNA samples from each time point were hybridized. In total, sixteen arrays were produced, using for the signal channel the four samples of RNA extracted from M.tuberculosis and using gDNA for the reference channel. The advantage of this reference design is that all genes in the genome are present in the gDNA. Hence, every gene should give a homogeneous signal for the denominator of the ratio of both channels. A broader discussion on this topic can be found in (Talaat et al. 2002). The labelling reactions were performed independently and the dyes were swapped for one out of the four replicates. Denoting by a = 1, 2, ..., 16 the number of the array, the experiment can be summarized as

for a = 4, 8, 11, 16 Green : RNA (signal), Red : gDNA (reference),

for a = 4, 8, 11, 16

Green : gDNA (reference), Red : RNA (signal).

PCR products of the 3924 genes of the genome of M.tuberculosis strain H37Rv were spotted once in every slide. In addition, different types of controls were printed at different locations. The normalization controls were 5s, 16s and 23s ribosomal RNA genes, printed in every sub-grid. The 16s and 23s rRNA were printed in a three-fold dilution series. Many of the controls gave a saturated signal in the RNA channel. The reason is that, whilst the gDNA used for the reference has a single copy of the rRNA genes, which are therefore equal in abundance to the other genes in the genome, the prokaryotic RNA is total RNA consisting of 98% rRNA and just 2% mRNA. Hence, in the RNA channel a greater proportion of the RNA hybridised to the control spots relative to the rest of the gene spots, and so the higher intensities presented by the control spots were not in the same range as the intensities for the other genes. The control spots were excluded from the analysis and all of the results in this paper refer to the 3924 printed genes. Although there were no duplicated genes on the slide, PCR products from the two IS6110 transposase family elements were present. Each of them has sixteen copies. Differences of only a few nucleotides have been detected between the sequenced copies, so we can expect their intensity levels to be very similar after proper normalization of the data.

Data preprocessing

The slides were scanned with the Affymetrix 428™ scanner (MWG-Biotech 2004) and the image quantification software used was Imagene (BioDiscovery 2004). The use of a gDNA reference made the use of all the genes printed in the array feasible because all of them gave a reliable signal in the reference channel. In addition, no gene had to be removed due to high background intensity. Following the analysis of the background intensity, it was decided not to perform background subtraction. There were two reasons: first, the overall background intensity was very small compared to the foreground intensity; second, we found that the noise patterns that appeared in the background reconstruction were inherited by the foreground after background subtraction. All of this analysis and the dye-swap normalization of the data were carried out using the normalization module of the program MADE (Sanchez-Cabo et al. 2003). Because the implementation of LOWESS depends on the chosen parameters (i.e. width of the neighborhood, smooth degree, etc.) we normalized the data using the LOWESS function from the limma package from Bioconductor (Gentleman et al. 2003). The functions used can be found in the Supplementary Material.

Correlation of replicated measurements

For this particular experiment, relatively few genes are expected to change in expression between the two hybridized samples at every time point. LOWESS and dye-swap normalization were applied to correct the data. Both removed the bias, as presented in Figure 3. A good normalization method should correct the systematic bias while preserving the biological information in the data. To test the latter, we classified the slides using hierarchical clustering (Eisen and Brown 1999). The results obtained support the hypothesis that dye-swap normalization preserves the biological information of the data better than LOWESS (see Figure 4). However, the comparison might not be considered fair because, using dye-swap normalization, the three slides classified for every condition have been normalized with a common slide, while LOWESS normalized every slide independently. The study of the correlation among the elements of the IS6110 family would be a fairer comparison. Therefore, we calculated the coefficient of variation (CV) after dye-swap and after LOWESS normalization. The results are displayed in Table 2. The overall CV was smaller for the elements of the IS6110 family after dye-swap normalization than after LOWESS. In addition, the CV was markedly larger after dye-swap than after LOWESS normalization for only one slide (slide 13 (time point four, first replicate)); for slide (1,2) the two values are nearly identical (0.2051 after dye-swap versus 0.2034 after LOWESS).

[Boxplots; horizontal axis: arrays; vertical axis: log2(G/R).]

(a) Boxplot after LOWESS normalization.

(b) Boxplot after dye-swap normalization.

Figure 3: Distribution of the log-ratios for the 16 arrays of the experiment after LOWESS and after dye-swap normalization. Every four consecutive boxplots (three after dye-swap normalization) are the replicates at a particular time point.
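The coefficient of variation used as the quality measure for the IS6110 elements is the standard deviation divided by the mean. The sketch below shows the computation and cross-checks the overall means against the per-slide CV values reported in Table 2. The helper name is hypothetical and the use of the sample standard deviation is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """CV = standard deviation / mean (sample standard deviation assumed)."""
    return stdev(values) / mean(values)


# Per-slide CV values copied from Table 2.
cv_lowess = [0.3612, 0.2034, 0.1630, 0.1993, 0.0671, 0.2988, 0.2480, 0.1275,
             0.1953, 0.1694, 0.1129, 0.3410, 0.0917, 0.1885, 0.1924, 0.2231]
cv_dye_swap = [0.1586, 0.2051, 0.1363, 0.0644, 0.1566, 0.1307,
               0.0679, 0.1167, 0.0973, 0.1560, 0.1755, 0.1707]

overall_lowess = round(mean(cv_lowess), 4)
overall_dye_swap = round(mean(cv_dye_swap), 4)

# The tabulated CV of slide (1,1) after LOWESS follows from its Mean and STD.
cv_slide_1_1 = round(0.2516 / 0.6966, 4)
```

Averaging the tabulated per-slide values reproduces the overall means quoted in Table 2 (0.1989 after LOWESS and 0.1363 after dye-swap).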

[Dendrograms from hierarchical clustering with average linkage; vertical axis: distance; leaves labelled (time point, replicate number).]

(a) Hierarchical clustering of the replicates after LOWESS normalization.

(b) Hierarchical clustering of the replicates after dye-swap normalization.

Figure 4: Hierarchical clustering of the replicates. Replicates of the same time point cluster perfectly together after dye-swap normalization. After LOWESS normalization slide (2,1) is closer to slide (3,4) than to the rest of the second time point replicates. Slides (2,4) and (4,4) are misclassified.

Table 2: Dispersion of the IS6110 elements in every slide after LOWESS and dye-swap normalization. The quality measure used was the Coefficient of Variation (CV).

After LOWESS normalization
(time, replicate)   Mean     STD      CV
(1,1)               0.6966   0.2516   0.3612
(1,2)               0.6881   0.1400   0.2034
(1,3)               0.7930   0.1292   0.1630
(1,4)               1.6077   0.3205   0.1993
(2,1)               0.8928   0.0599   0.0671
(2,2)               0.9898   0.2957   0.2988
(2,3)               0.9407   0.2332   0.2480
(2,4)               1.2124   0.1546   0.1275
(3,1)               0.9113   0.1780   0.1953
(3,2)               0.9365   0.1586   0.1694
(3,3)               1.2248   0.1383   0.1129
(3,4)               1.0549   0.3597   0.3410
(4,1)               0.9005   0.0826   0.0917
(4,2)               0.9370   0.1766   0.1885
(4,3)               0.9793   0.1884   0.1924
(4,4)               1.4126   0.3152   0.2231
overall mean                          0.1989

After dye-swap normalization
(time, replicate)   Mean     STD      CV
(1,1)               0.9426   0.1495   0.1586
(1,2)               0.6761   0.1387   0.2051
(1,3)               0.6845   0.0933   0.1363
(2,1)               0.9110   0.0587   0.0644
(2,2)               0.8142   0.1275   0.1566
(2,3)               0.8260   0.1080   0.1307
(3,1)               0.8539   0.0580   0.0679
(3,2)               0.8171   0.0953   0.1167
(3,3)               0.8654   0.0842   0.0973
(4,1)               0.8662   0.1352   0.1560
(4,2)               0.7748   0.1360   0.1755
(4,3)               0.7555   0.1290   0.1707
overall mean                          0.1363

4.3 Analysis of the yeast transcriptional repressors Mig1p and Mig2p

Microarray analysis was used to investigate Saccharomyces cerevisiae strains deleted for the transcriptional repression genes, MIG1 and MIG2. Mig1p and Mig2p function in the yeast glucose repression/derepression pathway to directly repress genes required for the use of alternate carbon sources (Rolland et al. 2002). In wild type, mig1 and mig2 strains, protein synthesis is inhibited following glucose removal (Ashe et al. 2000). However, the double mutant, mig1mig2, maintains translation following a switch to no glucose (Campbell et al. Manuscript in preparation, 2004). In an attempt to identify changes in gene expression profiles that may be responsible for these translational phenotypes, labeled cDNA from each mutant was compared to wild type cDNA using competitive hybridization on spotted arrays representing the whole genome of S. cerevisiae. Six arrays were hybridized for each mutant and wild type, of which three experiments used reciprocal labeling of RNA. The experiment was then repeated using duplicate RNA extracts to give a total of thirty-six hybridizations.

Data preprocessing

In two-color microarrays designed to study eukaryotic gene expression, the use of a homogeneous reference such as gDNA presents practical difficulties due to the high intron-exon ratio. In consequence, the quality of the data is rarely as good as for the previous experiment. Regions with high levels of background and spatial artifacts are often observed. At this stage, flagging of bad spots becomes an issue. At the moment, there is no consensus about the desirable balance between the number of spots filtered out and the criteria to be applied to get reliable biological conclusions. For this particular data set, no standard filtering criteria were applied. Areas with obvious artifacts or very high background were flagged. In this case background subtraction was performed. The slides were scanned using the GenePix 4000A scanner (AxonInstruments 1995-2004) and the images were quantified using GenePix. The data was first normalized using the marrayTools package from Bioconductor (Gentleman et al. 2003). The functions used can be found in the Supplementary Material. The normalization of the filtered data was computed using ArrayNorm (Pieler et al. 2004).

Correlation of replicated measurements

For this experiment only those genes regulated by Mig1p and Mig2p were expected to change. Hence, dye-swap normalization and LOWESS are suitable to normalize the data and they should both remove the bias in a similar manner.
In fact, a comparison of both normalization methods showed a very similar list of dierentially expressed genes when using the fold change. However, Figure 5 shows how the across replicates variation is not the same for the two compared methods. Consequently, if more biological replicates had been available and statistical methods such as t-test had been applied to the data,

dierent sets of dierentially expressed genes would have been found. Figure 5 shows the hierarchical clustering of the replicates after both, dye-swap and LOWESS normalization using the marrayTools package from Bioconductor. The correlation between the biological and technical replicates was calculated using all the genes without paying attention to the bad quality spots. In both gures, mig1, mig2 and mig1mig2 are the three compared mutants. BR stands for Biological Replicate (1,2) whereas DS 1, 2 and 3 represent the three dye swap pairs.
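The replicate comparison described above (pairwise correlation between slides followed by hierarchical clustering with average linkage) can be sketched as follows. This is only a minimal illustration on simulated data, not the marrayTools or ArrayNorm code used to produce the figures. The function name cluster_replicates, the choice of 1 - Pearson correlation as the distance and the simulated expression profiles are all assumptions made for this example.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform


def cluster_replicates(log_ratios, n_clusters):
    """Cluster slides by the correlation of their normalized log2 ratios.

    log_ratios: array of shape (genes, slides), no missing values assumed.
    Returns the slide-by-slide correlation matrix and one cluster label
    per slide.
    """
    corr = np.corrcoef(log_ratios, rowvar=False)  # columns (slides) are the variables
    dist = 1.0 - corr                             # assumed distance: 1 - Pearson r
    dist = (dist + dist.T) / 2.0                  # remove tiny numerical asymmetries
    np.fill_diagonal(dist, 0.0)
    tree = linkage(squareform(dist, checks=False), method="average")
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    return corr, labels


# Simulated example (hypothetical data): two conditions, three replicates each.
rng = np.random.default_rng(0)
n_genes = 500
profile_a = rng.normal(0.0, 1.0, n_genes)
profile_b = rng.normal(0.0, 1.0, n_genes)
slides = np.column_stack(
    [profile_a + rng.normal(0.0, 0.3, n_genes) for _ in range(3)]
    + [profile_b + rng.normal(0.0, 0.3, n_genes) for _ in range(3)]
)
corr, labels = cluster_replicates(slides, n_clusters=2)
```

In this simulated case the three replicates of each condition fall into the same cluster. With real slides, flagged spots would first have to be removed or imputed, because np.corrcoef does not handle missing values.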

(a) Hierarchical clustering of the replicates after LOWESS normalization including all spots.

(b) Hierarchical clustering of the replicates after dye-swap normalization including all spots.

Figure 5: Hierarchical clustering of the replicates without filtering the data. Both plots show a similar result when looking at the correlation of the replicates without excluding the flagged spots. In the case of LOWESS, all slides were independently normalized and the mean between the technical replicates was calculated.

Because further clustering analysis and detection of differentially expressed genes were performed using only those genes with a reliable measurement, the data were also normalized using ArrayNorm (Pieler et al. 2004) after removing the genes that were flagged. The correlation between the replicates was compared for the filtered data normalized using dye-swap and LOWESS. The results showed a better preservation of the biological meaning of the data after dye-swap normalization than after LOWESS normalization (see Figure 6).

(a) Hierarchical clustering of the replicates after LOWESS normalization for the filtered data.

(b) Hierarchical clustering of the replicates after dye-swap normalization for the filtered data.

Figure 6: Hierarchical clustering of the replicates. After dye-swap normalization all replicates cluster perfectly, with the exception of mig1mig2BR1 and mig1BR2DS2. After LOWESS normalization, however, the biological and technical replicates do not correlate well.
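The LOWESS correction used as the comparison method fits, slide by slide, a smooth curve to the log ratio M as a function of the average log intensity A and subtracts it. The sketch below is a deliberately simplified version: a single pass of locally weighted linear regression with tricube weights, without the robustness iterations of the full procedure (Cleveland 1979). It is not the marrayTools or ArrayNorm implementation. The function names, the span frac=0.4 and the simulated intensities are assumptions made for this example.

```python
import numpy as np


def lowess_fit(x, y, frac=0.4):
    """Simplified LOWESS: local weighted linear fit, no robustness iterations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    k = max(3, int(np.ceil(frac * n)))            # number of points in each window
    fitted = np.empty(n)
    for j in range(n):
        d = np.abs(x - x[j])
        h = max(np.partition(d, k - 1)[k - 1], np.finfo(float).eps)  # window half-width
        w = np.clip(1.0 - (d / h) ** 3, 0.0, None) ** 3              # tricube weights
        sw = np.sqrt(w)
        design = np.column_stack([np.ones(n), x - x[j]])             # local line centred at x[j]
        coef, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
        fitted[j] = coef[0]                                          # value of the line at x[j]
    return fitted


def lowess_normalize(R, G, frac=0.4):
    """Return A, raw M and M after subtracting the intensity-dependent trend."""
    M = np.log2(R / G)
    A = 0.5 * np.log2(R * G)
    return A, M, M - lowess_fit(A, M, frac)


# Simulated slide (hypothetical data) with an intensity-dependent dye bias.
rng = np.random.default_rng(2)
n_spots = 300
log_G = rng.uniform(6.0, 14.0, n_spots)
bias = 0.15 * (log_G - 10.0) - 0.5
noise = rng.normal(0.0, 0.1, n_spots)
G = 2.0 ** log_G
R = G * 2.0 ** (bias + noise)
A, M_raw, M_norm = lowess_normalize(R, G)
```

After the correction, the simulated log ratios are centred near zero and no longer trend with A. The correction relies on the assumption that most spots are not differentially expressed, which is the same premise discussed in the text.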

Discussion

At present, there is no consensus view regarding either the normalization method that best corrects the systematic bias introduced in microarray experiments or the means to validate a normalization algorithm. Data sets that had been normalized using standard methods such as LOWESS have been shown to give different results if normalized using spike-in controls (van de Peppel et al. 2003). In general, it is difficult to know a priori which assumptions hold for every particular data set, because it is exactly the behavior on the whole that is unknown to the researcher when planning a microarray experiment. However, it is entirely reasonable that a good normalization method should preserve the biological characteristics of the experiment. Robust normalization methods have often given rise to concerns in this respect (Tsodikov et al. 2002, van de Peppel et al. 2003).

The results from the self-self hybridization pointed out several important facts. Firstly, dye-swap normalization calculated using formula (4) removes the intensity-dependent effect as efficiently as LOWESS. Secondly, conditions 1 and 2 show a tail in the high-intensity range after LOWESS normalization, which is probably due to saturation artifacts. This did not appear after dye-swap normalization. Thirdly, for both the filtered and the non-filtered data, the spot dispersion was smaller after dye-swap normalization than after LOWESS.

The other two data sets were chosen to test how well dye-swap normalization preserves the correlation among technical and biological replicates. For the growth curve experiment the results were definitive in favor of dye-swap normalization. However, the correlation among replicates could be artificially high because all three replicates were normalized with the same slide. For this reason the same comparison was done for the microarray experiment studying the Mig1p and Mig2p yeast transcriptional repressors. Although the results for the non-filtered data did not show much variation between the two methods, technical and biological replicates of the filtered data were much better correlated after dye-swap normalization than after LOWESS. The necessity of filtering low-intensity spots is a source of much debate. In this experiment filtering removed obvious low-quality areas, and the percentage of poor spots in every slide is shown in the Supplementary Material.

Yet, both methods present the same important drawback: they can only be applied to experiments for which most genes are expected to behave in the same way in the two compared samples. However, dye-swap normalization could easily be applied to experiments for which a priori information about the number of differentially expressed genes is available. For them, formula (4) should be:

\[
\log_2\frac{s_i}{r_i} = \frac{1}{2}\,(M_i - M'_i) - \frac{1}{2}\left[\, n^{\text{th}}\,\mathrm{perc}_{\{i=1,\dots,n_g\}}(M_i) - (100-n)^{\text{th}}\,\mathrm{perc}_{\{i=1,\dots,n_g\}}(M'_i) \right],
\]

where $n$ in the $n^{\text{th}}$ percentile (perc) indicates the percentage of genes expected to be changing between the two hybridized conditions.
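The dye-swap estimate with a median offset, and its percentile generalization, can be sketched as below. This is an illustrative reading of the formula, not the ArrayNorm implementation. It assumes that M holds log2(R/G) on the forward slide, that M_swap holds log2(R'/G') on the dye-swapped slide, and that the offset is the n-th percentile of M minus the (100-n)-th percentile of M_swap, so that n=50 gives the median version. The function name dye_swap_normalize and the simulated values (gene-specific dye bias c, slide offsets a and b) are hypothetical.

```python
import numpy as np


def dye_swap_normalize(M, M_swap, n=50.0):
    """Estimate log2(s_i / r_i) from a dye-swap pair of slides.

    M:      log2(R/G) on the forward slide, one value per gene.
    M_swap: log2(R'/G') on the dye-swapped slide, same gene order.
    n:      percentile used as reference point; n=50 is the median version.
    """
    M = np.asarray(M, dtype=float)
    M_swap = np.asarray(M_swap, dtype=float)
    # Slide-level offset between the two hybridizations (assumed form).
    offset = np.percentile(M, n) - np.percentile(M_swap, 100.0 - n)
    return 0.5 * (M - M_swap) - 0.5 * offset


# Simulated example (hypothetical values): 100 genes, 5 up and 5 down.
rng = np.random.default_rng(1)
n_genes = 100
t = np.zeros(n_genes)                     # true log2(s_i / r_i)
t[:5] = 2.0
t[5:10] = -2.0
c = rng.uniform(-0.5, 0.5, n_genes)       # gene-specific dye bias, same on both slides
a, b = 0.8, -0.3                          # slide-specific offsets (e.g. scanner settings)
M = t + c + a                             # forward slide
M_swap = -t + c + b                       # dye-swapped slide
est = dye_swap_normalize(M, M_swap)
```

In this idealized simulation the gene-specific dye bias cancels in the difference M - M_swap and the median offset removes the slide effect, so est recovers t up to rounding error. Choosing n different from 50 shifts the reference point away from the median; how n should be derived from the a priori information is not specified here beyond the description in the text.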

Conclusions

Making use of three different data sets, this paper illustrates the effectiveness of dye-swap normalization. Dye-swap normalization removes systematic bias in the data, accounting for intensity-dependent effects, and is a natural way to correct the data because each step is justified on a biological basis. The results present dye-swap normalization as a valid alternative to LOWESS, with the advantage of its low computational cost. An apparent disadvantage is the need for an extra slide where the dyes are switched. However, technical replicates must be provided in two-color microarray experiments, therefore a switch of the dyes does not necessarily increase the cost of the experiment. Dobbin et al. (2003) propose a more economic reverse dye design. Yet, the method as formulated by Kerr et al. (2000) is only valid under the premises that most genes are equally expressed and that the logged intensity values are normally distributed. With the exception of the length of the tails (which contain the differentially expressed genes), the symmetry and unimodality of microarray data are often assumed to hold. Having shown that dye-swap normalization corrects the data appropriately, its general application to experiments for which the proportion of genes expected to change is known is also simple and straightforward. Nothing is concluded about the necessity of establishing standard criteria to filter out poor-quality spots. However, microarrays are a quantitative tool that provides numerical information. The resulting measurements should therefore be as reliable as possible.

Appendix A: Different properties of Cy3 and Cy5

The basic assumption made by Yang et al. (2001) in the dye-swap normalization method is that $c_i \approx c'_i$. This appendix tries to explain under which conditions this is true from a theoretical point of view, without considering random errors affecting the quality of the labeling or hybridization. The two cyanine dyes differ in several aspects. Some of them are intrinsic to the dyes and independent of the sample or the sequence the dyes are labelling. These are, for example, the different quantum yield, the different quenching properties or the different photobleaching properties of the dyes (Tseng et al. 2001). In consequence, they are neither sample- nor gene-dependent, and they are not supposed to change significantly from one array to another, nor within an array. Formulating this mathematically, we have that:

\[
\begin{aligned}
\text{Quantum yield:}\quad & QY(\text{dye},\text{gene},\text{sample}) = QY(\text{dye}) \\
\text{Quenching:}\quad & Qn(\text{dye},\text{gene},\text{sample}) = Qn(\text{dye}) \\
\text{Photobleaching:}\quad & PH(\text{dye},\text{gene},\text{sample}) = PH(\text{dye})
\end{aligned}
\]

However, there is another difference between Cy3 and Cy5 that is essential in two-color microarrays. Due to the different size of their molecules, Cy3 and Cy5 incorporate differently into particular sequences. Hence, some genes have been observed to incorporate one dye more efficiently than the other (Dobbin et al. 2003). Kerr et al. (2000) introduced the dye-gene effect, which is included in the ANOVA model proposed in a later publication (Kerr and Churchill 2001). Although not originally expected, experimental data showed several examples of the gene-dependent incorporation properties of the two cyanine dyes. Again, we can formulate this as:

\[
\text{Incorporation:}\quad In(\text{dye},\text{gene},\text{sample}) = In(\text{dye},\text{gene})
\]

Using the same nomenclature as in Section 3, if the gain set to scan both slides was the same, the intensity level of a particular gene $i$ measured in the two channels can be expressed as:

\[
\begin{aligned}
R_i = f(s_i) &= QY(Cy5,i,s)\,Qn(Cy5,i,s)\,PH(Cy5,i,s)\,In(Cy5,i,s)\,s_i \\
             &= QY(Cy5)\,Qn(Cy5)\,PH(Cy5)\,In(Cy5,i)\,s_i \\
G_i = g(r_i) &= QY(Cy3,i,r)\,Qn(Cy3,i,r)\,PH(Cy3,i,r)\,In(Cy3,i,r)\,r_i \\
             &= QY(Cy3)\,Qn(Cy3)\,PH(Cy3)\,In(Cy3,i)\,r_i
\end{aligned}
\]

The same is true for $R'_i$ and $G'_i$, the intensities measured on the dye-swapped slide:

\[
\begin{aligned}
R'_i = f(r_i) &= QY(Cy5,i,r)\,Qn(Cy5,i,r)\,PH(Cy5,i,r)\,In(Cy5,i,r)\,r_i \\
              &= QY(Cy5)\,Qn(Cy5)\,PH(Cy5)\,In(Cy5,i)\,r_i \\
G'_i = g(s_i) &= QY(Cy3,i,s)\,Qn(Cy3,i,s)\,PH(Cy3,i,s)\,In(Cy3,i,s)\,s_i \\
              &= QY(Cy3)\,Qn(Cy3)\,PH(Cy3)\,In(Cy3,i)\,s_i
\end{aligned}
\]

Equations (1) and (2) can then be expressed as:

\[
\begin{aligned}
M_i  &= \log_2\frac{R_i}{G_i} = \log_2\left(\frac{s_i}{r_i}\cdot\frac{QY(Cy5)\,Qn(Cy5)\,PH(Cy5)}{QY(Cy3)\,Qn(Cy3)\,PH(Cy3)}\cdot\frac{In(Cy5,i)}{In(Cy3,i)}\right) = \log_2\frac{s_i}{r_i} + c_i, \\
M'_i &= \log_2\frac{R'_i}{G'_i} = \log_2\left(\frac{r_i}{s_i}\cdot\frac{QY(Cy5)\,Qn(Cy5)\,PH(Cy5)}{QY(Cy3)\,Qn(Cy3)\,PH(Cy3)}\cdot\frac{In(Cy5,i)}{In(Cy3,i)}\right) = -\log_2\frac{s_i}{r_i} + c'_i,
\end{aligned}
\]

from which it is clear that $c_i \approx c'_i$, since both terms are the logarithm of the same product of dye-dependent factors. Although the functions $f(\cdot)$ and $g(\cdot)$ may not be linear and more factors may influence the difference between Cy3 and Cy5, the example proposed here proves the assumption that $c_i \approx c'_i$. If any random error occurred or the PMT settings were not set to the same value, the difference in the medians of the two slides in formula (4), proposed by Kerr et al. (2000), would account for it and dye-swap normalization would still work, as shown throughout the text.

Acknowledgements. F.S.C. was supported by a BBSRC studentship. A.P. was supported by the Austrian GEN-AU project GOLD, grant GZ 200.059/6-VI/2/2002. G.G.T. and R.P. were supported by the Austrian GEN-AU project BIN, grant GZ 200.067/4-VI/1/2002. The collaboration between F.S.C. and A.P., G.G.T. and R.P. was supported by the EU Marie Curie Training Site Program Genomics of Lipid Metabolism. The M. tuberculosis microarray studies were supported by grant 062511/Z/00/Z awarded to P.B. by The Wellcome Trust under its Functional Genomics Resources Initiative to fund the BµG@S (the Bacterial Microarray Group at St George's Hospital Medical School) multi-collaborative microbial pathogen microarray facility. The Mig1p/Mig2p microarray studies were supported largely by a Wellcome Trust project grant 061867/Z/00/Z to M.P.A. L.E.A.H. and F.S.C. are supported by an MRC studentship. S.G.C. is supported by a Wellcome Trust project grant 067328/Z/02/Z. K.-H. Cho acknowledges the support received from the Korean Ministry of Science and Technology (Korean Systems Biology Research Grant, M10309000006-03B5000-00211). O.W. acknowledges the funding received from the U.K. Department for the Environment, Food and Rural Affairs (DEFRA) as part of the M. bovis post-genomics programme, conducted in collaboration with the Veterinary Laboratories Agency (VLA), Weybridge.

References
Ashe, M., De Long, S. and Sachs, A. 2000. Glucose depletion rapidly inhibits translation initiation in yeast. Molecular Biology of the Cell 11, 833-848.
AxonInstruments. 1995-2004. http://www.axon.com. Accessed 29 February 2004.
Benes, V. and Muckenthaler, M. 2003. Standardization of protocols in cDNA microarray analysis. Trends in Biochemical Sciences 28, 244-249.
BioDiscovery. 2004. ImaGene website. http://www.biodiscovery.com/imagene.asp. Accessed 29 February 2004.
Black, M. and Doerge, R. 2002. Calculation of the minimum number of replicate spots required for detection of significant gene expression fold change in microarray experiments. Bioinformatics 18, 1609-1616.
Campbell, S., Holmes, L. and Ashe, M. 2004. Manuscript in preparation.
Churchill, G. 2002. Fundamentals of experimental design for cDNA microarrays. Nature Genetics Supplement 32, 490-495.
Cleveland, W. 1979. Robust locally weighted regression and smoothing scatterplots. Journal of the American Statistical Association 74, 829-836.
Dobbin, K., Shih, J. and Simon, R. 2003. Statistical design of reverse dye microarrays. Bioinformatics 19(7), 803-810.
Eisen, M. and Brown, P. 1999. DNA arrays for analysis of gene expression. Methods Enzymol. 303, 179-205.
Gentleman, R., Rossini, R., Dudoit, S. and Hornik, K. 2003. The Bioconductor FAQ. http://www.bioconductor.org. Accessed 29 February 2004.
Kepler, T., Crosby, L. and Morgan, K. 2002. Normalization and analysis of DNA microarray data by self-consistency and local regression. Genome Biology 3(7), research0037.1-0037.12.
Kerr, K. and Churchill, G. 2001. Experimental design for gene expression microarrays. Biostatistics 2, 183-201.
Kerr, K., Martin, M. and Churchill, G. 2000. Analysis of variance for gene expression microarray data. Journal of Computational Biology 7, 819-837.
Marquardt, D. 1963. An algorithm for least-squares estimation of nonlinear parameters. Journal of the Society for Industrial and Applied Mathematics 11, 431-441.
MWG-Biotech. 2004. Affymetrix 428TM array scanner. http://www.mwg-biotech.com. Accessed 29 February 2004.
Pieler, R., Sanchez-Cabo, F., Hackl, H., Thallinger, G. and Trajanoski, Z. 2004. ArrayNorm: Comprehensive normalization and analysis of microarray data. Bioinformatics. In press.
Quackenbush, J. 2001. Computational analysis of microarray data. Nature Reviews Genetics 2(6), 418-427.
Rolland, F., Winderickx, J. and Thevelein, J. 2002. Glucose-sensing and -signalling mechanisms in yeast. FEMS Yeast Research 2(2), 183-201.
Sanchez-Cabo, F., Cho, K., Trajanoski, Z. and Wolkenhauer, O. 2003. A graphical user interface to normalize microarray data. DSC 2003.
Schena, M. 2002. Microarray Analysis. Wiley-Liss.
Schulze, A. and Downward, J. 2001. Navigating gene expression using microarrays - A technology review. Nature Cell Biology 3, 190-195.
Talaat, A., Howard, S., Hale IV, H., Lyons, R., Garner, H. and Johnston, S. 2002. Genomic DNA standards for gene expression profiling in Mycobacterium tuberculosis. Nucleic Acids Research 30(20), e104.
Tseng, G., Oh, M., Rohlin, L., Liao, J. and Wong, W. 2001. Issues in cDNA microarray analysis: quality filtering, channel normalization, models of variations and assessment of gene effects. Nucleic Acids Research 29(12), 2549-2557.
Tsodikov, A., Szabo, A. and Jones, D. 2002. Adjustments and measures of differential expression for microarray data. Bioinformatics 18, 251-260.
van de Peppel, J., Kemmeren, P., van Bakel, H., Radonjic, M., van Leenen, D. and Holstege, F. 2003. Monitoring global messenger RNA changes in externally controlled microarray experiments. EMBO Rep 4, 387-393.
Workman, C., Jensen, L., Jarmer, H., Berka, R., Gautier, L., Nielsen, H., Saxild, H., Nielsen, C., Brunak, S. and Knudsen, S. 2002. A new non-linear normalization method for reducing variability in DNA microarray experiments. Genome Biology 3(9), research0048.1-0048.16.
Yang, Y., Dudoit, S., Lin, D., Peng, V., Ngai, J. and Speed, T. 2002. Normalization for cDNA microarray data: A robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Research 30(4), e15.1-e15.10.
Yang, Y. H., Dudoit, S., Luu, P. and Speed, T. P. 2001. Normalization for cDNA microarray data. SPIE BIOS 2001.

