
B4 Gene Arrays

This document discusses DNA microarrays and their principles of technology. DNA microarrays allow researchers to analyze the expression levels of thousands of genes simultaneously using miniaturized devices containing DNA probes. They work by hybridizing labeled cDNA from samples to immobilized DNA probes on a substrate. By quantifying the fluorescence intensities at each probe location, conclusions can be drawn about which genes are more or less active between different cell types or conditions. The document discusses different types of probes that can be used, including cDNAs and oligonucleotides, and factors to consider for probe selection based on the goals of the experiment.
Gene arrays B4

Barbara Schaffrath and Andreas Bosio

Introduction

Function and appearance comprise the phenotype of a cell, which is determined by the amount, the proportion and the condition of the proteins produced in the cell. Although every cell in an organism possesses the same genetic information, not all genes are active. According to the function and demands of the cell, only certain genes are transcribed into MESSENGER RIBONUCLEIC ACID (mRNA). Based on the messenger RNA, the information is translated into the corresponding protein. Even after the whole human genome had been mapped, questions about the function and interaction of the genes and gene products remained open for the majority of genes. Until very recently, it was impossible to assess the expression of hundreds of genes simultaneously because of the complex and cumbersome methodologies used in molecular biology. Traditional methods mostly work on the basis of "one gene – one experiment", which makes comprehensive gene expression profiling infeasible. During the 1990s, a technology was developed that made it possible to study the interaction of thousands of genes simultaneously. The tools of this technology are referred to as DNA MICROARRAYS, DNA CHIPS, or biochips.

The DNA chip technology exploits the ability of DNA to form double helices through sequence complementarity, a process called hybridisation. Since Southern introduced the blotting technique [1], hybridisation has been used in a wide range of techniques for the recognition and quantification of DNAs. However, only the reversal of this procedure – the arraying of multiple homogeneous copy DNAs (cDNAs) and testing with a single heterogeneous labeled sample – made it possible to study the GENE EXPRESSION PROFILES of thousands of genes in a given sample.

Principle of the technology

DNA MICROARRAYS are miniaturised devices designed for the analysis of ribonucleic acids by hybridisation. For "classical" hybridisation-based analysis, the genomic DNA (Southern) or RNA (Northern) extracted from the tissue of interest is immobilised on a membrane, and a single nucleotide sequence (the probe) that is complementary to a certain sequence is labeled to detect the corresponding gene or transcript (Fig. 1). For array analysis, several hundred to several thousand DNA fragments (probes) are immobilised at defined positions on a SUBSTRATE (a membrane, glass or plastic slide). To answer, e.g., the question of which genes are regulated in certain tumour cells compared to normal cells, RNA from tumour and normal cells is extracted. The extracted RNA is transcribed into its respective copy, the so-called cDNA. cDNAs derived from tumour cells and normal cells are labeled differently, e.g., with nucleotides linked to the different cyanine dyes Cy5 ("red") and Cy3 ("green") (see Fig. 2). Subsequently, the labeled cDNA pools are merged and applied to the DNA array. During this procedure, complementary sequences in the probes and the labeled nucleic acids hybridise. By quantifying the fluorescence intensities (Cy5, Cy3) and subsequent image analysis with appropriate software, the ratio of bound Cy5- to Cy3-labeled cDNAs can be determined for every spot on the array. This ratio allows conclusions to be drawn about the activity of individual genes in tumour and normal cells.

The advantage of DNA MICROARRAYS lies in the opportunity to observe a biological system at the transcriptional level while using minimal amounts of sample material.

FIGURE 1. COMPARISON OF TRADITIONAL NORTHERN-BLOT AND DNA MICROARRAY

A: Total RNA of the tissue of interest is separated by gel electrophoresis and blotted to a membrane. A cDNA probe complementary to the transcript of interest is labelled and hybridised to the membrane. If the transcript is present in the total RNA, a signal can be detected due to hybridisation of probe and transcript.
One experiment – one gene using a single labelled probe.
B: Several cDNAs (hundreds to thousands) complementary to mRNA transcripts of selected genes are covalently bound to a glass slide at a known position (spot). Total RNA from the tissue of interest is transcribed into cDNA and labelled by reverse transcription. The labelled cDNA is hybridised to the bound cDNAs. Signals can be detected after hybridisation of two complementary cDNAs.
One experiment – several genes using multiple probes and labelled transcripts.

DNA microarray formats differ regarding the SUBSTRATES, the probe selection strategies and the way
the probes are immobilised. Basically, the field of DNA MICROARRAYS can be divided into two groups: CDNA ARRAYS and OLIGONUCLEOTIDE ARRAYS. The material of the SUBSTRATE varies independently of the dispensed probes, as do the number of spotted probes and the presence of replicates and controls. Several aspects of the technology are discussed in detail in the following paragraphs.

There are general criteria, e.g. for probe selection, quality control (of probes, SUBSTRATE and RNA), data acquisition and data analysis, which are relevant for all DNA microarray types. Other criteria depend on the microarray type (cDNA, oligonucleotide), the manufacturing process (in situ synthesis, non-contact/contact SPOTTING) or the labeling procedure (fluorescent dyes, biotin, radioactive labeling, etc.).

The variety of probes

In general, the PROBE SELECTION STRATEGY for MICROARRAYS depends on several prerequisites, such as the objective of the experiment, the available sequence information for the organism to be investigated and the effort one is willing to invest. If one is
interested in a broad overview of the expression levels of as many genes as possible in a small set of experiments with low pre-chip expenses, it is sufficient to spot cDNA libraries (for def. see Box 1), uncharacterised expressed sequence tags (ESTs) (for def. see Box 2), or oligonucleotides derived therefrom. The in-depth sequence analysis and annotation are thereby postponed until hybridisation has been performed and the resulting 'clones/genes of interest' have been identified. In contrast, if the experimental setup calls for ACCURATE identification and quantification of particular mRNA species by hybridisation, one can either buy ready-to-spot sets or take the time and effort to clone and sequence-verify the cDNAs (Fig. 3). However, these processes should be accompanied by extensive quality management.

FIGURE 2. FLOW DIAGRAM OF MICROARRAY ANALYSIS

BOX 1. CDNA LIBRARIES

Libraries that store cDNA clones generated from RNA transcripts. Typically, these clones represent only the open reading frame (ORF).

BOX 2. ESTS

A set of single-pass sequenced cDNAs from an mRNA population derived from a specific cell population (e.g. a specific tissue, organ, developmental state or environmental condition).
- provides a profile of the mRNA population
- quick method for cloning a large number of genes known to be expressed in a cell population

Oligonucleotide probes

Oligonucleotides are generally either spotted or synthesised in situ directly on the array.

The in situ synthesis by PHOTOLITHOGRAPHIC PROCEDURES or inkjet technology allows a parallel production of OLIGONUCLEOTIDE ARRAYS, but is restricted to an oligonucleotide length of 20–60 nucleotides due to a limited coupling efficiency [2, 3]. The direct SPOTTING of pre-synthesised oligonucleotides with a length of up to 70 nucleotides offers the advantage that probes can be quality-controlled before the SPOTTING process. Short oligonucleotides (20 to 30 nucleotides) are suitable for distinguishing between perfectly matched duplexes and single-base or two-base mismatches [4–6]. When working with short oligonucleotide probes, several different oligonucleotides corresponding to a single gene are typically required to enhance the reliability of the hybridisation signals [7]. Thus, several signals are obtained for one gene, which may sometimes be contradictory due to differing hybridisation kinetics or splice variants and are therefore hard to interpret.

cDNA probes

As already mentioned, the appropriate PROBE SELECTION STRATEGY depends primarily on the objective of the experiment. If little prior information on relevant genes is available or the prime motivation is an unbiased overview of global changes in gene expression patterns, the SPOTTING of clones from a library without prior sequencing is acceptable. Only those clones that show differential expression after hybridisation are submitted to sequencing and further analysis. This approach entails a general lack of reliable sample annotation, shifting some of the necessary work to the post-hybridisation phase.

FIGURE 3. SELECTION STRATEGIES FOR PROBES

A more refined strategy relies on available collections of sequenced cDNA clones. The most reliable strategy is the amplification of suitable sequence regions from pre-selected genes by polymerase chain reaction (PCR). By doing so, one can adjust important properties of the selected regions, such as length, orientation or position within the mRNA, which is important to avoid false signals (for more information see Box 3). Besides the careful selection of sequence regions, the length and uniformity of the cDNA fragments are of particular interest regarding the robustness of the hybridisation process. Uniformity provides optimal hybridisation conditions (temperature and buffer conditions) for all cDNAs in parallel. With increasing probe length, hybridisation becomes more stable. A length of 200 bases is adequate to guarantee efficient hybridisation independent of single nucleotide polymorphisms and varying GC content of the individual probes. The SENSITIVITY of CDNA ARRAYS increases with probe length, since more labeled sample molecules may hybridise to the respective immobilised probes. On the other hand, the length of the probes should be limited to a maximum of about 400 basepairs in order to reduce the probability of cross-hybridisation caused by repetitive elements and unspecific interaction. A limit of ~400 basepairs still makes it possible to distinguish even genes from highly homologous gene families by choosing fragments from appropriate (e.g. poorly conserved) gene regions.

Discriminating single-base or two-base mismatches is not feasible with such long cDNA probes.
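The length and uniformity considerations for cDNA probes can be turned into a simple pre-screen of candidate fragments. The sketch below is an illustration under stated assumptions, not the authors' procedure: it takes the ~200 to 400 bp window from the text and uses the deviation of each fragment's GC content from the median of the whole set as a rough proxy for uniform hybridisation conditions. The function names and the 0.10 GC tolerance are hypothetical.

```python
from dataclasses import dataclass
from statistics import median

# Length window from the text: ~200 bases for efficient hybridisation,
# at most ~400 bp to limit cross-hybridisation.
MIN_LEN = 200
MAX_LEN = 400
# Hypothetical tolerance (not from the text): maximum allowed deviation of a
# fragment's GC fraction from the median GC fraction of the probe set.
GC_TOLERANCE = 0.10


@dataclass
class FragmentReport:
    name: str
    length: int
    gc: float
    length_ok: bool
    gc_ok: bool


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def screen_fragments(fragments: dict[str, str]) -> list[FragmentReport]:
    """Check each candidate fragment against the length window and flag
    fragments whose GC content is far from the rest of the set."""
    gcs = {name: gc_fraction(seq) for name, seq in fragments.items()}
    gc_median = median(gcs.values())
    return [
        FragmentReport(
            name=name,
            length=len(seq),
            gc=gcs[name],
            length_ok=MIN_LEN <= len(seq) <= MAX_LEN,
            gc_ok=abs(gcs[name] - gc_median) <= GC_TOLERANCE,
        )
        for name, seq in fragments.items()
    ]


if __name__ == "__main__":
    candidates = {
        "geneA_frag": "ATGC" * 75,      # 300 bp, 50 % GC
        "geneB_frag": "AT" * 60,        # 120 bp, 0 % GC
        "geneC_frag": "GGCCAATT" * 40,  # 320 bp, 50 % GC
    }
    for r in screen_fragments(candidates):
        print(r.name, r.length, round(r.gc, 2), r.length_ok, r.gc_ok)
```

In this made-up set, `geneB_frag` fails both checks: it is shorter than 200 bases and its GC content is far from that of the other two fragments. Comparing against the set median rather than a fixed GC window reflects the text's emphasis on uniformity across all cDNAs hybridised in parallel.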

BOX 3. SELECTION AND ANNOTATION OF SUITABLE CDNA FRAGMENTS

To generate suitable fragments for the unambiguous detection of genes, the following considerations apply:
1. Cross-hybridisation
If it is caused by unspecific binding, cross-hybridisation can be reduced by optimising the experimental conditions. Other causes of undesirable cross-hybridisation are repetitive elements within the DNA sequence, such as Alu repeats, microsatellite repeats, SINEs or LINEs (short or long interspersed elements). Comparing the selected sequence with databases like REPBASE (www.girinst.org) provides information about such elements.
2. Splice variants/alternative splicing
Different mRNA species may be transcribed from a single gene. Alternative polyadenylation signals or alternative splicing (the excision of distinct exons) may occur in the sequence. A fragment that does not detect all of the possible mRNA variants could lead to contrasting expression profiles when different mRNA species are present.
On the other hand, the variant gene products of a gene may serve different purposes, so it might be interesting to distinguish between the mRNA species.
Depending on the question, fragments are generated either to detect a specific mRNA species, e.g. splice or polyadenylation variants, or to detect all possible mRNA species of one gene. Most important with respect to quality management of the array production is a precise documentation of the expected hybridisation properties of each fragment. In addition, each fragment should be completely annotated with all the names used in public databases like Unigene, Swissprot or trEMBL and other relevant data sources. This simplifies gene selection for the user while avoiding the redundancy caused by the use of different names for an identical gene [8].
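Box 3 recommends comparing candidate fragments with a repeat database such as REPBASE. As a crude first pass before such a comparison, simple tandem repeats (microsatellites) can be flagged with a regular expression. The sketch below is an illustrative assumption, not the authors' procedure: `find_microsatellites` and its thresholds (motifs of 1 to 6 bases repeated at least five times) are hypothetical, and interspersed repeats such as Alu elements, SINEs or LINEs are not detected this way.

```python
import re

# Hypothetical thresholds (not from Box 3): report a tandem repeat when a
# motif of 1-6 bases occurs back-to-back at least MIN_COPIES times.
MIN_COPIES = 5
MAX_MOTIF = 6


def find_microsatellites(seq: str, min_copies: int = MIN_COPIES,
                         max_motif: int = MAX_MOTIF) -> list[tuple[int, str, int]]:
    """Return (start, motif, copies) for simple tandem repeats in seq.

    The lazy group tries the shortest motif first, so 'CACACA...' is reported
    as motif 'CA' rather than 'CACA'. Interspersed repeats (Alu, SINEs, LINEs)
    are NOT detected; they require a search against a repeat database.
    """
    seq = seq.upper()
    # Group 1 is the motif; \1{n,} requires at least n further copies of it.
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_motif, min_copies - 1))
    hits = []
    for m in pattern.finditer(seq):
        motif = m.group(1)
        hits.append((m.start(), motif, len(m.group(0)) // len(motif)))
    return hits


if __name__ == "__main__":
    fragment = "GGG" + "CA" * 6 + "TTG"
    print(find_microsatellites(fragment))  # [(3, 'CA', 6)]
```

A fragment with such hits would be a candidate for redesign or for a closer database comparison, in line with the cross-hybridisation concerns of Box 3.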

Production of microarrays

What are substrates?

SUBSTRATES represent the solid phase of arrays. The production of suitable SUBSTRATES is as crucial for reliable and reproducible results as the generation of probes. Array SUBSTRATES of various materials (membranes, plastics, glass) with various coatings have flooded the market in the last few years. Generally, SUBSTRATES are composed of a solid phase and one or two layers that hydrophobise the surface while simultaneously providing reactive groups to bind DNA covalently (see Box 4). To guarantee sufficient quality, some general features should be taken into account. Surface properties like planarity (which is mainly influenced by the solid phase used), uniformity, mechanical and chemical stability and an optimal DNA binding capacity are crucial. When working with FLUORESCENCE-LABELED SAMPLES, autofluorescence of the SUBSTRATE may strikingly limit the SENSITIVITY of hybridisation experiments and has to be reduced to a minimum. Other points, like control of inter- and intra-batch uniformity, should be kept in mind while developing a coating protocol with respect to array production.

Production of probes

If oligonucleotides or whole EST clones are used as probes, the SPOTTING process is the next step in the manufacturing of the arrays. This is not the case if selected regions of genes are generated by reverse transcription (RT) PCR (see 1.2.2). Here, the appropriate cDNA fragments are generated with the respective primers, cloned and sequence-verified. Once the clones are sequence-verified, the inserts can be amplified by PCR. By using vector-specific, possibly modified primers (see Box 4), the resulting PCR products are prepared for covalent binding to the activated surface of the SUBSTRATE (Fig. 4). This step, too, should be quality-controlled by checking for the correct fragment length and by repeating several PCRs using gene-specific primers.
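The fragment-length check on the amplified inserts can be kept as a simple per-clone record. A minimal sketch, assuming band sizes have been estimated from a gel: the function name, the single-band rule and the 10 % tolerance are illustrative assumptions, not part of the protocol described above. Repeated PCRs could be checked by calling the function once per run and comparing the results.

```python
# Hypothetical acceptance rule (not from the text): a PCR product passes if it
# shows a single band within 10 % of the expected insert length.
REL_TOLERANCE = 0.10


def check_pcr_products(expected: dict[str, int],
                       observed: dict[str, list[int]],
                       rel_tol: float = REL_TOLERANCE) -> dict[str, str]:
    """Classify each clone as 'ok', 'wrong_size', 'multiple_bands' or 'no_product'.

    expected: clone name -> expected fragment length in bp
    observed: clone name -> band sizes in bp estimated from a gel
    """
    status = {}
    for clone, exp_len in expected.items():
        bands = observed.get(clone, [])
        if not bands:
            status[clone] = "no_product"
        elif len(bands) > 1:
            status[clone] = "multiple_bands"
        elif abs(bands[0] - exp_len) <= rel_tol * exp_len:
            status[clone] = "ok"
        else:
            status[clone] = "wrong_size"
    return status


if __name__ == "__main__":
    expected = {"clone1": 300, "clone2": 350, "clone3": 250, "clone4": 400}
    observed = {"clone1": [310], "clone2": [200], "clone3": [250, 600]}
    print(check_pcr_products(expected, observed))
    # {'clone1': 'ok', 'clone2': 'wrong_size', 'clone3': 'multiple_bands', 'clone4': 'no_product'}
```

Keeping the result per clone, rather than a single pass/fail, supports the accurate documentation that the text asks for during array production.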
BOX 4. ARRAY COATINGS

In the beginning, most of the coatings developed made use of the reactivity of the nucleobases and led to an unspecific crosslinking of the DNA to the surface, e.g., silyl (reactive aldehyde), silane (reactive amino groups), or polylysine [6]. These interactions can reduce the conformational freedom of the cDNAs and hence their affinity for complementary molecules in solution. In contrast, other surfaces were developed that allow covalent binding of the DNA to the surface via a specific linker attached to the 5' or 3' end of the DNA [9, 10]. End-specific covalent binding makes it possible to adjust the amount of attached cDNAs to an optimal density at which enough cDNAs are present but charge and steric effects are minimal.

FIGURE 4. SCHEME OF THE GENERATION OF A CDNA MICROARRAY STARTING FROM A GENE OF INTEREST

Spotting

Apart from in situ synthesis, the SPOTTING of cDNAs or pre-synthesised oligonucleotides is practised. Basically, there are two methods to spot probes: contact printing and non-contact printing. Both technologies are independent of the kind of probes and SUBSTRATES used.

CONTACT PRINTING typically involves rigid pins and is probably the most widely used method for the production of microarrays. There are various types of pins, but independently of their shape, the pins generally dip into the probes and print them onto the surface. NON-CONTACT PRINTING methods are based on ink-jet technology: the sample is dispensed as droplets from the print head.

Regardless of the SPOTTING technology, the SPOTTING process should be accompanied by ACCURATE documentation. The position of each gene has to be unequivocally traceable (see also Box 5).

BOX 5

For a highly reliable detection, especially of weakly expressed genes, probes should be spotted in replicates. To minimise the impact of possible spatial effects on the measured expression ratio, the replicates should not be arranged contiguously but should be distributed uniformly over the whole spotting area.

Application of microarrays

Preparation and quality of RNA

Even if it sounds trivial, the first crucial step to achieve reliable gene expression results is RNA isolation. If the RNA is slightly degraded or contaminated, the results may be biased and irreproducible (see also Box 6). Cells or tissue from which RNA is extracted are commonly lysed and subsequently extracted using organic solvents or silica filter-based methods. Since RNA extraction protocols may influence the outcome of the expression analysis, it is worthwhile to search in advance for an appropriate protocol in order to use it for all of the samples that are to be analysed in one batch of experiments.

BOX 6. QUALITY OF TOTAL RNA

Integrity and purity are the most critical factors for the quality of RNA.
- The ratio of 28S rRNA to 18S rRNA should be 2.
- The ratio of the absorbance at 260 nm to that at 280 nm should be between 1.8 and 2.0.
- The sample should preferably be treated with DNase to avoid contamination with genomic DNA.
- Protocols for RNA extraction have to be adapted to the analysed tissue (e.g., extremely fatty or fibrous tissue).
- The choice of the preparation protocol may influence the range of transcript lengths present in the extracted RNA (e.g., silica filters have a cut-off size of about 50–100 bases, so not the whole range of fragment lengths is present in preparations derived therefrom). This will have an impact on the subsequent steps (labelling or amplification).

Amount of RNA

The necessary amount of total RNA depends strongly on the labeling method and the kind of array. If an amplification method is used, the required amount of RNA is drastically reduced. Thus, it is hardly possible to draw a general conclusion. Roughly, amounts varying from 10 to 100 µg total RNA (which corresponds to 0.2–2 µg mRNA) are required without amplification, depending on the labeling method and the kind of array. If the amount of available RNA is limited, e.g. when biopsies or microdissections are analysed, the RNA can be subjected to a linear amplification delivering amplified RNA (aRNA). Basically, either PCR-based amplification methods or T7-driven in vitro transcription from double-stranded cDNA templates are commonly used (Fig. 5) [11].
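The numerical criteria of Box 6 and the amounts given above lend themselves to a quick check before an experiment. A minimal sketch: the A260/A280 range and the target 28S/18S ratio come from Box 6, and the ~2 % mRNA share is inferred from the quoted correspondence of 10–100 µg total RNA to 0.2–2 µg mRNA. The ±0.3 tolerance on the rRNA ratio, the use of 10 µg as a single cut-off for recommending amplification, and all names are assumptions made for illustration.

```python
from dataclasses import dataclass

# Criteria quoted in Box 6.
A260_A280_RANGE = (1.8, 2.0)
TARGET_28S_18S = 2.0
# Assumption: Box 6 only says the 28S/18S ratio "should be 2"; 0.3 is illustrative.
RATIO_TOLERANCE = 0.3
# From the text: 10-100 ug total RNA ~ 0.2-2 ug mRNA, i.e. mRNA is roughly 2 %.
MRNA_FRACTION = 0.02
# Simplification: lower end of the 10-100 ug range needed without amplification.
MIN_TOTAL_UG_WITHOUT_AMPLIFICATION = 10.0


@dataclass
class RnaSample:
    name: str
    total_ug: float       # total RNA in micrograms
    ratio_28s_18s: float  # 28S/18S rRNA ratio
    a260_a280: float      # absorbance ratio 260 nm / 280 nm


def estimated_mrna_ug(total_ug: float) -> float:
    """Rough mRNA content in micrograms, assuming ~2 % of total RNA."""
    return total_ug * MRNA_FRACTION


def qc_rna(sample: RnaSample) -> list[str]:
    """Return warnings for a total RNA sample; an empty list means it passes."""
    warnings = []
    lo, hi = A260_A280_RANGE
    if not lo <= sample.a260_a280 <= hi:
        warnings.append("A260/A280 outside 1.8-2.0: check purity")
    if abs(sample.ratio_28s_18s - TARGET_28S_18S) > RATIO_TOLERANCE:
        warnings.append("28S/18S ratio far from 2: check integrity")
    if sample.total_ug < MIN_TOTAL_UG_WITHOUT_AMPLIFICATION:
        warnings.append("less than 10 ug total RNA: consider linear amplification")
    return warnings


if __name__ == "__main__":
    biopsy = RnaSample("biopsy_1", total_ug=4.0, ratio_28s_18s=1.9, a260_a280=1.95)
    for w in qc_rna(biopsy):
        print(w)
    print(f"~{estimated_mrna_ug(biopsy.total_ug):.2f} ug mRNA")
```

For this hypothetical biopsy sample, purity and integrity pass and only the amount check fails, which corresponds to the situation in which the text suggests linear amplification.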
FIGURE 5. SCHEMATIC DIAGRAM OF MRNA AMPLIFICATION

Dyes, labeling and hybridisation methods

Besides radioactive labeling, mainly fluorescent dyes are used (see Box 7). Labeling using silver, gold or platinum particles, either as the LABEL itself or as an enhancer, has given good results but has not yet been broadly accepted. The dyes are incorporated either directly or indirectly. In direct dye incorporation (direct or one-step labeling), the markers are introduced by reverse transcription of mRNA using a dye-labeled deoxyribonucleoside triphosphate (dNTP), most commonly dCTP (see Fig. 2).

Since direct incorporation of the fluorescent dyes is often problematic due to steric effects, two-step labeling protocols (indirect labeling) have been established, which first introduce a (smaller) reactive compound into the cDNA that is then linked to fluorescent dyes in a second step. Some of these protocols increase the SENSITIVITY so that the amount of starting material can be lowered, but most of these indirect protocols are less reproducible, since they require more reactions (not only enzymatic but also chemical ones) and subsequent purification steps.

The hybridisation can be performed as a one-, two- or multi-colour experiment. In practice, mostly one- or two-colour protocols are applied. Using two colours, the sample (e.g. treated cells, pathological tissue) and the respective control (e.g. untreated cells, normal tissue) can be analysed simultaneously. A variety of fluorescent labels using different dyes with different extinction coefficients is on the market. In any case, Cy3 and Cy5 are so far the most common and accepted dyes.

The most cumbersome and critical step for array applications is the hybridisation step, where the
labeled TARGET DNA and the affixed probe DNA are brought together. For this purpose, fully or semi-automated hybridisation machines have been developed in recent years. All systems aim to facilitate the active movement of the labeled TARGET DNA during hybridisation and thereby to be independent of poor diffusion rates.

To ensure that the variance caused by labeling and hybridisation efficiency is minimal, repeated experiments using the same sample should be performed.

BOX 7

The labelling process can be monitored by positive controls, e.g. artificial cDNAs that are present at certain positions on the array. The respective in vitro transcripts can be spiked into each RNA sample. Thus, an effective labelling reaction leads to signals at the appropriate positions.

Control samples

The measurement of gene expression in a given sample always refers to the gene expression in other samples, in this text denominated as "control". Alterations in gene expression can only be properly assessed against appropriate control samples. As a consequence, it is very important to choose the right control in order to obtain valuable data. The best controls are obviously untreated cells or unaffected tissue of the same origin as the treated cells or affected tissue, respectively (e.g. peripheral blood mononuclear cells (PBMCs) derived from the same source and treated in vitro with phosphate-buffered saline (PBS) (control) or lipopolysaccharide (LPS) (sample)). Before starting a series of experiments, it should be ensured that the control samples are accessible during the whole study. If it is impossible to obtain control samples of the same origin, which may be the case when dealing with human material, it would be useful to establish an artificial common reference. This can be done, e.g., by pooling RNA from a set of different cell lines or from a (sufficient) number of control samples like normal tissue or untreated cell culture. Which way is best depends on factors like accessibility, reproducibility and coverage (for example, how many genes are detectable in the control channel). The latter should, of course, be determined before starting the experiments. If several samples from different sources serve as controls, e.g. from different laboratory animals at one time point, the respective RNA should be pooled before labeling, since the biological variation of the control samples could mimic differences between treatments. Pooling of the controls ensures that each individual treatment is compared to an identical control and that the observed differences can be ascribed to (biological) variation of the treatments.

Array data: acquisition, analysis and mining

Data acquisition

Data acquisition from microarray experiments consists of two parts: the digitisation of the fluorescence signals and the subsequent image analysis using appropriate software packages.

The scanning process is used to generate digitised images of the array. Two general types of scanners can be used:
Some systems are based on a white light source and charge-coupled device (CCD) camera detector technology. The array is exposed to the desired excitation wavelengths (filtered from the whole UV-VIS spectrum) while the CCD camera generates one picture at a time from different sections of the array until the whole array is mapped. The other systems are based on different lasers with excitation wavelengths specific for the dyes used. The images for the separate channels can be generated sequentially or simultaneously.

Additional differences may be due to the way the arrays are scanned. Some devices scan the arrays from the underside, which may be a disadvantage because even slight contaminations like dust particles can lead to ghost spots. This problem can be avoided by scanning from the top side of the slides.

For quantification of the signals, appropriate image analysis software should be used to facilitate the production of valid data output ("primary data", see Box 8).

BOX 8

Among the range of image analysis products, important quality features are:
• The ability to discriminate between valid and unwanted signals.
• The ability to flag empty and negative spots as well as spots of irregular shape or with other minor quality features, so that they can be excluded from further analysis.
• Algorithms for automated spot finding, an invaluable tool for quickly optimising the grid that defines the 'regions of interest' (ROIs = spots).
• The primary data should include information about the standard deviation of the spots and the background signals.

Data analysis and mining

Once the primary data have been generated, the ratios of gene expression levels have to be calculated and normalised. In order to include only valid signals in further analyses, a set of rules should be followed before starting the calculation of the expression ratios.

Only 'good quality control (QC) spots' should be processed. Spots of poor quality (empty or negative spots, irregular shape, see Box 8) should be excluded from further analyses. The minimum signal intensity of a spot has to be determined in order to distinguish unreliable spot signals from valid spots. This can be achieved by setting a minimum threshold for signal intensities that depends either on the background or on negative controls (e.g. spotting buffer only or fragmented genomic DNA). In this way, false positive results can be avoided. The BACKGROUND INTENSITIES should be subtracted from the signal intensities to obtain the net signal. Either the global background (the mean of the BACKGROUND INTENSITIES over the whole spotting area) or the local background (defined as an area surrounding the spot) may be used for background correction. But because local artefacts (contamination or poor hybridisation performance on defined parts of the array) affect the signal and the background similarly, the local background should be preferred. If dual labeling is used, ratios of the different dyes are computed for every spot. Systematic variations like differences in dye integration can be largely reduced by a process called "NORMALISATION" (see Box 9), which is performed during data analysis. If replicates are present, the resulting data are averaged.

Because of the multiparametric nature of microarray experiments, bioinformatics and data mining represent essential tools for interpreting the mass of numerical data produced by (series of) microarray experiments. Starting from relatively simple demands for appropriate visualisation of the data, bioinformatics tools are necessary to focus on candidate genes and to point out subtle changes in expression across many genes. Such expression patterns have predictive power but are difficult to spot.

Reliable identification of candidate genes by statistical methods often suffers from a limited number of replicate experiments. Since researchers are sometimes overwhelmed by the amount of data produced by a single experiment, they ignore the basic necessity of repeated assays. Biological replicates are essential in expression profiling, especially if subtle changes of
Examples of microarray experiments 207

BOX 9. NORMALISATION

The main idea of normalisation for dual-labelled samples is to adjust differences in the intensity of the two labels.Such
differences result from efficiency of dye integration, differences in amount of sample and label used, settings of laser
power and photo-multiplier. Normalisation of one-channel arrays mainly corrects spatial heterogeneity. Although nor-
malisation alone cannot control all systematic variations, normalisation plays an important role in the earlier stage of
microarray data analysis because expression data can significantly vary under different normalisation procedures. A
number of normalisation methods have been proposed, but it is not possible to decide in principle which method per-
forms best. The normalisation method strongly depends on several factors like the number of detectable genes, the
number of regulated genes, signal intensities, quality of the hybridisation, etc.
For a rough classification global normalisation can be distinguished from local (signal-intensity-dependent) normali-
sation.
If global normalisation is used, a single normalisation factor is applied to all detectable genes, leading to a linear shift
of all signal intensities. Global normalisation methods are the most widely used methods.The underlying assumption
is that constant systematic variations occur,e.g.,the lower integration rate of one dye in respect to the second dye.Using
the median of the single spot ratios for global normalisation is only advisable if a sufficient number of the detected
genes are not regulated.If it is expected that nearly all of the genes will be regulated (which is of special interest regard-
ing small arrays) a set of housekeeping genes should be included in the array configuration. Because housekeeping
genes (by definition) are not regulated,the signal intensities of those genes should be the same on dual-labelled arrays.
The housekeeping genes have to be picked thoroughly since even so-called housekeeping genes are subject to regula-
tion under particular conditions.
Using local normalisation, a different normalisation factor is calculated for every gene. Local normalisation offers the
opportunity of a signal-intensity-dependent normalisation. Some variations (e.g., laser settings) have different impacts
on detected genes depending on their signal intensity.Thus, a non-linear shift of the signal intensities can be achieved
in reliance on the signal intensity of each single spot.
Several linear and non-linear approaches are applied to the normalisation of DNA microarrays, and new normalisation methods are still under development. Because there is no general answer to the question of which algorithm best meets the requirements, experimenters have to examine their data very carefully to decide which normalisation method to choose.
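One simple way to examine the data before choosing a method is to check whether the log ratios still depend on signal intensity. The sketch below uses a hypothetical function name and an arbitrary split of the spots into a dimmer and a brighter half; it returns the difference between the median log2 ratios of the two halves. A value close to zero is consistent with a single global factor being sufficient, whereas a clearly non-zero value points to an intensity-dependent bias. This is an illustrative check assumed here, not a procedure prescribed by the chapter.

```python
import math
from statistics import median


def intensity_trend(cy5, cy3):
    """Median log2(Cy5/Cy3) of the brighter half of the spots minus that of
    the dimmer half. Hypothetical diagnostic; needs at least two spots."""
    if len(cy5) < 2:
        raise ValueError("need at least two spots")
    m = [math.log2(r / g) for r, g in zip(cy5, cy3)]
    a = [0.5 * math.log2(r * g) for r, g in zip(cy5, cy3)]
    order = sorted(range(len(m)), key=lambda i: a[i])  # dim -> bright
    half = len(order) // 2
    dim = [m[i] for i in order[:half]]
    bright = [m[i] for i in order[half:]]
    return median(bright) - median(dim)


# Hypothetical spots with a twofold Cy5 bias in the dim spots only:
print(intensity_trend([20, 40, 1000, 2000], [10, 20, 1000, 2000]))   # -1.0
# Hypothetical spots with the same twofold bias at every intensity:
print(intensity_trend([200, 400, 800, 1600], [100, 200, 400, 800]))  # 0.0
```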

gene expression shall be used to define, e.g., disease states or to distinguish substances by means of their impact on defined cell populations. Based on an appropriate amount of data, classification can be performed by using statistical methods to identify genes characterising experiment classes.
Additional bioinformatics methods can be used to identify groups of interesting genes. One commonly used method is hierarchical cluster analysis, by which genes and arrays can be ordered by similarity in expression behaviour [12] (Fig. 6C). To screen these results semi-automatically, bioinformatics infrastructures may be used that integrate the knowledge stored in diverse databases, such as pathway information, genomic localisation or protein family classification.

Examples of microarray experiments

General considerations

There is no doubt that all changes in cellular processes are reflected by altered gene expression. Changes in mRNA levels can be measured in parallel with high reliability and reproducibility, since the differences in physicochemical properties between mRNAs arising from different genes are
negligible compared with those between the respective gene products.
However, because alterations in gene expression are not necessarily accompanied by proportional changes in either protein synthesis or protein activity, it is necessary to consider whether expression analysis should be followed by protein analysis methods once the expression ratios have been investigated. Combining such traditional technologies with gene expression analysis provides a comprehensive view for elucidating molecular mechanisms of action, e.g. in understanding particular phenotypes, characterising the outcome of treatments or defining disease states.
Some general considerations concerning the experimental design, the configuration of an adequate array and the handling of the data have to be reviewed in advance.
Questions such as sample size (to obtain the appropriate amount of RNA), number of samples and applicable controls (paragraph 3.4) have to be clarified. The preparation and handling of samples and RNA should follow STANDARD OPERATING PROCEDURES. In this way it is possible to avoid measuring differences in gene expression that are only artefacts caused by the comparison of inappropriate samples.
Another often-discussed question is the necessary number of genes per array. In contrast to MEDIUM-TO-LOW-DENSITY ARRAYS, on which several hundred to some thousand mostly topic-defined probes are spotted, some manufacturers supply 'FULL GENOME ARRAYS' (which have to be updated regularly). Which array format fits best depends on the focus of the experiments, although it should be borne in mind that the higher the number of genes on an array, the more difficult it becomes to identify specific expression patterns characterising disease states or treatments. The noise caused by genes that are detectably expressed but only slightly regulated hampers, for example, meaningful cluster analysis.
Given the relatively high number of experiments and the volume of data, it is necessary to think about how to organise and interpret all the data. The bare expression ratios call for scientific interpretation, which is not possible until additional information, at least on the 'history' of each sample, is available. To provide an optimal tool for assessing the meaning of the expression profiling, it is advisable to generate databases holding both the experimental data concerning the probe collection and the hybridisation results, linked to additional data sources (e.g. Swiss-Prot, UniGene, etc.) in a way that allows bi-directional queries.
It is best to consider questions about data storage in advance, since the data collected during probe generation (which may be missing if one tries to collect them retrospectively) should match the planned statistical interpretation. General information about the requirements of microarray experiments that ensure that microarray data can be easily interpreted and that results derived from their analysis can be independently verified has been compiled in the minimum information about a microarray experiment (MIAME) proposal [13].

A practical example of a mode-of-action study: in vitro/in vivo correlation of the causal relationship between cigarette smoking and higher incidence of carcinomas

Little is known in detail about the mechanisms leading to a higher incidence of carcinomas as a result of regular inhalation of cigarette smoke. An in vitro time-course experiment was performed with Swiss 3T3 cells exposed to cigarette smoke (CS). After treatment with an aqueous solution of CS, RNA was isolated at different times spanning 0.5 to 24 hours. RNA from untreated cells was prepared at the same time points and used as control. Expression profiles for every time point were generated using glass arrays representing 513 selected cDNAs. The RNA was labeled with either Cy5 (CS-exposed cells) or Cy3 (control cells); the labeling takes place during the reverse transcription (RT) reaction of the RNA. Respective samples (Cy5/Cy3 pairs from CS-exposed cells and controls) were pooled, cleaned and hybridised on the cDNA array. When hybridisation was finished, the arrays were read out by a laser scanner, generating an image for each of the fluorescent dyes. For the subsequent analysis the images were overlaid digitally. Red spots represent spots with a higher intensity in the Cy5 channel ("overexpressed" genes), green spots represent spots with a higher intensity in the Cy3 channel ("repressed" genes), and yellow spots had the same intensity in both channels, i.e., the gene was expressed similarly in both samples.

FIGURE 6. GENE EXPRESSION ANALYSES OF SWISS 3T3 CELLS EXPOSED TO CIGARETTE SMOKE
A. Digital false colour overlays.
B. Scatterplots of computed signal intensities (Cy5/Cy3).
C. Vertical cluster analysis of similarity of gene expression.

Comparing the expression ratios from treated and untreated cells at different time points revealed a pro-inflammatory response that could be tracked for up to eight hours after treatment. After 24 hours, the GENE EXPRESSION PROFILES of the cells had returned to their initial levels. The digital false colour overlay (Fig. 6A) allows the quality to be examined in terms of the consistency of the hybridisation over the whole spotting area or of contamination by dust or smears.
Changes in gene expression during the time course can be tracked by scatterplots derived from the expression ratios of every time point (Fig. 6B). Every detectable gene is represented by a point. Most of the detected genes are not influenced by the treatment and are therefore located on the bisector. Genes repressed as a result of treatment are located to the right of the bisector, whereas genes induced by the treatment can be identified to the left of it. Replicates of genes are represented in groups of four located close together, which is also an indication of quality. From a kinetic point of view, the stress response could be characterised by the synchronised upregulation of antioxidant pathways orchestrated by stress-responsive transcription factors. In addition to the regulation of expected antioxidative and genotoxic/cell-cycle-regulatory genes (e.g., heme oxygenase-1, metallothioneins and a number of heat shock proteins), several genes not previously linked to a CS-evoked stress response were identified [14]. Interestingly, some of the most strongly upregulated genes are known to be responsive to peroxynitrite-induced oxidative stress. Furthermore, genes involved in the inflammatory response and immune modulation were found to be upregulated. These findings, even though derived from in vitro studies, contribute to our understanding of the mechanisms leading to CS-related disorders in vivo.

Summary

DNA MICROARRAYS are miniaturised devices made for the analysis of ribonucleic acids by hybridisation. In contrast to other hybridisation-based technologies (e.g. Northern blot), they allow the expression profiles of several hundred to thousands of genes to be analysed in parallel. In this way, differences in gene activity between two different states of cells or tissue, e.g. treated versus untreated or diseased versus non-diseased, can be described. DNA MICROARRAYS differ in the dispensed probes, the SUBSTRATE (solid phase), the labeling procedure and the manufacturing process. For a rough classification, cDNA ARRAYS can be distinguished from OLIGONUCLEOTIDE ARRAYS. Depending on the objective of the experiment and the effort one is willing to invest, the amount of sequence information available for the DNA probes may differ enormously. To get a first overview of the expression levels in a given system, it may be sufficient to spot poorly characterised sequences; in-depth sequence analysis and annotation can then be postponed until hybridisation has been performed and the genes of interest have been identified. If ACCURATE identification and quantification of particular mRNA species are desired, time and effort have to be invested to ensure the SPOTTING of sequence-verified cDNAs. Various materials such as membranes, plastics or glass may serve as SUBSTRATES for arrays. To guarantee sufficient quality, some general features such as planarity, uniformity, mechanical and chemical stability and optimal DNA binding capacity are crucial. Apart from in situ synthesis of oligonucleotides, the SPOTTING of cDNAs or pre-synthesised oligonucleotides is practised, using either contact or NON-CONTACT PRINTING. The amount of total RNA required for hybridisation cannot be stated in general; it depends strongly on the labeling method and the type of array. Roughly 10 to 100 µg of total RNA are required. If an amplification method is used, the required amount of RNA is drastically reduced.
A broad spectrum of dyes and labeling methods is also in use. Mainly fluorescent dyes are used, mostly Cy5 and Cy3, and the dyes are incorporated either directly or indirectly (two-step labeling). Data acquisition from microarray experiments consists of two parts: the digitalisation of the signals using scanning devices, and the subsequent image analysis using appropriate software packages to quantify the signals and output valid primary data.
Having generated the primary data, the ratios of gene expression levels have to be calculated and normalised. Beyond the subsequent interpretation of the expression ratios, further bioinformatics methods can be used to identify groups of interesting genes.

Selected readings

Schena M (ed) (1999) DNA Microarrays: A Practical Approach. Oxford University Press, Oxford, UK
Berrar DP, Dubitzky W, Granzow M (eds) (2002) A Practical Approach to Microarray Data Analysis. Kluwer Academic Publishers, Boston/Dordrecht/London
Nature Genetics (2002) The Chipping Forecast II. Nat Genet 32 (issue 4, suppl): 461–552

Recommended websites

Microarray Gene Expression Data Society (MGED Society): Minimum information about a microarray experiment (MIAME): http://www.mged.org/Workgroups/MIAME/miame.html (Accessed April 2005)
National Center for Biotechnology Information: Gene Expression Omnibus: http://www.ncbi.nlm.nih.gov/geo (Accessed April 2005)
UniGene: http://www.ncbi.nlm.nih.gov/UniGene (Accessed April 2005)
Genetic Information Research Institute: Repbase: http://www.girinst.org (Accessed April 2005)
Swiss-Prot: Curated protein sequence database: http://www.expasy.org/sprot/ (Accessed April 2005)

References

1 Southern EM (1975) Detection of specific sequences among DNA fragments separated by gel electrophoresis. J Mol Biol 98: 503–517
2 McGall GH, Fidanza JA (2001) Photolithographic synthesis of high-density oligonucleotide arrays. Methods Mol Biol 170: 71–101
3 Pease AC, Solas D, Sullivan EJ, Cronin MT, Holmes CP, Fodor SP (1994) Light-generated oligonucleotide arrays for rapid DNA sequence analysis. Proc Natl Acad Sci USA 91: 5022–5026
4 Hacia JG, Collins FS (1999) Mutational analysis using oligonucleotide microarrays. J Med Genet 36: 730–736
5 Hacia JG, Fan JB, Ryder O (1999) Determination of ancestral alleles for human single-nucleotide polymorphisms using high-density oligonucleotide arrays. Nat Genet 22: 164–167
6 Okamoto T, Suzuki T, Yamamoto N (2000) Microarray fabrication with covalent attachment of DNA using bubble jet technology. Nat Biotechnol 18: 438–441
7 Lockhart DJ, Dong H, Byrne MC, Follettie MT, Gallo MV, Chee MS, Mittmann M, Wang C, Kobayashi M, Horton H et al (1996) Expression monitoring by hybridization to high-density oligonucleotide arrays. Nat Biotechnol 14: 1675–1680
8 Tomiuk S, Hofmann K (2001) Microarray probe selection strategies. Brief Bioinform 2: 329–340
9 Bosio A, Stoffel W, Stoffel M (1999) Device for the parallel identification and quantification of polynucleic acids. In: MEMOREC Stoffel GmbH: EP0965647
10 O'Donnell-Maloney MJ, Smith CL, Cantor CR (1996) The development of microfabricated arrays for DNA sequencing and analysis. Trends Biotechnol 14: 401–407
11 Eberwine J (1996) Amplification of mRNA populations using aRNA generated from immobilized oligo(dT)-T7 primed cDNA. Biotechniques 20: 584–591
12 Eisen MB, Spellman PT, Brown PO, Botstein D (1998) Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci USA 95: 14863–14868
13 Brazma A, Hingamp P, Quackenbush J, Sherlock G, Spellman P, Stoeckert C, Aach J, Ansorge W, Ball CA, Causton HC et al (2001) Minimum information about a microarray experiment (MIAME)-toward standards for microarray data. Nat Genet 29: 365–371
14 Bosio A, Knorr C, Janssen U, Gebel S, Haussmann HJ, Muller T (2002) Kinetics of gene expression profiling in Swiss 3T3 cells exposed to aqueous extracts of cigarette smoke. Carcinogenesis 23: 741–748
