Analysis of Bead-Level Data Using Beadarray: Mark Dunning
Mark Dunning
Introduction
beadarray is a package for the pre-processing and analysis of Illumina BeadArray
data. Its main advantage is the ability to read the raw data output by Illumina’s
scanning software. Data presented in this form are in the same format regardless
of the assay (e.g. expression, genotyping, methylation) being performed. Thus,
beadarray is able to handle all these types of data, and many functions within
beadarray have been written to cope with this flexibility.
The BeadArray technology involves randomly arranged arrays of beads; the set of
beads with the same probe sequence attached is colloquially known as a bead-type.
BeadArrays are combined in parallel on either a rectangular chip (BeadChip) or a
matrix of 8 by 12 hexagonal arrays (Sentrix Array Matrix or SAM). The BeadChip
is further divided into strips on the surface known as sections, with each
section giving rise to a different image when scanned by BeadScan. These images,
and associated text files, comprise the raw data for a beadarray analysis. For
BeadChips, the number of sections assigned to each biological sample varies:
1 on HumanHT12 chips, 2 on HumanWG6 chips, and sometimes ten or more on SNP
chips where large numbers of SNPs are being investigated.
This vignette demonstrates the processing of bead-level data with beadarray,
using data from the beadarrayExampleData package. A more comprehensive
commentary on the analysis of Illumina BeadArray data is given in the vignette
of BeadArrayUseCases, which also covers analysis tools that are not part of
beadarray.
library(beadarrayExampleData)
library(beadarray)
data(exampleBLData)
Citing beadarray
If you use beadarray for the analysis or pre-processing of BeadArray data please
cite:
Dunning MJ, Smith ML, Ritchie ME, Tavaré S, beadarray: R classes and
methods for Illumina bead-based data, Bioinformatics, 23(16):2183-2184
The command to read bead-level data from the current working directory is as
follows. However, raw data are not included with beadarrayExampleData or
beadarray. See the BeadArrayUseCases package for some example data to try out
this function.
BLData = readIllumina(useImages=FALSE, illuminaAnnotation = "Humanv3")
The useImages argument specifies whether beadarray will read foreground and
background intensities from the TIFF images present in the directory, allowing
users to experiment with strategies for image processing. Such strategies are
described in greater detail in the imageProcessing.pdf vignette. In this example
we set useImages=FALSE (often a convenient choice), so the locally
background-corrected intensities are simply extracted from the txt files. The
of flat-files and we have built Bioconductor packages that can be accessed from
within beadarray. In order that the correct mappings are performed, users must
specify an annotation name for their data, which requires knowing the organism
being investigated and the annotation revision number (e.g. Humanv4, Humanv3,
Humanv2, Humanv1, Mousev2, Mousev1p1, Mousev1 or Ratv1). The suggestAnnotation
function may be used if you are unsure of which string to use. It checks the
overlap between the bead IDs found in the data and a collection of IDs extracted
from Illumina’s annotation files. For the example data stored in
beadarrayExampleData, the suggested annotation is Humanv3. Provided that the
illuminaHumanv3.db package is installed, beadarray will be able to annotate the
exampleBLData object.
suggestAnnotation(exampleBLData,verbose=TRUE)
## Percentage of overlap with IDs on this array and known expression platforms
## HUMANREF8_V3_0_R1_11282963_A_WGDASL        HumanHT12_V3_0_R3_11283641_A
##                            48.78621                            97.16521
##        HumanHT12_V4_0_R1_15002873_B        HumanHT12_V4_0_R2_15002873_B
##                            83.70782                            84.05715
## HumanHT12_V4_0_R2_15002873_B_WGDASL                        HumanRef8_V1
##                            54.68237                            37.33872
##        HumanRef8_V2_0_R2_11223162_A        HumanRef8_V2_0_R4_11223162_A
##                            38.88553                            38.88470
##        HumanRef8_V3_0_R0_11282963_A        HumanRef8_V3_0_R3_11282963_A
##                            48.78621                            48.78621
##                         HumanWG6_V1         HumanWG6_V2_0_R2_11223189_A
##                            37.33872                            81.32701
##         HumanWG6_V2_0_R4_11223189_A              HumanWG6_V2_11223189_B
##                            81.32619                            81.32619
##         HumanWG6_V3_0_R3_11282955_A                        MouseRef8_V1
##                            97.16521                            36.67984
##        MouseRef8_V1_1_R4_11234312_A        MouseRef8_V2_0_R3_11278551_A
##                            38.10629                            43.77615
##                         MouseWG6_V1         MouseWG6_V1_1_R4_11234304_A
##                            36.67984                            38.10629
##                       MouseWG6_V1_B         MouseWG6_V2_0_R3_11278593_A
##                            36.58768                            78.45060
##         RatRef12_V1_0_R5_11222119_A
##                            36.06644
## [1] "Humanv3"
annotation(exampleBLData) <-"Humanv3"
The verbose output of suggestAnnotation shows high overlap between the
ArrayAddress IDs in the exampleBLData object and the ArrayAddress IDs in the
official annotation files HumanHT12_V3_0_R3_11283641_A and
HumanWG6_V3_0_R3_11282955_A. HT12 and WG6 arrays carry the same probe sequences
and differ only in the number of sections on a chip. Hence, we can assign the
Humanv3 label to data from either platform.
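The overlap check can be illustrated with a small base-R sketch. This is not the
package's implementation: percentOverlap is a hypothetical helper, the IDs and
platform names are toy values, and measuring overlap as the share of the array's
IDs found in each annotation is an assumption.

```r
# Hypothetical helper: percentage of the array's bead IDs that appear in each
# candidate annotation (supplied as a named list of ID vectors).
percentOverlap <- function(arrayIDs, annotationIDs) {
  ids <- unique(arrayIDs)
  sapply(annotationIDs, function(known) 100 * mean(ids %in% known))
}

# Toy IDs for illustration only
arrayIDs <- c(10008, 10010, 10017, 10019, 10020)
candidates <- list(
  platformA = c(10008, 10010, 10017, 10019, 10020, 10021),
  platformB = c(10008, 10010, 99999)
)

overlap <- percentOverlap(arrayIDs, candidates)
best <- names(which.max(overlap))  # candidate with the highest overlap
```

Two platforms that share the same probes will score equally under such a
measure, which is why the HT12 and WG6 files tie in the output above.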
## [1] "beadLevelData"
## attr(,"package")
## [1] "beadarray"
slotNames(exampleBLData)
## [1] 355 377 452 267 431 357 408 431 351 235
## [1] 10008 10010 10017 10019 10020 10021 10025 10035 10037 10039
4 Scan Metrics
A first view of array quality can be obtained from the metrics calculated by
the scanner. These include the 95th (P95) and 5th (P05) quantiles of all pixel
intensities on the image. A signal-to-noise ratio (SNR) can be calculated as the
ratio of these two quantities. These metrics can be viewed in real time as the
arrays themselves are being scanned. By tracking these metrics over time, one
can potentially halt problematic experiments before they even reach the analysis
stage. The metrics information for the exampleBLData object can be retrieved in
the following way. Illumina recommend that the SNR should be above 10, so these
arrays are acceptable. However, the P95 and P05 values will fluctuate over time
and are dependent upon the scanner setup. Including SNR values for arrays other
than those currently being analysed will give a better indication of whether any
outlier arrays exist.
metrics(exampleBLData)
p95(exampleBLData, "Grn")
snr(exampleBLData, "Grn")
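To make the definition concrete, the following base-R sketch computes the same
three quantities from simulated pixel intensities. The simulated values and the
variable names are assumptions made for illustration; real scanner metrics are
read from the metrics file rather than recomputed like this.

```r
set.seed(42)

# Simulated pixel intensities: mostly dim background plus a brighter tail
pixels <- c(rnorm(9000, mean = 100, sd = 10),
            rnorm(1000, mean = 2000, sd = 200))

p95Value <- unname(quantile(pixels, 0.95))  # 95th quantile of intensities
p05Value <- unname(quantile(pixels, 0.05))  # 5th quantile of intensities
snrValue <- p95Value / p05Value             # signal-to-noise ratio

# Compare against the recommended threshold of 10
passesThreshold <- snrValue > 10
```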
5 Transformation Functions
A more flexible way to obtain per-bead data from a beadLevelData object is
to define a transformation function that takes as arguments the beadLevelData
object and an array index. The function then manipulates the data in the desired
manner and returns a vector of the same length as the number of beads on the
array. The logGreenChannelTransform is the default transformation in many
plotting / QA functions within beadarray. Users with two-channel data may
also wish to experiment with the similarly defined logRedChannelTransform or
logRatioTransform when plotting.
log2(exampleBLData[[1]][1:10,2])
logGreenChannelTransform
## <environment: namespace:beadarray>
logGreenChannelTransform(exampleBLData, array=1)[1:10]
logRedChannelTransform
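A user-defined transformation follows the same pattern. The sketch below is
illustrative only: sqrtGreenTransform is a hypothetical name, and it assumes
that getBeadData accepts array and what = "Grn" arguments, as in current
releases of beadarray.

```r
# Hypothetical custom transformation: square root of the green channel.
# Takes a beadLevelData object and an array index, returns one value per bead.
sqrtGreenTransform <- function(BLData, array) {
  x <- beadarray::getBeadData(BLData, array = array, what = "Grn")
  sqrt(pmax(x, 0))  # negative background-corrected values are floored at 0
}

# Only run against the example data if both packages are installed
if (requireNamespace("beadarray", quietly = TRUE) &&
    requireNamespace("beadarrayExampleData", quietly = TRUE)) {
  data("exampleBLData", package = "beadarrayExampleData")
  head(sqrtGreenTransform(exampleBLData, array = 1))
}
```

A function of this shape should be usable wherever a transformation such as
logGreenChannelTransform is accepted.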
code produces imageplots for all array-sections in the example dataset. Note
that we also change the colour scheme to represent low and high intensities by
light and dark green respectively.
If .locs information is available to beadarray, it will be able to determine the
optimal squareSize parameter. If not (as with our example dataset), the user
may have to experiment with different values for squareSize.
imageplot(exampleBLData, array=1, low="lightgreen", high="darkgreen")
7 BASH
BASH is a method for managing the spatial artefacts that may be found on an
array, as described in Cairns et al (2008). BASH uses the methodology developed
for the Harshlight package, altered to exploit the availability of replicated
observations on the same array. The algorithm first identifies Extended Defects,
where an array has gradual but significant shifts in intensity across the
surface. BASH also seeks to find more localized artefacts by classifying
features that have unusual intensities as outliers and then finding outliers
close to each other on the array. Two separate algorithms then search for areas
with a larger number of outliers than would be expected by chance (Diffuse
Defects) and for large connected clusters of outliers (Compact Defects). The
random nature of Illumina arrays (both in the position and the number of each
feature type) means that BASH must proceed differently from the original
Harshlight implementation. Whereas Affymetrix probes have replicates on other
arrays, Illumina beads are replicated on the same array. We can therefore
generate an error image based on how much each bead differs from the median of
its replicates’ intensities, instead of replicates on other arrays. Having
performed manipulations to the error image, we can then find outliers on this
image by bead type, determining which beads are more than 3 Median Absolute
Deviations, or MADs, from the median.
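The error image and the 3 MAD rule can be sketched in base R on simulated data.
This is a simplified illustration, not the BASH implementation: it ignores bead
positions and the intermediate manipulations of the error image, all variable
names are hypothetical, and it uses R's mad(), which applies a scaling constant
of 1.4826 by default.

```r
set.seed(1)

# Simulated log-intensities: 50 bead types with 20 replicate beads each
beadType <- rep(1:50, each = 20)
logInt <- rnorm(length(beadType), mean = 8 + beadType / 10, sd = 0.2)

# Contaminate three beads (each in a different bead type)
bad <- c(5, 400, 777)
logInt[bad] <- logInt[bad] + 5

# Error image: deviation of each bead from the median of its replicates
errorImage <- logInt - ave(logInt, beadType, FUN = median)

# Flag beads more than 3 MADs from the median, separately for each bead type
isOutlier <- function(e) abs(e - median(e)) > 3 * mad(e)
outlier <- as.logical(ave(errorImage, beadType, FUN = isOutlier))

table(outlier)
```

In BASH itself the spatial arrangement of such outliers matters, because it is
clusters of outliers, rather than isolated ones, that indicate a defect.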
Finally, since Illumina arrays are randomly arranged and use a hexagonal grid
rather than a rectangular one, BASH has its own method for creating networks of
beads on the array. However, if .locs files are available to beadarray, the time
taken for this step will be reduced considerably.
The following command can be used to run BASH with the default settings:
bsh = BASH(exampleBLData, array=1:2)
We have already saved the weights into the exampleBLData object and they can
be retrieved in the following way. A weight of zero means that the bead will
be excluded from any outlier calculations or summarization procedures.
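One possible way to tabulate the stored weights is sketched here. It assumes
that the weights are held in a per-bead column named "wts" and that getBeadData
can extract it; weightTable is a hypothetical helper, and the column name should
be checked against your own object.

```r
# Hypothetical helper: count beads by weight on one array-section.
# Assumes the weights are stored in a per-bead column called "wts".
weightTable <- function(BLData, array) {
  table(beadarray::getBeadData(BLData, array = array, what = "wts"))
}

# Only run against the example data if both packages are installed
if (requireNamespace("beadarray", quietly = TRUE) &&
    requireNamespace("beadarrayExampleData", quietly = TRUE)) {
  data("exampleBLData", package = "beadarrayExampleData")
  weightTable(exampleBLData, array = 1)
  weightTable(exampleBLData, array = 2)
}
```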
##
## 0 1
## 13380 1074989
##
## 0 1
## 143923 956850
9 Summarization
The summarization procedure takes the BLData object, where each bead-type
is represented by differing numbers of observations on each array, and produces
a summarized object that allows comparisons to be made between arrays. For each
array section represented in the BLData object, all observations are extracted,
transformed, and then grouped together according to their ArrayAddressID.
Outliers are removed and the mean and standard deviation of the remaining beads
are calculated.
The illuminaChannel class is used to define how summarization proceeds, by
specifying a transformation function, a function to remove outliers, and
functions to calculate the mean and standard deviation. The default options
to summarize apply a log2 transformation, remove outliers using the Illumina
3 MAD cut-off, and report the mean and standard deviation for each bead type.
BSData <- summarize(exampleBLData)
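The default procedure can be mimicked on toy data with base R. This sketch only
illustrates the steps described above (transform, group by ArrayAddressID,
remove outliers, average). It is not the code used by summarize: the simulated
intensities and the summarizeBeads helper are hypothetical, and the exact form
of the 3 MAD rule is an assumption.

```r
set.seed(7)

# Toy bead-level data: three bead types with differing numbers of beads
beads <- data.frame(
  ArrayAddressID = rep(c(10008, 10010, 10017), times = c(30, 25, 40)),
  Grn = 2^rnorm(95, mean = 9, sd = 0.3)
)
beads$Grn[1] <- 2^16  # one artificially bright bead

# Hypothetical helper: log2-transform, drop beads more than 3 MADs from the
# bead-type median, then report mean, sd and the number of beads retained
summarizeBeads <- function(id, intensity) {
  x <- log2(intensity)
  perType <- lapply(split(x, id), function(v) {
    keep <- abs(v - median(v)) <= 3 * mad(v)
    c(mean = mean(v[keep]), sd = sd(v[keep]), nObservations = sum(keep))
  })
  do.call(rbind, perType)
}

beadSummary <- summarizeBeads(beads$ArrayAddressID, beads$Grn)
beadSummary
```

Each row of the result corresponds to one bead type, which is the shape of the
data held for each array in the summarized object.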
The code below defines the summary functions for a different summarized object:
one which reports the median and MAD (in place of the mean and standard
deviation) and does not log-transform the data.
myMedian <- function(x) median(x, na.rm=TRUE)
myMad <- function(x) mad(x, na.rm=TRUE)
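A sketch of how these two functions might be wired into a channel definition
follows. The constructor argument order (transformation, outlier function,
average, spread, channel name) and the use of greenChannelTransform and
illuminaOutlierMethod are assumptions about the beadarray API that should be
checked against the package documentation. greenChannel2 and BSData2 are
illustrative names.

```r
myMedian <- function(x) median(x, na.rm = TRUE)
myMad <- function(x) mad(x, na.rm = TRUE)

# Only attempt the summarization if both packages are installed
if (requireNamespace("beadarray", quietly = TRUE) &&
    requireNamespace("beadarrayExampleData", quietly = TRUE)) {
  library(beadarray)
  data("exampleBLData", package = "beadarrayExampleData")

  # Assumed argument order: transformation, outlier function,
  # averaging function, spread function, channel name
  greenChannel2 <- new("illuminaChannel", greenChannelTransform,
                       illuminaOutlierMethod, myMedian, myMad, "G")

  # Summarize using the custom (untransformed, median/MAD) channel
  BSData2 <- summarize(exampleBLData, list(greenChannel2))
}
```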
The BSData object is very similar to the ExpressionSet class in Biobase.
However, to accommodate the unique features of Illumina data we have added an
nObservations slot, which gives the number of beads used to create the summary
values for each bead-type on each array after outlier removal.
BSData
## total)
## fvarLabels: ArrayAddressID IlluminaID Status
## fvarMetadata: labelDescription
## experimentData: use 'experimentData(object)'
## Annotation: Humanv3
## QC Information
## Available Slots:
## QC Items: Date, Matrix, ..., SampleGroup, numBeads
## sampleNames: 4613710017_B, 4616494005_A
##
head(det)
## 4613710017_B 4616494005_A
## ILMN_1802380 0.00000000 0.00000000
## ILMN_1893287 0.27309237 0.43658211
## ILMN_1736104 0.55555556 0.73564753
## ILMN_1792389 0.00000000 0.00000000
## ILMN_1854015 0.05756359 0.01869159
## ILMN_1904757 0.21686747 0.40987984
sessionInfo()