Atropos
Brian B. Avants1∗ , Nicholas J. Tustison2∗ , Jue Wu1 , Philip A. Cook1 , and James C. Gee1
1 Penn Image Computing and Science Laboratory, University of Pennsylvania, Philadelphia, Pennsylvania, USA.
2 Department of Radiology, University of Virginia, Charlottesville, Virginia, USA
1 http://www.picsl.upenn.edu/ANTs
1 Introduction
As medical image acquisition technology has advanced, significant investment has been made to-
wards adapting classification techniques for neuroanatomy. Early work appropriated NASA satellite
image processing software for statistical classification of head tissues in 2-D MR images [1]. A pro-
liferation of techniques ensued with increasing sophistication in both core methodology and degree
of refinement for specific problems. The chronology of progress in segmentation may be tracked
through both technical reviews [2–9] and evaluation studies [e.g. 10–13].
The problem of accurately delineating the white matter, grey matter and cerebrospinal fluid
(and subdivisions) of the human brain continuously spurs technical development in segmentation.
Following [1], many researchers adopted statistical methods for n-tissue anatomical brain segmen-
tation. The Expectation-Maximization (EM) framework is natural [14] given the “missing data”
aspect of this problem. The work described in [15] was one of the first to use EM for finding a
locally optimal solution by iterating between bias field estimation and tissue segmentation. A core
component of this work was explicit modeling of the tissue intensity values as normal distributions
[16] for both 2-D univariate simulated data and T1 coronal images, which continues to find utility in
model, also influenced by earlier work [17], where Parzen windowing is used to model the tissue
intensity distribution omitting consideration of the underlying bias field. Although technically not
an EM-based algorithm, the robustness of the latter has motivated its continued use even more
Subsequent development included the use of Markov Random Field (MRF) modeling [19] to
regularize the classification results [20] with later work adding heuristics concerning neuroanatomy
to prevent over-regularization and the resulting loss of fine structural details [21, 22]. A more
formalized integration of generic MRF spatial priors was employed in the work of [23], commonly
referred to as FAST (FMRIB’s Automated Segmentation Tool), which is in widespread use given
its public availability and good performance. More recently, a uniform distribution of local MRFs
within the brain volume and their subsequent integration into a global solution has been proposed
Several initialization strategies have been proposed to overcome the characteristic susceptibility
of EM algorithms to local optima. Common low-level initialization steps include uniform prob-
ability assignment [15], Otsu thresholding [23], and K-means clustering [25]. More sophisticated
low-level initialization schemes include that of [26] in which a dense spatial distribution of Gaussians
is used to capture the complex neuroanatomical layout with subsequent processing used to conjoin
subsets of such Gaussians belonging to the same tissue classes. Recently, researchers have begun
to rely on spatial prior probability maps of anatomical structures of interest to encode domain
knowledge [22, 27, 28]. These spatial prior probability maps may also provide an initial
segmentation. Related technological developments model partial volume effects for increased accuracy in
A general trend towards more integrative neuroanatomical image processing led to the work
described in [28] which is publicly available within SPM5, a large-scale Matlab module in which
registration, segmentation, and bias field correction can be simultaneously modeled within a single
optimization scheme. The roots of this very popular software package stem back to early work
by Karl Friston which laid the basis for statistical parametric mapping [32]. Similar integrative
brain processing was provided in [33] in which segmentation and registration parameters were
optimized simultaneously while casting the inhomogeneity model parameters of [15] as nuisance
variables. Continued work involved recursive parcellation of the brain volume by considering sub-
source medical image computation and visualization package with developmental contributions
Related neuroanatomical research concerns the selection of geometric features of the cortex [e.g.
36] which aims at understanding the functional-anatomical relationship of the human brain. Recent
endeavors produce a dense cortical labeling in which every point of the cortex is classified, i.e. a
cortical parcellation [37–39]. Various techniques have been proposed to reduce the manual effort
required to densely label a high-resolution neuroimage; one example is the popular software package
known as Freesurfer [37, 40, 41]. In contrast to the volumetric approach detailed in this work,
Freesurfer is primarily a surface-based technique in which the brain structures such as the grey-white
matter interface and pial surfaces are processed, analyzed, and displayed as tessellated surfaces
[40, 41]. Advantages of surface representations include the ability to map processed neuroanatomy
to simple geometric primitives such as spheres or planes and the ease of including topological
constraints in the analysis workflow. These types of methods, including Klein’s Mindboggle [42],
Researchers in aging often focus on accurately segmenting the T1 MRI of elderly controls
and subjects suffering from neurodegeneration, for instance, via SIENA [43]. A recent evaluation
study compared kNN segmentation, SPM Unified Segmentation and SIENA and found different
performance characteristics under different evaluation criteria [44]. [12] had similar findings when
comparing SPM5, FSL and FreeSurfer. These studies suggest that no single method performs best
under every measurement and, along with the No Free Lunch theorem [45], highlight the need for
segmentation tools that are tunable for different problems and research goals.
Our open source segmentation tool, which we have dubbed Atropos,2 efficiently and flexibly
implements an n-tissue paradigm for voxel-based image segmentation. Atropos allows users to
harness its generalized EM algorithm for standard tissue classification of the brain into gray matter,
white matter and cerebrospinal fluid even in cases of multivariate image data—relevant when more
than one view of anatomy aids segmentation, as in neonatal brain tissue classification [e.g. 18,
46]. Atropos equally allows incarnations that use EM to simultaneously maximize the posterior
probabilities of many classes with minimal random access memory requirements, for instance, when
parcellating the brain into hemispheres, cortical regions and deep brain structures such as amygdala,
hippocampus and thalamus. Atropos contains features of its predecessors for performing n-tissue
segmentation including imposition of prior information in the form of MRFs and template-based
spatial prior probability maps as well as weighted combinations of these terms. We also borrow
an idea from [47] and use sparse spatial priors to provide initialization and boundary conditions
segmentation toolbox that may be modified, tuned and refined for different use scenarios.
Coupled with the registration [48] and template building [49] already included in the ANTs, At-
ropos is a versatile and powerful software tool which touches multiple aspects of our brain processing
pipeline. We use Atropos to address brain extraction [50], grey matter/white matter/cerebrospinal
fluid segmentation, label fusion/propagation and cortical parcellation. We also allow Atropos to
interact with the recently developed N4 bias correction software [51] in an adaptive manner. To
2
Atropos is one of the three Fates from Greek mythology characterized by her dreaded shears used to decide the
destiny of each mortal. Also, consistent with the entomological motif of our ANTs, Acherontia atropos is a species of
large moth known for the skull-like pattern visible on its thorax.
further highlight the value of this open source contribution, we performed a search of software
attributes on NITRC and found that, as of November 2010, no stand-alone EM methods were
listed. We also evaluate Atropos performance on two brain MRI segmentation objectives.
First, we test three-tissue classification. Second, we test our ability to parcellate the brain into 69
neuroanatomical regions to illustrate the practical value of the low-memory implementation described in this
paper. Although Atropos may be applied to multivariate data from arbitrary modalities, we limit
our evaluation to tissue classification in T1 neuroimaging in part due to the abundance of “gold-
standard” data for this modality. Consistent with our advocacy of open science (not to mention
the facilitation of analysis due to accessibility) we also only use publicly available data sets. For
this reason, all results in this paper are reproducible with the caveat that users may require some
Organization of this work is as follows: we first describe the theory behind the various compo-
nents of Atropos while acknowledging that more theoretical discussion is available elsewhere. This
mense practical utility. We then report results on the BrainWeb and Hammers datasets. Finally, we
provide a discussion of our results and our open source contribution in the context of the remainder
application-specific manner. The theory underlying Atropos dates back 20+ years and is rep-
resentative of some of the most innovative work in the field. Although we summarize some of the
theoretical work in this section, we recommend that the interested reader consult the deep literature
in this field for additional perspective and proofs behind the major concepts.
Bayes’ theorem provides a powerful mechanism for making inductive inferences assuming the
availability of quantities defining the relevant conditional probabilities, specifically the likelihood
and prior probability terms. Bayesian paradigms for brain image segmentation employ a user se-
lected observation model defining the likelihood term and one or more prior probability terms. The
product of likelihood(s) and prior(s) is proportional to the posterior probability. The likelihood term
has been previously defined both parametrically (e.g. a Gaussian model) and non-parametrically
(e.g. Parzen windowing of the sample histogram). The prior term, as given in the literature, has
often been MRF-based, template-based, or both. An image segmentation solution
in this context is an assignment of one label to each voxel3 such that the posterior probability is
maximized. The next sections introduce notation and provide a formal description of three essential
• the prior probability quantities derived from a generalized MRF and template-based prior
terms, and
2.1 Notation
Assume a field, F, whose values are known at discrete locations, i.e. sites, within a regular voxel
lattice that makes up an image domain, I. Note that F can be a scalar field in the case of
unimodal data (e.g. T1 image only) or a vector field in the case of multimodal data (e.g. T1, T2,
and proton density images). A specific set of observed values, denoted by y, are indexed at N
discrete representation of an observed image’s intensities. A labeling of this image, also known as a
hard segmentation, assigns to each site in I one of K labels from the finite set L = {l1 , l2 , . . . , lK }.
Also considered a random field, this discrete labeling is X = {x1 , x2 , . . . , xN } where each xi ∈ L.
We use x to denote a specific set of labels in I and a valid, though not necessarily optimal, solution
Atropos optimizes a class of user selectable objective functions each of which may be represented
in a generic Bayesian framework, as described by [52]. This framework requires likelihood models
3
In the classic 3-tissue segmentation case, each voxel in the brain region is assigned a label of ‘cerebrospinal fluid
(csf)’, ‘gray matter (gm)’, or ‘white matter (wm)’.
and prior models which enter into Bayes’ formula,
\[ p(x|y) = \frac{1}{p(y)} \underbrace{p(y|x)}_{\text{Likelihood(s)}} \, \underbrace{p(x)}_{\text{Prior(s)}} \tag{1} \]
where the normalization term, 1/p(y), is a constant that does not affect the optimization [52]. Given
choices for likelihood models and prior probabilities, the Bayesian segmentation solution is the
Similar to its predecessors, Atropos employs the EM framework [14] to find maximum likelihood
solutions to this problem. The following sections detail the Atropos EM along with choices for the
To each of the K labels corresponds a single probabilistic model describing the variation of F
over I. We denote this set of K likelihood models as Φ = {p1 , p2 , . . . , pK }. Using the standard
notation, Pr(S = s) = p(s), Pr(S = s|T = t) = p(s|t), we can define these voxelwise probabilities,
Prk (Yi = yi |Xi = lk ) = pk (yi |lk ), in either parametric or non-parametric terms. Given its simplicity
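and good performance, in the parametric case, pk is typically defined as a normal distribution, i.e.
\[ p_k(y_i|l_k) = \frac{1}{\sqrt{2\pi\sigma_k^2}} \exp\left( \frac{-(y_i - \mu_k)^2}{2\sigma_k^2} \right) \tag{2} \]
where the parameters µk and σk² respectively represent the mean and variance of the kth model.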
When yi is a vector quantity, we replace the Euclidean distance by Mahalanobis distance and define
A common technique for the non-parametric variant is to define pk using Parzen windowing of
the sample observation histogram of y, i.e.
\[ p_k(y_i|l_k) = \frac{1}{N_B} \sum_{j=1}^{N_B} \frac{1}{\sqrt{2\pi\sigma_j^2}} \exp\left( \frac{-(y_i - c_j)^2}{2\sigma_j^2} \right) \tag{4} \]
where NB is the number of bins used to define the histogram of the sample observations (in Atropos
the default is NB = 32) and cj is the center of the jth bin in the histogram. σj is the width of
each of the NB Gaussian kernels. For multi-modal data in which the number of components of yi
is greater than one, a Parzen window function is constructed for each component. The likelihood
Atropos segmentation likelihood estimates are based on the classical finite mixture model
(FMM). FMM assumes independence between voxels to calculate the probability associated with
the entire set of observations, y. Spatial interdependency between voxels is modeled by the prior
probabilities discussed in the next section. Marginalizing over the set of possible labels, L, leads
\[ p(y|x) = \prod_{i=1}^{N} \left( \sum_{k=1}^{K} \gamma_k \, p_k(y_i|l_k) \right) \tag{5} \]
By modeling F via the set of observation models Φ, this so-called finite-mixture model could be used
to produce a labeling or segmentation [e.g. 15]. However, as pointed out by [23], exclusive use of
the intensity profile produces a less than optimal solution because spatial contextual considerations
are ignored. This has been remedied by the introduction of a host of prior probability models
including those characterized by use of MRF theory and template-based information. For example,
in the works of [22] and [18], the original global prior term given in [15] is replaced by the product
of the template-based and the MRF-based prior terms. In addition to their descriptions below, we
discuss a third possible prior/objective combination in the form of a (sparse) prior labeling which
fixes specific points of the segmentation and uses EM to propagate this information elsewhere in
the image.
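
The likelihood models and the finite mixture model described above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the Atropos implementation: the function names are hypothetical, the data are univariate, and a single kernel width `sigma` is assumed for all Parzen kernels (Eqn. (4) allows a separate σj per kernel).

```python
import numpy as np

def gaussian_likelihood(y, mu, var):
    """Univariate normal density with mean mu and variance var, evaluated at y."""
    y = np.asarray(y, dtype=float)
    return np.exp(-(y - mu) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def parzen_likelihood(y, centers, sigma):
    """Eqn. (4) with one shared kernel width (an assumption of this sketch):
    the average of Gaussian kernels placed at the histogram bin centers."""
    y = np.asarray(y, dtype=float)[:, None]          # shape (N, 1)
    c = np.asarray(centers, dtype=float)[None, :]    # shape (1, NB)
    kernels = np.exp(-(y - c) ** 2 / (2.0 * sigma ** 2))
    kernels /= np.sqrt(2.0 * np.pi * sigma ** 2)
    return kernels.mean(axis=1)                      # (1 / NB) * sum over bins

def mixture_log_likelihood(y, gammas, mus, variances):
    """Logarithm of Eqn. (5) for Gaussian class models:
    sum over sites of log(sum_k gamma_k * p_k(y_i | l_k))."""
    per_class = np.stack([g * gaussian_likelihood(y, m, v)
                          for g, m, v in zip(gammas, mus, variances)])
    return float(np.log(per_class.sum(axis=0)).sum())
```

Working with the logarithm of Eqn. (5) rather than the product itself avoids numerical underflow when N is large.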
2.4.1 Generalized MRF Prior
One may incorporate spatial coherence into the segmentation by favoring labeling configurations
in which voxel neighborhoods tend towards homogeneity. This intuition is formally described by
MRF theory in which spatial interactions in voxel neighborhoods can be modeled [53].
system, Ni , on the lattice, I, composed of the neighboring sites of i. This neighborhood system
and where x is any particular labeling configuration on X (in other words, any labeling permutation
\[ p\left(x_i \,|\, x_{I-\{i\}}\right) = p\left(x_i \,|\, x_{N_i}\right) \tag{7} \]
where xI−{i} is the labeling of the entire image lattice except at site i and xNi is the labeling of
Ni . This locality property enforces solely local considerations based on the neighborhood system
in calculating the probability of the particular configuration, x. Following these two assumptions,
the Hammersley–Clifford theorem provides the basis for treating the MRF distribution (cf. Eqn.
with Z a normalization factor known as the partition function and U (x) the energy function which
can take several forms [53]. In Atropos, as is the case with many other segmentation algorithms of
the same family, we choose U (x) such that it is only composed of a sum over pairwise interactions
between neighboring sites across the image,4 i.e.
\[ U(x) = \beta \sum_{i=1}^{N} \sum_{j \in N_i} V_{ij}(x_i, x_j) \tag{9} \]
where Vij is typically defined in terms of the Kronecker delta, δij , based on the classical Ising
and β is a granularity term which weights the contribution of the MRF prior on the segmentation
solution. Since Atropos allows for non-uniform neighborhood systems and systems in which not
just the immediate face-connected neighbors are considered, we use the modified function also used
in [55], which weights the interaction term by the Euclidean distance, dij , between interacting sites
\[ V_{ij} = \frac{\delta_{ij}}{d_{ij}} \tag{11} \]
so that sites in the neighborhood closer to i are weighted more heavily than distant sites.
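
As a rough illustration of the distance-weighted pairwise term, the sketch below evaluates the local contribution to U(x) at one site of a 2-D label image. It is not taken from the Atropos source. In particular, the sign convention is an assumption of this sketch: a neighbor whose label differs from the candidate adds 1/dij to the energy, and a matching neighbor adds nothing. The function names and the handling of image borders (out-of-bounds neighbors are skipped) are likewise hypothetical.

```python
import numpy as np

def neighborhood_offsets(radius):
    """All 2-D offsets in a (2r+1) x (2r+1) window, excluding the center."""
    r = radius
    return [(di, dj) for di in range(-r, r + 1) for dj in range(-r, r + 1)
            if (di, dj) != (0, 0)]

def local_mrf_energy(labels, i, j, candidate, beta=0.1, radius=1):
    """Energy contribution at site (i, j) if it were given the label `candidate`.

    Assumption: disagreeing neighbors are penalized by 1 / d_ij.
    """
    rows, cols = labels.shape
    energy = 0.0
    for di, dj in neighborhood_offsets(radius):
        ni, nj = i + di, j + dj
        if 0 <= ni < rows and 0 <= nj < cols:
            d = np.hypot(di, dj)                 # Euclidean distance d_ij
            if labels[ni, nj] != candidate:
                energy += 1.0 / d                # closer sites weigh more
    return beta * energy
```

With `radius=1`, the four face-connected neighbors each contribute at most 1 and the four diagonal neighbors at most 1/√2, so nearer sites dominate the term.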
A number of researchers have used templates to both ensure spatial coherence and incorporate prior
from which a template is constructed [e.g. 49, which is also available in ANTs]. Each labeling can
then be warped to the template where the synthesis of warped labeled regions produces a prior
probability map or prior label map encoding the spatial distribution of labeled anatomy which
4
Using a more expansive definition of U (x),
\[ U(x) = \sum_{i=1}^{N} \left( V_i(x_i) + \beta \sum_{j \in N_i} V_{ij}(x_i, x_j) \right) \]
would permit casting the other prior terms inside the definition of U (x) in the form of the external field Vi (xi ) but,
for clarity purposes, we consider them separately.
can be harnessed in joint segmentation/registration or Atropos/ANTs hybrids involving unlabeled
subjects.
We employ the strategy given in [28] in which the stationary mixing proportions, Pr(xi = lk ) =
γk (cf. Eqn. (5)), describing the prior probability that label lk corresponds to a particular voxel,
regardless of intensity, are replaced by the following spatially varying mixing proportions,
\[ \Pr(x_i = l_k) = \frac{\gamma_k t_{ik}}{\sum_{j=1}^{K} \gamma_j t_{ij}}. \tag{12} \]
The tik is the prior probability value at site i which was mapped, typically by image registration,
to the local image from a template data set. The user may also choose mixing proportions equal to
\[ \Pr(x_i = l_k) = \frac{t_{ik}}{\sum_{j=1}^{K} t_{ij}} \tag{13} \]
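
The two choices of spatially varying mixing proportions in Eqns. (12) and (13) differ only in whether the template priors are first weighted by γk. A minimal NumPy sketch follows. The function name is hypothetical, and the fallback to a uniform distribution at sites where all template priors are zero is an assumption of this sketch, not something specified above.

```python
import numpy as np

def spatial_mixing_proportions(template_priors, gammas=None):
    """template_priors: array of shape (N, K) holding t_ik for N sites, K labels.
    gammas: length-K stationary mixing proportions for Eqn. (12),
            or None to use Eqn. (13)."""
    t = np.asarray(template_priors, dtype=float)
    if gammas is None:
        weighted = t                                         # Eqn. (13)
    else:
        weighted = t * np.asarray(gammas, dtype=float)[None, :]  # Eqn. (12)
    denom = weighted.sum(axis=1, keepdims=True)
    safe = np.where(denom > 0, denom, 1.0)                   # avoid dividing by zero
    # Assumption: sites with no prior support get a uniform distribution.
    return np.where(denom > 0, weighted / safe, 1.0 / t.shape[1])
```

For example, a site with template priors (0.2, 0.6, 0.2) and γ = (0.5, 0.25, 0.25) has weighted values (0.10, 0.15, 0.05), which normalize to (1/3, 1/2, 1/6).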
Brain segmentation methods have relied on user interaction for many years [56–59]. Atropos is
capable of benefitting from user knowledge via an initialization and optimization that depends
upon a spatially varying prior label image passed as input. Rapid, sparse labeling—with visualiza-
loop that can be critical to solving segmentation problems with challenging clinical data in which
automated approaches fail. This part of Atropos design is inspired by the interactive graph cuts
pioneered by [60] and which has spawned many follow-up applications. The Atropos prior label
image pre-specifies the segmentation results at a subset of the spatial domain by fixing the priors
and likelihood (and, thus, the posterior) at a subset of I to be 1 for the known label and 0 for each
other label at the same site. The user input therefore not only initializes the optimization, but also
gives boundary conditions that influence the EM solution outside of the known sites. While the
graph-based min-cut max-flow solution is globally optimal for two labels, only locally optimal opti-
mizers are available for 3 or more classes. Thus, in most practical applications, EM is a reasonable
and efficient alternative to Boykov’s solution. Furthermore, one may automate the initialization
process. We provide this capability to allow the user to implement an interactive editing and seg-
mentation loop. The user may run Atropos with sparse manual label guidance, evaluate the results,
update the manual labels and repeat until achieving the desired outcome. This processing loop
2.5 Optimization
Atropos uses expectation maximization to find a locally optimal solution for the user selected
version of the Bayesian segmentation problem (cf. Eqn. (1)). After initial estimation of the
likelihood model parameters, EM iterates between calculation of the missing optimal labels x̂ and
subsequent re-estimation of the model parameters by maximizing the expectation of the complete
data log-likelihood (cf. Eqn. (5)). The expectation maximization procedure is derived in various
publications including [23] which yields the optimal mean and variance (or covariance), but sets the
similar to [28].5 When spatial coherence constraints are included as an MRF prior in Atropos, the
optimal segmentation solution becomes intractable.6 Although many optimization techniques exist
(see the introduction in [27] for a concise summary of the myriad optimization possibilities)—each
with their characteristic advantages and disadvantages in terms of computational complexity and
accuracy—Atropos uses the well-known Iterated Conditional Modes (ICM) [61] which is greedy,
computationally efficient and provides good performance. The EM employed in Atropos may
Initialization: In all cases, the user defines the number of classes to segment. The simplest
initialization is by the classic K-means or Otsu thresholding algorithms with only the number
of classes specified by the user. Otherwise, the user must provide prior information for each
class in the form of either a single n-ary prior label image or a series of prior probability
images, one for each class. The initialization also provides starter parameters.
Label Update (E-Step): Given the initialization and fixed model parameters, Atropos is capable
of updating the current label estimates using either a synchronous or asynchronous scheme.
The former is characterized by iterating through the image and determining which label
5
Due to the lack of parameters in the non-parametric approach, it is not technically an EM algorithm (as described
in [15]). However, the same iterative maximization is applicable and is quite robust in practice as evidenced by the
number of researchers employing non-parametric models (see the Introduction).
6
Consider N sites, each with K possible labels, for a total of K^N possible labeling configurations. For large
K ≫ 3, exact optimization is even more intractable than for the traditional 3-tissue scenario.
maximizes the posterior probability without updating any labels until all voxels in the mask
have been visited at which point all the voxel labels are updated simultaneously (hence the
descriptor “synchronous”). This option is specified with --icm [0]. However, unlike asyn-
To determine the labeling which maximizes the posterior probability for the asynchronous
option, an “ICM code” image is created once for all iterations by iterating through the image
and assigning an ICM code label to each voxel in the mask such that each MRF neighbor-
hood has a non-repeating code label set. Thus each masked voxel in the ICM code image is
assigned a value in the range {1, . . . , C} where C is the maximum code label. Such an image
can be created and viewed with Atropos by assigning a valid filename in the --icm [1] set
of options. An example adult brain slice and the associated code image is given in Figure 1
for an MRF neighborhood of 5 × 5 pixels. This produces a maximum code label of ‘13’. For
each iteration, one has the option to permute the set {1, . . . , C} which prescribes the order
in which the voxel labels are updated asynchronously. After the first pass through the set of
code labels, additional passes can further increase the posterior probability until convergence
(in ∼5 iterations). One can specify a maximum number of these “ICM iterations” on the
command line. For our example in Figure 1, this means that for each ICM iteration, we
would iterate through the image 13 times only updating those segmentation labels associated
Parameter Update (M-Step): Note that the posteriors used in the previous iteration are used
to estimate the parameters at the current iteration. We use a common and elementary
\[ \gamma_k \leftarrow \frac{1}{N} \sum_{i=1}^{N} p_k(l_k|y_i). \tag{14} \]
We update the parametric model parameters by computing, for each of K labels, the mean,
\[ \mu_k \leftarrow \frac{\sum_{i=1}^{N} y_i \, p_k(l_k|y_i)}{\sum_{i=1}^{N} p_k(l_k|y_i)} \tag{15} \]
[Figure 1 inset: ICM code labels of the highlighted 5 × 5 neighborhood, center label 10]
 1  5  8  9  2
 4  2  3  4  7
 6 11 10  1  6
 1  7  8  9  2
 4  2  5  3  4
Figure 1: An adult brain image slice is shown with its ICM code image corresponding to a 5 × 5 MRF
neighborhood. To the right of the ICM code image, we focus on a single neighborhood with a center voxel
associated with the ICM code label of ’10’. Each center voxel in a specified neighborhood exhibits a unique
ICM code label which does not appear elsewhere in its neighborhood. When performing the segmentation
labeling update for ICM, we iterate through the set of ICM code labels and, for each code label, we iterate
through the image and update only those voxels associated with the current code label.
and variance,
\[ \sigma_k^2 \leftarrow \frac{\sum_{i=1}^{N} (y_i - \mu_k)^T \, p_k(l_k|y_i) \, (y_i - \mu_k)}{\sum_{i=1}^{N} p_k(l_k|y_i)}. \tag{16} \]
The latter two quantities are modified, respectively, in the case of multivariate data as follows:
\[ \mu_k \leftarrow \frac{\sum_{i=1}^{N} y_i \, p_k(l_k|y_i)}{\sum_{i=1}^{N} p_k(l_k|y_i)} \tag{17} \]
\[ \Sigma_k \leftarrow \frac{\sum_{i=1}^{N} p_k(l_k|y_i) \, (y_i - \mu_k)^T (y_i - \mu_k)}{1 - \sum_{i=1}^{N} p_k(l_k|y_i)^2}. \tag{18} \]
This type of update is known as soft EM. Hard EM, in contrast, only uses sites containing label
lk to update the parameters for the kth model. A similar pattern is used in non-parametric
cases.
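
For univariate data, the soft-EM parameter update of Eqns. (14)–(16) reduces to a few weighted sums. The sketch below is illustrative only: the function names are hypothetical, and it assumes every class has a nonzero total posterior so that the denominators do not vanish. The `harden` helper shows how the hard-EM variant can be obtained by passing one-hot posteriors to the same update.

```python
import numpy as np

def m_step(y, posteriors):
    """Soft-EM update for univariate Gaussian class models.

    y:          (N,) observed intensities
    posteriors: (N, K) array holding p_k(l_k | y_i)
    Returns mixing proportions, means and variances, each of shape (K,).
    Assumption: each column of `posteriors` has a nonzero sum.
    """
    y = np.asarray(y, dtype=float)
    p = np.asarray(posteriors, dtype=float)
    weight = p.sum(axis=0)                            # sum_i p_k(l_k | y_i)
    gammas = weight / y.shape[0]                      # Eqn. (14)
    mus = (p * y[:, None]).sum(axis=0) / weight       # Eqn. (15)
    sq_dev = (y[:, None] - mus[None, :]) ** 2
    variances = (p * sq_dev).sum(axis=0) / weight     # Eqn. (16)
    return gammas, mus, variances

def harden(posteriors):
    """One-hot version of the posteriors; feeding it to m_step gives hard EM."""
    p = np.asarray(posteriors, dtype=float)
    hard = np.zeros_like(p)
    hard[np.arange(p.shape[0]), p.argmax(axis=1)] = 1.0
    return hard
```

With intensities (0, 2, 10, 12) and one-hot posteriors assigning the first two sites to class 1 and the last two to class 2, the update yields γ = (0.5, 0.5), µ = (1, 11) and σ² = (1, 1).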
EM will iterate toward a local maximum. We track convergence by summing up the maximum
posterior probability at each site over the segmentation domain. The E-step, above, depends upon
[Figure 2 flowchart; recoverable box labels: N4 Bias Correction; Initialization; Update Likelihoods; Gaussian (parametric); Parzen Windowing (nonparametric); Prior Probability Images; Prior Label Image; N4 using white matter posterior probability image; Geodesic; Euclidean.]
Figure 2: Flowchart illustrating Atropos usage typically beginning with bias correction via N4. Initialization
provides an estimate before the iterative optimization in which the likelihood models for each class are
tabulated from the current estimate followed by a recalculation of the posterior probabilities associated with
each class. The multiple options associated with the different algorithmic components are indicated by the
colored rounded rectangles connected to their respective core Atropos processes via curved dashed lines.
the selected coding strategy [61]. Atropos may use either a classical, sequential checkerboard update
or a synchronous update of the labels, the latter of which is commonly used in practice. Synchronous
update does not guarantee convergence but we employ it by default due to its intrinsic parallelism
and speed. The user may alternatively select checkerboard update if he or she desires theoretical
convergence guarantees. However, we have not identified performance differences, relative to ground
truth, that convince us of the absolute superiority of one approach over the other.
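
The code-image idea behind the sequential update can be sketched for a 2-D lattice as follows. This is not the Atropos algorithm. The code image here uses a simple periodic tiling (a substituted technique that yields (radius + 1)² code labels, e.g. 9 rather than the 13 reported for Figure 1), and the pairwise cost assumes that disagreeing neighbors are penalized by 1/dij. All function names, the `unary_cost` input (for instance, negative log-likelihoods per class) and the use of labels 0, …, K−1 are hypothetical choices made for illustration.

```python
import numpy as np

def tiled_code_image(shape, radius):
    """Code labels 1..(radius+1)^2. Two sites share a code only if they are
    offset by multiples of (radius + 1) along both axes, so no site finds
    its own code elsewhere in its (2*radius+1) x (2*radius+1) neighborhood."""
    period = radius + 1
    ii, jj = np.indices(shape)
    return (ii % period) * period + (jj % period) + 1

def pairwise_cost(labels, i, j, candidate, beta, radius):
    """Assumed MRF cost: each disagreeing neighbor adds beta / d_ij."""
    rows, cols = labels.shape
    cost = 0.0
    for di in range(-radius, radius + 1):
        for dj in range(-radius, radius + 1):
            ni, nj = i + di, j + dj
            if (di, dj) != (0, 0) and 0 <= ni < rows and 0 <= nj < cols:
                if labels[ni, nj] != candidate:
                    cost += 1.0 / np.hypot(di, dj)
    return beta * cost

def icm_iteration(labels, unary_cost, codes, beta=0.1, radius=1, rng=None):
    """One asynchronous ICM iteration over all code labels.

    labels:     (rows, cols) integer labels in 0..K-1 (left unmodified)
    unary_cost: (rows, cols, K) per-site, per-class cost
    codes:      code image; sites sharing a code are never neighbors
    rng:        optional numpy Generator used to permute the code order
    """
    labels = labels.copy()
    order = np.unique(codes)
    if rng is not None:
        order = rng.permutation(order)
    n_classes = unary_cost.shape[2]
    for code in order:
        # Sites with the same code are not neighbors, so each update below
        # sees the most recent labels of its whole neighborhood.
        for i, j in zip(*np.nonzero(codes == code)):
            costs = [unary_cost[i, j, k]
                     + pairwise_cost(labels, i, j, k, beta, radius)
                     for k in range(n_classes)]
            labels[i, j] = int(np.argmin(costs))   # lowest cost = highest posterior
    return labels
```

Because sites sharing a code label never interact, all updates for a given code could in principle be computed in parallel without changing the result of the sequential sweep.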
3 Implementation
Organization of the implementation section roughly follows that of the theory section.
As with other classes that comprise ANTs, Atropos uses the Insight Toolkit as a developmental
foundation. This allows us to take advantage of the mature portions of ITK (e.g. image IO) and
ensures the integrity of the ancillary processes such as those facilitated by the underlying statistical
framework. Although Atropos is publicly distributed with the rest of the ANTs package, we plan
to contribute its core elements to the Insight Toolkit where it can be vetted and improved by other
interested researchers.
An overview of Atropos components can be gleaned, in part, from the flowchart depicted in
Fig. 2. Given a set of input images and a mask image, each is preprocessed using N4 to correct for
intensity inhomogeneity. For our brain processing pipeline, the mask is usually obtained from the
standard skull-stripping preprocessing step which also uses Atropos. Initialization can be performed
in various ways, ranging from standard clustering techniques, such as K-means, to prior-based images. This
initialization is used to provide the initial estimate of the parameters of the likelihood model for
each class. These likelihoods combine with the current labeling to generate the current estimate of
the posterior probabilities at each voxel for each class. At each iteration, one can also integrate N4
by using the current posterior probability estimation of the white matter to update the estimate
of bias field.
To provide a more intuitive interface without the overhead costs of a graphical user interface,
a set of unique command line parsing classes was developed which can also provide insight into the
functionality of Atropos. The short version of the command line help menu is given in Listing 1
which is invoked by typing ‘Atropos -h’ at the command prompt. Both short and long option
flags are available and each option has its own set of possible values and parameters introduced in
a more formal way in both the previous discussion and related papers cited in the introduction.
Atropos has a number of parameters defined within Listing 2 and will function on 2, 3 or 4 dimen-
sional data. However, the majority of the time, users will be concerned with a smaller set of input
parameters. Here, we list the recommended input and an example definition for each parameter:
Input images to be segmented: If more than one input image is passed, then a multivariate
model will be instantiated. E.g. -a Image.nii.gz for one image and -a Image1.nii.gz -a
Input image mask: This binary image defines the spatial segmentation domain. Voxels outside
the masked region are designated with the label 0. E.g. -x mask.nii.gz.
Convergence criteria: The algorithm terminates if it reaches the maximum number of iterations
or produces a change less than the minimum threshold change in the posterior. E.g. -c
[5,1.e-5].
COMMAND:
     Atropos
OPTIONS:
     -d, --image-dimensionality 2/3/4
     -a, --intensity-image [intensityImage,<adaptiveSmoothingWeight>]
     -b, --bspline [<numberOfLevels=6>,<initialMeshResolution=1x1x...>,<splineOrder=3>]
     -i, --initialization Random[numberOfClasses]
                          KMeans[numberOfClasses]
                          Otsu[numberOfClasses]
                          PriorProbabilityImages[numberOfClasses,
                            fileSeriesFormat(index=1 to numberOfClasses) or vectorImage,
                            priorWeighting,<priorProbabilityThreshold>]
                          PriorLabelImage[numberOfClasses,labelImage,priorWeighting]
     -p, --posterior-formulation Socrates[<useMixtureModelProportions=1>,
                                   <initialAnnealingTemperature=1>,<annealingRate=1>,
                                   <minimumTemperature=0.1>]
                                 Plato[<useMixtureModelProportions=1>,
                                   <initialAnnealingTemperature=1>,<annealingRate=1>,
                                   <minimumTemperature=0.1>]
     -x, --mask-image maskImageFilename
     -c, --convergence [<numberOfIterations=5>,<convergenceThreshold=0.001>]
     -k, --likelihood-model Gaussian
                            HistogramParzenWindows[<sigma=1.0>,<numberOfBins=32>]
     -m, --mrf [<smoothingFactor=0.3>,<radius=1x1x...>]
     -g, --icm [<useAsynchronousUpdate=1>,<maximumNumberOfICMIterations=1>,
                <icmCodeImage=''>]
     -o, --output [classifiedImage,<posteriorProbabilityImageFileNameFormat>]
     -u, --minimize-memory-usage (0)/1
     -w, --winsorize-outliers BoxPlot[<lowerPercentile=0.25>,<upperPercentile=0.75>,
                                <whiskerLength=1.5>]
                              GrubbsRosner[<significanceLevel=0.05>,<winsorizingLevel=0.10>]
     -e, --use-euclidean-distance (0)/1
     -l, --label-propagation whichLabel[sigma=0.0,<boundaryProbability=1.0>]
     -h
     --help
Listing 1: Atropos short command line menu which is invoked using the ‘-h’ option. The expanded menu,
which provides details regarding the possible parameters and usage options, is elicited using the ‘--help’
option.
MRF prior: The key parameter to increase or decrease the spatial smoothness of the label map
is β. A useful range of β values is 0 to 0.5, where we usually use 0.05, 0.1 or 0.2 in brain
segmentation. E.g. -m [0.1,1x1x1] would define β = 0.1 with an MRF radius of one voxel
Initialization: The initialization options include (where the first parameter defines K, here 3 for
each below),
prior probability images only for initialization) or w > 0.0 (use the prior probability
images throughout the optimization). If one chooses 0 < w < 1.0 then one will increase
(from zero) the weight on the priors. These images, like the PriorLabelImage, should
Posterior formulation: The user may choose to estimate the mixture proportions (or not) by set-
Output: Atropos will output the hard segmentation and the probability image for each model.
tation in the first output parameter and a probability image for each class named, here,
Higher dimensions than 4 are possible although we have not yet encountered such an application-
specific need. Multiple images (assumed to be of the same dimension, size, origin, etc.), will
automatically enable multivariate likelihoods. In that case, the first image specified on the command
line is used to initialize the Random, Otsu, or K-means labeling with the latter initialization refined
by incorporating the additional intensity images, i.e. an initial univariate K-means clustering is
determined from the first intensity image which, along with the other images, provides the starting
multivariate cluster centers for a follow-up multivariate K-means labeling. More details on each of
As mentioned previously in the introduction, different groups have opted for different likelihood
models which have included either parametric (Gaussian) or non-parametric variations. However,
these approaches are similar in that they require a list sample of intensity data from the input
image(s) and a list of weighting values for each observation of the list sample from which the
model is constructed. In general, one may query model probabilities by passing a given pixel’s
single intensity (for univariate segmentation) or multiple intensities (for multivariate segmentation)
These similarities permitted the creation of a generic plug-in architecture where classes describing
both parametric and non-parametric observational models are all derived from an abstract list
sample function class. Three likelihood classes have been developed, one parametric and two
non-parametric, and are available for usage although one of the non-parametric classes is still in
experimental development. The plug-in architecture even permits mixing likelihood models with
different classes during the same run for a hybrid parametric/non-parametric model although this
If the Gaussian likelihood model is chosen, the list sample of intensity values and the corresponding
weights, given by the posterior probabilities, are used to estimate the Gaussian model parameters,
i.e. the mean and variance. For the non-parametric model, the list sample and posteriors are
used in a Parzen windowing scheme on a weighted histogram to estimate the observational model
[62].
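The two estimators described above can be sketched as follows. This is a minimal, univariate illustration in pure Python and not the Atropos implementation: the function names are hypothetical, the variance uses the weighted maximum-likelihood form (normalized by the sum of weights), and the Parzen kernel width is assumed to be expressed in units of histogram bins. The defaults `sigma=1.0` and `num_bins=32` simply mirror the defaults shown in Listing 1.

```python
import math


def weighted_gaussian(samples, weights):
    """Weighted mean and variance of a univariate list sample.

    The weights play the role of the posterior probabilities of one class.
    Assumption: variance is normalized by the sum of the weights.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    mean = sum(w * x for x, w in zip(samples, weights)) / total
    var = sum(w * (x - mean) ** 2 for x, w in zip(samples, weights)) / total
    return mean, var


def parzen_histogram(samples, weights, num_bins=32, sigma=1.0):
    """Weighted histogram smoothed by a Gaussian kernel, normalized to sum to 1.

    Assumption: sigma is measured in histogram bins and is positive.
    """
    lo, hi = min(samples), max(samples)
    width = (hi - lo) / num_bins or 1.0  # guard against a constant sample
    hist = [0.0] * num_bins
    for x, w in zip(samples, weights):
        idx = min(int((x - lo) / width), num_bins - 1)
        hist[idx] += w
    # Parzen windowing: convolve the weighted histogram with a Gaussian kernel.
    smoothed = [
        sum(hist[j] * math.exp(-0.5 * ((i - j) / sigma) ** 2) for j in range(num_bins))
        for i in range(num_bins)
    ]
    total = sum(smoothed)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return [s / total for s in smoothed]
```

In both cases a voxel with a small posterior for the class contributes little to that class's model, which is how the list sample and its weights feed back into the likelihood at each iteration.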
Consistent with our previous discussion, we offer both an MRF-based prior probability for modeling
spatial coherence and the possibility of specifying a set of prior probability maps or a prior label map
with the latter extendable to creating a dense labeling. To invoke the MRF ‘-m/--mrf’ option, one
specifies the smoothing factor (or the granularity parameter, β, given in Eqn. (9)) and the radius
(in voxels) of the neighborhood system using the vector notation ‘1x1x1’ for a neighborhood radius
of 1 in all 3 dimensions. This radius is defined such that voxels including but not limited to those
that are face-connected will influence the MRF.
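To make the radius notation concrete, the sketch below enumerates the neighbor offsets implied by a radius string such as ‘1x1x1’. It assumes the neighborhood is the full rectangular block around the center voxel (so edge- and corner-connected voxels are included alongside face-connected ones); the helper names `parse_radius` and `neighborhood_offsets` are hypothetical and are not part of Atropos.

```python
import itertools


def parse_radius(text):
    """Turn a radius string such as '1x1x1' into a tuple of integers."""
    return tuple(int(token) for token in text.split("x"))


def neighborhood_offsets(radius):
    """All offsets within the given per-dimension radius, excluding the center.

    Assumption: the neighborhood is the full rectangular block.
    """
    ranges = [range(-r, r + 1) for r in radius]
    return [offset for offset in itertools.product(*ranges) if any(offset)]


offsets_3d = neighborhood_offsets(parse_radius("1x1x1"))
# Face-connected neighbors differ from the center along exactly one axis by one voxel.
face_connected = [o for o in offsets_3d if sum(abs(c) for c in o) == 1]
```

Under this assumption a ‘1x1x1’ radius gives 26 neighbors in 3-D, of which 6 are face-connected, and a ‘1x1’ radius gives 8 neighbors in 2-D.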
Image registration enables one to transfer information between spatial domains which may aid in
both segmentation and bias correction. We rely heavily on template-building strategies [49, 50]
which are also offered in ANTs. Since aligned prior probability images and prior label maps are
often associated with such templates, Atropos can be initialized with these data with their influence
regulated by a prior probability weighting term. Although prior label maps can be specified as a
single multi-label image, prior probability data are often represented as multiple scalar images
with a single image corresponding to a particular label. For relatively small classifications, such
as the standard 3-tissue segmentation (i.e. white matter, gray matter, and cerebrospinal fluid),
this does not typically present computational complexities using modern hardware. However, when
considering dense cortical parcellations where the number of labels can range upwards of 74 per
hemisphere [39], the memory load can be prohibitive if all label images are loaded into run-time
memory simultaneously. A major part of minimizing memory usage in Atropos, which corresponds
prior probability images. Motivated by the observation that these spatial prior probability maps
tend to be highly localized for large quantities of cortical labels, a threshold is specified on the
command line (default = 0.0) and only those probability values which exceed that threshold are
stored in the sparse representation. During the course of optimization, the prior probability image
for a given label is reconstructed on the fly as needed. For instance, the NIREP (www.nirep.org)
evaluation images are on the order of 300 × 300 × 256 with 32 cortical labels. Our novel
memory-minimizing image representation typically shrinks run-time memory usage from a peak of 10+ GB
to approximately 1.5 GB and enables these datasets to be used for training/prior-based cortical
parcellation.
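The thresholded sparse storage described above can be illustrated with a small sketch. It is not the Atropos data structure: here a prior probability image is a flat Python list, the sparse form is a dictionary keyed by voxel index, and the names `to_sparse` and `to_dense` are hypothetical. Only values strictly above the threshold are kept, matching the description in the text.

```python
def to_sparse(prob_image, threshold=0.0):
    """Keep only the voxels whose prior probability exceeds the threshold."""
    return {index: p for index, p in enumerate(prob_image) if p > threshold}


def to_dense(sparse, size):
    """Rebuild the full prior probability image on demand; dropped voxels become 0."""
    dense = [0.0] * size
    for index, p in sparse.items():
        dense[index] = p
    return dense


prior = [0.0, 0.0, 0.8, 0.05, 0.0]
stored = to_sparse(prior, threshold=0.1)    # only voxel 2 survives
rebuilt = to_dense(stored, len(prior))      # [0.0, 0.0, 0.8, 0.0, 0.0]
```

Because cortical label priors are nonzero over a small part of the image, the number of stored entries per label is far smaller than the number of voxels, which is the source of the memory saving.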
Assumptions about bias correction may be thought of as another prior model. As such, the typical
segmentation processing pipeline begins with an intensity normalization/bias correction step using
a method such as the recently developed N4 algorithm [51]. N4 extends the popular nonparametric
COMMAND:
  N4BiasFieldCorrection
OPTIONS:
  -d, --image-dimensionality 2/3/4
  -i, --input-image inputImageFilename
  -x, --mask-image maskImageFilename
  -w, --weight-image weightImageFilename
  -s, --shrink-factor 1/2/3/4/...
  -c, --convergence [<numberOfIterations=50>,<convergenceThreshold=0.001>]
  -b, --bspline-fitting [splineDistance,<splineOrder=3>,<sigmoidAlpha=0.0>,
                         <sigmoidBeta=0.5>]
                        [initialMeshResolution,<splineOrder=3>,<sigmoidAlpha=0.0>,
                         <sigmoidBeta=0.5>]
  -t, --histogram-sharpening [<FWHM=0.15>,<wienerNoise=0.01>,<numberOfHistogramBins=200>]
  -o, --output [correctedImage,<biasField>]
  -h
  --help
Listing 2: N4 short command line menu which is invoked using the ‘-h’ option. The expanded menu, which
provides details regarding the possible parameters and usage options, is elicited using the ‘--help’ option.
• We replace the least squares B-spline fitting with a parallelizable alternative (which we also
made publicly available in the Insight Toolkit)— the advantages being that 1) computation is
much faster and 2) smoothing is not susceptible to outliers as is characteristic with standard
is to select a single resolution at which bias correction occurs, the N4 framework permits a
multi-resolution correction where a base resolution is chosen and correction can then occur
at multiple resolution levels, each resolution being twice the resolution of the previous level.
Specifically, with respect to segmentation, there exists a third advantage with N4 over N3 in that
the former permits the specification of a probabilistic mask as opposed to a binary mask. Recent
demonstrations suggest improved white matter segmentation produces better gain field estimates
using N3 [64]. Thus, when performing 3-tissue segmentation, we may opt to use, for instance, the
posterior probability map of white matter at the current iteration as a weighted mask for input to
N4. This is done by setting the ‘--weight-image’ option on the N4 command line call (see Listing
2) to the posterior probability image corresponding to the white matter produced as output in the
Atropos call, i.e. ‘Atropos --output’. N4 was recently added to the Insight Toolkit repository
(http://www.itk.org/Doxygen/html/classitk_1_1N4MRIBiasFieldCorrectionImageFilter.html),
where it is built and tested on multiple platforms nightly. The evaluation section will illustrate
The Atropos algorithm is cross-platform and compiles on, at minimum, modern OSX, Windows
and Linux-based operating systems. The user interface may be reached through the operating
system’s user terminal. Because of its portability and low-level efficiency, Atropos can easily
be called from within other packages, such as Matlab or Slicer, or, alternatively, integrated at
compile time as a library. A typical call to the algorithm, illustrated here with ANTs exam-
[0.2,1x1] -o [r16_seg.nii.gz,r16_prob_%02d.nii.gz]. In this case, Atropos will output the
segmentation image, the per-class probability images and a listing of the parameters used to set
up the algorithm. A useful feature is that one may re-initialize the Atropos EM via the -i
tation via K-means, alter the output probabilities by externally computed functions (e.g. Gaussian
smoothing, image similarity or edge maps) and re-estimate the segmentation with the modified
priors. Finally, the functionality that is available to parametric models is equally available to the
4 Evaluation
Atropos encodes a family of segmentation techniques that may be instantiated for different
applications, but here we evaluate only two of the many possibilities. First, we perform an evaluation on
the BrainWeb dataset using both the standard T1 image with multiple bias and noise levels and
also the BrainWeb20 data [65, 66]. In combination, these data allow one to vary not only noise
and bias but also the underlying anatomy. Second, we evaluate the use of Atropos in improving
whole-brain parcellation and exercise its ability to efficiently solve many-class expectation
maximization problems. We choose this evaluation problem in part to illustrate the flexibility of Atropos
and also the benefits of the novel, efficient implementation that allows many-class problems to be
solved with low memory usage (<2 GB for a 69-class model on 1 mm³ brain data).
4.1 BrainWeb Evaluation
employ both the individual subject data and the BrainWeb20 data in this evaluation.
We use the single-subject data with 3% noise and three levels of bias referred to as 0, 20 and 40%
RF inhomogeneity. We study the effect of the MRF prior term and initialization on the Dice overlap
between ground truth and the segmentation result for each tissue. We test both K-means and prior
label image initialization with MRF β ∈ {0.00, 0.05, 0.10, 0.15, 0.20, 0.25, 0.30} at each bias field.
We also feed the white matter probability map derived from K-means into N4 to guide the bias
correction.8 Segmentation is then repeated, with the same parameters, but with the N4-corrected
image as input. The resulting algorithm is similar to those that fix segmentation parameters while
estimating bias and fix bias while estimating segmentation parameters. Thus, with this simple
evaluation, we are able to compare the impact of bias on the combination of N4 and Atropos
and also the validity of our prior label image initialization. Results of these evaluation scenarios,
in terms of Dice overlap, are shown in Figure 4. Because overlap ratios with N4 bias correction
approximate those of the zero bias data, we may conclude that simple N4 pre-processing is adequate
to correct even the 40% RF bias level. An example of this procedure, using BrainWeb data with
40% RF bias, is in Figure 3. We supply the information necessary to repeat the results in this
Atropos documentation folder as of SVN commit 711. The script may be easily modified to run
the whole evaluation. Figure 3 shows the results of simultaneously using proton density and T1-
weighted BrainWeb data to perform the segmentation. This multivariate input data outperforms
The single-subject BrainWeb study in the previous section tested the basic Atropos options and
the benefit of N4 for segmentation in the presence of bias. The 20 subject BrainWeb data al-
lows us to use 2-fold cross-validation to test our ability to segment different individuals reliably.
8
A comprehensive evaluation of N4 reported in [51] used the BrainWeb data set to compare performance with the
original N3 algorithm [63].
Figure 3: We combine N4 and Atropos by simple sequential processing and apply to BrainWeb T1-weighted
single-subject data with 40% RF bias and 3% noise. The β for the MRF term is, here, 0.2. Slice 71 of
the input data is in (a). The initial K-means (K = 3) segmentation is in panel (b). We use the brain
mask to guide N4 bias correction and produce the image in (c). We repeat the K-means segmentation,
but with the N4-corrected image as input and produce the segmentation in (d). The average 3-tissue
Dice overlap of result (b) is 0.906 while the average overlap for (d) is 0.954. Arrows highlight a region
of large before-after segmentation discrepancy. In (e) we see the BrainWeb proton density image with no
inhomogeneity and 3% noise. Its segmentation is in (f) with average 3-tissue Dice overlap of 0.895. In (g)
we use both proton density data and T1 data as multiple modality input to Atropos. The segmentation of
this two-modality input data, using a multivariate Gaussian model, produces average 3-tissue Dice overlap
of 0.958, which exceeds the univariate solution. An arrow highlights one region where there is small, visually
recognizable improvement in sulcal segmentation relative to the result from T1 data alone. A second area
of improvement is the putamen segmentation. The ground truth segmentation is in (h). The multivariate
segmentation result, in combination with the low PD segmentation performance, suggests PD and T1 provide
complementary information that may improve 3-tissue segmentation and serves to validate the multivariate
Atropos implementation. In this case, the benefit is likely to derive from the fact that the PD image has no
bias.
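Since all results in this section are reported as Dice overlap, a minimal sketch of the measure may be useful. For one label it is twice the number of voxels carrying that label in both images, divided by the total number of voxels carrying it in either image; the "average 3-tissue" value is taken here to be the plain mean over the labels, which is an assumption about how the averages were formed. Images are flat lists of integer labels and the function names are hypothetical.

```python
def dice_overlap(segmentation, truth, label):
    """Dice overlap for one label: 2|A ∩ B| / (|A| + |B|)."""
    in_seg = sum(1 for s in segmentation if s == label)
    in_truth = sum(1 for t in truth if t == label)
    in_both = sum(1 for s, t in zip(segmentation, truth) if s == label and t == label)
    if in_seg + in_truth == 0:
        raise ValueError("label %r is absent from both images" % (label,))
    return 2.0 * in_both / (in_seg + in_truth)


def mean_dice(segmentation, truth, labels):
    """Unweighted mean of the per-label Dice overlaps (assumed averaging rule)."""
    return sum(dice_overlap(segmentation, truth, k) for k in labels) / len(labels)


seg = [1, 1, 2, 3, 3, 3]
ref = [1, 2, 2, 3, 3, 1]
per_label = [dice_overlap(seg, ref, k) for k in (1, 2, 3)]  # 0.5, 2/3, 0.8
```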
[Plots omitted. Top row: "Dice vs MRF BWeb1 B_RF0", "Dice vs MRF BWeb1 B_RF20" and "Dice vs MRF BWeb1 B_RF40"; bottom row: "Dice vs MRF BWeb1 N4_RF0", "Dice vs MRF BWeb1 N4_RF20" and "Dice vs MRF BWeb1 N4_RF40". In each panel the horizontal axis is the MRF β value (0.00 to 0.30 in steps of 0.05), the vertical axis is Dice overlap, and the legend distinguishes gm, wm and csf.]
Figure 4: BrainWeb single-subject results for each tissue. The results show that N4 bias correction, com-
bined with Atropos, results in a minimal effect of bias, even at the 40% level. The optimal β for the MRF
term appears to be between 0.1 and 0.2. The legend is in the same position in each graph, allowing a visual
comparison of the results. As one may see, the N4-assisted overlap values are consistent across bias field/RF
inhomogeneity.
Figure 5: BrainWeb 20-subject results for each tissue as a function of MRF-β parameter where MRF-β is
in {0, 0.05, 0.1, 0.15, 0.2} and increases left to right. The results show that the PriorProbabilityMaps with
w = 0.5 (far right) give the best performance for all tissues.
In this study, we divide the 20 subjects equally into training and testing groups. We then ex-
ploit the ground-truth labeling of the training data to build both a group template [49] and also
prior probability maps for each of the three major tissues in the cerebrum. Each prior proba-
bility map is gained by deforming the ground truth labels from each of the 10 training subjects
to their template and averaging component by component. We then deform the template—and
priors—to the ten testing subjects and run Atropos with not only KMeans[3] initialization but
roles of testing and training sets to gain 3-tissue segmentation for each of the twenty subjects. When
w = 0, the priors are only used in initializing the model parameters but not during subsequent EM
iterations. When w = 0.5, the priors are maintained in the product with the likelihood during all
EM iterations. Results, in terms of bar plots for Dice overlap mean and standard deviation, are
shown in Figure 5.
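The 2-fold design used here can be written down in a few lines. The sketch below only shows the bookkeeping of the split (which subjects build the template and priors, and which are segmented); the subject identifiers and the function name are hypothetical, and how the 20 subjects were actually assigned to the two groups is not stated in the text.

```python
def two_fold_splits(subjects):
    """Return (training, testing) pairs; each half is the test set exactly once."""
    half = len(subjects) // 2
    first, second = subjects[:half], subjects[half:]
    return [(first, second), (second, first)]


subjects = ["subject%02d" % i for i in range(1, 21)]  # placeholder identifiers
folds = two_fold_splits(subjects)
# In each fold the training subjects provide the template and tissue priors,
# and every testing subject is segmented with them.
tested = [s for _, testing in folds for s in testing]
```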
We evaluate the ability to improve multi-template labeling results by converting the group labels to
probability maps and using them to drive many-class EM segmentation. The ground truth labels
cover 69 classes and much of the brain. Some unlabeled regions remain which we assign to label
69 such that all brain parenchyma contains a unique label. Following [50], the initialization of
our evaluation applies the script ants_multitemplate_labeling.sh (available in ANTs) to the
under the adult atlases section [38, 67]. These initial majority voting results are competitive with
prior work [38, 68] and serve as a baseline against which we compare.
We first convert each of the 69 labels within the original evaluation dataset to an individual
image. The remaining steps, summarized briefly, are the same for each of the 19 subjects. We
select one subject as an unlabeled target. The other 18 datasets are then mapped (as in the
script above) to that subject. We then deform, individually, the 69 × 18 label images to the
unlabeled subject. The label probability map is gained by averaging the 18 deformed images
associated with each label. We repeat this for each subject. The following parameters are the most
[0.2,1x1x1] -c [5,0] -p Socrates[1]. Results, in terms of Dice overlap, are shown in Figure 6.
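The construction of the label probability maps, and the majority-voting baseline they are compared with, can be sketched as below. The sketch starts from label images that have already been deformed into the target subject's space (the registration step is not shown), represents each image as a flat list of integer labels, and uses hypothetical function names. Averaging the binary per-label images over the mapped subjects is equivalent to counting, at each voxel, the fraction of subjects that voted for the label.

```python
from collections import Counter


def label_probability_maps(deformed_labels, num_labels):
    """Average the per-label binary images over all mapped subjects.

    deformed_labels: one flat label image per mapped subject, labels in 1..num_labels.
    Returns a dict mapping each label to its probability image.
    """
    num_subjects = len(deformed_labels)
    size = len(deformed_labels[0])
    counts = {k: [0] * size for k in range(1, num_labels + 1)}
    for image in deformed_labels:
        for voxel, label in enumerate(image):
            counts[label][voxel] += 1
    return {k: [c / num_subjects for c in column] for k, column in counts.items()}


def majority_vote(deformed_labels):
    """Most frequent label at each voxel; ties are broken arbitrarily."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*deformed_labels)]


mapped = [[1, 1, 2], [1, 2, 2], [1, 2, 2], [2, 2, 1]]  # 4 subjects, 3 voxels
priors = label_probability_maps(mapped, num_labels=2)
baseline = majority_vote(mapped)
```

At every voxel the probabilities over labels sum to one, and the majority-vote label is the one with the largest probability, so the EM step starts from the same information as the baseline but can then adjust it using the image intensities and the MRF term.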
Data: The BrainWeb data is freely available. We used single-subject BrainWeb data as is but added
a metaformat data header to the raw binary files. An example copy of this header is contained
cerebrum tissue classes. The Hammers data was also used as is (http://www.brain-development.org/).
711 was used for the examples and evaluations performed in this paper. Some components of ANTs
depend on the Insight ToolKit. The most critical dependency, for Atropos, is the ITK statistics
framework used to implement the univariate and multivariate parametric models. We linked to the
Scripts: The complete script for the single-subject BrainWeb study is based on generalizing
or greater) and which reproduces Figure 2 results. The template-based normalization procedure
for the BrainWeb 20 and the Hammers evaluation data is based on freely available scripts included
sion of ANTs—with a final version of Atropos—will be prepared with the final version of this
Neuroanatomical Region Label Atropos Majority PVal Sign
Right_Hippocampus 1 0.8522 0.8347 0.000183 +
Left_Hippocampus 2 0.8363 0.8234 0.005046 +
Right_Amygdala 3 0.832 0.8029 5.53E-08 +
Left_Amygdala 4 0.8346 0.8108 2.16E-05 +
Right_Anterior_temporal_lobe_medial_part 5 0.9078 0.8822 1.41E-12 +
Left_Anterior_temporal_lobe_medial_part 6 0.9092 0.8838 6.04E-12 +
Right_Anterior_temporal_lobe_lateral_part 7 0.8935 0.8652 7.55E-15 +
Left_Anterior_temporal_lobe_lateral_part 8 0.9022 0.8742 1.46E-13 +
Right_Parahippocampal_and_ambient_gyri 9 0.8651 0.8421 1.90E-10 +
Left_Parahippocampal_and_ambient_gyri 10 0.866 0.8441 3.91E-10 +
Right_Superior_temporal_gyrus 11 0.8907 0.874 1.16E-11 +
Left_Superior_temporal_gyrus 12 0.8976 0.8798 6.76E-09 +
Right_Middle_and_inferior_temporal_gyri 13 0.89 0.874 1.27E-09 +
Left_Middle_and_inferior_temporal_gyri 14 0.8908 0.873 9.39E-11 +
Right_Fusiform_gyrus 15 0.7667 0.7461 6.79E-08 +
Left_Fusiform_gyrus 16 0.7556 0.7391 0.000685 +
Right_Cerebellum 17 0.9829 0.9702 6.94E-18 +
Left_Cerebellum 18 0.9833 0.9703 3.72E-17 +
Brainstem 19 0.9523 0.9544 0.159323 ~
Left_Insula 20 0.8812 0.8797 0.06472 ~
Right_Insula 21 0.8754 0.8738 0.18553 ~
Left_Lateral_remainder_of_occipital_lobe 22 0.8603 0.8444 1.05E-12 +
Right_Lateral_remainder_of_occipital_lobe 23 0.8594 0.8434 3.70E-16 +
Left_Gyrus_cinguli_anterior_part 24 0.7671 0.7842 0.00263 -
Right_Gyrus_cinguli_anterior_part 25 0.8234 0.8407 0.000209 -
Left_Gyrus_cinguli_posterior_part 26 0.8332 0.8418 0.001878 -
Right_Gyrus_cinguli_posterior_part 27 0.8154 0.8233 0.015334 ~
Left_Middle_frontal_gyrus 28 0.8674 0.8554 1.50E-11 +
Right_Middle_frontal_gyrus 29 0.8717 0.8608 4.93E-15 +
Left_Posterior_temporal_lobe 30 0.872 0.8617 2.52E-10 +
Right_Posterior_temporal_lobe 31 0.8683 0.8578 9.39E-11 +
Left_Inferolateral_remainder_of_parietal_lobe 32 0.8706 0.8581 1.98E-11 +
Right_Inferolateral_remainder_of_parietal_lobe 33 0.8573 0.8434 2.08E-11 +
Left_Caudate_nucleus 34 0.8829 0.8952 5.81E-06 -
Right_Caudate_nucleus 35 0.8853 0.8921 0.000305 -
Left_Nucleus_accumbens 36 0.7361 0.7382 0.822167 ~
Right_Nucleus_accumbens 37 0.7204 0.7158 0.406373 ~
Left_Putamen 38 0.8925 0.8951 0.215131 ~
Right_Putamen 39 0.8936 0.8981 0.015623 ~
Left_Thalamus 40 0.9089 0.9148 0.002933 -
Right_Thalamus 41 0.9054 0.9102 0.003462 -
Left_Pallidum 42 0.7996 0.8288 0.000563 -
Right_Pallidum 43 0.7966 0.8284 0.002293 -
Corpus_callosum 44 0.839 0.886 3.02E-09 -
Right_Lateral_ventricle_excluding_temporal_horn 45 0.8911 0.89 0.581331 ~
Left_Lateral_ventricle_excluding_temporal_horn 46 0.9003 0.8986 0.408928 ~
Right_Lateral_ventricle_temporal_horn 47 0.6557 0.625 0.037556 ~
Left_Lateral_ventricle_temporal_horn 48 0.6523 0.62 0.001961 +
Third_ventricle 49 0.8313 0.817 9.85E-05 +
Left_Precentral_gyrus 50 0.8297 0.824 0.001547 +
Right_Precentral_gyrus 51 0.8463 0.8365 1.15E-06 +
Left_Gyrus_rectus 52 0.8132 0.7991 8.73E-08 +
Right_Gyrus_rectus 53 0.8245 0.8062 8.53E-08 +
Left_Orbitofrontal_gyri 54 0.8625 0.8451 3.75E-11 +
Right_Orbitofrontal_gyri 55 0.8787 0.8599 1.69E-12 +
Left_Inferior_frontal_gyrus 56 0.8491 0.833 3.51E-11 +
Right_Inferior_frontal_gyrus 57 0.8429 0.8282 1.49E-09 +
Left_Superior_frontal_gyrus 58 0.8806 0.8679 1.42E-10 +
Right_Superior_frontal_gyrus 59 0.8894 0.8778 6.43E-11 +
Left_Postcentral_gyrus 60 0.8119 0.7949 1.74E-06 +
Right_Postcentral_gyrus 61 0.8189 0.8009 7.95E-08 +
Left_Superior_parietal_gyrus 62 0.8604 0.8472 6.32E-10 +
Right_Superior_parietal_gyrus 63 0.8714 0.8562 7.95E-08 +
Left_Lingual_gyrus 64 0.855 0.8371 1.87E-08 +
Right_Lingual_gyrus 65 0.8453 0.8259 6.71E-08 +
Left_Cuneus 66 0.834 0.8077 4.66E-10 +
Right_Cuneus 67 0.8515 0.8278 5.93E-08 +
Unlabeled 68 0.7767 0.7579 0.002256 +
average 0.851 0.8412
Figure 6: The figure compares the Dice overlap results from Atropos versus the raw results from majority
voting for each of 68 neuroanatomical regions and, in addition, the unlabeled portions of the brain from the
Hammers evaluation dataset. We evaluated Atropos via N-fold cross-validation and employed PriorProbabil-
ityImages for each class where probabilities are gained by averaging mapped subject labels. The color coding
highlights those regions that have the highest (yellow) and lowest (pink) improvement. The significance of
the improvement, measured by pairwise T-test, is also shown as is a trinary coding of that improvement as:
+ significant improvement, − performance reduction, ∼ no change.
manuscript.
5 Discussion
We introduced Atropos, described its theory and implementation details, and documented its performance
in a variety of use cases. We also showed evidence that the openly available N4 bias correction
can easily be used with Atropos to improve segmentation. Furthermore, we used multiple subject
BrainWeb data to build dataset-specific priors that provided the most consistent segmentation
performance across tissues. Finally, we used majority voting to initialize an Atropos EM solution
to a 69-class brain parcellation problem. Significant improvements were gained in multiple brain
regions, in particular in temporal lobe cortex, the hippocampi and amygdalae and the lateral
ventricles. This work, in summary, demonstrates the applicability of Atropos in both basic and extended
use cases.
Atropos results are competitive with the state of the art. For instance, [28] (SPM5) evaluated on
0% RF bias field, 3% noise BrainWeb single-subject data, finding 0.932 (GM) and 0.961 (WM) Dice
overlap. Results on 40% RF bias were 0.934 (GM) and 0.961 (WM). SPM5 exhibits insensitivity
to bias similar to our own best results on the 40% RF bias, 3% noise case (MRF-β=0.2, K-means
+ N4), for which Dice overlap is 0.951 for GM and 0.963 for WM. [69] gave GM Dice overlap results
(BrainWeb single-subject, 3% noise) of 0.962 (0% RF bias), 0.964 (20% RF bias) and 0.956 (40% RF bias),
which are slightly higher than either SPM5 or Atropos results. However, [69] do not report WM
or CSF numbers for comparison. Topology-preserving methods also perform well. [70] achieved
Dice overlap of 0.912 (GM), 0.927 (WM) and 0.900 (CSF) on 3% noise, 20% RF bias BrainWeb
single-subject data. These are excellent numbers given the topological constraint applied
to the segmentation. [71] proposed TOADS and, estimating from the paper’s graph, showed that
the average Dice overlap accuracy for 3% noise for various RF was 0.930–0.950 (GM), 0.950–0.960
(WM), and 0.920–0.940 (CSF). Perhaps the most recent balanced evaluation was performed in
[12], which reports confusion matrix numbers, rather than Dice overlap. Because the true
numbers of GM and WM voxels for BrainWeb are known, we can convert the confusion matrix to
Dice overlap. In that case, the SPM5 Dice overlap for BrainWeb single-subject data is 0.885 (GM)
and 0.909 (WM), while FreeSurfer and FSL’s accuracy is lower. The best GM Dice overlap result for
the 20 subject BrainWeb data is obtained by SPM5: 0.930; the best WM Dice overlap is from FSL:
0.950. We note that [12] used a comprehensive evaluation where quality of brain extraction also
contributed to the outcome. Thus, the results must be interpreted slightly differently than those
from other papers. Finally, in our evaluation of 20-subject BrainWeb data, the prior probability
models performed best of all the models used. Compared to the K-Means based segmentation, the
prior based segmentation performance also peaked at lower values of the MRF-β term (0.0 and
0.05). This is reasonable in that the spatial priors themselves impose a degree of regularity on the
segmentation, as in SPM5.
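The conversion from a confusion matrix to Dice overlap mentioned above can be sketched as follows. The sketch assumes the confusion matrix is given as row-normalized fractions, i.e. entry (i, j) is the fraction of voxels truly in class i that were labeled as class j, which is why the true voxel counts are needed to recover absolute counts. The exact form of the matrix in [12] may differ; the function name and the numbers in the example are purely illustrative.

```python
def dice_from_confusion(fractions, true_counts):
    """Per-class Dice overlap from a row-normalized confusion matrix.

    fractions[i][j]: fraction of true class i voxels labeled as class j (assumed form).
    true_counts[i]:  known number of voxels truly in class i.
    """
    n = len(true_counts)
    counts = [[fractions[i][j] * true_counts[i] for j in range(n)] for i in range(n)]
    dice = []
    for k in range(n):
        true_k = sum(counts[k])                            # voxels truly in class k
        labeled_k = sum(counts[i][k] for i in range(n))    # voxels labeled as class k
        dice.append(2.0 * counts[k][k] / (true_k + labeled_k))
    return dice


# Illustrative two-class example (not data from any cited study).
example = dice_from_confusion([[0.9, 0.1], [0.2, 0.8]], [100, 50])
```

In the example the absolute counts are [[90, 10], [10, 40]], giving Dice values of 0.9 and 0.8 for the two classes.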
Our prior work, [48], showed that the majority vote initialization provided to Atropos by ANTs
template mapping is competitive with [38]. Overall, the Atropos EM extension improved these
results further. However, in a few regions of the mid-brain, the Atropos EM segmentation performed
significantly worse. This is not surprising, in that Atropos EM assumes that signal from the
likelihood and MRF term is valuable in improving the segmentation. This assumption held for
amygdala and lateral ventricles among other areas. However, in pallidum and corpus callosum (the
most significant areas with loss of performance), this is not true. We believe the explanation is
that the intensity varies within these structures and that a more complex intensity model (or finer
parcellation) would be needed here. An alternative solution would be to use boundary conditions
While specifying performance on BrainWeb is highly valuable, clinical validation is a second im-
portant aspect of segmentation evaluation. For instance, [44, 72–75] are only a few of the papers
that evaluate segmentation performance with respect to a known neurobiological outcome mea-
sure. Atropos is currently used in clinical studies and a number of clinically focused, application-
specific evaluations are ongoing and will constitute future work. One early example of a clinically-
focused Atropos neuroimaging application is in [76]. A second successful application area is that
of ventilation-based segmentation of hyperpolarized helium-3 MRI [77] which also used the open
source Glamorous Glue algorithm to impose topology constraints [78]. Thus, future work may
incorporate topology more closely into the Atropos methodology.
A more general advantage which extends beyond the scope of the experimental evaluation
section of this paper is the flexibility of Atropos. This includes not only n-tissue segmentation
and dense volumetric cortical parcellation, as reported in this work, but Atropos is also used in
conjunction with our ANTs registration tools for robust brain extraction which has reported good
performance in comparison with other popular, publicly available brain extraction tools [50].
6 Conclusion
The Atropos software is freely available to the public. We release this code not only to make
it available to clinical researchers but with the hope that other researchers in segmentation will
provide feedback about the implementation decisions that we made. EM segmentation is non-trivial
and there are numerous design alternatives available not only in the models selected but also in
the ICM coding, alternatives to ICM and the method in which prior and likelihood are combined.
Due to the flexibility of Atropos, we also hope that some of its capabilities, though not evaluated
data used in this work is available in the ANTs software repository, BrainWeb http://mouldy.
Acknowledgments This work was supported in part by NIH (AG17586, AG15116, NS44266,
and NS53488).
References
[1] Vannier M.W., Butterfield R.L., Jordan D., Murphy W.A., Levitt R.G., and Gado M. (1985)
[2] Bezdek J.C., Hall L.O., and Clarke L.P. (1993) Review of MR image segmentation techniques
[3] Pal N.R. and Pal S.K. (1993) A review on image segmentation techniques. Pattern Recognition
26, 1277–1294.
[4] Clarke L.P., Velthuizen R.P., Camacho M.A., Heine J.J., Vaidyanathan M., Hall L.O.,
Thatcher R.W., and Silbiger M.L. (1995) MRI segmentation: methods and applications. Magn.
[5] Pham D.L., Xu C., and Prince J.L. (2000) Current methods in medical image segmentation.
[6] Viergever M.A., Maintz J.B., Niessen W.J., Noordmans H.J., Pluim J.P., Stokking R., and
Vincken K.L. (2001) Registration, segmentation, and visualization of multimodal brain images.
[7] Suri J.S., Singh S., and Reden L. (2002) Computer vision and pattern recognition techniques
for 2-D and 3-D MR cerebral cortical segmentation (part I): A state-of-the-art review. Pattern
[8] Duncan J.S., Papademetris X., Yang J., Jackowski M., Zeng X., and Staib L.H. (2004) Geo-
metric strategies for neuroanatomic analysis from MRI. Neuroimage 23 Suppl 1, S34–S45.
[9] Balafar M.A., Ramli A.R., Saripan M.I., and Mashohor S. (2010) Review of brain MRI image
[10] Cuadra M.B., Cammoun L., Butz T., Cuisenaire O., and Thiran J.P. (2005) Comparison and
tistical brain MR image segmentation algorithms and their impact on partial volume correction
[12] Klauschen F., Goldman A., Barra V., Meyer-Lindenberg A., and Lundervold A. (2009) Eval-
uation of automated brain MR image segmentation and volumetry methods. Hum. Brain.
[13] de Boer R., Vrooman H.A., Ikram M.A., Vernooij M.W., Breteler M.M.B., van der Lugt A.,
and Niessen W.J. (2010) Accuracy and reproducibility study of automatic MRI brain tissue
[14] Dempster A., Laird N., and Rubin D. (1977) Maximum likelihood estimation from incomplete
[15] Wells W.M., Grimson W.L., Kikinis R., and Jolesz F.A. (1996) Adaptive segmentation of MRI
[16] Cline H.E., Lorensen W.E., Kikinis R., and Jolesz F. (1990) Three-dimensional segmentation
of MR images of the head using probability and connectivity. J Comput Assist Tomogr 14,
1037–1045.
[17] Kikinis R., Shenton M.E., Gerig G., Martin J., Anderson M., Metcalf D., Guttmann C.R.,
McCarley R.W., Lorensen W., and Cline H. (1992) Routine quantitative analysis of brain and
[18] Weisenfeld N.I. and Warfield S.K. (2009) Automatic segmentation of newborn brain MRI.
[19] Geman S. and Geman D. (1984) Stochastic relaxation, Gibbs distributions, and the Bayesian
[20] Held K., Kops E.R., Krause B.J., Wells W.M., Kikinis R., and Müller-Gärtner H.W. (1997)
Markov random field segmentation of brain MR images. IEEE Trans Med Imaging 16, 878–
886.
[21] Leemput K.V., Maes F., Vandermeulen D., and Suetens P. (1999) Automated model-based
bias field correction of MR images of the brain. IEEE Trans Med Imaging 18, 885–896.
[22] Leemput K.V., Maes F., Vandermeulen D., and Suetens P. (1999) Automated model-based
tissue classification of MR images of the brain. IEEE Trans Med Imaging 18, 897–908.
[23] Zhang Y., Brady M., and Smith S. (2001) Segmentation of brain MR images through a hidden
Markov random field model and the expectation-maximization algorithm. IEEE Trans Med
[24] Scherrer B., Forbes F., Garbay C., and Dojat M. (2009) Distributed local MRF models for
tissue and structure brain segmentation. IEEE Trans Med Imaging 28, 1278–1295.
[25] Pappas T.N. (1992) An adaptive clustering algorithm for image segmentation. IEEE Trans.
[26] Greenspan H., Ruf A., and Goldberger J. (2006) Constrained Gaussian mixture model frame-
work for automatic segmentation of MR brain images. IEEE Trans Med Imaging 25, 1233–1245.
[27] Marroquin J.L., Vemuri B.C., Botello S., Calderon F., and Fernandez-Bouzas A. (2002) An
accurate and efficient Bayesian method for automatic segmentation of brain MRI. IEEE Trans
[28] Ashburner J. and Friston K.J. (2005) Unified segmentation. Neuroimage 26, 839–851.
[29] Ruan S., Jaggi C., Xue J., Fadili J., and Bloyet D. (2000) Brain tissue classification of magnetic
resonance images using partial volume modeling. IEEE Trans Med Imaging 19, 1179–1187.
[30] Ballester M.A.G., Zisserman A.P., and Brady M. (2002) Estimation of the partial volume effect
[31] Leemput K.V., Maes F., Vandermeulen D., and Suetens P. (2003) A unifying framework for
partial volume segmentation of brain MR images. IEEE Trans Med Imaging 22, 105–119.
[32] Friston K.J., Frith C.D., Liddle P.F., Dolan R.J., Lammertsma A.A., and Frackowiak R.S.
(1990) The relationship between global and local changes in PET scans. J Cereb Blood Flow
[34] Pohl K.M., Bouix S., Nakamura M., Rohlfing T., McCarley R.W., Kikinis R., Grimson W.E.L.,
Shenton M.E., and Wells W.M. (2007) A hierarchical algorithm for MR brain image parcella-
[35] Pieper S., Lorensen B., Schroeder W., and Kikinis R. (2006) The NA-MIC kit: ITK, VTK,
pipelines, grids and 3D Slicer as an open platform for the medical image computing community.
In Proceedings of the 3rd IEEE International Symposium on Biomedical Imaging: From Nano
[36] Goualher G.L., Procyk E., Collins D.L., Venugopal R., Barillot C., and Evans A.C. (1999) Au-
tomated extraction and variability analysis of sulcal neuroanatomy. IEEE Trans Med Imaging
18, 206–217.
[37] Fischl B., van der Kouwe A., Destrieux C., Halgren E., Ségonne F., Salat D.H., Busa E.,
Seidman L.J., Goldstein J., Kennedy D., Caviness V., Makris N., Rosen B., and Dale A.M.
(2004) Automatically parcellating the human cerebral cortex. Cereb Cortex 14, 11–22.
[38] Heckemann R.A., Hajnal J.V., Aljabar P., Rueckert D., and Hammers A. (2006) Automatic
anatomical brain MRI segmentation combining label propagation and decision fusion. Neu-
[39] Destrieux C., Fischl B., Dale A., and Halgren E. (2010) Automatic parcellation of human
cortical gyri and sulci using standard anatomical nomenclature. Neuroimage 53, 1–15.
[40] Dale A.M., Fischl B., and Sereno M.I. (1999) Cortical surface-based analysis. I. segmentation
[41] Fischl B., Sereno M.I., and Dale A.M. (1999) Cortical surface-based analysis. II: Inflation,
[42] Klein A. and Hirsch J. (2005) Mindboggle: a scatterbrained approach to automate brain
N.C. (2007) Longitudinal and cross-sectional analysis of atrophy in Alzheimer’s disease: cross-
[44] de Bresser J., Portegies M.P., Leemans A., Biessels G.J., Kappelle L.J., and Viergever M.A.
(2011) A comparison of MR based segmentation methods for measuring brain atrophy pro-
[45] Wolpert D.H. and Macready W.G. (1997) No free lunch theorems for optimization. IEEE
[46] Prastawa M., Gilmore J.H., Lin W., and Gerig G. (2005) Automatic segmentation of MR
rithms for energy minimization in vision. IEEE Trans Pattern Anal Mach Intell 26, 1124–1137.
[48] Avants B.B., Tustison N.J., Song G., Cook P.A., Klein A., and Gee J.C. (2011) A reproducible
evaluation of ANTs similarity metric performance in brain image registration. Neuroimage 54,
2033–2044.
[49] Avants B.B., Yushkevich P., Pluta J., Minkoff D., Korczykowski M., Detre J., and Gee J.C.
(2010) The optimal template effect in hippocampus studies of diseased populations. Neuroim-
[50] Avants B., Klein A., Tustison N., Woo J., and Gee J.C. (2010) Evaluation of open-access,
automated brain extraction methods on multi-site multi-disorder data. In 16th Annual Meeting
[51] Tustison N.J., Avants B.B., Cook P.A., Zheng Y., Egan A., Yushkevich P.A., and Gee J.C.
(2010) N4ITK: improved N3 bias correction. IEEE Trans Med Imaging 29, 1310–1320.
[52] Sanjay-Gopal S. and Hebert T.J. (1998) Bayesian pixel classification using spatially variant
finite mixtures and the generalized EM algorithm. IEEE Trans Image Process 7, 1014–1028.
[53] Li S.Z. (2001) Markov random field modeling in computer vision. Springer-Verlag, London,
UK.
[54] Besag J. (1974) Spatial interaction and the statistical analysis of lattice systems. Journal of
[55] Noe A. and Gee J.C. (2001) Partial volume segmentation of cerebral MRI scans with mixture
model clustering. In M. Insana and R. Leahy, eds., Information Processing in Medical Imaging,
Springer Berlin / Heidelberg, vol. 2082 of Lecture Notes in Computer Science, 423–430.
[56] Lim K.O. and Pfefferbaum A. (1989) Segmentation of MR brain images into cerebrospinal
fluid spaces, white and gray matter. J Comput Assist Tomogr 13, 588–593.
[57] Julin P., Melin T., Andersen C., Isberg B., Svensson L., and Wahlund L.O. (1997) Reliability of
[58] Freeborough P.A., Fox N.C., and Kitney R.I. (1997) Interactive algorithms for the segmen-
tation and quantitation of 3-D MRI brain scans. Comput Methods Programs Biomed 53,
15–25.
[59] Yushkevich P.A., Piven J., Hazlett H.C., Smith R.G., Ho S., Gee J.C., and Gerig G. (2006)
[60] Boykov Y.Y. and Jolly M.P. (2001) Interactive graph cuts for optimal boundary & region
segmentation of objects in N-D images. In Proc. Eighth IEEE Int. Conf. Computer Vision
[61] Besag J. (1986) On the statistical analysis of dirty pictures. Journal of the Royal
[62] Awate S.P., Tasdizen T., Foster N., and Whitaker R.T. (2006) Adaptive Markov modeling for
726–739.
[63] Sled J.G., Zijdenbos A.P., and Evans A.C. (1998) A nonparametric method for automatic
correction of intensity nonuniformity in MRI data. IEEE Trans Med Imaging 17, 87–97.
[64] Boyes R.G., Gunter J.L., Frost C., Janke A.L., Yeatman T., Hill D.L.G., Bernstein M.A.,
Thompson P.M., Weiner M.W., Schuff N., Alexander G.E., Killiany R.J., DeCarli C., Jack
C.R., Fox N.C., and Study A.D.N.I. (2008) Intensity non-uniformity correction using N3 on
3-T scanners with multichannel phased array coils. Neuroimage 39, 1752–1762.
[65] Aubert-Broche B., Griffin M., Pike G.B., Evans A.C., and Collins D.L. (2006) Twenty new
digital brain phantoms for creation of validation image data bases. IEEE Trans Med Imaging
25, 1410–1416.
[66] Battaglini M., Smith S.M., Brogi S., and Stefano N.D. (2008) Enhanced brain extraction
[67] Hammers A., Allom R., Koepp M.J., Free S.L., Myers R., Lemieux L., Mitchell T.N., Brooks
D.J., and Duncan J.S. (2003) Three-dimensional maximum probability atlas of the human
brain, with particular reference to the temporal lobe. Hum Brain Mapp 19, 224–247.
[68] Heckemann R.A., Keihaninejad S., Aljabar P., Rueckert D., Hajnal J.V., Hammers A., and Ini-
tiative A.D.N. (2010) Improving intersubject image registration using tissue-class information
51, 221–227.
[69] Nakamura K. and Fisher E. (2009) Segmentation of brain magnetic resonance images for
measurement of gray matter atrophy in multiple sclerosis patients. Neuroimage 44, 769–776.
[70] Shiee N., Bazin P.L., Ozturk A., Reich D.S., Calabresi P.A., and Pham D.L. (2010) A topology-
preserving approach to the segmentation of brain images with multiple sclerosis lesions. Neu-
[71] Bazin P.L. and Pham D.L. (2007) Topology-preserving tissue classification of magnetic reso-
measure of cerebral volume changes from registered repeat MRI. IEEE Trans Med Imaging
16, 623–629.
[73] Westlye L.T., Walhovd K.B., Dale A.M., Espeseth T., Reinvang I., Raz N., Agartz I., Greve
D.N., Fischl B., and Fjell A.M. (2009) Increased sensitivity to effects of normal aging and
[74] Sánchez-Benavides G., Gómez-Ansón B., Sainz A., Vives Y., Delfino M., and Peña-Casanova
aging, mild cognitive impairment, and Alzheimer disease subjects. Psychiatry Res 181, 219–
225.
[75] Chou Y.Y., Leporé N., Avedissian C., Madsen S.K., Parikshak N., Hua X., Shaw L.M., Trojanowski
J.Q., Weiner M.W., Toga A.W., Thompson P.M., and Initiative A.D.N. (2009) Mapping
correlations between ventricular expansion and CSF amyloid and tau biomarkers in 240
subjects with Alzheimer’s disease, mild cognitive impairment and elderly controls. Neuroimage
46, 394–410.
[76] Avants B., Cook P.A., McMillan C., Grossman M., Tustison N.J., Zheng Y., and Gee J.C.
(2010) Sparse unbiased analysis of anatomical variance in longitudinal imaging. Med Image
[77] Tustison N., Avants B., Altes T., de Lange E., Mugler J., and Gee J. (2010) Automatic seg-
[78] Tustison N., Avants B., Siqueira M., and Gee J. (2010) Topological well-composedness and
Glamorous Glue: A digital gluing algorithm for topologically constrained front propagation.