Automated recognition and analysis of head thrashes behavior in C. elegans
¤ Current Address: School of Biological Sciences, University of Edinburgh, Edinburgh, United Kingdom
* [email protected]
guide, we envision the broad utility of the framework for diverse problems across different length scales and imaging methods.
Author Summary
New technologies have increased the size and content-richness of biological imaging data-
sets. As a result, automated image processing is increasingly necessary to extract relevant
data in an objective, consistent and time-efficient manner. While image processing tools
have been developed for general problems that affect large communities of biologists, the
diversity of biological research questions and experimental techniques has left many
problems unaddressed. Moreover, there is no clear way in which non-computer scientists
can immediately apply a large body of computer vision and image processing techniques
to address their specific problems or adapt existing tools to their needs. Here, we address
this need by demonstrating an adaptable framework for image processing that is capable
of accommodating a large range of biological problems with both high accuracy and
computational efficiency. Moreover, we demonstrate the utilization of this framework for
disparate problems by solving two specific image processing challenges in the model or-
ganism Caenorhabditis elegans. In addition to contributions to the C. elegans community,
the solutions developed here provide both useful concepts and adaptable image-processing
modules for other biological problems.
Introduction
Diverse imaging techniques exist to provide functional and structural information about bio-
logical specimens in clinical and experimental settings. On the clinical side, new and augment-
ed imaging modalities and contrast techniques have increased the types of information that
can be garnered from biological samples [1]. Similarly, many tools have recently been devel-
oped to enable new and accelerated forms of biological experimentation in both single cells
and multicellular model organisms [2–10]. Increasingly, the capacity for high-throughput ex-
perimentation provided by new optical tools, microfluidics and computer controlled systems
has eased the experimental bottleneck at the level of physical manipulation and raw data collec-
tion. Still, the power of many of these toolsets lies in facilitating the automation of experimental
processes. The ability to perform real-time information extraction from images during the
course of an experiment is therefore a crucial computational step to harnessing the potential of
many of these physical systems (Fig 1A). Even when off-line data analysis is sufficient, the ca-
pability of these systems to generate large, high-content datasets places a large burden on the
speed of the downstream analysis.
Automated image processing and the use of supervised learning techniques have the poten-
tial for bridging this gap between raw data availability and the limitations of manual analysis in
terms of speed, objectivity and sensitivity to subtle changes [11]. In this area, many computer
Fig 1. Overview of biological structure detection using multi-tiered classification. a) Unsupervised image processing techniques are often necessary
to harness the power of emerging imaging and experimental technologies. b) An overview of the proposed generalizable two layer classification architecture
for the autonomous identification of specific biological structures. Intrinsic, computationally simple features and relational or computationally expensive
features are partitioned into two layers to accommodate both structural complexity and efficiency.
doi:10.1371/journal.pcbi.1004194.g001
vision techniques, including some general object detection strategies, have been developed to
address the detection and recognition of faces, vehicles, animals and household objects from
standard camera images [12–17]. While this body of literature solves complex recognition
problems within the domain of everyday objects and images, it is not clear how or whether
they are generalizable to the imaging modalities and object detection problems that arise in bio-
logical image processing. While these techniques have garnered some important but limited adoption in biological applications [18–28], there is no systematic methodology by which these computational approaches can be applied to solving common problems in mining biological images [29]. Thus, the development or adaptation of these tools for specific problems has so far remained relatively opaque to many potential end-users and requires a high degree of expertise and intuition.
At the same time, there is a diverse array of specific object recognition problems that arise
in biology. Specifically, extraction of meaningful information from biological images usually
involves the identification of particular structures and calculation of their metrics, rather than
the usage of global image metrics. Depending on the specimen and the experimental platform,
this may range in scale from molecular or sub-cellular structure to individual cells or tissue
structures within a heterogeneous specimen, or entire organisms. While toolsets have already
been developed to address some common needs in biology [19–22, 24, 25, 30–32] and while
powerful algorithmic tools exist for pattern and feature discrimination and decision-making
[33–35], there are still many unaddressed needs in biological image processing.
Here, we present a general scheme for the detection of specific biological structures applica-
ble as a basis for solving a broad set of problems while using non-specific image processing
modules. As opposed to finished, ready-to-use toolsets, which address a limited problem defi-
nition by design, the workflow we propose has the power to simultaneously address the need
for accuracy, problem-specificity, and generalizability; end-users have the opportunity to
choose platforms and customize as needed. We demonstrate the power of this approach for
solving disparate biological image processing problems by developing two widely relevant
toolsets for the multicellular model organism, Caenorhabditis elegans. To address the problems
of extracting region-, tissue- and cell-specific information within a multicellular context, we de-
veloped an image processing algorithm to distinguish the head of the worm under bright-field
imaging and a set of tools for specific cell identification under fluorescence imaging. These de-
velopments demonstrate the flexibility of our framework to accommodate different imaging
modalities and disparate biological structures. The resulting toolsets contribute directly to addressing two fundamental needs for automated studies in the worm and contribute specific
concepts and modules that may be applied to a broader range of biological problems.
Results
Our framework is a two-tiered classification scheme to identify specific biological structures
within an image (Fig 1B). To identify biological structures of interest, images are first pre-pro-
cessed to condition the data and generate candidates for the structure of interest. In general,
candidates can either be individual pixels or discrete segmented regions generated via a thresh-
olding algorithm applied during pre-processing. To accommodate different image acquisition
setups and acquisition parameters, we propose the use of an image calibration factor, C, in pre-
processing and in all subsequent feature calculation steps. This calibration factor characterizes
the relationship between the digitized and real-world length scales for a specific experimental
setup and can be used to normalize feature and parameter scaling in all image processing steps
(Materials and Methods, S1 Table).
Subsequently we apply a two-layer classification scheme to identify whether the candidates
generated are features of interest. The candidate particles are quantitatively described by two
distinct sets of descriptive features. These features may be derived from intuitive metrics de-
signed to mimic human recognition or abstractions that capture additional information [33,
36]; they are mathematical descriptors that help delineate the structures of interest from other
candidates and will form the basis for classification. Separation of features into two distinct lay-
ers of classification in our proposed scheme serves three purposes. First, it permits conceptual
separation of intrinsic and extrinsic or relational properties of a biological structure. Second, it
permits the inclusion of higher level descriptions of the relationships between structures identi-
fied from the first layer of classification. Finally, it allows computationally expensive features to
only be associated with the second layer, which reduces the number of times these features
must be calculated as low probability candidates have already been removed. Accordingly, the
first layer of classification uses computationally inexpensive, intrinsic features of the candidates
to generate a smaller set of candidates. The second layer addresses additional complexity, and
uses computationally more expensive features or extrinsic features describing the relationship
between candidates, but only on a smaller number of candidates. This two-tier scheme allows
significant reduction in computational time. At each layer of classification, a trained classifier
is used to make a decision about the candidate’s identity based on the features calculated. In
this work, we chose support vector machines for all classification steps because their insensitivity to the specific conditioning of feature sets makes them more robust [34, 37]. We
note that when constraints of the feature sets are well known, other models including Bayesian
discriminators and heuristic thresholds can also be used. In general, the workflow architecture
presented in Fig 1B permits the identification of generic biological structures and balances the
capability for complexity with computational speed. We describe here two distinct applications
using this two-tier classification methodology.
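The workflow in Fig 1B can be summarized as a short sketch. This is an illustrative skeleton only: the function names are ours, and the two callables stand in for the trained classifiers described above.

```python
def two_tier_classify(candidates, layer1_features, layer1_clf,
                      layer2_features, layer2_clf):
    """Generic two-tier classification sketch (names are hypothetical).

    Cheap, intrinsic features prune the candidate pool in layer 1;
    expensive or relational features are computed only for the
    survivors in layer 2.
    """
    # Layer 1: intrinsic, inexpensive features evaluated on every candidate.
    survivors = [c for c in candidates if layer1_clf(layer1_features(c))]
    # Layer 2: costly/relational features, computed only on the reduced pool.
    # Relational features may use the whole surviving pool as context.
    return [c for c in survivors
            if layer2_clf(layer2_features(c, survivors))]
```

The separation means layer 2 feature functions run on a fraction of the original candidates, which is where the reported factor-of-two speedup comes from.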
Fig 2. Preprocessing and feature selection for head versus tail discrimination in C. elegans. a) The limited field of view of high resolution imaging
systems creates a need for spatial positioning along the anterior-posterior axis of the worm. As a landmark for orienting the A-P axis, the head of the worm is
distinguished by the presence of the pharynx and a grinder structure (inset below). b) Preprocessing for bright field structural detection consists of minimum
intensity projection of a sparse z-stack (MP) followed by Niblack local thresholding (BW0) and preliminary filtration of segmented particles to generate
candidates for subsequent classification (BW1). c) In layer 1 of classification, computationally inexpensive, intrinsic properties of the candidates (BW1) are
calculated for SVM classification and reduction of the candidate pool (BW2). d) Two example image processing sequences showing that while the shape-intrinsic features used in layer 1 of classification significantly reduce the candidate pool, they are insufficient for robust, specific identification of the grinder
particle. e) From the reduced candidate pool, layer 2 of classification utilizes regional properties of the remaining candidates to distinguish the grinder from
other structural and textural elements of the worm body with high specificity, making identification of the head possible on the basis of the presence of the
grinder particle.
doi:10.1371/journal.pcbi.1004194.g002
either too small (less than 37.5 μm²) or too large (greater than 100 μm²) to reduce downstream
computation (BW1 in Fig 2B). The remaining particles are processed through our two-layer
classification scheme to detect the presence of the pharyngeal grinder.
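The size-based pre-filtration can be sketched as follows, assuming a calibration factor C in microns per pixel so that a particle of A pixels covers A·C² square microns regardless of magnification or binning; the helper name is ours, and the thresholds follow the grinder example above.

```python
def filter_by_area(particle_areas_px, C, min_um2=37.5, max_um2=100.0):
    """Keep indices of particles whose physical area falls in range.

    C is the calibration factor (microns per pixel); pixel areas are
    converted to square microns before comparison so the same
    thresholds apply across imaging setups.
    """
    keep = []
    for idx, area_px in enumerate(particle_areas_px):
        area_um2 = area_px * C ** 2           # convert to real-world units
        if min_um2 <= area_um2 <= max_um2:    # reject too small / too large
            keep.append(idx)
    return keep
```

With C = 0.5 μm/pixel, a 200-pixel particle covers 50 μm² and survives, while 100-pixel (25 μm²) and 500-pixel (125 μm²) particles are rejected.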
Second, in the feature selection step, distinct mathematical descriptors that may help to de-
scribe and distinguish the structure of interest are calculated for each layer of classification. In
the first layer of classification, intrinsic and computationally inexpensive metrics of the parti-
cles are computed and used as features (Fig 2C and S2 Fig) in classification of the grinder
shape. These features represent a combination of simple, intuitive geometric features, such as
area and perimeter, in addition to higher level measures of the object geometry and invariant
moments suitable for shape description and identification [36]. Training and application of a
classifier with this feature set eliminates candidates on the basis of intrinsic shape (BW2 in Fig
2C). However, the resulting false positives in Fig 2D show that the information within these
shape metrics is insufficient to distinguish the grinder with high specificity.
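The flavor of the layer 1 feature set can be illustrated with a small NumPy sketch computing area, a crude perimeter estimate, and the first two Hu invariant moments (translation-, scale- and rotation-invariant shape descriptors of the kind cited for layer 1). This is not the paper's exact 14-feature set, only an indicative subset.

```python
import numpy as np

def shape_features(mask):
    """Illustrative intrinsic shape features for a binary particle mask."""
    ys, xs = np.nonzero(mask)
    area = float(len(xs))
    # Crude perimeter: foreground pixels with at least one 4-neighbour
    # background pixel.
    padded = np.pad(mask.astype(bool), 1)
    interior = (padded[1:-1, 1:-1] & padded[:-2, 1:-1] & padded[2:, 1:-1]
                & padded[1:-1, :-2] & padded[1:-1, 2:])
    perimeter = float(mask.astype(bool).sum() - interior.sum())
    # Central moments about the particle centroid.
    cx, cy = xs.mean(), ys.mean()
    def mu(p, q):
        return (((xs - cx) ** p) * ((ys - cy) ** q)).sum()
    # Scale-normalised moments: eta_pq = mu_pq / mu_00^(1 + (p+q)/2).
    def eta(p, q):
        return mu(p, q) / (area ** (1 + (p + q) / 2.0))
    hu1 = eta(2, 0) + eta(0, 2)                       # first Hu invariant
    hu2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    return area, perimeter, hu1, hu2
```

Because the Hu invariants are computed from scale-normalized moments, they stay nearly constant when the same shape is imaged at twice the resolution, which is the property that makes them useful across candidates of varying size.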
To refine the description of the biological structure in the second layer classification, we uti-
lize features that describe the relationship of candidate particles to nearby particles and texture
(Fig 2E and S3 Fig). Specifically, we note that the grinder resides inside the terminal bulb of the
pharynx, which is characterized by a distinct circular region of muscular tissue (Fig 2A). Based
on this observation, we define second layer features based on distributions of particle proper-
ties within a circular region around the centroid of the grinder candidate particle (S3 Fig). Not-
ing that the pharyngeal tissue is characterized by textural ranges in the radial direction and
relative uniformity in the angular direction, we build feature sets describing both the radial and angular distributions of the surrounding particles (S3 Fig).
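A minimal sketch of such regional features follows: it counts neighboring particle centroids falling in radial shells and angular sectors of a circular region around a candidate. The radius and bin counts here are illustrative, not the paper's values, and the calibration factor C keeps the region size in real units.

```python
import numpy as np

def regional_features(center, neighbor_centroids, C, radius_um=30.0,
                      n_radial=3, n_angular=4):
    """Hypothetical layer-2-style relational features for one candidate.

    Returns histograms of neighbour-centroid radii (microns) and angles
    within a circular region around `center` (in pixel coordinates).
    """
    pts = np.asarray(neighbor_centroids, dtype=float) - np.asarray(center, float)
    r = np.hypot(pts[:, 0], pts[:, 1]) * C        # radii in microns
    theta = np.arctan2(pts[:, 1], pts[:, 0])      # angles in radians
    inside = r <= radius_um                       # restrict to the region
    r, theta = r[inside], theta[inside]
    radial_hist, _ = np.histogram(r, bins=n_radial, range=(0.0, radius_um))
    angular_hist, _ = np.histogram(theta, bins=n_angular, range=(-np.pi, np.pi))
    return radial_hist, angular_hist
```

For the grinder, radial bins capture the textural banding of the terminal bulb while a flat angular histogram reflects its circular symmetry.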
Using the features outlined in Fig 2, each classification step is a mathematical model that is
trained to distinguish between structures of interest, such as the pharyngeal grinder, and irrelevant structures representing the textures and boundaries of other tissues in the worm.
To allow for supervised training of both the layer 1 and layer 2 classifiers, we annotated a selec-
tion of images (n = 1,430) by manually identifying particles that represent the pharyngeal
grinder. The classifiers can then be trained to associate properties of the feature sets with the
manually specified identity of candidate particles. However, in addition to informative feature
selection and the curation of a representative training set, the performance of SVM classifica-
tion models is subject to several parameters associated with the model itself and its kernel func-
tion [34, 48]. Thus, to ensure good performance of the final SVM model, we first optimize
model parameters based on five-fold cross-validation on the training set (Fig 3A and 3B, Mate-
rials and Methods).
In the parameter selection process, the optimization metric can be designed to reflect the
goals of classification in each layer (Fig 3B). In our application, for the first layer of classifica-
tion, the goal is to eliminate the large majority of background particles while retaining as many
grinder particles in the candidate pool as possible for refined classification in the second layer.
In other words, we aim to minimize false negatives while tolerating a moderate number of false
positives. Therefore, we optimize the SVM parameters via the minimization of an adjusted
error rate that penalizes false negatives more than false positives (Fig 3B). We show that with
an appropriate parameter selection, the first layer of classification can eliminate over 90% of
background particles while retaining almost 99% of the true grinder particles for further analy-
sis downstream (Fig 3B).
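The adjusted error rate and the parameter search can be sketched as below. The false-negative weight and the shape of the search loop are our illustrative choices; the paper's exact weighting is not reproduced here.

```python
def adjusted_error(fn, fp, n_pos, n_neg, fn_weight=10.0):
    """Weighted combination of false-negative and false-positive rates.

    fn_weight > 1 penalises missed true particles more heavily than
    retained background particles (the value 10 is illustrative).
    """
    fnr = fn / n_pos
    fpr = fp / n_neg
    return (fn_weight * fnr + fpr) / (fn_weight + 1.0)

def pick_parameters(param_grid, cross_validate):
    """Select the parameter set with minimal cross-validated adjusted
    error, where `cross_validate` runs e.g. five-fold CV and returns
    the adjusted error for one parameter set."""
    return min(param_grid, key=cross_validate)
```

In layer 1 this objective tolerates a moderate false-positive rate, since surviving background particles get a second chance at rejection in layer 2, while a false negative is unrecoverable.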
To visualize feature and classifier performance, we use Fisher’s linear discriminant analysis to
linearly project the 14 layer 1 features of the training set onto two dimensions that show maxi-
mum separation between grinder and background particles (Fig 3C). A high degree of overlap
between the distributions of the grinder and background particles and high error rates associated
with the trained SVM in this visualization suggest that shape-intrinsic features are insufficient to
fully describe the grinder structure. Nevertheless, the first layer of classification enriches the true
grinder structure candidates in the training set from roughly 6.2% of the original particle set to
40% of the particle set entering into the second layer of classification (Fig 3C). This enriched set
Fig 3. Optimization and training of the two layers of SVM classification for pharyngeal grinder detection. a) To construct the layer 1 classifier with the
specified feature set, five-fold cross-validation with a manually annotated training set is first used to optimize SVM model parameters and ensure
classification performance. b) Classification performance based on the false positive (FPR) and false negative (FNR) error rates observed in five-fold cross-
validation allows selection of an optimal parameter set. c) The full training set and optimized parameters are used to construct the final layer 1 SVM model.
Linear projections of the training set features onto two dimensions show that the layer 1 feature set and the optimized SVM model are insufficient for
identifying the grinder particle with high specificity. d) The second layer of classification refines the final classification decision and is parameter-optimized
using the candidates passed from layer 1 of classification. e) Classification performance based on five-fold cross-validation is used for parameter selection. f)
The reduced layer 2 training set and optimized parameters are used to construct the final layer 2 SVM model. Linear projections of layer 2 features for the
training set demonstrate the capability of a two layer scheme for the detection of the grinder with both high specificity and sensitivity.
doi:10.1371/journal.pcbi.1004194.g003
of candidate particles is used to optimize and train the second layer of classification in a similar
manner (Fig 3D). With appropriate parameter selection, we show that the second layer of classi-
fication is capable of identifying the grinder with sensitivity and specificity above 95% (Fig 3E).
We train the final layer 2 classifier with the reduced training set and these optimized parameters
to yield high classification performance in combination with layer 1 (Fig 3F).
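The Fisher projection used for visualization can be sketched as follows. For a two-class problem, Fisher's criterion yields a single discriminant axis; a 2-D plot like Fig 3C pairs it with a second axis of the analyst's choosing. The regularization term below is our addition for numerical stability.

```python
import numpy as np

def fisher_direction(X0, X1, reg=1e-6):
    """Fisher's linear discriminant direction for two classes.

    Maximises between-class separation relative to within-class
    scatter: w is proportional to Sw^-1 (m1 - m0).
    """
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class scatter matrix.
    Sw = (np.cov(X0, rowvar=False) * (len(X0) - 1)
          + np.cov(X1, rowvar=False) * (len(X1) - 1))
    Sw += reg * np.eye(Sw.shape[0])       # regularise for invertibility
    w = np.linalg.solve(Sw, m1 - m0)
    return w / np.linalg.norm(w)
```

Projecting the 14-dimensional layer 1 feature vectors onto this axis makes the overlap between grinder and background distributions directly visible.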
Changes in experimental conditions, the genetic background of the worms under study, or changes to the imaging system can cause significant variation in the features, and thus degrade classifier performance due to overfitting that fails to take into account experimental variation (Fig 3).
Fig 4. Head versus tail classification using grinder detection is robust to changes in experimental conditions and genetic background. a) Changes in experimental conditions, such as food availability, can alter the bulk morphology and the appearance of the worm body in bright field, with potential consequences for classification accuracy. b) Our head versus tail classification scheme maintains sensitivity and specificity at over 95% at different ages and feeding conditions despite these biological changes. c) Genetic changes can also induce changes in bulk morphology and texture of the worm. d) Despite not being represented within the training set, the performance of the classifier is maintained even for mutant worms (dpy-4 (-)) with major morphological changes. e) Changes in the optics, camera or acquisition parameters can alter the final resolution of images. f) The inclusion of the calibration metric within feature calculation (S2 Fig and S3 Fig) maintains classifier performance across a two-fold change in image resolution due to alterations in digital binning.
doi:10.1371/journal.pcbi.1004194.g004
To account for this potential variability, we include worms imaged at different
ages and food conditions in the training set of images. To validate the utility and efficacy of the
resulting classification scheme in a real-life laboratory setting, we analyze its performance on
new data sets that were not used in training the classifier. First, in spite of morphological
changes due to experimental conditions (Fig 4A), we show the resulting classification scheme
operates with consistently high performance in distinguishing the head and the tail of the
worm in the new data sets (Fig 4B). Second, while the training set only includes wildtype
worms imaged under different conditions, the morphology and texture of the worm is also sub-
ject to genetic alteration (Fig 4C). To see whether our classification scheme can accommodate
some of this genetic variability, we validate the classification scheme against a mutant strain
(dpy-4(-)) with large morphological changes in the body of the worm (Fig 4C). Finally, changes
in the imaging system can alter the digital resolution of biological structures of interest (Fig
4E). We show that the inclusion of a calibration factor adjusting for the pixel-to-micron conversion of the imaging system is sufficient to maintain classifier operation across a two-fold
change in the resolution of the imaging system (Fig 4F). Thus, this calibrated classification
scheme can be easily adapted to systems with different camera pixel formats via the calculation
of a new calibration factor.
Fig 5. First layer classification for detection of fluorescently labelled neuronal cells demonstrates generalizability of first layer features for particle
shape classification. a) Stereotypical positioning of the ASI neuron pair in the head of the worm. Many neuronal cells in the worm are organized as similar
pairs near the pharynx. b) Bright field and fluorescent maximum intensity projection showing the appearance and positioning of fluorescently labelled ASI
cells in the head of the worm. c) Preprocessing of raw fluorescent images showing binary image after Niblack thresholding (BW0) and initial filtration of the
candidate set by size (BW1). d) First layer classification of fluorescently labeled neurons shows good generalizability of the first layer feature set developed
for pharyngeal grinder detection for classification based on binary particle shape.
doi:10.1371/journal.pcbi.1004194.g005
Fig 6. Second layer classification for neuron pair detection. a) The first layer of classification is insufficient for rejection of all background particles. b) The
reduced candidate set from the first layer of classification is used to form candidate cell pairs with feature sets describing their relative positioning and
intensities. c) Although classification based on these features is sufficient for accurate cell pair detection in the majority of cases (left), multiple potential cell
pairs are sometimes classified within the same image (right). d) Incorporating probability estimates (shown in panel c) into the SVM model and selecting the
most likely cell pair eliminates these false positives and increases the specificity of the classifier.
doi:10.1371/journal.pcbi.1004194.g006
Fig 7. Second layer classifier for cell pattern recognition and identification. a) Representative maximum intensity projection and schematic
representation of the two neuron pairs in which an insulin-like peptide is expressed. b) The modularity of our scheme permits the preprocessing and layer 1
classification components from neuron pair detection to be re-used for the recognition and identification of these neuron pairs. c) To identify the pattern with
the appropriate cell identifications, properties for all possible combinations and arrangements of the layer 1 candidates are calculated. Here, all six such
candidate sets for 4 candidate particles are shown. d) Validation of the SVM classifier trained with these features shows high specificity but only moderate
sensitivity. e) The lower sensitivity observed for this classification scheme is mainly due to the limited ability to accommodate biological deviations from the stereotypical arrangement of the neurons while still maintaining high specificity.
doi:10.1371/journal.pcbi.1004194.g007
along with the selection of the most likely candidate in images with multiple positive classifica-
tion results is used to eliminate these false positives. This boosts the specificity of the classifier
without compromising the high sensitivity (Fig 6D). This additional step incorporates the real-
world constraint that, at most, one cell pair exists in each valid image and resolves any conflicts
that may arise in direct classification.
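This conflict-resolution step can be sketched as a simple argmax over the per-pair probability estimates; the threshold value here is illustrative, and the input is assumed to map candidate-pair identifiers to their estimated probabilities.

```python
def select_cell_pair(pair_probabilities, threshold=0.5):
    """Resolve conflicts when several candidate pairs classify positive.

    Encodes the real-world constraint that at most one true cell pair
    exists per valid image: keep only the most probable pair, and only
    if it clears the acceptance threshold (value illustrative).
    """
    if not pair_probabilities:
        return None                                   # no candidates at all
    best = max(pair_probabilities, key=pair_probabilities.get)
    return best if pair_probabilities[best] >= threshold else None
```

Selecting a single winner per image is what removes the multiple-positive cases shown in Fig 6C without lowering sensitivity.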
To demonstrate the ability of our framework to detect more complex cellular arrangements,
we use the expression pattern of a worm insulin-like peptide gene (ins-6) in two bilaterally
symmetric neuron pairs (Fig 7A) [50]. In this case, the specificity offered by the ins-6 promoter
is insufficient to offer full cell specificity, requiring the identification of different cells from the
raw fluorescent image. Taking advantage of our modular two-layer architecture, we reuse the
preprocessing and first layer classification tools that we have already constructed to identify a
small number of cell-shaped objects shown in Fig 7B. To detect the tetrad of cells with specifici-
ty for the ASI and ASJ neurons, we construct a relational feature set based on combinations of
neuron pairs (S6 Fig). As shown in Fig 7C, accounting for both correct cell pair identification
and non-repetition of individual cells within the tetrad set, there are $\binom{n}{2}\binom{n-2}{2} = \frac{n!}{4\,(n-4)!}$
tetrad sets that require feature calculation. Our two-layer architecture is therefore essential for
the construction of such relational feature sets with larger numbers of targets. Without layer 1
classification, description of such complex sets quickly becomes intractable: even 10 candidate
particles generates 1,260 different possible tetrad sets for feature calculation.
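The tetrad enumeration and its combinatorial count can be sketched directly: choose 2 of n candidates for the first pair, then 2 of the remaining n − 2 for the second, giving C(n,2)·C(n−2,2) = n!/(4(n−4)!) ordered tetrad sets.

```python
from itertools import combinations
from math import factorial

def tetrad_sets(candidates):
    """Enumerate ordered (first-pair, second-pair) tetrads from the
    layer 1 candidates, with no candidate repeated within a tetrad."""
    sets = []
    for pair_a in combinations(candidates, 2):
        rest = [c for c in candidates if c not in pair_a]
        for pair_b in combinations(rest, 2):
            sets.append((pair_a, pair_b))
    return sets

def tetrad_count(n):
    """Closed-form count of tetrad sets for n candidates."""
    return factorial(n) // (4 * factorial(n - 4))
```

With 4 candidates this yields the six sets shown in Fig 7C, and with 10 candidates the 1,260 sets quoted in the text, which is why layer 1 pruning is essential before relational feature calculation.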
Discussion
We have demonstrated the flexibility and computational benefits of our two-layer architecture
in handling two disparate image recognition problems. Using our pipeline, we have developed
two specific solutions addressing common image processing problems for the C. elegans com-
munity. Our contribution of a ready-to-use head-versus-tail classification scheme under
bright-field imaging enables automated high-resolution imaging and stimulus application in a
large range of biological experiments in the worm. Our neuronal cell pair identification appli-
cation forms the basis for approaching the general problem of cell-specific information extrac-
tion within a multicellular context such as the worm. Together, these specific tools permit
automated visual dissection of the multicellular worm at different resolutions that range from
the targeting of rough anatomical regions to cell-specificity.
In addition to the immediate utility of the two examples provided in this work, they are also
representative of two classes of problems that are commonly found in biological image process-
ing. The detection of the pharyngeal grinder demonstrates a general class of problems where
discrete structures are distinguished by both their intrinsic shape and the characteristics of
their local environment. The entire framework, including the feature sets, developed and docu-
mented for this problem can be applied to the recognition of other discrete structures including
subcellular organelles such as nuclei, specific cell types and tissue structures. The detection of
single and multiple cell pairs extends the analysis to stereotypical formations of objects. The
feature sets documented here for analyzing paired objects is directly applicable to the analysis
of many symmetrical structures that arise in biology, such as in the nervous system. However,
with some modification, similar features can be applied to the analysis of different patterns that
may arise in specific biological processes such as development. Finally, the preprocessing mod-
ules developed for these two applications demonstrate the ability to segment out objects of dif-
ferent intensities from both bright-field and fluorescent imaging and are applicable to many
other problem sets.
Our two specific applications also highlight the effectiveness of our algorithm in segregating
complex image recognition problems in both a computationally effective and conceptually
convenient manner. In the detection of the pharyngeal grinder, two-layer construction elimi-
nated the need to compute a large set of regional descriptors by associating them with the sec-
ond layer of classification and therefore a smaller candidate set. In comparison with direct
calculation of all features in a single layer of classification, the two-layer architecture employed
in this work reduced average total computational time by a factor of two (S7 Fig). In cell
Conclusion
Beyond the specific applications we discuss here, we envision that our methodology can be a
powerful way to tackle a broad range of biological image processing problems. For instance, we
consider our scheme to be a generalization of the previously reported application of SVMs to-
wards the understanding of synaptic morphology in C. elegans [24]. In this application, indi-
vidual pixels within the image form the pool of candidates for potential synaptic pixels in the
first layer classification. The second layer of classification then refines this decision on the basis
of relational characteristics between candidates. Here, we formalize this classification approach
and demonstrate that it can be adapted towards detection of disparate structures imaged under
different imaging modalities. The image processing approach we present here has inherent
structural advantages in terms of conceptual division, modularization and computational effi-
ciency and demonstrates the application of a powerful supervised learning model to streamline
biological image processing. We thus envision that our methodology can form the basis for de-
tection algorithms for structures ranging from the molecular to the tissue or organismal level
under different experimental methodologies.
In order to generate binary particles for classification, we use a local thresholding algorithm
that uses information about the mean and variability of pixel intensities within a local region
around a pixel:

$$T(x_i, y_j) = \mu_{local}(x_i, y_j) + k\,\sigma_{local}(x_i, y_j)$$

μlocal and σlocal are the means and standard deviations of all pixel values that fall within a square region of width 2R + 1 centered around the pixel of interest (x_i, y_j), and k is a parameter specifying the stringency of the threshold. μlocal and σlocal can be derived using standard image filtering with a binary square filter h(x_i, y_j) of width 2R + 1:

$$\mu_{local}[i,j] = \frac{1}{(2R+1)^2}\,\mathrm{filter}\big(MP[i,j],\, h[i,j]\big) = \frac{1}{(2R+1)^2}\sum_{m=0}^{M}\sum_{n=0}^{N} MP[m,n]\, h(i-m,\, j-n)$$

$$\sigma_{local}[i,j] = \sqrt{\frac{1}{(2R+1)^2}\,\mathrm{filter}\!\Big(\big(MP[i,j]-\mu_{local}[i,j]\big)^2,\; h[i,j]\Big)}$$
Using local mean and standard deviation information in the binary decision affords robust-
ness against local background intensity and texture changes.
The width of the local region, R, can be roughly selected on the basis of the size scale of the
structure of interest. In accordance with the size scales of the pharyngeal structure and individ-
ual neurons, we use R = 15μm for detection of the pharyngeal grinder and R = 5μm for fluores-
cent cell segmentation.
The parameter k can be roughly selected by visual inspection of segmentation results. We
use k = 0.75 for our bright field application and k = 0.85 for our fluorescence application. Indi-
vidual candidate particles in the resulting binary image are defined as groups of nonzero pixels
that are connected to each other via any adjacent or diagonal pixel (8-connected). We note that
changes in k can alter the size of segmented particles and the connectivity of segmented parti-
cles. Particularly in bright field, where the contrast mechanism lacks specificity, decreases in k
can cause particles to merge via small bridges of dark texture. In order to build in some robust-
ness against changes in k and background texture in these scenarios, we perform a variant of the
morphological opening operation after thresholding to remove small bridges that may arise be-
tween otherwise distinct particles. To do this, we perform a morphological erosion with a small
circular structuring element followed by a morphological dilation with a smaller structuring el-
ement [53].
In order to fully capture both intrinsic and secondary characteristics of biological structures,
we calculate distinct sets of features for two layers of classification. The first layer, which delineates structures of interest from other structures on the basis of their intrinsic geometric properties, is generally applicable to particle classification problems and is used for both the bright
field and fluorescent structure detection outlined here. Details and equations for the calculation
of the 14 features for layer 1 classification can be found in S2 Fig.
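To make the flavor of these features concrete, the sketch below computes three generic binary-shape descriptors (area, bounding-box extent and moment-based eccentricity); the selection is illustrative and not the exact 14-feature list of S2 Fig:

```python
import numpy as np

def shape_features(bw):
    # illustrative subset of intrinsic geometric descriptors for one particle
    ys, xs = np.nonzero(bw)
    area = xs.size
    cx, cy = xs.mean(), ys.mean()
    # central second moments of the pixel coordinates
    mxx = ((xs - cx) ** 2).mean()
    myy = ((ys - cy) ** 2).mean()
    mxy = ((xs - cx) * (ys - cy)).mean()
    # eigenvalues of the 2x2 moment matrix give the fitted-ellipse axes
    common = np.sqrt((mxx - myy) ** 2 + 4 * mxy ** 2)
    l1 = (mxx + myy + common) / 2
    l2 = (mxx + myy - common) / 2
    eccentricity = np.sqrt(1 - l2 / l1)
    bbox_area = (np.ptp(ys) + 1) * (np.ptp(xs) + 1)
    return {"area": area, "extent": area / bbox_area, "eccentricity": eccentricity}
```

Descriptors of this kind are translation- and rotation-tolerant, which is what makes the first classification layer reusable across detection problems.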
Secondary characteristics of biological structures describe the context in which structures
exist and their relationship to other structures. Due to the large variability in the secondary
characteristics of biological structures, a generic set of features is not necessarily attainable or
desirable due to concerns for computational efficiency. Rather, secondary features can be designed on a problem-specific basis, as illustrated by the regional and relational feature sets used here (S3, S5 and S6 Figs).
The use of this calibration system renders the trained classifier relatively invariable to small
changes in the imaging set-up via conversion of all features into real units. Calibration factors
for all imaging systems and configurations used here can be found in S1 Table.
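Conceptually, calibration multiplies each pixel-based feature by the calibration factor raised to its unit exponent; the feature names and exponents below are illustrative placeholders, not the feature lists used in this work:

```python
def calibrate_features(features, um_per_px):
    # convert pixel-based feature values into physical units (microns) so the
    # trained classifier transfers across imaging configurations; exponents
    # encode the physical dimension of each (hypothetical) feature name
    exponent = {"area": 2, "perimeter": 1, "equivalent_diameter": 1,
                "eccentricity": 0}
    return {name: value * um_per_px ** exponent.get(name, 0)
            for name, value in features.items()}
```

Dimensionless features (ratios, moments) pass through unchanged, which is why only length- and area-valued features need calibration factors.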
To implement discrete classification steps using support vector machines, we use the
LIBSVM library, which is freely available for multiple platforms including MATLAB [37]. For
general performance, we use a Gaussian radial basis function kernel for all of our trained
classifiers [48]. To ensure performance of the SVM model for our datasets, we optimize the
penalty or margin parameter, CSVM, and the kernel parameter, γ, for each training set using the
five-fold cross-validation performance of the classifier as the output metric. For efficient pa-
rameter optimization, we start with a rough exponential grid search (Fig 3B and 3D and S4
Fig) and refine parameter selection with a finer grid search based on these results. To adjust for
the relative proportions of positive and negative candidates in unbalanced training sets (Fig
3C), we also adjust the relative weight, W, of the classes according to their representation in the
training set while training [37]. Additionally, we perform a small grid search for the optimal
weighting factor to fully optimize the cross-validation performance metric. Probability estimates for
single and multiple neuron pair identification are derived according to the native LIBSVM al-
gorithm [37].
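The two-stage parameter search can be sketched generically as below, with `score_fn` standing in for the five-fold cross-validation accuracy returned by LIBSVM; the grid bounds and step sizes are illustrative, not the values used here:

```python
import numpy as np

def coarse_to_fine_search(score_fn, c_exp=(-5, 15), g_exp=(-15, 3),
                          coarse=2, fine=0.5):
    # coarse exponential grid over log2(C_SVM) and log2(gamma) ...
    best = max(((c, g) for c in np.arange(*c_exp, coarse)
                       for g in np.arange(*g_exp, coarse)),
               key=lambda p: score_fn(2.0 ** p[0], 2.0 ** p[1]))
    c0, g0 = best
    # ... refined with a finer grid around the best coarse point
    best = max(((c, g) for c in np.arange(c0 - coarse, c0 + coarse + fine, fine)
                       for g in np.arange(g0 - coarse, g0 + coarse + fine, fine)),
               key=lambda p: score_fn(2.0 ** p[0], 2.0 ** p[1]))
    return 2.0 ** best[0], 2.0 ** best[1]
```

Searching in exponents keeps the coarse stage cheap while still spanning many orders of magnitude of C and gamma.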
For visualization of the high-dimensional feature sets (Fig 3C and 3F), we apply Fisher's linear discriminant analysis [54]. The two projection directions are chosen to be the first two eigenvectors of $S_W^{-1} S_B$, where

$$S_B = (m_1 - m_2)(m_1 - m_2)^T$$

$$S_W = S_1 + S_2, \qquad S_i = \sum_{x \in \text{class } i} (x - m_i)(x - m_i)^T$$

and $m_i$ is the mean feature vector of class $i$.
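The projection directions can be computed directly from the class scatter matrices; a minimal NumPy sketch for the two-class case:

```python
import numpy as np

def fisher_directions(X1, X2):
    # X1, X2: (n_samples, n_features) arrays for the two classes
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    d = (m1 - m2)[:, None]
    Sb = d @ d.T                                                # between-class scatter (rank 1)
    Sw = ((X1 - m1).T @ (X1 - m1)) + ((X2 - m2).T @ (X2 - m2))  # within-class scatter
    vals, vecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(-vals.real)
    return vecs[:, order[:2]].real  # columns: top two projection directions
```

For two classes S_B has rank one, so only the first direction carries discriminative information; the second merely completes the 2D visualization plane.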
Supporting Information
S1 Fig. Images Collected with Standard Agar Pad Techniques Can Also Be Subjected to the Same Analysis for Identification of the Grinder. a, b and c show three representative images
of day 2, well-fed adult worms acquired using standard agar pad imaging techniques. The inter-
mediate outputs of grinder detection (MP, BW0, BW1, BW2, BW3) show the minimally pro-
jected image, the binary image after thresholding, the initial particle candidate set, the
candidate set after the first layer of classification and the final particle set after the second layer
of classification, respectively. The same process developed for head versus tail analysis on
microfluidic chip robustly identifies the grinder structure in these conventionally acquired
images.
(TIF)
S2 Fig. Robust Descriptors for Binary Particle Shape for Layer 1 of Classification Scheme.
a) Table of 14 features for binary shape description including low-level geometric descriptors,
more complex derived measures of geometry and invariant moments. b) Diagram of binary
particle indicating variables used for feature definition. c) Illustration and example of defining
and calculating the perimeter of an irregular particle based on pixel connectivity. d) Illustration
and example of the convex hull of a binary particle.
(TIF)
S3 Fig. Regional Descriptors for Structural Detection of the Pharyngeal Grinder. a) Dia-
gram of the region of interest around a grinder particle showing changes in texture and particle
density along radial partitions. b) Diagram of the region of interest around a grinder particle
distinguishing individual particles using different colors and showing particle distributions
along angular partitions. c) Table of 34 features used to describe regional characteristics of the
grinder particle for the second layer of classification.
(TIF)
S4 Fig. Parameter selection for the first and second layer classifiers in neuron pair identifi-
cation. Optimized parameters for the first layer classifier (a), the second layer single pair classi-
fier (b) and the second layer two pair classifier (c) show considerable variability, reinforcing
the need for case-specific parameter optimization.
(TIF)
S5 Fig. Relational features for pairs of neurons. a) Maximum intensity projection (MP) and
binary image (BW2) showing candidate particles after the first layer of classification with rele-
vant axes and regions labeled. b) Identification of possible pairs for feature calculation and
schematic of an example feature set for one pair. c) Table of the four relational features used to
describe cell pair patterns.
(TIF)
S6 Fig. Relational features for multiple cell pair detection and identification. a) Maximum
intensity projection (MP) and binary image showing candidate particles after layer 1 classifica-
tion (BW2) with relevant axes and regions labeled. b) Enumeration of the possible neuron pairs
and the possible sets of neuron pairs with correct distinction between the ASI and ASJ pairs. c)
Schematic showing the frame of reference (XC, YC) for the calculation of the relative location of
each neuron and the intensities of the neurons within two particular sets. d) Table showing
that 6 properties are calculated for each neuron pair, resulting in a total of 12 relational features
to identify the tetrad of neurons.
(TIF)
S7 Fig. Computational time savings associated with two-layer classification architecture for
head versus tail detection. a) Schematic comparisons of the two-layer, serial classification ar-
chitecture employed in this work and an equivalent single-layer, parallel classification architec-
ture used for time comparisons. b) Comparison of process-specific and total time requirements
for the two-layer and equivalent one-layer architectures. Reducing second-layer feature calcu-
lations using the two-layer scheme results in over a two-fold reduction in total classification
time. All times are based on performance on MATLAB 2013b running on a quad core
Acknowledgments
The authors would like to gratefully acknowledge Brad Parker and Jeffrey Andrews for machin-
ing hardware necessary for this work, and Dhaval S. Patel for critical commentary on
the manuscript.
Author Contributions
Conceived and designed the experiments: MZ MMC HL. Performed the experiments: MZ. An-
alyzed the data: MZ. Contributed reagents/materials/analysis tools: EVE AC DAFdA QC.
Wrote the paper: MZ MMC QC HL. Designed the software used in analysis: MZ MMC.
References
1. Brant WE, Helms CA. Fundamentals of diagnostic radiology: Lippincott Williams & Wilkins; 2012.
2. San-Miguel A, Lu H. Microfluidics as a tool for C. elegans research. WormBook: the online review of C
elegans biology. 2013:1–19.
3. Zhan M, Chingozha L, Lu H. Enabling Systems Biology Approaches Through Microfabricated Systems.
Analytical Chemistry. 2013; 85(19):8882–94. doi: 10.1021/ac401472y PMID: 23984862
4. Fenno L, Yizhar O, Deisseroth K. The development and application of optogenetics. Annual review of
neuroscience. 2011; 34:389–412. doi: 10.1146/annurev-neuro-061010-113817 PMID: 21692661
5. Larsch J, Ventimiglia D, Bargmann CI, Albrecht DR. High-throughput imaging of neuronal activity in
Caenorhabditis elegans. Proceedings of the National Academy of Sciences of the United States of
America. 2013; 110(45):E4266–73. doi: 10.1073/pnas.1318325110 PMID: 24145415
6. Palmer AE, Qin Y, Park JG, McCombs JE. Design and application of genetically encoded biosensors.
Trends in biotechnology. 2011; 29(3):144–52. doi: 10.1016/j.tibtech.2010.12.004 PMID: 21251723
7. Kocabas A, Shen C-H, Guo ZV, Ramanathan S. Controlling interneuron activity in Caenorhabditis ele-
gans to evoke chemotactic behaviour. Nature. 2012; 490(7419):273–7. doi: 10.1038/nature11431
PMID: 23000898
8. Shaffer SM, Wu M-T, Levesque MJ, Raj A. Turbo FISH: A Method for Rapid Single Molecule RNA
FISH. PloS one. 2013; 8(9):e75120. doi: 10.1371/journal.pone.0075120 PMID: 24066168
9. Raj A, Van Den Bogaard P, Rifkin SA, Van Oudenaarden A, Tyagi S. Imaging individual mRNA mole-
cules using multiple singly labeled probes. Nature methods. 2008; 5(10):877. doi: 10.1038/nmeth.1253
PMID: 18806792
10. Brown AE, Schafer WR. Unrestrained worms bridled by the light. Nature methods. 2011; 8(2):129–30.
doi: 10.1038/nmeth0211-129 PMID: 21278723
11. Eliceiri KW, Berthold MR, Goldberg IG, Ibanez L, Manjunath BS, Martone ME, et al. Biological imaging
software tools. Nat Meth. 2012; 9(7):697–710.
12. Everingham M, Eslami SMA, Van Gool L, Williams CI, Winn J, Zisserman A. The Pascal Visual Object
Classes Challenge: A Retrospective. Int J Comput Vis. 2014:1–39.
13. Papageorgiou CP, Oren M, Poggio T, editors. A general framework for object detection. Computer Vi-
sion, 1998 Sixth International Conference on; 1998 4–7 Jan 1998.
14. Viola P, Jones M, editors. Rapid object detection using a boosted cascade of simple features. Comput-
er Vision and Pattern Recognition, 2001 CVPR 2001 Proceedings of the 2001 IEEE Computer Society
Conference on; 2001 2001.
15. Lienhart R, Kuranov A, Pisarevsky V. Empirical Analysis of Detection Cascades of Boosted Classifiers
for Rapid Object Detection. In: Michaelis B, Krell G, editors. Pattern Recognition. Lecture Notes in Com-
puter Science. 2781: Springer Berlin Heidelberg; 2003. p. 297–304.
16. Mohan A, Papageorgiou C, Poggio T. Example-based object detection in images by components. Pat-
tern Analysis and Machine Intelligence, IEEE Transactions on. 2001; 23(4):349–61.
17. Papageorgiou C, Poggio T. A Trainable System for Object Detection. Int J Comput Vis. 2000; 38(1):15–
33.
18. Boland MV, Markey MK, Murphy RF. Automated recognition of patterns characteristic of subcellular
structures in fluorescence microscopy images. Cytometry. 1998; 33(3):366–75. PMID: 9822349
19. Bao Z, Murray JI, Boyle T, Ooi SL, Sandel MJ, Waterston RH. Automated cell lineage tracing in Caenor-
habditis elegans. Proceedings of the National Academy of Sciences of the United States of America.
2006; 103(8):2707–12. PMID: 16477039
20. Murray JI, Bao Z, Boyle TJ, Boeck ME, Mericle BL, Nicholas TJ, et al. Automated analysis of embryonic
gene expression with cellular resolution in C. elegans. Nat Methods. 2008; 5(8):703–9. doi: 10.1038/
nmeth.1228 PMID: 18587405
21. Santella A, Du Z, Nowotschin S, Hadjantonakis AK, Bao Z. A hybrid blob-slice model for accurate and
efficient detection of fluorescence labeled nuclei in 3D. BMC bioinformatics. 2010; 11:580. doi: 10.
1186/1471-2105-11-580 PMID: 21114815
22. Huang K-M, Cosman P, Schafer WR. Machine vision based detection of omega bends and reversals in
C. elegans. Journal of neuroscience methods. 2006; 158(2):323–36. PMID: 16839609
23. Yemini E, Jucikas T, Grundy LJ, Brown AE, Schafer WR. A database of Caenorhabditis elegans behav-
ioral phenotypes. Nature methods. 2013; 10(9):877–9. doi: 10.1038/nmeth.2560 PMID: 23852451
24. Crane MM, Stirman JN, Ou C-Y, Kurshan PT, Rehg JM, Shen K, et al. Autonomous screening of C. ele-
gans identifies genes implicated in synaptogenesis. Nature methods. 2012; 9(10):977–80. doi: 10.
1038/nmeth.2141 PMID: 22902935
25. Restif C, Ibáñez-Ventoso C, Vora MM, Guo S, Metaxas D, Driscoll M. CeleST: Computer Vision Soft-
ware for Quantitative Analysis of C. elegans Swim Behavior Reveals Novel Features of Locomotion.
PLoS Comput Biol. 2014; 10(7):e1003702. doi: 10.1371/journal.pcbi.1003702 PMID: 25033081
26. Ranzato M, Taylor P, House J, Flagan R, LeCun Y, Perona P. Automatic recognition of biological parti-
cles in microscopic images. Pattern Recognition Letters. 2007; 28(1):31–9.
27. Dankert H, Wang L, Hoopfer ED, Anderson DJ, Perona P. Automated monitoring and analysis of social
behavior in Drosophila. Nat Meth. 2009; 6(4):297–303.
28. Yin Z, Sadok A, Sailem H, McCarthy A, Xia X, Li F, et al. A screen for morphological complexity identi-
fies regulators of switch-like transitions between discrete cell shapes. Nature cell biology. 2013; 15
(7):860–71. doi: 10.1038/ncb2764 PMID: 23748611
29. Ljosa V, Sokolnicki KL, Carpenter AE. Annotated high-throughput microscopy image sets for validation.
Nat Meth. 2012; 9(7):637.
30. Carpenter AE, Jones TR, Lamprecht MR, Clarke C, Kang IH, Friman O, et al. CellProfiler: image analy-
sis software for identifying and quantifying cell phenotypes. Genome biology. 2006; 7(10):R100. PMID:
17076895
31. Wählby C, Kamentsky L, Liu ZH, Riklin-Raviv T, Conery AL, O’Rourke EJ, et al. An image analysis
toolbox for high-throughput C. elegans assays. Nature methods. 2012; 9(7):714. doi: 10.1038/nmeth.1984
PMID: 22522656
32. Albrecht DR, Bargmann CI. High-content behavioral analysis of Caenorhabditis elegans in precise spa-
tiotemporal chemical environments. Nature methods. 2011; 8(7):599–605. doi: 10.1038/nmeth.1630
PMID: 21666667
33. Sonka M, Hlavac V, Boyle R. Image processing, analysis, and machine vision: Cengage Learning;
2014.
34. Ben-Hur A, Ong CS, Sonnenburg S, Schölkopf B, Rätsch G. Support vector machines and kernels for
computational biology. PLoS computational biology. 2008; 4(10):e1000173. doi: 10.1371/journal.pcbi.
1000173 PMID: 18974822
35. Shamir L, Delaney JD, Orlov N, Eckley DM, Goldberg IG. Pattern recognition software and techniques
for biological image analysis. PLoS computational biology. 2010; 6(11):e1000974. doi: 10.1371/journal.
pcbi.1000974 PMID: 21124870
36. Zhang D, Lu G. Review of shape representation and description techniques. Pattern recognition. 2004;
37(1):1–19.
37. Chang C-C, Lin C-J. LIBSVM: a library for support vector machines. ACM Transactions on Intelligent
Systems and Technology (TIST). 2011; 2(3):27.
38. Chung K, Crane MM, Lu H. Automated on-chip rapid microscopy, phenotyping and sorting of C. ele-
gans. Nature methods. 2008; 5(7):637–43. doi: 10.1038/nmeth.1227 PMID: 18568029
39. Rohde CB, Zeng F, Gonzalez-Rubio R, Angel M, Yanik MF. Microfluidic system for on-chip high-
throughput whole-animal sorting and screening at subcellular resolution. Proceedings of the National
Academy of Sciences. 2007; 104(35):13891–5. PMID: 17715055
40. Swierczek NA, Giles AC, Rankin CH, Kerr RA. High-throughput behavioral analysis in C. elegans. Na-
ture methods. 2011; 8(7):592–8. doi: 10.1038/nmeth.1625 PMID: 21642964
41. Stirman JN, Crane MM, Husson SJ, Wabnig S, Schultheis C, Gottschalk A, et al. Real-time multimodal
optical control of neurons and muscles in freely behaving Caenorhabditis elegans. Nature methods.
2011; 8(2):153–8. doi: 10.1038/nmeth.1555 PMID: 21240278
42. Leifer AM, Fang-Yen C, Gershow M, Alkema MJ, Samuel AD. Optogenetic manipulation of neural activ-
ity in freely moving Caenorhabditis elegans. Nature methods. 2011; 8(2):147–52. doi: 10.1038/nmeth.
1554 PMID: 21240279
43. Ramot D, Johnson BE, Berry TL, Carnell L, Goodman MB. The Parallel Worm Tracker: a platform for
measuring average speed and drug-induced paralysis in nematodes. PloS one. 2008; 3(5):e2208–e.
doi: 10.1371/journal.pone.0002208 PMID: 18493300
44. Brunk UT, Terman A. Lipofuscin: mechanisms of age-related accumulation and influence on cell func-
tion. Free radical biology & medicine. 2002; 33(5):611–9.
45. Avery L, Thomas JH. Feeding and defecation. In: Riddle DL BT, Meyer BJ, et al., editor. C elegans II.
2nd ed. Cold Spring Harbor, NY: Cold Spring Harbor Laboratory Press; 1997.
46. Chow DK, Glenn CF, Johnston JL, Goldberg IG, Wolkow CA. Sarcopenia in the Caenorhabditis ele-
gans pharynx correlates with muscle contraction rate over lifespan. Experimental gerontology. 2006;
41(3):252–60. PMID: 16446070
47. Johnston J, Iser WB, Chow DK, Goldberg IG, Wolkow CA. Quantitative Image Analysis Reveals Dis-
tinct Structural Transitions during Aging in Caenorhabditis elegans Tissues. PLoS ONE. 2008; 3(7):
e2821. doi: 10.1371/journal.pone.0002821 PMID: 18665238
48. Hsu C-W, Chang C-C, Lin C-J. A practical guide to support vector classification. 2003.
49. Wu T-F, Lin C-J, Weng RC. Probability estimates for multi-class classification by pairwise coupling. The
Journal of Machine Learning Research. 2004; 5:975–1005.
50. Cornils A, Gloeck M, Chen Z, Zhang Y, Alcedo J. Specific insulin-like peptides encode sensory informa-
tion to regulate distinct developmental processes. Development. 2011; 138(6):1183–93. doi: 10.1242/
dev.060905 PMID: 21343369
51. Bray MA, Fraser AN, Hasaka TP, Carpenter AE. Workflow and metrics for image quality control in
large-scale high-content screens. Journal of biomolecular screening. 2012; 17(2):266–74. doi: 10.
1177/1087057111420292 PMID: 21956170
52. Stiernagle T. Maintenance of C. elegans. WormBook: the online review of C elegans biology. 2006:1–
11.
53. Dougherty ER, Lotufo RA. Hands-on morphological image processing. Bellingham: SPIE Press; 2003.
54. Schölkopf B, Müller K-R. Fisher discriminant analysis with kernels. Neural Networks for Signal Processing IX. 1999.