
A Robust Shape Model for Multi-view Car Alignment

Yan Li Leon Gu Takeo Kanade

Carnegie Mellon University {yanli,gu,tk}@cs.cmu.edu

Abstract
We present a robust shape model for localizing a set of feature points on a 2D image. Previous shape alignment models assume Gaussian observation noise and attempt to fit a regularized shape using all the observed data. However, such an assumption is vulnerable to gross feature detection errors resulting from partial occlusions or spurious background features. We address this problem by using a hypothesis-and-test approach. First, a Bayesian inference algorithm is developed to generate object shape and pose hypotheses from randomly sampled partial shapes, i.e., subsets of feature points. The hypotheses are then evaluated to find the one that minimizes the shape prediction error. The proposed model can effectively handle outliers and recover the object shape. We evaluate our approach on a challenging dataset which contains over 2,000 multi-view car images and spans a wide variety of types, lightings, background scenes, and partial occlusions. Experimental results demonstrate favorable improvements over previous methods in both accuracy and robustness.

1. Introduction
Deformable shape matching has been studied extensively in the past two decades, with the emphasis on the alignment of human faces and anatomical structures. Representative work includes Snakes [14], Active Shape Model [4] and Active Appearance Model [3], Bayesian shape models [24, 13], nonlinear shape models [4, 19, 25], view-based [6] and three-dimensional models [1, 12], and models for weak initialization [16, 17, 23].

A common assumption in these models is that the observation noise is Gaussian distributed. However, in real-world images shape observations are usually corrupted by large-scale measurement errors which are in gross disagreement with the true underlying shape. Such errors, usually called outliers, are caused by failures of the appearance model or by undesirable conditions such as shadows and occlusions. Due to the iterative nature of most algorithms, these gross errors may become arbitrarily large and therefore cannot be averaged out, as is typically done in the least-squares framework. Rogers and Graham [18] attempt to address this problem by use of M-estimators. However, M-estimators tend to suffer from local optima, and pose parameters have been ignored in their model.

Another limitation of the previous models is their sensitivity to initialization. The objective functions are usually highly nonlinear, and a suboptimal initialization may cause the model to get stuck at a local minimum. Previous work attempts to tackle this problem by sampling: starting from multiple initializations and choosing the optimal resulting shape [17, 23]. However, each individual sample was evaluated and matched in a least-squares fashion, so that the alignment process could still fail in the presence of outliers. It is also unclear how many samples are sufficient to achieve the best solution.

In this paper, we address these two problems in a hypothesis-and-test framework. Our key insight is the following: since object shape typically resides in a low-dimensional subspace, the number of degrees of freedom of a shape model is considerably smaller than the number of observed features; therefore, a small subset of good features is sufficient to jump-start the matching and produce a reasonable estimate. We adopt the random sample consensus (RANSAC) paradigm of Fischler and Bolles [9]. In particular, a Bayesian inference algorithm is developed for generating a shape and pose hypothesis from a randomly sampled subset of features; each hypothesis is matched against the full observation by a robust measure to identify the optimal one; and the hypothesis is further refined by incorporating more inliers into the corresponding subset.

We apply the approach to multi-view car alignment: identifying detailed car shapes from different viewpoints. The task is challenging because car images are often subject to a significant amount of occlusion, and detecting individual parts is difficult. Combining our alignment model with a random forest [2] based detector, we develop a robust, fully automatic car alignment system.

Partial support provided by National Science Foundation (NSF) Grant IIS-0713406.

978-1-4244-3991-1/09/$25.00 ©2009 IEEE


2. Problem Formulation
Consider the shape of a deformable object which consists of a set of 2D landmark points. Let $Y = (u_1, v_1, \ldots, u_N, v_N)^\top$ denote the locations of the points observed from an input image. The observation contains not only noise but also gross outliers. Our goal is to estimate the true underlying shape from such an observation and to identify the outliers. Instead of using the whole observation $Y$ for estimation, we first use a randomly selected subset of $Y$, denoted by $Y_p$, to generate a shape hypothesis. The points in the subset are postulated to be inliers which, by assumption, satisfy the underlying noise model

$$Y_p = M_p T_\theta(S) + \eta. \qquad (1)$$

The vector $S$ denotes the normalized true shape, which we refer to as the canonical shape. It is transformed onto the image plane by $T_\theta(S) = sRS + t$ with rotation $R$, scale $s$ and translation $t$. $M_p$ is a $2M \times 2N$ indicator matrix which specifies the subset. The observation noise $\eta \sim N(0, \Sigma)$ is assumed to be independent for individual points. Note that large-scale measurement errors do not conform to the Gaussian noise assumption; therefore the model (1) applies only to $Y_p$.

The canonical shape $S$ is parameterized by a probabilistic PCA model [22, 24],

$$S = \mu + \Phi b + \xi \qquad (2)$$

with the mean shape $\mu$, the low-dimensional eigen-subspace spanned by $\Phi$, and the shape deformation parameter $b$. Each element of $b$ controls the magnitude of deformation along the corresponding axis in the subspace. A diagonal prior

$$b \sim N(0, \Lambda) \qquad (3)$$

is put on $b$, where $\Lambda = \mathrm{diag}\{\lambda_1, \lambda_2, \ldots, \lambda_r\}$ and the $\lambda_i$ are eigenvalues. The shape noise is chosen to be isotropic, $\xi \sim N(0, \sigma^2 I)$, and its variance $\sigma^2 = \frac{1}{2N - r}\sum_{i=r+1}^{2N} \lambda_i$ is determined by the residual, off-eigenspace shape energy.

Combining (1)-(3), we have established a hierarchical probabilistic model that can be used for generating hypotheses. Specifically, our problem is to estimate the shape deformation $b$ and the pose $\theta = \{R, s, t\}$ from a partial shape $Y_p$, i.e., to find the MAP estimate $\{b^*, \theta^*\} = \arg\max_{b, \theta}\, p(b, \theta \mid Y_p)$. This is a typical missing data problem that can be solved by Expectation-Maximization, as described in Sec. 3. Given a hypothesis of $b$ and $\theta$, we can easily hallucinate the rest of the shape

$$Y_h = M_h (sRS + t), \qquad (4)$$

where $M_h$ is a binary matrix that indicates the remaining set of points. The hallucinated shape $Y_h$ is then used to test the hypothesis. Sec. 4 explains the details.

3. Generating Hypotheses from Partial Shapes

In this section, we develop a Bayesian Partial Shape Inference (BPSI) algorithm to estimate the model parameters $\Theta = \{b, \theta\}$ iteratively. The detailed algorithm is shown in Alg. 1. The inference can be performed by the standard EM algorithm.

In the E-step, we compute the posterior of $S$ given the partial observation $Y_p$ and $\Theta^{(t-1)}$. Note that $S$ represents an augmented shape which can be decomposed into the partially observed shape $S_p$ and the hallucinated shape $S_h$. In particular, the posterior means are given by Eqn. 6 and 7. They show that the hallucinated shape $S_h$ is generated completely from the shape prior, while $S_p$ subsumes two sources of information: one arises from the observation $Y_p$; the other reflects the subspace constraint on $b$. $S_p$ is essentially a weighted average of the observation and the prior, and the weight is determined by the two sources of noise.

In the M-step, we find the $\Theta^{(t)}$ which maximizes the expectation of the complete log-likelihood $\log p(Y_p, S \mid \Theta^{(t)})$ over the posterior of $S$ obtained in the E-step. It turns out that $b$ and $\theta$ can be optimized independently.

One important parameter that has yet to be defined is the observation variance $\Sigma = \mathrm{diag}\{\rho_1^2, \rho_1^2, \ldots, \rho_M^2, \rho_M^2\}$. Since shape alignment can be viewed as an iterative model fitting process, the observation noise can be estimated from the last iteration. In our implementation, $\rho_i$ is defined as the prediction residual

$$\rho_i^2 = \left\| Y_i^{(t-1)} - T_{\theta^{(t-1)}}\!\left(S_i^{(t-1)}\right) \right\|^2 \qquad (5)$$

where $\|\cdot\|$ denotes the Euclidean distance, and $T_\theta$ is the rigid transform which brings the canonical shape $S$ to the observation space by $\theta$.

3.1. Discussion
The BPSI algorithm provides some insight into the shape model in the presence of noise. However, from the optimization point of view, the objective function and the search method remain obscure. In this section, we re-examine the BPSI algorithm and focus on its optimization method.

In Step 6, we first compute the posterior mean of $p(b \mid S)$, which can be viewed as the probabilistic version of PCA projection. In addition to the subspace projection performed in PCA, BPSI applies an inhomogeneous shrinkage on each subspace dimension. The shrinkage parameter is defined by

$$\gamma_i = \frac{\lambda_i}{\lambda_i + \sigma^2} \quad (i = 1, \ldots, r). \qquad (9)$$

Recall that $\sigma^2$ is the average of the remaining eigenvalues. Since $b$ captures a significant amount of variance (98% in our implementation), $\sigma^2$ has a very small value (i.e., $\sigma^2 \to 0$ and $\gamma_i \to 1$). This implies that the PCA projection and reconstruction in Step 6 would not alter $S_p$ substantially.
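The shrinkage argument around Eq. (9) can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes an orthonormal basis and uses toy eigenvalues, and the function names (`shrinkage_weights`, `posterior_mean_b`) are hypothetical. It applies the per-axis factors $\lambda_i / (\lambda_i + \sigma^2)$ to a plain PCA projection and shows that the result approaches the PCA coefficients as $\sigma^2$ becomes small.

```python
import numpy as np

def shrinkage_weights(eigvals, sigma2):
    """Per-axis shrinkage factors lambda_i / (lambda_i + sigma^2), as in Eq. (9)."""
    eigvals = np.asarray(eigvals, dtype=float)
    return eigvals / (eigvals + sigma2)

def posterior_mean_b(S, mu, Phi, eigvals, sigma2):
    """PCA projection Phi^T (S - mu), followed by per-axis shrinkage.

    Assumes the columns of Phi are orthonormal (hypothetical toy setting).
    """
    return shrinkage_weights(eigvals, sigma2) * (Phi.T @ (S - mu))

# Toy data (illustrative values only, not taken from the paper).
rng = np.random.default_rng(0)
dim, r = 8, 3
Phi, _ = np.linalg.qr(rng.normal(size=(dim, r)))  # orthonormal columns
mu = rng.normal(size=dim)
eigvals = np.array([9.0, 4.0, 1.0])
b_true = np.array([2.0, -1.0, 0.5])
S = mu + Phi @ b_true  # a shape lying exactly in the subspace

# With a tiny sigma^2 the shrinkage factors are close to 1 and b is recovered.
b_small = posterior_mean_b(S, mu, Phi, eigvals, sigma2=1e-6)
# With sigma^2 = 1 each axis is shrunk by 9/10, 4/5 and 1/2 respectively.
b_large = posterior_mean_b(S, mu, Phi, eigvals, sigma2=1.0)
```

Axes with small eigenvalues are shrunk the most, which is the "inhomogeneous" part of the shrinkage: directions in which the training shapes vary little are trusted less.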


Algorithm 1 Bayesian Partial Shape Inference (BPSI)
Input: Partial observation $Y_p$; $b'$ and $\theta'$ from the last iteration.
Output: Updated $b$ and $\theta$.
1: Initialize $b = b'$ and $\theta = \theta'$
2: for $t = 1$ to $T$ do
3:   E-Step:
4:   Update $S_p$ by blending, and $S_h$ by reconstruction

$$S_p \leftarrow W_1 T_\theta^{-1}(Y_p) + W_2 (\Phi b + \mu)_p \qquad (6)$$
$$S_h \leftarrow (\Phi b + \mu)_h \qquad (7)$$

     where $W_1 = s^2\sigma^2 (s^2\sigma^2 I + \Sigma)^{-1}$ and $W_2 = I - W_1$
5:   M-Step:
6:   Estimate shape

$$b \leftarrow \Lambda(\Lambda + \sigma^2 I)^{-1} \Phi^\top (S - \mu), \qquad S_p \leftarrow (\Phi b + \mu)_p$$

7:   Estimate pose (Procrustes analysis [11])

$$\theta \leftarrow \arg\min_\theta \| Y_p - T_\theta(S_p) \|^2 \qquad (8)$$
$$M \leftarrow (S_p - \bar{S}_p)(Y_p - \bar{Y}_p)^\top, \qquad [U, W, V] \leftarrow \mathrm{SVD}(M),$$
$$R = V U^\top, \qquad s = \mathrm{tr}(W)/\mathrm{tr}(M), \qquad t = -sR\bar{S}_p + \bar{Y}_p$$

8: end for

[Figure 1: plot of the weight $w$ on the vertical axis, ranging from 0 to 1.]
Figure 1. Graphic representation of the quasiconvex weight function.

Based on this observation, we can plug Eqn. 6 into Eqn. 8:

$$\begin{aligned}
Y_p - T_\theta(S_p) &\approx Y_p - T_\theta\!\left[ W_1 T_\theta^{-1}(Y_p) + W_2 (\Phi b + \mu)_p \right] \\
&= Y_p - \left[ W_1 Y_p + W_2 T_\theta(\Phi b + \mu)_p \right] \\
&= (I - W_1) Y_p - W_2 T_\theta(\Phi b + \mu)_p \\
&= W_2 \left[ Y_p - T_\theta(\Phi b + \mu)_p \right]
\end{aligned}$$

This shows that Step 7 in BPSI solves a weighted least-squares problem

$$\min_\theta \sum_{i=1}^{M} w_i(\rho_i)\, \| Y_i - T_\theta(S_i) \|^2. \qquad (10)$$

The weight $w_i$ is a quasiconvex function of $\rho_i$:

$$w_i(\rho_i) = \frac{\rho_i^2}{s^2\sigma^2 + \rho_i^2}.$$

Fig. 1 shows the profile of the weight function. Recall that $\rho_i$ is defined as the prediction residual from the previous step (Eqn. 5). Thus, the BPSI algorithm minimizes the sum of squared errors via iteratively reweighted least squares (IRLS):

$$\min_{b, \theta} \sum_{i=1}^{M} w_i\!\left(\rho_i^{(t-1)}\right) \rho_i^2(b, \theta). \qquad (11)$$

4. Testing Hypotheses by RANSAC

If we have a priori knowledge that $Y_p$ contains only inliers, the partial inference algorithm provides a principled framework for shape and pose estimation. However, a random feature subset may contain outliers, in which case the fitted parameters can become arbitrary. In this section, we adopt the random sample consensus (RANSAC) paradigm of Fischler and Bolles [9] to generate a large number of hypotheses and identify the optimal feature subset.

In the RANSAC framework, a minimal subset of features is used to estimate the model parameters. Specifically, our model requires six parameters (which capture 98% of the variance) to describe the shape $b$, and four parameters (scale, rotation and translation) to represent the pose $\theta$. Since each 2D point provides two constraints on the parameters, five points are sufficient to form a proposal subset $Y_p$. Ideally every possible subset would be considered, but this is usually computationally infeasible. Fischler and Bolles [9] proposed that the number $m$ of subsets be chosen sufficiently high to achieve statistical significance. Assuming that the whole set of points may contain up to a fraction $\epsilon$ of outliers, one can determine $m$ by

$$m = \frac{\log(1 - P)}{\log\!\left(1 - (1 - \epsilon)^p\right)} \qquad (13)$$

where $p = 5$ is the number of features in one subset and $P$ is the required probability that at least one of the proposal subsets is good, i.e., free of outliers. In our implementation, we assume $\epsilon = 40\%$ and require $P = 0.99$, thus $m = 57$.

Given the proposal subsets $Y_p^{(k)}$ $(k = 1, \ldots, K)$, the resulting shape $b$ can be obtained by the least median of squares (LMedS) estimator [20]

$$\min_k \; \operatorname{Med}_i \; r_i^2\!\left(Y_p^{(k)}, Y_h^{(k)}\right). \qquad (12)$$
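The subset count of Eq. (13) and the selection rule of Eq. (12) can be sketched in a few lines. This is an illustrative sketch only: the helper names (`num_subsets`, `lmeds_select`) are hypothetical, rounding up to the next integer is an assumption, and the residual table is made-up toy data rather than output of the shape model.

```python
import math
import numpy as np

def num_subsets(P, eps, p):
    """Eq. (13): number of random subsets so that, with probability P,
    at least one subset of p points contains no outliers (rounded up)."""
    return math.ceil(math.log(1.0 - P) / math.log(1.0 - (1.0 - eps) ** p))

def lmeds_select(sq_residuals):
    """Eq. (12): pick the hypothesis with the smallest median squared residual.

    sq_residuals has shape (K, n): one row of squared residuals per hypothesis.
    """
    medians = np.median(np.asarray(sq_residuals, dtype=float), axis=1)
    return int(np.argmin(medians)), medians

# Settings quoted in the text: eps = 40% outliers, P = 0.99, p = 5 points.
m = num_subsets(P=0.99, eps=0.40, p=5)

# Toy squared residuals for three hypotheses (illustrative values only).
sq = np.array([
    [0.1, 0.2, 0.3, 50.0, 60.0],   # fits most points well, two gross outliers
    [4.0, 5.0, 6.0, 7.0, 8.0],     # mediocre fit everywhere
    [0.5, 9.0, 9.5, 10.0, 11.0],   # fits only one point
])
best, medians = lmeds_select(sq)
```

In the toy table the first hypothesis has the largest mean squared residual yet the smallest median, which is why a median-based criterion tolerates a minority of gross outliers where a least-squares criterion would not.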


Here, $r_i$ is the residual between the $i$-th corresponding point of $Y_p$ and $Y_h$. In the traditional RANSAC literature, one usually assumes no a priori knowledge about the target model, and the voting inliers are assumed to be i.i.d. For instance, in the line fitting example, any two points can determine a model, and the residual is simply the Euclidean distance from a voting sample to the fitted line. However, in a deformable shape alignment task, varying amounts of residual should be accommodated to deal with the inherent shape variation. Note that the hallucinated shape $Y_h$ is generated from $b$ through the canonical shape $S$. By propagating the information in $b$, we obtain the prior distribution of $Y_h$:

$$E[Y_h] = M_h (sR\mu + t)$$
$$\mathrm{Var}[Y_h] = s^2 M_h R (\Phi \Lambda \Phi^\top + \sigma^2 I) R^\top M_h^\top$$

In general, the points in $Y_h$ are correlated, thus the LMedS estimator cannot be applied directly. To remedy this problem, we make an independence assumption and use the marginal variance $\Sigma_i$ of each point to compute the residual

$$r_i^2(Y_p, Y_h) = [Y_p(i) - Y_h(i)]^\top \Sigma_i^{-1} [Y_p(i) - Y_h(i)]. \qquad (14)$$

$r_i$ is essentially the Mahalanobis distance between $Y_p(i)$ and $Y_h(i)$. Fig. 2 illustrates the inhomogeneous prior variance exhibited in $Y_h$. Although the LMedS estimator is highly resistant to outliers, it has a relatively low statistical efficiency and the estimate tends to be variable [21]. A post-processing step must be employed to incorporate more inliers and re-estimate the model. Alg. 2 summarizes the complete hypothesis-and-test algorithm.

Algorithm 2 Robust Shape Alignment
Input: Observation $Y$; $b'$ and $\theta'$ from the last iteration.
Output: Regularized $Y$; updated $b$ and $\theta$.
1: Generate random subsets $Y_p^{(1)}, Y_p^{(2)}, \ldots, Y_p^{(K)}$
2: for $k = 1$ to $K$ do
3:   $\{b^{(k)}, \theta^{(k)}\} \leftarrow \mathrm{BPSI}(Y_p^{(k)}, b', \theta')$
4:   Hallucination: $Y_h^{(k)} \leftarrow M_h^{(k)} T_{\theta^{(k)}}(\Phi b^{(k)} + \mu)$
5:   $e^{(k)} \leftarrow \operatorname{Median}_i \; r_i^2(Y_p^{(k)}, Y_h^{(k)})$
6: end for
7: $k^* \leftarrow \arg\min_k e^{(k)}$
8: Include more inliers to $Y_p^{(k^*)}$ and run BPSI to refine $b$ and $\theta$
9: $Y \leftarrow T_\theta(\Phi b + \mu)$

Figure 2. The partial shape $Y_p$ (red dots) is used to hallucinate the remaining shape $Y_h$ (gray dots). The marginal variance of the hallucinated points can be calculated and is shown here as ellipses.

5. Experiments
5.1. The Dataset
We evaluate our model on the MIT StreetScene dataset (1). This dataset contains over 3,000 street scene images which were originally created for the task of object recognition and scene understanding in uncontrolled environments. We labeled 3,433 cars which span a wide variety of types, sizes, background scenes, lighting conditions, and partial occlusions. All the shapes are normalized to roughly the size 250x130 by Generalized Procrustes Analysis [8]. The labeled data were manually classified into three views: 1,400 half-front views, 803 profile views and 1,230 half-back views. We randomly select 400 images from each view for training, and use the rest for testing. For occluded landmarks, we place the labels at the most probable locations, but the corresponding local patches are excluded when training the appearance model.

(1) http://cbcl.mit.edu/software-datasets/streetscenes/

5.2. Learning the Discriminative Appearance Model

The goal of the appearance model is to provide an initial shape for the alignment algorithm. Due to background clutter and substantial variations in color and pose, it is very challenging to capture the local appearance. To address the problem, we take a discriminative approach and learn the appearance density from the data.

We generate training samples from the labeled car images of the three different views. The car shape is represented by 14, 10, and 14 landmarks, respectively. For each landmark, we extract a 40x40 image patch as the positive sample. Negative samples of the same size are extracted uniformly around three concentric circles centered at the landmark and spaced 5 pixels apart. 36 negative samples are collected for each landmark. Local patches are further described by the Histogram of Oriented Gradients (HOG) descriptor [7]. The HOG descriptors are computed over dense and overlapping grids of spatial blocks, with image gradient features extracted at 9 orientations and gathered into a 576-dimensional feature vector (we use 8x8 cells and 2x2 blocks).

The extracted descriptors are fed to a Random Forest [2] for discriminative learning. A random forest is essentially an ensemble of decision trees which are induced from bootstrapped data. Specifically, we adopt the Extremely Randomized Trees of Geurts et al. [10] for training. The random forest consists of $N$ randomly generated decision trees, each of which is trained on 5000 bootstrapped samples. At each non-terminal node, two random dimensions, denoted by $i$ and $j$, are chosen from the descriptor $d$. The splitting measure at that node is specified as

$$B(d) = \begin{cases} 1, & \text{if } d(i) < d(j) \\ 0, & \text{otherwise} \end{cases}$$

where $B(d)$ indicates the branch along which $d$ should continue. At each terminal node, we save a normalized histogram that counts the frequency of each class reaching the node.

Our random forest representation is similar to the feature classification trees of Lepetit et al. [15]. However, our task is to estimate the posterior of the landmark given the observed patch rather than to classify it into different categories. Since the decision trees are generated randomly, we can even combine all the landmarks into one random forest. In this case, each landmark represents a distinct class, while all the negative samples from different landmarks are combined into one single negative class. The resulting random forest is shown in Fig. 3. Given an input descriptor $d$, the posterior that it belongs to landmark $l_i$ is given by

$$p(l_i \mid d) = \frac{1}{N} \sum_{j=1}^{N} p_j(l_i \mid d) \qquad (15)$$

where $p_j(l_i \mid d)$ is the posterior returned by tree $T_j$.

The proposed random forest model offers two benefits. First, training and testing the model are extremely efficient, as we only need to examine a subset of randomly selected feature dimensions. In addition, by combining all the landmarks and training the forest jointly, the model implicitly captures the image context information and is thus able to distinguish between neighboring landmarks. Fig. 4 illustrates the random forest result.

Figure 3. Random forest for posterior estimation. The descriptor is dropped down N decision trees. The final posterior is the average over all the resulting histograms reached by the input descriptor.

Figure 4. (a) A normalized image. We apply the trained random forest on the entire image. Posterior maps are shown for the wheels (b) and the top-right corner (c).

5.3. Performance Evaluation

We compare our approach with the Active Shape Model (ASM) [5] and the Bayesian Tangent Shape Model (BTSM) [24]. We initialize the car shape by a randomly perturbed mean shape, and the same initialization is applied to all three algorithms. Fig. 5 illustrates two example images. In the first example (top row), the appearance model is distracted by some bogus background features. ASM attempts to compensate for the errors with a large pose change, but at the expense of pulling the good features away from their true locations. BTSM generates smoother results by assigning different weights to the observation and the shape prior. However, the errors are too large to be accommodated by its Gaussian noise model. Our approach successfully detects the outliers (colored in black) and excludes them from the parameter estimation. The second example shows a typical image with partial occlusion (bottom row). Again, the fitting is improved because the occluded features are automatically identified. Fig. 5(e) shows the random hypotheses generated by RANSAC. Although they are all car-like, our algorithm successfully identifies the optimal one, which enjoys maximum agreement with the observation and the trained shape model.

Figure 5. (a) The observed shape. (b) ASM. (c) BTSM. (d) Our approach (solid colored points represent the partial shape that generates the optimal hypothesis; white ones are the inliers included in the refinement step; and black ones are the outliers rejected by the model). (e) Random shape hypotheses generated by RANSAC. Top row shows an example with spurious background features; bottom row shows an example with partial occlusion.

Fig. 6 shows the root mean square error (RMSE) with respect to the labeled ground truth. We observe a consistent improvement of the proposed model over the other approaches in all three views. Further investigation shows that our approach achieves results comparable to BTSM on good test images, but performs significantly better on the images with gross errors. Given that the error is averaged over 2,000 images, the pixel-level improvement is substantial for the alignment task.

[Figure 6: three panels titled "view1 (noise=5)", "view2 (noise=5)" and "view3 (noise=5)"; RMSE versus landmark index; curves for ASM, BTSM and our approach.]
Figure 6. Test errors for ASM, BTSM and our approach. The initial shape is set to be the mean shape plus 5 pixels of random noise on each landmark. For each test image, we use the same initialization for all three methods. The RMSE of each landmark is shown for different views: half-front view (left pane), profile view (middle pane), and half-back view (right pane).

To investigate the robustness of our algorithm to random initialization, we vary the noise level when perturbing the initial shape. Fig. 7 shows the RMSE for different noise levels at each view. As expected, the performance of our alignment model drops as the noise level increases. However, the average error increases by less than 1 pixel even when a random shift of 20 pixels is added to the initial shape. This is because our algorithm relies on a minimal subset of features to generate a hypothesis, and can therefore recover a meaningful shape in a couple of iterations. Traditional approaches are more likely to fail in this case because the shape observation is contaminated by more outliers.

Fig. 6 shows the landmark-wise average error over the entire test dataset. To investigate the error distribution, we need a side-by-side comparison for each example. We focus on the half-frontal view, which contains 1,400 images. For each example, we run BTSM and robust alignment respectively, using the same initialization. In Fig. 8, we use the sorted error of BTSM as reference and plot the corresponding error of the proposed method. A cubic curve is also fitted on the blue plot to provide a global illustration of the error distribution. As we can see, the two methods are comparable on the first 600 or so examples, while the robust method outperforms BTSM on the remaining ones. Further inspection shows that many of those difficult examples correspond to occluded images.

Fig. 9 shows some alignment results by our approach. We demonstrate car images with various viewpoints, lightings, occlusion patterns, and cluttered backgrounds.
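The per-landmark RMSE reported in Fig. 6 and Fig. 7 can be computed along the following lines. The text does not spell out the exact error definition, so this sketch assumes the error of a landmark is the Euclidean distance between the predicted and the labeled point, with the root mean square taken over the test images; the function name `per_landmark_rmse` and the toy arrays are hypothetical.

```python
import numpy as np

def per_landmark_rmse(pred, truth):
    """RMSE of each landmark over a set of images.

    pred, truth: arrays of shape (num_images, num_landmarks, 2) holding
    (u, v) pixel coordinates. Returns an array of shape (num_landmarks,).
    Assumption: the per-point error is the Euclidean distance in pixels.
    """
    pred = np.asarray(pred, dtype=float)
    truth = np.asarray(truth, dtype=float)
    sq_dist = np.sum((pred - truth) ** 2, axis=-1)  # (num_images, num_landmarks)
    return np.sqrt(sq_dist.mean(axis=0))

# Toy example with 2 images and 2 landmarks (illustrative values only).
truth = np.zeros((2, 2, 2))
pred = np.array([
    [[3.0, 4.0], [0.0, 0.0]],   # image 1: landmark 0 is off by 5 px
    [[3.0, 4.0], [0.0, 2.0]],   # image 2: landmark 1 is off by 2 px
])
rmse = per_landmark_rmse(pred, truth)
```

Because the squared distances are averaged before taking the root, a few images with gross errors dominate the value for a landmark, which is consistent with the observation that the gain over BTSM comes mainly from the difficult images.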

[Figure 7: three panels titled "Robust Alignment (view1)", "Robust Alignment (view2)" and "Robust Alignment (view3)"; RMSE versus landmark index; curves for noise=0, noise=5, noise=10 and noise=20.]
Figure 7. Test error for our approach using different initializations. The initial shape (mean shape) is perturbed by different levels of noise from 0 to 20 pixels.

[Figure 8: curves labeled "Robust", "BTSM" and "Robust (cubic fitting)"; horizontal axis ticks from 200 to 1400, vertical axis ticks from 0 to 40.]
Figure 8. Side-by-side comparison of BTSM and our approach.
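The side-by-side comparison of Fig. 8 (sort the examples by the error of the reference method, then fit a cubic trend to the errors of the other method) can be sketched as follows. This is a sketch under assumptions: the use of `np.polyfit` for the cubic fit, the function name `sorted_comparison`, and the synthetic error values are all illustrative choices, not the authors' code or data.

```python
import numpy as np

def sorted_comparison(ref_err, other_err, degree=3):
    """Order examples by the reference method's error and fit a polynomial
    trend (cubic by default) to the other method's error in that order."""
    ref_err = np.asarray(ref_err, dtype=float)
    other_err = np.asarray(other_err, dtype=float)
    order = np.argsort(ref_err)            # easiest to hardest for the reference
    ref_sorted = ref_err[order]
    other_sorted = other_err[order]
    x = np.arange(len(order))
    coeffs = np.polyfit(x, other_sorted, degree)
    trend = np.polyval(coeffs, x)
    return ref_sorted, other_sorted, trend

# Synthetic per-image errors (made up for illustration).
ref = np.array([3.0, 1.0, 2.0, 5.0, 4.0])
other = 0.5 * ref
ref_sorted, other_sorted, trend = sorted_comparison(ref, other)
```

Plotting `ref_sorted`, `other_sorted` and `trend` against the example index gives a figure of the same kind as Fig. 8: where the trend curve falls below the sorted reference curve, the second method is better on average for examples of that difficulty.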

6. Conclusions
We have described a RANSAC-based approach for robust object alignment, and applied it to a challenging multi-view car alignment task. It is encouraging to see that the approach is capable of dealing with large measurement errors such as those caused by occlusions. The current algorithm takes locally detected feature points as input. However, there is great potential for extending the RANSAC framework to operate over multiple, globally detected feature points. We plan to explore this approach in future work.

References
[1] V. Blanz and T. Vetter. A morphable model for the synthesis of 3D faces. In SIGGRAPH, pages 187-194, 1999.
[2] L. Breiman. Random forests. Machine Learning, 45:5-32, 2001.
[3] T. F. Cootes, G. J. Edwards, and C. J. Taylor. Active appearance models. PAMI, 23(6):681-685, 2001.
[4] T. F. Cootes and C. J. Taylor. A mixture model for representing shape variation. Image and Vision Computing, pages 110-119, 1997.
[5] T. F. Cootes, C. J. Taylor, D. H. Cooper, and J. Graham. Active shape models - their training and application. Computer Vision and Image Understanding, 61(1):38-59, Jan 1995.
[6] T. F. Cootes, K. Walker, and C. J. Taylor. View-based active appearance models. In Proceedings of the Fourth IEEE International Conference on Automatic Face and Gesture Recognition, 2000.
[7] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2005.
[8] I. Dryden and K. Mardia. Statistical Shape Analysis. John Wiley & Sons, 1998.
[9] M. A. Fischler and R. C. Bolles. Random sample consensus: A paradigm for model fitting with application to image analysis and automated cartography. pages 381-395, 1981.
[10] P. Geurts, D. Ernst, and L. Wehenkel. Extremely randomized trees. Machine Learning, 63:3-42, 2006.
[11] J. Gower and G. Dijksterhuis. Procrustes Problems. Oxford University Press, 2004.
[12] L. Gu and T. Kanade. 3D alignment of face in a single image. In Proceedings of Computer Vision and Pattern Recognition, 2006.
[13] L. Gu and T. Kanade. A generative shape regularization model for robust face alignment. In Proceedings of the 10th European Conference on Computer Vision, 2008.
[14] M. Kass, A. Witkin, and D. Terzopoulos. Snakes: Active contour models. IJCV, 1(4):321-331, 1988.
[15] V. Lepetit, P. Lagger, and P. Fua. Randomized trees for real-time keypoint recognition. In Proceedings of Computer Vision and Pattern Recognition, 2005.
[16] L. Liang, R. Xiao, F. Wen, and J. Sun. Face alignment via component-based discriminative search. In Proceedings of European Conference on Computer Vision, 2008.
[17] C. Liu, H. Shum, and C. Zhang. Hierarchical shape modeling for automatic face localization. In Proceedings of European Conference on Computer Vision, 2002.
[18] M. Rogers and J. Graham. Robust active shape model search. In Proceedings of European Conference on Computer Vision, 2002.
[19] S. Romdhani, S. Gong, and A. Psarrou. A multi-view nonlinear active shape model using kernel PCA. In BMVC, 1999.
[20] P. J. Rousseeuw. Robust regression and outlier detection. Wiley, New York, 1987.
[21] C. Stewart. Robust parameter estimation in computer vision. SIAM Review, 41(3):513-537, 1999.
[22] M. E. Tipping and C. M. Bishop. Probabilistic principal component analysis. Journal of the Royal Statistical Society, Series B, 61:611-622, 1999.
[23] J. Tu, Z. Zhang, Z. Zeng, and T. Huang. Face localization via hierarchical CONDENSATION with Fisher boosting feature selection. In Proceedings of Computer Vision and Pattern Recognition, 2004.
[24] Y. Zhou, L. Gu, and H. J. Zhang. Bayesian tangent shape model: estimating shape and pose parameters via Bayesian inference. In Proceedings of Computer Vision and Pattern Recognition, 2003.
[25] Y. Zhou, W. Zhang, X. Tang, and H. Shum. A Bayesian mixture model for multi-view face alignment. In Proceedings of Computer Vision and Pattern Recognition, 2005.


Figure 9. Alignment results by our approach. For each test image, we show the final result on the top and the observed shape at the bottom.
