Massachusetts Institute of Technology - Artificial Intelligence Laboratory

Inferring 3D Structure with a Statistical Image-Based Shape Model

Kristen Grauman, Gregory Shakhnarovich and Trevor Darrell

AI Memo 2003-008, April 2003

© 2003 Massachusetts Institute of Technology, Cambridge, MA 02139 USA, www.ai.mit.edu
Abstract

We present an image-based approach to infer 3D structure parameters using a probabilistic “shape+structure” model. The 3D shape of a class of objects may be represented by sets of contours from silhouette views simultaneously observed from multiple calibrated cameras. Bayesian reconstructions of new shapes can then be estimated using a prior density constructed with a mixture model and probabilistic principal components analysis. We augment the shape model to incorporate structural features of interest; novel examples with missing structure parameters may then be reconstructed to obtain estimates of these parameters. Model matching and parameter inference are done entirely in the image domain and require no explicit 3D construction. Our shape model enables accurate estimation of structure despite segmentation errors or missing views in the input silhouettes, and works even with only a single input view. Using a dataset of thousands of pedestrian images generated from a synthetic model, we can perform accurate inference of the 3D locations of 19 joints on the body based on observed silhouette contours from real images.

This work was supported by the Department of Energy Computational Science Graduate Fellowship (CSGF) and the DARPA Human Identification at a Distance (HID) program.

1. Introduction

Estimating model shape or structure parameters from one or more input views is an important computer vision problem. Classic techniques attempt to detect and align 3D model instances within the image views, but high-dimensional models or models without well-defined features may make this type of search computationally prohibitive. Rather than fit explicit 3D models to input images, we explore reconstruction and parameter inference using image-based shape models which can be matched directly to observed features. We learn an implicit, image-based representation of a known 3D shape, match it to input images using a statistical model, and infer 3D parameters from the matched model.

Implicit representations of 3D shape can be formed using models of observed feature locations in multiple views. With sufficient training data of objects of a known class, a statistical multi-view appearance model can represent the most likely shapes in that class. Such a model can be used to reduce noise in observed images, or to fill in missing data.

In this paper we present an image-based approach to infer 3D structure parameters. A probabilistic “shape+structure” model is formed using a probability density of multi-view silhouette images augmented with known 3D structure parameters. We combine this with a model of the observation uncertainty of the silhouettes seen in each camera to compute a Bayesian estimate of structure parameters. A reconstruction of an observed object yields the multi-view contours and their 3D structure parameters simultaneously. To our knowledge, this is the first work to formulate an image-based statistical shape model for the inference of 3D structure.

We also show how the image-based model can be learned from a known 3D shape model. Using a computer graphics model of articulated human bodies, we render a database of views augmented with the known 3D feature locations (and optionally joint angles, etc.). From this we learn a joint shape and structure model prior, which can be used to find the instance of the model class that is closest to a new input image. One advantage of a synthetic training set is that labeled real data is not required; the synthetic model includes 3D structure parameter labels for each example.

The strength of our approach lies in our use of a probabilistic multi-view shape model which restricts the object shape and its possible structural configurations to those that are most probable given the object class and the current observation. Even when given poorly segmented binary images of the object, the statistical model can infer appropriate structure parameters. Moreover, all computation is done within the image domain, and no model matching or search in 3D space is required.

In our experiments, we demonstrate how our shape+structure model enables accurate estimation of structure parameters despite large segmentation errors or even missing views in the input silhouettes. Since parameter inference with our model succeeds even with missing views, it is possible to match the model with fewer views than it has been trained on. We also show how configurations that are typically ambiguous in single views are handled well by our multi-view model.

Possible applications of the presented methods include fast approximation of 3D models for virtual reality, gesture recognition, pose estimation, and image feature correspondence across views.

2. Previous Work

In this paper we consider image-based statistical shape models that can be directly matched to observed shape contours. Models which capture the 2D distribution of feature point locations have been shown to be able to describe a wide range of flexible shapes, and they can be directly matched to input images [4]. The authors of [1] developed a single-view model of pedestrian contours, and showed how a linear subspace model formed from principal components analysis could represent and track a wide range of motion [2]. A model appropriate for feature point locations sampled from a contour is also given in [2]. This single-view approach can be extended to 3D by considering multiple simultaneous views of features. Shape models in several views can be separately estimated to match object appearance [5]; this approach was able to learn a mapping between the low-dimensional shape parameters in each view.

With multi-view contours from cameras at known locations, a visual hull can be recovered to model the shape of the observed object [11]. Algorithms for fast rendering of image-based visual hulls, which sidestep any geometric construction, have recently been developed [12]. By forming a statistical model of these multi-view contours, an implicit shape representation that can be used for efficient reconstruction of visual hulls is created [8].

Our model is based on a mixture of Gaussians model, where each component is estimated using principal components analysis (PCA). The use of linear manifolds estimated by PCA to represent an object class, and more generally an appearance model, has been developed by several authors [16, 3, 10]. A probabilistic interpretation of PCA-based manifolds has been introduced by [17, 9] as well as in [13], where it was applied directly to face images. As described below, we rely on the mixture of probabilistic principal components analysis (PPCA) formulation of [15] to model prior densities.

The idea of augmenting a PCA-based appearance model with structure parameters and using projection-based reconstruction to fill in the missing values of those parameters in new images was first proposed in [6]. A method that used a mixture of PCA approach to learn a model of single contour shape augmented with 3D structure parameters was presented in [14]; they were able to estimate 3D hand and arm location just from a single silhouette. This system was also able to model contours observed in two simultaneous views, but separate models were formed for each, so no implicit model of 3D shape was formed.

3. Bayesian Multi-view Shape Reconstruction

While regularization or Bayesian maximum a posteriori (MAP) estimation of single-view contours has received considerable attention as described above, less attention has been given to multi-view data from several cameras simultaneously observing an object. With multi-view data, a probabilistic model and MAP estimate can be computed on implicit 3D structures. We apply a PPCA-based probability model to form Bayesian estimates of multi-view contours, and show how such a representation can be augmented and used for inferring structure parameters. Our work builds on the shape model introduced in [8], where a multi-view contour density model is derived for the purpose of 3D visual hull reconstruction.

Silhouette shapes are represented as sampled points on closed contours, with the shape vectors for each view concatenated to form a single vector in the input space. That is, with a set of n contour points $c_k$ in each of the K views,

$$c_k = (x_{k_1}, x_{k_2}, \ldots, x_{k_n}), \quad 1 \le k \le K, \qquad (1)$$

a multi-view observation is defined as

$$c = (c_1, c_2, \ldots, c_K)^T.$$

As described in [8], if the vector of observed contour points of a 3D object resides on a linear manifold, then a multi-view image-based representation of the approximate 3D shape of that object should also lie on a linear manifold, at least for the case of affine cameras. Therefore, the shape vectors may be expressed as a linear combination of the 3D bases.

A technique suitable only for highly constrained shape spaces is to approximate the space with a single linear manifold. For more deformable structures, it is difficult to represent the shape space in this way. For example, with the pedestrian data we will use in the experiments reported below, inputs are expected to vary in two key (nonlinear) ways: the absolute direction in which the pedestrian is walking across the system workspace, and the phase of his walk cycle in that frame.

Thus, following [15, 3], we construct a density model using a mixture of PPCA models that locally model clusters of data in the input space with probabilistic linear manifolds. A single PPCA model is a probability distribution over the observation space for a given latent variable, which for this shape model is the true underlying contours in the multi-view image. Parameters for the M Gaussian mixture model components are determined for the set of observed data vectors $c_n$, $1 \le n \le N$, using an EM algorithm to maximize a single likelihood function

$$L = \sum_{n=1}^{N} \log \sum_{i=1}^{M} \pi_i \, p(c_n \mid i), \qquad (2)$$

where $p(c_n \mid i)$ is a single PPCA model, and $\pi_i$ is the $i$th component's mixture proportion. A separate mean vector $\mu_i$, principal axes $W_i$, and covariance parameter $\sigma_i$ are associated with each of the M components. As this likelihood is maximized, both the appropriate partitioning of the data and the respective principal axes are determined. The mixture of probabilistic linear subspaces constitutes the prior density of the object shape.
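As a rough sketch of how such a prior might be estimated in practice, the code below approximates the mixture-of-PPCA fit by hard-clustering the training vectors with k-means and fitting one PPCA model per cluster with scikit-learn (whose PCA exposes the maximum-likelihood PPCA noise estimate as noise_variance_). The paper relies on the full EM procedure of [15] with soft responsibilities; the function name fit_mppca_prior, the cluster count M, and the subspace dimension q are illustrative choices, not the authors' implementation.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

def fit_mppca_prior(C, M=8, q=20, seed=0):
    """Approximate a mixture-of-PPCA prior over multi-view shape vectors C (N x d).

    Hard-assigns the vectors to M clusters, then fits a q-dimensional PPCA model
    (mean mu_i, principal axes W_i, isotropic noise sigma_i^2) per cluster.
    This is a hard-assignment simplification of the joint EM of Tipping & Bishop [15],
    and it assumes each cluster contains more than q vectors.
    """
    labels = KMeans(n_clusters=M, random_state=seed, n_init=10).fit_predict(C)
    components = []
    for i in range(M):
        Ci = C[labels == i]
        pca = PCA(n_components=q).fit(Ci)
        # PPCA ML axes: eigenvectors scaled by sqrt(eigenvalue - noise variance)
        W = pca.components_.T * np.sqrt(
            np.maximum(pca.explained_variance_ - pca.noise_variance_, 0.0))
        components.append({
            "pi": len(Ci) / len(C),        # mixing proportion pi_i
            "mu": pca.mean_,               # mean vector mu_i
            "W": W,                        # principal axes W_i (d x q)
            "sigma2": pca.noise_variance_  # isotropic noise sigma_i^2
        })
    return components
```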

We assume there is a normal distribution of camera noise or jitter that affects the observed contour point locations in the input images, and we model this as a multivariate Gaussian with covariance $\Sigma_o$. A simple model may use a spherical covariance matrix for $\Sigma_o$, where the value is a tunable parameter depending on the amount of regularization desired.

Figure 1: Illustration of prior and observed densities. Center plot shows two projection coefficients in the subspace for training vectors (red dots) and test vectors (green stars), all from real data. The distribution of cleanly segmented silhouettes (such as the multi-view image in top left) is representative of the prior shape density learned from the training set. The test points are poorly segmented silhouettes which represent novel observations. Shown in bottom left and on right are some test points lying far from the center of the prior density. Due to large segmentation errors, they are unlikely samples according to the prior shape model. MAP estimation reconstructs such contours as shapes closer to the prior. Eighth and ninth dimensions are shown here; other dimensions are similar.

[Diagram: multi-view textured images pass through background subtraction to produce silhouettes; sampled, normalized contour points are reconstructed via the probabilistic shape model (PPCA models fit with EM), yielding reconstructed silhouettes and inferred 3D structure parameters; synthetic training images supply contour points plus 3D structure parameters.]

Figure 2: Diagram of data flow in our system.

A MAP estimate of the silhouettes is formed based on the PPCA prior shape model and the Gaussian distributed observation density [15]. The estimate is then backprojected into the multi-view image domain to generate the recovered silhouettes. By characterizing which projections onto the subspace are more likely, the range of possible reconstructions is effectively moderated to be more like those expressed in the training set (see Figure 1).
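A minimal sketch of this reconstruction step, assuming a single Gaussian component from the prior with mean $\mu$ and covariance $C = W W^T + \sigma^2 I$, and a spherical observation covariance $\Sigma_o = \tau^2 I$: the MAP estimate is then the standard Gaussian posterior mean. The paper's estimator additionally combines the mixture components as in [15]; the function and variable names here are illustrative.

```python
import numpy as np

def map_reconstruct(y, mu, W, sigma2, tau2):
    """MAP estimate of the clean multi-view contour vector given noisy observation y.

    Prior:       x ~ N(mu, C), with C = W W^T + sigma2 * I   (one PPCA component)
    Observation: y = x + e,    with e ~ N(0, tau2 * I)        (spherical camera noise)
    Posterior mean: mu + C (C + tau2 I)^{-1} (y - mu)
    For large vector dimension d, the Woodbury identity would let this be solved
    in the q-dimensional subspace instead of forming the full d x d matrix.
    """
    d = mu.shape[0]
    C = W @ W.T + sigma2 * np.eye(d)
    gain = C @ np.linalg.solve(C + tau2 * np.eye(d), y - mu)
    return mu + gain
```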

4. Inferring 3D Structure

We extend the shape model described above to incorporate additional structural features within the PPCA framework. A model built to represent the shape of a certain class of objects using multiple contours can be augmented to include information about the object's orientation in the image, as well as the 3D locations of key points on the object. The mixture model now represents a density over the observation space for the true underlying contours together with their associated 3D structure parameters. Novel examples are matched to the contour-based shape model using the same multi-view reconstruction method described in Section 3 in order to infer their unknown or missing parameters. (See Figure 2 for a diagram of data flow.)

The shape model is trained on a set of vectors that are composed of points from multiple contours from simultaneous views, plus a number of three-dimensional structure parameters, $s_j = (s_j^0, s_j^1, s_j^2)$. Each training input vector v is then defined as

$$v = (c_1, c_2, \ldots, c_K, s_1, s_2, \ldots, s_z)^T, \qquad (3)$$

where there are z 3D points for the structure parameters. When presented with a new multi-view contour, we find the MAP estimate of the shape and structure parameters based on only the observable contour data. The training set for this inference task may be comprised of real or synthetic data.
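Continuing the single-component sketch from Section 3, one way to realize this inference is to treat the structure dimensions (set to zero in the observation vector) as effectively unobserved by assigning them a very large observation variance; their estimates then come from the prior and the observed contour dimensions. This is an illustrative approximation, not the authors' exact procedure, and the names are hypothetical.

```python
import numpy as np

def infer_structure(v_obs, contour_dims, mu, W, sigma2, tau2, missing_var=1e12):
    """Reconstruct an augmented shape+structure vector when only contours are observed.

    v_obs:        observed vector with unknown structure entries set to zero
    contour_dims: boolean mask marking the observable contour dimensions
    The remaining dimensions (3D structure parameters, or a missing view's contour)
    receive observation variance `missing_var`, so their estimate is driven by the
    prior and the observed contour dimensions rather than the zero placeholders.
    """
    d = mu.shape[0]
    obs_var = np.where(contour_dims, tau2, missing_var)   # per-dimension noise
    C = W @ W.T + sigma2 * np.eye(d)
    v_hat = mu + C @ np.linalg.solve(C + np.diag(obs_var), v_obs - mu)
    return v_hat[~contour_dims]                            # inferred structure entries
```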

One strength of the proposed approach for the estimation of 3D feature locations is that the silhouettes in the novel inputs need not be cleanly segmented. Since the contours and unknown parameters are reconstructed concurrently, the parameters are essentially inferred from a restricted set of feasible shape reconstructions; they need not be determined by an explicit match to the raw observed silhouettes. Therefore, the probabilistic shape model does not require an expensive segmentation module. A fast, simple foreground extraction scheme is sufficient.

As should be expected, our parameter inference method also benefits from the use of multi-view imagery (as opposed to single-view). Multiple views will in many cases overcome the ambiguities that are geometrically inherent in single-view methods.

5. Learning a Multi-view Pedestrian Shape Model

A possible weakness of any shape model defined by examples is that the ability to accurately represent the space of realizable shapes will generally depend heavily on the amount of available training data. Moreover, we note that the training set from which the probabilistic shape+structure model is learned must be “clean”; otherwise the model could fit the bias of a particular segmentation algorithm. It must also be labeled with the true values for the 3D features. Collecting a large data set with these properties would be costly in resources and effort, given the state of the art in motion capture and segmentation, and in the end the “ground truth” could still be imprecise. We chose therefore to use realistic synthetic data for training a multi-view pedestrian shape model. We obtained a large training set by using Poser [7], a commercially available animation software package, which allows us to manipulate realistic humanoid models, position them in the simulated scene, and render textured images or silhouettes from a desired point of view. Our goal is to train the model using this synthetic data, but then use the model for reconstruction and inference tasks with real images.

We generated 20,000 synthetic instances of multi-view input for our system. For each instance, a humanoid model was created with randomly adjusted anatomical shape parameters, and put into a walk-simulating pose, at a random phase of the walking cycle. The orientation of the model was drawn at random as well in order to simulate different walk directions of human subjects in the scene. Then for each camera in the real setup we rendered a snapshot of the model's silhouette from a point in the virtual scene approximately corresponding to that camera. In addition to the set of silhouettes, we record the 3D locations of 19 landmarks of the model's skeleton, corresponding to selected anatomical joints. (See Figure 3.)

Figure 3: An example of synthetically generated training data. Textured images (top) show rendering of example human model; silhouettes and stick figure (below) show multi-view contours and structure parameters, respectively.

For this model, each silhouette is represented as sampled points along the closed contour of the largest connected component extracted from the original binary images. All contour points are normalized to a translation and scale invariant input coordinate system, and each vector of normalized points is resampled to a common vector length using nearest neighbor interpolation. The complete representation is then the vector of concatenated multi-view contour points plus a fixed number of 3D body part locations (see Equation 3).
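A sketch of this per-view preprocessing follows. The text specifies translation and scale normalization and nearest-neighbor resampling to a common length, but not the exact normalization; the RMS-radius scaling below is an assumed choice, and the function name and point count are illustrative.

```python
import numpy as np

def normalize_and_resample(contour, n_points=100):
    """Normalize a closed contour (m x 2 array of (x, y) points) and resample it.

    Translation invariance: subtract the centroid.
    Scale invariance:       divide by the RMS radius (an assumed choice; the text
                            does not specify the exact normalization).
    Resampling:             n_points samples at uniform arc-length spacing, each
                            mapped to the nearest original point (nearest neighbor).
    """
    pts = contour - contour.mean(axis=0)
    pts = pts / np.sqrt((pts ** 2).sum(axis=1).mean())

    # cumulative arc length at each original contour point (closed contour)
    seg = np.linalg.norm(np.diff(np.vstack([pts, pts[:1]]), axis=0), axis=1)
    cum = np.concatenate([[0.0], np.cumsum(seg[:-1])])
    total = seg.sum()

    # uniform target positions along the contour, nearest-neighbor lookup
    targets = np.linspace(0.0, total, n_points, endpoint=False)
    idx = np.abs(cum[None, :] - targets[:, None]).argmin(axis=1)
    return pts[idx]
```

Concatenating the K resampled views (plus, for training examples, the 19 joint locations) then yields the vector of Equation 3.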

6. Experiments

We have applied our method to datasets of multi-view images of people walking. The goal is to infer the 3D positions of joints on the body given silhouette views from different viewpoints.

For the following experiments, we used an imaging model consisting of four monocular views per frame from cameras located at approximately the same height at known locations about 45 degrees apart. The working space of the system is defined as the intersection of their fields of view (approximately three meters). Images of subjects walking through the space in various directions are captured, and a simple statistical color background model is employed to extract the silhouette foreground from each viewpoint. In the input observation vector for each test example, the 3D pose parameters are set to zero.
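The color background model itself is not detailed in the text; the sketch below shows one common scheme that matches the description: a per-pixel Gaussian model of the empty scene, with foreground declared where a pixel deviates from the background by more than a few standard deviations. The function names, the threshold, and the per-channel test are assumptions.

```python
import numpy as np

def build_background_model(background_frames):
    """Per-pixel color mean and standard deviation over a stack of empty-scene
    frames (shape: num_frames x H x W x 3)."""
    stack = np.asarray(background_frames, dtype=np.float32)
    return stack.mean(axis=0), stack.std(axis=0) + 1e-3

def extract_silhouette(frame, bg_mean, bg_std, thresh=3.0):
    """Mark as foreground the pixels whose color deviates from the background
    model by more than `thresh` standard deviations in any channel."""
    z = np.abs(frame.astype(np.float32) - bg_mean) / bg_std
    return (z > thresh).any(axis=-1)
```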

Since we do not have ground truth pose parameters for the raw test data, we have tested a separate, large, synthetic test set with known pose parameters so that we can obtain error measurements for a variety of experiments. In order to evaluate our system's robustness to mild changes in the appearance of the object, we generated test sequences in the same manner as the synthetic training set was generated, but with different virtual characters, i.e., different clothing, hair, and body proportions. To make the synthetic test set more representative of the real, raw silhouette data, we added noise to the contour point locations. Noise is added uniformly in random directions, or in contiguous regions along the contour in the direction of the 2D surface normal. Such alterations to the contours simulate the real tendency for a simple background subtraction mechanism to produce holes or false extensions along the true contour of the object. (See Figure 4.)

Figure 4: Two left images show clean synthetic silhouettes. Two right images show the same silhouettes with noise added to image coordinates of contour points. The first has uniform noise; the second has nonuniform noise in patches normal to the contour.
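A sketch of how the contour corruption just described (and illustrated in Figure 4) might be generated: either independent jitter at every point, or a contiguous patch of points pushed along the local 2D normal to mimic a hole or false extension. The magnitudes, the patch length, and the function name are illustrative choices, not the exact noise model used for the test sets.

```python
import numpy as np

def add_contour_noise(pts, mode="uniform", scale=2.0, patch_frac=0.15, rng=None):
    """Corrupt an (n x 2) contour to mimic background-subtraction errors.

    mode="uniform": every point jittered independently in a random direction.
    mode="patch":   a contiguous run of points (patch_frac of the contour) is
                    pushed along the local 2D normal, creating a bump or dent.
    """
    rng = np.random.default_rng() if rng is None else rng
    noisy = pts.copy()
    if mode == "uniform":
        noisy += rng.uniform(-scale, scale, size=pts.shape)
    else:
        n = len(pts)
        start = rng.integers(n)
        idx = (start + np.arange(int(patch_frac * n))) % n
        # approximate normals by rotating the central-difference tangents 90 degrees
        tangents = np.roll(pts, -1, axis=0) - np.roll(pts, 1, axis=0)
        normals = np.stack([tangents[:, 1], -tangents[:, 0]], axis=1)
        normals /= np.linalg.norm(normals, axis=1, keepdims=True) + 1e-9
        noisy[idx] += rng.choice([-1, 1]) * scale * normals[idx]
    return noisy
```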

Intuitively, a multi-view framework can discern 3D poses that are inherently ambiguous in single-view images. Our experimental results validate this assumption. We performed parallel tests for the same examples, in one case using our existing multi-view framework, and in the other, using the framework outlined above, only with the model altered to be trained and tested with single views alone. Figure 7 compares the overall error distributions of the single and multi-view frameworks for a test set of 3,000 examples. Errors in both pose and contours are measured for both types of training. Multi-view reconstructions are consistently more accurate than single-view reconstructions. Training the model on multi-view images yields on average 24% better pose inference performance and 16% better contour reconstruction performance than training the model on single-view images.

We have also tested the performance of our multi-view method applied to body pose estimation when only a subset of views is available for reconstruction. A missing view in the shape vector is represented by zeros in the elements corresponding to that view's resampled contour. Just as unknown 3D locations are inferred for the test images, our method reconstructs the missing contours by inferring the shape seen in that view based on examples where all views are known. (See Figures 5, 6, 8, and 9.)

Figure 5: Pose inference from only a single view. Top row shows ground truth silhouettes that are not in the training set. Noise is added to the input contour points of the second view (middle), and this single view alone is matched to the multi-view shape model in order to infer the 3D joint locations (bottom, solid blue) and compare to ground truth (bottom, dotted red). Abbreviated body part names appear by each joint. This is an example with average pose error of 5 cm.

Figure 6: Pose inference with one missing view. Top row shows noisy input silhouettes, middle row shows contour reconstructions, and bottom row shows inferred 3D joint locations (solid blue) and ground truth pose (dotted red). This is an example with average pose error of 2.5 cm per joint and an average Chamfer distance from the true clean silhouettes of 2.3.

[Box plots: pose error measured as mean distance from true pose per joint (cm); contour error measured as Chamfer distance between true and reconstructed contours.]

Figure 7: Training on single view vs. training on multiple views. Chart shows error distributions for pose (left) and contour (right). Lines in the center of boxes denote the median value; top and bottom of boxes denote upper and lower quartile values, respectively. Dashed lines extending from each end of a box show the extent of the rest of the data. Outliers are marked with pluses beyond these lines.

We are interested in knowing how pose estimation performance degrades with each additional missing view, since this will determine how many cameras are necessary for suitable pose estimation should we desire to use fewer cameras than are present in the training set. Once the multi-view model has been learned, it may be used with fewer cameras, assuming that the angle of inclination of the cameras with the ground plane matches that of the cameras with which the model was trained. Figure 8 shows results for 3,000 test examples that have been reconstructed using all possible numbers of views (1, 2, 3, 4), alternately. For a single missing view, each view is omitted systematically one at a time, making 12,000 total tests. For two or three missing views, omitted views are chosen at random in order to approximately represent all possible combinations of missing views equally. As the number of missing views increases, performance degrades more gracefully for pose inference than for contour reconstruction.

The pose error $e_f$ for each test frame is defined as the average distance in centimeters between the estimated and true positions of the 19 joints,

$$e_f = \frac{1}{19} \sum_i |e_i|, \qquad (4)$$

where $e_i$ is the individual error for joint i.
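In code, the per-frame pose error of Equation 4 is simply the mean Euclidean distance over the 19 joints (illustrative sketch):

```python
import numpy as np

def pose_error(est_joints, true_joints):
    """Average distance (cm) between estimated and true 3D joint positions.
    Both arrays have shape (19, 3), in the same world coordinates as training."""
    return np.linalg.norm(est_joints - true_joints, axis=1).mean()
```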

As described above, test silhouettes are corrupted with noise and segmentation errors so that they may be more representative of real, imperfect data, yet still allow us to do a large volume of experiments with ground truth. The “true” underlying contours from the clean silhouettes (i.e., the novel silhouettes before their contour points were corrupted) are saved for comparison with the reconstructed silhouettes. The contour error for each frame is then the distance between the true underlying contours and their reconstructions.

[Box plots: pose error (mean distance from true pose per joint, cm) and contour error (Chamfer distance between true and reconstructed contours) versus the number of missing views.]

Figure 8: Missing view results. Chart shows distribution of errors for pose (left) and contours (right) when the model is trained on four views, but only a subset of views is available for reconstruction. Plotted as in the previous figure.

Contour error is measured using the Chamfer distance. For all pixels with a given feature (usually edges, contours, etc.) in the template image T, the Chamfer distance D measures the average distance to the nearest feature in the test image I:

$$D(T, I) = \frac{1}{N} \sum_{f \in T} d_T(f), \qquad (5)$$

where N is the number of pixels in the template where the feature is present, and $d_T(f)$ is the distance between feature f in T and the closest feature in I.
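A compact sketch of Equation 5 using a Euclidean distance transform, assuming both the template and test contours are given as binary masks on the same pixel grid; the function name is illustrative:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def chamfer_distance(template_mask, image_mask):
    """Average distance from each feature (contour) pixel of the template T to
    the nearest feature pixel of the test image I, as in Equation 5."""
    # distance of every pixel to the nearest True pixel in image_mask
    dist_to_image = distance_transform_edt(~image_mask)
    return dist_to_image[template_mask].mean()
```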

To interpret the contour error results in Figure 8, consider that the average contour length is 850 pixels, and the pedestrian silhouettes have an average area of 30,000 pixels. If we estimate the normalized error to be the ratio of average pixel distance errors (number of contour pixels multiplied by Chamfer distance) to the area of the figure, then a mean Chamfer distance of 1 represents an approximate overall error of 2.8%, distances of 4 correspond to 11%, etc. Given the large degree of segmentation errors imposed on the test sets, these are acceptable contour errors in the reconstructions, especially since the 3D pose estimates (our end goal) do not suffer proportionally.
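Plugging in the stated numbers confirms these figures:

$$\frac{850 \times 1}{30{,}000} \approx 2.8\%, \qquad \frac{850 \times 4}{30{,}000} \approx 11\%.$$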

Finally, we evaluated our algorithm on a large dataset of real images of pedestrians taken from a database of 4,000 real multi-view frames. The real camera array is mounted on the ceiling of an indoor lab environment. The external parameters of the virtual cameras in the graphics software that were used for training are roughly the same as the parameters of this real four-camera system. The data contains 27 different pedestrian subjects.

Sample results for the real test dataset are shown in Figure 9. The original textured images, the extracted silhouettes, and the inferred 3D pose are shown. Without having point-wise ground truth for the 3D locations of the body parts, we can best assess the accuracy of the inferred pose by comparing the 3D stick figures to the original textured images. To aid in inspection, the 3D stick figures are rendered from manually selected viewpoints so that they are approximately aligned with the textured images.

In summary, our experiments show how the shape+structure model we have formulated is able to infer 3D structure by matching observed image features directly to the model. Our tests with a large set of noisy, ground-truthed synthetic images offer evidence of the ability of our method to infer 3D parameters from contours, even when inputs have segmentation errors. In the experiments shown in Figure 8, structure inference for body pose estimation is accurate within 3 cm on average. Performance is good even when there are fewer views available than were used during training; with only one input view, pose is still accurate within 15 cm on average, and can be as accurate as within 4 cm. Finally, we have successfully applied our synthetically-trained model to real data and a number of different subjects.

7. Conclusions and Future Work

We have developed an image-based approach to infer 3D structure parameters using a probabilistic multi-view shape model. Novel examples with contour information but unknown 3D point locations are matched to the model in order to retrieve estimates for unknown parameters. Model matching and parameter inference are done entirely in the image domain and require no explicit 3D construction from multiple views. We have demonstrated how the use of a class-specific prior on multi-view imagery enables accurate estimation of structure parameters in spite of large segmentation errors or even missing views in the input silhouettes.

In future work we will explore non-parametric density models for inferring structure from shape. We also plan to run experiments using motion capture data so that we may compare real image test results to ground-truth joint angles. In addition, we intend to include dynamics to strengthen our model for the pedestrian walking sequences. We are also interested in how the body pose estimation application might be utilized in some higher-level gesture or gait recognition system.

Figure 9: Inferring structure on real data. For each example, the top row shows the original textured multi-view image, the middle row shows the extracted input silhouettes (views not used in the reconstruction are omitted), and the bottom row shows the inferred joint locations as stick figures rendered at different viewpoints. To aid in inspection, the 3D stick figures are rendered from manually selected viewpoints so that they are approximately aligned with the textured images. In general, estimation is accurate and agrees with the perceived body configuration. An example of an error in estimation is shown in the top left example's left elbow, which appears to be incorrectly estimated as bent.

References

[1] A. Baumberg and D. Hogg. Learning flexible models from image sequences. In Proceedings of the European Conference on Computer Vision, 1994.

[2] A. Baumberg and D. Hogg. An adaptive eigenshape model. In British Machine Vision Conference, pages 87–96, Birmingham, September 1995.

[3] T. Cootes and C. Taylor. A mixture model for representing shape variation. In British Machine Vision Conference, 1997.

[4] T. F. Cootes, C. J. Taylor, D. H. Cooper, and J. Graham. Active shape models - their training and application. Computer Vision and Image Understanding, 61(1):38–59, January 1995.

[5] T. F. Cootes, G. V. Wheeler, K. N. Walker, and C. J. Taylor. View-based active appearance models. Image and Vision Computing, 20:657–664, 2002.

[6] M. Covell. Eigen-points: Control-point location using principal component analysis. In Proceedings of the IEEE Int. Conf. on Automatic Face and Gesture Recognition, Killington, October 1996.

[7] Egisys Co. Curious Labs. Poser 5: The ultimate 3D character solution. 2002.

[8] K. Grauman, G. Shakhnarovich, and T. Darrell. An image-based approach to Bayesian visual hull reconstruction. In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition, Madison, 2003.

[9] J. Haslam, C. Taylor, and T. Cootes. A probabilistic fitness measure for deformable template models. In British Machine Vision Conference, pages 33–42, York, England, September 1994.

[10] M. Jones and T. Poggio. Multidimensional morphable models. In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition, pages 683–688, New Delhi, January 1998.

[11] A. Laurentini. The visual hull concept for silhouette-based image understanding. IEEE Transactions on Pattern Analysis and Machine Intelligence, 16(2):150–162, February 1994.

[12] W. Matusik, C. Buehler, R. Raskar, S. Gortler, and L. McMillan. Image-based visual hulls. In Proceedings of the 27th Conference on Computer Graphics and Interactive Techniques, Annual Conference Series, pages 369–374, 2000.

[13] B. Moghaddam. Principal manifolds and probabilistic subspaces for visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(6):780–788, June 2002.

[14] E. Ong and S. Gong. The dynamics of linear combinations: tracking 3D skeletons of human subjects. Image and Vision Computing, 20:397–414, 2002.

[15] M. Tipping and C. Bishop. Mixtures of probabilistic principal component analyzers. Neural Computation, 11(2):443–482, 1999.

[16] M. A. Turk and A. P. Pentland. Face recognition using eigenfaces. In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition, pages 586–590, Hawaii, June 1992.

[17] Y. Wang and L. H. Staib. Boundary finding with prior shape and smoothness models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(7):738–743, 2000.
