
IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, VOL. 16, NO. 3, MAY/JUNE 2010

A Point-Cloud-Based Multiview Stereo Algorithm for Free-Viewpoint Video

Yebin Liu, Qionghai Dai, Senior Member, IEEE, and Wenli Xu

The authors are with the Automation Department, Tsinghua University, Central Main Building, Beijing 100084, P.R. China. E-mail: {liuyebin, qhdai, xuwl}@tsinghua.edu.cn.

Manuscript received 20 Oct. 2008; revised 23 Mar. 2009; accepted 30 June 2009; published online 21 July 2009. Recommended for acceptance by B. Guo. Digital Object Identifier no. 10.1109/TVCG.2009.88.

Abstract—This paper presents a robust multiview stereo (MVS) algorithm for free-viewpoint video. Our MVS scheme is totally point-cloud-based and consists of three stages: point cloud extraction, merging, and meshing. To guarantee reconstruction accuracy, point clouds are first extracted according to a stereo matching metric which is robust to noise, occlusion, and lack of texture. Visual hull information, frontier points, and implicit points are then detected and fused with point fidelity information in the merging and meshing steps. All aspects of our method are designed to counteract potential challenges in MVS data sets for accurate and complete model reconstruction. Experimental results demonstrate that our technique produces the most competitive performance among current algorithms under sparse viewpoint setups on both static and motion MVS data sets.

Index Terms—Multiview stereo, MVS, free-viewpoint video, point cloud.

1 INTRODUCTION

Over the past decade, computer graphics and computer vision have joined hands in many areas and brought about many promising technologies. One of these technologies is free-viewpoint video (FVV), which will ultimately provide users with superior fidelity and a feeling of immersion while viewing visual media. MVS, a technique for acquiring 3D models of real-world objects from multiple calibrated photographs, is important for realistic free-viewpoint video.

The potential applications of multiview stereo range from the construction of realistic object models for films, games, and design engineering, to the quantitative recovery of metric information for scientific and engineering data analysis. The technique is so important that it has attracted the attention of computer vision researchers for decades.

In the domain of computer vision, MVS is commonly investigated for the reconstruction of static scenes and objects. The multiview images used in this scenario can be efficiently filmed by a single camera, and the quality of the captured images can be made ideal through careful camera selection and flexible pose configuration during filming. As-accurate-as-possible 3D reconstruction is the goal in this scenario.

Contrarily, for the realization of free-viewpoint video in the computer graphics domain, the multiview images captured by a video camera array suffer from challenges such as the limited number of cameras, errors in geometric/color calibration and temporal synchronization, surface regions that are unobservable because of the fixed camera array setting, and blur due to high-speed motion. All these challenges make many MVS algorithms prevailing in the vision domain fail to obtain satisfactory results in the FVV scenario.

To realize robust 3D reconstruction, this paper proposes a new MVS algorithm for free-viewpoint video. Our reconstruction algorithm follows the philosophy of traditional 3D laser scanners and is composed of three steps: point cloud extraction, merging, and meshing. Through this three-stage local processing, the reconstruction refrains from the global optimization commonly used in available free-viewpoint video systems, and achieves reconstruction accuracy comparable to the most accurate reconstruction algorithms. Moreover, all aspects of our method are carefully designed to address the reconstruction challenges in FVV data sets stated above. The proposed method is able to achieve high-quality reconstruction under extremely sparse camera setups, and it bridges the gap between MVS algorithms in the vision domain and the graphics domain.

In particular, the proposed MVS algorithm has the following characteristics in each step:

- A point detection metric which is robust to occlusion, noise, and lack of texture is adopted to improve the quality of the point cloud extracted in each view. Frontier points and implicit points are extracted and designed to boost surface completeness and accuracy.
- Erroneous and conflicting points are adaptively removed in 3D space based on a surface prior and point properties including position, normal, fidelity, and extracting view, to improve reconstruction accuracy.
- Point fidelity values are fused in the standard Poisson reconstruction algorithm to improve meshing accuracy. A space-constrained mechanism is introduced into point cloud remeshing to guarantee that the reconstructed mesh lies within the visual hull.

Another advantage of our reconstruction scheme is that the three steps are totally decoupled, which makes it highly flexible and scalable, and provides the possibility of performance improvement by enhancing any of the three modules independently.
Finally, the latest FVV systems [1], [2] concentrate on topology-preserving reconstruction while neglecting reconstruction accuracy. Our MVS algorithm can serve as a complement to these works in future FVV systems.

The remainder of this paper is laid out as follows: we first review related work in Section 2. An overview of our FVV system and the proposed MVS algorithm is given in Section 3. We then discuss the individual modules of our method, including point cloud detection (Section 4), point cloud merging (Section 5), and point cloud meshing (Section 6). Section 7 presents experimental results, and Section 8 concludes with a discussion of the paper's main contributions.

2 RELATED WORK

Multiview stereo (MVS) algorithms can be categorized into two classes. The first is global optimization methods, which optimize the model surface based on a global energy minimization formulation. According to the MVS benchmark [3], this kind of method can be further divided into the subclasses of surface volume extraction [4], [5], [6], [7], [8], [9], [10] and surface evolution [11], [12], [13], [14], [15]. The second is multistage-local-processing methods, which break the whole MVS reconstruction into multiple suboptimization problems. Such methods can be further divided into depth map merging approaches [16], [17], [18], [19], [20], [21], [22] and feature growing approaches [23], [24], [25], [26], [27].

The first FVV studio, Virtualized Reality [28], adopts the depth map merging approach belonging to the multistage-local-processing methods. In their system, a group of 2.5D stereo depth images of a moving person is computed from 51 cameras arranged on a dome. Reconstruction performance at that time was low because of ambiguous stereo matching and crude surface fusion. After this work, many FVV systems turned to global optimization MVS because of its ability to produce a compact and closed surface model. For example, Franco et al. [29] derive shape from silhouettes; Tomiyama et al. [30], Matsuyama et al. [31], and Starck et al. [32] compute a raw shape and then refine it to match stereo cues based on an evolution cost or graph cuts. Goldluecke et al. [33] introduce a spatiotemporal approach to volumetric reconstruction using level sets for temporally consistent reconstruction. Starck et al. [34], [35] combine multiple shape cues for robust wide-baseline volumetric reconstruction using graph cuts, which achieves rendering quality comparable to that of the captured video.

Global optimization MVS algorithms have also been extensively investigated for the reconstruction of static objects [4], [5], [6], [7], [8], [9]. However, because a single global cost function can hardly cover all local surface curvature cases, and because the optimization easily falls into local minima, the reconstruction accuracy of approaches of this kind is generally not high enough.

In recent years, researchers have revisited reconstruction algorithms belonging to the multistage-local-processing MVS class. These methods achieve extremely high reconstruction quality on static objects. Their success is ascribed to advances in low-level vision techniques (image matching) and point-based graphics techniques (point cloud filtering [36], [37] and meshing [38], [39]). Goesele et al. [19] and Bradley et al. [22] divide the MVS problem into depth map computation and depth map merging. A pixel-by-pixel window matching technique is exploited to retrieve depth values for high-fidelity pixels, and the resulting depth maps are then merged. Campbell et al. [40] concentrate on improving the accuracy of depth maps, while Merrell et al. [41] and Zach et al. [21] investigate efficient merging of multiple depth maps. Although high accuracy is achieved for salient texture regions, robustness and completeness are not addressed or guaranteed in these works.

Patch-based MVS (PMVS), another multistage-local-processing MVS algorithm, proposed by Furukawa et al. [24], divides the reconstruction problem into a 3D feature extraction problem and a feature expansion problem. Through this two-stage processing, PMVS is delicately designed and obtains the highest reconstruction quality on the Middlebury static multiview data sets. However, as shown in Section 7, PMVS is not necessarily accurate and robust for FVV data sets.

In general, the available multistage-local-processing MVS algorithms assume accurate stereo matching and extract points or patches based on image pixels. Surface regions that are textureless, noisy, or unobservable may not be reasonably recovered. Furthermore, the extracted points are not guaranteed to be spatially uniform, and there may be holes on the object surface, presenting challenges for the meshing algorithm to generate a watertight and reasonable mesh.

Table 1 lists the advantages (items in blue) and disadvantages (items in saffron yellow) of global optimization MVS and multistage-local-processing MVS. The goal of this work is to propose an algorithm that combines the advantages of the two. Such an algorithm can serve as a robust, high-performance reconstruction method for FVV systems and close the gap between MVS in the computer vision domain and in the graphics domain.

TABLE 1
Comparison of Classes of Multiview Stereo Methods

3 SYSTEM OVERVIEW

To validate the reconstruction algorithm, we have developed a multicamera 3D studio to capture multiview video. These multiview data sets consist of 20 views evenly spaced on a ring, as shown in Fig. 1.

Fig. 1. The multicamera dome for 3D video capture.


Fig. 2. Multiview images captured by the multicamera dome.

The image spatial resolution is 1,024 x 768. One of the captured multiview image sets is shown in Fig. 2. Compared with the available multiview video data sets, our data sets contain many more actors with different clothing and motions, which makes comprehensive evaluation of different algorithms possible. Also, 20 views offer view scalability and flexibility for extensive research beyond modeling and rendering. These multiview videos and the reconstruction results of this work are available at our web page.1

1. http://media.au.tsinghua.edu.cn/fvv.jsp.

Our capture environment provides a mono-color screen allowing for background removal through chroma keying, so that the visual hull [29], [42] can be conveniently constructed. After that, the reconstruction procedure is implemented in sequential order: point cloud extraction, point cloud merging, and point cloud meshing. Fig. 3 shows the diagram of these three modules.

Fig. 3. Our proposed point-cloud-based MVS framework: point cloud detection, point cloud merging, and point cloud meshing.

4 POINT CLOUD DETECTION

Our point detection module is designed to be robust to occlusion, noise, and lack of texture. Section 4.1 describes the detection metric robust to occlusion and noise. Section 4.2 then improves this metric to be robust to lack of texture.

4.1 Occlusion and Noise Robust Point Detection

In MVS, stereo matching should be pixel-level to guarantee the accuracy of the detected 3D points. This implies that each image pixel has its own depth value corresponding to a unique 3D point.

Specifically, for pixel p in the target image P, an optical ray starting from the camera center and passing through the pixel is first computed. Aiming at the retrieval of the most probable 3D scene point where the ray intersects the object surface, a matching metric is designed to compute the consistency of each possible point x_i along the ray segments. The point that achieves the highest score under this metric is then regarded as the desired 3D point. In practice, x_i is evenly sampled with sampling distance δ on the ray segments. Each sample x_i is projected onto J neighboring cameras, and its correlation scores c_i^j (j = 1, 2, ..., J) are computed. These correlation scores are the zero-mean normalized cross correlation (ZNCC) between two square windows centered at the projections of x_i into P and the neighboring images.

Summing up all the c_i^j of the reference cameras for each point x_i is one kind of matching metric. Traditionally, it is assumed that the point achieving the maximum under this metric is the most likely point on the real surface. However, because of occlusion among views, this is not always true. As observed by Vogiatzis et al. [9], if the surface region is seen by a camera without deterioration caused by occlusion, the correlation score curve along the ray often shows a local maximum near the correct depth, though it may not be a global one. To take this constraint into account, Vogiatzis designs a metric which considers only the points that achieve a local maximum. With this processing, when occlusion happens in one of the reference views, the score on the real surface is not counted (since it is not the true score), and thus the matching value can refrain from degradation.

However, because of the noise present in common FVV data sets, the curve of correlation scores still has many small sawtooth-like local maxima. These sawteeth greatly affect the effectiveness of the local-maximum rule. To counteract the noise, the correlation score curve c_i^j (i = 1, 2, ...) is first filtered by a Gaussian window to get a new curve \tilde{c}_i^j (i = 1, 2, ...). The point detected for pixel p in target image P is then

    x = \arg\max_{x_i} \sum_{j \in N(P)} \tilde{c}_i^j \, I(\tilde{c}_i^j > \tilde{c}_{i+1}^j) \, I(\tilde{c}_i^j > \tilde{c}_{i-1}^j),    (1)

where I is an indicator function equal to 1 if the condition is true and 0 otherwise. I(\tilde{c}_i^j > \tilde{c}_{i+1}^j) \, I(\tilde{c}_i^j > \tilde{c}_{i-1}^j) = 1 implies that x_i is a local maximum over the ray curve \tilde{c}_i^j (i = 1, 2, ...) and will be counted in the matching metric.
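As a concrete illustration of (1), the following minimal sketch (Python/NumPy, with an illustrative Gaussian width) selects the best sample along one pixel's ray, assuming the per-view ZNCC score curves have already been computed; camera projection and windowed correlation, which the full pipeline performs per sample, are omitted here:

    import numpy as np
    from scipy.ndimage import gaussian_filter1d

    def detect_point(scores, sigma=1.0):
        """Pick the ray sample that maximizes the metric of Eq. (1).

        scores: (J, N) array, scores[j, i] = ZNCC of ray sample x_i in
        reference view j. Returns the index of the selected sample x_i.
        """
        # Gaussian filtering along the ray suppresses sawtooth-like maxima.
        smooth = gaussian_filter1d(scores, sigma=sigma, axis=1)
        # Indicator that a sample is a strict local maximum in view j.
        is_max = np.zeros_like(smooth, dtype=bool)
        is_max[:, 1:-1] = (smooth[:, 1:-1] > smooth[:, 2:]) & \
                          (smooth[:, 1:-1] > smooth[:, :-2])
        # Sum only the locally maximal smoothed scores over the views.
        metric = (smooth * is_max).sum(axis=0)
        return int(np.argmax(metric))

An occluded view contributes no local maximum near the true depth, so its misleading scores simply drop out of the sum, which is exactly the behavior the metric is designed to achieve.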
4.2 Point Detection for Textureless Regions

For regions with large areas of insufficient texture, the optimal consistency values under the above matching metric are low and ambiguous, which makes the points detected in such areas somewhat randomly scattered.

As we have observed, there are often shading effects on concave regions because of the rapidly changing normal directions. Such shading makes the region salient for stereo matching, so the color consistency of this region cannot be extremely low. This implies that pixels with extremely low color consistency have little chance of corresponding to a locally concave surface; they are more likely to correspond to points on the visual hull.

Considering this observation, we assign visual hull points to pixels whose optimal consistency values are lower than a quarter of the number of reference images (ZNCC ranges in [-1, 1], so the attainable consistency value is bounded by the number of reference images). It is worth noting that this approximation is not accurate. However, with the error removal step described in Section 4.3, the results are much more satisfactory.

Fig. 4 illustrates the performance improvement brought by the proposed lack-of-texture robust point detection mechanism. Because the trousers are black, stereo matching is difficult and consistency values are low. The point corresponding to the highest consistency value may not lie on the real surface. Therefore, the detected point cloud is scattered (Fig. 4b), and the error cleaning result after Section 4.3 is defective and incomplete (Fig. 4c). In contrast, after introducing the texture-robust mechanism into the error cleaning process, both completeness and accuracy are improved, as shown in Fig. 4e, where the left leg of the actress is well recovered.

Fig. 4. Illustration of the benefit obtained using lack-of-texture robust point detection. (a) One of the input images. (b) The shading result of points extracted using the matching metric of (1) only. (c) The error cleaning result of (b). (d) The shading result of using the texture robust matching metric. (e) The cleaning results in (d) using the technique in Section 4.3.

4.3 Error Cleaning

At this stage, a raw point cloud for target image P has been extracted. To improve accuracy and reduce the number of outliers in the point cloud, erroneous points are removed by enforcing surface prior constraints. The removal operation is defined not in the image space but in 3D space, which provides a more natural way to exploit prior knowledge of surface properties.

- Density constraint. A fixed radius is used to compute the local neighborhood of each point. A point whose neighbor count is lower than half of the average neighbor count is removed.
- PCA eigenvalue constraint. Principal component analysis (PCA) is performed on the covariance matrix of each 3D point and its neighbors. Three eigenvalues λ1 ≥ λ2 ≥ λ3, representing the weights of the corresponding eigenvector directions, are obtained by decomposing the covariance matrix. Points without an obvious surface normal are removed if λ3/λ1 > 0.5 or λ3/λ2 > 0.5.
- Normal constraint. The eigenvector n_3 associated with the smallest eigenvalue determines the normal direction of each 3D point in the point cloud. Points whose normals are approximately perpendicular to the view direction may not be accurate, and such points are better detected by other cameras. We remove points that satisfy |n_3 · c| < τ, where c is the normalized vector from the point to the detecting camera and τ is a threshold.

The cleaning performance is illustrated in Fig. 5, where the scattered points have been successfully removed.

Fig. 5. Illustration of error cleaning performance obtained using all three constraints. (a) Views of the extracted points of the ninth image in the DinoSparseRing data set. (b) Their corresponding cleaning results.
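The three constraints translate directly into a short cleaning pass. The sketch below (with illustrative radius and threshold values standing in for the settings of Table 2) estimates normals from the PCA of each fixed-radius neighborhood and applies the density, eigenvalue, and normal tests; it is a schematic rendering of Section 4.3, not the paper's implementation:

    import numpy as np
    from scipy.spatial import cKDTree

    def clean_points(pts, cam_center, radius=0.01, tau=0.1):
        """Return a keep-mask and PCA normals for one view's raw cloud."""
        tree = cKDTree(pts)
        nbrs = tree.query_ball_point(pts, r=radius)
        counts = np.array([len(n) for n in nbrs])
        keep = counts >= 0.5 * counts.mean()        # density constraint
        normals = np.zeros_like(pts, dtype=float)
        for i, idx in enumerate(nbrs):
            if len(idx) < 4:
                keep[i] = False
                continue
            local = pts[idx] - pts[idx].mean(axis=0)
            w, v = np.linalg.eigh(local.T @ local)  # ascending eigenvalues
            l3, l2, l1 = w                          # l3 smallest, l1 largest
            if l3 > 0.5 * l1 or l3 > 0.5 * l2:      # PCA eigenvalue constraint
                keep[i] = False
            normals[i] = v[:, 0]                    # eigenvector of smallest eigenvalue
            c = cam_center - pts[i]
            c = c / np.linalg.norm(c)
            if abs(normals[i] @ c) < tau:           # normal constraint
                keep[i] = False
        return keep, normals

Operating in 3D rather than in the image plane lets one neighborhood query serve all three tests at once.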
4.4 Frontier and Implicit Point Detection

For most image pixels, the corresponding surface points can be successfully detected based on the above detection metric. However, for thin regions such as hands and fingers, traditional stereo matching techniques are incompetent. Moreover, there are still surface regions that cannot be filmed by any camera. For example, the back sides of the human arms are not captured by any of the cameras (see Fig. 6). This work proposes the detection and fusion of frontier and implicit points for better reconstruction.

Frontier points are points on the visual hull where two contour generators intersect, and hence are definitely on the object surface [35]. Fig. 6b shows the extracted frontier points. Frontier points can be extracted by projecting each silhouette pixel back into 3D space and examining the number of intersections with the visual hull. Only if there exists a unique intersection is the intersection point regarded as a frontier point. The orientation of a frontier point can be computed, as it is perpendicular to the local surface.

Implicit points are points on the visual hull surface that cannot be filmed by any of the cameras. Fig. 6c shows the extracted implicit points. Since all cameras are on the ring and look downward, the back sides of the two arms are not observable to any camera. These unobservable regions cannot be detected based on stereo matching techniques. If the points in these regions are not detected, the reconstructed model will be incomplete. Implicit points can be identified by projecting the points on the visual hull to all the camera views. If every projection either intersects the visual hull or falls on the silhouette boundary (that is, none of the projections lies in the inner space of the image silhouette without occlusion), the point is regarded as an implicit point. Implicit points are not necessarily on the real surface; they serve as aids for complete model reconstruction. See Section 5.2 for the processing of the implicit points.

Fig. 6d illustrates the reconstructed model when the frontier points and the implicit points are not considered, while Fig. 6e shows the result of fusing the frontier points and implicit points. Both hands are well reconstructed in Fig. 6e.

Fig. 6. Frontier point extraction and implicit point extraction: (a) One of the captured images (cameras are looking downward on a ring, and hence, the back sides of the two arms are not filmed by any camera). (b) and (c) The extracted frontier points and implicit points. (d) The reconstructed model when frontier and implicit points are not considered. (e) The model when these two are adopted.
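A sketch of the implicit point test follows. The helpers project, inside_silhouette, and occluded are hypothetical stand-ins for the real calibration, silhouette lookup, and hull ray-intersection code, so this fixes only the classification logic:

    def classify_hull_point(p, cameras, project, inside_silhouette, occluded):
        """Label a visual hull point per Section 4.4 (sketch).

        project(cam, p) -> pixel coordinates; inside_silhouette(cam, uv)
        -> True if uv lies strictly inside the silhouette (not on its
        boundary); occluded(cam, p) -> True if the ray from cam to p
        hits the hull before reaching p. All three are assumed helpers.
        """
        for cam in cameras:
            uv = project(cam, p)
            if inside_silhouette(cam, uv) and not occluded(cam, p):
                return "observable"   # some camera actually films p
        return "implicit"             # p is unseen by every camera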

5 POINT CLOUD MERGING

The functionality of point cloud merging in this work is the same as that of point cloud processing techniques for laser scanners. However, the properties of the two kinds of point clouds are totally different. First, the accuracy of an MVS point cloud is lower, and it contains great redundancy, conflicts, and noise. Second, an MVS point cloud contains additional beneficial information, e.g., consistency values and camera view parameters, which are helpful for better merging.

In this work, each point is denoted as s(x, ñ, v, f), with 3D position x, normal information ñ, detecting view v, and matching consistency value (fidelity value) f. All this information serves in the following downsampling process and conflict cleaning process. Since frontier points are definitely on the real surface and implicit points are not necessarily on the real surface, the fidelities of these two kinds of points are set to 1 and -1, respectively.

5.1 Merging and Downsampling

After the point clouds from all the camera views are clustered together, a merged point cloud is obtained. Due to the overlaps among surface regions obtained from multiple views, the merged point cloud contains large amounts of redundant information and many conflicting points.

For the convenience of subsequent processing, a downsampling operation on the merged point cloud is necessary. Such downsampling can also be designed to remove errors for better reconstruction. Frontier points and implicit points carry important shape information and are all maintained in this step. Points with matching consistency value larger than a threshold are also reserved; note that this threshold is set large enough to assert the real existence of those points.

Pursuing as-uniform-as-possible point sampling, locally optimal points are reserved based on all available information from their neighbors. For points in the same neighborhood, high-frequency noise satisfying the following rule is first removed:

    \tilde{n} \cdot \left( \sum_{s_i \in N(s)} \tilde{n}_i \Big/ \Big\| \sum_{s_i \in N(s)} \tilde{n}_i \Big\| \right) < \alpha.    (2)

Here, s is the considered point, s_i is a point belonging to the neighborhood N(s) of point s, ñ and ñ_i are the corresponding normalized normals, and α is a threshold. This rule identifies points whose normal direction differs greatly from that of their neighbors.

We also reserve points that meet s.f > s_i.f for all s_i with s.v ≠ s_i.v, which means the fidelity value of point s is larger than all color consistency values of neighboring points extracted from different cameras; otherwise, the point s is deleted.

Note that the above neighborhoods are computed using a fixed radius r. This radius is set relatively small so that only high-frequency noise is removed.
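Rule (2) amounts to comparing each normal with the normalized average of its neighbors' normals. A minimal sketch, assuming unit-length normals and an illustrative threshold alpha:

    import numpy as np
    from scipy.spatial import cKDTree

    def remove_normal_outliers(pts, normals, radius, alpha=0.5):
        """Drop points violating Eq. (2) within a fixed-radius ball."""
        tree = cKDTree(pts)
        keep = np.ones(len(pts), dtype=bool)
        for i, idx in enumerate(tree.query_ball_point(pts, r=radius)):
            idx = [j for j in idx if j != i]   # exclude the point itself
            if not idx:
                continue
            avg = normals[idx].sum(axis=0)
            norm = np.linalg.norm(avg)
            if norm > 0 and normals[i] @ (avg / norm) < alpha:
                keep[i] = False                # high-frequency normal noise
        return keep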
5.2 Conflict Cleaning

The neighborhood radius used above for downsampling is relatively small, so there remain conflicting points that were extracted from different cameras and lie relatively far apart.

The following criteria define a conflicting point pair. The two points:

1. have the same projection position on one of the views,
2. lie within a distance smaller than a constant value ε of each other,
3. are captured from different cameras, and
4. both have normal directions pointing outward with respect to the projected view.

The cleaning algorithm assumes that the point with the lower color consistency value in a conflicting pair is a noisy point and should be removed.

Fig. 7a illustrates the concept of conflicting points. Criterion 4 is essential for surface completeness. Fig. 7b shows a counterexample: points from different views that do not conflict because of their opposite normal directions. Points on the two sides of the actress' hand belong to this case.

Fig. 8 illustrates the pipeline of conflict cleaning. Here, ε is set to about 10 times the value of r. The fidelity and priority of implicit points are the lowest: an implicit point is removed once it occludes any potential valid point. Camera view C is allowed to be a virtual view to increase the number of conflicting point pairs. About 50 virtual views, evenly spaced on the spherical surface defined by the 20 capture cameras, are adopted in this work, resulting in a removal rate of about 30 percent. After the downsampling and conflict cleaning steps, the resulting point cloud is clean enough for free-viewpoint video applications.

Fig. 7. Illustration of conflicting points. The two hollow points project to the same image pixel and are called conflicting points.

Fig. 8. Conflict cleaning algorithm.
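The four criteria and the fidelity-based resolution can be written down compactly. In the sketch below, points are dicts carrying the fields of s(x, ñ, v, f) from Section 5; project and the view's center field are assumed interfaces to the calibration data, and positions/normals are NumPy arrays:

    import numpy as np

    def is_conflicting(s, t, project, eps):
        """Test criteria 1-4 for a candidate conflicting pair."""
        same_pixel = any(                                      # criterion 1
            np.array_equal(project(v, s["x"]), project(v, t["x"]))
            for v in (s["v"], t["v"]))
        close = np.linalg.norm(s["x"] - t["x"]) < eps          # criterion 2
        diff_view = s["v"] is not t["v"]                       # criterion 3

        def outward(p, view):                                  # criterion 4
            c = view["center"] - p["x"]
            return p["n"] @ (c / np.linalg.norm(c)) > 0

        return (same_pixel and close and diff_view
                and outward(s, s["v"]) and outward(t, s["v"]))

    def survivor(s, t):
        """Keep the higher-fidelity member; the other is the noisy point."""
        return s if s["f"] >= t["f"] else t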

6 POINT CLOUD MESHING

In order to obtain a reliable and watertight mesh for surface rendering, the last part of our surface reconstruction pipeline generates the connectivity between the clean points. This point cloud meshing is derived from the Poisson surface reconstruction (PSR) [39] technique.

6.1 Fidelity-Based Poisson Surface Reconstruction

The available PSR algorithm is dedicated to meshing point clouds obtained from laser scanners. MVS point clouds carry additional information, such as the point fidelity value, that is beneficial for surface meshing. In this work, the point fidelity value (matching consistency) is fused into standard PSR to improve reconstruction accuracy. This is realized through the computation of a new vector for each point:

    \tilde{n}_f = s.\tilde{n} \cdot s.f.    (3)

ñ_f is simply the normal weighted by the fidelity. The normal s.ñ is substituted with ñ_f in the computation of the 3D vector field V in the standard PSR algorithm [39]; all the following steps in meshing are left untouched. This modified meshing algorithm encourages the reconstructed surface to pass through important points such as frontier points and high-fidelity points.
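Eq. (3) is a one-line operation on the merged cloud; a minimal sketch, assuming the normals are stored row-wise and the fidelity values in a matching 1D array:

    import numpy as np

    def fidelity_weighted_normals(normals, fidelity):
        """Eq. (3): scale each normal by its point's fidelity value.

        The result replaces s.n when splatting the vector field V in
        the Poisson solver; all later PSR steps are unchanged.
        """
        return normals * np.asarray(fidelity)[:, None]

High-fidelity points thus contribute stronger divergence constraints, pulling the implicit surface toward them.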
6.2 Surface-Constrained Remeshing

In extreme cases, downsampling and conflict cleaning may empty some surface regions or dramatically decrease the point density. The PSR algorithm is capable of producing a smooth and watertight mesh that fills such a region, but some of the reconstructed mesh vertices may lie outside the visual hull.

To rectify these vertices, the vertices outside the visual hull are projected onto the visual hull surface along their inverse normal directions. The projected hull points and the corresponding normals replace the original vertices and their normals, and the PSR algorithm is rerun on the rectified point cloud. Fig. 9 shows two cases in which the redundant parts of the reconstructed surfaces are removed. Our rectification yields reasonable reconstruction results.

Fig. 9. Space-constrained Poisson surface rectification. The red ellipses show the reconstruction results before rectification, while the green ones are the corresponding areas after remeshing.
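The rectification step can be sketched as follows; inside_hull and cast_to_hull are assumed helpers backed by the actual visual hull representation (e.g., a mesh or voxel grid), and the routine fixes only the control flow of Section 6.2:

    import numpy as np

    def rectify_vertices(verts, normals, inside_hull, cast_to_hull):
        """Project outside vertices back onto the visual hull surface.

        inside_hull(p) -> bool; cast_to_hull(p, d) -> (point, normal),
        casting a ray from p in direction d onto the hull surface.
        """
        out_v, out_n = verts.copy(), normals.copy()
        for i, (p, n) in enumerate(zip(verts, normals)):
            if not inside_hull(p):
                # march back along the inverse normal onto the hull
                out_v[i], out_n[i] = cast_to_hull(p, -n)
        return out_v, out_n   # fed back into PSR for the second pass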
7 EXPERIMENTAL RESULTS AND DISCUSSION

In this section, the performance of the proposed point-cloud-based multiview stereo (PCMVS) algorithm is extensively tested via comparison with prevalent MVS algorithms. These algorithms include the exact polyhedral visual hull (EPVH, a popular visual hull algorithm) [29], patch-based multiview stereo (PMVS, a high-performance MVS algorithm) [24], and the FVV algorithm proposed by Starck et al. (SurfCap, which achieves the highest performance on 8-view free-viewpoint video data sets) [35]. For our PCMVS algorithm, the same parameter set is used in all of our experiments. These parameters are listed in Table 2.

TABLE 2
Summary of PCMVS Parameters Used in the Following Experiments (V: Bounding Box Volume of the Visual Hull)

View-independent rendering is adopted for texture mapping. That is to say, each vertex of the reconstructed model is shaded with a color value from one camera pixel. This camera pixel is determined by examining which ray direction from a camera center to the vertex is most negatively correlated with the normal of the vertex. Such a rendering scheme is extremely simple, without any rendering optimization; therefore, satisfactory visual quality often implies an accurate geometry model.
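The per-vertex camera choice is a simple argmin over dot products. A minimal sketch, assuming the camera centers are stacked in an array (visibility testing, which a full renderer would add, is omitted):

    import numpy as np

    def pick_texture_camera(vertex, normal, cam_centers):
        """Return the index of the camera whose viewing ray to the
        vertex is most negatively correlated with the vertex normal."""
        rays = vertex - cam_centers                  # center -> vertex
        rays /= np.linalg.norm(rays, axis=1, keepdims=True)
        return int(np.argmin(rays @ normal))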

7.1 Comparison with Visual Hull

The proposed PCMVS is first compared with the visual hull (VH) using all of our 20 multiview images. The VH plays an important role in the whole reconstruction process. It initializes PCMVS with an approximate shape to start from and provides shape cues for the whole optimization process (such as the frontier points, the implicit points, and the space-constrained remeshing). As shown in Figs. 4, 6, and 9, without visual hull information, model completeness and shape correctness cannot be guaranteed.

However, the VH itself is far from satisfactory. Fig. 10 shows the comparison between VH and PCMVS on both mesh models and textured models. Although 20 views are enough for a good VH to approximate the real surface, concave and some occluded regions are hard to recover. Texture mapping can conceal some of these artifacts, but the rendering results are still limited compared with the corresponding results of PCMVS. The green rectangles in the images indicate the failure regions after texture mapping. For concavities where texture information is abundant or color contrast is strong, the mapping results are unacceptable. For example, the underarm area (warm color on the arm and cold color on the shirt) of the girl in the second row reveals obvious errors. Moreover, the visual defects are much more noticeable during interactive viewpoint control on the model. The relationship between silhouette cues and stereo cues is further analyzed in Section 7.3.

Fig. 10. Comparison of reconstruction and rendering results on VH and PCMVS using our own data sets. From left to right: one view of the input images, visual hull, texture result on the visual hull, PCMVS result, and texture mapping on the PCMVS model.

Fig. 11. Comparisons of reconstruction and rendering results of PMVS and PCMVS using our own data sets. First row: PMVS results; second row: PCMVS results. From (a) to (e), both include rendering results, point cloud shading, and reconstructed meshes. The last row shows some of our rendering details. (a) Upper: rendering of characters on the shirt; bottom: original characters. (b) and (c) Reconstructed mesh and rendering of ruffles. (d) Original image of the ruffles. (e) Reconstructed ribbon and its original image.
7.2 Comparison with PMVS

The PCMVS algorithm is further compared with the PMVS algorithm using Furukawa's open software [24]. Two of our captured 20-view data sets are adopted in this experiment. The first row of Fig. 11 shows PMVS results and the second row shows PCMVS results. Both include view-independent rendering, point cloud shading, and reconstructed mesh results.

PMVS is an excellent and well-known 3D reconstruction algorithm. It achieves the highest accuracy and completeness on nearly all the Middlebury data sets and many static multiview data sets. However, it relies on the accurate detection of feature patches for the propagation of the whole surface. When texture information is not abundant, as on the black trousers, PMVS cannot detect enough confident features and therefore fails in the propagation step for complete surface reconstruction. Moreover, PMVS considers point normals for precise calculation of the projection windows in stereo matching. This technique adds freedom for precise matching on some individually good patches. However, for surfaces with low-quality texture, such freedom may introduce random errors in both point normals and point positions, as illustrated in the shaded point clouds. Finally, without upgrading the PSR algorithm as we do (see Section 6), PMVS produces distorted models, as shown in the figure.

The second row of Fig. 11 shows our corresponding results. The point cloud shading results demonstrate their completeness and cleanness.

Another evaluation concerns modeling and rendering details. The bottom row of Fig. 11 shows rendering results of our algorithm on textures and ruffles. The results are clear and comparable to the originals. Fig. 12 shows two of the view-independent renderings for multiple continuous views of the faces. The proposed MVS algorithm obtains high-quality reconstruction on the whole body while still preserving good-looking results on the face using view-independent rendering.

Fig. 12. View-independent rendering of faces.

7.3 Reconstruction on Sparse Multiview Images

In this experiment, the reconstruction performance of PCMVS using different view numbers and different camera configurations is examined. For the 20 camera views, there exist two kinds of symmetric downsampling strategies to reduce the number of camera views to 10. Both strategies are illustrated in Fig. 13. Fig. 13b is the default configuration (denoted PCMVS-20). Fig. 13a is denoted PCMVS-10, which is evenly sampled from the PCMVS-20 mode. Fig. 13c is obtained by uniformly sampling the neighbored-binocular camera sets and is called PCMVS-BINO.

Because the sampling scheme of PCMVS-10 is more uniform than that of PCMVS-BINO, the visual hull quality of PCMVS-10 is better than that of PCMVS-BINO. However, the baseline between neighboring cameras in PCMVS-10 is too wide for accurate stereo matching, while PCMVS-BINO still keeps pairwise camera views for satisfactory stereo matching.

Fig. 13. Two kinds of view configurations using 10 cameras. (a) PCMVS-10. (b) PCMVS-20. (c) PCMVS-BINO.

Fig. 14 illustrates the reconstructed models using these two camera configurations under the PCMVS algorithm. The left two columns are the models reconstructed under the PCMVS-10 mode. Because of the weak matching consistency between neighboring views during point cloud detection, widespread noise and error points prevail in thin and small regions, which makes these regions hard to preserve in the results. The red ellipses mark the defective parts. In fact, the threshold parameter for visual hull substitution (described in Section 4.2) controls the degree of model completeness when the cameras are extremely sparse and stereo matching consistency is low. When this parameter is set high, the reconstructed model is similar to the visual hull; when it is set low, the detected points are scattered and the result is incomplete. In contrast, the reconstruction results of PCMVS-BINO are surprisingly pleasant, and their quality approximates that of PCMVS using all 20 views (refer to Figs. 10, 11, and 18), except for some regions on the body that become smaller.

Fig. 14. Reconstructed models using 10 cameras. (a) PCMVS-10. (b) PCMVS-BINO.

Fig. 15 compares the appearance rendering performance of the PCMVS-20 mode and the PCMVS-BINO mode. It is difficult to tell the difference between these two results without careful observation. Here, the blue ellipses mark the tiny flaws in the results of PCMVS-BINO.

In summary, PCMVS-BINO requires only half the number of cameras and achieves good visual quality, which demonstrates its high performance-cost ratio.

From the above experiment, it is interesting to arrive at the conclusion that for multistage-local-processing MVS algorithms, the binocular configuration is more suitable for 3D reconstruction when the camera array is sparse.
Conventionally, the filming of multiview images tends to be based on the uniform sampling mode for both standard MVS data sets [3] and free-viewpoint video data sets [1], [35], [28], [30], [31], [32], [2]. The advantages of uniform sampling lie in two main aspects. First, the visual hull can be better recovered when the views are evenly spaced. Second, texture information may comprehensively cover the whole object.

However, with the latest progress on multistage-local-processing MVS, performance is now dominated by the ability of local region recovery. Therefore, it is necessary to place neighboring views close enough to each other for accurate stereo matching. In contrast, the performance of "SurfCap" is determined by the visual hull and the global optimization technique (such as graph cuts), which requires the input views to be uniformly spaced. Though such a configuration is not suitable for our PCMVS algorithm, PCMVS is qualified for highly accurate reconstruction under extremely sparse views (10 views to cover 360 degrees).

Fig. 15. Texture mapping results. (a) Texture mapping on PCMVS-BINO models. (b) Texture mapping on PCMVS-20 models.

7.4 Reconstruction on Middlebury Data Sets

In the following, the performance of PCMVS on static MVS data sets is investigated. An experiment on the Middlebury data sets [44], dinoSparseRing (16 images) and templeSparseRing (16 images), is performed. Table 3 lists the accuracy (Acc) and completeness (Comp) of the final results with respect to the ground truth model. From the table, PMVS is still the best. However, compared with "SurfCap," the proposed method obtains better performance on both data sets, with sound accuracy and completeness. On DinoSparseRing, our performance approximates that of PMVS, and we gain better reconstruction quality there than on TempleSparseRing. Moreover, if the sparse data sets were captured in pairwise mode, the reconstruction performance would be further improved. Fig. 16 illustrates the PCMVS results and the corresponding input images.

TABLE 3
Results for the Middlebury Data Sets

Fig. 16. Reconstruction results on the Middlebury data sets (TempleSparseRing and DinoSparseRing). (a) Some of the input images. (b) The reconstructed results with our proposed algorithm.

Combining all these experiments, it can be seen that PMVS [24] is especially competitive for high-quality MVS data sets but does not guarantee good performance on traditional FVV data sets. On the other hand, "SurfCap" is much more suitable for wide-baseline FVV data sets but shows worse reconstruction accuracy on data sets with tricky shapes and high detail resolution. These results further verify the comparison in Table 1 between multistage-local-processing MVS and global optimization MVS (PMVS stands for multistage-local-processing MVS and "SurfCap" stands for global optimization MVS).

In summary, the proposed algorithm shows competitive performance on all these data sets, and it demonstrates that the multistage local optimization mechanism is also feasible and promising for free-viewpoint video.
7.5 Reconstruction on Motion Sequences

In this part, we further show reconstruction results on dynamic multiview sequences. In our work, each temporal model is constructed independently, and all the models are then combined to form a free-viewpoint video. Fig. 17 illustrates some of the temporally successive models we obtained; these results show that PCMVS is able to obtain stable reconstructions for motion MVS data sets. However, since temporal information is not utilized, it is still impossible for PCMVS to achieve topology-preserving reconstruction.

Fig. 17. Reconstruction results on motion sequences.

State-of-the-art multiview systems for human performance capture [1], [2], [43] take advantage of temporal information to deform or track key models for topology-preserving reconstruction. However, as-accurate-as-possible 3D reconstruction is still indispensable in these systems. Our work provides a robust and accurate substitute for the laser scanner in the job of key model production. Moreover, multiview stereo information is not sufficiently exploited in these systems, which may cause long-term tracking results to deviate from the ground truth. The combination of our algorithm with temporal deformation techniques to achieve spatiotemporal reconstruction is promising for future FVV systems.

7.6 Robustness

To demonstrate the robustness of the presented MVS algorithm, reconstruction experiments on extensive multiview data sets are performed. Fig. 18 shows reconstruction models and rendering results under different clothing and poses, as well as some other 3D objects.

Fig. 18. Modeling and rendering results for extensive kinds of motion and clothing. The blue models are reconstructed meshes, and the accompanying color images are the view-independent rendering results.
The challenges of these reconstructions (from top to bottom, from left to right) are:

1. the wrinkled dress;
2. the complex pose;
3. the black trousers and the inward-tilted breast;
4. the outward sleeve and pocket;
5. black trousers and ruffles on the shirt;
6. high-speed motion;
7. the horizontally placed thin palm;
8. multiple objects together and a mono-color tablecloth; and
9. the non-Lambertian bronze statue.

Using 20 camera views and the same parameter set, PCMVS demonstrates its competence on all these data sets.

7.7 Complexity

The most time-consuming part of the proposed PCMVS is the point cloud detection module, whose complexity increases linearly with the number of pixels in all the images. For each image pixel, it requires traversing all the feasible 3D points on the corresponding ray for consistency calculations. This step costs about 70 percent of the whole reconstruction time. All the other modules, such as point cloud cleaning in each view, merging and filtering of the whole point cloud, and the modified Poisson surface reconstruction, are efficient. Without performance optimization, it takes about 10-15 minutes to reconstruct a single model using all the 20-view images.

7.8 Limitations

Silhouette information plays an important role in the PCMVS algorithm for robust reconstruction when stereo matching fails. Even using state-of-the-art chroma-keying techniques, the extracted silhouettes are still coarse and temporally inconsistent without careful manipulation. This leads to degraded reconstruction accuracy and a jittered motion effect in the final free-viewpoint video. Another limitation of our work is the high complexity of the point cloud detection module, which hampers efficient and widespread application. Finally, because of the distortion of the projection window in stereo matching, the detected point clouds are still not accurate enough for static object reconstruction.

8 CONCLUSION AND FUTURE WORKS

In this paper, we first review the differences between global optimization MVS and multistage-local-processing MVS. The former cannot guarantee comprehensive accuracy on all the surface regions under a single optimization parameter, while the latter are not suitable for FVV because of their low completeness and robustness.

To overcome the above limitations, a point-cloud-based MVS algorithm belonging to the multistage-local-processing class is proposed for accurate and robust free-viewpoint video. The idea is inspired by the traditional point cloud scanning and reconstruction philosophy. To guarantee reconstruction accuracy, point clouds are first extracted according to a stereo matching metric which is robust to noise, occlusion, and lack of texture. Visual hull information, frontier points, and implicit points are detected and fused in all the reconstruction modules. New techniques used in this work include noise-, occlusion-, and texture-robust stereo matching, per-view point cloud error cleaning, conflicting point removal, fidelity-based Poisson surface reconstruction, and space-constrained remeshing.

We compare PCMVS with several popular MVS schemes, such as EPVH, PMVS, and "SurfCap," using both the Middlebury data sets and our FVV data sets. Experimental results demonstrate our comprehensive performance among these algorithms. Moreover, reconstructions of non-Lambertian objects, motion sequences, and all kinds of poses and clothes verify the robustness of our proposed MVS algorithm.

We also compare the performance of PCMVS using various camera numbers and different camera configurations. The results reveal that a pairwise camera setting can improve reconstruction accuracy for multistage-local-processing MVS. Moreover, PCMVS can realize high-performance reconstruction using only 10 cameras under the pairwise camera setting mode.

Future work may concentrate on speed-optimized stereo matching using binocular image rectification and parallelized optimization over all the image pixels. Moreover, temporally coherent silhouette extraction can be introduced to improve the consistency of multiple visual hulls. Finally, it is worth combining PCMVS with the latest mesh tracking algorithms for highly accurate and temporally consistent (topology-preserving) free-viewpoint video.

ACKNOWLEDGMENTS

The authors would like to thank Bennett Wilburn of Microsoft Research Asia for valuable suggestions on multicamera system construction, and S. Seitz, B. Curless, J. Diebel, D. Scharstein, and R. Szeliski for the temple and dino data sets and evaluations. This work is supported by the National Basic Research Project of China (973 Program), No. 2010CB731800, the Distinguished Young Scholars of NSFC, No. 60721003, and the National High Technology Research and Development Program of China (863 Program), No. 2009AA01Z329.

REFERENCES

[1] E.D. Aguiar, C. Stoll, C. Theobalt, N. Ahmed, H.P. Seidel, and S. Thrun, "Performance Capture from Sparse Multi-View Video," Proc. ACM SIGGRAPH '08, vol. 27, no. 3, pp. 98:1-98:10, 2008.
[2] D. Vlasic, I. Baran, W. Matusik, and J. Popovic, "Articulated Mesh Animation from Multi-View Silhouettes," Proc. ACM SIGGRAPH '08, vol. 27, no. 1, pp. 97:1-97:9, 2008.
[3] S.M. Seitz, B. Curless, J. Diebel, D. Scharstein, and R. Szeliski, "A Comparison and Evaluation of Multi-View Stereo Reconstruction Algorithms," Proc. IEEE Int'l Conf. Computer Vision and Pattern Recognition (CVPR '06), pp. 519-528, June 2006.
[4] S. Roy and I. Cox, "A Maximum Flow Formulation of the N-Camera Stereo Correspondence Problem," Proc. IEEE Int'l Conf. Computer Vision (ICCV '98), pp. 492-499, Jan. 1998.
[5] V. Kolmogorov and R. Zabih, "Multi-Camera Scene Reconstruction via Graph Cuts," Proc. European Conf. Computer Vision (ECCV '02), pp. 82-96, May 2002.
[6] G. Vogiatzis, P. Torr, and R. Cipolla, "Multi-View Stereo via Volumetric Graph-Cuts," Proc. IEEE Int'l Conf. Computer Vision and Pattern Recognition (CVPR '05), pp. 391-398, 2005.
[7] S. Tran and L. Davis, "3D Surface Reconstruction Using Graph Cuts with Surface Constraints," Proc. European Conf. Computer Vision (ECCV '06), May 2006.
[8] S.N. Sinha, P. Mordohai, and M. Pollefeys, "Multi-View Stereo via Graph Cuts on the Dual of an Adaptive Tetrahedral Mesh," Proc. IEEE Int'l Conf. Computer Vision (ICCV '07), Oct. 2007.
[9] G. Vogiatzis, C.H. Esteban, P.H.S. Torr, and R. Cipolla, "Multi-View Stereo via Volumetric Graph-Cuts and Occlusion Robust Photo-Consistency," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 29, no. 12, pp. 2241-2246, Dec. 2007.
[10] K. Kutulakos and S. Seitz, "A Theory of Shape by Space Carving," Int'l J. Computer Vision, vol. 38, no. 3, pp. 199-218, 2000.
[11] G. Slabaugh, B. Culbertson, T. Malzbender, and M. Stevens, "Methods for Volumetric Reconstruction of Visual Scenes," Int'l J. Computer Vision, vol. 57, no. 3, pp. 179-199, 2004.
[12] C.H. Esteban and F. Schmitt, "Silhouette and Stereo Fusion for 3D Object Modeling," Computer Vision and Image Understanding, vol. 96, no. 3, pp. 367-392, 2004.
[13] G. Zeng, S. Paris, L. Quan, and F. Sillion, "Progressive Surface Reconstruction from Images Using a Local Prior," Proc. IEEE Int'l Conf. Computer Vision (ICCV '05), pp. 1230-1237, Oct. 2005.
[14] J.-P. Pons, R. Keriven, and O. Faugeras, "Modelling Dynamic Scenes by Registering Multi-View Image Sequences," Proc. IEEE Int'l Conf. Computer Vision and Pattern Recognition (CVPR '05), June 2005.
[15] A. Zaharescu, E. Boyer, and R. Horaud, "Transformesh: A Topology-Adaptive Mesh-Based Approach to Surface Evolution," Proc. Asian Conf. Computer Vision (ACCV '07), Nov. 2007.
[16] Y. Liu, Q. Dai, and W. Xu, "Continuous Depth Estimation for Multi-View Stereo," Proc. IEEE Int'l Conf. Computer Vision and Pattern Recognition (CVPR '09), June 2009.
[17] P. Gargallo and P. Sturm, "Bayesian 3D Modeling from Images Using Multiple Depth Maps," Proc. IEEE Int'l Conf. Computer Vision and Pattern Recognition (CVPR '05), pp. 885-891, June 2005.
[18] C. Strecha, R. Fransens, and L.V. Gool, "Combined Depth and Outlier Estimation in Multi-View Stereo," Proc. IEEE Int'l Conf. Computer Vision and Pattern Recognition (CVPR '06), June 2006.
[19] M. Goesele, B. Curless, and S.M. Seitz, "Multi-View Stereo Revisited," Proc. IEEE Int'l Conf. Computer Vision and Pattern Recognition (CVPR '06), June 2006.
[20] P. Merrell, A. Akbarzadeh, L. Wang, P. Mordohai, J.-M. Frahm, R. Yang, D. Nister, and M. Pollefeys, "Real-Time Visibility-Based Fusion of Depth Maps," Proc. IEEE Int'l Conf. Computer Vision (ICCV '07), Oct. 2007.
[21] C. Zach, T. Pock, and H. Bischof, "A Globally Optimal Algorithm for Robust TV-L1 Range Image Integration," Proc. IEEE Int'l Conf. Computer Vision (ICCV '07), Oct. 2007.
[22] D. Bradley, T. Boubekeur, and W. Heidrich, "Accurate Multi-View Reconstruction Using Robust Binocular Stereo and Surface Meshing," Proc. IEEE Int'l Conf. Computer Vision and Pattern Recognition (CVPR '08), June 2008.
[23] A. Manessis, A. Hilton, P. Palmer, P. McLauchlan, and X. Shen, "Reconstruction of Scene Models from Sparse 3D Structure," Proc. IEEE Int'l Conf. Computer Vision and Pattern Recognition (CVPR '00), pp. 666-673, June 2000.
[24] Y. Furukawa and J. Ponce, "Accurate, Dense, and Robust Multiview Stereopsis," Proc. IEEE Int'l Conf. Computer Vision and Pattern Recognition (CVPR '07), June 2007.
[25] M. Goesele, N. Snavely, B. Curless, H. Hoppe, and S.M. Seitz, "Multi-View Stereo for Community Photo Collections," Proc. IEEE Int'l Conf. Computer Vision (ICCV '07), Oct. 2007.
[26] M. Habbecke and L. Kobbelt, "A Surface-Growing Approach to Multi-View Stereo Reconstruction," Proc. IEEE Int'l Conf. Computer Vision and Pattern Recognition (CVPR '07), June 2007.
[27] P. Labatut, J.-P. Pons, and R. Keriven, "Efficient Multi-View Reconstruction of Large-Scale Scenes Using Interest Points, Delaunay Triangulation and Graph Cuts," Proc. IEEE Int'l Conf. Computer Vision (ICCV '07), Oct. 2007.
[28] T. Kanade, P. Rander, and P. Narayanan, "Virtualized Reality: Constructing Virtual Worlds from Real Scenes," IEEE Multimedia, vol. 4, no. 1, pp. 34-47, Jan.-Mar. 1997.
[29] J.-S. Franco, M. Lapierre, and E. Boyer, "Visual Shapes of Silhouette Sets," Proc. Int'l Symp. 3D Data Processing, Visualization and Transmission (3DPVT '06), pp. 397-404, 2006.
[30] K. Tomiyama, Y. Orihara, M. Katayama, and Y. Iwadate, "Algorithm for Dynamic 3D Object Generation from Multi-Viewpoint Images," Proc. SPIE '04, pp. 153-161, 2004.
[31] T. Matsuyama, X. Wu, T. Takai, and S. Nobuhara, "Real-Time 3D Shape Reconstruction, Dynamic 3D Mesh Deformation, and High Fidelity Visualization for 3D Video," Computer Vision and Image Understanding, vol. 96, no. 3, pp. 393-434, 2004.
[32] J. Starck and A. Hilton, "Virtual View Synthesis of People from Multiple View Video Sequences," Graphical Models, vol. 67, no. 6, pp. 600-620, 2005.
[33] B. Goldluecke and M. Magnor, "Space-Time Isosurface Evolution for Temporally Coherent 3D Reconstruction," Proc. IEEE Int'l Conf. Computer Vision and Pattern Recognition (CVPR '04), pp. 350-355, June 2004.
[34] J. Starck, G. Miller, and A. Hilton, "Volumetric Stereo with Silhouette and Feature Constraints," Proc. British Machine Vision Conf. (BMVC '06), Sept. 2006.
[35] J. Starck and A. Hilton, "Surface Capture for Performance-Based Animation," IEEE Computer Graphics and Applications, vol. 27, no. 3, pp. 21-31, July 2007.
[36] M. Alexa, J. Behr, D. Cohen-Or, S. Fleishman, D. Levin, and C.T. Silva, "Point Set Surfaces," Proc. IEEE Conf. Visualization, pp. 21-28, 2001.
[37] O. Schall, A. Belyaev, and H.-P. Seidel, "Robust Filtering of Noisy Scattered Point Data," Proc. IEEE Symp. Point-Based Graphics (SPG '05), pp. 71-77, 2005.
[38] Y. Ohtake, A. Belyaev, M. Alexa, G. Turk, and H.-P. Seidel, "Multi-Level Partition of Unity Implicits," ACM Trans. Graphics, vol. 22, pp. 463-470, 2003.
[39] M. Kazhdan, M. Bolitho, and H. Hoppe, "Poisson Surface Reconstruction," Proc. Fourth Eurographics Symp. Geometry Processing (SGP '06), June 2006.
[40] N.D.F. Campbell, G. Vogiatzis, C. Hernandez, and R. Cipolla, "Using Multiple Hypotheses to Improve Depth-Maps for Multi-View Stereo," Proc. European Conf. Computer Vision (ECCV '08), Oct. 2008.
[41] P. Merrell, A. Akbarzadeh, L. Wang, P. Mordohai, J.-M. Frahm, R. Yang, D. Nister, and M. Pollefeys, "Real-Time Visibility Based Fusion of Depth Maps," Proc. IEEE Int'l Conf. Computer Vision (ICCV '07), Oct. 2007.
[42] Y. Furukawa and J. Ponce, "Carved Visual Hulls for Image-Based Modeling," Int'l J. Computer Vision, vol. 81, no. 1, pp. 53-67, Mar. 2009.
[43] D. Bradley, T. Popa, A. Sheffer, W. Heidrich, and T. Boubekeur, "Markerless Garment Capture," Proc. ACM SIGGRAPH '08, vol. 27, no. 3, pp. 99-106, 2008.
[44] Mview, http://vision.middlebury.edu/mview/, 2009.

Yebin Liu received the BE degree from Beijing University of Posts and Telecommunications, P.R. China, in 2002, and the PhD degree from the Automation Department, Tsinghua University, Beijing, P.R. China, in 2009. He is currently a postdoctoral research fellow in the Automation Department, Tsinghua University, Beijing, P.R. China. His research interests include light field, image-based modeling and rendering, and multicamera array techniques.

Qionghai Dai received the BS degree in mathematics from Shanxi Normal University, P.R. China, in 1987, and the ME and PhD degrees in computer science and automation from Northeastern University, P.R. China, in 1994 and 1996, respectively. Since 1997, he has been with the faculty of Tsinghua University, Beijing, P.R. China, and is currently a professor and the director of the Broadband Networks and Digital Media Laboratory. His research areas include video communication, computer vision, and graphics. He is a senior member of the IEEE.

Wenli Xu received the BS degree in electrical engineering and the ME degree in automatic control engineering from Tsinghua University, Beijing, P.R. China, in 1970 and 1980, respectively, and the PhD degree in electrical and computer engineering from the University of Colorado, Boulder, in 1990. He is currently a professor at Tsinghua University and the director of the Chinese Association of Automation. His research interests are mainly in the areas of automatic control and computer vision.