A Point-Cloud-Based Multiview Stereo Algorithm For Free-Viewpoint Video
Abstract—This paper presents a robust multiview stereo (MVS) algorithm for free-viewpoint video. Our MVS scheme is entirely point-cloud-based and consists of three stages: point cloud extraction, merging, and meshing. To guarantee reconstruction accuracy, point clouds are first extracted according to a stereo matching metric that is robust to noise, occlusion, and lack of texture. Visual hull information, frontier points, and implicit points are then detected and fused with point fidelity information in the merging and meshing steps. All aspects of our method are designed to counteract potential challenges in MVS data sets for accurate and complete model reconstruction. Experimental results demonstrate that our technique achieves highly competitive performance among current algorithms under sparse viewpoint setups on both static and motion MVS data sets.
1 INTRODUCTION
IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, VOL. 16, NO. 3, MAY/JUNE 2010
Fig. 7. Illustration of conflicting points. The two hollow points project to the same image pixel and are called conflicting points.

Fig. 9. Space-constrained Poisson surface rectification. The red ellipses show the reconstruction results before rectification, while the green ones show the corresponding areas after remeshing.
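The conflicting-point test illustrated in Fig. 7 can be sketched in a few lines. This is a minimal sketch with hypothetical names and a simple pinhole camera model, not code from the paper: candidate points are projected into one (possibly virtual) view, and any two points landing on the same pixel form a conflicting pair.

```python
import numpy as np
from collections import defaultdict

def find_conflicting_pairs(points, K, R, t, width, height):
    """Flag pairs of 3D points that project to the same pixel of one
    (possibly virtual) camera view -- the situation shown in Fig. 7.

    points: (N, 3) array of candidate surface points
    K:      (3, 3) camera intrinsics; R, t: world-to-camera rotation/translation
    """
    cam = (R @ points.T + t.reshape(3, 1)).T     # points in the camera frame
    buckets = defaultdict(list)                  # pixel -> indices of points there
    for i, c in enumerate(cam):
        if c[2] <= 0:                            # skip points behind the camera
            continue
        u, v = np.round((K @ c)[:2] / c[2]).astype(int)
        if 0 <= u < width and 0 <= v < height:
            buckets[(u, v)].append(i)
    # every pair of points sharing a pixel is a conflicting pair
    return [(a, b) for idxs in buckets.values() if len(idxs) > 1
            for a, b in zip(idxs, idxs[1:])]
```

In the paper's cleaning step, a point in such a pair is removed once it occludes a potentially valid point; the per-pixel bucketing above is only one straightforward way to enumerate the pairs.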
removed once it occludes any potential valid point. Camera view C is allowed to be a virtual view to increase the number of conflicting point pairs. About 50 virtual views, evenly spaced on the spherical surface defined by the 20 capture cameras, are adopted in this work, resulting in a removal rate of about 30 percent. After the downsampling and conflict cleaning steps, the resulting point cloud is clean enough for free-viewpoint video applications.

n_f = s.n · s.f.                         (3)

That is, n_f is the point normal weighted by the fidelity. The normal s.n is substituted with n_f in the computation of the 3D vector field V in the standard PSR algorithm [39]; all the following meshing steps are left untouched. This modified meshing algorithm enables the reconstructed surface to pass through important points such as frontier points and high-fidelity points.
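In code, Eq. (3) amounts to scaling each point normal by its fidelity before the normals enter the vector field V of standard Poisson surface reconstruction [39]. A minimal sketch follows; the function name is hypothetical and the PSR step itself is not reimplemented here:

```python
import numpy as np

def fidelity_weighted_normals(normals, fidelity):
    """Eq. (3): n_f = s.n * s.f for every point s.

    normals:  (N, 3) array of unit point normals s.n
    fidelity: (N,)   array of per-point fidelity weights s.f
    Returns the weighted normals n_f that replace s.n when building
    the 3D vector field V in standard Poisson surface reconstruction.
    """
    return normals * fidelity[:, None]

# High-fidelity points keep (near) full-length normals and therefore pull
# the reconstructed surface toward themselves; low-fidelity points
# contribute proportionally less to V.
```

Frontier points and other high-fidelity points carry weights near 1, which is how the modified field biases the surface to pass through them.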
TABLE 2
Summary of PCMVS Parameters Used in the Following Experiments (V: Bounding Box Volume of the Visual Hull)
TABLE 3
Results for the Middlebury Data Sets
Fig. 18. Modeling and rendering results for a wide variety of motions and clothing. The blue images show the reconstructed meshes, and the accompanying color images are the view-independent rendering results.
are combined to form a free-viewpoint video. Fig. 17 illustrates some of the temporally successive models we obtained; these results show that PCMVS can produce stable reconstructions for motion MVS data sets. However, since temporal information has not been utilized, it is still impossible for PCMVS to achieve topology-preserving reconstruction.

State-of-the-art multiview systems for human performance capture [1], [2], [43] take advantage of temporal information to deform or track key models for topology-preserving reconstruction. However, as-accurate-as-possible 3D reconstruction is still indispensable in these systems, and our work provides a robust and accurate substitute for a laser scanner in producing the key models. Moreover, multiview stereo information is not sufficiently exploited in these systems, which may cause long-term tracking results to deviate from the ground truth. Combining our algorithm with temporal deformation techniques to achieve spatiotemporal reconstruction is therefore promising for future FVV systems.

7.6 Robustness

To demonstrate the robustness of the presented MVS algorithm, we performed reconstruction experiments on an extensive collection of multiview data sets. Fig. 18 shows reconstructed models and rendering results under different clothing and poses, as well as some other 3D objects. The challenges of these reconstructions (from top to bottom, left to right) are:
1. the wrinkled dress;
2. the complex pose;
3. the black trousers and the inward-tilted chest;
4. the outward sleeve and pocket;
5. the black trousers and the ruffles on the shirt;
6. high-speed motion;
7. the horizontally placed thin palm;
8. multiple objects together and the mono-color tablecloth; and
9. the non-Lambertian bronze statue.

Using 20 camera views and the same parameter set throughout, PCMVS shows its competence on all these data sets.

7.7 Complexity

The most time-consuming part of the proposed PCMVS is the point cloud detection module, whose complexity increases linearly with the number of pixels in all the images. For each image pixel, it requires traversing all the feasible 3D points on the corresponding ray for consistency calculations. This step costs about 70 percent of the whole reconstruction time. All other modules, such as point cloud cleaning in each view, merging and filtering of the whole point cloud, and modified Poisson surface reconstruction, are efficient. Without performance optimization, it takes about 10-15 minutes to reconstruct a single model using all the 20-view images.

7.8 Limitations

Silhouette information plays an important role in the PCMVS algorithm for robust reconstruction when stereo matching fails. Even with state-of-the-art chroma-keying techniques, the extracted silhouettes are still coarse and temporally inconsistent without careful manipulation. This leads to degraded reconstruction accuracy and jittered motion in the final free-viewpoint video. Another limitation of our work is the high complexity of the point cloud detection module, which hampers efficient and widespread application. Finally, because of the distortion of the projection window in stereo matching, the detected point clouds are still not accurate enough for static object reconstruction.

8 CONCLUSION AND FUTURE WORKS

In this paper, we first review the differences between global-optimization MVS and multistage-local-processing MVS. The former cannot guarantee comprehensive accuracy over all surface regions under a single optimization parameter, while the latter is not suitable for FVV because of its low completeness and robustness.

To overcome the above limitations, a point-cloud-based MVS algorithm belonging to the multistage-local-processing category is proposed for accurate and robust free-viewpoint video. The idea is inspired by the traditional point cloud scanning and reconstruction philosophy. To guarantee reconstruction accuracy, point clouds are first extracted according to a stereo matching metric that is robust to noise, occlusion, and lack of texture. Visual hull information, frontier points, and implicit points are detected and fused in all the reconstruction modules. New techniques used in this work include noise-, occlusion-, and texture-robust stereo matching, individual point cloud error cleaning, conflicting point removal, fidelity-based Poisson surface reconstruction, and space-constrained remeshing.

We compare PCMVS with several popular MVS schemes such as EPVH, PMVS, and "SurfCap" using both the Middlebury data sets and our FVV data sets. Experimental results demonstrate our comprehensive performance among these algorithms. Moreover, reconstructions of non-Lambertian objects, motion sequences, and a wide range of poses and clothing verify the robustness of the proposed MVS algorithm.

We also compare the performance of PCMVS under various camera numbers and different camera configurations. The results reveal that a pair-wise camera setting can improve reconstruction accuracy for multistage-local-processing MVS. Moreover, PCMVS can achieve high-performance reconstruction using only 10 cameras in pair-wise camera setting mode.

Future work may concentrate on speed-optimized stereo matching using binocular image rectification and parallelized optimization over all image pixels. Moreover, temporally coherent silhouette extraction can be introduced to improve the consistency of multiple visual hulls. Finally, it is worth combining PCMVS with the latest mesh tracking algorithms for highly accurate and temporally consistent (topology-preserving) free-viewpoint video.

ACKNOWLEDGMENTS

The authors would like to thank Bennett Wilburn of Microsoft Research Asia for his valuable suggestions on multicamera system construction, and S. Seitz, B. Curless, J. Diebel, D. Scharstein, and R. Szeliski for the temple and dino data sets and evaluations. This work was supported by the National Basic Research Project of China (973 Program), No. 2010CB731800, the Distinguished Young Scholars of NSFC, No. 60721003, and the National High Technology Research and Development Program of China (863 Program), No. 2009AA01Z329.

REFERENCES

[1] E.D. Aguiar, C. Stoll, C. Theobalt, N. Ahmed, H.P. Seidel, and S. Thrun, "Performance Capture from Sparse Multi-View Video," Proc. ACM SIGGRAPH '08, vol. 27, no. 3, pp. 98:1-98:10, 2008.
[2] D. Vlasic, I. Baran, W. Matusik, and J. Popovic, "Articulated Mesh Animation from Multi-View Silhouettes," Proc. ACM SIGGRAPH '08, vol. 27, no. 3, pp. 97:1-97:9, 2008.
[3] S.M. Seitz, B. Curless, J. Diebel, D. Scharstein, and R. Szeliski, "A Comparison and Evaluation of Multi-View Stereo Reconstruction Algorithms," Proc. IEEE Int'l Conf. Computer Vision and Pattern Recognition (CVPR '06), pp. 519-528, June 2006.
[4] S. Roy and I. Cox, "A Maximum Flow Formulation of the N-Camera Stereo Correspondence Problem," Proc. IEEE Int'l Conf. Computer Vision (ICCV '98), pp. 492-499, Jan. 1998.
[5] V. Kolmogorov and R. Zabih, "Multi-Camera Scene Reconstruction via Graph Cuts," Proc. European Conf. Computer Vision (ECCV '02), pp. 82-96, May 2002.
[6] G. Vogiatzis, P. Torr, and R. Cipolla, "Multi-View Stereo via Volumetric Graph-Cuts," Proc. IEEE Int'l Conf. Computer Vision and Pattern Recognition (CVPR '05), pp. 391-398, 2005.
[7] S. Tran and L. Davis, "3D Surface Reconstruction Using Graph Cuts with Surface Constraints," Proc. European Conf. Computer Vision (ECCV '06), May 2006.
[8] S.N. Sinha, P. Mordohai, and M. Pollefeys, "Multi-View Stereo via Graph Cuts on the Dual of an Adaptive Tetrahedral Mesh," Proc. IEEE Int'l Conf. Computer Vision (ICCV '07), Oct. 2007.
[9] G. Vogiatzis, C.H. Esteban, P.H.S. Torr, and R. Cipolla, "Multi-View Stereo via Volumetric Graph-Cuts and Occlusion Robust Photo-Consistency," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 29, no. 12, pp. 2241-2246, Dec. 2007.
[10] K. Kutulakos and S. Seitz, "A Theory of Shape by Space Carving," Int'l J. Computer Vision, vol. 38, no. 3, pp. 199-218, 2000.
[11] G. Slabaugh, B. Culbertson, T. Malzbender, and M. Stevens, "Methods for Volumetric Reconstruction of Visual Scenes," Int'l J. Computer Vision, vol. 57, no. 3, pp. 179-199, 2004.
[12] C.H. Esteban and F. Schmitt, "Silhouette and Stereo Fusion for 3D Object Modeling," Computer Vision and Image Understanding, vol. 96, no. 3, pp. 367-392, 2004.
[13] G. Zeng, S. Paris, L. Quan, and F. Sillion, "Progressive Surface Reconstruction from Images Using a Local Prior," Proc. IEEE Int'l Conf. Computer Vision (ICCV '05), pp. 1230-1237, Oct. 2005.
[14] J.-P. Pons, R. Keriven, and O. Faugeras, "Modelling Dynamic Scenes by Registering Multi-View Image Sequences," Proc. IEEE Int'l Conf. Computer Vision and Pattern Recognition (CVPR '05), June 2005.
[15] A. Zaharescu, E. Boyer, and R. Horaud, "Transformesh: A Topology-Adaptive Mesh-Based Approach to Surface Evolution," Proc. Asian Conf. Computer Vision (ACCV '07), Nov. 2007.
[16] Y. Liu, Q. Dai, and W. Xu, "Continuous Depth Estimation for Multi-View Stereo," Proc. IEEE Int'l Conf. Computer Vision and Pattern Recognition (CVPR '09), June 2009.
[17] P. Gargallo and P. Sturm, "Bayesian 3D Modeling from Images Using Multiple Depth Maps," Proc. IEEE Int'l Conf. Computer Vision and Pattern Recognition (CVPR '05), pp. 885-891, June 2005.
[18] C. Strecha, R. Fransens, and L.V. Gool, "Combined Depth and Outlier Estimation in Multi-View Stereo," Proc. IEEE Int'l Conf. Computer Vision and Pattern Recognition (CVPR '06), June 2006.
[19] M. Goesele, B. Curless, and S.M. Seitz, "Multi-View Stereo Revisited," Proc. IEEE Int'l Conf. Computer Vision and Pattern Recognition (CVPR '06), June 2006.
[20] P. Merrell, A. Akbarzadeh, L. Wang, P. Mordohai, J.-M. Frahm, R. Yang, D. Nister, and M. Pollefeys, "Real-Time Visibility-Based Fusion of Depth Maps," Proc. IEEE Int'l Conf. Computer Vision (ICCV '07), Oct. 2007.
[21] C. Zach, T. Pock, and H. Bischof, "A Globally Optimal Algorithm for Robust TV-L1 Range Image Integration," Proc. IEEE Int'l Conf. Computer Vision (ICCV '07), Oct. 2007.
[22] D. Bradley, T. Boubekeur, and W. Heidrich, "Accurate Multi-View Reconstruction Using Robust Binocular Stereo and Surface Meshing," Proc. IEEE Int'l Conf. Computer Vision and Pattern Recognition (CVPR '08), June 2008.
[23] A. Manessis, A. Hilton, P. Palmer, P. McLauchlan, and X. Shen, "Reconstruction of Scene Models from Sparse 3D Structure," Proc. IEEE Int'l Conf. Computer Vision and Pattern Recognition (CVPR '00), pp. 666-673, June 2000.
[24] Y. Furukawa and J. Ponce, "Accurate, Dense, and Robust Multiview Stereopsis," Proc. IEEE Int'l Conf. Computer Vision and Pattern Recognition (CVPR '07), June 2007.
[25] M. Goesele, N. Snavely, B. Curless, H. Hoppe, and S.M. Seitz, "Multi-View Stereo for Community Photo Collections," Proc. IEEE Int'l Conf. Computer Vision (ICCV '07), Oct. 2007.
[26] M. Habbecke and L. Kobbelt, "A Surface-Growing Approach to Multi-View Stereo Reconstruction," Proc. IEEE Int'l Conf. Computer Vision and Pattern Recognition (CVPR '07), June 2007.
[27] P. Labatut, J.-P. Pons, and R. Keriven, "Efficient Multi-View Reconstruction of Large-Scale Scenes Using Interest Points, Delaunay Triangulation and Graph Cuts," Proc. IEEE Int'l Conf. Computer Vision (ICCV '07), Oct. 2007.
[28] T. Kanade, P. Rander, and P. Narayanan, "Virtualized Reality: Constructing Virtual Worlds from Real Scenes," IEEE Trans. Multimedia, vol. 4, no. 1, pp. 34-47, Jan.-Mar. 1997.
[29] J.-S. Franco, M. Lapierre, and E. Boyer, "Visual Shapes of Silhouette Sets," Proc. Int'l Symp. 3D Data Processing, Visualization and Transmission (3DPVT '06), pp. 397-404, 2006.
[30] K. Tomiyama, Y. Orihara, M. Katayama, and Y. Iwadate, "Algorithm for Dynamic 3D Object Generation from Multi-Viewpoint Images," Proc. SPIE '04, pp. 153-161, 2004.
[31] T. Matsuyama, X. Wu, T. Takai, and S. Nobuhara, "Real-Time 3D Shape Reconstruction, Dynamic 3D Mesh Deformation, and High Fidelity Visualization for 3D Video," Computer Vision and Image Understanding, vol. 96, no. 3, pp. 393-434, 2004.
[32] J. Starck and A. Hilton, "Virtual View Synthesis of People from Multiple View Video Sequences," Graphical Models, vol. 67, no. 6, pp. 600-620, 2005.
[33] B. Goldluecke and M. Magnor, "Space-Time Isosurface Evolution for Temporally Coherent 3D Reconstruction," Proc. IEEE Int'l Conf. Computer Vision and Pattern Recognition (CVPR '04), pp. 350-355, June 2004.
[34] J. Starck, G. Miller, and A. Hilton, "Volumetric Stereo with Silhouette and Feature Constraints," Proc. British Machine Vision Conf. (BMVC '06), Sept. 2006.
[35] J. Starck and A. Hilton, "Surface Capture for Performance-Based Animation," IEEE Computer Graphics and Applications, vol. 27, no. 3, pp. 21-31, 2007.
[36] M. Alexa, J. Behr, D. Cohen-Or, S. Fleishman, D. Levin, and C.T. Silva, "Point Set Surfaces," Proc. IEEE Conf. Visualization, pp. 21-28, 2001.
[37] O. Schall, A. Belyaev, and H.-P. Seidel, "Robust Filtering of Noisy Scattered Point Data," Proc. IEEE Symp. Point-Based Graphics (SPG '05), pp. 71-77, 2005.
[38] Y. Ohtake, A. Belyaev, M. Alexa, G. Turk, and H.-P. Seidel, "Multi-Level Partition of Unity Implicits," ACM Trans. Graphics, vol. 22, pp. 463-470, 2003.
[39] M. Kazhdan, M. Bolitho, and H. Hoppe, "Poisson Surface Reconstruction," Proc. Fourth Eurographics Symp. Geometry Processing (SGP '06), June 2006.
[40] N.D.F. Campbell, G. Vogiatzis, C. Hernandez, and R. Cipolla, "Using Multiple Hypotheses to Improve Depth-Maps for Multi-View Stereo," Proc. European Conf. Computer Vision (ECCV '08), Oct. 2008.
[41] P. Merrell, A. Akbarzadeh, L. Wang, P. Mordohai, J.-M. Frahm, R. Yang, D. Nister, and M. Pollefeys, "Real-Time Visibility-Based Fusion of Depth Maps," Proc. IEEE Int'l Conf. Computer Vision (ICCV '07), Oct. 2007.
[42] Y. Furukawa and J. Ponce, "Carved Visual Hulls for Image-Based Modeling," Int'l J. Computer Vision, vol. 81, no. 1, pp. 53-67, Mar. 2009.
[43] D. Bradley, T. Popa, A. Sheffer, W. Heidrich, and T. Boubekeur, "Markerless Garment Capture," Proc. ACM SIGGRAPH '08, vol. 27, no. 3, pp. 99-106, 2008.
[44] Mview, http://vision.middlebury.edu/mview/, 2009.

Yebin Liu received the BE degree from Beijing University of Posts and Telecommunications, P.R. China, in 2002, and the PhD degree from the Automation Department, Tsinghua University, Beijing, P.R. China, in 2009. He is currently a postdoctoral research fellow in the Automation Department, Tsinghua University, Beijing, P.R. China. His research interests include light field, image-based modeling and rendering, and multicamera array techniques.

Qionghai Dai received the BS degree in mathematics from Shanxi Normal University, P.R. China, in 1987, and the ME and PhD degrees in computer science and automation from Northeastern University, P.R. China, in 1994 and 1996, respectively. Since 1997, he has been with the faculty of Tsinghua University, Beijing, P.R. China, where he is currently a professor and the director of the Broadband Networks and Digital Media Laboratory. His research areas include video communication, computer vision, and graphics. He is a senior member of the IEEE.

Wenli Xu received the BS degree in electrical engineering and the ME degree in automatic control engineering from Tsinghua University, Beijing, P.R. China, in 1970 and 1980, respectively, and the PhD degree in electrical and computer engineering from the University of Colorado, Boulder, in 1990. He is currently a professor at Tsinghua University and the director of the Chinese Association of Automation. His research interests are mainly in the areas of automatic control and computer vision.