3D Reconstruction From A Single Still Image Based on Monocular Vision of an Uncalibrated Camera

ITM Web of Conferences 12, 01018 (2017). DOI: 10.1051/itmconf/20171201018
ITA 2017
Abstract: We propose a framework that combines Machine Learning with Dynamic Optimization to reconstruct a 3D scene automatically from a single still image of an unstructured outdoor environment, based on the monocular vision of an uncalibrated camera. After a first segmentation of the image, a searching-tree strategy based on Bayes' rule is used to identify the occlusion hierarchy of all areas. After a second, superpixel segmentation of the image, the AdaBoost algorithm is applied to the integrated detection of depth from lighting, texture and material. Finally, all the factors above are optimized under constrained conditions, yielding the whole depth map of the image. Integrating the source image with its depth map in point-cloud or bilinear-interpolation style realizes the 3D reconstruction. Experiments comparing our method with typical methods on the associated database demonstrate that it improves, to a certain extent, the reasonability of the estimated overall 3D architecture of the image's scene. Moreover, it needs neither manual assistance nor any camera model information.
© The Authors, published by EDP Sciences. This is an open access article distributed under the terms of the Creative Commons Attribution
License 4.0 (http://creativecommons.org/licenses/by/4.0/).
[...] an uncalibrated camera; Section 3 provides the relevant experiments and results analysis; Section 4 concludes this paper.
2 3D Reconstruction Based on Monocular Vision of an Uncalibrated Camera
Fig. 1. The principle of the 3D reconstruction.
2.1 Global Principle

Compared with a 3D image, monocular vision produces a 2D image, which carries information only along the X and Y orientations, without Z. That is to say, it has no depth information. So, in order to realize 3D reconstruction, we need to estimate the depth information from the 2D image.

If we close one eye and look at a picture with the other, we can still feel how the different parts of an object in the image lie in front of or behind one another. For a complex image, according to some rules, we can often also infer such front-and-back relationships with the other surrounding objects, and further infer the whole depth architecture of the image. These phenomena offer some possibility of reconstructing the scene in 3D from a monocular image. Here, we take the perception above to be past experience, and the rules above to be a kind of optimization. So, this paper attempts to combine Machine Learning with Dynamic Optimization so that the computer can solve this problem automatically.
In a picture we usually see familiar or unfamiliar objects, involving shape, material, texture, color, and the effect of illumination. And we can usually also accept the following inferences:
A. For a familiar object, if only part of its shape can be seen, it is usually sheltered by something.
B. For material, the part near to us usually appears clearer and rougher than the part in the distance.
C. For texture, the near part usually presents sparser, while the part in the distance presents denser.
D. For the effect of lighting, the part near the light source is usually brighter than the part far away.
E. Color usually changes with different objects, with different parts of one object, or even with different lighting sources, etc.
According to the experiences above, we can use samples at different depth levels of the material, texture and brightness of pictures to train the respective learning machines; then we use some priors to infer the depth level of an area in the image, and for objects with familiar shapes we can use a decision algorithm to infer the possible occlusions; finally, over the whole image, we can use an optimization algorithm to integrate all the inferences above. Thus we acquire multiple possible depth architectures for an image, and we select the most plausible one as the final result according to the optimal computed value or the biggest probability value. In addition, because we do not use any camera model information, the depths studied here are not real absolute depths, but the relative depths of the different components in an image.
Fig. 1 gives this paper's main principle of 3D reconstruction directly from a monocular 2D image: the source image is segmented at a bigger scale and a smaller scale in turn, to perform occlusion identification and integrated depth detection on material, texture and lighting; then the global depth architecture is estimated with dynamic optimization based on the two stages' results; at last, combining the global depth architecture with the source image, the 3D reconstruction is realized with bilinear interpolation or a point cloud. Next, we describe each part of the principle in detail.
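The paper gives no code for this last step; the NumPy sketch below (function names and window details are ours, not the paper's) illustrates what the two reconstruction styles can look like: lifting each pixel with its relative depth into a point cloud, and bilinearly interpolating a coarse, per-superpixel depth map up to image resolution. Since no camera model is available, pixel coordinates stand in for X and Y and the relative depth for Z.

```python
import numpy as np

def depthmap_to_pointcloud(image, depth):
    """Lift each pixel (u, v) with relative depth d to a 3D point (u, v, d).

    Without camera intrinsics only this relative geometry is available;
    pixel colors are carried along for rendering.
    """
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]                      # row (v) and column (u) grids
    points = np.stack([u.ravel(), v.ravel(), depth.ravel()], axis=1)
    colors = image.reshape(-1, image.shape[-1])
    return points, colors

def upsample_depth_bilinear(depth, out_h, out_w):
    """Bilinearly interpolate a coarse depth map to the target image size."""
    h, w = depth.shape
    ys = np.linspace(0, h - 1, out_h)
    xs = np.linspace(0, w - 1, out_w)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]; wx = (xs - x0)[None, :]
    top = depth[np.ix_(y0, x0)] * (1 - wx) + depth[np.ix_(y0, x1)] * wx
    bot = depth[np.ix_(y1, x0)] * (1 - wx) + depth[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bot * wy
```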
2.2 Implementation

To estimate the depth structure reasonably, the key is to make the image representation capture the image's important information as correctly as possible. Image representation is usually hierarchical, so it is carried out on the basis of image segmentation at different scales [27, 28], in which the image's different components represent different meanings.

2.2.1 Occlusion Identification

First, we segment the image at the bigger scale with the method in Ref. [27] to acquire the familiar objects and identify the associated occlusions. Here, the occlusion phenomenon is defined as follows: when object A can be seen only in partial shape, and the area where the rest of A's shape should appear is occupied by object B, then we consider that object A is occluded by object B, i.e., B is in front of A. The division of the occluded areas refers to [29]. The related principle is displayed in Eq. (2-1), in which M, N are two neighboring objects in an image, X represents the associated traits, Y is the associated prior knowledge for familiar objects, and Q is the number of objects in the image. According to the Bayesian rule, the more remarkable the trait information P(X|(M,N)) and the prior information P(Y|(M,N,X)) are, the bigger the value of the posterior probability P((M,N)|(X,Y)) on occlusion will be.

$$P\big((M,N)\mid(X,Y)\big)=\frac{P\big(Y\mid(M,N,X)\big)\,P\big(X\mid(M,N)\big)}{\sum_{M,N}^{Q}P\big(Y\mid(M,N,X)\big)\,P\big(X\mid(M,N)\big)}\qquad(2\text{-}1)$$
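A minimal sketch of evaluating Eq. (2-1), assuming the trait and prior likelihoods are supplied by the trained models as plain callables (all names here are ours):

```python
def occlusion_posterior(pair, pairs, trait_lik, prior_lik):
    """Posterior P((M,N)|(X,Y)) of Eq. (2-1) for one pair of neighboring
    objects, normalized over all Q candidate pairs in the image.

    trait_lik(m, n) -> P(X|(M,N)): how well the observed traits X fit.
    prior_lik(m, n) -> P(Y|(M,N,X)): prior knowledge Y on familiar objects
    (the dependence on X is folded into the callable in this sketch).
    """
    num = prior_lik(*pair) * trait_lik(*pair)
    den = sum(prior_lik(m, n) * trait_lik(m, n) for m, n in pairs)
    return num / den if den > 0 else 0.0
```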
To find out all the familiar objects' occlusive relationships in an image, we adopt a kind of searching-tree algorithm. The searching process is shown in Fig. 2, in which the relative occlusive relationships of all the familiar objects with each other are inferred by the principle of Eq. (2-1); at last, different relative numbers are used to represent the different depth hierarchies. Of course, there is some likelihood of inference error on the occlusion problem. It depends on whether the prior information for understanding the familiar objects is reliable: the more reliable the prior knowledge, the more accurate the inference.

Fig. 2. The searching tree for occlusion identification. For each group of objects A and B: if no occlusion exists between them, they sit at the same hierarchical position (number A = B) and either A or B is used in the next identification; if A occludes B, A is in front of B (number A > B); if B occludes A, B is in front of A (number B > A); in these cases A and B are used respectively in the next group of inferences.
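Under our reading of Fig. 2, the search can be rendered compactly as below; `occludes` would be decided by thresholding the Eq. (2-1) posterior, and the tie-breaking details are our assumptions:

```python
def rank_by_occlusion(objects, neighbors, occludes):
    """Assign a relative depth number to every object: larger means nearer.

    occludes(a, b) -> True if a is inferred (via Eq. (2-1)) to occlude b,
    False if b occludes a, and None if no occlusion exists between them.
    """
    rank = {obj: 0 for obj in objects}
    for a, b in neighbors:                 # each neighboring pair, in turn
        rel = occludes(a, b)
        if rel is None:                    # same hierarchical position
            rank[b] = rank[a]
        elif rel:                          # a is in front of b
            rank[a] = max(rank[a], rank[b] + 1)
        else:                              # b is in front of a
            rank[b] = max(rank[b], rank[a] + 1)
    return rank
```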
2.2.2 Integration Detection on Depth

On the selection of the associated traits: the textural energies computed from Laws' masks in Refs. [31, 32] act as textural features, embodying the extent of textural denseness at different depths; the Haar-like traits in Ref. [33] act as material features, describing the extent of material smoothness and ambiguity as depth changes; the lighting model on depth refers to [34, 35], in which the distance from the light source to the viewer is regarded here as the relative depth, and the image's other components are then assigned to the corresponding lighting categorical depth classes and model parameters in turn.
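The Laws texture-energy features of Refs. [31, 32] can be sketched as follows; the particular 5-tap mask set and the averaging-window size are generic textbook choices, not necessarily the paper's:

```python
import numpy as np
from scipy.signal import convolve2d

# 1D Laws vectors: Level, Edge, Spot, Ripple
L5 = np.array([1, 4, 6, 4, 1], float)
E5 = np.array([-1, -2, 0, 2, 1], float)
S5 = np.array([-1, 0, 2, 0, -1], float)
R5 = np.array([1, -4, 6, -4, 1], float)

def laws_texture_energy(gray, window=15):
    """Per-pixel Laws texture energy: convolve with the 2D masks (outer
    products of the 1D vectors), then sum |response| over a local window."""
    box = np.ones((window, window))
    feats = []
    for a in (L5, E5, S5, R5):
        for b in (L5, E5, S5, R5):
            resp = convolve2d(gray, np.outer(a, b), mode="same")
            feats.append(convolve2d(np.abs(resp), box, mode="same"))
    return np.stack(feats, axis=-1)   # (H, W, 16) feature map
```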
For the identification, we predefine multiple depth classes according to the different extents to which lighting, texture and material change with depth. We use the logistic-regression version of the AdaBoost algorithm in Ref. [30] to train on each class of depth samples and acquire the relative detectors. At each depth class, the relative detector based on textural or material traits [...] the integrated depth on material, texture and lighting. Fig. 4 displays the associated detecting principle, in which AdaBoost's property of using weak classifiers to construct strong classifiers is applied both to the single depth-class detection and to the integrated depth detection.

Fig. 4. Integration detection on depth: a depth class is detected on the lighting, texture and material factors respectively, and the results are combined with the priority order 1. material, 2. texture, 3. lighting.
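As a rough illustration of the boosting step, the sketch below trains a generic discrete AdaBoost over threshold stumps; the actual system uses the logistic-regression variant of Ref. [30], whose loss differs, so treat this only as a stand-in:

```python
import numpy as np

def train_adaboost_stumps(X, y, rounds=50):
    """Generic AdaBoost with threshold stumps. X: (n, d) features, y: +/-1."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)
    ensemble = []                                  # (feature, threshold, sign, alpha)
    for _ in range(rounds):
        best = None
        for j in range(d):                         # exhaustively pick best stump
            for t in np.unique(X[:, j]):
                for s in (1, -1):
                    pred = s * np.where(X[:, j] > t, 1, -1)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, t, s)
        err, j, t, s = best
        err = min(max(err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)      # weak learner's vote
        pred = s * np.where(X[:, j] > t, 1, -1)
        w *= np.exp(-alpha * y * pred)             # reweight the hard examples
        w /= w.sum()
        ensemble.append((j, t, s, alpha))
    return ensemble

def predict(ensemble, X):
    """Strong classifier: sign of the weighted sum of the weak stumps."""
    score = sum(a * s * np.where(X[:, j] > t, 1, -1) for j, t, s, a in ensemble)
    return np.sign(score)
```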
From the optimization theory in Ref. [36], we can infer the optimal value of $c_{pq}$ as Eq. (2-3):

$$c_{pq}=\max\big(g_q(m_p,n_p,d_p)+\rho\,\lambda_{pq},\;0\big)\qquad(2\text{-}3)$$

Combining Eq. (2-2) with Eq. (2-3), we can construct the augmented Lagrangian function (2-4):

$$L_A(d,\lambda,\rho)=f(d)+\frac{1}{2\rho}\sum_{p,q}\Big[\max\big(g_q(m_p,n_p,d_p)+\rho\,\lambda_{pq},\,0\big)^2-\big(\rho\,\lambda_{pq}\big)^2\Big]\qquad(2\text{-}4)$$

Thus the problem (2-2) above is transformed into formula (2-5):

$$\min_{d}\;L_A(d,\lambda,\rho)\qquad(2\text{-}5)$$
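Because Eq. (2-2) itself did not survive extraction, the following sketch shows only the generic loop implied by Eqs. (2-3)-(2-5): approximately minimize the augmented Lagrangian over the depth vector d, then update the multipliers. Here `grad_f`, `g` and `jac_g` stand in for the unrecovered objective and constraint functions:

```python
import numpy as np

def optimize_depth(grad_f, g, jac_g, d0, rho=1.0, outer=20, inner=200, lr=1e-2):
    """Augmented-Lagrangian loop for: minimize f(d) subject to g_q(d) <= 0.

    grad_f(d) -> gradient of the objective f at d, shape (n,)
    g(d)      -> constraint values, shape (Q,)
    jac_g(d)  -> constraint Jacobian, shape (Q, n)
    """
    d = d0.astype(float).copy()
    lam = np.zeros_like(g(d))
    for _ in range(outer):
        for _ in range(inner):                     # approximately solve Eq. (2-5)
            c = np.maximum(g(d) + rho * lam, 0.0)  # the c_pq of Eq. (2-3)
            d -= lr * (grad_f(d) + jac_g(d).T @ c / rho)  # gradient of Eq. (2-4)
        lam = np.maximum(lam + g(d) / rho, 0.0)    # multiplier update
    return d
```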
[...] to infer both 3D location and orientation of the patches in an image without any explicit assumptions about the structure of the scene makes it generalize well, even to scenes with significant nonvertical structure. The deficiency is that, for the whole structural reconstruction, it lacks a comprehensive [...] estimation is sometimes inaccurate, as in Fig. 7-c3. Our algorithm's global dynamic optimization seems to [...]

Fig. 8. Comparison of HEH, Saxena and our method: 8-a. depth error; 8-b. relative depth error (vertical axis: correct, %).
References

2. [...] 2010, IEEE. p. 1000-1003.
3. Bok, Y., Y. Hwang, and I.S. Kweon, Accurate motion estimation and high-precision 3D reconstruction by sensor fusion, in Robotics and Automation, 2007 IEEE International Conference on. 2007, IEEE. p. 4721-4726.
4. Cao, X., Z. Li, and Q. Dai, Semi-automatic 2D-to-3D conversion using disparity propagation. IEEE Transactions on Broadcasting, 2011. 57(2): p. 491-499.
5. Hertzmann, A. and S.M. Seitz, Example-based photometric stereo: shape reconstruction with general, varying BRDFs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2005. 27(8): p. 1254-1264.
6. Guillou, E., et al., Using vanishing points for camera calibration and coarse 3D reconstruction from a single image. The Visual Computer, 2000. 16(7): p. 396-410.
7. Wilczkowiak, M., E. Boyer, and P. Sturm, Camera calibration and 3D reconstruction from single images using parallelepipeds, in Computer Vision, 2001. ICCV 2001. Eighth IEEE International Conference on. 2001, IEEE.
8. Wang, G., et al., Camera calibration and 3D reconstruction from a single view based on scene constraints. Image and Vision Computing, 2005. 23(3): p. 311-323.
9. Criminisi, A., I. Reid, and A. Zisserman, Single view metrology. International Journal of Computer Vision, 2000. 40(2): p. 123-148.
10. Barinova, O., et al., Fast automatic single-view 3-D reconstruction of urban scenes, in Computer Vision - ECCV 2008. 2008, Springer. p. 100-113.
11. Sturm, P. and S. Maybank, A method for interactive 3D reconstruction of piecewise planar objects from single images, in The 10th British Machine Vision Conference (BMVC'99). 1999, BMVA.
12. Willneff, J., J. Poon, and C. Fraser, Single-image high-resolution satellite data for 3D information extraction. International Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences, 2005. 36(1/W3): p. 1-6.
13. Namboodiri, V.P. and S. Chaudhuri, Recovery of relative depth from a single observation using an uncalibrated (real-aperture) camera, in Computer Vision and Pattern Recognition, CVPR 2008. IEEE Conference on. 2008, IEEE. p. 1-6.
14. Zhuo, S. and T. Sim, On the recovery of depth from a single defocused image, in Computer Analysis of Images and Patterns. 2009, Springer. p. 889-897.
15. Van den Heuvel, F.A., 3D reconstruction from a single image using geometric constraints. ISPRS Journal of Photogrammetry and Remote Sensing, 1998. 53(6): p. 354-368.
16. El-Hakim, S.F., A flexible approach to 3D reconstruction from single images, in ACM SIGGRAPH. 2001, Citeseer.
17. Wang, G., et al., Single view metrology from scene constraints. Image and Vision Computing, 2005. 23(9): p. 831-840.
18. Delage, E., H. Lee, and A.Y. Ng, A dynamic Bayesian network model for autonomous 3D reconstruction from a single indoor image, in Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on. 2006, IEEE. p. 2418-2428.
19. Han, F. and S.-C. Zhu, Bayesian reconstruction of 3D shapes and scenes from a single image, in Higher-Level Knowledge in 3D Modeling and Motion Analysis, 2003. HLK 2003. First IEEE International Workshop on. 2003, IEEE.
20. Hoiem, D., A.A. Efros, and M. Hebert, Recovering surface layout from an image. International Journal of Computer Vision, 2007. 75(1): p. 151-172.
21. Hoiem, D., A.A. Efros, and M. Hebert, Closing the loop in scene interpretation, in Computer Vision and Pattern Recognition, CVPR 2008. IEEE Conference on. 2008, IEEE. p. 1-8.
22. Hoiem, D., A.A. Efros, and M. Hebert, Automatic photo pop-up. ACM Transactions on Graphics (TOG), 2005. 24(3): p. 577-584.
23. Hoiem, D., A.A. Efros, and M. Hebert, Geometric context from a single image, in Computer Vision, ICCV 2005. Tenth IEEE International Conference on. 2005, IEEE. p. 654-661.
24. Saxena, A., M. Sun, and A.Y. Ng, Make3D: learning 3D scene structure from a single still image. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2009. 31(5): p. 824-840.
25. Saxena, A., S.H. Chung, and A.Y. Ng, Learning depth from single monocular images, in Advances in Neural Information Processing Systems (NIPS). 2005. p. 1161-1168.
26. Delage, E., H. Lee, and A.Y. Ng, Automatic single-image 3D reconstructions of indoor Manhattan world scenes, in Robotics Research. 2007, Springer. p. 305-321.
27. Levinshtein, A., et al., TurboPixels: fast superpixels using geometric flows. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2009. p. 2290-2297.
28. Felzenszwalb, P.F. and D.P. Huttenlocher, Efficient graph-based image segmentation. International Journal of Computer Vision, 2004. p. 167-181.
29. Hoiem, D., et al., Recovering occlusion boundaries from a single image, in Computer Vision, 2007. ICCV 2007. IEEE 11th International Conference on. 2007, IEEE. p. 1-8.
30. Collins, M., R.E. Schapire, and Y. Singer, Logistic regression, AdaBoost and Bregman distances. Machine Learning, 2002. p. 253-285.
31. Davies, E.R., Laws' texture energy in texture, in Machine Vision: Theory, Algorithms, Practicalities. 1997.