Towards Automatic Modeling of Monuments and Towers
Three-dimensional modeling from images, when carried out entirely by a human, can be time consuming and impractical for large-scale projects. On the other hand, full automation may be unachievable or not accurate enough for many applications such as cultural heritage documentation. In addition, three-dimensional modeling from images, particularly fully automated methods, requires the extraction of features, such as corners, and needs them to appear in multiple images. However, in practical situations those features are not always available, sometimes not even in a single image, due to occlusions or lack of texture on the surface. Taking closely separated images or optimally designing view locations can preclude some occlusions. However, taking such images is often not practical and we are left with a small number of images that do not properly cover every surface or corner. The approach presented in this paper uses both interactive and automatic techniques, each where it is best suited, to accurately and completely model monuments and towers. It particularly focuses on automating the construction of unmarked surfaces such as columns, arches, and blocks from minimum available clues. It also extracts the occluded or invisible corners from existing ones. Many examples, such as Arc de Triomphe in Paris, Florence’s St. John baptistery at Santa Maria del Fiori Cathedral, and other monuments and towers from around the world are completely modeled from a small number of images taken by tourists.

1. Introduction

This paper addresses several interconnected issues: full automation versus partial automation, how to handle the inevitable occlusions and lack of features or texture, and the importance of high accuracy to constructing and documenting monuments and towers. We will address only image-based approaches. However, it is important to note that to achieve complete geometric details, range sensors will also be required for sculpted surfaces that are usually found on many monuments [1]. This requires the integration of the two types of data [2].

Three-dimensional modeling from images, when carried out entirely by a human, can be very time consuming and impractical for large-scale projects. Efforts to increase the level of automation are essential in order to broaden the use of this technology. So far, however, the efforts to completely automate the processing, from image capture to the output of a 3D model, are not always successful or applicable [3, 4]. Full automation has been achieved under certain conditions and up to finding point correspondence and camera positions and orientation [5]. Self-calibration and 3D construction still require a human in the loop, either to specify constraints or to perform post-processing [6, 7]. Some sacrifice in the accuracy and fidelity of the created model may also result when using full automation (see section 1.3). Automated methods also rely on features that can be extracted automatically from the scene, thus occlusions and un-textured surfaces are problematic. We often end up with areas with too many features that are not all needed for modeling, and areas with no or too few features to produce a complete model. Post-processing is therefore often required, which means that user interaction is still needed. The most impressive results were achieved with highly interactive approaches, e.g. [8]. Some interactive approaches with automated features that take advantage of environment constraints proved effective [3, 9]. Other more automated techniques that target specific objects such as architecture [10, 11, 12] have also been developed. If the goal is creating accurate and complete 3D models of medium and large scale objects under practical situations using only information contained in images, then full automation is still in the future.

Full automation is a priority for certain applications such as navigation, telepresence, augmented reality, and where a model is needed fast for decision-making. In those applications, complete details and high accuracy are secondary. For other applications such as documentation and even virtual museums, full automation cannot excuse the missing details or lack of accuracy.
Proceedings of the First International Symposium on 3D Data Processing Visualization and Transmission (3DPVT’02)
0-7695-1521-5/02 $17.00 © 2002 IEEE
1.2. Occlusions and Lack of Texture

Three-dimensional measurement and modeling from images obviously requires that relevant points be visible in the image. This is often not possible, either because the points or region of interest are hidden or occluded behind an object or a surface, or because there is no mark, edge, or visual feature to extract. In fact, even without multiple objects in the scene and when we can take images from well planned positions, there are not many objects that can be imaged without having portions of their surfaces either invisible or without texture to extract. In objects such as architectures and monuments in their normal settings we are also faced with restrictions limiting the positions from which the images can be taken. Illumination variations and shadows also hamper feature extraction. Not only do those factors preclude the modeling of occluded parts, they also have a negative effect on the modeling of visible parts, for example when applying automatic matching.

1.3. Accuracy of 3D Modeling

Historic monuments and towers are particularly important and thus need to be constructed with high accuracy both for documentation and visualization purposes. To achieve the needed accuracy, one must use the most rigorous approach for 3D modeling from images rather than the simplest or easiest to implement. Tests showed that methods based on projective geometry, although an elegant and efficient approach, result in geometric errors in the range of 4 to 5% [6, 13]. This means that a 20-meter tower could have a significant 1-meter error. Photogrammetric methods such as bundle adjustment and proper camera calibration [14, 15, 16], although interactive and not as easy to use as projective methods, give errors several orders of magnitude smaller, in the range of 0.01-0.001% on well-defined features, depending on camera resolution and lens quality.

level of automation to assist the operator without sacrificing accuracy or level of detail. Figure 1 summarizes the procedure and indicates which steps are interactive and which are automatic (interactive operations are grayed). Images are taken, all with the same camera setup, from positions where the object is suitably visible. Parts of the object should appear in two or more images when possible, and there should be a reasonable distance, or baseline, between the images. Several features appearing in multiple images are interactively extracted from the images, usually 12-15 per image. The user points to a corner and labels it with a unique number, and the system accurately extracts the corner point. The Harris operator [17] is used for its simplicity and efficiency. Image registration and 3D coordinate computation are based on the photogrammetric bundle adjustment approach for its accuracy, flexibility, and effectiveness [18] compared to other structure from motion techniques. Advances in bundle adjustment eliminated the need for control points or initial approximate coordinates. Many other aspects required for high accuracy, such as camera calibration with full distortion correction, have long been solved problems in Photogrammetry [16] and will not be discussed in the remainder of the paper.
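
The interactive corner extraction step of the procedure (the user points near a corner and the Harris operator [17] locates it) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 3x3 unweighted window, the constant k = 0.04, the search radius, and the synthetic test image are all assumptions made for illustration, and the result is only pixel-accurate, whereas a production system would refine it further.

```python
def harris_response(img, x, y, radius=1, k=0.04):
    """Harris corner response at pixel (x, y), using an unweighted square window."""
    h, w = len(img), len(img[0])
    sxx = syy = sxy = 0.0
    for v in range(y - radius, y + radius + 1):
        for u in range(x - radius, x + radius + 1):
            # Skip pixels whose central difference would leave the image.
            if not (1 <= u < w - 1 and 1 <= v < h - 1):
                continue
            ix = (img[v][u + 1] - img[v][u - 1]) / 2.0  # horizontal gradient
            iy = (img[v + 1][u] - img[v - 1][u]) / 2.0  # vertical gradient
            sxx += ix * ix
            syy += iy * iy
            sxy += ix * iy
    det = sxx * syy - sxy * sxy
    trace = sxx + syy
    return det - k * trace * trace


def snap_to_corner(img, click_x, click_y, search=4):
    """Return the pixel with the strongest Harris response near a user click."""
    h, w = len(img), len(img[0])
    best, best_xy = float("-inf"), (click_x, click_y)
    for y in range(max(0, click_y - search), min(h, click_y + search + 1)):
        for x in range(max(0, click_x - search), min(w, click_x + search + 1)):
            r = harris_response(img, x, y)
            if r > best:
                best, best_xy = r, (x, y)
    return best_xy


# Synthetic 20x20 image (an assumption for the demo): a bright square whose
# top-left corner is pixel (10, 10) on a dark background.
image = [[1.0 if (x >= 10 and y >= 10) else 0.0 for x in range(20)]
         for y in range(20)]
corner = snap_to_corner(image, click_x=8, click_y=12)
```

The response is positive at the corner and negative along straight edges, which is why an approximate click a few pixels away still snaps to the corner.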
more points into each of the segmented regions. The matching is constrained by the epipolar condition and by the disparity range computed from the 3D coordinates of the initial points. The bundle adjustment is repeated with the newly added points to improve on previous results and re-compute the 3D coordinates of all points.
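
As a rough illustration of how an epipolar condition and a disparity range can prune candidate matches, the sketch below keeps only the candidates in the second image that lie close to the epipolar line of a point in the first image. It assumes a known fundamental matrix F; the function names, the distance threshold, and the use of the horizontal offset as the disparity are simplifying assumptions, not details taken from the paper.

```python
import math


def epipolar_line(F, point):
    """Line (a, b, c) in image 2, a*u + b*v + c = 0, for a point (u, v) in image 1."""
    u, v = point
    x = (u, v, 1.0)
    return tuple(sum(F[r][c] * x[c] for c in range(3)) for r in range(3))


def filter_candidates(F, point, candidates, max_dist=1.0, disparity_range=None):
    """Keep candidates in image 2 that lie near the epipolar line of `point`.

    disparity_range, if given, is (d_min, d_max) applied to u1 - u2. Treating
    the horizontal offset as the disparity is a simplifying assumption.
    """
    a, b, c = epipolar_line(F, point)
    norm = math.hypot(a, b)
    kept = []
    for (u2, v2) in candidates:
        dist = abs(a * u2 + b * v2 + c) / norm  # point-to-line distance in pixels
        if dist > max_dist:
            continue
        if disparity_range is not None:
            d = point[0] - u2
            if not (disparity_range[0] <= d <= disparity_range[1]):
                continue
        kept.append((u2, v2))
    return kept


# Fundamental matrix of an ideal rectified pair (pure horizontal translation):
# corresponding points share the same image row.
F_rectified = [[0.0, 0.0, 0.0],
               [0.0, 0.0, -1.0],
               [0.0, 1.0, 0.0]]
candidates = [(120.0, 50.0), (130.0, 57.0), (90.0, 50.4)]
on_line = filter_candidates(F_rectified, (100.0, 50.0), candidates)
in_range = filter_candidates(F_rectified, (100.0, 50.0), candidates,
                             disparity_range=(0.0, 40.0))
```

Here the candidate seven rows away from the epipolar line is rejected by the line test, and the disparity range then removes one of the two remaining candidates.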
Since many parts of the scene will show only in one image, an approach to extract 3D information from a single image is necessary [20]. Our approach applies the equation of the surface as a constraint, along with the camera parameters, to the single-image coordinates to compute the corresponding 3D coordinates. For example in many monuments and towers, the walls are planes that are either parallel or perpendicular to each other. The equations of some of the planes can be determined from seed points previously measured. The remaining plane equations are determined using the knowledge that they are either perpendicular or parallel to one of the planes already determined. With little effort, the equations of all the planes on the structure can be computed. From these equations and the known camera parameters for each image, we can determine 3D coordinates of any point or pixel from a single image. This can also be applied to surfaces like quadrics or cylinders whose equations can be computed from existing points. Other constraints, such as symmetry and points with the same depth or same height are also used.

Figure 2. Left (a) 4 seed points are extracted on the base and crown of the column, right (b) column points are added automatically.

Arches are constructed by first fitting a plane to seed points on the wall (figure 3-a). An edge detector is applied to the region (figure 3-b) and points at constant interval along the arch are sampled. For edge detection, a specially designed morphological operator was developed (a variation on [22]). Using the image coordinate of these points (in one image only), the known image parameters (from the bundle adjustment), and the equation of the plane, the 3D coordinates are computed (figure 4).
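
Computing 3D coordinates from a single image with a plane constraint amounts to intersecting the viewing ray of a pixel with the plane. The sketch below shows one way to do this under a simple pinhole model without lens distortion; the helper names, the camera parameterization (focal length in pixels, principal point, world-to-camera rotation, camera center), and the numbers are assumptions for illustration rather than the paper's formulation.

```python
def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def plane_from_points(p0, p1, p2):
    """Plane (n, d) with n . X = d through three non-collinear seed points."""
    n = cross(sub(p1, p0), sub(p2, p0))
    return n, dot(n, p0)


def parallel_plane(plane, point):
    """Plane parallel to `plane` that passes through `point`."""
    n, _ = plane
    return n, dot(n, point)


def pixel_to_3d(pixel, focal, principal, rotation, center, plane):
    """Intersect the viewing ray of `pixel` with `plane` (pinhole, no distortion)."""
    u, v = pixel
    ray_cam = ((u - principal[0]) / focal, (v - principal[1]) / focal, 1.0)
    # `rotation` maps world to camera axes, so its transpose maps the ray back.
    ray_world = tuple(sum(rotation[r][c] * ray_cam[r] for r in range(3))
                      for c in range(3))
    n, d = plane
    denom = dot(n, ray_world)
    if abs(denom) < 1e-12:
        raise ValueError("viewing ray is parallel to the plane")
    t = (d - dot(n, center)) / denom
    return tuple(center[i] + t * ray_world[i] for i in range(3))


# Hypothetical setup: camera at the origin looking along +Z, wall at Z = 10.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
wall = plane_from_points((0.0, 0.0, 10.0), (1.0, 0.0, 10.0), (0.0, 1.0, 10.0))
point_on_wall = pixel_to_3d((600.0, 450.0), 1000.0, (500.0, 500.0),
                            identity, (0.0, 0.0, 0.0), wall)
back_wall = parallel_plane(wall, (0.0, 0.0, 12.0))
point_on_back_wall = pixel_to_3d((600.0, 450.0), 1000.0, (500.0, 500.0),
                                 identity, (0.0, 0.0, 0.0), back_wall)
```

The `parallel_plane` helper mirrors the idea that a wall known to be parallel to an already determined plane needs only one measured point to fix its equation.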
intersections of each normal will produce a new point (a black point in figure 5) automatically. We now have sufficient points to fully construct the block. More details of the procedure are given in the following examples.

3. Examples

Over the past year, members of our group visited different cities around the world. Whenever possible, they took images covering various interesting monuments. The images were taken during routine tours without any advance planning of where to take the images. We took the images just like any typical tourist, by walking around the monument and getting the best view under real conditions such as the presence of other tourists, vehicles, and other buildings and objects. Several types of digital cameras and regular film cameras (where the film was digitized later) were used. The results were very encouraging and compelling. Over 100 models were created using this approach, each one usually in 1-2 days of work by one person. The number of points and the level of interaction and automation obviously varied significantly from one model to another. Usually between 500 and 3000 points were needed, at least 80% of which were generated automatically. Eight examples are presented here (they and several more are on the web [23]), each to illustrate a specific feature. They are presented in wire-frame, solid model without texture, and solid model with texture, in figures 6 to 13. For some of the monuments, we found dimensional information available in travel or history books. This information was not used or needed in the model construction, but was valuable in evaluating the accuracy.

Figure 6. Arc de Triomphe, Paris (14 images). Illustrates automatic arches.

Figure 7. St. John Baptistery, Florence (8 images). Illustrates automatic blocks.

The next example is the St. John baptistery in Florence (figure 7). The Olympus E-10 (4 megapixels) camera was used to take eight images. The baptistery has eight sides. The actual dimensions were obtained from a plan in a book. The sides average about 13 m in length. Again we assign 13 m to one side and use it to scale the whole model. The average difference between the model sides and the actual sides is less than 1 cm, or 0.07%. This is significantly better than the accuracy of the Arc de
Triomphe (figure 6). This is due to the better camera used
(higher resolution, larger pixel size, and better quality
lens) and to the smaller object with good feature definition.
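
The accuracy check used in these examples (assign a known length to one side, scale the model, then compare the remaining sides with published dimensions) can be sketched as follows. The function name and all side lengths below are made-up values for illustration; they are not measurements from the paper.

```python
def evaluate_scaled_model(model_lengths, actual_lengths, reference_index=0):
    """Scale a model from one known side and report the error on the other sides.

    Returns (scale, mean absolute difference, mean difference in percent).
    """
    scale = actual_lengths[reference_index] / model_lengths[reference_index]
    scaled = [scale * m for m in model_lengths]
    # The reference side matches by construction, so it is left out of the check.
    checks = [(s, a) for i, (s, a) in enumerate(zip(scaled, actual_lengths))
              if i != reference_index]
    mean_abs = sum(abs(s - a) for s, a in checks) / len(checks)
    mean_actual = sum(a for _, a in checks) / len(checks)
    return scale, mean_abs, 100.0 * mean_abs / mean_actual


# Hypothetical side lengths in arbitrary model units and in meters.
model_sides = [2.0, 2.001, 1.999, 2.002]
actual_sides = [13.0, 13.0, 13.0, 13.0]
scale, mean_diff_m, mean_diff_percent = evaluate_scaled_model(model_sides,
                                                              actual_sides)
```

With these invented numbers the mean difference comes out a little under 1 cm on 13 m sides, which is the order of magnitude reported for the baptistery.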
automatically by taking advantage of some of the object characteristics and making some realistic assumptions. Efforts to automate the whole procedure are continuing and will undoubtedly intensify in the future. In the meantime, in order to achieve immediate and useful results, parts of the process necessitate human interaction.

5. Acknowledgements

Our colleagues François Blais and Eric Paquet took many of the images. Emily Whiting constructed some models.

6. References

[1] Beraldin, J.-A., F. Blais, L. Cornouyer, M. Rioux, S.F. El-Hakim, R. Rodella, F. Bernier, N. Harrison, “3D imaging system for rapid response on remote sites”, IEEE Proc. of 2nd Int. Conf. on 3D Digital Imaging and Modeling (3DIM’99), pp. 34-43, 1999.
[2] El-Hakim, S.F., “3D modeling of complex environments”, in Proc. Videometrics and Optical Methods for 3D Shape Measurement, San Jose, California, SPIE Vol. 4309, pp. 162-173, Jan. 2001.
[3] Shum, H.Y., R. Szeliski, S. Baker, M. Han, P. Anandan, “Interactive 3D modeling from multiple images using scene regularities”, European Workshop 3D Structure from Multiple Images of Large-scale Environments – SMILE (ECCV’98), pp. 236-252, 1998.
[4] Oliensis, J., “A critique of structure-from-motion algorithms”, Computer Vision and Image Understanding, 80(2), pp. 172-214, 2000.
[5] Pollefeys, M., R. Koch, M. Vergauwen, L. Van Gool, “Hand-held acquisition of 3D models with a video camera”, IEEE Proc. of 2nd Int. Conf. on 3D Digital Imaging and Modeling (3DIM’99), pp. 14-23, 1999.
[6] Pollefeys, M., R. Koch, L. Van Gool, “Self-calibration and metric reconstruction in spite of varying and unknown intrinsic camera parameters”, International J. of Computer Vision, 32(1), pp. 7-25, 1999.
[7] Faugeras, O., L. Robert, S. Laveau, G. Csurka, C. Zeller, C. Gauclin, I. Zoghlami, “3-D reconstruction of urban scenes from image sequences”, Computer Vision and Image Understanding, 69(3), pp. 292-309, 1998.
[8] Debevec, P., C.J. Taylor, J. Malik, “Modeling and rendering architecture from photographs: A hybrid geometry and image-based approach”, SIGGRAPH’96, pp. 11–20, 1996.
[9] Gruen, A., “Semi-automatic approaches to site recording and modeling”, International Archives of Photogrammetry and Remote Sensing, Volume 33, Part B5A, Commission V, pp. 309-318, Amsterdam, July 16-23, 2000.
[10] Liebowitz, D., A. Criminisi, A. Zisserman, “Creating Architectural Models from Images”, EUROGRAPHICS ’99, 18(3), 1999.
[11] Tarini, A., P. Cignoni, C. Rocchini, R. Scopigno, “Computer assisted reconstruction of buildings from photographic data”, Vision, Modeling and Visualization 2000 Conference Proc., pp. 213-220, Saarbrucken, DE, November 2000.
[12] Dick, A.R., P.H. Torr, S.J. Ruffle, R. Cipolla, “Combining single view recognition and multiple view stereo for architectural scenes”, Proc. 8th IEEE International Conference on Computer Vision (ICCV'01), pp. 268-274, July 2001.
[13] Georgis, N., M. Petrou, J. Kittler, “Error guided design of a 3D vision system”, IEEE Trans. PAMI, 20(4), pp. 366-379, 1998.
[14] Brown, D.C., “The bundle adjustment - Progress and prospective”, International Archives of Photogrammetry, 21(3): 33 pages, ISP Congress, Helsinki, Finland, 1976.
[15] Fraser, C.S., “Network design considerations for non-topographic photogrammetry”, Photogrammetric Engineering & Remote Sensing, 50(8), pp. 1115-1126, 1994.
[16] Fraser, C.S., “Digital camera self-calibration”, ISPRS Journal for Photogrammetry and Remote Sensing, 52(4), pp. 149-159, 1997.
[17] Harris, C., M. Stephens, “A combined corner and edge detector”, Proc. 4th Alvey Vision Conf., pp. 147-151, 1998.
[18] Triggs, W., P. McLauchlan, R. Hartley, A. Fitzgibbon, “Bundle Adjustment for Structure from Motion”, in Vision Algorithms: Theory and Practice, Springer-Verlag, 2000.
[19] Zorin, D.N., Subdivision and Multiresolution Surface Representation, Ph.D. Thesis, Caltech, California, 1997.
[20] van den Heuvel, F.A., “3D reconstruction from a single image using geometric constraints”, ISPRS Journal Photogrammetry & Remote Sensing, 53(6), pp. 354-368, 1998.
[21] Zwillinger, D. (ed.), Standard Mathematical Tables, 30th Edition, CRC Press, Inc., West Palm Beach, Florida, pp. 311-316, 1996.
[22] Lee, J., R. Haralick, L. Shapiro, “Morphologic edge detection”, IEEE Journal of Robotics and Automation, 3(2), pp. 142-156, 1987.
[23] http://www.vit.iit.nrc.ca/elhakim/home.html