Resch Scalable Structure From 2015 CVPR Paper
bration in substantially less time than previous approaches, and on datasets with 1000s of high resolution frames. Furthermore, our approach generalizes to an arbitrary number of input video sequences, allowing for rapid, globally consistent calibration and scene reconstruction across multiple capture devices. In this paper we describe all of these contributions, and present detailed pseudocode for reimplementation in the supplemental material [1].

In our system we leverage established existing approaches, such as the commonly used KLT tracker [7], SIFT histogram based frame similarity cost matrices [30], a global linear solver that integrates relative camera pose constraints [14], and a robust depth-based point parameterization [33].

2. Related Work

Depending on underlying methodology and applications, a multitude of different terminologies exists for geometric camera auto-calibration, the most common being variants of structure from motion (SfM) and simultaneous localization and mapping (SLAM). Here we classify prior works into two categories according to their preferred input data: unstructured and sparse vs. coherent and dense sampling.

Unstructured, sparsely sampled input. A key challenge for methods focusing on sparsely sampled input such as photo collections is that the data is generally unstructured and heterogeneous, with significant appearance changes between images. The current state of the art is therefore generally based on iterating (see [12]): (i) robust detection and matching of feature points, (ii) n-point algorithms to establish initial geometric relationships between views, and (iii) global BA. This approach has been successfully extended to massive, very large scale datasets [6, 11, 26], with various publicly available implementations [2, 4].

A central problem for such techniques is bootstrapping, i.e., finding a good global initialization for BA that includes all images, without having to run many iterations of BA on parts of the reconstruction. Martinec and Pajdla [20] present a robust solution for finding global camera poses, concentrating on the camera orientation. Wilson and Snavely [32] and Jiang et al. [14] show how to find global camera positions given known orientations. For massive datasets like Internet photo collections, a second problem is the sheer amount of images, often on the order of millions. To this end, techniques such as skeletal graphs [25] have been proposed, which remove unnecessary data by focusing on stable subsets of cameras. Agarwal et al. [5] showed that it is necessary to reconsider well established strategies in order to tackle large datasets consisting of tens of thousands of images. A further alternative is to perform an incremental, piecewise reconstruction of a scene [23], and later assemble individual fragments based, e.g., on extracted scene point descriptors.

All these methods are tuned towards heterogeneous, unstructured data, and as a consequence have difficulty when applied to densely sampled, coherent image sequences. This is due to per-frame feature point detection and pose estimation using n-point algorithms causing unstable reconstructions, as well as computational inefficiency. In our experiments, we show that by explicitly considering coherence in the data, it is possible to achieve high quality reconstructions at significantly faster convergence and computation times.

Coherent, densely sampled input. In contrast to the above, most techniques for densely sampled input such as video sequences are based on continuously tracking feature points throughout image sequences and iterative pose optimization techniques [12, 24, 27]. These original methods were designed for short, low resolution video sequences and did not consider multi-loop closing.

Particularly related are SLAM approaches and their variants, as their aim is to compute accurate camera poses of a dynamically moving camera from a video stream. Often, however, such techniques are limited with respect to the supported scene size [16] or require additional sensor modalities [17]. Real-time methods based on feature points [9] or dense, per-pixel tracking [21] are generally designed to provide as good as possible results with a small input lag, rather than a final, fully consistent and high quality reconstruction that globally optimizes the poses of all input frames. CoSLAM [35] combines data from cooperatively acquired videos as long as some of the cameras see the same content at the same time. Other approaches achieve real time performance, but only on preconstructed scenes [18], i.e., with known geometry.

Recently, a direct SLAM method (LSD-SLAM) was proposed [10], which does not require detection and tracking of feature points, but instead recovers sparse depth maps based directly on epipolar line scanning. However, such an approach does not scale well to high image resolutions, as it requires depth estimates for many pixels. In addition, depth recovery is very sensitive to accurate intrinsic calibration. Our approach instead focuses on a subset of reliable feature tracks, which is more efficient and less sensitive to image distortions, especially for high resolution input.

Specific light field calibration techniques have been proposed for dense spatio-temporal-angular sampling using camera arrays [13, 15, 31] and plenoptic cameras [22]. However, these methods generally focus on the static geometric calibration of a light field, rather than computing both structure and motion, and hence cannot be applied to the acquisition scenarios we discuss in this paper. The work on unstructured light field acquisition [8] explores this to some extent, but only supports small scale scenes and focuses on an interactive interface for guiding the user during the acquisition process.
Our method focuses on densely sampled image sequences and overcomes several of the previously mentioned limitations. This results in an SfM approach that is stable and globally consistent over long, high resolution sequences, while still being able to robustly handle wide baseline matches.

3. Method

The input to our method is one or more image sequences. We focus on extrinsic calibration and assume the intrinsics to be fixed and known (in practice they can be computed from a few frames of the image sequences by using Bundler [2]).

On a high level our strategy is as follows. First, we perform a modified 2D tracking of feature points utilizing data coherence to reduce drift. Next, we apply a window BA strategy on a set of confident frames only. These are frames that are well connected via continuous tracks. To incorporate loop closing, we further establish global anchor links between carefully selected frame pairs of different parts of the video or even different video streams altogether. In addition to these global constraints, relative camera pose constraints from the window BA are integrated with an efficient linear camera pose estimation [14]. We then perform global BA, and finally add all the less confident images by interpolation and BA of their poses. During this step, we keep the scene structure fixed as determined by the confident images. The final result is a globally consistent calibration of all input frames from all input sequences.

Figure 1 shows an overview of the key steps of our algorithm. The following sections discuss each step in detail.

[Figure 1 diagram: Feature detection / tracking; Camera pose constraints; Global anchor constraints; Linear camera pose estimation; Final BA optimization.]
Figure 1. Algorithm overview. The feature detection stage computes KLT tracks for window BA and SIFT features for wide baseline handling. The window BA sweeps through the input, selects confident frames and computes camera pose constraints between interleaving sets of the selected frames. The global anchor stage uses the SIFT features to establish global links between different sections of the sequence. A linear camera pose estimation produces an initial arrangement of cameras which is further refined by bundle adjustment steps.

3.1. Drift reduced feature tracking

Detectors optimized for wide baseline matching such as SIFT [19] compute incoherent feature point sets even between neighboring video frames. For continuous sequences, feature point tracking produces more reliable and efficient results. We build on the standard KLT tracker implemented in OpenCV [3], which is also the basis for earlier video-centered SfM techniques [27]. There are, however, two limitations of standard continuous KLT that have to be addressed in our application setting.

Firstly, we observed that for densely sampled video sequences, feature tracks that are visible for hundreds of frames exhibit noticeable drift. Note that for high frame rate cameras, this often corresponds to just a second of video. We therefore modify the basic tracking to perform a simple drift correction: when adding a frame, we track each feature from the previous frame to the new one, and then refine the feature position in the new frame using the original frame where the feature was detected. In our experiments this simple modification led to considerably reduced drift and higher reconstruction quality (see Figure 2).

Figure 2. Influence of KLT drift on the reconstruction result. Note how drift free KLT tracks reduce the drift of the camera positions as well as the average reprojection error.
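As an illustration, the following minimal sketch shows this two-step drift correction on top of OpenCV's pyramidal Lucas-Kanade tracker; the Track bookkeeping, the parameter values, and the function names are illustrative assumptions rather than our actual implementation.

    import cv2
    import numpy as np

    LK = dict(winSize=(21, 21), maxLevel=3,
              criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 30, 0.01))

    class Track:
        """One KLT track: the grayscale frame it was detected in, its position
        there, and its position in the most recent frame."""
        def __init__(self, ref_gray, ref_pt):
            self.ref_gray = ref_gray                       # detection frame (template source)
            self.ref_pt = ref_pt.reshape(1, 1, 2).astype(np.float32)
            self.cur_pt = self.ref_pt.copy()
            self.alive = True

    def advance_tracks(tracks, prev_gray, new_gray):
        """Advance all live tracks from prev_gray to new_gray with drift correction:
        a frame-to-frame KLT step gives an initial guess, which is then refined
        against the original detection frame of each track."""
        alive = [t for t in tracks if t.alive]
        if not alive:
            return
        pts_prev = np.concatenate([t.cur_pt for t in alive])
        # Step 1: standard frame-to-frame KLT step.
        pts_new, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, new_gray, pts_prev, None, **LK)
        for t, p, ok in zip(alive, pts_new, status.ravel()):
            if not ok:
                t.alive = False
                continue
            # Step 2: re-localize against the original template, using the
            # frame-to-frame result only as the initial guess.
            guess = p.reshape(1, 1, 2).astype(np.float32)
            refined, ok_ref, _ = cv2.calcOpticalFlowPyrLK(
                t.ref_gray, new_gray, t.ref_pt, guess,
                flags=cv2.OPTFLOW_USE_INITIAL_FLOW, **LK)
            if ok_ref.ravel()[0]:
                t.cur_pt = refined       # drift-corrected position in the new frame
            else:
                t.alive = False

In a full implementation the second, template-based pass would be batched per detection frame; it is written per track here only for clarity. The point of the scheme is that small per-frame errors are no longer chained over hundreds of frames.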
Secondly, simply tracking points over an image sequence cannot guarantee any form of global consistency of the reconstructed cameras and scene. For example, when the camera revisits the same scene elements multiple times over a longer image sequence with intermediate occlusions, a single scene point will be represented by multiple, individually tracked and reconstructed points. This is the so-called loop-closing problem in SLAM. For each feature track we therefore extract SIFT descriptors [19] in confident frames after the window BA, which are later used to re-identify points and for the generation of global anchor constraints.
3.2. Interleaved window bundle adjustment

or jointly processing multiple individual sequences, both of which will be addressed later.

3.2.1 Window initialization
Initializing camera geometry is usually accomplished using n-point algorithms [12], which are (in contrast to BA) limited in the number of constraints and therefore generally not sufficiently accurate given the very small camera baselines encountered in high framerate video sequences. Our method is inspired by the approach of Yu and Gallup [33] designed for accidental small baseline camera motion.

We initialize a window by picking the first N consecutive images from an image sequence and immediately perform a BA step using the parameterization proposed in [33], where points are represented by inverse depth values projected from a reference frame (we use the center image in the window). We found, however, that identical initialization of all cameras [33] may cause BA to get stuck in local minima. According to our observations, this can reliably be avoided by starting from different linearly displaced configurations (see Figure 3) and optimizing first for the camera orientation and then for all extrinsics. Finally we pick the best result in terms of reprojection error. Moreover, we observed more robust results when initializing scene points with uniform instead of random depth [33]. The original method of Yu and Gallup requires a comparably large number of images for robust convergence. With our above modifications we observed stable convergence already with N = 11. For high frame rate handheld video, spacing between frames (e.g., 3 in our experiments) for slightly increased baselines led to improved convergence.
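To make the parameterization concrete, the sketch below spells out one possible reading of the inverse-depth representation and the multi-start initialization. The use of scipy.optimize.least_squares, the number and range of displaced starts, and the helper names are illustrative assumptions; the two-stage orientation-then-extrinsics optimization and the gauge fixing of the overall scale are omitted for brevity.

    import numpy as np
    from scipy.optimize import least_squares

    def rodrigues(rvec):
        """Axis-angle vector to rotation matrix."""
        theta = np.linalg.norm(rvec)
        if theta < 1e-12:
            return np.eye(3)
        k = rvec / theta
        Kx = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
        return np.eye(3) + np.sin(theta) * Kx + (1.0 - np.cos(theta)) * (Kx @ Kx)

    def window_residuals(params, K, obs, ref_idx):
        """Reprojection residuals for a small window. obs: (N, P, 2) pixel positions
        of P tracked points in N frames. Each point is parameterized by one inverse
        depth in the reference (center) frame; every other camera by a 6-DoF pose
        relative to that frame."""
        n_cams, n_pts, _ = obs.shape
        poses = params[:6 * (n_cams - 1)].reshape(n_cams - 1, 6)
        inv_depth = params[6 * (n_cams - 1):]
        rays = (np.linalg.inv(K) @ np.hstack([obs[ref_idx], np.ones((n_pts, 1))]).T).T
        X_ref = rays / inv_depth[:, None]            # back-projected points, reference frame
        res, cam = [], 0
        for i in range(n_cams):
            if i == ref_idx:
                continue
            R, t = rodrigues(poses[cam, :3]), poses[cam, 3:]
            cam += 1
            Xc = X_ref @ R.T + t
            proj = Xc @ K.T
            proj = proj[:, :2] / proj[:, 2:3]
            res.append((proj - obs[i]).ravel())
        return np.concatenate(res)

    def initialize_window(K, obs, ref_idx, n_starts=5):
        """Multi-start small-baseline initialization: uniform inverse depth for all
        points, cameras displaced linearly along one axis by a different amount per
        start, keeping the solution with the lowest residual cost."""
        n_cams, n_pts, _ = obs.shape
        best = None
        for step in np.linspace(-0.02, 0.02, n_starts):
            poses = np.zeros((n_cams - 1, 6))
            poses[:, 3] = step * np.arange(1, n_cams)   # linear displacement
            x0 = np.concatenate([poses.ravel(), np.ones(n_pts)])
            sol = least_squares(window_residuals, x0, args=(K, obs, ref_idx))
            if best is None or sol.cost < best.cost:
                best = sol
        return best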
The next step is to grow this window. To this end, we first describe a subsampling scheme of the scene points that allows us to reduce the BA computational cost significantly at similar reconstruction quality.

3.2.2 Scene point subsampling

Following the observation that BA requires a certain minimum number of scene points but does not improve significantly with many more points, we employ a subsampling scheme on the available scene points in all BA steps. We found that sampling points randomly can lead to unstable or underconstrained optimization. We therefore choose point samples according to three rules explained below, achieving similar reconstruction quality at a fraction of the optimization time.

First, a minimum number τ of points should be visible from each camera. Second, the reprojected 2D positions of the points should be uniformly distributed in all camera images. Finally, the points should be visible in “sufficiently many” images, since in general a point provides more reliable constraints when seen by more cameras. At the same time, however, we observed that points visible in a very large number of frames, i.e., points with very long 2D tracks, are more likely to be affected by tracking errors along object silhouettes, corrupting the result. Each point is therefore picked with a probability proportional to the length of its track, capped at 10 frames. This subsampling strategy resulted in a 3-fold speedup without sacrificing result quality (see Figure 4). For all experiments, we use a value of τ = 100.

Figure 4. Comparison of reconstruction with and without point subsampling. Our subsampling strategy leads to comparable reconstruction results while reducing the computation time by roughly a factor of 3. This factor is expected to increase for larger image resolutions since more points can be excluded there.
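One possible reading of this sampling rule in code is sketched below; the overall sampling budget and the greedy per-camera top-up are our own simplifications (the uniform image-space distribution rule is omitted), and all names are illustrative.

    import numpy as np

    def subsample_points(track_len, visibility, tau=100, cap=10, rng=None):
        """track_len: (P,) length of each point's 2D track.
        visibility: list of P arrays holding the camera indices observing each point.
        Returns indices of the selected scene points."""
        rng = np.random.default_rng() if rng is None else rng
        P = len(track_len)
        weights = np.minimum(track_len, cap).astype(float)
        probs = weights / weights.sum()
        # Base sample: probability proportional to the (capped) track length.
        budget = max(tau, P // 4)                     # illustrative overall budget
        base = rng.choice(P, size=min(budget, P), replace=False, p=probs)
        selected = set(base.tolist())
        # Top-up: make sure every camera still sees at least tau selected points.
        n_cams = 1 + max(c for vis in visibility for c in vis)
        seen = np.zeros(n_cams, dtype=int)
        for j in selected:
            seen[np.asarray(visibility[j])] += 1
        for cam in range(n_cams):
            if seen[cam] >= tau:
                continue
            candidates = [j for j in range(P)
                          if cam in set(visibility[j]) and j not in selected]
            rng.shuffle(candidates)
            for j in candidates[: tau - seen[cam]]:
                selected.add(j)
                seen[np.asarray(visibility[j])] += 1
        return np.array(sorted(selected))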
3.2.3 Confidence criteria

For efficiency reasons, and to improve result quality, we compute scene structure and camera poses initially only for a sparse set of confident frames. In order to find this set, we test cameras with a linearly increasing step size and add the furthest possible frame fulfilling a set of confidence criteria. We define the following three confidence criteria ξ1, ξ2, ξ3 for measuring whether a tested camera c is suitable for window BA.
The first term ξ1 measures the number of features of the camera that can be matched to the points pi of the window with a low reprojection error. This ensures that there are sufficiently many constraints for BA.

ξ2 represents how far the camera moved around the scene points. We use the median of all points’ angular differences φ(→pcn, →pcp) between the vectors from a point p to the new camera cn and to the previous camera cp. This term makes sure that two cameras are not too far apart from each other and ensures that the visual appearance of the feature points does not change too much, so that the next confident frame has mostly the same feature tracks.

The last term ξ3 is set to the median reprojection error ē of the tested camera ct and its visible points pt: ξ3 = ē(ct, pt). This ensures that no cameras are added to the optimization which are too inconsistent with the content of the window.

We label a camera as sufficiently confident when the following criteria are fulfilled: ξ1 ≥ 30, ξ2 ≤ 5°, ξ3 ≤ 5px, i.e., the camera must be linked to at least 30 points, must not rotate more than five degrees around at least half of these points, and at least half of the points must have less than five pixels reprojection error. Similarly, a camera is labeled as candidate for removal from the current window as soon as it does not satisfy the following confidence constraints anymore: ξ1 ≥ 70, ξ2 ≤ 10°, i.e., at least 70 points and less than ten degrees of camera rotation around at least 50% of the points.
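The sketch below evaluates the three criteria for a candidate camera whose pose has already been fitted against the current window; the thresholds follow the text, while the data layout and the reuse of the 5px bound to decide which features count as "matched" for ξ1 are assumptions.

    import numpy as np

    def confidence(K, R_new, t_new, R_prev, t_prev, pts3d, obs2d,
                   xi1_min=30, xi2_max_deg=5.0, xi3_max_px=5.0):
        """pts3d: (P, 3) window scene points; obs2d: (P, 2) their detections in the
        candidate frame. Poses map world to camera coordinates (X_cam = R X + t).
        Returns (xi1, xi2, xi3, is_confident)."""
        # Reprojection errors of the candidate camera against the window points.
        Xc = pts3d @ R_new.T + t_new
        proj = Xc @ K.T
        proj = proj[:, :2] / proj[:, 2:3]
        err = np.linalg.norm(proj - obs2d, axis=1)
        xi1 = int(np.sum(err < xi3_max_px))          # matched points with low error
        xi3 = float(np.median(err))                  # median reprojection error
        # Median angular change of the point-to-camera vectors between the
        # previous confident camera and the candidate.
        c_new = -R_new.T @ t_new                     # camera centers in world coords
        c_prev = -R_prev.T @ t_prev
        v_new = c_new - pts3d
        v_prev = c_prev - pts3d
        cosang = np.sum(v_new * v_prev, axis=1) / (
            np.linalg.norm(v_new, axis=1) * np.linalg.norm(v_prev, axis=1))
        xi2 = float(np.degrees(np.median(np.arccos(np.clip(cosang, -1.0, 1.0)))))
        ok = xi1 >= xi1_min and xi2 <= xi2_max_deg and xi3 <= xi3_max_px
        return xi1, xi2, xi3, ok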
3.2.4 Window processing

Given the confidence criteria for addition and removal, in each iteration of the algorithm, we first remove images labeled for removal from the current window, keeping a minimum of 5 cameras in the window at all times. After this step, the current window usually contains about five to ten confident cameras.

However, for some camera (sub-)trajectories, stable windows can be much larger. To retain the efficiency of BA while keeping as much information as possible, we select a subset of the cameras in the window on which to perform the actual BA. We pick cameras with increased spacing for older images (see Figure 5). This subset is then optimized using standard BA.

After BA, all cameras in the current window are made consistent with a linear camera pose estimation technique [14], using the relative camera pose constraints of former windows. This solver works on the camera poses only, producing faster results than BA at comparable quality as long as the input is consistent.

Figure 5. Selecting keyframes for interleaved window BA. Offsets between the keyframes selected for BA increase linearly towards older frames. To determine a consistent pose for a camera which was not part of the BA, we use the relative pose constraints that were generated in previous windows where the camera was part of the BA.

We experimented with various offsetting strategies besides the growth strategy described above. The linear increase provided the best results in terms of algorithm stability, camera sampling, and computation time. The output of this stage is a set of camera pose constraints from each window, which we will later use for initializing the global scene optimization. We also keep all the windows for finding global anchor constraints as described in the following section.
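A small helper illustrating one way to pick such keyframes, densest near the newest frame; the exact growth rate of the offsets is not specified above, so the unit step increment is an assumption.

    def select_keyframes(window_frames, max_keyframes=8):
        """window_frames: frame indices in the current window, oldest to newest.
        Returns a subset whose spacing grows linearly towards older frames,
        i.e., dense near the newest frame, sparse for older ones."""
        selected = []
        pos = len(window_frames) - 1    # start from the newest frame
        step = 1
        while pos >= 0 and len(selected) < max_keyframes:
            selected.append(window_frames[pos])
            pos -= step
            step += 1                   # offsets 1, 2, 3, ... towards older frames
        return list(reversed(selected))

    # Example: a window of 20 consecutive frames.
    print(select_keyframes(list(range(100, 120))))
    # -> [104, 109, 113, 116, 118, 119]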
3.3. Global anchor constraints

The goal of these constraints is to establish global links between different parts (possibly different subsequences) of a video that have shared scene content. These links can later be used in the linear camera pose estimation stage to obtain a good global initialization.

We establish these constraints by importance sampling frame pairs from the set of confident frames and by joining them based on SIFT features and the previously reconstructed window scene structure. To do this, we extract SIFT descriptors for all KLT features in the confident frames, and for each pair, try to integrate the camera of one frame using the scene structure associated with another frame using BA. The optimized camera pose is rated based on a confidence measure. Stable matching pairs among all possible confident frame pairs in the video sequence(s) are used as relative camera pose constraints for the global linear pose estimation stage (see Figure 1).

3.3.1 Camera stability

The stability of cameras for being used as global anchor constraints is based on the following measures ζ:

• The number of remaining points attached to a camera: ζ1 = n. This makes sure that there are enough constraints for optimization.

• The distribution of the point projections ρ in the image: ζ2 = min(Std(ρx), Std(ρy)). This avoids unstable configurations with very localized feature positions.

• The ratio between the smallest and the largest principal component PCA(p)min, PCA(p)max of the scene point positions p: ζ3 = PCA(p)min / PCA(p)max. This avoids using two-dimensional scenes, which tend to be ambiguous (e.g., camera in plane) or unstable (e.g., frontoparallel plane).
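These measures translate almost directly into code. The sketch below assumes the surviving point set and its projections in the anchored frame are given; it uses the singular values of the centered point cloud for ζ3, and takes the smaller image dimension for the ζ2 threshold, which the text leaves unspecified.

    import numpy as np

    def camera_stability(pts3d, proj2d):
        """pts3d: (n, 3) scene points still attached to the camera after verification;
        proj2d: (n, 2) their projections in the image. Returns (zeta1, zeta2, zeta3)."""
        zeta1 = len(pts3d)                                        # remaining points
        zeta2 = min(np.std(proj2d[:, 0]), np.std(proj2d[:, 1]))   # spread in the image
        # Principal components of the 3D point cloud: smallest over largest.
        centered = pts3d - pts3d.mean(axis=0)
        sing = np.linalg.svd(centered, compute_uv=False)          # descending order
        zeta3 = (sing[-1] / sing[0]) if sing[0] > 0 else 0.0
        return zeta1, zeta2, zeta3

    def is_stable(zeta1, zeta2, zeta3, image_size):
        """Thresholds from the geometric verification below; taking the smaller
        image dimension for the zeta2 bound is an assumption."""
        return zeta1 >= 25 and zeta2 >= 0.075 * min(image_size) and zeta3 >= 0.1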
3.3.2 Anchor selection

To find good anchor candidates for wide baseline links we use two measures: the cost for matching two frames, and the uncertainty of relative camera poses computed during the window BA.

Cost estimation. For robust estimation of the basic linking cost, we compute frame similarity based on histograms of SIFT features [30]. The output is a cost matrix C representing the cost for matching two frames of a video sequence (see Figure 6).

Uncertainty estimation. For uncertainty estimation, we approximate the variance of the camera poses for every pair of confident cameras in the following three steps:

1. Estimate the variance of each camera's pose ci relative to each window's structure, i.e., the 3D points computed from cameras in window wj:

   Var(ci, wj) ∝ 1 / (min(25, ζ1) · ζ2 · ζ3)²    (1)

   We assume that 25 reprojections are sufficiently many constraints.

2. Use this information to estimate the variance between windows by averaging the summed variances to common camera poses:

   Var(wj1, wj2) = (1/n²) · Σ i=1..n [ Var(ci, wj1) + Var(ci, wj2) ]    (2)

3. Find the camera→window→...→window→camera path with the lowest summed variance for each camera pair. While step 2 only considers variances for windows that share a camera, this step propagates the variance information to arbitrary indirectly connected camera pairs.
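Interpreted in code, the three steps amount to per camera/window variances (Eq. 1), pairwise window variances over shared cameras (Eq. 2), and a shortest-path propagation over a joint camera/window graph. Using scipy's shortest_path for step 3 and the dictionary-based bookkeeping are our own illustrative choices.

    import numpy as np
    from scipy.sparse.csgraph import shortest_path

    def camera_window_variance(zeta1, zeta2, zeta3):
        """Eq. (1): variance of one camera pose relative to one window's structure."""
        return 1.0 / (min(25.0, zeta1) * zeta2 * zeta3) ** 2

    def window_window_variance(var_cw, cams_j1, cams_j2, j1, j2):
        """Eq. (2): average of the summed variances over the n cameras the two
        windows have in common; var_cw[(i, j)] holds Var(c_i, w_j)."""
        common = set(cams_j1) & set(cams_j2)
        n = len(common)
        if n == 0:
            return None
        return sum(var_cw[(i, j1)] + var_cw[(i, j2)] for i in common) / n ** 2

    def propagate_variances(n_cams, n_windows, var_cw, var_ww):
        """Step 3: treat cameras and windows as nodes of one graph, use Eqs. (1)
        and (2) as edge weights, and take the lowest summed variance over all
        camera -> window -> ... -> window -> camera paths (zero entries = no edge)."""
        n = n_cams + n_windows
        W = np.zeros((n, n))
        for (i, j), v in var_cw.items():               # camera <-> window edges
            W[i, n_cams + j] = W[n_cams + j, i] = v
        for (j1, j2), v in var_ww.items():             # window <-> window edges
            if v is not None:
                W[n_cams + j1, n_cams + j2] = W[n_cams + j2, n_cams + j1] = v
        dist = shortest_path(W, method='D', directed=False)
        return dist[:n_cams, :n_cams]                  # variance matrix V over camera pairs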
This results in a variance matrix V (see Figure 6). We can now estimate a matrix S representing potential anchor frames to be used as global links:

   S = (1 − C) ◦ V    (3)

Note that Cij ∈ (0, 1) and ◦ is the element-wise product of matrices. We importance sample S to get frame pairs (f1, f2) that represent useful anchor constraints.

Figure 6. Cost, variance and sampling matrices for wide baseline candidate picking. The camera circled an object twice. Dark parts of C indicate regions where good global anchor constraints are likely to be found. S shows where we sample for global anchor constraints.
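A compact sketch of Eq. (3) and the subsequent importance sampling, assuming C has already been normalized to (0, 1) and V computed as above; skipping temporally adjacent frame pairs via min_gap is an addition of ours, not part of the method description.

    import numpy as np

    def sample_anchor_pairs(C, V, n_pairs=20, min_gap=30, rng=None):
        """C: (F, F) matching cost in (0, 1); V: (F, F) pose variance between
        confident frames. Returns (f1, f2) candidate anchor pairs drawn with
        probability proportional to S = (1 - C) * V (element-wise)."""
        rng = np.random.default_rng() if rng is None else rng
        F = C.shape[0]
        S = (1.0 - C) * V                        # Eq. (3), element-wise product
        # Keep each unordered pair once and ignore temporally adjacent frames.
        i, j = np.triu_indices(F, k=min_gap)
        weights = np.where(np.isfinite(S[i, j]), S[i, j], 0.0)
        probs = weights / weights.sum()
        picks = rng.choice(len(weights),
                           size=min(n_pairs, np.count_nonzero(probs)),
                           replace=False, p=probs)
        return [(int(i[k]), int(j[k])) for k in picks]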
Geometrical verification. To ensure that a global anchor constraint is truly useful, we perform a geometrical verification. We pick the window with the most available scene points that contains f1 and BA for the pose of f2's camera based on those points, utilizing SIFT matches for linking f2's features to f1's points.

In our experiments we observed that up to 40% of the matches were outliers when matching SIFT features extracted from KLT keypoints. Therefore, we exploit the already known scene geometry to gain robustness in this process. We apply four passes of BA for the camera pose parameters while removing all the points with reprojection errors worse than the average between the passes. Since BA tends to prefer consistent constraints, inconsistent reprojections are removed by this procedure. If there is not enough consistent data, BA diverges, which leads to a violation of our stability constraints. We consider the geometric verification successful if it passes the following stability thresholds: ζ1 ≥ 25, ζ2 ≥ 0.075 · ImageSize and ζ3 ≥ 0.1, which worked well in all our experiments. When a pair of frames representing a global anchor constraint fulfils these thresholds, we add the respective relative camera pose constraints to the existing set of constraints. In all our experiments, these thresholds reliably removed all outliers.
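The verification loop could be sketched as follows, with OpenCV's solvePnP acting as a stand-in for the pose-only BA described above; the four passes and the mean-error cutoff follow the text, everything else is illustrative (K is a 3x3 float camera matrix, distortion is assumed to be zero).

    import cv2
    import numpy as np

    def verify_anchor(K, pts3d, pts2d, n_passes=4):
        """pts3d: (M, 3) scene points of the window containing f1, matched via SIFT
        to pts2d: (M, 2) feature positions in f2. Returns (rvec, tvec, inlier_mask)
        of f2's camera, or None if too few consistent matches survive."""
        dist = np.zeros(5)                       # images are assumed undistorted
        mask = np.ones(len(pts3d), dtype=bool)
        rvec = tvec = None
        for _ in range(n_passes):
            if mask.sum() < 6:
                return None
            ok, rvec, tvec = cv2.solvePnP(pts3d[mask].astype(np.float64),
                                          pts2d[mask].astype(np.float64), K, dist)
            if not ok:
                return None
            proj, _ = cv2.projectPoints(pts3d.astype(np.float64), rvec, tvec, K, dist)
            err = np.linalg.norm(proj.reshape(-1, 2) - pts2d, axis=1)
            # Drop every match whose reprojection error is worse than the current mean.
            mask &= err <= err[mask].mean()
        return rvec, tvec, mask

The surviving point set and its projections would then be fed to the stability measures ζ1, ζ2, ζ3 described above to accept or reject the anchor pair.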
Figure 7 illustrates the effect of using the anchor constraints, based on sampling costs C and S, which additionally takes into account the variance matrix V. Using anchor constraints considerably reduces camera drift. By taking V into account in addition to the basic matching cost, drift can be reduced by another 40%.

3.4. Final optimization

The window BA and the global anchors now provide a large set of pose constraints. Using all these constraints we again apply global linear optimization [14] in order to compute a globally consistent 3D scene and camera calibration for all input frames.

We then apply a series of nonlinear least squares optimization passes based on the following three strategies:

A No Field of View (FoV) optimization, no bad point removal.
Figure 7. Comparison of loop closing strategies. Beside camera drift and reprojection error, we also show the number of samples needed to get 20 verified wide baseline links and the computation time for wide baseline handling. Without loop closing, the camera drift is quite high (out of scale: 0.08). Cost based frame selection for wide baseline handling reduces the drift drastically (use). Choosing frames also based on their value for wide baseline handling (encoded in V) reduces the drift by another 40% for a fair amount of extra samples/runtime (use+cost). Note that the reprojection error increases because of the extra constraints that have to be satisfied.

[Figure 9 plots: standard deviation and processing time for 176 and 637 frames, including and excluding tracking; legend: PTAM, LSD, Voodoo, VisSfM.]
Figure 9. Evaluation based on synthetic ground truth. We give the standard deviation of the reconstructions fitted to the ground truth with an affine transformation, plus timings. Our approach runs orders of magnitude faster than other SfM systems while producing results which are an order of magnitude more accurate than SLAM systems. PTAM failed after 176 frames due to too slow map update.

Figure 10. Breakdown of reconstruction timings to the individual pipeline parts.

Figure 11. Computation time comparison for a FullHD (1080p) image sequence. Our technique exhibits the lowest computation time of all tested approaches. Note that if only the SfM part without feature processing is considered, we are about 6x faster than the nearest competitor. For 400, 800 and 1600 frames, we did not obtain reconstruction timings from all methods. Separate timings for IO/features and SfM reconstruction could not be obtained with Voodoo.
typical to SfM. However, Figure 9 shows that it runs orders of magnitude faster than SfM systems, at comparable speed to current SLAM systems, while producing results which are an order of magnitude more accurate than SLAM systems.

Timings. Figure 10 breaks down the reconstruction timings for the scenes used in this paper to the individual parts of our pipeline. Reconstruction timings of our approach are further compared with several other techniques in Figure 11, analyzing timings for varying numbers of frames of the FullHD outdoor video sequence. We compare to two approaches designed for handling images (Bundler [2] and VisualSfM [4], GPU accelerated and parallelized) as well as two approaches for video sequences (Voodoo Camera Tracker and the recent LSD-SLAM [10]).

Methods intended for sparse, unstructured data suffer from n² runtime for searching corresponding images. The Voodoo Camera Tracker performs well for small tracks but becomes much slower when BA has to correct accumulated drift in the end. Even in comparison to recent efficient SLAM approaches such as LSD-SLAM, our method is faster. We also observed that increased image resolution can lead to significant drops in performance for the tested methods, whereas our method scales well due to the proposed subsampling. We expect further significant speed gains from improved preprocessing such as feature extraction and tracking, as the majority of computing time is spent on these steps, and not on our core optimization procedure (Figure 11).

Please also refer to the supplemental material on the project webpage [1] for reconstruction results on several other scenes, including very high resolution 5k video, multiple video sequence reconstructions and reconstructions from the Stanford Light Field datasets.

Limitations and future work. Our method currently computes only extrinsic camera parameters. As future work it would be interesting to support uncalibrated cameras with changing intrinsics. Moreover, the algorithm is limited by some of the components used. For instance, replacing the current OpenCV KLT tracking by a GPU based implementation and improving our point subsampling strategy, e.g., using stratified sampling, could lead to improved reconstruction quality and speed.

5. Conclusion

We introduced a novel pipeline that enables efficient computation of extrinsic camera poses and scene structure on high spatiotemporal resolution, densely sampled video sequences. One of the key insights in this work is that the coherence of such data enables the use of modified tracking, subsampling, and global optimization schemes, which in combination allow for considerably faster and more robust computation, similar to observations made in previous works [15, 33] in the context of 3D reconstruction. In particular we found that common choices in SfM such as n-point algorithms for initialization are problematic in this context and can be entirely replaced by BA-based approaches.

Given the constant increase of camera resolution and frame rate, and the advent of light field sensors by companies such as Lytro or Pelican Imaging, we believe that algorithms specifically designed for densely sampled input represent a great opportunity for future research in this area.

References

[1] http://www.disneyresearch.com/project/scalablesfm.
[2] Bundler Structure from Motion Toolkit. https://github.com/snavely/bundler_sfm. [Online; accessed 09-Nov-2014].
[3] Open Source Computer Vision Library. http://opencv.org/. [Online; accessed 09-Nov-2014].
[4] VisualSFM: A Visual Structure from Motion System. http://ccwu.me/vsfm/. [Online; accessed 09-Nov-2014].
[5] S. Agarwal, N. Snavely, S. M. Seitz, and R. Szeliski. Bundle adjustment in the large. In ECCV, 2010.
[6] S. Agarwal, N. Snavely, I. Simon, S. M. Seitz, and R. Szeliski. Building Rome in a day. In ICCV, 2009.
[7] G. Bradski. The OpenCV Library. Dr. Dobb's Journal of Software Tools, 2000.
[8] A. Davis, M. Levoy, and F. Durand. Unstructured light fields. Comp. Graph. Forum, 31(2pt1):305–314, May 2012.
[9] A. J. Davison, I. D. Reid, N. Molton, and O. Stasse. MonoSLAM: Real-time single camera SLAM. IEEE TPAMI, 29(6):1052–1067, 2007.
[10] J. Engel, T. Schöps, and D. Cremers. LSD-SLAM: Large-scale direct monocular SLAM. In ECCV, 2014.
[11] J. Frahm, P. F. Georgel, D. Gallup, T. Johnson, R. Raguram, C. Wu, Y. Jen, E. Dunn, B. Clipp, and S. Lazebnik. Building Rome on a cloudless day. In ECCV, 2010.
[12] R. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, 2006.
[13] M. B. Hullin, J. Hanika, B. Ajdin, H.-P. Seidel, J. Kautz, and H. P. A. Lensch. Acquisition and analysis of bispectral bidirectional reflectance and reradiation distribution functions. ACM Trans. Graph., 29(4):97:1–97:7, July 2010.
[14] N. Jiang, Z. Cui, and P. Tan. A global linear method for camera pose registration. In ICCV, pages 481–488, 2013.
[15] C. Kim, H. Zimmer, Y. Pritch, A. Sorkine-Hornung, and M. Gross. Scene reconstruction from high spatio-angular resolution light fields. ACM Trans. Graph., 32(4):73:1–73:12, 2013.
[16] G. Klein and D. Murray. Parallel tracking and mapping for small AR workspaces. In ISMAR, 2007.
[17] M. Li and A. I. Mourikis. High-precision, consistent EKF-based visual-inertial odometry. Int. J. Robotics Research, 32(6):690–711, 2013.
[18] H. Lim, S. N. Sinha, M. F. Cohen, M. Uyttendaele, and H. J. Kim. Real-time monocular image-based 6-DoF localization. Int. J. Robotics Research, 2014.
[19] D. G. Lowe. Distinctive image features from scale-invariant keypoints. IJCV, 60(2):91–110, 2004.
[20] D. Martinec and T. Pajdla. Robust rotation and translation estimation in multiview reconstruction. In CVPR, 2007.
[21] R. A. Newcombe, S. Lovegrove, and A. J. Davison. DTAM: Dense tracking and mapping in real-time. In ICCV, 2011.
[22] R. Ng. Digital Light Field Photography. PhD thesis, 2006.
[23] R. Parys and A. Schilling. Incremental large scale 3D reconstruction. In 3DIMPVT, 2012.
[24] M. Pollefeys, L. J. V. Gool, M. Vergauwen, F. Verbiest, K. Cornelis, J. Tops, and R. Koch. Visual modeling with a hand-held camera. IJCV, 59(3):207–232, 2004.
[25] N. Snavely, S. Seitz, and R. Szeliski. Skeletal graphs for efficient structure from motion. In CVPR, 2008.
[26] N. Snavely, S. M. Seitz, and R. Szeliski. Modeling the world from internet photo collections. IJCV, 80(2):189–210, 2008.
[27] R. Szeliski and S. B. Kang. Recovering 3D shape and motion from image streams using nonlinear least squares. In CVPR, 1993.
[28] T. Roosendaal (Producer). Sintel. Blender Foundation, Durian Open Movie Project. http://www.sintel.org/, 2010.
[29] T. Tuytelaars and K. Mikolajczyk. Local invariant feature detectors: A survey. Foundations and Trends in Computer Graphics and Vision, pages 177–280, 2007.
[30] O. Wang, C. Schroers, H. Zimmer, M. H. Gross, and A. Sorkine-Hornung. VideoSnapping: Interactive synchronization of multiple videos. ACM Trans. Graph., 33(4):77, 2014.
[31] B. Wilburn, N. Joshi, V. Vaish, E.-V. Talvala, E. Antunez, A. Barth, A. Adams, M. Horowitz, and M. Levoy. High performance imaging using large camera arrays. ACM Trans. Graph., 24(3):765–776, 2005.
[32] K. Wilson and N. Snavely. Robust global translations with 1DSfM. In ECCV, 2014.
[33] F. Yu and D. Gallup. 3D reconstruction from accidental motion. In CVPR, 2014.
[34] C. Zhang, J. Gao, O. Wang, P. Georgel, R. Yang, J. Davis, J. Frahm, and M. Pollefeys. Personal photograph enhancement using internet photo collections. IEEE Trans. Vis. Comput. Graph., 20(2):262–275, 2014.
[35] D. Zou and P. Tan. CoSLAM: Collaborative visual SLAM in dynamic environments. IEEE TPAMI, 35(2):354–366, 2013.