User-Perspective Augmented Reality Magic Lens From Gradients
Figure 1: An example Augmented Reality application showcasing the difference between user-perspective and device-perspective magic lens
interfaces. (a) Real world environment only. (b) Augmented Reality scene with the conventional device-perspective magic lens. (c) AR scene
rendered with our user-perspective magic lens prototype.
…to work outdoors in strong sunlight. Another approach to scene reconstruction is stereo vision, where the depth of the scene is reconstructed by matching two views of a stereo camera pair. The advantage of stereo reconstruction is that it can work with standard cameras, it does not need active illumination, and there are no major restrictions with regard to outdoor scenes.

Some stereo reconstruction algorithms can provide quite accurate depth maps, but this comes at a performance penalty. Fully accurate depth maps cannot yet be achieved at frame rate. Real-time stereo can produce depth maps that are sufficient for many applications, but they are not very good for the purpose of re-rendering a real world scene. Typically, real-time stereo approaches achieve speed by using a small depth range (limiting the number of different depth values), resulting in a scene model composed of distinct front-facing planes. Re-rendering this model from a new point of view can result in a scene composed of obvious distinct layers.

In this paper we present a new approach to solving the problem of creating a user-perspective magic lens. We observe that accurate dense scene reconstruction is a requirement imposed by the traditional rendering methods, and not an inherent requirement of creating a user-perspective view. By taking a different approach to rendering, we lower the requirements for reconstruction while still achieving good results. We do this by using image-based rendering (IBR) [Shum and Kang 2000]. IBR can produce high quality results with only limited scene models by leveraging existing imagery of the scene. This fits very well with the nature of our problem.

The key to our approach is the adoption of a recent gradient domain IBR algorithm [Kopf et al. 2013], which we pair with a novel semi-dense stereo matching algorithm we developed. The IBR algorithm we use renders from the gradients in the image instead of the pixel color values. It achieves good results as long as the depth estimates of the strongest gradients are good, even if the depths of the weak gradients are incorrect. This fits well with the general behavior of stereo reconstruction, but we exploit it further by using a semi-dense stereo algorithm to compute depths only at the strongest gradients.

With this approach we have created a geometrically-correct user-perspective magic lens with better performance and visual quality than previous systems. Furthermore, we use only passive sensing, and support fully dynamic scenes with no prior modeling. Due to the use of face tracking, we do not require instrumenting the user. Although our prototype system is tethered to a workstation and powered by a GPU, we are confident that given the rate of advancement of mobile hardware this will be possible on a self-contained mobile platform in just a few years.

2 Related Work

The "magic lens" metaphor was first introduced by Bier et al. at Xerox PARC [Bier et al. 1993] as a user interface paradigm developed for traditional desktop GUI environments. The basic idea is that of a movable window that alters the display of the on-screen objects underneath it. This window acts like an information filter that can reveal hidden objects, alter the visualization of data, or otherwise modify the view within the region that the window covers.

This concept of an information filtering widget was quickly adopted outside traditional desktops. Viega et al. developed 3D versions of the magic lens interface, both as flat windows and as volumetric regions [Viega et al. 1996]. The Virtual Tricorder [Wloka and Greenfield 1995] was an interaction device for an immersive VR environment that featured a mode in which a hand-held tool revealed altered views of the 3D world. [Rekimoto and Nagao 1995] introduced hand-held Augmented Reality with the NaviCam system. The NaviCam was a video-see-through AR system consisting of a palmtop TV with a mounted camera, tethered to a workstation. The video from the camera is captured, augmented, and then displayed on the TV. This hand-held video-see-through approach soon became the norm for Augmented Reality interfaces [Zhou et al. 2008]. Optical see-through AR approaches (e.g. [Bimber et al. 2001; Olwal and Höllerer 2005; Waligora 2008]) can implement perspectively correct AR magic lenses without the need for scene reconstruction, but have to cope with convergence mismatches of augmentations and real objects behind the display unless they use stereoscopic displays.

There have been efforts in the AR community to design and develop video see-through head-worn displays that maintain a seamless parallax-free view of the augmented world [State et al. 2005; Canon 2014]. This problem is slightly simpler than correct perspective representation of the augmented world on hand-held magic lenses, since the relationship between the imaging device and the user's eyes is relatively fixed.

With the proliferation of smartphones and tablets, AR has reached the mainstream consumer market; this has made hand-held video-see-through the most common type of AR, and it is what is often assumed by the term "magic lens" when used in the context of AR [Mohring et al. 2004; Olsson and Salo 2011]. Since the display of the augmented environment from the perspective of the device's camera introduces a potentially unwanted shift of perspective, there is renewed interest in solutions for seamless user-perspective representation of the augmented world on such self-contained mobile AR platforms. User studies conducted using simulated [Baričević et al. 2012] or spatially constrained [Čopič Pucihar et al. 2013; Čopič Pucihar et al. 2014] systems have shown that user-perspective views have benefits over device-perspective views. Several systems have attempted to create a user-perspective view by warping the video of a video-see-through magic lens [Hill et al. 2011; Matsuda et al. 2013; Tomioka et al. 2013]; however, these approaches can only approximate the true user-perspective view, as they are unable to change the point of view and therefore do not achieve the geometrically correct view frustum.

The most directly relevant work to this paper is the geometrically-correct user-perspective hand-held augmented reality magic lens system in [Baričević et al. 2012]. That prototype system was built using a Kinect depth sensor and a Wiimote. The Wiimote is used to track goggles worn by the user in order to obtain the head position. The approach relies on the fairly high quality depth information provided by the Kinect to obtain an accurate 3D model of the real world; the final scene is then rendered using conventional rendering methods (raycasting and scanline rendering). While the approach is fairly straightforward, it has certain constraints. Firstly, the system does not gracefully handle dynamic scenes, as the scene is rendered in two layers with different real-time characteristics. One layer is rendered from the live Kinect stream and updates immediately; the other is rendered from a volumetric scene model that updates more slowly. Secondly, active depth sensors like the Kinect cannot operate well under strong sunlight (or any other strong light source that emits at their frequency).

Stereo reconstruction is one of the most well researched areas of computer vision. A full overview is well beyond the scope of this paper; for an excellent review of the field we refer the reader to [Scharstein and Szeliski 2002]. In recent years, a number of algorithms have been proposed that take advantage of GPU hardware to achieve real-time performance [Wang et al. 2006; Yu et al. 2010; Zhang et al. 2011; Kowalczuk et al. 2013]. While these algorithms can produce fairly accurate dense disparity maps, the real-time speeds are achieved for relatively low resolutions and narrow disparity ranges. Our stereo algorithm is inspired by PatchMatch [Barnes et al. 2009], an iterative probabilistic algorithm for finding dense image correspondences.
PatchMatch is a general algorithm and has been applied to the field of stereo matching before. [Bleyer et al. 2011] proposed a stereo matching algorithm based on PatchMatch primarily designed to support matching slanted surfaces, although it also supports front-facing planes. In [Pradeep et al. 2013] this was adapted for real-time 3D shape reconstruction by using a faster matching cost and relying on a volumetric fusion process to compensate for the noisy per-frame depth maps.

Image-based rendering techniques create novel views of a scene from existing images [Shum and Kang 2000]. These novel views can be rendered either purely from input image data [Levoy and Hanrahan 1996], or by using some form of geometry [Shade et al. 1998; Debevec et al. 1996]. Our approach is based on the gradient-domain image-based rendering work by Kopf et al. [2013]. Their method creates novel views by computing dense depth maps for the input images, reprojecting the gradients of the images to the novel view position, and finally using Poisson integration [Pérez et al. 2003] to generate the novel view.
3 Overview

As mentioned above, our approach is based on the gradient domain image-based rendering algorithm by Kopf et al. [2013]. For a detailed description of the algorithm we refer the reader to the original paper; here we will only give a brief high level overview in order to introduce the idea. We also give a more detailed explanation of how we adapted the method for our system in Section 5 below.

The main idea behind gradient domain methods is that an image can be reconstructed from its gradients by performing an integration. Therefore, if one needed to generate an image corresponding to a new viewpoint of a scene (as in a user-perspective magic lens), one could do so by integrating the gradient images for those viewpoints. These gradient images can be obtained by reprojecting the gradients computed for an existing view of a scene for which there is scene geometry information. Since strong gradients are generally sparse in a scene, and since stereo matching algorithms work best at strong gradients, this approach provides a way to create a high quality image even without a fully dense and accurate depth map, as long as the strongest gradients are correctly reprojected. While there will be errors in the reprojected gradient image, they will be mostly confined to weak gradients that do not have a large effect on the integration of the final solution. In contrast, a standard reprojection method would result in a noisy solution with much more noticeable artifacts.
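The integration referred to here is, in the standard gradient-domain formulation [Pérez et al. 2003; Kopf et al. 2013], a least-squares fit to the reprojected gradient field. The notation below is ours, not the paper's:

```latex
% Gradient-domain reconstruction (our notation): find the novel-view
% image I whose gradient field best matches the reprojected gradients G.
\min_{I} \; \lVert \nabla I - G \rVert^{2}
% The minimizer satisfies the Poisson equation, which is solved
% iteratively (the prototype uses a conjugate gradient solver, cf. Table 2),
% starting from an approximate "data term" solution described in Section 5:
\nabla^{2} I = \operatorname{div} G
```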
Using a rendering method that only requires good depth information at the gradients gives us the opportunity to optimize our stereo reconstruction. Instead of the standard approach of computing a dense depth map across the input image pair, we can compute semi-dense depth maps that only have information at the parts of the image that have strong gradients. The depth of the rest of the image can then be approximated by filling in depth values extrapolated from the computed parts of the depth map. As long as the depth information for the strongest gradients is correct, the final rendered solution for the novel view will not have significant artifacts.

In order to achieve this goal we have developed a novel semi-dense stereo matching algorithm inspired by PatchMatch [Barnes et al. 2009]. The algorithm is simple and fast, but it computes accurate results over the areas of interest. A detailed description of the algorithm is given in Section 4 below.

Figure 2: The steps to rendering a novel view: (a) input image, (b) gradient magnitudes of input, (c) mask of strongest gradients, (d) disparity map for masked area, (e) filled-in disparity map, (f) final solution. (Note: (a)–(e) are for the left camera, (f) is for the final pose.)
3.1 Creating a novel view
The basic steps to generating a novel view with our approach are shown in Figure 2. The input to the pipeline is a stereo pair (Figure 2a shows the left image) and a desired position for the novel view.

The first step (Figure 2b) is to filter the input image pair in order to produce a mask that marks the pixels that are at the strong gradients. We define the gradients as the forward difference between neighbors. The overall strength of the gradient is computed by taking the maximum between the horizontal and vertical strengths, which are defined as the average of the per-channel absolute differences.

We then apply a threshold to this gradient strength image to create a gradient mask. We use a global threshold for the entire image. The threshold can be either a set fixed value or the current average gradient magnitude. In practice, we find a fixed threshold between 5 and 10 to work well. We first clean the mask by removing pixels that have no neighbors above the threshold and then perform a dilation step (Figure 2c).

Next, our stereo matching algorithm is run over the masked pixels. This results in a semi-dense disparity map (Figure 2d) with good depth estimates for the masked areas with strong gradients, and no data for the rest of the image. We then perform a simple extrapolation method to fill in the disparity map across the image (Figure 2e). Then the 3D position of each pixel is computed from the disparity map. The renderer takes the 3D position information, as well as the desired novel view's camera parameters (position, view frustum, etc.) and generates the final image (Figure 2f).
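As a concrete illustration of the masking step, the following CUDA sketch computes the gradient strength from forward differences and thresholds it. Kernel and buffer names are ours, and the mask cleanup and dilation passes are omitted:

```cuda
// Sketch of the gradient-mask step (illustrative names; assumes an RGB
// image stored as uchar3 and a zero-initialized mask buffer).
// Strength = max(horizontal, vertical), each being the average of the
// per-channel absolute forward differences; threshold is global (e.g. 5-10).
__global__ void gradientMask(const uchar3* img, unsigned char* mask,
                             int width, int height, float threshold)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width - 1 || y >= height - 1) return;  // forward differences

    uchar3 c = img[y * width + x];
    uchar3 r = img[y * width + (x + 1)];    // horizontal neighbor
    uchar3 d = img[(y + 1) * width + x];    // vertical neighbor

    float gh = (abs(r.x - c.x) + abs(r.y - c.y) + abs(r.z - c.z)) / 3.0f;
    float gv = (abs(d.x - c.x) + abs(d.y - c.y) + abs(d.z - c.z)) / 3.0f;
    float strength = fmaxf(gh, gv);

    mask[y * width + x] = (strength > threshold) ? 1 : 0;
}
```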
4 Stereo Reconstruction

One of the most important considerations in the development of our algorithm was the need to run as fast as possible. This led to a parallel GPU-based approach, which in turn set additional constraints. One of the principal tenets of GPU computing (or SIMD computing in general) is to avoid code path divergence. That is, each thread in a concurrently running group of threads should execute the same steps in the same order at the same time, just using different data. This demand led to several design decisions regarding our algorithm.
4.1 Mask indexing

The mask computed from the gradient magnitudes determines the pixels for which the stereo algorithm will compute disparities. However, since the algorithm is implemented on the GPU using CUDA, using this mask directly would be inefficient. A naïve approach would be to run a thread per pixel and simply exit the thread if the pixel is not in the mask. However, this is very inefficient, as these threads will not truly exit. The SIMD nature of the GPU hardware requires all the threads that are concurrently running on a core to follow the same code path. If even one thread in that group is in the mask and needs to run the algorithm, then all the threads in the group might as well run, since they would introduce (almost) no overhead. In order to get any performance gain, all the pixels in the image region covered by the group would have to be outside the mask. This is rare in natural images, as there are almost always a few strong gradient pixels in any part of the image. This means that the naïve approach to the gradient-guided semi-dense stereo algorithm degenerates to a dense algorithm.

In order to prevent this waste of computational power we re-group the gradient pixels so that they cluster together. We process the mask image to create an array of pixel indices. Each row of the mask is traversed in parallel, and when a pixel that is inside the mask is encountered, its index is saved in the output array at the same row and in the next available column. Pixels outside the mask are simply ignored. As a result the indices of the masked pixels are densely stored in the output array. The count of masked pixels in a row is saved in the first column of the output array. This process creates a mask whose blocks are mostly completely full or completely empty, with only a few that are partially full. This mask is much more suitable for parallel processing on GPU architectures. A sketch of this compaction is shown below.
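A minimal CUDA sketch of the per-row compaction, assuming one thread per row; all names are ours and the actual implementation may differ:

```cuda
// Per-row compaction of the gradient mask into dense index arrays.
// indexMask is height x (width+1) ints: column 0 holds the count of
// masked pixels in the row, columns 1..count hold their x coordinates.
__global__ void compactMaskRows(const unsigned char* mask,
                                int* indexMask, int width, int height)
{
    int y = blockIdx.x * blockDim.x + threadIdx.x;   // one thread per row
    if (y >= height) return;

    int* row = indexMask + y * (width + 1);
    int count = 0;
    for (int x = 0; x < width; ++x) {
        if (mask[y * width + x]) {
            row[1 + count] = x;    // save index in the next available column
            ++count;
        }
    }
    row[0] = count;                // count stored in the first column
}
```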
4.2 Stereo matching

Now that we have a mask of the strong gradients in the image, we can run stereo matching on them. We implemented a simple, fast, and accurate stereo matching algorithm inspired by PatchMatch. Our algorithm takes the basic ideas of random search and propagation from PatchMatch and applies them to the domain of semi-dense stereo matching at the gradients, and in parallel. Although inspired by PatchMatch, the specific details are somewhat different due to the nature of the problem.

The algorithm consists of two main steps: Random Search and Propagation. The full algorithm is run for a number of iterations, and in each iteration each step is iterated as well. Each iteration of each step is fully parallel at the individual pixel level. Only the steps themselves and their iterations are serialized.

Data and Initialization The algorithm takes as its input the stereo image pair and the arrays with the mask indices. It outputs the disparity values and matching costs for each camera of the stereo pair. Before the algorithm is run, the disparities are initialized to zero, while the costs are initialized to the maximum possible value. In our implementation we use unsigned 8-bit values to store the disparities, giving a disparity range of [0, 255]. The costs are stored as unsigned 16-bit values, giving a range of [0, 2^16 − 1]. The upper limit is above the maximum possible value that can be returned as a matching cost, so initializing the cost to 0xffff simplifies the search for the minimum cost disparity, since there is no need to treat the first candidate disparity differently from the rest.
Random Search The random search step consists of generating a random disparity value, computing the matching cost given that disparity, and keeping it if the cost is lower than the current cost. This can then be repeated a number of times before continuing to the propagation step.

The way the random disparity is generated requires some discussion. Regular PatchMatch [Barnes et al. 2009] initializes fully randomly from all possible correspondences, and the random search is done by randomly searching among all possible correspondences within a shrinking window centered on the current solution. Our approach is different. Firstly, the initialization and random search form a single unified step. Secondly, the random disparity is not generated from the disparity range but from the valid indices for that epipolar line, since we are matching only the strong gradients that are within our masks.

In general, if a part of the scene is labeled as a strong gradient in the left image it will also be labeled as a strong gradient in the right image (and vice versa). This is not the case for parts that are occluded in one image of the pair, but those do not have a correct match anyway. It follows that a pixel within the gradient mask of one image will have its corresponding pixel within the gradient mask of the other image. Since the gradients are generally sparse, this significantly reduces the possible valid disparities. This reduction in search space means that each random guess has a higher probability of being correct, which improves convergence.

Therefore, when generating a random disparity we sample from the space of valid indices, not from the full disparity range. As mentioned above, the first column of each row in the index masks stores the number of valid pixels. This value is used as the range of a uniform random distribution. We generate a random integer from this distribution; this number gives us the column in the index mask row to sample, and the index stored in that column gives us our random match candidate. We then compute the matching cost for this candidate correspondence; if the cost is lower than the current cost, we save the disparity and the cost as the current best match. For the matching cost we use the standard sum of absolute differences over a 7 × 7 support window. This process can be iterated; in our current implementation we run two iterations. A sketch of this step is given below.
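A simplified CUDA sketch of the random search step for the left view. All names are illustrative, sad7x7 stands in for the paper's 7 × 7 sum-of-absolute-differences cost, and the RNG states are assumed to be pre-seeded with curand_init:

```cuda
#include <curand_kernel.h>

// Illustrative 7x7 SAD cost between pixel (xl, y) in the left image and
// (xr, y) in the right image. Note the result stays below the 0xffff
// initialization value, as the paper's initialization scheme requires.
__device__ unsigned short sad7x7(const uchar3* L, const uchar3* R,
                                 int xl, int xr, int y, int w, int h)
{
    unsigned int sum = 0;
    for (int dy = -3; dy <= 3; ++dy)
        for (int dx = -3; dx <= 3; ++dx) {
            int yy = min(max(y + dy, 0), h - 1);       // clamp at borders
            int xa = min(max(xl + dx, 0), w - 1);
            int xb = min(max(xr + dx, 0), w - 1);
            uchar3 a = L[yy * w + xa], b = R[yy * w + xb];
            sum += abs(a.x - b.x) + abs(a.y - b.y) + abs(a.z - b.z);
        }
    return (unsigned short)sum;    // max 7*7*3*255 = 37485 < 0xffff
}

// One unified initialization / random-search iteration. Index arrays use
// the layout of the compaction sketch above: row[0] = count, row[1..count]
// = x coordinates. One thread per masked pixel; blockIdx.y selects the row.
__global__ void randomSearch(const uchar3* left, const uchar3* right,
                             const int* leftIndex, const int* rightIndex,
                             unsigned char* disparity, unsigned short* cost,
                             int width, int height, curandState* rng)
{
    int y = blockIdx.y;
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    const int* lrow = leftIndex + y * (width + 1);
    const int* rrow = rightIndex + y * (width + 1);
    if (i >= lrow[0] || rrow[0] == 0) return;

    int x = lrow[1 + i];                   // our masked pixel
    int tid = y * width + x;
    curandState local = rng[tid];

    // Sample among the *valid* masked pixels of the same epipolar line in
    // the other image, not from the full disparity range.
    int j  = curand(&local) % rrow[0];
    int xr = rrow[1 + j];
    int d  = x - xr;                       // candidate disparity
    if (d >= 0 && d <= 255) {
        unsigned short c = sad7x7(left, right, x, xr, y, width, height);
        if (c < cost[tid]) { cost[tid] = c; disparity[tid] = (unsigned char)d; }
    }
    rng[tid] = local;
}
```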
Propagation The random search step will generate a very noisy disparity map where most of the disparities are wrong, but some are correct. The propagation step serves to propagate the good matches across the image. Here our algorithm also differs significantly from PatchMatch.

Taking the standard PatchMatch approach to propagation would present several problems for our application scenario. Firstly, the computation cost is too high. In the serial version the image is processed linearly from one corner to the next. At each pixel the disparities of the preceding horizontal and vertical neighbors are used as possible new disparities, and new matching costs are computed. If the cost of a candidate disparity is lower than the current one, the new disparity is adopted. Computing the matching cost is expensive in general, and doing it serially is prohibitive. The performance would be far too slow for real-time use.
Parallel versions of PatchMatch have been proposed, but they are still not well suited to our application. Although the computations are done in parallel, many more are needed per pixel; even the parallel versions require too many expensive matching cost computations per frame.

Secondly, PatchMatch is meant for computing dense correspondences, while we only compute disparities within the masked areas. This means there are large gaps in the image. Although it is possible in principle to propagate by skipping those gaps, this would violate the assumption of propagating between neighbors, and it is unlikely that that kind of propagation would be useful. In the case of parallel implementations of PatchMatch, the propagation is limited in radius, so it would not be able to skip gaps anyway.
We take a simpler approach to the propagation step. Instead of propagating serially through the entire image, we have each pixel check its neighborhood in parallel. Instead of computing another matching cost for each of its neighbors' disparities, the pixel uses the neighbor's cost as a proxy for what the cost would be for this pixel if it had the same disparity. The idea behind this is that if our disparity is the same as that of our neighbor, our matching costs will likely be very similar as well. We choose the neighbor with the lowest cost and take its disparity as a candidate solution; only now do we compute a new matching cost. If this new cost is lower than our old cost we accept the new disparity; otherwise we keep the old one. This means that each iteration of the propagation step only performs one matching cost computation. In our current implementation we run three iterations of the propagation step; a sketch follows below.
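A minimal CUDA sketch of one propagation iteration, under the same assumptions and naming as the random-search sketch above (it reuses sad7x7; for clarity it ignores read/write ordering between threads within an iteration):

```cuda
// One propagation iteration: each masked pixel inspects its immediate
// neighbors, picks the one with the lowest *stored* cost (a proxy for
// what that disparity would cost here), and verifies that single
// candidate with exactly one real matching cost computation.
__global__ void propagate(const uchar3* left, const uchar3* right,
                          const int* leftIndex,
                          unsigned char* disparity, unsigned short* cost,
                          int width, int height)
{
    int y = blockIdx.y;
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    const int* lrow = leftIndex + y * (width + 1);
    if (i >= lrow[0]) return;

    int x = lrow[1 + i];
    int tid = y * width + x;

    // Find the 4-neighbor whose stored cost is lowest.
    int best = -1;
    unsigned short bestCost = 0xffff;
    const int nx[4] = { x - 1, x + 1, x, x };
    const int ny[4] = { y, y, y - 1, y + 1 };
    for (int k = 0; k < 4; ++k) {
        if (nx[k] < 0 || nx[k] >= width || ny[k] < 0 || ny[k] >= height)
            continue;
        int n = ny[k] * width + nx[k];
        if (cost[n] < bestCost) { bestCost = cost[n]; best = n; }
    }
    if (best < 0) return;

    // Single verification: compute the real cost of the neighbor's disparity.
    int d  = disparity[best];
    int xr = x - d;
    if (xr < 0) return;
    unsigned short c = sad7x7(left, right, x, xr, y, width, height);
    if (c < cost[tid]) { cost[tid] = c; disparity[tid] = (unsigned char)d; }
}
```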
4.3 Post processing

Most stereo algorithms have a post-processing step that follows the initial computation of the disparity map. The purpose of this step is to further refine the disparities. We take a fairly simple approach to post processing. The goal is to determine which of the computed disparities are likely correct; the rest of the image is then filled in with extrapolated values.

The simplest way to determine good disparities is to run a consistency check. This involves comparing the left and right disparity maps and only keeping the values for those pixels whose target points back at them, i.e., only keeping correspondence pairs. This eliminates the parts of the image that are occluded in the other view, and therefore cannot have a good match. Although this works well for a standard plane sweep algorithm, in our case it could cause errors, because our search is probabilistic and there is no guarantee that pixels that are unoccluded and belong to a correspondence pair will point to each other. It is possible for only one pixel of the pair to point to its match, while the other one points elsewhere. To help with this we run a step prior to the consistency check. For each pixel p in a disparity map we check if its match p′ = q points back at p. If it does not, we compare the matching costs of the two pixels. Since the matching cost is symmetric, it should be the same (and minimal) for a correspondence pair. If q has a higher matching cost than p, we set its match q′ to p and update its cost accordingly. This always creates a better solution. This process is run in both the left-to-right and right-to-left directions. After this step, we run a traditional consistency check. Pixels that are not part of a correspondence pair are labeled as invalid.

Since the invalid pixels are in the masked area, they are important, so we do not want to naïvely fill them in the same way as the unmasked pixels. Instead we attempt to grow the valid disparity values into the invalid ones. This is a parallel process where each invalid pixel checks its direct neighbors and adopts the lowest valid disparity among them; the invalid pixel is then marked valid, but its cost is set to the maximum. Each iteration of this further grows the disparity; we settled on five iterations for our system. The operation contributes somewhat to disparity edge fattening, but it improves the disparity map overall.

Finally, after the previous steps we can fill in the remainder of the disparity map, assigning new values to any pixels that are still invalid or were not in the gradient mask. To extrapolate the disparity map we use a simple linear search along the epipolar lines: from each pixel (again in parallel) we search left and right for the first pixel that is valid. We look at the two disparity values and adopt the lower one (taking the lower value instead of interpolating helps prevent occlusion edges from bleeding into occluded areas). This is perhaps an overly simplistic approach, and it does result in considerable streaks in the disparity map. However, these streaks are mainly over low gradient strength areas and therefore do not cause many artifacts in the final re-rendered image. A sketch of this fill-in pass is shown below.
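A minimal CUDA sketch of the fill-in pass, again with illustrative names; valid[] is assumed to mark the pixels that survived the consistency check:

```cuda
// Fill-in pass: every pixel that is still invalid scans left and right
// along its row for the nearest valid disparity on each side and adopts
// the lower of the two, avoiding foreground bleeding across occlusions.
__global__ void fillDisparity(const unsigned char* valid,
                              const unsigned char* disparity,
                              unsigned char* filled, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    int tid = y * width + x;
    if (valid[tid]) { filled[tid] = disparity[tid]; return; }

    unsigned char dl = 255, dr = 255;        // "nothing found" sentinels
    for (int xl = x - 1; xl >= 0; --xl)      // nearest valid pixel to the left
        if (valid[y * width + xl]) { dl = disparity[y * width + xl]; break; }
    for (int xr = x + 1; xr < width; ++xr)   // nearest valid pixel to the right
        if (valid[y * width + xr]) { dr = disparity[y * width + xr]; break; }

    filled[tid] = min(dl, dr);               // adopt the lower disparity
}
```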
4.4 Performance and accuracy

Through experimentation with the number of iterations of our stereo algorithm, we settled on two overall iterations, each doing two iterations of search and three iterations of propagation. This means that in total we only perform ten matching cost computations per frame per masked pixel. Despite this we get accurate disparity results. Table 1 gives the timings and error rates for the Teddy and Cones pairs from the Middlebury dataset [Scharstein and Szeliski 2003]. Figures 3 and 4 show the disparity maps and disparity errors.

Because of the probabilistic nature of the algorithm we have an effective disparity range of 256, even though we only compute the cost for ten disparity levels. To achieve the equivalent precision, a plane sweep algorithm would have to check all 256 disparity levels and perform an order of magnitude more matching cost computations. Even if the plane sweep skipped over unmasked areas, it would not significantly reduce the runtime, because of the GPU code path divergence problem mentioned above.

Table 1: Per-frame timings and error rates for the Teddy and Cones datasets. The resolution of the input images and the disparity maps is 450x375. The error rate is the percentage of pixels within unoccluded masked areas with a disparity error greater than 1 pixel.

                       Teddy       Cones
  Timings
    Computing mask      2.98 ms     3.18 ms
    Stereo matching    12.59 ms    16.82 ms
    Post-processing     1.92 ms     1.57 ms
  Error rate           15.47%       7.52%

Prototype In our prototype system the stereo camera has a native resolution of 1024x768, but in order to improve performance we reduce this to 512x384 for the stereo matching algorithm. We do, however, use the full-color image for the matching, instead of the common grayscale reduction. Although the stereo matching is done at half resolution, the result is upscaled back to full resolution before computing the gradient positions and calling the IBR algorithm.

5 Rendering

As mentioned above, the basic idea of the method is to create a novel view by integrating gradient images that are formed by reprojecting the gradients of the original view. Integrating a solution just from the gradients is a computationally expensive operation, even with a GPU-based parallel implementation. It can take many iterations for the integration to converge to a solution, partly due
Figure 3: Stereo matching for Teddy dataset. (a) Left input image. (b) Raw disparity. (c) Final (filled-in) disparity. (d) Disparity error: white - correct, black - error greater than 1 pixel, gray - not in mask or excluded because of occlusion.

Figure 4: Stereo matching for Cones dataset. (a) Left input image. (b) Raw disparity. (c) Final (filled-in) disparity. (d) Disparity error: white - correct, black - error greater than 1 pixel, gray - not in mask or excluded because of occlusion.
to the unknown constant of integration. The method by Kopf et al. [2013] uses an approximate solution (the data term) as an initial solution in order to significantly reduce the number of iterations.

The key to the approximation step is to consider that when a gradient changes position from the original view to the new view, it should alter the color of the regions that it passes over. To clarify, consider a quadrilateral whose two opposing edges are the gradient's original position and its new position. This quad can be drawn over the original view, and the gradient value can be applied to the pixels that the quad covers. This may add to or subtract from those pixels' values. If this process is done for all gradients, the resulting image will be very similar to what the correct image should be from the new view. For a more in-depth description, please see [Kopf et al. 2013].
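To convey the idea, the following hedged CUDA sketch approximates this sweep with a 1-D splat along each gradient's displacement using atomic adds. The actual system rasterizes full quads with OpenGL; all names here are ours:

```cuda
// Illustrative approximation of the data term: sweep each gradient's
// value over the pixels between its original and reprojected positions.
// The real pipeline draws a quad per gradient; this 1-D splat along the
// displacement conveys the same add-or-subtract accumulation.
__global__ void splatGradients(const float2* oldPos, const float2* newPos,
                               const float* gradValue, int numGradients,
                               float* accum, int width, int height)
{
    int g = blockIdx.x * blockDim.x + threadIdx.x;
    if (g >= numGradients) return;

    float2 a = oldPos[g], b = newPos[g];
    int steps = (int)fmaxf(fabsf(b.x - a.x), fabsf(b.y - a.y)) + 1;
    for (int s = 0; s <= steps; ++s) {
        float t = (float)s / (float)steps;
        int x = (int)(a.x + t * (b.x - a.x));
        int y = (int)(a.y + t * (b.y - a.y));
        if (x >= 0 && x < width && y >= 0 && y < height)
            atomicAdd(&accum[y * width + x], gradValue[g]); // may add or subtract
    }
}
```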
5.1 Performance considerations

While developing our prototype system we aimed to strike a balance between real-time performance and good image quality. The various bottlenecks were identified through profiling, and adjustments were made to reduce the run-time while minimizing any loss of quality. Here we give some details about those considerations.

The IBR algorithm can be divided into three distinct steps that have different performance behaviors.

The first step is the rendering of the data term, which is surprisingly the most expensive. The performance hit here comes from the number and size of the quads. Each quad corresponds to a gradient, so there are twice as many quads as there are pixels (one horizontal and one vertical). Furthermore, the nature of the shifting gradients means that each quad will typically generate a large number of fragments. The cost of this step changes considerably based on the novel view position.

The second (also fastest) step is rendering the gradient images, i.e., simply reprojecting the lines of the gradients at their new positions.

Finally, the third step is the integration of the final solution from the gradients, initializing with and biasing toward the data term. This step is fairly expensive, but its runtime is mostly constant, depending mainly on the number of iterations.

The original work by Kopf et al. used a super-resolution framebuffer for rendering all the steps in the algorithm, i.e., the framebuffer size is several times larger than the input resolution. They also bias the final solution toward the approximate solution. We take a somewhat different approach. We observe that we can treat the approximate solution as simply the low frequency component of the final solution, while the reprojected gradients can provide the high frequency detail. We then use the approximate solution just as an initial solution, and do not bias towards it during the integration. This allows us to use a much lower resolution image for our data term, since it only needs to capture low frequency information. By using a lower resolution data term we significantly improve performance. We set the data term resolution to a quarter of the regular framebuffer resolution. We also reduce the number of integration steps to five, and use a framebuffer size smaller than the original image. Although our framebuffer size (640x480) is smaller than the raw input resolution, it does not actually lower the quality of the final results. This is because the field of view of the user's frustum is usually narrower than that of the camera. As a result, the input image is effectively scaled up when shown on the magic lens and therefore still oversampled by the framebuffer.

The final augmented image is rendered at 800x600, which is the resolution of our display. The various resolutions in our pipeline were empirically determined to give a good balance of performance and quality for our system.

6 Prototype

Our system consists of a hand-held magic lens rig tethered to a workstation. The rig, shown in Figure 5, was built using common off-the-shelf parts. The central component of the magic lens is a …
6.2 Face tracking
7 Results

Some examples of the type of results we get can be seen in Figures 1a, 2, and 6.

Figure 6 shows a simple example of an AR scene, with both the user's view (top) and the corresponding screen capture from the magic lens display (bottom). The view frustum inside the magic lens is well aligned with the outside, and the perspective of the scene matches that of the outside. The screen capture taken at the same moment shows that the image quality of the magic lens view is quite good, with only minimal rendering artifacts.

Figure 2 shows the main steps of our approach for a somewhat cluttered live scene with various different features: dark areas, bright areas, textured surfaces, homogeneous surfaces, specularities, and thin geometry. The stereo matching is only run on a small percentage of the image, and the filled-in disparity map is very coarse. However, the final rendering has relatively minor artifacts.
Table 2: Average per-frame timings for our prototype implementation. Average framerate is about 16 FPS.

  Timing (ms)
  Frame total                       62.32
    Prepare input pair               3.11
    Stereo matching                  7.92
    Post-processing                  3.73
      Consistency check              0.18
      Grow disparity                 0.81
      Fill disparity                 2.74
    Compute and update positions     7.34
    Image-based rendering           36.33
      Data term                     13.66
      Gradients                      8.04
      Merge left and right           1.08
      Conjugate gradient solver     13.55
    Other                            3.89
The performance of our final system across the various steps in our pipeline can be seen in Table 2. The system has an overall average framerate of 16 FPS. The largest aggregate cost, about half of the total, is the image-based rendering. The stereo matching is very fast at less than 8 ms. However, post-processing adds another 3.7 ms, most of which is spent on filling in the disparity. This is a very simple step, but it is not yet optimized and performs poorly if the masked regions are too sparse. Another unexpectedly high cost, at over 7 ms, is the computing and updating of the 3D positions of the gradients. This is likely because this step makes OpenGL and CUDA synchronize, which forces all GPU operations to complete; it also imposes a synchronization with the CPU.

7.2 Discussion

Overall, our system provides quite satisfactory results, but it does have some remaining challenges. From a user perspective, the challenges are issues with the view frustum and issues with the image quality. From a technical standpoint these are caused by issues with face tracking, stereo reconstruction, rendering, and calibration.

Figure 7: Comparison between full resolution and reduced resolution. Left is data term, right is solution. Top is full resolution, bottom is reduced resolution.

Figure 8: Comparison between result using ground truth versus our stereo matching. Top is with ground truth, bottom is with our stereo algorithm.
View frustum The heart of the user-perspective magic lens problem is providing a correct view frustum for the user. While our system generally accomplishes this goal, it has some constraints. Firstly, since it is a fully live system, it can only show what the stereo cameras currently see. Although we use cameras with a fairly wide field of view, it is still possible for the user to orient the magic lens in such a way that the user's view frustum includes areas that the cameras do not see. This problem is somewhat mitigated by the fact that the best way to use a user-perspective magic lens is to hold it straight in order to get the widest view; this keeps the desired view frustum within the region visible by the cameras. Nevertheless, this issue warrants some discussion. Currently our system simply fills in those areas using information from the known edges. A possible simple solution to this problem could be to use fisheye lenses or additional cameras in order to get a 180° view of the scene behind the display. In [Baričević et al. 2012] the approach was to create a model of the environment and render from the model; this way the out-of-sight areas could still be rendered if they were once visible. This type of compromise approach, where currently visible areas are rendered from live data while out-of-sight areas are rendered from a model, could also be a promising solution here. Since we use image-based rendering, the scene model can simply be a collection of keyframes with depth maps.

…occlusion boundaries. In areas that are visible from the viewer's position but not seen from the cameras, the gap is filled by smooth streaks connecting the edges.

8 Conclusion and Future Work

We have presented a new approach to creating a geometrically-correct user-perspective magic lens, based on leveraging the gradients in the real world scene. The key to our approach is the coupling of a recent image-based rendering algorithm with a novel semi-dense stereo matching algorithm. Our stereo algorithm is fast and accurate in the areas of interest. The use of image-based rendering provides us with good imagery, even with limited scene model detail. Based on this approach we built a prototype device using common off-the-shelf hardware.

In addition to the various possible improvements to the system, we would also like to evaluate the system with a formal user study. Previous user studies on user-perspective magic lenses have either been in simulation [Baričević et al. 2012] or with approximations [Čopič Pucihar et al. 2013; Čopič Pucihar et al. 2014]. We hope to be able to do a fair comparison between device-perspective and user-perspective magic lenses with a full real system.
ČOPIČ PUCIHAR, K., COULTON, P., AND ALEXANDER, J. 2014. The Use of Surrounding Visual Context in Handheld AR: Device vs. User Perspective Rendering. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, ACM, New York, NY, USA, CHI '14, 197–206.

DEBEVEC, P. E., TAYLOR, C. J., AND MALIK, J. 1996. Modeling and Rendering Architecture from Photographs: A Hybrid Geometry- and Image-based Approach. In Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques, ACM, New York, NY, USA, SIGGRAPH '96, 11–20.

HILL, A., SCHIEFER, J., WILSON, J., DAVIDSON, B., GANDY, M., AND MACINTYRE, B. 2011. Virtual transparency: introducing parallax view into video see-through AR. In Proceedings of the 10th IEEE International Symposium on Mixed and Augmented Reality (ISMAR), 2011, 239–240.

KOPF, J., LANGGUTH, F., SCHARSTEIN, D., SZELISKI, R., AND GOESELE, M. 2013. Image-based Rendering in the Gradient Domain. ACM Trans. Graph. 32, 6 (Nov.), 199:1–199:9.

KOWALCZUK, J., PSOTA, E., AND PEREZ, L. 2013. Real-Time Stereo Matching on CUDA Using an Iterative Refinement Method for Adaptive Support-Weight Correspondences. Circuits and Systems for Video Technology, IEEE Transactions on 23, 1 (Jan), 94–104.

LEVOY, M., AND HANRAHAN, P. 1996. Light Field Rendering. In Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques, ACM, New York, NY, USA, SIGGRAPH '96, 31–42.

MATSUDA, Y., SHIBATA, F., KIMURA, A., AND TAMURA, H. 2013. Poster: Creating a user-specific perspective view for mobile mixed reality systems on smartphones. In 3D User Interfaces (3DUI), 2013 IEEE Symposium on, 157–158.

MOHRING, M., LESSIG, C., AND BIMBER, O. 2004. Video see-through AR on consumer cell-phones. In Mixed and Augmented Reality, 2004. ISMAR 2004. Third IEEE and ACM International Symposium on, 252–253.

OLSSON, T., AND SALO, M. 2011. Online user survey on current mobile augmented reality applications. In Proceedings of the 10th IEEE International Symposium on Mixed and Augmented Reality (ISMAR), 2011, 75–84.

OLWAL, A., AND HÖLLERER, T. 2005. POLAR: Portable, Optical See-through, Low-cost Augmented Reality. In Proceedings of the ACM Symposium on Virtual Reality Software and Technology, ACM, New York, NY, USA, VRST '05, 227–230.

PÉREZ, P., GANGNET, M., AND BLAKE, A. 2003. Poisson Image Editing. ACM Trans. Graph. 22, 3 (July), 313–318.

PRADEEP, V., RHEMANN, C., IZADI, S., ZACH, C., BLEYER, M., AND BATHICHE, S. 2013. MonoFusion: Real-time 3D reconstruction of small scenes with a single web camera. In Mixed and Augmented Reality (ISMAR), 2013 IEEE International Symposium on, 83–88.

REKIMOTO, J., AND NAGAO, K. 1995. The World Through the Computer: Computer Augmented Interaction with Real World Environments. In Proceedings of the 8th Annual ACM Symposium on User Interface and Software Technology, ACM, New York, NY, USA, UIST '95, 29–36.

SARAGIH, J., AND MCDONALD, K., 2014. FaceTracker. facetracker.net. Accessed 1 June 2014.

SARAGIH, J. M., LUCEY, S., AND COHN, J. 2009. Face Alignment through Subspace Constrained Mean-Shifts. In International Conference on Computer Vision (ICCV).

SCHARSTEIN, D., AND SZELISKI, R. 2002. A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms. International Journal of Computer Vision 47, 1-3, 7–42.

SCHARSTEIN, D., AND SZELISKI, R. 2003. High-accuracy stereo depth maps using structured light. In Computer Vision and Pattern Recognition, 2003. Proceedings. 2003 IEEE Computer Society Conference on, vol. 1, I-195–I-202.

SHADE, J., GORTLER, S., HE, L.-W., AND SZELISKI, R. 1998. Layered Depth Images. In Proceedings of the 25th Annual Conference on Computer Graphics and Interactive Techniques, ACM, New York, NY, USA, SIGGRAPH '98, 231–242.

SHUM, H., AND KANG, S. B., 2000. Review of image-based rendering techniques.

STATE, A., KELLER, K. P., AND FUCHS, H. 2005. Simulation-Based Design and Rapid Prototyping of a Parallax-Free, Orthoscopic Video See-Through Head-Mounted Display. In Proceedings of the 4th IEEE/ACM International Symposium on Mixed and Augmented Reality, IEEE Computer Society, Washington, DC, USA, ISMAR '05, 28–31.

TOMIOKA, M., IKEDA, S., AND SATO, K. 2013. Approximated user-perspective rendering in tablet-based augmented reality. In Mixed and Augmented Reality (ISMAR), 2013 IEEE International Symposium on, 21–28.

VIEGA, J., CONWAY, M. J., WILLIAMS, G., AND PAUSCH, R. 1996. 3D magic lenses. In Proceedings of the 9th Annual ACM Symposium on User Interface Software and Technology, ACM, New York, NY, USA, UIST '96, 51–58.

WALIGORA, M. 2008. Virtual Windows: Designing and Implementing a System for Ad-hoc, Positional Based Rendering. Master's thesis, University of New Mexico, Department of Computer Science.

WANG, L., LIAO, M., GONG, M., YANG, R., AND NISTER, D. 2006. High-Quality Real-Time Stereo Using Adaptive Cost Aggregation and Dynamic Programming. In 3D Data Processing, Visualization, and Transmission, Third International Symposium on, 798–805.

WLOKA, M. M., AND GREENFIELD, E. 1995. The Virtual Tricorder: A Uniform Interface for Virtual Reality. In Proceedings of the 8th Annual ACM Symposium on User Interface and Software Technology, ACM, New York, NY, USA, UIST '95, 39–40.

YU, W., CHEN, T., FRANCHETTI, F., AND HOE, J. 2010. High Performance Stereo Vision Designed for Massively Data Parallel Platforms. Circuits and Systems for Video Technology, IEEE Transactions on 20, 11 (Nov), 1509–1519.

ZHANG, K., LU, J., YANG, Q., LAFRUIT, G., LAUWEREINS, R., AND VAN GOOL, L. 2011. Real-Time and Accurate Stereo: A Scalable Approach With Bitwise Fast Voting on CUDA. Circuits and Systems for Video Technology, IEEE Transactions on 21, 7 (July), 867–878.

ZHOU, F., DUH, H.-L., AND BILLINGHURST, M. 2008. Trends in augmented reality tracking, interaction and display: A review of ten years of ISMAR. In Mixed and Augmented Reality, 2008. ISMAR 2008. 7th IEEE/ACM International Symposium on, 193–202.