Computer Vision Applied To Super Resolution
Super-resolution (SR) restoration aims to solve the following problem: given a set of observed images, estimate an image at a higher resolution than is present in any of the individual images. Where the application of this technique differs in computer vision from other fields is in the variety and severity of the registration transformation between the images. In particular, this transformation is generally unknown, and a significant component of solving the SR problem in computer vision is the estimation of the transformation. The transformation may have a simple parametric form, or it may be scene dependent and have to be estimated for every point. In either case the transformation is estimated directly and automatically from the images.

Computer vision techniques applied to the SR problem have already yielded several successful products, including Cognitech's "Video Investigator" software [1] and Salient Stills' "Video Focus" [2]. In the latter case, for example, a high-resolution (HR) still of a face, suitable for printing in a newspaper article, can be constructed from a low-resolution (LR) video news feed.

The approach discussed in this article is outlined in Figure 1. The input images are first mutually aligned onto a common reference frame. This alignment involves not only a geometric component but also a photometric component, modeling illumination, gain, or color balance variations among the images. After alignment a composite image mosaic may be rendered, and SR restoration may be applied to any chosen region of interest.

We shall describe the two key components that are necessary for successful SR restoration: the accurate alignment or registration of the LR images, and the formulation of an SR estimator that uses a generative image model together with a prior model of the super-resolved image itself. As with many other problems in computer vision, these different aspects are tackled in a robust, statistical framework.

Image Registration

Essential to the success of any SR algorithm is the need to find a highly accurate point-to-point correspondence or registration between images in the input sequence. This correspondence problem can be stated as follows: given two different views of the same scene, for each image point in one view find the image point in the second view which has the same pre-image, i.e., corresponds to the same actual point in the scene.
David Capel and Andrew Zisserman

Many SR estimators, particularly those derived in the Fourier domain, are based on the assumption of purely translational image motion [34], [35]. In computer vision, however, far more demanding image transformations are considered.
Feature-Based Registration

In computer vision it is common to estimate the parameters of a geometric transformation such as a homography H by automatic detection and analysis of corresponding features among the input images. Typically, in each image several hundred "interest points" are automatically detected with subpixel accuracy using an algorithm such as the Harris feature detector [17]. Putative correspondences are identified by comparing the image neighborhoods around the features, using a similarity metric such as normalized correlation. These correspondences are refined using a robust search procedure such as the RANSAC algorithm [13], which extracts only those features whose interimage motion is consistent with a homography.

Figure 1. Stages in the SR process. (Diagram labels: super resolution / MAP estimation; output: high-resolution image from original-resolution inputs.)
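The detect-match-verify pipeline just described can be sketched in a few lines of numpy. This is an illustrative implementation, not the authors' code: each 4-point RANSAC hypothesis is fitted with an (unnormalized) direct linear transform, scored by one-way transfer error, and the winning hypothesis is refitted to its full inlier set. The thresholds and function names are our own choices, and Hartley's coordinate normalization is omitted for brevity.

```python
import numpy as np

def fit_homography_dlt(x, xp):
    """Direct linear transform: fit H with xp ~ H x from N >= 4
    correspondences (no coordinate normalization, for brevity)."""
    A = []
    for (u, v), (up, vp) in zip(x, xp):
        A.append([-u, -v, -1, 0, 0, 0, u * up, v * up, up])
        A.append([0, 0, 0, -u, -v, -1, u * vp, v * vp, vp])
    _, _, Vt = np.linalg.svd(np.asarray(A))
    H = Vt[-1].reshape(3, 3)      # null vector of A, reshaped to 3x3
    return H / H[2, 2]

def transfer_error(H, x, xp):
    """Per-point one-way transfer error |H x - xp| in pixels."""
    xh = np.hstack([x, np.ones((len(x), 1))])
    proj = xh @ H.T
    return np.linalg.norm(proj[:, :2] / proj[:, 2:3] - xp, axis=1)

def ransac_homography(x, xp, n_iter=500, thresh=2.0, seed=0):
    """RANSAC: fit H to random 4-point samples, keep the hypothesis
    with the most inliers, then refit H to the full inlier set."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(x), dtype=bool)
    for _ in range(n_iter):
        idx = rng.choice(len(x), 4, replace=False)
        H = fit_homography_dlt(x[idx], xp[idx])
        inliers = transfer_error(H, x, xp) < thresh
        if inliers.sum() > best.sum():
            best = inliers
    return fit_homography_dlt(x[best], xp[best]), best
```

In a full system the putative matches fed to `ransac_homography` would come from interest-point detection and normalized-correlation matching, and the RANSAC estimate would be polished by the nonlinear optimization described in the main text.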
Figure 3. Two imaging scenarios for which the image-to-image correspondence is captured by a planar homography.

Figure 4. The main steps in the algorithm to automatically estimate a homography between two images (Algorithm: Automatic Two-View Registration; inputs: Image 1 and Image 2).
L = \sum_i \Big( (x_i - \bar{x}_i)^2 + (y_i - \bar{y}_i)^2 + (x_i' - \bar{x}_i')^2 + (y_i' - \bar{y}_i')^2 \Big)

(The unknown scale factor σ may be safely dropped in the above equation since it has no effect on the following derivations.) Of course, the true pre-image points are unknown, so we replace {x̄, x̄′} in the above equation with {x̂, x̂′}, the estimated positions of the pre-image points, hence

L = \sum_i \Big( (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 + (x_i' - \hat{x}_i')^2 + (y_i' - \hat{y}_i')^2 \Big). \quad (1)

Finally, we impose the constraint that x̂ maps to x̂′ under a homography and hence substitute x̂′ = H x̂. This error metric is illustrated in Figure 6. Thus minimizing L requires estimating the homography and the pre-image points {x̂}. A direct method of obtaining these estimates is to parameterize both the eight parameters of the homography and the 2N parameters of the N points {x̂}. We will return to this idea shortly. In the two-view case, however, it is possible to derive a very good approximation to this log-likelihood [19] that avoids explicit parameterization of the pre-image points, permitting H_ml to be computed by a standard nonlinear least-squares optimization over only eight parameters. For example, the Levenberg-Marquardt algorithm [25] can be used.

Figure 5. Steps in the robust algorithm for registering two views. (a) Two images of an Oxford college. The motion between views is a rotation about the camera center, so the images are exactly related by a homography. (b) Detected point features superimposed on the images. There are approximately 500 features on each image. (c) The following results are superimposed on the left image: 268 putative matches shown by lines linking matched points to their position in the other image; note the clear mismatches. The right image shows RANSAC outliers, 117 of the putative matches. (d) Left: RANSAC inliers, 151 correspondences consistent with the estimated homography. Right: final set of 262 correspondences after guided matching and optimal estimation. The estimated transformation is accurate to subpixel resolution.
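As a sketch of this final refinement step, the following implements a small Levenberg-Marquardt loop over the eight homography parameters (h33 fixed at 1). For simplicity it minimizes the one-way transfer error rather than the full pre-image cost of (1), uses a forward-difference Jacobian, and is our own illustrative code rather than the authors' implementation.

```python
import numpy as np

def residuals(h, x, xp):
    """One-way transfer residuals for H = [h, 1] reshaped to 3x3:
    the difference between H x (inhomogenized) and the measured xp."""
    H = np.append(h, 1.0).reshape(3, 3)
    xh = np.hstack([x, np.ones((len(x), 1))]) @ H.T
    return (xh[:, :2] / xh[:, 2:3] - xp).ravel()

def refine_homography_lm(H0, x, xp, n_iter=100):
    """Levenberg-Marquardt over the eight homography parameters
    (h33 fixed at 1), with the usual accept/reject damping schedule."""
    h = (H0 / H0[2, 2]).ravel()[:8]
    lam = 1e-3
    r = residuals(h, x, xp)
    for _ in range(n_iter):
        J = np.empty((r.size, 8))
        for j in range(8):                    # forward-difference Jacobian
            dh = np.zeros(8)
            dh[j] = 1e-6
            J[:, j] = (residuals(h + dh, x, xp) - r) / 1e-6
        step = np.linalg.solve(J.T @ J + lam * np.eye(8), -J.T @ r)
        r_new = residuals(h + step, x, xp)
        if r_new @ r_new < r @ r:             # accept step: relax damping
            h, r, lam = h + step, r_new, 0.5 * lam
        else:                                 # reject step: increase damping
            lam = 10.0 * lam
    return np.append(h, 1.0).reshape(3, 3)
```

Starting from the RANSAC estimate, a handful of iterations is typically enough to reach subpixel accuracy on the inlier set.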
Photometric Registration

Photometric registration refers to the procedure by which global photometric transformations between images are estimated. Examples of such transformations are global illumination changes across the scene and intensity variations due to camera automatic gain control or automatic white balancing. In practice, it has been shown that a simple parametric model suffices; in the generative model used here (see Figure 12) this is a per-image gain and offset (α_n, β_n) applied to the pixel intensities.

Figure 8. Concatenation of homographies permits registration error to accumulate over the sequence. This is problematic when a sequence "loops back."
Super Resolution

The observed LR images are regarded as degraded observations of a real, HR image. These degradations typically include geometric warping, optical blur, spatial sampling, and noise, as shown in Figure 11. The forward model of image formation is described below. Given several such LR image observations, our objective is to solve the inverse problem, i.e., determine the SR image from the measured LR images given the image formation model.

We will discuss two solutions to this problem. In the first, we determine the ML estimate of the SR image such that, when reprojected back into the images via the imaging model, it minimizes the difference between the actual and "predicted" observations. In the second, we determine the maximum a posteriori (MAP) estimate of the SR image, including prior information.

Generative Models

It is assumed that the set of observed LR images was produced by a single HR image under the generative model summarized in Figure 12.

Figure 9. A mosaic generated from 100 images after geometric registration using the N-view maximum likelihood method (below). The outline of every fifth image is superimposed.
Figure 10. Estimating and correcting for global photometric variation between images. (Panels plot intensity against position along a profile, before and after correction.)
Figure 11. The principal steps in the imaging model: geometric transformation, optical blur, and spatial sampling. From left to right: the HR planar surface undergoes a geometric viewing transformation followed by optical/motion blurring and finally down-sampling.
in which the vector f is a lexicographic reordering of pixels in f(x, y), and where the linear operators T_n, h, and s↓ have been combined into a single matrix M_n. Each LR pixel is therefore a weighted sum of SR pixels, the weights being determined by the registration parameters, the shape of the point-spread function, and spatial integration. Note that the point-spread function may combine the effects of optical blur and motion blur, but we will only consider optical blur here. Motion blur is considered in [4].

From here on we shall drop the explicit photometric parameters (α_n, β_n) to improve the clarity of the equations presented. Putting them back in is straightforward. Of course, the algorithms used to generate the results do still include the photometric parameters in their computations, and in the real examples they are estimated robustly using the method described previously under photometric registration.

Under the Gaussian noise assumption, the likelihood of the nth observed image is

\Pr(g_n \mid \hat{f}) = \prod_{\forall x, y} \frac{1}{\sigma_n \sqrt{2\pi}} \exp\left( -\frac{(\hat{g}_n(x, y) - g_n(x, y))^2}{2\sigma_n^2} \right) \quad (4)

where the simulated LR image ĝ_n is given by ĝ_n = α_n M_n f̂ + β_n. The corresponding log-likelihood function is

L(g_n) = -\sum_{\forall x, y} (\hat{g}_n(x, y) - g_n(x, y))^2 = -\|M_n \hat{f} - g_n\|^2 = -\|\hat{g}_n - g_n\|^2.

Again, the unknown σ_n may be safely dropped in the above. Assuming independent observations, the log-likelihood over all images is given by

\sum_{\forall n} L(g_n) = -\sum_{\forall n} \|M_n \hat{f} - g_n\|^2 = -\|M \hat{f} - g\|^2.

The ML estimate is therefore

\hat{f}_{mle} = M^+ g

where M^+ is the Moore-Penrose pseudo-inverse of M, which is M^+ = (M^T M)^{-1} M^T. M is a very large, sparse Nn² × m² matrix, where N is the number of LR images, and n² and m² are the number of pixels in the LR and HR images, respectively. Typical values are N = 30, n² = 2,500, m² = 10,000, so it is not possible in practice to directly compute the pseudo-inverse M^+. Instead iterative solutions are sought, for example, the method of conjugate gradients. A very popular and straightforward solution was given by Irani and Peleg [21], [22]. Here we compute f̂_mle by preconditioned conjugate gradient descent.

Figure 12. The generative image formation model. f: ground-truth HR image; g_n: nth observed LR image; T_n: geometric transformation of the nth image; h: point-spread function; s↓: down-sampling operator by a factor S; α_n, β_n: scalar illumination parameters; η_n: observation noise. The transformation T_n is assumed to be a homography, the point-spread function h is assumed to be linear and spatially invariant, and the noise η is assumed to be Gaussian with zero mean.
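The forward model and the ML estimate can be illustrated on a 1-D toy problem: build a small dense M_n for each of a few subpixel shifts (shift, blur, then downsample), stack them into M, and recover f̂_mle by conjugate gradients on the normal equations. The blur kernel, shifts, and sizes below are invented for illustration; the matrices are dense rather than sparse, and the CG solver is unpreconditioned, unlike the method used in the article.

```python
import numpy as np

def make_lr_operator(m, s, shift):
    """Toy 1-D imaging matrix M_n: integer subpixel shift on the HR grid,
    a 3-tap blur (stand-in for the point-spread function), then block
    averaging by a factor s (the down-sampling operator)."""
    S = np.roll(np.eye(m), shift, axis=1)                   # geometric shift
    B = sum(w * np.roll(np.eye(m), o, axis=1)
            for w, o in [(0.25, -1), (0.5, 0), (0.25, 1)])  # optical blur
    D = np.kron(np.eye(m // s), np.full((1, s), 1.0 / s))   # spatial sampling
    return D @ B @ S

def conjugate_gradients(A, b, n_iter=500, tol=1e-20):
    """Plain conjugate gradients for symmetric positive semi-definite A
    with b in its range."""
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    for _ in range(n_iter):
        rr = r @ r
        if rr < tol:
            break
        Ap = A @ p
        a = rr / (p @ Ap)
        x = x + a * p
        r = r - a * Ap
        p = r + (r @ r / rr) * p
    return x

# Ground-truth HR signal and N = 4 shifted LR observations of it.
m, s = 32, 2
f = np.sin(np.linspace(0.0, 4.0 * np.pi, m))
M = np.vstack([make_lr_operator(m, s, k) for k in range(4)])
g = M @ f                  # noise-free LR observations

# ML estimate: minimize ||M f - g||^2 via CG on the normal equations.
f_mle = conjugate_gradients(M.T @ M, M.T @ g)
```

In the 2-D case of the article, M has the Nn² × m² sparse structure described above, so only matrix-vector products with M and Mᵀ are ever formed; the iterative structure is otherwise the same.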
MAP Estimation

We now derive the MAP estimate f_map for the SR image. Suppose we have prior information Pr(f̂) on the form of the SR image. Various examples of priors are discussed below, but one example is a measure of image smoothness. We wish to compute the estimate of f̂ given the measured images g_n and the prior information Pr(f̂). It is a standard result of applying Bayes' theorem [5] that the posterior probability Pr(f̂ | g) is given by Pr(f̂ | g) = Pr(g | f̂) Pr(f̂) / Pr(g), where Pr(g | f̂) is obtained from (4). It is convenient to work with the logs of these quantities, and the MAP estimate of f is then obtained from

f_{map} = \arg\max_f \left[ \lg \Pr(\hat{f}) + \lg \Pr(g \mid \hat{f}) \right] = \arg\max_f \left[ \lg \Pr(\hat{f}) - \frac{1}{2\sigma_n^2} \|M \hat{f} - g\|^2 \right]. \quad (5)

The specific form of lg Pr(f̂) depends on the prior being used, and we will now overview a few popular cases.

Figure 13. (a) A mosaic composed from 200 frames captured using a hand-held DV camera. The region of interest (boxed in green) contains a car. (b) MLE reconstructions up to 1.5× zoom show marked improvement over the LR and median images. Reconstruction error starts to become apparent at 1.75× zoom. (Panels: MLE @ 1.75× zoom, MLE @ 2.0× zoom.)

Image Priors

The simplest and most common priors have potential functions that are quadratic in the pixel values f, hence

\Pr(f) = \frac{1}{Z} \exp\left( -f^T Q f \right) \quad (6)
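With a quadratic prior of the form (6), the MAP problem (5) is itself a linear least-squares problem with a closed-form solution. The sketch below assumes Q = λ DᵀD for a circular first-difference operator D (a Gaussian-MRF-style smoothness penalty of our own choosing, with the noise variance absorbed into λ); setting the gradient of λ‖Df‖² + ‖Mf − g‖² to zero gives (MᵀM + λDᵀD) f = Mᵀg. Direct solution is practical only at toy sizes; real SR problems use iterative solvers.

```python
import numpy as np

def map_estimate(M, g, lam):
    """MAP solution under a quadratic smoothness prior
    Pr(f) proportional to exp(-lam * ||D f||^2), where D is a circular
    first-difference operator. Minimizes ||M f - g||^2 + lam * ||D f||^2
    in closed form via the regularized normal equations."""
    m = M.shape[1]
    D = np.eye(m) - np.roll(np.eye(m), 1, axis=1)  # circular first difference
    A = M.T @ M + lam * (D.T @ D)
    return np.linalg.solve(A, M.T @ g)
```

Increasing λ trades data fidelity for smoothness: the regularizer makes the system well posed even when MᵀM alone is rank deficient, which is exactly the role the prior plays in (5).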
\rho(x) = \begin{cases} x^2 & \text{if } |x| \le \alpha \\ 2\alpha |x| - \alpha^2 & \text{otherwise} \end{cases} \quad (11)
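The potential function (11), which underlies the HMRF prior used in the examples below, is straightforward to implement:

```python
import numpy as np

def huber_rho(x, alpha):
    """The potential function of Eq. (11): quadratic for |x| <= alpha and
    linear beyond, so intensity edges (large gradients) are penalized far
    less severely than under a purely quadratic prior."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) <= alpha,
                    x ** 2,
                    2.0 * alpha * np.abs(x) - alpha ** 2)
```

The two branches agree in value and slope at |x| = α, which keeps the associated optimization problem smooth and convex while preserving edges.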
Examples

Figure 15 compares the solutions obtained under these three priors for the car example. The SR image is reconstructed at 3× pixel zoom, and in all cases the MAP solutions show more convincing detail than the ML reconstruction of Figure 13, especially around the door handles and wing mirror. The ‖x‖² and GMRF priors produce similar results, but note the sharp edges around the windows and headlights in the HMRF reconstruction. The level of detail in the reconstructions compared to the LR images is very apparent. Furthermore, the priors have eliminated the noise of the ML solution without introducing artifacts of their own. An ML solution at this zoom factor would be completely dominated by noise.

Figures 16 and 17 show two further examples of MAP reconstruction. In the first, which is constructed from 30 LR images in a similar situation to Figure 13, the text is clearly readable in the SR image but is not in the original images. The second example shows a MAP reconstruction for images obtained by the Mars rover. The details of the rock surface are considerably clearer in the SR image compared to the originals.

Figure 15. MAP results for the car example using various priors. (Panels: Low Res (bicubic 3× zoom), GMRF (λ = 0.006), HMRF (λ = 0.009, α = 0.05).)

Figure 16. MAP super resolution applied to 30 LR images using the HMRF prior (λ = 0.01, α = 0.05).
Current Research Challenges

Current research on SR in the computer vision field falls into three categories: first, there is analysis of performance bounds: how far can an area of an image be zoomed before

Figure 17. MAP super resolution applied to 25 LR JPEG images using the GMRF prior. (Panels: Low Res (bicubic 3× zoom), GMRF (λ = 0.01).)
subspaces [8], [9]. The objective here is to use a prior tuned to particular types of scenes, such as a face or text, rather than a general-purpose prior such as GMRF. These priors need not be made explicit, and in one imaginative approach [3], [15] the mapping from LR to HR is learned from training examples of low- and high-resolution image pairs.

David Capel completed his Ph.D. on image mosaicing and super resolution as part of the Visual Geometry Group at Oxford University in 2001. Since then, he has worked as a vision scientist at 2d3 Ltd., UK. He has published a number of papers in the field of image mosaicing and super resolution and has also authored a book on the subject.

Andrew Zisserman is professor of engineering science at the University of Oxford and heads the Visual Geometry Group. His research interests include geometry and recognition in computer vision. He has authored over 100 papers and is author/editor of eight books. He has twice been awarded the IEEE Marr Prize for Computer Vision.

References

[1] http://www.cognitech.com
[2] http://www.salientstills.com
[3] S. Baker and T. Kanade, "Limits on super-resolution and how to break them," IEEE Trans. Pattern Anal. Machine Intell., vol. 24, no. 9, pp. 1167-1183, 2002.
[4] B. Bascle, A. Blake, and A. Zisserman, "Motion deblurring and super-resolution from an image sequence," in Proc. Euro. Conf. Computer Vision, 1996, pp. 312-320.
[5] C.M. Bishop, Neural Networks for Pattern Recognition. London, U.K.: Oxford Univ. Press, 1995.
[6] D.P. Capel, "Image mosaicing and super-resolution," Ph.D. dissertation, Univ. of Oxford, 2001.
[7] D.P. Capel and A. Zisserman, "Automated mosaicing with super-resolution zoom," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, June 1998, pp. 885-891.
[8] D.P. Capel and A. Zisserman, "Super-resolution from multiple views using learnt image models," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2001.
[9] D.P. Capel, Image Mosaicing and Super-Resolution. New York: Springer-Verlag, 2003.
[10] G. Cross and A. Zisserman, "Quadric surface reconstruction from dual-space geometry," in Proc. ICCV, Jan. 1998, pp. 25-31.
[11] F. Devernay and O.D. Faugeras, "Automatic calibration and removal of distortion from scenes of structured environments," in Proc. SPIE, vol. 2567, San Diego, CA, July 1995.
[12] H. Engl, M. Hanke, and A. Neubauer, Regularization of Inverse Problems. Dordrecht, Germany: Kluwer Academic, 1996.
[18] R.I. Hartley, "Self-calibration of stationary cameras," Int. J. Comput. Vision, vol. 22, no. 1, pp. 5-23, Feb. 1997.
[19] R.I. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision. Cambridge, U.K.: Cambridge Univ. Press, 2000.
[20] M. Irani and P. Anandan, "About direct methods," in Vision Algorithms: Theory and Practice (Lecture Notes in Computer Science), W. Triggs, A. Zisserman, and R. Szeliski, Eds. New York: Springer-Verlag, 2000.
[21] M. Irani and S. Peleg, "Improving resolution by image registration," Graph. Models Image Process., vol. 53, pp. 231-239, 1991.
[22] M. Irani and S. Peleg, "Motion analysis for image enhancement: Resolution, occlusion, and transparency," J. Visual Commun. Image Representation, vol. 4, pp. 324-335, 1993.
[23] Z. Lin and H.Y. Shum, "On the fundamental limits of reconstruction-based super-resolution algorithms," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2001, pp. I:1171-1176.
[24] S. Mann and R.W. Picard, "Virtual bellows: Constructing high quality stills from video," in Proc. Int. Conf. Image Processing, 1994.
[25] W. Press, S. Teukolsky, W. Vetterling, and B. Flannery, Numerical Recipes in C, 2nd ed. Cambridge, U.K.: Cambridge Univ. Press, 1992.
[26] H.S. Sawhney, S. Hsu, and R. Kumar, "Robust video mosaicing through topology inference and local to global alignment," in Proc. Euro. Conf. Computer Vision, 1998, pp. 103-119.
[27] R.R. Schultz and R.L. Stevenson, "Extraction of high-resolution frames from video sequences," IEEE Trans. Image Processing, vol. 5, pp. 996-1011, June 1996.
[28] A. Shashua and S. Toelg, "The quadric reference surface: Theory and applications," Int. J. Comput. Vision, vol. 23, no. 2, pp. 185-198, 1997.
[29] C. Slama, Manual of Photogrammetry, 4th ed. Falls Church, VA: Amer. Soc. Photogrammetry, 1980.
[30] V.N. Smelyanskiy, P. Cheeseman, D. Maluf, and R. Morris, "Bayesian super-resolved surface reconstruction from images," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2000, pp. I:375-382.
[31] R. Szeliski, "Image mosaicing for tele-reality applications," Digital Equipment Corp., Cambridge, MA, Tech. Rep., 1994.
[32] A.N. Tikhonov and V.Y. Arsenin, Solutions of Ill-Posed Problems. Washington, DC: Winston/Wiley, 1977.
[33] P.H.S. Torr and A. Zisserman, "MLESAC: A new robust estimator with application to estimating image geometry," Comput. Vision Image Understanding, vol. 78, pp. 138-156, 2000.
[34] R. Tsai and T. Huang, "Multiframe image restoration and registration," Advances Comput. Vision Image Processing, vol. 1, pp. 317-339, 1984.
[35] H. Ur and D. Gross, "Improved resolution from subpixel shifted pictures," Graph. Models Image Process., vol. 54, no. 2, pp. 181-186, Mar. 1992.
[36] Y. Wexler and A. Shashua, "Q-warping: Direct computation of quadratic reference surfaces," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, vol. 1, 1991, pp. 333-338.
[37] W.Y. Zhao and S. Sawhney, "Is super-resolution with optical flow feasible?" in Proc. Euro. Conf. Computer Vision (Lecture Notes in Computer Science, vol. 2350). Springer-Verlag, 2002, pp. 599-613.
[38] A. Zomet and S. Peleg, "Applying super-resolution to panoramic mosaics," in Proc. Workshop Applications of Computer Vision, 1998.