
Computer Vision Applied to Super Resolution

David Capel and Andrew Zisserman
IEEE Signal Processing Magazine, May 2003

Super-resolution (SR) restoration aims to solve the following problem: given a set of observed images, estimate an image at a higher resolution than is present in any of the individual images. Where the application of this technique differs in computer vision from other fields is in the variety and severity of the registration transformation between the images. In particular, this transformation is generally unknown, and a significant component of solving the SR problem in computer vision is the estimation of the transformation. The transformation may have a simple parametric form, or it may be scene dependent and have to be estimated for every point. In either case the transformation is estimated directly and automatically from the images.

Computer vision techniques applied to the SR problem have already yielded several successful products, including Cognitech's "Video Investigator" software [1] and Salient Stills' "Video Focus" [2]. In the latter case, for example, a high-resolution (HR) still of a face, suitable for printing in a newspaper article, can be constructed from a low-resolution (LR) video news feed.

The approach discussed in this article is outlined in Figure 1. The input images are first mutually aligned onto a common reference frame. This alignment involves not only a geometric component but also a photometric component, modeling illumination, gain, or color balance variations among the images. After alignment, a composite image mosaic may be rendered, and SR restoration may be applied to any chosen region of interest.

We shall describe the two key components that are necessary for successful SR restoration: the accurate alignment or registration of the LR images, and the formulation of an SR estimator that uses a generative image model together with a prior model of the super-resolved image itself. As with many other problems in computer vision, these different aspects are tackled in a robust, statistical framework.

Image Registration

Essential to the success of any SR algorithm is the need to find a highly accurate point-to-point correspondence or registration between images in the input sequence. This correspondence problem can be stated as follows: given two different views of the same scene, for each image point in one view find the image point in the second view which has the same pre-image, i.e., corresponds to the same actual point in the scene.

Many SR estimators, particularly those derived in the Fourier domain, are based on the assumption of purely translational image motion [34], [35]. In computer vision, however, far more demanding image transformations are required and estimated on a regular basis. Fast, accurate, and robust automated methods exist for registering images related by affine transformations [21], biquadratic transformations [24], and planar projective transformations [7]. Image deformations inherent in the imaging system, such as radial lens distortion, may also be parametrically modeled and accurately estimated [11], [14].

Geometric Registration
For illustrative purposes we will focus on the case of images that are related by a planar projective transformation, also called a planar homography: a geometric transformation which has eight degrees of freedom (see Figure 2). There are two important situations in which a planar homography is appropriate [19]: images of a plane viewed under arbitrary camera motion, and images of an arbitrary three-dimensional scene viewed by a camera rotating about its optic center and/or zooming. The two situations are illustrated in Figure 3. In both cases, the image points x and x′ correspond to a single point X in the world. A third imaging situation in which a homography may be appropriate occurs when a freely moving camera views a very distant scene, such as is the case in high-aerial or satellite photography. Because the distance of the scene from the camera is very much greater than the motion of the camera between views, the parallax effects caused by the three-dimensional nature of the scene are negligibly small.

[Figure 1. Stages in the SR process. Input: multiple images → image registration (geometric and photometric) → image mosaic → super resolution (MAP estimation) → output: high-resolution image.]

Feature-Based Registration

In computer vision it is common to estimate the parameters of a geometric transformation such as a homography H by automatic detection and analysis of corresponding features among the input images. Typically, in each image several hundred "interest points" are automatically detected with subpixel accuracy using an algorithm such as the Harris feature detector [17]. Putative correspondences are identified by comparing the image neighborhoods around the features, using a similarity metric such as normalized correlation. These correspondences are refined using a robust search procedure such as the RANSAC algorithm [13] that extracts only those features whose interimage motion is consistent with a

homography [19], [33]. Finally, these inlying correspondences are used in a nonlinear estimator which returns a highly accurate estimate of the homography. The algorithm is summarized in Figure 4, and the process is illustrated in Figure 5 for the case of two views.

Feature-based algorithms have several advantages over direct, texture correlation-based approaches often found elsewhere [20], [26], [31]. These include the ability to cope with widely disparate views and excellent robustness to illumination changes. More importantly in the context of SR, the feature-based approach allows us to derive a statistically well-founded estimator of the registration parameters using the method of maximum likelihood (ML). Applied to several hundred point correspondences, this estimator gives highly accurate results. Furthermore, the feature-based ML estimator is easily extended to perform simultaneous registration of any number of images, yielding mutually consistent, accurate estimates of the interimage transformations.

Notation: Points are represented by homogeneous coordinates, so that a point (x, y) is represented as (x, y, 1). Conversely, the point (x1, x2, x3) in homogeneous coordinates corresponds to the inhomogeneous point (x1/x3, x2/x3).

Definition (planar homography): Under a planar homography (also called a plane projective transformation, collineation, or projectivity) points are mapped as

$\begin{pmatrix} x'_1 \\ x'_2 \\ x'_3 \end{pmatrix} = \begin{pmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}$

or equivalently $\mathbf{x}' = H\mathbf{x}$, where the equality holds up to a scale factor. The equivalent nonhomogeneous relationship is

$x' = \frac{h_{11}x + h_{12}y + h_{13}}{h_{31}x + h_{32}y + h_{33}}, \qquad y' = \frac{h_{21}x + h_{22}y + h_{23}}{h_{31}x + h_{32}y + h_{33}}.$

Figure 2. Definition of the planar homography.

ML Registration of Two Views

We first look at the ML homography estimator for just two views. The localization error on the detected feature points is modeled as an isotropic, normal distribution with zero mean and standard deviation σ. Given a true, noise-free point $\bar{\mathbf{x}}$ (which is the projection of a pre-image scene point X), the probability density of the corresponding observed (i.e., noisy) feature point location is

$\Pr(\mathbf{x} \mid \bar{\mathbf{x}}) = \frac{1}{2\pi\sigma^2} \exp\left( -\frac{(x - \bar{x})^2 + (y - \bar{y})^2}{2\sigma^2} \right).$

[Figure 3. Two imaging scenarios for which the image-to-image correspondence is captured by a planar homography. Top: two images of a planar surface, related by a rotation R and translation t; the image points x and x′ are projections of the same surface point X. Bottom (plan view): images from a panning/zooming camera rotating about its optic center.]

Algorithm: Automatic Two-View Registration
1) Features: Compute interest point features in each image to subpixel accuracy (e.g., Harris corners [17]).
2) Putative Correspondences: Compute a set of interest point matches based on proximity and similarity of their intensity neighbourhood.
3) RANSAC Robust Estimation: Repeat for N samples:
   a) Select a random sample of four correspondences and compute the homography H.
   b) Calculate a geometric image distance error for each putative correspondence.
   c) Compute the number of inliers consistent with H as the number of correspondences for which the distance error is less than a threshold.
   Choose the H with the largest number of inliers.
4) Optimal Estimation: Reestimate H from all correspondences classified as inliers, by maximizing the likelihood function of (1).
5) Guided Matching: Further interest point correspondences are now determined using the estimated H to define a search region about the transferred point position.
The last two steps can be iterated until the number of correspondences is stable.

Figure 4. The main steps in the algorithm to automatically estimate a homography between two images.
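The RANSAC stage (step 3) and the linear part of step 4 can be sketched in a few dozen lines. This is our own illustrative code, not the authors' implementation: the minimal solver is the standard four-point DLT, and the geometric distance of step b is simplified to the one-sided transfer error.

```python
import numpy as np

def homography_from_points(x, xp):
    """Direct linear transform: estimate H (up to scale) from n >= 4
    correspondences x -> xp, each given as an (n, 2) array."""
    A = []
    for (u, v), (up, vp) in zip(x, xp):
        A.append([-u, -v, -1, 0, 0, 0, up * u, up * v, up])
        A.append([0, 0, 0, -u, -v, -1, vp * u, vp * v, vp])
    # The null vector of A (smallest singular vector) is the flattened H.
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    return Vt[-1].reshape(3, 3)

def transfer(H, x):
    """Map inhomogeneous points through H via homogeneous coordinates."""
    xh = np.hstack([x, np.ones((len(x), 1))]) @ H.T
    return xh[:, :2] / xh[:, 2:3]

def ransac_homography(x, xp, n_samples=500, thresh=2.0, rng=None):
    rng = rng if rng is not None else np.random.default_rng(0)
    best = np.zeros(len(x), dtype=bool)
    for _ in range(n_samples):
        idx = rng.choice(len(x), 4, replace=False)   # minimal sample
        H = homography_from_points(x[idx], xp[idx])
        err = np.linalg.norm(transfer(H, x) - xp, axis=1)
        inliers = err < thresh
        if inliers.sum() > best.sum():
            best = inliers
    # Step 4, linear version: refit on all inliers. The article instead
    # refines nonlinearly by maximizing the likelihood of (1).
    return homography_from_points(x[best], xp[best]), best
```

On synthetic data with exact inliers and gross outliers, the recovered inlier set excludes the outliers and the refitted H matches the true homography up to scale.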

Hence, given the set of true, noise-free correspondences $\{\bar{\mathbf{x}} \leftrightarrow \bar{\mathbf{x}}'\}$, and making the very reasonable assumptions that the measurements are independent and that the feature localization error is uncorrelated across different images, the probability density of the set of observed, noisy correspondences $\{\mathbf{x} \leftrightarrow \mathbf{x}'\}$ is

$\Pr(\{\mathbf{x}, \mathbf{x}'\}) = \prod_i \Pr(\mathbf{x}_i \mid \bar{\mathbf{x}}_i)\,\Pr(\mathbf{x}'_i \mid \bar{\mathbf{x}}'_i).$

The negative log-likelihood of the set of all correspondences is therefore

$L = \sum_i \left( (x_i - \bar{x}_i)^2 + (y_i - \bar{y}_i)^2 + (x'_i - \bar{x}'_i)^2 + (y'_i - \bar{y}'_i)^2 \right).$

(The unknown scale factor σ may be safely dropped in the above equation since it has no effect on the following derivations.) Of course, the true pre-image points are unknown, so we replace $\{\bar{\mathbf{x}}, \bar{\mathbf{x}}'\}$ in the above equation with $\{\hat{\mathbf{x}}, \hat{\mathbf{x}}'\}$, the estimated positions of the pre-image points, hence

$L = \sum_i \left( (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 + (x'_i - \hat{x}'_i)^2 + (y'_i - \hat{y}'_i)^2 \right). \qquad (1)$

Finally, we impose the constraint that $\hat{\mathbf{x}}$ maps to $\hat{\mathbf{x}}'$ under a homography and hence substitute $\hat{\mathbf{x}}' = H\hat{\mathbf{x}}$. This error metric is illustrated in Figure 6. Thus minimizing L requires estimating the homography and the pre-image points $\{\hat{\mathbf{x}}\}$. A direct method of obtaining these estimates is to parameterize both the eight parameters of the homography and the 2N parameters of the N points $\{\hat{\mathbf{x}}\}$. We will return to this idea shortly. In the two-view case, however, it is possible to derive a very good approximation to this log-likelihood [19] that avoids explicit parameterization of the pre-image points, permitting $H_{ml}$ to be computed by a standard nonlinear least-squares optimization over only eight parameters. For example, the Levenberg-Marquardt algorithm [25] can be used.

Figure 5. Steps in the robust algorithm for registering two views. (a) Two images of an Oxford college. The motion between views is a rotation about the camera center, so the images are exactly related by a homography. (b) Detected point features superimposed on the images; there are approximately 500 features on each image. (c) Superimposed on the left image: 268 putative matches, shown by a line linking each matched point to its position in the other image; note the clear mismatches. The right image shows the RANSAC outliers, 117 of the putative matches. (d) Left: RANSAC inliers, 151 correspondences consistent with the estimated homography. Right: final set of 262 correspondences after guided matching and optimal estimation. The estimated transformation is accurate to subpixel resolution.
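The final nonlinear refinement can be sketched with SciPy's Levenberg-Marquardt driver. This is our own sketch, not the authors' code: instead of the pre-image parameterization of (1) it minimizes the closely related symmetric transfer error over the eight free parameters of H.

```python
import numpy as np
from scipy.optimize import least_squares

def transfer(H, x):
    """Map inhomogeneous points through H via homogeneous coordinates."""
    xh = np.hstack([x, np.ones((len(x), 1))]) @ H.T
    return xh[:, :2] / xh[:, 2:3]

def refine_homography(H0, x, xp):
    """Levenberg-Marquardt refinement of an initial homography H0
    over its eight free parameters (h33 is fixed at 1)."""
    def residuals(p):
        H = np.append(p, 1.0).reshape(3, 3)
        Hinv = np.linalg.inv(H)
        # forward (view 1 -> 2) and backward (view 2 -> 1) transfer errors
        return np.concatenate([(transfer(H, x) - xp).ravel(),
                               (transfer(Hinv, xp) - x).ravel()])
    p0 = (H0 / H0[2, 2]).ravel()[:8]
    sol = least_squares(residuals, p0, method="lm")
    return np.append(sol.x, 1.0).reshape(3, 3)
```

Fixing h33 = 1 is a convenient but not universal gauge choice; a unit-norm parameterization of H avoids the degenerate case h33 = 0.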

Simultaneous Registration of Multiple Images

By computing homographies between all pairs of consecutive frames in the input sequence, the images may be aligned with a single common reference frame (Figure 7), then warped and blended to render an image mosaic. This is possible due to the concatenation property of homographies, i.e., the homography relating frame 0 and frame N is simply the product of the intervening homographies. However, this process permits the accumulation of "dead-reckoning" error. This is particularly problematic when the camera "loops back," revisiting certain parts of the scene more than once (see Figure 8). In this case, the accumulated registration error may cause the first and last images to be misaligned.

Fortunately, the feature-based registration scheme offers an elegant solution to this problem. The two-view ML estimator may be easily extended to perform simultaneous registration of any number of views. Furthermore, the N-view estimator allows feature correspondences between any pair of views, for example, between the first and last frames, to be incorporated in the optimization. This guarantees that the estimated homographies will be globally consistent.

As illustrated in Figure 7, any particular pre-image scene point $X_j$ may be observed in several (but not necessarily all) images. The corresponding set of detected feature points $\{\mathbf{x}^i_j\}$ (where the superscript i indicates the image) plays an identical role to the two-view correspondences already discussed. The pre-image points $X_j$ are explicitly parameterized to lie in an arbitrarily chosen plane, and the homographies $H^i$ map the points $X_j$ to their corresponding image points $\mathbf{x}^i_j$. Analogously to the two-view ML estimator, the N-view estimator seeks the set of homographies and pre-image points that minimizes the (squared) geometric distances $d(\mathbf{x}^i_j, \hat{\mathbf{x}}^i_j)$ between each observed feature point $\mathbf{x}^i_j$ and its predicted position $\hat{\mathbf{x}}^i_j = H^i X_j$. In practice the plane of the points $X_j$ is often chosen to correspond to one of the images. This algorithm, which optimizes over all the homographies and the pre-image points simultaneously, is known to photogrammetrists as block bundle adjustment [29]. The implementation details are described in [9], [18], and [19].

Figure 9 shows a mosaic image composed using 100 frames registered by block bundle adjustment. There is no visible misalignment between frames. Note that in this example the images are reprojected on a planar manifold. However, a cylindrical reprojection manifold is also common in image mosaicing [38].

Figure 6. The ML estimator of (1) minimizes the squared geometric distances d and d′ between the pre-image point correspondence $(\hat{\mathbf{x}}, \hat{\mathbf{x}}')$ and the observed interest points $(\mathbf{x}, \mathbf{x}')$.

Figure 7. Three images acquired by a rotating camera may be registered to the frame of the middle one, as shown, by projectively warping the outer images to align with the middle one.
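The concatenation property is just a matrix product; a minimal sketch (the helper name is our own):

```python
import numpy as np

def chain_homographies(pairwise):
    """Given H_i mapping frame i into frame i+1, return the homography
    mapping frame 0 into frame N (later transforms compose on the left)."""
    H = np.eye(3)
    for H_i in pairwise:
        H = np.asarray(H_i) @ H
    return H
```

Any error in an individual pairwise homography propagates to every later frame in the chain, which is exactly the "dead-reckoning" accumulation described above.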

Photometric Registration

Photometric registration refers to the procedure by which global photometric transformations between images are estimated. Examples of such transformations are global illumination changes across the scene and intensity variations due to camera automatic gain control or automatic white balancing. In practice, it has been shown that a simple parametric model of these effects, along with a robust method for computing the parameters given a set of geometrically registered views, can be sufficient to allow successful application to image mosaicing and SR [6].

The examples shown here employ a model which allows for an affine transformation (contrast and brightness) per RGB channel,

$\begin{pmatrix} r_2 \\ g_2 \\ b_2 \end{pmatrix} = \begin{pmatrix} \alpha_r & 0 & 0 \\ 0 & \alpha_g & 0 \\ 0 & 0 & \alpha_b \end{pmatrix} \begin{pmatrix} r_1 \\ g_1 \\ b_1 \end{pmatrix} + \begin{pmatrix} \beta_r \\ \beta_g \\ \beta_b \end{pmatrix},$

resulting in a total of six parameters. After geometric alignment, the colors of corresponding pixels in two images may be used to directly estimate the parameters of the color transformation between them. Due to the possibility of outliers to this simple model, which may be caused by specularities, shadowing, etc., the estimation is again performed using a robust algorithm such as RANSAC, followed by optimal estimation using the inliers to the model.

In the example shown in Figure 10, the photometric difference is due to a change in daylight conditions. The estimated transformation is used to render a color-corrected version of image 1. The corrected image exhibits the same orange glow as the sun-lit image. The effectiveness of the photometric registration is further verified by the intensity profiles. In this case, the red channel undergoes the most severe transformation. After correction, the profiles of the corrected image match closely those of image 2.

Figure 8. Concatenation of homographies permits registration error to accumulate over the sequence. This is problematic when a sequence "loops back."
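For one channel, the six-parameter model reduces to fitting a gain α and offset β to aligned pixel pairs. The sketch below is our own: a crude iterative inlier refit stands in for the RANSAC-plus-optimal-estimation procedure described above, and the threshold is an arbitrary illustrative choice.

```python
import numpy as np

def fit_channel_affine(c1, c2, n_iter=3, thresh=25.0):
    """Robust-ish least-squares fit of c2 ~ alpha * c1 + beta for one
    color channel, re-fitting on the current inlier set each iteration."""
    mask = np.ones(len(c1), dtype=bool)
    alpha, beta = 1.0, 0.0
    for _ in range(n_iter):
        A = np.stack([c1[mask], np.ones(mask.sum())], axis=1)
        alpha, beta = np.linalg.lstsq(A, c2[mask], rcond=None)[0]
        mask = np.abs(alpha * c1 + beta - c2) < thresh  # keep inliers only
    return alpha, beta
```

Applying the three fitted (α, β) pairs to the channels of image 1 then yields the color-corrected image.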

Super Resolution

The observed LR images are regarded as degraded observations of a real, HR image. These degradations typically include geometric warping, optical blur, spatial sampling, and noise, as shown in Figure 11. The forward model of image formation is described below. Given several such LR image observations, our objective is to solve the inverse problem, i.e., determine the SR image from the measured LR images given the image formation model.

We will discuss two solutions to this problem. In the first, we determine the ML estimate of the SR image such that, when reprojected back into the images via the imaging model, it minimizes the difference between the actual and "predicted" observations. In the second, we determine the maximum a posteriori (MAP) estimate of the SR image, including prior information.

Figure 9. A mosaic generated from 100 images after geometric registration using the N-view maximum likelihood method. The outline of every fifth image is superimposed.

Generative Models

It is assumed that the set of observed LR images was produced by a single HR image under the

[Figure 10. Estimating and correcting for global photometric variation between images. Top: Image 1, Image 2, and the color-corrected Image 1. Below: red, green, and blue intensity profiles for Image 1, Image 2, and the corrected Image 1.]

[Figure 11. The principal steps in the imaging model. From left to right: the HR planar surface (hi-res texture) undergoes a geometric viewing transformation, followed by optical/motion blurring, and finally spatial down-sampling to give the lo-res image.]

generative model of image formation given in Figure 12. After discretization, the model can be expressed in matrix form as

$\mathbf{g}_n = \alpha_n M_n \mathbf{f} + \beta_n + \boldsymbol{\eta}_n \qquad (2)$

in which the vector f is a lexicographic reordering of the pixels in f(x, y), and where the linear operators $T_n$, $h$, and $s\!\downarrow$ have been combined into a single matrix $M_n$. Each LR pixel is therefore a weighted sum of SR pixels, the weights being determined by the registration parameters, the shape of the point-spread function, and the spatial integration. Note that the point-spread function may combine the effects of optical blur and motion blur, but we will only consider optical blur here. Motion blur is considered in [4].

From here on we shall drop the explicit photometric parameters $(\alpha_n, \beta_n)$ to improve the clarity of the equations presented. Putting them back in is straightforward. Of course, the algorithms used to generate the results do still include the photometric parameters in their computations, and in the real examples they are estimated robustly using the method described previously under photometric registration.

The generative models of all N images are stacked vertically to form an over-determined linear system

$\begin{pmatrix} \mathbf{g}_0 \\ \mathbf{g}_1 \\ \vdots \\ \mathbf{g}_{N-1} \end{pmatrix} = \begin{pmatrix} M_0 \\ M_1 \\ \vdots \\ M_{N-1} \end{pmatrix} \mathbf{f} + \begin{pmatrix} \boldsymbol{\eta}_0 \\ \boldsymbol{\eta}_1 \\ \vdots \\ \boldsymbol{\eta}_{N-1} \end{pmatrix}, \qquad \mathbf{g} = M\mathbf{f} + \boldsymbol{\eta}. \qquad (3)$

$g_n(x, y) = \alpha_n\, s\!\downarrow\!\left( h(u, v) * f(T_n(x, y)) \right) + \beta_n + \eta_n(x, y)$

f: ground-truth HR image
$g_n$: nth observed LR image
$T_n$: geometric transformation of the nth image
h: point-spread function
$s\!\downarrow$: down-sampling operator by a factor S
$\alpha_n, \beta_n$: scalar illumination parameters
$\eta_n$: observation noise

The transformation T is assumed to be a homography. The point-spread function h is assumed to be linear and spatially invariant. The noise η is assumed to be Gaussian with mean zero.

Figure 12. The generative image formation model.

Maximum Likelihood Estimation

We now derive an ML estimate $\mathbf{f}_{mle}$ for the SR image f, given the measured LR images $\mathbf{g}_n$ and the imaging matrices $M_n$. Assuming the image noise to be Gaussian with mean zero and variance $\sigma_n^2$, the total probability of an observed image $g_n(x, y)$ given an estimate of the SR image $\hat{f}(x, y)$ is

$\Pr(\mathbf{g}_n \mid \hat{\mathbf{f}}) = \prod_{\forall x, y} \frac{1}{\sigma_n \sqrt{2\pi}} \exp\left( -\frac{(\hat{g}_n(x, y) - g_n(x, y))^2}{2\sigma_n^2} \right) \qquad (4)$

where the simulated LR image $\hat{\mathbf{g}}_n$ is given by $\hat{\mathbf{g}}_n = \alpha_n M_n \hat{\mathbf{f}} + \beta_n$. The corresponding log-likelihood function is

$L(\mathbf{g}_n) = -\sum_{\forall x, y} \left( \hat{g}_n(x, y) - g_n(x, y) \right)^2 = -\left\| M_n \hat{\mathbf{f}} - \mathbf{g}_n \right\|^2 = -\left\| \hat{\mathbf{g}}_n - \mathbf{g}_n \right\|^2.$

Again, the unknown $\sigma_n$ may be safely dropped in the above. Assuming independent observations, the log-likelihood over all images is given by

$\sum_{\forall n} L(\mathbf{g}_n) = -\sum_{\forall n} \left\| M_n \hat{\mathbf{f}} - \mathbf{g}_n \right\|^2 = -\left\| M \hat{\mathbf{f}} - \mathbf{g} \right\|^2.$

We seek the estimate $\mathbf{f}_{mle}$ which maximizes this log-likelihood,

$\mathbf{f}_{mle} = \arg\min_{\mathbf{f}} \left\| M\mathbf{f} - \mathbf{g} \right\|^2.$

This is a standard linear minimization, and the solution is given by

$\hat{\mathbf{f}}_{mle} = M^+ \mathbf{g}$

where $M^+ = (M^T M)^{-1} M^T$ is the Moore-Penrose pseudo-inverse of M.

M is a very large, sparse $Nn^2 \times m^2$ matrix, where N is the number of LR images, and $n^2$ and $m^2$ are the number of pixels in the LR and HR images, respectively. Typical values are N = 30, $n^2 = 2{,}500$, and $m^2 = 10{,}000$, so it is not possible in practice to compute the pseudo-inverse $M^+$ directly. Instead, iterative solutions are sought, for example, the method of conjugate gradients. A very popular and straightforward solution was given by Irani and Peleg [21], [22]. Here we compute $\hat{\mathbf{f}}_{mle}$ by preconditioned conjugate gradient descent.

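The structure of (2)-(3) and an iterative ML solution can be exercised on a toy one-dimensional problem. This is entirely our own construction: each sparse M_n applies an integer shift on the HR grid (a sub-pixel shift at LR resolution), a short point-spread function, and 2× down-sampling, and the stacked system is solved with SciPy's sparse LSQR rather than the preconditioned conjugate gradient used in the article.

```python
import numpy as np
from scipy.sparse import lil_matrix, vstack
from scipy.sparse.linalg import lsqr

def build_M(m, shift, psf=(0.2, 0.6, 0.2), factor=2):
    """Sparse LR-from-HR operator: shift on the HR grid, convolve with
    the psf, then down-sample by `factor` (a 1-D analogue of M_n)."""
    n = m // factor
    M = lil_matrix((n, m))
    for i in range(n):
        centre = i * factor + shift
        for k, w in enumerate(psf):
            j = centre + k - len(psf) // 2
            if 0 <= j < m:
                M[i, j] = w
    return M.tocsr()

m = 64
rng = np.random.default_rng(0)
f_true = rng.uniform(0.0, 1.0, m)              # ground-truth HR signal
Ms = [build_M(m, s) for s in (0, 1, 2, 3)]     # four shifted LR views
g = np.concatenate([M @ f_true for M in Ms])   # noise-free observations
M_stack = vstack(Ms)                           # the big sparse M of (3)
f_mle = lsqr(M_stack, g, atol=1e-12, btol=1e-12, iter_lim=1000)[0]
```

With noise-free data and enough distinct shifts the stacked system has full column rank, and LSQR recovers f to working precision; with noisy data the poor conditioning discussed next appears, motivating the regularized (MAP) estimators.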
Figure 13 shows an example of the ML solution under various degrees of zoom. The original images were obtained from a panning, hand-held digital video camera. The images are geometrically and photometrically registered automatically and displayed as a mosaic. The SR results are given for a 40 × 25 pixel region of the LR images which contains a stationary car. These are computed using 50 LR images and assuming a Gaussian point-spread function for the optical blur with scale $\sigma_{psf} = 0.425$.

It can be seen that up to a zoom factor of 1.5 the resolution improves and more detail is evident. There is clear improvement over the original images and over a "median image," obtained by geometrically warping/resampling the input images into the SR coordinate frame and combining them using a median filter. As the zoom factor increases, however, further characteristic high-frequency noise is superimposed on the SR image. This is a standard occurrence in inverse problems and results from noise amplification due to the poor conditioning of the matrix M. One standard remedy is to regularize the solution, and this is discussed in the next section, where the regularizers are considered as prior knowledge.

Figure 13. (a) A mosaic composed from 200 frames captured using a hand-held DV camera. The region of interest (boxed in green) contains a car. (b) MLE reconstructions up to 1.5× zoom show marked improvement over the LR and median images; reconstruction error starts to become apparent at 1.75× zoom. [Panels: low res (bicubic 2× zoom); median image at 2.0× zoom; MLE at 1.25×, 1.5×, 1.75×, and 2.0× zoom.]

MAP Estimation

We now derive the MAP estimate $\mathbf{f}_{map}$ for the SR image. Suppose we have prior information $\Pr(\hat{\mathbf{f}})$ on the form of the SR image; various examples of priors are discussed below, but one example is a measure of image smoothness. We wish to compute the estimate of $\hat{\mathbf{f}}$ given the measured images $\mathbf{g}_n$ and the prior information $\Pr(\hat{\mathbf{f}})$. It is a standard result of applying Bayes' theorem [5] that the posterior probability $\Pr(\hat{\mathbf{f}} \mid \mathbf{g})$ is given by $\Pr(\hat{\mathbf{f}} \mid \mathbf{g}) = \Pr(\mathbf{g} \mid \hat{\mathbf{f}})\,\Pr(\hat{\mathbf{f}}) / \Pr(\mathbf{g})$, where $\Pr(\mathbf{g} \mid \hat{\mathbf{f}})$ is obtained from (4). It is convenient to work with the logs of these quantities, and the MAP estimate of f is then obtained from

$\mathbf{f}_{map} = \arg\max_{\mathbf{f}} \left[ \lg \Pr(\hat{\mathbf{f}}) + \lg \Pr(\mathbf{g} \mid \hat{\mathbf{f}}) \right] = \arg\max_{\mathbf{f}} \left[ \lg \Pr(\hat{\mathbf{f}}) - \frac{1}{2\sigma_n^2} \left\| M\mathbf{f} - \mathbf{g} \right\|^2 \right]. \qquad (5)$

The specific form of $\lg \Pr(\hat{\mathbf{f}})$ depends on the prior being used, and we will now overview a few popular cases.

Image Priors

The simplest and most common priors have potential functions that are quadratic in the pixel values f, hence

$\Pr(\mathbf{f}) = \frac{1}{Z} \exp\left( -\mathbf{f}^T Q \mathbf{f} \right) \qquad (6)$

operator. Equations (8) and (9) will be
Huber ρ(x), α = 0.2 Huber ρ(x), α = 0.4 Huber ρ(x), α = 0.6 familiar to many people as forms of
1 1 1 Tikhonov regularization [12], [16],
[32], a technique proposed by
Tikhonov and Arsenin in the context of
solving Fredholm integral equations of
0.5 0.5 0.5
the first kind. Image deconvolution is
one example of this class of problem.
Another way to think about (6) is
0 0 0 as a multivariate Gaussian distribu-
−1 0 1 −1 0 1 −1 0 1
tion over f, in which Q is the inverse
(a) of the covariance matrix.

Huber pdf, α = 0.2 Huber pdf, α = 0.4 Huber pdf, α = 0.6


The || x 2 || Prior
Referring to (6) and setting Q equal
0.4 0.4 0.4
to some multiple of the identity is
equivalent to assuming zero-mean,
0.2 0.2 0.2
Gaussian i.i.d pixel values. We shall
modify this distribution slightly to
use the median image as the mean in-
0 0 0 stead. This allows us to take advan-
−5 0 5 −5 0 5 −5 0 5 tage of the good SR estimate which is
(b) provided by the median image, by de-
fining a prior which encourages the
s 14. (a) The Huber potential functions ρ( x ), plotted for three different values of α. (b) SR estimate to lie close to it. The as-
The corresponding prior distributions, (6), are a combination of a Gaussian (dashed sociated prior is
line) and a Laplacian distribution.
1  |f − f |2 
where Q is a symmetric, positive-definite matrix. In this Pr(f ) = exp  − med .

Z  2 σ 2
f 
case, (5) becomes
1
f map = argmax − f$Qf$ − 2 Mf − g .
2
Gaussian MRFs
f 2σ n (7) When the matrix Q in (6) is nondiagonal, we have a
multivariate Gaussian distribution over f, in which spatial
This case is of particular interest, since the MAP estimator correlations between adjacent pixels are captured by the
has, in principle, a linear solution: off-diagonal elements. The corresponding MRFs are
termed Gaussian MRFs or GMRFs. For the purpose of
f map = (M T M + Q) −1 M T g. our examples, we define a GMRF in which L is formed by
taking first-order finite difference approximations to the
Of course, in the context of image restoration (as in the image gradient over horizontal, vertical, and diagonal
ML case), it is computationally infeasible to perform the pair-cliques. For every location f x , y in the SR image, L
matrix inversion directly, but since both terms in (7) are computes the following finite differences in the four adja-
quadratic, the conjugate gradient ascent method [25] cent, unique pair-cliques:
may applied to obtain the solution iteratively.
The simplest matrix Q which satisfies the criterion is a d x = f x +1, y − f x , y d y = f x , y +1 − f x , y
multiple of the identity, giving 1 1
d xy = ( f x + 1 , y + 1 − f x , y ) d yx = ( f x + 1 , y −1 − f x , y ).
2 1 2
2 2 (10)
f map = argmax − γ 2 f − 2
Mf − g .
f 2σ n (8) Schultz and Stevenson [27] suggest a prior based on sec-
ond derivatives, in which the spatial activity measures are
A common variation on this scheme is when Q is derived defined over triplet-cliques.
from a linear operator L applied to the image f:
2 1 2
Huber MRFs
f map = argmax − γ 2 Lf − Mf − g
f 2σ 2n (9) A common criticism leveled at the GMRF priors is that
the associated MAP SR estimates tend to be overly
in which case Q is L T L. The matrix L is typically chosen to smooth and that sharp edges, which are what we are most
be a discrete approximation of a first or second derivative interested in recovering, are not preserved. This problem

84 IEEE SIGNAL PROCESSING MAGAZINE MAY 2003


Authorized licensed use limited to: Huazhong University of Science and Technology. Downloaded on October 30,2023 at 12:19:15 UTC from IEEE Xplore. Restrictions apply.
can be ameliorated by modeling the image gradients with a distribution that is heavier in the tails than a Gaussian. Such a distribution accepts the fact that there is a small, but nonetheless tangible, probability of intensity discontinuities occurring.

In a Huber MRF (HMRF), the Gibbs potentials are determined by the Huber function

\rho(x) = \begin{cases} x^2 & \text{if } |x| \le \alpha \\ 2\alpha|x| - \alpha^2 & \text{otherwise} \end{cases}   (11)

where x here is the first derivative of the image, as given in (10). Figure 14 shows the Huber potential function and the corresponding prior PDF plotted for several values of α. Note that the transition from the quadratic to the linear region maintains gradient continuity. HMRFs are an example of convex, but nonquadratic, priors.
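The HMRF machinery above is straightforward to prototype. The following is a minimal NumPy sketch (not from the article): `huber` implements the potential (11), `pair_clique_diffs` the finite differences (10), and `hmrf_energy` sums the potentials over all cliques. The `f[y, x]` indexing convention and all names here are illustrative assumptions.

```python
import numpy as np

def huber(x, alpha):
    """Huber potential (11): quadratic for |x| <= alpha, linear beyond."""
    ax = np.abs(x)
    return np.where(ax <= alpha, x ** 2, 2 * alpha * ax - alpha ** 2)

def pair_clique_diffs(f):
    """Finite differences (10) over the four unique pair-cliques.

    f is a 2-D array indexed as f[y, x]; the returned arrays shrink by one
    row/column at the image borders, where no clique exists.
    """
    dx = f[:, 1:] - f[:, :-1]                     # d_x  = f[x+1, y]   - f[x, y]
    dy = f[1:, :] - f[:-1, :]                     # d_y  = f[x, y+1]   - f[x, y]
    dxy = (f[1:, 1:] - f[:-1, :-1]) / np.sqrt(2)  # d_xy = f[x+1, y+1] - f[x, y]
    dyx = (f[:-1, 1:] - f[1:, :-1]) / np.sqrt(2)  # d_yx = f[x+1, y-1] - f[x, y]
    return dx, dy, dxy, dyx

def hmrf_energy(f, alpha):
    """Sum of Huber potentials over all pair-cliques (up to the Gibbs scaling)."""
    return sum(huber(d, alpha).sum() for d in pair_clique_diffs(f))
```

Replacing `huber` with `lambda d, a: d ** 2` recovers the quadratic GMRF energy, which makes explicit that the two priors differ only in how heavily large gradients are penalized.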

Examples
Figure 15 compares the solutions obtained under these three priors for the car example. The SR image is reconstructed at 3× pixel zoom, and in all cases the MAP solutions show more convincing detail than the ML reconstruction of Figure 13, especially around the door handles and wing mirror. The ||x||^2 and GMRF priors produce similar results, but note the sharp edges around the windows and headlights in the HMRF reconstruction. The level of detail in the reconstructions compared to the LR images is very apparent. Furthermore, the priors have eliminated the noise of the ML solution, without introducing artifacts of their own. An ML solution at this zoom factor would be completely dominated by noise.

[Figure 15. MAP results for the car example using various priors. Panels: low res (bicubic 3× zoom); simple ||x||^2 prior (λ = 0.006); GMRF (λ = 0.006); HMRF (λ = 0.009, α = 0.05).]

Figures 16 and 17 show two further examples of MAP reconstruction. In the first, which is constructed from 30 LR images in a similar situation to Figure 13, the text is clearly readable in the SR image but is not in the original images. The second example shows a MAP reconstruction for images obtained by the Mars rover. The details of the rock surface are considerably clearer in the SR image compared to the originals.

[Figure 16. MAP super resolution applied to 30 LR images using the HMRF prior. Panels: low res (bicubic 3× zoom); HMRF (λ = 0.01, α = 0.05).]
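The MAP reconstructions shown in these examples come from maximizing an objective of the form (9). Setting its gradient to zero gives the normal equations (M^T M + 2γσ_n^2 L^T L) f = M^T g, which conjugate gradient solves without ever forming a matrix inverse. The sketch below (not from the article) demonstrates this on a 1-D toy problem; the synthetic M, the signal sizes, and the parameter values are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes: 60 LR samples observing a 30-sample SR signal
# (several overlapping LR images supply more samples than unknowns).
n_hr, n_lr = 30, 60
M = rng.normal(size=(n_lr, n_hr)) / n_hr   # stand-in for the warp/blur/decimate matrix
g = rng.normal(size=n_lr)                  # stand-in for the stacked LR observations
gamma, sigma_n = 0.1, 0.1

# First-derivative operator L, so the prior term is gamma * ||L f||^2.
L = np.eye(n_hr, k=1)[:-1] - np.eye(n_hr)[:-1]

# Normal equations of (9): (M^T M + 2*gamma*sigma_n^2 * L^T L) f = M^T g.
A = M.T @ M + 2 * gamma * sigma_n ** 2 * (L.T @ L)
b = M.T @ g

def conjugate_gradient(A, b, iters=200, tol=1e-10):
    """Textbook conjugate gradient for a symmetric positive-definite A [25]."""
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rs = r @ r
    for _ in range(iters):
        Ap = A @ p
        step = rs / (p @ Ap)
        x += step * p
        r -= step * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

f_map = conjugate_gradient(A, b)
```

In a real SR problem, M is far too large to store densely; M and M^T are instead applied as image-domain operations (warp, blur, and decimate, and their transposes), while the conjugate gradient loop itself is unchanged.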

[Figure 17. MAP super resolution applied to 25 LR JPEG images using the GMRF prior. Panels: low res (bicubic 3× zoom); GMRF (λ = 0.01).]

Current Research Challenges
Current research on SR in the computer vision field falls into three categories. First, there is analysis of performance bounds: how far can an area of the image be zoomed before noise dominates signal? This was touched on in [7] and has been more thoroughly investigated recently by [3] and [23]. The extent to which an image region can be zoomed need not be homogeneous across the image; some regions, where there are more overlapping images and lower blur, may be zoomed more than others. The second area of current interest is the registration transformation. What is required here

is a point-to-point mapping between the images. This article has concentrated on a homography mapping that is applicable in certain circumstances. A simple extension is when the camera centers are not coincident and the viewed surface is a quadric (for example an ellipsoid or hyperboloid), where a transformation can be computed from nine or more corresponding points [10], [28], [36]. More generally, the mapping for noncoincident camera centers can be computed by a stereo reconstruction of the surface [30] or by using optic flow between the images [37]. The third area of current research is scene-specific priors and subspaces [8], [9]. The objective here is to use a prior tuned to particular types of scene, such as a face or text, rather than a general-purpose prior such as a GMRF. These priors need not be made explicit, and in one imaginative approach [3], [15] the mapping from LR to HR is learned from training examples of low- and high-resolution image pairs.

David Capel completed his Ph.D. on image mosaicing and super resolution as part of the Visual Geometry Group at Oxford University in 2001. Since then, he has worked as a vision scientist at 2d3 Ltd., U.K. He has published a number of papers in the field of image mosaicing and super resolution and has also authored a book on the subject.

Andrew Zisserman is professor of engineering science at the University of Oxford and heads the Visual Geometry Group. His research interests include geometry and recognition in computer vision. He has authored over 100 papers and is author/editor of eight books. He has twice been awarded the IEEE Marr Prize for Computer Vision.

References
[1] http://www.cognitech.com
[2] http://www.salientstills.com
[3] S. Baker and T. Kanade, "Limits on super-resolution and how to break them," IEEE Trans. Pattern Anal. Machine Intell., vol. 24, no. 9, pp. 1167-1183, 2002.
[4] B. Bascle, A. Blake, and A. Zisserman, "Motion deblurring and super-resolution from an image sequence," in Proc. Euro. Conf. Computer Vision, 1996, pp. 312-320.
[5] C.M. Bishop, Neural Networks for Pattern Recognition. London, U.K.: Oxford Univ. Press, 1995.
[6] D.P. Capel, "Image mosaicing and super-resolution," Ph.D. dissertation, Univ. of Oxford, 2001.
[7] D.P. Capel and A. Zisserman, "Automated mosaicing with super-resolution zoom," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, June 1998, pp. 885-891.
[8] D.P. Capel and A. Zisserman, "Super-resolution from multiple views using learnt image models," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2001.
[9] D.P. Capel, Image Mosaicing and Super-Resolution. New York: Springer-Verlag, 2003.
[10] G. Cross and A. Zisserman, "Quadric surface reconstruction from dual-space geometry," in Proc. Int. Conf. Computer Vision, Jan. 1998, pp. 25-31.
[11] F. Devernay and O.D. Faugeras, "Automatic calibration and removal of distortion from scenes of structured environments," in Proc. SPIE, vol. 2567, San Diego, CA, July 1995.
[12] H. Engl, M. Hanke, and A. Neubauer, Regularization of Inverse Problems. Dordrecht, The Netherlands: Kluwer Academic, 1996.
[13] M.A. Fischler and R.C. Bolles, "Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography," Comm. ACM, vol. 24, no. 6, pp. 381-395, 1981.
[14] A.W. Fitzgibbon, "Simultaneous linear estimation of multiple view geometry and lens distortion," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2001.
[15] W.T. Freeman, E.C. Pasztor, and O.T. Carmichael, "Learning low-level vision," Int. J. Comput. Vision, vol. 40, no. 1, pp. 25-47, Oct. 2000.
[16] C. Groetsch, The Theory of Tikhonov Regularization for Fredholm Equations of the First Kind. New York: Pitman, 1984.
[17] C.J. Harris and M. Stephens, "A combined corner and edge detector," in Proc. Alvey Vision Conf., 1988, pp. 147-151.
[18] R.I. Hartley, "Self-calibration of stationary cameras," Int. J. Comput. Vision, vol. 22, no. 1, pp. 5-23, Feb. 1997.
[19] R.I. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision. Cambridge, U.K.: Cambridge Univ. Press, 2000.
[20] M. Irani and P. Anandan, "About direct methods," in Vision Algorithms: Theory and Practice (Lecture Notes in Computer Science), W. Triggs, A. Zisserman, and R. Szeliski, Eds. New York: Springer-Verlag, 2000.
[21] M. Irani and S. Peleg, "Improving resolution by image registration," Graph. Models Image Process., vol. 53, pp. 231-239, 1991.
[22] M. Irani and S. Peleg, "Motion analysis for image enhancement: Resolution, occlusion, and transparency," J. Visual Commun. Image Representation, vol. 4, pp. 324-335, 1993.
[23] Z. Lin and H.Y. Shum, "On the fundamental limits of reconstruction-based super-resolution algorithms," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2001, pp. I:1171-1176.
[24] S. Mann and R.W. Picard, "Virtual bellows: Constructing high quality stills from video," in Proc. Int. Conf. Image Processing, 1994.
[25] W. Press, S. Teukolsky, W. Vetterling, and B. Flannery, Numerical Recipes in C, 2nd ed. Cambridge, U.K.: Cambridge Univ. Press, 1992.
[26] H.S. Sawhney, S. Hsu, and R. Kumar, "Robust video mosaicing through topology inference and local to global alignment," in Proc. Euro. Conf. Computer Vision, 1998, pp. 103-119.
[27] R.R. Schultz and R.L. Stevenson, "Extraction of high-resolution frames from video sequences," IEEE Trans. Image Processing, vol. 5, pp. 996-1011, June 1996.
[28] A. Shashua and S. Toelg, "The quadric reference surface: Theory and applications," Int. J. Comput. Vision, vol. 23, no. 2, pp. 185-198, 1997.
[29] C. Slama, Manual of Photogrammetry, 4th ed. Falls Church, VA: Amer. Soc. Photogrammetry, 1980.
[30] V.N. Smelyanskiy, P. Cheeseman, D. Maluf, and R. Morris, "Bayesian super-resolved surface reconstruction from images," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2000, pp. I:375-382.
[31] R. Szeliski, "Image mosaicing for tele-reality applications," Digital Equipment Corp., Cambridge, MA, Tech. Rep., 1994.
[32] A.N. Tikhonov and V.Y. Arsenin, Solutions of Ill-Posed Problems. Washington, DC: Winston/Wiley, 1977.
[33] P.H.S. Torr and A. Zisserman, "MLESAC: A new robust estimator with application to estimating image geometry," Comput. Vision Image Understanding, vol. 78, pp. 138-156, 2000.
[34] R. Tsai and T. Huang, "Multiframe image restoration and registration," Advances Comput. Vision Image Processing, vol. 1, pp. 317-339, 1984.
[35] H. Ur and D. Gross, "Improved resolution from subpixel shifted pictures," Graph. Models Image Process., vol. 54, no. 2, pp. 181-186, Mar. 1992.
[36] Y. Wexler and A. Shashua, "Q-warping: Direct computation of quadratic reference surfaces," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, vol. 1, 1991, pp. 333-338.
[37] W.Y. Zhao and S. Sawhney, "Is super-resolution with optical flow feasible?" in Proc. Euro. Conf. Computer Vision (Lecture Notes in Computer Science, vol. 2350). Berlin, Germany: Springer-Verlag, 2002, pp. 599-613.
[38] A. Zomet and S. Peleg, "Applying super-resolution to panoramic mosaics," in Proc. Workshop Applications of Computer Vision, 1998.
