CSPL 391 PDF
Raghuram Rangarajan
Ramji Venkataramanan
Siddharth Shah
Wavelet transforms enable us to represent signals with a high degree of sparsity. This is the principle behind a non-linear, wavelet-based signal estimation technique known as wavelet denoising. In this report we explore wavelet denoising of images using several thresholding techniques such as SUREShrink, VisuShrink and BayesShrink. Further, we use a Gaussian-based model to perform combined denoising and compression of natural images and compare the performance of these methods.
Contents
1 Background and Motivation
  1.1 Introduction
  1.2 The concept of denoising

2 Thresholding
  2.1 Motivation for Wavelet thresholding
  2.2 Hard and soft thresholding
  2.3 Threshold determination
  2.4 Comparison with Universal threshold

5 Conclusions
“If you painted a picture with a sky, clouds, trees, and flowers, you would use a different size brush depending on the size of the features. Wavelets are like those brushes.”

– Ingrid Daubechies

1 Background and Motivation

1.1 Introduction

From a historical point of view, wavelet analysis is a new method, though its mathematical underpinnings date back to the work of Joseph Fourier in the nineteenth century. Fourier laid the foundations with his theories of frequency analysis, which proved to be enormously important and influential. The attention of researchers gradually turned from frequency-based analysis to scale-based analysis when it started to become clear that an approach measuring average fluctuations at different scales might prove less sensitive to noise. The first recorded mention of what we now call a “wavelet” seems to be in 1909, in a thesis by Alfred Haar.

In the late nineteen-eighties, when Daubechies and Mallat first explored and popularized the ideas of wavelet transforms, skeptics described this new field as contributing additional useful tools to a growing toolbox of transforms. One particular wavelet technique, wavelet denoising, has been hailed as “offering all that we may desire of a technique, from optimality to generality” [6]. The inquiring skeptic, however, may be reluctant to accept these claims based on asymptotic theory without looking at real-world evidence. Fortunately, there is an increasing amount of literature now addressing these concerns, helping us appraise the utility of wavelet shrinkage more realistically.

Wavelet denoising attempts to remove the noise present in a signal while preserving the signal characteristics, regardless of its frequency content. It involves three steps: a linear forward wavelet transform, a nonlinear thresholding step, and a linear inverse wavelet transform. Wavelet denoising must not be confused with smoothing; smoothing only removes the high frequencies and retains the lower ones. Wavelet shrinkage is a non-linear process, and this is what distinguishes it from linear denoising techniques such as least squares. As will be explained later, wavelet shrinkage depends heavily on the choice of a thresholding parameter, and this choice determines, to a great extent, the efficacy of denoising. Researchers have developed various techniques for choosing denoising parameters, and so far there is no “best” universal threshold determination technique.

The aim of this project was to study various thresholding techniques such as SUREShrink [1], VisuShrink [3] and BayesShrink [5] and determine the best one for image denoising. In the course of the project, we also aimed to use wavelet denoising as a means of compression, and were able to implement a compression technique based on a unified denoising and compression principle.

1.2 The concept of denoising

A more precise explanation of the wavelet denoising procedure can be given as follows. Assume that the observed data is

X(t) = S(t) + N(t)

where S(t) is the uncorrupted signal with additive noise N(t). Let W(·) and W⁻¹(·) denote the forward and inverse wavelet transform operators. Let D(·, λ) denote the denoising operator with threshold λ. We intend to denoise X(t) to recover Ŝ(t) as an estimate of S(t). The procedure can be summarized in three steps:

Y = W(X)
Z = D(Y, λ)
Ŝ = W⁻¹(Z)

with D(·, λ) being the thresholding operator and λ the threshold.
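The three steps above can be sketched directly in code. The fragment below is a minimal illustration assuming a single-level Haar transform for W and soft thresholding for D(·, λ); any orthogonal wavelet and any threshold rule could be substituted.

```python
import numpy as np

def haar_dwt(x):
    """Y = W(X): one-level Haar transform, returning (approximation, detail)."""
    x = np.asarray(x, dtype=float)
    a = (x[0::2] + x[1::2]) / np.sqrt(2.0)  # low-pass: scaled pairwise sums
    d = (x[0::2] - x[1::2]) / np.sqrt(2.0)  # high-pass: scaled pairwise differences
    return a, d

def haar_idwt(a, d):
    """W^{-1}(Z): exact inverse of haar_dwt."""
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2.0)
    x[1::2] = (a - d) / np.sqrt(2.0)
    return x

def denoise(x, lam):
    """S_hat = W^{-1}(D(W(X), lam)), soft-thresholding the detail coefficients."""
    a, d = haar_dwt(x)
    d = np.sign(d) * np.maximum(np.abs(d) - lam, 0.0)  # Z = D(Y, lam)
    return haar_idwt(a, d)
```

With λ = 0 the pipeline reduces to perfect reconstruction, which is a convenient sanity check.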
[Figure 1: a noisy signal in the time domain. Figure 2: the same signal in the wavelet domain; note the sparsity of coefficients. Figure 3: hard thresholding. Figure 4: soft thresholding.]
2 Thresholding

2.1 Motivation for Wavelet thresholding

The plot of wavelet coefficients in Fig 2 suggests that small coefficients are dominated by noise, while coefficients with a large absolute value carry more signal information than noise. Replacing the noisy coefficients (small coefficients below a certain threshold value) by zero and taking the inverse wavelet transform may lead to a reconstruction with less noise. Stated more precisely, this thresholding idea is motivated by the following assumptions:

• The decorrelating property of a wavelet transform creates a sparse signal: most untouched coefficients are zero or close to zero.

• Noise is spread out equally along all coefficients.

• The noise level is not too high, so that we can distinguish the signal wavelet coefficients from the noisy ones.

2.2 Hard and soft thresholding

Hard and soft thresholding with threshold λ are defined as follows. The hard thresholding operator is defined as

D(U, λ) = U   for all |U| > λ
        = 0   otherwise.

The soft thresholding operator, on the other hand, is defined as

D(U, λ) = sgn(U) · max(0, |U| − λ).

Hard thresholding is a “keep or kill” procedure and is more intuitively appealing. Its transfer function is shown in Fig 3. The alternative, soft thresholding (whose transfer function is shown in Fig 4), shrinks coefficients above the threshold in absolute value. While at first sight hard thresholding may seem more natural, the continuity of soft thresholding has some advantages. It makes algorithms mathematically more tractable [3]. Moreover, hard thresholding does not even work with some algorithms such as the GCV procedure [4]. Sometimes, pure noise coefficients may pass the hard threshold and appear as annoying ‘blips’ in the output. Soft thresholding shrinks these false structures.
[Figures: MSE vs. threshold curves for hard and soft thresholding on the standard 1-D test signals ‘Blocks’, ‘Bumps’, ‘Heavisine’ and ‘Doppler’.]

…mean squared error between the signal and its estimate is minimized.

The wavelet decomposition of an image is done as follows. In the first level of decomposition, the image is split into four subbands, namely the HH, HL, LH and LL subbands. The HH subband gives the diagonal details of the image; the HL subband gives the horizontal features, while the LH subband represents the vertical structures. The LL subband is the low resolution residual consisting of low frequency components, and it is this subband which is further split at higher levels of decomposition.

The different methods for denoising we investigate differ only in the selection of the threshold; the basic procedure remains the same:

• Calculate the DWT of the image.

• Threshold the wavelet coefficients (the threshold may be universal or subband adaptive).

• Compute the IDWT to get the denoised estimate.

Soft thresholding is used for all the algorithms for the following reasons: soft thresholding has been shown to achieve the near-minimax rate over a large number of Besov spaces [3], and it is also found to yield visually more pleasing images, whereas hard thresholding is found to introduce artifacts in the recovered images.

We now study three thresholding techniques, VisuShrink, SureShrink and BayesShrink, and investigate their performance for denoising various standard images.

3.2 VisuShrink

VisuShrink is thresholding by applying the Universal threshold proposed by Donoho and Johnstone [2]. This threshold is given by σ√(2 log M), where σ is the noise standard deviation and M is the number of pixels in the image. It is proved in [2] that the maximum of any M values iid as N(0, σ²) will be smaller than the universal threshold.
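The universal threshold above is a one-liner in code; a sketch, with σ and M assumed given:

```python
import numpy as np

def universal_threshold(sigma, M):
    """VisuShrink / universal threshold: sigma * sqrt(2 log M), M = number of pixels."""
    return sigma * np.sqrt(2.0 * np.log(M))
```

Since the threshold grows with the number of pixels M, VisuShrink tends to remove more coefficients, and hence over-smooth, as the image gets larger.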
[Figure: noisy version of ‘Lena’.]

3.3 SureShrink

3.3.1 What is SURE?

Let µ = (µ_i : i = 1, …, d) be a length-d vector, and let x = {x_i} (with x_i distributed as N(µ_i, 1)) be multivariate normal observations with mean vector µ. Let µ̂ = µ̂(x) be a fixed estimate of µ based on the observations x. SURE (Stein’s Unbiased Risk Estimator) is a method for estimating the loss ‖µ̂ − µ‖² in an unbiased fashion.

In our case µ̂ is the soft threshold estimator µ̂_i^(t)(x) = η_t(x_i). We apply Stein’s result [1] to get an unbiased estimate of the risk E‖µ̂^(t)(x) − µ‖²:

SURE(t; x) = d − 2 · #{i : |x_i| ≤ t} + Σ_{i=1}^{d} min(|x_i|, t)².   (2)

For an observed vector x (in our problem, x is the set of noisy wavelet coefficients in a subband), we want to find the threshold t^S that minimizes SURE(t; x), i.e.,

t^S = argmin_t SURE(t; x).   (3)

The estimator used is

µ̂(x)_i = η_{t_d^F}(x_i)   if s_d² ≤ γ_d
         η_{t^S}(x_i)     if s_d² > γ_d,   (4)

where

s_d² = (Σ_i (x_i² − 1)) / d,   γ_d = (log₂ d)^{3/2} / √d,   (5)

η being the thresholding operator.

…and Γ(t) = ∫₀^∞ e^{−u} u^{t−1} du. The parameter σ_X is the standard deviation and β is the shape parameter. It has been observed [5] that with a shape parameter β ranging from 0.5 to 1, we can describe the distribution of coefficients in a subband for a large set of natural images. Assuming such a distribution for the wavelet coefficients, we empirically estimate β and σ_X for each subband and try to find the threshold T which minimizes the Bayesian risk, i.e., the expected value of the mean square error.
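Minimizing SURE(t; x) in (2) only requires a search over the coefficient magnitudes, since between consecutive values of |x_i| the risk estimate is an increasing function of t²; the minimum therefore lies at 0 or at one of the |x_i|. A direct sketch, assuming the coefficients have been normalized to unit noise variance as in the setup above:

```python
import numpy as np

def sure(t, x):
    """Stein's unbiased risk estimate (eq. (2)) for soft thresholding at t."""
    ax = np.abs(x)
    return x.size - 2.0 * np.sum(ax <= t) + np.sum(np.minimum(ax, t) ** 2)

def sure_threshold(x):
    """t^S = argmin_t SURE(t; x), searched over the candidate set {0} U {|x_i|}."""
    candidates = np.concatenate(([0.0], np.abs(x)))
    risks = [sure(t, x) for t in candidates]
    return candidates[int(np.argmin(risks))]
```

For a subband with one large coefficient among small ones, the minimizer falls just above the small magnitudes, keeping the large coefficient and killing the rest.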
σ_Y² = σ_X² + σ²   (11)

where σ_Y² is the variance of Y. Since Y is modelled as zero-mean, σ_Y² can be found empirically by

σ̂_Y² = (1/n²) Σ_{i,j=1}^{n} Y_ij²   (12)

where n × n is the size of the subband under consideration. Thus

T̂_B(σ̂_X) = σ̂² / σ̂_X   (13)

where

σ̂_X = √( max(σ̂_Y² − σ̂², 0) ).   (14)

In the case that σ̂² ≥ σ̂_Y², σ̂_X is taken to be zero; i.e., T̂_B(σ̂_X) is ∞, or, in practice, T̂_B(σ̂_X) = max(|Y_ij|), and all coefficients are set to zero.

4.1 Introduction

The philosophy of compression is that a signal typically has structural redundancies that can be exploited to yield a concise representation. White noise, however, has no correlation and is not easily compressible. Hence, a good compression method can provide a suitable means of distinguishing between signal and noise. So far, we have investigated wavelet thresholding techniques such as SureShrink and BayesShrink for denoising. We now use MMSE estimation based on a Gaussian prior and show that significant denoising can be achieved using this method. We then perform compression of the denoised coefficients based on their distribution, and find that this can be done without introducing significant quantization error. Thus, we achieve simultaneous denoising and compression.
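Equations (11)–(14) combine into a few lines per subband. A minimal sketch, assuming the subband Y and a noise estimate σ̂ are already available:

```python
import numpy as np

def bayes_threshold(Y, sigma):
    """BayesShrink subband threshold T_B = sigma^2 / sigma_X, per eqs. (11)-(14)."""
    Y = np.asarray(Y, dtype=float)
    var_Y = np.mean(Y ** 2)                          # eq. (12): Y modelled as zero-mean
    sigma_X = np.sqrt(max(var_Y - sigma ** 2, 0.0))  # eq. (14)
    if sigma_X == 0.0:
        # Noise dominates the subband: eq. (13) would be infinite, so in
        # practice return max |Y_ij|, which sets every coefficient to zero.
        return np.max(np.abs(Y))
    return sigma ** 2 / sigma_X                      # eq. (13)
```

Applying this per subband (with soft thresholding) gives the subband-adaptive behaviour that the universal threshold lacks.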
σ̂_Yi² = σ̂_Xi² + σ̂²
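The MMSE estimation step that this equation feeds into is a per-subband scaling; a minimal sketch, assuming the subband Y and the noise level σ̂ are given:

```python
import numpy as np

def gaussian_mmse_shrink(Y, sigma):
    """Scale every coefficient by sigma_X^2 / sigma_Y^2 (< 1): the MMSE estimate
    of X from Y = X + N under a zero-mean Gaussian prior on X."""
    Y = np.asarray(Y, dtype=float)
    var_Y = np.mean(Y ** 2)
    var_X = max(var_Y - sigma ** 2, 0.0)   # sigma_Y^2 = sigma_X^2 + sigma^2
    gain = var_X / var_Y if var_Y > 0.0 else 0.0
    return gain * Y
```

Unlike soft thresholding, every coefficient is attenuated by the same factor within a subband; no coefficient is set exactly to zero unless the whole subband is judged to be noise.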
We observe the similarity of this step to wavelet shrinkage, since each coefficient Y_ji is brought closer to zero in absolute value by multiplying with σ̂_Xi²/σ̂_Yi² (< 1). This effect is similar to that of wavelet shrinkage in soft thresholding.

Steps 2 through 4 are repeated for each detail subband i. Note that the coefficients in the low resolution LL subband are kept unaltered.

The results obtained using this method for the ‘Elaine’ image with a Db4 wavelet with 4 levels are shown in the first three parts of Figure 8. The MSE comparison plot in Figure 9 shows that denoising by Gaussian estimation performs slightly better than SureShrink for the ‘Clock’ image. The slightly inferior performance relative to BayesShrink is to be expected, since a GGD prior is a more exact representation of the wavelet coefficients in a subband than the Gaussian prior.

4.3 Compression

We now introduce a quantization scheme for a concise representation of the denoised coefficients {X̂_ji}. From (16), the {X̂_ji} are iid with distribution N(0, σ̂_Xi⁴/σ̂_Yi²). The number of bits used to encode each coefficient X̂_ji is determined as follows. For simplicity of notation, we denote X̂_ji as A_j, keeping in mind that A_j is part of subband i.

1. We first fix the maximum allowable distortion, say D, for each coefficient.

2. The variance of each coefficient A_j is found empirically by calculating the variance of a 3 × 3 block of coefficients centered at A_j.

It is assumed that we have available a finite set of optimal Lloyd-Max quantizers for the N(0, 1) distribution. In our experiments, we took 5 quantizers with number of quantization levels M = 2, 4, 8, 16 and 32.

3. Each coefficient A_j is encoded using the quantizer with the least M so that (A_j − Â_j)² ≤ D. Note that both D and the quantizer levels, defined for N(0, 1), have to be scaled by σ_Aj for each coefficient A_j.

4. Steps 2 and 3 are repeated for all the coefficients A_j in a subband and for all the detail subbands.

5. The coefficients in the low resolution subband are quantized assuming a uniform distribution [5]. This is motivated by the fact that the LL coefficients are essentially local averages of the image and are not characterized by a Gaussian distribution.

4.4 Results

Figure 8 shows the results obtained when this denoising and compression scheme is applied to the image ‘Elaine’ with σ = 30. We used the Db4 discrete wavelet series with 4 levels of decomposition. We see that the denoised version has a much lower MSE (143.7 vs. σ² = 900) and better visual quality too. The compressed version looks very similar to the denoised image, with an additional MSE of around 20. It has been encoded using 1.52 bpp (distortion value D set at 0.1). The rate can be controlled by changing the distortion level D: if we fix a large distortion level D, we get a low encoding rate, but there is a price to pay in larger quantization error. We choose to operate at a particular point on the ‘Rate v Distortion’ curve based on the distortion we are prepared to tolerate.

The performance of the different denoising schemes is compared in Figure 9. A 200 × 200 image ‘Clock’ is considered and the MSEs for different values of σ are compared. Clearly, VisuShrink is the least effective among the methods compared. This is due to the fact that it is based on a Universal threshold and is not subband adaptive, unlike the other schemes. Among these, BayesShrink clearly performs the best. This is expected, since the GGD models the distribution of coefficients in a subband well. MMSE estimation based on a Gaussian distribution performs slightly worse than BayesShrink. We also see that a quantization error (approximately constant) is introduced due to compression. Among the subband
adaptive schemes, SureShrink has the highest MSE. But it should be noted that SureShrink has the desirable property of adapting to the discontinuities in the signal. This is more evident in 1-D signals such as ‘Blocks’ than in images.

[Figure 8 (continued): (b) noisy version of ‘Elaine’; (c) denoised version of ‘Elaine’ (estimation with wavelet Db4, 4 levels); quantized version of the denoised ‘Elaine’. Figure 9: comparison of MSE of the various denoising schemes.]

5 Conclusions

We have seen that wavelet thresholding is an effective method of denoising noisy signals. We first tested hard and soft thresholding on noisy versions of the standard 1-D signals and found the best threshold. We then investigated several soft thresholding schemes, viz. VisuShrink, SureShrink and BayesShrink, for denoising images. We found that subband adaptive thresholding performs better than universal thresholding. Among these, BayesShrink gave the best results. This validates the assumption that the GGD is a very good model for the wavelet coefficient distribution in a subband. By weakening the GGD assumption and taking the coefficients to be Gaussian distributed, we obtained a simple model that facilitated both denoising and compression.
An important point to note is that although
References
[1] David L. Donoho and Iain M. Johnstone. Adapting to unknown smoothness via wavelet shrinkage. Journal of the American Statistical Association, 90(432):1200–1224, Dec 1995.