Image Fusion Using Empirical Mode Decomposition
INTRODUCTION
Empirical Mode Decomposition (EMD) was first introduced by Huang et al. [1] and
provides a powerful tool for adaptive multi-scale analysis of non-stationary signals. It is a
non-parametric, data-driven analysis tool that decomposes non-linear, non-stationary signals
into Intrinsic Mode Functions (IMFs). The final representation of the signal is an energy-
frequency distribution, designated as the Huang spectrum [1], that gives sharp identification of
salient information. With the Hilbert transform, the IMFs allow representation of
instantaneous frequencies as functions of time. The main conceptual benefits are the
decomposition of the parent signal into IMFs and the visualization of time-frequency
characteristics.
Joint space-spatial frequency representations have received special attention in the fields
of image processing, vision, and pattern recognition. The EMD method was originally
proposed for the study of ocean waves and has found potential applications in
geophysical exploration, underwater acoustics, noise-removal filtering,
biomedicine, audio source separation [2], de-noising [3], and the analysis of Doppler
ultrasound signals. The major advantage of EMD is that the basis functions
are derived directly from the signal itself. Compared with Fourier
analysis, EMD is adaptive, whereas the basis functions of Fourier
analysis are linear combinations of fixed sinusoids.
The combination of EMD with Hilbert Spectral Analysis (HSA), designated as the
Hilbert-Huang Transform (HHT) in five patents [1] by the National Aeronautics and Space
Administration (NASA), has provided an alternative paradigm in time-frequency analysis.
Though the Hilbert transform is well known and has been widely used in the signal
processing field since the 1940s, it has many drawbacks when applied to instantaneous
frequency computation. The most serious one is that the derived instantaneous frequency of
a signal can lose its physical meaning when the signal is not a mono-component or
AM/FM-separable oscillatory function [4]. The EMD was developed, at its very beginning,
to overcome this drawback so that the data can be examined in a physically meaningful
time-frequency-amplitude space. It is widely accepted that EMD with its recent
improvements has become a powerful tool in both signal processing and scientific data
analysis.
In addition, Wu and Huang [1] argued using the central limit theorem that each
IMF of Gaussian noise is approximately Gaussian distributed, and therefore the energy of
each IMF must follow a χ² distribution. From the characteristics they obtained, Wu and Huang
further derived the expected energy distribution of the IMFs of white noise. By determining the
number of degrees of freedom of the χ² distribution for each IMF of noise, they derived the
analytic form of the spread function of the IMF energy. From these results, one is able to
discriminate an IMF of data containing signal from an IMF of pure white noise at any
statistical significance level. They verified their analytic results against Monte Carlo tests
and found them consistent.
The main difference between the wavelet and EMD fusion approaches lies in the
decomposition. The EMD method is widely used in one-dimensional signal processing as
well as in two-dimensional image processing. Wavelet decomposition relies on a predefined
wavelet basis, while EMD is a non-parametric, data-driven process that does not require
the basis to be predetermined before decomposition. It is commonly understood that the
Fourier transform is a useful method for stationary signal analysis, while the DWT is more
suitable for non-stationary signal analysis. In fact, the DWT is a windowed Fourier
transform, and the finite length of the DWT basis may cause energy leakage. Once the
wavelet basis and decomposition level are determined, the signal obtained lies within a
certain frequency range that depends only on the sampling rate and has no relationship to
the signal itself. Therefore, this method is not adaptive. Compared with the DWT, the EMD
shows superior performance in data analysis and data filtering. It is a powerful tool for
adaptive multi-scale analysis of non-linear and non-stationary signals. These interesting
characteristics of the EMD motivated the extension of this method to the area of image
processing [7].
The main objective of this work is to study Empirical Mode Decomposition (EMD) and to
perform image fusion using EMD:
1. To study the EMD process, carry out image decomposition using Bidimensional
Empirical Mode Decomposition (BEMD) and Vectorized Empirical Mode
Decomposition (VEMD), and compare the results.
2. To fuse the decomposed images from BEMD and VEMD using four different
methods.
3. To compare the results and analyse the performance of each fusion method.
The second chapter explains the properties of Intrinsic Mode Functions (IMFs) and the
process of one-dimensional EMD.
The third chapter explains Bidimensional Empirical Mode Decomposition (BEMD) and its
techniques in brief, and presents BEMD by the finite element method.
The fourth chapter deals with Vectorized Empirical Mode Decomposition (VEMD) and gives
the process of VEMD with a flow chart.
The fifth chapter deals with image fusion and explains four different image fusion
methods in detail.
The sixth chapter presents the comparative results and discussions with the help of
subjective and objective evaluation.
The seventh chapter presents the conclusions and the assumptions made in carrying out
the project.
CHAPTER 2
Empirical Mode Decomposition (EMD) is a powerful tool for adaptive multi-scale
analysis of non-stationary signals [1]. It is a non-parametric, data-driven analysis tool that
decomposes non-linear, non-stationary signals into Intrinsic Mode Functions (IMFs). A
fundamental problem in data analysis is to obtain an adaptive, application-oriented
representation for a given data set. EMD is an efficient method for such adaptive
representations. Indeed, the original purpose of EMD is to decompose a signal into
components, each of which has a meaningful instantaneous frequency, with different
components corresponding to different frequency scales. EMD decomposes a signal into a
finite sum of intrinsic mode functions (IMFs) based on the direct extraction of the energy
associated with various intrinsic time scales. Many examples of using EMD show that the
IMFs obtained provide physical insights which are crucial in engineering applications. Due
to the fully adaptive nature of the method, it is particularly suitable for processing
non-linear and non-stationary signals.
EMD for 1D signal analysis is described in this section.
Contrary to many former decomposition methods, EMD is intuitive and direct,
with the basis functions based on and derived from the data. The assumptions for this method
are [1][4]:
1. the signal has at least one pair of extrema;
2. the characteristic time scale is defined by the time between successive extrema;
3. if there are no extrema but only inflection points, then the signal can be
differentiated to realize the extrema, whose IMFs can be extracted; integration may
be employed for reconstruction.
The time between successive extrema was used by Huang et al. [1] as it allowed the
decomposition of signals that were all positive, all negative, or both. This implied that the
data did not need to have a zero mean, as in the case of image data. It also allowed a finer
resolution of the oscillatory modes.
A function is an intrinsic mode function if the number of extrema equals the number
of zero-crossings and if it has a zero local mean. EMD decomposes a signal into its IMFs
based on local frequency or oscillation information. The first IMF contains the
highest local frequencies of oscillation, the final IMF contains the lowest local frequencies of
oscillation, and the residue contains the trend of the data. The IMFs obtained from EMD are
expected to have the following properties [8]:
i. in the whole data set, the number of extrema and the number of zero crossings must
be equal or differ by at most one;
ii. there should be only one mode of oscillation between two successive zero
crossings;
iii. at any point, the mean value of the envelope defined by the local maxima and the
envelope defined by the local minima is zero;
iv. the IMFs are locally orthogonal.
In fact, property (i) ensures property (ii), and vice versa.
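As a quick numerical illustration, property (i) can be checked by counting extrema and zero crossings of a sampled signal. The helper names below are illustrative only, not part of any EMD library:

```python
import numpy as np

def count_extrema(x):
    # interior samples strictly greater (less) than both neighbours
    maxima = np.sum((x[1:-1] > x[:-2]) & (x[1:-1] > x[2:]))
    minima = np.sum((x[1:-1] < x[:-2]) & (x[1:-1] < x[2:]))
    return int(maxima + minima)

def count_zero_crossings(x):
    s = np.sign(x)
    s[s == 0] = 1            # treat exact zeros as positive to avoid double counting
    return int(np.sum(s[:-1] != s[1:]))

def looks_like_imf(x):
    # property (i): extrema and zero crossings equal or differing by at most one
    return abs(count_extrema(x) - count_zero_crossings(x)) <= 1

t = np.linspace(0, 1, 200)
x = np.sin(2 * np.pi * 5 * t)   # a pure tone satisfies the counting property
print(looks_like_imf(x))
```

A signal failing this check (for example, a sum of two well-separated tones) is not a single mode of oscillation and would be split by the sifting process described next.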
As per the definition of an IMF, the decomposition method can simply employ the
envelopes defined by the local maxima and minima individually. The extrema are identified
and all local maxima are connected by a cubic spline to form the upper envelope. This
process is repeated for the local minima to construct the lower envelope. While
interpolating, care is taken that the upper and lower envelopes cover all the data between
them. The point-wise mean of the envelopes is called m1, and is subtracted from the data r0
to give the first component h1. For the first iteration, X(t) is used as the data (shown in
Figure 2.1):
r0 = X(t)
h1 = r0 - m1
By the mathematical definition, h1 should be considered an IMF, since it seems to
satisfy all the requirements of an IMF. But since the extrema are interpolated with
numerical schemes, overshoots and undershoots are bound to occur. These generate new
maxima and minima, and distort the magnitude and phase of the existing extrema. These
effects do not affect the process directly, as it is the mean of the envelopes that passes on to
the next stage of the algorithm and not the envelopes themselves. The formation of false
extrema cannot be avoided easily, and an interesting offshoot is that this procedure inherently
recovers the proper modes lost in the initial examination and recovers low-amplitude riding
waves on repeated sifting. The envelope mean may differ from the true local mean, and
consequently some asymmetric waveforms may occur, but they can be ignored as their effects
on the final reconstruction are minimal. Apart from a few theoretical difficulties, in
practice a ringing effect at the ends of the data array can occur. But even with these effects,
the sifting process still extracts the essential scales from the data set. The sifting process
eliminates riding waves and makes the signal symmetrical. In the second sifting iteration, h1
is treated as the data, where m11 is the mean of the envelopes of h1:
h11 = h1 - m11
The sifting is repeated k times until the result qualifies as an IMF:
h1k = h1(k-1) - m1k ,  c1 = h1k        (2.2)
In sifting, the finest oscillatory modes are separated from the data, analogous to
separating fine particles through a set of fine-to-coarse sieves. As can be expected of such a
process, uneven amplitudes will be smoothed. But if performed too long, the sifting
process becomes invasive and destroys the physical meaning of the amplitude fluctuations:
on sifting too long, we get IMFs that are frequency-modulated signals with constant
amplitude. To retain the physical meaning of an IMF, in terms of amplitude and frequency
modulation, a standard-deviation-based stopping criterion is used. The standard deviation,
SD, computed from two consecutive sifting results, is used as one of the stopping criteria:
SD = Σt [ hi(k-1)(t) - hi(k)(t) ]² / h²i(k-1)(t)        (2.3)
Sifting is stopped if SD falls below a threshold (generally between 0.2 and 0.3).
The isolated intrinsic mode function c1 contains the finest scale of the signal, and c1 is
separated from the data:
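A minimal sketch of the SD computation of equation (2.3), assuming `h_prev` and `h_curr` hold two consecutive sifting results (the small `eps` guards against division by zero and is an implementation detail, not part of the original formula):

```python
import numpy as np

def sifting_sd(h_prev, h_curr, eps=1e-12):
    # Huang et al.'s criterion: sum over time of the normalized squared
    # difference between consecutive sifting results
    return float(np.sum((h_prev - h_curr) ** 2 / (h_prev ** 2 + eps)))

h_prev = np.array([1.0, -2.0, 1.5, -1.0])
h_curr = h_prev * 0.99            # a small change between iterations
print(sifting_sd(h_prev, h_curr))  # well below the 0.2-0.3 threshold
```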
r1 = r0 - c1 (2.4a)
The new signal called the residue r1 still holds lower frequency information. In the next
iteration, the residue r1 is treated as the new data in place of r0 and subjected to the sifting
process. This procedure is repeated on all the subsequent residues ( rj ’s), to realize a set of
IMFs.
r2 = r1 - c2 , ... , rn = rn-1 - cn        (2.4b)
The sifting through residues can be stopped by either of the following criteria: the
residue becomes too small to be of any practical importance, or the residue becomes a
monotonic function containing no more IMFs. The residue is not always expected to have
zero mean; even for data with zero mean, the final residue can still be different from zero.
The final residue is the trend of the data. If the residue has more than three extrema, it must
go through the next sifting process; on the other hand, if it has fewer than three extrema, it is
generally considered to hold the lowest information, so it is taken as the final residue and no
further sifting is required. Reconstruction of the signal is performed using the relation
X̂(t) = Σ(i=1..n) ci + rn        (2.5)
Thus, the data is decomposed into n empirical modes and a residue rn, which can be either
the mean trend or a DC shift. The flow process of EMD is described in Figure 2.1.
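The whole sifting loop described above can be sketched compactly. The following is a simplified, illustrative implementation only: it uses linear interpolation (`np.interp`) in place of the cubic splines described above, ignores end effects, and caps the number of sifting iterations, so it is a sketch of the idea rather than a faithful EMD implementation:

```python
import numpy as np

def _envelope_mean(x):
    # locate interior extrema
    i = np.arange(1, len(x) - 1)
    maxi = i[(x[1:-1] > x[:-2]) & (x[1:-1] > x[2:])]
    mini = i[(x[1:-1] < x[:-2]) & (x[1:-1] < x[2:])]
    if len(maxi) < 2 or len(mini) < 2:
        return None                       # too few extrema to form envelopes
    t = np.arange(len(x))
    upper = np.interp(t, maxi, x[maxi])   # linear stand-in for the cubic spline
    lower = np.interp(t, mini, x[mini])
    return (upper + lower) / 2.0

def sift(x, sd_thresh=0.3, max_iter=50):
    # repeat h <- h - mean_envelope(h) until the SD criterion (2.3) is met
    h = x.copy()
    for _ in range(max_iter):
        m = _envelope_mean(h)
        if m is None:
            return None
        h_new = h - m
        sd = np.sum((h - h_new) ** 2 / (h ** 2 + 1e-12))
        h = h_new
        if sd < sd_thresh:
            break
    return h

def emd(x, max_imfs=10):
    # peel off IMFs until the residue has too few extrema (the trend)
    imfs, r = [], x.copy()
    for _ in range(max_imfs):
        c = sift(r)
        if c is None:
            break
        imfs.append(c)
        r = r - c
    return imfs, r

t = np.linspace(0, 1, 512)
x = np.sin(2 * np.pi * 30 * t) + 0.5 * np.sin(2 * np.pi * 4 * t)
imfs, residue = emd(x)
x_hat = np.sum(imfs, axis=0) + residue   # reconstruction per equation (2.5)
```

Because each residue is obtained by direct subtraction, summing the IMFs and the final residue reproduces the original signal up to floating-point error, as equation (2.5) states.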
[Flow diagram of EMD: initialize ri,k (i = 0, k = 0) as X(t); detect the extrema of rik; form the envelopes and their mean mik; compute hik = hi(k-1) - mik; if the SD criterion is satisfied, set ci = hik and ri = ri-1 - ci; when the residue has fewer than three extrema, stop and take the trend rn = ri, otherwise repeat.]
Figure 2.1 Flow diagram of EMD
[Figure 2.2: IMFs of a 200-sample random signal; seven panels plotting amplitude against the number of samples: six IMFs of decreasing frequency, followed by the residue.]
Figure 2.2 shows the Intrinsic Mode Functions (IMFs) of a random signal generated for 200
samples. The last panel shows the residue of the signal; it is the lowest level of
decomposition and cannot be decomposed further.
The 1D-EMD is very powerful for various signal processing applications because it
decomposes a signal into various spectral components called intrinsic mode functions (IMFs).
IMFs are adaptive and dependent on the signal itself, and are a useful tool for analyzing
nonlinear and non-stationary signals. The 1D-EMD has been used for audio source
separation, denoising and analysis of Doppler ultrasound signals. Its practical value led
researchers to investigate the extension of 1D-EMD to complex-valued signals [9], with the
overarching goal of connecting 1D-EMD to traditional filter bank theory [6].
CHAPTER 3
EMD has many interesting features; an important one is that it is a fully adaptive
multi-scale decomposition. This is because EMD operates on the local extremum sequence,
and the decomposition is carried out by direct extraction of the local energy associated with
the intrinsic time-scales of the signal itself. This differs from wavelet-based multi-scale
analysis, which characterizes the scale of a signal event using pre-specified basis functions.
Owing to this feature, EMD is highly promising for other problems of a multi-scale nature.
EMD offers various advantages and can be useful for two-dimensional data analysis.
Although this is quite interesting, several challenging difficulties need to be overcome. One
of them is computational efficiency: for a medium-sized two-dimensional data set, e.g. a
512×512 image, the number of local extrema can be in the tens of thousands. Owing to the
iterative nature of the EMD method, the decomposition of such a data set is rather
time-consuming and could be unacceptable for many applications. Various attempts have
been made to develop a fast bidimensional EMD useful for such applications. The 1D-EMD
applied to signals has been extended to 2D images, and 2D-EMD has been developed
recently by a few researchers based on cubic polynomial interpolation [10] and a partial
differential equation [11]. 2D-EMD has been applied in applications such as image
compression and image fusion [12].
An image is a bidimensional IMF if it has a zero mean, if the maxima are positive
and the minima are negative, and if the number of maxima equals the number of minima.
The bidimensional empirical mode decomposition (BEMD) method is a relatively new but
promising image processing algorithm. BEMD decomposes an image into multiple
hierarchical components known as bidimensional intrinsic mode functions (BIMFs) and a
bidimensional residue, based on the local spatial variations or scales of the image. In each
iteration of the process, two-dimensional (2D) scattered-data surface interpolation is applied
to a set of arbitrarily distributed local maxima (minima) points to form the upper (lower)
envelope.
BEMD is a promising image processing algorithm that can be applied to various real-world
problems, e.g., medical image analysis, pattern analysis, and texture analysis. Both EMD and
BEMD require finding local maxima and minima points and subsequently interpolating
those points at each iteration of the process. One-dimensional (1D) extrema points are
obtained using either a sliding window or a local derivative; 2D extrema points are obtained
using a sliding window or various morphological operations [13].
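The sliding-window detection of 2D extrema mentioned above can be sketched with a 3×3 maximum/minimum filter, assuming SciPy is available. Note that a full BEMD implementation must also handle ties on flat regions, which this sketch does not:

```python
import numpy as np
from scipy.ndimage import maximum_filter, minimum_filter

def local_extrema_2d(img, size=3):
    # a pixel is a local maximum (minimum) if it equals the max (min)
    # of its size-by-size sliding window; flat regions are also flagged
    maxima = img == maximum_filter(img, size=size)
    minima = img == minimum_filter(img, size=size)
    return maxima, minima

img = np.zeros((7, 7))
img[2, 2] = 1.0     # an isolated peak
img[5, 5] = -1.0    # an isolated pit
maxima, minima = local_extrema_2d(img)
```

The resulting boolean maps give the scattered interpolation centers from which the upper and lower envelope surfaces are built.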
EMD as well as BEMD decomposes a signal into its IMFs based on local frequency or
oscillation information. The definition and properties of BIMFs are slightly different from
those of IMFs. It is sufficient for BIMFs to follow only the final two properties (iii and iv in
section 4.2) given above for IMFs [14, 15]. In fact, due to the properties of an image and of
the BEMD process, it is not possible to satisfy the first two properties (i and ii in section
4.2) in the case of BIMFs, since the maxima and minima points are defined in a 2D scenario
for an image.
The first BIMF represents the finest-scale components, which show the most detailed
and original information. The second BIMF represents less detailed information than the
first, and so on. The detailed information describes the image texture at the finest scale,
and the rest shows the basic trend and structure of the image.
The general BEMD process is the same as that of one-dimensional EMD, but the signal is
replaced by an array, i.e. an image. The whole process of decomposing an image into its 2D
IMFs (BIMFs) in order of local spatial scales is known as sifting. The decomposition of an
image into BIMFs is not a unique process.
In developing the two-dimensional EMD method, while the sifting process of the
original EMD can be used directly, the key step is to generate the local mean surface of the
two-dimensional data. The BEMD decomposition and the resulting BIMFs are governed by
the method of extrema detection, the criteria for stopping the iterations for each BIMF, and
the interpolation technique. Though all of these factors are important for successful
decomposition, the interpolation method may be considered the most crucial. Most
scattered-data interpolation techniques that produce 2D surfaces are themselves iterative
processes. In the case of BEMD, it is very likely that the maxima or minima map does not
contain any interpolation centers in the boundary region, which may be more severe for the
later modes of decomposition. Hence, some kind of boundary processing to introduce
additional interpolation centers at the boundary may also be required for successful
decomposition [13,16]. Interpolation of the local maxima points is needed to form the upper
envelope, and interpolation of the local minima points to form the lower envelope of the
data/image. The average of the upper and lower envelopes gives the mean envelope. One of
the purposes of BEMD is to obtain BIMFs with zero local mean as defined by the mean
envelope, which in turn plays a significant role in orthogonal decomposition. Hence, the
accuracy of the envelopes in terms of shape and smoothness is very important, which calls
for an appropriate 2D scattered-data interpolation technique for BEMD.
Owing to the above considerations, various methods are available for BEMD. The one-
dimensional EMD algorithm of Huang can be extended to the two-dimensional case by using
cubic interpolation on triangles to generate the upper and lower envelope surfaces [17]. This
decomposition is based on Delaunay triangulation and on piecewise cubic polynomial
interpolation. Particular attention is devoted to the boundary conditions, which are crucial
for the feasibility of the bidimensional EMD.
In BEMD using the finite element method, the local mean surface of a two-dimensional
dataset is generated directly from the characteristic data points rather than from the upper
and lower envelopes. This overcomes the problem of possible overshoots between the upper
and lower envelopes. The method avoids constructing two different two-dimensional
interpolating surfaces (the upper and lower envelopes), which is normally a difficult task
and computationally costly. In addition, the characteristic data points include not only the
local maxima and minima but also the saddle points, which are a distinct feature of
two-dimensional data. In this work, the finite element method is used for fusing images
using EMD [18].
The two-dimensional EMD using the finite element method is summarized for a dataset
I [19]. The first IMF of I is obtained as follows:
(i) Find the local extrema and saddle points of I.
(ii) Form a triangular mesh using the Delaunay method [20,21].
(iii) Smooth the characteristic point set using the Laplacian operator [equation 10 in
(19)].
(iv) Pick the rows and columns on which the number of characteristic points is greater
than the value specified in equation 13 in (19), determine the knots for the bi-cubic
interpolation, and generate the local mean surface m(p).
(v) Compute h = I - m.
(vi) If Σp [hk-1(p) - hk(p)]² / h²k-1(p) < SD, where hk(p) is the sifting result in the
k-th iteration and SD is typically set between 0.2 and 0.3, stop: we obtain an IMF.
Otherwise, treat h as the data and iterate on h through steps (i)-(vi).
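Steps (i)-(ii) can be sketched as follows, assuming SciPy is available and omitting, for brevity, the saddle points and the Laplacian smoothing of step (iii):

```python
import numpy as np
from scipy.ndimage import maximum_filter, minimum_filter
from scipy.spatial import Delaunay

def characteristic_points(img):
    # step (i), simplified: local maxima and minima via a 3x3 sliding window
    # (a full implementation would also detect saddle points)
    mx = img == maximum_filter(img, size=3)
    mn = img == minimum_filter(img, size=3)
    ys, xs = np.nonzero(mx | mn)
    return np.column_stack([xs, ys])

rng = np.random.default_rng(0)
img = rng.standard_normal((32, 32))
pts = characteristic_points(img)
tri = Delaunay(pts)   # step (ii): triangular mesh over the scattered points
```

The triangulation `tri` provides the mesh on which the local mean surface m(p) is subsequently interpolated.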
The first IMF c1 is then subtracted from the data to give r1 = I - c1, which is the first
residue. The algorithm proceeds to extract the next IMF by applying the above procedure to
the first residue r1. This process is repeated until the last residue rn has at most one
extremum or becomes constant. The original data can then be represented as
I = Σ(j=1..n) cj + rn        (3.1)
CHAPTER 4
In Vectorized Empirical Mode Decomposition (VEMD), the 2D image is first converted
into a 1D vector by concatenating its rows, reversing every alternate row:

            | a b c d |
            | e f g h |
I(x, y) =   | i j k l |
            | m n o p |

Concatenating the rows gives
I(x) = [ a b c d  h g f e  i j k l  p o n m ]

Figure 4.1 Concatenation of rows to convert a 2D image into 1D vector data
The extrema of the converted signal are found, and the maxima and minima are
interpolated to obtain the upper and lower envelopes. The mean of the two envelopes is
calculated, the sifting process is applied, and the standard deviation of the sifted signal is
computed. If the standard deviation is less than 0.3, no further sifting is needed and the
result is assigned as an Intrinsic Mode Function. On the other hand, if the standard
deviation is greater than 0.3, the above procedure is repeated until the standard deviation
falls below 0.3.
The residue is calculated from the IMF; if the number of extrema is less than 3, no further
decomposition is possible and it is assigned as the final residue. If the number of extrema is
greater than 3, the above procedure is repeated. The reconstructed image is found by
summing all the IMFs and the residue. After performing 1D EMD, the reconstructed vector
is converted back to a 2D image by reversing the procedure shown in Figure 4.1.
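The row concatenation of Figure 4.1 and its inverse can be sketched as follows; the reversal of every alternate row keeps neighbouring pixels adjacent in the vector, and the round trip is lossless:

```python
import numpy as np

def image_to_vector(img):
    # concatenate rows, reversing every other row (as in Figure 4.1)
    rows = [row if i % 2 == 0 else row[::-1] for i, row in enumerate(img)]
    return np.concatenate(rows)

def vector_to_image(vec, shape):
    img = vec.reshape(shape).copy()
    img[1::2] = img[1::2, ::-1]    # undo the reversal of the odd rows
    return img

img = np.arange(16).reshape(4, 4)
vec = image_to_vector(img)
assert np.array_equal(vector_to_image(vec, (4, 4)), img)  # lossless round trip
```

The 1D EMD described above is then applied to `vec`, and each resulting IMF vector is mapped back to image form with `vector_to_image`.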
CHAPTER 5
wavelet transform would provide directional information in the decomposition levels and
contain unique information at different resolutions [24, 25].
Various image fusion algorithms exist in the literature, and here an attempt has been made
to fuse images using empirical mode decomposition. The images to be fused are
decomposed into several IMFs using the BEMD process described above. Fusion is
performed at the decomposition level, and the fused IMFs are reconstructed to realize the
fused image. The decomposed IMFs of the images are fused using the four methods
explained below, and the performance metrics are compared to find the best result.
An important point to note in image fusion using EMD is that the number of IMFs
should be fixed: the number of IMFs can differ between two images, in which case fusion of
the IMFs is not possible. Here the number of IMFs is fixed, and the images are decomposed
into that many IMFs and fused. One important prerequisite for applying fusion techniques
to source images is image registration, i.e., the information in the source images needs to be
adequately aligned and registered prior to fusion. In this thesis, it is assumed that the
source images are already registered.
Many image fusion methods are available. In this project we discuss four image fusion
methods, viz. simple averaging, Principal Component Analysis, the Discrete Wavelet
Transform, and the Laplacian pyramid.
5.2.1 SIMPLE AVERAGING
The simple averaging mechanism is a straightforward way of obtaining an output image
with all regions in focus. The value of the pixel I(x, y) of each image is taken and added,
and the sum is divided by N (the number of source images) to obtain the average, which is
assigned to the corresponding pixel of the output image. This is repeated for all pixels.
Fusion is thus achieved by simply averaging the corresponding pixels of the input images.
The IMFs from each image are averaged using the formula below, with equal weights
a = b = 0.5 for two images, and the fused IMFs are reconstructed to get the fused image:
If(x, y) = a · I1(x, y) + b · I2(x, y)        (5.1)
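A minimal sketch of averaging fusion per equation (5.1), with the equal weights a = b = 0.5 assumed here:

```python
import numpy as np

def average_fuse(i1, i2, a=0.5, b=0.5):
    # weighted average of corresponding pixels (or IMF coefficients);
    # a = b = 0.5 gives the plain two-image average
    return a * i1 + b * i2

i1 = np.array([[0.0, 1.0], [2.0, 3.0]])
i2 = np.array([[4.0, 3.0], [2.0, 1.0]])
fused = average_fuse(i1, i2)   # every pixel becomes the mean of the two inputs
```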
5.2.2 PRINCIPAL COMPONENT ANALYSIS (PCA)
PCA involves a mathematical procedure that transforms a number of correlated
variables into a number of uncorrelated variables called principal components. It computes a
compact and optimal description of the data set. The first principal component accounts for
as much of the variance in the data as possible, and each succeeding component accounts
for as much of the remaining variance as possible. The first principal component lies along
the direction of maximum variance. The second principal component is constrained to lie in
the subspace perpendicular to the first; within this subspace, it points in the direction of
maximum variance. The third principal component is taken in the maximum-variance
direction in the subspace perpendicular to the first two, and so on. The PCA is also called
the Karhunen-Loève transform or the Hotelling transform. PCA does not have a fixed set of
basis vectors like the FFT, DCT, and wavelet transforms; its basis vectors depend on the
data set [22].
Let the source images (images to be fused) be arranged in two column vectors. The
steps followed to project this data onto a 2-D subspace are:
1. Organise the data into column vectors. The resulting matrix Z is of dimension 2 x n.
2. Compute the empirical mean along each row. The empirical mean vector M has a
dimension of 2 x 1.
3. Subtract the empirical mean vector M from each column of the data matrix Z. The
resulting matrix X is of dimension 2 x n.
4. Find the covariance matrix C of X, i.e. C = E[XX^T] = cov(X).
5. Compute the eigenvectors V and eigenvalues D of C and sort them by decreasing
eigenvalue. Both V and D are of dimension 2 x 2.
6. Consider the column of V corresponding to the larger eigenvalue and compute P1
and P2 as:
P1 = V(1) / ΣV ;  P2 = V(2) / ΣV        (5.2)
The IMFs of the two images to be fused, I1(x, y) and I2(x, y), are arranged in two
column vectors and their empirical means are subtracted. The resulting matrix has
dimension n x 2, where n is the length of each image vector. The eigenvectors and
eigenvalues of its covariance are computed, and the eigenvector corresponding to the larger
eigenvalue is obtained. The normalized components P1 and P2 (i.e., P1 + P2 = 1) are
computed from this eigenvector. The fused image is:
If(x, y) = P1 I1(x, y) + P2 I2(x, y)        (5.3)
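The weight computation of equations (5.2)-(5.3) can be sketched as below, assuming NumPy; the helper name is illustrative only:

```python
import numpy as np

def pca_weights(i1, i2):
    # stack the two images as the rows of a 2 x n matrix, take the
    # eigenvector of the covariance with the largest eigenvalue,
    # and normalize its components so that P1 + P2 = 1 (equation 5.2)
    z = np.stack([i1.ravel(), i2.ravel()])   # 2 x n
    c = np.cov(z)
    w, v = np.linalg.eigh(c)                 # eigh returns ascending eigenvalues
    ev = v[:, -1]                            # dominant eigenvector
    p = ev / ev.sum()
    return p[0], p[1]

rng = np.random.default_rng(1)
i1 = rng.random((8, 8))
i2 = 0.5 * i1 + 0.1 * rng.random((8, 8))     # a correlated second image
p1, p2 = pca_weights(i1, i2)
fused = p1 * i1 + p2 * i2                    # equation (5.3)
```

The more informative image (larger variance along the dominant direction) automatically receives the larger weight.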
5.2.3 DISCRETE WAVELET TRANSFORM
Figure 5.1 Block diagram of Discrete Wavelet Transform.
The wavelet transform separately filters and downsamples the 2-D data (image) in the
vertical and horizontal directions (separable filter bank). The input (source) image I(x, y) is
filtered by a low-pass filter L and a high-pass filter H in the horizontal direction and then
downsampled by a factor of two (keeping alternate samples) to create the coefficient
matrices IL(x, y) and IH(x, y). The coefficient matrices IL(x, y) and IH(x, y) are each
low-pass and high-pass filtered in the vertical direction and downsampled by a factor of two
to create the sub-bands (sub-images) ILL(x, y), ILH(x, y), IHL(x, y) and IHH(x, y). The
ILL(x, y) sub-band contains the average image information.
Figure 5.2 Block diagram of Inverse Discrete Wavelet Transform.
The inverse 2-D wavelet transform is used to reconstruct the image I(x, y) from the
sub-images: column upsampling (inserting zeros between samples) and filtering with the
low-pass filter L~ and high-pass filter H~ are applied to each sub-image, followed by row
upsampling and filtering with L~ and H~; the summation of all the resulting matrices
reconstructs the image.
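A self-contained sketch of one-level DWT fusion is given below. For simplicity it uses the Haar filters rather than the specific filters of Figure 5.1, averages the approximation (LL) band, and keeps the larger-magnitude detail coefficient from either source, which is one common fusion rule:

```python
import numpy as np

def haar2(img):
    # one-level 2D Haar transform: average/difference along rows, then columns
    a = (img[:, 0::2] + img[:, 1::2]) / 2.0
    d = (img[:, 0::2] - img[:, 1::2]) / 2.0
    ll, lh = (a[0::2] + a[1::2]) / 2.0, (a[0::2] - a[1::2]) / 2.0
    hl, hh = (d[0::2] + d[1::2]) / 2.0, (d[0::2] - d[1::2]) / 2.0
    return ll, lh, hl, hh

def ihaar2(ll, lh, hl, hh):
    # exact inverse of haar2: undo the column step, then the row step
    a = np.empty((ll.shape[0] * 2, ll.shape[1]))
    d = np.empty_like(a)
    a[0::2], a[1::2] = ll + lh, ll - lh
    d[0::2], d[1::2] = hl + hh, hl - hh
    img = np.empty((a.shape[0], a.shape[1] * 2))
    img[:, 0::2], img[:, 1::2] = a + d, a - d
    return img

def dwt_fuse(i1, i2):
    b1, b2 = haar2(i1), haar2(i2)
    ll = (b1[0] + b2[0]) / 2.0                       # average the approximations
    details = [np.where(np.abs(x) >= np.abs(y), x, y)
               for x, y in zip(b1[1:], b2[1:])]      # keep larger-magnitude detail
    return ihaar2(ll, *details)
```

For multi-level fusion, the same rule is applied recursively to the LL band before the final averaging.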
Image pyramids were initially described for multi-resolution image analysis and as a
model for binocular fusion in human vision. An image pyramid can be described as a
collection of low-pass or band-pass copies of an original image, in which both the band
limit and the sample density are reduced in regular steps. A multi-resolution pyramid
transformation decomposes an image into multiple resolutions at different scales. A pyramid
is a sequence of images in which each level is a filtered and subsampled copy of its
predecessor. The lowest level of the pyramid has the same scale as the original image and
contains the highest-resolution information. Higher levels of the pyramid are
reduced-resolution, increased-scale versions of the original image.
5.2.4a IMAGE FUSION USING LAPLACIAN PYRAMID:
Several approaches to Laplacian fusion have been documented since Burt and Adelson
introduced this transform in 1983 [26]. The Laplacian pyramid implements a "pattern
selective" approach to image fusion, so that the composite image is constructed not a pixel
at a time, but a feature at a time. The basic idea of this technique is to perform a pyramid
decomposition on each of the source images, then integrate all these decompositions to form
a composite representation, and finally reconstruct the fused image by performing an
inverse pyramid transform.
If the original image is denoted g0, the first step in the Laplacian pyramid transform is to
low-pass filter g0 to obtain image g1, which is a "reduced" version of g0. In a similar way,
g2 is formed as a reduced version of g1, and so on. Image reconstruction involves the
opposite steps of the image decomposition explained above.
The first step is to construct a pyramid for each source image. The fusion is then
implemented for each level of the pyramid using a feature-selection decision mechanism.
Several modes of combination can be used, such as selection or averaging. In the first, the
combination process selects the most salient component pattern from the sources and copies
it to the composite pyramid, discarding the less salient pattern. In the second, the process
averages the source patterns. This averaging reduces noise and provides stability where the
source images contain the same pattern information. The former is used in locations where
the source images are distinctly different, and the latter where the source images are
similar. The approach chosen in this work is to select the most salient component, following
the equation
$$ I_f(x,y) = \begin{cases} I_1(x,y), & \text{if } I_1(x,y) > I_2(x,y) \\ I_2(x,y), & \text{otherwise} \end{cases} \qquad (5.4) $$
where I_1 and I_2 are the two input signals and I_f is the fused signal, for levels 0 ≤ l ≤ N−1. A consistency filter is then applied, whose aim is to eliminate isolated points. Finally, for level N an average of both source components is taken:
$$ I_f(x,y) = \frac{I_1(x,y) + I_2(x,y)}{2} \qquad (5.5) $$
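The selection rule of eq. (5.4), the consistency filter, and the level-N averaging of eq. (5.5) might be sketched as below. The 3×3 majority-vote window is an assumption, since the text does not specify the consistency filter's neighbourhood:

```python
import numpy as np

def majority_3x3(mask):
    """3x3 majority vote on a boolean mask; isolated selections are suppressed.

    Zero padding makes the vote conservative near the image borders.
    """
    h, w = mask.shape
    p = np.pad(mask.astype(int), 1)
    votes = sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3))
    return votes >= 5

def fuse_detail_level(i1, i2):
    """Eq. (5.4) plus consistency filtering: keep I1 where it is (consistently) larger."""
    return np.where(majority_3x3(i1 > i2), i1, i2)

def fuse_top_level(i1, i2):
    """Eq. (5.5): average the coarsest (level-N) components."""
    return 0.5 * (i1 + i2)

# An isolated "win" for i1 is removed by the consistency filter:
i1 = np.zeros((5, 5)); i1[2, 2] = 10.0
i2 = np.ones((5, 5))
print(fuse_detail_level(i1, i2)[2, 2])   # 1.0 -- the isolated point is discarded
```

A single pixel where I1 exceeds I2, surrounded by pixels where it does not, fails the majority vote, so the fused output takes I2 there.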
This method uses a recursive algorithm to achieve three main tasks. First, it constructs the Laplacian pyramid of the source images [27]. Second, it performs the fusion at each level of the decomposition. Finally, it reconstructs the fused image from the fused pyramid. To implement the Laplacian pyramid decomposition, two elementary scaling operations must be defined first, usually referred to as reduce and expand. The reduce operation applies a low-pass filter to the image and downsamples it by a factor of two. The expand operation employs a predefined interpolation method and upsamples the image by a factor of two.
The block diagram of the Laplacian pyramid is shown in figure 5.3.
Given these two operations, the Laplacian pyramid fusion proceeds through the following process:
1. Generate a Laplacian pyramid L_i for each of the images I_i.
2. Merge the pyramids L_i by taking the maximum at each pixel, obtaining the Laplacian pyramid representation L of the fusion result.
3. Reconstruct the fusion result I from its Laplacian pyramid representation.
4. Normalize the dynamic range of the result so that it resides within [0, 1], and apply additional post-processing as necessary.
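Under the assumption of a 5-tap binomial generating kernel (not specified in the text) and image sides divisible by 2^levels, the four steps above might be sketched as:

```python
import numpy as np

K = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0   # assumed binomial generating kernel

def _blur(img):
    t = np.apply_along_axis(lambda r: np.convolve(r, K, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, K, mode="same"), 0, t)

def _reduce(img):                      # low-pass filter, then downsample by two
    return _blur(img)[::2, ::2]

def _expand(img, shape):               # upsample by two, then interpolate by filtering
    up = np.zeros(shape)
    up[::2, ::2] = img
    return 4.0 * _blur(up)

def laplacian_pyramid(img, levels):
    pyr, g = [], img.astype(float)
    for _ in range(levels):
        g_next = _reduce(g)
        pyr.append(g - _expand(g_next, g.shape))   # band-pass detail level
        g = g_next
    pyr.append(g)                                   # low-pass residual
    return pyr

def fuse(i1, i2, levels=3):
    p1, p2 = laplacian_pyramid(i1, levels), laplacian_pyramid(i2, levels)
    # step 2: per-pixel maximum (by magnitude) on the detail levels
    merged = [np.where(np.abs(a) >= np.abs(b), a, b) for a, b in zip(p1[:-1], p2[:-1])]
    merged.append(0.5 * (p1[-1] + p2[-1]))          # average the coarsest level
    # step 3: reconstruct by successive expand-and-add
    out = merged[-1]
    for lap in reversed(merged[:-1]):
        out = lap + _expand(out, lap.shape)
    # step 4: normalise the dynamic range to [0, 1]
    return (out - out.min()) / (out.max() - out.min() + 1e-12)
```

Because each detail level is defined as g − expand(reduce(g)), the expand-and-add reconstruction is exact, so fusing an image with itself returns the image (up to the final normalisation).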
CHAPTER 6
RMSE is calculated using corresponding pixels in the reference image and the reconstructed
image. RMSE value is 0 for a perfectly reconstructed image.
$$ \mathrm{RMSE} = \sqrt{ \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} \left( I_t(i,j) - I_f(i,j) \right)^2 } \qquad (6.1) $$
where I_t refers to the true/reference image and I_f refers to the reconstructed image.
MAE is also calculated using corresponding pixels in the reference and reconstructed images. Like RMSE, its value should be near zero.
$$ \mathrm{MAE} = \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} \left| I_t(i,j) - I_f(i,j) \right| \qquad (6.2) $$
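Both error measures can be computed directly in NumPy (a sketch; I_t is the reference image and I_f the reconstructed image):

```python
import numpy as np

def rmse(i_t, i_f):
    """Eq. (6.1): root mean squared error; 0 for a perfect reconstruction."""
    return float(np.sqrt(np.mean((i_t - i_f) ** 2)))

def mae(i_t, i_f):
    """Eq. (6.2): mean absolute error; should also be near zero."""
    return float(np.mean(np.abs(i_t - i_f)))

a = np.array([[0.0, 0.0]])
b = np.array([[3.0, 4.0]])
print(rmse(a, b))   # sqrt((9 + 16) / 2) = 3.5355...
print(mae(a, b))    # (3 + 4) / 2 = 3.5
```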
Bias (BM) is the difference between the means of the original reference image and the reconstructed image, taken relative to the mean of the original image. Its ideal value is 0. BM is calculated using the formula
$$ \mathrm{BM} = \frac{ \bar{I}_t - \bar{I}_f }{ \bar{I}_t } \qquad (6.3) $$

where

$$ \bar{I}_t = \frac{1}{MN} \sum_{i,j=1}^{M,N} I_t(i,j), \qquad \bar{I}_f = \frac{1}{MN} \sum_{i,j=1}^{M,N} I_f(i,j) $$
Standard deviation (SD) is an important index to weigh the information content of an image; it reflects the degree of deviation of the values relative to the image mean. The greater the SD, the more dispersed the gray-level distribution is. Standard deviation is more meaningful in the absence of noise, and an image with high contrast has a high standard deviation. It is calculated using the formula
$$ \mathrm{SD} = \sqrt{ \frac{1}{MN} \sum_{i,j=1}^{M,N} \left( I_f(i,j) - \bar{I}_f \right)^2 } \qquad (6.4) $$
PSNR is the ratio of the peak signal value to the noise. Its value is high when the reconstructed and reference images are similar; a higher value implies better reconstruction. In terms of the RMSE of eq. (6.1),

$$ \mathrm{PSNR} = 20 \log_{10} \left( \frac{L}{\mathrm{RMSE}} \right) $$

where L is the number of gray levels in the image. The normalized cross correlation (NCC) between the reference and reconstructed images is

$$ \mathrm{NCC} = \frac{ \sum_{i=1}^{M} \sum_{j=1}^{N} I_t(i,j) \, I_f(i,j) }{ \sum_{i=1}^{M} \sum_{j=1}^{N} I_t(i,j)^2 } \qquad (6.5) $$
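Sketches of PSNR and the normalized cross correlation of eq. (6.5), assuming the common form PSNR = 20·log10(L/RMSE) with L the number of gray levels:

```python
import numpy as np

def psnr(i_t, i_f, L=255.0):
    """Peak signal-to-noise ratio in dB; higher means a better reconstruction."""
    rmse = np.sqrt(np.mean((i_t - i_f) ** 2))
    return float(20.0 * np.log10(L / rmse))   # undefined for identical images (rmse = 0)

def ncc(i_t, i_f):
    """Eq. (6.5): normalized cross correlation; 1.0 when I_f equals I_t."""
    return float(np.sum(i_t * i_f) / np.sum(i_t ** 2))
```

Note that ncc(x, x) is exactly 1.0 for any non-zero image x, since numerator and denominator coincide.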
The spectral angle mapper (SAM) [29] treats the two images as vectors A and B and measures the angle between them:

$$ \cos \alpha = \frac{ \sum_{i=1}^{N} A_i B_i }{ \sqrt{ \sum_{i=1}^{N} A_i^2 } \, \sqrt{ \sum_{i=1}^{N} B_i^2 } } \qquad (6.6) $$
The correlation measure (CORR) indicates how similar the reference and fused images are. The ideal value is one, obtained when the reference and fused images are exactly alike; it falls below one as the dissimilarity increases.
$$ \mathrm{CORR} = \frac{2 C_{tf}}{C_t + C_f} \qquad (6.7) $$

where

$$ C_t = \sum_{i=1}^{M} \sum_{j=1}^{N} I_t(i,j)^2, \qquad C_f = \sum_{i=1}^{M} \sum_{j=1}^{N} I_f(i,j)^2, \qquad C_{tf} = \sum_{i=1}^{M} \sum_{j=1}^{N} I_t(i,j) \, I_f(i,j) $$
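The correlation measure of eq. (6.7) can be sketched as:

```python
import numpy as np

def corr(i_t, i_f):
    """Eq. (6.7): correlation measure; equals 1.0 when the images are identical."""
    c_t = np.sum(i_t ** 2)
    c_f = np.sum(i_f ** 2)
    c_tf = np.sum(i_t * i_f)
    return float(2.0 * c_tf / (c_t + c_f))
```

For i_f = i_t all three sums coincide and CORR = 2C/(C + C) = 1; scaling one image (e.g. i_f = 2·i_t) lowers the score.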
Spatial frequency (SF) measures the overall activity level of the fused image and is computed from the row frequency (RF, horizontal differences) and column frequency (CF, vertical differences):

$$ \mathrm{SF} = \sqrt{ RF^2 + CF^2 } \qquad (6.8) $$

$$ RF = \sqrt{ \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=2}^{N} \left[ I_f(i,j) - I_f(i,j-1) \right]^2 }, \qquad CF = \sqrt{ \frac{1}{MN} \sum_{i=2}^{M} \sum_{j=1}^{N} \left[ I_f(i,j) - I_f(i-1,j) \right]^2 } $$
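The spatial frequency of eq. (6.8) can be sketched as follows, normalising by MN as in the definitions above:

```python
import numpy as np

def spatial_frequency(i_f):
    """Eq. (6.8): SF from row frequency (horizontal diffs) and column frequency (vertical diffs)."""
    m, n = i_f.shape
    rf = np.sqrt(np.sum((i_f[:, 1:] - i_f[:, :-1]) ** 2) / (m * n))
    cf = np.sqrt(np.sum((i_f[1:, :] - i_f[:-1, :]) ** 2) / (m * n))
    return float(np.sqrt(rf ** 2 + cf ** 2))
```

A constant image has SF = 0; an image with alternating columns has only a row-frequency contribution.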
$$ \mu_{I_t} = \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} I_t(i,j), \qquad \mu_{I_f} = \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} I_f(i,j) $$

$$ \sigma_{I_t}^2 = \frac{1}{MN-1} \sum_{i=1}^{M} \sum_{j=1}^{N} \left( I_t(i,j) - \mu_{I_t} \right)^2, \qquad \sigma_{I_f}^2 = \frac{1}{MN-1} \sum_{i=1}^{M} \sum_{j=1}^{N} \left( I_f(i,j) - \mu_{I_f} \right)^2 $$

$$ \sigma_{I_t I_f} = \frac{1}{MN-1} \sum_{i=1}^{M} \sum_{j=1}^{N} \left( I_t(i,j) - \mu_{I_t} \right) \left( I_f(i,j) - \mu_{I_f} \right) $$
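These statistics can be computed as follows (a sketch matching the 1/(MN−1) normalisation above):

```python
import numpy as np

def ssim_stats(i_t, i_f):
    """Means, variances and covariance over the whole image, normalised by MN - 1."""
    mn = i_t.size
    mu_t, mu_f = i_t.mean(), i_f.mean()
    var_t = np.sum((i_t - mu_t) ** 2) / (mn - 1)
    var_f = np.sum((i_f - mu_f) ** 2) / (mn - 1)
    cov_tf = np.sum((i_t - mu_t) * (i_f - mu_f)) / (mn - 1)
    return mu_t, mu_f, var_t, var_f, cov_tf
```

The results agree with NumPy's sample statistics (`ddof=1`).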
where the parameter C has two roles (here N refers to the product of the numbers of rows and columns). First, it is introduced to prevent the denominator from becoming zero. Second, it can be viewed as a scaling parameter: different magnitudes of C lead to different ESSIM scores. Here we choose C = (BL)^2, where B is a predefined constant and L is the dynamic range of the edge-strength values.
Figure 6.1 shows the comparison of the BIMFs and residue formed using BEMD and VEMD. For VEMD, the IMF signals are shown in addition to the BIMFs, together with the reconstructed image and the corresponding error image. The decompositions contain the image information shown in the figure: the first IMFs capture the border information with better accuracy, while the later IMFs recede into the finer details of the image. The residue is a low-pass-filtered component and contains less detail information than the other IMFs.
[Figure 6.1a Signal and image panels: IMF1-IMF5, residue, reconstructed image using the IMFs, and the error image (error values of the order of 10^-16).]
Figure 6.1b Comparison of BEMD and VEMD
The BEMD algorithm is applied to the given image as shown in figure 4a. Table 6.1 shows the effect of increasing the number of iterations, Table 6.2 depicts the results for an increasing number of IMFs, and Table 6.3 shows the time taken for various tolerance values together with their corresponding RMSE values. From Tables 6.1-6.3 it is observed that time increases with the maximum number of iterations and with the number of IMFs. If the tolerance decreases, time increases, i.e., the more precise the tolerance, the more iterations are needed.
Table 6.1 Time and RMSE values corresponding to number of iterations for tolerance = 1 and number of IMFs = 5

Imax   Time (sec.)   RMSE
1      32.681        1.8539e-14
2      41.605        2.4045e-14
3      46.275        2.4199e-14
4      46.257        2.4199e-14
5      46.285        2.4199e-14
Table 6.2 Time and RMSE values corresponding to number of IMFs for tolerance = 1 and number of iterations Imax = 5

No. of IMFs   Time (sec.)   RMSE
1             30.697        5.7629e-15
2             34.583        9.6615e-15
3             38.029        1.5727e-14
4             41.399        2.0036e-14
5             46.444        2.2712e-14
Table 6.3 Time and RMSE values corresponding to tolerance for No. of IMFs=1 and
maximum number of iterations = 5
Table 6.4 Performance evaluation of BEMD and VEMD for a given image with number of iterations = 5, number of IMFs = 5 and tolerance = 0.01

Image size   Metrics   RMSE         MAE          BM   SD        PSNR         Time (s)
512x512      BEMD      1.6400e-14   7.2106e-15   0    47.6085   2.102e+11    189.060
             VEMD      1.6243e-16   1.1417e-16   0    68.6460   1.7102e+11   3.227
Table 6.5 shows the relative time taken by BEMD and VEMD for various image sizes. From the table it is inferred that VEMD shows a shorter response time than BEMD. Figure 6.2 shows the variation of relative time with image size.
Figure 6.2 Graph showing the variation of time with the size of the image using BEMD and
VEMD
Table 6.5 Relative time of BEMD and VEMD for various image sizes

Image size   Metrics   Time (s)   Time ratio (BEMD/VEMD)
512x512      BEMD      189.060    58.5869
             VEMD      3.227
256x256      BEMD      55.020     67.8422
             VEMD      0.811
128x128      BEMD      14.407     56.7205
             VEMD      0.254
64x64        BEMD      5.016      35.8286
             VEMD      0.140

6.2.2 IMAGE FUSION USING BEMD Vs VEMD
The results of image fusion using VEMD and BEMD are discussed here. Two 256x256 images are taken (figure 6.3b & figure 6.3c), and the fused images obtained after applying VEMD and BEMD are compared with the reference image shown in figure 6.3a.
The IMFs of each image after applying VEMD and BEMD are shown in figures 6.4a and 6.4b respectively. The signals corresponding to the images are also shown.
[Figure 6.3a Reference image; Figure 6.3b First image to be fused (Image 1) — signal and image panels shown for each.]
Figure 6.3c Second image to be fused (Image 2)
[Figure 6.4a IMFs (IMF1-IMF5) and residue obtained using VEMD; Figure 6.4b IMFs (IMF1-IMF5) and residue obtained using BEMD.]
The IMFs are fused using Simple Averaging (SA), Principal Component Analysis (PCA), the Discrete Wavelet Transform (DWT) and the Laplacian Pyramid (LP). The fused images are shown in figure 6.5a and their corresponding error images in figure 6.5b.
Figure 6.5a Figure showing the fused image using various fusion methods
Figure 6.5b Figure showing the error image using various fusion methods
Figure 6.6a Image set 2 - FLIR image and low-light visible TV image.
Figure 6.6b Figure showing the fused image for image set – 2.
The performance values for the evaluation of the four fusion algorithms are given in table 6.6. These values correspond to image set 1 (figure 6.5). Values are given for BEMD and VEMD separately.
Table 6.6a: Performance metrics for the evaluation of image fusion using BEMD and VEMD
Metrics compared: RMSE, MAE, BM, SD, PSNR, SAM
Table 6.6b Performance metrics for the evaluation of image fusion using BEMD and VEMD
6.2.3 INFERENCE
From the results obtained it is found that the Discrete Wavelet Transform gives the best results for both BEMD and VEMD. BIMFs fused using the discrete wavelet transform give better results than those from VEMD. BEMD using the Laplacian pyramid performs well, next after the discrete wavelet transform. Both BEMD and VEMD give exactly the same results in the case of simple averaging, as expected. Among the four methods, BEMD gives better fused results than VEMD, but the latter is faster. The comparison of their response times is shown in table 6.7.
Table 6.7: Relative time of BEMD and VEMD for different fusion methods

Fusion method   Metrics   Time (s)   Time ratio (BEMD/VEMD)
SA              BEMD      99.062     12.9256
                VEMD      7.664
PCA             BEMD      84.948     11.1686
                VEMD      7.606
DWT             BEMD      84.580     11.3897
                VEMD      7.426
LP              BEMD      93.391     2.3122
                VEMD      40.390

An important point to note is that the BEMD Laplacian pyramid performs well if the number of levels is only 2. But if the number of levels is increased, the performance of both BEMD and VEMD decreases in an exponential manner, as shown in figure 6.8 and depicted by the error values in table 6.8. The same happens with the discrete wavelet transform, i.e., if the decomposition levels are increased the performance decreases exponentially (figure 6.9). This shows that image fusion using more decomposition levels on the IMFs does not give good results.
Table 6.8: Performance values of BEMD and VEMD for increasing Laplacian levels
No. of Laplacian levels   Metrics   RMSE     MAE
2                         BEMD      4.7194   1.9834
                          VEMD      5.1987   2.1014
3                         BEMD      6.0586   2.8540
                          VEMD      5.2807   2.6395
Figure 6.8 Graph showing the exponentially increasing pattern of RMSE with the increase
of laplacian levels.
Table 6.9 Performance values of BEMD and VEMD for increasing decomposition levels in
discrete wavelet transform
No. of levels   Metrics   RMSE   MAE
Figure 6.9 Graph showing pattern of RMSE with the increase of decomposition levels in
discrete wavelet transform
Objective quality measures are convenient because they avoid the costs associated with human subjects. However, an objective measure only produces predictable results for the environment, error conditions and impairments it was developed for. This sensitivity is especially severe for no-reference measures, which are typically developed to detect one specific impairment (image set 2 in our case). Objective evaluation often depends on characteristics that are difficult to infer, even given an undistorted reference; for instance, visual attention and gaze direction are known to significantly influence subjective quality [33].
Since multimedia systems typically have human end-users, the definitive quality measure is given by human perception. Despite many efforts to design accurate objective measures, so far none has been able to account for all the peculiarities of human physiological and psychological responses. Thus, subjective tests are generally regarded as the most reliable and definitive methods for assessing image quality, although laboratory studies are time consuming and expensive.
The subjective assessment method is a human visual analysis of the fused image; it is simple and intuitive. It has further advantages: it can be used to determine whether the image contains shadows, whether the texture or colour information of the fused image is consistent, and whether the clarity has been reduced. The subjective assessment method is therefore often used to compare the edges of fused images, and it reveals the differences between images in spatial resolving power and clarity intuitively.
The fused image sets were circulated to 10 volunteers along with the true images. The ratings provided by the volunteers are given in table 6.10, together with their mean and standard deviation. The same procedure was repeated for image set 2. The volunteers were both experienced and inexperienced in the image-processing domain; they were asked to rate the images on a ten-point scale, and the repeatability of the scores was verified. From the evaluation results it is found that the Discrete Wavelet Transform obtained the best scores for image set 1, while for image set 2 (IR and TV images) fusion using the Laplacian pyramid scored best, followed by DWT and PCA.
Table 6.10a Subjective evaluation scores for image set 1 using BEMD and VEMD
Subject * BEMD VEMD
SA PCA DWT LP SA PCA DWT LP
1 No 7 8 10 9 7 7.5 10 9
2 yes 5 5 8 9 6 5.5 9 8.5
3 yes 7 6 8.5 9 6 7 8 9
4 No 7 7 8 8 6 7 8.5 8
5 No 7 7 8 8 6 7 8 8
6 yes 6.5 6 8 8 7 6 8 8
7 yes 6 7 9 8 6 7 8.5 8
8 no 7 7 8 8 7 7.5 8 8
9 yes 5 6.5 8 8 6 6 8 8
10 yes 6 7 8 8 5 6 8 8
Mean        6.35   6.65   8.35   8.3    6.2    6.65   8.4    8.25
Std. dev.   ±0.82  ±0.82  ±0.67  ±0.48  ±0.63  ±0.71  ±0.66  ±0.43
* - People working in Image Processing.
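The mean and standard deviation rows of Table 6.10a can be reproduced with a short script; the scores below are the BEMD/SA column transcribed from the table:

```python
import numpy as np

# Subject ratings for BEMD + simple averaging (SA), from Table 6.10a
sa_bemd = np.array([7, 5, 7, 7, 7, 6.5, 6, 7, 5, 6])

mean = sa_bemd.mean()
std = sa_bemd.std(ddof=1)   # sample standard deviation, matching the "+/-" row
print(f"{mean:.2f} +/- {std:.2f}")   # 6.35 +/- 0.82
```

Note that `ddof=1` (sample standard deviation) is required to reproduce the tabulated ±0.82; the population formula gives a slightly smaller value.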
Table 6.10b Subjective evaluation scores for image set 2 using BEMD and VEMD
Subject * BEMD VEMD
SA PCA DWT LP SA PCA DWT LP
1 No 7 8 8.5 9 7.5 7 8 8.5
2 yes 8 9 7.5 7 8 9 7 7
3 yes 7.5 7 8 8 7 7 8 9
4 No 5 6 7 8 5 6.5 7.5 8
5 No 7 8 6 8 7 7 8 8
6 yes 9 7 5 9 8.5 7 5 8.5
7 yes 6 7 9 8 6.5 7 8.5 8
8 no 8 8 7 8 7 7.5 7 8
9 yes 6.5 7.5 8 8 6.5 7.5 8 8
10 yes 7.5 9 6 8 7 8 6 8
Mean        7.15   7.65   7.2    8.1    7      7.35   7.3    8.1
Std. dev.   ±1.13  ±0.94  ±1.25  ±0.57  ±0.94  ±0.71  ±1.09  ±0.52
* - People working in Image Processing.
CHAPTER 7
CONCLUSION
Two types of Empirical Mode Decomposition, viz. BEMD and VEMD, were studied, and it is found that VEMD performs faster than BEMD. The Intrinsic Mode Functions produced by the BEMD and VEMD algorithms were fused separately using four different methods and evaluated both subjectively and objectively. From the objective evaluation it is found that BIMFs fused using the Discrete Wavelet Transform give the best performance, and that image fusion using BEMD gives a comparatively smaller error than VEMD, although the latter gives faster results. It is also found that fusion using more decomposition levels on the IMFs degrades the performance.
The images to be fused are assumed to be already registered. Image registration is not
carried out in this work.
Eleven IMFs are assumed for image fusion using both BEMD and VEMD.
REFERENCES
[1] N. Huang, Z. Shen, S. Long, M. Wu, H. Shih, Q. Zheng, N. Yen, C. Tung, and H. Liu,
“The empirical mode decomposition and Hilbert spectrum for non-linear and non-stationary
time series analysis,” Proceedings of the Royal Society A, vol. 454, pp. 903–995, 1998.
[3] Y. Kopsinis and S. McLaughlin, "Development of EMD-based denoising methods inspired by wavelet thresholding", IEEE Transactions on Signal Processing, Vol. 57, No. 4, pp. 1351-1362, Apr. 2009.
[4] G. Rilling, P. Flandrin and P. Goncalves, "On empirical mode decomposition and its algorithms", IEEE-EURASIP Workshop on Nonlinear Signal and Image Processing (NSIP-03), 2003.
[5] G. Rilling, P. Flandrin and P. Goncalves, "Detrending and denoising with empirical mode decomposition", EUSIPCO-04, Vienna, 2004.
[7] Wenzhong Shi, Yan Tian, Ying Huang, Haixia Mao and Kimfung Liu, "A two-dimensional empirical mode decomposition method with application for fusing panchromatic and multispectral satellite images", International Journal of Remote Sensing, Vol. 30, No. 10, pp. 2637-2652, May 2009.
[8] J. C. Nunes, Y. Bouaoune, E. Delechelle, O. Niang and Ph. Bunel, "Image analysis by bidimensional empirical mode decomposition", Image and Vision Computing, Vol. 21, pp. 1019-1026, 2003.
[10] Z. Liu and S. Peng, "Boundary processing of bidimensional EMD using texture synthesis", IEEE Signal Processing Letters, Vol. 12, No. 1, pp. 33-36, 2005.
[11] Chen, Z., Micchelli, C. A. and Xu, Y., "Fast collocation methods for second kind integral equations", SIAM Journal on Numerical Analysis, Vol. 40, pp. 344-375, 2002.
[12] G. Rilling, P. Flandrin, P. Goncalves and J. M. Lilly, "Bivariate empirical mode decomposition", IEEE Signal Processing Letters, Vol. 14, No. 12, pp. 936-939, Dec. 2007.
[13] Sharif M. A. Bhuiyan, Jesmin F. Khan and Reza R. Adhami, "A novel approach of edge detection via a fast and adaptive bidimensional empirical mode decomposition method", Advances in Adaptive Data Analysis, Vol. 2, pp. 171-192, 2010.
[14] J.C. Nunes, S. Guyot and E. Delechelle, "Texture analysis based on local analysis of the bidimensional empirical mode decomposition", Machine Vision and Applications, Vol. 16, No. 3, pp. 177-188, 2005.
[15] A. Linderhed, "2D empirical mode decompositions in the spirit of image compression", Proceedings of SPIE, Vol. 4738, pp. 1-8, 2002.
[16] Z. Liu and S. Peng, "Boundary processing of bidimensional EMD using texture synthesis", IEEE Signal Processing Letters, Vol. 12, No. 1, pp. 33-36, 2005.
[17] C. Damerval, S. Meignen and V. Perrier, "A fast algorithm for bidimensional EMD", IEEE Signal Processing Letters, Vol. 12, No. 10, pp. 701-704, 2005.
[20] Kobbelt, L., Campagna, S., Vorsatz, J. and Seidel, H.-P., "Interactive multi-resolution modeling on arbitrary meshes", Proceedings of SIGGRAPH 98, pp. 105-114, 1998.
[21] Sapidis, N. and Perucchio, R., "Delaunay triangulation of arbitrarily shaped domains", Computer Aided Geometric Design, Vol. 8, pp. 421-437, 1991.
[21] Vaidehi, V. et al., "Fusion of Multi-Scale Visible and Thermal Images using EMD for Improved Face Recognition", Proceedings of IMECS, Vol. I, 2011.
[22] VPS. Naidu and J.R. Raol, "Pixel-level Image Fusion using Wavelets and Principal Component Analysis", Defence Science Journal, Vol. 58, No. 3, pp. 338-352, May 2008.
[23] Varshney, P.K., "Multi-sensor data fusion", Electronics and Communication Engineering Journal, Vol. 9, No. 12, pp. 245-253, 1997.
[24] Mallat, S.G., "A theory for multiresolution signal decomposition: the wavelet representation", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 11, No. 7, pp. 674-693, 1989.
[25] Wang, H., Peng, J. and Wu, W., "Fusion algorithm for multisensor image based on discrete multiwavelet transform", IEE Proceedings - Vision, Image and Signal Processing, Vol. 149, No. 5, 2002.
[26] Burt, P.J. and Kolczynski, R.J., "Enhanced image capture through fusion", 4th International Conference on Computer Vision, Berlin, Germany, pp. 173-182, 1993.
[28] A.K. Moorthy and A.C. Bovik, "Perceptually significant spatial pooling techniques for image quality assessment", Proceedings of SPIE, 2009.
[29] Yuhas, R.H., Goetz, A.F.H. and Boardman, J.W., "Discrimination among semi-arid landscape endmembers using the Spectral Angle Mapper (SAM) algorithm", Summaries of the 3rd Annual JPL Airborne Geoscience Workshop, pp. 147-149, 1992.
[30] Wang, Z. and Bovik, A.C., "A universal image quality index", IEEE Signal Processing Letters, Vol. 9, No. 3, pp. 81-84, March 2002.
[31] Yu Han, Yunze Cai, Yin Cao and Xiaoming Xu, "A new image fusion performance metric based on visual information fidelity", Information Fusion, Vol. 14, No. 2, pp. 127-135, April 2013.
[32] Xuande Zhang, Xiangchu Feng, Weiwei Wang and Wufeng Xue, "Edge strength similarity for image quality assessment", IEEE Signal Processing Letters, Vol. 20, No. 4, 2013.
[33] Flavio Ribeiro, Dinei Florencio and Vitor Nascimento, "Crowdsourcing subjective image quality evaluation", 18th IEEE International Conference on Image Processing (ICIP), pp. 3158-3182, 2011.