Markovian Regularization of Hermite-Transform-Base
Article in Proceedings of SPIE - The International Society for Optical Engineering · February 2004
DOI: 10.1117/12.511402
All content following this page was uploaded by José Luis Silván-Cárdenas on 16 October 2014.
ABSTRACT
A novel classification scheme for SAR images based on the perceptual classification of image patterns in the Discrete
Hermite Transform domain has been developed. In order to obtain the DHT referred to a rotated coordinate system the
set of coefficients of a given order are mapped through a unitary transformation based on the generalized binomial
function. This representation allows a perceptual classification, including constant patterns (0-D), oriented structures (1-
D), and non-oriented structures (2-D). Classification is based on light adaptation and contrast masking properties of the
human vision. Finally, classification is improved by means of a probabilistic approach based on Markov Random Fields.
Keywords: SAR image classification, Hermite transform, binomial filters, Markov random fields, multiresolution
analysis, steerable transforms, visual perception.
1. INTRODUCTION
Unsupervised image classification schemes often aim at finding differences among a wide variety of image
characteristics such as texture, gray level and pattern structure. Some of the more efficient approaches combine
statistical methods, such as Markov Random Fields [1], with multiresolution analysis methods, such as wavelets [2]. The
former discriminate classes on the basis of a probabilistic model of the local gray-level distribution, while the latter
assume that images are composed of objects of different dimensions and contrasts which are uniquely represented in
scale space. Some of these multiresolution methods, such as the Hermite transform, are inspired by important
properties of human vision. This transform has been proposed as a robust model for image representation since it carries
out operations similar to those present in the primary stages of the human visual system [3]. These operators are modeled
as Gaussian derivatives and have been successfully used in several applications, including restoration [4], coding [5] and
optic flow estimation [6]. They possess symmetry properties related to translation, rotation and scaling which are ideal for
the analysis of the geometric properties of images [7, 8]. The discrete approximation of these operators, based on binomial
filters, satisfies a number of recursive relationships that can be efficiently implemented using adders and delays only.
Moreover, these discrete filters present interesting orthogonality properties. In this work we use the discrete version of
the Hermite transform (DHT). For the case of SAR imagery, an algorithm for speckle reduction, also based on the Hermite
transform, is used before classification [4]. Our approach, which is concerned with visual perception, states that
psychophysical redundancy can be removed by eliminating the transform coefficients that do not contribute to the
perceived image structure. Furthermore, we state that image structures can be 0D, 1D or 2D, depending on the
information lying within the coefficients and on certain perceptual thresholds. This paper also presents how this pattern
classification can be regularized by means of Markov Random Fields. This approach allows a better estimation of
structure parameters, especially in the case of 1D structures, where orientation is a critical parameter.
* [email protected]; phone +52-55-56223016; fax +52-55-561610173; http://verona.fi-p.unam.mx/amovip.html
2. THE DISCRETE HERMITE TRANSFORM
The Discrete Hermite Transform (DHT) maps the image data into a set of spectral coefficients from which it is possible
to recover the original image:
Let $W_k^2 = 2^{-N} C_N^k$ denote the symmetric binomial window and let
$$B_{n,k} = 2^{-N/2}\,\Delta^n \{ C_{N-n}^{k-n} \},$$
for $k = 0,\dots,N$, denote the binomial function of order $n$, for $n = 0,\dots,N$, where $C_n^k = n!/[k!\,(n-k)!]$ are the
binomial coefficients (taken as zero outside $0 \le k \le n$) and
$$\Delta^n f_k = \sum_{j=0}^{n} (-1)^{n-j}\, C_n^j\, f_{k+j}$$
is the n-th order forward difference of the discrete sequence $f_k$. Then, the DHT is obtained by convolving the input signal
with the binomial filters $D_{n,k} = W_n B_{n,N-k}$ and taking the outputs at points in a given lattice [3].
The binomial functions satisfy a number of properties that can be expressed using the matrix notation
$B = [B_{i,j}]_{i,j=0,\dots,N}$, where each row represents an analysis mask. Therefore, the outputs of the binomial filters at a
generic lattice point can be obtained by applying the matrix $D = WB$ to a vector containing the samples of the signal,
where $W = \mathrm{diag}(W_i)_{i=0,\dots,N}$ is a diagonal matrix of the window function.
The two most appealing properties of the binomial functions are the orthogonality $BB = I$ and the symmetry
$W^2 B = B^T W^2$, which lead to the following identity for the binomial filters: $D^T D = B^T W^2 B = W^2 BB = W^2$. Thus, if
$F$ is a matrix containing the data of a block of $(N + 1) \times (N + 1)$ pixels, then $G = D F D^T$ are the spectral coefficients of
the DHT. The DHT is said to be a local expansion of the image. This is expressed by $W^2 F W^2 = D^T G D$, which follows
from $D^T D = W^2$ and is the windowed block of data.
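The matrix identities of this section can be checked numerically. What follows is a minimal sketch in plain Python, not the authors' implementation. It assumes that the binomial functions are scaled by $2^{-N/2}$ and that $W$ holds the square roots of the binomial weights $2^{-N} C_N^k$, which is the scaling under which the orthogonality and symmetry properties hold to rounding error. The helper names (`binom`, `forward_difference`, `dht_matrices`, `close`) and the test block `F` are hypothetical.

```python
from math import comb, sqrt

def binom(n, k):
    """Binomial coefficient C_n^k, taken as zero outside 0 <= k <= n."""
    return comb(n, k) if 0 <= k <= n else 0

def forward_difference(f, n, k):
    """n-th order forward difference of the sequence f, evaluated at index k."""
    return sum((-1) ** (n - j) * comb(n, j) * f(k + j) for j in range(n + 1))

def binomial_matrix(N):
    """B[n][k] = 2^(-N/2) * Delta^n{C_{N-n}^{k-n}} (normalization assumed here)."""
    scale = 2.0 ** (-N / 2)
    return [[scale * forward_difference(lambda m: binom(N - n, m - n), n, k)
             for k in range(N + 1)] for n in range(N + 1)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def close(A, B, tol=1e-12):
    """Element-wise comparison of two matrices up to a tolerance."""
    return all(abs(a - b) <= tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def dht_matrices(N):
    """Return (B, W, D) with W = diag(sqrt(2^-N C_N^k)) (assumed) and D = W B."""
    B = binomial_matrix(N)
    w = [sqrt(comb(N, k) / 2 ** N) for k in range(N + 1)]
    W = [[w[i] if i == j else 0.0 for j in range(N + 1)] for i in range(N + 1)]
    return B, W, matmul(W, B)

N = 4
B, W, D = dht_matrices(N)
W2 = matmul(W, W)
# Arbitrary (N+1) x (N+1) block of data, used only for illustration.
F = [[float((3 * i + 5 * j) % 7) for j in range(N + 1)] for i in range(N + 1)]
G = matmul(matmul(D, F), transpose(D))         # analysis: G = D F D^T
windowed = matmul(matmul(transpose(D), G), D)  # synthesis: D^T G D
print(close(windowed, matmul(matmul(W2, F), W2)))
```

The synthesis step uses $D^T G D$, which equals $W^2 F W^2$ whenever $D^T D = W^2$; the original block can then be recovered wherever the window is nonzero by dividing out the window weights.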
3. LOCAL ROTATION
The rotation in the DHT domain can be seen as a mapping of the spectral coefficients (for each point in the sampling
lattice) through a linear transformation [5]. This mapping is done over the set of coefficients of the same total order by
means of the generalized binomial functions
$$B_{n,k}^{(N,\theta)} = \tan^{k+n}\theta \; \Delta^n \{ C_{N-n}^{k-n} \cos^{2k}\theta \, \sin^{N-2k}\theta \}$$
for $0 \le \theta \le \pi$. Notice that when the slant parameter $\theta$ is set to $\pi/4$ we get the symmetric case, i.e.,
$B_{n,k} = B_{n,k}^{(N,\pi/4)}$. If $g_n$, for $n = 0,\dots,2N$, denote the column vectors formed by the anti-diagonals of matrix $G$,
then the coefficients in a coordinate system that is rotated by an angle $\theta$ are given by $g_n^{(\theta)} = R^{(\theta)} g_n$, with the
rotation matrix $R^{(\theta)} = W B^{(\theta)} W^{-1}$. Since $R^{(\theta)}$ is unitary, the inverse rotation is achieved by applying the
transposed matrix $R^{(\theta)T}$. Hereafter, we refer to the rotated coefficients as the matrix $G^{(\theta)}$ whose anti-diagonals are
the vectors $g_n^{(\theta)}$. In the remainder of this work we consider the angle $\theta = \arctan(G_{0,1}/G_{1,0})$, which is nothing
but the gradient orientation.
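The rotation matrix can be built directly from the generalized binomial functions. The sketch below is a plain-Python illustration under stated assumptions, not the authors' code: it evaluates the formula as written, so it requires $\sin\theta \neq 0$ and $\cos\theta \neq 0$, and it takes $W$ as the diagonal matrix of square roots of the binomial weights $2^{-N} C_N^k$. The function names are hypothetical.

```python
from math import comb, cos, sin, tan, sqrt, pi

def binom(n, k):
    """Binomial coefficient C_n^k, taken as zero outside 0 <= k <= n."""
    return comb(n, k) if 0 <= k <= n else 0

def forward_difference(f, n, k):
    """n-th order forward difference of the sequence f, evaluated at index k."""
    return sum((-1) ** (n - j) * comb(n, j) * f(k + j) for j in range(n + 1))

def generalized_binomial(N, theta):
    """B[n][k] = tan^(k+n) * Delta^n{C_{N-n}^{k-n} cos^(2k) sin^(N-2k)}.

    Evaluated literally, so theta must keep sin and cos away from zero.
    """
    c, s, t = cos(theta), sin(theta), tan(theta)

    def term(n, m):
        b = binom(N - n, m - n)
        # Skip the trigonometric powers where the binomial coefficient vanishes.
        return b * c ** (2 * m) * s ** (N - 2 * m) if b else 0.0

    return [[t ** (k + n) * forward_difference(lambda m: term(n, m), n, k)
             for k in range(N + 1)] for n in range(N + 1)]

def rotation_matrix(N, theta):
    """R = W B(theta) W^-1 with W = diag(sqrt(2^-N C_N^k)) (assumed window)."""
    B = generalized_binomial(N, theta)
    w = [sqrt(comb(N, k) / 2 ** N) for k in range(N + 1)]
    return [[w[n] * B[n][k] / w[k] for k in range(N + 1)] for n in range(N + 1)]

# Orthogonality check for an arbitrary order and angle.
N, theta = 3, 0.3
R = rotation_matrix(N, theta)
RRt = [[sum(R[i][m] * R[j][m] for m in range(N + 1)) for j in range(N + 1)]
       for i in range(N + 1)]
print(all(abs(RRt[i][j] - (i == j)) < 1e-9
          for i in range(N + 1) for j in range(N + 1)))
```

Because the sketch evaluates negative powers of $\sin\theta$, a production version would more likely expand the equivalent polynomial form; that alternative is not shown here.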
4. PATTERN CLASSIFICATION
Since image content is not homogeneous, it is convenient to separate the space of blocks into three classes according to
the dimensionality of the underlying pattern. The dimensionality of a pattern is directly related to the number of
coefficients needed to code the block. For blocks over a constant-luminance background all the AC coefficients vanish.
Therefore, only the DC coefficient ($G_{0,0}$) is needed to code the block. For blocks containing a strongly oriented structure,
only the coefficients $G_{i,0}^{(\theta)}$, for $i = 0,\dots,N$, are required to represent that pattern (for simplicity we omit the
superscript hereafter). The third general class comprises all the non-oriented patterns such as corners, junctions and dots.
Figure 1. Construction of (a) light adaptation and (b) contrast masking thresholds.
To produce a partition of the block space we assume that each of these three classes can be well described by the
following variables:
$$L = G_{0,0}, \qquad C = \Big[ \sum_{i=0}^{N} \sum_{j=0}^{N} G_{i,j}^2 - L^2 \Big]^{1/2}, \qquad \Delta C = \Big[ C^2 - \sum_{i=1}^{N} G_{i,0}^2 \Big]^{1/2},$$
which stand for mean luminance, contrast, and 1-D residual contrast, respectively. Let the three classes be labeled as 0-D,
1-D and 2-D. The classification is done in two steps. First, the 0-D class is separated by means of the following
comparison: if $C < k_0 C_{thr}(L)$, then the block is 0-D, otherwise it is 1-D or 2-D, where $C_{thr}(L)$ is the light adaptation
threshold [9]. This threshold can be inferred from the image of Fig. 1-(a). The image presents white Gaussian noise with
mean varying linearly along the horizontal direction, while the variance varies linearly along the vertical direction. The
curve plotted over the image represents a typical detection threshold. This threshold determines the limit beyond which
local contrast becomes visually relevant. The light adaptation threshold was fitted by the following model
$$C_{thr} = C_{min} + \left( \frac{L^{\alpha} - L_{min}^{\alpha}}{L^{\alpha} + L_{min}^{\alpha}} \right)^{1/\alpha}$$
where the exponent $\alpha$ is in the interval [0,1] (typically 0.6). $C_{min}$ is the minimum contrast at the luminance level
$L_{min}$, which is the level at which the eye has maximum contrast sensitivity. Fig. 1-(a) shows the theoretical curve of
the previous equation and the measured curve along the image profile.
Fig. 2 SAR ERS2 image over México City after speckle reduction
In the second step, we separate the 1-D blocks from the remaining blocks by means of the following comparison: if
$\Delta C < k_1 \Delta C_{thr}(L, C)$, then the block is 1-D, otherwise it is 2-D, where $\Delta C_{thr}(L, C)$ is the contrast masking
threshold [10]. Ideally, for one-dimensional patterns all the energy is contained along the gradient orientation and the
residual contrast becomes zero. In practice, however, the residual contrast is nonzero even when oriented structures are
well perceived. Contrast masking refers precisely to the reduction in the visibility of one component of the image due to
the presence of another. The contrast masking model used here is based on the works of Legge and Foley [10, 11]:
$$\Delta C_{thr} = \max\left( C_{thr},\; C^{\beta} \cdot C_{thr}^{1-\beta} \right)$$
where the exponent $\beta$ is in the interval [0,1]. Note that if $\beta = 0$ no masking exists and the threshold is the constant
$C_{thr}$. If $\beta = 1$ the result is what is commonly called Weber’s law. A typical value of $\beta$ is 0.7. In Fig. 1-(b) we have
plotted a stimulus image built with a sinusoidal grating plus uncorrelated Gaussian noise. The amplitude of the sinusoid
was varied linearly along the vertical axis, while the standard deviation of the noise was varied linearly along the
horizontal axis. The figure also shows the theoretical curve and the measured values for the image profile shown to the
left.
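The two-step decision can be sketched in a few lines of plain Python. This is an illustration under stated assumptions rather than the authors' implementation: the block `G` is taken to hold the rotated coefficients, the light adaptation threshold `C_thr` is passed in as an already evaluated number, and the constants `k0` and `k1` default to 1.0 only for the sake of the example. All function names are hypothetical.

```python
from math import sqrt

def pattern_features(G):
    """Return (L, C, dC): mean luminance, contrast and 1-D residual contrast."""
    L = G[0][0]
    energy = sum(g * g for row in G for g in row)
    C = sqrt(max(energy - L * L, 0.0))            # energy of the AC coefficients
    oriented = sum(G[i][0] ** 2 for i in range(1, len(G)))
    dC = sqrt(max(C * C - oriented, 0.0))         # energy outside the G[i][0] column
    return L, C, dC

def masking_threshold(C, C_thr, beta=0.7):
    """Contrast masking threshold max(C_thr, C^beta * C_thr^(1 - beta))."""
    return max(C_thr, C ** beta * C_thr ** (1.0 - beta))

def classify_block(G, C_thr, k0=1.0, k1=1.0, beta=0.7):
    """Label a block as '0-D', '1-D' or '2-D' (k0, k1 are assumed values)."""
    L, C, dC = pattern_features(G)
    if C < k0 * C_thr:
        return "0-D"
    if dC < k1 * masking_threshold(C, C_thr, beta):
        return "1-D"
    return "2-D"

flat = [[10.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
edge = [[10.0, 0.0, 0.0], [5.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
corner = [[10.0, 3.0, 0.0], [3.0, 4.0, 0.0], [0.0, 0.0, 0.0]]
for block in (flat, edge, corner):
    print(classify_block(block, C_thr=1.0))
```

The `max(..., 0.0)` guards only protect the square roots against rounding error; both radicands are sums of squared coefficients and are non-negative in exact arithmetic.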
Fig. 3. Pattern classification at high resolution. Left: + (red) signs stand for 0D patterns, ◊ (green) signs for 1D patterns,
∆ (blue) signs for 2D patterns. Right: Local pattern orientations. Note that longest vectors correspond to 1D patterns
5. MARKOVIAN REGULARIZATION
From the results of orientation estimation at high resolution shown in Fig. 3, it is clear that the vector field estimate can
be improved by regularizing it with a probabilistic approach that takes into account neighborhood spatial relations. In
order to estimate the true vector field from the output vector field we follow an approach similar to that used for the
problem of image restoration. Let us take a Markov Random Field (MRF) model for piecewise constant image
restoration, where the goal is to recover the true pixel values f from the observed image values d. In our case, let
$d = \{d_1,\dots,d_m\}$ be an observation, i.e., a rectangular array of vector values, $f = \{f_1,\dots,f_m\}$ the set of sampled
vectors, which is assumed to be a realization of an MRF, $L = \{l_1,\dots,l_M\}$ the label set composed of vector values, and
$F$ the set of all possible labelings. According to the Hammersley-Clifford theorem and the theory of Bayes labeling of
MRFs, an optimal estimate, known as the MAP estimate, is given by:
$$f^* = \arg\max_{f \in F} P(f \mid d).$$
In MAP-MRF labeling, $P(f \mid d)$ is the a-posteriori distribution of an MRF, and the a-posteriori probability is
$P(f \mid d) \propto e^{-U(f \mid d)}$, where $U(f \mid d) = U(d \mid f) + U(f)$ is the a-posteriori energy. The MAP estimate can then be
found by minimizing the a-posteriori energy:
$$f^* = \arg\min_{f} U(f \mid d).$$
For piecewise constant surfaces we can define the clique potentials using the multi-level logistic (MLL) model, that is,
considering the clique potentials
$$V_c(f) = \begin{cases} \xi_c & \text{if all sites in } c \text{ have the same label} \\ -\xi_c & \text{otherwise.} \end{cases}$$
The statement “all sites in c have the same label”, i.e., all $\{f_i \mid i \in c\}$ are the same, implies the entire smoothness of
labels f on the clique c. Any violation of the entire smoothness incurs a penalty of the positive number $-\xi_c$, i.e.,
$\xi_c < 0$. Since the more likely configurations are those with higher P(f) or lower U(f) values, the MLL model favors
smooth f. A special case is such that $V_c$ is nonzero only for the pair-site cliques and zero for all the other types. In this
case, the a-priori energy, which is the sum of all clique potentials, is
$$U(f) = \sum_{i \in S} \sum_{i' \in N_i} v_{20} \left[ 1 - \delta(f_i - f_{i'}) \right].$$
The likelihood energy $U(d \mid f)$ can be determined using a general observation model expressed as
$d = \varphi(B(f)) \,\Theta\, e$, where B is a blurring effect, $\varphi$ is a transformation, e is the sensor noise and $\Theta$ is an
operator. These parameters are relevant for image restoration; in our case, however, we assume no blurring, no
transformation and no noise, so $U(d \mid f)$ is zero and $U(f \mid d) = U(f)$.
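As a small illustration of the pair-site a-priori energy, the sketch below counts label disagreements over a rectangular grid in plain Python. It assumes a 4-connected neighborhood system for $N_i$ and a default penalty `v20 = 1.0`; neither choice is stated in the text, and the function names are hypothetical.

```python
def neighbors(i, j, rows, cols):
    """Yield the 4-connected neighbors of site (i, j) (assumed neighborhood)."""
    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        a, b = i + di, j + dj
        if 0 <= a < rows and 0 <= b < cols:
            yield a, b

def prior_energy(f, v20=1.0):
    """U(f) = sum over sites i and neighbors i' of v20 * [1 - delta(f_i - f_i')]."""
    rows, cols = len(f), len(f[0])
    return sum(v20
               for i in range(rows) for j in range(cols)
               for a, b in neighbors(i, j, rows, cols)
               if f[a][b] != f[i][j])

smooth = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
outlier = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
print(prior_energy(smooth), prior_energy(outlier))
```

Because the double sum visits every ordered pair of neighboring sites, each disagreeing pair contributes twice, once from each side.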
We need to minimize U(f) and, since the label set $L$ is discrete, we deal with a combinatorial problem. The simplest
algorithm is steepest local energy descent. ICM (Iterated Conditional Modes) is an algorithm that uses the “greedy”
strategy in the iterative local maximization of probability. Given the data d and $f_{S - \{i\}}$, the set of labels at the sites in
$S - \{i\}$, the algorithm sequentially updates each $f_i^{(k)}$ into $f_i^{(k+1)}$ by maximizing $p(d_i \mid f_i)\, P(f_i \mid f_{N_i})$, the
conditional (posterior) probability, with respect to $f_i$. Maximizing the above is equivalent to minimizing the
corresponding posterior potential using the following rule
$$V(f_i \mid d_i, f_{N_i}) = v_{20} \sum_{i' \in N_i} \left[ 1 - \delta(f_i - f_{i'}) \right],$$
where $\sum_{i' \in N_i} [1 - \delta(f_i - f_{i'})]$ is the number of neighboring sites whose labels $f_{i'}$ differ from $f_i$. For discrete
$L$, $V(f_i \mid d_i, f_{N_i})$ is evaluated for each $f_i \in L$ and the label causing the lowest $V(f_i \mid d_i, f_{N_i})$ value is chosen as
the value for $f_i^{(k+1)}$. When applied to each site in turn, the above defines an updating cycle of ICM.
Then, for the problem of estimating the true vector field, we propose to use the same methodology with the following
considerations.
Let $d = \{d_1,\dots,d_m\}$ be an observation representing a rectangular array of vector angle values, $f = \{f_1,\dots,f_m\}$ the
set of sampled vector angles, which is assumed to be a realization of an MRF, and $L = \{l_1,\dots,l_M\}$ the label set, where
labels are vector angle values. We then minimize
$$V(f_i \mid d_i, f_{N_i}) = \sum_{i' \in N_i} \left[ 1 - \delta(f_i - f_{i'}) \right],$$
where $\sum_{i' \in N_i} [1 - \delta(f_i - f_{i'})]$ is the number of neighboring sites whose labels $f_{i'}$ differ from $f_i$.
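The updating cycle described above can be sketched as follows in plain Python. This is a hedged illustration, not the authors' code: it assumes a 4-connected neighborhood, angle labels quantized to a finite set so that the exact-equality test implied by $\delta$ is meaningful, sequential in-place updates, and a tie-breaking rule that keeps the current label. The function names and the stopping rule are hypothetical.

```python
def neighbors(i, j, rows, cols):
    """Yield the 4-connected neighbors of site (i, j) (assumed neighborhood)."""
    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        a, b = i + di, j + dj
        if 0 <= a < rows and 0 <= b < cols:
            yield a, b

def icm_sweep(f, labels):
    """One ICM updating cycle: visit each site and pick the lowest-cost label."""
    rows, cols = len(f), len(f[0])
    g = [row[:] for row in f]
    for i in range(rows):
        for j in range(cols):
            nb = [g[a][b] for a, b in neighbors(i, j, rows, cols)]
            current = g[i][j]
            # Cost = number of neighbors whose label differs; ties keep `current`.
            g[i][j] = min(labels,
                          key=lambda l: (sum(1 for x in nb if x != l), l != current))
    return g

def icm(f, labels, max_sweeps=10):
    """Repeat sweeps until the labeling stops changing (or max_sweeps is hit)."""
    for _ in range(max_sweeps):
        g = icm_sweep(f, labels)
        if g == f:
            break
        f = g
    return f

# Quantized orientations in degrees (illustrative values only).
labels = [0, 45, 90, 135]
noisy = [[0, 0, 0], [0, 90, 0], [0, 0, 0]]
print(icm(noisy, labels))
```

With the likelihood term set to zero, nothing ties the estimate to the observed field, so in practice the number of sweeps acts as the regularization strength; a small `max_sweeps` removes isolated outliers while leaving boundaries between coherent regions in place.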
Figure 5 shows a crop of the original vector field and the regularized one as a result of applying the above method.
6. CONCLUSIONS
We have developed a method for SAR image classification based on the discrete Hermite transform. This transform is a
computationally efficient model that possesses interesting mathematical and psychophysical properties. For the
algorithm to be efficient, speckle reduction should be applied before classification. 0D, 1D and 2D patterns are classified
by comparing parameters extracted from the DHT coefficients, namely mean luminance, contrast and 1D residual
contrast, with light adaptation and contrast masking thresholds. A Markov random field restoration approach is used to
estimate the true vector field of oriented structures. Results of this classification scheme show that 0D patterns correspond
to flat surfaces on the ground, such as water reservoirs and large flat areas. 1D patterns correspond to streets, large urban
constructions, and boundaries between different regions. 2D patterns include inhabited zones and irregular surfaces.
Analysis of the evolution of classes along different spatial resolutions may help classify zones accordingly: large
objects at low resolutions and fine detail at higher resolutions.
ACKNOWLEDGMENTS
This work was partially supported by PAPIIT grant IN107101. SAR ERS2 images were provided by the European Space
Agency under contract A03-318. We are also indebted to Centro GEO for providing support and means to carry out this
work.
REFERENCES
1. R. Chellappa and S. Chatterjee, “Classification of textures using Gaussian Markov random fields”, IEEE
Transactions on Acoustics, Speech and Signal Processing 33, 1985, pp. 959-963.
2. M. Unser, “Texture classification and segmentation using wavelet frames”, IEEE Transactions on Image
Processing 4, 1995, pp. 1549-1560.
3. J.-B. Martens, "The Hermite transform-theory", IEEE Transactions on Acoustics, Speech and Signal Processing,
vol. 38, no. 9, 1990, pp. 1595-1606.
4. P. Camarillo-Sandoval, A. Varela-López, B. Escalante-Ramírez "Adaptive Multiplicative-Noise Reduction in
SAR Images with Polynomial Transforms", IGARSS '98, IEEE Geoscience and Remote Sensing Society,
Proceedings, July 1998, pp. 1171-1173.
5. J. L. Silván-Cárdenas, B. Escalante-Ramírez, “Image coding with a directional-oriented discrete Hermite
transform on a hexagonal sampling lattice”, Applications of Digital Image Processing XXIV (A.G. Tescher, Ed.),
Proceedings SPIE, vol. 4472, 2001, pp. 528-536.
6. B. Escalante-Ramírez, J. L. Silván-Cárdenas, "Motion Analysis and Classification with Directional Gaussian
Derivatives in Image Sequences", Advanced Signal Processing Algorithms, Architectures, and Implementations X
(F.T. Luk, Ed), Proceedings SPIE, vol. 4116, 2000, pp. 447-453.
7. J. J. Koenderink, A. J. van Doorn, "Generic neighborhood operators", IEEE Transactions on Pattern Analysis and
Machine Intelligence, vol. 14, no. 6, 1992, pp. 597-605.
8. Z.-Q. Liu, R.M. Rangayyan, C.B. Frank, "Directional analysis of images in scale space", IEEE Transactions on
Pattern Analysis and Machine Intelligence, vol. 13, no. 11, 1991, pp. 1185-1192.
9. H. B. Barlow, Handbook of Sensory Physiology, Dark and Light Adaptation: Psychophysics, (L. Hurvich and D.
Jameson, Eds.), New York, 1972.
10. G. E. Legge and J. M. Foley, “Contrast masking in human vision,” Journal of the Optical Society of America vol.
70 (12), 1980, pp. 1458–1471.
11. G. E. Legge, “A power law for contrast discrimination,” Vision Research, vol. 21, 1981, pp. 457–467.