Statistical Modeling of Texture Sketch: Abstract. Recent Results On Sparse Coding and Independent Component
1 Introduction
As argued by Mumford (1996) and many other researchers, the problem of vision
can be posed in the framework of statistical modeling and inferential computing.
That is, top-down generative models can be constructed to represent a visual
system’s knowledge in the form of probability distributions of the observed im-
ages as well as variables that describe the visual world, then visual learning and
perception become a statistical inference (and model selection) problem that can
be solved in principle by computing the likelihood or posterior distribution. To
guide reliable inference, the generative models should be realistic, and this can
be checked by visually examining random samples generated by the models.
Recently, there has been some progress on modeling textures. Most of the
recent models involve linear filters for extracting local image features, and the
texture patterns are characterized by statistics of local features. In particular, in-
spired by the work of Heeger and Bergen (1995), Zhu, Wu, and Mumford (1997)
and Wu, Zhu, and Liu (2000) developed a self-consistent statistical theory for
texture modeling; borrowing results from statistical mechanics, they showed that
a class of Markov random field models is a natural choice under the assumption
A. Heyden et al. (Eds.): ECCV 2002, LNCS 2352, pp. 240–254, 2002.
c Springer-Verlag Berlin Heidelberg 2002
In this article, we shall isolate the problem of modeling the sketch of the texture
images, while assuming that the sketch can be obtained from the image. Our
sketch model is a causal Markov chain model whose conditional distributions
are characterized by a set of simple geometrical feature statistics automatically
selected from a pre-defined vocabulary.
Embellished versions of our model can be useful in the following regards. In
computer vision, it can be used for image segmentation and perceptual grouping.
In computer graphics, as the sketch captures the geometrical essence of images, it
may be used for non-photo realistic rendering. For understanding human vision,
it provides a model-based representational theory for Marr’s primal sketch (Marr
1982).
The essential idea of sparse coding (Olshausen and Field, 1996), independent
component analysis (Bell and Sejnowski, 1995), and their combination (Lewicki
and Olshausen, 1999) is the assumption that an image I can be represented as
the superposition of a set of image bases. The bases are selected from an over-
complete basis (vocabulary) {b(x,y,l,θ,e) }, where (x, y) is the central position of
the base on the image domain, and (l, θ, e) is the type of the base. l is the scale
or length of the base, θ the orientation, and e the indicator for even/odd bases.
The DC components of the bases are 0, and the l2 norm of each base is 1.
Therefore,
I = Σ_{(x,y,l,θ,e)} c_{(x,y,l,θ,e)} b_{(x,y,l,θ,e)} + N(0, σ²),        (1)
Since the basis is over-complete, the coefficients have to be inferred by
sampling a posterior probability. In this section, we extend the heuristic matching
pursuit algorithm of Mallat and Zhang (1993) to a more principled Markov chain
Monte Carlo (MCMC) algorithm.
We sample the posterior distribution of the coefficients p({c(x,y,l,θ,e) } | I) in
Model (1) in order to find a symbolic representation or a sketch of image I. We
assume the parameters (ρ, τ 2 , σ 2 ) are known for this model. First, let’s consider
the Gibbs sampler for posterior sampling. For simplicity, we use j or i to index
(x, y, l, θ, e) and we define zj = 1 if bj is active, i.e., cj ≠ 0, and zj = 0 otherwise.
Then the algorithm is as follows:
1. Randomly select a base bj . Compute R = I − Σ_{i≠j} ci bi , i.e., the residual
image. Let rj = ⟨R, bj⟩/(1 + σ²/τ²), and σ̃² = 1/(1/σ² + 1/τ²).
2. Compute the Bayes factor by integrating out cj ,
The problem with this Gibbs sampler is that if σ 2 is small, the algorithm
is too willing to activate the base bj even though the response rj is not that
large, or in other words, the algorithm is too willing to jump into a local energy
minimum. The idea of matching pursuit of Mallat and Zhang (1993) can be
used to remedy this problem. That is, instead of randomly selecting a base bj to
update, we randomly choose a window W on the image domain, and look at all
the inactive bases within this window W . Then we sample one base by letting
these bases compete for an entry. So we have the following windowed-Gibbs
sampler or a Metropolized matching pursuit algorithm:
For the simple model (1), if we let σ² → 0, then the Metropolis algorithm
described above reduces to the windowed version of the matching pursuit algorithm.
In this paper, we shall just use the latter algorithm for simplicity. We feel that
the MCMC version of matching pursuit is useful in two aspects. Conceptually, it
helps us to understand matching pursuit as a limiting MCMC for posterior sam-
pling. Practically, we believe that the MCMC version, especially with the moves
for updating coefficients, positions, lengths, orientations of the active bases, is
useful for us to re-estimate the image sketch after we fit a better prior model for
sketch.
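In the σ² → 0 limit, the core of the (unwindowed) matching pursuit procedure can be sketched in a few lines. The random orthonormal dictionary, the stopping threshold, and the 1-D toy signal below are illustrative assumptions, not the paper's actual even/odd base vocabulary:

```python
import numpy as np

def matching_pursuit(image, bases, n_iter=50, min_response=1e-3):
    """Greedy matching pursuit (Mallat and Zhang, 1993): repeatedly pick
    the base with the largest response to the current residual and
    subtract its contribution. Bases are the unit-norm rows of `bases`."""
    residual = image.astype(float).copy()
    sketch = []                              # list of (base index, coefficient)
    for _ in range(n_iter):
        responses = bases @ residual         # <R, b_j> for each base
        j = int(np.argmax(np.abs(responses)))
        c = float(responses[j])
        if abs(c) < min_response:            # no base responds strongly: stop
            break
        residual -= c * bases[j]
        sketch.append((j, c))
    return sketch, residual

# Toy 1-D "image" built from two bases of a random orthonormal dictionary;
# matching pursuit recovers both bases and drives the residual to zero.
rng = np.random.default_rng(0)
q, _ = np.linalg.qr(rng.standard_normal((8, 8)))
bases = q.T                                  # rows are orthonormal, unit-norm
signal = 3.0 * bases[2] + 1.5 * bases[5]
sketch, residual = matching_pursuit(signal, bases)
```

With a genuinely over-complete dictionary the responses are correlated and the greedy choice matters; the orthonormal case is only the cleanest demonstration of the subtract-and-repeat loop.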
Now we shall improve the simple prior model (1) with a more sophisticated sketch model
that accounts for the spatial arrangements of image bases.
We may use the following two representations interchangeably for the sketch
of a texture image I, and let’s denote the sketch by S.
1. A list: let n be the number of active bases, then we have
If δx,y = 0, i.e., there is no active base, then all the (c, l, θ, e) take null values.
Using the first representation, the two-level top-down model is of the follow-
ing form:
p(S | Λ) = (1/Z(Λ)) exp{ λ0 n + Σ_t λ1(s_t) + Σ_{t1 ∼ t2} λ2(s_{t1}, s_{t2}) },        (4)
where t1 ∼ t2 means that st1 and st2 are neighbors. So this model is a pair-
potential Gibbs point process model (e.g., Stoyan, Kendall, and Mecke, 1985). Guo
et al. (2001) further parameterized the model by introducing a small set of
Gestalt features (Koffka, 1935) for spatial arrangement. Again, the model can
be justified by the maximum entropy principle.
The Gibbs models (3) and (4) are analytically intractable because of the in-
tractability of the normalizing constant Z(Λ). Sampling and maximum likelihood
estimation (MLE) have to be done by MCMC algorithms.
2. Causal Markov models. One way to get around this difficulty is to use a
causal Markov model. The causal methods have been used extensively in early
work on texture generation, most notably, the causal model of Popat and Picard
(1993). In the causal model, the joint distribution of the image I is factorized
into a sequence of conditional distributions by imposing a linear order on all the
pixels (x, y), e.g.,
p(I) = Π_{x=1}^{m} Π_{y=1}^{n} p(I_{x,y} | I_{N(x,y)}),
where x, y index the pixel, and N (x, y) is the neighboring pixels within a certain
spatial distance that are scanned before (x, y). The causal model is analytically
tractable because of the factorization form.
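A minimal sketch of this causal factorization, assuming a binary image and a toy neighborhood of just the left and upper pixels (the paper's neighborhood is a larger causal window):

```python
import numpy as np

def causal_log_likelihood(image, cond_prob):
    """Log-likelihood of a binary image under a causal Markov model,
    p(I) = prod_{x,y} p(I[x,y] | causal neighbors). The neighborhood here
    is just the left and upper pixels (an illustrative choice);
    cond_prob maps (left, up) -> P(pixel = 1). Neighbors outside the
    image domain default to 0."""
    m, n = image.shape
    logp = 0.0
    for x in range(m):
        for y in range(n):
            left = int(image[x, y - 1]) if y > 0 else 0
            up = int(image[x - 1, y]) if x > 0 else 0
            p1 = cond_prob[(left, up)]
            logp += np.log(p1 if image[x, y] == 1 else 1.0 - p1)
    return logp

# Under a uniform conditional, every pixel contributes log(1/2).
img = np.array([[1, 0], [0, 1]])
uniform = {(l, u): 0.5 for l in (0, 1) for u in (0, 1)}
ll = causal_log_likelihood(img, uniform)
```

The point of the factorization is visible in the code: the likelihood is an explicit product over pixels, with no intractable normalizing constant.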
The causal scheme has been successfully incorporated in example-based texture
synthesis by Efros and Leung (1999), Liang et al. (2001), and Efros and Freeman
(2001). Hundreds of realistic textures have been synthesized.
With the above preparation, we are ready to describe our model for image sketch.
Let S be the sketch of I, and let SN (x,y) be the sketch of the causal neighborhood
of (x, y). Recall that both S and SN (x,y) have two representations. Our model
is of the following causal form
p(S) = Π_{x=1}^{m} Π_{y=1}^{n} p(s_{x,y} | S_{N(x,y)}).
where Z is the normalizing constant; lx,y and θx,y take on the null values when
δx,y = 0, in which case λ1(·) and λ2(·) are 0. If λ1(·) and λ2(·) are always 0,
then the model reduces to the simple Poisson model (1). As in the FRAME model
(Zhu, et al. 1997) and the Gestalt model (Guo, et al. 2001), this conditional
distribution can be derived or justified as the maximum entropy distribution
that reproduces the probability that there exists a linelet, the distribution of the
orientation and length of the linelet, and the pair-wise configuration made up
by this linelet and a nearby existing linelet.
In this model, the probability that we sketch a linelet at (x, y) depends on
the attributes of this linelet and, more importantly, on how this linelet lines up
with existing linelets nearby, for instance, whether the proposed linelet connects
with a nearby existing linelet, or whether the proposed linelet is parallel with a
nearby existing linelet, etc. One can envisage this conditional model as modeling
the way an artist sketches a picture by adding one stroke at a time. Similar
maximum entropy models are also used in language modeling [3].
We can also write the conditional model in a log-additive form:

log [ p(δ_{x,y} = 1, l_{x,y}, θ_{x,y} | S_{N(x,y)}) / p(δ_{x,y} = 0 | S_{N(x,y)}) ]
    = λ0 + λ1(l_{x,y}, θ_{x,y}) + Σ_{s_t ∈ S_{N(x,y)}} λ2(l_{x,y}, θ_{x,y}; l_t, θ_t; x_t − x, y_t − y).
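The log-additive form evaluates as a simple sum of potentials. The potential functions lam0, lam1, lam2 below are made-up toy choices (rewarding long linelets and parallel neighbors), not fitted values from the model:

```python
def linelet_log_odds(l, theta, neighbors, lam0, lam1, lam2):
    """Log-odds of sketching a linelet with attributes (l, theta) at a
    pixel, given the causal neighborhood: lam0 + lam1(l, theta) plus
    one lam2 term per existing linelet (l_t, theta_t, dx, dy)."""
    score = lam0 + lam1(l, theta)
    for (lt, tt, dx, dy) in neighbors:
        score += lam2(l, theta, lt, tt, dx, dy)
    return score

# Toy potentials (hypothetical, not fitted): lam1 rewards longer linelets,
# lam2 rewards parallel neighbors and penalizes others.
lam0 = -2.0
lam1 = lambda l, theta: 0.5 * l
lam2 = lambda l, th, lt, tt, dx, dy: 1.0 if abs(th - tt) < 0.1 else -1.0

# Two existing linelets nearby: one parallel (theta = 0), one perpendicular.
neighbors = [(2, 0.0, 1, 0), (2, 1.57, 0, 1)]
score = linelet_log_odds(3, 0.0, neighbors, lam0, lam1, lam2)
```

The "artist adding one stroke at a time" reading is direct: each already-drawn linelet contributes one additive vote for or against the proposed stroke.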
One may argue that a causal model for a spatial point process is very con-
trived. We agree. A causal order is physically nonsensical. But for the purpose
of representing visual knowledge, it has some conceptual advantages because of
its analytical tractability. The situation is very similar to view-based methods
for object recognition. Moreover, the model is also suitable for the purpose of
coding and compression. Mathematically, one may view this model as a causal
(or factorization) approximation to the Gestalt model of Guo et al. (2001). It
is expected that the causal approximation loses some of the expressive power of
the non-causal model, but this may be compensated by making the causal model
more redundant.
The current form of the model only accounts for pair-wise configurations of
the linelets. We can easily extend the model to account for configurations that
involve more than two linelets.
In our work, H(sx,y | SN (x,y) ) has one to two hundred components. It can be
easily shown that p(sx,y | SN (x,y) ) achieves maximum entropy under the con-
straints on ⟨H(sx,y | SN (x,y) )⟩_{p(sx,y | SN (x,y))} , where ⟨·⟩_p means expectation
with respect to distribution p. The full model is
p(S | Λ) = Π_{x=1}^{m} Π_{y=1}^{n} p(s_{x,y} | S_{N(x,y)})
         = { Π_{x=1}^{m} Π_{y=1}^{n} 1/Z(Λ | S_{N(x,y)}) } × exp{ Σ_{x=1}^{m} Σ_{y=1}^{n} ⟨Λ, H(s_{x,y} | S_{N(x,y)})⟩ }.
Now, let’s consider model fitting. Let Sobs be the observed sketch of an image.
Then we can estimate Λ by maximizing the log-likelihood
l(Λ | S^obs) = Σ_{x,y} { ⟨Λ, H(s^obs_{x,y} | S^obs_{N(x,y)})⟩ − log Z(Λ | S^obs_{N(x,y)}) }.

The first derivative is

∂l(Λ | S^obs)/∂Λ = Σ_{x,y} { H(s^obs_{x,y} | S^obs_{N(x,y)}) − ⟨H(s_{x,y} | S^obs_{N(x,y)})⟩_{p(s_{x,y} | S^obs_{N(x,y)}, Λ)} },

and the second derivative is

∂²l(Λ | S^obs)/∂Λ² = − Σ_{x,y} Var_{p(s_{x,y} | S^obs_{N(x,y)}, Λ)} [ H(s_{x,y} | S^obs_{N(x,y)}) ].
Therefore, the log-likelihood is concave, and the model can be fitted by
Newton-Raphson or, equivalently in this case, the Fisher scoring algorithm:

Λ_{t+1} = Λ_t − [ ∂²l(Λ | S^obs)/∂Λ² |_{Λ_t} ]^{-1} ∂l(Λ | S^obs)/∂Λ |_{Λ_t}.
The convergence of Newton-Raphson is very fast; usually 5 iterations already
give a very good fit.
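As a one-parameter stand-in for the vector update above, consider fitting λ0 alone for a binary variable. The gradient is the observed-minus-expected count and the Hessian is the negative variance, exactly matching the formulas above, and Newton-Raphson indeed converges to machine precision within about 5 iterations:

```python
import math

def newton_fit(h_obs, n, lam=0.0, n_iter=5):
    """Newton-Raphson for a one-parameter maximum-entropy model of a
    binary variable: l(lam) = lam*h_obs - n*log(1 + exp(lam)).
    A toy stand-in for the vector-valued update in the paper."""
    for _ in range(n_iter):
        p = math.exp(lam) / (1.0 + math.exp(lam))  # model expectation E[H]
        grad = h_obs - n * p                       # observed - expected
        hess = -n * p * (1.0 - p)                  # negative variance
        lam -= grad / hess                         # Newton update
    return lam

# With 30 of 100 sites active, the MLE matches the observed frequency:
# lam = log(0.3/0.7).
lam_hat = newton_fit(h_obs=30.0, n=100)
```

Concavity guarantees there is a single optimum, so the quadratic convergence of Newton's method is not just fast but globally reliable here.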
Both the first and second derivatives of the log-likelihood are of the form
Σ_{x,y} g(x, y). For each pixel (x, y), we need to evaluate the probabilities of all
possible sx,y . So the computation is still quite costly, although much more man-
ageable compared to MCMC type of algorithms. To further increase the effi-
ciency, we choose to sample a small number of pixels instead of going through
all of them. More specifically, for each (x, y), let πx,y ∈ [0, 1] be the probability
that pixel (x, y) will be included in the sample. Then after we collect a sample
of (x, y) by independent coin-flipping according to πx,y , we can approximate
Σ_{x,y} g(x, y) ≈ Σ_{(x,y)∈sample} g(x, y)/π_{x,y},
where the right hand side is the Horvitz-Thompson unbiased estimator of the left
hand side. As to the choice of πx,y , if there is a linelet at (x, y) on Sobs , then
we always let πx,y = 1. For other empty pixels (x, y), we can set πx,y according
to our need for speed. Usually, even if we take πx,y = .01 for empty pixels, the
algorithm can still give a satisfactory fit.
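The subsampling trick can be sketched directly from the formula; g(x, y) is abstracted to a flat list of per-pixel values, and the inclusion probabilities π are a free choice:

```python
import random

def horvitz_thompson(values, pi, seed=0):
    """Estimate sum(values) by including each term with probability pi[i]
    and reweighting included terms by 1/pi[i] (the Horvitz-Thompson
    estimator); unbiased for any pi in (0, 1]."""
    rng = random.Random(seed)
    return sum(g / p for g, p in zip(values, pi) if rng.random() < p)

# With pi = 1 everywhere the estimator is exact.
exact = horvitz_thompson([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])

# With pi = 0.5 each run is noisy, but the average over runs recovers
# the true sum (100.0 here), illustrating unbiasedness.
ests = [horvitz_thompson([1.0] * 100, [0.5] * 100, seed=s) for s in range(400)]
avg = sum(ests) / len(ests)
```

The variance grows as π shrinks, which is why pixels carrying a linelet (where g is informative) get π = 1 while empty pixels can be thinned aggressively.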
It is often the case that some components of Σ_{x,y} H(s^obs_{x,y} | S^obs_{N(x,y)}) are 0, and
if we implement the usual Newton-Raphson procedure, then the corresponding
components of Λ will go to −∞, thereby creating hard constraints. While this
is theoretically the right result, it can make the algorithm unstable. We choose
to stop fitting such components as soon as the corresponding components of
Σ_{x,y} ⟨H(s_{x,y} | S^obs_{N(x,y)})⟩_{p(s_{x,y} | S^obs_{N(x,y)})} drop below a threshold, e.g., .5.
For a specific observed image, we do not want to use all the one to two hun-
dred dimensions in our model. In fact, we can just select a small number of com-
ponents of H(sx,y | SN (x,y) ) using some model selection methods such as Akaike
information criterion (AIC). While best-set selection is too time-consuming, we
can consider a feature pursuit scheme studied by Zhu, et al. (1997), i.e., we start
with only λ0 and δx,y in our model. Then we repeatedly add one component
of H at a time, so that the added component leads to the maximum increase
in log-likelihood. Although the log-likelihood is analytically tractable for the
causal model, the computation of the increase of log-likelihood for each candi-
date component of H is still quite costly. So as an approximation, we choose the
component Hk with the largest score

g_k = { Σ_{x,y} [ H_k(s^obs_{x,y} | S^obs_{N(x,y)}) − ⟨H_k(s_{x,y} | S^obs_{N(x,y)})⟩_{p̂} ] }² / Σ_{x,y} Var_{p̂} [ H_k(s_{x,y} | S^obs_{N(x,y)}) ],

where p̂ is the currently fitted model.
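The score g_k reduces to a squared z-score of the observed-minus-expected sums. The per-pixel statistics below are hypothetical numbers, not outputs of a fitted model:

```python
def pursuit_score(h_obs, h_exp, h_var):
    """g_k: squared sum of (observed - expected) feature values over
    pixels, divided by the summed variance under the fitted model."""
    num = sum(o - e for o, e in zip(h_obs, h_exp)) ** 2
    return num / sum(h_var)

def select_feature(features):
    """features maps a candidate name to per-pixel lists
    (h_obs, h_exp, h_var); pick the component with the largest score."""
    return max(features, key=lambda k: pursuit_score(*features[k]))

# Hypothetical statistics for two candidate components over two pixels.
feats = {
    "connects": ([1, 1], [0.5, 0.5], [0.25, 0.25]),  # large discrepancy
    "parallel": ([1, 0], [0.5, 0.5], [0.25, 0.25]),  # no net discrepancy
}
best = select_feature(feats)
```

A component whose observed statistics already match the model's expectations scores near zero and is skipped, so pursuit adds only the features the current model fails to reproduce.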
4 Discussion
There are two major loose ends in our work. One is that the coefficients of
the active bases are not modeled. The other is that the model fitting is not
rigorous. The model can be further extended to incorporate more sophisticated
local structures, such as local shapes and lighting, as well as more sophisticated
organizations such as flows and graphs. The key is that these concepts should be
understood in the context of a top-down generative model. For some stochastic
textures, sparse decomposition may not be achievable, and therefore, we might
have to stay with models built on pixel values such as the FRAME model.
We would like to stress that our goal in this work is to find a top-down
generative model for textures. We are not merely trying to come up with a line-
drawing version of the texture image by some edge detector, and then synthesize
similar line-drawings. We would also like to point out that our work is inspired
by Marr’s primal sketch (Marr, 1982). Marr’s method is bottom-up and procedure-
based; that is, there does not exist a top-down model to guide the bottom-up
procedure.
Our eventual goal is to find the top-down model as a visual system’s concep-
tion of primal sketch, so that the largely bottom-up procedure will be model-
based. The hope is that the model is unified and explicit like a language, with
rich vocabularies for local image structures as well as their spatial organizations.
When fitted to a particular image, an automatic model selection procedure will
identify a low-dimensional sub-model as the most meaningful words. The model
should lie between physics-based models (that are not explicit and unified) and
References
1. A. J. Bell and T.J. Sejnowski, “An information maximization approach to blind
separation and blind deconvolution”, Neural Computation, 7(6): 1129-1159, 1995.
2. F. Bergeaud and S. Mallat, “Matching pursuit: adaptive representation of images
and sounds.” Comp. Appl. Math., 15, 97-109. 1996.
3. A. Berger, V. Della Pietra, and S. Della Pietra, “A maximum entropy approach to
natural language processing”, Computational Linguistics, vol.22, no. 1 1996.
4. E. J. Candès and D. L. Donoho, “Curvelets - A Surprisingly Effective Nonadaptive
Representation for Objects with Edges”, Curves and Surfaces, L. L. Schumaker et
al. (eds), Vanderbilt University Press, Nashville, TN.
5. A. P. Dempster, N.M. Laird, and D. B. Rubin, “Maximum likelihood from incom-
plete data via the EM algorithm”, Journal of the Royal Statistical Society series
B, 39:1-38, 1977.
6. A. A. Efros and T. Leung, “Texture synthesis by non-parametric sampling”, ICCV,
Corfu, Greece, 1999.
7. A. A. Efros and W. T. Freeman, “Image Quilting for Texture Synthesis and Trans-
fer”, SIGGRAPH 2001.
8. S. Geman and D. Geman, “Stochastic relaxation, Gibbs distributions and the
Bayesian restoration of images”, IEEE Trans. on PAMI, vol. 6, pp. 721-741, 1984.
9. C. E. Guo, S. C. Zhu, and Y. N. Wu, “Visual learning by integrating descriptive
and generative methods”, ICCV, Vancouver, CA, July, 2001.
10. D. J. Heeger and J. R. Bergen, “Pyramid-based texture analysis/synthesis”, SIG-
GRAPH, 1995.
11. M. S. Lewicki and B. A. Olshausen, “A probabilistic framework for the adaptation
and comparison of image codes”, JOSA, A. 16(7): 1587-1601, 1999.
12. L. Liang, C. Liu, Y. Xu, B.N. Guo, H.Y. Shum, “Real-Time Texture Synthesis By
Patch-Based Sampling”, MSR-TR-2001-40, March 2001.
13. J. Malik, and P. Perona, “Preattentive texture discrimination with early vision
mechanisms”, J. of Optical Society of America A, vol 7. no.5, May, 1990.
14. S. G. Mallat, “A theory for multiresolution signal decomposition: the wavelet rep-
resentation”, IEEE Trans. on PAMI, vol.11, no.7, 674-693, 1989.
15. S. Mallat and Z. Zhang, “Matching pursuit in a time-frequency dictionary”, IEEE
Trans. on Signal Processing, vol. 41, pp. 3397-3415, 1993.
16. D. Marr, Vision, W.H. Freeman and Company, 1982.
17. D. B. Mumford “The Statistical Description of Visual Signals” in ICIAM 95, edited
by K.Kirchgassner, O.Mahrenholtz and R.Mennicken, Akademie Verlag, 1996.
18. B. A. Olshausen and D. J. Field, “Emergence of simple-cell receptive field proper-
ties by learning a sparse code for natural images” Nature, 381, 607-609, 1996.
19. K. Popat and R. W. Picard, “Novel Cluster-Based Probability Model for Texture
Synthesis, Classification, and Compression.” Proc. of the SPIE Visual Comm. and
Image Proc., Boston, MA, pp. 756-768, 1993.
20. J. Portilla and E. P. Simoncelli, “A parametric texture model based on joint statis-
tics of complex wavelet coefficients”, IJCV, 40(1), 2000.
21. Y. N. Wu, S. C. Zhu, and X. Liu, “Equivalence of Julesz texture ensembles and
FRAME models”, IJCV, 38(3), 247-265, 2000.
22. S. C. Zhu, Y. N. Wu and D. B. Mumford, “Minimax entropy principle and its
application to texture modeling”, Neural Computation Vol. 9, no 8, Nov. 1997.