
OPTIMAL SELECTION OF TIME-FREQUENCY REPRESENTATIONS FOR SIGNAL CLASSIFICATION: A KERNEL-TARGET ALIGNMENT APPROACH

Paul Honeiné(1), Cédric Richard(2), Patrick Flandrin(3), Jean-Baptiste Pothin(2)

(1) Sonalyse, Pist Oasis, 131 impasse des palmiers, 30319 Alès, France
(2) ISTIT (FRE CNRS 2732), Troyes University of Technology, BP 2060, 10010 Troyes cedex, France
(3) Laboratoire de Physique (UMR CNRS 5672), École Normale Supérieure de Lyon, 46 allée d'Italie, 69364 Lyon, France

ABSTRACT
In this paper, we propose a method for selecting time-frequency distributions appropriate for given learning tasks. It is based on a criterion that has recently emerged from the machine learning literature: the kernel-target alignment. This criterion makes it possible to find the optimal representation for a given classification problem without designing the classifier itself. Some possible applications of our framework are discussed. The first provides a computationally attractive way of adjusting the free parameters of a distribution to improve classification performance. The second concerns the selection, from a set of candidates, of the distribution that best facilitates a classification task. The last addresses the problem of optimally combining several distributions.


1. INTRODUCTION
Time-frequency and time-scale distributions provide a powerful tool for analyzing nonstationary signals. They can be set up to support a wide range of tasks, depending on the user's information needs. For example, there exist classes of distributions that are relatively immune to interference and noise for analysis purposes [1, 2, 3]. There are also distributions that maximize a contrast criterion between classes to improve classification accuracy [4, 5, 6]. Over the last decade, a number of new pattern recognition methods based on reproducing kernels have been introduced. The most popular are Support Vector Machines (SVM), kernel Fisher Discriminant Analysis (kernel-FDA) and kernel Principal Component Analysis (kernel-PCA) [7]. They have gained wide popularity owing to their conceptual simplicity and their outstanding performance [8]. Despite these advances, few papers other than [9, 10] associate time-frequency analysis with kernel machines. Clearly, time-frequency analysis has not yet taken advantage of these new information extraction methods, although much effort has been devoted to developing task-oriented signal representations.

We begin this paper with a brief review of the related work [10]. We show how the most effective and innovative kernel machines can be configured, with a proper choice of reproducing kernel, to operate in the time-frequency domain. In the paper cited above, however, it was left as an open question how to objectively pick the time-frequency distributions that best facilitate the classification task at hand. An interesting solution has recently been developed within the area of machine learning through the concept of kernel-target alignment [11]. This criterion makes it possible to find the optimal reproducing kernel for a given classification problem without designing the classifier itself. In this paper, we discuss three applications of the alignment criterion to selecting time-frequency distributions that best suit a classification task. The first provides a computationally attractive way of adjusting the free parameters of a distribution. The second is related to the selection of the best distribution from a set of candidates. The last addresses the problem of optimally combining several distributions to improve classification performance.
2. BACKGROUND ON KERNEL MACHINES


In this section, we concisely review the fundamental building blocks of kernel machines, namely the definition of reproducing kernel Hilbert spaces, the kernel trick and the representer theorem. Let $\mathcal{X}$ be a subspace of $L^2(\mathbb{C})$, the space of finite-energy complex signals. A kernel is a function from $\mathcal{X} \times \mathcal{X}$ to $\mathbb{C}$ with Hermitian symmetry. The following two definitions provide the basic concept of reproducing kernels [12].

Definition 1. A kernel $\kappa(x_i, x_j)$ is said to be positive definite on $\mathcal{X}$ if the following is true:

$$\sum_{i=1}^{n} \sum_{j=1}^{n} a_i\, a_j^*\, \kappa(x_i, x_j) \ge 0, \qquad (1)$$

for all $n \in \mathbb{N}$, $x_1, \ldots, x_n \in \mathcal{X}$, and $a_1, \ldots, a_n \in \mathbb{C}$.
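As a concrete illustration of Definition 1 (our addition, not part of the original paper), the following sketch builds the Gram matrix of a simple kernel on a few random signals and checks that condition (1) holds, which is equivalent to the Gram matrix having no negative eigenvalue. The Gaussian kernel used here is only a placeholder.

```python
import numpy as np

def gaussian_kernel(xi, xj, sigma=1.0):
    # A standard positive definite kernel, used here only as a placeholder.
    return np.exp(-np.linalg.norm(xi - xj) ** 2 / (2 * sigma ** 2))

rng = np.random.default_rng(0)
signals = [rng.standard_normal(64) + 1j * rng.standard_normal(64)
           for _ in range(20)]

# Gram matrix with entries kappa(x_i, x_j); Hermitian by construction.
K = np.array([[gaussian_kernel(xi, xj) for xj in signals] for xi in signals])

# Condition (1) is equivalent to K having no negative eigenvalues.
print(np.linalg.eigvalsh(K).min() >= -1e-10)   # expected: True
```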


Definition 2. Let $(\mathcal{H}, \langle\cdot\,,\cdot\rangle_{\mathcal{H}})$ be a Hilbert space of functions from $\mathcal{X}$ to $\mathbb{C}$. The function $\kappa(x_i, x_j)$ from $\mathcal{X} \times \mathcal{X}$ to $\mathbb{C}$ is the reproducing kernel of $\mathcal{H}$ if, and only if,
the function $\kappa_{x_i}: x_j \mapsto \kappa_{x_i}(x_j) = \kappa(x_i, x_j)$ is in $\mathcal{H}$, for all $x_i \in \mathcal{X}$;
$\psi(x_i) = \langle \psi(\cdot), \kappa_{x_i}(\cdot) \rangle_{\mathcal{H}}$, for all $x_i \in \mathcal{X}$ and $\psi \in \mathcal{H}$.

It can be shown that every positive definite kernel is the reproducing kernel of a unique Hilbert space of functions from $\mathcal{X}$ to $\mathbb{C}$, called a reproducing kernel Hilbert space. Reciprocally, every reproducing kernel is a positive definite kernel. A proof of this may be found in [12]. The second point of Definition 2 yields a fundamental property of reproducing kernel Hilbert spaces. Replacing $\psi(\cdot)$ by $\kappa_{x_j}(\cdot)$, we obtain

$$\kappa(x_j, x_i) = \langle \kappa_{x_j}(\cdot), \kappa_{x_i}(\cdot) \rangle_{\mathcal{H}} \qquad (2)$$

for all $x_i, x_j \in \mathcal{X}$, which is the origin of the now generic term reproducing kernel used to refer to $\kappa$. Denoting by $\phi(\cdot)$ the map that assigns the kernel function $\kappa(x, \cdot)$ to each $x$, equation (2) implies that $\kappa(x_j, x_i) = \langle \phi(x_j), \phi(x_i) \rangle_{\mathcal{H}}$. The kernel thus evaluates the inner product of any pair of elements of $\mathcal{X}$ mapped to $\mathcal{H}$ without any explicit knowledge of $\phi(\cdot)$. This key idea is known as the kernel trick because it can be used to transform linear algorithms expressed only in terms of inner products into nonlinear ones.

The representer theorem [13], like the kernel trick, is a quintessential building block of kernel machines. Consider a training set $\mathcal{A}_n$ consisting of $n$ input-output pairs $(x_i, y_i)$. This theorem states that any function $\psi(\cdot)$ of $\mathcal{H}$ minimizing a regularized cost function of the form

$$J((x_1, y_1, \psi(x_1)), \ldots, (x_n, y_n, \psi(x_n))) + g(\|\psi\|^2_{\mathcal{H}}), \qquad (3)$$

with $g(\cdot)$ a monotonically increasing function on $\mathbb{R}^+$, can be expressed as a kernel expansion in terms of the available data:

$$\psi(x) = \sum_{i=1}^{n} a_i\, \kappa(x, x_i). \qquad (4)$$

Applications of this theorem include SVM, kernel-PCA and kernel-FDA [7]. In the next section, we show how kernel machines can be configured, with a proper choice of reproducing kernel, to operate in the time-frequency domain.
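To make the representer theorem tangible, here is a minimal sketch (our addition) of kernel ridge regression, one particular instance of the cost (3) obtained with a squared loss and $g(u) = \lambda u$. Under these assumptions the coefficients of the expansion (4) have a closed form; the paper's own classifiers (SVM, kernel-FDA) minimize different costs but admit the same expansion.

```python
import numpy as np

def fit_kernel_ridge(K, y, lam=1e-2):
    # Minimizes sum_i (y_i - psi(x_i))^2 + lam * ||psi||_H^2. By the
    # representer theorem, psi(x) = sum_i a_i kappa(x, x_i), and the
    # minimizer satisfies (K + lam * I) a = y.
    n = K.shape[0]
    return np.linalg.solve(K + lam * np.eye(n), y)

def predict(a, k_x):
    # Evaluates the kernel expansion (4), where k_x[i] = kappa(x, x_i).
    return k_x @ a
```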
3. TIME-FREQUENCY REPRODUCING KERNELS

For reasons of conciseness, we restrict ourselves to the Cohen class of time-frequency distributions. They can be defined as

$$C_x(t, f) = \iint \Pi(\nu, \tau)\, A_x(\nu, \tau)\, e^{2j\pi(f\tau + \nu t)}\, d\nu\, d\tau, \qquad (5)$$

where $A_x(\nu, \tau)$ denotes the narrow-band ambiguity function of $x$, and $\Pi(\nu, \tau)$ is a parameter function. Conventional pattern recognition algorithms applied directly to time-frequency representations consist of estimating $\Psi(t, f)$ in the statistic

$$\psi(x) = \langle \Psi, C_x \rangle = \iint \Psi(t, f)\, C_x(t, f)\, dt\, df \qquad (6)$$

so as to optimize a criterion of the general form (3). Examples of cost functions include the maximum output variance for PCA, the maximum margin for SVM, and the maximum Fisher criterion for FDA. It is apparent that this direct approach is computationally demanding because the size of $C_x$ grows quadratically with the length of the input signal $x$. Faced with such prohibitive computational costs, an attractive alternative is to make use of the kernel trick and the representer theorem, if possible, with the following kernel:

$$\kappa(x_i, x_j) = \langle C_{x_i}, C_{x_j} \rangle. \qquad (7)$$

Writing condition (1) as $\|\sum_i a_i C_{x_i}\|^2 \ge 0$, which is indeed satisfied, we verify that $\kappa$ is a positive definite kernel. We denote by $\mathcal{H}_\kappa$ the unique reproducing kernel Hilbert space associated with $\kappa$. This argument shows that (7) can be associated with any kernel machine reported in the literature to perform pattern recognition in the time-frequency domain. Thanks to the representer theorem, the solution $\psi(x)$ admits a time-frequency interpretation, $\psi(x) = \langle \Psi, C_x \rangle$, with

$$\Psi = \sum_{i=1}^{n} a_i\, C_{x_i}. \qquad (8)$$

This equation is obtained by combining (4) and (6). The question of how to select $C_x$ is still open. The next section provides some elements of an answer in a binary classification framework.
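The kernel (7) is straightforward to compute once the distributions are available. The sketch below (our addition) instantiates it with the spectrogram, one member of the Cohen class, using scipy; the window length nperseg is a free parameter whose tuning is precisely the subject of Section 5.1.

```python
import numpy as np
from scipy.signal import spectrogram

def tf_kernel(xi, xj, nperseg=32):
    # Kernel (7): kappa(x_i, x_j) = <C_{x_i}, C_{x_j}>, with C_x taken
    # here to be the spectrogram, a member of the Cohen class.
    _, _, Si = spectrogram(xi, nperseg=nperseg)
    _, _, Sj = spectrogram(xj, nperseg=nperseg)
    return float(np.sum(Si * Sj))

def gram_matrix(signals, nperseg=32):
    # Gram matrix fed to any kernel machine (SVM, kernel-FDA, kernel-PCA).
    n = len(signals)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(i, n):
            K[i, j] = K[j, i] = tf_kernel(signals[i], signals[j], nperseg)
    return K
```

In practice one would compute each spectrogram once and cache it, since the kernel is evaluated $O(n^2)$ times.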

4. KERNEL-TARGET ALIGNMENT

The alignment criterion is a measure of similarity between two reproducing kernels, or between a reproducing kernel and a target function [11]. Given a training set $\mathcal{A}_n$, the alignment of kernels $\kappa_1$ and $\kappa_2$ is defined as follows:

$$A(\kappa_1, \kappa_2; \mathcal{A}_n) = \frac{\langle K_1, K_2 \rangle_F}{\sqrt{\langle K_1, K_1 \rangle_F\, \langle K_2, K_2 \rangle_F}}, \qquad (9)$$

where $\langle\cdot\,,\cdot\rangle_F$ is the Frobenius inner product between two matrices, and $K_1$ and $K_2$ are the Gram matrices with respective entries $\kappa_1(x_i, x_j)$ and $\kappa_2(x_i, x_j)$, for all $i, j \in \{1, \ldots, n\}$. The alignment is then simply the correlation coefficient between the Gram matrices $K_1$ and $K_2$ viewed as bidimensional vectors.

For binary classification purposes, the decision statistic should satisfy $\psi(x_i) = y_i$, where $y_i$ is the class label of $x_i$. By setting $y_i = \pm 1$, the ideal Gram matrix would be given by

$$K^*(i, j) = \langle \phi(x_i), \phi(x_j) \rangle = \begin{cases} 1 & \text{if } y_i = y_j \\ -1 & \text{if } y_i \ne y_j, \end{cases} \qquad (10)$$

in which case $\sqrt{\langle K^*, K^* \rangle_F} = n$. In [11], Cristianini et al. propose maximizing the alignment with the target $K^*$ in order to determine the most relevant reproducing kernel for a given classification task. The ease with which this criterion can be estimated using only training data, prior to any computationally intensive training, makes it an interesting tool for kernel selection. Its relevance is supported by the existing connection between the alignment score and the generalization performance of the resulting classifier. This has motivated various computational methods of optimizing kernel alignment, including metric learning [14], eigendecomposition of the Gram matrix [11, 15] and linear combination of kernels [16, 17].
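Equations (9) and (10) translate directly into a few lines of code. The sketch below (our addition) computes the alignment of real-valued Gram matrices and the ideal target built from ±1 labels.

```python
import numpy as np

def alignment(K1, K2):
    # Eq. (9): correlation coefficient between two Gram matrices under
    # the Frobenius inner product.
    return np.sum(K1 * K2) / np.sqrt(np.sum(K1 * K1) * np.sum(K2 * K2))

def target_gram(y):
    # Eq. (10): K*(i, j) = +1 if y_i == y_j, -1 otherwise, i.e. y y^T
    # for labels y_i in {-1, +1}; note <K*, K*>_F = n^2.
    y = np.asarray(y, dtype=float)
    return np.outer(y, y)
```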

We will focus on the latter of these approaches, which considers the kernel expansion

$$\kappa(x_i, x_j) = \sum_{k=1}^{m} \lambda_k\, \kappa_k(x_i, x_j) \qquad (11)$$

and studies the problem of choosing the $\lambda_k$'s to maximize the kernel-target alignment. A positivity constraint is imposed on these coefficients to ensure the positive definiteness of $\kappa$. Algorithms of varying efficiency have been proposed in the literature. In [16], it has been shown that a concise analytical solution exists in the $m = 2$ case:

$$(\lambda_1, \lambda_2) = \begin{cases} (\hat\lambda_1, \hat\lambda_2) & \text{if } \hat\lambda_1, \hat\lambda_2 > 0 \\ (1, 0) & \text{if } \hat\lambda_2 \le 0 \\ (0, 1) & \text{if } \hat\lambda_1 \le 0, \end{cases} \qquad (12)$$

with

$$\hat\lambda_1 = \frac{\langle K_1, K^* \rangle_F - \hat\lambda_2\, \langle K_1, K_2 \rangle_F}{\|K_1\|^2_F + \gamma}$$

$$\hat\lambda_2 = \frac{(\|K_1\|^2_F + \gamma)\, \langle K_2, K^* \rangle_F - \langle K_1, K_2 \rangle_F\, \langle K_1, K^* \rangle_F}{(\|K_1\|^2_F + \gamma)(\|K_2\|^2_F + \gamma) - \langle K_1, K_2 \rangle^2_F},$$

where $\gamma \ge 0$ arises from a regularization constraint penalizing $\|\lambda\|^2$. To combine more than two kernels, we opted for a branch and bound approach. It starts from the best available kernel, and selects from the remaining kernels the one that most increases the alignment criterion. This procedure is iterated until no improving candidate can be found.

Fig. 1. Adjustment of the window size of a spectrogram using the kernel-target alignment, compared with the error rate of an SVM classifier (alignment and error rate versus window length).

5. TIME-FREQUENCY FORMULATION

By placing time-frequency based classification within the larger framework of kernel machines, we can take advantage of the concepts and tools developed above. In this section, we focus on selecting time-frequency distributions appropriate for binary classification tasks. That is, we consider the maximization problem

$$\kappa^* = \arg\max_{\kappa} \frac{\langle K_\kappa, K^* \rangle_F}{n \sqrt{\langle K_\kappa, K_\kappa \rangle_F}}, \qquad (13)$$

where $K_\kappa$ is the Gram matrix associated with the time-frequency kernel $\kappa(x_i, x_j) = \langle C_{x_i}, C_{x_j} \rangle$. We also discuss how to improve performance by optimally combining several time-frequency distributions.

Before proceeding, note that the experiments were run on 64-sample data generated according to the hypothesis test

$$\begin{aligned} \mathcal{H}_0 &: x(t) = w_0(t) \\ \mathcal{H}_1 &: x(t) = w_1(t) + e^{2j\pi[\phi(t) + \phi_0]}, \end{aligned} \qquad (14)$$

where $\phi(t)$ is a quadratic phase modulation and $\phi_0$ the initial phase. The noises $w_0(t)$ and $w_1(t)$ are zero-mean, Gaussian and white, with variances $\sigma_0^2$ and $\sigma_1^2$, respectively. Both variances were fixed to 2.25 for the first two experiments, and $\phi_0$ was considered a random variable uniformly distributed over $[0, 2\pi[$. In the third experiment, $\sigma_0^2$ and $\sigma_1^2$ were set to 9 and 4, respectively, and $\phi_0$ was fixed to 0. For each experiment, a training set $\mathcal{A}_{200}$ of size 200 was generated with equal priors. A test set $\mathcal{T}_{1000}$ of 1000 examples was also created to estimate the generalization performance of kernel-optimal SVM classifiers trained on $\mathcal{A}_{200}$.

5.1. Parameter setting

The first illustration deals with the parameter setting of time-frequency distributions. Without any loss of generality, we address the problem of adjusting the window size of a spectrogram $S_x$ with a view to maximizing classification accuracy. The reproducing kernel is then defined as

$$\kappa_{sp}(x_i, x_j) = \langle S_{x_i}, S_{x_j} \rangle. \qquad (15)$$

Figure 1 shows, as a function of the window size, the kernel-target alignment of $\kappa_{sp}$ over the training set $\mathcal{A}_{200}$. It also includes the error rate of an SVM classifier trained and tested on $\mathcal{A}_{200}$ and $\mathcal{T}_{1000}$, respectively. We note that the maximum alignment is obtained with a window size of 27, and coincides with the lowest error rate. This shows that with a high alignment on the training set, we can expect good generalization performance from a kernel-based classifier.
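The experiment of Figure 1 can be reproduced in outline with the helpers sketched earlier (gram_matrix, alignment, target_gram). In the sketch below (our addition), the exact quadratic phase law $\phi(t)$ and chirp parameters are illustrative guesses, since the paper does not specify them, and a real-valued cosine stands in for the complex exponential; only the 64-sample length, the variances, and the random initial phase follow the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_example(label, n=64, var0=2.25, var1=2.25):
    # Draws one 64-sample signal from H0 or H1 of Eq. (14). The chirp
    # rate of the quadratic phase law phi(t) is an illustrative guess.
    t = np.arange(n) / n
    if label < 0:  # H0: white Gaussian noise only
        return np.sqrt(var0) * rng.standard_normal(n)
    phi0 = rng.uniform(0.0, 2.0 * np.pi)  # random initial phase
    chirp = np.cos(2.0 * np.pi * (5.0 * t + 10.0 * t ** 2) + phi0)
    return np.sqrt(var1) * rng.standard_normal(n) + chirp

y = np.array([1] * 100 + [-1] * 100)          # equal priors, n = 200
X = [make_example(label) for label in y]
Kstar = target_gram(y)

# Sweep the spectrogram window length and keep the best-aligned one.
best = max(range(8, 64, 4),
           key=lambda w: alignment(gram_matrix(X, nperseg=w), Kstar))
print("window length maximizing the alignment:", best)
```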
Fig. 2. Alignment and error rate for different kernels.

Fig. 3. Smoothed pseudo-Wigner (left), Wigner (middle), and composite distribution associated with the kernel $\kappa_{spwv} + 0.208\, \kappa_{wv}$ (right). Here these distributions are applied to the signal to be detected.

5.2. Selection of a distribution

The second illustration is concerned with the selection of a distribution from a set of candidates. The latter consists of the following distributions: Wigner ($\kappa_{wv}$), smoothed pseudo-Wigner ($\kappa_{spwv}$), Margenau-Hill ($\kappa_{mh}$), Choi-Williams ($\kappa_{cw}$), Born-Jordan ($\kappa_{bj}$), reduced interference with Hanning window ($\kappa_{ridh}$), and spectrogram ($\kappa_{sp}$). Figure 2 shows the performance averaged over 50 independent realizations of the training and test sets. It plots the alignment of the above-mentioned kernels over $\mathcal{A}_{200}$ versus the error rate of an SVM classifier trained and tested on $\mathcal{A}_{200}$ and $\mathcal{T}_{1000}$, respectively. The apparent relationship between these two criteria emphasizes once more the relevance of the kernel-target alignment.
5.3. Combination of distributions

The last illustration focuses on the combination of time-frequency distributions to achieve improvements in classification performance. This problem was addressed with the kernel combination procedure (11)-(12), applied to the above-described set of candidate distributions. Kernels $\kappa_{spwv}$ and $\kappa_{wv}$ were successively selected. The kernel-target alignment increased from 0.1039 to 0.1076, while the error rate of the SVM classifier decreased from 4.7% to 3.2%. Figure 3 presents the composite time-frequency distribution, applied here to the signal to be detected.

Another experiment was carried out by adding the short-time Fourier transform to the above-mentioned set of quadratic distributions. Note that $\kappa_{stft}(x_i, x_j) = \langle x_i, x_j \rangle$ for a normalized window. Kernels $\kappa_{stft}$ and $\kappa_{spwv}$ were successively chosen, for a final alignment of 0.1698 and an error rate of 2.7%. This result is consistent with statistical decision theory, since the log-likelihood ratio for the detection problem under consideration involves both linear and quadratic components of the observation.
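The iterative selection procedure used in this section and described at the end of Section 4 can be sketched as follows (our addition, reusing alignment and combine_two_kernels from the earlier sketches): start from the best-aligned Gram matrix, then repeatedly fold in the candidate that most increases the alignment, stopping when no candidate improves it.

```python
def greedy_combination(kernels, Kstar):
    # kernels: dict mapping a name (e.g. 'spwv', 'wv') to its Gram matrix.
    remaining = dict(kernels)
    name = max(remaining, key=lambda k: alignment(remaining[k], Kstar))
    K, chosen = remaining.pop(name), [name]
    while remaining:
        # Pair each candidate optimally with the current composite
        # kernel via the analytical solution (12).
        scores = {c: combine_two_kernels(K, Kc, Kstar)
                  for c, Kc in remaining.items()}
        cand = max(scores,
                   key=lambda c: alignment(scores[c][0] * K
                                           + scores[c][1] * remaining[c],
                                           Kstar))
        l1, l2 = scores[cand]
        if alignment(l1 * K + l2 * remaining[cand], Kstar) <= alignment(K, Kstar):
            break  # no improving candidate left
        K = l1 * K + l2 * remaining.pop(cand)
        chosen.append(cand)
    return K, chosen
```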
6. CONCLUSION
In this paper, we showed that specific reproducing kernels allow any kernel machine to operate on time-frequency representations. We also proposed a method, based on the kernel-target alignment, for selecting or combining time-frequency distributions to achieve improvements in classification performance. All these links offer new perspectives in the field of nonstationary signal analysis, since they provide access to the most recent methodological and theoretical developments of pattern recognition and statistical learning theory.
7. REFERENCES
[1] F. Auger and P. Flandrin, "Improving the readability of time-frequency and time-scale representations by reassignment methods," IEEE Transactions on Signal Processing, vol. 43, no. 5, pp. 1068–1089, 1995.
[2] D. Jones and R. Baraniuk, "An adaptive optimal-kernel time-frequency representation," IEEE Transactions on Signal Processing, vol. 43, no. 10, pp. 2361–2371, 1995.
[3] J. Gosme, C. Richard, and P. Gonçalvès, "Adaptive diffusion of time-frequency and time-scale representations: a review," IEEE Transactions on Signal Processing, vol. 53, no. 11, 2005.
[4] L. Atlas, J. Droppo, and J. McLaughlin, "Optimizing time-frequency distributions for automatic classification," in Proc. SPIE, vol. 3162, 1997, pp. 161–171.
[5] C. Heitz, "Optimum time-frequency representations for the classification and detection of signals," Applied Signal Processing, vol. 3, pp. 124–143, 1995.
[6] M. Davy, C. Doncarli, and G. Boudreaux-Bartels, "Improved optimization of time-frequency based signal classifiers," IEEE Signal Processing Letters, vol. 8, no. 2, pp. 52–57, 2001.
[7] K. Müller, S. Mika, G. Rätsch, K. Tsuda, and B. Schölkopf, "An introduction to kernel-based learning algorithms," IEEE Transactions on Neural Networks, vol. 12, no. 2, pp. 181–201, 2001.
[8] V. Vapnik, The Nature of Statistical Learning Theory. New York: Springer, 1995.
[9] M. Davy, A. Gretton, A. Doucet, and P. Rayner, "Optimised support vector machines for nonstationary signal classification," IEEE Signal Processing Letters, vol. 9, no. 12, pp. 442–445, 2002.
[10] P. Honeiné, C. Richard, and P. Flandrin, "Reconnaissance des formes par méthodes à noyau dans le plan temps-fréquence," in Proc. Colloque GRETSI, Louvain-la-Neuve, Belgium, 2005, pp. 969–972.
[11] N. Cristianini, J. Shawe-Taylor, A. Elisseeff, and J. Kandola, "On kernel-target alignment," in Advances in Neural Information Processing Systems 14. MIT Press, 2002.
[12] N. Aronszajn, "Theory of reproducing kernels," Transactions of the American Mathematical Society, vol. 68, pp. 337–404, 1950.
[13] B. Schölkopf, R. Herbrich, and R. Williamson, "A generalized representer theorem," NeuroCOLT, Royal Holloway College, University of London, UK, Tech. Rep. NC2-TR-2000-81, 2000.
[14] G. Wu, E. Y. Chang, and N. Panda, "Formulating distance functions via the kernel trick," in Proc. 11th ACM International Conference on Knowledge Discovery in Data Mining, 2005, pp. 703–709.
[15] J. Kandola, J. Shawe-Taylor, and N. Cristianini, "On the extensions of kernel alignment," Dept. Comput. Sci., University of London, Tech. Rep. 120, 2002.
[16] J.-B. Pothin and C. Richard, "Kernel machines : une nouvelle méthode pour l'optimisation de l'alignement des noyaux et l'amélioration des performances," in Proc. Colloque GRETSI, Louvain-la-Neuve, Belgium, 2005, pp. 1133–1136.
[17] J. Kandola, J. Shawe-Taylor, and N. Cristianini, "Optimizing kernel alignment over combinations of kernels," Dept. Comput. Sci., University of London, Tech. Rep. 121, 2002.
