Paul Honeiné, Cédric Richard, Patrick Flandrin, Jean-Baptiste Pothin
Sonalyse, Pist Oasis, 131 impasse des palmiers, 30319 Alès, France
ISTIT (FRE CNRS 2732), Troyes University of Technology, BP 2060, 10010 Troyes cedex, France
Laboratoire de Physique (UMR CNRS 5672), École Normale Supérieure de Lyon, 46 allée d'Italie, 69364 Lyon, France
ABSTRACT
In this paper, we propose a method for selecting time-frequency distributions appropriate for a given learning task. It is based on a criterion that has recently emerged from the machine learning literature: the kernel-target alignment. This criterion makes it possible to find the optimal representation for a given classification problem without designing the classifier itself. Three applications of our framework are discussed. The first provides a computationally attractive way of adjusting the free parameters of a distribution to improve classification performance. The second is related to the selection, from a set of candidates, of the distribution that best facilitates a classification task. The last addresses the problem of optimally combining several distributions.
1. INTRODUCTION
Time-frequency and time-scale distributions provide a powerful tool for analyzing nonstationary signals. They can be set up to support a wide range of tasks, depending on the user's information needs. As an example, there exist classes of distributions that are relatively immune to interference and noise for analysis purposes [1, 2, 3]. There are also distributions that maximize a contrast criterion between classes to improve classification accuracy [4, 5, 6]. Over the last decade, a number of new pattern recognition methods based on reproducing kernels have been introduced. The most popular ones are Support Vector Machines (SVM), kernel Fisher Discriminant Analysis (kernel-FDA) and kernel Principal Component Analysis (kernel-PCA) [7]. They have gained wide popularity due to their conceptual simplicity and their outstanding performance [8]. Despite these advances, there are few papers other than [9, 10] associating time-frequency analysis with kernel machines. Clearly, time-frequency analysis has not yet taken advantage of these new information extraction methods, although much effort has been devoted to developing task-oriented signal representations.
We begin this paper with a brief review of the related work [10], showing how the most effective and innovative kernel machines can be configured, with a proper choice of reproducing kernel, to operate in the time-frequency domain. That paper, however, left open the question of how to objectively pick the time-frequency distributions that best facilitate the classification task at hand. An interesting solution has recently been developed within the area of machine learning through the concept of kernel-target alignment [11]. This criterion makes it possible to find the optimal reproducing kernel for a given classification problem without designing the classifier itself. In this paper, we discuss three applications of the alignment criterion to select time-frequency distributions that best suit a classification task. The first provides a computationally attractive way of adjusting the free parameters of a distribution. The second is related to the selection of the best distribution from a set of candidate distributions. The last addresses the problem of optimally combining several distributions to achieve improvements in classification performance.
A reproducing kernel is a symmetric function $\kappa$ satisfying, for every set of signals $x_1, \ldots, x_n$ and coefficients $a_1, \ldots, a_n$ [12],

$$\sum_{i=1}^{n}\sum_{j=1}^{n} a_i\, a_j\, \kappa(x_i, x_j) \geq 0. \qquad (1)$$

By the representer theorem [13], the optimal decision statistic of a kernel machine admits the expansion

$$\psi^*(x) = \sum_{i=1}^{n} a_i\, \kappa(x, x_i). \qquad (4)$$

Recall that Cohen's class of time-frequency distributions is defined by

$$C_x(t, f) = \iint \phi(\nu, \tau)\, A_x(\nu, \tau)\, e^{-2j\pi(f\tau + t\nu)}\, d\nu\, d\tau, \qquad (5)$$

where $A_x(\nu, \tau)$ denotes the narrow-band ambiguity function of $x$, and $\phi(\nu, \tau)$ is a parameter function. Conventional pattern recognition algorithms applied directly to time-frequency representations consist of estimating $\eta(t, f)$ in the statistic

$$\Phi(x) = \langle \eta, C_x \rangle = \iint \eta(t, f)\, C_x(t, f)\, dt\, df \qquad (6)$$

so as to optimize a criterion of the general form (3). Examples of cost functions include the maximum output variance for PCA, the maximum margin for SVM, and the maximum Fisher criterion for FDA. It is apparent that this direct approach is computationally demanding, since it operates on representations whose dimension far exceeds the signal length. This motivates the choice of the reproducing kernel

$$\kappa(x_i, x_j) = \langle C_{x_i}, C_{x_j} \rangle, \qquad (7)$$

which allows kernel machines to operate implicitly in the time-frequency domain. Combining (4) and (6) then yields

$$\eta^*(t, f) = \sum_{i=1}^{n} a_i\, C_{x_i}(t, f). \qquad (8)$$

The question of how to select $C_x$ is still open. The next section brings some elements of an answer in a binary classification framework.
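To make the kernel (7) concrete, the following minimal Python sketch computes a Gram matrix of inner products between time-frequency representations. A spectrogram stands in for a generic Cohen-class distribution, and the signal matrix X is a random placeholder; none of these identifiers come from the paper.

```python
import numpy as np
from scipy.signal import spectrogram

def tfd_gram(X, nperseg=64):
    """Gram matrix with entries K[i, j] = <C_{x_i}, C_{x_j}>, as in (7).

    A spectrogram stands in here for a generic Cohen-class distribution
    C_x; any other distribution could be substituted.
    """
    # Flatten each time-frequency representation into a vector so that
    # the inner product of (7) becomes an ordinary dot product.
    C = np.asarray([spectrogram(x, nperseg=nperseg)[2].ravel() for x in X])
    return C @ C.T

# Placeholder data: 20 signals of 256 samples each.
X = np.random.randn(20, 256)
K = tfd_gram(X)
```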
4. KERNEL-TARGET ALIGNMENT
The alignment criterion is a measure of similarity between
two reproducing kernels, or between a reproducing kernel and
a target function [11]. Given a training set $A_n$, the alignment of kernels $\kappa_1$ and $\kappa_2$ is defined as follows

$$A(\kappa_1, \kappa_2; A_n) = \frac{\langle K_1, K_2 \rangle_F}{\sqrt{\langle K_1, K_1 \rangle_F\, \langle K_2, K_2 \rangle_F}}, \qquad (9)$$

where $\langle \cdot, \cdot \rangle_F$ is the Frobenius inner product between two matrices, and $K_1$ and $K_2$ are the Gram matrices with respective entries $\kappa_1(x_i, x_j)$ and $\kappa_2(x_i, x_j)$, for all $i, j \in \{1, \ldots, n\}$. The alignment is thus simply the correlation coefficient between the Gram matrices $K_1$ and $K_2$ viewed as vectors.
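As a companion to (9), here is a minimal numpy sketch of the alignment computation; the function names are ours, not the paper's.

```python
import numpy as np

def alignment(K1, K2):
    """Kernel alignment (9): correlation between Gram matrices."""
    num = np.sum(K1 * K2)  # Frobenius inner product <K1, K2>_F
    return num / np.sqrt(np.sum(K1 * K1) * np.sum(K2 * K2))

def target_alignment(K, y):
    """Alignment with the ideal target K* = y y^T of (10), y in {-1, +1}^n."""
    return alignment(K, np.outer(y, y))
```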
For binary classification purposes, the decision statistic should satisfy $\psi(x_i) = y_i$, where $y_i$ is the class label of $x_i$. By setting $y_i = \pm 1$, the ideal Gram matrix would be given by

$$K^*(i, j) = \langle \phi(x_i), \phi(x_j) \rangle = \begin{cases} +1 & \text{if } y_i = y_j \\ -1 & \text{if } y_i \neq y_j, \end{cases} \qquad (10)$$

in which case $\sqrt{\langle K^*, K^* \rangle_F} = n$. In [11], Cristianini et al. propose maximizing the alignment with the target $K^*$ in order to determine the most relevant reproducing kernel for a given classification task. The ease with which this criterion can be estimated using only training data, prior to any computationally intensive training, makes it an interesting tool for kernel selection. Its relevance is supported by the existing connection between the alignment score and the generalization performance of the resulting classifier. This has motivated various computational methods for optimizing the kernel
alignment, including metric learning [14], eigendecomposition of the Gram matrix [11, 15], and linear combination of kernels [16, 17]. We focus on the last of these approaches, which considers the kernel expansion

$$\kappa(x_i, x_j) = \sum_{k=1}^{m} \lambda_k\, \kappa_k(x_i, x_j). \qquad (11)$$
For two kernels ($m = 2$), maximizing the alignment of $K = \lambda_1 K_1 + \lambda_2 K_2$ with the target $K^*$ leads to the weights

$$(\lambda_1^*, \lambda_2^*) = \begin{cases} (\hat\lambda_1, \hat\lambda_2) & \text{if } \hat\lambda_1, \hat\lambda_2 > 0 \\ (1, 0) & \text{if } \hat\lambda_2 \leq 0 \\ (0, 1) & \text{if } \hat\lambda_1 \leq 0, \end{cases} \qquad (12)$$

with

$$\hat\lambda_1 = \frac{1}{\Delta}\Big[ (\|K_2\|_F^2 + \mu)\, \langle K_1, K^* \rangle_F - \langle K_1, K_2 \rangle_F\, \langle K_2, K^* \rangle_F \Big], \qquad (13)$$

$$\hat\lambda_2 = \frac{1}{\Delta}\Big[ (\|K_1\|_F^2 + \mu)\, \langle K_2, K^* \rangle_F - \langle K_1, K_2 \rangle_F\, \langle K_1, K^* \rangle_F \Big], \qquad (14)$$
where $\Delta = (\|K_1\|_F^2 + \mu)(\|K_2\|_F^2 + \mu) - \langle K_1, K_2 \rangle_F^2$, and $\mu \geq 0$ arises from a regularization constraint penalizing $\|\lambda\|^2$. To combine more than two kernels, we opted for a branch-and-bound approach: it starts from the best available kernel and selects, from the remaining kernels, the one that most increases the alignment criterion. This procedure is iterated until no improving candidate can be found.
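A sketch of this combination scheme follows, reusing the alignment helper defined above. The two-kernel weights implement the case analysis of (12) with the regularized closed form (13)-(14) as reconstructed here, and the loop implements the forward-selection procedure just described; all identifiers are illustrative.

```python
import numpy as np

def combine_two(K1, K2, Kstar, mu=1e-3):
    """Weights (12)-(14) for the alignment-optimal mix l1*K1 + l2*K2."""
    g11 = np.sum(K1 * K1) + mu
    g22 = np.sum(K2 * K2) + mu
    g12 = np.sum(K1 * K2)
    b1, b2 = np.sum(K1 * Kstar), np.sum(K2 * Kstar)
    delta = g11 * g22 - g12 ** 2
    l1 = (g22 * b1 - g12 * b2) / delta
    l2 = (g11 * b2 - g12 * b1) / delta
    if l2 <= 0:
        return 1.0, 0.0   # keep K1 alone, as in (12)
    if l1 <= 0:
        return 0.0, 1.0   # keep K2 alone, as in (12)
    return l1, l2

def greedy_combination(kernels, Kstar, mu=1e-3):
    """Start from the best-aligned kernel, then repeatedly add the
    candidate that most increases the alignment, until none improves it."""
    idx = max(range(len(kernels)), key=lambda k: alignment(kernels[k], Kstar))
    K, remaining = kernels[idx].copy(), set(range(len(kernels))) - {idx}
    while remaining:
        weights = {k: combine_two(K, kernels[k], Kstar, mu) for k in remaining}
        scores = {k: alignment(l1 * K + l2 * kernels[k], Kstar)
                  for k, (l1, l2) in weights.items()}
        k = max(scores, key=scores.get)
        if scores[k] <= alignment(K, Kstar):
            break  # no remaining candidate improves the alignment
        l1, l2 = weights[k]
        K = l1 * K + l2 * kernels[k]
        remaining.discard(k)
    return K
```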
Fig. 1. Adjustment of the window size of a spectrogram using the kernel-target alignment, compared with the error rate of an SVM classifier.
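A sketch of the tuning experiment summarized in Fig. 1, reusing the tfd_gram and target_alignment helpers from the earlier sketches; the signals X and labels y are random placeholders, not the paper's data set.

```python
import numpy as np

# Sweep candidate spectrogram window lengths and retain the one with
# the highest kernel-target alignment, as in Fig. 1.
X = np.random.randn(40, 256)                 # placeholder signals
y = np.where(np.arange(40) < 20, 1, -1)      # placeholder labels in {-1, +1}
scores = {w: target_alignment(tfd_gram(X, nperseg=w), y)
          for w in range(10, 71, 4)}
best_window = max(scores, key=scores.get)
```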
5. TIME-FREQUENCY FORMULATION

By placing time-frequency based classification within the larger framework of kernel machines, we can take advantage of the concepts and tools developed above. In this section, we focus on selecting time-frequency distributions appropriate for binary classification tasks. That is, we consider the maximization problem

$$\phi^* = \arg\max_{\phi} \frac{\langle K_\phi, K^* \rangle_F}{n\, \sqrt{\langle K_\phi, K_\phi \rangle_F}}, \qquad (15)$$

where $K_\phi$ denotes the Gram matrix with entries $\langle C_{x_i}, C_{x_j} \rangle$, the distributions being parameterized by $\phi$ as in (5).

Fig. 2. Alignment and error rate for different kernels (labeled sp, bj, ridh, spwv, cw, mh, and wv in the scatter plot).

Fig. 3. Smoothed pseudo-Wigner (left), Wigner (middle), and the composite associated with the kernel $\kappa_{\mathrm{spwv}} + 0.208\, \kappa_{\mathrm{wv}}$ (right). Here these distributions are applied to the signal to be detected.
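The selection rule (15) amounts to ranking candidate distributions by their target alignment. A short sketch follows, again reusing the earlier helpers; spectrograms with different window lengths merely stand in for the genuinely different distributions of Fig. 2, and all names are placeholders.

```python
import numpy as np

# Placeholder candidates: Gram matrices of spectrograms with different
# settings stand in for different distributions (keys follow Fig. 2).
X = np.random.randn(40, 256)
y = np.where(np.arange(40) < 20, 1, -1)
candidates = {name: tfd_gram(X, nperseg=w)
              for name, w in [("sp", 32), ("spwv", 48), ("wv", 64)]}
ranking = sorted(candidates,
                 key=lambda name: target_alignment(candidates[name], y),
                 reverse=True)
print("best-aligned distribution:", ranking[0])
```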
6. CONCLUSION

In this paper, we have shown how the kernel-target alignment makes it possible to adjust the free parameters of a time-frequency distribution, to select the most suitable distribution from a set of candidates, and to optimally combine several distributions to achieve improvements in classification performance. All these links offer new perspectives in the field of non-stationary signal analysis, since they provide access to the most recent methodological and theoretical developments of pattern recognition and statistical learning theory.
7. REFERENCES
[1] F. Auger and P. Flandrin, "Improving the readability of time-frequency and time-scale representations by reassignment methods," IEEE Transactions on Signal Processing, vol. 43, no. 5, pp. 1068–1089, 1995.
[2] D. Jones and R. Baraniuk, "An adaptive optimal-kernel time-frequency representation," IEEE Transactions on Signal Processing, vol. 43, no. 10, pp. 2361–2371, 1995.
[3] J. Gosme, C. Richard, and P. Gonçalvès, "Adaptive diffusion of time-frequency and time-scale representations: a review," IEEE Transactions on Signal Processing, vol. 53, no. 11, 2005.
[4] L. Atlas, J. Droppo, and J. McLaughlin, "Optimizing time-frequency distributions for automatic classification," in Proc. SPIE, vol. 3162, 1997, pp. 161–171.
[5] C. Heitz, "Optimum time-frequency representations for the classification and detection of signals," Applied Signal Processing, vol. 3, pp. 124–143, 1995.
[6] M. Davy, C. Doncarli, and G. Boudreaux-Bartels, "Improved optimization of time-frequency based signal classifiers," IEEE Signal Processing Letters, vol. 8, no. 2, pp. 52–57, 2001.
[7] K. Müller, S. Mika, G. Rätsch, K. Tsuda, and B. Schölkopf, "An introduction to kernel-based learning algorithms," IEEE Transactions on Neural Networks, vol. 12, no. 2, pp. 181–202, 2001.
[8] V. Vapnik, The Nature of Statistical Learning Theory. New York: Springer, 1995.
[9] M. Davy, A. Gretton, A. Doucet, and P. Rayner, "Optimised support vector machines for nonstationary signal classification," IEEE Signal Processing Letters, vol. 9, no. 12, pp. 442–445, 2002.
[10] P. Honeiné, C. Richard, and P. Flandrin, "Reconnaissance des formes par méthodes à noyau dans le plan temps-fréquence," in Proc. Colloque GRETSI, Louvain-la-Neuve, Belgium, 2005, pp. 969–972.
[11] N. Cristianini, J. Shawe-Taylor, A. Elisseeff, and J. Kandola, "On kernel-target alignment," in Advances in Neural Information Processing Systems 14. Cambridge, MA: MIT Press, 2002.
[12] N. Aronszajn, "Theory of reproducing kernels," Transactions of the American Mathematical Society, vol. 68, pp. 337–404, 1950.
[13] B. Schölkopf, R. Herbrich, and R. Williamson, "A generalized representer theorem," NeuroCOLT, Royal Holloway College, University of London, UK, Tech. Rep. NC2-TR-2000-81, 2000.
[14] G. Wu, E. Y. Chang, and N. Panda, "Formulating distance functions via the kernel trick," in Proc. 11th ACM International Conference on Knowledge Discovery and Data Mining, 2005, pp. 703–709.
[15] J. Kandola, J. Shawe-Taylor, and N. Cristianini, "On the extensions of kernel alignment," Dept. Comput. Sci., University of London, Tech. Rep. 120, 2002.
[16] J.-B. Pothin and C. Richard, "Kernel machines : une nouvelle méthode pour l'optimisation de l'alignement des noyaux et l'amélioration des performances," in Proc. Colloque GRETSI, Louvain-la-Neuve, Belgium, 2005, pp. 1133–1136.
[17] J. Kandola, J. Shawe-Taylor, and N. Cristianini, "Optimizing kernel alignment over combinations of kernels," Dept. Comput. Sci., University of London, Tech. Rep. 121, 2002.