
Czech Technical University in Prague

Faculty of Information Technology

Department of Theoretical Computer Science


European Social Fund
Prague & EU: Investing in your future

MI-PDD Data Preprocessing module (2011/2012)

Lecture 8: Feature extraction from time series

Pavel Kordík, FIT, Czech Technical University in Prague


Feature Extraction from signals

Time series:
- Industry, signals from sensors
- Biological data
- Financial time series
- Speech
- Music
- ...

Why do we need to describe time series by features?
Examples of time series

A. Far-infrared laser excitation
B. Sleep apnea
C. Currency exchange rates
D. Particle driven in nonlinear multiple well potentials
E. Variable star data
F. J. S. Bach fugue notes


Time series data mining tasks

- Clustering
- Classification
- Motif Discovery
- Rule Discovery (e.g. a rule with support s = 0.5 and confidence c = 0.3)
- Query by Content
- Visualization
- Novelty Detection

(c) Eamonn Keogh, [email protected]
Time Series Filtering

[Figure: a long time series, a set of 12 candidate sequences, and the matching subsequences]

Given a time series T, a set of candidates C, and a distance threshold r, find all subsequences in T that are within distance r of any of the candidates in C.


Filtering vs. Querying

[Figure: filtering matches a set of queries against incoming data and returns everything within the threshold; querying matches one query (template) against a database and returns the best match]
Time series similarity
Similarity at the level of shape

Similarity at the structural level



Distance of two signals

Given two time series

Q = q_1, ..., q_n
C = c_1, ..., c_n

their Euclidean distance is

D(Q, C) = \sqrt{ \sum_{i=1}^{n} (q_i - c_i)^2 }

[Figure: Q and C plotted together, with D(Q,C) measuring the pointwise gap]
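A minimal NumPy sketch of this distance (the function name and array handling are illustrative):

import numpy as np

def euclidean(q, c):
    # Euclidean distance between two equal-length series
    q, c = np.asarray(q, float), np.asarray(c, float)
    return np.sqrt(np.sum((q - c) ** 2))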
Early Abandon

During the computation, if the running sum of squared differences between corresponding data points exceeds r^2, we can safely stop the calculation.

[Figure: point-by-point comparison of Q and C, abandoned once the partial sum exceeds the threshold]
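The early-abandon test folds directly into the distance loop; a minimal sketch, assuming the threshold r is given:

def euclidean_early_abandon(q, c, r):
    # Return the distance if it is <= r, else None (abandon early).
    # Comparing against r**2 avoids a square root per step.
    acc, r2 = 0.0, r * r
    for qi, ci in zip(q, c):
        acc += (qi - ci) ** 2
        if acc > r2:
            return None  # partial sum already exceeds r^2
    return acc ** 0.5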


First translate offsets

[Figure: two series with different offsets; after subtracting the means, D(Q,C) compares their shapes]

Q = Q - mean(Q)
C = C - mean(C)
D(Q,C)


Then scale (normalize) signals

[Figure: two series with different amplitudes, before and after z-normalization]

Q = (Q - mean(Q)) / std(Q)
C = (C - mean(C)) / std(C)
D(Q,C)


Then remove trend (optional)

[Figure: a series with a linear trend, before and after detrending]

- Removed linear trend
- Removed offset translation
- Removed amplitude scaling


Remove noise

[Figure: a noisy series before and after smoothing]

Q = smooth(Q)
C = smooth(C)
D(Q,C)
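The preprocessing steps above can be chained; a minimal NumPy sketch (the moving-average smoother and its window length are illustrative choices for smooth()):

import numpy as np

def preprocess(x, window=5):
    # Offset translation, amplitude scaling, linear detrending,
    # and smoothing, as on the preceding slides.
    x = np.asarray(x, float)
    x = x - x.mean()                                  # translate offset
    x = x / x.std()                                   # scale (normalize)
    t = np.arange(len(x))
    x = x - np.polyval(np.polyfit(t, x, 1), t)        # remove linear trend
    kernel = np.ones(window) / window                 # remove noise
    return np.convolve(x, kernel, mode="same")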


Agglomerative clustering using Euclidean distance

[Figure: dendrograms of the same signals without preprocessing and after noise removal, trend removal, offset translation, and normalization; preprocessing groups the related signals together]


Wedge

(Li Wei, Eamonn Keogh; Computer Science & Engineering Dept., University of California Riverside, Riverside, CA 92521, {wli, eamonn}@cs.ucr.edu)

Having candidate sequences C_1, ..., C_k, we can form two new sequences U and L:

U_i = max(C_{1i}, ..., C_{ki})
L_i = min(C_{1i}, ..., C_{ki})

They form the smallest possible bounding envelope that encloses sequences C_1, ..., C_k. We call the combination of U and L a wedge, and denote a wedge as W = {U, L}.

A lower bounding measure between an arbitrary query Q and the entire set of candidate sequences contained in a wedge W:

LB\_Keogh(Q, W) = \sqrt{ \sum_{i=1}^{n}
\begin{cases}
(q_i - U_i)^2 & \text{if } q_i > U_i \\
(q_i - L_i)^2 & \text{if } q_i < L_i \\
0 & \text{otherwise}
\end{cases} }
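A minimal NumPy sketch of the wedge and its lower bound (function names are illustrative; candidates are assumed equal-length):

import numpy as np

def make_wedge(candidates):
    # U and L are the pointwise max and min of the candidate set
    c = np.asarray(candidates, float)
    return c.max(axis=0), c.min(axis=0)

def lb_keogh(q, U, L):
    # Lower bound between query q and every sequence inside the wedge
    q = np.asarray(q, float)
    above = np.where(q > U, (q - U) ** 2, 0.0)
    below = np.where(q < L, (q - L) ** 2, 0.0)
    return np.sqrt(np.sum(above + below))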


Hierarchical Clustering with Wedges

[Figure: five candidate sequences C1..C5 (wedges W1..W5) merged bottom-up into W(2,5), W(1,4), W((2,5),3), and finally W(((2,5),3),(1,4)), for K = 5 down to K = 1]

Wedges can be used for filtering or querying.
But!

Euclidean distance is not universally applicable! Consider:
- Phase shifts
- Delays
- Unsynchronized signals
- ...

[Figure: two similar signals that are out of phase; Euclidean distance compares the i-th points directly and penalizes the shift]


Example



Dynamic Time Warping
[Berndt, Clifford, 1994]

Allows acceleration and deceleration of signals along the time dimension.

Basic idea:
- Consider X = x_1, x_2, ..., x_n and Y = y_1, y_2, ..., y_n
- We are allowed to extend each sequence by repeating elements
- The Euclidean distance is then calculated between the extended sequences X' and Y'
- Matrix M, where m_ij = d(x_i, y_j)


Dynamic Time Warping
[Berndt, Clifford, 1994]

[Figure: the warping path through the matrix of X versus Y, constrained to the band between j = i - w and j = i + w]
Restrictions on Warping Paths

- Monotonicity: the path should not go down or to the left
- Continuity: no elements may be skipped in a sequence
- Warping window: |i - j| <= w


Formulation

Let D(i, j) refer to the dynamic time warping distance between the subsequences

x_1, x_2, ..., x_i
y_1, y_2, ..., y_j

D(i, j) = |x_i - y_j| + min{ D(i - 1, j), D(i - 1, j - 1), D(i, j - 1) }


Solution by Dynamic Programming

- The basic implementation is O(n^2), where n is the length of the sequences: we have to solve the problem for each (i, j) pair
- If a warping window is specified, then O(nw): only solve for the (i, j) pairs where |i - j| <= w
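A minimal sketch of the dynamic program, with the optional warping window w (names are illustrative):

import numpy as np

def dtw(x, y, w=None):
    # DTW distance; w is the optional warping-window half-width (|i - j| <= w)
    n, m = len(x), len(y)
    w = max(w, abs(n - m)) if w is not None else max(n, m)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i, j] = cost + min(D[i - 1, j],      # repeat element of y
                                 D[i - 1, j - 1],  # match
                                 D[i, j - 1])      # repeat element of x
    return D[n, m]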


Longest Common Subsequence Measures
(Allowing for Gaps in Sequences)

[Figure: two sequences aligned with a gap skipped]


Basic LCS Idea

X = 3, 2, 5, 7, 4, 8, 10, 7
Y = 2, 5, 4, 7, 3, 10, 8, 6
LCS = 2, 5, 7, 10

Sim(X, Y) = |LCS|, or Sim(X, Y) = |LCS| / n

Edit distance is another possibility.
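A minimal sketch of the standard LCS dynamic program, checked against the slide's example:

def lcs_length(x, y):
    # Length of the longest common subsequence (gaps allowed)
    n, m = len(x), len(y)
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if x[i - 1] == y[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    return L[n][m]

X = [3, 2, 5, 7, 4, 8, 10, 7]
Y = [2, 5, 4, 7, 3, 10, 8, 6]
assert lcs_length(X, Y) == 4  # Sim(X, Y) = |LCS| = 4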


Landmarks
[Perng et al., 2000]

- Similarity definition much closer to human perception (unlike Euclidean distance)
- A point on the curve is an n-th order landmark if the n-th derivative is 0
- Thus, local maxima and minima are first-order landmarks
- Landmark distances are tuples (e.g. in time and amplitude) that satisfy the triangle inequality
- Several transformations are defined, such as shifting, amplitude scaling, time warping, etc.
PAA and APCA

Piecewise Aggregate Approximation (PAA)
- Segment the time series into equal-length parts and store the average value of each part

Adaptive Piecewise Constant Approximation (APCA)
- Parts are of adaptive length
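A minimal PAA sketch (assumes, for simplicity, that the series length is divisible by the number of segments w):

import numpy as np

def paa(x, w):
    # Piecewise Aggregate Approximation: means of w equal segments
    x = np.asarray(x, float)
    return x.reshape(w, len(x) // w).mean(axis=1)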


Symbolic Aggregate approXimation (SAX)

(Eamonn Keogh and Jessica Lin, Computer Science & Engineering Department, University of California - Riverside, Riverside, CA 92521, [email protected])

First convert the time series to its PAA representation, then convert the PAA to symbols. It takes linear time.

[Figure: a series of length 128 reduced by PAA and discretized with breakpoints into the SAX word baabccbc]
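A minimal SAX sketch; the breakpoint values are the usual equiprobable cut points for a 3-letter alphabet under a standard normal assumption:

import numpy as np

BREAKPOINTS = [-0.43, 0.43]  # equiprobable cuts for alphabet {a, b, c}

def sax(x, w, breakpoints=BREAKPOINTS):
    # Z-normalize, reduce with PAA, then map each segment to a letter
    x = np.asarray(x, float)
    x = (x - x.mean()) / x.std()
    segments = x.reshape(w, len(x) // w).mean(axis=1)
    return "".join("abc"[np.searchsorted(breakpoints, s)] for s in segments)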
Visual Comparison

[Figure: a raw time series and its DFT, PLA, Haar, and APCA approximations, next to its symbolic version with alphabet a..f]

A raw time series of length 128 is transformed into the word ffffffeeeddcbaabceedcbaaaaacddee.
We can use more symbols to represent the time series, since each symbol requires fewer bits than real numbers (float, double).
Euclidean distance:

D(Q, C) = \sqrt{ \sum_{i=1}^{n} (q_i - c_i)^2 }

[Figure: raw series Q and C of length 128 compared point by point]

The PAA distance lower-bounds the Euclidean distance:

DR(Q, C) = \sqrt{ \frac{n}{w} } \sqrt{ \sum_{i=1}^{w} (\bar{q}_i - \bar{c}_i)^2 }

[Figure: the PAA segments of Q and C compared segment by segment]

For the symbolic representation, with C = baabccbc and Q = babcacca:

MINDIST(Q, C) = \sqrt{ \frac{n}{w} } \sqrt{ \sum_{i=1}^{w} \big( dist(\hat{q}_i, \hat{c}_i) \big)^2 }

dist() can be implemented using a table lookup.


What is lower bounding?

[Figure: the exact (Euclidean) distance D(Q,S) next to a lower bounding distance DLB(Q,S) computed on a coarser approximation of S]

Lower bounding means that for all Q and S, we have DLB(Q, S) <= D(Q, S).
Structure and model based similarity

Extract global features from the time series, create a feature vector, and use these feature vectors to measure similarity and/or classify.

[Figure: three time series A, B, C]

Feature            A      B      C
Max Value          11     12     19
Autocorrelation    0.2    0.3    0.5
Zero Crossings     98     82     13


ARIMA Models and Forecasting

- If we can describe the way the points in the series are related to each other (the autocorrelations), then we can describe the series using the relationships we have found
- AutoRegressive Integrated Moving Average (ARIMA) models are mathematical models of the autocorrelation in a time series
- One way to describe time series


Autocorrelation

The major statistical tool for ARIMA models is the sample autocorrelation coefficient:

r_k = \frac{ \sum_{t=k+1}^{n} (Y_t - \bar{Y})(Y_{t-k} - \bar{Y}) }{ \sum_{t=1}^{n} (Y_t - \bar{Y})^2 }
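A minimal NumPy sketch of r_k for a lag k >= 1:

import numpy as np

def autocorr(y, k):
    # Sample autocorrelation coefficient r_k from the formula above
    y = np.asarray(y, float)
    d = y - y.mean()
    return np.sum(d[k:] * d[:-k]) / np.sum(d ** 2)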


Autocorrelations

- r_1 indicates how successive values of Y relate to each other,
- r_2 indicates how Y values two periods apart relate to each other,
- and so on.


Autoregressive Models

The autoregressive process of order p is denoted AR(p), and defined by

Y_t = \phi_0 + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + ... + \phi_p Y_{t-p} + \epsilon_t

or equivalently

Y_t = \sum_{r=1}^{p} \phi_r Y_{t-r} + w_t

where \phi_1, ..., \phi_p are parameters to be estimated and {w_t} is white noise: a sequence of independent (or uncorrelated) random variables with mean 0 and variance \sigma^2.

An AR(p) model is a regression model with lagged values of the dependent variable in the independent-variable positions, hence the name autoregressive model.


Moving Average Models

The moving average process of order q, denoted MA(q), includes lagged error terms w_{t-1} to w_{t-q}, written as

Y_t = \mu + w_t - \sum_{r=1}^{q} \theta_r w_{t-r}

where \theta_1, \theta_2, ..., \theta_q are the MA parameters and w_t is white noise.

(The term "moving average" is historical and should not be confused with moving-average smoothing procedures.)

An MA(q) model is a regression model in which the dependent variable Y_t depends on previous values of the errors rather than on the variable itself.


Simple Moving Average

- Include the n most recent observations
- Weight them equally
- Ignore older observations

[Figure: equal weights of 1/n on the n most recent observations (n, ..., 3, 2, 1 periods back from today), zero weight on older ones]

(Professor Stephen R. Lawrence, College of Business and Administration, University of Colorado, Boulder, CO 80309-0419)
Simple Moving Average

The forecast F_{t+1} is the average of the n previous observations (actuals) D_t:

F_{t+1} = \frac{1}{n} (D_t + D_{t-1} + ... + D_{t+1-n}) = \frac{1}{n} \sum_{i=t+1-n}^{t} D_i

Note that the n past observations are equally weighted.

Issues with moving-average forecasts:
- All n past observations are treated equally;
- Observations older than n are not included at all;
- Requires that n past observations be retained;
- A problem when 1000s of items are being forecast.
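A minimal sketch of the forecast (the demand numbers in the usage line are made up for illustration):

def moving_average_forecast(demand, n):
    # F_{t+1} = mean of the n most recent actuals
    if len(demand) < n:
        raise ValueError("need at least n observations")
    return sum(demand[-n:]) / n

print(moving_average_forecast([400, 380, 410, 390], 3))  # (380+410+390)/3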
Moving Average

[Figure: Internet Unicycle Sales, monthly unit sales from Apr-01 to Aug-13, with an n = 3 moving average overlaid]


Exponential Smoothing, idea

- Include all past observations
- Weight recent observations much more heavily than very old observations, with 0 < a < 1

[Figure: weights a, a(1 - a), a(1 - a)^2, ... decreasing from today backwards]


Exponential Smoothing, math

F_t = a D_t + a(1 - a) D_{t-1} + a(1 - a)^2 D_{t-2} + \cdots

F_t = a D_t + (1 - a) F_{t-1}

Thus, the new forecast is a weighted sum of the old forecast and actual demand.

Notes:
- Only 2 values (D_t and F_{t-1}) are required, compared with n for the moving average
- Parameter a is determined empirically (whatever works best)
- Rule of thumb: a < 0.5
- Typically, a = 0.2 or a = 0.3 work well
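A minimal sketch of the recursive form (seeding the first forecast with the first actual is an illustrative choice):

def exponential_smoothing(demand, a=0.2, f0=None):
    # F_t = a*D_t + (1 - a)*F_{t-1}; f0 seeds the first forecast
    f = demand[0] if f0 is None else f0
    forecasts = []
    for d in demand:
        f = a * d + (1 - a) * f
        forecasts.append(f)
    return forecasts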


Exponential Smoothing, example

[Figure: actual demand and the exponentially smoothed forecast with a = 0.2]


Time honored linear models

y[t + 1] = \sum_{i=0}^{N_{AR}} a_i y[t - i] + \sum_{j=0}^{N_{MA}} b_j x[t - j]

- Auto Regressive Moving Average (ARMA)
- Many linear estimation techniques based on Least Squares or Least Mean Squares
- Power spectra and autocorrelation characterize such linear systems
- Randomness comes only from the forcing function x(t)

(Vincent Stanford, Complex Systems Test Bed project, August 31, 2007)
Simple nonlinear systems can exhibit chaotic behavior

x[t + 1] = r x[t] (1 - x[t])

- Spectrum and autocorrelation characterize linear systems, not these
- Deterministic chaos looks random to linear analysis methods
- The logistic map is an early example (Elam 1957)

[Figure: logistic map for 2.9 < r < 3.99]
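A minimal sketch of the logistic map iteration (r = 3.9 and x0 = 0.5 are illustrative values in the chaotic regime shown on the slide):

def logistic_map(r, x0=0.5, steps=100):
    # Iterate x[t + 1] = r * x[t] * (1 - x[t])
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs

series = logistic_map(3.9)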


Time Series Representations

Data adaptive:
- Sorted Coefficients
- Singular Value Decomposition (SVD)
- Symbolic Mappings (Natural Language, Strings)
- Trees
- Piecewise Linear Approximation (PLA)
- Adaptive Piecewise Constant Approximation (APCA)

Non data adaptive:
- Wavelets: orthonormal (Haar, Daubechies db_n with n > 1, Coiflets, Symlets) and bi-orthonormal
- Random Mappings
- Spectral: Discrete Fourier Transform (DFT), Discrete Cosine Transform (DCT)
- Piecewise Aggregate Approximation (PAA)

[Figure: the same series of length 128 under DFT, DWT, SVD, APCA, PAA, PLA, and a symbolic representation such as UUCUCUCD]
Spectrum and Shape Descriptors

Shape descriptors computed from the spectrum form a feature vector F in feature space:
- Centroid
- Rolloff
- Flux
- Bandwidth
- Moments
- ...


Fourier transform

DFT (Discrete Fourier Transform)
- Transforms the data from the time domain to the frequency domain
- Useful for signals with periodicities:
  - sales patterns follow seasons;
  - the economy follows a 50-year cycle (or 10?);
  - temperature follows daily and yearly cycles; ...


How does it work?
(Based on slides by D. Gunopulos, UCR)

Decomposes the signal into a sum of sine (and cosine) waves.

Q: How do we assess the similarity of x = {x_0, x_1, ..., x_{n-1}} with a wave?

[Figure: the series x plotted over t = 0, 1, ..., n-1]
How does it work?

A: Consider the waves with frequency 0, 1, 2, ...; use the inner product (~cosine similarity).

[Figure: waves of frequency f = 0, f = 1 (sin(t * 2π/n)), and f = 2 plotted over t = 0, 1, ..., n-1]


How does it work?

[Figure: the basis functions: sine and cosine waves of frequency 1 and frequency 2 over t = 0, 1, ..., n-1]
How does it work?

- The basis functions are actually n-dimensional vectors, orthogonal to each other
- Similarity of x with each of them: the inner product
- DFT: ~ all the similarities of x with the basis functions


How does it work?

Since e^{j\phi} = \cos(\phi) + j \sin(\phi), with j = \sqrt{-1}, we finally have the definition below.


DFT: definition

Discrete Fourier Transform (n-point):

X_f = \frac{1}{\sqrt{n}} \sum_{t=0}^{n-1} x_t \exp(-j 2\pi t f / n), \qquad j = \sqrt{-1}

Inverse DFT:

x_t = \frac{1}{\sqrt{n}} \sum_{f=0}^{n-1} X_f \exp(+j 2\pi t f / n)
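A minimal NumPy sketch; note that np.fft.fft is unnormalized, so we divide by \sqrt{n} to match the convention above (the test signal is illustrative):

import numpy as np

x = np.sin(2 * np.pi * 3 * np.arange(128) / 128)  # a pure f = 3 wave
X = np.fft.fft(x) / np.sqrt(len(x))               # match the 1/sqrt(n) convention
amplitude = np.abs(X)                             # A_f, see the next slide
print(np.argmax(amplitude[:64]))                  # -> 3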


DFT: Amplitude spectrum

A_f^2 = \mathrm{Re}^2(X_f) + \mathrm{Im}^2(X_f)

Intuition: A_f is the strength of frequency f.

[Figure: a time series containing frequency 12 and its amplitude spectrum A_f peaking at f = 12]
DFT: Amplitude spectrum

An excellent approximation, with only 2 frequencies!

[Figure: the series reconstructed from its 2 strongest frequencies]


Varying frequencies

DFT is great, but how about compressing opera (baritone, silence, soprano)?

[Figure: a signal whose frequency content changes over time]


Short time Fourier transform (STFT)

Solution #1: the short time Fourier transform
- Apply the DFT to a sliding window
- But: how short should the window be?


STFT drawbacks

- Unchanged window
- Dilemma of resolution:
  - Narrow window -> poor frequency resolution
  - Wide window -> poor time resolution
- Heisenberg uncertainty principle: we cannot know exactly what frequency exists at what time intervals

[Figure: the same signal analyzed via a narrow window and via a wide window]


Transformations

[Figure: time-frequency tilings of the different transforms; from http://www.cerm.unifi.it/EUcourse2001/Gunther_lecturenotes.pdf, p. 10]


Wavelet transform

- Splits the signal up into a bunch of signals representing the same signal but corresponding to different frequency bands
- Provides only what frequency bands exist at what time intervals


Definition of the Continuous Wavelet Transform

CWT_x^{\psi}(\tau, s) = \Psi_x^{\psi}(\tau, s) = \frac{1}{\sqrt{|s|}} \int x(t) \, \psi^{*}\!\left( \frac{t - \tau}{s} \right) dt

- Translation \tau: the location of the window
- Scale s
- Wavelet: "small wave", meaning the window function is of finite length
- Mother wavelet \psi: a prototype for generating the other window functions; all the windows used are its dilated or compressed and shifted versions


SCALE

- s > 1: dilates the signal
- s < 1: compresses the signal
- Low frequency -> high scale -> non-detailed, global view of the signal -> spans the entire signal
- High frequency -> low scale -> detailed view, lasting only a short time
- Only a limited interval of scales is necessary


CWT computation

CWT_x^{\psi}(\tau, s) = \Psi_x^{\psi}(\tau, s) = \frac{1}{\sqrt{|s|}} \int x(t) \, \psi^{*}\!\left( \frac{t - \tau}{s} \right) dt

- Step 1: The wavelet is placed at the beginning of the signal; set s = 1 (the most compressed wavelet)
- Step 2: The wavelet function at scale 1 is multiplied by the signal and integrated over all times
- Step 3: Shift the wavelet to t = \tau and get the transform value at t = \tau and s = 1
- Step 4: Repeat the procedure until the wavelet reaches the end of the signal
- Step 5: Increase scale s by a sufficiently small value and repeat the above procedure for all s
- Step 6: Each computation for a given s fills a single row of the time-scale plane
- Step 7: The CWT is obtained once all s are calculated


Haar Wavelets

- Subtract the sum of the left half from the right half
- Repeat recursively for quarters, eighths, ...
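A minimal sketch of this recursion using the average/difference convention (normalization conventions for Haar coefficients vary):

import numpy as np

def haar_dwt(x):
    # At each level keep pairwise differences (d) and recurse on the
    # pairwise averages (s), as in the construction on the next slides.
    # Assumes len(x) is a power of two.
    x = np.asarray(x, float)
    coeffs = []
    while len(x) > 1:
        d = (x[1::2] - x[0::2]) / 2   # detail coefficients d_{level,i}
        s = (x[0::2] + x[1::2]) / 2   # smooth part, next level's input
        coeffs.append(d)
        x = s
    coeffs.append(x)  # the final overall average
    return coeffs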


Wavelets - construction

[Figure: level 1 computes difference/sum pairs (d_{1,i}, s_{1,i}) from adjacent pairs of x_0 ... x_7; level 2 computes (d_{2,0}, s_{2,0}) from the level-1 sums; and so on recursively]


Wavelets - construction

Q: map each coefficient onto the time-frequency plane.

[Figure: each d_{level,i} and the final s occupy a tile of the time-frequency plane; coarse-scale tiles are wide in time and narrow in frequency, fine-scale tiles the opposite]


Wavelets - Drill:

Q: baritone/silence/soprano - what does the DWT look like?

[Figure: a signal with a low-frequency segment, silence, then a high-frequency segment]

Q: baritone/soprano - what does the DWT look like?

[Figure: a signal with a low-frequency segment followed by a high-frequency segment]


Wavelets - construction

- Observation 1: "+" can be some weighted addition, and "-" the corresponding weighted difference (quadrature mirror filters)
- Observation 2: unlike the DFT/DCT, there are *many* wavelet bases: Haar, Daubechies-4, Daubechies-6, ...


Advantages of Wavelets

- Better compression (better RMSE with the same number of coefficients)
- Closely related to the processing in the mammalian eye and ear
- Good for progressive transmission
- Handle spikes well
- Usually fast to compute (O(n)!)


Feature space

- Keep the d most important wavelet coefficients
- Normalize and keep the largest
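A minimal sketch of selecting the d largest-magnitude coefficients as a feature vector (normalizing first, as the slide suggests; the exact selection details are an assumption):

import numpy as np

def top_d_features(coeffs, d):
    # Keep the d largest-magnitude wavelet coefficients as features
    flat = np.concatenate([np.ravel(c) for c in coeffs])
    flat = flat / np.linalg.norm(flat)          # normalize
    idx = np.argsort(np.abs(flat))[::-1][:d]    # largest magnitudes first
    return flat[idx], idx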


Speech extraction and recognition

System components:
- Feature Extraction: convert the audio stream into feature vectors
- Baum-Welch Training: calibrate HMMs using known data
- Viterbi Decoding: recognition using HMMs


How is it implemented on IBM Cell processors?

Feature Extraction
- Pipeline (12 stages): Window Extraction, Zero Mean, Energy Computation, Preemphasis Filter, Hamming Window, Spectrum Computation (FFT), Mel Frequency Computation, Cepstrum Computation, Discrete Cosine Transform, Lifter (Cepstral Filter), Cepstrum Energy Normalization, First and Second Temporal Derivatives
- Converts a 25 ms frame of speech (with 15 ms overlap) into 39 coefficients (MFCCs)

Viterbi Decoding
- Evaluates the likelihood that an HMM produced a particular sequence of frames
- Dynamic programming
- HMM parameters:
  - Left-right model (only forward and self transitions; the transition matrix is upper bi-diagonal)
  - Gaussian Mixture Model (PDFs are approximated by a set of Gaussians)
- Very expensive computation; the bottleneck of the speech pipeline


References

- [Goldin & Kanellakis 95] On Similarity Queries for Time-Series Data: Constraint Specification and Implementation. CP 1995: 137-153
- [Jagadish et al. 95] Similarity-Based Queries. PODS 1995: 36-45
- [Rafiei and Mendelzon 2000] Querying Time Series Data Based on Similarity. TKDE 12(5): 675-693 (2000)
- [Berndt and Clifford 94] Using Dynamic Time Warping to Find Patterns in Time Series. KDD Workshop 1994: 359-370
- [Ge and Smyth 2000] Deformable Markov model templates for time-series pattern matching. KDD 2000: 81-90
- [Perng et al. 2000] Landmarks: a New Model for Similarity-based Pattern Querying in Time Series Databases. ICDE 2000: 33-42
