
The DFT, FFT, and Practical Spectral Analysis

Collection Editor:
Douglas L. Jones
Authors:
Douglas L. Jones
Ivan Selesnick

Online:
<http://cnx.org/content/col10281/1.2/>

CONNEXIONS

Rice University, Houston, Texas


This selection and arrangement of content as a collection is copyrighted by Douglas L. Jones. It is licensed under the
Creative Commons Attribution 1.0 license (http://creativecommons.org/licenses/by/1.0).
Collection structure revised: February 22, 2007
PDF generated: March 18, 2010
For copyright and attribution information for the modules contained in this collection, see p. 97.
Table of Contents
1 The Discrete Fourier Transform
  1.1 DFT Definition and Properties ........ 1
2 Spectrum Analysis
  2.1 Spectrum Analysis Using the Discrete Fourier Transform ........ 5
  2.2 Classical Statistical Spectral Estimation ........ 18
  2.3 Short Time Fourier Transform ........ 23
  Solutions ........ 36
3 Fast Fourier Transform Algorithms
  3.1 Overview of Fast Fourier Transform (FFT) Algorithms ........ 37
  3.2 Running FFT ........ 38
  3.3 Goertzel's Algorithm ........ 40
  3.4 Power-of-Two FFTs ........ 42
  3.5 Multidimensional Index Maps ........ 62
  3.6 The Prime Factor Algorithm ........ 65
  Solutions ........ 69
4 Fast Convolution ........ 71
5 Chirp-z Transform ........ 77
6 FFTs of prime length and Rader's conversion ........ 81
7 Efficient FFT Algorithm and Programming Tricks ........ 85
8 Choosing the Best FFT Algorithm ........ 89
Bibliography ........ 92
Index ........ 95
Attributions ........ 97
Chapter 1

The Discrete Fourier Transform

1.1 DFT Definition and Properties[1]

1.1.1 DFT
The discrete Fourier transform (DFT)[2] is the primary transform used for numerical computation in digital
signal processing. It is very widely used for spectrum analysis (Section 2.1), fast convolution (Chapter 4),
and many other applications. The DFT transforms N discrete-time samples to the same number of discrete
frequency samples, and is defined as

X(k) = \sum_{n=0}^{N-1} x(n) e^{-j 2\pi nk/N}   (1.1)

The DFT is widely used in part because it can be computed very efficiently using fast Fourier transform
(FFT)[3] algorithms.
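As a quick illustration (a sketch, not part of the original module), the DFT sum (1.1) can be evaluated directly in MATLAB, the language used for the spectrogram example in Section 2.3, and checked against the built-in fft function; x is assumed to be a length-N row vector:

% Direct evaluation of the DFT sum (1.1), checked against fft()
N = 8;
x = randn(1,N) + 1j*randn(1,N);       % arbitrary complex test data
X = zeros(1,N);
for k = 0:N-1
    X(k+1) = sum(x .* exp(-1j*2*pi*(0:N-1)*k/N));  % sum over n
end
max(abs(X - fft(x)))                  % should be near machine epsilon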

1.1.2 IDFT

The inverse DFT (IDFT) transforms N discrete-frequency samples to the same number of discrete-time
samples. The IDFT has a form very similar to the DFT,

x(n) = \frac{1}{N} \sum_{k=0}^{N-1} X(k) e^{j 2\pi nk/N}   (1.2)

and can thus also be computed efficiently using FFTs[4].
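Continuing the sketch above, the IDFT (1.2) recovers the original samples; equivalently, ifft(fft(x)) returns x to machine precision:

% IDFT (1.2) evaluated directly, then compared with the original x
xr = zeros(1,N);
for n = 0:N-1
    xr(n+1) = (1/N) * sum(X .* exp(1j*2*pi*n*(0:N-1)/N));  % sum over k
end
max(abs(xr - x))                      % near zero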

1.1.3 DFT and IDFT properties

1.1.3.1 Periodicity

Due to the N-sample periodicity of the complex exponential basis functions e^{j 2\pi nk/N} in the DFT and IDFT,
the resulting transforms are also periodic with N samples.

X(k + N) = X(k)
[1] This content is available online at <http://cnx.org/content/m12019/1.5/>.
[2] "The DFT: Frequency Domain with a Computer Analysis" <http://cnx.org/content/m10992/latest/>
[3] The DFT, FFT, and Practical Spectral Analysis <http://cnx.org/content/col10281/latest/>
[4] The DFT, FFT, and Practical Spectral Analysis <http://cnx.org/content/col10281/latest/>


x (n) = x (n + N )

1.1.3.2 Circular Shift

A shift in time corresponds to a phase shift that is linear in frequency. Because of the periodicity induced
by the DFT and IDFT, the shift is circular, or modulo N samples.

x((n - m) mod N) ⇔ X(k) e^{-j 2\pi km/N}

The modulus operator p mod N means the remainder of p when divided by N. For example,

9 mod 5 = 4

and

-1 mod 5 = 4
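A small numeric check of the circular-shift property (a sketch, with m = 2 and N = 8 chosen arbitrarily; MATLAB's mod handles the negative arguments as above):

% Circular shift in time <=> linear phase in frequency
N = 8;  m = 2;
x  = randn(1,N);
xs = x(mod((0:N-1) - m, N) + 1);      % x((n - m) mod N)
max(abs(fft(xs) - fft(x).*exp(-1j*2*pi*(0:N-1)*m/N)))  % near zero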

1.1.3.3 Time Reversal

x((-n) mod N) = x((N - n) mod N) ⇔ X((N - k) mod N) = X((-k) mod N)

note: time-reversal maps (0 ⇔ 0), (1 ⇔ N - 1), (2 ⇔ N - 2), etc., as illustrated in the figure below.

Figure 1.1: Illustration of circular time-reversal. (a) Original signal (b) Time-reversed

1.1.3.4 Complex Conjugate

x^*(n) ⇔ X^*((-k) mod N)

1.1.3.5 Circular Convolution Property

Circular convolution is defined as

x(n) * h(n) := \sum_{m=0}^{N-1} x(m) h((n - m) mod N)

Circular convolution of two discrete-time signals corresponds to multiplication of their DFTs:

x(n) * h(n) ⇔ X(k) H(k)
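The property is easy to verify numerically; this sketch uses cconv from the Signal Processing Toolbox (the same toolbox the spectrogram example in Section 2.3 relies on):

% Circular convolution <=> multiplication of DFTs
N = 8;
x = randn(1,N);  h = randn(1,N);
y1 = cconv(x, h, N);                  % length-N circular convolution
y2 = ifft(fft(x) .* fft(h));          % multiply DFTs, then invert
max(abs(y1 - y2))                     % near zero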
3

1.1.3.6 Multiplication Property

A similar property relates multiplication in time to circular convolution in frequency.

x(n) h(n) ⇔ \frac{1}{N} X(k) * H(k)

1.1.3.7 Parseval's Theorem

Parseval's theorem relates the energy of a length-N discrete-time signal (or one period) to the energy of its
DFT.

\sum_{n=0}^{N-1} |x(n)|^2 = \frac{1}{N} \sum_{k=0}^{N-1} |X(k)|^2
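A one-line numeric check of Parseval's theorem (a sketch with arbitrary complex test data):

% Energy in time equals (1/N) times energy in frequency
N = 16;
x = randn(1,N) + 1j*randn(1,N);
sum(abs(x).^2) - (1/N)*sum(abs(fft(x)).^2)   % near zero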

1.1.3.8 Symmetry

The continuous-time Fourier transform[5], the DTFT (2.1), and DFT (2.3) are all defined as transforms of
complex-valued data to complex-valued spectra. However, in practice signals are often real-valued. The
DFT of a real-valued discrete-time signal has a special symmetry, in which the real part of the transform
values are DFT even symmetric and the imaginary part is DFT odd symmetric, as illustrated in the
equation and figure below.

x(n) real ⇔ X(k) = X^*((N - k) mod N)   (This implies X(0), X(N/2) are real-valued.)
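This conjugate symmetry is again easy to confirm numerically for real-valued input (a minimal sketch):

% Real input: X(k) = conj(X((N - k) mod N))
N = 8;
X = fft(randn(1,N));
max(abs(X - conj(X(mod(N - (0:N-1), N) + 1))))   % near zero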


5 "Continuous-Time Fourier Transform (CTFT)" <http://cnx.org/content/m10098/latest/>



Figure 1.2: DFT symmetry of a real-valued signal. (a) Real part of X(k) is even (even-symmetry in DFT sense) (b) Imaginary part of X(k) is odd (odd-symmetry in DFT sense)
Chapter 2

Spectrum Analysis

2.1 Spectrum Analysis Using the Discrete Fourier Transform[1]

2.1.1 Discrete-Time Fourier Transform

The Discrete-Time Fourier Transform (DTFT)[2] is the primary theoretical tool for understanding the fre-
quency content of a discrete-time (sampled) signal. The DTFT[3] is defined as

X(ω) = \sum_{n=-∞}^{∞} x(n) e^{-jωn}   (2.1)

The inverse DTFT (IDTFT) is defined by an integral formula, because it operates on a continuous-frequency
DTFT spectrum:

x(n) = \frac{1}{2π} \int_{-π}^{π} X(ω) e^{jωn} dω   (2.2)

The DTFT is very useful for theory and analysis, but is not practical for numerically computing a
spectrum digitally, because

1. infinite time samples means
   • infinite computation
   • infinite delay
2. the transform is continuous in the discrete-time frequency, ω

For practical computation of the frequency content of real-world signals, the Discrete Fourier Transform
(DFT) is used.

2.1.2 Discrete Fourier Transform

The DFT transforms N samples of a discrete-time signal to the same number of discrete frequency samples,
and is defined as

X(k) = \sum_{n=0}^{N-1} x(n) e^{-j 2πnk/N}   (2.3)
[1] This content is available online at <http://cnx.org/content/m12032/1.6/>.
[2] "Discrete-Time Fourier Transform (DTFT)" <http://cnx.org/content/m10247/latest/>
[3] "Discrete-Time Fourier Transform (DTFT)" <http://cnx.org/content/m10247/latest/>


The DFT is invertible by the inverse discrete Fourier transform (IDFT):

x(n) = \frac{1}{N} \sum_{k=0}^{N-1} X(k) e^{+j 2πnk/N}   (2.4)

The DFT (2.3) and IDFT (2.4) are a self-contained, one-to-one transform pair for a length-N discrete-time
signal. (That is, the DFT (2.3) is not merely an approximation to the DTFT (2.1) as discussed next.)
However, the DFT (2.3) is very often used as a practical approximation to the DTFT (2.1).

2.1.3 Relationships Between DFT and DTFT

2.1.3.1 DFT and Discrete Fourier Series

The DFT (2.3) gives the discrete-time Fourier series coefficients of a periodic sequence (x(n) = x(n + N))
of period N samples, or

X(ω) = \frac{2π}{N} \sum_k X(k) δ(ω - \frac{2πk}{N})   (2.5)

as can easily be confirmed by computing the inverse DTFT of the corresponding line spectrum:

x(n) = \frac{1}{2π} \int_{-π}^{π} \frac{2π}{N} \sum_k X(k) δ(ω - \frac{2πk}{N}) e^{jωn} dω
     = \frac{1}{N} \sum_{k=0}^{N-1} X(k) e^{+j 2πnk/N}
     = IDFT(X(k))
     = x(n)   (2.6)

The DFT can thus be used to exactly compute the relative values of the N line spectral components of
the DTFT of any periodic discrete-time sequence with an integer-length period.

2.1.3.2 DFT and DTFT of finite-length data

When a discrete-time sequence happens to equal zero for all samples except for those between 0 and N - 1,
the infinite sum in the DTFT (2.1) equation becomes the same as the finite sum from 0 to N - 1 in the
DFT (2.3) equation. By matching the arguments in the exponential terms, we observe that the DFT values
exactly equal the DTFT for specific DTFT frequencies ω_k = 2πk/N. That is, the DFT computes exact
samples of the DTFT at N equally spaced frequencies ω_k = 2πk/N, or

X(ω_k = \frac{2πk}{N}) = \sum_{n=-∞}^{∞} x(n) e^{-jω_k n} = \sum_{n=0}^{N-1} x(n) e^{-j 2πnk/N} = X(k)

2.1.3.3 DFT as a DTFT approximation

In most cases, the signal is neither exactly periodic nor truly of finite length; in such cases, the DFT of a
finite block of N consecutive discrete-time samples does not exactly equal samples of the DTFT at specific
frequencies. Instead, the DFT (2.3) gives frequency samples of a windowed (truncated) DTFT (2.1)

\hat{X}(ω_k = \frac{2πk}{N}) = \sum_{n=0}^{N-1} x(n) e^{-jω_k n} = \sum_{n=-∞}^{∞} x(n) w(n) e^{-jω_k n} = X(k)

where

w(n) = 1 if 0 ≤ n < N, and w(n) = 0 otherwise

Once again, X(k) exactly equals X(ω_k), a DTFT frequency sample, only when x(n) = 0 for n ∉ [0, N - 1].

2.1.4 Relationship between continuous-time FT and DFT

The goal of spectrum analysis is often to determine the frequency content of an analog (continuous-time)
signal; very often, as in most modern spectrum analyzers, this is actually accomplished by sampling the
analog signal, windowing (truncating) the data, and computing and plotting the magnitude of its DFT. It
is thus essential to relate the DFT frequency samples back to the original analog frequency. Assuming that
the analog signal is bandlimited and the sampling frequency exceeds twice that limit so that no frequency
aliasing occurs, the relationship between the continuous-time Fourier frequency Ω (in radians per second) and the
DTFT frequency ω imposed by sampling is ω = ΩT, where T is the sampling period. Through the relationship
ω_k = 2πk/N between the DTFT frequency ω and the DFT frequency index k, the correspondence between the
DFT frequency index and the original analog frequency can be found:

Ω = \frac{2πk}{NT}

or in terms of analog frequency f in Hertz (cycles per second rather than radians)

f = \frac{k}{NT}

for k in the range between 0 and N/2. For example, with a sampling rate of 1/T = 8000 Hz and a length-N = 1024
DFT, frequency index k corresponds to f = 8000k/1024 ≈ 7.8k Hz. It is important to note that k ∈ [N/2 + 1, N - 1]
corresponds to negative frequencies due to the periodicity of the DTFT and the DFT.
Exercise 2.1   (Solution on p. 36.)
In general, will DFT frequency values X(k) exactly equal samples of the analog Fourier transform
X_a at the corresponding frequencies? That is, will X(k) = X_a(2πk/NT)?

2.1.5 Zero-Padding

If more than N equally spaced frequency samples of a length-N signal are desired, they can easily be obtained
by zero-padding the discrete-time signal and computing a DFT of the longer length. In particular, if LN
DTFT (2.1) samples are desired of a length-N sequence, one can compute the length-LN DFT (2.3) of a
length-LN zero-padded sequence

z(n) = x(n) if 0 ≤ n ≤ N - 1, and z(n) = 0 if N ≤ n ≤ LN - 1

X(ω_k = \frac{2πk}{LN}) = \sum_{n=0}^{N-1} x(n) e^{-j 2πkn/LN} = \sum_{n=0}^{LN-1} z(n) e^{-j 2πkn/LN} = DFT_{LN}[z[n]]

Note that zero-padding interpolates the spectrum. One should always zero-pad (by at least about a factor
of 4) when using the DFT (2.3) to approximate the DTFT (2.1) to get a clear picture of the DTFT (2.1).
While performing computations on zeros may at first seem inefficient, using FFT (Section 3.1) algorithms,
which generally expect the same number of input and output samples, actually makes this approach very
efficient.
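A short sketch of zero-padding in MATLAB; fft(x, L*N) zero-pads x to length LN before transforming. The two-tone test signal here is hypothetical, chosen only to mimic the behavior shown in the figures below:

% 64 samples, transformed with and without zero-padding
n  = 0:63;
x  = cos(0.94*n) + cos(1.05*n);       % two closely spaced sinusoids
X1 = fft(x);                          % length-64 DFT, no padding
X2 = fft(x, 1024);                    % length-1024 zero-padded DFT
plot(2*pi*(0:1023)/1024, abs(X2)); hold on;
stem(2*pi*(0:63)/64, abs(X1));        % DFT samples lie on the DTFT curve
xlim([0 pi]); xlabel('frequency (radians)');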
Figure 2.1 (Spectrum without zero-padding) shows the magnitude of the DFT values corresponding to
the non-negative frequencies of a real-valued length-64 signal computed with a length-64 DFT, both in a "stem" format
to emphasize the discrete nature of the DFT frequency samples, and as a line plot to emphasize its use as
an approximation to the continuous-in-frequency DTFT. From this figure, it appears that the signal has a
single dominant frequency component.

Figure 2.1: Magnitude DFT spectrum of 64 samples of a signal with a length-64 DFT (no zero-padding). (a) Stem plot (b) Line plot

Zero-padding by a factor of two by appending 64 zero values to the signal and computing a length-128 DFT
yields Figure 2.2 (Spectrum with factor-of-two zero-padding). It can now be seen that the signal consists of at
least two narrowband frequency components; the gap between them fell between DFT samples in Figure 2.1
(Spectrum without zero-padding), resulting in a misleading picture of the signal's spectral content. This
is sometimes called the picket-fence effect, and is a result of insufficient sampling in frequency. While
zero-padding by a factor of two has revealed more structure, it is unclear whether the peak magnitudes
are reliably rendered, and the jagged linear interpolation in the line graph does not yet reflect the smooth,
continuously-differentiable spectrum of the DTFT of a finite-length truncated signal. Errors in the apparent
peak magnitude due to insufficient frequency sampling are sometimes referred to as scalloping loss.

Figure 2.2: Magnitude DFT spectrum of 64 samples of a signal with a length-128 DFT (double-length zero-padding). (a) Stem plot (b) Line plot

Zero-padding to four times the length of the signal, as shown in Figure 2.3 (Spectrum with factor-of-four
zero-padding), clearly shows the spectral structure and reveals that the magnitudes of the two spectral lines
are nearly identical. The line graph is still a bit rough and the peak magnitudes and frequencies may not be
precisely captured, but the spectral characteristics of the truncated signal are now clear.

Figure 2.3: Magnitude DFT spectrum of 64 samples of a signal with a length-256 zero-padded DFT (factor-of-four zero-padding). (a) Stem plot (b) Line plot

Zero-padding to a length of 1024, as shown in Figure 2.4 (Spectrum with factor-of-sixteen zero-padding)
yields a spectrum that is smooth and continuous to the resolution of the computer screen, and produces a
very accurate rendition of the DTFT of the truncated signal.

Figure 2.4: Magnitude DFT spectrum of 64 samples of a signal with a length-1024 zero-padded DFT (factor-of-sixteen zero-padding). The spectrum now looks smooth and continuous and reveals all the structure of the DTFT of a truncated signal. (a) Stem plot (b) Line plot

The signal used in this example actually consisted of two pure sinusoids of equal magnitude. The slight
difference in magnitude of the two dominant peaks, the breadth of the peaks, and the sinc-like lesser side-lobe
peaks throughout frequency are artifacts of the truncation, or windowing, process used so that the DFT
practically approximates the DTFT. These problems and partial solutions to them are discussed in the following section.

2.1.6 Effects of Windowing

Applying the DTFT multiplication property

\hat{X}(ω_k) = \sum_{n=-∞}^{∞} x(n) w(n) e^{-jω_k n} = \frac{1}{2π} X(ω_k) * W(ω_k)

we find that the DFT (2.3) of the windowed (truncated) signal produces samples not of the true (desired)
DTFT spectrum X(ω), but of a smoothed version X(ω) * W(ω). We want this to resemble X(ω) as
closely as possible, so W(ω) should be as close to an impulse as possible. The window w(n) need not be a
simple truncation (or rectangle, or boxcar) window; other shapes can also be used as long as they limit
the sequence to at most N consecutive nonzero samples. All good windows are impulse-like, and represent
various tradeoffs between three criteria:

1. main lobe width: limits resolution of closely-spaced peaks of equal height
2. height of first side-lobe: limits ability to see a small peak near a big peak
3. slope of side-lobe drop-off: limits ability to see small peaks further away from a big peak

Many different window functions[4] have been developed for truncating and shaping a length-N signal
segment for spectral analysis. The simple truncation window has a periodic sinc DTFT, as shown in
Figure 2.5. It has the narrowest main-lobe width of the common window functions, 2π/N at the -3 dB level
and 4π/N between the two zeros surrounding the main lobe, but also the largest side-lobe peak, at about
-13 dB. The side-lobes also taper off relatively slowly.

Figure 2.5: Length-64 truncation (boxcar) window and its magnitude DFT spectrum. (a) Rectangular window (b) Magnitude of boxcar window spectrum

The Hann window (sometimes also called the hanning window), illustrated in Figure 2.6, takes the
form w[n] = 0.5 - 0.5 cos(2πn/(N-1)) for n between 0 and N - 1. It has a main-lobe width (about 3π/N at the -3 dB
level and 8π/N between the two zeros surrounding the main lobe) considerably larger than the rectangular
window, but the largest side-lobe peak is much lower, at about -31.5 dB. The side-lobes also taper off
much faster. For a given length, this window is worse than the boxcar window at separating closely-spaced
spectral components of similar magnitude, but better for identifying smaller-magnitude components at a
greater distance from the larger components.
[4] http://en.wikipedia.org/wiki/Window_function

Figure 2.6: Length-64 Hann window and its magnitude DFT spectrum. (a) Hann window (b) Magnitude of Hann window spectrum

The Hamming window, illustrated in Figure 2.7, has a form similar to the Hann window but with
slightly different constants: w[n] = 0.538 - 0.462 cos(2πn/(N-1)) for n between 0 and N - 1. Since it is composed
of the same Fourier series harmonics as the Hann window, it has a similar main-lobe width (a bit less than
3π/N at the -3 dB level and 8π/N between the two zeros surrounding the main lobe), but the largest side-lobe
peak is much lower, at about -42.5 dB. However, the side-lobes also taper off much more slowly than with
the Hann window. For a given length, the Hamming window is better than the Hann (and of course the
boxcar) windows at separating a small component relatively near to a large component, but worse than the
Hann for identifying very small components at considerable frequency separation. Due to their shape and
form, the Hann and Hamming windows are also known as raised-cosine windows.

Figure 2.7: Length-64 Hamming window and its magnitude DFT spectrum. (a) Hamming window (b) Magnitude of Hamming window spectrum
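The windows above are simple to generate and compare directly; a minimal sketch using the formulas and constants quoted above, zero-padded (Section 2.1.5) to render the DTFT side-lobe structure (the normalization to 0 dB uses implicit expansion, so a recent MATLAB is assumed):

% Rectangular, Hann, and Hamming windows and their spectra (dB)
N  = 64;  n = (0:N-1)';
wr = ones(N,1);                          % rectangular (boxcar)
wh = 0.5   - 0.5  *cos(2*pi*n/(N-1));    % Hann
wm = 0.538 - 0.462*cos(2*pi*n/(N-1));    % Hamming
W  = abs(fft([wr wh wm], 1024));         % zero-padded spectra
W  = W ./ max(W);                        % normalize each main lobe to 0 dB
plot(2*pi*(0:511)/1024, 20*log10(W(1:512,:)));
legend('rectangular','Hann','Hamming'); ylabel('dB');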

note: Standard even-length windows are symmetric around a point halfway between the window
samples N/2 - 1 and N/2. For some applications such as time-frequency analysis (Section 2.3), it
may be important to align the window perfectly to a sample. In such cases, a DFT-symmetric
window that is symmetric around the (N/2)-th sample can be used. For example, the DFT-symmetric
Hamming window is w[n] = 0.538 - 0.462 cos(2πn/N). A DFT-symmetric window has a purely real-
valued DFT and DTFT. DFT-symmetric versions of windows, such as the Hamming and Hann
windows, composed of few discrete Fourier series terms of period N, have few non-zero DFT terms
(only when not zero-padded) and can be used efficiently in running FFTs (Section 3.2).

The main-lobe width of a window is an inverse function of the window length N; for any type of window, a
longer window will always provide better resolution.
Many other windows exist that make various other tradeoffs between main-lobe width, height of largest
side-lobe, and side-lobe rolloff rate. The Kaiser window[5] family, based on a modified Bessel function, has an
adjustable parameter that allows the user to tune the tradeoff over a continuous range. The Kaiser window
has near-optimal time-frequency resolution and is widely used. A list of many different windows can be
found here[6].
Example 2.1
Figure 2.8 shows 64 samples of a real-valued signal composed of several sinusoids of various fre-
quencies and amplitudes.

Figure 2.8: 64 samples of an unknown signal

Figure 2.9 shows the magnitude (in dB) of the positive frequencies of a length-1024 zero-padded
DFT of this signal (that is, using a simple truncation, or rectangular, window).
[5] http://en.wikipedia.org/wiki/Kaiser_window
[6] http://en.wikipedia.org/wiki/Window_function

Figure 2.9: Magnitude (in dB) of the zero-padded DFT spectrum of the signal in Figure 2.8 using a
simple length-64 rectangular window

From this spectrum, it is clear that the signal has two large, closely spaced frequency components,
of essentially the same magnitude, at frequencies near 1 radian.
Figure 2.10 shows the spectral estimate produced using a length-64 Hamming window applied
to the same signal shown in Figure 2.8.

Figure 2.10: Magnitude (in dB) of the zero-padded DFT spectrum of the signal in Figure 2.8 using a
length-64 Hamming window

The two large spectral peaks can no longer be resolved; they blur into a single broad peak due
to the reduced spectral resolution of the broader main lobe of the Hamming window. However, the
lower side-lobes reveal a third component at a frequency of about 0.7 radians at about 35 dB lower
magnitude than the larger components. This component was entirely buried under the side-lobes
when the rectangular window was used, but now stands out well above the much lower nearby
side-lobes of the Hamming window.
Figure 2.11 shows the spectral estimate produced using a length-64 Hann window applied to
the same signal shown in Figure 2.8.

Figure 2.11: Magnitude (in dB) of the zero-padded DFT spectrum of the signal in Figure 2.8 using a
length-64 Hann window

The two large components again merge into a single peak, and the smaller component observed
with the Hamming window is largely lost under the higher nearby side-lobes of the Hann window.
However, due to the much faster side-lobe rolloff of the Hann window's spectrum, a fourth com-
ponent at a frequency of about 2.5 radians with a magnitude about 65 dB below that of the main
peaks is now clearly visible.
This example illustrates that no single window is best for all spectrum analyses. The best
window depends on the nature of the signal, and different windows may be better for different
components of the same signal. A skilled spectrum analyst may apply several different windows
to a signal to gain a fuller understanding of the data.

2.2 Classical Statistical Spectral Estimation[7]

Many signals are either partly or wholly stochastic, or random. Important examples include human speech,
vibration in machines, and CDMA[8] communication signals. Given the ever-present noise in electronic sys-
[7] This content is available online at <http://cnx.org/content/m12014/1.3/>.
[8] http://en.wikipedia.org/wiki/Cdma

tems, it can be argued that almost all signals are at least partly stochastic. Such signals may have a
distinct average spectral structure that reveals important information (such as for speech recognition or
early detection of damage in machinery). Spectrum analysis of any single block of data using window-based
deterministic spectrum analysis (Section 2.1), however, produces a random spectrum that may be difficult to
interpret. For such situations, the classical statistical spectrum estimation methods described in this module
can be used.
The goal in classical statistical spectrum analysis is to estimate E[|X(ω)|^2], the power spectral
density (PSD) across frequency of the stochastic signal. That is, the goal is to find the expected (mean,
or average) energy density of the signal as a function of frequency. (For zero-mean signals, this equals the
variance of each frequency sample.) Since the spectrum of each block of signal samples is itself random, we
must average the squared spectral magnitudes over a number of blocks of data to find the mean. There are
two main classical approaches, the periodogram (Section 2.2.1: Periodogram method) and auto-correlation
(Section 2.2.2: Auto-correlation-based approach) methods.

2.2.1 Periodogram method

The periodogram method divides the signal into a number of shorter (and often overlapped) blocks of data,
computes the squared magnitude of the windowed (Section 2.1.6: Effects of Windowing) (and usually zero-
padded (Section 2.1.5: Zero-Padding)) DFT (2.3), X_i(ω_k), of each block, and averages them to estimate the
power spectral density. The squared magnitudes of the DFTs of L possibly overlapped length-N windowed
blocks of signal (each probably with zero-padding (Section 2.1.5: Zero-Padding)) are averaged to estimate
the power spectral density:

\hat{X}(ω_k) = \frac{1}{L} \sum_{i=1}^{L} |X_i(ω_k)|^2

For a fixed total number of samples, this introduces a tradeoff: larger individual data blocks provide better
frequency resolution due to the use of a longer window, but leave fewer blocks to average, so the estimate
has higher variance and appears more noisy. The best tradeoff depends on the application. Overlapping
blocks by a factor of two to four increases the number of averages and reduces the variance, but since the
same data is being reused, still more overlapping does not further reduce the variance. As with any window-
based spectrum estimation (Section 2.1.6: Effects of Windowing) procedure, the window function introduces
broadening and side-lobes into the power spectrum estimate. That is, the periodogram produces an estimate
of the windowed spectrum \hat{X}(ω) = E[|X(ω) * W_M(ω)|^2], not of E[|X(ω)|^2].
Example 2.2
Figure 2.12 shows the non-negative frequencies of the DFT (zero-padded to 1024 total samples) of
64 samples of a real-valued stochastic signal.

Figure 2.12: DFT magnitude (in dB) of 64 samples of a stochastic signal

With no averaging, the power spectrum is very noisy and difficult to interpret other than noting
a significant reduction in spectral energy above about half the Nyquist frequency. Various peaks
and valleys appear in the lower frequencies, but it is impossible to say from this figure whether
they represent actual structure in the power spectral density (PSD) or simply random variation in
this single realization. Figure 2.13 shows the same frequencies of a length-1024 DFT of a length-
1024 signal. While the frequency resolution has improved, there is still no averaging, so it remains
difficult to understand the power spectral density of this signal. Certain small peaks in frequency
might represent narrowband components in the spectrum, or may just be random noise peaks.

Figure 2.13: DFT magnitude (in dB) of 1024 samples of a stochastic signal

In Figure 2.14, a power spectral density estimate computed by averaging the squared magnitudes of
length-1024 zero-padded DFTs of 508 length-64 blocks of data (overlapped by a factor of four, or a
16-sample step between blocks) is shown.

Figure 2.14: Power spectrum density estimate (in dB) of 1024 samples of a stochastic signal

While the frequency resolution corresponds to that of a length-64 truncation window, the aver-
aging greatly reduces the variance of the spectral estimate and allows the user to reliably conclude
that the signal consists of lowpass broadband noise with a flat power spectrum up to half the
Nyquist frequency, with a stronger narrowband frequency component at around 0.65 radians.

2.2.2 Auto-correlation-based approach

The averaging necessary to estimate a power spectral density can be performed in the discrete-time domain,
rather than in frequency, using the auto-correlation method. The squared magnitude of the frequency
response, from the DTFT multiplication and conjugation properties, corresponds in the discrete-time domain
to the signal convolved with the time-reverse of itself,

|X(ω)|^2 = X(ω) X^*(ω) ↔ x(n) * x^*(-n) = r(n)

or its auto-correlation

r(n) = \sum_k x(k) x^*(n + k)
We can thus compute the squared magnitude of the spectrum of a signal by computing the DFT of its
auto-correlation. For stochastic signals, the power spectral density is an expectation, or average, and by

linearity of expectation can be found by transforming the average of the auto-correlation. For a finite block
of N signal samples, the average of the auto-correlation values, \hat{r}(n), is

\hat{r}(n) = \frac{1}{N - n} \sum_{k=0}^{N-1-n} x(k) x^*(n + k)

Note that with increasing lag, n, fewer values are averaged, so they introduce more noise into the estimated
power spectrum. By windowing (Section 2.1.6: Effects of Windowing) the auto-correlation before transform-
ing it to the frequency domain, a less noisy power spectrum is obtained, at the expense of less resolution.
The multiplication property of the DTFT shows that the windowing smooths the resulting power spectrum
via convolution with the DTFT of the window:

\hat{X}(ω) = \sum_{n=-M}^{M} \hat{r}(n) w(n) e^{-jωn} = E[|X(ω)|^2] * W(ω)

This yields another important interpretation of how the auto-correlation method works: it estimates the
power spectral density by averaging the power spectrum over nearby frequencies, through convolution
with the window function's transform, to reduce variance. Just as with the periodogram approach, there
is always a variance vs. resolution tradeoff. The periodogram and the auto-correlation method give similar
results for a similar amount of averaging; the user should simply note that in the periodogram case, the
window introduces smoothing of the spectrum via frequency convolution before the magnitude is squared,
whereas the auto-correlation method convolves the squared magnitude with W(ω).
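A sketch of the auto-correlation method for a real-valued row vector x; the lag window (Hamming) and the parameters M and Nfft are illustrative assumptions:

function Pxx = acorr_psd(x, M, Nfft)
% Average lag products up to lag M, window the lags, and transform.
N = length(x);
r = zeros(1, M+1);
for m = 0:M
    k = 1:(N - m);
    r(m+1) = sum(x(k) .* x(k+m)) / (N - m);   % averaged lag-m products
end
w   = hamming(2*M+1).';                       % lag window centered on lag 0
rw  = [fliplr(r(2:end)), r] .* w;             % windowed lags -M .. M
buf = zeros(1, Nfft);
buf(1:M+1)       = rw(M+1:end);               % lags 0..M
buf(end-M+1:end) = rw(1:M);                   % lags -M..-1 wrap around
Pxx = real(fft(buf));                         % smoothed PSD estimate
end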

2.3 Short Time Fourier Transform[9]

2.3.1 Short Time Fourier Transform

The Fourier transforms (FT, DTFT, DFT, etc.) do not clearly indicate how the frequency content of a signal
changes over time.
That information is hidden in the phase; it is not revealed by the plot of the magnitude of the spectrum.

note: To see how the frequency content of a signal changes over time, we can cut the signal into
blocks and compute the spectrum of each block.

To improve the result,

1. blocks are overlapping
2. each block is multiplied by a window that is tapered at its endpoints.

Several parameters must be chosen:

• Block length, R.
• The type of window.
• Amount of overlap between blocks. (Figure 2.15 (STFT: Overlap Parameter))
• Amount of zero padding, if any.
[9] This content is available online at <http://cnx.org/content/m10570/2.4/>.

STFT: Overlap Parameter

Figure 2.15

The short-time Fourier transform is defined as

X(ω, m) = STFT(x(n)) := DTFT(x(n - m) w(n))
        = \sum_{n=-∞}^{∞} x(n - m) w(n) e^{-jωn}
        = \sum_{n=0}^{R-1} x(n - m) w(n) e^{-jωn}   (2.7)

where w(n) is the window function of length R.



1. The STFT of a signal x (n) is a function of two variables: time and frequency.
2. The block length is determined by the support of the window function w (n).
3. A graphical display of the magnitude of the STFT, |X (ω, m) |, is called the spectrogram of the signal.
It is often used in speech processing.
4. The STFT of a signal is invertible.
5. One can choose the block length. A long block length will provide higher frequency resolution (because
the main-lobe of the window function will be narrow). A short block length will provide higher time
resolution because less averaging across samples is performed for each STFT value.
6. A narrow-band spectrogram is one computed using a relatively long block length R, (long window
function).
7. A wide-band spectrogram is one computed using a relatively short block length R, (short window
function).

2.3.1.1 Sampled STFT

To numerically evaluate the STFT, we sample the frequency axis ω in N equally spaced samples from ω = 0
to ω = 2π:

ω_k = \frac{2π}{N} k,   0 ≤ k ≤ N - 1   (2.8)

We then have the discrete STFT,

X^d(k, m) := X(\frac{2π}{N} k, m) = \sum_{n=0}^{R-1} x(n - m) w(n) e^{-jω_k n}
           = \sum_{n=0}^{R-1} x(n - m) w(n) W_N^{-(kn)}
           = DFT_N( x(n - m) w(n) |_{n=0}^{R-1}, 0, ..., 0 )   (2.9)

where the appended 0, ..., 0 denotes N - R zeros.
In this definition, the overlap between adjacent blocks is R - 1. The signal is shifted along the window
one sample at a time. That generates more points than are usually needed, so we also sample the STFT along
the time direction. That means we usually evaluate

X^d(k, Lm)

where L is the time-skip. The relation between the time-skip, the number of overlapping samples, and the
block length is

Overlap = R - L
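A direct loop-based sketch of the sampled STFT X^d(k, Lm); the specgram call in the program below wraps essentially the same computation. The function name is illustrative, and x is assumed to be a column vector:

function B = stft_blocks(x, R, N, L)
% Columns of B are length-N zero-padded DFTs of windowed length-R
% blocks taken every L samples.
w = hamming(R);                      % length-R window (column)
nblocks = floor((length(x) - R)/L) + 1;
B = zeros(N, nblocks);
for m = 1:nblocks
    blk = x((m-1)*L + (1:R)) .* w;   % windowed block
    B(:, m) = fft(blk, N);           % zero-padded to N frequency samples
end
end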

Exercise 2.2 (Solution on p. 36.)


Match each signal to its spectrogram in Figure 2.16.

Figure 2.16: spectrograms (a) and (b) for the matching exercise

2.3.1.2 Spectrogram Example

Figure 2.17

Figure 2.18

The MATLAB program for producing the figures above (Figure 2.17 and Figure 2.18):

% LOAD DATA
load mtlb;
x = mtlb;

figure(1), clf
plot(0:4000,x)
xlabel('n')
ylabel('x(n)')

% SET PARAMETERS
R = 256; % R: block length
window = hamming(R); % window function of length R
N = 512; % N: frequency discretization
L = 35; % L: time lapse between blocks
fs = 7418; % fs: sampling frequency

overlap = R - L;

% COMPUTE SPECTROGRAM
[B,f,t] = specgram(x,N,fs,window,overlap);

% MAKE PLOT
figure(2), clf
imagesc(t,f,log10(abs(B)));
colormap('jet')
axis xy
xlabel('time')
ylabel('frequency')
title('SPECTROGRAM, R = 256')

2.3.1.3 Effect of window length R

Narrow-band spectrogram: better frequency resolution

Figure 2.19

Wide-band spectrogram: better time resolution

Figure 2.20

Here is another example to illustrate the frequency/time resolution trade-off (see Figure 2.19
(Narrow-band spectrogram: better frequency resolution), Figure 2.20 (Wide-band spectrogram: better time
resolution), and Figure 2.21 (Effect of Window Length R)).

Effect of Window Length R

Figure 2.21

2.3.1.4 Effect of L and N

A spectrogram is computed with different parameters:

L ∈ {1, 10}
N ∈ {32, 256}

• L = time lapse between blocks.
• N = FFT length (each block is zero-padded to length N).

In each case, the block length is 30 samples.
Exercise 2.3   (Solution on p. 36.)
For each of the four spectrograms in Figure 2.22, can you tell what L and N are?

Figure 2.22

L and N do not affect the time resolution or the frequency resolution. They only affect the 'pixelation'.

2.3.1.5 Effect of R and L

Shown below are four spectrograms of the same signal. Each spectrogram is computed using a different set
of parameters.

R ∈ {120, 256, 1024}
L ∈ {35, 250}

where

• R = block length
• L = time lapse between blocks.

Exercise 2.4   (Solution on p. 36.)
For each of the four spectrograms in Figure 2.23, match the above values of L and R.

Figure 2.23

If you like, you may listen to this signal with the soundsc command; the data is in the file stft_data.m.
Here (Figure 2.24) is a figure of the signal.

Figure 2.24

Solutions to Exercises in Chapter 2

Solution to Exercise 2.1 (p. 7)
In general, NO. The DTFT exactly corresponds to the continuous-time Fourier transform only when the
signal is bandlimited and sampled at more than twice its highest frequency. The DFT frequency values
exactly correspond to frequency samples of the DTFT only when the discrete-time signal is time-limited.
However, a bandlimited continuous-time signal cannot be time-limited, so in general these conditions cannot
both be satisfied.
It can, however, be true for a small class of analog signals which are not time-limited but happen to
exactly equal zero at all sample times outside of the interval n ∈ [0, N - 1]. The sinc function with a
bandwidth equal to the Nyquist frequency and centered at t = 0 is an example.
Solution to Exercise 2.2 (p. 25)
Solution to Exercise 2.3 (p. 32)
Solution to Exercise 2.4 (p. 33)
Chapter 3

Fast Fourier Transform Algorithms

3.1 Overview of Fast Fourier Transform (FFT) Algorithms[1]

A fast Fourier transform[2], or FFT[3], is not a new transform, but is a computationally efficient algorithm for
computing the DFT (Section 1.1). The length-N DFT, defined as

X(k) = \sum_{n=0}^{N-1} x(n) e^{-j 2πnk/N}   (3.1)

where X(k) and x(n) are in general complex-valued and 0 ≤ k, n ≤ N - 1, requires N complex multiplies to
compute each X(k). Direct computation of all N frequency samples thus requires N^2 complex multiplies and
N(N - 1) complex additions. (This assumes precomputation of the DFT coefficients W_N^{nk} := e^{-j 2πnk/N};
otherwise, the cost is even higher.) For the large DFT lengths used in many applications, N^2 operations
may be prohibitive. (For example, digital terrestrial television broadcast in Europe uses N = 2048 or 8192
OFDM channels, and the SETI[4] project uses up to length-4194304 DFTs.) DFTs are thus almost always
computed in practice by an FFT algorithm[5]. FFTs are very widely used in signal processing, for applications
such as spectrum analysis (Section 2.1) and digital filtering via fast convolution (Chapter 4).

3.1.1 History of the FFT

It is now known that C.F. Gauss[6] invented an FFT in 1805 or so to assist the computation of planetary
orbits via discrete Fourier series. Various FFT algorithms were independently invented over the next two
centuries, but FFTs achieved widespread awareness and impact only with the Cooley and Tukey algorithm
published in 1965, which came at a time of increasing use of digital computers and when the vast range of
applications of numerical Fourier techniques was becoming apparent. Cooley and Tukey's algorithm spawned
a surge of research in FFTs and was also partly responsible for the emergence of Digital Signal Processing
(DSP) as a distinct, recognized discipline. Since then, many different algorithms have been rediscovered or
developed, and efficient FFTs now exist for all DFT lengths.
[1] This content is available online at <http://cnx.org/content/m12026/1.3/>.
[2] The DFT, FFT, and Practical Spectral Analysis <http://cnx.org/content/col10281/latest/>
[3] The DFT, FFT, and Practical Spectral Analysis <http://cnx.org/content/col10281/latest/>
[4] http://en.wikipedia.org/wiki/SETI
[5] The DFT, FFT, and Practical Spectral Analysis <http://cnx.org/content/col10281/latest/>
[6] http://en.wikipedia.org/wiki/Carl_Friedrich_Gauss


3.1.2 Summary of FFT algorithms

The main strategy behind most FFT algorithms is to factor a length-N DFT into a number of shorter-
length DFTs, the outputs of which are reused multiple times (usually in additional short-length DFTs!) to
compute the final results. The lengths of the short DFTs correspond to integer factors of the DFT length, N,
leading to different algorithms for different lengths and factors. By far the most commonly used FFTs select
N = 2^M to be a power of two, leading to the very efficient power-of-two FFT algorithms (Section 3.4.1),
including the decimation-in-time radix-2 FFT (Section 3.4.2.1) and the decimation-in-frequency radix-2
FFT (Section 3.4.2.2) algorithms, the radix-4 FFT (Section 3.4.3) (N = 4^M), and the split-radix FFT
(Section 3.4.4). Power-of-two algorithms gain their high efficiency from extensive reuse of intermediate results
and from the low complexity of length-2 and length-4 DFTs, which require no multiplications. Algorithms
for lengths with repeated common factors (Section 3.5) (such as 2 or 4 in the radix-2 and radix-4 algorithms,
respectively) require extra twiddle factor multiplications between the short-length DFTs, which together
lead to a computational complexity of O(N log N), a very considerable savings over direct computation of
the DFT.
The other major class of algorithms is the Prime-Factor Algorithms (PFA) (Section 3.6). In PFAs,
the short-length DFTs must be of relatively prime lengths. These algorithms gain efficiency by reuse of
intermediate computations and by eliminating twiddle-factor multiplies, but require more operations than the
power-of-two algorithms to compute the short DFTs of various prime lengths. In the end, the computational
costs of the prime-factor and the power-of-two algorithms are comparable for similar lengths, as illustrated in
Choosing the Best FFT Algorithm (Chapter 8). Prime-length DFTs cannot be factored into shorter DFTs,
but in different ways both Rader's conversion (Chapter 6) and the chirp z-transform (Chapter 5) convert
prime-length DFTs into convolutions of other lengths that can be computed efficiently using FFTs via fast
convolution (Chapter 4).
Some applications require only a few DFT frequency samples, in which case Goertzel's algorithm (Sec-
tion 3.3) halves the number of computations relative to the DFT sum. Other applications involve successive
DFTs of overlapped blocks of samples, for which the running FFT (Section 3.2) can be more efficient than
separate FFTs of each block.

3.2 Running FFT[7]

Some applications need DFT (2.3) frequencies of the most recent N samples on an ongoing basis. One
example is DTMF[8], or touch-tone telephone dialing, in which a detection circuit must constantly monitor
the line for two simultaneous frequencies indicating that a telephone button is depressed. In such cases,
most of the data in each successive block of samples is the same, and it is possible to efficiently update the
DFT value from the previous sample to compute that of the current sample. Figure 3.1 illustrates successive
length-4 blocks of data for which successive DFT values may be needed. The running FFT algorithm
described here can be used to compute successive DFT values at a cost of only two complex multiplies and
additions per DFT frequency.
[7] This content is available online at <http://cnx.org/content/m12029/1.5/>.
[8] http://en.wikipedia.org/wiki/DTMF

Figure 3.1: The running FFT efficiently computes DFT values for successive overlapped blocks of samples.

The running FFT algorithm is derived by expressing each DFT sample, X_{n+1}(ω_k), for the next block at
time n + 1 in terms of the previous value, X_n(ω_k), at time n:

X_n(ω_k) = \sum_{p=0}^{N-1} x(n - p) e^{-jω_k p}

X_{n+1}(ω_k) = \sum_{p=0}^{N-1} x(n + 1 - p) e^{-jω_k p}

Let q = p - 1:

X_{n+1}(ω_k) = \sum_{q=-1}^{N-2} x(n - q) e^{-jω_k (q+1)} = e^{-jω_k} \sum_{q=0}^{N-2} x(n - q) e^{-jω_k q} + x(n + 1)

Now add and subtract the q = N - 1 term, e^{-jω_k} x(n - (N - 1)) e^{-jω_k (N-1)} = e^{-jω_k N} x(n - N + 1):

X_{n+1}(ω_k) = e^{-jω_k} \sum_{q=0}^{N-1} x(n - q) e^{-jω_k q} + x(n + 1) - e^{-jω_k N} x(n - N + 1)
             = e^{-jω_k} X_n(ω_k) + x(n + 1) - e^{-jω_k N} x(n - N + 1)   (3.2)
This running FFT algorithm requires only two complex multiplies and adds per update, rather than the N
required if each DFT value were recomputed according to the DFT equation. (For the standard DFT frequencies
ω_k = 2πk/N, the factor e^{-jω_k N} equals 1, so one of the two multiplies disappears.) Another advantage of this
algorithm is that it works for any ω_k, rather than just the standard DFT frequencies. This can make it
advantageous for applications, such as DTMF detection, where only a few arbitrary frequencies are needed.
Successive computation of a specific DFT frequency for overlapped blocks can also be thought of as a
length-N FIR filter[9]. The running FFT is an efficient recursive implementation of this filter for this special
case. Figure 3.2 shows a block diagram of the running FFT algorithm. The running FFT is one way to
compute DFT filterbanks[10]. If a window other than rectangular is desired, a running FFT requires either
a fast recursive implementation of the corresponding windowed, modulated impulse response, or it must
have few non-zero coefficients so that it can be applied after the running FFT update via frequency-domain
convolution. DFT-symmetric raised-cosine windows (Section 2.1.6: Effects of Windowing) are an example.
[9] Digital Filter Design <http://cnx.org/content/col10285/latest/>
[10] "DFT-Based Filterbanks" <http://cnx.org/content/m12771/latest/>

Figure 3.2: Block diagram of the running FFT computation, implemented as a recursive filter
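A sketch of the running-FFT update of (3.2) for a single, arbitrary frequency w0 (a rectangular window); the function name is illustrative, x is a row vector, and the N - 1 samples before it are assumed to be zero so that the first window is causal:

function Xs = running_dft(x, N, w0)
% Sliding-window DFT at frequency w0 via the recursion (3.2):
% two complex multiplies and two adds per output sample.
xp = [zeros(1, N-1), x];                   % assumed zero prehistory
X  = xp(N:-1:1) * exp(-1j*w0*(0:N-1)).';   % direct DFT of first window
Xs = zeros(1, length(x));
Xs(1) = X;
for n = 1:length(x)-1
    X = exp(-1j*w0)*X + xp(N+n) - exp(-1j*w0*N)*xp(n);
    Xs(n+1) = X;
end
end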

3.3 Goertzel's Algorithm[11]

Some applications require only a few DFT frequencies. One example is frequency-shift keying (FSK)[12]
demodulation, in which typically two frequencies are used to transmit binary data; another example is
DTMF[13], or touch-tone telephone dialing, in which a detection circuit must constantly monitor the line
for two simultaneous frequencies indicating that a telephone button is depressed. Goertzel's algorithm [11]
reduces the number of real-valued multiplications by almost a factor of two relative to direct computation via
the DFT equation (2.3). Goertzel's algorithm is thus useful for computing a few frequency values; if many
or most DFT values are needed, FFT algorithms (Section 3.1) that compute all DFT samples in O(N log N)
operations are faster. Goertzel's algorithm can be derived by converting the DFT equation (Section 1.1) into
an equivalent form as a convolution, which can be efficiently implemented as a digital filter. For increased
clarity, in the equations below the complex exponential is denoted as e^{-j 2πk/N} = W_N^k. Note that because
W_N^{-Nk} always equals 1, the DFT equation (Section 1.1) can be rewritten as a convolution, or filtering
operation:

X(k) = \sum_{n=0}^{N-1} x(n) · 1 · W_N^{nk}
     = \sum_{n=0}^{N-1} x(n) W_N^{-Nk} W_N^{nk}
     = \sum_{n=0}^{N-1} x(n) (W_N^{-k})^{N-n}
     = W_N^{-k}( W_N^{-k}( · · · W_N^{-k}(W_N^{-k} x(0) + x(1)) + x(2) · · · ) + x(N - 1))   (3.3)

Note that this last expression can be written in terms of a recursive difference equation[14]

y(n) = W_N^{-k} y(n - 1) + x(n)

where y(-1) = 0. The DFT coefficient equals the output of the difference equation at time n = N:

X(k) = y(N)
[11] This content is available online at <http://cnx.org/content/m12024/1.5/>.
[12] http://en.wikipedia.org/wiki/Frequency-shift_keying
[13] http://en.wikipedia.org/wiki/DTMF
[14] "Difference Equation" <http://cnx.org/content/m10595/latest/>

Expressing the difference equation as a z-transform[15] and multiplying both numerator and denominator by
1 - W_N^k z^{-1} gives the transfer function

H(z) = \frac{Y(z)}{X(z)} = \frac{1}{1 - W_N^{-k} z^{-1}} = \frac{1 - W_N^k z^{-1}}{1 - 2cos(2πk/N) z^{-1} + z^{-2}}

This system can be realized by the structure in Figure 3.3.

Figure 3.3

We want y(n) not for all n, but only for n = N. We can thus compute only the recursive part, or
just the left side of the flow graph in Figure 3.3, for n = [0, 1, ..., N], which involves only a real/complex
product rather than a complex/complex product as in a direct DFT (2.3), plus one complex multiply to get
y(N) = X(k).

note: The input x(N) at time n = N must equal 0! A slightly more efficient alternate imple-
mentation[16] that computes the full recursion only through n = N - 1 and combines the nonzero
operations of the final recursion with the final complex multiply can be found here[17], complete
with pseudocode (for real-valued data).

If the data are real-valued, only real/real multiplications and real additions are needed until the final multiply.

note: The computational cost of Goertzel's algorithm is thus 2N + 2 real multiplies and 4N - 2
real adds, a reduction of almost a factor of two in the number of real multiplies relative to direct
computation via the DFT equation. If the data are real-valued, this cost is almost halved again.
For certain frequencies, additional simplifications requiring even fewer multiplications are possible. (For
example, for the DC (k = 0) frequency, all the multipliers equal 1 and only additions are needed.) A
correspondence by C.G. Boncelet, Jr. [7] describes some of these additional simplifications. Once again,
Goertzel's and Boncelet's algorithms are efficient for a few DFT frequency samples; if more than log N
frequencies are needed, O(N log N) FFT algorithms (Section 3.1) that compute all frequencies simultaneously
will be more efficient.
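A minimal MATLAB sketch of Goertzel's algorithm (the function name goertzel_bin is illustrative); it runs the real-coefficient recursion of the transfer function above through n = N - 1 and folds the final recursion step into the closing complex multiply, as in the more efficient variant noted above:

function Xk = goertzel_bin(x, k)
% Compute the single DFT bin X(k) of the length-N sequence x.
N  = length(x);
w  = 2*pi*k/N;
c  = 2*cos(w);                  % real recursion coefficient
s1 = 0;  s2 = 0;                % states s(n-1), s(n-2)
for n = 1:N                     % one real multiply per sample (real x)
    s0 = x(n) + c*s1 - s2;
    s2 = s1;  s1 = s0;
end
Xk = exp(1j*w)*s1 - s2;         % final step: X(k) = y(N)
end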
15 "Dierence Equation" <http://cnx.org/content/m10595/latest/>
16 http://www.mstarlabs.com/dsp/goertzel/goertzel.html
17 http://www.mstarlabs.com/dsp/goertzel/goertzel.html

3.4 Power-of-Two FFTs

3.4.1 Power-of-two FFTs[18]

FFTs of length N = 2^M equal to a power of two are, by far, the most commonly used. These algorithms are
very efficient, relatively simple, and a single program can compute power-of-two FFTs of different lengths.
As with most FFT algorithms, they gain their efficiency by computing all DFT (Section 1.1) points simul-
taneously through extensive reuse of intermediate computations; they are thus efficient when many DFT
frequency samples are needed. The simplest power-of-two FFTs are the decimation-in-time radix-2 FFT
(Section 3.4.2.1) and the decimation-in-frequency radix-2 FFT (Section 3.4.2.2); they reduce the length-
N = 2^M DFT to a series of length-2 DFT computations with twiddle-factor complex multiplications
between them. The radix-4 FFT algorithm (Section 3.4.3) similarly reduces a length-N = 4^M DFT to a
series of length-4 DFT computations with twiddle-factor multiplies in between. Radix-4 FFTs require only
75% as many complex multiplications as the radix-2 algorithms, although the number of complex additions
remains the same. Radix-8 and higher-radix FFT algorithms can be derived using multi-dimensional index
maps (Section 3.5) to reduce the computational complexity a bit more. However, the split-radix algorithm
(Section 3.4.4) and its recent extensions combine the best elements of the radix-2 and radix-4 algorithms to
obtain lower complexity than either or than any higher radix, requiring only two-thirds as many complex
multiplies as the radix-2 algorithms. All of these algorithms obtain huge savings over direct computation of
the DFT, reducing the complexity from O(N^2) to O(N log N).
The efficiency of an FFT implementation depends on more than just the number of computations. Ef-
ficient FFT programming tricks (Chapter 7) can make up to a several-fold difference in the run-time of FFT
programs. Alternate FFT structures (Section 3.4.2.3) can lead to a more convenient data flow for certain
hardware. As discussed in choosing the best FFT algorithm (Chapter 8), certain hardware is designed for,
and thus most efficient for, FFTs of specific lengths or radices.

3.4.2 Radix-2 Algorithms

3.4.2.1 Decimation-in-time (DIT) Radix-2 FFT[19]

The radix-2 decimation-in-time and decimation-in-frequency (Section 3.4.2.2) fast Fourier transforms (FFTs)
are the simplest FFT algorithms (Section 3.1). Like all FFTs, they gain their speed by reusing the results
of smaller, intermediate computations to compute multiple DFT frequency outputs.

3.4.2.1.1 Decimation in time

The radix-2 decimation-in-time algorithm rearranges the discrete Fourier transform (DFT) equation (Sec-
tion 1.1) into two parts: a sum over the even-numbered discrete-time indices n = [0, 2, 4, ..., N - 2] and a
sum over the odd-numbered indices n = [1, 3, 5, ..., N - 1] as in (3.4):

X(k) = \sum_{n=0}^{N-1} x(n) e^{-j 2πnk/N}
     = \sum_{n=0}^{N/2-1} x(2n) e^{-j 2π(2n)k/N} + \sum_{n=0}^{N/2-1} x(2n + 1) e^{-j 2π(2n+1)k/N}
     = \sum_{n=0}^{N/2-1} x(2n) e^{-j 2πnk/(N/2)} + e^{-j 2πk/N} \sum_{n=0}^{N/2-1} x(2n + 1) e^{-j 2πnk/(N/2)}
     = DFT_{N/2}[x(0), x(2), ..., x(N - 2)] + W_N^k DFT_{N/2}[x(1), x(3), ..., x(N - 1)]   (3.4)

The mathematical simplifications in (3.4) reveal that all DFT frequency outputs X(k) can be computed as
the sum of the outputs of two length-N/2 DFTs, of the even-indexed and odd-indexed discrete-time samples,
respectively, where the odd-indexed short DFT is multiplied by a so-called twiddle factor term W_N^k =
e^{-j 2πk/N}. This is called a decimation in time because the time samples are rearranged in alternating
groups, and a radix-2 algorithm because there are two groups. Figure 3.4 graphically illustrates this form
of the DFT computation, where for convenience the frequency outputs of the length-N/2 DFT of the even-
indexed time samples are denoted G(k) and those of the odd-indexed samples as H(k). Because of the
periodicity with N/2 frequency samples of these length-N/2 DFTs, G(k) and H(k) can be used to compute two
of the length-N DFT frequencies, namely X(k) and X(k + N/2), but with a different twiddle factor:
X(k) = G(k) + W_N^k H(k) and X(k + N/2) = G(k) - W_N^k H(k) for 0 ≤ k < N/2. This
reuse of these short-length DFT outputs gives the FFT its computational savings.
[18] This content is available online at <http://cnx.org/content/m12059/1.2/>.
[19] This content is available online at <http://cnx.org/content/m12016/1.7/>.

Figure 3.4: Decimation in time of a length-N DFT into two length- N2 DFTs followed by a combining
stage.

Whereas direct computation of all N DFT frequencies according to the DFT equation (Section 1.1)
would require N^2 complex multiplies and N^2 - N complex additions (for complex-valued data), by reusing
the results of the two short-length DFTs as illustrated in Figure 3.4, the computational cost is now

New Operation Counts
• 2(N/2)^2 + N = N^2/2 + N complex multiplies
• 2(N/2)(N/2 - 1) + N = N^2/2 complex additions

This simple reorganization and reuse has reduced the total computation by almost a factor of two over direct
DFT (Section 1.1) computation!
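The decimation of (3.4) translates directly into a recursive program; a minimal MATLAB sketch (x assumed to be a row vector whose length is a power of two), far less optimized than the in-place C implementation shown later in this section:

function X = fft_dit2(x)
% Radix-2 decimation-in-time FFT, recursive form.
N = length(x);
if N == 1
    X = x;                            % length-1 DFT is the sample itself
    return;
end
G = fft_dit2(x(1:2:N));               % DFT of even-indexed samples
H = fft_dit2(x(2:2:N));               % DFT of odd-indexed samples
W = exp(-1j*2*pi*(0:N/2-1)/N);        % twiddle factors W_N^k
X = [G + W.*H, G - W.*H];             % combine: X(k) and X(k + N/2)
end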

3.4.2.1.2 Additional Simplification

A basic butterfly operation is shown in Figure 3.5, which requires only N/2 twiddle-factor multiplies per
stage. It is worthwhile to note that, after merging the twiddle factors to a single term on the lower branch,
the remaining butterfly is actually a length-2 DFT! The theory of multi-dimensional index maps (Section 3.5)
shows that this must be the case, and that FFTs of any factorable length may consist of successive stages of
shorter-length FFTs with twiddle-factor multiplications in between.

(a) (b)

Figure 3.5: Radix-2 DIT butterfly simplification: both operations produce the same outputs

3.4.2.1.3 Radix-2 decimation-in-time FFT


The same radix-2 decimation in time can be applied recursively to the two length-N/2 DFTs (Section 1.1) to
save computation. When successively applied until the shorter and shorter DFTs reach length-2, the result
is the radix-2 DIT FFT algorithm (Figure 3.6).

Figure 3.6: Radix-2 Decimation-in-Time FFT algorithm for a length-8 signal



The full radix-2 decimation-in-time decomposition illustrated in Figure 3.6 using the simplified butterflies
(Figure 3.5) involves M = log2 N stages, each with N/2 butterflies per stage. Each butterfly requires 1 complex
multiply and 2 complex adds. The total cost of the algorithm is thus
Computational cost of radix-2 DIT FFT
• (N/2) log2 N complex multiplies
• N log2 N complex adds
This is a remarkable savings over direct computation of the DFT. For example, a length-1024 DFT would
require 1048576 complex multiplications and 1047552 complex additions with direct computation, but only
5120 complex multiplications and 10240 complex additions using the radix-2 FFT, a savings by a factor of
100 or more. The relative savings increase with longer FFT lengths, and are less for shorter lengths.
Modest additional reductions in computation can be achieved by noting that certain twiddle factors,
namely W_N^0, W_N^(N/2), W_N^(N/4), W_N^(N/8), and W_N^(3N/8), require no multiplications, or fewer real
multiplies than other ones. By implementing special butterflies for these twiddle factors as discussed in FFT
algorithm and programming tricks (Chapter 7), the computational cost of the radix-2 decimation-in-time FFT can be
reduced to

• 2N log2 N − 7N + 12 real multiplies

• 3N log2 N − 3N + 4 real additions

note: In a decimation-in-time radix-2 FFT as illustrated in Figure 3.6, the input is in bit-reversed
order (hence "decimation-in-time"). That is, if the time-sample index n is written as a binary
number, the order is that binary number reversed. The bit-reversal process is illustrated for a
length-N = 8 example below.

Example 3.1: N=8

In-order index In-order index in binary Bit-reversed binary Bit-reversed index


0 000 000 0
1 001 100 4
2 010 010 2
3 011 110 6
4 100 001 1
5 101 101 5
6 110 011 3
7 111 111 7

Table 3.1
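
The bit-reversed index can also be computed directly; a small C sketch (not part of the original module):

/* Sketch: bit-reverse an M-bit index, as in Table 3.1 (M = 3 for N = 8). */
unsigned bit_reverse(unsigned n, int M)
{
    unsigned r = 0;
    for (int b = 0; b < M; b++) {   /* move the lowest bit of n into r */
        r = (r << 1) | (n & 1);
        n >>= 1;
    }
    return r;                       /* e.g. bit_reverse(1, 3) == 4 */
}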

It is important to note that, if the input signal data are placed in bit-reversed order before beginning the
FFT computations, the outputs of each butterfly throughout the computation can be placed in the same
memory locations from which the inputs were fetched, resulting in an in-place algorithm that requires no
extra memory to perform the FFT. Most FFT implementations are in-place, and overwrite the input data
with the intermediate values and finally the output.

3.4.2.1.4 Example FFT Code


The following function, written in the C programming language, implements a radix-2 decimation-in-time
FFT. It is designed for computing the DFT of complex-valued inputs to produce complex-valued outputs,
with the real and imaginary parts of each number stored in separate double-precision floating-point arrays. It
is an in-place algorithm, so the intermediate and final output values are stored in the same array as the input
data, which is overwritten. After initializations, the program first bit-reverses the discrete-time samples, as
is typical with a decimation-in-time algorithm (but see alternate FFT structures (Section 3.4.2.3) for DIT
algorithms with other input orders), then computes the FFT in stages according to the above description.
This FFT program (p. 46) uses a standard three-loop structure for the main FFT computation. The
outer loop steps through the stages (each column in Figure 3.6); the middle loop steps through "flights"
(butterflies with the same twiddle factor from each short-length DFT at each stage), and the inner loop
steps through the individual butterflies. This ordering minimizes the number of fetches or computations of
the twiddle-factor values. Since the bit-reverse of a bit-reversed index is the original index, bit-reversal can
be performed fairly simply by swapping pairs of data.

note: While of O(N log N) complexity and thus much faster than a direct DFT, this simple program
is optimized for clarity, not for speed. A speed-optimized program making use of additional efficient
FFT algorithm and programming tricks (Chapter 7) will compute a DFT several times faster on
most machines.

/**********************************************************/
/* fft.c */
/* (c) Douglas L. Jones */
/* University of Illinois at Urbana-Champaign */
/* January 19, 1992 */
/* */
/* fft: in-place radix-2 DIT DFT of a complex input */
/* */
/* input: */
/* n: length of FFT: must be a power of two */
/* m: n = 2**m */
/* input/output */
/* x: double array of length n with real part of data */
/* y: double array of length n with imag part of data */
/* */
/* Permission to copy and use this program is granted */
/* under a Creative Commons "Attribution" license */
/* http://creativecommons.org/licenses/by/1.0/ */
/**********************************************************/
#include <math.h>   /* for cos() and sin(); link with -lm */

void fft(int n, int m, double x[], double y[])
{
int i,j,k,n1,n2;
double c,s,e,a,t1,t2;

j = 0; /* bit-reverse */
n2 = n/2;
for (i=1; i < n - 1; i++)

{
n1 = n2;
while ( j >= n1 )
{
j = j - n1;
n1 = n1/2;
}
j = j + n1;

if (i < j)
{
t1 = x[i];
x[i] = x[j];
x[j] = t1;
t1 = y[i];
y[i] = y[j];
y[j] = t1;
}
}

n1 = 0; /* FFT */
n2 = 1;

for (i=0; i < m; i++)


{
n1 = n2;
n2 = n2 + n2;
e = -6.283185307179586/n2;
a = 0.0;

for (j=0; j < n1; j++)


{
c = cos(a);
s = sin(a);
a = a + e;

for (k=j; k < n; k=k+n2)


{
t1 = c*x[k+n1] - s*y[k+n1];
t2 = s*x[k+n1] + c*y[k+n1];
x[k+n1] = x[k] - t1;
y[k+n1] = y[k] - t2;
x[k] = x[k] + t1;
y[k] = y[k] + t2;
}
}
}

return;
}

3.4.2.2 Decimation-in-Frequency (DIF) Radix-2 FFT20


The radix-2 decimation-in-frequency and decimation-in-time (Section 3.4.2.1) fast Fourier transforms (FFTs)
are the simplest FFT algorithms (Section 3.1). Like all FFTs, they compute the discrete Fourier transform
(DFT) (Section 1.1)
$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j\frac{2\pi nk}{N}} = \sum_{n=0}^{N-1} x(n)\, W_N^{nk} \tag{3.5}$$

where for notational convenience $W_N^k = e^{-j\frac{2\pi k}{N}}$. FFT algorithms gain their speed by reusing the results
of smaller, intermediate computations to compute multiple DFT frequency outputs.

3.4.2.2.1 Decimation in frequency


The radix-2 decimation-in-frequency algorithm rearranges the discrete Fourier transform (DFT) equa-
tion (3.5) into two parts: computation of the even-numbered discrete-frequency indices X (k) for k =
[0, 2, 4, . . . , N − 2] (or X (2r) as in (3.6)) and computation of the odd-numbered indices k = [1, 3, 5, . . . , N − 1]
(or X (2r + 1) as in (3.7))
$$
\begin{aligned}
X(2r) &= \sum_{n=0}^{N-1} x(n)\, W_N^{2rn} \\
&= \sum_{n=0}^{\frac{N}{2}-1} x(n)\, W_N^{2rn} + \sum_{n=0}^{\frac{N}{2}-1} x\!\left(n+\tfrac{N}{2}\right) W_N^{2r\left(n+\frac{N}{2}\right)} \\
&= \sum_{n=0}^{\frac{N}{2}-1} x(n)\, W_N^{2rn} + \sum_{n=0}^{\frac{N}{2}-1} x\!\left(n+\tfrac{N}{2}\right) W_N^{2rn}\cdot 1 \\
&= \sum_{n=0}^{\frac{N}{2}-1} \left(x(n) + x\!\left(n+\tfrac{N}{2}\right)\right) W_{N/2}^{rn} \\
&= \mathrm{DFT}_{N/2}\!\left[x(n) + x\!\left(n+\tfrac{N}{2}\right)\right]
\end{aligned}
\tag{3.6}
$$

$$
\begin{aligned}
X(2r+1) &= \sum_{n=0}^{N-1} x(n)\, W_N^{(2r+1)n} \\
&= \sum_{n=0}^{\frac{N}{2}-1} \left(x(n) + W_N^{\frac{N}{2}}\, x\!\left(n+\tfrac{N}{2}\right)\right) W_N^{(2r+1)n} \\
&= \sum_{n=0}^{\frac{N}{2}-1} \left(x(n) - x\!\left(n+\tfrac{N}{2}\right)\right) W_N^{n}\, W_{N/2}^{rn} \\
&= \mathrm{DFT}_{N/2}\!\left[\left(x(n) - x\!\left(n+\tfrac{N}{2}\right)\right) W_N^{n}\right]
\end{aligned}
\tag{3.7}
$$

The mathematical simplifications in (3.6) and (3.7) reveal that both the even-indexed and odd-indexed
frequency outputs X(k) can each be computed by a length-N/2 DFT. The inputs to these DFTs are sums or
differences of the first and second halves of the input signal, respectively, where the input to the short DFT
producing the odd-indexed frequencies is multiplied by a so-called twiddle factor term $W_N^n = e^{-j\frac{2\pi n}{N}}$.
This is called a decimation in frequency because the frequency samples are computed separately in
alternating groups, and a radix-2 algorithm because there are two groups. Figure 3.7 graphically illustrates
this form of the DFT computation. This conversion of the full DFT into a series of shorter DFTs with a
simple preprocessing step gives the decimation-in-frequency FFT its computational savings.
20 This content is available online at <http://cnx.org/content/m12018/1.6/>.

Figure 3.7: Decimation in frequency of a length-N DFT into two length-N/2 DFTs preceded by a
preprocessing stage.

Whereas direct computation of all N DFT frequencies according to the DFT equation (Section 1.1)
would require N^2 complex multiplies and N^2 − N complex additions (for complex-valued data), by breaking
the computation into two short-length DFTs with some preliminary combining of the data, as illustrated in
Figure 3.7, the computational cost is now
New Operation Counts
• 2(N/2)^2 + N/2 = N^2/2 + N/2 complex multiplies
• 2((N/2)(N/2 − 1)) + N = N^2/2 complex additions

This simple manipulation has reduced the total computational cost of the DFT by almost a factor of two!
The initial combining operations for both short-length DFTs involve parallel groups of two time samples,
x(n) and x(n + N/2). One of these so-called butterfly operations is illustrated in Figure 3.8. There are N/2
butterflies per stage, each requiring a complex addition and subtraction followed by one twiddle-factor
multiplication by $W_N^n = e^{-j\frac{2\pi n}{N}}$ on the lower output branch.

Figure 3.8: DIF butterfly: twiddle factor after length-2 DFT

It is worthwhile to note that the initial add/subtract part of the DIF butterfly is actually a length-2
DFT! The theory of multi-dimensional index maps (Section 3.5) shows that this must be the case, and that
FFTs of any factorable length may consist of successive stages of shorter-length FFTs with twiddle-factor
multiplications in between. It is also worth noting that this butterfly differs from the decimation-in-time
radix-2 butterfly (Figure 3.5) in that the twiddle factor multiplication occurs after the combining.

3.4.2.2.2 Radix-2 decimation-in-frequency algorithm


The same radix-2 decimation in frequency can be applied recursively to the two length-N/2 DFTs (Section 1.1)
to save additional computation. When successively applied until the shorter and shorter DFTs reach length-2,
the result is the radix-2 decimation-in-frequency FFT algorithm (Figure 3.9).

Figure 3.9: Radix-2 decimation-in-frequency FFT for a length-8 signal



The full radix-2 decimation-in-frequency decomposition illustrated in Figure 3.9 requires M = log2 N
stages, each with N/2 butterflies per stage. Each butterfly requires 1 complex multiply and 2 complex adds.
The total cost of the algorithm is thus
Computational cost of radix-2 DIF FFT
• (N/2) log2 N complex multiplies
• N log2 N complex adds
This is a remarkable savings over direct computation of the DFT. For example, a length-1024 DFT would
require 1048576 complex multiplications and 1047552 complex additions with direct computation, but only
5120 complex multiplications and 10240 complex additions using the radix-2 FFT, a savings by a factor
of 100 or more. The relative savings increase with longer FFT lengths, and are less for shorter lengths.
Modest additional reductions in computation can be achieved by noting that certain twiddle factors, namely
W_N^0, W_N^(N/2), W_N^(N/4), W_N^(N/8), and W_N^(3N/8), require no multiplications, or fewer real multiplies than other ones. By
implementing special butterflies for these twiddle factors as discussed in FFT algorithm and programming
tricks (Chapter 7), the computational cost of the radix-2 decimation-in-frequency FFT can be reduced to

• 2N log2 N − 7N + 12 real multiplies

• 3N log2 N − 3N + 4 real additions
The decimation-in-frequency FFT is a flow-graph reversal of the decimation-in-time (Section 3.4.2.1) FFT:
it has the same twiddle factors (in reverse pattern) and the same operation counts.

note: In a decimation-in-frequency radix-2 FFT as illustrated in Figure 3.9, the output is in
bit-reversed order (hence "decimation-in-frequency"). That is, if the frequency-sample index k is
written as a binary number, the order is that binary number reversed. The bit-reversal process is
illustrated here (Example 3.1: N=8).

It is important to note that, if the input data are in order before beginning the FFT computations, the
outputs of each butterfly throughout the computation can be placed in the same memory locations from
which the inputs were fetched, resulting in an in-place algorithm that requires no extra memory to perform
the FFT. Most FFT implementations are in-place, and overwrite the input data with the intermediate values
and finally the output.

3.4.2.3 Alternate FFT Structures21


Bit-reversing (Section 3.4.2.1) the input in decimation-in-time (DIT) FFTs (Section 3.4.2.1) or the output in
decimation-in-frequency (DIF) FFTs (Section 3.4.2.2) can sometimes be inconvenient or inefficient. For such
situations, alternate FFT structures have been developed. Such structures involve the same mathematical
computations as the standard algorithms, but alter the memory locations in which intermediate values are
stored or the order of computation of the FFT butterflies (Section 3.4.2.1).
The structure in Figure 3.10 computes a decimation-in-frequency FFT (Section 3.4.2.2), but remaps
the memory usage so that the input is bit-reversed (Section 3.4.2.1), and the output is in-order as in the
conventional decimation-in-time FFT (Section 3.4.2.1). This alternate structure is still considered a DIF
FFT because the twiddle factors (Section 3.4.2.1) are applied as in the DIF FFT (Section 3.4.2.2). This
structure is useful if for some reason the DIF butterfly is preferred but it is easier to bit-reverse the input.
21 This content is available online at <http://cnx.org/content/m12012/1.6/>.

Figure 3.10: Decimation-in-frequency radix-2 FFT (Section 3.4.2.2) with bit-reversed input. This
is an in-place (Section 3.4.2.1) algorithm in which the same memory can be reused throughout the
computation.

There is a similar structure for the decimation-in-time FFT (Section 3.4.2.1) with in-order inputs and
bit-reversed frequencies. This structure can be useful for fast convolution (Chapter 4) on machines that favor
decimation-in-time algorithms, because the filter can be stored in bit-reversed order, and then the inverse FFT
returns an in-order result without ever bit-reversing any data. As discussed in Efficient FFT Programming
Tricks (Chapter 7), this may save several percent of the execution time.
The structure in Figure 3.11 implements a decimation-in-frequency FFT (Section 3.4.2.2) that has both
input and output in order. It thus avoids the need for bit-reversing altogether. Unfortunately, it destroys
the in-place (Section 3.4.2.1) structure somewhat, making an FFT program more complicated and requiring
more memory; on most machines the resulting cost exceeds the benefits. This structure can be computed in
place if two butterflies are computed simultaneously.

Figure 3.11: Decimation-in-frequency radix-2 FFT with in-order input and output. It can be computed
in-place if two butterflies are computed simultaneously.

The structure in Figure 3.12 has a constant geometry; the connections between memory locations are identical
in each FFT stage (Section 3.4.2.1). Since it is not in-place and requires bit-reversal, it is inconvenient
for software implementation, but can be attractive for a highly parallel hardware implementation because
the connections between stages can be hardwired. An analogous structure exists that has bit-reversed inputs
and in-order outputs.

Figure 3.12: This constant-geometry structure has the same interconnect pattern from stage to stage.
This structure is sometimes useful for special hardware.

3.4.3 Radix-4 FFT Algorithms22


The radix-4 decimation-in-time (Section 3.4.2.1) and decimation-in-frequency (Section 3.4.2.2) fast Fourier
transforms (FFTs) (Section 3.1) gain their speed by reusing the results of smaller, intermediate computations
to compute multiple DFT frequency outputs. The radix-4 decimation-in-time algorithm rearranges the
discrete Fourier transform (DFT) equation (Section 1.1) into four parts: sums over all groups of every
fourth discrete-time index n = [0, 4, 8, . . . , N − 4], n = [1, 5, 9, . . . , N − 3], n = [2, 6, 10, . . . , N − 2] and
n = [3, 7, 11, . . . , N − 1] as in (3.8). (This works out only when the FFT length is a multiple of four.) Just as
in the radix-2 decimation-in-time FFT (Section 3.4.2.1), further mathematical manipulation shows that the
length-N DFT can be computed as the sum of the outputs of four length-N/4 DFTs, of the four decimated
groups of discrete-time samples, where three of them are multiplied by so-called twiddle
factors $W_N^k = e^{-j\frac{2\pi k}{N}}$, $W_N^{2k}$, and $W_N^{3k}$.

22 This content is available online at <http://cnx.org/content/m12027/1.4/>.



$$
\begin{aligned}
X(k) &= \sum_{n=0}^{N-1} x(n)\, e^{-j\frac{2\pi nk}{N}} \\
&= \sum_{n=0}^{\frac{N}{4}-1} x(4n)\, e^{-j\frac{2\pi (4n)k}{N}} + \sum_{n=0}^{\frac{N}{4}-1} x(4n+1)\, e^{-j\frac{2\pi (4n+1)k}{N}} \\
&\quad + \sum_{n=0}^{\frac{N}{4}-1} x(4n+2)\, e^{-j\frac{2\pi (4n+2)k}{N}} + \sum_{n=0}^{\frac{N}{4}-1} x(4n+3)\, e^{-j\frac{2\pi (4n+3)k}{N}} \\
&= \mathrm{DFT}_{N/4}[x(4n)] + W_N^k\, \mathrm{DFT}_{N/4}[x(4n+1)] + W_N^{2k}\, \mathrm{DFT}_{N/4}[x(4n+2)] + W_N^{3k}\, \mathrm{DFT}_{N/4}[x(4n+3)]
\end{aligned}
\tag{3.8}
$$

This is called a decimation in time because the time samples are rearranged in alternating groups,
and a radix-4 algorithm because there are four groups. Figure 3.13 (Radix-4 DIT structure) graphically
illustrates this form of the DFT computation.

Radix-4 DIT structure

Figure 3.13: Decimation in time of a length-N DFT into four length-N/4 DFTs followed by a combining
stage.

Due to the periodicity with N/4 of the short-length DFTs, their outputs for frequency-sample k are reused
to compute X(k), X(k + N/4), X(k + N/2), and X(k + 3N/4). It is this reuse that gives the radix-4 FFT
its efficiency. The computations involved with each group of four frequency samples constitute the radix-4
butterfly, which is shown in Figure 3.14. Through further rearrangement, it can be shown that this
computation can be simplified to three twiddle-factor multiplies and a length-4 DFT! The theory of multi-dimensional
index maps (Section 3.5) shows that this must be the case, and that FFTs of any factorable
length may consist of successive stages of shorter-length FFTs with twiddle-factor multiplications in between.
The length-4 DFT requires no multiplies and only eight complex additions (this efficient computation can
be derived using a radix-2 FFT (Section 3.4.2.1)).
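
As an illustration (a sketch, not the module's code), the multiply-free length-4 DFT can be written with exactly eight complex additions, using the fact that multiplication by −j merely swaps real and imaginary parts with a sign change:

/* Sketch: length-4 DFT with 8 complex additions and no multiplies.
   xr/xi hold the real and imaginary parts of the 4 inputs; the results
   overwrite them.  Multiplying by -j maps (a + jb) to (b - ja). */
void dft4(double xr[4], double xi[4])
{
    double s02r = xr[0] + xr[2], s02i = xi[0] + xi[2]; /* x0 + x2 */
    double d02r = xr[0] - xr[2], d02i = xi[0] - xi[2]; /* x0 - x2 */
    double s13r = xr[1] + xr[3], s13i = xi[1] + xi[3]; /* x1 + x3 */
    double d13r = xr[1] - xr[3], d13i = xi[1] - xi[3]; /* x1 - x3 */

    xr[0] = s02r + s13r;  xi[0] = s02i + s13i;  /* X(0) */
    xr[1] = d02r + d13i;  xi[1] = d02i - d13r;  /* X(1) = d02 - j*d13 */
    xr[2] = s02r - s13r;  xi[2] = s02i - s13i;  /* X(2) */
    xr[3] = d02r - d13i;  xi[3] = d02i + d13r;  /* X(3) = d02 + j*d13 */
}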

(a) (b)

Figure 3.14: The radix-4 DIT butterfly can be simplified to a length-4 DFT preceded by three
twiddle-factor multiplies.

If the FFT length N = 4^M, the shorter-length DFTs can be further decomposed recursively in the same
manner to produce the full radix-4 decimation-in-time FFT. As in the radix-2 decimation-in-time FFT
(Section 3.4.2.1), each stage of decomposition creates additional savings in computation. To determine the
total computational cost of the radix-4 FFT, note that there are M = log4 N = (log2 N)/2 stages, each with N/4
butterflies per stage. Each radix-4 butterfly requires 3 complex multiplies and 8 complex additions. The
total cost is then
Radix-4 FFT Operation Counts
• 3 (N/4) (log2 N)/2 = (3/8) N log2 N complex multiplies (75% of a radix-2 FFT)
• 8 (N/4) (log2 N)/2 = N log2 N complex adds (same as a radix-2 FFT)

The radix-4 FFT requires only 75% as many complex multiplies as the radix-2 (Section 3.4.2.1) FFT,
although it uses the same number of complex additions. These additional savings make it a widely-used
FFT algorithm.
The decimation-in-time operation regroups the input samples at each successive stage of decomposition,
resulting in a "digit-reversed" input order. That is, if the time-sample index n is written as a base-4 number,
the order is that base-4 number reversed. The digit-reversal process is illustrated for a length-N = 64
example below.
Example 3.2: N = 64 = 4^3

Original Number Original Digit Order Reversed Digit Order Digit-Reversed Number
0 000 000 0
1 001 100 16
2 002 200 32
3 003 300 48
4 010 010 4
5 011 110 20
.. .. .. ..
. . . .

Table 3.2

It is important to note that, if the input signal data are placed in digit-reversed order before beginning the
FFT computations, the outputs of each butterfly throughout the computation can be placed in the same
memory locations from which the inputs were fetched, resulting in an in-place algorithm that requires
no extra memory to perform the FFT. Most FFT implementations are in-place, and overwrite the input
data with the intermediate values and finally the output. A slight rearrangement within the radix-4 FFT
introduced by Burrus [5] allows the inputs to be arranged in bit-reversed (Section 3.4.2.1) rather than
digit-reversed order.
A radix-4 decimation-in-frequency (Section 3.4.2.2) FFT can be derived similarly to the radix-2 DIF
FFT (Section 3.4.2.2), by separately computing all four groups of every fourth output frequency sample.
The DIF radix-4 FFT is a flow-graph reversal of the DIT radix-4 FFT, with the same operation counts
and twiddle factors in the reversed order. The output ends up in digit-reversed order for an in-place DIF
algorithm.
Exercise 3.1 (Solution on p. 69.)
How do we derive a radix-4 algorithm when N = 2 · 4^M?

3.4.4 Split-radix FFT Algorithms23


The split-radix algorithm, first clearly described and named by Duhamel and Hollman[8] in 1984, required
fewer total multiply and add operations than any previous power-of-two algorithm. (Yavne [18]
first derived essentially the same algorithm in 1968, but the description was so atypical that the work was
largely neglected.) For a time many FFT experts thought it to be optimal in terms of total complexity, but
even more efficient variations have more recently been discovered by Johnson and Frigo [14].
The split-radix algorithm can be derived by careful examination of the radix-2 (Section 3.4.2.1) and radix-4
(Section 3.4.3) flowgraphs as in Figure 3.15 below. While in most places the radix-4 (Section 3.4.3) algorithm
has fewer nontrivial twiddle factors, in some places the radix-2 (Section 3.4.2.1) actually lacks twiddle
factors present in the radix-4 (Section 3.4.3) structure or those twiddle factors simplify to multiplication by
−j, which actually requires only additions. By mixing radix-2 (Section 3.4.2.1) and radix-4 (Section 3.4.3)
computations appropriately, an algorithm of lower complexity than either can be derived.
23 This content is available online at <http://cnx.org/content/m12031/1.5/>.

Motivation for split-radix algorithm

(a) radix-2 (b) radix-4

Figure 3.15: See Decimation-in-Time (DIT) Radix-2 FFT (Section 3.4.2.1) and Radix-4 FFT Algorithms
(Section 3.4.3) for more information on these algorithms.

An alternative derivation notes that radix-2 butterflies of the form shown in Figure 3.16 can merge twiddle
factors from two successive stages to eliminate one-third of them; hence, the split-radix algorithm requires
only about two-thirds as many multiplications as a radix-2 FFT.

(a) (b)

Figure 3.16: Note that these two butterflies are equivalent



The split-radix algorithm can also be derived by mixing the radix-2 (Section 3.4.2.1) and radix-4 (Section 3.4.3)
decompositions.
DIT Split-radix derivation

$$
\begin{aligned}
X(k) &= \sum_{n=0}^{\frac{N}{2}-1} x(2n)\, e^{-j\frac{2\pi (2n)k}{N}} + \sum_{n=0}^{\frac{N}{4}-1} x(4n+1)\, e^{-j\frac{2\pi (4n+1)k}{N}} + \sum_{n=0}^{\frac{N}{4}-1} x(4n+3)\, e^{-j\frac{2\pi (4n+3)k}{N}} \\
&= \mathrm{DFT}_{N/2}\left[x(2n)\right] + W_N^k\, \mathrm{DFT}_{N/4}\left[x(4n+1)\right] + W_N^{3k}\, \mathrm{DFT}_{N/4}\left[x(4n+3)\right]
\end{aligned}
\tag{3.9}
$$

Figure 3.17 illustrates the resulting split-radix butterfly.

Decimation-in-Time Split-Radix Butterfly

Figure 3.17: The split-radix butterfly mixes radix-2 and radix-4 decompositions and is L-shaped

Further decomposition of the half- and quarter-length DFTs yields the full split-radix algorithm. The
mix of different-length FFTs in different parts of the flowgraph results in a somewhat irregular algorithm;
Sorensen et al.[12] show how to adjust the computation such that the data retains the simpler radix-2
bit-reverse order. A decimation-in-frequency split-radix FFT can be derived analogously.



Figure 3.18: The split-radix transform has L-shaped butterflies

The multiplicative complexity of the split-radix algorithm is only about two-thirds that of the radix-2
FFT, and is better than the radix-4 FFT or any higher power-of-two radix as well. The additions within
the complex twiddle-factor multiplies are similarly reduced, but since the underlying butterfly tree remains
the same in all power-of-two algorithms, the butterfly additions remain the same and the overall reduction
in additions is much less.

Operation Counts

             Complex M/As        Real M/As (4/2)                                  Real M/As (3/3)
Multiplies   O[(N/3) log2 N]     (4/3) N log2 N − (38/9) N + 6 + (2/9)(−1)^M      N log2 N − 3N + 4
Additions    O[N log2 N]         (8/3) N log2 N − (16/9) N + 2 + (2/9)(−1)^M      3N log2 N − 3N + 4

Table 3.3

Comments
• The split-radix algorithm has a somewhat irregular structure. Successful programs have been written
(Sorensen[12]) for uni-processor machines, but it may be difficult to efficiently code the split-radix
algorithm for vector or multi-processor machines.
• G. Bruun's algorithm[2] requires only N − 2 more operations than the split-radix algorithm and has a
regular structure, so it might be better for multi-processor or special-purpose hardware.
• The execution time of FFT programs generally depends more on compiler- or hardware-friendly software
design than on the exact computational complexity. See Efficient FFT Algorithm and Programming
Tricks (Chapter 7) for further pointers and links to good code.

3.5 Multidimensional Index Maps 24

3.5.1 Multidimensional Index Maps for DIF and DIT algorithms


3.5.1.1 Decimation-in-time algorithm
Radix-2 DIT (Section 3.4.2.1):

$$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk} = \sum_{n=0}^{\frac{N}{2}-1} x(2n)\, W_N^{2nk} + \sum_{n=0}^{\frac{N}{2}-1} x(2n+1)\, W_N^{(2n+1)k}$$

Formalization: Let n = n1 + 2n2, with n1 = [0, 1] and n2 = [0, 1, 2, . . . , N/2 − 1]:

$$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk} = \sum_{n_1=0}^{1}\left(\sum_{n_2=0}^{\frac{N}{2}-1} x(n_1 + 2n_2)\, W_N^{(n_1 + 2n_2)k}\right)$$

Also, let k = (N/2) k1 + k2, with k1 = [0, 1] and k2 = [0, 1, 2, . . . , N/2 − 1].

note: As long as there is a one-to-one correspondence between the original indices [n, k] =
[0, 1, 2, . . . , N − 1] and the n, k generated by the index map, the computation is the same; only
the order in which the sums are done is changed.

Rewriting the DFT (2.3) formula in terms of the index map n = n1 + 2n2, k = (N/2) k1 + k2:

$$
\begin{aligned}
X(k) &= X\!\left(\tfrac{N}{2}k_1 + k_2\right) \\
&= \sum_{n=0}^{N-1} x(n)\, W_N^{n\left(\frac{N}{2}k_1 + k_2\right)} \\
&= \sum_{n_1=0}^{1}\sum_{n_2=0}^{\frac{N}{2}-1} x(n_1 + 2n_2)\, W_N^{(n_1 + 2n_2)\left(\frac{N}{2}k_1 + k_2\right)} \\
&= \sum_{n_1=0}^{1}\sum_{n_2=0}^{\frac{N}{2}-1} x([n_1, n_2])\, W_N^{\frac{N}{2}n_1 k_1}\, W_N^{n_1 k_2}\, W_N^{N n_2 k_1}\, W_N^{2 n_2 k_2} \\
&= \sum_{n_1=0}^{1}\sum_{n_2=0}^{\frac{N}{2}-1} x([n_1, n_2])\, W_2^{n_1 k_1}\, W_N^{n_1 k_2}\cdot 1\cdot W_{N/2}^{n_2 k_2} \\
&= \sum_{n_1=0}^{1} W_2^{n_1 k_1}\, W_N^{n_1 k_2} \left(\sum_{n_2=0}^{\frac{N}{2}-1} x([n_1, n_2])\, W_{N/2}^{n_2 k_2}\right)
\end{aligned}
\tag{3.10}
$$

note: Key to FFT is choosing index map so that one of the cross-terms disappears!

Exercise 3.2
What is an index map for a radix-4 (Section 3.4.3) DIT algorithm?
24 This content is available online at <http://cnx.org/content/m12025/1.3/>.

Exercise 3.3
What is an index map for a radix-4 (Section 3.4.3) DIF algorithm?
Exercise 3.4
What is an index map for a radix-3 DIT algorithm? (N a multiple of 3)
For arbitrary composite N = N1 N2, we can define an index map

n = n1 + N1 n2
k = N2 k1 + k2
n1 = [0, 1, 2, . . . , N1 − 1]
k1 = [0, 1, 2, . . . , N1 − 1]
n2 = [0, 1, 2, . . . , N2 − 1]
k2 = [0, 1, 2, . . . , N2 − 1]

$$
\begin{aligned}
X(k) &= X(k_1, k_2) \\
&= \sum_{n_1=0}^{N_1-1}\sum_{n_2=0}^{N_2-1} x(n_1, n_2)\, W_N^{N_2 n_1 k_1}\, W_N^{n_1 k_2}\, W_N^{N k_1 n_2}\, W_N^{N_1 n_2 k_2} \\
&= \sum_{n_1=0}^{N_1-1}\sum_{n_2=0}^{N_2-1} x(n_1, n_2)\, W_{N_1}^{n_1 k_1}\, W_N^{n_1 k_2}\, W_{N_2}^{n_2 k_2} \\
&= \mathrm{DFT}_{n_1, N_1}\left[W_N^{n_1 k_2}\, \mathrm{DFT}_{n_2, N_2}\left[x(n_1, n_2)\right]\right]
\end{aligned}
\tag{3.11}
$$

Computational cost in multiplies, "Common Factor Algorithm (CFA)"

• N1 length-N2 DFTs ⇒ N1 N2^2
• N twiddle factors ⇒ N
• N2 length-N1 DFTs ⇒ N2 N1^2
• Total: N1 N2^2 + N1 N2 + N2 N1^2 = N (N1 + N2 + 1)
"Direct": N^2 = N (N1 N2)
Example 3.3
N1 = 16
N2 = 15
N = 240
direct: 240^2 = 57600
CFA: N (N1 + N2 + 1) = 240 × 32 = 7680
Tremendous savings for any composite N
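
A direct C sketch of this two-factor decomposition (our illustration of (3.11), not the module's code; the short DFTs are computed directly here for clarity, whereas a real CFA would replace them with FFTs):

/* Sketch: length-N DFT via the common-factor map n = n1 + N1*n2,
   k = N2*k1 + k2, following (3.11).  Requires C99 (complex, VLAs). */
#include <complex.h>
#include <math.h>

void cfa_dft(int N1, int N2, const double complex x[], double complex X[])
{
    int N = N1 * N2;
    double complex t[N];                       /* intermediate array */

    /* For each n1: a length-N2 DFT over n2, then the twiddle W_N^{n1 k2} */
    for (int n1 = 0; n1 < N1; n1++)
        for (int k2 = 0; k2 < N2; k2++) {
            double complex acc = 0;
            for (int n2 = 0; n2 < N2; n2++)
                acc += x[n1 + N1*n2] * cexp(-I*2*M_PI*n2*k2/N2);
            t[n1 + N1*k2] = acc * cexp(-I*2*M_PI*n1*k2/N);
        }

    /* For each k2: a length-N1 DFT over n1; output index k = N2*k1 + k2 */
    for (int k2 = 0; k2 < N2; k2++)
        for (int k1 = 0; k1 < N1; k1++) {
            double complex acc = 0;
            for (int n1 = 0; n1 < N1; n1++)
                acc += t[n1 + N1*k2] * cexp(-I*2*M_PI*n1*k1/N1);
            X[N2*k1 + k2] = acc;
        }
}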

Pictorial Representations

(a) Emphasizes Multi-dimensional structure



Exercise 3.5
Can the composite CFAs be implemented in-place?
Exercise 3.6
What do we do with N = N1 N2 N3 ?

3.6 The Prime Factor Algorithm 25

3.6.1 General Index Maps


n = (K1 n1 + K2 n2) mod N

k = (K3 k1 + K4 k2) mod N

n1 = [0, 1, . . . , N1 − 1]

k1 = [0, 1, . . . , N1 − 1]

n2 = [0, 1, . . . , N2 − 1]

k2 = [0, 1, . . . , N2 − 1]
The basic idea is to simply reorder the DFT (2.3) computation to expose the redundancies in the DFT
(2.3), and exploit these to reduce computation!
Three conditions must be satisfied to make this map (p. 65) serve our purposes:

1. Each map must be one-to-one from 0 to N − 1, because we want to do the same computation, just in
a different order.
2. The map must be cleverly chosen so that computation is reduced.
3. The map should be chosen to make the short-length transforms be DFTs (2.3). (Not essential, since
fast algorithms for short-length DFT (2.3)-like computations could be developed, but it makes our
work easier.)

3.6.1.1 Conditions for one-to-oneness of general index map


3.6.1.1.1 Case I
N1, N2 relatively prime (greatest common divisor = 1), i.e. gcd(N1, N2) = 1:
K1 = a N2 and/or K2 = b N1, and gcd(K1, N1) = 1, gcd(K2, N2) = 1

3.6.1.1.2 Case II
N1, N2 not relatively prime: gcd(N1, N2) > 1:
K1 = a N2 and K2 ≠ b N1 and gcd(a, N1) = 1, gcd(K2, N2) = 1, or K1 ≠ a N2 and K2 = b N1 and
gcd(K1, N1) = 1, gcd(b, N2) = 1, where K1, K2, K3, K4, N1, N2, a, b are integers

note: Requires number-theory/abstract-algebra concepts. Reference: C.S. Burrus [3]

note: Conditions of one-to-oneness must apply to both k and n

25 This content is available online at <http://cnx.org/content/m12033/1.3/>.



3.6.1.2 Conditions for arithmetic savings


$$
\begin{aligned}
X(k_1, k_2) &= \sum_{n_1=0}^{N_1-1}\sum_{n_2=0}^{N_2-1} x(n_1, n_2)\, W_N^{(K_1 n_1 + K_2 n_2)(K_3 k_1 + K_4 k_2)} \\
&= \sum_{n_1=0}^{N_1-1}\sum_{n_2=0}^{N_2-1} x(n_1, n_2)\, W_N^{K_1 K_3 n_1 k_1}\, W_N^{K_1 K_4 n_1 k_2}\, W_N^{K_2 K_3 n_2 k_1}\, W_N^{K_2 K_4 n_2 k_2}
\end{aligned}
\tag{3.12}
$$

• (K1 K4) mod N = 0 exclusive-or (K2 K3) mod N = 0 ⇒ Common Factor Algorithm (CFA). Then

$$X(k) = \mathrm{DFT}_{N_i}\left[\text{twiddle factors} \cdot \mathrm{DFT}_{N_j}\left[x(n_1, n_2)\right]\right]$$

• (K1 K4) mod N = 0 and (K2 K3) mod N = 0 ⇒ Prime Factor Algorithm (PFA). Then

$$X(k) = \mathrm{DFT}_{N_i}\left[\mathrm{DFT}_{N_j}\left[x(n_1, n_2)\right]\right]$$

No twiddle factors!
note: A PFA exists only and always for relatively prime N1 , N2

3.6.1.3 Conditions for short-length transforms to be DFTs


(K1 K3) mod N = N2 and (K2 K4) mod N = N1

note: A convenient choice giving a PFA is
K1 = N2, K2 = N1, K3 = N2 (N2^−1 mod N1), K4 = N1 (N1^−1 mod N2), where
N1^−1 mod N2 is an integer such that (N1 N1^−1) mod N2 = 1.

Example 3.4
N1 = 3, N2 = 5, N = 15
n = (5 n1 + 3 n2) mod 15

k = (10 k1 + 6 k2) mod 15

1. Checking conditions for one-to-oneness:

5 = K1 = a N2 = 5a
3 = K2 = b N1 = 3b
gcd(5, 3) = 1
gcd(3, 5) = 1
10 = K3 = a N2 = 5a
6 = K4 = b N1 = 3b
gcd(10, 3) = 1
gcd(6, 5) = 1

2. Checking conditions for reduced computation:

(K1 K4) mod 15 = (5 × 6) mod 15 = 0
(K2 K3) mod 15 = (3 × 10) mod 15 = 0

3. Checking conditions for making the short-length transforms be DFTs:

(K1 K3) mod 15 = (5 × 10) mod 15 = 5 = N2
(K2 K4) mod 15 = (3 × 6) mod 15 = 3 = N1

Therefore, this is a prime factor map.

2-D map

Figure 3.20: n = (5n1 + 3n2 ) mod15 and k = (10k1 + 6k2 ) mod15
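
The one-to-one property of these maps is easy to check numerically; a small C program (our sketch, not part of the module) prints the two-dimensional index arrangements of Figure 3.20:

/* Sketch: print the PFA index maps of Example 3.4 (N1 = 3, N2 = 5, N = 15).
   Each (n1, n2) pair maps to a unique n in 0..14, and likewise for k. */
#include <stdio.h>

int main(void)
{
    const int N = 15;
    for (int n1 = 0; n1 < 3; n1++) {        /* input map n */
        for (int n2 = 0; n2 < 5; n2++)
            printf("%3d", (5*n1 + 3*n2) % N);
        printf("\n");
    }
    printf("\n");
    for (int k1 = 0; k1 < 3; k1++) {        /* output map k */
        for (int k2 = 0; k2 < 5; k2++)
            printf("%3d", (10*k1 + 6*k2) % N);
        printf("\n");
    }
    return 0;
}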

Operation Counts
• N2 length-N1 DFTs + N1 length-N2 DFTs:

N2 N1^2 + N1 N2^2 = N (N1 + N2) complex multiplies

• Suppose N = N1 N2 N3 . . . NM; then
N (N1 + N2 + · · · + NM) complex multiplies

note: radix-2 (Section 3.4.2.1) and radix-4 (Section 3.4.3) FFTs eliminate all multiplies in the short-length
DFTs, but have twiddle factors; the PFA eliminates all twiddle factors, but ends up with multiplies in
the short-length DFTs (2.3). Surprisingly, total operation counts end up being very similar for similar
lengths.

Solutions to Exercises in Chapter 3


Solution to Exercise 3.1 (p. 58)
Perform a radix-2 decomposition for one stage, then radix-4 decompositions of all subsequent shorter-length
DFTs.
Chapter 4

Fast Convolution 1

4.1 Fast Circular Convolution


Since

$$y(n) = \sum_{m=0}^{N-1} x(m)\, h\big((n - m) \bmod N\big) \quad\text{is equivalent to}\quad Y(k) = X(k)\, H(k),$$

y(n) can be computed as y(n) = IDFT[ DFT[x(n)] DFT[h(n)] ]


Cost
• Direct
· N^2 complex multiplies.
· N (N − 1) complex adds.
• Via FFTs
· 3 FFTs + N multiplies.
· N + (3N/2) log2 N complex multiplies.
· 3 (N log2 N) complex adds.
If H(k) can be precomputed, the cost is only 2 FFTs + N multiplies.
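
As a concrete sketch (our illustration, reusing the fft() routine listed in Section 3.4.2.1 together with the identity IDFT(Y) = conj(DFT(conj(Y)))/N), fast circular convolution might be coded as:

/* Sketch: fast circular convolution of complex sequences x and h, both of
   length n = 2**m.  x is overwritten with the result y; h is destroyed. */
void fft(int n, int m, double x[], double y[]);   /* from the earlier listing */

void fast_circular_conv(int n, int m,
                        double xr[], double xi[],
                        double hr[], double hi[])
{
    fft(n, m, xr, xi);                 /* X(k) */
    fft(n, m, hr, hi);                 /* H(k) */
    for (int k = 0; k < n; k++) {      /* Y(k) = X(k) H(k), stored conjugated */
        double tr = xr[k]*hr[k] - xi[k]*hi[k];
        double ti = xr[k]*hi[k] + xi[k]*hr[k];
        xr[k] = tr;  xi[k] = -ti;
    }
    fft(n, m, xr, xi);                 /* DFT of conj(Y) */
    for (int k = 0; k < n; k++) {      /* conjugate and scale: y = IDFT[Y] */
        xr[k] /= n;  xi[k] = -xi[k]/n;
    }
}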

4.2 Fast Linear Convolution


The DFT (2.3) produces circular convolution. For linear convolution, we must zero-pad sequences so that circular
wrap-around always wraps over zeros.
1 This content is available online at <http://cnx.org/content/m12022/1.5/>.


Figure 4.1

To achieve linear convolution using fast circular convolution, we must use zero-padded DFTs of length
N ≥L+M −1

Figure 4.2

Choose shortest convenient N (usually smallest power-of-two greater than or equal to L + M − 1)

y (n) = IDFTN [DFTN [x (n)] DFTN [h (n)]]

note: There is some inefficiency when compared to circular convolution due to the longer zero-padded
DFTs (2.3). Still, this offers an O(N / log2 N) savings over direct computation.

4.3 Running Convolution


Suppose L = ∞, as in a real-time filter application, or L ≫ M. There are efficient block methods for
computing fast convolution.

4.3.1 Overlap-Save (OLS) Method


Note that if a length-M filter h(n) is circularly convolved with a length-N segment of a signal x(n),

Figure 4.3

the first M − 1 samples are wrapped around and thus are incorrect. However, for M − 1 ≤ n ≤ N − 1, the
convolution is linear convolution, so these samples are correct. Thus N − M + 1 good outputs are produced
for each length-N circular convolution.
The Overlap-Save Method: Break the long signal into successive blocks of N samples, each block overlapping
the previous block by M − 1 samples. Perform circular convolution of each block with the filter h(m). Discard
the first M − 1 points in each output block, and concatenate the remaining points to create y(n).

Figure 4.4

The computational cost per output sample, using length-N = 2^n FFTs, is (assuming a precomputed H(k))
2 FFTs and N multiplies per block:

$$\frac{2\left(\frac{N}{2}\log_2 N\right) + N}{N-M+1} = \frac{N\left(\log_2 N + 1\right)}{N-M+1} \text{ complex multiplies}$$

$$\frac{2\left(N\log_2 N\right)}{N-M+1} \text{ complex adds}$$

Compare to M multiplies and M − 1 adds per output point for the direct method. For a given M, the optimal N can
be determined by finding the N minimizing the operation counts. Usually, the optimal N is 4M ≤ Nopt ≤ 8M.
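
A sketch of the block processing (illustrative only; circ_conv() stands for any length-N circular convolver, such as the FFT-based method of Section 4.1 with H(k) precomputed):

/* assumed helper (hypothetical): length-N circular convolution of block with h */
void circ_conv(int N, const double block[], const double h[], double out[]);

void overlap_save(int N, int M, long L,
                  const double x[], const double h[], double y[])
{
    double block[N], out[N];                /* C99 variable-length arrays */
    long step = N - M + 1;                  /* good outputs per block     */

    /* Note: to produce y(0..M-2), x should be preceded by M-1 zeros. */
    for (long pos = 0; pos + N <= L; pos += step) {
        for (int i = 0; i < N; i++)         /* overlaps previous block by M-1 */
            block[i] = x[pos + i];
        circ_conv(N, block, h, out);
        for (int i = M - 1; i < N; i++)     /* discard first M-1 wrapped outputs */
            y[pos + i] = out[i];
    }
}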

4.3.2 Overlap-Add (OLA) Method


Zero-pad length-L blocks by M − 1 samples.

Figure 4.5

Add successive blocks, overlapped by M − 1 samples, so that the tails sum to produce the complete linear
convolution.

Figure 4.6

Computational Cost: Two length-N = L + M − 1 FFTs and N multiplies and M − 1 adds per L output
points; essentially the same as the OLS method.
Chapter 5

Chirp-z Transform 1

Let $z_k = A W^{-k}$, where $A = A_o e^{j\theta_o}$ and $W = W_o e^{-j\phi_o}$.
We wish to compute M samples, k = [0, 1, 2, . . . , M − 1], of

$$X(z_k) = \sum_{n=0}^{N-1} x(n)\, z_k^{-n} = \sum_{n=0}^{N-1} x(n)\, A^{-n} W^{nk}$$

1 This content is available online at <http://cnx.org/content/m12013/1.4/>.


Figure 5.1

 
Note that $(k-n)^2 = n^2 - 2nk + k^2 \Rightarrow nk = \frac{1}{2}\left(n^2 + k^2 - (k-n)^2\right)$, so

$$X(z_k) = \sum_{n=0}^{N-1} x(n)\, A^{-n}\, W^{\frac{n^2}{2}}\, W^{\frac{k^2}{2}}\, W^{-\frac{(k-n)^2}{2}} = W^{\frac{k^2}{2}} \sum_{n=0}^{N-1} \left(x(n)\, A^{-n}\, W^{\frac{n^2}{2}}\right) W^{-\frac{(k-n)^2}{2}}$$

Thus, $X(z_k)$ can be computed by



1. Premultiplying x(n) by $A^{-n} W^{\frac{n^2}{2}}$, n = [0, 1, . . . , N − 1], to make y(n);
2. Linearly convolving y(n) with $W^{-\frac{(k-n)^2}{2}}$;
3. Postmultiplying by $W^{\frac{k^2}{2}}$ to get X(z_k).

Steps 1 and 3 require N and M operations, respectively. Step 2 can be performed efficiently using fast
convolution.
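
For concreteness, a minimal C sketch (an assumption, not the module's code) of these three steps for the special case A = 1, W = e^(−jw); the O(NM) convolution in step 2 is written directly here, whereas in practice it would use FFT-based fast convolution:

#include <complex.h>
#include <math.h>

void chirp_z(int N, int M, double w,
             const double complex x[], double complex X[])
{
    double complex y[N];                      /* C99 variable-length array */

    for (int n = 0; n < N; n++)               /* 1. premultiply by W^{n^2/2} */
        y[n] = x[n] * cexp(-I*w*n*n/2.0);

    for (int k = 0; k < M; k++) {             /* 2. convolve with W^{-(k-n)^2/2} */
        double complex acc = 0;
        for (int n = 0; n < N; n++)
            acc += y[n] * cexp(I*w*(k-n)*(k-n)/2.0);
        X[k] = acc * cexp(-I*w*k*k/2.0);      /* 3. postmultiply by W^{k^2/2} */
    }
}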

Figure 5.2

$W^{-\frac{n^2}{2}}$ is required only for $-(N-1) \leq n \leq M-1$, so this linear convolution can be implemented
with length-L ≥ N + M − 1 FFTs.

note: Wrap $W^{-\frac{n^2}{2}}$ around L when implementing with circular convolution.

So, a weird-length DFT can be implemented relatively efficiently using power-of-two algorithms via the
chirp-z transform.
Also useful for "zoom-FFTs".
Chapter 6

FFTs of prime length and Rader's


conversion 1

The power-of-two FFT algorithms (Section 3.4.1), such as the radix-2 (Section 3.4.2.1) and radix-4 (Section
3.4.3) FFTs, and the common-factor (Section 3.5) and prime-factor (Section 3.6) FFTs, achieve great
reductions in the computational complexity of the DFT (Section 1.1) when the length, N, is a composite number.
DFTs of prime length are sometimes needed, however, particularly for the short-length DFTs in common-factor
or prime-factor algorithms. The methods described here, along with the composite-length algorithms,
allow fast computation of DFTs of any length.
There are two main ways of performing DFTs of prime length:

1. Rader's conversion, which is most efficient, and the
2. Chirp-z transform (Chapter 5), which is simpler and more general.

Oddly enough, both work by turning prime-length DFTs into convolution! The resulting convolutions can
then be computed efficiently by either
1. fast convolution (Chapter 4) via composite-length FFTs (simpler) or by
2. Winograd techniques (more efficient)

6.1 Rader's Conversion


Rader's conversion is a one-dimensional index-mapping (Section 3.5) scheme that turns a length-N DFT
(2.3) (N prime) into a length-( N − 1) convolution and a few additions. Rader's conversion works only for
prime-length N .
An index map simply rearranges the order of the sum operation in the DFT denition (Section 1.1).
Because addition is a commutative operation, the same mathematical result is produced from any order, as
long as all of the same terms are added once and only once. (This is the condition that denes an index
map.) Unlike the multi-dimensional index maps (Section 3.5) used in deriving common factor (Section 3.5)
and prime-factor FFTs (Section 3.6), Rader's conversion uses a one-dimensional index map in a nite group
of N integers: k = (rm ) modN

6.1.1 Fact from number theory


If N is prime, there exists an integer "r" called a primitive root, such that the index map k = (r^m) mod N,
m = [0, 1, 2, . . . , N − 2], uniquely generates all elements k = [1, 2, 3, . . . , N − 1]
1 This content is available online at <http://cnx.org/content/m12023/1.3/>.


Example 6.1
N = 5, r = 2

2^0 mod 5 = 1
2^1 mod 5 = 2
2^2 mod 5 = 4
2^3 mod 5 = 3

6.1.2 Another fact from number theory


For N prime, the inverse of r (i.e. r−1 r modN = 1 is also a primitive root (call it r−1 ).


Example 6.2
N = 5, r = 2 r−1 = 3
(2 × 3) mod5 = 1

30 mod5 = 1


31 mod5 = 3


32 mod5 = 4


33 mod5 = 2


So why do we care? Because we can use these facts to turn a DFT (2.3) into a convolution!
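
These index maps are simple to generate by repeated modular multiplication; a small C sketch (not from the module) for N = 5, r = 2:

/* Sketch: generate Rader's index maps for N = 5, r = 2 (Examples 6.1-6.2).
   n = r^{-m} mod N orders the inputs; k = r^m mod N orders the outputs. */
#include <stdio.h>

int main(void)
{
    const int N = 5, r = 2, rinv = 3;   /* (r * rinv) mod N = 1 */
    int n = 1, k = 1;

    for (int m = 0; m <= N - 2; m++) {
        printf("m = %d:  n = %d,  k = %d\n", m, n, k);
        n = (n * rinv) % N;             /* next n = r^{-(m+1)} mod N */
        k = (k * r) % N;                /* next k = r^{m+1} mod N    */
    }
    return 0;
}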

6.1.3 Rader's Conversion


Let n = (r^−m) mod N, m = [0, 1, . . . , N − 2] (so n ∈ [1, 2, . . . , N − 1]), and k = (r^p) mod N,
p = [0, 1, . . . , N − 2] (so k ∈ [1, 2, . . . , N − 1]).

$$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk} = \begin{cases} x(0) + \sum_{n=1}^{N-1} x(n)\, W_N^{nk} & \text{if } k \neq 0 \\ \sum_{n=0}^{N-1} x(n) & \text{if } k = 0 \end{cases}$$

where for convenience $W_N^{nk} = e^{-j\frac{2\pi nk}{N}}$ in the DFT equation. For k ≠ 0,

$$
\begin{aligned}
X\left(\left(r^p\right) \bmod N\right) &= \sum_{m=0}^{N-2} x\left(\left(r^{-m}\right) \bmod N\right) W^{r^{p}r^{-m}} + x(0) \\
&= \sum_{m=0}^{N-2} x\left(\left(r^{-m}\right) \bmod N\right) W^{r^{p-m}} + x(0) \\
&= x(0) + x\left(\left(r^{-l}\right) \bmod N\right) * W^{r^{l}}
\end{aligned}
\tag{6.1}
$$

where l = [0, 1, . . . , N − 2] and * denotes length-(N − 1) circular convolution

Example 6.3
N = 5, r = 2, r^−1 = 3

$$\begin{bmatrix} X(0) \\ X(1) \\ X(2) \\ X(3) \\ X(4) \end{bmatrix} =
\begin{bmatrix} 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 2 & 3 & 4 \\ 0 & 2 & 4 & 1 & 3 \\ 0 & 3 & 1 & 4 & 2 \\ 0 & 4 & 3 & 2 & 1 \end{bmatrix}
\begin{bmatrix} x(0) \\ x(1) \\ x(2) \\ x(3) \\ x(4) \end{bmatrix}$$

$$\begin{bmatrix} X(0) \\ X(1) \\ X(2) \\ X(4) \\ X(3) \end{bmatrix} =
\begin{bmatrix} 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 3 & 4 & 2 \\ 0 & 2 & 1 & 3 & 4 \\ 0 & 4 & 2 & 1 & 3 \\ 0 & 3 & 4 & 2 & 1 \end{bmatrix}
\begin{bmatrix} x(0) \\ x(1) \\ x(3) \\ x(4) \\ x(2) \end{bmatrix}$$

where for visibility the matrix entries represent only the power, m, of the corresponding DFT term
W_N^m. Note that the 4-by-4 circulant matrix2

$$\begin{bmatrix} 1 & 3 & 4 & 2 \\ 2 & 1 & 3 & 4 \\ 4 & 2 & 1 & 3 \\ 3 & 4 & 2 & 1 \end{bmatrix}$$

corresponds to a length-4 circular convolution.


Rader's conversion turns a prime-length DFT (2.3) into a few adds and a composite-length (N − 1) circular
convolution, which can be computed efficiently using either

1. fast convolution (Chapter 4) via FFT and IFFT
2. index-mapped convolution algorithms and short Winograd convolution algorithms. (Rather complicated,
and trades fewer multiplies for many more adds, which may not be worthwhile on most modern
processors.) See R.C. Agarwal and J.W. Cooley [1]

6.2 Winograd minimum-multiply convolution and DFT algorithms


S. Winograd has proved that a length-N circular or linear convolution or DFT (2.3) requires less than
2N multiplies (for real data), or 4N real multiplies for complex data. (This doesn't count multiplies by
rational fractions, like 3 or 1/N or 17/5, which can be computed with additions and one overall scaling factor.)
Furthermore, Winograd showed how to construct algorithms achieving these counts. Winograd prime-length
DFTs and convolutions have the following characteristics:

1. Extremely efficient for small N (N < 20)
2. The number of adds becomes huge for large N.

Thus Winograd's minimum-multiply FFTs are useful only for small N. They are very important for Prime-Factor
Algorithms (Section 3.6), which generally use Winograd modules to implement the short-length DFTs.
Tables giving the multiplies and adds necessary to compute Winograd FFTs for various lengths can be found
in C.S. Burrus (1988)[4]. Tables and FORTRAN and TMS32010 programs for these short-length transforms
can be found in C.S. Burrus and T.W. Parks (1985)[6]. The theory and derivation of these algorithms is
quite elegant but requires substantial background in number theory and abstract algebra. Fortunately for
the practitioner, all of the short algorithms one is likely to need have already been derived and can simply
be looked up without mastering the details of their derivation.
2 http://en.wikipedia.org/wiki/Circulant_matrix

in C.S. Burrus (1988)[4]. Tables and FORTRAN and TMS32010 programs for these short-length transforms
can be found in C.S. Burrus and T.W. Parks (1985)[6]. The theory and derivation of these algorithms is
quite elegant but requires substantial background in number theory and abstract algebra. Fortunately for
the practitioner, all of the short algorithms one is likely to need have already been derived and can simply
be looked up without mastering the details of their derivation.

6.3 Winograd Fourier Transform Algorithm (WFTA)


The Winograd Fourier Transform Algorithm (WFTA) is a technique that recombines the short Winograd
modules in a prime-factor FFT (Section 3.6) into a composite-N structure with fewer multiplies but more
adds. While theoretically interesting, WFTAs are complicated and different for every length, and on modern
processors with hardware multipliers the trade of multiplies for many more adds is very rarely useful in
practice today.
Chapter 7

Ecient FFT Algorithm and


Programming Tricks 1

The use of FFT algorithms (Section 3.1) such as the radix-2 decimation-in-time (Section 3.4.2.1) or
decimation-in-frequency (Section 3.4.2.2) methods results in tremendous savings in computations when computing
the discrete Fourier transform (Section 1.1). While most of the speed-up of FFTs comes from this,
careful implementation can provide additional savings ranging from a few percent to several-fold increases
in program speed.

7.1 Precompute twiddle factors


The twiddle factor (Section 3.4.2.1), or $W_N^k = e^{-j\frac{2\pi k}{N}}$, terms that multiply the intermediate data in the
FFT algorithms (Section 3.1) consist of cosines and sines that each take the equivalent of several multiplies
to compute. However, at most N unique twiddle factors can appear in any FFT or DFT algorithm. (For
example, in the radix-2 decimation-in-time FFT (Section 3.4.2.1), only N/2 twiddle factors W_N^k, k =
0, 1, 2, . . . , N/2 − 1, are used.) These twiddle factors can be precomputed once and stored in an array in
computer memory, and accessed in the FFT algorithm by table lookup. This simple technique yields very
substantial savings and is almost always used in practice.
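
A sketch of such a table computation (our illustration, matching the W_N^k convention above):

/* Sketch: precompute the N/2 twiddle factors used by a radix-2 DIT FFT,
   so the transform itself needs only table lookups, no cos/sin calls. */
#include <math.h>

void make_twiddles(int n, double wr[], double wi[])
{
    for (int k = 0; k < n/2; k++) {
        wr[k] = cos(2.0*M_PI*k/n);    /* real part of W_N^k */
        wi[k] = -sin(2.0*M_PI*k/n);   /* imag part of W_N^k = e^{-j2 pi k/N} */
    }
}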

7.2 Compiler-friendly programming


On most computers, only some of the total computation time of an FFT is spent performing the FFT
butterfly computations; determining indices, loading and storing data, computing loop parameters and
other operations consume the majority of cycles. Careful programming that allows the compiler to generate
efficient code can make a several-fold improvement in the run-time of an FFT. The best choice of radix in
terms of program speed may depend more on characteristics of the hardware (such as the number of CPU
registers) or compiler than on the exact number of computations. Very often the manufacturer's library
codes are carefully crafted by experts who know intimately both the hardware and compiler architecture and
how to get the most performance out of them, so use of well-written FFT libraries is generally recommended.
Certain freely available programs and libraries are also very good. Perhaps the best current general-purpose
library is the FFTW2 package; information can be found at http://www.fftw.org3. A paper by Frigo and
Johnson[10] describes many of the key issues in developing compiler-friendly code.

1 This content is available online at <http://cnx.org/content/m12021/1.6/>.


2 http://www.fftw.org
3 http://www.fftw.org


7.3 Program in assembly language


While compilers continue to improve, FFT programs written directly in the assembly language of a specific
machine are often several times faster than the best compiled code. This is particularly true for DSP
microprocessors, which have special instructions for accelerating FFTs that compilers don't use. (I have
myself seen differences of up to 26 to 1 in favor of assembly!) Very often, FFTs in the manufacturer's or
high-performance third-party libraries are hand-coded in assembly. For DSP microprocessors, the codes
developed by Meyer, Schuessler, and Schwarz [15] are perhaps the best ever developed; while the particular
processors are now obsolete, the techniques remain equally relevant today. Most DSP processors provide
special instructions and a hardware design favoring the radix-2 decimation-in-time algorithm, which is thus
generally fastest on these machines.

7.4 Special hardware


Some processors have special hardware accelerators or co-processors specifically designed to accelerate FFT
computations. For example, AMI Semiconductor's4 Toccata5 ultra-low-power DSP microprocessor family,
which is widely used in digital hearing aids, has on-chip FFT accelerators; it is always faster and more
power-efficient to use such accelerators and whatever radix they prefer.
In a surprising number of applications, almost all of the computations are FFTs. A number of special-purpose
chips are designed to specifically compute FFTs, and are used in specialized high-performance
applications such as radar systems. Other systems, such as OFDM6-based communications receivers, have
special FFT hardware built into the digital receiver circuit. Such hardware can run many times faster, with
much less power consumption, than FFT programs on general-purpose processors.

7.5 Eective memory management


Cache misses or excessive data movement between registers and memory can greatly slow down an FFT
computation. Efficient programs such as the FFTW package7 are carefully designed to minimize these
inefficiencies. In-place algorithms (Section 3.4.2.1) reuse the data memory throughout the transform, which
can reduce cache misses for longer lengths.

7.6 Real-valued FFTs


FFTs of real-valued signals require only half as many computations as with complex-valued data. There are
several methods for reducing the computation, which are described in more detail in Sorensen et al.[13]:
1. Use DFT symmetry properties (Section 1.1) to do two real-valued DFTs at once with one FFT program
2. Perform one stage of the radix-2 decimation-in-time (Section 3.4.2.1) decomposition and compute the
two length-N/2 DFTs using the above approach.
3. Use a direct real-valued FFT algorithm; see H.V. Sorensen et al.[13]

7.7 Special cases


Occasionally only certain DFT frequencies are needed, the input signal values are mostly zero, the signal is
real-valued (as discussed above), or other special conditions exist for which faster algorithms can be developed.
Sorensen and Burrus [16] describe slightly faster algorithms for pruned8 or zero-padded (Section 2.1.5:
4 http://www.amis.com
5 http://www.amis.com/products/dsp/toccata_plus.html
6 http://en.wikipedia.org/wiki/OFDM
7 http://www.fftw.org
8 http://www.fftw.org/pruned.html

Zero-Padding) data. Goertzel's algorithm (Section 3.3) is useful when only a few DFT outputs are needed.
The running FFT (Section 3.2) can be faster when DFTs of highly overlapped blocks of data are needed, as
in a spectrogram (Section 2.3).

7.8 Higher-radix algorithms


Higher-radix algorithms, such as the radix-4 (Section 3.4.3), radix-8, or split-radix (Section 3.4.4) FFTs,
require fewer computations and can produce modest but worthwhile savings. Even the split-radix FFT
(Section 3.4.4) reduces the multiplications by only 33% and the additions by a much lesser amount relative
to the radix-2 FFTs (Section 3.4.2.1); significant improvements in program speed are often due more to implicit
loop-unrolling9 or other compiler benefits than to the computational reduction itself!

7.9 Fast bit-reversal


Bit-reversing (Section 3.4.2.1) the input or output data can consume several percent of the total run-time
of an FFT program. Several fast bit-reversal algorithms have been developed that can reduce this to two
percent or less, including the method published by D.M.W. Evans [9].

7.10 Trade additions for multiplications


When FFTs first became widely used, hardware multipliers were relatively rare on digital computers, and
multiplications generally required many more cycles than additions. Methods to reduce multiplications,
even at the expense of a substantial increase in additions, were often beneficial. The prime factor algorithms
(Section 3.6) and the Winograd Fourier transform algorithms (Chapter 6), which required fewer multiplies
and considerably more additions than the power-of-two-length algorithms (Section 3.4.1), were developed
during this period. Current processors generally have high-speed pipelined hardware multipliers, so trading
multiplies for additions is often no longer beneficial. In particular, most machines now support single-cycle
multiply-accumulate (MAC) operations, so balancing the number of multiplies and adds and combining them
into single-cycle MACs generally results in the fastest code. Thus, the prime-factor and Winograd FFTs are
rarely used today unless the application requires FFTs of a specific length.
It is possible to implement a complex multiply with 3 real multiplies and 5 real adds rather than the
usual 4 real multiplies and 2 real adds:
(C + jS)(X + jY) = CX − SY + j(CY + SX)
but alternatively
Z = C(X − Y)

D = C + S

E = C − S

CX − SY = EY + Z

CY + SX = DX − Z
In an FFT, D and E come entirely from the twiddle factors, so they can be precomputed and stored in a
look-up table. This reduces the cost of the complex twiddle-factor multiply to 3 real multiplies and 3 real
adds, or one less and one more, respectively, than the conventional 4/2 computation.
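
A sketch in C (our illustration): with D and E fetched from the precomputed table, each twiddle multiply costs 3 real multiplies and 3 real adds:

/* Sketch: 3-multiply/3-add complex twiddle multiply.  D = C + S and
   E = C - S are precomputed from the twiddle factor (C, S) and stored,
   so only Z, EY + Z, and DX - Z cost arithmetic at run time. */
void cmul3(double C, double D, double E,   /* C plus precomputed D, E  */
           double X, double Y,             /* input complex value      */
           double *re, double *im)         /* output (C + jS)(X + jY)  */
{
    double Z = C * (X - Y);
    *re = E * Y + Z;                       /* = CX - SY */
    *im = D * X - Z;                       /* = CY + SX */
}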
9 http://en.wikipedia.org/wiki/Loop_unrolling

7.11 Special butteries


Certain twiddle factors, namely W_N^0 = 1, W_N^(N/2), W_N^(N/4), W_N^(N/8), W_N^(3N/8), etc., can be implemented with no additional
operations, or with fewer real operations than a general complex multiply. Programs that specially implement
such butterflies in the most efficient manner throughout the algorithm can reduce the computational cost
by up to several N multiplies and additions in a length-N FFT.

7.12 Practical Perspective


When optimizing FFTs for speed, it can be important to maintain perspective on the benefits that can be
expected from any given optimization. The following list categorizes the various techniques by potential
benefit; these will be somewhat situation- and machine-dependent, but clearly one should begin with the
most significant and put the most effort where the pay-off is likely to be largest.
Methods to speed up computation of DFTs
• Tremendous Savings -
a. FFT (N / log2 N savings)
• Substantial Savings - (≥ 2)
a. Table lookup of cosine/sine
b. Compiler tricks/good programming
c. Assembly-language programming
d. Special-purpose hardware
e. Real-data FFT for real data (factor of 2)
f. Special cases
• Minor Savings -
a. radix-4 (Section 3.4.3), split-radix (Section 3.4.4) (-10% - +30%)
b. special butterflies
c. 3-real-multiplication complex multiply
d. Fast bit-reversal (up to 6%)

note: On general-purpose machines, computation is only part of the total run time. Address
generation, indexing, data shuffling, and memory access take up much or most of the cycles.

note: A well-written radix-2 (Section 3.4.2.1) program will run much faster than a poorly written
split-radix (Section 3.4.4) program!
Chapter 8

Choosing the Best FFT Algorithm 1

8.1 Choosing an FFT length


The most commonly used FFT algorithms by far are the power-of-two-length FFT (Section 3.4.1) algorithms.
The Prime Factor Algorithm (PFA) (Section 3.6) and Winograd Fourier Transform Algorithm (WFTA)
(Section 6.3: Winograd Fourier Transform Algorithm (WFTA)) require somewhat fewer multiplies, but the
overall difference usually isn't sufficient to warrant the extra difficulty. This is particularly true now that most
processors have single-cycle pipelined hardware multipliers, so the total operation count is more relevant.
As can be seen from the following table, for similar lengths the split-radix algorithm is comparable in total
operations to the Prime Factor Algorithm, and is considerably better than the WFTA, although the PFA and
WFTA require fewer multiplications and more additions. Many processors now support single-cycle multiply-accumulate
(MAC) operations; in the power-of-two algorithms all multiplies can be combined with adds in
MACs, so the number of additions is the most relevant indicator of computational cost.

Representative FFT Operation Counts

Algorithm  FFT length  Multiplies (real)  Adds (real)  Mults + Adds


Radix 2 1024 10248 30728 40976
Split Radix 1024 7172 27652 34824
Prime Factor Alg 1008 5804 29100 34904
Winograd FT Alg 1008 3548 34416 37964

Table 8.1

The Winograd Fourier Transform Algorithm (Section 6.3: Winograd Fourier Transform Algorithm
(WFTA)) is particularly difficult to program and is rarely used in practice. For applications in which
the transform length is somewhat arbitrary (such as fast convolution or general spectrum analysis), the
length is usually chosen to be a power of two. When a particular length is required (for example, in the USA
each carrier has exactly 416 frequency channels in each band in the AMPS2 cellular telephone standard), a
Prime Factor Algorithm (Section 3.6) for all the relatively prime terms is preferred, with a Common Factor
Algorithm (Section 3.5) for other non-prime lengths. Winograd's short-length modules (Chapter 6) should
be used for the prime-length factors that are not powers of two. The chirp z-transform (Chapter 5) offers a
universal way to compute any length DFT (Section 2.1) (for example, Matlab3 reportedly uses this method
for lengths other than a power of two), at a few times higher cost than that of a CFA or PFA optimized for
that specific length. The chirp z-transform (Chapter 5), along with Rader's conversion (Section 6.1: Rader's
Conversion), assure us that algorithms of O(N log N) complexity exist for any DFT length N.
1 This content is available online at <http://cnx.org/content/m12060/1.3/>.
2 http://en.wikipedia.org/wiki/AMPS
3 http://www.mathworks.com/products/matlab/

8.2 Selecting a power-of-two-length algorithm


The choice of a power-of-two algorithm may not just depend on computational complexity. The latest extensions
of the split-radix algorithm (Section 3.4.4) offer the lowest known power-of-two FFT operation counts,
but the 10%-30% difference may not make up for other factors such as regularity of structure or data flow,
FFT programming tricks (Chapter 7), or special hardware features. For example, the decimation-in-time
radix-2 FFT (Section 3.4.2.1) is the fastest FFT on Texas Instruments'4 TMS320C54x DSP microprocessors,
because this processor family has special assembly-language instructions that accelerate this particular algorithm.
On other hardware, radix-4 algorithms (Section 3.4.3) may be more efficient. Some devices, such as
AMI Semiconductor's5 Toccata6 ultra-low-power DSP microprocessor family, have on-chip FFT accelerators;
it is always faster and more power-efficient to use these accelerators and whatever radix they prefer. For fast
convolution (Chapter 4), the decimation-in-frequency (Section 3.4.2.2) algorithms may be preferred because
the bit-reversing can be bypassed; however, most DSP microprocessors provide zero-overhead bit-reversed
indexing hardware and prefer decimation-in-time algorithms, so this may not be true for such machines.
Good, compiler- or hardware-friendly programming always matters more than modest differences in raw
operation counts, so manufacturers' or good third-party FFT libraries are often the best choice. The module
FFT programming tricks (Chapter 7) references some good, free FFT software (including the FFTW7
package) that is carefully coded to be compiler-friendly; such codes are likely to be considerably faster than
codes written by the casual programmer.

8.3 Multi-dimensional FFTs


Multi-dimensional FFTs pose additional possibilities and problems. The orthogonality and separability of
multi-dimensional DFTs allow them to be efficiently computed by a series of one-dimensional FFTs along
each dimension. (For example, a two-dimensional DFT can quickly be computed by performing FFTs of
each row of the data matrix followed by FFTs of all columns, or vice-versa.) Vector-radix FFTs have
been developed with higher efficiency per sample than row-column algorithms. Multi-dimensional datasets,
however, are often large and frequently exceed the cache size of the processor, and excessive cache misses may
increase the computational time greatly, thus overwhelming any minor complexity reduction from a vector-
radix algorithm. Either vector-radix FFTs must be carefully programmed to match the cache limitations
of a specific processor, or a row-column approach should be used with matrix transposition in between to
ensure data locality for high cache utilization throughout the computation.
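As a concrete illustration of the row-column method (using NumPy here only for convenience; any one-dimensional FFT routine would serve), the sketch below computes a two-dimensional DFT both ways and checks each against the library's own 2-D FFT. The second variant inserts an explicit transpose-and-copy so that every one-dimensional pass runs along contiguous memory, the cache-friendly arrangement described above.

import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((256, 256)) + 1j * rng.standard_normal((256, 256))

# Row-column method: 1-D FFTs along the rows, then along the columns.
X_rc = np.fft.fft(np.fft.fft(x, axis=1), axis=0)
print(np.max(np.abs(X_rc - np.fft.fft2(x))))   # agrees to machine precision

# Cache-friendlier variant: FFT only along contiguous rows, with an
# explicit transpose (and copy) between the two passes for data locality.
X = np.fft.fft(x, axis=1)
X = np.ascontiguousarray(X.T)
X = np.fft.fft(X, axis=1).T
print(np.max(np.abs(X - np.fft.fft2(x))))      # same result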

8.4 Few time or frequency samples


FFT algorithms gain their efficiency through intermediate computations that can be reused to compute
many DFT frequency samples at once. Some applications require only a handful of frequency samples to
be computed; when that number is of order less than O (logN ), direct computation of those values via
Goertzel's algorithm (Section 3.3) is faster. This has the additional advantage that any frequency, not just
the equally-spaced DFT frequency samples, can be selected. Sorensen and Burrus [17] developed algorithms
for cases in which most input samples are zero or only a block of DFT frequencies is needed, but the
computational cost is of the same order.
4 http://www.ti.com/
5 http://www.amis.com
6 http://www.amis.com/products/dsp/toccata_plus.html
7 http://www.fftw.org/
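A minimal sketch of Goertzel's recursion for one frequency sample follows (the function name and interface are illustrative, written for this discussion rather than taken from the module). It uses one real multiply per input sample plus a single complex multiply at the end, and for an integer bin index k it reproduces the DFT sample X [k] exactly; for example, goertzel(x, 5) matches numpy.fft.fft(x)[5] to machine precision.

import cmath, math

def goertzel(x, k):
    # Second-order Goertzel recursion for the single DFT sample X[k]
    # of the length-N sequence x, with integer bin index k.
    N = len(x)
    w = 2 * math.pi * k / N
    coeff = 2 * math.cos(w)
    s_prev = s_prev2 = 0.0
    for sample in x:
        s = sample + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    # One complex multiply completes X[k]. A non-integer k selects an
    # arbitrary frequency w; the result then also needs a final phase
    # correction of exp(-1j * w * N).
    return cmath.exp(1j * w) * s_prev - s_prev2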

Some applications, such as time-frequency analysis via the short-time Fourier transform (Section 2.3) or
spectrogram (Section 2.3), require DFTs of overlapped blocks of discrete-time samples. When the step-size
between blocks is less than O (logN ), the running FFT (Section 3.2) will be most efficient. (Note that any
window must be applied via frequency-domain convolution, which is quite efficient for sinusoidal windows
such as the Hamming window.) For step-sizes of O (logN ) or greater, computation of the DFT of each
successive block via an FFT is faster.
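For the one-sample step-size, a minimal sliding-DFT sketch is given below; it is an illustration written for this chapter, assuming the standard one-sample update recursion rather than reproducing the running FFT module's exact formulation. After one full FFT to initialize, each new window's DFT costs O (N ) operations (one complex multiply-add per bin), and any window would then be applied by frequency-domain convolution as noted above.

import numpy as np

def sliding_dft(x, N):
    # Sliding DFT with a step size of one sample: yields the length-N
    # DFT of x[n : n+N] for n = 0, 1, 2, ...; each update costs O(N)
    # instead of the O(N log N) of a fresh FFT per window.
    shift = np.exp(2j * np.pi * np.arange(N) / N)  # one twiddle per bin
    X = np.fft.fft(x[:N])                          # initialize with one FFT
    yield X.copy()
    for n in range(len(x) - N):
        X = (X - x[n] + x[n + N]) * shift          # slide the window by one
        yield X.copy()

x = np.random.randn(300)
frames = list(sliding_dft(x, 64))
print(np.max(np.abs(frames[5] - np.fft.fft(x[5:69]))))  # ~1e-13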
Bibliography
[1] R.C. Agarwal and J.W. Cooley. New algorithms for digital convolution. IEEE Trans. on Acoustics,
Speech, and Signal Processing, 25:392–410, Oct 1977.

[2] G. Bruun. z-transform DFT filters and FFTs. IEEE Transactions on Signal Processing, 26:56–63,
February 1978.

[3] C.S. Burrus. Index mappings for multidimensional formulation of the DFT and convolution. ASSP,
25:239–242, June 1977.

[4] C.S. Burrus. Efficient Fourier transform and convolution algorithms. In J.S. Lim and A.V. Oppenheim,
editors, Advanced Topics in Signal Processing, chapter 4. Prentice-Hall, 1988.

[5] C.S. Burrus. Unscrambling for fast DFT algorithms. IEEE Transactions on Acoustics, Speech, and Signal
Processing, ASSP-36(7):1086–1089, July 1988.

[6] C.S. Burrus and T.W. Parks. DFT/FFT and Convolution Algorithms. Wiley-Interscience, 1985.

[7] C.G. Boncelet, Jr. A rearranged DFT algorithm requiring N^2/6 multiplications. IEEE Trans. on
Acoustics, Speech, and Signal Processing, ASSP-34(6):1658–1659, Dec 1986.

[8] P. Duhamel and H. Hollmann. Split-radix FFT algorithms. Electronics Letters, 20:14–16, Jan 5 1984.

[9] D.M.W. Evans. An improved digit-reversal permutation algorithm for the fast Fourier and Hartley
transforms. IEEE Transactions on Signal Processing, 35(8):1120–1125, August 1987.

[10] M. Frigo and S.G. Johnson. The design and implementation of FFTW3. Proceedings of the IEEE,
93(2):216–231, February 2005.

[11] G. Goertzel. An algorithm for the evaluation of finite trigonometric series. The American Mathematical
Monthly, 1958.
[12] H.V. Sorensen, M.T. Heideman, and C.S. Burrus. On computing the split-radix FFT. IEEE Transactions
on Signal Processing, 34(1):152–156, 1986.

[13] H.V. Sorensen, D.L. Jones, M.T. Heideman, and C.S. Burrus. Real-valued fast Fourier transform
algorithms. IEEE Transactions on Signal Processing, 35(6):849–863, June 1987.

[14] S.G. Johnson and M. Frigo. A modified split-radix FFT with fewer arithmetic operations. IEEE
Transactions on Signal Processing, 54, 2006.

[15] R. Meyer, H.W. Schuessler, and K. Schwarz. FFT implementation on DSP chips: theory and practice.
IEEE International Conference on Acoustics, Speech, and Signal Processing, 1990.

[16] H.V. Sorensen and C.S. Burrus. Efficient computation of the DFT with only a subset of input or output
points. IEEE Transactions on Signal Processing, 41(3):1184–1200, March 1993.


[17] H.V. Sorensen and C.S. Burrus. Efficient computation of the DFT with only a subset of input or output
points. IEEE Transactions on Signal Processing, 41(3):1184–1200, March 1993.

[18] R. Yavne. An economical method for calculating the discrete Fourier transform. Proc. AFIPS Fall Joint
Computer Conf., 33:115–125, 1968.
Index of Keywords and Terms

Keywords are listed by the section with that keyword (page numbers are in parentheses). Keywords
do not necessarily appear in the text of the page. They are merely associated with that section. Ex.
apples, § 1.1 (1). Terms are referenced by the page they appear on. Ex. apples, 1

A  assembly language, § 7(85)
   auto-correlation, § 2.2(18), 22

B  bit reverse, § 7(85)
   bit-reversed, 45, 51
   boxcar, 11
   Bruun, § 3.4.4(58)
   butterfly, 43, 49

C  chirp z-transform, § 6(81), § 8(89)
   circular convolution, § 1.1(1)
   compiler, § 7(85)
   convolution, § 4(71)
   convolution property, § 1.1(1)
   Cooley-Tukey, § 3.4.2.1(42), § 3.4.2.2(48)

D  decimation in frequency, § 3.4.2.2(48), 48, § 3.4.2.3(51)
   decimation in time, § 3.4.2.1(42), 43, § 3.4.2.3(51), 55
   decimation-in-frequency, § 3.4.3(54)
   decimation-in-time, § 3.4.3(54)
   DFT, § 1.1(1), § 2.1(5), § 3.1(37), § 3.2(38), § 3.3(40), § 3.4.3(54), § 5(77), § 7(85)
   DFT even symmetric, 3
   DFT odd symmetric, 3
   DFT properties, § 1.1(1)
   DFT symmetry, § 1.1(1)
   DFT-symmetric, 14
   discrete Fourier transform, § 1.1(1)
   DTFT, § 2.1(5)
   DTMF, § 3.2(38)

F  fast, § 4(71)
   fast Fourier transform, § 3.4.2.1(42), § 3.4.2.2(48)
   FFT, § 3.1(37), § 3.2(38), § 3.3(40), § 3.4.1(42), § 3.4.2.1(42), § 3.4.2.2(48), § 3.4.2.3(51),
      § 3.4.3(54), § 3.4.4(58), § 4(71), § 7(85), § 8(89)
   FFT structures, § 3.4.2.3(51)
   flights, 46
   Fourier, § 2.1(5)

G  Goertzel, § 3.3(40)

H  Hamming window, 13
   Hann window, 12
   hanning, 12

I  IDFT, § 2.1(5)
   in-order FFT, § 3.4.2.3(51)
   in-place algorithm, 45, 51, 58
   index map, § 6(81), 81

L  lag, 23

N  narrow-band spectrogram, 25

P  Parseval's theorem, § 1.1(1)
   periodogram, § 2.2(18)
   picket-fence effect, 8
   power spectral density, § 2.2(18)
   power spectral density (PSD), 19
   power spectrum, § 2.2(18)
   prime factor algorithm, § 6(81), § 8(89)
   prime length FFTs, § 6(81)
   prime-factor algorithm, § 3.1(37)
   primitive root, 81

R  Rader's conversion, § 6(81), 81
   radix-2, § 3.4.2.1(42), 43, § 3.4.2.2(48), 48
   radix-2 algorithm, § 3.1(37)
   radix-2 FFT, § 3.4.1(42)
   radix-4, § 3.4.3(54), 55
   radix-4 butterfly, 57
   radix-4 decimation-in-time FFT, 57
   radix-4 FFT, § 3.4.1(42)
   raised-cosine windows, 13
   real FFT, § 7(85)
   rectangle, 11
   recursive FFT, § 3.2(38)
   running FFT, § 3.2(38), 38

S  scalloping loss, 8
   shift property, § 1.1(1)
   short time Fourier transform, § 2.3(23)
   side lobe, 11
   spectrogram, 25
   split-radix, § 3.4.4(58)
   split-radix FFT, § 3.4.1(42)
   stage, 43, 49
   statistical spectrum estimation, § 2.2(18)
   stft, § 2.3(23)

T  table lookup, § 7(85), 85
   time reversal, § 1.1(1)
   truncation, 11, 12
   twiddle factor, 38, § 3.4.2.1(42), 42, § 3.4.2.2(48), 48
   twiddle factors, 54
   twiddle-factor, 42, 43, 49

V  Vector-radix FFTs, 90

W  WFTA, § 6(81)
   wide-band spectrogram, 25
   window, 11
   Winograd, § 6(81)
   Winograd Fourier Transform Algorithm, § 6(81), § 8(89)

Y  Yavne, § 3.4.4(58)

Z  z-transform, § 5(77)
   zero-padding, 7

Attributions
Collection: The DFT, FFT, and Practical Spectral Analysis
Edited by: Douglas L. Jones
URL: http://cnx.org/content/col10281/1.2/
License: http://creativecommons.org/licenses/by/1.0
Module: "DFT Denition and Properties"
By: Douglas L. Jones
URL: http://cnx.org/content/m12019/1.5/
Pages: 1-4
Copyright: Douglas L. Jones
License: http://creativecommons.org/licenses/by/1.0
Module: "Spectrum Analysis Using the Discrete Fourier Transform"
By: Douglas L. Jones
URL: http://cnx.org/content/m12032/1.6/
Pages: 5-18
Copyright: Douglas L. Jones
License: http://creativecommons.org/licenses/by/1.0
Module: "Classical Statistical Spectral Estimation"
By: Douglas L. Jones
URL: http://cnx.org/content/m12014/1.3/
Pages: 18-23
Copyright: Douglas L. Jones
License: http://creativecommons.org/licenses/by/1.0
Module: "Short Time Fourier Transform"
By: Ivan Selesnick
URL: http://cnx.org/content/m10570/2.4/
Pages: 23-35
Copyright: Ivan Selesnick
License: http://creativecommons.org/licenses/by/1.0

Module: "Overview of Fast Fourier Transform (FFT) Algorithms"


By: Douglas L. Jones
URL: http://cnx.org/content/m12026/1.3/
Pages: 37-38
Copyright: Douglas L. Jones
License: http://creativecommons.org/licenses/by/1.0
Module: "Running FFT"
By: Douglas L. Jones
URL: http://cnx.org/content/m12029/1.5/
Pages: 38-40
Copyright: Douglas L. Jones
License: http://creativecommons.org/licenses/by/1.0

Module: "Goertzel's Algorithm"


By: Douglas L. Jones
URL: http://cnx.org/content/m12024/1.5/
Pages: 40-41
Copyright: Douglas L. Jones
License: http://creativecommons.org/licenses/by/1.0
Module: "Power-of-two FFTs"
By: Douglas L. Jones
URL: http://cnx.org/content/m12059/1.2/
Page: 42
Copyright: Douglas L. Jones
License: http://creativecommons.org/licenses/by/1.0

Module: "Decimation-in-time (DIT) Radix-2 FFT"


By: Douglas L. Jones
URL: http://cnx.org/content/m12016/1.7/
Pages: 42-48
Copyright: Douglas L. Jones
License: http://creativecommons.org/licenses/by/1.0

Module: "Decimation-in-Frequency (DIF) Radix-2 FFT"


By: Douglas L. Jones
URL: http://cnx.org/content/m12018/1.6/
Pages: 48-51
Copyright: Douglas L. Jones
License: http://creativecommons.org/licenses/by/1.0
Module: "Alternate FFT Structures"
By: Douglas L. Jones
URL: http://cnx.org/content/m12012/1.6/
Pages: 51-54
Copyright: Douglas L. Jones
License: http://creativecommons.org/licenses/by/1.0
Module: "Radix-4 FFT Algorithms"
By: Douglas L. Jones
URL: http://cnx.org/content/m12027/1.4/
Pages: 54-58
Copyright: Douglas L. Jones
License: http://creativecommons.org/licenses/by/1.0
Module: "Split-radix FFT Algorithms"
By: Douglas L. Jones
URL: http://cnx.org/content/m12031/1.5/
Pages: 58-62
Copyright: Douglas L. Jones
License: http://creativecommons.org/licenses/by/1.0

Module: "Multidimensional Index Maps"


By: Douglas L. Jones
URL: http://cnx.org/content/m12025/1.3/
Pages: 62-65
Copyright: Douglas L. Jones
License: http://creativecommons.org/licenses/by/1.0
Module: "The Prime Factor Algorithm"
By: Douglas L. Jones
URL: http://cnx.org/content/m12033/1.3/
Pages: 65-68
Copyright: Douglas L. Jones
License: http://creativecommons.org/licenses/by/1.0
Module: "Fast Convolution"
By: Douglas L. Jones
URL: http://cnx.org/content/m12022/1.5/
Pages: 71-76
Copyright: Douglas L. Jones
License: http://creativecommons.org/licenses/by/1.0
Module: "Chirp-z Transform"
By: Douglas L. Jones
URL: http://cnx.org/content/m12013/1.4/
Pages: 77-79
Copyright: Douglas L. Jones
License: http://creativecommons.org/licenses/by/1.0
Module: "FFTs of prime length and Rader's conversion"
By: Douglas L. Jones
URL: http://cnx.org/content/m12023/1.3/
Pages: 81-84
Copyright: Douglas L. Jones
License: http://creativecommons.org/licenses/by/1.0
Module: "Ecient FFT Algorithm and Programming Tricks"
By: Douglas L. Jones
URL: http://cnx.org/content/m12021/1.6/
Pages: 85-88
Copyright: Douglas L. Jones
License: http://creativecommons.org/licenses/by/1.0
Module: "Choosing the Best FFT Algorithm"
By: Douglas L. Jones
URL: http://cnx.org/content/m12060/1.3/
Pages: 89-91
Copyright: Douglas L. Jones
License: http://creativecommons.org/licenses/by/1.0
The DFT, FFT, and Practical Spectral Analysis
This course reviews the Discrete Fourier Transform and explains its practical use in frequency, spectral, and
time-frequency analysis of signals. It gives an in-depth explanation of many Fast Fourier Transform (FFT)
and FFT-based algorithms and a practical guide to their selection and use.

About Connexions
Since 1999, Connexions has been pioneering a global system where anyone can create course materials and
make them fully accessible and easily reusable free of charge. We are a Web-based authoring, teaching and
learning environment open to anyone interested in education, including students, teachers, professors and
lifelong learners. We connect ideas and facilitate educational communities.
Connexions's modular, interactive courses are in use worldwide by universities, community colleges, K-12
schools, distance learners, and lifelong learners. Connexions materials are in many languages, including
English, Spanish, Chinese, Japanese, Italian, Vietnamese, French, Portuguese, and Thai. Connexions is part
of an exciting new information distribution system that allows for Print on Demand Books. Connexions
has partnered with innovative on-demand publisher QOOP to accelerate the delivery of printed course
materials and textbooks into classrooms worldwide at lower prices than traditional academic publishers.
