The DFT, FFT, and Practical Spectral
Analysis
Collection Editor:
Douglas L. Jones
Authors:
Douglas L. Jones
Ivan Selesnick
Online:
< http://cnx.org/content/col10281/1.2/ >
CONNEXIONS
1.1.1 DFT
The discrete Fourier transform (DFT)2 is the primary transform used for numerical computation in digital
signal processing. It is very widely used for spectrum analysis (Section 2.1), fast convolution (Chapter 4),
and many other applications. The DFT transforms N discrete-time samples to the same number of discrete
frequency samples, and is defined as

$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j\frac{2\pi nk}{N}} \tag{1.1}$$

The DFT is widely used in part because it can be computed very efficiently using fast Fourier transform
(FFT)3 algorithms.
1.1.2 IDFT
The inverse DFT (IDFT) transforms N discrete-frequency samples to the same number of discrete-time
samples. The IDFT has a form very similar to the DFT,

$$x(n) = \frac{1}{N} \sum_{k=0}^{N-1} X(k)\, e^{j\frac{2\pi nk}{N}} \tag{1.2}$$

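As a quick check that (1.1) and (1.2) form a transform pair, the following minimal sketch evaluates both sums directly. The function names `dft` and `idft` and the four test samples are assumptions chosen for illustration; this is the slow O(N^2) evaluation, not an FFT.

```python
import cmath

def dft(x):
    """Direct evaluation of (1.1): X(k) = sum_n x(n) exp(-j*2*pi*n*k/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

def idft(X):
    """Direct evaluation of (1.2): x(n) = (1/N) sum_k X(k) exp(+j*2*pi*n*k/N)."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * n * k / N) for k in range(N)) / N
            for n in range(N)]

x = [1.0, 2.0, 0.0, -1.0]      # arbitrary test samples (illustrative only)
X = dft(x)                     # N time samples -> N frequency samples
x_back = idft(X)               # and back again

# Largest reconstruction error; it should be at the level of rounding noise.
roundtrip_error = max(abs(a - b) for a, b in zip(x, x_back))
print(roundtrip_error)
```

Applying `idft` to the output of `dft` recovers the original samples up to floating-point rounding, which is what "one-to-one transform pair" means in practice.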
X (k + N ) = X (k)
1 This content is available online at <http://cnx.org/content/m12019/1.5/>.
2 "The DFT: Frequency Domain with a Computer Analysis" <http://cnx.org/content/m10992/latest/>
3 The DFT, FFT, and Practical Spectral Analysis <http://cnx.org/content/col10281/latest/>
4 The DFT, FFT, and Practical Spectral Analysis <http://cnx.org/content/col10281/latest/>
2 CHAPTER 1. THE DISCRETE FOURIER TRANSFORM
x (n) = x (n + N )
The modulus operator p mod N means the remainder of p when divided by N. For example,
9 mod 5 = 4
and
−1 mod 5 = 4
(a) (b)
Figure 1.1: Illustration of circular time-reversal (a) Original signal (b) Time-reversed
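The modulus examples and the circular time-reversal of Figure 1.1 can be sketched in a few lines. The helper name `circular_reverse` is hypothetical; the sketch assumes that circular time-reversal means replacing x(n) by x((−n) mod N), so sample 0 stays in place and the remaining samples are reversed.

```python
def circular_reverse(x):
    """Return the sequence y(n) = x((-n) mod N) for a length-N list x."""
    N = len(x)
    # Python's % operator already returns a value in 0..N-1 for positive N,
    # which matches the convention 9 mod 5 = 4 and -1 mod 5 = 4 used above.
    return [x[(-n) % N] for n in range(N)]

print(9 % 5, -1 % 5)
print(circular_reverse([0, 1, 2, 3, 4]))
```

Note that some languages (C, for example) give a negative result for a negative dividend, so an explicit adjustment may be needed there to get the same convention.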
1.1.3.8 Symmetry
The continuous-time Fourier transform5, the DTFT (2.1), and DFT (2.3) are all defined as transforms of
complex-valued data to complex-valued spectra. However, in practice signals are often real-valued. The
DFT of a real-valued discrete-time signal has a special symmetry, in which the real part of the transform
values is DFT even symmetric and the imaginary part is DFT odd symmetric, as illustrated in the
equation and figure below.

$$x(n)\ \text{real} \iff X(k) = X^{*}\big((N-k) \bmod N\big)$$

(This implies that X(0) and X(N/2) are real-valued.)
Figure 1.2: DFT symmetry of real-valued signal (a) Even-symmetry in DFT sense (b) Odd-symmetry
in DFT sense
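The symmetry can be checked numerically. The sketch below uses a direct DFT (the helper `dft` and the eight sample values are assumptions for illustration) and compares X(k) with the complex conjugate of X((N − k) mod N), which is the form in which the even real part and odd imaginary part are usually combined.

```python
import cmath

def dft(x):
    """Direct O(N^2) DFT, used here only to test the symmetry property."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

x = [0.5, -1.0, 2.0, 3.0, 0.0, 1.5, -2.0, 1.0]   # arbitrary real-valued samples
N = len(x)
X = dft(x)

# Real part even, imaginary part odd  <=>  X(k) = conj(X((N-k) mod N)).
max_err = max(abs(X[k] - X[(N - k) % N].conjugate()) for k in range(N))
print(max_err, X[0].imag, X[N // 2].imag)
```

For even N, the bins k = 0 and k = N/2 map to themselves under k → (N − k) mod N, which is why they come out real-valued.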
Chapter 2
Spectrum Analysis
2.1 Spectrum Analysis Using the Discrete Fourier Transform 1
The inverse DTFT (IDTFT) is defined by an integral formula, because it operates on a continuous-frequency
DTFT spectrum:

$$x(n) = \frac{1}{2\pi} \int_{-\pi}^{\pi} X(\omega)\, e^{j\omega n}\, d\omega \tag{2.2}$$

The DTFT is very useful for theory and analysis, but is not practical for numerically computing a
spectrum digitally, because
For practical computation of the frequency content of real-world signals, the Discrete Fourier Transform
(DFT) is used.
The DFT (2.3) and IDFT (2.4) are a self-contained, one-to-one transform pair for a length-N discrete-time
signal. (That is, the DFT (2.3) is not merely an approximation to the DTFT (2.1) as discussed next.)
However, the DFT (2.3) is very often used as a practical approximation to the DTFT (2.1).
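
$$\begin{aligned}
x(n) &= \frac{1}{2\pi} \int_{-\pi}^{\pi} \frac{2\pi}{N} \sum_{k} X(k)\, \delta\!\left(\omega - \frac{2\pi k}{N}\right) e^{j\omega n}\, d\omega \\
&= \frac{1}{N} \sum_{k=0}^{N-1} X(k)\, e^{+j\frac{2\pi nk}{N}} \\
&= \mathrm{IDFT}\left(X(k)\right) \\
&= x(n)
\end{aligned} \tag{2.6}$$
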
The DFT can thus be used to exactly compute the relative values of the N line spectral components of
the DTFT of any periodic discrete-time sequence with an integer-length period.

$$f = \frac{k}{NT}$$

for k in the range between 0 and N/2. It is important to note that the values k ∈ [N/2 + 1, N − 1] correspond to
negative frequencies due to the periodicity of the DTFT and the DFT.
Exercise 2.1 (Solution on p. 36.)
In general, will DFT frequency values X(k) exactly equal samples of the analog Fourier transform
X_a at the corresponding frequencies? That is, will X(k) = X_a(2πk/(NT))?
2.1.5 Zero-Padding
If more than N equally spaced frequency samples of a length-N signal are desired, they can easily be obtained
by zero-padding the discrete-time signal and computing a DFT of the longer length. In particular, if LN
DTFT (2.1) samples are desired of a length-N sequence, one can compute the length-LN DFT (2.3) of a
length-LN zero-padded sequence
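
$$z(n) = \begin{cases} x(n) & \text{if } 0 \le n \le N-1 \\ 0 & \text{if } N \le n \le LN-1 \end{cases}$$

$$X\!\left(\omega_k = \frac{2\pi k}{LN}\right) = \sum_{n=0}^{N-1} x(n)\, e^{-j\frac{2\pi kn}{LN}} = \sum_{n=0}^{LN-1} z(n)\, e^{-j\frac{2\pi kn}{LN}} = \mathrm{DFT}_{LN}\left[z[n]\right]$$
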
Note that zero-padding interpolates the spectrum. One should always zero-pad (by at least a factor
of about 4) when using the DFT (2.3) to approximate the DTFT (2.1) to get a clear picture of the DTFT (2.1).
While performing computations on zeros may at first seem inefficient, using FFT (Section 3.1) algorithms,
which generally expect the same number of input and output samples, actually makes this approach very
efficient.
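The interpolation property can be illustrated with a small sketch. The helper names `dft` and `zero_pad`, the four sample values, and the choice L = 4 are assumptions for illustration. The length-LN DFT of the padded sequence samples the same DTFT on a grid L times finer, so every L-th padded sample should coincide with an unpadded DFT sample.

```python
import cmath

def dft(x):
    """Direct O(N^2) DFT; an FFT would be used in practice."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

def zero_pad(x, L):
    """Append zeros so that the result has length L * len(x)."""
    return list(x) + [0.0] * ((L - 1) * len(x))

x = [1.0, 0.5, -0.25, 2.0]     # length-N signal (illustrative values)
L = 4                          # zero-padding factor
X = dft(x)                     # N DTFT samples
Z = dft(zero_pad(x, L))        # L*N DTFT samples of the same signal

# Sample L*k of the padded DFT sits at the same frequency as sample k
# of the unpadded DFT, so the two must agree.
err = max(abs(Z[L * k] - X[k]) for k in range(len(x)))
print(len(Z), err)
```

The additional samples in between are new points on the same underlying DTFT; zero-padding adds no information about the signal, only a denser view of its spectrum.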
Figure 2.1 (Spectrum without zero-padding) shows the magnitude of the DFT values corresponding to
the non-negative frequencies of a length-64 DFT of a real-valued length-64 signal, both in a "stem" format
to emphasize the discrete nature of the DFT frequency samples, and as a line plot to emphasize its use as
an approximation to the continuous-in-frequency DTFT. From this figure, it appears that the signal has a
single dominant frequency component.
Figure 2.1: Magnitude DFT spectrum of 64 samples of a signal with a length-64 DFT (no zero padding)
Zero-padding by a factor of two by appending 64 zero values to the signal and computing a length-128 DFT
yields Figure 2.2 (Spectrum with factor-of-two zero-padding). It can now be seen that the signal consists of at
least two narrowband frequency components; the gap between them fell between DFT samples in Figure 2.1
(Spectrum without zero-padding), resulting in a misleading picture of the signal's spectral content. This
is sometimes called the picket-fence effect, and is a result of insufficient sampling in frequency. While
zero-padding by a factor of two has revealed more structure, it is unclear whether the peak magnitudes
are reliably rendered, and the jagged linear interpolation in the line graph does not yet reflect the smooth,
continuously-differentiable spectrum of the DTFT of a finite-length truncated signal. Errors in the apparent
peak magnitude due to insufficient frequency sampling are sometimes referred to as scalloping loss.
Figure 2.2: Magnitude DFT spectrum of 64 samples of a signal with a length-128 DFT (double-length
zero-padding)
Zero-padding to four times the length of the signal, as shown in Figure 2.3 (Spectrum with factor-of-four
zero-padding), clearly shows the spectral structure and reveals that the magnitudes of the two spectral lines
are nearly identical. The line graph is still a bit rough and the peak magnitudes and frequencies may not be
precisely captured, but the spectral characteristics of the truncated signal are now clear.
Figure 2.3: Magnitude DFT spectrum of 64 samples of a signal with a length-256 zero-padded DFT
(four times zero-padding)
Zero-padding to a length of 1024, as shown in Figure 2.4 (Spectrum with factor-of-sixteen zero-padding)
yields a spectrum that is smooth and continuous to the resolution of the computer screen, and produces a
very accurate rendition of the DTFT of the truncated signal.
Figure 2.4: Magnitude DFT spectrum of 64 samples of a signal with a length-1024 zero-padded DFT.
The spectrum now looks smooth and continuous and reveals all the structure of the DTFT of a truncated
signal.
The signal used in this example actually consisted of two pure sinusoids of equal magnitude. The slight
difference in magnitude of the two dominant peaks, the breadth of the peaks, and the sinc-like lesser side
lobe peaks throughout frequency are artifacts of the truncation, or windowing, process used to practically
approximate the DFT. These problems and partial solutions to them are discussed in the following section.
we find that the DFT (2.3) of the windowed (truncated) signal produces samples not of the true (desired)
DTFT spectrum X(ω), but of a smoothed version X(ω) ∗ W(ω). We want this to resemble X(ω) as
closely as possible, so W(ω) should be as close to an impulse as possible. The window w(n) need not be a
simple truncation (or rectangle, or boxcar) window; other shapes can also be used as long as they limit
the sequence to at most N consecutive nonzero samples. All good windows are impulse-like, and represent
various tradeoffs between three criteria:
Many different window functions4 have been developed for truncating and shaping a length-N signal
segment for spectral analysis. The simple truncation window has a periodic sinc DTFT, as shown in
Figure 2.5. Of the common window functions, it has the narrowest main-lobe width, 2π/N at the -3 dB level
and 4π/N between the two zeros surrounding the main lobe, but also the largest side-lobe peak, at about
-13 dB. The side-lobes also taper off relatively slowly.
Figure 2.5: Length-64 truncation (boxcar) window and its magnitude DFT spectrum
spectral components of similar magnitude, but better for identifying smaller-magnitude components at a
greater distance from the larger components.
Figure 2.6: Length-64 Hann window and its magnitude DFT spectrum
The Hamming window, illustrated in Figure 2.7, has a form similar to the Hann window but with
slightly different constants: w[n] = 0.538 − 0.462 cos(2πn/(N − 1)) for n between 0 and N − 1. Since it is composed
of the same Fourier series harmonics as the Hann window, it has a similar main-lobe width (a bit less than
3π/N at the -3 dB level and 8π/N between the two zeros surrounding the main lobe), but the largest side-lobe
peak is much lower, at about -42.5 dB. However, the side-lobes also taper off much more slowly than with
the Hann window. For a given length, the Hamming window is better than the Hann (and of course the
boxcar) windows at separating a small component relatively near to a large component, but worse than the
Hann for identifying very small components at considerable frequency separation. Due to their shape and
form, the Hann and Hamming windows are also known as raised-cosine windows.
Figure 2.7: Length-64 Hamming window and its magnitude DFT spectrum
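The side-lobe figures quoted above can be checked approximately with a short numerical sketch. It evaluates the DTFT magnitude of a length-64 boxcar window and of the Hamming window with the constants given in the text on a dense frequency grid, then reports the largest side-lobe relative to the main-lobe peak. The function names, the grid density of 2048 points, and the rule "the main lobe ends at the first local minimum" are assumptions made for illustration, so the numbers are only as accurate as the grid.

```python
import cmath
import math

def boxcar(N):
    """Simple truncation (rectangular) window."""
    return [1.0] * N

def hamming(N):
    """Hamming window with the constants quoted in the text."""
    return [0.538 - 0.462 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]

def dtft_mag_db(w, num_points=2048):
    """|W(omega)| in dB relative to |W(0)|, on a uniform grid over [0, pi]."""
    ref = abs(sum(w))
    mags = []
    for i in range(num_points + 1):
        omega = math.pi * i / num_points
        W = sum(wn * cmath.exp(-1j * omega * n) for n, wn in enumerate(w))
        # Clamp to avoid log10(0) exactly at spectral nulls.
        mags.append(20 * math.log10(max(abs(W) / ref, 1e-12)))
    return mags

def peak_sidelobe_db(w):
    """Largest magnitude (dB) beyond the first local minimum of the main lobe."""
    mag = dtft_mag_db(w)
    i = 1
    while i < len(mag) - 1 and mag[i + 1] < mag[i]:
        i += 1                      # walk down the main lobe
    return max(mag[i:])

rect_psl = peak_sidelobe_db(boxcar(64))
hamming_psl = peak_sidelobe_db(hamming(64))
print(round(rect_psl, 1), round(hamming_psl, 1))
```

The boxcar result should land near the -13 dB figure and the Hamming result roughly 30 dB lower, which is the tradeoff between main-lobe width and side-lobe height described in the text.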
note: Standard even-length windows are symmetric around a point halfway between the window
samples N/2 − 1 and N/2. For some applications such as time-frequency analysis (Section 2.3), it
may be important to align the window perfectly to a sample. In such cases, a DFT-symmetric
window that is symmetric around the N/2-th sample can be used. For example, the DFT-symmetric
Hamming window is w[n] = 0.538 − 0.462 cos(2πn/N). A DFT-symmetric window has a purely
real-valued DFT and DTFT. DFT-symmetric versions of windows, such as the Hamming and Hann
windows, composed of few discrete Fourier series terms of period N, have few non-zero DFT terms
(only when not zero-padded) and can be used efficiently in running FFTs (Section 3.2).
The main-lobe width of a window is an inverse function of the window-length N ; for any type of window, a
longer window will always provide better resolution.
Many other windows exist that make various other tradeoffs between main-lobe width, height of largest
side-lobe, and side-lobe rolloff rate. The Kaiser window5 family, based on a modified Bessel function, has an
adjustable parameter that allows the user to tune the tradeoff over a continuous range. The Kaiser window
has near-optimal time-frequency resolution and is widely used. A list of many different windows can be
5 http://en.wikipedia.org/wiki/Kaiser_window
found here6 .
Example 2.1
Figure 2.8 shows 64 samples of a real-valued signal composed of several sinusoids of various fre-
quencies and amplitudes.
Figure 2.9 shows the magnitude (in dB) of the positive frequencies of a length-1024 zero-padded
DFT of this signal (that is, using a simple truncation, or rectangular, window).
6 http://en.wikipedia.org/wiki/Window_function
Figure 2.9: Magnitude (in dB) of the zero-padded DFT spectrum of the signal in Figure 2.8 using a
simple length-64 rectangular window
From this spectrum, it is clear that the signal has two large, nearby frequency components of
essentially the same magnitude, with frequencies near 1 radian.
Figure 2.10 shows the spectral estimate produced using a length-64 Hamming window applied
to the same signal shown in Figure 2.8.
Figure 2.10: Magnitude (in dB) of the zero-padded DFT spectrum of the signal in Figure 2.8 using a
length-64 Hamming window
The two large spectral peaks can no longer be resolved; they blur into a single broad peak due
to the reduced spectral resolution of the broader main lobe of the Hamming window. However, the
lower side-lobes reveal a third component at a frequency of about 0.7 radians at about 35 dB lower
magnitude than the larger components. This component was entirely buried under the side-lobes
when the rectangular window was used, but now stands out well above the much lower nearby
side-lobes of the Hamming window.
Figure 2.11 shows the spectral estimate produced using a length-64 Hann window applied to
the same signal shown in Figure 2.8.
Figure 2.11: Magnitude (in dB) of the zero-padded DFT spectrum of the signal in Figure 2.8 using a
length-64 Hann window
The two large components again merge into a single peak, and the smaller component observed
with the Hamming window is largely lost under the higher nearby side-lobes of the Hann window.
However, due to the much faster side-lobe rolloff of the Hann window's spectrum, a fourth
component at a frequency of about 2.5 radians with a magnitude about 65 dB below that of the main
peaks is now clearly visible.
This example illustrates that no single window is best for all spectrum analyses. The best
window depends on the nature of the signal, and different windows may be better for different
components of the same signal. A skilled spectrum analyst may apply several different windows
to a signal to gain a fuller understanding of the data.
Many signals are either partly or wholly stochastic, or random. Important examples include human speech,
vibration in machines, and CDMA8 communication signals. Given the ever-present noise in electronic
systems, it can be argued that almost all signals are at least partly stochastic. Such signals may have a
distinct average spectral structure that reveals important information (such as for speech recognition or
early detection of damage in machinery). Spectrum analysis of any single block of data using window-based
deterministic spectrum analysis (Section 2.1), however, produces a random spectrum that may be difficult to
interpret. For such situations, the classical statistical spectrum estimation methods described in this module
can be used.
7 This content is available online at <http://cnx.org/content/m12014/1.3/>.
8 http://en.wikipedia.org/wiki/Cdma
The goal in classical statistical spectrum analysis is to estimate $E\left[|X(\omega)|^2\right]$, the power spectral
density (PSD) across frequency of the stochastic signal. That is, the goal is to find the expected (mean,
or average) energy density of the signal as a function of frequency. (For zero-mean signals, this equals the
variance of each frequency sample.) Since the spectrum of each block of signal samples is itself random, we
must average the squared spectral magnitudes over a number of blocks of data to find the mean. There are
two main classical approaches, the periodogram (Section 2.2.1: Periodogram method) and auto-correlation
(Section 2.2.2: Auto-correlation-based approach) methods.
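The effect of averaging squared spectral magnitudes over blocks can be sketched as follows. The function names, the division by the block length as a normalization, the use of non-overlapping length-32 blocks with a rectangular window, and the seeded white Gaussian test signal are all assumptions chosen for illustration; they are not taken from the text.

```python
import cmath
import random
import statistics

def power_spectrum(block):
    """|DFT|^2 / N of one block (the 1/N normalization is an assumption)."""
    N = len(block)
    return [abs(sum(block[n] * cmath.exp(-2j * cmath.pi * n * k / N)
                    for n in range(N))) ** 2 / N
            for k in range(N)]

def averaged_periodogram(x, block_len, step):
    """Average the squared DFT magnitudes of blocks taken every `step` samples."""
    starts = range(0, len(x) - block_len + 1, step)
    acc = [0.0] * block_len
    for s in starts:
        P = power_spectrum(x[s:s + block_len])
        acc = [a + p for a, p in zip(acc, P)]
    return [a / len(starts) for a in acc]

random.seed(0)                                   # reproducible test signal
x = [random.gauss(0.0, 1.0) for _ in range(2048)]  # unit-variance white noise

single = power_spectrum(x[:32])                  # one block, no averaging
averaged = averaged_periodogram(x, block_len=32, step=32)   # 64 blocks

# White noise has a flat PSD, so the spread across frequency measures the
# estimation noise rather than any real spectral structure.
spread_single = statistics.pstdev(single)
spread_averaged = statistics.pstdev(averaged)
print(spread_single, spread_averaged, statistics.mean(averaged))
```

With 64 blocks averaged, the spread of the estimate across frequency should be several times smaller than for a single block, while the frequency resolution stays that of one length-32 block.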
Example 2.2
Figure 2.12 shows the non-negative frequencies of the DFT (zero-padded to 1024 total samples) of
64 samples of a real-valued stochastic signal.
With no averaging, the power spectrum is very noisy and difficult to interpret other than noting
a significant reduction in spectral energy above about half the Nyquist frequency. Various peaks
and valleys appear in the lower frequencies, but it is impossible to say from this figure whether
they represent actual structure in the power spectral density (PSD) or simply random variation in
this single realization. Figure 2.13 shows the same frequencies of a length-1024 DFT of a
length-1024 signal. While the frequency resolution has improved, there is still no averaging, so it remains
difficult to understand the power spectral density of this signal. Certain small peaks in frequency
might represent narrowband components in the spectrum, or may just be random noise peaks.
Figure 2.13: DFT magnitude (in dB) of 1024 samples of a stochastic signal
In Figure 2.14, a power spectral density computed by averaging the squared magnitudes of
length-1024 zero-padded DFTs of 508 length-64 blocks of data (overlapped by a factor of four, or a
16-sample step between blocks) is shown.
Figure 2.14: Power spectrum density estimate (in dB) of 1024 samples of a stochastic signal
While the frequency resolution corresponds to that of a length-64 truncation window, the
averaging greatly reduces the variance of the spectral estimate and allows the user to reliably conclude
that the signal consists of lowpass broadband noise with a flat power spectrum up to half the
Nyquist frequency, with a stronger narrowband frequency component at around 0.65 radians.
or its auto-correlation

$$r(n) = \sum_{k} x(k)\, x^{*}(n+k)$$

We can thus compute the squared magnitude of the spectrum of a signal by computing the DFT of its
auto-correlation. For stochastic signals, the power spectral density is an expectation, or average, and by
linearity of expectation can be found by transforming the average of the auto-correlation. For a finite block
of N signal samples, the average of the autocorrelation values, r(n), is

$$r(n) = \frac{1}{N-n} \sum_{k=0}^{N-1-n} x(k)\, x^{*}(n+k)$$

Note that with increasing lag, n, fewer values are averaged, so they introduce more noise into the estimated
power spectrum. By windowing (Section 2.1.6: Effects of Windowing) the auto-correlation before transforming
it to the frequency domain, a less noisy power spectrum is obtained, at the expense of less resolution.
The multiplication property of the DTFT shows that the windowing smooths the resulting power spectrum
via convolution with the DTFT of the window:

$$\hat{X}(\omega) = \sum_{n=-M}^{M} r(n)\, w(n)\, e^{-j\omega n} = E\left[|X(\omega)|^2\right] * W(\omega)$$

This yields another important interpretation of how the auto-correlation method works: it estimates the
power spectral density by averaging the power spectrum over nearby frequencies, through convolution
with the window function's transform, to reduce variance. Just as with the periodogram approach, there
is always a variance vs. resolution tradeoff. The periodogram and the auto-correlation method give similar
results for a similar amount of averaging; the user should simply note that in the periodogram case, the
window introduces smoothing of the spectrum via frequency convolution before squaring the magnitude,
whereas the auto-correlation method convolves the squared magnitude with W(ω).
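A minimal sketch of the auto-correlation approach is given below. It averages the lag products over the N − n terms for which both x(k) and x(n + k) lie inside the block, then transforms the windowed auto-correlation at a single frequency. The function names, the triangular lag window, and the use of the relation r(−n) = r*(n) to cover negative lags are assumptions made for illustration.

```python
import cmath
import math

def autocorr_estimate(x, max_lag):
    """r(n) = 1/(N-n) * sum_{k=0}^{N-1-n} x(k) * conj(x(n+k)), n = 0..max_lag."""
    N = len(x)
    r = []
    for n in range(max_lag + 1):
        # Only N - n products exist at lag n, so fewer terms are averaged
        # as the lag grows.
        terms = [x[k] * complex(x[n + k]).conjugate() for k in range(N - n)]
        r.append(sum(terms) / (N - n))
    return r

def psd_estimate(x, max_lag, omega):
    """Transform of the windowed auto-correlation at one frequency omega."""
    r = autocorr_estimate(x, max_lag)
    total = r[0].real                       # lag 0, window value 1
    for n in range(1, max_lag + 1):
        w = 1.0 - n / (max_lag + 1)         # triangular lag window (assumed)
        # Lags +n and -n are complex conjugates of each other, so together
        # they contribute twice the real part.
        total += 2.0 * (w * r[n] * cmath.exp(-1j * omega * n)).real
    return total

alternating = [(-1.0) ** n for n in range(16)]      # all energy at omega = pi
print(psd_estimate(alternating, 4, math.pi), psd_estimate(alternating, 4, 0.0))
```

For the alternating test sequence the estimate is large at ω = π and small at ω = 0, as expected. A shorter lag window (smaller `max_lag`) gives a smoother but less resolved estimate.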
note: To see how the frequency content of a signal changes over time, we can cut the signal into
blocks and compute the spectrum of each block.
• Block length, R.
• The type of window.
• Amount of overlap between blocks. (Figure 2.15 (STFT: Overlap Parameter))
• Amount of zero padding, if any.
9 This content is available online at <http://cnx.org/content/m10570/2.4/>.
Figure 2.15
1. The STFT of a signal x (n) is a function of two variables: time and frequency.
2. The block length is determined by the support of the window function w (n).
3. A graphical display of the magnitude of the STFT, |X (ω, m) |, is called the spectrogram of the signal.
It is often used in speech processing.
4. The STFT of a signal is invertible.
5. One can choose the block length. A long block length will provide higher frequency resolution (because
the main-lobe of the window function will be narrow). A short block length will provide higher time
resolution because less averaging across samples is performed for each STFT value.
6. A narrow-band spectrogram is one computed using a relatively long block length R, (long window
function).
7. A wide-band spectrogram is one computed using a relatively short block length R, (short window
function).

$$= \mathrm{DFT}_N\left(\left[\, x(n-m)\, w(n)\big|_{n=0}^{R-1},\ 0, \ldots, 0 \,\right]\right)$$

where the number of appended zeros 0, …, 0 is N − R.
In this definition, the overlap between adjacent blocks is R − 1. The signal is shifted along the window
one sample at a time. That generates more points than are usually needed, so we also sample the STFT along
the time direction. That means we usually evaluate

$$X^{d}(k, Lm)$$

where L is the time-skip. The relation between the time-skip, the number of overlapping samples, and the
block length is
Overlap = R − L
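The roles of R, N and L can be sketched with a small direct implementation. The function name `stft`, the test values, and the indexing convention (block m starts at sample m and runs forward, x(m + n) w(n)) are assumptions made for illustration; the convention may differ from the x(n − m) form used in the definition above.

```python
import cmath

def stft(x, window, N, L):
    """Length-N DFTs of windowed length-R blocks taken every L samples."""
    R = len(window)
    frames = []
    for m in range(0, len(x) - R + 1, L):
        # Window one block, then zero-pad it from R to N samples.
        block = [x[m + n] * window[n] for n in range(R)] + [0.0] * (N - R)
        frames.append([sum(block[n] * cmath.exp(-2j * cmath.pi * n * k / N)
                           for n in range(N))
                       for k in range(N)])
    return frames

x = [1.0] * 32                 # constant test signal (illustrative)
R, N, L = 8, 16, 2             # block length, DFT length, time-skip
frames = stft(x, [1.0] * R, N, L)
overlap = R - L                # samples shared by adjacent blocks

print(len(frames), len(frames[0]), overlap)
```

Here N only sets how finely each block's spectrum is sampled and L only sets how often a block is taken; the resolution tradeoff is controlled by R and the window shape.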
(a)
(b)
Figure 2.16
Figure 2.17
Figure 2.18
The MATLAB program for producing the figures above (Figure 2.17 and Figure 2.18).
% LOAD DATA
load mtlb;
x = mtlb;
figure(1), clf
plot(0:4000,x)
xlabel('n')
ylabel('x(n)')
% SET PARAMETERS
R = 256; % R: block length
window = hamming(R); % window function of length R
N = 512; % N: frequency discretization
L = 35; % L: time lapse between blocks
fs = 7418; % fs: sampling frequency
overlap = R - L;
% COMPUTE SPECTROGRAM
[B,f,t] = specgram(x,N,fs,window,overlap);
% MAKE PLOT
figure(2), clf
imagesc(t,f,log10(abs(B)));
colormap('jet')
axis xy
xlabel('time')
ylabel('frequency')
title('SPECTROGRAM, R = 256')
Figure 2.19
Figure 2.20
Here is another example to illustrate the frequency/time resolution trade-off (see Figure 2.19
(Narrow-band spectrogram: better frequency resolution), Figure 2.20 (Wide-band spectrogram: better time
resolution), and Figure 2.21 (Effect of Window Length R)).
(a)
(b)
Figure 2.21
L ∈ {1, 10}
N ∈ {32, 256}
(a)
(b)
Figure 2.22
L and N do not affect the time resolution or the frequency resolution. They only affect the 'pixelation'.
L ∈ {35, 250}
where
• R = block length
• L = time lapse between blocks.
Figure 2.23
If you like, you may listen to this signal with the soundsc command; the data is in the file: stft_data.m.
Here (Figure 2.24) is a gure of the signal.
Figure 2.24
A fast Fourier transform2, or FFT3, is not a new transform, but is a computationally efficient algorithm for
computing the DFT (Section 1.1). The length-N DFT, defined as

$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j\frac{2\pi nk}{N}} \tag{3.1}$$

where X(k) and x(n) are in general complex-valued and 0 ≤ k, n ≤ N − 1, requires N complex multiplies to
compute each X(k). Direct computation of all N frequency samples thus requires N² complex multiplies and
N(N − 1) complex additions. (This assumes precomputation of the DFT coefficients $W_N^{nk} \doteq e^{-j\frac{2\pi nk}{N}}$;
otherwise, the cost is even higher.) For the large DFT lengths used in many applications, N² operations
may be prohibitive. (For example, digital terrestrial television broadcast in Europe uses N = 2048 or 8192
OFDM channels, and the SETI4 project uses up to length-4194304 DFTs.) DFTs are thus almost always
computed in practice by an FFT algorithm5. FFTs are very widely used in signal processing, for applications
such as spectrum analysis (Section 2.1) and digital filtering via fast convolution (Chapter 4).
38 CHAPTER 3. FAST FOURIER TRANSFORM ALGORITHMS
Some applications need DFT (2.3) frequencies of the most recent N samples on an ongoing basis. One
example is DTMF8 , or touch-tone telephone dialing, in which a detection circuit must constantly monitor
the line for two simultaneous frequencies indicating that a telephone button is depressed. In such cases,
most of the data in each successive block of samples is the same, and it is possible to efficiently update the
DFT value from the previous sample to compute that of the current sample. Figure 3.1 illustrates successive
length-4 blocks of data for which successive DFT values may be needed. The running FFT algorithm
described here can be used to compute successive DFT values at a cost of only two complex multiplies and
additions per DFT frequency.
7 This content is available online at <http://cnx.org/content/m12029/1.5/>.
8 http://en.wikipedia.org/wiki/DTMF
Figure 3.1: The running FFT efficiently computes DFT values for successive overlapped blocks of
samples.
The running FFT algorithm is derived by expressing each DFT sample, $X_{n+1}(\omega_k)$, for the next block at
time n + 1 in terms of the previous value, $X_n(\omega_k)$, at time n.

$$X_n(\omega_k) = \sum_{p=0}^{N-1} x(n-p)\, e^{-j\omega_k p}$$

$$X_{n+1}(\omega_k) = \sum_{p=0}^{N-1} x(n+1-p)\, e^{-j\omega_k p}$$

Let q = p − 1:

$$X_{n+1}(\omega_k) = \sum_{q=-1}^{N-2} x(n-q)\, e^{-j\omega_k (q+1)} = e^{-j\omega_k} \sum_{q=0}^{N-2} x(n-q)\, e^{-j\omega_k q} + x(n+1)$$

Figure 3.2: Block diagram of the running FFT computation, implemented as a recursive filter
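The update idea can be sketched for a single DFT frequency. Starting from the block definition X_n(ω_k) = Σ_{p=0}^{N−1} x(n − p) e^{−jω_k p} with ω_k = 2πk/N, adding the new sample and removing the oldest one gives X_{n+1}(ω_k) = e^{−jω_k} X_n(ω_k) + x(n + 1) − x(n + 1 − N). This closed form is worked out here from that definition as an assumption, since the final step of the derivation is not shown in the text, and its arrangement may differ from the block diagram in Figure 3.2. The function names and test data are hypothetical.

```python
import cmath

def block_dft_bin(x, n, N, k):
    """Direct X_n(w_k) = sum_{p=0}^{N-1} x(n-p) exp(-j*w_k*p), w_k = 2*pi*k/N."""
    wk = 2 * cmath.pi * k / N
    return sum(x[n - p] * cmath.exp(-1j * wk * p) for p in range(N))

def running_dft_bin(x, N, k):
    """Track one DFT bin over every length-N block ending at n = N-1, N, ..."""
    rot = cmath.exp(-2j * cmath.pi * k / N)   # e^{-j w_k}
    X = block_dft_bin(x, N - 1, N, k)         # first full block, computed directly
    out = [X]
    for n in range(N - 1, len(x) - 1):
        # Rotate the previous value, add the newest sample, and drop the
        # sample that has just left the block.
        X = rot * X + x[n + 1] - x[n + 1 - N]
        out.append(X)
    return out

x = [0.3, -1.2, 2.0, 0.7, -0.5, 1.1, 0.0, -2.3, 0.9, 1.6, -0.4, 0.8]
N, k = 4, 1
updates = running_dft_bin(x, N, k)
direct = [block_dft_bin(x, n, N, k) for n in range(N - 1, len(x))]
max_err = max(abs(a - b) for a, b in zip(updates, direct))
print(len(updates), max_err)
```

Each update costs a fixed number of operations per frequency regardless of N. In long-running use, rounding errors accumulate in the recursion, so practical implementations may need to recompute the value directly from time to time.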
Some applications require only a few DFT frequencies. One example is frequency-shift keying (FSK)12
demodulation, in which typically two frequencies are used to transmit binary data; another example is
DTMF13, or touch-tone telephone dialing, in which a detection circuit must constantly monitor the line
for two simultaneous frequencies indicating that a telephone button is depressed. Goertzel's algorithm[11]
reduces the number of real-valued multiplications by almost a factor of two relative to direct computation via
the DFT equation (2.3). Goertzel's algorithm is thus useful for computing a few frequency values; if many
or most DFT values are needed, FFT algorithms (Section 3.1) that compute all DFT samples in O(N log N)
operations are faster. Goertzel's algorithm can be derived by converting the DFT equation (Section 1.1) into
an equivalent form as a convolution, which can be efficiently implemented as a digital filter. For increased
clarity, in the equations below the complex exponential is denoted as $e^{-j\frac{2\pi k}{N}} = W_N^k$. Note that because
$W_N^{-Nk}$ always equals 1, the DFT equation (Section 1.1) can be rewritten as a convolution, or filtering
operation:
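
$$\begin{aligned}
X(k) &= \sum_{n=0}^{N-1} x(n)\, 1\, W_N^{nk} \\
&= \sum_{n=0}^{N-1} x(n)\, W_N^{-Nk}\, W_N^{nk} \\
&= \sum_{n=0}^{N-1} x(n)\, W_N^{(N-n)(-k)} \\
&= \left(\left(\left(W_N^{-k} x(0) + x(1)\right) W_N^{-k} + x(2)\right) W_N^{-k} + \cdots + x(N-1)\right) W_N^{-k}
\end{aligned} \tag{3.3}$$
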
Note that this last expression can be written in terms of a recursive difference equation14

$$y(n) = W_N^{-k}\, y(n-1) + x(n)$$

where y(−1) = 0. The DFT coefficient equals the output of the difference equation at time n = N:
X(k) = y(N)
11 This content is available online at <http://cnx.org/content/m12024/1.5/>.
12 http://en.wikipedia.org/wiki/Frequency-shift_keying
13 http://en.wikipedia.org/wiki/DTMF
14 "Difference Equation" <http://cnx.org/content/m10595/latest/>
Expressing the difference equation as a z-transform15 and multiplying both numerator and denominator by
$1 - W_N^{k} z^{-1}$ gives the transfer function

$$\frac{Y(z)}{X(z)} = H(z) = \frac{1}{1 - W_N^{-k} z^{-1}} = \frac{1 - W_N^{k} z^{-1}}{1 - \left(W_N^{k} + W_N^{-k}\right) z^{-1} + z^{-2}} = \frac{1 - W_N^{k} z^{-1}}{1 - 2\cos\left(\frac{2\pi k}{N}\right) z^{-1} + z^{-2}}$$

This system can be realized by the structure in Figure 3.3
Figure 3.3
We want y(n) not for all n, but only for n = N. We can thus compute only the recursive part, or
just the left side of the flow graph in Figure 3.3, for n = [0, 1, ..., N], which involves only a real/complex
product rather than a complex/complex product as in a direct DFT (2.3), plus one complex multiply to get
y(N) = X(k).
note: The input x(N) at time n = N must equal 0! A slightly more efficient alternate
implementation16 that computes the full recursion only through n = N − 1 and combines the nonzero
operations of the final recursion with the final complex multiply can be found here17, complete
with pseudocode (for real-valued data).
If the data are real-valued, only real/real multiplications and real additions are needed until the final multiply.
note: The computational cost of Goertzel's algorithm is thus 2N + 2 real multiplies and 4N − 2
real adds, a reduction of almost a factor of two in the number of real multiplies relative to direct
computation via the DFT equation. If the data are real-valued, this cost is almost halved again.
For certain frequencies, additional simplifications requiring even fewer multiplications are possible. (For
example, for the DC (k = 0) frequency, all the multipliers equal 1 and only additions are needed.) A
correspondence by C.G. Boncelet, Jr.[7] describes some of these additional simplifications. Once again,
Goertzel's and Boncelet's algorithms are efficient for a few DFT frequency samples; if more than log N
frequencies are needed, O(N log N) FFT algorithms (Section 3.1) that compute all frequencies simultaneously
will be more efficient.
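A minimal sketch of Goertzel's algorithm for one DFT frequency follows. It runs the real-coefficient recursion s(n) = x(n) + 2cos(2πk/N) s(n − 1) − s(n − 2) for n = 0, ..., N with x(N) = 0, then applies the single complex multiply y(N) = s(N) − W_N^k s(N − 1). The function name `goertzel` and the intermediate variable s(n) are naming assumptions for illustration.

```python
import cmath
import math

def goertzel(x, k):
    """Return X(k) of the length-N sequence x using Goertzel's recursion."""
    N = len(x)
    c = 2.0 * math.cos(2.0 * math.pi * k / N)   # real feedback coefficient
    s1 = 0.0    # s(n-1)
    s2 = 0.0    # s(n-2)
    # Run the recursive part for n = 0..N; the extra input x(N) is zero.
    for sample in list(x) + [0.0]:
        s0 = sample + c * s1 - s2
        s2, s1 = s1, s0
    # Now s1 = s(N) and s2 = s(N-1); one complex multiply finishes the job.
    return s1 - cmath.exp(-2j * math.pi * k / N) * s2

def dft_bin(x, k):
    """Direct evaluation of one DFT sample, for comparison."""
    N = len(x)
    return sum(x[n] * cmath.exp(-2j * math.pi * n * k / N) for n in range(N))

x = [1.0, 2.0, 0.0, -1.0]
print(goertzel(x, 1), dft_bin(x, 1))
```

For real-valued input the loop uses only real arithmetic, which is where the savings relative to the direct sum come from.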
15 "Difference Equation" <http://cnx.org/content/m10595/latest/>
16 http://www.mstarlabs.com/dsp/goertzel/goertzel.html
17 http://www.mstarlabs.com/dsp/goertzel/goertzel.html

$$= \mathrm{DFT}_{\frac{N}{2}}\left[[x(0), x(2), \ldots, x(N-2)]\right] + W_N^{k}\, \mathrm{DFT}_{\frac{N}{2}}\left[[x(1), x(3), \ldots, x(N-1)]\right]$$

The mathematical simplifications in (3.4) reveal that all DFT frequency outputs X(k) can be computed as
the sum of the outputs of two length-N/2 DFTs, of the even-indexed and odd-indexed discrete-time samples,
respectively, where the odd-indexed short DFT is multiplied by a so-called twiddle factor term
$W_N^k = e^{-j\frac{2\pi k}{N}}$. This is called a decimation in time because the time samples are rearranged in alternating
groups, and a radix-2 algorithm because there are two groups. Figure 3.4 graphically illustrates this form
of the DFT computation, where for convenience the frequency outputs of the length-N/2 DFT of the
even-indexed time samples are denoted G(k) and those of the odd-indexed samples as H(k). Because of the
periodicity with N/2 frequency samples of these length-N/2 DFTs, G(k) and H(k) can be used to compute two
of the length-N DFT frequencies, namely X(k) and X(k + N/2), but with a different twiddle factor. This
reuse of these short-length DFT outputs gives the FFT its computational savings.
18 This content is available online at <http://cnx.org/content/m12059/1.2/>.
19 This content is available online at <http://cnx.org/content/m12016/1.7/>.
Figure 3.4: Decimation in time of a length-N DFT into two length-N/2 DFTs followed by a combining
stage.
Whereas direct computation of all N DFT frequencies according to the DFT equation (Section 1.1)
would require N² complex multiplies and N² − N complex additions (for complex-valued data), by reusing
the results of the two short-length DFTs as illustrated in Figure 3.4, the computational cost is now
New Operation Counts
• 2(N/2)² + N = N²/2 + N complex multiplies
• 2(N/2)(N/2 − 1) + N = N²/2 complex additions
This simple reorganization and reuse has reduced the total computation by almost a factor of two over direct
DFT (Section 1.1) computation!
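Applying the same even/odd split recursively to each half-length DFT leads to a complete radix-2 decimation-in-time FFT. The sketch below is a plain recursive version written for clarity rather than speed; the function names are assumptions, and it uses the fact that W_N^{k+N/2} = −W_N^k, so that G(k) and H(k) yield both X(k) and X(k + N/2).

```python
import cmath

def fft_radix2(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    N = len(x)
    if N == 1:
        return list(x)
    if N % 2:
        raise ValueError("length must be a power of two")
    G = fft_radix2(x[0::2])        # length-N/2 DFT of even-indexed samples
    H = fft_radix2(x[1::2])        # length-N/2 DFT of odd-indexed samples
    X = [0j] * N
    for k in range(N // 2):
        t = cmath.exp(-2j * cmath.pi * k / N) * H[k]   # twiddle factor W_N^k times H(k)
        X[k] = G[k] + t
        X[k + N // 2] = G[k] - t   # W_N^(k + N/2) = -W_N^k
    return X

def dft_direct(x):
    """Direct O(N^2) DFT used only to check the result."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

x = [0.3, -1.2, 2.0, 0.7, -0.5, 1.1, 0.0, -2.3]
fft_err = max(abs(a - b) for a, b in zip(fft_radix2(x), dft_direct(x)))
print(fft_err)
```

Practical implementations usually avoid recursion and the temporary lists created here, and instead work in place on a bit-reversed input.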
the remaining butterfly is actually a length-2 DFT! The theory of multi-dimensional index maps (Section 3.5)
shows that this must be the case, and that FFTs of any factorable length may consist of successive stages of
shorter-length FFTs with twiddle-factor multiplications in between.
(a) (b)
Figure 3.5: Radix-2 DIT butterfly simplification: both operations produce the same outputs
The full radix-2 decimation-in-time decomposition illustrated in Figure 3.6 using the simplified butterflies
(Figure 3.5) involves M = log2 N stages, each with N/2 butterflies. Each butterfly requires 1 complex
multiply and 2 complex adds. The total cost of the algorithm is thus
Computational cost of radix-2 DIT FFT
• (N/2) log2 N complex multiplies
• N log2 N complex adds
This is a remarkable savings over direct computation of the DFT. For example, a length-1024 DFT would
require 1048576 complex multiplications and 1047552 complex additions with direct computation, but only
5120 complex multiplications and 10240 complex additions using the radix-2 FFT, a savings by a factor of
100 or more. The relative savings increase with longer FFT lengths, and are less for shorter lengths.
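The counts quoted above can be reproduced with a few lines of C. This is only a sketch of the formulas in the text; the type `op_count` and the function names are illustrative and not part of the original program.

```c
/* Complex-operation counts from the text; n is assumed to be a power of two. */
typedef struct { long mults; long adds; } op_count;

/* Integer log2 for a power-of-two n. */
static int log2_int(long n)
{
    int m = 0;
    while (n > 1) { n >>= 1; m++; }
    return m;
}

/* Direct DFT: N^2 complex multiplies, N^2 - N complex adds. */
op_count direct_dft_ops(long n)
{
    op_count c;
    c.mults = n * n;
    c.adds = n * n - n;
    return c;
}

/* Radix-2 FFT: (N/2) log2 N complex multiplies, N log2 N complex adds. */
op_count radix2_fft_ops(long n)
{
    op_count c;
    int m = log2_int(n);
    c.mults = (n / 2) * m;
    c.adds = n * m;
    return c;
}
```

For n = 1024 these functions return the four figures given in the paragraph above.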
Modest additional reductions in computation can be achieved by noting that certain twiddle factors,
namely $W_N^0$, $W_N^{N/2}$, $W_N^{N/4}$, $W_N^{N/8}$, and $W_N^{3N/8}$, require no multiplications, or fewer real
multiplies than other ones. By implementing special butterflies for these twiddle factors as discussed in FFT
algorithm and programming tricks, the computational cost of the radix-2 decimation-in-time FFT can be
reduced further.
note: In a decimation-in-time radix-2 FFT as illustrated in Figure 3.6, the input is in bit-reversed
order (hence "decimation-in-time"). That is, if the time-sample index n is written as a binary
number, the order is that binary number reversed. The bit-reversal process is illustrated for a
length-N = 8 example below.
Table 3.1
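As a small illustration of the bit-reversal rule in the note, the following sketch reverses the lowest m bits of an index. The function name `bit_reverse` is an assumption made for this example. For a length-8 transform (m = 3) it maps the indices 0, 1, 2, 3, 4, 5, 6, 7 to 0, 4, 2, 6, 1, 5, 3, 7.

```c
/* Reverse the lowest m bits of index i, where the FFT length is 2^m. */
unsigned bit_reverse(unsigned i, int m)
{
    unsigned r = 0;
    int b;
    for (b = 0; b < m; b++) {
        r = (r << 1) | (i & 1u);  /* shift the lowest bit of i into r */
        i >>= 1;
    }
    return r;
}
```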
It is important to note that, if the input signal data are placed in bit-reversed order before beginning the
FFT computations, the outputs of each butterfly throughout the computation can be placed in the same
memory locations from which the inputs were fetched, resulting in an in-place algorithm that requires no
extra memory to perform the FFT. Most FFT implementations are in-place, and overwrite the input data
with the intermediate values and finally the output.
note: While of $O(N \log N)$ complexity and thus much faster than a direct DFT, this simple program
is optimized for clarity, not for speed. A speed-optimized program making use of additional efficient
FFT algorithm and programming tricks (Chapter 7) will compute a DFT several times faster on
most machines.
/**********************************************************/
/* fft.c */
/* (c) Douglas L. Jones */
/* University of Illinois at Urbana-Champaign */
/* January 19, 1992 */
/* */
/* fft: in-place radix-2 DIT DFT of a complex input */
/* */
/* input: */
/* n: length of FFT: must be a power of two */
/* m: n = 2**m */
/* input/output */
/* x: double array of length n with real part of data */
/* y: double array of length n with imag part of data */
/* */
/* Permission to copy and use this program is granted */
/* under a Creative Commons "Attribution" license */
/* http://creativecommons.org/licenses/by/1.0/ */
/**********************************************************/
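#include <math.h>

void fft(int n, int m, double x[], double y[])
{
  int i, j, k, n1, n2;
  double c, s, e, a, t1, t2;

  j = 0;                          /* bit-reverse */
  n2 = n/2;
  for (i = 1; i < n - 1; i++)
    {
      n1 = n2;
      while (j >= n1)
        {
          j = j - n1;
          n1 = n1/2;
        }
      j = j + n1;

      if (i < j)                  /* swap complex samples i and j */
        {
          t1 = x[i];
          x[i] = x[j];
          x[j] = t1;
          t1 = y[i];
          y[i] = y[j];
          y[j] = t1;
        }
    }

  /* The listing breaks off after the next two assignments; the   */
  /* butterfly loops that follow are a standard radix-2 DIT       */
  /* completion and may differ in detail from the original code.  */
  n1 = 0;                         /* FFT */
  n2 = 1;
  for (i = 0; i < m; i++)         /* m = log2(n) stages */
    {
      n1 = n2;                    /* butterfly span in this stage */
      n2 = n2 + n2;
      e = -6.283185307179586/n2;  /* twiddle angle step: -2*pi/n2 */
      a = 0.0;
      for (j = 0; j < n1; j++)
        {
          c = cos(a);             /* twiddle factor c + j*s */
          s = sin(a);
          a = a + e;
          for (k = j; k < n; k = k + n2)
            {
              t1 = c*x[k+n1] - s*y[k+n1];
              t2 = s*x[k+n1] + c*y[k+n1];
              x[k+n1] = x[k] - t1;
              y[k+n1] = y[k] - t2;
              x[k] = x[k] + t1;
              y[k] = y[k] + t2;
            }
        }
    }
  return;
}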
where for notational convenience $W_N^k = e^{-j\frac{2\pi k}{N}}$. FFT algorithms gain their speed by reusing the results
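$$
\begin{aligned}
X(2r) &= \sum_{n=0}^{N-1} x(n) W_N^{2rn} \\
&= \sum_{n=0}^{\frac{N}{2}-1} x(n) W_N^{2rn} + \sum_{n=0}^{\frac{N}{2}-1} x\left(n + \tfrac{N}{2}\right) W_N^{2rn} \cdot 1 \\
&= \sum_{n=0}^{\frac{N}{2}-1} \left( x(n) + x\left(n + \tfrac{N}{2}\right) \right) W_{N/2}^{rn} \\
&= \mathrm{DFT}_{N/2}\left[ x(n) + x\left(n + \tfrac{N}{2}\right) \right]
\end{aligned} \tag{3.6}
$$
$$
\begin{aligned}
X(2r+1) &= \sum_{n=0}^{N-1} x(n) W_N^{(2r+1)n} \\
&= \sum_{n=0}^{\frac{N}{2}-1} \left( x(n) + W_N^{N/2}\, x\left(n + \tfrac{N}{2}\right) \right) W_N^{(2r+1)n} \\
&= \sum_{n=0}^{\frac{N}{2}-1} \left( \left( x(n) - x\left(n + \tfrac{N}{2}\right) \right) W_N^{n} \right) W_{N/2}^{rn}
\end{aligned} \tag{3.7}
$$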
The mathematical simplifications in (3.6) and (3.7) reveal that both the even-indexed and odd-indexed
frequency outputs $X(k)$ can each be computed by a length-$N/2$ DFT. The inputs to these DFTs are sums or
differences of the first and second halves of the input signal, respectively, where the input to the short DFT
producing the odd-indexed frequencies is multiplied by a so-called twiddle factor term $W_N^k = e^{-j\frac{2\pi k}{N}}$.
This is called a decimation in frequency because the frequency samples are computed separately in
alternating groups, and a radix-2 algorithm because there are two groups. Figure 3.7 graphically illustrates
this form of the DFT computation. This conversion of the full DFT into a series of shorter DFTs with a
simple preprocessing step gives the decimation-in-frequency FFT its computational savings.
20 This content is available online at <http://cnx.org/content/m12018/1.6/>.
Figure 3.7: Decimation in frequency of a length-$N$ DFT into two length-$N/2$ DFTs preceded by a
preprocessing stage.
Whereas direct computation of all $N$ DFT frequencies according to the DFT equation (Section 1.1)
would require $N^2$ complex multiplies and $N^2 - N$ complex additions (for complex-valued data), by breaking
the computation into two short-length DFTs with some preliminary combining of the data, as illustrated in
Figure 3.7, the computational cost is now
New Operation Counts
• $2\left(\frac{N}{2}\right)^2 + \frac{N}{2} = \frac{N^2}{2} + \frac{N}{2}$ complex multiplies
• $2\,\frac{N}{2}\left(\frac{N}{2} - 1\right) + N = \frac{N^2}{2}$ complex additions
This simple manipulation has reduced the total computational cost of the DFT by almost a factor of two!
The initial combining operations for both short-length DFTs involve parallel groups of two time samples,
$x(n)$ and $x\left(n + \frac{N}{2}\right)$. One of these so-called butterfly operations is illustrated in Figure 3.8. There are $N/2$
butterflies per stage, each requiring a complex addition and subtraction followed by one twiddle-factor
multiplication by $W_N^n = e^{-j\frac{2\pi n}{N}}$ on the lower output branch.
It is worthwhile to note that the initial add/subtract part of the DIF butterfly is actually a length-2
DFT! The theory of multi-dimensional index maps (Section 3.5) shows that this must be the case, and that
FFTs of any factorable length may consist of successive stages of shorter-length FFTs with twiddle-factor
multiplications in between. It is also worth noting that this butterfly differs from the decimation-in-time
radix-2 butterfly (Figure 3.5) in that the twiddle factor multiplication occurs after the combining.
The full radix-2 decimation-in-frequency decomposition illustrated in Figure 3.9 requires $M = \log_2 N$
stages, each with $N/2$ butterflies. Each butterfly requires 1 complex multiply and 2 complex adds.
The total cost of the algorithm is thus
Computational cost of radix-2 DIF FFT
• $\frac{N}{2} \log_2 N$ complex multiplies
• $N \log_2 N$ complex adds
This is a remarkable savings over direct computation of the DFT. For example, a length-1024 DFT would
require 1048576 complex multiplications and 1047552 complex additions with direct computation, but only
5120 complex multiplications and 10240 complex additions using the radix-2 FFT, a savings by a factor
of 100 or more. The relative savings increase with longer FFT lengths, and are less for shorter lengths.
Modest additional reductions in computation can be achieved by noting that certain twiddle factors, namely
$W_N^0$, $W_N^{N/2}$, $W_N^{N/4}$, $W_N^{N/8}$, and $W_N^{3N/8}$, require no multiplications, or fewer real multiplies than other ones. By
implementing special butterflies for these twiddle factors as discussed in FFT algorithm and programming
tricks (Chapter 7), the computational cost of the radix-2 decimation-in-frequency FFT can be reduced further.
It is important to note that, if the input data are in order before beginning the FFT computations, the
outputs of each butterfly throughout the computation can be placed in the same memory locations from
which the inputs were fetched, resulting in an in-place algorithm that requires no extra memory to perform
the FFT. Most FFT implementations are in-place, and overwrite the input data with the intermediate values
and finally the output.
Figure 3.10: Decimation-in-frequency radix-2 FFT (Section 3.4.2.2) with bit-reversed input. This
is an in-place (Section 3.4.2.1) algorithm in which the same memory can be reused throughout the
computation.
There is a similar structure for the decimation-in-time FFT (Section 3.4.2.1) with in-order inputs and
bit-reversed frequencies. This structure can be useful for fast convolution (Chapter 4) on machines that favor
decimation-in-time algorithms because the filter can be stored in bit-reverse order, and then the inverse FFT
returns an in-order result without ever bit-reversing any data. As discussed in Efficient FFT Programming
Tricks (Chapter 7), this may save several percent of the execution time.
The structure in Figure 3.11 implements a decimation-in-frequency FFT (Section 3.4.2.2) that has both
input and output in order. It thus avoids the need for bit-reversing altogether. Unfortunately, it destroys
the in-place (Section 3.4.2.1) structure somewhat, making an FFT program more complicated and requiring
more memory; on most machines the resulting cost exceeds the benefits. This structure can be computed in
place if two butterflies are computed simultaneously.
Figure 3.11: Decimation-in-frequency radix-2 FFT with in-order input and output. It can be computed
in-place if two butterflies are computed simultaneously.
The structure in Figure 3.12 has a constant geometry; the connections between memory locations are
identical in each FFT stage (Section 3.4.2.1). Since it is not in-place and requires bit-reversal, it is inconvenient
for software implementation, but can be attractive for a highly parallel hardware implementation because
the connections between stages can be hardwired. An analogous structure exists that has bit-reversed inputs
and in-order outputs.
Figure 3.12: This constant-geometry structure has the same interconnect pattern from stage to stage.
This structure is sometimes useful for special hardware.
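$$
\begin{aligned}
X(k) &= \sum_{n=0}^{N-1} x(n) e^{-j\frac{2\pi nk}{N}} \\
&= \sum_{n=0}^{\frac{N}{4}-1} x(4n) e^{-j\frac{2\pi (4n)k}{N}} + \sum_{n=0}^{\frac{N}{4}-1} x(4n+1) e^{-j\frac{2\pi (4n+1)k}{N}} \\
&\quad + \sum_{n=0}^{\frac{N}{4}-1} x(4n+2) e^{-j\frac{2\pi (4n+2)k}{N}} + \sum_{n=0}^{\frac{N}{4}-1} x(4n+3) e^{-j\frac{2\pi (4n+3)k}{N}} \\
&= \mathrm{DFT}_{N/4}\left[x(4n)\right] + W_N^{k}\, \mathrm{DFT}_{N/4}\left[x(4n+1)\right] + W_N^{2k}\, \mathrm{DFT}_{N/4}\left[x(4n+2)\right] + W_N^{3k}\, \mathrm{DFT}_{N/4}\left[x(4n+3)\right]
\end{aligned} \tag{3.8}
$$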
This is called a decimation in time because the time samples are rearranged in alternating groups,
and a radix-4 algorithm because there are four groups. Figure 3.13 (Radix-4 DIT structure) graphically
illustrates this form of the DFT computation.
Figure 3.13: Decimation in time of a length-$N$ DFT into four length-$N/4$ DFTs followed by a combining
stage.
Due to the periodicity with $N/4$ of the short-length DFTs, their outputs for frequency-sample $k$ are reused
to compute $X(k)$, $X\left(k + \frac{N}{4}\right)$, $X\left(k + \frac{N}{2}\right)$, and $X\left(k + \frac{3N}{4}\right)$. It is this reuse that gives the radix-4 FFT
its efficiency. The computations involved with each group of four frequency samples constitute the radix-4
butterfly, which is shown in Figure 3.14. Through further rearrangement, it can be shown that this
computation can be simplified to three twiddle-factor multiplies and a length-4 DFT! The theory of
multi-dimensional index maps (Section 3.5) shows that this must be the case, and that FFTs of any factorable
length may consist of successive stages of shorter-length FFTs with twiddle-factor multiplications in between.
The length-4 DFT requires no multiplies and only eight complex additions (this efficient computation can
be derived using a radix-2 FFT (Section 3.4.2.1)).
Figure 3.14: The radix-4 DIT butterfly can be simplified to a length-4 DFT preceded by three
twiddle-factor multiplies.
If the FFT length $N = 4^M$, the shorter-length DFTs can be further decomposed recursively in the same
manner to produce the full radix-4 decimation-in-time FFT. As in the radix-2 decimation-in-time FFT
(Section 3.4.2.1), each stage of decomposition creates additional savings in computation. To determine the
total computational cost of the radix-4 FFT, note that there are $M = \log_4 N = \frac{\log_2 N}{2}$ stages, each with $N/4$
butterflies. Each radix-4 butterfly requires 3 complex multiplies and 8 complex additions. The
total cost is then
Radix-4 FFT Operation Counts
• $3 \cdot \frac{N}{4} \cdot \frac{\log_2 N}{2} = \frac{3}{8} N \log_2 N$ complex multiplies (75% of a radix-2 FFT)
• $8 \cdot \frac{N}{4} \cdot \frac{\log_2 N}{2} = N \log_2 N$ complex adds (same as a radix-2 FFT)
The radix-4 FFT requires only 75% as many complex multiplies as the radix-2 (Section 3.4.2.1) FFTs,
although it uses the same number of complex additions. These additional savings make it a widely-used
FFT algorithm.
The decimation-in-time operation regroups the input samples at each successive stage of decomposition,
resulting in a "digit-reversed" input order. That is, if the time-sample index n is written as a base-4 number,
the order is that base-4 number reversed. The digit-reversal process is illustrated for a length-N = 64
example below.
Example 3.2: N = 64 = 4^3
Original Number   Original Digit Order   Reversed Digit Order   Digit-Reversed Number
0                 000                    000                    0
1                 001                    100                    16
2                 002                    200                    32
3                 003                    300                    48
4                 010                    010                    4
5                 011                    110                    20
...               ...                    ...                    ...
Table 3.2
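The digit-reversal rule in Table 3.2 can be written as a short routine. This is a sketch only; the function name `digit_reverse4` is an assumption for this example. With m = 3 digits (N = 64) it reproduces the rows shown in the table.

```c
/* Reverse the m base-4 digits of index i, where the FFT length is 4^m. */
unsigned digit_reverse4(unsigned i, int m)
{
    unsigned r = 0;
    int d;
    for (d = 0; d < m; d++) {
        r = (r << 2) | (i & 3u);  /* move the lowest base-4 digit of i into r */
        i >>= 2;
    }
    return r;
}
```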
It is important to note that, if the input signal data are placed in digit-reversed order before beginning the
FFT computations, the outputs of each butterfly throughout the computation can be placed in the same
memory locations from which the inputs were fetched, resulting in an in-place algorithm that requires
no extra memory to perform the FFT. Most FFT implementations are in-place, and overwrite the input
data with the intermediate values and finally the output. A slight rearrangement within the radix-4 FFT
introduced by Burrus [5] allows the inputs to be arranged in bit-reversed (Section 3.4.2.1) rather than
digit-reversed order.
A radix-4 decimation-in-frequency (Section 3.4.2.2) FFT can be derived similarly to the radix-2 DIF
FFT (Section 3.4.2.2), by separately computing all four groups of every fourth output frequency sample.
The DIF radix-4 FFT is a flow-graph reversal of the DIT radix-4 FFT, with the same operation counts
and twiddle factors in the reversed order. The output ends up in digit-reversed order for an in-place DIF
algorithm.
Exercise 3.1 (Solution on p. 69.)
How do we derive a radix-4 algorithm when $N = 4^M \cdot 2$?
Figure 3.15: See Decimation-in-Time (DIT) Radix-2 FFT (Section 3.4.2.1) and Radix-4 FFT Algorithms
(Section 3.4.3) for more information on these algorithms.
An alternative derivation notes that radix-2 butterflies of the form shown in Figure 2 can merge twiddle
factors from two successive stages to eliminate one-third of them; hence, the split-radix algorithm requires
only about two-thirds as many multiplications as a radix-2 FFT.
The split-radix algorithm can also be derived by mixing the radix-2 (Section 3.4.2.1) and radix-4
(Section 3.4.3) decompositions.
DIT Split-radix derivation
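$$
\begin{aligned}
X(k) &= \sum_{n=0}^{\frac{N}{2}-1} x(2n) e^{-j\frac{2\pi (2n)k}{N}} + \sum_{n=0}^{\frac{N}{4}-1} x(4n+1) e^{-j\frac{2\pi (4n+1)k}{N}} + \sum_{n=0}^{\frac{N}{4}-1} x(4n+3) e^{-j\frac{2\pi (4n+3)k}{N}} \\
&= \mathrm{DFT}_{N/2}\left[x(2n)\right] + W_N^{k}\, \mathrm{DFT}_{N/4}\left[x(4n+1)\right] + W_N^{3k}\, \mathrm{DFT}_{N/4}\left[x(4n+3)\right]
\end{aligned} \tag{3.9}
$$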
Figure 3.17: The split-radix butterfly mixes radix-2 and radix-4 decompositions and is L-shaped
Further decomposition of the half- and quarter-length DFTs yields the full split-radix algorithm. The
mix of different-length FFTs in different parts of the flowgraph results in a somewhat irregular algorithm;
Sorensen et al.[12] show how to adjust the computation such that the data retains the simpler radix-2
The multiplicative complexity of the split-radix algorithm is only about two-thirds that of the radix-2
FFT, and is better than the radix-4 FFT or any higher power-of-two radix as well. The additions within
the complex twiddle-factor multiplies are similarly reduced, but since the underlying butterfly tree remains
the same in all power-of-two algorithms, the butterfly additions remain the same and the overall reduction
in additions is much less.
Operation Counts
Table 3.3
Comments
• The split-radix algorithm has a somewhat irregular structure. Successful programs have been written
(Sorensen[12]) for uni-processor machines, but it may be difficult to efficiently code the split-radix
algorithm for vector or multi-processor machines.
• G. Bruun's algorithm[2] requires only $N - 2$ more operations than the split-radix algorithm and has a
regular structure, so it might be better for multi-processor or special-purpose hardware.
• The execution time of FFT programs generally depends more on compiler- or hardware-friendly
software design than on the exact computational complexity. See Efficient FFT Algorithm and Programming
Tricks (Chapter 7) for further pointers and links to good code.
$$
X(k) = \sum_{n=0}^{N-1} x(n) W_N^{nk} = \sum_{n_1=0}^{1} \sum_{n_2=0}^{\frac{N}{2}-1} x(n_1 + 2n_2) W_N^{(n_1 + 2n_2)k}
$$
Also, let $k = \frac{N}{2} k_1 + k_2$, with $k_1 = [0, 1]$ and $k_2 = \left[0, 1, 2, \ldots, \frac{N}{2} - 1\right]$.
note: As long as there is a one-to-one correspondence between the original indices [n, k] =
[0, 1, 2, . . . , N − 1] and the n, k generated by the index map, the computation is the same; only
the order in which the sums are done is changed.
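$$
\begin{aligned}
X(k) &= X\left(\tfrac{N}{2} k_1 + k_2\right) \\
&= \sum_{n=0}^{N-1} x(n) W_N^{n\left(\frac{N}{2} k_1 + k_2\right)} \\
&= \sum_{n_1=0}^{1} \sum_{n_2=0}^{\frac{N}{2}-1} x(n_1 + 2n_2) W_N^{(n_1 + 2n_2)\left(\frac{N}{2} k_1 + k_2\right)} \\
&= \sum_{n_1=0}^{1} \sum_{n_2=0}^{\frac{N}{2}-1} x([n_1, n_2]) W_N^{\frac{N}{2} n_1 k_1} W_N^{n_1 k_2} W_N^{N n_2 k_1} W_N^{2 n_2 k_2} \\
&= \sum_{n_1=0}^{1} \sum_{n_2=0}^{\frac{N}{2}-1} x([n_1, n_2]) W_2^{n_1 k_1} W_N^{n_1 k_2} \cdot 1 \cdot W_{N/2}^{n_2 k_2} \\
&= \sum_{n_1=0}^{1} W_2^{n_1 k_1} \left( W_N^{n_1 k_2} \sum_{n_2=0}^{\frac{N}{2}-1} x([n_1, n_2]) W_{N/2}^{n_2 k_2} \right)
\end{aligned} \tag{3.10}
$$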
note: Key to FFT is choosing index map so that one of the cross-terms disappears!
Exercise 3.2
What is an index map for a radix-4 (Section 3.4.3) DIT algorithm?
24 This content is available online at <http://cnx.org/content/m12025/1.3/>.
Exercise 3.3
What is an index map for a radix-4 (Section 3.4.3) DIF algorithm?
Exercise 3.4
What is an index map for a radix-3 DIT algorithm? (N a multiple of 3)
For arbitrary composite $N = N_1 N_2$, we can define an index map
$n = n_1 + N_1 n_2$
$k = N_2 k_1 + k_2$
$n_1 = [0, 1, 2, \ldots, N_1 - 1]$
$k_1 = [0, 1, 2, \ldots, N_1 - 1]$
$n_2 = [0, 1, 2, \ldots, N_2 - 1]$
$k_2 = [0, 1, 2, \ldots, N_2 - 1]$
$$
\begin{aligned}
X(k) &= X(k_1, k_2) \\
&= \sum_{n_1=0}^{N_1-1} \sum_{n_2=0}^{N_2-1} x(n_1, n_2) W_N^{N_2 n_1 k_1} W_N^{n_1 k_2} W_N^{N k_1 n_2} W_N^{N_1 n_2 k_2} \\
&= \sum_{n_1=0}^{N_1-1} \sum_{n_2=0}^{N_2-1} x(n_1, n_2) W_{N_1}^{n_1 k_1} W_N^{n_1 k_2} \cdot 1 \cdot W_{N_2}^{n_2 k_2} \\
&= \mathrm{DFT}_{n_1, N_1}\left[ W_N^{n_1 k_2}\, \mathrm{DFT}_{n_2, N_2}\left[x(n_1, n_2)\right] \right]
\end{aligned} \tag{3.11}
$$
N2 = 15
N = 240
CFA = 7680
Tremendous saving for any composite N
Pictorial Representations
Exercise 3.5
Can the composite CFAs be implemented in-place?
Exercise 3.6
What do we do with N = N1 N2 N3 ?
$k = (K_3 k_1 + K_4 k_2) \bmod N$
n1 = [0, 1, . . . , N1 − 1]
k1 = [0, 1, . . . , N1 − 1]
n2 = [0, 1, . . . , N2 − 1]
k2 = [0, 1, . . . , N2 − 1]
The basic idea is simply to reorder the DFT (2.3) computation to expose the redundancies in the DFT
(2.3), and exploit these to reduce computation!
Three conditions must be satisfied to make this map (p. 65) serve our purposes
1. Each map must be one-to-one from 0 to $N - 1$, because we want to do the same computation, just in
a different order.
2. The map must be cleverly chosen so that computation is reduced.
3. The map should be chosen to make the short-length transforms be DFTs (2.3). (Not essential, since
fast algorithms for short-length DFT (2.3)-like computations could be developed, but it makes our
work easier.)
3.6.1.1.2 Case II
N1 , N2 not relatively prime: gcd (N1 , N2 ) > 1
$K_1 = aN_2$ and $K_2 \ne bN_1$ and $\gcd(a, N_1) = 1$, $\gcd(K_2, N_2) = 1$, or $K_1 \ne aN_2$ and $K_2 = bN_1$ and
$\gcd(K_1, N_1) = 1$, $\gcd(b, N_2) = 1$, where $K_1, K_2, K_3, K_4, N_1, N_2, a, b$ are integers
• (K1 K4 ) modN = 0 exclusive or (K2 K3 ) modN = 0 ⇒ Common Factor Algorithm (CFA). Then
No twiddle factors!
note: A PFA exists only and always for relatively prime N1 , N2
Example 3.4
N1 = 3, N2 = 5 N = 15
n = (5n1 + 3n2 ) mod15
3 = K2 = bN1 = 3b
gcd (5, 3) = 1
gcd (3, 5) = 1
10 = K3 = aN2 = 5a
6 = K4 = bN1 = 3b
gcd (10, 3) = 1
gcd (6, 5) = 1
2-D map
Operation Counts
• $N_2$ length-$N_1$ DFTs + $N_1$ length-$N_2$ DFTs:
$N_2 N_1^2 + N_1 N_2^2 = N (N_1 + N_2)$ complex multiplies
• Suppose $N = N_1 N_2 N_3 \ldots N_M$:
$N (N_1 + N_2 + \cdots + N_M)$ complex multiplies
note: radix-2 (Section 3.4.2.1), radix-4 (Section 3.4.3) eliminate all multiplies in short-length
DFTs, but have twiddle factors: PFA eliminates all twiddle factors, but ends up with multiplies in
short-length DFTs (2.3). Surprisingly, total operation counts end up being very similar for similar
lengths.
Fast Convolution 1
72 CHAPTER 4. FAST CONVOLUTION
Figure 4.1
To achieve linear convolution using fast circular convolution, we must use zero-padded DFTs of length
N ≥L+M −1
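The zero-padding requirement can be illustrated with a short sketch. For brevity the circular convolution below is computed directly in the time domain, not with DFTs as in the text, so it demonstrates only the length condition, not the speed-up. The names `circ_conv`, `linear_conv_via_circular`, and the limit `MAXN` are assumptions for this example.

```c
#define MAXN 64   /* largest supported circular length (illustrative limit) */

/* Direct length-n circular convolution of real sequences a and b. */
void circ_conv(int n, const double a[], const double b[], double y[])
{
    int i, m;
    for (i = 0; i < n; i++) {
        y[i] = 0.0;
        for (m = 0; m < n; m++)
            y[i] += a[m] * b[(i - m + n) % n];
    }
}

/* Linear convolution of x (length L) with h (length M) through a length-n
   circular convolution of the zero-padded sequences.
   Returns -1 if n < L + M - 1 (time-aliasing would occur) or n > MAXN. */
int linear_conv_via_circular(int L, const double x[], int M, const double h[],
                             int n, double y[])
{
    double xp[MAXN] = {0.0}, hp[MAXN] = {0.0};
    int i;
    if (n < L + M - 1 || n > MAXN)
        return -1;
    for (i = 0; i < L; i++) xp[i] = x[i];   /* zero-pad x to length n */
    for (i = 0; i < M; i++) hp[i] = h[i];   /* zero-pad h to length n */
    circ_conv(n, xp, hp, y);
    return 0;
}
```

With x = {1, 2, 3}, h = {1, 1}, and n = 4, the output is the linear convolution {1, 3, 5, 3}; with n = 3 the routine refuses, because the last output sample would wrap around onto the first.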
Figure 4.2
note: There is some inefficiency when compared to circular convolution due to longer zero-padded
DFTs (2.3). Still, $O\left(\frac{N}{\log_2 N}\right)$ savings over direct computation.
Figure 4.3
the first $M - 1$ samples are wrapped around and thus are incorrect. However, for $M - 1 \le n \le N - 1$, the
convolution is linear convolution, so these samples are correct. Thus $N - M + 1$ good outputs are produced
for each length-$N$ circular convolution.
The Overlap-Save Method: Break the long signal into successive blocks of $N$ samples, each block overlapping
the previous block by $M - 1$ samples. Perform circular convolution of each block with filter $h(m)$. Discard
the first $M - 1$ points in each output block, and concatenate the remaining points to create $y(n)$.
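The overlap-save procedure might be sketched as follows. To keep the example short, each block's circular convolution is computed directly instead of with FFTs, and the signal is assumed to be zero before time 0; the function name `overlap_save` and the limit `MAXN` are assumptions for this example.

```c
#define MAXN 64   /* largest supported block length (illustrative limit) */

/* Overlap-save filtering: y[0..len-1] = first len samples of x * h.
   n is the block (circular convolution) length, M the filter length. */
int overlap_save(int len, const double x[], int M, const double h[],
                 int n, double y[])
{
    double blk[MAXN], hp[MAXN] = {0.0}, out[MAXN];
    int step = n - M + 1;            /* good outputs per block */
    int start, i, m;

    if (n > MAXN || M > n || M < 1)
        return -1;
    for (i = 0; i < M; i++) hp[i] = h[i];

    for (start = 0; start < len; start += step) {
        /* Block covers times start-(M-1) .. start+step-1, so it overlaps
           the previous block by M-1 samples. */
        for (i = 0; i < n; i++) {
            int t = start - (M - 1) + i;
            blk[i] = (t >= 0 && t < len) ? x[t] : 0.0;
        }
        /* Length-n circular convolution of the block with the filter. */
        for (i = 0; i < n; i++) {
            out[i] = 0.0;
            for (m = 0; m < n; m++)
                out[i] += hp[m] * blk[(i - m + n) % n];
        }
        /* Discard the first M-1 (wrapped) points, keep the rest. */
        for (i = M - 1; i < n && start + i - (M - 1) < len; i++)
            y[start + i - (M - 1)] = out[i];
    }
    return 0;
}
```

In a practical implementation the inner circular convolution would be replaced by two FFTs and a pointwise multiply with the precomputed filter spectrum.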
Figure 4.4
The computational cost per output sample for a length-$N = 2^n$ FFT is (assuming precomputed $H(k)$) 2
FFTs and $N$ multiplies:
$$
\frac{2\,\frac{N}{2} \log_2 N + N}{N - M + 1} = \frac{N (\log_2 N + 1)}{N - M + 1} \text{ complex multiplies}
$$
$$
\frac{2 (N \log_2 N)}{N - M + 1} = \frac{2N \log_2 N}{N - M + 1} \text{ complex adds}
$$
Compare to $M$ mults, $M - 1$ adds per output point for the direct method. For a given $M$, the optimal $N$ can
be determined by finding the $N$ that minimizes the operation counts. Usually, the optimal $N$ satisfies $4M \le N_{opt} \le 8M$.
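A brute-force search over power-of-two lengths illustrates how the optimal N can be found from the multiply count above. This sketch minimizes complex multiplies only; the function names are illustrative, and a real design might also weigh additions and memory.

```c
#include <math.h>

/* Complex multiplies per output sample for overlap-save with FFT length
   n = 2^p and filter length M: n (log2 n + 1) / (n - M + 1). */
double ols_mults_per_output(int p, int M)
{
    double n = ldexp(1.0, p);              /* n = 2^p */
    return n * (p + 1) / (n - M + 1);
}

/* Exponent p (FFT length 2^p) minimizing the cost, searching p = 0..pmax.
   Returns -1 if no length in that range is at least M. */
int best_fft_exponent(int M, int pmax)
{
    int p, best = -1;
    double best_cost = 0.0;
    for (p = 0; p <= pmax; p++) {
        double cost;
        if (ldexp(1.0, p) < M)
            continue;                      /* block must hold the filter */
        cost = ols_mults_per_output(p, M);
        if (best < 0 || cost < best_cost) {
            best = p;
            best_cost = cost;
        }
    }
    return best;
}
```

For M = 64 the search selects p = 9, that is N = 512 = 8M, which falls inside the range quoted above.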
Figure 4.5
Add successive blocks, overlapped by M − 1 samples, so that the tails sum to produce the complete linear
convolution.
Figure 4.6
Computational Cost: Two length $N = L + M - 1$ FFTs and $M$ mults and $M - 1$ adds per $L$ output
points; essentially the same as the OLS method.
Chapter 5
Chirp-z Transform 1
Figure 5.1
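Note that $(k - n)^2 = n^2 - 2nk + k^2 \Rightarrow nk = \frac{1}{2}\left(n^2 + k^2 - (k - n)^2\right)$, so
$$
\begin{aligned}
X(z_k) &= \sum_{n=0}^{N-1} x(n) A^{-n} W^{\frac{n^2}{2}} W^{\frac{k^2}{2}} W^{-\frac{(k-n)^2}{2}} \\
&= W^{\frac{k^2}{2}} \sum_{n=0}^{N-1} \left( x(n) A^{-n} W^{\frac{n^2}{2}} \right) W^{-\frac{(k-n)^2}{2}}
\end{aligned}
$$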
1. Premultiply $x(n)$ by $A^{-n} W^{\frac{n^2}{2}}$, $n = [0, 1, \ldots, N - 1]$, to make $y(n)$
2. Linearly convolve with $W^{-\frac{(k-n)^2}{2}}$
3. Post-multiply by $W^{\frac{k^2}{2}}$ to get $X(z_k)$.
Steps 1 and 3 require $N$ and $M$ operations, respectively. Step 2 can be performed efficiently using fast
convolution.
Figure 5.2
$W^{-\frac{n^2}{2}}$ is required only for $-(N - 1) \le n \le M - 1$, so this linear convolution can be implemented
with FFTs of length $L \ge N + M - 1$.
note: Wrap $W^{-\frac{n^2}{2}}$ around $L$ when implementing with circular convolution.
So, a weird-length DFT can be implemented relatively efficiently using power-of-two algorithms via the
chirp-z transform.
Also useful for "zoom-FFTs".
Chapter 6
The power-of-two FFT algorithms (Section 3.4.1), such as the radix-2 (Section 3.4.2.1) and radix-4
(Section 3.4.3) FFTs, and the common-factor (Section 3.5) and prime-factor (Section 3.6) FFTs, achieve great
reductions in computational complexity of the DFT (Section 1.1) when the length, $N$, is a composite number.
DFTs of prime length are sometimes needed, however, particularly for the short-length DFTs in
common-factor or prime-factor algorithms. The methods described here, along with the composite-length algorithms,
allow fast computation of DFTs of any length.
There are two main ways of performing DFTs of prime length:
82 CHAPTER 6. FFTS OF PRIME LENGTH AND RADER'S CONVERSION
Example 6.1
$N = 5$, $r = 2$
$2^0 \bmod 5 = 1$
$2^1 \bmod 5 = 2$
$2^2 \bmod 5 = 4$
$2^3 \bmod 5 = 3$
Example 6.2
$N = 5$, $r = 2$, $r^{-1} = 3$
$(2 \times 3) \bmod 5 = 1$
$3^0 \bmod 5 = 1$
$3^1 \bmod 5 = 3$
$3^2 \bmod 5 = 4$
$3^3 \bmod 5 = 2$
So why do we care? Because we can use these facts to turn a DFT (2.3) into a convolution!
$$
\begin{aligned}
X(r^p \bmod N) &= \sum_{m=0}^{N-2} x(r^{-m} \bmod N)\, W^{r^p r^{-m}} + x(0) \\
&= \sum_{m=0}^{N-2} x(r^{-m} \bmod N)\, W^{r^{p-m}} + x(0) \\
&= x(0) + x\left(r^{-l} \bmod N\right) * W^{r^l}
\end{aligned} \tag{6.1}
$$
where $l = [0, 1, \ldots, N - 2]$
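Example 6.3
$N = 5$, $r = 2$, $r^{-1} = 3$
$$
\begin{bmatrix} X(0) \\ X(1) \\ X(2) \\ X(3) \\ X(4) \end{bmatrix}
=
\begin{bmatrix}
0 & 0 & 0 & 0 & 0 \\
0 & 1 & 2 & 3 & 4 \\
0 & 2 & 4 & 1 & 3 \\
0 & 3 & 1 & 4 & 2 \\
0 & 4 & 3 & 2 & 1
\end{bmatrix}
\begin{bmatrix} x(0) \\ x(1) \\ x(2) \\ x(3) \\ x(4) \end{bmatrix}
$$
$$
\begin{bmatrix} X(0) \\ X(1) \\ X(2) \\ X(4) \\ X(3) \end{bmatrix}
=
\begin{bmatrix}
0 & 0 & 0 & 0 & 0 \\
0 & 1 & 3 & 4 & 2 \\
0 & 2 & 1 & 3 & 4 \\
0 & 4 & 2 & 1 & 3 \\
0 & 3 & 4 & 2 & 1
\end{bmatrix}
\begin{bmatrix} x(0) \\ x(1) \\ x(3) \\ x(4) \\ x(2) \end{bmatrix}
$$
where for visibility the matrix entries represent only the power, $m$, of the corresponding DFT term
$W_N^m$. Note that the 4-by-4 circulant matrix2
$$
\begin{bmatrix}
1 & 3 & 4 & 2 \\
2 & 1 & 3 & 4 \\
4 & 2 & 1 & 3 \\
3 & 4 & 2 & 1
\end{bmatrix}
$$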
Thus Winograd's minimum-multiply FFT's are useful only for small N . They are very important for Prime-
Factor Algorithms (Section 3.6), which generally use Winograd modules to implement the short-length DFTs.
Tables giving the multiplies and adds necessary to compute Winograd FFTs for various lengths can be found
2 http://en.wikipedia.org/wiki/Circulant_matrix
in C.S. Burrus (1988)[4]. Tables and FORTRAN and TMS32010 programs for these short-length transforms
can be found in C.S. Burrus and T.W. Parks (1985)[6]. The theory and derivation of these algorithms is
quite elegant but requires substantial background in number theory and abstract algebra. Fortunately for
the practitioner, all of the short algorithms one is likely to need have already been derived and can simply
be looked up without mastering the details of their derivation.
The use of FFT algorithms (Section 3.1) such as the radix-2 decimation-in-time (Section 3.4.2.1) or
decimation-in-frequency (Section 3.4.2.2) methods results in tremendous savings in computations when
computing the discrete Fourier transform (Section 1.1). While most of the speed-up of FFTs comes from this,
careful implementation can provide additional savings ranging from a few percent to several-fold increases
in program speed.
The twiddle factors in FFT algorithms (Section 3.1) consist of cosines and sines that each take the equivalent of several multiplies
to compute. However, at most $N$ unique twiddle factors can appear in any FFT or DFT algorithm. (For
example, in the radix-2 decimation-in-time FFT (Section 3.4.2.1), only $N/2$ twiddle factors $W_N^k$, $k =
0, 1, 2, \ldots, \frac{N}{2} - 1$, are used.) These twiddle factors can be precomputed once and stored in an array in
computer memory, and accessed in the FFT algorithm by table lookup. This simple technique yields very
substantial savings and is almost always used in practice.
CHAPTER 7. EFFICIENT FFT ALGORITHM AND PROGRAMMING TRICKS
Zero-Padding) data. Goertzel's algorithm (Section 3.3) is useful when only a few DFT outputs are needed.
The running FFT (Section 3.2) can be faster when DFTs of highly overlapped blocks of data are needed, as
in a spectrogram (Section 2.3).
$D = C + S$
$E = C - S$
$Z = C (X - Y)$
$CX - SY = EY + Z$
$CY + SX = DX - Z$
In an FFT, D and E come entirely from the twiddle factors, so they can be precomputed and stored in a
look-up table. This reduces the cost of the complex twiddle-factor multiply to 3 real multiplies and 3 real
adds, or one less and one more, respectively, than the conventional 4/2 computation.
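The 3-multiply scheme might be coded as below. The sketch takes $Z = C(X - Y)$, which is the choice that makes both identities above hold, and stores C, D, and E per twiddle factor. The type `twiddle3` and the function names are assumptions for this example.

```c
/* Per-twiddle constants for C + jS: c = C, d = C + S, e = C - S. */
typedef struct { double c, d, e; } twiddle3;

/* Precompute the three constants once per twiddle factor. */
twiddle3 make_twiddle3(double C, double S)
{
    twiddle3 t;
    t.c = C;
    t.d = C + S;
    t.e = C - S;
    return t;
}

/* (C + jS)(X + jY) with 3 real multiplies and 3 real adds. */
void cmul3(twiddle3 t, double X, double Y, double *re, double *im)
{
    double Z = t.c * (X - Y);   /* 1 multiply, 1 add */
    *re = t.e * Y + Z;          /* equals CX - SY */
    *im = t.d * X - Z;          /* equals CY + SX */
}
```

For C = 3, S = 4, X = 5, Y = 6 the routine returns -9 + j38, the same as the conventional four-multiply product.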
9 http://en.wikipedia.org/wiki/Loop_unrolling
• Substantial Savings - (≥ 2)
a. Table lookup of cosine/sine
b. Compiler tricks/good programming
c. Assembly-language programming
d. Special-purpose hardware
e. Real-data FFT for real data (factor of 2)
f. Special cases
• Minor Savings -
a. radix-4 (Section 3.4.3), split-radix (Section 3.4.4) (-10% - +30%)
b. special butterflies
c. 3-real-multiplication complex multiply
d. Fast bit-reversal (up to 6%)
note: On general-purpose machines, computation is only part of the total run time. Address
generation, indexing, data shuffling, and memory access take up much or most of the cycles.
note: A well-written radix-2 (Section 3.4.2.1) program will run much faster than a poorly written
split-radix (Section 3.4.4) program!
Chapter 8
Table 8.1
The Winograd Fourier Transform Algorithm (Section 6.3: Winograd Fourier Transform Algorithm
(WFTA)) is particularly difficult to program and is rarely used in practice. For applications in which
the transform length is somewhat arbitrary (such as fast convolution or general spectrum analysis), the
length is usually chosen to be a power of two. When a particular length is required (for example, in the USA
each carrier has exactly 416 frequency channels in each band in the AMPS2 cellular telephone standard), a
Prime Factor Algorithm (Section 3.6) for all the relatively prime terms is preferred, with a Common Factor
Algorithm (Section 3.5) for other non-prime lengths. Winograd's short-length modules (Chapter 6) should
be used for the prime-length factors that are not powers of two. The chirp z-transform (Chapter 5) offers a
universal way to compute any length DFT (Section 2.1) (for example, Matlab3 reportedly uses this method
1 This content is available online at <http://cnx.org/content/m12060/1.3/>.
2 http://en.wikipedia.org/wiki/AMPS
3 http://www.mathworks.com/products/matlab/
90 CHAPTER 8. CHOOSING THE BEST FFT ALGORITHM
for lengths other than a power of two), at a few times higher cost than that of a CFA or PFA optimized for
that specific length. The chirp z-transform (Chapter 5), along with Rader's conversion (Section 6.1: Rader's
Conversion), assure us that algorithms of $O(N \log N)$ complexity exist for any DFT length $N$.
Some applications, such as time-frequency analysis via the short-time Fourier transform (Section 2.3) or
spectrogram (Section 2.3), require DFTs of overlapped blocks of discrete-time samples. When the step-size
between blocks is less than $O(\log N)$, the running FFT (Section 3.2) will be most efficient. (Note that any
window must be applied via frequency-domain convolution, which is quite efficient for sinusoidal windows
such as the Hamming window.) For step-sizes of $O(\log N)$ or greater, computation of the DFT of each
successive block via an FFT is faster.
Bibliography
[1] R.C. Agarwal and J.W. Cooley. New algorithms for digital convolution. IEEE Trans. on Acoustics,
Speech, and Signal Processing, 25:392410, Oct 1977.
[2] G. Bruun. Z-transform dft lters and ts. IEEE Transactions on Signal Processing , 26:5663, February
1978.
[3] C.S. Burrus. Index mappings for multidimensional formulation of the dft and convolution. ASSP ,
25:239242, June 1977.
[4] C.S. Burrus. Ecient fourier transform and convolution algorithms. In J.S. Lin and A.V. Oppenheim,
editors, Advanced Topics in Signal Processing, chapter Chapter 4. Prentice-Hall, 1988.
[5] C.S. Burrus. Unscrambling for fast dft algorithms. IEEE Transactions on Acoustics, Speech, and Signal
Processing, ASSP-36(7):10861089, July 1988.
[6] C.S. Burrus and T.W. Parks. DFT/FFT and Convolution Algorithms . Wiley-Interscience, 1985.
[7] Jr. C.G. Boncelet. A rearranged dft algorithm requiring n^2/6 multiplications. IEEE Trans. on Acous-
tics, Speech, and Signal Processing, ASSP-34(6):16581659, Dec 1986.
[8] P. Duhamel and H. Hollman. Split-radix t algorithms. Electronics Letters , 20:1416, Jan 5 1984.
[9] D.M.W. Evans. An improved digit-reversal permutation algorithm for the fast fourier and hartley
transforms. IEEE Transactions on Signal Processing, 35(8):11201125, August 1987.
[10] M. Frigo and S.G. Johnson. The design and implementation of tw3. Proceedings of the IEEE, 93(2):216
231, February 2005.
[11] G Goertzel. An algorithm for the evaluation of nite trigonomentric series. The American Mathematical
Monthly, 1958.
[12] H.V. Sorensen, M.T. Heideman, and C.S. Burrus. On computing the split-radix FFT. IEEE Transactions
on Signal Processing, 34(1):152-156, 1986.
[13] H.V. Sorensen, D.L. Jones, M.T. Heideman, and C.S. Burrus. Real-valued fast Fourier transform
algorithms. IEEE Transactions on Signal Processing, 35(6):849-863, June 1987.
[14] S.G. Johnson and M. Frigo. A modified split-radix FFT with fewer arithmetic operations. IEEE
Transactions on Signal Processing, 54, 2006.
[15] R. Meyer, H.W. Schuessler, and K. Schwarz. FFT implementation on DSP chips - theory and practice. IEEE
International Conference on Acoustics, Speech, and Signal Processing, 1990.
[16] H.V. Sorensen and C.S. Burrus. Efficient computation of the DFT with only a subset of input or output
points. IEEE Transactions on Signal Processing, 41(3):1184-1200, March 1993.
[17] H.V. Sorensen and C.S. Burrus. Efficient computation of the DFT with only a subset of input or output
points. IEEE Transactions on Signal Processing, 41(3):1184-1200, 1993.
[18] R. Yavne. An economical method for calculating the discrete Fourier transform. Proc. AFIPS Fall Joint
Computer Conf., 33:115-125, 1968.
Attributions
Collection: The DFT, FFT, and Practical Spectral Analysis
Edited by: Douglas L. Jones
URL: http://cnx.org/content/col10281/1.2/
License: http://creativecommons.org/licenses/by/1.0
Module: "DFT Definition and Properties"
By: Douglas L. Jones
URL: http://cnx.org/content/m12019/1.5/
Pages: 1-4
Copyright: Douglas L. Jones
License: http://creativecommons.org/licenses/by/1.0
Module: "Spectrum Analysis Using the Discrete Fourier Transform"
By: Douglas L. Jones
URL: http://cnx.org/content/m12032/1.6/
Pages: 5-18
Copyright: Douglas L. Jones
License: http://creativecommons.org/licenses/by/1.0
Module: "Classical Statistical Spectral Estimation"
By: Douglas L. Jones
URL: http://cnx.org/content/m12014/1.3/
Pages: 18-23
Copyright: Douglas L. Jones
License: http://creativecommons.org/licenses/by/1.0
Module: "Short Time Fourier Transform"
By: Ivan Selesnick
URL: http://cnx.org/content/m10570/2.4/
Pages: 23-35
Copyright: Ivan Selesnick
License: http://creativecommons.org/licenses/by/1.0
About Connexions
Since 1999, Connexions has been pioneering a global system where anyone can create course materials and
make them fully accessible and easily reusable free of charge. We are a Web-based authoring, teaching and
learning environment open to anyone interested in education, including students, teachers, professors and
lifelong learners. We connect ideas and facilitate educational communities.
Connexions's modular, interactive courses are in use worldwide by universities, community colleges, K-12
schools, distance learners, and lifelong learners. Connexions materials are in many languages, including
English, Spanish, Chinese, Japanese, Italian, Vietnamese, French, Portuguese, and Thai. Connexions is part
of an exciting new information distribution system that allows for Print on Demand Books. Connexions
has partnered with innovative on-demand publisher QOOP to accelerate the delivery of printed course
materials and textbooks into classrooms worldwide at lower prices than traditional academic publishers.