Sec. 9.3 Decimation-in-Time FFT Algorithms 635
transform using the Goertzel algorithm, the number of real multiplications required
is approximately $N^2$ and the number of real additions is approximately $2N^2$. While
this is more efficient than the direct computation of the discrete Fourier transform, the
amount of computation is still proportional to $N^2$.
In either the direct method or the Goertzel method we do not need to evaluate
X[k] at all N values of k. Indeed, we can evaluate X[k] for any M values of k, with
each DFT value being computed by a recursive system of the form of Figure 9.2 with
appropriate coefficients. In this case, the total computation is proportional to NM. The
Goertzel method and the direct method are attractive when M is small; however, as
indicated previously, algorithms are available for which the computation is proportional
to $N \log_2 N$ when N is a power of 2. Therefore, when M is less than $\log_2 N$, either the
Goertzel algorithm or the direct method may in fact be the most efficient method,
but when all N values of X[k] are required, the decimation-in-time algorithms, to be
considered next, are roughly $N/\log_2 N$ times more efficient than either the direct
method or the Goertzel method.
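As an illustration of evaluating X[k] at a single value of k, the following is a minimal Python sketch of a Goertzel-style computation. It assumes the usual second-order form of the recursion, $v[n] = x[n] + 2\cos(2\pi k/N)\,v[n-1] - v[n-2]$, followed by $X[k] = v[N] - W_N^k v[N-1]$ with x[N] taken as zero; the function names `goertzel` and `direct_dft_bin` are hypothetical and chosen only for this example.

```python
import cmath
import math


def goertzel(x, k):
    """Return X[k] of the N-point DFT of x via a second-order recursion."""
    N = len(x)
    coeff = 2.0 * math.cos(2.0 * math.pi * k / N)  # real feedback coefficient
    v1 = 0.0  # holds v[n-1]
    v2 = 0.0  # holds v[n-2]
    # Run the recursion for n = 0, ..., N, feeding a zero sample at n = N.
    for sample in list(x) + [0.0]:
        v0 = sample + coeff * v1 - v2
        v2, v1 = v1, v0
    # Now v1 = v[N] and v2 = v[N-1]; one complex multiplication finishes X[k].
    return v1 - cmath.exp(-2j * math.pi * k / N) * v2


def direct_dft_bin(x, k):
    """Return X[k] by direct evaluation of the DFT sum (for comparison)."""
    N = len(x)
    return sum(x[n] * cmath.exp(-2j * math.pi * n * k / N) for n in range(N))
```

Calling `goertzel(x, k)` for each of M chosen values of k costs on the order of NM operations, which is the case the text describes as attractive when M is small.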
9.3 DECIMATION-IN-TIME FFT ALGORITHMS
In computing the DFT, dramatic efficiency results from decomposing the computation
into successively smaller DFT computations. In this process, we exploit both the symmetry
and the periodicity of the complex exponential $W_N^{kn} = e^{-j(2\pi/N)kn}$. Algorithms in
which the decomposition is based on decomposing the sequence x[n] into successively
smaller subsequences are called decimation-in-time algorithms.
The principle of the decimation-in-time algorithm is most conveniently illustrated
by considering the special case of N an integer power of 2, i.e., $N = 2^{\nu}$. Since N is an
even integer, we can consider computing X[k] by separating x[n] into two (N/2)-point³
sequences consisting of the even-numbered points in x[n] and the odd-numbered points
in x[n]. With X[k] given by
$$X[k] = \sum_{n=0}^{N-1} x[n] W_N^{nk}, \qquad k = 0, 1, \ldots, N-1, \tag{9.10}$$
and separating x[n] into its even- and odd-numbered points, we obtain
$$X[k] = \sum_{n\ \text{even}} x[n] W_N^{nk} + \sum_{n\ \text{odd}} x[n] W_N^{nk}, \tag{9.11}$$
or, with the substitution of variables n = 2r for n even and n = 2r + 1 for n odd,
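$$\begin{aligned}
X[k] &= \sum_{r=0}^{(N/2)-1} x[2r] W_N^{2rk} + \sum_{r=0}^{(N/2)-1} x[2r+1] W_N^{(2r+1)k} \\
     &= \sum_{r=0}^{(N/2)-1} x[2r] (W_N^2)^{rk} + W_N^k \sum_{r=0}^{(N/2)-1} x[2r+1] (W_N^2)^{rk}.
\end{aligned} \tag{9.12}$$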
³When discussing FFT algorithms, we use the words sample and point interchangeably to mean
sequence value. Also, we refer to a sequence of length N as an N-point sequence, and the DFT
of a sequence of length N will be called an N-point DFT.
636 Computation of the Discrete Fourier Transform Chap. 9
But $W_N^2 = W_{N/2}$, since
$$W_N^2 = e^{-2j(2\pi/N)} = e^{-j2\pi/(N/2)} = W_{N/2}. \tag{9.13}$$
Consequently, Eq. (9.12) can be rewritten as
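$$\begin{aligned}
X[k] &= \sum_{r=0}^{(N/2)-1} x[2r] W_{N/2}^{rk} + W_N^k \sum_{r=0}^{(N/2)-1} x[2r+1] W_{N/2}^{rk} \\
     &= G[k] + W_N^k H[k], \qquad k = 0, 1, \ldots, N-1.
\end{aligned} \tag{9.14}$$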
Each of the sums in Eq. (9.14) is recognized as an (N/2)-point DFT, the first sum being
the (N/2)-point DFT of the even-numbered points of the original sequence and the
second being the (N/2)-point DFT of the odd-numbered points of the original sequence.
Although the index k ranges over N values, k = 0, 1, ..., N - 1, each of the sums must be
computed only for k between 0 and (N/2) - 1, since G[k] and H[k] are each periodic in
k with period N/2. After the two DFTs are computed, they are combined according to
Eq. (9.14) to yield the N-point DFT X[k]. Figure 9.3 depicts this computation for N = 8.
In this figure, we have used the signal flow graph conventions that were introduced in
Chapter 6 for representing difference equations. That is, branches entering a node are
summed to produce the node variable. When no coefficient is indicated, the branch
transmittance is assumed to be unity. For other branches, the transmittance of a branch
is an integer power of $W_N$.
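The single decomposition step of Eq. (9.14) can be sketched in Python as follows. This is only an illustrative sketch: the two (N/2)-point DFTs are evaluated by the direct method, and the names `dft` and `dit_one_stage` are hypothetical helpers introduced for this example. The periodicity of G[k] and H[k] with period N/2 appears as the index `k % half`.

```python
import cmath


def dft(x):
    """Direct N-point DFT, X[k] = sum_n x[n] W_N^{nk}."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]


def dit_one_stage(x):
    """One decimation-in-time step: X[k] = G[k] + W_N^k H[k]."""
    N = len(x)
    if N % 2 != 0:
        raise ValueError("N must be even")
    half = N // 2
    G = dft(x[0::2])  # (N/2)-point DFT of the even-numbered points
    H = dft(x[1::2])  # (N/2)-point DFT of the odd-numbered points
    # G and H are periodic in k with period N/2, hence the index k % half.
    return [G[k % half] + cmath.exp(-2j * cmath.pi * k / N) * H[k % half]
            for k in range(N)]
```

For an 8-point input, `dit_one_stage(x)` should agree with `dft(x)` to within rounding error.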
In Figure 9.3, two 4-point DFTs are computed, with G[k] designating the 4-point
DFT of the even-numbered points and H[k] designating the 4-point DFT of the odd-numbered
points. X[0] is then obtained by multiplying H[0] by $W_N^0$ and adding the
product to G[0]. X[1] is obtained by multiplying H[1] by $W_N^1$ and adding that result
to G[1]. Equation (9.14) states that, to compute X[4], we should multiply H[4] by $W_N^4$
and add the result to G[4]. However, since G[k] and H[k] are both periodic in k with
period 4, H[4] = H[0] and G[4] = G[0]. Thus, X[4] is obtained by multiplying H[0]
by $W_N^4$ and adding the result to G[0]. As shown in Figure 9.3, the values X[5], X[6],
and X[7] are obtained similarly.
Figure 9.3 Flow graph of the decimation-in-time decomposition of an N-point DFT
computation into two (N/2)-point DFT computations (N = 8).
With the computation restructured according to Eq. (9.14), we can compare the
number of multiplications and additions required with those required for a direct
computation of the DFT. Previously we saw that, for direct computation without exploiting
symmetry, $N^2$ complex multiplications and additions were required. By comparison,
Eq. (9.14) requires the computation of two (N/2)-point DFTs, which in turn requires
$2(N/2)^2$ complex multiplications and approximately $2(N/2)^2$ complex additions if we
do the (N/2)-point DFTs by the direct method. Then the two (N/2)-point DFTs must
be combined, requiring N complex multiplications, corresponding to multiplying the
second sum by $W_N^k$, and N complex additions, corresponding to adding the product
obtained to the first sum. Consequently, the computation of Eq. (9.14) for all values
of k requires at most $N + 2(N/2)^2$ or $N + N^2/2$ complex multiplications and complex
additions. It is easy to verify that for N > 2, the total $N + N^2/2$ will be less than $N^2$.
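The comparison can be checked numerically with a short Python sketch. The function names `direct_count` and `one_stage_count` are hypothetical; they simply evaluate the two operation counts quoted in the text.

```python
def direct_count(N):
    """Complex multiplications for a direct N-point DFT: N^2."""
    return N * N


def one_stage_count(N):
    """Complex multiplications after one decimation-in-time step:
    N for combining, plus two direct (N/2)-point DFTs."""
    return N + 2 * (N // 2) ** 2


# Compare the two counts for a few even values of N.
for N in (2, 4, 8, 1024):
    print(N, direct_count(N), one_stage_count(N))
```

For N = 2 the two counts coincide, and for every larger even N the one-stage count is the smaller of the two.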
Equation (9.14) corresponds to breaking the original N-point computation into
two (N/2)-point DFT computations. If N/2 is even, as it is when N is equal to a power
of 2, then we can consider computing each of the (N/2)-point DFTs in Eq. (9.14) by
breaking each of the sums in that equation into two (N/4)-point DFTs, which would
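then be combined to yield the (N/2)-point DFTs. Thus, G[k] in Eq. (9.14) would be
represented as
$$G[k] = \sum_{r=0}^{(N/2)-1} g[r] W_{N/2}^{rk} = \sum_{\ell=0}^{(N/4)-1} g[2\ell] W_{N/2}^{2\ell k} + \sum_{\ell=0}^{(N/4)-1} g[2\ell+1] W_{N/2}^{(2\ell+1)k}, \tag{9.15}$$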
or
$$G[k] = \sum_{\ell=0}^{(N/4)-1} g[2\ell] W_{N/4}^{\ell k} + W_{N/2}^{k} \sum_{\ell=0}^{(N/4)-1} g[2\ell+1] W_{N/4}^{\ell k}. \tag{9.16}$$
Similarly, H[k] would be represented as
$$H[k] = \sum_{\ell=0}^{(N/4)-1} h[2\ell] W_{N/4}^{\ell k} + W_{N/2}^{k} \sum_{\ell=0}^{(N/4)-1} h[2\ell+1] W_{N/4}^{\ell k}. \tag{9.17}$$
Consequently, the (N/2)-point DFT G[k] can be obtained by combining the (N/4)-point
DFTs of the sequences $g[2\ell]$ and $g[2\ell + 1]$. Similarly, the (N/2)-point DFT H[k] can
be obtained by combining the (N/4)-point DFTs of the sequences $h[2\ell]$ and $h[2\ell + 1]$.
Thus, if the 4-point DFTs in Figure 9.3 are computed according to Eqs. (9.16) and
(9.17), then that computation would be carried out as indicated in Figure 9.4. Inserting
the computation of Figure 9.4 into the flow graph of Figure 9.3, we obtain the complete
flow graph of Figure 9.5, where we have expressed the coefficients in terms of powers
of $W_N$ rather than powers of $W_{N/2}$, using the fact that $W_{N/2} = W_N^2$.
For the 8-point DFT that we have been using as an illustration, the computation
has been reduced to a computation of 2-point DFTs. For example, the 2-point DFT of
the sequence consisting of x[0] and x[4] is depicted in Figure 9.6. With the computation
of Figure 9.6 inserted in the flow graph of Figure 9.5, we obtain the complete flow graph
for computation of the 8-point DFT, as shown in Figure 9.7.
⁴For simplicity, we shall assume that N is large, so that (N - 1) can be approximated by N.
Figure 9.4 Flow graph of the decimation-in-time decomposition of an (N/2)-point DFT
computation into two (N/4)-point DFT computations (N = 8).
Figure 9.5 Result of substituting the structure of Figure 9.4 into Figure 9.3.
Figure 9.6 Flow graph of a 2-point DFT (branch coefficient $W_2 = W_N^{N/2} = -1$).
For the more general case, but with N still a power of 2, we would proceed by
decomposing the (N/4)-point transforms in Eqs. (9.16) and (9.17) into (N/8)-point
transforms and continue until we were left with only 2-point transforms. This requires
$\nu = \log_2 N$ stages of computation. Previously, we found that in the original decomposition
of an N-point transform into two (N/2)-point transforms, the number of complex
multiplications and additions required was $N + 2(N/2)^2$. When the (N/2)-point transforms
are decomposed into (N/4)-point transforms, the factor of $(N/2)^2$ is replaced by
$N/2 + 2(N/4)^2$, so the overall computation then requires $N + N + 4(N/4)^2$ complex
multiplications and additions. If $N = 2^{\nu}$, this can be done at most $\nu = \log_2 N$ times,
so that after carrying out this decomposition as many times as possible, the number of
complex multiplications and additions is equal to $N\nu = N \log_2 N$.
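Carrying the decomposition through all stages can be sketched as a short recursive Python function. This is an illustrative sketch rather than an optimized implementation: the name `fft_dit` is hypothetical, the recursion is taken down to 1-point transforms (a 1-point DFT is the sample itself, so the last recursive level reproduces the 2-point DFTs), and each output value is formed as $G[k] + W_N^k H[k]$ with one multiplication per output, as in Eq. (9.14).

```python
import cmath


def dft(x):
    """Direct N-point DFT, used here only as a reference."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]


def fft_dit(x):
    """Recursive radix-2 decimation-in-time DFT for N a power of 2."""
    N = len(x)
    if N == 0 or N & (N - 1):
        raise ValueError("N must be a power of 2")
    if N == 1:
        return [complex(x[0])]  # a 1-point DFT is the sample itself
    half = N // 2
    G = fft_dit(x[0::2])  # DFT of the even-numbered points
    H = fft_dit(x[1::2])  # DFT of the odd-numbered points
    # Combine: N complex multiplications and N complex additions per stage.
    return [G[k % half] + cmath.exp(-2j * cmath.pi * k / N) * H[k % half]
            for k in range(N)]
```

For any power-of-2 length, `fft_dit(x)` should agree with `dft(x)` to within rounding error.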
Figure 9.7 Flow graph of complete
decimation-in-time decomposition of an
8-point DFT computation.
Figure 9.8 Flow graph of basic
butterfly computation in Figure 9.7.
The flow graph of Figure 9.7 displays the operations explicitly. By counting
branches with transmittances of the form $W_N^r$, we note that each stage has N complex
multiplications and N complex additions. Since there are $\log_2 N$ stages, we have a total
of $N \log_2 N$ complex multiplications and additions. This is the substantial computational
savings that we have previously indicated was possible. For example, if $N = 2^{10} = 1024$,
then $N^2 = 2^{20} = 1,048,576$, and $N \log_2 N = 10,240$, a reduction of more than two
orders of magnitude!
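The operation count can also be viewed as the solution of a recurrence: an N-point transform costs N multiplications for combining plus the cost of two (N/2)-point transforms. A small Python sketch, with the hypothetical function name `dit_mult_count` and the assumption that a 1-point transform costs nothing, reproduces the figures quoted above.

```python
def dit_mult_count(N):
    """Complex multiplications for the full decomposition, N a power of 2.

    Solves T(N) = N + 2 T(N/2) with T(1) = 0, which gives N log2(N).
    """
    if N == 1:
        return 0
    return N + 2 * dit_mult_count(N // 2)


N = 1024
print(N * N, dit_mult_count(N))
```

With N = 1024 the two printed values are 1048576 and 10240, matching the numbers in the text.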
The computation in the flow graph of Figure 9.7 can be reduced further by exploiting
the symmetry and periodicity of the coefficients $W_N^r$. We first note that, in proceeding
from one stage to the next in Figure 9.7, the basic computation is in the form of Figure 9.8,
i.e., it involves obtaining a pair of values in one stage from a pair of values in the
preceding stage, where the coefficients are always powers of $W_N$ and the exponents are
separated by N/2. Because of the shape of the flow graph, this elementary computation
is called a butterfly. Since
$$W_N^{N/2} = e^{-j(2\pi/N)N/2} = e^{-j\pi} = -1, \tag{9.18}$$
the factor $W_N^{r+N/2}$ can be written as
$$W_N^{r+N/2} = W_N^{N/2} W_N^{r} = -W_N^{r}. \tag{9.19}$$
With this observation, the butterfly computation of Figure 9.8 can be simplified to the
form shown in Figure 9.9, which requires only one complex multiplication instead of
two. Using the basic flow graph of Figure 9.9 as a replacement for butterflies of the form
of Figure 9.8, we obtain from Figure 9.7 the flow graph of Figure 9.10. In particular, the
number of complex multiplications has been reduced by a factor of 2 over the number
in Figure 9.7.
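The simplification can be illustrated with a short Python sketch that computes the same pair of outputs in both ways. The function names `butterfly_two_mult` and `butterfly_one_mult` are hypothetical; `a` and `b` stand for the two input values from the preceding stage, and `r` for the exponent of $W_N$.

```python
import cmath


def butterfly_two_mult(a, b, r, N):
    """Butterfly in the form of Figure 9.8: coefficients W_N^r and W_N^(r+N/2)."""
    w_r = cmath.exp(-2j * cmath.pi * r / N)
    w_r_half = cmath.exp(-2j * cmath.pi * (r + N // 2) / N)
    return a + w_r * b, a + w_r_half * b


def butterfly_one_mult(a, b, r, N):
    """Simplified butterfly: uses W_N^(r+N/2) = -W_N^r, so one multiplication."""
    t = cmath.exp(-2j * cmath.pi * r / N) * b
    return a + t, a - t
```

Both functions should return the same pair of values to within rounding error; the second needs only the single product `t`.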
Figure 9.9 Flow graph of simplified butterfly computation requiring only one
complex multiplication. (Inputs are taken from the (m - 1)st stage; outputs form the mth stage.)
9.3.1 In-Place Computations
The flow graph of Figure 9.10 describes an algorithm for the computation of the discrete
Fourier transform. The essential features of the flow graph are the branches connecting
the nodes and the transmittance of each of these branches. No matter how the nodes in
the flow graph are rearranged, it will always represent the same computation, provided
that the connections between the nodes and the transmittances of the connections
are maintained. The particular form for the flow graph in Figure 9.10 arose out of
deriving the algorithm by separating the original sequence into the even-numbered and
odd-numbered points and then continuing to create smaller and smaller subsequences
in the same way. An interesting by-product of this derivation is that this flow graph,
in addition to describing an efficient procedure for computing the discrete Fourier
transform, also suggests a useful way of storing the original data and storing the results
of the computation in intermediate arrays.
To see this, it is useful to note that according to Figure 9.10, each stage of the
computation takes a set of N complex numbers and transforms them into another set of
Figure 9.10 Flow graph of 8-point DFT using the butterfly computation of Figure 9.9.
access memory is not available is shown in Figure 9.16. This flow graph represents
the decimation-in-time algorithm originally given by Singleton (1969). (See DSP Com-
mittee, 1979, for a program using serial memory.) Note first that in this flow graph the
input is in bit-reversed order and the output is in normal order. The important feature
of the flow graph is that the geometry is identical for each stage: only the branch trans-
mittances change from stage to stage. This makes it possible to access data sequentially.
Suppose, for example, that we have four separate disk files, and suppose that the first
half of the input sequence (in bit-reversed order) is stored in one file and the second
half is stored in a second file. Then the sequence can be accessed sequentially in files
1 and 2 and the results written sequentially on files 3 and 4, with the first half of the
new array being written to file 3 and the second half to file 4. Then at the next stage of
computation, files 3 and 4 are the input, and the output is written to files 1 and 2. This
is repeated for each of the $\nu$ stages. Such an algorithm could be useful in computing the
DFT of extremely long sequences.
9.4 DECIMATION-IN-FREQUENCY FFT ALGORITHMS
The decimation-in-time FFT algorithms are all based on structuring the DFT computation
by forming smaller and smaller subsequences of the input sequence x[n]. Alternatively,
we can consider dividing the output sequence X[k] into smaller and smaller
subsequences in the same manner. FFT algorithms based on this procedure are commonly
called decimation-in-frequency algorithms.
To develop these FFT algorithms, let us again restrict the discussion to N a power
of 2 and consider computing separately the even-numbered frequency samples and the
odd-numbered frequency samples. Since
$$X[k] = \sum_{n=0}^{N-1} x[n] W_N^{nk}, \qquad k = 0, 1, \ldots, N-1, \tag{9.23}$$
the even-numbered frequency samples are
$$X[2r] = \sum_{n=0}^{N-1} x[n] W_N^{n(2r)}, \qquad r = 0, 1, \ldots, (N/2)-1, \tag{9.24}$$
which can be expressed as
$$X[2r] = \sum_{n=0}^{(N/2)-1} x[n] W_N^{2nr} + \sum_{n=N/2}^{N-1} x[n] W_N^{2nr}. \tag{9.25}$$
With a substitution of variables in the second summation in Eq. (9.25), we obtain
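$$X[2r] = \sum_{n=0}^{(N/2)-1} x[n] W_N^{2nr} + \sum_{n=0}^{(N/2)-1} x[n + (N/2)] W_N^{2r[n+(N/2)]}. \tag{9.26}$$
Sec. 9.4 Decimation-in-Frequency FFT Algorithms 647
Finally, because of the periodicity of $W_N^{2rn}$,
$$W_N^{2r[n+(N/2)]} = W_N^{2rn} W_N^{rN} = W_N^{2rn}, \tag{9.27}$$
and since $W_N^2 = W_{N/2}$, Eq. (9.26) can be expressed as
$$X[2r] = \sum_{n=0}^{(N/2)-1} (x[n] + x[n + (N/2)]) W_{N/2}^{rn}, \qquad r = 0, 1, \ldots, (N/2)-1. \tag{9.28}$$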
Equation (9.28) is the (N/2)-point DFT of the (N/2)-point sequence obtained by adding
the first half and the last half of the input sequence. Adding the two halves of the input
sequence represents time aliasing, consistent with the fact that in computing only the
even-numbered frequency samples, we are undersampling the Fourier transform of x[n].
We can now consider obtaining the odd-numbered frequency points, given by
$$X[2r+1] = \sum_{n=0}^{N-1} x[n] W_N^{n(2r+1)}, \qquad r = 0, 1, \ldots, (N/2)-1. \tag{9.29}$$
As before, we can rearrange Eq. (9.29) as
$$X[2r+1] = \sum_{n=0}^{(N/2)-1} x[n] W_N^{n(2r+1)} + \sum_{n=N/2}^{N-1} x[n] W_N^{n(2r+1)}. \tag{9.30}$$
An alternative form for the second summation in Eq. (9.30) is
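$$\begin{aligned}
\sum_{n=N/2}^{N-1} x[n] W_N^{n(2r+1)} &= \sum_{n=0}^{(N/2)-1} x[n + (N/2)] W_N^{[n+(N/2)](2r+1)} \\
&= W_N^{(N/2)(2r+1)} \sum_{n=0}^{(N/2)-1} x[n + (N/2)] W_N^{n(2r+1)} \\
&= -\sum_{n=0}^{(N/2)-1} x[n + (N/2)] W_N^{n(2r+1)},
\end{aligned} \tag{9.31}$$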
where we have used the fact that $W_N^{(N/2)2r} = 1$ and $W_N^{N/2} = -1$. Substituting Eq. (9.31)
into Eq. (9.30) and combining the two summations, we obtain
$$X[2r+1] = \sum_{n=0}^{(N/2)-1} (x[n] - x[n + (N/2)]) W_N^{n(2r+1)}, \tag{9.32}$$
or, since $W_N^2 = W_{N/2}$,
$$X[2r+1] = \sum_{n=0}^{(N/2)-1} (x[n] - x[n + (N/2)]) W_N^{n} W_{N/2}^{rn}, \qquad r = 0, 1, \ldots, (N/2)-1. \tag{9.33}$$
Equation (9.33) is the (N/2)-point DFT of the sequence obtained by subtracting the
second half of the input sequence from the first half and multiplying the resulting
sequence by $W_N^n$. Thus, on the basis of Eqs. (9.28) and (9.33), with g[n] = x[n] +
x[n + N/2] and h[n] = x[n] - x[n + N/2], the DFT can be computed by first forming the
sequences g[n] and h[n], then computing $h[n]W_N^n$, and finally computing the (N/2)-point
DFTs of these two sequences to obtain the even-numbered output points and the odd-numbered
output points, respectively. The procedure suggested by Eqs. (9.28) and (9.33)
is illustrated for the case of an 8-point DFT in Figure 9.17.
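The procedure of Eqs. (9.28) and (9.33) can be sketched in Python as a single decimation-in-frequency step. This is only an illustrative sketch: the two (N/2)-point DFTs are evaluated directly, and the names `dft` and `dif_one_stage` are hypothetical helpers introduced for this example.

```python
import cmath


def dft(x):
    """Direct N-point DFT, X[k] = sum_n x[n] W_N^{nk}."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]


def dif_one_stage(x):
    """One decimation-in-frequency step following Eqs. (9.28) and (9.33)."""
    N = len(x)
    if N % 2 != 0:
        raise ValueError("N must be even")
    half = N // 2
    # g[n] = x[n] + x[n + N/2]: sum of the two halves (time aliasing).
    g = [x[n] + x[n + half] for n in range(half)]
    # h[n] W_N^n, with h[n] = x[n] - x[n + N/2].
    hw = [(x[n] - x[n + half]) * cmath.exp(-2j * cmath.pi * n / N)
          for n in range(half)]
    X = [0j] * N
    X[0::2] = dft(g)   # even-numbered output points X[2r]
    X[1::2] = dft(hw)  # odd-numbered output points X[2r + 1]
    return X
```

For an 8-point input, `dif_one_stage(x)` should agree with `dft(x)` to within rounding error.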
Proceeding in a manner similar to that followed in deriving the decimation-in-time
algorithm, we note that since N is a power of 2, N/2 is even; consequently, the
(N/2)-point DFTs can be computed by computing the even-numbered and odd-numbered
output points for those DFTs separately. As in the case of the procedure leading
to Eqs. (9.28) and (9.33), this is accomplished by combining the first half and the last
half of the input points for each of the (N/2)-point DFTs and then computing (N/4)-point
DFTs. The flow graph resulting from taking this step for the 8-point example is
shown in Figure 9.18. For the 8-point example, the computation has now been reduced
to the computation of 2-point DFTs, which are implemented by adding and subtracting
the input points, as discussed previously. Thus, the 2-point DFTs in Figure 9.18 can be
replaced by the computation shown in Figure 9.19, so the computation of the 8-point
DFT can be accomplished by the algorithm depicted in Figure 9.20.
By counting the arithmetic operations in Figure 9.20 and generalizing to $N = 2^{\nu}$,
we see that the computation of Figure 9.20 requires $(N/2) \log_2 N$ complex multiplications
and $N \log_2 N$ complex additions. Thus, the total number of computations is the
same for the decimation-in-frequency and the decimation-in-time algorithms.
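The complete decimation-in-frequency procedure, together with its multiplication count, can be sketched recursively in Python. This is a sketch under stated assumptions: the name `fft_dif` is hypothetical, the recursion is carried down to 1-point transforms, every product by $W_N^n$ is counted (including the trivial case n = 0), and the outputs are interleaved back into normal order rather than left in the order produced by a flow graph.

```python
import cmath


def dft(x):
    """Direct N-point DFT, used here only as a reference."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]


def fft_dif(x):
    """Recursive radix-2 decimation-in-frequency DFT.

    Returns (X, mults), where mults counts the products by W_N^n.
    """
    N = len(x)
    if N == 0 or N & (N - 1):
        raise ValueError("N must be a power of 2")
    if N == 1:
        return [complex(x[0])], 0
    half = N // 2
    g = [x[n] + x[n + half] for n in range(half)]
    hw = [(x[n] - x[n + half]) * cmath.exp(-2j * cmath.pi * n / N)
          for n in range(half)]
    even, m_even = fft_dif(g)   # even-numbered output points
    odd, m_odd = fft_dif(hw)    # odd-numbered output points
    X = [0j] * N
    X[0::2] = even
    X[1::2] = odd
    return X, half + m_even + m_odd
```

For N = 8 the returned count is 12, which equals (N/2) log2 N, and the returned transform should agree with `dft(x)` to within rounding error.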
Figure 9.17 Flow graph of decimation-in-frequency decomposition of an N-point DFT
computation into two (N/2)-point DFT computations (N = 8).
Figure 9.18 Flow graph of decimation-in-frequency decomposition of an 8-point DFT into
four 2-point DFT computations.
Figure 9.19 Flow graph of a typical 2-point DFT as required in the last stage of
decimation-in-frequency decomposition.
Figure 9.20 Flow graph of complete decimation-in-frequency decomposition of an 8-point
DFT computation.