DCT vs. KLT: Implementation and Concepts Through MATLAB
by Bryan Davis
alias Mojo
EEL5701:
Digital Signal Processing Foundations
Dr. Taylor
August 2, 1999
Discrete Cosine Transform and Karhunen-Loève Transform
Approximation of Correlated Signals
Purpose
The purpose of this project is to estimate the compression ratio and reconstruction accuracy of signals
compressed using the Karhunen-Loève Transform (KLT) and the Discrete Cosine Transform (DCT), for a
given error tolerance. The signal is assumed to be a realization of a stationary process with
autocorrelation R_SS(τ) = σ²·ρ^|τ|. The covariance matrix will then have the form M_ij = σ²·ρ^|i-j|,
where σ² is the variance of the individual samples. Because of the correlation between samples, the KLT
should be able to compress the signal data significantly while still retaining most of the information in
the signal. Since the DCT approximates the KLT for signals with this type of correlation, it should do
approximately the same thing, with much less computation than the KLT.
This result is experimentally verified using the included Matlab function ‘proj2’.
Here is part of the code related to creating the random process S and verifying that it has the correct
autocorrelation function:
% This function analyzes the average case over Ns samples for proj(N,p,err). This is used to
% verify the validity of the random process S, and show that the averages for the KLT and DCT
% approximations are as expected.
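The generation code itself is not reproduced here. As a minimal sketch, one standard way to synthesize
such a process (assuming the eigendecomposition route that the proj2 comments later allude to; the names
p and Ns follow the function arguments) is:

% Sketch: generate Ns realizations of an N-sample process whose covariance
% matrix is M = Markov(N,p). Each column of S is one realization; coloring
% white Gaussian noise by Q*sqrt(D) gives cov = Q*D*Q' = M.
M = Markov(N, p);             % target covariance, M(i,j) = p^abs(i-j)
[Q, D] = eig(M);              % eigendecomposition: M = Q*D*inv(Q)
S = Q*sqrt(D)*randn(N, Ns);   % colored Gaussian realizations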
The first statistic collected is the expected energy of S, and is displayed in the variable avg_signal_energy.
This is calculated by taking the average of the signal energies for each realization of S. Ideally, this should
equal N.
The next statistic is the covariance matrix of S, contained in the variable covariance_matrix. This is
an N×N matrix in which each entry C_ij is the covariance between the RV at time i and the RV at time j.
It is calculated using the Matlab function cov. Ideally, it should be identical to the matrix M. Since
this can be a large matrix, it is not displayed directly; to see its value, type covariance_matrix at the
prompt after the output of proj2 is displayed.
The next statistic is the root-mean-squared error of the coefficients in the covariance matrix, contained
in the variable covariance_rms_error. This measures how closely the estimated covariance matrix
approximates M. It should approach 0 as Ns approaches ∞.
Plots of the autocorrelation and the power spectral density (PSD) are also displayed. The black graph
represents the ideal random process, and the red graph represents the estimate. The process is
supposed to be stationary, so the autocorrelation should be a function only of the time difference,
not of the absolute time. Under that assumption, the autocorrelation is found as the covariance between
the RV taken at time 0 and the RV taken at time τ. The PSD is just the Fourier transform of the
autocorrelation. As Ns approaches ∞, the two graphs should converge.
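As a concrete illustration (the variable names are hypothetical, and this is not the report's plotting
code), the estimated autocorrelation can be read off the first row of the covariance matrix, and the
ideal PSD of the autocorrelation ρ^|τ| has the standard closed form (1-ρ²)/(1 - 2ρ·cos(ω) + ρ²):

% Illustrative sketch: compare the estimated autocorrelation with the ideal
% one, R(tau) = p^abs(tau), and plot the closed-form PSD of that sequence.
Rhat   = covariance_matrix(1,:);    % estimated R(tau), tau = 0..N-1
Rideal = p.^(0:N-1);                % ideal autocorrelation
w      = linspace(0, 2*pi, 256);
Pideal = (1 - p^2)./(1 - 2*p*cos(w) + p^2);   % ideal PSD of p^|tau|
plot(0:N-1, Rideal, 'k', 0:N-1, Rhat, 'r');   % black = ideal, red = estimate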
For the KLT, we first decompose S's covariance matrix M into Q*D*Q⁻¹, so that the columns of Q are the
eigenvectors of M and D holds the eigenvalues. Then we create an N×(N-r) selection matrix U that has the
effect of removing the columns of Q that correspond to the smallest diagonal entries in D. These smallest
entries in D should make up at most a fraction (1-err) of the total value of D. Now, Q⁻¹*s will transform
the highly correlated signal s into one with no correlation, and Uᵀ*Q⁻¹*s will throw away the least
significant entries in Q⁻¹*s. This product (Uᵀ*Q⁻¹) will be our compression matrix. The corresponding
decompression matrix will be Q*U. So in a real system (which I am not modeling), we would predetermine
T = Uᵀ*Q⁻¹, the compression matrix, and Tᵀ = Q*U, the decompression matrix, based on the assumption that
our future signal has a known covariance matrix. We left-multiply the signal s by T, send the signal
(which has N-r samples instead of N samples), receive it, left-multiply it by Tᵀ, and we have our
reconstructed signal. All of this compression-decompression can be modeled by Tᵀ*T*s, or equivalently
Q*U*Uᵀ*Q⁻¹ = Q*U′*Q⁻¹, where U′ = U*Uᵀ. U′ is a diagonal matrix with zeros in the diagonal entries
corresponding to the discarded columns of Q, and ones in the diagonal entries corresponding to the
retained columns of Q.
Since we already calculated Q and D, there is no need to do so again. We then produce U′ (called simply
U in the code) and perform the calculation Q*U′*Q⁻¹*S to model the compression-decompression losses.
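For a single column-vector signal s, the whole KLT pipeline can be condensed as follows (an illustrative
sketch using the reduce function listed in the Source Code section, not the project code verbatim):

% Condensed KLT compress-decompress sketch for one signal s.
[Q, D] = eig(M);        % M = Q*D*inv(Q); columns of Q are eigenvectors
Up = reduce(D, 1-err);  % U': 0/1 diagonal mask of the retained modes
s1 = Q*Up*inv(Q)*s;     % compress-decompress: s1 = Q*U'*inv(Q)*s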
For the DCT, we similarly decompose S's covariance matrix M into Φ*E*Φ⁻¹, where the columns of Φ are now
the fixed DCT basis vectors and E is only approximately diagonal. Then we create an N×(N-r) selection
matrix V that has the effect of removing the columns of Φ that correspond to the smallest diagonal
entries in E. These smallest diagonal entries in E should make up at most a fraction (1-err) of the total
value of the diagonal entries in E. Now, Φ⁻¹*s will approximately transform the highly correlated signal
s into one with no correlation, and Vᵀ*Φ⁻¹*s will throw away the least significant entries in Φ⁻¹*s. This
product (Vᵀ*Φ⁻¹) will be our compression matrix. The corresponding decompression matrix will be Φ*V. So
in a real system (which I am not modeling), we would predetermine T = Vᵀ*Φ⁻¹, the compression matrix, and
Tᵀ = Φ*V, the decompression matrix, based on the assumption that our future signal has a known covariance
matrix. We left-multiply the signal s by T, send the signal (which has N-r samples instead of N samples),
receive it, left-multiply it by Tᵀ, and we have our reconstructed signal. All of this
compression-decompression can be modeled by Tᵀ*T*s, or equivalently Φ*V*Vᵀ*Φ⁻¹ = Φ*V′*Φ⁻¹, where
V′ = V*Vᵀ. V′ is a diagonal matrix with zeros in the diagonal entries corresponding to the discarded
columns of Φ, and ones in the diagonal entries corresponding to the retained columns of Φ.
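The corresponding DCT sketch replaces the eigenvector matrix with the fixed DCT matrix. Here A = DCT(N)
from the Source Code section, so A⁻¹ = Aᵀ (A is orthogonal) and Φ = Aᵀ; again, this is an illustration
rather than the project code verbatim:

% Condensed DCT compress-decompress sketch for one signal s.
A  = DCT(N);            % orthogonal DCT-II matrix; rows are basis vectors
E  = A*M*A';            % approximately diagonal for this covariance
Vp = reduce(E, 1-err);  % V': 0/1 diagonal mask of the retained coefficients
s2 = A'*Vp*A*s;         % transform, discard small entries, inverse transform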
%% proj2 continued . . .
% Lastly we need to analyze the array of signals created by the KLT and the DCT.
% Calculate the KLT and DCT average percent energy loss. This should be (1-err).
output.KLT_avg_percent_energy_loss = mean((output.avg_signal_energy - sum(S1.^2))/N);
output.DCT_avg_percent_energy_loss = mean((output.avg_signal_energy - sum(S2.^2))/N);
% Calculate the KLT and DCT average error: the mean over all realizations of the
% per-sample squared difference between S and its approximation (S1 or S2).
output.KLT_avg_error = mean(sum((S-S1).^2)/N);
output.DCT_avg_error = mean(sum((S-S2).^2)/N);
% Calculate the KLT and DCT compression ratios: the ratio of the number of
% coefficients in S to the number of non-zero coefficients retained after
% compression (trace(U) for the KLT, trace(V) for the DCT).
output.KLT_compression_ratio = N/trace(U);
output.DCT_compression_ratio = N/trace(V);
KLT_avg_percent_energy_loss is the average fraction of signal energy lost by the KLT. Ideally, it should
have a value of 1-err.
DCT_avg_percent_energy_loss is the average fraction of signal energy lost by the DCT. Ideally, it should
have a value of 1-err.
KLT_avg_error is the average per-sample squared difference between each signal and its KLT approximation.
Ideally, it should have a value of 1-err.
DCT_avg_error is the average per-sample squared difference between each signal and its DCT approximation.
Ideally, it should have a value of 1-err.
KLT_compression_ratio is the ratio of the decompressed (original) signal length to the KLT-compressed
signal length.
DCT_compression_ratio is the ratio of the decompressed (original) signal length to the DCT-compressed
signal length.
Here is the code that analyzes the results of the KLT and DCT compression-decompression using a single
signal. It is found in the function proj(N,ρ,err).
% Here we calculate some metrics about s, s1 and s2.
signal_energy is the sum of the squares of the samples in the signal s. It is random with an expected value
of N.
KLT_percent_energy_loss is the percentage of the energy lost using KLT. Also random, with an expected
value of 1-err.
DCT_percent_energy_loss is the percentage of the energy lost using DCT. Also random, with an expected
value of 1-err.
KLT_error is the mean of the square of the difference between each sample of s1 and s. Again, random
with an expected value of 1-err.
DCT_error is the mean of the square of the difference between each sample of s2 and s. Random, with an
expected value of 1-err.
KLT_compression_ratio is the ratio of the decompressed (original) signal length to the KLT-compressed
signal length.
DCT_compression_ratio is the ratio of the decompressed (original) signal length to the DCT-compressed
signal length.
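A typical call might look like the following (the argument values are only examples):

out = proj(64, 0.95, 0.99);   % N = 64, rho = 0.95, retain 99% of the energy
out.KLT_compression_ratio     % inspect the analysis metrics described above
out.DCT_compression_ratio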
Graph Tendencies
In the function proj, the signals s, s1, and s2 are displayed on the same plot. Another plot shows the
magnitudes of the signals' Fourier transforms.
The signals should become smooth but wavy as ρ (the correlation coefficient) increases toward 1. For
negative values of ρ, the signal should be very jagged but overall rather flat. As |ρ| increases toward 1,
the compression ratio should increase. For ρ = 0, the signals should be both jagged and wavy, and little
compression should be possible.
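The link between |ρ| and compressibility can be checked directly from the eigenvalues of M. The following
sketch (not part of the project code) counts how many eigenvalues of Markov(N,ρ) are needed to capture
99% of the total energy for several values of ρ:

% For each rho, count the eigenvalues needed to hold 99% of trace(M).
% Fewer modes means a higher achievable compression ratio.
N = 64;
for p = [0 0.5 0.9 0.99]
    d = sort(eig(Markov(N,p)), 'descend');
    k = find(cumsum(d) >= 0.99*sum(d), 1);
    fprintf('rho = %4.2f: %d of %d eigenvalues hold 99%% of the energy\n', p, k, N);
end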
The closer ‘err’ is to 1, the more closely the approximations should match the original signal s. This
comes at the cost of a decreased compression ratio.
The larger N is, the more data points there are on the graph. If a power of 2 is used for N, Matlab goes a
little faster. Values larger than 512 take exceedingly long to run. This is because of the calculation of the
KLT transform matrices, not the actual compression-decompression algorithm.
An interesting deviation from what is expected is that for very highly correlated signals (|ρ| ≈ 0.999),
the DCT does not approximate the signal very well at the end points. Otherwise, the DCT and the KLT are
comparable, although the KLT always slightly outperforms the DCT as far as compression goes. Both perform
very well (better than I expected).
proj2 takes four parameters. The first three are the same as those of proj. The fourth, Ns, is the size
of the sample space for each RV, i.e., the number of realizations averaged. Values from 1,000 to 10,000
work well; values above 100,000 are very slow.
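For example (the argument order N, ρ, err, Ns is assumed from the description above):

out = proj2(64, 0.95, 0.99, 1000);  % average behavior over 1000 realizations
out.covariance_rms_error            % should shrink as Ns grows
out.KLT_avg_percent_energy_loss     % should approach 1 - err = 0.01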
Conclusions
This project was very successful in that it taught me a lot about signal compression. The graphical
results are very revealing in that they show the compressed signals alongside their Fourier transforms.
The compression-decompression algorithms achieve high compression and reconstruct the original signal
rather accurately. As the Fourier transforms show, compression of positively correlated signals loses the
frequencies near the Nyquist frequency, and compression of negatively correlated signals loses the
frequencies near DC. This is what is expected, since a highly positively correlated signal is smooth and
wavy, which suggests it has very little energy at high frequencies. Likewise, a highly negatively
correlated signal is jagged and flat overall, which suggests it has very little energy at low frequencies.
A very serious limitation to applying these compression algorithms is that the 1st- and 2nd-order
statistics of the signal must be known a priori. The algorithm exploits the fact that the signal has a
known autocorrelation in order to compress it, and the autocorrelation cannot be inferred from just one
signal. It is also interesting to note that when the original signal is left-multiplied by Φ⁻¹ (or Q⁻¹)
and most of the vector entries are then removed by V (or U), it is not necessarily the smallest entries
that are zeroed out. There is really no correlation between the size of an entry and whether it is kept
or zeroed out; this is determined by the matrix V, which is independent of the signal. It is just that
the information contained in those entries is insignificant when reconstructing the signal.
Source Code
Here is the source code in its entirety. Included here but not mentioned in the text is the source code
for creating the covariance matrix, Markov(N,ρ); for creating the DCT transform matrix, DCT(N); and for
creating the matrices U and V, reduce(D,1-err).
Markov.m
% Creates a Markov covariance matrix where M(i,j) = p^abs(i-j).
function M = Markov(N,p)
M = zeros(N,N);                 % preallocate for speed
for i = 1:N
    for j = 1:N
        M(i,j) = p^abs(i - j);
    end
end
return
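For example, for N = 4 and ρ = 0.5 the matrix is:

>> Markov(4, 0.5)
ans =
    1.0000    0.5000    0.2500    0.1250
    0.5000    1.0000    0.5000    0.2500
    0.2500    0.5000    1.0000    0.5000
    0.1250    0.2500    0.5000    1.0000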
DCT.m
% Creates the DCT-II transform matrix.
function A = DCT(N)
A = zeros(N,N);                 % preallocate for speed
for j = 1:N
    A(1,j) = sqrt(2)/2;         % constant (DC) basis row
end
for i = 2:N
    for j = 1:N
        A(i,j) = cos((i-1)*(j-0.5)*pi/N);
    end
end
A = sqrt(2/N)*A;                % scale so that A is orthogonal
return
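A quick check (illustrative, not part of the project code) confirms that A is orthogonal and that it
approximately diagonalizes a Markov covariance matrix:

N = 32; A = DCT(N);
norm(A*A' - eye(N))                           % ~0: A is orthogonal, inv(A) = A'
E = A*Markov(N,0.9)*A';                       % should be nearly diagonal
norm(E - diag(diag(E)),'fro')/norm(E,'fro')   % small relative off-diagonal energy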
reduce.m
% Creates a diagonal 0/1 matrix R with ones marking the largest diagonal
% entries of D; entries totaling at most a fraction err of trace(D) are
% discarded (zeroed).
function R = reduce(D, err)
D = abs(diag(D));
N = length(D);
err = err*N;        % normalize the error to trace(D), which equals N
R = eye(N);         % start off with the identity matrix, and slowly remove
                    % the 1's on the diagonal until err of the trace is gone
[small,index] = min(D);
while ((err > small) & (small ~= 100)),  % 100 is a sentinel marking removed entries
    D(index) = 100;
    R(index,index) = 0;
    err = err - small;
    [small,index] = min(D);
end
return
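As a small worked example, with diagonal entries summing to N = 4, a call that may discard at most 10% of
the trace removes only the smallest entry (budget err·N = 0.4; removing the next-smallest entry, 0.4,
would exceed the remaining budget of 0.3):

>> D = diag([2.5 1 0.4 0.1]);
>> reduce(D, 0.1)
ans =
     1     0     0     0
     0     1     0     0
     0     0     1     0
     0     0     0     0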
proj.m
% This function generates a realization of the random process S and calls it s.
% It also generates two signals s1 and s2, which are the compression-decompression
% approximations that result from the KLT and DCT, respectively.
% It plots these signals along with their Fourier transforms, and gives some
% analysis metrics.
function output = proj(N,p,err)
return
proj2.m
% This function analyzes the average case over Ns samples for proj(N,p,err).
% This is used to verify the validity of the random process S, and show that
% the averages for the KLT and DCT approximations are as expected.
% Does the discrete random process possess the original covariance matrix?
% Calculate the average signal energy of the realizations of S. This should
% approximate N.
output.avg_signal_energy = mean(sum(S.^2));
% Calculate the covariance matrix of S. This should approximate M.
output.covariance_matrix = cov(transpose(S));
% How close is the covariance matrix? Calculate the rms value of M - covariance_matrix.
% This should ideally be zero.
output.covariance_rms_error = sqrt(mean(mean((M - output.covariance_matrix).^2)));
% The next step is to find the KLT approximation to each realization of the
% random process S. Since the KLT uses the eigenvalue decomposition of the
% matrix M, we need Q and D such that Q*D*inv(Q) = M. But we already did this
% calculation in creating the random process. We need to reduce D so it
% contains at least err% of its energy in the fewest possible coefficients
% (U = reduce(D,1-err) does this). Then we need to calculate the matrix
% Q*U*inv(Q). This matrix will perform the compress-decompress operation on
% each realization of S, producing S1, a facsimile of S.
U = reduce(D,1-err);  % U is a diagonal matrix with ones corresponding to the
                      % largest values that together contain err% of the energy in D
S1 = Q*U*inv(Q)*S; % approximation to S using KLT
% The next step is to find the DCT approximation to each realization of the
% random process S. The DCT uses an NxN transformation matrix R whose rows
% approximate the eigenvectors of M. We need to find E (approximately
% diagonal) such that inv(R)*E*R = M. We need to reduce E so it contains at
% least err% of its energy in the fewest possible diagonal coefficients
% (V = reduce(E,1-err) does this). Then we need to calculate the matrix
% inv(R)*V*R. This matrix will perform the compress-decompress operation on
% each realization of S, producing S2, a facsimile of S.
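% The lines computing R, E, V and S2 are elided in this listing; the following
% is a reconstruction implied by the comments above, not the original source.
R = DCT(N);            % forward DCT matrix; inv(R) = R' since R is orthogonal
E = R*M*R';            % approximately diagonal, so inv(R)*E*R ~ M
V = reduce(E,1-err);   % diagonal mask of the retained DCT coefficients
S2 = R'*V*R*S;         % approximation to S using the DCT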
% Lastly we need to analyze the array of signals created by the KLT and the DCT.
% Calculate the KLT and DCT average percent energy loss. This should be (1-err).
output.KLT_avg_percent_energy_loss = mean((output.avg_signal_energy - sum(S1.^2))/N);
output.DCT_avg_percent_energy_loss = mean((output.avg_signal_energy - sum(S2.^2))/N);
% Calculate the KLT and DCT average error: the mean over all realizations of the
% per-sample squared difference between S and its approximation (S1 or S2).
output.KLT_avg_error = mean(sum((S-S1).^2)/N);
output.DCT_avg_error = mean(sum((S-S2).^2)/N);
% Calculate the KLT and DCT compression ratios: the ratio of the number of
% coefficients in S to the number of non-zero coefficients retained after
% compression (trace(U) for the KLT, trace(V) for the DCT).
output.KLT_compression_ratio = N/trace(U);
output.DCT_compression_ratio = N/trace(V);
return