where a11, a12, a21, and a22 are parameters that depend on the distances of the microphones from the speakers.
Here the source signals s1 and s2 are estimated from the mixed signals x1 and x2 using independent component
analysis (ICA). This is known as blind source separation. In this process the mixed signals are obtained from
statistically independent and non-Gaussian source signals. For simplicity we assume the unknown mixing
matrix A to be square. The estimated source signals can be recovered only up to permutation, sign, and
amplitude; that is, their order and variance cannot be determined with independent component analysis.
In recent years, researchers have proposed many criteria for estimating the source signals with independent
component analysis, such as minimization of mutual information. Among these, maximization of non-Gaussianity
gives the best performance. There are two common techniques for maximizing non-Gaussianity: kurtosis and
negentropy. Of the two, negentropy is the more reliable and statistically robust measure, since kurtosis is
highly sensitive to outliers.
In this paper, we estimate the source signals using independent component analysis [2] by maximizing
negentropy. The maximization of negentropy can be done using two algorithms (FastICA and a gradient algorithm). To
estimate the source signals, the demixing matrix is estimated. The fundamental restriction in ICA [3] is that the
independent components must be non-Gaussian. To see why Gaussian variables make ICA impossible,
assume that the source signals are Gaussian and the mixing matrix is orthogonal. Then x1 and x2 are Gaussian, uncorrelated,
and of unit variance. Their joint density is given by
$p(x_1, x_2) = \frac{1}{2\pi} \exp\!\left(-\frac{x_1^2 + x_2^2}{2}\right)$    (3)
This distribution is illustrated in Fig. 1, which shows that the density is completely symmetric. It therefore does
not contain any information on the directions of the columns of the mixing matrix.
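To make this explicit, a brief check using only the assumptions already made (unit-variance Gaussian sources s and an orthogonal mixing matrix A): the density of the mixtures is p_x(x) = |det A^{-1}| p_s(A^{-1} x), and since |det A^{-1}| = 1 and ||A^{-1} x|| = ||A^T x|| = ||x|| for an orthogonal A, this reduces exactly to the density in (3) for every such A. The distribution of the mixtures therefore carries no information about the mixing matrix.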
To estimate one of the independent components, consider a linear combination of the x_i; let us denote this by
$y = w^T x = \sum_i w_i x_i$    (4)
where w is a vector to be determined. If w were one of the rows of the inverse of A, then this linear combination
would equal one of the independent components. Determining such a w (i.e., a row of the inverse of A) without knowledge
of the matrix A is not practical, but we can find an estimator that gives a good approximation. To see how this leads to the
basic principle of ICA estimation, let us make a change of variables, defining
$z = A^T w$    (5)
$y = w^T A s = z^T s$
y is thus a linear combination of the s_i, with weights given by z_i. Since a sum of even two independent random
variables is more Gaussian than the original variables, z^T s is more Gaussian than any of the s_i and becomes least
Gaussian when it in fact equals one of the s_i. In that case, only one of the elements z_i of z is nonzero. Therefore,
we could take as w a vector that maximizes the non-Gaussianity of w^T x [4]. Such a vector would necessarily
correspond to a z that has only one nonzero component. This means that w^T x = z^T s equals one of the
independent components. Maximizing the non-Gaussianity of w^T x thus gives us one of the independent
components. To find several independent components, we need to find all the local maxima. This is not difficult,
because different independent components are uncorrelated; it corresponds to orthogonalization in a suitably
transformed (i.e., whitened) space.
II. EVALUATION OF INDEPENDENT COMPONENTS BY MAXIMIZING A QUANTITATIVE MEASURE OF NON-GAUSSIANITY
Two quantitative measures of non-Gaussianity used in ICA estimation are kurtosis and negentropy.
A. Negentropy
Negentropy is based on the information-theoretic quantity [5] of differential entropy, which we here call simply
entropy. The more “random”, i.e., unpredictable and unstructured the variable is, the larger its entropy. The
(differential) entropy H of a random vector y with density p_y(η) is defined as
$H(y) = -\int p_y(\eta) \log p_y(\eta)\, d\eta$    (6)
A fundamental result of information theory is that a Gaussian variable has the largest entropy among all random
variables of equal variance. This means that entropy can be used as a measure of non-Gaussianity. Negentropy J is
defined as follows:
$J(y) = H(y_{gauss}) - H(y)$    (7)
where y_gauss is a Gaussian random vector with the same covariance matrix as y. Negentropy, or negative
normalized entropy, is always non-negative, and it is zero if and only if y has a Gaussian distribution.
Negentropy can be maximized using the two algorithms stated above. To simplify the computation, we first center the data to
make its mean zero and then whiten it, so that the components are uncorrelated and have unit variance. The
whitening is done by the eigenvalue decomposition method:
$\tilde{x} = E D^{-1/2} E^T x$    (8)
where E is the orthogonal matrix of eigenvectors of the covariance matrix E{xx^T} and D is the diagonal matrix of its eigenvalues.
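As an illustration only, a minimal NumPy sketch of this centering and whitening step is given below; the function name and the array layout (one mixture per row) are assumptions of this example, not code from the original simulations.

```python
import numpy as np

def center_and_whiten(X):
    """Center the mixtures and whiten them by eigenvalue decomposition.

    X : array of shape (n_signals, n_samples), one mixture per row.
    Returns the whitened data z and the whitening matrix V = E D^{-1/2} E^T.
    """
    X = X - X.mean(axis=1, keepdims=True)        # zero-mean data
    cov = np.cov(X)                              # covariance matrix E{x x^T}
    d, E = np.linalg.eigh(cov)                   # eigenvalues d, eigenvectors E
    V = E @ np.diag(1.0 / np.sqrt(d)) @ E.T      # whitening matrix E D^{-1/2} E^T
    z = V @ X                                    # whitened signals: E{z z^T} = I
    return z, V
```

After this step E{zz^T} is approximately the identity, so the remaining demixing problem reduces to finding suitable unit-norm directions w.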
The estimation of negentropy is difficult, as mentioned above, and therefore this contrast function remains
mainly a theoretical one. The classical method of approximating negentropy uses higher-order moments, for example
$J(y) \approx \frac{1}{12} E\{y^3\}^2 + \frac{1}{48} \mathrm{kurt}(y)^2$    (9)
The random variable y is assumed to be of zero mean and unit variance. In particular, these approximations
suffer from the non-robustness encountered with kurtosis. To avoid the problems encountered with the preceding
approximations, new approximations were developed, based on the maximum-entropy
principle. In general we obtain the following approximation:
$J(y) \approx \sum_{i=1}^{p} k_i \left[ E\{G_i(y)\} - E\{G_i(v)\} \right]^2$    (10)
where k_i are positive constants, G_i are nonquadratic functions, and v is a Gaussian variable of zero mean and unit variance.
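For illustration, a small Python sketch of this approximation with a single term, the common choice G(y) = log cosh(y), and a Monte Carlo estimate of E{G(v)}; the constant k = 1 and the sample size are illustrative assumptions.

```python
import numpy as np

def negentropy_approx(y, n_gauss=100000, rng=np.random.default_rng(0)):
    """Approximate J(y) ~ k * (E{G(y)} - E{G(v)})^2 with G(y) = log cosh(y).

    y is assumed to be centered and of unit variance; v is standard Gaussian.
    """
    G = lambda u: np.log(np.cosh(u))
    v = rng.standard_normal(n_gauss)     # Gaussian reference variable of zero mean, unit variance
    k = 1.0                              # positive constant (illustrative choice)
    return k * (np.mean(G(y)) - np.mean(G(v))) ** 2
```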
B. Negentropy based fixed point algorithm
A much faster method for maximizing negentropy [6] uses a fixed-point algorithm. The
resulting FastICA algorithm finds a direction, i.e., a unit vector w, such that the projection w^T z maximizes non-
Gaussianity. Non-Gaussianity is here measured by the approximation of negentropy.
FastICA is based on a fixed-point iteration for finding a maximum of the non-Gaussianity of w^T z. The
FastICA algorithm using negentropy combines the preferable statistical properties of negentropy. The fixed-point
iteration can be approximated as follows:
$w \leftarrow E\{z g(w^T z)\}$    (11)
The above iteration does not have the good convergence properties of FastICA using kurtosis, because
nonpolynomial moments do not have the same nice algebraic properties as true cumulants such as kurtosis. The
iteration can therefore be modified as follows:
$w \leftarrow E\{z g(w^T z)\} \;\Leftrightarrow\; (1+\alpha)\, w = E\{z g(w^T z)\} + \alpha w$    (12)
Due to the subsequent normalization of w to unit norm, the latter equation gives a fixed-point iteration
that has the same fixed points. With a suitable choice of α, it may be possible to obtain an algorithm that
converges as fast as the fixed-point algorithm using kurtosis. The algorithm can be further simplified as
$w \leftarrow E\{z g(w^T z)\} - E\{g'(w^T z)\}\, w$    (13)
where the nonlinearity g can be chosen, for example, as
$g_2(y) = y \exp(-y^2/2)$
Step wise procedure for FastICA negentropy:
1. Center the data to make its mean zero.
2. Whiten the data to give z.
3. Choose an initial vector w of unit norm.
4. Update w ← E{zg(w^T z)} − E{g'(w^T z)}w, where g is defined as above.
5. Let w ← w / ||w||.
6. If not converged, go back to step 4.
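A minimal sketch of this one-unit FastICA iteration in Python/NumPy, using g_2 above; the convergence tolerance, iteration cap, and function name are assumptions of this example.

```python
import numpy as np

def fastica_one_unit(z, max_iter=200, tol=1e-6, rng=np.random.default_rng(0)):
    """One-unit FastICA on whitened data z of shape (n_signals, n_samples)."""
    g  = lambda y: y * np.exp(-y**2 / 2)            # nonlinearity g2
    dg = lambda y: (1 - y**2) * np.exp(-y**2 / 2)   # its derivative g2'
    n = z.shape[0]
    w = rng.standard_normal(n)
    w /= np.linalg.norm(w)                          # step 3: initial unit-norm vector
    for _ in range(max_iter):
        y = w @ z                                   # projections w^T z
        w_new = (z * g(y)).mean(axis=1) - dg(y).mean() * w   # step 4: fixed-point update
        w_new /= np.linalg.norm(w_new)              # step 5: renormalize
        if abs(abs(w_new @ w) - 1) < tol:           # step 6: convergence check
            return w_new
        w = w_new
    return w
```

The corresponding separated signal is then obtained as y = w^T z.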
C. Negentropy based gradient algorithm [8]
A simple gradient algorithm can be derived by taking the gradient of the approximation of negentropy with
respect to w and taking the normalization E{(w^T z)^2} = ||w||^2 = 1 into account; we obtain the following
algorithm:
$\Delta w \propto \gamma\, E\{z g(w^T z)\}$    (14)
$w \leftarrow w / \|w\|$    (15)
where γ = E{G(w^T z)} − E{G(v)}, v being any Gaussian random variable with zero mean and unit
variance. The normalization is necessary to project w onto the unit sphere and keep the variance of w^T z constant.
The parameter γ, which gives the algorithm a kind of “self-adaptation” quality, can be easily estimated as
follows:
$\Delta \gamma \propto [E\{G(w^T z)\} - E\{G(v)\}] - \gamma$    (16)
Step wise procedure for gradient negentropy:
1. Center the data to make its mean zero.
2. Whiten the data to give z.
3. Choose an initial vector w of unit norm, and an initial value for γ.
4. Update Δw ∝ γ z g(w^T z), where g is defined as in the above algorithm.
5. Normalize w ← w / ||w||.
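A minimal sketch of this gradient procedure; the learning rate, the iteration count, and the use of G_2(y) = −exp(−y^2/2), whose derivative is the g_2 above, are assumptions of this example.

```python
import numpy as np

def gradient_ica_one_unit(z, mu=0.1, n_iter=2000, rng=np.random.default_rng(0)):
    """One-unit gradient algorithm maximizing the negentropy approximation.

    z : whitened (centered, unit-variance) data of shape (n_signals, n_samples).
    mu is an illustrative learning rate; gamma self-adapts as in Eq. (16).
    """
    G = lambda y: -np.exp(-y**2 / 2)                # nonquadratic function G2
    g = lambda y: y * np.exp(-y**2 / 2)             # its derivative g2
    n = z.shape[0]
    v = rng.standard_normal(10000)                  # Gaussian reference for E{G(v)}
    EGv = np.mean(G(v))
    w = rng.standard_normal(n)
    w /= np.linalg.norm(w)                          # step 3: initial unit-norm vector
    gamma = np.mean(G(w @ z)) - EGv                 # initial value for gamma
    for _ in range(n_iter):
        y = w @ z
        w = w + mu * gamma * (z * g(y)).mean(axis=1)        # step 4: Delta w ~ gamma E{z g(w^T z)}
        w /= np.linalg.norm(w)                              # step 5: project back onto the unit sphere
        gamma += mu * ((np.mean(G(w @ z)) - EGv) - gamma)   # Eq. (16): self-adaptation of gamma
    return w
```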
By using deflationary orthogonalization, the separated male voice signal S1 is estimated after the one-unit
algorithm.
Fig. 7. Separated male voice signal from the mixed voice signal
In the next simulation two standard signals, a chirp and a gong, are used as source signals. The signal
S is produced by combining the two source signals. This signal is then multiplied by a random matrix to obtain the
mixed signal X. The whitened signal is obtained by passing the mixed signal through the whitening process.
The sample lengths of the mixed signal X and of the estimated independent components are of the same order in the
simulation.
Fig. 8 and Fig. 9 show the two source signals (the chirp and gong signals, respectively), and Fig. 10 shows the mixed
signal X. The demixing matrix is then found using either of the one-unit algorithms explained above.
After completion of the one-unit algorithm, we obtain one of the source signals as the separated signal, as
shown in Fig. 11 and Fig. 12.
By deflationary orthogonalization, the other signals are estimated after the one-unit algorithm, as sketched below.
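A minimal sketch of deflationary (Gram-Schmidt) orthogonalization; `fastica_one_unit` refers to the earlier sketch, the variable names are illustrative, and in a full implementation the deflation step would be applied after every iteration of the one-unit update rather than only after convergence.

```python
import numpy as np

def deflate(w, W_prev):
    """Make w orthogonal to the previously estimated unit vectors in W_prev."""
    for w_j in W_prev:
        w = w - (w @ w_j) * w_j          # subtract the projection of w onto w_j
    return w / np.linalg.norm(w)

# Usage sketch (illustrative): after estimating w1 with the one-unit algorithm,
# a second estimate is deflated against w1 and the separated signals are
# recovered as y_i = w_i^T z.
# w2 = deflate(fastica_one_unit(z), [w1])
# s2_hat = w2 @ z
```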
When the correlation coefficient is close to 1, the two signals are highly correlated; when it is close to zero,
there is no correlation between the two signals.
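For reference, a small sketch of how such a correlation coefficient can be computed; the function and argument names are illustrative.

```python
import numpy as np

def separation_correlation(separated, other_source):
    """Correlation coefficient between a separated signal and the other source.

    Values near zero indicate good separation; values near one indicate that
    the separated signal still contains the other source.
    """
    return abs(np.corrcoef(separated, other_source)[0, 1])
```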
Table I. Performance of male and female voice separation with the FastICA and gradient negentropy algorithms

Algorithm for negentropy | Male voice (S1) separated from the mixture S | Female voice (S2) separated from the mixture S
                         | Correlation coefficient between S1 & S2 | Average execution time (sec) | No. of iterations | Correlation coefficient between S1 & S2 | Average execution time (sec) | No. of iterations
Table II. Performance of chirp and gong signal separation with the FastICA and gradient negentropy algorithms

Algorithm for negentropy | Chirp signal (P1) separated from the mixture P | Gong signal (P2) separated from the mixture P
                         | Correlation coefficient between P1 & P2 | Average execution time (sec) | No. of iterations | Correlation coefficient between P1 & P2 | Average execution time (sec) | No. of iterations
From Table I and Table II we observe that FastICA provides a better execution time than the gradient algorithm,
with a minimum number of iterations. The gradient algorithm provides lower correlation coefficients, which
indicates that there is negligible correlation between each separated signal and the other signal.
IV. CONCLUSION
This paper shows that the gradient-based negentropy algorithm provides higher efficiency in separating speech
signals than the FastICA-based negentropy algorithm, whereas FastICA needs less execution time and a smaller
number of iterations than the gradient-based algorithm.
REFERENCES
[1] Haykin, Simon, and Zhe Chen. "The cocktail party problem." Neural computation 17.9 (2005): 1875-1902.
[2] Hyvärinen, A., J. Karhunen, and E. Oja, "Independent Component Analysis." New York: Wiley, 2001, pp. 165-202.
[3] P. Comon, “Independent Component Analysis-A new concept?” Signal Processing, vol. 36, pp. 287-314, 1994.
[4] Hyvärinen, Aapo, and Erkki Oja. "Independent component analysis: algorithms and applications." Neural networks 13.4 (2000):
411-430.
[5] Hyvarinen, Aapo. "Fast and robust fixed-point algorithms for independent component analysis." IEEE Transactions on Neural
Networks 10.3 (1999): 626-634.
[6] T. Chien and B.C. Chen, “A new independent component analysis for speech recognition and separation.” IEEE Trans. Speech Audio
Process. 14. 4 (2006): 1245-1254.
[7] K. Mohanaprasad and P. Arulmozhivarman. “Comparison of Independent component analysis techniques for Acoustic Echo
Cancellation during Double Talk scenario.” Australian Journal of Basic and Applied Sciences. 7. 4 (2013): 108-113.
[8] R. Ganesh and K. Dinesh, "An overview of independent component analysis and its application," Informatica, 2011, pp. 63-81.
[9] H.M.M. Joho and G. Moschytz, “ Combined blind / non blind source separation based on the natural gradient,” IEEE signal process.
Lett. 8. 8 (2001): 236-238.
[10] S. Miyabe, T. Takatani, H. Saruwatari, K.Shikano, and Y. Tatekura. “Barge-in and noise-free spoken dialogue interface based on
sound field control and semi-blind source separation.” In proc. Eur. Signal Process. Conf., 2007:232-236.