Discriminant Analysis
The model
Z = W′X
where Z is the 1×n vector of discriminant scores,
W′ is the 1×p vector of discriminant weights, and
X is the p×n data matrix.
Here n is the number of observations and p is the number of independent variables.
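As a quick sketch (in Python/NumPy rather than the MATLAB used in the worked example later; every number here is made up for illustration), the discriminant scores are a single vector-matrix product:

```python
import numpy as np

# Hypothetical sizes: p = 3 independent variables, n = 4 observations.
w = np.array([0.5, -0.2, 0.1])       # 1 x p vector of discriminant weights (W')
X = np.array([[1.0, 2.0, 0.0, 1.0],  # p x n data matrix: one column per observation
              [3.0, 1.0, 2.0, 0.0],
              [2.0, 2.0, 1.0, 1.0]])

Z = w @ X                            # 1 x n vector of discriminant scores
print(Z)
```

Each entry of Z is the score of one observation on the discriminant function.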
The discriminant function is formed on the principle of maximizing the variation
between the groups while minimizing the variation within each group. Using the data
matrix, the mean-corrected sum-of-squares and cross-products (SSCP) matrix of each
group is formed about its own group mean; we denote these W1, W2, W3, etc. Similarly,
the SSCP matrix of each group is formed about the grand mean of all the observations;
we denote these T1, T2, T3, etc.
Now
W = W1 + W2 + W3 + …………. + Wk
Similarly
T = T1 + T2 + T3 + …………. + Tk
where k is the number of groups.
Also
T = W + B
=> B = T – W
With respect to the linear composite w, the between-groups sum of squares is given
by
w′Bw
and similarly the within-groups sum of squares is given by
w′Ww.
The weights are chosen to maximize the ratio

λ = (w′Bw) / (w′Ww)     --- (1)

From (1),
λ w′Ww = w′Bw
w′Bw – λ w′Ww = 0
Now setting the partial derivative of λ with respect to w equal to zero,

∂/∂w [ (w′Bw) / (w′Ww) ] = 0

which gives
Bw – λWw = 0
(B – λW)w = 0
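This is a generalized eigenvalue problem. A minimal NumPy sketch (with synthetic, made-up SSCP matrices) checks that each eigenvalue of W⁻¹B equals the ratio w′Bw / w′Ww attained by its eigenvector:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical SSCP matrices: W positive definite, B of rank <= k-1 = 2.
A = rng.standard_normal((3, 3))
W = A @ A.T + 3 * np.eye(3)          # within-groups SSCP
Bh = rng.standard_normal((3, 2))
B = Bh @ Bh.T                        # between-groups SSCP

# (B - lambda W) w = 0  <=>  W^{-1} B w = lambda w
eigvals, eigvecs = np.linalg.eig(np.linalg.inv(W) @ B)
order = np.argsort(eigvals.real)[::-1]
eigvals, eigvecs = eigvals.real[order], eigvecs.real[:, order]

w1 = eigvecs[:, 0]                   # weights of the first discriminant function
lam1 = eigvals[0]                    # the maximized ratio w'Bw / w'Ww
ratio = (w1 @ B @ w1) / (w1 @ W @ w1)
```

Because B has rank at most k−1, only that many eigenvalues are nonzero, which is exactly the min(p, k−1) rule stated below.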
The number of discriminant functions to consider is min(p, k–1), where p is the
number of independent variables and k is the number of groups. We retain at most
this many discriminant functions.
Classification rules:
1. Classification of object based on single discriminant function:-
a) Compute the mean discriminant value (Z̄) for each group by
substituting the means of the independent variables into the
discriminant function.
b) For the new object also, calculate the Z value.
c) Calculate the distances between the value in step (b) and each mean
discriminant value in step (a).
d) The object is classified as belonging to a particular group on the basis of
shortest distance calculated in step (c).
2. Classification of object based on two discriminant functions:-
a) Compute the mean discriminant values (Z̄1, Z̄2) for each group by
substituting the means of the independent variables into the two
discriminant functions.
b) For the new object also, calculate (Z1, Z2).
c) Calculate the distances between the value in step (b) and each mean
discriminant value in step (a) using the Euclidean distance formula.
d) The object is classified as belonging to a particular group on the basis of
shortest distance calculated in step (c).
This process can be continued even for more than two discriminant functions.
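Both rules can be sketched with one helper (all weights and group means below are made up; `n_funcs` selects how many discriminant functions to use, so `n_funcs=1` is rule 1 and `n_funcs=2` is rule 2):

```python
import numpy as np

# Hypothetical weights (one row per discriminant function) and group means, p = 2.
weights = np.array([[0.8, -0.5],
                    [0.1,  0.9]])
group_means = {"G1": np.array([1.0, 4.0]),
               "G2": np.array([5.0, 2.0]),
               "G3": np.array([9.0, 1.0])}

def classify(x, n_funcs):
    Wd = weights[:n_funcs]
    # Step (a): mean discriminant values (group centroids in discriminant space)
    centroids = {g: Wd @ m for g, m in group_means.items()}
    z = Wd @ x                       # step (b): scores of the new object
    # Steps (c)-(d): distance to each centroid; assign to the nearest group
    return min(centroids, key=lambda g: np.linalg.norm(z - centroids[g]))

print(classify(np.array([8.5, 1.2]), 1))   # rule 1: one discriminant function
print(classify(np.array([8.5, 1.2]), 2))   # rule 2: two discriminant functions
```

For more than two functions, the same Euclidean distance in discriminant space applies unchanged.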
Example: The table below contains data on breakfast cereals produced by
three different manufacturers (G, K and Q).

Cereal           Mfr  X1   X2  X3  X4   X5   X6  X7  X8   Group
Smacks           K    110  2   1   70   1    9   15  40   2
SpecialK         K    110  6   0   230  1    16  3   55   2
CapNCrunch       Q    120  1   2   220  0    12  12  35   3
HoneyGrahamOhs   Q    120  1   2   220  1    12  11  45   3
Life             Q    100  4   2   150  2    12  6   95   3
PuffedRice       Q    50   1   0   0    0    13  0   15   3
PuffedWheat      Q    50   2   0   0    1    10  0   50   3
QuakerOatmeal    Q    100  5   2   0    2.7  1   1   110  3
x1 = G1 - column mean of G1
W1 = x1'*x1
x2 = G2 - column mean of G2
W2 = x2'*x2
x3 = G3 - column mean of G3
W3 = x3'*x3
W = W1 + W2 + W3
t1 = G1 - column mean of whole data
T1 = t1'*t1
t2 = G2 - column mean of whole data
T2 = t2'*t2
t3 = G3 - column mean of whole data
T3 = t3'*t3
T=T1+T2+T3
B=T-W
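The same bookkeeping can be sketched in Python/NumPy (the MATLAB-style lines above do exactly this on the cereal data; here the data are synthetic and the `sscp` helper is a hypothetical name):

```python
import numpy as np

def sscp(X, center):
    """Sum-of-squares and cross-products of an n x p block about a given center."""
    d = X - center
    return d.T @ d

# Synthetic example: three groups of observations on p = 2 variables.
rng = np.random.default_rng(1)
groups = [rng.standard_normal((10, 2)) + shift for shift in (0.0, 2.0, 4.0)]
grand_mean = np.vstack(groups).mean(axis=0)

# W = W1 + W2 + W3: each group corrected by its own group mean
W = sum(sscp(G, G.mean(axis=0)) for G in groups)
# T = T1 + T2 + T3: each group corrected by the grand mean of all observations
T = sum(sscp(G, grand_mean) for G in groups)
B = T - W                              # between-groups SSCP
```

Summing the per-group Tk gives the same matrix as forming the SSCP of the whole stacked data about the grand mean, which is why T = W + B holds.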
Eigenvectors of W-1B (one column per eigenvector):
0.0110 0.0352 -0.0527 0.0107 -0.0720 - 0.0709i -0.0720 + 0.0709i -0.0379 -0.0681
0.2490 -0.2144 0.7022 0.4065 -0.2650 - 0.0459i -0.2650 + 0.0459i 0.1273 -0.2781
-0.4886 -0.3488 0.6202 -0.8032 0.5703 + 0.0049i 0.5703 -0.0049i -0.8396 -0.6239
-0.0003 -0.0040 -0.0037 0.0121 0.0065 + 0.0061i 0.0065 -0.0061i 0.0023 -0.0092
0.8133 0.8854 0.1352 -0.2461 0.6898 0.6898 -0.5126 -0.2671
0.1388 -0.1436 0.2039 -0.3589 0.1092 + 0.0391i 0.1092 - 0.0391i 0.0458 0.1612
0.1347 -0.1613 0.2443 0.0013 0.3081 + 0.0837i 0.3081 - 0.0837i 0.1119 0.6566
-0.0173 -0.0226 -0.0084 0.0067 -0.0256 + 0.0156i -0.0256 - 0.0156i -0.0001 -0.0180
Eigenvalues of W-1B (on the diagonal):
1.8698 0 0 0 0 0 0 0
0 0.4810 0 0 0 0 0 0
0 0 0.0000 0 0 0 0 0
0 0 0 0.0000 0 0 0 0
0 0 0 0 -0.0000 + 0.0000i 0 0 0
0 0 0 0 0 -0.0000 - 0.0000i 0 0
0 0 0 0 0 0 -0.0000 0
0 0 0 0 0 0 0 -0.0000
Z1 = 0.011X1 + 0.249X2 – 0.4886X3 – 0.0003X4 + 0.8133X5 + 0.1388X6 + 0.1347X7 – 0.0173X8
Z2 = 0.0352X1 – 0.2144X2 – 0.3488X3 – 0.0040X4 + 0.8854X5 – 0.1436X6 – 0.1613X7 – 0.0226X8
X1   X2  X3  X4   X5   X6    X7  X8   Actual  Z1      Z2       Dist. from  Dist. from  Dist. from  Pred.
                                      group                    G1          G2          G3          group
110  2   2   180  1.5  10.5  10  70   1       3.4841  -1.3547  0.5634447   2.090529    0.751399    1
Calculation of misclassification error
1) Re-substitution method – In this method the whole sample is used. The
available group data are re-substituted into the discriminant function to calculate
the Apparent Error Rate (APER). Suppose there are two groups G1 and G2 with n1 and
n2 observations respectively. If, after substituting the data values, the
discriminant function classifies every observation correctly as shown in the table
below, then the misclassification rate is 0.
                          Predicted membership
                          G1        G2
Actual membership   G1    n1        0
                    G2    0         n2
Suppose n1' observations of G1 and n2' observations of G2 are misclassified, as
shown below.

                          Predicted membership
                          G1          G2
Actual membership   G1    n1 – n1'    n1'
                    G2    n2'         n2 – n2'

The apparent error rate is then APER = (n1' + n2') / (n1 + n2).
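From such a confusion matrix the apparent error rate is simply the off-diagonal proportion; a short sketch with made-up counts:

```python
import numpy as np

# Made-up re-substitution counts: rows = actual group, columns = predicted group.
confusion = np.array([[18, 2],    # n1 = 20, of which n1' = 2 misclassified
                      [3, 17]])   # n2 = 20, of which n2' = 3 misclassified

# APER = (n1' + n2') / (n1 + n2): off-diagonal count over the total sample
aper = (confusion.sum() - np.trace(confusion)) / confusion.sum()
print(aper)   # 5/40 = 0.125
```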
3) The U method (Cross-Validation, or Jackknife) – Suppose there are two
groups. Leave out one element of the first group, build the classification rule from
the remaining elements of that group together with all elements of the other group,
and then classify the left-out element; it contributes to the error-rate calculation.
Repeat until every element of the first group has been left out once.
Then go to the second group: leave out one element, build the rule from the
remaining elements together with all elements of the first group, and classify the
left-out element. Continue until all elements have been subjected to the
misclassification procedure.
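A sketch of the U method, using a stand-in nearest-centroid rule in place of the discriminant classifier (the rule from the notes would slot into `predict`; the data are synthetic):

```python
import numpy as np

# Synthetic two-group sample with p = 2 variables.
rng = np.random.default_rng(2)
X = np.vstack([rng.standard_normal((15, 2)),
               rng.standard_normal((15, 2)) + 3.0])
y = np.array([0] * 15 + [1] * 15)

def predict(train_X, train_y, x):
    # Stand-in rule: assign to the group with the nearest centroid.
    centroids = [train_X[train_y == g].mean(axis=0) for g in (0, 1)]
    return int(np.argmin([np.linalg.norm(x - c) for c in centroids]))

# U method: hold out each observation, refit on the rest, score the held-out point.
errors = 0
for i in range(len(y)):
    keep = np.arange(len(y)) != i
    errors += predict(X[keep], y[keep], X[i]) != y[i]
error_rate = errors / len(y)
```

Because each observation is scored by a rule it did not help build, this error rate is less optimistic than the re-substitution APER.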
Statistical Tests:
To check whether the developed discriminant functions discriminate properly, we have
some tests for the purpose. One useful test is Mahalanobis' D² statistic, which is
used mainly for the two-group problem. Here D² is the generalized distance between
the centroids of the two groups. The centroid of each group is obtained by
substituting the mean values of the independent variables into the discriminant
function; the corresponding Z̄1 and Z̄2 values are the centroids of the groups.
The distance is calculated as follows:
D² = Z̄2 – Z̄1, if Z̄1 < Z̄2
     Z̄1 – Z̄2, if Z̄2 < Z̄1
The larger the distance D², the more effectively the discriminant function is
discriminating. It can be shown that D² corresponds to a test statistic M, which
follows an F distribution with degrees of freedom p and n1+n2–p–1. Here n1 and n2
are the number of observations in the two groups and p is the number of independent
variables.
In this test, M is tested at a particular significance level α. If the value of M is
significant, then the discriminant function is significant; otherwise it is not, and
we conclude that there is not much difference between the two groups.
If there are two groups (so there is only one discriminant function), the
significance of the discriminant function can be tested as above. But if there are
more than two groups, i.e. more than one discriminant function, then Bartlett's χ²
test statistic comes in handy. It helps us retain the significant discriminant
functions.
This statistic is given by

V = {(n-1) - ½(p+k)} Σ_{j=1}^{r} ln(1+λj)
The λ values are the eigenvalues of the W⁻¹B matrix. The test statistic V is tested
for significance using the χ² distribution with p(k-1) degrees of freedom at a
particular α value. If V is significant, then the next job is to identify how many
of the discriminant functions are individually significant. This test is carried out
in the following manner.
First calculate the V1 value, the term corresponding to the highest eigenvalue:
V1 = {(n-1) - ½(p+k)} ln(1+λ1)
Subtract V1 from V, and test whether V – V1 is significant using (p-1)(k-2) degrees
of freedom. If it is significant, then the significance of the next discriminant
function is checked. If V – V1 is not significant, then only the first discriminant
function, corresponding to the highest eigenvalue, is significant and the other
discriminant functions are not.
If V – V1 is significant, it means there are further discriminant functions that
contribute to discriminating the groups. In that case calculate
V2 = {(n-1) - ½(p+k)} ln(1+λ2)
Now calculate V – V1 – V2 and test its significance using (p-2)(k-3) degrees of
freedom. If it is significant, then the second discriminant function is significant
and we should also proceed to explore whether there are further significant
discriminant functions. If V – V1 – V2 is not significant, conclude that the second
discriminant function is significant in addition to the first one, and stop the
procedure.
This whole process is summarized in the following steps:
1. Calculate V; set i = 1.
2. Calculate Vi and form the remainder V = V – Vi.
3. Test whether the remainder is significant.
   If yes: the discriminant function corresponding to λi is significant;
           set i = i + 1 and return to step 2.
   If no:  stop.
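Under the assumption that n = 43 (inferred from the divisor 42 used for the covariance matrix later, Cov_mat = S/42), the sequential test on the example's two eigenvalues can be sketched as:

```python
from math import log

# V = {(n-1) - (p+k)/2} * sum_j ln(1 + lambda_j), as in the notes.
def bartlett_V(eigvals, n, p, k):
    return ((n - 1) - 0.5 * (p + k)) * sum(log(1.0 + lam) for lam in eigvals)

lams = [1.8698, 0.4810]              # eigenvalues of W^-1 B from the example
n, p, k = 43, 8, 3                   # n = 43 is an inference, not stated explicitly

V = bartlett_V(lams, n, p, k)        # compare with chi^2, p(k-1) = 16 df
V1 = bartlett_V(lams[:1], n, p, k)   # term for the highest eigenvalue
remainder = V - V1                   # compare with chi^2, (p-1)(k-2) = 7 df
```

If `remainder` is significant at the chosen α, the second discriminant function also contributes; otherwise only the first is retained.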
Interpretation of the attributes with respect to Discriminant axes
Step 3: Stretch each group centroid by multiplying it with the approximate F-value
for the corresponding discriminant function. The approximate F-value for each
discriminant function can be obtained as follows.
For the first discriminant function,

F = [(n – k) / (k – 1)] × (highest eigenvalue)

For the example, n – 1 = 42 so n = 43, and k = 3, giving F = 20 × 1.8698 ≈ 37.4 for
the first function. Thus the F-value for each discriminant function is calculated,
and the new centroids are obtained accordingly from the discriminant loadings.
Step 1: We have already seen how to decide the number of discriminant functions to
retain. Now let us see how the discriminant loadings are calculated.
(a) Rescaling of the discriminant weights:

Wj* = C · Wj

where Wj are the discriminant weights and Wj* are the rescaled discriminant weights.
C contains the square roots of the diagonal elements of the total-sample variance-
covariance matrix.
The covariance matrix is obtained by dividing each element of the mean-corrected
sum-of-squares and cross-products matrix (S) by n-1, the sample size less 1.
The mean-corrected sum-of-squares and cross-products matrix (S) can be obtained by
pre-multiplying the mean-corrected data matrix by its transpose.
(b) Calculation of the discriminant loadings:

l̂j = R · Wj*

where R is the correlation matrix. R can be obtained as

R = 1/(n-1) · (D^(-1/2) S D^(-1/2))

where D^(-1/2) contains the reciprocals of the standard deviations of the variables
in the original data matrix.
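Steps (a) and (b) can be sketched end-to-end on a tiny made-up example (p = 2, n = 5; S and the weights are invented for illustration; since cov = S/(n-1), forming R from cov is equivalent to the formula above):

```python
import numpy as np

S = np.array([[8.0, 2.0],          # hypothetical total-sample SSCP matrix
              [2.0, 4.0]])
n = 5
w = np.array([0.6, -0.3])          # hypothetical raw discriminant weights Wj

cov = S / (n - 1)                                  # variance-covariance matrix
C = np.diag(np.sqrt(np.diag(cov)))                 # square roots of the diagonal
w_star = C @ w                                     # (a) rescaled weights Wj* = C Wj

D_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(cov)))  # reciprocals of the std devs
R = D_inv_sqrt @ cov @ D_inv_sqrt                  # correlation matrix
loadings = R @ w_star                              # (b) loadings l_j = R Wj*
```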
For the given example
(a)
Cov_mat = (S)/42
1.0e+003*
0.3765 0.0009 0.0055 0.5738 0.0000 0.0247 0.0523 0.2024
0.0009 0.0015 0.0002 0.0132 0.0011 - 0.0004 - 0.0022 0.0406
0.0055 0.0002 0.0007 - 0.0012 0.0002 - 0.0012 0.0006 0.0159
0.5738 0.0132 - 0.0012 6.3522 0.0132 0.2072 - 0.0104 0.8024
0.0000 0.0011 0.0002 0.0132 0.0033 - 0.0017 - 0.0002 0.1115
0.0247 - 0.0004 - 0.0012 0.2072 - 0.0017 0.0191 - 0.0055 - 0.0568
0.0523 - 0.0022 0.0006 - 0.0104 - 0.0002 - 0.0055 0.0209 0.0280
0.2024 0.0406 0.0159 0.8024 0.1115 - 0.0568 0.0280 4.4081
C =
19.4048 0 0 0 0 0 0 0
0 1.2224 0 0 0 0 0 0
0 0 0.8073 0 0 0 0 0
0 0 0 79.7004 0 0 0 0
0 0 0 0 1.8066 0 0 0
0 0 0 0 0 4.3703 0 0
0 0 0 0 0 0 4.5744 0
0 0 0 0 0 0 0 66.3932
W1* = C*(Eigen vector 1) W2* = C*(Eigen vector 2)
0.2125 0.6824
0.3044 -0.2621
-0.3944 -0.2816
-0.0243 -0.3179
1.4693 1.5996
0.6064 -0.6278
0.6160 -0.7379
-1.1479 -1.5037
(b)D-1/2 matrix
0.0527 0 0 0 0 0 0 0
0 0.8185 0 0 0 0 0 0
0 0 1.2478 0 0 0 0 0
0 0 0 0.0129 0 0 0 0
0 0 0 0 0.5558 0 0 0
0 0 0 0 0 0.2349 0 0
0 0 0 0 0 0 0.2204 0
0 0 0 0 0 0 0 0.0151
R = 1/(n-1)*(D-1/2SD-1/2) =
0.2443 -0.2561
0.2293 -0.1029
Variable contribution
In the previous section, we have seen that how discriminant loadings are calculated.
These discriminant loadings will help us to know about the contribution of the variables
in discriminating the objects. Discriminant loadings are the correlation between the
discriminant function and the corresponding variable. Let us suppose, if a variable
attached to the first discriminant function, then it contributes more in the discrimination
process, provided the discriminant function represent the more variation compare to other
discriminant function.
Step 2:
Calculation of Wilks' Lambda (Λ)
For variable X1
G1(:,1)-Mn1(1)
-0.5882
-0.5882
-0.5882
-0.5882
-0.5882
-0.5882
-0.5882
-0.5882
-10.5882
19.4118
-10.5882
-0.5882
29.4118
-10.5882
-0.5882
-10.5882
-0.5882
w1= 1.6941e+003
G2(:,1)-Mn2(1)
-41
-1
-11
-1
-1
-1
-1
-1
-11
9
-1
49
9
29
-21
-11
9
-1
-1
-1
w2= 5980
G3(:,1)-Mn3(1)
30
30
10
-40
-40
10
w3 = 5200
W=w1+w2+w3 = 12874
Similarly,
T1 = (G1(:,1)-Mn_whole(1))'*(G1(:,1)-Mn_whole(1)) = 2.4631e+003
T2 = (G2(:,1)-Mn_whole(1))'*(G2(:,1)-Mn_whole(1)) = 6.9988e+003
T3 = (G3(:,1)-Mn_whole(1))'*(G3(:,1)-Mn_whole(1)) = 6.3531e+003
T = T1 + T2 +T3 = 1.5815e+004
Variable   Wilks' Λ
X1         0.8140
X3         0.8381
X4         0.7783
X5         0.9125
X6         0.7864
X7         0.9293
X8         0.9636
The univariate F-values can also be calculated as mentioned earlier. These are given in
the following table.
Now the previously calculated discriminant loadings are stretched by multiplying them
with the respective variable’s univariate F-value. The stretched discriminant loadings are
given in the following table.
Stretched Discriminant Loadings
Variable
Function 1 Function 2
X1 2.041876 -1.99526
X2 0.029616 0.01608
X3 -1.86554 -0.7554
X4 2.444583 -3.07182
X5 0.653271 0.262958
X6 2.750295 -1.84131
X7 0.371336 -0.38927
X8 0.173122 -0.07769
x=
6.7255 -0.0758 0.1624 37.1536 -0.2595 1.3088 1.0951 6.6389
7.1373 0.1713 -0.4229 25.0065 0.6964 1.9706 0.9275 13.3889
-13.8627 -0.0954 0.2604 -62.1602 -0.4369 -3.2794 -2.0225 -20.0278
>> s=x'*x
1.0e+003 *
cov_mat=s/2
1.0e+003 *
0.6450 0.0037 -0.0104 2.9348 0.0175 0.1509 0.0948 0.9132
0.0046 0.0001 -0.0002 0.0175 0.0004 0.0012 0.0006 0.0082
0.0342 0.0003 -0.0007 0.1509 0.0012 0.0082 0.0049 0.0504
0.0210 0.0001 -0.0004 0.0948 0.0006 0.0049 0.0031 0.0301
0.2089 0.0019 -0.0049 0.9132 0.0082 0.0504 0.0301 0.3122
c=
12.0072 0 0 0 0 0 0 0
0 0.1486 0 0 0 0 0 0
0 0 0.3695 0 0 0 0 0
0 0 0 54.1738 0 0 0 0
0 0 0 0 0.6096 0 0 0
0 0 0 0 0 2.8593 0 0
0 0 0 0 0 0 1.7536 0
0 0 0 0 0 0 0 17.6699
ev1 =
0.0110
0.2490
-0.4886
-0.0003
0.8133
0.1388
0.1347
-0.0173
ev2 =
0.0352
-0.2144
-0.3488
-0.0040
0.8854
-0.1436
-0.1613
-0.0226
wstar1=c*ev1
wstar1 =
0.1321
0.0370
-0.1805
-0.0163
0.4958
0.3969
0.2362
-0.3057
>> wstar2=c*ev2
0.4227
-0.0319
-0.1289
-0.2167
0.5397
-0.4106
-0.2829
-0.3993
d-1/2 (reciprocals of the standard deviations, std(g)) =
0.0833 0 0 0 0 0 0 0
0 6.7295 0 0 0 0 0 0
0 0 2.7064 0 0 0 0 0
0 0 0 0.0185 0 0 0 0
0 0 0 0 1.6404 0 0 0
0 0 0 0 0 0.3497 0 0
0 0 0 0 0 0 0.5703 0
0 0 0 0 0 0 0 0.0566
R=
0.9432 0.0826
-0.9698 -0.0400
0.8198 -0.5248
0.9745 0.0317
0.9412 -0.4277
0.8575 -0.4991
0.9719 -0.3905