Discriminant analysis

Discriminant analysis is a statistical technique used to classify cases into categories based on independent metric variables when the dependent variable is categorical. It can handle two groups (two-group discriminant analysis) or multiple groups (Multiple Discriminant Analysis), and involves deriving a discriminant function to maximize differences between groups. Key objectives include developing discriminant functions, assessing group differences, and evaluating classification accuracy, with validation being essential for ensuring generalizability of results.

Uploaded by

tsandrasanal

Discriminant Analysis

1
• Discriminant analysis is an appropriate statistical technique when the
dependent variable is a categorical (nominal or nonmetric) variable
and the independent variables are metric variables.
• In many cases, the dependent variable consists of two groups or
classifications, for example, good versus bad or high versus low. In
other circumstances, more than two groups are involved, such as low,
medium, and high classifications.
• Discriminant analysis is capable of handling either two groups or
multiple groups.

2
• When two classifications are involved, the technique is referred to as
two-group discriminant analysis. When three or more classifications
are identified, the technique is referred to as Multiple Discriminant
Analysis (MDA).
• Discriminant analysis involves deriving a variate. The discriminant
variate is the linear combination of the two (or more) independent
variables that will discriminate best between the objects in the
groups defined a priori. Discrimination is achieved by calculating the
variate’s weight for each independent variable to maximize the
differences between the groups (i.e., the between-group variance
relative to the within-group variance).

3
Objectives of Discriminant Analysis
1. Development of discriminant functions, or linear combinations of the
predictor or independent variables, that best discriminate between the
categories of the criterion or dependent variable (groups).
2. Examination of whether significant differences exist among the groups,
in terms of the predictor variables.
3. Determination of which predictor variables contribute to most of the
inter-group differences.
4. Classification of cases to one of the groups based on the values of the
predictor variables.
5. Evaluation of the accuracy of classification

4
Assumptions
1. Cases (individuals) should be independent of one another
2. Predictor variables should have a multivariate normal distribution
3. Within-group variance-covariance matrices should be equal across
the groups
4. Group membership is assumed to be mutually exclusive, i.e., no
case belongs to more than one group
5. Group membership should be collectively exhaustive, i.e., all cases
are members of a group

5
Discriminant analysis model
• The variate for a discriminant analysis, also known as the discriminant
function, is derived from an equation much like that seen in multiple
regression. It takes the following form:

Z_jk = a + W_1 X_1k + W_2 X_2k + ... + W_n X_nk

where
Z_jk = discriminant Z score of discriminant function j for object k
a = intercept
W_i = discriminant weight for independent variable i
X_ik = independent variable i for object k
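
As a small illustration of the equation above, the sketch below computes a discriminant Z score for each object. The intercept, weights and observations are hypothetical values chosen only for illustration; they are not taken from the slides.

```python
def discriminant_score(intercept, weights, x):
    """Z = a + W_1*X_1 + ... + W_n*X_n for a single object."""
    if len(weights) != len(x):
        raise ValueError("weights and x must have the same length")
    return intercept + sum(w * xi for w, xi in zip(weights, x))


# Hypothetical intercept, weights and objects (assumptions for illustration).
a = -4.0
W = [0.5, 1.0]
objects = [[2.0, 1.0], [6.0, 3.0]]

scores = [discriminant_score(a, W, x) for x in objects]
print(scores)
```

With more than one discriminant function, the same calculation would be repeated with a separate intercept and weight vector per function.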

6
Discriminant analysis model
• The coefficients or weights (Wi) are estimated so that the groups
differ as much as possible on the values of the discriminant function.
This occurs when the ratio of between-group sum of squares to
within-group sum of squares for the discriminant scores is at a
maximum. Any other linear combination of the predictors will result in
a smaller ratio.
• If the dependent variable consists of more than two groups,
discriminant analysis will calculate more than one discriminant
function. In fact, it will calculate NG - 1 functions, where NG is the
number of groups (or as many functions as there are predictors, if
that number is smaller). Each discriminant function will calculate a
separate discriminant Z score.
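
The maximisation described above can be sketched for the simplest case of two groups and two predictors. The sketch assumes the standard two-group result that the weights are proportional to the inverse of the pooled within-group sums-of-squares matrix times the difference in group means. The data and function names are hypothetical, and the intercept is omitted because it does not affect the ratio.

```python
def mean_vector(rows):
    """Column means of a list of (x1, x2) observations."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]


def pooled_within_sscp(groups):
    """Within-group sums of squares and cross-products, summed over groups."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for rows in groups:
        m = mean_vector(rows)
        for r in rows:
            d = [r[0] - m[0], r[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    return s


def fisher_weights(group_a, group_b):
    """Weights proportional to W^-1 (mean_A - mean_B) for two predictors."""
    w = pooled_within_sscp([group_a, group_b])
    det = w[0][0] * w[1][1] - w[0][1] * w[1][0]
    ma, mb = mean_vector(group_a), mean_vector(group_b)
    d = [ma[0] - mb[0], ma[1] - mb[1]]
    return [(w[1][1] * d[0] - w[0][1] * d[1]) / det,
            (w[0][0] * d[1] - w[1][0] * d[0]) / det]


def ss_ratio(weights, groups):
    """Between-group SS divided by within-group SS of the scores."""
    scores = [[weights[0] * r[0] + weights[1] * r[1] for r in rows]
              for rows in groups]
    flat = [z for g in scores for z in g]
    grand = sum(flat) / len(flat)
    between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in scores)
    within = sum((z - sum(g) / len(g)) ** 2 for g in scores for z in g)
    return between / within


# Hypothetical data: two groups measured on two predictors.
group_a = [(2, 4), (3, 6), (4, 5), (5, 7)]
group_b = [(5, 3), (6, 5), (7, 4), (8, 6)]
groups = [group_a, group_b]

best = fisher_weights(group_a, group_b)
best_ratio = ss_ratio(best, groups)
other_ratios = [ss_ratio(w, groups) for w in ([1, 0], [0, 1], [1, -1], [1, 1])]
print(best, best_ratio, other_ratios)
```

On this data, none of the alternative weight vectors reaches the ratio obtained with the derived weights, which is the property the slide describes.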

7
Discriminant analysis is the appropriate statistical technique for testing
the hypothesis that the group means of a set of independent variables
for two or more groups are equal.
By averaging the discriminant scores for all the individuals within a
particular group, we arrive at the group mean. This group mean is
referred to as a centroid.
When the analysis involves two groups, there are two centroids; with
three groups, there are three centroids; and so forth. The centroids
indicate the most typical location of any member from a particular
group, and a comparison of the group centroids shows how far apart
the groups are in terms of that discriminant function.
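
A minimal sketch of the centroid calculation, assuming discriminant scores have already been computed. The scores and group labels are hypothetical.

```python
from collections import defaultdict


def group_centroids(scores, labels):
    """Mean discriminant score per group."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for z, g in zip(scores, labels):
        sums[g] += z
        counts[g] += 1
    return {g: sums[g] / counts[g] for g in sums}


# Hypothetical discriminant scores for six objects in two groups.
scores = [-2.0, -1.0, -1.5, 1.0, 2.0, 1.5]
labels = ["A", "A", "A", "B", "B", "B"]

centroids = group_centroids(scores, labels)
separation = abs(centroids["A"] - centroids["B"])
print(centroids, separation)
```

The gap between the two centroids is one simple way to see how far apart the groups are on that function.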

8
Assessing overall model fit
This assessment involves three tasks:
1. Calculating discriminant Z scores for each observation
2. Evaluating group differences on the discriminant z scores
3. Assessing group membership prediction accuracy

9
Assessing overall model fit
1. Calculating discriminant Z scores for each observation

• A discriminant Z score is calculated for each discriminant
function for every observation in the sample. The discriminant
score acts as a concise and simple representation of each
discriminant function.
• Groups can be distinguished by their discriminant scores; as we
will see, the discriminant scores can play an instrumental role in
predicting group membership.

10
Assessing overall model fit
2. Evaluating group differences on the discriminant z scores
• Once the discriminant Z scores are calculated, the first assessment of overall
model fit is to determine the magnitude of differences between the members
of each group in terms of the discriminant Z scores.
• A summary measure of the group differences is a comparison of the group
centroids. If the assumptions hold, each group will have an approximately
normal distribution of discriminant Z scores. The degree of overlap between
the discriminant score distributions can then be used as a measure of the
success of the technique: the less the overlap, the better the discrimination.

11
Assessing overall model fit
2. Evaluating group differences on the discriminant z scores
• The difference between centroids is measured in terms of
the Mahalanobis D² measure, a distance that takes the
variances and covariances of the independent variables into
account. Applied to a single case, it measures how much the
case's values on the independent variables differ from the
average of all cases; a large Mahalanobis distance
identifies a case as having extreme values on one or more
of the independent variables.
• Another test is based on the likelihood ratio, known as
Wilks’ lambda. It is the ratio of the determinant of the
within-group sums-of-squares-and-cross-products matrix to
the determinant of the total sums-of-squares-and-cross-products
matrix.
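
Both measures can be sketched for two groups and two predictors. The sketch assumes the usual definitions: D² between centroids uses the pooled within-group covariance matrix, and Wilks' lambda is the determinant of the within-group sums-of-squares-and-cross-products (SSCP) matrix divided by the determinant of the total SSCP matrix. The data and helper names are hypothetical.

```python
def mean_vector(rows):
    """Column means of a list of (x1, x2) observations."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]


def sscp(rows, center):
    """Sums of squares and cross-products of rows around a given center."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - center[0], r[1] - center[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


# Hypothetical data: two groups measured on two predictors.
group_a = [(2, 4), (3, 6), (4, 5), (5, 7)]
group_b = [(5, 3), (6, 5), (7, 4), (8, 6)]
everyone = group_a + group_b

wa = sscp(group_a, mean_vector(group_a))
wb = sscp(group_b, mean_vector(group_b))
W = [[wa[i][j] + wb[i][j] for j in range(2)] for i in range(2)]  # within SSCP
T = sscp(everyone, mean_vector(everyone))                        # total SSCP

wilks_lambda = det2(W) / det2(T)

# Pooled within-group covariance: within SSCP / (N - number of groups).
dof = len(everyone) - 2
S = [[W[i][j] / dof for j in range(2)] for i in range(2)]
ma, mb = mean_vector(group_a), mean_vector(group_b)
d = [ma[0] - mb[0], ma[1] - mb[1]]
# d' S^-1 d written out for a symmetric 2x2 matrix.
d_squared = (S[1][1] * d[0] ** 2 - 2 * S[0][1] * d[0] * d[1]
             + S[0][0] * d[1] ** 2) / det2(S)

print(wilks_lambda, d_squared)
```

A lambda close to 0 together with a large D² points to well-separated groups on this hypothetical data.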

12
Assessing overall model fit
3. Assessing group membership prediction accuracy
• To determine the predictive ability of a discriminant function, the researcher
must construct classification matrices.
• The classification matrix procedure provides a perspective on practical
significance. With multiple discriminant analysis, the percentage correctly
classified, also termed the hit ratio, reveals how well the discriminant
function classified the objects.

13
Assessing overall model fit
3. Assessing group membership prediction accuracy
• Classifying Individual observations
• The basic formula for computing the optimal cutting score
between any two groups is:
Z_CS = (N_A Z_B + N_B Z_A) / (N_A + N_B)

where
Z_CS = cutting score between groups A and B
N_A = number of observations in group A
N_B = number of observations in group B
Z_A = centroid for group A
Z_B = centroid for group B
14
Assessing overall model fit
3. Assessing group membership prediction accuracy
• Classifying Individual observations
• If the groups are specified to be of equal size, then the optimum
cutting score will be halfway between the two group centroids
and becomes simply the average of the two centroids:

Z_CS = (Z_B + Z_A) / 2
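
The two cutting-score formulas can be checked with a short sketch. Group sizes and centroids are hypothetical, and the `classify` helper is an assumption added for illustration: it assigns an object to the group whose centroid lies on the same side of the cutting score.

```python
def cutting_score(n_a, n_b, z_a, z_b):
    """Weighted cutting score: (N_A*Z_B + N_B*Z_A) / (N_A + N_B)."""
    return (n_a * z_b + n_b * z_a) / (n_a + n_b)


def classify(z, cut, z_a, z_b):
    """Assign score z to group 'A' or 'B' depending on its side of the cut."""
    low, high = ("A", "B") if z_a < z_b else ("B", "A")
    return low if z < cut else high


# Hypothetical centroids and group sizes.
z_a, z_b = -1.0, 1.5

unequal_cut = cutting_score(60, 40, z_a, z_b)  # groups of different size
equal_cut = cutting_score(50, 50, z_a, z_b)    # reduces to (Z_A + Z_B) / 2

print(unequal_cut, equal_cut)
print(classify(0.3, unequal_cut, z_a, z_b), classify(0.3, equal_cut, z_a, z_b))
```

In this example the same object (score 0.3) is assigned to group A under the weighted cutting score and to group B under the equal-size one, so the choice of formula can change the classification.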

15
The output of discriminant analysis consists of the following statistics:

1. Eigenvalue. For each discriminant function, the eigenvalue is the
ratio of between-group to within-group sums of squares. Eigenvalues
describe how much discriminating ability a function possesses: the
larger the eigenvalue, the better the function discriminates.
2. Canonical correlation. Canonical correlation measures the extent of
association between the discriminant scores and the groups. It is a
measure of association between the single discriminant function
and the set of dummy variables that define the group membership.
Its square can be interpreted as the proportion of variance in the
discriminant function scores that can be explained by group differences.
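
The eigenvalue and the canonical correlation are linked by standard relations that the slides do not state, so they are labeled here as assumptions: the canonical correlation is sqrt(eigenvalue / (1 + eigenvalue)), and the overall Wilks' lambda is the product of 1 / (1 + eigenvalue) over the functions. The eigenvalues below are hypothetical.

```python
import math


def canonical_correlation(eigenvalue):
    """Canonical correlation implied by a discriminant function's eigenvalue."""
    return math.sqrt(eigenvalue / (1.0 + eigenvalue))


def wilks_lambda(eigenvalues):
    """Overall Wilks' lambda as the product of 1 / (1 + eigenvalue)."""
    result = 1.0
    for ev in eigenvalues:
        result *= 1.0 / (1.0 + ev)
    return result


# Hypothetical eigenvalues for two discriminant functions.
eigenvalues = [3.0, 0.25]

correlations = [canonical_correlation(ev) for ev in eigenvalues]
explained = [r ** 2 for r in correlations]  # share of score variance per function
overall_lambda = wilks_lambda(eigenvalues)
print(correlations, explained, overall_lambda)
```

The squared canonical correlation of the first function is 0.75 here, meaning group differences account for three quarters of the variance in its scores.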
16
The output of discriminant analysis consists of the following statistics:
3. Centroid. The centroid is the mean value of the discriminant scores for
a particular group. There are as many centroids as there are groups, as
there is one for each group. The means for a group on all the functions
are the group centroids.
4. Classification matrix. Sometimes also called confusion or prediction
matrix, the classification matrix contains the number of correctly
classified and misclassified cases. In case of no misclassification, all
diagonals are non-zero and all off-diagonals are zero.
5. Discriminant function coefficients. The discriminant function coefficients
(unstandardised) are the multipliers of variables, when the variables are
in the original units of measurement.
6. Discriminant scores. The unstandardised coefficients are multiplied by
the values of the variables. These products are summed and added to
the constant term to obtain the discriminant scores
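
A classification matrix can be tallied from actual and predicted group labels as in the sketch below. The labels are hypothetical, and the nested-dictionary layout (rows are actual groups, columns are predicted groups) is an assumption made for illustration.

```python
def classification_matrix(actual, predicted, groups):
    """Count cases by actual group (row) and predicted group (column)."""
    matrix = {a: {p: 0 for p in groups} for a in groups}
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix


# Hypothetical actual and predicted memberships for five cases.
actual = ["No", "No", "No", "Yes", "Yes"]
predicted = ["No", "Yes", "No", "Yes", "No"]

matrix = classification_matrix(actual, predicted, ["No", "Yes"])
misclassified = sum(matrix[a][p] for a in matrix for p in matrix[a] if a != p)
print(matrix, misclassified)
```

The off-diagonal counts are the misclassified cases; with perfect classification they would all be zero.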

17
The output of discriminant analysis consists of the following statistics:
7. Pooled within-group correlation matrix. The pooled within-group correlation
matrix is computed by averaging the separate covariance matrices for all the
groups.
8. Box’s M test. It checks the assumption of homogeneity of covariance matrices.
The null hypothesis for this test is that the covariance matrices of the
predictor variables are equal across groups.
9. Standardised discriminant function coefficients. These are the discriminant
function coefficients used as multipliers when the variables have been
standardised to a mean of 0 and a variance of 1.
10. Structure Matrix: this gives the correlation of predictor variables with the
discriminant function.
11. Wilks’ lambda. Wilks’ λ for each predictor is the ratio of the within-group sum
of squares to the total sum of squares. It is defined as the proportion of the
total variance in the discriminant score not explained by differences among the
groups. Its value varies between 0 and 1. Large values of λ (near 1) indicate that
group means do not seem to be different. Small values of λ (near 0) indicate
that the group means seem to be different.
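
The per-predictor Wilks' λ described in item 11 can be computed directly from its definition. The sketch below uses hypothetical values for two predictors, each split into two groups.

```python
def univariate_wilks_lambda(groups):
    """Within-group SS divided by total SS for one predictor.

    groups: one list of values per group.
    """
    values = [v for g in groups for v in g]
    grand = sum(values) / len(values)
    total = sum((v - grand) ** 2 for v in values)
    within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    return within / total


# Hypothetical predictors, each given as [group A values, group B values].
x1 = [[2, 3, 4, 5], [5, 6, 7, 8]]
x2 = [[4, 6, 5, 7], [3, 5, 4, 6]]

lambda_x1 = univariate_wilks_lambda(x1)
lambda_x2 = univariate_wilks_lambda(x2)
print(lambda_x1, lambda_x2)
```

Here the first predictor has the smaller λ, so its group means differ more clearly than those of the second predictor.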

18
Validation of Results

• Validation is a critical step in any discriminant analysis because many
times, especially with smaller samples, the results can lack generalizability
(external validity).
• The most common approach for establishing external validity is the
assessment of hit ratios. Most often the validation of the hit ratios is
performed by creating a holdout sample, also referred to as the validation
sample.
• The purpose of utilizing a holdout sample for validation purposes is to see
how well the discriminant function works on a sample of observations not
used to derive the discriminant function.
• This process involves developing a discriminant function with the analysis
sample and then applying it to the holdout sample.
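
The holdout procedure can be sketched as follows. To keep the example short, a deliberately simplified rule stands in for the discriminant function: a single score per case and a midpoint cutting score between the two analysis-sample centroids. The data, the rule and the function names are all assumptions for illustration.

```python
def fit_cutting_rule(values, labels):
    """Derive centroids and a midpoint cutting score from the analysis sample."""
    groups = sorted(set(labels))
    if len(groups) != 2:
        raise ValueError("this sketch handles exactly two groups")
    centroids = {
        g: sum(v for v, l in zip(values, labels) if l == g) / labels.count(g)
        for g in groups
    }
    cut = (centroids[groups[0]] + centroids[groups[1]]) / 2
    low, high = sorted(groups, key=lambda g: centroids[g])
    return lambda v: low if v < cut else high


def hit_ratio(actual, predicted):
    """Share of cases whose predicted group matches the actual group."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical (score, group) pairs, already split into the two samples.
analysis = [(1.0, "No"), (2.0, "No"), (3.0, "No"),
            (6.0, "Yes"), (7.0, "Yes"), (8.0, "Yes")]
holdout = [(2.5, "No"), (5.0, "No"), (6.5, "Yes"), (4.0, "Yes")]

rule = fit_cutting_rule([v for v, _ in analysis], [g for _, g in analysis])

analysis_hit = hit_ratio([g for _, g in analysis], [rule(v) for v, _ in analysis])
holdout_hit = hit_ratio([g for _, g in holdout], [rule(v) for v, _ in holdout])
print(analysis_hit, holdout_hit)
```

In this made-up example the rule classifies the analysis sample perfectly but only half of the holdout sample, which is the kind of gap that validation is meant to expose.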
19
Calculate the hit ratio

                    Predicted Group Membership
Original (Count)    No      Yes     Total
No                  384     133     517
Yes                  48     135     183
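
Using the counts in the classification table above, the hit ratio is the number of correctly classified cases (the diagonal) divided by the total number of cases. The dictionary layout is an assumption made for illustration; the counts are those given in the table.

```python
# Rows are original groups, columns are predicted groups (counts from the table).
matrix = {
    "No": {"No": 384, "Yes": 133},
    "Yes": {"No": 48, "Yes": 135},
}

correct = sum(matrix[g][g] for g in matrix)
total = sum(sum(row.values()) for row in matrix.values())
hit_ratio = correct / total

print(f"{correct}/{total} = {hit_ratio:.1%}")  # prints "519/700 = 74.1%"
```

That is, (384 + 135) / 700, or about 74.1% of cases correctly classified.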
