Discriminant analysis (DA) is a multivariate technique used to
classify cases into distinct groups. It separates distinct sets of objects (or
observations) and allocates new objects (or observations) to previously
defined groups. Discriminant analysis is concerned with the problem of
classification, which arises when a researcher, having made a number of
measurements on an individual, wishes to classify the individual into one of
several categories on the basis of these multivariate measurements
(Onyeagu, 2003).
Discriminant analysis will help us analyze the differences between
groups and provide us with a means to assign or classify any case into the
groups which it most closely resembles.
There are two aspects of discriminant analysis:
1. Predictive Discriminant Analysis (PDA), or classification, which is
concerned with classifying objects into one of several groups; and
2. Descriptive Discriminant Analysis (DDA), which focuses on
revealing major differences among the groups (Stevens, 1996).

According to Huberty (1994), descriptive discriminant analysis includes the
collection of techniques involving two or more criterion variables and a set
of one or more grouping variables, each with two or more levels. “Whereas
in predictive discriminant analysis (PDA) the multiple response variables
play the role of predictor variables, in descriptive discriminant analysis
(DDA) they are viewed as outcome variables and the grouping variable(s) as
the explanatory variable(s). That is, the roles of the two types of variables
involved in a multivariate multigroup setting in DDA are reversed from the
roles in PDA.”

When a large number of variables are available for group separation, a
researcher may wish to discard variables that are redundant in the
presence of the other variables. In discriminant analysis, the variables
(say, the y’s) are selected and the basic model does not change, unlike
regression, where the independent variables are selected and the model is
consequently altered.
Stepwise selection is a combination of the forward and backward
variable selection methods. In forward selection, the variable entered at
each step is the one that maximizes the partial F-statistic based on Wilks’
Λ. This yields the maximal additional separation of groups above and beyond
the separation already attained by the other variables. Because the
largest partial F is chosen at each step, it does not follow an F
distribution, and the proportion of these F’s that exceed Fα is greater
than α. In backward selection (elimination), the variable that contributes
least, as shown by the partial F, is deleted at each step.
In stepwise selection, variables are selected one at a time, and at each
step the variables already entered are re-examined to see whether any of
them has become redundant in the presence of the recently added variables.
The procedure stops when the largest partial F among the variables
available for entry fails to exceed a preset threshold value.
Stepwise discriminant analysis is a form of discriminant analysis in which
no discriminant functions are calculated during the selection process.
After the subset selection is complete, however, discriminant functions
can be calculated for the selected variables. These variables can also be
used in the construction of classification functions.
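The forward part of the selection described above can be sketched as follows. This is a simplified illustration, not the procedure of any particular package: it assumes Wilks' Λ is computed as |W|/|T| from the within-group and total sums-of-squares-and-cross-products matrices, it performs forward entry only (no re-examination or removal step), and the names `wilks_lambda`, `forward_select` and the threshold `f_to_enter` are hypothetical.

```python
import numpy as np

def wilks_lambda(X, y, cols):
    """Wilks' lambda |W| / |T| for the variables in `cols` (1.0 if none)."""
    if not cols:
        return 1.0
    Z = X[:, cols]
    dev = Z - Z.mean(axis=0)
    T = dev.T @ dev                      # total SSCP matrix
    W = np.zeros_like(T)
    for g in np.unique(y):
        Zg = Z[y == g]
        dg = Zg - Zg.mean(axis=0)
        W += dg.T @ dg                   # within-group SSCP matrix
    return np.linalg.det(W) / np.linalg.det(T)

def forward_select(X, y, f_to_enter=4.0):
    """Forward entry only; f_to_enter is an assumed, user-chosen threshold."""
    n, p = X.shape
    k = len(np.unique(y))                # number of groups
    selected, remaining = [], list(range(p))
    while remaining:
        lam_in = wilks_lambda(X, y, selected)
        q = len(selected)                # variables already in the model
        best_f, best_j = -np.inf, None
        for j in remaining:
            # Partial lambda of variable j given the variables already entered
            partial = wilks_lambda(X, y, selected + [j]) / lam_in
            f = (1.0 - partial) / partial * (n - k - q) / (k - 1)
            if f > best_f:
                best_f, best_j = f, j
        # Stop when the largest partial F fails to exceed the threshold
        if best_f <= f_to_enter:
            break
        selected.append(best_j)
        remaining.remove(best_j)
    return selected
```

A full stepwise routine would additionally recompute the partial F of every variable already entered after each entry and drop any that falls below a removal threshold.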

1. Construct the discriminant function.
2. Evaluate the discriminant function for population one (1) by
substituting the mean values of X1, X2, …, Xp into
Y = L1X1 + L2X2 + … + LpXp, and label the value obtained Y1.
3. Repeat step 2 for population two (2) and label the value obtained Y2.
4. Since one is usually greater than the other, assume Y2 > Y1.
5. Compute the critical value, YC = (Y1 + Y2)/2.
6. Then state the discriminating procedure as: assign the new individual
to population one (1) if Y < YC and to population two (2) if Y > YC.
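The six steps above can be sketched in code. The text does not say how the coefficients L1, …, Lp are obtained, so this sketch assumes Fisher's linear discriminant coefficients (the inverse of the pooled covariance matrix times the difference in group means), which also guarantees Y2 > Y1. The function names are hypothetical.

```python
import numpy as np

def fit_two_group_rule(X1, X2):
    """Return coefficients L and critical value YC for two samples."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    n1, n2 = len(X1), len(X2)
    # Pooled within-group covariance matrix
    S1 = np.atleast_2d(np.cov(X1, rowvar=False))
    S2 = np.atleast_2d(np.cov(X2, rowvar=False))
    S = ((n1 - 1) * S1 + (n2 - 1) * S2) / (n1 + n2 - 2)
    # Step 1: ASSUMPTION, Fisher's coefficients L = S^{-1} (m2 - m1)
    L = np.linalg.solve(S, m2 - m1)
    # Steps 2-3: evaluate the function at each group's mean vector
    Y1, Y2 = L @ m1, L @ m2
    # Step 5: critical value halfway between Y1 and Y2
    YC = (Y1 + Y2) / 2.0
    return L, YC

def classify(x, L, YC):
    """Step 6: population 1 if Y < YC, otherwise population 2."""
    return 1 if L @ x < YC else 2
```

With this choice of coefficients, Y2 − Y1 equals the squared Mahalanobis distance between the two group means, so the assumption in step 4 holds automatically.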

Johnson and Wichern (1992) defined two goals of discriminant
analysis as:
1. To describe, either graphically (in at most three dimensions) or
algebraically, the differential features of objects (or observations)
from several known collections (populations). We try to find
discriminants such that the collections are separated as much as
possible.
2. To sort objects (observations) into two or more labeled classes.
The emphasis is on deriving a rule that can be used to optimally
assign a new object to the labeled classes.
Johnson and Wichern (1992) used the term discrimination to refer to goal 1
and classification or allocation to refer to goal 2.

The goals of discriminant analysis include identifying the relative
contribution of the p variables to separation of the groups and finding the
optimal plane on which the points can be projected to illustrate the
configuration of the groups.

1. A geologist might wish to classify fossils into their respective
fossil groups on the basis of measurements of the sizes,
shapes and ages of the fossils.
2. A doctor may intend to classify newborn babies into different
blood groups, based on measurements obtained from the
blood samples of the babies.
3. Students applying for admission into a university are given a Common
Entrance Examination (CEE); the vector of their scores in the
entrance examination is a set of measurements, X. The problem is to
classify a student on the basis of his or her scores on the entrance
examination.
4. An automobile engineer might decide to classify an automobile
engine into one of several categories of engine on the basis of
measurements of its power output, size and shape.
5. A nutritionist might classify food substances into categories of food
nutrient, such as carbohydrate, minerals, water, protein, fat and oil, and
vitamins, on the basis of measurements of the comparative amounts of
different nutrients in the food.
As we have seen in the examples above, individuals are assigned to
groups taking cognizance of data related to the groups.

This study is necessary for the following purposes:
1. To classify cases into groups using the stepwise
methodologies of discriminant analysis;
2. To identify and discard redundant variables or variables
which are little related to group distinction;
3. To compare the probabilities of misclassification and the hit ratios
obtained with discriminant analysis (all independent variables) to
those obtained with stepwise procedures.


1.7.1 Discriminant Function
This is a latent variable which is created as a linear combination of
discriminating variables, such that
Y = L1x1 + L2x2 + … + Lpxp
where the L’s are the discriminant coefficients and the x’s are the
discriminating variables.

1.7.2 The Eigenvalue
For each discriminant function, the eigenvalue is the ratio of the
between-group to the within-group sum of squares of the discriminant
scores, and so indicates the relative importance of the dimension that
classifies cases of the dependent variable. There is one eigenvalue
for each discriminant function. With more than one discriminant function,
the first eigenvalue will be the largest and the most important in explanatory
power, while the last eigenvalue will be the smallest and the least important
in explanatory power.
Relative importance is assessed by eigenvalues since they reflect the
percentage of variance explained in the dependent variable, cumulating to
100% for all functions. Eigenvalues are part of the default output in SPSS
(Analyze, Classify, Discriminant).
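As an illustration, the eigenvalues can be obtained numerically under the usual assumption that they are the eigenvalues of W⁻¹B, where W and B are the within-group and between-group sums-of-squares-and-cross-products matrices. The function name is hypothetical, and this is a sketch rather than a reproduction of the SPSS computation.

```python
import numpy as np

def discriminant_eigenvalues(X, y):
    """Eigenvalues of W^{-1} B, largest first, one per discriminant function."""
    groups = np.unique(y)
    p = X.shape[1]
    grand_mean = X.mean(axis=0)
    W = np.zeros((p, p))                 # within-group SSCP
    B = np.zeros((p, p))                 # between-group SSCP
    for g in groups:
        Xg = X[y == g]
        mg = Xg.mean(axis=0)
        W += (Xg - mg).T @ (Xg - mg)
        d = (mg - grand_mean)[:, None]
        B += len(Xg) * (d @ d.T)
    # ASSUMPTION: eigenvalues are taken from W^{-1} B
    eig = np.linalg.eigvals(np.linalg.solve(W, B)).real
    eig = np.sort(eig)[::-1]
    # At most min(p, number of groups - 1) functions exist
    return eig[: min(p, len(groups) - 1)]
```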


1.7.3 The Discriminant Score
This is the value obtained from applying a discriminant function
formula to the data for a given case. When the data are standardized, the
discriminant score is referred to as the Z score.

1.7.4 Cutoff
In a two-group discriminant analysis with equal group sizes, the cutoff
is the mean of the two group centroids. If the group sizes are unequal,
the cutoff is the weighted mean of the centroids. A case is classed as 0
if its discriminant score is less than or equal to the cutoff, and as 1 if
its score is above it.
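A minimal sketch of the cutoff rule follows. The text does not specify the weights, so the code assumes one weighting commonly given in textbooks, in which each centroid is weighted by the size of the other group; with equal group sizes this reduces to the simple mean. The function names and arguments are hypothetical.

```python
def cutoff(z0, z1, n0, n1):
    """Cutoff between centroids z0 and z1 for groups of size n0 and n1.

    ASSUMPTION: each centroid is weighted by the other group's size.
    When n0 == n1 this is the plain mean of the two centroids.
    """
    return (n0 * z1 + n1 * z0) / (n0 + n1)

def classify(score, cut):
    """Class 0 at or below the cutoff, class 1 above it.

    Assumes group 0 is the group with the lower centroid.
    """
    return 0 if score <= cut else 1
```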

1.7.5 The Relative Percentage
This is equal to the eigenvalue of a function divided by the sum of
the eigenvalues of all discriminant functions in the model. It is the percentage
of discriminating power for the model associated with a particular discriminant
function, and it tells us how many functions are important. The ratio of
eigenvalues indicates the relative discriminating power of the discriminant
functions.
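The calculation is a simple normalization, sketched below with a hypothetical function name. Any eigenvalues passed to it here are illustrative inputs, not results from this study.

```python
def relative_percentages(eigenvalues):
    """Percentage of discriminating power attributed to each function."""
    total = sum(eigenvalues)
    return [100.0 * ev / total for ev in eigenvalues]
```

For example, two functions with eigenvalues 3.0 and 1.0 would account for 75% and 25% of the discriminating power respectively.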


1.7.6 The Canonical Correlation, R*
This measures the association between the groups formed by the
dependent variable and the given discriminant function. A large canonical
correlation indicates a high correlation between the discriminant function
and the groups. An R* of 1.0 shows that all of the variability in the
discriminant scores can be accounted for by that dimension. The relative
percentage and R* do not have to be correlated. The canonical correlation,
R*, also shows how useful each function is in determining group
differences.
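The canonical correlation can be computed from a function's eigenvalue. The sketch below assumes the standard relation R* = sqrt(λ / (1 + λ)), which is not stated in the text; the function name is hypothetical.

```python
import math

def canonical_correlation(eigenvalue):
    """R* for one discriminant function, assuming R* = sqrt(lambda / (1 + lambda))."""
    return math.sqrt(eigenvalue / (1.0 + eigenvalue))
```

Under this relation R* approaches 1.0 only as the eigenvalue grows without bound, which helps explain why a function can have a high R* while contributing a small relative percentage, or the reverse.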

1.7.7 Mahalanobis Distances
This is the distance between a case and the centroid of each group
(of the dependent variable) in attribute space (an n-dimensional space
defined by n variables). A case has one Mahalanobis distance for each
group, and it is classified as belonging to the group for which its
Mahalanobis distance is smallest; that is, the closer a case is to a
group centroid, the smaller the Mahalanobis distance. Mahalanobis distance
is measured in terms of standard deviations from the centroid.
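Classification by smallest Mahalanobis distance can be sketched as follows. The code assumes that all groups share a common covariance matrix, estimated by the pooled within-group covariance, and it returns squared distances; the function name is hypothetical.

```python
import numpy as np

def mahalanobis_classify(x, X, y):
    """Return (nearest group, squared distance of x to each group centroid)."""
    groups = np.unique(y)
    p = X.shape[1]
    W = np.zeros((p, p))
    centroids = {}
    for g in groups:
        Xg = X[y == g]
        centroids[g] = Xg.mean(axis=0)
        W += (Xg - centroids[g]).T @ (Xg - centroids[g])
    # ASSUMPTION: common covariance, estimated by pooling within groups
    S = W / (len(X) - len(groups))
    d2 = {}
    for g, m in centroids.items():
        diff = x - m
        d2[g] = float(diff @ np.linalg.solve(S, diff))
    # The case is assigned to the group with the smallest distance
    return min(d2, key=d2.get), d2
```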

1.7.8 The Classification Table
This is a table in which the rows are the observed categories of the
dependent variable and the columns are the predicted categories. When
prediction is perfect, all cases lie on the diagonal.

1.7.9 Hit Ratio
This is the percentage of cases on the diagonal of the classification
table (confusion matrix), that is, the percentage of correct
classifications. The higher the hit ratio, the lower the misclassification
error rate; the lower the hit ratio, the higher the error rate.
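The classification table and the hit ratio can be computed as in the sketch below. It assumes groups are coded as integers 0, 1, …, and the function names are hypothetical.

```python
import numpy as np

def classification_table(observed, predicted, n_groups):
    """Rows are observed groups, columns are predicted groups."""
    table = np.zeros((n_groups, n_groups), dtype=int)
    for o, p in zip(observed, predicted):
        table[o, p] += 1
    return table

def hit_ratio(table):
    """Percentage of cases on the diagonal (correct classifications)."""
    return 100.0 * np.trace(table) / table.sum()
```

The misclassification error rate is then 100 minus the hit ratio.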

1.7.10 Tolerance
This is the proportion of the variation in an independent variable that
is not explained by the variables already in the model. Zero tolerance means
that the independent variable under consideration is a perfect linear
combination of the variables already in the model. A tolerance of 1 implies
that the predictor variable is completely independent of the other predictor
variables already in the model. Most computer packages set the minimum
tolerance at 0.01 as the default option.
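Tolerance can be illustrated as 1 − R² from a least-squares regression of the candidate variable on the variables already in the model. This sketch ignores the grouping structure, whereas statistical packages may compute tolerance from within-group quantities, so it is an assumption made for illustration; the function name is hypothetical.

```python
import numpy as np

def tolerance(candidate, in_model):
    """1 - R^2 of `candidate` regressed on the columns of `in_model`.

    ASSUMPTION: ordinary least squares on the pooled data, no grouping.
    """
    A = np.column_stack([np.ones(len(candidate)), in_model])
    coef, *_ = np.linalg.lstsq(A, candidate, rcond=None)
    resid = candidate - A @ coef
    ss_res = resid @ resid
    ss_tot = ((candidate - candidate.mean()) ** 2).sum()
    return ss_res / ss_tot
```

A candidate whose tolerance falls below the chosen minimum would be excluded from entry, since it adds almost no information beyond the variables already selected.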