2. Examples include the selection process for a job, the admission process
of an educational programme in a college, or dividing a group of people
into potential buyers and non-buyers.
7. The independent (x) variables are continuous scale variables, and are used
as predictors of the group to which the objects will belong. Therefore, to be
able to use discriminant analysis, we need to have some data on y and the x
variables from experience and/or past records.
Then, the classification of the existing data points is done using the equation,
and the accuracy of the model is determined. This output is given by the
classification matrix (also called the confusion matrix), which tells us what
percentage of the existing data points is correctly classified by this model.
This percentage is somewhat analogous to the R² in regression analysis
(percentage of variation in dependent variable explained by the model). Of
course, the actual predictive accuracy of the discriminant model may be less
than the figure obtained by applying it to the data points on which it was
based.
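As a rough illustration of how a classification matrix and the percentage correctly classified are obtained, the sketch below uses six made-up cases (not the SBI data). The function names `classification_matrix` and `hit_ratio` are illustrative assumptions, not part of any statistical package.

```python
from collections import Counter

def classification_matrix(actual, predicted):
    """Count cases by (actual group, predicted group)."""
    return Counter(zip(actual, predicted))

def hit_ratio(actual, predicted):
    """Percentage of cases whose predicted group equals the actual group."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return 100.0 * correct / len(actual)

# Hypothetical labels for six cases: 1 = low risk, 2 = high risk
actual = [1, 1, 1, 2, 2, 2]
predicted = [1, 1, 2, 2, 2, 2]

matrix = classification_matrix(actual, predicted)
print("Actual 1:", matrix[(1, 1)], "correct,", matrix[(1, 2)], "misclassified")
print("Actual 2:", matrix[(2, 2)], "correct,", matrix[(2, 1)], "misclassified")
print("Percent correctly classified:", round(hit_ratio(actual, predicted), 1))
```

Because the percentage is computed on the same cases used to build the model, it tends to be optimistic, which is the caveat made in the paragraph above.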
2. The coefficients of x1 and x2 are the ones which provide the answer, but
not the raw (unstandardised) coefficients. To overcome the problem of
different measurement units, we must obtain standardised discriminant
coefficients. These are available from the computer output.
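To see why raw coefficients can mislead, here is a minimal sketch of one common convention for standardising them: multiply each raw coefficient by the pooled within-group standard deviation of its variable. The data values, the raw coefficients and the function names are all hypothetical, chosen only to make the arithmetic easy to follow; a package's output should be taken as the reference.

```python
from math import sqrt

def pooled_within_sd(group1, group2):
    """Pooled within-group standard deviation of one predictor."""
    def sum_of_squares(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)
    dof = len(group1) + len(group2) - 2
    return sqrt((sum_of_squares(group1) + sum_of_squares(group2)) / dof)

def standardise(raw_coef, group1, group2):
    """Standardised coefficient = raw coefficient x pooled within-group SD."""
    return raw_coef * pooled_within_sd(group1, group2)

# Hypothetical predictor values, split by group, on very different scales
x1_low, x1_high = [30, 40, 50], [20, 30, 40]
x2_low, x2_high = [10, 12, 14], [6, 8, 10]

# Hypothetical raw (unstandardised) coefficients
raw_b1, raw_b2 = 0.05, 0.10

std_b1 = standardise(raw_b1, x1_low, x1_high)
std_b2 = standardise(raw_b2, x2_low, x2_high)
print(std_b1, std_b2)
```

In this made-up case x2 has the larger raw coefficient, but x1 has the larger standardised coefficient, so the ranking of the two predictors reverses once the measurement units are removed.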
Suppose SBB has managed to get from SBI, its sister bank, some data
on SBI's credit card holders who turned out to be 'low risk' (no
default) and 'high risk' (defaulting on payments) customers. These
data on 18 customers are given in fig. 1.
[Fig. 1: Input data on 18 SBI credit card holders, classified as low risk or high risk]
We will perform a discriminant analysis and advise SBB on how to set up
its system to screen potential good customers (low risk) from bad customers
(high risk). In particular, we will build a discriminant function (model) and
find out
- How to classify a new credit card applicant into one of the two
groups, 'low risk' or 'high risk', by building a decision rule and a
cut-off score.
Input Data are given in fig. 1.
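For readers who want to see what "building a discriminant function" involves, the following is a minimal sketch of a two-group, two-predictor linear discriminant in plain Python. It uses six made-up cases rather than the fig. 1 values, and the function names are illustrative assumptions. The weights are unscaled Fisher weights; statistical packages typically rescale them, so the scores will not match package output, but the sign of the score relative to the midpoint is what the classification uses.

```python
def mean_vector(rows):
    """Column means of a list of (x1, x2) pairs."""
    n = len(rows)
    return (sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n)

def pooled_covariance(group1, group2):
    """Pooled within-group covariance entries (s11, s12, s22)."""
    s11 = s12 = s22 = 0.0
    for rows in (group1, group2):
        m1, m2 = mean_vector(rows)
        for x1, x2 in rows:
            s11 += (x1 - m1) ** 2
            s12 += (x1 - m1) * (x2 - m2)
            s22 += (x2 - m2) ** 2
    dof = len(group1) + len(group2) - 2
    return s11 / dof, s12 / dof, s22 / dof

def fit_discriminant(group1, group2):
    """Return (constant, b1, b2) so that score = constant + b1*x1 + b2*x2."""
    a = mean_vector(group1)
    b = mean_vector(group2)
    s11, s12, s22 = pooled_covariance(group1, group2)
    det = s11 * s22 - s12 * s12
    d1, d2 = b[0] - a[0], b[1] - a[1]
    # Weights = inverse(pooled covariance) x (mean of group 2 - mean of group 1)
    b1 = (s22 * d1 - s12 * d2) / det
    b2 = (s11 * d2 - s12 * d1) / det
    # Centre the scores so the midpoint between the two group means scores 0
    constant = -(b1 * (a[0] + b[0]) + b2 * (a[1] + b[1])) / 2.0
    return constant, b1, b2

def score(model, x1, x2):
    """Discriminant score of one case."""
    constant, b1, b2 = model
    return constant + b1 * x1 + b2 * x2

# Hypothetical cases: (x1, x2) for each customer
low_risk = [(4, 2), (5, 4), (6, 3)]
high_risk = [(1, 5), (2, 7), (3, 6)]

model = fit_discriminant(low_risk, high_risk)
print(model)
print([score(model, *case) for case in low_risk])
print([score(model, *case) for case in high_risk])
```

With these made-up numbers every low risk case scores below 0 and every high risk case scores above 0, which mirrors the orientation used in the text.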
We will now find answers to all four questions we have raised earlier.
Q1. How good is the model? How many of the 18 data points does it classify
correctly?
To answer this question, we look at the computer output labelled fig. 3.
This is a part of the discriminant analysis output from any computer package
such as SPSS, SYSTAT, STATISTICA or SAS. There could be minor variations in
the exact numbers obtained, and major variations could occur if the options
chosen by the student are different. For example, if the a priori
probabilities chosen for classification into the two groups are equal, as we
have assumed while generating this output, then you will very likely see
similar numbers in your output.
As mentioned earlier, this level of accuracy may not hold for all future
classification of new cases. But it is still a pointer towards the model being a
good one, assuming the input data were relevant and scientifically collected.
There are ways of checking the validity of the model, but these will be
discussed separately.
[Figure: discriminant score axis. Mean of Group 1 (Low Risk) = -1.37; midpoint = 0; Mean of Group 2 (High Risk) = +1.37]
According to our decision rule, any discriminant score to the left of the
midpoint of 0 leads to a classification in the low risk group. Therefore,
we should give this person a credit card, as he is a low risk customer. The
same process is to be followed for any new applicant. If his discriminant
score is to the right of the midpoint of 0, he should be denied a credit
card, as he is a 'high risk' customer.
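The decision rule can be sketched in a few lines. The group means of -1.37 and +1.37 are taken from the figure above; the function names are illustrative, and the treatment of a score exactly equal to the cut-off (here assigned to high risk) is an assumption, since the text does not cover that case.

```python
def cutoff_score(mean_group1, mean_group2):
    """Midpoint between the two group means of the discriminant score."""
    return (mean_group1 + mean_group2) / 2.0

def classify(score, cutoff=0.0):
    """Scores left of the cut-off are low risk, the rest high risk.

    Assumption: a score exactly on the cut-off is treated as high risk.
    """
    return "low risk" if score < cutoff else "high risk"

cutoff = cutoff_score(-1.37, 1.37)

# Hypothetical discriminant scores for two new applicants
print(classify(-0.5, cutoff))
print(classify(0.8, cutoff))
```

Using the midpoint as the cut-off relies on the two groups being treated as equally likely beforehand, as assumed when the output was generated.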