introduction
This paper focuses on using Naïve Bayes, one of the Data
Mining algorithms shipped in the box with Analytic
Services, to develop a model that solves a typical business
problem in the admissions department of an academic
university, referred to in this paper as ABC University. The
paper details the approach taken by the user to solve
the problem and explains the steps performed
using Analytic Services in general, and the Analytic Services
Data Mining Framework in particular, to arrive at the
solution.
problem statement
One of the problems related to managing admissions that
typical universities face is to be able to predict with reasonable
hyperion.com
white paper
acceptance information from the previous year's admissions
process. The problem at hand is to use all the available data
to predict whether an applicant will choose to enroll.
ABC University is also interested in analyzing the
composite factors that influence the enrollment decision. This
additional analysis is useful for adjusting the university's
admissions policy and for ensuring effective cost
management in the admissions department.
available data
The admissions department currently gathers
demographic, geographic, test score, and financial information,
among other data, from applicants as part of the admissions process.
Historical data is also available that indicates the actual
enrollment status of applicants, along with all the other
attributes collected as part of the admissions process.
The dataset made available has 33 attributes for
each applicant, including the decision result attribute. About
11,000 records are available in all.
State Name: VT, CA, MA, MI, NH, NJ
Measure Group
Explanation
Measures related to information about the applicant's identity were organized into this
group. Some of these measures were transformed from string type to number type
to facilitate modeling them within the Analytic Services database context.
Measures related to various test scores and high school examination results were
organized into this group.
Measures related to the context of the applicant's application processing were
organized together into this group.
Measures providing information about the financial support and funding associated
with the applicant were organized into this group.
Categorical Type: FARecieved, AppStatus, Applicant Type
Numerical Type: StudBudget, TotalAward
Because this problem can be treated as a
classification problem and historical
information is available, one algorithm suitable for
the analysis is the Naïve Bayes classification algorithm. We
chose Naïve Bayes to model this particular business
problem.
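For reference, the textbook Naïve Bayes decision rule can be written as below. The notation is introduced here for illustration and does not come from the paper: c is a candidate class (enroll or not enroll), x_1 to x_n are the applicant's predictor attributes, and the rule assumes the attributes are conditionally independent given the class. How Analytic Services estimates each term internally is not described in the paper.

```latex
\hat{c} \;=\; \arg\max_{c \,\in\, \{\text{enroll},\ \text{not enroll}\}}
\; P(c) \prod_{i=1}^{n} P(x_i \mid c)
```

The prior P(c) and each conditional P(x_i | c) are estimated from the historical admissions records, which is why the availability of past enrollment outcomes makes this algorithm applicable.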
The Naïve Bayes algorithm has two predictor accessors,
Numerical Predictor and Categorical Predictor, and one
target accessor. Figure 5 shows the domains that need
to be defined for the accessors. Table 5 shows the values that
were used for the case being discussed. All the information
provided during this stage of model building is preserved in a
template file so that it can be reused if
necessary.
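To illustrate why numerical and categorical predictors are supplied separately, here is a minimal sketch of a Naïve Bayes classifier that treats the two kinds of input differently. It is not the Analytic Services implementation: the class name NaiveBayesSketch, the Gaussian likelihood for numerical attributes, the add-one smoothing for categorical attributes, and the sample records are all assumptions made for this example. Only the attribute names FARecieved and TotalAward come from the paper.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesSketch:
    """Toy Naive Bayes with categorical and numerical predictors (illustrative only)."""

    def fit(self, rows, targets, categorical, numerical):
        self.categorical, self.numerical = categorical, numerical
        self.class_counts = Counter(targets)
        self.total = len(targets)
        # cat_counts[(attribute, class)] -> Counter of values seen for that class
        self.cat_counts = defaultdict(Counter)
        self.cat_values = defaultdict(set)
        num_values = defaultdict(list)
        for row, cls in zip(rows, targets):
            for attr in categorical:
                self.cat_counts[(attr, cls)][row[attr]] += 1
                self.cat_values[attr].add(row[attr])
            for attr in numerical:
                num_values[(attr, cls)].append(row[attr])
        # Assumption: numerical attributes are modeled as Gaussian within each class.
        self.num_stats = {}
        for key, values in num_values.items():
            mean = sum(values) / len(values)
            var = sum((v - mean) ** 2 for v in values) / len(values)
            self.num_stats[key] = (mean, max(var, 1e-9))  # floor avoids zero variance
        return self

    def log_scores(self, row):
        """Return log P(class) + sum of log P(attribute | class) for each class."""
        scores = {}
        for cls, count in self.class_counts.items():
            score = math.log(count / self.total)
            for attr in self.categorical:
                seen = self.cat_counts[(attr, cls)][row[attr]]
                distinct = len(self.cat_values[attr])
                # Add-one smoothing keeps unseen values from zeroing the product.
                score += math.log((seen + 1) / (count + distinct))
            for attr in self.numerical:
                mean, var = self.num_stats[(attr, cls)]
                score += -0.5 * math.log(2 * math.pi * var)
                score -= (row[attr] - mean) ** 2 / (2 * var)
            scores[cls] = score
        return scores

    def predict(self, row):
        scores = self.log_scores(row)
        return max(scores, key=scores.get)


# Made-up records for illustration; these are not the paper's data.
rows = [
    {"FARecieved": "Y", "TotalAward": 12000.0},
    {"FARecieved": "Y", "TotalAward": 15000.0},
    {"FARecieved": "Y", "TotalAward": 11000.0},
    {"FARecieved": "N", "TotalAward": 0.0},
    {"FARecieved": "N", "TotalAward": 1000.0},
    {"FARecieved": "Y", "TotalAward": 2000.0},
]
targets = ["enroll", "enroll", "enroll", "no", "no", "no"]

model = NaiveBayesSketch().fit(rows, targets, ["FARecieved"], ["TotalAward"])
high_award = model.predict({"FARecieved": "Y", "TotalAward": 13000.0})
no_award = model.predict({"FARecieved": "N", "TotalAward": 500.0})
print(high_award, no_award)
```

Working in log space avoids numerical underflow when many attribute likelihoods are multiplied, which matters once the attribute count approaches the 33 used in the case study.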
Table 5: Setting up accessors for the build mode while using Naive Bayes algorithm
Once the process is completed, the results of the test
(the name of which was specified in the last step of the Data
Mining Wizard) appear against the Model Results node. Figure 7
shows the node in the Administration Services Console
Enterprise View pane where the Mining Results node is
visible.
The model can be queried within the Administration
Services Console interface to obtain a list of the model
accessors by using the Query Result functionality. Invoking
Show Result for the Test accessor indicates the result of
the test. Figure 8 below shows the list of model accessors in the
result set of a model based on the Naïve Bayes algorithm used
in the test mode.
If the Test accessor has a value of 1.0, the test is deemed
successful and the model is declared good, or valid, for
prediction. Figure 9 shows the result of the test for the case being
discussed in this paper.
At this stage we have:
Built a Data Mining model using the Naïve Bayes
algorithm
Verified the model as valid with 95% confidence
Table 6: Setting up accessors for the apply mode while using Naive Bayes algorithm
means additional promotional expenditure in trying to follow
up on an applicant who will eventually not enroll. The
importance of each should be analyzed in the context of the
business and the model needs to be rebuilt if necessary with a
different training set (historical data) or with a different set of
attributes.
Figure 10 below shows the confusion matrix constructed
from the data set that was analyzed as part of this case study.
The confusion matrix shows that the model
predicted that 1550 (1478 + 72) students would enroll. Of those,
1478 actually enrolled and 72 did not, which means
there were 72 false positives. Similarly, the model
predicted that 9805 (9356 + 449) students would not enroll. Of
those, 9356 actually did not enroll, whereas 449
did enroll, which means there were 449 false negatives.
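The arithmetic behind these counts can be checked with a short sketch. The four counts are taken from the confusion matrix described above; the summary measures (accuracy, precision, recall) are standard definitions added here for illustration and are not reported in the paper.

```python
# Counts from the confusion matrix described in the text.
true_positives = 1478   # predicted enroll, actually enrolled
false_positives = 72    # predicted enroll, did not enroll
true_negatives = 9356   # predicted not enroll, did not enroll
false_negatives = 449   # predicted not enroll, actually enrolled

predicted_enroll = true_positives + false_positives
predicted_not_enroll = true_negatives + false_negatives
total = predicted_enroll + predicted_not_enroll

# Standard summary measures (not part of the original paper).
accuracy = (true_positives + true_negatives) / total
precision = true_positives / predicted_enroll
recall = true_positives / (true_positives + false_negatives)

print(f"predicted enroll:     {predicted_enroll}")
print(f"predicted not enroll: {predicted_not_enroll}")
print(f"total records:        {total}")
print(f"accuracy:  {accuracy:.3f}")
print(f"precision: {precision:.3f}")
print(f"recall:    {recall:.3f}")
```

The total of 11355 records is in line with the roughly 11,000 records in the dataset. Recall comes out lower than precision because the 449 false negatives outnumber the 72 false positives, so missed enrollees are the more frequent kind of error for this model.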
mapping
In some cases, when a model has been developed for one
context and needs to be used in another, the
Mapping functionality is useful. Through this functionality
the user can tell the Data Mining
Framework how to interpret the existing model accessors
in the new context in which the model is being deployed. More
information on using this functionality can be obtained from
the online help documentation.
additional functionality
The Analytic Services Data Mining Framework offers more
functionality that can be used when deploying models in real
business scenarios. Some of the further steps that can be
considered include:
transformations
The Data Mining Framework also offers the ability to apply a
transform to the input data just before it is presented to the
algorithm. Similarly, the output data can be transformed
before being written into the Analytic Services cube. The Data
Mining Framework offers a basic list of transformations (exp,
log, pow, scale, shift, linear) that can be used through the Data
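As a rough illustration of what transformations with these names typically do, the sketch below applies each one to a few values. The function names, parameters, and sample values are hypothetical: the paper does not give the framework's actual signatures or semantics, so scale is assumed to multiply by a factor, shift to add an offset, and linear to apply a slope and an intercept.

```python
import math


# Hypothetical parameterizations; the framework's real signatures are not in the paper.
def exp_t(x):
    return math.exp(x)


def log_t(x):
    return math.log(x)  # natural log; requires x > 0


def pow_t(x, exponent):
    return x ** exponent


def scale_t(x, factor):
    return x * factor


def shift_t(x, offset):
    return x + offset


def linear_t(x, slope, intercept):
    return slope * x + intercept


# Made-up award amounts, transformed before being handed to an algorithm.
awards = [1000.0, 12000.0, 15000.0]
scaled = [scale_t(a, 0.001) for a in awards]      # express in thousands
logged = [log_t(a) for a in awards]               # compress a wide range
restored = [exp_t(v) for v in logged]             # exp undoes log on the output side

print(scaled)
print(logged)
print(restored)
```

Pairing an input transform with its inverse on the output side, as with log and exp here, is one way results could be written back to the cube in the original units.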
summary
suggested reading
Copyright 2005 Hyperion Solutions Corporation. All rights reserved. Hyperion, the Hyperion H logo, and Hyperion's product names are trademarks of Hyperion. References to
other companies and their products use trademarks owned by the respective companies and are for reference purposes only. 5164_0805