Data Mining

white paper
using analytic services data mining framework for classification
predicting the enrollment of students at a university: a case study
Data Mining is the process of knowledge discovery involving finding
hidden patterns and associations, constructing analytical models,
performing classification and prediction, and presenting mining results. Data
Mining is one of the functional groups offered with Hyperion System
9 BI+ Analytic Services, a highly scalable, enterprise-class architecture
analytic server (OLAP). The Data Mining Framework within Analytic Services
integrates data mining functions with OLAP and provides the users with
highly flexible and extensible on-line analytical mining capabilities. On-line
analytical mining greatly enhances the power of exploratory data analysis by
providing users with the facilities for data mining on different subsets of data
at different levels of abstraction in combination with the core analytic services
like drill up, drill down, pivoting, filtering, slicing and dicing all performed
on the same OLAP data source.
introduction
This paper focuses on using Naive Bayes, one of the Data
Mining algorithms shipped in the box with Analytic
Services, to develop a model that solves a typical business
problem in the admissions department of an academic
university, referred to in this paper as ABC University. The
paper details the approach taken by the user to solve
the problem and explains the steps performed
using Analytic Services in general, and the Analytic Services
Data Mining Framework in particular, to arrive at the
solution.
problem statement
One of the problems that universities typically face in managing
admissions is predicting, with reasonable
accuracy, the likelihood that an applicant will eventually
enroll in an academic program. Universities typically incur a
considerable expense in promoting their programs and in
following up with prospective candidates. Identifying
applicants with a higher likelihood of enrollment into the
program will help the university channel the promotional
expenditure in a more gainful way. The candidates typically
apply to more than one university to widen their chances of
getting enrolled within that academic year. Universities that
can quickly arrive at a decision on the applicant stand a higher
chance of getting acceptance from candidates.
ABC University collects from applicants a variety of data as
part of the admissions process: demographic, geographic, test
scores, financial information, etc. In addition to that, the
admissions department at the ABC University also has
acceptance information from the previous years' admissions
process. The problem at hand is to use all this available data
and predict whether an applicant will choose to enroll or not.
The ABC University is also interested in analyzing the
composite factors influencing the enrollment decision. This
additional analysis is useful in adjusting the admissions policy
at the university and also in ensuring effective cost
management in the admissions department.
available data
The admissions department is currently gathering
demographic, geographic, test scores, financial information,
etc., from applicants as part of the admissions process. There
is also historical data available indicating the actual
enrollment status of applicants along with all the other
attributes that were collected as part of the admission process.
The dataset made available has 33 different attributes for
each applicant, including the decision result attribute. In
all, about 11,000 records are available.
Table 1: List of potential mining attributes available in database

preparing for data mining

cube is the data source
The algorithms in the Data Mining Framework are designed to
work on data present within an Analytic Services cube. The
design of the cube should take into consideration the data
needs for all kinds of analyses (OLAP and Data Mining) that
the user is interested in performing. Once the data is brought
into the cube environment it can then be accessed through the
Data Mining Framework for predictive analytics.
The Data Mining Framework uses MDX expressions to
identify sections within the cube to obtain input data for the
algorithm as well as to write back the results. The Data Mining
Framework can only take regular dimension members as
mining attributes. What this implies is that only data that is
referenced through regular dimension members (not through
attribute dimensions or user defined attributes) can be
presented as input data to the Data Mining Framework.
Accordingly, the data that is required for predictive analytics
should be modeled within the standard dimensions and
measures within a cube.
In the case study being discussed in this paper, the primary
business requirement was to build a classification model for
prediction. Since there were no other accompanying business
requirements, the design of the Analytic Services cube was
primarily driven by the Data Mining analytics need. For
example, we have not used any attribute dimension modeling
in the case study. However, in the generic case it is more likely
that the cube caters to both regular OLAP analytics and
predictive analytics within the same dimensional model.
preparing mining attributes

The available input data can broadly be of two data types:
number or string. However, since measures in Analytic
Services are stored in the database in a numerical
format, string type input data has to be encoded as
number type data before being stored in Analytic Services.
For example, if the gender information is available as a string
stating Male or Female, it first needs to be encoded as a
number such as 1 or 0 before being stored as a measure in the
Analytic Services OLAP database.
Mining attributes can be of two types: categorical or
numerical. Mining attributes that describe discrete
information content like gender (Male or Female), zip code
(95054, 94304, 90210, etc.), customer category (Gold, Silver,
Blue), status information (Applied, Approved, Declined,
On Hold), etc. are termed categorical attribute types.
Mining attributes that describe continuous information
content like sales, revenue, income, etc. are termed numerical
attribute types. The Analytic Services Data Mining Framework
has the capability of working with algorithms that can handle
both categorical and numerical attribute types. Among the
algorithms that are shipped in the box with the Analytic
Services Data Mining Framework, the Naive Bayes and the
Decision Tree algorithms can handle both
categorical and numerical mining attribute types and
treat them accordingly.
One of the key steps in Data Mining is the data auditing or
the data conditioning phase. This involves putting together,
cleansing, categorizing, normalizing, and proper encoding of
data. This step is usually performed outside the Data Mining
tool. The effectiveness of the Data Mining algorithm is largely
dependent on the quality and completeness of the source data.
In some cases, for various mathematical reasons, the available
input data may also need to be transformed before it is
brought into a Data Mining environment. Transformations
may sometimes also include splitting or combining of input
data columns. Some of these transformations may be done on
the input dataset outside the Data Mining Framework by
using standard data manipulation techniques available in ETL
tools or RDBMS environments. For the current case the input
data does not need any mathematical transformation, but
some encoding is needed to convert data into a format that can
be processed within the Analytic Services OLAP environment.
In the current problem at the ABC University, the available
set of input data consisted of both string and number
data types. The list below gives some of the input data,
which needed encoding of string type input into number
type input:
• Identity related data like Gender, City, State, Ethnicity
• Data related to the application process like Application
  Status, Primary Source of contact, Applicant Type, etc.
• Date related data like Application Date, Source Date, etc.
  (Dates were available in the original dataset as strings
  in two different formats, yymmdd and mm/dd/yy, and
  had to be encoded into a number.)
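As a rough illustration of the date encoding mentioned in the list above, the following Python sketch parses both string formats and returns one number per date. The function name encode_date and the choice of a day-count ordinal as the numeric encoding are assumptions made for illustration; the case study does not state which numeric representation was actually used.

```python
from datetime import datetime

# The two string formats mentioned in the text: yymmdd and mm/dd/yy.
DATE_FORMATS = ("%y%m%d", "%m/%d/%y")


def encode_date(text):
    """Return the date as an integer day count (proleptic Gregorian ordinal).

    The ordinal encoding is an assumption made for this sketch; any
    consistent numeric encoding would serve the same purpose.
    """
    cleaned = text.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().toordinal()
        except ValueError:
            continue  # try the next known format
    raise ValueError(f"unrecognized date string: {text!r}")
```

With this encoding, the same calendar day maps to the same number regardless of which of the two formats it arrived in, and consecutive days differ by exactly 1.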
In the current case study, these encodings were done
outside the Analytic Services environment by constructing
look-up master tables in which the string type inputs were
listed in a tabular format and the records were sequentially
numbered. Subsequently, each string type input was referred to
by its corresponding numeric identifier during data load into
Analytic Services. Table 2 shows a few samples of what such
mapping files look like.
State ID    State Name
1           VT
2           CA
3           MA
4           MI
5           NH
6           NJ

AppliedStatus ID    Application Status
3                   Applied
4                   Offered Admission
5                   Paid Fees
6                   Enrolled

Table 2: Typical mapping of numeric identifiers
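The look-up master tables described above can be sketched in a few lines of Python. This is only an illustration of the idea (sequential numbering of distinct strings), not the procedure used in the case study. The function names build_lookup and encode_column are hypothetical, and numbering in order of first appearance starting at 1 is an assumption chosen so that the output matches the State sample in Table 2.

```python
def build_lookup(values, start=1):
    """Assign sequential numeric IDs to distinct strings, in order of first appearance."""
    lookup = {}
    for value in values:
        if value not in lookup:
            lookup[value] = start + len(lookup)
    return lookup


def encode_column(values, lookup):
    """Replace each string with its numeric ID from the look-up table."""
    return [lookup[value] for value in values]


# Hypothetical sample column; the real dataset is not reproduced here.
states = ["VT", "CA", "VT", "MA", "MI", "CA", "NH", "NJ"]
state_lookup = build_lookup(states)
encoded_states = encode_column(states, state_lookup)
```

Keeping the look-up table separate from the encoded column makes it possible to translate predicted or reported numeric values back into their original strings later.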

preparing the cube

After all the input data has been identified and made ready, the
next step is to design an outline and load the data into an
Analytic Services cube.
In the context of the current case the Analytic Services
outline created was as follows:
All the input data (measures in the OLAP context) were
organized into five groups (a two level hierarchy
created in the measures dimension) based on a logical
grouping of measures. The details of each measure group
are explained in Table 3.
Data load is performed just as it is normally done for any
Analytic Services cube.
At this stage we have:
• Designed an Analytic Services cube
• Loaded it with relevant data
It should be noted that the steps described so far are
generic to Analytic Services cube building and did not need
any specific support from the Analytic Services Data Mining
Framework.
Explanation of each measure group:
• Measures related to information about the applicant's identity were organized into this
  group. Some of these measures were transformed from string type to number type
  to facilitate modeling them within the Analytic Services database context.
• Measures related to various test scores and high school examination results were
  organized into this group.
• Measures related to the context of the applicant's application processing have been
  organized together into this group.
• Measures related to the academic background.
• Measures providing information about the financial support and funding associated
  with the applicant.
Table 3: Analytic Services outline expanded

identifying the optimal set of mining attributes
It is necessary to reduce the number of attributes / variables
presented to an algorithm so that the information content is
enhanced and the noise minimized. This is usually performed
using supporting mathematical techniques to ensure that the
most significant attributes are retained within the dataset that
is presented to the algorithm. It should be noted here that the
choice of significant attributes is driven more by the
particular data than by the problem itself. Attribute
analysis or attribute conditioning is one of the initial steps in
the Data Mining process and is currently performed outside
the Data Mining Framework. The main objective during this
exercise is to identify a subset of mining attributes that are
highly correlated with the predicted attribute, while ensuring
that the correlation within the identified subset of attributes is
as low as possible.
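The selection objective stated above (high correlation with the predicted attribute, low correlation within the chosen subset) might be sketched as follows. This is a minimal illustration, not the Custom Defined Function used in the case study. It assumes Pearson correlation as the correlation measure and a simple greedy strategy, and the names select_attributes, min_target_corr and max_mutual_corr, along with their default thresholds, are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)


def select_attributes(columns, target, min_target_corr=0.3, max_mutual_corr=0.7):
    """Greedy selection: strongest target correlation first, skipping redundant attributes.

    columns maps attribute name -> list of numeric values; target is the
    numeric encoding of the predicted attribute. Thresholds are assumptions.
    """
    strength = {name: abs(pearson(values, target)) for name, values in columns.items()}
    ranked = sorted(columns, key=lambda name: strength[name], reverse=True)
    selected = []
    for name in ranked:
        if strength[name] < min_target_corr:
            break  # every remaining attribute is weaker still
        if all(abs(pearson(columns[name], columns[kept])) <= max_mutual_corr
               for kept in selected):
            selected.append(name)
    return selected
```

A greedy pass like this is easy to reason about but does not guarantee the best possible subset, which is one reason an expert review of the candidate attributes remains useful.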
The Analytic Services platform provides for a wide variety
of tools and techniques that can be used in the attribute
selection process. One method to identify an optimal set of
attributes is to use certain special data reduction techniques
implemented within Analytic Services through Custom
Defined Functions (CDFs). Additionally, users can use other
data visualization tools like Hyperion Visual Explorer to arrive
at a decision on the effectiveness of specific attributes in
contributing to the overall predictive strength of the Data
Mining algorithm. Depending on the nature of the problem
the users may choose to utilize an appropriate tool and
technique in deciding the optimal set of attributes.
One of the advantages of working with the Analytic
Services Data Mining Framework is the inherent capability in
Analytic Services to support customized methods for attribute
selection by the use of Custom Defined Functions (CDFs).
This is essential since the process of mining attribute selection
can vary significantly across various problems and having an
extensible toolkit comes in very handy to be able to customize
a method to suit a specific problem.
In the current case at ABC University, a CDF was used to
identify the correlation effects amongst the available set of
mining attributes. A thorough analysis of various subsets of
the available mining attributes was performed to identify a
subset that is highly correlated with the predicted mining
attribute and at the same time has low correlation scores
within the subset itself. Since some Data Mining algorithms
(like Naive Bayes and Neural Net) are quite sensitive to inter-attribute
dependencies, an attempt was made to outline the
clusters of mutually dependent attributes, with a certain
degree of success. From each cluster a single, most convenient
attribute was selected. For this case study, an expert made the
decision, but this process can be generalized to a large degree.
An optimal set of five mining attributes was identified after
this exercise. Table 4 shows the list of identified mining
attributes, grouped by the input attribute type: categorical or
numerical.

Categorical Type    Numerical Type
FARecieved          StudBudget
AppStatus           TotalAward
Applicant Type

Table 4: Optimal set of mining attributes identified

At this stage we have:
• Identified the optimal subset of measures (mining attributes)
modeling the problem

We will now use the Data Mining Framework to define an
appropriate model (for the business problem) based on the
Analytic Services cube and the identified subset of mining
attributes (measures). Setting up the model includes selecting
the algorithm, defining algorithm parameters and identifying
the input data location and output data location for the
algorithm.
choosing the algorithm

The next step in the Data Mining process is to pick the
appropriate algorithm. A set of six basic algorithms is
provided in the Data Mining Framework: Naive Bayes,
Regression, Decision Tree, Neural Network, Clustering and
Association Rules. The Analytic Services Data Mining
Framework also allows for the inclusion of new algorithms
through a well defined process described in the vendor guide
that is part of the Data Mining SDK. The six basic algorithms
are a sample set that is shipped with the product to provide a
starting point for using the Data Mining Framework.
Choosing an algorithm for a specific problem needs basic
knowledge of the problem domain and the applicability of
specific mathematical techniques to efficiently solve problems
in that domain.
The specific problem discussed in this paper
falls into a class of problems termed classification problems.
The need here is to classify each applicant into one of a discrete set of
classes on the basis of certain numerical and categorical
information available about the applicant. The class referred
to in this context is the status of the applicant's application
looked at from an enrollment perspective: will enroll or will
not enroll. There is historical data available indicating which
kinds of applicants (each with a specific combination of categorical and
numerical factors) have gone ahead and accepted offers from
the ABC University
and subsequently enrolled in the programs. There is data
available for the negative case as well, i.e., applicants who did
not eventually enroll in the program.
Given that this problem can be treated as a
classification problem and that historical
information is available, one of the algorithms suitable for
the analysis is the Naive Bayes classification algorithm. We
chose Naive Bayes for modeling this particular business
problem.
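For readers unfamiliar with the technique, the following is a minimal Python sketch of a Naive Bayes classifier over discrete attribute values. It is not the implementation shipped with Analytic Services. The class name CategoricalNaiveBayes is hypothetical, add-one (Laplace) smoothing is an assumption made to avoid zero probabilities, and numerical attributes are assumed to have been discretized beforehand.

```python
from collections import Counter, defaultdict
from math import log


class CategoricalNaiveBayes:
    """Naive Bayes over discrete attribute values with add-one smoothing."""

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        self.value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
        self.values_seen = defaultdict(set)       # attribute index -> distinct values
        for row, label in zip(rows, labels):
            for i, value in enumerate(row):
                self.value_counts[(i, label)][value] += 1
                self.values_seen[i].add(value)
        return self

    def predict(self, row):
        total = sum(self.class_counts.values())
        best_label, best_score = None, float("-inf")
        for label, count in self.class_counts.items():
            score = log(count / total)  # log of the class prior
            for i, value in enumerate(row):
                # Smoothed estimate of P(value | class), assuming the
                # attributes are independent given the class.
                numerator = self.value_counts[(i, label)][value] + 1
                denominator = count + len(self.values_seen[i])
                score += log(numerator / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

The independence assumption in the inner loop is what makes the algorithm sensitive to strongly inter-dependent attributes, which is why the attribute selection step tries to keep correlation within the chosen subset low.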
deciding on the algorithm parameters

Every algorithm has a set of parameters that control the
behavior of the algorithm. Algorithm users need to choose the
parameters based on their knowledge of the problem domain
and the characteristics of the input data. Analytic Services
provides adequate support for such preliminary analysis of
data using Hyperion Visual Explorer or the Analytic Services
Spreadsheet Client. Users are free to analyze the data using any
tool convenient and determine their choices for the various
algorithm parameters.
Each of the algorithms has a set of parameters that
determine the way the algorithm will process the input data.
For the current case, the algorithm chosen is Naive Bayes, and
it has four parameters that need to be specified: Categorical,
Numerical, RangeCount and Threshold. The details of each of the
parameters and the implications of setting them are described
in the online help documentation.
Of the selected attributes, a few are
of categorical type, and hence our choice for the Categorical
parameter is yes. Similarly, there are attributes of
numerical type, and hence the choice for the Numerical
parameter is also yes. The data was analyzed using a
histogram plot to understand the distribution before deciding
on the value to be provided for the RangeCount parameter.
This parameter needs to be large enough to allow the
algorithm to use all the variety available in the data and at the
same time small enough to prevent overfitting.
From the analysis of the input data for this particular case,
setting this parameter to 12 seemed reasonable. The
RangeCount parameter controls the binning process in the algorithm.
It should be emphasized that binning schemes (including
bin count) depend on the specific circumstances and
may vary to a great degree between different problems.
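To make the effect of a bin count concrete, here is a small Python sketch that discretizes a numeric column into a fixed number of ranges. Equal-width binning is an assumption made purely for illustration. The paper does not describe the binning scheme that the framework actually applies, and the function names equal_width_bin and bin_column are hypothetical.

```python
def equal_width_bin(value, low, high, range_count):
    """Map a numeric value to a bin index between 0 and range_count - 1."""
    if range_count < 1:
        raise ValueError("range_count must be at least 1")
    if high <= low:
        return 0  # degenerate range: everything falls into a single bin
    width = (high - low) / range_count
    index = int((value - low) / width)
    # Clamp so that value == high (or an out-of-range value) stays in a valid bin.
    return max(0, min(range_count - 1, index))


def bin_column(values, range_count=12):
    """Discretize a numeric column into range_count equal-width bins."""
    low, high = min(values), max(values)
    return [equal_width_bin(v, low, high, range_count) for v in values]
```

With too few bins, distinct value ranges are merged and information is lost; with too many, each bin holds only a handful of records and the estimated probabilities become unreliable, which is the trade-off described above.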
At this stage we have:
• Identified the optimal subset of measures (mining attributes)
• Chosen the algorithm suitable for the problem
• Identified the parameter values for the chosen algorithm
applying the data mining framework

Now that we have completed all the preparatory steps for Data
Mining, the next step is to use the Data Mining Wizard in the
Administration Services Console to build a Data Mining
model for the business problem. There are three steps involved
in effectively using the Data Mining functionality to provide
predictive solutions to business problems:
1. Building the Data Mining model
2. Testing the Data Mining model
3. Applying the Data Mining model
Each of these steps, performed using the Data Mining
Wizard in the Administration Services Console, uses MDX
expressions to define the context within the cube to perform
the data mining operation. Various accessors, specified as
MDX expressions, identify data locations within the cube. The
framework uses the data in the locations as input to the
algorithm or writes output to the specified location.
Accessors need to be defined for each of the algorithms so
as to let the algorithm know specific contexts for each of the
following:
• The attribute domain: the expression that identifies the factors of our
  analysis that will be used for prediction. [In the
  current context this expression pertains to the mining
  attributes that we identified.]
• The sequence domain: the expression that identifies the
  cases/records that need to be analyzed. [In the current
  context this expression will identify the list of applicants.]
• The external domain: the expression that identifies whether multiple
  models need to be built. [Not relevant in the current
  context.]
• The anchor: the expression that specifies the additional
  restrictions from dimensions that are not really participating
  in this data mining operation. [In the current context
  all the dimensions of the cube that we used have
  relevance to the problem. Accordingly, the anchor in the
  current context only helps restrict the algorithm scope to
  the right measure in the Measures dimension.]
Additional details for each of these expressions can be
obtained from the online help documentation.
building the data mining model

To access the Data Mining Framework, you will need to bring
up the Data Mining Wizard in the Administration Services
Console, and choose the appropriate application and database
as shown in Figure 1.
Figure 1: Choosing the application and database
In the next screen (Figure 2), depending on whether you
are building a new model or revising an existing model, you
choose the appropriate task option.
Figure 2: Creating a Build Task
Figure 3: Settings to handle missing data
This will bring up the wizard screen for setting the
algorithm parameters and the accessor information associated
with the chosen algorithm, in this case Naive Bayes. The user
selects a node in the left pane to see and provide values for
the appropriate options and fields displayed in the right pane.
As shown in Figure 3, select Choose mining task settings to
set how to handle missing data in the cube. The choice in this
case is to replace with As NaN (Not-A-Number).
The Naive Bayes algorithm requires that we declare up front
whether we plan to use either or both of Categorical and Numerical
predictors. In the context of the current case, we have both
categorical and numerical attribute types and hence the choice
is True for both these parameters. RangeCount was set
to 12. Threshold was fixed at 1e-4, a very small value. Figure
4 shows the completed screen for the parameter settings.
Figure 4: Setting parameters

The Naive Bayes algorithm has two predictor accessors,
Numerical Predictor and Categorical Predictor, and one
target accessor. Figure 5 shows the various domains that need
to be defined for the accessors. Table 5 shows the values that
were used for the case being discussed. All the information
provided during this stage of model building is preserved in a
template file so as to facilitate reuse of the information if
necessary.
Figure 5: Accessors associated with Naive Bayes algorithm
Table 5: Setting up accessors for the build mode while using Naive Bayes algorithm
Figure 6: Generating the template and model
Once the accessors are defined, the Data Mining Wizard
prompts the user to provide names for the template and
model that will be generated at this stage. Figure 6 shows the
screen in which the model and template names need to be
defined.
At this stage we have:
• Built a Data Mining model using the Naive Bayes
  algorithm
testing the data mining model

The next step will be to test the newly built model to verify that
it satisfies the level of statistical significance that is needed for
the model to be put to use. Ideally, a part of the input data
(historical data with valid known outcomes) is set aside
as a test dataset to verify the goodness of the Data Mining
model that is developed by the use of the algorithm. Testing
the model on this test dataset and comparing the outcomes
predicted by the model against the known outcomes
(historical data) is also one of the processes
supported by the Data Mining Wizard. A test mode template
can be created by a process similar to creating a build mode
template as described in the previous section. While building
the test mode template the user needs to provide a
Confidence parameter to let the Data Mining Framework
know the minimum confidence level necessary to declare the
model as a valid one. We specified a value of 0.95 for the
Confidence parameter. The exact steps in the wizard and
descriptions of the various parameters can be obtained from
the online help documentation.
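The general idea of setting aside a test dataset and comparing predictions against known outcomes can be sketched in Python as follows. This is a simplified illustration only. Treating the acceptance rule as "hold-out accuracy of at least 0.95" is an assumption and is not claimed to be how the framework evaluates its Confidence parameter. The function names holdout_split, accuracy and passes_test are hypothetical.

```python
import random


def holdout_split(records, test_fraction=0.3, seed=0):
    """Shuffle a copy of the records and split it into (training, test) lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    test_size = int(len(shuffled) * test_fraction)
    return shuffled[test_size:], shuffled[:test_size]


def accuracy(predicted, actual):
    """Fraction of cases where the predicted class equals the known outcome."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    matches = sum(1 for p, a in zip(predicted, actual) if p == a)
    return matches / len(actual)


def passes_test(predicted, actual, required=0.95):
    """Accept the model only if hold-out accuracy reaches the required level (an assumed rule)."""
    return accuracy(predicted, actual) >= required
```

Fixing the random seed keeps the split reproducible, so that a rebuilt model can be compared against earlier ones on the same test records.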
Once the process is completed, the results of the test (under the
name specified in the last step of the Data
Mining Wizard) appear against the Model Results node. Figure 7
shows the location in the Administration Services Console
Enterprise View pane where the Model Results node is
visible.
The model can be queried within the Administration
Services Console interface to obtain a list of the model
accessors by using the Query Result functionality. Invoking
Show Result for the Test accessor will indicate the result of
the test. Figure 8 shows the list of model accessors in the
result set of a model based on the Naive Bayes algorithm used
in the test mode.
If the Test accessor has a value 1.0 then the test is deemed
successful and the model is declared good or valid for
prediction. Figure 9 shows the result of test for the case being
discussed in this paper.
At this stage we have:
• Built a Data Mining model using the Naive Bayes
  algorithm
• Verified the model as valid with 95% confidence
Figure 7: Model Results node in the Administration Services Console interface
Figure 8: Model accessors for result set associated with a model based on Naive Bayes algorithm
Figure 9: Test results
applying the data mining model

The intent at this stage is to use the recently constructed Data
Mining model to predict whether new applicants are likely to
enroll in the program. Using the Data Mining model in the
apply mode is similar to the earlier two steps. The Data Mining
Wizard guides the user to provide the parameters appropriate
to the apply mode. The Target domain is usually different in
the apply mode since data is written back to the cube. The
details of the various accessors and the associated domains can
be obtained from the online help documentation. Table 6
shows the values that were provided to the Data Mining
Wizard to use the model in the apply mode.
Table 6: Setting up accessors for the apply mode while using Naive Bayes algorithm
Just as in the build mode, the names of the results model
and template are specified in the wizard and the template is
saved before the model is executed. The results of the
prediction are written into the location specified by the
Target accessor: the mining attribute that is referred to by
the MDX expression {[ActualStatus]}. The results can be
visualized either by querying the model results in the
Administration Services Console using the Query Result
functionality as described in the previous section, or by
accessing the cube and reviewing the data written back to the
cube. One option for viewing the results is to use the
Analytic Services Spreadsheet Client to connect to the
database and view the cube data for the ActualStatus measure.
interpreting the results

The results of the Data Mining model need to be interpreted
in the context of the business problem that it is attempting to
solve. Any transformation done to the input measures needs to
be appropriately adjusted for while interpreting the
results. In the context of the case being discussed in this paper,
the intent was to predict whether applicants were likely to
enroll at the ABC University. The possible outcomes in this
case are either the applicant will enroll or the applicant will
not enroll. The model was verified against the entire set of
available data (over 11300 records).
the confusion matrix

You can construct a confusion matrix by listing the false
positives and false negatives in a tabular format. A false
positive happens when the model predicts that an applicant
will enroll and in reality the applicant does not enroll. A false
negative happens when the model predicts that an applicant
will not enroll and in reality the applicant does enroll. The
results predicted by the model can be compared with the
actual outcome as available in the historical data to build the
confusion matrix. In general for such classification problems,
it is most likely that one of these (false positives or false
negatives) will be slightly more important than the other in a
business context. In the case being discussed in this paper, a
false negative means lost revenue, whereas a false positive
means additional promotional expenditure in trying to follow
up on an applicant who will eventually not enroll. The
importance of each should be analyzed in the context of the
business and the model needs to be rebuilt if necessary with a
different training set (historical data) or with a different set of
attributes.
Figure 10 below shows the confusion matrix constructed
using the data set that was analyzed as part of this case study.
It is evident from the confusion matrix that the model
predicted that 1550 (1478 + 72) students would enroll. Of those,
only 1478 actually enrolled and 72 did not enroll. This implies
that there were 72 false positives. Similarly, the model
predicted that 9805 (9356 + 449) students would not enroll. Of
those, only 9356 actually did not enroll, whereas 449 actually
did enroll. This implies that there were 449 false negatives.
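The arithmetic behind these counts can be checked with a short Python sketch. The four counts are taken from the paragraph above. The function name confusion_summary is hypothetical, and precision and recall are standard derived measures added here for illustration; they are not reported in the case study.

```python
def confusion_summary(true_pos, false_pos, true_neg, false_neg):
    """Derive overall rates from the four cells of a two-class confusion matrix."""
    total = true_pos + false_pos + true_neg + false_neg
    return {
        "total": total,
        "incorrect": false_pos + false_neg,
        "false_positive_share": false_pos / total,  # share of all cases
        "false_negative_share": false_neg / total,  # share of all cases
        "accuracy": (true_pos + true_neg) / total,
        # Of the applicants predicted to enroll, how many did enroll.
        "precision": true_pos / (true_pos + false_pos),
        # Of the applicants who enrolled, how many were predicted to enroll.
        "recall": true_pos / (true_pos + false_neg),
    }


# Counts from the case study: predicted enroll = 1478 + 72,
# predicted not enroll = 9356 + 449.
summary = confusion_summary(true_pos=1478, false_pos=72, true_neg=9356, false_neg=449)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Because a false negative (lost revenue) and a false positive (wasted follow-up cost) carry different business costs here, looking at precision and recall separately can be more informative than the overall success rate alone.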
Figure 10: Confusion matrix to analyze the model's
effectiveness in prediction
analyzing the results

On further analysis of the results the following observations
can be made:

Incorrect Predictions    # of Cases    Percentage of Cases
False positives          72            0.634%
False negatives          449           3.954%
Total                    521           4.59%

Success rate of the model: 95.41% (only 521 incorrect
predictions in 11355 cases)
additional functionality
The Analytic Services Data Mining Framework offers more
functionality that can be used when deploying models in real
business scenarios. Some of the further steps that can be
considered include:
transformations
The Data Mining Framework also offers the ability to apply a
transform to the input data just before it is presented to the
algorithm. Similarly, the output data can be transformed
before being written into the Analytic Services cube. The Data
Mining Framework offers a basic list of transformations (exp,
log, pow, scale, shift, linear) that can be used through the Data
Mining Wizard. The details of each of these transformations,
what they do and how to use them can be obtained from the
Analytic Services online help documentation. This list of
transformations is further extensible through the import of
custom Java routines written specifically for the purpose. The
details of how to write Java routines to be imported as
additional transforms can be obtained from the vendor guide
that is shipped as part of the Data Mining SDK.
mapping
In some cases when the model has been developed for a
different context and needs to be used elsewhere, the
Mapping functionality is useful. Through this functionality
the user can provide information to the Data Mining
Framework on how to interpret the existing model accessors
in the new context in which it is being deployed. More
information on using this functionality can be obtained from
the online help documentation.
import/export of pmml models

The Data Mining Framework allows for portability through
import and export of mining models using the PMML format.
setting up models for scoring

The Data Mining models built using the Analytic Services
Data Mining Framework can also be set up for scoring. In the
scoring mode the user interacts with the model in real time
and the results are not written to the database. The input data
can either be sourced from the cube or through data templates
which the user fills in during execution. The scoring mode of
deployment can be combined with custom applications built
using developer tools provided by Hyperion Application
Builder to make applications that cater to a specific business
process while leveraging powerful predictive analytic
capability from the Analytic Services Data Mining Framework.
The online help documentation provides additional details on
how to score a Data Mining model.
using the data mining framework in batch mode
There is also a batch mode interface to access the
functionalities provided in the Data Mining Framework.
Scripts written using the MaxL command interface can be
used to do almost all the functionality that is exposed through
the Data Mining Wizard. Details of the MaxL commands and
their usage can be obtained from the online help
documentation.
building custom applications

Custom applications can be developed using Analytic Services
as the backend database and developer tools provided along
with Hyperion Application Builder. The functionality
provided by the Data Mining Framework can be invoked
through APIs.
summary
Data Mining is one of the functional groups among the
comprehensive enterprise-class analytic functionalities offered
within Analytic Services. This case study focused on using the
Naive Bayes algorithm to solve a classification problem,
modeled using a real-life data set. The model achieved a
95.41% success rate in the classification exercise using the
Analytic Services Data Mining Framework.
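The 95.41% figure can be reproduced from the misclassification counts reported in the analysis of the results (72 false positives and 449 false negatives in 11355 cases). The short Python sketch below is only an illustrative check of that arithmetic. It is not part of Analytic Services, and the function and variable names are hypothetical.

```python
# Misclassification counts as reported in this case study's results analysis.
FALSE_POSITIVES = 72
FALSE_NEGATIVES = 449
TOTAL_CASES = 11355


def error_summary(false_pos: int, false_neg: int, total: int) -> dict:
    """Return error and success rates (as percentages) from raw counts."""
    incorrect = false_pos + false_neg
    return {
        "false_positive_pct": 100.0 * false_pos / total,
        "false_negative_pct": 100.0 * false_neg / total,
        "incorrect": incorrect,
        "incorrect_pct": 100.0 * incorrect / total,
        "success_pct": 100.0 * (total - incorrect) / total,
    }


summary = error_summary(FALSE_POSITIVES, FALSE_NEGATIVES, TOTAL_CASES)
for name, value in summary.items():
    # Percentages are floats; the raw count of incorrect cases is an int.
    if isinstance(value, float):
        print(f"{name}: {value:.3f}%")
    else:
        print(f"{name}: {value}")
```

Rounded to the precision used in the paper, these values agree with the reported 0.634%, 3.954%, 4.59% and 95.41%.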
Some of the business benefits of Data Mining in the OLAP
context that can be illustrated from the current case include:
• It can serve as a discovery tool in a critical decision-support
process, including evaluation of the critical parameters
affecting a customer's (applicant's) behavior. ABC University
had initially assumed that some time-related factors played a
stronger role in influencing the decision to enroll. The Data
Mining exercise showed this not to be true; instead, certain
financial attributes emerged as the most influential.
• The successful prediction mechanism can become the basis
for a full-blown risk-management application. ABC University,
for example, can devise a policy to spend more of its
promotional budget on tracking applicants with distinctly
higher academic credentials but only a moderate probability of
enrollment. Similarly, the prediction mechanism can help the
admissions department make decisions on admission offers
even before it has seen the entire applicant pool.
• It can serve as an operational control and reporting tool.
Traditional OLAP reporting provides visibility into the state of
admissions operations, the extent of funds utilization, and
various other financial and operational indicators, giving
better control over the conformance between planned and
actual business positions.
suggested reading
1. Data Mining: Concepts and Techniques
Jiawei Han, Micheline Kamber
2. Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management
Michael J. A. Berry, Gordon S. Linoff
3. Data Mining Explained
Rhonda Delmater, Monte Hancock Jr.
4. Data Mining: A Hands-On Approach for Business Professionals (Data Warehousing Institute Series)
Robert Groth
footnote
1 Breaking up a continuous range of data into discrete segments / bins.

Copyright 2005 Hyperion Solutions Corporation. All rights reserved. Hyperion, the Hyperion H logo, and Hyperion's product names are trademarks of Hyperion. References to
other companies and their products use trademarks owned by the respective companies and are for reference purposes only. 5164_0805
hyperion.com
