6.2 Scoring
Response variable
Level of interest
Group-by variable (when available)
Models to compare
Model Comparison Properties
Cumulative Lift: the cumulative lift up to and including the specified percentile bin of the data, sorted in descending order of the predicted event probabilities.
Cumulative % Captured: the cumulative number of events observed up to and including the specified percentile bin divided by the total number of events, sorted in descending order of the predicted event probabilities.
Cumulative % Events: the cumulative number of events observed up to and including the specified percentile bin divided by the total number of observations, sorted in descending order of the predicted event probabilities.
Gain: similar to a lift chart. It equals the expected response rate using the predictive model divided by the expected response rate from using no model at all.
Gini Coefficient (Somers' D): a measure of the quality of the model, specifically how well the predicted probabilities rank events above non-events. Values range from -1 to 1; closer to 1 is better.
Gamma (Goodman and Kruskal's gamma): a measure of rank correlation. It measures the strength of association of cross-tabulated data that is measured at the ordinal level. Values range from -1 to 1.
Tau (Kendall's tau coefficient): a measure of rank correlation with values between -1 and 1. It is also called the Kendall rank correlation coefficient.
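To make the definitions above concrete, here is a minimal sketch in plain Python. It assumes a binary response coded 1 for the event and 0 otherwise. The function names, the toy data, and the pair-counting formulas for Somers' D, gamma, and tau (the tau-a variant) are illustrative assumptions; the slides do not show how SAS Visual Statistics computes these values, so treat the output as an approximation of the idea rather than a reproduction of the product's numbers.

```python
def cumulative_lift_table(y_true, p_event, n_bins=10):
    """Return (percentile, cumulative % captured, cumulative lift) per bin."""
    # Sort observations in descending order of predicted event probability.
    order = sorted(range(len(y_true)), key=lambda i: p_event[i], reverse=True)
    sorted_y = [y_true[i] for i in order]
    n = len(sorted_y)
    total_events = sum(sorted_y)
    overall_rate = total_events / n  # response rate with no model at all
    rows = []
    for b in range(1, n_bins + 1):
        cutoff = max(1, (b * n) // n_bins)   # observations up to this bin
        events = sum(sorted_y[:cutoff])      # events captured so far
        pct_captured = events / total_events
        lift = (events / cutoff) / overall_rate
        rows.append((100 * b // n_bins, pct_captured, lift))
    return rows


def rank_association(y_true, p_event):
    """Pair-counting rank statistics for a binary response (assumed formulas)."""
    events = [p for y, p in zip(y_true, p_event) if y == 1]
    nonevents = [p for y, p in zip(y_true, p_event) if y == 0]
    concordant = discordant = 0
    for pe in events:
        for pn in nonevents:
            if pe > pn:
                concordant += 1   # event ranked above non-event
            elif pe < pn:
                discordant += 1   # event ranked below non-event
    n = len(y_true)
    diff = concordant - discordant
    return {
        "somers_d": diff / (len(events) * len(nonevents)),
        "gamma": diff / (concordant + discordant),
        "tau_a": diff / (n * (n - 1) / 2),
    }


# Hypothetical toy data: 10 observations, 4 events.
y = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
p = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
for row in cumulative_lift_table(y, p, n_bins=5):
    print(row)
print(rank_association(y, p))
```

In this toy data the top 20% of observations contains half of all events, so the cumulative lift in the first bin is well above 1, and it falls to exactly 1 once every observation is included.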
Model Comparison Results: Logistic Reg vs. Decision Tree
These two panels are displayed to help you analyze the results of the
model comparison:
Assessment
Categorical response
Lift
ROC
Misclassification
Fit Statistic
Model Comparison Summary Table
6.2 Scoring
1. Determine which model you want to export. (Most likely this is the champion
model from a model comparison.)
2. Click Options (the down arrow on the Visualization toolbar) and select
Export Score Code.
You Have the Score Code. Now What?
Determine the structure and file type of the new data to be scored.
Make sure that the data have the same fields as the original data
that you used to build the Visual Statistics model.
Where does the data live?
Are the data inside a database?
Score the new data using any of the following:
SAS code
C
Java
PMML
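Before scoring, the "same fields" check above can be automated. The sketch below is one possible approach, not part of the exported score code: the function name and the field names are hypothetical, and it assumes the new data arrives as a CSV file whose header row names the fields.

```python
import csv
import io


def missing_fields(training_fields, new_data_fields):
    """Return the training fields that are absent from the new data."""
    present = set(new_data_fields)
    return [name for name in training_fields if name not in present]


# Hypothetical field list from the data used to build the model.
training_fields = ["age", "income", "region"]

# Hypothetical new data to score; only the header row matters here.
new_data = io.StringIO("income,age,customer_id\n52000,41,A17\n")
header = next(csv.reader(new_data))

gaps = missing_fields(training_fields, header)
if gaps:
    print("Cannot score yet; missing fields:", gaps)
else:
    print("All model fields are present.")
```

The comparison here is exact and case-sensitive. Whether that is appropriate depends on where the score code runs, so adjust the matching rule to the target environment.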
Student Exercise
Comparing Models & Creating Score Code
Using SAS Data Mining & Machine Learning
Model building
Model building is an off-line process.
Confirmatory and exploratory statistics implied.
[Diagram labels: Tournament; Champion]
Machine Learning
This process repeats until no more improvement is possible (i.e. the model reaches convergence).
This is the definition of an analytic machine!
[Diagram labels: Training data; Algorithm; Retrain; New Input; Hypothesis (score code); Estimated output]
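The loop on this slide (train on data, repeat until no more improvement is possible, then use the resulting hypothesis to turn new input into an estimated output) can be sketched as follows. This is only an illustration under stated assumptions: the algorithm is plain gradient descent on a one-variable linear model, and the function name, learning rate, tolerance, and toy data are all hypothetical choices, not anything taken from the SAS software.

```python
def train_until_converged(xs, ys, learning_rate=0.05, tol=1e-12, max_iter=10000):
    """Fit y = w*x + b by gradient descent; stop when the loss stops improving."""
    n = len(xs)
    w, b = 0.0, 0.0

    def loss(w, b):
        # Mean squared error on the training data.
        return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / n

    previous = loss(w, b)
    for _ in range(max_iter):
        grad_w = 2 * sum((w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = 2 * sum((w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
        current = loss(w, b)
        if previous - current < tol:
            break  # convergence: no meaningful improvement is left
        previous = current

    def hypothesis(x):
        # Plays the role of the score code: new input in, estimated output out.
        return w * x + b

    return hypothesis


# Hypothetical training data generated from y = 2x + 1.
score = train_until_converged([0, 1, 2, 3, 4], [1, 3, 5, 7, 9])
print(score(5.0))  # estimated output for a new input
```

Retraining corresponds to calling `train_until_converged` again when new training data becomes available and replacing the old `score` function with the new one.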
Copyright SAS Institute Inc. All rights reserved.
Machine Learning: Why Is It Important Now?
Why is machine learning gaining momentum now? There are three reasons:
1. The maturity of machine learning as a field, in terms of new methods and algorithms. Many algorithms have been refactored to run in memory.
2. The profusion of data to learn from.
3. The ability to leverage advances in computational technology for machine learning. Computation is abundant and cheap.
SAS Visual Data Mining and Machine Learning (Visual Interface)
https://www.sas.com/en_my/software/analytics/data-mining-machine-learning.html
Forests
Problem: Take the issue of classifying these 10 binary points (+ and -) correctly in two-dimensional space.
The decision point is determined from consideration of possible explanatory variables (columns) available in the data set.
[Figure: scatter of + and - points divided by a decision point.]
[Figure: the same points with a second decision point. Labels: "given greater weight"; "Points correctly classified in first gradient are given lower weights".]
How GB Models Work: Gradient #3
[Figure: + and - points with decision points #1 and #2 and a third decision point added.]
The model continues to focus on the high-weight points that were misclassified in the prior iteration.
How GB Models Work: After Successive Iterations
[Figure: decision points #1, #2, and #3 combined over the + and - points.]
This is why it is called an additive model.
In this overly simple example, all points are now correctly classified!
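The three-iteration walkthrough on these slides can be imitated in a few lines of Python. The sketch below is an assumption-heavy illustration, not the SAS gradient boosting implementation: the reweighting the slides describe (higher weights for misclassified points, lower weights for correctly classified ones) matches the AdaBoost update, so that rule is swapped in here. Each "decision point" is modeled as a one-variable threshold (a decision stump), and the 10 points, labels, and function names are hypothetical.

```python
import math


def stump_predict(stump, x):
    """A decision point: predict `sign` if x[feature] <= threshold, else -sign."""
    feature, threshold, sign = stump
    return sign if x[feature] <= threshold else -sign


def best_stump(X, y, weights):
    """Search every column and observed value for the lowest weighted error."""
    best, best_err = None, float("inf")
    for feature in range(len(X[0])):
        for threshold in sorted({x[feature] for x in X}):
            for sign in (1, -1):
                stump = (feature, threshold, sign)
                err = sum(w for xi, yi, w in zip(X, y, weights)
                          if stump_predict(stump, xi) != yi)
                if err < best_err:
                    best, best_err = stump, err
    return best, best_err


def fit_boosted_stumps(X, y, n_rounds):
    """AdaBoost-style loop (assumed update rule): reweight, refit, add."""
    n = len(X)
    weights = [1.0 / n] * n
    model = []
    for _ in range(n_rounds):
        stump, err = best_stump(X, y, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)   # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # vote of this decision point
        model.append((alpha, stump))
        # Misclassified points get greater weight; correct ones get lower weight.
        weights = [w * math.exp(-alpha * yi * stump_predict(stump, xi))
                   for xi, yi, w in zip(X, y, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model


def predict(model, x):
    """Additive model: sum the weighted votes of all decision points."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in model)
    return 1 if score >= 0 else -1


# Hypothetical data: 10 points in two dimensions, +1 and -1 labels.
X = [(1, 5), (2, 2), (2, 8), (4, 9), (6, 8),
     (4, 3), (5, 5), (7, 2), (8, 6), (6, 1)]
y = [1, 1, 1, 1, 1, -1, -1, -1, -1, -1]

for rounds in (1, 2, 3):
    model = fit_boosted_stumps(X, y, rounds)
    errors = sum(predict(model, xi) != yi for xi, yi in zip(X, y))
    print(rounds, "decision point(s):", errors, "misclassified")
```

No single decision point separates these points, but the weighted sum of three does, which is the sense in which the model is additive.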