
Visualization for Analytics [Week 6]

Suneel Grover, M.B.A., M.S.


The George Washington University (Dept. of Decision Science - M.S. in Business Analytics) - Professorial Lecturer
SAS Principal Solutions Architect
Using SAS Visual Statistics

6.1 Comparing Models

6.2 Scoring

6.3 Visual Data Mining & Machine Learning


Objectives

Discuss model comparison features.


Describe comparison results.
Compare a logistic regression model with a decision tree model.
Model Comparison in SAS Visual Statistics

Model comparison involves comparing the performance of competing models based on various benchmarking criteria.
Criteria differ depending on model type and response variable type (continuous or categorical).
Model comparison requires at least two trained models that are initialized and updated.
Model comparisons are not saved.
If changes are made to a model after model comparison, you must re-create the model comparison.
Model Comparison Window

In the Model Comparison window, select the following:

Response variable
Level of interest
Group-by variable (when available)
Models to compare
Model Comparison Properties

Name: enables you to specify the name for this comparison.
Fit statistic: specifies the comparison criterion that is plotted in the Fit Statistics window and used to determine the champion model.
Prediction Cutoff: when available, specifies the cutoff probability that determines whether an observation is a modeled event. The default is 0.50.
Percentile: when available, specifies the plotted percentile for the specified fit statistic. The default is 5 (percent).
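The role of the prediction cutoff can be sketched in plain Python (an illustration of the concept, not SAS code; the probabilities are invented for the example):

```python
# Illustrative sketch: how a prediction cutoff turns predicted event
# probabilities into event (1) / non-event (0) classifications.

def classify(probabilities, cutoff=0.50):
    """Label an observation an event when its predicted probability
    meets or exceeds the cutoff."""
    return [1 if p >= cutoff else 0 for p in probabilities]

probs = [0.12, 0.48, 0.50, 0.73, 0.91]   # made-up predicted probabilities

print(classify(probs))                # default cutoff 0.50 -> [0, 0, 1, 1, 1]
print(classify(probs, cutoff=0.75))   # stricter cutoff    -> [0, 0, 0, 0, 1]
```

Raising the cutoff trades more false negatives for fewer false positives, which is why the comparison window lets you adjust it.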
Fit Statistic Definitions

Misclassification: a measure of how many observations are incorrectly classified for each value of the response variable.
C Statistic: represents the area under the ROC curve.
Kolmogorov-Smirnov (K-S) Statistic: a goodness-of-fit statistic that represents the maximum separation between the model ROC curve and the baseline ROC curve.
FPR: false positive rate.
FDR: false discovery rate, or the expected false positive rate.
F1 Score: the harmonic mean of precision (positive predictive value) and recall (sensitivity). It is also known as the F-score or F-measure.
Lift: a measure of the advantage (or lift) of using a predictive model to improve on the target response versus not using a model. It is calculated as the ratio between the results obtained with and without the predictive model. The higher the lift in the lower percentiles of the chart, the better the model.
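Two of these statistics are easy to sketch from first principles. The plain-Python example below (hypothetical labels and probabilities, not SAS output) computes the misclassification rate and the C statistic; the C statistic is computed via its rank interpretation, i.e. the probability that a randomly chosen event outranks a randomly chosen non-event:

```python
# Illustrative sketch of two fit statistics from hand-made data.

def misclassification(labels, probs, cutoff=0.50):
    """Fraction of observations whose cutoff-based class differs from the label."""
    preds = [1 if p >= cutoff else 0 for p in probs]
    return sum(p != y for p, y in zip(preds, labels)) / len(labels)

def c_statistic(labels, probs):
    """Area under the ROC curve: the chance that a random event is
    ranked above a random non-event (ties count one half)."""
    events = [p for p, y in zip(probs, labels) if y == 1]
    nonevents = [p for p, y in zip(probs, labels) if y == 0]
    wins = sum((e > n) + 0.5 * (e == n) for e in events for n in nonevents)
    return wins / (len(events) * len(nonevents))

labels = [0, 0, 0, 1, 1]
probs = [0.10, 0.40, 0.35, 0.80, 0.20]

print(misclassification(labels, probs))   # 1 of 5 misclassified -> 0.2
print(c_statistic(labels, probs))
```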
Fit Statistic Definitions

Cumulative Lift: cumulative lift up to and including the specified percentile bin of the data, sorted in descending order of the predicted event probabilities.
Cumulative % Captured: cumulative number of events observed up to and including the specified percentile bin divided by the total number of events, sorted in descending order of the predicted event probabilities.
Cumulative % Events: cumulative number of events observed up to and including the specified percentile bin divided by the total number of observations, sorted in descending order of the predicted event probabilities.
Gain: similar to a lift chart. It equals the expected response rate using the predictive model divided by the expected response rate from using no model at all.
Gini Coefficient (Somers' D): a measure of the quality of the model, with values between -1 and 1. Closer to 1 is better.
Gamma (Goodman-Kruskal's Gamma): a measure of rank correlation; it measures the strength of association of cross-tabulated data measured at the ordinal level. Values range from -1 to 1.
Tau (Kendall's Tau Coefficient): a measure of rank correlation with values between -1 and 1. It is also called the Kendall rank correlation coefficient.
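The cumulative measures follow the same recipe: rank observations by predicted probability, then compare the event rate in the top bins with the overall event rate. A plain-Python sketch with invented data (not SAS output):

```python
# Illustrative sketch: cumulative lift at a given depth (fraction of
# observations), with observations sorted in descending order of
# predicted event probability.

def cumulative_lift(labels, probs, depth):
    """Event rate in the top `depth` fraction of ranked observations
    divided by the overall event rate."""
    ranked = [y for _, y in sorted(zip(probs, labels), reverse=True)]
    n_top = max(1, round(len(ranked) * depth))
    top_rate = sum(ranked[:n_top]) / n_top
    overall_rate = sum(labels) / len(labels)
    return top_rate / overall_rate

labels = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]   # 3 events in 10 observations
probs  = [0.9, 0.1, 0.8, 0.3, 0.2, 0.4, 0.7, 0.3, 0.1, 0.2]

# The top 20% of ranked observations are all events (rate 1.0) versus a
# 30% overall event rate, so the cumulative lift at that depth is 1.0/0.3.
print(cumulative_lift(labels, probs, 0.20))
```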
Model Comparison Results: Logistic Reg vs. Decision Tree

These two panels are displayed to help you analyze the results of the model comparison:
Assessment (for a categorical response: Lift, ROC, and Misclassification plots)
Fit Statistic
Model Comparison Summary Table

The Model Comparison summary table provides these two tabs:


Statistics: provides summary statistics for each model in the comparison.
Variable Importance: indicates which variables had the greatest impact on
each of the models in the comparison.
Summary Table: Logistic Regression vs. Decision Tree

Categorical response: Cause of Death


Using SAS Visual Statistics

6.1 Comparing Models

6.2 Scoring

6.3 Visual Data Mining & Machine Learning


Model Scoring

Scoring is the process of applying the model to data to make predictions.
Score code can also be applied to validation data. Scored validation data can be used to honestly assess a model built with the Visual Statistics functionality.
SAS Visual Analytics can export the model score code for use with other SAS products.
Score code consists only of Base SAS code.
You can export the score code at any time from a fitted model.
The scoring code is saved as a .sas file.
The score code is usually contained in a single SAS DATA step. It can be represented in alternate code forms such as C, Java, or PMML.
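The exported score code itself is Base SAS, but the idea of a single self-contained scoring step can be sketched in Python. The intercept and coefficients below are hypothetical stand-ins for the values a fitted logistic regression would actually export:

```python
import math

# Illustrative sketch of what score code does: apply fixed, fitted
# parameters to one row of new data. These coefficients are invented;
# real score code carries the values exported from the fitted model.
INTERCEPT = -1.5
COEFFICIENTS = {"age": 0.03, "balance": 0.0004}

def score_row(row):
    """Predicted event probability for one observation: the inverse
    logit of the linear predictor (logistic regression scoring)."""
    linear = INTERCEPT + sum(COEFFICIENTS[name] * row[name] for name in COEFFICIENTS)
    return 1.0 / (1.0 + math.exp(-linear))

# The new data must have the same fields as the training data.
new_row = {"age": 40, "balance": 1200}
print(round(score_row(new_row), 4))
```

Like the SAS DATA step it mimics, this routine needs no access to the training data, only to the fitted parameters, which is what makes score code portable.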
Model Scoring

1. Determine which model you want to export. (Most likely your champion
model results from performing a model comparison.)
2. Click Options (the down arrow on the Visualization toolbar) and select
Export Score Code.
You Have the Score Code. Now What?

Determine the structure and file type of the new data to be scored.
Make sure that the data have the same fields as the original data
that you used to build the Visual Statistics model.
Where does the data live?
Are the data inside a database?
Score the new data using any of the following:
SAS code
C
Java
PMML
Student Exercise
Comparing Models &
Creating Score Code
Using SAS Data Mining & Machine Learning

6.1 Comparing Models

6.2 Scoring

6.3 Visual Data Mining & Machine Learning


SAS Visual Analytics + Statistics + VDMML

SAS Visual Analytics
SAS Visual Statistics
SAS Visual Data Mining & Machine Learning
SAS Visual Data Mining & Machine Learning - Roles Continuum
Analysts and Business Users: Discovery and Reporting
Data Miners, Statisticians, and Programmers: Predictive Analytics
Data Scientists and Programmers: Machine Learning
Machines Are Integral to the Predictive Analytic Process

Traditional Predictive Analysis: training data feeds model building, a tournament of algorithms produces a champion algorithm, and scoring (score code) applied to new input yields the estimated output.
This is the process we follow when we do data mining.
Hypothesis testing is the end goal.
Model building is an off-line process.
Confirmatory and exploratory statistics (supervised versus unsupervised) are supported.
Scoring is done on-line or in-line.
Copyright SAS Institute Inc. All rights reserved.
An Analytic Machine That Learns

Machine Learning: training data feeds a champion algorithm that produces a hypothesis (score code); the hypothesis scores new input to produce the estimated output, and the machine then retrains.
This process repeats until no more improvement is possible (i.e., the model reaches convergence).
This is the definition of an analytic machine!
Machine Learning Why Is It Important Now?

Data, Computing Power, Algorithms

The big question is why is machine learning gaining momentum now? There are three reasons for this:

1. Maturity of Machine Learning as a field in terms of new methods and algorithms, many of which have been refactored to run in memory
2. The profusion of data to learn from
3. The ability to leverage advances in computational technology with Machine Learning.
Computation is abundant and cheap.
SAS Visual Data Mining and Machine Learning (Visual Interface)

Machine Learning Techniques


Factorization Machine
Forest
Gradient Boosting
Neural Network
Support Vector Machine
Common Features
Training-validation
Auto-tuning
Model Assessment
Score Code or Analytic Store
Model comparison
SAS Visual Data Mining and Machine Learning (Programming Interface)

https://www.sas.com/en_my/software/analytics/data-mining-machine-learning.html
Forests

Forests are used primarily when building classification models on large data sets, by generating many decision trees simultaneously from slightly different samples of the training data.
A forest uses bootstrapping (sampling with replacement) to generate many trees simultaneously.
The final model is an ensemble of all the individual trees.
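The bootstrap-and-vote idea behind a forest can be sketched with the standard library alone (an illustration of the technique, not PROC FOREST):

```python
import random

# Illustrative sketch: the two ingredients a forest combines.
random.seed(0)

def bootstrap_sample(data):
    """Sample with replacement back to the original size: each tree in
    a forest is grown on a different bootstrap sample like this one."""
    return [random.choice(data) for _ in data]

def majority_vote(predictions):
    """Ensemble the trees' class votes for one observation."""
    return max(set(predictions), key=predictions.count)

training_rows = list(range(10))
print(bootstrap_sample(training_rows))       # duplicates are expected
print(majority_vote(["yes", "no", "yes"]))   # -> yes
```

Because each tree sees a slightly different sample, the trees disagree in different places, and the vote averages out their individual errors.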
Gradient Boosting

Gradient Boosting is used primarily when building classification models on large data sets, by generating many decision trees sequentially from slightly different subsamples of the training data.
It uses subsampling (sampling without replacement) to generate the trees sequentially.
Each new tree makes slight adjustments to the model, so the finished model is a composite of all the trees that were built before it.
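An equivalent way to see these "slight adjustments" is that each round fits a weak model to the current residual errors and adds a damped copy of it to the ensemble. A minimal sketch using constant "stumps" as the weak learner on a tiny regression target (an illustration of the boosting loop only, not PROC GRADBOOST):

```python
# Illustrative sketch: each boosting round fits a tiny model to the
# residuals and adds a damped version of it to the ensemble, so the
# final prediction is a sum over all rounds.

ys = [1.2, 1.9, 3.2, 3.8, 5.1]   # invented regression targets

def fit_stump(residuals):
    """Weakest possible learner: predict the mean residual everywhere."""
    return sum(residuals) / len(residuals)

learning_rate = 0.5
ensemble = []                    # one constant "tree" per boosting round
residuals = list(ys)
for _ in range(20):
    stump = fit_stump(residuals)
    ensemble.append(learning_rate * stump)
    residuals = [r - learning_rate * stump for r in residuals]

# Additive model: the prediction is the sum of every round's contribution.
# With a constant learner it converges toward the mean of ys (3.04).
print(round(sum(ensemble), 3))
```

Real gradient boosting uses shallow decision trees instead of constants, so different regions of the input space receive different corrections; the sequential residual-fitting loop is the same.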
Random Forest vs Gradient Boosting

Both techniques leverage multiple decision trees, but in very different ways.
Both are ensembling techniques, with Gradient Boosting typically achieving greater precision than Random Forests.


How GB Models Work: Initial Gradient (#1)
Initially, all points are given the same weight.
Problem: classify 10 binary (+/-) points correctly in two-dimensional space.
The decision point is determined from consideration of the possible explanatory variables (columns) available in the data set.


How GB Models Work: Gradient #2
Points misclassified by the previous gradient are given greater weight.
Points correctly classified in the first gradient are given lower weights.
How GB Models Work: Gradient #3
The model continues to focus on the highly weighted points that were misclassified in the prior iteration.
How GB Models Work: After Successive Iterations
Each successive iteration (#1, #2, #3) contributes its own decision boundary.


How GB Models Work: The Ensemble Model Is Built
The ensemble combines the decision boundaries from iterations #1, #2, and #3; this is why it is called an additive model.
In this overly simple example, all points are now correctly classified!


Autotuning & Hyperparameters

Decision tree (PROC TREESPLIT):
  Depth of tree
  Splitting criterion
  Number of bins for interval variables
Forest (PROC FOREST):
  Number of trees
  Number of levels in each tree
  Bootstrap sampling rate
  Number of inputs used for splitting a node
Gradient Boosting (PROC GRADBOOST):
  Number of iterations (trees)
  Sampling proportion
  LASSO (L1) regularization
  Ridge (L2) regularization
  Number of inputs used for splitting a node
  Learning rate
Neural Networks (PROC NNET):
  Number of hidden layers
  Number of neurons in each hidden layer
  L1 regularization
  L2 regularization
  SGD options (annealing rate, learning rate)
Support Vector Machines (PROC SVMACHINE):
  Polynomial degree
  Penalty value
Factorization Machine (PROC FACTMAC):
  Number of factors
  Step size (learning rate)
  Number of iterations
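At its core, autotuning searches combinations of hyperparameters like those above and keeps the setting that scores best on held-out validation data. A grid-search sketch in stdlib Python (the grid and the scoring function are hypothetical stand-ins, not the SAS autotuning algorithm, which uses smarter search strategies than an exhaustive grid):

```python
import itertools

# Illustrative sketch: exhaustive search over a small hyperparameter
# grid, keeping the combination with the lowest validation error.

grid = {
    "n_trees": [50, 100],
    "learning_rate": [0.1, 0.5],
    "max_depth": [2, 4],
}

def validation_error(params):
    """Stand-in for training a model with `params` and measuring its
    error on a validation partition (invented formula for the demo)."""
    return abs(params["learning_rate"] - 0.1) + params["max_depth"] / params["n_trees"]

names = list(grid)
candidates = [dict(zip(names, values)) for values in itertools.product(*grid.values())]
best = min(candidates, key=validation_error)
print(best)   # -> {'n_trees': 100, 'learning_rate': 0.1, 'max_depth': 2}
```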
Teacher Demo
Visual Data Mining &
Machine Learning
Questions?
Thank You
