
Data Mining

Dr. Saed Sayad


University of Toronto 2010 saed.sayad@utoronto.ca

http://chem-eng.utoronto.ca/~datamining/

Data Mining

Data mining is about explaining the past and predicting the future by means of data analysis.


Data Mining

[Diagram: Data Mining shown together with Statistics, AI & Machine Learning, and Database & DW]

Data Mining Applications


Application areas (bar chart, axis 0 to 60): CRM; Banking; Credit Scoring; Direct Marketing/Fundraising; Fraud Detection; Retail; Insurance; Telecom; Manufacturing; Science; Health care/HR; Medical/Pharma; Government applications; Other; e-Commerce; Biotech/Genomics; Web; Travel/Hospitality; Security/Anti-terrorism; Junk email/Anti-spam; Investment/Stocks; Entertainment/Music; Gambling

Source: KDnuggets.com

Data mining activity in 2007 compared to 2006

much higher: 20%
somewhat higher: 30%
about the same: 41%
much lower / somewhat lower: 5% / 4%

Source: KDnuggets.com

Data Mining Steps


1. Problem Definition
2. Data Preparation
3. Data Exploration
4. Modeling
5. Evaluation
6. Deployment

CRISP-DM Process Model


CRoss-Industry Standard Process for Data Mining

Source: http://www.crisp-dm.org/Process/index.htm

1. Problem Definition
Understanding the project objectives and requirements from a business perspective and then converting this knowledge into a data mining problem definition with a preliminary plan designed to achieve the objectives.

Source: http://www.crisp-dm.org/Process/index.htm


2. Data Preparation
[Diagram: Data (DSN) and Data (Text) feed ETL, which produces Modeling Data]
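A minimal sketch of the extract, transform, load idea in this step. It is not the deck's own implementation: the in-memory SQLite table stands in for a DSN-backed database, the `io.StringIO` object stands in for a text file, and all table names, column names, and values are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical database source (stands in for a DSN connection).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, age INTEGER)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, 34), (2, 51), (3, 27)])

# Hypothetical text source (stands in for a delimited text file).
text_source = io.StringIO("id,amount\n1,350\n2,\n3,120\n")

# Extract: read both sources.
ages = dict(conn.execute("SELECT id, age FROM customers"))
purchases = list(csv.DictReader(text_source))

# Transform: drop rows with a missing amount and convert types.
# Load: collect the joined rows into one modeling table.
modeling_data = [
    {"id": int(row["id"]), "age": ages[int(row["id"])], "amount": float(row["amount"])}
    for row in purchases
    if row["amount"] != "" and int(row["id"]) in ages
]
```

Customer 2 is dropped because its amount is missing, so only complete rows reach the modeling table.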

3. Data Exploration
Univariate Analysis: Average, StDev, Min, Max, ...; Charts (Bar, Line, Pie, ...)
Bivariate Analysis: Correlation, Z test, ...; Combination Charts

Data Exploration - Univariate
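
A small sketch of univariate exploration, using the summary statistics named for this step (average, standard deviation, min, max). The `ages` sample is hypothetical, and the standard library `statistics` module is only one of several ways to compute these values.

```python
import statistics

ages = [23, 35, 41, 29, 52, 35]  # hypothetical sample of one variable

summary = {
    "count": len(ages),
    "average": statistics.mean(ages),
    "stdev": statistics.stdev(ages),  # sample standard deviation (divides by n - 1)
    "min": min(ages),
    "max": max(ages),
}
```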


Data Exploration - Bivariate
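
A small sketch of bivariate exploration using the Pearson correlation coefficient, one common reading of "Correlation" in this step. The `pearson` helper and the `age` and `income` values are hypothetical illustrations.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

age = [20, 30, 40, 50, 60]
income = [22, 35, 41, 58, 60]  # hypothetical, in thousands of dollars
r = pearson(age, income)  # close to +1: income rises with age in this sample
```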


4. Modeling
Classification: Bayesian, Decision Tree, Logistic Regression, SVM
Regression: Linear Regression, Robust Regression, Neural Network
Clustering: Hierarchical, K-Means
Association: A Priori

Data Mining: Classification & Regression


Frequency Table: OneR, Bayesian, Decision Tree
Covariance Matrix: Linear Regression, LDA (Z Score), PCA/PCR, Logistic Regression, Robust Regression
Similarity Functions: KNN
Neural Networks: Perceptron, Back Propagation, RBF
Others: SVM, GA, Markov Chains, HMM
Scalable Methods

Modeling - Classification

Age

Responder (e.g., Y or N)
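
A minimal sketch of classification with Age as the input and Responder (Y or N) as the target. It uses a one-level decision tree (a decision stump), chosen here only for brevity; the slide does not say which classifier it illustrates. The function names and the data are hypothetical.

```python
def fit_stump(ages, labels):
    """Pick the age threshold and direction with the fewest training errors."""
    best = None
    for threshold in sorted(set(ages)):
        for above_label, below_label in (("Y", "N"), ("N", "Y")):
            errors = sum(
                (above_label if a >= threshold else below_label) != y
                for a, y in zip(ages, labels)
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, above_label, below_label)
    return best[1:]  # (threshold, label at or above it, label below it)

def predict(model, age):
    threshold, above_label, below_label = model
    return above_label if age >= threshold else below_label

ages = [22, 25, 31, 38, 45, 52, 60]
responder = ["N", "N", "N", "Y", "Y", "Y", "Y"]
model = fit_stump(ages, responder)
```

With this toy data the stump splits at age 38, so `predict(model, 40)` gives "Y" and `predict(model, 30)` gives "N".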


Modeling - Regression

Age

Amount Purchased (e.g., $350)
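
A minimal sketch of regression with Age as the input and Amount Purchased as a numeric target, using ordinary least squares for a single predictor. The `fit_line` helper and the data are hypothetical; the numbers were picked so that age 40 maps to the slide's example value of $350.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

ages = [20, 30, 40, 50]
amounts = [150.0, 250.0, 350.0, 450.0]  # hypothetical purchases in dollars
slope, intercept = fit_line(ages, amounts)
predicted_at_40 = slope * 40 + intercept
```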


Modeling - Clustering

Income

Age
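
A minimal sketch of clustering on the two variables shown here, Income and Age, using K-Means (one of the clustering methods listed under Modeling). The data points are hypothetical, the starting centroids are simply the first `k` points, and the features are left unscaled for brevity; in practice they would usually be standardized first.

```python
def kmeans(points, k, iterations=10):
    """Plain K-Means with squared Euclidean distance."""
    centroids = list(points[:k])  # simple deterministic start: first k points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        # Assignment step: attach each point to its nearest centroid.
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# (income in thousands of dollars, age)
points = [(30, 25), (32, 27), (31, 24), (80, 50), (85, 55), (82, 52)]
centroids, clusters = kmeans(points, k=2)
```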


Association Rules
Market Basket Analysis
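
A small sketch of the two standard measures behind association rules in market basket analysis: support and confidence. The transactions and item names are hypothetical, and the code only scores one rule rather than running a full A Priori search.

```python
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Among transactions containing the antecedent, the fraction that also contain the consequent."""
    return support(antecedent | consequent) / support(antecedent)

rule_support = support({"bread", "milk"})          # rule: bread -> milk
rule_confidence = confidence({"bread"}, {"milk"})
```

Here bread and milk appear together in 3 of 5 baskets (support 0.6), and 3 of the 4 baskets with bread also contain milk (confidence 0.75).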


5. Evaluation
Charts: Gain Chart, Lift Chart, K-S Chart
Stats: Confusion Matrix, Mean Square Error, Variables Contribution
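
As an illustration of one of the statistics listed above, mean square error can be computed as the average squared difference between actual and predicted values. The helper name and the numbers are hypothetical.

```python
def mean_square_error(actual, predicted):
    """Average of squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

actual = [350.0, 120.0, 200.0]      # hypothetical observed amounts
predicted = [340.0, 130.0, 200.0]   # hypothetical model outputs
mse = mean_square_error(actual, predicted)
```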


Evaluation - Confusion Matrix


                     Positive Cases    Negative Cases
Predicted Positive   True Positive     False Positive
Predicted Negative   False Negative    True Negative
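
A minimal sketch that counts the four confusion matrix cells from actual and predicted labels. The function name, the "Y"/"N" label convention, and the data are assumptions made for illustration.

```python
def confusion_matrix(actual, predicted, positive="Y"):
    """Count true/false positives and negatives for a two-class problem."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["TP" if a == positive else "FP"] += 1
        else:
            counts["FN" if a == positive else "TN"] += 1
    return counts

actual    = ["Y", "Y", "N", "N", "Y", "N", "N", "Y"]
predicted = ["Y", "N", "N", "Y", "Y", "N", "N", "Y"]
cm = confusion_matrix(actual, predicted)
accuracy = (cm["TP"] + cm["TN"]) / len(actual)
```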

Evaluation - Gain Chart

[Gain chart: Responder% (vertical axis, ticks 10%, 45%, 100%) against Population% (horizontal axis, ticks 10%, 50%, 100%)]
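
A minimal sketch of how the points of a gain chart can be computed: rank cases by model score, then track what share of all responders is captured as a growing share of the population is contacted. The function name, scores, and responses are hypothetical.

```python
def cumulative_gains(scores, responded, steps=10):
    """Return (population %, responder %) points for a gain chart."""
    # Rank responses from highest to lowest model score.
    ranked = [r for _, r in sorted(zip(scores, responded), key=lambda pair: pair[0], reverse=True)]
    total = sum(ranked)
    points = []
    for step in range(1, steps + 1):
        cutoff = round(len(ranked) * step / steps)
        points.append((100 * step / steps, 100 * sum(ranked[:cutoff]) / total))
    return points

scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
responded = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]  # 1 = responder, 0 = non-responder
points = cumulative_gains(scores, responded, steps=5)
```

In this toy data the top-scored 20% of the population already contains half of the responders, which is the kind of early lift a gain chart is meant to show.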

6. Deployment

SQL

VB

JAVA

HTML


Data Mining Team


Modeler
Domain Expert
DBA
Analyst

Data Mining Software Vendors


SAS
SPSS
KXEN
Angoss
KNIME

Case Study...

