
Intro to Data Mining

Worapoj Kreesuradej, Ph.D.
Associate Professor
Data Mining & Data Exploration Laboratory (DME Lab),
Faculty of Information Technology,
King Mongkut's Institute of Technology Ladkrabang
Web: www.it.kmitl.ac.th/dme
Email: worapoj@it.kimitl.ac.th, dme@it.kmitl.ac.th

Grading

- Assignment: 40 %
- Midterm exam: 30 %
- Final exam: 30 %

References

- Christopher Westphal and Teresa Blaxton, Data Mining Solutions, John Wiley & Sons Inc., 1998.
- Pieter Adriaans and Dolf Zantinge, Data Mining, Addison Wesley, 1996.
- Michael J.A. Berry and Gordon Linoff, Data Mining Techniques, John Wiley & Sons Inc., 1997.
- Alex Berson and Stephen J. Smith, Data Warehousing, Data Mining & OLAP, McGraw Hill, 1997.
- Vasant Dhar and Roger Stein, Seven Methods for Transforming Corporate Data Into Business Intelligence, Prentice Hall, 1997.
- Peter Cabena et al., Discovering Data Mining, Prentice Hall, 1997.
- Dorian Pyle, Business Modeling and Data Mining, Morgan Kaufmann, 1st edition, April 2003.
- Tamraparni Dasu and Theodore Johnson, Exploratory Data Mining and Data Cleaning, John Wiley & Sons, 1st edition, May 2003.
- Alex Berson, Kurt Thearling, and Stephen J. Smith, Building Data Mining Applications for CRM, McGraw-Hill Osborne Media, December 1999.
- Bruce Ratner, Statistical Modeling and Analysis for Database Marketing: Effective Techniques for Mining Big Data, Chapman & Hall, May 2003.
- Ian H. Witten and Eibe Frank, Data Mining, Morgan Kaufmann, 2005.

What is Data Mining?

- Definition: Data mining is the process of extracting previously unknown, valid, and actionable information from large databases and then using that information to make crucial business decisions.
- Alternative name: Knowledge Discovery in Databases (KDD)

Evolution of Database Technology

- 1960s: Data collection, database creation, and network DBMS
- 1970s: Relational data model, relational DBMS implementation
- 1980s: RDBMS, advanced data models (extended-relational, OO, deductive, etc.) and application-oriented DBMS (spatial, scientific, engineering, etc.)
- 1990s-2000s: Data mining and data warehousing, multimedia databases, and Web databases

Business Impact

[Figure: Business Impact. Labels in the first diagram: Client Data, Data Warehouse, Custom Application, Packaged Application, ERP, Query & Reporting tool, Data Mining, Intelligence Enterprise. Labels in the second diagram, a layered stack annotated "Increasing business impact": Data Sources (Paper, Files, Information Providers, Database Systems, OLTP); Data Warehouses / Data Marts; OLAP; Statistical Analysis, Querying and Reporting; Data Exploration; Information Discovery; Data Mining.]

Potential Applications of Data Mining

- Market analysis and management
  - purchasing patterns over time
  - cross-selling
  - customer profiling
  - direct mail campaigns
  - market segmentation

Potential Applications of Data Mining

- Risk analysis and management
  - forecasting
  - credit scoring for loan application processing
  - profile of attrition (churn management)

Potential Applications of Data Mining

- Fraud detection and management
  - money laundering: detecting suspicious money transactions
  - detecting inappropriate medical treatments

Potential Applications of Data Mining

- Web mining
  - Web Usage Mining
  - Web Content Mining
    - automatic classification of Web documents
  - Web Structure Mining

Potential Applications of Data Mining

- Text mining
  - dividing documents into groups
  - document feature extraction

Data mining process

[Figure: Data mining process. Labels in the figure: Databases, Structured Data, Unstructured Data, Business Objective, Selection, Target Data, Data Preparation, Preprocessed Data, Data Mining, Pattern Evaluation.]

Data mining process

- Business Objectives Determination
  - Identify the business problem or opportunity.
- Data Selection
  - Identify all internal or external sources of information and select which subset of the data is needed for the data mining application.

Data mining process

- Data Transformation
  - The goal is to transform the data to suit the intended analysis and the data formats required by the data mining algorithms, many of which have particular requirements.

Data mining process

- Data Preprocessing
  - The goal of data preprocessing is to ensure the quality of the selected data.
  - Topics: current data set, data sampling, unit conversion, representation formats, detecting missing values
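Two of the preprocessing topics above, detecting missing values and unit conversion, can be illustrated with a short sketch. The records, field names, and the inch-to-centimeter conversion are hypothetical examples chosen for illustration; they do not come from the course material.

```python
from typing import Optional

# Hypothetical customer records; None marks a missing value.
records = [
    {"name": "A", "height_in": 70.0, "income": 52000.0},
    {"name": "B", "height_in": None, "income": 61000.0},
    {"name": "C", "height_in": 64.0, "income": None},
]


def find_missing(rows):
    """Return (row_index, field) pairs whose value is missing (None)."""
    return [
        (i, field)
        for i, row in enumerate(rows)
        for field, value in row.items()
        if value is None
    ]


def inches_to_cm(value: Optional[float]) -> Optional[float]:
    """Convert inches to centimeters, leaving missing values untouched."""
    return None if value is None else round(value * 2.54, 1)


missing = find_missing(records)
heights_cm = [inches_to_cm(r["height_in"]) for r in records]
print(missing)
print(heights_cm)
```

Reporting the missing cells rather than silently filling them keeps the decision about how to handle them (drop, impute, or ask the data owner) explicit.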

Data mining process

- Data Mining
  - Select modeling technique

Data Mining Operations

- Predictive Modeling
- Database Segmentation
- Link Analysis
- Visualization

Database Segmentation (clustering)

- Partitioning a database into segments of similar records, that is, records that share a number of properties.
- Model: K-means, Kohonen neural networks

[Figure: scatter plot of records with axes Age and Annual Income]
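As a rough illustration of segmentation with K-means, the sketch below runs the standard assignment and update steps on a handful of made-up (age, annual income in thousands) points. The data, the choice of two segments, and the starting centroids are all assumptions for the example, and Kohonen networks are not covered here.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Plain k-means (Lloyd's algorithm) on tuples of numbers."""
    centroids = [tuple(c) for c in centroids]
    clusters = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(p, centroids[i]),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, clusters


# Hypothetical records: (age, annual income in thousands).
points = [(25, 30), (27, 35), (30, 32), (52, 90), (55, 95), (58, 88)]
# Assumed starting centroids: one point from each visually obvious group.
centroids, clusters = kmeans(points, centroids=[(25, 30), (52, 90)])
print(centroids)
print(clusters)
```

In practice the attributes would usually be rescaled first, since an attribute with a larger numeric range (such as income) otherwise dominates the distance.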

Predictive Modeling

- Finding models for future prediction
- Classification: predicts categorical class labels
- Prediction: models continuous-valued functions

Example of Predictive Modeling: Classification

- Model: decision tree, neural network

[Figure: training data are used to build a classifier, which is then applied to unseen data (Jeff, Professor, 4) to answer "Tenured?"]

NAME      RANK             YEARS   TENURED
Tom       Assistant Prof   2       no
Merlisa   Associate Prof   7       no
George    Professor        5       yes
Joseph    Assistant Prof   7       yes
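The classification example above can be tried out with a deliberately simple stand-in model. The sketch below is not a decision tree or a neural network; it learns a single-attribute rule (in the spirit of the OneR method) that maps each rank to the majority TENURED label among the four records in the table, then applies it to the unseen record (Jeff, Professor, 4). Using only the rank attribute is an assumption made to keep the example short.

```python
from collections import Counter, defaultdict

# The four labeled records from the table: (name, rank, years, tenured).
training = [
    ("Tom", "Assistant Prof", 2, "no"),
    ("Merlisa", "Associate Prof", 7, "no"),
    ("George", "Professor", 5, "yes"),
    ("Joseph", "Assistant Prof", 7, "yes"),
]


def fit_rank_rule(rows):
    """Map each rank to the majority class label observed for that rank."""
    votes = defaultdict(Counter)
    for _name, rank, _years, tenured in rows:
        votes[rank][tenured] += 1
    # On a tie, most_common keeps the label that was encountered first.
    return {rank: counts.most_common(1)[0][0] for rank, counts in votes.items()}


rule = fit_rank_rule(training)
prediction = rule.get("Professor")  # unseen record: (Jeff, Professor, 4)
print(prediction)  # prints "yes"
```

With only four records the rule is fragile: the two Assistant Prof rows disagree, so the label for that rank depends on the tie-break. A real classifier would be trained on far more data and evaluated on held-out records.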

Link Analysis

- Finding frequent patterns, associations, correlations, or causal structures among sets of items or objects in transaction databases, relational databases, and other information repositories
- Model: Apriori algorithm

[Figure: Visualization of Link Analysis software]
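To make the idea of frequent-pattern mining concrete, here is a compact sketch of the Apriori level-wise search for frequent itemsets. The market baskets and the minimum support count of 3 are invented for the example, and the sketch stops at frequent itemsets without deriving association rules.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # Count the transactions containing each candidate; keep frequent ones.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_support}

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    frequent = count({frozenset([i]) for i in items})
    result = dict(frequent)
    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: merge frequent (k-1)-itemsets into size-k candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        frequent = count(candidates)
        result.update(frequent)
        k += 1
    return result


# Hypothetical market-basket transactions.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
frequent_itemsets = apriori(baskets, min_support=3)
for itemset, support in sorted(frequent_itemsets.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), support)
```

The prune step is what distinguishes Apriori from brute-force counting: a candidate is only counted if all of its smaller subsets were already found frequent.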

Visualization

[Figure: Visualization of a decision tree in MineSet 3.0]

Data mining process

- Analysis of results
  - Interpret and evaluate the output from data mining.
  - Have we found something that is interesting, valid, and actionable?

Data mining process

- Assimilation of knowledge
  - The objective is to put into action the new, valid, and actionable information obtained from the previous process steps.

Effort Required for Each Data Mining Process Step

[Figure: effort required for each data mining process step]

Methodology for data mining

- CRISP-DM
  - Cross Industry Standard Process for Data Mining (CRISP-DM)
- Consortium of data miners from various industries
  - manufacturing, marketing, and government

Examples of Data Mining Systems

- SAS Enterprise Miner
  - A variety of statistical analysis tools
  - Data warehouse tools and multiple data mining algorithms
- Clementine (from SPSS)
  - Multiple data mining algorithms and advanced statistics
- IBM Intelligent Miner
  - A wide range of data mining algorithms
  - Scalable mining algorithms
  - Toolkits: neural network algorithms, statistical methods, data preparation, and data visualization tools
  - Tight integration with IBM's DB2 relational database system

Examples of Data Mining Systems

- SQL Server 2005
  - Multiple data mining modules: discovery-driven OLAP analysis, association, classification, and clustering
  - Tight integration with the SQL Server relational database system

Examples of Data Mining Systems

- Oracle Data Miner
  - Multiple data mining modules: discovery-driven OLAP analysis, association, classification, and clustering

Examples of Data Mining Systems

- DBMiner (DBMiner Technology Inc.)
  - Multiple data mining modules: discovery-driven OLAP analysis, association, classification, and clustering
  - Efficient association and sequential-pattern mining functions, and a visual classification tool
  - Mining both relational databases and data warehouses

Examples of Data Mining Systems

- Weka (Open Source Software)
  - Multiple data mining modules: association, classification, visualization, and clustering
  - Open source software

Trends in Data Mining

- Application exploration
  - Development of application-specific data mining systems
  - Invisible data mining (mining as a built-in function)

Trends in Data Mining

- Scalable data mining methods
  - Constraint-based mining: use of constraints to guide data mining systems in their search for interesting patterns
- Integration of data mining with database systems, data warehouse systems, and Web database systems

Trends in Data Mining

- Standardization of data mining language
  - A standard will facilitate systematic development, improve interoperability, and promote the education and use of data mining systems in industry and society.
  - PMML (Predictive Model Markup Language)
  - OLE DB for Data Mining
  - JDM API (Java Data Mining API)
- Web Mining
- Multimedia Mining

Data Mining: Confluence of Multiple Disciplines

[Figure: Data Mining at the center, drawing on Database Technology, Statistics, Machine Learning, Information Science, Visualization, and Other Disciplines]

Thank you !!!
