
UNIT- 3

DATA MINING

WHAT IS DATA MINING?


Data mining is a collection of techniques for the efficient, automated discovery of previously unknown, valid, novel, useful and understandable patterns in large databases. The patterns must be actionable so that they can be used in an enterprise's decision-making process. It is also known as knowledge discovery. Data mining refers to the extraction of hidden, predictive information patterns from large databases.

DATA MINING

Raw information -> Data mining -> Hidden information patterns

NEED FOR DATA MINING


Data mining has found many applications in the last few years for a number of reasons:
- Growth in OLTP data
- Growth in data due to cards
- Growth in data due to the web
- Growth in data due to telephone transactions, banking and medical records
- Growth in data storage capacity
- Decline in the cost of processing
- Availability of software and tools

DATA MINING PROCESS


- Requirement analysis
- Clearly define goals
- Clearly define the business problem
- Data selection and collection
- Cleaning and preparing data
- Data mining exploration and validation
- Implementing, evaluating and monitoring
- Results visualization

CRISP-DM (CROSS-INDUSTRY STANDARD PROCESS FOR DATA MINING) MODEL

KNOWLEDGE DISCOVERY PROCESS


Data mining is a process of knowledge discovery. It discovers knowledge or information that you never knew was present in your data. Knowledge manifests itself as relationships and patterns.

RELATIONSHIPS
Data mining discovers relationships between two or more different objects, along with the time dimension. Sometimes a relationship may occur between objects of the same kind. The discovery of relationships is a key result of data mining.

KNOWLEDGE DISCOVERY PHASES


1. Define business objectives
2. Prepare data
3. Perform data mining
4. Evaluate results
5. Present discoveries
6. Incorporate usage of discoveries

KNOWLEDGE DISCOVERY PROCESS


Determination of business objectives -> Selection and preparation of data -> Application of suitable data mining techniques -> Evaluation and application of results

DATA MINING VS DATA WAREHOUSE

OLAP                                                DATA MINING
What is happening in the enterprise                 Predicts the future based on why it is happening
Summary data                                        Detailed transaction-level data
Limited dimensions                                  Large dimensions
Small number of attributes                          Many dimension attributes
User-driven, interactive analysis                   Data-driven, automatic knowledge discovery
Multidimensional, drill-down, and slice-and-dice    Prepare data, mining tools
Mature and widely used                              Still emerging

RELATIONSHIP OF DATA WAREHOUSE AND DATA MINING


- Data mining algorithms need large amounts of detailed data, and a data warehouse holds data at the lowest level of detail.
- Data mining needs integrated and cleansed data, and a data warehouse contains data in this form, which makes it suitable for data mining.
- The infrastructure of a data warehouse is robust, with parallel processing technology and relational database systems, which is what data mining requires to work on data of this kind.

DATA MINING TECHNIQUES


- Association rules mining or market basket analysis
- Supervised classification
- Cluster analysis
- Web data mining
- Search engines

TECHNIQUES
Association rules mining or market basket analysis
Transaction   Items bought
1             bread, milk, cheese
2             bread, cheese
3             jam, milk
4             milk, ghee

Here the most frequent combination is bread and cheese, which appear together in two of the four transactions.
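
The count behind this observation can be reproduced with a few lines of Python. This is only a minimal sketch of pair counting, not a full association rule algorithm; the names `transactions` and `pair_counts` are chosen here for illustration.

```python
from collections import Counter
from itertools import combinations

# The four transactions listed above.
transactions = [
    {"bread", "milk", "cheese"},
    {"bread", "cheese"},
    {"jam", "milk"},
    {"milk", "ghee"},
]

def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        # Sort the items so each pair is always stored in the same order.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts

counts = pair_counts(transactions)
best_pair, best_count = counts.most_common(1)[0]
# Support = fraction of all transactions that contain the pair.
support = best_count / len(transactions)
print(best_pair, best_count, support)  # ('bread', 'cheese') 2 0.5
```

Every other pair occurs in only one transaction, so bread and cheese is the only pair with a support of 0.5.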

SUPERVISED CLASSIFICATION
This data mining technique originates from machine learning. It helps to predict whether an individual is likely to respond to a direct mail campaign, and to identify good risks for granting loans or insurance.
Example rule for insurance:
If sex = female and 19 <= age <= 43 then life insurance = yes
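
The insurance rule can be written directly as a small function. This sketch only applies a rule that has already been learned; it does not show how a classifier would derive it from training data. The function name and the sample applicants are hypothetical.

```python
def life_insurance_rule(sex: str, age: int) -> bool:
    """Return True when the rule predicts life insurance = yes."""
    return sex.lower() == "female" and 19 <= age <= 43

# Hypothetical applicants, used only to exercise the rule.
applicants = [("female", 30), ("female", 50), ("male", 30), ("female", 19)]
predictions = [life_insurance_rule(sex, age) for sex, age in applicants]
print(predictions)  # [True, False, False, True]
```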

CLUSTER ANALYSIS
Cluster analysis groups data into disjoint sets whose members are similar in some respect. It also attempts to place dissimilar data in different clusters. For example, in the context of supermarket data, clustering sale items to organize shelf space effectively is a typical application.
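
As an illustration of grouping similar items, the sketch below uses k-means. The slides do not name a clustering algorithm, so k-means is an assumption made here, as are the sample items (price, weekly units sold) and the starting centroids.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid when a cluster ends up empty.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return clusters, centroids

# Hypothetical sale items: (price, weekly units sold).
items = [(1.0, 90), (1.2, 95), (0.9, 100), (8.0, 10), (7.5, 12), (9.0, 8)]
clusters, centroids = kmeans(items, centroids=[items[0], items[3]])
for cluster in clusters:
    print(cluster)
```

With this data the cheap, fast-selling items end up in one cluster and the expensive, slow-selling items in the other. In practice the features would usually be scaled first so that one attribute does not dominate the distance.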

WEB DATA MINING


Web data mining has an impact on the way we search for and find information at home and at work.
Evaluation of learning sites
Example: student portal
- Check login
- Notes
- Submit online test
- Chat page for clarifying doubts

SEARCH ENGINES
A search engine is a huge database of web pages together with a software package for indexing and retrieving pages, which enables users to find information. Ranking helps the user to choose the best result.
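
Indexing, retrieval and ranking can be sketched with a tiny inverted index. The pages below are hypothetical, and ranking by raw word counts is a deliberate simplification assumed here; real search engines use far more elaborate ranking.

```python
from collections import Counter, defaultdict

# Hypothetical pages standing in for a database of web pages.
pages = {
    "page1": "data mining finds hidden patterns in data",
    "page2": "a data warehouse stores integrated data",
    "page3": "search engines index web pages",
}

def build_index(docs):
    """Indexing: map each word to a Counter of {page: occurrences}."""
    index = defaultdict(Counter)
    for name, text in docs.items():
        for word in text.lower().split():
            index[word][name] += 1
    return index

def search(index, query):
    """Retrieval and ranking: order pages by total query-word occurrences."""
    scores = Counter()
    for word in query.lower().split():
        scores.update(index.get(word, {}))
    return [name for name, _ in scores.most_common()]

index = build_index(pages)
results = search(index, "data mining")
print(results)  # ['page1', 'page2']
```

Here page1 ranks first because it contains "data" twice and "mining" once, while page2 contains only "data".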

DATA MINING APPLICATIONS

- Customer segmentation
- Market basket analysis
- Risk management
- Fraud detection
- Demand prediction
- Delinquency tracking