
DATA MINING

PRESENTED BY: Siddharth Khare, Tanmay Nagar, Aman Singh, Viplav Vinod, Tarun Kewalramani

WHAT IS DATA MINING?

Data mining is the process of semi-automatically analyzing large databases to find patterns that are:

- Valid: they hold on new data with some certainty
- Novel: they are non-obvious to the system
- Useful: it should be possible to act on them
- Understandable: humans should be able to interpret them

TWO MAIN COMPONENTS OF DATA MINING

Knowledge Discovery: Concrete information gleaned from known data. This is data you may not have known, but which is supported by recorded facts.

Knowledge Prediction: Uses known data to forecast future trends, events, etc. (e.g., stock market predictions). Neural networks, for example, are inherently geared towards prediction and pattern recognition.

SCOPE OF DATA MINING


- Data is being produced
- Data is being warehoused
- The computing power is available
- The computing power is affordable
- The competitive pressures are strong

SOURCES OF DATA FOR MINING

- Databases (the most obvious)
- Text documents
- Computer simulations
- Social networks

TYPES OF DATA
- Raw data
- Metadata
- Propositional data
- Relational data

SOME BASIC OPERATIONS

Predictive:
- Regression
- Classification
- Collaborative filtering

Descriptive:
- Clustering / similarity matching
- Association rules and variants
- Deviation detection

SOME BASIC OPERATIONS (CONTD.)


Classification: Learn a method for predicting the class of an instance from pre-labeled (classified) instances.

- Training data: used to build the model
- Test data: used to validate the model (determine the accuracy of the model)

The given data is usually divided into training and test sets.
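To make the training/test workflow concrete, here is a minimal sketch. It assumes a 1-nearest-neighbour classifier (chosen only for brevity; the slide does not name a specific method), a hypothetical two-class dataset, and hypothetical helper names (`train_test_split`, `predict_1nn`, `accuracy`). The model is built from the training set and its accuracy is measured only on the held-out test set.

```python
import math
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle labeled rows and split them into training and test sets."""
    rows = rows[:]  # copy so the caller's list is not reordered
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_fraction))
    return rows[:cut], rows[cut:]

def predict_1nn(train, features):
    """Predict the class of the training instance nearest to `features`."""
    nearest = min(train, key=lambda row: math.dist(row[0], features))
    return nearest[1]

def accuracy(train, test):
    """Fraction of test instances whose predicted class matches the true label."""
    correct = sum(predict_1nn(train, x) == y for x, y in test)
    return correct / len(test)

# Hypothetical pre-labeled instances: (features, class)
data = [((1.0, 1.1), "A"), ((1.2, 0.9), "A"), ((0.8, 1.0), "A"), ((1.1, 1.3), "A"),
        ((5.0, 5.2), "B"), ((5.1, 4.8), "B"), ((4.9, 5.1), "B"), ((5.3, 5.0), "B")]

train, test = train_test_split(data)
print(f"accuracy = {accuracy(train, test):.2f}")
```

Keeping the test set out of model building is what makes the accuracy figure an estimate of performance on new data.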

SOME BASIC OPERATIONS (CONTD.)


Regression: Predict the value of a given continuous-valued variable (the dependent variable) based on the values of other variables (the independent variables). Examples:

- Predicting sales volumes of a new product based on advertising expenditure
- Time-series prediction of stock market indices
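As an illustration of the advertising example, here is a minimal sketch of simple linear regression fitted by ordinary least squares. The figures are hypothetical and deliberately exactly linear, and a single independent variable is assumed; real sales models would usually involve more variables and noisier data.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by variance of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Hypothetical data: advertising expenditure (independent) vs sales volume (dependent)
ad_spend = [10.0, 20.0, 30.0, 40.0]
sales = [25.0, 45.0, 65.0, 85.0]

a, b = fit_line(ad_spend, sales)
predicted = a + b * 50.0
print(f"sales at spend 50: {predicted}")  # prints "sales at spend 50: 105.0"
```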

SOME BASIC OPERATIONS (CONTD.)

Collaborative filtering: A mix of the two strategies (predictive and descriptive): customers are grouped based on the common items they have purchased.
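A minimal sketch of the "group by common items purchased" idea, assuming user-based collaborative filtering with Jaccard similarity between purchase sets (both choices are assumptions; the slide does not specify a similarity measure). The customers, items, and helper names (`jaccard`, `recommend`) are hypothetical.

```python
def jaccard(a, b):
    """Similarity of two purchase sets: size of intersection / size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(target, purchases, k=1):
    """Suggest items bought by the k users most similar to `target`
    that `target` has not bought yet."""
    mine = purchases[target]
    neighbours = sorted(
        (u for u in purchases if u != target),
        key=lambda u: jaccard(mine, purchases[u]),
        reverse=True,
    )[:k]
    suggestions = set()
    for u in neighbours:
        suggestions |= purchases[u] - mine
    return suggestions

# Hypothetical purchase histories
purchases = {
    "alice": {"bread", "milk", "eggs"},
    "bob": {"bread", "milk", "eggs", "butter"},
    "carol": {"beer", "chips"},
}
print(recommend("alice", purchases))  # prints {'butter'}
```

The grouping step (finding similar customers) is descriptive, while suggesting unseen items is predictive, which is why collaborative filtering is described as a mix of the two.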

Clustering: Given a set of data points, each having a set of attributes, and a similarity measure among them, find clusters such that:

- data points in one cluster are more similar to one another;
- data points in separate clusters are less similar to one another.
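One common way to find such clusters is k-means (Lloyd's algorithm), sketched below under the assumption that similarity is measured by Euclidean distance and that the number of clusters is known in advance. The points and the naive choice of starting centroids are hypothetical; k-means is only one of many clustering methods.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Lloyd's k-means: assign each point to its nearest centroid, then
    move each centroid to the mean of the points assigned to it."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the previous centroid if a cluster ends up empty
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical 2-D data points with two obvious groups
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]

# Naive initialisation: use the first and last points as starting centroids
centroids, clusters = kmeans(points, [points[0], points[-1]])
print(clusters)
```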

SOME BASIC OPERATIONS (CONTD.)

Association rules: Given a set of records, each of which contains some number of items from a given collection, produce dependency rules that predict the occurrence of an item based on the occurrences of other items.
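A minimal sketch of rule generation, restricted to single-item rules of the form {a} -> {b}. It assumes the two standard rule measures, support (fraction of records containing both items) and confidence (support of {a, b} divided by support of {a}), and hypothetical thresholds. It brute-forces every item pair, so it is not Apriori or any other production algorithm, which prune the search instead.

```python
from itertools import combinations

def pair_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Enumerate rules {a} -> {b} over item pairs, keeping those whose
    support and confidence reach the given thresholds."""
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item in `itemset`
        return sum(itemset <= t for t in transactions) / n

    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        s = support({a, b})
        if s < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            conf = s / support({lhs})
            if conf >= min_confidence:
                rules.append((lhs, rhs, s, conf))
    return rules

# Hypothetical market-basket records
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]
for lhs, rhs, s, c in pair_rules(transactions):
    print(f"{lhs} -> {rhs}  support={s:.2f} confidence={c:.2f}")
```

For example, "butter -> bread" gets confidence 1.00 because every basket containing butter also contains bread.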

Deviation detection: Detects values that deviate significantly from the mean values.
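A minimal sketch of deviation detection from the mean, assuming a z-score rule: a value is flagged when it lies more than a chosen number of standard deviations from the mean. The threshold of 2 and the spending figures are hypothetical.

```python
import statistics

def deviations(values, threshold=2.0):
    """Return values lying more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []  # all values identical: nothing deviates
    return [v for v in values if abs(v - mean) / stdev > threshold]

# Hypothetical daily card spending with one unusual day
daily_spend = [20, 22, 19, 21, 23, 20, 18, 22, 21, 250]
print(deviations(daily_spend))  # prints [250]
```

A single extreme value also inflates the mean and standard deviation it is measured against, so more robust variants (for example, median-based rules) are often preferred on real data.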

KNOWLEDGE DISCOVERY PROCESS

- Goal: understanding the application domain
- Data selection, acquisition, integration
- Data cleaning: noise, missing data, outliers, etc.
- Exploratory data analysis: dimensionality reduction, transformations
- Data mining: selecting the appropriate method, selecting the algorithm
- Testing and verification
- Interpretation
- Consolidation and use
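Two of the preparation steps above, data cleaning and transformation, can be sketched as small functions. This is a minimal illustration under stated assumptions: records are dictionaries, a missing value is represented as `None` and handled by dropping the record, and the transformation is min-max scaling of one numeric field. The field names and helper names are hypothetical.

```python
def clean(records):
    """Data cleaning: drop records that have a missing (None) field."""
    return [r for r in records if None not in r.values()]

def min_max_normalise(records, field):
    """Transformation: rescale one numeric field to the range [0, 1]."""
    values = [r[field] for r in records]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid division by zero when all values are equal
    return [{**r, field: (r[field] - lo) / span} for r in records]

# Hypothetical raw customer records
raw = [
    {"customer": "c1", "age": 25, "spend": 100.0},
    {"customer": "c2", "age": None, "spend": 300.0},  # missing value
    {"customer": "c3", "age": 45, "spend": 500.0},
]
prepared = min_max_normalise(clean(raw), "spend")
print(prepared)
```

Dropping incomplete records is the simplest cleaning choice; imputing the missing value is a common alternative when records are scarce.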

KNOWLEDGE DETECTION PROCESS


[Figure: Raw Data → Integration → Data Warehouse → Target Data → Transformed Data → Patterns and Rules → Interpretation & Evaluation → Knowledge → Understanding]

MAJOR APPLICATION AREAS


Industry                  Application
Finance                   Credit card analysis
Insurance                 Claims, fraud analysis
Telecommunication         Call record analysis
Transport                 Logistics management
Consumer goods            Promotion analysis
Data service providers    Value-added data
Utilities                 Power usage analysis

SOME APPLICATIONS

- Search Engine: Google's success is due to its ranking algorithm, which relies mainly on the links pointing to a page.
- Molecular Diagnostics: Helps predict the rate of molecule generation within the body in order to spot any abnormal symptoms.
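The link-based ranking mentioned under Search Engine is best known as PageRank. Below is a minimal power-iteration sketch of that idea, assuming the usual damping factor of 0.85 and a hypothetical four-page web; it illustrates the original link-analysis concept and is not Google's production ranking system.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power iteration for PageRank over a dict mapping page -> list of outgoing links."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small base share (the "random jump")
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links[p] or pages  # a page with no outlinks spreads its rank evenly
            share = damping * rank[p] / len(targets)
            for t in targets:
                new[t] += share
        rank = new
    return rank

# Hypothetical web graph: page -> pages it links to
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(web)
print(sorted(ranks, key=ranks.get, reverse=True))  # prints ['C', 'A', 'B', 'D']
```

Page C ranks highest because three pages link to it, while D, which nothing links to, keeps only the base share.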

SOME APPLICATIONS (CONTD.)

Direct Marketing and CRM:
- Most major direct marketing companies use modeling and data mining.
- Most financial companies use customer modeling.
- Modeling is easier than changing customer behaviour.

SOME APPLICATIONS (CONTD.)


AI/Machine Learning:
- Combinatorial/Game Data Mining: good for analyzing winning strategies in games, and thus for developing intelligent AI opponents (e.g., chess).

Business Strategies:
- Market Basket Analysis: identify customer demographics, preferences, and purchasing patterns.
- Risk Analysis
- Product Defect Analysis: analyze product defect rates for given plants and predict possible complications (read: lawsuits) down the line.

SOME APPLICATIONS (CONTD.)

Data Mining for Financial Data Analysis:

- Financial data collected in banks and financial institutions is often relatively complete, reliable, and of high quality
- Design and construction of data warehouses for multidimensional data analysis and data mining
- View debt and revenue changes
- Access statistical information such as max, min, total, average, trend, etc.
- Loan payment prediction / consumer credit policy analysis
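The statistics listed above (max, min, total, average, trend) can be computed in a few lines. This minimal sketch assumes "trend" means the least-squares slope of the values against their period index, i.e. the average change per period; the monthly revenue figures and the `summarise` name are hypothetical.

```python
import statistics

def summarise(series):
    """Max, min, total, average and linear trend of a numeric series.
    Needs at least two values so the trend (slope) is defined."""
    n = len(series)
    xs = range(n)  # period index 0, 1, ..., n-1
    mean_x = (n - 1) / 2
    mean_y = statistics.fmean(series)
    # Trend = least-squares slope: average change in value per period
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, series)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return {"max": max(series), "min": min(series), "total": sum(series),
            "average": mean_y, "trend": slope}

monthly_revenue = [100.0, 110.0, 120.0, 130.0]  # hypothetical figures
print(summarise(monthly_revenue))
```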

SOME APPLICATIONS (CONTD.)


Increasing Sales at a Retail Store:

- Identify customer buying behaviors
- Discover customer shopping patterns and trends
- Improve the quality of customer service
- Achieve better customer retention and satisfaction
- Enhance goods consumption ratios
- Design more effective goods transportation and distribution policies

SOME APPLICATIONS (CONTD.)


Security and Fraud Detection:

- Credit card fraud detection
- Money laundering: FAIS (US Treasury)
- Securities fraud: NASDAQ Sonar system
- Phone fraud: AT&T, Bell Atlantic, British Telecom/MCI

SOME DATA MINING TOOLS


- SGI MineSet
- IBM Intelligent Miner
- SAS Enterprise Miner
- Microsoft SQL Server 2000
- DBMiner (DBMiner Technology Inc.)

ADDITIONAL DATA MINING TECHNIQUES

- Video data mining
- Audio data mining

TECHNOLOGY LIFE CYCLE

Is data mining at the chasm? Existing data mining systems are too generic. Business-specific data mining solutions are needed, with smooth integration of business logic and data mining functions.

CRITICISM OF DATA MINING


Data mining will:
- invade privacy
- generate millions of false positives

While the collection of personal data may be beneficial for companies and consumers, there is also potential for misuse.

But can it be effective?

CONCLUSION
There are always some pitfalls in every technology; it just depends on the intention with which it is used. Data mining can, no doubt, give organizations a cutting edge.
