Data Warehousing and Data Mining

Data Warehousing
A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision-making process.

Data Warehousing
Subject-oriented : information on each factor (Marketing, Finance, HR, Production)
Integrated : collected, stored and merged together
Time-variant : identified with a particular time period
Non-volatile : stable data; data is added and rarely removed

Database vs. Data warehouse


Database :
Used to collect, store and record data of all transactions
Designed and optimized to respond to fundamental / standard queries
For OLTP (Online Transaction Processing)

Data Warehouse :
Used to combine data from all business transactions, categorize and integrate it
Designed to respond to analytical questions that are critical to the business
For OLAP (Online Analytical Processing)
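
As a rough illustration of the OLTP / OLAP contrast, the sketch below uses Python's built-in sqlite3 module. The `sales` table, its columns and its rows are hypothetical and chosen only for this example. A single in-memory table stands in for both systems, which a real deployment would keep separate.

```python
import sqlite3

# Hypothetical table and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales ("
    "id INTEGER PRIMARY KEY, region TEXT, year INTEGER, amount REAL)"
)

# OLTP-style work: record individual transactions as they occur.
rows = [
    ("North", 2023, 120.0),
    ("North", 2024, 150.0),
    ("South", 2023, 80.0),
    ("South", 2024, 95.0),
]
conn.executemany(
    "INSERT INTO sales (region, year, amount) VALUES (?, ?, ?)", rows
)

# OLTP-style query: a standard lookup of one transaction by its key.
one_sale = conn.execute(
    "SELECT region, amount FROM sales WHERE id = ?", (1,)
).fetchone()

# OLAP-style query: an analytical question answered by
# summarizing across many transactions.
totals_by_region = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

print(one_sale)
print(totals_by_region)
```

The first query touches one row, while the second aggregates over the whole table, which is the kind of workload a data warehouse is designed for.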

Every data warehouse is a database, but not every database is a data warehouse.

Data Warehousing
Data Marts : A small data warehouse that holds information about a particular subject. A data mart is a subset of a data warehouse in which a summarized or highly focused portion of the data is placed for a specific population of users.

Data Warehousing
Data Sources
Production Data : current input from various TPS
Internal Data : about customers, products
Archived Data : old data / previous TPS data
External Data : from the business environment

Data Warehousing
General process in data warehousing:
Business data, External data -> ETL -> DW

ETL (Extract, Transform, Load) represents the data movement and transformation process, also called the data staging process.
Extract : Information and inputs are taken from business transactions and the external environment
Transform : Standardizing, optimizing and cleansing the data to fit the business need
Load : Recording and storing the data into target systems in the data warehouse
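
The three ETL steps can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a production pipeline: the record fields (`customer`, `amount`), the sample rows, the cleansing rules and the `dw_sales` table are all hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import sqlite3


def extract():
    """Extract: gather raw records from business and external sources."""
    # Hypothetical stand-ins for transaction data and external data.
    business = [
        {"customer": " alice ", "amount": "120.50"},
        {"customer": "BOB", "amount": "80"},
        {"customer": "", "amount": "15"},
    ]
    external = [{"customer": "Carol", "amount": "n/a"}]
    return business + external


def transform(records):
    """Transform: standardize and cleanse records before loading."""
    cleaned = []
    for record in records:
        name = record["customer"].strip().title()  # standardize names
        try:
            amount = float(record["amount"])  # standardize amounts
        except ValueError:
            continue  # cleanse: drop rows with unusable amounts
        if not name:
            continue  # cleanse: drop rows with no customer name
        cleaned.append((name, amount))
    return cleaned


def load(rows, conn):
    """Load: store the cleaned rows in the target warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dw_sales (customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO dw_sales VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # assumed stand-in for the warehouse
load(transform(extract()), conn)
loaded = conn.execute(
    "SELECT customer, amount FROM dw_sales ORDER BY customer"
).fetchall()
print(loaded)
```

Of the four raw records, only the two that survive cleansing reach the warehouse table.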

Data Warehousing
Data Storage
Information Delivery
Metadata : Index (similar to a dictionary)
Operational Metadata
Extraction and Transformation Metadata
End-user Metadata

Data Mining
Data mining is the process of discovering meaningful patterns, trends and relationships, often previously unknown, by sifting through large amounts of data using various techniques. The results can be used to predict future behaviour and help in decision making.

Data Mining
Discovering meaningful patterns, trends and relationships
Large amount of data
Various techniques
Predict future behaviour
Helps in decision making

Data Mining
Data Mining Algorithm : A set of rules and a sequential process to be followed while extracting relevant data from a data warehouse.
Eg :
Requirement : Determine all credit card holders who have their birthday during the month of March.
Algorithm : If MM in DDMMYYYY is equal to March, display First Name, Last Name and Contact Number
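
The birthday rule above could be written roughly as follows. This is a sketch that assumes dates of birth are stored as eight-character DDMMYYYY strings; the field names (`first_name`, `last_name`, `contact`, `dob`) and the sample card holders are hypothetical.

```python
def born_in_march(holders):
    """Return (first name, last name, contact) for holders born in March."""
    result = []
    for holder in holders:
        month = holder["dob"][2:4]  # characters 3-4 of DDMMYYYY are MM
        if month == "03":
            result.append(
                (holder["first_name"], holder["last_name"], holder["contact"])
            )
    return result


# Hypothetical card holders, for illustration only.
holders = [
    {"first_name": "Asha", "last_name": "Rao",
     "contact": "555-0101", "dob": "15031985"},
    {"first_name": "Ben", "last_name": "Lee",
     "contact": "555-0102", "dob": "03121990"},
]

print(born_in_march(holders))
```

The second holder was born on the 3rd of December, so the rule must test the month positions of the string and not the day positions.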

Data Mining
Data mining algorithms need a large amount of data at the detailed level
Data stored in data warehouses after the ETL process is ideal for mining
Predicts the future course of action based on why / how a trend is happening
Data driven, not user driven
Large number of dimensions

Data Mining Techniques


Cluster Detection
Decision Trees
Link Analysis : Associations, Sequential patterns
Data Visualization

Data Mining
Applications :
Customer profiling
Fraud detection
Market Basket Analysis
Medical future of a person
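
To give a feel for Market Basket Analysis, the sketch below counts how often pairs of items are bought together and reports each pair's support, meaning the fraction of baskets that contain both items. The baskets are invented for illustration, and pair counting is only one simple way to approach the task; it is not presented as the method used by any particular tool.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "jam"},
    {"bread", "milk", "jam"},
]


def pair_support(baskets):
    """Return the fraction of baskets that contain each pair of items."""
    counts = Counter()
    for basket in baskets:
        # Sort the items so each pair is always counted under the same key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    total = len(baskets)
    return {pair: count / total for pair, count in counts.items()}


support = pair_support(baskets)
for pair, value in sorted(support.items(), key=lambda kv: -kv[1]):
    print(pair, value)
```

Pairs with high support, such as bread and milk in this sample, are candidates for association rules.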

Data Mining Software

DataLogic/R (Reduct Systems)
IDIS:2 (Intelligent Ware)
IBM Intelligent Miner
Pilot Discovery Server
KnowledgeSeeker (Angoss)