AND
DATA MINING
A Comprehensive Guide for Students and IT Professionals
(Choice Based Credit System (CBCS) Pattern) New Syllabus
(For B.Sc. Computer Science, B.Sc. Software Computer Science, B.Sc. ISM, B.Sc. IT,
B.Sc. Software System, B.Sc. Software Engineering, BCA, M.Sc. Computer Science,
M.Sc. Information Technology, M.Sc. Information System and Management, M.Sc.
Software Engineering, MCA, B.E. CSE, B.Tech. IT, M.E. CSE, M.Tech. IT, M.Phil., and
IT Professionals)
By
CONTENTS
Preface
Acknowledgement
PART I
DATA MINING
Chapter 1
Introduction
Chapter 3
3.1 Introduction
3.2 Data selection
3.3 Cleaning
3.4 Coding
3.5 Data mining
3.5.1 Preliminary analysis of the data set using traditional query tools
3.5.1.1 Visualization techniques
3.5.1.2 Likelihood and distance
3.5.1.3 OLAP tools
3.5.1.4 K-nearest neighbor
3.5.1.5 Decision Trees
3.5.1.6 Association Rules
3.5.1.7 Neural networks
3.5.1.8 Genetic algorithms
3.6 Reporting
Summary
Review Questions
Chapter 4
KDD Environment
Data Mining
Advanced Databases
11.1 Introduction
11.2 A Statistical Perspective on Data Mining
11.2.1 Point Estimation
11.2.2 Models Based on Summarization
11.2.3 Bayes Theorem
11.2.4 Hypothesis Testing
11.2.5 Regression and Correlation
11.3 Similarity Measures
11.4 Decision Trees
11.5 Neural Networks
Data Preprocessing
12.1 Introduction
12.2 Why preprocess the data / Need for preprocessing
12.3 Data Preprocessing Techniques / Major Tasks in Data Preprocessing
12.4 Data Cleaning
12.4.1 Missing Data / Values
12.4.1.1 Methods of handling missing data
12.4.2 Noisy Data
12.4.2.1 How to Handle Noisy Data?
12.4.3 Outlier Analysis
12.4.4 Regression
12.5 Data Cleaning as a Process
12.5.1 Discrepancy detection
12.5.2 Discrepancy Detection Tools
12.5.3 Data Transformation
12.5.4 Data Transformation Tools
12.6 Data Integration
12.6.1 Issues to be considered in Data Integration
12.6.1.1 Schema integration
12.6.1.2 Redundancy
12.6.1.3 Detecting and resolving data value conflicts
12.6.2 Handling Redundant Data in Data Integration
12.7 Data Transformation
12.7.1 Methods of Data Normalization
12.7.1.1 Min-max normalization
12.7.1.2 z-score normalization
12.7.1.3 Normalization by decimal scaling
12.8 Data Reduction
12.8.1 Data Reduction Strategies
12.8.1.1 Data Cube Aggregation
12.8.1.2 Attribute Subset Selection
12.8.1.3 Dimensionality Reduction
Association Rules
Chapter 14
Summary
Review Questions
Chapter 15
Summary
Review Questions
Chapter 16
16.1 Introduction
Classification
17.1 Classification
17.1.1 Classification: A Two-Step Process
17.1.2 Prediction
17.1.3 Issues regarding classification and prediction
17.1.4 Comparing Classification and Prediction Methods
17.2 Classification by decision tree induction
17.2.1 Decision Tree Induction
17.2.2 Attribute Selection Measure
17.2.3 Information Gain (ID3/C4.5)
17.2.4 Gini Index (IBM IntelligentMiner)
17.2.5 Extracting Classification Rules from Trees
17.2.6 Avoid Overfitting in Classification
Clustering
Cluster Analysis
19.9.3 CURE
19.9.4 ROCK
19.9.5 CHAMELEON
19.10 Density-Based Methods
19.10.1 DBSCAN
19.10.2 OPTICS
19.10.3 DENCLUE
19.11 Grid-Based Methods
19.11.1 STING
19.11.2 WaveCluster
19.11.3 CLIQUE
19.12 Model-Based Clustering Methods
19.12.1 Expectation Maximization (EM)
19.12.2 Conceptual clustering
19.12.3 Neural network approaches
19.13 Outlier Analysis
19.13.1 Outlier Discovery: Statistical Approaches
19.13.2 Outlier Discovery: Distance-Based Approach
19.13.3 Outlier Discovery: Deviation-Based Approach
Summary
Review Questions
Chapter 20
Chapter 22
22.1 Introduction
22.2 Characteristics of Data Warehouse
Data warehousing
Chapter 25
28.1 Tool categories
28.1.1 Reporting tools
28.1.2 Managed Query Tools
28.1.3 Executive information tools
28.1.4 OLAP tools
28.1.5 Data mining tools
28.2 Need for application
28.3 Cognos Impromptu
28.4 Applications
28.4.1 PowerBuilder
Summary
Review Questions
Chapter 29
29.1 Introduction
29.2 Need for OLAP
29.3 Multidimensional data model
29.3.1 From Tables and Spreadsheets to Data Cubes
29.4 OLAP Guidelines / OLAP Product Evaluation Rules
29.5 Data Warehouse Schema / OLAP Schema
29.5.1 Star Schema
29.5.2 Star Schema Keys
29.5.3 Advantages of Star schema
29.5.4 Snowflake Schema
Security
30.1 Introduction
30.2 Requirements
30.2.1 User Access
30.2.2 Legal Requirements
30.2.3 Audit Requirements
30.2.4 Network Requirements
30.2.5 Data Movement
30.2.6 Documentation
30.2.7 High-Security Environments
30.3 Performance Impact of Security
30.3.1 Views
30.3.2 Data Movement
30.4 Security Impact on Design
30.4.1 Application Development
30.4.2 Database Design
30.4.3 Testing
Summary
Review Questions
Chapter 31
31.1 Introduction
31.2 Definition of Types of System
31.3 Defining the SLA
31.3.1 User Requirements
31.3.2 System Requirements
Summary
Review Questions
Chapter 32
32.1 Introduction
32.2 Day-to-Day Operations of the Data Warehouse
32.3 Overnight Processing
Summary
Review Questions
Chapter 33
Capacity Planning
33.1 Process
33.2 Estimating the Load
33.2.1 Initial Configuration
33.2.2 How Much CPU Bandwidth?
33.2.3 How Much Memory?
33.2.4 How Much Disk?
Summary
Review Questions
Chapter 34
Review Questions
Chapter 35
35.1 Introduction
35.1.1 Types of Backup
35.2 Data Warehouse Recovery Models
35.3 Define Backup and Recovery Strategy
35.4 Security Impact on Design of Data Warehouse
35.4.1 Application Development
35.4.2 Database Design
35.4.3 Testing
35.5 Disaster Recovery
Summary
Review Questions
APPENDIX A: Glossary
APPENDIX B: Two-Mark Questions with Answers
APPENDIX C: Past University Question Papers
BIBLIOGRAPHY