4 Planning and Project Management 63

1 Chapter Objectives 63
1 Planning Your Data Warehouse 64
1 Key Issue 64
1 Business Requirements, Not Technology 66
1 Top Management Support 67
1 Justifying Your Data Warehouse 67
1 The Overall Plan 68
1 The Data Warehouse Project 69
1 How is it Different? 70
1 Assessment of Readiness 71
1 The Life-Cycle Approach 71
1 The Development Phases 73
1 The Project Team 74
1 Organizing the Project Team 75
1 Roles and Responsibilities 75
1 Skills and Experience Levels 77
1 User Participation 78
1 Project Management Considerations 80
1 Guiding Principles 81
CONTENTS ix
1 Warning Signs 82
1 Success Factors 82
1 Anatomy of a Successful Project 83
1 Adopt a Practical Approach 84
1 Chapter Summary 86
1 Review Questions 86
1 Exercises 87
5 Defining the Business Requirements 89
1 Chapter Objectives 89
1 Dimensional Analysis 90
1 Usage of Information Unpredictable 90
1 Dimensional Nature of Business Data 90
1 Examples of Business Dimensions 92
1 Information Packages—A New Concept 93
1 Requirements Not Fully Determinate 93
1 Business Dimensions 95
1 Dimension Hierarchies/Categories 95
1 Key Business Metrics or Facts 96
1 Requirements Gathering Methods 97
1 Interview Techniques 99
1 Adapting the JAD Methodology 102
1 Review of Existing Documentation 103
1 Requirements Definition: Scope and Content 104
1 Data Sources 105
1 Data Transformation 105
1 Data Storage 105
1 Information Delivery 105
1 Information Package Diagrams 106
1 Requirements Definition Document Outline 106
1 Chapter Summary 106
1 Review Questions 107
1 Exercises 107
6 Requirements as the Driving Force for Data Warehousing 109
1 Chapter Objectives 109
1 Data Design 110
1 Structure for Business Dimensions 112
1 Structure for Key Measurements 112
1 Levels of Detail 113
1 The Architectural Plan 113
1 Composition of the Components 114
1 Special Considerations 115
1 Tools and Products 118
1 Data Storage Specifications 119
1 DBMS Selection 120
1 Storage Sizing 120
1 Information Delivery Strategy 121
1 Queries and Reports 122
1 Types of Analysis 123
1 Information Distribution 123
1 Decision Support Applications 123
1 Growth and Expansion 123
1 Chapter Summary 124
1 Review Questions 124
1 Exercises 125
Part 3 ARCHITECTURE AND INFRASTRUCTURE
7 The Architectural Components 127
1 Chapter Objectives 127
1 Understanding Data Warehouse Architecture 127
1 Architecture: Definitions 127
1 Architecture in Three Major Areas 128
1 Distinguishing Characteristics 129
1 Different Objectives and Scope 130
1 Data Content 130
1 Complex Analysis and Quick Response 131
1 Flexible and Dynamic 131
1 Metadata-driven 132
1 Architectural Framework 132
1 Architecture Supporting Flow of Data 132
1 The Management and Control Module 133
1 Technical Architecture 134
1 Data Acquisition 135
1 Data Storage 138
1 Information Delivery 140
1 Chapter Summary 142
1 Review Questions 142
1 Exercises 143
8 Infrastructure as the Foundation for Data Warehousing 145
1 Chapter Objectives 145
1 Infrastructure Supporting Architecture 145
1 Operational Infrastructure 147
1 Physical Infrastructure 147
1 Hardware and Operating Systems 148
1 Platform Options 150
1 Server Hardware 158
1 Database Software 164
1 Parallel Processing Options 164
1 Selection of the DBMS 166
1 Collection of Tools 167
1 Architecture First, Then Tools 168
1 Data Modeling 169
1 Data Extraction 169
1 Data Transformation 169
1 Data Loading 169
1 Data Quality 169
1 Queries and Reports 170
1 Online Analytical Processing (OLAP) 170
1 Alert Systems 170
1 Middleware and Connectivity 170
1 Data Warehouse Management 170
1 Chapter Summary 170
1 Review Questions 171
1 Exercises 171
9 The Significant Role of Metadata 173
1 Chapter Objectives 173
1 Why Metadata is Important 173
1 A Critical Need in the Data Warehouse 175
1 Why Metadata is Vital for End-Users 177
1 Why Metadata is Essential for IT 179
1 Automation of Warehousing Tasks 181
1 Establishing the Context of Information 183
1 Metadata Types by Functional Areas 183
1 Data Acquisition 184
1 Data Storage 186
1 Information Delivery 186
1 Business Metadata 187
1 Content Overview 188
1 Examples of Business Metadata 188
1 Content Highlights 189
1 Who Benefits? 190
1 Technical Metadata 190
1 2 Content Overview 190
1 2 Examples of Technical Metadata 191
1 2 Content Highlights 192
1 2 Who Benefits? 192
1 How to Provide Metadata 193
1 2 Metadata Requirements 193
1 2 Sources of Metadata 194
1 2 Challenges for Metadata Management 196
1 2 Metadata Repository 196
1 2 Metadata Integration and Standards 198
1 2 Implementation Options 199
1 2 Chapter Summary 200
1 2 Review Questions 201
1 2 Exercises 201
Part 4 DATA DESIGN AND DATA PREPARATION
10 Principles of Dimensional Modeling 203
1 1Chapter Objectives 203
1 1From Requirements to Data Design 203
1 2 Design Decisions 204
1 2 Dimensional Modeling Basics 204
1 2 E-R Modeling Versus Dimensional Modeling 209
1 2 Use of CASE Tools 209
1 1The STAR Schema 210
1 2 Review of a Simple STAR Schema 210
1 2 Inside a Dimension Table 212
1 2 Inside the Fact Table 214
1 2 The Factless Fact Table 216
1 2 Data Granularity 217
1 1STAR Schema Keys 218
1 2 Primary Keys 218
1 2 Surrogate Keys 219
1 2 Foreign Keys 219
1 1Advantages of the STAR Schema 220
1 2 Easy for Users to Understand 220
1 2 Optimizes Navigation 221
1 2 Most Suitable for Query Processing 222
1 2 STARjoin and STARindex 223
1 1Chapter Summary 223
1 1Review Questions 224
1 1Exercises 224
11 Dimensional Modeling: Advanced Topics 225
1 1Chapter Objectives 225
1 1Updates to the Dimension Tables 226
1 2 Slowly Changing Dimensions 226
1 2 Type 1 Changes: Correction of Errors 227
1 2 Type 2 Changes: Preservation of History 228
1 2 Type 3 Changes: Tentative Soft Revisions 230
1 1Miscellaneous Dimensions 231
1 2 Large Dimensions 231
1 2 Rapidly Changing Dimensions 233
1 2 Junk Dimensions 235
1 1The Snowflake Schema 235
1 2 Options to Normalize 235
1 2 Advantages and Disadvantages 238
1 2 When to Snowflake 238
1 1Aggregate Fact Tables 239
1 2 Fact Table Sizes 241
1 2 Need for Aggregates 242
1 2 Aggregating Fact Tables 243
1 2 Aggregation Options 247
1 1Families of STARS 249
1 2 Snapshot and Transaction Tables 250
1 2 Core and Custom Tables 251
1 2 Supporting Enterprise Value Chain or Value Circle 251
1 2 Conforming Dimensions 253
1 2 Standardizing Facts 254
1 2 Summary of Family of STARS 254
1 1Chapter Summary 255
1 1Review Questions 255
1 1Exercises 256
12 Data Extraction, Transformation, and Loading 257
1 1Chapter Objectives 257
1 1ETL Overview 258
1 2 Most Important and Most Challenging 259
1 2 Time-consuming and Arduous 260
1 2 ETL Requirements and Steps 260
1 2 Key Factors 261
1 1Data Extraction 262
1 2 Source Identification 263
1 2 Data Extraction Techniques 263
1 2 Evaluation of the Techniques 270
1 1Data Transformation 271
1 2 Data Transformation: Basic Tasks 272
1 2 Major Transformation Types 273
1 2 Data Integration and Consolidation 275
1 2 Transformation for Dimension Attributes 277
1 2 How to Implement Transformation 277
1 1Data Loading 279
1 2 Applying Data: Techniques and Processes 280
1 2 Data Refresh Versus Update 282
1 2 Procedure for Dimension Tables 283
1 2 Fact Tables: History and Incremental Loads 284
1 2 ETL Summary 285
1 2 ETL Tool Options 285
1 2 Reemphasizing ETL Metadata 286
1 2 ETL Summary and Approach 287
1 1Chapter Summary 288
1 1Review Questions 288
1 1Exercises 289
13 Data Quality: A Key to Success 291
1 1Chapter Objectives 291
1 1Why is Data Quality Critical? 292
1 2 What is Data Quality? 292
1 2 Benefits of Improved Data Quality 295
1 2 Types of Data Quality Problems 296
1 1Data Quality Challenges 299
1 2 Sources of Data Pollution 299
1 2 Validation of Names and Addresses 301
1 2 Costs of Poor Data Quality 302
1 1Data Quality Tools 303
1 2 Categories of Data Cleansing Tools 303
1 2 Error Discovery Features 303
1 2 Data Correction Features 303
1 2 The DBMS for Quality Control 304
1 1Data Quality Initiative 304
1 2 Data Cleansing Decisions 305
1 2 Who Should be Responsible? 307
1 2 The Purification Process 309
1 2 Practical Tips on Data Quality 311
1 1Chapter Summary 311
1 1Review Questions 312
1 1Exercises 312
Part 5 INFORMATION ACCESS AND DELIVERY
14 Matching Information to the Classes of Users 315
1 1Chapter Objectives 315
1 1Information from the Data Warehouse 316
1 2 Data Warehouse Versus Operational Systems 316
1 2 Information Potential 318
1 2 User-Information Interface 321
1 2 Industry Applications 323
1 1Who Will Use the Information? 323
1 2 Classes of Users 323
1 2 What They Need 326
1 2 How to Provide Information 329
1 1Information Delivery 329
1 2 Queries 331
1 2 Reports 332
1 2 Analysis 333
1 2 Applications 334
1 1Information Delivery Tools 335
1 2 The Desktop Environment 335
1 2 Methodology for Tool Selection 335
1 2 Tool Selection Criteria 338
1 2 Information Delivery Framework 340
1 1Chapter Summary 341
1 1Review Questions 341
1 1Exercises 341
15 OLAP in the Data Warehouse 343
1 1Chapter Objectives 343
1 1Demand for Online Analytical Processing 344
1 2 Need for Multidimensional Analysis 344
1 2 Fast Access and Powerful Calculations 345
1 2 Limitations of Other Analysis Methods 347
1 2 OLAP is the Answer 349
1 2 OLAP Definitions and Rules 349
1 2 OLAP Characteristics 352
1 1Major Features and Functions 353
1 2 General Features 353
1 2 Dimensional Analysis 353
1 2 What are Hypercubes? 357
1 2 Drill-Down and Roll-Up 360
1 2 Slice-and-Dice or Rotation 362
1 2 Uses and Benefits 363
1 1OLAP Models 363
1 2 Overview of Variations 364
1 2 The MOLAP Model 365
1 2 The ROLAP Model 366
1 2 ROLAP Versus MOLAP 367
1 1OLAP Implementation Considerations 368
1 2 Data Design and Preparation 368
1 2 Administration and Performance 370
1 2 OLAP Platforms 372
1 2 OLAP Tools and Products 373
1 2 Implementation Steps 374
1 1Chapter Summary 374
1 1Review Questions 374
1 1Exercises 375
16 Data Warehousing and the Web 377
1 1Chapter Objectives 377
1 1Web-Enabled Data Warehouse 378
1 2 Why the Web? 378
1 2 Convergence of Technologies 380
1 2 Adapting the Data Warehouse for the Web 381
1 2 The Web as a Data Source 382
1 1Web-Based Information Delivery 383
1 2 Expanded Usage 383
1 2 New Information Strategies 385
1 2 Browser Technology for the Data Warehouse 387
1 2 Security Issues 389
1 1OLAP and the Web 389
1 2 Enterprise OLAP 389
1 2 Web-OLAP Approaches 390
1 2 OLAP Engine Design 390
1 1Building a Web-Enabled Data Warehouse 391
1 2 Nature of the Data Webhouse 391
1 2 Implementation Considerations 393
1 2 Putting the Pieces Together 394
1 2 Web Processing Model 394
1 1Chapter Summary 396
1 1Review Questions 396
1 1Exercises 396
17 Data Mining Basics 399
1 1Chapter Objectives 399
1 1What is Data Mining? 400
1 2 Data Mining Defined 401
1 2 The Knowledge Discovery Process 402
1 2 OLAP Versus Data Mining 404
1 2 Data Mining and the Data Warehouse 406
1 1Major Data Mining Techniques 408
1 2 Cluster Detection 409
1 2 Decision Trees 411
1 2 Memory-Based Reasoning 413
1 2 Link Analysis 415
1 2 Neural Networks 417
1 2 Genetic Algorithms 418
1 2 Moving into Data Mining 419
1 1Data Mining Applications 422
1 2 Benefits of Data Mining 423
1 2 Applications in Retail Industry 424
1 2 Applications in Telecommunications Industry 425
1 2 Applications in Banking and Finance 426
1 1Chapter Summary 426
1 1Review Questions 426
1 1Exercises 427
Part 6 IMPLEMENTATION AND MAINTENANCE
18 The Physical Design Process 429
1 1Chapter Objectives 429
1 1Physical Design Steps 430
1 2 Develop Standards 430
1 2 Create Aggregates Plan 431
1 2 Determine the Data Partitioning Scheme 431
1 2 Establish Clustering Options 432
1 2 Prepare an Indexing Strategy 432
1 2 Assign Storage Structures 432
1 2 Complete Physical Model 433
1 1Physical Design Considerations 433
1 2 Physical Design Objectives 433
1 2 From Logical Model to Physical Model 434
1 2 Physical Model Components 435
1 2 Significance of Standards 436
1 1Physical Storage 438
1 2 Storage Area Data Structures 439
1 2 Optimizing Storage 440
1 2 Using RAID Technology 442
1 2 Estimating Storage Sizes 442
1 1Indexing the Data Warehouse 443
1 2 Indexing Overview 443
1 2 B-Tree Index 445
1 2 Bitmapped Index 446
1 2 Clustered Indexes 448
1 2 Indexing the Fact Table 448
1 2 Indexing the Dimension Tables 449
1 1Performance Enhancement Techniques 449
1 2 Data Partitioning 449
1 2 Data Clustering 450
1 2 Parallel Processing 450
1 2 Summary Levels 451
1 2 Referential Integrity Checks 451
1 2 Initialization Parameters 451
1 2 Data Arrays 452
1 1Chapter Summary 452
1 1Review Questions 452
1 1Exercises 453
19 Data Warehouse Deployment 455
1 1Chapter Objectives 455
1 1Major Deployment Activities 456
1 2 Complete User Acceptance 456
1 2 Perform Initial Loads 457
1 2 Get User Desktops Ready 458
1 2 Complete Initial User Training 459
1 2 Institute Initial User Support 460
1 2 Deploy in Stages 460
1 1Considerations for a Pilot 462
1 2 When Is a Pilot Data Mart Useful? 462
1 2 Types of Pilot Projects 463
1 2 Choosing the Pilot 465
1 2 Expanding and Integrating the Pilot 466
1 1Security 467
1 2 Security Policy 467
1 2 Managing User Privileges 468
1 2 Password Considerations 469
1 2 Security Tools 469
1 1Backup and Recovery 470
1 2 Why Back Up the Data Warehouse? 470
1 2 Backup Strategy 471
1 2 Setting Up a Practical Schedule 472
1 2 Recovery 472
1 1Chapter Summary 473
1 1Review Questions 474
1 1Exercises 474
20 Growth and Maintenance 477
1 1Chapter Objectives 477
1 1Monitoring the Data Warehouse 478
1 2 Collection of Statistics 478
1 2 Using Statistics for Growth Planning 480
1 2 Using Statistics for Fine-Tuning 480
1 2 Publishing Trends for Users 481
1 1User Training and Support 481
1 2 User Training Content 482
1 2 Preparing the Training Program 482
1 2 Delivering the Training Program 484
1 2 User Support 485
1 1Managing the Data Warehouse 487
1 2 Platform Upgrades 487
1 2 Managing Data Growth 488
1 2 Storage Management 488
1 2 ETL Management 489
1 2 Data Model Revisions 489
1 2 Information Delivery Enhancements 489
1 2 Ongoing Fine-Tuning
