by George M. Marakas
2003, Prentice-Hall
Chapter 1 - 1
1-1: The Modern Data Warehouse
A data warehouse is a copy of transaction data specifically structured for querying, analysis and reporting. Note that the data warehouse contains a copy of the transactions; these copies are not updated or changed later by the transaction system. Also note that this data is specially structured and may have been transformed when it was placed in the warehouse.
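To make the "copy, structured and possibly transformed" idea concrete, here is a minimal Python sketch. It assumes a hypothetical source table of sales transactions held as dictionaries; the function `load_into_warehouse` and the chosen transformation (cents converted to dollars, a `load_date` column added) are illustrative assumptions, not a prescribed warehouse design. The point it shows is that the warehouse holds independent copies, so later changes in the transaction system do not alter them.

```python
import copy
from datetime import date

# Hypothetical rows from a transaction (operational) system.
source_transactions = [
    {"txn_id": 1, "store": "north", "amount_cents": 1250},
    {"txn_id": 2, "store": "south", "amount_cents": 980},
]

def load_into_warehouse(rows, load_date):
    """Return transformed, independent copies of the source rows.

    The transformation (cents -> dollars, plus a load_date column)
    is an assumption made for illustration only.
    """
    warehouse_rows = []
    for row in rows:
        snapshot = copy.deepcopy(row)  # a copy, never a reference to the source row
        snapshot["amount"] = snapshot.pop("amount_cents") / 100
        snapshot["load_date"] = load_date.isoformat()
        warehouse_rows.append(snapshot)
    return warehouse_rows

warehouse = load_into_warehouse(source_transactions, date(2003, 1, 31))

# The transaction system later changes its own record...
source_transactions[0]["amount_cents"] = 0
# ...but the warehouse copy is unaffected.
print(warehouse[0])
```

In a real warehouse the load would write to database tables rather than a Python list, but the separation between source rows and warehouse copies is the same.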
1-2: Data Warehouse Roles and Structures
The DW has the following primary functions:
- It is a direct reflection of the business rules of the enterprise.
- It is the collection point for strategic information.
- It is the historical store of strategic information.
- It is the source of information later delivered to data marts.
- It is the source of stable data regardless of how the business processes may change.
Data Marts
A data mart is a smaller, more focused data warehouse. It reflects the business rules of a specific business unit. The data mart does not need to cleanse its data, because that was done when the data went into the warehouse. It is a set of tables for direct access by users, and these tables are designed for aggregation. It is typically not a source for traditional statistical analysis.
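A minimal Python sketch of deriving a data mart from warehouse rows, assuming hypothetical column names (`unit`, `region`, `month`, `sales`) and a hypothetical helper `build_data_mart`. It reflects the points above: the mart serves one business unit, skips cleansing because the warehouse already did it, and stores pre-aggregated totals for direct access.

```python
from collections import defaultdict

# Hypothetical warehouse rows that were already cleansed on load.
warehouse = [
    {"unit": "retail", "region": "east", "month": "2003-01", "sales": 120.0},
    {"unit": "retail", "region": "east", "month": "2003-01", "sales": 80.0},
    {"unit": "retail", "region": "west", "month": "2003-01", "sales": 50.0},
    {"unit": "wholesale", "region": "east", "month": "2003-01", "sales": 400.0},
]

def build_data_mart(rows, business_unit):
    """Return a small summary table for one business unit.

    No cleansing step here: the sketch assumes the warehouse did it.
    Sales are pre-aggregated by (region, month) for direct user access.
    """
    totals = defaultdict(float)
    for row in rows:
        if row["unit"] == business_unit:
            totals[(row["region"], row["month"])] += row["sales"]
    return dict(totals)

retail_mart = build_data_mart(warehouse, "retail")
print(retail_mart)  # {('east', '2003-01'): 200.0, ('west', '2003-01'): 50.0}
```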
[Figure: Data Delivery from the data warehouse to several Data Marts]
Financial Analysis
- Compare actuals to budgets on an annual, monthly and month-to-date basis.
- Review past cash flow trends and forecast future needs.
- Identify and analyze key expense generators.
- Instantly generate a current set of key financial ratios and indicators.
- Receive near-real-time, interactive financial statements.
Human Resource Analysis
- Evaluate trends in benefit program use.
- Identify wage and benefit costs to determine company-wide variation.
- Review compliance levels for EEOC and other regulated activities.
Other Areas
Warehouses have also been applied to areas such as logistics, inventory, purchasing, detailed transaction analysis and load balancing.
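The first financial task, comparing actuals to budgets, comes down to simple arithmetic once the data sits in one place. The sketch below is a hypothetical illustration: the daily figures and the helpers `variance` and `month_to_date` are made up for this example. The same `variance` helper applies unchanged to monthly or annual totals.

```python
def variance(actual, budget):
    """Return (absolute variance, variance as a fraction of budget)."""
    diff = actual - budget
    return diff, diff / budget

def month_to_date(values, day):
    """Sum of the values for days 1..day of the month."""
    return sum(values[:day])

# Hypothetical daily figures for the first five days of a month.
daily_actual = [1000, 1200, 900, 1100, 1300]
daily_budget = [1000, 1000, 1000, 1000, 1000]

mtd_actual = month_to_date(daily_actual, 5)
mtd_budget = month_to_date(daily_budget, 5)
diff, pct = variance(mtd_actual, mtd_budget)
print(f"MTD actual {mtd_actual}, budget {mtd_budget}, variance {diff} ({pct:.1%})")
# MTD actual 5500, budget 5000, variance 500 (10.0%)
```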
One-Time Costs
- Hardware: disk, CPU, network, terminal analysis
- Software: DBMS, terminal analysis, middleware, network, log utility
- Processing: metadata infrastructure, integration/transformation processing specification, metadata infrastructure population, system of record definition, data dictionary language definition, network transfer definition, CASE/repository interface, initial data warehouse population, data model definition, database design definition
Operational Costs
- Ongoing refreshment
- Integration/transformation
- Data model maintenance
- Record identification maintenance
- Metadata infrastructure maintenance
- Archival of data
- Data aging within the DW
Data download and transformation add to the retrieval cost of every report or screen generated.
[Figure: reports drawing directly on Source Systems A, B, C and D]
Data upload and transformation costs occur just once. Retrieval costs are lower.
[Figure: source systems loaded once into the data warehouse]
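The contrast between the two figures can be put as a simple cost model. This is a hypothetical sketch: the unit costs and the functions below are assumptions, and real costs are far less uniform. Without a warehouse, each of n reports pays extraction E plus retrieval R, giving n(E + R). With a warehouse, a one-time build cost B and a single extraction are paid up front, giving B + E + nR. The warehouse becomes cheaper once nE > B + E, that is, once n > 1 + B/E.

```python
def cost_without_warehouse(reports, extract_cost, retrieval_cost):
    """Every report re-downloads and re-transforms the source data."""
    return reports * (extract_cost + retrieval_cost)

def cost_with_warehouse(reports, build_cost, extract_cost, retrieval_cost):
    """One-time build plus a single upload/transform; reports only retrieve."""
    return build_cost + extract_cost + reports * retrieval_cost

def break_even_reports(build_cost, extract_cost, retrieval_cost):
    """Smallest number of reports for which the warehouse is strictly cheaper."""
    if extract_cost <= 0:
        raise ValueError("extract_cost must be positive for a break-even to exist")
    n = 1
    while (cost_with_warehouse(n, build_cost, extract_cost, retrieval_cost)
           >= cost_without_warehouse(n, extract_cost, retrieval_cost)):
        n += 1
    return n

# Hypothetical unit costs: build 5000, extraction 50, retrieval 2 per report.
print(cost_without_warehouse(100, 50, 2))    # 5200
print(cost_with_warehouse(100, 5000, 50, 2))  # 5250
print(break_even_reports(5000, 50, 2))        # 102
```

The break-even point grows with the one-time costs listed earlier, which is why the payoff of a warehouse depends on how heavily it is queried.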
The data mart serves the needs of one business unit, not the organization.
Data Marts
- Departmentalized
- Summarized, aggregated data
- Star join design
- Limited historical data
- Limited data volume
- Requirements-driven data
- Focused on departmental needs
- Multidimensional DBMS technologies
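A "star join design" places a central fact table among several dimension tables and joins them for aggregated queries. The sketch below uses Python's built-in `sqlite3` module with hypothetical table and column names. SQLite is a relational engine, not a multidimensional DBMS, so this only illustrates the shape of a star schema and a star join query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_month   (month_id   INTEGER PRIMARY KEY, month TEXT);
CREATE TABLE fact_sales  (product_id INTEGER REFERENCES dim_product,
                          month_id   INTEGER REFERENCES dim_month,
                          amount     REAL);
INSERT INTO dim_product VALUES (1, 'hardware'), (2, 'software');
INSERT INTO dim_month   VALUES (1, '2003-01'), (2, '2003-02');
INSERT INTO fact_sales  VALUES (1, 1, 100.0), (2, 1, 40.0),
                               (1, 2, 60.0),  (2, 2, 90.0), (1, 2, 15.0);
""")

# Star join: the central fact table joined to each dimension, then aggregated.
rows = cur.execute("""
SELECT p.category, m.month, SUM(f.amount) AS total
FROM fact_sales AS f
JOIN dim_product AS p ON f.product_id = p.product_id
JOIN dim_month   AS m ON f.month_id   = m.month_id
GROUP BY p.category, m.month
ORDER BY p.category, m.month
""").fetchall()

for row in rows:
    print(row)
conn.close()
```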
A General Approach
Although every data mining endeavor is unique, they share a common set of process steps:
1. Infrastructure preparation: choice of hardware platform, the database system and one or more mining tools.
2. Exploration: looking at summary data, sampling and applying intuition.
3. Analysis: each discovered pattern is analyzed for significance and trends.
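The exploration step can be sketched with Python's standard `statistics` and `random` modules. The order values and the helpers `summarize` and `draw_sample` are hypothetical. Scanning the summary is where intuition enters: here the mean (24.3) sits well above the median (15.75), hinting at an outlier worth a closer look.

```python
import random
import statistics

# Hypothetical column of order values pulled from the warehouse.
order_values = [12.0, 15.5, 9.0, 22.0, 18.5, 30.0, 11.0, 14.0, 95.0, 16.0]

def summarize(values):
    """Summary statistics an analyst might scan during exploration."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }

def draw_sample(values, k, seed=0):
    """Simple random sample without replacement, seeded for repeatability."""
    rng = random.Random(seed)
    return rng.sample(values, k)

summary = summarize(order_values)
sample = draw_sample(order_values, 4)
print(summary)
print(sample)
```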
In general, a correlation coefficient is a number between 0 and 1 that shows the strength of a relationship. Some types of correlation are signed (±), ranging from -1 to +1, to also show the direction of the relationship. Even a weak correlation can be interesting, however, if it shows a trend over time.
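The Pearson coefficient is a common signed correlation: its sign gives the direction of a linear relationship and its magnitude the strength. Below is a minimal sketch from the textbook formula, cov(x, y) divided by the square root of var(x) times var(y). The advertising and sales figures are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient r, in the range -1 to +1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical monthly advertising spend and sales.
advertising = [1, 2, 3, 4, 5]
sales = [2, 4, 5, 4, 5]
r = pearson(advertising, sales)
print(round(r, 3))  # 0.775: a fairly strong positive relationship
```

Python 3.10 and later also provide `statistics.correlation`, which computes the same Pearson coefficient.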
Dr. John Snow used a map of cholera deaths (London, 1854) to show that the source of the outbreak was a single water pump, demonstrating that the disease was waterborne.
One of today's more useful types of visualization is the simulator (both in games and in practice). This is the only way most of us will ever fly a Boeing 747.
It is now both cheaper and safer to train commercial pilots on simulators. With good software, pilots can be placed in situations they might otherwise never encounter until it is too late, in a real cockpit.