Composed of rules, procedures and functions that enable the data warehouse to work and fulfill the business requirements. Made up of technology that empowers the data warehouse.
[Figure: data warehouse architecture diagram. Recoverable labels: Source Data, Operational data source 1, Data Acquisition, Information Delivery, Meta-data, Lightly summarized data, Detailed data, DBMS, Manage, Data mining, Second Tier.]
Distinguishing Characteristics
Different Objectives and Scope
The DW architecture must have components that work to provide data to users in large volumes in a single session.
Different Objectives and Scope (contd.)
Defining the scope of the DW is also difficult. Factors to be considered are:
* Number and extent of data sources
* Data granularity and data volumes
* Impact of the DW on existing operational systems
Scope is measured in terms of data transformation and integration functions.
Data Content
Read-only data in the DW is the primary component of the architecture. The architecture should support business subjects as well as high data volumes.
Complex Analysis and Quick Response
The architecture should support complex analysis of strategic information, since information retrieval is complex. Users must be able to:
* Drill down, roll up, slice and dice the data
* Review results in different output options
* View results in tabular as well as graphical format
Finally, the architecture should provide a platform for making rapid decisions and dealing with situations quickly.
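The drill-down, roll-up, slice and dice operations named above can be illustrated with a minimal sketch. The fact rows, dimension names, and function names below are hypothetical and chosen only for illustration; a real OLAP tool would run these operations against the DW rather than an in-memory list.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, quarter, region, product, sales)
SALES = [
    (2023, "Q1", "East", "Widget", 100),
    (2023, "Q1", "West", "Widget", 80),
    (2023, "Q2", "East", "Gadget", 50),
    (2023, "Q2", "West", "Widget", 70),
    (2024, "Q1", "East", "Gadget", 60),
]
# Position of each dimension inside a fact row (an assumption of this sketch).
DIMS = {"year": 0, "quarter": 1, "region": 2, "product": 3}


def aggregate(rows, by):
    """Sum sales grouped by the given list of dimension names."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[DIMS[d]] for d in by)
        totals[key] += row[4]
    return dict(totals)


def slice_(rows, dim, value):
    """Slice: fix one dimension to a single value."""
    return [r for r in rows if r[DIMS[dim]] == value]


def dice(rows, **allowed):
    """Dice: keep rows whose dimensions fall within the allowed value sets."""
    return [
        r for r in rows
        if all(r[DIMS[d]] in values for d, values in allowed.items())
    ]


roll_up = aggregate(SALES, ["year"])                # coarser grain
drill_down = aggregate(SALES, ["year", "quarter"])  # finer grain
east_only = slice_(SALES, "region", "East")
widget_2023 = dice(SALES, year={2023}, product={"Widget"})
```

Roll-up and drill-down are the same aggregation at different grains; slice and dice only restrict which rows take part.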
Flexible and Dynamic
The DW architecture should be flexible enough to accommodate additional requirements as and when they arise. Ex.: items missed in the business requirements, or those that arise because of changes in business conditions.
Meta Data-Driven
Meta data holds data about every phase of the data movement. It interleaves with and connects the other components.
Questions
1. What are the three major architectural components?
2. State any three characteristics of DW architecture.
3. What is the use of meta data?
Answers
1. Data acquisition, data storage, data delivery.
2. The architecture should support different objectives and scope, read-only data, and complex analysis and quick response; it should also be flexible and dynamic, and meta data-driven.
3. Meta data details every phase of data movement, and it interleaves with and connects the other components.
TECHNICAL ARCHITECTURE
The technical architecture of a DW is the complete set of functions and services provided within its components.
* It includes the procedures and rules required to perform the functions and provide the services.
* It includes the data stores needed for each component to provide the services.
* Tools are the means to implement the architecture.
Components
* Data Acquisition. Data Flow: Data Extraction, Data Transformation, Data Staging
* Data Storage. Data Flow: Flow, Data Groups, Data Repository
* Information Delivery. Data Flow: Flow, Service Locations, Data Stores
* Metadata
DATA ACQUISITION
1. Extract data from the data sources.
2. Move it to the staging area.
3. Prepare the data for loading into the DW.
Components: source data and data staging.
Data Flow
Flow: Data flow begins at the data sources and pauses at the staging area. After transformation and integration, the data is ready for loading into the DW repository.
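The three acquisition steps above can be sketched as a small pipeline. This is only an illustration under stated assumptions: the CSV layout, the field names, and the use of a Python list as the "staging area" are all hypothetical stand-ins for real operational extracts and staging storage.

```python
import csv
import io

# Hypothetical extract from one operational source, as CSV text.
SOURCE_CSV = "cust_id,name,amount\n1, Alice ,10.50\n2,Bob,7\n"


def extract(source_text):
    """Step 1: pull raw records out of the data source."""
    return list(csv.DictReader(io.StringIO(source_text)))


def move_to_staging(records, staging):
    """Step 2: park the raw records in the staging area (here, a list)."""
    staging.extend(records)
    return staging


def prepare_for_load(staging):
    """Step 3: turn staged records into typed, cleaned, load-ready rows."""
    return [
        (int(r["cust_id"]), r["name"].strip(), float(r["amount"]))
        for r in staging
    ]


staging_area = []
move_to_staging(extract(SOURCE_CSV), staging_area)
load_ready = prepare_for_load(staging_area)
```

The data "pauses" in `staging_area` in raw form; only `prepare_for_load` produces rows in the shape the repository expects.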
Data Sources
The primary data sources consist of:
* The enterprise's operational systems
  (+) Consolidated data, ready to use
  (-) Proprietary tools required to extract data
* Legacy data residing on hierarchical or network databases
Data Flow (contd.)
Staging Area: Each extracted file is examined and reviewed against business rules. Various transformation functions are then performed: the data is sorted and merged, inconsistencies are resolved, and the data is cleansed. The data resides temporarily in the staging area before it is loaded into the DW repository.
List of Functions and Services
Data Extraction
* Select data sources and determine the types of filters to be applied to individual sources
* Generate automatic extract files from operational systems using replication
* Create intermediate files to store selected data to be merged later
* Transport extracted data from multiple platforms
* Provide automated job control services for creating extract files
* Reformat input from outside sources, departmental data files, databases, etc.
* Generate common application code for data extraction
* Resolve inconsistencies for common data elements from multiple sources
Data Transformation
* Map input data to data for the DW repository
* Clean data, de-duplicate, and merge/purge
* Denormalize extracted data structures
* Convert data types
* Calculate and derive attribute values
* Check for referential integrity
* Aggregate data as needed
* Resolve missing values
* Consolidate and integrate data
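A few of the transformation functions listed above (de-duplication, data type conversion, derived attributes, missing values, and a referential integrity check) can be sketched as follows. The record layout, the field names, and the choice of a default quantity of 1 for missing values are assumptions made for illustration, not rules taken from the source.

```python
from datetime import date

# Hypothetical staged order records; all values arrive as strings.
orders = [
    {"order_id": "A1", "cust_id": "1", "qty": "2",
     "unit_price": "5.00", "order_date": "2024-01-15"},
    {"order_id": "A1", "cust_id": "1", "qty": "2",
     "unit_price": "5.00", "order_date": "2024-01-15"},  # duplicate record
    {"order_id": "A2", "cust_id": "9", "qty": "",
     "unit_price": "3.00", "order_date": "2024-02-01"},  # unknown customer
]
known_customers = {1, 2}  # keys already present in the customer dimension


def transform(rows, customers, default_qty=1):
    """Return (clean, rejected) lists of transformed rows."""
    seen, clean, rejected = set(), [], []
    for r in rows:
        if r["order_id"] in seen:           # de-duplicate on the business key
            continue
        seen.add(r["order_id"])
        cust = int(r["cust_id"])            # convert data types
        qty = int(r["qty"]) if r["qty"] else default_qty  # resolve missing value
        price = float(r["unit_price"])
        row = {
            "order_id": r["order_id"],
            "cust_id": cust,
            "qty": qty,
            "amount": qty * price,          # derived attribute
            "order_date": date.fromisoformat(r["order_date"]),
        }
        # Referential integrity: the customer must exist in the dimension.
        (clean if cust in customers else rejected).append(row)
    return clean, rejected


clean_rows, rejected_rows = transform(orders, known_customers)
```

Rejected rows are kept rather than dropped so that they can be reviewed and corrected in the staging area.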
Data Staging
* Provide backup and recovery for staging area repositories
* Sort and merge files
* Create files as input to make changes to dimension tables
* Create and populate the database, if a relational database is used
* Preserve an audit trail to relate each data item in the DW to its input source
* Resolve and create primary and foreign keys for load tables
* Consolidate data sets and create flat files
* If storage is relational, extract load files
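Two of the staging functions above, creating primary keys for load tables and preserving an audit trail back to the input source, can be sketched together. The function name, the `cust_code` field, and the `source:line` audit format are hypothetical; the sketch also assumes one surrogate key per (source, natural key) pair, which a real design might consolidate across sources.

```python
from itertools import count


def assign_keys(rows, source_name, key_map, next_key):
    """Attach a surrogate primary key and an audit column to each staged row."""
    load_rows = []
    for line_no, row in enumerate(rows, start=1):
        natural = (source_name, row["cust_code"])
        if natural not in key_map:          # first time this entity is seen
            key_map[natural] = next(next_key)
        load_rows.append({
            "customer_key": key_map[natural],             # surrogate key
            "name": row["name"],
            "audit_source": f"{source_name}:{line_no}",   # audit trail
        })
    return load_rows


key_map = {}          # persists across loads so keys stay stable
next_key = count(1)   # simple surrogate key generator

first_load = assign_keys(
    [{"cust_code": "C7", "name": "Alice"}, {"cust_code": "C9", "name": "Bob"}],
    "crm", key_map, next_key,
)
second_load = assign_keys(
    [{"cust_code": "C7", "name": "Alice A."}],
    "crm", key_map, next_key,
)
```

Because `key_map` survives between loads, a customer seen again in a later extract receives the same key, which is what allows foreign keys in load tables to be resolved consistently.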
Data Storage
Data Marts
Data storage covers the entire process of loading the data from the staging area into the DW repository.
Data Flow
* Bottom-up: independent data marts to DW
Data Groups
* First group: set of files or tables containing data for a full refresh, meant for the initial loading of the DW
* Second group: set of files or tables containing ongoing incremental loads
Data Repository
* All DW databases are relational.
* The capabilities of the RDBMS are available for processing the data.
Functions and Services
* Load data for full refreshes of DW tables
* Perform incremental loads at specified intervals
* Support loading into multiple tables
* Optimize the loading process
* Provide automated job control services for loading data into the DW
* Provide backup and recovery
* Provide security
* Monitor and fine-tune the database
* Periodically archive data from the database
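The difference between a full refresh and an incremental load can be shown with a minimal sketch using Python's built-in `sqlite3` module as a stand-in for the DW's RDBMS. The table name `sales_fact`, its columns, and the two function names are assumptions for illustration; a production warehouse would normally use the bulk-load utilities of its own DBMS.

```python
import sqlite3


def full_refresh(conn, rows):
    """Replace the whole table: used for the initial load of the DW."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM sales_fact")
        conn.executemany("INSERT INTO sales_fact VALUES (?, ?)", rows)


def incremental_load(conn, rows):
    """Apply only new or changed rows on top of what is already loaded."""
    with conn:
        # INSERT OR REPLACE overwrites a row whose primary key already exists.
        conn.executemany("INSERT OR REPLACE INTO sales_fact VALUES (?, ?)", rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_fact (sale_id INTEGER PRIMARY KEY, amount REAL)"
)

full_refresh(conn, [(1, 10.0), (2, 20.0)])        # initial load
incremental_load(conn, [(2, 25.0), (3, 30.0)])    # later, at a set interval

loaded = conn.execute(
    "SELECT sale_id, amount FROM sales_fact ORDER BY sale_id"
).fetchall()
```

Each load runs in a single transaction, so a failed load leaves the table as it was, which is one simple way to approach the backup and recovery requirement for load jobs.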
[Figure: information delivery diagram. Recoverable labels: Multidimensional Database, OLAP, DM, Report/Query.]
Flow
The flow may be top-down or bottom-up. User query data is transformed into information in the form of either regular or ad hoc reports.
Data Flow
Service Location
* Query services may be provided from the user desktop, an application server, or the database.
* A comprehensive reporting service is needed for producing reports at regular intervals.
Data Stores
The following intermediary data stores are used for information delivery:
* Proprietary temporary stores to hold results of individual queries and reports for repeated use
* Data stores for standard reporting
* Proprietary multidimensional databases
Functions and Services
* Provide security to control information access
* Monitor user access to improve service and for future enhancements
* Allow users to browse DW content
* Simplify access by hiding internal complexities of data storage from users
* Automatically reformat queries for optimal execution
* Govern queries and control runaway queries
* Store result sets of queries and reports for future use
* Provide multiple levels of granularity
* Provide event triggers to monitor data loading
* Make provision to perform complex analysis through OLAP
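Two of the services above, governing runaway queries and storing result sets for future use, can be sketched with Python's built-in `sqlite3` module. The function `governed_query`, the `max_ticks` budget, and the dictionary used as a result store are hypothetical; the sketch relies on `Connection.set_progress_handler`, which aborts a running statement when the handler returns a non-zero value.

```python
import sqlite3


def governed_query(conn, sql, cache, max_ticks=1000):
    """Run a query under a work budget, reusing stored result sets.

    Returns the rows, or None if the query exceeded its budget.
    """
    if sql in cache:                 # result set stored by an earlier run
        return cache[sql]

    ticks = 0

    def on_progress():
        nonlocal ticks
        ticks += 1
        return 1 if ticks > max_ticks else 0   # non-zero aborts the query

    # Call the handler once every 1000 SQLite virtual machine instructions.
    conn.set_progress_handler(on_progress, 1000)
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.OperationalError:
        if ticks > max_ticks:        # aborted by the governor
            return None
        raise                        # some other error, e.g. bad SQL
    finally:
        conn.set_progress_handler(None, 1000)  # remove the governor

    cache[sql] = rows
    return rows


conn = sqlite3.connect(":memory:")
result_store = {}

quick = governed_query(conn, "SELECT 1 + 1", result_store)

# A deliberately unbounded recursive query, standing in for a runaway query.
RUNAWAY = (
    "WITH RECURSIVE n(x) AS (SELECT 1 UNION ALL SELECT x + 1 FROM n) "
    "SELECT COUNT(*) FROM n"
)
runaway = governed_query(conn, RUNAWAY, result_store)
```

The budget is counted in units of work rather than wall-clock time, which keeps the behavior repeatable; stored results would need to be discarded after each data load so that reports do not show stale data.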