
DataWareHousing - ETL Project Life Cycle

Posted by Suresh at 1:08 AM Labels: Data Warehouse, ETL, Life Cycle

The following are the phases involved in an ETL project development life cycle:

1) Requirement Gathering
   a) Business Requirement Collection ( BRD )
   b) System Requirement Collection ( SRD )
2) Design Phase
   a) High Level Design Document ( HLD )
   b) Low Level Design Document ( LLD )
3) Development Phase
   a) Mapping Design
   b) Code Review
   c) Peer Review
4) Testing
   a) Unit Testing
   b) System Integration Testing
   c) User Acceptance Testing ( UAT )
5) Pre-Production
6) Production ( Go-Live )

Requirement Gathering:
---------------------------------------------

Business Requirement Collection ( BRD ):


-> Business requirement gathering is led by a business analyst, the onsite technical lead, and the client's business users.
-> In this phase, the business analyst prepares the Business Requirement Document ( BRD ), also called the Business Requirement Specifications ( BRS ).
-> BR collection takes place at the client location.
-> The outputs of business requirement analysis are:
   -> BRS: Business Requirement Specifications
   -> SRS: System Requirement Specifications

System Requirement Collection ( SRD ):


A senior technical person or the ETL architect prepares the SRS, which contains the software and hardware details. The SRS includes:
a) The operating system to be used ( Windows or UNIX )
b) The RDBMS required to build the database ( Oracle, Teradata, etc. )
c) The ETL tools required ( Informatica, DataStage )
d) The OLAP tools required ( Cognos, BO )
The SRS is also called the Technical Requirement Specifications ( TRS ).

Design Phase:
---------------------------------------------

-> The outputs of the design and planning phase are:
a) HLD ( High Level Design ) Document
b) LLD ( Low Level Design ) Document

HLD ( High Level Design ) Document : An ETL architect and a DWH architect participate in designing the solution to build the DWH. The HLD document is prepared based on the business requirements.

LLD ( Low Level Design ) Document : Based on the HLD, a senior ETL developer prepares the Low Level Design document. The LLD contains the more technical details of the ETL system: a data flow diagram ( DFD ) and the details of the source and target of each mapping. The LLD also specifies whether each load is a full or an incremental load. Once the LLD is complete, the development phase starts.
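The full-versus-incremental distinction that an LLD documents can be sketched in Python. This is a minimal illustration with invented column names (`updated_at` is a hypothetical change-tracking column):

```python
from datetime import datetime

def full_load(source_rows):
    """Full load: truncate the target and reload every source row."""
    target = []  # stands in for a truncated target table
    target.extend(source_rows)
    return target

def incremental_load(source_rows, target, last_load_time):
    """Incremental load: append only rows changed since the last run."""
    new_rows = [r for r in source_rows if r["updated_at"] > last_load_time]
    target.extend(new_rows)
    return target

rows = [
    {"id": 1, "updated_at": datetime(2024, 1, 1)},
    {"id": 2, "updated_at": datetime(2024, 1, 5)},
]
target = full_load(rows)
target = incremental_load(rows, target, last_load_time=datetime(2024, 1, 3))
print(len(target))  # only id=2 qualifies for the incremental run → 3
```

In a real ETL tool the same decision appears as a filter on a change-capture column or a timestamp-driven source qualifier.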

Development Phase :
---------------------------------------------

Mapping:
-> Based on the LLD, the ETL team creates the mappings ( ETL code ).
-> After designing the mappings, the code ( mappings ) is reviewed by the developers.

Code Review :
-> The code review is done by a developer.
-> In a code review, the developer reviews the code and the logic, but not the data.
-> The following activities take place in a code review:
   -> Check the naming standards of transformations, mappings, etc.
   -> Check the source-to-target mapping ( whether the correct logic is placed in the mapping or not ).

Peer Review :
-> The code is reviewed by a team member ( a third-party developer ).
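A mapping is essentially source-to-target column logic plus transformation rules. A minimal Python sketch of what one mapped row might look like (all field names and rules here are invented for illustration):

```python
def transform_row(src):
    """Apply the mapping logic: rename columns, cleanse values, derive fields."""
    return {
        "CUST_ID": int(src["customer_id"]),              # straight move with a type cast
        "CUST_NAME": src["name"].strip().upper(),        # cleansing rule
        "REVENUE": round(src["qty"] * src["price"], 2),  # derived fact
    }

source = [{"customer_id": "101", "name": " alice ", "qty": 3, "price": 9.5}]
target = [transform_row(r) for r in source]
print(target[0])  # {'CUST_ID': 101, 'CUST_NAME': 'ALICE', 'REVENUE': 28.5}
```

A code review of such a mapping would check the naming standards (CUST_ID, REVENUE) and the transformation logic, but not the data itself.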

Testing:
---------------------------------------------

The following types of testing are carried out in the testing environment:
1) Unit Testing
2) Development Integration Testing
3) System Integration Testing
4) User Acceptance Testing

Unit Testing :
-> A unit test for the DWH is white-box testing; it checks the ETL procedures and mappings.
-> The following test cases can be executed by an ETL developer:
   1) Verify there is no data loss
   2) Compare the number of records in the source and the target
   3) Data load / insert
   4) Data load / update
   5) Incremental load
   6) Data accuracy
   7) Verify naming standards
   8) Verify column mappings
-> The unit tests are carried out by the ETL developer in the development phase.
-> The ETL developer also has to do the data validations in this phase.

Development Integration Testing :
-> Run all the mappings in sequence order.
-> First run the source-to-stage mappings.
-> Then run the mappings related to dimensions and facts.

System Integration Testing :
-> After the development phase, the code is moved to the QA environment.
-> In this environment, the testing people are given read-only permission.
-> They test all the workflows.
-> They also test the code according to their standards.

User Acceptance Testing ( UAT ) :
-> This test is carried out in the presence of client-side technical users to verify the data migration from source to destination.
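The first two unit-test cases (data loss and record counts) are easy to automate. A minimal sketch, assuming the source and target rows can be fetched as lists of dicts with a shared key column:

```python
def check_counts(source_rows, target_rows):
    """Unit-test check: same number of records in source and target (no data loss)."""
    assert len(source_rows) == len(target_rows), (
        f"data loss: {len(source_rows)} source rows vs {len(target_rows)} target rows"
    )

def check_keys(source_rows, target_rows, key):
    """Unit-test check: every source key value made it into the target."""
    missing = {r[key] for r in source_rows} - {r[key] for r in target_rows}
    assert not missing, f"keys missing from target: {missing}"

src = [{"id": 1}, {"id": 2}]
tgt = [{"id": 1}, {"id": 2}]
check_counts(src, tgt)
check_keys(src, tgt, "id")
print("unit checks passed")
```

In practice these counts would come from `SELECT COUNT(*)` queries against the source and target databases rather than in-memory lists.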

Production Environment :
---------------------------------------------

-> Migrate the code from the test environment ( QA environment ) into the Go-Live environment.

Thanks: shivakrishnas

Fact Table
Posted by Suresh at 12:36 AM Labels: Data Warehouse, Fact Table
Fact Table: A fact table consists of the measurements, metrics, or facts of a business process.

Measures or Metrics: These represent the business process being measured; for example, transactions are the business in banking, and selling is the business in retail.

Grain or Granularity: The grain of a fact table represents the most atomic level by which the facts may be defined. The grain of a SALES fact table might be stated as "Sales volume by Day by Product by Store". Each record in this fact table is therefore uniquely defined by a day, product and store. Other dimensions might be members of this fact table (such as location/region) but these add nothing to the uniqueness of the fact records. These "affiliate dimensions" allow for additional slices of the independent facts but generally provide insights at a higher level of aggregation (a region contains many stores).
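The stated grain can be verified mechanically: the grain columns must uniquely identify every fact row. A quick Python sketch of that check, using a few made-up rows at the "day by product by store" grain:

```python
from collections import Counter

facts = [
    {"day": "2024-01-01", "product": "P1", "store": "S1", "sales": 100},
    {"day": "2024-01-01", "product": "P1", "store": "S2", "sales": 80},
    {"day": "2024-01-02", "product": "P1", "store": "S1", "sales": 90},
]
grain = ("day", "product", "store")
counts = Counter(tuple(row[c] for c in grain) for row in facts)
duplicates = [key for key, n in counts.items() if n > 1]
print(duplicates)  # [] → every (day, product, store) combination is unique
```

If this list were non-empty, either the fact table violates its declared grain or the grain statement is wrong.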

Types of Facts: There are three types of facts:

Additive: Additive facts are facts that can be summed up across all of the dimensions in the fact table.

Semi-Additive: Semi-additive facts are facts that can be summed up across some of the dimensions in the fact table, but not the others.

Non-Additive: Non-additive facts are facts that cannot be summed up across any of the dimensions present in the fact table.

Let us use examples to illustrate each of the three types of facts. The first example assumes that we are a retailer, and we have a fact table with the following columns:

Date | Store | Product | Sales_Amount

The purpose of this table is to record the sales amount for each product in each store on a daily basis. Sales_Amount is the fact. In this case, Sales_Amount is an additive fact, because you can sum up this fact along any of the three dimensions present in the fact table: date, store, and product. For example, the sum of Sales_Amount for all 7 days in a week represents the total sales amount for that week.

Now say we are a bank with the following fact table:

Date | Account | Current_Balance | Profit_Margin

The purpose of this table is to record the current balance for each account at the end of each day, as well as the profit margin for each account for each day. Current_Balance and Profit_Margin are the facts. Current_Balance is a semi-additive fact, as it makes sense to add it up across all accounts (what is the total current balance for all accounts in the bank?), but it does not make sense to add it up through time (adding up all the current balances for a given account for each day of the month does not give us any useful information). Profit_Margin is a non-additive fact, for it does not make sense to add it up at either the account level or the day level.
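The additive versus semi-additive distinction can be made concrete with a few invented rows:

```python
# Sales_Amount is additive: summing across any dimension is meaningful.
sales = [
    {"date": "2024-01-01", "store": "S1", "amount": 100},
    {"date": "2024-01-02", "store": "S1", "amount": 150},
]
weekly_sales = sum(r["amount"] for r in sales)  # meaningful total: 250

# Current_Balance is semi-additive: summing across accounts on one day is
# meaningful, but summing the same account across days is not.
balances = [
    {"date": "2024-01-01", "account": "A", "balance": 500},
    {"date": "2024-01-01", "account": "B", "balance": 300},
    {"date": "2024-01-02", "account": "A", "balance": 520},
]
total_today = sum(r["balance"] for r in balances
                  if r["date"] == "2024-01-01")          # meaningful: 800
nonsense = sum(r["balance"] for r in balances
               if r["account"] == "A")                   # 1020, but meaningless
print(weekly_sales, total_today, nonsense)
```

For semi-additive facts like balances, reports typically take the latest (or average) balance over the time dimension instead of a sum.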

Data Warehouse Definition


Posted by Suresh at 11:32 PM Labels: Data Warehouse

The most popular definition came from Bill Inmon, who provided the following: "A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process."

Subject-Oriented: A data warehouse can be used to analyze a particular subject area. For example, "sales" can be a particular subject.

Integrated: A data warehouse integrates data from multiple data sources. For example, source A and source B may have different ways of identifying a product, but in a data warehouse there will be only a single way of identifying a product.

Time-Variant: Historical data is kept in a data warehouse. For example, one can retrieve data from 3 months, 6 months, 12 months, or even older from a data warehouse. This contrasts with a transaction system, where often only the most recent data is kept. For example, a transaction system may hold only the most recent address of a customer, whereas a data warehouse can hold all addresses associated with a customer.

Non-volatile: Once data is in the data warehouse, it will not change. So, historical data in a data warehouse should never be altered.
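The time-variant and non-volatile properties together are what the customer-address example illustrates: the transaction system overwrites, the warehouse appends. A minimal sketch:

```python
# Transaction system: one current address per customer, overwritten on change.
oltp_address = {}
def update_address(cust, addr):
    oltp_address[cust] = addr  # the old value is lost

# Data warehouse: every address version is kept with its effective date.
dwh_address = []
def load_address(cust, addr, effective_date):
    dwh_address.append({"cust": cust, "addr": addr, "from": effective_date})

update_address("C1", "Old St 1");  load_address("C1", "Old St 1", "2023-01-01")
update_address("C1", "New Ave 2"); load_address("C1", "New Ave 2", "2024-06-01")

print(oltp_address["C1"])  # only the latest address survives in the OLTP system
print(len(dwh_address))    # the warehouse holds both versions → 2
```

In dimensional modeling this append-with-effective-date pattern is the basis of slowly changing dimensions.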
Ralph Kimball provided a more concise definition of a data warehouse: "A data warehouse is a copy of transaction data specifically structured for query and analysis."

This is a functional view of a data warehouse. Kimball did not address how the data warehouse is built the way Inmon did; rather, he focused on the functionality of a data warehouse.