An enterprise data warehouse is a relational database specially designed for analyzing the business, making decisions to achieve business goals and responding to business problems. It is not designed for business transaction processing.
A data warehouse is a concept of consolidating data from multiple OLTP databases.
1.Low range
2.Mid range
3.High range
Example: MS Access
Can organize and manage terabytes and petabytes of information
Example: Teradata, Netezza, Greenplum, Hadoop.
Relational databases support two types of data storage patterns:
7. Recommended for data warehousing in small and medium-scale enterprises with storage capacity in gigabytes
2. Shared-nothing architecture (every processor has its own dedicated memory and disk, which are not shared with any other processor)
5. Designed only for building an enterprise data warehouse, not for OLTP
Example: Teradata, Netezza, Hadoop, Greenplum
1. Database that supports enormous storage capacity (billions of rows and terabytes)
6. Database that supports mature optimizers to handle complex SQL queries (runs queries faster with less system resource usage)
8. 100% data availability without data loss, even when software or hardware components are down
10. Database that supports low TCO (total cost of ownership): easy to set up, administer and manage
11. Single DB server that can provide access to hundreds of users concurrently
Data Acquisition:
It is the process of extracting data from multiple source systems, transforming it into a consistent format and loading it into a target system. To implement this ETL (extract, transform, load) process we need ETL tools.
ETL applications are developed using a simple graphical user interface with point-and-click features.
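The extract, transform and load steps can be sketched in plain Python. This is a minimal illustration only: the two source systems (`crm_rows`, `billing_rows`), their column names and date formats are hypothetical, and a Python list stands in for the target table. Real ETL tools add scheduling, logging, error handling and bulk loading.

```python
from datetime import datetime

# Hypothetical source systems with inconsistent layouts (assumption for illustration).
crm_rows = [{"cust_id": "11", "name": " ben ", "joined": "12/01/2015"}]
billing_rows = [{"customer": 12, "full_name": "ALEN", "signup": "2015-02-15"}]


def extract():
    """Extract: read raw records from each source system, tagged with their origin."""
    for row in crm_rows:
        yield "crm", row
    for row in billing_rows:
        yield "billing", row


def transform(source, row):
    """Transform: map each source layout onto one consistent target format."""
    if source == "crm":
        return {
            "cid": int(row["cust_id"]),
            "cname": row["name"].strip().upper(),
            "join_date": datetime.strptime(row["joined"], "%d/%m/%Y").date(),
        }
    return {
        "cid": int(row["customer"]),
        "cname": row["full_name"].strip().upper(),
        "join_date": datetime.strptime(row["signup"], "%Y-%m-%d").date(),
    }


def load(records, target):
    """Load: append the transformed records into the target table (a list here)."""
    target.extend(records)
    return target


warehouse_customers = load((transform(s, r) for s, r in extract()), [])
```

Both customers end up with the same column names, an upper-case name and a real `date` value, whatever format their source used.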
Data Cleansing:
Data Merging:
1.Join
2.Union
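The two merge operations work in different directions. A join widens rows by matching two sources on a key. A union stacks rows from sources that share the same columns. The sketch below uses hypothetical customer, geography and branch sales data. It shows `UNION ALL` behaviour, which keeps duplicates; SQL `UNION` would also remove duplicate rows.

```python
# Hypothetical source data (assumption for illustration).
customers = [{"cid": 11, "cname": "BEN", "geoid": 111}, {"cid": 12, "cname": "ALEN", "geoid": 112}]
geography = {111: {"city": "Hyd"}, 112: {"city": "VSP"}}
branch_a_sales = [{"cid": 11, "amount": 600}]
branch_b_sales = [{"cid": 12, "amount": 400}]


def join(left, right_by_key, key):
    """Inner join: widen each left row with the columns of the matching right row."""
    return [{**row, **right_by_key[row[key]]} for row in left if row[key] in right_by_key]


def union_all(*tables):
    """UNION ALL: stack rows from tables that share the same columns."""
    return [row for table in tables for row in table]


joined = join(customers, geography, "geoid")
stacked = union_all(branch_a_sales, branch_b_sales)
```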
Data warehouse:
1. A data warehouse is a relational database that is used to store historical data for query and analysis.
SOR-->Source of records
1. Dimension Table
2. Fact Table
1. Dimension Table:
Customer, Product, Stores, Employees, Promotions, Time
2. Fact Table:
Sales, Purchase, Inventory
1. SA_LoanTransaction Fact
2. CC_Transaction Fact
3. CC_Statement Fact
A fact table consists of keys and measures, and its primary key is a composite primary key made up of the dimension keys.
Composite primary key: (Store Key, Prod Key, Date Key)
Store Key | Prod Key | Date Key | Revenue
S1        | P1       | D1       | 3000
S1        | P2       | D1       | 2000
S2        | P1       | D1       | 2000
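The composite primary key means that no two fact rows may share the same combination of Store Key, Prod Key and Date Key. This minimal sketch stores the sample rows in a dictionary keyed by that tuple. The `insert_fact` helper is hypothetical and only illustrates the uniqueness rule.

```python
# Fact table keyed by the composite primary key (store_key, prod_key, date_key).
sales_fact = {}


def insert_fact(store_key, prod_key, date_key, revenue):
    """Insert a fact row, rejecting duplicates of the composite primary key."""
    key = (store_key, prod_key, date_key)
    if key in sales_fact:
        raise ValueError(f"duplicate composite key {key}")
    sales_fact[key] = {"revenue": revenue}


insert_fact("S1", "P1", "D1", 3000)
insert_fact("S1", "P2", "D1", 2000)
insert_fact("S2", "P1", "D1", 2000)  # same product and date, different store: allowed
```

Inserting `("S1", "P1", "D1")` a second time raises `ValueError`, just as a database would reject a duplicate primary key.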
3. A factless fact table (a fact table with no measures) acts as a bridge between the dimension tables.
Example of a factless fact table: Employee Attendance factless fact
[Figure: factless fact table diagram. Attributes shown: Aud Name, Aud Type, Aud Mgr, Aud Address, Sponsor Name, Contribution, Address, Participant Name, Gender, Event Name, Event Type, Event Desc, Month Key, Qtr, Year]
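A factless fact table records only the keys of an event, with no numeric measure, so questions are answered by counting or filtering rows. Here is a sketch of the employee attendance example, assuming hypothetical employee and date keys.

```python
from collections import Counter

# Each row is (employee_key, date_key): the employee was present on that date.
# There is no measure column; the row itself is the fact. Keys are hypothetical.
attendance_fact = [
    ("E1", "12-Jan-15"),
    ("E2", "12-Jan-15"),
    ("E1", "13-Jan-15"),
]

# "How many days did each employee attend?" is answered by counting rows.
days_present = Counter(emp for emp, _ in attendance_fact)

# "Who attended on 12-Jan-15?" is answered by filtering on a dimension key.
present_on_12th = sorted(emp for emp, day in attendance_fact if day == "12-Jan-15")
```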
It consists of semi-additive and non-additive facts and describes the state of things at a particular instant in time.
Types of Facts:
1. Additive Facts
2. Semi-Additive Facts
3. Non-Additive Facts
1. Additive Fact: Business measurements in a fact table that can be summed up across all of the dimension keys.
Fact Table:
Store Key | Prod Key | Date Key  | Revenue
S1        | P1       | 12-Jan-15 | 600
S1        | P2       | 12-Jan-15 | 400
S2        | P2       | 12-Jan-15 | 800
S2        | P3       | 13-Jan-15 | 500
S3        | P1       | 13-Jan-15 | 700
S3        | P3       | 14-Jan-15 | 900
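Because revenue is additive, the same fact rows can be summed along any single dimension key, and every grouping adds up to the same grand total. This minimal sketch groups the sample rows by store, product and date.

```python
from collections import defaultdict

# Sample fact rows: (store_key, prod_key, date_key, revenue).
fact_rows = [
    ("S1", "P1", "12-Jan-15", 600),
    ("S1", "P2", "12-Jan-15", 400),
    ("S2", "P2", "12-Jan-15", 800),
    ("S2", "P3", "13-Jan-15", 500),
    ("S3", "P1", "13-Jan-15", 700),
    ("S3", "P3", "14-Jan-15", 900),
]


def revenue_by(position):
    """Sum revenue grouped by the dimension key at the given tuple position."""
    totals = defaultdict(int)
    for row in fact_rows:
        totals[row[position]] += row[3]
    return dict(totals)


by_store = revenue_by(0)    # {'S1': 1000, 'S2': 1300, 'S3': 1600}
by_product = revenue_by(1)  # {'P1': 1300, 'P2': 1200, 'P3': 1400}
by_date = revenue_by(2)     # {'12-Jan-15': 1800, '13-Jan-15': 1200, '14-Jan-15': 900}
```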
Revenue by Store:
Store Key | Revenue
S1        | 1000
S2        | 1300
S3        | 1600

Revenue by Product:
Prod Key | Revenue
P1       | 1300
P2       | 1200
P3       | 1400

Revenue by Date:
Date Key  | Revenue
12-Jan-15 | 1800
13-Jan-15 | 1200
14-Jan-15 | 900

2. Semi-Additive Fact: Business measurements in a fact table that can be summed up across only some of the dimension keys.
Bank Fact table:
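Acct Id | Date Key  | Balance | Profit Margin
21653   | 12-Jan-15 | 700000  | -
21654   | 12-Jan-15 | 400000  | -
21653   | 13-Jan-15 | 900000  | -
21654   | 13-Jan-15 | 600000  | -

Reports:
Balance by Acct Id:
Acct Id | Balance (summed over dates, wrong) | Balance (latest date, correct)
21653   | 1600000                            | 900000
21654   | 1000000                            | 600000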
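The balance report shows the semi-additive rule in action. Summing balances across accounts on one date is valid. Summing one account across dates double counts, so the latest balance has to be taken instead. In the sketch below, dates are converted to ISO strings (an assumption) so that they sort chronologically.

```python
from collections import defaultdict

# Bank fact rows: (acct_id, date_key, balance). Dates are ISO strings so they sort correctly.
balance_fact = [
    (21653, "2015-01-12", 700000),
    (21654, "2015-01-12", 400000),
    (21653, "2015-01-13", 900000),
    (21654, "2015-01-13", 600000),
]

# Valid: summing across accounts for a single date gives the bank's total on that date.
total_by_date = defaultdict(int)
for acct, day, bal in balance_fact:
    total_by_date[day] += bal

# Invalid: summing one account across dates double counts (700000 + 900000 = 1600000).
naive_sum_by_acct = defaultdict(int)
for acct, day, bal in balance_fact:
    naive_sum_by_acct[acct] += bal

# Correct across time: keep the balance on the latest date for each account.
latest_balance = {}
for acct, day, bal in sorted(balance_fact, key=lambda r: r[1]):
    latest_balance[acct] = bal
```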
3. Non-Additive Fact: Business measurements in a fact table that cannot be summed up across any dimension key.
Note: Percentages in a fact table are always non-additive.
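Percentages such as profit margin cannot be summed. They have to be recomputed from additive components. This sketch uses hypothetical revenue and profit figures, not values from the tables above.

```python
# Hypothetical fact rows: (store_key, revenue, profit). Revenue and profit are additive.
rows = [("S1", 100, 10), ("S2", 300, 60)]

# Row-level margins: 10% and 20%.
margins = [profit * 100 / revenue for _, revenue, profit in rows]

# Wrong: adding the percentages gives 30%, which is meaningless.
summed_margin = sum(margins)

# Right: sum the additive measures first, then divide: 70 * 100 / 400 = 17.5%.
overall_margin = sum(p for _, _, p in rows) * 100 / sum(r for _, r, _ in rows)
```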
Types of Dimensions:
1. Conformed Dimension
2. Degenerate Dimension
3. Shrunken Dimension
4. Junk Dimension
5. Dirty Dimension
Conformed Dimension: A dimension that is shared across multiple fact tables is called a conformed dimension. It can also be described as a dimension that is used to join data marts.
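A conformed dimension has the same keys and attributes in every fact table that uses it, so results from different fact tables (or data marts) can be lined up on it. The sketch below assumes a hypothetical Product dimension shared by a Sales fact and an Inventory fact.

```python
# Conformed Product dimension, shared by both fact tables (hypothetical data).
product_dim = {"P1": "Soap", "P2": "Shampoo"}

sales_fact = [("P1", 30), ("P2", 12), ("P1", 5)]  # (prod_key, units_sold)
inventory_fact = [("P1", 100), ("P2", 40)]        # (prod_key, units_on_hand)


def total_by_product(fact):
    """Aggregate a fact table to the grain of the shared product dimension."""
    totals = {}
    for prod_key, qty in fact:
        totals[prod_key] = totals.get(prod_key, 0) + qty
    return totals


sold = total_by_product(sales_fact)
on_hand = total_by_product(inventory_fact)

# Drill across: both results use the same product keys, so they combine row by row.
report = {product_dim[k]: (sold.get(k, 0), on_hand.get(k, 0)) for k in product_dim}
```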
Banking Domain:
Degenerate Dimension:
If a fact table acts as a dimension and is shared with another fact table (or its key is maintained as a foreign key in another fact table), such a table is called a degenerate dimension.
Shrunken Dimension:
Junk Dimension:
Cardinality is the number of unique values in a column. Alternatively, cardinality expresses the minimum and maximum number of instances of an entity 'B' that can be associated with an instance of an entity 'A'.
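Column cardinality in the first sense is simply a distinct count. The sketch below takes a few values from the Orders example in this section. A key column like `order_id` has high cardinality, while flags like payment mode have low cardinality. Low-cardinality flags like these are the usual candidates for a junk dimension. The column names are illustrative.

```python
# A few order rows (values from the Orders example; column names are illustrative).
orders = [
    {"order_id": 111, "payment_mode": "Cash", "comm": "No"},
    {"order_id": 112, "payment_mode": "Cash", "comm": "No"},
    {"order_id": 113, "payment_mode": "Credit", "comm": "No"},
    {"order_id": 116, "payment_mode": "Credit", "comm": "Yes"},
]


def cardinality(rows, column):
    """Number of distinct values in a column."""
    return len({row[column] for row in rows})
```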
Dirty Dimension:
If a record occurs more than once in a table, differing only in a non-key attribute, such a table is called a dirty dimension.
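Dirty records can be detected by grouping on the natural key and checking whether one key carries more than one set of non-key attribute values. This is a minimal sketch over a hypothetical customer dimension; `find_dirty_keys` is an illustrative helper, not a standard function.

```python
from collections import defaultdict

# Hypothetical customer dimension rows: (cid, cname, city).
customer_dim = [
    (11, "BEN", "HYD"),
    (12, "ALEN", "BNG"),
    (11, "BEN", "KER"),  # same record again, differing only in a non-key attribute
]


def find_dirty_keys(rows):
    """Return natural keys that occur more than once with differing non-key attributes."""
    versions = defaultdict(set)
    for cid, *attrs in rows:
        versions[cid].add(tuple(attrs))
    return sorted(cid for cid, attrs in versions.items() if len(attrs) > 1)
```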
Orders:
Order Id | Order Date | Payment Mode    | Comm/Non Comm | Amount
111      | -          | Cash / Cash     | No            | -
112      | -          | Cash / Cash     | No            | -
113      | -          | Credit / Master | No            | -
114      | -          | Cash / Cash     | No            | -
115      | -          | Cash / Cash     | No            | -
116      | -          | Credit / Visa   | Yes           | -
117      | -          | Cash / Cash     | No            | -
Order Id
Order Date
Types of SCD (Slowly Changing Dimension):
1. SCD TYPE-I
2. SCD TYPE-II
3. SCD TYPE-III
1. SCD TYPE-I:
CKEY | CID | CNAME | DOB
101  | 11  | BEN   | 12-JAN-1967
102  | 12  | ALEN  | 15-FEB-1966
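SCD Type-I handles a change by overwriting the attribute in place, so no history is kept. This sketch works on the customer table above and assumes a hypothetical correction of BEN's date of birth. The helper name is illustrative.

```python
# Customer dimension keyed by surrogate key CKEY.
customer_dim = {
    101: {"cid": 11, "cname": "BEN", "dob": "12-JAN-1967"},
    102: {"cid": 12, "cname": "ALEN", "dob": "15-FEB-1966"},
}


def scd_type1_update(dim, cid, column, new_value):
    """Type-I: find the row by natural key (CID) and overwrite the value; history is lost."""
    for row in dim.values():
        if row["cid"] == cid:
            row[column] = new_value
            return
    raise KeyError(cid)


scd_type1_update(customer_dim, 11, "dob", "21-JAN-1967")  # hypothetical correction
```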
2. SCD TYPE-II:
PRODUCTS (source):
PID | PNAME | PRICE | EFF_DATE
11  | ABC   | 300   | 12-JAN-10
12  | PQR   | 270   | 15-JAN-10

The price of product 12 changed to 199 on 27-AUG-11.

Product dimension (target):
PKEY | PID | PNAME | PRICE | EFF_DATE  | END_DATE
100  | 11  | ABC   | 300   | 12-JAN-10 |
101  | 12  | PQR   | 270   | 15-JAN-10 | 26-AUG-11
102  | 12  | PQR   | 199   | 27-AUG-11 |
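The Type-II example can be reproduced by closing the current row (END_DATE is set to the day before the change) and inserting a new row with the next surrogate key. This is a minimal in-memory sketch. An `end_date` of `None` marks the current version, which is a convention assumed here.

```python
from datetime import date, timedelta

# Product dimension rows with surrogate key, effective date and end date (None = current).
product_dim = [
    {"pkey": 100, "pid": 11, "pname": "ABC", "price": 300, "eff_date": date(2010, 1, 12), "end_date": None},
    {"pkey": 101, "pid": 12, "pname": "PQR", "price": 270, "eff_date": date(2010, 1, 15), "end_date": None},
]


def scd_type2_update(dim, pid, new_price, change_date):
    """Type-II: close the current row and insert a new row with a new surrogate key."""
    current = next(r for r in dim if r["pid"] == pid and r["end_date"] is None)
    if current["price"] == new_price:
        return  # nothing changed, no new version needed
    current["end_date"] = change_date - timedelta(days=1)  # old version ends the day before
    dim.append({**current, "pkey": max(r["pkey"] for r in dim) + 1,
                "price": new_price, "eff_date": change_date, "end_date": None})


scd_type2_update(product_dim, 12, 199, date(2011, 8, 27))
```

After the call, product 12 has two rows. PKEY 101 ends on 26-AUG-11, and PKEY 102 carries the new price from 27-AUG-11, matching the target table.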
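3. SCD TYPE-III:
Before:
CKEY | CID | CNAME | CURR_LOC | PREV_LOC
101  | 11  | BEN   | HYD      | -
102  | 12  | TOM   | BNG      | CHE

After BEN's location changes to KER:
CKEY | CID | CNAME | CURR_LOC | PREV_LOC
101  | 11  | BEN   | KER      | HYD
102  | 12  | TOM   | BNG      | CHE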
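Type-III keeps limited history by moving the current value into a "previous" column before overwriting it. This sketch reproduces the before/after tables. Only one level of history survives: a further change would overwrite PREV_LOC.

```python
# Customer dimension with current and previous location columns (Type-III).
customer_dim = {
    101: {"cid": 11, "cname": "BEN", "curr_loc": "HYD", "prev_loc": None},
    102: {"cid": 12, "cname": "TOM", "curr_loc": "BNG", "prev_loc": "CHE"},
}


def scd_type3_update(dim, ckey, new_loc):
    """Type-III: shift the current value into the previous column, then overwrite it."""
    row = dim[ckey]
    if row["curr_loc"] != new_loc:
        row["prev_loc"] = row["curr_loc"]
        row["curr_loc"] = new_loc


scd_type3_update(customer_dim, 101, "KER")
```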
Role-Playing Dimension: A dimension that is reused for multiple purposes (roles) within the same database.
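A common case is a single Date dimension that plays several roles in one fact table, for example order date and ship date. The sketch below assumes a hypothetical Date dimension and Orders fact. The same physical dimension is looked up once per role.

```python
# One physical Date dimension (hypothetical keys and attributes).
date_dim = {
    20150112: {"day": "12-Jan-15", "month": "Jan", "year": 2015},
    20150115: {"day": "15-Jan-15", "month": "Jan", "year": 2015},
}

# Orders fact referencing the same dimension twice, in two different roles.
orders_fact = [{"order_id": 111, "order_date_key": 20150112, "ship_date_key": 20150115, "amount": 500}]


def describe(order):
    """Look up the same Date dimension once per role (order date, ship date)."""
    order_date = date_dim[order["order_date_key"]]  # role 1: order date
    ship_date = date_dim[order["ship_date_key"]]    # role 2: ship date
    return order["order_id"], order_date["day"], ship_date["day"]
```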
Data Modeling:
Model: A business representation of the structure of the data in one or more databases.
OLTP: The ER model is used.
The model is normalized.
Types of Schema:
1. Star Schema
2. Snowflake Schema
3. Galaxy Schema
1. Star Schema: In a star schema, the centre of the star is the fact table and the points are the dimension tables.
Geography:
Geoid | City | State | Country | Region
111   | Hyd  | Ts    | India   | Asia
112   | VSP  | Ap    | India   | Asia

Customer:
Cid | Cname | Gender | Geoid | City | State | Country | Region
11  | C1    | 1      | 111   | Hyd  | Ts    | India   | Asia
12  | C2    | 1      | 111   | Hyd  | Ts    | India   | Asia
13  | C3    | 0      | 112   | VSP  | Ap    | India   | Asia
14  | C4    | 1      | 111   | Hyd  | Ts    | India   | Asia
A star schema uses more space than a snowflake schema.
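The space difference can be made concrete with the customer example. In the star layout the geography attributes are repeated on every customer row. In the snowflake layout they are stored once in a separate Geography table referenced by Geoid. This sketch counts stored column values, which is only a rough proxy for space used (an assumption; real storage also depends on data types, compression and indexes).

```python
# Star-schema customer dimension: geography attributes are repeated in every row.
star_customer = [
    (11, "C1", 1, 111, "Hyd", "Ts", "India", "Asia"),
    (12, "C2", 1, 111, "Hyd", "Ts", "India", "Asia"),
    (13, "C3", 0, 112, "VSP", "Ap", "India", "Asia"),
    (14, "C4", 1, 111, "Hyd", "Ts", "India", "Asia"),
]

# Snowflake: move geography into its own table, keeping only Geoid in Customer.
snow_customer = [row[:4] for row in star_customer]
snow_geography = sorted({row[3:] for row in star_customer})


def stored_values(*tables):
    """Count stored column values as a rough proxy for space used."""
    return sum(len(row) for table in tables for row in table)


star_cells = stored_values(star_customer)                  # 4 rows x 8 columns = 32
snow_cells = stored_values(snow_customer, snow_geography)  # 4 x 4 + 2 x 5 = 26
```

The gap grows with the number of customers per location. The price of the snowflake layout is an extra join at query time.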
Galaxy Schema:
1. B*Tree Index
2. Bitmap Index
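A bitmap index keeps one bit vector per distinct column value, with one bit per row. This makes it compact for low-cardinality columns and lets filters be combined with bitwise AND/OR. Below is a simplified in-memory sketch that uses Python integers as bit vectors over the Gender and City columns of the customer rows in the star schema example. Real database implementations compress and store bitmaps quite differently.

```python
def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is set when row i has that value."""
    index = {}
    for row_id, value in enumerate(values):
        index[value] = index.get(value, 0) | (1 << row_id)
    return index


def rows_from_bitmap(bitmap):
    """Decode a bitmap back into the list of matching row ids."""
    return [i for i in range(bitmap.bit_length()) if bitmap >> i & 1]


# Gender and City columns of the four customer rows (row ids 0..3).
gender_idx = build_bitmap_index([1, 1, 0, 1])
city_idx = build_bitmap_index(["Hyd", "Hyd", "VSP", "Hyd"])

# WHERE gender = 1 AND city = 'Hyd' -> bitwise AND of the two bitmaps.
matches = rows_from_bitmap(gender_idx[1] & city_idx["Hyd"])
```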