
Transaction:

A transaction is a business operation.

From a technical point of view:

It is a set of DML operations (Insert, Update, Delete)
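A minimal sketch of a transaction as an all-or-nothing set of DML operations, using Python's built-in sqlite3 module. The accounts and transfers tables, the transfer function, and the amounts are hypothetical and only serve as illustration.

```python
import sqlite3

# In-memory database with hypothetical tables (not part of the notes).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (acct_id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("CREATE TABLE transfers (src INTEGER, dst INTEGER, amount INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 500), (2, 300)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Run one INSERT and two UPDATEs as a single transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("INSERT INTO transfers VALUES (?, ?, ?)", (src, dst, amount))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE acct_id = ?", (amount, src))
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE acct_id = ?", (amount, dst))
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE acct_id = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False

ok = transfer(conn, 1, 2, 200)       # all three statements are committed together
failed = transfer(conn, 1, 2, 1000)  # all three statements are rolled back together
balances = dict(conn.execute("SELECT acct_id, balance FROM accounts"))
transfer_count = conn.execute("SELECT COUNT(*) FROM transfers").fetchone()[0]
```

The second call leaves no trace in either table, which is the point of grouping the DML operations into one business operation.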

OLTP System = OLTP applications (Front end) + Database (Back end)

Data warehousing = ETL Development + BI Development

Enterprise Data warehouse:

An Enterprise Data warehouse is a relational DB which is specially designed for analyzing the business,
making decisions to achieve business goals, and responding to business problems; it is not
designed for business transaction processing.

A Data warehouse is a concept of consolidating the data from multiple OLTP databases.

From a storage capacity point of view, relational DBs are categorized into three types

1.Low range

2.Mid range

3.High range

1.Low range DB:

Can organize and manage megabytes of information

Example: MS Access

2.Mid range DB:

Can organize and manage gigabytes of information

Example: Oracle, Microsoft SQL Server, Sybase, DB2, Informix, PostgreSQL

3.High range DB:

Can organize and manage terabytes and petabytes of information

Example: Teradata, Netezza, Greenplum, Hadoop.

Data storage Patterns:

From a storage point of view, relational DBs support two types of data storage patterns

1.NFS-Normal File storage

2.DFS-Distributed File storage

NFS-Normal File storage:

1.Single disk for storing the data

2.Shared-everything architecture (data is shared on a single disk)

3.Data is read sequentially

4.All mid range DBs are developed on the NFS platform.

5.Limited scalability or expansion

6.Strongly recommended for OLTP applications

7.Recommended for data warehousing in small and medium scale enterprises with a storage capacity of
gigabytes

8.By default there is only one processor in NFS

9.Disks cannot be scaled in NFS

Example: Oracle, Sybase, SQL Server, DB2, Red Brick, Informix, PostgreSQL

Note: Here a processor is a S/W component that runs as an .exe

DFS-Distributed File Storage:

1.Multiple disks for storing the data

2.Shared-nothing architecture (every processor has dedicated memory & disk that is not shared with
another processor)

3.Data is read in parallel (supports parallelism)

4.Unlimited scalability

5.Designed only for building an enterprise data warehouse, not for OLTP

Example: Teradata, Netezza, Hadoop, Greenplum

Enterprise DWH database Evaluation:

1.DB that supports enormous storage capacity (billions of rows and terabytes)

2.DB that supports the distributed file storage pattern

3.DB that supports shared-nothing architecture

4.DB that supports unlimited scalability (expansion)

5.DB that supports massively parallel processing

6.DB that supports mature optimizers to handle complex SQL queries (runs the queries faster with
less system resource usage)

7.DB that supports high availability (users can still access the data)

8.DB that keeps 100% of the data without data loss even when S/W or H/W components are down

9.DB that supports parallel loading

10.DB that supports low TCO (total cost of ownership): easy to set up, administer & manage

11.Single DB server that can provide access to hundreds of users concurrently

Data Acquisition:

It is the process of extracting data from multiple source systems, transforming the data into a
consistent format, and loading it into a target system. To implement the ETL process we need ETL tools.

Types of ETL tools

There are two types of ETL tools to build Data Acquisition

1.GUI based ETL tool

2.Code based ETL tool

Code Based ETL:

ETL applications are developed using programming languages and utilities such as

SQL, PL/SQL, SAS, Teradata ETL utilities

GUI Based ETL:

ETL applications are developed using a simple graphical user interface with point & click features

Example: Informatica, DataStage, Ab Initio, SSIS

MSBI is a package that includes ETL + Reporting (SSIS + SSRS)

Data Cleansing:

It is the process of filtering or rejecting unwanted source data or records

Data Scrubbing:

It is the process of deriving new attributes or columns

Data Merging:

It is the process of combining the data from multiple source systems

Data merging is of two types

1.Join

2.Union
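The cleansing, scrubbing, and merging steps described above can be sketched in plain Python. The source extracts, field names, and the quality rule (reject records without a name) are hypothetical.

```python
# Hypothetical source extracts; the field names are illustrative only.
store_a = [
    {"cust_id": 1, "name": "Ben", "qty": 2, "unit_price": 50},
    {"cust_id": 2, "name": None, "qty": 1, "unit_price": 30},  # missing name
]
store_b = [
    {"cust_id": 3, "name": "Tom", "qty": 4, "unit_price": 10},
]
cities = {1: "HYD", 3: "CHE"}  # a second source keyed by cust_id

# Merging by union: stack records that share the same layout.
merged = store_a + store_b

# Cleansing: reject records that fail a quality rule.
clean = [r for r in merged if r["name"] is not None]

# Scrubbing: derive a new attribute from existing ones.
scrubbed = [{**r, "amount": r["qty"] * r["unit_price"]} for r in clean]

# Merging by join: attach columns from another source on a common key.
joined = [{**r, "city": cities[r["cust_id"]]} for r in scrubbed if r["cust_id"] in cities]
```

A union adds rows while a join adds columns, which is why the notes list them as two separate kinds of merging.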

Data warehouse:

1.A data warehouse is a relational DB that is used to store historical data for query & analysis

2.Data in a data warehouse is derived from source systems (OLTP/SOR)

SOR-->Source of records

OLTP: (Online Transaction Processing)

A computer system that stores time-sensitive, transaction-related data that is processed immediately
and always kept current.

Difference Between OLTP And Data ware house

Tables in Data Warehouse:

There are two types of tables we have in Data Warehouse

1. Dimension Table

2. Fact Table

1. Dimension Table:

Stores textual or descriptive information about a business process

Dimension table examples in Retail Domain:

Customer, Product, Stores, Employees, Promotions, Time

Dimension table examples in Banking Domain:

Applications, Customers, Products, Branches, Promotions, Time, Billing cycle

Fact Table:

A fact table stores measurements or metrics of a business process

Fact table examples in Retail Domain:

Sales, Purchase, Inventory

Fact table examples in Banking Domain:

1. SA_LoanTransaction Fact

2. CC_Transaction Fact

3. CC_Statement Fact

A fact table consists of keys and measures, and it has a composite primary key

Composite primary key: Store Key, Prod Key, Date Key

Store Key   Prod Key   Date Key   Revenue
S1          P1         D1         3000
S1          P2         D1         2000
S2          P1         D1         2000
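A small sketch of the composite primary key using Python's built-in sqlite3 module. The table name sales_fact and the column names are assumptions; the three rows are the ones listed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Table and column names are illustrative; the key is the combination of the three dimension keys.
conn.execute("""
    CREATE TABLE sales_fact (
        store_key TEXT NOT NULL,
        prod_key  TEXT NOT NULL,
        date_key  TEXT NOT NULL,
        revenue   INTEGER NOT NULL,
        PRIMARY KEY (store_key, prod_key, date_key)
    )
""")
rows = [("S1", "P1", "D1", 3000), ("S1", "P2", "D1", 2000), ("S2", "P1", "D1", 2000)]
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", rows)

# A second row with the same key combination violates the composite key.
try:
    conn.execute("INSERT INTO sales_fact VALUES ('S1', 'P1', 'D1', 999)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM sales_fact").fetchone()[0]
```

Each key column may repeat on its own (S1 and P1 both appear twice); only the full combination has to be unique.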

Types of Fact tables:

There are three types of fact tables

1. Factless Fact table

2. Cumulative Fact table

3. Snapshot Fact table

1. Factless Fact table:

1.A factless fact table consists of only keys and no measures

2.A factless fact table is used to record events

3.A factless fact table acts as a bridge between the dimension tables

Example of a factless fact table: Employee Attendance factless fact

Dimension Tables

Auditorium: Aud Id, Aud Name, Aud Type, Aud Mgr, Aud Address

Sponsors: Sponsor Id, Sponsor Name, Contribution, Address

Time: Date Key, Month Key, Qtr, Year

Participant: Participant Id, Participant Name, Gender, Address

Events: Event Id, Event Name, Event Type, Event Desc

Fact Table

Aud Id   Sponsor Id   Participant Id   Event Id
A1       S1           P1               E1
A1       S1           P2               E1
A2       S1           P3               E1

2. Cumulative Fact table:

It consists of additive facts; it describes what happened over a period of time

Ex: Sales Fact table, Order Fact table

3. Snapshot Fact table:

It consists of semi additive facts and non additive facts; it describes the state of things at a particular
instant of time

Ex: Bank Fact table, Inventory Fact table

Degenerate Dimension Key:

A key in a fact table that is not associated with any dimension

Example: Order Id, Sale Id, Bill No, Invoice etc

Types of Facts:

There are 3 types of Facts in Fact tables

1. Additive Facts

2.Semi Additive Facts

3. Non Additive Facts

1.Additive Fact: Business measurements in a fact table that can be summed up through all of the
dimensional keys

Fact Table

Store Key   Prod Key   Date Key    Revenue
S1          P1         12-Jan-15   600
S1          P2         12-Jan-15   400
S2          P2         12-Jan-15   800
S2          P3         13-Jan-15   500
S3          P1         13-Jan-15   700
S3          P3         14-Jan-15   900

Report generation using the keys in the above fact table

Revenue Report By Store

Store Key   Revenue
S1          1000
S2          1300
S3          1600

Revenue Report By Product

Product Key   Revenue
P1            1300
P2            1200
P3            1400

Revenue Report By Date

Date Key    Revenue
12-Jan-15   1800
13-Jan-15   1200
14-Jan-15   900

2.Semi Additive Fact: Business measurements in a fact table that can be summed up across only a few
dimensional keys

Bank Fact table:

Acct Id   Transaction Date   Balance   Profit Margin
21653     12-Jan-15          700000    -
21654     12-Jan-15          400000    -
21653     13-Jan-15          900000    -
21654     13-Jan-15          600000    -

Reports:

Balance By Acct Id

Acct Id   Balance (sum over dates)   Balance (latest date)
21653     1600000                    900000
21654     1000000                    600000

Balance By Date

Date Key    Balance
12-Jan-15   1100000
13-Jan-15   1500000

In the above example Balance is a semi additive fact: it can be summed across accounts for one date, but
summing it across dates for one account does not give a meaningful balance
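The reports above can be reproduced with a short sketch. The rows are copied from the two fact tables; the helper total_by is a hypothetical name, and the bank dates are rewritten in ISO form so that sorting them as text gives chronological order.

```python
from collections import defaultdict

# Rows copied from the sales fact table: (store, product, date, revenue).
sales = [
    ("S1", "P1", "12-Jan-15", 600), ("S1", "P2", "12-Jan-15", 400),
    ("S2", "P2", "12-Jan-15", 800), ("S2", "P3", "13-Jan-15", 500),
    ("S3", "P1", "13-Jan-15", 700), ("S3", "P3", "14-Jan-15", 900),
]
# Rows copied from the bank fact table: (account, date in ISO form, balance).
balances = [
    (21653, "2015-01-12", 700000), (21654, "2015-01-12", 400000),
    (21653, "2015-01-13", 900000), (21654, "2015-01-13", 600000),
]

def total_by(rows, key_index, value_index):
    """Sum one measure grouped by the column at key_index."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key_index]] += row[value_index]
    return dict(totals)

# Additive: revenue can be summed along every dimension key.
revenue_by_store = total_by(sales, 0, 3)
revenue_by_product = total_by(sales, 1, 3)
revenue_by_date = total_by(sales, 2, 3)

# Semi additive: balance can be summed across accounts for one date...
balance_by_date = total_by(balances, 1, 2)

# ...but across dates the meaningful figure is the latest balance, not the sum.
latest_balance = {}
for acct, day, balance in sorted(balances, key=lambda r: r[1]):
    latest_balance[acct] = balance  # later dates overwrite earlier ones
```

Summing balance per account would give 1600000 and 1000000, figures that no account ever held, while latest_balance gives 900000 and 600000.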

3.Non Additive Fact: Business measurements in a fact table that cannot be summed up across any
dimension keys

Note: In a fact table, percentages are always non additive

SEM1 80%, SEM2 60%, TOTAL 140% (wrong)

Note: Another example of a non additive fact is Unit Price
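The percentage note above can be checked with a few lines. The marks behind the two percentages (400 of 500 and 300 of 500) are an assumption made only so that the numbers work out to 80% and 60%.

```python
# Hypothetical marks behind the two percentages: (scored, possible).
semesters = {"SEM1": (400, 500), "SEM2": (300, 500)}

percentages = {
    name: 100 * scored / possible for name, (scored, possible) in semesters.items()
}
naive_total = sum(percentages.values())  # adding the percentages themselves

# The additive parts are the marks; the percentage is recomputed from their totals.
total_scored = sum(scored for scored, _ in semesters.values())
total_possible = sum(possible for _, possible in semesters.values())
overall = 100 * total_scored / total_possible
```

Under these assumed marks the naive total is 140 while the recomputed overall figure is 70, which is why a warehouse usually stores the additive components and derives the ratio at query time.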

Types of Dimensions:

The following are the different types of dimensions in a DW

1. Conformed Dimension

2. Degenerated Dimension

3. Shrunken Dimension

4. Junk Dimension

5. Dirty Dimension

Conformed Dimension: A dimension that is shared across multiple fact tables is called a conformed
dimension; it is also the dimension used to join data marts

Banking Domain:

Degenerated Dimension:

If a fact table acts as a dimension and is shared with another fact table (or) its key is maintained as a
foreign key in another fact table, such a table is called a degenerated dimension.

Shrunken Dimension:

A dimension that is a subset of another dimension

Or

A dimension that is not directly linked to the fact table

Junk Dimension:

A dimension that is organized from low cardinality indicator or flag values

Cardinality is the number of unique values in a column. Cardinality also expresses the minimum and maximum
number of instances of an entity 'B' that can be associated with an instance of entity 'A'

The minimum and maximum number can be 0, 1 or "n"

Dirty Dimension:

If a record occurs more than once in a table and the copies differ only in a non key attribute, such a table is
called a dirty dimension

Orders:

Order Id   Order Date   Payment Mode   Payment Mode Type   Comm/Non Comm   Amount
111        -            Cash           Cash                No              -
112        -            Cash           Cash                No              -
113        -            Credit         Master              No              -
114        -            Cash           Cash                No              -
115        -            Cash           Cash                No              -
116        -            Credit         Visa                Yes             -
117        -            Cash           Cash                No              -

Ord Ind Id   Payment Mode   Payment Mode Type   Comm/Non Comm
1            Cash           Cash                No
2            Credit         Master              No
3            Credit         Visa                Yes

Order Id   Order Date   Ord Ind Id   Amount
111        -            1            -
112        -            2 is not used here; see row 113
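The Orders example above moves the low cardinality flag columns into a small indicator table. A minimal sketch of that step follows; the variable names are hypothetical, and the indicator ids are simply assigned in first-seen order, which happens to match the ids in the example.

```python
# Order rows from the example above: (order_id, payment_mode, payment_mode_type, comm).
orders = [
    (111, "Cash", "Cash", "No"), (112, "Cash", "Cash", "No"),
    (113, "Credit", "Master", "No"), (114, "Cash", "Cash", "No"),
    (115, "Cash", "Cash", "No"), (116, "Credit", "Visa", "Yes"),
    (117, "Cash", "Cash", "No"),
]

junk_dimension = {}  # flag combination -> indicator id
slim_orders = []     # (order_id, indicator id) rows that replace the flag columns
for order_id, *flags in orders:
    combo = tuple(flags)
    if combo not in junk_dimension:
        junk_dimension[combo] = len(junk_dimension) + 1  # next unused id
    slim_orders.append((order_id, junk_dimension[combo]))
```

Seven order rows collapse to three distinct flag combinations, so the orders table carries one small integer instead of three repeated text columns.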

Slowly Changing Dimension:

A dimension that changes slowly and irregularly

Or

A dimension that changes across time

There are three choices to handle slowly changing dimensions

1.SCD TYPE-I

2.SCD TYPE-II

3.SCD TYPE-III

1. SCD TYPE-I:

Only the most recent changes are maintained

Type-I holds the current status

Type-I is used for error correction

CID   CNAME   DOB
11    BEN     12-JAN-1967
12    ALEN    15-FEB-1966

CKEY   CID   CNAME   DOB
101    11    BEN     12-JAN-1967
102    12    ALEN    15-FEB-1966
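A minimal sketch of a Type-I update, assuming the dimension is held as a dictionary keyed by CID. The function name apply_type1 and the corrected date of birth are hypothetical; the starting rows are the ones shown above.

```python
# Dimension rows from the example above, keyed by the natural key CID.
customer_dim = {
    11: {"ckey": 101, "cname": "BEN", "dob": "12-JAN-1967"},
    12: {"ckey": 102, "cname": "ALEN", "dob": "15-FEB-1966"},
}

def apply_type1(dim, cid, **changes):
    """Overwrite the changed attributes in place; no history is kept."""
    dim[cid].update(changes)

# Hypothetical error correction: ALEN's date of birth was entered wrongly.
apply_type1(customer_dim, 12, dob="15-FEB-1965")
```

The row count and the surrogate key CKEY stay the same; the old value is simply gone, which is acceptable when the old value was an error.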

SCD TYPE-II:

The change is inserted as a new record

Type-II is used to maintain historical status

PRODUCTS

PID   PNAME   PRICE   EFF_DATE
11    ABC     300     12-JAN-10
12    PQR     270     15-JAN-10

The price of product 12 changed to 199 on 27-AUG-11

PKEY   PID   PNAME   PRICE   EFF_DATE    END_DATE
100    11    ABC     300     12-JAN-10
101    12    PQR     270     15-JAN-10   26-AUG-11
102    12    PQR     199     27-AUG-11

A Type-II dimension is referred to as a dirty dimension

A Type-II dimension has redundant data
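A sketch of the Type-II change shown above: the current row is end-dated and the new price is inserted as a new row. The function name apply_type2, the use of None for an open END_DATE, and generating the next PKEY as the maximum plus one are assumptions.

```python
from datetime import date, timedelta

# Dimension rows before the change; end_date None marks the current version.
product_dim = [
    {"pkey": 100, "pid": 11, "pname": "ABC", "price": 300,
     "eff_date": date(2010, 1, 12), "end_date": None},
    {"pkey": 101, "pid": 12, "pname": "PQR", "price": 270,
     "eff_date": date(2010, 1, 15), "end_date": None},
]

def apply_type2(dim, pid, new_price, change_date):
    """Close the current row for pid and insert the change as a new row."""
    current = next(r for r in dim if r["pid"] == pid and r["end_date"] is None)
    current["end_date"] = change_date - timedelta(days=1)  # old version ends the day before
    new_row = {
        **current,
        "pkey": max(r["pkey"] for r in dim) + 1,  # assumed surrogate key rule
        "price": new_price,
        "eff_date": change_date,
        "end_date": None,
    }
    dim.append(new_row)
    return new_row

apply_type2(product_dim, 12, 199, date(2011, 8, 27))
```

After the call, product 12 has two rows (PKEY 101 ending on 26-AUG-11 and PKEY 102 starting on 27-AUG-11), which is the redundancy the notes mention.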

SCD Type-III: The change is appended as a new column

Type-III is used to maintain partial history

CID   CNAME   LOC
11    BEN     HYD
12    TOM     CHE

CKEY   CID   CNAME   CURR LOC   PREV LOC
101    11    BEN     HYD
102    12    TOM     CHE

CID   CNAME   LOC
11    BEN     HYD
12    TOM     BNG

CKEY   CID   CNAME   CURR LOC   PREV LOC
101    11    BEN     HYD        -
102    12    TOM     BNG        CHE

CID   CNAME   LOC
11    BEN     KER
12    TOM     BNG

CKEY   CID   CNAME   CURR LOC   PREV LOC
101    11    BEN     KER        HYD
102    12    TOM     BNG        CHE
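The Type-III steps above can be sketched as follows. The function name apply_type3 and the dictionary layout are hypothetical; the data and the two location changes are the ones from the example.

```python
# Dimension rows from the first step of the example, keyed by CID.
customer_dim = {
    11: {"ckey": 101, "cname": "BEN", "curr_loc": "HYD", "prev_loc": None},
    12: {"ckey": 102, "cname": "TOM", "curr_loc": "CHE", "prev_loc": None},
}

def apply_type3(dim, cid, new_loc):
    """Shift the current value into the previous-value column, then overwrite it."""
    row = dim[cid]
    if row["curr_loc"] != new_loc:
        row["prev_loc"] = row["curr_loc"]
        row["curr_loc"] = new_loc

apply_type3(customer_dim, 12, "BNG")  # TOM moves from CHE to BNG
apply_type3(customer_dim, 11, "KER")  # BEN moves from HYD to KER
```

Only one previous value is kept per row, so a further move would overwrite it; that is why the history is described as partial.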

Role Playing Dimension: A dimension that is reused in multiple applications within the same DB

Data Modeling:

Model: A business representation of the structure of the data in one or more databases

OLTP: ER model is used

Model is normalized

Model is efficient for transaction processing

Data warehouse: Dimensional model is used

Model is designed based on facts & dimensions

Model is efficient in query processing

Schema: A schema is a collection of a user's objects; an object can be a table, view or synonym

Types of Schema:

1.Star Schema

2.Snow Flake Schema

3.Galaxy Schema

1.Star Schema: In a star schema the centre of the star is the fact table and the corners are dimension tables

A simple star schema consists of only one fact table

Star schema dimensions do not have parent tables

Star schema dimensions are denormalized

A star schema is denormalized (everything in one table) and efficient in query processing

2. Snow Flake Schema

Snowflake schema dimensions have one or more parent tables

A snowflake schema is normalized

Because it is normalized, a snowflake schema is efficient for updates (as in transaction processing) but needs
more joins in queries

Customer

Cid   Cname   Gender   Geoid
11    C1      1        111
12    C2      1        111
13    C3      0        112
14    C4      1        111

Geography

Geoid   City   State   Country   Region
111     Hyd    Ts      India     Asia
112     VSP    Ap      India     Asia

Cid   Cname   Gender   Geoid   City   State   Country   Region
11    C1      1        111     Hyd    Ts      India     Asia
12    C2      1        111     Hyd    Ts      India     Asia
13    C3      0        112     VSP    Ap      India     Asia
14    C4      1        111     Hyd    Ts      India     Asia

A star schema uses more space than a snowflake schema
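The Customer and Geography example above can be sketched with Python's built-in sqlite3 module: the snowflake form keeps two tables, and the star form folds the parent table into the customer rows with a join. The table name customer_star is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Snowflake form: customer points to its parent table geography.
    CREATE TABLE geography (geoid INTEGER PRIMARY KEY, city TEXT, state TEXT,
                            country TEXT, region TEXT);
    CREATE TABLE customer (cid INTEGER PRIMARY KEY, cname TEXT, gender INTEGER,
                           geoid INTEGER REFERENCES geography(geoid));
    INSERT INTO geography VALUES (111, 'Hyd', 'Ts', 'India', 'Asia'),
                                 (112, 'VSP', 'Ap', 'India', 'Asia');
    INSERT INTO customer VALUES (11, 'C1', 1, 111), (12, 'C2', 1, 111),
                                (13, 'C3', 0, 112), (14, 'C4', 1, 111);
    -- Star form: the parent columns are copied into every customer row.
    CREATE TABLE customer_star AS
        SELECT c.cid, c.cname, c.gender, c.geoid, g.city, g.state, g.country, g.region
        FROM customer AS c JOIN geography AS g ON g.geoid = c.geoid;
""")
star_rows = conn.execute("SELECT * FROM customer_star ORDER BY cid").fetchall()
hyd_copies = conn.execute(
    "SELECT COUNT(*) FROM customer_star WHERE city = 'Hyd'"
).fetchone()[0]
```

The Hyd row is stored once in geography but three times in customer_star, which is where the extra space of the star form comes from; in exchange, queries on customer_star need no join.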

Galaxy Schema:

Multiple fact tables are connected to multiple dimension tables

Index: (Fast accessing path)

1.B*Tree Index

2.BitMap Index

1.B*Tree Index

It is used on High Cardinality columns

Example for B*Tree Index=EMPNO

2.BitMap Index

It is used on Low Cardinality columns

Example for Bit Map Index=GENDER
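A rough sketch of the two index styles in plain Python. The employee rows are hypothetical, a sorted list searched with binary search stands in for a B*Tree (it is not a real tree), and the bitmap index is modelled as one integer bit string per distinct value.

```python
from bisect import bisect_left

# Hypothetical employee rows: (EMPNO, GENDER).
rows = [(7369, "M"), (7499, "F"), (7521, "F"), (7566, "M"), (7654, "M")]

# Ordered index on the high cardinality column: sorted (key, row position) pairs.
empno_index = sorted((empno, pos) for pos, (empno, _) in enumerate(rows))

def find_by_empno(empno):
    """Binary search the ordered index, then fetch the row by position."""
    i = bisect_left(empno_index, (empno, -1))
    if i < len(empno_index) and empno_index[i][0] == empno:
        return rows[empno_index[i][1]]
    return None

# Bitmap index on the low cardinality column: bit i is set when row i holds the value.
gender_bitmap = {}
for pos, (_, gender) in enumerate(rows):
    gender_bitmap[gender] = gender_bitmap.get(gender, 0) | (1 << pos)

def positions(bitmap):
    """Return the row positions whose bit is set."""
    return [pos for pos in range(len(rows)) if (bitmap >> pos) & 1]

female_rows = [rows[pos] for pos in positions(gender_bitmap["F"])]
```

With only two distinct values the bitmap index needs two bit strings regardless of the row count, while an ordered index pays off when almost every key is different, as with EMPNO.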
