
Transaction:

A transaction is a business operation.

Technical point of view:
It is a set of DML operations (Insert, Update, Delete).

OLTP System = OLTP applications (front end) + Database (back end)
Data warehousing = ETL Development + BI Development
Enterprise Data Warehouse:

An Enterprise Data Warehouse is a relational DB which is specially designed for analysing the business, making decisions to achieve the business goals, and responding to business problems, but it is not designed for business transaction processing.
A Data Warehouse is built on the concept of consolidating the data from multiple OLTP databases.
From a storage capacity point of view, relational DBs are categorized into three types:
1.Low range
2.Mid range
3.High range
1.Low range DB:
Can organize and manage megabytes of information
Example: MS Access
2.Mid range DB:
Can organize and manage gigabytes of information
Example: Oracle, Microsoft SQL Server, Sybase, DB2, Informix, PostgreSQL
3.High range DB:
Can organize and manage terabytes and petabytes of information
Example: Teradata, Netezza, Greenplum, Hadoop.

Data Storage Patterns:
There are two types of data storage patterns supported by relational DBs:
1.NFS - Normal File Storage
2.DFS - Distributed File Storage
NFS - Normal File Storage:
1.Single disk for storing the data
2.Shared-everything architecture (data is shared on a single disk)
3.Data is read sequentially
4.All mid range DBs are developed on the NFS platform
5.Limited scalability or expansion
6.Strongly recommended for OLTP applications
7.Recommended for data warehousing for small and medium scale enterprises with storage capacity in gigabytes
8.The default number of processors in NFS is one
9.Disks are not scalable in NFS
Example: Oracle, Sybase, SQL Server, DB2, Red Brick, Informix, PostgreSQL
Note: A processor here is a S/W component that runs as an .exe

DFS - Distributed File Storage:

1.Multiple disks for storing the data
2.Shared-nothing architecture (every processor has a dedicated memory & disk that is not shared by another processor)
3.Data is read in parallel (supports parallelism)
4.Unlimited scalability
5.Designed only for building an Enterprise Data Warehouse, not for OLTP
Example: Teradata, Netezza, Hadoop, Greenplum

Enterprise DWH Database Evaluation:

1.Database that supports enormous storage capacity (billions of rows and terabytes)
2.DB that supports the distributed file storage pattern
3.DB that supports shared-nothing architecture
4.Database that supports unlimited scalability (expansion)
5.DB that supports massively parallel processing (MPP)
6.DB that supports mature optimizers to handle complex SQL queries (runs the queries faster with less system resource usage)
7.DB that supports high availability (users can access 100% of the data, without data loss, even when S/W or H/W components are down)
8.Database that supports parallel loading
9.DB that supports low TCO (total cost of ownership): easy to set up, administer & manage
10.A single DB server that can provide access to hundreds of users concurrently
Data Acquisition:
It is the process of extracting the data from multiple source systems, transforming the data into a consistent format, and loading it into a target system.
To implement the ETL process we need ETL tools.
Types of ETL tools:
There are two types of ETL tools used to build data acquisition:
1.Code based ETL tool
2.GUI based ETL tool
Code Based ETL:
ETL applications are developed using programming languages such as SQL, PL/SQL, SAS, and Teradata ETL utilities.
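To make the idea concrete, here is a minimal sketch of code-based ETL in SQL. The table names SRC_EMP and TGT_EMP are illustrative assumptions, not objects from these notes:

-- Extract rows from the source, transform them into a consistent
-- format, and load the result into the target in one statement.
INSERT INTO TGT_EMP (EMPNO, ENAME, ANNUAL_SAL)
SELECT EMPNO,
       UPPER(ENAME),      -- transformation: consistent casing
       SAL * 12           -- transformation: monthly to annual salary
FROM   SRC_EMP
WHERE  SAL IS NOT NULL;   -- cleansing: reject records with missing salary
COMMIT;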

GUI Based ETL:

ETL applications are developed using a simple graphical user interface with point & click features.
Example: Informatica, DataStage, Ab Initio, SSIS
MSBI is a package that includes ETL + Reporting (SSIS + SSRS).
Data Cleansing:
It is the process of filtering or rejecting unwanted source data or records.

Data Scrubbing: It is the process of deriving new attributes or columns.
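Both ideas can be pictured in SQL (a sketch; SRC_EMP is an illustrative source table, not from these notes): the WHERE clause does the cleansing, the derived column does the scrubbing:

SELECT EMPNO,
       ENAME,
       SAL,
       SAL * 12 AS ANNUAL_SAL        -- scrubbing: derive a new column
FROM   SRC_EMP
WHERE  EMPNO IS NOT NULL             -- cleansing: reject unwanted records
  AND  SAL > 0;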

Data Merging:
It is the process of combining the data from multiple source systems.
Data merging is of two types:
1.Join
2.Union
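A short SQL sketch of both styles; EMP and DEPT are the SCOTT tables used elsewhere in these notes, while EMP_US and EMP_UK are hypothetical sources with identical structures:

-- Join: merge columns from two sources on a common key
SELECT e.EMPNO, e.ENAME, d.DNAME
FROM   EMP e
JOIN   DEPT d ON e.DEPTNO = d.DEPTNO;

-- Union: merge rows from two structurally identical sources
SELECT EMPNO, ENAME, SAL FROM EMP_US
UNION
SELECT EMPNO, ENAME, SAL FROM EMP_UK;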

Informatica PowerCenter 9.5:

Informatica PowerCenter is a data integration tool from Informatica Corporation, which was founded in 1993 in Redwood City, California.
Informatica PowerCenter is a GUI based data integration platform which can access the data from various types of source systems, transform the data into a universal format, and load it into target systems.
It is a client-server based ETL product.
Informatica Products:
Informatica Power Center
Informatica Power Mart
Informatica Power Exchange
Informatica Power Analyzer
Informatica Data Quality
Master Data Management
Informatica B2B Integration
Informatica Information Life Cycle Management
Informatica Power Center Architecture:
When we install Informatica Power Center, the following components are installed:
1.Power Center Clients:
1.Power Center Designer
2.Power Center Workflow Manager
3.Power Center Workflow Monitor
4.Power Center Repository Manager
2.Power Center Repository
3.Power Center Repository Service (PCRS)
4.Power Center Integration Service (PCIS)
5.Power Center Domain
6.Informatica Administrator (Web Client)
1.Power Center Clients:
1.Power Center Designer: It is a GUI based client component which allows you to design ETL applications known as Mappings.
A Mapping defines Extraction, Transformation, and Loading.
The following objects can be created from the Designer client:
1.Source Definition (Metadata)
2.Target Definition
3.Mappings with business rules
2.Power Center Workflow Manager:
It is a GUI based client component which allows you to create the following objects:
1.Session
2.Workflow
3.Schedulers

Session:
A Session is a task that executes a mapping.
A session is a set of instructions that tells the ETL server how and when to move the data from source to target.
Workflow:
A Workflow is a set of instructions that tells the ETL server how to execute the session tasks.
A Workflow is designed with two types of batch processes:
1.Sequential Batch Process
2.Concurrent or Parallel Batch Process
1.Sequential Batch Process:
The workflow executes the session tasks one after another; this is recommended when there is a dependency between data loads.

2.Concurrent Batch Process:

The workflow executes the session tasks all at once; this is recommended when there is no dependency between data loads.
The workflow is the top object in the object development hierarchy.

Schedule: It is the automation of workflow execution.


Power Center Workflow Monitor:
a) It is a GUI based client component which allows you to monitor the execution of sessions and workflows on the ETL server
b) It can collect ETL statistics such as:
No of records Extracted
No of records Rejected
No of records Loaded
Throughput in Power Center:
Throughput defines the efficiency, or the rate at which records are extracted per second and loaded per second.
For example, a session that loads 100,000 records in 50 seconds has a throughput of 2,000 rows/sec.
Throughput can also be expressed in bytes/sec.
It can be used to evaluate the ETL server efficiency.
Users can access the session log (execution log).
Development of ETL Objects:
Step1: Create Source Definition
Step2: Create Target Definition
Step3: Design Mapping (ETL application with or without business rules)
Step4: Create Session for each Mapping
Step5: Design Work flow
Step6: Run Work flow
Step7: Monitor Workflow
Power Center Repository Manager:
It is a GUI based administrative client which is used to perform the following tasks:
a) Create, edit, delete folders
b) Object backup and restore
c) Assign users to access the folders with read, write, execute permissions
Power Center Repository:
A repository is the brain of the ETL system; it stores ETL objects (metadata).
A relational DB is required to create a repository.
The repository DB consists of system tables that store the ETL objects.
Power Center Repository Service [PCRS]:
A Power Center client component connects to the repository DB using the repository service.
A repository service is a set of processes that insert, update, delete, and retrieve metadata from the repository.

Instance: An instance is an image of the original (physical) objects.
Note: The Repository Service provides the design-time environment.
Power Center Integration Service (PCIS):
An Integration Service is the ETL server that performs Extraction, Transformation, and Loading.
It provides the run-time environment where ETL objects are executed. The Integration Service creates logs and saves them in the repository database through the repository service.
The Integration Service consists of the following server components:
1.Reader
2.DTM
3.Writer

Reader: It connects to the source and extracts the data from tables, files, etc.
Data Transformation Manager (DTM): It processes the data according to the business rules that you configured in the mapping.
Writer: It connects to the target system and loads the data into the tables (or) files.
Note: The log created by the Integration Service and saved in the repository can be accessed from the Workflow Monitor.
Power Center Domain:
1.Informatica Power Center has the ability to scale services and share resources across multiple machines
2.The Power Center domain is the primary unit for managing and administrating application services (PCRS, PCIS)
3.A Power Center domain is a collection of one or more nodes
4.The node which hosts the domain is known as the primary node, or Master Gateway Node
5.If the Master Gateway Node fails, user requests can't be processed
6.Hence it is recommended to configure more than one node as a Master Gateway Node
7.If a worker node fails, its requests can be distributed to the other nodes [High Availability]
8.Each node is created or configured with application services

Informatica Administrator (Web Client):

1.It is an administrative web client which is used to manage & administrate the Power Center domain
2.The following admin tasks can be performed with the web client:
a) Creation of users and groups
b) Assigning roles & permissions to users or user groups
c) Enabling and disabling existing nodes
d) Configuring existing nodes to increase processing efficiency
e) Adding or deleting nodes
f) Creation of application services (PCRS, PCIS)
Pre-requisites of an ETL process:
STEP1: Set up Source & Target databases
STEP2: Create ODBC connections for the Source & Target DBs
Set Up Target Database:
Start ---> Programs ---> Oracle ---> Application Development ---> SQL PLUS
Log on to Oracle with the following details

Create User:
SQL>SHOW USER
SQL>CREATE USER BATCH7AM IDENTIFIED BY TARGET;
Assign permission to User:
SQL>GRANT DBA TO BATCH7AM;
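As a quick sanity check (a sketch, not part of the original steps), connect as the new user and confirm the session works:

SQL>CONNECT BATCH7AM/TARGET
SQL>SHOW USER
SQL>SELECT * FROM TAB;   -- lists the (initially empty) set of tables owned by BATCH7AM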
ETL Development Process:
1.Creation of Source Definition:
A Source Definition is created using the Source Analyzer tool in the Designer client.
The Source Analyzer connects to the Source DB using an ODBC connection.

2.Creation of Target Definition:

A Target Definition is created using the Target Designer tool in the Designer client component.
The Target Designer connects to the Target DB using an ODBC connection.

3.Create Mapping (with or without Business Rules):

A Mapping is made up of the following metadata components:
a) Source (E)
b) Business Rule (T)
c) Target (L)
A Mapping without a business rule is known as a Flat Mapping.
A Mapping is created using the Mapping Designer tool in the Designer client component.

4.Creation of Session:
1.A Session is a task that runs the mapping
2.It is created using the Task Developer tool in the Workflow Manager client component
3.Every Session is configured with the following details:
a) Source Connection
b) Target Connection
c) Load Type
Creation of Reader Connection (Oracle):
From the Power Center Workflow Manager client, select the Connections menu, click on Relational, select the type Oracle, click on New, and enter the following details.

Creation of Writer Connection (Oracle):

From the Power Center Workflow Manager client, select the Connections menu, click on Relational, select the type Oracle, click on New, and enter the following details.

Configuring the Session:

1.Double click the session and select the Mapping tab
2.From the left window select the source; from the connections section click on the down arrow to open the relational connection browser and select the connection ORACLE_SCOTT_DB
3.From the left window select the target; from the connections section click on the down arrow to open the relational connection browser, select the connection ORACLE_BATCH7AM_DB and click OK
4.From the properties section select Target load type = Normal, click Apply and click OK
5.From the Repository menu click on Save
Create Workflow:
1.From the Power Center Workflow Manager client, select the Tools menu and click on Workflow Designer
2.From the Workflow menu select Create and enter the workflow name w_s_flatmapping_oracle
3.From the left window drag the session and drop it in the Workflow Designer workspace
4.From the toolbar click on Link Task
5.Drag the link from Start and drop it on the session instance
6.From the Repository menu click on Save
Run Workflow:
7.From the Workflow menu click on Start Workflow

Creation of Target Tables Using the Target Designer Tool:

1.Open the Power Center Designer client; from the Tools menu select Target Designer
2.From the left window expand the Sources subfolder
3.Drag the source definition [EMP] and drop it in the Target Designer workspace
4.Double click the target definition, click on Rename, and rename it to DIM_EMPLOYEES
5.Select the Columns tab; from the toolbar click on Cut to delete the unwanted columns
6.From the toolbar click on Add a new column, click Apply and click OK
7.From the Targets menu click on Generate/Execute SQL
8.Click on Connect and connect to the DB with the following details

Select Create Table, click on Generate & Execute, and click OK; the generated SQL is stored in a file named MKTABLES.SQL.
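For reference, MKTABLES.SQL contains plain DDL. A hedged sketch of what it might hold for this example, assuming standard SCOTT.EMP column types and the kept columns shown below (the real file depends on which columns were cut or added):

-- Hypothetical contents of MKTABLES.SQL for the renamed target
CREATE TABLE DIM_EMPLOYEES (
    EMPNO   NUMBER(4)     NOT NULL,
    ENAME   VARCHAR2(10),
    JOB     VARCHAR2(9),
    SAL     NUMBER(7,2),
    DEPTNO  NUMBER(2)
);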
Transformations & Types of Transformations:
A transformation is a Power Center object which allows you to develop the business rules that process the data into the desired business format.
Transformations are categorized into two types:
1.Active transformation
2.Passive transformation
1.Active transformation:
A transformation that can affect (change) the number of rows is known as an active transformation.
The following is the list of active transformations used to process the data:
1.Source Qualifier Transformation
2.Filter Transformation
3.Rank Transformation
4.Sorter Transformation
5.Transaction Control Transformation
6.Update Strategy Transformation
7.Normalizer Transformation
8.Aggregator Transformation
9.Joiner Transformation
10.Union Transformation
11.Router Transformation
12.SQL Transformation
13.JAVA Transformation
14.Lookup Transformation (from version 9.0 onwards it acts as an active transformation)
2.Passive transformation:
A transformation that doesn't affect (change) the number of rows is known as a passive transformation.
The following is the list of passive transformations used to process the data:
1.Lookup Transformation (up to Informatica 8.6 it acts as a passive transformation)
2.Expression Transformation
3.SQL Transformation (it acts as a dual transformation)
4.Stored Procedure Transformation
5.Sequence Generator Transformation
6.XML Source Qualifier Transformation

Ports & Types of Ports:

A port represents a column of a table (or) file.
Every transformation can have two basic types of ports:
1.Input Port (I)
2.Output Port (O)
Input Port (I): A port which can receive the data is known as an input port.
Output Port (O): A port which can provide the data is known as an output port.
Connected & Unconnected Transformations:
Connected Transformation:
1.A transformation which is part of the mapping in the data flow direction is known as a connected transformation
2.It is connected to the source and connected to the target
3.A connected transformation can receive multiple input ports and can return multiple output ports
Note: All active and passive transformations can be configured as connected transformations
Unconnected Transformation:
1.A transformation which is not part of the data flow direction, neither connected to the source nor connected to the target, is known as an unconnected transformation
2.It can receive multiple input ports but always returns a single output port
3.The following transformations can be configured as unconnected:
Lookup Transformation
Stored Procedure Transformation
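For illustration (not in the original notes): an unconnected lookup is called from an expression port using the :LKP reference qualifier. Here lkp_TAX_RATE is a hypothetical lookup transformation that returns a single port:

TAX_RATE [O] = :LKP.lkp_TAX_RATE(DEPTNO)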
Filter Transformation:
1.It is an active transformation that can filter the records based on a given condition
2.The condition can be defined on single/multiple ports
3.The Integration Service evaluates the condition and returns TRUE/FALSE
4.TRUE indicates that the record is allowed for further processing (or) loading into the target
5.FALSE indicates that the record is rejected by the filter transformation
6.Rejected records can't be captured (they can't even be identified in the session log)
7.The filter transformation functions as the WHERE clause in SQL (see the sketch after this list)
8.The filter transformation supports a single condition on one/more ports
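For example, a filter transformation configured with the condition DEPTNO = 30 behaves like this SQL sketch against the SCOTT.EMP source used elsewhere in these notes:

SELECT EMPNO, ENAME, SAL, DEPTNO
FROM   EMP
WHERE  DEPTNO = 30;   -- rows that evaluate to FALSE are silently dropped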
Limitations:
1.Allows you to define only a single condition
2.Rejected records can't be captured
Performance Considerations:
1.Keep the filter transformation as close to the Source Qualifier transformation as possible to filter the rows early in the data flow; as a result we reduce the number of rows for further processing
2.Copy only the required ports from the source qualifier to the expression transformation
3.Consider the data concatenation rule while designing the mapping

Expression Transformation:
1.It is a passive transformation which allows you to calculate an expression for each row
2.It performs row by row processing
3.Expressions are developed using functions & arithmetic operations
4.An expression transformation is created with 3 types of ports: Input, Output, Variable
5.Expressions are developed either in output (O) or variable (V) ports
6.Variable ports are recommended to simplify complex expressions and to reuse expressions
Scenario 1:
Calculate the tax for each employee who belongs to the sales department. If SAL is greater than 5000 then calculate the tax as SAL*0.17, else calculate the tax as SAL*0.13.
The sales department is identified by department identification no 30.

Logic:
Filter transformation: DEPTNO = 30 (restrict to the sales department)
Expression transformation:
SAL [I]
TAX [O] = IIF(SAL > 5000, SAL*0.17, SAL*0.13)
LOAD_DATE [O] = SYSDATE
Scenario 2:
Calculate the total salary for each employee based on SAL and COMM.
Total Sal = SAL + COMM
COMM may have NULLs.
Logic:
Expression transformation
TOTSAL [O] = IIF(ISNULL(COMM), SAL, SAL + COMM)
Scenario 3:
Implement the LIKE operator using a filter transformation on the JOB column of the EMP table, where SALESMAN is represented in 3 different formats:
SALESMAN
SALES-MAN
PRE-SALES
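A possible solution, not given in the original notes: emulate SQL's JOB LIKE '%SALES%' with the Informatica INSTR function, which returns the position of the first occurrence of the search string (0 when not found), so the condition is TRUE for SALESMAN, SALES-MAN, and PRE-SALES:
Logic:
Filter transformation
Condition: INSTR(JOB, 'SALES') > 0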

Variable Port:
1.A port which can store the data temporarily is known as a variable port (V)
2.Variable ports are created to simplify complex expressions and to reuse expressions in several output ports
3.Variable ports are local to the transformation
4.They increase the efficiency of calculations
5.The default value for a numeric variable port is 0
6.The default value for a variable port with a string data type is an empty string
7.Variable ports are not visible in the normal view of a transformation, only in the edit view
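A small sketch in the notation of the scenarios above (port names are illustrative): the tax expression is evaluated once in a variable port and reused by two output ports:
SAL [I]
V_TAX [V] = IIF(SAL > 5000, SAL*0.17, SAL*0.13)
TAX [O] = V_TAX
NET_SAL [O] = SAL - V_TAX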
Router Transformation:
1.A Router transformation is an active transformation which allows you to create multiple conditions and pass the data to multiple targets
2.A router transformation is created with two types of groups:
1.Input Group
2.Output Group
Input Group:
A single input group receives the data from the source pipeline
Output Groups:
The output groups are categorized into two types:
1.User defined output group
2.Default group
1.User defined output group:
1.Each user defined output group has one condition
2.All group conditions are evaluated for each row
3.One row can pass multiple conditions
2.Default group:
1.There is always exactly one default group
2.It captures the rows that fail all group conditions (rejected records)
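For intuition (a sketch, not from the original notes), the router's behaviour resembles Oracle's multi-table INSERT ALL: every WHEN clause is evaluated for each row, a row can land in several targets, and the ELSE branch plays the role of the default group. Target table names here are illustrative:

INSERT ALL
  WHEN DEPTNO = 10 THEN INTO TGT_DEPT10 (EMPNO, ENAME, SAL) VALUES (EMPNO, ENAME, SAL)
  WHEN SAL > 3000  THEN INTO TGT_HIGH_SAL (EMPNO, ENAME, SAL) VALUES (EMPNO, ENAME, SAL)
  ELSE INTO TGT_DEFAULT (EMPNO, ENAME, SAL) VALUES (EMPNO, ENAME, SAL)
SELECT EMPNO, ENAME, SAL, DEPTNO FROM EMP;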

Performance Considerations:
The router transformation has a performance advantage over multiple filter transformations: a row is read once into the input group and evaluated multiple times, once per group condition, whereas using multiple filter transformations requires the same data to be duplicated for each filter transformation.
