
Aggregator Transformation

Aggregator Transformation in Informatica is a connected, active transformation that lets you
perform aggregate calculations, such as averages and sums, on groups of data. The Aggregator
transformation differs from the Expression transformation in that you use the Aggregator transformation
to perform calculations on groups: for example, average, count, first, last, max, median,
min, percentile, stddev, sum, and variance. The Expression transformation permits you to
perform calculations on a row-by-row basis only. In this article we will go through the properties
of the Aggregator transformation. We will also discuss the steps for adding and configuring an Aggregator
transformation in an Informatica mapping.

Business purpose of Aggregator Transformation:


The Aggregator transformation is used to perform aggregate calculations on each group of data. Data can be
modified using built-in functions. Sample calculations handled by the Aggregator transformation are:
AVG, COUNT, MAX, MIN, SUM
FIRST, LAST
MEDIAN, PERCENTILE, STDDEV, VARIANCE

Properties of Aggregator Transformation:

Aggregator Transformation is an active transformation: it changes the number of rows passing
through it (it returns one row per group), and it also enables you to use conditional
clauses to filter rows
Aggregator Transformation is a connected transformation
Types of ports in Aggregator Transformation (see the variable-port sketch below):
o Input: for reading input data
o Output: for providing output data
o Variable: used to store any temporary calculation
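
As an illustration, a variable port can hold an intermediate per-row calculation that an output port then aggregates. A minimal sketch in the transformation language (the port names v_TOTAL_COMP and SUM_TOTAL_COMP are hypothetical; IIF and ISNULL are standard transformation-language functions):

v_TOTAL_COMP (Variable) = SALARY + IIF(ISNULL(COMMISSION_PCT), 0, SALARY * COMMISSION_PCT)
SUM_TOTAL_COMP (Output) = SUM(v_TOTAL_COMP)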

Components of Aggregator Transformation:

Aggregate cache: The Integration Service stores data in the aggregate cache until it completes the
aggregate calculations. It stores group values in an index cache and row data in the data cache.
Aggregate expression: Enter an expression in an output port. The expression can include non-aggregate expressions and conditional clauses.
Group by port: Indicates how to create groups. The port can be any input, input/output, output, or
variable port. When grouping data, the Aggregator transformation outputs the last row of each
group unless otherwise specified.
Sorted input: Select this option to improve session performance. To use sorted input, you must
pass data to the Aggregator transformation sorted by the group by ports, in ascending or descending
order.
You can configure the Aggregator transformation components and options on the Properties and Ports
tabs.
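
For example, a conditional clause inside an aggregate expression limits which rows feed the calculation. A minimal sketch, assuming SALARY is an input port:

SUM(SALARY, SALARY > 1000)

Here the second argument acts as a filter condition, so only rows with a salary greater than 1000 contribute to the sum.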

Configuring Aggregator Transformation Ports:


You can configure the following components on the Ports tab (an illustrative layout follows this list):
Port name: The name of the port.
Datatype, precision, and scale: Configure the datatype and set the precision and scale for each
port.
Port type: A port can be input, output, input/output, or variable. Input ports receive data and
output ports pass data. Output ports can pass aggregated data (use an aggregate function in the
expression). Variable ports store data temporarily and can hold values across rows.
Expression: Use the Expression Editor to enter expressions. Expressions use the transformation
language, which includes SQL-like functions, to perform calculations, for example SUM, MAX, or AVG.
GroupBy: Indicates how to create groups. The port can be any input, input/output, output, or
variable port. When grouping data, the Aggregator transformation outputs the last row of each
group unless otherwise specified.
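
As an illustration, a minimal port layout for aggregating salaries by department could look like this (port names are examples):

Port Name       Port Type      Expression     GroupBy
DEPARTMENT_ID   Input/Output   -              Yes
SALARY          Input          -              No
SUM_SAL         Output         SUM(SALARY)    No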

Configuring Aggregator Transformation Properties:


Modify the Aggregator Transformation properties by clicking on the Properties tab.

Property: Cache Directory
Description: Local directory where the Integration Service creates the index and data cache files. By
default, the Integration Service uses the directory entered in the Workflow Manager for the process
variable $PMCacheDir. If you enter a new directory, make sure the directory exists and contains
enough disk space for the aggregate caches. If you have enabled incremental aggregation, the
Integration Service creates a backup of the files each time you run the session. The cache directory
must contain enough disk space for two sets of the files.

Property: Tracing Level
Description: Amount of detail displayed in the session log for this transformation.

Property: Sorted Input
Description: Indicates input data is presorted by groups. Select this option only if the mapping passes
sorted data to the Aggregator transformation.

Property: Aggregator Data Cache Size
Description: Data cache size for the transformation. Default cache size is 2,000,000 bytes. If the total
configured session cache size is 2 GB (2,147,483,648 bytes) or greater, you must run the session on a
64-bit Integration Service. You can configure the Integration Service to determine the cache size at
run time, or you can configure a numeric value. If you configure the Integration Service to determine
the cache size, you can also configure a maximum amount of memory for the Integration Service to
allocate to the cache.

Property: Aggregator Index Cache Size
Description: Index cache size for the transformation. Default cache size is 1,000,000 bytes. If the total
configured session cache size is 2 GB (2,147,483,648 bytes) or greater, you must run the session on a
64-bit Integration Service. You can configure the Integration Service to determine the cache size at
run time, or you can configure a numeric value. If you configure the Integration Service to determine
the cache size, you can also configure a maximum amount of memory for the Integration Service to
allocate to the cache.

Property: Transformation Scope
Description: Specifies how the Integration Service applies the transformation logic to incoming data:
Transaction: Applies the transformation logic to all rows in a transaction. Choose Transaction
when a row of data depends on all rows in the same transaction, but does not depend on rows in
other transactions.
All Input: Applies the transformation logic to all incoming data. When you choose All Input,
PowerCenter drops incoming transaction boundaries. Choose All Input when a row of data depends
on all rows in the source.

Simple Example of Aggregator Transformation:


Problem Statement: Create a mapping that populates the minimum, maximum, average, and sum of salary,
along with the employee count, for each department from the employee data, with the help of an
Aggregator transformation.

Solution:

Create a new table TARGET.Employees_Aggregate:


Create Table TARGET.EMPLOYEES_Aggregate (
DEPARTMENT_ID DECIMAL(4,0),
EMP_COUNT DECIMAL(4,0),
MIN_SALARY DECIMAL(10,0),
MAX_SALARY DECIMAL(10,0),
AVG_SALARY DECIMAL(10,0),
SUM_SALARY DECIMAL(10,0)
)
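
For orientation, the aggregation this mapping will perform is equivalent to the following SQL (assuming the standard HR.EMPLOYEES schema):

SELECT DEPARTMENT_ID,
       COUNT(SALARY) AS EMP_COUNT,
       MIN(SALARY)   AS MIN_SALARY,
       MAX(SALARY)   AS MAX_SALARY,
       AVG(SALARY)   AS AVG_SALARY,
       SUM(SALARY)   AS SUM_SALARY
FROM HR.EMPLOYEES
GROUP BY DEPARTMENT_ID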
Create a new mapping m_Employees_Aggregator: go to the toolbar -> click Mappings -> Create.
Drag the Source (HR.Employees) and Target (TARGET.Employees_Aggregate) into the mapping.
Add an Aggregator transformation: go to the toolbar -> click Transformation -> Create, then select
the Aggregator transformation.

You can also select the transformation by clicking its button on the Informatica Designer toolbar.

Enter the name aggr_emp_Salary and click Done.

Drag the input ports EMPLOYEE_ID, SALARY, and DEPARTMENT_ID from SQ_EMPLOYEES (Source
Qualifier) to aggr_emp_Salary.
Also add the additional output ports below (by clicking the Create Port button):
o COUNT
o MIN_SAL
o MAX_SAL
o AVG_SAL
o SUM_SAL

Check the Group By option for the DEPARTMENT_ID port:

Edit the expression for AVG_SAL (by clicking the expression editor) and add the expression below:
AVG(SALARY)

Similarly, add the expressions below for the other ports as well:

COUNT = COUNT(SALARY)
MIN_SAL = MIN(SALARY)
MAX_SAL = MAX(SALARY)
SUM_SAL = SUM(SALARY)
Click the Transformation tab and configure the transformation properties (change the Tracing Level
as per your need).

To enhance the performance of the Aggregator, it is recommended to provide sorted data to it,
either via the SQ query or by adding a Sorter transformation before it (see the SQL sketch after
these steps).
If sorted input data is coming to the Aggregator, check the Sorted Input option under the
Properties tab.
Now link all required ports from aggr_emp_Salary to the Employees_Aggregate target definition.
Click on Mappings (from the toolbar) -> then Validate (to validate the mapping).
Now save the mapping (by clicking Repository -> Save, or by pressing Ctrl+S).
Generate the workflow and run it.
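
As mentioned in the sorted-input step above, one way to presort the data is a SQL override in the Source Qualifier. A minimal sketch, ordering on the group-by port:

SELECT EMPLOYEE_ID, SALARY, DEPARTMENT_ID
FROM HR.EMPLOYEES
ORDER BY DEPARTMENT_ID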

Overall Mapping:

Use of Aggregator to remove duplicates:


The Aggregator transformation can be used to eliminate duplicates coming from the source qualifier or any
other upstream transformation. The example we'll be taking a look at is the Union transformation
example, in which one of the source tables contains duplicate rows. We'll simply insert an Aggregator
transformation after the Union transformation and select all the columns as the group by columns. This
can be achieved by the following SQL query as well:
SELECT
    A.EMPLOYEE_ID, A.FIRST_NAME, A.LAST_NAME, A.EMAIL, A.PHONE_NUMBER,
    A.HIRE_DATE, A.JOB_ID, A.SALARY, A.COMMISSION_PCT, A.MANAGER_ID,
    A.DEPARTMENT_ID
FROM
    (SELECT
        EMPLOYEE_ID, FIRST_NAME, LAST_NAME, EMAIL, PHONE_NUMBER,
        HIRE_DATE, JOB_ID, SALARY, COMMISSION_PCT, MANAGER_ID,
        DEPARTMENT_ID
    FROM
        HR.EMPLOYEES_1
    UNION ALL
    SELECT
        EMPLOYEE_ID, FIRST_NAME, LAST_NAME, EMAIL, PHONE_NUMBER,
        HIRE_DATE, JOB_ID, SALARY, COMMISSION_PCT, MANAGER_ID,
        DEPARTMENT_ID
    FROM
        HR.EMPLOYEES_2) A
GROUP BY
    A.EMPLOYEE_ID, A.FIRST_NAME, A.LAST_NAME, A.EMAIL, A.PHONE_NUMBER,
    A.HIRE_DATE, A.JOB_ID, A.SALARY, A.COMMISSION_PCT, A.MANAGER_ID,
    A.DEPARTMENT_ID

Problem Statement: Remove the duplicate rows coming from the union transformation before
loading into the target table.

Solution:

Use the mapping that we've previously created for the Union transformation example:

Disconnect the links from the Union transformation to the target table and insert an Aggregator
transformation between them. Link the output ports of the Union transformation to the
Aggregator transformation:

In the Ports tab of the Aggregator transformation, select all the columns as Group By columns:

Click OK.
Drag all the ports from aggr_Employees to the target table Employees:

Click on Mappings (from the toolbar) -> then Validate (to validate the mapping).
Now save the mapping (by clicking Repository -> Save, or by pressing Ctrl+S).
Generate the workflow and run it.
