Вы находитесь на странице: 1из 4

10/10/2014 IBM Knowledge Center

http://www-01.ibm.com/support/knowledgecenter/SSZJPZ_9.1.0/com.ibm.swg.im.iis.ds.parjob.dev.doc/topics/limitationsstagetypes.html 1/4
IBM InfoSphere DataStage, Version 9.1.2
Processing stage types
Database native SQL does not provide some of the functionality provided by certain
stage types that you can use in job designs.
Transformer stage
InfoSphere DataStage Balanced Optimization optimizes the functionality
contained in a Transformer stage. It does not attempt to optimize functionality
contained in a BASIC Transformer stage.
SQL does not support static variables. Therefore any Transformer stage that contains
a stage variable whose derivation either directly or indirectly refers to itself cannot be
optimized. Such logic is often used to implement functionality such as accumulators
and control break logic. To be optimized, this logic must be expressed in other ways,
for example, by using aggregation.
SQL does not support temporary variables. There is no way for an expression that
produces one select list element to refer to any other select list element. Therefore,
any Transformer stage that contains an output column derivation that refers to
another output column cannot be optimized. To enable optimization, compute
common subexpressions as stage variables, and then use them in multiple-output
column derivations.
The use of static variables or temporary variables is detected when you attempt to
optimize a job that contains them. The InfoSphere Balanced Optimization tool
provides detailed log information about why the job could not be optimized. It might
be possible to reorganize the computations to allow optimization of that job.
By default, InfoSphere DataStage supports intermediate results with a precision/scale
of decimal(38,10) (up to 38 total decimal digits, of which up to 10 can be fractional
digits) during complex arithmetic calculations. SQL databases individually support
different maximums. For example, Teradata releases earlier than V6.2 support a
maximum precision of 18 decimal digits. So complex calculations with decimal data
types, when pushed into database SQL, can exhibit rounding errors due to lower
precision/scale. To ensure comparable results, you can set the InfoSphere
DataStage environment variables APT_DECIMAL_INTERM_PRECISION,
APT_DECIMAL_INTERM_SCALE, and APT_DECIMAL_INTERM_ROUNDMODE to
control the precision, scale, and rounding of intermediate results.
NULL and 0 values are often equivalent in InfoSphere DataStage. In a root job, the
logical expression Link.Column = 0 might evaluate to true if Link.Column has a NULL
value. SQL has no such equivalence. If the intent is to test for NULL, then use the
function IsNull(Link.Column). The equivalence of 0 and NULL can cause a problem
where the Transformer stage consumes the output of any stage that can produce
NULL values (for example, a Join stage performing an outer join).
Filter stage
10/10/2014 IBM Knowledge Center
http://www-01.ibm.com/support/knowledgecenter/SSZJPZ_9.1.0/com.ibm.swg.im.iis.ds.parjob.dev.doc/topics/limitationsstagetypes.html 2/4
The InfoSphere DataStage Balanced Optimization tool optimizes the functionality
contained in the Filter stage. When the Filter stage is optimized, all the predicates
specified in the Filter stage get combined with SQL statements in the connector.
The InfoSphere Balanced Optimization tool does not support the following Filter
stage functions:
Nulls value (when set to Greater than)
Output rows only once (when set to True)
Aggregator stage
The InfoSphere DataStage Balanced Optimization tool supports the calculation
aggregation type. It does not support the count or recalculation aggregation types.
The InfoSphere Balanced Optimization tool does not support the following
aggregation functions:
Corrected sum of squares
Missing value
Preserve Type
Summary
For certain statistical functions, such as standard deviation and variance, there might
be small differences in calculated answers between the root job and an optimized job
where aggregation is pushed into the database. This is because of differences in
parallel calculation algorithms, and the precision of intermediate results in InfoSphere
DataStage and the database. These differences are typically beyond the fourth
significant digit and are insignificant for practical purposes. Values for most other
calculations (count, sum, mean) are unaffected.
Remove duplicates stage
The InfoSphere DataStage Balanced Optimization tool optimizes the functionality
contained in the Remove Duplicates stage. Key columns defined in the Remove
Duplicates stage are used to group rows with the same values in non-key columns.
For the Duplicate to retain option, both the First (keep first row in each group) and
Last (keep last row in each group) values are supported. The Remove Duplicates
stage does not support Case Sensitive for key column option = FALSE.
Additionally, the Remove Duplicates stage can only be pushed to a source
connector.
Join stage
The InfoSphere DataStage Balanced Optimization tool supports joins with two inputs.
By default, the Join stage sorts data on both input links. This sorting results in output
data that is sorted on the join keys. If the InfoSphere DataStage project environment
variable APT_NO_SORT_INSERTION is set, the sort is suppressed. Pushing a Join
stage into a database source always generates an ORDER BY clause for the sort
keys, as if APT_NO_SORT_INSERTION was not set.
10/10/2014 IBM Knowledge Center
http://www-01.ibm.com/support/knowledgecenter/SSZJPZ_9.1.0/com.ibm.swg.im.iis.ds.parjob.dev.doc/topics/limitationsstagetypes.html 3/4
InfoSphere DataStage performs implicit type conversions to make join keys of
different types comparable. SQL, however, is much stricter and such implicit
conversions can lead to runtime database errors. Therefore Join key columns within
different type groups (for examples, numbers and strings) cannot be optimized.
Lookup stage
The InfoSphere DataStage Balanced Optimization tool supports single lookups with
two input links.
The following lookup conditions are supported:
Lookup key equality (case sensitive for character-type columns)
Multi-key lookups
The following lookup conditions are not supported:
Range lookups
Case-insensitive key equality lookups on character-type columns.
Only continue and drop actions are supported for lookup failure and condition-not-
met. Both lookup failure and condition-not-met must be set to the same actions. In
any other case, a warning is given in the optimization log and optimization of the
lookup stage is not performed.
InfoSphere DataStage performs implicit type conversions to make join keys of
different types comparable. SQL, however, is much stricter and such implicit
conversions can lead to runtime database errors. Therefore Lookup key columns
within different type groups (for examples, numbers and strings) cannot be optimized.
By default, for a normal (non-sparse) lookup, InfoSphere DataStage uses only one
row matching on the lookup key or keys from a lookup table, and ignores any other
lookup table rows that match. This behavior is not supported by SQL joins. However,
since the typical InfoSphere DataStage usage is for lookup tables to have unique
rows, Balanced Optimization issues a warning to remind you to ensure that the
lookup table has unique rows. To eliminate this warning, set the Multiple rows
returned from link property on the Lookup stage to the name of the reference link,
or set the Lookup type property of the database connector supplying the reference
table to sparse. If the lookup table does not have unique rows on the lookup keys,
and the intended behavior is to use only one matching row, do not optimize this
lookup.
Copy stage
The InfoSphere DataStage Balanced Optimization tool optimizes the functionality
contained in the Copy stage. The Copy stage is not optimized if the Force option is
set to True. When the Copy stage is optimized, it is removed from the job.
Sort stage
The InfoSphere DataStage Balanced Optimization tool optimizes the functionality
contained in the Sort stage. When the Sort stage is optimized, an equivalent SQL
10/10/2014 IBM Knowledge Center
http://www-01.ibm.com/support/knowledgecenter/SSZJPZ_9.1.0/com.ibm.swg.im.iis.ds.parjob.dev.doc/topics/limitationsstagetypes.html 4/4
ORDER BY clause is generated.
The Sort stage is not optimized if any of these option values are set:
Output Statistics = True
Create Key Change Column = True
Create Cluster Key Change Column = True
Sort as EBCDIC = True
Nulls Position = Last
Stable Sort = True
Containers
The InfoSphere DataStage Balanced Optimization tool does not support containers.
Optimization stops at any container boundaries.
For a job with a local container, use the local container Deconstruct option to make
the container contents part of the job, and so eligible for optimization. Select the local
container stage in the job, right-click, and select Deconstruct.
For a job with a shared container, use the Convert to local container option, and
then deconstruct the local container. Select the shared container stage in the job,
right-click, and select Convert to local container.
Stages with multiple output links or reject links
The Transformer, Filter, and Copy processing stages can have multiple output links
including reject links. Reject link optimization is not widely supported, except in the
Filter stage. A reject link in the Filter stage works like an output link, except that it is
associated with an Otherwise predicate. For a Transformer stage or Copy stage
without a reject link or a Filter stage, multiple output links can be optimized. In the
optimization process, each of these stages with multiple output links is first split into
duplicate copies, one for each output link. In this expanded intermediate job, each
stage has only one output link, and is optimized like any other job that contains
stages with one output link.
Parent topic: Job design considerations
Last updated: 2014-01-21
PDF version of this information: IBM InfoSphere DataStage and QualityStage Parallel Job
Developer's Guide