Sequential File

The Sequential File stage is a file stage that allows you to read data from or write
data to one or more flat files.
The stage can have a single input link or a single output link, and a single rejects
link.
The stage executes in parallel mode if reading multiple files but executes
sequentially if it is only reading one file. By default a complete file will be read by a
single node (although each node might read more than one file). For fixed-width
files, however, you can configure the stage to behave differently:
You can specify that single files can be read by multiple nodes. This can improve
performance on cluster systems.
You can specify that a number of readers run on a single node. This means, for
example, that a single file can be partitioned as it is read (even though the stage is
constrained to running sequentially on the conductor node).

Read from multiple nodes


This is an optional property and only applies to files containing fixed-length records;
it is mutually exclusive with the Number of Readers Per Node property. Set this to
Yes to allow individual files to be read by several nodes. This can improve
performance on a cluster system.
InfoSphere DataStage knows the number of nodes available and, using the fixed-length
record size and the actual size of the file to be read, allocates the reader on
each node a separate region within the file to process. The regions will be of roughly
equal size.
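
The allocation described above can be illustrated with a minimal Python sketch. It is not DataStage's actual algorithm, which is not given here. It only assumes that regions must start and end on record boundaries and should differ by at most one record. The function name allocate_regions and the sample sizes are hypothetical.

```python
def allocate_regions(file_size, record_length, node_count):
    """Split a fixed-length-record file into one (offset, length) region per node.

    Assumption: regions are aligned to record boundaries and any leftover
    records are handed out one each to the first few nodes.
    """
    if file_size % record_length != 0:
        raise ValueError("file size is not a whole number of records")
    total_records = file_size // record_length
    base, extra = divmod(total_records, node_count)

    regions = []
    start_record = 0
    for node in range(node_count):
        # The first `extra` nodes take one additional record each.
        count = base + (1 if node < extra else 0)
        regions.append((start_record * record_length, count * record_length))
        start_record += count
    return regions


# Hypothetical example: a 1000-byte file of 10-byte records on 3 nodes.
regions = allocate_regions(file_size=1000, record_length=10, node_count=3)
for node, (offset, length) in enumerate(regions):
    print(f"node {node}: offset={offset} length={length}")
```

Because every region is a whole number of records, no reader ever starts in the middle of a record, which is why the property is limited to fixed-length files.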

Number of readers per node


This is an optional property and only applies to files containing fixed-length records;
it is mutually exclusive with the Read from multiple nodes property. Specifies the
number of instances of the file read operator on a processing node. The default is
one operator per node per input data file. If numReaders is greater than one, each
instance of the file read operator reads a contiguous range of records from the input
file. The starting record location in the file for each operator, or seek location, is
determined by the data file size, the record length, and the number of instances of
the operator, as specified by numReaders.
The resulting data set contains one partition per instance of the file read operator,
as determined by numReaders.
This provides a way of partitioning the data contained in a single file. Each node
reads a single file, but the file can be divided according to the number of readers
per node, and written to separate partitions. This method can result in better I/O
performance on an SMP system.
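
As a rough illustration of the seek-location idea, the Python sketch below has several readers each seek into the same fixed-length file and read a contiguous range of records, giving one partition per reader. It is a conceptual model only and not the DataStage file read operator. The function read_partition, the 8-byte record layout, and the temporary file are assumptions made for the example, and the readers run one after another here rather than concurrently.

```python
import os
import tempfile


def read_partition(path, record_length, num_readers, reader_index):
    """Return the contiguous range of records that one reader is responsible for."""
    total_records = os.path.getsize(path) // record_length
    # The starting record is derived from file size, record length and reader count.
    first = reader_index * total_records // num_readers
    last = (reader_index + 1) * total_records // num_readers
    with open(path, "rb") as f:
        f.seek(first * record_length)  # the reader's seek location
        data = f.read((last - first) * record_length)
    return [data[i:i + record_length] for i in range(0, len(data), record_length)]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "fixed.dat")
    with open(path, "wb") as f:
        for n in range(10):
            f.write(b"REC%05d" % n)  # hypothetical 8-byte fixed-length record

    # Four readers on one node yield four partitions from a single file.
    partitions = [read_partition(path, 8, 4, r) for r in range(4)]

for index, part in enumerate(partitions):
    print(index, part)
```

Concatenating the partitions in reader order reproduces the original record sequence, so no record is skipped or read twice.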

RCP:
RCP stands for runtime column propagation. Its purpose is to spare the developer
from explicitly naming every column in the design. A column needs to be named only
when the design refers to it (for example, because it is used in a transformation);
with RCP enabled, the data in all other columns still propagates automatically to
the stage's output link.
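
The effect of RCP can be pictured with a small Python model. It is a conceptual sketch, not DataStage code or any DataStage API. The function run_stage, the derivations mapping, and the sample row are all hypothetical.

```python
def run_stage(row, derivations, rcp_enabled):
    """Apply the named derivations; optionally pass unnamed columns through.

    `derivations` maps an output column name to a function of the input row.
    Only these columns are named in the (hypothetical) stage design.
    """
    output = {col: fn(row) for col, fn in derivations.items()}
    if rcp_enabled:
        # Columns the design never mentions are carried to the output as-is.
        for col, value in row.items():
            output.setdefault(col, value)
    return output


row = {"id": 7, "name": "ada", "city": "London"}
derivations = {"name": lambda r: r["name"].upper()}  # the only named column

with_rcp = run_stage(row, derivations, rcp_enabled=True)
without_rcp = run_stage(row, derivations, rcp_enabled=False)
print(with_rcp)
print(without_rcp)
```

With propagation switched on, id and city reach the output even though the design names only name. With it switched off, only the named column survives.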