
Data types and DML

Data Manipulation Language (DML) provides a rich set of data types, including base and compound
types as well as user-defined types.

DML is used in Ab Initio to define the complete record structure. While data is read, written, or
processed, it is treated according to the defined record structure. A record format can be defined
either in grid mode or in text mode. A DML definition can be stored in a file, which can then be
referenced multiple times, or it can be embedded. A graph uses DML to define record formats,
expressions, transform functions, and key specifiers. A record format describes how data should be
interpreted. An accurate description of the record structure is a prerequisite for accessing
particular fields.
Fixed-length fields can be processed more efficiently than delimited fields because their size is
known without searching for the delimiter.

Example:

$ cat emp.dml
record
  decimal(",") empno;
  string(",") ename;
  decimal(",") deptno;
  date("DDMMYYYY")(",") hire_date;
  decimal(",") sal = NULL (10000); // Null allowed and default value 10000
  decimal("\n") mgr;
end;
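To make the record format above more concrete, here is a minimal Python sketch (not Ab Initio code) that splits one line of data into the six fields declared in emp.dml. The sample line, the `parse_emp` helper, the use of integers for the decimal fields, and the reading of an empty sal field as NULL are all assumptions made for illustration.

```python
from datetime import date, datetime

def parse_emp(line):
    """Split one comma-delimited, newline-terminated line into emp fields."""
    # The last field (mgr) is terminated by "\n", the others by ",".
    empno, ename, deptno, hire_date, sal, mgr = line.rstrip("\n").split(",")
    return {
        "empno": int(empno),
        "ename": ename,
        "deptno": int(deptno),
        # date("DDMMYYYY") is read as day, month, four-digit year.
        "hire_date": datetime.strptime(hire_date, "%d%m%Y").date(),
        # Assumption: an empty sal field stands for NULL.
        "sal": int(sal) if sal else None,
        "mgr": int(mgr),
    }

# Hypothetical sample line matching the layout above.
record = parse_emp("7369,SMITH,20,17121980,800,7902\n")
```

Each field is found by searching for its delimiter, which is the extra work that fixed-length fields avoid.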

Input file, output file, intermediate file and lookup file


Input File:
It reads data records from a serial file or a multifile in the file system.
An input file can provide data to multiple components. Give it an appropriate label so that the
file can be identified uniquely. If the same label is used more than once, the system appends a
count (1, 2, and so on) to make it unique.
The data location specifies where the file resides. It can be given as an absolute path or
through a parameter, as shown in the diagram below.

Output File:
It writes the data records to a serial file or a multifile in the file system.
Output file does not provide data to other components in the graph.

Intermediate file:
It is used to write data records to a file in the middle of the graph. This helps with debugging
and allows further processing of the intermediate file. (In debugger mode, one can add a watch and
create intermediate data for viewing without creating a new file or changing the graph.) An
intermediate file can receive input from only one source and can provide data to multiple components.

Lookup file:
It represents one or more serial files or a multifile of data records small enough to be held in
main memory, letting a transform function retrieve records much more quickly than it could if they
were stored on disk. A lookup file is not connected to other components in the graph. It provides
much of the same functionality as a two-way join, except that it is marginally faster than an
in-memory join.
A lookup cannot be used for an outer join.
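The idea of holding a small data set in memory and retrieving records by key can be sketched in Python with a dictionary. This is an illustration of the concept only, not Ab Initio code; the department rows and the `add_dept_name` helper are hypothetical.

```python
# Hypothetical department records, small enough to keep in memory.
dept_rows = [
    {"deptno": 10, "dname": "ACCOUNTING"},
    {"deptno": 20, "dname": "RESEARCH"},
]

# Build the in-memory index once, keyed on the lookup key.
dept_lookup = {row["deptno"]: row for row in dept_rows}

def add_dept_name(emp):
    """Return a copy of emp with dname filled in from the lookup."""
    match = dept_lookup.get(emp["deptno"])
    # A key with no match yields None instead of a department name.
    return {**emp, "dname": match["dname"] if match else None}

enriched = add_dept_name({"empno": 7369, "deptno": 20})
```

The index is built once and then consulted per record from inside the transform logic, which is why nothing needs to be wired to it as a separate flow.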

Filter by Expression, Replicate, Reformat and Redefine


Filter by Expression:
It enables the user to track down a particular record or set of records, or to put together a sample
of records to assist with analysis. It filters the data based on an expression that identifies only
the records that you need. In the Filter by Expression (FBE) component, on the Parameters tab, click
the select_expr parameter to open the Expression Editor, where one can build an expression using
input fields from the input table or file together with built-in functions and operators. The
Filter by Expression component can also be used for data validation.

If select_expr evaluates to a non-zero result for a record, that record is selected;
otherwise the result is zero and the record goes to the deselect port.
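The select/deselect behaviour described above can be sketched in Python as follows. This is only an illustration of the routing logic, not Ab Initio code; the `filter_by_expression` function, the sample records, and the salary threshold are hypothetical.

```python
def filter_by_expression(records, select_expr):
    """Route each record to the selected or deselected list."""
    selected, deselected = [], []
    for rec in records:
        # A non-zero (truthy) result selects the record.
        if select_expr(rec):
            selected.append(rec)
        else:
            deselected.append(rec)
    return selected, deselected

# Hypothetical employee records.
emps = [
    {"empno": 1, "sal": 800},
    {"empno": 2, "sal": 3000},
    {"empno": 3, "sal": 1500},
]
selected, deselected = filter_by_expression(emps, lambda r: r["sal"] > 1000)
```

Used for validation, the same pattern sends records that pass the check one way and the failures the other, so that neither set is lost.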

Replicate:
It is used when the user wants to make multiple copies of a flow for separate processing.

Reformat:
It changes the record format of data records by dropping fields, or by using DML expressions to add
fields, combine fields, or transform the data in the records. It manipulates one record at a time and
performs work such as validation and cleansing, e.g. deleting bad values, setting default values,
standardizing field formats, or rejecting records with invalid dates. A common use of the Reformat
component is to clean input data so that all of the records conform to the same convention.
Transformation rules are defined in the transform (0) parameter of the Reformat component. Select the
parameter and click New or Edit. The business rules that connect fields in the input records to
fields in the output record are defined in the transform parameter.
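As a rough illustration of what a record-at-a-time transform does, the Python sketch below drops one field, standardizes another, and adds a derived one. It is not Ab Initio code; the field names, the `reformat` function, and the annual salary rule are hypothetical.

```python
def reformat(rec):
    """Map one input record to one output record."""
    return {
        # Carried over unchanged.
        "empno": rec["empno"],
        # Standardized format: trimmed and upper-cased.
        "ename": rec["ename"].strip().upper(),
        # New field derived from an input field.
        "annual_sal": rec["sal"] * 12,
        # deptno is not listed, so it is dropped from the output.
    }

out = reformat({"empno": 7369, "ename": " smith ", "deptno": 20, "sal": 800})
```

Each output field has its own rule that refers only to fields of the current input record, which mirrors how business rules connect input fields to output fields.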

Multiple rules can be assigned to a single output field. For any given record, the rule that is
applied is determined by the condition in each rule; the condition can be explicit or implicit. Such
rules are called prioritized rules. Priorities are assigned to the rules in the order in which they
are attached to the output field. The last rule assigned to the output field always has blank
(lowest) priority. As shown below, each rule has been assigned a unique priority in the order in
which it was defined, and the last one has the lowest, i.e. blank, priority.

Priorities before the last are numbered from 1, and each rule must have a distinct priority. To set
a priority, right-click the business rule and set the value. The priorities of the rules do not have
to be consecutive. The transform attempts to evaluate the rules in decreasing order of priority, that
is, starting with priority 1 and ending with the blank-priority rule. If a rule fails to produce a
value because its result is undefined (also known as a null value), the transform moves on to the
next rule. If none of the rules (including any default) gives a valid assignment for the field,
the component produces an error message and the input record is rejected. (Certain expressions in a
rule can cause invalid results, leading to errors that cause the record to be rejected irrespective
of the priorities, e.g. attempting to assign fields with the wrong length or incompatible types,
dividing by zero, or making comparisons involving null or invalid values. This can be avoided by
validating data before using it.)
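The fall-through behaviour of prioritized rules can be sketched in Python as below. This is an illustration under stated assumptions, not Ab Initio code: the `apply_prioritized_rules` function, the rule list, and the field names are hypothetical, a blank priority is modelled as `None`, and a null result is modelled as Python `None`.

```python
def apply_prioritized_rules(rec, rules):
    """Try rules from priority 1 upward, blank (None) priority last."""
    # Numbered priorities sort ascending; the blank priority sorts after them.
    ordered = sorted(rules, key=lambda r: (r[0] is None, r[0] or 0))
    for _priority, rule in ordered:
        value = rule(rec)
        if value is not None:
            # First rule that produces a value wins.
            return value
    # No rule produced a value: the record would be rejected.
    raise ValueError("no rule produced a value; record rejected")

# Hypothetical rules for one output field; priorities need not be consecutive.
sal_rules = [
    (5, lambda r: r.get("base_pay")),
    (1, lambda r: r.get("sal")),
    (None, lambda r: 10000),  # blank priority acts as the default
]

from_sal = apply_prioritized_rules({"sal": 800, "base_pay": 500}, sal_rules)
from_base = apply_prioritized_rules({"sal": None, "base_pay": 500}, sal_rules)
from_default = apply_prioritized_rules({}, sal_rules)
```

Without the blank-priority default, a record for which every rule yields null raises the error, which corresponds to the rejection case described above.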
