
DataStage/390 Sample Job

Welcome to DataStage/390! This sample job introduces you to the features and functionality of DataStage's mainframe edition. It extracts data from two complex flat files, performs a series of processing steps, and writes the data to a delimited flat file. The processing steps include stages to transform data, perform a table lookup, join data from intermediate fixed-width flat files, and finally aggregate data before loading the target data warehouse. As a final step, the target file is prepared for transfer to a Unix system.

The sample job is divided into four parts:

Part 1 extracts data from a complex flat file called PRODUCT_MASTER, performs simple data transformations in a Transformer stage, performs a lookup to a relational table using a Lookup stage, and loads data to a fixed-width flat file called PRODUCT.

Part 2 extracts data from a complex flat file called SALES_ORDER_ENTRY, performs complex transformations in a Transformer stage, and loads data to a fixed-width flat file called ORDER_LINE_ITEMS.

Part 3 joins the data from PRODUCT and ORDER_LINE_ITEMS and loads a fixed-width flat file called PRODUCT_SALES_ANALYSIS.

Part 4 aggregates the data from PRODUCT_SALES_ANALYSIS and loads a delimited flat file called MONTHLY_PRODUCT_SALES. A final FTP stage collects the information needed to transfer the target file to the host machine.

This document takes you on a tour of the sample job, showing you each stage and how it is configured. You'll become familiar with the different stage types in mainframe jobs and their unique characteristics.

Part 1: Source and Target Stages for the Lookup


Let's start by opening the sample job in the DataStage Designer:

1. Choose File > Open Job. The Open Job dialog box appears.
2. Click the Existing tab, then double-click SUMSCT. The sample job appears in the Designer Job window.

Layout of the Sample Job


The upper left corner of the Designer window displays Part 1 of the sample job. It consists of a Complex Flat File stage labeled PRODUCT_MASTER, a Transformer stage labeled Prod_Mstr_Transform, a Lookup stage labeled Prod_Uom_Lookup, a Relational stage used for the lookup labeled UOM, and a Fixed-Width Flat File stage labeled PRODUCT.

Part 2 is in the lower left corner of the window. It consists of a Complex Flat File stage labeled SALES_ORDER_ENTRY, a Transformer stage labeled Order_Transform, and a Fixed-Width Flat File intermediate stage labeled ORDER_LINE_ITEMS.

DataStage 4.1

Page 1 of 11

03/30/2013

In the center of the window is Part 3, a Join stage labeled Join_Products_Orders. Part 4 is displayed on the right side of the window. Data from the Join stage flows into a Fixed-Width Flat File stage labeled PRODUCT_SALES_ANALYSIS. The data is output to an Aggregator stage labeled Sales_Aggregation, and then loaded into a Delimited Flat File stage called MONTHLY_PRODUCT_SALES. The last stage in the job design is an FTP stage labeled UNIX_FTP.

Viewing the Configuration of the Job Stages


You can see the basic structure of the job in the Job window. The next step is to view the configuration of each stage. The configuration information specifies the appropriate meta data for each stage and defines what data processing is to be performed.

Source Complex Flat File Stage


The PRODUCT_MASTER stage reads data from a complex flat file. It specifies:

The name of the file from which data is extracted
The DD name and access type of the file
The starting and ending rows
The definition of the data columns in the file
The columns to output from the stage
A constraint to filter the output data

1. Double-click the PRODUCT_MASTER Complex Flat File stage. The Complex Flat File Stage dialog box appears, displaying the stage General page. Notice that this page specifies the name of the file from which data is extracted, XDV4.PRODUCT.MASTER. It also specifies the DD name of the file, the access type, and the starting and ending rows.
2. Click the Columns tab. This page displays the column definitions of the data being read by the stage. The columns were loaded from the PRODMSTR.CFD table definition (see Sample Table Definitions for detailed definitions of the tables and columns used in the sample job). Right-click the LAST_UPDATE_DATE field and select Edit row... from the shortcut menu. The Edit Column Meta Data dialog box appears. Notice that the Date format field specifies a date format of MMDDCCYY. Click the Next button to display the meta data for the EFF_START_DATE field. Its date format is also MMDDCCYY.
3. Click the File view tab. This page displays the COBOL PICTURE clauses for the columns and the exact storage layout in the file.
4. Now click the Outputs tab. The Constraint page is displayed by default. The constraint specifies that records without a date in the EFF_END_DATE field are not to be output from the stage.
5. Click the Selection tab. Notice that a subset of columns appears in the Selected columns list.
6. Click OK to close the Complex Flat File Stage dialog box.
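As a rough illustration of what the date format and the constraint amount to, the following Python sketch parses an MMDDCCYY value and drops records that have no date in EFF_END_DATE. The record dictionaries, the function names, the sample values, and the treatment of blanks or all zeros as "no date" are assumptions made for this example; they are not taken from the sample job or from DataStage itself.

```python
from datetime import date, datetime
from typing import Optional


def parse_mmddccyy(text: str) -> Optional[date]:
    """Parse an 8-character MMDDCCYY string; return None when no date is present."""
    value = text.strip()
    # Assumption: blanks or all zeros mean "no date" in this sketch.
    if not value or set(value) == {"0"}:
        return None
    return datetime.strptime(value, "%m%d%Y").date()


def apply_constraint(records):
    """Keep only the records that carry a date in EFF_END_DATE."""
    return [r for r in records if parse_mmddccyy(r["EFF_END_DATE"]) is not None]


# Hypothetical sample rows, not data from the sample job.
rows = [
    {"PRODUCT_ID": "A001M0001", "EFF_END_DATE": "12312001"},
    {"PRODUCT_ID": "A001M0002", "EFF_END_DATE": "        "},
]
kept = apply_constraint(rows)
```

Only the first hypothetical row survives the filter, because the second has a blank EFF_END_DATE.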

Transformer Stage
Next, let's look at the Transformer stage. It specifies the transformations to be applied to the data before it is sent to the Lookup stage.

1. Double-click the Prod_Mstr_Transform Transformer stage. The Transformer Editor appears. The upper part of the Transformer Editor shows the columns on the input and output links, and the lower part displays the column meta data for each link.
2. Most of the output columns are derived from their corresponding input columns, as indicated by the relationship lines between them. Notice that the derivation for the EXTRACT_DATE output column is CURRENT_DATE. Double-click the Derivation cell to open the Expression Editor. Click Constants in the Item type list and notice that CURRENT_DATE is displayed in the Item properties list. This expression uses the constant to specify that EXTRACT_DATE is derived from the current date at the time of execution.
3. Click OK to close the Transformer Editor.

Relational (Reference Link) Stage


Before moving to the Lookup stage, let's examine the UOM Relational stage that serves as the reference link. UOM contains columns with unit-of-measure codes and associated descriptions.

1. Double-click the UOM Relational stage. The Relational Stage dialog box appears, with the Tables page displayed by default. The Selected tables list shows that XDV4.UOM is the table being read by the stage.
2. Click the Select tab. Notice that both columns in the UOM table are being output from the stage.
3. Click the Where, Group By, and Order By tabs to see whether any WHERE, GROUP BY, or ORDER BY clauses have been specified.
4. Click the SQL tab to view the SQL statement that has been generated.
5. Click OK to close the Relational Stage dialog box.

Lookup Stage
Now let's look at the Lookup stage, which is designed to match rows from the two input links based on unit-of-measure codes and return unit-of-measure descriptions.

1. Double-click Prod_Uom_Lookup to open the Lookup Stage dialog box. The General page is displayed by default. It indicates that a Singleton Lookup is to be performed using the Auto lookup technique. Skip Row is the action specified if the lookup fails.
2. Click the Inputs tab and select each of the two input links from the Input name field, noticing the column definitions displayed in the Columns grid.
3. Click the Outputs tab. The Lookup Condition page is displayed by default and contains the key expression for performing the lookup. The lookup will be performed when the UOMCD column from the reference link equals the UOM_CODE column from the primary link.
4. Click the Mapping tab. The left pane displays the columns from the reference and primary links. The right pane shows the output column derivations. Notice that the UOM_DESC column from the UOM Relational stage is mapped to the UNIT_OF_MEASURE output column. Most of the other output columns are derived from the primary link input columns.
5. Click OK to close the Lookup Stage dialog box.
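The lookup behavior described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the in-memory dictionary index, the function name, the sample rows, and the reading of "singleton" as "at most one reference row per key" are choices made for the example, and the code does not reflect how DataStage/390 implements the Auto technique.

```python
def singleton_lookup(primary_rows, reference_rows):
    """Match each primary row to at most one reference row on the UOM code."""
    # Index the reference link by its key. The first row per key wins,
    # which is this sketch's interpretation of a singleton lookup.
    index = {}
    for ref in reference_rows:
        index.setdefault(ref["UOMCD"], ref)

    output = []
    for row in primary_rows:
        match = index.get(row["UOM_CODE"])
        if match is None:
            continue  # Skip Row: drop the primary row when the lookup fails
        out = {k: v for k, v in row.items() if k != "UOM_CODE"}
        # UOM_DESC from the reference link feeds UNIT_OF_MEASURE on the output.
        out["UNIT_OF_MEASURE"] = match["UOM_DESC"]
        output.append(out)
    return output


# Hypothetical sample rows, not data from the sample job.
uom_rows = [
    {"UOMCD": "E", "UOM_DESC": "EACH"},
    {"UOMCD": "B", "UOM_DESC": "BOX"},
]
product_rows = [
    {"PRODUCT_ID": "A001M0001", "UOM_CODE": "E"},
    {"PRODUCT_ID": "A001M0002", "UOM_CODE": "Z"},
]
looked_up = singleton_lookup(product_rows, uom_rows)
```

The second hypothetical product has no matching unit-of-measure code, so it is skipped rather than written with an empty description.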


Target Fixed-Width Flat File Stage


The final stage in Part 1 is a Fixed-Width Flat File stage. It receives input from the Lookup stage and specifies how data should be written to the target file.

1. Double-click the PRODUCT Fixed-Width Flat File stage. The Fixed-Width Flat File Stage dialog box appears, with the General page displayed by default. Notice that the name of the file being written is XDV4.PRODUCT and the starting and ending rows are the first and last rows, respectively. This file does not already exist, so Create a new file is specified in the Write option field.
2. Click the Columns tab. The column definitions were loaded from the PRODTBL.CFD table definition (see Sample Table Definitions for detailed definitions of the tables and columns used in the sample job).
3. Click the Options tab. This tab is enabled when you choose to create a new file in the Write option field. It allows you to define the JCL parameters that are needed to create a new mainframe file, such as end-of-job handling and storage requirements.
4. Click the Inputs tab. The column definitions displayed here match those on the Stage page.
5. Click the Outputs tab. Notice that all columns are selected to be output from the stage and no constraint has been specified.
6. Click OK to close the Fixed-Width Flat File Stage dialog box.

You are finished with Part 1. In Part 2 you'll see how to perform more advanced transformations to move data from a complex flat file source to a simple flat file target. Now that you're familiar with the basics of the mainframe stage editors, instructions will be briefer from this point forward.

Part 2: Complex Transformations


Part 2 shows you how complex transformations are specified using the Expression Editor in a Transformer stage. First you'll take a quick look at the source and target stages, then you'll delve into the details of the Transformer stage.

Source Complex Flat File Stage


1. Open the SALES_ORDER_ENTRY Complex Flat File stage. It is similar to the PRODUCT_MASTER source stage. Review the specifications on the General page, then click Columns. Verify that the date formats for the BACK_ORDER_DATE and BACK_ORDER_SHIP_DATE fields are YYDDD.
2. Click Outputs and notice the constraint. Orders that have been cancelled or returned (where LINE_ITEM_STATUS equals X or R), and that have a LINE_ITEM_NO equal to 9, will not be output from the stage.

Target Fixed-Width Flat File Stage


1. Open the ORDER_LINE_ITEMS Fixed-Width Flat File stage. It is similar to the PRODUCT target stage. Data is being written to a new file named XDV4.ORDER.LINE.DAT. Click Columns to view the column definitions, which were loaded from the ORDITEMP.CFD table definition.
2. Click Outputs and notice that all columns are being passed through the stage and no constraint has been specified.

Transformer Stage
This stage defines the field mappings and transformations of data flowing from the SALES_ORDER_ENTRY source to the ORDER_LINE_ITEMS target.

1. Open the Order_Transform Transformer stage. Click the Show/Hide Stage Variables button on the Transformer Editor toolbar to display the stage variables. Notice the link lines joining the input columns with the stage variables, and the link lines along the right side of the table connecting the variables to the output columns that use them. Four stage variables have been defined:
   a) TempColorDesc defines field conversions to convert COLOR_CODE input values before they are moved to the COLOR_DESC output column. Double-click the Derivation cell to open the Expression Editor, and examine the IF-ELSE statements used to build the expression. When you are done, click OK.
   b) WxGrossDisc is a working storage variable that stores intermediate results in the calculation of the WxDiscAmt variable. Right-click and choose Stage Variable Properties from the shortcut menu to display the Transformer Stage Properties dialog box. Notice the properties for this and the other variables.
   c) WxReturnDisc is also working storage for calculating the WxDiscAmt variable. Open the Expression Editor and look at the expression used to define this variable.
   d) WxDiscAmt is working storage used in the arithmetic specifications that calculate the values of the DISC_AMT and LINE_ITEM_SALES_AMT output columns. Open the Expression Editor and notice that the expression is based on the WxGrossDisc and WxReturnDisc variables.
2. Now look at the Derivation cells for the COLOR_DESC, DISC_AMT, and LINE_ITEM_SALES_AMT output columns. Notice how the stage variables are used in the expressions.
3. Look at the ORDER_DATE column derivation, which is the concatenation of the ORDER_YY, ORDER_MM, and ORDER_DD input values.
4. Finally, look at the QUANTITY_SOLD column derivation, which subtracts RETURN_QUANTITY from QUANTITY_ORDERED input values.
5. Click OK to close the Transformer Editor.

Now that you are done with Part 2, let's move on to Part 3, where a Join stage combines data from the two input streams.
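To make the simpler derivations concrete, here is a minimal Python sketch of three of them: the COLOR_DESC conversion, the ORDER_DATE concatenation, and the QUANTITY_SOLD subtraction. The color table, its values, the "UNKNOWN" default, and the function name are hypothetical, since the actual TempColorDesc expression is not reproduced in this document. The discount calculations (WxGrossDisc, WxReturnDisc, WxDiscAmt) are left out because their formulas are not given here.

```python
# Hypothetical code-to-description table. The real conversions live in the
# TempColorDesc expression and are not reproduced in this document.
COLOR_DESCRIPTIONS = {"BK": "BLACK", "WH": "WHITE"}


def derive_output(row):
    """Compute three of the derivations described for Order_Transform."""
    return {
        # COLOR_DESC comes from a stage variable that converts COLOR_CODE.
        "COLOR_DESC": COLOR_DESCRIPTIONS.get(row["COLOR_CODE"], "UNKNOWN"),
        # ORDER_DATE is the concatenation of the three date parts.
        "ORDER_DATE": row["ORDER_YY"] + row["ORDER_MM"] + row["ORDER_DD"],
        # QUANTITY_SOLD is the quantity ordered minus the quantity returned.
        "QUANTITY_SOLD": row["QUANTITY_ORDERED"] - row["RETURN_QUANTITY"],
    }


# Hypothetical input row, not data from the sample job.
sample = {
    "COLOR_CODE": "BK",
    "ORDER_YY": "01",
    "ORDER_MM": "03",
    "ORDER_DD": "15",
    "QUANTITY_ORDERED": 12,
    "RETURN_QUANTITY": 2,
}
derived = derive_output(sample)
```

A dictionary lookup stands in for the chained conditional logic of the stage variable; both map each input code to one description and fall back to a default.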

Part 3: Joining Data from Two Inputs


This part of the job joins data from the PRODUCT and ORDER_LINE_ITEMS Fixed-Width Flat File stages that were created in Parts 1 and 2. The joined data is passed to another Fixed-Width Flat File stage named PRODUCT_SALES_ANALYSIS.


Join Stage
1. Double-click the Join_Products_Orders Join stage. On the General page, notice that the join type is an inner join, which returns only those rows that have matching values in both input tables. The join technique is AUTO, which means DataStage will choose the technique based on the information specified in the stage.
2. Click Inputs and look at the column definitions being passed from the two input links.
3. Click Outputs. The Join Condition page is displayed by default. The join will be performed where PRODUCT_ID of the ORDER_LINE_ITEMS input table equals PRODUCT_ID of the PRODUCT input table.
4. Click Mapping and examine the mappings between input columns and output columns.
5. Click OK to close the Join Stage dialog box.
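The inner join on PRODUCT_ID can be illustrated with a short Python sketch. It uses a simple hash join purely as an example technique; which technique DataStage selects under AUTO is not stated here, and the function name and sample rows are assumptions.

```python
from collections import defaultdict


def inner_join(order_rows, product_rows, key="PRODUCT_ID"):
    """Return one combined row per matching (order, product) pair."""
    # Index the product rows by the join key.
    index = defaultdict(list)
    for product in product_rows:
        index[product[key]].append(product)

    joined = []
    for order in order_rows:
        # Orders without a matching product yield nothing (inner join).
        for product in index.get(order[key], []):
            joined.append({**product, **order})
    return joined


# Hypothetical sample rows, not data from the sample job.
products = [
    {"PRODUCT_ID": "P1", "PROD_DESC": "WIDGET"},
    {"PRODUCT_ID": "P2", "PROD_DESC": "GADGET"},
]
orders = [
    {"PRODUCT_ID": "P1", "ORDER_NUMBER": "0000000001"},
    {"PRODUCT_ID": "P3", "ORDER_NUMBER": "0000000002"},
]
joined_rows = inner_join(orders, products)
```

Product P2 has no orders and order 0000000002 refers to an unknown product, so neither appears in the result.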

Target Fixed-Width Flat File Stage


1. Open the PRODUCT_SALES_ANALYSIS Fixed-Width Flat File stage. Data will be written to a new file named XDV4.PRODUCT.SALES.DAT.
2. Click Columns and review the column definitions for the stage, which were loaded from the PRODSALE.CFD table definition. Look at the date format specified for the BACK_ORDER_DATE and BACK_ORDER_SHIP columns: MM-DD-CCYY. How does this compare to the date format specified in the SALES_ORDER_ENTRY source stage?
3. Click Outputs and notice that all columns are being output by the stage, without any constraint.

That's all for Part 3. At this point in the job, you've seen how easy it is to define source and target stages, perform transformations, do a lookup, and join data using DataStage/390. The last step is to aggregate data and load the data warehouse.
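The difference between the two date formats can be shown with a small Python sketch that converts a YYDDD value (two-digit year plus day of year) into a ten-character MM-DD-CCYY string, the target format assumed here from the PIC X(10) column definitions. The function name is hypothetical, and the century is chosen by Python's own rule for two-digit years (69 to 99 map to 19xx, 00 to 68 map to 20xx), which may differ from the rule DataStage applies.

```python
from datetime import datetime


def yyddd_to_mm_dd_ccyy(value: int) -> str:
    """Convert a numeric YYDDD date into an MM-DD-CCYY string."""
    # Zero-pad to five digits so that year 01, day 032 reads as "01032".
    text = f"{value:05d}"
    # %y is the two-digit year, %j is the day of the year (001 to 366).
    parsed = datetime.strptime(text, "%y%j")
    return parsed.strftime("%m-%d-%Y")


converted = yyddd_to_mm_dd_ccyy(1032)    # day 32 of 2001
end_of_1999 = yyddd_to_mm_dd_ccyy(99365)  # day 365 of 1999
```

The first call yields 02-01-2001 and the second yields 12-31-1999, which shows why the target columns are wider than the five-digit source fields.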

Part 4: Aggregating Data and Loading the Data Warehouse


At last, we're ready to aggregate data and load the target delimited flat file. There is also a final post-processing stage that prepares the target file for transfer to the host system. Let's take a look at how these last few stages in the job are configured.

Aggregator Stage
The Aggregator stage groups data from the input link, performs aggregation functions, and outputs the data on a single output link.

1. Double-click the Sales_Aggregation stage. The Outputs page is active by default. Control break aggregation is selected, meaning that the input rows will not be sorted before aggregation occurs.
2. Click Aggregation and examine the settings. Where are first and last values returned? Which columns are summarized and which are averaged? Also notice that Group By is checked for the rest of the columns. Every output column from an Aggregator stage must be either aggregated or grouped by.
3. Click Mapping to look at the input-to-output column mappings. Input column names are appended with a tag indicating the aggregation function being performed. Output column derivations also display these tags.
4. Click OK to close the Aggregator Stage dialog box.
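Control break aggregation, as the term is commonly used, emits a result each time the grouping key changes and therefore relies on rows with equal keys arriving together. The Python sketch below illustrates that idea with itertools.groupby, which has the same property. The function name, the _SUM and _AVG suffixes, the choice of which columns to sum or average, and the sample rows are assumptions for illustration only.

```python
from itertools import groupby
from operator import itemgetter


def control_break_aggregate(rows, keys, sum_cols, avg_cols):
    """Aggregate consecutive rows that share the same grouping values."""
    key_of = itemgetter(*keys)
    results = []
    # groupby starts a new group whenever the key changes (the control break).
    # It does not sort, so equal keys must already be adjacent in the input.
    for _, group in groupby(rows, key=key_of):
        members = list(group)
        out = {k: members[0][k] for k in keys}
        for col in sum_cols:
            out[col + "_SUM"] = sum(r[col] for r in members)
        for col in avg_cols:
            out[col + "_AVG"] = sum(r[col] for r in members) / len(members)
        results.append(out)
    return results


# Hypothetical sample rows, already ordered by the grouping columns.
sales = [
    {"PRODUCT_ID": "P1", "ORDER_MM": "01", "QUANTITY_SOLD": 2, "UNIT_PRICE": 10.0},
    {"PRODUCT_ID": "P1", "ORDER_MM": "01", "QUANTITY_SOLD": 3, "UNIT_PRICE": 20.0},
    {"PRODUCT_ID": "P2", "ORDER_MM": "01", "QUANTITY_SOLD": 5, "UNIT_PRICE": 8.0},
]
summary = control_break_aggregate(
    sales,
    keys=["PRODUCT_ID", "ORDER_MM"],
    sum_cols=["QUANTITY_SOLD"],
    avg_cols=["UNIT_PRICE"],
)
```

If the same key appeared again later in unsorted input, this approach would produce a second output row for it instead of merging the two groups.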

Target Delimited Flat File Stage


Finally, let's take a look at the data warehouse created by this job.

1. Double-click the MONTHLY_PRODUCT_SALES Delimited Flat File stage. Look at the file name, write option, and delimiters specified on the General page.
2. Click Columns to review the column definitions, which were loaded from the MTHSALES.CFD table definition.
3. Click OK to close the Delimited Flat File Stage dialog box.

Post-Processing FTP Stage


FTP stages are used to transfer files to a host system. They collect the information required to generate the job control language (JCL) for performing the file transfer. They can have one or more input links, but no output links since they are always the last stage in a job.

1. Double-click the UNIX_FTP stage to open the FTP Stage dialog box. Host machine attributes are specified on the General page. Look at the available profiles listed in the Machine Profile drop-down box. (Machine profiles are defined in the DataStage Manager.) When you select a machine profile, the rest of the fields on the General page are automatically filled in. You can change the settings, but the changes will only be used in the current FTP stage and are not propagated back to the saved profile.
2. Click Inputs. Notice that the File name field is read-only and matches the name specified in the Delimited Flat File stage. The Destination file name field contains the name of the target file on the host system.
3. Click OK to close the FTP Stage dialog box.

Congratulations! You have finished reviewing Part 4.

Summary
This sample job introduced you to the capabilities of DataStage/390. It featured most of the source, target, and processing stage types that are available in mainframe jobs. You saw how to configure the individual stages and link them together in manageable steps, resulting in an effective design for building a data warehouse.

For more information about DataStage/390, refer to the DataStage/390 Job Developer's Guide and the DataStage/390 Tutorial.


Sample Table Definitions


The following table definitions are used in the DataStage/390 sample job.

PRODMSTR.CFD
01  PRODUCT-MASTER.
    05  PRODUCT-ID.
        10  PRODUCT-LINE        PIC X(04).
        10  PRODUCT-MODEL       PIC X(05).
    05  LAST-UPDATE-DATE        PIC X(08).
    05  EFF-START-DATE          PIC X(08).
    05  EFF-END-DATE            PIC X(08).
    05  ORDER-LEAD-TIME         PIC X(02).
    05  STOCK-INVENTORY         PIC X.
    05  UOM-CODE                PIC X.
    05  UNIT-PRICE              PIC S9(5)V99 COMP-3.
    05  WARRANTY-TYPE           PIC XX.
    05  WARRANTY-PERIOD         PIC S9(3) COMP-3.
    05  PRODUCT-DESC            PIC X(20).
    05  AVAILABLE-COLORS OCCURS 10 TIMES.
        10  COLOR-CODE          PIC X(04).
        10  COLOR-DESC          PIC X(15).
    05  PROD-DISCOUNTS OCCURS 5 TIMES.
        10  DISC-FROM-DATE      PIC X(08).
        10  DISC-END-DATE       PIC X(08).
        10  DISC-PCT            PIC SV9(3) COMP-3.
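Several columns in these table definitions are stored as COMP-3 (packed decimal), where each byte holds two decimal digits and the final half-byte holds the sign. The following Python sketch shows how such a field could be sized and decoded outside the mainframe. The function names and sample byte values are illustrative assumptions, and the sketch treats only the sign nibble 0xD as negative.

```python
from decimal import Decimal


def comp3_length(digits: int) -> int:
    """Bytes needed for a packed field with the given number of digits."""
    # Each digit takes a half-byte, plus one half-byte for the sign.
    return digits // 2 + 1


def unpack_comp3(raw: bytes, scale: int = 0) -> Decimal:
    """Decode a packed-decimal (COMP-3) field into a Decimal."""
    digits = []
    for byte in raw[:-1]:
        digits.append(byte >> 4)
        digits.append(byte & 0x0F)
    # The last byte carries one digit and the sign nibble.
    digits.append(raw[-1] >> 4)
    sign_nibble = raw[-1] & 0x0F
    value = int("".join(str(d) for d in digits))
    if sign_nibble == 0x0D:  # this sketch treats only 0xD as negative
        value = -value
    # scale is the number of digits after the implied decimal point (V).
    return Decimal(value).scaleb(-scale)


# UNIT-PRICE is PIC S9(5)V99 COMP-3: seven digits, two of them decimals.
unit_price = unpack_comp3(bytes.fromhex("1234567C"), scale=2)
credit = unpack_comp3(bytes.fromhex("0000150D"), scale=2)
```

Under these assumptions a seven-digit field such as UNIT-PRICE occupies four bytes, and a three-digit field such as WARRANTY-PERIOD occupies two.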

UOMTBLE.DFD
EXEC SQL DECLARE XDV4.UOM TABLE
    ( UOMCD     CHAR(1) NOT NULL,
      UOM_DESC  CHAR(5) NOT NULL
    ) END-EXEC.

PRODTBL.CFD
01  PRODUCT-TABLE.
    10  PRODUCT-ID              PIC X(09).
    10  EXTRACT-DATE            PIC X(08).
    10  EFF_START_DATE          PIC X(08).
    10  LAST_UPDATE_DATE        PIC X(08).
    10  PROD_DESC               PIC X(20).
    10  UNIT_OF_MEASURE         PIC X(05).
    10  WARRANTY_TYPE           PIC X(02).
    10  WARRANTY_PERIOD         PIC S9(3) COMP-3.


SALESORD.CFD
01  SALES-ORDER-INFO.
    05  ORDER-NUMBER            PIC X(10).
    05  LINE-ITEM-NO            PIC 9(05).
    05  ORDER-STATUS            PIC X.
    05  ORDER-DATE.
        10  ORDER-YY            PIC X(02).
        10  FILLER              PIC X.
        10  ORDER-MM            PIC X(02).
        10  FILLER              PIC X.
        10  ORDER-DD            PIC X(02).
    05  SHIPMENT-DATE.
        10  SHIPMENT-YY         PIC X(02).
        10  FILLER              PIC X.
        10  SHIPMENT-MM         PIC X(02).
        10  FILLER              PIC X.
        10  SHIPMENT-DD         PIC X(02).
    05  CUSTOMER-ID             PIC X(10).
    05  SALES-REP-ID            PIC X(08).
    05  ROUTE-CODE              PIC X(10).
    05  ORDER-TOTAL-AMT         PIC S9(7)V99 COMP-3.
    05  SHIPPING-CHARGE         PIC S9(3)V99 COMP-3.
    05  TAXES-PAID              PIC S9(5)V99 COMP-3.
    05  LINE-ITEM-STATUS        PIC X.
    05  PRODUCT-ID              PIC X(09).
    05  QUANTITY-ORDERED        PIC S9(3) COMP-3.
    05  UNIT-PRICE              PIC S9(5)V99 COMP-3.
    05  COLOR-CODE              PIC X(02).
    05  DISC-PCT                PIC SV9(3) COMP-3.
    05  LINE-ITEM-AMOUNT        PIC S9(7)V99 COMP-3.
    05  LINE-ITEM-TAX           PIC S9(3)V99 COMP-3.
    05  ITEM-ORDER-DATE.
        10  ITEM-ORDER-YY       PIC X(02).
        10  FILLER              PIC X.
        10  ITEM-ORDER-MM       PIC X(02).
        10  FILLER              PIC X.
        10  ITEM-ORDER-DD       PIC X(02).
    05  ITEM-SHIP-DATE          PIC X(08).
    05  QUANTITY-SHIPPED        PIC S9(03) COMP-3.
    05  RECEIVED-DATE           PIC X(08).
    05  BACK-ORDER-QUANTITY     PIC S9(3) COMP-3.
    05  BACK-ORDER-DATE         PIC S9(5) COMP-3.
    05  BACK-ORDER-SHIP-DATE    PIC S9(5) COMP-3.
    05  RETURN-DATE             PIC X(08).
    05  RETURN-QUANTITY         PIC S9(3) COMP-3.
    05  RETURN-REASON-CODE      PIC X(02).


ORDITEMP.CFD
01  SLS-ORD-ITEM-TEMP.
    05  PRODUCT-ID              PIC X(09).
    05  ORDER-YY                PIC X(02).
    05  ORDER-MM                PIC X(02).
    05  ORDER-DATE              PIC X(08).
    05  ORDER-NUMBER            PIC X(10).
    05  CUSTOMER-ID             PIC X(10).
    05  SALES-REP-ID            PIC X(08).
    05  COLOR-DESC              PIC X(10).
    05  ITEM-SHIP-DATE          PIC X(08).
    05  RECEIVED-DATE           PIC X(08).
    05  LINE-ITEM-STATUS        PIC X.
    05  QUANTITY-ORDERED        PIC S9(3) COMP-3.
    05  QUANTITY-SOLD           PIC S9(3) COMP-3.
    05  UNIT-PRICE              PIC S9(5)V99 COMP-3.
    05  DISC-AMT                PIC S9(3)V99 COMP-3.
    05  LINE-ITEM-ORDER-AMT     PIC S9(7)V99 COMP-3.
    05  LINE-ITEM-SALES-AMT     PIC S9(7)V99 COMP-3.
    05  BACK-ORDER-QUANTITY     PIC S9(03) COMP-3.
    05  BACK-ORDER-DATE         PIC S9(05) COMP-3.
    05  BACK-ORDER-SHIP-DATE    PIC S9(05) COMP-3.
    05  RETURN-DATE             PIC X(08).
    05  RETURN-REASON-CODE      PIC X(02).

PRODSALE.CFD
01  PRODUCT-SALES.
    10  PRODUCT-ID              PIC X(09).
    10  ORDER-YY                PIC X(2).
    10  ORDER-MM                PIC X(2).
    10  EXTRACT-DATE            PIC X(10).
    10  ORDER-STATUS            PIC X(1).
    10  ORDER-NUMBER            PIC X(10).
    10  CUSTOMER-ID             PIC X(10).
    10  SALES-REP-ID            PIC X(8).
    10  PROD-DESC               PIC X(20).
    10  COLOR-DESC              PIC X(10).
    10  ITEM-SHIP-DATE          PIC X(8).
    10  RECEIVED-DATE           PIC X(8).
    10  QUANTITY-ORDERED        PIC S9(3) COMP-3.
    10  QUANTITY-SOLD           PIC S9(3) COMP-3.
    10  UNIT-PRICE              PIC S9(5)V99 COMP-3.
    10  DISC-AMT                PIC S9(3)V99 COMP-3.
    10  ITEM-ORDER-AMT          PIC S9(7)V99 COMP-3.
    10  ITEM-SALES-AMT          PIC S9(5)V99 COMP-3.
    10  BACK-ORDER-QUANTITY     PIC S9(3) COMP-3.
    10  BACK-ORDER-DATE         PIC X(10).
    10  BACK-ORDER-SHIP         PIC X(10).
    10  RETURN-DATE             PIC X(8).
    10  RETURN-REASON-CODE      PIC X(2).


MTHSALES.CFD
01  MONTHLY-PRODUCT-SALES.
    10  PRODUCT-ID              PIC X(09).
    10  ORDER-YY                PIC X(2).
    10  ORDER-MM                PIC X(2).
    10  PROD-DESC               PIC X(20).
    10  AVG-QTY-ORDERED         PIC S9(3) COMP-3.
    10  AVG-QTY-SOLD            PIC S9(3) COMP-3.
    10  AVG-UNIT-PRICE          PIC S9(5)V99 COMP-3.
    10  AVG-DISC-AMT            PIC S9(3)V99 COMP-3.
    10  GROSS-ORDER_AMT         PIC S9(7)V99 COMP-3.
    10  ACT-SALES_AMT           PIC S9(7)V99 COMP-3.
    10  BACK-ORDER-QUANTITY     PIC S9(3) COMP-3.

