
Ver. 0.1

SCDAdvanced
A Talend custom component for slowly changing dimensions

last update 2014-05-08

Index
1 Introduction
2 Using guide
2.1 Base Parameters
2.2 Advanced Parameters
2.3 Job example
3 A look at the code
3.1 Class view

1 Introduction
This component manages a slowly changing dimension (SCD) table in a data warehouse environment, starting from the source table and the changes to its rows.

2 Using guide
2.1 Base Parameters
Descriptions of typical parameters (connections, schema, etc.) are omitted; please refer to the official documentation for these.
The following parameters are shared by the whole component:
Parameter name (Type): Description

Table (String)

Surrogate key rule (List of values): How to manage the value of the surrogate key column.
    DB Entity: use DB auto increment; no value is passed to the DB
    Table Max+1: get the max value from the DB, then increment it by 1
    Given: use the surrogate key value passed in the source flow

Versioning rule (List of values): How to calculate the start date and end date for versioning.


    Current timestamp: start date = current timestamp, end date = current timestamp - 1s
    Job start time: start date = job start timestamp, end date = job start timestamp - 1s
    Job start day: start date = job start day (time is 00:00:00), end date = day before the job start day (time is 23:59:59)
    1st day of month: start date = first day (time is 00:00:00) of the month of the job start day, end date = last day (time is 23:59:59) of the previous month
    Given value: use the string value passed in the Start date value text box (1) (may be a formula or a variable); end date is that value - 1s

Start date column (String): Name of the column that holds the start date used for versioning.

End date column (String): Name of the column that holds the end date used for versioning.

Default end date is null (Check): If true, new rows are inserted with a null end date. If false, the string value passed in the Specific value text box (1) is used (may be a formula or a variable).

First version specific rule (Check): If true, a specific rule is used for first-version rows, chosen from a list of values (see Versioning rule for the descriptions).

Versioning counter (Check): If true, a column is used to store the versioning counter. The column name is specified in the text box and must be in the output schema.

Flag active row (Check): If true, a column is used to store the active row flag. The column name is specified in the text box and must be in the output schema.

Source key include null (Check): If true, source key columns may be null and are handled correctly by the component.

Close target missing rows (Check): If true, the component looks for rows in the target that are missing from the source and closes them.
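
As a rough illustration of the Versioning rule values above, the following sketch computes the start date of a new version and the end date one second earlier. It assumes the end date is the one written to the row being closed. The names VersioningRule, VersionDates and VersioningSketch are hypothetical and are not taken from scdAdvanced.jar, and java.time is used only for brevity (the component itself is described as parsing dates to java.util.Date).

```java
import java.time.LocalDateTime;

// Hypothetical names for illustration only; not identifiers from scdAdvanced.jar.
enum VersioningRule { CURRENT_TIMESTAMP, JOB_START_TIME, JOB_START_DAY, FIRST_DAY_OF_MONTH, GIVEN_VALUE }

final class VersionDates {
    final LocalDateTime start;        // start date of the new version
    final LocalDateTime previousEnd;  // assumed: end date written to the version being closed

    VersionDates(LocalDateTime start, LocalDateTime previousEnd) {
        this.start = start;
        this.previousEnd = previousEnd;
    }
}

final class VersioningSketch {
    /**
     * @param now      current timestamp (used by CURRENT_TIMESTAMP)
     * @param jobStart timestamp at which the job started
     * @param given    already parsed "Start date value" (used by GIVEN_VALUE)
     */
    static VersionDates datesFor(VersioningRule rule, LocalDateTime now,
                                 LocalDateTime jobStart, LocalDateTime given) {
        LocalDateTime start;
        switch (rule) {
            case CURRENT_TIMESTAMP:
                start = now;
                break;
            case JOB_START_TIME:
                start = jobStart;
                break;
            case JOB_START_DAY:
                // 00:00:00 of the job start day
                start = jobStart.toLocalDate().atStartOfDay();
                break;
            case FIRST_DAY_OF_MONTH:
                // 00:00:00 of the first day of the job start month
                start = jobStart.toLocalDate().withDayOfMonth(1).atStartOfDay();
                break;
            case GIVEN_VALUE:
                start = given;
                break;
            default:
                throw new IllegalArgumentException("Unknown rule: " + rule);
        }
        // In every rule of the table the end date is the start date minus one second.
        return new VersionDates(start, start.minusSeconds(1));
    }
}
```

With a job started on 2014-05-08 at 10:30:15, the FIRST_DAY_OF_MONTH rule in this sketch yields a start of 2014-05-01 00:00:00 and an end of 2014-04-30 23:59:59, which matches the table entry.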

The following parameters are specific to each output schema column:

Parameter name (Type): Description

Surrogate key (Check): True for the surrogate key column. The Long type is mandatory.

Source key (Check): True for a source key column.

Change rule (List of values): Operation to be applied if the value has changed.


    Ignore column: this column is always ignored (no value is passed to the DB)
    Versioning: close the current row and insert a new one with the new value
    Last value only: update the column with the new value
    Keep previous value: store the current value in a previous value column (see the Previous value column name parameter), then update this column with the new value
    History correction: update all rows (by source key) with the new value
    Audit column: no rule is applied; the value from the source is stored if other columns need an insert or update (see Audit rule)

Previous value column name (String): Name of the column that stores the previous value when a change occurs.

Audit rule (List of values): Which operations use this value.
    All rows: the value is used with insert and update operations
    Only added rows: the value is used only with insert operations
    Only updated rows: the value is used only with update operations

Audit value for missing rows (String): If Close target missing rows is true, the audit value to be used when a missing row is found.

(1) Every date is given in string format and parsed to java.util.Date using the format defined in the advanced parameters.
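
To make the Change rule values above more concrete, here is a minimal in-memory sketch of four of them (Versioning, Last value only, Keep previous value, History correction). It is an assumption-laden illustration: the real component works against the target table through SQL, and the names ChangeRule, DimRow and ChangeRuleSketch are hypothetical. Ignore column and Audit column are left out because they do not modify a value on their own.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model; the real component issues SQL against the target table.
enum ChangeRule { VERSIONING, LAST_VALUE_ONLY, KEEP_PREVIOUS_VALUE, HISTORY_CORRECTION }

final class DimRow {
    final String sourceKey;
    final Map<String, Object> values = new HashMap<>();
    boolean active = true; // plays the role of the "Flag active row" column

    DimRow(String sourceKey) {
        this.sourceKey = sourceKey;
    }
}

final class ChangeRuleSketch {
    /** Applies one changed column value to the rows that share the given source key. */
    static void apply(List<DimRow> rows, String key, String column, Object newValue,
                      ChangeRule rule, String previousValueColumn) {
        DimRow current = null;
        for (DimRow r : rows) {
            if (r.sourceKey.equals(key) && r.active) {
                current = r;
            }
        }
        if (current == null) {
            throw new IllegalStateException("No active row for key " + key);
        }
        switch (rule) {
            case VERSIONING: {
                // Close the current row and insert a new one carrying the new value.
                current.active = false;
                DimRow next = new DimRow(key);
                next.values.putAll(current.values);
                next.values.put(column, newValue);
                rows.add(next);
                break;
            }
            case LAST_VALUE_ONLY:
                current.values.put(column, newValue);
                break;
            case KEEP_PREVIOUS_VALUE:
                // Save the old value in the previous value column, then overwrite.
                current.values.put(previousValueColumn, current.values.get(column));
                current.values.put(column, newValue);
                break;
            case HISTORY_CORRECTION:
                // Rewrite the column on every version of this source key.
                for (DimRow r : rows) {
                    if (r.sourceKey.equals(key)) {
                        r.values.put(column, newValue);
                    }
                }
                break;
            default:
                throw new IllegalArgumentException("Unknown rule: " + rule);
        }
    }
}
```

The sketch ignores start and end dates, versioning counters and audit columns, which the component would also maintain when it closes or inserts a row.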

2.2 Advanced Parameters


Descriptions of typical parameters (Additional JDBC, Statistics, etc.) are omitted; please refer to the official documentation for these.

Parameter name (Type): Description

String formatter for given date (String): Date format (SimpleDateFormat) used to parse string parameters to Date.

Preload row from target (Check): If true, all target rows are preloaded from the DB; then, while processing each source row, the search is executed in memory.

Commit every row (Check): If true, a commit is executed after each row is processed. If false, no commit is executed by the component (use a dedicated commit component).

Show debug in System.out (Check): If true, verbose messages are printed to System.out. Useful for debugging.
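
The date format parameter above can be pictured with a small sketch: a string value is parsed with SimpleDateFormat and the matching end date is derived by subtracting one second, as the Given value rule describes. The class name GivenDateSketch, the pattern used in the usage note and the strict (non-lenient) parsing are assumptions made for this example, not confirmed behavior of the component.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

// Hypothetical helper; the pattern would come from the advanced parameter.
final class GivenDateSketch {

    /** Parses a string parameter to java.util.Date using a SimpleDateFormat pattern. */
    static Date parse(String value, String pattern) {
        SimpleDateFormat format = new SimpleDateFormat(pattern);
        format.setLenient(false); // assumption: reject values such as 2014-02-31
        try {
            return format.parse(value);
        } catch (ParseException e) {
            throw new IllegalArgumentException(
                    "Value '" + value + "' does not match pattern '" + pattern + "'", e);
        }
    }

    /** End date for the closed version: the given start date minus one second. */
    static Date endBefore(Date start) {
        return new Date(start.getTime() - 1000L);
    }
}
```

For example, GivenDateSketch.parse("2014-05-08 00:00:00", "yyyy-MM-dd HH:mm:ss") followed by endBefore gives a date exactly 1000 milliseconds earlier than the parsed start.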

2.3 Job example
This is an example of a job that uses the component.

The schema used:

The base parameter tab:


The advanced parameter tab:


3 A look at the code


I chose to use a Java library to separate some logic and to simplify the javajet components.
The source is organized like this:

tMSSqlSCDAdvanced_begin.javajet: initializes the environment, creates the runtime-class source, gets the initial data (preloaded rows, etc.)
tMSSqlSCDAdvanced_main.javajet: processes every source row, applying the change rules
tMSSqlSCDAdvanced_end.javajet: finds missing target rows and closes them
tMSSqlSCDAdvanced.skeleton: contains common code for the environment, runtime-class definitions, etc.
tMSSqlConnection.javajet: code to manage the DB connection; it is a copy from the official component
scdAdvanced.jar: class library

At compile time, the javajet templates generate some classes that depend on the input and output schema definitions:

SCD_sourceKeyRowStruct: class that represents the source key columns; it implements the IStructureClass interface.
SCD_sourceRowStruct: class that represents the input schema columns; it implements the IStructureClass interface and also offers methods to check which columns have changed.
SCD_targetRowStruct: class that represents the output schema columns; it implements the ITargetStructureClass interface. It is used to get existing rows from the DB.
SCD_auditColumnForInsertRowStruct: class that contains the audit data to apply when the component executes an insert operation.
SCD_auditColumnForUpdateRowStruct: class that contains the audit data to apply when the component executes an update operation.
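
The generated source row class is described as offering methods to check which columns have changed. The generated code itself is not shown in this document, so the following is only a sketch of what such a check might look like, assuming a two-column schema (city, segment) and a hypothetical class and method name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in for a generated source row struct with two columns.
final class SourceRowSketch {
    String city;
    String segment;

    /** Returns the names of the columns whose source value differs from the target value. */
    List<String> changedColumns(String targetCity, String targetSegment) {
        List<String> changed = new ArrayList<>();
        // Objects.equals treats two nulls as equal, so null-to-null is not a change.
        if (!Objects.equals(city, targetCity)) {
            changed.add("city");
        }
        if (!Objects.equals(segment, targetSegment)) {
            changed.add("segment");
        }
        return changed;
    }
}
```

A null-safe comparison is used here so that a column that is null on both sides is not reported as changed; whether the component compares values this way is an assumption.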

3.1 Class view


CLASS.java
This class contains the source code for the skeleton javajet. The component doesn't use this class directly; I created it only to take advantage of friendlier code-writing features (IntelliSense, error correction, etc.).


IStructureClass
This interface contains methods used to merge data from and to the DB. The mergeWithDBGetter method takes a ResultSet as input and merges its data into the class attributes using type-dependent getters. The mergeWithDBSetter method takes a PreparedStatement and a collection that contains column names and parameter indexes as input, and merges the attributes into the statement using type-dependent setters.
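
The two methods described above could be shaped roughly as follows. Only the names IStructureClass, mergeWithDBGetter and mergeWithDBSetter come from the description; the parameter types, the Map used for the column-to-index collection and the CustomerRowStruct class with its two columns are assumptions made for this sketch.

```java
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Types;
import java.util.Map;

// Assumed shape: the real signatures in scdAdvanced.jar may differ.
interface IStructureClass {
    /** Copies column values from the current ResultSet row into the attributes. */
    void mergeWithDBGetter(ResultSet rs) throws SQLException;

    /** Binds attributes to statement parameters; the map gives column name -> parameter index. */
    void mergeWithDBSetter(PreparedStatement ps, Map<String, Integer> columnIndexes) throws SQLException;
}

// Hypothetical row struct, similar in spirit to a generated SCD_*RowStruct class.
final class CustomerRowStruct implements IStructureClass {
    Long customerId;
    String city;

    @Override
    public void mergeWithDBGetter(ResultSet rs) throws SQLException {
        long id = rs.getLong("customer_id");      // type-dependent getter
        customerId = rs.wasNull() ? null : id;    // keep SQL NULL as Java null
        city = rs.getString("city");
    }

    @Override
    public void mergeWithDBSetter(PreparedStatement ps, Map<String, Integer> columnIndexes)
            throws SQLException {
        Integer idIndex = columnIndexes.get("customer_id");
        if (idIndex != null) {                    // column may not take part in this statement
            if (customerId == null) {
                ps.setNull(idIndex, Types.BIGINT);
            } else {
                ps.setLong(idIndex, customerId);  // type-dependent setter
            }
        }
        Integer cityIndex = columnIndexes.get("city");
        if (cityIndex != null) {
            ps.setString(cityIndex, city);
        }
    }
}
```

Passing the index collection separately lets the same struct bind itself to different statements (insert, update, lookup) where the same column sits at different parameter positions.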

ITargetStructureClass
It extends IStructureClass, adding getter methods for the surrogate key and for the versioning id.

SCDFactory
This abstract class contains all configuration-dependent data and creates the SQL statement definitions.

SCDManager
It offers operational methods to read data from and write data to the DB.

StatementAttribute
This class contains an SQL statement string and its collections of parameter indexes. There are three different collections:

setterFilterColumnsIndex: contains the parameter indexes for the Where clause

setterValueColumnsIndex: contains the parameter indexes for the Insert Values clause

getterColumnsIndex: contains the column indexes for the Select clause
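
As an illustration of how a statement string and its three index collections might fit together, here is a small sketch. The three collection names follow the list above; the use of maps from column name to 1-based index, the StatementSketch builder and the example table and column names are assumptions, not the actual code of the library.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: field names follow the description, everything else is assumed.
final class StatementAttribute {
    final String sql;
    final Map<String, Integer> setterFilterColumnsIndex = new LinkedHashMap<>(); // Where parameters
    final Map<String, Integer> setterValueColumnsIndex = new LinkedHashMap<>();  // Insert Values parameters
    final Map<String, Integer> getterColumnsIndex = new LinkedHashMap<>();       // Select columns

    StatementAttribute(String sql) {
        this.sql = sql;
    }
}

// Hypothetical builder showing how the indexes line up with the "?" placeholders.
final class StatementSketch {

    static StatementAttribute select(String table, List<String> columns, List<String> filters) {
        StringBuilder where = new StringBuilder();
        for (String filter : filters) {
            where.append(where.length() == 0 ? " WHERE " : " AND ").append(filter).append(" = ?");
        }
        StatementAttribute st = new StatementAttribute(
                "SELECT " + String.join(", ", columns) + " FROM " + table + where);
        for (int i = 0; i < columns.size(); i++) {
            st.getterColumnsIndex.put(columns.get(i), i + 1);        // JDBC indexes start at 1
        }
        for (int i = 0; i < filters.size(); i++) {
            st.setterFilterColumnsIndex.put(filters.get(i), i + 1);
        }
        return st;
    }

    static StatementAttribute insert(String table, List<String> columns) {
        StringBuilder marks = new StringBuilder();
        for (int i = 0; i < columns.size(); i++) {
            marks.append(i == 0 ? "?" : ", ?");
        }
        StatementAttribute st = new StatementAttribute(
                "INSERT INTO " + table + " (" + String.join(", ", columns) + ") VALUES (" + marks + ")");
        for (int i = 0; i < columns.size(); i++) {
            st.setterValueColumnsIndex.put(columns.get(i), i + 1);
        }
        return st;
    }
}
```

Keeping the indexes next to the SQL text means a row struct can bind or read its columns by name without knowing how the statement was assembled.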

StructureColumn
This class represents all the column information to be passed to SCDFactory. The javajet class creates a collection that contains this definition for every output column and passes it to the SCDFactory constructor.

Utility
It contains some commonly used helpers (enums, type mapper, etc.).

MSSqlSCDFactory
This class implements the abstract SCDFactory for Microsoft SQL Server. It implements some DB-specific attributes (separator, true/false, etc.).
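
The split between an abstract factory and a database-specific subclass could look roughly like the sketch below. It assumes that "separator" refers to the identifier delimiters and that true/false refers to boolean literals; the method names quote, trueLiteral, falseLiteral and deactivateSql are hypothetical and are not taken from the library. The square-bracket delimiters and the 1/0 values for bit columns are standard SQL Server conventions.

```java
// Assumed shape of the abstract factory: only DB-neutral statement assembly lives here.
abstract class SCDFactory {
    /** Wraps an identifier in the delimiters of the target database. */
    protected abstract String quote(String identifier);

    /** Literal used for a true flag value. */
    protected abstract String trueLiteral();

    /** Literal used for a false flag value. */
    protected abstract String falseLiteral();

    /** Hypothetical example: statement that clears the active row flag of one row. */
    String deactivateSql(String table, String activeFlagColumn, String surrogateKeyColumn) {
        return "UPDATE " + quote(table)
                + " SET " + quote(activeFlagColumn) + " = " + falseLiteral()
                + " WHERE " + quote(surrogateKeyColumn) + " = ?";
    }
}

// SQL Server flavour: square-bracket identifiers, bit columns hold 1 or 0.
final class MSSqlSCDFactory extends SCDFactory {
    @Override
    protected String quote(String identifier) {
        // A closing bracket inside an identifier is escaped by doubling it.
        return "[" + identifier.replace("]", "]]") + "]";
    }

    @Override
    protected String trueLiteral() {
        return "1";
    }

    @Override
    protected String falseLiteral() {
        return "0";
    }
}
```

Under these assumptions, supporting another database would mean adding one more subclass that overrides the same three methods, while the statement assembly in the abstract class stays unchanged.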
