Pentaho Data Integration Spoon 3.0 User Guide

Copyright 2007 Pentaho Corporation. Redistribution permitted. All trademarks are the property of their respective owners. For the latest information, please visit our web site at www.pentaho.org

Last Modified on October 26th, 2007

1. Contents
1. Contents
2. About This Document
2.1. What it is
2.2. What it is not
3. Introduction to Spoon
3.1. What is Spoon?
3.2. Installation
3.3. Launching Spoon
3.4. Supported platforms
3.5. Known Issues
3.6. Screen shots
3.7. Command line options
3.8. Repository
3.8.1. Repository Auto-Login
3.9. License
3.10. Definitions
3.10.1. Transformation Definitions
3.11. Toolbar
3.12. Options
3.12.1. General Tab
3.12.2. Look & Feel tab
3.14. Search Meta data
3.15. Set environment variable
3.16. Execution log history
3.17. Replay
3.18. Generate mapping against target step
3.18.1. Generate mappings example
3.19. Safe mode
3.20. Welcome Screen
4. Creating a Transformation or Job
4.1. Notes
4.2. Screen shot
4.3. Creating a new database connection
4.3.1. General
4.3.2. Pooling
4.3.3. MySQL
4.3.4. Oracle
4.3.5. Informix
4.3.6. SQL Server
4.3.7. SAP R/3
4.3.8. Generic
4.3.9. Options
4.3.10. SQL
4.3.11. Cluster
4.3.12. Advanced
4.3.13. Test a connection
4.3.14. Explore
4.3.15. Feature List
4.4. Editing a connection
4.5. Duplicate a connection
4.6. Copy to clipboard
4.7. Execute SQL commands on a connection
4.8. Clear DB Cache option
4.9. Quoting
4.10. Database Usage Grid
4.11. Configuring JNDI connections
4.12. Unsupported databases
5. SQL Editor
5.1. Description
5.2. Limitations
6. Database Explorer
7. Hops
7.1. Description
7.1.1. Transformation Hops
7.1.2. Job Hops
7.2. Creating A Hop
7.3. Loops
7.4. Mixing rows: trap detector
7.5. Transformation hop colors
8. Variables
8.1. Variable usage
8.2. Variable scope
8.2.1. Environment variables
8.2.2. Kettle variables
8.2.3. Internal variables
9. Transformation Settings
9.1. Description
9.2. Transformation Tab
9.3. Logging
9.4. Dates
9.5. Dependencies
9.6. Miscellaneous
9.7. Partitioning
9.8. SQL Button
10. Transformation Steps
10.1. Description
10.2. Launching several copies of a step
10.3. Distribute or copy?
10.4. Step error handling
10.5. Apache Virtual File System (VFS) support
10.5.1. Example: Referencing remote job files
10.5.2. Example: Referencing files inside a Zip
10.6. Transformation Step Types
10.6.1. Text File Input
10.6.2. Table input
10.6.3. Get System Info
10.6.4. Generate Rows
10.6.5. De-serialize from file (formerly Cube Input)
10.6.6. XBase input
10.6.7. Excel input
10.6.8. Get File Names
10.6.9. Text File Output
10.6.10. Table output
10.6.11. Insert / Update
10.6.12. Update
10.6.13. Delete
10.6.14. Serialize to file (formerly Cube File Output)
10.6.15. XML Output
10.6.16. Excel Output
10.6.17. Microsoft Access Output
10.6.18. Database lookup
10.6.19. Stream lookup
10.6.20. Call DB Procedure
10.6.21. HTTP Client
10.6.22. Select values
10.6.23. Filter rows
10.6.24. Sort rows
10.6.25. Add sequence
10.6.26. Dummy (do nothing)
10.6.27. Row Normaliser
10.6.28. Split Fields
10.6.30. Unique rows
10.6.31. Group By
10.6.32. Null If
10.6.33. Calculator
10.6.34. XML Add
10.6.35. Add constants
10.6.36. Row Denormaliser
10.6.37. Flattener
10.6.38. Value Mapper
10.6.39. Blocking step
10.6.40. Join Rows (Cartesian product)
10.6.41. Database Join
10.6.42. Merge rows
10.6.43. Sorted Merge
10.6.44. Merge Join
10.6.45. JavaScript Values
10.6.46. Modified Java Script Value
10.6.47. Execute SQL script
10.6.48. Dimension lookup/update
10.6.49. Combination lookup/update
10.6.50. Mapping
10.6.51. Get rows from result
10.6.52. Copy rows to result
10.6.53. Set Variable
10.6.54. Get Variable
10.6.55. Get files from result
10.6.56. Set files in result
10.6.57. Injector
10.6.58. Socket reader
10.6.59. Socket writer
10.6.60. Aggregate Rows
10.6.61. Streaming XML Input
10.6.62. Abort
10.6.63. Oracle Bulk Loader
10.6.64. Append
10.6.65. Regex Evaluation
10.6.66. CSV Input
10.6.67. Fixed File Input
10.6.68. Microsoft Access Input
10.6.69. LDAP Input
10.6.70. Closure Generator
10.6.71. Mondrian Input
10.6.72. Get Files Row Count
10.6.73. Dummy Plugin
11. Job Settings
11.1. Description
11.2. Job Tab
11.3. Log Tab
12. Job Entries
12.1. Description
12.2. Job Entry Types
12.2.1. Start
12.2.2. Dummy Job Entry
12.2.3. Transformation
12.2.4. Job
12.2.5. Shell
12.2.6. Mail
12.2.7. SQL
12.2.8. Get a file with FTP
12.2.9. Table Exists
12.2.10. File Exists
12.2.11. Get a file with SFTP
12.2.12. HTTP
12.2.13. Create a file
12.2.14. Delete a file
12.2.15. Wait for a file
12.2.16. File compare
12.2.17. Put a file with SFTP
12.2.18. Ping a host
12.2.19. Wait for
12.2.20. Display Msgbox info
12.2.21. Abort job
12.2.22. XSL transformation
12.2.23. Zip files
12.2.24. Bulkload into MySQL
12.2.25. Get Mails from POP
12.2.26. Delete Files
12.2.27. Success
12.2.28. XSD Validator
12.2.29. Write to log
12.2.30. Copy Files
12.2.31. DTD Validator
12.2.32. Put a file with FTP
12.2.33. Unzip
12.2.34. Dummy Job Entry
13. Graphical View
13.1. Description
13.2. Adding steps or job entries
13.2.1. Create steps by drag and drop
13.3. Hiding a step
13.4. Transformation Step options (right-click menu)
13.4.1. Edit step
13.4.2. Edit step description
13.4.3. Data movement
13.4.4. Change number of copies to start
13.4.5. Copy to clipboard
13.4.6. Duplicate Step
13.4.7. Delete step
13.4.8. Hide Step
Pentaho Data Integration TM

S oon !ser "#ide 6

#(.-.&. *ho #(.-.#0. *ho

input fields............................................................................................................... 2-2 output fields........................................................................................................... 2-2

#(... 9ob entry options Bright2click menuC.............................................................................................. 2-2 #(...#. 3pen $ransformation<9ob................................................................................................... 2-2 #(...2. 7dit Dob entry.................................................................................................................... 2-2 #(...(. 7dit Dob entry description................................................................................................... 2-( #(...-. Create shado copy of Dob entry........................................................................................ 2-( #(..... Copy selected entries to clipboard BC$R,2CC........................................................................ 2-( #(...0. Align < distribute................................................................................................................ 2-( #(...7. %etach entry..................................................................................................................... 2-( #(...1. %elete all copies of this entry............................................................................................. 2-( #(.0. Adding hops................................................................................................................................ 2-( #-. Running a $ransformation...................................................................................................................... 2-#-.#. Running a $ransformation 3vervie .............................................................................................. 2-#-.2. 78ecution 3ptions........................................................................................................................ 2-#-.2.#. 'here to 78ecute.............................................................................................................. 2-#-.2.2. 
3ther 3ptions................................................................................................................... 2-#-.(. *etting up Remote and *lave *ervers............................................................................................ 2-. #-.(.#. 4eneral description............................................................................................................ 2-. #-.(.2. Configuring a remote or slave server................................................................................... 2-. #-.-. Clustering................................................................................................................................... 2-7 #-.-.#. 3vervie .......................................................................................................................... 2-7 #-.-.2. Creating a cluster schema.................................................................................................. 2-7 #-.-.(. 3ptions............................................................................................................................ 2-7 #-.-.-. Running transformations using a cluster.............................................................................. 2-1 #-.-... =asic Clustering 78ample................................................................................................... 2-1 #.. ,ogging................................................................................................................................................ 2.# #..#. ,ogging %escription..................................................................................................................... 2.# #..2. ,og 4rid..................................................................................................................................... 2.# #..2.#. $ransformation ,og 4rid %etails.......................................................................................... 2.# #..2.2. 
9ob ,og 4rid..................................................................................................................... 2.2 #..(. =uttons...................................................................................................................................... 2.2 #..(.#. #..-.# $ransformation =uttons .......................................................................................... 2.2 #..(.2. 9ob =uttons...................................................................................................................... 2.. #0. 4rids.................................................................................................................................................... 2.0 #0.#. %escription................................................................................................................................. 2.0 #0.2. >sage......................................................................................................................................... 2.0 #7. Repository 78plorer................................................................................................................................ 2.7 #7.#. %escription................................................................................................................................. 2.7 #7.2. Right click functions..................................................................................................................... 2.7 #7.(. =ackup < Recovery....................................................................................................................... 2.7 #1. *hared obDects...................................................................................................................................... 2.1 #&. APP7:%)G A@ ,4P, ,icense.................................................................................................................... 2.&

Pentaho Data Integration TM

S oon !ser "#ide 7

2. About This Document

2.1. What it is
This document is a technical description of Spoon, the graphical transformation and job designer of the Pentaho Data Integration suite, also known as the Kettle project.

2.2. What it is not
This document does not attempt to describe in great detail how to create jobs and transformations for all possible situations. Recognizing that different developers have different approaches to designing their data integration solutions, Spoon empowers users with the freedom and flexibility to design solutions in the manner they feel most appropriate to the problem at hand, and that is the way it should be!

Other documentation
Here are links to other documents that you might be interested in when you are building transformations:

Flash demos, screen shots, and an introduction to building a simple transformation: http://kettle.pentaho.org/screenshots/
Pentaho Data Integration community website (news, case studies, weekly tips and more): http://kettle.pentaho.org
Pentaho Data Integration Forum (discussions on design, features, bugs and enhancements): http://forums.pentaho.org/forumdisplay.php?f=69
Running transformations in batch using Pan: Pan-3.0.pdf
Running jobs in batch using Kitchen: Kitchen-3.0.pdf
An introduction to Pentaho Data Integration in Roland Bouman's blog: http://rpbouman.blogspot.com/2006/06/pentaho-data-integration-kettle-turns.html
Nicholas Goodman is also blogging on Kettle and BI: http://www.nicholasgoodman.com

3. Introduction to Spoon

3.1. What is Spoon?
Kettle is an acronym for "Kettle E.T.T.L. Environment". This means it has been designed to help you with your ETTL needs: the Extraction, Transformation, Transportation and Loading of data.

Spoon is a graphical user interface that allows you to design transformations and jobs that can be run with the Kettle tools Pan and Kitchen. Pan is a data transformation engine that is capable of performing a multitude of functions such as reading, manipulating and writing data to and from various data sources. Kitchen is a program that can execute jobs designed by Spoon in XML or in a database repository. Usually jobs are scheduled in batch mode to be run automatically at regular intervals.

NOTE: For a complete description of Pan or Kitchen, please refer to the Pan and Kitchen user guides.

Transformations and Jobs can describe themselves using an XML file or can be put in a Kettle database repository. This information can then be read by Pan or Kitchen to execute the described steps in the transformation or run the job.

In short, Pentaho Data Integration makes data warehouses easier to build, update and maintain!

3.2. Installation
The first step is the installation of the Sun Microsystems Java Runtime Environment version 1.4 or higher. You can download a JRE for free at http://www.javasoft.com/.

After this, you can simply unzip the zip file Kettle-3.0.zip in a directory of your choice. In the Kettle directory where you unzipped the file, you will find a number of files. Under Unix-like environments (Solaris, Linux, MacOS, ...) you will need to make the shell scripts executable. Execute these commands to make all shell scripts in the Kettle directory executable:

cd Kettle
chmod +x *.sh

3.3. Launching Spoon
To launch Spoon on the different platforms these are the scripts that are provided:

Spoon.bat: launch Spoon on the Windows platform.
spoon.sh: launch Spoon on a Unix-like platform: Linux, Apple OSX, Solaris, ...

If you want to make a shortcut under the Windows platform, an icon is provided: "spoon.ico" to set the correct icon. Simply point the shortcut to the Spoon.bat file.

3.4. Supported platforms
The Spoon GUI is supported on the following platforms:

Microsoft Windows: all platforms since Windows 95, including Vista
Linux GTK: on i386 and x86_64 processors, works best on Gnome
Apple's OSX: works both on PowerPC and Intel machines
Solaris: using a Motif interface (GTK optional)
AIX: using a Motif interface
HP-UX: using a Motif interface (GTK optional)
FreeBSD: preliminary support on i386, not yet on x86_64

3.5. Known Issues
Linux
Occasional JVM crashes running SuSE Linux and KDE. Running under Gnome has no problems. (Detected on SUSE Linux 10.1, but earlier versions suffer the same problem.)

FreeBSD
Problems with drag and drop. Workaround is to use the right click popup menu on the canvas. (Insert new step)

Please check the Tracker lists at http://kettle.javaforge.com for up-to-date information on discovered issues.

3.6. Screen shots

Designing a Transformation

Designing a job

The Main tree in the upper-left panel of Spoon allows you to browse connections along with the jobs and transformations you currently have open. When designing a transformation, the Core Objects palette in the lower-left panel contains the available steps used to build your transformation including input, output, lookup, transform, joins, scripting steps and more. When designing a job, the Core Objects palette contains the available job entry types.

These items are described in detail in the chapters below: 5. Database Connections, 7. Hops, 10. Transformation Steps, 12. Job Entries, 13. Graphical View.

3.7. Command line options
These are the command line options that you can use when starting the Spoon application:

-file=filename
This option runs the specified transformation (.ktr : Kettle Transformation).

-logfile=Logging Filename
This option allows you to specify the location of the log file. The default is the standard output.

-level=Logging Level
The level option sets the log level for the transformation being run. These are the possible values:
Nothing: Do not show any output
Error: Only show errors
Minimal: Use minimal logging
Basic: This is the default basic logging level
Detailed: Give detailed logging output
Debug: Show very detailed output for debugging purposes.
Rowlevel: Detailed logging at a row level. Warning: this will generate a lot of data.

-rep=Repository name
Connect to the repository with name "Repository name".
Note: You also need to specify the options -user, -pass and -trans described below. The repository details are loaded from the file repositories.xml in the local directory or in the Kettle directory: $HOME/.kettle/ or C:\Documents and Settings\<username>\.kettle on Windows.

-user=Username
This is the username with which you want to connect to the repository.

-pass=Password
The password to use to connect to the repository.

-trans=Transformation Name
Use this option to select the transformation to run from the repository.

-job=Job Name
Use this option to select the job to run from the repository.

Important Notes:
On Windows, we advise you to use the /option:value format to avoid command line parsing problems by the MS-DOS shell.
Fields in italic represent the values that the options use.
It's important that if spaces are present in the option values, you use quotes or double quotes to keep them together. Take a look at the examples below for more info.
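
As a rough illustration of the two option formats described above, the following Python sketch assembles argument lists from option values. The helper names (spoon_arguments, unix_command_line) and the ./spoon.sh path are hypothetical; only the option names and the -option=value and /option:value formats come from this guide.

```python
import shlex

# Option names taken from the list above; everything else is illustrative.
SPOON_OPTIONS = ("file", "logfile", "level", "rep", "user", "pass", "trans", "job")


def spoon_arguments(options, windows=False):
    """Turn a dict of option values into a list of command line arguments.

    On Windows the /option:value format is used, elsewhere -option=value.
    """
    args = []
    for name, value in options.items():
        if name not in SPOON_OPTIONS:
            raise ValueError(f"unknown option: {name}")
        if windows:
            args.append(f"/{name}:{value}")
        else:
            args.append(f"-{name}={value}")
    return args


def unix_command_line(script, options):
    """Join script and arguments into one string, quoting values with spaces."""
    parts = [script, *spoon_arguments(options)]
    return " ".join(shlex.quote(part) for part in parts)


if __name__ == "__main__":
    # Hypothetical script path and file name, used only for illustration.
    print(unix_command_line("./spoon.sh", {"file": "/tmp/my file.ktr", "level": "Detailed"}))
```

Passing the arguments as a list (for example to a process launcher) avoids shell quoting altogether; the quoted string form is only needed when the command is typed into or passed through a shell.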

3.8. Repository
Spoon provides you with the ability to store transformation and job files to the local file system or in the Kettle repository. The Kettle repository can be housed in any common relational database. This means that in order to load a transformation from a database repository, you need to connect to this repository. To do this, you need to define a database connection to this repository. You can do this using the repositories dialog you are presented with when you start up Spoon:

The Repository login screen

The information concerning repositories is stored in a file called "repositories.xml". This file resides in the hidden directory ".kettle" in your default home directory. On Windows this is C:\Documents and Settings\<username>\.kettle

Note: The complete path and filename of this file is displayed on the Spoon console.

If you don't want this dialog to be shown each time Spoon starts up, you can disable it by unchecking the 'Present this dialog at startup' checkbox or by using the Options dialog under the Edit / Options menu. See also 3.12. Options.

Note: The default password for the admin user is also admin. You should change this default password right after the creation using the Repository Explorer or the "Repository/Edit User" menu.
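
The location of repositories.xml described in this section can be sketched in Python as follows. This is only an illustration: the function name is hypothetical, and the search order (local directory first, then the .kettle directory under the home directory) is an assumption, since this guide names both locations without stating which one takes precedence.

```python
from pathlib import Path


def find_repositories_file(local_dir, home_dir):
    """Return the first existing repositories.xml, or None if there is none.

    Assumed order: the local directory, then <home>/.kettle.
    """
    candidates = [
        Path(local_dir) / "repositories.xml",
        Path(home_dir) / ".kettle" / "repositories.xml",
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    # Path.home() resolves to the user's home directory on the current platform.
    print(find_repositories_file(Path.cwd(), Path.home()))
```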

3.8.1. Repository Auto-Login
You can have Spoon automatically log into the repository by setting the following environment variables: KETTLE_REPOSITORY, KETTLE_USER and KETTLE_PASSWORD. This prevents you from having to log into the same repository every time.

Important Note: this is a security risk and you should always lock your computer to prevent unauthorized access to the repository.
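
A minimal Python sketch of preparing these three variables for a Spoon process is shown below. The function name and the example values are hypothetical; only the variable names come from this guide.

```python
import os


def auto_login_environment(repository, user, password, base=None):
    """Return a copy of the environment with the auto-login variables set.

    base defaults to the current process environment; it is not modified.
    """
    env = dict(os.environ if base is None else base)
    env["KETTLE_REPOSITORY"] = repository
    env["KETTLE_USER"] = user
    env["KETTLE_PASSWORD"] = password
    return env


if __name__ == "__main__":
    # Example values only; a real password should not be hard-coded.
    env = auto_login_environment("Production", "admin", "secret")
    print(sorted(name for name in env if name.startswith("KETTLE_")))
```

The resulting dictionary could be handed to a process launcher such as the env argument of subprocess.Popen when starting the Spoon script. Because the password is then visible in the environment, the security note above applies.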

3.9. License
Beginning with version 2.2.0, Kettle was released into the public domain under the LGPL license. Please refer to Appendix A for the full text of this license.

Note: Pentaho Data Integration is referred to as "Kettle" below.

Copyright (C) 2006 Pentaho Corporation

Kettle is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version.

Kettle is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public License along with the Kettle distribution; if not, write to the Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA

3.10. Definitions

3.10.1. Transformation Definitions
Value: values are part of a row and can contain any type of data: Strings, floating point Numbers, unlimited precision BigNumbers, Integers, Dates or Boolean values.
Row: a row consists of 0 or more values.
Output stream: an output stream is a stack of rows that leaves a step.
Input stream: an input stream is a stack of rows that enters a step.
Hop: a hop is a graphical representation of one or more data streams between 2 steps. A hop always represents the output stream for one step and the input stream for another. The number of streams is equal to the number of copies of the destination step (1 or more).
Note: a note is a descriptive piece of information that can be added to a transformation.

Job Definitions
Job Entry: a job entry is one part of a job and performs a certain task.
Hop: a hop is a graphical representation of the link between two job entries. It can be set (depending on the type of originating job entry) to execute the next job entry unconditionally, after successful execution or after failed execution.
Note: a note is a descriptive piece of information that can be added to a job.

3.11. Toolbar
The icons on the toolbar of the main screen are, from left to right:

Create a new job or transformation.
Open transformation/job from file if you're not connected to a repository or from the repository if you are connected to one.
Save the transformation/job to a file or to the repository.
Save the transformation/job under a different name or filename.
Open the print dialog.
Run transformation/job: runs the current transformation from XML file or repository.
Preview transformation: runs the current transformation from memory. You can preview the rows that are produced by selected steps.
Run the transformation in debug mode, allowing you to troubleshoot execution errors.
Replay the processing of a transformation for a certain date and time. This will cause certain steps (Text File Input and Excel Input) to only process rows that failed to be interpreted correctly during the run on that particular date and time.
Verify transformation: Spoon runs a number of checks for every step to see if everything is going to run as it should.
Run an impact analysis: what impact does the transformation have on the used databases.
Generate the SQL that is needed to run the loaded transformation.
Launch the database explorer, allowing you to preview data, run SQL queries, generate DDL and more.

3.12. Options
Kettle options allow you to customize a number of properties related to the behavior and look and feel of the graphical user interface. Examples include startup options like whether or not to display tips and the Kettle Welcome Page, and user interface options like fonts and colors. To access the options dialog, select Edit|Options... from the menubar.

3.12.1. General Tab

Options - General tab

Feature and description:

Maximum Undo Level
    This parameter sets the maximum number of steps that can be undone (or redone) by Spoon.
Default number of lines in preview dialog
    This parameter allows you to change the default number of rows that are requested from a step during transformation previews.
Maximum nr of lines in the logging windows
    Specify the maximum limit of rows to display in the logging window.
Show tips at startup?
    This option sets the display of tips at startup.
Show welcome page at startup?
    This option controls whether or not to display the welcome page when launching Spoon.
Use database cache?
    Spoon caches information that is stored on source and target databases. In some cases this can lead to incorrect results when you're in the process of changing those very databases. In those cases it is possible to disable the cache altogether instead of clearing the cache every time.
    NOTE: Spoon automatically clears the database cache when you launch DDL (Data Definition Language) statements towards a database connection. However, when using 3rd party tools, clearing the database cache manually may be necessary.
Open last file at startup?
    Enable this option to automatically (try to) load the last transformation you used (opened or saved) from XML or repository.
Auto save changed files?
    This option automatically saves a changed transformation before running.
Only show the active file in the main tree?
    This option reduces the number of transformation and job items in the main tree on the left by only showing the currently active file.
Only save used connections to XML?
    This option limits the XML export of a transformation to the used connections in that transformation. This comes in handy while exchanging sample transformations to avoid having all defined connections included.
Ask about replacing existing connections on open/import?
    This option asks before replacing existing database connections during import.
Replace existing connections on open/import?
    This is the action that's being taken when there is no dialog shown. (See previous option.)
Show "Save" dialog?
    This flag allows you to turn off the confirmation dialogs you receive when a transformation has been changed.
Automatically split hops?
    This option turns off the confirmation dialogs you get when you want to split a hop. (See also 7.4. Splitting A Hop.)
Show "copy or distribute" dialog?
    This option turns off the warning message that appears when you link a step to multiple outputs. This warning message describes the two options for handling multiple outputs:
    Distribute rows: destination steps receive the rows in turns (round robin)
    Copy rows: all rows are sent to all destinations
Show repository dialog at startup?
    This option controls whether or not the repositories dialog shows up at startup.
Ask user when exiting?
    This option controls whether or not to display the confirmation dialog when a user chooses to exit the application.
Clear custom parameters (steps/plugins)
    This option clears all parameters and flags that were set in the plugin or step dialogs.
Display tooltips?
    This option controls whether or not to display tooltips for the buttons on the main toolbar.
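
The difference between the two ways of handling multiple outputs mentioned under the "copy or distribute" dialog can be illustrated with a short Python sketch. It is a conceptual model only, not how Kettle is implemented; the function names are hypothetical and rows are represented as plain list items.

```python
def distribute_rows(rows, destinations):
    """Round robin: each row goes to exactly one destination, in turns."""
    outputs = [[] for _ in range(destinations)]
    for index, row in enumerate(rows):
        outputs[index % destinations].append(row)
    return outputs


def copy_rows(rows, destinations):
    """Copy: every destination receives every row."""
    return [list(rows) for _ in range(destinations)]


if __name__ == "__main__":
    rows = ["a", "b", "c", "d", "e"]
    print(distribute_rows(rows, 2))  # rows alternate between the two outputs
    print(copy_rows(rows, 2))        # both outputs hold all five rows
```

With distribution, the destinations together see each row once, so the work is shared; with copying, each destination sees the full stream.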

3.12.2. Look & Feel tab

Options - Look and Feel tab

Feature and description:

Fixed width font
    This is the font that is used in the dialog boxes, trees, input fields, etc.
Font on workspace
    This is the font that is used on the graphical view.
Font for notes
    This font is used in the notes that are displayed in the Graphical View.
Background color
    Sets the background color in Spoon. It affects all dialogs too.
Workspace background color
    Sets the background color in the Graphical View of Spoon.
Tab color
    This is the color that is being used to indicate tabs that are active/selected.
Icon size in workspace
    This affects the size of the icons in the graph window. The original size of an icon is 32x32 pixels. The best results (graphically) are probably at sizes 16, 24, 32, 48, 64 and other multiples of 32.
Line width on workspace
    This affects the line width of the hops on the Graphical View and the border around the steps.
Shadow size on workspace
    If this size is larger than 0, a shadow of the steps, hops and notes is drawn on the canvas, making it look like the transformation floats above the canvas.
Dialog middle percentage
    By default, a parameter is drawn at 35% of the width of the dialog, counted from the left. You can change this with this parameter. This can be useful in cases where you use unusually large fonts.
Canvas anti-aliasing?
    Some platforms like Windows, OSX and Linux support anti-aliasing through GDI, Carbon or Cairo. Check this to enable smoother lines and icons in your graph view. If you enable this and your environment doesn't work any more afterwards, change the value for option "EnableAntiAliasing" to "N" in file $HOME/.kettle/.spoonrc (C:\Documents and Settings\<user>\.kettle\.spoonrc on Windows).
Use look of OS?
    Checking this on Windows allows you to use the default system settings for fonts and colors in Spoon. On other platforms, this is always the case.
Show branding graphics
    Enabling this option will draw Pentaho Data Integration branding graphics on the canvas and in the left hand side "expand bar".
Preferred Language
    Here you can specify the default language setting. If a certain text hasn't been translated into this locale, Kettle will fall back to the failover locale.
Alternative Language
    Because the original language in which Kettle was written is English, it's best to set this locale to English.

3.13. Search Meta data

Search Meta data Dialog

This option will search in any available fields, connectors or notes of all loaded jobs and transformations for the string specified in the Filter field. The Meta data search returns a detailed result set showing the location of any search hits. This feature is accessed by choosing Edit|Search Meta data from the menubar.

3.14. Set environment variable

Set Environment Variable Dialog

The Set Environment Variable feature allows you to explicitly create and set environment variables for the current user session. This is a useful feature when designing transformations for testing variable substitutions that are normally set dynamically by another job or transformation. This feature is accessible by choosing Edit|Set Environment Variable from the menubar.

Note: This screen is also presented when you run a transformation that uses undefined variables. This allows you to define them right before execution time.

Show environment variables
This feature will display the current list of environment variables and their values. It is accessed by selecting the Edit|Show environment variables option from the menubar.

3.15. Execution log history
If you have configured your Job or Transformation to store log information in a database table, you can view the log information from previous executions by right-clicking on the job or transformation in the Main Tree and selecting 'Open History View'. This view will show:

Transformation History Tab

NOTE: The log history for a job or transformation will also open by default each next time you execute the file.

3.16. Replay
The Replay feature allows you to re-run a transformation that failed. Replay functionality is implemented for Text File Input and Excel Input. It allows you to send files that had errors back to the source and have the data corrected. ONLY the lines that failed before are then processed during the replay if a .line file is present. It uses the date in the filename of the .line file to match the entered replay date.

3.17. Generate mapping against target step
In cases where you have a fixed target table, you will want to map the fields from the stream to their corresponding fields in the target output table. This is normally accomplished using a Select Values step in your transformation. The 'Generate mapping against target' option provides you with an easy-to-use dialog for defining these mappings that will automatically create the resulting Select Values step, which can be dropped into your transformation flow prior to the table output step.

The 'Generate mapping against target' option is accessed by right-clicking on the table output step.

Generate Mapping Dialog

After defining your mappings, select OK and the Select Values step containing your mappings will appear on the workspace. Simply attach the mapping step into your transformation just before the table output step.

3.17.1. Generate mappings example
Here is an example of a simple transformation in which we want to generate mappings to our target output table:

Split hop before generating mappings

Begin by right-clicking on the Table output step and selecting 'Generate mappings against target'. Add all necessary mappings using the Generate Mapping dialog shown above and click OK. You will now see that a Table output Mapping step has been added to the canvas:

Table output Mapping Step added to canvas

Finally, drag the generated Table output Mapping step into your transformation flow prior to the table output step:

Insert mapping step into transformation flow

3.18. Safe mode
In cases where you are mixing the rows from various sources, you need to make sure that these rows all have the same layout in all conditions. For this purpose, we added a "safe mode" option that is available in the Spoon logging window or on the Execute a Transformation/Job window. When running in "safe mode", the transformation will check every row that passes and will see if the layouts are all identical.

If a row is found that does not have the same layout as the first row, an error is thrown and the step and offending row are reported on.

Note: this option is also available in Pan with the "safe mode" option.
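
The kind of check that safe mode performs can be sketched in Python as follows. This is a conceptual illustration rather than Kettle's implementation: the function name is hypothetical, rows are modelled as lists of (field name, value) pairs, and "layout" is assumed here to mean the field names and value types in order.

```python
def check_row_layouts(rows):
    """Compare every row's layout with the first row's layout.

    Raises ValueError for the first row that differs and returns the
    number of rows checked otherwise.
    """
    reference = None
    checked = 0
    for number, row in enumerate(rows, start=1):
        # Assumed definition of layout: ordered field names and value types.
        layout = [(name, type(value).__name__) for name, value in row]
        if reference is None:
            reference = layout
        elif layout != reference:
            raise ValueError(
                f"row {number} has layout {layout}, expected {reference}"
            )
        checked += 1
    return checked


if __name__ == "__main__":
    rows = [
        [("id", 1), ("name", "alpha")],
        [("id", 2), ("name", "beta")],
        [("id", "3"), ("name", "gamma")],  # id is a string here, not an integer
    ]
    try:
        check_row_layouts(rows)
    except ValueError as error:
        print(error)
```

Checking every row costs time, which is one reason to treat such a check as an option to switch on while designing and testing rather than something that always runs.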

3.19. Welcome Screen
The welcome screen will display the first time you launch Spoon 3.0, providing you with links to additional information about Pentaho Data Integration. You can disable the launching of the welcome page in Spoon options (Edit|Options).

The welcome screen

4. Creating a Transformation or Job
You create a new Transformation by clicking on the New Transformation button on the main toolbar, by selecting File|New|Transformation from the menubar, or by using the CTRL-N hotkey. This will open a new Transformation tab for you to begin designing your transformation.

You create a new Job by clicking on the New Job button on the main toolbar, by selecting File|New|Job from the menubar, or by using the CTRL-ALT-N hotkey. This will open a new Job tab for you to begin designing your job.

Creating a new transformation or job

4.1. Notes
Notes allow you to add descriptive text notes to the Job or Transformation canvas. To add a note to the graphical view, right-click on the canvas and select 'Add note'. Later, these notes can be edited by double-clicking on them and moved around the screen by dragging them with the left mouse button. To remove a note, right-click on the note and select 'Delete note'.

Notes

5. Database Connections
A database connection describes the method by which Kettle can connect to a database. You can create connections specific to a Job or Transformation or store them in the Kettle repository for re-use within multiple transformations or jobs.

5.1. Screen shot

The Connection information dialog

5.2. Creating a new database connection
This section describes how to create a new database connection, including a detailed description of each connection property available in the Connection information dialog.

You begin creating a new connection by right-clicking on the 'Database Connections' tree entry and selecting 'New' or 'New Connection Wizard', by double-clicking on 'Database Connections', or simply by pressing F3.

Creating a new database connection

This will launch the 'Connection information' dialog shown above. The following topics describe the configuration options available on each tab of the Connection information dialog.

5.2.1. General

The General tab is where you set up the basic information about your connection, like the connection name, type, access method, server name and login credentials. The table below provides a more detailed description of the options available on the General tab:

Feature | Description
Connection Name | Uniquely identifies a connection across transformations and jobs
Connection Type | The type of database you are connecting to (e.g. MySQL, Oracle, etc.)
Method of access | This will be either Native (JDBC), ODBC, or OCI. Available access types are dependent on the type of database you are connecting to
Server host name | Defines the host name of the server on which the database resides. You can also specify the host by IP-address
Database name | Identifies the database name you want to connect to. In case of ODBC, specify the DSN name here
Port number | Sets the TCP/IP port number on which the database listens
Username | Optionally specifies the username to connect to the database
Password | Optionally specifies the password to connect to the database

5.2.2. Pooling

The Pooling tab allows you to configure your connection to use connection pooling and define options related to connection pooling, like the initial pool size, maximum pool size and connection pool parameters. The table below provides a more detailed description of the options available on the Pooling tab:

Feature | Description
Use a connection pool | Check this option to enable connection pooling.
The initial pool size | Sets the initial size of the connection pool.
The maximum pool size | Sets the maximum number of connections in the connection pool.
Parameter Table | Allows you to define additional custom pool parameters.

5.2.3. MySQL

Because MySQL by default gives back complete query results in one block to the client (Kettle in this case), we had to enable "result streaming" by default. The big drawback of this is that it allows only 1 (one) single query to be opened at any given time. If you run into trouble because of that, you can disable this option in the MySQL tab of the database connection dialog.

Another issue you might come across is that the default timeout in the MySQL JDBC driver is set to 0 (no timeout). This leads to a problem in certain situations, as it doesn't allow Kettle to detect a server crash or sudden network failure if it happens in the middle of a query or open database connection. This in turn leads to the infinite stalling of a transformation or job. To solve this, set the "connectTimeout" and "socketTimeout" parameters for MySQL in the Options tab. The value to be specified is in milliseconds: for a 2 minute timeout you would specify value 120000 ( 2 x 60 x 1000 ).

You can also review other options on the linked MySQL help page by clicking on the 'Show help text on option usage' button found on the Options tab.
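The timeout arithmetic above can be sketched as follows. This is only an illustration: the helper names are hypothetical and not part of Kettle, and the sketch assumes you want the same value for both "connectTimeout" and "socketTimeout".

```python
def minutes_to_millis(minutes):
    """Convert a timeout in minutes to milliseconds (minutes x 60 x 1000)."""
    return int(minutes * 60 * 1000)


def mysql_timeout_options(minutes):
    """Return the parameter name/value pairs to type into the Options tab.

    Hypothetical helper: it only computes the values, it does not talk
    to Kettle or to the MySQL driver.
    """
    millis = str(minutes_to_millis(minutes))
    return {"connectTimeout": millis, "socketTimeout": millis}


# Print the two parameters for a 2 minute timeout.
for name, value in mysql_timeout_options(2).items():
    print(name, "=", value)
```

Entering the two printed name/value pairs as parameter rows on the Options tab corresponds to the 2 minute example in the text.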

5.2.4. Oracle

This tab allows you to specify the default data and index tablespaces which Kettle will use when generating SQL for Oracle tables and indexes.

This version of Pentaho Data Integration ships with the Oracle JDBC driver version 10.2.0. It is in general the most stable and recent driver we could find. However, if you do have issues with Oracle connectivity or other strange problems, you might want to consider replacing the 10.2. JDBC driver to match your database server. Replace files "ojdbc14.jar" and "orai18n.jar" in the directory libext/JDBC of your distribution with the files found in the $ORACLE_HOME/jdbc directory on your server.

If you want to use OCI and an Oracle Net8 client, please read on. For OCI to work, the JDBC driver version used in Kettle needs to match your Oracle client version. Oracle 2.5.0 shipped with version 10.1, 2.5.0 ships with version 10.2. You can either install that version of the Oracle client or (probably easier) change the JDBC driver in PDI if versions don't match up (see above).

5.2.5. Informix

For Informix, you need to specify the Informix Server name in the Informix tab in order for a connection to be usable.


5.2.6. SQL Server

This tab allows you to configure the following properties specific to Microsoft SQL Server:

Feature | Description
SQL Server instance name | Sets the instance name property for the SQL Server connection.
Use .. to separate schema and table | Enable when using dot notation to separate schema and table.

Other properties can be configured by adding connection parameters on the Options tab of the Connection information dialog. For example, you can enable single sign-on login by defining the domain option on the Options tab as shown below:

The SQL Server 'instance' property

From the jTDS FAQ on http://jtds.sourceforge.net/faq.html:

Specifies the Windows domain to authenticate in. If present and the user name and password are provided, jTDS uses Windows (NTLM) authentication instead of the usual SQL Server authentication (i.e. the user and password provided are the domain user and password). This allows non-Windows clients to log in to servers which are only configured to accept Windows authentication.

If the domain parameter is present but no user name and password are provided, jTDS uses its native Single-Sign-On library and logs in with the logged Windows user's credentials (for this to work one would obviously need to be on Windows, logged into a domain, and also have the SSO library installed -- consult README.SSO in the distribution on how to do this).


5.2.7. SAP R/3

This tab allows you to configure the following properties specific to SAP R/3:

Feature | Description
Language | Specifies the language to be used when connecting to SAP.
System Number | Specifies the system number of the SAP system to which you want to connect.
SAP Client | Specifies the three digit client number for the connection.

5.2.8. Generic

This tab is where you specify the URL and Driver class for Generic Database connections! You can also dynamically set these properties using Kettle variables. This provides the ability to access data from multiple database types using the same transformations and jobs.

Note: Make sure to use clean ANSI SQL that works on all used database types in that case.


5.2.9. Options

This tab allows you to set database specific options for the connection by adding parameters to the generated URL. To add a parameter, select the next available row in the parameter table, choose your database type, then enter a valid parameter name and its corresponding value. For more database specific configuration help, click the 'Show help text on option usage' button and a new browser tab will appear in Spoon with additional information about configuring the JDBC connection for the currently selected database type:

Display options help in a Spoon browser

5.2.10. SQL

This tab allows you to enter a number of SQL commands to be executed immediately after connecting to the database. This is sometimes needed for various reasons like licensing, configuration, logging, tracing, etc.

5.2.11. Cluster

This tab allows you to enable clustering for the database connection and create connections to the data partitions. To enable clustering for the connection, check the 'Use Clustering?' option. To create a new data partition, enter a partition ID and the hostname, port, database, username and password for connecting to the partition.


5.2.12. Advanced

This tab allows you to configure the following properties for the connection:

Feature | Description
Quote all identifiers in database | Quotes all identifiers used in the database.
Force all identifiers to lower case | Forces all identifiers to lower case.
Force all identifiers to upper case | Forces all identifiers to upper case.

5.2.13. Test a connection

The 'Test' button in the Connection information dialog allows you to test the current connection. An OK message will be displayed if Spoon is able to establish a connection with the target database.

5.2.14. Explore

The Database Explorer allows you to interactively browse the target database, preview data, generate DDL and much more. To open the Database Explorer for an existing connection, click the 'Explore' button found on the Connection information dialog or right-click on the connection in the Main tree and select 'Explore'. Please see Database Explorer for more information.

5.2.15. Feature List

Feature list: exposes the JDBC URL, class and various database settings for the connection, such as the list of reserved words.

5.3. Editing a connection

To edit an existing connection, double-click on the connection name in the main tree or right-click on the connection name and select 'Edit connection'.

5.4. Duplicate a connection

To duplicate an existing connection, right-click on the connection name and select 'Duplicate'.

5.5. Copy to clipboard

Accessed by right-clicking on a connection name in the main tree, this option copies the XML describing the connection to the clipboard.

Delete a connection

To delete an existing database connection, right-click on the connection name in the main tree and select 'Delete'.

5.6. Execute SQL commands on a connection

To execute SQL commands against an existing connection, right-click on the connection name and select 'SQL Editor'. See also SQL Editor for more information.


5.7. Clear DB Cache option

To speed up connections, Spoon uses a database cache. When the information in the cache no longer represents the layout of the database, right-click on the connection in the Main tree and select the 'Clear DB Cache...' option. This is commonly used when database tables have been changed, created or deleted.

5.8. Quoting

More and more people complained about the handling of reserved words, field names with spaces in them, field names with decimals (.) in them, table names with dashes and other special characters in them ... so we implemented a database specific quoting system that allows you to use pretty much any name or character that the database is comfortable with.

Pentaho Data Integration contains a list of reserved words for many (but not all) of the supported databases. To correctly implement quoting, we had to go for a strict separation between the schema (user/owner) of a table and the tablename itself. Otherwise it would be impossible to properly quote tables or fields with one or more periods in them. Putting dots in table and field names is apparently common practice in certain ERP systems (for example fields like "V.A.T.").

Because we too can be wrong when doing the quoting, we have added a new rule in version 2.5.0: when there is a start or end-quote in the tablename or schema, Pentaho Data Integration refrains from doing the quoting. This allows you to specify the quoting mechanism yourself. This leaves you all the freedom you need to get out of any sticky situation that might be left. Nevertheless, feel free to let us know about it so that we can improve our quoting algorithms.
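The quoting rules described above can be sketched as follows. This is an illustration under stated assumptions, not Kettle's actual algorithm: the function names, the default quote characters and the set of "special" characters (space, period, dash) are all hypothetical.

```python
def quote_identifier(name, start_quote='"', end_quote='"', reserved_words=()):
    """Quote a schema, table or field name when it needs quoting.

    Rule from the text: if the name already contains a start or end
    quote, it is returned untouched so the user controls the quoting.
    """
    if start_quote in name or end_quote in name:
        return name
    reserved = {word.lower() for word in reserved_words}
    needs_quoting = name.lower() in reserved or any(ch in name for ch in " .-")
    return start_quote + name + end_quote if needs_quoting else name


def qualified_table_name(schema, table, **options):
    """Quote schema and table separately, then join them with a period.

    Keeping the two parts separate is what makes names that contain
    periods (such as "V.A.T.") unambiguous.
    """
    quoted_table = quote_identifier(table, **options)
    if not schema:
        return quoted_table
    return quote_identifier(schema, **options) + "." + quoted_table


print(qualified_table_name("sales", "order-lines"))
print(quote_identifier("V.A.T."))
```

A name that is neither reserved nor contains special characters is left alone, which keeps generated SQL readable for the common case.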

5.9. Database Usage Grid

Database | Access Method | Server Name or IP Address | Database Name | Port # (default) | Username & Password
Oracle | Native | Required | Oracle database SID | Required (1521) | Required
Oracle | ODBC | | ODBC DSN name | | Required
Oracle | OCI | | Database TNS name | | Required
MySQL | Native | Required | MySQL database name | Optional (3306) | Optional
MySQL | ODBC | | ODBC DSN name | | Optional
AS/400 | Native | Required | AS/400 Library name | Optional | Required
AS/400 | ODBC | | ODBC DSN name | | Required
MS Access | ODBC | | ODBC DSN name | | Optional
MS SQL Server | Native | Required | Database name | Required (1433) | Required
MS SQL Server | ODBC | | ODBC DSN name | | Required
IBM DB2 | Native | Required | Database name | Required (50000) | Required
IBM DB2 | ODBC | | ODBC DSN name | | Required
PostgreSQL | Native | Required | Database name | Required (5432) | Required
PostgreSQL | ODBC | | ODBC DSN name | | Required
Intersystems Caché | Native | Required | Database name | Required (1972) | Required
Intersystems Caché | ODBC | | ODBC DSN name | | Required
Sybase | Native | Required | Database name | Required (5001) | Required
Sybase | ODBC | | ODBC DSN name | | Required
Gupta SQL Base | Native | Required | Database name | Required (2155) | Required
Gupta SQL Base | ODBC | | ODBC DSN name | | Required
Dbase III, IV or 5.0 | ODBC | | ODBC DSN name | | Optional
Firebird SQL | Native | Required | Database name | Required (3050) | Required
Firebird SQL | ODBC | | ODBC DSN name | | Required
Hypersonic | Native | Required | Database name | Required (9001) | Required
MaxDB (SAP DB) | Native | Required | Database name | | Required
MaxDB (SAP DB) | ODBC | | ODBC DSN name | | Required
Ingres | Native | Required | Database name | | Required
Ingres | ODBC | | ODBC DSN name | | Required
Borland Interbase | Native | Required | Database name | Required (3050) | Required
Borland Interbase | ODBC | | ODBC DSN name | | Required
ExtenDB | Native | Required | Database name | Required (6453) | Required
ExtenDB | ODBC | | ODBC DSN name | | Required
Teradata | Native | Required | Database name | | Required
Teradata | ODBC | | ODBC DSN name | | Required
Oracle RDB | Native | Required | Database name | | Required
Oracle RDB | ODBC | | ODBC DSN name | | Required
H2 | Native | Required | Database name | | Required
H2 | ODBC | | ODBC DSN name | | Required
Netezza | Native | Required | Database name | Required (5480) | Required
Netezza | ODBC | | ODBC DSN name | | Required
IBM Universe | Native | Required | Database name | | Required
IBM Universe | ODBC | | ODBC DSN name | | Required
SQLite | Native | Required | Database name | | Required
SQLite | ODBC | | ODBC DSN name | | Required
Apache Derby | Native | Optional | Database name | Optional (1527) | Optional
Apache Derby | ODBC | | ODBC DSN name | | Optional
Generic (*) | Native | Required | Database name | Required (Any) | Required
Generic (*) | ODBC | | ODBC DSN name | | Optional

(*) The generic database connection also needs to specify the URL and Driver class in the Generic tab! We now also allow these fields to be specified using a variable. That way you can access data from multiple database types using the same transformations and jobs. Make sure to use clean ANSI SQL that works on all used database types in that case.


5.10. Configuring JNDI connections

If you are developing transformations and jobs that will be deployed on an application server such as the Pentaho platform running on JBoss, you can configure your database connections using JNDI. Because you don't want to have an application server running all the time during development or testing of the transformations, we have supplied a way of configuring a JNDI connection for "local" Kettle use.

To configure, edit the properties file called "simple-jndi/jdbc.properties". For example, to connect to the databases used in the Pentaho Demo platform download, use this information in the properties file:

SampleData/type=javax.sql.DataSource
SampleData/driver=org.hsqldb.jdbcDriver
SampleData/url=jdbc:hsqldb:hsql://localhost/sampledata
SampleData/user=pentaho_user
SampleData/password=password
Quartz/type=javax.sql.DataSource
Quartz/driver=org.hsqldb.jdbcDriver
Quartz/url=jdbc:hsqldb:hsql://localhost/quartz
Quartz/user=pentaho_user
Quartz/password=password
Hibernate/type=javax.sql.DataSource
Hibernate/driver=org.hsqldb.jdbcDriver
Hibernate/url=jdbc:hsqldb:hsql://localhost/hibernate
Hibernate/user=hibuser
Hibernate/password=password
Shark/type=javax.sql.DataSource
Shark/driver=org.hsqldb.jdbcDriver
Shark/url=jdbc:hsqldb:hsql://localhost/shark
Shark/user=sa
Shark/password=


Note: It is important that the information stored in this file in the simple-jndi directory mirrors the content of your application server data sources.
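To check that a jdbc.properties file mirrors your application server data sources, it can help to group its "Name/attribute=value" lines per data source. The sketch below is a hypothetical helper, not a Kettle tool; it assumes one entry per line, "#" for comment lines, and that every data source should carry the five attributes used in the example file.

```python
def parse_jndi_properties(text):
    """Group 'Name/attribute=value' lines by data source name."""
    sources = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")        # split on the first '='
        name, _, attribute = key.partition("/")    # split on the first '/'
        sources.setdefault(name.strip(), {})[attribute.strip()] = value.strip()
    return sources


def missing_attributes(source, required=("type", "driver", "url", "user", "password")):
    """List the attributes a data source entry does not define."""
    return [attribute for attribute in required if attribute not in source]


EXAMPLE = """\
SampleData/type=javax.sql.DataSource
SampleData/driver=org.hsqldb.jdbcDriver
SampleData/url=jdbc:hsqldb:hsql://localhost/sampledata
SampleData/user=pentaho_user
SampleData/password=password
Shark/user=sa
Shark/password=
"""

for name, attributes in parse_jndi_properties(EXAMPLE).items():
    print(name, "is missing:", missing_attributes(attributes))
```

An empty value, such as the blank Shark password, still counts as defined; only attributes that are absent altogether are reported.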

5.11. Unsupported databases

If you want to access a database type that is not yet supported, let us know and we will try to find a solution. A few database types are not supported in this release because of the lack of a sample database and/or software.

Please note that it is usually still possible to read from these databases by using the Generic database driver through an ODBC or JDBC connection.


6. SQL Editor

The Simple SQL Editor dialog

6.1. Description

The Simple SQL Editor is an easy-to-use tool when you need to execute standard SQL commands for tasks like creating tables, dropping indexes and modifying fields. In several places throughout Spoon, the SQL Editor is used to preview and execute DDL (Data Definition Language) generated by Spoon, such as "create/alter table", "create index" and "create sequence" SQL commands. An example of this would be if you added a Table Output step to a transformation and clicked the SQL button at the bottom of the Table Output dialog. Spoon will automatically generate the necessary DDL for the output step to function properly and present that to the end user via the SQL Editor.

Notes:
- Multiple SQL Statements have to be separated by semi-colons (;).
- Before these SQL Statements are sent to the database to be executed, Spoon removes returns, line-feeds and the separating semi-colons.
- Kettle clears the database cache for the database connection on which you launch DDL statements.
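The statement handling described in the notes can be approximated as follows. This is a naive sketch, not Spoon's implementation: the function name is hypothetical, and it assumes that no semi-colon appears inside a string literal or comment.

```python
def split_statements(script):
    """Split a script on semi-colons and clean up each statement.

    Returns and line-feeds (and any other runs of whitespace) are
    replaced by single spaces; empty fragments are dropped.
    """
    statements = []
    for part in script.split(";"):
        cleaned = " ".join(part.split())
        if cleaned:
            statements.append(cleaned)
    return statements


script = "CREATE TABLE t (\n  id INT\n);\r\nCREATE INDEX i ON t (id);\n"
for statement in split_statements(script):
    print(statement)
```

Because the split is purely textual, scripts for stored procedures or triggers, which contain semi-colons of their own, would be cut in the wrong places by this sketch.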

6.2. Limitations

This is a simple SQL editor. It does not know all the dialects of all the more than 20 supported databases. That means that creating stored procedures, triggers and other database specific objects might pose problems. Please consider using the tools that came with the database in that case.


7. Database Explorer

The database explorer dialog

7.1. Description

The Database Explorer provides the ability to explore configured database connections. It currently supports tables, views and synonyms along with the catalog and/or schema to which the table belongs.

The buttons to the right provide quick access to the following features for the selected table:

Feature | Description
Preview first 100 rows of... | Returns the first 100 rows from the selected table
Preview first ... rows of... | Prompts the user for the number of rows to return from the selected table
Number of rows... | Displays the number of rows in the selected table
Show Layout of... | Displays a list of column names, data types, etc. from the selected table
Generate DDL | Generates the DDL to create the selected table based on the current connection type
Generate DDL for other connection | Prompts the user for another connection, then generates the DDL to create the selected table based on the user selected connection type.
Open SQL for... | Launches the Simple SQL Editor for the selected table
Truncate table... | Generates a TRUNCATE table statement for the current table. Note: The statement is commented out by default to prevent the user from accidentally deleting the table data.


8. Hops

Editing a Transformation Hop

Editing a Job Hop

8.1. Description

A hop connects one transformation step or job entry with another. The direction of the data flow is indicated with an arrow on the graphical view pane. A hop can be enabled or disabled (for testing purposes, for example).

8.1.1. Transformation Hops

When a hop is disabled in a transformation, the steps downstream of the disabled hop are cut off from any data flowing upstream of the disabled hop. This may lead to unexpected results when editing the downstream steps. For example, if a particular step-type offers a "Get Fields" button, clicking the button may not reveal any of the incoming fields as long as the hop is still disabled.

8.1.2. Job Hops

Besides the execution order, a job hop also specifies the condition on which the next job entry will be executed. You can specify the evaluation mode by right-clicking on the job hop:

- "Unconditional" specifies that the next job entry will be executed regardless of the result of the originating job entry.
- "Follow when result is true" specifies that the next job entry will only be executed when the result of the originating job entry was true, meaning successful execution, file found, table found, without error, evaluation was true, ...
- "Follow when result is false" specifies that the next job entry will only be executed when the result of the originating job entry was false, meaning unsuccessful execution, file not found, table not found, error(s) occurred, evaluation was false, ...
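The three evaluation modes can be sketched as a small decision function. This is an illustration only: the constant and function names are hypothetical and do not correspond to Kettle classes.

```python
UNCONDITIONAL = "unconditional"
FOLLOW_WHEN_TRUE = "follow when result is true"
FOLLOW_WHEN_FALSE = "follow when result is false"


def should_follow(mode, previous_result):
    """Decide whether a job hop is followed, given the originating entry's result."""
    if mode == UNCONDITIONAL:
        return True
    if mode == FOLLOW_WHEN_TRUE:
        return bool(previous_result)
    if mode == FOLLOW_WHEN_FALSE:
        return not previous_result
    raise ValueError("unknown evaluation mode: %r" % (mode,))


def next_entries(hops, previous_result):
    """Return the target entries of all hops that are followed.

    hops is a list of (mode, target entry name) pairs leaving one job entry.
    """
    return [target for mode, target in hops if should_follow(mode, previous_result)]


hops = [
    (UNCONDITIONAL, "write log"),
    (FOLLOW_WHEN_TRUE, "load warehouse"),
    (FOLLOW_WHEN_FALSE, "mail error"),
]
print(next_entries(hops, True))
print(next_entries(hops, False))
```

The unconditional hop is followed in both cases, while exactly one of the two conditional hops is followed depending on the result.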

8.2. Creating A Hop

You can easily create a new hop between 2 steps by one of the following options:

- Dragging on the Graphical View between 2 steps while using the middle mouse button.
- Dragging on the Graphical View between 2 steps while pressing the SHIFT key and using the left mouse button.
- Selecting two steps in the tree, clicking right and selecting 'new hop'.
- Selecting two steps in the graphical view (CTRL + left mouse click), right-clicking on a step and selecting 'new hop'.

Splitting A Hop

You can easily insert a new step into a hop between two steps by dragging the step (in the Graphical View) over a hop until the hop becomes drawn in bold. Release the left button and you will be asked if you want to split the hop. This works only with steps that have not yet been connected to another step.

8.3. Loops

Loops are not allowed in transformations because Spoon depends heavily on the previous steps to determine the field values that are passed from one step to another. If we allowed loops in transformations, we would often get endless loops and undetermined results.

Loops are allowed in jobs because Spoon executes job entries sequentially. Just make sure you don't build endless loops. This job entry can help you exit closed loops based on the number of times a job entry was executed.

8.4. Mixing rows: trap detector

Mixing rows with different layout is not allowed in a transformation. Mixing row layouts will cause steps to fail because fields can not be found where expected or the data type changes unexpectedly. The "trap detector" is in place to provide warnings at design time when a step is receiving mixed layouts:

In this case, the full error report reads:

We detected rows with varying number of fields, this is not allowed in a transformation. The first row contained 13 fields, another one contained 16 : [customer_tk=0, version=0, date_from=, date_to=, CUSTOMERNR=0, NAME=, FIRSTNAME=, LANGUAGE=, GENDER=, STREET=, HOUSNR=, BUSNR=, ZIPCODE=, LOCATION=, COUNTRY=, DATE_OF_BIRTH=]

Note: this is only a warning and will not prevent you from performing the task you want to do.
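The kind of check the trap detector performs can be sketched as follows. This is a simplified illustration, not Kettle's implementation: the function name is hypothetical and the sketch only compares the number of fields, not field names or data types.

```python
def detect_mixed_layouts(rows):
    """Return a warning text if rows differ in their number of fields, else None.

    rows is an iterable of row layouts, each given as a list of field names.
    """
    iterator = iter(rows)
    try:
        first = next(iterator)
    except StopIteration:
        return None  # no rows at all, nothing to compare
    for row in iterator:
        if len(row) != len(first):
            return (
                "We detected rows with varying number of fields. "
                "The first row contained %d fields, another one contained %d : %s"
                % (len(first), len(row), row)
            )
    return None


layout_a = ["customer_tk", "version", "NAME"]
layout_b = ["customer_tk", "version", "NAME", "ZIPCODE"]
print(detect_mixed_layouts([layout_a, layout_a, layout_b]))
```

As in Spoon, the result is only a warning text; the caller decides whether to continue.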

8.5. Transformation hop colors

Transformation hops display in a variety of colors based on the properties and state of the hop. The following table describes the meaning behind a transformation hop's color:

Hop Color | Meaning
Green | Distribute rows: if multiple hops are leaving a step, rows of data will be evenly distributed to all target steps.
Red | Copies rows: if multiple hops are leaving a step, all rows of data will be copied to all target steps.
Yellow | Provides info for step, distributes rows
Magenta | Provides info for step, copies rows
Gray | The hop is disabled.
Black | The hop has a named target step.
Blue | Candidate hop using middle button + drag
Orange (Dot line) | The hop is never used because no data will ever go there.
Red (Bold Dot line) | The hop is used for carrying rows that caused errors in source step(s).
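The difference between distributing rows (green hops) and copying rows (red hops) can be sketched as follows. This is an illustration that assumes a simple round-robin order for distribution; the function names are hypothetical.

```python
def distribute(rows, targets):
    """Round-robin: row 0 goes to the first target, row 1 to the second, and so on."""
    received = {target: [] for target in targets}
    for index, row in enumerate(rows):
        received[targets[index % len(targets)]].append(row)
    return received


def copy_rows(rows, targets):
    """Every target step receives its own copy of all rows."""
    rows = list(rows)
    return {target: list(rows) for target in targets}


targets = ["database lookup 1", "database lookup 2", "database lookup 3"]
print(distribute([1, 2, 3, 4], targets))
print(copy_rows([1, 2, 3, 4], targets))
```

With distribution each row is processed once overall, so the targets share the work; with copying each target sees the full data set.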


9. Variables

9.1. Variable usage

Variables can be used throughout Pentaho Data Integration, including within transformation steps and job entries. Variables can be defined by setting them with the "Set Variable" step in a transformation or by setting them in the kettle.properties file in the directory:

$HOME/.kettle (Unix/Linux/OSX)
C:\Documents and Settings\<username>\.kettle\ (Windows)

The way to use them is either by grabbing them using the Get Variable step or by specifying meta-data strings like:

${VARIABLE} or %%VARIABLE%%

Both formats can be used and even mixed; the first is a UNIX derivative, the second is derived from Microsoft Windows. Dialogs that support variable usage throughout Pentaho Data Integration are visually indicated using a red dollar sign like this:

You can use the CTRL-SPACE hotkey to select a variable to be inserted into the property value. Mouse over the variable icon to see a shortcut help text displayed.
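How the two variable notations can be resolved in a meta-data string is sketched below. This is not Kettle's implementation: the function name is hypothetical, and leaving unknown variables untouched is an assumption of this sketch.

```python
import re

# Matches ${NAME} (UNIX style) or %%NAME%% (Windows style).
VARIABLE_PATTERN = re.compile(r"\$\{([^}]+)\}|%%([^%]+)%%")


def substitute(text, variables):
    """Replace both variable notations in text using the given dictionary."""
    def replace(match):
        name = match.group(1) or match.group(2)
        # Assumption: a variable that is not defined is left as written.
        return variables.get(name, match.group(0))
    return VARIABLE_PATTERN.sub(replace, text)


variables = {"DB_HOST": "localhost", "DB_PORT": "3306", "DB_NAME": "sales"}
print(substitute("jdbc:mysql://${DB_HOST}:%%DB_PORT%%/${DB_NAME}", variables))
```

The example mixes both notations in one string, which the text says is allowed.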

9.2. Variable scope

The scope of a variable is defined by the place in which it is defined.

9.2.1. Environment variables

The first usage (and only usage in previous Kettle versions) was to set an environment variable. This was traditionally done by passing options to the Java Virtual Machine (JVM) with the -D option. It's also an easy way to specify the location of temporary files in a platform independent way, for example using variable ${java.io.tmpdir}. This variable points to directory /tmp on Unix/Linux/OSX and to C:\Documents and Settings\<username>\Local Settings\Temp on Windows machines.

The only problem with using environment variables is that the usage is not dynamic in nature, and problems arise if you try to use them in a dynamic way. That is because if two or more transformations or jobs run at the same time on an application server (for example the Pentaho platform), you would get conflicts. Changes to the environment variables are visible to all software running on the virtual machine.


9.2.2. Kettle variables

Because the scope of an environment variable (9.2.1. Environment variables) is too broad, Kettle variables were introduced to provide a way to define variables that are local to the job in which the variable is set. The "Set Variable" step in a transformation allows you to specify in which job you want to set the variable's scope (i.e. parent job, grand-parent job or the root job).

9.2.3. Internal variables

The following variables are always defined:

Variable Name | Sample value
Internal.Kettle.Build.Date | 2007/05/22 18:01:39
Internal.Kettle.Build.Version | 2045
Internal.Kettle.Version | 2.5.0

These variables are defined in a transformation:

Variable Name | Sample value
Internal.Transformation.Filename.Directory | D:\Kettle\samples
Internal.Transformation.Filename.Name | Denormaliser - 2 series of key-value pairs.ktr
Internal.Transformation.Name | Denormaliser - 2 series of key-value pairs sample
Internal.Transformation.Repository.Directory | /

These are the internal variables that are defined in a Job:

Variable Name | Sample value
Internal.Job.Filename.Directory | /home/matt/jobs
Internal.Job.Filename.Name | Nested jobs.kjb
Internal.Job.Name | Nested job test case
Internal.Job.Repository.Directory | /

These variables are defined in a transformation running on a slave server, executed in clustered mode:

Variable Name | Sample value
Internal.Slave.Transformation.Number | 0..<cluster size-1> (0,1,2,3 or 4)
Internal.Cluster.Size | <cluster size> (5)


10. Transformation Settings

Transformation Settings

10.1. Description

Transformation Settings are a collection of properties to describe the transformation and configure its behavior. Access Transformation Settings from the main menu under Transformation|Settings. The following sections provide a detailed description of the available settings.

10.2. Transformation Tab

The Transformation tab allows you to specify general properties about the transformation including:

Setting | Description
Transformation name | The name of the transformation. Required information if you want to save to a repository
Description | Short description of the transformation, shown in the repository explorer
Extended description | Long extended description of the transformation
Status | Draft or production status
Version | Version description
Directory | The directory in the repository where the transformation is stored
Created by | Displays the original creator of the transformation.
Created at | Displays the date and time when the transformation was created.
Last modified by | Displays the user name of the last user that modified the transformation.
Last modified at | Displays the date and time when the transformation was last modified.

10.3. Logging

The Logging tab allows you to configure how and where logging information is captured. Settings include:

Setting | Description
READ log step | Use the number of read lines from this step to write to the log table. Read means: read from source steps.
INPUT log step | Use the number of input lines from this step to write to the log table. Input means: input from file or database.
WRITE log step | Use the number of written lines from this step to write to the log table. Written means: written to target steps.
OUTPUT log step | Use the number of output lines from this step to write to the log table. Output means: output to file or database.
UPDATE log step | Use the number of updated lines from this step to write to the log table. Update means: updated in a database.
REJECTED log step | Use the number of rejected lines from this step to write to the log table. Rejected means: error record.
Log connection | The connection used to write to a log table.
Log table | Specifies the name of the log table (for example L_ETL)
Use Batch-ID? | Enable this if you want to have a batch ID in the L_ETL file. Disable for backward compatibility with Spoon/Pan version < 2.0.
Use logfield to store logging in | This option stores the logging text in a CLOB field in the logging table. This allows you to have the logging text together with the run results in the same table. Disable for backward compatibility with Spoon/Pan version < 2.1

10.4. Dates

The Dates tab allows you to configure the following date related settings:

Setting | Description
Maxdate connection | Get the upper limit for a date range on this connection.
Maxdate table | Get the upper limit for a date range in this table.
Maxdate field | Get the upper limit for a date range in this field.
Maxdate offset | Increases the upper date limit with this amount. Use this, for example, if you find that the field DATE_LAST_UPD has a maximum value of 2004-05-29 23:00:00, but you know that the values for the last minute are not complete. In this case, simply set the offset to -60.
Maximum date difference | Sets the maximum date difference in the obtained date range. This will allow you to limit job sizes.
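The effect of a Maxdate offset can be sketched with plain date arithmetic. The sketch assumes the offset is expressed in seconds (consistent with an offset of -60 covering "the last minute") and uses a hypothetical function name.

```python
from datetime import datetime, timedelta


def upper_date_limit(max_date, offset_seconds):
    """Shift the maximum date found in the Maxdate field by the offset.

    Assumption: the offset is a number of seconds; a negative value
    moves the upper limit back in time.
    """
    return max_date + timedelta(seconds=offset_seconds)


max_date = datetime(2004, 5, 29, 23, 0, 0)   # maximum DATE_LAST_UPD in the example
print(upper_date_limit(max_date, -60))
```

With the offset from the example, rows changed during the last, possibly incomplete, minute fall outside the date range and are picked up by a later run.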

10.5. Dependencies

The Dependencies tab allows you to enter all of the dependencies for the transformation. For example, if a dimension depends on 3 lookup tables, we have to make sure that these lookup tables have not changed. If the values in these lookup tables have changed, we need to extend the date range to force a full refresh of the dimension. The dependencies allow you to look up whether a table has changed in case you have a "data last changed" column in the table. The 'Get dependencies' button will try to automatically detect dependencies.

10.6. Miscellaneous

The Miscellaneous tab allows you to configure the following settings:

Setting | Description
Number of rows in rowsets | This option allows you to change the size of the buffers between the connected steps in a transformation. You will rarely/never need to change this parameter. Only when you run low on memory might it be an option to lower this parameter.
Show a feedback row in transformation steps? | This controls whether or not to add a feedback entry into the log file while the transformation is being executed. By default, this feature is enabled and configured to display a feedback record every 5000 rows.
The feedback size | Sets the number of rows to process before entering a feedback entry into the log. Set this higher when processing large amounts of data to reduce the amount of information in the log file.
Use unique connections | This allows you to open one unique connection per defined and used database connection in the transformation. Checking this option is required in order to allow a failing transformation to be rolled back completely.
Shared objects file | Specifies the location of the XML file used to store shared objects like database connections, clustering schemas and more.
Manage thread priorities? | Allows you to enable or disable the internal logic for changing the Java thread priorities based on the number of input and output rows in the respective "rowset" buffers. This can be useful in some simplistic situations where the cost of using the logic exceeds the benefit of the thread prioritization.

10.7. Partitioning
The Partitioning tab provides a list of available database partitions. You can create a new partition by clicking on the "New" button. The "Get Partitions" button will retrieve a list of available partitions that have been defined for the connection.

10.8. SQL Button

Click the SQL button at the bottom of the Transformation properties dialog to generate the SQL needed to create the logging table. The DDL will display in the Simple SQL Editor, allowing you to execute this or any other SQL statement(s) against the logging connection.


11. Transformation Steps

11.1. Description
A step is one part of a transformation. Steps can provide you with a wide range of functionality, ranging from reading text files to implementing slowly changing dimensions. This chapter describes various step settings, followed by a detailed description of available step types.

11.2. Launching several copies of a step

Sometimes it can be useful to launch the same step several times. For example, for performance reasons it can be useful to launch a database lookup step 3 times or more. That is because database connections usually have a certain latency. Launching the same step several times keeps the database busy on different connections, effectively lowering the latency. You can launch several copies of a step in a transformation simply by right-clicking on a step in the graphical view and then selecting "change number of copies to start...":

The "Step copies" popup menu

You will get this dialog:

The step copies dialog


If you enter 3, this will be shown:

Multiple step copies example

It is the technical equivalent of this:

Multiple step copies equivalent


11.3. Distribute or copy?
In the example above, green lines are shown between the steps. This indicates that rows are distributed among the target steps. In this case, it means that the first row coming from step "A" goes to step "database lookup 1", the second to "database lookup 2", the third to "database lookup 3", the fourth back to "database lookup 1", etc. However, if we right click on step "A" and select "Copy data", you will get the hops drawn in red:

"Copy data" means that all rows from step "A" are copied to all 3 target steps. In this case it means that step "B" gets 3 copies of all the rows that "A" has sent out.

NOTE: Because all these steps run as different threads, the order in which the single rows arrive at step "B" is probably not going to be the same as the order in which they left step "A".
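The difference between the two modes can be sketched in a few lines of Python. This is only an illustration of the row routing described above, not Kettle's implementation; the function names `distribute` and `copy_rows` are made up for the example.

```python
from itertools import cycle

def distribute(rows, targets):
    """Round-robin: each row goes to exactly one target step, in turn."""
    out = {target: [] for target in targets}
    for row, target in zip(rows, cycle(targets)):
        out[target].append(row)
    return out

def copy_rows(rows, targets):
    """Copy: every target step receives every row."""
    return {target: list(rows) for target in targets}

rows = ["row 1", "row 2", "row 3", "row 4"]
targets = ["database lookup 1", "database lookup 2", "database lookup 3"]

distributed = distribute(rows, targets)
copied = copy_rows(rows, targets)

print(distributed["database lookup 1"])  # first and fourth row
print(len(copied["database lookup 2"]))  # all four rows
```

With distribution, the downstream step sees each row once; with copying, it sees each row once per target step.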


11.4. Step error handling

Step error handling settings

Step error handling allows you to configure a step so that, instead of halting the transformation when an error occurs, it passes the rows that caused the error to a different step. To configure error handling, right click on the step and select "Define Error handling...". In the example below, we artificially generate an error in the Script Values step when an ID is higher than 5.

To configure the error handling, you can right click on the step involved and select the "Error handling..." menu item:


NOTE: this menu item only appears when clicking on steps that support the new error handling code.

As you can see, you can add extra fields to the "error rows":

This way, we can easily define new data flows in our transformations. The typical use case for this is an alternative way of doing an Upsert (Insert/Update):


This transformation performs an insert regardless of the content of the table. If you put a primary key on the ID (in this case the customer ID), inserting a row that already exists causes an error. Because of the error handling, we can pass the rows in error to the update step. Preliminary tests have shown this strategy of doing upserts to be 3 times faster in certain situations (with a low ratio of updates to inserts).
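The same insert-first strategy can be sketched outside of Kettle. The example below is a minimal illustration using Python's built-in sqlite3 module; the `customer` table, its columns and the `upsert` helper are assumptions made for the example, and the speed-up quoted above is not something this sketch measures.

```python
import sqlite3

def upsert(conn, rows):
    """Try to insert every row; rows that violate the primary key are updated instead."""
    inserted = updated = 0
    for customer_id, name in rows:
        try:
            conn.execute(
                "INSERT INTO customer (id, name) VALUES (?, ?)", (customer_id, name)
            )
            inserted += 1
        except sqlite3.IntegrityError:
            # The "error row" is routed to the update branch.
            conn.execute(
                "UPDATE customer SET name = ? WHERE id = ?", (name, customer_id)
            )
            updated += 1
    conn.commit()
    return inserted, updated

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customer VALUES (1, 'old name')")

result = upsert(conn, [(1, "new name"), (2, "second customer")])
names = dict(conn.execute("SELECT id, name FROM customer"))
print(result, names)
```

The approach pays off when most rows are new, because the update statement only runs for the rows that fail to insert.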


11.5. Apache Virtual File System (VFS) support

Kettle provides support for the Apache Virtual File System (VFS) as an additional way to reference source files, transformations and jobs from any location you like. For more information about VFS, visit Apache Commons Virtual File System.

11.5.1. Example: Referencing remote job files

Here is a simple example of using VFS to reference the location of a job file we want to execute using Kitchen:

sh kitchen.sh -file:http://www.kettle.be/GenerateRows.kjb

To open this job using VFS from within Spoon, select File | Open file from URL:

Enter the URL 'http://www.kettle.be/GenerateRows.kjb' and click the OK button to load the job in Spoon:

The transformation we are about to launch is also located on the web server. The internal variable for the job name directory is:

Internal.Job.Filename.Directory

http://www.kettle.be/


This allows us to reference the transformation as follows:

Note: You will not be able to save the job back to the web server in this example. That is not because we do not support it, but because you don't have permission to do so.

For more information on the almost endless list of possibilities with VFS, please visit: http://jakarta.apache.org/commons/vfs/filesystems.html. Examples include direct loading from zip-files, gz-files, jar-files, ram drives, SMB, (s)ftp, (s)http, etc. We will extend this list even further in the near future with our own drivers for the Pentaho solutions repository and later on for the Kettle repository (something like: psr:// and pdi:// URIs).

11.5.2. Example: Referencing files inside a Zip

The example above illustrates the ability to use a wildcard to directly select files inside of a zip file.

Apache VFS support was implemented in all steps and job entries that are part of the Pentaho Data Integration suite, as well as in the recent Pentaho platform code and in Pentaho Analysis (Mondrian).


11.6. Transformation Step Types

11.6.1. Text File Input

Text file input dialog

11.6.1.1. General description

The Text File Input step is used to read data from a variety of different text-file types. The most commonly used formats include Comma Separated Values (CSV files) generated by spreadsheets and fixed width flat files.

The Text File Input step provides the ability to specify a list of files to read, or a list of directories with wildcards in the form of regular expressions. In addition, you can accept filenames from a previous step, making filename handling even more generic. The following sections describe in detail the available options for configuring the Text file input step.

11.6.1.2. File options

The table below provides a detailed description of the features available on the File tab:

Option | Description
File or directory | This field specifies the location and/or name of the input text file. NOTE: press the "add" button to add the file/directory/wildcard combination to the list of selected files (grid) below.
Regular expression | Specify the regular expression you want to use to select the files in the directory specified in the previous option.
Selected Files | This table contains a list of selected files (or wildcard selections) along with a property specifying if the file is required or not. If a file is required and it isn't found, an error is generated. Otherwise, the filename is simply skipped.
Show filename(s)... | Displays a list of all files that will be loaded based on the currently selected file definitions.
Show file content | Displays the content of the selected file.
Show content from first data line | Displays the content from the first data line only for the selected file.

11.6.1.2.1. Selecting Files to read data from


The file tab (shown above) is where you identify the file or files from which you want to read data. To specify a file:

1. Enter the location of the file in the 'File or directory' field or click the Browse button to browse the local file system.
2. Click the 'Add' button to add a file to the list of 'selected files' like this:

Adding entries to the list of files

11.6.1.2.2. Selecting file using Regular Expressions


You can also have this step search for files by specifying a wildcard in the form of a regular expression. Regular expressions are more sophisticated than simply using '*' and '?' wildcards. Here are a few examples of regular expressions:

Filename | Regular Expression | Files selected
/dirA/ | .*userdata.*\.txt | All files in /dirA/ with names containing userdata and ending on .txt
/dirB/ | AAA.* | All files in /dirB/ with names starting with AAA
/dirC/ | [A-Z][0-9].* | All files in /dirC/ with names starting with a capital and followed by a digit (A0-Z9)
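The three expressions from the table can be tried out with a short script. This is a hedged sketch in Python rather than Java: it assumes the whole file name has to match the expression (hence `re.fullmatch`), and the file names and the `select_files` helper are invented for the example. Java and Python regular expressions agree on the simple syntax used here, but differ in some advanced features.

```python
import re

def select_files(filenames, pattern):
    """Return the names that match the regular expression completely."""
    regex = re.compile(pattern)
    return [name for name in filenames if regex.fullmatch(name)]

files = [
    "userdata_2007.txt",
    "old_userdata.txt.bak",
    "AAA_export.csv",
    "B7_report.txt",
    "b7_report.txt",
]

userdata_txt = select_files(files, r".*userdata.*\.txt")
starts_with_aaa = select_files(files, r"AAA.*")
capital_then_digit = select_files(files, r"[A-Z][0-9].*")

print(userdata_txt, starts_with_aaa, capital_then_digit)
```

Note the escaped dot in `\.txt`: an unescaped dot matches any character, so `.txt` alone would also accept a name ending in `_txt`.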

11.6.1.2.3. Accept filenames from previous step

Accepting filenames from previous steps


This option allows even more flexibility in combination with other steps like "Get Filenames". You can construct your filename and pass it to this step. This way the filename can come from any source: text file, database table, etc.

Option | Description
Accept filenames from previous steps | This enables the option to get filenames from previous steps.
Step to read filenames from | The step to read the filenames from.
Field in the input to use as filename | Text File Input will look in this field to determine the filenames to use.


11.6.1.3. Content specification

The content tab allows you to specify the format of the text files that are being read. Here is a list of the options on this tab:

Option | Description
File type | This can be either CSV or Fixed length. Based on this selection, Spoon will launch a different helper GUI when you press the "get fields" button in the last "fields" tab.
Separator | One or more characters that separate the fields in a single line of text. Typically this is ; or a tab.
Enclosure | Some fields can be enclosed by a pair of strings to allow separator characters in fields. The enclosure string is optional. Repeating the enclosure inside a field produces a literal enclosure character: with ' as the enclosure string, the text line 'Not the nine o''clock news.' gets parsed as Not the nine o'clock news.
Allow breaks in enclosed fields? | This is an experimental feature which is currently disabled. Note: This functionality is implemented and available in the CSV Input step.
Escape | Specify an escape character (or characters) if you have escaped characters in your data. If you have \ as an escape character, the text 'Not the nine o\'clock news.' (with ' as the enclosure) will get parsed as Not the nine o'clock news.
Header & number of header lines | Enable this option if your text file has a header row (first lines in the file). You can specify the number of header lines.
Footer & number of footer lines | Enable this option if your text file has a footer row (last lines in the file). You can specify the number of footer lines.
Wrapped lines & number of wraps | Use this if you deal with data lines that have wrapped beyond a certain page limit. Note that headers & footers are never considered wrapped.
Paged layout & page size & doc header | You can use these options as a last resort when dealing with texts meant for printing on a line printer. Use the number of document header lines to skip introductory texts and the number of lines per page to position the data lines.
Compression | Enable this option if your text file is placed in a Zip or GZip archive. NOTE: At the moment, only the first file in the archive is read.
No empty rows | Don't send empty rows to the next steps.
Include filename in output | Enable this if you want the filename to be part of the output.
Filename field name | The name of the field that contains the filename.
Rownum in output? | Enable this if you want the row number to be part of the output.
Row number field name | The name of the field that contains the row number.
Rownum by file? | Allows the row number to be reset per file.


Format | This can be either DOS, UNIX or mixed. UNIX files have lines that are terminated by line feeds. DOS files have lines separated by carriage returns and line feeds. If you specify mixed, no verification is done.
Encoding | Specify the text file encoding to use. Leave blank to use the default encoding on your system. To use Unicode specify UTF-8 or UTF-16. On first use, Spoon will search your system for available encodings.
Limit | Sets the number of lines that is read from the file. 0 means: read all lines.
Be lenient when parsing dates? | Disable this option if you want strict parsing of date fields. In case lenient parsing is enabled, dates like Jan 32nd will become Feb 1st.
The date format Locale | This locale is used to parse dates that have been written in full like "February 2nd, 2006". Parsing this date on a system running in the French (fr_FR) locale would not work because February is called Février in that locale.
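The Enclosure and Escape options in the table above can be illustrated with Python's standard csv module, which offers comparable settings. This is a sketch of the parsing rules only, not of how the Text File Input step is implemented; the `parse_line` helper and the sample lines are assumptions for the example.

```python
import csv

def parse_line(line, separator=";", enclosure="'", escape=None):
    """Split one text line into fields, honouring enclosure and escape settings."""
    reader = csv.reader(
        [line],
        delimiter=separator,
        quotechar=enclosure,
        escapechar=escape,
        # Without an escape character, a doubled enclosure stands for one literal.
        doublequote=escape is None,
    )
    return next(reader)

doubled = parse_line("'Not the nine o''clock news.';42")
escaped = parse_line("'Not the nine o\\'clock news.';42", escape="\\")
separator_inside = parse_line("'a;b';c")

print(doubled, escaped, separator_inside)
```

Both spellings of the enclosed text yield the same field value, and a separator inside an enclosed field does not split the field.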


11.6.1.4. Error handling

The error handling tab was added to allow you to specify how this step should react when errors occur. The table below describes the options available for error handling:

Option | Description
Ignore errors? | Check this option if you want to ignore errors during parsing.
Skip error lines | Enable this option if you want to skip those lines that contain errors. Note that you can generate an extra file that will contain the line numbers on which the errors occurred. If lines with errors are not skipped, the fields that did have parsing errors will be empty (null).
Error count field name | Add a field to the output stream rows. This field will contain the number of errors on the line.
Error fields field name | Add a field to the output stream rows. This field will contain the field names on which an error occurred.
Error text field name | Add a field to the output stream rows. This field will contain the descriptions of the parsing errors that have occurred.
Warnings file directory | When warnings are generated, they will be put in this directory. The name of that file will be <warning dir>/filename.<date_time>.<warning extension>
Error files directory | When errors occur, they will be put in this directory. The name of that file will be <errorfile_dir>/filename.<date_time>.<errorfile_extension>
Failing line numbers files directory | When a parsing error occurs on a line, the line number will be put in this directory. The name of that file will be <errorline dir>/filename.<date_time>.<errorline extension>


11.6.1.5. Filters
The filters tab provides the ability to specify the lines you want to skip in the text file.

Specifying text file filters

The table below describes the available options for defining filters:

Option | Description
Filter string | The string to look for.
Filter position | The position where the filter string has to be in the line. 0 is the first position in the line. If you specify a value below 0 here, the filter string is searched for in the entire string.
Stop on filter | Specify Y here if you want to stop processing the current text file when the filter string is encountered.
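The three filter options can be read as the following logic. This is a minimal sketch under stated assumptions: the `apply_filters` function is hypothetical, and it assumes that a line matching a "stop" filter is itself discarded, which the guide does not spell out.

```python
def apply_filters(lines, filters):
    """Skip lines that match a filter; filters are (string, position, stop) tuples."""
    kept = []
    for line in lines:
        matched = False
        for text, position, stop in filters:
            if position < 0:
                hit = text in line                     # search the entire line
            else:
                hit = line.startswith(text, position)  # must sit at that position
            if hit:
                if stop:
                    return kept                        # stop processing this file
                matched = True
                break
        if not matched:
            kept.append(line)
    return kept

lines = ["# comment", "a;1", "b;# not a comment", "END", "c;3"]
filters = [("#", 0, False), ("END", -1, True)]

result = apply_filters(lines, filters)
print(result)
```

The second data line survives because its "#" is not at position 0, and nothing after the "END" line is read.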


11.6.1.6. Fields
The fields tab is where you specify the information about the name and format of the fields being read from the text file. Available options include:

Option | Description
Name | Name of the field.
Type | Type of the field; can be either String, Date or Number.
Format | See Number Formats for a complete description of format symbols.
Length | For Number: total number of significant figures in a number; for String: total length of string; for Date: length of printed output of the string (e.g. 4 only gives back the year).
Precision | For Number: number of floating point digits; for String, Date, Boolean: unused.
Currency | Used to interpret numbers like $10,000.00 or €5.000,00.
Decimal | A decimal point can be a "." (10,000.00) or "," (5.000,00).
Grouping | A grouping can be a "," (10,000.00) or "." (5.000,00).
Null if | Treat this value as NULL.
Default | The default value in case the field in the text file was not specified (empty).
Trim type | Trim this field (left, right, both) before processing.
Repeat | Y/N: if the corresponding value in this row is empty, repeat the one from the last time it was not empty.

11.6.1.6.1. Number Formats


The information on Number formats was taken from the Sun Java API documentation, to be found here: http://java.sun.com/j2se/1.4.2/docs/api/java/text/DecimalFormat.html

Symbol | Location | Localized | Meaning
0 | Number | Yes | Digit
# | Number | Yes | Digit, zero shows as absent
. | Number | Yes | Decimal separator or monetary decimal separator
- | Number | Yes | Minus sign
, | Number | Yes | Grouping separator
E | Number | Yes | Separates mantissa and exponent in scientific notation. Need not be quoted in prefix or suffix.
; | Sub pattern boundary | Yes | Separates positive and negative sub patterns
% | Prefix or suffix | Yes | Multiply by 100 and show as percentage
\u2030 | Prefix or suffix | Yes | Multiply by 1000 and show as per mille
¤ (\u00A4) | Prefix or suffix | No | Currency sign, replaced by currency symbol. If doubled, replaced by international currency symbol. If present in a pattern, the monetary decimal separator is used instead of the decimal separator.
' | Prefix or suffix | No | Used to quote special characters in a prefix or suffix, for example, "'#'#" formats 123 to "#123". To create a single quote itself, use two in a row: "# o''clock".


Scientific Notation: In a pattern, the exponent character immediately followed by one or more digit characters indicates scientific notation. Example: "0.###E0" formats the number 1234 as "1.234E3".

11.6.1.6.2. Date formats

The information on Date formats was taken from the Sun Java API documentation, to be found here: http://java.sun.com/j2se/1.4.2/docs/api/java/text/SimpleDateFormat.html

Letter | Date or Time Component | Presentation | Examples
G | Era designator | Text | AD
y | Year | Year | 1996; 96
M | Month in year | Month | July; Jul; 07
w | Week in year | Number | 27
W | Week in month | Number | 2
D | Day in year | Number | 189
d | Day in month | Number | 10
F | Day of week in month | Number | 2
E | Day in week | Text | Tuesday; Tue
a | Am/pm marker | Text | PM
H | Hour in day (0-23) | Number | 0
k | Hour in day (1-24) | Number | 24
K | Hour in am/pm (0-11) | Number | 0
h | Hour in am/pm (1-12) | Number | 12
m | Minute in hour | Number | 30
s | Second in minute | Number | 55
S | Millisecond | Number | 978
z | Time zone | General time zone | Pacific Standard Time; PST; GMT-08:00
Z | Time zone | RFC 822 time zone | -0800

11.6.1.7. Extras

Function/Button | Description
Show filenames | This option shows a list of all the files selected. Please note that if the transformation is to be run on a separate server, the result might be incorrect.
Show file content | The "View" button shows the first lines of the text file. Make sure that the file format is correct. When in doubt, try both DOS and UNIX formats.
Show content from first data line | This button helps you in positioning the data lines in complex text files with multiple header lines, etc.
Get fields | This button allows you to guess the layout of the file. In case of a CSV file, this is done pretty much automatically. When you have selected a file with fixed length fields, you need to specify the field boundaries using a wizard.
Preview rows | Press this button to preview the rows generated by this step.


11.6.2. Table input

Table Input

11.6.2.1. General description

This step is used to read information from a database, using a connection and SQL. Basic SQL statements are generated automatically.

11.6.2.2. Options

Option | Description
Step name | Name of the step. This name has to be unique in a single transformation.
Connection | The database connection used to read data from.
SQL | The SQL statement used to read information from the database connection. You can also click the 'Get SQL select statement...' button to browse tables and automatically generate a basic select statement.
Enable lazy conversion | Lazy conversion will avoid unnecessary data type conversions and can result in a significant performance improvement. Check to enable.
Replace variables in script? | Enable this to replace variables in the script. This feature was provided to allow you to test with or without performing variable substitutions.
Insert data from step | Specify the input step name where we can expect information to come from. This information can then be inserted into the SQL statement. The locations where we insert information are indicated by ? (question marks).
Execute for each row? | Enable this option to perform the data insert for each individual row.
Limit size | Sets the number of lines that is read from the database. 0 means: read all lines.


11.6.2.3. Example

Consider for example the following SQL statement:

SELECT * FROM customers WHERE changed_date BETWEEN ? AND ?

This statement needs 2 dates that are read from the "Insert data from" step.

NOTE: The dates can be provided using the "Get System Info" step type. For example, if you want to read all customers that have had their data changed yesterday, you might do it like this:

The "read customer data" step looks like this:


And the "get date range for yesterday" step looks like this:
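The way the two question marks are filled in can be reproduced with any database API that supports positional parameters. The sketch below uses Python's sqlite3 module purely as an illustration; the sample rows, the text storage of dates and the `yesterday_range` helper are assumptions for the example, with the helper playing the role of the "Get System Info" step.

```python
import sqlite3
from datetime import datetime, time, timedelta

def yesterday_range(now):
    """Return (yesterday 00:00:00, yesterday 23:59:59) relative to `now`."""
    day = now.date() - timedelta(days=1)
    return datetime.combine(day, time(0, 0, 0)), datetime.combine(day, time(23, 59, 59))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, changed_date TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [
        (1, "2007-10-24 18:30:00"),
        (2, "2007-10-25 09:15:00"),
        (3, "2007-10-26 00:00:01"),
    ],
)

now = datetime(2007, 10, 26, 12, 0, 0)  # fixed "current" time for a repeatable run
start, end = yesterday_range(now)
params = (start.strftime("%Y-%m-%d %H:%M:%S"), end.strftime("%Y-%m-%d %H:%M:%S"))

# The two values replace the two ? locators, in order.
rows = conn.execute(
    "SELECT * FROM customers WHERE changed_date BETWEEN ? AND ?", params
).fetchall()
print(rows)
```

Only the customer changed on the previous day is returned; the values are bound in the order in which they arrive from the feeding step.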

11.6.2.4. Extras

Function/Button | Description
Preview | This option previews this step. It is done by previewing a new transformation with 2 steps: this one and a Dummy step. You can see the detailed logging of that execution by clicking on the logs button in the preview window.


11.6.3. Get System Info

11.6.3.1. General description

This step retrieves information from the Kettle environment. Available information to retrieve includes:

Item | Description
system date (variable) | System time, changes every time you ask a date.
system date (fixed) | System time, determined at the start of the transformation.
start date range (Transformation) | Start of date range, based upon information in the ETL log table. See also Transformation Settings.
end date range (Transformation) | End of date range, based upon information in the ETL log table. See also Transformation Settings.
start date range (Job) | Start of date range, based upon information in the ETL log table. See also Transformation Settings.
end date range (Job) | End of date range, based upon information in the ETL log table. See also Transformation Settings.
Yesterday 00:00:00 | Start of yesterday.
Yesterday 23:59:59 | End of yesterday.
Today 00:00:00 | Start of today.
Today 23:59:59 | End of today.
Tomorrow 00:00:00 | Start of tomorrow.
Tomorrow 23:59:59 | End of tomorrow.
First day of last month 00:00:00 | Start of last month.
Last day of last month 23:59:59 | End of last month.
First day of this month 00:00:00 | Start of this month.
Last day of this month 23:59:59 | End of this month.
First day of next month 00:00:00 | Start of next month.
Last day of next month 23:59:59 | End of next month.
copy of step | Copy nr of the step. See also Launching several copies of a step.
transformation name | Name of the transformation.
transformation file name | File name of the transformation (XML only).
User that modified the transformation last |
Date when the transformation was modified last |
transformation batch ID | ID_BATCH value in the logging table, see Transformation Settings.
Hostname | Returns the hostname of the server.
IP address | Returns the IP address of the server.
command line argument 1 | Argument 1 on the command line.
command line argument 2 | Argument 2 on the command line.
command line argument 3 | Argument 3 on the command line.
command line argument 4 | Argument 4 on the command line.
command line argument 5 | Argument 5 on the command line.
command line argument 6 | Argument 6 on the command line.
command line argument 7 | Argument 7 on the command line.
command line argument 8 | Argument 8 on the command line.
command line argument 9 | Argument 9 on the command line.
command line argument 10 | Argument 10 on the command line.
Kettle version | Returns the Kettle version (e.g. 2.5.0).
Kettle Build Version | Returns the build version of the core Kettle library (e.g. 13).
Kettle Build Date | Returns the build date of the core Kettle library.
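The month-based date ranges in the table can be derived with simple calendar arithmetic. The following Python sketch shows one way to compute them; the helper names `shift_month` and `month_range` are made up for the example, and the sketch makes no claim about how Kettle computes these values internally.

```python
import calendar
from datetime import datetime

def shift_month(year, month, delta):
    """Move (year, month) forward or backward by `delta` months."""
    index = year * 12 + (month - 1) + delta
    return index // 12, index % 12 + 1

def month_range(year, month):
    """Return (first day 00:00:00, last day 23:59:59) of the given month."""
    last_day = calendar.monthrange(year, month)[1]
    return (
        datetime(year, month, 1, 0, 0, 0),
        datetime(year, month, last_day, 23, 59, 59),
    )

now = datetime(2008, 1, 15, 10, 30, 0)  # fixed reference time for a repeatable run

last_month = month_range(*shift_month(now.year, now.month, -1))
this_month = month_range(now.year, now.month)
next_month = month_range(*shift_month(now.year, now.month, 1))

print(last_month, this_month, next_month)
```

Working on a combined month index keeps the year rollover (December to January) and leap years out of the calling code.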

11.6.3.2. Options
The following table describes the options for configuring the Get System Info step:

Option | Description
Step Name | Name of the step. This name has to be unique in a single transformation.
Fields | The fields to output.


11.6.3.3. Usage
The first type of usage is to simply get information from the system:

From version 2.3.0 on, this step also accepts input rows. The selected values will be added to the rows found in the input stream(s):


11.6.4. Generate Rows

Generate Rows

11.6.4.1. General description

This step type outputs a number of rows, empty by default but optionally containing a number of static fields.

11.6.4.2. Options

Option | Description
Step Name | Name of the step. This name has to be unique in a single transformation.
Limit | Sets the maximum number of rows you want to generate.
Fields | This table is where you configure the structure and (optionally) the values of the rows you are generating.


11.6.5. De-serialize from file (formerly Cube Input)

De-serialize from file

11.6.5.1. General description

Read rows of data from a binary Kettle cube file.

NOTE: This step should only be used to read short-lived data. It is not guaranteed that the file format stays the same between versions of Pentaho Data Integration.

11.6.5.2. Options

Option | Description
Step Name | Name of the step. This name has to be unique in a single transformation.
Filename | The name of the Kettle cube file to read.
Limit Size | Allows you to optionally limit the number of rows read from the cube file. A value of '0' indicates no size limit.


11.6.6. XBase input

XBase input

11.6.6.1. General description

With this step it is possible to read data from most types of DBF file derivatives, called the XBase family (dBase III/IV, FoxPro, Clipper, ...).

11.6.6.2. Options
The following options are available for the XBase input step:

Option | Description
Step Name | Name of the step. This name has to be unique in a single transformation.
Filename | The name of the DBF file to read data from.
Limit Size | Allows you to optionally limit the number of rows read.
Accept filenames | Allows you to read in filenames from a previous step in the transformation.
Add rownr? | Adds a field to the output with the specified name that contains the row number.
Include filename in output? | Optionally allows you to insert a field containing the filename onto the stream.
Character-set name to use | Specifies the character set (e.g. ASCII, UTF-8) to use.
Preview | Click this button to preview the rows that will be read.


11.6.7. Excel input

Excel input

11.6.7.1. General description

This step provides the ability to read data from one or more Excel files. The following sections describe each of the available features for configuring the Excel input step.

11.6.7.2. Files Tab

The files tab is where you define the location of the Excel files you wish to read from. Available options include:

Option | Description
Step Name | Name of the step. This name has to be unique in a single transformation.
File or directory | This field specifies the location and/or name of the input file. NOTE: press the "add" button to add the file/directory/wildcard combination to the list of selected files (grid) below.
Regular expression | Specify the regular expression you want to use to select the files in the directory specified in the previous option.
Selected Files | This table contains a list of selected files (or wildcard selections) along with a property specifying if the file is required or not. If a file is required and it isn't found, an error is generated. Otherwise, the filename is simply skipped.
Accept filenames from previous steps | Allows you to read in filenames from a previous step in the transformation.
Show filename(s)... | Displays a list of all files that will be loaded based on the currently selected file definitions.
Preview rows | Click to preview the contents of the specified Excel file.

11.6.7.3. Sheets
The "List of sheets to read" table displays the currently selected sheets to read from. Use the 'Get sheetname(s)' button to fill in the available sheets automatically.

Note: You also need to specify the start row and column for each selected sheet. This determines the coordinates for where the step should start reading.

11.6.7.4. Content
The content tab allows you to configure the following properties:

Option | Description
Header | Check if the sheets specified have a header row that we need to skip.
No empty rows | Check this if you don't want empty rows in the output of this step.
Stop on empty row | This will make the step stop reading the current sheet of a file when an empty line is encountered.
Filename field | Specify a field name to include the filename in the output of this step.
Sheetname field | Specify a field name to include the sheetname in the output of this step.
Sheet row nr field | Specify a field name to include the sheet row number in the output of the step. The sheet row number is the actual row number in the Excel sheet.
Row nr written field | Specify a field name to include the row number in the output of the step. "Row number written" is the number of rows processed, starting at 1 and counting up regardless of sheets and files.
Limit | Limit the number of rows to this number; 0 means all rows.
Encoding | Specify the character encoding (e.g. UTF-8, ASCII).

11.6.7.5. Error handling

The Error handling tab allows you to configure the following properties:

Option | Description
Strict types? | Enable this option if you want to fail immediately upon reading an unexpected field type. When disabled, Kettle will attempt to convert incoming fields to the requested data type.
Ignore errors? | Check this option if you want to ignore errors during parsing.
Skip error lines? | Enable this option if you want to skip those lines that contain errors. Note: you can generate an extra file that will contain the line numbers on which the errors occurred. If lines with errors are not skipped, the fields that did have parsing errors will be empty (null).
Warnings file directory | When warnings are generated, they will be put in this directory. The name of that file will be <warning dir>/filename.<date_time>.<warning extension>
Error files directory | When errors occur, they will be put in this directory. The name of that file will be <errorfile_dir>/filename.<date_time>.<errorfile_extension>
Failing line numbers files directory | When a parsing error occurs on a line, the line number will be put in this directory. The name of that file will be <errorline dir>/filename.<date_time>.<errorline extension>

11.6.7.6. Fields
The fields tab is for specifying the fields that need to be read from the Excel files. A button "Get fields from header row" is provided to automatically fill in the available fields if the sheets have a header row. For a given field, the 'Type' column is provided for performing data type conversions. For example, if you want to read a Date and you have a String value in the Excel file, you can specify the conversion mask.

Note: in the case of Number to Date conversion (example: 20051028 --> October 28th, 2005) you should simply specify the conversion mask yyyyMMdd, because there will be an implicit Number to String conversion taking place before doing the String to Date conversion.
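
The two-stage conversion described in the note can be sketched as follows. This is an illustration in Python, not Kettle code: the function name is hypothetical, and the strptime pattern %Y%m%d is assumed to play the role of the yyyyMMdd mask.

```python
from datetime import date, datetime


def number_to_date(value: int) -> date:
    """Convert a number such as 20051028 to a date via an intermediate string."""
    text = str(value)  # the implicit Number to String conversion
    # String to Date conversion; %Y%m%d stands in for the yyyyMMdd mask
    return datetime.strptime(text, "%Y%m%d").date()


if __name__ == "__main__":
    print(number_to_date(20051028))
```

A number that does not form a valid date under the mask (for example 20051340) makes strptime raise a ValueError, which is the kind of parsing error the Error handling tab is meant to deal with.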


11.6.8. XML Input

Icon

XML Input dialog

11.6.8.1. General description


This step allows you to read information stored in XML files. The following sections describe the interface for defining the filenames you want to read from, the repeating part of the data in the XML file and the fields to retrieve.

Note: You specify the fields by the path to the Element or Attribute and by entering conversion masks, data types and other meta-data.

11.6.8.2. File Tab


The files tab is where you define the location of the XML files you wish to read from. Available options include:

Step name - Name of the step. This name has to be unique in a single transformation.
File or directory - This field specifies the location and/or name of the input file. NOTE: press the "add" button to add the file/directory/wildcard combination to the list of selected files (grid) below.
Regular expression - Specify the regular expression you want to use to select the files in the directory specified in the previous option.
Selected files - This table contains a list of selected files (or wildcard selections) along with a property specifying if the file is required or not. If a file is required and it isn't found, an error is generated. Otherwise, the filename is simply skipped.
Show filename(s) - This option shows a list of the files that will be read. NOTE: This is a simulation and sometimes depends on the number of rows in each file, etc.


11.6.8.3. Content
The content tab contains the following options for describing the content being read:

Include filename in output & fieldname - Check this option if you want to have the name of the XML file to which the row belongs in the output stream. You can specify the name of the field where the filename will end up in.
Rownum in output & fieldname - Check this option if you want to have a row number (starts at 1) in the output stream. You can specify the name of the field where the integer will end up in.
Limit - You can specify the maximum number of rows to read here.
Nr of header rows to skip - Specify the number of rows to skip, from the start of an XML document, before starting to process.
Location - Specify the path by way of elements to the repeating part of the XML file. For example, if you are reading rows from this XML file:

  <Rows>
    <Row>
      <Field1>...</Field1>
      ...
    </Row>
    ...
  </Rows>

  Then you set the location to Rows, Row.

Note: you can also set the root (Rows) as a repeating element location. The output will then contain 1 (one) row.
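
The effect of the Location setting can be sketched outside of Spoon. The Python snippet below is only an illustration of the idea, not the step's implementation: read_rows and the sample document are hypothetical, and the location is passed as a list of element names starting at the root.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document with the same shape as the example above.
SAMPLE = """<Rows>
  <Row><Field1>a</Field1><Field2>1</Field2></Row>
  <Row><Field1>b</Field1><Field2>2</Field2></Row>
</Rows>"""


def read_rows(xml_text, location):
    """Return one dict per repeating element found at `location`.

    `location` lists element names from the root down, e.g. ["Rows", "Row"].
    """
    root = ET.fromstring(xml_text)
    if root.tag != location[0]:
        raise ValueError("root element does not match the location")
    if len(location) == 1:
        elements = [root]  # the root itself is the repeating element
    else:
        elements = root.findall("/".join(location[1:]))
    # Each child element of a repeating element becomes a field of the row.
    return [{child.tag: child.text for child in element} for element in elements]


if __name__ == "__main__":
    print(read_rows(SAMPLE, ["Rows", "Row"]))
    print(len(read_rows(SAMPLE, ["Rows"])))
```

With the location Rows, Row the sketch yields one row per Row element; with only the root as the location it yields a single row, matching the note above.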

11.6.8.4. Fields
The fields tab is where you define properties for the location and format of the fields being read from the XML document. The table below describes each of the options for configuring the field properties:

Name - The name of the field.
Type - Type of the field can be either String, Date or Number.
Format - The format mask to convert with. See Number Formats for a complete description of format specifiers.
Length - The length option depends on the field type as follows:
  Number - Total number of significant figures in a number
  String - total length of string
  Date - length of printed output of the string (e.g. 4 only gives back year)
Precision - The precision option depends on the field type as follows:
  Number - Number of floating point digits
  String - unused
  Date - unused
Currency - Symbol used to represent currencies like $10,000.00 or €5.000,00
Decimal - A decimal point can be a "." (10,000.00) or "," (5.000,00)
Group - A grouping can be a "," (10,000.00) or "." (5.000,00)
Trim type - The trimming method to apply on the string found in the XML.
Repeat - Check this if you want to repeat empty values with the corresponding value from the previous row.
Position - The position of the XML element or attribute. You use the following syntax to specify the position of an element, for example:
  The first element called "element": E=element/1
  The first attribute called "attribute": A=attribute/1
  The first attribute called "attribute" in the second "element" tag: E=element/2, A=attribute/1

NOTE: You can auto-generate all the possible positions in the XML file supplied by using the "Get Fields" button.

NOTE: Support was added for XML documents where all the information is stored in the Repeating (or Root) element. The special R= locator was added to allow you to grab this information. The "Get fields" button finds this information if it's present.


11.6.9. Get File Names

Icon

Get File Names dialog

11.6.9.1. General description


This step allows you to get information regarding filenames on the file system. The retrieved filenames are added as rows onto the stream. The output fields for this step are:

filename - the complete filename, including the path (/tmp/kettle/somefile.txt)
short_filename - only the filename, without the path (somefile.txt)
path - only the path (/tmp/kettle/)
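
As a rough illustration of how the three output fields relate to each other, the following Python sketch splits a complete filename into the same parts. The function name describe_file is hypothetical, and POSIX-style paths are assumed, as in the examples above.

```python
import posixpath


def describe_file(filename):
    """Split a complete filename into the three fields listed above."""
    path, short_filename = posixpath.split(filename)
    if not path.endswith("/"):
        path += "/"  # the guide shows the path with a trailing slash
    return {
        "filename": filename,
        "short_filename": short_filename,
        "path": path,
    }


if __name__ == "__main__":
    print(describe_file("/tmp/kettle/somefile.txt"))
```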

11.6.9.1.1. File tab


This tab defines the location of the files you want to retrieve filenames for. For more information about specifying file locations, see Selecting Files to read data from.

11.6.9.1.2. Filters
The filters tab allows you to filter the retrieved filenames based on:

All files and folders
Files only
Folders only


11.6.10. Text File Output

Icon

Text file output dialog

11.6.10.1. General Description


The Text file output step is used to export data to text file format. This is commonly used to generate Comma Separated Values (CSV) files that can be read by spreadsheet applications.

11.6.10.2. File Tab


The File tab is where you define basic properties about the file being created, such as:

Step name - Name of the step. Note: This name has to be unique in a single transformation.
Filename - This field specifies the filename and location of the output text file.
Run this as a command instead? - Check this to "pipe" the results into the command or script you specify.
Extension - Adds a point and the extension to the end of the filename (.txt).
Include stepnr in filename - If you run the step in multiple copies (see Launching several copies of a step), the copy number is included in the filename, before the extension (_0).
Include partition nr in filename? - Includes the data partition number in the filename.
Include date in filename - Includes the system date in the filename (_20041231).
Include time in filename - Includes the system time in the filename (_235959).
Show filename(s) - This option shows a list of the files that will be generated. NOTE: This is a simulation and sometimes depends on the number of rows in each file, etc.

11.6.10.3. Content
The content tab contains the following options for describing the content being written:

Append - Check this to append lines to the end of the specified file.
Separator - Specify the character that separates the fields in a single line of text. Typically this is ; or a tab.
Enclosure - A pair of strings can enclose some fields. This allows separator characters in fields. The enclosure string is optional.
Force the enclosure around fields? - This option forces all field names to be enclosed with the character specified in the Enclosure property above.
Header - Enable this option if you want the text file to have a header row (first line in the file).
Footer - Enable this option if you want the text file to have a footer row (last line in the file).
Format - This can be either DOS or UNIX. UNIX files have lines separated by linefeeds. DOS files have lines separated by carriage returns and line feeds.
Encoding - Specify the text file encoding to use. Leave blank to use the default encoding on your system. To use Unicode, specify UTF-8 or UTF-16. On first use, Spoon will search your system for available encodings.
Compression - Allows you to specify the type of compression, .zip or .gzip, to use when compressing the output. NOTE: At the moment, only one file is placed in a single archive.
Right pad fields - Add spaces to the end of the fields (or remove characters at the end) until they have the specified length.
Fast data dump (no formatting) - Improves the performance when dumping large amounts of data to a text file by not including any formatting information.
Split every ... rows - If this number N is larger than zero, split the resulting text file into multiple parts of N rows.
Add Ending line of file - Allows you to specify an alternate ending row to the output file.

11.6.10.4. Fields
The fields tab is where you define properties for the fields being exported. The table below describes each of the options for configuring the field properties:

Name - The name of the field.
Type - Type of the field can be either String, Date or Number.
Format - The format mask to convert with. See Number Formats for a complete description of format symbols.
Length - The length option depends on the field type as follows:
  Number - Total number of significant figures in a number
  String - total length of string
  Date - length of printed output of the string (e.g. 4 only gives back year)
Precision - The precision option depends on the field type as follows:
  Number - Number of floating point digits
  String - unused
  Date - unused
Currency - Symbol used to represent currencies like $10,000.00 or €5.000,00
Decimal - A decimal point can be a "." (10,000.00) or "," (5.000,00)
Group - A grouping can be a "," (10,000.00) or "." (5.000,00)
Trim type - The trimming method to apply on the string.
Null - If the value of the field is null, insert this string into the text file.
Get fields - Click to retrieve the list of fields from the input stream(s).
Minimal width - Alter the options in the fields tab in such a way that the resulting width of lines in the text file is minimal. So instead of saving 0000001, we write 1, etc. String fields will no longer be padded to their specified length.


11.6.11. Table output

Icon

Table output dialog

11.6.11.1. General description


This step type allows you to load data into a database table.

11.6.11.2. Options
The table below describes the available options for the Table output step:

Step name - Name of the step. This name has to be unique in a single transformation.
Connection - The database connection used to write data to.
Target Schema - The name of the Schema for the table to write data to. This is important for data sources that allow for table names with dots '.' in them.
Target table - The name of the table to write data to.
Commit size - Use transactions to insert rows in the database table. Commit the connection every N rows if N is larger than 0. Otherwise, don't use transactions (slower). NOTE: Transactions are not supported on all database platforms.
Truncate table - Select this if you want the table to be truncated before the first row is inserted into the table.
Ignore insert errors - Makes Kettle ignore all insert errors such as violated primary keys. A maximum of 20 warnings will be logged however. This option is not available for batch inserts.
Use batch update for inserts - Enable this option if you want to use batch inserts. This feature groups insert statements to limit round trips to the database. This is the fastest option and is enabled by default.
Partition data over tables - Use this option to split the data over multiple tables. For example, instead of inserting all data into table SALES, put the data into tables SALES_200510, SALES_200511, SALES_200512, ... Use this on systems that don't have partitioned tables and/or don't allow inserts into UNION ALL views or the master of inherited tables. The view SALES allows you to report on the complete sales:

  CREATE OR REPLACE VIEW SALES AS
  SELECT * FROM SALES_200501
  UNION ALL
  SELECT * FROM SALES_200502
  UNION ALL
  SELECT * FROM SALES_200503
  UNION ALL
  SELECT * FROM SALES_20050…

Is the name of the table defined in a field - Use these options to split the data over one or more tables. The name of the target table is defined in the field you specify. For example, if you store customer data in the field gender, the data might end up in tables M and F (Male and Female). There is an option to exclude the field containing the table name from being inserted into the tables.
Return auto-generated key - Check this if you want to get back the key that was generated by inserting a row into the table.
Name of auto-generated key field - Specify the name of the new field in the output rows that will contain the auto-generated key.
SQL - Generate the SQL to create the output table automatically.
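
The two ways of spreading rows over several tables can be pictured with a small sketch. It is written in Python for illustration only and makes assumptions the guide does not state: that a date value decides the monthly table, and that the helper names monthly_table and split_by_field are free to choose.

```python
from collections import defaultdict
from datetime import date


def monthly_table(base, when):
    """Build a per-month table name such as SALES_200510 (assumed naming)."""
    return f"{base}_{when:%Y%m}"


def split_by_field(rows, field, keep_field=False):
    """Group rows per target table, where the table name is read from `field`.

    By default the field holding the table name is left out of the rows,
    mirroring the option to exclude it from the insert.
    """
    tables = defaultdict(list)
    for row in rows:
        target = row[field]
        if keep_field:
            out = dict(row)
        else:
            out = {name: value for name, value in row.items() if name != field}
        tables[target].append(out)
    return dict(tables)


if __name__ == "__main__":
    print(monthly_table("SALES", date(2005, 10, 28)))
    customers = [{"name": "Ann", "gender": "F"}, {"name": "Bob", "gender": "M"}]
    print(split_by_field(customers, "gender"))
```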


11.6.12. Insert / Update

Icon

Insert/Update dialog

11.6.12.1. General description


This step type first looks up a row in a table using one or more lookup keys. If the row can't be found, it inserts the row. If it can be found and the fields to update are the same, nothing is done. If they are not all the same, the row in the table is updated.
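
The three possible outcomes (insert, no change, update) can be sketched with Python's built-in sqlite3 module. This is a minimal illustration of the logic described above, not Kettle's implementation; the function name, the equality-only key comparison and the example table are assumptions, and at least one non-key field is expected in each row.

```python
import sqlite3


def insert_update(conn, table, keys, row):
    """Look up `row` by its key fields, then insert, update or do nothing.

    Returns "insert", "update" or "none" to show which action was taken.
    """
    where = " AND ".join(f"{key} = ?" for key in keys)
    key_values = [row[key] for key in keys]
    update_fields = [name for name in row if name not in keys]

    found = conn.execute(
        f"SELECT {', '.join(update_fields)} FROM {table} WHERE {where}",
        key_values,
    ).fetchone()

    if found is None:  # the row can't be found: insert it
        columns = ", ".join(row)
        marks = ", ".join("?" for _ in row)
        conn.execute(
            f"INSERT INTO {table} ({columns}) VALUES ({marks})", list(row.values())
        )
        return "insert"

    new_values = [row[name] for name in update_fields]
    if list(found) == new_values:  # all fields are the same: nothing is done
        return "none"

    assignments = ", ".join(f"{name} = ?" for name in update_fields)
    conn.execute(
        f"UPDATE {table} SET {assignments} WHERE {where}", new_values + key_values
    )
    return "update"


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE customer (id INTEGER, name TEXT)")
    for name in ("Ann", "Ann", "Anna"):
        print(insert_update(connection, "customer", ["id"], {"id": 1, "name": name}))
```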

11.6.12.2. Options
The table below provides a description of available options for Insert/Update:

Step name - Name of the step. Note: This name has to be unique in a single transformation.
Connection - The database connection used to write data to.
Target schema - The name of the Schema for the table to write data to. This is important for data sources that allow for table names with dots '.' in them.
Target table - Name of the table in which you want to do the insert or update.
Commit size - The number of rows to change (insert / update) before running a commit.
Don't perform any updates - If this option is checked, the values in the database are never updated. Only inserts are done.
Key Lookup table - Here you specify a list of field values and comparators. You can use the following comparators: =, <>, <, <=, >, >=, LIKE, BETWEEN, IS NULL, IS NOT NULL. Note: Click the 'Get fields' button to retrieve a list of fields from the input stream(s).
Update Fields - Specify all fields in the table you want to insert/update, including the keys. Please note that you can avoid updates on certain fields by specifying N in the update column. Note: Click the 'Get update fields' button to retrieve a list of update fields from the input stream(s).
SQL button - Click the 'SQL' button to generate the SQL to create the table and indexes for correct operation.


11.6.13. Update

Icon

Update dialog

11.6.13.1. General description


This step is the same as the Insert / Update step except that no insert is ever done in the database table; ONLY updates are performed.


11.6.14. Delete

Icon

Delete dialog

11.6.14.1. General description


This step is the same as the Update step except that instead of updating, rows are deleted.


11.6.15. Serialize to file (formerly Cube File Output)

Icon

Serialize to file dialog

11.6.15.1. General description


This step stores rows of data in a binary form in a file. It has the advantage over a text (flat) file that the content does not have to be parsed when read back. This is because the meta-data is stored in the cube file as well.

NOTE: This step should probably not be used: the tool that could use these Cube files was never created, and the cube files are not using any "standard" format.


11.6.16. XML Output

Icon

XML output dialog

11.6.16.1. General description


This step allows you to write rows from any source to one or more XML files.

11.6.16.2. File Tab


The file tab is where you set general properties for the XML output file format:

Step name - Name of the step. This name has to be unique in a single transformation.
Filename - This field specifies the filename and location of the output file.
Extension - Adds a point and the extension to the end of the filename (.xml).
Include stepnr in filename - If you run the step in multiple copies (see also Launching Several Copies of a step), the copy number is included in the filename, before the extension (_0).
Include date in filename - Includes the system date in the filename (_20041231).
Include time in filename - Includes the system time in the filename (_235959).

11.6.16.3. Content
Zipped - Check this if you want the XML file to be stored in a ZIP archive.
Encoding - The encoding to use. This encoding is specified in the header of the XML file.
Parent XML element - The name of the root element in the XML document.
Row XML element - The name of the row element to use in the XML document.
Split every ... rows - The maximum number of rows of data to put in a single XML file before another is created.

11.6.16.4. Fields
Fieldname - The name of the field.
Elementname - The name of the element in the XML file to use.
Type - Type of the field can be either String, Date, or Number.
Format - Format mask to convert with; see Number formats for a complete description of format specifiers.
Length - The length option depends on the field type as follows:
  Number - Total number of significant figures in a number
  String - total length of string
  Date - length of printed output of the string (e.g. 4 only gives back year)
  Note: the output string is padded to this length if it is specified.
Precision - The precision option depends on the field type as follows:
  Number - Number of floating point digits
  String - unused
  Date - unused
Currency - Symbol used to represent currencies like $10,000.00 or €5.000,00
Decimal - A decimal point can be a "." (10,000.00) or "," (5.000,00)
Group - A grouping can be a "," (10,000.00) or "." (5.000,00)
Null - If the value of the field is null, insert this string into the text file.
Get fields - Click to retrieve the list of fields from the input stream(s).
Minimal width - Alter the options in the fields tab in such a way that the resulting width of lines in the text file is minimal. So instead of saving 0000001, we write 1, etc. String fields will no longer be padded to their specified length.


11.6.17. Excel Output

Icon

Excel output dialog

11.6.17.1. General description


With this step you can write data to one or more Excel files. The following sections describe the features available for configuring the Excel output step.

11.6.17.2. File Tab


The file tab is where you configure the filename of the Excel output step. Available options include:

Step name - Name of the step. This name has to be unique in a single transformation.
Filename - This field specifies the filename and location of the output file.
Extension - Adds a point and the extension to the end of the filename (.xls).
Include stepnr in filename - If you run the step in multiple copies (see also Launching Several copies of a step), the copy number is included in the filename, before the extension (_0).
Include date in filename - Includes the system date in the filename (_20041231).
Include time in filename - Includes the system time in the filename (_235959).
Show filename(s) - This option shows a list of the files that will be generated. NOTE: This is a simulation and sometimes depends on the number of rows in each file, etc.

11.6.17.3. Content
The content tab provides additional options for the generated Excel output file including:

Header - Check if the spreadsheet needs a header above the exported rows of data.
Footer - Check if the spreadsheet needs a footer below the exported rows of data.
Encoding - Specify the encoding of the spreadsheet; leave empty to keep the default for the platform.
Split every ... rows - Splits the data over several output files (each in its own spreadsheet).
Sheet name - Specify the name of the Sheet to write to.
Protect sheet? - Check to enable password protection on the target sheet.
Password - Specify the password for the protected sheet. This is an experimental feature that requires testing.
Use Template - Check this to use a template when outputting data to Excel.
Excel Template - The name of the template used to format the Excel output file.
Append to Excel template - Check this option to have the output appended to the Excel template specified.

11.6.17.4. Fields
The fields tab is where you specify the Name, data type and format of the fields being written to Excel. The 'Get Fields' button will retrieve a list of available fields from the input stream(s) coming into the step. The 'Minimal Width' button will automatically alter the options in the fields tab in such a way that the resulting width of lines in the text file is minimal. So instead of saving 0000001, we write 1, etc. String fields will no longer be padded to their specified length.

Note: You can specify any format definitions available in Excel. These formats are not tied to any Kettle specific formatting.


11.6.18. Microsoft Access Output

Icon

Microsoft Access output dialog

11.6.18.1. General Description


This step allows you to create a new Access database file as an output in a transformation.

11.6.18.2. Options
The following options are available for configuring the Access output:

Step name - Name of the step. Note: This name has to be unique in a single transformation.
The database filename - The filename of the database file you are connecting to.
Create database - Check this to generate a new Access database file.
Target table - Specify the table you want to output data to.
Create table - Check this to create a new table in the Access database.
Commit size - Defines the commit size when outputting data.


11.6.19. Database lookup

Icon

Database Value Lookup dialog

11.6.19.1. General description


This step type allows you to look up values in a database table. Lookup values are added as new fields onto the stream.

11.6.19.2. Options
The following table describes the available options for configuring the database lookup:

Step name - Name of the step. Note: This name has to be unique in a single transformation.
Connection - Select the database connection for the lookup.
Lookup schema - Specify the target schema to use for the lookup.
Lookup Table - The name of the table where we do the lookup.
Enable cache? - This option caches database lookups. This means that we expect the database to return the same value all the time for a certain lookup value.
Cache size in rows - Specify the size in rows of the cache to use.
Keys to look up table - Specify the keys necessary to perform the lookup.
Values to return table - Select the fields from the lookup table to add to the output stream.
Do not pass the row if the lookup fails - Check to avoid passing a row when the lookup fails.
Fail on multiple results? - Check this option to cause the step to fail if the lookup returns multiple results.
Order by - The order by field allows you to specify a field and order type (ascending/descending) for how the data is retrieved.
Get Fields - Click to return a list of available fields from the input stream(s) of the step.
Get lookup fields - Click to return a list of available fields from the lookup table to add to the step's output stream.

IMPORTANT NOTE: if other processes are changing values in the table where you do the lookup, it might be unwise to cache values. However, in all other cases, enabling this option can seriously increase the performance because database lookups are relatively slow. If you find that you can't use the cache, consider launching several copies of this step at the same time. This will keep the database busy via different connections. To see how to do this, please see Launching several copies of a step.
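
Why caching helps, and why it is risky when the table changes, can be seen in a small sketch. The Python class below is hypothetical and is not how Kettle implements its cache; in particular, dropping the oldest entry once the cache size is exceeded is an assumption.

```python
from collections import OrderedDict


class CachedLookup:
    """Wrap a slow lookup function with a size-limited cache."""

    def __init__(self, fetch, cache_size):
        self.fetch = fetch            # function that queries the database
        self.cache_size = cache_size  # maximum number of cached rows
        self.cache = OrderedDict()
        self.db_calls = 0             # counts the real (slow) lookups

    def lookup(self, key):
        if key in self.cache:
            return self.cache[key]    # served from memory, no round trip
        self.db_calls += 1
        value = self.fetch(key)
        self.cache[key] = value
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)  # assumed policy: drop the oldest entry
        return value


if __name__ == "__main__":
    table = {1: "Ann", 2: "Bob"}      # stands in for the lookup table
    cached = CachedLookup(table.get, cache_size=10)
    cached.lookup(1)
    table[1] = "Anna"                 # another process changes the table
    print(cached.lookup(1), cached.db_calls)
```

The second lookup in the example still returns the old value, which is the situation the note above warns about.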


11.6.20. Stream lookup

Icon

Stream Lookup dialog

11.6.20.1. General description


This step type allows you to look up data using information coming from other steps in the transformation. The data coming from the 'Source step' is first read into memory and is then used to look up data from the main stream.

For example, this transformation adds information coming from a text file (B) to data coming from a database table (A):

The fact that we use information from B to do the lookups is indicated by the option 'Source step' (see below):

11.6.20.2. Options
The table below describes the features available for configuring the stream lookup:

Step name - Name of the step. Note: This name has to be unique in a single transformation.
Lookup step - This is the name of the step where the lookup data is coming from.
The keys to lookup... - Allows you to specify the names of the fields that are used to look up values. Values are always searched using the "equal" comparison.
Fields to retrieve - You can specify the names of the fields to retrieve here, as well as the default value in case the value was not found, or a new fieldname in case you didn't like the old one.
Preserve memory - This will encode rows of data to preserve memory while sorting.
Key and value are exactly one integer field - This will also preserve memory while executing a sort.
Use sorted list - Check this to store values using a sorted list. This provides better memory usage when working with data sets containing wide rows.
Get fields - This will automatically fill in the names of all the available fields on the source side (A). You can then delete all the fields you don't want to use for lookup.
Get lookup fields - This will automatically insert the names of all the available fields on the lookup side (B). You can then delete the fields you don't want to retrieve.
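
The behaviour described in this section (lookup data read into memory first, "equal" comparison on the keys, a default value when nothing is found) can be sketched as follows. This is a simplified Python illustration with hypothetical names and a single key field, not the step's actual code.

```python
def stream_lookup(main_rows, lookup_rows, key, retrieve, default=None):
    """Add the `retrieve` fields from `lookup_rows` to each row of `main_rows`."""
    # The lookup side (B) is first read into memory, indexed by the key field.
    in_memory = {row[key]: row for row in lookup_rows}
    for row in main_rows:
        match = in_memory.get(row[key])  # "equal" comparison on the key
        out = dict(row)
        for field in retrieve:
            out[field] = match[field] if match is not None else default
        yield out


if __name__ == "__main__":
    a = [{"country": "BE", "amount": 10}, {"country": "XX", "amount": 5}]
    b = [{"country": "BE", "name": "Belgium"}]
    for result in stream_lookup(a, b, "country", ["name"], default="unknown"):
        print(result)
```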


11.6.21. Call DB Procedure

Icon

Call DB Procedure dialog

11.6.21.1. General description


This step type allows you to execute a database procedure (or function) and get the result(s) back.

11.6.21.2. Options
The following table describes the available options for the Call DB Procedure step:

Step name - Name of the step. Note: This name has to be unique in a single transformation.
Connection - Name of the database connection on which the procedure resides.
Proc-name - Name of the procedure or function to call.
Find it button - Click to search on the specified database connection for available procedures and functions (at the moment only on Oracle and SQLServer).
Enable auto commit - In some situations you want to do updates in the database using the specified procedure. In that case you can either have the changes done using auto-commit or by disabling this. If auto-commit is disabled, a single commit is performed after the last row was received by this step.
Result name - Name of the result of the function call; leave this empty in case of a procedure.
Result type - Type of the result of the function call. Not used in case of a procedure.
Parameters - List of parameters that the procedure or function needs:
  Field name - Name of the field.
  Direction - Can be either IN (input only), OUT (output only) or INOUT (value is changed on the database).
  Type - Used for output parameters so that Kettle knows what comes back.
Get Fields - This function fills in all the fields in the input streams to make your life easier. Simply delete the lines you don't need and re-order the ones you do need.


11.6.22. HTTP Client

Icon

HTTP Client dialog

11.6.22.1. General Description


The HTTP client step performs a very simple call to a base URL with options appended to it like this: http://<URL>?param1=value1&param2=value2&.. The result is stored in a String field with the specified name.
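
How the parameter pairs end up on the URL can be sketched with Python's standard urllib.parse module. The function name build_url and the example.com address are hypothetical, and the sketch only builds the URL; it does not perform the call.

```python
from urllib.parse import urlencode


def build_url(base_url, params):
    """Append name-value pairs to a base URL as a query string."""
    # urlencode also escapes characters that are not allowed in a URL
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    pairs = [("param1", "value1"), ("param2", "value2")]
    print(build_url("http://example.com/service", pairs))
```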

11.6.22.2. Options
The following table describes the options available for the HTTP client step:

Step name - Name of the step. Note: This name has to be unique in a single transformation.
URL - The base URL string.
Result fieldname - The name of the field to store results.
Parameters - This section is where you define the parameter name-value pairs to pass on the URL.


11.6.23. Select Values

Icon

Select Values dialog

11.6.23.1. General description


The Select values step is useful for selecting, renaming and configuring the length and precision of the fields on the stream. These operations are organized into different categories:

Select & Alter - Specify the exact order and name in which the fields have to be placed in the output rows
Remove - Specify the fields that have to be removed from the output rows
Meta-data - Change the name, type, length and precision (the meta-data) of one or more fields

11.6.23.2. Select & Alter


The Select & Alter tab provides the following options:

Step name - Name of the step. Note: This name has to be unique in a single transformation.
Fields - This tab allows you to rename a field and specify the length and precision.
Get fields to select - Click to insert fields from all input streams to the step.
Edit Mapping - Click to open a mapping dialog to easily define multiple mappings between source and target fields. Note: this only works if there is only one target output step.
Include unspecified fields, ordered by name - Enable if you want to implicitly select all other fields from the input stream(s) that are not explicitly selected in the Fields section.

11.6.23.3. Remove
This tab allows you to enter the fields that you want removed from the stream. You can also click the 'Get fields to remove' button to add all fields from the input stream(s). This makes it easier if you are trying to remove several fields. After getting all fields, simply delete any of the fields that you do not want to remove from the stream.


11.6.23.4. Meta-data
This tab allows you to rename, change data types, and change the length and precision of fields coming into the Select Values step. Click the 'Get fields to change' button to add all fields on the input stream(s).

Note: The type column is useful for cases where you need to set a specific data type to avoid repeated data type conversions. For example, if your transformation is taking advantage of the lazy conversion option and includes a sort step, this could result in repeated data conversions internal to the sort step in order to perform the data comparisons. You can work around this issue by using the Select Values step to convert your sort key fields to normal data (i.e. from Binary to String).


11.6.24. Filter rows

Icon

Filter rows dialog

11.6.24.1. General description


This step type allows you to filter rows based upon conditions and comparisons. Once this step is connected to a previous step (one or more, and receiving input), you can simply click on the "<field>", "=" and "<value>" areas to construct a condition. You can add more conditions by clicking on the 'Add condition' icon. It will convert the original condition to a subcondition and add one more. A subcondition can be edited simply by clicking on it (going down one level into the condition tree). For example, this is a more complex example:

Pentaho Data Integration TM

S oon !ser "#ide 107

11.6.2%.2. O tions
3ption *tep name *end LtrueL data to step *end LfalseL data to step $he Condition %escription :ame of the step. Note: $his name has to be uniHue in a single transformation. $he ro s for to this step. $he ro s for to this step. Click the L:3$L button in the upper left to negate the condition. Click on the T!ieldU buttons to select from a list of fields from the input streamBsC to build your condidionBsC. Click on the TvalueU button to enter a specific value into your conditionBsC. $o delete a condition" right2click on it and select L%elete ConditionL. Add Condition button Click to add a condition. hich the condition specified evaluates to false are send hich the condition specified evaluates to true are send

Pentaho Data Integration TM

S oon !ser "#ide 10*

11.6.25. Sort rows

Icon

Sort rows dialog

11.6.25.1. General description

This step type sorts rows based upon the fields you specify and whether they should be sorted ascending or descending.

NOTE: Kettle has to sort rows using temporary files when the number of rows exceeds 5000.

11.6.25.2. Options
The following table describes the options for the Sort step:

Option | Description
Step name | Name of the step. Note: This name has to be unique in a single transformation.
Sort directory | This is the directory in which the temporary files are stored in case they are needed. The default is the standard temporary directory for the system.
TMP-file prefix | Choose a recognizable prefix in order to recognize the files when they show up in the temp directory.
Sort size | The more rows you can store in memory, the faster the sort gets. This is because fewer temporary files need to be used and less I/O is generated.
Compress TMP Files | This option compresses temporary files when they are needed to complete the sort.
Only pass unique rows? | Enable this option if you want to only pass unique rows to the output stream(s).
Fields table | Specify the fields and direction (ascending/descending) to sort. You can optionally specify whether or not to perform a case sensitive sort.
Get Fields button | Click to retrieve a list of all fields coming in on the stream(s).

11.6.26. Add sequence

Icon

Add sequence dialog

11.6.26.1. General description

This step adds a sequence to the stream. A sequence is an ever-changing integer value with a certain start and increment value. You can either use a database (Oracle) sequence to determine the value of the sequence, or have it generated by Kettle.

NOTE: Kettle sequences are only unique when used in the same transformation. Also, they are not stored, so the values start back at the same value every time the transformation is launched.

11.6.26.2. Options
The following table describes the options for the Add Sequence step:

Option | Description
Step name | Name of the step. Note: This name has to be unique in a single transformation.
Name of value | Name of the new sequence value that is added to the stream.
Use DB to generate the sequence | Enable this option if you want the sequence to be driven by a database sequence.
- Connection name | Choose the name of the connection on which the database sequence resides.
- Schema name | Optionally specify the table's schema name.
- Sequence name | Allows you to enter the name of the database sequence.
Use a transformation counter to generate the sequence | Enable this option if you want the sequence to be generated by Kettle.
- Counter name (optional) | If multiple steps in a transformation generate the same value name, this option allows you to specify the name of the counter you want to associate the sequence with. This avoids forcing unique sequencing across multiple steps. Attention: in this case you have to ensure that the start, increment and maximum value of all counters with the same name are identical, otherwise the result is unpredictable.
- Start at | Give the start value of the sequence.
- Increment by | Give the increment of the sequence.
- Maximum value | This is the maximum value, after which the sequence will start back at the start value (Start at).

Examples:
Start at = 1, increment by = 1, max value = 3
This will produce: 1, 2, 3, 1, 2, 3, 1, ...
Start at = 0, increment by = -1, max value = -2
This will produce: 0, -1, -2, 0, -1, -2, 0, ...
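The wrap-around behaviour in the examples can be sketched in a few lines of Python. This is only an illustration of the described semantics, not Kettle's implementation; the function name `kettle_style_sequence` is hypothetical, and the rule for deciding when the maximum has been passed with a negative increment is an assumption inferred from the second example.

```python
from itertools import islice


def kettle_style_sequence(start, increment, max_value):
    """Yield start, start + increment, ... and wrap back to start after max_value."""
    value = start
    while True:
        yield value
        value += increment
        # Assumption: "past the maximum" means beyond it in the direction of travel.
        past_max = value > max_value if increment > 0 else value < max_value
        if past_max:
            value = start


first = list(islice(kettle_style_sequence(1, 1, 3), 7))
second = list(islice(kettle_style_sequence(0, -1, -2), 7))
print(first)   # [1, 2, 3, 1, 2, 3, 1]
print(second)  # [0, -1, -2, 0, -1, -2, 0]
```

Because the generator keeps its state only in memory, it also mirrors the note above: values restart from the start value on every run.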

11.6.27. Dummy (do nothing)

Icon

Dummy dialog

11.6.27.1. General description

This step does not do anything. Its main function is to act as a placeholder in case you want to test something. For example, to have a transformation, you need at least 2 steps connected to each other. If you want to test, for example, a text file input step, you can connect it to a Dummy step.

Here is another example using the Dummy step, starting with the following transformation:

[Figure: transformation feeding a Stream Lookup step]

Unfortunately, the 'Stream Lookup' step can only read lookup information from one stream. The Dummy step can be used to work around this limitation like this:

[Figure: the same transformation with a Dummy step in front of the Stream Lookup step]

11.6.28. Row Normaliser

Icon

Row Normaliser dialog

11.6.28.1. General description

This step normalizes data back from pivoted tables. For example, starting with this example of product sales data:

Month | Product A | Product B | Product C
2003/01 | 10 | 5 | 17
2003/02 | 12 | 7 | 19
... | ... | ... | ...

The Row Normaliser step will convert this data into the following format so that it is easier to update your fact table:

Month | Product | sales
2003/01 | A | 10
2003/01 | B | 5
2003/01 | C | 17
2003/02 | A | 12
2003/02 | B | 7
2003/02 | C | 19
... | ... | ...
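The un-pivot above can be expressed as a short Python sketch. It is an illustration only: the `normalise` function and its parameters are hypothetical names, chosen to echo the step's Typefield, Type and New field settings.

```python
def normalise(rows, key_field, type_field, new_field, mapping):
    """Turn one wide row into one narrow row per mapped column.

    mapping: source column name -> type label (for example "Product A" -> "A").
    """
    out = []
    for row in rows:
        for column, label in mapping.items():
            out.append({key_field: row[key_field],
                        type_field: label,
                        new_field: row[column]})
    return out


sales = [
    {"Month": "2003/01", "Product A": 10, "Product B": 5, "Product C": 17},
    {"Month": "2003/02", "Product A": 12, "Product B": 7, "Product C": 19},
]
normalised = normalise(sales, "Month", "Product", "sales",
                       {"Product A": "A", "Product B": "B", "Product C": "C"})
for row in normalised:
    print(row)
```

Each input row yields three output rows, in the same order as the second table above.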

11.6.28.2. Options
The following options are available for the Row Normaliser step:

Option | Description
Step name | Name of the step. Note: This name has to be unique in a single transformation.
Typefield | The name of the type field (Product in our example).
Fields table | This is a list of the fields you want to normalize. You will need to set the following properties for each selected field:
- Fieldname | Name of the fields to normalize (Product A to Product C in our example).
- Type | Give a string to classify the field (A, B or C in our example).
- New field | You can give one or more fields where the new value should be transferred to (sales in our example).
Get Fields button | Click to retrieve a list of all fields coming in on the stream(s).

11.6.28.3. Example - normalising multiple rows in a single step

The following example illustrates using the Row Normaliser step to normalize more than one row at a time. Beginning with the following data format:

DATE | PR1_NR | PR1_SL | PR2_NR | PR2_SL | PR3_NR | PR3_SL
20030101 | 5 | 100 | 10 | 250 | | 150
... | ... | ... | ... | ... | ... | ...

You can convert this into:

DATE | Type | Product Sales | Product Number
20030101 | Product1 | 100 | 5
20030101 | Product2 | 250 | 10
20030101 | Product3 | 150 |
... | ... | ... | ...

This would be the setup to do it with:

[Figure: Row Normaliser dialog configured for this example]

11.6.29. Split Fields

Icon

Split Fields dialog

11.6.29.1. General description

This step allows you to split fields based upon delimiter information.

11.6.29.2. Options
The following options are available for configuring the Split Fields step:

Option | Description
Step name | Name of the step. Note: This name has to be unique in a single transformation.
Field to split | The name of the field you want to split.
Delimiter | Delimiter that determines the end of a field.
Fields table | This table is where you define the properties for each new field created by the split. For each new field, you will need to define the field name, data type and other properties.

11.6.29.3. Split fields examples

Example 1: SALES_VALUES field containing "500,300,200,100". Use these settings to split the field into 4 new fields:
Delimiter: ,
Field: SALES1, SALES2, SALES3, SALES4
Id:
Remove ID: no, no, no, no
Type: Number, Number, Number, Number
Format: ###.##, ###.##, ###.##, ###.##
Group:
Decimal: .
Currency:
Length: 3, 3, 3, 3
Precision: 0, 0, 0, 0

Example 2: SALES_VALUES field containing "Sales2=310.50, Sales4=150.23". Use these settings to split the field into 4 new fields:
Delimiter: ,
Field: SALES1, SALES2, SALES3, SALES4
Id: Sales1=, Sales2=, Sales3=, Sales4=
Remove ID: yes, yes, yes, yes
Type: Number, Number, Number, Number
Format: ###.##, ###.##, ###.##, ###.##
Group:
Decimal: .
Currency:
Length: 7, 7, 7, 7
Precision: 2, 2, 2, 2
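The two examples differ in how pieces are assigned to the new fields: by position in the first, by Id prefix in the second. The Python sketch below illustrates that difference under stated assumptions. `split_field` is a hypothetical helper, number formatting is left out, and leaving unmatched fields empty (None) is an assumption about the step's behaviour rather than something the guide states.

```python
def split_field(value, delimiter, names, ids=None, remove_id=False):
    """Split `value` on `delimiter` and assign the pieces to `names`."""
    pieces = [piece.strip() for piece in value.split(delimiter)]
    if ids is None:
        # No ids: assign by position, padding missing pieces with None.
        padded = pieces + [None] * (len(names) - len(pieces))
        return dict(zip(names, padded))
    # With ids: each piece goes to the field whose id it starts with.
    result = {name: None for name in names}
    for piece in pieces:
        for name, field_id in zip(names, ids):
            if piece.startswith(field_id):
                result[name] = piece[len(field_id):] if remove_id else piece
                break
    return result


names = ["SALES1", "SALES2", "SALES3", "SALES4"]
example1 = split_field("500,300,200,100", ",", names)
example2 = split_field("Sales2=310.50, Sales4=150.23", ",", names,
                       ids=["Sales1=", "Sales2=", "Sales3=", "Sales4="],
                       remove_id=True)
print(example1)
print(example2)
```

In the second call only SALES2 and SALES4 receive a value, because only those Ids occur in the input string.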

11.6.30. Unique rows

Icon

Unique rows dialog

11.6.30.1. General description

This step removes duplicate rows from the input stream(s).

IMPORTANT NOTE: Make sure that the input stream is sorted! Otherwise only consecutive duplicate rows are evaluated correctly.

11.6.30.2. Options
The following table describes all options for the Unique rows step:

Option | Description
Step name | Name of the step. Note: This name has to be unique in a single transformation.
Add counter to output? |
Fields to compare table | Specify the field names you want to force uniqueness on, or click the 'Get' button to insert all fields from the input stream(s). You can optionally choose to ignore case by setting the 'Ignore case' flag to Y. For example: Kettle, KETTLE and kettle will all be the same in case the compare is done case insensitive. In this case, the first occurrence (Kettle) will be passed to the next step(s).
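Why the input must be sorted becomes clear once the step is read as "drop a row when it equals the previous one". The Python sketch below illustrates that reading with `itertools.groupby`, which likewise only groups consecutive items. `unique_consecutive` is a hypothetical name, and the case-insensitive key stands in for the 'Ignore case' flag.

```python
from itertools import groupby


def unique_consecutive(rows, key=lambda row: row):
    """Keep the first row of every run of consecutive rows with equal keys."""
    return [next(group) for _, group in groupby(rows, key=key)]


unsorted_rows = ["Kettle", "Spoon", "Kettle"]
not_deduplicated = unique_consecutive(unsorted_rows)
deduplicated = unique_consecutive(sorted(unsorted_rows))
ignore_case = unique_consecutive(["Kettle", "KETTLE", "kettle"], key=str.lower)

print(not_deduplicated)  # ['Kettle', 'Spoon', 'Kettle']
print(deduplicated)      # ['Kettle', 'Spoon']
print(ignore_case)       # ['Kettle']
```

On unsorted input the second "Kettle" survives because it is not adjacent to the first; sorting beforehand removes it.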

11.6.31. Group By

Icon

Group by dialog

11.6.31.1. General description

This step allows you to calculate values over a defined group of fields. Examples of common use cases are:
calculate the average sales per product
get the number of yellow shirts that we have in stock

11.6.31.2. Options
The following table provides a description of the options available for the Group By step:

Option | Description
Step name | Name of the step. Note: This name has to be unique in a single transformation.
Include all rows? | Check this if you want all rows in the output, not just the aggregation. To differentiate between the 2 types of rows in the output, a flag is needed in the output. You need to specify the name of the flag field in that case (the type is boolean).
Temporary files directory | This is the directory in which the temporary files are stored in case they are needed. The default is the standard temporary directory for the system.
TMP-file prefix | Specify the file prefix used when naming temporary files.
Add line number, restart in each group | Check this if you want to add a line number that restarts at 1 in each group.
Line number field name | Name of the field that holds the line number.
Group fields table | Specify the fields over which you want to group. You can click the 'Get Fields' button to add all fields from the input stream(s).
Aggregates table | Specify the fields that need to be aggregated, the method and the name of the resulting new field.

11.6.32. Null If

Icon

Null If dialog

11.6.32.1. General description

If the string representation of a certain field is equal to the specified value, then the value is set to null (empty). You can add all fields from the input stream(s) using the 'Get Fields' button.

11.6.33. Calculator

Icon

Calculator dialog

11.6.33.1. General description

This step provides a set of pre-defined functions that can be executed on input field values. If you have a need for other generic, often used functions, please visit our community page and let us know about your enhancement request.

Note: An important advantage Calculator has over custom JavaScript scripts is that the execution speed of Calculator is many times that of a script.

Besides the arguments (Field A, Field B and Field C) you also need to specify the return type of the function. You can also opt to remove the field from the result (output) after all values were calculated. This is useful for removing temporary values.

11.6.33.2. Function List

Function | Description | Required fields
Set field to constant A | Create a field with a constant value | A
A + B | A plus B | A and B
A - B | A minus B | A and B
A * B | A multiplied by B | A and B
A / B | A divided by B | A and B
A * A | The square of A | A
SQRT( A ) | The square root of A | A
100 * A / B | Percentage of A in B | A and B
A - ( A * B / 100 ) | Subtract B% of A | A and B
A + ( A * B / 100 ) | Add B% to A | A and B
A + B * C | Add A and B times C | A, B and C
SQRT( A*A + B*B ) | Calculate the square root of (A*A + B*B) | A and B
ROUND( A ) | Round A to the nearest integer | A
ROUND( A, B ) | Round A to B decimal positions | A and B
Date A + B days | Add B days to Date field A | A and B
Year of date A | Calculate the year of date A | A
Month of date A | Calculate the month number of date A | A
Day of year of date A | Calculate the day of year (1-365) | A
Day of month of date A | Calculate the day of month (1-31) | A
Day of week of date A | Calculate the day of week (1-7) | A
Week of year of date A | Calculate the week of year (1-54) | A
ISO8601 Week of year of date A | Calculate the week of the year ISO8601 style (1-53) | A
ISO8601 Year of date A | Calculate the year ISO8601 style | A
Byte to hex encode of string A | Encode bytes in a string to a hexadecimal representation | A
Hex encode of string A | Encode a string in its own hexadecimal representation | A
Char to hex encode of string A | Encode characters in a string to a hexadecimal representation | A
Hex decode of string A | Decode a string from its hexadecimal representation (add a leading 0 when A is of odd length) | A

11.6.34. XML Add

Icon

Add XML dialog

11.6.34.1. General description

This step allows you to encode the content of a number of fields in a row in XML. This XML is added to the row in the form of a String field.

11.6.34.2. Content Tab

Option | Description
Step name | Name of the step. This name has to be unique in a single transformation.
Encoding | The encoding to use. This encoding is specified in the header of the XML file.
Output Value | The name of the new field that will contain the XML.
Root XML element | The name of the root element in the generated XML.
Omit XML header | Check to not include the XML header in the output.

11.6.34.3. Fields
The Fields tab is where you configure the output fields and their formats. The table below describes each of the available properties for a field:

Option | Description
Fieldname | Name of the field.
Element name | The name of the element in the XML file to use.
Type | Type of the field can be either String, Date, or Number.
Format | Format mask to convert with: see Number Formats for a complete description of format specifiers.
Length | Output string is padded to this length if it is specified.
Precision | The precision to use.
Currency | Symbol used to represent currencies like $10,000.00 or €5.000,00
Decimal | A decimal point can be a "." (10,000.00) or "," (5.000,00)
Grouping | A grouping can be a "," (10,000.00) or "." (5.000,00)
Null | The string to use in case the field value is null.
Attribute | Make this an attribute (N means: element).

11.6.34.4. Add XML Example

Use Case: I have data that comes in a variety of classes and I would like to store it as XML in my database. For example, I want to turn the raw data into the database layout below:

[Table: raw data for two classes of shape. Circle rows include the columns SHAPE, COLOR, id and RADIUS; rectangle rows include the columns SHAPE, COLOR, id, LENGTH and WIDTH]

Output Sample (columns ID and CLASS_DATA); the CLASS_DATA values look like this:

<SHAPE type="circle"> <COLOUR>blue</COLOUR> <RADIUS> 5</RADIUS> </SHAPE>
<SHAPE type="rectangle"> <COLOUR>blue</COLOUR> <WIDTH> 4</WIDTH> <LENGTH> 6</LENGTH> </SHAPE>
<SHAPE type="rectangle"> <COLOUR>blue</COLOUR> <WIDTH> 4</WIDTH> <LENGTH> 6</LENGTH> </SHAPE>
<SHAPE type="circle"> <COLOUR>blue</COLOUR> <RADIUS> 5</RADIUS> </SHAPE>
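The following Python sketch shows the general idea of encoding selected fields of a row as one XML string, with some fields written as attributes and the rest as child elements. It uses the standard library's `xml.etree.ElementTree` and is not the step's implementation; `row_to_xml` is a hypothetical helper, padding and format masks are omitted, and skipping null fields is an assumption made to keep the example short.

```python
import xml.etree.ElementTree as ET


def row_to_xml(row, root_name, attribute_fields=()):
    """Encode a row (dict) as an XML string under a single root element."""
    root = ET.Element(root_name)
    for field, value in row.items():
        if value is None:
            continue  # assumption: null fields are simply left out
        if field in attribute_fields:
            root.set(field, str(value))               # "Attribute" = Y
        else:
            ET.SubElement(root, field).text = str(value)  # "Attribute" = N
    return ET.tostring(root, encoding="unicode")


circle = {"type": "circle", "COLOUR": "blue", "RADIUS": 5, "WIDTH": None}
class_data = row_to_xml(circle, "SHAPE", attribute_fields=("type",))
print(class_data)
```

The resulting string could then be stored in a single column, as the use case above describes.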

11.6.35. Add constants

Icon

Add constants dialog

11.6.35.1. General description and usage

The Add constant values step is a simple and fast performing way to add constant values to the stream. To add a constant, simply specify the name, type and value in the form of a string. Then, specify the formats to convert the value into the chosen data type.

11.6.36. Row Denormaliser

Icon

Denormaliser dialog

11.6.36.1. General description

This step allows you to de-normalize data by looking up key-value pairs. It also allows you to immediately convert data types.

11.6.36.2. Options
Option | Description
Step name | Name of the step. This name has to be unique in a single transformation.
Key field | The field that defines the key.
Group fields | Specify the fields that make up the grouping here.
Target fields | Specify the fields to de-normalize. You do it by specifying the String value for the key field (see above). Options are provided to convert data types. Mostly people use Strings as key-value pairs, so you often need to convert to Integer, Number or Date. In case you get key-value pair collisions (the key is not unique for the group specified) you can specify the aggregation method to use.

11.6.37. Flattener

Icon

Flattener dialog

11.6.37.1. General description

This step allows you to flatten sequentially provided data.

11.6.37.2. Options
Option | Description
Step name | Name of the step. This name has to be unique in a single transformation.
The field to flatten | The field that needs to be flattened into different target fields.
Target fields | The name of the target field to flatten to.

11.6.37.3. Flattener Example

Beginning with the following data set:

Field1 | Field2 | Field3 | Flatten
A | B | C | One
A | B | C | Two
D | E | F | Three
D | E | F | Four

This can be flattened to:

Field1 | Field2 | Field3 | Target1 | Target2
A | B | C | One | Two
D | E | F | Three | Four
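The example can be read as "take the rows in chunks of as many rows as there are target fields, and spread the flatten field over the targets". A minimal Python sketch of that reading follows. `flatten` is a hypothetical name, and dropping an incomplete trailing chunk is an assumption, since the guide does not say what happens in that case.

```python
def flatten(rows, flatten_field, target_fields):
    """Collapse every len(target_fields) consecutive rows into one row."""
    out = []
    n = len(target_fields)
    for i in range(0, len(rows) - n + 1, n):
        chunk = rows[i:i + n]
        # Non-flattened fields are taken from the first row of the chunk.
        merged = {k: v for k, v in chunk[0].items() if k != flatten_field}
        for target, row in zip(target_fields, chunk):
            merged[target] = row[flatten_field]
        out.append(merged)
    return out


data = [
    {"Field1": "A", "Field2": "B", "Field3": "C", "Flatten": "One"},
    {"Field1": "A", "Field2": "B", "Field3": "C", "Flatten": "Two"},
    {"Field1": "D", "Field2": "E", "Field3": "F", "Flatten": "Three"},
    {"Field1": "D", "Field2": "E", "Field3": "F", "Flatten": "Four"},
]
flattened = flatten(data, "Flatten", ["Target1", "Target2"])
for row in flattened:
    print(row)
```

Note that the sketch relies purely on row order, which matches the description of the data as "sequentially provided".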

In the example above, this is what the dialog looks like:

[Figure: Flattener dialog for this example]

11.6.38. Value Mapper

Icon

Value Mapper dialog

11.6.38.1. General description

This step maps string values from one value to another. Usually, you will want to solve this problem by storing the conversion table in a database. However, this step provides a simple alternative. For example, if you want to replace language codes:

Fieldname to use: LanguageCode
Target fieldname: LanguageDesc
Source/Target: EN/English, FR/French, NL/Dutch, ES/Spanish, DE/German, ...

NOTE: It is also possible to convert a null field or empty String value to a non-empty value. Leave the "Source value" field empty for this. Only one such empty source value can be specified.

11.6.38.2. Options
The following properties are used to define the mappings:

Option | Description
Step name | Name of the step. Note: This name has to be unique in a single transformation.
Fieldname to use | Field to use as the mapping source.
Target field name | Field to use as the mapping target.
Default upon non-matching | Defines a default value for situations where the source value is not empty, but there is no match.
Field values table | Contains the mapping of source value to converted target value.
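As a rough illustration of the options above, the mapping can be pictured as a dictionary lookup with two special cases: an empty source value and a non-matching source value. The Python sketch below is an assumption-laden reading of that description, not the step's code; `map_value` and `LANGUAGES` are hypothetical names.

```python
LANGUAGES = {"EN": "English", "FR": "French", "NL": "Dutch",
             "ES": "Spanish", "DE": "German"}


def map_value(source, mapping, default=None):
    """Map a source string to its target; treat None and "" as the empty source value."""
    key = "" if source is None else source
    if key in mapping:
        return mapping[key]
    # The default only applies when the source is not empty but has no match.
    return default if key != "" else None


print(map_value("NL", LANGUAGES))                                   # Dutch
print(map_value("XX", LANGUAGES, default="Unknown"))                # Unknown
print(map_value(None, LANGUAGES, default="Unknown"))                # None
print(map_value(None, {**LANGUAGES, "": "Not specified"}))          # Not specified
```

The last call shows the single empty source entry described in the note: adding a mapping for the empty string gives null and empty inputs a non-empty target.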

11.6.39. Blocking step

Icon

Blocking dialog

11.6.39.1. General description

This simple step blocks all output until the very last row has been received from the previous step. At that time, either the last row or the complete input is sent off to the next step. This step then knows that all previous steps have finished. You can use this for triggering plugins, stored procedures, Java scripts, ... or for synchronization purposes.

11.6.39.2. Options
The following table describes the options for the Blocking step:

Option | Description
Step name | Name of the step. Note: This name has to be unique in a single transformation.
Pass all rows? | Determines whether to pass 1 row or all rows.
Spool directory | This is the directory in which the temporary files are stored in case they are needed. The default is the standard temporary directory for the system.
Spool-file prefix | Choose a recognizable prefix in order to recognize the files when they show up in the temp directory.
Cache size | The more rows you can store in memory, the faster the step works.
Compress spool files? | This option compresses temporary files when they are needed.

11.6.40. Join Rows (Cartesian product)

Icon

Join rows dialog

11.6.40.1. General description

This step allows you to produce combinations (Cartesian product) of all rows on the input streams. This is an example:

[Figure: transformation in which Years, Months and Days inputs feed a "Years x Months x Days" Join Rows step]

The "Years x Months x Days" step outputs all combinations of Year, Month and Day (1900, 1, 1 through 2100, 12, 31) and can be used to create a date dimension.
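A Cartesian product of three small streams, followed by a filtering condition, can be sketched in Python with `itertools.product`. The example is illustrative only: the year range is shortened to two years, and the `valid_date` condition (dropping combinations such as 31 February) is an assumption about how one might restrict the output, not something the guide prescribes.

```python
import calendar
from itertools import product

years = range(1900, 1902)   # shortened range for the example
months = range(1, 13)
days = range(1, 32)


def valid_date(year, month, day):
    """Condition limiting the output rows: keep only real calendar dates."""
    return day <= calendar.monthrange(year, month)[1]


all_combinations = list(product(years, months, days))
dates = [(y, m, d) for y, m, d in all_combinations if valid_date(y, m, d)]

print(len(all_combinations))  # 744 combinations before filtering
print(len(dates))             # 730 valid dates in 1900 and 1901
```

The unfiltered product grows multiplicatively with every input stream, which is why the step offers caching and temporary files for large row sets.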

11.6.40.2. Options
The following table describes the options for configuring the Join rows step:

Option | Description
Step name | Name of the step. Note: This name has to be unique in a single transformation.
Temp directory | Specify the name of the directory where the system stores temporary files in case you want to combine more than the cached number of rows.
TMP-file prefix | This is the prefix of the temporary files that will be generated.
Max. cache size | The number of rows to cache before the system reads data from temporary files. This is needed in case you want to combine large row sets that don't fit into memory.
Main step to read from | Specifies the step to read the most data from. This step is not cached or spooled to disk; the others are.
The Condition(s) | You can enter a complex condition to limit the number of output rows.

11.6.41. Database Join

Icon

Database Join dialog

11.6.41.1. General description

Using data from previous steps, this step allows you to run a query against a database. The parameters for this query can be specified:
as question marks (?) in the SQL query.
as fields in the data grid.
The two need to be in the same order.

For example, this step allows you to run queries looking up the oldest person that bought a certain product like this:

SELECT customernr
FROM product_orders, customer
WHERE orders.customernr = customer.customernr
AND orders.productnr = ?
ORDER BY customer.date_of_birth

All you then need to specify as a parameter is the productnr and you'll get the customernr included in the result.
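The per-row lookup can be tried out with Python's built-in `sqlite3` module, which also uses question marks as positional placeholders. Everything below is a sketch under assumptions: the in-memory tables, their sample data and the `database_join` helper are made up for illustration, the orders table is simply named `orders`, and the `limit` and `outer_join` parameters only imitate the "Number of rows to return" and "Outer join?" options.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customernr INTEGER, date_of_birth TEXT);
CREATE TABLE orders (customernr INTEGER, productnr INTEGER);
INSERT INTO customer VALUES (1, '1980-05-01'), (2, '1950-03-15'), (3, '1990-01-01');
INSERT INTO orders VALUES (1, 100), (2, 100), (3, 200);
""")

QUERY = """
SELECT customer.customernr
FROM orders, customer
WHERE orders.customernr = customer.customernr
AND orders.productnr = ?
ORDER BY customer.date_of_birth
"""


def database_join(rows, parameter_fields, limit=0, outer_join=False):
    """Run QUERY once per input row; parameters follow the order of the ? marks."""
    for row in rows:
        params = [row[field] for field in parameter_fields]
        results = conn.execute(QUERY, params).fetchall()
        if limit:                       # 0 means all rows
            results = results[:limit]
        if not results and outer_join:  # always return a row, with a null value
            results = [(None,)]
        for (customernr,) in results:
            yield {**row, "customernr": customernr}


oldest = list(database_join([{"productnr": 100}], ["productnr"], limit=1))
print(oldest)  # [{'productnr': 100, 'customernr': 2}]
```

Customer 2 has the earliest date of birth among the buyers of product 100, so limiting the result to one row returns that customer.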

11.6.41.2. Options
The following table describes the options for the Database Join step:

Option | Description
Step name | Name of the step. Note: This name has to be unique in a single transformation.
Connection | The database connection to use.
SQL | SQL query to launch towards the database; use question marks as parameter placeholders.
Number of rows to return | 0 means all; any other number limits the number of rows.
Outer join? | Check this to always return a result, even if the query didn't return a result.
Parameters table | Specify the fields containing the parameters and the parameter type.

11.6.42. Merge rows

Icon

Merge rows dialog

11.6.42.1. General description

This step allows you to compare two streams of rows. This is useful if you want to compare data from two different times. It is often used in situations where the source system of a data warehouse does not contain a date of last update.

The two streams of rows, a reference stream (the old data) and a compare stream (the new data), are merged. Each time, only the last version of a row is passed on to the next steps. The row is marked as follows:
"identical" - The key was found in both streams and the values to compare were identical;
"changed" - The key was found in both streams but one or more values is different;
"new" - The key was not found in the reference stream;
"deleted" - The key was not found in the compare stream.

The row coming from the compare stream is passed on to the next steps, except for the "deleted" case.

IMPORTANT: both streams need to be sorted on the specified key(s).

11.6.42.2. Options
Option | Description
Step name | Name of the step. Note: This name has to be unique in a single transformation.
Reference rows origin | Specify the step origin for the reference rows.
Compare rows origin | Specify the step origin for the compare rows.
Flag fieldname | Specify the name of the flag field on the output stream.
Keys to match | Specify fields containing the keys to match on. Click the 'Get key fields' button to insert all of the fields originating from the reference rows step.
Values to compare | Specify fields containing the values to compare. Click the 'Get value fields' button to insert all of the fields originating from the compare rows step.
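The flagging rules above amount to a single pass over two key-sorted streams. The Python sketch below illustrates that pass; it is not Kettle's code, the `merge_rows` name and the fixed `flag` field name are hypothetical, and a single key field is assumed for brevity.

```python
def merge_rows(reference, compare, key, values):
    """Walk two key-sorted streams in step and flag each resulting row."""
    ref, cmp_ = iter(reference), iter(compare)
    r, c = next(ref, None), next(cmp_, None)
    while r is not None or c is not None:
        if c is None or (r is not None and r[key] < c[key]):
            yield {**r, "flag": "deleted"}      # only in the reference stream
            r = next(ref, None)
        elif r is None or c[key] < r[key]:
            yield {**c, "flag": "new"}          # only in the compare stream
            c = next(cmp_, None)
        else:
            same = all(r[v] == c[v] for v in values)
            yield {**c, "flag": "identical" if same else "changed"}
            r, c = next(ref, None), next(cmp_, None)


old = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 3, "v": "c"}]
new = [{"id": 1, "v": "a"}, {"id": 2, "v": "B"}, {"id": 4, "v": "d"}]
flags = [(row["id"], row["flag"]) for row in merge_rows(old, new, "id", ["v"])]
print(flags)
```

The comparison of keys with `<` is what makes the sort requirement essential: with unsorted input, rows would be flagged "new" or "deleted" even though their counterpart arrives later.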

11.6.43. Sorted Merge

Icon

Sorted Merge dialog

11.6.43.1. General description and usage

The Sorted Merge step merges rows coming from multiple input steps, provided these rows are themselves sorted on the given key fields.

11.6.43.2. Options
Option | Description
Step name | Name of the step. Note: This name has to be unique in a single transformation.
Fields table | Specify the fieldname and sort direction (ascending/descending). Click the 'Get Fields' button to retrieve a list of fields from the input stream(s).
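Merging already-sorted inputs is a standard technique, and Python's `heapq.merge` offers a compact way to see it in action. The snippet is an analogy for the step's behaviour rather than its implementation; the sample streams are made up.

```python
import heapq
from operator import itemgetter

stream_a = [{"id": 1}, {"id": 4}]
stream_b = [{"id": 2}, {"id": 3}, {"id": 5}]

# Both inputs must already be sorted on the key; the merge never re-sorts them.
merged = list(heapq.merge(stream_a, stream_b, key=itemgetter("id")))
merged_ids = [row["id"] for row in merged]
print(merged_ids)  # [1, 2, 3, 4, 5]
```

For descending sort directions, `heapq.merge` accepts `reverse=True`, provided the inputs are sorted in descending order as well.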

11.6.44. Merge Join

Icon

Merge Join dialog

11.6.44.1. General description and usage

The Merge Join step performs a classic merge join between data sets, with data coming from two different input steps. Join options include INNER, LEFT OUTER, RIGHT OUTER, and FULL OUTER.

NOTE: This step expects the rows coming in to be sorted on the specified key fields.

11.6.44.2. Options
The following table describes the options available for the Merge Join step:

Option | Description
Step name | Name of the step. Note: This name has to be unique in a single transformation.
First Step | Specify the first input step to the merge join.
Second Step | Specify the second input step to the merge join.
Join Type | Select from the available types of joins.
Keys for 1st step | Specify the key fields on which the incoming data is sorted. Click the 'Get key fields' button to retrieve a list of fields from the specified step.
Keys for 2nd step | Specify the key fields on which the incoming data is sorted. Click the 'Get key fields' button to retrieve a list of fields from the specified step.
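For readers unfamiliar with the technique, here is a minimal Python sketch of a merge join over two key-sorted inputs, covering the four join types listed above. It is a textbook version written for illustration; `merge_join` and `_groups` are hypothetical names, a single key field is assumed, and output rows are returned as (left, right) pairs with None for the missing side.

```python
from itertools import groupby
from operator import itemgetter


def _groups(rows, key):
    """Yield (key value, rows with that key) for a key-sorted stream."""
    for value, group in groupby(rows, key=itemgetter(key)):
        yield value, list(group)


def merge_join(left, right, key, join_type="INNER"):
    keep_left = join_type in ("LEFT OUTER", "FULL OUTER")
    keep_right = join_type in ("RIGHT OUTER", "FULL OUTER")
    lg, rg = _groups(left, key), _groups(right, key)
    l, r = next(lg, None), next(rg, None)
    out = []
    while l is not None or r is not None:
        if r is None or (l is not None and l[0] < r[0]):
            if keep_left:                       # left key has no partner
                out.extend((row, None) for row in l[1])
            l = next(lg, None)
        elif l is None or r[0] < l[0]:
            if keep_right:                      # right key has no partner
                out.extend((None, row) for row in r[1])
            r = next(rg, None)
        else:                                   # keys match: pair all rows
            out.extend((a, b) for a in l[1] for b in r[1])
            l, r = next(lg, None), next(rg, None)
    return out


left_rows = [{"k": 1, "a": "x"}, {"k": 2, "a": "y"}]
right_rows = [{"k": 2, "b": "p"}, {"k": 3, "b": "q"}]
inner = merge_join(left_rows, right_rows, "k")
full = merge_join(left_rows, right_rows, "k", "FULL OUTER")
print(inner)
print(full)
```

As with the Merge rows step, the algorithm advances whichever side has the smaller key, so it only produces correct results when both inputs are sorted on that key.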

11.6.45. JavaScript Values

Icon

Java Script value dialog

11.6.45.1. General description

This step type allows you to do complex calculations using the JavaScript language. The JavaScript engine used is Rhino 1.5R5.

11.6.45.2. Options
Option | Description
Step name | Name of the step. Note: This name has to be unique in a single transformation.
Java Script | Specify the script you want to run.
Fields | These are the fields to add to the output stream.
Insert fields button | Inserts the fields and the standard method to grab the value of the field.
Test script button | Tests whether or not the script compiles.
Get variables button | Gets the newly created variables and inserts them into the "Fields" grid.

11.6.45.3. Value functions

This is a list of functions that you can use to manipulate values:

Function | Description
Value Clone() | Builds a copy of a value and returns a Value.
void setName(String name) | Sets the name of a value.
String getName() | Gets the name of a value.
void setValue(double num) | Sets the value to a floating point value.
void setValue(String str) | Sets the value to a string value.
void setValue(Date dat) | Sets the value to a Date value.
void setValue(boolean bool) | Sets the value to a Boolean value.
void setValue(long l) | Sets the value to an integer value.
void setValue(Value v) | Sets the value to the value contained in another field.
double getNumber() | Gets the value of a field as a floating point value.
String getString() | Gets the value of a field as a textual string representation.
int getStringLength() | Gets the length of the string representation.
Date getDate() | Gets the value of the field as a date value.
boolean getBoolean() | Gets the value of a field as a Boolean. NOTE: String "Y" or "true" is converted to true. NOTE: Numeric value 0 is converted to false, everything else is true.
long getInteger() | Gets the value of a field as an integer. NOTE: Date fields are converted to the number of milliseconds since January 1st 1970 00:00:00 GMT.
boolean isEmpty() | If the value has no type, this function returns true.
boolean isString() | If the value is of type String, this function returns true.
boolean isDate() | If the value is of type Date, this function returns true.
boolean isNumber() | If the value is of type Number, this function returns true.
boolean isBoolean() | If the value is of type Boolean, this function returns true.
boolean isInteger() | If the value is of type Integer, this function returns true.
boolean isNumeric() | If the value is of type Number or Integer, this function returns true.
String toString() | Returns the textual representation of the value.
String toString(boolean pad) | Returns the textual representation of the value, padded to the length of the string if pad = true.
String toStringMeta() | Returns the meta-data information of the value as a string.
void setLength(int l) | Sets the length of the value.
void setLength(int l, int p) | Sets the length and precision of the value.
int getLength() | Returns the length of the value.
int getPrecision() | Returns the precision of the value.
void setPrecision(int p) | Sets the precision of a value.
String getTypeDesc() | Returns the description of a value as a string (e.g. "String", "Number", "Date", "Integer", "Boolean").
void setNull() | Sets the value to Null.
void clearNull() | Removes the null setting.
void setNull(boolean n) |
boolean isNull() | Returns true if the value is null.
int compare(Value v) | Compares two values and returns 1 if the first value is larger than the second, -1 if it is smaller and 0 if they are equal.

Function
    Description

boolean equals(Object v)
    Compares two values and returns true if the two values have the same value.

int hashCode()
    Returns a signed 64-bit value representing the value in the form of a hash code.

Value negate()
    If the value is numeric, multiplies the value by -1; in all other cases it doesn't do anything.

Value and(Value v)
    Calculates the bitwise AND of two integer values.

Value xor(Value v)
    Calculates the bitwise XOR of two integer values.

Value or(Value v)
    Calculates the bitwise OR of two integer values.

Value bool_and(Value v)
    Calculates the boolean AND of two boolean values.

Value bool_or(Value v)
    Calculates the boolean OR of two boolean values.

Value bool_xor(Value v)
    Calculates the boolean XOR of two boolean values.

Value bool_not()
    Calculates the boolean NOT of a boolean value.

Value greater_equal(Value v)
    Compares two values and sets the first to true if the second is greater or equal to the first.

Value smaller_equal(Value v)
    Compares two values and sets the first to true if the second is smaller or equal to the first.

Value different(Value v)
    Compares two values and sets the first to true if the second is different from the first.

Value equal(Value v)
    Compares two values and sets the first to true if the second is equal to the first.

Value like(Value v)
    Sets the first value to true if the second string is part of the first.

Value greater(Value v)
    Compares two values and sets the first to true if the second is greater than the first.

Value smaller(Value v)
    Compares two values and sets the first to true if the second is smaller than the first.

Value minus(double v)
Value minus(long v)
Value minus(int v)
Value minus(byte v)
Value minus(Value v)
    Subtracts v from the field value.

Value plus(double v)
Value plus(long v)
Value plus(int v)
Value plus(byte v)
Value plus(Value v)
    Adds v to the field value.

Value divide(double v)
Value divide(long v)
Value divide(int v)
Value divide(byte v)
Value divide(Value v)
    Divides the field value by v.

Value multiply(double v)
Value multiply(long v)
Value multiply(int v)
Value multiply(byte v)
    Multiplies the field value by v.
    NOTE: Strings can be multiplied as well: the result is v times the string concatenated.

Function
    Description

Value multiply(Value v)
    Multiplies the field value by v.

Value abs()
    Sets the field value to its absolute value: a negative value is replaced by -value.

Value acos()
    Sets the field value to the arc cosine of the number value.

Value asin()
    Sets the field value to the arc sine of the number value.

Value atan()
    Sets the field value to the arc tangent of the number value.

Value atan2(Value arg0)
Value atan2(double arg0)
    Sets the field value to the two-argument arc tangent (atan2) of the number value and arg0.

Value ceil()
    Sets the field value to the ceiling of a number value.

Value cos()
    Sets the field value to the cosine of a number value.

Value cosh()
    Sets the field value to the hyperbolic cosine of a number value.

Value exp()
    Sets the field value to the exp of a number value.

Value floor()
    Sets the field value to the floor of a number value.

Value initcap()
    Sets the first character of every word in a string to uppercase: "matt casters" -> "Matt Casters".

Value length()
    Sets the value of the field to the length of the String value.

Value log()
    Sets the field value to the log of a number value.

Value lower()
    Sets the field value to the string value in lowercase.

Value lpad(Value len)
Value lpad(Value len, Value padstr)
Value lpad(int len)
Value lpad(int len, String padstr)
    Sets the field value to the string value, left padded to a certain length. By default the padding string is a single space. Optionally, you can specify your own padding string.

Value ltrim()
    Sets the field value to the string, without spaces to the left of the string.

Value mod(Value arg)
Value mod(double arg0)
    Sets the value to the modulus of the first and the second number.

Value nvl(Value alt)
    If the field value is Null, sets the value to alt.

Value power(Value arg)
Value power(double arg0)
    Raises the field value to the power arg.

Value replace(Value repl, Value with)
Value replace(String repl, String with)
    Replaces a string in the field value with another.

Value round()
    Rounds the field value to the nearest integer.

Value rpad(Value len)
Value rpad(Value len, Value padstr)
Value rpad(int len)
Value rpad(int len, String padstr)
    Sets the field value to the string value, right padded to a certain length. By default the padding string is a single space. Optionally, you can specify your own padding string.

Value rtrim()
    Removes the spaces to the right of the field value.

Value sign()
    Sets the field value to -1, 0 or 1 in case the field value is negative, zero or positive.
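The padding and capitalisation functions above can be illustrated with a minimal sketch in plain JavaScript. The helper names lpad, rpad and initcap below are hypothetical stand-ins written for this illustration; they are not the Kettle Value methods, and details such as how the real functions treat strings that are already longer than the requested length are assumptions.

```javascript
// Hypothetical stand-ins for the string functions described above;
// these are NOT the Kettle Value methods.

// Left pad to a total length; the default padding string is a single space.
function lpad(str, len, padstr = " ") {
  return String(str).padStart(len, padstr);
}

// Right pad to a total length; the default padding string is a single space.
function rpad(str, len, padstr = " ") {
  return String(str).padEnd(len, padstr);
}

// Uppercase the first character of every whitespace-separated word.
function initcap(str) {
  return str.replace(/(^|\s)(\S)/g, (match, sep, ch) => sep + ch.toUpperCase());
}
```

For instance, lpad("42", 5, "0") pads a number to a fixed-width code, and initcap("matt casters") reproduces the example from the table.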


Function
    Description

Value sin()
    Sets the value of the field to the sine of the number value.

Value sqrt()
    Sets the value of the field to the square root of the number value.

Value substr(Value from, Value to)
Value substr(Value from)
Value substr(int from)
Value substr(int from, int to)
    Sets the value of the field to the substring of the string value.

Value sysdate()
    Sets the field value to the system date.

Value tan(Value args[])
    Sets the field value to the tangent of the number value.

Value num2str()
Value num2str(String arg0)
Value num2str(String arg0, String arg1)
Value num2str(String arg0, String arg1, String arg2)
Value num2str(String arg0, String arg1, String arg2, String arg3)
    Converts a number to a string.
    Arg0: format pattern, see also Number Formats
    Arg1: decimal separator (either . or ,)
    Arg2: grouping separator (either . or ,)
    Arg3: currency symbol
    For example, converting value:
    1234.56 using num2str("###,##0.00", ",", ".") gives 1.234,56
    .23 using num2str("###,##0.00", ",", ".") gives 0,23
    1234.56 using num2str("000,000.00", ",", ".") gives 001.234,56

Value dat2str()
Value dat2str(String arg0)
Value dat2str(String arg0, String arg1)
    Converts a date into a string.
    Arg0: format pattern, see also Number Formats
    Arg1: localized date-time pattern characters (u, t)

Value num2dat()
    Converts a number to a date based upon the number of milliseconds since January 1st, 1970 00:00:00 GMT.

Value str2dat(String arg0)
Value str2dat(String arg0, String arg1)
    Converts a string to a date.
    Arg0: format pattern, see also Number Formats
    Arg1: localized date-time pattern characters (u, t)

Value str2num()
Value str2num(String arg0)
Value str2num(String arg0, String arg1)
Value str2num(String arg0, String arg1, String arg2)
Value str2num(String arg0, String arg1, String arg2, String arg3)
    Converts a string into a number.
    Arg0: format pattern, see also Number Formats
    Arg1: decimal separator (either . or ,)
    Arg2: grouping separator (either . or ,)
    Arg3: currency symbol

Value dat2num()
    Converts a date into a number being the number of milliseconds since January 1st, 1970 00:00:00 GMT.

Value trim()
    Removes spaces left and right of the string value.

Value upper()
    Sets the field value to the uppercase string value.

Value e()
    Sets the value to e.

Value pi()
    Sets the value to pi.
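The effect of the decimal and grouping separator arguments of num2str can be sketched in plain JavaScript. The function formatNumber below is a hypothetical helper written for this illustration, not the Kettle implementation: it assumes a fixed pattern of two decimals with three-digit grouping and does not interpret a format pattern.

```javascript
// Hypothetical helper, NOT the Kettle num2str implementation.
// Assumes a fixed layout: three-digit grouping and exactly two decimals.
function formatNumber(value, decimalSep, groupSep) {
  // Split the absolute value into integer and fraction digits.
  const [intPart, fracPart] = Math.abs(value).toFixed(2).split(".");
  // Insert the grouping separator before every block of three digits.
  const grouped = intPart.replace(/\B(?=(\d{3})+$)/g, groupSep);
  return (value < 0 ? "-" : "") + grouped + decimalSep + fracPart;
}
```

Calling formatNumber(1234.56, ",", ".") yields the continental European notation used in the table's example, while swapping the two separator arguments yields the English notation.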

Function
    Description

Value add_months(int months)
    Adds a number of months to the date value.

Value last_day()
    Sets the field value to the last day of the month of the date value.

Value first_day()
    Sets the field value to the first day of the month of the date value.

Value trunc()
Value trunc(double level)
Value trunc(int level)
    Sets the field value to the truncated number or date value. Level means the number of positions behind the comma or, in the case of a date, 5=months, 4=days, 3=hours, 2=minutes, 1=seconds, 0=milliseconds.

Value hexEncode()
    Encodes a String value in its hexadecimal representation. E.g. if the value is a string "a", the result would be "61".

Value hexDecode()
    Decodes a String value from its hexadecimal representation. E.g. if the value is a string "61", the result would be "a". If the input string has an odd length, a leading 0 will be silently added.
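The hexadecimal round trip described for hexEncode and hexDecode can be sketched as follows. These are hypothetical plain JavaScript functions written for this illustration, not the Kettle methods, and they assume every character fits in one byte (code below 256).

```javascript
// Hypothetical sketch of the encoding described above; NOT the Kettle code.
// Assumes every character has a code below 256 (two hex digits each).
function hexEncode(str) {
  let out = "";
  for (let i = 0; i < str.length; i++) {
    out += str.charCodeAt(i).toString(16).padStart(2, "0");
  }
  return out;
}

function hexDecode(hex) {
  // An odd-length input silently gets a leading 0.
  if (hex.length % 2 === 1) hex = "0" + hex;
  let out = "";
  for (let i = 0; i < hex.length; i += 2) {
    out += String.fromCharCode(parseInt(hex.slice(i, i + 2), 16));
  }
  return out;
}
```

The leading-zero rule means that decoding the single digit "a" is treated as "0a", which is the line feed character.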

11.6.45.4. JavaScript Examples

11.6.45.4.1. Remember the previous row

Sometimes it can be useful to know the value of the previous row. This can be accomplished by this piece of code:

    var prev_row;
    if (prev_row == null) prev_row = row;
    ...
    var previousName = prev_row.getString("Name", "-");
    ...
    prev_row = row;

Note that row is a special field that contains all values in the current row.
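The same remember-the-previous-row pattern can be tried outside of Spoon with a minimal, self-contained sketch. Here rows are plain JavaScript objects and makePreviousTracker is a hypothetical helper; inside the step itself, row is supplied by Kettle and the variable simply lives in the script scope.

```javascript
// Stand-alone sketch of the pattern; rows are plain objects here,
// whereas in the step "row" is provided by Kettle.
function makePreviousTracker() {
  let prevRow = null;                      // survives between calls
  return function process(row) {
    if (prevRow === null) prevRow = row;   // first row: it is its own predecessor
    const previousName = prevRow.Name;     // value taken from the previous row
    prevRow = row;                         // remember the current row for the next call
    return previousName;
  };
}
```

Each call to the returned function plays the role of one script execution: the first call returns the name of the first row itself, every later call returns the name seen one row earlier.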

11.6.45.4.2. Set the location name of an address to uppercase

    location.upper();

11.6.45.4.3. Extract information from a date field


    // Year/Month/Day representation:
    ymd = date_field.Clone().dat2str("yyyy/MM/dd").getString();

    // Day/Month/Year representation:
    dmy = date_field.Clone().dat2str("dd/MM/yyyy").getString();

    // Year/Month:
    ym = date_field.Clone().dat2str("yyyy/MM").getString();

    // Long description of the month in the local language:
    month_long_desc = date_field.Clone().dat2str("MMMM").initcap().getString();

    // Week of the year (1-53)
    week_of_year = date_field.Clone().dat2str("w").getInteger();

    // Day of week, short description (MON-SUN)
    day_of_week_short_desc = date_field.Clone().dat2str("EEE").upper().getString();

    // Day of the week (Monday-Sunday)
    day_of_week_desc = date_field.Clone().dat2str("EEEE").initcap().getString();

    // Day of week (1-7)
    day_of_week = date_field.Clone().dat2str("e").getInteger();

NOTE: If you don't use Clone(), the original value will be overwritten by the methods that work on the Kettle Values.
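Why Clone() matters can be shown with a tiny mutable object. MiniValue below is a hypothetical miniature written for this illustration, not the Kettle Value class; it only mimics the behaviour described in the note, namely that methods change the object they are called on.

```javascript
// Hypothetical miniature of a mutable value object; NOT the Kettle Value class.
class MiniValue {
  constructor(v) { this.v = v; }
  Clone() { return new MiniValue(this.v); }          // independent copy
  upper() {                                          // mutates in place
    this.v = String(this.v).toUpperCase();
    return this;
  }
  getString() { return String(this.v); }
}

const city = new MiniValue("Brussels");

// With Clone(): the conversion works on a copy, the field keeps its value.
const shouted = city.Clone().upper().getString();
const afterClone = city.getString();

// Without Clone(): the field itself is overwritten.
city.upper();
const afterDirect = city.getString();
```

After the first conversion the original still reads "Brussels"; after the direct call it reads "BRUSSELS", which is the overwrite the note warns about.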


11.6.46. Modified Java Script Value

[Icon and dialog screenshot: Modified Javascript values dialog]

11.6.46.1. General Description

This is a modified version of the 'JavaScript Values' step that provides better performance and an easier, expression based user interface for building JavaScript expressions. This step will also allow you to create multiple scripts for each step.

11.6.46.2. Java script functions

This section provides a tree view of your available scripts, functions, input fields and output fields.

Transformation Scripts: displays a list of scripts you have created in this step.
Transformation Constants: a list of pre-defined, static constants including SKIP_TRANSFORMATION, ERROR_TRANSFORMATION, and CONTINUE_TRANSFORMATION.
Transformation Functions: contains a variety of String, Numeric, Date, Logic and specialized functions you can use to create your script. To add a function to your script, simply double-click on the function or drag it to the location in your script where you wish to insert it.
Input Fields: a list of inputs coming into the step. Double-click or use drag and drop to insert the field into your script.
Output Fields: a list of outputs for the step.

11.6.46.3. Java Script

This section is where you edit the script for this step. You can insert functions, constants, input fields, etc. from the tree control on the left by double-clicking on the node you wish to insert or by dragging the object onto the Java Script panel.


11.6.46.4. Fields
The Fields table contains a list of variables from your script including the ability to add metadata like a descriptive name.

11.6.46.5. Extras
Get Variables button - Retrieves a list of variables from your script.
Test script button - Use this button to test the syntax of your script.

11.6.46.6. Java script internal API objects

You can use the following internal API objects (for reference see the classes in the source):

_TransformationName_: a String with the actual transformation name
_step_: the actual step instance of org.pentaho.di.trans.steps.scriptvalues_mod.ScriptValuesMod
rowMeta: the actual instance of org.pentaho.di.core.row.RowMeta
row: the actual instance of the actual data Object[]


11.6.47. Execute SQL script

[Icon and dialog screenshot: Execute SQL Scripts]

11.6.47.1. General description

You can execute SQL scripts with this step, either once, during the initialization phase of the transformation, or once for every input row that the step is given. The second option can be used to use parameters in SQL scripts. For example, if you want to create 5 tables (tab1, tab2, tab3, tab4 and tab5) you could make such a transformation:


The SQL script to execute might look like this:

    CREATE TABLE tab? ( a INTEGER );

The field name to specify as parameter is then the "count" sequence we defined in the second step.

NOTE: The execution of the transformation will halt when a statement in the script fails.
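How one statement is produced per input row can be sketched in plain JavaScript. The function bindParameters is a hypothetical helper for this illustration: it performs a naive textual replacement of each question mark and is only meant to show the idea, not how the step binds parameters internally.

```javascript
// Hypothetical helper: replace each "?" with the next parameter value.
// Naive textual substitution, for illustration only.
function bindParameters(sql, params) {
  let i = 0;
  return sql.replace(/\?/g, () => String(params[i++]));
}

// One statement per input row; "count" plays the role of the sequence field.
const statements = [];
for (let count = 1; count <= 5; count++) {
  statements.push(bindParameters("CREATE TABLE tab? ( a INTEGER );", [count]));
}
```

The resulting list holds the five CREATE TABLE statements for tab1 through tab5, one for each row delivered by the sequence.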

As an extra option, you can return the total number of inserts (INSERT INTO statements), updates (UPDATE table), deletes (DELETE FROM table) and reads (SELECT statements) by specifying the field names in the lower right of the dialog.

11.6.48. Dimension lookup/update

[Icon and dialog screenshot: Dimension Lookup/Update]

11.6.48.1. General description

This step type allows you to implement Ralph Kimball's slowly changing dimension for both types: Type I (update) and Type II (insert). Not only can you use this step for updating a dimension table, it can also be used for looking up values in a dimension.


In our dimension implementation each entry in the dimension table has the following properties:

Option: Description
Technical key: This is the primary key of the dimension.
Version field: Shows the version of the dimension entry (a revision number).
Start of date range: This is the fieldname containing the validity starting date.
End of date range: This is the fieldname containing the validity ending date.
Keys: These are the keys used in your source systems. For example: customer numbers, product id, etc.
Fields: These fields contain the actual information of a dimension.

As a result of the lookup or update operation of this step type, a field is added to the stream containing the technical key of the dimension. In case the entry is not found, the value of the dimension entry for not found (0 or 1, based on the type of database) is returned.

NOTE: This dimension entry is added automatically to the dimension table when the update is first run.
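The difference between a Type I update, a Type II insert and a lookup can be sketched with an in-memory table. Everything below is a hypothetical illustration: the property names (technicalKey, version, dateFrom, dateTo, naturalKey), the use of ISO date strings, and the choice of 0 as the "not found" key are assumptions, not the step's actual implementation.

```javascript
// In-memory stand-in for a dimension table; all names are hypothetical.
function createDimension() {
  return { rows: [], nextKey: 1 };
}

// Type II: close the current version and insert a new one.
function insertVersion(dim, naturalKey, fields, changeDate, endOfTime) {
  const current = dim.rows
    .filter(r => r.naturalKey === naturalKey)
    .sort((a, b) => b.version - a.version)[0];
  if (current) current.dateTo = changeDate;          // end the old validity range
  const row = {
    technicalKey: dim.nextKey++,
    version: current ? current.version + 1 : 1,
    dateFrom: changeDate,
    dateTo: endOfTime,
    naturalKey,
    ...fields
  };
  dim.rows.push(row);
  return row.technicalKey;
}

// Type I: overwrite the given fields in all versions of the entry.
function updateAllVersions(dim, naturalKey, fields) {
  dim.rows
    .filter(r => r.naturalKey === naturalKey)
    .forEach(r => Object.assign(r, fields));
}

// Lookup: return the technical key of the version valid at the given date.
function lookup(dim, naturalKey, date) {
  const hit = dim.rows.find(r =>
    r.naturalKey === naturalKey && r.dateFrom <= date && date < r.dateTo);
  return hit ? hit.technicalKey : 0;                 // 0 stands for "not found" here
}
```

A customer who moves gets a second version through insertVersion, so lookups with a reference date before the move still return the first technical key, while a corrected birth date would go through updateAllVersions and change every version.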

11.6.48.2. Options
The following table provides a more detailed description of the options for the Dimension Lookup/Update step:

Option: Description
Step name: Name of the step. Note: This name has to be unique in a single transformation.
Update the dimension?: Check this option if you want to update the dimension based on the information in the input stream. If this option is not enabled, the dimension only does lookups and only adds the technical key field to the streams.
Connection: Name of the database connection on which the dimension table resides.
Target schema: This allows you to specify a schema name to improve precision in the quoting and allow for table names with dots '.' in them.
Target table: Name of the dimension table.
Commit size: Setting this to 10 will generate a commit every 10 inserts or updates.
Cache size in rows: This is the cache size in number of rows that will be held in memory to speed up lookups by reducing the number of round trips to the database.
    Note: Only the last version of a dimension entry is kept in memory. If there are more entries passing than what can be kept in memory, the technical keys with the highest values are kept in memory in the hope that these are the most relevant.
    A cache size of 0 caches as many rows as possible, until your JVM runs out of memory. Use this option wisely, with dimensions that can't grow too large.
    A cache size of -1 means that caching is disabled.
Keys tab: Specify the names of the keys in the stream and in the dimension table. This will enable the step to do the lookup.
Fields tab: For each of the fields you need to have in the dimension, you can specify whether you want the values to be updated (for all versions, this is a Type I operation) or you want to have the values inserted into the dimension as a new version. In the example we used in the screenshot the birth date is something that's not variable in time, so if the birth date changes, it means that it was wrong in previous versions. It's only logical then, that the previous values are corrected in all versions of the dimension entry.
Technical key field: This indicates the primary key of the dimension. It is also referred to as Surrogate Key. Use the new name option to rename the technical key after a lookup. For example, if you need to lookup different types of products like ORIGINAL_PRODUCT_TK, REPLACEMENT_PRODUCT_TK, ...
Creation of technical key: Specify how the technical key is generated; options which are not available for your connection will be grayed out:
    Use table maximum + 1: A new technical key will be created from the maximum key in the table. Note that the new maximum is always cached, so that the maximum does not need to be calculated for each new row.
    Use sequence: Specify the sequence name if you want to use a database sequence on the table connection to generate the technical key (typical for Oracle e.g.).
    Use auto increment field: Use an auto increment field in the database table to generate the technical key (typical for DB2 e.g.).
Version field: Specifies the name of the field to store the version (revision number) in.
Stream Datefield: If you have the date at which the dimension entry was last changed, you can specify the name of that field here. It allows the dimension entry to be accurately described for what the date range concerns. If you don't have such a date, the system date will be taken.
Date range start field: Specify the names of the dimension entries start range.
Table daterange end: Specify the names of the dimension entries end range.
Get Fields button: Fills in all the available fields on the input stream, except for the keys you specified.
SQL button: Generates the SQL to build the dimension and allows you to execute this SQL.

11.6.48.3. Remarks
For the Stream date field: Consider adding an extra date field from System Info if you don't want the date ranges to be different all the time. For example, if you have extracts from a source system being done every night at midnight, consider adding date "Yesterday 23:59:59" as a field to the stream by using a Join step.

IMPORTANT NOTE: this needs to be a Date field. We isolate functionality and as such require you to do date type conversions in advance.

For the "Date range start and end fields": You can only enter a year in these fields, not a timestamp. If you enter a year YYYY (e.g. 2100), it will be used as timestamp "YYYY-01-01 00:00:00.00" in the dimension table.
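The year-to-timestamp rule above amounts to a simple string expansion, sketched here with a hypothetical helper named yearToTimestamp (an illustration of the rule as stated, not code taken from the step):

```javascript
// Hypothetical helper illustrating the rule above: a year YYYY becomes
// the timestamp "YYYY-01-01 00:00:00.00".
function yearToTimestamp(year) {
  return String(year).padStart(4, "0") + "-01-01 00:00:00.00";
}
```

For the example year 2100 this gives "2100-01-01 00:00:00.00".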


11.6.49. Combination lookup/update

[Icon and dialog screenshot: Combination Lookup/Update]

11.6.49.1. General description

This step allows you to store information in a junk-dimension table, and can possibly also be used to maintain Kimball pure Type 1 dimensions. In short, what it will do is:

1. Look up the combination of business key field1... fieldn from the input stream in a dimension table;
2. If this combination of business key fields exists, return its technical key (surrogate id);
3. If this combination of business keys doesn't exist yet, insert a row with the new key fields and return its (new) technical key;
4. Put all input fields on the output stream including the returned technical key, but remove all business key fields if "remove lookup fields" is true.

So what this step does is create/maintain a technical key out of data with business keys. After passing through this step all of the remaining data changes for the dimension table can be made as updates, as either a row for the business key already existed or was created. This step will only maintain the key information; you will still need to update the non-key information in the dimension table, e.g. by putting an update step (based on technical key) after the combination update/lookup step.

Kettle will store the information in a table where the primary key is the combination of the business key fields in the table. Because this process can be very slow in case you have a large number of fields, Kettle also supports a "hash code" field representing all fields in the dimension. This can speed up lookup performance dramatically while limiting the fields to index to 1.
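The four steps above, together with the hash code idea, can be sketched in plain JavaScript. All names below (hashKeys, createCombinationTable, lookupOrInsert) are hypothetical, the table is an in-memory Map rather than a database table, and the hash is a simple 32-bit string hash chosen for the illustration rather than the hash Kettle computes.

```javascript
// Hypothetical hash over the key field values; like any hash it can collide.
function hashKeys(values) {
  let h = 0;
  for (const ch of values.join("\u0001")) {
    h = (h * 31 + ch.codePointAt(0)) | 0;   // keep it a 32-bit integer
  }
  return h;
}

// In-memory stand-in for the junk-dimension table, indexed on the hash only.
function createCombinationTable() {
  return { byHash: new Map(), nextKey: 1 };
}

// Return the technical key for a combination, inserting it when missing.
function lookupOrInsert(table, keyValues) {
  const hash = hashKeys(keyValues);
  const bucket = table.byHash.get(hash) || [];
  // The hash only narrows the search; the key fields still have to match.
  const hit = bucket.find(r => r.keys.every((v, i) => v === keyValues[i]));
  if (hit) return hit.technicalKey;
  const row = { technicalKey: table.nextKey++, keys: keyValues.slice() };
  bucket.push(row);
  table.byHash.set(hash, bucket);
  return row.technicalKey;
}
```

Because two different combinations may share a hash, the sketch keeps a list of rows per hash value and compares the key fields themselves before deciding that a combination already exists.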


11.6.49.2. Options
Option: Description
Step name: Name of the step. Note: This name has to be unique in a single transformation.
Update the dimension?: Check this option if you want to update the dimension based on the information in the input stream. If this option is not enabled, the dimension only does lookups and only adds the technical key field to the streams.
Connection: Name of the database connection on which the dimension table resides.
Target schema: This allows you to specify a schema name to improve precision in the quoting and allow for table names with dots '.' in them.
Target table: Name of the dimension table.
Commit size: Setting this to 10 will generate a commit every 10 inserts or updates.
Cache size in rows: This is the cache size in number of rows that will be held in memory to speed up lookups by reducing the number of round trips to the database.
    Note: Only the last version of a dimension entry is kept in memory. If there are more entries passing than what can be kept in memory, the technical keys with the highest values are kept in memory in the hope that these are the most relevant.
    A cache size of 0 caches as many rows as possible, until your JVM runs out of memory. Use this option wisely, with dimensions that can't grow too large.
    A cache size of -1 means that caching is disabled.
Key fields: Specify the names of the keys in the stream and in the dimension table. This will enable the step to do the lookup.
Technical key field: This indicates the primary key of the dimension. It is also referred to as Surrogate Key. Use the new name option to rename the technical key after a lookup. For example, if you need to lookup different types of products like ORIGINAL_PRODUCT_TK, REPLACEMENT_PRODUCT_TK, ...
Creation of technical key: Specify how the technical key is generated; options which are not available for your connection will be grayed out:
    Use table maximum + 1: A new technical key will be created from the maximum key in the table. Note that the new maximum is always cached, so that the maximum does not need to be calculated for each new row.
    Use sequence: Specify the sequence name if you want to use a database sequence on the table connection to generate the technical key (typical for Oracle e.g.).
    Use auto increment field: Use an auto increment field in the database table to generate the technical key (typical for DB2 e.g.).
Remove lookup fields?: Enable this option if you want to remove all the lookup fields from the input stream in the output. The only extra field added is then the technical key.
Use hashcode: This option allows you to generate a hash code, representing all values in the key fields in a numerical form (a signed 64 bit integer). This hash code has to be stored in the table.
    IMPORTANT: This hash code is NOT unique. As such it makes no sense to place a unique index on it.
Table daterange end: Specify the names of the dimension entries end range.
Get Fields button: Fills in all the available fields on the input stream, except for the keys you specified.
SQL button: Generates the SQL to build the dimension and allows you to execute this SQL.

11.6.49.3. Remarks
The Combination Lookup/Update step assumes that the dimension table it maintains is not updated concurrently by other transformations/applications. When you use e.g. the "Table Max + 1" method to create the technical keys, the step will not always go to the database to retrieve the next highest technical key. The technical keys will be cached locally, so if multiple transformations would update the dimension table simultaneously you will most likely get errors on duplicate technical keys. Even when using a sequence or an auto increment field to generate the technical key, it is still not advised to concurrently do updates to a dimension table because of possible conflicts between transformations.

It is assumed that the technical key is the primary key of the dimension table or at least has a unique index on it. It's not 100% required, but if a technical key exists multiple times in the dimension table the result for the Combination Lookup/Update step is unreliable.

11.6.50. Mapping

[Icon and dialog screenshot: Mapping (execute sub-transformation)]

11.6.50.1. General description & use

When you need to do a certain transformation over and over again, you can turn the repetitive part into a mapping. A mapping is a transformation that:

- specifies what the input is going to be, using a Mapping Input step
- specifies how the input fields are transformed: the fields that are added and deleted

For example, you can create mappings for dimension lookups so that you don't need to enter the natural keys etc. every time again. In the dialog, you can specify the transformation name and even launch another Spoon window to edit the selected transformation.


Example:
Suppose we want to create a mapping that does a lookup in the customer slowly changing dimension. In a larger warehouse, you need to specify the details for the dimension in question every time again. To get better re-use we want to create a mapping. The details needed for the dimension lookup are in this case the customer number and the lookup reference date. These 2 inputs we specify in the mapping input step:

After this we can perform any calculation in our reusable transformation (Mapping), in our case we do a lookup in the dimension:


This dimension lookup step adds one field to the equation: customer_tk. We can specify the fields that were added in the Mapping Output step:

The complete mapping looks like this:

When we want to re-use this mapping, this is how we can do it:

As you can see, the way we do it is by "mapping" the stream fields of our choice to the required input fields of the mapping. Any added fields we can re-name on the output side.

11.6.51. Get rows from result

[Icon and dialog screenshot: Get rows from previous result]

11.6.51.1. General description

This step returns rows that were previously generated by another transformation in a job. The rows were passed on to this step using the "Copy rows to result" step.

To allow you to design more easily, you can enter the meta-data of the fields you're expecting from the previous transformation in a job.

IMPORTANT: no validation of the supplied metadata is done at this time to allow for greater flexibility. It is just an aid at design time.

11.6.52. Copy rows to result

[Icon and dialog screenshot: Copy rows to result strings]

11.6.52.1. General description

This step allows you to transfer rows of data (in memory) to the next transformation (job entry) in a job.

11.6.53. Set Variable

[Icon and dialog screenshot: Set Environment Variables]

11.6.53.1. General description

This step allows you to set variables in a job or in the virtual machine. It accepts one (and only one) row of data to set the value of a variable. Here are the possible scope settings:

Valid in the virtual machine: the complete virtual machine will know about this variable.
    WARNING: this makes your transformation only fit to run in a stand-alone fashion. Running on an application server like on the Pentaho framework can become a problem. That is because other transformations running on the server will also see the changes this step makes.
Valid in the parent job: the variable is only valid in the parent job.
Valid in the grand-parent job: the variable is valid in the grand-parent job and all the child jobs and transformations.
Valid in the root job: the variable is valid in the root job and all the child jobs and transformations.

11.6.53.2. Variable usage

Refer to Variables for a description of the use of variables.

11.6.54. Get Variable

[Icon and dialog screenshot: Get Variable]

11.6.54.1. General description

This step allows you to get the value of a variable. This step can return rows or add values to input rows.

NOTE: You need to specify the complete variable specification in the format ${variable} or %%variable%% (as described in Variables). That means you can also enter complete strings in the variable column, not just a variable.

For example, you can specify: ${java.io.tmpdir}/kettle/tempfile.txt and it will be expanded to /tmp/kettle/tempfile.txt on Unix-like systems.
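The expansion of a string that mixes literal text and variable references can be sketched as follows. The function expandVariables is a hypothetical resolver written for this illustration; it takes its values from a plain object instead of the Kettle variable space, and leaving unknown variables untouched is an assumption.

```javascript
// Hypothetical resolver: expands ${name} and %%name%% from a plain object.
// Unknown variables are left as they are (an assumption of this sketch).
function expandVariables(text, variables) {
  const lookup = (match, name) =>
    Object.prototype.hasOwnProperty.call(variables, name) ? variables[name] : match;
  return text
    .replace(/\$\{([^}]+)\}/g, lookup)   // ${variable}
    .replace(/%%([^%]+)%%/g, lookup);    // %%variable%%
}
```

With { "java.io.tmpdir": "/tmp" } as the variable set, the example string from the text expands to /tmp/kettle/tempfile.txt.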


11.6.55. Get files from result

[Icon and dialog screenshot: Get files from result]

11.6.55.1. General description

Every time a file gets processed, used or created in a transformation or a job, the details of the file, the job entry, the step, etc. are captured and added to the result. You can access this file information using this step.

These are the output fields:

Field name (Type): Example
type (String): Normal, Log, Error, Error-line, etc.
filename (String): somefile.txt
path (String): C:\Foo\Bar\somefile.txt
parentorigin (String): Process files transformation
origin (String): Text File Input
comment (String): Read by text file input
timestamp (Date): 2006-06-23 12:34:56

11.6.56. Set files in result

[Icon and dialog screenshot: Set file in result]

11.6.56.1. General description

This step can be used to route the list of files to the results stream. For example, the Mail job entry can use this list of files to attach to a mail, so perhaps you don't want all files sent, but only a certain selection. For this, you can create a transformation that sets exactly those files you want to attach.

11.6.57. Injector

[Icon and dialog screenshot: Row Injector]

11.6.57.1. General description

Injector was created for those people that are developing special purpose transformations and want to 'inject' rows into the transformation using the Kettle API and Java. Among other things you can build 'headless' transformations with it: transformations that have no input at design time and do not read from file or database.

Here is some information on how to do it:

o You can ask a Trans object for a RowProducer object
o Also see the use case test in package: be.ibridge.kettle.test.rowproducer
o Use this type of code:

    Trans trans = new Trans(... TransMeta ...);
    trans.prepareExecution(args);
    RowProducer rp = trans.addRowProducer(String stepname, int stepCopy);

After that you start the threads in the transformation. Then you can inject the rows while the transformation is running:

    trans.startThreads();
    ...
    rp.putRow(Row SomeRowYouHaveToInject);
    ...

You can also specify the rows you are expecting to be injected. This makes it easier to build transformations because you have the meta-data at design time.

11.6.58. Socket reader

[Icon and dialog screenshot: Socket reader]

11.6.58.1. General description and use

Socket readers are used to transfer data from one server to another over TCP/IP. The primary use for these steps is in-line in a clustering environment. If you want to use these yourself, make sure to synchronize the preparation and start cycles of the transformations between the hosts (like the clustered transformation does).

11.6.59. Socket writer

[Icon and dialog screenshot: Socket writer]

11.6.59.1. General description and use

Socket writers are used to transfer data from one server to another over TCP/IP. The primary use for these steps is in-line in a clustering environment. If you want to use these yourself, make sure to synchronize the preparation and start cycles of the transformations between the hosts (like the clustered transformation does).

11.6.60. Aggregate Rows

[Icon and dialog screenshot: Aggregate Rows]

11.6.60.1. General description

This step type allows you to quickly aggregate rows based upon all the rows. These are the available aggregation types:

SUM - the sum of a field
AVERAGE - the average of a field
COUNT - the number of (not null) values
MIN - the minimum value of a field
MAX - the maximum value of a field
FIRST - the first value of a field
LAST - the last value of a field

NOTE: This step type is deprecated. See the Group By step for a more powerful way of aggregating rows of data. The aggregate step can be removed in a future version.
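The seven aggregation types can be sketched over an array of plain row objects. The function aggregate below is a hypothetical illustration, not the step's implementation; in particular, skipping null values for every aggregate (not only COUNT) and returning null for an empty input are assumptions of this sketch.

```javascript
// Aggregate one field over all rows. Null/undefined values are skipped for
// every aggregate, which is an assumption of this sketch.
function aggregate(rows, field) {
  const values = rows
    .map(r => r[field])
    .filter(v => v !== null && v !== undefined);
  const sum = values.reduce((a, b) => a + b, 0);
  const any = values.length > 0;
  return {
    SUM: sum,
    AVERAGE: any ? sum / values.length : null,
    COUNT: values.length,
    MIN: any ? Math.min(...values) : null,
    MAX: any ? Math.max(...values) : null,
    FIRST: any ? values[0] : null,
    LAST: any ? values[values.length - 1] : null
  };
}
```

Because the aggregation runs over all rows, the result is a single output row holding one value per aggregation type.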


11.6.61. Streaming XML Input

[Icon and dialog screenshot: Streaming XML]

11.6.61.1. General description

The purpose of this step is to provide value parsing. This step is based on a SAX parser to provide better performance with larger files. It is very similar to Xml Input; there are only differences in the content and field tabs. The following sections describe in detail the properties and settings available for the Streaming XML input step.

11.6.61.2. File Tab

Option: Description
Step name: Name of the step. This name has to be unique in a single transformation.
File or directory: This field specifies the location and/or name of the input text file.
    NOTE: press the "add" button to add the file/directory/wildcard combination to the list of selected files (grid) below.
Regular expression: Specify the regular expression you want to use to select the files in the directory specified in the previous option.
Selected Files: This table contains a list of selected files (or wildcard selections) along with a property specifying if the file is required or not. If a file is required and it isn't found, an error is generated. Otherwise, the filename is simply skipped.
Show filename(s)...: Displays a list of all files that will be loaded based on the current selected file definitions.

11.6.61.3. Content

Option | Description
Include filename in output & fieldname | Check this option if you want to have the name of the XML file to which the row belongs in the output stream. You can specify the name of the field where the filename will end up in.
Rownum in output & fieldname | Check this option if you want to have a row number (starts at 1) in the output stream. You can specify the name where the integer will end up in.
Limit | You can specify the maximum number of rows to read here.
Location | Specify the path, by way of elements, to the repeating part of the XML file. The element column is used to specify the element and position as follows:
A: still specifies an attribute.
Ep: specifies an element defined by position (equivalent to E in the original XML Input).
Ea: specifies an element defined by an attribute and allows value parsing.
Example:
Ep=element/1 is the first element called 'element'.
Ea=element/att:val is the element called 'element' that has an attribute called 'att' with the value 'val'.

11.6.61.4. Fields

Option | Description
Name | Name of the field.
Type | Type of the field can be either String, Date or Number.
Format | See Number Formats for a complete description of format symbols.
Length | For Number: total number of significant figures in a number; for String: total length of the string; for Date: length of the printed output of the string (e.g. 4 only gives back the year).
Precision | For Number: number of floating point digits; for String, Date, Boolean: unused.
Currency | Used to interpret numbers like $10,000.00 or €5.000,00.
Decimal | A decimal point can be a "." (10,000.00) or a "," (5.000,00).
Group | A grouping symbol can be a "," (10,000.00) or a "." (5.000,00).
Trim type | Trim this field (left, right, both) before processing.
Null if | Treat this value as NULL.
Repeat | Y/N: If the corresponding value in this row is empty, repeat the one from the last time it was not empty.
Position | The position of the XML element or attribute. You use the following syntax to specify the position of an element:
The first element called "element": E=element/1
The first attribute called "attribute": A=attribute/1
The first attribute called "attribute" in the second "element" tag: E=element/2, A=attribute/1

NOTE: You can auto-generate all the possible positions in the XML file supplied by using the "Get Fields" button.

NOTE: Support was added for XML documents where all the information is stored in the Repeating (or Root) element. The special R= locator was added to allow you to grab this information. The "Get fields" button finds this information if it's present.

11.6.61.5. Streaming XML Example

Consider the following XML:

Suppose that we are interested in cars. We must specify the location of the repeating element like this:

Now let's look at the fields. We have different 'property' elements that are differentiated by their 'name' attribute, and we want the fields 'brand', 'type' and 'power' according to the 'name' attribute. For this, we must specify the association between 'property' and 'name' in the first grid.

Pressing the "Get Fields" button retrieves the right fields, including properties. Let us now try leaving the new grid empty.

You can see that in this case the step works like the original XML Input and retrieves fields by their position. It is better to use value parsing, because you get the right field names and missing elements will not corrupt results (for example, a missing <property name='power'> </property> in some rows).
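The difference between positional parsing and value parsing can be sketched with a small SAX handler in Python. The XML document below is hypothetical: it is built only from the element and attribute names mentioned in the text ('property', 'name', 'brand', 'type', 'power'), and the 'cars' and 'car' element names, the sample values and the CarHandler class are assumptions made for illustration.

```python
import xml.sax

# Hypothetical input; the second car has no 'power' property.
SAMPLE = b"""<cars>
  <car>
    <property name="brand">Brand A</property>
    <property name="type">Type 1</property>
    <property name="power">100</property>
  </car>
  <car>
    <property name="brand">Brand B</property>
    <property name="type">Type 2</property>
  </car>
</cars>"""


class CarHandler(xml.sax.ContentHandler):
    """Builds one row per <car>, keyed by the 'name' attribute of <property>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._field = None
        self._text = []

    def startElement(self, name, attrs):
        if name == "car":
            self._row = {}
        elif name == "property" and self._row is not None:
            self._field = attrs.get("name")
            self._text = []

    def characters(self, content):
        if self._field is not None:
            self._text.append(content)

    def endElement(self, name):
        if name == "property" and self._field is not None:
            self._row[self._field] = "".join(self._text).strip()
            self._field = None
        elif name == "car":
            self.rows.append(self._row)
            self._row = None


def parse_cars(data):
    """Stream the document through SAX and return the collected rows."""
    handler = CarHandler()
    xml.sax.parseString(data, handler)
    return handler.rows
```

Because each value is stored under the name taken from the attribute, the missing 'power' property in the second car simply leaves that field out instead of shifting the remaining values into the wrong columns.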

11.6.62. Abort

Icon

Abort

11.6.62.1. General description

This step type allows you to abort a transformation upon seeing input. Its main use is in error handling. For example, you can use this step so that a transformation is aborted after a given number of rows flow over an error hop.

11.6.62.2. Options

Option | Description
Step name | Name of the step. Note: This name has to be unique in a single transformation.
Abort threshold | The number of rows after which to abort the transformation. E.g. if the threshold is 0, the abort step will abort after seeing the first row. If the threshold is 5, the abort step will abort after seeing the sixth row.
Abort message | The message to put in the log upon aborting. If not filled in, a default message will be used.
Always log | Always log the rows processed by the Abort step. This allows the rows to be logged even though the log level of the transformation would normally not do it. This way you can always see in the log which rows caused the transformation to abort.
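The threshold rule can be read as "abort as soon as more than threshold rows have been seen". The short Python sketch below restates that rule; the function name abort_position is hypothetical and the code only mirrors the two examples given in the table, not the step's actual implementation.

```python
def abort_position(row_count, threshold):
    """Return the 1-based number of the row that triggers the abort, or None."""
    for seen in range(1, row_count + 1):
        # The abort happens once more than 'threshold' rows have arrived.
        if seen > threshold:
            return seen
    return None  # the threshold was never exceeded
```

With a threshold of 0 the first row triggers the abort, with a threshold of 5 the sixth row does, and a stream of exactly 5 rows passes a threshold of 5 without aborting.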

11.6.63. Oracle Bulk Loader

Icon

Oracle Bulk Loader

11.6.63.1. General description

This step type allows you to bulk load data to an Oracle database. It will write the data it receives to a proper load format and will then invoke Oracle SQL*Loader to transfer it to the specified table.

IMPORTANT: Just like all steps in the "Experimental" category, this step is not considered ready for production use by the author. In the specific case of the Oracle Bulk Loader we lacked the time to do extensive testing on it. Your feedback is most welcome as always.

11.6.63.2. Options

Option | Description
Step name | Name of the step. Note: This name has to be unique in a single transformation.
Connection | Name of the database connection on which the target table resides.
Target schema | The name of the schema for the table to write data to. This is important for data sources that allow for table names with dots '.' in it.
Target table | Name of the target table.
Sqldr path | Full path to the sqlldr utility (including sqlldr). If sqlldr is in the path of the executing application you can leave it as sqlldr.
Load method | Either "Automatic load (at the end)" or "Manual load (only creation of files)". Automatic load will start up sqlldr after receiving all input, with the arguments specified in this step. Manual load will only create a control and data file. This can be used as a back-door: you can have PDI generate the data and create e.g. your own control file to load the data (outside of this step).
Load action | Append, Insert, Replace, Truncate. These map to the sqlldr action to be performed.
Maximum errors | The number of rows in error after which sqlldr will abort. This corresponds to the "ERRORS" attribute of sqlldr.
Commit | The number of rows after which to commit. This corresponds to the "ROWS" attribute of sqlldr, whose meaning differs between a conventional and a direct path load.
Bind Size | Corresponds to the "BINDSIZE" attribute of sqlldr.
Read Size | Corresponds to the "READSIZE" attribute of sqlldr.
Control file | The name of the file used as control file for sqlldr.
Data file | The name of the data file in which the data will be written.
Log file | The name of the log file, optionally defined.
Bad file | The name of the bad file, optionally defined.
Discard file | The name of the discard file, optionally defined.
Encoding | Encodes data in a specific encoding. Any valid encoding can be chosen besides the ones in the drop down list.
Direct path | Switch on direct path loading, corresponds to DIRECT=TRUE in sqlldr.
Erase cfg/dat files after use | When switched on, the control and data file will be erased after loading.
Fields to load | This table contains a list of fields to load data from. Properties include:
Table field: Table field to be loaded in the Oracle table.
Stream field: Field to be taken from the incoming rows.
Date mask: Either "Date" or "Date mask", determines how dates/timestamps will be loaded in Oracle. When left empty, defaults to "Date" in case of dates.
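To make the mapping between the options above and SQL*Loader more concrete, here is a minimal Python sketch that assembles a sqlldr command line from such settings. How the step itself builds this call is not described in this guide, so the function build_sqlldr_command, its parameter names and any sample values are assumptions; only the sqlldr keywords (userid, control, log, bad, discard, errors, rows, bindsize, readsize, direct) are standard SQL*Loader parameters.

```python
def build_sqlldr_command(sqlldr_path, userid, control_file, log_file=None,
                         bad_file=None, discard_file=None, max_errors=None,
                         commit_rows=None, bind_size=None, read_size=None,
                         direct_path=False):
    """Assemble a sqlldr argument list from step-like settings (illustrative)."""
    args = [sqlldr_path, f"userid={userid}", f"control={control_file}"]
    optional = [
        ("log", log_file),
        ("bad", bad_file),
        ("discard", discard_file),
        ("errors", max_errors),      # "Maximum errors"
        ("rows", commit_rows),       # "Commit"
        ("bindsize", bind_size),     # "Bind Size"
        ("readsize", read_size),     # "Read Size"
    ]
    for keyword, value in optional:
        if value is not None:
            args.append(f"{keyword}={value}")
    if direct_path:
        args.append("direct=true")   # "Direct path"
    return args
```

The returned list has the shape expected by subprocess.run, which is roughly what the "Automatic load" method would need; with "Manual load" you would run a comparable command yourself against the generated control and data files.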

11.6.64. Append

Icon

Append

11.6.64.1. General description

This step type allows you to order the rows of two input hops. First, all of the rows of the "Head hop" will be read and output. After that, all of the rows of the "Tail hop" will be written to the output. If more than 2 hops need to be used, you can use multiple append steps in sequence.

11.6.64.2. Options

Option | Description
Step name | Name of the step. Note: This name has to be unique in a single transformation.
Head hop | Name of the hop of which the rows should be output first.
Tail hop | Name of the hop of which the rows should be output last.
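The head-then-tail ordering, and the chaining of several append steps, can be sketched in Python as follows. The function names are hypothetical and the streams are plain iterables standing in for hops.

```python
from itertools import chain


def append_streams(head_rows, tail_rows):
    """Yield every head row first, then every tail row."""
    return chain(head_rows, tail_rows)


def append_many(*streams):
    """Combine more than two streams by appending them in sequence."""
    result = iter(())
    for stream in streams:
        result = append_streams(result, stream)
    return result
```

append_many mirrors the advice above: each additional input is handled by one more append placed after the previous one.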

11.6.65. Regex Evaluation

Icon

Regex Evaluation

11.6.65.1. General description

This step type allows you to validate an input field against a regular expression. A regular expression (regex or regexp for short) is a special text string for describing a search pattern. For example, the equivalent regex for wildcard notations such as *.txt to find all text files in a file manager is: .*\.txt
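The wildcard example can be checked with a few lines of Python. This sketch assumes that the whole field value has to match the expression; the helper name is_text_file is hypothetical, and Python's re module is used only as a stand-in for the regular expression engine used by the step.

```python
import re

# Regex equivalent of the wildcard *.txt: any characters, a literal dot, then "txt".
TEXT_FILE = re.compile(r".*\.txt")


def is_text_file(name):
    """Return True when the complete name matches the pattern."""
    return TEXT_FILE.fullmatch(name) is not None
```

The backslash matters: without it the dot would match any character, so a name such as "notes_txt" would be accepted as well.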

11.6.65.2. Settings Tab

Option | Description
Step name | Name of the step. Note: This name has to be unique in a single transformation.
Field to evaluate | Name of the field to evaluate.
Result Fieldname | The name of the return field (boolean).
Regular expression | Put here the regular expression to match.
Use variable substitution | If the expression contains variables, select this option to have them replaced by their content.

11.6.65.3. Content

Option | Description
Ignore differences in Unicode encodings | Check to ignore differences. Note: This may improve performance, but be sure your data only contains US-ASCII characters.
Enables case-insensitive matching | By default, case-insensitive matching assumes that only characters in the US-ASCII charset are being matched. Unicode-aware case-insensitive matching can be enabled by specifying the 'Unicode-aware case...' flag in conjunction with this flag. Note: You can also enable this via the embedded flag expression (?i).
Permit whitespace and comments in pattern | When enabled, the step will ignore whitespace and embedded comments starting with # through the end of the line. Note: Comments mode can also be enabled via the embedded flag expression (?x).
Enable dotall mode | When enabled, the expression '.' matches any character including the line terminator. By default, this expression does not match the line terminators. Note: Dotall mode can also be enabled via the flag expression (?s).
Enable multiline mode | When enabled, the expressions '^' and '$' match just after or just before, respectively, a line terminator or the end of the input sequence. By default, these expressions only match at the beginning and the end of the entire input sequence. Note: Multiline mode can also be enabled via the flag expression (?m).
Enable Unicode-aware case folding | When enabled, in conjunction with the Case-insensitive flag, case-insensitive matching is done in a manner consistent with the Unicode standard. By default, case-insensitive matching assumes that only characters in the US-ASCII charset are being matched. Note: Unicode-aware case folding can also be enabled via the embedded flag expression (?u).
Enables Unix lines mode | When enabled, only the '\n' line terminator is recognized in the behavior of '.', '^', and '$'. Note: Unix lines mode can also be enabled via the embedded flag expression (?d).
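The embedded flag expressions can be tried outside of Spoon. The sketch below uses Python's re module purely as a stand-in: the flags described above are those of Java regular expressions, Python has no equivalent of (?d) and treats (?u) differently, so only (?i), (?x), (?s) and (?m) are shown. The sample strings are made up for illustration.

```python
import re

# (?i): case-insensitive matching
case_insensitive = re.fullmatch(r"(?i)pentaho", "PENTAHO") is not None

# (?x): whitespace and #-comments inside the pattern are ignored
commented = re.fullmatch(r"(?x) \d{4}  # a four digit year", "2007") is not None

# (?s): '.' also matches a line terminator
dotall = re.fullmatch(r"(?s)a.b", "a\nb") is not None
without_dotall = re.fullmatch(r"a.b", "a\nb") is not None

# (?m): '^' and '$' match at every line, not only at the ends of the input
multiline = re.findall(r"(?m)^\w+$", "one\ntwo")
```

Placing the flag at the very start of the expression has the same effect as ticking the corresponding checkbox, which is convenient when the expression itself comes from a variable.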

11.6.66. CSV Input

Icon

CSV Input

11.6.66.1. General description

This step provides the ability to read data from a delimited file.

11.6.66.2. Options

The table below describes the options available for the CSV Input step:

Option | Description
Step name | Name of the step. Note: This name has to be unique in a single transformation.
Filename | Specify the name of the CSV file to read from.
Delimiter | Specify the file delimiter character used in the target file.
Enclosure | Specify the enclosure character used in the target file.
NIO buffer size | This step uses Non-Blocking I/O for increased performance. The buffer size is the number of bytes that will be read in one pass. Higher values typically lead to better performance, although it tops off quickly beyond a few MB. For *very* fast disks you might consider putting it in the >10MB range.
Lazy conversion | Lazy conversion will avoid unnecessary data type conversions and can result in significant performance improvements. Check to enable.
Header row present? | Enable this option if the target file contains a header row containing column names.
Fields Table | This table contains an ordered list of fields to be read from the target file.
Preview button | Click to preview the data coming from the target file.
Get Fields button | Click to return a list of fields from the target file based on the current settings (i.e. Delimiter, Enclosure, etc.). All fields identified will be added to the Fields Table.
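The roles of the Delimiter, Enclosure and Header row options can be illustrated with Python's csv module. This is only a sketch of the concepts: the function read_delimited and the generated default field names are assumptions, and the real step reads through NIO buffers rather than from a string.

```python
import csv
import io


def read_delimited(text, delimiter=";", enclosure='"', header_row=True):
    """Parse delimited text into a list of rows keyed by field name."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter, quotechar=enclosure)
    rows = list(reader)
    if header_row and rows:
        names, data = rows[0], rows[1:]
    else:
        width = len(rows[0]) if rows else 0
        # Hypothetical default names, used when the file has no header row.
        names = [f"Field_{i:03d}" for i in range(1, width + 1)]
        data = rows
    return [dict(zip(names, row)) for row in data]
```

The enclosure is what allows a value to contain the delimiter itself, as in "Smith; John" in a semicolon-delimited file.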

11.6.67. Fixed File Input

Icon

Fixed File Input

11.6.67.1. General description

This step is used to read data from a fixed width file.

11.6.67.2. Options

The table below describes the options available for the Fixed File Input step:

Option | Description
Step name | Name of the step. Note: This name has to be unique in a single transformation.
Filename | Specify the name of the file to read from.
Line width in bytes | Specify the width of each record in the target file.
Line feeds present? | Check if the target file contains line feed characters.
NIO buffer size | This step uses Non-Blocking I/O for increased performance. The buffer size is the number of bytes that will be read in one pass. Higher values typically lead to better performance, although it tops off quickly beyond a few MB. For *very* fast disks you might consider putting it in the >10MB range.
Lazy conversion | Lazy conversion will avoid unnecessary data type conversions and can result in significant performance improvements. Check to enable.
Header row present? | Enable this option if the target file contains a header row containing column names.
Running in parallel? | Enable this option to have each copy of the step read a dedicated part of the text file. For example, if you run the step in 5 copies locally, each step will read one 5th of the rows in the file. If you run it in 5 copies across 10 slave servers, each 'Fixed Input' step will read one 50th of the rows in the file.
Fields Table | This table contains an ordered list of fields to be read from the target file.
Preview button | Click to preview the data coming from the target file.
Get Fields button | Click to return a list of fields from the target file based on the current settings. All fields identified will be added to the Fields Table.
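Because every record has the same width, the part of the file that belongs to each parallel copy can be computed without reading the file first. The Python sketch below shows one way to do that; the exact partitioning used by the step is not documented here, so the functions partition_rows and read_rows and the way leftover rows are distributed are assumptions.

```python
def partition_rows(total_rows, copies):
    """Split row numbers 0..total_rows-1 into one contiguous range per copy."""
    base, extra = divmod(total_rows, copies)
    ranges = []
    start = 0
    for copy_nr in range(copies):
        # The first 'extra' copies take one additional row each.
        size = base + (1 if copy_nr < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def read_rows(data, line_width, row_range):
    """Slice the fixed width records of one range out of the raw bytes."""
    first, last = row_range
    return [data[i * line_width:(i + 1) * line_width] for i in range(first, last)]
```

With 5 copies each range covers one 5th of the rows; with 5 copies on each of 10 slave servers you would call partition_rows with 50 copies, giving each step one 50th of the file.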

11.6.68. Microsoft Access Input

Icon

Microsoft Access Input

11.6.68.1. General description

This step provides the ability to read data from a Microsoft Access database. The following sections describe the available options for the Access Input step.

11.6.68.2. File Tab

Option | Description
Step name | Name of the step. Note: This name has to be unique in a single transformation.
File or directory | This field specifies the location and/or name of the input text file. NOTE: press the "add" button to add the file/directory/wildcard combination to the list of selected files (grid) below.
Regular Expression | Specify the regular expression you want to use to select the files in the directory specified in the previous option.
Selected files | This table contains a list of selected files (or wildcard selections) along with a property specifying if the file is required or not. If a file is required and it isn't found, an error is generated. Otherwise, the filename is simply skipped.
Show Filename(s) button | Displays a list of all files that will be loaded based on the current selected file definitions.
Preview rows button | Displays a preview of the data based on the current step configuration.

11.6.68.3. Content

Option | Description
Table | Specify the name of the table to read from or click browse to browse for a table.
Include filename in output? | Optionally allows you to insert a field containing the filename onto the stream.
Include tablename in output? | Optionally allows you to insert a field containing the tablename onto the stream.
Include rownum in output? | Optionally allows you to insert a field containing the row number onto the stream.
Reset Rownum per file | Optionally allows you to reset the row number for each file being read from.
Limit | Optionally specify a limit on the number of rows to read.

11.6.68.4. Fields

Option | Description
Name | Name of the field.
Column | The name of the column being read from.
Type | Type of the field can be either String, Date or Number.
Format | See Number Formats for a complete description of format symbols.
Length | For Number: total number of significant figures in a number; for String: total length of the string; for Date: length of the printed output of the string (e.g. 4 only gives back the year).
Precision | For Number: number of floating point digits; for String, Date, Boolean: unused.
Currency | Used to interpret numbers like $10,000.00 or €5.000,00.
Decimal | A decimal point can be a "." (10,000.00) or a "," (5.000,00).
Group | A grouping symbol can be a "," (10,000.00) or a "." (5.000,00).
Trim type | Trim this field (left, right, both) before processing.
Repeat | Y/N: If the corresponding value in this row is empty, repeat the one from the last time it was not empty.

11.6.69. LDAP Input

Icon

LDAP Input

11.6.69.1. General description

The LDAP Input step allows you to read information like users, roles and other data from an LDAP server. The following sections describe the available options for the LDAP Input step.

11.6.69.2. General Tab

Option | Description
Step name | Name of the step. Note: This name has to be unique in a single transformation.
Host | The hostname or IP address of the LDAP server.
Port | The TCP port to use. This is usually 389.
User Authentication | Enable this option if you want to pass authentication credentials to the LDAP server.
Username | Username for authenticating with the LDAP server.
Password | Password for authenticating with the LDAP server.
Test connection button | Click to test connecting to the LDAP server.
Search base | Specify the location in the directory from which the LDAP search begins.
Filter String | Specify the filter string for filtering the results.
Preview rows button | Click to preview rows based on the current step settings.

11.6.69.3. Content

Option | Description
Include rownum in output? | Optionally allows you to insert a field containing the row number onto the stream.
Rownum fieldname | Specify the name of the field to contain row numbers.
Limit | Optionally specify a limit on the number of rows to read.

11.6.69.4. Fields

Option | Description
Name | Name of the field.
Column | The name of the column being read from.
Type | Type of the field can be either String, Date or Number.
Format | See Number Formats for a complete description of format symbols.
Length | For Number: total number of significant figures in a number; for String: total length of the string; for Date: length of the printed output of the string (e.g. 4 only gives back the year).
Precision | For Number: number of floating point digits; for String, Date, Boolean: unused.
Currency | Used to interpret numbers like $10,000.00 or €5.000,00.
Decimal | A decimal point can be a "." (10,000.00) or a "," (5.000,00).
Group | A grouping symbol can be a "," (10,000.00) or a "." (5.000,00).
Trim type | Trim this field (left, right, both) before processing.
Repeat | Y/N: If the corresponding value in this row is empty, repeat the one from the last time it was not empty.

11.6.70. Closure Generator

Icon

Closure Generator

11.6.70.1. General description

This step is used to generate a Reflexive Transitive Closure Table for Pentaho's Mondrian relational OLAP engine. For more information on parent-child hierarchies in Mondrian and how closure tables can help improve performance, please refer to the Mondrian documentation. Technically, this step reads all input rows in memory and calculates all possible parent-child relationships. It attaches the distance (in levels) from parent to child.
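A reflexive transitive closure can be computed in a few lines once all parent-child pairs are in memory. The Python sketch below is an illustration under stated assumptions, not the step's implementation: the function closure_table, the (parent, child) tuple layout and the root_is_zero parameter (modelled on the "Root is zero" option) are hypothetical, and the input hierarchy is assumed to contain no cycles.

```python
def closure_table(pairs, root_is_zero=False):
    """Return (ancestor, descendant, distance) rows for (parent, child) pairs."""
    root = 0 if root_is_zero else None
    parent_of = {child: parent for parent, child in pairs}
    rows = []
    for child in parent_of:
        node, distance = child, 0
        # Walk up from the child to the root; distance 0 is the reflexive row.
        while node is not None and node != root:
            rows.append((node, child, distance))
            node = parent_of.get(node)
            distance += 1
    return rows
```

For a chain 1 -> 2 -> 3 this yields the rows (1, 3, 2), (2, 3, 1) and (3, 3, 0) for member 3, which is the kind of lookup a closure table makes cheap for an OLAP engine.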

11.6.70.2. Options

Option | Description
Step name | Name of the step. Note: This name has to be unique in a single transformation.
Parent ID field | The field name that contains the parent ID of the parent-child relationship.
Child ID field | The field name that contains the child ID of the parent-child relationship.
Distance field name | The name of the distance field that will be added to the output.
Root is zero (Integer)? | Check this box if the root of the parent-child tree is not empty (null) but zero (0).

11.6.70.3. Example

The example data shown below was taken from the Mondrian help pages on the subject of closure tables.

This transformation is available in directory samples/transformations/ in filename 'Closure generator - standard mondrian sample.ktr'.

11.6.71. Mondrian Input

Icon

Mondrian Input

11.6.71.1. General description

This step provides the ability to execute an MDX query against a Mondrian ROLAP cube and get the result back in a tabular format.

11.6.71.2. Options

Option | Description
Step name | Name of the step. Note: This name has to be unique in a single transformation.
Connection | The database connection to the database associated with the Mondrian cube you want to query.
MDX Query | Specify the MDX query you want to execute.
Catalog location | Specify the location of the Mondrian Schema file.
Preview button | Click to preview the data based on the current step settings.

11.6.72. Get Files Row Count

Icon

Get Files Row Count

11.6.72.1. General description

This step will return the row counts for one or more files. The following sections describe the available options for the Get Files Row Count step.

11.6.72.2. File Tab

Option | Description
Step name | Name of the step. Note: This name has to be unique in a single transformation.
File or directory | This field specifies the location and/or name of the input text file. NOTE: press the "add" button to add the file/directory/wildcard combination to the list of selected files (grid) below.
Regular Expression | Specify the regular expression you want to use to select the files in the directory specified in the previous option.
Selected Files | This table contains a list of selected files (or wildcard selections) along with a property specifying if the file is required or not. If a file is required and it isn't found, an error is generated. Otherwise, the filename is simply skipped.
Show Filename(s) | Displays a list of all files that will be loaded based on the current selected file definitions.
Preview Rows | Displays the content of the selected file.

11.6.72.3. Content

Option | Description
Rows Count fieldname | Name of the field that will contain the file(s) row count(s).
Rows Separator type | Specify the row separator type for generating the row count.
Row separator | When the Separator type is set to custom, this setting is used to specify a custom row separator.
Include files count in output? | Optionally allows you to insert a field containing the file(s) count onto the stream.
Files Count fieldname | Name of the field that will contain the file counts.
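How a row count depends on the row separator can be sketched as follows. The function count_rows is hypothetical, and the rule that a final row without a trailing separator still counts is an assumption; this guide does not state how the step treats that case.

```python
def count_rows(data, separator=b"\n"):
    """Count the rows in raw file content for a given row separator."""
    parts = data.split(separator)
    # A trailing separator leaves an empty last part, which is not a row.
    if parts and parts[-1] == b"":
        parts.pop()
    return len(parts)
```

Passing a different separator, for example b";", corresponds to choosing the custom separator type and filling in the Row separator option.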

11.6.73. Dummy Plugin

Icon

Dummy Plugin

11.6.73.1. General description

This step is provided as an example for developers on how to build a custom plugin. For more information on plugin development, refer to the plugin development page in the Pentaho documentation Wiki.

12. Job Settings

12.1. Description

Job Settings are options that control how a job behaves and how it logs what it is doing. To access Job Settings, select Job|Settings from the menubar.

12.2. Job Tab

This table describes all of the general Job Settings found on the Job tab:

Option | Description
Job Name | The name of the job. Note: This is required information if you want to save to a repository.
Description | Short description of the job, shown in the repository explorer.
Extended description | Long extended description of the job.
Status | Draft or production status.
Version | Version description.
Directory | The directory in the repository where the job is stored.
Created by | Displays the original creator of the job.
Created at | Displays the date and time when the job was created.
Last modified by | Displays the user name of the last user that modified the job.
Last modified at | Displays the date and time when the job was last modified.

12.3. Log Tab

This table describes all of the general Job Settings found on the Log tab:

Option | Description
Log connection | Use this connection to write to a log table.
Log table | Specifies the name of the log table (for example L_ETL).
Use batch-ID | Use a batch ID in the logging table.
Pass the batch-ID to job entries? | Check this if you want to pass the generated unique batch ID to (transformation) job entries in this job.
Use logfield to store logging in? | Check this if you want to store the logging of this job in the logging table in a long text field (CLOB).
SQL button | Generates the SQL needed to create the logging table and allows you to execute this SQL statement.

13. Job Entries

13.1. Description

A job entry is one part of a job. Job entries can provide you with a wide range of functionality, ranging from executing transformations to getting files from a web server. Please see below for a complete list of all available job entry types.

13.2. Job Entry Types

13.2.1. Start

Icon

Start

13.2.1.1. General description

Start is where the job starts to execute and is required before the job can be executed. Only unconditional job hops are available from a Start job entry. The start icon also contains basic scheduling functionality.

13.2.2. Dummy Job Entry

Icon

13.2.2.1. General Description

Use the Dummy job entry to do nothing in a job. This can be useful to make job drawings clearer or for looping. Dummy performs no evaluation.

13.2.3. Transformation

Icon

Transformation

13.2.3.1. General description

You can use the Transformation job entry to execute a previously defined transformation.

13.2.3.2. Options

Option | Description
Name of the job entry | The name of the job entry. This name has to be unique in a single job. A job entry can be placed several times on the canvas, however it will be the same job entry.
Name of transformation | The name of the transformation to start.
Repository directory | The directory in the repository where the transformation is located.
Filename | If you're not working with a repository, specify the XML filename of the transformation to start.
Specify log file | Check this if you want to specify a separate logging file for the execution of this transformation.
Name of log file | The directory and base name of the log file (for example C:\logs).
Extension of the log file | The filename extension (for example: log or txt).
Include date in filename | Adds the system date to the filename. (_20051231)
Include time in filename | Adds the system time to the filename. (_235959)
Logging level | Specifies the logging level for the execution of the transformation. See also the logging window in 15. Logging.
Copy previous results to arguments | The results from a previous transformation can be sent to this one using the "Copy rows to result" step.
Arguments | Specify the strings to use as arguments for the transformation.
Execute once for every input row | Support for "looping" has been added by allowing a transformation to be executed once for every input row.
Clear the list of result rows before execution | Checking this makes sure that the list of result rows is cleared before the transformation is started.
Clear the list of result files before execution | Checking this makes sure that the list of result files is cleared before the transformation is started.

NOTE: you can use variables ${path} in the filename and transformation name fields to specify the transformation to be executed.
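The log file options combine into a single file name. The Python sketch below shows one plausible way they fit together, using the date and time suffixes given as examples in the table; the function log_file_name, the order of the two suffixes and the sample base name are assumptions rather than documented behaviour.

```python
from datetime import datetime


def log_file_name(base, extension, now, include_date=False, include_time=False):
    """Build a log file name from a base name, optional suffixes and an extension."""
    name = base
    if include_date:
        name += now.strftime("_%Y%m%d")   # e.g. _20051231
    if include_time:
        name += now.strftime("_%H%M%S")   # e.g. _235959
    return f"{name}.{extension}"
```

Including the date or time gives every execution its own log file instead of one shared file.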

13.2.4. Job

Icon

Job

13.2.4.1. General description

You can use the Job job entry to execute a previously defined job.

WARNING: Although it is possible to create a recursive, never ending job that points to itself, you should be aware that this job will probably eventually fail with an out of memory or stack error.

13.2.4.2. Options

Option | Description
Name of the job entry | The name of the job entry. This name has to be unique in a single job. A job entry can be placed several times on the canvas, however it will be the same job entry.
Name of transformation | The name of the job to start.
Repository directory | The directory in the repository where the job is located.
Filename | If you're not working with a repository, specify the XML filename of the job to start.
Specify log file | Check this if you want to specify a separate logging file for the execution of this job.
Name of log file | The directory and base name of the log file (for example C:\logs).
Extension of the log file | The filename extension (for example: log or txt).
Include date in filename | Adds the system date to the filename. (_20051231)
Include time in filename | Adds the system time to the filename. (_235959)
Logging level | Specifies the logging level for the execution of the job. See also the logging window in 15. Logging.
Copy previous results to arguments | The results from a previous transformation can be sent to this job using the "Copy rows to result" step in a transformation.
Arguments | Specify the strings to use as arguments for the job.
Execute once for every input row | This implements looping. If the previous job entry returns a set of result rows, you can have this job executed once for every row found. One row is passed to this job at every execution. For example, you can execute a job for each file found in a directory using this option.

NOTE: you can use variables ${path} in the filename and job name fields to specify the job to be executed.

13.2.5. Shell

Icon

Shell

13.2.5.1. General description

You can use the Shell job entry to execute a shell script on the host where the job is running.

NOTE: Shell scripts can output text to the console window. This output will be transferred to the Kettle logging system. Doing this no longer blocks the shell script.

NOTE: On Windows, scripts are preceded by "CMD.EXE /C" (NT/XP/2000) or "COMMAND.COM /C" (95,98).

13.2.5.2. Options
Option | Description
Job entry name | The name of the job entry. This name has to be unique in a single job. A job entry can be placed several times on the canvas, however it will be the same job entry.
Script file name | The filename of the shell script to start. It should include the full path, else ${user.dir} is used as path.
Working directory | The directory that will be used as working directory for the shell script. The working directory only becomes active when the shell script starts, so "Filename" should still include the full path to the script. When the field is left empty or the working directory is invalid, ${user.dir} will be used as working directory.
Specify log file | Check this if you want to specify a separate logging file for the execution of this shell script.
Name of log file | The directory and base name of the log file (for example C:\logs).
Extension of the log file | The filename extension (for example: log or txt).
Include date in filename? | Adds the system date to the filename (_20051231).
Include time in filename? | Adds the system time to the filename (_235959).
Loglevel | Specifies the logging level for the execution of the shell. See also the logging window in 15. Logging.
Copy previous results to arguments? | The results from a previous transformation can be sent to the shell script (as arguments) using the "Copy rows to result" step.
Execute once for every input row | This implements looping. If the previous job entry returns a set of result rows, you can have this shell script executed once for every row found. One row is passed to this script at every execution in combination with the copy previous result to arguments. The values of the corresponding result row can then be found on command line argument $1, $2, ... (%1, %2, %3, ... on Windows).
Arguments table | Specify the strings to use as arguments for the shell script.
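The looping behaviour of "Execute once for every input row" can be sketched in Python as follows. This is only an illustration of the idea, not Kettle's implementation: the function name run_once_per_row and the sample rows are hypothetical, and a tiny Python program stands in for the shell script so that the sketch runs on any platform.

```python
import subprocess
import sys


def run_once_per_row(command, rows):
    """Run command once per result row, passing the row values as arguments."""
    exit_codes = []
    for row in rows:
        # Each value of the row becomes one positional argument
        # ($1, $2, ... in a shell script; %1, %2, ... on Windows).
        completed = subprocess.run(list(command) + [str(value) for value in row])
        exit_codes.append(completed.returncode)
    return exit_codes


# Hypothetical result rows from a previous transformation
rows = [("a.txt", 10), ("b.txt", 20)]
# Stand-in for a shell script: exits with 0 when it received exactly two arguments
command = [sys.executable, "-c", "import sys; sys.exit(0 if len(sys.argv) == 3 else 1)"]
print(run_once_per_row(command, rows))
```

Collecting one exit code per row makes it easy to see which execution failed.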

13.2.6. Mail

Icon: Job Mail

13.2.6.1. General description

You can use the Mail job entry to send an e-Mail.

13.2.6.2. Options
Option | Description
Name of the job entry | The name of the job entry. This name has to be unique in a single job. A job entry can be placed several times on the canvas, however it will be the same job entry.
Destination address | The destination for the e-Mail.
Use authentication | Check this if your SMTP server requires you to authenticate yourself.
Authentication user | The user name to authenticate with.
Authentication password | The password to authenticate with.
SMTP server | The mail server to which the mail has to be sent.
Reply address | The reply address for this e-Mail.
Subject | The subject of the e-Mail.
Include date in message | Check this if you want to include the date in the e-Mail.
Contact person | The name of the contact person to be placed in the e-Mail.
Contact phone | The contact telephone number to be placed in the e-Mail.
Comment | Additional comment to be placed in the e-Mail.
Attach files to message | Check this if you want to attach files to this message.
Select the result files types to attach | When a transformation (or job) processes files (text, Excel, dbf, etc.), an entry is added to the list of files in the result of that transformation or job. Specify the types of result files you want to add.
Zip files into a single archive | Check this if you want to zip all selected files into a single archive (recommended!).
The zip filename | Specify the name of the zip file that will be placed into the e-Mail.

NOTE: All text fields can be specified using (environment and Kettle) variables, possibly set in a previous transformation using the Set Variable step.

13.2.7. SQL

Icon: Execute SQL Script

13.2.7.1. General description

You can use the SQL job entry to execute an SQL script. This means a number of SQL statements separated by ";".

13.2.7.2. Options
Option | Description
Name of the job entry | The name of the job entry. This name has to be unique in a single job. A job entry can be placed several times on the canvas, however it will be the same job entry.
Database Connection | The database connection to use.
Use variable substitution? | Enables Kettle variables to be used in the SQL script.
SQL script | The SQL script to execute.

13.2.8. Get a file with FTP

Icon: Get a file with FTP

13.2.8.1. General description

You can use the FTP job entry to get one or more files from an FTP server.

13.2.8.2. Options
Option | Description
Job entry name | The name of the job entry. This name has to be unique in a single job. A job entry can be placed several times on the canvas, however it will be the same job entry.
FTP server name | The name of the server or the IP address.
User name | The user name to log into the FTP server.
Password | The password to log into the FTP server.
Remote directory | The remote directory on the FTP server from which we get the files.
Target directory | The directory on the machine on which Kettle runs in which you want to place the transferred files.
Wildcard | Specify a regular expression here if you want to select multiple files. For example: .*txt$ : get all text files. A.*[0-9]\.txt : files starting with A, ending with a number and .txt.
Use binary mode? | Check this if the files need to be transferred in binary mode.
Timeout | The FTP server timeout in seconds.
Remove files after retrieval? | Remove the files on the FTP server, but only after all selected files have been successfully transferred.
Don't overwrite files | Skip a file when a file with an identical name already exists in the target directory.
Use active FTP connection | Check this to use active mode FTP instead of the passive mode (default).
Control Encoding | The encoding to use for the FTP control instructions. The encoding matters, for example, for the FTP'ing of filenames when they contain special characters. For Western Europe and the USA "ISO-8859-1" should suffice. You can enter any encoding that is valid on your server.
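The two Wildcard examples in the options table can be tried out with a short Python sketch. It assumes that the regular expression must match the whole file name, which is an assumption of this sketch rather than something the guide states; the function name select_files and the sample file names are hypothetical.

```python
import re


def select_files(file_names, wildcard):
    """Return the file names that the regular expression matches completely."""
    pattern = re.compile(wildcard)
    return [name for name in file_names if pattern.fullmatch(name)]


# Hypothetical directory listing
names = ["report.txt", "A1.txt", "Asales7.txt", "Anotes.txt", "image.png"]

# All text files
print(select_files(names, r".*txt$"))
# Files starting with A, ending with a number and .txt
print(select_files(names, r"A.*[0-9]\.txt"))
```

Note that the dot before "txt" is escaped in the second expression; an unescaped dot would match any character.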

13.2.8.3. Notes
Some FTP servers do not allow files to be FTP'ed when they contain certain characters (spaces, for example). Therefore, when choosing filenames for files to be FTP'ed, be sure to check up front whether your particular FTP server is able to process your kind of filenames.

13.2.9. Table Exists

Icon: Table Exists

13.2.9.1. General description

You can use the Table Exists job entry to verify if a certain table exists on a database.

13.2.9.2. Options
Option | Description
Name of the job entry | The name of the job entry. This name has to be unique in a single job. A job entry can be placed several times on the canvas, however it will be the same job entry.
Database connection | The database connection to use.
Table name | The name of the database table to check.

13.2.10. File Exists

Icon: File Exists

13.2.10.1. General description

You can use the File Exists job entry to verify if a certain file exists on the server on which Kettle runs.

13.2.10.2. Options
Option | Description
Name of the job entry | The name of the job entry. This name has to be unique in a single job. A job entry can be placed several times on the canvas, however it will be the same job entry.
Filename | The name and path of the file to check for.

13.2.11. Get a file with SFTP

Icon: Get files with SecureFTP

13.2.11.1. General description

You can use the SFTP job entry to get one or more files from an FTP server using the Secure FTP protocol.

13.2.11.2. Options
Option | Description
Job entry name | The name of the job entry. This name has to be unique in a single job. A job entry can be placed several times on the canvas, however it will be the same job entry.
SFTP-server name / IP | The name of the SFTP server or the IP address.
Port | The TCP port to use. This is usually 22.
User name | The user name to log into the SFTP server.
Password | The password to log into the SFTP server.
Remote directory | The remote directory on the SFTP server from which we get the files.
Target directory | The directory on the machine on which Kettle runs in which you want to place the transferred files.
Wildcard | Specify a regular expression here if you want to select multiple files. For example: .*txt$ : get all text files. A.*[0-9]\.txt : files starting with A, ending with a number and .txt.
Remove files after retrieval? | Remove the files after they have been successfully transferred.

13.2.12. HTTP

Icon: HTTP Transfer

13.2.12.1. General description

You can use the HTTP job entry to get a file from a web server using the HTTP protocol.

13.2.12.2. Options
Option | Description
Name of the job entry | The name of the job entry. This name has to be unique in a single job. A job entry can be placed several times on the canvas, however it will be the same job entry.
URL | The URL to use (for example: http://kettle.pentaho.org).
Run for every result row | Check this if you want to run this job entry for every row that was generated by a previous transformation. Use the "Copy rows to result" step.
Input field which contains URL | The fieldname in the result rows to get the URL from.
Target filename | The target filename.
Append to specified target file | Append to the target file if it already exists.
Add date and time to target filename | Check this if you want to add date and time (yyyMMdd_HHmmss) to the target filename.
Target filename extension | Specify the target filename extension in case you're adding a date and time to the filename.
Upload file |
Username | The username to authenticate with. For Windows Domains, put the Domain in front of the user like this: DOMAIN\Username.
Password | The password to authenticate with.
Proxy server for upload | The HTTP proxy server name or IP address.
Proxy port | The HTTP proxy port to use (usually 8080).
Ignore proxy for hosts | Specify a regular expression matching the hosts you want to ignore, separated by the | character. For example: 127\.0\..*

13.2.13. Create a file

Icon: Create file

13.2.13.1. General description

You can use the Create a file job entry to create an empty file. This is useful for creating "trigger" files from within jobs.

13.2.13.2. Options
Option | Description
Job entry name | The name of the job entry. This name has to be unique in a single job.
File name | The name and path of the empty file to create.
Fail if file exists | The job entry will follow the failure outgoing hop when the file to be created already exists (empty or not) and this option is switched on. The default is on.

13.2.14. Delete a file

Icon: Delete file

13.2.14.1. General description

You can use the Delete a file job entry to delete a file (empty or not).

13.2.14.2. Options
Option | Description
Job entry name | The name of the job entry. This name has to be unique in a single job.
File name | The name and path of the file to delete.
Fail if file doesn't exist | The job entry will follow the failure outgoing hop when the file to be deleted does not exist anymore and this option is switched on. The default is off.

13.2.15. Wait for a file

Icon: Wait for file

13.2.15.1. General description

You can use the Wait for file job entry to wait for a file. This job entry will sleep and periodically check whether the specified file exists, after which the flow will continue. The job entry can either wait indefinitely for the file or it can time out after a certain time.

13.2.15.2. Options
Option | Description
Job entry name | The name of the job entry. This name has to be unique in a single job.
File name | The name and path of the file to wait for.
Maximum timeout | The maximum timeout in number of seconds, or 0 to wait indefinitely. This is the number of seconds after which the flow will continue even if the file was not created. When the timeout is reached, the "Success on timeout" option will determine whether the outgoing success or failure hop will be followed.
Check cycle time | The time in seconds between checks for the file. The file will be checked for at the start of the execution and then every "check cycle time" seconds until the maximum timeout is reached. A job can only be stopped every "check cycle time" seconds, because otherwise the job entry will be sleeping. A check cycle time of 30 or 60 seconds seems to be a good trade-off between the period until the file is detected and the required CPU usage.
Success on timeout | This option determines what to do when the "Maximum timeout" has been reached. If enabled, the job entry will evaluate to success and the success outgoing hop will be followed if the file is not detected after the timeout.
File size check | When this is switched on, the job entry will, after detecting the specified file, only continue if the file size hasn't changed during the last "check cycle time" seconds. This is useful, for example, if a file is created in place (although it's recommended to generate a file elsewhere and then move it in place).
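The interplay of the options above can be sketched as a simple polling loop in Python. This is a hedged illustration of the described behaviour, not the job entry's actual code; the function name wait_for_file and its parameter names are hypothetical and merely mirror the option names.

```python
import os
import tempfile
import time


def wait_for_file(path, max_timeout=0, check_cycle=60,
                  success_on_timeout=False, file_size_check=False):
    """Return True for the success hop and False for the failure hop."""
    start = time.monotonic()
    last_size = None
    while True:
        if os.path.exists(path):
            if not file_size_check:
                return True
            size = os.path.getsize(path)
            # Only continue once the size is unchanged since the previous check
            if size == last_size:
                return True
            last_size = size
        # A maximum timeout of 0 means: wait indefinitely
        if max_timeout > 0 and time.monotonic() - start >= max_timeout:
            return success_on_timeout
        time.sleep(check_cycle)


# Demonstration with a trigger file that already exists
with tempfile.TemporaryDirectory() as folder:
    trigger = os.path.join(folder, "trigger.txt")
    open(trigger, "w").close()
    print(wait_for_file(trigger, max_timeout=5, check_cycle=0.1))
```

A longer check cycle reduces the number of file system checks, at the cost of detecting the file later.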

13.2.16. File compare

Icon: File Compare

13.2.16.1. General description

You can use the File compare job entry to compare the contents of 2 files and control the flow of the job by it. When the contents of the files are the same, the success outgoing hop will be followed, else the failure hop will be followed.

13.2.16.2. Options
Option | Description
Job entry name | The name of the job entry. This name has to be unique in a single job.
File name 1 | The name and path of the first file to compare.
File name 2 | The name and path of the second file to compare.
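A content comparison of this kind can be sketched in Python with the standard filecmp module. The sketch only illustrates the success/failure decision described above; the function name files_have_same_content and the sample file names are hypothetical, and the job entry itself may compare files differently.

```python
import filecmp
import os
import tempfile


def files_have_same_content(file_name_1, file_name_2):
    """Return True (success hop) when both files have identical contents."""
    # shallow=False forces a comparison of the file contents
    # instead of only comparing the os.stat() signatures.
    return filecmp.cmp(file_name_1, file_name_2, shallow=False)


with tempfile.TemporaryDirectory() as folder:
    first = os.path.join(folder, "first.txt")
    second = os.path.join(folder, "second.txt")
    for name in (first, second):
        with open(name, "w") as handle:
            handle.write("same content\n")
    print(files_have_same_content(first, second))
```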

13.2.17. Put a file with SFTP

Icon: File Compare

13.2.17.1. General description

You can use the Put files with SFTP job entry to put one or more files to an FTP server using the Secure FTP protocol.

13.2.17.2. Options
Option | Description
Name of the job entry | The name of the job entry. This name has to be unique in a single job. A job entry can be placed several times on the canvas, however it will be the same job entry.
SFTP-server name (IP) | The name of the SFTP server or the IP address.
SFTP port | The TCP port to use. This is usually 22.
User name | The user name to log into the SFTP server.
Password | The password to log into the SFTP server.
Remote directory | The remote directory on the SFTP server to which we put the files.
Local directory | The directory on the machine on which Kettle runs from which you want to FTP the files.
Wildcard | Specify a regular expression here if you want to select multiple files. For example: .*txt$ : get all text files. A.*[0-9]\.txt : files starting with A, ending with a number and .txt.
Remove files after transferral? | Remove the files after they have been successfully transferred.

13.2.18. Ping a host

Icon: Ping a host

13.2.18.1. General description

You can use the Ping a host job entry to ping a host using the ICMP protocol.

13.2.18.2. Options
Option | Description
Name of the job entry | The name of the job entry. This name has to be unique in a single job. A job entry can be placed several times on the canvas, however it will be the same job entry.
Host name/IP | The name or the IP address of the host to ping.
Send...packets | The number of packets to send (by default 2).

13.2.19. Wait for

Icon: Wait for

13.2.19.1. General description

You can use the Wait for job entry to wait for a specified delay before running the next job entry.

13.2.19.2. Options
Option | Description
Name of the job entry | The name of the job entry. This name has to be unique in a single job. A job entry can be placed several times on the canvas, however it will be the same job entry.
Wait for | The delay to wait.
Unit time | Specify the unit of time (second, minute or hour).

13.2.20. Display Msgbox info

Icon: MsgBox Info

13.2.20.1. General description

This job entry allows you to display a message box in a job, so that you can easily see where you are in the process. This entry is only available when using the Graphical User Interface to execute the job.

13.2.20.2. Options
Option | Description
Name of the job entry | The name of the job entry. This name has to be unique in a single job. A job entry can be placed several times on the canvas, however it will be the same job entry.
Message title | The title of the message.
Message body | The message to display.

13.2.21. Abort Job

Icon: MsgBox Info

13.2.21.1. General description

Use this job entry if you want to abort a job.

13.2.21.2. Options
Option | Description
Name of the job entry | The name of the job entry. This name has to be unique in a single job. A job entry can be placed several times on the canvas, however it will be the same job entry.
Message | Message to add to the log when aborting.

13.2.22. XSL transformation

Icon: XSL Transformation

13.2.22.1. General description

The XSL transformation job entry is designed to transform XML documents into other documents (XML or another format, such as HTML or plain text) by applying an XSL document. The original document is not changed; rather, a new document is created based on the content of the XML file.

13.2.22.2. Options
Option | Description
Name of the job entry | The name of the job entry. This name has to be unique in a single job. A job entry can be placed several times on the canvas, however it will be the same job entry.
XML File name | The full name of the source XML file.
XSL File name | The full name of the XSL file.
Output File name | The full name of the created document (result of the XSL transformation).
If file exists | Define the behavior when an output file with the same name exists. Options: "Create new with unique name" (a new output file will be created), "Do nothing" (nothing will be done), "Fail" (the job will fail).

13.2.23. Zip files

Icon: Create a Zip file

13.2.23.1. General description

This step creates a standard ZIP archive using the options you specify in the dialog.

13.2.23.2. Options
Option | Description
Name of the job entry | The name of the job entry. This name has to be unique in a single job. A job entry can be placed several times on the canvas, however it will be the same job entry.
Source directory | The source directory of the files to be zipped.
Include wildcard | The wildcard (regular expression) of the files to include in the zip archive.
Exclude wildcard | The wildcard (regular expression) of the files to exclude from the zip archive.
Zip file name | The full name of the destination archive.
Compression | The compression level to be used (Default, Best Compression, Best speed).
If zip file exists | The action to take when there already is a file at the target destination.
After zipping | The action to take after zipping.
Move files to | The target directory to move the source files to after zipping.
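How the include and exclude wildcards might interact can be sketched in Python with the standard zipfile module. This is an illustration under assumptions, not the job entry's implementation: the function name zip_directory and the sample file names are hypothetical, only files directly inside the source directory are considered, and each regular expression is assumed to match the whole file name.

```python
import os
import re
import tempfile
import zipfile


def zip_directory(source_dir, zip_path, include=None, exclude=None):
    """Zip the files in source_dir that match include and do not match exclude."""
    include_re = re.compile(include) if include else None
    exclude_re = re.compile(exclude) if exclude else None
    added = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name in sorted(os.listdir(source_dir)):
            full_name = os.path.join(source_dir, name)
            if not os.path.isfile(full_name):
                continue
            if include_re and not include_re.fullmatch(name):
                continue
            if exclude_re and exclude_re.fullmatch(name):
                continue
            archive.write(full_name, arcname=name)
            added.append(name)
    return added


with tempfile.TemporaryDirectory() as source, tempfile.TemporaryDirectory() as target:
    for name in ("a.csv", "b.csv", "b_old.csv", "notes.txt"):
        open(os.path.join(source, name), "w").close()
    archive_name = os.path.join(target, "export.zip")
    print(zip_directory(source, archive_name, include=r".*\.csv", exclude=r".*_old\.csv"))
```

Writing the archive to a directory other than the source directory keeps the archive itself out of the file listing.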

13.2.24. Bulkload into MySQL

Icon: Bulkload into MySQL

13.2.24.1. General description

This step is used to perform bulk load operations from a flat file into a MySQL database table.

13.2.24.2. Options
Option | Description
Job entry name | The name of the job entry. This name has to be unique in a single job. A job entry can be placed several times on the canvas, however it will be the same job entry.
Database connection | The database connection used to write data to.
Target schema | The name of the schema for the table to write data to. This is important for data sources that allow for table names with dots '.' in them.
Target table name | The name of the table to write data to.
Source file name | Name of the text file to load data from.
Local | Enabled: the file is read by the client program on the client host and sent to the server. Disabled: the file must be located on the server host and is read directly by the server.
Priority | Specify the priority in MySQL for the bulk load.
Fields terminated by | Specify the fields delimiter in the source text file.
Fields enclosed by | Specify the enclosure character for fields in the source text file.
Fields escaped by | Specify the escape character for fields in the source text file.
Lines started by | Specify the character(s) used to indicate the start of a row in the source text file.
Lines terminated by | Specify the character(s) used to indicate the end of a row in the source text file.
Fields | Specify the names of attributes of <tableName> that are set by your data file (separated by commas). Any attributes unspecified in the list of attributes will be set to NULL.
Replace data | Enable this option to overwrite existing data in the target table.
Ignore the first ... lines | Optionally specify a number of lines to ignore.
Add files to result | Enable this to add the destination files to the results file names. This is useful if you want to attach these files to an email using the Email job entry.
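Several of these options correspond closely to clauses of MySQL's LOAD DATA statement. The Python sketch below shows how a subset of them could be combined into such a statement. It is an assumption that the job entry builds a statement of this shape; the function name build_load_data, its parameters, and the sample table and file names are hypothetical, and the quoting is deliberately naive.

```python
def build_load_data(file_name, table, local=True, replace=False,
                    fields_terminated_by=None, fields_enclosed_by=None,
                    ignore_lines=0, columns=()):
    """Assemble a MySQL LOAD DATA statement from a subset of the options."""

    def quote(text):
        # Naive quoting for illustration only: doubles single quotes
        return "'" + text.replace("'", "''") + "'"

    parts = ["LOAD DATA"]
    if local:
        parts.append("LOCAL")
    parts.append("INFILE " + quote(file_name))
    if replace:
        parts.append("REPLACE")
    parts.append("INTO TABLE " + table)
    field_clauses = []
    if fields_terminated_by is not None:
        field_clauses.append("TERMINATED BY " + quote(fields_terminated_by))
    if fields_enclosed_by is not None:
        field_clauses.append("ENCLOSED BY " + quote(fields_enclosed_by))
    if field_clauses:
        parts.append("FIELDS " + " ".join(field_clauses))
    if ignore_lines:
        parts.append("IGNORE %d LINES" % ignore_lines)
    if columns:
        parts.append("(" + ", ".join(columns) + ")")
    return " ".join(parts)


# Hypothetical file, table and column names
print(build_load_data("/tmp/sales.csv", "sales", fields_terminated_by=";",
                      fields_enclosed_by='"', ignore_lines=1,
                      columns=("id", "amount")))
```

The Priority, Fields escaped by, and Lines started by / terminated by options are left out to keep the sketch short.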

13.2.25. Get Mails from POP

Icon: Get Mails from POP

13.2.25.1. General description

This step provides the ability to read one or more emails from a POP server.

13.2.25.2. Options
Option | Description
Job entry name | The name of the job entry. This name has to be unique in a single job. A job entry can be placed several times on the canvas, however it will be the same job entry.
Source Host | The host name or IP address of the POP mail server.
Username | The username for authenticating to the POP server.
Password | The password for authenticating to the POP server.
Use POP with SSL | Enable this option to connect using a Secure Socket Layer (SSL) connection.
Port | When the SSL option is enabled, use this property to set the IP port for SSL communication with the POP server.
Target directory | Specify the target directory in which to store the retrieved emails.
Target filename pattern | Specify the regular expression wildcard used to identify the target filenames.
Retrieve | Use this to specify whether to retrieve all emails, unread emails, or a specific number of emails.
Retrieve the .. first emails | If the Retrieve property is set to 'First...emails', this property is used to specify the number of emails to retrieve.
Delete emails after | Enable this option to delete all retrieved emails from the POP server.

13.2.26. Delete Files

Icon: Delete Files

13.2.26.1. General description

This step is used to delete one or more files from a specified folder.

13.2.26.2. Options
Option | Description
Job entry name | The name of the job entry. This name has to be unique in a single job. A job entry can be placed several times on the canvas, however it will be the same job entry.
Include Subfolders | Enable this option to also delete matched files from subfolders of the target directory.
Copy previous results to args? | Enable this to pass the results of the previous entry to the arguments of this entry.
File/Folder | The target file or folder to delete files from.
Wildcard | The regular expression used to define the file name pattern for the files to delete.
Files/Folders Table | This table displays the list of currently defined files and folders to delete.

13.2.27. Success

Icon: Success

13.2.27.1. General description

This step is similar to the dummy step in that it does not do anything. Its main function is as a placeholder for routing the job flow upon successful evaluation.

13.2.27.2. Options
Option | Description
Success | The name of the job entry. This name has to be unique in a single job. A job entry can be placed several times on the canvas, however it will be the same job entry.

13.2.28. XSD Validator

Icon: XSD Validator

13.2.28.1. General description

This step will validate an XML file against an XML Schema Definition (XSD).

13.2.28.2. Options
Option | Description
Job entry name | The name of the job entry. This name has to be unique in a single job. A job entry can be placed several times on the canvas, however it will be the same job entry.
XML File name | Specify the name of the XML document to validate.
XSD File name | Specify the name of the XSD file used for validation of the XML document.

13.2.29. Write to log

Icon: Write to log

13.2.29.1. General description

This step provides the ability to write an entry to the execution log.

13.2.29.2. Options
Option | Description
Write to log | The name of the job entry. This name has to be unique in a single job. A job entry can be placed several times on the canvas, however it will be the same job entry.
Log level | Specify the log level condition for when the specified log message should be written to the log.
Log subject | Specify a short subject for the log message.
Log message | Specify the detailed message to be written to the log file.

13.2.30. Copy Files

Icon: Copy Files

13.2.30.1. General description

This step provides the ability to copy one or more files to another location.

13.2.30.2. General Tab

Option | Description
Job entry name | The name of the job entry. This name has to be unique in a single job. A job entry can be placed several times on the canvas, however it will be the same job entry.
Include Subfolders | Enable this option to also copy matched files from subfolders of the target directory.
Destination is a file | Enable this option if the target of the copy is a file.
Copy empty folders | If including subfolders, this option allows you to specify whether or not to copy empty folders.
Replace existing files | Enable this option to automatically overwrite any existing files.
Remove source files | Enable this option to remove the source files after the copy is completed.
Copy previous results to args | Enable this to pass the results of the previous entry to the arguments of this entry.
File/Folder source | Specify the source file or folder to copy.
File/Folder destination | Specify the target file or folder to copy files to.
Wildcard | The regular expression used to define the file name pattern for the files to copy.
File/Folders | This table displays the list of currently defined files and folders to copy.

13.2.30.3. Results files names

The 'Add files to result files name' option will add the destination files to the results file names. This is useful if you want to attach these files to an email using the Email job entry.

13.2.31. DTD Validator

Icon: DTD Validator

13.2.31.1. General description

This step provides the ability to validate an XML document against a Document Type Definition (DTD).

13.2.31.2. Options
Option | Description
Job entry name | The name of the job entry. This name has to be unique in a single job. A job entry can be placed several times on the canvas, however it will be the same job entry.
XML File name | Specify the XML document to validate.
DTD Intern | Enable this option if the DTD is defined within the XML document being validated.
DTD File name | Specify the file name containing the DTD used for validation.

13.2.32. Put a file with FTP

Icon: Put files with FTP

13.2.32.1. General description

You can use the Put files with FTP job entry to put one or more files on an FTP server using the FTP protocol.

13.2.32.2. Options
Option | Description
Name of this job entry | The name of the job entry. This name has to be unique in a single job. A job entry can be placed several times on the canvas, however it will be the same job entry.
FTP server name/IP address | The name of the FTP server or the IP address.
Port | The TCP port to use. This is usually 21.
Username | The user name to log into the FTP server.
Password | The password to log into the FTP server.
Local directory | The directory on the machine on which Kettle runs from which you want to FTP the files.
Remote directory | The remote directory on the FTP server to which we put the files.
Wildcard (regular expression) | Specify a regular expression here if you want to select multiple files. For example: .*txt$ : get all text files. A.*[0-9]\.txt : files starting with A, ending with a number and .txt.
Binary mode? | Enable this option to perform the transfer in binary mode.
Timeout | Specify the timeout period before ending in error.
Remove files after transferal? | Remove the files after they have been successfully transferred.
Don't overwrite files | Enable this option to prevent overwriting any existing files on the target FTP server.
Use active FTP connection | Enable this option to use an active FTP connection.
Control Encoding | Specify the character set to use for filenames and directories.

13.2.33. Unzip

Icon: Unzip

13.2.33.1. General description

This step is used to decompress a zip file to a specified location.

13.2.33.2. Options
Option | Description
Job entry name | The name of the job entry. This name has to be unique in a single job. A job entry can be placed several times on the canvas, however it will be the same job entry.
Zip File name | Specify the name of the file to unzip.
Target Directory | Specify the target directory to place the unzipped contents.
Include Wildcard | Specify a regular expression wildcard for the specific files you want to unzip.
Exclude Wildcard | Specify a regular expression wildcard for any files you want to exclude from the unzip process.
After extraction | Specify the action to take after unzipping the file. Options include do nothing, delete files, or move files to a specified location.
Move Files To | If the move files action is specified in the 'After Extraction' property, this field is used to identify the target location to move the files to.
Add extracted file to result | Enable this to add the destination files to the results file names. This is useful if you want to attach these files to an email using the Email job entry.

13.2.34. Dummy Job Entry

13.2.34.1. General description

This is an example plugin used to illustrate how to develop and deploy your own custom plugins into Kettle. The functionality of the Dummy Job Entry (Get files from SFTP) is arbitrary and only provided to highlight a working plugin. For more information about developing Kettle plugins, visit "Writing your own Pentaho Data Integration Plug-In" on the Pentaho Wiki.

14. Graphical View

14.1. Description
The Graphical View tab contains the canvas on which transformations and jobs are drawn. There will be a separate tab for each job and/or transformation you currently have open, with an icon indicating the file type:

Transformation Icon

Job Icon

The Graphical View tab(s) provide an easy to understand representation of the work that needs to be done and the flow of the data.

14.2. Adding steps or job entries

14.2.1. Create steps by drag and drop
Adding steps to a transformation (or a job entry to a job) on the canvas is easy: simply select a step type from the tree on the left and drag it onto the canvas.

At the location of the mouse you will see a square that represents the location of the step when you let go of the button. When you let go of the mouse button, the selected step (Text file input) will become part of the transformation.

You can also add a transformation step by right-clicking on the workspace and selecting New Step... | Step type.

14.3. Hiding a step

If you right-click on a step or job entry that is drawn on the graphical view, you will get a popup menu that allows you to select the option "Hide step". This will remove the step from the graphical view, but not delete it.

1%.%. Transfor(ation Ste o tions Aright2c,ic3 (en#B


$his section describes the right2click menu options 4raphical Aie . hen you right2click on a transformation step in the

1%.%.1. 7dit ste


$his opens the step dialog so that you can change its settings.

1%.%.2. 7dit ste descri tion


$his opens a dialog that allo s you to enter a te8tual description of the step.

1%.%.$. Data (o6e(ent


*ee %istribute or copy for a complete description of the available data movement options.

1%.%.%. Change n#(ber of co ies to start...


*ee ,aunching several copies of a step.

1%.%.&. Co 1 to c,i board


$his option allo s you to copy the G6, defining the step to the clipboard. Oou can then paste this step into another transformation.

1%.%.6. D# ,icate Ste


$his option ill create a copy" positioned a bit lo er to the right of the original step.

1%.%.7. De,ete ste


$his ill permanently remove the step from the transformation.

1%.%.*. >ide Ste


$his ill hide the step in the 4raphical Aie " but not remove it from the transformation.

1%.%.-. Sho/ in #t fie,ds


$his option tries to determine all the fields and their origin by tracing the input2streams back to their source.

1%.%.10. Sho/ o#t #t fie,ds


$his option adds the fields of the current step to the ones of the input fields and sho s the result.
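
As a rough illustration of the two options above, the following Python sketch traces fields back through the hops of a small transformation. It is not how Spoon implements these menu items. The step names, the `hops` and `adds` dictionaries and both function names are hypothetical and chosen only for this example.

```python
def input_fields(step, hops, adds):
    """Map each field arriving at `step` to the step it originates from.

    hops: step name -> list of upstream step names (hypothetical structure)
    adds: step name -> list of field names that step introduces
    """
    fields = {}
    for source in hops.get(step, []):
        # Whatever leaves the upstream step arrives at this step.
        for name, origin in output_fields(source, hops, adds).items():
            fields.setdefault(name, origin)
    return fields


def output_fields(step, hops, adds):
    """Input fields of `step` plus the fields the step itself adds."""
    fields = input_fields(step, hops, adds)
    for name in adds.get(step, []):
        fields.setdefault(name, step)
    return fields


# Hypothetical three-step transformation used only for illustration.
hops = {"Calculator": ["Text file input"], "Table output": ["Calculator"]}
adds = {"Text file input": ["id", "name"], "Calculator": ["total"]}

print(input_fields("Calculator", hops, adds))
print(output_fields("Calculator", hops, adds))
```

The output fields of a step are simply its input fields with its own fields appended, which mirrors the description of "Show output fields".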

14.5. Job entry options (right-click menu)

14.5.1. Open Transformation/Job

This opens a new tab displaying the selected transformation or job.

14.5.2. Edit job entry

This opens the dialog for the job entry, allowing you to change the settings.

14.5.3. Edit job entry description

This opens a dialog that allows you to enter a textual description of the job entry.

14.5.4. Create shadow copy of job entry

This option will create a copy, positioned a bit lower to the right of the original job entry.

14.5.5. Copy selected entries to clipboard (CTRL-C)

Copies the XML describing the selected job entries to the clipboard.

14.5.6. Align / distribute

This option allows you to keep the graph clean by aligning job entries with each other.

14.5.7. Detach entry

Unlinks this job entry from the hops that connect it to other job entries.

14.5.8. Delete all copies of this entry

Deletes all copies of this job entry, not just this one!

14.6. Adding hops

On the graphical view, the quickest way to create a new hop is by dragging with the mouse from one step to another using the middle button. You can also drag with the left button while holding down the SHIFT key. For a more complete explanation regarding hops, please refer to the chapter on Hops.


15. Running a Transformation

15.1. Running a Transformation Overview

When you are finished modifying your transformation, you can run it by clicking on the run button from the main menu or toolbar, or by pressing F9.

Execute a transformation

15.2. Execution Options

15.2.1. Where to Execute

There are three options for deciding where you want your transformation to be executed:

Local Execution: the transformation or job will be run on the machine you are currently using.

Execute remotely: allows you to specify a remote server where you want the execution to take place. This feature requires that you have Pentaho Data Integration installed on a remote machine and running the Carte service. See Configuring a remote or slave server for more details on setting up remote and slave servers.

Execute clustered: allows you to execute the job or transformation in a clustered environment. See the section on Clustering for more details on how to execute a job or transformation in a clustered environment.

15.2.2. Other Options

The following table provides a detailed description of other Execution options:

Option: Description
Enable Safe mode: Places the transformation in Safe Mode. Additional row checking is enabled at runtime, see also: Safe Mode.
Log level: This allows you to specify the level of detail you want to capture in the log. For detailed descriptions of the log level types see Logging.
Replay date: This will set the replay date for when you want to replay the transformation. It will pick up information in the text file input or Excel input steps to skip rows already processed on the replay date.
Arguments: This grid allows you to set the value of arguments to be used when running the transformation.
Variables: This grid allows you to set the value of variables to be used when running the transformation.

15.3. Setting up Remote and Slave Servers

15.3.1. General description

Slave servers allow you to execute a transformation on a remote server. Setting up a slave server requires having a small web-server called "Carte" running on your remote machine. Carte will accept input from either Spoon (remote & clustered execution) or from the Transformation job entry (clustered execution).

15.3.2. Configuring a remote or slave server

Install Pentaho Data Integration on the server you want to use to remotely execute transformations (for more information on setting up a remote server, see the chapter on Installation). The installation includes a small web server called Carte used to support remote requests. Start the Carte server by running Carte.bat (Windows) or carte.sh from the root of your Pentaho Data Integration installation.

Next, you need to point your master server to each of the slave servers. To do this, double click on the 'Slave server' node in the tree control on the left, or right-click on 'Slave Server' and select the New Slave Server option.

15.3.2.1. Service tab options

Option: Description
Server name: The friendly name of the server you wish to use as a slave
Hostname or IP address: The address of the machine to be used as a slave
Port: Defines the port you wish to use for communicating with the remote server
Username: Enter the username credential for accessing the remote server
Password: Enter the password credential for accessing the remote server
Is the master: This setting tells Pentaho Data Integration that this server will act as the master server in any clustered executions of the transformation

Note: when executing a transformation in a clustered environment, you should have 1 server set up as the master and all remaining servers in the cluster as slaves.
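
The Service tab options above map naturally onto a small record. The Python sketch below is only an illustration of how a client might use those values to check that a Carte server answers over HTTP with basic authentication. The `SlaveServer` class, the function names and especially the `/kettle/status/` path are assumptions made for this example, not something this guide specifies, so verify the path against your Carte installation.

```python
import base64
from dataclasses import dataclass
from urllib.request import Request, urlopen


@dataclass
class SlaveServer:
    """Hypothetical record mirroring the Service tab options."""
    name: str
    hostname: str
    port: int
    username: str
    password: str
    master: bool = False


def status_request(server, path="/kettle/status/"):
    """Build an HTTP request for the (assumed) Carte status page."""
    url = f"http://{server.hostname}:{server.port}{path}"
    credentials = f"{server.username}:{server.password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return Request(url, headers={"Authorization": f"Basic {token}"})


def is_reachable(server, timeout=5.0):
    """Return True when the server answers the status request with HTTP 200."""
    try:
        with urlopen(status_request(server), timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError and HTTPError are subclasses of OSError.
        return False


slave = SlaveServer("slave1", "localhost", 8081, "user", "secret")
print(status_request(slave).full_url)
```

Calling `is_reachable(slave)` before a remote or clustered run is one way to catch a slave that was never started; the host, port and credentials shown are placeholders.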

15.3.2.2. Proxy tab options

Option: Description
Proxy server hostname: Sets the hostname for the Proxy server you are connecting through
The proxy server port: Sets the port number used in communication with the proxy
"Ignore proxy for hosts: regexp|separated": Specify the server(s) for which the proxy should not be active. This option supports specifying multiple servers using regular expressions. You can also add multiple servers and expressions separated by the '|' character.
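
To make the "Ignore proxy for hosts" option more concrete, here is a minimal Python sketch of how a '|'-separated list of regular expressions could be evaluated against a hostname. The function name and the choice to require a full match of the hostname are assumptions of this example; the guide does not say how Pentaho Data Integration applies the expressions internally.

```python
import re


def bypasses_proxy(hostname, non_proxy_hosts):
    """Return True when `hostname` fully matches one of the '|'-separated expressions."""
    for pattern in non_proxy_hosts.split("|"):
        pattern = pattern.strip()
        # Skip empty entries so an empty setting never disables the proxy.
        if pattern and re.fullmatch(pattern, hostname):
            return True
    return False


# Hypothetical setting: skip the proxy for localhost and one internal domain.
setting = r"localhost|.*\.example\.com"
print(bypasses_proxy("etl1.example.com", setting))
print(bypasses_proxy("www.pentaho.org", setting))
```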


15.4. Clustering

15.4.1. Overview

Clustering allows transformations and transformation steps to be executed in parallel on more than one server. The clustering schema defines which slave servers you want to assign to the cluster and a variety of clustered execution options.

15.4.2. Creating a cluster schema

Begin by double-clicking on the 'Kettle cluster schemas' node in the tree on the left, or by right-clicking on that node and selecting 'New clustering schema':

15.4.3. Options

Option: Description
Schema name: The name of the clustering schema
Port: Here you can specify the port from which to start numbering ports for the slave servers. Each additional clustered step executing on a slave server will consume an additional port. Note: Make sure no other networking protocols are in the same range to avoid networking problems.
Sockets buffer size: The internal buffer size to use.
Sockets flush interval: The number of rows after which the internal buffer is sent completely over the network and emptied.
Sockets data compressed?: When this is checked, all data is compressed using the Gzip compression algorithm to minimize network traffic.
Slave Servers: This is a list of the servers to be used in the cluster. You should have one master server and any number of slave servers. To add servers to the cluster, click on the 'Select slave servers' button to select from the list of available slave servers. See Configuring a remote or slave server for more details on how to create a slave server.
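
The Port option can be pictured with a small Python sketch. It assumes, purely for illustration, that each clustered step on a slave receives the next port counting up from the configured base port, and it checks the resulting range against ports already in use. The function names, step names and port numbers are hypothetical, and the real numbering scheme may differ.

```python
def ports_for_slave(base_port, clustered_steps):
    """Assign one port per clustered step, counting up from base_port."""
    return {step: base_port + offset for offset, step in enumerate(clustered_steps)}


def conflicting_ports(assigned, ports_in_use):
    """Return the assigned ports that are already taken by something else."""
    return sorted(set(assigned.values()) & set(ports_in_use))


# Two clustered steps on one slave, starting at a hypothetical base port.
assigned = ports_for_slave(40000, ["Sort rows", "JavaScript"])
print(assigned)
print(conflicting_ports(assigned, [8080, 40001]))
```

A non-empty result from `conflicting_ports` is the kind of overlap the note in the Port option warns about.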

15.4.4. Running transformations using a cluster

When you choose to run a Transformation, select 'Execute clustered'. You will have the following options:

Post transformation: Split the transformation and post it to the different master and slave servers.
Prepare execution: This runs the initialization phase of the transformation on the master and slave servers.
Start execution: This starts the actual execution of the master and slave transformations.
Show transformations: Show the generated (converted) transformations that will be executed on the cluster (see the Basic Clustering Example for more information on generated transformations).

15.4.5. Basic Clustering Example

Suppose that you have data from a database table and you want to run it through a particularly complex JavaScript program. For performance reasons you want to execute this program on 5 different hosts. You begin by creating a cluster with one master server and 4 slaves:

Then you create the transformation as usual, connecting 2 steps with a hop. You specify that the script to execute is running on a cluster:


Then select the cluster to use:

The transformation will then be drawn like this on the graphical view:

The Cx4 indicates that the step will be executed on a cluster. Suppose we then store the calculated information as a result in a table again:

When we execute this transformation locally, we will see no difference with the usual result you expect from a non-clustered execution. That means that you can use the normal local execution to test the transformation. However, we can execute the transformation in a clustered fashion like this:


In this case, 5 transformations will be generated for the 5 servers in the cluster. One master:

And 4 slave transformations:

As you can see, data will be sent over the TCP/IP sockets using the Socket Writer and Socket Reader steps.
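
The socket options of the clustering schema (flush interval and Gzip compression) can be sketched in Python as follows. This is an illustration of the buffering idea only, not the Socket Writer or Socket Reader implementation: the class name, the JSON row encoding and the `send` callback are assumptions of this example.

```python
import gzip
import json


class BufferedRowWriter:
    """Collect rows and hand them to `send` every `flush_interval` rows."""

    def __init__(self, send, flush_interval=3, compressed=True):
        self.send = send                  # callable that receives one bytes payload
        self.flush_interval = flush_interval
        self.compressed = compressed
        self.buffer = []

    def write(self, row):
        self.buffer.append(row)
        if len(self.buffer) >= self.flush_interval:
            self.flush()

    def flush(self):
        """Send the whole buffer in one payload and empty it."""
        if not self.buffer:
            return
        payload = json.dumps(self.buffer).encode("utf-8")
        if self.compressed:
            payload = gzip.compress(payload)
        self.send(payload)
        self.buffer = []


def read_rows(payload, compressed=True):
    """Reverse of the writer: decompress and decode one payload."""
    if compressed:
        payload = gzip.decompress(payload)
    return json.loads(payload.decode("utf-8"))


sent = []                                  # stands in for the network connection
writer = BufferedRowWriter(sent.append, flush_interval=2)
for i in range(5):
    writer.write([i])
writer.flush()                             # push the last, partially filled buffer
print(len(sent), [row for p in sent for row in read_rows(p)])
```

A larger flush interval means fewer, bigger network writes, while compression trades CPU time for less traffic.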


16. Logging

16.1. Logging Description

A log view tab will open automatically each time you run a transformation or job. The log grid displays a list of transformation steps or job entries for the current execution. The log text shows log information based on the current logging level.

16.2. Log Grid

The log grid is actually a tree that offers a hierarchical view on the execution of a transformation or job. Transformation steps and Job entries will be highlighted in red as seen above if they fail during execution.

16.2.1. Transformation Log Grid Details

The log grid displays the following details for each step running in the transformation:

Option: Description
Stepname: The name of the step
Copynr: Copy number of the step
Read: Number of lines read from input-streams
Written: Number of lines written to output-streams
Input: Number of lines read from file or database
Output: Number of lines written to file or database
Updated: Number of lines updated in the database
Rejected
Errors: Number of errors that occurred
Active: The status of the step: running, finished or stopped
Time: The number of seconds that the step has been running.
Speed: The speed in rows per second at which the step processes rows.
input/output: Priority of the step (10=highest, 1=lowest), nr of rows in the input-stream(s), nr of rows in the output-stream(s). Sleep time (get/put) is the time that a step had to go to sleep (in nano seconds) because the input buffer was empty (get) or the output buffer was full (put).

NOTE: The system is tuning the steps priority in such a way that the slowest steps get the highest priority.

16.2.2. Job Log Grid

The log grid displays the following details for each job entry executing in the Job:

Option: Description
Job/Job Entry: The name of the job / job entry
Comment: A comment on the state of the entry execution
Result: The result (success or failure) of the job entry
Reason: Why was this job entry started?
Nr: The value of the nr variable in the result object (available in evaluation Javascript)
Log date: Logging date, corresponds with the start or end of the job entry.

16.3. Buttons

16.3.1. Transformation Buttons

16.3.1.1. Start

This button starts the transformation. Please note that Spoon tries to launch this from the XML-file or repository. It is therefore necessary that the transformation is saved. The output of the execution is displayed in the Log Text part of the Log View.

16.3.1.2. Preview (debug)


This button launches the Transformation Debug dialog, allowing you to specify the number of rows to preview and define conditional breakpoints for the preview execution. After configuring the debug information, click the "Quick Launch" button to begin the preview execution for the currently selected step. The output of the execution is displayed in the Log Text part of the Log View.

16.3.1.2.1. Debug Options

The following table provides a detailed description of the debug options:

Option: Description
Step List: The step list on the left displays a list of available steps from the current transformation. Select a step to begin configuring related options like number of rows and break-points.
Number of rows to retrieve: Enter the rows per step you want to preview for the selected step. After the requested rows are obtained from the different steps, the transformation is ended and the result is shown. Note: This option will only take effect when the "Retrieve first rows" option is checked.
Retrieve first rows (preview): Enable this to restrict the preview size to the number of rows specified above.
Pause transformation on condition: Enable this option to cause the transformation to pause if one of the conditional break-points evaluates to true during execution.
Break-point/pause condition: Enter conditions based on comparing one field to another field or value.

16.3.1.2.2. Debug example

Starting with the simple transformation shown here:

The generate rows step generates empty rows and adds an id from 1 to 1000. Now we want to pause the transformation and see the content of the row where the id field has a specific value. To do this, simply click on the debug icon in the main toolbar:


As you can see, we can specify a condition on which the transformation is paused. You can also specify to keep the last N rows in memory before the condition was met. Pressing the "Quick Launch" button will begin the preview execution and display the following dialog upon meeting the condition:

For convenience, the order of the rows is reversed in the preview window so that the row that met the condition is always at the top of the results. After closing the preview window, you will note that the transformation is paused (see log tab) and you can then resume execution by clicking the resume button. If a condition is met again, the transformation will be paused again and another preview dialog will display.
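
The behaviour described in this example (keep the last N rows, pause on a condition, show the matching row first) can be sketched in a few lines of Python. This is a simplified model, not Spoon's debugger: the function name is hypothetical, `keep_last` here counts the matching row as well, and the id value 10 is an arbitrary choice for the illustration.

```python
from collections import deque


def preview_until(rows, condition, keep_last=5):
    """Consume rows until `condition` is true, then return the most recent
    rows in reverse order so the matching row comes first."""
    recent = deque(maxlen=keep_last)      # older rows fall out automatically
    for row in rows:
        recent.append(row)
        if condition(row):
            return list(reversed(recent))
    return []                             # condition never met, nothing to preview


# Rows with an id from 1 to 1000, like the generate rows step in the example.
rows = ({"id": i} for i in range(1, 1001))
print(preview_until(rows, lambda row: row["id"] == 10, keep_last=3))
print(next(rows))                         # the stream resumes after the pause
```

Because `rows` is a generator, the remaining rows are still available after the function returns, which loosely corresponds to resuming the paused transformation.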

16.3.1.3. Show error lines

This button displays all lines of the Log Text that contain the word ERROR (lower- or uppercase). You can then choose to edit the source step of the error.
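
The filtering rule behind this button is easy to express in Python. The sketch below shows only the rule stated above, namely keeping every line that contains the word in any letter case. The function name and the sample log lines are made up for the example and do not reproduce real Spoon output.

```python
def error_lines(log_text):
    """Return the log lines that mention ERROR in any letter case."""
    return [line for line in log_text.splitlines() if "error" in line.lower()]


# Invented log text, for illustration only.
log_text = "\n".join([
    "Text file input - Opening file",
    "Table output - ERROR : unable to connect",
    "Table output - Unexpected error while writing",
    "Transformation finished",
])
for line in error_lines(log_text):
    print(line)
```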

16.3.1.4. Clear log

This clears the text in the Log Text Window.

16.3.1.5. Log Settings

This is the "Log Settings" dialog:


If you put a text in the filter field, only the lines that contain this text will be shown in the Log Text window.

The "Log level" setting allows you to select the logging level. You can choose one of these:

Nothing: Don't show any output
Error: Only show errors
Minimal: Only use minimal logging
Basic: This is the default basic logging level
Detailed: Give detailed logging output
Debug: For debugging purposes, very detailed output.
Row level: Logging at a row level, this can generate a lot of data.

If the "Enable time" option is enabled, all lines in the logging will be preceded by the time of day.
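
The three Log Settings (filter, log level, enable time) can be modelled with the following Python sketch. It assumes that the levels form a simple ordering from least to most verbose and that the time of day is written as HH:MM:SS in front of each line. Both are assumptions of this example, as are the function names.

```python
from datetime import datetime

# Levels ordered from least to most verbose, as listed in the Log Settings dialog.
LOG_LEVELS = ["Nothing", "Error", "Minimal", "Basic", "Detailed", "Debug", "Row level"]


def is_shown(message_level, configured_level):
    """A message is shown when the configured level is at least as verbose."""
    return LOG_LEVELS.index(message_level) <= LOG_LEVELS.index(configured_level)


def render(lines, filter_text="", enable_time=False, now=None):
    """Apply the filter text and the optional time-of-day prefix to log lines."""
    stamp = (now or datetime.now()).strftime("%H:%M:%S")
    shown = [line for line in lines if filter_text in line]
    return [f"{stamp} {line}" if enable_time else line for line in shown]


print(is_shown("Error", "Basic"), is_shown("Debug", "Basic"))
print(render(["Sort rows - done", "Table output - done"], filter_text="Sort", enable_time=True))
```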

16.3.1.6. Hide inactive

Checking this hides all steps that finished processing.

16.3.1.7. Safe mode

Places the transformation in Safe Mode. Additional row checking is enabled at runtime, see also Safe mode.


16.3.2. Job Buttons

16.3.2.1. Start job

This button begins execution of the current job. Please note that Spoon attempts to launch the job from an XML-file or the Kettle repository. It is therefore necessary that the job is saved. The output of the execution is displayed in the Log Text part of the Log View.

16.3.2.2. Stop job

This button stops a running job.

16.3.2.3. Refresh log

Refreshes the log window.

16.3.2.4. Clear log

This clears the text in the Log Text Window.

16.3.2.5. Log Settings

This is the "Log Settings" dialog:

If you put a text in the filter field, only the lines that contain this text will be shown in the Log Text window.

The "Log level" setting allows you to select the logging level. You can choose one of these:

Nothing: Don't show any output
Error: Only show errors
Minimal: Only use minimal logging
Basic: This is the default basic logging level
Detailed: Give detailed logging output
Debug: For debugging purposes, very detailed output.
Row level: Logging at a row level, this can generate a lot of data.

If the "Enable time" option is enabled, all lines in the logging will be preceded by the time of day.

16.3.2.6. Auto-refresh

Use this option to stop the log window from updating all the time. You might want to do this when you're using a remote desktop (VNC, X11) over a slow network connection.


17. Grids

17.1. Description

Grids (tables) are used throughout the Spoon interface to enter or display information. This section describes common functions available when working with a Grid.

17.2. Usage

Click on a cell to begin editing that field. After pressing enter, you can navigate the grid by using the cursor keys. Pressing enter again allows you to edit the newly selected field. The following table describes the functions available when you right-click on a cell in the grid:

Option: Description
Insert before this row: Inserts an empty row before the row you clicked on.
Insert after this row: Inserts an empty row after the row you clicked on.
Move the row up: Moves the row you clicked on up. The keyboard shortcut for this is CTRL-UP.
Move the row down: Moves the row you clicked on down. The keyboard shortcut for this is CTRL-DOWN.
Optimal column size including header: Resizes all columns so that they display all values completely, including the header. The keyboard shortcut for this function is function key F3.
Optimal column size excluding header: Resizes all columns so that they display all values completely. The keyboard shortcut for this function is function key F4.
Clear all: Clears all information in the grid. You will be asked to confirm this operation.
Select all rows: Selects all rows in the grid. The keyboard shortcut for this function is CTRL-A.
Clear selections: Clears the selection of rows in the grid. The keyboard shortcut for this function is ESC.
Copy selected lines to clipboard: Copies the selected lines to the clipboard in a textual representation. These lines can then be exchanged with other programs such as spreadsheets or even other Spoon & Kettle dialogs. The keyboard shortcut for this function is CTRL-C.
Paste clipboard to table: Inserts the lines that are on the clipboard into the grid, right after the line on which you clicked. The keyboard shortcut for this function is CTRL-V.
Cut selected lines: Copies the selected lines to the clipboard in a textual representation. After that, the lines are deleted from the grid. The keyboard shortcut for this function is CTRL-X.
Delete selected lines: Deletes all selected lines from the grid. The keyboard shortcut for this function is DEL.
Keep only selected lines: If there are more lines to delete than there are to keep, simply select the lines you want to keep and use this function. The keyboard shortcut for this function is CTRL-K.
Copy field values to all rows: If all rows in the grid need to have the same value for a certain column, you can use this function to do this.
Undo: Undoes the previous grid operation. The keyboard shortcut for this function is CTRL-Z.
Redo: Redoes the next grid operation. The keyboard shortcut for this function is CTRL-Y.
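
A few of the grid functions above can be modelled on a plain list of rows. The Python sketch below is only an illustration: the guide says lines are copied "in a textual representation" without naming the format, so the tab-separated layout used here is an assumption, and all function names are hypothetical.

```python
def copy_lines(grid, selected):
    """Textual representation of the selected rows (tab-separated, assumed format)."""
    return "\n".join("\t".join(grid[i]) for i in sorted(selected))


def paste_lines(grid, text, after):
    """Insert the clipboard rows right after the row that was clicked."""
    rows = [line.split("\t") for line in text.splitlines()]
    return grid[: after + 1] + rows + grid[after + 1 :]


def keep_only(grid, selected):
    """Drop every row that is not selected."""
    return [row for i, row in enumerate(grid) if i in selected]


grid = [["id", "Integer"], ["name", "String"], ["city", "String"]]
clipboard = copy_lines(grid, {0, 2})
print(repr(clipboard))
print(paste_lines(grid, "zip\tString", after=0))
print(keep_only(grid, {1}))
```

A tab-separated clipboard format is what makes an exchange with spreadsheets straightforward, which is why it was chosen for this sketch.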


18. Repository Explorer

18.1. Description

The repository Explorer shows you a tree view on the database repository to which you are connected. It allows you to examine and modify the content from the repository including:

Connections
Partition Schemas
Slave servers
Clusters
Transformations

You can also click on the column header to sort content by name, object type, user or changed date.

18.2. Right click functions

Right clicking on an object in the repository will bring up basic functions such as open, delete and rename objects.

18.3. Backup / Recovery

It is possible to export the complete repository in XML: See the options available in the file menu of the repository explorer.

NOTE: you can restore the objects from a backed up repository anywhere in the target repository directory tree.


19. Shared objects

A variety of objects can now be placed in a shared objects file on the local machine. The default location for the shared objects file is $HOME/.kettle/shared.xml. Objects that can be shared using this method include:

Database connections
Steps
Slave servers
Partition schemas
Cluster schemas

To share one of these objects, simply right-click on the object in the tree control on the left and choose "share".

Note: The location of the shared objects file is configurable on the "Miscellaneous" tab of the Transformation|Settings dialog.
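
For scripts that need to locate the shared objects file, the rule above (a configured location if one is set, otherwise $HOME/.kettle/shared.xml) can be written as a short Python helper. The function name and its parameters are hypothetical, and the example paths are placeholders.

```python
from pathlib import Path


def shared_objects_file(configured=None, home=None):
    """Return the configured location, or the default under the home directory."""
    if configured:
        return Path(configured)
    base = Path(home) if home else Path.home()
    return base / ".kettle" / "shared.xml"


print(shared_objects_file(home="/home/etl"))
print(shared_objects_file(configured="/opt/pdi/shared.xml"))
```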


20. APPENDIX A: LGPL License

GNU LESSER GENERAL PUBLIC LICENSE
Version 2.1, February 1999

Copyright (C) 1991, 1999 Free Software Foundation, Inc. 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA

Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed.

[This is the first released version of the Lesser GPL. It also counts as the successor of the GNU Library Public License, version 2, hence the version number 2.1.]

Preamble

The licenses for most software are designed to take away your freedom to share and change it. By contrast, the GNU General Public Licenses are intended to guarantee your freedom to share and change free software--to make sure the software is free for all its users.

This license, the Lesser General Public License, applies to some specially designated software packages--typically libraries--of the Free Software Foundation and other authors who decide to use it. You can use it too, but we suggest you first think carefully about whether this license or the ordinary General Public License is the better strategy to use in any particular case, based on the explanations below.

When we speak of free software, we are referring to freedom of use, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for this service if you want); that you receive source code or can get it if you want it; that you can change the software and use pieces of it in new free programs; and that you are informed that you can do these things.

To protect your rights, we need to make restrictions that forbid distributors to deny you these rights or to ask you to surrender these rights. These restrictions translate to certain responsibilities for you if you distribute copies of the library or if you modify it.

For example, if you distribute copies of the library, whether gratis or for a fee, you must give the recipients all the rights that we gave you. You must make sure that they, too, receive or can get the source code. If you link other code with the library, you must provide complete object files to the recipients, so that they can relink them with the library after making changes


to the library and recompiling it. And you must show them these terms so they know their rights.

We protect your rights with a two-step method: (1) we copyright the library, and (2) we offer you this license, which gives you legal permission to copy, distribute and/or modify the library.

To protect each distributor, we want to make it very clear that there is no warranty for the free library. Also, if the library is modified by someone else and passed on, the recipients should know that what they have is not the original version, so that the original author's reputation will not be affected by problems that might be introduced by others.

Finally, software patents pose a constant threat to the existence of any free program. We want to make sure that a company cannot effectively restrict the users of a free program by obtaining a restrictive license from a patent holder. Therefore, we insist that any patent license obtained for a version of the library must be consistent with the full freedom of use specified in this license.

Most GNU software, including some libraries, is covered by the ordinary GNU General Public License. This license, the GNU Lesser General Public License, applies to certain designated libraries, and is quite different from the ordinary General Public License. We use this license for certain libraries in order to permit linking those libraries into non-free programs.

When a program is linked with a library, whether statically or using a shared library, the combination of the two is legally speaking a combined work, a derivative of the original library. The ordinary General Public License therefore permits such linking only if the entire combination fits its criteria of freedom. The Lesser General Public License permits more lax criteria for linking other code with the library.

We call this license the "Lesser" General Public License because it does Less to protect the user's freedom than the ordinary General Public License. It also provides other free software developers Less of an advantage over competing non-free programs. These disadvantages are the reason we use the ordinary General Public License for many libraries. However, the Lesser license provides advantages in certain special circumstances.

For example, on rare occasions, there may be a special need to encourage the widest possible use of a certain library, so that it becomes a de-facto standard. To achieve this, non-free programs must be allowed to use the library. A more frequent case is that a free library does the same job as widely used non-free libraries. In this case, there is little to gain by limiting the free library to free software only, so we use the Lesser General Public License.

In other cases, permission to use a particular library in non-free programs enables a greater number of people to use a large body of free software. For example, permission to use the GNU C Library in non-free programs enables many more people to use the whole GNU operating system, as well as its variant, the GNU/Linux operating system.

Although the Lesser General Public License is Less protective of the users' freedom, it does ensure that the user of a program that is linked with the Library has the freedom and the wherewithal to run that program using a modified version of the Library.

The precise terms and conditions for copying, distribution and modification follow. Pay close attention to the difference between a "work based on the library" and a "work that uses the library". The former contains code derived from the library, whereas the latter must be combined with the library in order to run.

GNU LESSER GENERAL PUBLIC LICENSE
TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION

0. This License Agreement applies to any software library or other program which contains a notice placed by the copyright holder or other authorized party saying it may be distributed under the terms of this Lesser General Public License (also called "this License"). Each licensee is addressed as "you".

A "library" means a collection of software functions and/or data prepared so as to be conveniently linked with application programs (which use some of those functions and data) to form executables.

The "Library", below, refers to any such software library or work which has been distributed under these terms. A "work based on the Library" means either the Library or any derivative work under copyright law: that is to say, a work containing the Library or a portion of it, either verbatim or with modifications and/or translated straightforwardly into another language. (Hereinafter, translation is included without limitation in the term "modification".)

"Source code" for a work means the preferred form of the work for making modifications to it. For a library, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the library.


Activities other than copying, distribution and modification are not covered by this License; they are outside its scope. The act of running a program using the Library is not restricted, and output from such a program is covered only if its contents constitute a work based on the Library (independent of the use of the Library in a tool for writing it). Whether that is true depends on what the Library does and what the program that uses the Library does.

1. You may copy and distribute verbatim copies of the Library's complete source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and distribute a copy of this License along with the Library.

You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee.

2. You may modify your copy or copies of the Library or any portion of it, thus forming a work based on the Library, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions:

a) The modified work must itself be a software library.

b) You must cause the files modified to carry prominent notices stating that you changed the files and the date of any change.

c) You must cause the whole of the work to be licensed at no charge to all third parties under the terms of this License.

d) If a facility in the modified Library refers to a function or a table of data to be supplied by an application program that uses the facility, other than as an argument passed when the facility is invoked, then you must make a good faith effort to ensure that, in the event an application does not supply such function or table, the facility still operates, and performs whatever part of its purpose remains meaningful.

(For example, a function in a library to compute square roots has a purpose that is entirely well-defined independent of the application. Therefore, Subsection 2d requires that any application-supplied function or table used by this function must be optional: if the application does not supply it, the square root function must still compute square roots.)

These requirements apply to the modified work as a whole. If identifiable sections of that work are not derived from the Library, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. But when you distribute the same sections as part of a whole which is a work based on the Library, the distribution of the whole must be on the terms of

Pentaho Data Integration TM

S oon !ser "#ide 2&*

this +ice$se/ %hose permissio$s for other lice$sees exte$d to the e$tire %hole/ a$d thus to each a$d every part regardless of %ho %rote it. 2hus/ it is $ot the i$te$t of this sectio$ to claim rights or co$test your rights to %or< %ritte$ e$tirely by you& rather/ the i$te$t is to exercise the right to co$trol the distributio$ of derivative or collective %or<s based o$ the +ibrary. 1$ additio$/ mere aggregatio$ of a$other %or< $ot based o$ the +ibrary %ith the +ibrary (or %ith a %or< based o$ the +ibrary) o$ a volume of a storage or distributio$ medium does $ot bri$g the other %or< u$der the scope of this +ice$se. =. 6ou may opt to apply the terms of the ordi$ary ()* (e$eral #ublic +ice$se i$stead of this +ice$se to a give$ copy of the +ibrary. 2o do this/ you must alter all the $otices that refer to this +ice$se/ so that they refer to the ordi$ary ()* (e$eral #ublic +ice$se/ versio$ a $e%er versio$ tha$ versio$ / i$stead of to this +ice$se. (1f of ordi$ary ()* (e$eral #ublic +ice$se has

appeared/ the$ you ca$ specify that versio$ i$stead if you %a$t.) >o $ot ma<e a$y other cha$ge i$ these $otices. 4$ce this cha$ge is made i$ a give$ copy/ it is irreversible for that copy/ so the ordi$ary ()* (e$eral #ublic +ice$se applies to all subse?ue$t copies a$d derivative %or<s made from that copy. 2his optio$ is useful %he$ you %ish to copy part of the code of= the +ibrary i$to a program that is $ot a library. S. 6ou may copy a$d distribute the +ibrary (or a portio$ or derivative of it/ u$der -ectio$ . a$d ) i$ object code or executable form u$der the terms of -ectio$s above provided that you accompa$y it %ith the complete correspo$di$g above o$ a

machi$e-readable source code/ %hich must be distributed u$der the terms of -ectio$s . a$d medium customarily used for soft%are i$tercha$ge.1f distributio$ of object code is made by offeri$g access to copy from a desig$ated place/ the$ offeri$g e?uivale$t access to copy the source code from the same place satisfies the re?uireme$t to distribute the source code/ eve$ though third parties are $ot compelled to copy the source alo$g %ith the object code. ;. 5 program that co$tai$s $o derivative of a$y portio$ of the +ibrary/ but is desig$ed to %or< %ith the +ibrary by bei$g compiled or li$<ed %ith it/ is called a W%or< that uses the +ibraryW. -uch a %or</ i$ isolatio$/ is $ot a derivative %or< of the +ibrary/ a$d therefore falls outside the scope of this +ice$se. 3o%ever/ li$<i$g a W%or< that uses the +ibraryW %ith the +ibrary creates a$ executable that is a derivative of the +ibrary (because it co$tai$s portio$s of the +ibrary)/ rather tha$ a W%or< that uses the libraryW. 2he executable is Pentaho Data Integration TM S oon !ser "#ide 2&-

therefore covered by this +ice$se. -ectio$ " states terms for distributio$ of such executables. 0he$ a W%or< that uses the +ibraryW uses material from a header file that is part of the +ibrary/ the object code for the %or< may be a derivative %or< of the +ibrary eve$ though the source code is $ot. 0hether this is true is especially sig$ifica$t if the %or< ca$ be li$<ed %ithout the +ibrary/ or if the %or< is itself a library. 2he threshold for this to be true is $ot precisely defi$ed by la%. 1f such a$ object file uses o$ly $umerical parameters/ data structure layouts a$d accessors/ a$d small macros a$d small i$li$e fu$ctio$s (te$ li$es or less i$ le$gth)/ the$ the use of the object file is u$restricted/ regardless of %hether it is legally a derivative %or<. (9xecutables co$tai$i$g this object code plus portio$s of the +ibrary %ill still fall u$der -ectio$ ".) 4ther%ise/ if the %or< is a derivative of the +ibrary/ you may distribute the object code for the %or< u$der the terms of -ectio$ ". 5$y executables co$tai$i$g that %or< also fall u$der -ectio$ "/ %hether or $ot they are li$<ed directly %ith the +ibrary itself. ". 5s a$ exceptio$ to the -ectio$s above/ you may also combi$e or li$< a W%or< that uses the +ibraryW %ith the +ibrary to produce a %or< co$tai$i$g portio$s of the +ibrary/ a$d distribute that %or< u$der terms of your choice/ provided that the terms permit modificatio$ of the %or< for the customerNs o%$ use a$d reverse e$gi$eeri$g for debuggi$g such modificatio$s. 6ou must give promi$e$t $otice %ith each copy of the %or< that the +ibrary is used i$ it a$d that the +ibrary a$d its use are covered by this +ice$se. 6ou must supply a copy of this +ice$se. 1f the %or< duri$g executio$ displays copyright $otices/ you must i$clude the copyright $otice for the +ibrary amo$g them/ as %ell as a refere$ce directi$g the user to the copy of this +ice$se. 
Also, you must do one of these things:

a) Accompany the work with the complete corresponding machine-readable source code for the Library including whatever changes were used in the work (which must be distributed under Sections 1 and 2 above); and, if the work is an executable linked with the Library, with the complete machine-readable "work that uses the Library", as object code and/or source code, so that the user can modify the Library and then relink to produce a modified executable containing the modified Library. (It is understood that the user who changes the contents of definitions files in the Library will not necessarily be able to recompile the application to use the modified definitions.)

b) Use a suitable shared library mechanism for linking with the Library. A suitable mechanism is one that (1) uses at run time a copy of the library already present on the user's computer system, rather than copying library functions into the executable, and (2) will operate properly with a modified version of the library, if the user installs one, as long as the modified version is interface-compatible with the version that the work was made with.
c) Accompany the work with a written offer, valid for at least three years, to give the same user the materials specified in Subsection 6a, above, for a charge no more than the cost of performing this distribution.
d) If distribution of the work is made by offering access to copy from a designated place, offer equivalent access to copy the above specified materials from the same place.
e) Verify that the user has already received a copy of these materials or that you have already sent this user a copy.

For an executable, the required form of the "work that uses the Library" must include any data and utility programs needed for reproducing the executable from it. However, as a special exception, the materials to be distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable.

It may happen that this requirement contradicts the license restrictions of other proprietary libraries that do not normally accompany the operating system. Such a contradiction means you cannot use both them and the Library together in an executable that you distribute.

7. You may place library facilities that are a work based on the Library side-by-side in a single library together with other library facilities not covered by this License, and distribute such a combined library, provided that the separate distribution of the work based on the Library and of the other library facilities is otherwise permitted, and provided that you do these two things:

a) Accompany the combined library with a copy of the same work based on the Library, uncombined with any other library facilities. This must be distributed under the terms of the Sections above.
b) Give prominent notice with the combined library of the fact that part of it is a work based on the Library, and explaining where to find the accompanying uncombined form of the same work.

8. You may not copy, modify, sublicense, link with, or distribute the Library except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense, link with, or distribute the Library is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance.

9. You are not required to accept this License, since you have not signed it. However, nothing else grants you permission to modify or distribute the Library or its derivative works. These actions are prohibited by law if you do not accept this License. Therefore, by modifying or distributing the Library (or any work based on the Library), you indicate your acceptance of this License to do so, and all its terms and conditions for copying, distributing or modifying the Library or works based on it.

10. Each time you redistribute the Library (or any work based on the Library), the recipient automatically receives a license from the original licensor to copy, distribute, link with or modify the Library subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. You are not responsible for enforcing compliance by third parties with this License.

11. If, as a consequence of a court judgment or allegation of patent infringement or for any other reason (not limited to patent issues), conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot distribute so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not distribute the Library at all.
For example, if a patent license would not permit royalty-free redistribution of the Library by all those who receive copies directly or indirectly through you, then the only way you could satisfy both it and this License would be to refrain entirely from distribution of the Library.

If any portion of this section is held invalid or unenforceable under any particular circumstance, the balance of the section is intended to apply, and the section as a whole is intended to apply in other circumstances.

It is not the purpose of this section to induce you to infringe any patents or other property right claims or to contest validity of any such claims; this section has the sole purpose of protecting the integrity of the free software distribution system which is implemented by public license practices. Many people have made generous contributions to the wide range of software distributed through that system in reliance on consistent application of that system; it is up to the author/donor to decide if he or she is willing to distribute software through any other system and a licensee cannot impose that choice.

This section is intended to make thoroughly clear what is believed to be a consequence of the rest of this License.

12. If the distribution and/or use of the Library is restricted in certain countries either by patents or by copyrighted interfaces, the original copyright holder who places the Library under this License may add an explicit geographical distribution limitation excluding those countries, so that distribution is permitted only in or among countries not thus excluded. In such case, this License incorporates the limitation as if written in the body of this License.

13. The Free Software Foundation may publish revised and/or new versions of the Lesser General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns.

Each version is given a distinguishing version number. If the Library specifies a version number of this License which applies to it and "any later version", you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation. If the Library does not specify a license version number, you may choose any version ever published by the Free Software Foundation.

14. If you want to incorporate parts of the Library into other free programs whose distribution conditions are incompatible with these, write to the author to ask for permission. For software which is copyrighted by the Free Software Foundation, write to the Free Software Foundation; we sometimes make exceptions for this. Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally.

NO WARRANTY

15. BECAUSE THE LIBRARY IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE LIBRARY, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE LIBRARY "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE LIBRARY IS WITH YOU. SHOULD THE LIBRARY PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION.

16. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE LIBRARY AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE LIBRARY (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE LIBRARY TO OPERATE WITH ANY OTHER SOFTWARE), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

END OF TERMS AND CONDITIONS

How to Apply These Terms to Your New Libraries

If you develop a new library, and you want it to be of the greatest possible use to the public, we recommend making it free software that everyone can redistribute and change. You can do so by permitting redistribution under these terms (or, alternatively, under the terms of the ordinary General Public License).

To apply these terms, attach the following notices to the library. It is safest to attach them to the start of each source file to most effectively convey the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found.

<one line to give the library's name and a brief idea of what it does.>
Copyright (C) <year> <name of author>

This library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version.

This library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public License along with this library; if not, write to the Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA

Also add information on how to contact you by electronic and paper mail.

You should also get your employer (if you work as a programmer) or your school, if any, to sign a "copyright disclaimer" for the library, if necessary. Here is a sample; alter the names:

Yoyodyne, Inc., hereby disclaims all copyright interest in the library `Frob' (a library for tweaking knobs) written by James Random Hacker.

<signature of Ty Coon>, 1 April 1990
Ty Coon, President of Vice

That's all there is to it!
