0 User Guide
Copyright 2007 Pentaho Corporation. Redistribution permitted. All trademarks are the property of their respective owners. For the latest information, please visit our web site at www.pentaho.org
1. Contents
1. Contents
2. About This Document
2.1. What it is
2.2. What it is not
3. Introduction to Spoon
3.1. What is Spoon?
3.2. Installation
3.3. Launching Spoon
3.4. Supported platforms
3.5. Known Issues
3.6. Screen shots
3.7. Command line options
3.8. Repository
3.8.1. Repository Auto-Login
3.9. License
3.10. Definitions
3.10.1. Transformation Definitions
3.11. Toolbar
3.12. Options
3.12.1. General Tab
3.12.2. Look & Feel tab
3.14. Search Meta data
3.15. Set environment variable
3.16. Execution log history
3.17. Replay
3.18. Generate mapping against target step
3.18.1. Generate mappings example
3.19. Safe mode
3.20. Welcome Screen
4. Creating a Transformation or Job
4.1. Notes
4.2. Screen shot
4.3. Creating a new database connection
4.3.1. General
4.3.2. Pooling
4.3.3. MySQL
4.3.4. Oracle
4.3.5. Informix
4.3.6. SQL Server
4.3.7. SAP R/3
4.3.8. Generic
4.3.9. Options
4.3.10. SQL
4.3.11. Cluster
4.3.12. Advanced
4.3.13. Test a connection
4.3.14. Explore
4.3.15. Feature List
4.4. Editing a connection
4.5. Duplicate a connection
4.6. Copy to clipboard
4.7. Execute SQL commands on a connection
4.8. Clear DB Cache option
4.9. Quoting
4.10. Database Usage Grid
4.11. Configuring JNDI connections
4.12. Unsupported databases
5. SQL Editor
5.1. Description
5.2. Limitations
6. Database Explorer
7. Hops
7.1. Description
7.1.1. Transformation Hops
7.1.2. Job Hops
7.2. Creating a Hop
7.3. Loops
7.4. Mixing rows: trap detector
7.5. Transformation hop colors
8. Variables
8.1. Variable usage
8.2. Variable scope
8.2.1. Environment variables
8.2.2. Kettle variables
8.2.3. Internal variables
9. Transformation Settings
9.1. Description
9.2. Transformation Tab
9.3. Logging
9.4. Dates
9.5. Dependencies
9.6. Miscellaneous
9.7. Partitioning
9.8. SQL Button
10. Transformation Steps
10.1. Description
10.2. Launching several copies of a step
10.3. Distribute or copy?
10.4. Step error handling
10.5. Apache Virtual File System (VFS) support
10.5.1. Example: Referencing remote job files
10.5.2. Example: Referencing files inside a Zip
10.6. Transformation Step Types
10.6.1. Text File Input
10.6.2. Table input
10.6.3. Get System Info
10.6.4. Generate Rows
10.6.5. De-serialize from file (formerly Cube Input)
10.6.6. XBase input
10.6.7. Excel input
10.6.8. Get File Names
10.6.9. Text File Output
10.6.10. Table output
10.6.11. Insert / Update
10.6.12. Update
10.6.13. Delete
10.6.14. Serialize to file (formerly Cube File Output)
10.6.15. XML Output
10.6.16. Excel Output
10.6.17. Microsoft Access Output
10.6.18. Database lookup
10.6.19. Stream lookup
10.6.20. Call DB Procedure
10.6.21. HTTP Client
10.6.22. Select values
10.6.23. Filter rows
10.6.24. Sort rows
10.6.25. Add sequence
10.6.26. Dummy (do nothing)
10.6.27. Row Normaliser
10.6.28. Split Fields
10.6.30. Unique rows
10.6.31. Group By
10.6.32. Null If
10.6.33. Calculator
10.6.34. XML Add
10.6.35. Add constants
10.6.36. Row Denormaliser
10.6.37. Flattener
10.6.38. Value Mapper
10.6.39. Blocking step
10.6.40. Join Rows (Cartesian product)
10.6.41. Database Join
10.6.42. Merge rows
10.6.43. Sorted Merge
10.6.44. Merge Join
10.6.45. JavaScript Values
10.6.46. Modified Java Script Value
10.6.47. Execute SQL script
10.6.48. Dimension lookup/update
10.6.49. Combination lookup/update
10.6.50. Mapping
10.6.51. Get rows from result
10.6.52. Copy rows to result
10.6.53. Set Variable
10.6.54. Get Variable
10.6.55. Get files from result
10.6.56. Set files in result
10.6.57. Injector
10.6.58. Socket reader
10.6.59. Socket writer
10.6.60. Aggregate Rows
10.6.61. Streaming XML Input
10.6.62. Abort
10.6.63. Oracle Bulk Loader
10.6.64. Append
10.6.65. Regex Evaluation
10.6.66. CSV Input
10.6.67. Fixed File Input
10.6.68. Microsoft Access Input
10.6.69. LDAP Input
10.6.70. Closure Generator
10.6.71. Mondrian Input
10.6.72. Get Files Row Count
10.6.73. Dummy Plugin
11. Job Settings
11.1. Description
11.2. Job Tab
11.3. Log Tab
12. Job Entries
12.1. Description
12.2. Job Entry Types
12.2.1. Start
12.2.2. Dummy Job Entry
12.2.3. Transformation
12.2.4. Job
12.2.5. Shell
12.2.6. Mail
12.2.7. SQL
12.2.8. Get a file with FTP
12.2.9. Table Exists
12.2.10. File Exists
12.2.11. Get a file with SFTP
12.2.12. HTTP
12.2.13. Create a file
12.2.14. Delete a file
12.2.15. Wait for a file
12.2.16. File compare
12.2.17. Put a file with SFTP
12.2.18. Ping a host
12.2.19. Wait for
12.2.20. Display Msgbox info
12.2.21. Abort job
12.2.22. XSL transformation
12.2.23. Zip files
12.2.24. Bulkload into MySQL
12.2.25. Get Mails from POP
12.2.26. Delete Files
12.2.27. Success
12.2.28. XSD Validator
12.2.29. Write to log
12.2.30. Copy Files
12.2.31. DTD Validator
12.2.32. Put a file with FTP
12.2.33. Unzip
12.2.34. Dummy Job Entry
13. Graphical View
13.1. Description
13.2. Adding steps or job entries
13.2.1. Create steps by drag and drop
13.3. Hiding a step
13.4. Transformation Step options (right-click menu)
13.4.1. Edit step
13.4.2. Edit step description
13.4.3. Data movement
13.4.4. Change number of copies to start
13.4.5. Copy to clipboard
13.4.6. Duplicate Step
13.4.7. Delete step
13.4.8. Hide Step
13.5. Job entry options (right-click menu)
13.5.1. Open Transformation/Job
13.5.2. Edit job entry
13.5.3. Edit job entry description
13.5.4. Create shadow copy of job entry
13.5.5. Copy selected entries to clipboard (CTRL-C)
13.5.6. Align / distribute
13.5.7. Detach entry
13.5.8. Delete all copies of this entry
13.6. Adding hops
14. Running a Transformation
14.1. Running a Transformation Overview
14.2. Execution Options
14.2.1. Where to Execute
14.2.2. Other Options
14.3. Setting up Remote and Slave Servers
14.3.1. General description
14.3.2. Configuring a remote or slave server
14.4. Clustering
14.4.1. Overview
14.4.2. Creating a cluster schema
14.4.3. Options
14.4.4. Running transformations using a cluster
14.4.5. Basic Clustering Example
15. Logging
15.1. Logging Description
15.2. Log Grid
15.2.1. Transformation Log Grid Details
15.2.2. Job Log Grid
15.3. Buttons
15.3.1. Transformation Buttons
15.3.2. Job Buttons
16. Grids
16.1. Description
16.2. Usage
17. Repository Explorer
17.1. Description
17.2. Right-click functions
17.3. Backup / Recovery
18. Shared objects
19. APPENDIX A: LGPL License
An introduction to Pentaho Data Integration in Roland Bouman's blog: http://rpbouman.blogspot.com/2006/06/pentaho-data-integration-kettle-turns.html
Nicholas Goodman is also blogging on Kettle and BI: http://www.nicholasgoodman.com
3. Introduction to Spoon
3.1. What is Spoon?
Kettle is an acronym for "Kettle E.T.T.L. Environment". It has been designed to help you with your ETTL needs: the Extraction, Transformation, Transportation and Loading of data.
Spoon is a graphical user interface that allows you to design transformations and jobs that can be run with the Kettle tools Pan and Kitchen. Pan is a data transformation engine that is capable of performing a multitude of functions, such as reading, manipulating and writing data to and from various data sources. Kitchen is a program that can execute jobs designed by Spoon and stored in XML or in a database repository. Usually jobs are scheduled in batch mode to be run automatically at regular intervals.
NOTE: For a complete description of Pan or Kitchen, please refer to the Pan and Kitchen user guides.
Transformations and jobs can describe themselves using an XML file or can be put in a Kettle database repository. This information can then be read by Pan or Kitchen to execute the described steps in the transformation or run the job. In short, Pentaho Data Integration makes data warehouses easier to build, update and maintain.
3.2. Installation
The first step is the installation of Sun Microsystems Java Runtime Environment version 1.4 or higher. You can download a JRE for free at http://www.javasoft.com/.
After this, you can simply unzip the zip-file Kettle-3.0.zip in a directory of your choice. In the Kettle directory where you unzipped the file, you will find a number of files. Under Unix-like environments (Solaris, Linux, MacOS, ...) you will need to make the shell scripts executable. Execute these commands to make all
If you want to make a shortcut under the Windows platform, an icon is provided: "spoon.ico" to set the
3.4. Supported platforms
Microsoft Windows: all platforms since Windows 95, including Vista
Linux GTK: on i386 and x86_64 processors, works best on Gnome
Apple's OSX: works both on PowerPC and Intel machines
Solaris: using a Motif interface (GTK optional)
AIX: using a Motif interface
HP-UX: using a Motif interface (GTK optional)
FreeBSD: preliminary support on i386, not yet on x86_64
[Figure: Designing a transformation]
[Figure: Designing a job]
The Main tree in the upper-left panel of Spoon allows you to browse connections along with the jobs and transformations you currently have open. When designing a transformation, the Core Objects palette in the lower-left panel contains the available steps used to build your transformation, including input, output, lookup, transform, joins, scripting steps and more. When designing a job, the Core Objects palette contains the available job entry types.
Pentaho Data Integration TM Spoon User Guide 11
These items are described in detail in the chapters below: 4. Database Connections, 7. Hops, 10. Transformation Steps, 12. Job Entries, 13. Graphical View.
Minimal: Use minimal logging
Basic: This is the default basic logging level
Detailed: Give detailed logging output
Debug: Show very detailed output for debugging purposes
Rowlevel: Detailed logging at a row level. Warning: this ...
Note: You also need to specify the options -user, -pass and -trans described below. The repository details are loaded from the file repositories.xml in the local directory or in the Kettle directory: $HOME/.kettle/ or C:\Documents and Settings\<username>\.kettle on Windows.
-user=Username
This is the username with which you want to connect to the repository.
Use this option to select the transformation to run from the repository.
-job=Job Name
Use this option to select the job to run from the repository.
Important Notes:
On Windows, we advise you to use the /option:value format to avoid command line parsing problems caused by the MS-DOS shell. Fields in italic represent the values that the options use.
It's important that if spaces are present in the option values, you use quotes or double quotes to keep them together. Take a look at the examples below for more info.
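The two option spellings can be illustrated with a short Python sketch. `parse_option` is a hypothetical helper, not code from Pan or Kitchen; it only shows that `-name=value` and `/name:value` carry the same information. `shlex.split` applies POSIX shell quoting rules, so it demonstrates how quotes keep a value with spaces together; the Windows command shell has its own parsing rules.

```python
import shlex
from typing import Optional, Tuple


def parse_option(arg: str) -> Tuple[str, Optional[str]]:
    """Split one command line argument into (name, value).

    Accepts '-name=value' as well as the Windows-friendly '/name:value'.
    An option given without a value returns None as its value.
    """
    if arg.startswith("-"):
        body, sep = arg[1:], "="
    elif arg.startswith("/"):
        body, sep = arg[1:], ":"
    else:
        raise ValueError(f"not an option: {arg!r}")
    # partition() splits at the first separator only, so a value such as
    # C:\etl\load.ktr in '/file:C:\etl\load.ktr' stays intact.
    name, found, value = body.partition(sep)
    return name, (value if found else None)


# Quotes keep a value with spaces together (POSIX shell rules via shlex).
args = shlex.split('-trans="Load Sales" -user=admin /pass:secret')
print([parse_option(a) for a in args])
```

The transformation name and credentials in the example are hypothetical.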
3.8. Repository
Spoon provides you with the ability to store transformation and job files on the local file system or in the Kettle repository. The Kettle repository can be housed in any common relational database. This means that in order to load a transformation from a database repository, you need to connect to this repository. To do this, you need to define a database connection to this repository. You can do this using the repositories dialog you are presented with when you start up Spoon:
The information concerning repositories is stored in a file called "repositories.xml". This file resides in the hidden directory ".kettle" in your default home directory. On Windows this is C:\Documents and Settings\<username>\.kettle
Note: The complete path and filename of this file is displayed on the Spoon console.
If you don't want this dialog to be shown each time Spoon starts up, you can disable it by unchecking the 'Present this dialog at startup' checkbox or by using the Options dialog under the Edit / Options menu. See also 3.12. Options.
Note: The default password for the admin user is also admin. You should change this default password right after the creation, using the Repository Explorer or the "Repository/Edit User" menu.
3.9. License
Beginning with version 2.2.0, Kettle was released as open source software under the LGPL license. Please refer to Appendix A for the full text of this license.
Note: Pentaho Data Integration is referred to as "Kettle" below.
Copyright (C) 2007 Pentaho Corporation
Kettle is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version.
Kettle is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public License along with the Kettle distribution; if not, write to the Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
3.10. Definitions
3.10.1. Transformation Definitions
Value: Values are part of a row and can contain any type of data: Strings, floating point Numbers, unlimited precision BigNumbers, Integers, Dates or Boolean values.
Row: a row consists of 0 or more values.
Output stream: an output stream is a stack of rows that leaves a step.
Input stream: an input stream is a stack of rows that enters a step.
Hop: a hop is a graphical representation of one or more data streams between 2 steps. A hop always represents the output stream for one step and the input stream for another. The number of streams is equal to the number of copies of the destination step (1 or more).
Note: a note is a descriptive piece of information that can be added to a transformation.
Job Definitions
Job Entry: a job entry is one part of a job and performs a certain task.
Hop: a hop is a graphical representation of the link between two job entries. Depending on the type of the originating job entry, a hop can be set to execute the next job entry unconditionally, after successful execution or after failed execution.
Note: a note is a descriptive piece of information that can be added to a job.
3.11. Toolbar
The icons on the toolbar of the main screen are, from left to right:
Create a new job or transformation.
Open transformation/job: from file if you're not connected to a repository, or from the repository if you are connected to one.
Save the transformation/job to a file or to the repository.
Save the transformation/job under a different name or filename.
Open the print dialog.
Run transformation/job: runs the current transformation from XML file or repository.
Preview transformation: runs the current transformation from memory. You can preview the rows that are produced by selected steps.
Run the transformation in debug mode, allowing you to troubleshoot execution errors.
Replay the processing of a transformation for a certain date and time. This will cause certain steps (Text File Input and Excel Input) to only process rows that failed to be interpreted correctly during the run on that particular date and time.
Verify transformation: Spoon runs a number of checks for every step to see if everything is going to run as it should.
Run an impact analysis: shows what impact the transformation has on the used databases.
Generate the SQL that is needed to run the loaded transformation.
Launch the database explorer, allowing you to preview data, run SQL queries, generate DDL and more.
3.12. Options
Kettle options allow you to customize a number of properties related to the behavior and look and feel of the graphical user interface. Examples include startup options, like whether or not to display tips and the Kettle Welcome Page, and user interface options, like fonts and colors. To access the options dialog, select Edit > Options... from the menubar.
Maximum Undo Level: This parameter sets the maximum number of steps that can be undone (or redone) by Spoon.
Default number of lines in preview dialog: This parameter allows you to change the default number of rows that are requested from a step during transformation previews.
Maximum nr of lines in the logging windows: Specifies the maximum number of rows to display in the logging window.
Show tips at startup?: This option sets the display of tips at startup.
Show welcome page at startup?: This option controls whether or not to display the welcome page when launching Spoon.
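To illustrate what a maximum undo level means, here is a minimal Python sketch of a bounded undo/redo history. The class and its method names are assumptions made for this example, not Spoon internals. Once the limit is reached, the oldest action is forgotten.

```python
from collections import deque


class UndoHistory:
    """Undo/redo history that remembers at most `max_level` actions."""

    def __init__(self, max_level: int):
        # A deque with maxlen silently discards the oldest entry when full.
        self._undo = deque(maxlen=max_level)
        self._redo = []

    def record(self, action):
        self._undo.append(action)
        self._redo.clear()  # a fresh edit invalidates anything that was undone

    def undo(self):
        if not self._undo:
            return None
        action = self._undo.pop()
        self._redo.append(action)
        return action

    def redo(self):
        if not self._redo:
            return None
        action = self._redo.pop()
        self._undo.append(action)
        return action


history = UndoHistory(max_level=2)
for action in ["add step A", "add step B", "add hop A->B"]:
    history.record(action)
print(history.undo(), "|", history.undo(), "|", history.undo())
```

With a limit of 2, the first action ("add step A") can no longer be undone.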
Spoon caches information that is stored on source and target databases. In some cases this can lead to incorrect results when you're in the process of changing those very databases. In those cases it is possible to disable the cache altogether instead of clearing the cache every time.
NOTE: Spoon automatically clears the database cache when you launch DDL (Data Definition Language) statements towards a database connection. However,
Open last file at startup?
... describes the two options for handling multiple outputs:
Distribute rows: destination steps receive the rows in turns (round robin)
Copy rows: all rows are sent to all destinations
Show repository dialog at startup?: This option controls whether or not the repositories dialog shows up at startup.
Ask user when exiting?: This option controls whether or not to display the confirmation dialog when a user chooses to exit the application.
Clear custom parameters (steps/plugins): This option clears all parameters and flags that were set in the plugin or step dialogs.
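The difference between the two output options can be sketched in a few lines of Python. The function names are hypothetical, and the sketch assumes rows are handed out strictly in turns for "Distribute rows", as the round robin description suggests.

```python
from itertools import cycle


def distribute_rows(rows, n_targets):
    """'Distribute rows': each row goes to exactly one destination, in turns."""
    outputs = [[] for _ in range(n_targets)]
    for row, target in zip(rows, cycle(range(n_targets))):
        outputs[target].append(row)
    return outputs


def copy_rows(rows, n_targets):
    """'Copy rows': every destination receives every row."""
    rows = list(rows)
    return [list(rows) for _ in range(n_targets)]


rows = ["r1", "r2", "r3", "r4", "r5"]
print(distribute_rows(rows, 2))  # [['r1', 'r3', 'r5'], ['r2', 'r4']]
print(copy_rows(rows, 2))
```

Distributing splits the work so that each destination step sees only part of the data. Copying multiplies the data so that each destination sees all of it.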
This option controls whether or not to display tooltips for the buttons on the main toolbar.
This is the font that is used in the dialog boxes, trees, input fields, etc.
This is the font that is used on the graphical view.
Font for notes: This font is used in the notes that are displayed in the Graphical View.
Background color: Sets the background color in Spoon. It affects all dialogs too.
Workspace background color: Sets the background color in the Graphical View of Spoon.
Tab color: This is the color that is used to indicate tabs that are active/selected.
Icon size in workspace: ... of an icon is 32x32 pixels. The best results (graphically) are probably at sizes 16, 24, 32, 48, 64 and other multiples of 32.
Line width on workspace: This affects the line width of the hops on the Graphical View and the border around the steps.
Shadow size on workspace: If this size is larger than 0, a shadow of the steps, hops and notes is
By default, a parameter is drawn at 35% of the dialog width, counted from the left. You can change this percentage, which can be useful in cases where you use large fonts.
Canvas anti-aliasing?
Some platforms like Windows, OSX and Linux support anti-aliasing through GDI, Carbon or Cairo. Check this option to enable smoother lines and icons in your graph view. If you enable this and your environment doesn't work any more afterwards, change the value for option "EnableAntiAliasing" to "N" in the file $HOME/.kettle/.spoonrc (C:\Documents and Settings\<user>\.kettle\.spoonrc on Windows).
Checking this on Windows allows you to use the default system settings for fonts and colors in Spoon. On other platforms, this is always the case.
Show branding graphics: If enabled, Spoon will draw branding graphics on the canvas and in the left-hand side "expand bar".
Preferred Language: Here you can specify the default language setting. If a certain text hasn't been translated into this locale, Kettle will fall back to the fail-over locale.
Alternative Language: Because the original language in which Kettle was written is English,
This option will search in any available fields, connectors or notes of all loaded jobs and transformations for the string specified in the Filter field. The Meta data search returns a detailed result set showing the location of any search hits. This feature is accessed by choosing Edit > Search Meta data from the menubar.
The Set Environment Variable feature allows you to explicitly create and set environment variables for the current user session. This is useful when designing transformations, to test variable substitutions that are normally set dynamically by another job or transformation. This feature is accessible by choosing Edit > Set Environment Variable from the menubar.
Note: This screen is also presented when you run a transformation that uses undefined variables. It will also open by default every subsequent time you execute the file.
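A sketch of the kind of substitution being tested, assuming Kettle's `${NAME}` variable notation. The helper names are hypothetical. Unknown references are left untouched so the undefined ones can be reported, much like the dialog that appears when a transformation uses undefined variables.

```python
import re

# Assumes the ${NAME} form of Kettle variables.
_VARIABLE = re.compile(r"\$\{([^}]+)\}")


def undefined_variables(text, env):
    """Names referenced in `text` that `env` does not define."""
    return sorted({name for name in _VARIABLE.findall(text) if name not in env})


def substitute(text, env):
    """Replace each ${NAME} with env[NAME]; leave unknown references untouched."""
    return _VARIABLE.sub(lambda m: env.get(m.group(1), m.group(0)), text)


session = {"INPUT_DIR": "/data/in"}  # e.g. set via Edit > Set Environment Variable
path = "${INPUT_DIR}/sales_${RUN_DATE}.csv"
print(undefined_variables(path, session))  # ['RUN_DATE']
print(substitute(path, session))  # /data/in/sales_${RUN_DATE}.csv
```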
3.16. Replay
The Replay feature allows you to re-run a transformation that failed. Replay functionality is implemented for Text File Input and Excel Input. It allows you to send files that had errors back to the source and have the data corrected. ONLY the lines that failed before are then processed during the replay, if a .line file is present. Replay uses the date in the filename of the .line file to match the entered replay date.
3.17. Generate mappings against target
In cases where the fields of your transformation need to be mapped to the corresponding fields in the target output table, this is normally accomplished using a Select Values step in your transformation. The 'Generate mapping against target' option provides you with a dialog for defining these mappings. It will automatically create the resulting Select Values step, which can be dropped into your transformation flow prior to the table output step.
To access the 'Generate mapping against target' option, right-click on the table output step.
After defining your mappings, select OK and the Select Values step containing your mappings will appear on the workspace. Simply attach the mapping step into your transformation just before the table output step.
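Conceptually, the generated Select Values step renames incoming fields to the target table's column names. The Python sketch below models one row as a dictionary. `apply_mapping` and the field names are hypothetical, and dropping unmapped fields is an assumption of this sketch rather than documented Spoon behavior.

```python
def apply_mapping(row, mapping):
    """Rename fields of one row to target column names.

    `mapping` is {source_field: target_column}. Fields not in the mapping are
    dropped, which is an assumption of this sketch.
    """
    missing = [src for src in mapping if src not in row]
    if missing:
        raise KeyError(f"source fields not found in row: {missing}")
    return {target: row[src] for src, target in mapping.items()}


mapping = {"cust_id": "CUSTOMER_ID", "cust_name": "NAME"}
print(apply_mapping({"cust_id": 7, "cust_name": "Ann", "load_ts": "2007-06-01"}, mapping))
```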
3.17.1. Generate mappings example
In this example, we want to generate mappings to our target output table:
Begin by right-clicking on the Table output step and selecting 'Generate mappings against target'. Add all necessary mappings using the Generate Mapping dialog shown above and click OK. You will now see that a Table output Mapping step has been added to the canvas:
Finally, drag the generated Table output Mapping step into your transformation flow just before the table output step:
If a row is found that does not have the same layout as the first row, an error is thrown and the step and offending row are reported.
[Figure: The welcome screen]
... from the menubar, or by using the CTRL-ALT-N hotkey. This ... designing your job.
[Figure: Creating a new transformation or job]
4.1. Notes
Notes allow you to add descriptive text notes to the job or transformation canvas. To add a note to the graphical view, right-click on the canvas and select 'Add note'. Later, you can edit these notes by double-clicking on them and move them around the screen by dragging them with the left mouse button. To remove a note, right-click on the note and select 'Delete note'.
[Figure: Notes]
[Figure: Creating a new database connection]
This will launch the 'Connection information' dialog shown above. The following topics describe the
5.2.1. General
The General tab is where you set up the basic information about your connection, like the connection name, type, access method, server name and login credentials. The table below provides a more detailed description of the options available on the General tab:
Connection Name: Uniquely identifies a connection across transformations and jobs.
Connection Type: The type of database you are connecting to (e.g. MySQL, Oracle, etc.).
Method of access: This will be either Native (JDBC), ODBC, or OCI. Available access types depend on the type of database you are connecting to.
Server host name: Defines the host name of the server on which the database resides. You can also specify the host by IP address.
Database name: Identifies the database name you want to connect to. In case of ODBC, specify the DSN name here.
Port number: Sets the TCP/IP port number on which the database listens.
Username: Optionally specifies the username to connect to the database.
Password: Optionally specifies the password to connect to the database.
5.2.2. Pooling
The Pooling tab allows you to configure your connection to use connection pooling and define options related to connection pooling, like the initial pool size, maximum pool size and connection pool parameters. The table below provides a more detailed description of the options available on the Pooling tab:
Use a connection pool: Check this option to enable connection pooling.
The initial pool size: Sets the initial size of the connection pool.
The maximum pool size: Sets the maximum number of connections in the connection pool.
Parameter Table: Allows you to define additional custom pool parameters.
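To illustrate how the initial and maximum pool sizes interact, here is a minimal Python sketch of a bounded pool. It is not Kettle's pooling implementation. `connect` is a stand-in for whatever opens a real database connection, and the class is hypothetical.

```python
import itertools
import queue
import threading


class ConnectionPool:
    """Opens `initial` connections up front and never more than `maximum`."""

    def __init__(self, connect, initial, maximum):
        if not 0 <= initial <= maximum:
            raise ValueError("expected 0 <= initial <= maximum")
        self._connect = connect      # callable that opens one connection
        self._maximum = maximum
        self._idle = queue.Queue()   # connections ready to hand out
        self._created = 0
        self._lock = threading.Lock()
        for _ in range(initial):
            self._idle.put(self._open())

    def _open(self):
        conn = self._connect()       # count only connections that opened
        self._created += 1
        return conn

    @property
    def created(self):
        return self._created

    def acquire(self, timeout=None):
        try:
            return self._idle.get_nowait()      # reuse an idle connection
        except queue.Empty:
            pass
        with self._lock:
            if self._created < self._maximum:   # grow up to the maximum
                return self._open()
        # Pool is at its maximum: wait until another user releases one.
        # Raises queue.Empty if `timeout` seconds pass first.
        return self._idle.get(timeout=timeout)

    def release(self, conn):
        self._idle.put(conn)


ids = itertools.count(1)
pool = ConnectionPool(lambda: f"conn-{next(ids)}", initial=2, maximum=3)
a, b, c = pool.acquire(), pool.acquire(), pool.acquire()
print(a, b, c, pool.created)  # conn-1 conn-2 conn-3 3
```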
5.2.3. MySQL
By default, MySQL gives back complete query results to the client (Kettle in this case) in one block, so we had to enable "result streaming" by default. The big drawback of this is that it allows only one single query to be opened at any given time. If you run into trouble because of that, you can disable this option in the MySQL tab of the database connection dialog.
Another issue you might come across is that the default timeout in the MySQL JDBC driver is set to 0 (no timeout). This leads to a problem in certain situations, because it doesn't allow Kettle to detect a server crash or sudden network failure if it happens in the middle of a query or open database connection. This in turn leads to the infinite stalling of a transformation or job. To solve this, set the "connectTimeout" and "socketTimeout" parameters for MySQL in the Options tab. The value to be specified is in milliseconds: for a 2 minute timeout you would specify the value 120000 (2 x 60 x 1000). You can also review help text on other options on the linked MySQL help page by clicking on the 'Show help text on option usage' button.
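The millisecond arithmetic and the resulting MySQL Connector/J URL can be sketched as follows. Spoon builds the URL itself from the Options tab, so this Python helper is only an illustration. The helper names, host and database are hypothetical.

```python
from urllib.parse import urlencode


def timeout_ms(minutes):
    """Minutes -> milliseconds, the unit MySQL's JDBC timeouts use."""
    return int(minutes * 60 * 1000)


def mysql_url(host, database, port=3306, **options):
    """Build a MySQL Connector/J URL; options become ?name=value&name=value."""
    url = f"jdbc:mysql://{host}:{port}/{database}"
    return f"{url}?{urlencode(options)}" if options else url


t = timeout_ms(2)  # 2 x 60 x 1000 = 120000
print(mysql_url("dbhost", "sales", connectTimeout=t, socketTimeout=t))
# jdbc:mysql://dbhost:3306/sales?connectTimeout=120000&socketTimeout=120000
```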
5.2.4. Oracle
This tab allows you to specify the default data and index tablespaces that Kettle will use when generating SQL for Oracle tables and indexes.
This version of Pentaho Data Integration ships with the Oracle JDBC driver version 10.2.0. It is in general the most stable and recent driver. If you run into problems with Oracle connectivity or other strange problems, you might want to consider replacing the 10.2.0 JDBC driver to match your database server. Replace the files "ojdbc14.jar" and "orai18n.jar" in the directory libext/JDBC of your distribution with the files found in the $ORACLE_HOME/jdbc directory on your server.
If you want to use OCI and an Oracle Net8 client, please read on. For OCI to work, the JDBC driver used in Kettle needs to match your Oracle client version. This release ships with version 10.2. You can either install that version of the Oracle client or (probably easier) change the JDBC driver in PDI if the versions don't match up (see above).
5.2.5. Informix
For Informix, you need to specify the Informix Server name in the Informix tab in order for a connection to be usable.
Other properties can be configured by adding connection parameters on the Options tab of the Connection information dialog. For example, you can enable single sign-on login by defining the domain option on the Options tab as shown below:
domain: Specifies the Windows domain to authenticate in. If present and the user name and password are provided, jTDS uses Windows (NTLM) authentication instead of the usual SQL Server authentication (i.e. the user and password provided are the domain user and password). This allows non-Windows clients to log in to servers which are only configured to accept Windows authentication.
If the domain parameter is present but no user name and password are provided, jTDS uses its native Single-Sign-On library and logs in with the logged-in Windows user's credentials. For this to work you would obviously need to be on Windows, logged into a domain, and also have the SSO library installed; see the jTDS documentation for how to do this.
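The domain option ends up as a jTDS connection property, and jTDS URLs carry such properties as `;name=value` pairs after the database name. Here is a small Python sketch of the resulting URL. The helper, host, database and domain names are hypothetical; 1433 is the usual SQL Server port.

```python
def jtds_url(host, database, port=1433, **props):
    """Build a jTDS SQL Server URL; properties are appended as ;name=value."""
    url = f"jdbc:jtds:sqlserver://{host}:{port}/{database}"
    return url + "".join(f";{name}={value}" for name, value in props.items())


print(jtds_url("sqlhost", "crm", domain="CORP"))
# jdbc:jtds:sqlserver://sqlhost:1433/crm;domain=CORP
```

Leaving out the user name and password while setting `domain` corresponds to the single sign-on case described above.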
5.2.8. Generic
This tab is where you specify the URL and Driver class for Generic Database connections. You can also dynamically set these properties using Kettle variables. This provides the ability to access data from multiple database types using the same transformations and jobs. Note: Make sure to use clean ANSI SQL that works on all used database types.
5.2.9. Options
This tab allows you to set database-specific options for the connection by adding parameters to the generated URL. To add a parameter, select the next available row in the parameter table, choose your database type, then enter a valid parameter name and its corresponding value. For more database-specific configuration help, click the 'Show help text on option usage' button, and a new browser tab will appear in Spoon with additional information about configuring the JDBC connection for the currently selected database type.
5.2.10. SQL
This tab allows you to enter a number of SQL commands that are executed immediately after connecting to the database. This is sometimes needed for various reasons like licensing, configuration, logging, tracing, etc.
5.2.11. Cluster
This tab allows you to enable clustering for the database connection and create connections to the data partitions. To enable clustering for the connection, check the 'Use Clustering?' option. To create a new data partition, enter a partition ID and the hostname, port, database, username and
5.2.12. Advanced
This tab allows you to configure the following properties for the connection:
Quote all identifiers in database
Force all identifiers to lower case
Force all identifiers to upper case
Specifies the language to be used when connecting to SAP. Specifies the three digit client number for the connection.
5.2.14. Explore
The Database Explorer allows you to interactively browse the target database, preview data, generate DDL and much more. To open the Database Explorer for an existing connection, click the 'Explore' button found on the Connection information dialog, or right-click on the connection in the Main tree and select 'Explore'. Please see Database Explorer for more information.
connection
To delete an existing database connection, right-click on the connection name in the main tree and select
5.8. Quoting
More and more people complained about the handling of reserved words, field names with spaces in them, field names with decimals (.) in them, and table names with dashes and other special characters in them. We therefore implemented a database-specific quoting system that allows you to use pretty much any name or character that the database is comfortable with.
Pentaho Data Integration contains a list of reserved words for many (but not all) of the supported databases. To correctly implement quoting, the schema and the table name have to be handled separately; otherwise it would be impossible to properly quote tables or fields with one or more periods in them. Putting dots in table and field names is apparently common practice in certain ERP systems (for example, fields like "V.A.T.").
Because we too can be wrong when doing the quoting, we added a new rule in version 2.5.0: when there is a start or end quote in the table name or schema, Pentaho Data Integration refrains from doing the quoting. This allows you to specify the quoting mechanism yourself and leaves you all the freedom you need to get out of any sticky situation that might be left. Nevertheless, feel free to let us know about such cases so that we can improve our quoting algorithms.
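The rules described above can be approximated in Python. This is a simplified sketch, not Pentaho Data Integration's actual algorithm. The reserved-word set is a tiny hypothetical sample, and the quote characters are parameters because they differ per database (for example double quotes in ANSI SQL, square brackets in SQL Server, backticks in MySQL).

```python
# Tiny, illustrative reserved-word sample; real lists are per database.
RESERVED = {"SELECT", "FROM", "WHERE", "ORDER", "GROUP", "USER"}


def needs_quoting(name):
    """Reserved words and names with spaces, dots, dashes, etc. need quotes."""
    return name.upper() in RESERVED or not all(c.isalnum() or c == "_" for c in name)


def quote_identifier(name, open_q='"', close_q='"'):
    # An explicit start or end quote means the user quoted it already.
    if name.startswith(open_q) or name.endswith(close_q):
        return name
    return f"{open_q}{name}{close_q}" if needs_quoting(name) else name


def quote_schema_table(schema, table, **quotes):
    # Quote schema and table separately so a dot inside a name is not
    # mistaken for the schema separator.
    quoted_table = quote_identifier(table, **quotes)
    if not schema:
        return quoted_table
    return f"{quote_identifier(schema, **quotes)}.{quoted_table}"


print(quote_schema_table("erp", "V.A.T."))  # erp."V.A.T."
print(quote_identifier("first name", "[", "]"))  # [first name]
```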
Database | Access method | Default port
PostgreSQL | Native, ODBC | 5432
Intersystems Caché | Native, ODBC | 1972
Sybase | Native, ODBC | 5001
Hypersonic | Native | 9001
MaxDB (SAP DB) | Native, ODBC |
Ingres | Native, ODBC |
Borland Interbase | Native, ODBC | 3050
ExtenDB | Native, ODBC |
Teradata | Native, ODBC |
Oracle RDB | Native, ODBC |
H2 | Native, ODBC |
Netezza | Native | 5480
IBM Universe | Native, ODBC |
SQLite | Native, ODBC |
Apache Derby | Native, ODBC | 1527
Generic (*) | Native, ODBC | Any
With Native access, the server name or IP address, the database name and the port are required for most databases; for Apache Derby they are optional. With ODBC access, enter the ODBC DSN name instead of the database name.
(*) The generic database connection also needs the URL and Driver class to be specified in the Generic tab. We now also allow these fields to be specified using a variable. That way you can access data from multiple database types using the same transformations and jobs. Make sure to use clean ANSI SQL that works on all used database types in that case.
To configure, edit the properties file called "simple-jndi/jdbc.properties". For example, to connect to the databases used in the Pentaho Demo platform download, use this information in the properties file:
    SampleData/type=javax.sql.DataSource
    SampleData/driver=org.hsqldb.jdbcDriver
    SampleData/url=jdbc:hsqldb:hsql://localhost/sampledata
    SampleData/user=pentaho_user
    SampleData/password=password
    Quartz/type=javax.sql.DataSource
    Quartz/driver=org.hsqldb.jdbcDriver
    Quartz/url=jdbc:hsqldb:hsql://localhost/quartz
    Quartz/user=pentaho_user
    Quartz/password=password
    Hibernate/type=javax.sql.DataSource
    Hibernate/driver=org.hsqldb.jdbcDriver
    Hibernate/url=jdbc:hsqldb:hsql://localhost/hibernate
    Hibernate/user=hibuser
    Hibernate/password=password
    Shark/type=javax.sql.DataSource
    Shark/driver=org.hsqldb.jdbcDriver
    Shark/url=jdbc:hsqldb:hsql://localhost/shark
    Shark/user=sa
    Shark/password=
Note: It is important that the information stored in this file in the simple-jndi directory mirrors the content
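A simple-jndi entry uses the form `DataSourceName/property=value`. The Python sketch below groups such lines per data source, which can help when checking the file against your other configuration. It is a simplified parser: unlike full Java properties files, it does not handle line continuations or escape sequences. `parse_jndi_properties` is a hypothetical helper.

```python
def parse_jndi_properties(text):
    """Group 'Name/property=value' lines into {Name: {property: value}}."""
    sources = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue  # blank line or comment
        key, _, value = line.partition("=")
        name, slash, prop = key.partition("/")
        if not slash:
            continue  # not in DataSourceName/property form
        sources.setdefault(name.strip(), {})[prop.strip()] = value.strip()
    return sources


sample = """
SampleData/type=javax.sql.DataSource
SampleData/driver=org.hsqldb.jdbcDriver
SampleData/url=jdbc:hsqldb:hsql://localhost/sampledata
Shark/user=sa
Shark/password=
"""
for name, props in parse_jndi_properties(sample).items():
    print(name, props)
```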
5.11. Unsupported databases
If you want to access a database type that is not yet supported, let us know and we will try to find a solution. A few database types are not supported in this release because of the lack of sample databases and/or software.
Please note that it is usually still possible to read from these databases by using the Generic database driver through an ODBC or JDBC connection.
6. SQL Editor
For example, suppose you added a Table Output step to a transformation and clicked the SQL button at the bottom of the Table Output dialog. Spoon will then automatically generate the necessary DDL for the output step to function properly and present it to the end user via the SQL Editor.
Notes:
Multiple SQL statements have to be separated by semicolons (;).
Before these SQL statements are sent to the database to be executed, Spoon removes returns, line-feeds and the separating semicolons.
Kettle clears the database cache for the database connection on which you launch DDL statements.
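The splitting behavior described in these notes can be sketched in Python. This is a naive illustration under stated assumptions. Line breaks are replaced by spaces rather than removed outright, so words on adjacent lines do not run together, and runs of whitespace are collapsed. A semicolon inside a string literal or comment would also split the script.

```python
def split_statements(script):
    """Split a script on ';' and flatten each statement onto one line."""
    statements = []
    for chunk in script.split(";"):
        # str.split() with no argument breaks on any whitespace, including
        # '\r' and '\n', so joining with ' ' turns line breaks into spaces.
        statement = " ".join(chunk.split())
        if statement:
            statements.append(statement)
    return statements


script = "CREATE TABLE t\r\n(\r\n  id INT\r\n);\r\nINSERT INTO t VALUES (1);\r\n"
print(split_statements(script))
# ['CREATE TABLE t ( id INT )', 'INSERT INTO t VALUES (1)']
```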
6.2. Limitations
This is a simple SQL editor. It does not know all the dialects of the more than 20 supported databases. That means that creating stored procedures, triggers and other database-specific objects might pose problems. Please consider using the tools that came with the database in that case.
7. Database Explorer
The buttons to the right provide quick access to the following features for the selected table:
Feature | Description
Preview first 100 rows of... | Returns the first 100 rows from the selected table.
Preview first ... rows of... | Prompts the user for the number of rows to return from the selected table.
 | Specifies the three digit client number for the connection.
 | Displays a list of column names, data types, etc. from the selected table.
Generate DDL | Generates the DDL to create the selected table based on the current connection type.
Generate DDL for other connection | Prompts the user for another connection, then generates the DDL to create the selected table based on the user selected connection type.
Open SQL for... | Launches the Simple SQL Editor for the selected table.
Truncate table... | Generates a TRUNCATE table statement for the current table. Note: The statement is commented out by default to prevent the user from
8. Hops
Selecting two steps in the tree, clicking right and selecting "New hop".
Selecting two steps in the graphical view, clicking right and selecting "New hop".
Splitting a hop
You can insert a new step into a hop between two steps by dragging the step (in the Graphical View) over a hop until the hop becomes drawn in bold. Release the left button and you will be asked if you want to split the hop. This works only with steps that have not yet been connected to another step.
8.3. Loops
Loops are not allowed in transformations because Spoon depends heavily on the previous steps to determine the field values that are passed from one step to another. If we allowed loops in transformations, we would often get endless loops and undetermined results.
Loops are allowed in jobs because Spoon executes job entries sequentially. Just make sure you don't build endless loops. This job entry can help you exit closed loops based on the number of times a job entry was executed.
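Forbidding loops in a transformation amounts to requiring that the graph of steps and hops be acyclic. The Python sketch below shows one common way to check this, a depth-first search that looks for a hop leading back to a step on the current path. has_loop and the step names are illustrative assumptions, not Spoon's own code.

```python
def has_loop(hops):
    """Return True if the directed graph given as (from_step, to_step) hops has a cycle."""
    graph = {}
    for src, dst in hops:
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    state = {step: WHITE for step in graph}

    def visit(step):
        state[step] = GREY
        for nxt in graph[step]:
            if state[nxt] == GREY:  # hop back to a step on the current path
                return True
            if state[nxt] == WHITE and visit(nxt):
                return True
        state[step] = BLACK
        return False

    return any(state[s] == WHITE and visit(s) for s in graph)


print(has_loop([("Table input", "Filter"), ("Filter", "Table output")]))  # False
print(has_loop([("A", "B"), ("B", "C"), ("C", "A")]))  # True
```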
We detected rows with varying number of fields, this is not allowed in a transformation. The first ...
[date_to, CUSTOMERNR, NAME, FIRSTNAME, LANGUAGE, GENDER, STREET, HOUSNR, BUSNR, ZIPCODE, LOCATION, COUNTRY, DATE_OF_BIRTH]
Note: this is only a warning and ... want to do.
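The warning above appears when the rows arriving at a step do not all share the same layout. A minimal Python sketch of that kind of check follows, assuming rows are plain lists and that comparing field counts is enough (Kettle itself compares the full row metadata). find_layout_mismatch is a hypothetical helper.

```python
def find_layout_mismatch(rows):
    """Return (fields in first row, fields in offending row), or None if all rows match."""
    if not rows:
        return None
    expected = len(rows[0])
    for row in rows[1:]:
        if len(row) != expected:
            return expected, len(row)
    return None


print(find_layout_mismatch([["a", "b", "c"], ["d"]]))  # (3, 1)
```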
The hop is used for carrying rows that caused errors in source step(s).
9. Variables
9.1. Variable usage
Variables can be used throughout Pentaho Data Integration, including in transformation steps and job entries. Variables can be defined by setting them with the "Set Variable" step in a transformation or by setting them in the kettle.properties file in the directory:
$HOME/.kettle (Unix/Linux/OSX)
C:\Documents and Settings\<username>\.kettle\ (Windows)
The way to use them is either by grabbing them using the Get Variable step or by specifying meta-data strings like:
${VARIABLE} or %%VARIABLE%%
Both formats can be used and even mixed: the first is a UNIX derivative, the second is derived from Microsoft Windows. Dialogs that support variable usage throughout Pentaho Data Integration are visually indicated using a red dollar sign like this:
You can use the CTRL-SPACE hotkey to select a variable to be inserted into the property value. Move the mouse over the variable icon to display a short help text.
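Both notations can be resolved with a single pattern match. The Python sketch below illustrates that idea and is not Kettle's implementation. resolve_variables is a hypothetical helper, and leaving unknown variables untouched is an assumption made for the example.

```python
import re

# Matches ${NAME} (UNIX style) or %%NAME%% (Windows style).
_VARIABLE = re.compile(r"\$\{([^}]+)\}|%%([^%]+)%%")


def resolve_variables(text, variables):
    """Replace ${NAME} and %%NAME%% with values taken from `variables`."""
    def substitute(match):
        name = match.group(1) or match.group(2)
        # Unknown variables are left exactly as they were written.
        return variables.get(name, match.group(0))
    return _VARIABLE.sub(substitute, text)


env = {"INPUT_DIR": "/data/in", "DATE": "20070101"}
print(resolve_variables("${INPUT_DIR}/sales_%%DATE%%.txt", env))
# /data/in/sales_20070101.txt
```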
${java.io.tmpdir}. This variable points to directory /tmp on Unix/Linux/OSX and to C:\Documents and
transformations or jobs run at the same time on an application server (for example the Pentaho platform) would get conflicts. Changes to the environment variables are visible to all software running on the virtual machine.
These are the internal variables that are defined in a job:
Variable Name | Sample value
Internal.Job.Filename.Directory | /home/matt/jobs
Internal.Job.Filename.Name | Nested jobs.kjb
Internal.Job.Name | Nested job test case
Internal.Job.Repository.Directory | /
These variables are defined in a transformation running on a slave server, executed in clustered mode:
Variable Name | Sample value
Internal.Slave.Transformation.Number | 0..<cluster size-1> (0,1,2,3 or 4)
Internal.Cluster.Size | <cluster size> (5)
Transformation Settings
Setting | Description
 | Displays the date and time when the transformation was last modified.
10.3. Logging
The Logging tab allows you to configure how and where logging information is captured. Settings include:
Setting | Description
READ log step | Use the number of read lines from this step to write to the log table. Read means: read from source steps.
INPUT log step | Use the number of input lines from this step to write to the log table. Input means: input from file or database.
WRITE log step | Use the number of written lines from this step to write to the log table. Written means: written to target steps.
OUTPUT log step | Use the number of output lines from this step to write to the log table. Output means: output to file or database.
UPDATE log step | Use the number of updated lines from this step to write to the log table. Update means: updated in a database.
REJECTED log step | Use the number of rejected lines from this step to write to the log table. Rejected means: error record.
Log connection | The connection used to write to a log table.
Log table | Specifies the name of the log table (for example L_ETL).
Use Batch-ID? | Enable this if you want to have a batch ID in the L_ETL table. Disable for backward compatibility with Spoon/Pan version < 2.0.
Use logfield to store logging in? | This option stores the logging text in a CLOB field in the logging table. This allows you to have the logging text together with the run results in the same table. Disable for backward compatibility with Spoon/Pan version < 2.1.
10.4. Dates
The Dates tab allows you to configure the following date related settings:
Setting | Description
Maxdate connection | Get the upper limit for a date range on this connection.
Maxdate table | Get the upper limit for a date range in this table.
Maxdate field | Get the upper limit for a date range in this field.
Maxdate offset | Increases the upper date limit with this amount. Use this, for example, if you find that the field DATE_LAST_UPD has a maximum value of 2004-05-29 23:00:00, but you know that the values for the last minute are not complete. In this case, simply set the offset to -60.
Maximum date difference | Sets the maximum date difference in the obtained date range. This will allow you to limit job sizes.
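The interaction between the offset and the maximum date difference can be shown with a small calculation. This Python sketch makes two assumptions: both values are expressed in seconds, which is consistent with -60 covering "the last minute" above, and the difference is measured from the start of the range. date_range is a hypothetical helper, not a Kettle API.

```python
from datetime import datetime, timedelta


def date_range(start, max_date, offset_seconds=0, max_difference_seconds=0):
    """Compute the (start, end) range for an incremental load."""
    # Shift the maximum value found in the Maxdate field by the offset.
    end = max_date + timedelta(seconds=offset_seconds)
    # Optionally cap the size of the range to limit job sizes.
    if max_difference_seconds > 0:
        end = min(end, start + timedelta(seconds=max_difference_seconds))
    return start, end


start, end = date_range(datetime(2004, 5, 29), datetime(2004, 5, 29, 23, 0, 0),
                        offset_seconds=-60)
print(end)  # 2004-05-29 22:59:00
```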
10.5. Dependencies
The Dependencies tab allows you to enter all of the dependencies for the transformation. For example, if a dimension depends on 3 lookup tables, we have to make sure that these lookup tables have not changed. If the values in these lookup tables have changed, we need to extend the date range to force a full refresh of the dimension. The dependencies allow you to look up whether a table has changed, provided you have a "data last changed" column in the table. The "Get dependencies" button
10.6. Miscellaneous
The Miscellaneous tab allows you to configure the following settings:
Setting | Description
Number of rows in rowsets | This option allows you to change the size of the buffers between the connected steps in a transformation. You will rarely, if ever, need to change this parameter. Only when you run low on memory might it be an option to lower this parameter.
Show a feedback row in transformation steps? | This controls whether feedback rows are shown while the transformation is being executed. By default, this feature is enabled and configured to display a feedback record every 5000 rows.
The feedback size | Sets the number of rows to process before entering a feedback entry into the log. Set this higher when processing large amounts of data to reduce the amount of information in the log file.
 | This allows you to open one unique connection per defined and used database connection in the transformation. Checking this option is required in order to allow a failing transformation to be rolled back completely.
 | Specifies the location of the XML file used to store shared objects like database connections, clustering schemas and more.
 | Allows you to enable or disable the internal logic for changing the Java thread priorities based on the number of input and output rows in the respective "rowset" buffers. Disabling it can be useful in some simplistic situations where the cost of using the logic exceeds the benefit of the thread prioritization.
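The rowset setting bounds the buffer between two connected steps, each of which runs in its own thread. The Python sketch below imitates that producer/consumer arrangement, using queue.Queue(maxsize=...) as a stand-in for a rowset. The sentinel convention and run_two_steps are illustrative assumptions, not Kettle internals. A smaller buffer holds fewer rows in memory at once, but the producing step blocks more often.

```python
import queue
import threading

END = object()  # sentinel marking the end of the row stream


def run_two_steps(rows, rowset_size):
    """Pass rows from a producer thread to a consumer thread through a bounded buffer."""
    rowset = queue.Queue(maxsize=rowset_size)  # put() blocks while the buffer is full
    received = []

    def producer():
        for row in rows:
            rowset.put(row)
        rowset.put(END)

    def consumer():
        while True:
            row = rowset.get()
            if row is END:
                break
            received.append(row)

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received


print(len(run_two_steps(range(10_000), rowset_size=1000)))  # 10000
```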
10.7. Partitioning
The Partitioning tab provides a list of available database partitions. You can create a new partition by clicking on the "New" button. The "Get Partitions" button will retrieve a list of available partitions that have been defined for the connection.
You ... will be shown:
11.3. Distribute or copy?
In the example above, green lines are shown between the steps. This indicates that rows are distributed among the target steps. In this case, it means that the first row coming from step "A" goes to step "database lookup 1", the second to "database lookup 2", the third to "database lookup 3", the fourth back to "database lookup 1", etc. However, if we right-click on step "A" and select "Copy data", you will get the hops drawn in red:
"Copy data" means that all rows from step "A" are copied to all 3 target steps. In this case, it means that step "B" gets 3 copies of all the rows that "A" has sent out. NOTE: Because all these steps are run as different threads, the order in which the single rows arrive at step "B" is probably not going to be the same as the order in which they left step "A".
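The difference between the green (distribute) and red (copy) hops can be sketched sequentially in Python. In Spoon the target steps run as separate threads, so, as noted above, the real arrival order may differ. distribute and copy_data are hypothetical helpers named after the menu option.

```python
from itertools import cycle


def distribute(rows, targets):
    """Round-robin: each row goes to exactly one target step."""
    out = {t: [] for t in targets}
    for row, target in zip(rows, cycle(targets)):
        out[target].append(row)
    return out


def copy_data(rows, targets):
    """Copy: every target step receives every row."""
    return {t: list(rows) for t in targets}


lookups = ["database lookup 1", "database lookup 2", "database lookup 3"]
print(distribute([1, 2, 3, 4], lookups))
# {'database lookup 1': [1, 4], 'database lookup 2': [2], 'database lookup 3': [3]}
```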
Step error handling allows you to configure a step such that, instead of halting a transformation when an error occurs, it passes the rows that caused an error to a different step. To configure error handling, right-click on the step and select "Define Error handling...". In the example below, we artificially generate an error in the Script Values step.
To configure the error handling, you can right-click on the step involved and select the "Error handling..." menu item:
As you can see, you can add extra fields to the "error rows":
This way, ... alternative
This transformation performs an insert regardless of the content of the table. If you put a primary key on the ID (in this case the customer ID), the insert into the table will cause an error for customers that already exist. Because of the error handling, we can pass the rows in error to the update step. Preliminary tests have shown this strategy of doing upserts to be 3 times faster in certain situations (with a low updates-to-inserts ratio).
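The insert-first upsert can be sketched with Python's built-in sqlite3 module. The primary key violation plays the role of the error hop, and the failing rows are handed to an UPDATE. The customer table and the upsert helper are assumptions made for the example, and this sketch says nothing about the performance figure quoted above.

```python
import sqlite3


def upsert(conn, customers):
    """Insert each row; rows that violate the primary key are routed to an update."""
    error_rows = []
    for customer_id, name in customers:
        try:
            conn.execute("INSERT INTO customer (id, name) VALUES (?, ?)",
                         (customer_id, name))
        except sqlite3.IntegrityError:
            # The "error hop": the failing row is passed on to the update.
            error_rows.append((customer_id, name))
    for customer_id, name in error_rows:
        conn.execute("UPDATE customer SET name = ? WHERE id = ?", (name, customer_id))
    conn.commit()
    return len(error_rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customer VALUES (1, 'Old name')")
updated = upsert(conn, [(1, "New name"), (2, "Second customer")])
print(updated, conn.execute("SELECT name FROM customer WHERE id = 1").fetchone()[0])
# 1 New name
```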
Kettle provides support for the Apache Virtual File System (VFS) as an additional way to reference source files, transformations and jobs from any location you like. For more information about VFS, visit Apache
sh kitchen.sh -file:http://www.kettle.be/GenerateRows.kjb
... .kettle.be/GenerateRows.kjb' and click the OK button to load the job in Spoon:
The transformation
Internal.Job.Filename.Directory
http://www.kettle.be/
Note: You ... do not support it, but because you don't have the permission to do so.
For more information on the almost endless list of possibilities, see http://jakarta.apache.org/commons/vfs/filesystems.html. Examples include direct loading from zip-files, gz-files, jar-files, ram drives, SMB, (s)ftp, (s)http, etc. We will extend this list even further in the near future with our own drivers for the Pentaho solutions repository and later on for the Kettle repository (something like psr:// and pdi:// URIs).
The example above illustrates the ability to use a ... Apache VFS support was implemented in all steps and job entries that are part of the Pentaho Data Integration suite, as well as in the recent Pentaho platform code and in Pentaho Analysis (Mondrian).
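Apache Commons VFS addresses a file inside an archive with a layered URI such as "zip:file:///path/archive.zip!/entry.txt". To make that convention concrete, the Python sketch below imitates just that one case. open_vfs_text and local_path are hypothetical helpers. They assume POSIX paths without percent-encoding and do not reflect how Kettle resolves files, since Kettle delegates to the VFS library itself.

```python
import zipfile
from urllib.parse import urlparse


def local_path(uri):
    """Turn file:///tmp/x into /tmp/x; plain paths pass through unchanged."""
    parsed = urlparse(uri)
    return parsed.path if parsed.scheme == "file" else uri


def open_vfs_text(uri):
    """Read text from a plain path, a file:// URI or a zip:<outer>!/<entry> URI."""
    if uri.startswith("zip:"):
        outer, _, entry = uri[len("zip:"):].partition("!/")
        with zipfile.ZipFile(local_path(outer)) as archive:
            return archive.read(entry).decode("utf-8")
    with open(local_path(uri), encoding="utf-8") as handle:
        return handle.read()
```

For example, open_vfs_text("zip:file:///tmp/rows.zip!/data/rows.txt") reads the entry data/rows.txt from the archive /tmp/rows.zip.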
The Text File Input step provides the ability to specify a list of files to read, or a list of directories with wildcards in the form of regular expressions. In addition, you can accept filenames from a previous step, making filename handling even more generic. The following sections describe in detail the available options for configuring the Text File Input step.
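Because the wildcard is a regular expression rather than a shell glob, ".*\.txt" selects text files where a shell would use "*.txt". The Python sketch below illustrates this under one assumption: the expression must match the whole file name. matching_files is a hypothetical helper.

```python
import os
import re


def matching_files(directory, wildcard):
    """List files in `directory` whose names fully match the regular expression."""
    pattern = re.compile(wildcard)
    return sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name)) and pattern.fullmatch(name)
    )
```

For example, matching_files("/data/in", r"sales_.*\.txt") would return every sales_*.txt file in that directory and skip subdirectories.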
Option | Description
 | Displays the content of the selected file.
 | Displays the content from the first data line only for the selected file.
This option allows even more flexibility in combination with other steps like "Get Filenames". You ... way the filename can come from any source: text file, database table, etc.
Option | Description
Accept filenames from previous steps | This enables the option to get filenames from previous steps.
Step to read filenames from | The step to read the filenames from.
Field in the input to use as filename | Text File Input will look in this field to determine the filenames to use.
 | Specify an escape character (or characters) if you have escaped characters in your data. If you have \ as an escape character, the text 'Not the nine o\'clock news.' (with ' the enclosure) gets parsed as Not the nine o'clock news.
Header & number of header lines | Enable this option if your text file has a header row (first lines in the file). You can specify how many header lines there are.
Footer & number of footer lines | Enable this option if your text file has a footer row (last lines in the file). You can specify how many footer lines there are.
Wrapped lines & number of wraps | Use this if you deal with data lines that have wrapped beyond a certain page limit. Note that headers and footers are never considered wrapped.
Paged layout & page size & doc header | You can use these options as a last resort when dealing with texts meant for printing on a line printer. Use the number of document header lines to skip introductory texts and the number of lines per page to position the data lines.
Compression | Enable this option if your text file is placed in a Zip or GZip archive. NOTE: At the moment, only the first file in the archive is read.
No empty rows | Don't send empty rows to the next steps.
Include filename in output | Enable this if you want the filename to be part of the output.
Filename field name | The name of the field that contains the filename.
Rownum in output? | Enable this if you want the row number to be part of the output.
Row number field name | The name of the field that contains the row number.
Rownum by file? | Allows the row number to be reset per file.
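Several of these options can be imitated with Python's csv module. The sketch below skips header and footer lines, drops empty rows and honours an enclosure plus an escape character, reproducing the "nine o'clock news" example. read_text_file and its defaults (";" as separator, one header line) are assumptions made for the illustration, not the step's actual defaults.

```python
import csv


def read_text_file(text, separator=";", enclosure="'", escape="\\",
                   header_lines=1, footer_lines=0, no_empty_rows=True):
    """Hypothetical helper mimicking a few Content-tab options."""
    lines = text.splitlines()
    # Drop header lines at the top and footer lines at the bottom.
    end = len(lines) - footer_lines if footer_lines else len(lines)
    body = lines[header_lines:end]
    if no_empty_rows:
        body = [line for line in body if line.strip()]
    reader = csv.reader(body, delimiter=separator, quotechar=enclosure,
                        escapechar=escape, doublequote=False)
    return list(reader)


data = "title;count\n'Not the nine o\\'clock news.';3\n\nTOTAL"
print(read_text_file(data, footer_lines=1))
# [["Not the nine o'clock news.", '3']]
```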
Option | Description
Format | This can be either DOS, UNIX or mixed. UNIX files have lines that are terminated by line feeds. DOS files have lines separated by carriage returns and line feeds. If you specify mixed, no verification is done.
Encoding | Specify the text file encoding to use. Leave blank to use the default encoding on your system. To use Unicode, specify UTF-8 or UTF-16. On first use, Spoon will search your system for available encodings.
 | Sets the number of lines that are read from the file. 0 means: read all lines.
 | Disable this option if you want strict parsing of data fields. In case lenient parsing is enabled, dates like Jan 32nd will become Feb 1st.
 | This locale is used to parse dates that have been written in full, like "February 2nd, 2006". Parsing this date on a system running in the French (fr_FR) locale would not work because February is called Février in that locale.
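Lenient parsing rolls an out-of-range day over into the following month, while strict parsing rejects it. The Python sketch below mimics that behaviour for the day component only (Java's lenient calendar also rolls months, hours, and so on). parse_date is a hypothetical helper.

```python
from datetime import date, timedelta


def parse_date(year, month, day, lenient=True):
    """Build a date; in lenient mode an out-of-range day rolls over into the next month."""
    if not lenient:
        return date(year, month, day)  # raises ValueError for e.g. January 32nd
    return date(year, month, 1) + timedelta(days=day - 1)


print(parse_date(2007, 1, 32))  # 2007-02-01
```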
Note that you can generate an extra file that ... which the errors occurred. If lines ... skipped, the fields that did have parsing errors ...
Option | Description
 | Add a field to the output stream rows. This field will contain the number of errors on the line.
 | Add a field to the output stream rows. This field will contain the field names on which an error occurred.
 | Add a field to the output stream rows. This field will contain the descriptions of the parsing errors that have occurred.
Warnings file directory | When warnings are generated, they will be put in this directory. The name of that file will be <warning dir>/filename.<date_time>.<warning extension>
Error files directory | When errors occur, they will be put in this directory. The name of that file will be <errorfile_dir>/filename.<date_time>.<errorfile_extension>
Failing line numbers files directory | When a parsing error occurs on a line, the line number will be put in this directory. The name of that file will be <errorline dir>/filename.<date_time>.<errorline extension>
11.6.1.5. Filters
The filters tab provides the ability to specify the lines you want to skip in the text file. The table below describes the available options for defining filters:
Option | Description
 | The string to look for.
 | The position where the filter string has to be in the line. 0 is the first position in the line. If you specify a value below 0 here, the filter string is searched for in the entire line.
Stop on filter |
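The position rule can be expressed in a couple of lines of Python. This is a sketch of the behaviour described above, not Kettle's code, and is_filtered is a hypothetical helper. A non-negative position requires the filter string to start exactly there, while a negative position searches the whole line.

```python
def is_filtered(line, filter_string, position):
    """Return True if this filter says the line should be skipped."""
    if position < 0:
        # A negative position searches the entire line.
        return filter_string in line
    return line.startswith(filter_string, position)


lines = ["# comment", "a;1", "b;2 # note"]
print([l for l in lines if not is_filtered(l, "#", 0)])  # ['a;1', 'b;2 # note']
```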
11.6.1.6. Fields
The fields tab is where you specify the information about the name and format of the fields being read from the text file. Available options include:
Option | Description
Name | Name of the field.
Type | Type of the field can be either String, Date or Number.
Format | See Number Formats for a complete description of format symbols.
Length | For Number: total number of significant figures in a number; for String: total length of string; for Date: length of printed output of the string (e.g. 4 only gives back the year).
Precision | For Number: number of floating point digits; for String, Date, Boolean: unused.
Currency | Used to interpret numbers like $10,000.00 or €5.000,00.
Decimal | A decimal point can be a "." (10,000.00) or "," (5.000,00).
Grouping | A grouping can be a "," (10,000.00) or "." (5.000,00).
Null if | Treat this value as NULL.
Default | The default value in case the field in the text file was not specified (empty).
Trim type | Trim this field (left, right, both) before processing.
Repeat | Y/N: If the corresponding value in this row is empty, repeat the one from the last time it was not empty.
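The Currency, Decimal and Grouping options tell the step how to turn text such as "€5.000,00" into a number. The simplified Python sketch below shows the idea for the two examples in the table. parse_number is a hypothetical helper that does not validate the input the way a real number format mask would.

```python
def parse_number(text, currency="", decimal=".", grouping=","):
    """Parse a formatted number such as '$10,000.00' or '€5.000,00'."""
    value = text.strip()
    if currency:
        value = value.replace(currency, "")
    if grouping:
        value = value.replace(grouping, "")  # drop thousands separators
    value = value.replace(decimal, ".")  # normalise the decimal point
    return float(value)


print(parse_number("$10,000.00", currency="$"))  # 10000.0
print(parse_number("€5.000,00", currency="€", decimal=",", grouping="."))  # 5000.0
```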
Currency sign, replaced by currency symbol. If doubled, replaced by international currency symbol. If present in a pattern, the monetary decimal separator is used instead of the decimal separator.
Prefix or suffix | No | Used to quote special characters in a prefix or suffix, for example, "'#'#" formats 123 to "#123". To create a single quote itself, use two in a row: "# o''clock".
Scientific Notation: In a pattern, the exponent character immediately followed by one or more digit characters indicates scientific notation. Example: "0.###E0" formats the number 1234 as "1.234E3".
11.6.1.6.2. Date formats
The table below was taken from the Sun Java API documentation, to be found here: http://java.sun.com/j2se/1.4.2/docs/api/java/text/SimpleDateFormat.html
Letter | Date or Time Component | Presentation | Examples
G | Era designator | Text | AD
y | Year | Year | 1996; 96
M | Month in year | Month | July; Jul; 07
w | Week in year | Number | 27
W | Week in month | Number | 2
D | Day in year | Number | 189
d | Day in month | Number | 10
F | Day of week in month | Number | 2
E | Day in week | Text | Tuesday; Tue
a | Am/pm marker | Text | PM
H | Hour in day (0-23) | Number | 0
k | Hour in day (1-24) | Number | 24
K | Hour in am/pm (0-11) | Number | 0
h | Hour in am/pm (1-12) | Number | 12
m | Minute in hour | Number | 30
s | Second in minute | Number | 55
S | Millisecond | Number | 978
z | Time zone | General time zone | Pacific Standard Time; PST; GMT-08:00
Z | Time zone | RFC 822 time zone | -0800
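When checking a date mask outside Kettle, it can help to translate the common letter runs from the table above into Python strftime codes. The mapping below is a hypothetical subset chosen for illustration. It does not handle quoted literals or the less common letters and raises an error for anything it does not know. %b, %B, %a, %A and %p depend on the current locale.

```python
import re
from datetime import datetime

# Hypothetical subset: SimpleDateFormat letter runs -> strftime codes.
_JAVA_TO_STRFTIME = {
    "yyyy": "%Y", "yy": "%y", "MM": "%m", "MMM": "%b", "MMMM": "%B",
    "dd": "%d", "HH": "%H", "hh": "%I", "mm": "%M", "ss": "%S",
    "a": "%p", "EEE": "%a", "EEEE": "%A",
}


def to_strftime(pattern):
    """Translate a simple SimpleDateFormat pattern into a strftime pattern."""
    def convert(match):
        run = match.group(0)
        if run not in _JAVA_TO_STRFTIME:
            raise ValueError(f"unsupported pattern letters: {run}")
        return _JAVA_TO_STRFTIME[run]
    # Each match is a run of one repeated letter, e.g. "yyyy" or "MM".
    return re.sub(r"([A-Za-z])\1*", convert, pattern)


fmt = to_strftime("yyyy/MM/dd HH:mm:ss")
print(fmt)  # %Y/%m/%d %H:%M:%S
print(datetime(2005, 10, 28, 9, 5, 0).strftime(fmt))  # 2005/10/28 09:05:00
```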
11.6.1.7. Extras
Function/Button | Description
Show filenames | This option shows a list of all the files selected. Please note that if the transformation is to be run on a separate server, the result might be incorrect.
Show file content | The "View" button shows the first lines of the text file. Make sure that the file format is correct. When in doubt, try both DOS and UNIX formats.
Show content from first data line | This button helps you in positioning the data lines in complex text files with multiple header lines, etc.
Get fields | This button allows you to guess the layout of the file. In case of a CSV file, this is done pretty much automatically. When you selected a file with fixed length fields, you need to specify the field boundaries using a wizard.
Preview rows | Press this button to preview the rows generated by this step.
11.6.2. Table input
Icon
Table Input
11.6.2.2. Options
Option | Description
Step name | Name of the step. This name has to be unique in a single transformation.
Connection | The database connection used to read data from.
SQL | The SQL statement used to read information from the database connection. You can also click the "Get SQL select statement..." button to browse tables and automatically generate a basic select statement.
Enable lazy conversion | Lazy conversion will avoid unnecessary data type conversions and can result in a significant performance improvement. Check to enable.
Replace variables in script? | Enable this to replace variables in the script. This feature was provided to allow you to test with or without performing variable substitutions.
Insert data from step | Specify the input step name where we can expect information to come from. This information can then be inserted into the SQL statement. The locators where we insert information are indicated by ? (question marks).
Execute for each row? | Enable this option to perform the data insert for each individual row.
Limit size | Sets the number of lines that are read from the database. 0 means: read all lines.
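The "?" locators behave like ordinary bind parameters. The Python sketch below uses sqlite3 to run one query per incoming row, in the spirit of "Insert data from step" combined with "Execute for each row?". The table, data and table_input_per_row helper are made up for the example.

```python
import sqlite3


def table_input_per_row(conn, sql, input_rows):
    """Execute `sql` once per incoming row, binding its values to the ? markers."""
    results = []
    for params in input_rows:
        results.extend(conn.execute(sql, params).fetchall())
    return results


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, name TEXT, country TEXT)")
conn.executemany("INSERT INTO customer VALUES (?, ?, ?)",
                 [(1, "Ann", "BE"), (2, "Bob", "NL"), (3, "Cas", "BE")])
# Rows coming from a previous step; each supplies the value for the ? marker.
rows = table_input_per_row(
    conn, "SELECT name FROM customer WHERE country = ? ORDER BY id", [("BE",), ("NL",)])
print(rows)  # [('Ann',), ('Cas',), ('Bob',)]
```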
For example, if you want to read all customers that have had their data changed yesterday, you might do it like this:
And the "get date range for yesterday" step looks like this:
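The date range for "yesterday" runs from 00:00:00 to 23:59:59 of the previous day, and both bounds are then bound to two "?" markers. A small Python sketch of that calculation follows. yesterday_range is a hypothetical helper, and the SQL in the comment is only an example.

```python
from datetime import date, datetime, time, timedelta


def yesterday_range(today):
    """Return (start, end) datetimes spanning the whole day before `today`."""
    yesterday = today - timedelta(days=1)
    start = datetime.combine(yesterday, time(0, 0, 0))
    end = datetime.combine(yesterday, time(23, 59, 59))
    return start, end


# Example use: SELECT * FROM customer WHERE last_updated BETWEEN ? AND ?
start, end = yesterday_range(date(2007, 3, 1))
print(start, end)  # 2007-02-28 00:00:00 2007-02-28 23:59:59
```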
11.6.2.4. Extras
Function/Button | Description
Preview | This option previews this step. It is done by previewing a new transformation with 2 steps: this one and a Dummy step. You can see the detailed logging of that execution by clicking on the logs button in the preview window.
Icon
Item | Description
First day of last month 00:00:00 |
Last day of last month 23:59:59 |
First day of this month 00:00:00 |
Last day of this month 23:59:59 |
First day of next month 00:00:00 |
Last day of next month 23:59:59 |
copy of step | ... step.
transformation name |
transformation file name |
User that modified the transformation last |
Date when the transformation was modified last |
transformation batch ID | ID_BATCH value in the logging table, see 10. Transformation settings.
Hostname | Returns the hostname of the server.
IP address | Returns the IP address of the server.
command line argument 1 | Argument 1 on the command line.
command line argument 2 | Argument 2 on the command line.
command line argument 3 | Argument 3 on the command line.
command line argument 4 | Argument 4 on the command line.
command line argument 5 | Argument 5 on the command line.
command line argument 6 | Argument 6 on the command line.
command line argument 7 | Argument 7 on the command line.
command line argument 8 | Argument 8 on the command line.
command line argument 9 | Argument 9 on the command line.
command line argument 10 | Argument 10 on the command line.
Kettle version | Returns the Kettle version (e.g. 2.5.0).
Kettle Build Version | Returns the build version of the core Kettle library (e.g. 13).
Kettle Build Date | Returns the build date of the core Kettle library.
11.6.3.2. Options
The following table describes the options for configuring the Get System Info step:
Option | Description
Step Name | Name of the step. This name has to be unique in a single transformation.
Fields | The fields to output.
11.6.3.3. Usage
The first type of usage is to simply get information from the system:
From version 2.3.0 on, this step also accepts input rows. The selected values ... rows found in the input stream(s):
Icon
Generate Rows
11.6.4.2. Options
Option | Description
Step Name | Name of the step. This name has to be unique in a single transformation.
Limit | Sets the maximum number of rows you want to generate.
Fields | This table is where you configure the structure and (optionally) the values of the rows you are generating.
Icon
format stays the same between versions of Pentaho Data Integration.
11.6.5.2. Options
Option | Description
Step Name | Name of the step. This name has to be unique in a single transformation.
Filename | The name of the Kettle cube file that will be generated.
Limit Size | Allows you to optionally limit the number of rows written to the cube file. A value of "0" indicates no size limit.
11.6.6. XBase input
Icon
XBase input
11.6.6.2. Options
The following options are available for the XBase input step:
Option | Description
Step Name | Name of the step. This name has to be unique in a single transformation.
Filename | The name of the DBF file to read data from.
Limit Size | Allows you to optionally limit the number of rows read.
Accept filenames | Allows you to read in filenames from a previous step in the transformation.
Add rownr? | Adds a field to the output with the specified name that contains the row number.
Include filename in output? | Optionally allows you to insert a field containing the filename onto the stream.
Character-set name to use | Specifies the character set (e.g. ASCII, UTF-8) to use.
Preview | Click this button to preview the data that will be read.
11.6.7. Excel input
Icon
Excel input
Option | Description
 | ... selected file definitions.
Preview rows | Click to preview the contents of the specified Excel file.
11.6.7.3. Sheets
The "List of sheets to read" table displays the currently selected sheets to read from. Use the "Get sheetname(s)" button to fill in the available sheets automatically. Note: You also need to specify the start row
11.6.7.4. Content
The content tab allows you to configure the following properties:
Option | Description
Header | Check if the sheets specified have a header row that we need to skip.
No empty rows | Check this if you don't want empty rows in the output of this step.
Stop on empty row | This will make the step stop reading the current sheet of a file when an empty line is encountered.
Filename field | Specify a field name to include the filename in the output of this step.
Sheetname field | Specify a field name to include the sheet name in the output of this step.
Sheet row nr field | Specify a field name to include the sheet row number in the output of the step. The sheet row number is the actual row number in the Excel sheet.
Row nr written field | Specify a field name to include the row number in the output of the step. "Row number written" is the number of rows processed, starting at 1 and counting up regardless of sheets and files.
Limit | Limit the number of rows to this number; 0 means: all rows.
Encoding | Specify the character encoding (e.g. UTF-8, ASCII).
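The two row-number fields count differently. The sheet row number follows the position in each sheet, while "row number written" keeps counting across sheets and files. The Python sketch below illustrates this, assuming each sheet is given as its name, its data rows and the sheet row number of its first data row. number_rows is a hypothetical helper.

```python
def number_rows(sheets):
    """Yield (sheet name, sheet row nr, row nr written, row) for each data row."""
    written = 0
    for sheet_name, rows, first_row in sheets:
        for offset, row in enumerate(rows):
            written += 1  # counts across all sheets and files, starting at 1
            yield sheet_name, first_row + offset, written, row


sheets = [("Sheet1", ["a", "b"], 2), ("Sheet2", ["c"], 2)]
for record in number_rows(sheets):
    print(record)
# ('Sheet1', 2, 1, 'a')
# ('Sheet1', 3, 2, 'b')
# ('Sheet2', 2, 3, 'c')
```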
Option | Description
Warnings file directory | When warnings are generated, they will be put in this directory. The name of that file will be <warning dir>/filename.<date_time>.<warning extension>
 | When errors occur, they will be put in this directory. The name of that file will be <errorfile_dir>/filename.<date_time>.<errorfile_extension>
 | When a parsing error occurs on a line, the line number will be put in this directory. The name of that file will be <errorline dir>/filename.<date_time>.<errorline extension>
11.6.7.6. Fields
The fields tab is for specifying the fields that need to be read from the Excel files. A "Get fields from header row" button is provided to automatically fill in the available fields if the sheets have a header row. For a given field, the "Type" column is provided for performing data type conversions. For example, if you want to read a Date and you have a String value in the Excel file, you can specify the conversion mask. Note: in the case of a Number to Date conversion (example: 20051028 --> October 28th, 2005) you ... String conversion taking place before doing the String to Date conversion.
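The Number to Date case can be read as two conversions in a row: the number first becomes the string "20051028", and that string is then parsed with a mask such as yyyyMMdd. The Python sketch below walks through both stages. number_to_date is a hypothetical helper, and the strftime-style mask "%Y%m%d" corresponds to yyyyMMdd.

```python
from datetime import datetime


def number_to_date(value, mask="%Y%m%d"):
    """Convert a number such as 20051028 to a date via an intermediate string."""
    text = str(int(value))  # Number -> String, e.g. "20051028"
    return datetime.strptime(text, mask).date()  # String -> Date using the mask


print(number_to_date(20051028))  # 2005-10-28
```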
11.6.8. XML Input
Icon
11.6.8.3. Content
The content tab contains the following options for describing the content being read:
Option | Description
Include filename in output & fieldname | Check this option if you want to have the name of the XML file to which the row belongs in the output stream. You can specify the name of the field in which the filename will end up.
Rownum in output & fieldname | Check this option if you want to have a row number (starts at 1) in the output stream. You can specify the name of the field in which the integer will end up.
Limit | You can specify the maximum number of rows to read here.
Nr of header rows to skip | Specify the number of rows to skip, from the start of an XML document, before starting to process.
Location | Specify the path by way of elements to the repeating part of the XML file. For example, if you are reading rows from this XML file:
<Rows>
  <Row>
    <Field1>...</Field1>
    ...
  </Row>
  ...
</Rows>
Then you set the location to Rows, Row.
Note: you can also set the root (Rows) as a repeating element location.
The output
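A location such as "Rows, Row" names the path from the root down to the element that repeats once per output row. The Python sketch below uses xml.etree.ElementTree to turn each repeating element into one row of field values. read_repeating and the sample document are illustrative assumptions, not the step's implementation.

```python
import xml.etree.ElementTree as ET

XML = """<Rows>
  <Row><Field1>a</Field1><Field2>1</Field2></Row>
  <Row><Field1>b</Field1><Field2>2</Field2></Row>
</Rows>"""


def read_repeating(xml_text, location):
    """Return one dict per repeating element; `location` is e.g. ["Rows", "Row"]."""
    root = ET.fromstring(xml_text)
    if root.tag != location[0]:
        raise ValueError(f"root element is {root.tag}, expected {location[0]}")
    # "." selects the root itself when the root is the repeating element.
    path = "/".join(location[1:]) or "."
    return [{child.tag: child.text for child in element}
            for element in root.findall(path)]


print(read_repeating(XML, ["Rows", "Row"]))
# [{'Field1': 'a', 'Field2': '1'}, {'Field1': 'b', 'Field2': '2'}]
```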
11.6.8.4. Fields
The fields tab is where you define properties for the location and format of the fields being read from the XML document. The table below describes each of the options for configuring the field properties:
Option | Description
Name | The name of the field.
Type | Type of the field can be either String, Date or Number.
Format | The format mask to convert with. See Number Formats for a complete description of format specifiers.
Length | The length option depends on the field type as follows: Number: total number of significant figures in a number; String: total length of string; Date: length of printed output of the string (e.g. 4 only gives back the year).
Precision | Number: number of floating point digits; String: unused; Date: unused.
Currency | Symbol used to represent currencies like $10,000.00 or €5.000,00.
Decimal | A decimal point can be a "." (10,000.00) or "," (5.000,00).
Group | A grouping can be a "," (10,000.00) or "." (5.000,00).
Trim type | The trimming method to apply on the string found in the XML.
Repeat | Check this if you want to repeat empty values with the corresponding value from the previous row.
 | The position of the XML element or attribute. You use the following syntax to specify the position of an element, for example: The first element called "element": E=element/1. The first attribute called "attribute": A=attribute/1. The first attribute called "attribute" in the second "element" tag: E=element/2, A=attribute/1.
NOTE: You can auto-generate all the possible positions in the XML file supplied by using the "Get Fields" button.
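The position syntax is a comma-separated chain of steps. "E=name/n" selects the n-th child element with that name, and "A=name/1" reads an attribute of the element reached so far. The Python sketch below is one possible interpretation of that syntax using ElementTree. resolve_position is a hypothetical helper, and only /1 is accepted for attributes because an element carries at most one attribute per name.

```python
import xml.etree.ElementTree as ET


def resolve_position(element, position):
    """Resolve a position such as 'E=element/2, A=attribute/1' below `element`."""
    current = element
    for part in position.split(","):
        kind, _, rest = part.strip().partition("=")
        name, _, index = rest.partition("/")
        n = int(index or 1)
        if kind == "E":
            matches = current.findall(name)  # direct child elements with this tag
            if len(matches) < n:
                return None
            current = matches[n - 1]
        elif kind == "A":
            return current.get(name) if n == 1 else None
        else:
            raise ValueError(f"unknown position type: {kind}")
    return current.text


row = ET.fromstring('<Row><element attribute="x">one</element>'
                    '<element attribute="y">two</element></Row>')
print(resolve_position(row, "E=element/2, A=attribute/1"))  # y
```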
NOTE: Support ... where all the information is stored in the Repeating ... was added to allow you to grab this information. The
Icon
11.6.!.1.2. Filters
The filters tab allows you to filter the retrieved filenames based on:
All files and folders
Files only
Folders only
Icon
Option | Description
 | Includes the system time in the filename (_235959).
 | This option shows a list of the files that will be generated.
11.6.10.3. Content
The content tab contains the following options for describing the content being written:
Option | Description
Append | Check this to append lines to the end of the specified file.
Separator | Specify the character that separates the fields in a single line of text. Typically this is ; or a tab.
Enclosure | A pair of strings can enclose some fields. This allows separator characters in fields. The enclosure string is optional.
Force the enclosure around fields? | This option forces all field names to be enclosed with the character specified in the Enclosure property above.
Header | Enable this option if you want the text file to have a header row (first line in the file).
Footer | Enable this option if you want the text file to have a footer row (last line in the file).
Format | This can be either DOS or UNIX. UNIX files have lines that are separated by line feeds. DOS files have lines separated by carriage returns and line feeds.
Encoding | Specify the text file encoding to use. Leave blank to use the default encoding on your system. To use Unicode, specify UTF-8 or UTF-16. On first use, Spoon will search your system for available encodings.
Compression | Allows you to specify the type of compression, .zip or .gzip, to use when compressing the output. NOTE: At the moment, only one file is placed in a single archive.
Right pad fields | Add spaces to the end of the fields (or remove characters at the end) until they have the specified length.
Fast data dump (no formatting) | Improves the performance when dumping large amounts of data to a text file by not including any formatting information.
Split every ... rows | If this number N is larger than zero, split the resulting text file into multiple parts of N rows.
Add Ending line of file | Allows you to specify an alternate ending row to the output file.
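To show how the separator, enclosure, right padding and "Split every ... rows" options combine, here is a minimal Python sketch. write_text_parts is a hypothetical helper. Enclosing a field only when it contains the separator (unless enclosure is forced) is an assumption made for the example. The sketch returns the lines of each part instead of writing files.

```python
def write_text_parts(rows, separator=";", enclosure='"', force_enclosure=False,
                     widths=None, split_every=0):
    """Format rows as text lines and split them into parts of `split_every` rows."""
    def format_field(index, value):
        text = str(value)
        if widths:
            # Right pad: add spaces, or cut characters, until the length matches.
            text = text[:widths[index]].ljust(widths[index])
        if force_enclosure or separator in text:
            text = f"{enclosure}{text}{enclosure}"
        return text

    lines = [separator.join(format_field(i, v) for i, v in enumerate(row))
             for row in rows]
    if split_every <= 0:
        return [lines]
    return [lines[i:i + split_every] for i in range(0, len(lines), split_every)]


print(write_text_parts([["a", 1], ["b;c", 2], ["d", 3]], split_every=2))
# [['a;1', '"b;c";2'], ['d;3']]
```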
11.6.10.%. 5ie,ds
The fields tab is where you define properties for the fields being exported. The table below describes each of the options for configuring the field properties:
Name: The name of the field.
Type: Type of the field; can be either String, Date or Number.
Format: The format mask to convert with. See Number Formats for a complete description of format symbols.
Length: The length option depends on the field type as follows:
  Number: total number of significant figures in a number
  String: total length of string
  Date: length of printed output of the string (e.g. 4 only gives back the year)
Precision: The precision option depends on the field type as follows:
  Number: number of floating point digits
  String: unused
  Date: unused
Currency: Symbol used to represent currencies like $10,000.00 or €5.000,00
Decimal: A decimal point can be a "." (10,000.00) or "," (5.000,00)
Group: A grouping can be a "," (10,000.00) or "." (5.000,00)
Trim type: The trimming method to apply on the string.
Null: If the value of the field is null, insert this string into the text file.
Get fields: Click to retrieve the list of fields from the input stream(s).
Minimal width: Alter the options in the fields tab in such a way that the resulting width of lines in the text file is minimal. So instead of saving 0000001, we write 1, etc. String fields will no longer be padded to their specified length.
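The Currency, Decimal and Group settings only change which symbols appear in the formatted number. A minimal Python sketch, assuming a fixed pattern equivalent to the mask #,##0.00 (the helper `format_number` is hypothetical, not part of Kettle), reproduces the examples from the table:

```python
def format_number(value: float, precision: int = 2, decimal: str = ".",
                  group: str = ",", currency: str = "") -> str:
    """Format a number with grouping, using custom decimal/grouping/currency symbols."""
    # Python's "," option inserts commas every three digits; "." is the decimal point.
    text = f"{value:,.{precision}f}"
    # Swap via a placeholder so that replacing "," and "." does not collide.
    text = text.replace(",", "\0").replace(".", decimal).replace("\0", group)
    return currency + text
```

For example, format_number(5000, decimal=",", group=".", currency="€") gives €5.000,00, while the defaults give 10,000.00 for 10000.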
11.6.11.2. Options
The table below describes the available options for the Table output step:
Step name: Name of the step. This name has to be unique in a single transformation.
Connection: The database connection used to write data to.
Target Schema: The name of the schema for the table to write data to. This is important for data sources that allow for table names with dots ('.') in them.
Target table: The name of the table to write data to.
Commit size: Use transactions to insert rows in the database table. Commit the connection every N rows if N is larger than 0. Otherwise, don't use transactions (slower). NOTE: Transactions are not supported on all database platforms.
Truncate table: Truncate the target table before the first row is inserted.
Ignore insert errors: Makes Kettle ignore all insert errors such as violated primary keys. A maximum of 20 warnings will be logged, however. This option is not available for batch inserts.
Use batch update for inserts: Enable this option if you want to use batch inserts. This feature groups insert statements to limit round trips to the database. This is the fastest option and is enabled by default.
Partition data over tables: Use this option to split the data over multiple tables. For example, instead of inserting all data into table SALES, put the data into tables SALES_200510, SALES_200511, SALES_200512, ... Use this on systems that don't have partitioned tables and/or don't allow inserts into UNION ALL views or the master of inherited tables. The view SALES allows you to report on the complete sales:

    CREATE OR REPLACE VIEW SALES AS
    SELECT * FROM SALES_200501
    UNION ALL SELECT * FROM SALES_200502
    UNION ALL SELECT * FROM SALES_200503
    UNION ALL SELECT * FROM SALES_20050...

Is the name of the table defined in a field?: Use these options to split the data over one or more tables. The name of the target table is defined in the field you specify. For example, if you store customer data in the field gender, the data might end up in tables M and F (Male and Female). There is an option to exclude the field containing the table name from being inserted into the tables.
Return auto-generated key: Check this if you want to get back the key that was generated by inserting a row into the table.
Name of auto-generated key field: The name of the new field in the output rows that will contain the key.
SQL: Click to generate the SQL needed to create the target table.
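The following Python sketch, using the standard sqlite3 module, illustrates three of these options together: batching inserts, committing every N rows, and partitioning rows over monthly SALES_yyyymm tables. It is an illustration under stated assumptions, not how the step is implemented: `partition_table` and `table_output` are hypothetical names, the (day, amount) layout is invented for the example, and commit_size must be greater than 0 here.

```python
import sqlite3
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, List, Tuple


def partition_table(base: str, day: date) -> str:
    """Monthly partition name, e.g. ("SALES", 2005-10-03) -> "SALES_200510"."""
    return f"{base}_{day:%Y%m}"


def table_output(conn: sqlite3.Connection, rows: Iterable[Tuple[date, float]],
                 commit_size: int = 100) -> int:
    """Insert (day, amount) rows into monthly tables; return the number of commits.

    Rows are buffered per table and sent with executemany() (one batch per table),
    and the connection is committed every `commit_size` rows.
    """
    batches: Dict[str, List[tuple]] = defaultdict(list)
    commits = 0

    def flush() -> None:
        nonlocal commits
        for table, batch in batches.items():
            # Table names come from partition_table(), never from user input.
            conn.execute(f"CREATE TABLE IF NOT EXISTS {table} (day TEXT, amount REAL)")
            conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", batch)
        batches.clear()
        conn.commit()
        commits += 1

    for n, (day, amount) in enumerate(rows, start=1):
        batches[partition_table("SALES", day)].append((day.isoformat(), amount))
        if n % commit_size == 0:
            flush()
    if batches:
        flush()  # commit the last, partial batch
    return commits
```

A reporting view over the partitions can then be created with CREATE VIEW SALES AS SELECT * FROM SALES_200510 UNION ALL SELECT * FROM SALES_200511 (SQLite has no CREATE OR REPLACE VIEW).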
Insert/Update dialog
11.6.12.2. Options
The table below provides a description of the available options for Insert/Update:
Step name: Name of the step. Note: This name has to be unique in a single transformation.
Connection: The database connection used to write data to.
Target schema: The name of the schema for the table to write data to. This is important for data sources that allow for table names with dots ('.') in them.
Target table: Name of the table in which you want to do the insert or update.
Commit size: The number of rows to change (insert/update) before running a commit.
Don't perform any updates: If this option is checked, the values in the database are never updated. Only inserts are done.
The key(s) to look up the value(s): Here you specify a list of field values and comparators. You can use the following comparators: =, <>, <, <=, >, >=, LIKE, BETWEEN, IS NULL, IS NOT NULL. Note: Click the 'Get fields' button to retrieve a list of fields from the input stream(s).
Update fields: Specify all fields in the table you want to insert/update, including the keys. Please note that you can avoid updates on certain fields by specifying N in the update column. Note: Click the 'Get update fields' button to retrieve a list of update fields from the input stream(s).
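The insert-or-update decision can be sketched in Python with sqlite3 as follows. This is a simplified illustration, not the step's code: `insert_update` is a hypothetical helper, only the = comparator is supported for the keys, an insert writes every field of the row, and an update writes only the listed update fields.

```python
import sqlite3
from typing import Dict, Sequence


def insert_update(conn: sqlite3.Connection, table: str, row: Dict[str, object],
                  keys: Sequence[str], update_fields: Sequence[str],
                  no_updates: bool = False) -> str:
    """Look the row up by its key fields, then insert it or update the matching record."""
    where = " AND ".join(f"{k} = ?" for k in keys)
    key_vals = [row[k] for k in keys]
    found = conn.execute(f"SELECT 1 FROM {table} WHERE {where}", key_vals).fetchone()
    if found is None:
        cols = list(row)
        marks = ", ".join("?" for _ in cols)
        conn.execute(f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({marks})",
                     [row[c] for c in cols])
        return "inserted"
    if no_updates or not update_fields:      # "Don't perform any updates"
        return "skipped"
    sets = ", ".join(f"{f} = ?" for f in update_fields)
    conn.execute(f"UPDATE {table} SET {sets} WHERE {where}",
                 [row[f] for f in update_fields] + key_vals)
    return "updated"
```

Calling it twice with the same id first inserts the record and then updates its name; with no_updates=True the second call leaves the record untouched.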
11.6.13. Update
Update dialog
11.6.14. Delete
Delete dialog
NOTE: This step should probably not be used: the tool that could use these cube files was never created, so the cube files do not use any "standard" format.
11.6.16.3. Content
Zipped: Check this if you want the XML file to be stored in a ZIP archive.
Encoding: The encoding to use. This encoding is specified in the header of the XML file.
Parent XML element: The name of the root element in the XML document.
Row XML element: The name of the row element to use in the XML document.
Split every ... rows: The maximum number of rows of data to put in a single XML file before another is created.
11.6.16.4. Fields
Fieldname: The name of the field.
Elementname: The name of the element in the XML file to use.
Type: Type of the field; can be either String, Date, or Number.
Format: Format mask to convert with; see Number Formats for a complete description of format specifiers.
Length: The length option depends on the field type as follows:
  Number: total number of significant figures in a number
  String: total length of string
  Date: length of printed output of the string (e.g. 4 only gives back the year)
  Note: the output string is padded to this length if it is specified.
Precision: The precision option depends on the field type as follows:
  Number: number of floating point digits
  String: unused
  Date: unused
Currency: Symbol used to represent currencies like $10,000.00 or €5.000,00
Decimal: A decimal point can be a "." (10,000.00) or "," (5.000,00)
Group: A grouping can be a "," (10,000.00) or "." (5.000,00)
Null: If the value of the field is null, insert this string into the XML file.
Get fields: Click to retrieve the list of fields from the input stream(s).
Minimal width: Alter the options in the fields tab in such a way that the resulting width of the output is minimal. So instead of saving 0000001, we write 1, etc. String fields will no longer be padded to their specified length.
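As an illustration of how the parent element, row element, Null string and Split every ... rows options shape the output, here is a short sketch using Python's xml.etree.ElementTree. The function `xml_output` and the default element names Rows and Row are assumptions made for this example, and every field is written as a child element named after the field.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Sequence


def xml_output(rows: Sequence[Dict[str, Optional[str]]], parent: str = "Rows",
               row_element: str = "Row", split_every: int = 0,
               null_string: str = "") -> List[str]:
    """Return one XML document (as a string) per output part."""
    size = split_every if split_every > 0 else max(len(rows), 1)
    docs = []
    for start in range(0, len(rows), size):
        root = ET.Element(parent)                        # Parent XML element
        for row in rows[start:start + size]:
            row_el = ET.SubElement(root, row_element)    # Row XML element
            for name, value in row.items():
                ET.SubElement(row_el, name).text = null_string if value is None else value
        docs.append(ET.tostring(root, encoding="unicode"))
    return docs
```

With split_every=2, three rows end up in two documents; a null name is written as the configured Null string.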
11.6.17.3. Content
The content tab provides additional options for the generated Excel output file, including:
Header: Check if the spreadsheet needs a header above the exported rows of data.
Footer: Check if the spreadsheet needs a footer below the exported rows of data.
Encoding: Specify the encoding of the spreadsheet; leave empty to keep the default for the platform.
Split every ... rows: Splits the data over several output files (each in its own spreadsheet).
Sheet name: Specify the name of the sheet to write to.
Protect sheet?: Check to enable password protection on the target sheet.
Password: Specify the password for the protected sheet. This is an experimental feature that requires testing.
Use Template: Check this to use a template when outputting data to Excel.
Excel Template: The name of the template used to format the Excel output file.
Append to Excel template: Check this option to have the output appended to the Excel template specified.
11.6.17.4. Fields
The fields tab is where you specify the name, data type and format of the fields being written to Excel. The 'Get Fields' button will retrieve a list of available fields from the input stream(s) coming into the step. The 'Minimal Width' button will automatically alter the options in the fields tab in such a way that the resulting width of the output is minimal. So instead of saving 0000001, we write 1, etc. String fields will no longer be padded to their specified length.
Note: You can specify any format definitions available in Excel. These formats are not tied to any …
11.6.18.2. Options
The following options are available for configuring the Access output:
Step name: Name of the step. Note: This name has to be unique in a single transformation.
The database filename: The filename of the database file you are connecting to.
Create database: Check this to generate a new Access database file.
Target table: Specify the table you want to output data to.
Create table: Check this to create a new table in the Access database.
Commit size: Defines the commit size when outputting data.
11.6.19.2. Options
The following table describes the available options for configuring the database lookup:
Step name: Name of the step. Note: This name has to be unique in a single transformation.
Connection: Select the database connection for the lookup.
Lookup schema: Specify the target schema to use for the lookup.
Lookup Table: The name of the table where we do the lookup.
Enable cache?: This option caches database lookups. This means that we expect the database to return the same value all the time for a certain lookup value.
Cache size in rows: Specify the size in rows of the cache to use.
Keys to look up table: Specify the keys necessary to perform the lookup.
Values to return table: Select the fields from the lookup table to add to the output stream.
Do not pass the row if the lookup fails: Check to avoid passing a row when the lookup fails.
Fail on multiple results?: Check this option to cause the step to fail if the lookup returns multiple results.
Order by: The order by field allows you to specify a field and order type (ascending/descending) for how the data is retrieved.
Get Fields: Click to return a list of available fields from the input stream(s) of the step.
Get lookup fields: Click to return a list of available fields from the lookup table to add to the step's output stream.

IMPORTANT NOTE: if other processes are changing values in the table, it might be unwise to cache values. However, in all other cases, enabling this option can seriously increase the performance because database lookups are relatively slow. If you find that you can't use the cache, consider launching several copies of this step at the same time. This will keep the database busy via different connections. To see how to do this, please see Launching several copies of a step.
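The effect of Enable cache? and Cache size in rows can be illustrated with a small Python sketch on top of sqlite3. `CachedLookup` is a hypothetical class, and evicting the least recently used key when the cache is full is an assumption of this sketch, not documented behaviour of the step; a cache_size of 0 means "no limit" here.

```python
import sqlite3
from collections import OrderedDict
from typing import Optional


class CachedLookup:
    """Database lookup with an optional, size-limited cache (hypothetical class)."""

    def __init__(self, conn: sqlite3.Connection, sql: str,
                 enable_cache: bool = False, cache_size: int = 0):
        self.conn, self.sql = conn, sql
        self.enable_cache, self.cache_size = enable_cache, cache_size
        self.cache: "OrderedDict[tuple, Optional[tuple]]" = OrderedDict()
        self.queries = 0  # round trips to the database

    def lookup(self, *key) -> Optional[tuple]:
        if self.enable_cache and key in self.cache:
            self.cache.move_to_end(key)         # mark as recently used
            return self.cache[key]
        self.queries += 1
        result = self.conn.execute(self.sql, key).fetchone()
        if self.enable_cache:
            self.cache[key] = result
            # cache_size <= 0 means "no limit" in this sketch.
            if 0 < self.cache_size < len(self.cache):
                self.cache.popitem(last=False)  # evict the least recently used key
        return result
```

Repeated lookups of the same key cost one query; with a cache of one row, alternating between two keys sends every lookup to the database again, which is why the cache size matters.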
For example, this transformation adds information coming from a text file (B) to data coming from a database table (A):
The option 'Source step' indicates that we use information from B to do the lookups.
11.6.20.2. Options
The table below describes the features available for configuring the stream lookup:
Step name: Name of the step. Note: This name has to be unique in a single transformation.
Lookup step: This is the step name where the lookup data is coming from.
The keys to lookup...: Allows you to specify the names of the fields that are used to look up values. Values are always searched using the "equal" comparison.
Fields to retrieve: You can specify the names of the fields to retrieve here, as well as the default value in case the value was not found, or a new field name in case you didn't like the old one.
Preserve memory: This will encode rows of data to preserve memory while sorting.
Key and value are exactly one integer field: This will also preserve memory while executing a sort.
Use sorted list: Check this to store values using a sorted list. This provides better memory usage when working with data sets containing wide rows.
Get fields: This will automatically fill in the names of all the available fields on the source side (A). You can then delete all the fields you don't want to use for lookup.
Get lookup fields: This will automatically insert the names of all the available fields on the lookup side (B). You can then delete the fields you don't want to retrieve.
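Conceptually, a stream lookup reads the whole lookup stream (B) into memory and then probes it with the key fields of every row from the main stream (A). A minimal Python sketch of that idea follows; `stream_lookup` and its argument layout are hypothetical, and when B contains duplicate keys this sketch simply keeps the last one.

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def stream_lookup(main: Iterable[dict], lookup: Iterable[dict],
                  keys: Sequence[Tuple[str, str]],
                  retrieve: Sequence[Tuple[str, str, object]]) -> List[dict]:
    """Add fields from the lookup stream (B) to every row of the main stream (A).

    keys:     (field in A, field in B) pairs, always compared with "equal"
    retrieve: (field in B, new name, default value) triples
    """
    # Read the whole lookup stream into memory first, indexed by its key fields.
    index: Dict[tuple, dict] = {tuple(r[b] for _, b in keys): r for r in lookup}
    out = []
    for row in main:
        match = index.get(tuple(row[a] for a, _ in keys))
        new = dict(row)
        for field, new_name, default in retrieve:
            new[new_name] = match[field] if match is not None else default
        out.append(new)
    return out
```

An order for an unknown customer keeps flowing through the transformation and receives the default value instead of being dropped.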
11.6.21.2. Options
The following table describes the available options for the Call DB Procedure step:
Step name: Name of the step. Note: This name has to be unique in a single transformation.
Connection: Name of the database connection on which the procedure resides.
Proc-name: Name of the procedure or function to call.
Find it button: Click to search on the specified database connection for available procedures and functions (at the moment only on Oracle and SQLServer).
Enable auto commit: In some situations you want to do updates in the database using the specified procedure. In that case you can either have the changes done using auto-commit or by disabling this. If auto-commit is disabled, a single commit is performed after the last row was received by this step.
Result name: Name of the result of the function call; leave this empty in case of a procedure.
Result type: Type of the result of the function call. Not used in case of a procedure.
Parameters: List of parameters that the procedure or function needs:
  Field name: Name of the field.
  Direction: Can be either IN (input only), OUT (output only), or INOUT (value is changed on the database).
  Type: Used for output parameters so that Kettle knows what comes back.
Get Fields: This function fills in all the fields in the input streams to make your life easier. Simply delete the lines you don't need and re-order the ones you do need.
11.6.22.2. Options
The following table describes the options available for the HTTP client step:
Step name: Name of the step. Note: This name has to be unique in a single transformation.
URL: The base URL string.
Result fieldname: The name of the field to store results.
Parameters: This section is where you define the parameter name-value pairs to pass on the URL.
11.6.23.3. Remove
This tab allows you to enter the fields that you want removed from the stream. You can also click the 'Get fields to remove' button to add all fields from the input stream(s). This makes it easier if you are trying to remove several fields. After getting all fields, simply delete any of the fields that you do not want removed from the stream.
11.6.23.4. Meta-data
This tab allows you to rename fields, change data types, and change the length and precision of fields coming into the Select Values step. Click the 'Get fields to change' button to add all fields on the input stream(s). Note: The type column is useful in cases that would otherwise involve repeated data type conversions. For example, if your transformation is taking advantage of the lazy conversion option and includes a sort step, this could result in repeated data conversions internal to the sort step in order to perform the data comparisons. You can work around this issue by using the Select Values step to convert your sort key fields to normal data (i.e. from Binary to String).
Filter rows dialog
edited simply by clicking on it (going down one level into the condition tree). Here is a more complex example:
11.6.24.2. Options
Step name: Name of the step. Note: This name has to be unique in a single transformation.
Send 'true' data to step: The rows for which the condition specified evaluates to true are sent to this step.
Send 'false' data to step: The rows for which the condition specified evaluates to false are sent to this step.
The Condition: Click the 'NOT' button in the upper left to negate the condition. Click on the <Field> buttons to select from a list of fields from the input stream(s) to build your condition(s). Click on the <value> button to enter a specific value into your condition(s). To delete a condition, right-click on it and select 'Delete Condition'.
Add Condition button: Click to add a condition.
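The routing performed by Filter rows can be summarised in a few lines of Python. In this sketch (the function `filter_rows` is hypothetical) the condition is an ordinary Python predicate standing in for the condition tree built in the dialog, and `negate` plays the role of the NOT button.

```python
from typing import Callable, Iterable, List, Tuple


def filter_rows(rows: Iterable[dict], condition: Callable[[dict], bool],
                negate: bool = False) -> Tuple[List[dict], List[dict]]:
    """Route each row to the 'true' or the 'false' target step (sketch)."""
    true_rows: List[dict] = []
    false_rows: List[dict] = []
    for row in rows:
        result = condition(row)
        if negate:                 # the NOT button negates the whole condition
            result = not result
        (true_rows if result else false_rows).append(row)
    return true_rows, false_rows
```

A compound condition such as country = BE AND amount >= 100 becomes lambda r: r["country"] == "BE" and r["amount"] >= 100; every row ends up in exactly one of the two outputs.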
Sort rows dialog
11.6.25.2. Options
The following table describes the options for the Sort step:
Step name: Name of the step. Note: This name has to be unique in a single transformation.
Sort directory: This is the directory in which the temporary files are stored in case it is needed. The default is the standard temporary directory for the system.
TMP-file prefix: Choose a recognizable prefix in order to recognize the files when they show up in the temp directory.
Sort size: The more rows you can store in memory, the faster the sort gets. This is because fewer temporary files need to be used and less I/O is generated.
Compress TMP Files: This option compresses temporary files when they are needed to complete the sort.
Only pass unique rows?: Enable this option if you want to pass only unique rows to the output stream(s).
Fields table: Specify the fields and direction (ascending/descending) to sort. You can optionally specify whether or not to perform a case sensitive sort.
Get Fields button: Click to retrieve a list of all fields coming in on the stream(s).
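The role of Sort size and the temporary files is easiest to see in an external merge sort: sort chunks that fit in memory, spill each sorted chunk to a temporary file, then merge the files. The Python sketch below shows that technique; it is not the step's code. `external_sort` is hypothetical, rows are pickled to disk, only ascending order is shown, and "unique" here means rows with equal sort keys. Compressed temporary files could be obtained by swapping in gzip.open.

```python
import heapq
import os
import pickle
import tempfile
from typing import Callable, Iterable, Iterator, List, Optional


def external_sort(rows: Iterable, key: Callable, sort_size: int = 1000,
                  sort_dir: Optional[str] = None, prefix: str = "out",
                  unique: bool = False) -> Iterator:
    """Sort more rows than fit in memory by spilling sorted runs to temp files."""
    paths: List[str] = []
    chunk: List = []

    def spill() -> None:
        chunk.sort(key=key)
        fd, path = tempfile.mkstemp(prefix=prefix, dir=sort_dir)
        with os.fdopen(fd, "wb") as f:
            for r in chunk:
                pickle.dump(r, f)
        paths.append(path)
        chunk.clear()

    def read(path: str) -> Iterator:
        with open(path, "rb") as f:
            while True:
                try:
                    yield pickle.load(f)
                except EOFError:
                    return

    for row in rows:
        chunk.append(row)
        if len(chunk) >= sort_size:      # "Sort size" reached: write a sorted run
            spill()
    if chunk:
        spill()

    previous = object()                  # sentinel that equals no key
    try:
        for row in heapq.merge(*(read(p) for p in paths), key=key):
            k = key(row)
            if unique and k == previous:
                continue                 # "Only pass unique rows?": drop equal keys
            previous = k
            yield row
    finally:
        for p in paths:
            os.remove(p)
```

A larger sort_size means fewer, longer runs and therefore less file I/O, which matches the advice given for the Sort size option.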
Sort rows dialog
stored, so the values start back at the same value every time the transformation is launched.
11.6.26.2. Options
The following table describes the options for the Add Sequence step:
Step name: Name of the step. Note: This name has to be unique in a single transformation.
Name of value: Name of the new sequence value that is added to the stream.
Use DB to generate the sequence: Enable this option if you want the sequence to be driven by a database sequence.
  Connection name: choose the name of the connection on which the database sequence resides.
  Schema name: optionally specify the table's schema name.
  Sequence name: allows you to enter the name of the database sequence.
Use counter to calculate sequence: Enable this option if you want the sequence to be generated by Kettle.
  Counter name (optional): if multiple steps in a transformation generate the same value name, this option allows you to specify the name of the counter you want to associate the sequence with. This avoids forcing unique sequencing across multiple steps. Attention: in this case you have to ensure that start, increment and maximum value of all counters with the same name are identical, otherwise the result is unpredictable.
  Start at: give the start value of the sequence.
  Increment by: give the increment of the sequence.
  Maximum value: this is the maximum value after which the sequence will start back at the start value (Start at).

Examples:
Start at = 1, increment by = 1, max value = 3
This will produce: 1, 2, 3, 1, 2, 3, 1, ...
Start at = 0, increment by = -1, max value = -2
This will produce: 0, -1, -2, 0, -1, -2, 0, ...
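The two examples can be reproduced with a small Python generator. This is a sketch of the wrap-around rule, not Kettle's counter class: `counter` and `take` are hypothetical names, and the sketch assumes that with a negative increment the "maximum" acts as the lower bound at which the sequence restarts.

```python
from itertools import islice
from typing import Iterator, List, Optional


def counter(start: int = 1, increment: int = 1,
            maximum: Optional[int] = None) -> Iterator[int]:
    """Transformation counter that restarts at `start` once it passes `maximum`."""
    value = start
    while True:
        yield value
        value += increment
        if maximum is not None:
            # With a negative increment the "maximum" acts as the lower bound.
            if (increment > 0 and value > maximum) or (increment < 0 and value < maximum):
                value = start


def take(n: int, it: Iterator[int]) -> List[int]:
    """First n values of a sequence."""
    return list(islice(it, n))
```

Steps that share a counter name would, in this model, draw from one shared generator (for example one stored in a dict under that name), which is why their start, increment and maximum must agree.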
Dummy dialog
Unfortunately, the 'Stream Lookup' step can only read lookup information from one stream. The Dummy step can be used to work around this limitation like this:
Row Normaliser dialog
will convert this data into the following format so that it is easier to …

Product  sales
A        10
B        5
C        17
A        12
B        7
C        19
...      ...
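The wide-to-long conversion shown above can be sketched in Python. The input rows and the month field are hypothetical (the original input data is not shown here), and `normalise` is an illustrative helper whose fields argument mirrors the Fieldname, Type and New field columns of the step's fields table.

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def normalise(rows: Iterable[Dict], keep: Sequence[str],
              fields: Sequence[Tuple[str, str, str]], type_field: str) -> List[Dict]:
    """Turn each wide row into one row per type (sketch of the Row Normaliser).

    fields holds (fieldname, type, new field) triples, e.g. ("A", "A", "sales").
    """
    types: List[str] = []            # distinct types, in order of first appearance
    for _, t, _ in fields:
        if t not in types:
            types.append(t)
    out = []
    for row in rows:
        for t in types:
            new = {k: row[k] for k in keep}          # fields copied unchanged
            new[type_field] = t
            for fieldname, ftype, target in fields:
                if ftype == t:
                    new[target] = row[fieldname]
            out.append(new)
    return out
```

Because several fieldnames may share a type, the same helper also covers setups with more than one new field per type (for example sales and a product number).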
11.6.28.2. Options
The following options are available for the Row Normaliser step:
Step name: Name of the step. Note: This name has to be unique in a single transformation.
Typefield: The name of the type field (Product in our example).
Fields table: This is a list of the fields you want to normalize; you will need to set the following properties for each selected field:
  Fieldname: Name of the fields to normalize (Product A to C in our example).
  Type: Give a string to classify the field (A, B or C in our example).
  New field: You can give one or more fields where the new value should be transferred to (sales in our example).
You can convert this into:

DATE      Type      Sales  Product Number
20030101  Product1  100    5
20030101  Product2  250    10
20030101  Product3  150    ...
...

This would be the setup to do it with:
11.6.29.2. Options
The following options are available for configuring the Split Fields step:
Step name: Name of the step. Note: This name has to be unique in a single transformation.
Field to split: The name of the field you want to split.
Delimiter: Delimiter that determines the end of a field.
Fields table: This table is where you define the properties for each new field created by the split. For each new field, you will need to define the field name, ID, whether to remove the ID, type, format, grouping and decimal symbols, currency, length and precision.
Example 1: SALES_VALUES field containing "Sales2=310.50, Sales4=150.23"
Use these settings to split the field into 4 new fields:
Delimiter: ,
Field: SALES1, SALES2, SALES3, SALES4
Id: Sales1=, Sales2=, Sales3=, Sales4=
remove ID: yes, yes, yes, yes
type: Number, Number, Number, Number
format: ###.##, ###.##, ###.##, ###.##
group:
decimal: .
currency:
length: 7, 7, 7, 7
precision: 2, 2, 2, 2
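A minimal Python sketch of Example 1 follows. It assumes that each piece is matched to a target field by its Id prefix, that whitespace around pieces is trimmed, and that a field whose Id does not occur stays null (None); treat these as assumptions about the step's behaviour rather than documented facts. `split_field` is a hypothetical helper.

```python
from typing import Dict, Optional, Sequence


def split_field(value: str, delimiter: str, names: Sequence[str],
                ids: Sequence[str], remove_id: bool = True) -> Dict[str, Optional[float]]:
    """Split `value` and route each piece to the target field whose Id prefix it carries."""
    pieces = [p.strip() for p in value.split(delimiter)]   # assumption: trim whitespace
    result: Dict[str, Optional[float]] = {name: None for name in names}
    for name, id_ in zip(names, ids):
        for piece in pieces:
            if piece.startswith(id_):
                text = piece[len(id_):] if remove_id else piece
                result[name] = float(text)                  # type: Number
                break
    return result
```

Under these assumptions, "Sales2=310.50, Sales4=150.23" yields SALES2 = 310.5 and SALES4 = 150.23, while SALES1 and SALES3 stay empty.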
Unique rows dialog
11.6.30.2. Options
The following table describes all options for the Unique rows step:
Step name: Name of the step. Note: This name has to be unique in a single transformation.
Add counter to output?: Enable this to add a field containing the number of duplicate rows to the output.
Fields to compare table: Specify the field names you want to force uniqueness on, or click the 'Get' button to insert all fields from the input stream(s). You can (optionally) choose to ignore case by setting the 'Ignore case' flag to Y. For example: Kettle, KETTLE and kettle will all be the same if the compare is done case insensitive. In this case, the first occurrence (Kettle) will be passed to the next step(s).
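The Kettle/KETTLE/kettle example can be reproduced with the Python sketch below. `unique_rows` is a hypothetical helper; like the Unique rows step it compares each row only with the previous one, so the input should already be sorted on the compared fields, and it returns each kept row together with its duplicate count.

```python
from typing import Iterable, List, Sequence, Tuple


def unique_rows(rows: Iterable[dict], fields: Sequence[str],
                ignore_case: Sequence[str] = ()) -> List[Tuple[dict, int]]:
    """Keep the first row of each run of duplicates, with the run length as counter."""
    def key(row: dict) -> tuple:
        return tuple(str(row[f]).lower() if f in ignore_case else row[f] for f in fields)

    out: List[Tuple[dict, int]] = []
    prev = None
    for row in rows:
        k = key(row)
        if out and k == prev:
            first, count = out[-1]
            out[-1] = (first, count + 1)       # duplicate: only the counter grows
        else:
            out.append((row, 1))               # first occurrence is passed on
            prev = k
    return out
```

With ignore_case set for the name field, the three spellings collapse into the first row (Kettle) with a count of 3.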
11.6.31. Group By
Group by dialog
11.6.31.2. Options
The following table provides a description of the options available for the Group By step:
Step name: Name of the step. Note: This name has to be unique in a single transformation.
Include all rows?: Check this if you want all rows in the output, not just the aggregation. To differentiate between the 2 types of rows in the output, we need a flag in the output. You need to specify the name of the flag field in that case (the type is boolean).
Temporary files directory: This is the directory in which the temporary files are stored in case it is needed. The default is the standard temporary directory for the system.
TMP-file prefix: Specify the file prefix used when naming temporary files.
Add line number, restart in each group: Check this if you want to add a line number that restarts at 1 in each group.
Line number field name: The name of the field that holds the line number.
Group field table: Specify the fields over which you want to group. You can click the 'Get Fields' button to add all fields from the input stream(s).
Aggregates table: Specify the fields that need to be aggregated, the method and the name of the resulting new field.
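A streaming group-by of this kind can be sketched with itertools.groupby, which, like the step, groups runs of consecutive rows and therefore expects input sorted on the group fields. The helper `group_by` and the generated names sum_<field> and count_<field> are hypothetical choices for this example; only the SUM and COUNT methods are shown.

```python
from itertools import groupby
from typing import Iterable, List, Sequence


def group_by(rows: Iterable[dict], group: Sequence[str], field: str) -> List[dict]:
    """Sum and count `field` for each run of rows with equal group fields."""
    out = []
    for key, items in groupby(rows, key=lambda r: tuple(r[g] for g in group)):
        items = list(items)
        agg = dict(zip(group, key))
        agg[f"sum_{field}"] = sum(r[field] for r in items)      # aggregate: SUM
        agg[f"count_{field}"] = len(items)                      # aggregate: COUNT
        out.append(agg)
    return out
```

If the input is not sorted, the same group value can appear in several output rows, which is the usual reason to put a Sort rows step in front.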
11.6.32. Null If
Null If dialog
11.6.33. Calculator
Calculator dialog
Note: An important advantage Calculator has over custom JavaScript scripts is that the execution …
Function: Description
A + B * C: Add A and B times C
SQRT( A*A + B*B ): Calculate √(A² + B²)
ROUND( A ): Round A to the nearest integer
ROUND( A, B ): Round A to B decimal positions
Set field to constant A: Create a field with a constant value
Date A + B days: Add B days to Date field A
Year of date A: Calculate the year of date A
Month of date A: Calculate the month number of date A
Day of year of date A: Calculate the day of year (1-365)
Day of month of date A: Calculate the day of month (1-31)
Day of week of date A: Calculate the day of week (1-7)
Week of year of date A: Calculate the week of year (1-54)
ISO8601 Week of year of date A: Calculate the week of the year ISO8601 style (1-53)
ISO8601 Year of date A: Calculate the year ISO8601 style
Byte to hex encode of string A: Encode bytes in a string to a hexadecimal representation
Hex encode of string A: Encode a string in its own hexadecimal representation
Char to hex encode of string A: Encode characters in a string to a hexadecimal representation
Hex decode of string A: Decode a string from its hexadecimal representation (add a leading 0 when A is of odd length)
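A few of these functions written out in Python help pin down what they compute. The function names are hypothetical, and the hex functions assume the string is encoded as UTF-8 bytes; Kettle's exact encoding, and the differences between the byte and char variants, are not covered by this sketch.

```python
import math
from datetime import date, timedelta


def a_plus_b_times_c(a: float, b: float, c: float) -> float:
    return a + b * c                       # A + B * C


def sqrt_sum_squares(a: float, b: float) -> float:
    return math.sqrt(a * a + b * b)        # SQRT( A*A + B*B )


def add_days(a: date, b: int) -> date:
    return a + timedelta(days=b)           # Date A + B days


def iso8601_week_and_year(a: date) -> tuple:
    iso_year, iso_week, _ = a.isocalendar()
    return iso_week, iso_year              # ISO8601 week of year, ISO8601 year


def hex_encode(a: str) -> str:
    return a.encode("utf-8").hex()         # assumption: UTF-8 bytes


def hex_decode(a: str) -> str:
    if len(a) % 2:                         # odd length: add a leading 0
        a = "0" + a
    return bytes.fromhex(a).decode("utf-8")
```

The ISO8601 functions matter around new year: 29 December 2008 falls in ISO week 1 of ISO year 2009, even though its calendar year is 2008.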
11.6.34.3. Fields
The Fields tab is where you configure the output fields and their formats. The table below describes each of the available properties for a field:
Fieldname: Name of the field.
Element name: The name of the element in the XML file to use.
Type: Type of the field; can be either String, Date, or Number.
Format: Format mask to convert with; see Number Formats for a complete description of format specifiers.
Length: Output string is padded to this length if it is specified.
Precision: The precision to use.
Currency: Symbol used to represent currencies like $10,000.00 or €5.000,00
Decimal: A decimal point can be a "." (10,000.00) or "," (5.000,00)
Group: A grouping can be a "," (10,000.00) or "." (5.000,00)
Null: The string to use in case the field value is null.
Attribute: Make this an attribute (N means: element).
Denormaliser dialog
11.6.36.2. Options
Step name: Name of the step. This name has to be unique in a single transformation.
Key field: The field that defines the key.
Group fields: Specify the fields that make up the grouping here.
Target fields: Specify the fields to de-normalize. You do it by specifying the String value for the key field (see above). Options are provided to convert data types. Mostly people use Strings as key-value pairs, so you often need to convert to Integer, Number or Date. In case you get key-value pair collisions (the key is not unique for the group specified), you can specify the aggregation method to use.
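The pivot performed by the Denormaliser can be sketched in Python as follows. `denormalise` is a hypothetical helper: targets maps each key value to a target field and a type converter, missing keys leave the target field null (None), and the optional aggregate function resolves key-value collisions within a group (without it, the last value wins).

```python
from typing import Callable, Dict, Iterable, List, Optional, Sequence, Tuple


def denormalise(rows: Iterable[dict], group: Sequence[str], key_field: str,
                value_field: str, targets: Dict[str, Tuple[str, Callable]],
                aggregate: Optional[Callable] = None) -> List[dict]:
    """Pivot key/value rows into one row per group (sketch of the Denormaliser)."""
    out: Dict[tuple, dict] = {}
    for row in rows:
        gkey = tuple(row[g] for g in group)
        rec = out.get(gkey)
        if rec is None:
            rec = dict(zip(group, gkey))
            rec.update({name: None for name, _ in targets.values()})
            out[gkey] = rec
        if row[key_field] not in targets:
            continue                                # key value not configured
        name, convert = targets[row[key_field]]
        value = convert(row[value_field])           # e.g. String -> Integer
        if rec[name] is not None and aggregate is not None:
            value = aggregate(rec[name], value)     # key-value pair collision
        rec[name] = value
    return list(out.values())
```

For example, with targets {"name": ("NAME", str), "age": ("AGE", int)} the string "42" stored under key age becomes the integer 42 in the AGE field of its group's row.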
11.6.37. Flattener
Flattener dialog
11.6.37.2. Options
Step name: Name of the step. This name has to be unique in a single transformation.
The field to flatten: The field that needs to be flattened into different target fields.
Target fields: The name of the target field to flatten to.
11.6.38. Value Mapper
Fieldname to use: LanguageCode
Target fieldname: LanguageDesc
Source/Target: EN/English, FR/French, NL/Dutch, ES/Spanish, DE/German, ...
NOTE: It is also possible to convert a null field or empty String value to a non-empty value. Leave the "Source value" field empty for this. It is obviously only possible to specify one of these empty source field values.
11.6.38.2. Options
The following properties are used to define the mappings:
Step name: Name of the step. Note: This name has to be unique in a single transformation.
Fieldname to use: Field to use as the mapping source.
Target field name: Field to use as the mapping target.
Default upon non-matching: Defines a default value for situations where the source value is not empty, but there's no match.
Field values table: Contains the mapping of source value to converted target value.
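The language code example can be expressed as a small Python function. `map_value` is a hypothetical helper: the entry with an empty source value ("") stands for the empty "Source value" described in the note, and leaving null or empty input unchanged when no such entry exists is an assumption of this sketch.

```python
from typing import Dict, Optional


def map_value(value: Optional[str], mapping: Dict[str, str],
              default: Optional[str] = None) -> Optional[str]:
    """Translate a source value using the Field values table (sketch).

    The entry with an empty source value ("") is used for null or empty input;
    `default` ("Default upon non-matching") is returned for unmatched values.
    """
    if value is None or value == "":
        return mapping.get("", value)   # unchanged if no empty source value is defined
    return mapping.get(value, default)
```

With the table EN/English, FR/French, NL/Dutch plus an empty source value mapped to Unknown, a missing language code becomes Unknown and an unlisted code falls back to the default.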
Blocking dialog
next step. This step then knows that all previous steps have finished. You can use this for triggering plugins, stored procedures, Java scripts, ... or for synchronization purposes.
11.6.39.2. Options
The following table describes the options for the Blocking step:
Step name: Name of the step. Note: This name has to be unique in a single transformation.
Pass all rows?: Determines whether to pass 1 row or all rows.
Spool directory: This is the directory in which the temporary files are stored in case it is needed. The default is the standard temporary directory for the system.
Spool-file prefix: Choose a recognizable prefix in order to recognize the files when they show up in the temp directory.
Cache size: The more rows you can store in memory, the faster the step works.
Compress spool files?: This option compresses temporary files when they are needed.
Join rows dialog
The "Years x Months x Days" step outputs all combinations of Year, Month and Day (from 1900, 1, 1 to 2100, 12, 31) and can be used to create a date dimension.
11.6.40.2. Options
The following table describes the options for configuring the Join rows step:
Step name: Name of the step. Note: This name has to be unique in a single transformation.
Temp directory: Specify the name of the directory where the system stores temporary files in case you want to combine more than the cached number of rows.
TMP-file prefix: This is the prefix of the temporary files that will be generated.
Max. cache size: The number of rows to cache before the system reads data from temporary files. This is needed in case you want to combine large row sets that don't fit into memory.
Main step to read from: Specifies the step to read the most data from. This step is not cached or spooled to disk; the others are.
The condition: You can enter a complex condition to limit the number of output rows.
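Join rows produces the Cartesian product of its inputs and then applies the condition. The Python sketch below uses itertools.product to illustrate this on a small Years x Months x Days date dimension; `join_rows` and `is_valid_date` are hypothetical helpers, and caching or spooling to disk is not modelled.

```python
from datetime import date
from itertools import product
from typing import Callable, Iterator, List


def join_rows(*streams: List[dict],
              condition: Callable[[dict], bool] = lambda row: True) -> Iterator[dict]:
    """Cartesian product of all input streams, limited by an optional condition."""
    for combo in product(*streams):
        joined: dict = {}
        for part in combo:
            joined.update(part)
        if condition(joined):
            yield joined


def is_valid_date(row: dict) -> bool:
    """Condition used in the example: keep only real calendar dates."""
    try:
        date(row["year"], row["month"], row["day"])
        return True
    except ValueError:
        return False
```

Joining the years 2007 and 2008 with months 1-12 and days 1-31 gives 744 combinations, and the condition keeps the 731 real dates (365 + 366).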
All you then need to specify as a parameter is the productnr, and you'll get the customernr included in the result.
11.6.41.2. Options
The following table describes the options for the Database Join step:
Step name: Name of the step. Note: This name has to be unique in a single transformation.
Connection: The database connection to use.
SQL: SQL query to launch towards the database; use question marks as parameter placeholders.
Number of rows to return: 0 means all; any other number limits the number of rows.
Outer join?: Check this to always return a result, even if the query didn't return a result.
The parameters to use: Specify the fields containing parameters and the parameter type.
Merge rows dialog
coming from the compare stream is passed on to the next steps, except for the "deleted"
11.6.42.2. Options
Step name: Name of the step. Note: This name has to be unique in a single transformation.
Reference rows origin: Specify the step origin for the reference rows.
Compare rows origin: Specify the step origin for the compare rows.
Flag fieldname: Specify the name of the flag field on the output stream.
Keys to match: Specify fields containing the keys to match on. Click the 'Get key fields' button to insert all of the fields originating from the reference rows step.
Values to compare: Specify fields containing the values to compare. Click the 'Get value fields' button to insert all of the fields from the originating value rows step.
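The comparison done by Merge rows can be sketched in Python. `merge_rows` is a hypothetical helper that indexes both streams by key in memory instead of merging two sorted streams; the flag values identical, changed, new and deleted are assumed here, and, matching the text above, the compare row is passed on except for deleted keys, where the reference row is used.

```python
from typing import List, Sequence


def merge_rows(reference: List[dict], compare: List[dict], keys: Sequence[str],
               values: Sequence[str], flag: str = "flagfield") -> List[dict]:
    """Compare two versions of a data set and flag every key (sketch)."""
    def k(r: dict) -> tuple:
        return tuple(r[f] for f in keys)

    ref = {k(r): r for r in reference}
    cmp = {k(r): r for r in compare}
    out = []
    for key in sorted(ref.keys() | cmp.keys()):
        if key not in cmp:
            row, status = ref[key], "deleted"       # only in the reference rows
        elif key not in ref:
            row, status = cmp[key], "new"           # only in the compare rows
        else:
            row = cmp[key]
            same = all(ref[key][v] == row[v] for v in values)
            status = "identical" if same else "changed"
        out.append({**row, flag: status})
    return out
```

Comparing yesterday's and today's customer lists this way yields one flagged row per key, which a following step can route to inserts, updates or deletes.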
11.6.43.2. Options
Step name: Name of the step. Note: This name has to be unique in a single transformation.
Fields table: Specify the fieldname and sort direction (ascending/descending). Click the 'Get Fields' button to retrieve a list of fields from the input stream(s).
11.6.44.2. Options
The following table describes the options available for the Merge Join step:
Step name: Name of the step. Note: This name has to be unique in a single transformation.
First Step: Specify the first input step to the merge join.
Second Step: Specify the second input step to the merge join.
Join Type: Select from the available types of joins.
Keys for 1st step: Specify the key fields on which the incoming data is sorted. Click the 'Get key fields' button to retrieve a list of fields from the specified step.
Keys for 2nd step: Specify the key fields on which the incoming data is sorted. Click the 'Get key fields' button to retrieve a list of fields from the specified step.
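Both inputs must be sorted on their keys because a merge join walks the two streams in step. The Python sketch below shows the technique for an inner join only (the outer join types are not covered); `merge_join` and the example field names are hypothetical.

```python
from typing import List, Sequence


def merge_join(left: List[dict], right: List[dict], lkeys: Sequence[str],
               rkeys: Sequence[str]) -> List[dict]:
    """Inner merge join of two inputs that are both sorted on their key fields."""
    def lk(r: dict) -> tuple:
        return tuple(r[f] for f in lkeys)

    def rk(r: dict) -> tuple:
        return tuple(r[f] for f in rkeys)

    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        a, b = lk(left[i]), rk(right[j])
        if a < b:
            i += 1                       # left key too small: advance left
        elif a > b:
            j += 1                       # right key too small: advance right
        else:
            # Collect the run of equal keys on both sides and emit their cross product.
            i_end = i
            while i_end < len(left) and lk(left[i_end]) == a:
                i_end += 1
            j_end = j
            while j_end < len(right) and rk(right[j_end]) == a:
                j_end += 1
            for lrow in left[i:i_end]:
                for rrow in right[j:j_end]:
                    out.append({**lrow, **rrow})
            i, j = i_end, j_end
    return out
```

Each input is read only once, so memory use stays low; if either input is not sorted on the declared keys, matches are silently missed.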
11.6.45.2. Options
Step name: Name of the step. Note: This name has to be unique in a single transformation.
Java Script: Specify the script you want to run.
Fields: The fields to add to the output stream.
Insert fields button: Inserts the fields and the standard method to grab the value of the field.
Test script button: Tests whether or not the script compiles.
Get variables button: Gets the newly created variables and inserts them into the "Fields" grid.
Function: Description
Value Clone(): Builds a copy of a value and returns a Value.
void setName(String name): Sets the name of a value.
String getName(): Gets the name of a value.
void setValue(double num): Sets the value to a floating point value.
void setValue(String str): Sets the value to a string value.
void setValue(Date dat): Sets the value to a Date value.
void setValue(boolean bool): Sets the value to a Boolean value.
void setValue(long l): Sets the value to an integer value.
void setValue(Value v): Sets the value to the value contained in another field.
double getNumber(): Gets the value of a field as a floating point value.
String getString(): Gets the value of a field as a textual string representation.
int getStringLength(): Gets the length of the string representation.
Date getDate(): Gets the value of the field as a date value.
boolean getBoolean(): Gets the value of a field as a Boolean. NOTE: String "Y" or "true" is converted to true. NOTE: Numeric value 0 is converted to false; everything else is true.
long getInteger(): Gets the value of a field as an integer. NOTE: Date fields are converted to the number of milliseconds …
Function | Description
boolean equals(Object v) | Compares two values and returns true if the two values have the same value.
int hashCode() | Returns a signed integer representing the value in the form of a hash code.
Value negate() | If the value is numeric, multiplies the value by -1; in all other cases it doesn't do anything.
Value and(Value v) | Calculates the bitwise AND of two integer values.
Value xor(Value v) | Calculates the bitwise XOR of two integer values.
Value or(Value v) | Calculates the bitwise OR of two integer values.
Value bool_and(Value v) | Calculates the boolean AND of two boolean values.
Value bool_or(Value v) | Calculates the boolean OR of two boolean values.
Value bool_xor(Value v) | Calculates the boolean XOR of two boolean values.
Value bool_not() | Calculates the boolean NOT of a boolean value.
Value greater_equal(Value v) | Compares two values and sets the first to true if the second is greater than or equal to the first.
Value smaller_equal(Value v) | Compares two values and sets the first to true if the second is smaller than or equal to the first.
Value different(Value v) | Compares two values and sets the first to true if the second is different from the first.
Value equal(Value v) | Compares two values and sets the first to true if the second is equal to the first.
Value like(Value v) | Sets the first value to true if the second string is part of the first.
Value greater(Value v) | Compares two values and sets the first to true if the second is greater than the first.
Value smaller(Value v) | Compares two values and sets the first to true if the second is smaller than the first.
Value minus(double v), minus(long v), minus(int v), minus(byte v), minus(Value v) | Subtracts v from the field value.
Value plus(double v), plus(long v), plus(int v), plus(byte v), plus(Value v) | Adds v to the field value; in the case of strings, the string is concatenated.
Value divide(double v), divide(long v), divide(int v), divide(byte v), divide(Value v) | Divides the field value by v.
Value multiply(double v), multiply(long v), multiply(int v), multiply(byte v) | Multiplies the field value by v.
Function | Description
Value multiply(Value v) | Multiplies the field value by v.
Value abs() | Sets the field value to the absolute value of the number value.
Value acos() | Sets the field value to the arc cosine of the number value.
Value asin() | Sets the field value to the arc sine of the number value.
Value atan() | Sets the field value to the arc tangent of the number value.
Value atan2(Value arg0), atan2(double arg0) | Sets the field value to the second arc tangent of the number value.
Value ceil() | Sets the field value to the ceiling of a number value.
Value cos() | Sets the field value to the cosine of a number value.
Value cosh() | Sets the field value to the hyperbolic cosine of a number value.
Value exp() | Sets the field value to the exp of a number value.
Value floor() | Sets the field value to the floor of a number value.
Value initcap() | Sets the first character of every word in a string to uppercase: "matt casters" -> "Matt Casters".
Value length() | Sets the value of the field to the length of the String value.
Value log() | Sets the field value to the log of a number value.
Value lower() | Sets the field value to the string value in lowercase.
Value lpad(Value len), lpad(Value len, Value padstr), lpad(int len), lpad(int len, String padstr) | Sets the field value to the string value, left padded to a certain length. By default the padding string is a single space. Optionally, you can specify your own padding string.
Value ltrim() | Removes the spaces to the left of the field value.
Value mod(Value arg), mod(double arg0) | Sets the value to the modulus of the first and the second number.
Value nvl(Value alt) | If the field value is Null, sets the value to alt.
Value power(Value arg), power(double arg0) | Raises the field value to the power arg.
Value replace(Value repl, Value with), replace(String repl, String with) | Replaces a string in the field value with another.
Value round() | Rounds the field value to the nearest integer.
Value rpad(Value len), rpad(Value len, Value padstr), rpad(int len), rpad(int len, String padstr) | Sets the field value to the string value, right padded to a certain length. By default the padding string is a single space. Optionally, you can specify your own padding string.
Value rtrim() | Removes the spaces to the right of the field value.
Value sign() | Sets the field value to -1, 0 or 1 when the field value is negative, zero or positive respectively.
Function | Description
Value sin() | Sets the value of the field to the sine of the number value.
Value sqrt() | Sets the value of the field to the square root of the number value.
Value substr(Value from, Value to), substr(Value from), substr(int from), substr(int from, int to) | Sets the value of the field to the substring of the string value.
Value sysdate() | Sets the field value to the system date.
Value tan() | Sets the field value to the tangent of the number value.
Value num2str(), num2str(String arg0), num2str(String arg0, String arg1), num2str(String arg0, String arg1, String arg2), num2str(String arg0, String arg1, String arg2, String arg3) | Converts a number to a string. Arg0: format pattern, see also Number Formats. Arg1: decimal separator (either . or ,). Arg2: grouping separator (either . or ,). Arg3: currency symbol. For example, num2str("###,##0.00", ",", ".") formats a number with "," as the decimal separator and "." as the grouping separator, while num2str("000,000.00", ",", ".") does the same but pads the integer part with leading zeros.
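To illustrate what the num2str() arguments do, the following sketch uses java.text.DecimalFormat with custom DecimalFormatSymbols. It mirrors only the documented meaning of Arg0 to Arg2 and leaves the currency symbol out. The class name is hypothetical, and this is not Kettle's own code.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Hypothetical sketch: format a number with a pattern (Arg0), a decimal
// separator (Arg1) and a grouping separator (Arg2), as num2str() describes.
final class Num2StrSketch {

    static String num2str(double value, String pattern, char decimalSeparator, char groupingSeparator) {
        DecimalFormatSymbols symbols = new DecimalFormatSymbols(Locale.US);
        symbols.setDecimalSeparator(decimalSeparator);
        symbols.setGroupingSeparator(groupingSeparator);
        // The pattern itself always uses "," for grouping and "." for the decimal point;
        // the symbols only change what appears in the output.
        return new DecimalFormat(pattern, symbols).format(value);
    }

    public static void main(String[] args) {
        System.out.println(num2str(1234.5, "###,##0.00", ',', '.'));  // 1.234,50
        System.out.println(num2str(1234.5, "000,000.00", ',', '.'));  // 001.234,50
    }
}
```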
Function | Description
Value dat2str(), dat2str(String arg0), dat2str(String arg0, String arg1) | Converts a date into a string. Arg0: format pattern, see also Number Formats. Arg1: localized date-time pattern characters (u, t).
Value num2dat() | Converts a number to a date based upon the number of milliseconds since January 1st, 1970 00:00:00 GMT.
Value str2dat(String arg0), str2dat(String arg0, String arg1) | Converts a string to a date. Arg0: format pattern, see also Number Formats. Arg1: localized date-time pattern characters (u, t).
Value str2num(), str2num(String arg0), str2num(String arg0, String arg1), str2num(String arg0, String arg1, String arg2), str2num(String arg0, String arg1, String arg2, String arg3) | Converts a string into a number. Arg0: format pattern, see also Number Formats. Arg1: decimal separator (either . or ,). Arg2: grouping separator (either . or ,). Arg3: currency symbol.
Value dat2num() | Converts a date into a number, being the number of milliseconds since January 1st, 1970 00:00:00 GMT.
Value trim() | Removes spaces to the left and right of the string value.
Value upper() | Sets the field value to the uppercase string value.
Value e() | Sets the value to e.
Value pi() | Sets the value to pi.
Function | Description
Value add_months(int months) | Adds a number of months to the date value.
Value last_day() | Sets the field value to the last day of the month of the date value.
Value first_day() | Sets the field value to the first day of the month of the date value.
Value trunc(), trunc(double level), trunc(int level) | Sets the field value to the truncated number or date value. Level means the number of positions behind the comma or, in the case of a date, 5=months, 4=days, 3=hours, 2=minutes, 1=seconds, 0=milliseconds.

Two further functions convert String values to and from their hexadecimal representation. Encoding: if the value is the string "a", the result value is the string "61". Decoding: if the value is the string "61", the result value is the string "a"; if the input string has an odd length, a leading 0 will be silently added.
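The following Java sketch reproduces the hexadecimal rules described above: "a" becomes "61", and an odd-length input gets a leading 0 before decoding. Converting characters to bytes with UTF-8 is an assumption of this sketch. The class and method names are hypothetical rather than the Value API.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of hexadecimal encoding and decoding of String values (UTF-8 assumed).
final class HexSketch {

    static String encode(String value) {
        StringBuilder hex = new StringBuilder();
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            hex.append(String.format("%02x", b & 0xff));   // two lowercase hex digits per byte
        }
        return hex.toString();
    }

    static String decode(String hex) {
        if (hex.length() % 2 != 0) {
            hex = "0" + hex;                                // odd length: silently add a leading 0
        }
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(encode("a"));   // 61
        System.out.println(decode("61"));  // a
    }
}
```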
    // Day of the week (Monday-Sunday)
    day_of_week_desc = date_field.Clone().dat2str("EEEE").initcap().getString();
    // Day of week (1-7)
    day_of_week = date_field.Clone().dat2str("...").getInteger();

NOTE: If you don't use Clone(), the original value is changed by the methods that work on it.
11.6.46.4. Fields
The Fields table contains a list of variables from your script, including the ability to add metadata like a descriptive name.
11.6.46.5. Extras
Get variables button: Retrieves a list of variables from your script.
Test script button: Use this button to test the syntax of your script.
The SQL script to execute might look like this:

    CREATE TABLE tab? ( a INTEGER );

The field name to specify as parameter is then the "count" sequence we defined in the second step.
NOTE: The execution of the transformation will halt …
As an extra option, you can return the total number of inserts (INSERT INTO statements), updates (UPDATE table), deletes (DELETE FROM table) and reads (SELECT statements) by specifying the field names in the lower right of the dialog.
Dimension Lookup/Update
In our dimension implementation each entry in the dimension table has the following properties:
Option | Description
Technical key | This is the primary key of the dimension.
Version field | Shows the version of the dimension entry (a revision number).
Start of date range | This is the fieldname containing the validity starting date.
End of date range | This is the fieldname containing the validity ending date.
Keys | These are the keys used in your source systems. For example: customer numbers, product id, etc.
Fields | These fields contain the actual information of a dimension.
As a result of the lookup or update operation of this step type, a field is added to the stream containing the technical key of the dimension. If no matching entry is found, the technical key of the special "not found" dimension entry (0 or 1, depending on the type of database) is returned. NOTE: This dimension entry is added automatically to the dimension table when the update is first run.
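A slowly changing dimension lookup picks the version of an entry whose validity range contains the lookup date. The Java sketch below keeps the dimension in memory and assumes a half-open range, with the start date inclusive and the end date exclusive. The Entry record and the method name are hypothetical, and the "not found" key is passed in because the guide says it is 0 or 1 depending on the database.

```java
import java.time.LocalDate;
import java.util.List;

// Hypothetical sketch of a type 2 dimension lookup on an in-memory table.
final class DimensionLookupSketch {

    // One dimension entry with the properties described above.
    record Entry(long technicalKey, int version, LocalDate dateFrom, LocalDate dateTo, String businessKey) {}

    // Returns the technical key of the entry that is valid on the given date,
    // or notFoundKey when no entry matches.
    static long lookup(List<Entry> dimension, String businessKey, LocalDate date, long notFoundKey) {
        for (Entry e : dimension) {
            boolean sameKey = e.businessKey().equals(businessKey);
            // assumption: the start date is inclusive and the end date exclusive
            boolean inRange = !date.isBefore(e.dateFrom()) && date.isBefore(e.dateTo());
            if (sameKey && inRange) {
                return e.technicalKey();
            }
        }
        return notFoundKey;
    }

    public static void main(String[] args) {
        List<Entry> customers = List.of(
                new Entry(1, 1, LocalDate.of(1900, 1, 1), LocalDate.of(2007, 1, 1), "C42"),
                new Entry(2, 2, LocalDate.of(2007, 1, 1), LocalDate.of(2200, 1, 1), "C42"));
        System.out.println(lookup(customers, "C42", LocalDate.of(2006, 6, 1), 0));  // 1
        System.out.println(lookup(customers, "C42", LocalDate.of(2007, 6, 1), 0));  // 2
        System.out.println(lookup(customers, "C99", LocalDate.of(2007, 6, 1), 0));  // 0
    }
}
```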
11.6.48.2. Options
The following table provides a more detailed description of the options for the Dimension Lookup/Update step:
Option | Description
Step name | Name of the step. Note: This name has to be unique in a single transformation.
Update the dimension? | Check this option if you want to update the dimension based on the information in the input stream. If this option is not enabled, the dimension only does lookups and only adds the technical key field to the streams.
Connection | Name of the database connection on which the dimension table resides.
Target schema | This allows you to specify a schema name to improve precision in the quoting and to allow for table names with dots "." in them.
Target table | Name of the dimension table.
Commit size | Setting this to 10 will generate a commit every 10 inserts or updates.
Cache size in rows | This is the cache size in number of rows that will be held in memory to reduce the number of lookups in the database. Note: only the last version of a dimension entry is kept in memory. If there are more entries passing than what can be kept in memory, the technical keys with the highest validity dates are kept in memory in the hope that these are the most relevant. A cache size of 0 caches as many rows as possible, until your JVM runs out of memory; use this option wisely with dimensions that can't grow too large. A cache size of -1 means that caching is disabled.
Keys | Specify the names of the keys in the stream and in the dimension table. This will enable the step to do the lookup.
Fields | For each of the fields you need to have in the dimension, you can specify whether you want the values to be updated (for all versions, this is a Type I operation) or you want to have the values inserted into the dimension as a new version. In the example we used in the screenshot, the birth date is something that's not variable in time, so if the birth date changes, it means that it was wrong in previous versions. It's only logical then that the previous values are corrected in all versions of the dimension entry.
Technical key field | This indicates the primary key of the dimension. It is also referred to as the Surrogate Key. Use the new name option to rename the technical key after a lookup, for example if you need to look up different types of products like ORIGINAL_PRODUCT_TK, REPLACEMENT_PRODUCT_TK, ...
Creation of technical key | Specify how the technical key is generated; options which are not available for your connection will be grayed out. Use table maximum + 1: a new technical key will be created from the maximum key in the table. Note that the new maximum is always cached, so that the maximum does not need to be calculated for each new row. Use sequence: specify the sequence name if you want to use a database sequence to generate the technical key (typical for Oracle, e.g.). Use auto increment field: use an auto increment field in the database table to generate the technical key (typical for DB2, e.g.).
Version field | Specifies the name of the field to store the version (revision number) in.
Stream Datefield | If you have the date at which the dimension entry was last changed, you can specify the name of that field here. It allows the dimension entry to be accurately described for what the date range concerns. If you don't have such a date, the system date will be taken.
Date range start field | Specify the names of the dimension entries start range.
Table daterange end | Specify the names of the dimension entries end range.
Get Fields button | Fills in all the available fields on the input stream, except for the keys you specified.
SQL button | Generates the SQL to build the dimension and allows you to execute this SQL.
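The "Use table maximum + 1" option with a cached maximum can be sketched as follows. The LongSupplier stands in for a query such as SELECT MAX(technical key) on the dimension table. Both the supplier and the class name are hypothetical. The example also shows why two independent generators that start from the same maximum hand out the same key, which is why concurrent updates from separate transformations can collide.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch of "table maximum + 1" key creation with a cached maximum.
final class TableMaxKeySketch {

    private final LongSupplier maxQuery;  // stand-in for SELECT MAX(technical key) FROM the dimension table
    private boolean initialized;
    private long currentMax;

    TableMaxKeySketch(LongSupplier maxQuery) {
        this.maxQuery = maxQuery;
    }

    synchronized long nextKey() {
        if (!initialized) {
            currentMax = maxQuery.getAsLong();  // only the first call asks the database
            initialized = true;
        }
        return ++currentMax;                    // afterwards the cached maximum is used
    }

    public static void main(String[] args) {
        TableMaxKeySketch first = new TableMaxKeySketch(() -> 10L);
        TableMaxKeySketch second = new TableMaxKeySketch(() -> 10L);
        System.out.println(first.nextKey());   // 11
        System.out.println(first.nextKey());   // 12
        System.out.println(second.nextKey());  // 11: a duplicate of the first generator's key
    }
}
```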
11.6.48.3. Remarks
For the Stream date field: Consider adding an extra date field from System Info if you don't want the date ranges to be different all the time. For example, if you have extracts from a source system being done every night at midnight, consider adding the date "Yesterday 23:59:59" as a field to the stream by using a Join step. IMPORTANT NOTE: this needs to be a Date field.

Combination Lookup/Update

Lookup combination of business key field1... fieldn from the input stream in a dimension table.
… We isolate functionality and as such require … passing through this step, all of the remaining data changes for the dimension table can be made as updates, as either a row … This step only maintains the key information; you will still need to update the non-key information in the dimension table, e.g. by putting an update step (based on the technical key) after the combination update/lookup step.

Kettle will store the information in a table where the primary key is the combination of the business key fields in the table. Because this process can be very slow if you have a large number of fields, Kettle also supports a "hash code" field representing all fields in the dimension. This can speed up lookup performance dramatically while limiting the fields to index to 1.
11.6.49.2. Options
Option | Description
Step name | Name of the step. Note: This name has to be unique in a single transformation.
Update the dimension? | Check this option if you want to update the dimension based on the information in the input stream. If this option is not enabled, the dimension only does lookups and only adds the technical key field to the streams.
Connection | Name of the database connection on which the dimension table resides.
Target schema | This allows you to specify a schema name to improve precision in the quoting and to allow for table names with dots "." in them.
Target table | Name of the dimension table.
Commit size | Setting this to 10 will generate a commit every 10 inserts or updates.
Cache size in rows | This is the cache size in number of rows that will be held in memory to reduce the number of lookups in the database. Note: only the last version of a dimension entry is kept in memory. If there are more entries passing than what can be kept in memory, the technical keys with the highest validity dates are kept in memory in the hope that these are the most relevant. A cache size of 0 caches as many rows as possible, until your JVM runs out of memory; use this option wisely with dimensions that can't grow too large. A cache size of -1 means that caching is disabled.
Key fields | Specify the names of the keys in the stream and in the dimension table. This will enable the step to do the lookup.
Technical key field | This indicates the primary key of the dimension. It is also referred to as the Surrogate Key. Use the new name option to rename the technical key after a lookup, for example if you need to look up different types of products like ORIGINAL_PRODUCT_TK, REPLACEMENT_PRODUCT_TK, ...
Creation of technical key | Specify how the technical key is generated; options which are not available for your connection will be grayed out. Use table maximum + 1: a new technical key will be created from the maximum key in the table. Note that the new maximum is always cached, so that the maximum does not need to be calculated for each new row. Use sequence: specify the sequence name if you want to use a database sequence to generate the technical key (typical for Oracle, e.g.). Use auto increment field: use an auto increment field in the database table to generate the technical key (typical for DB2, e.g.).
Remove lookup fields? | Enable this option if you want to remove all the lookup fields from the input stream in the output. The only extra field added is then the technical key.
Use hashcode | This option allows you to generate a hash code representing all values in the key fields in a numerical form (a signed 64-bit integer). This hash code has to be stored in the table. IMPORTANT: This hash code is NOT unique. As such, it makes no sense to place a unique index on it.
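Because the hash code is not unique, a lookup through it still has to compare the actual key values of every row that shares the same hash. The Java sketch below illustrates this. The polynomial hash over Object.hashCode() is an assumption, since the guide does not say which function Kettle uses. The class and record names are hypothetical. The strings "Aa" and "BB" are used because their Java hash codes collide.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch of a combination lookup through a non-unique 64-bit hash code.
final class HashCodeLookupSketch {

    record Row(long technicalKey, List<Object> keyFields) {}

    private final Map<Long, List<Row>> rowsByHash = new HashMap<>();
    private long nextKey = 1;

    // Assumed hash: a simple polynomial over the fields' hash codes, kept in a long.
    static long hash64(List<?> keyFields) {
        long h = 17;
        for (Object field : keyFields) {
            h = 31 * h + Objects.hashCode(field);
        }
        return h;
    }

    // Returns the technical key of the key combination, inserting a new row if needed.
    long lookupOrInsert(List<?> keyFields) {
        List<Row> candidates = rowsByHash.computeIfAbsent(hash64(keyFields), h -> new ArrayList<>());
        for (Row row : candidates) {
            if (row.keyFields().equals(keyFields)) {   // the hash alone is not enough
                return row.technicalKey();
            }
        }
        Row row = new Row(nextKey++, new ArrayList<Object>(keyFields));
        candidates.add(row);
        return row.technicalKey();
    }

    public static void main(String[] args) {
        HashCodeLookupSketch dim = new HashCodeLookupSketch();
        System.out.println(hash64(List.of("Aa")) == hash64(List.of("BB")));  // true: same hash
        System.out.println(dim.lookupOrInsert(List.of("Aa")));               // 1
        System.out.println(dim.lookupOrInsert(List.of("BB")));               // 2
        System.out.println(dim.lookupOrInsert(List.of("Aa")));               // 1
    }
}
```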
11.6.49.3. Remarks
The Combination Lookup/Update step assumes that the dimension table it maintains is not updated concurrently by other transformations/applications. When you use e.g. the "Table Max + 1" method to create the technical keys, the step will not always go to the database to retrieve the next highest technical key. The technical key will be cached locally, so if multiple transformations update the dimension table simultaneously, you will most likely get errors on duplicate technical keys. Even when a sequence or an auto increment field is used to generate the technical key, it is still not advised to update a dimension table concurrently because of possible conflicts between transformations.
It is assumed that the technical key is the primary key of the dimension table or at least has a unique index on it. This is not 100% required, but if a technical key exists multiple times in the dimension table, the result of the Combination Lookup/Update step is unreliable.
11.6.50. Mapping
For example, you can create mappings for dimension lookups so that you don't need to enter the natural keys etc. every time again. In the dialog, you can specify the transformation name and even launch another Spoon window to edit the selected transformation.
Example:
Suppose we want to create a mapping that does a lookup in the customer slowly changing dimension. In a larger data warehouse, you need to specify the details for the dimension in question every time you do such a lookup, which is why we want to create a mapping. The details needed for the dimension lookup are in this case the customer number and the lookup reference date. We specify these two in the Mapping Input step.
After this, we can perform any calculation in our reusable transformation (Mapping); in our case, a dimension lookup.
This dimension lookup step adds one field to the equation: customer_tk. We can specify the fields that were added in the Mapping Output step.
When we want to use the mapping, the way we do it is by "mapping" the stream fields of our choice to the required fields of the mapping; the resulting fields we can rename on the output side.
To allow you to design more easily, you can enter the meta-data of the fields you're expecting from the previous transformation in a job. IMPORTANT: no validation of the supplied metadata is done at this time, to allow for greater flexibility.
Running on an application server like the Pentaho framework can become a problem. That is because other transformations running on the server will also see the changes this step makes.
Valid in the parent job: the variable is only valid in the parent job.
Valid in the grand-parent job: the variable is valid in the grand-parent job and all the child jobs and transformations.
Valid in the root job: the variable is valid in the root job and all the child jobs and transformations.
Get Variable
11.6.57. Injector
Row Injector
    // Prepare the transformation and register a row producer for the step that receives the rows
    Trans trans = new Trans(... TransMeta ...);
    trans.prepareExecution(args);
    // stepname: name of the step to inject into; stepCopy: the copy number of that step
    RowProducer rp = trans.addRowProducer(stepname, stepCopy);

After that, you start the threads in the transformation. Then you can inject the rows while the transformation is running:

    trans.startThreads();
    ...
    rp.putRow(someRowYouHaveToInject);   // a Row you want to inject
    ...

You can also specify the rows you are expecting to be injected. This makes it easier to build transformations because you have the meta-data at design time.
Socket reader
Socket writer
… to synchronize the preparation and start cycles of the transformations between the hosts (like the …
Aggregate Rows
NOTE: This step type is deprecated. See the Group By step for a more powerful way of aggregating.
Streaming XML
11.6.61.3. Content
Option | Description
Include filename in output? / fieldname | Check this option if you want to have the name of the XML file to which the row belongs in the output stream. You can specify the name of the field where the filename will end up.
The row number option works the same way: check it if you want to have a row number (starting at 1) in the output stream, and specify the name of the field where the integer will end up.
You can also specify the maximum number of rows to read.
The path to the repeating part of the XML file is specified by way of elements. The element column is used to specify the element and position as follows:
A: still specify an attribute.
Ep: specify an element defined by position (equivalent to E in the original XML Input step).
Ea: specify an element defined by an attribute and allow value parsing.
Example:
Ep=element/1: this is the first element called "element".
Ea=element/att:val: this is the element called "element" that has an attribute called "att" with the value "val".
11.6.61.4. Fields
Option | Description
Name | Name of the field.
Type | Type of the field; can be either String, Date or Number.
Format | See Number Formats for a complete description of format symbols.
Length | For Number: total number of significant figures in a number; for String: total length of string; for Date: length of printed output of the string (e.g. 4 only gives back the year).
Precision | For Number: number of floating point digits; for String, Date, Boolean: unused.
Currency | Used to interpret numbers like $10,000.00 or €5.000,00.
Decimal | A decimal point can be a "." (10,000.00) or "," (5.000,00).
Group | A grouping can be a comma "," (10,000.00) or a dot "." (5.000,00).
Trim type | The type of trim to apply to this field (left, right, both) before processing.
Null if | Treat this value as NULL.
Repeat | Y/N: If the corresponding value in this row is empty, repeat the one from the last time it was not empty.
Position: The position of the XML element or attribute. You use the following syntax to specify the position of an element:
The first element called "element": E=element/1
The first attribute called "attribute": A=attribute/1
The first attribute called "attribute" in the second "element" tag: E=element/2, A=attribute/1
NOTE: You can auto-generate all the possible positions in the XML file.
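The position syntax can be broken down mechanically. This Java sketch parses strings such as E=element/2, A=attribute/1 into (type, name, index) parts. It handles only the E and A markers. The Part record and parse() method are hypothetical and are not the step's own parser.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser for the position syntax, e.g. "E=element/2, A=attribute/1".
final class PositionSyntaxSketch {

    // type is 'E' (element) or 'A' (attribute); index is the 1-based occurrence.
    record Part(char type, String name, int index) {}

    static List<Part> parse(String position) {
        List<Part> parts = new ArrayList<>();
        for (String token : position.split(",")) {
            String t = token.trim();
            int slash = t.lastIndexOf('/');
            boolean wellFormed = t.length() > 2 && (t.charAt(0) == 'E' || t.charAt(0) == 'A')
                    && t.charAt(1) == '=' && slash > 2;
            if (!wellFormed) {
                throw new IllegalArgumentException("Unexpected position: " + t);
            }
            String name = t.substring(2, slash);
            int index = Integer.parseInt(t.substring(slash + 1));
            parts.add(new Part(t.charAt(0), name, index));
        }
        return parts;
    }

    public static void main(String[] args) {
        // prints Part[type=E, name=element, index=2] then Part[type=A, name=attribute, index=1]
        parse("E=element/2, A=attribute/1").forEach(System.out::println);
    }
}
```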
Now, using the "name" attribute, we are about to have the following fields: "brand", "type" and "power", according to the "name" attribute. For this, we must specify the association between "property" and "name" in the first grid.
Pressing the "Get Fields" button retrieves the right fields, including properties. Let us now try leaving the new grid empty.
You can see that in this case the step relies on the position of the elements, so missing elements change their position. In this case, it is better to use value parsing, because you get the right field names and missing elements will not corrupt the results (for example, a missing <property name="power"></property> in some rows).
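Value parsing associates each field with the value of an attribute instead of a position. The Java sketch below uses the standard DOM parser to turn every property element of a repeating element into a field named after its name attribute. A missing property therefore leaves that field empty and does not shift the others. The repeating element name car and the method name are hypothetical, chosen to match the brand/type/power example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sketch of value parsing: field names come from an attribute value.
final class ValueParsingSketch {

    static List<Map<String, String>> parse(String xml, String repeating, String element, String attribute) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<Map<String, String>> rows = new ArrayList<>();
            NodeList records = doc.getElementsByTagName(repeating);
            for (int i = 0; i < records.getLength(); i++) {
                Map<String, String> row = new LinkedHashMap<>();
                NodeList props = ((Element) records.item(i)).getElementsByTagName(element);
                for (int j = 0; j < props.getLength(); j++) {
                    Element p = (Element) props.item(j);
                    row.put(p.getAttribute(attribute), p.getTextContent());  // name="power" -> field "power"
                }
                rows.add(row);
            }
            return rows;
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<cars>"
                + "<car><property name=\"brand\">A</property><property name=\"power\">90</property></car>"
                + "<car><property name=\"brand\">B</property></car>"
                + "</cars>";
        // prints {brand=A, power=90} then {brand=B}: the missing "power" does not shift anything
        parse(xml, "car", "property", "name").forEach(System.out::println);
    }
}
```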
11.6.62. Abort
11.6.62.2. Options
Option | Description
Step name | Name of the step. Note: This name has to be unique in a single transformation.
Abort threshold | The threshold of number of rows after which to abort the transformation. E.g. if the threshold is 0, the abort step will abort after seeing the first row. If the threshold is 5, the abort step will abort after seeing the sixth row.
Abort message | The message to put in the log upon aborting. If not filled in, a default message will be used.
Always log | Always log the rows processed by the Abort step. This allows the rows to be logged even though the log level of the transformation would normally not do it. This way you can always see in the log which rows caused the transformation to abort.
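The threshold rule (a threshold of 0 aborts after the first row) amounts to aborting as soon as more than threshold rows have been seen. A minimal Java sketch follows. The AbortException is a hypothetical stand-in for the step stopping the transformation.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of the Abort step's threshold check.
final class AbortSketch {

    static final class AbortException extends RuntimeException {
        AbortException(String message) {
            super(message);
        }
    }

    // Counts rows and aborts once more than `threshold` rows have been seen.
    static int process(Iterator<String> rows, int threshold) {
        int seen = 0;
        while (rows.hasNext()) {
            rows.next();
            seen++;
            if (seen > threshold) {
                throw new AbortException("Aborting after row " + seen);
            }
        }
        return seen;   // no more rows than the threshold: no abort
    }

    public static void main(String[] args) {
        System.out.println(process(List.of("r1", "r2").iterator(), 5));  // 2
        try {
            process(List.of("r1", "r2").iterator(), 0);
        } catch (AbortException e) {
            System.out.println(e.getMessage());  // Aborting after row 1
        }
    }
}
```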
IMPORTANT: Just like all steps in the "Experimental" category, this step is not considered ready for production use by the author. In the specific case of the Oracle Bulk Loader, we lacked the time to do extensive testing on it. Your feedback is most welcome, as always.
11.6.63.2. Options
Option | Description
Step name | Name of the step.
Connection |
Target schema |
Target table |
Sqldr path |
Load method | … can be used as a back-door: you can have PDI generate the data and create e.g. your own control file to load the data (outside of this step).

The load action can be Append, Insert, Replace or Truncate; these map to the sqlldr action to be performed. Further options set the number of rows in error after which sqlldr will abort, which corresponds to the "ERRORS" attribute of sqlldr, and the number of rows after which to commit, which corresponds to the "ROWS" attribute of sqlldr and differs between a conventional and a direct path load.
Option | Description
Bind Size | Corresponds to the "BINDSIZE" attribute of sqlldr.
Read Size | Corresponds to the "READSIZE" attribute of sqlldr.
Control file | The name of the file used as control file for sqlldr.
Data file | The name of the data file in which the data will be written.
Log file | The name of the log file, optionally defined.
Bad file | The name of the bad file, optionally defined.
Discard file | The name of the discard file, optionally defined.
Encoding | Encodes data in a specific encoding; any valid encoding can be chosen besides the ones in the drop-down list.
Direct path | Switches on direct path loading; corresponds to DIRECT=TRUE in sqlldr.
Erase cfg/dat files after use | When switched on, the control and data file will be erased after loading.
Fields to load | This table contains a list of fields to load data from. Properties include: Table field: the table field to be loaded in the Oracle table; Stream field: the field to be taken from the incoming rows; Date mask: either "Date" or "Date mask", which determines how dates/timestamps will be loaded in Oracle. When left empty, it defaults to "Date" in the case of dates.
11.6.64. Append
11.6.64.2. Options
Option | Description
Step name | Name of the step.
Head hop |
Tail hop |
Regex Evaluation
11.6.65.3. Content
Option | Description
Ignore differences in Unicode encodings | Check to ignore differences. Note: This may improve performance, but be sure your data only …
Enable case-insensitive matching | Note: You can also enable this via the embedded flag expression (?i).
Permit whitespace and comments in pattern | When enabled, the step will ignore whitespace and embedded comments starting with # until the end of a line. Note: Comments mode can also be enabled via the embedded flag expression (?x).
Enable dotall mode | When enabled, the expression "." matches any character, including a line terminator. By default, this expression does not match line terminators. Note: Dotall mode can also be enabled via the embedded flag expression (?s).
Enable multiline mode | When enabled, the expressions "^" and "$" match just after or just before, respectively, a line terminator or the end of the input sequence. By default, these expressions only match at the beginning and the end of the entire input sequence. Note: Multiline mode can also be enabled via the embedded flag expression (?m).
Enable Unicode-aware case folding | When enabled, in conjunction with the case-insensitive flag, case-insensitive matching is done in a manner consistent with the Unicode standard. By default, case-insensitive matching assumes that only characters in the US-ASCII charset are being matched. Note: Unicode-aware case folding can also be enabled via the embedded flag expression (?u).
… mode (?d).
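The options above match standard java.util.regex.Pattern flags, which is where the embedded flag expressions (?i), (?x), (?s), (?m) and (?u) come from. The sketch below shows that mapping. How the step builds its flags internally is an assumption, and the compile() helper is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper mapping the Regex Evaluation options onto Pattern flags.
final class RegexOptionsSketch {

    static Pattern compile(String regex, boolean caseInsensitive, boolean comments,
                           boolean dotall, boolean multiline, boolean unicodeCase) {
        int flags = 0;
        if (caseInsensitive) flags |= Pattern.CASE_INSENSITIVE;  // (?i)
        if (comments) flags |= Pattern.COMMENTS;                 // (?x)
        if (dotall) flags |= Pattern.DOTALL;                     // (?s)
        if (multiline) flags |= Pattern.MULTILINE;               // (?m)
        if (unicodeCase) flags |= Pattern.UNICODE_CASE;          // (?u), used together with CASE_INSENSITIVE
        return Pattern.compile(regex, flags);
    }

    public static void main(String[] args) {
        System.out.println(compile("a.b", false, false, true, false, false).matcher("a\nb").matches());             // true
        System.out.println(compile("^b$", false, false, false, true, false).matcher("a\nb").find());                // true
        System.out.println(compile("a b  # a comment", false, true, false, false, false).matcher("ab").matches());  // true
    }
}
```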
11.6.66. CSV Input
11.6.66.2. Options
The table below describes the options available for the CSV Input step:
Option | Description
Step name | Name of the step.
Filename |
Delimiter |
Enclosure |
NIO buffer size |
11.6.67.2. Options
The table below describes the options available for the Fixed File Input step:
Option | Description
Step name | Name of the step.
Filename |
Line width in bytes |
Line feeds present? |
NIO buffer size |
Click to preview the data coming from the target file.
Click to return a list of fields from the target file based on the current settings (i.e. Delimiter, Enclosure, etc.). All fields identified will be added to the Fields table.
If a file is required and it isn't found, an error is generated. Otherwise, the filename is simply skipped.
Show filename(s) button | Displays a list of all files that will be loaded based on the currently selected file definitions.
Preview rows button | Displays a preview of the data based on the current step configuration.
11.6.68.3. Content
Option | Description
Table | Specify the name of the table to read from, or click Browse to browse for a table.
Include filename in output? | Optionally allows you to insert a field containing the filename onto the stream.
Include tablename in output? | Optionally allows you to insert a field containing the tablename onto the stream.
Include rownum in output? | Optionally allows you to insert a field containing the row number onto the stream.
Reset Rownum per file | Optionally allows you to reset the row number for each file being read from.
Limit | Optionally specify a limit on the number of rows to read.
11.6.68.4. Fields
Option | Description
Name | Name of the field.
Column | The name of the column being read from.
Type | Type of the field; can be either String, Date or Number.
Format | See Number Formats for a complete description of format symbols.
Length | For Number: total number of significant figures in a number; for String: total length of string; for Date: length of printed output of the string (e.g. 4 only gives back the year).
Precision | For Number: number of floating point digits; for String, Date, Boolean: unused.
Currency | Used to interpret numbers like $10,000.00 or €5.000,00.
Decimal | A decimal point can be a "." (10,000.00) or "," (5.000,00).
Group | A grouping can be a comma "," (10,000.00) or a dot "." (5.000,00).
Trim type | The type of trim to apply to this field (left, right, both) before processing.
Repeat | Y/N: If the corresponding value in this row is empty, repeat the one from the last time it was not empty.
11.6.69. LDAP Input
11.6.69.3. Content
Optionally allows you to insert a field containing the row number onto the stream.
Specify the name of the field to contain row numbers.
Optionally specify a limit on the number of rows to read.
11.6.69.4. Fields
Option | Description
Name | Name of the field.
Column | The name of the column being read from.
Type | Type of the field; can be either String, Date or Number.
Format | See Number Formats for a complete description of format symbols.
Length | For Number: total number of significant figures in a number; for String: total length of string; for Date: length of printed output of the string (e.g. 4 only gives back the year).
Precision | For Number: number of floating point digits; for String, Date, Boolean: unused.
Currency | Used to interpret numbers like $10,000.00 or €5.000,00.
Decimal | A decimal point can be a "." (10,000.00) or "," (5.000,00).
Group | A grouping can be a comma "," (10,000.00) or a dot "." (5.000,00).
Trim type | The type of trim to apply to this field (left, right, both) before processing.
Repeat | Y/N: If the corresponding value in this row is empty, repeat the one from the last time it was not empty.
Icon
Closure Generator
improve performance, please refer to the Mondrian documentation found here. Technically, this step reads all input rows into memory and calculates all possible parent-child relationships. It attaches the distance (in levels) from parent to child.
11.6.70.2. Options
Option: Description
Step name: Name of the step. Note: This name has to be unique in a single transformation.
Parent ID field: The field name that contains the parent ID of the parent-child relationship.
Child ID field: The field name that contains the child ID of the parent-child relationship.
Distance field name: The name of the distance field that will be added to the output.
Root is zero (Integer)?: Check this box if the root of the parent-child tree is not empty (null) but zero (0).
11.6.70.3. Example
The example data shown below was taken from the Mondrian help pages on the subject of closure tables found here.
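The Python sketch below shows the kind of output the step produces: one row per ancestor/descendant pair, with the distance in levels. It assumes, as Mondrian closure tables usually do, that each node is also paired with itself at distance 0. The function names `closure` and `is_root` and the tuple layout are hypothetical.

```python
from typing import Hashable, Iterable, List, Optional, Tuple

Pair = Tuple[Optional[Hashable], Hashable]  # (parent_id, child_id) as read from a row


def is_root(parent: Optional[Hashable], root_is_zero: bool) -> bool:
    # A root row has an empty (None) parent, or 0 when "Root is zero" is checked.
    return parent is None or (root_is_zero and parent == 0)


def closure(pairs: Iterable[Pair], root_is_zero: bool = False) -> List[Tuple[Hashable, Hashable, int]]:
    """Return (parent_id, child_id, distance) rows for every ancestor/descendant pair."""
    parent_of = {child: parent for parent, child in pairs}  # all rows held in memory
    rows = []
    for child in parent_of:
        node, distance, seen = child, 0, set()
        while True:
            if node in seen:
                raise ValueError(f"cycle detected at {node!r}")
            seen.add(node)
            rows.append((node, child, distance))
            parent = parent_of.get(node)  # None if node never appears as a child
            if is_root(parent, root_is_zero):
                break
            node, distance = parent, distance + 1
    return rows


for row in closure([(None, "CEO"), ("CEO", "VP"), ("VP", "Engineer")]):
    print(row)
```

Walking upward from every child keeps the sketch simple; it also explains why the step needs all input rows in memory before it can emit anything.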
11.6.71. Mondrian Input
Icon
Mondrian Input
11.6.71.2. Options
Option: Description
Step name: Name of the step. Note: This name has to be unique in a single transformation.
Connection: The database connection associated with the Mondrian cube you want to query.
MDX Query: Specify the MDX query you want to execute.
Catalog location: Specify the location of the Mondrian schema file.
Preview button: Click to preview the data based on the current step settings.
Icon
Get Files Rows Count
11.6.72.3. Content
Option: Description
Rows Count fieldname: Name of the field that will contain the file(s) row count(s).
Rows Separator type: Specify the row separator type for generating the row count.
Row separator: When the Separator type is set to custom, this setting is used to specify a custom row separator.
Include files count in ...: Optionally allows you to insert a field containing the file(s) count into the stream. Specify the name of the field that will contain the file count.
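As a rough illustration of what the row count means, the Python sketch below counts rows by counting occurrences of the chosen separator in each file and also returns the number of files read. The function name `count_rows`, the UTF-8 encoding, and counting a final row that lacks a trailing separator are assumptions, not documented Kettle behaviour.

```python
from typing import Iterable, Tuple


def count_rows(paths: Iterable[str], separator: str = "\n") -> Tuple[int, int]:
    """Return (total_row_count, files_count) for the given files."""
    total_rows = 0
    files_count = 0
    for path in paths:
        # newline="" disables newline translation so "\r\n" or "\r" separators stay intact.
        with open(path, encoding="utf-8", newline="") as f:
            data = f.read()
        files_count += 1
        if data:
            total_rows += data.count(separator)
            if not data.endswith(separator):
                total_rows += 1  # last row without a trailing separator
    return total_rows, files_count
```

A custom separator such as ";" is passed in the same way as a line feed.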
Icon
Dummy Plugin
Option: Description
Log table: Specifies the name of the log table (for example L_ETL).
Use batch-ID: Use a batch ID in the logging table.
Pass the batch-ID to job entries?: Check this if you want to pass the generated unique batch ID to (transformation) job entries in this job.
Use logfield to store logging in?: Check this if you want to store the logging of this job in the logging table in a long text field (CLOB).
SQL button: Generates the SQL needed to create the logging table and allows you to execute this SQL statement.
Icon
Start
Icon
13.2.3. Transformation
Icon
Start
13.2.3.2. Options
Option: Description
Name of the job entry: The name of the job entry. This name has to be unique in a single job. A job entry can be placed several times on the canvas; however, it will be the same job entry.
Name of transformation: The name of the transformation to start.
Repository directory: The directory in the repository where the transformation is located.
Filename: If you're not working with a repository, specify the XML filename of the transformation to start.
Specify log file: Check this if you want to specify a separate logging file for the execution of this transformation.
Name of log file: The directory and base name of the log file (for example C:\logs).
Extension of the log file: The filename extension (for example: log or txt).
Include date in filename: Adds the system date to the filename (_20051231).
Include time in filename: Adds the system time to the filename (_235959).
Logging level: Specifies the logging level for the execution of the transformation. See also the logging window described in the Logging chapter.
Copy previous results to arguments: The results from a previous transformation can be sent to this one using the "Copy rows to result" step.
Arguments: Specify the strings to use as arguments for the transformation.
Execute once for every input row: Support for "looping" has been added by allowing a transformation to be executed once for every input row.
Clear the list of result rows before execution: Checking this makes sure that the list of result rows is cleared before the transformation is started.
Clear the list of result files before execution: Checking this makes sure that the list of result files is cleared before the transformation is started.
NOTE: You can use variables (${path}) in the filename and transformation name fields to specify the transformation to be executed.
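The log file options combine into a single file name. The Python sketch below shows one plausible composition, assuming the date suffix comes before the time suffix and the extension is appended last; the function name `log_file_name` is hypothetical.

```python
from datetime import datetime
from typing import Optional


def log_file_name(base: str, extension: str = "", include_date: bool = False,
                  include_time: bool = False, now: Optional[datetime] = None) -> str:
    """Build a log file name from a base name, optional date/time suffixes and an extension."""
    now = now or datetime.now()
    name = base
    if include_date:
        name += now.strftime("_%Y%m%d")  # e.g. _20051231
    if include_time:
        name += now.strftime("_%H%M%S")  # e.g. _235959
    if extension:
        name += "." + extension
    return name


print(log_file_name("C:\\logs\\trans", "log", True, True, datetime(2005, 12, 31, 23, 59, 59)))
```

Passing `now` explicitly keeps the result reproducible; by default the current system time is used, as the options describe.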
13.2.4. Job
Icon
Job
13.2.4.2. Options
Option: Description
Name of the job entry: The name of the job entry. This name has to be unique in a single job. A job entry can be placed several times on the canvas; however, it will be the same job entry.
Name of job: The name of the job to start.
Repository directory: The directory in the repository where the job is located.
Filename: If you're not working with a repository, specify the XML filename of the job to start.
Specify log file: Check this if you want to specify a separate logging file for the execution of this job.
Name of log file: The directory and base name of the log file (for example C:\logs).
Extension of the log file: The filename extension (for example: log or txt).
Include date in filename: Adds the system date to the filename (_20051231).
Include time in filename: Adds the system time to the filename (_235959).
Logging level: Specifies the logging level for the execution of the job. See also the logging window described in the Logging chapter.
Copy previous results to arguments: The results from a previous transformation can be sent to this job using the "Copy rows to result" step in a transformation.
Arguments: Specify the strings to use as arguments for the job.
Execute once for every input row: This implements looping. If the previous job entry returns a set of result rows, you can have this job executed once for every row found. One row is passed to this job at every execution. For example, you can use this option to execute a job for each file found in a directory.
NOTE: You can use variables (${path}) in the filename and job name fields to specify the job to be executed.
13.2.5. Shell
Icon
Shell
Kettle logging system. Doing this no longer blocks the shell script.
NOTE: On Windows, scripts are preceded by "CMD.EXE /C" (NT/XP/2000) or "COMMAND.COM /C" (95/98).
13.2.5.2. Options
Option: Description
Job entry name: The name of the job entry. This name has to be unique in a single job. A job entry can be placed several times on the canvas; however, it will be the same job entry.
Script file name: The filename of the shell script to start. It should include the full path; otherwise ${user.dir} is used as the path.
Working directory: The directory that will be used as the working directory for the shell script. When the field is left empty or the working directory is invalid, ${user.dir} will be used as the working directory. The working directory only becomes active when the shell script starts, so "Filename" should still include the full path to the script.
Specify log file: Check this if you want to specify a separate logging file for the execution of this shell script.
Name of log file: The directory and base name of the log file (for example C:\logs).
Extension of the log file: The filename extension (for example: log or txt).
Include date in filename?: Adds the system date to the filename (_20051231).
Include time in filename?: Adds the system time to the filename (_235959).
Loglevel: Specifies the logging level for the execution of the shell. See also the logging window described in the Logging chapter.
Copy previous results to arguments?: The results from a previous transformation can be sent to the shell script (as arguments) using the "Copy rows to result" step.
Execute once for every input row: This implements looping. If the previous job entry returns a set of result rows, you can have this shell script executed once for every row found. One row is passed to this script at every execution, in combination with "Copy previous results to arguments". The values can then be found on the command line as arguments $1, $2, ... (%1, %2, %3, ... on Windows).
Arguments: Specify the strings to use as arguments for the shell script.
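To make the looping behaviour concrete, the Python sketch below builds, for each result row, the command line that would start the script with the row's values as positional arguments ($1, $2, ... or %1, %2, ... on Windows). It only builds argument lists and runs nothing; the function names are hypothetical, and the CMD.EXE/COMMAND.COM prefixes follow the Windows note for this entry.

```python
from typing import Iterable, List, Sequence


def build_command(script: str, args: Sequence[object], windows: bool = False,
                  windows_9x: bool = False) -> List[str]:
    """Return the argument list used to start a shell script."""
    values = [str(a) for a in args]
    if windows:
        # NT/XP/2000 use CMD.EXE /C; 95/98 use COMMAND.COM /C.
        prefix = ["COMMAND.COM", "/C"] if windows_9x else ["CMD.EXE", "/C"]
        return prefix + [script] + values
    return [script] + values


def commands_per_row(script: str, rows: Iterable[Sequence[object]],
                     windows: bool = False) -> List[List[str]]:
    """One command per result row: the row's values become $1, $2, ... (%1, %2, ...)."""
    return [build_command(script, row, windows) for row in rows]


for cmd in commands_per_row("/opt/scripts/load.sh", [("a.txt", 1), ("b.txt", 2)]):
    print(cmd)  # could be handed to subprocess.run(cmd) to actually execute it
```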
13.2.6. Mail
Icon
Job Mail
13.2.6.2. Options
Option: Description
Name of the job entry: The name of the job entry. This name has to be unique in a single job. A job entry can be placed several times on the canvas; however, it will be the same job entry.
Destination address: The destination for the e-mail.
Use authentication: Check this if your SMTP server requires you to authenticate yourself.
Authentication user: The user name to authenticate with.
Authentication password: The password to authenticate with.
SMTP server: The mail server to which the mail has to be sent.
Reply address: The reply address for this e-mail.
Subject: The subject of the e-mail.
Include date in message: Check this if you want to include the date in the e-mail.
Contact person: The name of the contact person to be placed in the e-mail.
Contact phone: The contact telephone number to be placed in the e-mail.
Comment: Additional comment to be placed in the e-mail.
Attach files to message: Check this if you want to attach files to this message.
Select the result files types to attach: When a transformation (or job) processes files (text, Excel, dbf, etc.), an entry is added to the list of files in the result of that transformation or job. Specify the types of result files you want to add.
Zip files into a single archive: Check this if you want to zip all selected files into a single archive (recommended!).
The zip filename: Specify the name of the zip file that will be placed into the e-mail.
NOTE: All text fields can be specified using (environment and Kettle) variables, possibly set in a previous transformation using the Set Variable step.
13.2.7. SQL
Icon
13.2.7.2. Options
Option: Description
Name of the job entry: The name of the job entry. This name has to be unique in a single job. A job entry can be placed several times on the canvas; however, it will be the same job entry.
Database Connection: The database connection to use.
Use variable substitution?: Enables Kettle variables to be used in the SQL script.
SQL script: The SQL script to execute.
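With variable substitution enabled, references such as ${VARIABLE} in the script are replaced by variable values before the SQL is executed. The Python sketch below shows the idea for the ${...} form only, using a plain dictionary in place of Kettle's variables; the function name `substitute` is hypothetical, and leaving unknown variables untouched is an assumption.

```python
import re
from typing import Mapping

_VARIABLE = re.compile(r"\$\{([^}]+)\}")


def substitute(sql: str, variables: Mapping[str, str]) -> str:
    """Replace ${NAME} with variables[NAME]; unknown names are left as-is."""
    return _VARIABLE.sub(lambda m: variables.get(m.group(1), m.group(0)), sql)


print(substitute("DELETE FROM ${TABLE} WHERE load_date < '${CUTOFF}'",
                 {"TABLE": "sales", "CUTOFF": "2007-01-01"}))
```

Because substitution is plain text replacement, values end up in the SQL verbatim; quoting them correctly remains the script author's responsibility.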
Icon
Get a file with FTP
13.2.8.2. Options
Option: Description
Job entry name: The name of the job entry. This name has to be unique in a single job. A job entry can be placed several times on the canvas; however, it will be the same job entry.
FTP server name: The name of the server or the IP address.
User name: The user name to log into the FTP server.
Password: The password to log into the FTP server.
Remote directory: The remote directory on the FTP server from which we get the files.
Target directory: The directory on the machine on which Kettle runs in which you want to place the transferred files.
Wildcard: Specify a regular expression here if you want to select multiple files. For example:
.*txt$ : get all text files
A.*[0-9]\.txt : files starting with A, ending with a number and .txt
Use binary mode?: Check this if the files need to be transferred in binary mode.
Timeout: The FTP server timeout in seconds.
Remove files after retrieval?: Remove the files on the FTP server, but only after all selected files have been successfully transferred.
Don't overwrite files: Skip a file when a file with an identical name already exists in the target directory.
Use active FTP connection: Check this to use active mode FTP instead of passive mode (the default).
Control Encoding: The encoding to use for the FTP control instructions. The encoding matters, for example, when transferring files whose names contain special characters. For Western Europe and the USA, "ISO8859-1" should suffice. You can enter any encoding that is valid on your server.
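The wildcard examples for this entry can be tried out with the Python sketch below. It assumes that a file is selected only when the whole filename matches the regular expression (hence `re.fullmatch`); the function name `select_files` and the sample filenames are hypothetical.

```python
import re
from typing import Iterable, List


def select_files(names: Iterable[str], wildcard: str) -> List[str]:
    """Keep the filenames that match the wildcard regular expression in full."""
    pattern = re.compile(wildcard)
    return [name for name in names if pattern.fullmatch(name)]


names = ["report.txt", "A2007.txt", "Abc.txt", "data.csv"]
print(select_files(names, r".*txt$"))         # ['report.txt', 'A2007.txt', 'Abc.txt']
print(select_files(names, r"A.*[0-9]\.txt"))  # ['A2007.txt']
```

Note that the wildcard is a regular expression, not a shell glob: "*.txt" on its own is not a valid pattern here.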
13.2.8.3. Notes
Some FTP servers do not allow files to be FTP'ed when their names contain certain characters (spaces, for example). Therefore, when choosing filenames for files to be FTP'ed, be sure to check up front whether your particular FTP server is able to process your kind of filenames.
Icon
Table Exists
13.2.9.2. Options
Option: Description
Name of the job entry: The name of the job entry. This name has to be unique in a single job. A job entry can be placed several times on the canvas; however, it will be the same job entry.
Database connection: The database connection to use.
Table name: The name of the database table to check.
Icon
File Exists
13.2.10.2. Options
Option: Description
Name of the job entry: The name of the job entry. This name has to be unique in a single job. A job entry can be placed several times on the canvas; however, it will be the same job entry.
Filename: The name and path of the file to check for.
Icon
Get files with SecureFTP
13.2.11.2. Options
Option: Description
Job entry name: The name of the job entry. This name has to be unique in a single job. A job entry can be placed several times on the canvas; however, it will be the same job entry.
SFTP-server name / IP: The name of the SFTP server or the IP address.
Port: The TCP port to use. This is usually 22.
User name: The user name to log into the SFTP server.
Password: The password to log into the SFTP server.
Remote directory: The remote directory on the SFTP server from which we get the files.
Target directory: The directory on the machine on which Kettle runs in which you want to place the transferred files.
Wildcard: Specify a regular expression here if you want to select multiple files. For example:
.*txt$ : get all text files
A.*[0-9]\.txt : files starting with A, ending with a number and .txt
Remove files after retrieval?: Remove the files after they have been successfully transferred.
13.2.12. HTTP
Icon
HTTP Transfer
13.2.12.2. Options
Option: Description
Name of the job entry: The name of the job entry. This name has to be unique in a single job. A job entry can be placed several times on the canvas; however, it will be the same job entry.
URL: The URL to use (for example: http://kettle.pentaho.org).
Run for every result row: Check this if you want to run this job entry for every row that was generated by a previous transformation. Use the "Copy rows to result" step.
Input field which contains URL: The fieldname in the result rows to get the URL from.
Target filename: The target filename.
Append to specified target file: Append to the target file if it already exists.
Add date and time to target filename: Check this if you want to add the date and time (yyyyMMdd_HHmmss) to the target filename.
Target filename extension: Specify the target filename extension in case you're adding a date and time to the filename.
Upload file: The name of the file to upload.
Username: The username to authenticate with. For Windows domains, put the domain in front of the user name like this: DOMAIN\Username.
Password: The password to authenticate with.
Proxy server for upload: The HTTP proxy server name or IP address.
Proxy port: The HTTP proxy port to use (usually 8080).
Ignore proxy for hosts: Specify a regular expression matching the hosts you want to ignore, separated by '|'.
Icon
Create file
13.2.13.2. Options
Option: Description
Job entry name: The name of the job entry. This name has to be unique in a single job.
File name: The name and path of the file to create (an empty file is created).
Fail if file exists: The job entry will follow the failure outgoing hop when the file to be created already exists (empty or not) and this option is switched on. The default is on.
Icon
Delete file
13.2.14.2. Options
Option: Description
Job entry name: The name of the job entry. This name has to be unique in a single job.
File name: The name and path of the file to delete.
Fail if file doesn't exist: The job entry will follow the failure outgoing hop when the file to be deleted does not exist anymore and this option is switched on. The default is off.
Icon
wait indefinitely for the file, or it can time out after a certain time.
13.2.15.2. Options
Option: Description
Job entry name: The name of the job entry. This name has to be unique in a single job.
File name: The name and path of the file to wait for.
Maximum timeout: The maximum timeout in number of seconds, or 0 to wait indefinitely. This is the number of seconds after which the flow will continue even if the file was not created. When the timeout is reached, the "Success on timeout" option determines whether the outgoing success or failure hop will be followed.
Check cycle time: The time in seconds between checks for the file. The file is checked for at the start of the execution and then every "check cycle time" seconds until the maximum timeout is reached. A job can only be stopped once every "check cycle time", as otherwise the job entry is sleeping. A check cycle time of 30 or 60 seconds seems to be a good trade-off between the time it takes to detect the file and the required CPU usage.
Success on timeout: This option determines what to do when the "Maximum timeout" has been reached. If enabled, the job entry will evaluate to success and the success outgoing hop will be followed.
File size check: When this is switched on, the job entry will, after detecting the specified file, only continue if the file size hasn't changed during the last check cycle. This is useful, for example, if a file is created in place (although it's recommended to generate a file elsewhere and then move it in place).
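The interplay between Maximum timeout, Check cycle time, Success on timeout and File size check can be sketched in Python as the polling loop below. It is an illustration under stated assumptions, not Kettle's code: the function name `wait_for_file` is hypothetical, True stands for the success hop and False for the failure hop, and `sleep` and `clock` are parameters so the loop can be tested without real waiting.

```python
import os
import time
from typing import Callable, Optional


def wait_for_file(path: str, max_timeout: float = 0, check_cycle: float = 60,
                  success_on_timeout: bool = False, size_check: bool = False,
                  sleep: Callable[[float], None] = time.sleep,
                  clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll for `path`; True means follow the success hop, False the failure hop."""
    start = clock()
    last_size: Optional[int] = None
    while True:
        # Checked at the start, then once every check cycle.
        if os.path.exists(path):
            size = os.path.getsize(path)
            if not size_check or size == last_size:
                return True
            last_size = size  # seen, but wait one more cycle to confirm it stopped growing
        # max_timeout == 0 means wait indefinitely.
        if max_timeout > 0 and clock() - start >= max_timeout:
            return success_on_timeout
        sleep(check_cycle)
```

Because the loop only wakes up once per cycle, a long check cycle delays both file detection and the moment a stop request can take effect, which is the trade-off the option description mentions.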
Icon
File Compare
13.2.16.2. Options
Option: Description
Job entry name: The name of the job entry. This name has to be unique in a single job.
File name 1: The name and path of the first file to compare.
File name 2: The name and path of the second file to compare.
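A byte-by-byte comparison like the Python sketch below is one way to decide whether two files are identical; it assumes the entry follows the success hop when the contents are equal. The function name `files_equal` is hypothetical; Python's standard `filecmp.cmp(f1, f2, shallow=False)` performs a comparable content check.

```python
def files_equal(path1: str, path2: str, chunk_size: int = 64 * 1024) -> bool:
    """Return True when both files contain exactly the same bytes."""
    with open(path1, "rb") as f1, open(path2, "rb") as f2:
        while True:
            b1 = f1.read(chunk_size)
            b2 = f2.read(chunk_size)
            if b1 != b2:
                return False  # contents (or lengths) differ
            if not b1:
                return True  # both files ended at the same point
```

Reading in fixed-size chunks keeps memory use constant even for very large files.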
Icon
File Compare
13.2.17.2. Options
Option: Description
Name of the job entry: The name of the job entry. This name has to be unique in a single job. A job entry can be placed several times on the canvas; however, it will be the same job entry.
SFTP-server name (IP): The name of the SFTP server or the IP address.
SFTP port: The TCP port to use. This is usually 22.
User name: The user name to log into the SFTP server.
Password: The password to log into the SFTP server.
Remote directory: The remote directory on the SFTP server to which we put the files.
Local directory: The directory on the machine on which Kettle runs from which you want to FTP the files.
Wildcard: Specify a regular expression here if you want to select multiple files. For example:
.*txt$ : get all text files
A.*[0-9]\.txt : files starting with A, ending with a number and .txt
Remove files after transferral?: Remove the files after they have been successfully transferred.
Icon
Ping a host
13.2.18.2. Options
Option: Description
Name of the job entry: The name of the job entry. This name has to be unique in a single job. A job entry can be placed several times on the canvas; however, it will be the same job entry.
Host name/IP: The name or the IP address of the host to ping.
Send ... packets: The number of packets to send (by default 2).
Icon
Wait for
13.2.19.2. Options
Option: Description
Name of the job entry: The name of the job entry. This name has to be unique in a single job. A job entry can be placed several times on the canvas; however, it will be the same job entry.
Wait for: The delay to wait.
Unit time: Specify the unit of time (second, minute or hour).
Icon
MsgBox Info
13.2.20.2. Options
Option: Description
Name of the job entry: The name of the job entry. This name has to be unique in a single job. A job entry can be placed several times on the canvas; however, it will be the same job entry.
Message title: The title of the message.
Message body: The message to display.
Icon
MsgBox Info
13.2.21.2. Options
Option: Description
Name of the job entry: The name of the job entry. This name has to be unique in a single job. A job entry can be placed several times on the canvas; however, it will be the same job entry.
Message: The message to add in the log when aborting.
Icon
XSL Transformation
13.2.22.2. Options
Option: Description
Name of the job entry: The name of the job entry. This name has to be unique in a single job. A job entry can be placed several times on the canvas; however, it will be the same job entry.
XML File name: The full name of the source XML file.
XSL File name: The full name of the XSL file.
Output File name: The full name of the created document (the result of the XSL transformation).
If file exists: Defines the behavior when an output file with the same name exists. Options:
Create new with unique name: a new output file will be created
Do nothing: nothing will be done
Fail: the job will fail
13.2.23. Zip files
Icon
13.2.23.2. Options
Option: Description
Name of the job entry: The name of the job entry. This name has to be unique in a single job. A job entry can be placed several times on the canvas; however, it will be the same job entry.
Source directory: The source directory of the files to be zipped.
Include wildcard: The wildcard (regular expression) of the files to include in the zip archive.
Exclude wildcard: The wildcard (regular expression) of the files to exclude from the zip archive.
Zip file name: The full name of the destination archive.
Compression: The compression level to be used (Default, Best Compression, Best speed).
If zip file exists: The action to take when there already is a file at the target destination.
After zipping: The action to take after zipping.
Move files to: The target directory to move the source files to after zipping.
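The include and exclude wildcards act as a filter on the files in the source directory. The Python sketch below applies both expressions to each filename (assuming a full-name match) and writes the surviving files to a new archive, with an optional compression level standing in for Default, Best compression and Best speed. The function name `zip_directory` is hypothetical, and the sketch neither recurses into subdirectories nor implements the "If zip file exists" and "After zipping" actions.

```python
import os
import re
import zipfile
from typing import List, Optional


def zip_directory(source_dir: str, zip_path: str, include: str = ".*",
                  exclude: Optional[str] = None, level: Optional[int] = None) -> List[str]:
    """Zip files in source_dir whose names match `include` but not `exclude`."""
    inc = re.compile(include)
    exc = re.compile(exclude) if exclude else None
    target = os.path.abspath(zip_path)
    added = []
    # level: None = library default, 9 ~ best compression, 1 ~ best speed.
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED,
                         compresslevel=level) as zf:
        for name in sorted(os.listdir(source_dir)):
            full = os.path.join(source_dir, name)
            if not os.path.isfile(full) or os.path.abspath(full) == target:
                continue  # only plain files, and never the archive itself
            if inc.fullmatch(name) and not (exc and exc.fullmatch(name)):
                zf.write(full, arcname=name)
                added.append(name)
    return added
```

Evaluating the exclude expression after the include expression means a file matching both is left out, which is the usual reading of an include/exclude pair.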
Icon
13.2.24.2. Options
Option: Description
Job entry name: The name of the job entry. This name has to be unique in a single job. A job entry can be placed several times on the canvas; however, it will be the same job entry.
Database connection: The database connection used to write data to.
Target schema: The name of the schema for the table to write data to. This is important for data sources that allow table names with dots ('.') in them.
Target table name: The name of the table to write data to.
Source file name: Name of the text file to load data from.
Local: Enabled: the file is read by the client program on the client host and sent to the server. Disabled: the file must be located on the server host and is read directly by the server.
Priority: Specify the priority in MySQL for the bulk load.
Fields terminated by: Specify the field delimiter in the source text file.
Fields enclosed by: Specify the enclosure character for fields in the source text file.
Fields escaped by: Specify the escape character for fields in the source text file.
Lines started by: Specify the character(s) used to indicate the start of a row in the source text file.
Lines terminated by: Specify the character(s) used to indicate the end of a row in the source text file.
Fields: Specify the names of the attributes of <tableName> that are set by your data file (separated by commas). Any attributes not specified in the list will be set to NULL.
Replace data: Enable this option to overwrite existing data in the target table.
Ignore the first ... lines: Optionally specify a number of lines to ignore.
Add files to result: Enable this to add the destination files to the result file names. This is useful if you want to attach these files to an e-mail using the Mail job entry.
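These options correspond closely to clauses of MySQL's LOAD DATA INFILE statement. The Python sketch below assembles such a statement to show that mapping; it assumes Priority takes MySQL's LOW_PRIORITY or CONCURRENT keywords, the function name `load_data_sql` is hypothetical, and the statement Kettle actually generates may differ in detail.

```python
from typing import Optional, Sequence


def _quote(value: str) -> str:
    """Quote a MySQL string literal, escaping backslashes, quotes and control characters."""
    escaped = (value.replace("\\", "\\\\").replace("'", "\\'")
               .replace("\n", "\\n").replace("\r", "\\r").replace("\t", "\\t"))
    return "'" + escaped + "'"


def load_data_sql(table: str, file_name: str, schema: Optional[str] = None,
                  local: bool = True, priority: Optional[str] = None,
                  fields_terminated: Optional[str] = None,
                  fields_enclosed: Optional[str] = None,
                  fields_escaped: Optional[str] = None,
                  lines_starting: Optional[str] = None,
                  lines_terminated: Optional[str] = None,
                  ignore_lines: int = 0, replace: bool = False,
                  columns: Sequence[str] = ()) -> str:
    parts = ["LOAD DATA"]
    if priority:  # e.g. "LOW_PRIORITY" or "CONCURRENT"
        parts.append(priority)
    if local:  # the client reads the file and sends it to the server
        parts.append("LOCAL")
    parts.append("INFILE " + _quote(file_name))
    if replace:  # Replace data
        parts.append("REPLACE")
    target = f"`{schema}`.`{table}`" if schema else f"`{table}`"
    parts.append("INTO TABLE " + target)
    fields = []
    if fields_terminated is not None:
        fields.append("TERMINATED BY " + _quote(fields_terminated))
    if fields_enclosed is not None:
        fields.append("ENCLOSED BY " + _quote(fields_enclosed))
    if fields_escaped is not None:
        fields.append("ESCAPED BY " + _quote(fields_escaped))
    if fields:
        parts.append("FIELDS " + " ".join(fields))
    lines = []
    if lines_starting is not None:
        lines.append("STARTING BY " + _quote(lines_starting))
    if lines_terminated is not None:
        lines.append("TERMINATED BY " + _quote(lines_terminated))
    if lines:
        parts.append("LINES " + " ".join(lines))
    if ignore_lines:
        parts.append(f"IGNORE {ignore_lines} LINES")
    if columns:  # the Fields option: columns set by the data file
        parts.append("(" + ", ".join(f"`{c}`" for c in columns) + ")")
    return " ".join(parts)


print(load_data_sql("sales", "/tmp/sales.csv", fields_terminated=";", ignore_lines=1))
```

The clause order follows MySQL's grammar (priority, LOCAL, INFILE, REPLACE, INTO TABLE, FIELDS, LINES, IGNORE, column list), which is why the options are emitted in that sequence rather than in dialog order.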
Icon
13.2.25.2. Options
Option: Description
Job entry name: The name of the job entry. This name has to be unique in a single job. A job entry can be placed several times on the canvas; however, it will be the same job entry.
Source Host: The host name or IP address of the POP mail server.
Username: The username for authenticating to the POP server.
Password: The password for authenticating to the POP server.
Use POP with SSL: Enable this option to connect using a Secure Socket Layer (SSL) connection.
Port: When the SSL option is enabled, use this property to set the IP port for SSL communication with the POP server.
Target directory: Specify the target directory where the retrieved e-mails are stored.
Target filename pattern: Specify the regular expression wildcard used to identify the target filenames.
Retrieve: Use this to specify whether to retrieve all e-mails, unread e-mails, or a specific number of e-mails.
Retrieve the ... first emails: If the Retrieve property is set to 'First...emails', this property is used to specify the number of e-mails to retrieve.
Delete emails after retrieval: Enable this option to delete all retrieved e-mails from the POP server.
Icon
Delete Files
13.2.26.2. Options
Option: Description
Job entry name: The name of the job entry. This name has to be unique in a single job. A job entry can be placed several times on the canvas; however, it will be the same job entry.
Include Subfolders: Enable this option to also delete matched files from subfolders of the target directory.
Copy previous results to args?: Enable this to pass the results of the previous entry to the arguments of this entry.
File/Folder: The target file or folder to delete files from.
Wildcard: The regular expression used to define the file name pattern for the files to delete.
Files/Folders Table: This table displays the list of currently defined files and folders to delete.
13.2.27. Success
Icon
Success
13.2.27.2. Options
Option: Description
Success: The name of the job entry. This name has to be unique in a single job. A job entry can be placed several times on the canvas; however, it will be the same job entry.
Icon
XSD Validator
13.2.28.2. Options
Option: Description
Job entry name: The name of the job entry. This name has to be unique in a single job. A job entry can be placed several times on the canvas; however, it will be the same job entry.
XML File name: Specify the name of the XML document to validate.
XSD File name: Specify the name of the XSD file used for validation of the XML document.
Icon
Write to log
13.2.29.2. Options
Option: Description
Write to log: The name of the job entry. This name has to be unique in a single job. A job entry can be placed several times on the canvas; however, it will be the same job entry.
Log level: Specify the log level condition for when the specified log message should be written to the log.
Log subject: Specify a short subject for the log message.
Log message: Specify the detailed message to be written to the log file.
13.2.30. Copy Files
Icon
Copy Files
Icon
DTD Validator
13.2.31.2. Options
Option: Description
Job entry name: The name of the job entry. This name has to be unique in a single job. A job entry can be placed several times on the canvas; however, it will be the same job entry.
XML File name: Specify the XML document to validate.
DTD Intern: Enable this option if the DTD is defined within the XML document being validated.
DTD File name: Specify the file name containing the DTD used for validation.
Icon
Put files with FTP
13.2.32.2. Options
Option: Description
Name of this job entry: The name of the job entry. This name has to be unique in a single job. A job entry can be placed several times on the canvas; however, it will be the same job entry.
FTP server name/IP address: The name of the FTP server or the IP address.
Port: The TCP port to use. This is usually 21.
Username: The user name to log into the FTP server.
Password: The password to log into the FTP server.
Local directory: The directory on the machine on which Kettle runs from which you want to FTP the files.
Remote directory: The remote directory on the FTP server to which we put the files.
Wildcard (regular expression): Specify a regular expression here if you want to select multiple files. For example:
.*txt$ : get all text files
A.*[0-9]\.txt : files starting with A, ending with a number and .txt
Binary mode?: Enable this option to perform the transfer in binary mode.
Timeout: Specify the timeout period before ending in error.
Remove files after transferal?: Remove the files after they have been successfully transferred.
Don't overwrite files: Enable this option to prevent overwriting any existing files on the target FTP server.
Use active FTP connection: Enable this option to use an active FTP connection.
Control Encoding: Specify the character set to use for filenames and directories.
13.2.33. Unzip
Icon
Unzip
13.2.33.2. Options
Option: Description
Job entry name: The name of the job entry. This name has to be unique in a single job. A job entry can be placed several times on the canvas; however, it will be the same job entry.
Zip File name: Specify the name of the file to unzip.
Target Directory: Specify the target directory to place the unzipped contents in.
Include Wildcard: Specify a regular expression wildcard for the specific files you want to unzip.
Exclude Wildcard: Specify a regular expression wildcard for any files you want to exclude from the unzip process.
After extraction: Specify the action to take after unzipping the file. Options include do nothing, delete files, or move files to a specified location.
Move Files To: If the move files action is specified in the 'After extraction' property, this field is used to identify the target location to move the files to.
Add extracted file to result: Enable this to add the destination files to the result file names. This is useful if you want to attach these files to an e-mail using the Mail job entry.
Icon
Transformation Icon
Job Icon
of the data.
of the button. When you let go of the mouse button, the selected step (Text file input)
*tep...W*tep
type.
14.6. Adding hops
In the graphical view, the quickest way to create a new hop is by dragging with the mouse from one step to another using the middle button. You can also drag with the left button while holding down the SHIFT key. For a more complete explanation of hops, please refer to the chapter on Hops.
Execute a transformation
place. This feature requires that you have Pentaho Data Integration installed on a remote machine and running the Carte service. See 14.4.3 Configuring a remote or slave server for more details on setting up remote and slave servers. Execute clustered: Allows you to execute the job or transformation in a clustered environment. See the section on Clustering for more details on how to execute a job or transformation in a clustered environment.
detailed descriptions of the log level types, see Logging. Replay date: This sets the replay date for when you want to replay the transformation. It will pick up information in the Text file input or Excel input steps to skip rows already processed on the replay date.
Arguments: This grid allows you to set the value of arguments to be used when running the transformation.
Variables: This grid allows you to set the value of variables to be used when running the transformation.
Spoon (remote and clustered execution) or from the Transformation job entry (clustered execution).
Password: Enter the password credential for accessing the remote server.
Is the master: This setting tells Pentaho Data Integration that this server will act as the master server in any clustered executions of the transformation.
Note:
set up as the master and all remaining servers in the cluster as slaves.
option supports specifying multiple servers using regular expressions. You can also add multiple servers and expressions separated by the '|' character.
15.4. Clustering
15.4.1. Overview
Clustering allows transformations and transformation steps to be executed in parallel on more than one server. The clustering schema defines which slave servers you want to assign to the cluster, as well as a variety of clustered execution options.
15.4.3. Options
Option: Description
Schema name: The name of the clustering schema.
Port: Here you can specify the port from which to start numbering ports for the slave servers. Each additional clustered step executing on a slave server will consume an additional port.
Note: Make sure no other networking protocols are in the same range.
Then you create the transformation as usual, connecting 2 steps.
execute is running on a cluster:
The transformation
When
a non-clustered execution. That means that you can use the normal local execution to test the transformation. However, we can execute the transformation in a clustered fashion like this:
will be sent over the TCP/IP sockets using the Socket Writer and Socket Reader steps.
16. Logging
16.1. Logging Description
A log view tab will open automatically each time you run a transformation or job. The log grid displays a list of transformation steps or job entries for the current execution. The log text shows log information based on the current logging level.
%escription Priority of the step B#0Khighest" #Klo estC" nr of ro s in the input2streamBsC" nr of ro s in the output2streamBsC. *leep time Bget<putC is the time that a step had to go to sleep Bin nano secondsC because the input buffer as empty BgetC or the output buffer as full BputC.
priority.
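The get/put sleep times are easiest to understand with a bounded buffer between two steps. In this minimal sketch, a fast producer step sleeps when the buffer is full (put) and a slow consumer step sleeps when the buffer is empty (get). Both times are accumulated in nanoseconds with `time.perf_counter_ns()`. The names, buffer size and delays are hypothetical, and this is not how PDI measures its metrics.

```python
import queue
import threading
import time
from dataclasses import dataclass


@dataclass
class StepStats:
    rows_in: int = 0
    rows_out: int = 0
    sleep_get_ns: int = 0  # time spent waiting because the input buffer was empty
    sleep_put_ns: int = 0  # time spent waiting because the output buffer was full


def put_row(buf: queue.Queue, row, stats: StepStats) -> None:
    try:
        buf.put_nowait(row)
    except queue.Full:  # buffer full: block and count the time as put sleep
        start = time.perf_counter_ns()
        buf.put(row)
        stats.sleep_put_ns += time.perf_counter_ns() - start
    stats.rows_out += 1


def get_row(buf: queue.Queue, stats: StepStats):
    try:
        return buf.get_nowait()
    except queue.Empty:  # buffer empty: block and count the time as get sleep
        start = time.perf_counter_ns()
        row = buf.get()
        stats.sleep_get_ns += time.perf_counter_ns() - start
        return row


def run(n_rows: int = 100, buffer_size: int = 10, delay_s: float = 0.0005):
    """A fast producer step feeding a deliberately slow consumer step."""
    buf: queue.Queue = queue.Queue(maxsize=buffer_size)
    producer, consumer = StepStats(), StepStats()

    def produce() -> None:
        for i in range(1, n_rows + 1):
            put_row(buf, {"id": i}, producer)
        buf.put(None)  # end-of-stream marker, not counted as a row

    def consume() -> None:
        while get_row(buf, consumer) is not None:
            consumer.rows_in += 1
            time.sleep(delay_s)  # simulate an expensive step

    threads = [threading.Thread(target=produce), threading.Thread(target=consume)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return producer, consumer


if __name__ == "__main__":
    p, c = run()
    print("producer", p)
    print("consumer", c)
```

A high put sleep time on a step suggests that the step after it is the bottleneck. A high get sleep time suggests that the step is waiting for slower steps before it.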
16.2. Buttons
16.2.1. Transformation Buttons
16.2.1.1. Start
This button starts the transformation. Please note that Spoon tries to launch this from the XML file or repository. It is therefore necessary that the transformation is saved. The output of the execution is displayed in the Log Text part of the Log View.
This button launches the Transformation Debug dialog, allowing you to specify the number of rows to preview and define conditional breakpoints for the preview execution. After configuring the debug information for the currently selected step, click the "Quick Launch" button to begin the preview execution. The output of the execution is displayed in the Log Text part of the Log View.
16.3.1.2.1. Debug Options
The following table provides a detailed description of the debug options:
Step List: The step list on the left displays a list of available steps from the current transformation. Select a step to begin configuring related options like number of rows and break-points.
Number of rows to retrieve: Enter the rows per step you want to preview for the selected step. After the requested rows are obtained from the different steps, the transformation is ended and the result is shown. Note: This option [...] option is checked.
Retrieve first rows (preview): Enable this to restrict the preview size to the number of rows specified above.
Pause transformation on condition: Enable this option to cause the transformation to pause if one of the conditional break-points evaluates to true during execution.
Break-point/pause condition: Enter conditions based on comparing one field to another field or value.
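The "Number of rows to retrieve" behaviour ends the run once enough rows have been obtained. That behaviour can be sketched with a generator and `itertools.islice`. `generate_rows` and `preview_first_rows` are hypothetical stand-ins for a Generate Rows step and the preview. They are not PDI code.

```python
from itertools import islice
from typing import Dict, Generator, List

Row = Dict[str, int]


def generate_rows(limit: int) -> Generator[Row, None, None]:
    """Stand-in for a Generate Rows step that adds an id from 1 to `limit`."""
    for i in range(1, limit + 1):
        yield {"id": i}


def preview_first_rows(rows: Generator[Row, None, None], n_rows: int) -> List[Row]:
    """Collect the first `n_rows` rows, then end the run."""
    preview = list(islice(rows, n_rows))  # islice stops without reading extra rows
    rows.close()  # the "transformation is ended": no further rows are produced
    return preview


if __name__ == "__main__":
    print(preview_first_rows(generate_rows(1000), 5))
```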
16.3.1.2.2. Debug Example
Starting with the simple transformation shown here:
The Generate Rows step generates empty rows and adds an id from 1 to 1000. Now we want to pause the transformation and see the content of the row with a particular id. To do this, simply click on the debug icon in the main toolbar:
[...] here:
[...] which the transformation is paused. You can also specify to keep the last N rows in memory before the condition was met.
Pressing the "Quick Launch" button will begin the preview execution and display the following dialog upon meeting the condition:
For convenience, the order of the rows is reversed in the preview window so that the row that met the condition is always at the top of the results. After closing the preview window, you will note that the transformation is paused (see log tab) and you can then resume execution by clicking the resume button. If a condition is met again, the transformation will be paused again and another preview dialog will display.
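The debug example above can be modelled in a few lines. Rows flow through a buffer that keeps the last N rows. Whenever the break-point condition is true, the run pauses and hands back a snapshot with the newest row first. This minimal sketch uses a Python generator to represent "pause" and "resume". The names are hypothetical, and the sketch assumes that a snapshot holds the N rows before the matching row plus the matching row itself.

```python
from collections import deque
from typing import Callable, Deque, Dict, Iterable, Iterator, List

Row = Dict[str, int]


def generate_rows(limit: int) -> Iterator[Row]:
    """Stand-in for the Generate Rows step: empty rows with an id from 1 to limit."""
    for i in range(1, limit + 1):
        yield {"id": i}


def debug_run(rows: Iterable[Row], condition: Callable[[Row], bool],
              keep_last: int) -> Iterator[List[Row]]:
    """Yield a snapshot each time `condition` is true.

    Each snapshot holds the matching row plus up to `keep_last` rows before
    it, newest first, so the matching row is always at the top. Execution is
    "paused" between snapshots and resumes when the caller asks for the next one.
    """
    history: Deque[Row] = deque(maxlen=keep_last + 1)
    for row in rows:
        history.append(row)
        if condition(row):
            yield list(reversed(history))


if __name__ == "__main__":
    pauses = debug_run(generate_rows(1000), lambda r: r["id"] % 400 == 0, keep_last=2)
    for snapshot in pauses:  # each iteration = close the preview and resume
        print([r["id"] for r in snapshot])
```

The `condition` callable plays the role of the break-point condition, which compares a field to a value. A `deque` with `maxlen` discards the oldest row automatically, so memory use stays bounded no matter how many rows pass through.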
If you put text in the filter field, only the lines that contain this text [...] window.
The "Log level" setting allows you to select the logging level. You can choose one of these:
Nothing: Don't show any output.
Error: Only show errors.
Minimal: Only use minimal logging.
Basic: This is the default basic logging level.
Detailed: Give detailed logging output.
Debug: For debugging purposes, very detailed output.
Row level: Logging at a row level. This can generate a lot of data.
[...] will be preceded by the time of day.
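The log level and filter behaviour described above can be sketched as follows. The sketch assumes that the levels are ordered from Nothing to Row level and that a message is shown when its level is at or below the selected one. The `LogLevel` enum, the `visible_lines` helper and the `HH:MM:SS` time format are hypothetical illustrations, not PDI's logging API.

```python
from datetime import datetime
from enum import IntEnum
from typing import List, Optional, Tuple


class LogLevel(IntEnum):
    """Levels in the order the guide lists them (assumed to be cumulative)."""
    NOTHING = 0
    ERROR = 1
    MINIMAL = 2
    BASIC = 3  # the default
    DETAILED = 4
    DEBUG = 5
    ROWLEVEL = 6


def visible_lines(messages: List[Tuple[LogLevel, str]], selected: LogLevel,
                  text_filter: str = "", show_time: bool = False,
                  now: Optional[datetime] = None) -> List[str]:
    """Return the log lines that would be displayed for the chosen settings."""
    lines = []
    for level, text in messages:
        if level > selected:  # too detailed for the selected level
            continue
        if text_filter and text_filter not in text:  # the filter field
            continue
        if show_time:  # precede the line with the time of day
            text = f"{(now or datetime.now()):%H:%M:%S} - {text}"
        lines.append(text)
    return lines


if __name__ == "__main__":
    msgs = [(LogLevel.ERROR, "Step A failed"), (LogLevel.BASIC, "Step A finished")]
    print(visible_lines(msgs, LogLevel.BASIC, show_time=True))
```

Messages are expected to carry a level from Error upward. Selecting Nothing therefore hides every line.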
If you put text in the filter field, only the lines that contain this text [...] window.
The "Log level" setting allows you to select the logging level. You can choose one of these:
Nothing: Don't show any output.
Error: Only show errors.
Minimal: Only use minimal logging.
Basic: This is the default basic logging level.
Detailed: Give detailed logging output.
Debug: For debugging purposes, very detailed output.
Row level: Logging at a row level. This can generate a lot of data.
[...] will be preceded by the time of day.
16.2.2.6. Auto-refresh
Use this option to stop the logging in this window from updating all the time. You might want to do this when you're using a remote desktop (VNC, X11) over a slow network connection.
17. Grids
17.1. Description
Grids (tables) are used throughout the Spoon interface to enter or display information. This section describes common functions available when working with a grid.
17.2. Usage
Click on a cell to begin editing that field. After pressing Enter, you can navigate the grid by using the cursor keys. Pressing Enter again allows you to edit the newly selected field. The following table describes the functions available when you right-click on a cell in the grid:
Insert before this row: Inserts an empty row before the row you clicked on.
Insert after this row: Inserts an empty row after the row you clicked on.
Move the row up: Moves the row you clicked on up. The keyboard shortcut for this is CTRL-UP.
Move the row down: Moves the row you clicked on down. The keyboard shortcut for this is CTRL-DOWN.
Optimal column size including header: Resizes all columns so that they display all values completely, including the header. The keyboard shortcut for this function is function key F3.
Optimal column size excluding header: Resizes all columns so that they display all values completely. The keyboard shortcut for this function is function key F4.
Clear all: Clears all information in the grid. You will be asked to confirm this operation.
Select all rows: Selects all rows in the grid. The keyboard shortcut for this function is CTRL-A.
Clear selections: Clears the selection of rows in the grid. The keyboard shortcut for this function is ESC.
Copy selected lines to clipboard: Copies the selected lines to the clipboard in a textual representation. These lines can then be exchanged with other programs such as spreadsheets or even other Spoon / Kettle dialogs. The keyboard shortcut for this function is CTRL-C.
Paste clipboard to table: Inserts the lines that are on the clipboard into the grid, right after the line on which you clicked. The keyboard shortcut for this function is CTRL-V.
Cut selected lines: Copies the selected lines to the clipboard in a textual representation. After that, the lines are deleted from the grid. The keyboard shortcut for this function is CTRL-X.
Delete selected lines: Deletes all selected lines from the grid. The keyboard shortcut for this function is DEL.
Keep only selected lines: If there are more lines to delete than there are to keep, simply select the lines you want to keep and use this function. The keyboard shortcut for this function is CTRL-K.
Copy field values to all rows: If all rows in the grid need to have the same value for a certain column, you can use this function to do this.
Undo: Undoes the previous grid operation. The keyboard shortcut for this function is CTRL-Z.
Redo: Redoes the next grid operation. The keyboard shortcut for this function is CTRL-Y.
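The row operations in the table above can be sketched on a plain list of rows. This is a minimal illustration, not Spoon's implementation. Every function name is hypothetical. The tab-separated clipboard text is also an assumption, since the guide only says the lines are copied "in a textual representation".

```python
from typing import Dict, List, Set

Row = Dict[str, str]  # one grid row: column name -> cell text


def insert_before(rows: List[Row], index: int) -> None:
    """Insert an empty row before the clicked row."""
    rows.insert(index, {})


def insert_after(rows: List[Row], index: int) -> None:
    """Insert an empty row after the clicked row."""
    rows.insert(index + 1, {})


def move_up(rows: List[Row], index: int) -> int:
    """Swap the clicked row with the one above it (CTRL-UP); return its new index."""
    if index > 0:
        rows[index - 1], rows[index] = rows[index], rows[index - 1]
        return index - 1
    return index


def move_down(rows: List[Row], index: int) -> int:
    """Swap the clicked row with the one below it (CTRL-DOWN); return its new index."""
    if index < len(rows) - 1:
        rows[index + 1], rows[index] = rows[index], rows[index + 1]
        return index + 1
    return index


def keep_only_selected(rows: List[Row], selected: Set[int]) -> List[Row]:
    """Drop every row whose index is not in the selection (CTRL-K)."""
    return [row for i, row in enumerate(rows) if i in selected]


def copy_field_to_all(rows: List[Row], index: int, column: str) -> None:
    """Give every row the value that the clicked row has in `column`."""
    value = rows[index].get(column, "")
    for row in rows:
        row[column] = value


def to_clipboard_text(rows: List[Row], columns: List[str]) -> str:
    """Tab-separated text, one line per row (assumed clipboard format)."""
    return "\n".join("\t".join(row.get(c, "") for c in columns) for row in rows)


def paste_after(rows: List[Row], index: int, text: str, columns: List[str]) -> None:
    """Insert tab-separated lines right after the clicked row (CTRL-V)."""
    new_rows = [dict(zip(columns, line.split("\t"))) for line in text.splitlines()]
    rows[index + 1:index + 1] = new_rows


if __name__ == "__main__":
    cols = ["name", "type"]
    grid = [{"name": "id", "type": "Integer"}, {"name": "label", "type": "String"}]
    move_down(grid, 0)
    print(to_clipboard_text(grid, cols))
```

Tab-separated text is a common choice because spreadsheets split pasted lines on tabs into separate cells. That matches the guide's remark that copied lines can be exchanged with spreadsheets.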
You can also click on the column header to sort content by name, object type, user or changed date.
NOTE: You can restore the objects from a backed-up repository anywhere in the target repository directory tree.
To share one of these objects, simply right-click on the object in the tree control on the left and choose
Note: The location of the shared objects file is configurable on the "Miscellaneous" tab of the Transformation|Settings dialog.
Copyright (C) 1991, 1999 Free Software Foundation, Inc. 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed.

[This is the first released version of the Lesser GPL. It also counts as the successor of the GNU Library Public License, version 2, hence the version number 2.1.]

Preamble

The licenses for most software are designed to take away your freedom to share and change it. By contrast, the GNU General Public Licenses are intended to guarantee your freedom to share and change free software--to make sure the software is free for all its users.

This license, the Lesser General Public License, applies to some specially designated software packages--typically libraries--of the Free Software Foundation and other authors who decide to use it. You can use it too, but we suggest you first think carefully about whether this license or the ordinary General Public License is the better strategy to use in any particular case, based on the explanations below.

When we speak of free software, we are referring to freedom of use, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for this service if you want); that you receive source code or can get it if you want it; that you can change the software and use pieces of it in new free programs; and that you are informed that you can do these things.

To protect your rights, we need to make restrictions that forbid distributors to deny you these rights or to ask you to surrender these rights. These restrictions translate to certain responsibilities for you if you distribute copies of the library or if you modify it.

For example, if you distribute copies of the library, whether gratis or for a fee, you must give the recipients all the rights that we gave you. You must make sure that they, too, receive or can get the source code.
If you link other code with the library, you must provide complete object files to the recipients, so that they can relink them with the library after making changes to the library and recompiling it. And you must show them these terms so they know their rights.

We protect your rights with a two-step method: (1) we copyright the library, and (2) we offer you this license, which gives you legal permission to copy, distribute and/or modify the library.

To protect each distributor, we want to make it very clear that there is no warranty for the free library. Also, if the library is modified by someone else and passed on, the recipients should know that what they have is not the original version, so that the original author's reputation will not be affected by problems that might be introduced by others.

Finally, software patents pose a constant threat to the existence of any free program. We want to make sure that a company cannot effectively restrict the users of a free program by obtaining a restrictive license from a patent holder. Therefore, we insist that any patent license obtained for a version of the library must be consistent with the full freedom of use specified in this license.

Most GNU software, including some libraries, is covered by the ordinary GNU General Public License. This license, the GNU Lesser General Public License, applies to certain designated libraries, and is quite different from the ordinary General Public License. We use this license for certain libraries in order to permit linking those libraries into non-free programs.

When a program is linked with a library, whether statically or using a shared library, the combination of the two is legally speaking a combined work, a derivative of the original library. The ordinary General Public License therefore permits such linking only if the entire combination fits its criteria of freedom. The Lesser General Public License permits more lax criteria for linking other code with the library.

We call this license the "Lesser" General Public License because it does Less to protect the user's freedom than the ordinary General Public License.
It also provides other free software developers Less of an advantage over competing non-free programs. These disadvantages are the reason we use the ordinary General Public License for many libraries. However, the Lesser license provides advantages in certain special circumstances.

For example, on rare occasions, there may be a special need to encourage the widest possible use of a certain library, so that it becomes a de-facto standard. To achieve this, non-free programs must be allowed to use the library. A more frequent case is that a free library does the same job as widely used non-free libraries. In this case, there is little to gain by limiting the free library to free software only, so we use the Lesser General Public License.

In other cases, permission to use a particular library in non-free programs enables a greater number of people to use a large body of free software. For example, permission to use the GNU C Library in non-free programs enables many more people to use the whole GNU operating system, as well as its variant, the GNU/Linux operating system.

Although the Lesser General Public License is Less protective of the users' freedom, it does ensure that the user of a program that is linked with the Library has the freedom and the wherewithal to run that program using a modified version of the Library.

The precise terms and conditions for copying, distribution and modification follow. Pay close attention to the difference between a "work based on the library" and a "work that uses the library". The former contains code derived from the library, whereas the latter must be combined with the library in order to run.

GNU LESSER GENERAL PUBLIC LICENSE
TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION

0. This License Agreement applies to any software library or other program which contains a notice placed by the copyright holder or other authorized party saying it may be distributed under the terms of this Lesser General Public License (also called "this License"). Each licensee is addressed as "you".

A "library" means a collection of software functions and/or data prepared so as to be conveniently linked with application programs (which use some of those functions and data) to form executables.

The "Library", below, refers to any such software library or work which has been distributed under these terms. A "work based on the Library" means either the Library or any derivative work under copyright law: that is to say, a work containing the Library or a portion of it, either verbatim or with modifications and/or translated straightforwardly into another language.
(Hereinafter, translation is included without limitation in the term "modification".)

"Source code" for a work means the preferred form of the work for making modifications to it. For a library, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the library.
Activities other than copying, distribution and modification are not covered by this License; they are outside its scope. The act of running a program using the Library is not restricted, and output from such a program is covered only if its contents constitute a work based on the Library (independent of the use of the Library in a tool for writing it). Whether that is true depends on what the Library does and what the program that uses the Library does.

1. You may copy and distribute verbatim copies of the Library's complete source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and distribute a copy of this License along with the Library.

You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee.

2. You may modify your copy or copies of the Library or any portion of it, thus forming a work based on the Library, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions:

a) The modified work must itself be a software library.

b) You must cause the files modified to carry prominent notices stating that you changed the files and the date of any change.

c) You must cause the whole of the work to be licensed at no charge to all third parties under the terms of this License.

d) If a facility in the modified Library refers to a function or a table of data to be supplied by an application program that uses the facility, other than as an argument passed when the facility is invoked, then you must make a good faith effort to ensure that, in the event an application does not supply such function or table, the facility still operates, and performs whatever part of its purpose remains meaningful.
(For example, a function in a library to compute square roots has a purpose that is entirely well-defined independent of the application. Therefore, Subsection 2d requires that any application-supplied function or table used by this function must be optional: if the application does not supply it, the square root function must still compute square roots.)

These requirements apply to the modified work as a whole. If identifiable sections of that work are not derived from the Library, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. But when you distribute the same sections as part of a whole which is a work based on the Library, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it.

Thus, it is not the intent of this section to claim rights or contest your rights to work written entirely by you; rather, the intent is to exercise the right to control the distribution of derivative or collective works based on the Library.

In addition, mere aggregation of another work not based on the Library with the Library (or with a work based on the Library) on a volume of a storage or distribution medium does not bring the other work under the scope of this License.

3. You may opt to apply the terms of the ordinary GNU General Public License instead of this License to a given copy of the Library. To do this, you must alter all the notices that refer to this License, so that they refer to the ordinary GNU General Public License, version 2, instead of to this License. (If a newer version than version 2 of the ordinary GNU General Public License has appeared, then you can specify that version instead if you want.) Do not make any other change in these notices.

Once this change is made in a given copy, it is irreversible for that copy, so the ordinary GNU General Public License applies to all subsequent copies and derivative works made from that copy.

This option is useful when you wish to copy part of the code of the Library into a program that is not a library.

4. You may copy and distribute the Library (or a portion or derivative of it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange.

If distribution of object code is made by offering access to copy from a designated place, then offering equivalent access to copy the source code from the same place satisfies the requirement to distribute the source code, even though third parties are not compelled to copy the source along with the object code.

5. A program that contains no derivative of any portion of the Library, but is designed to work with the Library by being compiled or linked with it, is called a "work that uses the Library". Such a work, in isolation, is not a derivative work of the Library, and therefore falls outside the scope of this License.

However, linking a "work that uses the Library" with the Library creates an executable that is a derivative of the Library (because it contains portions of the Library), rather than a "work that uses the library". The executable is therefore covered by this License. Section 6 states terms for distribution of such executables.

When a "work that uses the Library" uses material from a header file that is part of the Library, the object code for the work may be a derivative work of the Library even though the source code is not. Whether this is true is especially significant if the work can be linked without the Library, or if the work is itself a library. The threshold for this to be true is not precisely defined by law.

If such an object file uses only numerical parameters, data structure layouts and accessors, and small macros and small inline functions (ten lines or less in length), then the use of the object file is unrestricted, regardless of whether it is legally a derivative work. (Executables containing this object code plus portions of the Library will still fall under Section 6.)

Otherwise, if the work is a derivative of the Library, you may distribute the object code for the work under the terms of Section 6. Any executables containing that work also fall under Section 6, whether or not they are linked directly with the Library itself.

6. As an exception to the Sections above, you may also combine or link a "work that uses the Library" with the Library to produce a work containing portions of the Library, and distribute that work under terms of your choice, provided that the terms permit modification of the work for the customer's own use and reverse engineering for debugging such modifications.

You must give prominent notice with each copy of the work that the Library is used in it and that the Library and its use are covered by this License. You must supply a copy of this License. If the work during execution displays copyright notices, you must include the copyright notice for the Library among them, as well as a reference directing the user to the copy of this License.
Also, you must do one of these things:

a) Accompany the work with the complete corresponding machine-readable source code for the Library including whatever changes were used in the work (which must be distributed under Sections 1 and 2 above); and, if the work is an executable linked with the Library, with the complete machine-readable "work that uses the Library", as object code and/or source code, so that the user can modify the Library and then relink to produce a modified executable containing the modified Library. (It is understood that the user who changes the contents of definitions files in the Library will not necessarily be able to recompile the application to use the modified definitions.)

b) Use a suitable shared library mechanism for linking with the Library. A suitable mechanism is one that (1) uses at run time a copy of the library already present on the user's computer system, rather than copying library functions into the executable, and (2) will operate properly with a modified version of the library, if the user installs one, as long as the modified version is interface-compatible with the version that the work was made with.

c) Accompany the work with a written offer, valid for at least three years, to give the same user the materials specified in Subsection 6a, above, for a charge no more than the cost of performing this distribution.

d) If distribution of the work is made by offering access to copy from a designated place, offer equivalent access to copy the above specified materials from the same place.

e) Verify that the user has already received a copy of these materials or that you have already sent this user a copy.

For an executable, the required form of the "work that uses the Library" must include any data and utility programs needed for reproducing the executable from it. However, as a special exception, the materials to be distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable.

It may happen that this requirement contradicts the license restrictions of other proprietary libraries that do not normally accompany the operating system. Such a contradiction means you cannot use both them and the Library together in an executable that you distribute.

7. You may place library facilities that are a work based on the Library side-by-side in a single library together with other library facilities not covered by this License, and distribute such a combined library, provided that the separate distribution of the work based on the Library and of the other library facilities is otherwise permitted, and provided that you do these two things:

a) Accompany the combined library with a copy of the same work based on the Library, uncombined with any other library facilities. This must be distributed under the terms of the Sections above.

b) Give prominent notice with the combined library of the fact that part of it is a work based on the Library, and explaining where to find the accompanying uncombined form of the same work.
8. You may not copy, modify, sublicense, link with, or distribute the Library except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense, link with, or distribute the Library is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance.

9. You are not required to accept this License, since you have not signed it. However, nothing else grants you permission to modify or distribute the Library or its derivative works. These actions are prohibited by law if you do not accept this License. Therefore, by modifying or distributing the Library (or any work based on the Library), you indicate your acceptance of this License to do so, and all its terms and conditions for copying, distributing or modifying the Library or works based on it.

10. Each time you redistribute the Library (or any work based on the Library), the recipient automatically receives a license from the original licensor to copy, distribute, link with or modify the Library subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. You are not responsible for enforcing compliance by third parties with this License.

11. If, as a consequence of a court judgment or allegation of patent infringement or for any other reason (not limited to patent issues), conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot distribute so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not distribute the Library at all.
For example, if a patent license would not permit royalty-free redistribution of the Library by all those who receive copies directly or indirectly through you, then the only way you could satisfy both it and this License would be to refrain entirely from distribution of the Library.

If any portion of this section is held invalid or unenforceable under any particular circumstance, the balance of the section is intended to apply, and the section as a whole is intended to apply in other circumstances.

It is not the purpose of this section to induce you to infringe any patents or other property right claims or to contest validity of any such claims; this section has the sole purpose of protecting the integrity of the free software distribution system which is implemented by public license practices. Many people have made generous contributions to the wide range of software distributed through that system in reliance on consistent application of that system; it is up to the author/donor to decide if he or she is willing to distribute software through any other system and a licensee cannot impose that choice.

This section is intended to make thoroughly clear what is believed to be a consequence of the rest of this License.

12. If the distribution and/or use of the Library is restricted in certain countries either by patents or by copyrighted interfaces, the original copyright holder who places the Library under this License may add an explicit geographical distribution limitation excluding those countries, so that distribution is permitted only in or among countries not thus excluded. In such case, this License incorporates the limitation as if written in the body of this License.

13. The Free Software Foundation may publish revised and/or new versions of the Lesser General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns.

Each version is given a distinguishing version number. If the Library specifies a version number of this License which applies to it and "any later version", you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation. If the Library does not specify a license version number, you may choose any version ever published by the Free Software Foundation.

14. If you want to incorporate parts of the Library into other free programs whose distribution conditions are incompatible with these, write to the author to ask for permission. For software which is copyrighted by the Free Software Foundation, write to the Free Software Foundation; we sometimes make exceptions for this. Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally.

NO WARRANTY

15. BECAUSE THE LIBRARY IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE LIBRARY, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE LIBRARY "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE LIBRARY IS WITH YOU. SHOULD THE LIBRARY PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
16. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE LIBRARY AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE LIBRARY (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE LIBRARY TO OPERATE WITH ANY OTHER SOFTWARE), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

END OF TERMS AND CONDITIONS

How to Apply These Terms to Your New Libraries

If you develop a new library, and you want it to be of the greatest possible use to the public, we recommend making it free software that everyone can redistribute and change. You can do so by permitting redistribution under these terms (or, alternatively, under the terms of the ordinary General Public License).

To apply these terms, attach the following notices to the library. It is safest to attach them to the start of each source file to most effectively convey the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found.

<one line to give the library's name and a brief idea of what it does.>
Copyright (C) <year> <name of author>

This library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version.

This library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public License along with this library; if not, write to the Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA

Also add information on how to contact you by electronic and paper mail.

You should also get your employer (if you work as a programmer) or your school, if any, to sign a "copyright disclaimer" for the library, if necessary. Here is a sample; alter the names:

Yoyodyne, Inc., hereby disclaims all copyright interest in the library `Frob' (a library for tweaking knobs) written by James Random Hacker.

<signature of Ty Coon>, 1 April 1990
Ty Coon, President of Vice

That's all there is to it!