5.4.1
Copyleft
This documentation is provided under the terms of the Creative Commons Public License (CCPL).
For more information about what you can and cannot do with this documentation in accordance with the CCPL,
please read: http://creativecommons.org/licenses/by-nc-sa/2.0/
Notices
Apache Hadoop, Hadoop, HDFS, HBase, Hive, and Pig are trademarks of the Apache Software Foundation.
All other brands, product names, company names, trademarks, and service marks are the property of their respective owners.
Table of Contents
Preface
    General information
    Purpose
    Audience
    Typographical conventions
    Feedback and Support
    Related Scenario
tCouchDBConnection
    tCouchDBConnection properties
    Related scenario
tCouchDBInput
    tCouchDBInput properties
    Related Scenario
tCouchDBOutput
    tCouchDBOutput properties
    Scenario: Replicating data from the source database to the target database
tGSBucketCreate
    tGSBucketCreate properties
    Related scenario
tGSBucketDelete
    tGSBucketDelete properties
    Related scenario
tGSBucketExist
    tGSBucketExist properties
    Related scenario
tGSBucketList
    tGSBucketList properties
    Related scenario
tGSClose
    tGSClose properties
    Related scenario
tGSConnection
    tGSConnection properties
    Related scenario
tGSCopy
    tGSCopy properties
    Related scenario
tGSDelete
    tGSDelete properties
    Related scenario
tGSGet
    tGSGet properties
    Related scenario
tGSList
    tGSList properties
    Related scenario
tGSPut
    tGSPut properties
    Scenario: Managing files with Google Cloud Storage
tHBaseClose
    tHBaseClose properties
    Related scenario
tHBaseConnection
    tHBaseConnection properties
    Related scenario
tHBaseInput
    tHBaseInput properties
    HBase filters
    Scenario: Exchanging customer data with HBase
tHBaseOutput
    tHBaseOutput properties
    tHBaseOutput in Talend Map/Reduce Jobs
    Related scenario
tHCatalogInput
    tHCatalogInput Properties
    Related scenario
tHCatalogLoad
    tHCatalogLoad Properties
    Related scenario
tHCatalogOperation
    tHCatalogOperation Properties
tPigJoin
    tPigJoin Properties
    Scenario: Joining two files based on an exact match and saving the result to a local file
tPigLoad
    tPigLoad Properties
    Scenario: Loading an HBase table
tPigMap
    tPigMap properties
    Optional map settings
    Scenario: Joining data about road conditions in a Pig process
tPigReplicate
    tPigReplicate Properties
    Scenario: Replicating a flow and sorting two identical flows respectively
tPigSort
    tPigSort Properties
    Scenario: Sorting data in ascending order
tPigStoreResult
    tPigStoreResult Properties
    Related Scenario
tRiakBucketList
    tRiakBucketList properties
    Related scenario
tRiakClose
    tRiakClose properties
    Related Scenario
tRiakConnection
    tRiakConnection properties
    Related scenario
tRiakInput
    tRiakInput properties
    Scenario: Exporting data from a Riak bucket to a local file
tRiakKeyList
    tRiakKeyList properties
    Related scenario
tRiakOutput
    tRiakOutput properties
    Related Scenario
tSqoopExport
    tSqoopExport Properties
    Additional arguments
    Related scenario
tSqoopImport
    tSqoopImport Properties
    Scenario: Importing a MySQL table to HDFS
tSqoopImportAllTables
    tSqoopImportAllTables Properties
    Related scenario
tSqoopMerge
    tSqoopMerge Properties
    Scenario: Merging two datasets in HDFS
tSAPBWInput
    tSAPBWInput Properties
    Scenario: Reading data from SAP BW database
tSAPCommit
    tSAPCommit Properties
    Related scenario
tSAPConnection
    tSAPConnection properties
    Related scenarios
tSAPInput
    tSAPInput Properties
    Scenario 1: Retrieving metadata from the SAP system
    Scenario 2: Reading data in the different schemas of the RFC_READ_TABLE function
tSAPOutput
    tSAPOutput Properties
    Related scenario
tSAPRollback
    tSAPRollback properties
    Related scenarios
tSugarCRMInput
    tSugarCRMInput Properties
    Scenario: Extracting account data from SugarCRM
tSugarCRMOutput
    tSugarCRMOutput Properties
    Related Scenario
tVtigerCRMInput
    tVtigerCRMInput Properties
    Related Scenario
tVtigerCRMOutput
    tVtigerCRMOutput Properties
    Related Scenario
tPaloCheckElements
    tPaloCheckElements Properties
    Related scenario
tPaloConnection
    tPaloConnection Properties
    Related scenario
tPaloCube
    tPaloCube Properties
    Scenario: Creating a cube in an existing database
tPaloCubeList
    tPaloCubeList Properties
    Discovering the read-only output schema of tPaloCubeList
    Scenario: Retrieving detailed cube information from a given database
tPaloDatabase
    tPaloDatabase Properties
    Scenario: Creating a database
tPaloDatabaseList
    tPaloDatabaseList Properties
    Discovering the read-only output schema of tPaloDatabaseList
    Scenario: Retrieving detailed database information from a given Palo server
tPaloDimension
    tPaloDimension Properties
    Scenario: Creating a dimension with elements
tPaloDimensionList
    tPaloDimensionList Properties
    Discovering the read-only output schema of tPaloDimensionList
    Scenario: Retrieving detailed dimension information from a given database
tPaloInputMulti
    tPaloInputMulti Properties
    Scenario: Retrieving dimension elements from a given cube
tPaloOutput
    tPaloOutput Properties
    Related scenario
tPaloOutputMulti
    tPaloOutputMulti Properties
    Scenario 1: Writing data into a given cube
    Scenario 2: Rejecting inflow data when the elements to be written do not exist in a given cube
tPaloRule
    tPaloRule Properties
    Scenario: Creating a rule in a given cube
tPaloRuleList
    tPaloRuleList Properties
    Discovering the read-only output schema of tPaloRuleList
    Scenario: Retrieving detailed rule information from a given cube
tParAccelSCD
    tParAccelSCD Properties
    Related scenario
tPostgresPlusSCD
    tPostgresPlusSCD Properties
    Related scenario
tPostgresPlusSCDELT
    tPostgresPlusSCDELT Properties
    Related Scenario
tPostgresqlSCD
    tPostgresqlSCD Properties
    Related scenarios
tAmazonOracleOutput
    tAmazonOracleOutput properties
    Related scenarios
tAmazonOracleRollback
    tAmazonOracleRollback properties
    Related scenario
tAmazonOracleRow
    tAmazonOracleRow properties
    Related scenarios
tCloudStart
    tCloudStart Properties
    Related scenario
tCloudStop
    tCloudStop Properties
    Related scenario
tGSBucketCreate
tGSBucketDelete
tGSBucketExist
tGSBucketList
tGSClose
tGSConnection
tGSCopy
tGSDelete
tGSGet
tGSList
tGSPut
tMarketoInput
tMarketoListOperation
tMarketoOutput
tS3BucketCreate
    tS3BucketCreate properties
    Related scenario
tS3BucketDelete
    tS3BucketDelete properties
    Related scenario
tS3BucketExist
    tS3BucketExist properties
    Scenario: Verifying the absence of a bucket, creating it and listing all the S3 buckets
tS3BucketList
    tS3BucketList properties
    Related scenario
tS3Close
    tS3Close properties
    Related scenario
tS3Connection
    tS3Connection properties
    Related scenario
tS3Delete
    tS3Delete properties
    Related scenario
tS3Get
    tS3Get properties
    Related scenario
tS3List
    tS3List properties
    Scenario: Listing files with the same prefix from a bucket
tS3Put
    tS3Put properties
    Scenario: File exchanges with Amazon S3
tSalesforceBulkExec
tSalesforceConnection
tSalesforceGetDeleted
tSalesforceGetServerTimestamp
tSalesforceGetUpdated
tSalesforceInput
tSalesforceOutput
tSalesforceOutputBulk
tSalesforceOutputBulkExec
tSugarCRMInput
tSugarCRMOutput
tAS400Row
    tAS400Row properties
    Related scenarios
tDB2BulkExec
    tDB2BulkExec properties
    Related scenarios
tDB2Close
    tDB2Close properties
    Related scenario
tDB2Commit
    tDB2Commit Properties
    Related scenario
tDB2Connection
    tDB2Connection properties
    Related scenarios
tDB2Input
    tDB2Input properties
    Related scenarios
tDB2Output
    tDB2Output properties
    Related scenarios
tDB2Rollback
    tDB2Rollback properties
    Related scenarios
tDB2Row
    tDB2Row properties
    Related scenarios
tDB2SCD
tDB2SCDELT
tDB2SP
    tDB2SP properties
    Related scenarios
tInformixBulkExec
    tInformixBulkExec Properties
    Related scenario
tInformixClose
    tInformixClose properties
    Related scenario
tInformixCommit
    tInformixCommit properties
    Related Scenario
tInformixConnection
    tInformixConnection properties
    Related scenario
tInformixInput
    tInformixInput properties
    Related scenarios
tInformixOutput
    tInformixOutput properties
    Related scenarios
tInformixOutputBulk
    tInformixOutputBulk properties
    Related scenario
tInformixOutputBulkExec
    tInformixOutputBulkExec properties
    Related scenario
tInformixRollback
    tInformixRollback properties
    Related Scenario
tInformixRow
    tInformixRow properties
    Related scenarios
tInformixSCD
tInformixSP
    tInformixSP properties
    Related scenario
tMSSqlBulkExec
    tMSSqlBulkExec properties
    Related scenarios
tMSSqlColumnList
    tMSSqlColumnList Properties
    Related scenario
tMSSqlClose
    tMSSqlClose properties
    Related scenario
tMSSqlCommit
    tMSSqlCommit properties
    Related scenarios
tMSSqlConnection
    tMSSqlConnection properties
    Scenario: Inserting data into a database table and extracting useful information from it
tMSSqlInput
    tMSSqlInput properties
    Related scenarios
tMSSqlLastInsertId
    tMSSqlLastInsertId properties
    Related scenario
tMSSqlOutput
    tMSSqlOutput properties
    Related scenarios
tMSSqlOutputBulk
    tMSSqlOutputBulk properties
    Related scenarios
tMSSqlOutputBulkExec
    tMSSqlOutputBulkExec properties
    Related scenarios
tMSSqlRollback
    tMSSqlRollback properties
    Related scenario
tMSSqlRow
    tMSSqlRow properties
    Related scenarios
tMSSqlSCD
tMSSqlSP
    tMSSqlSP Properties
    Related scenario
tMSSqlTableList
    tMSSqlTableList Properties
    Related scenario
tMysqlBulkExec
    tMysqlBulkExec properties
    Related scenarios
tMysqlClose
    tMysqlClose properties
    Related scenario
tMysqlColumnList
    tMysqlColumnList Properties
    Scenario: Iterating on a DB table and listing its column names
tMysqlCommit
    tMysqlCommit Properties
    Related scenario
tMysqlConnection
    tMysqlConnection Properties
    Scenario: Inserting data in mother/daughter tables
tMysqlInput
    tMysqlInput properties
    Scenario 1: Writing columns from a MySQL database to an output file
    Scenario 2: Using context parameters when reading a table from a MySQL database
    Scenario 3: Reading data from MySQL databases through context-based dynamic connections
tMysqlLastInsertId
    tMysqlLastInsertId properties
    Scenario: Get the ID for the last inserted record
tMysqlOutput
    tMysqlOutput properties
tOracleSCD
tOracleSCDELT
tOracleSP
    tOracleSP Properties
    Scenario: Checking number format using a stored procedure
tOracleTableList
    tOracleTableList properties
    Related scenarios
tPostgresqlBulkExec
    tPostgresqlBulkExec properties
    Related scenarios
tPostgresqlCommit
    tPostgresqlCommit Properties
    Related scenario
tPostgresqlClose
    tPostgresqlClose properties
    Related scenario
tPostgresqlConnection
    tPostgresqlConnection Properties
    Related scenario
tPostgresqlInput
    tPostgresqlInput properties
    Related scenarios
tPostgresqlOutput
    tPostgresqlOutput properties
    Related scenarios
tPostgresqlOutputBulk
    tPostgresqlOutputBulk properties
    Related scenarios
tPostgresqlOutputBulkExec
    tPostgresqlOutputBulkExec properties
    Related scenarios
tPostgresqlRollback
    tPostgresqlRollback properties
    Related scenario
tPostgresqlRow
    tPostgresqlRow properties
    Related scenarios
tPostgresqlSCD
tPostgresqlSCDELT
tSybaseBulkExec
    tSybaseBulkExec Properties
    Related scenarios
tSybaseClose
    tSybaseClose properties
    Related scenario
tSybaseCommit
    tSybaseCommit Properties
    Related scenario
tSybaseConnection
    tSybaseConnection Properties
    Related scenarios
tSybaseInput
    tSybaseInput Properties
    Related scenarios
tSybaseIQBulkExec
    tSybaseIQBulkExec Properties
    Related scenarios
tSybaseIQOutputBulkExec
    tSybaseIQOutputBulkExec properties
    Scenario: Bulk-loading data to a Sybase IQ 12 database
    Related scenarios
tSybaseOutput
    tSybaseOutput Properties
    Related scenarios
tSybaseOutputBulk
    tSybaseOutputBulk properties
    Related scenarios
tSybaseOutputBulkExec
    tSybaseOutputBulkExec properties
    Related scenarios
tSybaseRollback
    tSybaseRollback properties
    Related scenarios
tSybaseRow
    tSybaseRow Properties
    Related scenarios
tSybaseSCD
tSybaseSCDELT
tSybaseSP
    tSybaseSP properties
    Related scenarios
tVerticaSCD
Databases - appliance/datawarehouse components
tGreenplumBulkExec
    tGreenplumBulkExec Properties
    Related scenarios
tGreenplumClose
    tGreenplumClose properties
    Related scenario
tGreenplumCommit
    tGreenplumCommit Properties
    Related scenarios
tGreenplumConnection
    tGreenplumConnection properties
    Related scenarios
tGreenplumGPLoad
    tGreenplumGPLoad properties
    Related scenario
tGreenplumInput
    tGreenplumInput properties
    Related scenarios
tGreenplumOutput
    tGreenplumOutput Properties
    Related scenarios
tGreenplumOutputBulk
    tGreenplumOutputBulk properties
    Related scenarios
tGreenplumOutputBulkExec
    tGreenplumOutputBulkExec properties
    Related scenarios
tGreenplumRollback
    tGreenplumRollback properties
    Related scenarios
tGreenplumRow
    tGreenplumRow Properties
    Related scenarios
tGreenplumSCD
tIngresBulkExec
    tIngresBulkExec properties
    Related scenarios
tIngresClose
    tIngresClose properties
    Related scenario
tIngresCommit
    tIngresCommit Properties
    Related scenario
tIngresConnection
    tIngresConnection Properties
    Related scenarios
tIngresInput
    tIngresInput properties
    Related scenarios
tIngresOutput
    tIngresOutput properties
    Related scenarios
tIngresOutputBulk
    tIngresOutputBulk properties
    Related scenarios
tIngresOutputBulkExec
    tIngresOutputBulkExec properties
    Scenario: Loading data to a table in the Ingres DBMS
    Related scenarios
tIngresRollback
    tIngresRollback properties
    Related scenarios
tIngresRow
    tIngresRow properties
    Related scenarios
tIngresSCD
tNetezzaBulkExec
    tNetezzaBulkExec properties
    Related scenarios
tNetezzaClose
    tNetezzaClose properties
    Related scenario
tNetezzaCommit
    tNetezzaCommit Properties
    Related scenario
tNetezzaConnection
    tNetezzaConnection Properties
    Related scenarios
tNetezzaInput
    tNetezzaInput properties
    Related scenarios
tNetezzaNzLoad
    tNetezzaNzLoad properties
    Related scenario
tNetezzaOutput
    tNetezzaOutput properties
    Related scenarios
tNetezzaRollback
    tNetezzaRollback properties
    Related scenarios
tNetezzaRow
    tNetezzaRow properties
    Related scenarios
tNetezzaSCD
tParAccelBulkExec
    tParAccelBulkExec Properties
    Related scenarios
tParAccelClose
    tParAccelClose properties
    Related scenario
tParAccelCommit
    tParAccelCommit Properties
    Related scenario
tParAccelConnection
    tParAccelConnection Properties
    Related scenario
tParAccelInput
    tParAccelInput properties
    Related scenarios
tParAccelOutput
    tParAccelOutput Properties
    Related scenarios
tParAccelOutputBulk
    tParAccelOutputBulk properties
    Related scenarios
tParAccelOutputBulkExec
    tParAccelOutputBulkExec Properties
    Related scenarios
tParAccelRollback
    tParAccelRollback properties
    Related scenario
tParAccelRow
    tParAccelRow Properties
    Related scenarios
tParAccelSCD
tRedshiftClose
    tRedshiftClose properties
    Related scenario
tRedshiftCommit
    tRedshiftCommit properties
    Related scenario
tRedshiftConnection
    tRedshiftConnection properties
    Related scenario
tRedshiftInput
    tRedshiftInput properties
    Related scenarios
tRedshiftOutput
    tRedshiftOutput properties
    Related scenarios
tRedshiftRollback
    tRedshiftRollback properties
    Related scenario
tRedshiftRow
    tRedshiftRow properties
    Related scenarios
tTeradataClose
    tTeradataClose properties
    Related scenario
tTeradataCommit
    tTeradataCommit Properties
    Related scenario
tTeradataConnection
    tTeradataConnection Properties
    Related scenario
tTeradataFastExport
    tTeradataFastExport Properties
    Related scenario
tTeradataFastLoad
    tTeradataFastLoad Properties
    Related scenario
tTeradataFastLoadUtility
    tTeradataFastLoadUtility Properties
    Related scenario
tTeradataInput
    tTeradataInput Properties
    Related scenarios
tTeradataMultiLoad
    tTeradataMultiLoad Properties
    Related scenario
tTeradataOutput
    tTeradataOutput Properties
    Related scenarios
tTeradataRollback
    tTeradataRollback Properties
    Related scenario
tTeradataRow
    tTeradataRow Properties
    Related scenarios
tTeradataTPTExec
    tTeradataTPTExec Properties
    Related scenario
tTeradataTPTUtility
    tTeradataTPTUtility Properties
    Related scenario
tTeradataTPump
    tTeradataTPump Properties
    Scenario: Inserting data into a Teradata database table
tVectorWiseCommit
    tVectorWiseCommit Properties
    Related scenario
tVectorWiseConnection
    tVectorWiseConnection Properties
    Related scenario
tVectorWiseInput
    tVectorWiseInput Properties
    Related scenario
tVectorWiseOutput
    tVectorWiseOutput Properties
    Related scenario
tVectorWiseRollback
    tVectorWiseRollback Properties
    Related scenario
tVectorWiseRow
    tVectorWiseRow Properties
    Related scenario
tVerticaBulkExec
    tVerticaBulkExec Properties
    Related scenarios
tVerticaClose
    tVerticaClose properties
    Related scenario
tVerticaCommit
    tVerticaCommit Properties
    Related scenario
tVerticaConnection
    tVerticaConnection Properties
    Related scenario
tVerticaInput
    tVerticaInput Properties
    Related scenarios
tVerticaOutput
    tVerticaOutput Properties
    Related scenarios
tVerticaOutputBulk
    tVerticaOutputBulk Properties
    Related scenarios
tVerticaOutputBulkExec
    tVerticaOutputBulkExec Properties
    Related scenarios
tVerticaRollback
    tVerticaRollback Properties
    Related scenario
tVerticaRow
    tVerticaRow Properties
    Related scenario
tHSQLDbOutput
    tHSQLDbOutput properties
    Related scenarios
tHSQLDbRow
    tHSQLDbRow properties
    Related scenarios
tInterbaseClose
    tInterbaseClose properties
    Related scenario
tInterbaseCommit
    tInterbaseCommit Properties
    Related scenario
tInterbaseConnection
    tInterbaseConnection properties
    Related scenarios
tInterbaseInput
    tInterbaseInput properties
    Related scenarios
tInterbaseOutput
    tInterbaseOutput properties
    Related scenarios
tInterbaseRollback
    tInterbaseRollback properties
    Related scenarios
tInterbaseRow
    tInterbaseRow properties
    Related scenarios
tJavaDBInput
    tJavaDBInput properties
    Related scenarios
tJavaDBOutput
    tJavaDBOutput properties
    Related scenarios
tJavaDBRow
    tJavaDBRow properties
    Related scenarios
tJDBCColumnList
    tJDBCColumnList Properties
    Related scenario
tJDBCClose
    tJDBCClose properties
    Related scenario
tJDBCCommit
    tJDBCCommit Properties
    Related scenario
tJDBCConnection
    tJDBCConnection Properties
    Related scenario
tJDBCInput
    tJDBCInput properties
    tJDBCInput in Talend Map/Reduce Jobs
    Related scenarios
tJDBCOutput
    tJDBCOutput properties
    tJDBCOutput in Talend Map/Reduce Jobs
    Related scenarios
tJDBCRollback
    tJDBCRollback properties
    Related scenario
tJDBCRow
    tJDBCRow properties
    Related scenarios
tJDBCSP
    tJDBCSP Properties
    Related scenario
tJDBCTableList
    tJDBCTableList Properties
    Related scenario
tLDAPAttributesInput
    tLDAPAttributesInput Properties
Related scenario . . . . . . . . . . . . . . . . . . . . . . . 1347
tLDAPClose . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1348
tLDAPClose properties . . . . . . . . . . . . . . . . 1348
Related scenario . . . . . . . . . . . . . . . . . . . . . . . 1348
tLDAPConnection . . . . . . . . . . . . . . . . . . . . . . . . . . . 1349
tLDAPConnection Properties . . . . . . . . . 1349
Related scenarios . . . . . . . . . . . . . . . . . . . . . . 1350
tLDAPInput . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1351
tLDAPInput Properties . . . . . . . . . . . . . . . . 1351
Scenario: Displaying LDAP
directory's filtered content . . . . . . . . . . . . 1352
tLDAPOutput . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1355
tLDAPOutput Properties . . . . . . . . . . . . . . 1355
Scenario: Editing data in an LDAP
directory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1356
tLDAPRenameEntry . . . . . . . . . . . . . . . . . . . . . . . . . 1360
tLDAPRenameEntry properties . . . . . . . 1360
Related scenarios . . . . . . . . . . . . . . . . . . . . . . 1361
tMaxDBInput . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1362
tMaxDBInput properties . . . . . . . . . . . . . . 1362
Related scenario . . . . . . . . . . . . . . . . . . . . . . . 1362
tMaxDBOutput . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1364
tMaxDBOutput properties . . . . . . . . . . . . . 1364
Related scenario . . . . . . . . . . . . . . . . . . . . . . . 1365
tMaxDBRow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1366
tMaxDBRow properties . . . . . . . . . . . . . . . 1366
Related scenario . . . . . . . . . . . . . . . . . . . . . . . 1367
tMongoDBBulkLoad . . . . . . . . . . . . . . . . . . . . . . . . . 1368
tMongoDBClose . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1369
tMongoDBConnection . . . . . . . . . . . . . . . . . . . . . . . 1370
tMongoDBInput . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1371
tMongoDBOutput . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1372
tMongoDBRow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1373
tNeo4jClose . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1374
tNeo4jConnection . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1375
tNeo4jInput . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1376
tNeo4jOutput . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1377
tNeo4jOutputRelationship . . . . . . . . . . . . . . . . . . . 1378
tNeo4jRow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1379
tParseRecordSet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1380
tParseRecordSet properties . . . . . . . . . . . . 1380
Related Scenario . . . . . . . . . . . . . . . . . . . . . . 1380
tPostgresPlusBulkExec . . . . . . . . . . . . . . . . . . . . . . 1381
tPostgresPlusBulkExec properties . . . . . 1381
Related scenarios . . . . . . . . . . . . . . . . . . . . . . 1382
tPostgresPlusClose . . . . . . . . . . . . . . . . . . . . . . . . . . . 1383
tPostgresPlusClose properties . . . . . . . . . 1383
Related scenario . . . . . . . . . . . . . . . . . . . . . . . 1383
tPostgresPlusCommit . . . . . . . . . . . . . . . . . . . . . . . . 1384
tPostgresPlusCommit Properties . . . . . . 1384
Related scenario . . . . . . . . . . . . . . . . . . . . . . . 1384
tPostgresPlusConnection . . . . . . . . . . . . . . . . . . . . 1385
tPostgresPlusConnection Properties . . . 1385
Related scenario . . . . . . . . . . . . . . . . . . . . . . . 1385
tPostgresPlusInput . . . . . . . . . . . . . . . . . . . . . . . . . . . 1387
tPostgresPlusInput properties . . . . . . . . . . 1387
Related scenarios . . . . . . . . . . . . . . . . . . . . . . 1388
tPostgresPlusOutput . . . . . . . . . . . . . . . . . . . . . . . . . 1389
tPostgresPlusOutput properties . . . . . . . . 1389
Related scenarios . . . . . . . . . . . . . . . . . . . . . . 1391
tPostgresPlusOutputBulk . . . . . . . . . . . . . . . . . . . . 1392
tPostgresPlusOutputBulk properties . . . 1392
Related scenarios . . . . . . . . . . . . . . . . . . . . . . 1393
tPostgresPlusOutputBulkExec . . . . . . . . . . . . . . 1394
tPostgresPlusOutputBulkExec
properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1394
Related scenarios . . . . . . . . . . . . . . . . . . . . . . 1395
tPostgresPlusRollback . . . . . . . . . . . . . . . . . . . . . . . 1396
tPostgresPlusRollback properties . . . . . . 1396
Related scenarios . . . . . . . . . . . . . . . . . . . . . . 1396
tPostgresPlusRow . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1397
tPostgresPlusRow properties . . . . . . . . . . 1397
tCombinedSQLFilter . . . . . . . . . . . . . . . . . . . . . . . . 1453
tCombinedSQLFilter Properties . . . . . . . 1453
Related Scenario . . . . . . . . . . . . . . . . . . . . . . 1453
tCombinedSQLInput . . . . . . . . . . . . . . . . . . . . . . . . 1454
tCombinedSQLInput properties . . . . . . . 1454
Related scenario . . . . . . . . . . . . . . . . . . . . . . . 1454
tCombinedSQLOutput . . . . . . . . . . . . . . . . . . . . . . 1455
tCombinedSQLOutput properties . . . . . 1455
Related scenario . . . . . . . . . . . . . . . . . . . . . . . 1455
tDB2Connection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1456
tELTGreenplumInput . . . . . . . . . . . . . . . . . . . . . . . 1457
tELTGreenplumInput properties . . . . . . 1457
Related scenarios . . . . . . . . . . . . . . . . . . . . . . 1457
tELTGreenplumMap . . . . . . . . . . . . . . . . . . . . . . . . 1458
tELTGreenplumMap properties . . . . . . . 1458
Scenario: Mapping data using a
simple implicit join . . . . . . . . . . . . . . . . . . . . 1459
Related scenario: . . . . . . . . . . . . . . . . . . . . . . 1465
tELTGreenplumOutput . . . . . . . . . . . . . . . . . . . . . 1466
tELTGreenplumOutput properties . . . . . 1466
Related scenarios . . . . . . . . . . . . . . . . . . . . . . 1466
tELTHiveInput . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1468
tELTHiveInput properties . . . . . . . . . . . . . 1468
Related scenario . . . . . . . . . . . . . . . . . . . . . . . 1468
tELTHiveMap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1469
tELTHiveMap properties . . . . . . . . . . . . . . 1469
Scenario: Joining table columns and
writing them into Hive . . . . . . . . . . . . . . . . 1474
tELTHiveOutput . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1481
tELTHiveOutput properties . . . . . . . . . . . 1481
Related scenario . . . . . . . . . . . . . . . . . . . . . . . 1481
tELTJDBCInput . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1482
tELTJDBCInput properties . . . . . . . . . . . . 1482
Related scenarios . . . . . . . . . . . . . . . . . . . . . . 1482
tELTJDBCMap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1483
tELTJDBCMap properties . . . . . . . . . . . . 1483
Related scenario: . . . . . . . . . . . . . . . . . . . . . . 1484
tELTJDBCOutput . . . . . . . . . . . . . . . . . . . . . . . . . . . 1485
tELTJDBCOutput properties . . . . . . . . . . 1485
Related scenarios . . . . . . . . . . . . . . . . . . . . . . 1485
tELTMSSqlInput . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1487
tELTMSSqlInput properties . . . . . . . . . . . 1487
Related scenarios . . . . . . . . . . . . . . . . . . . . . . 1487
tELTMSSqlMap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1488
tELTMSSqlMap properties . . . . . . . . . . . . 1488
Related scenario: . . . . . . . . . . . . . . . . . . . . . . 1489
tELTMSSqlOutput . . . . . . . . . . . . . . . . . . . . . . . . . . 1490
tELTMSSqlOutput properties . . . . . . . . . 1490
Related scenarios . . . . . . . . . . . . . . . . . . . . . . 1490
tELTMysqlInput . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1492
tELTMysqlInput properties . . . . . . . . . . . 1492
Related scenarios . . . . . . . . . . . . . . . . . . . . . . 1492
tELTMysqlMap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1493
tELTMysqlMap properties . . . . . . . . . . . . 1493
Scenario 1: Aggregating table
columns and filtering . . . . . . . . . . . . . . . . . . 1495
Scenario 2: ELT using an Alias table . . 1499
tELTMysqlOutput . . . . . . . . . . . . . . . . . . . . . . . . . . . 1503
tELTMysqlOutput properties . . . . . . . . . . 1503
Related scenarios . . . . . . . . . . . . . . . . . . . . . . 1503
tELTNetezzaInput . . . . . . . . . . . . . . . . . . . . . . . . . . . 1505
tELTNetezzaInput properties . . . . . . . . . . 1505
Related scenarios . . . . . . . . . . . . . . . . . . . . . . 1505
tELTNetezzaMap . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1506
tELTNetezzaMap properties . . . . . . . . . . 1506
Related scenarios . . . . . . . . . . . . . . . . . . . . . . 1507
tELTNetezzaOutput . . . . . . . . . . . . . . . . . . . . . . . . . 1508
tELTNetezzaOutput properties . . . . . . . . 1508
Related scenarios . . . . . . . . . . . . . . . . . . . . . . 1508
tELTOracleInput . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1510
tELTOracleInput properties . . . . . . . . . . . 1510
Related scenarios . . . . . . . . . . . . . . . . . . . . . . 1510
tELTOracleMap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1511
tELTOracleMap properties . . . . . . . . . . . . 1511
Scenario: Updating Oracle DB
entries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1513
tELTOracleOutput . . . . . . . . . . . . . . . . . . . . . . . . . . 1516
tELTOracleOutput properties . . . . . . . . . 1516
Scenario: Using the Oracle MERGE
function to update and add data
simultaneously . . . . . . . . . . . . . . . . . . . . . . . . . 1517
tELTPostgresqlInput . . . . . . . . . . . . . . . . . . . . . . . . 1522
tELTPostgresqlInput properties . . . . . . . 1522
Related scenarios . . . . . . . . . . . . . . . . . . . . . . 1522
tELTPostgresqlMap . . . . . . . . . . . . . . . . . . . . . . . . . 1523
tELTPostgresqlMap properties . . . . . . . . 1523
Related scenario: . . . . . . . . . . . . . . . . . . . . . . 1524
tELTPostgresqlOutput . . . . . . . . . . . . . . . . . . . . . . 1525
tELTPostgresqlOutput properties . . . . . . 1525
Related scenarios . . . . . . . . . . . . . . . . . . . . . . 1525
tELTSybaseInput . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1527
tELTSybaseInput properties . . . . . . . . . . . 1527
Related scenarios . . . . . . . . . . . . . . . . . . . . . . 1527
tELTSybaseMap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1528
tELTSybaseMap properties . . . . . . . . . . . 1528
Related scenarios . . . . . . . . . . . . . . . . . . . . . . 1529
tELTSybaseOutput . . . . . . . . . . . . . . . . . . . . . . . . . . 1530
tELTSybaseOutput properties . . . . . . . . . 1530
Related scenarios . . . . . . . . . . . . . . . . . . . . . . 1531
tELTTeradataInput . . . . . . . . . . . . . . . . . . . . . . . . . . 1532
tELTTeradataInput properties . . . . . . . . . 1532
Related scenarios . . . . . . . . . . . . . . . . . . . . . . 1532
tELTTeradataMap . . . . . . . . . . . . . . . . . . . . . . . . . . . 1533
tELTTeradataMap properties . . . . . . . . . . 1533
Related scenarios . . . . . . . . . . . . . . . . . . . . . . 1534
tELTTeradataOutput . . . . . . . . . . . . . . . . . . . . . . . . 1536
tELTTeradataOutput properties . . . . . . . 1536
Related scenarios . . . . . . . . . . . . . . . . . . . . . . 1536
tFirebirdConnection . . . . . . . . . . . . . . . . . . . . . . . . . 1538
tGreenplumConnection . . . . . . . . . . . . . . . . . . . . . . 1539
tHiveConnection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1540
tIngresConnection . . . . . . . . . . . . . . . . . . . . . . . . . . . 1541
tInterbaseConnection . . . . . . . . . . . . . . . . . . . . . . . . 1542
tJDBCConnection . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1543
tMSSqlConnection . . . . . . . . . . . . . . . . . . . . . . . . . . . 1544
tMysqlConnection . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1545
tNetezzaConnection . . . . . . . . . . . . . . . . . . . . . . . . . . 1546
tOracleConnection . . . . . . . . . . . . . . . . . . . . . . . . . . . 1547
tParAccelConnection . . . . . . . . . . . . . . . . . . . . . . . . 1548
tPostgresPlusConnection . . . . . . . . . . . . . . . . . . . . 1549
tPostgresqlConnection . . . . . . . . . . . . . . . . . . . . . . . 1550
tSQLiteConnection . . . . . . . . . . . . . . . . . . . . . . . . . . . 1551
tSQLTemplate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1552
tSQLTemplate properties . . . . . . . . . . . . . . 1552
Related scenarios . . . . . . . . . . . . . . . . . . . . . . 1553
tSQLTemplateAggregate . . . . . . . . . . . . . . . . . . . . 1554
tSQLTemplateAggregate properties . . . 1554
Scenario: Filtering and aggregating
table columns directly on the DBMS . . 1555
tSQLTemplateCommit . . . . . . . . . . . . . . . . . . . . . . 1559
tSQLTemplateCommit properties . . . . . 1559
Related scenario . . . . . . . . . . . . . . . . . . . . . . . 1560
tSQLTemplateFilterColumns . . . . . . . . . . . . . . . 1561
tSQLTemplateFilterColumns
Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1561
Related Scenario . . . . . . . . . . . . . . . . . . . . . . 1562
tSQLTemplateFilterRows . . . . . . . . . . . . . . . . . . . 1563
tSQLTemplateFilterRows Properties . . 1563
Related Scenario . . . . . . . . . . . . . . . . . . . . . . 1564
tSQLTemplateMerge . . . . . . . . . . . . . . . . . . . . . . . . 1565
tSQLTemplateMerge properties . . . . . . . 1565
Scenario: Merging data directly on
the DBMS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1566
tSQLTemplateRollback . . . . . . . . . . . . . . . . . . . . . . 1573
tSQLTemplateRollback properties . . . . 1573
Related scenarios . . . . . . . . . . . . . . . . . . . . . . 1573
tSybaseConnection . . . . . . . . . . . . . . . . . . . . . . . . . . . 1574
tTeradataConnection . . . . . . . . . . . . . . . . . . . . . . . . 1575
tVectorWiseConnection . . . . . . . . . . . . . . . . . . . . . . 1576
tFileOutputPositional . . . . . . . . . . . . . . . . . . . . . . . . 1729
tFileOutputPositional Properties . . . . . . . 1729
Related scenario . . . . . . . . . . . . . . . . . . . . . . . 1730
tFileOutputProperties . . . . . . . . . . . . . . . . . . . . . . . 1731
tFileOutputProperties properties . . . . . . . 1731
Related scenarios . . . . . . . . . . . . . . . . . . . . . . 1731
tFileOutputXML . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1732
tFileProperties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1733
tFileProperties Properties . . . . . . . . . . . . . . 1733
Scenario: Displaying the properties
of a processed file . . . . . . . . . . . . . . . . . . . . . 1733
tFileRowCount . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1735
tFileRowCount properties . . . . . . . . . . . . . 1735
Scenario: Writing a file to MySQL if
the number of its records matches a
reference value . . . . . . . . . . . . . . . . . . . . . . . . 1736
tFileTouch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1740
tFileTouch properties . . . . . . . . . . . . . . . . . . 1740
Related scenario . . . . . . . . . . . . . . . . . . . . . . . 1740
tFileUnarchive . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1741
tFileUnarchive Properties . . . . . . . . . . . . . 1741
Related scenario . . . . . . . . . . . . . . . . . . . . . . . 1742
tGPGDecrypt . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1743
tGPGDecrypt Properties . . . . . . . . . . . . . . . 1743
Scenario: Decrypt a GnuPG-encrypted
file and display its content . . . . . . . . . . . . 1743
tHDFSCompare . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1746
tHDFSConnection . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1747
tHDFSCopy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1748
tHDFSDelete . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1749
tHDFSExist . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1750
tHDFSGet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1751
tHDFSList . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1752
tHDFSInput . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1753
tHDFSOutput . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1754
tHDFSProperties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1755
tHDFSPut . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1756
tHDFSRename . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1757
tHDFSRowCount . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1758
tNamedPipeClose . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1759
tNamedPipeClose properties . . . . . . . . . . 1759
Related scenario . . . . . . . . . . . . . . . . . . . . . . . 1759
tNamedPipeOpen . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1760
tNamedPipeOpen properties . . . . . . . . . . . 1760
Related scenario . . . . . . . . . . . . . . . . . . . . . . . 1760
tNamedPipeOutput . . . . . . . . . . . . . . . . . . . . . . . . . . 1761
tNamedPipeOutput properties . . . . . . . . . 1761
Scenario: Writing and loading data
through a named-pipe . . . . . . . . . . . . . . . . . 1762
tPivotToColumnsDelimited . . . . . . . . . . . . . . . . . . 1767
tPivotToColumnsDelimited
Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1767
Scenario: Using a pivot column to
aggregate data . . . . . . . . . . . . . . . . . . . . . . . . . 1767
tSqoopExport . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1770
tSqoopImport . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1771
tSqoopImportAllTables . . . . . . . . . . . . . . . . . . . . . . 1772
tSqoopMerge . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1773
tMsgBox . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1958
tMsgBox properties . . . . . . . . . . . . . . . . . . . 1958
Scenario: Hello world! type test . . . . . 1958
tRowGenerator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1960
tRowGenerator properties . . . . . . . . . . . . . 1960
Scenario: Generating random java
data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1961
tPigLoad . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2113
tPigMap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2114
tPigReplicate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2115
tPigSort . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2116
tPigStoreResult . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2117
tReplace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2118
tReplace Properties . . . . . . . . . . . . . . . . . . . . 2118
Scenario 1: Multiple replacements
and column filtering . . . . . . . . . . . . . . . . . . . 2119
Scenario 2: Replacing values and
filtering columns using Map/Reduce
components . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2121
tSampleRow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2129
tSampleRow properties . . . . . . . . . . . . . . . . 2129
Scenario: Filtering rows and groups
of rows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2129
tSortRow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2132
tSortRow properties . . . . . . . . . . . . . . . . . . . 2132
Scenario 1: Sorting entries . . . . . . . . . . . . 2133
tSplitRow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2135
tSplitRow properties . . . . . . . . . . . . . . . . . . 2135
Scenario 1: Splitting one row into
two rows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2135
tWriteJSONField . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2139
tWriteJSONField properties . . . . . . . . . . . 2139
Scenario: Writing flat data into
JSON fields . . . . . . . . . . . . . . . . . . . . . . . . . . . 2140
Related Scenarios . . . . . . . . . . . . . . . . . . . . . 2144
tXMLMap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2145
tXMLMap properties . . . . . . . . . . . . . . . . . . 2145
Scenario 1: Mapping and
transforming XML data . . . . . . . . . . . . . . . 2145
Scenario 2: Launching a lookup
in a second XML flow to join
complementary data . . . . . . . . . . . . . . . . . . . 2150
Scenario 3: Mapping data using a
filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2154
Scenario 4: Catching the data
rejected by lookup and filter . . . . . . . . . . 2157
Scenario 5: Mapping data using a
group element . . . . . . . . . . . . . . . . . . . . . . . . . 2159
Scenario 6: Classing the output data
with aggregate element . . . . . . . . . . . . . . . . 2163
Scenario 7: Restructuring products
data using multiple loop elements . . . . . 2166
Preface
General information
Purpose
This Reference Guide provides use cases and details about how to set parameters for the major
components found in the Palette of the Integration perspective of Talend Studio.
Information presented in this document applies to release 5.4.1.
Audience
This guide is for users and administrators of Talend Studio.
The layout of GUI screens provided in this document may vary slightly from your actual GUI.
Typographical conventions
This guide uses the following typographical conventions:
text in bold: window and dialog box buttons and fields, keyboard keys, menus, and menu options,
text in [bold]: window, wizard, and dialog box titles,
text in courier: system parameters typed in by the user,
text in italics: file, schema, column, row, and variable names referred to in all use cases, and also
names of the fields in the Basic and Advanced setting views referred to in the property table for
each component,
The information icon indicates an item that provides additional information about an important point. It is also used to add comments related to a table or a figure,
The warning icon indicates a message that gives information about the execution requirements or recommendation type. It is also used to refer to situations or information the end-user needs to be aware of or pay special attention to.
Feedback and Support
If you have any questions, concerns or general comments, please take part in our product forums, which can be found at: http://www.talendforge.org/forum/index.php
tBigQueryBulkExec
The tBigQueryOutputBulk and tBigQueryBulkExec components are generally used together as parts of a two-step process. In the first step, an output file is generated. In the second step, this file is used to feed a dataset. These two steps are fused together in the tBigQueryOutput component, detailed in a separate section. The advantage of using two separate components is that the data can be transformed before it is loaded into the dataset.
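Schematically, the two-step flow can be pictured as follows; the component instance names are illustrative only:

input flow --> tBigQueryOutputBulk_1 --> local .txt/.csv file
local .txt/.csv file --> tBigQueryBulkExec_1 --> Google Cloud Storage --> BigQuery dataset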
tBigQueryBulkExec Properties
Component family
Function: This component transfers a given file from Google Cloud Storage to Google BigQuery, or uploads a given file into Google Cloud Storage and then transfers it to Google BigQuery.
Purpose
Basic settings:
Connection: Paste the client ID and the client secret, both created and viewable on the API Access tab view of the project hosting the BigQuery service and the Cloud Storage service you need to use.
Project ID: Paste the ID of the project hosting the BigQuery service you need to use. The default ID of this project can be found in the URL of the Google API Console, or by hovering your mouse pointer over the name of the project in the BigQuery Browser Tool.
Authorization code
Dataset: Enter the name of the dataset you need to transfer data to.
Table: Enter the name of the table you need to transfer data to. If this table does not exist, select the Create the table if it doesn't exist check box.
Action on data
Bulk file already exists in Google storage: Select this check box to reuse the authentication information for the Google Cloud Storage connection, then complete the File and the Header fields.
Access key and Access secret: Paste the authentication information obtained from Google for making requests to Google Cloud Storage. These keys can be consulted on the Interoperable Access tab view under the Google Cloud Storage tab of the project.
File to upload
Bucket: Enter the name of the bucket, the Google Cloud Storage container, that holds the data to be transferred to Google BigQuery.
File
Advanced settings:
Header
Die on error
Token file: Enter the path to, or browse to, the refresh token file you need to use. At the first Job execution using the Authorization code you have obtained from Google BigQuery, the value in this field is the directory and the name of the refresh token file to be created and used; if that token file has been created and you need to reuse it, you have to specify its directory and file name in this field. With only the token file name entered, Talend Studio considers the directory of that token file to be the root of the Studio folder. For further information about the refresh token, see the manual of Google BigQuery.
Encoding: Select the encoding from the list or select Custom and define it manually. This field is compulsory for database data handling.
tStatCatcher Statistics: Select this check box to collect the log data at the component level.
Usage
Limitation: N/A
Related Scenario
For a related topic, see section Scenario: Writing data in BigQuery.
tBigQueryInput
tBigQueryInput Properties
Component family
Function
Purpose
Basic settings:
Connection: Paste the client ID and the client secret, both created and viewable on the API Access tab view of the project hosting the BigQuery service and the Cloud Storage service you need to use.
Project ID: Paste the ID of the project hosting the BigQuery service you need to use. The default ID of this project can be found in the URL of the Google API Console, or by hovering your mouse pointer over the name of the project in the BigQuery Browser Tool.
Authorization code
Token file: Enter the path to, or browse to, the refresh token file you need to use. At the first Job execution using the Authorization code you have obtained from Google BigQuery, the value in this field is the directory and the name of the refresh token file to be created and used; if that token file has been created and you need to reuse it, you have to specify its directory and file name in this field. With only the token file name entered, Talend Studio considers the directory of that token file to be the root of the Studio folder. For further information about the refresh token, see the manual of Google BigQuery.
Query
Advanced settings:
Advanced Separator (for number): Select this check box to change the separator used for the numbers.
Encoding: Select the encoding from the list or select Custom and define it manually. This field is compulsory for database data handling.
tStatCatcher Statistics: Select this check box to collect the log data at the component level.
Usage: This is an input component. It sends the extracted data to the component that follows it.
Limitation: N/A
The following figure shows the schema of the table, UScustomer, that we use as an example to perform the SELECT query in.
We will select the State records and count the occurrence of each State among those records.
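For illustration only, such a query could look like the following when typed in the Query field, which takes a double-quoted Java string; the dataset qualifier documentation and the column name States are assumptions based on this scenario's set-up, and the exact SQL dialect accepted depends on your BigQuery release:

"SELECT States, COUNT(States) AS Count FROM documentation.UScustomer GROUP BY States"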
1. In the Integration perspective of the Studio, create an empty Job, named BigQueryInput for example, from the Job Designs node in the Repository tree view.
For further information about how to create a Job, see the Talend Studio User Guide.
3. Click the [+] button twice to add two rows and enter the names of your choice for each of them in the Column column. In this scenario, they are: States and Count.
4. Click OK to validate these changes and accept the propagation prompted by the pop-up dialog box.
5. Navigate to the Google APIs Console in your web browser to access the Google project hosting the BigQuery and the Cloud Storage services you need to use.
7. In the Component view of the Studio, paste Client ID, Client secret and Project ID from the API Access tab view to the corresponding fields, respectively.
1. In the Run view of the Studio, click Run to execute this Job. The execution will pause at a given moment to print out in the console the URL address used to get the authorization code.
2. Navigate to this address in your web browser and copy the authorization code displayed.
3. In the Component view of tBigQueryInput, paste the authorization code in the Authorization Code field.
Once done, the Run view is opened automatically, where you can check the execution result.
tBigQueryOutput
tBigQueryOutput Properties
Component family
Function: This component writes the data it receives in a user-specified directory and transfers the data to Google BigQuery via Google Cloud Storage.
Purpose: This component transfers the data provided by its preceding component to Google BigQuery.
Basic settings:
Property type: Built-in: You create and store the schema locally for this component only. Related topic: see the Talend Studio User Guide.
Local filename: Browse to, or enter the path to, the file you want to write the received data in.
Append: Select this check box to add rows to the existing data in the file specified in Local filename.
Connection: Paste the client ID and the client secret, both created and viewable on the API Access tab view of the project hosting the BigQuery service and the Cloud Storage service you need to use.
Project ID: Paste the ID of the project hosting the BigQuery service you need to use. The default ID of this project can be found in the URL of the Google API Console, or by hovering your mouse pointer over the name of the project in the BigQuery Browser Tool.
Authorization code
Dataset: Enter the name of the dataset you need to transfer data to.
Table: Enter the name of the table you need to transfer data to. If this table does not exist, select the Create the table if it doesn't exist check box.
Action on data
Access key and Access secret: Paste the authentication information obtained from Google for making requests to Google Cloud Storage.
Bucket: Enter the name of the bucket, the Google Cloud Storage container, that holds the data to be transferred to Google BigQuery.
File
Advanced settings:
Header
Die on error
Token file: Enter the path to, or browse to, the refresh token file you need to use. At the first Job execution using the Authorization code you have obtained from Google BigQuery, the value in this field is the directory and the name of the refresh token file to be created and used; if that token file has been created and you need to reuse it, you have to specify its directory and file name in this field. With only the token file name entered, Talend Studio considers the directory of that token file to be the root of the Studio folder. For further information about the refresh token, see the manual of Google BigQuery.
Field Separator
Create directory if not exists: Select this check box to create the directory you defined in the File field for Google Cloud Storage, if it does not exist.
Custom the flush buffer size: Enter the number of rows to be processed before the memory is freed.
Check disk space
Encoding: Select the encoding from the list or select Custom and define it manually. This field is compulsory for database data handling.
tStatCatcher Statistics: Select this check box to collect the log data at the component level.
Usage: This is an output component used at the end of a Job. It receives data from its preceding component such as tFileInputDelimited, tMap or tMysqlInput.
Limitation: N/A
Scenario: Writing data in BigQuery
1. In the Integration perspective of Talend Studio, create an empty Job, named WriteBigQuery for example, from the Job Designs node in the Repository tree view.
For further information about how to create a Job, see the Talend Studio User Guide.
4. In the Column column, enter the name of your choice for each of the new rows. For example, fname, lname and States.
6. In the Number of Rows for RowGenerator field, enter, for example, 100 to define the number of rows to be generated.
2. Click Sync columns to retrieve the schema from its preceding component.
3. In the Local filename field, enter the directory where you need to create the file to be transferred to BigQuery.
4. Navigate to the Google APIs Console in your web browser to access the Google project hosting the BigQuery and the Cloud Storage services you need to use.
6. In the Component view of the Studio, paste Client ID, Client secret and Project ID from the API Access tab view to the corresponding fields, respectively.
7. In the Dataset field, enter the dataset you need to transfer data in. In this scenario, it is documentation.
This dataset must exist in BigQuery. The following figure shows the dataset used by this scenario.
8. In the Table field, enter the name of the table you need to write data in, for example, UScustomer. If this table does not exist in the BigQuery dataset you are using, select Create the table if it doesn't exist.
9. In the Action on data field, select the action. In this example, select Truncate to empty the contents, if there are any, of the target table and to repopulate it with the transferred data.
1. Navigate to the Google APIs Console in your web browser to access the Google project hosting the BigQuery and the Cloud Storage services you need to use.
2. Click Google Cloud Storage > Interoperable Access to open its view.
3. In the Component view of the Studio, paste Access key and Access secret from the Interoperable Access tab view to the corresponding fields, respectively.
4. In the Bucket field, enter the path to the bucket you want to store the transferred data in. In this example, it is talend/documentation. This bucket must exist in Cloud Storage.
5. In the File field, enter the directory in Google Cloud Storage where you receive and create the file to be transferred to BigQuery. In this example, it is gs://talend/documentation/biquery_UScustomer.csv. The file name must be the same as the one you defined in the Local filename field.
Troubleshooting: if you encounter issues such as "Unable to read source URI" of the file stored in Google Cloud Storage, check whether you put the same file name in these two fields.
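To make the correspondence concrete, a consistent pair of values could look like the following; the local directory is hypothetical, while the Cloud Storage path is the one used in this scenario:

Local filename: "D:/output/biquery_UScustomer.csv"
File: "gs://talend/documentation/biquery_UScustomer.csv"

Only the directories differ; the file name biquery_UScustomer.csv must be identical on both sides.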
1. In the Run view of Talend Studio, click Run to execute this Job. The execution will pause at a given moment to print out in the console the URL address used to get the authorization code.
2. Navigate to this address in your web browser and copy the authorization code displayed.
3. In the Component view of tBigQueryOutput, paste the authorization code in the Authorization Code field. Press F6.
Once done, the Run view is opened automatically, where you can check the execution result.
tBigQueryOutputBulk
The tBigQueryOutputBulk and tBigQueryBulkExec components are generally used together as parts of a two-step process. In the first step, an output file is generated. In the second step, this file is used to feed a dataset. These two steps are fused together in the tBigQueryOutput component, detailed in a separate section. The advantage of using two separate components is that the data can be transformed before it is loaded into the dataset.
tBigQueryOutputBulk Properties
Component family
Function: This component writes given data into a .txt or .csv file, ready to be transferred to Google BigQuery.
Purpose: This component creates a .txt or .csv file for data of large size so that you can process it according to your needs before transferring it to Google BigQuery.
Basic settings:
Property type: Built-in: You create and store the schema locally for this component only. Related topic: see the Talend Studio User Guide.
File name: Browse to, or enter the path to, the .txt or .csv file you need to generate.
Append: Select the check box to write new data at the end of the existing data. Otherwise, the existing data will be overwritten.
Field Separator
Advanced settings:
Create directory if not exists: Select this check box to create the directory you defined in the File field for Google Cloud Storage, if it does not exist.
Custom the flush buffer size: Enter the number of rows to be processed before the memory is freed.
Encoding: Select the encoding from the list or select Custom and define it manually. This field is compulsory for database data handling.
tStatCatcher Statistics: Select this check box to collect the log data at the component level.
Global Variables:
NB_LINE: Indicates the number of rows read by an input component or transferred to an output component. This is an After variable and it returns an integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it.
For further information about variables, see the Talend Studio User Guide.
A Flow variable means it functions during the execution of a component, while an After variable means it functions after the execution of a component.
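Since Talend Jobs are generated as Java code, such a variable can be read from the Job's globalMap once the component has finished, for example in a tJava component placed after the subjob. A minimal sketch, assuming the component instance is named tBigQueryOutputBulk_1 (your instance name may differ):

// After variable: available once tBigQueryOutputBulk_1 has finished executing.
// The key follows the pattern <component instance name>_<variable name>.
Integer nbLine = (Integer) globalMap.get("tBigQueryOutputBulk_1_NB_LINE");
System.out.println("Rows written to the bulk file: " + nbLine);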
Usage: This is an output component which needs the data provided by its preceding component.
Limitation: N/A
Related Scenario
For a related topic, see section Scenario: Writing data in BigQuery.
tCassandraBulkExec
The tCassandraOutputBulk and tCassandraBulkExec components are generally used together as parts of a two-step process. In the first step, an SSTable is generated. In the second step, this SSTable is written into Cassandra. These two steps are fused together in the tCassandraOutputBulkExec component, detailed in a separate section. The advantage of using two separate components is that the data can be transformed before it is loaded into Cassandra.
tCassandraBulkExec properties
Component family: Big Data / Cassandra
Function
Purpose: As a dedicated component, tCassandraBulkExec allows you to gain in performance while carrying out Insert operations on a Cassandra column family.
Basic settings:
DB Version
Host
Port
Required authentication: Select this check box to provide credentials for the Cassandra authentication. This check box will not appear if you select Cassandra 1.1.2 from the DB Version list.
Username: Fill in this field with the username for the Cassandra authentication.
Password: Fill in this field with the password for the Cassandra authentication.
Keyspace: Type in the name of the keyspace into which you want to write the SSTable.
Column family: Type in the name of the column family into which you want to write the SSTable.
SSTable directory: Specify the local directory of the SSTable to be loaded into Cassandra. Note that the complete path to the SSTable will be the local directory appended by the specified keyspace name and column family name. For example, if you set the local directory to /home/talend/sstable, and specify testk as the keyspace name and testc as the column family name, the complete path to the SSTable will be /home/talend/sstable/testk/testc/.
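In other words, the directory the component actually reads is composed from the three settings above. A minimal Java sketch of this rule, using the example values only as illustrations:

// SSTable location = <SSTable directory>/<keyspace>/<column family>/
String sstableDir = "/home/talend/sstable"; // the SSTable directory field
String keyspace = "testk";                  // the Keyspace field
String columnFamily = "testc";              // the Column family field
String completePath = sstableDir + "/" + keyspace + "/" + columnFamily + "/";
// completePath evaluates to "/home/talend/sstable/testk/testc/"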
Advanced settings:
tStatCatcher Statistics: Select this check box to gather the Job processing metadata at the Job level as well as at each component level.
Usage
Limitation
Related Scenario
No scenario is available for this component yet.
tCassandraClose
tCassandraClose properties
Component family
Function
Purpose
Basic settings
Component List
Advanced settings
tStatCatcher Statistics
Usage
Limitation
n/a
Related Scenario
For a scenario in which tCassandraClose is used, see section Scenario: Handling data with Cassandra.
tCassandraConnection
tCassandraConnection properties
Component Family
Function
Purpose
Basic settings
DB Version
Server
Port
Username: Fill in this field with the username for the Cassandra authentication.
Password: Fill in this field with the password for the Cassandra authentication.
Advanced settings:
tStatCatcher Statistics: Select this check box to gather the Job processing metadata at the Job level as well as at each component level.
Usage: This component is generally used with other Cassandra components, particularly tCassandraClose.
Limitation: n/a
Related scenario
For a scenario in which tCassandraConnection is used, see section Scenario: Handling data with Cassandra.
tCassandraInput
tCassandraInput properties
Component family
Function: tCassandraInput allows you to read data from a Cassandra keyspace and send the data into the Talend flow.
Purpose: tCassandraInput allows you to extract the desired data from a standard or super column family of a Cassandra keyspace so as to apply changes to the data.
Basic settings:
Use existing connection: Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
DB Version
Host
Port
Required authentication
Username: Fill in this field with the username for the Cassandra authentication.
Password: Fill in this field with the password for the Cassandra authentication.
Keyspace configuration:
Keyspace: Type in the name of the keyspace from which you want to read data.
Column family: Type in the name of the column family from which you want to read data.
Include key in output columns: Select this check box to include the key of the column family in output columns. Key column: select the key column from the list.
Row key type: Select the appropriate Talend data type for the row key from the list.
Row key Cassandra type: Select the corresponding Cassandra type for the row key from the list. The value of the Default option varies with the selected row key type. For example, if you select String from the Row key type list, the value of the Default option will be UTF8. For more information, see section Mapping table between Cassandra type and Talend data type.
Include super key in output columns: Select this check box to include the super key of the column family in output columns. Super key column: select the desired super key column from the list. This check box appears only if you select Super from the Column family type drop-down list.
Super column type
Super column Cassandra type: Select the corresponding Cassandra type for the super column from the list. For more information, see section Mapping table between Cassandra type and Talend data type.
Query configuration:
Specify row keys: Select this check box to specify the row keys of the column family directly.
Row Keys: Type in the specific row keys of the column family in the correct format depending on the row key type. This field appears only if you select the Specify row keys check box.
Key start
Key end
Key limit: Type in the number of rows to be read between the start row key and the end row key.
Specify columns: Select this check box to specify the column names of the column family directly.
Columns
Advanced settings:
tStatCatcher Statistics: Select this check box to gather the Job processing metadata at the Job level as well as at each component level.
Global Variables:
NB_LINE: Indicates the number of rows read by an input component or transferred to an output component. This is an After variable and it returns an integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it.
For further information about variables, see the Talend Studio User Guide.
A Flow variable means it functions during the execution of a component, while an After variable means it functions after the execution of a component.
Usage
Limitation: n/a
Mapping table between Cassandra type and Talend data type:

Cassandra Type    Talend Data Type
BytesType         byte[]
AsciiType         String
UTF8Type          String
IntegerType       Object
Int32Type         Integer
LongType          Long
UUIDType          String
TimeUUIDType      String
DateType          Date
BooleanType       Boolean
FloatType         Float
DoubleType        Double
DecimalType       BigDecimal
Scenario: Handling data with Cassandra
This scenario uses the following components:
tFileInputDelimited: reads the input file, defines the data structure and sends it to the next component.
tCassandraOutput: writes the data it receives from the preceding component into a Cassandra keyspace.
tCassandraInput: reads the data from the Cassandra keyspace.
tLogRow: displays the data it receives from the preceding component on the console.
tCassandraClose: closes the connection to the Cassandra server.
1. Drop the following components from the Palette onto the design workspace: tCassandraConnection, tFileInputDelimited, tCassandraOutput, tCassandraInput, tLogRow and tCassandraClose.
1. Double-click the tCassandraConnection component to open its Basic settings view in the Component tab.
2. Select the Cassandra version that you are using from the DB Version list. In this example, it is Cassandra 1.1.2.
3. In the Server field, type in the hostname or IP address of the Cassandra server. In this example, it is localhost.
4. In the Port field, type in the listening port number of the Cassandra server.
5. If required, type in the authentication information for the Cassandra connection: Username and Password.
2. Click the [...] button next to the File Name/Stream field to browse to the file that you want to read data from. In this scenario, the directory is D:/Input/Employees.csv. The CSV file contains four columns: id, age, name and ManagerID.
id;age;name;ManagerID
1;20;Alex;1
2;40;Peter;1
3;25;Mark;1
4;26;Michael;1
5;30;Christophe;2
6;26;Stephane;3
7;37;Cedric;3
8;52;Bill;4
9;43;Jack;2
10;28;Andrews;4
3. Click Edit schema to define the data to pass on to the tCassandraOutput component.
1. Double-click the tCassandraOutput component to open its Basic settings view in the Component tab.
2. Type in required information for the connection or use the existing connection you have configured before. In this scenario, the Use existing connection check box is selected.
3. In the Keyspace configuration area, type in the name of the keyspace: Employee in this example.
4. In the Column family configuration area, type in the name of the column family: Employee_Info in this example.
5. In the Action on data list, select the action you want to carry out.
6. Click Sync columns to retrieve the schema from the preceding component.
7. Select the key column of the column family from the Key column list. If needed, select the Include key in columns check box.
2. Type in required information for the connection or use the existing connection you have configured before. In this scenario, the Use existing connection check box is selected.
3. In the Keyspace configuration area, type in the name of the keyspace: Employee in this example.
4. In the Column family configuration area, type in the name of the column family: Employee_Info in this example.
5. If needed, select the Include key in output columns check box, and then select the key column of the column family you want to include from the Key column list.
6. From the Row key type list, select Integer because id is of integer type in this example. Keep the Default option for the row key Cassandra type because its value will become the corresponding Cassandra type Int32 automatically.
7. In the Query configuration area, select the Specify row keys check box and specify the row keys directly. In this example, three rows will be read. Next, select the Specify columns check box and specify the column names of the column family directly. This scenario will read three columns from the keyspace: id, name and age.
8. If needed, the Key start and the Key end fields allow you to define the range of rows, and the Key limit field allows you to specify the number of rows within the range of rows to be read. Similarly, the Columns range start and the Columns range end fields allow you to define the range of columns of the column family, and the Columns range limit field allows you to specify the number of columns within the range of columns to be read.
9. Select Edit schema to define the data structure to be read from the Cassandra keyspace.
tCassandraOutput
tCassandraOutput properties
Component family
Function: tCassandraOutput receives data from the preceding component, and writes data into Cassandra.
Purpose: tCassandraOutput allows you to write data into or delete data from a column family of a Cassandra keyspace.
Basic settings:
DB Version
Host
Port
Required authentication
Username
Password
Keyspace configuration:
Keyspace
Action on keyspace
Sync columns
Super columns
Include super columns in standard columns: Select this check box to include the super columns in standard columns.
Delete row
Delete columns
Advanced settings:
Batch Size
tStatCatcher Statistics
Global Variables
Usage: This component is used as an output component and it always needs an incoming link.
Limitation: n/a
Related Scenario
For a scenario in which tCassandraOutput is used, see section Scenario: Handling data with Cassandra.
tCassandraOutputBulk
The tCassandraOutputBulk and tCassandraBulkExec components are generally used together as parts of a two-step process. In the first step, an SSTable is generated. In the second step, this SSTable is written into Cassandra. These two steps are fused together in the tCassandraOutputBulkExec component, detailed in a separate section. The advantage of using two separate components is that the data can be transformed before it is loaded into Cassandra.
tCassandraOutputBulk properties
Component family: Big Data / Cassandra
Function: tCassandraOutputBulk receives data from the preceding component, and creates an SSTable locally.
Purpose: tCassandraOutputBulk allows you to prepare an SSTable of large size and process it according to your needs before loading this SSTable into a column family of a Cassandra keyspace.
Basic settings:
Schema and Edit Schema: A schema is a row description. It defines the number of fields to be processed and passed on to the next component. If you are using Talend Open Studio for Big Data, only the Built-in mode is available. Click Edit Schema to make changes to the schema.
DB Version
Keyspace: Type in the name of the keyspace into which you want to write the SSTable.
Column family: Type in the name of the column family into which you want to write the SSTable.
Partitioner: Select the partitioner which determines how data is distributed across the Cassandra cluster.
Random: default partitioner in Cassandra 1.1 and earlier.
Murmur3: default partitioner in Cassandra 1.2.
Order preserving: not recommended because it assumes keys are UTF8 strings.
For more information about the partitioners, see http://wiki.apache.org/cassandra/Partitioners.
Column name comparator: Select the data type for the column names, which is used to sort columns. For more information about the comparators, see http://www.datastax.com/docs/1.1/ddl/column_family#about-data-types-comparators-and-validators.
SSTable directory: Specify the local directory for the SSTable. Note that the complete path to the SSTable will be the local directory appended by the specified keyspace name and column family name. For example, if you set the local directory to /home/talend/sstable, and specify testk as the keyspace name and testc as the column family name, the complete path to the SSTable will be /home/talend/sstable/testk/testc/.
Buffer size: Specify what size the SSTable must reach before it is written into Cassandra.
Advanced settings:
tStatCatcher Statistics: Select this check box to gather the Job processing metadata at the Job level as well as at each component level.
Usage
Limitation: n/a
Related scenario
No scenario is available for this component yet.
tCassandraOutputBulkExec
The tCassandraOutputBulk and tCassandraBulkExec components are generally used together to output data to an SSTable and then to write the SSTable into Cassandra, in a two-step process. These two steps are fused together in the tCassandraOutputBulkExec component.
tCassandraOutputBulkExec properties
Component family
Big
Data
Cassandra
Function
tCassandraOutputBulkExec receives data from the preceding component, creates an SSTable and then
writes the SSTable into Cassandra.
Purpose
Basic settings
Schema and Edit A schema is a row description. It defines the number of fields to be processed and
Schema
passed on to the next component.
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Click Edit Schema to make changes to the schema.
DB Version
Host
Port
Required
authentication
Select this check box to provide credentials for the Cassandra authentication.
This check box will not appear if you select Cassandra 1.1.2 from theDB
Version list.
Username
Fill in this field with the username for the Cassandra authentication.
Password
Fill in this field with the password for the Cassandra authentication.
Keyspace
Type in the name of the keyspace into which you want to write the SSTable.
Column family
Type in the name of the column family into which you want to write the SSTable.
Partitioner
Select the partitioner which determines how the data is distributed across the
Cassandra cluster.
Random: default partitioner in Cassandra 1.1 and earlier.
Murmur3: default partitioner in Cassandra 1.2.
Order preserving: not recommended because it assumes keys are UTF8 strings.
For more information about the partitioner, see http://wiki.apache.org/cassandra/
Partitioners.
Column
comparator
name Select the data type for the column names, which is used to sort columns.
For more information about the comparators, see http://www.datastax.com/docs/1.1/
ddl/column_family#about-data-types-comparators-and-validators.
SSTable directory
Specify the local directory for the SSTable. Note that the complete path to the
SSTable will be the local directory appended by the specified keyspace name and
column family name.
For example, if you set the local directory to /home/talend/sstable, and specify testk
as the keyspace name and testc as the column family name, the complete path to the
SSTable will be /home/talend/sstable/testk/testc/.
Buffer size
Specify the size the SSTable must reach before it is written into Cassandra.
Advanced settings
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at the Job level as well
as at each component level.
Usage
This component is mainly used when no particular transformation is required on the data to be loaded into
the database.
Limitation
Related Scenario
No scenario is available for this component yet.
tCassandraRow
tCassandraRow properties
Component Family
Function
tCassandraRow is the specific component for this database query. It executes the Cassandra Query Language (CQL) query stated against the specified database. The Row suffix means the component implements a flow in the Job design although it does not provide output.
Purpose
Depending on the nature of the query and the database, tCassandraRow acts on the actual DB structure
or on the data (although without handling data).
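As an illustration of the kind of CQL statement such a component typically runs, here is a minimal sketch using the DataStax Java driver against Cassandra 1.2 or later; the contact point, keyspace name and replication settings are assumptions for the example, not values prescribed by this component:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class CqlRowSketch {
    public static void main(String[] args) {
        // Connect to an assumed local Cassandra node
        Cluster cluster = Cluster.builder().addContactPoint("localhost").build();
        Session session = cluster.connect();
        // A DDL statement that acts on the DB structure rather than on data
        session.execute("CREATE KEYSPACE testk WITH replication = "
                + "{'class': 'SimpleStrategy', 'replication_factor': 1}");
        cluster.shutdown();
    }
}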
Basic settings
Use existing connection
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
DB Version
Host
Port
Required Authentication
Username
Fill in this field with the username for the Cassandra authentication.
Password
Fill in this field with the password for the Cassandra authentication.
Keyspace
Type in the name of the keyspace on which you want to execute the CQL commands.
Column configuration
Query
Die on error
This check box is cleared by default, meaning to skip the row on error
and to complete the process for error-free rows.
Advanced settings
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at the Job
level as well as at each component level.
Limitation
Related scenario
For related topics, see
section Scenario 1: Removing and regenerating a MySQL table index.
section Scenario 2: Using PreparedStatement objects to query data.
tCouchbaseClose
tCouchbaseClose properties
Component family
Function
Purpose
This component closes a connection to the Couchbase bucket when all transactions are done, in
order to guarantee the integrity of transactions.
Basic settings
Component list
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Usage
Limitation
n/a
Related scenario
For a related scenario, see section Scenario: Inserting documents to a data bucket in the Couchbase database.
tCouchbaseConnection
tCouchbaseConnection properties
Component family
Function
Purpose
This component allows you to create a connection to a Couchbase bucket and reuse that connection
in other components.
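A minimal sketch of what opening and closing such a connection looks like with the 1.x Couchbase Java client; the node URI, bucket name and empty password are assumptions for the example:

import java.net.URI;
import java.util.Arrays;
import com.couchbase.client.CouchbaseClient;

public class CouchbaseConnectSketch {
    public static void main(String[] args) throws Exception {
        // Connect to an assumed local Couchbase node and bucket
        CouchbaseClient client = new CouchbaseClient(
                Arrays.asList(URI.create("http://localhost:8091/pools")),
                "blog", "");
        // ... reuse the client in other operations ...
        client.shutdown(); // the equivalent of tCouchbaseClose
    }
}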
Basic settings
DB Version
Data Bucket
URIs
Advanced settings
tStatCatcher Statistics
Select this check box to collect the log data at a component level.
Usage
This component is generally used with other Couchbase components, especially tCouchbaseClose.
Limitation
n/a
Related scenario
For a related scenario, see section Scenario: Inserting documents to a data bucket in the Couchbase database.
tCouchbaseInput
tCouchbaseInput Properties
Component family
Function
tCouchbaseInput allows you to fetch your documents from the Couchbase database either by
the unique key or through Views.
Purpose
This component allows you to query the documents from the Couchbase database.
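A minimal sketch of fetching one document by its unique key with the 1.x Couchbase Java client; the node URI and bucket name are assumptions, while the key reuses one of the document IDs from Scenario 1 below:

import java.net.URI;
import java.util.Arrays;
import com.couchbase.client.CouchbaseClient;

public class CouchbaseGetSketch {
    public static void main(String[] args) throws Exception {
        CouchbaseClient client = new CouchbaseClient(
                Arrays.asList(URI.create("http://localhost:8091/pools")),
                "blog", "");
        // Fetch a JSON document by its unique key
        Object doc = client.get("ELT Overview");
        System.out.println(doc);
        client.shutdown();
    }
}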
Basic settings
Use existing connection
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
DB Version
Data Bucket
URIs
Key
Query by view
Design document
Doc action
View
View action
Map
Reduce
Startkey
Endkey
Startkey docid
Endkey docid
Use complex key
Select this check box to show the Complexkey field, where you can type in the complex keys for the view queries.
Note that here the keys refer to the values of the key defined in the Map function.
Key (in the Query by view mode)
Not available when Use complex key is selected.
Include docs
Select this check box to include the document specified by the Key
in the view results.
Note that the JSONDoc field appears in the schema once this check
box is selected.
Inclusive end
Select this check box to include the specified end key in the result.
Descending
Stale
JSON Configuration
Limit
Skip
JSON field
Mapping
Advanced settings
Die on error
Debug
tStatCatcher Statistics
Select this check box to collect the log data at the component level.
Global Variables
NB_LINE: Indicates the number of rows read by an input component or transferred to an output
component. This is an After variable and it returns an integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and
choose the variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable
means it functions after the execution of a component.
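In the Java code generated for a Job, such variables are read from the Job's globalMap; a minimal sketch of a fragment you might place in a tJava component (the component name and index tCouchbaseInput_1 are an assumption for the example):

// After tCouchbaseInput_1 has executed, read its After variable from globalMap
Integer nbLine = (Integer) globalMap.get("tCouchbaseInput_1_NB_LINE");
System.out.println("Rows read: " + nbLine);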
Usage
As a start component, tCouchbaseInput reads the documents from the Couchbase database
either by the unique key or through Views.
Limitation
n/a
Scenario 1: Querying JSON documents in the Couchbase database by unique document IDs
For how to write such documents to the database, see section Scenario: Inserting documents to a data bucket in the Couchbase database.
1. In the Data Bucket field, enter the name of the data bucket in the Couchbase database.
2. In the Password field, enter the password for access to the data bucket.
3. In the URIs table, click the [+] button to add lines as needed, where you can enter the URIs of the Couchbase server nodes.
4. Select the Use existing connection check box to reuse the connection.
5. In the Key field, enter the document IDs, for example "ELT Overview", "Integration at any scale".
6. Click the Edit schema button to open the schema editor. The schema contains two pre-defined fields, Key and Value.
7. Select Table (print values in cells of a table) for a better display of the results.
As shown above, the JSON documents with the keys (IDs) of ELT Overview and Integration at any scale are retrieved.
Scenario 2: Querying JSON documents in the Couchbase database through view queries
For how to write such documents to the database, see section Scenario: Inserting documents to a data bucket in
the Couchbase database.
1. In the Data Bucket field, enter the name of the data bucket in the Couchbase database.
2. In the Password field, enter the password for access to the data bucket.
3. In the URIs table, click the [+] button to add lines as needed, where you can enter the URIs of the Couchbase server nodes.
4. Select the Use existing connection check box to reuse the connection.
5. Select the Query by view check box to define the View functions and other filters.
6. In the Design document field, enter the design document name of the View.
In the Doc action list, select Drop create to remove an existing design document and create it again.
In the View field, enter the name of the View.
In the View action list, select Create to create the View.
7. Define the Map function; here, the Key is doc.id, namely the id field of the JSON documents, and the Value is [doc.title,doc.contents], namely the title and contents fields of the JSON documents (see the sketch after this procedure).
8. Click the Edit schema button to open the schema editor. The schema contains four pre-defined fields, Id, Key, Value and jsonDoc.
In this scenario, Id holds the document ID, Key holds the id field of the JSON documents, Value holds the title and contents fields of the JSON documents and jsonDoc holds the entire JSON documents.
9. Select the Include docs check box to retrieve the entire documents.
10. Double-click tLogRow to open its Basic settings view.
11. Select Table (print values in cells of a table) for a better display of the results.
As shown above, the View is created and the document information is correctly fetched.
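For reference, a minimal sketch of creating such a view with the 1.x Couchbase Java client; the design document and view names are assumptions, while the map function mirrors the Key/Value mapping described in step 7 above:

import java.net.URI;
import java.util.Arrays;
import com.couchbase.client.CouchbaseClient;
import com.couchbase.client.protocol.views.DesignDocument;
import com.couchbase.client.protocol.views.ViewDesign;

public class CouchbaseViewSketch {
    public static void main(String[] args) throws Exception {
        CouchbaseClient client = new CouchbaseClient(
                Arrays.asList(URI.create("http://localhost:8091/pools")), "blog", "");
        // Map function: Key = doc.id, Value = [doc.title, doc.contents]
        ViewDesign view = new ViewDesign("by_id",
                "function (doc) { emit(doc.id, [doc.title, doc.contents]); }");
        DesignDocument designDoc = new DesignDocument("blog_views");
        designDoc.getViews().add(view);
        client.createDesignDoc(designDoc); // create or re-create the design document
        client.shutdown();
    }
}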
tCouchbaseOutput
tCouchbaseOutput Properties
Component family
Function
tCouchbaseOutput inserts, updates, upserts or deletes the documents in the Couchbase database
which are stored in the form of Key/Value pairs, where the Value can be JSON or binary data.
Purpose
This component allows you to perform actions on the JSON or binary documents stored in the Couchbase database based on the incoming flat data from a file, a database table, etc.
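A minimal sketch of the corresponding client-side upsert with the 1.x Couchbase Java client; the key and JSON value are illustrative assumptions:

import java.net.URI;
import java.util.Arrays;
import com.couchbase.client.CouchbaseClient;

public class CouchbaseSetSketch {
    public static void main(String[] args) throws Exception {
        CouchbaseClient client = new CouchbaseClient(
                Arrays.asList(URI.create("http://localhost:8091/pools")), "blog", "");
        // Wrap flat fields into a JSON document and store it under its key
        String key = "ELT Overview";
        String json = "{\"author\":\"Talend\",\"title\":\"ELT Overview\"}";
        client.set(key, 0, json).get(); // 0 = no expiry; get() waits for completion
        client.shutdown();
    }
}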
Basic settings
Use existing connection
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
DB Version
Data Bucket
URIs
Key
Value
Action on data
Advanced settings
Die on error
Expire
tStatCatcher Statistics
Select this check box to collect the log data at the component level.
Global Variables
NB_LINE: Indicates the number of rows read by an input component or transferred to an output
component. This is an After variable and it returns an integer.
NB_LINE_INSERTED: Indicates the number of rows inserted. This is an After variable and
it returns an integer.
NB_LINE_REJECTED: Indicates the number of rows rejected. This is an After variable and
it returns an integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and
choose the variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable
means it functions after the execution of a component.
Usage
Preceded by an input component, tCouchbaseOutput wraps flat data into JSON documents for
storage in the Couchbase database.
Limitation
n/a
Scenario: Inserting documents to a data bucket in the Couchbase database
1. In the Data Bucket field, enter the name of the data bucket in the Couchbase database.
2. In the Password field, enter the password for access to the data bucket.
3. In the URIs table, click the [+] button to add lines as needed, where you can enter the URIs of the Couchbase server nodes.
4. Click the [+] button to add four columns, namely id, author, title and contents, of the string type.
Click OK to validate the setup and close the editor.
5. Select the Use existing connection check box to reuse the connection.
6. In the Key list, select the field title whose values will be used as the IDs of documents inserted to the Couchbase database.
7. Select the Generate JSON Document check box and click the Configure JSON Tree button to open the JSON tree mapper.
8. Press the Shift key to select all the fields in the Linker source area and drop them onto the rootTag node in the Link target part.
9. In the pop-up box, select Create as sub-element of target node.
10. Go to the Couchbase web console and view the documents stored in the data bucket blog:
As shown above, the source records have been saved in the Couchbase database in the form of JSON documents.
tCouchDBClose
tCouchDBClose properties
Component family
Function
Purpose
Basic settings
Component List
Advanced settings
tStatCatcher Statistics
Usage
Limitation
n/a
Related Scenario
For a scenario in which tCouchDBClose is used, see section Scenario: Replicating data from the source database
to the target database.
tCouchDBConnection
tCouchDBConnection properties
Component Family
Function
Purpose
tCouchDBConnection enables the reuse of the connection it creates to a CouchDB server, and allows
you to configure replication parameters if a replication is triggered between the source database and the
target database.
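Under the hood, this kind of CouchDB replication is driven by a JSON document posted to the server's _replicate endpoint. A minimal sketch, where the server URL is an assumption and the database names mirror the scenario below:

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class CouchDbReplicateSketch {
    public static void main(String[] args) throws Exception {
        // Replication parameters: source, target, continuous, create_target
        String body = "{\"source\":\"bookstore_old\",\"target\":\"bookstore_new\","
                + "\"continuous\":true,\"create_target\":true}";
        HttpURLConnection conn = (HttpURLConnection)
                new URL("http://localhost:5984/_replicate").openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        try (OutputStream os = conn.getOutputStream()) {
            os.write(body.getBytes("UTF-8"));
        }
        System.out.println("HTTP " + conn.getResponseCode());
    }
}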
Basic settings
DB Version
Server
Port
Database
Use replication
Select this check box to set the replication in the table that appears.
Advanced settings
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at the Job
level as well as at each component level.
Usage
This component is generally used with other CouchDB components, particularly tCouchDBClose.
Limitation
n/a
Related scenario
For a scenario in which tCouchDBConnection is used, see section Scenario: Replicating data from the source
database to the target database.
tCouchDBInput
tCouchDBInput properties
Component family
Function
tCouchDBInput allows you to read data from CouchDB and send it into the Talend flow.
Purpose
tCouchDBInput is used to extract the desired JSON data out of a CouchDB database in order to transform it, migrate it to another target format, or process it before inserting it back into the same database.
Basic settings
Schema and Edit Schema A schema is a row description. It defines the number of fields to be processed
and passed on to the next component.
If you are using Talend Open Studio for Big Data, only the Built-in mode is
available.
Click Edit Schema to make changes to the schema.
The columns in the schema may vary depending on your configuration:
If you select the Query by view check box and the Is reduce check box at the
same time and specify a group level after selecting the Group check box, only
the key and value columns are available in the schema.
If you select the Include docs check box but do not select the Is reduce check
box, the id, key, value and JSONDoc columns are available in the schema.
If you keep both the Is reduce and Include docs check boxes cleared, the id, key, and value columns are available in the schema.
Connection
Use existing connection
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
DB Version
Server
Port
Database
Replication
Target DB name
Specify the target database to which the documents will be copied. It can be a local database name or a remote database URL.
Continuous
Select this check box to continue a replication upon the server restart.
Create target DB
Select this check box to create the target database if it does not exist.
Is Canceled
Select this check box to cancel the existing replication between the specified source database and target database at the end of the Job.
Querying options
Query by view
Select this check box to specify query conditions based on a view which involves one map function and one optional reduce function.
Design Document
Type in the name of the design document from which you want to read data.
Action on design document
Select the operation you want to perform on the design document of interest:
None: No operation is carried out.
Drop and create design document: The design document is removed and
created again.
Create design document: A new design document is created.
View
Type in the name of the view from which you want to read data.
Action on view
Map
Reduce
Start key
End key
Is reduce
Select this check box to make the reduce function take effect.
Group
Select this check box to make the reduce function reduce to a set of distinct keys
or to a single result row.
This check box appears only if you select the Is reduce check box.
Group level
Enter the specific group level in this field after you select the Group check box.
Include docs
Select this check box to include the document which emitted each view entry.
This check box appears only if you do not select the Is reduce check
box.
Descending
Add options
Select this check box to add more query options and define the parameters as
needed.
Extract JSON field
Select this check box to extract the desired JSON data based on an XPath query.
JSON field
Mapping
Schema output column: schema defined to hold the data extracted from the
JSON field.
XPath query: XPath query to specify the node within the JSON field to be
extracted.
Get Nodes: select this check box if you need to get values from a nested node
within the JSON field.
Limit
Die on error
This check box is cleared by default, meaning to skip the row on error and to
complete the process for error-free rows.
Advanced settings
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at the Job level as
well as at each component level.
Global Variables
NB_LINE: Indicates the number of rows read by an input component or transferred to an output component.
This is an After variable and it returns an integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose
the variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable means it
functions after the execution of a component.
Usage
Limitation
n/a
Related Scenario
For a scenario in which tCouchDBInput is used, see section Scenario: Replicating data from the source database
to the target database.
tCouchDBOutput
tCouchDBOutput properties
Component family
Function
tCouchDBOutput receives data from the preceding component, and writes data into CouchDB.
Purpose
tCouchDBOutput allows you to load JSON documents, write data into or remove data from them and then save
the documents back to the database on a CouchDB server.
Basic settings
Schema and Edit schema
A schema is a row description. It defines the number of fields to be processed and passed on to the next component.
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Click Edit Schema to make changes to the schema.
Sync columns
Click this button to retrieve the schema from the previous component connected in the Job.
Connection
Use existing connection
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
DB Version
Server
Port
Database
Replication
Use replication
Target DB name
Specify the target database to which the documents will be copied. It can be a local database name or a remote database URL.
Continuous
Select this check box to continue a replication upon the server restart.
Create target DB
Select this check box to create the target database if it does not exist.
Is Canceled
Select this check box to cancel the existing replication between the specified source
database and target database at the end of the Job.
Action on data
JSON Configuration
Generate JSON Document
Select this check box to generate a JSON document and configure the desired data structure for it.
Key
Select the key that you want to use from the list.
Configure JSON Tree Click the [...] button to open the window for JSON tree configuration.
Advanced settings
Group by
Customize the input columns based on which you want to group the data.
Die on error
This check box is cleared by default, meaning to skip the row on error and to complete
the process for error-free rows.
tStatCatcher Statistics Select this check box to gather the Job processing metadata at the Job level as well as at
each component level.
Global Variables
NB_LINE: Indicates the number of rows read by an input component or transferred to an output component. This
is an After variable and it returns an integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable
to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable means it functions
after the execution of a component.
Usage
This component is used as an output component and it always needs an incoming link.
Limitation
n/a
Scenario: Replicating data from the source database to the target database
Drop the following components from the Palette onto the design workspace: tCouchDBConnection, tFileInputDelimited, tCouchDBOutput, tCouchDBInput, tLogRow and tCouchDBClose.
1. Double-click the tCouchDBConnection component to open its Basic settings view in the Component tab.
2. In the Database field, type in the name of the database you want to use: bookstore_old in this example.
3. Select the CouchDB version that you are using from the DB Version list.
4. In the Replicate target database area, click [+] to add one line for database replication settings.
5. Enter the name of the target database: bookstore_new in this example.
6. Select the Continuous check box to continue the replication upon the server restart.
7. In this example, the target database does not exist. Select the Create target DB check box to create the target database.
8. Select the Is Canceled check box to cancel the replication between bookstore_old and bookstore_new at the end of the Job.
1. Click the [...] button next to the File name/Stream field to browse to the file that you want to read data from. In this scenario, it is D:/Input/bookstore.txt. The file contains six columns: _id, title, author, category, ISBN, and abstract.
_id;title;author;category;ISBN;abstract
001;Computer Networks: A Systems Approach;Larry L. Peterson, Bruce S. Davie;Computer
Science;0123850606;This best-selling and classic book teaches you the key
principles of computer networks with examples drawn from the real world of network
and protocol design.
002;David Copperfield;Charles Dickens;Language&Literature;1555763227;This adaptation
of the original story is presented in the format of a novel study, complete with
exercises and vocabulary lists, and is geared to the language arts classes of
grades 4 and 5.
003;Life of Pi;Yann Martel;Language&Literature;0547350651;The son of a zookeeper,
Pi Patel has an encyclopedic knowledge of animal behavior and a fervent love of
stories.
004;Les Miserables: Easyread Comfort Edition;Victor
Hugo;Language&Literature;1425048250;Expressing the author's ideas about society,
religion and politics, it is in the backdrop of Napoleonic Wars and ensuing years
that the story unravels. Grace, moral philosophy, law and history of France are
discussed.
005;Computer Security;Dieter Gollmann;Computer Science;0470741155;This text moves
away from the 'multi-level' security approach to compare and evaluate design
alternatives in computer security.
006;Advanced Database Systems;Carlo Zaniolo;Database;155860443X;This book, written
by a team of leading specialists in their fields, introduces the research issues at
the forefront of database technology and supports them with a variety of examples.
2. In the Header field, type in 1 so that the header of the file will be skipped.
3. Click Edit schema to define the data to pass on to the tCouchDBOutput component.
1. Double-click the tCouchDBOutput component to open its Basic settings view in the Component tab.
2. Click Sync columns to retrieve the schema from the preceding component.
3. Select the Use an existing connection check box. In this example, the replication is triggered when opening the CouchDB connection.
1. Click Edit schema to define the data structure to be read from the CouchDB database.
By default, the Include docs check box is selected, so the id, key, value and jsonDoc columns are available in the schema.
In this example, we define four columns to be extracted: id, title, author and category.
2. In the Database field, enter the name of the database from which the replicated data will be read. In this example, it is bookstore_new.
3. In the Querying options area, type in the start key and end key to set the range of the data to be read: "001" and "006" in this example.
4. Select the Extract JSON field check box to extract the desired data.
5. In the Mapping area, click [+] to add items. Select the schema output column from the list and then type in the proper XPath query.
Click Edit schema to define the data structure to be displayed on the console. In this example, we need to remove the jsonDoc column.
The book information read from the replicated database is shown on the console.
tGSBucketCreate
tGSBucketCreate properties
Component Family
Function
Purpose
tGSBucketCreate allows you to create a new bucket which you can use to organize data and control access
to data in Google Cloud Storage.
Basic settings
Use existing connection
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
Bucket configuration
Bucket name
Specify the name of the bucket which you want to create. Note that the
bucket name must be unique across the Google Cloud Storage system.
For more information about the bucket naming convention, see https://
developers.google.com/storage/docs/bucketnaming.
Special configure
Select this check box to provide the additional configuration for the
bucket to be created.
Project ID
Location
Select from the list the location where the new bucket will be created.
Currently, Europe and US are available. By default, the bucket location
is in the US.
Note that once a bucket is created in a specific location, it cannot be
moved to another location.
Acl
Select from the list the desired access control list (ACL) for the new
bucket.
Depending on the ACL on the bucket, the access requests from users
may be allowed or rejected. If you do not specify a predefined ACL for
the new bucket, the predefined project-private ACL applies.
For more information about ACL, see https://developers.google.com/
storage/docs/accesscontrol?hl=en.
Die on error
This check box is cleared by default, meaning to skip the row on error
and to complete the process for error-free rows.
Advanced settings
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at the Job
level as well as at each component level.
Usage
This component can be used together with the tGSBucketList component to check if a new bucket is
created successfully.
Limitation
n/a
Related scenario
For related topics, see section Scenario: Verifying the absence of a bucket, creating it and listing all the S3 buckets.
tGSBucketDelete
tGSBucketDelete properties
Component Family
Function
Purpose
tGSBucketDelete allows you to delete an empty bucket in Google Cloud Storage so as to release occupied
resources.
Basic settings
Use existing connection
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
Bucket name
Specify the name of the bucket that you want to delete. Make sure that
the bucket to be deleted is empty.
Bucket deletion cannot be undone, so you need to back up any
data that you want to keep before the deletion.
Die on error
This check box is cleared by default, meaning to skip the row on error
and to complete the process for error-free rows.
Advanced settings
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at the Job
level as well as at each component level.
Usage
This component can be used together with the tGSBucketList component to check if the specified bucket
is deleted successfully.
Limitation
n/a
Related scenario
No scenario is available for this component yet.
tGSBucketExist
tGSBucketExist properties
Component Family
Function
Purpose
tGSBucketExist allows you to check the existence of a bucket before performing further operations.
Basic settings
Use existing connection
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
Bucket name
Specify the name of the bucket for which you want to perform a check
to confirm it exists in Google Cloud Storage.
Die on error
This check box is cleared by default, meaning to skip the row on error
and to complete the process for error-free rows.
Advanced settings
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at the Job
level as well as at each component level.
Global Variables
BUCKET_EXIST: indicates the existence of a specified bucket. This is a Flow variable and it returns a
boolean.
BUCKET_NAME: indicates the name of a specified bucket. This is a Flow variable and it returns a string.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose
the variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable means it
functions after the execution of a component.
Usage
Limitation
n/a
Related scenario
For related topics, see section Scenario: Verifying the absence of a bucket, creating it and listing all the S3 buckets.
tGSBucketList
tGSBucketList properties
Component Family
Function
tGSBucketList iterates on all buckets within all projects or one specific project in Google Cloud Storage.
Purpose
tGSBucketList allows you to retrieve a list of buckets from all projects or one specific project in Google
Cloud Storage.
Basic settings
Use existing connection
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
Specify project ID
Select this check box and in the Project ID field specify a project ID
from which you want to retrieve a list of buckets.
Advanced settings
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at the Job
level as well as at each component level.
Global Variables
CURRENT_BUCKET_NAME: indicates the current bucket name. This is a Flow variable and it returns
a string.
NB_BUCKET: indicates the number of buckets. This is an After variable and it returns an integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose
the variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable means it
functions after the execution of a component.
Usage
The tGSBucketList component can be used as a standalone component or as a start component of a process.
Limitation
n/a
Related scenario
For related topics, see section Scenario: Verifying the absence of a bucket, creating it and listing all the S3 buckets.
tGSClose
tGSClose properties
Component family
Function
Purpose
Basic settings
Component List
Advanced settings
tStatCatcher Statistics
Usage
This component is generally used with other Google Cloud Storage components,
particularly tGSConnection.
Limitation
n/a
Related scenario
For a scenario in which tGSClose is used, see section Scenario: Managing files with Google Cloud Storage.
tGSConnection
tGSConnection properties
Component Family
Function
Purpose
tGSConnection allows you to provide the authentication information for making requests to the Google
Cloud Storage system and enables the reuse of the connection it creates to Google Cloud Storage.
Basic settings
Advanced settings
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at the Job
level as well as at each component level.
Usage
This component is generally used with other Google Cloud Storage components, particularly tGSClose.
Limitation
n/a
Related scenario
For a scenario in which tGSConnection is used, see section Scenario: Managing files with Google Cloud Storage.
tGSCopy
tGSCopy properties
Component Family
Function
tGSCopy copies or moves objects within a bucket or between buckets in Google Cloud Storage.
Purpose
Basic settings
Use existing connection
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
Source bucket name
Specify the name of the bucket from which you want to copy or move objects.
Source is folder
Target bucket name
Specify the name of the bucket to which you want to copy or move objects.
Target folder
Specify the target folder to which the objects will be copied or moved.
Action
Select the action that you want to perform on objects from the list.
Copy: copies objects from the source bucket or folder to the target
bucket or folder.
Move: moves objects from the source bucket or folder to the target
bucket or folder.
Rename
Select this check box and in the New name field enter a new name for
the object to be copied or moved.
The Rename check box will not be available if you select the
Source is folder check box.
Die on error
This check box is cleared by default, meaning to skip the row on error
and to complete the process for error-free rows.
Advanced settings
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at the Job
level as well as at each component level.
Usage
Limitation
n/a
Related scenario
For a scenario in which tGSCopy is used, see section Scenario: Managing files with Google Cloud Storage.
tGSDelete
tGSDelete properties
Component Family
Function
tGSDelete deletes the objects which match the specified criteria in Google Cloud Storage.
Purpose
tGSDelete allows you to delete objects from Google Cloud Storage so as to release the occupied resources.
Basic settings
Use existing connection
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
Key prefix
Specify the prefix to delete only objects whose keys begin with the
specified prefix.
Delimiter
Specify the delimiter in order to delete only those objects with key
names up to the delimiter.
Specify project ID
Select this check box and in the Project ID field enter the project ID
from which you want to delete objects.
Delete object from bucket list
Select this check box and complete the Bucket table to delete objects in the specified buckets.
Bucket name: type in the name of the bucket from which you want
to delete objects.
Key prefix: type in the prefix to delete objects whose keys begin with
the specified prefix in the specified bucket.
Delimiter: type in the delimiter to delete those objects with key
names up to the delimiter in the specified bucket.
If you select the Delete object from bucket list check box, the
Key prefix and Delimiter fields as well as the Specify project
ID check box will not be available.
Die on error
This check box is cleared by default, meaning to skip the row on error
and to complete the process for error-free rows.
Advanced settings
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at the Job
level as well as at each component level.
Global Variables
NB_LINE: Indicates the number of rows read by an input component or transferred to an output component.
This is an After variable and it returns an integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose
the variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable means it
functions after the execution of a component.
Usage
This component can be used together with the tGSList component to check if the objects which match the
specified criteria are deleted successfully.
Limitation
n/a
Related scenario
For a scenario in which tGSDelete is used, see section Scenario: Managing files with Google Cloud Storage.
tGSGet
tGSGet properties
Component Family
Function
tGSGet retrieves objects which match the specified criteria from Google Cloud Storage and outputs them
to a local directory.
Purpose
tGSGet allows you to download files from Google Cloud Storage to a local directory.
Basic settings
Use existing connection
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
Key prefix
Specify the prefix to download only objects whose keys begin with the specified prefix.
Delimiter
Specify the delimiter in order to download only those objects with key
names up to the delimiter.
Specify project ID
Select this check box and in the Project ID field enter the project ID
from which you want to obtain objects.
Use keys
Select this check box and complete the Keys table to define the criteria
for objects to be downloaded from Google Cloud Storage.
Bucket name: type in the name of the bucket from which you want
to download objects.
Key: type in the key of the object to be downloaded.
New name: type in a new name for the object to be downloaded.
If you select the Use keys check box, the Key prefix and
Delimiter fields as well as the Specify project ID check box
and the Get files from bucket list check box will not be
available.
Get files from bucket list
Select this check box and complete the Bucket table to define the criteria for objects to be downloaded from Google Cloud Storage.
Bucket name: type in the name of the bucket from which you want
to download objects.
Key prefix: type in the prefix to download objects whose keys start
with the specified prefix from the specified bucket.
Delimiter: specify the delimiter to download those objects with key
names up to the delimiter from the specified bucket.
If you select the Get files from bucket list check box, the Key
prefix and Delimiter fields as well as the Specify project ID
check box and the Use keys check box will not be available.
Output directory
Specify the directory where you want to store the downloaded objects.
Die on error
This check box is cleared by default, meaning to skip the row on error
and to complete the process for error-free rows.
Advanced settings
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at the Job
level as well as at each component level.
Global Variables
NB_LINE: Indicates the number of rows read by an input component or transferred to an output component.
This is an After variable and it returns an integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose
the variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable means it
functions after the execution of a component.
Usage
This component is usually used together with other Google Cloud Storage components, particularly
tGSPut.
Limitation
n/a
Related scenario
No scenario is available for this component yet.
tGSList
tGSList properties
Component Family
Function
tGSList iterates on a list of objects which match the specified criteria in Google Cloud Storage.
Purpose
tGSList allows you to retrieve a list of objects from Google Cloud Storage one by one.
Basic settings
Use existing connection
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
Key prefix
Specify the key prefix so that only the objects whose keys begin with
the specified string will be listed.
Delimiter
Specify the delimiter in order to list only those objects with key names
up to the delimiter.
Specify project ID
Select this check box and in the Project ID field enter the project ID
from which you want to retrieve a list of objects.
List objects in bucket list
Select this check box and complete the Bucket table to retrieve objects in the specified buckets.
Bucket name: type in the name of the bucket from which you want
to retrieve objects.
Key prefix: type in the prefix to list only objects whose keys begin
with the specified string in the specified bucket.
Delimiter: type in the delimiter to list only those objects with key
names up to the delimiter.
If you select the List objects in bucket list check box, the Key
prefix and Delimiter fields as well as the Specify project ID
check box will not be available.
Advanced settings
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at the Job
level as well as at each component level.
Global Variables
CURRENT_BUCKET: indicates the current bucket name. This is a Flow variable and it returns a string.
CURRENT_KEY: indicates the current file name. This is a Flow variable and it returns a string.
NB_LINE: Indicates the number of rows read by an input component or transferred to an output component.
This is an After variable and it returns an integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose
the variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable means it
functions after the execution of a component.
Usage
The tGSList component can be used as a standalone component or as a start component of a process.
Limitation
n/a
Related scenario
For a scenario in which tGSList is used, see section Scenario: Managing files with Google Cloud Storage.
tGSPut
tGSPut properties
Component Family
Function
Purpose
tGSPut allows you to upload files to Google Cloud Storage so that you can manage them with Google Cloud Storage.
Basic settings
Select this check box and in the Component List click the relevant
connection component to reuse the connection details you already
defined.
Bucket name
Type in the name of the bucket into which you want to upload files.
Local directory
Type in the full path of or browse to the local directory where the files
to be uploaded are located.
Google Storage directory
Type in the Google Storage directory to which you want to upload files.
Die on error
This check box is cleared by default, meaning to skip the row on error
and to complete the process for error-free rows.
Advanced settings
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at the Job
level as well as at each component level.
Global Variables
NB_LINE: Indicates the number of rows read by an input component or transferred to an output component.
This is an After variable and it returns an integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose
the variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable means it
functions after the execution of a component.
Usage
This component can be used together with other components, particularly the tGSGet component.
Limitation
n/a
Scenario: Managing files with Google Cloud Storage
Prerequisites: You have purchased a Google Cloud Storage account and created three buckets under the same Google Storage directory. In this example, the buckets created are bighouse, bed_room, and study_room.
1. Drop the following components from the Palette onto the design workspace: one tGSConnection component, one tGSPut component, two tGSCopy components, one tGSDelete component, one tGSList component, one tIterateToFlow component, one tLogRow component and one tGSClose component.
2. Connect tGSPut to the first tGSCopy using a Trigger > On Subjob Ok link.
3. Do the same to connect the first tGSCopy to the second tGSCopy, connect the second tGSCopy to tGSDelete, connect tGSDelete to tGSList, and connect tGSList to tGSClose.
1. Double-click the tGSConnection component to open its Basic settings view in the Component tab.
2. Navigate to the Google APIs Console in your web browser to access the Google project hosting the Cloud Storage services you need to use.
3. Click Google Cloud Storage > Interoperable Access to open its view, and copy the access key and secret key.
4. In the Component view of the Studio, paste the access key and secret key to the corresponding fields respectively.
1. Double-click the tGSPut component to open its Basic settings view in the Component tab.
2. Select the Use an existing connection check box and then select the connection you have configured earlier.
3. In the Bucket name field, enter the name of the bucket into which you want to upload files. In this example, bighouse.
4. In the Local directory field, browse to the directory from which the files will be uploaded, D:/Input/House in this example.
The files under this directory are shown below:
1. Double-click the first tGSCopy component to open its Basic settings view in the Component tab.
2. Select the Use an existing connection check box and then select the connection you have configured earlier.
3. In the Source bucket name field, enter the name of the bucket from which you want to copy files, bighouse in this example.
4. Select the Source is a folder check box. All files from the bucket bighouse will be copied.
5. In the Target bucket name field, enter the name of the bucket into which you want to copy files, bed_room in this example.
1. Double-click the second tGSCopy component to open its Basic settings view in the Component tab.
2. Select the Use an existing connection check box and then select the connection you have configured earlier.
3. In the Source bucket name field, enter the name of the bucket from which you want to move files, bighouse in this example.
4. In the Source object key field, enter the key of the object to be moved, computer_01.txt in this example.
5. In the Target bucket name field, enter the name of the bucket into which you want to move files, study_room in this example.
6. Select Move from the Action list. The specified source file computer_01.txt will be moved from the bucket bighouse to study_room.
7. Select the Rename check box. In the New name field, enter a new name for the moved file. In this example, the new name is laptop.txt.
1. Double-click the tGSDelete component to open its Basic settings view in the Component tab.
2. Select the Use an existing connection check box and then select the connection you have configured earlier.
3. Select the Delete object from bucket list check box. Fill in the Bucket table with the file information that you want to delete.
In this example, the file computer_03.csv will be deleted from the bucket bed_room whose files are copied from the bucket bighouse.
1. Double-click the tGSList component to open its Basic settings view in the Component tab.
2. Select the Use an existing connection check box and then select the connection you have configured earlier.
3. Select the List objects in bucket list check box. In the Bucket table, enter the name of the three buckets in the Bucket name column, bighouse, study_room, and bed_room.
4. Double-click the tIterateToFlow component to open its Basic settings view in the Component tab.
5. The Mapping table will be populated with the defined columns automatically.
In the Value column, enter globalMap.get("tGSList_2_CURRENT_BUCKET") for the bucketName column and globalMap.get("tGSList_2_CURRENT_KEY") for the key column. You can also press Ctrl + Space and then choose the appropriate variable.
6. Double-click the tLogRow component to open its Basic settings view in the Component tab.
7. Select Table (print values in cells of a table) for a better view of the results.
8. Double-click the tGSClose component to open its Basic settings view in the Component tab.
9. Select the connection you want to close from the Component List.
The files in the three buckets are displayed. As expected, the files from the bucket bighouse are first copied to the bucket bed_room, then the file computer_01.txt from the bucket bighouse is moved to the bucket study_room and renamed laptop.txt, and finally the file computer_03.csv is deleted from the bucket bed_room.
tHBaseClose
tHBaseClose properties
Component family
Function
Purpose
This component is used to close an HBase connection you have established in your Job.
Basic settings
Component list
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Usage
This component is to be used along with HBase components, especially with tHBaseConnection.
Prerequisites
Before starting, ensure that you have met the Loopback IP prerequisites expected by HBase. For
further information, see Apache's HBase documentation on http://hbase.apache.org/.
The Hadoop distribution must be properly installed, so as to guarantee the interaction with Talend Studio. The following list presents MapR-related information as an example.
Ensure that you have installed the MapR client in the machine where the Studio is, and added
the MapR client library to the PATH variable of that machine. For Windows, this library is lib
\MapRClient.dll in the MapR client jar file; without adding it, you may encounter the following
error: no MapRClient in java.library.path.
Set the -Djava.library.path argument. This argument provides to the Studio the path to the
native library of that MapR client. This allows the subscription-based users to make full use of
the Data viewer to view locally in the Studio the data stored in MapR. For further information
about how to set this argument, see the section describing how to view data of Talend Open
Studio for Big Data Getting Started Guide.
For further information about how to install a Hadoop distribution, see the manuals corresponding
to the Hadoop distribution you are using.
Limitation
n/a
Related scenario
For a scenario in which tHBaseClose is used, see section Scenario: Exchanging customer data with HBase .
tHBaseConnection
tHBaseConnection properties
Component Family
Function
Purpose
This component allows you to establish an HBase connection to be reused by other HBase
components in your Job.
Basic settings
Property type
Version
Distribution
Select the product you are using as the Hadoop distribution from
the drop-down list. The options in the list vary depending on the
component you are using. Among these options, the Custom option
allows you to connect to a custom Hadoop distribution rather than
any of the distributions given in this list and officially supported by
Talend.
In order to connect to a custom distribution, select Custom and then click the [...] button to open the configuration dialog box.
HBase version
Select the version of the Hadoop distribution you are using. Note
that if you use Hortonworks Data Platform V2.0.0, the type of the
operating system for running the distribution and a Talend Job must
be the same, such as Windows or Linux.
Zookeeper quorum
Type in the name or the URL of the Zookeeper service you use to
coordinate the transaction between Talend and HBase.
Properties
Advanced settings
tStatCatcher Statistics
Select this check box to collect the log data at a component level.
Usage
This component is generally used with other HBase components, particularly tHBaseClose.
Prerequisites
Before starting, ensure that you have met the Loopback IP prerequisites expected by HBase. For
further information, see Apache's HBase documentation on http://hbase.apache.org/.
The Hadoop distribution must be properly installed, so as to guarantee the interaction with Talend Studio. The following list presents MapR-related information as an example.
Ensure that you have installed the MapR client in the machine where the Studio is, and added
the MapR client library to the PATH variable of that machine. For Windows, this library is lib
\MapRClient.dll in the MapR client jar file; without adding it, you may encounter the following
error: no MapRClient in java.library.path.
Set the -Djava.library.path argument. This argument provides to the Studio the path to the
native library of that MapR client. This allows the subscription-based users to make full use of
the Data viewer to view locally in the Studio the data stored in MapR. For further information
about how to set this argument, see the section describing how to view data of Talend Open
Studio for Big Data Getting Started Guide.
For further information about how to install a Hadoop distribution, see the manuals corresponding
to the Hadoop distribution you are using.
Limitation
n/a
Related scenario
For a scenario in which tHBaseConnection is used, see section Scenario: Exchanging customer data with HBase .
tHBaseInput
tHBaseInput properties
Component family
Function
tHBaseInput extracts columns corresponding to schema definition. Then it passes these columns
to the next component via a Main row link.
If you have subscribed to one of the Talend solutions with Big Data, you are able to use
this component in a Talend Map/Reduce Job to generate Map/Reduce code. In that situation,
tHBaseInput belongs to the MapReduce component family.
Purpose
tHBaseInput reads data from a given HBase database and extracts the columns of selection. HBase is a distributed, column-oriented database that hosts very large, sparsely populated tables on clusters.
Basic settings
Property type
Use existing connection
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
Not available for the Map/Reduce version of this component.
Version
Distribution
Select the product you are using as the Hadoop distribution from
the drop-down list. The options in the list vary depending on the
component you are using. Among these options, the Custom option
allows you to connect to a custom Hadoop distribution rather than
any of the distributions given in this list and officially supported by
Talend.
In order to connect to a custom distribution, select Custom and then click the [...] button to open the configuration dialog box.
HBase version
Select the version of the Hadoop distribution you are using. Note that if you use Hortonworks Data Platform V2.0.0, the type of the operating system for running the distribution and a Talend Job must be the same, such as Windows or Linux.
Zookeeper quorum
Type in the name or the URL of the Zookeeper service you use to
coordinate the transaction between Talend and HBase.
Table name
Type in the name of the HBase table from which you need to extract
columns.
Mapping
Complete this table to map the columns of the HBase table to be used
with the schema columns you have defined for the data flow to be
processed.
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Properties
Filter
Is by filter
Logical operation
Select the operator you need to use to define the logical relation between filters. The available operators are:
And: every defined filtering condition must be satisfied. It represents the relationship FilterList.Operator.MUST_PASS_ALL.
Filter
Click the button under this table to add as many rows as required,
each row representing a filter. The parameters you may need to set
for a filter are:
Filter type: the drop-down list presents pre-existing filter types
that are already defined by HBase. Select the type of the filter you
need to use.
Filter column: enter the column qualifier on which you need
to apply the active filter. This parameter becomes mandatory
depending on the type of the filter and of the comparator you are
using. For example, it is not used by the Row Filter type but is
required by the Single Column Value Filter type.
Filter family: enter the column family on which you need to apply
the active filter. This parameter becomes mandatory depending
on the type of the filter and of the comparator you are using. For
example, it is not used by the Row Filter type but is required by
the Single Column Value Filter type.
Filter operation: select from the drop-down list the operation to
be used for the active filter.
Filter Value: enter the value on which you want to use the
operator selected from the Filter operation drop-down list.
Filter comparator type: select the type of the comparator to be
combined with the filter you are using.
Depending on the Filter type you are using, some or all of the parameters become mandatory. For further information, see section HBase filters.
Global Variables
NB_LINE: Indicates the number of rows read by an input component or transferred to an output
component. This is an After variable and it returns an integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and
choose the variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable
means it functions after the execution of a component.
Usage
This component is a start component of a Job and always needs an output link.
Prerequisites
Before starting, ensure that you have met the Loopback IP prerequisites expected by HBase. For
further information, see Apache's HBase documentation on http://hbase.apache.org/.
The Hadoop distribution must be properly installed to guarantee interaction with Talend
Studio. The following list presents MapR-related information as an example.
Ensure that you have installed the MapR client on the machine where the Studio is installed, and added
the MapR client library to the PATH variable of that machine. On Windows, this library is lib
\MapRClient.dll in the MapR client jar file; without it, you may encounter the following
error: no MapRClient in java.library.path.
Set the -Djava.library.path argument. This argument provides the Studio with the path to the
native library of that MapR client. It allows subscription-based users to make full use of
the Data viewer to view, locally in the Studio, the data stored in MapR. For further information
about how to set this argument, see the section describing how to view data in the Talend Open
Studio for Big Data Getting Started Guide.
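As an illustration, this argument is typically appended at the end of the Studio's .ini launcher file
(the exact file name depends on your platform and Studio edition; the library path below is
hypothetical):

-Djava.library.path=/opt/mapr/lib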
For further information about how to install a Hadoop distribution, see the manuals corresponding
to the Hadoop distribution you are using.
HBase filters
This table presents the HBase filters available in Talend Studio and the parameters required by those filters.
Filter type                     Filter column   Filter family   Filter operation   Filter value   Filter comparator type
Single column value filter      Yes             Yes             Yes                Yes            Yes
Family filter                                   Yes             Yes                Yes            Yes
Qualifier filter                Yes                             Yes                Yes            Yes
Column prefix filter            Yes             Yes
Multiple column prefix filter   Yes             Yes
Column range filter             Yes             Yes
Row filter                                                      Yes                Yes            Yes
Value filter                                                    Yes                Yes            Yes
The usage of the listed HBase filters described above is subject to revisions made by Apache in its Apache HBase
project; therefore, in order to fully understand how to use these HBase filters, we recommend reading Apache's
HBase documentation.
Scenario: Exchanging customer data with HBase
2. Select Hortonworks Data Platform 1.0 from the HBase version list.
3. In the Zookeeper quorum field, type in the name or the URL of the Zookeeper service you are using. In this
example, the name of the service in use is hbase.
4. In the Zookeeper client port field, type in the number of the client listening port. In this example, it is 2181.
2. In this view, click the three-dot button next to Edit schema to open the schema editor.
3. Click the plus button three times to add three rows and in the Column column, rename the three rows
respectively as: id, name and age.
4. In the Type column, click each of these rows and from the drop-down list, select the data type of every row.
In this scenario, they are Integer for id and age, String for name.
5. Click OK to validate these changes and accept the propagation prompted by the pop-up dialog box.
6. In the Mode area, select Use Inline Content (delimited file) to display the fields for editing.
7. In the Content field, type in the delimited data to be written into HBase, with fields separated by the semicolon ";".
In this example, they are:
1;Albert;23
2;Alexandre;24
3;Alfred-Hubert;22
4;André;40
5;Didier;28
6;Anthony;35
7;Artus;32
8;Benoît;56
9;Catherine;34
10;Charles;21
11;Christophe;36
12;Christian;67
13;Clément;64
14;Danniel;54
15;Elisabeth;58
16;Emile;32
17;Gregory;30
9. Select the Use an existing connection check box and then select the connection you have configured earlier.
In this example, it is tHBaseConnection_1.
10. In the Table name field, type in the name of the table to be created in HBase. In this example, it is customer.
11. In the Action on table field, select the action of interest from the drop-down list. In this scenario, select Drop
table if exists and create. This way, if a table named customer already exists in HBase, it will be disabled
and deleted before the new table is created.
12. Click the Advanced settings tab to open the corresponding view.
13. In the Family parameters table, add two rows by clicking the plus button, rename them as family1 and family2
respectively and then leave the other columns empty. These two column families will be created in HBase
using the default family performance options.
The Family parameters table is available only when the action you have selected in the Action on table field is to
create a table in HBase. For further information about this Family parameters table, see section tHBaseOutput.
14.In the Families table of the Basic settings view, enter the family names in the Family name column, each
corresponding to the column this family contains. In this example, the id and the age columns belong to family1
and the name column to family2.
These column families should already exist in the HBase database to be connected to; if not, you need to define them in the Family
parameters table of the Advanced settings view so that they are created at runtime.
2. Select the Use an existing connection check box and then select the connection you have configured earlier.
In this example, it is tHBaseConnection_1.
3. Click the three-dot button next to Edit schema to open the schema editor.
4. Click the plus button three times to add three rows and rename them as id, name and age respectively in the
Column column. This means that you extract these three columns from HBase.
5. Select the types for each of the three columns. In this example, Integer for id and age, String for name.
6. Click OK to validate these changes and accept the propagation prompted by the pop-up dialog box.
7. In the Table name field, type in the table from which you extract the columns of interest. In this scenario,
the table is customer.
8. In the Mapping table, the Column column has already been filled in automatically since the schema was defined,
so simply enter the name of every family in the Column family column, each corresponding to the column
it contains.
9. Double-click tHBaseClose to open its Component view.
10. In the Component List field, select the connection you need to close. In this example, this connection is
tHBaseConnection_1.
These columns of interest are extracted and you can process them according to your needs.
Log in to your HBase database to check the customer table this Job has created.
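For example, from the HBase shell you can verify the structure and content of the table (the output
depends on your data):

hbase shell
describe 'customer'
scan 'customer'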
tHBaseOutput
tHBaseOutput properties
Component family
Function
tHBaseOutput receives data from its preceding component, creates a table in a given HBase
database and writes the received data into this HBase table.
If you have subscribed to one of the Talend solutions with Big Data, you can also use
this component in a Talend Map/Reduce Job to generate Map/Reduce code. In that situation,
tHBaseOutput belongs to the MapReduce component family and can only write data into an existing
HBase table. For further information, see the section tHBaseOutput in Talend Map/Reduce Jobs.
Purpose
Basic settings
Property type
Use an existing connection
Select this check box and in the Component List click the relevant connection component to reuse
the connection details you already defined.
Version
Distribution
Select the product you are using as the Hadoop distribution from
the drop-down list. The options in the list vary depending on the
component you are using. Among these options, the Custom option
allows you to connect to a custom Hadoop distribution rather than
any of the distributions given in this list and officially supported by
Talend.
In order to connect to a custom distribution, once you have selected Custom, click the [...] button
to display the dialog box in which you can import the jar files required by that custom distribution.
For a step-by-step example, see the section Connecting to a custom Hadoop distribution.
HBase version
Select the version of the Hadoop distribution you are using. Note
that if you use Hortonworks Data Platform V2.0.0, the type of the
operating system for running the distribution and a Talend Job must
be the same, such as Windows or Linux.
Zookeeper quorum
Type in the name or the URL of the Zookeeper service you use to
coordinate the transaction between Talend and HBase.
Table name
Type in the name of the HBase table you need to create.
Action on table
Select the action you need to take for creating an HBase table.
Use custom row key
Select this check box to use customized row keys. Once it is selected,
the corresponding field appears; type in the user-defined row key to
index the rows of the HBase table being created. For example, you can
type in "France"+Numeric.sequence("s1",1,1) to generate the row key
series France1, France2, France3 and so on.
Families
Die on error
Advanced settings
Properties
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Family parameters
Only available when you are creating an HBase table.
Type in the names and, when need be, the custom performance
options of the column families to be created. These options are
all attributes defined by the HBase data model, so for further
explanation about these options, see Apache's HBase documentation.
The parameter Compression type allows you to select the
format for output data compression.
Global Variables
NB_LINE: Indicates the number of rows read by an input component or transferred to an output
component. This is an After variable and it returns an integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and
choose the variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable
means it functions after the execution of a component.
Usage
This component is normally an end component of a Job and always needs an input link.
Prerequisites
Before starting, ensure that you have met the Loopback IP prerequisites expected by HBase. For
further information, see Apache's HBase documentation on http://hbase.apache.org/.
The Hadoop distribution must be properly installed to guarantee interaction with Talend
Studio. The following list presents MapR-related information as an example.
Ensure that you have installed the MapR client on the machine where the Studio is installed, and added
the MapR client library to the PATH variable of that machine. On Windows, this library is lib
\MapRClient.dll in the MapR client jar file; without it, you may encounter the following
error: no MapRClient in java.library.path.
Set the -Djava.library.path argument. This argument provides the Studio with the path to the
native library of that MapR client. It allows subscription-based users to make full use of
the Data viewer to view, locally in the Studio, the data stored in MapR. For further information
about how to set this argument, see the section describing how to view data in the Talend Open
Studio for Big Data Getting Started Guide.
For further information about how to install a Hadoop distribution, see the manuals corresponding
to the Hadoop distribution you are using.
tHBaseOutput in Talend Map/Reduce Jobs
In a Talend Map/Reduce Job, tHBaseOutput, as well as the whole Map/Reduce Job using it, generates native
Map/Reduce code. This section presents the specific properties of tHBaseOutput when it is used in that situation.
For further information about Talend Map/Reduce Jobs, see the Talend Open Studio for Big Data Getting Started
Guide.
Component family
MapReduce / Output
Function
In a Map/Reduce Job, tHBaseOutput receives data from a transformation component and writes
the data in an existing HBase table.
Basic settings
Property type
Version
Distribution
Select the product you are using as the Hadoop distribution from
the drop-down list. The options in the list vary depending on the
component you are using. Among these options, the Custom option
allows you to connect to a custom Hadoop distribution rather than
any of the distributions given in this list and officially supported by
Talend.
In order to connect to a custom distribution, once you have selected Custom, click the [...] button
to display the dialog box in which you can import the jar files required by that custom distribution.
HBase version
Select the version of the Hadoop distribution you are using. Note
that if you use Hortonworks Data Platform V2.0.0, the type of the
operating system for running the distribution and a Talend Job must
be the same, such as Windows or Linux.
Zookeeper quorum
Type in the name or the URL of the Zookeeper service you use to
coordinate the transaction between Talend and HBase.
Table name
Type in the name of the HBase table in which you need to write data.
This table must already exist.
Row key column
Select the column used as the row key column of the HBase table.
Then, if need be, select the Store row key column to HBase column
check box to make the row key column an HBase column belonging
to a specific column family.
Families
Complete this table to map the columns of the HBase table to be used
with the schema columns you have defined for the data flow to be
processed.
The Column column of this table is automatically filled once you
have defined the schema; the syntax of the Column family:qualifier
column requires each HBase column name (qualifier) to be paired
with its corresponding family name, for example, in an HBase table,
if a Paris column belongs to a France family, then you need to write
it as France:Paris.
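As background, this family:qualifier pairing mirrors the way a cell is addressed in the HBase Java
API; a minimal sketch (the row key and value are hypothetical):

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class FamilyQualifierSketch {
    public static void main(String[] args) {
        // A cell is always addressed by column family plus qualifier, e.g. France:Paris.
        Put put = new Put(Bytes.toBytes("rowKey1"));  // row key
        put.add(Bytes.toBytes("France"),              // column family
                Bytes.toBytes("Paris"),               // column qualifier
                Bytes.toBytes("someValue"));          // cell value
        System.out.println(put);                      // show the pending mutation
    }
}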
Advanced settings
Properties
Usage
Hadoop Connection
You need to use the Hadoop Configuration tab in the Run view to define the connection to a
given Hadoop distribution for the whole Job.
This connection is effective on a per-Job basis.
Prerequisites
Before starting, ensure that you have met the Loopback IP prerequisites expected by HBase. For
further information, see Apache's HBase documentation on http://hbase.apache.org/.
The Hadoop distribution must be properly installed to guarantee interaction with Talend
Studio. The following list presents MapR-related information as an example.
Ensure that you have installed the MapR client on the machine where the Studio is installed, and added
the MapR client library to the PATH variable of that machine. On Windows, this library is lib
\MapRClient.dll in the MapR client jar file; without it, you may encounter the following
error: no MapRClient in java.library.path.
Set the -Djava.library.path argument. This argument provides the Studio with the path to the
native library of that MapR client. It allows subscription-based users to make full use of
the Data viewer to view, locally in the Studio, the data stored in MapR. For further information
about how to set this argument, see the section describing how to view data in the Talend Open
Studio for Big Data Getting Started Guide.
For further information about how to install a Hadoop distribution, see the manuals corresponding
to the Hadoop distribution you are using.
Related scenario
For a related scenario of tHBaseOutput, see the section Scenario: Exchanging customer data with HBase.
tHCatalogInput
tHCatalogInput Properties
Component family
Function
This component allows you to read data from an HCatalog-managed database and send the data into
the Talend flow.
Purpose
The tHCatalogInput component reads data from the specified HCatalog-managed database
and sends the data in the Talend flow to the console or to a specified local file, depending on the
component it is connected to.
Basic settings
Property type
Version
Distribution
Select the product you are using as the Hadoop distribution from
the drop-down list. The options in the list vary depending on
the component you are using. Among these options, the Custom
option allows you to connect to a custom Hadoop distribution
rather than any of the distributions given in this list and officially
supported by Talend.
In order to connect to a custom distribution, once you have selected Custom, click the [...] button
to display the dialog box in which you can import the jar files required by that custom distribution.
Templeton Configuration
HCatalog version
Select the version of the Hadoop distribution you are using. Note
that if you use Hortonworks Data Platform V2.0.0, the type of
the operating system for running the distribution and a Talend Job
must be the same, such as Windows or Linux.
Templeton hostname
Use kerberos authentication If you are accessing the Hadoop cluster running with Kerberos
security, select this check box, then, enter the Kerberos principal
name for the NameNode in the field displayed. This enables you
to use your user name to authenticate against the credentials stored
in Kerberos.
This check box is available depending on the Hadoop distribution
you are connecting to.
Use a keytab to authenticate Select the Use a keytab to authenticate check box to log into
a Kerberos-enabled Hadoop system using a given keytab file. A
keytab file contains pairs of Kerberos principals and encrypted
keys. You need to enter the principal to be used in the Principal
field and the access path to the keytab file itself in the Keytab field.
Note that the user that executes a keytab-enabled Job is not
necessarily the one a principal designates but must have the right
to read the keytab file being used. For example, the user name you
are using to execute a Job is user1 and the principal to be used is
guest; in this situation, ensure that user1 has the right to read the
keytab file to be used.
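For background, this keytab-based login corresponds to the following call in the Hadoop Java API;
a minimal sketch (the principal and keytab path are hypothetical):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

public class KeytabLoginSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);
        // The executing user only needs read access to the keytab file,
        // regardless of which principal the keytab designates.
        UserGroupInformation.loginUserFromKeytab("guest@EXAMPLE.COM",
                                                 "/etc/security/guest.keytab");
    }
}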
HCatalog Configuration
Database
Table
Partition
Fill this field to specify one or more partitions for the partition
operation on a specified table. When you specify multiple
partitions, use commas to separate every two partitions and use
double quotation marks to quote the partition string.
For further information about Partition, see https://
cwiki.apache.org/Hive/.
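For instance, a value for this field specifying two partitions might look like the following (the
partition column names are hypothetical): "country='FR',match_age='27'"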
Advanced settings
Username
Die on error
Row separator
Field separator
Encoding
Select the encoding from the list or select Custom and define it
manually. This field is compulsory for database data handling.
Retrieve the HCatalog logs Select this check box to retrieve log files generated during
HCatalog operations.
Standard Output Folder
Fill this field with the path to which log files are stored.
This field is enabled only when you select the Retrieve
the HCatalog logs check box.
Error Output Folder
Fill this field with the path to which error log files are stored.
This field is enabled only when you select the Retrieve
the HCatalog logs check box.
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at the
Job level as well as at each component level.
Usage
Prerequisites
The Hadoop distribution must be properly installed to guarantee interaction with Talend
Studio. The following list presents MapR-related information as an example.
Ensure that you have installed the MapR client on the machine where the Studio is installed, and added
the MapR client library to the PATH variable of that machine. On Windows, this library
is lib\MapRClient.dll in the MapR client jar file; without it, you may encounter the
following error: no MapRClient in java.library.path.
Set the -Djava.library.path argument. This argument provides the Studio with the path to
the native library of that MapR client. It allows subscription-based users to make full
use of the Data viewer to view, locally in the Studio, the data stored in MapR. For further
information about how to set this argument, see the section describing how to view data in the
Talend Open Studio for Big Data Getting Started Guide.
For further information about how to install a Hadoop distribution, see the manuals
corresponding to the Hadoop distribution you are using.
Limitation
When Use kerberos authentication is selected, the component cannot work with IBM JVM.
Knowledge of Hive Data Definition Language and HCatalog Data Definition Language
is required. For further information about Hive Data Definition Language, see https://
cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL. For further information
about HCatalog Data Definition Language, see https://cwiki.apache.org/confluence/display/
HCATALOG/Design+Document+-+Java+APIs+for+HCatalog+DDL+Commands.
Related scenario
For a related scenario, see section Scenario: HCatalog table management on Hortonworks Data Platform.
tHCatalogLoad
tHCatalogLoad Properties
Component family
Function
This component allows you to write data into an established HCatalog managed table from an
existing file in HDFS.
Purpose
The tHCatalogLoad component writes data into an established HCatalog managed table from
an existing file in HDFS.
Basic settings
Property type
Version
Distribution
Select the product you are using as the Hadoop distribution from
the drop-down list. The options in the list vary depending on
the component you are using. Among these options, the Custom
option allows you to connect to a custom Hadoop distribution
rather than any of the distributions given in this list and officially
supported by Talend.
In order to connect to a custom distribution, once you have selected Custom, click the [...] button
to display the dialog box in which you can import the jar files required by that custom distribution.
Templeton Configuration
HCatalog version
Select the version of the Hadoop distribution you are using. Note
that if you use Hortonworks Data Platform V2.0.0, the type of
the operating system for running the distribution and a Talend Job
must be the same, such as Windows or Linux.
Templeton hostname
Templeton port
Use kerberos authentication If you are accessing the Hadoop cluster running with Kerberos
security, select this check box, then, enter the Kerberos principal
name for the NameNode in the field displayed. This enables you
to use your user name to authenticate against the credentials stored
in Kerberos.
This check box is available depending on the Hadoop distribution
you are connecting to.
Use a keytab to authenticate Select the Use a keytab to authenticate check box to log into
a Kerberos-enabled Hadoop system using a given keytab file. A
keytab file contains pairs of Kerberos principals and encrypted
keys. You need to enter the principal to be used in the Principal
field and the access path to the keytab file itself in the Keytab field.
Note that the user that executes a keytab-enabled Job is not
necessarily the one a principal designates but must have the right
to read the keytab file being used. For example, the user name you
are using to execute a Job is user1 and the principal to be used is
guest; in this situation, ensure that user1 has the right to read the
keytab file to be used.
HCatalog Configuration
Database
Table
Partition
Fill this field to specify one or more partitions for the partition
operation on the specified table. When you specify multiple
partitions, use commas to separate every two partitions and use
double quotation marks to quote the partition string.
For further information about Partition, see Operation
on Partitions in Hive.
Username
File location
Fill this field with the HDFS location where the file holding the data
to be loaded is stored.
Die on error
Advanced settings
Retrieve the HCatalog logs Select this check box to retrieve log files generated during
HCatalog operations.
Standard Output Folder
Fill this field with the path to which log files are stored.
This field is enabled only when you select the Retrieve
the HCatalog logs check box.
Error Output Folder
Fill this field with the path to which error log files are stored.
This field is enabled only when you select the Retrieve
the HCatalog logs check box.
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at the
Job level as well as at each component level.
Usage
This component can be used in a single-component Job or used together with a subjob.
Prerequisites
The Hadoop distribution must be properly installed to guarantee interaction with Talend
Studio. The following list presents MapR-related information as an example.
Ensure that you have installed the MapR client on the machine where the Studio is installed, and added
the MapR client library to the PATH variable of that machine. On Windows, this library
is lib\MapRClient.dll in the MapR client jar file; without it, you may encounter the
following error: no MapRClient in java.library.path.
Set the -Djava.library.path argument. This argument provides the Studio with the path to
the native library of that MapR client. It allows subscription-based users to make full
use of the Data viewer to view, locally in the Studio, the data stored in MapR. For further
information about how to set this argument, see the section describing how to view data in the
Talend Open Studio for Big Data Getting Started Guide.
For further information about how to install a Hadoop distribution, see the manuals
corresponding to the Hadoop distribution you are using.
Limitation
When Use kerberos authentication is selected, the component cannot work with IBM JVM.
Knowledge of Hive Data Definition Language and HCatalog Data Definition Language is
required. For further information about Hive Data Definition Language, see Hive Data Definition
Language. For further information about HCatalog Data Definition Language, see HCatalog
Data Definition Language.
Related scenario
For a related scenario, see section Scenario: HCatalog table management on Hortonworks Data Platform.
tHCatalogOperation
tHCatalogOperation Properties
Component family
Big Data / HCatalog
Function
This component allows you to manage the data stored in an HCatalog-managed database, table or partition in HDFS.
Purpose
The tHCatalogOperation component offers a platform on which you can operate on HCatalog-managed
databases, tables and partitions in HDFS.
Basic settings
Property type
Version
Distribution
Select the product you are using as the Hadoop distribution from the drop-down list. The
options in the list vary depending on the component you are using. Among these options,
the Custom option allows you to connect to a custom Hadoop distribution rather than any
of the distributions given in this list and officially supported by Talend.
In order to connect to a custom distribution, once you have selected Custom, click the [...] button
to display the dialog box in which you can alternatively:
1. Select Import from existing version to import jar files from a given Hadoop
distribution and then manually add other jar files which that Hadoop distribution does
not provide.
2. Select Import from zip to import jar files from a zip file which, for example, contains
all required jar files set up in another Studio and is exported from that Studio.
In this dialog box, the active check box must be kept selected so as to import
the jar files pertinent to the connection to be created between the custom
distribution and this component.
For a step-by-step example about how to connect to a custom Hadoop distribution and
share this connection, see the section Connecting to a custom Hadoop distribution.
Templeton Configuration
HCatalog version
Select the version of the Hadoop distribution you are using. Note that if you use
Hortonworks Data Platform V2.0.0, the type of the operating system for running the
distribution and a Talend Job must be the same, such as Windows or Linux.
Templeton hostname
Templeton port
Fill this field with the port of the URL of the Templeton webservice. By default, the value for
this field is 50111.
Templeton is a webservice API for Hadoop. It allows you to move data directly
into/out-of HDFS through WebHDFS. For further information about Templeton,
see http://people.apache.org/~thejas/templeton_doc_latest.
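As an illustration of the kind of webservice Templeton exposes, here is a minimal Java sketch that
queries the WebHCat REST API to list databases (the host name and user name are hypothetical):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class TempletonPing {
    public static void main(String[] args) throws Exception {
        // The default Templeton port is 50111; the ddl/database resource lists databases.
        URL url = new URL("http://192.168.0.131:50111/templeton/v1/ddl/database?user.name=hdp");
        BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
        String line;
        while ((line = in.readLine()) != null) {
            System.out.println(line);  // prints the JSON response
        }
        in.close();
    }
}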
Use kerberos authentication
If you are accessing the Hadoop cluster running with Kerberos security, select this check
box, then enter the Kerberos principal name for the NameNode in the field displayed.
This enables you to use your user name to authenticate against the credentials stored in
Kerberos.
This check box is available depending on the Hadoop distribution you are connecting to.
Use a keytab to authenticate
Select the Use a keytab to authenticate check box to log into a Kerberos-enabled Hadoop
system using a given keytab file. A keytab file contains pairs of Kerberos principals and
encrypted keys. You need to enter the principal to be used in the Principal field and the
access path to the keytab file itself in the Keytab field.
Note that the user that executes a keytab-enabled Job is not necessarily the one a principal
designates but must have the right to read the keytab file being used. For example, the user
name you are using to execute a Job is user1 and the principal to be used is guest; in this
situation, ensure that user1 has the right to read the keytab file to be used.
Operation on
Select the object on which the operation is to be performed: Database, Table or Partition.
Operation
Select an action from the list for the DB operation as follows: Create/Drop/Drop if exist/
Drop and create/Drop if exist and create. For further information about the DB operation
in HDFS, see https://cwiki.apache.org/Hive/.
Create the table only if it doesn't exist already
Select this check box to avoid creating a duplicate table when you create a table.
This check box is enabled only when you select Table from the Operation on list.
HCatalog Configuration
Database
Fill this field with the name of the database in which the HCatalog managed tables are
placed.
Table
Fill this field to operate on one or multiple tables in a database or on a specified HDFS
location.
This field is enabled only when you select Table from the Operation on list. For
further information about the operation on Table, see https://cwiki.apache.org/
Hive/.
Partition
Fill this field to specify one or more partitions for the partition operation on a specified
table. When you specify multiple partitions, use commas to separate every two partitions
and use double quotation marks to quote the partition string.
This field is enabled only when you select Partition from the Operation
on list. For further information about the operation on Partition, see https://
cwiki.apache.org/Hive/.
Username
Database location
Fill this field with the location of the database file in HDFS.
This field is enabled only when you select Database from the Operation on list.
Database description
Create an external table
Select this check box to create an external table in an alternative path defined in the Set HDFS
location field in the Advanced settings view. For further information about creating an
external table, see https://cwiki.apache.org/Hive/.
This check box is enabled only when you select Table from the Operation on
list and Create/Drop and create/Drop if exist and create from the Operation
list.
Format
Select a file format from the list to specify the format of the external table you want to
create:
TEXTFILE: Plain text files.
RCFILE: Record Columnar files. For further information about RCFILE, see http://
hive.apache.org/docs/.
RCFILE is only available starting with Hive 0.6.0. This list is enabled only when
you select Table from the Operation on list and Create/Drop and create/Drop
if exist and create from the Operation list.
Set partitions
Select this check box to set the partition schema by clicking Edit schema to the
right of the Set partitions check box. The partition schema is either built-in or stored
remotely in the Repository.
This check box is enabled only when you select Table from the Operation on
list and Create/Drop and create/Drop if exist and create from the Operation
list. You must follow the rules of using partition schema in HCatalog managed
tables. For more information about the rules in using partition schema, see http://
incubator.apache.org/hcatalog/docs/.
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: The schema will be created and stored locally for this component only. Related
topic: see Talend Studio User Guide.
Set the user group to use
Select this check box to specify the user group.
This check box is enabled only when you select Drop/Drop if exist/Drop and
create/Drop if exist and create from the Operation list. By default, the value
for this field is root. For more information about the user group in the server,
contact your system administrator.
Option
Set the permissions to use
Select this check box to specify the permissions needed by the operation you select from
the Operation list.
This check box is enabled only when you select Drop/Drop if exist/Drop and
create/Drop if exist and create from the Operation list. By default, the value
for this field is rwxrw-r-x. For more information on user permissions, contact
your system administrator.
Set File location
Fill this field to specify a path to which partitioned data is stored.
This check box is enabled only when you select Partition from the Operation on
list and Create/Drop and create/Drop if exist and create from the Operation
list. For further information about storing partitioned data in HDFS, see https://
cwiki.apache.org/Hive/.
Advanced settings
Die on error
This check box is cleared by default, meaning that rows on error are skipped and the process
is completed for error-free rows.
Comment
Fill this field with the comment for the table you want to create.
This field is enabled only when you select Table from the Operation on list and
Create/Drop and create/Drop if exist and create from the Operation list in
the Basic settings view.
Set HDFS location
Select this check box to specify an HDFS location to which the table you want to create is
saved. Deselect it to save the table you want to create in the warehouse directory defined
in the key hive.metastore.warehouse.dir in the Hive configuration file hive-site.xml.
This check box is enabled only when you select Table from the Operation on list
and Create/Drop and create/Drop if exist and create from the Operation list
in the Basic settings view. For further information about saving data in HDFS,
see https://cwiki.apache.org/Hive/.
Set row format (terminated by)
Select this check box to use and define the row formats when you want to create a table:
Field: Select this check box to use Field as the row format. The default value for this field
is "\u0001". You can also specify a customized char in this field.
Collection Item: Select this check box to use Collection Item as the row format. The
default value for this field is "\u0002". You can also specify a customized char in this field.
Map Key: Select this check box to use Map Key as the row format. The default value for
this field is "\u0003". You can also specify a customized char in this field.
Line: Select this check box to use Line as the row format. The default value for this field
is "\n". You can also specify a customized char in this field.
This check box is enabled only when you select Table from the Operation on
list and Create/Drop and create/Drop if exist and create from the Operation
list in the Basic settings view. For further information about row formats in the
HCatalog-managed table, see https://cwiki.apache.org/Hive/.
Properties
Click [+] to add one or more lines to define table properties. The table properties allow you
to tag the table definition with your own metadata key/value pairs. Make sure that the values
in both the Key and Value columns are quoted in double quotation marks.
This table is enabled only when you select Database/Table from the Operation
on list and Create/Drop and create/Drop if exist and create from the
Operation list in the Basic settings view. For further information about table
properties, see https://cwiki.apache.org/Hive/.
Retrieve the HCatalog logs
Select this check box to retrieve log files generated during HCatalog operations.
Standard Output Folder
Browse to, or enter the directory where the log files are stored.
This field is enabled only when you select the Retrieve the HCatalog logs check box.
Error Output Folder
Browse to, or enter the directory where the error log files are stored.
This field is enabled only when you select the Retrieve the HCatalog logs check box.
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at the Job level as well as at
each component level.
Usage
This component is commonly used in a single-component Job or used together with a subjob.
Prerequisites
The Hadoop distribution must be properly installed to guarantee interaction with Talend Studio.
The following list presents MapR-related information as an example.
Ensure that you have installed the MapR client on the machine where the Studio is installed, and added the MapR
client library to the PATH variable of that machine. On Windows, this library is lib\MapRClient.dll in
the MapR client jar file; without it, you may encounter the following error: no MapRClient in
java.library.path.
Set the -Djava.library.path argument. This argument provides the Studio with the path to the native
library of that MapR client. It allows subscription-based users to make full use of the Data viewer to
view, locally in the Studio, the data stored in MapR. For further information about how to set this argument,
see the section describing how to view data in the Talend Open Studio for Big Data Getting Started Guide.
For further information about how to install a Hadoop distribution, see the manuals corresponding to the
Hadoop distribution you are using.
Limitation
When Use kerberos authentication is selected, the component cannot work with IBM JVM.
Knowledge of Hive Data Definition Language and HCatalog Data Definition Language is
required. For further information about Hive Data Definition Language, see https://cwiki.apache.org/
confluence/display/Hive/LanguageManual+DDL. For further information about HCatalog Data Definition
Language, see https://cwiki.apache.org/confluence/display/HCATALOG/Design+Document+-+Java+APIs
+for+HCatalog+DDL+Commands.
Scenario: HCatalog table management on Hortonworks Data Platform
1. Drop the following components from the Palette onto the design workspace: tHCatalogOperation,
tHCatalogLoad, tHCatalogInput, tHCatalogOutput, tFixedFlowInput, and tFileOutputDelimited.
2. Right-click tHCatalogOperation and connect it to tFixedFlowInput using a Trigger > OnSubjobOk connection.
2.
Click Edit schema to define the schema for the table to be created.
3.
Click [+] to add at least one column to the schema and click OK when you finish setting the schema. In this
scenario, the columns added to the schema are: name, country and age.
4.
Fill the Templeton hostname field with the URL of the Templeton webservice you are using. In this scenario,
fill this field with "192.168.0.131".
5.
Fill the Templeton port field with the port for the Templeton hostname. By default, the value for this field
is "50111".
6.
Select Table from the Operation on list and Drop if exist and create from the Operation list to create a
table in HDFS.
7.
Fill the Database field with an existing database name in HDFS. In this scenario, the database name is
"talend".
8.
Fill the Table field with the name of the table to be created. In this scenario, the table name is "Customer".
9.
Fill the Username field with the username for the DB authentication.
10. Select the Set the user group to use check box to specify the user group. The default user group is "root";
specify the value for this field according to your actual environment.
11. Select the Set the permissions to use check box to specify the user permission. The default value for this
field is "rwxrwxr-x".
12. Select the Set partitions check box to enable the partition schema.
13. Click the Edit schema button next to the Set partitions check box to define the partition schema.
14. Click [+] to add one column to the schema and click OK when you finish setting the schema. In this scenario,
the column added to the partition schema is: match_age.
2.
Click Edit schema to define the same schema as the one you defined in tHCatalogOperation.
7.
Click Sync columns to retrieve the schema defined in the preceding component.
8.
Fill the NameNode URI field with the URI of the NameNode. In this scenario, this URI is "192.168.0.131".
9.
Fill the File name field with the HDFS location of the file you write data to. In this scenario, the file location
is "/user/hdp/Customer/Customer.csv".
2.
Click Edit schema to define the schema of the table to be read from the database.
3.
Click [+] to add at least one column to the schema. In this scenario, the columns added to the schema are
age and name.
Outputting the data read from the table in HDFS to the console
2.
Click Sync columns to retrieve the schema defined in the preceding component.
Job execution
Press Ctrl+S to save your Job and press F6 to execute it.
The data of the table read from HDFS is displayed in the console.
Type http://talend-hdp:50075/browseDirectory.jsp?dir=/user/hdp/Customer&namenodeInfoPort=50070 into
the address bar of your browser to view the table you created:
Click the Customer.csv link to view the content of the table you created.
tHCatalogOutput
tHCatalogOutput Properties
Component family
Function
This component allows you to write data into an HCatalog-managed table using a Talend data flow.
Purpose
The tHCatalogOutput component writes data into an HCatalog-managed table using a Talend data
flow.
Basic settings
Property type
Version
Distribution
Select the product you are using as the Hadoop distribution from
the drop-down list. The options in the list vary depending on
the component you are using. Among these options, the Custom
option allows you to connect to a custom Hadoop distribution
rather than any of the distributions given in this list and officially
supported by Talend.
In order to connect to a custom distribution, once you have selected Custom, click the [...] button
to display the dialog box in which you can import the jar files required by that custom distribution.
HCatalog version
Select the version of the Hadoop distribution you are using. Note
that if you use Hortonworks Data Platform V2.0.0, the type of
the operating system for running the distribution and a Talend Job
must be the same, such as Windows or Linux.
Use kerberos authentication If you are accessing the Hadoop cluster running with Kerberos
security, select this check box, then, enter the Kerberos principal
name for the NameNode in the field displayed. This enables you
to use your user name to authenticate against the credentials stored
in Kerberos.
NameNode URI
File name
Browse to, or enter the location of the file which you write data to.
This file is created automatically if it does not exist.
Action
Templeton Configuration
Templeton hostname
Templeton port
HCatalog Configuration
Database
Table
Partition
Fill this field to specify one or more partitions for the partition
operation on the specified table. When you specify multiple
partitions, use commas to separate every two partitions and use
double quotation marks to quote the partition string.
For further information about Partition, see https://
cwiki.apache.org/Hive/.
Advanced settings
Username
File location
Fill this field with the path where the source data file is stored.
Die on error
Row separator
Field separator
Encoding
Select the encoding from the list or select Custom and define it
manually. This field is compulsory for database data handling.
Retrieve the HCatalog logs Select this check box to retrieve log files generated during
HCatalog operations.
Standard Output Folder
Browse to, or enter the directory where the log files are stored.
This field is enabled only when you select the Retrieve
the HCatalog logs check box.
Error Output Folder
Browse to, or enter the directory where the error log files are stored.
This field is enabled only when you select the Retrieve
the HCatalog logs check box.
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at the
Job level as well as at each component level.
Usage
Prerequisites
The Hadoop distribution must be properly installed to guarantee interaction with Talend
Studio. The following list presents MapR-related information as an example.
Ensure that you have installed the MapR client on the machine where the Studio is installed, and added
the MapR client library to the PATH variable of that machine. On Windows, this library
is lib\MapRClient.dll in the MapR client jar file; without it, you may encounter the
following error: no MapRClient in java.library.path.
Set the -Djava.library.path argument. This argument provides the Studio with the path to
the native library of that MapR client. It allows subscription-based users to make full
use of the Data viewer to view, locally in the Studio, the data stored in MapR. For further
information about how to set this argument, see the section describing how to view data in the
Talend Open Studio for Big Data Getting Started Guide.
For further information about how to install a Hadoop distribution, see the manuals
corresponding to the Hadoop distribution you are using.
Limitation
Knowledge of Hive Data Definition Language and HCatalog Data Definition Language
is required. For further information about Hive Data Definition Language, see https://
cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL. For further information
about HCatalog Data Definition Language, see https://cwiki.apache.org/confluence/display/
HCATALOG/Design+Document+-+Java+APIs+for+HCatalog+DDL+Commands.
Related scenario
For a related scenario, see section Scenario: HCatalog table management on Hortonworks Data Platform.
tHDFSCompare
tHDFSCompare properties
Component family
Big Data/File
Function
This component compares two files in HDFS and, based on the read-only schema, generates a
row flow that presents the comparison information.
Purpose
Basic settings
Property type
Use an existing connection
Select this check box and in the Component List click the
HDFS connection component from which you want to reuse the
connection details already defined.
When a Job contains the parent Job and the child
Job, the Component List presents only the connection
components in the same Job level.
Version
Distribution
Select the product you are using as the Hadoop distribution from
the drop-down list. The options in the list vary depending on
the component you are using. Among these options, the Custom
option allows you to connect to a custom Hadoop distribution
rather than any of the distributions given in this list and officially
supported by Talend.
In order to connect to a custom distribution, once you have selected Custom, click the [...] button
to display the dialog box in which you can import the jar files required by that custom distribution.
Connection
Hadoop version
Select the version of the Hadoop distribution you are using. Note
that if you use Hortonworks Data Platform V2.0.0, the type of
the operating system for running the distribution and a Talend Job
must be the same, such as Windows or Linux.
Use kerberos authentication If you are accessing the Hadoop cluster running with Kerberos
security, select this check box, then, enter the Kerberos principal
name for the NameNode in the field displayed. This enables you
to use your user name to authenticate against the credentials stored
in Kerberos.
User name
Group
Comparison mode
File to compare
Browse to, or enter the path to the file in HDFS you need to check
for quality control.
Reference file
Advanced settings
Print to console
Select this check box to display the message in the Run console.
Encoding
Select the encoding from the list or select Custom and define it
manually. This field is compulsory for database data handling.
Hadoop properties
tStatCatcher Statistics
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable
to choose your HDFS connection dynamically from multiple connections planned in your Job.
This feature is useful when you need to access files in different HDFS systems or different
distributions, especially when you are working in an environment where you cannot change your
Job settings, for example, when your Job has to be deployed and executed independent of Talend
Studio.
132
Related scenario
The Dynamic settings table is available only when the Use an existing connection check box
is selected in the Basic settings view. Once a dynamic parameter is defined, the Component
List box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
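For background, a file comparison in HDFS can be approximated in the Hadoop Java API by
comparing file checksums; this is a minimal sketch under that assumption, not the component's
actual algorithm, and the paths are hypothetical:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileChecksum;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsCompareSketch {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // HDFS computes a checksum server-side for each file.
        FileChecksum a = fs.getFileChecksum(new Path("/user/hdp/file1.csv"));
        FileChecksum b = fs.getFileChecksum(new Path("/user/hdp/file2.csv"));
        System.out.println(a != null && a.equals(b)
                ? "Files are identical" : "Files differ");
    }
}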
Usage
Prerequisites
The Hadoop distribution must be properly installed to guarantee interaction with Talend
Studio. The following list presents MapR-related information as an example.
Ensure that you have installed the MapR client on the machine where the Studio is installed, and added
the MapR client library to the PATH variable of that machine. On Windows, this library
is lib\MapRClient.dll in the MapR client jar file; without it, you may encounter the
following error: no MapRClient in java.library.path.
Set the -Djava.library.path argument. This argument provides the Studio with the path to
the native library of that MapR client. It allows subscription-based users to make full
use of the Data viewer to view, locally in the Studio, the data stored in MapR. For further
information about how to set this argument, see the section describing how to view data in the
Talend Open Studio for Big Data Getting Started Guide.
For further information about how to install a Hadoop distribution, see the manuals
corresponding to the Hadoop distribution you are using.
Limitation
Related scenario
No scenario is available for this component yet.
tHDFSConnection
tHDFSConnection properties
Component family
Function
Purpose
tHDFSConnection connects to a given HDFS so that the other Hadoop components can reuse
the connection it creates to communicate with this HDFS.
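For background, the connection this component establishes corresponds roughly to the following
call in the Hadoop Java API; a minimal sketch (the NameNode URI and user name are hypothetical):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsConnectionSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // NameNode URI and user name, as configured in the component.
        FileSystem fs = FileSystem.get(URI.create("hdfs://192.168.0.131:8020"), conf, "hdp");
        System.out.println("Connected: " + fs.exists(new Path("/user/hdp")));
        fs.close();
    }
}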
Basic settings
Property type
Version
Distribution
Select the product you are using as the Hadoop distribution from
the drop-down list. The options in the list vary depending on
the component you are using. Among these options, the Custom
option allows you to connect to a custom Hadoop distribution
rather than any of the distributions given in this list and officially
supported by Talend.
In order to connect to a custom distribution, once you have selected Custom, click the [...] button
to display the dialog box in which you can import the jar files required by that custom distribution.
Hadoop version
Select the version of the Hadoop distribution you are using. Note
that if you use Hortonworks Data Platform V2.0.0, the type of
the operating system for running the distribution and a Talend Job
must be the same, such as Windows or Linux.
Authentication
Use kerberos authentication If you are accessing the Hadoop cluster running with Kerberos
security, select this check box, then, enter the Kerberos principal
name for the NameNode in the field displayed. This enables you
to use your user name to authenticate against the credentials stored
in Kerberos.
This check box is available depending on the Hadoop distribution
you are connecting to.
Use a keytab to authenticate Select the Use a keytab to authenticate check box to log into
a Kerberos-enabled Hadoop system using a given keytab file. A
keytab file contains pairs of Kerberos principals and encrypted
keys. You need to enter the principal to be used in the Principal
field and the access path to the keytab file itself in the Keytab field.
User name
Group
Hadoop properties
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Usage
Prerequisites
The Hadoop distribution must be properly installed to guarantee interaction with Talend
Studio. The following list presents MapR-related information as an example.
Ensure that you have installed the MapR client on the machine where the Studio is installed, and added
the MapR client library to the PATH variable of that machine. On Windows, this library
is lib\MapRClient.dll in the MapR client jar file; without it, you may encounter the
following error: no MapRClient in java.library.path.
Set the -Djava.library.path argument. This argument provides the Studio with the path to
the native library of that MapR client. It allows subscription-based users to make full
use of the Data viewer to view, locally in the Studio, the data stored in MapR. For further
information about how to set this argument, see the section describing how to view data in the
Talend Open Studio for Big Data Getting Started Guide.
For further information about how to install a Hadoop distribution, see the manuals
corresponding to the Hadoop distribution you are using.
Limitations
Related scenario
No scenario is available for this component yet.
tHDFSCopy
tHDFSCopy properties
Component family
Big Data/File
Function
tHDFSCopy copies a source file or folder into a target directory in HDFS and removes this
source if required.
Purpose
Basic settings
Property type
Use an existing connection
Select this check box and in the Component List click the
HDFS connection component from which you want to reuse the
connection details already defined.
When a Job contains the parent Job and the child
Job, the Component List presents only the connection
components in the same Job level.
Version
Distribution
Select the product you are using as the Hadoop distribution from
the drop-down list. The options in the list vary depending on
the component you are using. Among these options, the Custom
option allows you to connect to a custom Hadoop distribution
rather than any of the distributions given in this list and officially
supported by Talend.
In order to connect to a custom distribution, once you have selected Custom, click the [...] button
to display the dialog box in which you can import the jar files required by that custom distribution.
Hadoop version
Select the version of the Hadoop distribution you are using. Note
that if you use Hortonworks Data Platform V2.0.0, the type of
the operating system for running the distribution and a Talend Job
must be the same, such as Windows or Linux.
Authentication
Use kerberos authentication If you are accessing the Hadoop cluster running with Kerberos
security, select this check box, then, enter the Kerberos principal
name for the NameNode in the field displayed. This enables you
to use your user name to authenticate against the credentials stored
in Kerberos.
User name
Group
Source location
Browse to, or enter the directory in HDFS where the data you need
to use is stored.
Target location
Rename
Copy merge
Select this check box to merge the part files generated at the end
of a MapReduce computation.
Once selecting it, you need to enter the name of the final merged
file in the Merge name field.
Remove source
Select this check box to remove the source file or folder once this
source is copied to the target location.
Override target file (This option does not override the directory)
Select this check box to override the file already existing in the
target location. This option does not override the folder.
Advanced settings
Hadoop properties
tStatCatcher Statistics
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable
to choose your HDFS connection dynamically from multiple connections planned in your Job.
This feature is useful when you need to access files in different HDFS systems or different
distributions, especially when you are working in an environment where you cannot change your
Job settings, for example, when your Job has to be deployed and executed independent of Talend
Studio.
The Dynamic settings table is available only when the Use an existing connection check box
is selected in the Basic settings view. Once a dynamic parameter is defined, the Component
List box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
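As background on the Copy merge option described above, here is a minimal sketch of the
corresponding Hadoop Java API call; the paths are hypothetical, and the deleteSource flag mirrors
the Remove source option:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class HdfsCopyMergeSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // Merge the part files of a MapReduce output folder into one target file.
        FileUtil.copyMerge(fs, new Path("/user/hdp/out"),    // source folder
                           fs, new Path("/user/hdp/merged"), // merged target file
                           false,                            // do not remove the source
                           conf, null);
    }
}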
Usage
Prerequisites
The Hadoop distribution must be properly installed to guarantee interaction with Talend
Studio. The following list presents MapR-related information as an example.
Ensure that you have installed the MapR client on the machine where the Studio is installed, and added
the MapR client library to the PATH variable of that machine. On Windows, this library
is lib\MapRClient.dll in the MapR client jar file; without it, you may encounter the
following error: no MapRClient in java.library.path.
Set the -Djava.library.path argument. This argument provides the Studio with the path to
the native library of that MapR client. It allows subscription-based users to make full
use of the Data viewer to view, locally in the Studio, the data stored in MapR. For further
information about how to set this argument, see the section describing how to view data in the
Talend Open Studio for Big Data Getting Started Guide.
For further information about how to install a Hadoop distribution, see the manuals
corresponding to the Hadoop distribution you are using.
Limitation
Related scenario
For related topics, see the section Scenario: Restoring files from bin and the section
Scenario: Iterating on a HDFS directory.
tHDFSDelete
tHDFSDelete properties
Component family
Function
tHDFSDelete deletes a file located on a given Hadoop distributed file system (HDFS).
Purpose
Basic settings
Property type
Use an existing connection
Select this check box and in the Component List click the
HDFS connection component from which you want to reuse the
connection details already defined.
When a Job contains the parent Job and the child
Job, the Component List presents only the connection
components in the same Job level.
Version
Distribution
Select the product you are using as the Hadoop distribution from
the drop-down list. The options in the list vary depending on
the component you are using. Among these options, the Custom
option allows you to connect to a custom Hadoop distribution
rather than any of the distributions given in this list and officially
supported by Talend.
In order to connect to a custom distribution, once selecting
Custom, click the
you can alternatively:
Authentication
Select the version of the Hadoop distribution you are using. Note
that if you use Hortonworks Data Platform V2.0.0, the type of
the operating system for running the distribution and a Talend Job
must be the same, such as Windows or Linux.
Use kerberos authentication If you are accessing the Hadoop cluster running with Kerberos
security, select this check box, then, enter the Kerberos principal
name for the NameNode in the field displayed. This enables you
to use your user name to authenticate against the credentials stored
in Kerberos.
This check box is available depending on the Hadoop distribution
you are connecting to.
Use a keytab to authenticate Select the Use a keytab to authenticate check box to log into
a Kerberos-enabled Hadoop system using a given keytab file. A
keytab file contains pairs of Kerberos principals and encrypted
keys. You need to enter the principal to be used in the Principal
field and the access path to the keytab file itself in the Keytab field.
Note that the user that executes a keytab-enabled Job is not
necessarily the one a principal designates but must have the right
to read the keytab file being used. For example, the user name you
are using to execute a Job is user1 and the principal to be used is
guest; in this situation, ensure that user1 has the right to read the
keytab file to be used.
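The sketch below only illustrates what a keytab-based login amounts to with the Hadoop client API; it is not the code generated by the component. The realm EXAMPLE.COM and the keytab path are assumptions, and guest is the principal used in the example above.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.security.UserGroupInformation;

    public class KeytabLoginSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("hadoop.security.authentication", "kerberos");
            UserGroupInformation.setConfiguration(conf);

            // Values of the Principal and Keytab fields; the user running the Job
            // (for example user1) only needs read access to this keytab file.
            UserGroupInformation.loginUserFromKeytab("guest@EXAMPLE.COM",
                    "/etc/security/keytabs/guest.keytab");
        }
    }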
Advanced settings
NameNode URI
User name
Group
Hadoop properties
tStatCatcher Statistics
Dynamic settings
Select this check box to collect log data at the component level.
Click the [+] button to add a row in the table and fill the Code field with a context variable
to choose your HDFS connection dynamically from multiple connections planned in your Job.
This feature is useful when you need to access files in different HDFS systems or different
distributions, especially when you are working in an environment where you cannot change your
Job settings, for example, when your Job has to be deployed and executed independent of Talend
Studio.
The Dynamic settings table is available only when the Use an existing connection check box
is selected in the Basic settings view. Once a dynamic parameter is defined, the Component
List box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
Prerequisites
The Hadoop distribution must be properly installed, so as to guarantee the interaction with Talend
Studio. The following list presents MapR related information for example.
Ensure that you have installed the MapR client in the machine where the Studio is, and added
the MapR client library to the PATH variable of that machine. For Windows, this library
is lib\MapRClient.dll in the MapR client jar file; without adding it, you may encounter the
following error: no MapRClient in java.library.path.
Set the -Djava.library.path argument. This argument provides to the Studio the path to
the native library of that MapR client. This allows the subscription-based users to make full
use of the Data viewer to view locally in the Studio the data stored in MapR. For further
information about how to set this argument, see the section describing how to view data of
Talend Open Studio for Big Data Getting Started Guide.
For further information about how to install a Hadoop distribution, see the manuals
corresponding to the Hadoop distribution you are using.
Limitations
Related scenario
No scenario is available for this component yet.
tHDFSExist
tHDFSExist properties
Component family
Big Data/File
Function
Purpose
Basic settings
Property type
Select this check box and in the Component List click the
relevant connection component to reuse the connection details you
already defined.
When a Job contains the parent Job and the child
Job, Component list presents only the connection
components in the same Job level.
Version
Distribution
Select the product you are using as the Hadoop distribution from
the drop-down list. The options in the list vary depending on
the component you are using. Among these options, the Custom
option allows you to connect to a custom Hadoop distribution
rather than any of the distributions given in this list and officially
supported by Talend.
In order to connect to a custom distribution, once selecting
Custom, click the
you can alternatively:
Authentication
Select the version of the Hadoop distribution you are using. Note
that if you use Hortonworks Data Platform V2.0.0, the type of
the operating system for running the distribution and a Talend Job
must be the same, such as Windows or Linux.
Use kerberos authentication If you are accessing the Hadoop cluster running with Kerberos
security, select this check box, then, enter the Kerberos principal
name for the NameNode in the field displayed. This enables you
to use your user name to authenticate against the credentials stored
in Kerberos.
Advanced settings
NameNode URI
User name
Group
HDFS directory
Browse to, or enter the directory in HDFS where the data you need
to use is.
File name or relative path
Enter the name of the file whose existence you want to check. If need be, browse to the file or enter the path to the file, relative to the directory you entered in HDFS directory.
Hadoop properties
tStatCatcher Statistics
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable
to choose your HDFS connection dynamically from multiple connections planned in your Job.
This feature is useful when you need to access files in different HDFS systems or different
distributions, especially when you are working in an environment where you cannot change your
Job settings, for example, when your Job has to be deployed and executed independent of Talend
Studio.
The Dynamic settings table is available only when the Use an existing connection check box
is selected in the Basic settings view. Once a dynamic parameter is defined, the Component
List box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
Prerequisites
The Hadoop distribution must be properly installed, so as to guarantee the interaction with Talend
Studio. The following list presents MapR related information for example.
Ensure that you have installed the MapR client in the machine where the Studio is, and added
the MapR client library to the PATH variable of that machine. For Windows, this library
is lib\MapRClient.dll in the MapR client jar file; without adding it, you may encounter the
following error: no MapRClient in java.library.path.
Set the -Djava.library.path argument. This argument provides to the Studio the path to
the native library of that MapR client. This allows the subscription-based users to make full
use of the Data viewer to view locally in the Studio the data stored in MapR. For further
information about how to set this argument, see the section describing how to view data of
Talend Open Studio for Big Data Getting Started Guide.
For further information about how to install a Hadoop distribution, see the manuals
corresponding to the Hadoop distribution you are using.
Limitation

Scenario: Checking the existence of a file in HDFS
Launch the Hadoop distribution in which you want to check the existence of a particular file. Then, proceed as follows:
In the Integration perspective of Talend Studio, create an empty Job, named hdfsexist_file for example, from
the Job Designs node in the Repository tree view.
For further information about how to create a Job, see the Talend Studio User Guide.
2.
3.
2.
In the Version area, select the Hadoop distribution you are connecting to and its version.
3.
In the Connection area, enter the values of the parameters required to connect to the HDFS.
In the real-world practice, you may use tHDFSConnection to create a connection and reuse it from the
current component. For further information, see section tHDFSConnection.
4.
In the HDFS Directory field, browse to, or enter the path to the folder where the file to be checked is. In
this example, browse to /user/ychen/data/hdfs/out/dest.
5.
In the File name or relative path field, enter the name of the file whose existence you want to check. For example, output.csv.
2.
In the Title field, enter the title to be used for the pop-up message box to be created.
3.
In the Buttons list, select OK. This defines the button to be displayed on the message box.
4.
5.
In the Message field, enter the message you want to display once the file check is done. In this example, enter "This file does not exist!".
Click the If link to open the Basic settings view, where you are able to define the condition for checking
the existence of this file.
2.
In the Condition box, press Ctrl+Space to access the variable list and select the global variable EXISTS.
Type an exclamation mark before the variable to negate the meaning of the variable.
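For example, assuming the checking component keeps its default label tHDFSExist_1, the condition could read as follows; adapt the label in the variable name if you have labeled the component differently:

    !((Boolean)globalMap.get("tHDFSExist_1_EXISTS"))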
Once done, a message box pops up to indicate that this file called output.csv does not exist in the directory you
defined earlier.
In the HDFS system where the existence of the file is checked, browse to the specified directory; you can see that this file does not exist.
tHDFSGet
tHDFSGet properties
Component family
Function
tHDFSGet copies files from a Hadoop distributed file system (HDFS), pastes them in a user-defined directory and, if need be, renames them.
Purpose
tHDFSGet connects to Hadoop distributed file system, helping to obtain large-scale files with
optimized performance.
Basic settings
Property type
Use an existing connection Select this check box and in the Component List click the
HDFS connection component from which you want to reuse the
connection details already defined.
When a Job contains the parent Job and the child
Job, Component list presents only the connection
components in the same Job level.
Version
Distribution
Select the product you are using as the Hadoop distribution from
the drop-down list. The options in the list vary depending on
the component you are using. Among these options, the Custom
option allows you to connect to a custom Hadoop distribution
rather than any of the distributions given in this list and officially
supported by Talend.
In order to connect to a custom distribution, once selecting
Custom, click the
you can alternatively:
Select the version of the Hadoop distribution you are using. Note
that if you use Hortonworks Data Platform V2.0.0, the type of
the operating system for running the distribution and a Talend Job
must be the same, such as Windows or Linux.
Use kerberos authentication If you are accessing the Hadoop cluster running with Kerberos
security, select this check box, then, enter the Kerberos principal
name for the NameNode in the field displayed. This enables you
to use your user name to authenticate against the credentials stored
in Kerberos.
NameNode URI
User name
Group
HDFS directory
Browse to, or enter the directory in HDFS where the data you need
to use is.
Local directory
Browse to, or enter the local directory to store the files obtained
from HDFS.
Overwrite file
Options to overwrite or not the existing file with the new one.
Append
Select this check box to add the new rows at the end of the records.
Include subdirectories
Select this check box if the selected input source type includes subdirectories.
Files
Die on error
This check box is selected by default. Clear the check box to skip
the row on error and complete the process for error-free rows.
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable
to choose your HDFS connection dynamically from multiple connections planned in your Job.
This feature is useful when you need to access files in different HDFS systems or different
distributions, especially when you are working in an environment where you cannot change your
Job settings, for example, when your Job has to be deployed and executed independent of Talend
Studio.
The Dynamic settings table is available only when the Use an existing connection check box
is selected in the Basic settings view. Once a dynamic parameter is defined, the Component
List box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component combines HDFS connection and data extraction, and is thus used as a single-component subjob to move data from HDFS to a user-defined local directory.
Different from the tHDFSInput and the tHDFSOutput components, it runs standalone and does
not generate input or output flow for the other components.
It is often connected to the Job using OnSubjobOk or OnComponentOk link, depending on
the context.
Prerequisites
The Hadoop distribution must be properly installed, so as to guarantee the interaction with Talend
Studio. The following list presents MapR related information for example.
Ensure that you have installed the MapR client in the machine where the Studio is, and added
the MapR client library to the PATH variable of that machine. For Windows, this library
is lib\MapRClient.dll in the MapR client jar file; without adding it, you may encounter the
following error: no MapRClient in java.library.path.
Set the -Djava.library.path argument. This argument provides to the Studio the path to
the native library of that MapR client. This allows the subscription-based users to make full
use of the Data viewer to view locally in the Studio the data stored in MapR. For further
information about how to set this argument, see the section describing how to view data of
Talend Open Studio for Big Data Getting Started Guide.
For further information about how to install a Hadoop distribution, see the manuals
corresponding to the Hadoop distribution you are using.
Limitations
Drop the following components from the Palette onto the design workspace: tFixedFlowInput,
tFileOutputDelimited, tHDFSPut, tHDFSGet, tFileInputDelimited and tLogRow.
2.
3.
4.
5.
6.
2.
Set the Schema to Built-In and click the three-dot [...] button next to Edit Schema to describe the data
structure you want to create from internal variables. In this scenario, the schema contains one column: content.
3.
4.
Click OK to close the dialog box and accept to propagate the changes when prompted by the studio.
5.
In Basic settings, define the corresponding value in the Mode area using the Use Single Table option. In
this scenario, the value is Hello world!.
2.
Click the [...] button next to the File Name field and browse to the output file you want to write data in,
in.txt in this example.
2.
Select, for example, Apache 0.20.2 from the Hadoop version list.
3.
In the NameNode URI, the Username and the Group fields, enter the connection parameters to the HDFS.
4.
Next to the Local directory field, click the three-dot [...] button to browse to the folder with the
file to be loaded into the HDFS. In this scenario, the directory has been specified while configuring
tFileOutputDelimited: C:/hadoopfiles/putFile/.
5.
In the HDFS directory field, type in the intended location in HDFS to store the file to be loaded. In this
example, it is /testFile.
6.
7.
8.
In the Files area, click the plus button to add a row in which you define the file to be loaded.
9.
In the File mask column, enter *.txt between the quotation marks to replace newLine, and leave the New name column as it is. This allows you to extract all the .txt files in the specified directory without changing their names. In this example, the file is in.txt.
2.
Select, for example, Apache 0.20.2 from the Hadoop version list.
3.
In the NameNode URI, the Username and the Group fields, enter the connection parameters to the HDFS.
4.
In the HDFS directory field, type in the location storing the loaded file in HDFS. In this example, it is /testFile.
5.
Next to the Local directory field, click the three-dot [...] button to browse to the folder intended to store the
files that are extracted out of the HDFS. In this scenario, the directory is: C:/hadoopfiles/getFile/.
6.
7.
8.
In the Files area, click the plus button to add a row in which you define the file to be extracted.
9.
In the File mask column, enter *.txt between the quotation marks to replace newLine, and leave the New name column as it is. This allows you to extract all the .txt files from the specified directory in the HDFS without changing their names. In this example, the file is in.txt.
Reading data from the HDFS and saving the data locally
1.
2.
3.
Next to the File Name/Stream field, click the three-dot button to browse to the file you have obtained from
the HDFS. In this scenario, the directory is C:/hadoopfiles/getFile/in.txt.
4.
Set Schema to Built-In and click Edit schema to define the data to pass on to the tLogRow component.
5.
6.
Click OK to close the dialog box and accept to propagate the changes when prompted by the studio.
The file is also extracted from the HDFS by tHDFSGet and is read by tFileInputDelimited.
tHDFSInput
tHDFSInput properties
Component family
Function
tHDFSInput reads a file located on a given Hadoop distributed file system (HDFS) and puts
the data of interest from this file into a Talend schema. Then it passes the data to the component
that follows.
If you have subscribed to one of the Talend solutions with Big Data, you are able to use this
component in a Talend Map/Reduce Job to generate Map/Reduce code. For further information,
see section tHDFSInput in Talend Map/Reduce Jobs.
Purpose
tHDFSInput extracts the data in a HDFS file for other components to process it.
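To picture what reading a delimited HDFS file into rows and fields amounts to, here is a minimal sketch written with the Hadoop client API. It is illustrative only, not the component's generated code; the NameNode URI, the file path, the UTF-8 encoding and the ";" field separator are assumed example values.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URI;
    import java.util.Arrays;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsReadSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(URI.create("hdfs://localhost:8020"), conf);

            BufferedReader reader = new BufferedReader(
                    new InputStreamReader(fs.open(new Path("/user/talend/in/data.csv")), "UTF-8"));
            String row;
            while ((row = reader.readLine()) != null) {    // one row per line (row separator "\n")
                String[] fields = row.split(";");           // one schema column per field
                System.out.println(Arrays.toString(fields));
            }
            reader.close();
        }
    }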
Basic settings
Property type
Use an existing connection Select this check box and in the Component List click the
HDFS connection component from which you want to reuse the
connection details already defined.
When a Job contains the parent Job and the child
Job, Component list presents only the connection
components in the same Job level.
Version
Distribution
Select the product you are using as the Hadoop distribution from
the drop-down list. The options in the list vary depending on
the component you are using. Among these options, the Custom
option allows you to connect to a custom Hadoop distribution
rather than any of the distributions given in this list and officially
supported by Talend.
In order to connect to a custom distribution, once selecting
Custom, click the
you can alternatively:
Authentication
Select the version of the Hadoop distribution you are using. Note
that if you use Hortonworks Data Platform V2.0.0, the type of
the operating system for running the distribution and a Talend Job
must be the same, such as Windows or Linux.
Use kerberos authentication If you are accessing the Hadoop cluster running with Kerberos
security, select this check box, then, enter the Kerberos principal
name for the NameNode in the field displayed. This enables you
to use your user name to authenticate against the credentials stored
in Kerberos.
This check box is available depending on the Hadoop distribution
you are connecting to.
Use a keytab to authenticate Select the Use a keytab to authenticate check box to log into
a Kerberos-enabled Hadoop system using a given keytab file. A
keytab file contains pairs of Kerberos principals and encrypted
keys. You need to enter the principal to be used in the Principal
field and the access path to the keytab file itself in the Keytab field.
Note that the user that executes a keytab-enabled Job is not
necessarily the one a principal designates but must have the right
to read the keytab file being used. For example, the user name you
are using to execute a Job is user1 and the principal to be used is
guest; in this situation, ensure that user1 has the right to read the
keytab file to be used.
NameNode URI
User name
Group
File Name
Browse to, or enter the directory in HDFS where the data you need
to use is.
If the path you set points to a folder, this component will read all
of the files stored in that folder. Furthermore, if sub-folders exist
in that folder and you need to read files in the sub-folders, select
the Include sub-directories if path is directory check box in the
Advanced settings view.
File type
Type
Select the type of the file to be processed. The type of the file may
be:
Text file.
Sequence file: a Hadoop sequence file consists of binary key/
value pairs and is suitable for the Map/Reduce framework.
For further information, see http://wiki.apache.org/hadoop/
SequenceFile.
Once you select the Sequence file format, the Key column list and the Value column list appear to allow you to select the keys and the values of that Sequence file to be processed (a minimal reading sketch is given after this table).
Row separator
Field separator
Header
Set values to ignore the header of the transferred data. For example,
enter 0 to ignore no rows for the data without header and set 1 for
the data with header at the first row.
This field is not available for a Sequence file.
Custom encoding
You may encounter encoding issues when you process data stored
in HDFS. In that situation, select this check box to display the
Encoding list.
Select the encoding from the list or select Custom and define it
manually. This field is compulsory for database data handling.
This option is not available for a Sequence file.
Compression
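As an illustration of the Sequence file format mentioned in the Type field above, the sketch below reads the key/value pairs of a sequence file with the Hadoop client API. It is illustrative only, not the component's generated code; the NameNode URI and the file path are assumed example values.

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.util.ReflectionUtils;

    public class SequenceFileReadSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(URI.create("hdfs://localhost:8020"), conf);
            Path path = new Path("/user/talend/in/data.seq");

            SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
            // The key and value classes are stored in the file header.
            Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
            Writable value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);
            while (reader.next(key, value)) {
                System.out.println(key + " -> " + value);
            }
            reader.close();
        }
    }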
Advanced settings
Include sub-directories if path is directory
Select this check box to read not only the folder you have specified in the File name field but also the sub-folders in that folder.
Hadoop properties
tStatCatcher Statistics
Dynamic settings
Select this check box to collect log data at the component level.
Click the [+] button to add a row in the table and fill the Code field with a context variable
to choose your HDFS connection dynamically from multiple connections planned in your Job.
This feature is useful when you need to access files in different HDFS systems or different
distributions, especially when you are working in an environment where you cannot change your
Job settings, for example, when your Job has to be deployed and executed independent of Talend
Studio.
The Dynamic settings table is available only when the Use an existing connection check box
is selected in the Basic settings view. Once a dynamic parameter is defined, the Component
List box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
Prerequisites
The Hadoop distribution must be properly installed, so as to guarantee the interaction with Talend
Studio. The following list presents MapR related information for example.
Ensure that you have installed the MapR client in the machine where the Studio is, and added
the MapR client library to the PATH variable of that machine. For Windows, this library
is lib\MapRClient.dll in the MapR client jar file; without adding it, you may encounter the
following error: no MapRClient in java.library.path.
Set the -Djava.library.path argument. This argument provides to the Studio the path to
the native library of that MapR client. This allows the subscription-based users to make full
use of the Data viewer to view locally in the Studio the data stored in MapR. For further
information about how to set this argument, see the section describing how to view data of
Talend Open Studio for Big Data Getting Started Guide.
For further information about how to install a Hadoop distribution, see the manuals
corresponding to the Hadoop distribution you are using.
Limitations

tHDFSInput in Talend Map/Reduce Jobs
In a Talend Map/Reduce Job, tHDFSInput, as well as the whole Map/Reduce Job using it, generates native Map/
Reduce code. This section presents the specific properties of tHDFSInput when it is used in that situation. For
further information about a Talend Map/Reduce Job, see the Talend Open Studio for Big Data Getting Started
Guide.
Component family
MapReduce / Input
Basic settings
Property type
Folder/File
Browse to, or enter the directory in HDFS where the data you need
to use is.
If the path you set points to a folder, this component will read all of the files stored in that folder, for example, /user/talend/in; if sub-folders exist, they are automatically ignored unless you define the path like /user/talend/in/*.
If you want to specify more than one file or directory in this field, separate each path using a comma (,).
File type
Type
Select the type of the file to be processed. The type of the file may
be:
Text file.
Sequence file: a Hadoop sequence file consists of binary key/
value pairs and is suitable for the Map/Reduce framework.
For further information, see http://wiki.apache.org/hadoop/
SequenceFile.
Once you select the Sequence file format, the Key column list
and the Value column list appear to allow you to select the keys
and the values of that Sequence file to be processed.
Row separator
Field separator
Header
Set values to ignore the header of the transferred data. For example,
enter 0 to ignore no rows for the data without header and set 1 for
the data with header at the first row.
Custom encoding
Select the encoding to be used from the list or select Custom and define it manually. This field is compulsory for database data handling.
This option is not available for a Sequence file.
Advanced settings
Usage
Advanced separator (for number)
Select this check box to change the separator used for numbers. By default, the thousands separator is a comma (,) and the decimal separator is a period (.).
Trim all columns
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Hadoop Connection
You need to use the Hadoop Configuration tab in the Run view to define the connection to a
given Hadoop distribution for the whole Job.
This connection is effective on a per-Job basis.
Related scenario
Related topic, see section Scenario 1: Writing data in a delimited file.
Related topic, see section Scenario: Computing data with Hadoop distributed file system.
If you are a subscription-based Big Data user, you can as well consult a Talend Map/Reduce Job using the Map/
Reduce version of tHDFSInput:
section Scenario 2: Deduplicating entries using Map/Reduce components.
tHDFSList
tHDFSList properties
Component family
Big Data/File
Function
Purpose
tHDFSList retrieves a list of files or folders based on a filemask pattern and iterates on each unit.
Basic settings
Property type
Select this check box and in the Component List click the
HDFS connection component from which you want to reuse the
connection details already defined.
When a Job contains the parent Job and the child
Job, Component list presents only the connection
components in the same Job level.
Version
Distribution
Select the product you are using as the Hadoop distribution from
the drop-down list. The options in the list vary depending on
the component you are using. Among these options, the Custom
option allows you to connect to a custom Hadoop distribution
rather than any of the distributions given in this list and officially
supported by Talend.
In order to connect to a custom distribution, once selecting
Custom, click the
you can alternatively:
Select the version of the Hadoop distribution you are using. Note
that if you use Hortonworks Data Platform V2.0.0, the type of
the operating system for running the distribution and a Talend Job
must be the same, such as Windows or Linux.
Authentication
Use kerberos authentication If you are accessing the Hadoop cluster running with Kerberos
security, select this check box, then, enter the Kerberos principal
name for the NameNode in the field displayed. This enables you
to use your user name to authenticate against the credentials stored
in Kerberos.
User name
Group
HDFS Directory
Browse to, or enter the directory in HDFS where the data you need
to use is.
FileList Type
Select the type of input you want to iterate on from the list:
Files if the input is a set of files,
Directories if the input is a set of directories,
Both if the input is a set of the above two types.
Include subdirectories
Select this check box if the selected input source type includes
sub-directories.
Case Sensitive
Set the case mode from the list to either create or not create a case-sensitive filter on filenames.
Use Glob Expressions as Filemask
This check box is selected by default. It filters the results using a Global Expression (Glob Expressions).
Files
Order by
The folders are listed first of all, then the files. You can choose to
prioritise the folder and file order either:
By default: alphabetical order, by folder then file;
By file name: alphabetical order or reverse alphabetical order;
By file size: smallest to largest or largest to smallest;
By modified date: most recent to least recent or least recent to
most recent.
If ordering by file name, in the event of identical file
names then modified date takes precedence. If ordering
by file size, in the event of identical file sizes then
file name takes precedence. If ordering by modified
date, in the event of identical dates then file name takes
precedence.
Order action
Hadoop properties
tStatCatcher Statistics
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable
to choose your HDFS connection dynamically from multiple connections planned in your Job.
This feature is useful when you need to access files in different HDFS systems or different
distributions, especially when you are working in an environment where you cannot change your
Job settings, for example, when your Job has to be deployed and executed independent of Talend
Studio.
The Dynamic settings table is available only when the Use an existing connection check box
is selected in the Basic settings view. Once a dynamic parameter is defined, the Component
List box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Global Variables
CURRENT_FILE: Indicates the current file name. This is a Flow variable and it returns a string.
CURRENT_FILEDIRECTORY: Indicates the current file directory. This is a Flow variable
and it returns a string.
CURRENT_FILEEXTENSION: Indicates the extension of the current file. This is a Flow
variable and it returns a string.
CURRENT_FILEPATH: Indicates the current file name as well as its path. This is a Flow
variable and it returns a string.
NB_FILE: Indicates the number of files iterated upon so far. This is a Flow variable and it
returns an integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and
choose the variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable
means it functions after the execution of a component.
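For example, assuming the component keeps its default label tHDFSList_1, a component placed in the Iterate flow can reuse the path of the file currently being iterated on with the following expression (press Ctrl + Space and pick the variable from the list rather than typing it):

    ((String)globalMap.get("tHDFSList_1_CURRENT_FILEPATH"))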
Connections
For further information regarding connections, see Talend Studio User Guide.
Usage
tHDFSList provides a list of files or folders from a defined HDFS directory on which it iterates.
Prerequisites
The Hadoop distribution must be properly installed, so as to guarantee the interaction with Talend
Studio. The following list presents MapR related information for example.
Ensure that you have installed the MapR client in the machine where the Studio is, and added
the MapR client library to the PATH variable of that machine. For Windows, this library
is lib\MapRClient.dll in the MapR client jar file; without adding it, you may encounter the
following error: no MapRClient in java.library.path.
Set the -Djava.library.path argument. This argument provides to the Studio the path to
the native library of that MapR client. This allows the subscription-based users to make full
use of the Data viewer to view locally in the Studio the data stored in MapR. For further
information about how to set this argument, see the section describing how to view data of
Talend Open Studio for Big Data Getting Started Guide.
For further information about how to install a Hadoop distribution, see the manuals
corresponding to the Hadoop distribution you are using.
Limitation

Scenario: Iterating on a HDFS directory
Create the files to be iterated on in the HDFS you want to use. In this scenario, two files are created in the directory: /user/ychen/data/hdfs/out.
You can design a Job in the Studio to create the two files. For further information, see section tHDFSPut
or section tHDFSOutput.
In the Integration perspective of Talend Studio, create an empty Job, named HDFSList for example, from
the Job Designs node in the Repository tree view.
For further information about how to create a Job, see the Talend Studio User Guide.
2.
3.
2.
In the Version area, select the Hadoop distribution you are connecting to and its version.
3.
In the Connection area, enter the values of the parameters required to connect to the HDFS.
In the real-world practice, you may use tHDFSConnection to create a connection and reuse it from the
current component. For further information, see section tHDFSConnection.
4.
In the HDFS Directory field, enter the path to the folder where the files to be iterated on are. In this example,
as presented earlier, the directory is /user/ychen/data/hdfs/out/.
5.
6.
In the Files table, click the plus button to add one row and enter * between the quotation marks to iterate on any existing files.
2.
In the Version area, select the Hadoop distribution you are connecting to and its version.
3.
In the Connection area, enter the values of the parameters required to connect to the HDFS.
In the real-world practice, you may have used tHDFSConnection to create a connection; then you can reuse
it from the current component. For further information, see section tHDFSConnection.
4.
In the HDFS directory field, enter the path to the folder holding the files to be retrieved.
To do this with the auto-completion list, place the mouse pointer in this field, then, press Ctrl+Space to
display the list and select the tHDFSList_1_CURRENT_FILEDIRECTORY variable to reuse the directory
you have defined in tHDFSList. In this variable, tHDFSList_1 is the label of the component. If you label it
differently, select the variable accordingly.
Once you select this variable, the directory reads, for example, ((String)globalMap.get("tHDFSList_1_CURRENT_FILEDIRECTORY")) in this field.
For further information about how to label a component, see the Talend Studio User Guide.
5.
In the Local directory field, enter the path, or browse to the folder you want to place the selected files in.
This folder will be created if it does not exist. In this example, it is C:/hdfsFiles.
6.
7.
In the Files table, click the plus button to add one row and enter * between the quotation marks in the Filemask column in order to get any existing files.
Once done, you can check the files created in the local directory.
tHDFSOutput
tHDFSOutput properties
Component family
Function
tHDFSOutput writes data flows it receives into a given Hadoop distributed file system (HDFS).
If you have subscribed to one of the Talend solutions with Big Data, you are able to use this
component in a Talend Map/Reduce Job to generate Map/Reduce code. For further information,
see section tHDFSOutput in Talend Map/Reduce Jobs.
Purpose
tHDFSOutput transfers the data flows it receives into a given HDFS file system.
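To picture what writing a data flow into HDFS amounts to, here is a minimal sketch written with the Hadoop client API. It is illustrative only, not the component's generated code; the NameNode URI, the target path, the separators and the sample content are assumed example values.

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsWriteSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(URI.create("hdfs://localhost:8020"), conf);

            // The target file is created automatically (and overwritten here) if needed.
            FSDataOutputStream out = fs.create(new Path("/user/talend/out/data.csv"), true);
            out.write("id;name\n1;Hello world!\n".getBytes("UTF-8"));
            out.close();
        }
    }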
Basic settings
Property type
Use an existing connection Select this check box and in the Component List click the
HDFS connection component from which you want to reuse the
connection details already defined.
When a Job contains the parent Job and the child
Job, Component list presents only the connection
components in the same Job level.
Version
Distribution
Select the product you are using as the Hadoop distribution from
the drop-down list. The options in the list vary depending on
the component you are using. Among these options, the Custom
option allows you to connect to a custom Hadoop distribution
rather than any of the distributions given in this list and officially
supported by Talend.
In order to connect to a custom distribution, once selecting
Custom, click the
you can alternatively:
Authentication
Select the version of the Hadoop distribution you are using. Note
that if you use Hortonworks Data Platform V2.0.0, the type of
the operating system for running the distribution and a Talend Job
must be the same, such as Windows or Linux.
Use kerberos authentication If you are accessing the Hadoop cluster running with Kerberos
security, select this check box, then, enter the Kerberos principal
name for the NameNode in the field displayed. This enables you
to use your user name to authenticate against the credentials stored
in Kerberos.
This check box is available depending on the Hadoop distribution
you are connecting to.
Use a keytab to authenticate Select the Use a keytab to authenticate check box to log into
a Kerberos-enabled Hadoop system using a given keytab file. A
keytab file contains pairs of Kerberos principals and encrypted
keys. You need to enter the principal to be used in the Principal
field and the access path to the keytab file itself in the Keytab field.
Note that the user that executes a keytab-enabled Job is not
necessarily the one a principal designates but must have the right
to read the keytab file being used. For example, the user name you
are using to execute a Job is user1 and the principal to be used is
guest; in this situation, ensure that user1 has the right to read the
keytab file to be used.
File type
NameNode URI
User name
Group
File Name
Browse to, or enter the location of the file which you write data to.
This file is created automatically if it does not exist.
Type
Select the type of the file to be processed. The type of the file may
be:
Text file.
Sequence file: a Hadoop sequence file consists of binary key/
value pairs and is suitable for the Map/Reduce framework.
For further information, see http://wiki.apache.org/hadoop/
SequenceFile.
Once you select the Sequence file format, the Key column list
and the Value column list appear to allow you to select the keys
and the values of that Sequence file to be processed.
Action
Row separator
Field separator
Custom encoding
You may encounter encoding issues when you process data stored
in HDFS. In that situation, select this check box to display the
Encoding list.
Select the encoding from the list or select Custom and define it
manually. This field is compulsory for database data handling.
This option is not available for a Sequence file.
Compression
Select the Compress the data check box to compress the output
data.
Hadoop provides different compression formats that help reduce
the space needed for storing files and speed up data transfer. When
reading a compressed file, the Studio needs to uncompress it before
being able to feed it to the input flow.
Include header
Advanced settings
Hadoop properties
tStatCatcher Statistics
Dynamic settings
Select this check box to collect log data at the component level.
Click the [+] button to add a row in the table and fill the Code field with a context variable
to choose your HDFS connection dynamically from multiple connections planned in your Job.
This feature is useful when you need to access files in different HDFS systems or different
distributions, especially when you are working in an environment where you cannot change your
Job settings, for example, when your Job has to be deployed and executed independent of Talend
Studio.
The Dynamic settings table is available only when the Use an existing connection check box
is selected in the Basic settings view. Once a dynamic parameter is defined, the Component
List box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
Prerequisites
The Hadoop distribution must be properly installed, so as to guarantee the interaction with Talend
Studio. The following list presents MapR related information for example.
Ensure that you have installed the MapR client in the machine where the Studio is, and added
the MapR client library to the PATH variable of that machine. For Windows, this library
is lib\MapRClient.dll in the MapR client jar file; without adding it, you may encounter the
following error: no MapRClient in java.library.path.
Set the -Djava.library.path argument. This argument provides to the Studio the path to
the native library of that MapR client. This allows the subscription-based users to make full
use of the Data viewer to view locally in the Studio the data stored in MapR. For further
information about how to set this argument, see the section describing how to view data of
Talend Open Studio for Big Data Getting Started Guide.
For further information about how to install a Hadoop distribution, see the manuals
corresponding to the Hadoop distribution you are using.
Limitations

tHDFSOutput in Talend Map/Reduce Jobs
In a Talend Map/Reduce Job, tHDFSOutput, as well as the other Map/Reduce components preceding it, generates
native Map/Reduce code. This section presents the specific properties of tHDFSOutput when it is used in that
situation. For further information about a Talend Map/Reduce Job, see the Talend Open Studio for Big Data
Getting Started Guide.
Component family
MapReduce / Output
Basic settings
Property type
Folder
Browse to, or enter the directory in HDFS where the data you need
to use is.
This path must point to a folder rather than a file, because a Talend Map/Reduce Job needs to write in its target folder not only the final result but also multiple part-files generated in performing Map/Reduce computations.
File type
Type
Select the type of the file to be processed. The type of the file may be:
Text file.
Sequence file: a Hadoop sequence file consists of binary key/
value pairs and is suitable for the Map/Reduce framework.
For further information, see http://wiki.apache.org/hadoop/
SequenceFile.
Once you select the Sequence file format, the Key column list
and the Value column list appear to allow you to select the keys
and the values of that Sequence file to be processed.
Action
Row separator
Field separator
Include header
Custom encoding
You may encounter encoding issues when you process data stored
in HDFS. In that situation, select this check box to display the
Encoding list.
Then select the encoding to be used from the list or select Custom
and define it manually.
This option is not available for a Sequence file.
Compression
Select the Compress the data check box to compress the output
data.
Hadoop provides different compression formats that help reduce
the space needed for storing files and speed up data transfer. When
reading a compressed file, the Studio needs to uncompress it before
being able to feed it to the input flow.
Merge result to single file
Select this check box to merge the final part files into a single file and put that file in a specified directory.
Once you select it, enter the path to, or browse to the folder you want to store the merged file in. This directory is automatically created if it does not exist.
This option is not available for a Sequence file.
Advanced settings
Advanced separator (for number)
Select this check box to change the separator used for numbers. By default, the thousands separator is a comma (,) and the decimal separator is a period (.).
tStatCatcher Statistics
Usage
Select this check box to collect log data at the component level.
Hadoop Connection
You need to use the Hadoop Configuration tab in the Run view to define the connection to a
given Hadoop distribution for the whole Job.
This connection is effective on a per-Job basis.
Related scenario
Related topic, see section Scenario 1: Writing data in a delimited file.
Related topic, see section Scenario: Computing data with Hadoop distributed file system.
If you are a subscription-based Big Data user, you can as well consult a Talend Map/Reduce Job using the Map/
Reduce version of tHDFSOutput:
section Scenario 2: Deduplicating entries using Map/Reduce components.
tHDFSProperties
tHDFSProperties properties
Component family
Big Data/File
Function
This component creates a single row flow that displays the properties of a file processed in HDFS.
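The information this component can report corresponds to the standard file status metadata that HDFS exposes. The sketch below is illustrative only and does not reproduce the component's exact output schema; the NameNode URI and the file path are assumed example values.

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class FilePropertiesSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(URI.create("hdfs://localhost:8020"), conf);

            FileStatus status = fs.getFileStatus(new Path("/user/talend/in/data.csv"));
            System.out.println("size       : " + status.getLen());
            System.out.println("owner      : " + status.getOwner());
            System.out.println("group      : " + status.getGroup());
            System.out.println("permission : " + status.getPermission());
            System.out.println("modified   : " + status.getModificationTime());
        }
    }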
Purpose
Basic settings
Property type
Select this check box and in the Component List click the
HDFS connection component from which you want to reuse the
connection details already defined.
When a Job contains the parent Job and the child
Job, Component list presents only the connection
components in the same Job level.
Version
Distribution
Select the product you are using as the Hadoop distribution from
the drop-down list. The options in the list vary depending on
the component you are using. Among these options, the Custom
option allows you to connect to a custom Hadoop distribution
rather than any of the distributions given in this list and officially
supported by Talend.
In order to connect to a custom distribution, once selecting
Custom, click the
you can alternatively:
Authentication
Select the version of the Hadoop distribution you are using. Note
that if you use Hortonworks Data Platform V2.0.0, the type of
the operating system for running the distribution and a Talend Job
must be the same, such as Windows or Linux.
Use kerberos authentication If you are accessing the Hadoop cluster running with Kerberos
security, select this check box, then, enter the Kerberos principal
name for the NameNode in the field displayed. This enables you
to use your user name to authenticate against the credentials stored
in Kerberos.
This check box is available depending on the Hadoop distribution
you are connecting to.
Use a keytab to authenticate Select the Use a keytab to authenticate check box to log into
a Kerberos-enabled Hadoop system using a given keytab file. A
keytab file contains pairs of Kerberos principals and encrypted
keys. You need to enter the principal to be used in the Principal
field and the access path to the keytab file itself in the Keytab
field.
Note that the user that executes a keytab-enabled Job is not
necessarily the one a principal designates but must have the right
to read the keytab file being used. For example, the user name you
are using to execute a Job is user1 and the principal to be used is
guest; in this situation, ensure that user1 has the right to read the
keytab file to be used.
NameNode URI
User name
Group
Advanced settings
File
Browse to, or enter the directory in HDFS where the data you need
to use is.
Select this check box to generate and output the MD5 information
of the file processed.
Hadoop properties
tStatCatcher Statistics
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable
to choose your HDFS connection dynamically from multiple connections planned in your Job.
This feature is useful when you need to access files in different HDFS systems or different
distributions, especially when you are working in an environment where you cannot change your
Job settings, for example, when your Job has to be deployed and executed independent of Talend
Studio.
The Dynamic settings table is available only when the Use an existing connection check box
is selected in the Basic settings view. Once a dynamic parameter is defined, the Component
List box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
Prerequisites
The Hadoop distribution must be properly installed, so as to guarantee the interaction with Talend
Studio. The following list presents MapR related information for example.
Ensure that you have installed the MapR client in the machine where the Studio is, and added
the MapR client library to the PATH variable of that machine. For Windows, this library
is lib\MapRClient.dll in the MapR client jar file; without adding it, you may encounter the
following error: no MapRClient in java.library.path.
Set the -Djava.library.path argument. This argument provides to the Studio the path to
the native library of that MapR client. This allows the subscription-based users to make full
use of the Data viewer to view locally in the Studio the data stored in MapR. For further
information about how to set this argument, see the section describing how to view data of
Talend Open Studio for Big Data Getting Started Guide.
For further information about how to install a Hadoop distribution, see the manuals
corresponding to the Hadoop distribution you are using.
Limitation
Related scenario
Related topic, see section Scenario: Displaying the properties of a processed file
Related topic, see section Scenario: Iterating on a HDFS directory
tHDFSPut
tHDFSPut properties
Component family
Function
tHDFSPut copies files from a user-defined directory, pastes them into a given Hadoop distributed file system (HDFS) and, if need be, renames these files.
Purpose
tHDFSPut connects to Hadoop distributed file system to load large-scale files into it with
optimized performance.
Basic settings
Property type
Use an existing connection Select this check box and in the Component List click the
HDFS connection component from which you want to reuse the
connection details already defined.
When a Job contains the parent Job and the child
Job, Component list presents only the connection
components in the same Job level.
Version
Distribution
Select the product you are using as the Hadoop distribution from
the drop-down list. The options in the list vary depending on
the component you are using. Among these options, the Custom
option allows you to connect to a custom Hadoop distribution
rather than any of the distributions given in this list and officially
supported by Talend.
In order to connect to a custom distribution, once selecting
Custom, click the
you can alternatively:
Authentication
Select the version of the Hadoop distribution you are using. Note
that if you use Hortonworks Data Platform V2.0.0, the type of
the operating system for running the distribution and a Talend Job
must be the same, such as Windows or Linux.
Use kerberos authentication If you are accessing the Hadoop cluster running with Kerberos
security, select this check box, then, enter the Kerberos principal
name for the NameNode in the field displayed. This enables you
to use your user name to authenticate against the credentials stored
in Kerberos.
User name
Group
Local directory
Local directory where the files to be loaded into HDFS are stored.
HDFS directory
Browse to, or enter the directory in HDFS where the data you need
to use is.
Overwrite file
Options to overwrite or not the existing file with the new one.
Files
Die on error
This check box is selected by default. Clear the check box to skip
the row on error and complete the process for error-free rows.
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable
to choose your HDFS connection dynamically from multiple connections planned in your Job.
This feature is useful when you need to access files in different HDFS systems or different
distributions, especially when you are working in an environment where you cannot change your
Job settings, for example, when your Job has to be deployed and executed independent of Talend
Studio.
The Dynamic settings table is available only when the Use an existing connection check box
is selected in the Basic settings view. Once a dynamic parameter is defined, the Component
List box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Global Variables
NB_FILE: Indicates the number of files processed. This is an After variable and it returns an
integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and
choose the variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable
means it functions after the execution of a component.
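For example, assuming the component keeps its default label tHDFSPut_1, a component executed after it can retrieve the number of transferred files with the following expression:

    ((Integer)globalMap.get("tHDFSPut_1_NB_FILE"))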
Usage
This component combines HDFS connection and data transfer, and is thus usually used as a single-component subjob to move data from a user-defined local directory to HDFS.
Different from the tHDFSInput and the tHDFSOutput components, it runs standalone and does
not generate input or output flow for the other components.
Prerequisites
The Hadoop distribution must be properly installed, so as to guarantee the interaction with Talend
Studio. The following list presents MapR related information for example.
Ensure that you have installed the MapR client in the machine where the Studio is, and added
the MapR client library to the PATH variable of that machine. For Windows, this library
is lib\MapRClient.dll in the MapR client jar file; without adding it, you may encounter the
following error: no MapRClient in java.library.path.
Set the -Djava.library.path argument. This argument provides to the Studio the path to
the native library of that MapR client. This allows the subscription-based users to make full
use of the Data viewer to view locally in the Studio the data stored in MapR. For further
information about how to set this argument, see the section describing how to view data of
Talend Open Studio for Big Data Getting Started Guide.
For further information about how to install a Hadoop distribution, see the manuals
corresponding to the Hadoop distribution you are using.
Limitations
Related scenario
For related scenario, see section Scenario: Computing data with Hadoop distributed file system.
tHDFSRename
tHDFSRename Properties
Component Family
Big Data/HDFS
Function
Purpose
tHDFSRename renames the selected files in a given HDFS directory.
Basic settings
Property type
Use an existing connection Select this check box and in the Component List click the relevant
connection component to reuse the connection details you already
defined.
When a Job contains the parent Job and the child
Job, Component list presents only the connection
components in the same Job level.
Version
Distribution
Select the product you are using as the Hadoop distribution from
the drop-down list. The options in the list vary depending on
the component you are using. Among these options, the Custom
option allows you to connect to a custom Hadoop distribution
rather than any of the distributions given in this list and officially
supported by Talend.
In order to connect to a custom distribution, once selecting
Custom, click the
you can alternatively:
Authentication
Select the version of the Hadoop distribution you are using. Note
that if you use Hortonworks Data Platform V2.0.0, the type of
the operating system for running the distribution and a Talend Job
must be the same, such as Windows or Linux.
Use kerberos authentication If you are accessing the Hadoop cluster running with Kerberos
security, select this check box, then, enter the Kerberos principal
name for the NameNode in the field displayed. This enables you
to use your user name to authenticate against the credentials stored
in Kerberos.
User name
Group
HDFS directory
Browse to, or enter the directory in HDFS where the data you need
to use is.
Overwrite file
Select the options to overwrite or not the existing file with the new
one.
Files
Click the [+] button to add the lines you want to use as filters:
Filemask: enter the filename or filemask using wildcard characters (*) or regular expressions.
New name: name to give to the HDFS file after the transfer.
Die on error
This check box is selected by default. Clear the check box to skip
the row in error and complete the process for error-free rows.
Advanced settings
tStatCatcher Statistics
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable
to choose your HDFS connection dynamically from multiple connections planned in your Job.
This feature is useful when you need to access files in different HDFS systems or different
distributions, especially when you are working in an environment where you cannot change your
Job settings, for example, when your Job has to be deployed and executed independent of Talend
Studio.
The Dynamic settings table is available only when the Use an existing connection check box
is selected in the Basic settings view. Once a dynamic parameter is defined, the Component
List box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Global Variables
NB_FILE: Indicates the number of files processed. This is an After variable and it returns an
integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and
choose the variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable
means it functions after the execution of a component.
Usage
Prerequisites
The Hadoop distribution must be properly installed to guarantee interaction with Talend Studio. The following list presents MapR-related information as an example.
Ensure that you have installed the MapR client on the machine where the Studio is installed, and added the MapR client library to the PATH variable of that machine. For Windows, this library is lib\MapRClient.dll in the MapR client jar file; without adding it, you may encounter the following error: no MapRClient in java.library.path.
Set the -Djava.library.path argument. This argument provides the Studio with the path to the native library of that MapR client. This allows subscription-based users to make full use of the Data viewer to view, locally in the Studio, the data stored in MapR. For further information about how to set this argument, see the section describing how to view data in the Talend Open Studio for Big Data Getting Started Guide.
For further information about how to install a Hadoop distribution, see the manuals corresponding to the Hadoop distribution you are using.
Limitation
n/a
Related scenario
For a related scenario, see section Scenario: Computing data with Hadoop distributed file system.
tHDFSRowCount
tHDFSRowCount properties
Component family
Big Data/File
Function
This component reads a file in HDFS row by row in order to determine the number of rows this
file contains.
Purpose
Basic settings
Property Type
Built-in: You create and store the schema locally for this
component only. Related topic: see Talend Studio User Guide.
Repository: You have already created the schema and stored it
in the Repository. You can reuse it in various projects and Job
designs. Related topic: see Talend Studio User Guide.
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Use an existing connection
Select this check box and in the Component List click the HDFS connection component from which you want to reuse the connection details already defined.
When a Job contains a parent Job and a child Job, the Component List presents only the connection components at the same Job level.
Version
Distribution
Select the product you are using as the Hadoop distribution from
the drop-down list. The options in the list vary depending on
the component you are using. Among these options, the Custom
option allows you to connect to a custom Hadoop distribution
rather than any of the distributions given in this list and officially
supported by Talend.
In order to connect to a custom distribution, once you have selected Custom, click the button next to the list to display the dialog box in which you can alternatively import the required jar files from an existing Hadoop distribution (Import from existing version) or from a zip file (Import from zip).
Select the version of the Hadoop distribution you are using. Note that if you use Hortonworks Data Platform V2.0.0, the type of the operating system for running the distribution and a Talend Job must be the same, such as Windows or Linux.
Authentication
Use kerberos authentication
If you are accessing the Hadoop cluster running with Kerberos security, select this check box, then enter the Kerberos principal name for the NameNode in the field displayed. This enables you to use your user name to authenticate against the credentials stored in Kerberos.
This check box is available depending on the Hadoop distribution you are connecting to.
Use a keytab to authenticate Select the Use a keytab to authenticate check box to log into
a Kerberos-enabled Hadoop system using a given keytab file. A
keytab file contains pairs of Kerberos principals and encrypted
keys. You need to enter the principal to be used in the Principal
field and the access path to the keytab file itself in the Keytab
field.
Note that the user that executes a keytab-enabled Job is not
necessarily the one a principal designates but must have the right
to read the keytab file being used. For example, the user name you
are using to execute a Job is user1 and the principal to be used is
guest; in this situation, ensure that user1 has the right to read the
keytab file to be used.
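For example (the values here are hypothetical), if the Principal field is set to guest@EXAMPLE.COM and the Keytab field to /home/user1/guest.keytab, the Job authenticates as guest, but the operating-system user actually running the Job, say user1, must be able to read /home/user1/guest.keytab.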
NameNode URI
User name
Group
File name
Browse to, or enter, the location of the file in HDFS whose rows you need to count.
Row separator
Encoding
Select the encoding from the list or select Custom and define it
manually. This field is compulsory for database data handling.
Compression
Advanced settings
Hadoop properties
tStatCatcher Statistics
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable
to choose your HDFS connection dynamically from multiple connections planned in your Job.
This feature is useful when you need to access files in different HDFS systems or different
distributions, especially when you are working in an environment where you cannot change your
Job settings, for example, when your Job has to be deployed and executed independent of Talend
Studio.
The Dynamic settings table is available only when the Use an existing connection check box
is selected in the Basic settings view. Once a dynamic parameter is defined, the Component
List box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
The number of rows counted is stored in the COUNT global variable of tHDFSRowCount and can be reused by the components that follow, as shown in the sketch below. In that expression, tHDFSRowCount_1 is the label of this component in a Job, so it may vary among different use cases; COUNT is the global variable of tHDFSRowCount, representing the integer flow of the row count.
For further information about how to label a component or how to use a global variable in a Job, see the Talend Studio User Guide.
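A minimal sketch, assuming a tJava component placed after tHDFSRowCount_1 (the component label and the System.out call are illustrative assumptions); globalMap is the java.util.Map<String, Object> that the generated Job code exposes to every component:

  // Retrieve the row count produced by tHDFSRowCount_1 (an After variable).
  int rowCount = (Integer) globalMap.get("tHDFSRowCount_1_COUNT");
  System.out.println("The HDFS file contains " + rowCount + " rows.");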
Prerequisites
The Hadoop distribution must be properly installed to guarantee interaction with Talend Studio. The following list presents MapR-related information as an example.
Ensure that you have installed the MapR client on the machine where the Studio is installed, and added the MapR client library to the PATH variable of that machine. For Windows, this library is lib\MapRClient.dll in the MapR client jar file; without adding it, you may encounter the following error: no MapRClient in java.library.path.
Set the -Djava.library.path argument. This argument provides the Studio with the path to the native library of that MapR client. This allows subscription-based users to make full use of the Data viewer to view, locally in the Studio, the data stored in MapR. For further information about how to set this argument, see the section describing how to view data in the Talend Open Studio for Big Data Getting Started Guide.
For further information about how to install a Hadoop distribution, see the manuals corresponding to the Hadoop distribution you are using.
Limitation
n/a
Related scenario
No scenario is available for this component yet.
tHiveClose
tHiveClose properties
Component Family
Function
Purpose
Basic settings
Component list
Advanced settings
tStatCatcher Statistics
Select this check box to collect the log data at a component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component is to be used along with other Hive components, especially with tHiveConnection, as tHiveConnection opens the connection for the transaction which is underway.
Prerequisites
The Hadoop distribution must be properly installed to guarantee interaction with Talend Studio. The following list presents MapR-related information as an example.
Ensure that you have installed the MapR client on the machine where the Studio is installed, and added the MapR client library to the PATH variable of that machine. For Windows, this library is lib\MapRClient.dll in the MapR client jar file; without adding it, you may encounter the following error: no MapRClient in java.library.path.
Set the -Djava.library.path argument. This argument provides the Studio with the path to the native library of that MapR client. This allows subscription-based users to make full use of the Data viewer to view, locally in the Studio, the data stored in MapR. For further information about how to set this argument, see the section describing how to view data in the Talend Open Studio for Big Data Getting Started Guide.
For further information about how to install a Hadoop distribution, see the manuals corresponding to the Hadoop distribution you are using.
Limitation
n/a
Related scenario
No scenario is available for this component yet.
tHiveConnection
tHiveConnection properties
Database Family
Function
Purpose
This component allows you to establish a Hive connection to be reused by other Hive components
in your Job.
Basic settings
Property type
Version
Distribution
Select the product you are using as the Hadoop distribution from
the drop-down list. The options in the list vary depending on the
component you are using. Among these options, the Custom option
allows you to connect to a custom Hadoop distribution rather than
any of the distributions given in this list and officially supported by
Talend.
In order to connect to a custom distribution, once you have selected Custom, click the button next to the list to display the dialog box in which you can alternatively import the required jar files from an existing Hadoop distribution (Import from existing version) or from a zip file (Import from zip).
Connection
Hive version
Select the version of the Hadoop distribution you are using. Note
that if you use Hortonworks Data Platform V2.0.0, the type of the
operating system for running the distribution and a Talend Job must
be the same, such as Windows or Linux.
Connection mode
Select a connection mode from the list. The options vary depending
on the distribution you are using.
Hive server
Select the Hive server through which you want the Job using this
component to execute queries on Hive.
This Hive server list is available only when the Hadoop distribution to be used, such as HortonWorks Data Platform V1.2.0 (Bimota), supports HiveServer2. It allows you to select HiveServer2 (Hive 2), the server that better supports concurrent connections from multiple clients than HiveServer (Hive 1) does.
For further information about HiveServer2, see https://cwiki.apache.org/Hive/setting-up-hiveserver2.html.
Host
Port
Database
Use kerberos authentication
If you are accessing a Hive Metastore running with Kerberos security, select this check box and then enter the relevant parameters in the fields that appear.
The values of those parameters can be found in the hive-site.xml file of the Hive system to be used.
1. Hive principal uses the value of hive.metastore.kerberos.principal. This is the service principal of the Hive Metastore.
2. Metastore URL uses the value of javax.jdo.option.ConnectionURL. This is the JDBC connection string to the Hive Metastore.
3. Driver class uses the value of javax.jdo.option.ConnectionDriverName. This is the name of the driver for the JDBC connection.
4. Username uses the value of javax.jdo.option.ConnectionUserName. This, as well as the Password parameter, is the user credential for connecting to the Hive Metastore.
5. Password uses the value of javax.jdo.option.ConnectionPassword.
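As an illustration, these parameters appear in hive-site.xml in the standard Hadoop property format; the values below are hypothetical examples, not defaults:

  <property>
    <name>hive.metastore.kerberos.principal</name>
    <value>hive/_HOST@EXAMPLE.COM</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://metastorehost:3306/hivemetastore</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>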
Set Jobtracker URI
Select this check box to indicate the location of the Jobtracker service
within the Hadoop cluster to be used. For example, we assume that
you have chosen a machine called machine1 as the JobTracker, then
set its location as machine1:portnumber. A Jobtracker is the service
that assigns Map/Reduce tasks to specific nodes in a Hadoop cluster.
Note that the notion job in this term JobTracker does not designate a
Talend Job, but rather a Hadoop job described as MR or MapReduce
job in Apache's Hadoop documentation on http://hadoop.apache.org.
This property is required when the query you want to use is executed in Windows and it is a Select query. For example, SELECT your_column_name FROM your_table_name.
If you use YARN such as Hortonworks Data Platform V2.0.0 or Cloudera CDH4.3 + (YARN mode), you need to specify the location of the Resource Manager instead of the Jobtracker. Then, if necessary, select the Set resourcemanager scheduler address check box and enter the Scheduler address in the field that appears. Furthermore, if required, you can allocate proper memory volumes to the Map and the Reduce computations and the ApplicationMaster of YARN by selecting the Set memory check box in the Advanced settings view. For further information about the Resource Manager and its scheduler and the ApplicationMaster, see YARN's documentation such as http://hortonworks.com/blog/apache-hadoop-yarn-concepts-and-applications/.
For further information about the Hadoop Map/Reduce framework,
see the Map/Reduce tutorial in Apache's Hadoop documentation on
http://hadoop.apache.org.
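As hypothetical illustrations of such locations, a Jobtracker might be entered as machine1:8021, a YARN Resource Manager as machine1:8032, and a NameNode URI as hdfs://masternode:8020; the port numbers vary across distributions and configurations, so check your cluster's own configuration files before reusing them.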
Set NameNode URI
Select this check box to indicate the location of the NameNode of the Hadoop cluster to be used.
Store by HBase
Select this check box to display the parameters to be set to allow the Hive components to access HBase tables.
Zookeeper quorum
Type in the name or the URL of the Zookeeper service you use to coordinate the transaction between Talend and HBase.
Define the jars to register for HBase
Select this check box to display the Register jar for HBase table, in which you can register any missing jar file required by HBase, for example, the Hive Storage Handler, which is, by default, registered along with your Hive installation.
Register jar for HBase
Click the [+] button to add rows to this table, then, in the Jar name column, select the jar file(s) to be registered and, in the Jar path column, enter the path(s) pointing to that or those jar file(s).
Advanced settings
Hadoop properties
Hive properties
Mapred job map memory mb and Mapred job reduce memory mb
If the Hadoop distribution to be used is Hortonworks Data Platform V1.2 or Hortonworks Data Platform V1.3, you need to set proper memory allocations for the map and reduce computations to be performed by the Hadoop system.
In that situation, enter the values you need in the Mapred job map memory mb and the Mapred job reduce memory mb fields, respectively. By default, both values are 1000, which is normally appropriate for running the computations.
tStatCatcher Statistics
Select this check box to collect the log data at a component level.
Usage
This component is generally used with other Hive components, particularly tHiveClose.
If the Studio used to connect to a Hive database is operated on Windows, you must manually create a folder called tmp in the root of the disk where this Studio is installed.
Prerequisites
The Hadoop distribution must be properly installed to guarantee interaction with Talend Studio. The following list presents MapR-related information as an example.
Ensure that you have installed the MapR client on the machine where the Studio is installed, and added the MapR client library to the PATH variable of that machine. For Windows, this library is lib\MapRClient.dll in the MapR client jar file; without adding it, you may encounter the following error: no MapRClient in java.library.path.
Set the -Djava.library.path argument. This argument provides the Studio with the path to the native library of that MapR client. This allows subscription-based users to make full use of the Data viewer to view, locally in the Studio, the data stored in MapR. For further information about how to set this argument, see the section describing how to view data in the Talend Open Studio for Big Data Getting Started Guide.
For further information about how to install a Hadoop distribution, see the manuals corresponding to the Hadoop distribution you are using.
Limitation
n/a
1. Select Import from existing version or Import from zip to import the required jar files from the appropriate source. By doing so, you can reuse the jar files already available for a Hadoop distribution officially supported by Talend.
2. Verify that the Hive check box is selected. This allows you to import the jar files pertinent to the connection to be created between this component and the Hive of the Hadoop distribution to be used.
3. Click OK and then, in the pop-up warning, click Yes to accept overwriting any custom setup of jar files previously implemented for this component.
Once done, the [Custom Hadoop version definition] dialog box becomes active.
4. If you still need to add more jar files, click the [+] button to open the [Select libraries] dialog box and select the jar files to be added.
5. Click OK to validate the changes and to close the [Select libraries] dialog box.
Once done, the selected jar file appears in the list in the Hive tab view.
Then, you can repeat this procedure to import more jar files.
If you need to share the custom setup of jar files with another Studio, you can export this custom connection from the [Custom Hadoop version definition] window using the export button.
Related scenario
For a scenario about how a connection component is used in a Job, see section Scenario: Inserting data in mother/
daughter tables.
You need to keep in mind the parameters required by Hadoop, such as NameNode and Jobtracker, when configuring this component, since the component is used to connect to a Hadoop distribution.
tHiveCreateTable
tHiveCreateTable properties
Component family
Function
This component connects to the Hive database to be used and creates a Hive table that is dedicated
to data of the format you specify.
Purpose
This component is used to create Hive tables that fit a wide range of Hive data formats. A proper Hive data format, such as RC or ORC, allows you to obtain better performance when processing data with Hive.
Basic settings
Property type
Use an existing connection Select this check box and in the Component List click the relevant
connection component to reuse the connection details you already
defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
Version
Distribution
Select the product you are using as the Hadoop distribution from
the drop-down list. The options in the list vary depending on the
component you are using. Among these options, the Custom option
allows you to connect to a custom Hadoop distribution rather than
any of the distributions given in this list and officially supported by
Talend.
In order to connect to a custom distribution, once you have selected Custom, click the button next to the list to display the dialog box in which you can alternatively import the required jar files from an existing Hadoop distribution (Import from existing version) or from a zip file (Import from zip).
Select the version of the Hadoop distribution you are using. Note
that if you use Hortonworks Data Platform V2.0.0, the type of the
operating system for running the distribution and a Talend Job must
be the same, such as Windows or Linux.
Connection mode
Select a connection mode from the list. The options vary depending
on the distribution you are using.
Hive server
Select the Hive server through which you want the Job using this
component to execute queries on Hive.
This Hive server list is available only when the Hadoop distribution to be used, such as HortonWorks Data Platform V1.2.0 (Bimota), supports HiveServer2. It allows you to select HiveServer2 (Hive 2), the server that better supports concurrent connections from multiple clients than HiveServer (Hive 1) does.
For further information about HiveServer2, see https://cwiki.apache.org/Hive/setting-up-hiveserver2.html.
Authentication
Host
Port
Database
Use kerberos authentication
If you are accessing a Hive Metastore running with Kerberos security, select this check box and then enter the relevant parameters in the fields that appear. The values of those parameters can be found in the hive-site.xml file of the Hive system to be used.
Hadoop properties
Set Jobtracker URI
Select this check box to indicate the location of the Jobtracker service
within the Hadoop cluster to be used. For example, we assume that
you have chosen a machine called machine1 as the JobTracker, then
set its location as machine1:portnumber. A Jobtracker is the service
that assigns Map/Reduce tasks to specific nodes in a Hadoop cluster.
Note that the notion job in this term JobTracker does not designate a
Talend Job, but rather a Hadoop job described as MR or MapReduce
job in Apache's Hadoop documentation on http://hadoop.apache.org.
If you use YARN such as Hortonworks Data Platform V2.0.0
or Cloudera CDH4.3 + (YARN mode), you need to specify
the location of the Resource Manager instead of the Jobtracker.
Then, if necessary, select the Set resourcemanager scheduler
address check box and enter the Scheduler address in the field
that appears. Furthermore, if required, you can allocate proper
memory volumes to the Map and the Reduce computations and the
ApplicationMaster of YARN by selecting the Set memory check
box in the Advanced settings view. For further information about
the Resource Manager and its scheduler and the ApplicationMaster,
see YARN's documentation such as http://hortonworks.com/blog/
apache-hadoop-yarn-concepts-and-applications/.
For further information about the Hadoop Map/Reduce framework,
see the Map/Reduce tutorial in Apache's Hadoop documentation on
http://hadoop.apache.org.
Table Name
Action on table
Format
Inputformat class and Outputformat class
These fields appear only when you have selected INPUTFORMAT and OUTPUTFORMAT from the Format list.
These fields allow you to enter the names of the jar files to be used for the data formats not available in the Format list.
Storage class
Enter the name of the storage handler to be used for creating a non-native table (a Hive table stored and managed in systems other than Hive, for example, Cassandra or MongoDB).
This field is available only when you have selected STORAGE from the Format list.
For further information about a storage handler, see https://cwiki.apache.org/confluence/display/Hive/StorageHandlers.
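For instance, a non-native table backed by HBase is declared with a STORED BY clause naming such a handler; the HiveQL below is a hedged sketch using the storage handler and column mapping documented for the Hive/HBase integration, with hypothetical table and column names:

  CREATE TABLE hbase_backed (key INT, value STRING)
  STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
  WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val");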
Set partitions
Set file location
If you want to create a Hive table in a directory other than the default one, select this check box and enter the directory in HDFS you want to use to hold the table content.
This is typically useful when you need to create an external Hive table by selecting the Create an external table check box in the Advanced settings tab.
Row format
Select this check box to use the Delimited row format as the storage format of data in the Hive table to be created. Once you have selected it, you can further specify the delimiter(s) for the data you need to load to the table. This Delimited format is also the default format, used when you have selected neither this check box nor the Set SerDe row format check box.
The Field delimiter is to separate fields of the data.
The Collection item delimiter is to separate elements in an Array or Struct instance of the data, or key-value pairs in a Map instance of the data.
The Map key delimiter is to separate the key and the value in a Map instance of the data.
The Line delimiter is to separate data rows.
For further information about the delimiters and the data types mentioned in this list, see Apache's documentation about Hive or the documentation of the Hadoop distribution you are using.
When defining the Field delimiter, you can also define the escape character to be used, by selecting the Escape check box and entering that character. Otherwise, the backslash (\) is used by default.
Note that this check box is not available when you have selected AVRO or STORAGE from the Format list.
Set SerDe row format
Select this check box to use the SerDe row format as the storage format of data in the Hive table to be created. Once you have selected it, enter the name of the Java class that implements the Hive SerDe interface you need to use.
This Java class might have to be developed by yourself, or may simply be among the jars provided in the Hadoop distribution you are using.
Note that this check box is not available when you have selected AVRO or STORAGE from the Format list.
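As an illustration of the Delimited row format described above, these settings translate into a CREATE TABLE statement along the following lines; the table, columns and delimiters are hypothetical and should be adapted to your data:

  CREATE TABLE employees (id INT, first_name STRING, last_name STRING)
  ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ';'
    LINES TERMINATED BY '\n'
  STORED AS TEXTFILE;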
Advanced settings
Die on error
Select this check box to kill the Job when an error occurs.
Like table
Select this check box and enter the name of the Hive table you want
to copy. This allows you to copy the definition of an existing table
without copying its data.
For further information about the Like parameter, see Apache's
information about Hive's Data Definition Language.
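The corresponding HiveQL is a one-liner; the table names here are hypothetical:

  CREATE TABLE employees_copy LIKE employees;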
Create an external table
Select this check box to make the table to be created an external Hive table. This kind of Hive table leaves the raw data where it is if the data is in HDFS.
An external table is usually the better choice for accessing shared data existing in a file system.
For further information about an external Hive table, see Apache's documentation about Hive.
Table comment
Enter the description you want to use for the table to be created.
As select
Select this check box and enter the As select statement for creating a Hive table that is based on a Select statement.
Set clustered_by or skewed_by statement
Enter the Clustered by statement to cluster the data of a table or a partition into buckets, and/or enter the Skewed by statement to allow Hive to extract the heavily skewed data and put it into separate files. This is typically used for obtaining better performance during queries.
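Hedged HiveQL sketches of both options, with hypothetical names; note that in classic Hive a CTAS statement and a bucketed table definition are written separately:

  CREATE TABLE us_employees AS SELECT * FROM employees WHERE country = 'US';

  CREATE TABLE employees_bucketed (id INT, first_name STRING)
  CLUSTERED BY (id) INTO 4 BUCKETS;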
SerDe properties
If you are using the SerDe row format, you can add any custom SerDe
properties to override the default ones used by the Hadoop engine of
the Studio.
Table properties
Add any custom Hive table properties you want to override the
default ones used by the Hadoop engine of the Studio.
Temporary path
If you do not want to set the Jobtracker and the NameNode when you
execute the query select * from your_table_name, you need
to set this temporary path. For example, /C:/select_all in Windows.
Hadoop properties
Hive properties
Mapred job map memory mb and Mapred job reduce memory mb
If the Hadoop distribution to be used is Hortonworks Data Platform V1.2 or Hortonworks Data Platform V1.3, you need to set proper memory allocations for the map and reduce computations to be performed by the Hadoop system.
In that situation, enter the values you need in the Mapred job map memory mb and the Mapred job reduce memory mb fields, respectively. By default, both values are 1000, which is normally appropriate for running the computations.
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is
selected in the Basic settings view. Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Global Variables
QUERY: Indicates the query to be processed. This is a Flow variable and it returns a string.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and
choose the variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable
means it functions after the execution of a component.
Usage
Prerequisites
The Hadoop distribution must be properly installed to guarantee interaction with Talend Studio. The following list presents MapR-related information as an example.
Ensure that you have installed the MapR client on the machine where the Studio is installed, and added the MapR client library to the PATH variable of that machine. For Windows, this library is lib\MapRClient.dll in the MapR client jar file; without adding it, you may encounter the following error: no MapRClient in java.library.path.
Set the -Djava.library.path argument. This argument provides the Studio with the path to the native library of that MapR client. This allows subscription-based users to make full use of the Data viewer to view, locally in the Studio, the data stored in MapR. For further information about how to set this argument, see the section describing how to view data in the Talend Open Studio for Big Data Getting Started Guide.
For further information about how to install a Hadoop distribution, see the manuals corresponding to the Hadoop distribution you are using.
Related scenario
For a related scenario, see section Scenario: creating a partitioned Hive table.
tHiveInput
tHiveInput properties
Component family
Function
tHiveInput is the component dedicated to the Hive database (the Hive data warehouse system). It executes the given HiveQL query in order to extract the data of interest from Hive. It provides the SQLBuilder tool to help you write your HiveQL statements easily.
This component can also read data from an HBase database once you activate its Store by HBase function.
Purpose
tHiveInput executes the select queries to extract the corresponding data and sends the data to the
component that follows.
Basic settings
Property type
Use an existing connection
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if you need to share
an existing connection between the two levels, for example, to share the
connection created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection to be shared in the
Basic settings view of the connection component which creates that very
database connection.
2. In the child level, use a dedicated connection component to read that
registered database connection.
For an example about how to share a database connection across Job levels,
see Talend Studio User Guide.
Version
Distribution
Select the product you are using as the Hadoop distribution from the drop-down list.
The options in the list vary depending on the component you are using. Among these
options, the Custom option allows you to connect to a custom Hadoop distribution
rather than any of the distributions given in this list and officially supported by Talend.
In order to connect to a custom distribution, once selecting Custom, click the
button to display the dialog box in which you can alternatively:
1. Select Import from existing version to import jar files from a given Hadoop
distribution and then manually add other jar files which that Hadoop distribution
does not provide.
2. Select Import from zip to import jar files from a zip file which, for example,
contains all required jar files set up in another Studio and is exported from that
Studio.
In this dialog box, the active check box must be kept selected so as to
import the jar files pertinent to the connection to be created between the
custom distribution and this component.
For a step-by-step example about how to connect to a custom Hadoop distribution and share this connection, see section Connecting to a custom Hadoop distribution.
Hive version
Select the version of the Hadoop distribution you are using. Note that if you use
Hortonworks Data Platform V2.0.0, the type of the operating system for running the
distribution and a Talend Job must be the same, such as Windows or Linux.
Connection mode
Select a connection mode from the list. The options vary depending on the distribution
you are using.
Hive server
Select the Hive server through which you want the Job using this component to execute
queries on Hive.
This Hive server list is available only when the Hadoop distribution to be used, such as HortonWorks Data Platform V1.2.0 (Bimota), supports HiveServer2. It allows you to select HiveServer2 (Hive 2), the server that better supports concurrent connections from multiple clients than HiveServer (Hive 1) does.
For further information about HiveServer2, see https://cwiki.apache.org/Hive/setting-up-hiveserver2.html.
Host
Port
Database
Username
Password
Authentication
Use kerberos authentication
If you are accessing a Hive Metastore running with Kerberos security, select this check box and then enter the relevant parameters in the fields that appear.
The values of those parameters can be found in the hive-site.xml file of the Hive system to be used.
1. Hive principal uses the value of hive.metastore.kerberos.principal. This is the
service principal of the Hive Metastore.
2. Metastore URL uses the value of javax.jdo.option.ConnectionURL. This is the
JDBC connection string to the Hive Metastore.
3. Driver class uses the value of javax.jdo.option.ConnectionDriverName. This is the
name of the driver for the JDBC connection.
4. Username uses the value of javax.jdo.option.ConnectionUserName. This, as well as
the Password parameter, is the user credential for connecting to the Hive Metastore.
5. Password uses the value of javax.jdo.option.ConnectionPassword.
This check box is available depending on the Hadoop distribution you are connecting
to.
Use a keytab to authenticate
Select the Use a keytab to authenticate check box to log into a Kerberos-enabled Hadoop system using a given keytab file. A keytab file contains pairs of Kerberos principals and encrypted keys. You need to enter the principal to be used in the Principal field and the access path to the keytab file itself in the Keytab field.
Note that the user that executes a keytab-enabled Job is not necessarily the one a
principal designates but must have the right to read the keytab file being used. For
example, the user name you are using to execute a Job is user1 and the principal to be
used is guest; in this situation, ensure that user1 has the right to read the keytab file
to be used.
Hadoop properties
Set Jobtracker URI
Select this check box to indicate the location of the Jobtracker service within the Hadoop cluster to be used. For example, we assume that you have chosen a machine
called machine1 as the JobTracker, then set its location as machine1:portnumber. A
Jobtracker is the service that assigns Map/Reduce tasks to specific nodes in a Hadoop
cluster. Note that the notion job in this term JobTracker does not designate a Talend
Job, but rather a Hadoop job described as MR or MapReduce job in Apache's Hadoop
documentation on http://hadoop.apache.org.
This property is required when the query you want to use is executed in Windows
and it is a Select query. For example, SELECT your_column_name FROM
your_table_name
If you use YARN such as Hortonworks Data Platform V2.0.0 or Cloudera CDH4.3
+ (YARN mode), you need to specify the location of the Resource Manager instead of
the Jobtracker. Then, if necessary, select the Set resourcemanager scheduler address
check box and enter the Scheduler address in the field that appears. Furthermore,
if required, you can allocate proper memory volumes to the Map and the Reduce
computations and the ApplicationMaster of YARN by selecting the Set memory
check box in the Advanced settings view. For further information about the Resource
Manager and its scheduler and the ApplicationMaster, see YARN's documentation such
as http://hortonworks.com/blog/apache-hadoop-yarn-concepts-and-applications/.
For further information about the Hadoop Map/Reduce framework, see the Map/
Reduce tutorial in Apache's Hadoop documentation on http://hadoop.apache.org.
Set NameNode URI
Select this check box to indicate the location of the NameNode of the Hadoop cluster
to be used. The NameNode is the master node of a Hadoop cluster. For example, we
assume that you have chosen a machine called masternode as the NameNode of an
Apache Hadoop distribution, then the location is hdfs://masternode:portnumber.
This property is required when the query you want to use is executed in Windows
and it is a Select query. For example, SELECT your_column_name FROM
your_table_name
For further information about the Hadoop Map/Reduce framework, see the Map/
Reduce tutorial in Apache's Hadoop documentation on http://hadoop.apache.org.
Schema and Edit Schema
A schema is a row description. It defines the number of fields to be processed and passed on to the next component.
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Click Edit Schema to make changes to the schema.
Built-in: The schema is created and stored locally for this component only. Related
topic: see Talend Studio User Guide.
Table Name
Query type
Guess Query
Click the Guess Query button to generate the query which corresponds to your table
schema in the Query field.
Guess schema
Query
Enter your DB query, paying particular attention to properly sequence the fields in order to match the schema definition.
For further information about the Hive query language, see https://cwiki.apache.org/Hive/languagemanual.html.
Compressed data in the form of Gzip or Bzip2 can be processed through
the query statements. For details, see https://cwiki.apache.org/confluence/
display/Hive/CompressedStorage.
Hadoop provides different compression formats that help reduce the space
needed for storing files and speed up data transfer. When reading a
compressed file, the Studio needs to uncompress it before being able to feed
it to the input flow.
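A hedged example of such a query, for a hypothetical three-column schema (id, first_name, last_name) defined in the same order:

  SELECT id, first_name, last_name FROM employees;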
HBase Configuration
Store by HBase
Available only when the Use an existing connection check box is clear.
Select this check box to display the parameters to be set to allow the Hive components
to access HBase tables. Once this access is configured, you will be able to use, in
tHiveRow and tHiveInput, the Hive QL statements to read and write data in HBase.
For further information about this access involving Hive and HBase, see Apache's Hive
documentation about Hive/HBase integration.
Zookeeper quorum
Type in the name or the URL of the Zookeeper service you use to coordinate the transaction between Talend and HBase.
Zookeeper client port
Type in the number of the client listening port of the Zookeeper service you are using.
Define the jars to register for HBase
Select this check box to display the Register jar for HBase table, in which you can register any missing jar file required by HBase, for example, the Hive Storage Handler, which is, by default, registered along with your Hive installation.
Register jar for HBase
Click the [+] button to add rows to this table, then, in the Jar name column, select the jar file(s) to be registered and, in the Jar path column, enter the path(s) pointing to that or those jar file(s).
Advanced settings
Temporary path
If you do not want to set the Jobtracker and the NameNode when you execute the query select * from your_table_name, you need to set this temporary path. For example, /C:/select_all in Windows.
Trim all the String/Char columns
Select this check box to remove leading and trailing whitespace from all the String/Char columns.
Trim column
Hadoop properties
Talend Studio uses a default configuration for its engine to perform operations in a
Hadoop distribution. If you need to use a custom configuration in a specific situation,
complete this table with the property or properties to be customized. Then at runtime,
the customized property or properties will override those default ones.
For further information about the properties required by Hadoop and its related
systems such as HDFS and Hive, see Apache's Hadoop documentation on http://
hadoop.apache.org, or the documentation of the Hadoop distribution you need to use.
Hive properties
Talend Studio uses a default configuration for its engine to perform operations in
a Hive database. If you need to use a custom configuration in a specific situation,
complete this table with the property or properties to be customized. Then at runtime,
the customized property or properties will override those default ones. For further information about Hive-dedicated properties, see https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration.
Mapred job map memory mb and Mapred job reduce memory mb
If the Hadoop distribution to be used is Hortonworks Data Platform V1.2 or Hortonworks Data Platform V1.3, you need to set proper memory allocations for the map and reduce computations to be performed by the Hadoop system. In that situation, enter the values you need in these two fields, respectively; by default, both values are 1000, which is normally appropriate for running the computations.
Path separator in server
Leave the default value of the Path separator in server as it is, unless you have changed the separator used by your Hadoop distribution's host machine for its PATH variable, or in other words, that separator is not a colon (:). In that situation, you must change this value to the one you are using in that host.
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to choose
your database connection dynamically from multiple connections planned in your Job. This feature is
useful when you need to access database tables having the same data structure but in different databases,
especially when you are working in an environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed independent of Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is selected
in the Basic settings view. Once a dynamic parameter is defined, the Component List box in the Basic
settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Global Variables
NB_LINE: Indicates the number of rows read by an input component or transferred to an output
component. This is an After variable and it returns an integer.
QUERY: Indicates the query to be processed. This is a Flow variable and it returns a string.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose
the variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable means
it functions after the execution of a component.
Usage
This component offers the benefit of flexible DB queries and covers all possible Hive QL queries.
If the Studio used to connect to a Hive database is operated on Windows, you must manually create a
folder called tmp in the root of the disk where this Studio is installed.
Prerequisites
The Hadoop distribution must be properly installed to guarantee interaction with Talend Studio. The following list presents MapR-related information as an example.
Ensure that you have installed the MapR client on the machine where the Studio is installed, and added the MapR client library to the PATH variable of that machine. For Windows, this library is lib\MapRClient.dll in the MapR client jar file; without adding it, you may encounter the following error: no MapRClient in java.library.path.
Set the -Djava.library.path argument. This argument provides the Studio with the path to the native library of that MapR client. This allows subscription-based users to make full use of the Data viewer to view, locally in the Studio, the data stored in MapR. For further information about how to set this argument, see the section describing how to view data in the Talend Open Studio for Big Data Getting Started Guide.
For further information about how to install a Hadoop distribution, see the manuals corresponding to the Hadoop distribution you are using.
Related scenarios
For a scenario about how an input component is used in a Job, see section Scenario 1: Writing columns from a
MySQL database to an output file.
You need to keep in mind the parameters required by Hadoop, such as NameNode and Jobtracker, when
configuring this component since the component needs to connect to a Hadoop distribution.
tHiveLoad
tHiveLoad properties
Component family
Function
This component connects to a given Hive database and copies or moves data into an existing Hive table or a
directory you specify.
Purpose
This component is used to write data of different formats into a given Hive table or to export data from a Hive
table to a directory.
Basic settings
Property type
Use an existing connection
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if you need to share an existing
connection between the two levels, for example, to share the connection created by
the parent Job with the child Job, you have to:
1. In the parent level, register the database connection to be shared in the Basic
settings view of the connection component which creates that very database
connection.
2. In the child level, use a dedicated connection component to read that registered
database connection.
For an example about how to share a database connection across Job levels, see
Talend Studio User Guide.
Version
Distribution
Select the product you are using as the Hadoop distribution from the drop-down list. The
options in the list vary depending on the component you are using. Among these options, the
Custom option allows you to connect to a custom Hadoop distribution rather than any of the
distributions given in this list and officially supported by Talend.
In order to connect to a custom distribution, once you have selected Custom, click the button to display the dialog box in which you can alternatively:
1. Select Import from existing version to import jar files from a given Hadoop distribution
and then manually add other jar files which that Hadoop distribution does not provide.
2. Select Import from zip to import jar files from a zip file which, for example, contains all
required jar files set up in another Studio and is exported from that Studio.
In this dialog box, the active check box must be kept selected so as to import the
jar files pertinent to the connection to be created between the custom distribution
and this component.
For a step-by-step example about how to connect to a custom Hadoop distribution and share this connection, see section Connecting to a custom Hadoop distribution.
Hive version
Select the version of the Hadoop distribution you are using. Note that if you use Hortonworks
Data Platform V2.0.0, the type of the operating system for running the distribution and a
Talend Job must be the same, such as Windows or Linux.
Connection mode
Select a connection mode from the list. The options vary depending on the distribution you
are using.
Hive server
Select the Hive server through which you want the Job using this component to execute
queries on Hive.
This Hive server list is available only when the Hadoop distribution to be used, such as HortonWorks Data Platform V1.2.0 (Bimota), supports HiveServer2. It allows you to select HiveServer2 (Hive 2), the server that better supports concurrent connections from multiple clients than HiveServer (Hive 1) does.
For further information about HiveServer2, see https://cwiki.apache.org/Hive/setting-up-hiveserver2.html.
Host
Port
Database
Username
Password
Authentication
Use kerberos authentication
If you are accessing a Hive Metastore running with Kerberos security, select this check box and then enter the relevant parameters in the fields that appear.
The values of those parameters can be found in the hive-site.xml file of the Hive system to
be used.
1. Hive principal uses the value of hive.metastore.kerberos.principal. This is the service
principal of the Hive Metastore.
2. Metastore URL uses the value of javax.jdo.option.ConnectionURL. This is the JDBC
connection string to the Hive Metastore.
3. Driver class uses the value of javax.jdo.option.ConnectionDriverName. This is the name
of the driver for the JDBC connection.
4. Username uses the value of javax.jdo.option.ConnectionUserName. This, as well as the
Password parameter, is the user credential for connecting to the Hive Metastore.
5. Password uses the value of javax.jdo.option.ConnectionPassword.
This check box is available depending on the Hadoop distribution you are connecting to.
Use a keytab to authenticate
Select the Use a keytab to authenticate check box to log into a Kerberos-enabled Hadoop system using a given keytab file. A keytab file contains pairs of Kerberos principals and encrypted keys. You need to enter the principal to be used in the Principal field and the access path to the keytab file itself in the Keytab field.
Note that the user that executes a keytab-enabled Job is not necessarily the one a principal
designates but must have the right to read the keytab file being used. For example, the user
name you are using to execute a Job is user1 and the principal to be used is guest; in this
situation, ensure that user1 has the right to read the keytab file to be used.
Hadoop properties
Set Jobtracker URI
Select this check box to indicate the location of the Jobtracker service within the Hadoop
cluster to be used. For example, we assume that you have chosen a machine called machine1
as the JobTracker, then set its location as machine1:portnumber. A Jobtracker is the service
that assigns Map/Reduce tasks to specific nodes in a Hadoop cluster. Note that the notion job
in this term JobTracker does not designate a Talend Job, but rather a Hadoop job described
as MR or MapReduce job in Apache's Hadoop documentation on http://hadoop.apache.org.
If you use YARN such as Hortonworks Data Platform V2.0.0 or Cloudera CDH4.3 +
(YARN mode), you need to specify the location of the Resource Manager instead of the
Jobtracker. Then, if necessary, select the Set resourcemanager scheduler address check
box and enter the Scheduler address in the field that appears. Furthermore, if required, you
can allocate proper memory volumes to the Map and the Reduce computations and the
ApplicationMaster of YARN by selecting the Set memory check box in the Advanced
settings view. For further information about the Resource Manager and its scheduler and the
ApplicationMaster, see YARN's documentation such as http://hortonworks.com/blog/apache-hadoop-yarn-concepts-and-applications/.
For further information about the Hadoop Map/Reduce framework, see the Map/Reduce
tutorial in Apache's Hadoop documentation on http://hadoop.apache.org.
Set NameNode URI
Select this check box to indicate the location of the NameNode of the Hadoop cluster to
be used. The NameNode is the master node of a Hadoop cluster. For example, we assume
that you have chosen a machine called masternode as the NameNode of an Apache Hadoop
distribution, then the location is hdfs://masternode:portnumber.
For further information about the Hadoop Map/Reduce framework, see the Map/Reduce
tutorial in Apache's Hadoop documentation on http://hadoop.apache.org.
Load action
Select the action you need to carry out for writing data into the specified destination.
When you select LOAD, you are moving or copying data from a directory you specify.
When you select INSERT, you are moving or copying data based on queries.
Target type
This drop-down list appears only when you have selected INSERT from the Load action list.
Select from this list the type of the location you need to write data in.
If you select Table as destination, you can still choose to append data to or overwrite the contents in the specified table.
If you select Directory as destination, you are overwriting the contents in the specified directory.
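Hedged HiveQL sketches of the kinds of statements these options correspond to; the paths and table names are hypothetical:

  LOAD DATA INPATH '/user/talend/in' INTO TABLE employees;
  INSERT OVERWRITE TABLE employees SELECT * FROM staging_employees;
  INSERT OVERWRITE DIRECTORY '/user/talend/out' SELECT * FROM employees;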
Table name
Enter the name of the Hive table you need to write data in.
Note that with the INSERT action, this field is available only when you have selected Table
from the Target type list.
File path
Enter the directory you need to read data from or write data in, depending on the action you
have selected from the Load action list.
If you have selected LOAD: this is the path to the data you want to copy or move into the
specified Hive table.
If you have selected INSERT: this is the directory to which you want to export data from
a Hive table. With this action, the File path field is available only when you have selected
Directory from the Target type list.
Action on file
Query
This field appears when you have selected INSERT from the Load action list.
Enter the appropriate query for selecting the data to be exported to the specified Hive table
or directory.
Local
Select this check box to use the Hive LOCAL statement for accessing a local directory.
This statement is used along with the directory you have defined in the File path field.
Therefore, this Local check box is available only when the File path field is available.
If you are using the LOAD action, tHiveLoad copies the local data to the target table.
If you are using the INSERT action, tHiveLoad copies data to a local directory.
If you leave this Local check box clear, the directory defined in the File path field is
assumed to be in the HDFS system to be used and data will be moved to the target location.
For further information about this LOCAL statement, see Apache's documentation about the Hive language.
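For instance (hypothetical paths and table name), the Local check box distinguishes these two forms:

  LOAD DATA LOCAL INPATH '/home/user1/in' INTO TABLE employees;
  INSERT OVERWRITE LOCAL DIRECTORY '/home/user1/out' SELECT * FROM employees;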
Set partitions
Select this check box to use the Hive Partition clause in loading or inserting data in a Hive table. You need to enter the partition keys and their values to be used in the field that appears. For example, enter country='US', state='CA'. This makes a partition clause reading Partition (country='US', state='CA'), that is to say, a US and CA partition.
Also, it is recommended to select the Create partition if not exist check box that appears, to ensure that you will not create a duplicate partition.
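Combined with the LOAD action, the resulting statement would read along these lines (the path and table name are hypothetical):

  LOAD DATA INPATH '/user/talend/in' INTO TABLE employees
  PARTITION (country='US', state='CA');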
Advanced settings
Die on error
Select this check box to kill the Job when an error occurs.
Temporary path
If you do not want to set the Jobtracker and the NameNode when you execute the query
select * from your_table_name, you need to set this temporary path. For example, /
C:/select_all in Windows.
Hadoop properties
Talend Studio uses a default configuration for its engine to perform operations in a Hadoop distribution. If you need to use a custom configuration in a specific situation, complete this table with the property or properties to be customized. Then at runtime, the customized property or properties will override those default ones.
For further information about the properties required by Hadoop and its related systems such
as HDFS and Hive, see Apache's Hadoop documentation on http://hadoop.apache.org, or the
documentation of the Hadoop distribution you need to use.
Hive properties
Talend Studio uses a default configuration for its engine to perform operations in a Hive
database. If you need to use a custom configuration in a specific situation, complete this
table with the property or properties to be customized. Then at runtime, the customized
property or properties will override those default ones. For further information about Hive-dedicated properties, see https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration.
Mapred job map memory mb and Mapred job reduce memory mb
If the Hadoop distribution to be used is Hortonworks Data Platform V1.2 or Hortonworks Data Platform V1.3, you need to set proper memory allocations for the map and reduce computations to be performed by the Hadoop system.
In that situation, enter the values you need in the Mapred job map memory mb and the Mapred job reduce memory mb fields, respectively. By default, both values are 1000, which is normally appropriate for running the computations.
Path separator in server
Leave the default value of the Path separator in server as it is, unless you have changed the separator used by your Hadoop distribution's host machine for its PATH variable, or in other words, that separator is not a colon (:). In that situation, you must change this value to the one you are using in that host.
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to choose your database
connection dynamically from multiple connections planned in your Job. This feature is useful when you need to
access database tables having the same data structure but in different databases, especially when you are working
in an environment where you cannot change your Job settings, for example, when your Job has to be deployed and executed independently of Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is selected in the
Basic settings view. Once a dynamic parameter is defined, the Component List box in the Basic settings view
becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Global Variables
QUERY: Indicates the query to be processed. This is a Flow variable and it returns a string.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable
to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable means it functions
after the execution of a component.
Usage
This component works standalone and supports writing a wide range of data formats such as RC, ORC or AVRO.
If the Studio used to connect to a Hive database is operated on Windows, you must manually create a folder called
tmp in the root of the disk where this Studio is installed.
Prerequisites
The Hadoop distribution must be properly installed, so as to guarantee the interaction with Talend Studio. The
following list presents MapR related information for example.
Ensure that you have installed the MapR client in the machine where the Studio is, and added the MapR client
library to the PATH variable of that machine. For Windows, this library is lib\MapRClient.dll in the MapR client
jar file; without adding it, you may encounter the following error: no MapRClient in java.library.path.
Set the -Djava.library.path argument. This argument provides to the Studio the path to the native library of
that MapR client. This allows the subscription-based users to make full use of the Data viewer to view locally
in the Studio the data stored in MapR. For further information about how to set this argument, see the section
describing how to view data of Talend Open Studio for Big Data Getting Started Guide.
For further information about how to install a Hadoop distribution, see the manuals corresponding to the Hadoop
distribution you are using.
The sample data to be used in this scenario is employee information of a company, reading as follows:
1;Lyndon;Fillmore;21-05-2008;US
2;Ronald;McKinley;15-08-2008
3;Ulysses;Roosevelt;05-10-2008
4;Harry;Harrison;23-11-2007
5;Lyndon;Garfield;19-07-2007
6;James;Quincy;15-07-2008
7;Chester;Jackson;26-02-2008
8;Dwight;McKinley;16-07-2008
9;Jimmy;Johnson;23-12-2007
10;Herbert;Fillmore;03-04-2008
The information contains some employees' names and the dates when they were registered in an HR system. Since these employees work for the US subsidiary of the company, you will create a US partition for this sample data.
Before starting to replicate this scenario, ensure that you have appropriate rights and permissions to access the
Hive database to be used.
Note that if you are using the Windows operating system, you have to create a tmp folder at the root of the disk
where the Studio is installed.
Then proceed as follows:
1. In the Integration perspective of the Studio, create an empty Job from the Job Designs node in the Repository tree view.
For further information about how to create a Job, see the chapter describing how to design a Job in Talend Studio User Guide.
2. From the Property type list, select Built-in. If you have created the connection to be used in Repository, then select Repository, click the [...] button to open the [Repository content] dialog box and select that connection. This way, the Studio will reuse that set of connection information for this Job.
For further information about how to create a Hadoop connection in Repository, see the chapter describing the Hadoop cluster node of the Talend Open Studio for Big Data Getting Started Guide.
3. In the Version area, select the Hadoop distribution to be used and its version. If you cannot find from the list the distribution corresponding to yours, select Custom so as to connect to a Hadoop distribution not officially supported in the Studio.
For a step-by-step example about how to use this Custom option, see section Connecting to a custom Hadoop distribution.
4. In the Connection area, enter the connection parameters to the Hive database to be used.
5. In the Name node field, enter the location of the master node, the NameNode, of the distribution to be used. For example, talend-hdp-all:50300.
6. In the Job tracker field, enter the location of the JobTracker of your distribution. For example, hdfs://talend-hdp-all:8020.
Note that the notion Job in this term JobTracker designates the MR or the MapReduce jobs described in
Apache's documentation on http://hadoop.apache.org/.
2. Select the Use an existing connection check box and from the Component list, select the connection configured in the tHiveConnection component you are using for this Job.
3. Click the [...] button next to Edit Schema to open the schema editor.
4. Click the [+] button four times to add four rows and in the Column column, rename them to Id, FirstName, LastName and Reg_date, respectively.
Note that you cannot use the Hive reserved keywords to name the columns, such as location or date.
5. In the Type column, select the type of the data in each column. In this scenario, Id is of the Integer type, Reg_date is of the Date type and the others are of the String type.
6. In the DB type column, select the Hive type of each column corresponding to the data types you have defined. For example, Id is of INT and Reg_date is of TIMESTAMP.
7. In the Data pattern column, define the pattern corresponding to that of the raw data. In this example, use the default one.
8. In the Table name field, enter the name of the Hive table to be created. In this scenario, it is employees.
2. From the Action on table list, select Create table if not exists.
3. From the Format list, select the data format that this Hive table is created for. In this scenario, it is TEXTFILE.
4. Select the Set partitions check box to add the US partition as explained at the beginning of this scenario. To define this partition, click the [...] button that appears.
5. Leave the Set file location check box clear to use the default path for the Hive table.
6. Select the Set Delimited row format check box to display the available options of row format.
7. Select the Field check box and enter a semicolon (;) as field separator in the field that appears.
8. Select the Line check box and leave the default value as line separator.
2. Select the Use an existing connection check box and from the Component list, select the connection configured in the tHiveConnection component you are using for this Job.
3. From the Load action list, select LOAD to write data from the file holding the sample data that is presented at the beginning of this scenario.
4. In the File path field, enter the directory where the sample data is stored.
5. In the Table name field, enter the name of the target table you need to load data in. In this scenario, it is employees.
7. Select the Local check box, because the sample data used in this scenario is stored on a local machine rather than in the distributed file system where the target Hive table is.
8. Select the Set partitions check box and in the field that appears, enter the partition you need to add data to. In this scenario, this partition is country='US'.
If you need to obtain more details about the Job, it is recommended to use the web console of the Jobtracker
provided by the Hadoop distribution you are using.
tHiveRow
tHiveRow properties
Component family
Function
tHiveRow is the dedicated component for this database. It executes the HiveQL query stated in the specified
database. The row suffix means the component implements a flow in the Job design although it does not
provide output.
This component can also perform queries in a HBase database once you activate its Store by HBase function.
Purpose
Depending on the nature of the query and the database, tHiveRow acts on the actual DB structure or on the
data (although without handling data). The SQLBuilder tool helps you write your HiveQL statements easily.
Basic settings
Property type
Use an existing connection
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if you need to share an
existing connection between the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection to be shared in the Basic
settings view of the connection component which creates that very database
connection.
2. In the child level, use a dedicated connection component to read that registered
database connection.
For an example about how to share a database connection across Job levels, see
Talend Studio User Guide.
Version
Distribution
Select the product you are using as the Hadoop distribution from the drop-down list. The
options in the list vary depending on the component you are using. Among these options,
the Custom option allows you to connect to a custom Hadoop distribution rather than any
of the distributions given in this list and officially supported by Talend.
In order to connect to a custom distribution, once selecting Custom, click the [...] button to display the dialog box in which you can alternatively:
1. Select Import from existing version to import jar files from a given Hadoop distribution
and then manually add other jar files which that Hadoop distribution does not provide.
2. Select Import from zip to import jar files from a zip file which, for example, contains
all required jar files set up in another Studio and is exported from that Studio.
In this dialog box, the active check box must be kept selected so as to import
the jar files pertinent to the connection to be created between the custom
distribution and this component.
For a step-by-step example about how to connect to a custom Hadoop distribution and share this connection, see section Connecting to a custom Hadoop distribution.
Hive version
Select the version of the Hadoop distribution you are using. Note that if you use
Hortonworks Data Platform V2.0.0, the type of the operating system for running the
distribution and a Talend Job must be the same, such as Windows or Linux.
Connection
Connection mode
Select a connection mode from the list. The options vary depending on the distribution you
are using.
Hive server
Select the Hive server through which you want the Job using this component to execute
queries on Hive.
This Hive server list is available only when the Hadoop distribution to be used, such as HortonWorks Data Platform V1.2.0 (Bimota), supports HiveServer2. It allows you to select HiveServer2 (Hive 2), the server that better supports concurrent connections of multiple clients than HiveServer (Hive 1).
For further information about HiveServer2, see https://cwiki.apache.org/Hive/setting-up-hiveserver2.html.
Host
Port
Database
Use kerberos authentication
If you are accessing a Hive Metastore running with Kerberos security, select this check box and then enter the relevant parameters in the fields that appear.
The values of those parameters can be found in the hive-site.xml file of the Hive system
to be used.
1. Hive principal uses the value of hive.metastore.kerberos.principal. This is the service
principal of the Hive Metastore.
2. Metastore URL uses the value of javax.jdo.option.ConnectionURL. This is the JDBC
connection string to the Hive Metastore.
3. Driver class uses the value of javax.jdo.option.ConnectionDriverName. This is the
name of the driver for the JDBC connection.
4. Username uses the value of javax.jdo.option.ConnectionUserName. This, as well as the
Password parameter, is the user credential for connecting to the Hive Metastore.
5. Password uses the value of javax.jdo.option.ConnectionPassword.
This check box is available depending on the Hadoop distribution you are connecting to.
Use a keytab to authenticate
Select the Use a keytab to authenticate check box to log into a Kerberos-enabled Hadoop system using a given keytab file. A keytab file contains pairs of Kerberos principals and encrypted keys. You need to enter the principal to be used in the Principal field and the access path to the keytab file itself in the Keytab field.
Note that the user that executes a keytab-enabled Job is not necessarily the one a principal
designates but must have the right to read the keytab file being used. For example, the user
name you are using to execute a Job is user1 and the principal to be used is guest; in this
situation, ensure that user1 has the right to read the keytab file to be used.
Hadoop properties
Set Jobtracker URI
Select this check box to indicate the location of the Jobtracker service within the Hadoop cluster to be used. For example, we assume that you have chosen a machine called machine1 as the JobTracker, then set its location as machine1:portnumber. A Jobtracker is the service that assigns Map/Reduce tasks to specific nodes in a Hadoop cluster. Note that the notion job in this term JobTracker does not designate a Talend Job, but rather a Hadoop job described as MR or MapReduce job in Apache's Hadoop documentation on http://hadoop.apache.org.
This property is required when the query you want to use is executed in Windows and it is
a Select query. For example, SELECT your_column_name FROM your_table_name
If you use YARN such as Hortonworks Data Platform V2.0.0 or Cloudera CDH4.3 +
(YARN mode), you need to specify the location of the Resource Manager instead of the
Jobtracker. Then, if necessary, select the Set resourcemanager scheduler address check
box and enter the Scheduler address in the field that appears. Furthermore, if required, you
can allocate proper memory volumes to the Map and the Reduce computations and the
ApplicationMaster of YARN by selecting the Set memory check box in the Advanced
settings view. For further information about the Resource Manager and its scheduler and
the ApplicationMaster, see YARN's documentation such as http://hortonworks.com/blog/
apache-hadoop-yarn-concepts-and-applications/.
For further information about the Hadoop Map/Reduce framework, see the Map/Reduce
tutorial in Apache's Hadoop documentation on http://hadoop.apache.org.
Set NameNode URI
Select this check box to indicate the location of the NameNode of the Hadoop cluster to be used. The NameNode is the master node of a Hadoop cluster. For example, we assume
that you have chosen a machine called masternode as the NameNode of an Apache Hadoop
distribution, then the location is hdfs://masternode:portnumber.
This property is required when the query you want to use is executed in Windows and it is
a Select query. For example, SELECT your_column_name FROM your_table_name
For further information about the Hadoop Map/Reduce framework, see the Map/Reduce
tutorial in Apache's Hadoop documentation on http://hadoop.apache.org.
Schema and Edit Schema
A schema is a row description. It defines the number of fields to be processed and passed on to the next component.
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Click Edit Schema to make changes to the schema.
Built-in: The schema is created and stored locally for this component only. Related topic:
see Talend Studio User Guide.
Table Name
Guess Query
Click the Guess Query button to generate the query which corresponds to your table
schema in the Query field.
Query
Enter your DB query, paying particular attention to properly sequence the fields in order to match the schema definition.
For further information about the Hive query language, see https://cwiki.apache.org/Hive/languagemanual.html.
Compressed data in the form of Gzip or Bzip2 can be processed through the
query statements. For details, see https://cwiki.apache.org/confluence/display/
Hive/CompressedStorage.
Hadoop provides different compression formats that help reduce the space needed
for storing files and speed up data transfer. When reading a compressed file, the
Studio needs to uncompress it before being able to feed it to the input flow.
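For instance, a query matching a two-column schema made of id and name could read as follows (the table name follows the your_table_name convention used above and is illustrative):
SELECT id, name FROM your_table_name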
Die on error
This check box is selected by default. Clear the check box to skip the row on error and
complete the process for error-free rows. If needed, you can retrieve the rows on error via
a Row > Rejects link.
HBase Configuration
Available only when the Use an existing connection check box is clear.
Store by HBase
Select this check box to display the parameters to be set to allow the Hive components to access HBase tables. Once this access is configured, you will be able to use, in tHiveRow and tHiveInput, the Hive QL statements to read and write data in HBase.
For further information about this access involving Hive and HBase, see Apache's Hive documentation about Hive/HBase integration.
Zookeeper quorum
Type in the name or the URL of the Zookeeper service you use to coordinate the transaction between Talend and HBase.
Zookeeper client port
Type in the number of the client listening port of the Zookeeper service you are using.
Define the jars to register for HBase
Select this check box to display the Register jar for HBase table, in which you can register any missing jar file required by HBase, for example the Hive Storage Handler, which is by default registered along with your Hive installation.
Temporary path
If you do not want to set the Jobtracker and the NameNode when you execute the query select * from your_table_name, you need to set this temporary path. For example, /C:/select_all in Windows.
Propagate QUERY's recordset
Select this check box to insert the result of the query into a COLUMN of the current flow. Select this column from the use column list.
This option allows the component to have a different schema from that of the preceding component. Moreover, the column that holds the QUERY's recordset should be set to the type Object and this component is usually followed by tParseRecordSet.
Hadoop properties
Talend Studio uses a default configuration for its engine to perform operations in a Hadoop
distribution. If you need to use a custom configuration in a specific situation, complete this
table with the property or properties to be customized. Then at runtime, the customized
property or properties will override those default ones.
For further information about the properties required by Hadoop and its related systems
such as HDFS and Hive, see Apache's Hadoop documentation on http://hadoop.apache.org,
or the documentation of the Hadoop distribution you need to use.
Hive properties
Talend Studio uses a default configuration for its engine to perform operations in a Hive database. If you need to use a custom configuration in a specific situation, complete this table with the property or properties to be customized. Then at runtime, the customized property or properties will override those default ones. For further information about Hive-dedicated properties, see https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration.
Mapred job map memory mb and Mapred job reduce memory mb
If the Hadoop distribution to be used is Hortonworks Data Platform V1.2 or Hortonworks Data Platform V1.3, you need to set proper memory allocations for the map and reduce computations to be performed by the Hadoop system. Enter the values you need in these two fields, respectively; by default, both values are 1000, which is normally appropriate.
Path separator in server
Leave the default value of the Path separator in server as it is, unless you have changed the separator used by your Hadoop distribution's host machine for its PATH variable, that is, unless that separator is not a colon (:). In that situation, you must change this value to the one used on that host.
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to choose your
database connection dynamically from multiple connections planned in your Job. This feature is useful when
you need to access database tables having the same data structure but in different databases, especially when
you are working in an environment where you cannot change your Job settings, for example, when your Job has to be deployed and executed independently of Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is selected
in the Basic settings view. Once a dynamic parameter is defined, the Component List box in the Basic
settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Global Variables
QUERY: Indicates the query to be processed. This is a Flow variable and it returns a string.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose
the variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable means it
functions after the execution of a component.
Usage
This component offers the benefit of flexible DB queries and covers all possible Hive QL queries.
If the Studio used to connect to a Hive database is operated on Windows, you must manually create a folder called tmp in the root of the disk where this Studio is installed.
Prerequisites
The Hadoop distribution must be properly installed, so as to guarantee the interaction with Talend Studio.
The following list presents MapR related information for example.
Ensure that you have installed the MapR client in the machine where the Studio is, and added the MapR
client library to the PATH variable of that machine. For Windows, this library is lib\MapRClient.dll in
the MapR client jar file; without adding it, you may encounter the following error: no MapRClient in
java.library.path.
Set the -Djava.library.path argument. This argument provides to the Studio the path to the native
library of that MapR client. This allows the subscription-based users to make full use of the Data viewer to
view locally in the Studio the data stored in MapR. For further information about how to set this argument,
see the section describing how to view data of Talend Open Studio for Big Data Getting Started Guide.
For further information about how to install a Hadoop distribution, see the manuals corresponding to the
Hadoop distribution you are using.
Related scenarios
For related topics, see:
section Scenario 3: Combining two flows for selective output
section Scenario: Resetting a DB auto-increment
section Scenario 1: Removing and regenerating a MySQL table index.
You need to keep in mind the parameters required by Hadoop, such as NameNode and Jobtracker, when
configuring this component since the component needs to connect to a Hadoop distribution.
tMongoDBBulkLoad
tMongoDBBulkLoad properties
Component family
Function
tMongoDBBulkLoad reads data from CSV, TSV or JSON files and imports them into the specified
MongoDB database.
Purpose
tMongoDBBulkLoad allows you to import data files in different formats (CSV, TSV or JSON)
into the specified MongoDB database so that the data can be further processed.
Basic settings
MongoDB directory
Select this check box to provide the information of the local database
that you want to use.
Local DB path: type in the path to the local database specified
when starting the MongoDB server.
Server
Port
Listening port of the database server. Note that the default value
27017 will be used if the port is not specified.
This field is available only when the Use replica set
address check box is not selected.
Database
Collection
Required authentication
Data file
Type in the full path of the file from which the data will be imported
or click the [...] button to browse to the desired data file.
Make sure that the data file is in standard format. For
example, the fields in CSV files should be separated with
commas.
File type
Select the proper file type from the list. CSV, TSV and JSON are
supported.
Action on data
Upsert fields
First line is header
Select this check box to use the first line in CSV or TSV files as a header.
This check box is available only when you select CSV or TSV from the File type list.
Ignore blanks
Select this check box to ignore the empty fields in CSV or TSV files.
This check box is available only when you select CSV or
TSV from the File type list.
Advanced settings
Print log
Additional arguments
tStatCatcher Statistics
Select this check box to collect the log data at a component level.
Global Variables
NB_LINE: Indicates the number of rows read by an input component or transferred to an output
component. This is an After variable and it returns an integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and
choose the variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable
means it functions after the execution of a component.
Usage
This component can be used together with the tMongoDBInput component to check if the data
is imported as expected.
Limitation
n/a
1. Drop the following components from the Palette onto the design workspace: two tMongoDBBulkLoad components, two tMongoDBInput components, and two tLogRow components.
2. Connect the first tMongoDBBulkLoad to the first tMongoDBInput using a Trigger > OnSubjobOk link.
3. Connect the first tMongoDBInput to the first tLogRow using a Row > Main link.
4. Repeat the two steps above to connect the second tMongoDBBulkLoad to the second tMongoDBInput, and the second tMongoDBInput to the second tLogRow.
5. Connect the first tMongoDBInput to the second tMongoDBBulkLoad using a Trigger > OnSubjobOk link.
6. Label the two tLogRow components to better identify the data displayed on the console.
2. In the MongoDB directory field, type in the MongoDB home directory. In this example, it is D:/MongoDB.
3. In the Server and Port fields, fill in the information required for the connection to MongoDB. In this example, type in localhost and 27017.
4. In the Database field, type in the database to import data to, bookstore in this example.
In the Collection field, type in the collection to import data to, books in this example.
5. Select the Drop collection if exist check box to remove the specified collection if it already exists.
6. Browse to the desired data file from which you want to import data. In this example, it is D:/Input/books.csv, which is a standard CSV file containing four columns: id, title, author, and category.
id,title,author,category
1,Computer Networks,Larry Peterson,Computer Science
2,David Copperfield,Charles Dickens,Language&Literature
3,Life of Pi,Yann Martel,Language&Literature
7. Select the First line is header check box to use the first line in the CSV file as a header.
8. Select the Ignore blanks check box to ignore the blank fields (if any) in the CSV file.
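These settings correspond roughly to the following mongoimport invocation (an illustrative equivalent, not the component's literal output):
mongoimport --db bookstore --collection books --type csv --file D:/Input/books.csv --headerline --ignoreBlanks --drop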
9. Double-click the first tMongoDBInput component to open its Basic settings view in the Component tab.
2. In the Server and Port fields, fill in the information required for the connection to MongoDB. In this example, type in localhost and 27017.
3. In the Database field, type in the database from which the data will be read, bookstore in this example.
4. In the Collection field, type in the collection from which the data will be read, books in this example.
5. Click Edit schema to define the data structure to be read from the MongoDB collection.
6. In the Mapping table, the Column field is automatically populated with the defined schema. You do not need to fill in the Parent node path column.
7. Double-click the first tLogRow component to open its Basic settings view in the Component tab.
8. Double-click the second tMongoDBBulkLoad component to open its Basic settings view in the Component tab.
2. In the MongoDB directory field, type in the MongoDB home directory. In this example, it is D:/MongoDB.
3. In the Server and Port fields, fill in the information required for the connection to MongoDB. In this example, type in localhost and 27017.
4. In the Database field, type in the target database to import data to, bookstore in this example.
In the Collection field, type in the target collection to import data to, books in this example.
5. Browse to the desired data file from which you want to import data. Here, select books.json.
{
"id": "4",
"title": "Les Miserables",
"author": "Victor Hugo",
"category": "Language&Literature"
}
{
"id": "5",
"title": "Advanced Database Systems",
"author": "Carlo Zaniolo",
"category": "Database"
}
8. Click the Advanced settings tab to define the additional arguments as needed.
In this example, add the argument " --jsonArray" to accept the imported data within a single JSON array.
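This import corresponds roughly to the following mongoimport invocation (an illustrative equivalent, not the component's literal output):
mongoimport --db bookstore --collection books --type json --file D:/Input/books.json --jsonArray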
1. Repeat Step 1 through Step 6 described in the procedure Validating that the CSV file is imported successfully to configure the second tMongoDBInput component.
2. Repeat Step 7 through Step 8 described in the procedure Validating that the CSV file is imported successfully to configure the second tLogRow component.
2. The data from the collection books in the MongoDB database bookstore is displayed on the console, which contains the data imported from both the CSV file books.csv and the JSON file books.json.
tMongoDBClose
tMongoDBClose properties
Component family
Function
Purpose
Basic settings
Component list
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Usage
Limitation
n/a
Related scenario
For a related scenario, see section Scenario 1: Creating a collection and writing data to it.
tMongoDBConnection
tMongoDBConnection properties
Component family
Function
Purpose
This component allows you to create a connection to a Mongo database and reuse that connection
in other components.
Basic settings
DB Version
Database
Required authentication
Advanced settings
tStatCatcher Statistics
Select this check box to collect the log data at a component level.
Usage
This component is generally used with other Mongo components, particularly tMongoDBClose.
Limitation
n/a
Related scenario
For a related scenario, see section Scenario 1: Creating a collection and writing data to it.
tMongoDBInput
tMongoDBInput Properties
Component family
Function
Purpose
This component allows you to retrieve records from a collection in the Mongo database and
transfer them to the following component for display or storage.
Basic settings
Use existing connection
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
DB Version
Database
Required authentication
Collection
Query
Mapping
Specify the parent node for the column in the Mongo database.
Sort by
Specify the column and choose the order for the sort operation.
Limit
Advanced settings
tStatCatcher Statistics
Select this check box to collect the log data at the component level.
Global Variables
NB_LINE: Indicates the number of rows read by an input component or transferred to an output
component. This is an After variable and it returns an integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list
and choose the variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable
means it functions after the execution of a component.
Usage
As a start component, tMongoDBInput allows you to retrieve records from a collection in the
Mongo database and transfer them to the following component for display or storage.
To insert data into the database, see section Scenario 1: Creating a collection and writing data to it.
6. In the Collection field, enter the name of the collection, namely blog.
7. Click the [...] button next to Edit schema to open the schema editor.
8. Click the [+] button to add five columns, namely id, author, title, keywords and contents, with the type as Integer and String respectively.
9. Click OK to close the editor.
10. The columns now appear in the left part of the Mapping area.
11. For columns author, title, keywords and contents, enter their parent node post so that the data can be retrieved
from the correct positions.
12. In the Query box, enter the advanced query statement to retrieve the posts whose author is Anderson:
"{post.author : 'Anderson'}"
This statement requires that the sub-node of post, the node author, should have the value "Anderson".
13. Double-click tLogRow to open its Basic settings view.
Select Table (print values in cells of a table) for a better display of the results.
Related scenarios
For related scenarios, see:
section Scenario 1: Creating a collection and writing data to it
section Scenario: Using Mongo functions to create a collection and write data to it
tMongoDBOutput
tMongoDBOutput Properties
Component family
Function
Purpose
This component executes the action defined on the collection in the Mongo database.
Basic settings
Use existing connection
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
DB Version
Database
Required authentication
Collection
Action on data
Mapping
Specify the parent node for the column in the Mongo database.
Advanced settings
Die on error
tStatCatcher Statistics
Global Variables
Select this check box to collect the log data at the component level.
NB_LINE: Indicates the number of rows read by an input component or transferred to an output
component. This is an After variable and it returns an integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list
and choose the variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable
means it functions after the execution of a component.
Usage
tMongoDBOutput executes the action defined on the collection in the Mongo database based on the flow incoming from the preceding component in the Job.
Limitation
The "multi" parameter, which allows updating multiple documents at a time, is not supported. Therefore, if two documents have the same key, the first is always updated, but the second never will be.
For the update operation, the key cannot be a JSON array.
4. Select the Use existing connection and Drop collection if exist check boxes.
In the Collection field, enter the name of the collection, namely blog.
5. Click the [...] button next to Edit schema to open the schema editor.
6. Click the [+] button to add five columns in the right part, namely id, author, title, keywords and contents, with the type as Integer and String respectively.
Click OK to close the editor.
The columns now appear in the left part of the Mapping area.
For columns author, title, keywords and contents, enter their parent node post. By doing so, those nodes reside under the node post in the Mongo collection.
Click the [...] button next to Edit schema to open the schema editor.
10. Click the [+] button to add five columns, namely id, author, title, keywords and contents, with the type as
Integer and String respectively.
3. Switch to the database talend and read data from the collection blog in the Mongo command line client. You can find that author, title, keywords and contents all reside under the node post. Meanwhile, the records are stored in the same order as the source input.
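For example, you could run the following commands in the mongo shell to perform this check:
use talend
db.blog.find()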
Such records can be inserted into the database following the instructions of section Scenario 1: Creating a collection and writing data to it.
2. As shown above, the 3rd record has its author changed and the 4th record is new.
4. Select the Use existing connection and Die on error check boxes.
In the Collection field, enter the name of the collection, namely blog.
Select Upsert from the Action on data list.
5. Click the [...] button next to Edit schema to open the schema editor.
6. Click the [+] button to add five columns in the right part, namely id, author, title, keywords and contents, with the type as Integer and String respectively.
Click OK to close the editor.
7. In the Advanced Settings view, select the Generate JSON Document check box.
Select the Remove root node check box.
In the Data node and Query node fields, enter "data" and "query".
8. Click the [...] button next to Configure JSON Tree to open the configuration interface.
9. Right-click the node rootTag and select Add Sub-element from the contextual menu.
In the dialog box that appears, type in data for the Data node:
10. Select all the columns under the Schema list and drop them to the data node.
In the window that appears, select Create as sub-element of target node.
Click the [+] button to add five columns, namely id, author, title, keywords and contents, with the type as
Integer and String respectively.
Click OK to close the editor.
The columns now appear in the left part of the Mapping area.
For columns author, title, keywords and contents, enter their parent node post so that the data can be retrieved
from the correct positions.
In the Mode area, select Table (print values in cells of a table) for better display.
2. As shown above, the 3rd record has its author updated and the 4th record is inserted.
tMongoDBRow
tMongoDBRow Properties
Component family
Function
tMongoDBRow executes the commands and functions provided by the Mongo database.
Purpose
This component allows you to execute the commands and functions of the Mongo database.
Basic settings
Use existing connection
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
DB Version
Database
Required authentication
Execute command
Function
Parameters value
Click the [+] button to add lines as needed and then define the
parameter values in the form of variables or constant values, for
example row1.author or "Andy". Note that the parameter values
correspond to the parameters defined in the Function field, in the
same order.
Not available when the Execute command check box is selected.
Die on error
Advanced settings
tStatCatcher Statistics
Select this check box to collect the log data at the component level.
Usage
tMongoDBRow allows you to manipulate the Mongo database through the Mongo commands and functions.
Limitation
n/a
5. Click the [...] button next to Edit schema to open the schema editor.
6. Click the [+] button to add four columns in the right part, namely author, title, keywords and contents, with the type of String.
Click OK to close the editor.
In the Parameters value table, click the [+] button to add four lines and enter the values in sequence:
row1.author, row1.title, row1.keywords and row1.contents. By doing so, data of row1 will be transferred to
the parameters defined in the function.
9. Click the [...] button next to Edit schema to open the schema editor.
10. Click the [+] button to add four columns, namely author, title, keywords and contents, with the type as String.
Click OK to close the editor.
11. Double-click tLogRow to open its Basic settings view.
In the Mode area, select Table (print values in cells of a table) for better display.
tNeo4jClose
tNeo4jClose properties
Component family
Function
Purpose
Basic settings
Connection
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Usage
This component is to be used along with other Neo4j components, especially with tNeo4jConnection.
Limitation
n/a
Related scenarios
For scenarios in which tNeo4jClose is used, see section Scenario: Import employees table into Neo4j with hierarchy relationship and section Scenario: Importing employees with their manager in a single query.
tNeo4jConnection
tNeo4jConnection properties
Component family
Function
In embedded mode, tNeo4jConnection starts the database; in REST mode, it checks the server availability.
Purpose
tNeo4jConnection allows you to define a connection to a Neo4j database to be reused by other Neo4j components.
Basic settings
Database path
If you use Neo4j in embedded mode, specify the path of the data file.
This field is available only if the Use a remote server check box is not
selected.
Server URL
Read only
Select this check box if you want to use the embedded database in read-only mode. This is useful if another application is already using the database.
Do not use this mode when you have any output Neo4j component
in your Job such as tNeo4jOutput, tNeo4jOutputRelationship or
tNeo4jRow.
Advanced settings
tStatCatcher Statistics
Select this check box to collect the log data at a component level.
Usage
This component is generally used with other Neo4j components, particularly tNeo4jClose.
Limitation
n/a
Related scenarios
For scenarios in which tNeo4jConnection is used, see section Scenario: Import employees table into Neo4j with hierarchy relationship and section Scenario: Importing employees with their manager in a single query.
tNeo4jInput
tNeo4jInput properties
Component family
Function
tNeo4jInput allows you to read data from Neo4j and send it into the Talend flow.
Purpose
tNeo4jInput reads data from Neo4j based on a Cypher query, allowing any further transformation or processing of the data in the rest of the Job.
Basic settings
Use an existing connection
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
Remote server
Database path
If you use Neo4j in embedded mode, specify the path of the data file.
Schema and Edit Schema
A schema is a row description. It defines the number of fields to be processed and passed on to the next component.
If you are using Talend Open Studio for Big Data, only the Built-in mode is
available.
Click Edit Schema to make changes to the schema.
Server url
Shutdown after Job (Only embedded database)
Select this check box if you want to shut down the Neo4j database at the end of your Job. Only available in embedded mode.
Query
Enter your Cypher query with return parameters matching the mapping table.
Mapping
Complete this table to map the schema columns to the parameters returned by the Cypher query. The Column column of this table is automatically filled once you have defined the schema.
Advanced settings
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at the Job level as
well as at each component level.
Global Variables
NB_LINE: Indicates the number of rows read by an input component or transferred to an output component.
This is an After variable and it returns an integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose
the variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable means it
functions after the execution of a component.
Usage
Limitation
n/a
Scenario: Using Cypher when reading nodes from a Neo4j database in REST mode
1. Drop tNeo4jInput and tLogRow from the Palette onto the workspace.
2. Select the Remote server check box, and enter the database root URL in the Server URL field, "http://localhost:7474/db/data" in this example.
4. Click the [+] button to add the rows that you will use to define the schema, five columns in this example, emp_firstname, emp_lastname, man_firstname, man_lastname, and r.
Under Column, click in the fields to enter the corresponding column names.
Click in the fields under Type to define the type of data.
Click OK to close the schema editor.
6. In the Mapping table, enter the returned parameters mapped by the schema columns as below:
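The Cypher query entered in the Query field must return parameters matching these mapped columns. Assuming, as in the import scenario, that employee nodes carry first_name and last_name properties and are linked to their managers by MANAGE relationships, such a query could be sketched as follows (in later Cypher syntax; the property names are assumptions):
MATCH (man)-[r:MANAGE]->(emp)
RETURN emp.first_name, emp.last_name, man.first_name, man.last_name, r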
8. In the Mode area, select Vertical (each row is a key/value list) for a better display of the results.
tNeo4jOutput
tNeo4jOutput properties
Component family
Function
tNeo4jOutput receives data from the preceding component, and writes data into Neo4j.
Purpose
tNeo4jOutput is used to write data into a Neo4j database, and/or update or delete entries in the database based
on the index defined.
Basic settings
Use existing connection
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
Remote server
Database path
If you use Neo4j in embedded mode, specify the path of the data file.
This field appears only if you do not select the Use an existing connection check
box.
Server URL
Shutdown after job (Only embedded database)
Select this check box if you want to shut down the Neo4j database connection at the end of the Job.
This check box is available only if the Use an existing connection is selected.
Mapping
Opens the indexes and relationships mapping editor. Use it to index node or create
relationships during the node insertion.
Data action
Schema and Edit schema
A schema is a row description. It defines the number of fields to be processed and passed
on to the next component.
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Click Edit Schema to make changes to the schema.
Index name
Index key
Index value
Commit every
Enter the number of rows to be completed before committing batches of nodes to the
DB. This option ensures transaction quality (but not rollback) and, above all, better
performance at execution.
This option is only supported by the embedded mode of the database. You can't
make transactions in REST mode.
Batch import
Global Variables
NB_LINE: Indicates the number of rows read by an input component or transferred to an output component. This
is an After variable and it returns an integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable
to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable means it functions
after the execution of a component.
Usage
This component is used as an output component and it always needs an incoming link.
Limitation
n/a
1. Drop the following components from the Palette onto the design workspace: tNeo4jConnection, tOracleInput, tNeo4jOutput, tOracleInput, tNeo4jOutputRelationship, and tNeo4jClose.
5. Do the same to connect tNeo4jConnection to the first tOracleInput, connect the first tOracleInput to the second tOracleInput and connect the second tOracleInput to tNeo4jClose.
1. Double-click the tOracleConnection component to open its Basic settings view in the Component tab.
2. Select the Oracle version that you are using from the DB Version list.
4. In the Database field, type the name of the database you want to use: orcl in this example, and in the Schema field, type the schema you want to use: HR in this example.
5. If required, type in the authentication information for the Oracle connection: Username and Password.
1. Double-click the tNeo4jConnection component to open its Basic settings view in the Component tab.
2. In the Database path field, type the path of the database file, "/home/erouan/java/tools/neo4j-community-1.9.M01/data/graph.db" in this example.
2. Type in the required information for the connection or use the existing connection you have configured before. In this scenario, select the Use existing connection check box.
4. Click the [+] button to add the rows that you will use to define the schema, five columns in this example, employee_id, first_name, last_name, email, and manager_id.
Under Column, click the fields to enter the corresponding column names.
Click in the fields under Type to define the type of data.
Click OK to close the schema editor.
5. In the Table name field, type in the name of the requested table: "employees" in this example.
6. In the Query type list, select Built-in. Then, click Guess Query to get the query statement.
"SELECT EMPLOYEE_ID,
FIRST_NAME,
LAST_NAME,
EMAIL,
MANAGER_ID,
FROM EMPLOYEES"
1. Click the tNeo4jOutput component and select the Component tab to open its Basic settings view.
3. Click Sync columns to retrieve the schema from the preceding component.
5. Click the [...] button next to the Mapping field to open the indexes and relationships mapping editor:
6. Select the Auto index check box for the employee_id row to auto index nodes with this property.
7. Select the employee_id row and click the [+] button to add a new index to the nodes.
8. For the new row, enter the index name in the Name field: types in this example.
9. On the same row, enter the index key in the Key field: __type__ in this example.
10. Enter the index value in the Value (empty for current row) field: Employee in this example. Each node inserted will be indexed with the type "Employee".
11. Click OK to close the mapping editor.
1. Configure the second tOracleInput component in the same way, as shown below:
3. Click Sync columns to retrieve the schema from the preceding component.
4. In the Relationship type field, enter the relationship type, "MANAGE" in this example.
5. Enter the index key to query in the Index key field, "employee_id" in this example.
Select the index value to query in the Index value column list, "employee_id" in this example.
6. Select the relationship direction in the Relationship direction list, "Incoming" in this example.
8. In the Mapping table, add two relationships mapping entries, as shown below:
tNeo4jOutputRelationship
tNeo4jOutputRelationship properties
Component family
Function
tNeo4jOutputRelationship receives data from the preceding component, and writes relationships into Neo4j.
Purpose
Basic settings
Use existing connection
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
Remote server
Database path
If you use Neo4j in embedded mode, specify the path of the data file.
This field appears only if you do not select the Use an existing connection
check box.
Schema and Edit schema
A schema is a row description. It defines the number of fields to be processed and passed
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Click Edit Schema to make changes to the schema.
Server url
Shutdown after job (Only embedded database)
Select this check box if you want to shut down the Neo4j database connection at the end of the Job.
This check box is available only if the Use an existing connection is selected.
Relationship type
Start node / Index name
Specify the index name to query the starting node of the newest relationship.
Start node / Index key
Start node / Index value
Select the index value to query the starting node.
Relationship direction
End node / Index name
Specify the index name to query the ending node of the newest relationship.
Advanced settings
Mapping
Use this table to map relationship properties with the input schema columns.
Commit every
Enter the number of rows to be completed before committing batches of nodes to the DB. This option ensures transaction quality (but not rollback) and, above all, better performance at execution.
performance at execution.
This option is only supported by the embedded mode of the database. You
can't make transactions in REST mode.
Batch import
Node store mapped memory
Relationship store mapped memory
Array store mapped memory
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at the Job level as well as at each component level.
Global Variables
NB_LINE: Indicates the number of rows read by an input component or transferred to an output component. This
is an After variable and it returns an integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable
to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable means it functions
after the execution of a component.
Usage
This component is used as an output component and it always needs an incoming link.
Limitation
n/a
Related scenario
For a scenario describing the use of the tNeo4jOutputRelationship component, see section Scenario: Import
employees table into Neo4j with hierarchy relationship.
tNeo4jRow
tNeo4jRow properties
Component family
Function
tNeo4jRow is the specific component for this database query. It executes the stated Cypher query on the specified database. The row suffix means the component implements a flow in the Job design although it doesn't provide output.
Purpose
Depending on the nature of the query, tNeo4jRow acts on the data (although without handling data).
Basic settings
Use existing connection
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
Remote server
Database path
If you use Neo4j in embedded mode, specify the path of the data file.
This field appears only if you do not select the Use an existing connection check box.
Server url
Shutdown after job (Only embedded database)
Select this check box if you want to shut down the Neo4j database connection at the end of the Job.
This check box is available only if the Use an existing connection is selected.
Schema and Edit schema
A schema is a row description. It defines the number of fields to be processed and passed
on to the next component.
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Click Edit Schema to make changes to the schema.
Query
Enter your Cypher query. If you have some parameters, declare them with curly brackets
as {parameter} and map them in the Parameters table.
Parameters
Use this table to map Cypher query parameters with the input schema columns.
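For example, a parameterized query of the following shape could create a node whose properties come from the mapped input columns (the property and parameter names are illustrative):
CREATE (n {first_name: {first_name}, last_name: {last_name}})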
Die on error
This check box is selected by default. Clear the check box to skip the row on error and
complete the process for error-free rows.
Advanced settings
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at the Job level as well as at each component level.
Commit every
Enter the number of rows to be completed before committing batches of nodes to the database. This option ensures transaction quality (but not rollback) and, above all, better performance at execution.
performance at execution.
This option is only supported by the embedded mode of the database. You can't
make transactions in REST mode.
Global Variables
NB_NODE_INSERTED: Indicates the number of nodes inserted. This is an After variable and it returns an
integer.
NB_RELATIONSHIP_INSERTED: Indicates the number of relationships inserted. This is an After variable
and it returns an integer.
NB_PROPERTY_UPDATED: Indicates the number of properties updated. This is an After variable and it returns
an integer.
NB_NODE_DELETED: Indicates the number of nodes deleted. This is an After variable and it returns an integer.
NB_RELATIONSHIP_DELETED: Indicates the number of relationships deleted. This is an After variable and
it returns an integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable
to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable means it functions
after the execution of a component.
Usage
Limitation
n/a
1. Drop the following components from the Palette onto the design workspace: tNeo4jConnection, tFileInputDelimited, tNeo4jOutput, tOracleInput, tNeo4jRow, and tNeo4jClose.
Double-click the tNeo4jConnection component to open its Basic settings view in the Component tab.
2.
In the Database path field, type the path of the database file, "/home/erouan/java/tools/neo4jcommunity-1.9.M01/data/graph.db" in this scenario.
2.
Click the [...] button next to the File Name/Stream field to browse to the file that you want to read data from.
In this scenario, the path is "/home/erouan/java/projects/talend-neo4j-connector/test.csv".
3.
4.
Click the [+] button to add one column, namely type, of the String type.
Click OK to close the schema editor.
5.
Click the tNeo4jOutput component and select the Component tab to open its Basic settings view.
6.
Select the Use an existing connection check box. The only tNeo4jConnection component in this Job appears
in the Connection list.
7.
Click Sync columns to retrieve the schema from the preceding component.
8.
9.
Click the [...] button next to the Mapping field to open the indexes and relationships mapping editor:
10. Select the type row and click the [+] button to add a new row.
In the Name field of the newly added row, type in the index name: types in this example.
In the Key field of the newly added row, type in the index key: type in this example.
Leave the Value (empty for current row) field empty and click OK to close the mapping editor.
2.
Select the Oracle version that you are using from the DB Version list.
3.
4.
In the Database field, type the name of the database you want to use: orcl in this example, and in the Schema
field, type the schema you want to use: HR in this example.
5.
If required, type in the authentication information for the Oracle connection: Username and Password.
6.
7.
Click the [+] button to add the rows that you will use to define the schema, emp_firstName, emp_lastName,
man_firstName, and man_lastName in this example.
Under Column, click the fields to enter the corresponding column names.
Click the fields under Type to define the type of data.
Click OK to close the schema editor.
8.
Double-click the tNeo4jRow component to open its Basic settings view in the Component tab.
2.
Select the Use an existing connection check box. The only tNeo4jConnection component in this Job appears
in the Connection list.
3.
Click Sync columns to retrieve the schema from the preceding component.
4.
For each row, this query creates an employee node if it does not exist. At the same time, it links the inserted node with its manager node, creating the manager node if it does not exist.
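As an illustration only, the query could take roughly the following shape. It is written in later Cypher syntax (MERGE and node labels); on the Neo4j 1.9 release used in this scenario, the same logic would be expressed with CREATE UNIQUE. The label and relationship type are assumptions; only the parameter names come from the schema defined above:
MERGE (emp:Employee {firstName: {emp_firstName}, lastName: {emp_lastName}})
MERGE (man:Employee {firstName: {man_firstName}, lastName: {man_lastName}})
MERGE (emp)-[:REPORTS_TO]->(man)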
5.
In the Parameters table, map the query parameters with the schema columns as below:
2.
2.
tPigAggregate
tPigAggregate Properties
Component family
Function
This component allows you to group the original data by column and add one or more additional
columns to the output of preceding grouped data.
Purpose
The tPigAggregate component adds one or more additional columns to the output of the grouped
data to create data to be used by Pig.
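In Pig Latin terms, the Group by and Operations settings roughly correspond to a GROUP statement followed by a FOREACH ... GENERATE; a minimal sketch, assuming an input relation customers with columns country and salary:
grouped = GROUP customers BY country;
result = FOREACH grouped GENERATE group AS country, COUNT(customers) AS nb_customers, AVG(customers.salary) AS avg_salary;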
Basic settings
Group by
Click the plus button to add one or more columns to set tuples in
the source data as group condition.
Operations
Click the plus button to add one or more columns to generate one
or more additional output columns based on conditions:
Additional Output Column: Select a column in the original data
as output column.
Function: Functions for operation on input data.
Input Column: Select a column in the original data as input
column.
Advanced settings
Increase parallelism
Select this check box to set the number of reduce tasks for the
MapReduce Jobs.
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at the
Job level as well as at each component level.
Usage
This component is commonly used as intermediate step together with input component and
output component.
Prerequisites
The Hadoop distribution must be properly installed, so as to guarantee the interaction with Talend
Studio. The following list presents MapR related information for example.
Ensure that you have installed the MapR client in the machine where the Studio is, and added
the MapR client library to the PATH variable of that machine. For Windows, this library
is lib\MapRClient.dll in the MapR client jar file; without adding it, you may encounter the
following error: no MapRClient in java.library.path.
Set the -Djava.library.path argument. This argument provides to the Studio the path to
the native library of that MapR client. This allows the subscription-based users to make full
use of the Data viewer to view locally in the Studio the data stored in MapR. For further
information about how to set this argument, see the section describing how to view data of
Talend Open Studio for Big Data Getting Started Guide.
For further information about how to install a Hadoop distribution, see the manuals
corresponding to the Hadoop distribution you are using.
Limitation
Related scenario
For a tPigAggregate related scenario, see the section Scenario: Aggregating values and sorting data of tAggregateRow.
tPigCode
tPigCode Properties
Component family
Function
This component allows you to enter personalized Pig code and integrate it into a Talend program. This code is executed only once.
Purpose
tPigCode extends the functionalities of a Talend Job through using Pig scripts.
Basic settings
Scripts
Advanced settings
Enable escape
Select this check box so that you can simply write plain Pig code in the Scripts field without needing to keep in mind the escape characters otherwise required for proper Java code generation.
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at the Job level as well as at each component level.
Usage
This component is commonly used as intermediate step together with input component and output component.
A tPigCode component can execute only one Pig Latin statement; therefore, if you need to execute multiple statements, you have to use a corresponding number of tPigCode components and run them one after another.
If a particular .jar file is required to execute a statement, you need to register that library file via
the tPigLoad component that starts the Pig process in question.
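For example, a single Pig Latin statement such as the following could be entered in the Scripts field; this is a sketch only, and the relation aliases are assumptions that must match the flows produced by the surrounding Pig components:
filtered = FILTER row1 BY Age > 30;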
Prerequisites
The Hadoop distribution must be properly installed, so as to guarantee the interaction with Talend
Studio. The following list presents MapR related information for example.
Ensure that you have installed the MapR client in the machine where the Studio is, and added
the MapR client library to the PATH variable of that machine. For Windows, this library
is lib\MapRClient.dll in the MapR client jar file; without adding it, you may encounter the
following error: no MapRClient in java.library.path.
Set the -Djava.library.path argument. This argument provides to the Studio the path to
the native library of that MapR client. This allows the subscription-based users to make full
use of the Data viewer to view locally in the Studio the data stored in MapR. For further
information about how to set this argument, see the section describing how to view data of
Talend Open Studio for Big Data Getting Started Guide.
For further information about how to install a Hadoop distribution, see the manuals corresponding to the Hadoop distribution you are using.
Limitation
Scenario: Selecting a column of data from an input file and store it into a local file
Drop the following components from the Palette to the design workspace: tPigCode, tPigLoad,
tPigStoreResult.
2.
Right-click tPigLoad to connect it to tPigCode using a Row > Pig Combine connection.
3.
Right-click tPigCode to connect it to tPigStoreResult using a Row > Pig Combine connection.
2.
Click the three-dot button next to Edit schema to add columns for tPigLoad.
3.
Click the plus button to add Name, Country and Age and click OK to save the setting.
4.
5.
Fill in the Input filename field with the full path to the input file.
In this scenario, the input file is CustomerList which contains rows of names, country names and age.
6.
7.
2.
Click Sync columns to retrieve the schema structure from the preceding component.
3.
2.
Click Sync columns to retrieve the schema structure from the preceding component.
3.
Fill in the Result file field with the full path to the result file.
In this scenario, the result is saved in Result file.
4.
5.
6.
tPigCross
tPigCross Properties
Component family
Function
This component allows you to compute the cross product of two or more relations.
Purpose
The tPigCross component uses the CROSS operator to compute the Cartesian product.
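As a sketch, the component corresponds to the Pig Latin CROSS operator; the relation names below are illustrative:
crossed = CROSS customers, orders;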
Basic settings
Schema and Edit schema
A schema is a row description. It defines the number of fields to be processed and passed on to the next component.
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Click Edit Schema to make changes to the schema.
Built-in: The schema will be created and stored locally for this component only.
Related topic: see Talend Studio User Guide.
Advanced settings
Cross filename
Field separator
Enter character, string or regular expression to separate fields for the transferred
data.
Use partitioner
Select this check box to specify the Hadoop Partitioner that controls the
partitioning of the keys of the intermediate map-outputs. For further information
about the usage of Hadoop Partitioner, see:
http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapred/
Partitioner.html
Increase parallelism
Select this check box to set the number of reduce tasks for the MapReduce Jobs
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at the Job level as well
as at each component level.
Usage
This component is commonly used as intermediate step together with input component and output
component.
Prerequisites
The Hadoop distribution must be properly installed, so as to guarantee the interaction with Talend Studio.
The following list presents MapR related information for example.
Ensure that you have installed the MapR client in the machine where the Studio is, and added the MapR
client library to the PATH variable of that machine. For Windows, this library is lib\MapRClient.dll in
the MapR client jar file; without adding it, you may encounter the following error: no MapRClient in
java.library.path.
Set the -Djava.library.path argument. This argument provides to the Studio the path to the native
library of that MapR client. This allows the subscription-based users to make full use of the Data viewer
to view locally in the Studio the data stored in MapR. For further information about how to set this
argument, see the section describing how to view data of Talend Open Studio for Big Data Getting Started
Guide.
For further information about how to install a Hadoop distribution, see the manuals corresponding to the
Hadoop distribution you are using.
Limitation
Related scenario
No scenario is available for this component yet.
tPigDistinct
tPigDistinct Properties
Component family
Function
Purpose
Basic settings
Advanced settings
Increase parallelism
Select this check box to set the number of reduce tasks for the MapReduce Jobs.
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at the Job level as well as at each component level.
Usage
This component is commonly used as intermediate step together with input component and output component.
This component will not maintain the original order in the input file.
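As a sketch, the behavior corresponds to the Pig Latin DISTINCT operator; the relation names below are illustrative:
unique_data = DISTINCT raw_data;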
Prerequisites
The Hadoop distribution must be properly installed, so as to guarantee the interaction with Talend
Studio. The following list presents MapR related information for example.
Ensure that you have installed the MapR client in the machine where the Studio is, and added
the MapR client library to the PATH variable of that machine. For Windows, this library
is lib\MapRClient.dll in the MapR client jar file; without adding it, you may encounter the
following error: no MapRClient in java.library.path.
Set the -Djava.library.path argument. This argument provides to the Studio the path to
the native library of that MapR client. This allows the subscription-based users to make full
use of the Data viewer to view locally in the Studio the data stored in MapR. For further
information about how to set this argument, see the section describing how to view data of
Talend Open Studio for Big Data Getting Started Guide.
For further information about how to install a Hadoop distribution, see the manuals
corresponding to the Hadoop distribution you are using.
Limitation
Related scenario
For more information regarding the tPigDistinct component in use, see the section Scenario: Filtering rows of data based on a condition and saving the result to a local file of tPigFilterRow.
tPigFilterColumns
tPigFilterColumns Properties
Component family
Function
This component allows you to select one or more columns from a relation based on a defined condition.
Purpose
The tPigFilterColumns component selects data or filters out data from a relation based on defined filter conditions.
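In Pig Latin terms, selecting columns corresponds to a FOREACH ... GENERATE projection; a minimal sketch with illustrative relation and column names:
projected = FOREACH customers GENERATE name, country;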
Basic settings
Advanced settings
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at the
Job level as well as at each component level.
Usage
This component is commonly used as intermediate step together with input component and
output component.
Prerequisites
The Hadoop distribution must be properly installed, so as to guarantee the interaction with Talend
Studio. The following list presents MapR related information for example.
Ensure that you have installed the MapR client in the machine where the Studio is, and added
the MapR client library to the PATH variable of that machine. For Windows, this library
is lib\MapRClient.dll in the MapR client jar file; without adding it, you may encounter the
following error: no MapRClient in java.library.path.
Set the -Djava.library.path argument. This argument provides to the Studio the path to
the native library of that MapR client. This allows the subscription-based users to make full
use of the Data viewer to view locally in the Studio the data stored in MapR. For further
information about how to set this argument, see the section describing how to view data of
Talend Open Studio for Big Data Getting Started Guide.
For further information about how to install a Hadoop distribution, see the manuals
corresponding to the Hadoop distribution you are using.
Limitation
Related Scenario
For a tPigFilterColumns related scenario, see section Scenario: Joining two files based on an exact match and
saving the result to a local file of tPigJoin.
tPigFilterRow
tPigFilterRow Properties
Component family
Function
The tPigFilterRow component filters the input flow in a Pig process based on conditions set
on given column(s).
Purpose
This component is used to filter the input flow in a Pig process based on conditions set on one
or more columns.
Basic settings
Filter configuration
Click the Add button beneath the Filter configuration table to set
one or more filter conditions.
Note that when the column to be used by a condition is of the string type, the text to be entered in the Value column must be surrounded by both single and double quotation marks (for example, "'California'"), because the double quotation marks are required by Talend's code generator and the single quotation marks are required by Pig's grammar.
This table disappears if you select Use advanced filter.
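For example, to keep only the rows whose State column reads California, you would enter "'California'" in the Value column; the generated Pig Latin then contains the single-quoted literal, roughly as follows (a sketch with assumed relation and column names):
filtered = FILTER row1 BY State == 'California';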
Advanced settings
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at the
Job level as well as at each component level.
Usage
Prerequisites
The Hadoop distribution must be properly installed, so as to guarantee the interaction with Talend
Studio. The following list presents MapR related information for example.
Ensure that you have installed the MapR client in the machine where the Studio is, and added
the MapR client library to the PATH variable of that machine. For Windows, this library
is lib\MapRClient.dll in the MapR client jar file; without adding it, you may encounter the
following error: no MapRClient in java.library.path.
Set the -Djava.library.path argument. This argument provides to the Studio the path to
the native library of that MapR client. This allows the subscription-based users to make full
use of the Data viewer to view locally in the Studio the data stored in MapR. For further
information about how to set this argument, see the section describing how to view data of
Talend Open Studio for Big Data Getting Started Guide.
For further information about how to install a Hadoop distribution, see the manuals
corresponding to the Hadoop distribution you are using.
Limitation
Scenario: Filtering rows of data based on a condition and saving the result to a local file
Drop the following components from the Palette to the design workspace: tPigLoad, tPigDistinct,
tPigFilterRow, and tPigStoreResult.
2.
Right-click tPigLoad, select Row > Pig Combine from the contextual menu, and click tPigDistinct to link
these two components.
3.
Repeat this operation to link tPigDistinct to tPigFilterRow, and tPigFilterRow to tPigStoreResult using
Row > Pig Combine connections to form a Pig process.
2.
Click the [...] button next to Edit schema to open the [Schema] dialog box.
3.
Click the [+] button to add three columns according to the data structure of the input file: Name (string),
Country (string) and Age (integer), and then click OK to save the setting and close the dialog box.
4.
5.
Fill in the Input file URI field with the full path to the input file.
6.
Select PigStorage from the Load function list, and leave the rest of the settings as they are.
7.
Double-click tPigDistinct to open its Basic settings view, and click Sync columns to make sure that the
input schema structure is correctly propagated from the preceding component.
This component will remove any duplicates from the data flow.
2.
Click Sync columns to make sure that the input schema structure is correctly propagated from the preceding
component.
3.
Select Use advanced filter and fill in the Filter field with the filter expression:
"Country matches 'PuertoRico'"
This filter expression selects the rows of data that contain "PuertoRico" in the Country column.
2.
Click Sync columns to make sure that the input schema structure is correctly propagated from the preceding
component.
3.
Fill in the Result file field with the full path to the result file.
4.
If the target file already exists, select the Remove result directory if exists check box.
5.
Select PigStorage from the Store function list, and leave the rest of the settings as they are.
2.
Press F6 or click the Run button on the Run tab to run the Job.
The result file contains the information of customers from the specified country.
tPigJoin
tPigJoin Properties
Component family
Function
This component allows you to perform a join of two files based on join keys.
Purpose
The tPigJoin component is used to perform inner joins and outer joins of two files based on join keys to create
data that will be used by Pig.
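In Pig Latin terms, the join set up in this component corresponds to a JOIN statement; a minimal sketch with illustrative aliases, showing an inner join and a left outer join that uses the replicated optimization:
inner_joined = JOIN employees BY groupId, groups BY groupId_ref;
left_joined = JOIN employees BY groupId LEFT OUTER, groups BY groupId_ref USING 'replicated';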
Basic settings
Schema and Edit schema A schema is a row description. It defines the number of fields to be processed and
passed on to the next component.
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Click Edit Schema to make changes to the schema.
To make this component work, two schemas must be set: the schema of the
main flow and the schema of the lookup flow. In the output part of the main
schema, the columns of the main input file must be manually concatenated
with those of the lookup file.
Built-in: The schema will be created and stored locally for this component only.
Related topic: see Talend Studio User Guide.
Reference file
Schema and Edit schema A schema is a row description. It defines the number of fields to be processed and
passed on to the next component.
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Click Edit Schema to make changes to the schema.
To make this component work, two schemas must be set: the schema of the
main flow and the schema of the lookup flow. In the output part of the main
schema, the columns of the main input file must be manually concatenated
with those of the lookup file.
Built-in: The schema will be created and stored locally for this component only.
Related topic: see Talend Studio User Guide.
Filename
Field Separator
Enter character, string or regular expression to separate fields for the transferred data.
Join key
Click the plus button to add lines to set the Join key for Input file and Lookup file.
Join mode
Advanced settings
Select this check box to optimize the performance of joins using REPLICATED, SKEWED, or MERGE joins. For further information about optimized joins, see:
http://pig.apache.org/docs/r0.8.1/piglatin_ref1.html#Specialized+Joins
Use partitioner
Select this check box to specify the Hadoop Partitioner that controls the partitioning
of the keys of the intermediate map-outputs. For further information about the usage
of Hadoop Partitioner, see:
http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapred/
Partitioner.html
Increase parallelism
Select this check box to set the number of reduce tasks for the MapReduce Jobs
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at the Job level as well as
at each component level.
Usage
This component is commonly used as intermediate step together with input component and output component.
Prerequisites
The Hadoop distribution must be properly installed, so as to guarantee the interaction with Talend Studio. The
following list presents MapR related information for example.
Ensure that you have installed the MapR client in the machine where the Studio is, and added the MapR
client library to the PATH variable of that machine. For Windows, this library is lib\MapRClient.dll in
the MapR client jar file; without adding it, you may encounter the following error: no MapRClient in
java.library.path.
Set the -Djava.library.path argument. This argument provides to the Studio the path to the native library
of that MapR client. This allows the subscription-based users to make full use of the Data viewer to view
locally in the Studio the data stored in MapR. For further information about how to set this argument, see the
section describing how to view data of Talend Open Studio for Big Data Getting Started Guide.
For further information about how to install a Hadoop distribution, see the manuals corresponding to the Hadoop
distribution you are using.
Limitation
Scenario: Joining two files based on an exact match and saving the result to a local file
The reference file contains only the information of group IDs and group names:
1;group_A
2;group_B
Drop the following components from the Palette to the design workspace: tPigLoad, tPigJoin,
tPigFilterColumns, and tPigStoreResult.
2.
Connect these components in a series using Row > Pig Combine connections.
2.
Click the [...] button next to Edit schema to open the [Schema] dialog box.
3.
Click the [+] button to add columns, name them and define the column types according to the structure of
the input file. In this example, the input schema has five columns: id (integer), firstName (string), lastName
(string), groupId (integer), and salary (double).
Then click OK to validate the setting and close the dialog box.
4.
5.
6.
Fill in the Input file URI field with the full path to the input file, and leave the rest of the settings as they are.
2.
Click the [...] for the main schema to open the [Schema] dialog box.
3.
Check that input schema is correctly retrieved from the preceding component. If needed, click the [->>]
button to copy all the columns of the input schema to the output schema.
4.
Click the [+] button under the output panel to add new columns according to the data structure of the reference
file, groupId_ref (integer) and groupName (string) in this example. Then click OK to close the dialog box.
5.
Click the [...] for the schema lookup flow to open the [Schema] dialog box.
6.
Click the [+] button under the output panel to add two columns: groupId_ref (integer) and groupName (string),
and then click OK to close the dialog box.
7.
In the Filename field, specify the full path to the reference file.
8.
Click the [+] button under the Join key table to add a new line, and select groupId and groupId_ref
respectively from the Input and Lookup lists to match data from the main input flow with data from the
lookup flow based on the group ID.
9.
2.
Click the [...] button next to Edit schema to open the [Schema] dialog box.
3.
From the input schema, select the columns you want to include in your result file by clicking them one after
another while pressing the Shift key, and click the [->] button to copy them to the output schema. Then, click
OK to validate the schema setting and close the dialog box.
In this example, we want the result file to include all the information except the group IDs.
4.
5.
Click Sync columns to retrieve the schema structure from the preceding component.
6.
Fill in the Result file field with the full path to the result file, and select the Remove result file directory
if exists check box.
7.
Select PigStorage from the Store function list, and leave the rest of the settings as they are.
2.
tPigLoad
tPigLoad Properties
Component family
Function
This component allows you to set up a connection to the data source for a current transaction.
Purpose
The tPigLoad component loads original input data to an output stream in just one single transaction, once the
data has been validated.
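Under the hood, this corresponds to a Pig Latin LOAD statement; a minimal sketch using the PigStorage load function, in which the path, separator and schema are illustrative:
raw_data = LOAD '/user/ychen/raw/NameState.csv' USING PigStorage(';') AS (name:chararray, state:chararray);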
Basic settings
Property type
Schema and Edit A schema is a row description. It defines the number of fields to be processed and passed
Schema
on to the next component.
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Click Edit Schema to make changes to the schema.
Built-in: You create and store the schema locally for this component only. Related topic:
see Talend Studio User Guide.
Local
Click this radio button to run Pig scripts in Local mode. In this mode, all files are installed
and run from your local host and file system.
Map/Reduce
button
1. Select Import from existing version to import jar files from a given Hadoop
distribution and then manually add other jar files which that Hadoop distribution does
not provide.
2. Select Import from zip to import jar files from a zip file which, for example, contains
all required jar files set up in another Studio and is exported from that Studio.
In this dialog box, the active check box must be kept selected so as to import
the jar files pertinent to the connection to be created between the custom
distribution and this component.
For a step-by-step example about how to connect to a custom Hadoop distribution and share this connection, see section Connecting to a custom Hadoop distribution.
Note that if you use Hortonworks Data Platform V2.0.0, the type of the operating
system for running the distribution and a Talend Job must be the same, such as Windows
or Linux.
Use Kerberos authentication:
If you are accessing the Hadoop cluster running with Kerberos security, select this check
box, then, enter the Kerberos principal name for the NameNode in the field displayed.
This enables you to use your user name to authenticate against the credentials stored in
Kerberos.
In addition, as this component needs the JobTracker to perform Map/Reduce
computations, you have to enter your JobTracker principal in the corresponding field.
This check box is available depending on the Hadoop distribution you are connecting to.
Use a keytab to authenticate:
Select the Use a keytab to authenticate check box to log into a Kerberos-enabled
Hadoop system using a given keytab file. A keytab file contains pairs of Kerberos
principals and encrypted keys. You need to enter the principal to be used in the Principal
field and the access path to the keytab file itself in the Keytab field.
Note that the user that executes a keytab-enabled Job is not necessarily the one a principal
designates but must have the right to read the keytab file being used. For example, the
user name you are using to execute a Job is user1 and the principal to be used is guest; in
this situation, ensure that user1 has the right to read the keytab file to be used.
NameNode URI:
Type in the location of the NameNode corresponding to the Map/Reduce version to be
used.
JobTracker host:
Type in the location of the JobTracker corresponding to the Map/Reduce version to be
used.
In Jobtracker, you can easily find the execution status of your Pig Job because the
name of the Job is automatically created by concatenating the name of the project
that contains the Job, the name and version of the Job itself and the label of the first
tPigLoad component used in it. The naming convention of a Pig Job in Jobtracker is
ProjectName_JobNameVersion_FirstComponentName.
User name:
Enter the user name under which you want to execute the Job. Since a file or a directory in
Hadoop has its specific owner with appropriate read or write rights, this field allows you
to execute the Job directly under the user name that has the appropriate rights to access
the file or directory to be processed. Note that this field is available depending on the
distribution you are using.
Load function
SequenceFileLoader: Loads data of the SequenceFile formats. Then you need to complete
the configuration of the file to be loaded in the Sequence Loader Configuration area that
appears. This function is for the Map/Reduce mode only.
RCFilePigStorage: Loads data of the RCFile format. This function is for the Map/Reduce
mode only.
AvroStorage: Loads Avro files. For further information about AvroStorage, see
Apache's documentation on https://cwiki.apache.org/confluence/display/PIG/AvroStorage.
This function is for the Map/Reduce mode only.
Custom: Loads data using any user-defined load function. To do this, you need to register,
in the Advanced settings tab view, the jar file containing the function to be used, and then,
in the field displayed next to this Load function field, specify that function.
For example, after registering a jar file called piggybank.jar, you can enter
org.apache.pig.piggybank.storage.XMLLoader('attr') as (xml:chararray) to use the custom
function, XMLLoader contained in that jar file. For further information about this
piggybank.jar file, see https://cwiki.apache.org/PIG/piggybank.html.
Input file URI
Fill in this field with the full local path to the input file.
This field is not available when you select HCatLoader from the Load function
list.
HCatalog Configuration
Fill in the following fields to configure HCatalog managed tables on HDFS (Hadoop distributed file system):
Distribution and Version:
Select the product you are using as the Hadoop distribution from the drop-down list. The
options in the list vary depending on the component you are using. Among these options,
the Custom option allows you to connect to a custom Hadoop distribution rather than any
of the distributions given in this list and officially supported by Talend.
Note that if you use Hortonworks Data Platform V2.0.0, the type of the operating system
for running the distribution and a Talend Job must be the same, such as Windows or Linux.
HCat metastore: Enter the location of the HCatalog's metastore, which is actually Hive's
metastore, a system catalog. For further information about Hive and HCatalog, see http://
hive.apache.org/.
Database: The database in which tables are placed.
Table: The table in which data is stored.
Partition filter: Fill this field with the partition keys to list partitions by filter.
The HCatalog Configuration area is enabled only when you select HCatLoader from the Load function list. For further information about the usage of HCatalog, see http://incubator.apache.org/hcatalog/docs/. For further information about the usage of Partition filter, see https://cwiki.apache.org/confluence/display/HCATALOG/Design+Document+-+Java+APIs+for+HCatalog+DDL+Commands.
Field separator
Enter character, string or regular expression to separate fields for the transferred data.
This field is enabled only when you select PigStorage from the Load function
list.
Compression
Select the Force to compress the output data check box to compress the data when the
data is outputted by tPigStoreResult at the end of a Pig process.
Hadoop provides different compression formats that help reduce the space needed for
storing files and speed up data transfer. When you need to write and compress data using
the Pig program, by default you have to add a compression format as a suffix to the path
pointing to the folder in which you want to write data, for example, /user/ychen/out.bz2.
However, if you select this check box, the output data will be compressed even if you do
not add any compression format to that path, such as /user/ychen/out.
The output path is set in the Basic settings view of tPigStoreResult.
HBase configuration
This area is available to the HBaseStorage function. The parameters to be set are:
Zookeeper quorum:
Type in the name or the URL of the Zookeeper service you use to coordinate the transaction
between Talend and HBase.
Zookeeper client port:
Type in the number of the client listening port of the Zookeeper service you are using.
Table name:
Enter the name of the HBase table you need to load data from.
Load key:
Select this check box to load the row key as the first column of the result schema. In this
situation, you must have created this column in the schema.
Mapping:
Complete this table to map the columns of the HBase table to be used with the schema
columns you have defined for the data flow to be processed.
Sequence Loader configuration
This area is available only to the SequenceFileLoader function. Since a SequenceFile record consists of binary key/value pairs, the parameters to be set are:
Key column:
Select the Key column of a key/value record.
Value column:
Select the Value column of a key/value record.
Die on subjob error
This check box is cleared by default, meaning to skip the row on subjob error and to complete the process for error-free rows.
Talend Studio uses a default configuration for its engine to perform operations in a Hadoop
distribution. If you need to use a custom configuration in a specific situation, complete this
table with the property or properties to be customized. Then at runtime, the customized
property or properties will override those default ones.
For further information about the properties required by Hadoop and its related systems
such as HDFS and Hive, see Apache's Hadoop documentation on http://hadoop.apache.org,
or the documentation of the Hadoop distribution you need to use.
Register jar
Click the [+] button to add rows to the table and, from these rows, browse to the jar files to be added. For example, in order to register a jar file called piggybank.jar, click the [+] button once to add one row, then click this row to display the browse button, and click this button to browse to the piggybank.jar file following the [Select Module] wizard.
Pig properties
Talend Studio uses a default configuration for its Pig engine to perform operations. If
you need to use a custom configuration in a specific situation, complete this table with
the property or properties to be customized. Then at runtime, the customized property or
properties will override those default ones.
For example, the default_parallel key used in Pig could be set as 20.
HBaseStorage
configuration
Add and set more HBaseStorage loader options in this table. The options are:
gt: the minimum key value;
lt: the maximum key value;
gte: the minimum key value (included);
lte: the maximum key value (included);
limit: maximum number of rows to retrieve per region;
Define the jars to register for HCatalog
This check box appears when you are using HCatLoader, while you can leave it clear as the Studio registers the required jar files automatically. In case any jar file is missing, you can select this check box to display the Register jar for HCatalog table and set the correct path to that missing jar.
Path separator in server
Leave the default value of the Path separator in server as it is, unless you have changed the separator used by your Hadoop distribution's host machine for its PATH variable or, in other words, that separator is not a colon (:). In that situation, you must change this value to the one you are using in that host.
Mapred job map memory mb and Mapred job reduce memory mb
If the Hadoop distribution to be used is Hortonworks Data Platform V1.2 or Hortonworks Data Platform V1.3, you need to set proper memory allocations for the map and reduce computations to be performed by the Hadoop system.
In that situation, you need to enter the values you need in the Mapred job map memory mb and the Mapred job reduce memory mb fields, respectively. By default, the values are both 1000, which is normally appropriate for running the computations.
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at the Job level as well as at each component level.
Usage
This component is always used to start a Pig process and needs tPigStoreResult at the end to output its data.
In the Map/Reduce mode, you need only configure the Hadoop connection for the first tPigLoad component
of a Pig process (a subjob), and any other tPigLoad component used in this process reuses automatically that
connection created by that first tPigLoad component.
Prerequisites
The Hadoop distribution must be properly installed, so as to guarantee the interaction with Talend Studio. The
following list presents MapR related information for example.
Ensure that you have installed the MapR client in the machine where the Studio is, and added the MapR
client library to the PATH variable of that machine. For Windows, this library is lib\MapRClient.dll in
the MapR client jar file; without adding it, you may encounter the following error: no MapRClient in
java.library.path.
Set the -Djava.library.path argument. This argument provides to the Studio the path to the native library
of that MapR client. This allows the subscription-based users to make full use of the Data viewer to view
locally in the Studio the data stored in MapR. For further information about how to set this argument, see the
section describing how to view data of Talend Open Studio for Big Data Getting Started Guide.
For further information about how to install a Hadoop distribution, see the manuals corresponding to the Hadoop
distribution you are using.
Limitation
Knowledge of Pig scripts is required. If you select HCatLoader as the load function, knowledge of HCatalog DDL (HCatalog Data Definition Language, a subset of Hive Data Definition Language) is required. For further information about HCatalog DDL, see http://incubator.apache.org/hcatalog/docs.
The HBase table to be used has three columns: id, name and age, among which id and age belong to the column
family, family1 and name to the column family, family2.
The data stored in that HBase table are as follows:
1;Albert;23
2;Alexandre;24
3;Alfred-Hubert;22
4;André;40
5;Didier;28
6;Anthony;35
7;Artus;32
8;Benoît;56
9;Catherine;34
10;Charles;21
11;Christophe;36
12;Christian;67
13;Clément;64
14;Danniel;54
15;Elisabeth;58
16;Emile;32
17;Gregory;30
In the Integration perspective of Talend Studio, create an empty Job, named hbase_storage for example,
from the Job Designs node in the Repository tree view.
For further information about how to create a Job, see the Talend Studio User Guide.
2.
3.
Configuring tPigLoad
1.
2.
Click the
3.
Click the [+] button four times to add four rows and rename them: rowkey, id, name and age. The rowkey column is put at the top of the schema to store the HBase row key column, but in practice, if you do not need to load the row key column, you can create only the other three columns in your schema.
4.
Click OK to validate these changes and accept the propagation prompted by the pop-up dialog box.
5.
In the Mode area, select Map/Reduce, as we are using a remote Hadoop distribution.
6.
In the Distribution and the Version fields, select the Hadoop distribution you are using. In this example, we
are using HortonWorks Data Platform V1.
7.
In the Load function field, select HBaseStorage. Then, the corresponding parameters to set appear.
8.
In the NameNode URI and the JobTracker host fields, enter the locations of those services, respectively.
9.
In the Zookeeper quorum and the Zookeeper client port fields, enter the location information of the
Zookeeper service to be used.
10. In the Table name field, enter the name of the table from which tPigLoad reads the data.
11. Select the Load key check box if you need to load the HBase row key column. In this example, we select it.
12. In the Mapping table, four rows have been added automatically. In the Column family:qualifier column,
enter the HBase columns you need to map with the schema columns you defined. In this scenario, we put
family1:id for the id column, family2:name for the name column and family1:age for the age column.
Configuring tPigStoreResult
1.
2.
In the Result file field, enter the directory where you need to store the result. As tPigStoreResult reuses
automatically the connection created by tPigLoad, the path in this scenario is the directory in the machine
hosting the Hadoop distribution to be used.
3.
4.
In the Store function field, select PigStorage to store the result in the UTF-8 format.
If you need to obtain more details about the Job, it is recommended to use the web console of the Jobtracker
provided by the Hadoop distribution you are using.
In Jobtracker, you can easily find the execution status of your Pig Job because the name of the Job is automatically
created by concatenating the name of the project that contains the Job, the name and version of the Job itself
and the label of the first tPigLoad component used in it. The naming convention of a Pig Job in Jobtracker is
ProjectName_JobNameVersion_FirstComponentName.
tPigMap
tPigMap properties
Component family
Function
tPigMap is fine-tuned for transforming and routing the data in a Pig process. It provides a
graphic interface that enables sophisticated configuration of multiple data flows.
Purpose
tPigMap transforms and routes data from single or multiple sources to single or multiple
destinations.
Basic settings
Mapping
display as
Map editor
Usage
Possible uses are from a simple reorganization of fields to the most complex Jobs of data
multiplexing or demultiplexing transformation, concatenation, inversion, filtering and more,
in a Pig process.
Limitation
The use of tPigMap supposes minimum Java and Pig Latin knowledge in order to fully exploit
its functionalities.
This component is a junction step, and for this reason cannot be a start nor end component
in the Job.
Lookup properties
Value
Join Model
Inner Join;
Left Outer Join;
Right Outer Join;
Full Outer Join.
The default join option is Left Outer Join if you do not open this option settings panel to change it. These options perform the join of two or more flows based on common field values.
When more than one lookup tables need joining, the main input flow starts
the joining from the first lookup flow, then uses the result to join the second
and so on in the same manner until the last lookup flow is joined.
Join Optimization
None;
Replicated;
Skewed;
Merge.
The default join option is None if you do not open this option settings panel to change it. These options are used to perform more
efficient join operations. For example, if you are using the parallelism of
multiple reduce tasks, the Skewed join can be used to counteract the load
imbalance problem if the data to be processed is sufficiently skewed.
Each of these options is subject to the constraints explained in Apache's
documentation about Pig Latin.
Custom Partitioner
Enter the Hadoop partitioner you need to use to control the partitioning of
the keys of the intermediate map-outputs. For example, enter, in double
quotation marks,
org.apache.pig.test.utils.SimpleCustomPartitioner
Increase Parallelism
Enter the number of reduce tasks. For further information about the parallel features, see Apache's documentation about Pig Latin.
Output properties
Value
Catch Output Reject
True;
False.
This option, once activated, allows you to catch the records rejected by a filter you can define in the appropriate area.
Catch Lookup Inner Join Reject
True;
False.
This option, once activated, allows you to catch the records rejected by the inner join operation performed on the input flows.
The Hadoop distribution to be used stores data about the traffic situation, such as normal or jam, and data about traffic-related events, such as road work, rain or even no event. In this example, the data to be used reads as follows:
1. The traffic situation data stored in the directory /user/ychen/tpigmap/date&traffic:
2013-01-11 00:27:53;Bayshore Freeway;jam
2013-02-28 07:01:18;Carpinteria Avenue;jam
2013-01-26 11:27:59;Bayshore Freeway;normal
2013-03-07 20:48:51;South Highway;jam
2013-02-07 07:40:10;Lindbergh Blvd;normal
2013-01-22 17:13:55;Pacific Hwy S;normal
2013-03-17 23:12:26;Carpinteria Avenue;normal
2013-01-15 08:06:53;San Diego Freeway;jam
2013-03-19 15:18:28;Monroe Street;jam
2013-01-20 05:53:12;Newbury Road;normal
For any given moment shown in the timestamps in the data, one row is logged to reflect the traffic situation and
another row to reflect the traffic-related event. You need to join the data into one table in order to easily detect
how the events on a given road are impacting the traffic.
The data used in this example is a sample with limited size.
To replicate this scenario, ensure that the Studio to be used has the appropriate right to read and write data in that
Hadoop distribution and then proceed as follows:
In the Integration perspective of Talend Studio, create an empty Job, named pigweather for example, from
the Job Designs node in the Repository tree view.
For further information about how to create a Job, see the Talend Studio User Guide.
2.
Drop two tPigLoad components, tPigMap and two tPigStoreResult onto the workspace.
The components can be labelled if needs be. In this scenario, we label the two tPigLoad components as traffic
and event, respectively, which load accordingly the traffic data and the related event data. Then we label the
two tPigStoreResult components as normal and jam, respectively, which write accordingly the results to
the Hadoop distribution to be used. For further information about how to label a component, see the Talend
Studio User Guide.
3.
Right-click the tPigLoad component labeled traffic to connect it to tPigMap using the Row > Pig combine
link from the contextual menu.
4.
Repeat this operation to link the tPigLoad component labeled event to tPigMap, too. As this is the second
link created, it becomes automatically the lookup link.
5.
Use the Row > Pig combine link again to connect tPigMap to each of the two tPigStoreResult components.
You need to name these links in the dialog box that pops up once you select the link type from the contextual menu. In this scenario, we name the link to the tPigStoreResult labeled normal as out1 and the link to the tPigStoreResult labeled jam as reject.
Configuring tPigLoad
Loading the traffic data
1.
2.
Click the
3.
Click the [+] button three times to add three rows and, in the Column column, rename them as date, street and traffic, respectively.
4.
5.
In the Mode area, select the Map/Reduce option, as we need the Studio to connect to a remote Hadoop
distribution.
6.
In the Distribution list and the Version field, select the Hadoop distribution to be used. In this example, it
is Hortonworks Data Platform V1.0.0.
7.
In the Load function list, select the PigStorage function to read the source data, as the data is a structured
file in human-readable UTF-8 format.
8.
In the NameNode URI and the JobTracker host fields, enter the locations of the master node and the Job
tracker service of the Hadoop distribution to be used, respectively.
9.
In the Input file URI field, enter the directory where the data about the traffic situation is stored. As explained
earlier, the directory in this example is /user/ychen/tpigmap/date&traffic.
10. In the Field separator field, enter ; depending on the separator used by the source data.
2.
Click the
3.
Click the [+] button three times to add three rows and, in the Column column, rename them as date, street and event, respectively.
4.
5.
As you have configured the connection to the given Hadoop distribution in that first tPigLoad component,
traffic, this event component reuses that connection and therefore, the corresponding options in the
Distribution and the Version lists have been automatically selected.
6.
In the Load function field, select the PigStorage function to read the source data.
7.
In the Input file URI field, enter the directory where the event data is stored. As explained earlier, the
directory in this example is "/user/ychen/tpigmap/date&event".
Configuring tPigMap
On the input side (left side) of the Map Editor, each of the two tables represents one of the input flows, the upper one for the main flow and the lower one for the lookup flow.
On the output side (right side), the two tables represent the output flows that you named as out1 and reject earlier.
From the main flow table, drop its three columns onto each of the output flow tables.
2.
From the lookup flow, drop the event column onto each of the output flow tables.
Then, in the Schema editor view, you can see that the schemas of both sides have been completed; click each table to display its schema in this view.
In the Join Model row, select Left Outer Join to ensure that every record of the main flow is included in this join.
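Conceptually, this setting is equivalent to a Pig Latin left outer join along the following lines; this is a sketch only, and the join keys (assumed here to be date and street, the columns common to both flows) are actually defined by the expressions you map in the editor:
joined = JOIN traffic BY (date, street) LEFT OUTER, event BY (date, street);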
3.
On the out1 output flow table, click the
4.
Enter
'normal'== row1.traffic
This allows tPigMap to output only the traffic records reading normal in the out1 flow.
5.
6.
In the Catch Output Reject row, select true to output the traffic records reading jam in the reject flow.
7.
Click Apply, then click OK to validate these changes and accept the propagation prompted by the pop-up
dialog box.
Configuring tPigStoreResult
1.
2.
In the Result file field, enter the directory you need to write the result in. In this scenario, it is /user/ychen/
tpigmap/traffic_normal, which receives the records reading normal.
3.
4.
In the Store function list, select PigStorage to write the records in human-readable UTF-8 format.
5.
6.
Repeat the same operations to configure the tPigStoreResult labeled jam, but set the directory, in the Result
file field, as /user/ychen/tpigmap/traffic_jam.
If either of the components does not retrieve its schema from tPigMap, a warning icon appears. In this case, click the Sync
columns button to retrieve the schema from the preceding one and once done, the warning icon disappears.
From the traffic_jam records, you can analyze what event is often going on in the meantime of a traffic jam and
from the traffic_normal records, how the smooth traffic situation is maintained.
If you need to obtain more details about the Job, it is recommended to use the web console of the Jobtracker
provided by the Hadoop distribution you are using.
In Jobtracker, you can easily find the execution status of your Pig Job because the name of the Job is automatically
created by concatenating the name of the project that contains the Job, the name and version of the Job itself
and the label of the first tPigLoad component used in it. The naming convention of a Pig Job in Jobtracker is
ProjectName_JobNameVersion_FirstComponentName.
tPigReplicate
tPigReplicate Properties
Component family
Function
tPigReplicate is used after an input Pig component; it duplicates the incoming schema into as many identical output flows as needed.
Purpose
This component allows you to perform different operations on the same schema.
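Conceptually, replication simply lets several downstream Pig statements consume the same relation; a sketch with illustrative aliases, mirroring the two sorts performed in the scenario below:
by_name = ORDER input_data BY Name ASC;
by_state = ORDER input_data BY State ASC;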
Basic settings
Schema and Edit Schema A schema is a row description. It defines the number of fields to be
processed and passed on to the next component.
If you are using Talend Open Studio for Big Data, only the Built-in
mode is available.
Click Edit Schema to make changes to the schema.
Click Sync columns to retrieve the schema from the previous
component connected in the Job.
Built-in: You create and store the schema locally for this component
only. Related topic: see Talend Studio User Guide.
Advanced settings
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at the Job
level as well as at each component level.
Usage
This component is not startable (green background); it requires tPigLoad as the input component
and expects other Pig components to handle its output flow(s).
Prerequisites
The Hadoop distribution must be properly installed, so as to guarantee the interaction with Talend
Studio. The following list presents MapR related information for example.
Ensure that you have installed the MapR client in the machine where the Studio is, and added
the MapR client library to the PATH variable of that machine. For Windows, this library is lib
\MapRClient.dll in the MapR client jar file; without adding it, you may encounter the following
error: no MapRClient in java.library.path.
Set the -Djava.library.path argument. This argument provides to the Studio the path to
the native library of that MapR client. This allows the subscription-based users to make full
use of the Data viewer to view locally in the Studio the data stored in MapR. For further
information about how to set this argument, see the section describing how to view data of
Talend Open Studio for Big Data Getting Started Guide.
For further information about how to install a Hadoop distribution, see the manuals corresponding
to the Hadoop distribution you are using.
Connections
Before starting to replicate this Job, ensure that you have the appropriate right to read and write data in the Hadoop
distribution to be used and that Pig is properly installed in that distribution.
In the Integration perspective of Talend Studio, create an empty Job, named Replicate for example, from
the Job Designs node in the Repository tree view.
For further information about how to create a Job, see the Talend Studio User Guide.
2.
Drop tPigLoad, tPigReplicate, two tPigSort and two tPigStoreResult onto the workspace.
The tPigLoad component reads data from the given HDFS system. The sample data used in this scenario
reads as follows:
Andrew Kennedy;Mississippi
Benjamin Carter;Louisiana
Benjamin Monroe;West Virginia
Bill Harrison;Tennessee
Calvin Grant;Virginia
Chester Harrison;Rhode Island
Chester Hoover;Kansas
Chester Kennedy;Maryland
Chester Polk;Indiana
Dwight Nixon;Nevada
Dwight Roosevelt;Mississippi
Franklin Grant;Nebraska
Configuring tPigLoad
1.
2.
Click the
Click the [+] button twice to add two rows and name them Name and State, respectively.
3.
4.
Click OK to validate these changes and accept the propagation prompted by the pop-up dialog box.
5.
In the Mode area, select Map/Reduce because the Hadoop to be used in this scenario is installed in a remote
machine. Once selecting it, the parameters to be set appear.
6.
In the Distribution and the Version lists, select the Hadoop distribution to be used.
7.
8.
In the NameNode URI field and the JobTracker host field, enter the locations of the NameNode and the
JobTracker to be used for Map/Reduce, respectively.
9.
In the Input file URI field, enter the location of the data to be read from HDFS. In this example, the location
is /user/ychen/raw/NameState.csv.
Configuring tPigReplicate
1.
2.
Click the [...] button next to Edit schema to open the schema editor to verify whether its schema is identical with that of its preceding component.
If this component does not have the same schema of the preceding component, a warning icon appears. In this case,
click the Sync columns button to retrieve the schema from the preceding one and once done, the warning icon
disappears.
Configuring tPigSort
Two tPigSort components are used to sort the two identical output flows: one based on the Name column and
the other on the State column.
1.
Double-click the first tPigSort component to open its Component view to define the sorting by name.
2.
In the Sort key table, add one row by clicking the [+] button.
3.
In the Column column, select Name from the drop-down list and select ASC in the Order column.
4.
Double-click the other tPigSort to open its Component view to define the sorting by state.
5.
In the Sort key table, add one row, then select State from the drop-down list in the Column column and select ASC in the Order column.
Configuring tPigStoreResult
Two tPigStoreResult components are used to write each of the sorted data into HDFS.
1.
Double-click the first tPigStoreResult component to open its Component view to write the data sorted by name.
2.
In the Result file field, enter the directory where the data will be written. This directory will be created if it
does not exist. In this scenario, we put /user/ychen/sort/tPigreplicate/byName.csv.
3.
4.
5.
6.
Do the same for the other tPigStoreResult component but set another directory for the data sorted by state.
In this scenario, it is /user/ychen/sort/tPigreplicate/byState.csv.
Once done, browse to the locations where the results were written in HDFS.
The following image presents the results sorted by name:
If you need to obtain more details about the Job, it is recommended to use the web console of the Jobtracker
provided by the Hadoop distribution you are using.
In the Jobtracker web console, you can easily find the execution status of your Pig Job because the name of the Job is automatically created by concatenating the name of the project that contains the Job, the name and version of the Job itself, and the label of the first tPigLoad component used in it. The naming convention of a Pig Job in the Jobtracker is ProjectName_JobNameVersion_FirstComponentName; for example, a Job named Replicate (version 0.1) in a project named DEMO whose first tPigLoad component is labeled tPigLoad_1 would appear as DEMO_Replicate0.1_tPigLoad_1.
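For reference, the flow this Job builds (load, replicate, sort twice, store twice) corresponds roughly to the Pig Latin shown in the sketch below, submitted here through Pig's Java API. This is a hedged sketch and not the code the Studio actually generates; the NameNode and JobTracker values are placeholders, while the HDFS paths and the semicolon delimiter come from the scenario above.

import java.util.Properties;
import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class ReplicateSortSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.setProperty("fs.default.name", "hdfs://namenode-host:8020");  // placeholder NameNode URI
        props.setProperty("mapred.job.tracker", "jobtracker-host:8021");    // placeholder JobTracker host
        PigServer pig = new PigServer(ExecType.MAPREDUCE, props);

        // tPigLoad: read the semicolon-separated Name;State records from HDFS
        pig.registerQuery("raw = LOAD '/user/ychen/raw/NameState.csv' USING PigStorage(';') "
                + "AS (Name:chararray, State:chararray);");
        // tPigReplicate plus the two tPigSort components: two orderings of the same relation
        pig.registerQuery("byName = ORDER raw BY Name ASC;");
        pig.registerQuery("byState = ORDER raw BY State ASC;");
        // the two tPigStoreResult components: write each sorted flow back to HDFS
        pig.store("byName", "/user/ychen/sort/tPigreplicate/byName.csv", "PigStorage(';')");
        pig.store("byState", "/user/ychen/sort/tPigreplicate/byState.csv", "PigStorage(';')");
    }
}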
tPigSort
tPigSort Properties
Component family
Function
This component allows you to sort a relation based on one or more defined sort keys.
Purpose
The tPigSort component is used to sort a relation based on one or more defined sort keys.
Basic settings
Sort key
Click the Add button beneath the Sort key table to add one or more lines to specify the column and sorting order for each sort key.
Advanced settings
Increase parallelism
Select this check box to set the number of reduce tasks for the MapReduce Jobs.
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at the
Job level as well as at each component level.
Usage
This component is commonly used as an intermediate step, together with an input component and an output component.
Prerequisites
The Hadoop distribution must be properly installed, so as to guarantee the interaction with Talend
Studio. The following list presents MapR related information for example.
Ensure that you have installed the MapR client in the machine where the Studio is, and added
the MapR client library to the PATH variable of that machine. For Windows, this library
is lib\MapRClient.dll in the MapR client jar file; without adding it, you may encounter the
following error: no MapRClient in java.library.path.
Set the -Djava.library.path argument. This argument provides to the Studio the path to
the native library of that MapR client. This allows the subscription-based users to make full
use of the Data viewer to view locally in the Studio the data stored in MapR. For further
information about how to set this argument, see the section describing how to view data of
Talend Open Studio for Big Data Getting Started Guide.
For further information about how to install a Hadoop distribution, see the manuals
corresponding to the Hadoop distribution you are using.
Limitation
Scenario: Sorting data in ascending order
1. Drop the following components from the Palette to the design workspace: tPigSort, tPigLoad, and tPigStoreResult.
2.
3.
2. Click the [...] button next to Edit schema to add columns for tPigLoad.
3. Click the [+] button to add Name, Country and Age and click OK to save the setting.
4.
5. Fill in the Input filename field with the full path to the input file.
In this scenario, the input file is CustomerList, which contains rows of names, country names and ages.
6.
7.
2.
Click Sync columns to retrieve the schema structure from the preceding component.
3.
Click the [+] button beneath the Sort key table to add a new sort key. Select Age from the Column list and
select ASC from the Order list.
This sort key will sort the data in CustomerList in ascending order based on Age.
2.
Click Sync columns to retrieve the schema structure from the preceding component.
3.
4.
Fill in the Result file field with the full path to the result file.
In this scenario, the sorted data is saved in the Lucky_Customer file.
5.
6.
The Lucky_Customer file is generated containing the data in ascending order based on Age.
tPigStoreResult
tPigStoreResult Properties
Component family
Function
This component allows you to store the result of your Pig Job into a defined data storage space.
Purpose
The tPigStoreResult component is used to store the result into a defined data storage space.
Basic settings
Property type
Schema and Edit Schema
A schema is a row description. It defines the number of fields to be processed and passed on to the next component.
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Click Edit Schema to make changes to the schema.
Built-in: You create and store the schema locally for this component only. Related
topic: see Talend Studio User Guide.
Remove result directory if exists
Select this check box to remove an existing result directory.
This check box is disabled when you select HCatStorer from the Store function list.
Store function
Select the store function to be used to store the data and, if required, specify that function in the field displayed next to this Store function field.
HCatalog Configuration
Fill in the following fields to configure HCatalog managed tables on HDFS (Hadoop Distributed File System):
Distribution and Version:
Select the Hadoop distribution to which you have defined the connection in the
tPigLoad component, used in the same Pig process of the active tPigStoreResult.
If that tPigLoad component connects to a custom Hadoop distribution, you must select
Custom for this tPigStoreResult component, too. Then the Custom jar table appears,
in which, you need to add only the jar files required by the selected Store function.
HCat metastore: Enter the location of the HCatalog's metastore, which is actually
Hive's metastore.
Database: The database in which tables are placed.
Table: The table in which data is stored.
Partition filter: Fill this field with the partition keys to list partitions by filter.
The HCatalog Configuration area is enabled only when you select HCatStorer from the Store function list. For further information about the usage of HCatalog, see http://incubator.apache.org/hcatalog/docs. For further information about the usage of Partition filter, see https://cwiki.apache.org/confluence/display/HCATALOG/Design+Document+-+Java+APIs+for+HCatalog+DDL+Commands.
HBase configuration
This area is available to the HBaseStorage function. The parameters to be set are:
Distribution and Version:
Select the Hadoop distribution to which you have defined the connection in the
tPigLoad component, used in the same Pig process of the active tPigStoreResult.
If that tPigLoad component connects to a custom Hadoop distribution, you must select
Custom for this tPigStoreResult component, too. Then the Custom jar table appears,
in which, you need to add only the jar files required by the selected Store function.
Zookeeper quorum:
Type in the name or the URL of the Zookeeper service you use to coordinate the
transaction between Talend and HBase.
Zookeeper client port:
Type in the number of the client listening port of the Zookeeper service you are using.
Table name:
Enter the name of the HBase table you need to store data in. The table must exist in
the target HBase.
Row key column:
Select the column used as the row key column of the HBase table.
Store row key column to HBase column:
Select this check box to make the row key column an HBase column belonging to a
specific column family.
Mapping:
Complete this table to map the columns of the HBase table to be used with the schema
columns you have defined for the data flow to be processed.
The Column column of this table is automatically filled once you have defined the schema; the syntax of the Column family:qualifier column requires each HBase column name (qualifier) to be paired with its corresponding family name. For example, in an HBase table, if a Paris column belongs to a France family, then you need to write it as France:Paris (see the sketch at the end of this section).
Field separator
Enter character, string or regular expression to separate fields for the transferred data.
This field is enabled only when you select PigStorage from the Store
function list.
Sequence Storage configuration
This area is available only to the SequenceFileStorage function. Since a SequenceFile record consists of binary key/value pairs, the parameters to be set are:
Key column:
Select the Key column of a key/value record.
Value column
Select the Value column of a key/value record.
Advanced settings
Register jar
Click the [+] button to add rows to the table and, from these rows, browse to the jar files to be added. For example, in order to register a jar file called piggybank.jar, click the [+] button once to add one row, then click this row to display the [...] browse button, and click this button to browse to the piggybank.jar file following the [Select Module] wizard.
HBaseStorage configuration
Add and set more HBaseStorage storer options in this table. The options are:
loadKey: enter true to store the row key as the first column of the result schema; otherwise, enter false;
gt: the minimum key value;
lt: the maximum key value;
gte: the minimum key value (included);
lte: the maximum key value (included);
limit: maximum number of rows to retrieve per region;
caching: number of rows to cache;
caster: the converter to use for writing values to HBase. For example,
Utf8StorageConverter.
HCatalog Configuration
Define the jars to register
This check box appears when you are using HCatStorer. By default, you can leave it clear, as the Studio registers the required jar files automatically. In case any jar file is missing, you can select this check box to display the Register jar for HCatalog table and set the correct path to that missing jar.
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at the Job level as well as at each component level.
Usage
This component is always used to end a Pig process and needs tPigLoad at the beginning of that chain to provide data.
This component automatically reuses the connection created by the tPigLoad component in that Pig process.
Note that if you use Hortonworks Data Platform V2.0.0, the type of the operating system for running the distribution and a Talend Job must be the same, such as Windows or Linux.
Prerequisites
The Hadoop distribution must be properly installed, so as to guarantee the interaction with Talend Studio. The
following list presents MapR related information for example.
Ensure that you have installed the MapR client in the machine where the Studio is, and added the MapR
client library to the PATH variable of that machine. For Windows, this library is lib\MapRClient.dll in
the MapR client jar file; without adding it, you may encounter the following error: no MapRClient in
java.library.path.
Set the -Djava.library.path argument. This argument provides to the Studio the path to the native library
of that MapR client. This allows the subscription-based users to make full use of the Data viewer to view
locally in the Studio the data stored in MapR. For further information about how to set this argument, see the
section describing how to view data of Talend Open Studio for Big Data Getting Started Guide.
For further information about how to install a Hadoop distribution, see the manuals corresponding to the Hadoop
distribution you are using.
Limitation
Knowledge of Pig scripts is required. If you select HCatStorer as the store function, knowledge of HCatalog DDL (HCatalog Data Definition Language, a subset of Hive Data Definition Language) is required. For further information about HCatalog DDL, see http://incubator.apache.org/hcatalog/docs.
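As an illustration of the Column family:qualifier convention described in the HBase configuration area above (the France:Paris example), the following is a minimal sketch, independent of the Studio, of how a value is written to an HBase column with the plain HBase Java client. The Zookeeper host, table name, row key and value are placeholders.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class FamilyQualifierSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "zookeeper-host");       // Zookeeper quorum (placeholder)
        conf.set("hbase.zookeeper.property.clientPort", "2181");    // Zookeeper client port
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("customer"))) {  // placeholder table name
            // "France:Paris" in the Mapping table means column family "France", qualifier "Paris"
            Put put = new Put(Bytes.toBytes("row-1"));               // value of the row key column
            put.addColumn(Bytes.toBytes("France"), Bytes.toBytes("Paris"), Bytes.toBytes("some value"));
            table.put(put);
        }
    }
}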
Related Scenario
1. For a related scenario in which tPigStoreResult uses the Local mode, see section Scenario: Sorting data in ascending order of tPigSort.
2. For a related scenario in which tPigStoreResult uses the Map/Reduce mode, see section Scenario: Loading an HBase table.
tRiakBucketList
tRiakBucketList properties
Component Family
Function
Purpose
Basic settings
Use existing connection
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
If you select the Use an existing connection check box, the Nodes table will not be available.
Nodes
Advanced settings
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at the Job
level as well as at each component level.
Global Variables
CURRENT_BUCKET_NAME: indicates the current bucket name. This is a Flow variable and it returns
a string.
NB_LINE: Indicates the number of rows read by an input component or transferred to an output component.
This is an After variable and it returns an integer.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable means it
functions after the execution of a component.
Usage
Limitation
n/a
Related scenario
No scenario is available for this component yet.
tRiakClose
tRiakClose properties
Component family
Function
Purpose
Basic settings
Component List
Advanced settings
tStatCatcher Statistics
Usage
Limitation
n/a
Related Scenario
For a scenario in which tRiakClose is used, see section Scenario: Exporting data from a Riak bucket to a local file.
tRiakConnection
tRiakConnection properties
Component Family
Function
Purpose
Basic settings
Nodes
Advanced settings
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at the Job
level as well as at each component level.
Usage
This component is generally used with other Riak components, particularly tRiakClose.
Limitation
n/a
Related scenario
For a scenario in which tRiakConnection is used, see section Scenario: Exporting data from a Riak bucket to
a local file.
tRiakInput
tRiakInput properties
Component family
Function
tRiakInput reads data from a Riak bucket and sends the data into the Talend flow.
Purpose
tRiakInput allows you to extract the desired data from a bucket in a Riak node so as to store or apply
changes to the data.
Basic settings
Schema and Edit Schema
A schema is a row description. It defines the number of fields to be processed and passed on to the next component.
If you are using Talend Open Studio for Big Data, only the Built-in mode is
available.
Click Edit Schema to make changes to the schema.
Use existing connection Select this check box and in the Component List click the relevant connection
component to reuse the connection details you already defined.
If you select the Use an existing connection check box, the Nodes table
will not be available.
Nodes
Bucket
Type in the name of the bucket from which you want to read data.
Key
Type in the key which is associated with the data that you want to read.
Output key to column
Select this check box and from the list select the desired column to which the keys will be output.
Value columns
Advanced settings
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at the Job level as well
as at each component level.
Global Variables
NB_LINE: Indicates the number of rows read by an input component or transferred to an output component.
This is an After variable and it returns an integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose
the variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable means it
functions after the execution of a component.
Usage
Limitation
n/a
Scenario: Exporting data from a Riak bucket to a local file
Prerequisites: The Riak bucket from which you want to export data already exists. In this example, the data will be exported from the bucket computer, which already contains the following data:
id; company; brand; price; owner
001; Dell; Inspiron 15; 299; Amanda
002; Dell; Inspiron 15R; 549; Linda
003; HP; Pavilion 500-210qe; 539; Marina
004; HP; Pavilion 500-075; 599; Diana
Drop the following components from the Palette to the design workspace: tRiakConnection, tRiakInput,
tFileOutputDelimited, and tRiakClose.
2.
3.
4.
Double-click tRiakConnection to open its Basic settings view in the Component tab.
2.
In the Nodes table, enter the information of a Riak cluster you want to connect to.
Double-click tRiakInput to open its Basic settings view in the Component tab.
2.
Click Edit schema to define the structure of exported data. In this example, three columns are defined: id,
company, and price.
3.
Select the Use an existing connection check box and then select the connection you have configured earlier.
In this example, it is tRiakConnection_1.
4.
In the Bucket field, enter the name of the bucket from which the data will be exported, computer in this
example.
5.
Select the Output key to column check box, and select the desired column from the list. id is selected in this example.
6.
In the Value columns table, click the [+] button twice to add two rows and select company and price, respectively.
Double-click tFileOutputDelimited to open its Basic settings view in the Component tab.
2.
In the File Name field, enter the full path to the local file in which you want to store the exported data, D:/Output/computer.txt in this example.
3.
4.
Double-click tRiakClose to open its Basic settings view in the Component tab.
2.
Select the connection you want to close from the Component List, tRiakConnection_1 in this example.
2.
3.
Go to the local directory where the file is stored and check the exported data from the Riak bucket.
tRiakKeyList
tRiakKeyList properties
Component Family
Function
Purpose
tRiakKeyList allows you to retrieve a list of keys within a Riak bucket for analysis or development
purposes.
Basic settings
Use existing connection
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
If you select the Use an existing connection check box, the Nodes table will not be available.
Nodes
Bucket
Type in the name of the bucket from which you want to retrieve all keys.
Advanced settings
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at the Job
level as well as at each component level.
Global Variables
CURRENT_KEY: indicates the current key. This is a Flow variable and it returns a string.
NB_LINE: Indicates the number of rows read by an input component or transferred to an output component.
This is an After variable and it returns an integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose
the variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable means it
functions after the execution of a component.
Usage
Limitation
n/a
Related scenario
No scenario is available for this component yet.
tRiakOutput
tRiakOutput properties
Component family
Function
tRiakOutput receives data from the preceding component, and writes data into a Riak bucket.
Purpose
tRiakOutput allows you to write data into or delete data from a bucket in a Riak cluster.
Basic settings
Use existing connection
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
If you select the Use an existing connection check box, the Nodes table will not be available.
Nodes
Bucket
Specify the name of the bucket to which you want to apply changes.
Action on data
Select this check box to let the Riak system generate keys for the values
automatically.
Key column
Select one column from the list to write its data into the Riak bucket as keys.
Note that the key must be unique across one bucket.
Value columns
Customize the columns to write their data into the Riak bucket as values.
Advanced settings
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at the Job level as
well as at each component level.
Usage
This component is used as an output component and it always needs an incoming link.
Limitation
n/a
Related Scenario
No scenario is available for the component yet.
tSqoopExport
tSqoopExport Properties
Component family
Function
tSqoopExport calls Sqoop to transfer data from the Hadoop Distributed File System (HDFS) to a relational database management system (RDBMS).
Sqoop is typically installed in every Hadoop distribution. However, if the Hadoop distribution you need to use has no Sqoop installed, you have to install one on your own and ensure that the Sqoop command line is added to the PATH variable of that distribution. For further information about how to install Sqoop, see the documentation of Sqoop.
Purpose
tSqoopExport is used to define the arguments required by Sqoop for transferring data to an RDBMS.
Basic settings
Mode
Version
Distribution
Select the product you are using as the Hadoop distribution from
the drop-down list. The options in the list vary depending on
the component you are using. Among these options, the Custom
option allows you to connect to a custom Hadoop distribution
rather than any of the distributions given in this list and officially
supported by Talend.
In order to connect to a custom distribution, once you have selected Custom, click the [...] button to display the dialog box in which you can configure that custom connection.
Configuration
Hadoop Version
Select the version of the Hadoop distribution you are using. Note
that if you use Hortonworks Data Platform V2.0.0, the type of
the operating system for running the distribution and a Talend
Job must be the same, such as Windows or Linux.
NameNode URI
Use kerberos authentication If you are accessing the Hadoop cluster running with Kerberos
security, select this check box, then, enter the Kerberos principal
name for the NameNode in the field displayed. This enables
you to use your user name to authenticate against the credentials
stored in Kerberos.
This check box is available depending on the Hadoop distribution
you are connecting to.
Use a keytab to authenticate Select the Use a keytab to authenticate check box to log into
a Kerberos-enabled Hadoop system using a given keytab file. A
keytab file contains pairs of Kerberos principals and encrypted
keys. You need to enter the principal to be used in the Principal
field and the access path to the keytab file itself in the Keytab
field.
Enter the user name under which you want to execute the Job.
Since a file or a directory in Hadoop has its specific owner with
appropriate read or write rights, this field allows you to execute
the Job directly under the user name that has the appropriate rights
to access the file or directory to be processed. Note that this field
is available depending on the distribution you are using.
Connection
Table Name
Export Dir
Specify Number of Mappers Select this check box to indicate the number of map tasks (parallel
processes) used to perform the data transfer.
If you do not want Sqoop to work in parallel, enter 1 in the
displayed field.
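As an aside to the Use a keytab to authenticate option described above, the following is a minimal sketch, independent of the Studio, of how a Hadoop client logs in with a keytab through the Hadoop security API. The principal and the keytab path are placeholders; the operating-system user running the code only needs read access to the keytab file.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

public class KeytabLoginSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("hadoop.security.authentication", "kerberos");  // enable Kerberos authentication on the client
        UserGroupInformation.setConfiguration(conf);
        // Placeholder principal and keytab path; the user running this code does not need
        // to be the principal itself, only to be able to read the keytab file.
        UserGroupInformation.loginUserFromKeytab("guest@EXAMPLE.COM", "/home/user1/guest.keytab");
        System.out.println("Logged in as: " + UserGroupInformation.getLoginUser());
    }
}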
Advanced settings
Print Log
Verbose
Direct
Use MySQL default delimiters
Additional arguments
Use speed parallel data transfers
Select this check box to enable quick parallel data transfers between the Teradata database and the Hortonworks Hadoop distribution. Then the Specific params table and the Use additional params check box appear to allow you to specify the Teradata parameters required by parallel transfers.
In the Specific params table, two columns are available:
Argument: select the parameters as needed from the drop-down list. They are the most common parameters for the parallel transfer.
Value: type in the value of the parameters.
By selecting the Use additional params check box, you display the Specific additional params field. In this field, you can enter the Teradata parameters that you need to use but that are not provided in the Specific params table. The syntax for a parameter is -Dparameter=value; when you put more than one parameter in this field, separate them with whitespace.
Available in the Use Commandline mode only.
Hadoop properties
Mapred job map memory mb and Mapred job reduce memory mb
If the Hadoop distribution to be used is Hortonworks Data Platform V1.2 or Hortonworks Data Platform V1.3, you need to set proper memory allocations for the map and reduce computations to be performed by the Hadoop system.
In that situation, enter the values you need in the Mapred job map memory mb and the Mapred job reduce memory mb fields, respectively. By default, both values are 1000, which is normally appropriate for running the computations.
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Usage
This component is used standalone. It respects the Sqoop prerequisites. You need the necessary knowledge of Sqoop to use it.
We recommend using Sqoop version 1.4 or later in order to benefit from the full functions of these components.
For further information about Sqoop, see the Sqoop manual at http://sqoop.apache.org/docs/.
Prerequisites
The Hadoop distribution must be properly installed, so as to guarantee the interaction with
Talend Studio. The following list presents MapR related information for example.
Ensure that you have installed the MapR client in the machine where the Studio is, and added
the MapR client library to the PATH variable of that machine. For Windows, this library
is lib\MapRClient.dll in the MapR client jar file; without adding it, you may encounter the
following error: no MapRClient in java.library.path.
Set the -Djava.library.path argument. This argument provides to the Studio the path to
the native library of that MapR client. This allows the subscription-based users to make full
use of the Data viewer to view locally in the Studio the data stored in MapR. For further
information about how to set this argument, see the section describing how to view data of
Talend Open Studio for Big Data Getting Started Guide.
For further information about how to install a Hadoop distribution, see the manuals
corresponding to the Hadoop distribution you are using.
Limitation
If you have selected the Use Commandline mode, you need to use the host where Sqoop is
installed to run the Job using this component.
In either mode, you must add the driver file of the database to be used to the lib folder of
the Hadoop distribution you are using. For that purpose, use tLibraryLoad in the workspace
and connect it to this component using On Subjob Ok. For further information about
tLibraryLoad, see section tLibraryLoad.
Connections
Additional arguments
The following table maps each argument name used in the Commandline mode to the corresponding property name used in the Java API mode:

Commandline mode                    Java API mode
--driver                            jdbc.driver.class
--direct-split-size                 import.direct.split.size
--inline-lob-limit                  import.max.inline.lob.size
--split-by                          db.split.column
--warehouse-dir                     hdfs.warehouse.dir
--enclosed-by                       codegen.output.delimiters.enclose
--escaped-by                        codegen.output.delimiters.escape
--fields-terminated-by              codegen.output.delimiters.field
--lines-terminated-by               codegen.output.delimiters.record
--optionally-enclosed-by            codegen.output.delimiters.required
--input-enclosed-by                 codegen.input.delimiters.enclose
--input-escaped-by                  codegen.input.delimiters.escape
--input-fields-terminated-by        codegen.input.delimiters.field
--input-lines-terminated-by         codegen.input.delimiters.record
--input-optionally-enclosed-by      codegen.input.delimiters.required
--hive-home                         hive.home
--hive-import                       hive.import
--hive-overwrite                    hive.overwrite.table
--hive-table                        hive.table.name
--class-name                        codegen.java.classname
--jar-file                          codegen.jar.file
--outdir                            codegen.output.dir
--package-name                      codegen.java.packagename
For further information about the arguments available in the Sqoop commandline mode, see the documentation
of Sqoop.
The arguments listed above for the Java API mode are subject to updates and changes. For further information about these arguments, see http://svn.apache.org/repos/asf/sqoop/trunk/src/java/org/apache/sqoop/SqoopOptions.java.
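For orientation, the sketch below shows how a few of the commandline-mode arguments listed above would appear in a direct Sqoop export call issued through Sqoop's Java entry point. This is not the code the Studio generates; the connection URL, credentials, table and directories are placeholders.

import org.apache.sqoop.Sqoop;

public class SqoopExportSketch {
    public static void main(String[] args) {
        String[] exportArgs = {
            "export",
            "--connect", "jdbc:mysql://database-host/mydb",   // placeholder JDBC URL
            "--username", "myuser",                            // placeholder credentials
            "--password", "mypassword",
            "--table", "mytable",                              // placeholder target table
            "--export-dir", "/user/me/export_src",             // placeholder HDFS directory to export from
            "--input-fields-terminated-by", ",",               // codegen.input.delimiters.field in Java API mode
            "--num-mappers", "1"                               // run without parallelism
        };
        System.exit(Sqoop.runTool(exportArgs));                // returns 0 on success
    }
}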
Related scenario
No scenario is available for this component yet.
tSqoopImport
tSqoopImport Properties
Component family
Function
tSqoopImport calls Sqoop to transfer data from a relational database management system
(RDBMS) such as MySQL or Oracle into the Hadoop Distributed File System (HDFS).
Sqoop is typically installed in every Hadoop distribution. However, if the Hadoop distribution you need to use has no Sqoop installed, you have to install one on your own and ensure that the Sqoop command line is added to the PATH variable of that distribution. For further information about how to install Sqoop, see the documentation of Sqoop.
Purpose
tSqoopImport is used to define the arguments required by Sqoop for writing the data of your
interest into HDFS.
Basic settings
Mode
Version
Distribution
Select the product you are using as the Hadoop distribution from
the drop-down list. The options in the list vary depending on
the component you are using. Among these options, the Custom
option allows you to connect to a custom Hadoop distribution
rather than any of the distributions given in this list and officially
supported by Talend.
In order to connect to a custom distribution, once you have selected Custom, click the [...] button to display the dialog box in which you can configure that custom connection.
Configuration
Hadoop Version
Select the version of the Hadoop distribution you are using. Note
that if you use Hortonworks Data Platform V2.0.0, the type of
the operating system for running the distribution and a Talend
Job must be the same, such as Windows or Linux.
NameNode URI
Use kerberos authentication If you are accessing the Hadoop cluster running with Kerberos
security, select this check box, then, enter the Kerberos principal
name for the NameNode in the field displayed. This enables
you to use your user name to authenticate against the credentials
stored in Kerberos.
This check box is available depending on the Hadoop distribution
you are connecting to.
Use a keytab to authenticate Select the Use a keytab to authenticate check box to log into
a Kerberos-enabled Hadoop system using a given keytab file. A
keytab file contains pairs of Kerberos principals and encrypted
keys. You need to enter the principal to be used in the Principal
field and the access path to the keytab file itself in the Keytab
field.
Enter the user name under which you want to execute the Job.
Since a file or a directory in Hadoop has its specific owner with
appropriate read or write rights, this field allows you to execute
the Job directly under the user name that has the appropriate rights
to access the file or directory to be processed. Note that this field
is available depending on the distribution you are using.
Connection
Enter the JDBC URL used to connect to the database where the
source data is stored.
Table Name
Advanced settings
Append
File format
Compress
Print Log
Verbose
Direct
Specify columns
Select this check box to display the column table where you can
specify the columns you want to transfer into HDFS.
Select this check box to use a WHERE clause that controls the
rows to be transferred. In the field displayed, you can type in the
condition used to select the rows you want. For example, type in
id >400 to import only the rows where the id column has a value
greater than 400.
Use MySQL default delimiters
Select this check box to use MySQL's default delimiter set. This check box is available only in the Commandline mode.
Use query
Select this check box to use the free-form query mode provided by Sqoop.
Query
Once you select the Use query check box, enter the free-form query you need to use in this field.
Then, you must specify the target directory and if the Sqoop
imports data in parallel, specify as well the Split by argument.
Once queries are entered here, the value of the
argument --fields-terminated-by can only be set to "\t"
in the Additional arguments table.
Specify target dir
Select this check box to enter the path to the target location in HDFS to which you want to transfer the source data.
This location should be a new directory; otherwise, you must select the Append check box.
Specify Split by
Select this check box, then enter the table column you need to use as the splitting column to split the workload.
For example, for a table where the id column is the key column, enter tablename.id. Sqoop then splits the data to be transferred based on the values of that column.
Use speed parallel data transfers
Select this check box to enable quick parallel data transfers between the Teradata database and the Hortonworks Hadoop distribution. Then the Specific params table and the Use additional params check box appear to allow you to specify the Teradata parameters required by parallel transfers.
In the Specific params table, two columns are available:
Argument: select the parameters as needed from the drop-down list. They are the most common parameters for the parallel transfer.
Value: type in the value of the parameters.
By selecting the Use additional params check box, you display the Specific additional params field. In this field, you can enter the Teradata parameters that you need to use but that are not provided in the Specific params table. The syntax for a parameter is -Dparameter=value; when you put more than one parameter in this field, separate them with whitespace.
Available in the Use Commandline mode only.
Hadoop properties
Mapred job map memory mb and Mapred job reduce memory mb
If the Hadoop distribution to be used is Hortonworks Data Platform V1.2 or Hortonworks Data Platform V1.3, you need to set proper memory allocations for the map and reduce computations to be performed by the Hadoop system.
In that situation, enter the values you need in the Mapred job map memory mb and the Mapred job reduce memory mb fields, respectively. By default, both values are 1000, which is normally appropriate for running the computations.
Path separator in server
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Usage
This component is used standalone. It respects the Sqoop prerequisites. You need the necessary knowledge of Sqoop to use it.
We recommend using Sqoop version 1.4 or later in order to benefit from the full functions of these components.
For further information about Sqoop, see the Sqoop manual at http://sqoop.apache.org/docs/.
Prerequisites
The Hadoop distribution must be properly installed, so as to guarantee the interaction with
Talend Studio. The following list presents MapR related information for example.
Ensure that you have installed the MapR client in the machine where the Studio is, and added
the MapR client library to the PATH variable of that machine. For Windows, this library
is lib\MapRClient.dll in the MapR client jar file; without adding it, you may encounter the
following error: no MapRClient in java.library.path.
Set the -Djava.library.path argument. This argument provides to the Studio the path to
the native library of that MapR client. This allows the subscription-based users to make full
use of the Data viewer to view locally in the Studio the data stored in MapR. For further
information about how to set this argument, see the section describing how to view data of
Talend Open Studio for Big Data Getting Started Guide.
For further information about how to install a Hadoop distribution, see the manuals
corresponding to the Hadoop distribution you are using.
Limitation
If you have selected the Use Commandline mode, you need to use the host where Sqoop is
installed to run the Job using this component.
In either mode, you must add the driver file of the database to be used to the lib folder of
the Hadoop distribution you are using. For that purpose, use tLibraryLoad in the workspace
and connect it to this component using On Subjob Ok. For further information about
tLibraryLoad, see section tLibraryLoad.
Connections
The source MySQL table used in this scenario, sqoopmerge, reads as follows:
id,wage,mod_date
0,2000,2008-06-26 04:25:59
1,2300,2011-06-12 05:29:45
2,2500,2007-01-15 11:59:13
3,3000,2010-05-02 15:34:05
In the Integration perspective of the Studio, create an empty Job from the Job Designs node in the
Repository tree view.
For further information about how to create a Job, see the Talend Studio User Guide.
2.
3.
2.
Click the Library field to display the drop-down list and select the jar file to be used from that list. In this
scenario, it is mysql-connector-java-5.1.22-bin.jar.
2.
3.
In the Version area, select the Hadoop distribution to be used and its version. If you cannot find from the list
the distribution corresponding to yours, select Custom so as to connect to a Hadoop distribution not officially
supported in the Studio.
For a step-by-step example about how to use this Custom option, see section Connecting to a custom Hadoop
distribution.
4.
In the NameNode URI field, enter the location of the master node, the NameNode, of the distribution to be
used. For example, hdfs://talend-cdh4-namenode:8020.
5.
In the JobTracker Host field, enter the location of the JobTracker of your distribution. For example, talend-cdh4-namenode:8021.
Note that the term Job in JobTracker refers to the MapReduce (MR) jobs described in Apache's documentation at http://hadoop.apache.org/.
6.
If the distribution to be used requires Kerberos authentication, select the Use Kerberos authentication check
box and complete the authentication details. Otherwise, leave this check box clear.
If you need to use a Kerberos keytab file to log in, select Use a keytab to authenticate. A keytab file contains
pairs of Kerberos principals and encrypted keys. You need to enter the principal to be used in the Principal
field and the access path to the keytab file itself in the Keytab field.
Note that the user that executes a keytab-enabled Job is not necessarily the one a principal designates but
must have the right to read the keytab file being used. For example, the user name you are using to execute
a Job is user1 and the principal to be used is guest; in this situation, ensure that user1 has the right to read
the keytab file to be used.
7.
In the Connection field, enter the URI of the MySQL database where the source table is stored. For example,
jdbc:mysql://10.42.10.13/mysql.
8.
9.
In the Table Name field, enter the name of the source table. In this scenario, it is sqoopmerge.
10. From the File format list, select the format that corresponds to the data to be used, textfile in this scenario.
11. Click the Advanced settings tab to open its view.
12. Select the Specify target dir check box and enter the directory where you need to import the data to. For
example, /user/ychen/target_old.
If you need to obtain more details about the Job, it is recommended to use the web console of the Jobtracker
provided by the Hadoop distribution you are using.
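For reference, the import configured in this scenario corresponds roughly to the Sqoop call sketched below, issued through Sqoop's Java entry point rather than by the Studio. The username and password are placeholders; the other values are taken from the steps above.

import org.apache.sqoop.Sqoop;

public class SqoopImportSketch {
    public static void main(String[] args) {
        String[] importArgs = {
            "import",
            "--connect", "jdbc:mysql://10.42.10.13/mysql",  // Connection field
            "--username", "myuser",                          // placeholder credentials
            "--password", "mypassword",
            "--table", "sqoopmerge",                         // Table Name field
            "--as-textfile",                                 // File format: textfile
            "--target-dir", "/user/ychen/target_old",        // Specify target dir
            "--num-mappers", "1"                             // run without parallelism
        };
        System.exit(Sqoop.runTool(importArgs));              // returns 0 on success
    }
}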
tSqoopImportAllTables
tSqoopImportAllTables Properties
Component family
Function
Purpose
tSqoopImportAllTables is used to define the arguments required by Sqoop for writing all of
the tables of a database into HDFS.
Basic settings
Mode
Version
Distribution
Select the product you are using as the Hadoop distribution from
the drop-down list. The options in the list vary depending on
the component you are using. Among these options, the Custom
option allows you to connect to a custom Hadoop distribution
rather than any of the distributions given in this list and officially
supported by Talend.
In order to connect to a custom distribution, once you have selected Custom, click the [...] button to display the dialog box in which you can configure that custom connection.
Configuration
Hadoop Version
Select the version of the Hadoop distribution you are using. Note
that if you use Hortonworks Data Platform V2.0.0, the type of
the operating system for running the distribution and a Talend
Job must be the same, such as Windows or Linux.
NameNode URI
Use kerberos authentication If you are accessing the Hadoop cluster running with Kerberos
security, select this check box, then, enter the Kerberos principal
name for the NameNode in the field displayed. This enables
you to use your user name to authenticate against the credentials
stored in Kerberos.
This check box is available depending on the Hadoop distribution
you are connecting to.
Use a keytab to authenticate Select the Use a keytab to authenticate check box to log into
a Kerberos-enabled Hadoop system using a given keytab file. A
keytab file contains pairs of Kerberos principals and encrypted
keys. You need to enter the principal to be used in the Principal
field and the access path to the keytab file itself in the Keytab
field.
Enter the user name under which you want to execute the Job.
Since a file or a directory in Hadoop has its specific owner with
appropriate read or write rights, this field allows you to execute
the Job directly under the user name that has the appropriate rights
to access the file or directory to be processed. Note that this field
is available depending on the distribution you are using.
Connection
Enter the JDBC URL used to connect to the database where the
source data is stored.
File format
Specify Number of Mappers Select this check box to indicate the number of map tasks (parallel
processes) used to perform the data transfer.
If you do not want Sqoop to work in parallel, enter 1 in the
displayed field.
Advanced settings
Compress
Print Log
Verbose
Direct
Use MySQL default delimiters
Select this check box to use MySQL's default delimiter set. This check box is available only in the Commandline mode.
Additional arguments
Hadoop properties
Mapred job map memory mb and Mapred job reduce memory mb
If the Hadoop distribution to be used is Hortonworks Data Platform V1.2 or Hortonworks Data Platform V1.3, you need to set proper memory allocations for the map and reduce computations to be performed by the Hadoop system.
In that situation, enter the values you need in the Mapred job map memory mb and the Mapred job reduce memory mb fields, respectively. By default, both values are 1000, which is normally appropriate for running the computations.
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Usage
This component is used standalone. It respects the Sqoop prerequisites. You need the necessary knowledge of Sqoop to use it.
We recommend using Sqoop version 1.4 or later in order to benefit from the full functions of these components.
For further information about Sqoop, see the Sqoop manual at http://sqoop.apache.org/docs/.
Prerequisites
The Hadoop distribution must be properly installed, so as to guarantee the interaction with
Talend Studio. The following list presents MapR related information for example.
Ensure that you have installed the MapR client in the machine where the Studio is, and added
the MapR client library to the PATH variable of that machine. For Windows, this library
is lib\MapRClient.dll in the MapR client jar file; without adding it, you may encounter the
following error: no MapRClient in java.library.path.
Set the -Djava.library.path argument. This argument provides to the Studio the path to
the native library of that MapR client. This allows the subscription-based users to make full
use of the Data viewer to view locally in the Studio the data stored in MapR. For further
information about how to set this argument, see the section describing how to view data of
Talend Open Studio for Big Data Getting Started Guide.
For further information about how to install a Hadoop distribution, see the manuals
corresponding to the Hadoop distribution you are using.
Limitation
If you have selected the Use Commandline mode, you need to use the host where Sqoop is
installed to run the Job using this component.
In either mode, you must add the driver file of the database to be used to the lib folder of
the Hadoop distribution you are using. For that purpose, use tLibraryLoad in the workspace
and connect it to this component using On Subjob Ok. For further information about
tLibraryLoad, see section tLibraryLoad.
The preconditions required by Sqoop for using its import-all-tables tool must be satisfied. For
further information, please see the manual of Sqoop.
Connections
Related scenario
No scenario is available for this component yet.
tSqoopMerge
tSqoopMerge Properties
Component family
Function
tSqoopMerge reads two datasets in HDFS and combines them both using a merge class that is
able to parse the datasets, with the newer records overwriting the older records.
Sqoop is typically installed in every Hadoop distribution. However, if the Hadoop distribution you need to use has no Sqoop installed, you have to install one on your own and ensure that the Sqoop command line is added to the PATH variable of that distribution. For further information about how to install Sqoop, see the documentation of Sqoop.
Purpose
tSqoopMerge is typically used to perform an incremental import that updates an older dataset
with newer records. The file types of the newer and the older datasets must be the same.
Basic settings
Mode
Version
Distribution
Select the product you are using as the Hadoop distribution from
the drop-down list. The options in the list vary depending on
the component you are using. Among these options, the Custom
option allows you to connect to a custom Hadoop distribution
rather than any of the distributions given in this list and officially
supported by Talend.
In order to connect to a custom distribution, once you have selected Custom, click the [...] button to display the dialog box in which you can configure that custom connection.
Configuration
Hadoop Version
Select the version of the Hadoop distribution you are using. Note
that if you use Hortonworks Data Platform V2.0.0, the type of
the operating system for running the distribution and a Talend
Job must be the same, such as Windows or Linux.
NameNode URI
Use kerberos authentication If you are accessing the Hadoop cluster running with Kerberos
security, select this check box, then, enter the Kerberos principal
name for the NameNode in the field displayed. This enables
you to use your user name to authenticate against the credentials
stored in Kerberos.
This check box is available depending on the Hadoop distribution
you are connecting to.
Use a keytab to authenticate Select the Use a keytab to authenticate check box to log into
a Kerberos-enabled Hadoop system using a given keytab file. A
keytab file contains pairs of Kerberos principals and encrypted
keys. You need to enter the principal to be used in the Principal
field and the access path to the keytab file itself in the Keytab
field.
User name
Enter the user name under which you want to execute the Job. Since a file or a directory in Hadoop has its specific owner with appropriate read or write rights, this field allows you to execute the Job directly under the user name that has the appropriate rights to access the file or directory to be processed. Note that this field is available depending on the distribution you are using.
Folders to merge
Old data
New data
Target folder
Enter the directory where you need to put the output of the
merging.
Merge key
Enter the name of the column used as the key of each record for
the merging.
This primary key must be unique.
Generate the JAR file
Select this check box to generate the merge jar file and the merge class required to parse the datasets to be merged. The default name of the jar file and the class is SqoopMerge_component_ID, where component_ID is the ID of the tSqoopMerge component that generates the jar file and the class, such as tSqoopMerge_1 or tSqoopMerge_2.
As this jar file is generated from the source table of the imported
data, selecting this check box displays the corresponding
parameters to be set for connecting to that table.
In a Job, you need a database jar file to access the source table.
This requires you to use tLibraryLoad to load that database jar
file.
Connection
Enter the JDBC URL used to connect to the database where the
source data is stored.
Table Name
JAR file
Advanced settings
Print Log
Verbose
Select this check box to display the Class name field and enter
the name of the merge class you need to use.
This check box must be clear if you use Generate the JAR file
in the Basic settings tab.
Additional arguments
Mapred job map memory mb and Mapred job reduce memory mb
If the Hadoop distribution to be used is Hortonworks Data Platform V1.2 or Hortonworks Data Platform V1.3, you need to set proper memory allocations for the map and reduce computations to be performed by the Hadoop system.
In that situation, enter the values you need in the Mapred job map memory mb and the Mapred job reduce memory mb fields, respectively. By default, both values are 1000, which is normally appropriate for running the computations.
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Usage
This component is used standalone. It respects the Sqoop prerequisites. You need the necessary knowledge of Sqoop to use it.
We recommend using Sqoop version 1.4 or later in order to benefit from the full functions of these components.
For further information about Sqoop, see the Sqoop manual at http://sqoop.apache.org/docs/.
Prerequisites
The Hadoop distribution must be properly installed, so as to guarantee the interaction with
Talend Studio. The following list presents MapR related information for example.
Ensure that you have installed the MapR client in the machine where the Studio is, and added
the MapR client library to the PATH variable of that machine. For Windows, this library
is lib\MapRClient.dll in the MapR client jar file; without adding it, you may encounter the
following error: no MapRClient in java.library.path.
Set the -Djava.library.path argument. This argument provides to the Studio the path to
the native library of that MapR client. This allows the subscription-based users to make full
use of the Data viewer to view locally in the Studio the data stored in MapR. For further
information about how to set this argument, see the section describing how to view data of
Talend Open Studio for Big Data Getting Started Guide.
For further information about how to install a Hadoop distribution, see the manuals
corresponding to the Hadoop distribution you are using.
Limitation
If you have selected the Use Commandline mode, you need to use the host where Sqoop is
installed to run the Job using this component.
In either mode, you must add the driver file of the database to be used to the lib folder of
the Hadoop distribution you are using. For that purpose, use tLibraryLoad in the workspace
and connect it to this component using On Subjob Ok. For further information about
tLibraryLoad, see section tLibraryLoad.
Connections
Row: Iterate;
Trigger: Run if; On Subjob Ok; On Subjob Error; On Component
Ok; On Component Error
For further information regarding connections, see Talend Studio
User Guide.
The first dataset (the old one before the modifications) to be used in this scenario reads as follows:
id,wage,mod_date
0,2000,2008-06-26 04:25:59
1,2300,2011-06-12 05:29:45
2,2500,2007-01-15 11:59:13
3,3000,2010-05-02 15:34:05
The second dataset (the newer one) contains the same records except for the last one, which has been updated and carries a more recent mod_date time of 18:00:00.
In the Integration perspective of the Studio, create an empty Job from the Job Designs node in the
Repository tree view.
For further information about how to create a Job, see Talend Studio User Guide.
2.
Configuring tLibraryLoad
1.
2.
Click the Library field to display the drop-down list and select the jar file to be used from that list. In this
scenario, it is mysql-connector-java-5.1.22-bin.jar.
Configuring tSqoopMerge
1.
2.
3.
In the Version area, select the Hadoop distribution to be used and its version. If you cannot find from the list
the distribution corresponding to yours, select Custom so as to connect to a Hadoop distribution not officially
supported in the Studio.
For a step-by-step example about how to use this Custom option, see section Connecting to a custom Hadoop
distribution.
4.
In the NameNode URI field, enter the location of the master node, the NameNode, of the distribution to be
used. For example, hdfs://talend-cdh4-namenode:8020.
5.
In the JobTracker Host field, enter the location of the JobTracker of your distribution. For example, talend-cdh4-namenode:8021.
Note that the term Job in JobTracker refers to the MapReduce (MR) jobs described in Apache's documentation at http://hadoop.apache.org/.
6.
If the distribution to be used requires Kerberos authentication, select the Use Kerberos authentication check
box and complete the authentication details. Otherwise, leave this check box clear.
If you need to use a Kerberos keytab file to log in, select Use a keytab to authenticate. A keytab file contains
pairs of Kerberos principals and encrypted keys. You need to enter the principal to be used in the Principal
field and the access path to the keytab file itself in the Keytab field.
Note that the user that executes a keytab-enabled Job is not necessarily the one a principal designates but
must have the right to read the keytab file being used. For example, the user name you are using to execute
a Job is user1 and the principal to be used is guest; in this situation, ensure that user1 has the right to read
the keytab file to be used.
7.
In the Old data directory and the New data directory fields, enter the path, or browse to the directory in
HDFS where the older and the newer datasets are stored, respectively.
8.
In the Target directory field, enter the path, or browse to the folder you need to store the merge result in.
9.
In the Merge key field, enter the column to be used as the key for the merge. In this scenario, the column is id.
10. Select Generate the JAR file to display the connection parameters to the source database table.
11. In the Connection field, enter the URI of the MySQL database where the source table is stored. For example,
jdbc:mysql://10.42.10.13/mysql.
12. In the Table Name field, enter the name of the source table. In this scenario, it is sqoopmerge.
13. In Username and Password, enter the authentication information.
14. If the field delimiter of the source table is not a comma (,), you need to specify the delimiter in the Additional Arguments table in the Advanced settings tab. The argument to be used is codegen.output.delimiters.field for the Use Java API mode or --fields-terminated-by for the Use Commandline mode.
Once done, you can verify the results in the target directory you have specified, in the web console of the Hadoop
distribution used.
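For reference, the merge configured in this scenario corresponds roughly to the Sqoop call sketched below, issued through Sqoop's Java entry point rather than by the Studio. The HDFS directories and the jar and class names are placeholders; only the merge key, id, comes from the steps above.

import org.apache.sqoop.Sqoop;

public class SqoopMergeSketch {
    public static void main(String[] args) {
        String[] mergeArgs = {
            "merge",
            "--onto", "/user/ychen/target_old",          // placeholder directory holding the older dataset
            "--new-data", "/user/ychen/target_new",      // placeholder directory holding the newer dataset
            "--target-dir", "/user/ychen/target_merged", // placeholder output directory for the merge result
            "--merge-key", "id",                         // Merge key column from the scenario
            "--jar-file", "tSqoopMerge_1.jar",           // placeholder path to the generated merge jar
            "--class-name", "sqoopmerge"                 // placeholder name of the generated merge class
        };
        System.exit(Sqoop.runTool(mergeArgs));           // returns 0 on success
    }
}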
If you need to obtain more details about the Job, it is recommended to use the web console of the Jobtracker
provided by the Hadoop distribution you are using.
If you continue to import updated datasets to HDFS from the same source table, you can reuse the generated merge
class to merge the datasets.
Business components
This chapter details the major components that you can find in Business group of the Palette in the Integration
perspective of Talend Studio.
The Business component family groups connectors that cover specific business needs, such as reading from and writing to CRM or ERP types of databases, and reading from or writing to an SAP system.
For Talend Open Studio for Big Data, the Property type, Schema and Query Type of components are always Built-in. For
further information about how to edit a Built-in schema, see Talend Studio User Guide.
tAlfrescoOutput
tAlfrescoOutput Properties
Component family
Business
Function
Creates dematerialized documents in an Alfresco server where they are indexed under
meaningful models.
Purpose
Basic settings
URL
Base
Target Location
Select the Map... check box and then in the Column list, select
the target location column.
Note: When you type in the base name, make sure to use the
double backslash (\\) escape character.
Create Or Update Mode
Document Mode
Select in the list the mode you want to use for the created
document.
Create only: creates a document if it does not exist.
Note that an error message will display if you try to create a
document that already exists.
Create or update: creates a document if it does not exist or
updates the document if it exists.
Container Mode
Select in the list the mode you want to use for the destination
folder in Alfresco.
Update only: updates a destination folder if the folder exists.
Note that an error message will display if you try to update a
document that does not exist.
Create or update: creates a destination folder if it does not exist
or updates the destination folder if it exists.
Property Mapping
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Click Edit Schema to make changes to the schema.
Result Log File Name
Browse to the file where you want to save any logs related to the
Job execution.
Die on error
Advanced settings
Configure Target Location Container
Allows you to configure the (by default) type of containers (folders).
Select this check box to display new fields where you can modify
the container type to use your own created types based on the
father/child model.
Permissions
Configure Permissions
Encoding
Select the encoding type from the list or select Custom and define
it manually. This field is compulsory.
Association
Mapping
Target
Allows you to create new documents in Alfresco with associated links
to other documents already existing in Alfresco, to facilitate
the navigation process, for example.
To create associations:
1.
2. Click the Add button and select a model where you have
already defined aspects that contain associations.
3. Click the drop-down arrow at the top of the editor and select
the corresponding document type.
4.
tStatCatcher Statistics
Global Variables
NB_LINE: Indicates the number of rows read by an input component or transferred to an output
component. This is an After variable and it returns an integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list
and choose the variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable
means it functions after the execution of a component.
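As an illustration, here is a minimal sketch of reading this After variable from a tJava component placed after
the component's execution (the instance name tAlfrescoOutput_1 is an assumption; use the name shown in your
own Job):
// Retrieve the After variable NB_LINE from the global map; the key is <component instance name>_NB_LINE.
Integer nbLine = (Integer) globalMap.get("tAlfrescoOutput_1_NB_LINE");
System.out.println("Rows processed: " + nbLine);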
Usage
Limitation/Prerequisites
To be able to use the tAlfrescoOutput component, a few relevant resources need to be installed:
see the Installation procedure subsection below for more information.
Due to license incompatibility, one or more JARs required to use this component are not
provided. You can easily find out and add such JARs in the Integration perspective of your
studio. For details, see the section about external modules in the Talend Installation and
Upgrade Guide.
Installation procedure
To be able to use tAlfrescoOutput in the Integration perspective of Talend Studio, you first need to install the
Alfresco server with a few relevant resources.
The following subsections detail the prerequisites and the installation procedure.
Prerequisites
Start with the following operations:
1.
2.
3.
4.
5.
From the installation folder (C:\alfresco), launch the alfresco server using the script alf_start.bat
Make sure that the Alfresco server is launched correctly before starting to use the tAlfrescoOutput component.
2.
Add the authentication filter of the commands to the web.xml file located in
C:\alfresco\tomcat\webapps\alfresco\WEB-INF,
following the model of the example provided in the talendalfresco_20081014/alfresco folder of the zipped
file talendalfresco_20081014.zip.
The following figures show the portion of lines (in blue) to add to the alfresco web.xml file.
Drop the tFileInputDelimited and tAlfrescoOutput components from the Palette onto the design
workspace.
2.
Connect the two components together using a Row > Main connection.
For Talend Open Studio for Big Data, the Property type, Schema and Query Type of components are always Built-in. For further information about how to edit a Built-in schema, see Talend Studio User Guide.
In this scenario, the delimited file provides the metadata and path of two documents we want to create in the
Alfresco server. The input schema for the documents consists of four columns: file_name, destination_folder_name,
source_path, and author.
And therefore the input schema of the delimited file will be as the following:
2.
In the Alfresco Server area, enter the Alfresco server URL and user authentication information in the
corresponding fields.
3.
In the TargetLocation area, either type in the base name where to put the document in the server, or Select the
Map... check box and then in the Column list, select the target location column, destination_folder_name
in this scenario.
When you type in the base name, make sure to use the double backslash (\\) escape character.
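For example (hypothetical folder names), to target the folder Company Home\Talend you would type Company
Home\\Talend, escaping each backslash.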
4.
In the Document Mode list, select the mode you want to use for the created documents.
5.
In the Container Mode list, select the mode you want to use for the destination folder in Alfresco.
Click the Define Document Type three-dot button to open the tAlfrescoOutput editor.
2.
Click the Add button to browse and select the xml file that holds the metadata according to which you want
to save the documents in Alfresco.
All available aspects in the selected model file display in the Available Aspects list.
You can browse for this model folder locally or on the network. After defining the aspects to use for the document to
be created in Alfresco, this model folder is not needed any more.
3.
If needed, select in the Available Aspects list the aspect(s) to be included in the metadata to write in the
Alfresco server. In this scenario we want the author name to be part of the metadata registered in Alfresco.
4.
Click the drop-down arrow at the top of the editor to select from the list the type to give to the created
document in Alfresco, Content in this scenario.
All the defined aspects used to select the metadata to write in the Alfresco server display in the Property
Mapping list in the Basic Settings view of tAlfrescoOutput, three aspects in this scenario, two basic for the
Content type (content and name) and an additional one (author).
Click Sync columns to auto propagate all the columns of the delimited file.
If needed, click Edit schema to view the output data structure of tAlfrescoOutput.
2.
Click the three-dot button next to the Result Log File Name field and browse to the file where you want to
save any logs after Job execution.
3.
The two documents are created in Alfresco using the metadata provided in the input schemas.
tMarketoInput
tMarketoInput
tMarketoInput Properties
Component family
Business/Cloud
Function
Purpose
The tMarketoInput component allows you to retrieve data from a Marketo DB on a Web server.
Basic settings
Endpoint address
The URL of the Marketo Web server to which the SOAP API calls are sent.
Secret key
Client Access ID
Operation
Options in this list allow you to retrieve lead data from Marketo to
external systems.
getLead: This operation retrieves basic information of leads and
lead activities in Marketo DB.
getMultipleLeads: This operation retrieves lead records in batch.
getLeadActivities: This operation retrieves the history of activity
records for a single lead identified by the provided key.
getLeadChanges: This operation checks the changes on Lead data
in Marketo DB.
Columns Mapping
LeadKey type
LeadKey value
Last Updated At
Type in the time of last update to retrieve only the data since the last
specified time. The time format is YYYY-MM-DD HH:MM:SS.
This field is displayed only when you select getMultipleLeads
from the Operation list.
Oldest create date
Type in the time of the earliest creation to retrieve only the data since
the specified time. The time format is YYYY-MM-DD HH:MM:SS Z.
This field is displayed only when you select getLeadChanges from
the Operation list.
Latest create date
Type in the time of the latest creation to retrieve only the data before
the specified time. The time format is YYYY-MM-DD HH:MM:SS Z.
This field is displayed only when you select getLeadChanges from
the Operation list.
Oldest create date and Latest create date can be specified together
or separately.
Batch Size
Timeout (milliseconds)
Die on error
This check box is selected by default. Clear the check box to skip the
row on error and complete the process for error-free rows. If needed,
you can retrieve the rows on error via a Row > Reject connection.
Advanced settings
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at a Job
level as well as at each component level.
Global Variables
NB_LINE: Indicates the number of rows read by an input component or transferred to an output
component. This is an After variable and it returns an integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and
choose the variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable
means it functions after the execution of a component.
Usage
Limitation
n/a
Related Scenario
For a related use case, see section Scenario: Data transmission between Marketo DB and an external system.
tMarketoListOperation
tMarketoListOperation
tMarketoListOperation Properties
Component family
Business/Cloud
Function
The tMarketoListOperation component adds/removes one or more leads to/from a list in the
Marketo DB; it also verifies whether one or more leads exist in a list in the Marketo DB.
Purpose
The tMarketoListOperation component allows you to add/remove one or more leads to/from
a list in the Marketo DB on a Web server. Also, you can verify the existence of one or more
leads in a list in the Marketo DB.
Basic settings
Endpoint address
The URL of the Marketo Web server to which the SOAP API calls are sent.
Secret key: the encrypted authentication code, provided via Marketo Support.
Client Access ID: the user ID, provided via Marketo Support.
Operation
Options in this list allow you to add one or more leads to, or remove
one or more leads from, a list in the Marketo DB; you can also verify
the existence of single or multiple leads in a list in the Marketo
DB.
addTo: This operation adds one or more leads to a list in the
Marketo DB.
isMemberOf: This operation checks the Marketo DB to judge
whether the specific leads exist in the list.
removeFrom: This operation removes one or more leads from a
list in the Marketo DB.
Add or remove multiple leads
Select this check box to add multiple leads to or remove multiple
leads from a list in the Marketo DB.
This check box appears only when you select addTo
or removeFrom from the Operation list.
Die on error
This check box is selected by default. Clear the check box to skip
the row on error and complete the process for error-free rows. If
needed, you can retrieve the rows on error via a Row > Reject
connection.
Timeout (milliseconds)
tStatCatcher Statistics
Usage
Limitation
n/a
2.
3.
Double-click tFixedFlowInput to define the component properties in its Basic settings view.
2.
Click the three-dot button next to Edit schema to set the schema manually.
3.
Click the plus button to add four columns: ListKeyType, ListKeyValue, LeadKeyType and LeadKeyValue.
Keep the settings as default. Then click OK to save the settings.
4.
5.
Click the plus button to add a new line and fill the line with respective values. In this example, these values
are: MKTOLISTNAME for ListKeyType, bchenTestList for ListKeyValue, IDNUM for LeadKeyType and
308408 for LeadKeyValue.
Configuring tMarketoListOperation
1.
Double-click tMarketoListOperation to define the component properties in its Basic settings view.
2.
Click the Sync columns button to retrieve the schema defined in tFixedFlowInput.
3.
4.
Fill the Endpoint address field with the URL of the Marketo Web server. In this example, it is https://na-c.marketo.com/soap/mktows/1_5.
Note that the URL used in this scenario is for demonstration purposes only.
5.
Fill the Secret key field with encrypted authentication code assigned by Marketo. In this example, it is
464407637703554044DD11AA2211998.
6.
Fill the Client Access ID field with the user ID. In this example, it is mktodemo41_785133934D1A219.
7.
8.
Type in the limit of query timeout in the Timeout field. In this example, use the default number: 60000.
Job Execution
1.
Double-click tLogRow to define the component properties in its Basic settings view.
2.
Click the Sync columns button to retrieve the schema defined in tMarketoListOperation.
3.
4.
The result of adding a lead record to a list in Marketo DB is displayed on the Run console.
tMarketoOutput
tMarketoOutput
tMarketoOutput Properties
Component family
Business/Cloud
Function
Purpose
The tMarketoOutput component allows you to write data into a Marketo DB on a Web server.
Basic settings
Endpoint address
The URL of the Marketo Web server to which the SOAP API calls are sent.
Secret key: the encrypted authentication code, provided via Marketo Support.
Client Access ID: the user ID, provided via Marketo Support.
Operation
Columns Mapping
De-duplicate lead record on email address
Select this check box to de-duplicate and update lead records
using email address.
Deselect this check box to create another lead which contains the
same email address.
This check box will be displayed only when you select
syncMultipleLeads from the Operation list.
Batch Size
Timeout (milliseconds)
Die on error
This check box is selected by default. Clear the check box to skip
the row on error and complete the process for error-free rows. If
needed, you can retrieve the rows on error via a Row > Reject
connection.
Advanced settings
tStatCatcher Statistics
Usage
Limitation
n/a
2.
3.
4.
5.
Configuring tFileInputDelimited
1.
Double-click tFileInputDelimited to define the component properties in its Basic settings view.
2.
Click the three-dot button next to the File name/Stream field to select the source file for data insertion. In
this example, it is D:/SendData.csv.
3.
Click the three-dot button next to Edit schema to set the schema manually.
4.
Click the plus button to add four columns: Id, Email, ForeignSysPersonId and ForeignSysType. Set the Type
of Id to Integer and keep the rest as default. Then click OK to save the settings.
5.
Type in 1 in the Header field and keep the other settings as default.
Configuring tMarketoOutput
1.
Double-click tMarketoOutput to define the component properties in its Basic settings view.
2.
Click the Sync columns button to retrieve the schema defined in tFileInputDelimited and fill the Endpoint
address field with the URL of the Marketo Web server. In this example, it is https://na-c.marketo.com/soap/demo/demo1.
Note that the URL used in this scenario is for demonstration purposes only.
3.
Fill the Secret key field with encrypted authentication code assigned by Marketo. In this example, it is
1234567894DEMOONLY987654321.
4.
Fill the Client Access ID field with the user ID.
5.
Select syncMultipleLeads from the Operation list and type in the limit of query timeout in the Timeout
field. In this example, use the default number: 600000.
Configuring tMarketoInput
1.
Double-click tMarketoInput to define the component properties in its Basic settings view.
2.
3.
In the Columns Mapping area, type in test@talend.com in the Columns in Marketo column to set the Email
column.
Note that all the data used in this scenario is for demonstration purposes only.
4.
From the LeadKey type list, select EMAIL and fill the LeadKey value field with test@talend.com.
5.
Configuring tFileOutputDelimited
1.
Double-click tFileOutputDelimited to define the component properties in its Basic settings view.
2.
Click the three-dot button next to the File name field to synchronize data to a local file. In this example, it
is D:/ReceiveData.csv.
3.
Click the Sync columns button and keep the rest of the settings as default.
2.
In the Code field, type in the following code to count the number of API calls throughout the data operations:
// Number of API calls made by tMarketoOutput when inserting data into the Marketo DB.
System.out.println("The Number of API calls for inserting data to Marketo DB is:");
System.out.println((Integer)globalMap.get("tMarketoOutput_1_NB_CALL"));
// Number of API calls made by tMarketoInput when reading data back from the Marketo DB.
System.out.println("The Number of API calls for data synchronization from Marketo DB is:");
System.out.println((Integer)globalMap.get("tMarketoInput_1_NB_CALL"));
Job execution
1.
2.
The number of API calls throughout each data operation is displayed on the Run console.
tMicrosoftCrmInput
tMicrosoftCrmInput
tMicrosoftCrmInput Properties
Component family
Function
Purpose
Allows you to extract data from a Microsoft CRM DB based on conditions set on specific columns.
Basic settings
Authentication Type
Microsoft Webservice URL
Type in the webservice URL to connect to the Microsoft CRM DB.
(Available when On_Premise is selected from the Authentication Type list.)
Organizename
Domain
Host
Port
(These fields are available when On_Premise is selected from the Authentication Type list.)
Entity
Logical operators used to combine conditions
In the case you want to combine the conditions you set on columns, select the combine mode you want to use.
Conditions
tStatCatcher Statistics
Global Variables
NB_LINE: Indicates the number of rows read by an input component or transferred to an output
component. This is an After variable and it returns an integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list
and choose the variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable
means it functions after the execution of a component.
Usage
Limitation
n/a
Scenario: Writing data in a Microsoft CRM database and putting conditions on columns to extract specified rows
1. Drop the following components from the Palette to the design workspace: tFileInputDelimited,
tFileOutputDelimited, tMicrosoftCrmInput, and tMicrosoftCrmOutput.
2.
3.
4.
Configuring tFileInputDelimited
1.
Double-click tFileInputDelimited to display its Basic settings view and define its properties.
2.
Click the three-dot button next to the File Name/Input Stream field and browse to the delimited file that
holds the input data. The input file in this example contains the following columns: new_id, new_status,
new_firstname, new_email, new_city, new_initial and new_zipcode.
3.
In the Basic settings view, define the Row Separator that identifies the end of a row. Then define the
Field Separator used to delimit fields in a row.
4.
If needed, define the header, footer and limit number of processed rows in the corresponding fields. In this
example, the header, footer and limits are not set.
5.
Click Edit schema to open a dialog box where you can define the input schema you want to write in Microsoft
CRM database.
6.
Configuring tMicrosoftCrmOutput
1.
Double-click tMicrosoftCrmOutput to display the component Basic settings view and define its properties.
2.
Enter the Microsoft Web Service URL as well as the user name and password in the corresponding fields.
3.
In the OrganizeName field, enter the organization name that is given the right to access the Microsoft CRM database.
4.
In the Domain field, enter the domain name of the server on which Microsoft CRM is hosted, and then enter
the host IP address and the listening port number in the corresponding fields.
5.
In the Action list, select the operation you want to carry out. In this example, we want to insert data in a
custom entity in Microsoft CRM.
6.
In the Time out field, set the amount of time (in seconds) after which the Job will time out.
7.
In the Entity list, select the entity you want from those offered. In this example, CustomEntity is selected.
If CustomEntity is selected, a Custom Entity Name field displays where you need to enter a name for the custom
entity.
The Schema is then automatically set according to the entity selected. If needed, click Edit schema to display
a dialog box where you can modify this schema and remove the columns that you do not need in the output.
8.
Click Sync columns to retrieve the schema from the preceding component.
Configuring tMicrosoftCrmInput
1.
Double-click tMicrosoftCrmInput to display the component Basic settings view and define its properties.
2.
Enter the Microsoft Web Service URL as well as the user name and password in the corresponding fields and
enter the organization name that is given the right to access the Microsoft CRM database in the OrganizeName field.
3.
In the Domain field, enter the domain name of the server on which Microsoft CRM is hosted, and then enter
the host IP address and the listening port number in the corresponding fields.
4.
In the Time out field, set the amount of time (in seconds) after which the Job will time out.
5.
In the Entity list, select the entity you want to connect to from those offered. In this example, CustomEntity
is selected.
6.
The Schema is then automatically set according to the entity selected. But you can modify it according
to your needs. In this example, you should set the schema manually since you want to access a custom
entity. Copy the seven-column schema from tMicrosoftCrmOutput and paste it in the schema dialog box
in tMicrosoftCrmInput.
7.
Click OK to close the dialog box. You will be prompted to propagate changes. Click Yes in the popup
message.
8.
In the Basic settings view, select And or Or as the logical operator you want to use to combine the conditions
you set on the input columns. In this example, we want to set two conditions on two different input columns
and we use And as the logical operator.
9.
In the Condition area, click the plus button to add as many lines as needed and then click in each line in
the Input column list and select the column you want to set condition on. In this example, we want to set
conditions on two columns, new_city and new_id. We want to extract all customer rows whose city is equal
to New York and whose id is greater than 2.
10. Click in each line in the Operator list and select the operator to bind the input column with its value, in this
example Equal is selected for new_city and Greater Than for new_id.
11. Click in each line in the Value list and set the column value, New York for new_city and 2 for new_id in this
example. You can use a fixed or a context value in this field.
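For example (assuming a context variable named city has been defined for the Job), the Value cell for new_city
could contain either the fixed string "New York" or the expression context.city.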
Configuring tFileOutputDelimited
1.
Double-click tFileOutputDelimited to display the component Basic settings view and define its properties.
2.
Click the three-dot button next to the File Name field and browse to the output file.
3.
4.
Select the Append check box if you want to add the new rows at the end of the records.
5.
Select the Include Header check box if the output file includes a header.
6.
Click Sync columns to retrieve the schema from the preceding component.
Job execution
Save the Job and press F6 to execute it.
Only customers who live in New York City and whose id is greater than 2 are listed in the output file
you stored locally.
tMicrosoftCrmOutput
tMicrosoftCrmOutput
tMicrosoftCrmOutput Properties
Component family
Function
Purpose
Basic settings
Authentication Type
Microsoft Webservice URL
Type in the webservice URL to connect to the Microsoft CRM DB.
(Available when On_Premise is selected from the Authentication Type list.)
Organizename
Domain
Host
Port
(These fields are available when On_Premise is selected from the Authentication Type list.)
Action
Select in the list the action you want to do on the CRM data.
Available actions are: insert, update, and delete.
Entity
Advanced settings
tStatCatcher Statistics
Global Variables
NB_LINE: Indicates the number of rows read by an input component or transferred to an output
component. This is an After variable and it returns an integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list
and choose the variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable
means it functions after the execution of a component.
Usage
Limitation
n/a
Related Scenario
For a related use case, see section Scenario: Writing data in a Microsoft CRM database and putting conditions
on columns to extract specified rows.
tOpenbravoERPInput
tOpenbravoERPInput
tOpenbravoERPInput properties
Component Family
Business
Function
Purpose
This component allows you to extract data from the OpenbravoERP database according to the
conditions defined in specific columns.
Basic settings
Openbravo REST WebService URL
Enter the URL of the Web service that allows you to connect to
the OpenbravoERP database.
Username and Password
Entity
WHERE Clause
Order by
Select this check box to define how to order the results (the
elements in the drop-down list depend on the entity selected).
Sort: Choose whether to organize the results in either Ascending
or Descending order.
Advanced settings
First result
Max result
Advanced separator (for numbers)
Select this check box to modify the separators to be used for
numbers: either the Thousands separator or the Decimal separator.
tStatCatcher Statistics
Select this check box to collect the log data at a component level.
Global Variables
NB_LINE: Indicates the number of rows read by an input component or transferred to an output
component. This is an After variable and it returns an integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list
and choose the variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable
means it functions after the execution of a component.
Usage
Limitation
n/a
Related Scenario
For a scenario in which tOpenbravoERPInput might be used, see section Scenario: Writing data in a Microsoft
CRM database and putting conditions on columns to extract specified rows.
tOpenbravoERPOutput
tOpenbravoERPOutput
tOpenbravoERPOutput properties
Component Family
Business
Function
Purpose
Basic settings
Openbravo REST Webservice URL
Enter the URL of the Web service that allows you to connect to
the OpenbravoERP database.
Username and Password
Action on data
Select this check box if desired and then select the file by
browsing your directory.
Entity
Advanced settings
tStatCatcher Statistics
Select this check box to collect the log data at a component level.
Global Variables
NB_LINE: Indicates the number of rows read by an input component or transferred to an output
component. This is an After variable and it returns an integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list
and choose the variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable
means it functions after the execution of a component.
Usage
Limitation
n/a
Related scenario
For a scenario in which tOpenbravoERPOutput may be used, see section Scenario: Writing data in a Microsoft
CRM database and putting conditions on columns to extract specified rows.
tSageX3Input
tSageX3Input
tSageX3Input Properties
Component family
Business/Sage X3
Function
This component leverages the Web service provided by a given Sage X3 Web server to extract
data from the Sage X3 system (the X3 server).
Purpose
Basic settings
Endpoint address
Type in the address of the Web service provided by the given Sage X3 Web server.
Username and Password
Type in the Web service user authentication data that you have
defined for configuring the Sage X3 Web server.
Language
Pool alias
Request config
For example: RequestConfigDebug=adxwss.trace.on=on&adxwss.trace.size=16384;
Publication name
Type in the publication name of the published object, list or sub-program you want your Studio to access.
Action
Mapping
Conditions
Query condition
Limit
Advanced settings
tStatCatcher Statistics
Usage
Limitation
n/a
Scenario: Using query key to extract data from a given Sage X3 system
1. Drop the tSageX3Input and tLogRow components onto the workspace from the Palette.
2.
Connect the tSageX3Input component to the tLogRow component using a Row > Main link.
2.
Click the three-dot button next to Edit schema to open the schema editor.
3.
In this editor, click the plus button 12 times beneath the schema table to add 12 rows into this table.
4.
Type in the names you want to use for each row. In this example, these rows are named after the publication
names of the object attributes set in the Sage X3 Web server. These columns are used to map the corresponding
attribute fields in the Sage X3 system.
5.
In the Type column, click the IMG row to display its drop-down list.
6.
From the drop-down list, select List, as this attribute appears two or more times, and do the same to switch
the types of the TIT2NBLIG, ITMLNK and ZITMLNK rows to List for the same reason.
7.
Click OK to validate this change and accept the propagation prompted by a pop-up dialog box.
1. In the Endpoint address field, type in the URL address of the Web service provided by the Sage X3 Web
server. In this example, it is http://10.42.20.168:28880/adxwsvc/services/CAdxWebServiceXmlCC.
2.
In the User field, type in the user name of the given Sage X3. In this example, it is ERP.
3.
In the Language field, type in the name of the X3 language code used to start a connection group. In this
example, it is FRA.
4.
In the Pool alias field, type in the name of the connection pool to be used. In this example, this connection pool
is called TALEND.
5.
In the Publication name field, type in the publication name of the object to be called. In this scenario, the
publication name is ITMDET.
In the Group ID column and the Field name column of the Mapping table, type in values corresponding
to the attribute group IDs and the attribute publication names defined in the Sage X3 Web server. In this
example, the values are presented in the figure below.
In the Mapping table, the Column column has been filled automatically with the columns you created in the schema
editor.
2.
Select the Query condition check box to activate the Conditions table.
3.
Under the Conditions table, click the plus button to add one row into the table.
4.
In the Key column, type in the publication name associated with the object attribute you need to extract data
from.
5.
In the Value column, type in the value of the attribute you have selected as the key of the data extraction. In
this scenario, it is CONTS00059, one of the product references.
Job execution
1.
2.
tSageX3Output
tSageX3Output
tSageX3Output Properties
Component family
Business/Sage X3
Function
This component connects to the Web service provided by a given Sage X3 Web server and
from there inserts, updates or deletes data in the Sage X3 system (the X3 server).
Purpose
Basic settings
Endpoint address
Type in the address of the Web service provided by the given Sage
X3 Web server.
Type in the Web service user authentication data that you have
defined for configuring the Sage X3 Web server.
Language
Pool alias
Type in the name of the connection pool that distributes the received
requests to available connections. This name was defined in the
Sage X3 configuration console.
Request config
For example, the string could be:
"RequestConfigDebug=adxwss.trace.on=on";
If you need to use several strings, separate them with a &, for example:
RequestConfigDebug="adxwss.trace.on=on&adxwss.trace.size=16384";
Publication name
Type in the publication name of the published object, list or sub-program you want your Studio to access.
Action
Mapping
Complete this table to map the variable elements of the object, the
list or the sub-program your Studio accesses. Only the elements you
need to conduct the data action of your interest on are selected.
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at a Job
level as well as at each component level.
Usage
Limitation
n/a
Scenario: Using a Sage X3 Web service to insert data into a given Sage X3 system
1. Drop the tFixedFlowInput and tSageX3Output components onto the workspace from the Palette.
2.
Connect the tFixedFlowInput component to the tSageX3Output component using a Row > Main
connection.
1. Double-click the tFixedFlowInput component to set its Basic Settings in the Component view.
2.
Click the three-dot button next to Edit schema to open the schema editor.
3.
In the schema editor and then under the schema table, click the plus button four times to add four rows.
4.
Click OK to validate these changes and then accept the propagation prompted by the pop-up dialog box. The
four rows appear automatically in the Values table of the Component view.
5.
In the Values table within the Mode area, type in the values for each of the four rows in the Value column.
In this scenario, the values downward are:
CONTS00059, Screen 24\" standard 16/10, Screen 24\" standard 28/10, and 2.
These values in the Value column must be put between quotation marks.
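For example, the four Value cells of this scenario as they would be typed, following this quoting rule:
"CONTS00059"
"Screen 24\" standard 16/10"
"Screen 24\" standard 28/10"
"2"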
1. Double-click tSageX3Output to set its properties from the Basic Settings view.
2.
In the Endpoint address field, type in the URL address of the Web service provided by the Sage X3 Web
server. In this example, it is http://10.42.20.168:28880/adxwsvc/services/CAdxWebServiceXmlCC
3.
In the User field, type in the user name of the given Sage X3. In this example, it is ERP.
4.
In the Language field, type in the name of the X3 language code used to start a connection group. In this
example, it is FRA.
5.
In the Pool alias field, type in the name of the connection pool to be used. In this example, this connection pool
is called TALEND.
6.
In the Publication name field, type in the publication name of the object to be called. In this scenario, the
publication name is ITMDET.
7.
In the Field name column of the Mapping table, type in the field names of the attributes the selected data
action is exercised on.
2.
In the Group ID column of the Mapping table, type in values corresponding to the group IDs of the selected
attributes. These IDs are defined in the Sage X3 Web server.
In the Mapping table, the Column column has been filled automatically with the columns retrieved from the schema
of the preceding component.
Job execution
Press CTRL+S to save your Job and press F6 to execute it.
To verify the data that you inserted in this scenario, you can use the tSageX3Input component to read the
concerned data from the Sage X3 server.
For further information about how to use the tSageX3Input component to read data, see section Scenario: Using
query key to extract data from a given Sage X3 system.
tSalesforceBulkExec
tSalesforceBulkExec
tSalesforceBulkExec Properties
tSalesforceOutputBulk and tSalesforceBulkExec components are used together to output the needed file
and then execute the intended actions on the file for your Salesforce.com. These two steps compose the
tSalesforceOutputBulkExec component, detailed in a separate section. The advantage of having two separate
components is that transformations can be carried out before the data is loaded.
Component family
Business/Cloud
Function
Purpose
Basic settings
Use an existing connection Select this check box and in the Component List click the
relevant connection component to reuse the connection details
you already defined.
When a Job contains the parent Job and the child
Job, Component list presents only the connection
components in the same Job level.
Login Type
Salesforce Webservice URL Enter the Webservice URL required to connect to the Salesforce
database.
Salesforce Version
Consumer
Key
Consumer Secret
Callback Host and Callback Enter your OAuth authentication callback url. This url (both host
Port
and port) is defined during the creation of a Connected App and
will be shown in the OAuth Settings area of the Connected App.
Token File
Enter the token file name. It stores the refresh token that is used
to get the access token without authorization.
Directory where the bulk data you need to process is stored.
Action
Module
Advanced settings
Rows to commit
Bytes to commit
Concurrency mode
Wait time for checking batch state (milliseconds)
Specify the wait time for checking whether the batches in a Job
have been processed until all batches are finally processed.
Select this check box if you want to use a proxy server. Once
selected, you need to provide the connection parameters, that is the
host, port, username and password.
tStatCatcher Statistics
Global Variables
NB_LINE: Indicates the number of rows read by an input component or transferred to an output
component. This is an After variable and it returns an integer.
NB_SUCCESS: indicates the number of lines accepted. This is an After variable and it returns
an integer.
NB_REJECT: indicates the number of lines rejected. This is an After variable and it returns
an integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list
and choose the variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable
means it functions after the execution of a component.
Usage
Limitation
Related Scenario:
For a related scenario, see section Scenario: Inserting transformed bulk data into your Salesforce.com.
tSalesforceConnection
tSalesforceConnection
tSalesforceConnection properties
Component family
Business/Cloud
Function
Purpose
Basic settings
Property type
For salesforce bulk component
Select this check box if you use bulk data processing components
from the salesforce family. Once selected, the Salesforce
Version field appears and therein you need to enter the Salesforce
version you are using.
For more information on these bulk data processing components,
see section tSalesforceOutputBulk, section tSalesforceBulkExec
and section tSalesforceOutputBulkExec.
Login Type
Salesforce Webservice URL
Enter the Webservice URL required to connect to the Salesforce
database.
Salesforce Version
Callback Host and Callback Port
Enter your OAuth authentication callback URL. This URL (both
host and port) is defined during the creation of a Connected App
and will be shown in the OAuth Settings area of the Connected App.
Token File
Enter the token file name. It stores the refresh token that is used
to get the access token without authorization.
Timeout (milliseconds)
Output Http Trace Message
Select this option to output the Http interactions on the Studio
console.
Available when For salesforce bulk component is selected.
Advanced settings
Usage
Select this check box if you want to use a proxy. Once selected,
you need to type in the connection parameters in the fields which
appear. These parameters are the host, the port, the username and
the password of the proxy you need to use.
Client ID
Set the ID of the real user to differentiate between those who use
the same account and password to access the salesforce website.
tStatCatcher Statistics
Select this check box to collect the log data at a component level.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not
provided. You can easily find out and add such JARs in the Integration perspective of your
studio. For details, see the section about external modules in the Talend Installation and
Upgrade Guide.
2.
3.
2.
3.
In the Consumer Key and Consumer Secret fields, enter the relevant information.
4.
In the Content field, enter the data to write to Salesforce.com, for example:
Talend
6.
7.
8.
In the Action list, select insert to insert the account name Talend.
9.
10. Click the Edit schema button to open the schema editor.
11. In the right panel, remove all the columns except Name.
12.
Click
2.
Press F6 to run the Job. The Studio console gives the url (in yellow) for OAuth authorization.
3.
Copy the url to the browser's address bar. The Salesforce.com login page appears.
4.
5.
Go to Salesforce.com and check the Account module. You can see that the account name Talend has been
inserted.
tSalesforceGetDeleted
tSalesforceGetDeleted
tSalesforceGetDeleted properties
Component family
Business/Cloud
Function
tSalesforceGetDeleted recovers deleted data from a Salesforce object over a given period of
time.
Purpose
This component can collect the deleted data from a Salesforce object during a specific period
of time.
Basic settings
Use an existing connection
Select this check box and in the Component List click the
relevant connection component to reuse the connection details
you already defined.
When a Job contains the parent Job and the child
Job, Component list presents only the connection
components in the same Job level.
Login Type
Salesforce Webservice URL
Enter the Webservice URL required to connect to the Salesforce
database.
Salesforce Version
Consumer Key
Consumer Secret
Callback Host and Callback Port
Enter your OAuth authentication callback URL. This URL (both host
and port) is defined during the creation of a Connected App and
will be shown in the OAuth Settings area of the Connected App.
Token File
Enter the token file name. It stores the refresh token that is used
to get the access token without authorization.
Timeout (milliseconds)
Module
Start Date
End Date
Advanced settings
Client ID
Set the ID of the real user to differentiate between those who use
the same account and password to access the Salesforce website.
tStatCatcher Statistics
Global Variables
NB_LINE: Indicates the number of rows read by an input component or transferred to an output
component. This is an After variable and it returns an integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list
and choose the variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable
means it functions after the execution of a component.
Usage
This component is typically used as a start component; tSalesforceGetDeleted requires an output
component.
Limitation
n/a
Drop tSalesforceGetDeleted and tLogRow from the Palette onto the design workspace.
2.
Connect the two components together using a Row > Main connection.
Double-click tSalesforceGetDeleted to display its Basic settings view and define the component properties.
2.
In the Salesforce WebService URL field, use the default URL of the Salesforce Web service or enter
the URL you want to access.
3.
In the Username and Password fields, enter your login and password for the Web service.
4.
From the Module list, select the object you want to access, Account in this example.
Click the three-dot button next to the Edit schema field to open the dialog box where you can set the schema
manually.
2.
In the Start Date and End Date fields, enter respectively the start and end dates for collecting the deleted
data using the following date format: yyyy-MM-dd HH:mm:ss. You can collect deleted data over the past
30 days. In this example, we want to recover deleted data over the past 5 days.
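For example (hypothetical dates, assuming the Job runs on 2013-03-06), entering 2013-03-01 00:00:00 as the
Start Date and 2013-03-06 00:00:00 as the End Date would cover the past 5 days.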
Job execution
1.
Double-click tLogRow to display its Basic settings view and define the component properties.
2.
Click Sync columns to retrieve the schema from the preceding component.
3.
In the Mode area, select Vertical to display the results in a tabular form on the console.
4.
Deleted data collected by the tSalesforceGetDeleted component is displayed in a tabular form on the console.
tSalesforceGetServerTimestamp
tSalesforceGetServerTimestamp
tSalesforceGetServerTimestamp properties
Component family
Business/Cloud
Function
Purpose
This component retrieves the current date of the Salesforce server presented in a timestamp
format.
Basic settings
Use an existing connection
Select this check box and in the Component List click the
relevant connection component to reuse the connection details
you already defined.
When a Job contains the parent Job and the child
Job, Component list presents only the connection
components in the same Job level.
Login Type
Salesforce Webservice URL
Enter the Webservice URL required to connect to the Salesforce
database.
Salesforce Version
Callback Host and Callback Port
Enter your OAuth authentication callback URL. This URL (both host
and port) is defined during the creation of a Connected App and
will be shown in the OAuth Settings area of the Connected App.
Token File
Enter the token file name. It stores the refresh token that is used
to get the access token without authorization.
Timeout (milliseconds)
Select this check box if you want to use a proxy server. Once
selected, you need to enter the connection parameters, that is the
host, the port, the username and the password of the proxy you
need to use.
Client ID
Set the ID of the real user to differentiate between those who use
the same account and password to access the salesforce website.
tStatCatcher Statistics
Global Variables
NB_LINE: Indicates the number of rows read by an input component or transferred to an output
component. This is an After variable and it returns an integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list
and choose the variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable
means it functions after the execution of a component.
Usage
Limitation
n/a
Related scenarios
No scenario is available for this component yet.
tSalesforceGetUpdated
tSalesforceGetUpdated
tSalesforceGetUpdated properties
Component family
Business/Cloud
Function
tSalesforceGetUpdated recovers updated data from a Salesforce object over a given period
of time.
Purpose
This component can collect all updated data from a given Salesforce object during a specific
period of time.
Basic settings
Use an existing connection
Select this check box and in the Component List click the
relevant connection component to reuse the connection details
you already defined.
When a Job contains the parent Job and the child
Job, Component list presents only the connection
components in the same Job level.
Login Type
Salesforce Webservice URL
Enter the Webservice URL required to connect to the Salesforce
database.
Salesforce Version
Consumer Key
Consumer Secret
Callback Host and Callback Port
Enter your OAuth authentication callback URL. This URL (both host
and port) is defined during the creation of a Connected App and
will be shown in the OAuth Settings area of the Connected App.
Token File
Enter the token file name. It stores the refresh token that is used
to get the access token without authorization.
Timeout (milliseconds)
Module
Start Date
End Date
Advanced settings
Client ID
Set the ID of the real user to differentiate between those who use
the same account and password to access the Salesforce website.
tStatCatcher Statistics
Global Variables
NB_LINE: Indicates the number of rows read by an input component or transferred to an output
component. This is an After variable and it returns an integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list
and choose the variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable
means it functions after the execution of a component.
Usage
This component is typically used as a start component; tSalesforceGetUpdated requires an output
component.
Limitation
n/a
Related scenarios
No scenario is available for this component yet.
tSalesforceInput
tSalesforceInput
tSalesforceInput Properties
Component family
Business/Cloud
Function
tSalesforceInput connects to an object of a Salesforce database via the relevant Web service.
Purpose
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Use an existing connection
Select this check box and in the Component List click the
relevant connection component to reuse the connection details
you already defined.
When a Job contains the parent Job and the child
Job, Component list presents only the connection
components in the same Job level.
Query mode
Login Type
Salesforce Webservice URL
Enter the Webservice URL required to connect to the Salesforce
database.
Salesforce Version
Consumer Key
Consumer Secret
Callback Host and Callback Port
Enter your OAuth authentication callback URL. This URL (both host
and port) is defined during the creation of a Connected App and
will be shown in the OAuth Settings area of the Connected App.
Token File
Enter the token file name. It stores the refresh token that is used
to get the access token without authorization.
Timeout (milliseconds)
Query condition
Manual input of SOQL query
Select this check box to display the Query field where you can
manually enter the desired query.
Query all records (include deleted records)
Select this check box to query all the records, including the
deletions.
Available when Query is selected from the Query mode list.
Advanced settings
Batch Size
Select this check box if you want to use a proxy server. Once
selected, you need to enter the connection parameters, that is the
host, the port, the username and the password of the proxy you
need to use.
Normalize delimiter (for child relationship)
Characters, strings or regular expressions used to normalize the
data that is collected by queries set on different hierarchical
Salesforce objects.
Available when Query is selected from the Query mode list.
Column name delimiter (for child relationship)
Characters, strings or regular expressions used to separate the
name of the parent object from the name of the child object when
you use a query on the hierarchical relations among the different
Salesforce objects.
Available when Query is selected from the Query mode list.
Use Soap Compression
Output Http Trace Message Select this check box to output the HTTP trace message.
Available when Bulk Query is selected from the Query mode
list.
tStatCatcher Statistics
Client ID
Set the ID of the real user to differentiate between those who use
the same account and password to access the Salesforce website.
NB_LINE: Indicates the number of rows read by an input component or transferred to an output
component. This is an After variable and it returns an integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list
and choose the variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable
means it functions after the execution of a component.
Usage
Limitation
n/a
Drop two tSalesforceInput components and two tLogRow components onto the workspace.
2.
Connect each tSalesforceInput component to a tLogRow component using a Row > Main connection for
each pair.
3.
2.
Enter the Salesforce WebService URL of the database you want to connect to in the corresponding field.
3.
Enter your authentication information in the corresponding Username and Password fields.
4.
Setting the query and the schema for the parent object
1.
2.
Select the Manual input of SOQL Query check box and enter your query scripts in the enabled Query field.
The query scripts you enter should follow the SOQL syntax.
3.
In this example, the IsWon and FiscalYear columns in the query are located in the Opportunity module
specified. The Name column is in a linked module called Account. To return a column from a linked module
the correct syntax is to enter the name of the linked module, followed by the period character, then the name
of the column of interest. Hence, the query required in this example is:
SELECT IsWon, FiscalYear, Account.Name FROM Opportunity.
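In the Query field itself, the statement is typed as a double-quoted Java string, for example (a sketch following
the Studio's usual string convention):
"SELECT IsWon, FiscalYear, Account.Name FROM Opportunity"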
4.
Click the plus button to add a new column for the fields taken from the Name column in the Account module.
5.
2.
Enter the Salesforce WebService URL of the database you want to connect to in the corresponding field.
The query scripts you enter must follow the SOQL syntax.
3.
Enter your authentication information in the corresponding Username and Password fields.
4.
Setting the query and the schema for the child object
1.
2.
Select the Manual input of SOQL Query check box and enter your query scripts in the enabled Query field.
In this example we want to extract the Id and CaseNumber fields from the Case module as well as the Name
fields from the Account module. The query is therefore:
SELECT Id, CaseNumber, Account.Name FROM Case
3.
4.
Click the plus button to add a new column for the fields taken from the Name column in the Account module.
5.
Job execution
1.
Click each tLogRow component and set their component properties in the Basic settings view as desired.
In this example, there is no need to modify the tLogRow settings.
2.
tSalesforceOutput
tSalesforceOutput
tSalesforceOutput Properties
Component family
Business/Cloud
Function
tSalesforceOutput writes in an object of a Salesforce database via the relevant Web service.
Purpose
Basic settings
Property type
Use an existing connection
Select this check box and in the Component List click the
relevant connection component to reuse the connection details
you already defined.
When a Job contains the parent Job and the child
Job, Component list presents only the connection
components in the same Job level.
Login Type
Salesforce Webservice URL
Enter the Webservice URL required to connect to the Salesforce
database.
Salesforce Version
Consumer Key
Consumer Secret
Callback Host and Callback Port
Enter your OAuth authentication callback URL. This URL (both host
and port) is defined during the creation of a Connected App and
will be shown in the OAuth Settings area of the Connected App.
Token File
Enter the token file name. It stores the refresh token that is used
to get the access token without authorization.
Timeout (milliseconds)
Action
Module
Advanced settings
Extended Output
Die on error
This check box is selected by default. Clear the check box to skip
the row on error and complete the process for error-free rows. If
needed, you can retrieve the rows on error via a Row > Reject
link.
The Reject link is available only when you have
deselected the Extended Output and Die on error
check boxes.
If you want to create a file that holds all error logs, click the
three-dot button next to this field and browse to the specified file to set
its access path and its name.
Select this check box if you want to use a proxy server. Once
selected, you need to enter the connection parameters, that is the
host, the port, the username and the password of the proxy you
need to use.
Retrieve inserted ID
tStatCatcher Statistics
Client ID
Set the ID of the real user to differentiate between those who use
the same account and password to access the salesforce website.
Relationship mapping for upsert (for upsert action only)
Click the [+] button to add lines as needed and specify the external
ID fields in the input flow, the lookup relationship fields in the
upsert module, the lookup module as well as the external ID fields
in the lookup module.
Global Variables
NB_LINE: Indicates the number of rows read by an input component or transferred to an output
component. This is an After variable and it returns an integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list
and choose the variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable
means it functions after the execution of a component.
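As an illustration of how an After variable such as NB_LINE can be consumed downstream, the snippet below shows a tJava-style expression; the component label tSalesforceOutput_1 is a hypothetical example and depends on how the component is actually named in your Job:

// Hypothetical tJava code, run after the subjob completes (for example via OnSubjobOk).
// NB_LINE is an After variable, so it is only meaningful once tSalesforceOutput_1 has finished.
Integer nbLine = (Integer) globalMap.get("tSalesforceOutput_1_NB_LINE");
System.out.println("Rows written to Salesforce: " + nbLine);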
Usage
Limitation
n/a
1. Drop tSalesforceInput and tSalesforceOutput from the Palette onto the design workspace.
2. Connect the two components together using a Row > Main link.
1. Double-click tSalesforceInput to display its Basic settings view and define the component properties.
2. In the Salesforce WebService URL field, use the default URL of the Salesforce Web service or enter the URL you want to access, or select the Use an existing connection check box to use an established connection.
3. In the Username and Password fields, enter your login and password for the Web service.
4. Type in your intended query timeout in the Timeout (milliseconds) field. In this example, use the default value.
5. From the Module list, select the object you want to access, Account in this example.
6. Click the three-dot button next to the Edit schema field to open the dialog box where you can set the schema manually.
7. In the Query Condition field, enter the query you want to apply. In this example, we want to retrieve the clients whose names are sForce. To do this, we use the query: name=sForce.
8. For a more advanced query, select the Manual input of SOQL query check box and enter the query manually.
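For instance, a manually entered query could combine several conditions; a purely illustrative sketch (not part of this scenario) might be:
SELECT Id, Name FROM Account WHERE Name = 'sForce' AND BillingCountry = 'USA'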
1. Double-click tSalesforceOutput to display its Basic settings view and define the component properties.
2. In the Salesforce WebService URL field, use the default URL of the Salesforce Web service or enter the URL you want to access.
3. In the Username and Password fields, enter your login and password for the Web service.
4. Type in your intended query timeout in the Timeout (milliseconds) field. In this example, use the default value.
5. From the Action list, select the operation you want to carry out. In this example, we select Delete to delete the sForce account selected in the previous component.
6. From the Module list, select the object you want to access, Account in this example.
1. Drag and drop the following components from the Palette onto the workspace: tFileInputDelimited, tSalesforceOutput and two tLogRow components.
1. Double-click DataToInsert to open its Basic settings view in the Component tab.
2. In the File name/Stream field, type in the path of the source file, for example, E:/salesforceout.csv.
3. In the Header field, type in 1 to retrieve the column names. Keep the default settings for the other fields.
1. Double-click InsertToSalesforce to open its Basic settings view in the Component tab.
1. Double-click DataInserted to open its Basic settings view in the Component tab.
2. In the Mode area, select Table (print values in cells of a table) for a better view.
4. Press F6 to run the Job. Any erroneous data is displayed in the Run view.
As shown above, there are two Call Center ID fields that have incorrect data.
1. Drag and drop the following components from the Palette onto the workspace: tFileInputExcel, tSalesforceInput, tMap and tSalesforceOutput.
1. Double-click excel_source to open its Basic settings view in the Component tab.
2. Click the [...] button next to the File name/Stream field to select the source file.
The content looks like:
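(The screenshot of the source file is not reproduced here; a hypothetical extract matching the three columns defined below, AccountId, LastName and Name, could look like the following, with all values invented for illustration:)
AccountId          LastName   Name
0012000000AbCdE    Smith      Talend
0012000000FgHiJ    Martin     Google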
3. Select the All sheets check box to retrieve the data of the entire Excel file.
4. Enter 1 in the Header field as the first line lists the column names.
5. Click the [...] button next to the Edit schema field to open the schema editor.
6. Click the [+] button to add three columns, namely AccountId, LastName and Name.
7. Click OK to close the editor. Keep the other default settings as they are.
1. Double-click insert_to_contact_module to open its Basic settings view in the Component tab.
3. Select insert in the Action list and Contact in the Module list.
4. Click the [...] button next to Edit schema to open the schema editor.
5. Click the button to copy all the columns from the output table to the input table.
1. Double-click load_salesforce_data to open its Basic settings view in the Component tab.
6. Select the LastName and Name fields from the row1 table and drop them next to their counterparts in the row2 table. This way, data from the Excel file will be checked against their counterparts in the Contact module.
7. Select the LastName and AccountID fields from the row1 table and drop them next to their counterparts in the id table. This way, qualified data from the Excel file will be passed to their counterpart fields in the id table.
Scenario 4: Upserting the Contact module based on mapping relationships with the external IDs in the Account module
2. Rename the two tFixedFlowInput components as external ids to insert and emails to upsert, the two tSalesforceInput components as Contact (in) and Account (in), the two tSalesforceOutput components as Contact (out) and Account (out), and the two tLogRow components as external ids inserted and emails upserted.
4. Link external ids to insert to Account (out) using a Row > Main connection.
5. Link external ids to insert to Account (in) using the OnSubjobOk trigger.
6. Link Account (in) to external ids inserted using a Row > Main connection.
8. Link emails to upsert to Contact (out) using a Row > Main connection.
10. Link Contact (in) to emails upserted using a Row > Main connection.
3. Click the [+] button to add three columns, namely Name, AccountID__c and AccountBizLicense__c, all of the String type. Note that AccountID__c and AccountBizLicense__c are customized fields in the Account module, with the attribute of external ID.
Click OK to close the editor.
4. Select the Use Inline Content (delimited file) check box in the Mode area and enter the data below in the Content box:
Google;US666;C.A.666
Talend;FR888;Paris888
Click the [+] button to add three columns, namely Name, AccountID__c and AccountBizLicense__c, all of the String type.
Click OK to close the editor.
7. Select the Table (print values in cells of a table) check box for a better view of the results.
9. Click the [+] button to add four columns, namely Email, AccountID, AccountBizLicense and LastName, all of the String type.
Click OK to close the editor.
Select the Use Inline Content (delimited file) check box in the Mode area and enter the data below in the Content box:
andy@talend.com;Paris888;FR888;Andy
anderson@talend.com;C.A.666;US666;Anderson
Click the [+] button to add two lines and select AccountBizLicense and AccountID in the list under the Column name of Talend Schema column.
Enter the lookup relationship fields in the Lookup field name column, namely Account and Account__r.
Enter the lookup module name in the Module name column, namely Account.
Enter the external id fields in the External id name column, namely AccountBizLicense__c and AccountID__c, which are the customized fields (with the external id attribute) in the Account module.
Column name of Talend Schema refers to the fields in the schema of the component preceding tSalesforceOutput.
Such columns are intended to match against the external id fields specified in the External id name column, which
are the fields of the lookup module specified in the Module name column.
Lookup field name refers to the lookup relationship fields of the module selected from the Module list in the
Basic settings view. They are intended to establish relationship with the lookup module specified in the Module
name column.
For how to define the lookup relationship fields and how to provide their correct names in the Lookup field name
column, go to the Salesforce website and launch the Salesforce Data Loader application for proper actions and
information.
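Putting the values from this scenario together, the completed Relationship mapping for upsert table reads as follows (pairing derived from the order of the values listed above):
Column name of Talend Schema   Lookup field name   Module name   External id name
AccountBizLicense              Account             Account       AccountBizLicense__c
AccountID                      Account__r          Account       AccountID__c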
Click the [+] button to add two columns, namely LastName and Email, both of the String type.
Click OK to close the editor.
13. Double-click emails upserted to open its Basic settings view.
Select the Table (print values in cells of a table) check box for a better view of the results.
As shown above, the insert and upsert actions have been completed successfully.
tSalesforceOutputBulk
tSalesforceOutputBulk Properties
tSalesforceOutputBulk and tSalesforceBulkExec are used together to output the needed file and then execute the intended actions on the file against your Salesforce.com database. These two steps are fused in the tSalesforceOutputBulkExec component, detailed in a separate section. The interest in having two separate components is that it allows transformations to be carried out before the data loading.
Component family
Business/Cloud
Function
Purpose
Basic settings
File Name
Append: Select this check box to write new data at the end of the existing data. Otherwise, the existing data will be overwritten.
Relationship mapping for upsert (for upsert action only): Click the [+] button to add lines as needed and specify the external ID fields in the input flow, the lookup relationship fields in the upsert module, the lookup module, as well as the external ID fields in the lookup module.
Additionally, the Polymorphic check box must be selected when
and only when polymorphic fields are used for relationship
mapping. For details about the polymorphic fields, search
polymorphic at http://www.salesforce.com/us/developer/docs/
api_asynch/.
Column name of Talend schema: external ID field in the input
flow.
Lookup field name: lookup relationship fields in the upsert
module.
External id name: external ID field in the lookup module.
Polymorphic: select this check box when and only when
polymorphic fields are used for relationship mapping.
Module name: name of the lookup module.
Column name of Talend schema refers to the fields in the schema of the component preceding tSalesforceOutput. Such columns are intended to match against the external id fields specified in the External id name column, which are the fields of the lookup module specified in the Module name column.
NB_LINE: Indicates the number of rows read by an input component or transferred to an output
component. This is an After variable and it returns an integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list
and choose the variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable
means it functions after the execution of a component.
Usage
This component is intended to be used along with the tSalesforceBulkExec component. Used together, they gain performance while feeding or modifying information in Salesforce.com.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not
provided. You can easily find out and add such JARs in the Integration perspective of your
studio. For details, see the section about external modules in the Talend Installation and
Upgrade Guide.
This Job is composed of two steps: preparing data by transformation and processing the transformed data.
Before starting this scenario, you need to prepare the input file that provides the data to be processed by the Job. In this use case, this file is sforcebulk.txt, containing some customer information.
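The exact content of sforcebulk.txt is not given in this guide; a hypothetical extract matching the four-column schema used below (Name, ParentId, Phone and Fax) might look like this, with invented values and a semicolon separator assumed:
Name;ParentId;Phone;Fax
smith;0012000000AbCdE;0102030405;0102030406
brown;0012000000FgHiJ;0607080910;0607080911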
Then to create and execute this Job, operate as follows:
2. Use a Row > Main connection to connect tFileInputDelimited to tMap, and a Row > out1 connection from tMap to tSalesforceOutputBulk.
3. Use a Row > Main connection and a Row > Reject connection to connect tSalesforceBulkExec respectively to the two tLogRow components.
1. Double-click tFileInputDelimited to display its Basic settings view and define the component properties.
2. Next to the File name/Stream field, click the [...] button to browse to the input file you prepared for the scenario, for example, sforcebulk.txt.
3. Click the three-dot button next to the Edit schema field to open the dialog box to set the schema. In this scenario, the schema is made of four columns: Name, ParentId, Phone and Fax.
4. Set the other fields, such as Row Separator and Field Separator, according to the input file to be used by the Job.
1. Double-click the tMap component to open its editor and set the transformation.
2. Drop all columns from the input table to the output table.
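The remaining tMap steps are not preserved in this text. Since the scenario's result notes that all the customer names are written in upper case, the Name output expression was presumably set to an uppercase transformation; in Talend's expression builder this is typically written with the StringHandling routine, for example:
StringHandling.UPCASE(row1.Name)
This expression is a plausible reconstruction based on the Job result, with row1 assumed to be the input flow name.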
1. Double-click tSalesforceOutputBulk to display its Basic settings view and define the component properties.
2. In the File Name field, type in or browse to the directory where you want to store the generated .csv data for bulk processing.
3. Click Sync columns to import the schema from its preceding component.
1. Double-click tSalesforceBulkExec to display its Basic settings view and define the component properties.
2. Use the default URL of the Salesforce Web service or enter the URL you want to access.
3. In the Username and Password fields, enter your username and password for the Web service.
4. In the Bulk file path field, browse to the directory where the .csv file generated by tSalesforceOutputBulk is stored.
5. From the Action list, select the action you want to carry out on the prepared bulk data. In this use case, insert.
6. From the Module list, select the object you want to access, Account in this example.
7. Click the three-dot button next to the Edit schema field to open the dialog box to set the schema. In this example, edit it to conform to the schema defined previously.
1. Double-click tLogRow_1 to display its Basic settings view and define the component properties.
2. Click Sync columns to retrieve the schema from the preceding component.
Job execution
In the tLogRow_1 table, you can read the data inserted into your Salesforce.com.
In the tLogRow_2 table, you can read the data rejected due to incompatibility with the Account objects you have accessed.
All the customer names are written in upper case.
tSalesforceOutputBulkExec
tSalesforceOutputBulkExec Properties
tSalesforceOutputBulkExec fuses the two-step process of tSalesforceOutputBulk and tSalesforceBulkExec: it outputs the needed file and then executes the intended actions on that file against your Salesforce.com database. If transformations need to be carried out before the data loading, use the two separate components instead, each detailed in its own section.
Component family
Business/Cloud
Function
tSalesforceOutputBulkExec executes the intended actions on the .csv bulk data for
Salesforce.com.
Purpose
Basic settings
Use an existing connection: Select this check box and, in the Component List, click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, Component list presents only the connection components in the same Job level.
Login Type
Salesforce Webservice URL: Enter the Webservice URL required to connect to the Salesforce database.
Salesforce Version
Consumer Key
Consumer Secret
Callback Host and Callback Port: Enter your OAuth authentication callback URL. This URL (both host and port) is defined during the creation of a Connected App and will be shown in the OAuth Settings area of the Connected App.
Token File: Enter the token file name. It stores the refresh token that is used to get the access token without authorization.
Bulk file path: Directory where the bulk data you need to process are stored.
Action
Module
Advanced settings
Rows to commit
Bytes to commit
Concurrency mode
Wait time for checking batch state (milliseconds): Specify the wait time for checking whether the batches in a Job have been processed, until all batches are finally processed.
Use Socks Proxy
Select this check box if you want to use a proxy server. In this
case, you should fill in the proxy parameters in the Proxy host,
Proxy port, Proxy username and Proxy password fields which
appear beneath.
Relationship mapping for upsert (for upsert action only): Click the [+] button to add lines as needed and specify the external ID fields in the input flow, the lookup relationship fields in the upsert module, the lookup module, as well as the external ID fields in the lookup module.
Additionally, the Polymorphic check box must be selected when
and only when polymorphic fields are used for relationship
mapping. For details about the polymorphic fields, search
polymorphic at http://www.salesforce.com/us/developer/docs/
api_asynch/.
Column name of Talend schema: external ID field in the input
flow.
Lookup field name: lookup relationship fields in the upsert
module.
External id name: external ID field in the lookup module.
Polymorphic: select this check box when and only when
polymorphic fields are used for relationship mapping.
Usage
This component is mainly used when no particular transformation is required on the data to be
loaded into Salesforce.com.
Limitation
Before starting this scenario, you need to prepare the input file that provides the data to be processed by the Job. In this use case, this file is sforcebulk.txt, containing some customer information.
Then to create and execute this Job, operate as follows:
1. Drop tFileInputDelimited, tSalesforceOutputBulkExec, and tLogRow from the Palette onto the workspace of your studio.
3. Use Row > Main and Row > Reject connections to connect tSalesforceOutputBulkExec respectively to the two tLogRow components.
1. Double-click tFileInputDelimited to display its Basic settings view and define the component properties.
2. Next to the File name/Stream field, click the [...] button to browse to the input file you prepared for the scenario, for example, sforcebulk.txt.
3. Click the three-dot button next to the Edit schema field to open the dialog box where you can set the schema manually. In this scenario, the schema is made of four columns: Name, ParentId, Phone and Fax.
4. Set the other fields, such as Row Separator and Field Separator, according to the input file to be used by the Job.
1. Double-click tSalesforceOutputBulkExec to display its Basic settings view and define the component properties.
2. In the Salesforce WebService URL field, use the default URL of the Salesforce Web service or enter the URL you want to access.
3. In the Username and Password fields, enter your username and password for the Web service.
4. In the Bulk file path field, browse to the directory where you store the bulk .csv data to be processed.
The bulk file to be processed must be in .csv format.
5. From the Action list, select the action you want to carry out on the prepared bulk data. In this use case, insert.
6. From the Module list, select the object you want to access, Account in this example.
7. Click the three-dot button next to the Edit schema field to open the dialog box where you can set the schema manually. In this example, edit it to conform to the schema defined previously.
Job execution
1. Double-click tLogRow_1 to display its Basic settings view and define the component properties.
2. Click Sync columns to retrieve the schema from the preceding component.
In the tLogRow_1 table, you can read the data inserted into your Salesforce.com.
In the tLogRow_2 table, you can read the data rejected due to incompatibility with the Account objects you have accessed.
If you want to transform the input data before submitting them, you need to use tSalesforceOutputBulk and
tSalesforceBulkExec in cooperation to achieve this purpose. For further information on the use of the two
components, see section Scenario: Inserting transformed bulk data into your Salesforce.com.
tSAPBWInput
tSAPBWInput Properties
Component family
Business
Function
tSAPBWInput reads data from an SAP BW database using a JDBC API connection and
extracts fields based on an SQL query.
Purpose
This component executes an SQL query with a strictly defined order which must correspond
to your schema definition. Then it passes on the field list to the next component via a Row >
Main connection.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
JDBC URL
Username
Password
Table Name
Query Type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Guess Query
Advanced settings
Trim all the String/Char columns: Select this check box to remove leading and trailing whitespace from all the String/Char columns.
Trim column
tStatCatcher Statistics: Select this check box to collect log data at the component level.
Usage
This component supports SQL queries for the SAP BW database using a JDBC connection.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not
provided. You can easily find out and add such JARs in the Integration perspective of your
studio. For details, see the section about external modules in the Talend Installation and
Upgrade Guide.
Prior to setting up the Job, make sure the following prerequisites are met:
1. Copy the following .jar files which compose the jdbc4olap driver to your class path:
-activation.jar
-commons-codec.jar
-jdbc4olap.jar
-saaj-api.jar
-saaj-impl.jar
2. Make sure that you have the latest version of the jdbc4olap driver. You can download it from the jdbc4olap download section. For further information about the usage of the jdbc4olap driver, see the jdbc4olap User Guide.
The procedure of this scenario requires four main steps, detailed hereafter:
1. Set up the Job.
2. Set up the JDBC connection to the SAP BW server.
3. Set up a query.
4. Display the fetched data on the console.
1. Drop a tSAPBWInput component and a tLogRow component from the Palette onto the workspace.
2. Connect the tSAPBWInput component and the tLogRow component using a Row > Main connection.
1. Double-click the tSAPBWInput component to open its Basic settings view and define the component properties.
2. Fill the JDBC URL field with the URL of your jdbc4olap server.
Note that the URL displayed above is for demonstration only.
3. Fill the Username and Password fields with your username and password for the DB access authentication.
4. Click the three-dot button next to Edit schema to define the schema to be used.
5. Click the plus button to add new columns to the schema, set the data type for each column, and click OK to save the schema settings.
Set up a query
1. From the Basic settings view of tSAPBWInput, fill the Table Name field with the table name. In this scenario, the table name "Measures" is for demonstration only.
2. Fill the Query area with the query script. In this example, we use:
"SELECT
T1.\"[0D_CO_CODE].[LEVEL01]\" AS company,
T0.\"[Measures].[D68EEPGGHUMSZ92PIJARDZ0KA]\" AS amount
FROM
\"0D_DECU\".\"0D_DECU/PRE_QRY4\".\"[Measures]\" T0,
\"0D_DECU\".\"0D_DECU/PRE_QRY4\".\"[0D_CO_CODE]\" T1 "
Due to the limitations of the supported SQL queries, the query scripts you use must be based on the grammar defined
in the jdbc4olap driver. For further information about this grammar, see jdbc4olap User Guide.
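Outside Talend Studio, the same query can be issued directly through the jdbc4olap driver with plain JDBC code. The sketch below is a minimal illustration only; the driver class name and the URL format are assumptions to be checked against the jdbc4olap User Guide:

// Minimal JDBC sketch; driver class and URL format are assumptions.
import java.sql.*;

public class SapBwQuery {
    public static void main(String[] args) throws Exception {
        // Register the driver (class name assumed; see the jdbc4olap User Guide).
        Class.forName("org.jdbc4olap.jdbc.OlapDriver");
        String url = "jdbc:jdbc4olap:http://sapbw-host:8000/some/xmla/path"; // hypothetical
        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                 "SELECT T1.\"[0D_CO_CODE].[LEVEL01]\" AS company, "
               + "T0.\"[Measures].[D68EEPGGHUMSZ92PIJARDZ0KA]\" AS amount "
               + "FROM \"0D_DECU\".\"0D_DECU/PRE_QRY4\".\"[Measures]\" T0, "
               + "\"0D_DECU\".\"0D_DECU/PRE_QRY4\".\"[0D_CO_CODE]\" T1")) {
            // Print each fetched row, mirroring what tLogRow displays on the console.
            while (rs.next()) {
                System.out.println(rs.getString("company") + " : " + rs.getString("amount"));
            }
        }
    }
}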
1. Double-click the tLogRow component to open its Basic settings view and define the component properties.
2. Click Sync columns to retrieve the schema defined in the preceding component.
The data in the table "Measures" is fetched and displayed on the console.
tSAPCommit
tSAPCommit Properties
This component is closely related to tSAPConnection and tSAPRollback. It usually does not make much sense
to use these components separately in a transaction.
Component family
Business/SAP
Function
tSAPCommit validates the data processed through the Job on the connected server.
Purpose
Using a unique connection, this component commits a global transaction in one go instead of doing
that on every row or every batch and thus provides gain in performance.
Basic settings
SAPConnection Component list: Select the tSAPConnection component in the list if more than one connection is planned for the current Job.
Release Connection
Advanced settings
tStatCatcher Statistics: Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to choose your SAP connection dynamically from multiple connections planned in your Job.
When a dynamic parameter is defined, the SAPConnection Component List box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
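For example, the Code field could hold a context variable such as context.SapConnection (a hypothetical name whose runtime value designates the tSAPConnection component to use), letting the same Job switch between several planned SAP connections without being modified.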
Usage
This component is to be used along with SAP components, especially with tSAPConnection and
tSAPRollback components.
Limitation
n/a
Related scenario
This component is closely related to tSAPConnection and tSAPRollback. It usually does not make much sense
to use one of these without using a tSAPConnection component to open a connection for the current transaction.
For tSAPCommit related scenario, see section Scenario: Inserting data in mother/daughter tables.
tSAPConnection
tSAPConnection properties
Component family
Business
Function
tSAPConnection opens a connection to the SAP system for the current transaction.
Purpose
tSAPConnection allows you to commit a whole Job's data in one go to the SAP system as one transaction.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data is stored centrally.
Connection configuration
Advanced settings
ftp and http based programs: To invoke from the SAP server a function which requires document downloading, select this check box and make sure that SAPGUI has been installed with the SAP system.
If this check box is selected but SAPGUI has not been installed, errors will occur.
This check box will not be available if you select the Use an existing connection check box in the Basic settings tab.
tStatCatcher Statistics: Select this check box to collect log data at the component level.
Usage
Limitation
n/a
Related scenarios
For related scenarios, see section Scenario 1: Retrieving metadata from the SAP system and section Scenario 2:
Reading data in the different schemas of the RFC_READ_TABLE function.
tSAPInput
tSAPInput Properties
Component family
Business
Function
Purpose
tSAPInput allows you to extract data from an SAP system at any level by calling RFC or BAPI functions.
Basic settings
Property type
Use an existing connection: Select this check box and, in the Component List, click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, Component list presents only the connection components in the same Job level.
Connection configuration
FunName
Enter the name of the function you want to use to retrieve data.
Initialize input
Outputs
Advanced settings
ftp and http based programs: To invoke from the SAP server a function which requires document downloading, select this check box and make sure that SAPGUI has been installed with the SAP system.
If this check box is selected but SAPGUI has not been installed, errors will occur.
This check box will not be available if you select the Use an existing connection check box in the Basic settings tab.
Release Connection
tStatCatcher Statistics
Usage
Limitation
n/a
Talend SAP components (tSAPInput and tSAPOutput) as well as the SAP wizard are based on a library validated and provided by SAP (JCo) that allows the user to call functions and retrieve data from the SAP system at Table, RFC or BAPI levels.
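For readers who want to see what such a call looks like at the library level, the sketch below uses the SAP JCo 3 API directly to invoke BAPI_COMPANY_GETDETAIL, outside of any Talend component. It is a minimal illustration, assuming a JCo destination named SAP_SERVER has already been configured:

// Minimal SAP JCo 3 sketch; the destination name "SAP_SERVER" is an assumption.
import com.sap.conn.jco.JCoDestination;
import com.sap.conn.jco.JCoDestinationManager;
import com.sap.conn.jco.JCoFunction;

public class CompanyDetail {
    public static void main(String[] args) throws Exception {
        JCoDestination dest = JCoDestinationManager.getDestination("SAP_SERVER");
        JCoFunction fn = dest.getRepository().getFunction("BAPI_COMPANY_GETDETAIL");
        // COMPANYID is the single input parameter used in this scenario.
        fn.getImportParameterList().setValue("COMPANYID", "000001");
        fn.execute(dest);
        // COMPANY_DETAIL is one of the two output structures of the BAPI.
        System.out.println(fn.getExportParameterList().getStructure("COMPANY_DETAIL"));
    }
}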
This scenario uses the SAP wizard, which leads a user through dialog steps to create an SAP connection and call RFC and BAPI functions. This SAP wizard is available only for users who have subscribed to one of the Talend solutions. Otherwise, you need to drop the tSAPInput component from the Palette and set its basic settings manually.
This scenario uses the SAP wizard to first create a connection to the SAP system, and then call a BAPI function
to retrieve the details of a company from the SAP system. It finally displays in Talend Studio the company details
stored in the SAP system.
The following figure shows the company detail parameters stored in the SAP system that we want to read in Talend Studio using the tSAPInput component.
1. Create a connection to the SAP system using the SAP connection wizard. In this scenario, the SAP connection is called sap and is saved in the Metadata node.
2. Call the BAPI function BAPI_COMPANY_GETDETAIL using the SAP wizard to access the BAPI HTML document stored in the SAP system and see the company details.
3. In the Name filter field, type in BAPI* and click the Search button to display all available BAPI functions.
4. Select BAPI_COMPANY_GETDETAIL to display the schema that describes the company details.
The three-tab view to the right of the wizard displays the metadata of the BAPI_COMPANY_GETDETAIL function and allows you to set the necessary parameters.
The Document view displays the SAP HTML document about the BAPI_COMPANY_GETDETAIL function.
The Parameter view provides information about the input and output parameters required by the
BAPI_COMPANY_GETDETAIL function to return values.
1. In the Parameter view, click the Input tab to list the input parameter(s). In this scenario, there is only one input parameter required by BAPI_COMPANY_GETDETAIL and it is called COMPANYID.
2. In the Parameter view, click the Output tab to list the output parameters returned by BAPI_COMPANY_GETDETAIL. In this scenario, there are two output parameters: COMPANY_DETAIL and RETURN.
3. In the Value column of the COMPANYID line in the first table, enter 000001 to send back company data corresponding to the value 000001.
4. In the Output type list at the bottom of the wizard, select output.table.
5. Click Launch at the bottom of the view to display the value of each single parameter returned by the BAPI_COMPANY_GETDETAIL function.
The sap connection and the new schema BAPI_COMPANY_GETDETAIL display under the SAP Connections node in the Repository tree view.
1. Right-click BAPI_COMPANY_GETDETAIL in the Repository tree view and select Retrieve schema from the contextual menu.
2. In the dialog box that opens, select the schemas you want to retrieve, COMPANY_DETAIL and RETURN in this scenario.
3. Click Next to display the two selected schemas and then Finish to close the dialog box.
The two schemas display under the BAPI_COMPANY_GETDETAIL function in the Repository tree view.
1. In the Repository tree view, drop the SAP connection you already created onto the design workspace to open a dialog box where you can select tSAPConnection from the component list, and click OK to close the dialog box. The tSAPConnection component holding the SAP connection, sap in this example, displays on the design workspace.
2. Double-click tSAPConnection to display the Basic settings view and define the component properties.
If you store connection details in the Metadata node of the Repository tree view, the Repository mode is selected in the Property Type list and the fields that follow are pre-filled. If not, you need to select Built-in as the property type and fill in the connection details manually.
3. In the Repository tree view, expand Metadata and sap in succession and drop RFC_READ_TABLE onto the design workspace to open a component list.
5. Drop tFilterColumns and tLogRow from the Palette onto the design workspace.
8. In the design workspace, double-click tSAPInput to display its Basic settings view and define the component properties.
The basic setting parameters for the tSAPInput component display automatically since the schema is stored in the Metadata node and the component is initialized by the SAP wizard.
9. Select the Use an existing connection check box and then, in the Component List, select the relevant tSAPConnection component, sap in this scenario.
In the Initialize input area, we can see the input parameter needed by the BAPI_COMPANY_GETDETAIL function.
In the Outputs area, we can see all the different schemas of the BAPI_COMPANY_GETDETAIL function, in particular COMPANY_DETAIL, which we want to output.
Job execution
1. In the design workspace, double-click tLogRow to display the Basic settings view and define the component properties. For more information about this component, see section tLogRow.
The tSAPInput component retrieved from the SAP system the metadata of the COMPANY_DETAIL structure
parameter and tLogRow displayed the information on the console.
Talend SAP components (tSAPInput and tSAPOutput) as well as the SAP wizard are based on a library validated and provided by SAP (JCo) that allows the user to call functions and retrieve data from the SAP system at Table, RFC or BAPI levels.
This scenario uses the SAP wizard, which leads a user through dialog steps to create an SAP connection and call RFC and BAPI functions. This SAP wizard is available only for users who have subscribed to one of the Talend solutions. Otherwise, you need to drop the tSAPInput component from the Palette and set its basic settings manually.
This scenario uses the SAP wizard to first create a connection to the SAP system, and then call an RFC function
to directly read from the SAP system a table called SFLIGHT. It finally displays in Talend Studio the structure
of the SFLIGHT table stored in the SAP system.
1. Create a connection to the SAP system using the SAP connection wizard. In this scenario, the SAP connection is called sap.
2. Call the RFC_READ_TABLE RFC function using the SAP wizard to access the table in the SAP system and see its structure.
3. In the Name filter field, type in RFC* and click the Search button to display all available RFC functions.
4. Select RFC_READ_TABLE to display the schema that describes the table structure.
The three-tab view to the right of the wizard displays the metadata of the RFC_READ_TABLE function and allows you to set the necessary parameters.
The Document view displays the SAP HTML document about the RFC_READ_TABLE function.
The Parameter view provides information about the parameters required by the RFC_READ_TABLE function to return parameter values.
In the Parameter view, click the Table tab to show a description of the structure of the different tables of the RFC_READ_TABLE function.
The Test it view allows you to add or delete input parameters according to the called function. In this example, we want to retrieve the structure of the SFLIGHT table and not any data.
3. In the Value column of the QUERY_TABLE line, enter SFLIGHT as the table to query.
4. In the Output type list at the bottom of the view, select output.table.
6. Click Launch at the bottom of the view to display the parameter values returned by the RFC_READ_TABLE function. In this example, the delimiter is ; and the table to read is SFLIGHT.
1. In the Repository tree view, right-click RFC_READ_TABLE and select Retrieve schema from the contextual menu. A dialog box displays.
2. Select in the list the schemas you want to retrieve, DATA, FIELDS and OPTIONS in this example.
3. Click Next to open a new view of the dialog box and display these different schemas.
4. Click Finish to validate your operation and close the dialog box.
The three schemas display under the RFC_READ_TABLE function in the Repository tree view.
1. In the Repository tree view, drop the RFC_READ_TABLE function of the sap connection onto the design workspace to open a dialog box where you can select tSAPInput from the component list, and then click OK to close the dialog box. The tSAPInput component displays on the design workspace.
2. Drop two tLogRow components from the Palette onto the design workspace.
3. Right-click tSAPInput, select Row > row_DATA_1 and click the first tLogRow component.
4. Right-click tSAPInput, select Row > row_FIELDS_1 and click the second tLogRow component.
In this example, we want to retrieve the FIELDS and DATA schemas and put them in two different output flows.
5. In the design workspace, double-click tSAPInput to open the Basic settings view and display the component properties.
The basic setting parameters for the tSAPInput component display automatically since the schema is stored in the Metadata node and the component is initialized by the SAP wizard.
In the Initialize input area, we can see the input parameters necessary for the RFC_READ_TABLE function: the field delimiter ; and the table name SFLIGHT.
In the Outputs area, we can see the different schemas of the SFLIGHT table.
Job execution
1. In the design workspace, double-click each of the two tLogRow components to display the Basic settings view and define the component properties. For more information on the properties of tLogRow, see section tLogRow.
The tSAPInput component retrieves from the SAP system the column names of the SFLIGHT table as well as the
corresponding data. The tLogRow components display the information in a tabular form in the Console.
tSAPOutput
tSAPOutput Properties
Component family
Business
Function
Purpose
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Use an existing connection: Select this check box and, in the Component List, click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, Component list presents only the connection components in the same Job level.
Connection configuration
Advanced settings
FunName
Enter the name of the function you want to use to write data.
Mapping
Set the parameters to select the data to write to the SAP system.
ftp and http based programs: To invoke from the SAP server a function which requires document downloading, select this check box and make sure that SAPGUI has been installed with the SAP system.
If this check box is selected but SAPGUI has not been installed, errors will occur.
This check box will not be available if you select the Use an existing connection check box in the Basic settings tab.
Release Connection
tStatCatcher Statistics
Usage
Limitation
n/a
Related scenario
For related scenarios, see section Scenario 1: Retrieving metadata from the SAP system and section Scenario 2:
Reading data in the different schemas of the RFC_READ_TABLE function.
tSAPRollback
tSAPRollback properties
This component is closely related to tSAPCommit and tSAPConnection. It usually does not make much sense
to use these components separately in a transaction.
Component family
Business/SAP
Function
Purpose
Basic settings
SAPConnection Component list: Select the tSAPConnection component in the list if more than one connection is planned for the current Job.
Release Connection
Clear this check box to continue to use the selected connection once
the component has performed its task.
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your SAP connection dynamically from multiple connections planned in your Job.
When a dynamic parameter is defined, the SAPConnection Component List box in the Basic
settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component is intended to be used along with SAP components, especially with
tSAPConnection and tSAPCommit.
Limitation
n/a
Related scenarios
For tSAPRollback related scenario, see section Scenario: Rollback from inserting data in mother/daughter tables.
tSugarCRMInput
tSugarCRMInput Properties
Component family
Business/Cloud
Function
Purpose
Basic settings
SugarCRM WebService URL
Module
Query condition
Advanced settings
tStatCatcher Statistics: Select this check box to collect log data at the component level.
Global Variables
NB_LINE: Indicates the number of rows read by an input component or transferred to an output
component. This is an After variable and it returns an integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list
and choose the variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable
means it functions after the execution of a component.
Usage
Limitation
n/a
1. Drop a tSugarCRMInput and a tFileOutputExcel component from the Palette onto the workspace.
2. Connect the input component to the output component using a Row > Main connection.
1. Double-click tSugarCRMInput to define the component properties in its Basic settings view.
2. Fill the SugarCRM WebService URL field with the connection information, and the Username and Password fields with your authentication details.
3. Select the Module from the list of modules offered. In this example, Accounts is selected.
The Schema is then automatically set according to the module selected, but you can change it and remove the columns that you do not require in the output.
4. In the Query Condition field, type in the query condition used to filter the data extracted from the CRM. In this example: billing_address_city=Sunnyvale.
Job execution
1. Double-click tFileOutputExcel to define the component properties in its Basic settings view.
2. Set the destination file name as well as the Sheet name, and select the Include header check box.
The filtered data is output in the defined spreadsheet of the specified Excel file.
tSugarCRMOutput
tSugarCRMOutput Properties
Component family
Business/Cloud
Function
Purpose
Basic settings
SugarCRM WebService URL
Module
Action
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Global Variables
NB_LINE: Indicates the number of rows read by an input component or transferred to an output
component. This is an After variable and it returns an integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list
and choose the variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable
means it functions after the execution of a component.
Usage
Limitation
n/a
Related Scenario
No scenario is available for this component yet.
tVtigerCRMInput
tVtigerCRMInput Properties
Component family
Business/VtigerCRM
Function
Purpose
Basic settings
Vtiger Version
Select the version of the Vtiger Web Services you want to use (either Vtiger 5.0 or Vtiger 5.1)
Vtiger 5.0
Server Address
Port
Vtiger Path
Version
Module
Method
Select the relevant method in the list. The method specifies the
action you can carry out on the VtigerCRM module selected.
Vtiger 5.1
Endpoint
Username
Access key
Query condition
Manual input of SQL query: Manually type in your query in the corresponding field.
Advanced settings
tStatCatcher Statistics
Global Variables
NB_LINE: Indicates the number of rows read by an input component or transferred to an output
component. This is an After variable and it returns an integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list
and choose the variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable
means it functions after the execution of a component.
Usage
Limitation
n/a
Related Scenario
No scenario is available for this component yet.
tVtigerCRMOutput
tVtigerCRMOutput Properties
Component family
Business/VtigerCRM
Function
Purpose
Basic settings
Vtiger Version
Select the version of the Vtiger Web Services you want to use (either Vtiger 5.0 or Vtiger 5.1)
Vtiger 5.0
Server Address
Port
Vtiger Path
Version
Module
Method
Select the relevant method in the list. The method specifies the
action you can carry out on the VtigerCRM module selected.
Vtiger 5.1
Endpoint
Username
Access key
Action
Module
Die on error
This check box is cleared by default, meaning to skip the row on error and complete the process for error-free rows.
Advanced settings
tStatCatcher Statistics
Global Variables
NB_LINE: Indicates the number of rows read by an input component or transferred to an output
component. This is an After variable and it returns an integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list
and choose the variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable
means it functions after the execution of a component.
Usage
Limitation
n/a
Related Scenario
No scenario is available for this component yet.
tDB2SCD
tDB2SCD properties
Component family
Databases/DB2
Function
Purpose
tDB2SCD addresses Slowly Changing Dimension needs, regularly reading a source of data and logging the changes into a dedicated SCD table.
Basic settings
Use an existing connection: Select this check box and, in the Component List, click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if you need to share an existing connection between the two levels, for example, to share the connection created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection to be shared in the Basic settings view of the connection component which creates that very database connection.
2. In the child level, use a dedicated connection component to read that registered database connection.
For an example about how to share a database connection across Job levels, see Talend Studio User Guide.
Property type
Host
Port
Database
Table Schema
Username
Password
Table
Schema and Edit schema: A schema is a row description. It defines the number of fields to be processed and passed on to the next component.
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Click Edit Schema to make changes to the schema.
Built-in: The schema is created and stored locally for this component only. Related topic: see Talend Studio User Guide.
SCD Editor
The SCD editor helps to build and configure the data flow for slowly changing dimension outputs.
For more information, see section SCD management methodologies.
Use memory saving Mode: Select this check box to maximize system performance.
Advanced settings
Die on error
This check box is cleared by default, meaning to skip the row on error and to complete the process
for error-free rows.
Additional JDBC parameters: Specify additional connection properties for the DB connection you are creating.
Debug mode: Select this check box to display each step during processing entries in a database.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to choose your database connection dynamically from multiple connections planned in your Job. This feature is useful when you need to access database tables having the same data structure but in different databases, especially when you are working in an environment where you cannot change your Job settings, for example, when your Job has to be deployed and executed independent of Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is selected in the Basic settings view. Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Global Variables
NB_LINE_UPDATED: Indicates the number of rows updated. This is an After variable and it returns an integer.
NB_LINE_INSERTED: Indicates the number of rows inserted. This is an After variable and it returns an integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable
to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable means it functions
after the execution of a component.
Usage
This component is used as an output component. It requires an input component and a Row main link as input.
Limitation
This component requires installation of its related jar files. For more information about the installation of these
missing jar files, see the section describing how to configure the Studio of the Talend Installation and Upgrade
Guide.
Related scenarios
For related topics, see section tMysqlSCD.
tDB2SCDELT
tDB2SCDELT Properties
Component family
Databases/DB2
Function
Purpose
tDB2SCDELT addresses Slowly Changing Dimension needs through SQL queries (server-side processing mode), and
logs the changes into a dedicated DB2 SCD table.
Basic settings
Property type
Use an existing connection: Select this check box and, in the Component List, click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if you need to share an existing connection between the two levels, for example, to share the connection created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection to be shared in the Basic settings view of the connection component which creates that very database connection.
2. In the child level, use a dedicated connection component to read that registered database connection.
For an example about how to share a database connection across Job levels, see Talend Studio User Guide.
Host
Port
Database
Username
Password
Source table
Table
Name of the table to be written. Note that only one table can be written at a time.
Action on table
Schema and Edit schema: A schema is a row description. It defines the number of fields to be processed and passed on to the next component.
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Click Edit Schema to make changes to the schema.
Built-in: The schema is created and stored locally for this component only. Related topic: see
Talend Studio User Guide.
Surrogate Key Creation
Source Keys: Select one or more columns to be used as keys, to ensure the unicity of incoming data.
Source fields value include Null: Select this check box to allow the source columns to have Null values.
The source columns here refer to the fields defined in the SCD type 1 fields and SCD type 2 fields tables.
Use SCD Type 1 fields: Use type 1 if tracking changes is not necessary. SCD Type 1 should be used for typo corrections, for example. Select the columns of the schema that will be checked for changes.
Use SCD Type 2 fields: Use type 2 if changes need to be tracked down. SCD Type 2 should be used to trace updates, for example. Select the columns of the schema that will be checked for changes.
Start date: Adds a column to your SCD schema to hold the start date value. You can select one of the input schema columns as Start Date in the SCD table.
End Date: Adds a column to your SCD schema to hold the end date value for the record. When the record is currently active, the End Date column shows a null value, or you can select Fixed Year value and fill it in with a fictive year to avoid having a null value in the End Date field.
Log Active Status: Adds a column to your SCD schema to hold the true or false status value. This column helps to easily spot the active record.
Log versions: Adds a column to your SCD schema to hold the version number of the record.
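To make the effect of these options concrete, here is a small illustrative SCD type 2 table (hypothetical data and column names) after a customer moves from Paris to London, with the start date, end date, active status and version columns enabled:
SK  CustomerId  City    SCD_START    SCD_END     SCD_ACTIVE  SCD_VERSION
1   C001        Paris   2011-01-01   2012-06-15  false       1
2   C001        London  2012-06-15   null        true        2
The older record is closed with an end date and marked inactive, while the new record carries the current values.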
Advanced settings
Additional JDBC parameters: Specify additional connection properties for the DB connection you are creating. You can set the encoding parameters through this field.
Debug mode: Select this check box to display each step during processing entries in a database.
tStatCatcher Statistics: Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to choose your database
connection dynamically from multiple connections planned in your Job. This feature is useful when you need to access
database tables having the same data structure but in different databases, especially when you are working in an
environment where you cannot change your Job settings, for example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component is used as an output component. It requires an input component and Row main link as input.
Limitation
This component requires installation of its related jar files. For more information about the installation of these missing
jar files, see the section describing how to configure the Studio of the Talend Installation and Upgrade Guide.
Related Scenario
For related topics, see section tDB2SCD and section tMysqlSCD.
tGreenplumSCD
tGreenplumSCD Properties
Component family
Databases/Greenplum
Function
Purpose
tGreenplumSCD addresses Slowly Changing Dimension needs, regularly reading a source of data and logging the changes into a dedicated SCD table.
Use an existing connection: Select this check box and, in the Component List, click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if you need to share an existing connection between the two levels, for example, to share the connection created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection to be shared in the Basic settings view of the connection component which creates that very database connection.
2. In the child level, use a dedicated connection component to read that registered database connection.
For an example about how to share a database connection across Job levels, see Talend Studio User Guide.
Connection type: Select the relevant driver in the list.
Host
Port
Database
Schema
Username
Password
Table
Schema and Edit schema: A schema is a row description. It defines the number of fields to be processed and passed on to the next component.
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Click Edit Schema to make changes to the schema.
Built-in: The schema is created and stored locally for this component only. Related topic: see Talend Studio User Guide.
SCD Editor
The SCD editor helps to build and configure the data flow for slowly changing dimension outputs.
For more information, see section SCD management methodologies.
Use memory saving Mode: Select this check box to maximize system performance.
Source keys include Null: Select this check box to allow the source key columns to have Null values.
Special attention should be paid to the uniqueness of the source key(s) value when this option is selected.
Advanced settings
Die on error: This check box is cleared by default, meaning to skip the row on error and to complete the process for error-free rows.
tStatCatcher Statistics: Select this check box to collect log data at the component level.
Debug mode: Select this check box to display each step during processing entries in a database.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to choose your database
connection dynamically from multiple connections planned in your Job. This feature is useful when you need to
access database tables having the same data structure but in different databases, especially when you are working
in an environment where you cannot change your Job settings, for example, when your Job has to be deployed and
executed independent of Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is selected in the
Basic settings view. Once a dynamic parameter is defined, the Component List box in the Basic settings view
becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Global Variables
NB_LINE_UPDATED: Indicates the number of rows updated. This is an After variable and it returns an integer.
NB_LINE_INSERTED: Indicates the number of rows inserted. This is an After variable and it returns an integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable
to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable means it functions
after the execution of a component.
Usage
This component is used as an output component. It requires an input component and a Row > Main link as input.
Related scenario
For related scenarios, see section tMysqlSCD.
tInformixSCD
tInformixSCD properties
Component family: Databases/Business Intelligence/Informix
Function: tInformixSCD tracks and shows changes made to dedicated Informix SCD tables.
Purpose: tInformixSCD addresses Slowly Changing Dimension transformation needs by regularly reading a data source and listing the modifications in a dedicated SCD table.
Basic settings
Property type
Use an existing connection: Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if you need to share an existing
connection between the two levels, for example, to share the connection created by
the parent Job with the child Job, you have to:
1. In the parent level, register the database connection to be shared in the Basic
settings view of the connection component which creates that very database
connection.
2. In the child level, use a dedicated connection component to read that registered
database connection.
For an example about how to share a database connection across Job levels, see
Talend Studio User Guide.
Host / Port / Database / Schema / Username and Password: Informix database connection details.
Instance: Name of the Informix instance to be used. This information can generally be found in the SQL hosts file.
Table: Name of the table to be written. Note that only one table can be written at a time.
Schema and Edit schema: A schema is a row description. It defines the number of fields to be processed and passed on to the next component.
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Click Edit schema to make changes to the schema.
Built-in: The schema is created and stored locally for this component only. Related topic: see Talend Studio User Guide.
SCD Editor: The SCD editor helps to build and configure the data flow for slowly changing dimension outputs. For more information, see section SCD management methodologies.
Use memory saving mode: Select this check box to improve system performance.
Source keys include Null: Select this check box to allow the source key columns to have Null values. Special attention should be paid to the uniqueness of the source key values when this option is selected.
Use Transaction: Select this check box when the database is configured in NO_LOG mode.
Die on error: This check box is cleared by default, meaning that rows on error are skipped and the process completes for error-free rows.
Advanced settings
tStatCatcher Statistics: Select this check box to collect log data at the component level.
Debug mode: Select this check box to display each step of the process by which data is written in the database.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to choose your database
connection dynamically from multiple connections planned in your Job. This feature is useful when you need to
access database tables having the same data structure but in different databases, especially when you are working
in an environment where you cannot change your Job settings, for example, when your Job has to be deployed
and executed independent of Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is selected in the
Basic settings view. Once a dynamic parameter is defined, the Component List box in the Basic settings view
becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Global Variables
NB_LINE_UPDATED: Indicates the number of rows updated. This is an After variable and it returns an integer.
NB_LINE_INSERTED: Indicates the number of rows inserted. This is an After variable and it returns an integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable
to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable means it functions
after the execution of a component.
Usage
This component is an output component. Consequently, it requires an input component and a connection of the
Row > Main type.
Limitation
This component requires the installation of its related jar files. For more information about installing these missing jar files, see the section describing how to configure the Studio in the Talend Installation and Upgrade Guide.
Related scenario
For a scenario in which tInformixSCD might be used, see section tMysqlSCD.
tIngresSCD
tIngresSCD Properties
Component family: Databases/Ingres
Function and Purpose: tIngresSCD addresses Slowly Changing Dimension needs, regularly reading a source of data and logging the changes into a dedicated SCD table.
Basic settings
Use an existing connection: Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if you need to share an
existing connection between the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection to be shared in the Basic
settings view of the connection component which creates that very database
connection.
2. In the child level, use a dedicated connection component to read that registered
database connection.
For an example about how to share a database connection across Job levels, see
Talend Studio User Guide.
Property type
Server / Port / Database / Username and Password: Ingres database connection details.
Table: Name of the table to be written. Note that only one table can be written at a time.
Schema and Edit schema: A schema is a row description. It defines the number of fields to be processed and passed on to the next component.
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Click Edit schema to make changes to the schema.
Built-in: The schema is created and stored locally for this component only. Related topic: see Talend Studio User Guide.
SCD Editor: The SCD editor helps to build and configure the data flow for slowly changing dimension outputs. For more information, see section SCD management methodologies.
Use memory saving mode: Select this check box to maximize system performance.
Source keys include Null: Select this check box to allow the source key columns to have Null values. Special attention should be paid to the uniqueness of the source key values when this option is selected.
Die on error: This check box is cleared by default, meaning that rows on error are skipped and the process completes for error-free rows.
Debug mode: Select this check box to display each step during the processing of entries in the database.
Global Variables
NB_LINE_UPDATED: Indicates the number of rows updated. This is an After variable and it returns an integer.
NB_LINE_INSERTED: Indicates the number of rows inserted. This is an After variable and it returns an
integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the
variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable means it
functions after the execution of a component.
Usage
This component is used as an output component. It requires an input component and a Row > Main link as input.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided. You can
easily find out and add such JARs in the Integration perspective of your studio. For details, see the section
about external modules in the Talend Installation and Upgrade Guide.
Related scenario
For related scenarios, see section tMysqlSCD.
tMSSqlSCD
tMSSqlSCD Properties
Component family: Databases/MSSQL Server
Function and Purpose: tMSSqlSCD addresses Slowly Changing Dimension needs, regularly reading a source of data and logging the changes into a dedicated SCD table.
Basic settings
Property type
Use an existing connection: Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if you need to share an existing
connection between the two levels, for example, to share the connection created by
the parent Job with the child Job, you have to:
1. In the parent level, register the database connection to be shared in the Basic
settings view of the connection component which creates that very database
connection.
2. In the child level, use a dedicated connection component to read that registered
database connection.
For an example about how to share a database connection across Job levels, see
Talend Studio User Guide.
Server / Port / Schema / Database / Username and Password: MS SQL Server connection details.
Table: Name of the table to be written. Note that only one table can be written at a time.
Schema and Edit schema: A schema is a row description. It defines the number of fields to be processed and passed on to the next component.
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Click Edit schema to make changes to the schema.
Built-in: The schema is created and stored locally for this component only. Related topic: see Talend Studio User Guide.
SCD Editor: The SCD editor helps to build and configure the data flow for slowly changing dimension outputs. For more information, see section SCD management methodologies.
Use memory saving mode: Select this check box to maximize system performance.
Source keys include Null: Select this check box to allow the source key columns to have Null values. Special attention should be paid to the uniqueness of the source key values when this option is selected.
Die on error: This check box is cleared by default, meaning that rows on error are skipped and the process completes for error-free rows.
Advanced settings
Additional JDBC parameters: Specify additional connection properties for the DB connection you are creating. This option is not available if you have selected the Use an existing connection check box in the Basic settings.
tStatCatcher Statistics: Select this check box to collect log data at the component level.
Debug mode: Select this check box to display each step during the processing of entries in the database.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to choose your database
connection dynamically from multiple connections planned in your Job. This feature is useful when you need to
access database tables having the same data structure but in different databases, especially when you are working
in an environment where you cannot change your Job settings, for example, when your Job has to be deployed
and executed independent of Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is selected in the
Basic settings view. Once a dynamic parameter is defined, the Component List box in the Basic settings view
becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Global Variables
NB_LINE_UPDATED: Indicates the number of rows updated. This is an After variable and it returns an integer.
NB_LINE_INSERTED: Indicates the number of rows inserted. This is an After variable and it returns an integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the
variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable means it functions
after the execution of a component.
Usage
This component is used as an output component. It requires an input component and a Row > Main link as input.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided. You can
easily find out and add such JARs in the Integration perspective of your studio. For details, see the section about
external modules in the Talend Installation and Upgrade Guide.
Related scenario
For related topics, see section tMysqlSCD.
tMysqlSCD
tMysqlSCD Properties
Component family: Databases/MySQL
Function and Purpose: tMysqlSCD addresses Slowly Changing Dimension needs, regularly reading a source of data and logging the changes into a dedicated SCD table.
Basic settings
Property type
Use an existing connection: Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if you need to share an existing
connection between the two levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection to be shared in the Basic settings
view of the connection component which creates that very database connection.
2. In the child level, use a dedicated connection component to read that registered
database connection.
For an example about how to share a database connection across Job levels, see Talend
Studio User Guide.
DB Version
Host / Port / Database / Username and Password: MySQL connection details.
Table: Name of the table to be written. Note that only one table can be written at a time.
Action on table: On the table defined, you can perform one of the following operations:
None: No operation is carried out.
Create a table: The table does not exist and gets created.
Create a table if not exists: The table is created if it does not exist.
Schema and Edit schema: A schema is a row description. It defines the number of fields to be processed and passed on to the next component.
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Click Edit schema to make changes to the schema.
Built-in: The schema is created and stored locally for this component only. Related topic: see Talend Studio User Guide.
SCD Editor: The SCD editor helps to build and configure the data flow for slowly changing dimension outputs. For more information, see section SCD management methodologies.
Use memory saving mode: Select this check box to maximize system performance.
Source keys include Null: Select this check box to allow the source key columns to have Null values. Special attention should be paid to the uniqueness of the source key values when this option is selected.
Die on error: This check box is cleared by default, meaning that rows on error are skipped and the process completes for error-free rows.
Advanced settings
Additional JDBC parameters: Specify additional connection properties for the DB connection you are creating. This option is not available if you have selected the Use an existing connection check box in the Basic settings.
tStatCatcher Statistics: Select this check box to collect log data at the component level.
Debug mode: Select this check box to display each step during the processing of entries in the database.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to choose your database
connection dynamically from multiple connections planned in your Job. This feature is useful when you need to access
database tables having the same data structure but in different databases, especially when you are working in an
environment where you cannot change your Job settings, for example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Global Variables
NB_LINE_UPDATED: Indicates the number of rows updated. This is an After variable and it returns an integer.
NB_LINE_INSERTED: Indicates the number of rows inserted. This is an After variable and it returns an integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable
to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable means it functions
after the execution of a component.
Usage
This component is used as an output component. It requires an input component and a Row > Main link as input.
SCD keys
You must choose one or more source key columns from the incoming data to ensure its unicity.
You must set one surrogate key column in the dimension table and map it to an input column in the source table.
The value of the surrogate key links a record in the source to a record in the dimension table. The editor uses
this mapping to locate the record in the dimension table and to determine whether a record is new or changing.
The surrogate key is typically the primary key in the source, but it can be an alternate key as long as it uniquely
identifies a record and its value does not change.
Source keys: Drag one or more columns from the Unused panel to the Source keys panel to be used as the key(s)
that ensure the unicity of the incoming data.
Surrogate keys: Set the column where the generated surrogate key will be stored. A surrogate key can be generated
based on a method selected on the Creation list.
Creation: Select one of the methods below to be used for the key generation:
Auto increment: auto-incremental key.
Input field: key is provided in an input field.
When selected, you can drag the appropriate field from the Unused panel to the complement field.
Routine: from the complement field, you can press Ctrl + Space to display the autocompletion list and select the appropriate routine.
Table max +1: the maximum value from the SCD table is incremented to create a surrogate key.
DB Sequence: from the complement field, you can enter the name of the existing database sequence that will automatically increment the column indicated in the name field. This option is only available through the SCD Editor of the tOracleSCD component.
The SQL sketch below illustrates what the last two options correspond to.
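This is a hedged sketch only; the component does not expose these statements, and dim_person, SK1 and person_seq are illustrative names:

    -- Table max +1: the next surrogate key is derived from the SCD table itself.
    SELECT COALESCE(MAX(SK1), 0) + 1 AS next_sk
    FROM   dim_person;

    -- DB Sequence (tOracleSCD only): an existing Oracle sequence supplies the key.
    CREATE SEQUENCE person_seq START WITH 1 INCREMENT BY 1;
    SELECT person_seq.NEXTVAL FROM dual;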
Scenario: Tracking changes using Slowly Changing Dimensions (type 0 through type 3)
The source delimited file contains various personal details including firstname, lastname, address, city, company, age, and status. An id column helps ensure the unicity of the data.
We want any change in the marital status to overwrite the existing old status record. This type of change is
equivalent to an SCD Type 1.
We want to insert a new record in the dimensional table with a separate key each time a person changes his/her
company. This type of change is equivalent to an SCD Type 2.
We want to track only the previous city and previous address of a person. This type of change is equivalent to
an SCD Type 3.
To realize this kind of scenario, it is better to divide it into three main steps: defining the main flow of the Job,
setting up the SCD editor, and finally creating the relevant SCD table in the database.
1. Drop the following components from the Palette onto the design workspace: a tMysqlConnection, a tFileInputDelimited, a tMysqlSCD, a tMysqlCommit, and two tLogRow components.
2. Connect the tFileInputDelimited, the first tLogRow, and the tMysqlSCD using Row > Main links. This is the main flow of your Job.
3. Connect the tMysqlConnection to the tFileInputDelimited and the tMysqlSCD to the tMysqlCommit using the OnComponentOk trigger.
4. Connect the tMysqlSCD to the second tLogRow using the Row > Rejects link. Two columns, errorCode and errorMessage, are added to the schema. This connection collects error information.
1. In the design workspace, double-click tMysqlConnection to display its Basic settings view and set the database connection details. The tMysqlConnection component should be used to avoid setting the same DB connection several times when multiple DB components are used. In this scenario, we want to connect to the SCD table where changes in the source delimited file will be tracked down.
For Talend Open Studio for Big Data, the Property type, Schema and Query Type of components are always Built-in. For further information about how to edit a Built-in schema, see Talend Studio User Guide.
2. In the design workspace, double-click tFileInputDelimited to display its Basic settings view.
3. Click the three-dot button next to the File Name field to select the path to the source delimited file, dataset.csv in this scenario, which contains the personal details.
4. Define the row and field separators used in the source file. The File Name, Row separator, and Field separator settings are mandatory.
5. Click Edit schema to describe the data structure of the source delimited file.
6. In this scenario, the source schema is made of eight columns: id, firstName, lastName, address, city, company, age, and status.
7. Define the basic settings for the first tLogRow in order to view the content of the source file, with varying attributes in cells of a table, on the console before it is processed through the SCD component.
1. In the design workspace, click the tMysqlSCD component and select the Component tab to define its basic settings.
2. In the Basic settings view, select the Use an existing connection check box to reuse the connection details defined in the tMysqlConnection properties.
3. In the Table field, enter the table name to be used to track changes.
4. If needed, click Sync columns to retrieve the output data structure from the tFileInputDelimited.
5. Select the relevant connection from the Component list if more than one connection exists.
6. Define the basic settings of the second tLogRow in order to view reject information in cells of a table.
1. Double-click the tMysqlSCD component in the design workspace, or click the three-dot button next to SCD Editor in the component's Basic settings view, to open the SCD editor and build the data flow for the SCD outputs.
All the columns from the preceding component are displayed in the Unused panel of the SCD editor. All the other panels in the SCD editor are empty.
2. From the Unused list, drop the id column to the Source keys panel to use it as the key that ensures the unicity of the incoming data.
3. In the Surrogate keys panel, enter a name for the surrogate key in the Name field, SK1 in this scenario.
4. From the Creation list, select the method to be used for the surrogate key generation, Auto-increment in this scenario.
5. From the Unused list, drop the firstname and lastname columns to the Type 0 panel, as changes in these two columns do not interest us.
6. Drop the status column to the Type 1 panel. The new value will overwrite the old value.
7. Drop the company column to the Type 2 panel. Each time a person changes his/her company, a new record will be inserted in the dimensional table with a separate key.
In the Versioning area:
- Define the start and end columns of your SCD table that will hold the start and end date values. The end date is null for current records until a change is detected; then the end date is filled in and a new record is added with no end date. In this scenario, we select Fixed Year Value for the end column and fill in a fictive year to avoid having a null value in the end date field.
- Select the version check box to hold the version number of the record.
- Select the active check box to indicate the column that will hold the True or False status: True for the current active record and False for the modified record.
8. Drop the address and city columns to the Type 3 panel to track only the previous values of the address and city.
For more information about SCD types, see section SCD management methodologies. The sketch below shows one plausible shape for the resulting dimension table.
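This is a hedged sketch, not DDL generated by the component: the source columns and the SK1 name come from the scenario, while the scd_* and previous_* column names and all lengths are illustrative assumptions:

    CREATE TABLE scd_person (
      SK1              INT AUTO_INCREMENT,  -- surrogate key (Auto-increment in the editor)
      id               INT NOT NULL,        -- source key ensuring unicity
      firstName        VARCHAR(50),         -- Type 0: changes ignored
      lastName         VARCHAR(50),         -- Type 0: changes ignored
      status           VARCHAR(20),         -- Type 1: new value overwrites the old one
      company          VARCHAR(20),         -- Type 2: a change inserts a new row
      address          VARCHAR(100),        -- Type 3: current value
      previous_address VARCHAR(100),        -- Type 3: previous value only
      city             VARCHAR(50),         -- Type 3: current value
      previous_city    VARCHAR(50),         -- Type 3: previous value only
      scd_start        DATE,                -- versioning: record start date
      scd_end          DATE,                -- Fixed Year Value, e.g. 2999-12-31, instead of NULL
      scd_version      INT,                 -- versioning: version number
      scd_active       BOOLEAN,             -- True for the current active record
      PRIMARY KEY (SK1)
    );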
9. Click Edit schema to view the input and output data structures.
The SCD output schema should include the SCD-specific columns defined in the SCD editor to hold standard log information.
If you adjust any of the input schema definitions, you need to check, and reconfigure if necessary, the output flow
definitions in the SCD editor to ensure that the output data structure is properly updated.
In the Basic settings view of the tMysqlSCD component, select Create table if not exists from the Action on table list to avoid creating and defining the SCD table manually.
Job execution
Save your Job and press F6 to execute it.
The console shows the content of the input delimited file, and your SCD table is created in your database,
containing the initial dataset.
Janet gets divorced and moves to Adelanto at 355 Golf Rd. She works at Greenwood.
Adam gets married and moves to Belmont at 2505 Alisson ct. He works at Scoop.
Martin gets a new job at Phillips and Brothers.
Update the delimited file with the above information and press F6 to run your Job.
The console shows the updated personal information and the rejected data, and the SCD table shows the history of valid changes made to the input file along with the status and version number. Because the name of Martin's new company exceeds the length of the company column defined in the schema, this change is directed to the reject flow instead of being logged in the SCD table. The sketch below illustrates the resulting row history for Janet.
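As a hedged illustration, querying the hypothetical scd_person table sketched earlier for Janet's records after the second run would return something of this shape; the exact dates and prior values depend on your data:

    SELECT SK1, status, company, city, previous_city,
           scd_version, scd_active, scd_start, scd_end
    FROM   scd_person
    WHERE  firstName = 'Janet';

    -- Expected shape of the result:
    --  * her original row, now closed: scd_version = 1, scd_active = false,
    --    scd_end filled in, still carrying her former company (Type 2 history);
    --  * a new row: scd_version = 2, scd_active = true, company = 'Greenwood',
    --    city = 'Adelanto', previous_city holding her former city (Type 3),
    --    and the marital status overwritten in place (Type 1).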
tMysqlSCDELT
tMysqlSCDELT Properties
Component family: Databases/MySQL
Function and Purpose: tMysqlSCDELT addresses Slowly Changing Dimension needs through SQL queries (server-side processing mode) and logs the changes into a dedicated MySQL SCD table.
Basic settings
Property type
DB Version
Use an existing connection: Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if you need to share an
existing connection between the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection to be shared in the Basic
settings view of the connection component which creates that very database
connection.
2. In the child level, use a dedicated connection component to read that
registered database connection.
For an example about how to share a database connection across Job levels, see
Talend Studio User Guide.
Host / Port / Database / Username and Password: MySQL connection details.
Source table: Name of the input table.
Table: Name of the table to be written. Note that only one table can be written at a time.
Action on table
Schema and Edit schema: A schema is a row description. It defines the number of fields to be processed and passed on to the next component.
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Creation
Source Keys: Select one or more columns to be used as keys, to ensure the unicity of the incoming data.
Use SCD Type 1 fields: Use Type 1 if tracking changes is not necessary. SCD Type 1 should be used for corrections of typos, for example. Select the columns of the schema that will be checked for changes.
Use SCD Type 2 fields: Use Type 2 if changes need to be tracked down. SCD Type 2 should be used to trace updates, for example. Select the columns of the schema that will be checked for changes.
Start date: Adds a column to your SCD schema to hold the start date value. You can select one of the input schema columns as Start Date in the SCD table.
End Date: Adds a column to your SCD schema to hold the end date value for the record. When the record is currently active, the End Date column shows a null value, or you can select Fixed Year value and fill it in with a fictive year to avoid having a null value in the End Date field.
Log Active Status: Adds a column to your SCD schema to hold the true or false status value. This column helps to easily spot the active record.
Log versions: Adds a column to your SCD schema to hold the version number of the record.
The sketch following this table illustrates the server-side SQL pattern these settings correspond to.
Advanced settings
Debug mode: Select this check box to display each step during the processing of entries in the database.
tStatCatcher Statistics: Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to choose your
database connection dynamically from multiple connections planned in your Job. This feature is useful when
you need to access database tables having the same data structure but in different databases, especially when
you are working in an environment where you cannot change your Job settings, for example, when your Job
has to be deployed and executed independent of Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is selected in
the Basic settings view. Once a dynamic parameter is defined, the Component List box in the Basic settings
view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component is used as an output component. It requires an input component and a Row > Main link as input.
Related Scenario
For related topics, see: section tMysqlSCD and section Scenario: Tracking changes using Slowly Changing
Dimensions (type 0 through type 3).
tNetezzaSCD
tNetezzaSCD Properties
Component family: Databases/Netezza
Function and Purpose: tNetezzaSCD addresses Slowly Changing Dimension needs, regularly reading a source of data and logging the changes into a dedicated SCD table.
Basic settings
Property type
Use an existing connection: Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if you need to share an
existing connection between the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection to be shared in the Basic
settings view of the connection component which creates that very database
connection.
2. In the child level, use a dedicated connection component to read that registered
database connection.
For an example about how to share a database connection across Job levels, see
Talend Studio User Guide.
Host / Port / Database / Username and Password: Netezza connection details.
Table: Name of the table to be written. Note that only one table can be written at a time.
Action on table
Schema and Edit schema: A schema is a row description. It defines the number of fields to be processed and passed on to the next component.
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Click Edit schema to make changes to the schema.
Built-in: The schema is created and stored locally for this component only. Related topic: see Talend Studio User Guide.
SCD Editor: The SCD editor helps to build and configure the data flow for slowly changing dimension outputs. For more information, see section SCD management methodologies.
Use memory saving mode: Select this check box to maximize system performance.
Source keys include Null: Select this check box to allow the source key columns to have Null values. Special attention should be paid to the uniqueness of the source key values when this option is selected.
Die on error: This check box is cleared by default, meaning that rows on error are skipped and the process completes for error-free rows.
Advanced settings
Additional JDBC parameters: Specify additional connection properties for the DB connection you are creating. This option is not available if you have selected the Use an existing connection check box in the Basic settings. You can press Ctrl+Space to access a list of predefined global variables.
tStatCatcher Statistics: Select this check box to collect log data at the component level.
Debug mode: Select this check box to display each step during the processing of entries in the database.
Dynamic settings: Click the [+] button to add a row in the table and fill the Code field with a context variable to choose your
database connection dynamically from multiple connections planned in your Job. This feature is useful when you
need to access database tables having the same data structure but in different databases, especially when you are
working in an environment where you cannot change your Job settings, for example, when your Job has to be
deployed and executed independent of Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is selected in
the Basic settings view. Once a dynamic parameter is defined, the Component List box in the Basic settings
view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Global Variables
NB_LINE_UPDATED: Indicates the number of rows updated. This is an After variable and it returns an integer.
NB_LINE_INSERTED: Indicates the number of rows inserted. This is an After variable and it returns an integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the
variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable means it functions
after the execution of a component.
Usage
This component is used as an output component. It requires an input component and a Row > Main link as input.
Limitation
The nzjdbc.jar needs to be installed separately. For details, see the section about external modules in Talend
Installation and Upgrade Guide.
Related scenario
For related scenarios, see section tMysqlSCD.
tOracleSCD
tOracleSCD Properties
Component family: Databases/Oracle
Function and Purpose: tOracleSCD addresses Slowly Changing Dimension needs, regularly reading a source of data and logging the changes into a dedicated SCD table.
Basic settings
Use an existing connection: Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if you need to share an existing
connection between the two levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection to be shared in the Basic settings
view of the connection component which creates that very database connection.
2. In the child level, use a dedicated connection component to read that registered database
connection.
For an example about how to share a database connection across Job levels, see Talend
Studio User Guide.
Connection type
DB Version
Host / Port / Database / Schema / Username and Password: Oracle connection details.
Table: Name of the table to be written. Note that only one table can be written at a time.
Action on table
Schema and Edit schema: A schema is a row description. It defines the number of fields to be processed and passed on to the next component.
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Click Edit schema to make changes to the schema.
Built-in: The schema is created and stored locally for this component only. Related topic: see Talend Studio User Guide.
SCD Editor: The SCD editor helps to build and configure the data flow for slowly changing dimension outputs. For more information, see section SCD management methodologies.
Additional JDBC parameters: Specify additional connection properties for the DB connection you are creating. This option is not available if you have selected the Use an existing connection check box in the Basic settings.
tStatCatcher Statistics: Select this check box to collect log data at the component level.
Debug mode: Select this check box to display each step during the processing of entries in the database.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to choose your database
connection dynamically from multiple connections planned in your Job. This feature is useful when you need to
access database tables having the same data structure but in different databases, especially when you are working
in an environment where you cannot change your Job settings, for example, when your Job has to be deployed and
executed independent of Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is selected in the
Basic settings view. Once a dynamic parameter is defined, the Component List box in the Basic settings view
becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Global Variables
NB_LINE_UPDATED: Indicates the number of rows updated. This is an After variable and it returns an integer.
NB_LINE_INSERTED: Indicates the number of rows inserted. This is an After variable and it returns an integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable
to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable means it functions
after the execution of a component.
Usage
This component is used as an output component. It requires an input component and a Row > Main link as input.
Related scenario
For related scenarios, see section tMysqlSCD.
tOracleSCDELT
tOracleSCDELT Properties
Component family: Databases/Oracle
Function and Purpose: tOracleSCDELT addresses Slowly Changing Dimension needs through SQL queries (server-side processing mode) and logs the changes into a dedicated Oracle SCD table.
Basic settings
DB Version
Host / Port / Database
Username and Password: User authentication data for a dedicated database.
Source table: Name of the input table.
Table: Name of the table to be written. Note that only one table can be written at a time.
Action on table
Clear table: The table content is deleted; you have the possibility to roll back the operation.
Truncate table: The table content is deleted; you do not have the possibility to roll back the operation. The sketch below shows where the difference comes from.
Schema and Edit schema: A schema is a row description. It defines the number of fields to be processed and passed on to the next component.
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Click Edit schema to make changes to the schema.
Built-in: The schema is created and stored locally for this component only. Related topic: see Talend Studio User Guide.
Surrogate Key
Creation
Source Keys: Select one or more columns to be used as keys, to ensure the unicity of the incoming data.
Source fields value include Null: Select this check box to allow the source columns to have Null values. The source columns here refer to the fields defined in the SCD type 1 fields and SCD type 2 fields tables.
Use SCD Type 1 fields: Use Type 1 if tracking changes is not necessary. SCD Type 1 should be used for corrections of typos, for example. Select the columns of the schema that will be checked for changes.
Use SCD Type 2 fields: Use Type 2 if changes need to be tracked down. SCD Type 2 should be used to trace updates, for example. Select the columns of the schema that will be checked for changes.
Start date: Adds a column to your SCD schema to hold the start date value. You can select one of the input schema columns as Start Date in the SCD table.
End Date: Adds a column to your SCD schema to hold the end date value for the record. When the record is currently active, the End Date column shows a null value, or you can select Fixed Year value and fill it in with a fictive year to avoid having a null value in the End Date field.
Log Active Status: Adds a column to your SCD schema to hold the true or false status value. This column helps to easily spot the active record.
Log versions: Adds a column to your SCD schema to hold the version number of the record.
Advanced settings
Additional JDBC parameters: Specify additional connection properties for the DB connection you are creating. This option is not available if you have selected the Use an existing connection check box in the Basic settings.
Debug mode: Select this check box to display each step during the processing of entries in the database.
tStatCatcher Statistics: Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to choose your database
connection dynamically from multiple connections planned in your Job. This feature is useful when you need to
access database tables having the same data structure but in different databases, especially when you are working
in an environment where you cannot change your Job settings, for example, when your Job has to be deployed and
executed independent of Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component is used as an output component. It requires an input component and a Row > Main link as input.
Related Scenario
For related topics, see section tOracleSCD and section tMysqlSCD.
tPaloCheckElements
tPaloCheckElements Properties
Component family: Business Intelligence/Cube OLAP/Palo
Function: This component checks whether the elements present in an incoming data flow exist in a given cube.
Purpose: This component can be used along with tPaloOutputMulti. It checks if the elements from the input stream exist in the given cube before writing them. It can also define a default value to be used for nonexistent elements.
Basic settings
Use an existing connection: Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, the Component list presents only the connection components in the same Job level.
Connection configuration (unavailable when using an existing connection):
Host Name
Server Port
Database
Cube: Type in the name of the cube in which the data should be written.
On element error
Advanced settings
tStatCatcher Statistics: Select this check box to collect log data at the component level.
Connections
Usage
Limitation
Related scenario
For a related scenario, see section Scenario 2: Rejecting inflow data when the elements to be written do not exist
in a given cube.
tPaloConnection
tPaloConnection Properties
Component family: Business Intelligence/Cube OLAP/Palo
Function: This component opens a connection to a Palo Server and keeps it open throughout the duration of the process it is required for. Every other Palo component used in the process is able to use this connection.
Purpose: This component allows other components involved in a process to share its connection to a Palo server for the duration of the process.
Basic settings
Host Name
Server Port
Advanced settings
tStatCatcher Statistics: Select this check box to collect log data at the component level.
Connections
Usage
This component is used along with Palo components to offer a shared connection to a Palo server.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For
details, see the section about external modules in the Talend Installation and Upgrade Guide.
Related scenario
For related scenarios, see section Scenario: Creating a dimension with elements.
tPaloCube
tPaloCube Properties
Component family: Business Intelligence/Cube OLAP/Palo
Function
This component creates, deletes or clears Palo cubes from existing dimensions in a Palo database.
Purpose
Basic settings
Use an existing connection: Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, the Component list presents only the connection components in the same Job level.
Connection configuration (unavailable when using an existing connection):
Host Name
Server Port
Database
Cube: Type in the name of the cube where the operation is to take place.
Cube type: From the drop-down list, select the type of cube on which the operation is to be carried out.
Action on cube: Select the operation you want to carry out on the cube defined:
- Create cube: the cube does not exist and will be created.
- Create cube if not exists: the cube is created if it does not exist.
- Delete cube if exists and create: the cube is deleted if it already exists and a new one will be created.
- Delete cube: the cube is deleted from the database.
- Clear cube: the data is cleared from the cube.
Dimension list
Advanced settings
tStatCatcher Statistics: Select this check box to collect log data at the component level.
Global Variables
CUBENAME: Indicates the name of the cube processed. This is an After variable and it returns
a string.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and
choose the variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable
means it functions after the execution of a component.
Connections
Usage
Can be used as a standalone component for dynamic cube creation with a defined dimension list.
Limitation
The cube creation process does not create dimensions from scratch, so the dimensions to be used
in the cube must be created beforehand.
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For
details, see the section about external modules in the Talend Installation and Upgrade Guide.
2. Double-click tPaloCube to open its Basic settings view.
3. In the Host name field, type in the host name or the IP address of the host server, localhost for this example.
4. In the Server Port field, type in the listening port number of the Palo server. In this scenario, it is 7777.
5. In the Username field and the Password field, type in the authentication information. In this example, both of them are admin.
6. In the Database field, type in the name of the database in which you want to create the cube, Biker in this example.
7. In the Cube field, type in the name you want to use for the cube to be created, for example, bikerTalend.
8. In the Cube type field, select the Normal type from the drop-down list for the cube to be created, meaning this cube will be normal and default.
9. In the Action on cube field, select the action to be performed. In this scenario, select Create cube.
10. Under the Dimension list table, click the plus button twice to add two rows to the table.
11. In the Dimension list table, type in the name for each newly added row to replace the default row name. In this scenario, type in Months for the first row and Products for the second. These two dimensions already exist in the Biker database where the new cube will be created.
Job execution
Press F6 to run the Job.
A new cube has been created in the Biker database and the two dimensions are added into this cube.
tPaloCubeList
tPaloCubeList Properties
Component family: Business Intelligence/Cube OLAP/Palo
Function: This component retrieves a list of cube details from the given Palo database.
Purpose: This component lists cube names, cube types, the number of assigned dimensions, and the number of filled cells from the given database.
Basic settings
Use an existing connection: Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, the Component list presents only the connection components in the same Job level.
Connection configuration (unavailable when using an existing connection):
Host Name
Server Port
Database: Type in the name of the database whose cube details you want to retrieve.
Advanced settings
tStatCatcher Statistics: Select this check box to collect log data at the component level.
Global Variables
NB_CUBES: Indicates the number of the cubes processed from the given database. This is an After variable and it returns an integer.
CUBEID: Indicates the IDs of the cubes being processed from the given database. This is a Flow variable and it returns an integer.
CUBENAME: Indicates the names of the cubes being processed from the given database. This is a Flow variable and it returns a string.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and
choose the variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable
means it functions after the execution of a component.
Connections
Usage
Limitation
Read-only output schema of tPaloCubeList:
Cube_id: int
Cube_name: string
Cube_dimensions: int
Cube_cells: long
Cube_filled_cells: long
Cube_status: int
Cube_type: int
1. Drop tPaloCubeList and tLogRow from the component Palette onto the design workspace.
2. Right-click tPaloCubeList to open its contextual menu.
3. From this menu, select Row > Main to link the two components.
2. In the Host name field, type in the host name or the IP address of the host server, localhost for this example.
3. In the Server Port field, type in the listening port number of the Palo server. In this scenario, it is 7777.
4. In the Username field and the Password field, type in the authentication information. In this example, both of them are admin.
5. In the Database field, type in the name of the database whose cube details you want to retrieve, Biker in this example.
Job execution
Press F6 to run the Job.
The cube details are retrieved from the Biker database and are listed in the console of the Run view.
For further information about how to interpret the cube details listed in the console, see section Discovering the read-only output schema of tPaloCubeList.
tPaloDatabase
tPaloDatabase Properties
Component family: Business Intelligence/Cube OLAP/Palo
Function
Purpose
Basic settings
Use an existing connection: Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, the Component list presents only the connection components in the same Job level.
Connection configuration (unavailable when using an existing connection):
Host Name
Server Port
Database: Type in the name of the database on which the given operation should take place.
Action on database: Select the operation you want to carry out on the database defined:
- Create database: the database does not exist and will be created.
- Create database if not exists: the database is created when it does not exist.
- Delete database if exists and create: the database is deleted if it exists and a new one is then created.
- Delete database: the database is removed from the server.
Advanced settings
tStatCatcher Statistics: Select this check box to collect log data at the component level.
Global Variables
DATABASE: Indicates the name of the database being processed. This is an After variable and
it returns a string.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and
choose the variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable
means it functions after the execution of a component.
Connections
For further information regarding connections, see Talend Studio User Guide.
Usage
This component can be used as a standalone component for database management on a Palo server.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For
details, see the section about external modules in the Talend Installation and Upgrade Guide.
1. Drop tPaloDatabase from the component Palette onto the design workspace.
2. Double-click tPaloDatabase to open its Basic settings view.
3. In the Host name field, type in the host name or the IP address of the host server, localhost for this example.
4. In the Server Port field, type in the listening port number of the Palo server. In this scenario, it is 7777.
5. In the Username field and the Password field, type in the authentication information. In this example, both of them are admin.
6. In the Database field, type in the name of the database you want to create, talenddatabase in this example.
7. In the Action on database field, select the action to be performed. In this scenario, select Create database, as the database to be created does not exist.
tPaloDatabaseList
tPaloDatabaseList Properties
Component family: Business Intelligence/Cube OLAP/Palo
Function: This component retrieves a list of database details from the given Palo server.
Purpose: This component lists database names, database types, the number of cubes, the number of dimensions, the database status and the database id from a given Palo server.
Basic settings
Use an existing connection: Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, the Component list presents only the connection components in the same Job level.
Connection configuration (unavailable when using an existing connection):
Host Name
Server Port
Advanced settings
tStatCatcher Statistics: Select this check box to collect log data at the component level.
Global Variables
NB_DATABASES: Indicates the number of the databases processed. This is an After variable and
it returns an integer.
DATABASEID: Indicates the id of the database being processed. This is a Flow variable and it
returns a long.
DATABASENAME: Indicates the name of the database processed. This is an After variable and
it returns a string.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and
choose the variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable
means it functions after the execution of a component.
Connections
Usage
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For
details, see the section about external modules in the Talend Installation and Upgrade Guide.
Column                 Type
Database_id            long
Database_name          string
Database_dimensions    int
Database_cubes         int
Database_status        int
Database_types         int
1. Drop tPaloDatabaseList and tLogRow from the component Palette onto the design workspace.
2.
3. From this menu, select Row > Main to link the two components.
2. In the Host name field, type in the host name or the IP address of the host server, localhost for this example.
3. In the Server Port field, type in the listening port number of the Palo server. In this scenario, it is 7777.
4. In the Username field and the Password field, type in the authentication information. In this example, both of them are admin.
Job execution
Press F6 to run the Job.
Details of all of the databases in the Palo server are retrieved and listed in the console of the Run view.
For further information about the output schema, see section Discovering the read-only output schema of
tPaloDatabaseList.
tPaloDimension
tPaloDimension Properties
Component family: Business Intelligence/Cube OLAP/Palo
Function
This component creates, drops or recreates dimensions with or without dimension elements inside
a Palo database.
Purpose
Basic settings
Use an existing connection: Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, Component list presents only the connection components in the same Job level.
Connection configuration: Unavailable when using an existing connection.
Host Name
Server Port
Database
Dimension
Action on dimension
Consolidation type - None: Select this check box to move the incoming elements directly into the given dimension. With this option, you will not define any consolidations or hierarchy.
With this option, you activate the corresponding parameter fields to be completed.
Input Column: select a column from the drop-down list. The columns in the drop-down list are those you defined for the schema. The values from this selected column are taken to process dimension elements.
Element type: Select the type of elements. It may be:
- Numeric
- Text
Creation mode: Select the creation mode for the elements to be processed. This mode may be:
- Add: simply adds an element to the dimension.
- Force add: forces the creation of this element. If the element exists, it will be recreated.
- Update: updates this element if it exists.
- Add or Update: if this element does not exist, it will be created; otherwise it will be updated. This is the default option.
- Delete: deletes this element from the dimension.
Consolidation type - Normal: Select this check box to create elements and consolidate them inside the given dimension. This consolidation structures the created elements in different levels.
With this option, you activate the corresponding parameter fields to be completed.
Input Column: select a column from the drop-down list. The columns in the drop-down list are those you defined for the schema. The values from this selected column are taken to process dimension elements.
Element type: Select the type of elements. It may be:
- Numeric
- Text
Creation mode: Select the creation mode for the elements to be created. This mode may be:
- Add: simply adds an element to the dimension.
- Force add: forces the creation of this element. If the element exists, it will be recreated.
- Update: updates this element if it exists.
Creation mode
Advanced settings
tStatCatcher Statistics: Select this check box to collect log data at the component level.
Global Variables
DIMENSIONNAME: Indicates the name of the dimension processed. This is an After variable
and it returns a string.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and
choose the variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable
means it functions after the execution of a component.
Connections
Usage
Limitation
Deletion of dimension elements is only possible with the consolidation type None. Only the consolidation type Self-Referenced allows a factor to be placed on the consolidation.
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For
details, see the section about external modules in the Talend Installation and Upgrade Guide.
1. Drop tPaloConnection, tRowGenerator, tMap, tPaloDimension from the component Palette onto the design workspace.
2. Right-click tPaloConnection to open the contextual menu and select Trigger > On Subjob Ok to link it to tRowGenerator.
3. Right-click tRowGenerator to open the contextual menu and select Row > Main to link it to tMap.
tRowGenerator is used to generate rows at random in order to simplify this process. In the real case, you can use one of the other input components to load your actual data.
4. Right-click tMap to open the contextual menu and select Row > New output to link to tPaloDimension, then name it as out1 in the dialog box that pops up.
2. In the Host name field, type in the host name or the IP address of the host server, localhost for this example.
3. In the Server Port field, type in the listening port number of the Palo server. In this scenario, it is 7777.
4. In the Username field and the Password field, type in the authentication information. In this example, both of them are admin.
2. On the upper part of the editor, click the plus button to add one column and rename it as random_date in the Column column.
3. In the newly added row, select Date in the Type column and getRandomDate in the Functions column.
4. In the Function parameters view on the lower part of this editor, type in the new minimum date and maximum date values in the Value column. In this example, the minimum is 2010-01-01, the maximum is 2010-12-31.
5.
6. On the dialog box that pops up, click OK to propagate your changes.
2. On the Schema editor view on the lower part of the tMap editor, under the out1 table, click the plus button to add three rows.
3. In the Column column of the out1 table, type in the new names for the three newly added rows. They are Year, Month, and Date. These rows are then added automatically into the out1 table on the upper part of the tMap editor.
4. In the out1 table on the upper part of the tMap editor, click the Expression column in the Year row to locate the cursor.
5.
6. Double-click TalendDate.formatDate to select it from the list. The expression to get the date displays in the Year row under the Expression column. The expression is TalendDate.formatDate("yyyy-MM-dd HH:mm:ss",myDate).
7.
8. Do the same for the Month row and the Date row to add this default expression and to replace it with TalendDate.formatDate("MM",row1.random_date) for the Month row and with TalendDate.formatDate("dd-MM-yyyy", row1.random_date) for the Date row.
9. Click OK to validate this modification and accept the propagation by clicking OK in the dialog box that pops up.
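For reference, the three out1 expressions would then look roughly like the sketch below; the Month and Date expressions are the ones given above, while the "yyyy" pattern used for the Year row is an assumption, since this scenario does not spell it out.
    // Expressions typed in the out1 table of the tMap editor (row1 is the flow coming from tRowGenerator):
    Year:  TalendDate.formatDate("yyyy", row1.random_date)      // assumed pattern
    Month: TalendDate.formatDate("MM", row1.random_date)
    Date:  TalendDate.formatDate("dd-MM-yyyy", row1.random_date)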
2. Select the Use an existing connection check box. Then tPaloConnection_1 displays automatically in the Connection configuration field.
3. In the Database field, type in the database in which the new dimension is created, talendDatabase for this scenario.
4. In the Dimension field, type in the name you want to use for the dimension to be created, for example, Date.
5. In the Action on dimension field, select the action to be performed. In this scenario, select Create dimension if not exist.
6.
7.
8. Under the element hierarchy table in the Consolidation Type area, click the plus button to add three rows into the table.
9. In the Input column column of the element hierarchy table, select Year from the drop-down list for the first row, Month for the second and Date for the third. This determines the levels of elements from the different columns of the input schema.
Job execution
Press F6 to run the Job.
A new dimension is then created in your Palo database talendDatabase.
tPaloDimensionList
tPaloDimensionList Properties
Component family: Business Intelligence/Cube OLAP/Palo
Function
This component retrieves a list of dimension details from the given Palo database.
Purpose
This component lists dimension names, dimension types, number of dimension elements,
maximum dimension indent, maximum dimension depth, maximum dimension level, dimension
id from a given Palo server.
Basic settings
Use an existing connection: Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, Component list presents only the connection components in the same Job level.
Connection configuration: Unavailable when using an existing connection.
Host Name
Server Port
Database
Cube: Available when you select the Retrieve cube dimensions check box.
Schema and Edit Schema
Advanced settings
tStatCatcher Statistics: Select this check box to collect log data at the component level.
Global Variables
DIMENSIONNAME: Indicates the name of the dimension being processed. This is a Flow
variable and it returns a string.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and
choose the variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable
means it functions after the execution of a component.
Connections
Limitation
Column                      Type
Dimension_id                long
Dimension_name              string
Dimension_attribute_cube    string
Dimension_rights_cube       string
Dimension_elements          int
Dimension_max_level         int
Dimension_max_indent        int
Dimension_max_depth         int
Dimension_type              int
1. Drop tPaloDimensionList and tLogRow from the component Palette onto the design workspace.
2.
3. From this menu, select Row > Main to link the two components.
2. In the Host name field, type in the host name or the IP address of the host server, localhost for this example.
3. In the Server Port field, type in the listening port number of the Palo server. In this scenario, it is 7777.
4. In the Username field and the Password field, type in the authentication information. In this example, both of them are admin.
5. In the Database field, type in the database name where the dimensions of interest reside, Biker in this example.
Job execution
Press F6 to run the Job.
Details of all the dimensions in the Biker database are retrieved and listed in the console of the Run view.
For further information about the output schema, see section Discovering the read-only output schema of
tPaloDimensionList.
tPaloInputMulti
tPaloInputMulti Properties
Component family: Business Intelligence/Cube OLAP/Palo
Function
This component retrieves data (elements as well as values) from a Palo cube.
Purpose
This component retrieves the stored or calculated values in combination with the element records
out of a cube.
Basic settings
Use an existing connection: Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, Component list presents only the connection components in the same Job level.
Connection configuration: Unavailable when using an existing connection.
Host Name
Server Port
Database
Cube
Cube type: Select the cube type from the drop-down list for the cube of concern. This type may be:
- Normal
- Attribut
- System
- User Info
Commit size
Cube Query: Complete this table with the query you want to use to retrieve data. The columns to be filled are:
Column: the schema columns are added automatically to this column once defined in the schema editor. The schema columns are used to store the retrieved dimension elements.
Dimensions: type in each of the dimension names of the cube from which you want to retrieve dimension elements.
Advanced settings
tStatCatcher Statistics: Select this check box to collect log data at the component level.
Connections
Outgoing links (from this component to another):
Row: Main
Trigger: Run if; On Subjob Ok; On Subjob Error; On Component Ok; On Component Error.
Incoming links (from one component to this one):
Row: Iterate
Trigger: Run if; On Subjob Ok; On Subjob Error; On Component Ok; On Component Error.
For further information regarding connections, see Talend Studio User Guide.
Usage
Limitation
According to the architecture of OLAP systems, only one single value (text or numeric) can be retrieved from the cube. The MEASURE column and the TEXT column are fixed and read-only.
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For
details, see the section about external modules in the Talend Installation and Upgrade Guide.
1. Drop tPaloInputMulti and tLogRow from the component Palette onto the design workspace.
2.
3. In the menu, select Row > Main to connect tPaloInputMulti to tLogRow with a row link.
2. In the Host name field, type in the host name or the IP address of the host server, localhost for this example.
3. In the Server Port field, type in the listening port number of the Palo server. In this scenario, it is 7777.
4. In the Username field and the Password field, type in the authentication information. In this example, both of them are admin.
In the Database field, type in the database name in which the cube to be used is stored.
2. In the Cube field, type in the cube name in which the dimensions of interest are stored. In this scenario, it is one of the demo cubes Sales.
3. In the Cube type field, select the Normal type from the drop-down list for the cube to be used, meaning this cube is normal and default.
4. Next to the Edit schema field, click the three-dot button to open the schema editor.
5. In the schema editor, click the plus button to add the rows of the schema to be edited. In this example, add rows corresponding to all of the dimensions stored in the Sales cube: Products, Regions, Months, Years, Datatypes, Measures. Type them in in the order given in this cube.
6. Click OK to validate this editing and accept the propagation of this change to the next component. Then these columns are added automatically into the Column column of the Cube query table in the Component view. If the order is not consistent with the one in the Sales cube, adapt it using the up and down arrows under the schema table.
7. In the Dimensions column of the Cube query table, type in each of the dimension names stored in the Sales cube for each row in the Column column. In the Sales cube, the dimension names are: Products, Regions, Months, Years, Datatypes, Measures.
8. In the Elements column of the Cube query table, type in the dimension elements you want to retrieve, according to the dimensions they belong to. In this example, the elements to be retrieved are All Products, Germany and Austria (belonging to the same dimension Regions, these two elements are entered in the same row and separated with a comma), Jan, 2009, Actual, Turnover.
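Put together, the Cube query table for this scenario would be filled roughly as sketched below; the layout is illustrative and simply restates the values listed above.
    Column      Dimensions   Elements
    Products    Products     All Products
    Regions     Regions      Germany,Austria
    Months      Months       Jan
    Years       Years        2009
    Datatypes   Datatypes    Actual
    Measures    Measures     Turnover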
Job execution
1.
2. In the Mode area, select the Table (print values in cells of a table) check box to display the execution result in a table.
3.
The dimension elements and the corresponding Measure values display in the Run console.
tPaloOutput
tPaloOutput Properties
Component family: Business Intelligence/Cube OLAP/Palo
Function
This component writes one row of data (elements as well as values) into a Palo cube.
Purpose
This component takes the input stream and writes it to a given Palo cube.
Basic settings
Use an existing connection: Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, Component list presents only the connection components in the same Job level.
Connection configuration: Unavailable when using an existing connection.
Host Name
Server Port
Database: Type in the name of the database where the cube of interest resides.
Cube: Type in the name of the cube in which the incoming data is written.
Commit size: Type in the row count of each batch to be written into the cube.
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Click Edit Schema to make changes to the schema.
Built-in: The schema is created and stored locally for this component
only. Related topic: see Talend Studio User Guide.
Column as Measure
Select the column from the input stream which holds the Measure
or Text values.
Select this check box to create the element being processed if it does
not exist originally.
Select this check box to save the cube you have written the data in
at the end of this process.
Advanced settings
tStatCatcher Statistics: Select this check box to collect log data at the component level.
Global variables
NB_LINE: Indicates the number of rows read by an input component or transferred to an output
component. This is an After variable and it returns an integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and
choose the variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable
means it functions after the execution of a component.
Connections
Outgoing links (from this component to another):
Row: Iterate
Trigger: Run if
Incoming links (from one component to this one):
Row: Main; Reject
For further information regarding connections, see Talend Studio User Guide.
Usage
Limitation
This component is able to write only one row of data into a cube.
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For
details, see the section about external modules in the Talend Installation and Upgrade Guide.
Related scenario
For related topic, see section Scenario 1: Writing data into a given cube.
tPaloOutputMulti
tPaloOutputMulti Properties
Component family: Business Intelligence/Cube OLAP/Palo
Function
This component writes data (elements as well as values) into a Palo cube.
Purpose
This component takes the input stream and writes it to a given Palo cube.
Basic settings
Use an existing connection: Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, Component list presents only the connection components in the same Job level.
Connection configuration: Unavailable when using an existing connection.
Host Name
Server Port
Database: Type in the name of the database where the cube of interest resides.
Cube: Type in the name of the cube in which the incoming data is written.
Cube type: Select the cube type from the drop-down list for the cube of concern. This type may be:
- Normal
- Attribut
- System
- User Info
Commit size: Type in the row count of each batch to be written into the cube.
Measure value: Select the column from the input stream which holds the Measure or Text values.
Splash mode: Select the splash mode used to write data into a consolidated element. The mode may be:
- Add: writes the values to the underlying elements.
- Default: uses the default splash mode.
- Set: simply sets or replaces the current value and makes the distribution based on the other values.
Select this check box to add new values to the current values for a sum. Otherwise these new values will overwrite the current ones.
Use eventprocessor
Die on error
Advanced settings
tStatCatcher Statistics: Select this check box to collect log data at the component level.
Connections
Usage
Limitation
Numeric measures are only accepted as the Double or String type. When the String type is used, write the value to be processed between quotation marks.
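As a minimal illustration of this constraint (the value is only an example):
    1234.56      // numeric measure passed as a Double
    "1234.56"    // the same measure passed as a String, between quotation marks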
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For
details, see the section about external modules in the Talend Installation and Upgrade Guide.
1. Drop tFixedFlowInput and tPaloOutputMulti from the component Palette onto the design workspace.
2.
3. In this menu, select Row > Main to connect this component to tPaloOutputMulti.
2.
3. In the schema editor, click the plus button to add 7 rows and rename them respectively as Products, Regions, Months, Years, Datatypes, Measures and Values. The order of these rows must be consistent with that of the corresponding dimensions in the Sales cube and the type of the Value column where the measure value resides is set to double/Double.
4. Click OK to validate the editing and accept the propagation prompted by the dialog box that pops up. Then the schema column labels display automatically in the Value table under the Use single table check box, in the Mode area.
5. In the Value table, type in values for each row in the Value column. In this example, these values are: Desktop L, Germany, Jan, 2009, Actual, Turnover, 1234.56.
2. In the Server Port field, type in the listening port number of the Palo server. In this scenario, it is 7777.
3. In the Username field and the Password field, type in the authentication information. In this example, both of them are admin.
4. In the Database field, type in the name of the database where the cube resides, Demo in this example.
5. In the Cube field, type in the name of the cube you want to write data in, for example, Sales.
6. In the Cube type field, select the Normal type from the drop-down list for the cube to be used, meaning this cube is normal and default.
7. In the Measure Value field, select the Measure element. In this scenario, select Value.
Job execution
Press F6 to run the Job.
The inflow data has been written into the Sales cube.
Scenario 2: Rejecting inflow data when the elements to be written do not exist in a given cube
2.
3. In this menu, select Row > Main to connect this component to tPaloCheckElements.
4.
5.
6. In this menu, select Row > Reject to connect this component to tLogRow.
2.
3. In the schema editor, click the plus button to add 7 rows and rename them respectively as Products, Regions, Months, Years, Datatypes, Measures and Values. The order of these rows must be consistent with that of the corresponding dimensions in the Sales cube and the type of the Value column where the measure value resides is set to double/Double.
4. Click OK to validate the editing and accept the propagation prompted by the dialog box that pops up. Then the schema column labels display automatically in the Value table under the Use single table check box, in the Mode area.
5. In the Value table, type in values for each row in the Value column. In this example, these values are: Smart Products, Germany, Jan, 2009, Actual, Turnover, 1234.56. The Smart Products element does not exist in the Sales cube.
2.
3. In the Server Port field, type in the listening port number of the Palo server. In this scenario, it is 7777.
4. In the Username field and the Password field, type in the authentication information. In this example, both of them are admin.
5. In the Database field, type in the name of the database where the cube resides, Demo in this example.
6. In the Cube field, type in the name of the cube you want to write data in, for example, Sales.
7. In the On Element error field, select Reject row from the drop-down list.
8. In the element table at the bottom of the Basic settings view, click the Element type column in the Value row and select Measure from the drop-down list.
2. In the Server Port field, type in the listening port number of the Palo server. In this scenario, it is 7777.
3. In the Username field and the Password field, type in the authentication information. In this example, both of them are admin.
4. In the Database field, type in the name of the database where the cube resides, Demo in this example.
5. In the Cube field, type in the name of the cube you want to write data in, for example, Sales.
6. In the Cube type field, select the Normal type from the drop-down list for the cube to be used, meaning this cube is normal and default.
7. In the Measure Value field, select the Measure element. In this scenario, select Value.
Job execution
Press F6 to run the Job.
The data to be written is rejected and displayed in the console of the Run view. You can read that the error message
is Smart Products.
tPaloRule
tPaloRule Properties
Component family: Business Intelligence/Cube OLAP/Palo
Function
Purpose
Basic settings
Use an existing connection: Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, Component list presents only the connection components in the same Job level.
Connection configuration: Unavailable when using an existing connection.
Host Name
Server Port
Database: Type in the name of the database where the dimensions applying the rules of interest reside.
Cube
Cube rules
Advanced settings
tStatCatcher Statistics: Select this check box to collect log data at the component level.
Connections
Outgoing links (from this component to another):
Trigger: Run if; On Subjob Ok; On Subjob Error; On Component Ok; On Component Error.
Incoming links (from one component to this one):
Row: Iterate
Trigger: Run if; On Subjob Ok; On Subjob Error; On Component Ok; On Component Error.
For further information regarding connections, see Talend Studio User Guide.
Usage
This component can be used in standalone for rule creation, deletion or update.
Limitation
Update or deletion of a rule is available only when this rule has been created with an external ID.
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For
details, see the section about external modules in the Talend Installation and Upgrade Guide.
1. Drop tPaloRule from the component Palette onto the design workspace.
2.
3. In the Host name field, type in the host name or the IP address of the host server, localhost for this example.
4. In the Server Port field, type in the listening port number of the Palo server. In this scenario, it is 7777.
5. In the Username field and the Password field, type in the authentication information. In this example, both of them are admin.
6. In the Database field, type in the database name in which the dimensions applying the created rules reside, Biker in this example.
7. In the Cube field, type in the name of the cube which the dimensions applying the created rules belong to, for example, Orders.
1. Under the Cube rules table, click the plus button to add a new row.
2. In the Cube rules table, type in ['2009'] = 123 in the Definition column, OrderRule1 in the External Id column and Palo Demo Rules in the Comment column.
3.
4.
Job execution
Press F6 to run the Job.
The new rule has been created and the value of every 2009 element is 123.
tPaloRuleList
tPaloRuleList Properties
Component family: Business Intelligence/Cube OLAP/Palo
Function
This component retrieves a list of rule details from the given Palo database.
Purpose
This component lists all rules, formulas, comments, activation status, external IDs from a given
cube.
Basic settings
Use an existing connection: Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, Component list presents only the connection components in the same Job level.
Connection configuration: Unavailable when using an existing connection.
Host Name
Server Port
Database
Cube: Type in the name of the cube in which you want to retrieve the rule information.
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Click Edit Schema to make changes to the schema.
Built-in: The schema is created and stored locally for this component
only. Related topic: see Talend Studio User Guide.
Advanced settings
tStatCatcher Statistics: Select this check box to collect log data at the component level.
Global Variables
NB_RULES: Indicates the number of the rules processed. This is an After variable and it returns
an integer.
EXTERNAL_RULEID: Indicates the external IDs of the rules being processed. This is a Flow
variable and it returns a string.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and
choose the variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable
means it functions after the execution of a component.
Connections
Row: Iterate
Trigger: Run if; On Subjob Ok; On Subjob Error; On Component Ok; On Component Error.
For further information regarding connections, see Talend Studio User Guide.
Usage
Limitation
Column            Type      Description
rule_identifier   long
rule_definition   string    The formula of this rule. For further information about this formula, see the Palo user guide.
rule_extern_id    string
rule_comment      string
rule_activated    boolean
1. Drop tPaloRuleList and tLogRow from the component Palette onto the design workspace.
2.
3. From this menu, select Row > Main to link the two components.
2. In the Host name field, type in the host name or the IP address of the host server, localhost for this example.
3. In the Server Port field, type in the listening port number of the Palo server. In this scenario, it is 7777.
4. In the Username and Password fields, type in the authentication information. In this example, both of them are admin.
5. In the Database field, type in the database name where the dimensions applying the rules of interest reside, Biker in this example.
6. In the Cube field, type in the name of the cube which the rules of interest belong to.
Job execution
Press F6 to run the Job.
Details of all of the rules in the Orders cube are retrieved and listed in the console of the Run view.
For further information about the output schema, see section Discovering the read-only output schema of
tPaloRuleList.
tParAccelSCD
tParAccelSCD Properties
Component family: Databases/ParAccel
Function
Purpose
tParAccelSCD addresses Slowly Changing Dimension needs, regularly reading a source of data and logging the changes into a dedicated SCD table.
Basic settings
Property type
Use an existing connection: Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if you need to share an existing connection between the two levels, for example, to share the connection created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection to be shared in the Basic settings view of the connection component which creates that very database connection.
2. In the child level, use a dedicated connection component to read that registered database connection.
For an example about how to share a database connection across Job levels, see Talend Studio User Guide.
Connection type
Host
Port
Database
Schema
Username
Password
Table
Schema and Edit schema: A schema is a row description. It defines the number of fields to be processed and passed on to the next component.
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Click Edit Schema to make changes to the schema.
Built-in: The schema is created and stored locally for this component only. Related topic: see Talend Studio User Guide.
SCD Editor
The SCD editor helps to build and configure the data flow for slowly changing dimension
outputs.
For more information, see section SCD management methodologies.
Use memory saving Mode: Select this check box to maximize system performance.
Source keys include Null: Select this check box to allow the source key columns to have Null values.
Special attention should be paid to the uniqueness of the source key(s) values when this option is selected.
Die on error: This check box is cleared by default, meaning to skip the row on error and to complete the process for error-free rows.
Advanced settings
tStatCatcher Statistics: Select this check box to collect log data at the component level.
Debug mode: Select this check box to display each step during processing entries in a database.
Dynamic settings: Click the [+] button to add a row in the table and fill the Code field with a context variable to choose your database connection dynamically from multiple connections planned in your Job. This feature is useful when you need to access database tables having the same data structure but in different databases, especially when you are working in an environment where you cannot change your Job settings, for example, when your Job has to be deployed and executed independent of Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is selected in the Basic settings view. Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
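As a rough sketch of how this table could be filled (all names here are illustrative, not taken from this guide): add a parameter row whose Code field contains a context variable such as context.scdConnection; at run time that variable would hold the name of the connection component to reuse, so the same Job can point at different databases without being edited.
    Code field:      context.scdConnection          // hypothetical context variable
    Runtime value:   "tParAccelConnection_1"        // name of the connection component to reuse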
Global Variables
NB_LINE_UPDATED: Indicates the number of rows updated. This is an After variable and it returns an integer.
NB_LINE_INSERTED: Indicates the number of rows inserted. This is an After variable and it returns an integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the
variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable means it functions
after the execution of a component.
Usage
This component is used as an output component. It requires an input component and a Row main link as input.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided. You can
easily find out and add such JARs in the Integration perspective of your studio. For details, see the section about
external modules in the Talend Installation and Upgrade Guide.
Related scenario
For related scenarios, see section tMysqlSCD.
tPostgresPlusSCD
tPostgresPlusSCD Properties
Component family: Databases/PostgresPlus Server
Function
Purpose
tPostgresPlusSCD addresses Slowly Changing Dimension needs, regularly reading a source of data and logging the changes into a dedicated SCD table.
Basic settings
Use an existing connection: Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if you need to share an existing connection between the two levels, for example, to share the connection created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection to be shared in the Basic settings view of the connection component which creates that very database connection.
2. In the child level, use a dedicated connection component to read that registered database connection.
For an example about how to share a database connection across Job levels, see Talend Studio User Guide.
Property type
DB Version
Server
Port
Database
Schema
Username
Password
Table
Schema and Edit schema: A schema is a row description. It defines the number of fields to be processed and passed on to the next component.
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Click Edit Schema to make changes to the schema.
Built-in: The schema is created and stored locally for this component only. Related topic: see Talend Studio User Guide.
SCD Editor
The SCD editor helps to build and configure the data flow for slowly changing dimension
outputs.
For more information, see section SCD management methodologies.
Use memory saving Mode: Select this check box to maximize system performance.
Source keys include Null: Select this check box to allow the source key columns to have Null values.
Special attention should be paid to the uniqueness of the source key(s) values when
this option is selected.
Die on error: This check box is cleared by default, meaning to skip the row on error and to complete the process for error-free rows.
Advanced settings
tStatCatcher Statistics: Select this check box to collect log data at the component level.
Debug mode: Select this check box to display each step during processing entries in a database.
Dynamic settings: Click the [+] button to add a row in the table and fill the Code field with a context variable to choose your database connection dynamically from multiple connections planned in your Job. This feature is useful when you need to access database tables having the same data structure but in different databases, especially when you are working in an environment where you cannot change your Job settings, for example, when your Job has to be deployed and executed independent of Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is selected in the Basic settings view. Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Global Variables
NB_LINE_UPDATED: Indicates the number of rows updated. This is an After variable and it returns an integer.
NB_LINE_INSERTED: Indicates the number of rows inserted. This is an After variable and it returns an integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the
variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable means it functions
after the execution of a component.
Usage
This component is used as an output component. It requires an input component and a Row main link as input.
Related scenario
For related topics, see section tMysqlSCD.
tPostgresPlusSCDELT
tPostgresPlusSCDELT Properties
Component family: Databases/PostgresPlus
Function
Purpose
tPostgresPlusSCDELT addresses Slowly Changing Dimension needs through SQL queries (server-side
processing mode), and logs the changes into a dedicated PostgresPlus SCD table.
Basic settings
Property type
Use an existing connection: Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if you need to share an existing connection between the two levels, for example, to share the connection created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection to be shared in the Basic settings view of the connection component which creates that very database connection.
2. In the child level, use a dedicated connection component to read that registered database connection.
For an example about how to share a database connection across Job levels, see Talend Studio User Guide.
DB Version
Host
Port
Database
Schema
Username
Password
Source table
Table
Name of the table to be written. Note that only one table can be written at a time.
Action on table
Schema and Edit schema: A schema is a row description. It defines the number of fields to be processed and passed on to the next component.
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Click Edit Schema to make changes to the schema.
Built-in: The schema is created and stored locally for this component only. Related
topic: see Talend Studio User Guide.
Surrogate Key
Creation
Source Keys: Select one or more columns to be used as keys, to ensure the unicity of incoming data.
Use SCD Type 1 fields: Use type 1 if tracking changes is not necessary. SCD Type 1 should be used for typo corrections, for example. Select the columns of the schema that will be checked for changes.
Use SCD Type 2 fields: Use type 2 if changes need to be tracked down. SCD Type 2 should be used to trace updates, for example. Select the columns of the schema that will be checked for changes.
Start date: Adds a column to your SCD schema to hold the start date value. You can
select one of the input schema columns as Start Date in the SCD table.
End Date: Adds a column to your SCD schema to hold the end date value for the
record. When the record is currently active, the End Date column shows a null value,
or you can select Fixed Year value and fill it in with a fictive year to avoid having
a null value in the End Date field.
Log Active Status: Adds a column to your SCD schema to hold the true or false
status value. This column helps to easily spot the active record.
Log versions: Adds a column to your SCD schema to hold the version number of
the record.
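To make the Type 2 behavior concrete, here is a sketch of what one tracked record could look like in the SCD table after a single change, assuming the Start date, End Date, Log Active Status and Log versions options are all enabled; the column names and values are purely illustrative.
    source_key  city      scd_start    scd_end      scd_active  scd_version
    1           Berlin    2009-01-01   2010-06-30   false       1
    1           Hamburg   2010-06-30   (null)       true        2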
Advanced settings
Debug mode
Select this check box to display each step during processing entries in a database.
tStatCatcher Statistics Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to choose your
database connection dynamically from multiple connections planned in your Job. This feature is useful when
you need to access database tables having the same data structure but in different databases, especially when
you are working in an environment where you cannot change your Job settings, for example, when your Job
has to be deployed and executed independent of Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is selected in
the Basic settings view. Once a dynamic parameter is defined, the Component List box in the Basic settings
view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component is used as an output component. It requires an input component and Row main link as input.
Related Scenario
For related topics, see section tMysqlSCD.
tPostgresqlSCD
tPostgresqlSCD Properties
Component family: Databases/Postgresql Server
Function
Purpose
tPostgresqlSCD addresses Slowly Changing Dimension needs, regularly reading a source of data and logging the changes into a dedicated SCD table.
Basic settings
Property type
Use an existing connection: Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if you need to share an existing connection between the two levels, for example, to share the connection created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection to be shared in the Basic settings view of the connection component which creates that very database connection.
2. In the child level, use a dedicated connection component to read that registered database connection.
For an example about how to share a database connection across Job levels, see Talend Studio User Guide.
DB Version
Host
Port
Database
Schema
Table: Name of the table to be written. Note that only one table can be written at a time.
Schema and Edit schema: A schema is a row description. It defines the number of fields to be processed and passed on to the next component.
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Click Edit Schema to make changes to the schema.
Built-in: The schema is created and stored locally for this component only. Related topic: see Talend
Studio User Guide.
SCD Editor
The SCD editor helps to build and configure the data flow for slowly changing dimension outputs.
For more information, see section SCD management methodologies.
Use memory saving Mode: Select this check box to maximize system performance.
Source keys include Null: Select this check box to allow the source key columns to have Null values.
Special attention should be paid to the uniqueness of the source key(s) values when this
option is selected.
Die on error: This check box is cleared by default, meaning to skip the row on error and to complete the process for error-free rows.
Advanced settings
tStatCatcher Statistics: Select this check box to collect log data at the component level.
Debug mode: Select this check box to display each step during processing entries in a database.
Dynamic settings: Click the [+] button to add a row in the table and fill the Code field with a context variable to choose your database connection dynamically from multiple connections planned in your Job. This feature is useful when you need to access database tables having the same data structure but in different databases, especially when you are working in an environment where you cannot change your Job settings, for example, when your Job has to be deployed and executed independent of Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is selected in the Basic settings view. Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Global Variables
NB_LINE_UPDATED: Indicates the number of rows updated. This is an After variable and it returns an integer.
NB_LINE_INSERTED: Indicates the number of rows inserted. This is an After variable and it returns an integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable
to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable means it functions
after the execution of a component.
Usage
This component is used as an output component. It requires an input component and a Row main link as input.
Related scenario
For related topics, see section tMysqlSCD.
tPostgresqlSCDELT
tPostgresqlSCDELT Properties
Component family: Databases/Postgresql
Function
Purpose
tPostgresqlSCDELT addresses Slowly Changing Dimension needs through SQL queries (server-side processing mode), and logs the changes into a dedicated Postgresql SCD table.
Basic settings
Property type
Use an existing connection: Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if you need to share an existing connection between the two levels, for example, to share the connection created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection to be shared in the Basic settings view of the connection component which creates that very database connection.
2. In the child level, use a dedicated connection component to read that registered database connection.
For an example about how to share a database connection across Job levels, see Talend Studio User Guide.
DB Version
Host
Port
Database
Username
Password
Source table
Table
Name of the table to be written. Note that only one table can be written at a time.
Action on table
Schema and Edit schema: A schema is a row description. It defines the number of fields to be processed and passed on to the next component.
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Click Edit Schema to make changes to the schema.
Built-in: The schema is created and stored locally for this component only. Related
topic: see Talend Studio User Guide.
Surrogate Key
Creation
Source Keys: Select one or more columns to be used as keys, to ensure the unicity of incoming data.
Use SCD Type 1 fields: Use type 1 if tracking changes is not necessary. SCD Type 1 should be used for typo corrections, for example. Select the columns of the schema that will be checked for changes.
Use SCD Type 2 fields: Use type 2 if changes need to be tracked down. SCD Type 2 should be used to trace updates, for example. Select the columns of the schema that will be checked for changes.
Start date: Adds a column to your SCD schema to hold the start date value. You can
select one of the input schema columns as Start Date in the SCD table.
End Date: Adds a column to your SCD schema to hold the end date value for the record.
When the record is currently active, the End Date column shows a null value, or you
can select Fixed Year value and fill it in with a fictive year to avoid having a null value
in the End Date field.
Log Active Status: Adds a column to your SCD schema to hold the true or false status
value. This column helps to easily spot the active record.
Log versions: Adds a column to your SCD schema to hold the version number of the
record.
Advanced settings
Debug mode: Select this check box to display each step during processing entries in a database.
tStatCatcher Statistics: Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to choose your database
connection dynamically from multiple connections planned in your Job. This feature is useful when you need to
access database tables having the same data structure but in different databases, especially when you are working
in an environment where you cannot change your Job settings, for example, when your Job has to be deployed
and executed independent of Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is selected in the
Basic settings view. Once a dynamic parameter is defined, the Component List box in the Basic settings view
becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component is used as an output component. It requires an input component and Row main link as input.
Related Scenario
For related topics, see section tMysqlSCD.
tSPSSInput
tSPSSInput properties
Component family
Business Intelligence
Function
Purpose
tSPSSInput reads SPSS .sav data so that it can be written, for example, into another file.
Basic settings
Sync schema
Filename
Translate labels
Select this check box to translate the labels of the stored values.
If you select this check box, you need to retrieve the
metadata again.
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Global Variables
NB_LINE: Indicates the number of rows read by an input component or transferred to an output
component. This is an After variable and it returns an integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and
choose the variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable
means it functions after the execution of a component.
Usage
Limitation
Oracle provides two kinds of JVM platforms (32-bit and 64-bit). By default, the JVM used on a 64-bit operating system is the 64-bit version. Since the JSPSS.dll file used by this component is compiled for the 32-bit JVM, you need to configure a 32-bit JVM for Job execution when Talend Studio is installed on a 64-bit operating system.
To do so, perform the following:
1.
2. Click the Run tab and enter the Advanced settings view.
3. Select the Use specific JVM arguments check box and click the New... button.
4. In the Set the VM argument box, enter -d32 and click Ok for validation.
1. Drop a tSPSSInput component and a tLogRow component from the Palette onto the design workspace.
2. Click tSPSSInput to display its Basic settings view and define the component properties.
2. Click the three-dot button next to the Filename field and browse to the SPSS .sav file you want to read.
3. Click the three-dot button next to Sync schema. A message opens up prompting you to accept retrieving the schema from the defined SPSS file.
4. Click Yes to close the message and proceed to the next step.
5. If required, click the three-dot button next to Edit schema to view the pre-defined data structure of the source SPSS file.
6.
Job execution
Save the Job and press F6 to execute it.
The SPSS file is read row by row and the extracted fields are displayed on the log console.
1. In the Basic settings view, select the Translate label check box.
2. Click Sync Schema a second time to retrieve the schema after translation.
A message opens up prompting you to accept retrieving the schema from the defined SPSS file.
3. Click Yes to close the message and proceed to the next step.
A second message opens up prompting you to accept propagating the changes.
4. Click Yes to close the message and proceed to the next step.
5.
The SPSS file is read row by row and the extracted fields are displayed on the log console after translating the
stored values.
tSPSSOutput
tSPSSOutput properties
Component family
Business Intelligence
Function
Purpose
tSPSSOutput writes or appends data to an SPSS .sav file. It creates SPSS files on the fly and
overwrites existing ones.
Basic settings
Sync schema
Click this button to synchronize with the columns of the SPSS .sav
file.
Filename
Write Type
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Global Variables
NB_LINE: Indicates the number of rows read by an input component or transferred to an output
component. This is an After variable and it returns an integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and
choose the variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable
means it functions after the execution of a component.
Usage
This component can not be used as start component. It requires an input flow.
Before being able to benefit from all functional objectives of the SPSS components, make
sure to do the following: -If you have already installed SPSS, add the path to the SPSS
directory as the following: SET PATH=%PATH%;<DR>:\program\SPSS, or -If you
have not installed SPSS, you must copy the SPSS IO "spssio32.dll" lib from the SPSS
installation CD and paste it in Talend root directory.
Limitation
Oracle provides two kinds of JVM platforms (32-bit and 64-bit). By default, the JVM used on a 64-bit operating system is the 64-bit version. Since the JSPSS.dll file used by this component is compiled for the 32-bit JVM, you need to configure a 32-bit JVM for Job execution when Talend Studio is installed on a 64-bit operating system.
To do so, perform the following:
1.
2. Click the Run tab and enter the Advanced settings view.
3. Select the Use specific JVM arguments check box and click the New... button.
4. In the Set the VM argument box, enter -d32 and click Ok for validation.
1. Drop a tRowGenerator component and a tSPSSOutput component from the Palette onto the design workspace.
2. In the design workspace, double-click tRowGenerator to display its Basic Settings view and open its editor. Here you can define your schema.
2. Click the plus button to add the columns you want to write in the .sav file.
3.
4.
Click tSPSSOutput to display its Basic settings view and define the component properties.
2. Click the three-dot button next to the Filename field and browse to the SPSS .sav file in which you want to write data.
3. Click the three-dot button next to Sync columns to synchronize columns with the previous component. In this example, the schema to be inserted in the .sav file consists of the two columns: id and country.
4.
5. From the Write Type list, select Write or Append to simply write the input data in the .sav file or add it to the end of the .sav file.
Job execution
Save the Job and press F6 to execute it.
The data generated by the tRowGenerator component is written in the defined .sav file.
603
tSPSSProperties
tSPSSProperties properties
Component family
Business Intelligence
Function
Purpose
tSPSSProperties allows you to obtain information about the main properties of a defined
SPSS .sav file.
Basic settings
Filename
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Usage
Limitation
Oracle provides two kinds of JVM platforms (32-bit and 64-bit). By default, the JVM used on
a 64-bit operating system is the 64-bit version. Since the JSPSS.dll file used by this component
is compiled for the 32-bit JVM, you need to configure a 32-bit JVM for Job execution when
Talend Studio is installed on a 64-bit operating system.
To do so, perform the following:
1.
2.
Click the Run tab and enter the Advanced settings view.
3.
Select the Use specific JVM arguments check box and click the New... button.
4.
In the Set the VM argument box, enter -d32 and click Ok for validation.
Related scenarios
For related topics, see:
section Scenario: Reading master data in an MDM hub.
section Scenario: Writing data in an .sav file.
604
tSPSSStructure
tSPSSStructure properties
Component family
Business Intelligence
Function
Purpose
tSPSSStructure addresses variables inside .sav files. You can use this component in combination with
tFileList to gather information about existing *.sav files to further analyze or check the findings.
Basic settings
Filename
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Global Variables
NB_LINE: Indicates the number of rows read by an input component or transferred to an output
component. This is an After variable and it returns an integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose
the variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable means
it functions after the execution of a component.
Usage
Limitation
Oracle provides two kinds of JVM platforms (32-bit and 64-bit). By default, the JVM used on a 64-bit
operating system is the 64-bit version. Since the JSPSS.dll file used by this component is compiled for
the 32-bit JVM, you need to configure a 32-bit JVM for Job execution when Talend Studio is installed
on a 64-bit operating system.
To do so, perform the following:
1.
2.
Click the Run tab and enter the Advanced settings view.
3.
Select the Use specific JVM arguments check box and click the New... button.
4.
In the Set the VM argument box, enter -d32 and click Ok for validation.
605
Related scenarios
For related topics, see:
section Scenario: Reading master data in an MDM hub.
section Scenario: Writing data in an .sav file.
606
tSybaseSCD
tSybaseSCD properties
Component
family
Databases/Sybase
Function
Purpose
tSybaseSCD addresses Slowly Changing Dimension needs by regularly reading a source of data and logging the
changes into a dedicated SCD table.
Basic settings
Property type
Use an existing connection: Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if you need to share an existing
connection between the two levels, for example, to share the connection created by
the parent Job with the child Job, you have to:
1. In the parent level, register the database connection to be shared in the Basic
settings view of the connection component which creates that very database
connection.
2. In the child level, use a dedicated connection component to read that registered
database connection.
For an example about how to share a database connection across Job levels, see
Talend Studio User Guide.
Host
Port
Database
Username
Password
Table
Name of the table to be written. Note that only one table can be written at a time.
Schema and Edit schema: A schema is a row description. It defines the number of fields to be processed and passed on to the next component.
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Click Edit Schema to make changes to the schema.
Built-in: The schema is created and stored locally for this component only. Related topic:
see Talend Studio User Guide.
SCD Editor
The SCD editor helps to build and configure the data flow for slowly changing dimension
outputs.
For more information, see section SCD management methodologies.
Use memory saving mode: Select this check box to maximize system performance.
Die on error: This check box is cleared by default, meaning to skip the row on error and to complete the process for error-free rows.
Advanced settings
Additional JDBC parameters: Specify additional connection properties for the DB connection you are creating. This option is not available if you have selected the Use an existing connection check box in the Basic settings.
607
tStatCatcher Statistics: Select this check box to collect log data at the component level.
Debug mode: Select this check box to display each step during processing entries in a database.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to choose your database
connection dynamically from multiple connections planned in your Job. This feature is useful when you need to
access database tables having the same data structure but in different databases, especially when you are working
in an environment where you cannot change your Job settings, for example, when your Job has to be deployed and
executed independent of Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is selected in the
Basic settings view. Once a dynamic parameter is defined, the Component List box in the Basic settings view
becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Global
Variables
NB_LINE_UPDATED: Indicates the number of rows updated. This is an After variable and it returns an integer.
NB_LINE_INSERTED: Indicates the number of rows inserted. This is an After variable and it returns an integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable
to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable means it functions
after the execution of a component.
Usage
This component is used as Output component. It requires an Input component and Row main link as input.
Limitation
This component requires installation of its related jar files. For more information about the installation of these
missing jar files, see the section describing how to configure the Studio of the Talend Installation and Upgrade
Guide.
Related scenarios
For related topics, see section tMysqlSCD.
608
tSybaseSCDELT
tSybaseSCDELT Properties
Component family Databases/Sybase
Function
Purpose
tSybaseSCDELT addresses Slowly Changing Dimension needs through SQL queries (server-side processing
mode), and logs the changes into a dedicated Sybase SCD table.
Basic settings
Property type
Use an existing connection: Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if you need to share
an existing connection between the two levels, for example, to share the
connection created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection to be shared in the
Basic settings view of the connection component which creates that very
database connection.
2. In the child level, use a dedicated connection component to read that
registered database connection.
For an example about how to share a database connection across Job levels,
see Talend Studio User Guide.
Host
Port
Database
Username
Password
Source table
Table
Name of the table to be written. Note that only one table can be written at a time
Action on table
Schema and Edit schema: A schema is a row description. It defines the number of fields to be processed and passed on to the next component.
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Click Edit Schema to make changes to the schema.
609
Built-in: The schema is created and stored locally for this component only. Related
topic: see Talend Studio User Guide.
Surrogate Key
Creation
Source Keys
Select one or more columns to be used as keys, to ensure the uniqueness of incoming data.
Use SCD Type 1 fields: Use type 1 if tracking changes is not necessary. SCD Type 1 should be used for typo corrections, for example. Select the columns of the schema that will be checked for changes.
Use SCD Type 2 fields: Use type 2 if changes need to be tracked. SCD Type 2 should be used to trace updates, for example. Select the columns of the schema that will be checked for changes (see the illustrative table sketch after this list).
Start date: Adds a column to your SCD schema to hold the start date value. You can select one of the input schema columns as Start Date in the SCD table.
End Date: Adds a column to your SCD schema to hold the end date value for the record. When the record is currently active, the End Date column shows a null value, or you can select Fixed Year value and fill it in with a fictitious year to avoid having a null value in the End Date field.
Log Active Status: Adds a column to your SCD schema to hold the true or false status value. This column helps to easily spot the active record.
Log versions: Adds a column to your SCD schema to hold the version number of the record.
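As an illustration only, and with hypothetical table and column names, a dimension table combining these options could look like the following generic SQL sketch (the actual table is defined by the component settings):
CREATE TABLE customer_dim (
  SK_customer INT NOT NULL,    -- surrogate key
  customer_id INT NOT NULL,    -- source key
  customer_name VARCHAR(64),   -- SCD Type 2 field
  scd_start DATE,              -- Start date
  scd_end DATE,                -- End Date (null or a fixed year while the record is active)
  scd_active CHAR(1),          -- Log Active Status
  scd_version INT              -- Log versions
)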
Advanced settings
Additional JDBC parameters: Specify additional connection properties for the DB connection you are creating. This option is not available if you have selected the Use an existing connection check box in the Basic settings.
Debug mode: Select this check box to display each step during processing entries in a database.
tStatCatcher Statistics: Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to choose your
database connection dynamically from multiple connections planned in your Job. This feature is useful when
you need to access database tables having the same data structure but in different databases, especially when
you are working in an environment where you cannot change your Job settings, for example, when your Job
has to be deployed and executed independent of Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is selected in
the Basic settings view. Once a dynamic parameter is defined, the Component List box in the Basic settings
view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component is used as an output component. It requires an input component and Row main link as input.
Limitation
This component requires installation of its related jar files. For more information about the installation of
these missing jar files, see the section describing how to configure the Studio of the Talend Installation and
Upgrade Guide.
Related Scenario
For related topics, see section tMysqlSCD and section Scenario: Tracking changes using Slowly Changing
Dimensions (type 0 through type 3).
610
tVerticaSCD
tVerticaSCD Properties
Component family
Databases/Vertica
Function
Purpose
tVerticaSCD addresses Slowly Changing Dimension needs by regularly reading a source of data and logging the changes
into a dedicated SCD table.
Basic
settings
Property type
Use an existing connection: Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if you need to share an existing
connection between the two levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection to be shared in the Basic settings
view of the connection component which creates that very database connection.
2. In the child level, use a dedicated connection component to read that registered database
connection.
For an example about how to share a database connection across Job levels, see Talend
Studio User Guide.
DB Version
Host
Port
Database
Username
Password
Table
Name of the table to be written. Note that only one table can be written at a time.
Action on table
On the table defined, you can perform one of the following operations:
None: No operation is carried out.
Create a table: The table does not exist and gets created.
Create a table if not exists: The table is created if it does not exist.
Schema and Edit schema: A schema is a row description. It defines the number of fields to be processed and passed on to the next component.
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Click Edit Schema to make changes to the schema.
Built-in: The schema is created and stored locally for this component only. Related topic: see Talend
Studio User Guide.
SCD Editor
The SCD editor helps to build and configure the data flow for slowly changing dimension outputs.
For more information, see section SCD management methodologies.
611
Use memory saving mode: Select this check box to maximize system performance.
Die on error: This check box is cleared by default, meaning to skip the row on error and to complete the process for error-free rows.
Advanced settings
tStatCatcher Statistics: Select this check box to collect log data at the component level.
Debug mode: Select this check box to display each step during processing entries in a database.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to choose your database
connection dynamically from multiple connections planned in your Job. This feature is useful when you need to access
database tables having the same data structure but in different databases, especially when you are working in an
environment where you cannot change your Job settings, for example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Global
Variables
NB_LINE_UPDATED: Indicates the number of rows updated. This is an After variable and it returns an integer.
NB_LINE_INSERTED: Indicates the number of rows inserted. This is an After variable and it returns an integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable
to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable means it functions
after the execution of a component.
Usage
This component is used as Output component. It requires an Input component and Row > Main link as input.
Related scenarios
For related scenarios, see section tMysqlSCD.
612
Cloud components
This chapter details the main components which you can find in the Cloud family of the Palette in the Integration
perspective of Talend Studio.
Private and public cloud databases, data services and SaaS-based applications (CRM, HR, ERP, etc.) are springing
up alongside on-premise applications and databases that have been the mainstay of corporate IT. The resulting
hybrid IT environments have more sources, of more diverse types, which require more modes of integration, and
more effort on data quality and consistency across sources.
The Cloud family comprises the most popular database connectors adapted to Cloud and SaaS applications and
technologies.
For Talend Open Studio for Big Data, the Property type, Schema and Query Type of components are always Built-in. For
further information about how to edit a Built-in schema, see Talend Studio User Guide.
tAmazonMysqlClose
tAmazonMysqlClose properties
Function
Purpose
Close a transaction.
Basic settings
Component list
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
Limitation
n/a
Related scenario
No scenario is available for this component yet.
614
tAmazonMysqlCommit
tAmazonMysqlCommit Properties
This component is closely related to tAmazonMysqlConnection and tAmazonMysqlRollback. It usually doesn't
make much sense to use these components independently in a transaction.
Component family
Cloud/AmazonRDS/
MySQL
Function
Validates the data processed through the job into the connected DB
Purpose
Using a unique connection, this component commits a global transaction in one go instead of committing on
every row or every batch, and thus improves performance.
Basic settings
Component list
Close Connection
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
Limitation
n/a
Related scenario
This component is closely related to tAmazonMysqlConnection and tAmazonMysqlRollback. It usually doesn't
make much sense to use one of these without using a tAmazonMysqlConnection component to open a connection
for the current transaction.
For tAmazonMysqlCommit related scenario, see section Scenario: Inserting data in mother/daughter tables.
615
tAmazonMysqlConnection
tAmazonMysqlConnection Properties
This component is closely related to tAmazonMysqlCommit and tAmazonMysqlRollback. It usually doesn't
make much sense to use one of these without using a tAmazonMysqlConnection component to open a connection
for the current transaction.
Component family
Cloud/AmazonRDS/
MySQL
Function
Purpose
This component allows you to commit all of the Job data to an output database in just a single
transaction, once the data has been validated.
Basic settings
Property type
DB Version
MySQL 5 is available.
Host
Port
Database
Additional parameters
Use or register a shared DB Connection: Select this check box to share your connection or fetch a connection shared by a parent or child Job. This allows you to share one single DB connection among several DB connection components from different Job levels that can be either parent or child.
This option is incompatible with the Use dynamic job and Use an independent process to run subjob options of the tRunJob component. Using a shared database connection together with a tRunJob component with either of these two options enabled will cause your Job to fail.
Shared DB Connection Name: set or type in the shared connection name.
Advanced settings
Auto Commit
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Usage
Limitation
n/a
616
In a command line editor, connect to your MySQL server. Once connected to the relevant database, type in the
following command to create the parent table: create table f1090_mum(id int not null auto_increment, name
varchar(10), primary key(id)) engine=innodb.
2.
Then create the second table: create table baby (id_baby int not null, years int) engine=innodb.
Back in Talend Studio, the Job requires seven components, including tAmazonMysqlConnection and
tAmazonMysqlCommit.
3.
Drag and drop the following components from the Palette: tFileList, tFileInputDelimited, tMap,
tAmazonMysqlOutput (x2).
4.
Connect the tFileList component to the input file component using an Iterate link as the name of the file to
be processed will be dynamically filled in from the tFileList directory using a global variable.
5.
Connect the tFileInputDelimited component to the tMap and dispatch the flow between the two output
AmazonMysql DB components. Use a Row link for each of these connections, representing the main data
flow.
6.
Set the tFileList component properties, such as the directory name where files will be fetched from.
7.
Add a tAmazonMysqlConnection component and connect it to the starter component of this job, in this
example, the tFileList component using an OnComponentOk link to define the execution order.
On the tFileInputDelimited component's Basic settings panel, press Ctrl+Space to access the variable
list. Set the File Name field to the global variable: tFileList_1.CURRENT_FILEPATH
617
2.
Set the rest of the fields as usual, defining the row and field separators according to your file structure. Then set
the schema manually through the Edit schema feature. Make sure the data type is correctly set, in accordance
with the nature of the data processed.
In the tMap Output area, add two output tables, one called mum for the parent table, the second called baby,
for the child table.
2.
Drag the Name column from the Input area, and drop it to the mum table. Drag the Years column from the
Input area and drop it to the baby table.
Make sure the mum table is on top of the baby table, as the order determines the flow sequence and hence
allows the DB inserts to perform correctly.
3.
Then connect the output row links to correctly distribute the flow to the relevant DB output components.
618
In the Basic settings panel of each of the tAmazonMysqlOutput components, select the Use an existing
connection check box to retrieve the tAmazonMysqlConnection details.
2.
Set the Table name making sure it corresponds to the correct table, in this example either f1090_mum or
f1090_baby.
No action is required on the tables as they are already created.
3.
Select Insert as Action on data for both output components. Click on Sync columns to retrieve the schema
set in the tMap.
4.
Go to the Advanced settings panel of each of the tAmazonMysqlOutput components. Notice that the
Commit every field will get overridden by the tAmazonMysqlCommit.
5.
In the Additional columns area of the DB output component corresponding to the child table (f1090_baby),
set the id_baby column so that it reuses the id from the parent table. In the SQL expression field type in:
'(Select Last_Insert_id())'.
The position is Before and the Reference column is years.
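With this setting, the statement issued for the child table is conceptually equivalent to the following (the years value is shown for illustration only):
INSERT INTO baby (id_baby, years) VALUES ((Select Last_Insert_id()), 3)
LAST_INSERT_ID() returns the auto-increment value generated by the latest insert performed on the same connection, which is why both output components reuse the single tAmazonMysqlConnection connection.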
Add the tAmazonMysqlCommit component to the design workspace and connect the tFileList
component to it using an OnComponentOk connection in order for the Job to terminate with the transaction
commit.
2.
On the tAmazonMysqlCommit Component view, select in the list the connection to be used.
Job execution
Save your Job and press F6 to execute it.
The parent table id has been reused to feed the id_baby column.
619
tAmazonMysqlInput
tAmazonMysqlInput properties
Component family
Cloud/AmazonRDS/
MySQL
Function
Purpose
tAmazonMysqlInput executes a DB query with a strictly defined order which must correspond to the
schema definition. Then it passes on the field list to the next component via a Main row link.
Basic settings
Property type
DB Version
MySQL 5 is available.
Use an existing connection: Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if you need to
share an existing connection between the two levels, for example, to
share the connection created by the parent Job with the child Job, you
have to:
1. In the parent level, register the database connection to be shared in
the Basic settings view of the connection component which creates
that very database connection.
2. In the child level, use a dedicated connection component to read that
registered database connection.
For an example about how to share a database connection across Job
levels, see Talend Studio User Guide.
Host
Port
Database
Username
Password
Schema and Edit schema: A schema is a row description, i.e. it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
If you are using Talend Open Studio for Big Data, only the Built-in mode is
available.
Built-in: The schema is created and stored locally for this component only.
Related topic: see Talend Studio User Guide.
Advanced settings
620
Table Name
Query: Enter your DB query, paying particular attention to properly sequence the fields in order to match the schema definition.
Additional JDBC parameters: Specify additional connection properties for the DB connection you are creating. This option is not available if you have selected the Use an existing connection check box in the Basic settings.
Enable stream
Select this check box to enable streaming over buffering, which allows the code to read from a large table without consuming a large amount of memory, in order to optimize performance.
Trim all the String/Char columns: Select this check box to remove leading and trailing whitespace from all the String/Char columns.
Trim column
tStatCatcher Statistics
Dynamic settings
Select this check box to collect log data at the component level.
Click the [+] button to add a row in the table and fill the Code field with a context variable to choose your
database connection dynamically from multiple connections planned in your Job. This feature is useful
when you need to access database tables having the same data structure but in different databases, especially
when you are working in an environment where you cannot change your Job settings, for example, when
your Job has to be deployed and executed independent of Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is selected
in the Basic settings view. Once a dynamic parameter is defined, the Component List box in the Basic
settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component covers all possible SQL queries for Mysql databases.
Drop tAmazonMysqlInput and tFileOutputDelimited from the Palette onto the workspace.
2.
Double-click tAmazonMysqlInput to open its Basic Settings view in the Component tab.
621
For Talend Open Studio for Big Data, the Property type, Schema and Query Type of components are always Built-in. For further information about how to edit a Built-in schema, see Talend Studio User Guide.
2.
3.
4.
Click the [+] button to add the rows that you will use to define the schema, four columns in this example:
id, first_name, city and salary.
5.
Under Column, click in the fields to enter the corresponding column names.
6.
Click the field under Type to define the type of data. Click OK to close the schema editor.
7.
Next to the Table Name field, click the [...] button to select the database table of interest.
A dialog box displays a tree diagram of all the tables in the selected database:
622
8.
Click the table of interest and then click OK to close the dialog box.
9.
In the Query box, enter the query required to retrieve the desired columns from the table.
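For example, with the four columns defined above and assuming the selected table is named employees (a purely illustrative name), the query could be:
SELECT id, first_name, city, salary FROM employees
Keep the column order of the SELECT clause identical to the schema definition.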
2.
Next to the File Name field, click the [...] button to browse your directory to where you want to save the
output file, then enter a name for the file.
3.
Select the Include Header check box to retrieve the column names as well as the data.
Job execution
Save the Job and press F6 to run it.
The output file is written with the desired column names and corresponding data, retrieved from the database:
623
The Job can also be run in the Traces Debug mode, which allows you to view the rows as they are being written to the
output file, in the workspace.
624
tAmazonMysqlOutput
tAmazonMysqlOutput properties
Component family
Cloud/AmazonRDS/
MySQL
Function
Purpose
tAmazonMysqlOutput executes the action defined on the table and/or on the data contained in the table,
based on the flow incoming from the preceding component in the Job.
Basic settings
Property type
DB Version
MySQL 5 is available.
Use an existing connection: Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if you need
to share an existing connection between the two levels, for example,
to share the connection created by the parent Job with the child Job,
you have to:
1. In the parent level, register the database connection to be shared
in the Basic settings view of the connection component which
creates that very database connection.
2. In the child level, use a dedicated connection component to read
that registered database connection.
For an example about how to share a database connection across Job
levels, see Talend Studio User Guide.
Host
Port
Database
Table
Name of the table to be written. Note that only one table can be written at a time.
Action on table
On the table defined, you can perform one of the following operations:
None: No operation is carried out.
Drop and create table: The table is removed and created again.
Create table: The table does not exist and gets created.
Create table if not exists: The table is created if it does not exist.
Drop table if exists and create: The table is removed if it already exists and
created again.
Clear table: The table content is deleted.
Truncate table: The table content is quickly deleted. However, you will not be
able to rollback the operation.
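Conceptually, and with an illustrative table name, the last two options correspond to the following statements; only the first of them can be rolled back:
DELETE FROM my_table
TRUNCATE TABLE my_table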
Action on data
625
Insert: Add new entries to the table. If duplicates are found, the job stops.
Update: Make changes to existing entries.
Insert or update: inserts a new record. If the record with the given reference
already exists, an update would be made.
Update or insert: updates the record with the given reference. If the record
does not exist, a new record would be inserted.
Delete: Remove entries corresponding to the input flow.
Replace: Add new entries to the table. If an old row in the table has the same
value as a new row for a PRIMARY KEY or a UNIQUE index, the old row is
deleted before the new row is inserted.
Insert or update on duplicate key or unique index: Add entries if the inserted
value does not exist or update entries if the inserted value already exists and
there is a risk of violating a unique index or primary key.
Insert Ignore: Add only new rows to prevent duplicate key errors.
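As an illustration, with hypothetical table and column names, the Replace and Insert Ignore actions correspond to MySQL statements of the following form:
REPLACE INTO customers (id, name) VALUES (1, 'Smith')
INSERT IGNORE INTO customers (id, name) VALUES (1, 'Smith')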
You must specify at least one column as a primary key on which
the Update and Delete operations are based. You can do that by
clicking Edit Schema and selecting the check box(es) next to the
column(s) you want to set as primary key(s). For an advanced use,
click the Advanced settings view where you can simultaneously
define primary keys for the update and delete operations. To do that:
Select the Use field options check box and then in the Key in update
column, select the check boxes next to the column name on which you
want to base the update operation. Do the same in the Key in delete
column for the deletion operation.
Schema and Edit schema A schema is a row description, i.e. it defines the number of fields to be processed
and passed on to the next component. The schema is either Built-in or stored
remotely in the Repository.
If you are using Talend Open Studio for Big Data, only the Built-in mode is
available.
Built-in: The schema is created and stored locally for this component only.
Related topic: see Talend Studio User Guide.
Die on error: This check box is selected by default. Clear the check box to skip the row in error and complete the process for error-free rows. If needed, you can retrieve the rows in error via a Row > Rejects link.
Advanced settings
Additional JDBC parameters: Specify additional connection properties for the DB connection you are creating. This option is not available if you have selected the Use an existing connection check box in the Basic settings.
You can press Ctrl+Space to access a list of predefined global
variables.
Extend Insert
Select this check box to carry out a bulk insert of a defined set of lines instead
of inserting lines one by one. The gain in system performance is considerable.
Number of rows per insert: enter the number of rows to be inserted per operation. Note that the higher the value specified, the lower the performance due to the increase in memory demands.
This option is not compatible with the Reject link. You should
therefore clear the check box if you are using a Row > Rejects link
with this component.
If you are using this component with tMysqlLastInsertID, ensure that
the Extend Insert check box in Advanced Settings is not selected.
Extend Insert allows for batch loading, however, if the check box is
selected, only the ID of the last line of the last batch will be returned.
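With Extend Insert selected, rows are grouped into multi-row INSERT statements, conceptually like the following (illustrative names and values):
INSERT INTO customers (id, name) VALUES (1, 'Smith'), (2, 'Jones'), (3, 'Brown')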
626
Select this check box to activate the batch mode for data processing. In the
Batch Size field that appears when this check box is selected, you can type in
the number you need to define the batch size to be processed.
This check box is available only when you have selected the Update or the Delete option in the Action on data field.
Commit every
Additional Columns
This option is not available if you have just created the DB table (even if you
delete it beforehand). This option allows you to call SQL functions to perform
actions on columns, provided that these are not insert, update or delete actions,
or actions that require pre-processing.
Name: Type in the name of the schema column to be altered or inserted.
SQL expression: Type in the SQL statement to be executed in order to alter or insert the data in the corresponding column.
Position: Select Before, Replace or After, depending on the action to be
performed on the reference column.
Reference column: Type in a reference column that tAmazonMysqlOutput
can use to locate or replace the new column, or the column to be modified.
Select this check box to customize a request, particularly if multiple actions are
being carried out on the data.
Select this check box to activate the hint configuration area which helps you optimize a query's execution. In this area, parameters are:
- HINT: specify the hint you need, using the syntax
/*+ */.
Select this check box to display each step involved in the process of writing
data in the database.
Use duplicate key update mode insert: Updates the values of the specified columns in the event of duplicate primary keys:
Column: Between double quotation marks, enter the name of the column to
be updated.
Value: Enter the action you want to carry out on the column.
To use this option you must first of all select the Insert mode in the
Action on data list found in the Basic Settings view.
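As an illustration, a Column entry of "order_count" with a Value of order_count + 1 (hypothetical names) would produce statements of the following form:
INSERT INTO customers (id, order_count) VALUES (1, 1)
ON DUPLICATE KEY UPDATE order_count = order_count + 1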
tStatCatcher Statistics
Dynamic settings
Select this check box to collect log data at the component level.
Click the [+] button to add a row in the table and fill the Code field with a context variable to choose
your database connection dynamically from multiple connections planned in your Job. This feature is
useful when you need to access database tables having the same data structure but in different databases,
especially when you are working in an environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed independent of Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is selected
in the Basic settings view. Once a dynamic parameter is defined, the Component List box in the Basic
settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component offers the flexibility benefit of the DB query and covers all of the SQL queries possible.
This component must be used as an output component. It allows you to carry out actions on a table or on the
data of a table in a MySQL database. It also allows you to create a reject flow using a Row > Rejects link
to filter data in error. For an example of tAmazonMysqlOutput in use, see section Scenario 3: Retrieve
data in error with a Reject link.
627
Drop the following components from the Palette onto the design workspace: tRowGenerator, tMap and
tAmazonMySQLOutput.
2.
Connect tRowGenerator, tMap, and tAmazonMysqlOutput using the Row Main link.
In the design workspace, select tRowGenerator to display its Basic settings view.
2.
Click the Edit schema three-dot button to define the data to pass on to the tMap component, two columns
in this scenario, name and random_date.
3.
4.
Click the RowGenerator Editor three-dot button to open the editor and define the data to be generated.
628
5.
Click in the corresponding Functions fields and select a function for each of the two columns, getFirstName
for the first column and getRandomDate for the second column.
6.
In the Number of Rows for Rowgenerator field, enter 10 to generate ten first name rows and click Ok to
close the editor.
Double-click the tMap component to open the Map editor. The Map editor opens displaying the input
metadata of the tRowGenerator component.
2.
In the Schema editor panel of the Map editor, click the [+] button of the output table to add two rows and
define the first as random_date and the second as random_date1.
629
In this scenario, we want to duplicate the random_date column and adapt the schema in order to alter the
data in the output component.
3.
In the Map editor, drag the random_date row from the input table to the random_date and random_date1
rows in the output table.
4.
In the design workspace, double-click the tAmazonMysqlOutput component to display its Basic settings
view and set its parameters.
For Talend Open Studio for Big Data, the Property type, Schema and Query Type of components are always Built-in. For further information about how to edit a Built-in schema, see Talend Studio User Guide.
2.
3.
Click the [...] button next to the Table field and select the table to be altered, Dates in this scenario.
4.
On the Action on table list, select Drop table if exists and create, select Insert on the Action on data list.
630
5.
If needed, click Sync columns to synchronize with the columns coming from the tMap component.
6.
Click the Advanced settings tab to display the corresponding view and set the advanced parameters.
7.
Job execution
Save your Job and press F6 to execute it.
The new One_month_later column replaces the random_date1 column in the DB table and adds one month to
each of the randomly generated dates.
631
Drop tFileInputDelimited and tAmazonMysqlOutput from the Palette onto the design workspace. Connect
the two components together using a Row Main link.
Double-click tFileInputDelimited to display its Basic settings view and define the component properties.
For Talend Open Studio for Big Data, the Property type, Schema and Query Type of components are always Built-in. For further information about how to edit a Built-in schema, see Talend Studio User Guide.
2.
3.
In the File Name field, click the [...] button and browse to the source delimited file that contains the
modifications to propagate in the MySQL table.
In this example, we use the customer_update file that holds four columns: id, CustomerName,
CustomerAddress and idState. Some of the data in these four columns is different from that in the MySQL
table.
632
4.
Define the row and field separators used in the source file in the corresponding fields. If needed, set Header,
Footer and Limit.
In this example, Header is set to 1 since the first row holds the names of columns, therefore it should be
ignored. Also, the number of processed lines is limited to 2000.
5.
Click the [...] button next to Edit Schema to open a dialog box where you can describe the data structure of
the source delimited file that you want to pass to the component that follows.
6.
Select the Key check box(es) next to the column name(s) you want to define as key column(s).
It is necessary to define at least one column as a key column for the Job to be executed correctly. Otherwise, the Job
is automatically interrupted and an error message displays on the console.
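The key column drives the WHERE clause of the generated statements: with id defined as the key, the Update action issues statements conceptually similar to the following (values are illustrative):
UPDATE customers SET CustomerName = 'Smith', CustomerAddress = '5 Main St', idState = 12 WHERE id = 1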
In the design workspace, double-click tAmazonMysqlOutput to open its Basic settings view where you
can define its properties.
2.
Click Sync columns to retrieve the schema of the preceding component. If needed, click the [...] button next
to Edit schema to open a dialog box where you can check the retrieved schema.
633
3.
4.
5.
From the Action on table list, select the operation you want to perform, None in this example since the table
already exists.
6.
From the Action on data list, select the operation you want to perform on the data, Update in this example.
Job execution
Save your Job and press F6 to execute it.
Using your DB browser, you can verify if the MySQL table, customers, has been modified according to the
delimited file.
In the above example, the database table still has the four columns id, CustomerName, CustomerAddress and
idState, but certain fields have been modified according to the data in the delimited file used.
634
Drop a tFileInputDelimited component from the family File > Input, in the Palette, and fill in its properties
manually in the Component tab.
2.
From the Palette, drop a tMap from the Processing family onto the workspace.
3.
Drop a tAmazonMysqlOutput from the Databases family in the Palette and fill in its properties manually
in the Component tab.
4.
From the Palette, select a tFileOutputDelimited from the File > Output family, and drop it onto the
workspace.
5.
Link the customers component to the tMap component, and link tMap to Localhost with a Row Main
link. Name this second link out.
6.
Link the Localhost to the tFileOutputDelimited using a Row > Reject link.
For Talend Open Studio for Big Data, the Property type, Schema and Query Type of components are always Built-in. For further information about how to edit a Built-in schema, see Talend Studio User Guide.
2.
Click the [...] button next to the File Name field, and fill in the path and the name of the file you want to use.
3.
In the Row and Field Separator fields, type in, between quotation marks, the row and field separators used
in the file.
635
4.
In the Header, Footer and Limit fields, type in the number of headers and footers to ignore, and the number
of rows to which processing should be limited.
5.
Click the [...] button next to the Edit schema field, and set the schema manually.
The schema is as follows:
2.
Select the id, CustomerName, CustomerAddress, idState, id2, RegTime and RegisterTime columns on the table
on the left and drop them on the out table, on the right.
636
3.
In the Schema editor area, at the bottom of the tMap editor, in the right table, change the length of the
CustomerName column to 28 to create an error. Thus, any data for which the length is greater than 28 will
create errors, retrieved with the Reject link.
4.
Click OK. In the workspace, double-click the output Localhost component to display its Component view.
For Talend Open Studio for Big Data, the Property type, Schema and Query Type of components are always Built-in. For further information about how to edit a Built-in schema, see Talend Studio User Guide.
5.
6.
In the Table field, type in the name of the table to be created. In this scenario, we call it customers_data.
In the Action on table list, select the Create table option. Click the Sync columns button to retrieve the
schema from the previous component.
Make sure the Die on error check box isn't selected, so that the Job can be executed despite the error you
just created.
7.
Click the Advanced settings tab of the Component view to set the advanced parameters of the component.
637
8.
Deselect the Extend Insert check box which enables you to insert rows in batch, because this option is not
compatible with the Reject link.
Double-click the tFileOutputDelimited component to set its properties in the Component view.
2.
Click the [...] button next to the File Name field to fill in the path and name of the output file. Click the Sync
columns button to retrieve the schema of the previous component.
Job execution
Save your Job and press F6 to execute it.
638
The data in error is sent to the delimited file, along with the error type encountered. Here, we have: Data truncation.
639
tAmazonMysqlRollback
tAmazonMysqlRollback properties
This component is closely related to tAmazonMysqlCommit and tAmazonMysqlConnection. It usually does
not make much sense to use these components independently in a transaction.
Component family
Cloud/AmazonRDS/Mysql
Function
Purpose
Basic settings
Component list
Close Connection
Clear this check box to continue to use the selected connection once
the component has performed its task.
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
Limitation
n/a
640
1.
Drag and drop a tAmazonMysqlRollback to the design workspace and connect it to the Start component.
2.
This complementary element to the Job ensures that the transaction will not be partly committed.
641
tAmazonMysqlRow
tAmazonMysqlRow properties
Component family
Cloud/Amazon/MySQL
Function
tAmazonMysqlRow is the specific component for this database query. It executes the SQL query stated in
the specified database. The row suffix means the component implements a flow in the Job design although
it doesn't provide output.
Purpose
Depending on the nature of the query and the database, tAmazonMysqlRow acts on the actual DB
structure or on the data (although without handling data). The SQLBuilder tool helps you easily write
your SQL statements.
Basic settings
Property type
DB Version
MySQL 5 is available.
Use an existing connection: Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if you need
to share an existing connection between the two levels, for example,
to share the connection created by the parent Job with the child Job,
you have to:
1. In the parent level, register the database connection to be shared
in the Basic settings view of the connection component which
creates that very database connection.
2. In the child level, use a dedicated connection component to read
that registered database connection.
For an example about how to share a database connection across Job
levels, see Talend Studio User Guide.
Host
Port
Database
Query type
If you are using Talend Open Studio for Big Data, only the Built-in mode is
available.
Built-in: Fill in manually the query statement or build it graphically using
SQLBuilder
Guess Query
642
Click the Guess Query button to generate the query which corresponds to your
table schema in the Query field.
Advanced settings
Query
Die on error
This check box is selected by default. Clear the check box to skip the row on
error and complete the process for error-free rows. If needed, you can retrieve
the rows on error via a Row > Rejects link.
Additional JDBC parameters: Specify additional connection properties for the DB connection you are creating. This option is not available if you have selected the Use an existing connection check box in the Basic settings.
Propagate QUERY's recordset: Select this check box to insert the result of the query in a COLUMN of the current flow. Select this column from the use column list.
This option allows the component to have a different schema from that of the preceding component. Moreover, the column that holds the QUERY's recordset should be set to the type of Object and this component is usually followed by tParseRecordSet.
Use PreparedStatement
Dynamic settings
Commit every
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Click the [+] button to add a row in the table and fill the Code field with a context variable to choose
your database connection dynamically from multiple connections planned in your Job. This feature is
useful when you need to access database tables having the same data structure but in different databases,
especially when you are working in an environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed independent of Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is selected
in the Basic settings view. Once a dynamic parameter is defined, the Component List box in the Basic
settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component offers the flexibility of the DB query and covers all possible SQL queries.
643
Select and drop the following components onto the design workspace: tAmazonMysqlRow (x2),
tRowGenerator, and tAmazonMysqlOutput.
2.
3.
2.
3.
Propagate the properties and schema details onto the other components of the Job.
4.
Type in the following SQL statement to alter the database entries: drop index <index_name> on
<table_name>
5.
Select the second tAmazonMysqlRow component, check the DB properties and schema.
6.
Type in the SQL statement to recreate an index on the table using the following statement: create index
<index_name> on <table_name> (<column_name>)
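For instance, with the comprehensive table used in this scenario and a hypothetical index and column name, the two statements could be:
drop index idx_comp on comprehensive
create index idx_comp on comprehensive (id)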
The tRowGenerator component is used to automatically generate the columns to be added to the defined DB output
table.
Select the tAmazonMysqlOutput component and fill in the DB connection properties. The table to be fed
is named: comprehensive.
2.
The schema should be automatically inherited from the data flow coming from the tRowGenerator. Edit the
schema to check its structure and check that it corresponds to the schema expected on the DB table specified.
3.
644
Job execution
Press F6 to run the Job.
If you watch the action on the DB data, you can see that the index is dropped at the start of the Job and
recreated at the end of the insert action.
Related topics: section tDBSQLRow properties.
Drop a tFileInputDelimited component from the Palette onto the design workspace, and double-click it to
open its Basic settings view.
For Talend Open Studio for Big Data, the Property type, Schema and Query Type of components are always Built-in. For further information about how to edit a Built-in schema, see Talend Studio User Guide.
2.
Define the path to the input file, the row separator, the field separator, the header, and the footer in the
corresponding fields.
3.
Click on [...] next to the Edit schema field to add a column into which the name of the State will be inserted.
645
4.
Click on the [+] button to add a column to the schema. Rename this column LabelStateRecordSet and select
Object from the Type list. Click OK to save your modifications.
5.
From the Palette, select the tAmazonMysqlRow, tParseRecordSet and tFileOutputDelimited components
and drop them onto the workspace. Connect the four components using Row > Main type links.
Double-click tAmazonMysqlRow to set its properties in the Basic settings tab of the Component view.
For Talend Open Studio for Big Data, the Property type, Schema and Query Type of components are always Built-in. For further information about how to edit a Built-in schema, see Talend Studio User Guide.
646
2.
3.
Click the Sync columns button to retrieve the schema from the preceding component.
4.
In the Query field, enter the SQL query you want to use. Here, we want to retrieve the names of the American
States from the LabelState column of the MySQL table, us_state:
"SELECT LabelState
FROM us_state WHERE idState=?"
The question mark, ?, represents the parameter to be set in the Advanced settings tab.
2.
Select the Propagate QUERY's recordset check box and select the LabelStateRecordSet column from the
use column list to insert the query results in that column.
3.
Select the Use PreparedStatement check box and define the parameter used in the query in the Set
PreparedStatement Parameters table. Click on the [+] button to add a parameter.
4.
In the Parameter Index cell, enter the parameter position in the SQL instruction. Enter 1 as we are only
using one parameter in this example.
5.
In the Parameter Type cell, enter the type of parameter. Here, the parameter is a whole number, hence,
select Int from the list.
6.
In the Parameter Value cell, enter the parameter value. Here, we want to retrieve the name of the State based
on the State ID for every client in the input file. Hence, enter row1.idState.
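At run time, for a client row whose idState is, say, 25 (an illustrative value), the executed statement is equivalent to:
SELECT LabelState FROM us_state WHERE idState=25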
Double-click tParseRecordSet to set its properties in the Basic settings tab of the Component view.
647
2.
From the Prev. Comp. Column list, select the preceding component's column for analysis. In this example,
select LabelStateRecordSet.
3.
Click on the Sync columns button to retrieve the schema from the preceding component. The Attribute table
is automatically completed with the schema columns.
4.
In the Attribute table, in the Value field which corresponds to the LabelStateRecordSet, enter the name of
the column containing the State names to be retrieved and matched with each client, within double quotation
marks. In this example, enter LabelState.
Double-click tFileOutputDelimited to set its properties in the Basic settings tab of the Component view.
2.
In the File Name field, enter the access path and name of the output file. Click Sync columns to retrieve the
schema from the preceding component.
Job execution
Save your Job and press F6 to run it.
648
A column containing the name of the American State corresponding to each client is added to the file.
Related scenarios
For a related scenario, see:
section Scenario 3: Combining two flows for selective output
649
tAmazonOracleClose
tAmazonOracleClose properties
Function
Purpose
Close a transaction.
Basic settings
Component list
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
Limitation
n/a
Related scenario
No scenario is available for this component yet.
tAmazonOracleCommit
tAmazonOracleCommit Properties
This component is closely related to tAmazonOracleConnection and tAmazonOracleRollback. It usually
doesn't make much sense to use these components independently in a transaction.
Component family
Cloud/AmazonRDS/Oracle
Function
Validates the data processed through the Job into the connected DB.
Purpose
Using a unique connection, this component commits in one go a global transaction instead of doing
that on every row or every batch and thus provides gain in performance.
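As a rough JDBC analogy (url, user and pwd being placeholder credentials, not options of the component), the Connection/Commit pairing behaves like:
java.sql.Connection conn = java.sql.DriverManager.getConnection(url, user, pwd);
conn.setAutoCommit(false); // tAmazonOracleConnection opens the transaction
// ... the output components in the Job write their rows here ...
conn.commit(); // tAmazonOracleCommit validates all of them in one go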
Basic settings
Component list
Close Connection
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
Limitation
n/a
Related scenario
This component is closely related to tAmazonOracleConnection and tAmazonOracleRollback. It usually
doesn't make much sense to use one of these without using a tAmazonOracleConnection component to open a
connection for the current transaction.
For tAmazonOracleCommit related scenario, see section tMysqlConnection.
tAmazonOracleConnection
tAmazonOracleConnection Properties
This component is closely related to tAmazonOracleCommit and tAmazonOracleRollback. It usually doesn't
make much sense to use one of these without using a tAmazonOracleConnection component to open a connection
for the current transaction.
Component family
Cloud/AmazonRDS/Oracle
Function
Purpose
This component allows you to commit all of the Job data to an output database in just a single
transaction, once the data has been validated.
Basic settings
Property type
Connection type
DB Version
Host
Port
Database
Schema
Additional
parameters
Use or register a shared DB Connection
Select this check box to share your connection or fetch a connection shared by a parent or child Job. This allows you to share one single DB connection among several DB connection components from different Job levels that can be either parent or child.
This option is incompatible with the Use dynamic job and
Use an independent process to run subjob options of the
tRunJob component. Using a shared database connection
together with a tRunJob component with either of these
two options enabled will cause your Job to fail.
Limitation
n/a
Related scenario
This component is closely related to tAmazonOracleCommit and tAmazonOracleRollback. It usually doesn't
make much sense to use one of these without using a tAmazonOracleConnection component to open a connection
for the current transaction.
For tAmazonOracleConnection related scenario, see section tMysqlConnection.
tAmazonOracleInput
tAmazonOracleInput properties
Component family
Cloud/AmazonRDS/
Oracle
Function
Purpose
tAmazonOracleInput executes a DB query with a strictly defined order which must correspond to the
schema definition. Then it passes on the field list to the next component via a Main row link.
Basic settings
Property type
Connection type
DB Version
Use an existing connection
Host
Port
Database
Oracle schema
Table name
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Use cursor
When selected, helps to decide the row set to work with at a time and thus
optimize performance.
Trim all the String/Char columns
Select this check box to remove leading and trailing whitespace from all the String/Char columns.
Trim column
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to choose
your database connection dynamically from multiple connections planned in your Job. This feature is
useful when you need to access database tables having the same data structure but in different databases,
especially when you are working in an environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed independent of Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is selected
in the Basic settings view. Once a dynamic parameter is defined, the Component List box in the Basic
settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component covers all possible SQL queries for Oracle databases.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided. You
can easily find out and add such JARs in the Integration perspective of your studio. For details, see the
section about external modules in the Talend Installation and Upgrade Guide.
Related scenarios
For related scenarios, see:
section Scenario 1: Displaying selected data from DB table.
section Scenario 2: Using StoreSQLQuery variable.
section Scenario: Reading data from different MySQL databases using dynamically loaded connection
parameters.
tAmazonOracleOutput
tAmazonOracleOutput properties
Component family
Cloud/AmazonRDS/
Oracle
Function
Purpose
tAmazonOracleOutput executes the action defined on the table and/or on the data contained in the table,
based on the flow incoming from the preceding component in the Job.
Basic settings
Property type
Use an existing connection
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if you need
to share an existing connection between the two levels, for example,
to share the connection created by the parent Job with the child Job,
you have to:
1. In the parent level, register the database connection to be shared
in the Basic settings view of the connection component which
creates that very database connection.
2. In the child level, use a dedicated connection component to read
that registered database connection.
For an example about how to share a database connection across Job
levels, see Talend Studio User Guide.
Connection type
DB Version
Host
Port
Database
Table
Name of the table to be written. Note that only one table can be written at a time.
Action on table
On the table defined, you can perform one of the following operations:
None: No operation is carried out.
Drop and create a table: The table is removed and created again.
Create a table: The table does not exist and gets created.
Create a table if not exists: The table is created if it does not exist.
Drop a table if exists and create: The table is removed if it already exists and
created again.
Schema and Edit schema A schema is a row description, i.e. it defines the number of fields to be processed
and passed on to the next component. The schema is either Built-in or stored
remotely in the Repository.
If you are using Talend Open Studio for Big Data, only the Built-in mode is
available.
Built-in: The schema is created and stored locally for this component only.
Related topic: see Talend Studio User Guide.
Die on error
This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.
Advanced settings
Additional JDBC parameters
Specify additional connection properties for the DB connection you are creating. This option is not available if you have selected the Use an existing connection check box in the Basic settings.
You can press Ctrl+Space to access a list of predefined global
variables.
Override any existing NLS_LANG environment variable
Select this check box to override variables already set for a NLS language environment.
Commit every
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Additional Columns
This option is not offered if you create (with or without drop) the DB table. This
option allows you to call SQL functions to perform actions on columns, which
are not insert, nor update or delete actions, or actions that require particular
preprocessing.
Name: Type in the name of the schema column to be altered or inserted as new
column.
SQL expression: Type in the SQL statement to be executed in order to alter
or insert the relevant column data.
Position: Select Before, Replace or After following the action to be performed
on the reference column.
Reference column: Type in a column of reference that the tDBOutput can use
to place or replace the new or altered column.
Use field options
Select this check box to customize a request, especially when there is double action on data.
Use Hint Options
Select this check box to activate the hint configuration area which helps you optimize a query's execution. In this area, the parameter is:
- HINT: specify the hint you need, using the syntax /*+ */.
Convert columns and table to uppercase
Select this check box to set the names of columns and table in upper case.
Debug query mode
Select this check box to display each step during processing entries in a database.
Use Batch Size
When selected, enables you to define the number of lines in each processed batch.
This option is available only when you do not Use an existing connection in Basic settings.
Support null in SQL WHERE statement
Select this check box to validate null in SQL WHERE statement.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to choose
your database connection dynamically from multiple connections planned in your Job. This feature is
useful when you need to access database tables having the same data structure but in different databases,
especially when you are working in an environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed independent of Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is selected
in the Basic settings view. Once a dynamic parameter is defined, the Component List box in the Basic
settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component offers the flexibility benefit of the DB query and covers all of the SQL queries possible.
This component must be used as an output component. It allows you to carry out actions on a table or on
the data of a table in an Oracle database. It also allows you to create a reject flow using a Row > Rejects link
to filter data in error. For such an example, see section Scenario 3: Retrieve data in error with a Reject link.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided. You
can easily find out and add such JARs in the Integration perspective of your studio. For details, see the
section about external modules in the Talend Installation and Upgrade Guide.
Related scenarios
For tAmazonOracleOutput related topics, see:
section Scenario: Writing a row to a table in the MySql database via an ODBC connection.
section Scenario 1: Adding a new column and altering data in a DB table.
tAmazonOracleRollback
tAmazonOracleRollback properties
This component is closely related to tAmazonOracleCommit and tAmazonOracleConnection. It usually
doesn't make much sense to use these components independently in a transaction.
Component family
Cloud/AmazonRDS/Oracle
Function
Purpose
Basic settings
Component list
Close Connection
Clear this check box to continue to use the selected connection once
the component has performed its task.
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
Limitation
n/a
Related scenario
This component is closely related to tAmazonOracleConnection and tAmazonOracleCommit. It usually
doesn't make much sense to use one of these without using a tAmazonOracleConnection component to open a
connection for the current transaction.
For tAmazonOracleRollback related scenario, see section tMysqlRollback.
tAmazonOracleRow
tAmazonOracleRow properties
Component family
Cloud/AmazonRDS/
Oracle
Function
tAmazonOracleRow is the specific component for this database query. It executes the SQL query stated
onto the specified database. The row suffix means the component implements a flow in the Job design
although it doesn't provide output.
Purpose
Depending on the nature of the query and the database, tAmazonOracleRow acts on the actual DB
structure or on the data (although without handling data). The SQLBuilder tool helps you easily write
your SQL statements.
Basic settings
Property type
Use an existing connection
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if you need
to share an existing connection between the two levels, for example,
to share the connection created by the parent Job with the child Job,
you have to:
1. In the parent level, register the database connection to be shared
in the Basic settings view of the connection component which
creates that very database connection.
2. In the child level, use a dedicated connection component to read
that registered database connection.
For an example about how to share a database connection across Job
levels, see Talend Studio User Guide.
Connection type
Host
Port
Database
If you are using Talend Open Studio for Big Data, only the Built-in mode is
available.
Built-in: Fill in manually the query statement or build it graphically using
SQLBuilder.
Query
Use NB_LINE_
This option allows you to feed the variable with the number of rows inserted/updated/deleted to the next component or subjob. This field only applies if the query entered in the Query field is an INSERT, UPDATE or DELETE query.
NONE: does not feed the variable.
INSERTED: feeds the variable with the number of rows inserted.
UPDATED: feeds the variable with the number of rows updated.
DELETED: feeds the variable with the number of rows deleted.
Die on error
This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.
Advanced settings
Propagate QUERY's recordset
Select this check box to insert the result of the query into a COLUMN of the current flow. Select this column from the use column list.
This option allows the component to have a different schema from that of the preceding component. Moreover, the column that holds the QUERY's recordset should be set to the type of Object and this component is usually followed by tParseRecordSet.
Use PreparedStatement
Commit every
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to choose
your database connection dynamically from multiple connections planned in your Job. This feature is
useful when you need to access database tables having the same data structure but in different databases,
especially when you are working in an environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed independent of Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is selected
in the Basic settings view. Once a dynamic parameter is defined, the Component List box in the Basic
settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component offers the flexibility of the DB query and covers all possible SQL queries.
Related scenarios
For related topics, see:
section Scenario 3: Combining two flows for selective output
section Scenario: Resetting a DB auto-increment.
section Scenario 1: Removing and regenerating a MySQL table index.
tCloudStart
tCloudStart Properties
Component family
Cloud
Function
This component accesses the cloud provider to be used (Amazon EC2) and launches instances,
which are virtual servers in that cloud. If an instance to be launched does not exist, tCloudStart
creates it.
Purpose
This component starts instances on Amazon EC2 (Amazon Elastic Compute Cloud).
Basic settings
Enter or paste the access key and the secret key required by
Amazon to authenticate your requests to its web services. These
access credentials are generated from the Security Credential tab
of your Amazon account page.
Cloud provider
Image
Instance name
Instance count
Instance type
Select this check box to use Amazon Key Pair for your login to
Amazon EC2. Once it is selected, a drop-down list appears, allowing
you to select:
Use an existing Key Pair to enter the name of that Key Pair in
the field next to the drop-down list. If required, Amazon will
prompt you at runtime to find and use that Key Pair.
Create a Key Pair to enter the name of the new Key Pair in the
field next to the drop-down list and define the location where
you want to store this Key Pair in the Advanced settings tab
view.
Security group
Add rows to this table and enter the names of the security groups
to which you need to assign the instance(s) to be launched. The
security groups set in this table must exist on your Amazon EC2.
A security group applies specific rules on inbound traffic to
instances assigned to the group, such as the ports to be used.
For further information about security groups, see Amazon's
documentation about security groups.
Advanced settings
Browse to, or enter the path to the folder you use to store the
created Key Pair file.
This field appears when you select Create a Key Pair in the
Basic settings tab view.
Volumes
Add rows and define the volume(s) to be created for the instances
to be launched in addition to the volumes predefined and
allocated by the given Amazon EC2.
The parameters to be set in this table are the same parameters
used by Amazon for describing a volume.
If you need to automatically remove an additional volume after
terminating its related instance, select the check box in the Delete
on termination column.
tStatCatcher Statistics
Select this check box to collect the log data at the component
level.
Usage
This component works standalone to launch an instance on Amazon EC2. You can use this
component to start the instance you need to deploy Jobs on.
Limitation
N/A
Related scenario
No scenario is available for this component yet.
tCloudStop
tCloudStop Properties
Component family
Cloud
Function
This component accesses the cloud provider to be used (Amazon EC2) and suspends, resumes
or terminates given instance(s).
Purpose
This component allows you to change the status of a launched instance on Amazon EC2
(Amazon Elastic Compute Cloud).
Basic settings
Enter or paste the access key and the secret key required by
Amazon to authenticate your requests to its web services. These
access credentials are generated from the Security Credential
view of your Amazon account page.
Cloud provider
Action
Predicate
Select the instance(s) of which you need to change the status. The
options are:
Running instances: status of all the running instances will be
changed.
Instances in a specific group: status of the instances of a
specific instance group will be changed. You need to enter the
name of that group in the Group name field.
Running instances in a specific group: status of the running
instances of a specific instance group will be changed. You
need to enter the name of that group in the Group name field.
Instance with predefined id: status of a given instance will
be changed. You need to enter the ID of that instance in the Id
field. You can find this ID on your Amazon EC2.
An instance group is composed of the instances using the same
instance name you have defined in the Instance name field of
tCloudStart.
Advanced settings
Group name
Enter the name of the group in which you want to change the
status of given instances. This field appears when you select
Instances in a specific group or Running instances in a specific
group from the Predicate list.
Id
tStatCatcher Statistics
Select this check box to collect the log data at the component
level.
Usage
This component works standalone to change the status of given instances on Amazon EC2. You
can use this component to suspend, resume or terminate the instance(s) you have deployed Jobs
on.
This component often works alongside tCloudStart to change the status of the instances
launched by the latter component.
Limitation
N/A
Related scenario
No scenario is available for this component yet.
tGSBucketCreate
tGSBucketCreate belongs to two component families: Big Data and Cloud. For more information on it, see
section tGSBucketCreate.
tGSBucketDelete
tGSBucketDelete belongs to two component families: Big Data and Cloud. For more information on it, see section
tGSBucketDelete.
tGSBucketExist
tGSBucketExist belongs to two component families: Big Data and Cloud. For more information on it, see section
tGSBucketExist.
tGSBucketList
tGSBucketList belongs to two component families: Big Data and Cloud. For more information on it, see section
tGSBucketList.
tGSClose
tGSClose belongs to two component families: Big Data and Cloud. For more information on it, see section
tGSClose.
tGSConnection
tGSConnection belongs to two component families: Big Data and Cloud. For more information on it, see section
tGSConnection.
tGSCopy
tGSCopy belongs to two component families: Big Data and Cloud. For more information on it, see section
tGSCopy.
tGSDelete
tGSDelete belongs to two component families: Big Data and Cloud. For more information on it, see section
tGSDelete.
tGSGet
tGSGet belongs to two component families: Big Data and Cloud. For more information on it, see section tGSGet.
tGSList
tGSList belongs to two component families: Big Data and Cloud. For more information on it, see section tGSList.
tGSPut
tGSPut belongs to two component families: Big Data and Cloud. For more information on it, see section tGSPut.
tMarketoInput
tMarketoInput belongs to two component families: Business and Cloud. For more information on it, see section
tMarketoInput.
tMarketoListOperation
tMarketoListOperation belongs to two component families: Business and Cloud. For more information on it,
see section tMarketoListOperation.
tMarketoOutput
tMarketoOutput belongs to two component families: Business and Cloud. For more information on it, see section
tMarketoOutput.
tS3BucketCreate
tS3BucketCreate properties
Component family
Cloud/AmazonS3
Function
Purpose
Basic settings
Select this check box and in the Component List click the relevant
connection component to reuse the connection details you already
defined.
Access Key
Access Secret
Bucket
Die on error
Config client
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component can be used alone or with other S3 components, e.g. tS3BucketExist.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For
details, see the section about external modules in the Talend Installation and Upgrade Guide.
Related scenario
For tS3BucketCreate related scenarios, see section Scenario: Verifying the absence of a bucket, creating it and
listing all the S3 buckets.
tS3BucketDelete
tS3BucketDelete properties
Component family
Cloud/AmazonS3
Function
Purpose
Basic settings
Select this check box and in the Component List click the relevant
connection component to reuse the connection details you already
defined.
Access Key
Access Secret
Bucket
Die on error
Config client
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component can be used alone or with other S3 components, e.g. tS3BucketList.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For
details, see the section about external modules in the Talend Installation and Upgrade Guide.
Related scenario
For tS3BucketDelete related scenarios, see section Scenario: Verifying the absence of a bucket, creating it and
listing all the S3 buckets.
tS3BucketExist
tS3BucketExist properties
Component family
Cloud/AmazonS3
Function
Purpose
Basic settings
Select this check box and in the Component List click the relevant
connection component to reuse the connection details you already
defined.
Access Key
Access Secret
Bucket
Die on error
Config client
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Global Variables
BUCKET_EXIST: indicates the existence of a specified bucket. This is a Flow variable and it
returns a boolean.
BUCKET_NAME: indicates the name of a specified bucket. This is an After variable and it returns
a string.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and
choose the variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable
means it functions after the execution of a component.
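For example, echoing the scenario below, a Run if connection leaving the component could carry the condition shown here, so that the rest of the Job runs only when the bucket is absent (the component name tS3BucketExist_1 is illustrative):
!((Boolean)globalMap.get("tS3BucketExist_1_BUCKET_EXIST"))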
Usage
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For
details, see the section about external modules in the Talend Installation and Upgrade Guide.
Scenario: Verifying the absence of a bucket, creating it and listing all the S3 buckets
2.
In the Access Key and Secret Key fields, enter the authentication credentials.
4.
Select the Use existing connection check box to reuse the connection.
7.
This way, the rest of the Job will be executed if the specified bucket does not exist.
8.
Select the Use existing connection check box to reuse the connection.
In the Bucket field, enter the bucket name to create.
9.
Select the Use existing connection check box to reuse the connection.
10. Double-click tIterateToFlow to open its Basic settings view.
Click the [+] button to add one column, namely bucket_list of the String type.
Click OK to validate the setup and close the schema editor.
12. In the Mapping area, press Ctrl + Space in the Value field to choose the variable
tS3BucketList_1_CURRENT_BUCKET_NAME.
13. Double-click tLogRow to open its Basic settings view.
Select Table (print values in cells of a table) for a better display of the results.
2.
As shown above, the bucket is created and all the buckets are listed.
tS3BucketList
tS3BucketList properties
Component family
Cloud/AmazonS3
Function
Purpose
Basic settings
Select this check box and in the Component List click the relevant
connection component to reuse the connection details you already
defined.
Access Key
Access Secret
Die on error
Config client
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Global Variables
CURRENT_BUCKET_NAME: indicates the current bucket name. This is a Flow variable and
it returns a string.
NB_BUCKET: indicates the number of buckets. This is an After variable and it returns an integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and
choose the variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable
means it functions after the execution of a component.
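As an illustration, in a tJava placed on the Iterate flow coming out of the component (the name tS3BucketList_1 is illustrative), each bucket name can be read back like this:
String bucket = (String)globalMap.get("tS3BucketList_1_CURRENT_BUCKET_NAME");
System.out.println("Current bucket: " + bucket);
// After the subjob completes, the After variable becomes available:
// Integer total = (Integer)globalMap.get("tS3BucketList_1_NB_BUCKET");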
Usage
This component can be used alone or with other S3 components, e.g. tS3BucketDelete.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For
details, see the section about external modules in the Talend Installation and Upgrade Guide.
Related scenario
For tS3BucketList related scenarios, see section Scenario: Verifying the absence of a bucket, creating it and listing
all the S3 buckets.
tS3Close
tS3Close properties
Component family
Cloud/AmazonS3
Function
Purpose
tS3Close is designed to close a connection to Amazon S3, thus releasing the network resources.
Basic settings
Component List
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
As an end component, this component is to be used along with other S3 components, e.g.
tS3Connection.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For
details, see the section about external modules in the Talend Installation and Upgrade Guide.
Related scenario
For tS3Close related scenarios, see section Scenario: Listing files with the same prefix from a bucket.
tS3Connection
tS3Connection properties
Component family
Cloud/AmazonS3
Function
Purpose
Basic settings
Access Key
Access Secret
Config client
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For
details, see the section about external modules in the Talend Installation and Upgrade Guide.
Related scenario
For tS3Connection related scenarios, see section Scenario: File exchanges with Amazon S3.
tS3Delete
tS3Delete properties
Component family
Cloud/AmazonS3
Function
Purpose
Basic settings
Select this check box and in the Component List click the relevant
connection component to reuse the connection details you already
defined.
Access Key
Access Secret
Bucket
Key
Die on error
Config client
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component can be used alone or with other S3 components, e.g. tS3BucketList.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For
details, see the section about external modules in the Talend Installation and Upgrade Guide.
Related scenario
For tS3Delete related scenarios, see section Scenario: Verifying the absence of a bucket, creating it and listing
all the S3 buckets.
tS3Get
tS3Get properties
Component family
Cloud/AmazonS3
Function
Purpose
Basic settings
Select this check box and in the Component List click the relevant
connection component to reuse the connection details you already
defined.
Access Key
Access Secret
Bucket
Key
File
Die on error
Config client
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component can be used alone or with other S3 components, e.g. tS3Connection.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For
details, see the section about external modules in the Talend Installation and Upgrade Guide.
Related scenario
For tS3Get related scenarios, see section Scenario: File exchanges with Amazon S3.
tS3List
tS3List properties
Component family
Cloud/AmazonS3
Function
Purpose
tS3List is designed to list the files on Amazon S3 based on the bucket/file prefix settings.
Basic settings
Select this check box and in the Component List click the relevant
connection component to reuse the connection details you already
defined.
Access Key
Access Secret
List all bucket objects
Select this check box to list all the files on the S3 server.
Key prefix: enter the prefix of files to be listed. This way, only files
with that prefix will be listed.
Bucket
Click the [+] button to add one or more lines for defining the buckets
and file prefixes.
Bucket name: name of the bucket whose files will be listed.
Key prefix: prefix of files to be listed.
Not available when List all bucket objects is selected.
Advanced settings
Die on error
Config client
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Global Variables
CURRENT_BUCKET: indicates the current bucket name. This is a Flow variable and it returns
a string.
CURRENT_KEY: indicates the current file name. This is a Flow variable and it returns a string.
NB_BUCKET: indicates the number of buckets. This is an After variable and it returns an integer.
NB_BUCKET_OBJECT: indicates the number of files in all the buckets. This is an After variable
and it returns an integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and
choose the variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable
means it functions after the execution of a component.
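For instance, on the Iterate flow of the component (the name tS3List_1 is illustrative), a tJava could print each listed file as bucket/key:
String bucket = (String)globalMap.get("tS3List_1_CURRENT_BUCKET");
String key = (String)globalMap.get("tS3List_1_CURRENT_KEY");
System.out.println(bucket + "/" + key);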
Usage
This component can be used alone or with other S3 components, e.g. tS3Delete.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For
details, see the section about external modules in the Talend Installation and Upgrade Guide.
For how to create a bucket and put files into it, see section Scenario: Verifying the absence of a bucket, creating it
and listing all the S3 buckets and section Scenario: File exchanges with Amazon S3.
Scenario: Listing files with the same prefix from a bucket
Drop tS3Connection, tS3List, tIterateToFlow, tLogRow and tS3Close onto the workspace.
2.
In the Access Key and Secret Key fields, enter the authentication credentials.
4.
Select the Use existing connection check box to reuse the connection.
5.
In the Bucket area, click the [+] button to add one line.
6.
In the Bucket name and Key prefix fields, enter the bucket name and file prefix.
This way, only files with the specified prefix will be listed.
8.
Click the [+] button to add one column, namely file_list of the String type.
Click OK to validate the setup and close the schema editor.
9.
In the Mapping area, press Ctrl + Space in the Value field to choose the variable tS3List_1_CURRENT_KEY.
10. Double-click tLogRow to open its Basic settings view.
Select Table (print values in cells of a table) for a better display of the results.
11. Double-click tS3Close to open its Basic settings view.
There is no need to select a connection component as the only one is selected by default.
2.
As shown above, only the files with the prefix "in" are listed.
tS3Put
tS3Put properties
Component family
Cloud/AmazonS3
Function
Purpose
Basic settings
Select this check box and in the Component List click the relevant
connection component to reuse the connection details you already
defined.
Access Key
Access Secret
Bucket
Key
File
Die on error
Config client
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component can be used alone or with other S3 components, e.g. tS3Connection.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For
details, see the section about external modules in the Talend Installation and Upgrade Guide.
2.
In the Access Key and Secret Key fields, enter the authentication credentials.
4.
Select the Use existing connection check box to reuse the connection.
6.
In the Key field, enter the name of the file to be saved on the S3 server.
9.
Select the Use existing connection check box to reuse the connection.
As shown above, the remote file is retrieved to the local disk, proof that the S3 Get action was performed
successfully.
tSalesforceBulkExec
tSalesforceBulkExec belongs to two component families: Business and Cloud. For more information on it, see
section tSalesforceBulkExec.
tSalesforceConnection
tSalesforceConnection belongs to two component families: Business and Cloud. For more information on it, see
section tSalesforceConnection.
tSalesforceGetDeleted
tSalesforceGetDeleted belongs to two component families: Business and Cloud. For more information on it, see
section tSalesforceGetDeleted.
tSalesforceGetServerTimestamp
tSalesforceGetServerTimestamp belongs to two component families: Business and Cloud. For more information
on it, see section tSalesforceGetServerTimestamp.
tSalesforceGetUpdated
tSalesforceGetUpdated belongs to two component families: Business and Cloud. For more information on it, see
section tSalesforceGetUpdated.
tSalesforceInput
tSalesforceInput belongs to two component families: Business and Cloud. For more information on it, see section
tSalesforceInput.
tSalesforceOutput
tSalesforceOutput belongs to two component families: Business and Cloud. For more information on it, see
section tSalesforceOutput.
tSalesforceOutputBulk
tSalesforceOutputBulk belongs to two component families: Business and Cloud. For more information on it, see
section tSalesforceOutputBulk.
tSalesforceOutputBulkExec
tSalesforceOutputBulkExec belongs to two component families: Business and Cloud. For more information on
it, see section tSalesforceOutputBulkExec.
tSugarCRMInput
tSugarCRMInput belongs to two component families: Business and Cloud. For more information on it, see
section tSugarCRMInput.
tSugarCRMOutput
tSugarCRMOutput belongs to two component families: Business and Cloud. For more information on it, see
section tSugarCRMOutput.
tGroovy
tGroovy properties
Component Family
Custom Code
Function
tGroovy allows you to enter customized code which you can integrate in a Talend program.
The code is run only once.
Purpose
tGroovy broadens the functionality of the Talend Job, using the Groovy language, which is a
simplified Java syntax.
Basic settings
Groovy Script
Variables
Advanced settings
tStatCatcher Statistics
Select this check box to collect the log data at component level.
Usage
This component can be used alone or as a subjob along with one other component.
Limitation
Related Scenarios
For a scenario using the Groovy code, see section Scenario: Calling a file which contains Groovy code.
For a functional example, see section Scenario: Printing out a variable content.
tGroovyFile
tGroovyFile properties
Component Family
Custom Code
Function
Purpose
tGroovyFile broadens the functionality of Talend Jobs using the Groovy language which is a
simplified Java syntax.
Basic settings
Groovy File
Variables
Advanced settings
tStatCatcher Statistics
Select this check box to collect the log data at component level.
Usage
This component can be used alone or as a sub-job along with another component.
Limitation
2.
In the Groovy File field, enter the path to the file containing the Groovy code, or browse to the file in your
directory.
4.
In the Name column, enter age, then in the Value column, enter 50, as in the screenshot.
Job execution
Press F6 to save and run the Job.
The Console displays the information contained in the input file, to which the variable result is added.
tJava
tJava properties
Component family
Custom Code
Function
tJava enables you to enter personalized code in order to integrate it in a Talend program. You
can execute this code only once.
Purpose
tJava makes it possible to extend the functionalities of a Talend Job through using Java
commands.
Basic settings
Code
Type in the Java code you want to execute according to the task
you need to perform. For further information about Java functions
syntax specific to Talend, see Talend Studio Help Contents (Help
> Developer Guide > API Reference).
For a complete Java reference, check http://docs.oracle.com/
javaee/6/api/
Advanced settings
Import
tStatCatcher Statistics
Usage
Limitation
Select and drop the following components from the Palette onto the design workspace: tFileInputDelimited,
tFileOutputExcel, tJava.
2.
Connect the tFileInputDelimited to the tFileOutputExcel using a Row Main connection. The content
from a delimited txt file will be passed on through the connection to an xls-type of file without further
transformation.
3.
Then connect the tFileInputDelimited component to the tJava component using a Trigger > On Subjob
Ok link. This link sets a sequence ordering tJava to be executed at the end of the main process.
2.
Define the path to the input file in the File name field.
The input file used in this example is a simple text file made of two columns: Names and their respective
Emails.
3.
Click the Edit Schema button, and set the two-column schema. Then click OK to close the dialog box.
4.
When prompted, click OK to accept the propagation, so that the tFileOutputExcel component gets
automatically set with the input schema.
In this example, the Sheet name is Email and the Include Header box is selected.
Then select the tJava component to set the Java command to execute.
2.
In this use case, we use the NB_Line variable. To access the global variable list, press Ctrl + Space bar on
your keyboard and select the relevant global parameter.
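For instance, assuming the input component is named tFileInputDelimited_1 (the name depends on your own Job), the Code field could contain:
System.out.println("Number of lines processed: "
+ ((Integer)globalMap.get("tFileInputDelimited_1_NB_LINE")));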
Job execution
Save your Job and press F6 to execute it.
The content gets passed on to the Excel file defined and the number of lines processed is displayed on the Run
console.
tJavaFlex
tJavaFlex properties
Component family
Custom Code
Function
tJavaFlex enables you to enter personalized code in order to integrate it in a Talend program.
With tJavaFlex, you can enter the three Java-code parts (start, main and end) that constitute a
kind of component dedicated to performing a desired operation.
Objective
tJavaFlex lets you add Java code to the Start/Main/End code sections of this component itself.
Basic settings
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Click Sync columns to retrieve the schema from the previous
component in the Job.
Built-in: The schema is created and stored locally for this
component only. Related topic: see Talend Studio User Guide.
Advanced settings
Start code
Enter the Java code that will be called during the initialization
phase.
Main code
Enter the Java code to be applied for each line in the data flow.
End code
Enter the Java code that will be called during the closing phase.
Import
tStatCatcher Statistics
Usage
You can use this component as a start, intermediate or output component. You can as well use
it as a one-component subjob.
Limitation
Drop tJavaFlex and tLogRow from the Palette onto the design workspace.
2.
Double-click tJavaFlex to display its Basic settings view and define its properties.
2.
Click the three-dot button next to Edit schema to open the corresponding dialog box where you can define
the data structure to pass to the component that follows.
3.
Click the [+] button to add two columns: key and value and then set their types to Integer and String
respectively.
5.
In the Basic settings view of tJavaFlex, select the Data Auto Propagate check box to automatically
propagate data to the component that follows.
In this example, we do not want to do any transformation on the retrieved data.
6.
In the Start code field, enter the code to be executed in the initialization phase.
In this example, the code indicates the initialization of tJavaFlex by displaying the START message and sets
up the loop and the variables to be used afterwards in the Java code:
System.out.println("## START\n#");
String [] valueArray = {"Miss", "Mrs", "Mr"};
for (int i=0;i<valueArray.length;i++) {
7.
In the Main code field, enter the code you want to apply on each of the data rows.
In this example, we want to display each key with its value:
row1.key = i;
row1.value = valueArray[i];
In the Main code field, "row1" corresponds to the name of the link that comes out of tJavaFlex. If you rename this
link, you have to modify the code of this field accordingly.
8.
In the End code field, enter the code that will be executed in the closing phase.
In this example, the brace (curly bracket) closes the loop and the code indicates the end of the execution of
tJavaFlex by displaying the END message:
}
System.out.println("#\n## END");
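Taken together, the three sections conceptually nest as shown below; the Main code runs once per generated row, between the Start and End code:
System.out.println("## START\n#"); // Start code, executed once
String [] valueArray = {"Miss", "Mrs", "Mr"};
for (int i=0;i<valueArray.length;i++) {
row1.key = i; // Main code, executed for each row
row1.value = valueArray[i];
} // End code closes the loop opened in the Start code
System.out.println("#\n## END");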
9.
If needed, double-click tLogRow and in its Basic settings view, click the [...] button next to Edit schema
to make sure that the schema has been correctly propagated.
The three personal titles are displayed on the console along with their corresponding keys.
Drop tRowGenerator and tJavaFlex from the Palette onto the design workspace.
2.
Double-click tRowGenerator to display its Basic settings view and the [RowGenerator Editor] dialog box
where you can define the component properties.
2.
Click the plus button to add four columns: number, txt, date and flag.
3.
Define the schema and set the parameters of the four columns according to the above screenshot.
4.
In the Functions column, select the three-dot function [...] for each of the defined columns.
5.
In the Parameters column, enter 10 different parameters for each of the defined columns. These 10
parameters correspond to the data that will be randomly generated when executing tRowGenerator.
6.
Double-click tJavaFlex to display its Basic settings view and define the component's properties.
2.
Click Sync columns to retrieve the schema from the preceding component.
3.
In the Start code field, enter the code to be executed in the initialization phase.
In this example, the code indicates the initialization of the tJavaFlex component by displaying the START
message and defining the variable to be used afterwards in the Java code:
System.out.println("## START\n#");
int i = 0;
4.
In the Main code field, enter the code to be applied on each line of data.
In this example, we want to show the number of each line starting from 0 and then the number and the random
text transformed to upper case and finally the random date set in the editor of tRowGenerator. Then, we
create a condition to show if the status is true or false and we increment the number of the line:
System.out.print(" row" + i + ":");
System.out.print("# number:" + row1.number);
System.out.print (" | txt:" + row1.txt.toUpperCase());
System.out.print(" | date:" + row1.date);
if(row1.flag) System.out.println(" | flag: true");
else System.out.println(" | flag: false");
i++;
In the Main code field, "row1" corresponds to the name of the link that connects to tJavaFlex. If you rename this
link, you have to modify the code.
5.
In the End code field, enter the code that will be executed in the closing phase.
In this example, the code indicates the end of the execution of tJavaFlex by displaying the END message:
System.out.println("#\n## END");
2.
The console displays the randomly generated data that was modified by the Java code set through
tJavaFlex.
tJavaRow
tJavaRow
tJavaRow properties
Component Family
Custom Code
Function
tJavaRow allows you to enter custom code that you can integrate into a Talend program.
With tJavaRow, you can enter the Java code to be applied to each row of the flow.
Purpose
tJavaRow allows you to broaden the functionality of Talend Jobs, using the Java language.
Basic settings
Advanced settings
Global Variables
Code
Enter the Java code to be applied to each line of the data flow.
Import
tStatCatcher Statistics
Select this check box to collect the log data at the component level.
NB_LINE: Indicates the number of rows read by an input component or transferred to an output
component. This is an After variable and it returns an integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and
choose the variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable
means it functions after the execution of a component.
To enter a global variable (for example COUNT of tFileRowCount) in the Code
box, you need to type in the entire piece of code manually, that is to say
((Integer)globalMap.get("tFileRowCount_COUNT")).
Usage
This component is used as an intermediary between two other components. It must be linked to
both an input and an output component.
Limitation
Drop a tFileInputDelimited component and a tJavaRow component from the Palette onto the design
workspace, and label them to better identify their roles in the Job.
2.
Double-click the tFileInputDelimited component to display its Basic settings view in the Component tab.
2.
In the File name/Stream field, type in the path to the input file in double quotation marks, or browse to the
path by clicking the [...] button, and define the first line of the file as the header.
In this example, the input file has the following content:
City;Population;LandArea;PopDensity
Beijing;10233000;1418;7620
Moscow;10452000;1081;9644
Seoul;10422000;605;17215
Tokyo;8731000;617;14151
New York;8310000;789;10452
3.
Click the [...] button next to Edit schema to open the [Schema] dialog box, and define the data structure of
the input file. Then, click OK to validate the schema setting and close the dialog box.
4.
Double-click the tJavaRow component to display its Basic settings view in the Component tab.
5.
Click Sync columns to make sure that the schema is correctly retrieved from the preceding component.
6.
In the Code field, enter the code to be applied on each line of data based on the defined schema columns.
In this example, we want to transform the city names to upper case, group digits of numbers larger than 1000
using the thousands separator for ease of reading, and print the data on the console:
System.out.print("\n" + row1.City.toUpperCase() + ":");
System.out.print("\n - Population: "
+ FormatterUtils.format_Number(String.valueOf(row1.Population), ',', '.')
+ " people");
System.out.print("\n - Land area: "
+ FormatterUtils.format_Number(String.valueOf(row1.LandArea), ',', '.')
+ " km2");
System.out.print("\n - Population density: "
+ FormatterUtils.format_Number(String.valueOf(row1.PopDensity), ',', '.')
+ " people/km2\n");
In the Code field, "row1" refers to the name of the link that connects to tJavaRow. If you rename the link, you have
to modify the code.
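FormatterUtils.format_Number is a routine provided by Talend. Outside the Studio, a similar digit-grouping result could be obtained with java.text.DecimalFormat, as in the hedged standalone sketch below (an approximation for illustration, not the routine's actual implementation):

import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;

public class GroupDigitsSketch {
    // Groups digits with a thousands separator, roughly what the scenario expects
    // from FormatterUtils.format_Number(value, thousandsSep, decimalSep).
    static String group(long value, char thousandsSep, char decimalSep) {
        DecimalFormatSymbols symbols = new DecimalFormatSymbols();
        symbols.setGroupingSeparator(thousandsSep);
        symbols.setDecimalSeparator(decimalSep);
        return new DecimalFormat("#,##0", symbols).format(value);
    }

    public static void main(String[] args) {
        System.out.println("BEIJING:");
        System.out.println(" - Population: " + group(10233000L, ',', '.') + " people");
    }
}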
2.
The city information is transformed by the Java code set through tJavaRow and displayed on the console.
tLibraryLoad
tLibraryLoad
tLibraryLoad properties
Component family
Custom Code
Function
Purpose
Basic settings
Library
Select the library you want to import from the list, or click on the
[...] button to browse to the library in your directory.
Advanced settings
Dynamic Libs
Lib Paths: Enter the access path to your library, between double
quotation marks.
Import
tStatCatcher Statistics
Select this check box to collect the log data at component level.
Usage
This component may be used alone, although it is more logical to use it as part of a Job.
If you have subscribed to one of the Talend solutions with Big Data, you can also use this
component as a Map/Reduce component. In a Talend Map/Reduce Job, this component is used
standalone. It generates native Map/Reduce code that can be executed directly in Hadoop.
You need to use the Hadoop Configuration tab in the Run view to define the connection to a
given Hadoop distribution for the whole Job.
This connection is effective on a per-Job basis.
For further information about a Talend Map/Reduce Job, see the sections describing how to
create, convert and configure a Talend Map/Reduce Job of the Talend Open Studio for Big Data
Getting Started Guide.
Note that in this documentation, unless otherwise explicitly stated, a scenario presents only
Standard Jobs, that is to say traditional Talend data integration Jobs, and non Map/Reduce Jobs.
Limitation
n/a
In the Palette, open the Custom_Code folder, and slide a tLibraryLoad and tJava component onto the
workspace.
2.
Double-click on tLibraryLoad to display its Basic settings. From the Library list, select jakartaoro-2.0.8.jar.
2.
In the Import field of the Advanced settings tab, type import org.apache.oro.text.regex.*;
2.
In the Basic settings tab, enter your code, as in the screenshot below. The code allows you to check whether
the character string pertains to an e-mail address, based on the regular expression: "^[\\w_.-]+@[\\w_.-]+
\\.[\\w]+$".
Job execution
Press F6 to save and run the Job.
The Console displays the boolean false. Hence, the e-mail address is not valid as the format is incorrect.
tSetGlobalVar
tSetGlobalVar
tSetGlobalVar properties
Component family
Custom Code
Function
Purpose
Basic settings
Variables
Advanced settings
tStatCatcher Statistics
Usage
Limitation
Drop the following components from the Palette onto the design workspace: tSetGlobalVar and tJava.
2.
Connect the tSetGlobalVar component to the tJava component using a Trigger > OnSubjobOk connection.
2.
Click the plus button to add a line in the Variables table, and fill the Key and Value fields with K1 and
20 respectively.
3.
Then double-click the tJava component to display its Basic settings view.
4.
In this use case, we use the Result variable. To access the global variable list, press Ctrl + Space bar on your
keyboard and select the relevant global parameter.
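For illustration, the tJava Code field could then contain a single line such as the one below (a sketch; the cast to String assumes the Value field was entered as the quoted literal "20"):

// Sketch of the tJava Code field: print the global variable K1 set by tSetGlobalVar.
// globalMap is provided by the generated Job code; the cast assumes the value was stored as a String.
System.out.println((String) globalMap.get("K1"));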
Job execution
Save your Job and press F6 to execute it.
The content of global variable K1 is displayed on the console.
tAddCRCRow
tAddCRCRow
tAddCRCRow properties
Component family
Data Quality
Function
tAddCRCRow calculates a surrogate key based on one or several columns and adds it to the
defined schema.
Purpose
Basic settings
Implication: Select the check box facing the relevant columns to be used for the surrogate key checksum.
CRC type: Select a CRC type in the list. The longer the CRC, the less overlap you will have.
Advanced Settings
tStatCatcher Statistics: Select this check box to collect log data at the component level.
Usage
Limitation
Due to license incompatibility, one or more JARs required to use this component are not
provided. You can easily find out and add such JARs in the Integration perspective of your
studio. For details, see the section about external modules in the Talend Installation and Upgrade
Guide.
2.
1.
In the tFileInputDelimited Component view, set the File Name path and all related properties in case these
are not stored in the Repository.
2.
Create the schema through the Edit Schema button. Remember to set the data type of each column. For more
information on the Date pattern to be filled in, visit http://docs.oracle.com/javase/6/docs/api/index.html.
In the tAddCRCRow Component view, select the check boxes of the input flow columns to be used to
calculate the CRC.
Notice that a CRC column (read-only) has been added at the end of the schema.
2.
3.
In the Basic settings view of tLogRow, select the Print values in cells of a table option to display the output
data in a table on the Console.
Job execution
Then save your Job and press F6 to execute it.
An additional CRC Column has been added to the schema calculated on all previously selected columns (in this
case all columns of the schema).
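The exact algorithm behind each CRC type is internal to the component. As an illustration of the idea only, a CRC-32 checksum over the concatenated values of the selected columns can be computed with the standard java.util.zip.CRC32 class (a sketch, not the component's implementation):

import java.util.zip.CRC32;

public class CrcSketch {
    // Illustration only: compute a CRC-32 over the concatenated values of the
    // columns selected for the checksum, as tAddCRCRow conceptually does.
    static long crcOfRow(String... columns) {
        CRC32 crc = new CRC32();
        for (String column : columns) {
            crc.update(column.getBytes());
        }
        return crc.getValue();
    }

    public static void main(String[] args) {
        System.out.println(crcOfRow("John", "Smith", "1980-02-11"));
    }
}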
tChangeFileEncoding
tChangeFileEncoding
The tChangeFileEncoding component belongs to two component families: Data Quality and File. For more
information about tChangeFileEncoding, see section tChangeFileEncoding.
tExtractRegexFields
tExtractRegexFields
tExtractRegexFields belongs to two component families: Data Quality and Processing. For more information on
tExtractRegexFields, see section tExtractRegexFields.
tFuzzyMatch
tFuzzyMatch
tFuzzyMatch properties
Component family
Data Quality
Function
Compares a column from the main flow with a reference column from the lookup flow and
outputs the main flow data along with the computed distance.
Purpose
Helps ensure the data quality of any source data against a reference data source.
Basic settings
Matching type
Min distance
Max distance
Matching column: Select the column of the main flow that needs to be checked against the reference (lookup) key column.
Unique matching: Select this check box if you want to get the best match possible, in case several matches are available.
Usage
This component is not startable (green background) and it requires two input components and an output component.
Drag and drop the following components from the Palette to the design workspace: tFileInputDelimited
(x2), tFuzzyMatch, tLogRow.
2.
Link the first tFileInputDelimited component to the tFuzzyMatch component using a Row > Main
connection.
3.
Link the second tFileInputDelimited component to the tFuzzyMatch using a Row > Main connection
(which appears as a Lookup row on the design workspace).
4.
Link the tFuzzyMatch component to the standard output tLogRow using a Row > Main connection.
Define the first tFileInputDelimited in its Basic settings view. Browse the system to the input file to be
analyzed.
2.
Define the schema of the component. In this example, the input schema has two columns, firstname and
gender.
3.
4.
Double-click the tFuzzyMatch component to open its Basic settings view, and check its schema.
The Schema should match the Main input flow schema in order for the main flow to be checked against
the reference.
Note that two columns, Value and Matching, are added to the output schema. These are standard matching
information and are read-only.
5.
Select the method to be used to check the incoming data. In this scenario, Levenshtein is the Matching type
to be used.
6.
Then set the distance. In this method, the distance is the number of char changes (insertion, deletion or
substitution) that needs to be carried out in order for the entry to fully match the reference.
In this use case, we set both the minimum distance and the maximum distance to 0. This means only the
exact matches will be output.
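The Levenshtein distance used here is the classic edit distance: the number of single-character insertions, deletions and substitutions needed to turn one string into another. The standalone sketch below of the dynamic-programming computation is an illustration only, not the component's code:

public class LevenshteinSketch {
    // Classic dynamic-programming edit distance: number of single-character
    // insertions, deletions and substitutions needed to turn a into b.
    static int distance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + cost);
            }
        }
        return d[a.length()][b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("Brandon", "Brendan")); // 2: two substitutions
    }
}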
7.
8.
Check that the matching column and lookup column are correctly selected.
9.
As the edit distance has been set to 0 (min and max), the output shows the result of a regular join between the main
flow and the lookup (reference) flow, hence only full matches with Value of 0 are displayed.
A more obvious example is with a minimum distance of 1 and a maximum distance of 2, see section Scenario 2:
Levenshtein distance of 1 or 2 in first names
In the Component view of the tFuzzyMatch, change the minimum distance from 0 to 1. This excludes
straight away the exact matches (which would show a distance of 0).
2.
Change also the maximum distance to 2. The output will provide all matching entries showing a discrepancy
of 2 characters at most.
Make sure the Matching item separator is defined, as several references might match the main flow
entry.
4.
As the edit distance has been set to 2, some entries of the main flow match more than one reference entry.
You can also use another method, the metaphone, to assess the distance between the main flow and the reference,
which will be described in the next scenario.
Change the Matching type to Metaphone. There is no minimum or maximum distance to set, as this
matching method is based on the phonetic discrepancies with the reference.
2.
Save the Job and press F6. The phonetics value is displayed along with the possible matches.
tIntervalMatch
tIntervalMatch
tIntervalMatch properties
Component family
Data Quality
Function
tIntervalMatch receives a main flow and aggregates it based on join to a lookup flow. Then it
matches a specified value to a range of values and returns related information.
Purpose
Basic settings
Search Column
Column (LOOKUP)
Lookup Column (min): Select the column containing the minimum value of the range.
Include the bound (min): Select the check box to include the minimum value of the range in the match.
Lookup Column (max): Select the column containing the maximum value of the range.
Include the bound (max): Select the check box to include the maximum value of the range in the match.
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Usage
This component handles a flow of data; it therefore requires an input and an output and is defined as
an intermediary step.
Limitation
n/a
2.
Double-click the first tFileInputDelimited component to open its Basic settings view.
2.
Browse to the file to be used as the main input, which provides a list of servers and their IP addresses:
Server;IP
Server1;057.010.010.010
Server2;001.010.010.100
Server3;057.030.030.030
Server4;053.010.010.100
3.
Click the [...] button next to Edit schema to open the [Schema] dialog box and define the input schema.
According to the input file structure, the schema is made of two columns, respectively Server and IP, both
of type String. Then click OK to close the dialog box.
4.
Define the number of header rows to be skipped, and keep the other settings as they are.
5.
The file to be used as the input to the lookup flow in this example lists some IP address ranges and the
corresponding countries:
StartIP;EndIP;Country
001.000.000.000;001.255.255.255;USA
002.006.190.056;002.006.190.063;UK
011.000.000.000;011.255.255.255;USA
057.000.000.000;057.255.255.255;France
012.063.178.060;012.063.178.063;Canada
053.000.000.000;053.255.255.255;Germany
Accordingly, the schema of the lookup flow should have the following structure:
6.
7.
From the Search Column list, select the main flow column containing the values to be matched with the
range values. In this example, we want to match the servers' IP addresses with the range values from the
lookup flow.
8.
From the Column (LOOKUP) list, select the lookup column that holds the values to be returned. In this
example, we want to get the names of countries where the servers are hosted.
9.
Set the min and max lookup columns corresponding to the range bounds defined in the lookup schema, StartIP
and EndIP respectively in this example.
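Conceptually, the component performs an interval lookup: for each value of the main flow, it returns the requested lookup column when the value falls between the min and max bounds (inclusion depending on the Include the bound check boxes). The sketch below illustrates that logic on the sample data; it is an illustration only, and plain string comparison works here only because the IP addresses are zero-padded:

public class IntervalMatchSketch {
    // Returns the country whose [startIP, endIP] range contains the given IP,
    // with both bounds included, or null when no range matches.
    static String lookup(String ip, String[][] ranges) {
        for (String[] range : ranges) {                    // range = {startIP, endIP, country}
            if (ip.compareTo(range[0]) >= 0 && ip.compareTo(range[1]) <= 0) {
                return range[2];
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String[][] ranges = {
            {"001.000.000.000", "001.255.255.255", "USA"},
            {"057.000.000.000", "057.255.255.255", "France"},
            {"053.000.000.000", "053.255.255.255", "Germany"},
        };
        System.out.println(lookup("057.010.010.010", ranges)); // France
    }
}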
tReplaceList
tReplaceList
tReplaceList Properties
Component family
Data Quality
Function
Carries out a search-and-replace operation in the defined input columns, based on an external
lookup.
Purpose
Basic settings
Lookup replacement column: Select the column where the replacement values are stored.
Column options
Advanced settings
tStatCatcher Statistics: Select this check box to collect log data at the component level.
Usage
The following Job searches and replaces a list of states with their corresponding two-letter codes. The relevant
codes are taken from a reference file placed as lookup flow in the Job.
Drop the following components from the Palette onto the design workspace: two tFileInputDelimited
components, a tReplaceList and a tLogRow.
2.
Connect the two tFileInputDelimited components to the tReplaceList component using Row > Main
connections. Note that the link between the reference input component (the second tFileInputDelimited)
and the tReplaceList component appears as a lookup row.
3.
Connect the tReplaceList component to the tLogRow component using a Row > Main connection.
Double-click the first tFileInputDelimited component to open its Basic settings view and set the parameters
of the main input flow, including the path and name of the file to read and the number of header rows to skip.
In this example, the main input file provides a list of people names and US state names. The following shows
an extract of the file content:
name;state
Andrew Kennedy;Mississippi
Benjamin Carter;Louisiana
2.
Click the [...] button next to Edit schema to open the [Schema] dialog box and set the input schema.
According to the structure of the main input file, the input schema should contain two columns: name and
state.
When done, click OK to close the dialog box and propagate the changes to the next component.
3.
In this example, the reference input file provides a list of states and their two-letter codes. Accordingly, the
reference input schema should have two columns: state and code.
4.
Double-click the tReplaceList component to open its Basic settings view to set the operation to carry out.
5.
From the Lookup search column list, select the column to be searched. In this use case, we want to carry
out a search on the state column.
6.
From the Lookup replacement column list, select the column containing the replacement values, code for
the two-letter state codes in this example.
7.
In the Column options table, select the Replace check box for the state column, to replace the state names with
their corresponding codes.
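The operation configured here is essentially a dictionary replacement: each value of the searched column is looked up in the reference flow and swapped for its replacement value. A minimal sketch with a java.util.Map, for illustration only (the two codes shown are the standard two-letter codes for these states):

import java.util.HashMap;
import java.util.Map;

public class ReplaceListSketch {
    public static void main(String[] args) {
        // Reference flow: state name -> two-letter code (extract of the lookup file).
        Map<String, String> codes = new HashMap<>();
        codes.put("Mississippi", "MS");
        codes.put("Louisiana", "LA");

        // Main flow: replace the state column with its code when a match exists.
        String[][] rows = {{"Andrew Kennedy", "Mississippi"}, {"Benjamin Carter", "Louisiana"}};
        for (String[] row : rows) {
            row[1] = codes.getOrDefault(row[1], row[1]);
            System.out.println(row[0] + ";" + row[1]);
        }
    }
}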
8.
In the tLogRow component, select the Table check box for a better readability of the output.
The state names have been replaced with their respective two-letter codes.
tSchemaComplianceCheck
tSchemaComplianceCheck
tSchemaComplianceCheck Properties
Component family
Data Quality
Function
Validates all input rows against a reference schema, or checks the type, nullability and length of the rows against
reference values. The validation can be carried out in full or in part.
Purpose
Helps to ensure the data quality of any source data against a reference data source.
Basic settings
Base Schema and Edit schema: A schema is a row description, i.e. it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Describe the structure and nature of your data to be processed as it is.
Built-in: The schema will be created and stored locally for this component only. Related topic: see Talend Studio User Guide.
Check all columns from schema: Select this option to carry out all checks on all columns against the base schema.
Custom defined
Checked Columns: In this table, define what checks are to be carried out on which columns.
Column: Displays the column names.
Type: Select the type of data each column is supposed to contain. This validation is mandatory for all columns.
Date pattern: Define the expected date format for each column with the data type of Date.
Nullable: Select the check box in an individual column to define the column to be nullable, that is, to allow empty rows in this column to go to the output flow regardless of the base schema definition. To define all columns to be nullable, select the check box in the table header.
Undefined or empty: Select the check box in an individual column to reject empty rows in this column while the column is not nullable in the base schema definition. To carry out this verification on all the columns, select the check box in the table header.
Max length: Select the check box in an individual column to verify the data length of the column against the length definition of the base schema. To carry out this verification on all the columns, select the check box in the table header.
Use another schema for compliance check: Define a reference schema as you expect the data to be, in order to reject the non-compliant data. It can be restrictive on data type, null values, and/or length.
Trim the excess content of column when length checking chosen and the length is greater than defined length: With any of the three modes of tSchemaComplianceCheck, select this check box to truncate the data that exceeds the specified length rather than reject it.
This option is applicable only on data of String type.
Advanced settings
Select this check box to perform a fast date format check using the TalendDate.isDate() method of the TalendDate system routine if Date pattern is not defined. For more information about routines, see Talend Studio User Guide.
Ignore TimeZone when Check Date: Select this check box to ignore the time zone setup upon date check. Not available when the Check all columns from schema mode is selected.
Treat all empty string as NULL: Select this check box to treat any empty fields in any columns as null values, instead of empty strings. By default, this check box is selected. When it is cleared, the Choose Column(s) table shows to let you select individual columns.
tStatCatcher Statistics: Select this check box to collect log data at the component level.
Usage
This component is an intermediary step in the flow, allowing you to exclude non-compliant data from the main
flow. It cannot be a start component as it requires an input flow. It also requires at least one output component
to gather the validated flow, and possibly a second output component for rejected data, using a Rejects link.
For more information, see Talend Studio User Guide.
Usage in Map/Reduce Jobs: If you have subscribed to one of the Talend solutions with Big Data, you can also use this component
as a Map/Reduce component. In a Talend Map/Reduce Job, this component is used as an intermediate
step and other components used along with it must be Map/Reduce components, too. They generate
native Map/Reduce code that can be executed directly in Hadoop.
It does not support data of the Object and the List types.
For further information about a Talend Map/Reduce Job, see the sections describing how to create,
convert and configure a Talend Map/Reduce Job of the Talend Open Studio for Big Data Getting
Started Guide.
Note that in this documentation, unless otherwise explicitly stated, a scenario presents only Standard
Jobs, that is to say traditional Talend data integration Jobs, and non Map/Reduce Jobs.
2.
3.
Connect the tSchemaComplianceCheck component to the first tLogRow component using a Row > Main
connection. This output flow will gather the valid data.
4.
Connect the tSchemaComplianceCheck component to the second tLogRow component using a Row >
Rejects connection. This second output flow will gather the non-compliant data. It passes two additional
columns to the next component: ErrorCode and ErrorMessage. These two read-only columns provide
information about the rejected data to ease error handling and troubleshooting if needed.
Double-click the tFileInputDelimited component to display its Basic settings view and define the basic
parameters including the input file name and the number of header rows to skip.
2.
Click the [...] button next to Edit schema to describe the data structure of the input file. In this use case, the
schema is made of five columns: ID, Name, BirthDate, State, and City.
3.
Fill the Length field for the Name, State and City columns with 7, 10 and 10 respectively. Then click OK
to close the schema dialog box and propagate the schema.
4.
Double-click the tSchemaComplianceCheck component to display its Basic settings view, wherein you
will define most of the validation parameters.
5.
Select the Custom defined option in the Mode area to perform custom defined checks.
In this example, we use the Checked columns table to set the validation parameters. However, you can also
select the Check all columns from schema check box if you want to perform all the checks (type, nullability
and length) on all the columns against the base schema, or select the Use another schema for compliance
check option and define a new schema as the expected structure of the data.
6.
In the Checked Columns table, define the checks to be performed. In this use case:
- The type of the ID column should be Int.
- The length of the Name, State and City columns should be checked.
- The type of the BirthDate column should be Date, and the expected date pattern is dd-MM-yyyy.
- All the columns should be checked for null values, so clear the Nullable check box for all the columns.
To send rows containing fields exceeding the defined maximum length to the reject flow, make sure that the Trim
the excess content of column when length checking chosen and the length is greater than defined length check
box is cleared.
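As an illustration of the kind of checks configured in this step, the standalone sketch below validates one value against the dd-MM-yyyy date pattern and one value against a maximum length; it is a sketch of the idea only, not the code generated by the component:

import java.text.ParseException;
import java.text.SimpleDateFormat;

public class ComplianceSketch {
    // Returns true when the value parses strictly as dd-MM-yyyy.
    static boolean isValidDate(String value) {
        SimpleDateFormat format = new SimpleDateFormat("dd-MM-yyyy");
        format.setLenient(false);                       // reject impossible dates such as 32-01-2000
        try {
            format.parse(value);
            return true;
        } catch (ParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidDate("11-02-1980"));  // true
        System.out.println(isValidDate("1980/02/11"));  // false -> such a row would go to the reject flow
        String name = "Montgomery";
        System.out.println(name.length() <= 7);         // false -> the length check on Name fails
    }
}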
7.
In the Advanced settings view of the tSchemaComplianceCheck component, select the Treat all empty
string as NULL option to send any rows containing empty fields to the reject flow.
8.
To view the validation result in tables on the Run console, double-click each tLogRow component and select
the Table option in the Basic settings view.
tUniqRow
tUniqRow
tUniqRow Properties
Component family
Data Quality
Function
Compares entries and sorts out duplicate entries from the input flow.
Purpose
Basic settings
Unique key
Advanced settings
Only once each duplicated key: Select this check box if you want to have only the first duplicated entry in the column(s) defined as key(s) sent to the output flow for duplicates.
Use of disk (suitable for processing large row set): Select this check box to enable generating temporary files on the hard disk when processing a large amount of data. This helps to prevent Job execution failure caused by memory overflow. Not available for Map/Reduce Jobs. With this check box selected, you need also to define:
- Buffer size in memory: Select the number of rows that can be buffered in the memory before a temporary file is to be generated on the hard disk.
- Directory for temp files: Set the location where the temporary files should be stored.
Make sure that you specify an existing directory for temporary files; otherwise your Job execution will fail.
Ignore trailing zeros for BigDecimal: Select this check box to ignore trailing zeros for BigDecimal data.
tStatCatcher Statistics: Select this check box to gather the Job processing metadata at the Job level as well as at each component level.
Global Variables
NB_UNIQUES: indicates the number of unique rows. This is an After variable and it returns an
integer.
NB_DUPLICATES: indicates the number of duplicate rows. This is an After variable and it returns
an integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and
choose the variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable
means it functions after the execution of a component.
Usage
This component handles a flow of data; it therefore requires an input and an output and is defined as an
intermediary step.
If you have subscribed to one of the Talend solutions with Big Data, you can also use this component
as a Map/Reduce component. In a Talend Map/Reduce Job, this component is used as an intermediate
step and other components used along with it must be Map/Reduce components, too. They generate
native Map/Reduce code that can be executed directly in Hadoop.
For further information about a Talend Map/Reduce Job, see the sections describing how to create,
convert and configure a Talend Map/Reduce Job of the Talend Open Studio for Big Data Getting
Started Guide.
For a scenario demonstrating a Map/Reduce Job using this component, see section Scenario 2:
Deduplicating entries using Map/Reduce components.
Note that in this documentation, unless otherwise explicitly stated, a scenario presents only Standard
Jobs, that is to say traditional Talend data integration Jobs, and non Map/Reduce Jobs.
Limitation
n/a
Drop a tFileInputDelimited, a tSortRow, a tUniqRow, and two tLogRow components from the Palette to
the design workspace, and name the components as shown above.
2.
Connect the tFileInputDelimited component, the tSortRow component, and the tUniqRow component
using Row > Main connections.
3.
Connect the tUniqRow component and the first tLogRow component using a Main > Uniques connection.
4.
Connect the tUniqRow component and the second tLogRow component using a Main > Duplicates
connection.
2.
Click the [...] button next to the File Name field to browse to your input file.
3.
Define the header and footer rows. In this use case, the first row of the input file is the header row.
4.
Click Edit schema to define the schema for this component. In this use case, the input file has five columns:
Id, FirstName, LastName, Age, and City. Then click OK to propagate the schema and close the schema editor.
5.
6.
To rearrange the entries in the alphabetic order of the names, add two rows in the Criteria table by clicking
the plus button, select the FirstName and LastName columns under Schema column, select alpha as the
sorting type, and select the sorting order.
7.
8.
In the Unique key area, select the columns on which you want deduplication to be carried out. In this use
case, you will sort out duplicated names.
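Conceptually, the component keeps the first occurrence of each key and routes later occurrences to the Duplicates flow. The sketch below illustrates this with a java.util.HashSet keyed on FirstName and LastName over a few of the sample rows; it is an illustration only, not the component's code:

import java.util.HashSet;
import java.util.Set;

public class UniqRowSketch {
    public static void main(String[] args) {
        String[][] rows = {
            {"1", "Harry", "Ford", "68", "Albany"},
            {"4", "Harry", "Ford", "48", "Olympia"},
            {"2", "Franklin", "Wilson", "79", "Juneau"},
        };
        Set<String> seen = new HashSet<>();
        for (String[] row : rows) {
            String key = row[1] + "|" + row[2];          // FirstName + LastName, the unique key
            if (seen.add(key)) {
                System.out.println("UNIQUES:    " + String.join(";", row));
            } else {
                System.out.println("DUPLICATES: " + String.join(";", row));
            }
        }
    }
}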
9.
In the Basic settings view of each of the tLogRow components, select the Table option to view the Job
execution result in table mode.
2.
Run the Job by pressing F6 or clicking the Run button on the Run tab.
The unique names and duplicated names are displayed in different tables on the Run console.
Note that the Talend Map/Reduce components are available to subscription-based Big Data users only and this
scenario can be replicated only with Map/Reduce components.
The sample data to be used in this scenario reads as follows:
1;Harry;Ford;68;Albany
2;Franklin;Wilson;79;Juneau
3;Ulysses;Roosevelt;25;Harrisburg
4;Harry;Ford;48;Olympia
5;Martin;Reagan;75;Columbia
6;Woodrow;Roosevelt;63;Harrisburg
7;Grover;McKinley;98;Atlanta
8;John;Taft;93;Montpelier
9;Herbert;Johnson;85;Lincoln
10;Grover;McKinley;33;Lansing
Since Talend Studio allows you to convert a Job between its Map/Reduce and Standard (Non Map/Reduce)
versions, you can convert the scenario explained earlier to create this Map/Reduce Job. This way, many
components used can keep their original settings so as to reduce your workload in designing this Job.
Before starting to replicate this scenario, ensure that you have appropriate rights and permissions to access the
Hadoop distribution to be used. Then proceed as follows:
In the Repository tree view of the Integration perspective of Talend Studio, right-click the Job you have
created in the earlier scenario to open its contextual menu and select Edit properties.
Then the [Edit properties] dialog box is displayed. Note that the Job must be closed before you are able to
make any changes in this dialog box.
This dialog box looks like the image below:
Note that you can change the Job name as well as the other descriptive information about the Job from this
dialog box.
2.
Click Convert to Map/Reduce Job. Then a Map/Reduce Job using the same name appears under the Map/
Reduce Jobs sub-node of the Job Design node.
If you need to create this Map/Reduce Job from scratch, you have to right-click the Job Design node or the Map/
Reduce Jobs sub-node and select Create Map/Reduce Job from the contextual menu. Then an empty Job is
opened in the workspace. For further information, see the section describing how to create a Map/Reduce Job of
the Talend Open Studio for Big Data Getting Started Guide.
Double-click this new Map/Reduce Job to open it in the workspace. The Map/Reduce components' Palette is
opened accordingly and in the workspace, the crossed-out components, if any, indicate that those components
do not have the Map/Reduce version.
2.
Right-click each of those components in question and select Delete to remove them from the workspace.
3.
If from scratch, you have to drop a tSortRow component and a tUniqRow component, too.
4.
Connect tHDFSInput to tSortRow using the Row > Main link and accept to get the schema of tSortRow.
5.
Connect tUniqRow to tHDFSOutput using Row > Uniques and to tJDBCOutput using Row > Duplicates.
Click Run to open its view and then click the Hadoop Configuration tab to display its view for configuring
the Hadoop connection for this Job.
This view looks like the image below:
2.
From the Property type list, select Built-in. If you have created the connection to be used in Repository,
then select Repository and thus the Studio will reuse that set of connection information for this Job.
For further information about how to create an Hadoop connection in Repository, see the chapter describing
the Hadoop cluster node of the Talend Open Studio for Big Data Getting Started Guide.
3.
In the Version area, select the Hadoop distribution to be used and its version. If you cannot find the
distribution corresponding to yours in the list, select Custom so as to connect to a Hadoop distribution not officially
supported in the Studio.
For a step-by-step example about how to use this Custom option, see section Connecting to a custom Hadoop
distribution.
Note that if you use Hortonworks Data Platform V2.0.0, the type of the operating system for running the
distribution and a Talend Job must be the same, such as Windows or Linux.
4.
In the Name node field, enter the location of the master node, the NameNode, of the distribution to be used.
For example, hdfs://talend-cdh4-namenode:8020.
5.
In the Job tracker field, enter the location of the JobTracker of your distribution. For example, talend-cdh4-namenode:8021.
Note that the notion Job in this term JobTracker designates the MR or the MapReduce jobs described in
Apache's documentation on http://hadoop.apache.org/.
6.
If the distribution to be used requires Kerberos authentication, select the Use Kerberos authentication check
box and complete the authentication details. Otherwise, leave this check box clear.
If you need to use a Kerberos keytab file to log in, select Use a keytab to authenticate. A keytab file contains
pairs of Kerberos principals and encrypted keys. You need to enter the principal to be used in the Principal
field and the access path to the keytab file itself in the Keytab field.
Note that the user that executes a keytab-enabled Job is not necessarily the one a principal designates but
must have the right to read the keytab file being used. For example, the user name you are using to execute
a Job is user1 and the principal to be used is guest; in this situation, ensure that user1 has the right to read
the keytab file to be used.
7.
In the User name field, enter the login user name for your distribution. If you leave it empty, the user name
of the machine hosting the Studio will be used.
8.
In the Temp folder field, enter the path in HDFS to the folder where you store the temporary files generated
during Map/Reduce computations.
9.
Leave the default value of the Path separator in server as it is, unless you have changed the separator used
by your Hadoop distribution's host machine for its PATH variable; in other words, unless that separator is not a
colon (:). In that situation, you must change this value to the one used on that host.
10. Leave the Clear temporary folder check box selected, unless you want to keep those temporary files.
11. If the Hadoop distribution to be used is Hortonworks Data Platform V1.2 or Hortonworks Data Platform
V1.3, you need to set proper memory allocations for the map and reduce computations to be performed by
the Hadoop system.
In that situation, you need to enter the values you need to in the Mapred job map memory mb and
the Mapred job reduce memory mb fields, respectively. By default, the values are both 1000 which are
normally appropriate for running the computations.
For further information about this Hadoop Configuration tab, see the section describing how to configure the
Hadoop connection for a Talend Map/Reduce Job of the Talend Open Studio for Big Data Getting Started Guide.
2.
Click the [...] button next to Edit schema to verify that the schema received in the earlier steps is properly
defined.
Note that if you are creating this Job from scratch, you need to click the [+] button to manually add these
schema columns; otherwise, if the schema has been defined in Repository, you can select the Repository
option from the Schema list in the Basic settings view to reuse it. For further information about how to define
a schema in Repository, see the chapter describing metadata management in the Talend Studio User Guide
or the chapter describing the Hadoop cluster node in Repository of the Getting Started Guide.
3.
If you make changes in the schema, click OK to validate these changes and accept the propagation prompted
by the pop-up dialog box.
4.
In the Folder/File field, enter the path, or browse to the source file you need the Job to read.
If this file is not in the HDFS system to be used, you have to place it in that HDFS, for example, using
tFileInputDelimited and tHDFSOutput in a Standard Job.
This component keeps the configuration used in the original Job. It sorts the incoming entries into alphabetical
order based on the FirstName and LastName columns.
2.
This component also keeps its configuration from the original Job. It separates the incoming entries into
a Uniques flow and a Duplicates flow, then sends the unique entries to tHDFSOutput and the duplicate
entries to tJDBCOutput.
Configuring tHDFSOutput
1.
2.
As explained earlier for verifying the schema of tHDFSInput, do the same to verify the schema of
tHDFSOutput. If it is not consistent with that of its preceding component, tUniqRow, click Sync column
to retrieve the schema of tUniqRow.
3.
In the Folder field, enter the path, or browse to the folder you want to write the unique entries in.
4.
From the Action list, select the operation you need to perform on the folder in question. If the folder already
exists, select Overwrite; otherwise, select Create.
Configuring tJDBCOutput
1.
2.
In the JDBC URL field, enter the URL of the database in which you need to write the duplicate entries. In
this example, it is jdbc:mysql://10.42.10.13:3306/Talend, a MySQL database called Talend.
3.
In the Driver JAR table, add one row to the table by clicking the [+] button.
4.
5.
In the Class name field, enter the class file to be called. In this example, it is org.gjt.mm.mysql.Driver.
6.
In the User name and the Password fields, enter the authentication information to that database.
7.
In the Table name field, enter the name of the table in which you need to write data, for example, Namelist.
This table must already exist.
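In a Standard Job, the equivalent write could be expressed with plain JDBC. The hedged sketch below uses the URL, driver class and table name quoted above; the column names and credentials are assumptions made for illustration only:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class JdbcOutputSketch {
    public static void main(String[] args) throws Exception {
        Class.forName("org.gjt.mm.mysql.Driver");           // driver class quoted in the scenario
        try (Connection connection = DriverManager.getConnection(
                 "jdbc:mysql://10.42.10.13:3306/Talend", "user", "password"); // credentials are placeholders
             PreparedStatement insert = connection.prepareStatement(
                 "INSERT INTO Namelist (Id, FirstName, LastName, Age, City) VALUES (?, ?, ?, ?, ?)")) {
            // One duplicate entry from the sample data; columns are assumed to match the Job schema.
            insert.setInt(1, 4);
            insert.setString(2, "Harry");
            insert.setString(3, "Ford");
            insert.setInt(4, 48);
            insert.setString(5, "Olympia");
            insert.executeUpdate();
        }
    }
}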
tUniservBTGeneric
tUniservBTGeneric
This component will be available in the Palette of the studio on the condition that you have subscribed to the relevant edition
of Data Quality Service Hub Studio.
tUniservBTGeneric properties
Component family
Data quality
Function
tUniservBTGeneric enables the execution of a process created with the Uniserv product DQ
Batch Suite.
Purpose
tUniservBTGeneric sends the data to the DQ Batch Suite and starts the specified DQ Batch
Suite job. When the job execution is finished, the results are returned to the Data Quality Service
Hub Studio for further processing.
Basic settings
Advanced settings
Host name
Port
Client Server
User name
User name for the registration on the DQ Batch Suite server. The
stated user must have the right to execute the DQ Batch Suite job.
Password
Job directory
Job name
File path under which the DQ Batch Suite job to be executed will
be saved. The path to the file must be stated absolutely.
Temporary directory
Input Parameters
Usage
tUniservBTGeneric sends data to DQ Batch Suite and starts the specified DQ Batch Suite job.
When the execution is finished, the output data of the job is returned to Data Quality Service
Hub Studio for further processing.
Limitation
In the Repository view, expand the Metadata node and the directory in which you saved the source. Then
drag this source into the design workspace.
2.
3.
Drag the following components from the Palette into the design workspace: two tMap components,
tOracleOutput and tUniservBTGeneric.
4.
5.
Connect the other components via the Row > Main link.
6.
7.
Enter the connection data for the DQ Batch Suite job. Note that the absolute path must be entered in the field
Job File Path.
8.
Click Retrieve Schema to automatically create a schema for tUniservBTGeneric from the input and output
definitions of the DQ Batch Suite job and automatically fill in the fields in the Advanced Settings.
9.
Check the details in the Advanced Settings view. The definitions for input and output must match those of
the DQ Batch Suite job exactly. If necessary, adapt the path for the temporary files.
10. Double-click tMap_1 to open the schema mapping window. On the left is the structure of the input source,
on the right is the schema of tUniservBTGeneric (and thus the input for the DQ Batch Suite job). At the
bottom is the Schema Editor, where you can find the attributes of the individual columns and edit them.
11. Assign the columns of the input source to the respective columns of tUniservBTGeneric. For this purpose,
select a column of the input source and drag it onto the appropriate column on the right side.
tUniservRTConvertName
tUniservRTConvertName
This component will be available in the Palette of the studio on the condition that you have subscribed to the relevant edition
of Data Quality Service Hub Studio.
tUniservRTConvertName properties
Component family
Data quality
Function
tUniservRTConvertName analyzes the name line against the context. For individual persons,
it divides the name line into segments (name, first name, title, name prefixes, name suffixes,
etc.) and creates the address key.
The component recognizes company or institution addresses and is able to provide the form of
the organization separately. It also divides lines that contain information on several persons to
separate lines and is able to recognize certain patterns that do not belong to the name information
in the name line (customer number, handling notes, etc.) and remove them or move them to
special memo fields.
Purpose
tUniservRTConvertName provides the basis for a uniform structuring and population of person
and company names in the database as well as the personalized salutation.
Basic settings
Host name
Port
Service
Use rejects
Advanced settings
Analysis Configuration
Output Configuration
Configuration of not recognized input: For detailed information, please refer to the Uniserv user manual convert-name.
Cache Configuration
Global Variables
NB_LINE: Indicates the number of rows read by an input component or transferred to an output
component. This is an After variable and it returns an integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and
choose the variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable
means it functions after the execution of a component.
Usage
tUniservRTConvertName provides the basis for a uniform structuring and population of person
and company names in the database as well as the personalized salutation.
Limitation
1.
In the Repository view, expand the Metadata node and the directory in which the file is saved. Then drag
this file into the design workspace.
The dialog box below appears.
2.
3.
Drag the following components from the Palette into the design workspace: two tMap components,
tUniservRTConvertName, and tFileOutputDelimited.
4.
5.
6.
Double-click tMap_1 to open the schema mapping window. On the left is the structure of the input file, on
the right is the schema of tUniservRTConvertName. At the bottom lies the Schema Editor, where you can
find the attributes of the individual columns and edit them.
7.
Assign the columns of the input source to the respective columns of tUniservRTConvertName. For this
purpose, select a column of the input source and drag it onto the appropriate column on the right side. If
fields from the input file are to be passed on to the output file, like the address fields or IDs, you have to
define additional fields.
8.
9.
10. Fill in the server information and specify the country-specific service.
11. Double-click tMap_3 to open the mapping window. On the left is the schema of tUniservRTConvertName
and on the right is the schema of the output file.
tUniservRTMailBulk
tUniservRTMailBulk
This component will be available in the Palette of the studio on the condition that you have subscribed to the relevant edition
of Data Quality Service Hub Studio.
tUniservRTMailBulk properties
Component family
Data quality
Function
tUniservRTMailBulk creates an index pool for mailRetrieval with predefined input data.
Purpose
Basic settings
Advanced settings
Host name
Port
Service
Uniserv Parameters
tStatCatcher Statistics
Select this check box to collect log data at the Job and the
component levels.
Usage
Limitation
In the Repository view, expand the Metadata node and the directory in which the database is saved. Then
drag this database into the design workspace.
The dialog box below appears.
2.
3.
Drag the following components from the Palette into the design workspace: tMap and
tUniservRTMailBulk.
4.
5.
6.
Double-click tMap_1 to open the schema mapping window. On the left is the schema of the database file and
on the right is the schema of tUniservRTMailBulk. At the bottom is displayed the Schema Editor, where
you can find the attributes of the individual columns and edit them.
7.
Assign the columns of the input source to the respective columns of tUniservRTMailBulk. For this purpose,
select a column of the input source and drag it onto the appropriate column on the right side. The meaning
of the individual arguments is described in the Uniserv user manual mailRetrieval.
8.
9.
tUniservRTMailOutput
tUniservRTMailOutput
This component will be available in the Palette of the studio on the condition that you have subscribed to the relevant edition
of Data Quality Service Hub Studio.
tUniservRTMailOutput properties
Component family
Data Quality
Function
tUniservRTMailOutput updates the index pool that is used for duplicate search.
Purpose
Basic settings
Host name
Port
Service
Action on data
Advanced settings
Uniserv Parameters
tStatCatcher Statistics
Select this check box to collect log data at the Job and the
component levels.
Usage
tUniservRTMailOutput updates the index pool and passes the input set on. The output is
amended by the status of the operation. If the operation fails, an error message will be displayed.
Limitation
Related scenarios
Related scenarios
For a related scenario, see section Scenario: Adding contacts to the mailRetrieval index pool.
tUniservRTMailSearch
tUniservRTMailSearch
This component will be available in the Palette of the studio on the condition that you have subscribed to the relevant edition
of Data Quality Service Hub Studio.
tUniservRTMailSearch properties
Component family
Data quality
Function
tUniservRTMailSearch searches for similar data based on the given input record.
Purpose
tUniservRTMailSearch searches for duplicate values and adds additional data to each record.
Basic settings
Host name
Port
Service
Advanced settings
Uniserv Parameters
tStatCatcher Statistics
Select this check box to collect log data at the Job and the
component levels.
Usage
Limitation
The input file for this scenario is already saved in the Repository, so that all schema metadata is available.
Please note that the data from the input source must be related to the same country.
In the Repository view, expand the Metadata node and the directory in which the file is saved. Then drag
this file into the design workspace.
The dialog box below appears.
2.
3.
Drag the following components from the Palette into the design workspace: two tMap components,
tUniservRTMailSearch and tUniservRTMailOutput.
4.
5.
Double-click tMap_1 to open the schema mapping window. On the left is the structure of the input file and
on the right is the schema of tUniservRTMailSearch. At the bottom lies the Schema Editor, where you can
find the attributes of the individual columns and edit them.
2.
Assign the columns of the input file to the respective columns of tUniservRTMailSearch. For this purpose,
select a column of the input source and drag it onto the appropriate column on the right side.
3.
When your input list contains a reference ID, you should carry it over. To do so, create a new column
IN_DBREF in the Schema Editor and connect it with your reference ID.
Click OK to close the window.
4.
5.
6.
Click the [+] button to insert a new line in the window. Select Duplicate count under the element column,
> under the operator column, and 0 under the value column, so that all existing contacts are disqualified and
only the new contact is added to the index pool.
7.
Enter the Advanced settings view and check the parameters. Reasonable parameters are preset. Detailed
information can be found in the manual mailRetrieval.
8.
Double-click tMap_3 to open the schema mapping window. On the left is the schema of tUniservRTMailSearch
and on the right is the schema of tUniservRTMailOutput.
9.
10. The only field that must be assigned manually is the reference ID. In order to do so, drag OUT-DBREF from
the left side onto the field IN_DBREF on the right side.
From the Action on Data list, select Insert or update. This way, all new contacts are added to the index pool.
tUniservRTPost
tUniservRTPost
This component will be available in the Palette of the studio on the condition that you have subscribed to the relevant edition
of Data Quality Service Hub Studio.
tUniservRTPost properties
Component family
Data quality
Function
Purpose
tUniservRTPost helps to improve address quality, which is extremely important for CRM
and e-business, as it is directly related to postage and advertising costs.
Basic settings
Host name
Port
Service
Use rejects
Select this check box to collect faulty addresses via the rejects
connection. Usually they are the addresses with the post result
class 5. Valid values for the result class are 1-5. The value must
be between double quotation marks.
If this check box is not selected, the faulty addresses are output
via the Main connection.
If the check box is selected but the rejects connection is not
created, the faulty addresses are simply rejected.
Use File for ambiguous results: Select the check box to define a file for writing the selection list to it.
When an address cannot be corrected unambiguously, a selection
list is created.
This list can be further processed via the AMBIGUITY
connection. All potential candidate results then run via this
connection. The schema of this connection is preinitialized with
the arguments of the dissolved selection list of the service 'post'.
Advanced settings
Uniserv Parameters
tStatCatcher Statistics
Select this check box to collect log data at the Job and the
component levels.
Full address selection list: Select the Display check box to show all the columns. Or, select
the check box next to a particular column to show it alone.
Scenario 1: Checking and correcting the postal code, city and street
tUniservRTPost requires an input set. Its postal validation will then be checked. In case of an
unambiguous result, the corrected set will be output via the Main connection. If the address is
ambiguous, the potential candidates will be output via the Ambiguity connection. If an address
was not found, it will be passed on via the Reject connection.
Limitation
To use tUniservRTPost, the Uniserv software International Postal Framework and the required
post servers must be installed.
In the Repository view, expand the Metadata node and the directory in which the file is saved. Then drag
this file into the design workspace.
The dialog box below appears.
2.
3.
Drag the following components from the Palette into the design workspace: two tMap components,
tUniservRTPost and tFileOutputDelimited.
4.
5.
6.
Double-click tMap_1 to open the schema mapping window. On the left is the structure of the input file and
on the right is the schema of tUniservRTPost. At the bottom is displayed the Schema Editor, where you
can find the attributes of the individual columns and edit them.
7.
Assign the columns of the input file to the respective columns of tUniservRTPost. For this purpose, select a
column of the input source and drag it onto the appropriate column on the right side. If fields from the input
file are to be passed on to the output file, e.g. the names or the IDs, additional fields must be defined.
When assigning the fields, note that street and house number can either be saved together in the street column or
respectively in separate fields. If your data list does not have a country code but the addresses are from the same
country, the relevant ISO-country code should be manually entered between double quotation marks in the column
IN_COUNTRY. If you have an international data list without country code, just leave the column IN_COUNTRY empty.
For detailed information, please refer to the Uniserv user manual International Postal Framework.
8.
9.
10. Change the parameters and field lengths if necessary and select the output fields.
Make sure sufficient field length is defined. For detailed information, please refer to the Uniserv user manual
International Postal Framework.
11. Double-click tMap_3 to open the schema mapping window. On the left is the schema of tUniservRTPost and
on the right is the schema of the output file.
Scenario 2: Checking and correcting the postal code, city and street, as well as rejecting the unfeasible
2.
Drag the following additional components from the Palette into the design workspace: tMap and
tFileOutputDelimited.
3.
4.
Select the Use rejects check box and enter "5" in the if result class greater or equals to field. This is the
result class from the check of postal codes in addresses that contain too few or unfeasible data.
5.
6.
7.
Define the fields for the output file in the mapping window.
tAccessBulkExec
tAccessBulkExec
tAccessBulkExec properties
The tAccessOutputBulk and tAccessBulkExec components are generally used together to output data to a
delimited file and then to perform various actions on the file in an Access database, in a two-step process. These two
steps are fused together in the tAccessOutputBulkExec component, detailed in a separate section. The advantage
of using a two-step process is that it makes it possible to carry out transformations on the data before loading it
into the database.
Component family
Databases/Access
Function
Purpose
As a dedicated component, tAccessBulkExec offers gains in performance when carrying out Insert
operations in an Access database.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data is stored centrally.
Use an existing connection: Select this check box and in the Component List click the relevant
connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
DB version
Database
Action on table
Related scenarios
Table
Name of the table to be written. Note that only one table can be
written at a time and that the table must exist already for the insert
operation to succeed.
Local filename
Action on data
Advanced settings
Dynamic settings
Additional
parameters
Include header
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is
selected in the Basic settings view. Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
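To illustrate the idea behind dynamic connection selection outside the Studio, here is a minimal plain-Java sketch; the dbTarget property name and the two connection URLs are assumptions, and the code Talend generates for Dynamic settings is different.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.util.Map;

    public class DynamicConnectionSketch {
        // Connection URLs keyed by a logical name, as they might come from context variables.
        static final Map<String, String> URLS = Map.of(
                "dev",  "jdbc:ucanaccess://C:/data/dev.accdb",    // assumed URLs
                "prod", "jdbc:ucanaccess://C:/data/prod.accdb");

        public static void main(String[] args) throws Exception {
            // The logical target is read from outside the code (here a system property),
            // so the same program can run against different databases without being edited.
            String target = System.getProperty("dbTarget", "dev");
            try (Connection conn = DriverManager.getConnection(URLS.get(target))) {
                System.out.println("Connected to " + target);
            }
        }
    }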
Usage
This component is to be used along with the tAccessOutputBulk component. Used together, they can
offer gains in performance while feeding an Access database.
Related scenarios
For use cases in relation with tAccessBulkExec, see the following scenarios:
section Scenario: Inserting transformed data in MySQL database
section Scenario: Inserting data in MySQL database
tAccessClose
tAccessClose properties
Component family
Databases/Access
Function
Purpose
Basic settings
Component list
Advanced settings
Select this check box to gather the Job processing metadata at the
Job level as well as at each component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component is to be used along with other Access components, especially with
tAccessConnection and tAccessCommit.
Limitation
n/a
Related scenario
No scenario is available for this component yet.
tAccessCommit
tAccessCommit Properties
This component is closely related to tAccessConnection and tAccessRollback. It usually does not make much
sense to use these components independently in a transaction.
Component family
Databases/Access
Function
Validates the data processed through the Job into the connected DB.
Purpose
Using a unique connection, this component commits in one go a global transaction instead of doing
that on every row or every batch and thus provides gain in performance.
Basic settings
Component list
Close Connection
Advanced settings
tStatCatcher Statistics
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Select this check box to collect log data at the component level.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component is to be used along with Access components, especially with tAccessConnection
and tAccessRollback components.
Limitation
n/a
Related scenario
This component is closely related to tAccessConnection and tAccessRollback. It usually does not make much
sense to use one of these without using a tAccessConnection component to open a connection for the current
transaction.
For tAccessCommit related scenario, see section tMysqlConnection
tAccessConnection
tAccessConnection Properties
This component is closely related to tAccessCommit, tAccessInput and tAccessOutput. It usually does not make
much sense to use one of these without using a tAccessConnection component to open a connection for the current
transaction.
Component family
Databases/Access
Function
Purpose
This component allows you to commit all of the Job data to an output database in just a single
transaction, once the data has been validated.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
DB Version
Database
Use or register a shared DB Connection
Select this check box to share your connection or fetch a connection shared by a parent or child Job. This allows you to share one single DB connection among several DB connection components from different Job levels that can be either parent or child.
This option is incompatible with the Use dynamic job and
Use an independent process to run subjob options of the
tRunJob component. Using a shared database connection
together with a tRunJob component with either of these
two options enabled will cause your Job to fail.
Shared DB Connection Name: set or type in the shared connection
name.
Advanced settings
Additional
parameters
Usage
This component is to be used along with Access components, especially with tAccessCommit and
tAccessOutput components.
Limitation
n/a
Drop the following components from the Palette to the design workspace: tFileList, tFileInputDelimited,
tMap, tAccessOutput (two), tAccessInput (two), tAccessCommit, tAccessClose and tLogRow (x2).
Connect the tFileList component to the input file component using an Iterate link. Thus, the name of the file
to be processed will be dynamically filled in from the tFileList directory using a global variable.
Connect the tFileInputDelimited component to the tMap component and dispatch the flow between the two
output Access components. Use a Row link for each of these connections representing the main data flow.
Set the tFileList component properties, such as the directory where files will be fetched from.
Add a tAccessConnection component and connect it to the starter component of this Job. In this example, the
tFileList component uses an OnComponentOk link to define the execution order.
In the tAccessConnection Component view, set the connection details.
In the tFileInputDelimited component's Basic settings view, press Ctrl+Space to access the variable list.
Set the File Name field to the global variable: tFileList_1.CURRENT_FILEPATH. For more information about
using variables, see Talend Studio User Guide.
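When the variable is picked from that list, the value placed in the File Name field is a small Java expression of roughly this form (assuming the component is named tFileList_1, as in this scenario):

    ((String)globalMap.get("tFileList_1_CURRENT_FILEPATH"))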
Set the rest of the fields as usual, defining the row and field separators according to your file structure.
Then set the schema manually through the Edit schema dialog box. Make sure the data type is correctly set,
in accordance with the nature of the data processed.
In the tMap Output area, add two output tables, one called Name for the Name table, the second called Birthday,
for the Birthday table. For more information about the tMap component, see Talend Studio User Guide.
Drag the Name column from the Input area, and drop it to the Name table.
Drag the Birthday column from the Input area, and drop it to the Birthday table.
Then connect the output row links to distribute the flow correctly to the relevant DB output components.
In each of the tAccessOutput components' Basic settings view, select the Use an existing connection check
box to retrieve the tAccessConnection details.
Set the Table name making sure it corresponds to the correct table, in this example either Name or Birthday.
There is no action on the table as they are already created.
Select Insert as Action on data for both output components.
Click on Sync columns to retrieve the schema set in the tMap.
Then connect the first tAccessOutput component to the first tAccessInput component using an
OnComponentOk link.
In each of the tAccessInput components' Basic settings view, select the Use an existing connection check
box to retrieve the distributed data flow. Then set the schema manually through the Edit schema dialog box.
Then set the Table Name accordingly. In tAccessInput_1, this will be Name.
Click on the Guess Query.
Connect each tAccessInput component to a tLogRow component with a Row > Main link. In each of the
tLogRow components' Basic settings view, select Table in the Mode field.
Add the tAccessCommit component below the tFileList component in the design workspace and connect them
together using an OnComponentOk link in order to terminate the Job with the transaction commit.
In the Basic settings view of the tAccessCommit component and from the Component list, select the connection
to be used, tAccessConnection_1 in this scenario.
Save your Job and press F6 to execute it.
The parent table Table1 is reused to generate the Name table and Birthday table.
tAccessInput
tAccessInput properties
Component family
Databases/Access
Function
Purpose
tAccessInput executes a DB query with a strictly defined statement which must correspond to
the schema definition. Then it passes on the field list to the next component via a Row > Main
connection.
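As an illustration of a query whose select list matches the schema definition, the plain JDBC sketch below reads the columns in the same order as the schema; the connection URL, table and column names are assumptions, and this is not the component's generated code.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class SchemaOrderedQuerySketch {
        public static void main(String[] args) throws Exception {
            String url = "jdbc:ucanaccess://C:/data/demo.accdb";   // assumed URL
            // The SELECT list is written in exactly the same order as the component schema
            // (here: id, name, birthday), so each field lands in the matching schema column.
            String query = "SELECT id, name, birthday FROM customers";
            try (Connection conn = DriverManager.getConnection(url);
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery(query)) {
                while (rs.next()) {
                    System.out.println(rs.getInt(1) + ";" + rs.getString(2) + ";" + rs.getDate(3));
                }
            }
        }
    }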
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Select this check box and in the Component List click the relevant
connection component to reuse the connection details you already
defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
DB Version
Database
Additional
parameters
Select this check box to collect log data at the component level.
Trim all the String/Char columns
Select this check box to remove leading and trailing whitespace from all the String/Char columns.
Trim column
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is
selected in the Basic settings view. Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component offers the flexibility benefit of the DB query and covers all possible SQL queries.
Related scenarios
For related topics, see the tDBInput scenarios:
section Scenario 1: Displaying selected data from DB table.
section Scenario 2: Using StoreSQLQuery variable.
Related topic in description of section tContextLoad.
tAccessOutput
tAccessOutput properties
Component family
Databases/Access
Function
Purpose
tAccessOutput executes the action defined on the table and/or on the data contained in the table,
based on the flow incoming from the preceding component in the Job.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Select this check box and in the Component List click the relevant
connection component to reuse the connection details you already
defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
DB Version
Database
Table
Name of the table to be written. Note that only one table can be
written at a time
Action on table
Action on data
Die on error
This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.
Advanced settings
Additional JDBC parameters
Specify additional connection properties for the DB connection you are creating. This option is not available if you have selected the Use an existing connection check box in the Basic settings.
You can press Ctrl+Space to access a list of predefined global variables.
Commit every
Additional Columns
This option is not available if you create the DB table (with or without drop). It allows you to call SQL functions to perform actions on columns that are not insert, update or delete actions, or actions that require particular preprocessing.
Name: Type in the name of the schema column to be altered or inserted as a new column.
SQL expression: Type in the SQL statement to be executed in order to alter or insert the relevant column data.
Position: Select Before, Replace or After, depending on the action to be performed on the reference column.
Reference column: Type in a reference column that the tDBOutput can use to place or replace the new or altered column.
Select this check box to collect log data at the component level.
Select this check box to display each step during processing entries
in a database.
Support null in SQL WHERE statement
Select this check box if you want to deal with the Null values contained in a DB table.
Make sure the Nullable check box is selected for the corresponding columns in the schema.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is
selected in the Basic settings view. Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component offers the flexibility benefit of the DB query and covers all possible SQL queries.
This component must be used as an output component. It allows you to carry out actions on a table
or on the data of a table in an Access database. It also allows you to create a reject flow using a Row
> Rejects link to filter data in error. For an example of tMySqlOutput in use, see
section Scenario 3: Retrieve data in error with a Reject link.
Related scenarios
For related topics, see:
section Scenario: Writing a row to a table in the MySql database via an ODBC connection
section Scenario 1: Adding a new column and altering data in a DB table.
tAccessOutputBulk
tAccessOutputBulk properties
The tAccessOutputBulk and tAccessBulkExec components are generally used together, in a two-step process, to output data to a
delimited file and then perform various actions on the file in an Access database. These two
steps are fused together in the tAccessOutputBulkExec component, detailed in a separate section. The advantage
of using a two-step process is that it makes it possible to carry out transformations on the data before loading it
into the database.
Component family
Databases/Access
Function
Purpose
tAccessOutputBulk prepares the file which contains the data used to feed the Access database.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
File Name
Create directory if not exists: Select this check box to create the file directory specified in the File name field if it does not already exist.
Append
Select this check box to add any new rows to the end of the file
Advanced settings
Usage
Include header
Select this check box to include the column header in the file.
Encoding
Select the encoding from the list or select Custom and define it
manually. This field is compulsory for DB data handling.
Select this check box to collect log data at the component level.
This component is to be used along with the tAccessBulkExec component. Used together, they offer
gains in performance while feeding an Access database.
Related scenarios
For use cases in relation with tAccessOutputBulk, see the following scenarios:
section Scenario: Inserting transformed data in MySQL database
section Scenario: Inserting data in MySQL database
tAccessOutputBulkExec
tAccessOutputBulkExec properties
The tAccessOutputBulk and tAccessBulkExec components are generally used together, in a two-step process, to output data to a
delimited file and then perform various actions on the file in an Access database. These
two steps are fused together in tAccessOutputBulkExec.
Component family
Databases/Access
Function
Purpose
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Select this check box and in the Component List click the relevant
connection component to reuse the connection details you already
defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
DB Version
DB name
Action on table
Table
Note that only one table can be written at a time and that the
table must already exist for the insert operation to succeed
FileName
Action on data
Create directory if not exists
Select this check box to create the file directory specified in the File name field if it does not already exist.
Append
Select this check box to append new rows to the end of the file.
Advanced settings
Additional JDBC parameters
Specify additional connection properties for the DB connection you are creating. This option is not available if you have selected the Use an existing connection check box in the Basic settings.
You can press Ctrl+Space to access a list of predefined global variables.
Include header
Select this check box to include the column header in the file.
Encoding
Select the encoding from the list or select Custom and define it manually. This field is compulsory for DB data handling.
tStatCatcher Statistics
Select this check box to collect the log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is
selected in the Basic settings view. Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component is mainly used when no particular transformation is required on the data to be
loaded in the database.
Limitation
n/a
Related scenarios
For use cases in relation with tAccessOutputBulkExec, see the following scenarios:
section Scenario: Inserting data in MySQL database
section Scenario: Inserting transformed data in MySQL database
tAccessRollback
tAccessRollback properties
This component is closely related to tAccessConnection and tAccessCommit components. It usually does not
make much sense to use these components independently in a transaction.
Component family
Databases/Access
Function
Purpose
Basic settings
Component list
Close Connection
Clear this check box to continue to use the selected connection once
the component has performed its task.
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Usage
This component is to be used along with Access components, especially with tAccessConnection
and tAccessCommit.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Limitation
n/a
Related scenarios
For tAccessRollback related scenario, see tMysqlRollback.
tAccessRow
tAccessRow properties
Component family
Databases/Access
Function
tAccessRow is the specific component for this database query. It executes the SQL query stated
on the specified database. The Row suffix means the component implements a flow in the Job
design although it doesn't provide output.
Purpose
Depending on the nature of the query and the database, tAccessRow acts on the actual DB structure
or on the data (although without handling data). The SQLBuilder tool helps you easily write your
SQL statements.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Select this check box and in the Component List click the relevant
connection component to reuse the connection details you already
defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
DB Version
Database
Table Name
Query type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: Fill in manually the query statement or build it graphically
using SQLBuilder
Advanced settings
Query
Commit every
Die on error
This check box is selected by default. Clear the check box to skip the
row on error and complete the process for error-free rows. If needed,
you can retrieve the rows on error via a Row > Rejects link.
Propagate QUERY's recordset
Select this check box to insert the result of the query into a COLUMN of the current flow. Select this column from the use column list.
Use PreparedStatement
Select this check box if you want to query the database using
a PreparedStatement. In the Set PreparedStatement Parameter
table, define the parameters represented by ? in the SQL instruction
of the Query field in the Basic Settings tab.
Parameter Index: Enter the parameter position in the SQL
instruction.
Parameter Type: Enter the parameter type.
Parameter Value: Enter the parameter value.
This option is very useful if you need to execute the same query several times, as performance levels are increased.
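The mechanism behind this option can be illustrated with plain JDBC; the connection URL, table and parameter values below are assumptions for the example, and each setXxx call corresponds to one row of the Set PreparedStatement Parameter table.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    public class PreparedStatementSketch {
        public static void main(String[] args) throws Exception {
            String url = "jdbc:ucanaccess://C:/data/demo.accdb";   // assumed URL
            // Each "?" corresponds to one row of the Set PreparedStatement Parameter table:
            // Parameter Index = its position, Parameter Type = the Java type, Parameter Value = the value.
            String sql = "UPDATE customers SET city = ? WHERE id = ?";
            try (Connection conn = DriverManager.getConnection(url);
                 PreparedStatement ps = conn.prepareStatement(sql)) {
                ps.setString(1, "Paris");   // index 1, type String, value "Paris"
                ps.setInt(2, 42);           // index 2, type Int, value 42
                ps.executeUpdate();
                // Re-using the same prepared statement with new values avoids re-parsing the query.
                ps.setString(1, "Berlin");
                ps.setInt(2, 43);
                ps.executeUpdate();
            }
        }
    }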
Commit every
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is
selected in the Basic settings view. Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component offers the flexibility of the DB query and covers all possible SQL queries.
Related scenarios
For related topics, see:
section Scenario: Resetting a DB auto-increment
section Scenario 1: Removing and regenerating a MySQL table index.
tAS400Close
tAS400Close properties
Component family
Databases/AS400
Function
Purpose
Close a transaction.
Basic settings
Component list
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component is to be used along with AS400 components, especially with tAS400Connection
and tAS400Commit.
Limitation
n/a
Related scenario
No scenario is available for this component yet.
tAS400Commit
tAS400Commit Properties
This component is closely related to tAS400Connection and tAS400Rollback. It usually does not make much
sense to use these components independently in a transaction.
Component family
Databases/AS400
Function
Validates the data processed through the Job into the connected DB.
Purpose
Using a unique connection, this component commits in one go a global transaction instead of doing
that on every row or every batch and thus provides gain in performance.
Basic settings
Component list
Close Connection
Advanced settings
tStatCatcher Statistics
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Select this check box to collect log data at the component level.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component is to be used along with AS400 components, especially with tAS400Connection
and tAS400Rollback components.
Limitation
n/a
Related scenario
This component is closely related to tAS400Connection and tAS400Rollback. It usually does not make much
sense to use one of these without using a tAS400Connection component to open a connection for the current
transaction.
For tAS400Commit related scenario, see section tMysqlConnection
tAS400Connection
tAS400Connection Properties
This component is closely related to tAS400Commit and tAS400Rollback. It usually does not make much sense
to use one of the components without using a tAS400Connection component to open a connection for the current
transaction.
Component family
Databases/AS400
Function
Purpose
This component allows you to commit all of the Job data to an output database in just a single
transaction, once the data has been validated.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
DB Version
Host
Database
Use or register a shared DB Connection
Select this check box to share your connection or fetch a connection shared by a parent or child Job. This allows you to share one single DB connection among several DB connection components from different Job levels that can be either parent or child.
This option is incompatible with the Use dynamic job and
Use an independent process to run subjob options of the
tRunJob component. Using a shared database connection
together with a tRunJob component with either of these
two options enabled will cause your Job to fail.
Shared DB Connection Name: set or type in the shared connection
name.
Advanced settings
Additional
parameters
Auto commit
tStatCatcher Statistics
Select this check box to gather the job processing metadata at a Job
level as well as at each component level.
Usage
This component is to be used along with AS400 components, especially with tAS400Commit and
tAS400Rollback components.
Limitation
n/a
Related scenario
This component is closely related to tAS400Commit and tAS400Rollback. It usually does not make much sense
to use one of these without using a tAS400Connection component to open a connection for the current transaction.
tAS400Input
tAS400Input properties
Component family
Databases/AS400
Function
Purpose
tAS400Input executes a DB query with a strictly defined statement which must correspond to
the schema definition. Then it passes on the field list to the next component via a Main row link.
Basic settings
Select this check box and in the Component List click the relevant
connection component to reuse the connection details you already
defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
DB Version
Host
Port
Database
Additional
parameters
Trim all the String/Char columns
Select this check box to remove leading and trailing whitespace from all the String/Char columns.
Trim column
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is
selected in the Basic settings view. Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component offers the flexibility of the DB query and covers all possible SQL queries.
Related scenarios
For related topics, see the tDBInput scenarios:
section Scenario 1: Displaying selected data from DB table
section Scenario 2: Using StoreSQLQuery variable.
For a related topic in tContextLoad, see section Scenario: Reading data from different MySQL databases using
dynamically loaded connection parameters.
tAS400LastInsertId
tAS400LastInsertId properties
Component family
Databases
Function
Purpose
tAS400LastInsertId obtains the primary key value of the record that was last inserted in an AS400
table by a user.
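The underlying mechanism can be sketched with plain JDBC and generated keys; the jt400-style connection URL, credentials and table name below are assumptions for the example, not necessarily what the component executes.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class LastInsertIdSketch {
        public static void main(String[] args) throws Exception {
            String url = "jdbc:as400://myhost/MYLIB";   // assumed host and library
            try (Connection conn = DriverManager.getConnection(url, "user", "password");
                 PreparedStatement ps = conn.prepareStatement(
                         "INSERT INTO customers (name) VALUES (?)",
                         Statement.RETURN_GENERATED_KEYS)) {
                ps.setString(1, "Alice");
                ps.executeUpdate();
                try (ResultSet keys = ps.getGeneratedKeys()) {
                    if (keys.next()) {
                        System.out.println("last inserted id = " + keys.getLong(1));
                    }
                }
            }
        }
    }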
Basic settings
Component list
Advanced settings
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
Limitation
n/a
Related scenario
For a related scenario, see section Scenario: Get the ID for the last inserted record.
tAS400Output
tAS400Output properties
Component family
Databases/AS400
Function
Purpose
tAS400Output executes the action defined on the table and/or on the data contained in the table,
based on the flow incoming from the preceding component in the Job.
Basic settings
Property type
DB Version
Select this check box and in the Component List click the relevant
connection component to reuse the connection details you already
defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
Host
Port
Database
Table
Name of the table to be written. Note that only one table can be
written at a time
Action on table
Action on data
Insert: Add new entries to the table. If duplicates are found, the Job stops.
Update: Make changes to existing entries.
Insert or update: Insert a new record. If the record with the given reference already exists, an update is made.
Update or insert: Update the record with the given reference. If the record does not exist, a new record is inserted.
Delete: Remove entries corresponding to the input flow.
It is necessary to specify at least one column as a primary
key on which the Update and Delete operations are based.
You can do that by clicking Edit Schema and selecting
the check box(es) next to the column(s) you want to
set as primary key(s). For an advanced use, click the
Advanced settings view where you can simultaneously
define primary keys for the Update and Delete operations.
To do that: Select the Use field options check box and then
in the Key in update column, select the check boxes next to
the column names you want to use as a base for the Update
operation. Do the same in the Key in delete column for the
Delete operation.
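One common way an insert-or-update action can be implemented over JDBC is sketched below for illustration; the table and column names are assumptions, and this is not necessarily how the component performs the operation.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    public class InsertOrUpdateSketch {
        // Tries an UPDATE keyed on the primary key; if no row was touched, falls back to an INSERT.
        static void insertOrUpdate(Connection conn, int id, String name) throws SQLException {
            try (PreparedStatement upd = conn.prepareStatement(
                    "UPDATE customers SET name = ? WHERE id = ?")) {
                upd.setString(1, name);
                upd.setInt(2, id);
                if (upd.executeUpdate() == 0) {
                    try (PreparedStatement ins = conn.prepareStatement(
                            "INSERT INTO customers (id, name) VALUES (?, ?)")) {
                        ins.setInt(1, id);
                        ins.setString(2, name);
                        ins.executeUpdate();
                    }
                }
            }
        }
    }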
Schema and Edit schema
Advanced settings
Die on error
This check box is selected by default. Clear the check box to skip the
row on error and complete the process for error-free rows. If needed,
you can retrieve the rows on error via a Row > Rejects link.
Select this check box to have access to the Commit every field where
you can define the commit operation.
Commit every: Enter the number of rows to be completed before
committing batches of rows together into the DB. This option
ensures transaction quality (but not rollback) and, above all, better
performance at execution.
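For illustration only, the effect of committing every N rows can be sketched in plain JDBC as follows; the table name and the names list are assumptions for the example.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.util.List;

    public class CommitEverySketch {
        // Commits once per batch of 'commitEvery' rows instead of once per row.
        static void load(Connection conn, List<String> names, int commitEvery) throws SQLException {
            conn.setAutoCommit(false);
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO customers (name) VALUES (?)")) {
                int pending = 0;
                for (String name : names) {
                    ps.setString(1, name);
                    ps.executeUpdate();
                    if (++pending == commitEvery) {
                        conn.commit();      // one commit for the whole batch
                        pending = 0;
                    }
                }
                if (pending > 0) {
                    conn.commit();          // commit the remaining rows
                }
            }
        }
    }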
Additional
parameters
Additional Columns
This option is not available if you create the DB table (with or without drop). It allows you to call SQL functions to perform actions on columns that are not insert, update or delete actions, or actions that require particular preprocessing.
Name: Type in the name of the schema column to be altered or inserted as a new column.
SQL expression: Type in the SQL statement to be executed in order to alter or insert the relevant column data.
Position: Select Before, Replace or After, depending on the action to be performed on the reference column.
Reference column: Type in a reference column that the tDBOutput can use to place or replace the new or altered column.
Select this check box to display each step during processing entries
in a database.
Select this check box to activate the batch mode for data processing.
In the Batch Size field, you can type in the number of rows to be
processed in batches.
This check box is available only when you have selected
the Insert, Update or Delete option in the Action on data
field.
Select this check box to collect log data at the component level.
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is
selected in the Basic settings view. Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component offers the flexibility benefit of the DB query and covers all possible SQL queries.
This component must be used as an output component. It allows you to carry out actions on a table
or on the data of a table in an AS400 database. It also allows you to create a reject flow using a
Row > Rejects link to filter data in error. For an example of tMySqlOutput in use, see section
Scenario 3: Retrieve data in error with a Reject link.
Related scenarios
For related topics, see
section Scenario: Writing a row to a table in the MySql database via an ODBC connection.
section Scenario 1: Adding a new column and altering data in a DB table.
tAS400Rollback
tAS400Rollback properties
This component is closely related to tAS400Commit and tAS400Connection. It usually does not make much
sense to use these components independently in a transaction.
Component family
Databases/AS400
Function
Purpose
Basic settings
Component list
Close Connection
Clear this check box to continue to use the selected connection once
the component has performed its task.
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component is to be used along with AS400 components, especially with tAS400Connection
and tAS400Commit.
Limitation
n/a
Related scenarios
For tAS400Rollback related scenario, see section Scenario: Rollback from inserting data in mother/daughter
tables.
tAS400Row
tAS400Row properties
Component family
Databases/AS400
Function
tAS400Row is the specific component for this database query. It executes the SQL query stated
on the specified database. The Row suffix means the component implements a flow in the Job
design although it doesn't provide output.
Purpose
Depending on the nature of the query and the database, tAS400Row acts on the actual DB structure
or on the data (although without handling data). The SQLBuilder tool helps you easily write your
SQL statements.
Basic settings
Select this check box and in the Component List click the relevant
connection component to reuse the connection details you already
defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
DB Version
Host
Port
Database
Query type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: Fill in manually the query statement or build it graphically
using SQLBuilder
Advanced settings
Query
Die on error
This check box is selected by default. Clear the check box to skip the
row on error and complete the process for error-free rows. If needed,
you can retrieve the rows on error via a Row > Rejects link.
Additional
Parameters
Propagate QUERY's recordset
Use PreparedStatement
Select this check box if you want to query the database using
a PreparedStatement. In the Set PreparedStatement Parameter
table, define the parameters represented by ? in the SQL instruction
of the Query field in the Basic Settings tab.
Parameter Index: Enter the parameter position in the SQL
instruction.
Parameter Type: Enter the parameter type.
Parameter Value: Enter the parameter value.
This option is very useful if you need to execute the same query several times, as performance levels are increased.
Commit every
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is
selected in the Basic settings view. Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component offers the flexibility of the DB query and covers all possible SQL queries.
Related scenarios
For related topics, see:
section Scenario 3: Combining two flows for selective output
section Scenario: Resetting a DB auto-increment
section Scenario 1: Removing and regenerating a MySQL table index.
tDB2BulkExec
tDB2BulkExec properties
Component family
Databases/DB2
Function
Purpose
As a dedicated component, tDB2BulkExec allows gains in performance during Insert operations to a DB2
database.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Use an existing connection
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if you need to share an existing
connection between the two levels, for example, to share the connection created by
the parent Job with the child Job, you have to:
1. In the parent level, register the database connection to be shared in the Basic
settings view of the connection component which creates that very database
connection.
2. In the child level, use a dedicated connection component to read that registered
database connection.
For an example about how to share a database connection across Job levels, see Talend
Studio User Guide.
Host
Port
Database
Table Schema
Username
Password
Table
Name of the table to be written. Note that only one table can be written at a time
Action on table
On the table defined, you can perform one of the following operations:
None: No operation is carried out.
Drop and create table: The table is removed and created again.
Create table: The table does not exist and gets created.
Create table if not exists: The table is created if it does not exist.
Drop table if exists and create: The table is removed if it already exists and created again.
Clear table: The table content is deleted.
Schema and Edit Schema
A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component.
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: You create the schema and store it locally for this component only. Related topic: see Talend Studio User Guide.
Data file
Action on data
Advanced
settings
Additional JDBC parameters
Specify additional connection properties for the DB connection you are creating.
You can set the encoding parameters through this field.
Date Format
Use this field to define the way months and days are ordered.
Time Format
Use this field to define the way hours, minutes and seconds are ordered.
Timestamp Format
Use this field to define the way date and time are ordered.
Remove pending load
When the box is checked, tables blocked in "pending" status following a bulk load are unblocked.
Load options
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to choose your database
connection dynamically from multiple connections planned in your Job. This feature is useful when you need to
access database tables having the same data structure but in different databases, especially when you are working
in an environment where you cannot change your Job settings, for example, when your Job has to be deployed
and executed independent of Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is selected in the
Basic settings view. Once a dynamic parameter is defined, the Component List box in the Basic settings view
becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This dedicated component offers performance and flexibility of DB2 query handling.
Limitation
This component requires installation of its related jar files. For more information about the installation of these
missing jar files, see the section describing how to configure the Studio of the Talend Installation and Upgrade
Guide.
Related scenarios
For tDB2BulkExec related topics, see:
section Scenario: Inserting transformed data in MySQL database.
section Scenario: Truncating and inserting file data into Oracle DB.
tDB2Close
tDB2Close properties
Component family
Databases/DB2
Function
Purpose
Close a transaction.
Basic settings
Component list
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component is to be used along with DB2 components, especially with tDB2Connection and
tDB2Commit.
Limitation
n/a
Related scenario
No scenario is available for this component yet.
tDB2Commit
tDB2Commit Properties
This component is closely related to tDB2Connection and tDB2Rollback. It usually does not make much sense
to use these components independently in a transaction.
Component family
Databases/DB2
Function
Validates the data processed through the Job into the connected DB.
Purpose
Using a unique connection, this component commits in one go a global transaction instead of doing
that on every row or every batch and thus provides gain in performance.
Basic settings
Component list
Close Connection
Advanced settings
tStatCatcher Statistics
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Select this check box to collect log data at the component level.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component is to be used along with DB2 components, especially with tDB2Connection and
tDB2Rollback components.
Limitation
n/a
Related scenario
This component is closely related to tDB2Connection and tDB2Rollback. It usually does not make much sense
to use one of these without using a tDB2Connection component to open a connection for the current transaction.
For tDB2Commit related scenario, see section tMysqlConnection
tDB2Connection
tDB2Connection properties
This component is closely related to tDB2Commit and tDB2Rollback. It usually does not make much sense to
use one of these without using a tDB2Connection to open a connection for the current transaction.
Component family
Databases/DB2
Function
Purpose
This component allows you to commit all of the Job data to an output database in just a single
transaction, once the data has been validated.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Host name
Port
Database
Table Schema
Encoding
Select the encoding from the list or select Custom and define it
manually. This field is compulsory for DB data handling.
Use or register a shared DB Connection
Select this check box to share your connection or fetch a connection shared by a parent or child Job. This allows you to share one single DB connection among several DB connection components from different Job levels that can be either parent or child.
This option is incompatible with the Use dynamic job and
Use an independent process to run subjob options of the
tRunJob component. Using a shared database connection
together with a tRunJob component with either of these
two options enabled will cause your Job to fail.
Shared DB Connection Name: set or type in the shared connection
name.
Advanced settings
Additional
parameters
Auto commit
tStatCatcher Statistics
Select this check box to gather the job processing metadata at a Job
level as well as at each component level.
Usage
This component is to be used along with DB2 components, especially with tDB2Commit and
tDB2Rollback.
Limitation
This component requires installation of its related jar files. For more information about the
installation of these missing jar files, see the section describing how to configure the Studio of the
Talend Installation and Upgrade Guide.
Related scenarios
This component is closely related to tDB2Commit and tDB2Rollback. It usually does not make much sense to
use one of these without using a tDB2Connection component to open a connection for the current transaction.
For tDB2Connection related scenario, see section tMysqlConnection
tDB2Input
tDB2Input properties
Component family
Databases/DB2
Function
Purpose
tDB2Input executes a DB query with a strictly defined order which must correspond to the schema definition. Then
it passes on the field list to the next component via a Main row link.
If double quotes exist in the column names of a table, the double quotation marks cannot be retrieved when
retrieving the column. Therefore, it is recommended not to use double quotes in column names in a DB2
database table.
Basic
settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Use an existing connection
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if you need to share an existing
connection between the two levels, for example, to share the connection created by the parent
Job with the child Job, you have to:
1. In the parent level, register the database connection to be shared in the Basic settings view
of the connection component which creates that very database connection.
2. In the child level, use a dedicated connection component to read that registered database
connection.
For an example about how to share a database connection across Job levels, see Talend Studio
User Guide.
Host
Port
Database
Schema
Username and Password
DB user authentication data.
Schema and Edit Schema
A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component.
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: The schema is created and stored locally for this component only. Related topic: see Talend
Studio User Guide.
Table name
Select the source table in which to capture any changes made to the data.
Query type and Query
Enter your DB query, paying particular attention to properly sequence the fields in order to match the schema definition (an illustrative query is given under Usage below).
Advanced settings
Additional JDBC parameters
Specify additional connection properties for the DB connection you are creating.
You can set the encoding parameters through this field.
Trim all the String/Char columns
Select this check box to remove leading and trailing whitespace from all the String/Char columns.
Trim column
Remove leading and trailing whitespace from defined columns.
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to choose your database
connection dynamically from multiple connections planned in your Job. This feature is useful when you need to access
database tables having the same data structure but in different databases, especially when you are working in an
environment where you cannot change your Job settings, for example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component covers all possible SQL queries for DB2 databases.
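Purely as an illustration of the Query field (the schema, table and columns are invented), the query is a Java string whose SELECT list follows the schema order, here assumed to be ID, NAME, CITY:

// Query field value; the SELECT list matches the assumed schema order ID, NAME, CITY
"SELECT ID, NAME, CITY FROM MYSCHEMA.CUSTOMERS"

Reordering the SELECT list without reordering the schema would shift values into the wrong columns.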
Limitation
This component requires installation of its related jar files. For more information about the installation of these missing
jar files, see the section describing how to configure the Studio of the Talend Installation and Upgrade Guide.
Related scenarios
For related topics, see the tDBInput scenarios:
section Scenario 1: Displaying selected data from DB table.
section Scenario 2: Using StoreSQLQuery variable.
See also the related topic in section Scenario: Reading data from different MySQL databases using dynamically
loaded connection parameters.
845
tDB2Output
tDB2Output
tDB2Output properties
Component family
Databases/DB2
Function
Purpose
tDB2Output executes the action defined on the table and/or on the data contained in the table, based on the flow
incoming from the preceding component in the Job.
Basic settings
Use an existing connection
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if you need to share an existing
connection between the two levels, for example, to share the connection created by
the parent Job with the child Job, you have to:
1. In the parent level, register the database connection to be shared in the Basic
settings view of the connection component which creates that very database
connection.
2. In the child level, use a dedicated connection component to read that registered
database connection.
For an example about how to share a database connection across Job levels, see Talend
Studio User Guide.
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Host
Port
Database
Table schema
Username
Password
Table
Name of the table to be written. Note that only one table can be written at a time.
Action on table
On the table defined, you can perform one of the following operations:
Default: No operation is carried out.
Drop and create a table: The table is removed and created again.
Create a table: The table does not exist and gets created.
Create a table if not exists: The table is created if it does not exist.
Drop a table if exists and create: The table is removed if it already exists and created again.
Clear a table: The table content is deleted.
Truncate table: The table content is deleted. This operation cannot be rolled back.
Truncate table with reuse storage: The table content is deleted and the operation cannot be rolled back. However, the existing storage allocated to the table is kept for reuse, even though it is considered empty.
Action on data
Schema and Edit Schema
A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component.
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: The schema is created and stored locally for this component only. Related topic: see
Talend Studio User Guide.
Die on error
This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.
Advanced settings
Additional JDBC parameters
Specify additional connection properties for the DB connection you are creating.
You can set the encoding parameters through this field.
Commit every
Enter the number of rows to be completed before committing batches of rows together into the
DB. This option ensures transaction quality (but not rollback) and, above all, better performance
at execution.
Additional Columns
This option is not available if you create (with or without drop) the DB table. It allows you to call SQL functions to perform actions on columns that are not insert, update or delete actions, or actions that require particular preprocessing.
Name: Type in the name of the schema column to be altered or inserted as a new column.
SQL expression: Type in the SQL statement to be executed in order to alter or insert the relevant column data.
Position: Select Before, Replace or After, depending on the action to be performed on the reference column.
Reference column: Type in a reference column that the component can use to place or replace the new or altered column.
Use field options
Select this check box to customize a request, especially when there is double action on data.
Convert columns Select this check box to uppercase the names of the columns and the name of the table.
and table names to
uppercase
Enable debug mode
Select this check box to display each step during the processing of entries in the database.
Support null in SQL WHERE statement
Select this check box if you want to deal with the Null values contained in a DB table.
Make sure the Nullable check box is selected for the corresponding columns in the schema.
Use batch size
Select this check box to activate the batch mode for data processing. In the Batch Size field that appears when this check box is selected, specify the number of rows to be processed in each batch (see the sketch under Usage below).
This check box is available only when you have selected the Insert, the Update or
the Delete option in the Action on data field.
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to choose your database
connection dynamically from multiple connections planned in your Job. This feature is useful when you need to
access database tables having the same data structure but in different databases, especially when you are working
in an environment where you cannot change your Job settings, for example, when your Job has to be deployed and
executed independent of Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is selected in the
Basic settings view. Once a dynamic parameter is defined, the Component List box in the Basic settings view
becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component offers the flexibility of the DB query and covers all possible SQL queries.
This component must be used as an output component. It allows you to carry out actions on a table or on the data of
a table in a DB2 database. It also allows you to create a reject flow using a Row > Rejects link to filter data in error.
For an example of tMySqlOutput in use, see section Scenario 3: Retrieve data in error with a Reject link.
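For orientation only, the following sketch shows what the Commit every, Use batch size and Batch Size settings amount to in plain JDBC terms; the connection details, table and row counts are invented for the example and are not defaults of the component.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class BatchedInsertSketch {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
                "jdbc:db2://db2host:50000/SAMPLE", "db2inst1", "secret");
        conn.setAutoCommit(false);
        PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO CUSTOMERS (ID, NAME) VALUES (?, ?)");
        int batchSize = 1000;     // "Batch Size"
        int commitEvery = 10000;  // "Commit every"
        for (int i = 0; i < 100000; i++) {
            ps.setInt(1, i);
            ps.setString(2, "name" + i);
            ps.addBatch();
            if ((i + 1) % batchSize == 0) ps.executeBatch();  // rows are sent to the DB in batches
            if ((i + 1) % commitEvery == 0) conn.commit();    // and committed every N rows
        }
        ps.executeBatch();
        conn.commit();
        ps.close();
        conn.close();
    }
}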
Limitation
This component requires installation of its related jar files. For more information about the installation of these
missing jar files, see the section describing how to configure the Studio of the Talend Installation and Upgrade Guide.
Related scenarios
For tDB2Output related topics, see
section Scenario: Writing a row to a table in the MySql database via an ODBC connection
section Scenario 1: Adding a new column and altering data in a DB table.
848
tDB2Rollback
tDB2Rollback
tDB2Rollback properties
This component is closely related to tDB2Commit and tDB2Connection. It usually does not make much sense
to use these components independently in a transaction.
Component family
Databases/DB2
Function
Purpose
Basic settings
Component list
Close Connection
Clear this check box to continue to use the selected connection once
the component has performed its task.
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component is to be used along with DB2 components, especially with tDB2Connection and
tDB2Commit.
Limitation
n/a
Related scenarios
For tDB2Rollback related scenario, see section Scenario: Rollback from inserting data in mother/daughter tables
of the tMysqlRollback.
849
tDB2Row
tDB2Row
tDB2Row properties
Component family
Databases/DB2
Function
tDB2Row is the specific component for this database query. It executes the SQL query stated on the specified database. The row suffix means the component implements a flow in the Job design although it doesn't provide output.
Purpose
Depending on the nature of the query and the database, tDB2Row acts on the actual DB structure or on the data (although without handling data). The SQLBuilder tool helps you easily write your SQL statements.
Basic settings
Use an existing connection
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if you need to share an
existing connection between the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection to be shared in the Basic
settings view of the connection component which creates that very database
connection.
2. In the child level, use a dedicated connection component to read that registered
database connection.
For an example about how to share a database connection across Job levels, see
Talend Studio User Guide.
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Host
Port
Database
Username
Password
Schema
Schema and Edit Schema
A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component.
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: The schema is created and stored locally for this component only. Related topic:
see Talend Studio User Guide.
Query type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: Fill in manually the query statement or build it graphically using SQLBuilder
Query
Enter your DB query, paying particular attention to properly sequence the fields in order to match the schema definition.
Die on error
This check box is selected by default. Clear the check box to skip the row on error and
complete the process for error-free rows. If needed, you can retrieve the rows on error via
a Row > Rejects link.
Advanced settings
Additional JDBC parameters
Specify additional connection properties for the DB connection you are creating.
Commit every
Number of rows to be completed before committing batches of rows together into the DB. This option ensures transaction quality (but not rollback) and, above all, better performance on execution.
Use PreparedStatement
Select this check box if you want to query the database using a PreparedStatement. In the Set PreparedStatement Parameter table, define the parameters represented by ? in the SQL instruction of the Query field in the Basic Settings tab.
Parameter Index: Enter the parameter position in the SQL instruction.
Parameter Type: Enter the parameter type.
Parameter Value: Enter the parameter value.
This option is very useful if you need to execute the same query several times, as it increases performance (a JDBC sketch of this mechanism is given below, after the Limitation row).
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to choose your database
connection dynamically from multiple connections planned in your Job. This feature is useful when you need to
access database tables having the same data structure but in different databases, especially when you are working
in an environment where you cannot change your Job settings, for example, when your Job has to be deployed and
executed independent of Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is selected in the
Basic settings view. Once a dynamic parameter is defined, the Component List box in the Basic settings view
becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component offers the flexibility of the DB query and covers all possible SQL queries.
Limitation
This component requires installation of its related jar files. For more information about the installation of these
missing jar files, see the section describing how to configure the Studio of the Talend Installation and Upgrade
Guide.
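The sketch below, with an invented table, column and parameter values, shows the plain JDBC behaviour that the Use PreparedStatement option exposes: one setXxx call per row of the Set PreparedStatement Parameter table, after which the statement can be re-executed cheaply.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class PreparedStatementSketch {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
                "jdbc:db2://db2host:50000/SAMPLE", "db2inst1", "secret");
        // Query field: one "?" per row of the Set PreparedStatement Parameter table
        PreparedStatement ps = conn.prepareStatement(
                "UPDATE CUSTOMERS SET CITY = ? WHERE ID = ?");
        ps.setString(1, "Paris"); // Parameter Index 1, Parameter Type String, Parameter Value "Paris"
        ps.setInt(2, 42);         // Parameter Index 2, Parameter Type Int,    Parameter Value 42
        ps.executeUpdate();
        ps.close();
        conn.close();
    }
}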
Related scenarios
For tDB2Row related topics, see:
section Scenario 3: Combining two flows for selective output
section Scenario: Resetting a DB auto-increment
section Scenario 1: Removing and regenerating a MySQL table index.
851
tDB2SCD
tDB2SCD
tDB2SCD belongs to two component families: Business Intelligence and Databases. For more information on it,
see section tDB2SCD.
852
tDB2SCDELT
tDB2SCDELT
tDB2SCDELT belongs to two component families: Business Intelligence and Databases. For more information
on it, see section tDB2SCDELT.
853
tDB2SP
tDB2SP
tDB2SP properties
Component family
Databases/DB2
Function
Purpose
tDB2SP offers a convenient way to centralize multiple or complex queries in a database and call them easily.
Basic settings
Use an existing connection
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if you need to share an existing
connection between the two levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection to be shared in the Basic settings
view of the connection component which creates that very database connection.
2. In the child level, use a dedicated connection component to read that registered
database connection.
For an example about how to share a database connection across Job levels, see Talend
Studio User Guide.
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Host
Port
Database
Username
Password
Schema and Edit Schema
A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component.
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: You create and store the schema locally for this component only. Related topic: see
Talend Studio User Guide.
SP Name
Parameters
Click the Plus button and select the various Schema Columns that will be required by the procedures. Note that the SP schema can hold more columns than there are parameters used in the procedure (a call sketch is given under Usage below).
Select the Type of parameter:
IN: Input parameter
OUT: Output parameter/return value
IN OUT: Input parameter to be returned as a value, likely after modification through the procedure (function).
RECORDSET: Input parameter to be returned as a set of values, rather than a single value.
Check the section Scenario: Inserting data in mother/daughter tables if you want to
analyze a set of records from a database table or DB query and return single records.
Advanced settings
Additional JDBC parameters
Specify additional connection properties for the DB connection you are creating.
You can set the encoding parameters through this field.
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to choose your database
connection dynamically from multiple connections planned in your Job. This feature is useful when you need to
access database tables having the same data structure but in different databases, especially when you are working
in an environment where you cannot change your Job settings, for example, when your Job has to be deployed and
executed independent of Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is selected in the
Basic settings view. Once a dynamic parameter is defined, the Component List box in the Basic settings view
becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component is used as an intermediary component. It can be used as a start component, but in that case only input parameters are allowed.
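As a hedged illustration of what calling a stored procedure through JDBC looks like, the sketch below invents a procedure GET_CUSTOMER_NAME with one IN and one OUT parameter; the procedure name, parameters and values are not taken from the product.

import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Types;

public class StoredProcedureSketch {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
                "jdbc:db2://db2host:50000/SAMPLE", "db2inst1", "secret");
        CallableStatement cs = conn.prepareCall("{call GET_CUSTOMER_NAME(?, ?)}");
        cs.setInt(1, 42);                          // IN parameter
        cs.registerOutParameter(2, Types.VARCHAR); // OUT parameter
        cs.execute();
        System.out.println(cs.getString(2));       // value returned in the OUT column of the schema
        cs.close();
        conn.close();
    }
}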
Limitation
This component requires installation of its related jar files. For more information about the installation of these
missing jar files, see the section describing how to configure the Studio of the Talend Installation and Upgrade
Guide.
Related scenarios
For related topic, see section Scenario: Executing a stored procedure in the MDM Hub.
Check section Scenario: Inserting data in mother/daughter tables as well if you want to analyze a set of records
from a database table or DB query and return single records.
855
tInformixBulkExec
tInformixBulkExec
tInformixBulkExec Properties
tInformixOutputBulk and tInformixBulkExec are generally used together in a two step process. In the first step,
an output file is generated. In the second step, this file is used in the INSERT operation used to feed a database.
These two steps are fused together in the tInformixOutputBulkExec component, detailed in another section. The
advantage of using two components is that data can be transformed before it is loaded in the database.
Component Family
Databases/Informix
Function
Purpose
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Execution Platform
Use an existing connection
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
Host
Port
Database
Schema
Instance
Table
Name of the table to be written. Note that only one table can be
written at a time.
Action on table
Create a table if not exists: The table is created if it does not exist.
Drop a table if exists and create: The table is removed if it already
exists and created again.
Clear a table: The table content is deleted.
Schema and Edit Schema
Informix Directory
Data file
Action on data
On the data of the table defined, you can perform the following
operations:
Insert: Add new data to the table. If duplicates are found, the job
stops.
Update: Update the existing table data.
Insert or update: inserts a new record. If the record with the given
reference already exists, an update would be made.
Update or insert: updates the record with the given reference. If the
record does not exist, a new record would be inserted.
Delete: Delete the entry data which corresponds to the input flow.
You must specify at least one key upon which the Update
and Delete operations are to be based. It is possible
to define the columns which should be used as the key
from the schema, from both the Basic Settings and the
Advanced Settings, to optimise these operations.
Advanced settings
Additional parameters
Field terminated by
Set DBMONEY
Select this check box to define the decimal separator in the Decimal separator field.
Set DBDATE
Enter the number of rows in error at which point the Job should stop.
tStatCatcher Statistics
Select this check box to collect the log data at component level.
Output
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is
selected in the Basic settings view. Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component offers database query flexibility and covers all possible Informix queries which may be required.
Limitation
The database server/client must be installed on the same machine where the Studio is installed or
where the Job using tInformixBulkExec is deployed, so that the component functions properly.
This component requires installation of its related jar files. For more information about the
installation of these missing jar files, see the section describing how to configure the Studio of the
Talend Installation and Upgrade Guide.
Related scenario
For a scenario in which tInformixBulkExec might be used, see:
section Scenario: Inserting transformed data in MySQL database.
section Scenario: Truncating and inserting file data into Oracle DB.
858
tInformixClose
tInformixClose
tInformixClose properties
Component Family
Databases/Informix
Function
Purpose
Basic settings
Component list
Advanced settings
tStatCatcher Statistics
Select this check box to collect the log data at a component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
Limitation
n/a
Related scenario
This component is for use with tInformixConnection and tInformixRollback. They are generally used along
with tInformixConnection as the latter allows you to open a connection for the transaction which is underway.
To see a scenario in which tInformixClose might be used, see section tMysqlConnection.
859
tInformixCommit
tInformixCommit
tInformixCommit properties
This component is closely related to tInformixConnection and tInformixRollback. They are generally used to
execute transactions together.
Component Family
Databases/Informix
Function
Purpose
Using a single connection, make a global commit just once instead of committing every row or batch of rows separately. This improves performance.
Basic settings
Component list
Close connection
Advanced settings
tStatCatcher Statistics
Select this check box to collect the log data at a component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component is to be used with other Informix components, particularly tInformixConnection and tInformixRollback.
Limitation
n/a
Related Scenario
This component is for use with tInformixConnection and tInformixRollback. They are generally used along with tInformixConnection as the latter allows you to open a connection for the transaction which is underway.
To see a scenario in which tInformixCommit might be used, see section tMysqlConnection.
860
tInformixConnection
tInformixConnection
tInformixConnection properties
This component is closely related to tInformixCommit and tInformixRollback. They are generally used along
with tInformixConnection, with tInformixConnection opening the connection for the transaction.
Component Family
Databases/Informix
Function
Purpose
This component allows you to commit all of the Job data to an output database in just a single
transaction, once the data has been validated.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Host
Port
Database
Schema
Username and Password
Instance
Additional parameters
Use or register a shared DB Connection
Select this check box to share your connection or fetch a connection shared by a parent or child Job. This allows you to share one single DB connection among several DB connection components from different Job levels that can be either parent or child.
This option is incompatible with the Use dynamic job and Use an independent process to run subjob options of the tRunJob component. Using a shared database connection together with a tRunJob component with either of these two options enabled will cause your Job to fail.
Shared DB Connection Name: set or type in the shared connection name.
Advanced settings
Use Transaction
tStatCatcher Statistics
Select this check box to collect the log data at a component level.
Usage
This component is generally used with other Informix components, particularly tInformixCommit
and tInformixRollback.
Limitation
This component requires installation of its related jar files. For more information about the
installation of these missing jar files, see the section describing how to configure the Studio of the
Talend Installation and Upgrade Guide.
861
Related scenario
Related scenario
For a scenario in which tInformixConnection might be used, see section Scenario: Inserting data in mother/daughter tables.
862
tInformixInput
tInformixInput
tInformixInput properties
Component family
Databases/Informix
Function
Purpose
tInformixInput executes a DB query with a strictly defined order which must correspond to the
schema definition. Then it passes on the field list to the next component via a Main row link.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Host
Port
Database
DB server
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is
selected in the Basic settings view. Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component covers all possible SQL queries for Informix databases.
Limitation
This component requires installation of its related jar files. For more information about the
installation of these missing jar files, see the section describing how to configure the Studio of the
Talend Installation and Upgrade Guide.
Related scenarios
For related topics, see the tDBInput scenarios:
section Scenario 1: Displaying selected data from DB table.
863
tInformixOutput
tInformixOutput
tInformixOutput properties
Component family
Databases/Informix
Function
Purpose
tInformixOutput executes the action defined on the table and/or on the data contained in the table,
based on the flow incoming from the preceding component in the Job.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Use an existing connection
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
Host
Port
Database
DB server
Table
Name of the table to be written. Note that only one table can be written at a time.
Action on table
Action on data
Insert: Add new entries to the table. If duplicates are found, the Job stops.
Update: Make changes to existing entries.
Insert or update: inserts a new record. If the record with the given
reference already exists, an update would be made.
Update or insert: updates the record with the given reference. If the
record does not exist, a new record would be inserted.
Delete: Remove entries corresponding to the input flow.
It is necessary to specify at least one column as a primary
key on which the Update and Delete operations are based.
You can do that by clicking Edit Schema and selecting
the check box(es) next to the column(s) you want to
set as primary key(s). For an advanced use, click the
Advanced settings view where you can simultaneously
define primary keys for the Update and Delete operations.
To do that: Select the Use field options check box and then
in the Key in update column, select the check boxes next to
the column names you want to use as a base for the Update
operation. Do the same in the Key in delete column for the
Delete operation.
Schema and Edit schema
Die on error
This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.
Advanced settings
Additional JDBC parameters
Specify additional connection properties for the DB connection you are creating. This option is not available if you have selected the Use an existing connection check box in the Basic settings.
You can press Ctrl+Space to access a list of predefined
global variables.
Commit every
Additional Columns
This option is not available if you create (with or without drop) the DB table. It allows you to call SQL functions to perform actions on columns that are not insert, update or delete actions, or actions that require particular preprocessing.
Name: Type in the name of the schema column to be altered or inserted as a new column.
SQL expression: Type in the SQL statement to be executed in order to alter or insert the relevant column data.
Position: Select Before, Replace or After, depending on the action to be performed on the reference column.
Reference column: Type in a reference column that the component can use to place or replace the new or altered column.
Enable debug mode
Select this check box to display each step during the processing of entries in the database.
Optimize the batch insertion
Ensure the check box is selected to optimize the insertion of batches of data.
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is
selected in the Basic settings view. Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component offers the flexibility of the DB query and covers all possible SQL queries.
This component must be used as an output component. It allows you to carry out actions on a table or on the data of a table in an Informix database. It also allows you to create a reject flow using a Row > Rejects link to filter data in error. For an example of tMySqlOutput in use, see section Scenario 3: Retrieve data in error with a Reject link.
Limitation
This component requires installation of its related jar files. For more information about the
installation of these missing jar files, see the section describing how to configure the Studio of the
Talend Installation and Upgrade Guide.
Related scenarios
For tInformixOutput related topics, see:
section Scenario: Writing a row to a table in the MySql database via an ODBC connection.
section Scenario 1: Adding a new column and altering data in a DB table.
867
tInformixOutputBulk
tInformixOutputBulk
tInformixOutputBulk properties
tInformixOutputBulk and tInformixBulkExec are generally used together in a two step process. In the first step,
an output file is generated. In the second step, this file is used in the INSERT operation used to feed a database.
These two steps are fused together in the tInformixOutputBulkExec component, detailed in another section. The
advantage of using two components is that data can be transformed before it is loaded in the database.
Component family
Databases/Informix
Function
Writes a file composed of columns, based on a defined delimiter and on Informix standards.
Purpose
Prepares the file to be used as a parameter in the INSERT query used to feed Informix
databases.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
File Name
Append
Select this check box to append new rows to the end of the file.
Advanced settings
Row separator
Field separator
Set DBMONEY
Select this box if you want to define the decimal separator in the
corresponding field.
Set DBDATE
Create directory if not exists
This check box is selected automatically. The option allows you to create a folder for the output file if it doesn't already exist.
Custom the flush buffer size
Select this box in order to customize the memory size used to store the data temporarily. In the Row number field, enter the number of rows at which point the memory should be freed.
Encoding
Select the encoding from the list or select Custom and define it manually. This field is compulsory for DB data handling.
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Usage
This component is generally used along with tInformixBulkExec. Together, they improve
performance levels when adding data to an Informix database.
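Purely as an illustration of the kind of delimited file this component prepares (the file name, delimiter and rows are invented; Informix unload files conventionally terminate every field, including the last, with the delimiter), a plain Java equivalent could look like this:

import java.io.FileWriter;
import java.io.PrintWriter;

public class DelimitedFileSketch {
    public static void main(String[] args) throws Exception {
        // "Append" selected: new rows are added to the end of the file
        try (PrintWriter out = new PrintWriter(new FileWriter("/tmp/customers.unl", true))) {
            out.println("1|Smith|Dublin|");   // Field separator "|", one row per line
            out.println("2|Martin|Lyon|");
        }
    }
}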
Limitation
Due to license incompatibility, one or more JARs required to use this component are not
provided. You can easily find out and add such JARs in the Integration perspective of your
studio. For details, see the section about external modules in the Talend Installation and
Upgrade Guide.
868
Related scenario
For a scenario in which tInformixOutputBulk might be used, see:
section Scenario: Inserting transformed data in MySQL database.
section Scenario: Inserting data in MySQL database.
869
tInformixOutputBulkExec
tInformixOutputBulkExec
tInformixOutputBulkExec properties
tInformixOutputBulk and tInformixBulkExec are generally used together in a two step process. In the first step,
an output file is generated. In the second step, this file is used in the INSERT operation used to feed a database.
These two steps are fused together in the tInformixOutputBulkExec component.
Component Family
Databases/Informix
Function
Purpose
Basic settings
Property Type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No properties stored centrally.
Execution platform
Use an existing connection
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
Host
Port
Database
Schema
Username and Password
Instance
Table
Name of the table to be written. Note that only one table can be
written at a time and the table must already exist for the insert
operation to be authorised.
Action on table
Create a table if not exists: The table is created if it does not exist.
Drop a table if exists and create: The table is removed if it already
exists and created again.
Clear a table: The table content is deleted.
Schema and Edit schema
Informix Directory
Data file
Append
Select this check box to add rows to the end of the file.
Action on data
Advanced settings
Additional parameters
Row separator
Fields terminated by
Set DBMONEY
Select this check box to define the decimal separator used in the
corresponding field.
Set DBDATE
Enter the number of rows in error at which point the Job should stop.
Create directory if not exists
This check box is selected by default. It creates a directory to hold the output table if required.
Custom the flush buffer size
Select this box in order to customize the memory size used to store the data temporarily. In the Row number field, enter the number of rows at which point the memory should be freed.
Encoding
Select the encoding from the list or select Custom and define it manually. This field is compulsory for DB data handling.
tStatCatcher Statistics
Select this check box to collect the log data at a component level.
Output
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is
selected in the Basic settings view. Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component is generally used when no particular transformation is required on the data to be
inserted in the database.
Limitation
The database server/client must be installed on the same machine where the Studio is installed
or where the Job using tInformixOutputBulkExec is deployed, so that the component functions
properly.
871
Related scenario
For a scenario in which tInformixOutputBulkExec might be used, see:
section Scenario: Inserting transformed data in MySQL database.
section Scenario: Inserting data in MySQL database.
872
tInformixRollback
tInformixRollback
tInformixRollback properties
This component is closely related to tInformixCommit and tInformixConnection. They are generally used
together to execute transactions.
Component family
Databases/Informix
Function
Purpose
Basic settings
Component list
Close Connection
Advanced settings
tStatCatcher Statistics
Select this check box to collect the log data at a component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component must be used with other Informix components, particularly tInformixConnection
and tInformixCommit.
Limitation
n/a
Related Scenario
For a scenario in which tInformixRollback might be used, see section Scenario: Rollback from inserting data
in mother/daughter tables.
873
tInformixRow
tInformixRow
tInformixRow properties
Component family
Databases/Informix
Function
tInformixRow is the specific component for this database query. It executes the SQL query stated on the specified database. The row suffix means the component implements a flow in the Job design although it doesn't provide output.
Purpose
Depending on the nature of the query and the database, tInformixRow acts on the actual DB structure or on the data (although without handling data). The SQLBuilder tool helps you easily write your SQL statements.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Use an existing connection
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
Host
Port
Database
Query type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: Fill in manually the query statement or build it graphically
using SQLBuilder.
Query
Die on error
This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.
Advanced settings
Additional JDBC parameters
Specify additional connection properties for the DB connection you are creating. This option is not available if you have selected the Use an existing connection check box in the Basic settings.
Propagate QUERY's recordset
Select this check box to insert the result of the query into a COLUMN of the current flow. Select this column from the use column list.
This option allows the component to have a different schema from that of the preceding component. Moreover, the column that holds the QUERY's recordset should be set to the type of Object and this component is usually followed by tParseRecordSet.
Use PreparedStatement
Select this check box if you want to query the database using
a PreparedStatement. In the Set PreparedStatement Parameter
table, define the parameters represented by ? in the SQL instruction
of the Query field in the Basic Settings tab.
Parameter Index: Enter the parameter position in the SQL
instruction.
Parameter Type: Enter the parameter type.
Parameter Value: Enter the parameter value.
This option is very useful if you need to execute the same query several times, as it increases performance.
Commit every
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is
selected in the Basic settings view. Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component offers the flexibility of the DB query and covers all possible SQL queries.
Limitation
This component requires installation of its related jar files. For more information about the
installation of these missing jar files, see the section describing how to configure the Studio of the
Talend Installation and Upgrade Guide.
Related scenarios
For related topics, see:
section Scenario 3: Combining two flows for selective output
section Scenario: Resetting a DB auto-increment.
section Scenario 1: Removing and regenerating a MySQL table index.
875
tInformixSCD
tInformixSCD
The tInformixSCD component belongs to two different families: Business Intelligence and Databases. For
further information, see section tInformixSCD.
876
tInformixSP
tInformixSP
tInformixSP properties
Component Family
Databases/Informix
Function
Purpose
tInformixSP allows you to centralise multiple and complex queries in a database and enables you
to call them more easily.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No properties stored centrally.
Use an existing connection
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
Host
Port
Database
Schema
Username and Password
Instance
SP Name
Is Function / Return result in
Select this check box if only one value must be returned. From the list, select the schema column upon which the value to be obtained is based.
Parameters
Click the Plus button and select the various Schema Columns that
will be required by the procedures. Note that the SP schema can hold
more columns than there are parameters used in the procedure.
Additional parameters
tStatCatcher Statistics
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is
selected in the Basic settings view. Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This is an intermediary component. It can also be used as a start component; in that case, only input parameters are allowed.
Limitation
Related scenario
For a scenario in which tInformixSP may be used, see:
section Scenario: Executing a stored procedure in the MDM Hub.
section Scenario: Checking number format using a stored procedure.
Also, see section Scenario: Inserting data in mother/daughter tables if you want to analyse a set of records in
a table or SQL query.
878
tMSSqlBulkExec
tMSSqlBulkExec
tMSSqlBulkExec properties
The tMSSqlOutputBulk and tMSSqlBulkExec components are used together in a two step process. In the first
step, an output file is generated. In the second step, this file is used in the INSERT operation used to feed a database.
These two steps are fused together in the tMSSqlOutputBulkExec component, detailed in a separate section. The
advantage of using a two step process is that the data can be transformed before it is loaded in the database.
Component family
Databases/MSSql
Function
Purpose
As a dedicated component, tMSSqlBulkExec offers gains in performance while carrying out the Insert operations to an MSSql database.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data is stored centrally.
Use an existing connection
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
Host
Port
Database
Schema
Table
Name of the table to be written. Note that only one table can be
written at a time and that the table must exist for the insert operation
to succeed.
Action on table
Advanced settings
Action
Additional parameters
Fields terminated
Rows terminated
First row
Type in the number of the row where the action should start.
Code page
Output
Select the type of output for the standard output of the MSSql
database:
to console,
to global variable.
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is
selected in the Basic settings view. Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component is to be used along with the tMSSqlOutputBulk component. Used together, they can offer gains in performance while feeding an MSSql database.
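For background only: one common way to feed such a generated file into SQL Server is the T-SQL BULK INSERT statement, whose options mirror the Fields terminated, Rows terminated and First row settings. Whether this component uses that statement internally is not stated here, and the table, file path and option values below are placeholders.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class MsSqlBulkInsertSketch {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
                "jdbc:sqlserver://sqlhost:1433;databaseName=Sales", "sa", "secret");
        try (Statement st = conn.createStatement()) {
            // FIELDTERMINATOR / ROWTERMINATOR / FIRSTROW correspond to the component settings
            st.execute("BULK INSERT dbo.Customers FROM 'C:\\data\\customers.txt' "
                    + "WITH (FIELDTERMINATOR = ';', ROWTERMINATOR = '\\n', FIRSTROW = 2)");
        }
        conn.close();
    }
}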
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For
details, see the section about external modules in the Talend Installation and Upgrade Guide.
Related scenarios
For use cases in relation with tMSSqlBulkExec, see the following scenarios:
section Scenario: Inserting transformed data in MySQL database.
section Scenario: Inserting data in MySQL database.
881
tMSSqlColumnList
tMSSqlColumnList
tMSSqlColumnList Properties
Component family
Databases/MS SQL
Function
Purpose
Basic settings
Component list
Table name
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component is to be used along with MSSql components, especially with tMSSqlConnection.
Limitation
n/a
Related scenario
For tMSSqlColumnList related scenario, see section Scenario: Iterating on a DB table and listing its column
names.
tMSSqlClose
tMSSqlClose properties
Component family
Databases/MSSql
Function
Purpose
Close a transaction.
Basic settings
Component list
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component is to be used along with MSSql components, especially with tMSSqlConnection
and tMSSqlCommit.
Limitation
n/a
Related scenario
No scenario is available for this component yet.
tMSSqlCommit
tMSSqlCommit properties
This component is closely related to tMSSqlConnection and tMSSqlRollback. It usually does not make much
sense to use these components independently in a transaction.
Component family
Databases/MSSql
Function
tMSSqlCommit validates the data processed through the job into the connected DB.
Purpose
Using a unique connection, this component commits in one go a global transaction instead of doing
that on every row or every batch and thus provides gain in performance.
Basic settings
Component list
Close connection
Clear this check box to continue to use the selected connection once the component has performed its task.
Advanced settings
tStatCatcher Statistics
Select this check box to gather the job processing metadata at a job
level as well as at each component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component is to be used along with MSSql components, especially with tMSSqlConnection
and tMSSqlRollback components.
Limitation
n/a
Related scenarios
This component is closely related to tMSSqlConnection and tMSSqlRollback. It usually does not make much
sense to use one of these without using a tMSSqlConnection component to open a connection for the current
transaction.
For a tMSSqlCommit related scenario, see section tMSSqlConnection.
tMSSqlConnection
tMSSqlConnection properties
This component is closely related to tMSSqlCommit and tMSSqlRollback. Both components are usually used
with a tMSSqlConnection component to open a connection for the current transaction.
Component family
Databases/MSSQL
Function
Purpose
This component allows you to commit all of the Job data to an output database in just a single
transaction, once the data has been validated.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Host
Port
Schema
Schema name.
Database
Additional parameters
Use or register a shared DB Connection
Select this check box to share your connection or fetch a connection shared by a parent or child Job. This allows you to share one single DB connection among several DB connection components from different Job levels that can be either parent or child.
This option is incompatible with the Use dynamic job and
Use an independent process to run subjob options of the
tRunJob component. Using a shared database connection
together with a tRunJob component with either of these
two options enabled will cause your Job to fail.
Shared DB Connection Name: set or type in the shared connection
name.
Advanced settings
Auto commit
tStatCatcher Statistics
Select this check box to gather the job processing metadata at a Job level as well as at each component level.
Usage
This component is to be used along with MSSql components, especially with tMSSqlCommit and tMSSqlRollback.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For
details, see the section about external modules in the Talend Installation and Upgrade Guide.
Scenario: Inserting data into a database table and extracting useful information from it
1. Drop the following components from the Palette onto the design workspace: tMSSqlConnection, tFileInputDelimited, tMSSqlOutput, tMSSqlInput, tLogRow, and tMSSqlCommit.
1. Double-click the tMSSqlConnection component to open its Basic settings view in the Component tab.
2. In the Host field, type in the IP address or hostname of the MSSQL server, 192.168.30.47 in this example.
3. In the Port field, type in the port number of the database server, 1433 in this example.
4. In the Schema field, type in the schema name, dbo in this example.
5. In the Database field, type in the database name, talend in this example.
6. In the Username and Password fields, enter the credentials for the MSSQL connection.
2. Click the [...] button next to the File Name/Stream field to browse to the input file. In this example, it is D:/Input/Employee_Wage.txt. This text file holds three columns: id, name and wage.
id;name;wage
51;Harry;2300
40;Ronald;3796
17;Theodore;2174
21;James;1986
2;George;2591
89;Calvin;2362
84;Ulysses;3383
4;Lyndon;2264
17;Franklin;1780
86;Lyndon;3999
3. In the Header field, type in 1 to skip the first row of the input file.
4. Click Edit schema to define the data to pass on to the tMSSqlOutput component. In this example, we define id as the key, and specify the length and precision for each column respectively.
Click OK to close the schema editor. A dialog box opens, and you can choose to propagate the schema to the next component.
1. Double-click the tMSSqlOutput component to open its Basic settings view in the Component tab.
2. Type in the required information for the connection or use the existing connection you have configured before. In this example, we select the Use an existing connection check box. If multiple connections are available, select the connection you want to use from the Component List drop-down list.
3. In the Table field, type in the name of the table you want to write the data to: Wage_Info in this example. You can also click the [...] button next to the Table field to open a dialog box and select a proper table.
4. Select Create table if not exists from the Action on table drop-down list.
5. Select Insert if not exists from the Action on data drop-down list.
6. Click Sync columns to retrieve the schema from the preceding component.
1. Double-click the tMSSqlInput component to open its Basic settings view in the Component tab.
2. Select the Use an existing connection check box. If multiple connections are available, select the connection you want to use from the Component List drop-down list.
3. Click Edit schema to define the data structure to be read from the table. In this example, we need to read all three columns from the table.
4. In the Table Name field, type in the name of the table you want to read the data from: Wage_Info in this example.
5. In the Query field, fill in the SQL query to be executed on the table specified. To obtain the data of employees whose wages are above the average value and order them by id, enter the SQL query as follows:
SELECT *
FROM Wage_Info
WHERE wage >
    (SELECT avg(wage)
     FROM Wage_Info)
ORDER BY id
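The inner select computes the average wage over the whole table; the outer select then keeps only the rows whose wage exceeds that average and sorts them by id.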
tMSSqlInput
tMSSqlInput properties
Component family
Databases/MSSql
Function
Purpose
tMSSqlInput executes a DB query with a strictly defined order which must correspond to the
schema definition. Then it passes on the field list to the next component via a Main row link.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Use an existing connection
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
Host
Port
Database
Schema
Additional parameters
Trim all the String/Char columns
Select this check box to remove leading and trailing whitespace from all the String/Char columns.
Trim column
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is
selected in the Basic settings view. Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component covers all possible SQL queries for MS SQL server databases.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For
details, see the section about external modules in the Talend Installation and Upgrade Guide.
Related scenarios
Related topics in tDBInput scenarios:
section tMSSqlConnection
section Scenario 2: Using StoreSQLQuery variable.
For related topic in tContextLoad, see section Scenario: Reading data from different MySQL databases using
dynamically loaded connection parameters.
tMSSqlLastInsertId
tMSSqlLastInsertId properties
Component Family
Function
tMSSqlLastInsertId displays the last IDs added to a table over a specified MSSql connection.
Purpose
tMSSqlLastInsertId enables you to retrieve the last primary keys added by a user to a MSSql table.
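On the SQL side, SQL Server exposes the last generated identity value through functions such as SCOPE_IDENTITY() and IDENT_CURRENT(); the sketch below only illustrates that mechanism, not the exact statement the component generates, and the table name is a placeholder:

SELECT SCOPE_IDENTITY();
SELECT IDENT_CURRENT('Wage_Info');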
Basic settings
Component list
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component offers the flexibility of the DB query and covers all possible SQL queries.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For
details, see the section about external modules in the Talend Installation and Upgrade Guide.
Related scenario
For a related scenario, see section Scenario: Get the ID for the last inserted record.
tMSSqlOutput
tMSSqlOutput properties
Component family
Databases/MS SQL server
Function
Purpose
tMSSqlOutput executes the action defined on the table and/or on the data contained in the table, based on the
flow incoming from the preceding component in the job.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Use an existing connection
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if you need to share an existing
connection between the two levels, for example, to share the connection created by
the parent Job with the child Job, you have to:
1. In the parent level, register the database connection to be shared in the Basic
settings view of the connection component which creates that very database
connection.
2. In the child level, use a dedicated connection component to read that registered
database connection.
For an example about how to share a database connection across Job levels, see
Talend Studio User Guide.
Host
Port
Schema
Database
Username
Password
Table
Name of the table to be written. Note that only one table can be written at a time.
Action on table
On the table defined, you can perform one of the following operations:
Default: No operation is carried out.
Drop and create table: The table is removed and created again.
Create table: The table does not exist and gets created.
Create table if not exists: The table is created if it does not exist.
Drop table if exists and create: The table is removed if it already exists and created again.
Clear table: The table content is deleted. You have the possibility to rollback the operation.
Truncate table: The table content is deleted. You do not have the possibility to rollback the
operation.
Turn on identity insert
Select this check box to use your own sequence for the identity value of the inserted records (instead of having the SQL Server pick the next sequential value).
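For orientation, this option corresponds to SQL Server's IDENTITY_INSERT switch. A minimal hand-written equivalent, reusing the table and columns from the scenario earlier in this chapter as placeholders, would be:

SET IDENTITY_INSERT Wage_Info ON;
INSERT INTO Wage_Info (id, name, wage) VALUES (51, 'Harry', 2300);
SET IDENTITY_INSERT Wage_Info OFF;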
Action on data
Insert: Add new entries to the table. If duplicates are found, the Job stops.
Single Insert Query: Add entries to the table in a batch.
Update: Make changes to existing entries.
Insert or update: Inserts a new record. If the record with the given reference already exists, an update would be made.
Update or insert: Updates the record with the given reference. If the record does not exist, a new record would be inserted.
Delete: Remove entries corresponding to the input flow.
Insert if not exist: Add new entries to the table if they do not exist.
It is necessary to specify at least one column as a primary key on which the
Update and Delete operations are based. You can do that by clicking Edit Schema
and selecting the check box(es) next to the column(s) you want to set as primary
key(s). For an advanced use, click the Advanced settings view where you can
simultaneously define primary keys for the Update and Delete operations. To do
that: Select the Use field options check box and then in the Key in update column,
select the check boxes next to the column names you want to use as a base for the
Update operation. Do the same in the Key in delete column for the Delete operation.
Specify identity field
Select this check box to specify the identity field, which is made up of an automatically incrementing identification number. When this check box is selected, three other fields display:
Identity field: select the column you want to define as the identity field from the list.
Start value: type in a start value, used for the very first row loaded into the table.
Step: type in an incremental value, added to the value of the previous row that was loaded.
You can also specify the identity field from the schema of the component. To do so, set the DB Type of the relevant column to INT IDENTITY.
When the Specify identity field check box is selected, the INT IDENTITY DB Type in the schema is ignored.
Schema and Edit schema
A schema is a row description. It defines the number of fields to be processed and passed on to the next component.
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Click Edit Schema to make changes to the schema.
Built-in: You create and store the schema locally for this component only. Related topic: see Talend Studio User Guide.
Die on error
This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.
Advanced settings
Additional JDBC parameters
Specify additional connection properties for the DB connection you are creating. This option is not available if you have selected the Use an existing connection check box in the Basic settings.
You can press Ctrl+Space to access a list of predefined global variables.
Commit every
Enter the number of rows to be completed before committing batches of rows together into
the DB. This option ensures transaction quality (but not rollback) and, above all, better
performance at execution.
Additional Columns
This option is not offered if you create (with or without drop) the DB table. This option allows you to call SQL functions to perform actions on columns, which are not insert, update or delete actions, or actions that require particular preprocessing.
Name: Type in the name of the schema column to be altered or inserted as a new column.
SQL expression: Type in the SQL statement to be executed in order to alter or insert the relevant column data.
Position: Select Before, Replace or After following the action to be performed on the
reference column.
Reference column: Type in a column of reference that the tDBOutput can use to place or
replace the new or altered column.
Use field options
Select this check box to customize a request, especially when there is double action on data.
Ignore date validation
Select this check box to ignore the date validation and insert the data directly into the database for the data types of DATE, DATETIME, DATETIME2 and DATETIMEOFFSET.
Enable debug mode
Select this check box to display each step during processing entries in a database.
Support null in SQL WHERE statement
Select this check box if you want to deal with the Null values contained in a DB table.
Make sure that the Nullable check box is selected for the corresponding columns in the schema.
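The underlying issue is plain SQL behaviour: a comparison with NULL never matches. A generated clause such as WHERE name = ? skips rows where name is NULL, whereas a null-safe rewrite along the following lines does not (the exact clause the component generates may differ; this is only an illustration):

WHERE (name = ? OR (name IS NULL AND ? IS NULL))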
Use batch size
Select this check box to activate the batch mode for data processing. In the Batch Size field
that appears when this check box is selected, you can type in the number you need to define
the batch size to be processed.
This check box is available only when you have selected the Insert, the Update, the Single
Insert Query or the Delete option in the Action on data list.
If you select the Single Insert Query option in the Action on data list, be aware
that the batch size must be lower than or equal to the limit of parameter markers
authorized by the JDBC driver (generally 2000) divided by the number of columns.
For more information, see Limitation below.
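As a worked example: for a table with three columns and the usual 2000-marker limit, the batch size must not exceed 2000 / 3, that is, 666 rows.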
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to choose your database
connection dynamically from multiple connections planned in your Job. This feature is useful when you need to
access database tables having the same data structure but in different databases, especially when you are working
in an environment where you cannot change your Job settings, for example, when your Job has to be deployed
and executed independent of Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is selected in the
Basic settings view. Once a dynamic parameter is defined, the Component List box in the Basic settings view
becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Global Variables
NB_LINE: Indicates the number of rows read by an input component or transferred to an output component. This
is an After variable and it returns an integer.
NB_LINE_UPDATED: Indicates the number of rows updated. This is an After variable and it returns an integer.
NB_LINE_INSERTED: Indicates the number of rows inserted. This is an After variable and it returns an integer.
NB_LINE_DELETED: Indicates the number of rows deleted. This is an After variable and it returns an integer.
NB_LINE_REJECTED: Indicates the number of rows rejected. This is an After variable and it returns an integer.
QUERY: Indicates the query to be processed. This is an After variable and it returns a string.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable
to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable means it functions
after the execution of a component.
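For example, after a tMSSqlOutput component named tMSSqlOutput_1 executes, the expression ((Integer)globalMap.get("tMSSqlOutput_1_NB_LINE_INSERTED")) holds the number of rows inserted; the component name here is illustrative.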
Usage
This component offers the flexibility of the DB query and covers all possible SQL queries.
This component must be used as an output component. It allows you to carry out actions on a table or on the data
of a table in a MSSql database. It also allows you to create a reject flow using a Row > Rejects link to filter data in
error. For an example of tMysqlOutput in use, see section Scenario 3: Retrieve data in error with a Reject link.
Limitation
When the Single Insert Query option is selected in the Action on data list, an SQL Prepared Statement is generated, for example, INSERT INTO table (col1, col2, col3) VALUES (?,?,?), (?,?,?), (?,?,?), (?,?,?). Within brackets are the groups of parameters, the number of which generally cannot exceed 2000, depending on the JDBC driver. Therefore, the batch size should be set so that this limit is respected.
Due to license incompatibility, one or more JARs required to use this component are not provided. You can easily find out and add such JARs in the Integration perspective of your studio. For details, see the section about external modules in the Talend Installation and Upgrade Guide.
Related scenarios
For tMSSqlOutput related topics, see:
section tMSSqlConnection.
section Scenario 1: Adding a new column and altering data in a DB table.
tMSSqlOutputBulk
tMSSqlOutputBulk properties
The tMSSqlOutputBulk and tMSSqlBulkExec components are used together in a two-step process. In the first
step, an output file is generated. In the second step, this file is used in the INSERT operation used to feed a database.
These two steps are fused together in the tMSSqlOutputBulkExec component, detailed in a separate section. The
advantage of using a two-step process is that the data can be transformed before it is loaded into the database.
Component family
Databases/MSSql
Function
Writes a file with columns based on the defined delimiter and the MSSql standards.
Purpose
Prepares the file to be used as parameter in the INSERT query to feed the MSSql database.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
File Name
Append
Select this check box to add the new rows at the end of the records.
Advanced settings
Usage
Row separator
Field separator
Include header
Encoding
Select the encoding from the list or select Custom and define it
manually. This field is compulsory for DB data handling.
tStatCatcher Statistics
Select this check box to collect log data at the component level.
This component is to be used along with the tMSSqlBulkExec component. Used together, they
offer gains in performance while feeding a MSSql database.
Related scenarios
For use cases in relation with tMSSqlOutputBulk, see the following scenarios:
section Scenario: Inserting transformed data in MySQL database.
section Scenario: Inserting data in MySQL database.
tMSSqlOutputBulkExec
tMSSqlOutputBulkExec properties
The tMSSqlOutputBulk and tMSSqlBulkExec components are used together in a two-step process. In the first
step, an output file is generated. In the second step, this file is used in the INSERT operation used to feed a database.
These two steps are fused together in the tMSSqlOutputBulkExec component.
Component family
Databases/MSSql
Function
Purpose
As a dedicated component, it allows gains in performance during Insert operations to a MSSql database.
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Use an existing connection
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if you need to share an existing
connection between the two levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection to be shared in the Basic settings
view of the connection component which creates that very database connection.
2. In the child level, use a dedicated connection component to read that registered database
connection.
For an example about how to share a database connection across Job levels, see Talend
Studio User Guide.
Host
Port
DB name
Schema
Username and Password
DB user authentication data.
Table
Action on table
On the table defined, you can perform one of the following operations:
None: No operation is carried out.
Drop and create a table: The table is removed and created again.
Create a table: The table does not exist and gets created.
Create a table if not exists: The table is created if it does not exist.
Truncate table: The table content is deleted. You do not have the possibility to rollback the operation.
Clear a table: The table content is deleted. You have the possibility to rollback the operation.
Schema and Edit schema
A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component.
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: You create and store the schema locally for this component only. Related topic: see Talend Studio User Guide.
File Name
Advanced settings
Append
Select this check box to add the new rows at the end of the records.
Additional JDBC parameters
Specify additional connection properties for the DB connection you are creating. This option is not
available if you have selected the Use an existing connection check box in the Basic settings.
You can press Ctrl+Space to access a list of predefined global variables.
Field separator
Row separator
First row
Type in the number of the row where the action should start.
Include header
Code page
OEM code pages used to map a specific set of characters to numerical code point values.
Encoding
Select the encoding from the list or select Custom and define it manually. This field is compulsory for DB data handling.
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to choose your database
connection dynamically from multiple connections planned in your Job. This feature is useful when you need to
access database tables having the same data structure but in different databases, especially when you are working
in an environment where you cannot change your Job settings, for example, when your Job has to be deployed and
executed independent of Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is selected in the
Basic settings view. Once a dynamic parameter is defined, the Component List box in the Basic settings view
becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component is mainly used when no particular transformation is required on the data to be loaded into the database.
Limitation
The database server must be installed on the same machine where the Studio is installed or where the Job using
tMSSqlOutputBulkExec is deployed, so that the component functions properly.
Related scenarios
For use cases in relation with tMSSqlOutputBulkExec, see the following scenarios:
section Scenario: Inserting transformed data in MySQL database.
section Scenario: Inserting data in MySQL database.
tMSSqlRollback
tMSSqlRollback properties
This component is closely related to tMSSqlCommit and tMSSqlConnection. It usually doesn't make much
sense to use these components independently in a transaction.
Component family
Databases
Function
Purpose
Basic settings
Component list
Close Connection
Clear this check box to continue to use the selected connection once
the component has performed its task.
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component is to be used along with MSSql components, especially with tMSSqlConnection
and tMSSqlCommit components.
Limitation
n/a
Related scenario
For tMSSqlRollback related scenario, see section Scenario: Rollback from inserting data in mother/daughter
tables.
tMSSqlRow
tMSSqlRow properties
Component family
Databases/MSSql
Function
tMSSqlRow is the specific component for this database query. It executes the SQL query stated on the specified database. The row suffix means the component implements a flow in the Job design although it doesn't provide output.
Purpose
Depending on the nature of the query and the database, tMSSqlRow acts on the actual DB structure or on the data (although without handling data). The SQLBuilder tool helps you easily write your SQL statements.
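As an illustration, queries executed through this component typically act on structure or data without returning a flow; a hypothetical example against the scenario table used earlier in this chapter:

update Wage_Info set wage = wage + 100 where id = 51;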
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Use an existing connection
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
Host
Port
Database
Schema
Table name
Turn on identity insert
Select this check box to use your own sequence for the identity value of the inserted records (instead of having the SQL Server pick the next sequential value).
Query type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: Fill in manually the query statement or build it graphically using SQLBuilder.
Guess Query
Query
Die on error
This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.
Advanced settings
Additional parameters
Propagate recordset
Use PreparedStatement
Commit every
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is
selected in the Basic settings view. Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component offers the flexibility of the DB query and covers all possible SQL queries.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For
details, see the section about external modules in the Talend Installation and Upgrade Guide.
Related scenarios
For related topics, see:
section Scenario 3: Combining two flows for selective output.
section Scenario: Resetting a DB auto-increment.
section Scenario 1: Removing and regenerating a MySQL table index.
tMSSqlSCD
tMSSqlSCD belongs to two component families: Business Intelligence and Databases. For more information on
it, see section tMSSqlSCD.
tMSSqlSP
tMSSqlSP Properties
Component family
Databases/MSSql
Function
Purpose
tMSSqlSP offers a convenient way to centralize multiple or complex queries in a database and
call them easily.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Use an existing connection
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
Host
Port
Database
Schema
SP Name
Is Function / Return result in
Select this check box if only a value is to be returned. Select on the list the schema column on which the value to be returned is based.
Parameters
Click the Plus button and select the various Schema Columns that will be required by the procedure. Note that the SP schema can hold more columns than there are parameters used in the procedure.
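For orientation, a stored procedure called through this component might look like the following; the name, parameters and table are hypothetical, chosen only to show how schema columns map onto IN and OUT parameters:

CREATE PROCEDURE dbo.get_wage @id INT, @wage INT OUTPUT
AS
BEGIN
    SELECT @wage = wage FROM Wage_Info WHERE id = @id;
END;

In the Parameters table, one schema column would then be mapped to @id as an IN parameter and another to @wage as an OUT parameter.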
Additional parameters
tStatCatcher Statistics
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is
selected in the Basic settings view. Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component is used as an intermediary component. It can be used as a start component, but in that case only input parameters are allowed.
Limitation
Related scenario
For related scenarios, see:
section Scenario: Executing a stored procedure in the MDM Hub.
section Scenario: Checking number format using a stored procedure.
See also section Scenario: Inserting data in mother/daughter tables to analyze a set of records from a database table or DB query and return single records.
tMSSqlTableList
tMSSqlTableList Properties
Component family
Databases/MS SQL
Function
Purpose
Lists the names of a given set of MSSql tables using a select statement based on a Where clause.
Basic settings
Component list
Where clause for table name selection
Enter the Where clause to identify the tables to iterate on.
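For instance, to iterate only on the customer table used in the tMysqlColumnList scenario later in this chapter, a clause along these lines could be entered (the exact column the clause filters on depends on how the component builds its select statement, so treat this as an assumption):

name = 'customer'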
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to choose your database connection dynamically from multiple connections planned in your Job. This feature is useful when you need to access database tables having the same data structure but in different databases, especially when you are working in an environment where you cannot change your Job settings, for example, when your Job has to be deployed and executed independent of Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component is to be used along with MSSql components, especially with tMSSqlConnection.
Limitation
n/a
Related scenario
For tMSSqlTableList related scenario, see section Scenario: Iterating on a DB table and listing its column names.
tMysqlBulkExec
tMysqlBulkExec properties
The tMysqlOutputBulk and tMysqlBulkExec components are used together in a two-step process. In the first
step, an output file is generated. In the second step, this file is used in the INSERT statement used to feed a database.
These two steps are fused together in the tMysqlOutputBulkExec component, detailed in a separate section. The
advantage of using two separate steps is that the data can be transformed before it is loaded into the database.
Component family
Databases/Mysql
Function
Executes the Insert action on the data provided.
Purpose
As a dedicated component, tMysqlBulkExec offers gains in performance while carrying out the Insert operations to a Mysql database.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
DB Version
Use an existing connection
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if you need to share an existing connection
between the two levels, for example, to share the connection created by the parent Job with the
child Job, you have to:
1. In the parent level, register the database connection to be shared in the Basic settings view of
the connection component which creates that very database connection.
2. In the child level, use a dedicated connection component to read that registered database
connection.
For an example about how to share a database connection across Job levels, see Talend Studio
User Guide.
Host
Port
Database
Username and Password
Action on table
On the table defined, you can perform one of the following operations:
None: No operation is carried out.
Drop and create table: The table is removed and created again.
Create table: The table does not exist and gets created.
Create table if not exists: The table is created if it does not exist.
Drop table if exists and create: The table is removed if it already exists and created again.
Clear table: The table content is deleted. You have the possibility to rollback the operation.
Truncate table: The table content is deleted. You do not have the possibility to rollback the operation.
Table
Name of the table to be written. Note that only one table can be written at a time and that the table must exist for the insert operation to succeed.
Local file Name
Name of the file to be loaded.
This file should be located on the same machine where the Studio is installed or where the Job using tMysqlBulkExec is deployed.
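The component's settings map onto MySQL's LOAD DATA statement; a hand-written sketch of what the load amounts to, with file, table and delimiter values reused from examples in this chapter as placeholders, might read:

LOAD DATA LOCAL INFILE 'D:/Input/Employee_Wage.txt'
INTO TABLE Wage_Info
FIELDS TERMINATED BY ';'
LINES TERMINATED BY '\n';

The Lines terminated by, Fields terminated by and Enclosed by settings described below correspond directly to the clauses of such a statement.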
Schema and Edit Schema
A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component.
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Click Edit Schema to make changes to the schema.
Built-in: You create and store the schema locally for this component only. Related topic: see Talend Studio User Guide.
Advanced settings
Additional JDBC parameters
Specify additional connection properties for the DB connection you are creating. This option is not available if you have selected the Use an existing connection check box in the Basic settings.
Lines terminated by
Fields terminated by
Enclosed by
Action on data
Records contain NULL value
Check this box if you want to retrieve the null values from the input data flow. If you do not check this box, the null values from the input data flow will be considered as empty fields in the output data flow.
Encoding
Select the encoding from the list or select Custom and define it manually. This field is compulsory for DB data handling.
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to choose your database connection dynamically from multiple connections planned in your Job. This feature is useful when you need to access database tables having the same data structure but in different databases, especially when you are working in an environment where you cannot change your Job settings, for example, when your Job has to be deployed and executed independent of Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is selected in the Basic settings view. Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component is to be used along with the tMysqlOutputBulk component. Used together, they can offer gains in performance while feeding a Mysql database.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided. You can easily find
out and add such JARs in the Integration perspective of your studio. For details, see the section about external modules
in the Talend Installation and Upgrade Guide.
Related scenarios
For use cases in relation with tMysqlBulkExec, see the following scenarios:
section Scenario: Inserting transformed data in MySQL database.
section Scenario: Inserting data in MySQL database.
tMysqlClose
tMysqlClose properties
Component family
Databases/MySQL
Function
Purpose
Close a transaction.
Basic settings
Component list
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component is to be used along with Mysql components, especially with tMysqlConnection
and tMysqlCommit.
Limitation
n/a
Related scenario
No scenario is available for this component yet.
tMysqlColumnList
tMysqlColumnList Properties
Component family
Databases/MySQL
Function
Purpose
Basic settings
Component list
Table name
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component is to be used along with Mysql components, especially with tMysqlConnection.
Limitation
n/a
Scenario: Iterating on a DB table and listing its column names
In the design workspace, select tMysqlConnection and click the Component tab to define its basic settings.
In the Basic settings view, set the database connection details manually or select them from the context variable
list, through a Ctrl+Space click in the corresponding field if you have stored them locally as Metadata DB
connection entries.
For more information about Metadata, see Talend Studio User Guide.
On the Component list, select the relevant Mysql connection component if more than one connection is used.
Enter a Where clause using the right syntax in the corresponding field to iterate on the table name(s) you want
to list on the console.
In this scenario, the table we want to iterate on is called customer.
In the design workspace, select tMysqlColumnList and click the Component tab to define its basic settings.
On the Component list, select the relevant Mysql connection component if more than one connection is used.
In the Table name field, enter the name of the DB table you want to list its column names.
In this scenario, we want to list the columns present in the DB table called customer.
In the design workspace, select tFixedFlowInput and click the Component tab to define its basic settings.
Set the Schema to Built-In and click the three-dot [...] button next to Edit Schema to define the data you want
to use as input. In this scenario, the schema is made of two columns, the first for the table name and the second
for the column name.
Click OK to close the dialog box, and accept propagating the changes when prompted by the system. The
defined columns display in the Values panel of the Basic settings view.
Click in the Value cell for each of the two defined columns and press Ctrl+Space to access the global variable
list.
From the global variable list, select ((String)globalMap.get("tMysqlTableList_1_CURRENT_TABLE")) and
((String)globalMap.get("tMysqlColumnList_1_COLUMN_NAME")) for the TableName and ColumnName
respectively.
The name of the DB table is displayed on the console along with all its column names.
tMysqlCommit
tMysqlCommit Properties
This component is closely related to tMysqlConnection and tMysqlRollback. It usually doesn't make much
sense to use these components independently in a transaction.
Component family
Databases/MySQL
Function
Validates the data processed through the Job into the connected DB.
Purpose
Using a unique connection, this component commits in one go a global transaction instead of doing
that on every row or every batch and thus provides gain in performance.
Basic settings
Component list
Close Connection
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to choose your database connection dynamically from multiple connections planned in your Job. This feature is useful when you need to access database tables having the same data structure but in different databases, especially when you are working in an environment where you cannot change your Job settings, for example, when your Job has to be deployed and executed independent of Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component is to be used along with Mysql components, especially with tMysqlConnection
and tMysqlRollback components.
Limitation
n/a
Related scenario
This component is closely related to tMysqlConnection and tMysqlRollback. It usually doesn't make much sense
to use one of these without using a tMysqlConnection component to open a connection for the current transaction.
For tMysqlCommit related scenario, see section Scenario: Inserting data in mother/daughter tables.
tMysqlConnection
tMysqlConnection Properties
This component is closely related to tMysqlCommit and tMysqlRollback. It usually doesn't make much sense to
use one of these without using a tMysqlConnection component to open a connection for the current transaction.
Component family
Databases/MySQL
Function
Purpose
This component allows you to commit all of the Job data to an output database in just a single
transaction, once the data has been validated.
Basic settings
Property type
Host
Port
Database
Additional
parameters
Use or register a shared DB Connection
Select this check box to share your connection or fetch a connection shared by a parent or child Job. This allows you to share one single DB connection among several DB connection components from different Job levels that can be either parent or child.
This option is incompatible with the Use dynamic job and
Use an independent process to run subjob options of the
tRunJob component. Using a shared database connection
together with a tRunJob component with either of these
two options enabled will cause your Job to fail.
Shared DB Connection Name: set or type in the shared connection
name.
Specify a data source alias
Select this check box and specify the alias of a data source created
on the side to use the shared connection pool defined in the data
source configuration. This option works only when you deploy and
run your Job in .
Usage
This component is to be used along with Mysql components, especially with tMysqlCommit and
tMysqlRollback components.
Limitation
n/a
Scenario: Inserting data in mother/daughter tables
2. Once connected to the relevant database, type in the following command to create the parent table:
create table f1090_mum(id int not null auto_increment, name varchar(10), primary key(id)) engine=innodb;
3.
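The command shown for this step did not survive in this copy; judging from the columns the scenario uses later (id_baby and years), the child table would be created with something along these lines (a reconstruction, not the original command):

create table f1090_baby(id_baby int not null, years int) engine=innodb;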
1. Back in Talend Studio, the Job requires seven components including tMysqlConnection and tMysqlCommit. Drag and drop the following components from the Palette: a tFileList, a tFileInputDelimited, a tMap, a tMysqlConnection, a tMysqlCommit and two tMysqlOutput components.
4. Connect the tFileList component to the input file component using an Iterate link as the name of the file to be processed will be dynamically filled in from the tFileList directory using a global variable.
5. Connect the tFileInputDelimited component to the tMap and dispatch the flow between the two output Mysql DB components. Use a Row link for each of these connections representing the main data flow.
1. Set the tFileList component properties, such as the directory name where files will be fetched from.
3. On the tFileInputDelimited component's Basic settings panel, press Ctrl+Space to access the variable list. Set the File Name field to the global variable: tFileList_1.CURRENT_FILEPATH
4. Set the rest of the fields as usual, defining the row and field separators according to your file structure.
5. Then set the schema manually through the Edit schema feature or select the schema from the Repository. In Java version, make sure the data type is correctly set, in accordance with the nature of the data processed.
6. In the tMap Output area, add two output tables, one called mum for the parent table, the second called baby, for the child table.
Drag the Name column from the Input area, and drop it to the mum table.
Drag the Years column from the Input area and drop it to the baby table.
7. Make sure the mum table is on top of the baby table, as the order determines the flow sequence and hence allows the DB inserts to perform correctly.
Connect the output row links to distribute the flow correctly to the relevant DB output components.
8. In each of the tMysqlOutput components' Basic settings panel, select the Use an existing connection check box to retrieve the tMysqlConnection details.
9. Set the Table name, making sure it corresponds to the correct table, in this example either f1090_mum or f1090_baby.
There is no action on the table as the tables are already created.
Select Insert as Action on data for both output components.
Click on Sync columns to retrieve the schema set in the tMap.
10. In the Additional columns area of the DB output component corresponding to the child table (f1090_baby),
set the id_baby column so that it reuses the id from the parent table.
11. In the SQL expression field, type in: "(Select Last_Insert_id())"
The position is Before and the Reference column is years.
In the Advanced settings panel, clear the Extend insert check box.
The parent table id has been reused to feed the id_baby column.
tMysqlInput
tMysqlInput properties
Component family
Databases/MySQL
Function
Purpose
tMysqlInput executes a DB query with a strictly defined order which must correspond to the schema
definition. Then it passes on the field list to the next component via a Main row link.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Use an existing connection
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if you need to
share an existing connection between the two levels, for example, to
share the connection created by the parent Job with the child Job, you
have to:
1. In the parent level, register the database connection to be shared in
the Basic settings view of the connection component which creates
that very database connection.
2. In the child level, use a dedicated connection component to read that
registered database connection.
For an example about how to share a database connection across Job
levels, see Talend Studio User Guide.
Host: Database server IP address.

Port: Listening port number of the DB server.

Database: Name of the database.

Username and Password: DB user authentication data.
Schema and Edit schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component.

If you are using Talend Open Studio for Big Data, only the Built-in mode is available.

Built-in: You create and store the schema locally for this component only. Related topic: see Talend Studio User Guide.
Table Name: Name of the table to be read.

Query type and Query: Enter your DB query, paying particular attention to properly sequence the fields in order to match the schema definition.
Specify a data source alias: Select this check box and specify the alias of a data source created on the Talend Runtime side to use the shared connection pool defined in the data source configuration. This option works only when you deploy and run your Job in Talend Runtime.

If you use the component's own DB configuration, your data source connection will be closed at the end of the component. To prevent this from happening, use a shared DB connection with the data source alias specified.
This check box is not available when the Use an existing connection check box
is selected.
Advanced settings

Additional JDBC parameters: Specify additional connection properties for the DB connection you are creating. This option is not available if you have selected the Use an existing connection check box in the Basic settings.

When you need to handle data of the time-stamp type 0000-00-00 00:00:00 using this component, set the parameter as: noDatetimeStringSync=true&zeroDateTimeBehavior=convertToNull.
Enable stream: Select this check box to enable streaming over buffering, which allows the code to read from a large table without consuming a large amount of memory, in order to optimize performance.
Trim all the String/Char columns: Select this check box to remove leading and trailing whitespace from all the String/Char columns.

Trim column: Remove leading and trailing whitespace from the defined columns.

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to choose your
database connection dynamically from multiple connections planned in your Job. This feature is useful
when you need to access database tables having the same data structure but in different databases, especially
when you are working in an environment where you cannot change your Job settings, for example, when
your Job has to be deployed and executed independent of Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is selected
in the Basic settings view. Once a dynamic parameter is defined, the Component List box in the Basic
settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component covers all possible SQL queries for Mysql databases.
Drop tMysqlInput and tFileOutputDelimited from the Palette onto the workspace.
Double-click tMysqlInput to open its Basic Settings view in the Component tab.
For Talend Open Studio for Big Data, the Property type, Schema and Query Type of components are always Built-in. For further information about how to edit a Built-in schema, see Talend Studio User Guide.
Click the [+] button to add the rows that you will use to define the schema, four columns in this example: id, first_name, city and salary.
Under Column, click in the fields to enter the corresponding column names.
Click the field under Type to define the type of data.
Click OK to close the schema editor.
Next to the Table Name field, click the [...] button to select the database table of interest.
A dialog box displays a tree diagram of all the tables in the selected database:
Click the table of interest and then click OK to close the dialog box.
In the Query box, enter the query required to retrieve the desired columns from the table.
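For the four-column schema defined above, the query could look like the following (the table name employees is only an illustration; use the table selected in the previous step):

SELECT id, first_name, city, salary FROM employees;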
Next to the File Name field, click the [...] button to browse your directory to where you want to save the
output file, then enter a name for the file.
Select the Include Header check box to retrieve the column names as well as the data.
As shown above, the output file is written with the desired column names and corresponding data, retrieved from
the database:
The Job can also be run in the Traces Debug mode, which allows you to view in the workspace the rows as they are being written to the output file.
Scenario 2: Using context parameters when reading a table from a MySQL database

Drop tMysqlInput and tLogRow from the Palette onto the workspace.
Double-click tMysqlInput to open its Basic Settings view in the Component tab.
For Talend Open Studio for Big Data, the Property type, Schema and Query Type of components are always Built-in. For further information about how to edit a Built-in schema, see Talend Studio User Guide.
Click the [+] button to add the rows that you will use to define the schema, seven columns in this example:
id, first_name, last_name, city, state, date_of_birth and salary.
Under Column, click the fields to enter the corresponding column names.
Click the fields under Type to define the type of data.
Click OK to close the schema editor.
Put the cursor in the Table Name field and press F5 for context parameter setting.
For more information about context settings, see Talend Studio User Guide.
Keep the default setting in the Name field and type in the name of the database table in the Default value
field, employees in this case.
10. In the Mode area, select Table (print values in cells of a table) for a better display of the results.
11. Save the Job.
As shown above, the records with a salary greater than 8000 are retrieved.
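Although the intermediate steps are not reproduced here, such a filtered read simply combines the context variable with a WHERE clause in the tMysqlInput query. A sketch of what the Query field could contain (the context variable name TABLE and the exact column list are assumptions for illustration):

"SELECT id, first_name, last_name, city, state, date_of_birth, salary FROM " + context.TABLE + " WHERE salary > 8000"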
Scenario 3: Reading data from MySQL databases through context-based dynamic connections

Drop two tMysqlConnection components, a tMysqlInput, a tLogRow, and a tMysqlClose component onto the design workspace.
Link the first tMysqlConnection to the second tMysqlConnection and the second tMysqlConnection to
tMysqlInput using Trigger > On Subjob Ok connections.
In the Contexts view, select the Variables tab, click the [+] button to add a row in the table, and give the
variable a name, myConnection in this example.
Select the Values as tree tab, expand the myConnection node, fill the Prompt field with the message you
want to display at runtime, and select the check box in front of the message text.
Fill the Value field with the unique name of the component you want to use as the default connection
component, tMysqlConnection_1 in this example.
Double-click the first tMysqlConnection component to show its Basic settings view, and set the connection
details. For more information on the configuration of tMysqlConnection, see section tMysqlConnection.
Note that we use this component to open a connection to a MySQL database named project_q1.
Configure the second tMysqlConnection component in the same way, but fill the Database field with
project_q2 because we want to use this component to open a connection to another MySQL database,
project_q2.
Select the Use an existing connection check box, and leave the Component List box as it is.
Click the [...] button next to Edit schema to open the [Schema] dialog box and define the data structure of
the database table to read data from.
In this example, the database table structure is made of four columns, id (type Integer, 2 characters long),
firstName (type String, 15 characters long), lastName (type String, 15 characters long), and city (type String,
15 characters long). When done, click OK to close the dialog box and propagate the schema settings to the
next component.
Fill the Table field with the database table name, customers in this example, and click Guess Query to
generate the query statement corresponding to your table schema in the Query field.
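For the four-column schema above, Guess Query produces a statement along these lines (the exact line breaks and quoting may differ):

SELECT customers.id, customers.firstName, customers.lastName, customers.city FROM customers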
In the Dynamic settings view, click the [+] button to add a row in the table, and fill the Code field with the
code script of the context variable you just created, " + context.myConnection + " in this example.
In the Basic settings view of the tLogRow component, select the Table option for a better display of the Job execution result.
In the Dynamic settings view of the tMysqlClose component, do exactly the same as in the Dynamic settings
view of the tMysqlInput component.
Press Ctrl+S to save your Job and press F6 or click Run to launch it.
A dialog box appears prompting you to specify the connection component you want to use.
Press F6 or click Run to launch your Job again. When prompted, specify the other connection component,
tMysqlConnection_2, to read data from the other database, project_q2.
The data read from database project_q2 is displayed in the Run console.
tMysqlLastInsertId
tMysqlLastInsertId properties
Component family: Databases

Function: tMysqlLastInsertId obtains the primary key value of the record that was last inserted in a Mysql table by a user.
Basic settings

Component list: Select the relevant tMysqlConnection component in the list if more than one connection is planned for the current Job.
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
Limitation
n/a
In the design workspace, select tMysqlCommit and click the Component tab to define its basic settings.
On the Component List, select the relevant tMysqlConnection if more than one connection is used.
In the design workspace, select tFileInputDelimited.
Click the Component tab to define the basic settings of tFileInputDelimited.
Fill in a path to the processed file in the File Name field. The file used in this example is Customers.
Define the Row separator that identifies the end of a row. Then define the Field separator used to delimit fields in a row.
Set the header, the footer and the number of processed rows as necessary. In this scenario, we have one header.
Click the three-dot button next to Edit Schema to define the data to pass on to the next component.
Related topics: Talend Studio User Guide.
In this scenario, the schema consists of two columns, name and age. The first holds three employees' names and the second holds the corresponding age for each.
In the design workspace, select tMysqlOutput.
Click the Component tab to define the basic settings of tMysqlOutput.
On the Component List, select the relevant tMysqlConnection, if more than one connection is used.
Click Sync columns to synchronize columns with the previous component. In the output schema of tMysqlLastInsertId, you can see the read-only column last_insert_id that will fetch the last inserted ID on the existing connection.
You can select the data type Long from the Type drop-down list in case of a huge number of entries.
In the design workspace, select tLogRow and click the Component tab to define its basic settings. For more
information, see section tLogRow.
Save your Job and press F6 to execute it.
tMysqlLastInsertId fetched the last inserted ID for each line on the existing connection.
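Under the hood, this maps to MySQL's connection-scoped function, which returns the most recent auto-increment value generated on that same connection:

select last_insert_id();

This connection scoping is why the component must point at the existing tMysqlConnection rather than opening a connection of its own.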
tMysqlOutput
tMysqlOutput properties
Component family: Databases/MySQL

Function: tMysqlOutput executes the action defined on the table and/or on the data contained in the table, based on the flow incoming from the preceding component in the Job.
Basic settings
Property type: Either Built-in or Repository.

If you are using Talend Open Studio for Big Data, only the Built-in mode is available.

Built-in: No property data stored centrally.
DB Version: Select the MySQL version you are using.

Use an existing connection: Select this check box and, in the Component List, click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
Host: Database server IP address.

Port: Listening port number of the DB server.

Database: Name of the database.
Table: Name of the table to be written. Note that only one table can be written at a time.
Action on table
Die on error
This check box is selected by default. Clear the check box to skip the
row in error and complete the process for error-free rows. If needed,
you can retrieve the rows in error via a Row > Rejects link.
Specify a data source alias: Select this check box and specify the alias of a data source created on the Talend Runtime side to use the shared connection pool defined in the data source configuration. This option works only when you deploy and run your Job in Talend Runtime.
If you use the component's own DB configuration, your
data source connection will be closed at the end of the
component. To prevent this from happening, use a shared
DB connection with the data source alias specified.
This check box is not available when the Use an existing connection
check box is selected.
Advanced settings
Additional JDBC parameters: Specify additional connection properties for the DB connection you are creating. This option is not available if you have selected the Use an existing connection check box in the Basic settings.
Extend Insert: Select this check box to carry out a bulk insert of a defined set of lines instead of inserting lines one by one. The gain in system performance is considerable.

Number of rows per insert: enter the number of rows to be inserted per operation. Note that the higher the value specified, the lower the performance level, due to the increase in memory demands.
This option is not compatible with the Reject link. You
should therefore clear the check box if you are using a Row
> Rejects link with this component.
If you are using this component with tMysqlLastInsertID,
ensure that the Extend Insert check box in Advanced
Settings is not selected. Extend Insert allows for batch
loading, however, if the check box is selected, only the ID
of the last line of the last batch will be returned.
Use Batch: Select this check box to activate batch mode for data processing. In the Batch Size field that appears when this check box is selected, type in the number of rows to be processed per batch.

This check box is available only when you have selected the Update or the Delete option in the Action on data field.
Commit every: Enter the number of rows to be completed before committing batches of rows together into the DB. This option ensures transaction quality (but not rollback) and, above all, better performance at execution.
Additional Columns
This option is not available if you have just created the DB table
(even if you delete it beforehand). This option allows you to call SQL
functions to perform actions on columns, provided that these are not
insert, update or delete actions, or actions that require pre-processing.
Name: Type in the name of the schema column to be altered or
inserted.
SQL expression: Type in the SQL statement to be executed in order to alter or insert the data in the corresponding column.
Position: Select Before, Replace or After, depending on the action
to be performed on the reference column.
Reference column: Type in a reference column that tMySqlOutput
can use to locate or replace the new column, or the column to be
modified.
Use Hint Options: Select this check box to activate the hint configuration area, which helps you optimize a query's execution. In this area, the parameters are:

- HINT: specify the hint you need, using the syntax /*+ */.

- POSITION: specify where you put the hint in a SQL statement.

- SQL STMT: select the SQL statement you need to use.
Debug query mode: Select this check box to display each step involved in the process of writing data in the database.
Use duplicate key update mode insert: Updates the values of the specified columns in the event of duplicate primary keys:

Column: Between double quotation marks, enter the name of the column to be updated.

Value: Enter the action you want to carry out on the column.
To use this option you must first of all select the Insert
mode in the Action on data list found in the Basic Settings
view.
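This option corresponds to MySQL's INSERT ... ON DUPLICATE KEY UPDATE syntax; an illustrative statement of the kind it produces:

insert into customers (id, CustomerName) values (1, 'Smith')
on duplicate key update CustomerName = values(CustomerName);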
tStatCatcher Statistics: Select this check box to collect log data at the component level.

Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is
selected in the Basic settings view. Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage

This component offers the flexibility of the DB query and covers all possible SQL queries.
This component must be used as an output component. It allows you to carry out actions on a table
or on the data of a table in a MySQL database. It also allows you to create a reject flow using a
Row > Rejects link to filter data in error. For an example of tMysqlOutput in use, see section Scenario 3: Retrieve data in error with a Reject link.
In the design workspace, select tRowGenerator to display its Basic settings view.
Click the Edit schema three-dot button to define the data to pass on to the tMap component, two columns in
this scenario, name and random_date.
Click in the corresponding Functions fields and select a function for each of the two columns, getFirstName
for the first column and getrandomDate for the second column.
In the Number of Rows for Rowgenerator field, enter 10 to generate ten first name rows and click OK to close the editor.
Double-click the tMap component to open the Map editor. The Map editor opens displaying the input metadata
of the tRowGenerator component.
In the Schema editor panel of the Map editor, click the plus button of the output table to add two rows and
define the first as random_date and the second as random_date1.
In this scenario, we want to duplicate the random_date column and adapt the schema in order to alter the data
in the output component.
In the Map editor, drag the random_date row from the input table to the random_date and random_date1 rows
in the output table.
Double-click tFileInputDelimited to display its Basic settings view and define the component properties.
In the File Name field, click the three-dot button and browse to the source delimited file that contains the
modifications to propagate in the MySQL table.
In this example, we use the customer_update file that holds four columns: id, CustomerName, CustomerAddress
and idState. Some of the data in these four columns is different from that in the MySQL table.
Define the row and field separators used in the source file in the corresponding fields.
If needed, set Header, Footer and Limit.
In this example, Header is set to 1 since the first row holds the names of columns, therefore it should be ignored.
Also, the number of processed lines is limited to 2000.
Click the three-dot button next to Edit Schema to open a dialog box where you can describe the data structure
of the source delimited file that you want to pass to the component that follows.
Select the Key check box(es) next to the column name(s) you want to define as key column(s).
It is necessary to define at least one column as a key column for the Job to be executed correctly. Otherwise, the Job is
automatically interrupted and an error message displays on the console.
In the design workspace, double-click tMysqlOutput to open its Basic settings view where you can define
its properties.
Click Sync columns to retrieve the schema of the preceding component. If needed, click the three-dot button
next to Edit schema to open a dialog box where you can check the retrieved schema.
Fill in the database connection information in the corresponding fields.
In the Table field, enter the name of the table to update.
From the Action on table list, select the operation you want to perform, None in this example since the table
already exists.
From the Action on data list, select the operation you want to perform on the data, Update in this example.
Save your Job and press F6 to execute it.
Using your DB browser, you can verify that the MySQL table, customers, has been modified according to the delimited file.
In the above example, the database table still has the four columns id, CustomerName, CustomerAddress and idState, but certain fields have been modified according to the data in the delimited file used.
Scenario 3: Retrieve data in error with a Reject link

Drop a tFileInputDelimited component from the File > Input family in the Palette, and fill in its properties manually in the Component tab.
From the Palette, drop a tMap from the Processing family onto the workspace.
Drop a tMysqlOutput from the Databases family in the Palette and fill in its properties manually in the
Component tab.
For more information, see Talend Studio User Guide.
From the Palette, select a tFileOutputDelimited from the File > Output family, and drop it onto the workspace.
Link the customers component to the tMap component, then the tMap to the Localhost component, with Row Main links. Name this second link out.
Link the Localhost to the tFileOutputDelimited using a Row > Reject link.
Double-click the customers component to display the Component view.
For Talend Open Studio for Big Data, the Property type, Schema and Query Type of components are always Built-in. For
further information about how to edit a Built-in schema, see Talend Studio User Guide.
In the Header, Footer and Limit fields, type in the number of headers and footers to ignore, and the number
of rows to which processing should be limited.
Click the [...] button next to the Edit schema field, and set the schema manually.
The schema is as follows:
Select the id, CustomerName, CustomerAddress, idState, id2, RegTime and RegisterTime columns on the table on the left and drop them on the out table, on the right.
In the Schema editor area, at the bottom of the tMap editor, in the right table, change the length of the
CustomerName column to 28 to create an error. Thus, any data for which the length is greater than 28 will
create errors, retrieved with the Reject link.
Click OK.
In the workspace, double-click the output Localhost component to display its Component view.
For Talend Open Studio for Big Data, the Property type, Schema and Query Type of components are always Built-in. For
further information about how to edit a Built-in schema, see Talend Studio User Guide.
Deselect the Extend Insert check box which enables you to insert rows in batch, because this option is not
compatible with the Reject link.
Double-click the tFileOutputDelimited component to set its properties in the Component view.
Click the [...] button next to the File Name field to fill in the path and name of the output file.
Click the Sync columns button to retrieve the schema of the previous component.
Save your Job and press F6 to execute it.
The data in error, together with the error type encountered, are sent to the delimited file. Here, we get: Data truncation.
tMysqlOutputBulk
tMysqlOutputBulk properties
The tMysqlOutputBulk and tMysqlBulkExec components are used together in a two-step process. In the first step, an output file is generated. In the second step, this file is used in the INSERT statement that feeds a database. These two steps are fused together in the tMysqlOutputBulkExec component, detailed in a separate section. The advantage of using two separate steps is that the data can be transformed before it is loaded in the database.
Component family: Databases/MySQL

Function: Writes a file with columns based on the defined delimiter and the MySQL standards.

Purpose: Prepares the file to be used as a parameter in the INSERT query used to feed the MySQL database.
Basic settings
Property type: Either Built-in or Repository.

If you are using Talend Open Studio for Big Data, only the Built-in mode is available.

Built-in: No property data stored centrally.
File Name
Append: Select this check box to add the new rows at the end of the file.
Advanced settings
Row separator
Field separator
Text enclosure
Create directory if does not exist: This check box is selected by default. It creates a directory to hold the output table if required.
Custom the flush buffer size: Customize the amount of memory used to temporarily store output data. In the Row number field, enter the number of rows after which the memory is to be freed again.
Records contain NULL value: This check box is selected by default. It allows you to take account of NULL value fields. If you clear the check box, the NULL values will automatically be replaced with empty values.
Encoding: Select the encoding from the list or select Custom and define it manually. This field is compulsory for DB data handling.

tStatCatcher Statistics: Select this check box to collect the log data at the component level.
Usage: This component is to be used along with the tMysqlBulkExec component. Used together, they offer gains in performance while feeding a MySQL database.
Limitation: Due to license incompatibility, one or more JARs required to use this component are not provided. You can easily find and add such JARs in the Integration perspective of your studio. For details, see the section about external modules in the Talend Installation and Upgrade Guide.
Connect the start component (tRowGenerator in this example) to the tMysqlBulkExec using a trigger connection of the OnComponentOk type.
Define the schema of the rows to be generated and the nature of the data to generate. In this example, the clients file to be produced will contain the following columns: ID, First Name, Last Name, Address and City, which are all defined as string data except ID, which is of integer type.
Some schema information doesn't necessarily need to be displayed. To hide it, click the Columns list button next to the toolbar and uncheck the relevant entries, such as Precision or Parameters.
Use the plus button to add as many columns as needed to your schema definition.
Click the Refresh button to preview the first generated row of your output.
Drag and drop all columns from the input table to the output table.
5.
Apply the transformation on the LastName column by adding .toUpperCase() in its expression field.
Then, click OK to validate the transformation.
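In the tMap editor, the transformation is an ordinary Java method call on the input column; assuming the input flow is named row1 (the actual name depends on your links), the expression field for the LastName output column would read:

row1.LastName.toUpperCase()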
Define the name of the file to be produced in File Name field. If the delimited file information is stored in
the Repository, select it in Property Type field, to retrieve relevant data. In this use case the file name is
clients.txt.
The schema is propagated from the tMap component, if you accepted it when prompted.
In this example, don't include the header information, as the table should already contain it.
10. Then double-click on the tMysqlBulkExec component to set the INSERT query to be executed.
11. Define the database connection details.
For Talend Open Studio for Big Data, the Property type, Schema and Query Type of components are always Built-in. For further information about how to edit a Built-in schema, see Talend Studio User Guide.
12. Set the table to be filled in with the collected data, in the Table field.
13. Fill in the column delimiters in the Field terminated by area.
14. Make sure the encoding corresponds to the data encoding.
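Behind the scenes, tMysqlBulkExec feeds the table with a MySQL bulk-load statement. Conceptually, the operation resembles the following (the separator values are illustrative; use the ones defined in the component):

load data local infile 'clients.txt' into table clients
fields terminated by ';'
lines terminated by '\n';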
The clients database table is filled with data from the file, including the upper-case last names as transformed in the Job.
For simple Insert operations that don't include any transformations, the use of tMysqlOutputBulkExec allows you to skip a step in the process and thus improves performance.
Related topic: section tMysqlOutputBulkExec properties
tMysqlOutputBulkExec
tMysqlOutputBulkExec properties
The tMysqlOutputBulk and tMysqlBulkExec components are used together in a two-step process. In the first step, an output file is generated. In the second step, this file is used in the INSERT statement that feeds a database. These two steps are fused together in the tMysqlOutputBulkExec component.
Component family: Databases/MySQL

Basic settings
Property type: Either Built-in or Repository.

If you are using Talend Open Studio for Big Data, only the Built-in mode is available.

Built-in: No property data stored centrally.
DB Version: Select the MySQL version you are using.

Host: Database server IP address.

Port: Listening port number of the DB server.

Database: Name of the database.

Table: Name of the table to be written. Note that only one table can be written at a time.

Local FileName
Append: Select this check box to append new rows to the end of the file.
Additional JDBC parameters: Specify additional connection properties for the DB connection you are creating.
Row separator
Field separator
Escape char
Text enclosure
Create directory if does not exist: This check box is selected by default. It creates a directory to hold the output table if required.
Custom the flush buffer size: Customize the amount of memory used to temporarily store output data. In the Row number field, enter the number of rows after which the memory is to be freed again.
Action on data: On the data of the table defined, you can carry out the following operations:

Insert records in table: Add new records to the table.

Update records in table: Make changes to existing records.

Replace records in table: Replace existing records with new ones.

Ignore records in table: Ignore existing records or insert the new ones.
Records contain NULL value: This check box is selected by default. It allows you to take account of NULL value fields. If you clear the check box, the NULL values will automatically be replaced with empty values.
Encoding: Select the encoding from the list or select Custom and define it manually. This field is compulsory for DB data handling.

tStatCatcher Statistics: Select this check box to collect the log data at the component level.
Usage
This component is mainly used when no particular transformation is required on the data to be
loaded onto the database.
Limitation
n/a
Drop a tRowGenerator and a tMysqlOutputBulkExec component from the Palette to the design workspace.
Connect the components using a link such as Row > Main.
Set the tRowGenerator parameters the same way as in section Scenario: Inserting transformed data in MySQL database. The schema is made of five columns: ID, First Name, Last Name, Address and City.
In the workspace, double-click the tMysqlOutputBulkExec to display the Component view and set the
properties.
For Talend Open Studio for Big Data, the Property type, Schema and Query Type of components are always Built-in. For
further information about how to edit a Built-in schema, see Talend Studio User Guide.
In the Action on data list, select Insert records in table to insert the new data in the table.
Press F6 to run the Job.
The result should be pretty much the same as in section Scenario: Inserting transformed data in MySQL database, but the data might differ, as it is randomly regenerated every time the Job is run.
tMysqlRollback
tMysqlRollback properties
This component is closely related to tMysqlCommit and tMysqlConnection. It usually does not make much
sense to use these components independently in a transaction.
Component family: Databases

Function: tMysqlRollback cancels the transaction commit in the connected database.

Purpose: This component avoids involuntarily committing part of a transaction.
Basic settings

Component list: Select the relevant tMysqlConnection component in the list if more than one connection is planned for the current Job.

Close Connection: Clear this check box to continue to use the selected connection once the component has performed its task.
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component is to be used along with Mysql components, especially with tMysqlConnection
and tMysqlCommit components.
Limitation
n/a
tMysqlRow
tMysqlRow properties
Component family: Databases/MySQL

Function: tMysqlRow is the specific component for this database query. It executes the SQL query stated in the specified database. The row suffix means the component implements a flow in the Job design although it doesn't provide output.

Purpose: Depending on the nature of the query and the database, tMysqlRow acts on the actual DB structure or on the data (although without handling data). The SQLBuilder tool helps you write your SQL statements easily.
Basic settings
Property type: Either Built-in or Repository.

If you are using Talend Open Studio for Big Data, only the Built-in mode is available.

Built-in: No property data stored centrally.
DB Version: Select the MySQL version you are using.

Use an existing connection: Select this check box and, in the Component List, click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
Host: Database server IP address.

Port: Listening port number of the DB server.

Database: Name of the database.
Table Name
Query type: Either Built-in or Repository.

If you are using Talend Open Studio for Big Data, only the Built-in mode is available.

Built-in: Fill in the query statement manually or build it graphically using SQLBuilder.
Guess Query: Click the Guess Query button to generate the query which corresponds to your table schema in the Query field.

Query: Enter your database query, paying particular attention to properly sequence the fields in order to match the schema definition.
Die on error
This check box is selected by default. Clear the check box to skip the
row on error and complete the process for error-free rows. If needed,
you can retrieve the rows on error via a Row > Rejects link.
Specify a data source alias: Select this check box and specify the alias of a data source created on the Talend Runtime side to use the shared connection pool defined in the data source configuration. This option works only when you deploy and run your Job in Talend Runtime.
If you use the component's own DB configuration, your
data source connection will be closed at the end of the
component. To prevent this from happening, use a shared
DB connection with the data source alias specified.
This check box is not available when the Use an existing connection
check box is selected.
Advanced settings

Additional JDBC parameters: Specify additional connection properties for the DB connection you are creating.

Propagate QUERY's recordset: Select this check box to insert the result of the query in a COLUMN of the current flow. Select this column from the use column list.

Use PreparedStatement: Select this check box if you want to query the database using a PreparedStatement. In the Set PreparedStatement Parameters table, define the parameters.
Commit every: Enter the number of rows to be completed before committing batches of rows together into the DB.

tStatCatcher Statistics: Select this check box to collect log data at the component level.

Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is
selected in the Basic settings view. Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage

This component offers the flexibility of the DB query and covers all possible SQL queries.
Select and drop the following components onto the design workspace: tMysqlRow (x2), tRowGenerator,
and tMysqlOutput.
Click the [...] button next to Edit schema and define the schema columns.
Propagate the properties and schema details onto the other components of the Job.
Type in the following SQL statement to alter the database entries: drop index <index_name> on
<table_name>
Select the second tMysqlRow component, check the DB properties and schema.
Type in the SQL statement to recreate an index on the table using the following statement: create index
<index_name> on <table_name> (<column_name>)
The tRowGenerator component is used to generate automatically the columns to be added to the DB output
table defined.
Select the tMysqlOutput component and fill in the DB connection properties. The table to be fed is named:
comprehensive.
The schema should be automatically inherited from the data flow coming from the tRowGenerator. Edit the schema to check its structure and make sure it corresponds to the schema expected by the DB table specified.
The Action on table is None and the Action on data is Insert. No Additional columns setting is required for this Job.
Select the metadata which corresponds to the client file and slide the metadata onto the workspace. Here, we
are using the customers metadata.
In the Schema list, select Built-in so that you can modify the component's schema. Then click on [...] next to the Edit schema field to add a column into which the name of the State will be inserted.
Click on the [+] button to add a column to the schema. Rename this column LabelStateRecordSet and select
Object from the Type list. Click OK to save your modifications.
From the Palette, select the tMysqlRow, tParseRecordSet and tFileOutputDelimited components and drop
them onto the workspace.
Double click tMysqlRow to set its properties in the Basic settings tab of the Component view.
In the Property Type list, select Repository and click on the [...] button to select a database connection from
the metadata in the Repository. The DB Version, Host, Port, Database, Username and Password fields are
completed automatically. If you are using the Built-in mode, complete these fields manually.
From the Schema list, select Built-in to set the schema properties manually and add the LabelStateRecordSet column, or click directly on the Sync columns button to retrieve the schema from the preceding component.
In the Query field, enter the SQL query you want to use. Here, we want to retrieve the names of the American
States from the LabelState column of the MySQL table, us_state: "SELECT LabelState FROM us_state
WHERE idState=?".
The question mark, ?, represents the parameter to be set in the Advanced settings tab.
Select the Propagate QUERY's recordset check box and select the LabelStateRecordSet column from the use column list to insert the query results in that column.
Select the Use PreparedStatement check box and define the parameter used in the query in the Set
PreparedStatement Parameters table.
Click on the [+] button to add a parameter.
In the Parameter Index cell, enter the parameter position in the SQL instruction. Enter 1 as we are only
using one parameter in this example.
In the Parameter Type cell, enter the type of parameter. Here, the parameter is a whole number, hence,
select Int from the list.
In the Parameter Value cell, enter the parameter value. Here, we want to retrieve the name of the State based
on the State ID for every client in the input file. Hence, enter row1.idState.
10. Double click tParseRecordSet to set its properties in the Basic settings tab of the Component view.
11. From the Prev. Comp. Column list, select the preceding component's column for analysis. In this example, select LabelStateRecordSet.
Click on the Sync columns button to retrieve the schema from the preceding component. The Attribute table
is automatically completed with the schema columns.
In the Attribute table, in the Value field which corresponds to the LabelStateRecordSet, enter the name of
the column containing the State names to be retrieved and matched with each client, within double quotation
marks. In this example, enter LabelState.
12. Double click tFileOutputDelimited to set its properties in the Basic settings tab of the Component view.
13. In the File Name field, enter the access path and name of the output file.
Click Sync columns to retrieve the schema from the preceding component.
A column containing the name of the American State corresponding to each client is added to the file.
Double-click the [...] button next to Edit schema to open the schema editor.
Click the [+] button to add two columns, namely id and age, of the type Integer.
Click OK to close the editor.
Click the [...] button next to Edit schema to open the schema editor.
Click the [+] button to add two columns in the right part, namely recordset and age, of the types Object and Integer respectively. Note that recordset is intended to hold the query results of the Mysql table, namely the id and
name fields.
Click OK to close the editor.
Select the Propagate QUERY's recordset check box and choose recordset from the use column list to insert
the query results in that column.
Select the Use PreparedStatement check box and define the parameter used in the query in the Set
PreparedStatement Parameters table.
From the Prev. Comp. Column list, select the column to parse, namely recordset.
12. Click the [...] button next to Edit schema to open the schema editor.
Click the [+] button to add three columns in the right part, namely id, name and age, of the types Integer, String and Integer respectively. Note that the id and name fields are intended to hold the parsed data of recordset.
Click OK to close the editor.
In the Attribute table, in the Value fields which correspond to id and name, enter the name of the column
in the Mysql table to be retrieved, namely "id" and "name".
13. Double-click tLogRow to open its Basic settings view.
In the Mode area, select Table (print values in cells of a table) for better display.
tMysqlSCD
tMysqlSCD belongs to two component families: Business Intelligence and Databases. For more information on
it, see section tMysqlSCD.
tMysqlSCDELT
tMysqlSCDELT belongs to two component families: Business Intelligence and Databases. For more information
on it, see section tMysqlSCDELT.
tMysqlSP
tMysqlSP Properties
Component family: Databases/Mysql

Purpose: tMysqlSP offers a convenient way to centralize multiple or complex queries in a database and call them easily.
Basic settings
Property type: Either Built-in or Repository.

If you are using Talend Open Studio for Big Data, only the Built-in mode is available.

Built-in: No property data stored centrally.
Host: Database server IP address.

Port: Listening port number of the DB server.

Database: Name of the database.

SP Name: Type in the exact name of the stored procedure.
Is Function / Return result in: Select this check box if only a value is to be returned. In the list, select the schema column on which the returned value is based.
Parameters: Click the Plus button and select the various Schema Columns that will be required by the procedure. Note that the SP schema can hold more columns than there are parameters used in the procedure.

Select the Type of parameter:

IN: Input parameter.

OUT: Output parameter/return value.

IN OUT: Input parameter to be returned as a value, likely after modification through the procedure (function).

RECORDSET: Input parameter to be returned as a set of values, rather than a single value.
Check the section Scenario: Inserting data in mother/
daughter tables if you want to analyze a set of records from
a database table or DB query and return single records.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is
selected in the Basic settings view. Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage

This component is used as an intermediary component. It can be used as a start component, but in that case only input parameters are allowed.
Limitation
Drag and drop the following components used in this example: tRowGenerator, tMysqlSP, tLogRow.
Connect the components using the Row Main link.
The tRowGenerator is used to generate the odd id number. Double-click on the component to launch the editor.
Change the Value of step from 1 to 2 for this example, still starting from 1.
Set the Number of generated rows to 25 in order for all the odd State id (of 50 states) to be generated.
Click OK to validate the configuration.
Then select the tMysqlSP component and define its properties.
The getstate procedure used in this example was created beforehand in a MySQL client, as follows:

DELIMITER $$

DROP PROCEDURE IF EXISTS `talend`.`getstate` $$
CREATE DEFINER=`root`@`localhost` PROCEDURE `getstate`(IN pid INT, OUT pstate VARCHAR(50))
BEGIN
SELECT LabelState INTO pstate FROM us_states WHERE idState = pid;
END $$

DELIMITER ;
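You can check the procedure directly in a MySQL client before using it in the Job, for example:

call talend.getstate(1, @pstate);
select @pstate;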
In the Parameters area, click the plus button to add a line to the table.
Set the Column field to ID, and the Type field to IN, as it will be given as an input parameter to the procedure.
Add a second line and set the Column field to State and the Type to OUT, as this is the output parameter to be returned.
Finally, set the tLogRow component properties.
The output shows the state labels corresponding to the odd state ids as defined in the procedure.
Check section Scenario: Inserting data in mother/daughter tables if you want to analyze a set of records from a database
table or DB query and return single records.
tMysqlTableList
tMysqlTableList Properties
Component family: Databases/MySQL

Function: Iterates on a set of table names fetched through a defined Mysql connection.

Purpose: Lists the names of a given set of Mysql tables using a select statement based on a Where clause.
Basic settings

Component list: Select the relevant tMysqlConnection component in the list if more than one connection is planned for the current Job.

Where clause for table name selection: Enter the Where clause to identify the tables to iterate on.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component is to be used along with Mysql components, especially with tMysqlConnection.
Limitation
n/a
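Assuming the component filters the server's table listing (a SHOW TABLES-style lookup, which exposes a Tables_in_<database> column), a clause such as the following would restrict iteration to tables whose names start with f1090_ in a database named talend; both names are illustrative:

Tables_in_talend LIKE 'f1090_%'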
Related scenario
For tMysqlTableList related scenario, see section Scenario: Iterating on a DB table and listing its column names.
tOleDbRow
tOleDbRow properties
Component family: Databases/OleDb

Function: tOleDbRow is the specific component for this database query. It executes the SQL query stated in the specified database.

Purpose: Depending on the nature of the query and the database, tOleDbRow acts on the actual database structure or on the data.
Basic settings

Database: Enter the connection string that contains the database. For details, see http://msdn.microsoft.com/en-us/library/system.data.oledb.oledbconnection.connectionstring.aspx.
Table Name
Query type
Guess Query: Click the Guess Query button to generate the query which corresponds to your table schema in the Query field.

Query: Enter your database query, paying particular attention to properly sequence the fields in order to match the schema definition.
Die on error
This check box is selected by default. Clear the check box to skip the row on
error and complete the process for error-free rows. If needed, you can retrieve
the rows on error via a Row > Rejects link.
Advanced settings

Propagate QUERY's recordset: Select this check box to insert the result of the query in a COLUMN of the current flow. Select this column from the use column list.

This option allows the component to have a different schema from that of the preceding component. Moreover, the column that holds the QUERY's recordset should be set to the type of Object, and this component is usually followed by tParseRecordSet.
Use PreparedStatement
Select this check box if you want to query the database using a
PreparedStatement. In the Set PreparedStatement Parameter table, define the
parameters.
Parameter Index: Enter the parameter position in the SQL instruction.
Parameter Type: Enter the parameter type.
Parameter Value: Enter the parameter value.
This option is very useful if you need to execute the same query several times, as performance levels are increased.
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings Click the [+] button to add a row in the table and fill the Code field with a context variable to choose your
database connection dynamically from multiple connections planned in your Job. This feature is useful when you
need to access database tables having the same data structure but in different databases, especially when you are
working in an environment where you cannot change your Job settings, for example, when your Job has to be
deployed and executed independent of Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is selected in
the Basic settings view. Once a dynamic parameter is defined, the Component List box in the Basic settings
view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component offers the flexibility of the database query and covers all possible SQL queries.
Related scenario
For related scenarios, see section tMysqlRow.
tOracleBulkExec
tOracleBulkExec properties
The tOracleOutputBulk and tOracleBulkExec components are used together in a two-step process. In the first step, an output file is generated. In the second step, this file is used in the INSERT operation that feeds a database. These two steps are fused together in the tOracleOutputBulkExec component, detailed in a separate section. The advantage of using two separate steps is that the data can be transformed before it is loaded in the database.
Component family: Databases/Oracle

Purpose: As a dedicated component, it allows gains in performance during operations performed on data of an Oracle database.

Basic settings
Use an existing connection: Select this check box and, in the Component List, click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if you need to share an
existing connection between the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection to be shared in the Basic
settings view of the connection component which creates that very database
connection.
2. In the child level, use a dedicated connection component to read that
registered database connection.
For an example about how to share a database connection across Job levels, see
Talend Studio User Guide.
Connection type: Drop-down list of the available drivers.

DB Version: Select the Oracle version in use.

Host: Database server IP address.

Port: Listening port number of the DB server.

Database: Database name.

Schema: Schema name.

Username and Password: DB user authentication data.
Table: Name of the table to be written. Note that only one table can be written at a time.

Action on table: On the table defined, you can perform one of the operations proposed in the list.
Action on data
Schema and Edit Schema: A schema is a row description; it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.

If you are using Talend Open Studio for Big Data, only the Built-in mode is available.

Click Edit Schema to make changes to the schema.
Advanced settings

Advanced separator (for number): Select this check box to change the separator used for numbers.
Use existing control file: Select this check box if you use a control file (.ctl), and specify its path in the .ctl file name field.
Record format
Specify .ctl file's INTO TABLE clause manually: Select this check box to manually fill in the INTO TABLE clause of the control file.
Fields terminated by
Use field enclosure: Select this check box if you want to use enclosing characters for the text:

Fields enclosure (left part): character delimiting the left of the field.

Field enclosure (right part): character delimiting the right of the field.
Use schema's Date Pattern to load Date field: Select this check box to use the date pattern of the schema in the date field.
Specify field condition
Preserve blanks
Load options
NLS Language: In the list, select the language used for data that is not encoded in Unicode.
Set Parameter NLS_TERRITORY: Select this check box to modify the territory conventions used for day and week numbering. Your OS value is the default value used.
Encoding: Select the encoding type from the list or select Custom and define it manually. This field is compulsory for database data handling.
Output: Select the type of output for the standard output of the Oracle database: to console, or to global variable.
Convert columns and table names to uppercase: Select this check box to uppercase the names of the columns and the name of the table.
tStatCatcher Statistics: Select this check box to collect log data at the component level.

Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to choose your database
connection dynamically from multiple connections planned in your Job. This feature is useful when you need to
access database tables having the same data structure but in different databases, especially when you are working
in an environment where you cannot change your Job settings, for example, when your Job has to be deployed and
executed independent of Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is selected in the
Basic settings view. Once a dynamic parameter is defined, the Component List box in the Basic settings view
becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This dedicated component offers performance and flexibility of Oracle DB query handling.
Limitation
The database server/client must be installed on the same machine where the Studio is installed or where the Job
using tOracleBulkExec is deployed, so that the component functions properly.
Drop the following components from the Palette to the design workspace: tOracleInput, tFileOutputDelimited and tOracleBulkExec.
Connect tOracleInput to tFileOutputDelimited using a Row > Main link.
Then connect tOracleInput to tOracleBulkExec using an OnSubjobOk trigger link.
Define the Oracle connection details. We recommend storing the DB connection details in the Metadata repository in order to retrieve them easily at any time in any Job.
For Talend Open Studio for Big Data, the Property type, Schema and Query Type of components are always Built-in. For
further information about how to edit a Built-in schema, see Talend Studio User Guide.
Define the schema if it is not already stored in the Repository. In this example, the schema is as follows: ID_Contract, ID_Client, Contract_type, Contract_Value.
Define the tFileOutputDelimited component parameters, including output File Name, Row separator and
Fields delimiter.
Then double-click on the tOracleBulkExec to define the DB feeding properties.
In the Property Type, select Repository mode if you stored the database connection details under the Metadata node of the Repository, or select Built-in mode to define them manually. In this scenario, we use the Built-in mode.
Thus, set the connection parameters in the following fields: Host, Port, Database, Schema, Username, and Password.
Fill in the name of the Table to be fed and the Action on data to be carried out, in this use case: insert.
In the Schema field, select Built-in mode, and click the [...] button next to the Edit schema field to describe the structure of the data to be passed on to the next component.
Click the Advanced settings view to configure the advanced settings of the component.
Select the Use an existing control file check box if you want to use a control file (.ctl) storing the status of the physical structure of the database. Otherwise, fill in the following fields manually: Record format, Specify .ctl files INTO TABLE clause manually, Fields terminated by, Use field enclosure, Use schemas Date Pattern to load Date field, Specify field condition, Preserve blanks, Trailing null columns, Load options, NLS Language and Set Parameter NLS_TERRITORY, according to your database.
Define the Encoding as in preceding steps.
For this scenario, in the Output field, select to console to output the standard output of the database to the console.
Press F6 to run the job. The log output displays in the Run tab and the table is fed with the parameter file data.
Related topic: see section Scenario: Inserting data in MySQL database.
tOracleClose
tOracleClose properties
Function
Purpose
Close a transaction.
Basic settings
Component list
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component is to be used along with Oracle components, especially with tOracleConnection
and tOracleCommit.
Limitation
n/a
Related scenario
No scenario is available for this component yet.
tOracleCommit
tOracleCommit Properties
This component is closely related to tOracleConnection and tOracleRollback. It usually doesn't make much sense to use these components independently in a transaction.
Component family
Databases/Oracle
Function
Validates the data processed through the job into the connected DB
Purpose
Using a unique connection, this component commits in one go a global transaction instead of doing
that on every row or every batch and thus provides gain in performance.
Basic settings
Component list
Close Connection
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to choose your database connection dynamically from multiple connections planned in your Job. This feature is useful when you need to access database tables having the same data structure but in different databases, especially when you are working in an environment where you cannot change your Job settings, for example, when your Job has to be deployed and executed independent of Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component is to be used along with Oracle components, especially with tOracleConnection
and tOracleRollback components.
Limitation
n/a
Related scenario
This component is closely related to tOracleConnection and tOracleRollback. It usually doesn't make much sense to use one of these without using a tOracleConnection component to open a connection for the current transaction.
For a tOracleCommit related scenario, see section tMysqlConnection.
tOracleConnection
tOracleConnection Properties
This component is closely related to tOracleCommit and tOracleRollback. It usually doesn't make much sense to use one of these without using a tOracleConnection component to open a connection for the current transaction.
Component family
Databases/Oracle
Function
Purpose
This component allows you to commit all of the Job data to an output database in just a single
transaction, once the data has been validated.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Connection type
DB Version
Host
Port
Database
Schema
Additional JDBC parameters
Use or register a shared DB Connection
Select this check box to share your connection or fetch a connection shared by a parent or child Job. This allows you to share one single DB connection among several DB connection components from different Job levels that can be either parent or child.
This option is incompatible with the Use dynamic job and
Use an independent process to run subjob options of the
tRunJob component. Using a shared database connection
together with a tRunJob component with either of these
two options enabled will cause your Job to fail.
Shared DB Connection Name: set or type in the shared connection
name.
Specify a data source alias
Select this check box and specify the alias of a data source created
on the side to use the shared connection pool defined in the data
source configuration. This option works only when you deploy and
run your Job in .
Usage
This component is to be used along with Oracle components, especially with tOracleCommit and
tOracleRollback components.
Limitation
n/a
Related scenario
This component is closely related to tOracleCommit and tOracleRollback. It usually doesn't make much sense to use one of these without using a tOracleConnection component to open a connection for the current transaction.
For a tOracleConnection related scenario, see section tMysqlConnection.
tOracleInput
tOracleInput properties
Component family
Databases/Oracle
Function
Purpose
tOracleInput executes a DB query with a strictly defined order which must correspond to the schema definition.
Then it passes on the field list to the next component via a Main row link.
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Use an existing connection
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if you need to share an existing
connection between the two levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection to be shared in the Basic settings
view of the connection component which creates that very database connection.
2. In the child level, use a dedicated connection component to read that registered
database connection.
For an example about how to share a database connection across Job levels, see Talend
Studio User Guide.
Connection type Drop-down list of available drivers:
Oracle OCI: Select this connection type to use Oracle Call Interface with a set of C-language
software APIs that provide an interface to the Oracle database.
Oracle Custom: Select this connection type to access a clustered database.
Oracle Service Name: Select this connection type to use the TNS alias that you give when you
connect to the remote database.
WALLET: Select this connection type to store credentials in an Oracle wallet.
Oracle SID: Select this connection type to uniquely identify a particular database on a system.
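For reference, the Oracle SID and Oracle Service Name connection types correspond to the two usual JDBC URL shapes; the host, port and identifiers below are hypothetical, and the exact URL the component builds may differ:

jdbc:oracle:thin:@dbhost:1521:ORCL          -- Oracle SID
jdbc:oracle:thin:@//dbhost:1521/my_service  -- Oracle Service Name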
DB Version
Host
Port
Database
Built-in: You create and store the schema locally for this component only. Related topic: see Talend
Studio User Guide.
Table name
Query type and Query
Enter your DB query, paying particular attention to properly sequence the fields in order to match the schema definition.
Specify a data source alias
Select this check box and specify the alias of a data source created on the side to use the shared connection pool defined in the data source configuration. This option works only when you deploy and run your Job in .
If you use the component's own DB configuration, your data source connection will be
closed at the end of the component. To prevent this from happening, use a shared DB
connection with the data source alias specified.
This check box is not available when the Use an existing connection check box is selected.
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Use cursor
When selected, helps to decide the row set to work with at a time and thus optimize performance.
Trim all the String/Char columns
Select this check box to remove leading and trailing whitespace from all the String/Char columns.
Trim column
No null values
Check this box to improve the performance if there are no null values.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to choose your database
connection dynamically from multiple connections planned in your Job. This feature is useful when you need to
access database tables having the same data structure but in different databases, especially when you are working
in an environment where you cannot change your Job settings, for example, when your Job has to be deployed and
executed independent of Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component covers all possible SQL queries for Oracle databases.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided. You can easily
find out and add such JARs in the Integration perspective of your studio. For details, see the section about external
modules in the Talend Installation and Upgrade Guide.
Scenario 1: Using context parameters when reading a table from an Oracle database
1. Drop tOracleInput and tLogRow from the Palette onto the workspace.
2. Double-click tOracleInput to open its Basic Settings view in the Component tab.
In the Host field, enter the Oracle database server's IP address, "192.168.0.19" in this example.
In the Port field, enter the port number, "1521" in this example.
In the Database field, enter the database name, "talend" in this example.
In the Oracle schema field, enter the Oracle schema name, "TALEND" in this example.
In the Username and Password fields, enter the authentication details, respectively "talend" and "oracle"
in this example.
3. Set the Schema as Built-In and click Edit schema to define the desired schema.
The schema editor opens:
4. Click the [+] button to add the rows that you will use to define the schema, three columns in this example: id, name and age.
Under Column, click the fields to enter the corresponding column names.
Click the fields under Type to define the type of data.
5. Put the cursor in the Table Name field and press F5 for context parameter setting.
For more information about context settings, see Talend Studio User Guide.
6. Keep the default setting in the Name field and type in the name of the database table in the Default value field, staff in this use case.
7.
8. In the Query type list, select Built-In. Then, click Guess Query to get the query statement:
"SELECT
TALEND."+context.TABLE+".id,
TALEND."+context.TABLE+".name,
TALEND."+context.TABLE+".age
FROM TALEND."+context.TABLE
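With the TABLE context variable left at its default value staff, this concatenated expression resolves at run time to the plain query below:

SELECT TALEND.staff.id, TALEND.staff.name, TALEND.staff.age FROM TALEND.staff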
9.
10. In the Mode area, select Table (print values in cells of a table) for a better display of the results.
Related scenarios
For related scenarios, see:
section Scenario 1: Displaying selected data from DB table.
section Scenario 2: Using StoreSQLQuery variable.
section Scenario: Reading data from different MySQL databases using dynamically loaded connection
parameters.
tOracleOutput
tOracleOutput properties
Component family
Databases/Oracle
Function
Purpose
tOracleOutput executes the action defined on the table and/or on the data contained in the table, based
on the flow incoming from the preceding component in the Job.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is
available.
Built-in: No property data stored centrally.
Use an existing connection
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if you need
to share an existing connection between the two levels, for example,
to share the connection created by the parent Job with the child Job,
you have to:
1. In the parent level, register the database connection to be shared
in the Basic settings view of the connection component which
creates that very database connection.
2. In the child level, use a dedicated connection component to read
that registered database connection.
For an example about how to share a database connection across Job
levels, see Talend Studio User Guide.
Connection type
DB Version
Host
Port
Database
Table
Name of the table to be written. Note that only one table can be written at a time.
Action on table
On the table defined, you can perform one of the following operations:
None: No operation is carried out.
Drop and create table: The table is removed and created again.
1002
tOracleOutput properties
Create table: The table does not exist and gets created.
Create table if not exists: The table is created if it does not exist.
Drop table if exists and create: The table is removed if it already exists and
created again.
Clear table: The table content is deleted.
Truncate table: The table content is deleted. You do not have the possibility to roll back the operation.
Truncate table with reuse storage: The table content is deleted and the operation cannot be rolled back either. However, the existing storage allocated to the table is kept for reuse, though the storage is considered empty.
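For reference, the last two options correspond to Oracle statements of the following form (table name hypothetical):

TRUNCATE TABLE contracts;
TRUNCATE TABLE contracts REUSE STORAGE;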
Action on data
Schema and Edit schema
A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component.
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Click Edit Schema to make changes to the schema.
Built-in: You create and store the schema locally for this component only. Related topic: see Talend Studio User Guide.
Die on error
This check box is selected by default. Clear the check box to skip the row on
error and complete the process for error-free rows. If needed, you can retrieve
the rows on error via a Row > Rejects link.
Specify a data source alias
Select this check box and specify the alias of a data source created on the side to use the shared connection pool defined in the data source configuration. This option works only when you deploy and run your Job in .
If you use the component's own DB configuration, your data source
connection will be closed at the end of the component. To prevent this
from happening, use a shared DB connection with the data source
alias specified.
This check box is not available when the Use an existing connection check
box is selected.
Advanced settings
Additional JDBC parameters
Specify additional connection properties for the DB connection you are creating. This option is not available if you have selected the Use an existing connection check box in the Basic settings.
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Additional Columns
This option is not offered if you create (with or without drop) the DB table. This option allows you to call SQL functions to perform actions on columns that are neither insert, update nor delete actions, or actions that require particular preprocessing.
Name: Type in the name of the schema column to be altered or inserted as new
column.
SQL expression: Type in the SQL statement to be executed in order to alter
or insert the relevant column data.
Position: Select Before, Replace or After following the action to be performed
on the reference column.
Reference column: Type in a column of reference that the tDBOutput can use
to place or replace the new or altered column.
Use field options
Select this check box to customize a request, especially when there is double action on data.
Use Hint Options
Select this check box to activate the hint configuration area, which helps you optimize a query's execution. In this area, the parameters are:
- HINT: specify the hint you need, using the syntax /*+ */.
- POSITION: specify where you put the hint in a SQL statement.
- SQL STMT: select the SQL statement you need to use.
Convert columns and table to uppercase
Select this check box to set the names of columns and table in upper case.
Debug query mode
Select this check box to display each step during processing entries in a database.
Use batch size
When selected, enables you to define the number of lines in each processed batch.
This option is available only when you do not Use an existing
connection in Basic settings.
Support null in SQL WHERE statement
Select this check box to validate null in the SQL WHERE statement.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to choose
your database connection dynamically from multiple connections planned in your Job. This feature is
useful when you need to access database tables having the same data structure but in different databases,
especially when you are working in an environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed independent of Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is selected
in the Basic settings view. Once a dynamic parameter is defined, the Component List box in the Basic
settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component offers the flexibility benefit of the DB query and covers all of the SQL queries possible.
This component must be used as an output component. It allows you to carry out actions on a table or on
the data of a table in an Oracle database. It also allows you to create a reject flow using a Row > Rejects
link to filter data in error. For an example of tMysqlOutput in use, see section Scenario 3: Retrieve data
in error with a Reject link.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided. You
can easily find out and add such JARs in the Integration perspective of your studio. For details, see the
section about external modules in the Talend Installation and Upgrade Guide.
Related scenarios
For tOracleOutput related topics, see:
section Scenario: Writing a row to a table in the MySQL database via an ODBC connection.
section Scenario 1: Adding a new column and altering data in a DB table.
tOracleOutputBulk
tOracleOutputBulk properties
The tOracleOutputBulk and tOracleBulkExec components are used together in a two-step process. In the first
step, an output file is generated. In the second step, this file is used in the INSERT operation used to feed a database.
These two steps are fused together in the tOracleOutputBulkExec component, detailed in a separate section. The
advantage of using two separate steps is that the data can be transformed before it is loaded in the database.
Component family
Databases/Oracle
Function
Writes a file with columns based on the defined delimiter and the Oracle standards
Purpose
Prepares the file to be used as parameter in the INSERT query to feed the Oracle database.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
File Name
Append
Select this check box to add the new rows at the end of the file.
Advanced settings
Advanced separator (for number)
Select this check box to change data separators for numbers:
Thousands separator: define the separator you want to use for thousands.
Decimal separator: define the separator you want to use for decimals.
Field separator
Row separator
Encoding
Select the encoding from the list or select Custom and define it manually. This field is compulsory for DB data handling.
tStatCatcher Statistics
Usage
This component is to be used along with the tOracleBulkExec component. Used together, they offer gains in performance while feeding an Oracle database.
Related scenarios
For use cases in relation with tOracleOutputBulk, see the following scenarios:
section Scenario: Inserting transformed data in MySQL database.
section Scenario: Inserting data in MySQL database.
section Scenario: Truncating and inserting file data into Oracle DB.
tOracleOutputBulkExec
tOracleOutputBulkExec properties
The tOracleOutputBulk and tOracleBulkExec components are used together in a two-step process. In the first
step, an output file is generated. In the second step, this file is used in the INSERT operation used to feed a database.
These two steps are fused together in the tOracleOutputBulkExec component.
Component family
Databases/Oracle
Function
Purpose
As a dedicated component, it allows gains in performance during Insert operations to an Oracle database.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is
available.
Built-in: No property data stored centrally.
Use an existing connection
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if you need
to share an existing connection between the two levels, for example,
to share the connection created by the parent Job with the child Job,
you have to:
1. In the parent level, register the database connection to be shared
in the Basic settings view of the connection component which
creates that very database connection.
2. In the child level, use a dedicated connection component to read
that registered database connection.
For an example about how to share a database connection across Job
levels, see Talend Studio User Guide.
Connection type
DB Version
Host
Port
Database
Schema
Table
Name of the table to be written. Note that only one table can be written at a time
and that the table must exist for the insert operation to succeed.
Action on table
On the table defined, you can perform one of the following operations:
None: No operation is carried out.
Drop and create table: The table is removed and created again.
Create table: The table does not exist and gets created.
Create table if not exists: The table is created if it does not exist.
Drop table if exists and create: The table is removed if it already exists and
created again.
Clear table: The table content is deleted.
Truncate table: The table content is deleted. You do not have the possibility to roll back the operation.
File Name
Create directory if not exists
This check box is selected by default. It creates the directory that holds the output file, if required.
Append
Select this check box to add the new rows at the end of the file.
Action on data
Schema and Edit Schema
A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component.
If you are using Talend Open Studio for Big Data, only the Built-in mode is
available.
Click Edit Schema to make changes to the schema.
Advanced settings
Field separator
Encoding
Select the encoding from the list or select Custom and define it manually. This
field is compulsory for DB data handling.
Advanced separator (for number)
Select this check box to change data separators for numbers:
Thousands separator: define the separator you want to use for thousands.
Decimal separator: define the separator you want to use for decimals.
Use existing control file
Select this check box and browse to the .ctl control file you want to use.
Field separator
Row separator
Specify .ctl files INTO TABLE clause manually
Select this check box to enter the INTO TABLE clause of the control file manually, directly into the code.
Use schemas Date Pattern to load Date field
Select this check box to use the date model indicated in the schema for dates.
Specify field condition
Preserve blanks
Trailing null columns
Select this check box to load data with all empty columns.
Load options
NLS Language
From the drop-down list, select the language for your data if the data is not in
Unicode.
Set Parameter NLS_TERRITORY
Select this check box to modify the conventions used for date and time formats. The default value is that of the operating system.
Encoding
Select the encoding from the list or select Custom and define it manually. This field is compulsory for DB data handling.
Set Oracle Encoding Type
Select this check box to type in the character set next to the Oracle Encoding Type field.
Output
Select the type of output for the standard output of the Oracle database:
to console,
to global variable.
Convert columns and table names to uppercase
Select this check box to put columns and table names in upper case.
Set the parameters Buffer Size and StringBuilder Size for a performance gain according to the memory size.
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at a Job level as well as at each component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to choose
your database connection dynamically from multiple connections planned in your Job. This feature is
useful when you need to access database tables having the same data structure but in different databases,
especially when you are working in an environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed independent of Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is selected
in the Basic settings view. Once a dynamic parameter is defined, the Component List box in the Basic
settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component is mainly used when no particular transformation is required on the data to be loaded
onto the database.
Limitation
The database server/client must be installed on the same machine where the Studio is installed or where
the Job using tOracleOutputBulkExec is deployed, so that the component functions properly.
Related scenarios
For use cases in relation with tOracleOutputBulkExec, see the following scenarios:
section Scenario: Inserting transformed data in MySQL database.
section Scenario: Inserting data in MySQL database.
section Scenario: Truncating and inserting file data into Oracle DB.
tOracleRollback
tOracleRollback properties
This component is closely related to tOracleCommit and tOracleConnection. It usually doesn't make much sense to use these components independently in a transaction.
Component family
Databases
Function
Purpose
Basic settings
Component list
Close Connection
Clear this check box to continue to use the selected connection once
the component has performed its task.
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component is to be used along with Oracle components, especially with tOracleConnection
and tOracleCommit components.
Limitation
n/a
Related scenario
This component is closely related to tOracleConnection and tOracleCommit. It usually doesn't make much sense to use one of these without using a tOracleConnection component to open a connection for the current transaction.
For a tOracleRollback related scenario, see section tMysqlRollback.
tOracleRow
tOracleRow properties
Component family
Databases/Oracle
Function
tOracleRow is the specific component for this database query. It executes the SQL query stated on the specified database. The Row suffix means the component implements a flow in the Job design, although it doesn't provide output.
Purpose
Depending on the nature of the query and the database, tOracleRow acts on the actual DB structure or on the data (although without handling data). The SQLBuilder tool helps you easily write your SQL statements.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is
available.
Built-in: No property data stored centrally.
Use an existing connection
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if you need
to share an existing connection between the two levels, for example,
to share the connection created by the parent Job with the child Job,
you have to:
1. In the parent level, register the database connection to be shared
in the Basic settings view of the connection component which
creates that very database connection.
2. In the child level, use a dedicated connection component to read
that registered database connection.
For an example about how to share a database connection across Job
levels, see Talend Studio User Guide.
Connection type
DB Version
Host
Port
Database
Query
Use NB_LINE_
This option allows you to feed the variable with the number of rows inserted, updated or deleted, and pass it on to the next component or subjob. This field only applies if the query entered in the Query field is an INSERT, UPDATE or DELETE query.
NONE: does not feed the variable.
INSERTED: feeds the variable with the number of rows inserted.
UPDATED: feeds the variable with the number of rows updated.
DELETED: feeds the variable with the number of rows deleted.
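For instance, reusing the staff table from the tOracleInput scenario, a statement such as the following in the Query field would make UPDATED the relevant choice (the condition itself is hypothetical):

UPDATE staff SET age = age + 1 WHERE age >= 30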
Specify a data source alias
Select this check box and specify the alias of a data source created on the side to use the shared connection pool defined in the data source configuration. This option works only when you deploy and run your Job in .
If you use the component's own DB configuration, your data source
connection will be closed at the end of the component. To prevent this
from happening, use a shared DB connection with the data source
alias specified.
This check box is not available when the Use an existing connection check
box is selected.
Die on error
This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.
Advanced settings
Propagate QUERY's recordset
Select this check box to insert the result of the query into a COLUMN of the current flow. Select this column from the use column list.
This option allows the component to have a different schema from that of the preceding component. Moreover, the column that holds the QUERY's recordset should be set to the type of Object, and this component is usually followed by tParseRecordSet.
Use PreparedStatement
Commit every
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to choose
your database connection dynamically from multiple connections planned in your Job. This feature is
useful when you need to access database tables having the same data structure but in different databases,
especially when you are working in an environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed independent of Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is selected
in the Basic settings view. Once a dynamic parameter is defined, the Component List box in the Basic
settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Global Variables
NB_LINE_UPDATED: Indicates the number of rows updated. This is an After variable and it returns
an integer.
NB_LINE_INSERTED: Indicates the number of rows inserted. This is an After variable and it returns
an integer.
NB_LINE_DELETED: Indicates the number of rows deleted. This is an After variable and it returns an
integer.
QUERY: Indicates the query to be processed. This is a Flow variable and it returns a string.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose
the variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable means
it functions after the execution of a component.
Usage
This component offers the flexibility of the DB query and covers all possible SQL queries.
Related scenarios
For related topics, see:
section Scenario 3: Combining two flows for selective output.
section Scenario: Resetting a DB auto-increment.
section Scenario 1: Removing and regenerating a MySQL table index.
section Scenario 2: Using PreparedStatement objects to query data.
tOracleSCD
tOracleSCD belongs to two component families: Business Intelligence and Databases. For more information on
it, see section tOracleSCD.
tOracleSCDELT
tOracleSCDELT belongs to two component families: Business Intelligence and Databases. For more information
on it, see section tOracleSCDELT.
tOracleSP
tOracleSP Properties
Component family
Databases/Oracle
Function
Purpose
tOracleSP offers a convenient way to centralize multiple or complex queries in a database and call them
easily.
Basic settings
Use an existing connection
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if you need
to share an existing connection between the two levels, for example,
to share the connection created by the parent Job with the child Job,
you have to:
1. In the parent level, register the database connection to be shared
in the Basic settings view of the connection component which
creates that very database connection.
2. In the child level, use a dedicated connection component to read
that registered database connection.
For an example about how to share a database connection across Job
levels, see Talend Studio User Guide.
Connection type
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is
available.
Built-in: No property data stored centrally.
DB Version
Host
Port
Database
Schema
Built-in: You create and store the schema locally for this component only.
Related topic: see Talend Studio User Guide.
SP Name
Is Function / Return result in
Select this check box if the stored procedure is a function and only one value is to be returned.
Select in the list the schema column on which the value to be returned is based.
Parameters
Click the Plus button and select the various Schema Columns that will be
required by the procedures. Note that the SP schema can hold more columns
than there are parameters used in the procedure.
Select the Type of parameter:
IN: Input parameter.
OUT: Output parameter/return value.
IN OUT: Input parameter is to be returned as value, likely after modification
through the procedure (function).
RECORDSET: The input parameter is to be returned as a set of values, rather than a single value.
Check the section Scenario: Inserting data in mother/daughter tables
if you want to analyze a set of records from a database table or DB
query and return single records.
The Custom Type is used when a Schema Column you want to use is user-defined. Two Custom Type columns are available in the Parameters table. In the first Custom Type column:
- Select the check box in the Custom Type column when the corresponding
Schema Column you want to use is of user-defined type.
- If all listed Schema Columns in the Parameters table are of custom type,
you can select the check box before Custom Type once for them all.
Select a database type from the DB Type list to map the source database type
to the target database type:
- Auto-Mapping: Map the source database type to the target database type automatically (default).
- CLOB: Character large object.
- BLOB: Binary large object.
- DECIMAL: Decimal numeric object.
- NUMERIC: Character 0 to 9.
In the second Custom Type column, you can specify what the custom type is. The type may be:
- STRUCT: used for one element.
- ARRAY: used for a collection of elements.
In the Custom name column, specify the name of the custom type that you
have given to this type.
When an OUT parameter uses the custom type, make sure that its
corresponding Schema Column has chosen the Object type in the
schema table.
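For context, user-defined Oracle types of these two kinds are declared with statements such as the following (the type names here are hypothetical); an object type corresponds to STRUCT and a collection type to ARRAY:

CREATE TYPE address_t AS OBJECT (street VARCHAR2(100), city VARCHAR2(50));
CREATE TYPE address_list_t AS VARRAY(10) OF address_t;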
Advanced settings
Additional JDBC parameters
Specify additional connection properties for the DB connection you are creating. This option is not available if you have selected the Use an existing connection check box in the Basic settings.
NLS Language
In the list, select the language used for the data that is not in Unicode.
NLS Territory
Select the conventions used for date and time formats. The default value is that
of the operating system.
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at a Job level as well as at each component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to choose
your database connection dynamically from multiple connections planned in your Job. This feature is
useful when you need to access database tables having the same data structure but in different databases,
especially when you are working in an environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed independent of Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is selected
in the Basic settings view. Once a dynamic parameter is defined, the Component List box in the Basic
settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component is used as an intermediary component. It can be used as a start component, but in that case only input parameters are allowed.
Limitation
Drag and drop the following components from the Palette: tOracleConnection, tOracleInput, tOracleSP and
tLogRow.
Link the tOracleConnection to the tOracleInput using a Then Run connection as no data is handled here.
And connect the other components using a Row Main link as rows are to be passed on as parameter to the SP
component and to the console.
In the tOracleConnection, define the details of connection to the relevant Database. You will then be able to
reuse this information in all other DB-related components.
Then select the tOracleInput and define its properties.
For Talend Open Studio for Big Data, the Property type, Schema and Query Type of components are always Built-in. For
further information about how to edit a Built-in schema, see Talend Studio User Guide.
Select the Use an existing connection check box and select the tOracleConnection component in the list in
order to reuse the connection details that you already set.
Select Repository as the Property type, as the Oracle schema is defined in the DB Oracle connection entry of the Repository. If you haven't recorded the Oracle DB details in the Repository, fill in the Schema name manually.
Then select Repository as Schema, and retrieve the relevant schema corresponding to your Oracle DB table.
In this example, the SSN table has a four-column schema that includes ID, NAME, CITY and SSNUMBER.
In the Query field, type in the following Select query or select it in the list, if you stored it in the Repository.
select ID, NAME, CITY, SSNUMBER from SSN
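To reproduce the scenario, the SSN source table only needs the four columns above; a minimal sketch, with hypothetical column types:

CREATE TABLE SSN (
  ID NUMBER,
  NAME VARCHAR2(50),
  CITY VARCHAR2(50),
  SSNUMBER VARCHAR2(11)
);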
As for the tOracleInput component, select Repository in the Property type field and select the Use an existing connection check box, then select the relevant entries in the respective lists.
The schema used for the tOracleSP slightly differs from the input schema. Indeed, an extra column (SSN_Valid)
is added to the Input schema. This column will hold the format validity status (1 or 0) produced by the procedure.
In the SP Name field, type in the exact name of the stored procedure (or function) as called in the Database.
In this use case, the stored procedure name is is_ssn.
The basic function used in this particular example is as follows:
CREATE OR REPLACE FUNCTION is_ssn(string_in VARCHAR2)
RETURN PLS_INTEGER
IS
-- validating ###-##-#### format
BEGIN
IF TRANSLATE(string_in, '0123456789A', 'AAAAAAAAAAB') =
'AAA-AA-AAAA' THEN
RETURN 1;
END IF;
RETURN 0;
END is_ssn;
/
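As a quick sanity check, the function can be tried directly in SQL before wiring it into the Job:

SELECT is_ssn('123-45-6789') AS valid FROM dual;  -- returns 1
SELECT is_ssn('1234567890') AS valid FROM dual;   -- returns 0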
As a return value is expected in this use case, the procedure acts as a function, so select the Is function check box.
The only return value expected is based on the ssn_valid column, hence select the relevant list entry.
In the Parameters area, define the input and output parameters used in the procedure. In this use case, only the
SSNumber column from the schema is used in the procedure.
Click the plus sign to add a line to the table and select the relevant column (SSNumber) and type (IN).
Then select the tLogRow component and click Sync Column to make sure the schema is passed on from the
preceding tOracleSP component.
Select the Print values in cells of a table check box to facilitate the output reading.
Then save your job and press F6 to run it.
On the console, you can read the output results. All input schema columns are displayed even though they are not
used as parameters in the stored procedure.
The final column shows the expected return value, whether the SS Number checked is valid or not.
Check section Scenario: Inserting data in mother/daughter tables if you want to analyze a set of records from a database
table or DB query and return single records.
tOracleTableList
tOracleTableList properties
Component family
Databases/Oracle
Function
Purpose
This component lists the names of specified Oracle tables using a SELECT statement based on a
WHERE clause.
Basic settings
Component list
Where clause for table name selection
Enter the WHERE clause that will be used to identify the tables to iterate on.
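For example, a clause such as the following (the prefix is hypothetical) restricts the iteration to tables whose names share a prefix; the resulting statement sketched in the comment assumes the component queries a data dictionary view such as USER_TABLES, which is an assumption rather than documented behavior:

TABLE_NAME LIKE 'CUSTOMER%'
-- approximate resulting statement:
-- SELECT TABLE_NAME FROM USER_TABLES WHERE TABLE_NAME LIKE 'CUSTOMER%'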
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component is to be used along with other Oracle components, especially with
tOracleConnection.
Limitation
n/a
Related scenarios
No scenario is available for this component yet.
tPostgresqlBulkExec
tPostgresqlBulkExec properties
tPostgresqlOutputBulk and tPostgresqlBulkExec components are used together to first output the file
that will be then used as parameter to execute the SQL query stated. These two steps compose the
tPostgresqlOutputBulkExec component, detailed in a separate section. The advantage of having two separate steps is that the data can be transformed before it is loaded into the database.
Component family
Databases/Postgresql
Function
Purpose
As a dedicated component, tPostgresqlBulkExec offers gains in performance while carrying out Insert operations on a Postgresql database.
Use an existing connection
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if you need to share an
existing connection between the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection to be shared in the Basic
settings view of the connection component which creates that very database
connection.
2. In the child level, use a dedicated connection component to read that registered
database connection.
For an example about how to share a database connection across Job levels, see
Talend Studio User Guide.
DB Version
Host
Port
Database
Schema
Username
Password
Table
Name of the table to be written. Note that only one table can be written at a time and that
the table must exist for the insert operation to succeed.
Action on table
On the table defined, you can perform one of the following operations:
None: No operation is carried out.
Drop and create table: The table is removed and created again.
Create table: The table does not exist and gets created.
Create table if not exists: The table is created if it does not exist.
Drop table if exists and create: The table is removed if it already exists and created again.
Clear table: The table content is deleted.
Truncate table: The table content is deleted. You do not have the possibility to roll back the operation.
File Name
Schema and Edit Schema
A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component.
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Click Edit Schema to make changes to the schema.
Advanced settings
Action on data
Copy the OID for each row
Retrieve the ID item for each row.
Contains a header line with the names of each column in the file
Specify that the file contains a header line.
File type
Null string
Fields terminated by
Escape char
Text enclosure
Activate standard_conforming_string
Activate the variable.
Force not null for columns
Define the columns' nullability.
Force not null: Select the check box next to the column you want to define as not null.
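These options map onto PostgreSQL's COPY command; a minimal sketch of the kind of statement involved, with a hypothetical table, file path, and delimiter (the exact statement the component issues may differ):

COPY contracts FROM '/tmp/contracts.csv'
  WITH (FORMAT csv, DELIMITER ';', HEADER true, NULL '');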
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic
settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to choose your database
connection dynamically from multiple connections planned in your Job. This feature is useful when you need to
access database tables having the same data structure but in different databases, especially when you are working
in an environment where you cannot change your Job settings, for example, when your Job has to be deployed and
executed independent of Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is selected in the
Basic settings view. Once a dynamic parameter is defined, the Component List box in the Basic settings view
becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component is to be used along with tPostgresqlOutputBulk component. Used together, they can offer gains
in performance while feeding a Postgresql database.
Limitation
n/a
Related scenarios
For use cases in relation with tPostgresqlBulkExec, see the following scenarios:
section Scenario: Inserting transformed data in MySQL database.
section Scenario: Inserting data in MySQL database.
section Scenario: Truncating and inserting file data into Oracle DB.
tPostgresqlCommit
tPostgresqlCommit Properties
This component is closely related to tPostgresqlConnection and tPostgresqlRollback. It usually does not make much sense to use these components independently in a transaction.
Function
Validates the data processed through the job into the connected DB
Purpose
Using a unique connection, this component commits in one go a global transaction instead of doing
that on every row or every batch and thus provides gain in performance.
Basic settings
Component list
Close Connection
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component is to be used along with Postgresql components, especially with tPostgresqlConnection and tPostgresqlRollback.
Limitation
n/a
Related scenario
This component is closely related to tPostgresqlConnection and tPostgresqlRollback. It usually does not make
much sense to use one of these without using a tPostgresqlConnection component to open a connection for the
current transaction.
For tPostgresqlCommit related scenario, see section Scenario: Inserting data in mother/daughter tables.
tPostgresqlClose
tPostgresqlClose properties
Component family
Databases/Postgresql
Function
Purpose
Close a transaction.
Basic settings
Component list
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component is to be used along with Postgresql components, especially with tPostgresqlConnection and tPostgresqlCommit.
Limitation
n/a
Related scenario
No scenario is available for this component yet.
tPostgresqlConnection
tPostgresqlConnection Properties
This component is closely related to tPostgresqlCommit and tPostgresqlRollback. It usually doesn't make much
sense to use one of these without using a tPostgresqlConnection component to open a connection for the current
transaction.
Component family
Databases/Postgresql
Function
Purpose
This component allows you to commit all of the Job data to an output database in just a single
transaction, once the data has been validated.
Basic settings
Property type
DB Version
Host
Port
Database
Schema
Use or register a shared DB Select this check box to share your connection or fetch a connection
Connection
shared by a parent or child Job. This allows you to share one single
database connection among several database connection components
from different Job levels that can be either parent or child.
This option is incompatible with the Use dynamic job and
Use an independent process to run subjob options of the
tRunJob component. Using a shared database connection
together with a tRunJob component with either of these
two options enabled will cause your Job to fail.
Shared DB Connection Name: set or type in the shared connection
name.
Advanced settings
Auto Commit
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Usage
This component is to be used along with Postgresql components, especially with tPostgresqlCommit and tPostgresqlRollback.
Limitation
n/a
Related scenario
This component is closely related to tPostgresqlCommit and tPostgresqlRollback. It usually doesn't make much
sense to use one of these without using a tPostgresqlConnection component to open a connection for the current
transaction.
tPostgresqlInput
tPostgresqlInput properties
Component family
Databases/PostgreSQL
Function
Purpose
tPostgresqlInput executes a DB query with a strictly defined order which must correspond to the schema
definition. Then it passes on the field list to the next component via a Main row link.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Use an existing connection
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if you need to share an existing
connection between the two levels, for example, to share the connection created by
the parent Job with the child Job, you have to:
1. In the parent level, register the database connection to be shared in the Basic
settings view of the connection component which creates that very database
connection.
2. In the child level, use a dedicated connection component to read that registered
database connection.
For an example about how to share a database connection across Job levels, see Talend
Studio User Guide.
DB Version
Host
Port
Database
Schema
Username
Password
Schema and Edit Schema
A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component.
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Click Edit Schema to make changes to the schema.
Built-in: You create and store the schema locally for this component only. Related topic: see
Talend Studio User Guide.
Table name
Query type and Query: Enter your DB query, paying particular attention to properly sequencing the fields in order to match the schema definition.
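To make the ordering constraint concrete, here is a minimal JDBC sketch of what the component does at run time, assuming a two-column schema (id: Integer, name: String) and a hypothetical table named staff: the SELECT lists the fields in exactly the schema order.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class InputQuerySketch {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection(
                     "jdbc:postgresql://localhost:5432/mydb", "user", "pass");
                 Statement stmt = conn.createStatement();
                 // The column order in the query matches the schema definition: id, then name.
                 ResultSet rs = stmt.executeQuery("SELECT id, name FROM staff")) {
                while (rs.next()) {
                    int id = rs.getInt(1);         // field 1 -> schema column id
                    String name = rs.getString(2); // field 2 -> schema column name
                    System.out.println(id + " " + name);
                }
            }
        }
    }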
Advanced settings
Use cursor
When selected, helps you decide the row set to work with at a time and thus optimizes performance.
Trim all the String/Char columns: Select this check box to remove leading and trailing whitespace from all the String/Char columns.
Trim column
tStatCatcher Statistics: Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to choose your database
connection dynamically from multiple connections planned in your Job. This feature is useful when you need to
access database tables having the same data structure but in different databases, especially when you are working
in an environment where you cannot change your Job settings, for example, when your Job has to be deployed
and executed independent of Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is selected in the
Basic settings view. Once a dynamic parameter is defined, the Component List box in the Basic settings view
becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
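For example, if the Code field is set to a context variable such as context.dbConn (a hypothetical name), an exported Job can be pointed at one connection component or another at launch time by passing --context_param dbConn=tPostgresqlConnection_1 or --context_param dbConn=tPostgresqlConnection_2 to the Job script, without editing the Job itself.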
Global Variables
NB_LINE: Indicates the number of rows read by an input component or transferred to an output component. This
is an After variable and it returns an integer.
QUERY: Indicates the query to be processed. This is a Flow variable and it returns a string.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the
variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable means it functions
after the execution of a component.
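In the Java code that Talend Studio generates, these variables live in the globalMap. For instance, a tJava component placed after this one could read them as follows (a sketch assuming the component instance is named tPostgresqlInput_1):

    // Inside a tJava component, after the subjob containing tPostgresqlInput_1 has finished:
    Integer count = (Integer) globalMap.get("tPostgresqlInput_1_NB_LINE"); // After variable
    String query = (String) globalMap.get("tPostgresqlInput_1_QUERY");     // Flow variable
    System.out.println("Read " + count + " rows with: " + query);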
Usage
This component covers all possible SQL queries for Postgresql databases.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided. You can easily
find out and add such JARs in the Integration perspective of your studio. For details, see the section about external
modules in the Talend Installation and Upgrade Guide.
Related scenarios
For related scenarios, see:
section Scenario 1: Displaying selected data from DB table.
section Scenario 2: Using StoreSQLQuery variable.
section Scenario: Reading data from different MySQL databases using dynamically loaded connection
parameters.
tPostgresqlOutput
tPostgresqlOutput properties
Component family
Databases/Postgresql
Function
Purpose
tPostgresqlOutput executes the action defined on the table and/or on the data contained in the table, based on the
flow incoming from the preceding component in the job.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Use an existing connection: Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if you need to share an existing
connection between the two levels, for example, to share the connection created by
the parent Job with the child Job, you have to:
1. In the parent level, register the database connection to be shared in the Basic
settings view of the connection component which creates that very database
connection.
2. In the child level, use a dedicated connection component to read that registered
database connection.
For an example about how to share a database connection across Job levels, see
Talend Studio User Guide.
DB Version
Host
Port
Database
Schema
Username
Password
Table
Name of the table to be written. Note that only one table can be written at a time.
Action on table
On the table defined, you can perform one of the following operations:
None: No operation is carried out.
Drop and create table: The table is removed and created again.
Create table: The table does not exist and gets created.
Create table if not exists: The table is created if it does not exist.
Drop table if exists and create: The table is removed if already exists and created again.
Clear table: The table content is deleted.
Truncate table: The table content is deleted. You do not have the possibility to rollback the operation.
Action on data
Advanced settings
Die on error
This check box is cleared by default, meaning that rows on error are skipped and the process completes for error-free rows.
Commit every
Enter the number of rows to be completed before committing batches of rows together into
the DB. This option ensures transaction quality (but not rollback) and, above all, better
performance at execution.
Additional Columns
This option is not offered if you create (with or without drop) the DB table. This option allows you to call SQL functions to perform actions on columns that are not insert, update, or delete actions, or actions that require particular preprocessing.
Name: Type in the name of the schema column to be altered or inserted as a new column.
SQL expression: Type in the SQL statement to be executed in order to alter or insert the
relevant column data.
Position: Select Before, Replace or After following the action to be performed on the
reference column.
Reference column: Type in a column of reference that the tDBOutput can use to place or
replace the new or altered column.
Select this check box to use savepoints in the transaction. This check box will not be available
if you select:
the Die on error check box in the Basic settings view, or
the Use Batch Size check box in the Advanced settings view.
This check box will not work if you:
type in 0 in the Commit every field, or
select the Use an existing connection check box in the Basic settings view while the Auto
Commit mode is activated in the database connection component.
Select this check box to customize a request, especially when there is double action on data.
Debug query mode: Select this check box to display each step during processing entries in a database.
Support null in SQL WHERE statement: Select this check box if you want to deal with the Null values contained in a DB table.
Ensure that the Nullable check box is selected for the corresponding columns in the schema.
Use Batch Size: Select this check box to activate the batch mode for data processing. In the Batch Size field that appears when this check box is selected, you can type in the number you need to define the batch size to be processed.
tStatCatcher Statistics: Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to choose your database
connection dynamically from multiple connections planned in your Job. This feature is useful when you need to
access database tables having the same data structure but in different databases, especially when you are working
in an environment where you cannot change your Job settings, for example, when your Job has to be deployed
and executed independent of Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is selected in the
Basic settings view. Once a dynamic parameter is defined, the Component List box in the Basic settings view
becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Global Variables
NB_LINE: Indicates the number of rows read by an input component or transferred to an output component. This
is an After variable and it returns an integer.
NB_LINE_UPDATED: Indicates the number of rows updated. This is an After variable and it returns an integer.
NB_LINE_INSERTED: Indicates the number of rows inserted. This is an After variable and it returns an integer.
NB_LINE_DELETED: Indicates the number of rows deleted. This is an After variable and it returns an integer.
NB_LINE_REJECTED: Indicates the number of rows rejected. This is an After variable and it returns an integer.
QUERY: Indicates the query to be processed. This is an After variable and it returns a string.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable
to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable means it functions
after the execution of a component.
Usage
This component offers the flexibility benefit of the DB query and covers all of the SQL queries possible.
This component must be used as an output component. It allows you to carry out actions on a table or on the data of
a table in a Postgresql database. It also allows you to create a reject flow using a Row > Rejects link to filter data
in error. For an example of tMySqlOutput in use, see section Scenario 3: Retrieve data in error with a Reject link.
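In plain JDBC terms, Commit every and Use Batch Size amount to buffering inserts client-side and committing at a fixed interval. A rough sketch, assuming a hypothetical staff table and connection details:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    public class OutputBatchSketch {
        public static void main(String[] args) throws Exception {
            final int commitEvery = 10000; // the Commit every setting
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost:5432/mydb", "user", "pass");
                 PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO staff (id, name) VALUES (?, ?)")) {
                conn.setAutoCommit(false);
                for (int i = 1; i <= 100000; i++) {
                    ps.setInt(1, i);
                    ps.setString(2, "name-" + i);
                    ps.addBatch();             // Use Batch Size: rows buffered client-side
                    if (i % commitEvery == 0) {
                        ps.executeBatch();
                        conn.commit();         // Commit every: periodic commit
                    }
                }
                ps.executeBatch();             // flush the remaining rows
                conn.commit();
            }
        }
    }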
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided. You can easily
find out and add such JARs in the Integration perspective of your studio. For details, see the section about external
modules in the Talend Installation and Upgrade Guide.
Related scenarios
For tPostgresqlOutput related topics, see:
section Scenario: Writing a row to a table in the MySql database via an ODBC connection.
section Scenario 1: Adding a new column and altering data in a DB table.
tPostgresqlOutputBulk
tPostgresqlOutputBulk properties
The tPostgresqlOutputBulk and tPostgresqlBulkExec components are generally used together as part of a two-step process. In the first step, an output file is generated. In the second step, this file is used in the INSERT operation
used to feed a database. These two steps are fused together in the tPostgresqlOutputBulkExec component,
detailed in a separate section. The advantage of having two separate steps is that it makes it possible to transform
data before it is loaded in the database.
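A minimal sketch of this two-step pattern in plain Java, assuming the PostgreSQL JDBC driver and hypothetical file, table, and connection details (the driver's CopyManager plays the role of the bulk INSERT):

    import java.io.FileReader;
    import java.io.PrintWriter;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import org.postgresql.copy.CopyManager;
    import org.postgresql.jdbc.PgConnection;

    public class BulkLoadSketch {
        public static void main(String[] args) throws Exception {
            // Step 1: generate the delimited bulk file (tPostgresqlOutputBulk's role);
            // each row could be transformed before being written out.
            try (PrintWriter out = new PrintWriter("/tmp/staff.csv")) {
                out.println("1;Alice");
                out.println("2;Bob");
            }
            // Step 2: feed the file to the database (tPostgresqlBulkExec's role).
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost:5432/mydb", "user", "pass")) {
                CopyManager copy = conn.unwrap(PgConnection.class).getCopyAPI();
                long rows = copy.copyIn(
                    "COPY staff FROM STDIN WITH (FORMAT csv, DELIMITER ';')",
                    new FileReader("/tmp/staff.csv"));
                System.out.println(rows + " rows loaded");
            }
        }
    }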
Component family
Databases/Postgresql
Function
Writes a file with columns based on the defined delimiter and the Postgresql standards
Purpose
Prepares the file to be used as parameters in the INSERT query to feed the Postgresql database.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
File Name
Append
Select this check box to add the new rows at the end of the file.
Advanced settings
Row separator
Field separator
Include header
Select this check box to include the column header in the file.
Encoding
Select the encoding from the list or select Custom and define it manually. This field is compulsory for DB data handling.
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Global Variables
NB_LINE: Indicates the number of rows read by an input component or transferred to an output component. This is an After variable and it returns an integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable means it functions after the execution of a component.
Usage
This component is to be used along with the tPostgresqlBulkExec component. Used together, they offer gains in performance while feeding a Postgresql database.
Related scenarios
For use cases in relation with tPostgresqlOutputBulk, see the following scenarios:
section Scenario: Inserting transformed data in MySQL database.
section Scenario: Inserting data in MySQL database.
section Scenario: Truncating and inserting file data into Oracle DB.
tPostgresqlOutputBulkExec
tPostgresqlOutputBulkExec properties
The tPostgresqlOutputBulk and tPostgresqlBulkExec components are generally used together as part of a two-step process. In the first step, an output file is generated. In the second step, this file is used in the INSERT operation used to feed a database. These two steps are fused together in the tPostgresqlOutputBulkExec component.
Component family
Databases/Postgresql
Function
Purpose
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
DB Version
Host
Port
Database
Schema
Table
Name of the table to be written. Note that only one table can be
written at a time and that the table must exist for the insert operation
to succeed.
Action on table
File Name
Schema and Edit Schema: If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Click Edit Schema to make changes to the schema.
Built-in: You create and store the schema locally for this component only. Related topic: see Talend Studio User Guide.
Advanced settings
Action on data
Copy the OID for each row: Retrieve the ID item for each row.
Contains a header line with the names of each column in the file: Specify that the table contains a header.
Encoding
Select the encoding from the list or select CUSTOM and define it
manually. This field is compulsory for DB data handling.
File type
Null string
Row separator
Fields terminated by
Escape char
Text enclosure
Activate standard_conforming_string: Activate the variable.
Force not null for columns
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Usage
This component is mainly used when no particular transformation is required on the data to be loaded onto the database.
Limitation
The database server must be installed on the same machine where the Studio is installed or where
the Job using tPostgresqlOutputBulkExec is deployed, so that the component functions properly.
Related scenarios
For use cases in relation with tPostgresqlOutputBulkExec, see the following scenarios:
section Scenario: Inserting transformed data in MySQL database.
section Scenario: Inserting data in MySQL database.
section Scenario: Truncating and inserting file data into Oracle DB.
tPostgresqlRollback
tPostgresqlRollback properties
This component is closely related to tPostgresqlCommit and tPostgresqlConnection. It usually does not make
much sense to use these components independently in a transaction.
Component family
Databases
Function
Purpose
Basic settings
Component list
Close Connection
Clear this check box to continue to use the selected connection once
the component has performed its task.
Advanced settings
tStatCatcher Statistics: Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
Limitation
n/a
Related scenario
This component is closely related to tPostgresqlConnection and tPostgresqlCommit. It usually does not make
much sense to use one of them without using a tPostgresqlConnection component to open a connection for the
current transaction.
For tPostgresqlRollback related scenario, see section tMysqlRollback.
tPostgresqlRow
tPostgresqlRow properties
Component family
Databases/Postgresql
Function
tPostgresqlRow is the specific component for this database query. It executes the SQL query stated onto the specified database. The row suffix means the component implements a flow in the job design although it doesn't provide output.
Purpose
Depending on the nature of the query and the database, tPostgresqlRow acts on the actual DB structure or on the data (although without handling data). The SQLBuilder tool helps you easily write your SQL statements.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Use an existing connection: Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if you need to share an existing
connection between the two levels, for example, to share the connection created by
the parent Job with the child Job, you have to:
1. In the parent level, register the database connection to be shared in the Basic
settings view of the connection component which creates that very database
connection.
2. In the child level, use a dedicated connection component to read that registered
database connection.
For an example about how to share a database connection across Job levels, see
Talend Studio User Guide.
DB Version
Host
Port
Database
Schema
Username
Password
Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component.
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Click Edit Schema to make changes to the schema.
Built-in: You create and store the schema locally for this component only. Related topic: see Talend Studio User Guide.
Query type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: Fill in manually the query statement or build it graphically using SQLBuilder
Query
Enter your DB query, paying particular attention to properly sequencing the fields in order to match the schema definition.
Die on error
This check box is selected by default. Clear the check box to skip the row on error and complete
the process for error-free rows. If needed, you can retrieve the rows on error via a Row >
Rejects link.
Advanced settings
Propagate QUERY's recordset: Select this check box to insert the result of the query into a COLUMN of the current flow. Select this column from the use column list.
This option allows the component to have a different schema from that of the preceding component. Moreover, the column that holds the QUERY's recordset should be set to the type of Object and this component is usually followed by tParseRecordSet.
Use PreparedStatement: Select this check box if you want to query the database using a PreparedStatement. In the Set PreparedStatement Parameter table, define the parameters represented by ? in the SQL instruction of the Query field in the Basic settings tab.
Parameter Index: Enter the parameter position in the SQL instruction.
Parameter Type: Enter the parameter type.
Parameter Value: Enter the parameter value.
This option is very useful if you need to execute the same query several times, as performance levels are increased.
Commit every: Number of rows to be completed before committing batches of rows together into the DB. This option ensures transaction quality (but not rollback) and, above all, better performance on execution.
tStatCatcher Statistics: Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to choose your database
connection dynamically from multiple connections planned in your Job. This feature is useful when you need to
access database tables having the same data structure but in different databases, especially when you are working
in an environment where you cannot change your Job settings, for example, when your Job has to be deployed and
executed independent of Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Global Variables
QUERY: Indicates the query to be processed. This is a Flow variable and it returns a string.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable
to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable means it functions
after the execution of a component.
Usage
This component offers the flexibility benefit of the DB query and covers all possible SQL queries.
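In JDBC terms, the Use PreparedStatement option corresponds to binding ? parameters instead of concatenating values into the query string. A sketch under hypothetical table and connection details:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    public class RowPreparedSketch {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost:5432/mydb", "user", "pass");
                 PreparedStatement ps = conn.prepareStatement(
                    "UPDATE staff SET name = ? WHERE id = ?")) {
                // Parameter Index / Type / Value, as in the Set PreparedStatement Parameter table:
                ps.setString(1, "Alice"); // index 1, type String, value "Alice"
                ps.setInt(2, 42);         // index 2, type Int, value 42
                ps.executeUpdate();       // the same statement can be re-executed with new bindings
            }
        }
    }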
Related scenarios
For related topics, see:
section Scenario 3: Combining two flows for selective output.
section Scenario: Resetting a DB auto-increment.
section Scenario 1: Removing and regenerating a MySQL table index.
tPostgresqlSCD
tPostgresqlSCD belongs to two component families: Business Intelligence and Databases. For more information
on it, see section tPostgresqlSCD.
tPostgresqlSCDELT
tPostgresqlSCDELT belongs to two component families: Business Intelligence and Databases. For more
information on it, see section tPostgresqlSCDELT.
tSybaseBulkExec
tSybaseBulkExec Properties
The tSybaseOutputBulk and tSybaseBulkExec components are generally used together as parts of a two-step process. In the first step, an output file is generated. In the second step, this file is used in the INSERT operation
used to feed a database. These two steps are fused together in the tSybaseOutputBulkExec component, detailed
in a separate section. The advantage of using two separate components is that the data can be transformed before
it is loaded in the database.
Component family
Databases
Function
Purpose
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Use an existing connection: Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
Server
Port
Database
Database name
Bcp Utility
Name of the utility to be used to copy data over to the Sybase server.
Server
Batch size
Table
Name of the table to be written. Note that only one table can be
written at a time and that the table must exist for the insert operation
to succeed.
Action on table
Create table: The table does not exist and gets created.
Create table if not exists: The table is created if it does not exist.
Drop table if exists and create: The table is removed if it already
exists and created again.
Clear table: The table content is deleted.
Truncate table: The table content is deleted. You do not have the
possibility to rollback the operation.
File Name
Advanced settings
Select this check box to specify an interface file in the Interface file field.
Additional JDBC Parameters: Specify additional connection properties in the existing DB connection, to allow specific character set support, e.g., CHARSET=KANJISJIS_OS to get support of Japanese characters.
Action on data
Field Terminator
Row Terminator
Head row
Encoding
Select the encoding from the list or select Custom and define it
manually. This field is compulsory for DB data handling.
Output
Select the type of output for the standard output of the Sybase
database:
to console,
to global variable.
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at a Job level as well as at each component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is
selected in the Basic settings view. Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component is mainly used when no particular transformation is required on the data to be
loaded onto the database.
Limitation
The database server/client must be installed on the same machine where the Studio is installed or
where the Job using tSybaseBulkExec is deployed, so that the component functions properly.
As opposed to the Oracle dedicated bulk component, no action on data is possible using this Sybase
dedicated component.
This component requires installation of its related jar files. For more information about the
installation of these missing jar files, see the section describing how to configure the Studio of the
Talend Installation and Upgrade Guide.
Related scenarios
For tSybaseBulkExec related topics, see:
section Scenario: Inserting transformed data in MySQL database.
section Scenario: Truncating and inserting file data into Oracle DB.
tSybaseClose
tSybaseClose properties
Function
Purpose
Close a transaction.
Basic settings
Component list
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component is to be used along with Sybase components, especially with tSybaseConnection
and tSybaseCommit.
Limitation
n/a
Related scenario
No scenario is available for this component yet.
tSybaseCommit
tSybaseCommit Properties
This component is closely related to tSybaseConnection and tSybaseRollback. It usually does not make much
sense to use these components independently in a transaction.
Component family
Databases/Sybase
Function
tSybaseCommit validates the data processed through the Job into the connected DB.
Purpose
Using a unique connection, this component commits a global transaction in one go, instead of committing on every row or every batch, and thus provides a gain in performance.
Basic settings
Component list
Close Connection
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to choose your database connection dynamically from multiple connections planned in your Job. This feature is useful when you need to access database tables having the same data structure but in different databases, especially when you are working in an environment where you cannot change your Job settings, for example, when your Job has to be deployed and executed independent of Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component is to be used along with Sybase components, especially with tSybaseConnection
and tSybaseRollback.
Limitation
n/a
Related scenario
This component is closely related to tSybaseConnection and tSybaseRollback. It usually does not make much
sense to use one of these without using a tSybaseConnection component to open a connection for the current
transaction.
For tSybaseCommit related scenario, see section Scenario: Inserting data in mother/daughter tables.
tSybaseConnection
tSybaseConnection Properties
This component is closely related to tSybaseCommit and tSybaseRollback. It usually does not make much sense
to use one of these without using a tSybaseConnection component to open a connection for the current transaction.
Component family
Databases/Sybase
Function
Purpose
This component allows you to commit all of the Job data to an output database in just a single
transaction, once the data has been validated.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Host
Port
Database
Use or register a shared DB Connection: Select this check box to share your connection or fetch a connection shared by a parent or child Job. This allows you to share one single DB connection among several DB connection components from different Job levels that can be either parent or child.
This option is incompatible with the Use dynamic job and Use an independent process to run subjob options of the tRunJob component. Using a shared database connection together with a tRunJob component with either of these two options enabled will cause your Job to fail.
Shared DB Connection Name: set or type in the shared connection name.
Usage
This component is to be used along with Sybase components, especially with tSybaseCommit and
tSybaseRollback.
Limitation
This component requires installation of its related jar files. For more information about the
installation of these missing jar files, see the section describing how to configure the Studio of the
Talend Installation and Upgrade Guide.
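For orientation, the connection this component opens corresponds to a plain jConnect JDBC connection. A minimal sketch, assuming the jConnect driver is on the classpath and hypothetical host and credentials:

    import java.sql.Connection;
    import java.sql.DriverManager;

    public class SybaseConnectionSketch {
        public static void main(String[] args) throws Exception {
            Class.forName("com.sybase.jdbc4.jdbc.SybDriver"); // jConnect 7 driver class
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:sybase:Tds:localhost:5000/mydb", "user", "pass")) {
                conn.setAutoCommit(false); // subsequent components share this transaction
                // ... subjobs write through this connection ...
                conn.commit();             // tSybaseCommit's role
            }
        }
    }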
Related scenarios
For a tSybaseConnection related scenario, see section Scenario: Inserting data in mother/daughter tables.
tSybaseInput
tSybaseInput Properties
Component family
Databases/Sybase
Function
Purpose
tSybaseInput executes a DB query with a strictly defined order which must correspond to the
schema definition. Then it passes on the field list to the next component via a Main row link.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Use an existing connection: Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
Server
Port
Database
Sybase Schema
Table Name
Advanced settings
Trim all the String/Char columns: Select this check box to remove leading and trailing whitespace from all the String/Char columns.
Trim column
tStatCatcher Statistics: Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is
selected in the Basic settings view. Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component covers all possible SQL queries for Sybase databases.
Limitation
This component requires installation of its related jar files. For more information about the
installation of these missing jar files, see the section describing how to configure the Studio of the
Talend Installation and Upgrade Guide.
Related scenarios
For related topics, see:
section Scenario 1: Displaying selected data from DB table.
section Scenario 2: Using StoreSQLQuery variable.
section Scenario: Reading data from different MySQL databases using dynamically loaded connection
parameters.
tSybaseIQBulkExec
tSybaseIQBulkExec Properties
Component family
Databases/Sybase IQ
Function
tSybaseIQBulkExec uploads a bulk file in a Sybase IQ database.
Purpose
As a dedicated component, it allows gains in performance during Insert operations to a Sybase IQ database.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
DB Version
Use an existing connection: Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if you need to share an existing connection between the two levels, for example, to share the connection created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection to be shared in the Basic settings view of the connection component which creates that very database connection.
2. In the child level, use a dedicated connection component to read that registered database connection.
For an example about how to share a database connection across Job levels, see Talend Studio User Guide.
Sybase IQ 12 only.
Host
Port
Sybase IQ 15 only.
Data Source
Select the type of the data source to be used and complete the corresponding DSN information in the field alongside. The available types are:
- DSN;
- FILEDSN.
When the FILEDSN type is used, a three-dot button appears next to the Data Source field to allow you to browse to the data source file of interest.
Database
Database name
Table
Name of the table to be written. Note that only one table can be written at a time and that the table must exist for the insert operation to succeed.
Action on table On the table defined, you can perform one of the following operations:
None: No operation is carried out.
Drop and create table: The table is removed and created again.
Create table: The table does not exist and gets created.
Create table if not exists: The table is created if it does not exist.
Drop table if exists and create: The table is removed if it already exists and created again.
Clear table: The table content is deleted.
Truncate table: The table content is deleted. You do not have the possibility to rollback the operation.
Local filename: Name of the file to be loaded.
Schema and Edit Schema: A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component.
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Click Edit Schema to make changes to the schema.
Built-in: You create and store the schema locally for this component only. Related topic: see Talend Studio User Guide.
Advanced settings
Additional JDBC Parameters
Specify additional connection properties in the existing DB connection, to allow specific character set support.
Lines terminated by
Field terminated by
Use enclosed quotes: Select this check box to use data enclosure characters.
Use fixed length: Select this check box to set a fixed width for data lines.
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at a Job level as well as at each component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to choose your database connection dynamically from multiple connections planned in your Job. This feature is useful when you need to access database tables having the same data structure but in different databases, especially when you are working in an environment where you cannot change your Job settings, for example, when your Job has to be deployed and executed independent of Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is selected in the Basic settings view. Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This dedicated component offers performance and flexibility of Sybase IQ DB query handling.
Limitation
As opposed to the Oracle dedicated bulk component, no action on data is possible using this Sybase dedicated component.
This component requires installation of its related jar files. For more information about the installation of these missing jar
files, see the section describing how to configure the Studio of the Talend Installation and Upgrade Guide.
The jodbc.jar also needs to be installed separately in the Modules view of the Integration perspective in your studio.
For Sybase IQ 12, the database client/server should be installed on the same machine where the Studio is installed
or where the Job using tSybaseIQBulkExec is deployed, so that the component functions properly.
For Sybase IQ 15, it is enough that only the database client is installed on the same machine where the Studio is installed or where the Job using tSybaseIQBulkExec is deployed for the component to function properly. However, this requires certain setup on the Sybase IQ 15 server. For details, see Sybase IQ client-side load support enhancements.
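Under the hood, the component essentially issues a Sybase IQ LOAD TABLE statement against the server. A rough sketch over JDBC, with hypothetical table, file, and connection details; the exact LOAD TABLE options vary between IQ versions, so treat the statement as illustrative only:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class IqBulkLoadSketch {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:sybase:Tds:localhost:2638/iqdb", "user", "pass");
                 Statement stmt = conn.createStatement()) {
                // Classic IQ form: a delimiter is given per column.
                stmt.executeUpdate(
                    "LOAD TABLE staff (id ',', name '\\n') "
                    + "FROM '/tmp/staff.csv' ESCAPES OFF QUOTES OFF");
            }
        }
    }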
Related scenarios
For tSybaseIQBulkExec related topics, see:
section Scenario: Bulk-loading data to a Sybase IQ 12 database.
section Scenario: Inserting transformed data in MySQL database.
section Scenario: Truncating and inserting file data into Oracle DB.
tSybaseIQOutputBulkExec
tSybaseIQOutputBulkExec properties
Component family
Databases/Sybase IQ
Function
Purpose
Basic settings
Property type
DB Version
Use an existing connection: Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
Host
Port
Sybase IQ 12 only.
Data Source
Sybase IQ 15 only.
Select the type of the data source to be used and complete the corresponding DSN information in the field alongside. The available types are:
- DSN;
- FILEDSN.
When the FILEDSN type is used, a three-dot button appears next to the Data Source field to allow you to browse to the data source file of interest.
Database
Table
Name of the table to be written. Note that only one table can be
written at a time and that the table must exist for the insert operation
to succeed.
Action on table
File Name
Append
Select this check box to add the new rows at the end of the records.
Advanced settings
Fields terminated by
Lines terminated by
Include Head
Encoding
Select the encoding type from the list or select Custom and define it
manually. This field is compulsory for DB data handling.
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to choose
your database connection dynamically from multiple connections planned in your Job. This feature is
useful when you need to access database tables having the same data structure but in different databases,
especially when you are working in an environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed independent of Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is selected
in the Basic settings view. Once a dynamic parameter is defined, the Component List box in the Basic
settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component is mainly used when no particular transformation is required on the data to be loaded
onto the database.
Limitation
This component requires installation of its related jar files. For more information about the installation of
these missing jar files, see the section describing how to configure the Studio of the Talend Installation
and Upgrade Guide.
The jodbc.jar also needs to be installed separately in the Modules view of the Integration perspective
in your studio.
For Sybase IQ 12, the database client/server should be installed on the same machine where
the Studio is installed or where the Job using tSybaseIQOutputBulkExec is deployed, so that
the component functions properly.
For Sybase IQ 15, it is enough that only the database client is installed on the same machine where the Studio is installed or where the Job using tSybaseIQOutputBulkExec is deployed for the component to function properly. However, this requires certain setup on the Sybase IQ 15 server. For details, see Sybase IQ client-side load support enhancements.
Scenario
2. Click the [+] button to add two columns, namely id and name.
3. Select the type for id and name, respectively int and String.
4.-5. Select the function for id and name, respectively Numeric.sequence and TalendDataGenerator.getFirstName.
6. Click Ok to close the editor and click Yes on the pop-up that appears to propagate the changes.
7.-9. [Figures: configuring the component]
10. In the Username and Password fields, enter the authentication credentials.
11. In the Table field, enter the table name.
12. In the Action on table list, select Create table if not exists.
13. In the Filename field, enter the full path of the file to hold the data.
Finally, in the Sybase Central console, open the table staff to check the data.
Related scenarios
For use cases in relation with tSybaseIQOutputBulkExec, see the following scenarios:
section Scenario: Inserting transformed data in MySQL database.
section Scenario: Inserting data in MySQL database.
section Scenario: Truncating and inserting file data into Oracle DB.
tSybaseOutput
tSybaseOutput Properties
Component family
Databases/Sybase
Function
Purpose
tSybaseOutput executes the action defined on the table and/or on the data contained in the table,
based on the flow incoming from the preceding component in the job.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Use an existing connection: Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
Server
Port
Database
Sybase Schema
Table
Name of the table to be written. Note that only one table can be written at a time.
Action on table
Truncate table: The table content is deleted. You do not have the
possibility to rollback the operation.
Turn on identity insert
Select this check box to use your own sequence for the identity value
of the inserted records (instead of having the SQL Server pick the
next sequential value).
Action on data
Advanced settings
Die on error
This check box is selected by default. Clear the check box to skip the
row on error and complete the process for error-free rows. If needed,
you can retrieve the rows on error via a Row > Rejects link.
Commit every
Additional Columns
This option is not offered if you create (with or without drop) the DB table. This option allows you to call SQL functions to perform actions on columns that are not insert, update, or delete actions, or actions that require particular preprocessing.
Name: Type in the name of the schema column to be altered or inserted as a new column.
SQL expression: Type in the SQL statement to be executed in order
to alter or insert the relevant column data.
Position: Select Before, Replace or After following the action to be
performed on the reference column.
Reference column: Type in a column of reference that the
tDBOutput can use to place or replace the new or altered column.
Debug query mode: Select this check box to display each step during processing entries in a database.
Use Batch Size: Select this check box to activate the batch mode for data processing. In the Batch Size field that appears when this check box is selected, you can type in the number you need to define the batch size to be processed.
This check box is available only when you have selected the Insert, the Update or the Delete option in the Action on data field.
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is
selected in the Basic settings view. Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component offers the flexibility benefit of the DB query and covers all of the SQL queries
possible.
This component must be used as an output component. It allows you to carry out actions on a table
or on the data of a table in a Sybase database. It also allows you to create a reject flow using a
Row > Rejects link to filter data in error. For an example of tMySqlOutput in use, see section
Scenario 3: Retrieve data in error with a Reject link.
Limitation
This component requires installation of its related jar files. For more information about the
installation of these missing jar files, see the section describing how to configure the Studio of the
Talend Installation and Upgrade Guide.
Related scenarios
For use cases in relation with tSybaseOutput, see:
section Scenario: Writing a row to a table in the MySql database via an ODBC connection.
section Scenario 1: Adding a new column and altering data in a DB table.
tSybaseOutputBulk
tSybaseOutputBulk properties
The tSybaseOutputBulk and tSybaseBulkExec components are generally used together as parts of a two-step process. In the first step, an output file is generated. In the second step, this file is used in the INSERT operation
used to feed a database. These two steps are fused together in the tSybaseOutputBulkExec component, detailed
in a separate section. The advantage of using two separate components is that the data can be transformed before
it is loaded in the database.
Component family
Databases/Sybase
Function
Writes a file with columns based on the defined delimiter and the Sybase standards
Purpose
Prepares the file to be used as parameter in the INSERT query to feed the Sybase database.
Basic settings
Property type
File Name
Append
Select this check box to add the new rows at the end of the file.
Advanced settings
Row separator
Field separator
Include header
Select this check box to include the column header in the file.
Encoding
Select the encoding from the list or select Custom and define it manually. This field is compulsory for DB data handling.
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Usage
This component is to be used along with the tSybaseBulkExec component. Used together, they offer gains in performance while feeding a Sybase database.
Limitation
This component requires installation of its related jar files. For more information about the installation of these missing jar files, see the section describing how to configure the Studio of the Talend Installation and Upgrade Guide.
Related scenarios
For use cases in relation with tSybaseOutputBulk, see the following scenarios:
section Scenario: Inserting transformed data in MySQL database.
section Scenario: Inserting data in MySQL database.
section Scenario: Truncating and inserting file data into Oracle DB.
tSybaseOutputBulkExec
tSybaseOutputBulkExec properties
The tSybaseOutputBulk and tSybaseBulkExec components are generally used together as parts of a two-step process. In the first step, an output file is generated. In the second step, this file is used in the INSERT operation used to feed a database. These two steps are fused together in the tSybaseOutputBulkExec component.
Component family
Databases/Sybase
Function
Purpose
Basic settings
Property type
Use an existing connection: Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
Server
Port
Database
Bcp utility
Name of the utility to be used to copy data over to the Sybase server.
Table
Name of the table to be written. Note that only one table can be
written at a time and that the table must exist for the insert operation
to succeed.
Action on table
Drop and create a table: The table is removed and created again.
Create a table: The table does not exist and gets created.
Create a table if not exists: The table is created if it does not exist.
Clear a table: The table content is deleted.
File Name
Append
Select this check box to add the new rows at the end of the records.
Advanced settings
Select this check box to specify an interface file in the Interface file field.
Additional JDBC Parameters: Specify additional connection properties in the existing DB connection, to allow specific character set support, e.g., CHARSET=KANJISJIS_OS to get support of Japanese characters.
Action on data: On the data of the table defined, you can perform:
Bulk Insert: Add multiple entries to the table. If duplicates are found, the Job stops.
Bulk Update: Make simultaneous changes to multiple entries.
Field terminator
DB Row terminator
Type in the number of the file row where the action should start.
Include Head
Encoding
Select the encoding from the list or select Custom and define it
manually. This field is compulsory for DB data handling.
Output
Select the type of output for the standard output of the Sybase
database:
to console,
to global variable.
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is
selected in the Basic settings view. Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component is mainly used when no particular transformation is required on the data to be
loaded onto the database.
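What the component delegates to bcp roughly corresponds to invoking the utility as an external process. A sketch with hypothetical database, table, file, server name, and credentials (flag spellings follow the common bcp conventions; check your Sybase client documentation):

    import java.io.IOException;

    public class BcpSketch {
        public static void main(String[] args) throws IOException, InterruptedException {
            Process p = new ProcessBuilder(
                    "bcp", "mydb..staff", "in", "/tmp/staff.csv",
                    "-c",           // character (text) mode
                    "-t", ";",      // field terminator
                    "-S", "SYBSRV", // server name from the interfaces file
                    "-U", "user", "-P", "pass")
                .inheritIO()        // stream bcp's output to the console
                .start();
            System.exit(p.waitFor());
        }
    }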
Limitation
This component requires installation of its related jar files. For more information about the
installation of these missing jar files, see the section describing how to configure the Studio of the
Talend Installation and Upgrade Guide.
The database server/client must be installed on the same machine where the Studio is installed
or where the Job using tSybaseOutputBulkExec is deployed, so that the component functions
properly.
Related scenarios
For use cases in relation with tSybaseOutputBulkExec, see the following scenarios:
section Scenario: Inserting transformed data in MySQL database.
section Scenario: Inserting data in MySQL database.
section Scenario: Truncating and inserting file data into Oracle DB.
tSybaseRollback
tSybaseRollback properties
This component is closely related to tSybaseCommit and tSybaseConnection. It usually does not make much
sense to use these components independently in a transaction.
Component family
Databases/Sybase
Function
Purpose
Basic settings
Component list
Close Connection
Clear this check box to continue to use the selected connection once
the component has performed its task.
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component is to be used along with Sybase components, especially with tSybaseConnection
and tSybaseCommit.
Limitation
n/a
Related scenarios
For tSybaseRollback related scenario, see section Scenario: Rollback from inserting data in mother/daughter
tables.
tSybaseRow
tSybaseRow Properties
Component family
Databases/Sybase
Function
tSybaseRow is the specific component for this database query. It executes the SQL query stated
onto the specified database. The row suffix means the component implements a flow in the Job
design although it doesn't provide output.
Purpose
Depending on the nature of the query and the database, tSybaseRow acts on the actual DB structure
or on the data (although without handling data). The SQLBuilder tool helps you easily write your
SQL statements.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Select this check box and in the Component List click the relevant
connection component to reuse the connection details you already
defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
Server
Port
Database
Sybase Schema
Table Name
Select this check box to use your own sequence for the identity value
of the inserted records (instead of having the SQL Server pick the
next sequential value).
Query type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: Fill in manually the query statement or build it graphically
using SQLBuilder.
Advanced settings
Query
Die on error
This check box is selected by default. Clear the check box to skip the
row on error and complete the process for error-free rows. If needed,
you can retrieve the rows on error via a Row > Rejects link.
Propagate QUERY's recordset
Select this check box to insert the result of the query into a COLUMN
of the current flow. Select this column from the use column list.
This option allows the component to have a different
schema from that of the preceding component. Moreover,
the column that holds the QUERY's recordset should be
set to the type of Object and this component is usually
followed by tParseRecordSet.
Use PreparedStatement
Commit every
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is
selected in the Basic settings view. Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component offers the flexibility of the DB query and covers all possible SQL queries.
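For illustration, the kind of statement typically typed in the component's Query field boils down to a single JDBC call. The sketch below is illustrative only; the table, columns and connection details are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Sketch of the work tSybaseRow performs: execute one SQL statement
// against the database, without producing an output flow. The table,
// columns and connection details are hypothetical.
public class RowQuerySketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:sybase:Tds:localhost:5000/mydb", "user", "password");
             Statement st = conn.createStatement()) {
            int updated = st.executeUpdate(
                "UPDATE employees SET salary = salary * 1.05 WHERE dept = 'R&D'");
            System.out.println(updated + " rows updated");
        }
    }
}
```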
Limitation
This component requires installation of its related jar files. For more information about the
installation of these missing jar files, see the section describing how to configure the Studio of the
Talend Installation and Upgrade Guide.
Related scenarios
For tSybaseRow related topics, see:
section Scenario 3: Combining two flows for selective output.
section Scenario: Resetting a DB auto-increment.
tSybaseSCD
tSybaseSCD belongs to two component families: Business Intelligence and Databases. For more information on
it, see section tSybaseSCD.
tSybaseSCDELT
tSybaseSCDELT belongs to two component families: Business Intelligence and Databases. For more information
on it, see section tSybaseSCDELT.
tSybaseSP
tSybaseSP properties
Component family
Databases/Sybase
Function
Purpose
tSybaseSP offers a convenient way to centralize multiple or complex queries in a database and
call them easily.
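At the JDBC level, calling such a centralized procedure amounts to a CallableStatement, as in the sketch below. The procedure name find_state_label, its parameters and the connection details are hypothetical.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Types;

// Sketch of the call tSybaseSP wraps: one IN parameter, one OUT parameter.
// Procedure name, parameters and connection details are hypothetical.
public class StoredProcedureSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:sybase:Tds:localhost:5000/mydb", "user", "password");
             CallableStatement call =
                 conn.prepareCall("{call find_state_label(?, ?)}")) {
            call.setInt(1, 42);                          // IN: state id
            call.registerOutParameter(2, Types.VARCHAR); // OUT: state label
            call.execute();
            System.out.println(call.getString(2));
        }
    }
}
```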
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Select this check box and in the Component List click the relevant
connection component to reuse the connection details you already
defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
Host
Port
Database
SP Name
Parameters
Click the Plus button and select the various Schema Columns that
will be required by the procedures. Note that the SP schema can hold
more columns than there are parameters used in the procedure.
Select the Type of parameter:
Additional Parameters
Use Multiple Procedure
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at a Job
level as well as at each component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is
selected in the Basic settings view. Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component is used as an intermediary component. It can be used as a start component, but in
that case only input parameters are allowed.
Limitation
Related scenarios
For related topic, see section Scenario: Finding a State Label using a stored procedure.
Check section tMysqlConnection as well if you want to analyze a set of records from a database table or DB query
and return single records.
tVerticaSCD
tVerticaSCD belongs to two component families: Business Intelligence and Databases. For more information on
it, see section tVerticaSCD.
Databases - appliance/datawarehouse
components
This chapter describes connectors for specific databases oriented to the processing of large volumes of data.
These connectors cover various needs, including: opening connections, reading and writing tables, committing
transactions as a whole, and performing rollback for error handling. These components can be found in the Palette
of the Integration perspective of Talend Studio.
Other types of database connectors, such as connectors for traditional databases and database management, are
documented in Databases - traditional components and Databases - other components.
For Talend Open Studio for Big Data, the Property type, Schema and Query Type of components are always Built-in. For
further information about how to edit a Built-in schema, see Talend Studio User Guide.
tGreenplumBulkExec
tGreenplumBulkExec Properties
The tGreenplumOutputBulk and tGreenplumBulkExec components are used together in a two step process.
In the first step, an output file is generated. In the second step, this file is used in the INSERT statement used to
feed a database. These two steps are fused together in the tGreenplumOutputBulkExec component, detailed in
a separate section. The advantage of using a two step process is that it makes it possible to transform data before
it is loaded in the database.
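The two-step idea can be sketched in plain Java as follows; the file path, table, delimiter and connection details are hypothetical, and the exact bulk statement the component generates may differ.

```java
import java.io.FileWriter;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Two-step bulk load sketch: step 1 mirrors tGreenplumOutputBulk (write
// the file), step 2 mirrors tGreenplumBulkExec (load it in one statement).
// All paths, names and credentials are hypothetical.
public class TwoStepBulkLoadSketch {
    public static void main(String[] args) throws Exception {
        // Step 1: generate the delimited output file.
        try (FileWriter out = new FileWriter("/tmp/customers.csv")) {
            out.write("1;Alice;Paris\n");
            out.write("2;Bob;Berlin\n");
        }

        // Step 2: feed the table from the file in a single bulk statement
        // (Greenplum is reached through the PostgreSQL JDBC driver).
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://gp-host:5432/sales", "user", "password");
             Statement st = conn.createStatement()) {
            st.execute("COPY public.customers FROM '/tmp/customers.csv' "
                     + "WITH DELIMITER ';'");
        }
    }
}
```

Because the file is materialized first, any transformation components can sit between the source and the bulk writer without slowing the final load.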
Component Family
Databases/Greenplum
Function
Purpose
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Select this check box and in the Component List click the relevant
connection component to reuse the connection details you already
defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
Host
Port
Database
Schema
Table
Name of the table to be written. Note that only one table can be
written at a time.
Action on table
Advanced settings
Action on data
Copy the OID for each row: Retrieve the ID item for each row.
Contains a header line with the names of each column in the file: Specify that the table contains a header.
File type
Null string
Fields terminated by
Escape char
Text enclosure
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is
selected in the Basic settings view. Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component is generally used with a tGreenplumOutputBulk component. Used together they
offer gains in performance while feeding a Greenplum database.
Related scenarios
For more information about tGreenplumBulkExec, see:
section Scenario: Inserting transformed data in MySQL database.
section Scenario: Inserting data in MySQL database.
section Scenario: Truncating and inserting file data into Oracle DB.
tGreenplumClose
tGreenplumClose properties
Component family
Databases/Greenplum
Function
Purpose
Close a transaction.
Basic settings
Component list
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
Limitation
n/a
Related scenario
No scenario is available for this component yet.
tGreenplumCommit
tGreenplumCommit Properties
This component is closely related to tGreenplumConnection and tGreenplumRollback. It usually doesn't make
much sense to use these components independently in a transaction.
Component family
Databases/Greenplum
Function
Validates the data processed through the Job into the connected DB.
Purpose
Using a unique connection, this component commits in one go a global transaction instead of doing
that on every row or every batch and thus provides gain in performance.
Basic settings
Component list
Close Connection
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
Limitation
n/a
Related scenarios
This component is closely related to tGreenplumConnection and tGreenplumRollback. It usually doesn't make
much sense to use one of these without using a tGreenplumConnection component to open a connection for the
current transaction.
For tGreenplumCommit related scenarios, see:
section Scenario: Mapping data using a simple implicit join.
section tMysqlConnection.
tGreenplumConnection
tGreenplumConnection properties
This component is closely related to tGreenplumCommit and tGreenplumRollback. It usually does not make
much sense to use one of these without using a tGreenplumConnection to open a connection for the current
transaction.
Component family
Databases/Greenplum
Function
Purpose
This component allows you to commit all of the Job data to an output database in just a single
transaction, once the data has been validated.
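The pattern implemented by the tGreenplumConnection, tGreenplumCommit and tGreenplumRollback trio corresponds to the classic single-transaction JDBC pattern sketched below; the connection details and statements are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Single-transaction sketch: open once (tGreenplumConnection), do the
// work, commit in one go (tGreenplumCommit) or roll everything back on
// failure (tGreenplumRollback). Names and credentials are hypothetical.
public class SingleTransactionSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://gp-host:5432/sales", "user", "password")) {
            conn.setAutoCommit(false); // defer the commit
            try (Statement st = conn.createStatement()) {
                st.executeUpdate("INSERT INTO customers VALUES (1, 'Alice')");
                st.executeUpdate("INSERT INTO customers VALUES (2, 'Bob')");
                conn.commit();     // one global commit for all rows
            } catch (Exception e) {
                conn.rollback();   // undo the whole transaction
                throw e;
            }
        }
    }
}
```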
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Host
Port
Database
Schema
Use or register a shared DB Connection
Select this check box to share your connection or fetch a connection
shared by a parent or child Job. This allows you to share one single
DB connection among several DB connection components from
different Job levels that can be either parent or child.
This option is incompatible with the Use dynamic job and
Use an independent process to run subjob options of the
tRunJob component. Using a shared database connection
together with a tRunJob component with either of these
two options enabled will cause your Job to fail.
Shared DB Connection Name: set or type in the shared connection
name.
Advanced settings
Auto commit
tStatCatcher Statistics
Select this check box to gather the job processing metadata at a Job
level as well as at each component level.
Usage
Limitation
n/a
Related scenarios
This component is closely related to tGreenplumCommit and tGreenplumRollback. It usually does not make
much sense to use one of these without using a tGreenplumConnection component to open a connection for the
current transaction.
tGreenplumGPLoad
This component invokes Greenplum's gpload utility to insert records into a Greenplum database. This component
can be used either in standalone mode, loading from an existing data file, or connected to an input flow to load
data from the connected component.
tGreenplumGPLoad properties
Component family
Databases/Greenplum
Function
tGreenplumGPLoad inserts data into a Greenplum database table using Greenplum's gpload
utility.
Purpose
This component is used to bulk load data into a Greenplum table either from an existing data file,
an input flow, or directly from a data flow in streaming mode through a named-pipe.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Host
Port
Database
Schema
Table
Action on table
Action on data
Data file
Use named-pipe
Advanced settings
Named-pipe name
Specify a name for the named-pipe to be used. Ensure that the name
entered is valid.
Die on error
This check box is selected by default. Clear the check box to skip the
row on error and complete the process for error-free rows. If needed,
you can retrieve the rows on error via a Row > Rejects link.
Use existing control file (YAML formatted)
Select this check box to provide a control file to be used with the
gpload utility instead of specifying all the options explicitly in the
component. When this check box is selected, Data file and the other
gpload related options no longer apply. Refer to Greenplum's gpload
manual for details on creating a control file. A minimal control file is
sketched after the Usage note below.
Control file
Enter the path to the control file to be used, between double quotation
marks, or click [...] and browse to the control file. This option is
passed on to the gpload utility via the -f argument.
CSV mode
Field separator
Escape char
Text enclosure
Header (skips the first row of data file)
Select this check box to skip the first row of the data file.
Additional options
Log file
Browse to or enter the access path to the log file in your directory.
Encoding
Full path to executable
Select this check box to specify the full path to the gpload executable.
You must check this option if the gpload path is not specified in the
PATH environment variable.
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Usage
This component is mainly used when no particular transformation is required on the data to be
loaded onto the database.
This component can be used as a standalone or an output component.
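As mentioned for the Control file option above, a gpload control file is a small YAML document. The sketch below writes a minimal, hypothetical one from Java; the key names follow Greenplum's gpload manual, but every value (host, table, file) is a placeholder.

```java
import java.io.FileWriter;

// Writes a minimal, hypothetical gpload control file (YAML). Consult
// Greenplum's gpload manual for the authoritative set of keys.
public class GploadControlFileSketch {
    public static void main(String[] args) throws Exception {
        String yaml =
              "VERSION: 1.0.0.1\n"
            + "DATABASE: sales\n"
            + "USER: gpadmin\n"
            + "HOST: gp-host\n"
            + "PORT: 5432\n"
            + "GPLOAD:\n"
            + "  INPUT:\n"
            + "    - SOURCE:\n"
            + "        FILE: [/tmp/customers.csv]\n"
            + "    - FORMAT: csv\n"
            + "    - DELIMITER: ';'\n"
            + "  OUTPUT:\n"
            + "    - TABLE: public.customers\n"
            + "    - MODE: insert\n";
        try (FileWriter out = new FileWriter("/tmp/customers.yml")) {
            out.write(yaml); // path passed to gpload via the -f argument
        }
    }
}
```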
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For
details, see the section about external modules in the Talend Installation and Upgrade Guide.
Related scenario
For a related use case, see section Scenario: Inserting data in MySQL database.
tGreenplumInput
tGreenplumInput properties
Component family
Databases/Greenplum
Function
Purpose
tGreenplumInput executes a DB query with a strictly defined order which must correspond to the
schema definition. Then it passes on the field list to the next component via a Main row link.
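The ordering constraint stated above can be pictured in plain JDBC: the SELECT list mirrors the schema, and each fetched row stands for one row of the Main flow. All names below are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Input sketch: the SELECT list must match the schema definition in
// order and type; each row is then handed to the next component.
// Table, columns and credentials are hypothetical.
public class InputQuerySketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://gp-host:5432/sales", "user", "password");
             Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(
                 "SELECT id, name, city FROM customers")) {
            while (rs.next()) { // one iteration per row of the Main flow
                System.out.printf("%d %s %s%n",
                        rs.getInt(1), rs.getString(2), rs.getString(3));
            }
        }
    }
}
```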
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Host
Port
Database
Schema
Advanced settings
Guess Query
Guess schema
Use cursor
When selected, helps to decide the row set to work with at a time and
thus optimize performance.
Trim all the String/Char columns
Select this check box to remove leading and trailing whitespace from
all the String/Char columns.
Trim column
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is
selected in the Basic settings view. Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component covers all possible SQL queries for Greenplum databases.
Related scenarios
For related topics, see the tDBInput scenarios:
section Scenario: Mapping data using a simple implicit join.
section Scenario 1: Displaying selected data from DB table.
section Scenario 2: Using StoreSQLQuery variable.
See also related topic: section Scenario: Reading data from different MySQL databases using dynamically loaded
connection parameters.
tGreenplumOutput
tGreenplumOutput Properties
Component Family
Databases/Greenplum
Function
Purpose
tGreenplumOutput executes the action defined on the table and/or on the data of a table, according
to the input flow from the previous component.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Select this check box and in the Component List click the relevant
connection component to reuse the connection details you already
defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
Host
Port
Database
Table
Name of the table to be written. Note that only one table can be
written at a time.
Action on table
Action on data
Insert: Add new entries to the table. If duplicates are found, Job
stops.
Update: Make changes to existing entries.
Insert or update: inserts a new record. If the record with the given
reference already exists, an update would be made.
Update or insert: updates the record with the given reference. If the
record does not exist, a new record would be inserted (this try-update-then-insert
logic is sketched after the Usage note below).
Delete: Remove entries corresponding to the input flow.
It is necessary to specify at least one column as a primary
key on which the Update and Delete operations are based.
You can do that by clicking Edit Schema and selecting
the check box(es) next to the column(s) you want to
set as primary key(s). For an advanced use, click the
Advanced settings view where you can simultaneously
define primary keys for the Update and Delete operations.
To do that: Select the Use field options check box and then
in the Key in update column, select the check boxes next to
the column names you want to use as a base for the Update
operation. Do the same in the Key in delete column for the
Delete operation.
Schema and Edit Schema
Advanced settings
Die on error
This check box is selected by default. Clear the check box to skip the
row on error and complete the process for error-free rows. If needed,
you can retrieve the rows on error via a Row > Rejects link.
Commit every
Additional Columns
This option is not offered if you create (with or without drop) the
DB table. This option allows you to call SQL functions to perform
actions on columns, which are not insert, nor update or delete actions,
or actions that require particular preprocessing.
Name: Type in the name of the schema column to be altered or
inserted as a new column.
SQL expression: Type in the SQL statement to be executed in order
to alter or insert the relevant column data.
Position: Select Before, Replace or After following the action to be
performed on the reference column.
Reference column: Type in a column of reference that the
tDBOutput can use to place or replace the new or altered column.
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is
selected in the Basic settings view. Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component covers all possible SQL queries for Greenplum databases. It allows you to carry
out actions on a table or on the data of a table in a Greenplum database. It enables you to create a
reject flow, with a Row > Rejects link filtering the data in error. For a usage example, see section
Scenario 3: Retrieve data in error with a Reject link.
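As announced in the Action on data options above, the update-or-insert behavior can be sketched as try-update-then-insert; the table, key and connection details are hypothetical, and the component's generated code may differ.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

// Update-or-insert sketch: try the update first; insert only when no
// existing row matched the key. All names are hypothetical.
public class UpsertSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://gp-host:5432/sales", "user", "password")) {
            try (PreparedStatement up = conn.prepareStatement(
                    "UPDATE customers SET name = ? WHERE id = ?")) {
                up.setString(1, "Alice");
                up.setInt(2, 1);
                if (up.executeUpdate() == 0) { // key not found: insert instead
                    try (PreparedStatement ins = conn.prepareStatement(
                            "INSERT INTO customers (id, name) VALUES (?, ?)")) {
                        ins.setInt(1, 1);
                        ins.setString(2, "Alice");
                        ins.executeUpdate();
                    }
                }
            }
        }
    }
}
```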
Related scenarios
For related scenarios, see:
section Scenario: Mapping data using a simple implicit join.
section Scenario: Writing a row to a table in the MySql database via an ODBC connection.
section Scenario 1: Adding a new column and altering data in a DB table.
tGreenplumOutputBulk
tGreenplumOutputBulk properties
The tGreenplumOutputBulk and tGreenplumBulkExec components are used together in a two step process.
In the first step, an output file is generated. In the second step, this file is used in the INSERT operation used to
feed a database. These two steps are fused together in the tGreenplumOutputBulkExec component, detailed in
a separate section. The advantage of using a two step process is that it makes it possible to transform data before
it is loaded in the database.
Component family
Databases/Greenplum
Function
Writes a file with columns based on the defined delimiter and the Greenplum standards
Purpose
Prepares the file to be used as parameter in the INSERT query to feed the Greenplum database.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
File Name
Append
Select this check box to add the new rows at the end of the records.
Advanced settings
Row separator
Field separator
Include header
Encoding
Select the encoding from the list or select Custom and define it
manually. This field is compulsory for DB data handling.
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Usage
Related scenarios
For use cases in relation with tGreenplumOutputBulk, see the following scenarios:
section Scenario: Inserting transformed data in MySQL database.
section Scenario: Inserting data in MySQL database.
tGreenplumOutputBulkExec
tGreenplumOutputBulkExec properties
The tGreenplumOutputBulk and tGreenplumBulkExec components are used together in a two step process. In
the first step, an output file is generated. In the second step, this file is used in the INSERT operation used to feed
a database. These two steps are fused together in the tGreenplumOutputBulkExec component.
Component family
Databases/Greenplum
Function
Purpose
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Host
Port
Database name
Schema
Table
Action on table
File Name
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: You create and store the schema locally for this component
only. Related topic: see Talend Studio User Guide.
Advanced settings
Action on data
Copy the OID for each row: Retrieve the ID item for each row.
Contains a header line with the names of each column in the file: Specify that the table contains a header.
File type
Null string
Row separator
Fields terminated by
Escape char
Text enclosure
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Usage
This component is mainly used when no particular transformation is required on the data to be
loaded onto the database.
Limitation
The database server must be installed on the same machine where the Studio is installed or
where the Job using tGreenplumOutputBulkExec is deployed, so that the component functions
properly.
Related scenarios
For use cases in relation with tGreenplumOutputBulkExec, see the following scenarios:
section Scenario: Inserting transformed data in MySQL database.
section Scenario: Inserting data in MySQL database.
tGreenplumRollback
tGreenplumRollback properties
This component is closely related to tGreenplumCommit and tGreenplumConnection. It usually does not make
much sense to use these components independently in a transaction.
Component family
Databases/Greenplum
Function
Purpose
Basic settings
Component list
Close Connection
Clear this check box to continue to use the selected connection once
the component has performed its task.
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
Limitation
n/a
Related scenarios
For tGreenplumRollback related scenario, see section Scenario: Rollback from inserting data in mother/daughter
tables.
tGreenplumRow
tGreenplumRow Properties
Component Family
Databases/Greenplum
Function
tGreenplumRow is the specific component for this database query. It executes the SQL query
stated onto the specified database. The row suffix means the component implements a flow in the
Job design although it doesn't provide output.
Purpose
Depending on the nature of the query and the database, tGreenplumRow acts on the actual DB
structure or on the data (although without handling data). The SQLBuilder tool helps you easily
write your SQL statements.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Select this check box and in the Component List click the relevant
connection component to reuse the connection details you already
defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
Host
Port
Database
Schema
Username and Password
Table Name
Query type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: Fill in manually the query statement or build it graphically
using SQLBuilder.
Advanced settings
Guess Query
Query
Die on error
This check box is selected by default. Clear the check box to skip the
row on error and complete the process for error-free rows. If needed,
you can retrieve the rows on error via a Row > Rejects link.
Propagate QUERY's recordset
Select this check box to insert the result of the query into a COLUMN
of the current flow. Select this column from the use column list.
This option allows the component to have a different
schema from that of the preceding component. Moreover,
the column that holds the QUERY's recordset should be
set to the type of Object and this component is usually
followed by tParseRecordSet.
Use PreparedStatement
Commit every
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is
selected in the Basic settings view. Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component offers the flexibility of the DB query and covers all possible SQL queries.
Related scenarios
For a related scenario, see:
section Scenario 3: Combining two flows for selective output.
section Scenario: Resetting a DB auto-increment.
section Scenario 1: Removing and regenerating a MySQL table index.
tGreenplumSCD
tGreenplumSCD belongs to two component families: Business Intelligence and Databases. For more information
on it, see section tGreenplumSCD.
tIngresBulkExec
tIngresBulkExec properties
tIngresOutputBulk and tIngresBulkExec are generally used together in a two step process. In the first step, an
output file is generated. In the second step, this file is used in the INSERT operation used to feed a database.
These two steps are fused together in the tIngresOutputBulkExec component, detailed in another section. The
advantage of using two components is that data can be transformed before it is loaded in the database.
Component family
Databases/Ingres
Function
Purpose
Inserts data in bulk to a table in the Ingres DBMS for performance gain.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Table
VNode
Database
Action on table
File name
Delete Working Files After Use
Select this check box to delete the files that are created during the
execution.
Advanced settings
Field Separator
Row Separator
Null Indicator
Session User
Rollback
On Error
Path and name of the file that holds the rejected rows.
Available when Continue is selected from the On Error list.
Error Count
Extend
Fill Factor
Leaf Fill
A bulk copy from operation can specify a leaffill value. This clause
specifies the percentage (from 1 to 100) of each B-tree leaf page that
must be filled with rows during the copy. This clause can be used only
on tables with a B-tree storage structure.
Row Estimate
Trailing WhiteSpace
Encoding
Output
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Usage
Deployed along with tIngresOutputBulk, tIngresBulkExec feeds the given data in bulk to the
Ingres database for performance gain.
Limitation
The database server/client must be installed on the same machine where the Studio is installed or
where the Job using tIngresBulkExec is deployed, so that the component functions properly.
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For details,
see the section about external modules in the Talend Installation and Upgrade Guide.
Related scenarios
For related topics, see:
section Scenario: Loading data to a table in the Ingres DBMS
tIngresClose
tIngresClose properties
Component family
Databases/Ingres
Function
Purpose
Close a transaction.
Basic settings
Component list
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component is to be used along with Ingres components, especially with tIngresConnection
and tIngresCommit.
Limitation
n/a
Related scenario
No scenario is available for this component yet.
tIngresCommit
tIngresCommit Properties
This component is closely related to tIngresConnection and tIngresRollback. It usually does not make much
sense to use these components independently in a transaction.
Component family
Databases/Ingres
Function
Validates the data processed through the Job into the connected DB.
Purpose
Using a unique connection, this component commits in one go a global transaction instead of doing
that on every row or every batch and thus provides gain in performance.
Basic settings
Component list
Close Connection
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component is to be used along with Ingres components, especially with tIngresConnection
and tIngresRollback.
Limitation
n/a
Related scenario
For tIngresCommit related scenario, see section Scenario: Inserting data in mother/daughter tables.
tIngresConnection
tIngresConnection Properties
This component is closely related to tIngresCommit and tIngresRollback. It usually does not make much sense
to use one of these without using a tIngresConnection component to open a connection for the current transaction.
Component family
Databases/Ingres
Function
Purpose
This component allows you to commit all of the Job data to an output database in just a single
transaction, once the data has been validated.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Server
Port
Database
Use or register a shared DB Connection
Select this check box to share your connection or fetch a connection
shared by a parent or child Job. This allows you to share one single
DB connection among several DB connection components from
different Job levels that can be either parent or child.
This option is incompatible with the Use dynamic job and
Use an independent process to run subjob options of the
tRunJob component. Using a shared database connection
together with a tRunJob component with either of these
two options enabled will cause your Job to fail.
Shared DB Connection Name: set or type in the shared connection
name.
Usage
This component is to be used along with Ingres components, especially with tIngresCommit and
tIngresRollback.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For
details, see the section about external modules in the Talend Installation and Upgrade Guide.
Related scenarios
For tIngresConnection related scenario, see section Scenario: Loading data to a table in the Ingres DBMS.
tIngresInput
tIngresInput properties
Component family
Databases/Ingres
Function
Purpose
tIngresInput executes a DB query with a strictly defined order which must correspond to the
schema definition. Then it passes on the field list to the next component via a Main row link.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Select this check box and in the Component List click the relevant
connection component to reuse the connection details you already
defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
Server
Port
Database
Advanced settings
Trim all the String/Char columns
Select this check box to remove leading and trailing whitespace from
all the String/Char columns.
Trim column
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Usage
This component covers all possible SQL queries for Ingres databases.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For
details, see the section about external modules in the Talend Installation and Upgrade Guide.
Related scenarios
For related topics, see the tDBInput scenarios:
section Scenario 1: Displaying selected data from DB table
section Scenario 2: Using StoreSQLQuery variable.
See also the scenario for tContextLoad: section Scenario: Reading data from different MySQL databases using
dynamically loaded connection parameters.
tIngresOutput
tIngresOutput properties
Component family
Databases/Ingres
Function
Purpose
tIngresOutput executes the action defined on the table and/or on the data contained in the table,
based on the flow incoming from the preceding component in the Job.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Select this check box and in the Component List click the relevant
connection component to reuse the connection details you already
defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
Host
Port
Database
Table
Name of the table to be written. Note that only one table can be
written at a time.
Action on table
Action on data
Insert: Add new entries to the table. If duplicates are found, Job
stops.
Update: Make changes to existing entries.
Insert or update: inserts a new record. If the record with the given
reference already exists, an update would be made.
Update or insert: updates the record with the given reference. If the
record does not exist, a new record would be inserted.
Delete: Remove entries corresponding to the input flow.
It is necessary to specify at least one column as a primary
key on which the Update and Delete operations are based.
You can do that by clicking Edit Schema and selecting
the check box(es) next to the column(s) you want to
set as primary key(s). For an advanced use, click the
Advanced settings view where you can simultaneously
define primary keys for the Update and Delete operations.
To do that: Select the Use field options check box and then
in the Key in update column, select the check boxes next to
the column names you want to use as a base for the Update
operation. Do the same in the Key in delete column for the
Delete operation.
Schema and Edit Schema
Advanced settings
Die on error
This check box is selected by default. Clear the check box to skip the
row on error and complete the process for error-free rows. If needed,
you can retrieve the rows on error via a Row > Rejects link.
Commit every
Additional Columns
This option is not offered if you create (with or without drop) the
DB table. This option allows you to call SQL functions to perform
actions on columns, which are not insert, nor update or delete actions,
or actions that require particular preprocessing.
Name: Type in the name of the schema column to be altered or
inserted as a new column.
SQL expression: Type in the SQL statement to be executed in order
to alter or insert the relevant column data.
Position: Select Before, Replace or After following the action to be
performed on the reference column.
Reference column: Type in a column of reference that the
tDBOutput can use to place or replace the new or altered column.
Debug query mode
Select this check box to display each step during processing entries
in a database.
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Usage
This component offers the flexibility benefit of the DB query and covers all of the SQL queries
possible.
This component must be used as an output component. It allows you to carry out actions on a table
or on the data of a table in an Ingres database. It also allows you to create a reject flow using a
Row > Rejects link to filter data in error. For an example of tMySqlOutput in use, see section
Scenario 3: Retrieve data in error with a Reject link.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For
details, see the section about external modules in the Talend Installation and Upgrade Guide.
Related scenarios
For related topics, see:
section Scenario: Writing a row to a table in the MySql database via an ODBC connection
section Scenario 1: Adding a new column and altering data in a DB table.
tIngresOutputBulk
tIngresOutputBulk properties
tIngresOutputBulk and tIngresBulkExec are generally used together in a two step process. In the first step, an
output file is generated. In the second step, this file is used in the INSERT operation used to feed a database. These
two steps are fused together in the tIngresOutputBulkExec component.
Component family
Databases/Ingres
Function
Prepares a file with the schema defined and the data coming from the preceding component.
Purpose
Prepares the file whose data is inserted in bulk to the Ingres DBMS for performance gain.
Basic settings
Property Type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
File Name
Append
Select this check box to add the new rows at the end of the file.
Advanced settings
Field Separator
Row Separator
Include Header
Select this check box to include the column header in the file.
Encoding
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Usage
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For details,
see the section about external modules in the Talend Installation and Upgrade Guide.
Related scenarios
For related topics, see:
section Scenario: Loading data to a table in the Ingres DBMS.
tIngresOutputBulkExec
tIngresOutputBulkExec properties
tIngresOutputBulk and tIngresBulkExec are generally used together in a two step process. In the first step, an
output file is generated. In the second step, this file is used in the INSERT operation used to feed a database. These
two steps are fused together in the tIngresOutputBulkExec component.
Component family
Databases/Ingres
Function
Prepares an output file and uses it to feed a table in the Ingres DBMS.
Purpose
Inserts data in bulk to a table in the Ingres DBMS for performance gain.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Table
VNode
Database
Action on table
File name
Delete Working Files After Use
Select this check box to delete the files that are created during the
execution.
Advanced settings
Field Separator
Row Separator
On Error
Path and name of the file that holds the rejected rows.
Available when Continue is selected from the On Error list.
Error Count
Rollback
Null Indicator
Session User
Allocation
Extend
Fill Factor
Leaf Fill
A bulk copy from operation can specify a leaffill value. This clause
specifies the percentage (from 1 to 100) of each B-tree leaf page that
must be filled with rows during the copy. This clause can be used only
on tables with a B-tree storage structure.
Row Estimate
Trailing WhiteSpace
Output
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Usage
Limitation
The database server/client must be installed on the same machine where the Studio is installed
or where the Job using tIngresOutputBulkExec is deployed, so that the component functions
properly.
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For details,
see the section about external modules in the Talend Installation and Upgrade Guide.
1. Drop tIngresConnection, tFileInputDelimited and tIngresOutputBulkExec from the Palette onto the
workspace.
2. Double-click tIngresConnection to open its Basic settings view in the Component tab.
3. In the Server field, enter the address of the server where the Ingres DBMS resides, for example "localhost".
Keep the default settings of the Port field.
4. In the Database field, enter the name of the Ingres database, for example "research".
5. Double-click tFileInputDelimited to open its Basic settings view in the Component tab.
6. Select the source file by clicking the [...] button next to the File name/Stream field.
7. Click the [...] button next to the Edit schema field to open the schema editor.
8. Click the [+] button to add four columns, for example name, age, job and dept, with the data type as string,
Integer, string and string respectively. Click OK to close the schema editor. Click Yes on the pop-up window
that asks whether to propagate the changes to the subsequent component. Leave other default settings unchanged.
9. Double-click tIngresOutputBulkExec to open its Basic settings view in the Component tab.
10. In the Table field, enter the name of the table for data insertion.
11. In the VNode and Database fields, enter the names of the VNode and the database.
12. In the File Name field, enter the full path of the file that will hold the data of the source file.
As shown above, the employee data is written to the table employee in the database research on the node
talendbj. Meanwhile, the output file employee_research.csv has been generated at C:/Users/talend/Desktop.
Related scenarios
For related topics, see:
section Scenario: Writing a row to a table in the MySql database via an ODBC connection,
section Scenario 1: Adding a new column and altering data in a DB table.
tIngresRollback
tIngresRollback properties
This component is closely related to tIngresCommit and tIngresConnection. It usually does not make much
sense to use these components independently in a transaction.
Component family
Databases/Ingres
Function
Purpose
Basic settings
Component list
Close Connection
Clear this check box to continue to use the selected connection once
the component has performed its task.
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component is to be used along with Ingres components, especially with tIngresConnection
and tIngresCommit.
Limitation
n/a
Related scenarios
For tIngresRollback related scenario, see section Scenario: Rollback from inserting data in mother/daughter
tables.
tIngresRow
tIngresRow properties
Component family
Databases/Ingres
Function
tIngresRow is the specific component for this database query. It executes the SQL query stated
onto the specified database. The row suffix means the component implements a flow in the Job
design although it doesn't provide output.
Purpose
Depending on the nature of the query and the database, tIngresRow acts on the actual DB structure
or on the data (although without handling data). The SQLBuilder tool helps you easily write your
SQL statements.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Select this check box and in the Component List click the relevant
connection component to reuse the connection details you already
defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
Host
Port
Database
Query type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: Fill in the query statement manually or build it graphically using SQLBuilder.
Query
Die on error
This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.
Advanced settings
Propagate QUERY's recordset
Select this check box to insert the result of the query into a COLUMN of the current flow. Select this column from the use column list.
Use PreparedStatement
Commit every
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Usage
This component offers the flexibility of the DB query and covers all possible SQL queries.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For
details, see the section about external modules in the Talend Installation and Upgrade Guide.
Related scenarios
For related topics, see:
section Scenario: Resetting a DB auto-increment.
section Scenario 1: Removing and regenerating a MySQL table index.
tIngresSCD
tIngresSCD belongs to two component families: Business Intelligence and Databases. For more information on
it, see section tIngresSCD.
tNetezzaBulkExec
tNetezzaBulkExec properties
Component family
Databases/Netezza
Function
Purpose
As a dedicated component, tNetezzaBulkExec offers gains in performance while carrying out Insert operations on a Netezza database.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Select this check box and in the Component List click the relevant
connection component to reuse the connection details you already
defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
Host
Port
Database
Table
Name of the table to be written. Note that only one table can be
written at a time and that the table must exist for the insert operation
to succeed.
File Name
Advanced settings
Field Separator
Escape character
Date format / Date delimiter
Use Date format to distinguish the way years, months and days are represented in a string. Use Date delimiter to specify the separator between date values.
Time format / Time delimiter
Use Time format to distinguish the way the time is represented in a string. Use Time delimiter to specify the separator between time values.
Encoding
Select the encoding from the list or select Custom and define it manually. This field is compulsory for DB data handling.
Max Errors
Enter the maximum error limit that will not stop the process.
Skip Rows
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is
selected in the Basic settings view. Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component is mainly used when no particular transformation is required on the data to be loaded onto the database.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For
details, see the section about external modules in the Talend Installation and Upgrade Guide.
Related scenarios
For use cases in relation with tNetezzaBulkExec, see the following scenarios:
section Scenario: Inserting transformed data in MySQL database.
section Scenario: Inserting data in MySQL database.
section Scenario: Truncating and inserting file data into Oracle DB.
tNetezzaClose
tNetezzaClose properties
Component family
Databases/Netezza
Function
Purpose
Close a transaction.
Basic settings
Component list
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
Limitation
n/a
Related scenario
No scenario is available for this component yet.
tNetezzaCommit
tNetezzaCommit Properties
This component is closely related to tNetezzaConnection and tNetezzaRollback. It usually does not make much
sense to use these components independently in a transaction.
Component family
Databases/Netezza
Function
tNetezzaCommit validates the data processed through the Job into the connected DB
Purpose
Using a unique connection, this component commits in one go a global transaction instead of doing
that on every row or every batch and thus provides gain in performance.
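The performance gain can be pictured with a plain JDBC sketch (an analogy only, not the code generated by the Studio; the URL, credentials and table are placeholders, and the Netezza JDBC driver would need to be on the classpath): the connection stays in manual-commit mode and a single commit validates the whole transaction instead of one commit per row.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    public class SingleCommitSketch {
        public static void main(String[] args) throws Exception {
            // Placeholder connection details.
            Connection conn = DriverManager.getConnection(
                    "jdbc:netezza://host:5480/research", "user", "password");
            conn.setAutoCommit(false);                   // keep the transaction open, as a shared connection does
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO employee (id, name) VALUES (?, ?)")) {
                for (int i = 0; i < 1000; i++) {         // many rows, no intermediate commit
                    ps.setInt(1, i);
                    ps.setString(2, "name_" + i);
                    ps.executeUpdate();
                }
                conn.commit();                           // one global commit, the role played by tNetezzaCommit
            } catch (Exception e) {
                conn.rollback();                         // undo the whole transaction, the role of tNetezzaRollback
                throw e;
            } finally {
                conn.close();                            // close the connection, the role of tNetezzaClose
            }
        }
    }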
Basic settings
Component list
Close Connection
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to choose your database connection dynamically from multiple connections planned in your Job. This feature is useful when you need to access database tables having the same data structure but in different databases, especially when you are working in an environment where you cannot change your Job settings, for example, when your Job has to be deployed and executed independent of Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
Limitation
n/a
Related scenario
This component is closely related to tNetezzaConnection and tNetezzaRollback. It usually does not make much
sense to use one of these without using a tNetezzaConnection component to open a connection for the current
transaction.
For tNetezzaCommit related scenario, see section Scenario: Inserting data in mother/daughter tables.
tNetezzaConnection
tNetezzaConnection Properties
This component is closely related to tNetezzaCommit and tNetezzaRollback. It usually does not make much
sense to use one of these without using a tNetezzaConnection component to open a connection for the current
transaction.
Component family
Databases/Netezza
Function
Purpose
This component allows you to commit all of the Job data to an output database in just a single
transaction, once the data has been validated.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Host
Port
Database
Additional Parameters
Use or register a shared DB Connection
Select this check box to share your connection or fetch a connection shared by a parent or child Job. This allows you to share one single DB connection among several DB connection components from different Job levels that can be either parent or child.
This option is incompatible with the Use dynamic job and
Use an independent process to run subjob options of the
tRunJob component. Using a shared database connection
together with a tRunJob component with either of these
two options enabled will cause your Job to fail.
Shared DB Connection Name: set or type in the shared connection
name.
Usage
This component is to be used along with Netezza components, especially with tNetezzaCommit
and tNetezzaRollback.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For
details, see the section about external modules in the Talend Installation and Upgrade Guide.
Related scenarios
For a tNetezzaConnection related scenario, see section Scenario: Inserting data in mother/daughter tables.
tNetezzaInput
tNetezzaInput properties
Component family
Databases/Netezza
Function
Purpose
tNetezzaInput executes a DB query with a strictly defined order which must correspond to the
schema definition. Then it passes on the field list to the next component via a Main row link.
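For example (a hypothetical schema, for illustration only), if the component schema defines the columns id, name and salary in that order, the SELECT list of the query must keep the same order and types:

    public class ColumnOrderSketch {
        public static void main(String[] args) {
            // Schema assumed in the Job, in this exact order: id (Integer), name (String), salary (Double).
            // The query typed in tNetezzaInput must list the fields in the same order:
            String matchingQuery = "SELECT id, name, salary FROM employee";
            // A query such as "SELECT name, id, salary FROM employee" would no longer
            // correspond to the schema definition and would break the downstream flow.
            System.out.println(matchingQuery);
        }
    }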
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Select this check box and in the Component List click the relevant
connection component to reuse the connection details you already
defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
Host
Port
Database
Table Name
Advanced settings
Use cursor
When selected, helps to decide the row set to work with at a time and
thus optimize performance.
Trim all the String/Char columns
Select this check box to remove leading and trailing whitespace from all the String/Char columns.
Trim column
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is
selected in the Basic settings view. Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component covers all possible SQL queries for Netezza databases.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For
details, see the section about external modules in the Talend Installation and Upgrade Guide.
Related scenarios
Related scenarios for tNetezzaInput are:
section Scenario 1: Displaying selected data from DB table.
section Scenario 2: Using StoreSQLQuery variable.
section Scenario: Reading data from different MySQL databases using dynamically loaded connection
parameters.
tNetezzaNzLoad
This component invokes Netezza's nzload utility to insert records into a Netezza database. It can be used either in standalone mode, loading from an existing data file, or connected to an input row to load data from the connected component.
tNetezzaNzLoad properties
Component family
Databases/Netezza
Function
tNetezzaNzLoad inserts data into a Netezza database table using Netezza's nzload utility.
Purpose
To bulk load data into a Netezza table either from an existing data file, an input flow, or directly
from a data flow in streaming mode through a named-pipe.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Host
Port
Database
Table
Action on table
Data file
Full path to the data file to be used. If this component is used on its own (not connected to another component with an input flow), this is the name of an existing data file to be loaded into the database. If it is connected with an input flow to another component, this is the name of the file to be generated and written with the incoming data, to be used later with nzload to load into the database.
Use named-pipe
Select this check box to use a named-pipe instead of a data file. This option can only be used when the component is connected with an input flow.
Advanced settings
Named-pipe name
Specify a name for the named-pipe to be used. Ensure that the name
entered is valid.
Use control file
Select this check box to provide a control file to be used with the nzload utility instead of specifying all the options explicitly in the component. When this check box is selected, Data file and the other nzload related options no longer apply. Refer to Netezza's nzload manual for details on creating a control file.
Control file
Enter the path to the control file to be used, between double quotation
marks, or click [...] and browse to the control file. This option is
passed on to the nzload utility via the -cf argument.
Field separator
Parameter
Advanced options
-lf
Name of the log file to generate. The logs will be appended if the log file already exists. If the parameter is not specified, the default name for the log file is '<table_name>.<db_name>.nzlog', and it is generated under the current working directory where the Job is running.
-bf
Name of the bad file to generate. The bad file contains all the records that could not be loaded due to an internal Netezza error. The records will be appended if the bad file already exists. If the parameter is not specified, the default name for the bad file is '<table_name>.<db_name>.nzbad', and it is generated under the current working directory where the Job is running.
-outputDir
Directory path where the log and bad files are generated. If the parameter is not specified, the files are generated under the current working directory where the Job is running.
-logFileSize
Maximum size for the log file, in MB. The default value is 2000 (that is, 2 GB). To save hard disk space, specify a smaller value if your Job runs often.
-compress
Specify this option if the data file is compressed. Valid values are "TRUE" or "FALSE". The default value is "FALSE".
This option is only valid if this component is used by itself
and not connected to another component via an input flow.
-skipRows <n>
Number of rows to skip from the beginning of the data file. Set the value to "1" if you want to skip the header row of the data file. The default value is "0".
This option should only be used if this component is used
by itself and not connected to another component via an
input flow.
-maxRows <n>
-maxErrors
-ignoreZero
Binary zero bytes in the input data generate errors. Set this option to "NO" to generate errors or to "YES" to ignore zero bytes. The default value is "NO".
-requireQuotes
-nullValue <token>
Specify the token that indicates a null value in the data file. The default value is "NULL". To slightly improve performance, you can set this value to an empty field by specifying the value as single quotes: "\'\'".
-fillRecord
Treat missing trailing input fields as null. You do not need to specify a value for this option in the value field of the table. This option is not turned on by default, so by default the input fields must exactly match all the columns of the table.
Trailing input fields must be nullable in the database.
-ctrlChar
-ctInString
-truncString
-dateStyle
Specify the date format in which the input data is written.
Valid values are: "YMD", "Y2MD", "DMY", "DMY2", "MDY",
"MDY2", "MONDY", "MONDY2". The default value is "YMD".
The date format of the column in the component's schema
must match the value specified here. For example if you
want to load a DATE column, specify the date format in the
component schema as "yyyy-MM-dd" and the -dateStyle
option as "YMD".
For more description on loading date and time fields, see section
Loading DATE, TIME and TIMESTAMP columns.
-dateDelim
Delimiter character between date parts. The default value is "-" for
all date styles except for "MONDY[2]" which is " " (empty space).
The date format of the column in the component's schema
must match the value specified here.
-y2Base
-timeStyle
Specify the time format in which the input data is written.
Valid values are: "24HOUR" and "12HOUR". The default value is
"24HOUR". For slightly better performance you should keep the
default value.
The time format of the column in the component's schema
must match the value specified here. For example if you
want to load a TIME column, specify the date format in
the component schema as "HH:mm:ss" and the -timeStyle
option as "24HOUR".
For more description on loading date and time fields, see section
Loading DATE, TIME and TIMESTAMP columns.
-timeDelim
-timeRoundNanos
-boolStyle
Specify the format in which Boolean data is written in the data file. The valid values are: "1_0", "T_F", "Y_N", "TRUE_FALSE", "YES_NO". The default value is "1_0". For slightly better performance, keep the default value.
-allowRelay
Allow the load to continue after one or more SPUs have been reset or have failed over. By default, this is not allowed.
-allowRelay <n>
Encoding
Full path to executable
Select this check box to specify the full path to the nzload executable. You must check this option if the nzload path is not specified in the PATH environment variable.
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Usage
This component is mainly used when no particular transformation is required on the data to be loaded onto the database.
This component can be used as a standalone or an output component.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For
details, see the section about external modules in the Talend Installation and Upgrade Guide.
Loading DATE, TIME and TIMESTAMP columns
The table below summarizes how the date pattern defined in the component schema maps to the nzload date and time options:

DB Type      Date pattern in the component schema   -dateStyle   -dateDelim   -timeStyle   -timeDelim
DATE         "yyyy-MM-dd"                           "YMD"        "-"          n/a          n/a
TIME         "HH:mm:ss"                             n/a          n/a          "24HOUR"     ":"
TIMESTAMP    "yyyy-MM-dd HH:mm:ss"                  "YMD"        "-"          "24HOUR"     ":"
Related scenario
For a related use case, see section Scenario: Inserting data in MySQL database.
tNetezzaOutput
tNetezzaOutput properties
Component family
Databases/Netezza
Function
Purpose
tNetezzaOutput executes the action defined on the table and/or on the data contained in the table,
based on the flow incoming from the preceding component in the designed Job.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Select this check box and in the Component List click the relevant
connection component to reuse the connection details you already
defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
Host
Port
Database
Table
Name of the table to be written. Note that only one table can be
written at a time
Action on table
Action on data
Insert: Add new entries to the table. If duplicates are found, job
stops.
Update: Make changes to existing entries
Insert or update: inserts a new record. If the record with the given
reference already exists, an update would be made.
Update or insert: updates the record with the given reference. If the
record does not exist, a new record would be inserted.
Delete: Remove entries corresponding to the input flow.
It is necessary to specify at least one column as a primary
key on which the Update and Delete operations are based.
You can do that by clicking Edit Schema and selecting
the check box(es) next to the column(s) you want to
set as primary key(s). For an advanced use, click the
Advanced settings view where you can simultaneously
define primary keys for the Update and Delete operations.
To do that: Select the Use field options check box and then
in the Key in update column, select the check boxes next to
the column names you want to use as a base for the Update
operation. Do the same in the Key in delete column for the
Delete operation.
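To picture the difference between the two mixed actions, here is a plain JDBC sketch for a hypothetical employee(id, name) table whose id column is the primary key (an analogy only, not the code generated by the component): Insert or update checks the key and inserts when it is absent, otherwise it updates; Update or insert simply reverses that order.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class InsertOrUpdateSketch {
        // "Insert or update" behaviour for a hypothetical employee(id PRIMARY KEY, name) table.
        static void insertOrUpdate(Connection conn, int id, String name) throws Exception {
            boolean exists;
            try (PreparedStatement check = conn.prepareStatement("SELECT 1 FROM employee WHERE id = ?")) {
                check.setInt(1, id);
                try (ResultSet rs = check.executeQuery()) {
                    exists = rs.next();
                }
            }
            String sql = exists
                    ? "UPDATE employee SET name = ? WHERE id = ?"       // key found: update the record
                    : "INSERT INTO employee (name, id) VALUES (?, ?)";  // key absent: insert a new record
            try (PreparedStatement write = conn.prepareStatement(sql)) {
                write.setString(1, name);
                write.setInt(2, id);
                write.executeUpdate();
            }
        }
    }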
Schema and Edit schema
Die on error
This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.
Advanced settings
Additional JDBC parameters
Specify additional connection properties for the DB connection you are creating. This option is not available if you have selected the Use an existing connection check box in the Basic settings.
You can press Ctrl+Space to access a list of predefined
global variables.
Select this check box to activate the batch mode for data processing.
In the Batch Size field that appears when this check box is selected,
you can type in the number you need to define the batch size to be
processed.
This check box is available only when you have selected
the Insert, Update or the Delete option in the Action on
data list.
Commit every
Additional Columns
This option is not offered if you create (with or without drop) the DB table. This option allows you to call SQL functions to perform actions on columns, provided that these are not insert, update or delete actions, or actions that require particular preprocessing.
Name: Type in the name of the schema column to be altered or inserted as a new column.
SQL expression: Type in the SQL statement to be executed in order
to alter or insert the relevant column data.
Position: Select Before, Replace or After following the action to be
performed on the reference column.
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is
selected in the Basic settings view. Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component offers the flexibility benefit of the DB query and covers all of the SQL queries
possible.
This component must be used as an output component. It allows you to carry out actions on a table
or on the data of a table in a Netezza database. It also allows you to create a reject flow using a
Row > Rejects link to filter data in error. For an example of tMySqlOutput in use, see section
Scenario 3: Retrieve data in error with a Reject link.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For
details, see the section about external modules in the Talend Installation and Upgrade Guide.
Related scenarios
For tNetezzaOutput related topics, see:
section Scenario: Writing a row to a table in the MySql database via an ODBC connection.
section Scenario 1: Adding a new column and altering data in a DB table.
tNetezzaRollback
tNetezzaRollback properties
This component is closely related to tNetezzaCommit and tNetezzaConnection. It usually does not make much
sense to use these components independently in a transaction.
Component family
Databases/Netezza
Function
Purpose
Basic settings
Component list
Close Connection
Clear this check box to continue to use the selected connection once
the component has performed its task.
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
Limitation
n/a
Related scenarios
For tNetezzaRollback related scenario, see section Scenario: Rollback from inserting data in mother/daughter
tables.
tNetezzaRow
tNetezzaRow properties
Component family
Databases/Netezza
Function
tNetezzaRow is the specific component for this database query. It executes the SQL query stated on the specified database. The row suffix means that the component implements a flow in the Job design although it does not provide output.
Purpose
Depending on the nature of the query and the database, tNetezzaRow acts on the actual DB structure or on the data (although without handling data). The SQLBuilder tool helps you easily write your SQL statements.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Select this check box and in the Component List click the relevant
connection component to reuse the connection details you already
defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
Host
Port
Database
Table Name
Query type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: Fill in the query statement manually or build it graphically using SQLBuilder.
Query
Die on error
This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.
Advanced settings
Additional parameters
Propagate QUERY's recordset
Use PreparedStatement
Commit every
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is
selected in the Basic settings view. Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component offers the flexibility of the DB query and covers all possible SQL queries.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For
details, see the section about external modules in the Talend Installation and Upgrade Guide.
Related scenarios
For related scenarios, see:
section Scenario 3: Combining two flows for selective output
section Scenario 1: Removing and regenerating a MySQL table index
tNetezzaSCD
tNetezzaSCD belongs to two component families: Business Intelligence and Databases. For more information on
it, see section tNetezzaSCD.
tParAccelBulkExec
tParAccelBulkExec Properties
The tParAccelOutputBulk and tParAccelBulkExec components are generally used together in a two-step process. In the first step, an output file is generated. In the second step, this file is used in the INSERT operation that feeds a database. These two steps are fused together in the tParAccelOutputBulkExec component, detailed in a different section. The advantage of using two separate steps is that the data can be transformed before it is loaded into the database.
Component Family
Databases/ParAccel
Function
Purpose
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Select this check box and in the Component List click the relevant
connection component to reuse the connection details you already
defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
Host
Port
Database
Database name.
Schema
Table
Name of the table to be written. Note that only one table can be
written at a time
Action on table
Copy mode
Filename
Advanced settings
File Type
Field Layout
Field separator
Explicit IDs
Remove Quotes
Select this check box to remove quotation marks from the file to be
loaded.
Max. Errors
Date Format
Time/Timestamp Format
Enter the specific, customized ParAccel option that you want to use.
Log file
Browse to or enter the access path to the log file in your directory.
Logging level
Select the information type you want to record in your log file.
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is
selected in the Basic settings view. Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component covers all possible SQL database queries. It allows you to carry out actions on a
table or on the data of a table in a ParAccel database. It enables you to create a reject flow, with a
Row > Reject link filtering the data in error. For a usage example, see section Scenario 3: Retrieve
data in error with a Reject link.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For
details, see the section about external modules in the Talend Installation and Upgrade Guide.
Related scenarios
For related scenarios, see:
section Scenario: Writing a row to a table in the MySql database via an ODBC connection.
section Scenario 1: Adding a new column and altering data in a DB table.
tParAccelClose
tParAccelClose properties
Component family
Databases/ParAccel
Function
Purpose
Close a transaction.
Basic settings
Component list
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
Limitation
n/a
Related scenario
No scenario is available for this component yet.
tParAccelCommit
tParAccelCommit Properties
This component is closely related to tParAccelConnection and tParAccelRollback. It usually does not make much sense to use these components independently in a transaction.
Component family
Databases/ParAccel
Function
Validates the data processed through the job into the connected DB.
Purpose
Using a unique connection, this component commits in one go a global transaction instead of doing
that on every row or every batch and thus provides gain in performance.
Basic settings
Component list
Close Connection
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to choose your database connection dynamically from multiple connections planned in your Job. This feature is useful when you need to access database tables having the same data structure but in different databases, especially when you are working in an environment where you cannot change your Job settings, for example, when your Job has to be deployed and executed independent of Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
Limitation
n/a
Related scenario
This component is closely related to tParAccelConnection and tParAccelRollback. It usually does not make
much sense to use one of these without using a tParAccelConnection component to open a connection for the
current transaction.
For a tParAccelCommit related scenario, see section tMysqlConnection.
tParAccelConnection
tParAccelConnection Properties
This component is closely related to tParAccelCommit and tParAccelRollback. It usually does not make much sense to use one of these without using a tParAccelConnection component to open a connection for the current transaction.
Component family
Databases/ParAccel
Function
Purpose
This component allows you to commit all of the Job data to an output database in just a single
transaction, once the data has been validated.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Host
Port
Database
Schema
Use or register a shared DB Connection
Select this check box to share your connection or fetch a connection shared by a parent or child Job. This allows you to share one single DB connection among several DB connection components from different Job levels that can be either parent or child.
This option is incompatible with the Use dynamic job and
Use an independent process to run subjob options of the
tRunJob component. Using a shared database connection
together with a tRunJob component with either of these
two options enabled will cause your Job to fail.
Shared DB Connection Name: set or type in the shared connection
name.
Advanced settings
Auto commit
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at a Job level as well as at each component level.
Usage
This component is to be used along with ParAccel components, especially with tParAccelCommit
and tParAccelRollback components.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For
details, see the section about external modules in the Talend Installation and Upgrade Guide.
Related scenario
This component is closely related to tParAccelCommit and tParAccelRollback. It usually does not make much
sense to use one of these without using a tParAccelConnection component to open a connection for the current
transaction.
For a tParAccelConnection related scenario, see section tMysqlConnection.
tParAccelInput
tParAccelInput properties
Component family
Databases/ParAccel
Function
Purpose
tParAccelInput executes a DB query with a strictly defined order which must correspond to the
schema definition. Then it passes on the field list to the next component via a Main row link.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Select this check box and in the Component List click the relevant
connection component to reuse the connection details you already
defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
Host
Port
Database
Schema
Table name
Guess Query
Guess schema
Advanced settings
Use cursor
When selected, helps to decide the row set to work with at a time and
thus optimize performance.
Trim all the String/Char columns
Select this check box to remove leading and trailing whitespace from all the String/Char columns.
Trim column
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is
selected in the Basic settings view. Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component covers all possible SQL queries for ParAccel databases.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided. You can easily find out and add such JARs in the Integration perspective of your studio. For details, see the section about external modules in the Talend Installation and Upgrade Guide.
Related scenarios
For related scenarios, see:
section Scenario 1: Displaying selected data from DB table.
section Scenario 2: Using StoreSQLQuery variable.
tParAccelOutput
tParAccelOutput Properties
Component Family
Databases/ParAccel
Function
Purpose
tParAccelOutput executes the action defined on the table and/or on the data of a table, according to the input flow from the previous component.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Select this check box and in the Component List click the relevant
connection component to reuse the connection details you already
defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
Host
Port
Database
Database name.
Schema
Username and Password
Table
Name of the table to be written. Note that only one table can be
written at a time
Action on table
Action on data
Insert: Add new entries to the table. If duplicates are found, job
stops.
Update: Make changes to existing entries
Insert or update: inserts a new record. If the record with the given
reference already exists, an update would be made.
Update or insert: updates the record with the given reference. If the
record does not exist, a new record would be inserted.
Delete: Remove entries corresponding to the input flow.
It is necessary to specify at least one column as a primary
key on which the Update and Delete operations are based.
You can do that by clicking Edit Schema and selecting
the check box(es) next to the column(s) you want to
set as primary key(s). For an advanced use, click the
Advanced settings view where you can simultaneously
define primary keys for the Update and Delete operations.
To do that: Select the Use field options check box and then
in the Key in update column, select the check boxes next to
the column names you want to use as a base for the Update
operation. Do the same in the Key in delete column for the
Delete operation.
Schema and Edit Schema
Die on error
This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.
Advanced settings
Commit every
Additional Columns
This option is not offered if you create (with or without drop) the DB table. This option allows you to call SQL functions to perform actions on columns, provided that these are not insert, update or delete actions, or actions that require particular preprocessing.
Name: Type in the name of the schema column to be altered or inserted as a new column.
SQL expression: Type in the SQL statement to be executed in order
to alter or insert the relevant column data.
Position: Select Before, Replace or After following the action to be
performed on the reference column.
Reference column: Type in a column of reference that the
tDBOutput can use to place or replace the new or altered column.
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is
selected in the Basic settings view. Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component covers all possible SQL database queries. It allows you to carry out actions on a
table or on the data of a table in a ParAccel database. It enables you to create a reject flow, with a
Row > Rejects link filtering the data in error. For a usage example, see section Scenario 3: Retrieve
data in error with a Reject link.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For
details, see the section about external modules in the Talend Installation and Upgrade Guide.
Related scenarios
For related scenarios, see:
section Scenario: Writing a row to a table in the MySql database via an ODBC connection.
section Scenario 1: Adding a new column and altering data in a DB table.
tParAccelOutputBulk
tParAccelOutputBulk properties
The tParAccelOutputBulk and tParAccelBulkExec components are generally used together in a two-step process. In the first step, an output file is generated. In the second step, this file is used in the INSERT operation that feeds a database. These two steps are fused together in the tParAccelOutputBulkExec component, detailed in a different section. The advantage of using two separate steps is that the data can be transformed before it is loaded into the database.
Component family
Databases/ParAccel
Function
Writes a file with columns based on the defined delimiter and the ParAccel standards
Purpose
Prepares the file to be used as parameter in the INSERT query to feed the ParAccel database.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
File Name
Append
Select this check box to add the new rows at the end of the file
Advanced settings
Row separator
Field separator
Include header
Encoding
Select the encoding type from the list or select Custom and define it manually. This field is compulsory for DB data handling.
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Usage
This component is to be used along with the tParAccelBulkExec component. Used together, they offer gains in performance while feeding a ParAccel database.
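As a rough illustration of the file this component prepares (the file name, field separator and rows below are hypothetical), each schema row becomes one delimited line that tParAccelBulkExec can later load in bulk:

    import java.io.FileWriter;
    import java.io.PrintWriter;

    public class BulkFileSketch {
        public static void main(String[] args) throws Exception {
            String fieldSeparator = ";";   // whatever is set in the Field separator option
            // The Append option corresponds to the second constructor argument below.
            try (PrintWriter out = new PrintWriter(new FileWriter("employee_bulk.csv", true))) {
                // One line per schema row: id;name;salary
                out.println(1 + fieldSeparator + "Ashley" + fieldSeparator + 2500.0);
                out.println(2 + fieldSeparator + "Martin" + fieldSeparator + 2800.0);
            }
        }
    }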
Related scenarios
For use cases in relation with tParAccelOutputBulk, see the following scenarios:
section Scenario: Inserting transformed data in MySQL database.
section Scenario: Inserting data in MySQL database.
section Scenario: Truncating and inserting file data into Oracle DB.
tParAccelOutputBulkExec
tParAccelOutputBulkExec Properties
The tParAccelOutputBulk and tParAccelBulkExec components are generally used together in a two-step process. In the first step, an output file is generated. In the second step, this file is used in the INSERT operation that feeds a database. These two steps are fused together in tParAccelOutputBulkExec.
Component Family
Databases/ParAccel
Function
Purpose
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Host
Port
Database
Database name.
Schema
Table
Name of the table to be written. Note that only one table can be
written at a time
Action on table
Copy mode
Advanced settings
File Type
Row separator
Fields terminated by
Append
Select this check box to add the new rows at the end of the file.
Explicit IDs
Remove Quotes
Select this check box to remove quotation marks from the file to be
loaded.
Max. Errors
Date Format
Time/Timestamp Format
Enter the specific, customized ParAccel option that you want to use.
Log file
Browse to or enter the access path to the log file in your directory.
Logging level
Select the information type you want to record in your log file.
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Usage
This component covers all possible SQL database queries. It allows you to carry out actions on a
table or on the data of a table in a ParAccel database. It enables you to create a reject flow, with a
Row > Reject link filtering the data in error. For a usage example, see section Scenario 3: Retrieve
data in error with a Reject link.
Limitation
The database server must be installed on the same machine where the Studio is installed or where
the Job using tParAccelOutputBulkExec is deployed, so that the component functions properly.
Related scenarios
For related scenarios, see:
section Scenario: Writing a row to a table in the MySql database via an ODBC connection.
section Scenario 1: Adding a new column and altering data in a DB table.
tParAccelRollback
tParAccelRollback properties
This component is closely related to tParAccelCommit and tParAccelConnection. It usually does not make much sense to use these components independently in a transaction.
Component family
Databases
Function
Purpose
Basic settings
Component list
Close Connection
Clear this check box to continue to use the selected connection once
the component has performed its task.
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
Limitation
n/a
Related scenario
This component is closely related to tParAccelConnection and tParAccelCommit. It usually does not make much sense to use one of them without using a tParAccelConnection component to open a connection for the current transaction.
For tParAccelRollback related scenario, see section tMysqlRollback.
tParAccelRow
tParAccelRow Properties
Component Family
Databases/ParAccel
Function
tParAccelRow is the specific component for this database query. It executes the SQL query stated on the specified database. The row suffix means the component implements a flow in the Job design although it does not provide output.
Purpose
Depending on the nature of the query and the database, tParAccelRow acts on the actual DB structure or on the data (although without handling data). The SQLBuilder tool helps you easily write your SQL statements.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Select this check box and in the Component List click the relevant
connection component to reuse the connection details you already
defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
Host
Port
Database
Schema
Username and Password
Table Name
Query type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: Fill in the query statement manually or build it graphically using SQLBuilder.
Guess Query
Query
Die on error
This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.
Advanced settings
Propagate QUERY's recordset
Select this check box to insert the result of the query into a COLUMN of the current flow. Select this column from the use column list.
This option allows the component to have a different schema from that of the preceding component. Moreover, the column that holds the QUERY's recordset should be set to the type of Object and this component is usually followed by tParseRecordSet.
Use PreparedStatement
Commit every
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is
selected in the Basic settings view. Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
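As a sketch of how this table can be filled in (the variable and component names are hypothetical): declare a String context variable, give it the label of the connection component to use at run time, and enter the variable in the Code field.

    // Context variable declared in the Job, for example in the Contexts view:
    //   connectionName = "tParAccelConnection_1"
    // Value typed in the Code field of the Dynamic settings table:
    context.connectionName

For an exported Job, such a variable can then typically be overridden at launch time with a parameter such as --context_param connectionName=tParAccelConnection_2.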
Usage
This component offers the flexibility benefit of the DB query and covers all possible SQL queries.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For
details, see the section about external modules in the Talend Installation and Upgrade Guide.
Related scenarios
For a related scenario, see:
section Scenario 3: Combining two flows for selective output
section Scenario: Resetting a DB auto-increment.
section Scenario 1: Removing and regenerating a MySQL table index.
tParAccelSCD
tParAccelSCD belongs to two component families: Business Intelligence and Databases. For more information
on it, see section tParAccelSCD.
tRedshiftClose
tRedshiftClose properties
Component family
Databases/Amazon Redshift
Function
Purpose
This component is used together with tRedshiftConnection and tRedshiftCommit to ensure the integrity of the transactions performed on the database.
Basic settings
Component list
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component is to be used along with Amazon Redshift components, especially with
tRedshiftConnection and tRedshiftCommit.
Limitation
n/a
Related scenario
No scenario is available for this component yet.
tRedshiftCommit
tRedshiftCommit properties
Component family
Databases/Amazon Redshift
Function
tRedshiftCommit validates the data processed through the Job into the connected database.
Purpose
Using a unique connection, this component commits in one go a global transaction instead of doing
that on every row or every batch and thus provides gain in performance.
Basic settings
Component list
Close Connection
Advanced settings
tStatCatcher Statistics
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component is to be used along with Amazon Redshift components, especially with
tRedshiftConnection and tRedshiftRollback components.
Limitation
n/a
Related scenario
For tRedshiftCommit related scenario, see section tMysqlConnection.
tRedshiftConnection
tRedshiftConnection properties
Component family
Databases/Amazon
Redshift
Function
Purpose
This component allows you to commit all of the Job data to an output database in just a single transaction,
once the data has been validated.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Host
Port
Database
Schema
Advanced settings
Auto commit
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at a Job level as well as at each component level.
Usage
This component is to be used along with Amazon Redshift components, especially with tRedshiftCommit and tRedshiftRollback components.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided. You
can easily find out and add such JARs in the Integration perspective of your studio. For details, see the
section about external modules in the Talend Installation and Upgrade Guide.
Related scenario
This component is closely related to tRedshiftCommit and tRedshiftRollback. It usually does not make much
sense to use one of these without using a tRedshiftConnection component to open a connection for the current
transaction.
For tRedshiftConnection related scenario, see section tMysqlConnection.
tRedshiftInput
tRedshiftInput properties
Component family
Databases/Amazon Redshift
Function
tRedshiftInput executes a DB query with a strictly defined order which must correspond to the
schema definition. Then it passes on the field list to the next component through a Main row link.
Purpose
tRedshiftInput reads data from a database and extracts fields based on a query so that you may
apply changes to the extracted data.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Select this check box and in the Component List click the relevant
connection component to reuse the connection details you already
defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
Host
Port
Database
Schema
Table name
Guess Query
Guess schema
Advanced settings
Use cursor
Select this check box to help decide the row set to work with at a time and thus optimize performance.
Trim all the String/Char columns
Select this check box to remove leading and trailing whitespace from all the String/Char columns.
Trim column
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is
selected in the Basic settings view. Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component covers all possible SQL queries for Amazon Redshift databases.
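For example, with a schema made of three columns id, name and city (hypothetical names), the Query field could hold the statement below; the column order in the SELECT clause matches the schema order exactly:

    "SELECT id, name, city FROM customers"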
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For
details, see the section about external modules in the Talend Installation and Upgrade Guide.
Related scenarios
For related scenarios, see:
section Scenario 1: Displaying selected data from DB table.
section Scenario 2: Using StoreSQLQuery variable.
tRedshiftOutput
tRedshiftOutput properties
Component Family
Databases/Amazon Redshift
Function
tRedshiftOutput executes the action defined on the table and/or on the data of a table, according to the input flow
from the previous component.
Purpose
tRedshiftOutput allows you to write, update, modify or delete the data in a database.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Use an existing connection
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if you need to share an
existing connection between the two levels, for example, to share the connection
created by the parent Job with the child Job, you have to:
1. In the parent level, register the database connection to be shared in the Basic
settings view of the connection component which creates that very database
connection.
2. In the child level, use a dedicated connection component to read that registered
database connection.
For an example about how to share a database connection across Job levels, see
Talend Studio User Guide.
Host
Port
Database
Database name.
Schema
Username
Password
Table
Name of the table to which the data will be written. Note that only one table can be written
at a time.
Action on table
On the table defined, you can perform one of the following operations:
None: No operation is carried out.
Drop and create a table: The table is removed and created again.
Create a table: The table does not exist and gets created.
Create a table if not exists: The table is created if it does not exist.
Drop a table if exists and create: The table is removed if it already exists and created again.
Clear a table: The table content is deleted.
Action on data
Insert or update: inserts a new record. If the record with the given reference already exists,
an update would be made.
Update or insert: updates the record with the given reference. If the record does not exist,
a new record would be inserted.
Delete: Remove entries corresponding to the input flow.
It is necessary to specify at least one column as a primary key on which the
Update and Delete operations are based. You can do that by clicking Edit Schema
and selecting the check box(es) next to the column(s) you want to set as primary
key(s). For an advanced use, click the Advanced settings view where you can
simultaneously define primary keys for the Update and Delete operations. To do
that: Select the Use field options check box and then in the Key in update column,
select the check boxes next to the column names you want to use as a base for
the Update operation. Do the same in the Key in delete column for the Delete
operation.
Schema and Edit Schema
A schema is a row description, i.e., it defines the number of fields to be processed and passed on to the next component.
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: You create and store the schema locally for this component only. Related topic:
see Talend Studio User Guide.
Advanced settings
Die on error
This check box is selected by default. Clear the check box to skip the row on error and
complete the process for error-free rows. If needed, you can retrieve the rows on error
through a Row > Rejects link.
Extend Insert
Select this check box to carry out a bulk insert of a defined set of lines instead of inserting
lines one by one. The gain in system performance is considerable.
Number of rows per insert: enter the number of rows to be inserted per operation. Note
that the higher the value specified, the lower performance levels shall be due to the increase
in memory demands.
This option is not compatible with the Reject link. You should therefore clear the
check box if you are using a Row > Rejects link with this component.
Commit every
Enter the number of rows to be completed before committing batches of rows together into
the database. This option ensures transaction quality (but not rollback) and, above all, better
performance at execution.
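As an indicative example only (the figures below are not recommendations from this guide), the two options can be combined so that rows are sent in batches of 100 and a commit is issued every 10000 rows:

    Extend Insert: selected
    Number of rows per insert: 100
    Commit every: 10000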
Additional Columns
This option is not offered if you create (with or without drop) the DB table. It allows you to call SQL functions to perform actions on columns that are neither insert, update nor delete actions, or actions that require particular preprocessing.
Name: Type in the name of the schema column to be altered or inserted as new column.
SQL expression: Type in the SQL statement to be executed in order to alter or insert the
relevant column data.
Position: Select Before, Replace or After following the action to be performed on the
reference column.
Reference column: Type in a column of reference that the tDBOutput can use to place or
replace the new or altered column.
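For instance, to add a load timestamp column right after an existing id column, the Additional Columns table could be filled in as follows; the column name, the GETDATE() function call and the reference column are hypothetical values for this sketch:

    Name: load_time
    SQL expression: "GETDATE()"
    Position: After
    Reference column: id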
Use field options
Select this check box to customize a request, especially when there is a double action on data.
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to choose your database
connection dynamically from multiple connections planned in your Job. This feature is useful when you need to
access database tables having the same data structure but in different databases, especially when you are working
in an environment where you cannot change your Job settings, for example, when your Job has to be deployed and
executed independent of Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is selected in the
Basic settings view. Once a dynamic parameter is defined, the Component List box in the Basic settings view
becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component covers all possible SQL database queries. It allows you to carry out actions on a table or on the data
of a table in an Amazon Redshift database. It enables you to create a reject flow, with a Row > Rejects link filtering
the data in error. For a usage example, see section Scenario 3: Retrieve data in error with a Reject link.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided. You can easily
find out and add such JARs in the Integration perspective of your studio. For details, see the section about external
modules in the Talend Installation and Upgrade Guide.
Related scenarios
For a related scenario, see:
section Scenario: Writing a row to a table in the MySql database via an ODBC connection.
section Scenario 1: Adding a new column and altering data in a DB table.
tRedshiftRollback
tRedshiftRollback properties
Component family
Databases/Amazon Redshift
Function
Purpose
Basic settings
Component list
Close Connection
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component is to be used along with Amazon Redshift components, especially with
tRedshiftConnection and tRedshiftCommit components.
Limitation
n/a
Related scenario
For tRedshiftRollback related scenario, see section tMysqlRollback.
tRedshiftRow
tRedshiftRow properties
Component Family
Databases/Amazon Redshift
Function
tRedshiftRow is the specific component for this database query. It executes the SQL query stated on the specified database. The row suffix means the component implements a flow in the Job design although it does not provide output.
Purpose
Depending on the nature of the query and the database, tRedshiftRow acts on the actual DB structure or on the data (although without handling data). The SQLBuilder tool helps you write your SQL statements easily.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Select this check box and in the Component List click the relevant
connection component to reuse the connection details you already
defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
Host
Port
Database
Schema
Table Name
Query type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: Fill in manually the query statement or build it graphically
using SQLBuilder.
Advanced settings
Guess Query
Query
Die on error
This check box is selected by default. Clear the check box to skip the
row on error and complete the process for error-free rows. If needed,
you can retrieve the rows on error through a Row > Rejects link.
Propagate QUERY's recordset
Select this check box to insert the result of the query into a COLUMN of the current flow. Select this column from the use column list.
This option allows the component to have a different schema from that of the preceding component. Moreover, the column that holds the QUERY's recordset should be set to the type Object and this component is usually followed by tParseRecordSet.
Use PreparedStatement
Select this check box if you want to query the database using
a PreparedStatement. In the Set PreparedStatement Parameter
table, define the parameters represented by ? in the SQL instruction
of the Query field in the Basic Settings tab.
Parameter Index: Enter the parameter position in the SQL
instruction.
Parameter Type: Enter the parameter type.
Parameter Value: Enter the parameter value.
This option is very useful if you need to execute the same
query several times. Performance levels are increased.
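A minimal sketch, assuming a hypothetical customers table: the Query field in the Basic settings holds one ? placeholder, and a single row of the Set PreparedStatement Parameter table supplies its value.

    // Query field (Basic settings):
    "SELECT name FROM customers WHERE id = ?"
    // Set PreparedStatement Parameter table:
    //   Parameter Index: 1    Parameter Type: Int    Parameter Value: 42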
Commit every
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is
selected in the Basic settings view. Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component offers the flexibility benefit of the database query and covers all possible SQL
queries.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For
details, see the section about external modules in the Talend Installation and Upgrade Guide.
Related scenarios
For a related scenario, see:
section Scenario 3: Combining two flows for selective output
section Scenario: Resetting a DB auto-increment.
section Scenario 1: Removing and regenerating a MySQL table index.
tTeradataClose
tTeradataClose properties
Component family
Databases/Teradata
Function
Purpose
Close a transaction.
Basic settings
Component list
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
Limitation
n/a
Related scenario
No scenario is available for this component yet.
tTeradataCommit
tTeradataCommit Properties
This component is closely related to tTeradataConnection and tTeradataRollback. It usually does not make
much sense to use these components independently in a transaction.
Component family
Databases/Teradata
Function
tTeradataCommit validates the data processed through the Job into the connected DB.
Purpose
Using a unique connection, this component commits in one go a global transaction instead of doing
that on every row or every batch and thus provides gain in performance.
Basic settings
Component list
Close connection
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to choose your database connection dynamically from multiple connections planned in your Job. This feature is useful when you need to access database tables having the same data structure but in different databases, especially when you are working in an environment where you cannot change your Job settings, for example, when your Job has to be deployed and executed independent of Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
Limitation
n/a
Related scenario
This component is closely related to tTeradataConnection and tTeradataRollback. It usually does not make
much sense to use one of these without using a tTeradataConnection component to open a connection for the
current transaction.
For tTeradataCommit related scenario, see section tVerticaConnection.
tTeradataConnection
tTeradataConnection Properties
This component is closely related to tTeradataCommit and tTeradataRollback. It usually doesn't make much
sense to use one of these without using a tTeradataConnection component to open a connection for the current
transaction.
Component family
Databases/Teradata
Function
Purpose
This component allows you to commit all of the Job data to an output database in just a single
transaction, once the data has been validated.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Host
Database
Additional
parameters
Use or register a shared DB Connection
Select this check box to share your connection or fetch a connection shared by a parent or child Job. This allows you to share one single DB connection among several DB connection components from different Job levels that can be either parent or child.
This option is incompatible with the Use dynamic job and Use an independent process to run subjob options of the tRunJob component. Using a shared database connection together with a tRunJob component with either of these two options enabled will cause your Job to fail.
Shared DB Connection Name: set or type in the shared connection name.
Advanced settings
Auto commit
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Usage
This component is to be used along with Teradata components, especially with tTeradataCommit and tTeradataRollback components.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For
details, see the section about external modules in the Talend Installation and Upgrade Guide.
Related scenario
This component is closely related to tTeradataCommit and tTeradataRollback. It usually doesn't make much
sense to use one of these without using a tTeradataConnection component to open a connection for the current
transaction.
For tTeradataConnection related scenario, see section tMysqlConnection.
tTeradataFastExport
tTeradataFastExport Properties
Component Family
Databases/Teradata
Function
tTeradataFastExport rapidly exports voluminous data batches from a Teradata table or view.
Purpose
Basic settings
Use Commandline
Property type
Execution platform
Host
Database name
Database name.
Table
Name of the table to be written. Note that only one table can be
written at a time.
Use query
Select this check box to show the Query box where you can enter
the SQL statement.
Available in the Use Commandline mode.
Query
Log database
Log table
Browse your directory and select the destination of the file which
will be created.
Available in the Use Commandline mode.
Exported file
Field separator
Row separator
Error file
Browse your directory and select the destination of the file where
the error messages will be recorded.
Available in the Use Commandline mode.
Advanced settings
Output
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Usage
This component offers the flexibility benefit of the DB query and covers all possible SQL queries.
Limitation
If you have selected the Use Commandline mode, you need to install the Teradata client on the machine that runs the Job involving this component.
Related scenario
No scenario is available for this component yet.
tTeradataFastLoad
tTeradataFastLoad Properties
Component Family
Databases/Teradata
Function
Purpose
tTeradataFastLoad executes a database query according to a strict order which must be the same as the one in the schema. The retrieved list of fields is then transferred to the next component, using a connection flow (Main row).
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Host
Database
Database name.
Table
Name of the table to be written. Note that only one table can be
written at a time.
Die on error
This check box is selected by default. Clear the check box to skip the
row on error and complete the process for error-free rows. If needed,
you can retrieve the rows on error via a Row > Rejects link.
Advanced settings
Additional
parameters
tStatCatcher Statistics
Usage
This component offers the flexibility benefit of the DB query and covers all possible SQL queries.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For
details, see the section about external modules in the Talend Installation and Upgrade Guide.
Related scenario
No scenario is available for this component yet.
tTeradataFastLoadUtility
tTeradataFastLoadUtility Properties
Component Family
Databases/Teradata
Function
Purpose
tTeradataFastLoadUtility executes a database query according to a strict order which must be the same as the one in the schema. The retrieved list of fields is then transferred to the next component, using a connection flow (Main row).
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Execution platform
Host
Database name
Database name.
Table
Name of the table to be written. Note that only one table can be
written at a time.
Advanced settings
Browse your directory and select the destination of the file which
will be created.
Load file
Browse your directory and select the file from which you want to
load data.
Field separator
Error file
Browse your directory and select the destination of the file where the
error messages will be recorded.
Specify the character encoding you need to use for your system.
Check point
Error files
Enter the file name where the error messages are stored. By default,
the code ERRORFILES table_ERR1, table_ERR2 is entered,
meaning that the two tables table_ERR1 and table_ERR2 are used to
record the error messages.
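For example, with a hypothetical database mydb and target table customers, the field could read:

    ERRORFILES mydb.customers_ERR1, mydb.customers_ERR2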
Select this check box to specify the exit code number to indicate the
point at which an error message should display in the console.
ERRLIMIT
Enter the limit number of errors detected during the loading phase.
Processing stops when the limit is reached.
The default error limit value is 1000000.
For more information, see Teradata FastLoad Reference
documentation.
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Usage
This component offers the flexibility of the DB query and covers all possible SQL queries.
Related scenario
For related topic, see section Scenario: Inserting data into a Teradata database table.
tTeradataInput
tTeradataInput Properties
Component family
Databases/Teradata
Function
Purpose
tTeradataInput executes a DB query with a strictly defined order which must correspond to the
schema definition. Then it passes on the field list to the next component via a Main row link.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Select this check box and in the Component List click the relevant
connection component to reuse the connection details you already
defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
Host
Port
Database
Additional
parameters
Trim all the String/Char columns
Select this check box to remove leading and trailing whitespace from all the String/Char columns.
Trim column
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is
selected in the Basic settings view. Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component covers all possible SQL queries for Teradata databases.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For
details, see the section about external modules in the Talend Installation and Upgrade Guide.
Related scenarios
For related scenarios, see:
section Scenario 1: Displaying selected data from DB table.
section Scenario 2: Using StoreSQLQuery variable.
section Scenario: Reading data from different MySQL databases using dynamically loaded connection
parameters.
tTeradataMultiLoad
tTeradataMultiLoad Properties
Component Family
Databases/Teradata
Function
Purpose
tTeradataMultiLoad executes a database query according to a strict order which must be the same as the one in the schema. The retrieved list of fields is then transferred to the next component, using a connection flow (Main row).
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Execution platform
Host
Database name
Database name.
Table
Name of the table to be written. Note that only one table can be
written at a time.
Browse your directory and select the destination of the file which
will be created.
Action to data
Where condition in case Delete
Type in a condition which, once verified, will delete the row.
Load file
Browse your directory and select the file from which you want to
load data.
Field separator
Error file
Browse your directory and select the destination of the file where the
error messages will be recorded.
Advanced settings
Select this check box to define a log table to use in place of the default one, that is, the database table you defined in Basic settings. The syntax required to define the log table is databasename.logtablename.
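For example, to log into a table named mload_log in a hypothetical database mydb, you would enter:

    mydb.mload_log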
BEGIN LOAD
Select this check box to specify the exit code number to indicate the point at which an error message should display in the console.
Specify the character encoding you need to use for your system.
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Usage
This component offers the flexibility of the DB query and covers all possible SQL queries.
Related scenario
For related topic, see section Scenario: Inserting data into a Teradata database table.
tTeradataOutput
tTeradataOutput Properties
Component family
Databases/Teradata
Function
Purpose
tTeradataOutput executes the action defined on the table and/or on the data contained in the table,
based on the flow incoming from the preceding component in the job.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Select this check box and in the Component List click the relevant
connection component to reuse the connection details you already
defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
Host
Port
Database
Table
Name of the table to be written. Note that only one table can be
written at a time.
Action on table
Create
This is not visible by default, until you choose to create a table from
the Action on table drop-down list. The table to be created may be:
Die on error
This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.
Advanced settings
Additional JDBC parameters
Specify additional connection properties for the DB connection you are creating. This option is not available if you have selected the Use an existing connection check box in the Basic settings.
This is intended to allow specific character set support, e.g. CHARSET=KANJISJIS_OS to get support of Japanese characters.
You can press Ctrl+Space to access a list of predefined
global variables.
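For instance, to force the Japanese character set mentioned above, the value typed in this field would be the one below; further Teradata JDBC parameters can usually be appended in the same key=value form:

    CHARSET=KANJISJIS_OS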
Commit every
Additional Columns
This option is not offered if you create (with or without drop) the
DB table. This option allows you to call SQL functions to perform
actions on columns that are neither insert, update nor delete actions, or actions that require particular preprocessing.
Name: Type in the name of the schema column to be altered or
inserted as new column
SQL expression: Type in the SQL statement to be executed in order
to alter or insert the relevant column data.
Position: Select Before, Replace or After following the action to be
performed on the reference column.
Reference column: Type in a column of reference that the
tDBOutput can use to place or replace the new or altered column.
Select this check box to display each step during the processing of entries in a database.
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is
selected in the Basic settings view. Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component offers the flexibility benefit of the DB query and covers all of the SQL queries
possible.
This component must be used as an output component. It allows you to carry out actions on a table
or on the data of a table in a Teradata database. It also allows you to create a reject flow using a
Row > Rejects link to filter data in error. For an example of tMySqlOutput in use, see section
Scenario 3: Retrieve data in error with a Reject link.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For
details, see the section about external modules in the Talend Installation and Upgrade Guide.
Related scenarios
For related topics, see:
section Scenario: Writing a row to a table in the MySql database via an ODBC connection
section Scenario 1: Adding a new column and altering data in a DB table.
tTeradataRollback
tTeradataRollback Properties
This component is closely related to tTeradataCommit and tTeradataConnection. It usually doesn't make much
sense to use these components independently in a transaction.
Component family
Databases/Teradata
Function
Purpose
Basic settings
Component list
Close Connection
Clear this check box to continue to use the selected connection once
the component has performed its task.
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
Limitation
n/a
Related scenario
For tTeradataRollback related scenario, see section Scenario: Rollback from inserting data in mother/daughter
tables.
tTeradataRow
tTeradataRow Properties
Component family
Databases/Teradata
Function
tTeradataRow is the specific component for this database query. It executes the SQL query stated on the specified database. The row suffix means the component implements a flow in the Job design although it doesn't provide output.
Purpose
Depending on the nature of the query and the database, tTeradataRow acts on the actual DB structure or on the data (although without handling data). The SQLBuilder tool helps you write your SQL statements easily.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Select this check box and in the Component List click the relevant
connection component to reuse the connection details you already
defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
Host
Port
Database
Query type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: Fill in manually the query statement or build it graphically
using SQLBuilder
Query
Advanced settings
Commit every
Die on error
This check box is selected by default. Clear the check box to skip the
row on error and complete the process for error-free rows. If needed,
you can retrieve the rows on error via a Row > Rejects link.
Additional
parameters
Propagate QUERY's recordset
Select this check box to insert the result of the query into a COLUMN of the current flow. Select this column from the use column list.
This option allows the component to have a different schema from that of the preceding component. Moreover, the column that holds the QUERY's recordset should be set to the type Object and this component is usually followed by tParseRecordSet.
Use PreparedStatement
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is
selected in the Basic settings view. Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component offers the flexibility of the DB query and covers all possible SQL queries.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For
details, see the section about external modules in the Talend Installation and Upgrade Guide.
Related scenarios
For related topics, see:
section Scenario 3: Combining two flows for selective output.
tTeradataTPTExec
tTeradataTPTExec Properties
Component Family
Databases/Teradata
Function
Purpose
tTeradataTPTExec offers high performance in inserting data from an existing file to a table in
the Teradata Database.
Basic settings
Action on data
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Execution platform
TDPID
Database name
Load Operator
Data Connector
Job Name
Layout Name(schema)
Table
Name of the table to be written into the Teradata database. Note that
only one table can be written at a time.
Browse your directory and select the destination of the file which
will be created.
Load file
Browse your directory and select the file to insert data to the Teradata
Database.
Error file
Browse your directory and select the destination of the file where the
error messages will be recorded.
Field separator
Select this check box to specify the exit code number to indicate
the point at which an error message should display in the console.
For further information about this error, see Teradata MultiLoad
Reference.
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Usage
Used as a single-component Job or Sub-Job, this component offers high performance in inserting
data from an existing file to a table in the Teradata Database. For further information about the
usage of this component, see Teradata Parallel Transporter Reference.
Related scenario
For related topic, see section Scenario: Inserting data into a Teradata database table.
tTeradataTPTUtility
tTeradataTPTUtility Properties
Component Family
Databases/Teradata
Function
Purpose
tTeradataTPTUtility writes the incoming data to a file and then loads the data from the file to
the Teradata Database.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Filename
Append
Select this check box to append the work table to the path set in the
Filename field.
Action on data
Execution platform
TDPID
Database name
Load Operator
Data Connector
Job Name
Table
Name of the table to be written into the Teradata database. Note that
only one table can be written at a time.
Browse your directory and select the destination of the file which
will be created.
Where condition in case Delete
Type in a script as a condition which, once verified, will delete the row.
Advanced settings
Error file
Browse your directory and select the destination of the file where the
error messages will be recorded.
Row separator
Field separator
Include header
Select this check box to include the column header to the file.
Encoding
Select the encoding from the list or select Custom and define it
manually. This field is compulsory for DB data handling.
Select this check box to specify the exit code number to indicate
the point at which an error message should display in the console.
For further information about this error, see Teradata MultiLoad
Reference.
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Usage
Preceded by an input component, tTeradataTPTUtility writes the incoming data to a file and
then loads the data from the file to the Teradata Database. High performance is provided during
this process. For further information about the usage of this component, see Teradata Parallel
Transporter Reference.
Related scenario
For related topic, see section Scenario: Inserting data into a Teradata database table.
tTeradataTPump
tTeradataTPump Properties
Component Family
Databases/Teradata
Function
Purpose
tTeradataTPump executes a database query according to a strict order which must be the same as the one in the schema. The retrieved list of fields is then transferred to the next component, using a connection flow (Main row).
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Execution platform
Host
Database name
Database name.
Table
Name of the table to be written. Note that only one table can be
written at a time.
Browse your directory and select the destination of the file which
will be created.
Action to data
Where condition in case Delete
Type in a condition which, once verified, will delete the row.
Load file
Browse your directory and select the file from which you want to
load data.
Field separator
Error file
Browse your directory and select the destination of the file where the
error messages will be recorded.
Advanced settings
Select this check box to define a log table to use in place of the default one, that is, the database table you defined in Basic settings. The syntax required to define the log table is databasename.logtablename.
BEGIN LOAD
Select this check box to specify the exit code number to indicate the point at which an error message should display in the console.
Specify the character encoding you need to use for your system.
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Usage
This component offers the flexibility of the DB query and covers all possible SQL queries.
Scenario: Inserting data into a Teradata database table
Dropping components
1. Drop the required components: tRowGenerator, tFileOutputDelimited and tTeradataTPump from the Palette onto the design workspace.
3. Next to File Name, browse to the output file or enter a name for the output file to be created.
4. Between double quotation marks, enter the delimiters to be used next to Row Separator and Field Separator.
5. Click Edit schema and check that the schema matches the input schema. If need be, click Sync Columns.
6. Enter the Database name, User name and Password in accordance with your database authentication information.
7. Specify the Table into which you want to insert the customer data. In this scenario, it is called mytable.
8. In the Script generated folder field, browse to the folder in which you want to store the script files generated.
9. In the Load file field, browse to the file which contains the customer data.
10. In the Error file field, browse to the file in which you want to log the error information.
11. In the Action on data field, select Insert.
On the Advanced settings tab, select the Return tpump error check box and type in the exit code number to indicate the point at which an error message should be displayed in the console. In this example, enter the number 4 and use the default values for the other parameters.
When the Job is run, an exception error occurs and TPump returned exit code 12 is displayed. If you need to view detailed information about the exception error, you can open the log file stored in the directory you specified in the Error file field in the Basic settings tab of the Component view.
tVectorWiseCommit
tVectorWiseCommit Properties
This component is closely related to tVectorWiseConnection and tVectorWiseRollback. It usually doesn't make
much sense to use these components independently in a transaction.
Component family
Databases/VectorWise
Function
tVectorWiseCommit validates the data processed in a Job into the connected DB.
Purpose
Using a single connection, this component commits a global transaction in one go instead of doing so on every row or every batch. This provides a gain in performance.
Basic settings
Component list
Close connection
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component is to be used along with VectorWise components, notably tVectorWiseConnection and tVectorWiseRollback.
Limitation
n/a
Related scenario
This component is closely related to tVectorWiseConnection and tVectorWiseRollback. It usually doesn't make
much sense to use one of these without using a tVectorWiseConnection component to open a connection for
the current transaction.
For a tVectorWiseCommit related scenario, see section tVerticaConnection.
tVectorWiseConnection
tVectorWiseConnection Properties
This component is closely related to tVectorWiseCommit and tVectorWiseRollback. It usually doesn't make
much sense to use one of these without using a tVectorWiseConnection component to open a connection for
the current transaction.
Component family
Databases/VectorWise
Function
Purpose
This component allows you to commit all of the Job data to an output database in just a single
transaction, once the data has been validated.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Server
Port
Database
Username and Password
Use or register a shared DB Connection
Select this check box to share your connection or retrieve a connection shared by a parent or child Job. This allows you to share one single DB connection among several DB connection components from different Job levels that can be either parent or child.
This option is incompatible with the Use dynamic job and Use an independent process to run subjob options of the tRunJob component. Using a shared database connection together with a tRunJob component with either of these two options enabled will cause your Job to fail.
Shared DB Connection Name: set or type in the shared connection name.
Advanced settings
Auto Commit
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Usage
This component is to be used along with VectorWise components, particularly tVectorWiseCommit and tVectorWiseRollback.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For
details, see the section about external modules in the Talend Installation and Upgrade Guide.
Related scenario
This component is closely related to tVectorWiseCommit and tVectorWiseRollback. It usually doesn't make
much sense to use one of these without using a tVectorWiseConnection component to open a connection for
the current transaction.
tVectorWiseInput
tVectorWiseInput Properties
Component family
Databases/VectorWise
Function
Purpose
tVectorWiseInput executes a DB query with a strictly defined order which must correspond to
the schema definition. Then it passes on the field list to the next component via a Main row link.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Select this check box and in the Component List click the relevant
connection component to reuse the connection details you already
defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
Server
Port
Database
Username and Password
Advanced settings
Table name
Enter your DB query, ensuring that the field order matches the order
in the schema.
Guess Query
Guess schema
Trim all the String/Char columns: Select this check box to remove leading and trailing whitespace from all the String/Char columns.
Trim column
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For
details, see the section about external modules in the Talend Installation and Upgrade Guide.
Related scenario
For tVectorWiseInput related scenarios, see:
section Scenario 1: Displaying selected data from DB table.
section Scenario 2: Using StoreSQLQuery variable.
section Scenario: Reading data from different MySQL databases using dynamically loaded connection
parameters.
1209
tVectorWiseOutput
tVectorWiseOutput
tVectorWiseOutput Properties
Component family
Databases/VectorWise
Function
Purpose
tVectorWiseOutput executes the action defined on the table and/or on the data contained in the
table, based on the flow incoming from the preceding component in the Job.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Select this check box and in the Component List click the relevant
connection component to reuse the connection details you already
defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
Host
Port
Database
Table
Name of the table to be written. Note that only one table can be
written at a time.
Action on table
Action on data
Insert: Add new entries to the table. If duplicates are found, job
stops.
Update: Make changes to existing entries
Insert or update: inserts a new record. If the record with the given reference already exists, an update is made.
Update or insert: updates the record with the given reference. If the record does not exist, a new record is inserted (see the sketch after this list).
Delete: Remove entries corresponding to the input flow.
It is necessary to specify at least one column as a primary
key on which the Update and Delete operations are based.
You can do that by clicking Edit Schema and selecting
the check box(es) next to the column(s) you want to
set as primary key(s). For an advanced use, click the
Advanced settings view where you can simultaneously
define primary keys for the Update and Delete operations.
To do that: Select the Use field options check box and then
in the Key in update column, select the check boxes next to
the column names you want to use as a base for the Update
operation. Do the same in the Key in delete column for the
Delete operation.
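As a rough illustration only (not the code the component generates), the difference between the two combined actions can be sketched in plain JDBC as below. The table sample_table, its columns id and name, and the connection URL are hypothetical.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class UpsertSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical VectorWise (Ingres driver) connection settings.
        Connection conn = DriverManager.getConnection(
                "jdbc:ingres://localhost:27832/demo_db", "user", "password");

        // "Insert or update": try the INSERT first; if a row with the same
        // key already exists, fall back to an UPDATE.
        try (PreparedStatement insert = conn.prepareStatement(
                "INSERT INTO sample_table (id, name) VALUES (?, ?)")) {
            insert.setInt(1, 1);
            insert.setString(2, "Alice");
            insert.executeUpdate();
        } catch (SQLException duplicateKey) {
            try (PreparedStatement update = conn.prepareStatement(
                    "UPDATE sample_table SET name = ? WHERE id = ?")) {
                update.setString(1, "Alice");
                update.setInt(2, 1);
                update.executeUpdate();
            }
        }

        // "Update or insert": try the UPDATE first; if no row was affected,
        // INSERT the record instead.
        try (PreparedStatement update = conn.prepareStatement(
                "UPDATE sample_table SET name = ? WHERE id = ?")) {
            update.setString(1, "Bob");
            update.setInt(2, 2);
            if (update.executeUpdate() == 0) {
                try (PreparedStatement insert = conn.prepareStatement(
                        "INSERT INTO sample_table (id, name) VALUES (?, ?)")) {
                    insert.setInt(1, 2);
                    insert.setString(2, "Bob");
                    insert.executeUpdate();
                }
            }
        }
        conn.close();
    }
}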
Schema and Edit Schema
Advanced settings
Die on error
This check box is selected by default. Clear the check box to skip the
row on error and complete the process for error-free rows. If needed,
you can retrieve the rows on error via a Row > Rejects link.
Commit every
Additional Columns
This option is not offered if you create (with or without drop) the DB table. This option allows you to call SQL functions to perform actions on columns other than insert, update or delete, or actions that require particular preprocessing.
Name: Type in the name of the schema column to be altered or
inserted as new column.
SQL expression: Type in the SQL statement to be executed in order
to alter or insert the relevant column data.
Position: Select Before, Replace or After following the action to be
performed on the reference column.
Reference column: Type in a column of reference that the
tDBOutput can use to place or replace the new or altered column.
Select this check box to display each step during processing entries
in a database.
Support null in "SQL WHERE" statement: Select this check box if you want to deal with the Null values contained in a DB table (see the example below).
Ensure that the Nullable check box is selected for the corresponding columns in the schema.
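For illustration only, assuming a key column city whose incoming values may be NULL (the table and column names are hypothetical, and the exact statement generated by the component may differ), a plain equality test such as
DELETE FROM sample_table WHERE city = ?
never matches rows in which city is NULL, whereas a null-aware condition of the following form does:
DELETE FROM sample_table WHERE (city = ? OR (city IS NULL AND ? IS NULL))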
Select this check box to collect log data at the component level.
This component offers the flexibility benefit of the DB query and covers all of the SQL queries
possible.
This component must be used as an output component. It allows you to carry out actions on a table or on the data of a table in a VectorWise database. It also allows you to create a reject flow using a Row > Rejects link to filter data in error. For an example of tMySqlOutput in use, see section Scenario 3: Retrieve data in error with a Reject link.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For
details, see the section about external modules in the Talend Installation and Upgrade Guide.
Related scenario
For tVectorWiseOutput related topics, see:
section Scenario: Writing a row to a table in the MySql database via an ODBC connection.
section Scenario 1: Adding a new column and altering data in a DB table.
1212
tVectorWiseRollback
tVectorWiseRollback
tVectorWiseRollback Properties
This component is closely related to tVectorWiseCommit and tVectorWiseConnection. It usually doesn't make much sense to use these components independently in a transaction.
Component family
Databases/VectorWise
Function
Purpose
Basic settings
Component list
Close Connection
Clear this check box to continue to use the selected connection once
the component has performed its task.
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component is to be used along with other VectorWise components, especially tVectorWiseConnection and tVectorWiseCommit.
Limitation
n/a
Related scenario
For a tVectorWiseRollback related scenario, see section Scenario: Rollback from inserting data in mother/
daughter tables.
1213
tVectorWiseRow
tVectorWiseRow
tVectorWiseRow Properties
Component family
Databases/VectorWise
Function
tVectorWiseRow is the specific component for this database query. It executes the SQL query
stated in the specified database. The row suffix means the component implements a flow in the job
design although it doesn't provide output.
Purpose
Depending on the nature of the query and the database, tVectorWiseRow acts on the actual DB
structure or on the data (although without handling data). The SQLBuilder tool helps you write
your SQL statements easily.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Select this check box and in the Component List click the relevant
connection component to reuse the connection details you already
defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
Host
Port
Database
Table Name
Query type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: Fill in the query statement manually or build it graphically
using the SQLBuilder.
Advanced settings
Guess Query
Query
Die on error
This check box is selected by default. Clear the check box to skip the
row on error and complete the process for error-free rows. If needed,
you can retrieve the rows on error via a Row > Rejects link.
Propagate QUERY's recordset: Select this check box to insert the result of the query into a COLUMN of the current flow. Select this column from the use column list.
This option allows the component to have a different schema from that of the preceding component. Moreover, the column that holds the QUERY's recordset should be set to the type of Object and this component is usually followed by tParseRecordSet.
Use PreparedStatement
Commit every
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Usage
This component offers the flexibility of the DB query and covers all possible SQL queries.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For
details, see the section about external modules in the Talend Installation and Upgrade Guide.
Related scenario
For related topics, see:
section Scenario 3: Combining two flows for selective output
section Scenario: Resetting a DB auto-increment.
section Scenario 1: Removing and regenerating a MySQL table index.
1215
tVerticaBulkExec
tVerticaBulkExec
tVerticaBulkExec Properties
The tVerticaOutputBulk and tVerticaBulkExec components are generally used together as parts of a two step
process. In the first step, an output file is generated. In the second step, this file is used in the INSERT operation
used to feed a database. These two steps are fused together in the tVerticaOutputBulkExec component, detailed
in a separate section. The advantage of using two separate components is that the data can be transformed before
it is loaded in the database.
Component family
Databases/Vertica
Function
Purpose
As a dedicated component, tVerticaBulkExec offers gains in performance while carrying out the Insert operations to a Vertica database.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Select this check box and in the Component List click the relevant
connection component to reuse the connection details you already
defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
Host
Port
Database
Action on table
Clear table: The table content is deleted. You have the possibility
to rollback the operation.
Table
Name of the table to be written. Note that only one table can be
written at a time and that the table must exist for the insert operation
to succeed.
File Name
Advanced settings
Write to ROS (Read Optimized Store): Select this check box to store the data in a physical storage area, in order to optimize the reading, as the data is compressed and pre-sorted.
Exit job if no row was loaded: The Job automatically stops if no row has been loaded.
Fields terminated by
Null string
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is
selected in the Basic settings view. Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
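As a purely conceptual sketch of what this mechanism amounts to (the Studio generates its own code, and all names below are hypothetical): a context variable carries the name of the connection component to use, and the matching open connection is looked up at run time.

import java.sql.Connection;
import java.util.HashMap;
import java.util.Map;

public class DynamicConnectionSketch {
    public static void main(String[] args) {
        // Stand-in for the pool of connections opened by the connection
        // components of the Job (component names are hypothetical).
        Map<String, Connection> openedConnections = new HashMap<>();
        // openedConnections.put("tVerticaConnection_1", prodConnection);
        // openedConnections.put("tVerticaConnection_2", testConnection);

        // Stand-in for the context variable referenced in the Code field,
        // set per environment or passed in when the Job is executed.
        String targetConnection = System.getProperty("targetConnection",
                "tVerticaConnection_1");

        Connection conn = openedConnections.get(targetConnection);
        System.out.println("Selected connection component: " + targetConnection
                + " -> " + (conn == null ? "not opened" : "ready"));
    }
}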
Usage
This component is to be used along with tVerticaOutputBulk component. Used together, they
can offer gains in performance while feeding a Vertica database.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For
details, see the section about external modules in the Talend Installation and Upgrade Guide.
Related scenarios
For related topics, see:
section Scenario: Inserting transformed data in MySQL database.
section Scenario: Inserting data in MySQL database.
section Scenario: Truncating and inserting file data into Oracle DB.
1217
tVerticaClose
tVerticaClose
tVerticaClose properties
Component family
Databases/Vertica
Function
Purpose
Close a transaction.
Basic settings
Component list
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component is to be used along with Vertica components, especially with tVerticaConnection
and tVerticaCommit.
Limitation
n/a
Related scenario
No scenario is available for this component yet.
1218
tVerticaCommit
tVerticaCommit
tVerticaCommit Properties
This component is closely related to tVerticaConnection and tVerticaRollback. It usually does not make much
sense to use these components independently in a transaction.
Component family
Databases/Vertica
Function
tVerticaCommit validates the data processed through the Job into the connected DB.
Purpose
Using a unique connection, this component commits in one go a global transaction instead of doing
that on every row or every batch and thus provides gain in performance.
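The behaviour described above corresponds roughly to the following plain JDBC pattern; the connection URL, credentials and table are hypothetical, and the component generates its own code.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class SingleCommitSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical Vertica connection settings and table.
        Connection conn = DriverManager.getConnection(
                "jdbc:vertica://localhost:5433/demo_db", "dbadmin", "password");
        conn.setAutoCommit(false);   // nothing is committed row by row
        try (PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO sample_table (id, name) VALUES (?, ?)")) {
            for (int i = 0; i < 10_000; i++) {
                ps.setInt(1, i);
                ps.setString(2, "row_" + i);
                ps.executeUpdate();
            }
        }
        conn.commit();               // one global commit for the whole transaction
        conn.close();
    }
}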
Basic settings
Component list
Close connection
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to choose your database connection dynamically from multiple connections planned in your Job. This feature is useful when you need to access database tables having the same data structure but in different databases, especially when you are working in an environment where you cannot change your Job settings, for example, when your Job has to be deployed and executed independent of Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component is to be used along with Vertica components, especially with tVerticaConnection
and tVerticaRollback components.
Limitation
n/a
Related scenario
This component is closely related to tVerticaConnection and tVerticaRollback. It usually does not make much
sense to use one of these without using a tVerticaConnection component to open a connection for the current
transaction.
For tVerticaCommit related scenario, see section tVerticaConnection
1219
tVerticaConnection
tVerticaConnection
tVerticaConnection Properties
This component is closely related to tVerticaCommit and tVerticaRollback. It usually does not make much
sense to use one of these without using a tVerticaConnection component to open a connection for the current
transaction.
Component family
Databases/Vertica
Function
Purpose
This component allows you to commit all of the Job data to an output database in just a single
transaction, once the data has been validated.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
DB Version
Select the version of Vertica you are using from the list.
Host
Port
Database
Use or register a shared DB Connection: Select this check box to share your connection or fetch a connection shared by a parent or child Job. This allows you to share one single DB connection among several DB connection components from different Job levels that can be either parent or child.
This option is incompatible with the Use dynamic job and
Use an independent process to run subjob options of the
tRunJob component. Using a shared database connection
together with a tRunJob component with either of these
two options enabled will cause your Job to fail.
Shared DB Connection Name: set or type in the shared connection
name.
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Usage
This component is to be used along with Vertica components, especially with tVerticaCommit and tVerticaRollback components.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For
details, see the section about external modules in the Talend Installation and Upgrade Guide.
Related scenario
This component is closely related to tVerticaCommit and tVerticaRollback. It usually does not make much
sense to use one of these without using a tVerticaConnection component to open a connection for the current
transaction.
tVerticaInput
tVerticaInput
tVerticaInput Properties
Component family
Databases/Vertica
Function
Purpose
tVerticaInput executes a DB query with a strictly defined order which must correspond to the
schema definition. Then it passes on the field list to the next component via a Main row link.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
DB Version
Select the version of Vertica you are using from the list.
Select this check box and in the Component List click the relevant
connection component to reuse the connection details you already
defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
Host
Port
Database
Advanced settings
Table Name
Enter your DB query, ensuring that the field order matches the order
in the schema.
Trim all the String/Char columns: Select this check box to remove leading and trailing whitespace from all the String/Char columns.
Trim column
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is
selected in the Basic settings view. Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component covers all possible SQL queries for Vertica databases.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For
details, see the section about external modules in the Talend Installation and Upgrade Guide.
Related scenarios
For related scenarios, see:
section Scenario 1: Displaying selected data from DB table.
section Scenario 2: Using StoreSQLQuery variable.
section Scenario: Reading data from different MySQL databases using dynamically loaded connection
parameters.
1223
tVerticaOutput
tVerticaOutput
tVerticaOutput Properties
Component family
Databases/Vertica
Function
Purpose
tVerticaOutput executes the action defined on the table and/or on the data contained in the table,
based on the flow incoming from the preceding component in the job.
Basic settings
Property type
.
If you are using Talend Open Studio for Big Data, only the Built-in
mode is available.
Built-in: No property data stored centrally.
DB Version
Select the version of Vertica you are using from the list.
Use an existing connection Select this check box and in the Component List click the relevant
connection component to reuse the connection details you already
defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
Host
Port
Database
Table
Name of the table to be written. Note that only one table can be
written at a time.
Action on table
Action on data
Insert: Add new entries to the table. If duplicates are found, job stops.
Update: Make changes to existing entries
Insert or update: inserts a new record. If the record with the given
reference already exists, an update would be made.
Update or insert: updates the record with the given reference. If the
record does not exist, a new record would be inserted.
Delete: Remove entries corresponding to the input flow.
Copy: Read data from a text file and insert the entries into the WOS (Write Optimized Store) or directly into the ROS (Read Optimized Store). This option is ideal for bulk loading; a minimal sketch is given after this list. For further information, see your Vertica SQL Reference Manual.
It is necessary to specify at least one column as a primary
key on which the Update and Delete operations are based.
You can do that by clicking Edit Schema and selecting
the check box(es) next to the column(s) you want to set as
primary key(s). For an advanced use, click the Advanced
settings view where you can simultaneously define primary
keys for the Update and Delete operations. To do that:
Select the Use field options check box and then in the
Key in update column, select the check boxes next to the
column names you want to use as a base for the Update
operation. Do the same in the Key in delete column for the
Delete operation.
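For illustration, the Copy action corresponds roughly to executing a Vertica COPY statement such as the one below; the table, file paths and connection settings are hypothetical, and the exact statement built by the component may differ.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class VerticaCopySketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical Vertica connection settings.
        Connection conn = DriverManager.getConnection(
                "jdbc:vertica://localhost:5433/demo_db", "dbadmin", "password");
        try (Statement stmt = conn.createStatement()) {
            // DIRECT writes straight into the ROS (Read Optimized Store);
            // without it, data first goes through the WOS (Write Optimized Store).
            stmt.execute(
                "COPY sample_table FROM '/tmp/sample.csv'"
                + " DELIMITER ',' NULL ''"
                + " EXCEPTIONS '/tmp/sample_exceptions.log'"
                + " REJECTED DATA '/tmp/sample_rejected.txt'"
                + " DIRECT");
        }
        conn.close();
    }
}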
Schema and Edit schema
Die on error
This check box is selected by default. Clear the check box to skip the
row on error and complete the process for error-free rows. If needed,
you can retrieve the rows on error via a Row > Rejects link.
Advanced settings
Commit every
Copy parameters
Abort on error
Select this check box to stop the Copy operation if a row is rejected and to roll back the operation. In that case, no data is loaded.
Maximum rejects
No commit
Exception file
Type in the path to, or browse to the file in which messages are
written indicating the input line number and the reason for each
rejected data record.
Type in the path to, or browse to the file in which to write rejected
rows. This file can then be edited to resolve problems and reloaded.
Type in the node of the rejected data file. If not specified, operations
default to the querys initiator node.
Select this check box to activate the batch mode for data processing.
In the Batch Size field that appears when this check box is selected,
you can type in the number you need to define the batch size to be
processed.
This check box is available only when you have selected
the Insert, the Update, the Delete or the Copy option in
the Action on data field.
Additional Columns
This option is not offered if you create (with or without drop) the DB table. This option allows you to call SQL functions to perform actions on columns other than insert, update or delete, or actions that require particular preprocessing.
Name: Type in the name of the schema column to be altered or
inserted as new column
SQL expression: Type in the SQL statement to be executed in order
to alter or insert the relevant column data.
Position: Select Before, Replace or After following the action to be
performed on the reference column.
Reference column: Type in a column of reference that the
tDBOutput can use to place or replace the new or altered column.
Select this check box to display each step during processing entries
in a database.
Support null in "SQL WHERE" statement: Select this check box to allow for the Null value in the "SQL WHERE" statement.
Create projection when create table: Select this check box to create a projection for a table to be created.
This check box is available only when you have selected the table creation related option in the Action on table field.
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is
selected in the Basic settings view. Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component offers the flexibility benefit of the DB query and covers all of the SQL queries
possible.
This component must be used as an output component. It allows you to carry out actions on a table
or on the data of a table in a Vertica database. It also allows you to create a reject flow using a
Row > Rejects link to filter data in error. For an example of tMySqlOutput in use, see section
Scenario 3: Retrieve data in error with a Reject link.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For
details, see the section about external modules in the Talend Installation and Upgrade Guide.
Related scenarios
For tVerticaOutput related topics, see:
section Scenario: Writing a row to a table in the MySql database via an ODBC connection
section Scenario 1: Adding a new column and altering data in a DB table.
1227
tVerticaOutputBulk
tVerticaOutputBulk
tVerticaOutputBulk Properties
The tVerticaOutputBulk and tVerticaBulkExec components are generally used together as parts of a two step
process. In the first step, an output file is generated. In the second step, this file is used in the INSERT operation
used to feed a database. These two steps are fused together in the tVerticaOutputBulkExec component, detailed
in a separate section. The advantage of using two separate components is that the data can be transformed before
it is loaded in the database.
Component family
Databases/Vertica
Function
tVerticaOutputBulk writes a file with columns based on the defined delimiter and the Vertica standards.
Purpose
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
File Name
Append
Select this check box to add the new rows at the end of the file.
Advanced settings
Row separator
Field separator
Include header
Select this check box to include the column header to the file.
Encoding
Select the encoding from the list or select Custom and define it
manually. This field is compulsory for DB data handling.
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Usage
This component is to be used along with the tVerticaBulkExec component. Used together, they offer gains in performance while feeding a Vertica database.
Related scenarios
For use cases in relation with tVerticaOutputBulk, see the following scenarios:
section Scenario: Inserting transformed data in MySQL database.
section Scenario: Inserting data in MySQL database.
tVerticaOutputBulkExec
tVerticaOutputBulkExec
tVerticaOutputBulkExec Properties
The tVerticaOutputBulk and tVerticaBulkExec components are generally used together as parts of a two step
process. In the first step, an output file is generated. In the second step, this file is used in the INSERT operation
used to feed a database. These two steps are fused together in the tVerticaOutputBulkExec component.
Component family
Databases/Vertica
Function
Purpose
Basic settings
Property Type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
DB Version
Select the version of Vertica you are using from the list.
Select this check box and in the Component List click the relevant
connection component to reuse the connection details you already
defined.
Host
Port
DB Name
Table
Name of the table to be written. Note that only one table can be
written at a time and that the table must exist for the insert operation
to succeed.
Action on table
File Name
Append: Select this check box to add the new rows at the end of the file.
Write to ROS (Read Optimized Store): Select this check box to store the data in a physical storage area, in order to optimize the reading, as the data is compressed and pre-sorted.
Exit job if no row was loaded: The Job automatically stops if no row has been loaded.
Field Separator
Null string
Include header
Select this check box to include the column header to the file.
Encoding
Select the encoding from the list or select Custom and define it
manually. This field is compulsory for DB data handling.
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Usage
This component is mainly used when no particular transformation is required on the data to be loaded into the database.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For
details, see the section about external modules in the Talend Installation and Upgrade Guide.
Related scenarios
For use cases in relation with tVerticaOutputBulkExec, see the following scenarios:
section Scenario: Inserting transformed data in MySQL database.
section Scenario: Inserting data in MySQL database.
1231
tVerticaRollback
tVerticaRollback
tVerticaRollback Properties
This component is closely related to tVerticaCommit and tVerticaConnection. It usually does not make much
sense to use these components independently in a transaction.
Component family
Databases/Vertica
Function
Purpose
Basic settings
Component list
Close Connection
Clear this check box to continue to use the selected connection once
the component has performed its task.
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component is to be used along with Vertica components, especially with tVerticaConnection
and tVerticaCommit components.
Limitation
n/a
Related scenario
For tVerticaRollback related scenario, see section Scenario: Rollback from inserting data in mother/daughter
tables.
1232
tVerticaRow
tVerticaRow
tVerticaRow Properties
Component family
Databases/Vertica
Function
tVerticaRow is the specific component for this database query. It executes the SQL query stated
onto the specified database. The row suffix means the component implements a flow in the job
design although it does not provide output.
Purpose
Depending on the nature of the query and the database, tVerticaRow acts on the actual DB structure or on the data (although without handling data). The SQLBuilder tool helps you write your SQL statements easily.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
DB Version
Select the version of Vertica you are using from the list.
Select this check box and in the Component List click the relevant
connection component to reuse the connection details you already
defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
Port
Database
Table name
Query type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: Fill in the query statement manually or build it graphically
using the SQLBuilder.
Advanced settings
Query
Die on error
This check box is selected by default. Clear the check box to skip the
row on error and complete the process for error-free rows. If needed,
you can retrieve the rows on error via a Row > Rejects link.
Propagate QUERY's recordset: Select this check box to insert the result of the query into a COLUMN of the current flow. Select this column from the use column list.
This option allows the component to have a different schema from that of the preceding component. Moreover, the column that holds the QUERY's recordset should be set to the type of Object and this component is usually followed by tParseRecordSet.
Use PreparedStatement
Select this check box if you want to query the database using
a PreparedStatement. In the Set PreparedStatement Parameter
table, define the parameters represented by ? in the SQL instruction
of the Query field in the Basic Settings tab.
Parameter Index: Enter the parameter position in the SQL
instruction.
Parameter Type: Enter the parameter type.
Parameter Value: Enter the parameter value.
This option is very useful if you need to execute the same query several times, as performance levels are increased (see the sketch below).
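As an illustration of how the Set PreparedStatement Parameter table maps onto a prepared statement, here is a minimal JDBC sketch; the query, table, column names and connection settings are hypothetical.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class PreparedStatementSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical Vertica connection settings.
        Connection conn = DriverManager.getConnection(
                "jdbc:vertica://localhost:5433/demo_db", "dbadmin", "password");
        // Each "?" placeholder corresponds to one row of the
        // Set PreparedStatement Parameter table (index, type, value).
        try (PreparedStatement ps = conn.prepareStatement(
                "SELECT name FROM sample_table WHERE id = ? AND city = ?")) {
            ps.setInt(1, 42);          // Parameter Index 1, type Int, value 42
            ps.setString(2, "Paris");  // Parameter Index 2, type String, value "Paris"
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString("name"));
                }
            }
        }
        conn.close();
    }
}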
Commit every
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is
selected in the Basic settings view. Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component offers the flexibility of the DB query and covers all possible SQL queries.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For
details, see the section about external modules in the Talend Installation and Upgrade Guide.
Related scenario
For related topics, see:
section Scenario 3: Combining two flows for selective output
section Scenario: Resetting a DB auto-increment.
section Scenario 1: Removing and regenerating a MySQL table index.
1234
tCassandraBulkExec
tCassandraBulkExec
tCassandraBulkExec belongs to two component families: Big Data and Databases. For more information about
tCassandraBulkExec, see section tCassandraBulkExec.
1236
tCassandraClose
tCassandraClose
tCassandraClose belongs to two component families: Big Data and Databases. For more information about
tCassandraClose, see section tCassandraClose.
1237
tCassandraConnection
tCassandraConnection
tCassandraConnection belongs to two component families: Big Data and Databases. For more information about
tCassandraConnection, see section tCassandraConnection.
1238
tCassandraInput
tCassandraInput
tCassandraInput belongs to two component families: Big Data and Databases. For more information about
tCassandraInput, see section tCassandraInput.
1239
tCassandraOutput
tCassandraOutput
tCassandraOutput belongs to two component families: Big Data and Databases. For more information about
tCassandraOutput, see section tCassandraOutput.
1240
tCassandraOutputBulk
tCassandraOutputBulk
tCassandraOutputBulk belongs to two component families: Big Data and Databases. For more information about
tCassandraOutputBulk, see section tCassandraOutputBulk.
1241
tCassandraOutputBulkExec
tCassandraOutputBulkExec
tCassandraOutputBulkExec belongs to two component families: Big Data and Databases. For more information
about tCassandraOutputBulkExec, see section tCassandraOutputBulkExec.
1242
tCassandraRow
tCassandraRow
tCassandraRow belongs to two component families: Big Data and Databases. For more information about
tCassandraRow, see section tCassandraRow.
1243
tCouchbaseClose
tCouchbaseClose
tCouchbaseClose belongs to two component families: Big Data and Databases. For more information about
tCouchbaseClose, see section tCouchbaseClose.
1244
tCouchbaseConnection
tCouchbaseConnection
tCouchbaseConnection belongs to two component families: Big Data and Databases. For more information about
tCouchbaseConnection, see section tCouchbaseConnection.
1245
tCouchbaseInput
tCouchbaseInput
tCouchbaseInput belongs to two component families: Big Data and Databases. For more information about
tCouchbaseInput, see section tCouchbaseInput.
1246
tCouchbaseOutput
tCouchbaseOutput
tCouchbaseOutput belongs to two component families: Big Data and Databases. For more information about
tCouchbaseOutput, see section tCouchbaseOutput.
1247
tCouchDBClose
tCouchDBClose
tCouchDBClose belongs to two component families: Big Data and Databases. For more information about
tCouchDBClose, see section tCouchDBClose.
1248
tCouchDBConnection
tCouchDBConnection
tCouchDBConnection belongs to two component families: Big Data and Databases. For more information about
tCouchDBConnection, see section tCouchDBConnection.
1249
tCouchDBInput
tCouchDBInput
tCouchDBInput belongs to two component families: Big Data and Databases. For more information about
tCouchDBInput, see section tCouchDBInput.
1250
tCouchDBOutput
tCouchDBOutput
tCouchDBOutput belongs to two component families: Big Data and Databases. For more information about
tCouchDBOutput, see section tCouchDBOutput.
1251
tCreateTable
tCreateTable
tCreateTable Properties
Component family
Databases
Function
tCreateTable creates, drops and re-creates, or clears the specified table.
Purpose
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Database Type
Select the DBMS type from the list. The component properties may
differ slightly according to the database type selected from the list.
DB Version
Table Action
Use an existing connection Select this check box and in the Component List click the relevant
connection component to reuse the connection details you already
defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
Host
Port
Database name
Schema
Table name
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: You create and store the schema locally for this component
only. Related topic: see Talend Studio User Guide.
Advanced settings
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at a Job level as well as at each component level.
Additional Parameters
Create projection
Usage
This component offers the flexibility of the database query and covers all possible SQL queries.
More scenarios are available for specific database Input components.
Limitation
This component requires installation of its related jar files. For more information about the
installation of these missing jar files, see the section describing how to configure the Studio of
the Talend Installation and Upgrade Guide.
Database-specific fields:
Access
Access File
Firebird
Firebird File
HSQLDb
Running Mode
DB Alias
Case Sensitive
Interbase
Interbase File
JavaDb
Framework Type
Structure type
DB Root Path
Mysql
Temporary table
ODBC
ODBC Name
Oracle
Connection Type
SQLite
SQLite File
1253
Create
1.
Drop a tCreateTable component from the Databases family in the Palette to the design workspace.
2.
In the Basic settings view, and from the Database Type list, select Mysql for this scenario.
3.
4.
Select the Use an existing connection check box only if you are using a dedicated DB connection component (see section tMysqlConnection). In this example, we won't use this option.
5.
6.
In the Table Name field, fill in a name for the table to be created.
7.
8.
Click the Reset DB Types button in case the DB type column is empty or shows discrepancies (marked in
orange). This allows you to map any data type to the relevant DB data type. Then, click OK to validate your
changes and close the dialog box.
9.
The table is created empty but with all columns defined in the Schema.
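The exact DDL depends on the schema you defined. For a simple two-column schema (the table and column names below are hypothetical), the statement generated for Mysql would be equivalent to something like:
CREATE TABLE example_table (id INT NOT NULL, name VARCHAR(255), PRIMARY KEY (id));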
tDBInput
tDBInput
tDBInput properties
Component family
Databases/DB Generic
Function
Purpose
tDBInput executes a DB query with a strictly defined order which must correspond to the schema
definition. Then it passes on the field list to the next component via a Main row link.
For performance reasons, a specific Input component (e.g.: tMySQLInput for MySQL
database) should always be preferred to the generic component.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Database
Table Name
Query type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: Fill in the query statement manually or build it graphically using SQLBuilder.
Query
Advanced settings
Additional
parameters
Trim all the String/Char columns: Select this check box to remove leading and trailing whitespace from all the String/Char columns.
Trim column
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Usage
This component offers the flexibility of the DB query and covers all possible SQL queries using
a generic ODBC connection.
1.
Drop a tDBInput and tLogRow component from the Palette to the design workspace.
2.
3.
Double-click tDBInput to open its Basic settings view in the Component tab.
4.
Fill in the database name, the username and password in the corresponding fields.
5.
Click Edit Schema and create a 2-column description including shop code and sales.
6.
7.
Type in the query making sure it includes all columns in the same order as defined in the Schema. In this case, as we will select all columns of the schema, the asterisk symbol makes sense (see the example query after this scenario).
8.
9.
10. Now go to the Run tab, and click on Run to execute the Job.
The DB is parsed and queried data is extracted from the specified table and passed on to the job log console.
You can view the output file straight on the console.
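For instance, with the two-column schema defined in step 5, either of the following queries keeps the field order aligned with the schema (the table name shop_sales and the exact column names shop_code and sales are hypothetical):
SELECT shop_code, sales FROM shop_sales
SELECT * FROM shop_sales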
Use the same scenario as scenario 1 above and add a third component, tJava.
2.
Connect tDBInput to tJava using a trigger connection of the OnComponentOk type. In this case, we want
the tDBInput to run before the tJava component.
3.
4.
Click anywhere on the design workspace to display the Contexts property panel.
5.
Create a new parameter called explicitly StoreSQLQuery. Enter a default value of 1. This value of 1 means that StoreSQLQuery is set to true, so that the QUERY global variable can be used.
6.
Click the tJava component to display the Component view. Enter the System.out.println() command to display the query content; press Ctrl+Space to access the list of available variables.
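As an illustration, assuming the input component is named tDBInput_1 (the component name, and hence the exact global variable key, depends on your Job), the code entered in the tJava component could be:
System.out.println(globalMap.get("tDBInput_1_QUERY"));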
7.
8.
The query entered in the tDBInput component shows at the end of the job results, on the log:
1259
tDBOutput
tDBOutput
tDBOutput properties
Component family
Databases/DB Generic
Function
Purpose
tDBOutput executes the action defined on the data in a table, based on the flow incoming from
the preceding component in the Job.
A specific Output component should always be preferred to the generic component.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Database
Table
Name of the table to be written. Note that only one table can be
written at a time
Action on data
Select this check box to delete data in the selected table before any
operation.
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: You create and store the schema locally for this component
only. Related topic: see Talend Studio User Guide.
Die on error
This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.
Advanced settings
Additional JDBC parameters
Specify additional connection properties for the database connection you are creating.
You can set the encoding parameters through this field.
Commit every
Additional Columns
This option is not offered if you create (with or without drop) the DB table. This option allows you to call SQL functions to perform actions on columns other than insert, update or delete, or actions that require particular preprocessing.
Name: Type in the name of the schema column to be altered or
inserted as new column
SQL expression: Type in the SQL statement to be executed in order
to alter or insert the relevant column data.
Position: Select Before, Replace or After depending on the action
to be performed on the reference column.
Reference column: Type in a column of reference that the
tDBOutput can use to place or replace the new or altered column.
Select this check box to display each step during processing entries in a database.
Use java.sql.Statement
Select this check box to use the Statement object in case the
PreparedStatement object is not supported by certain DBMSs.
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Usage
This component offers the flexibility benefit of the DB query and covers all of the SQL queries possible.
This component must be used as an output component. It allows you to carry out actions on the
data of a table in a database. It also allows you to create a reject flow using a Row > Rejects link
to filter data in error. For a related scenario, see section Scenario 3: Retrieve data in error with
a Reject link.
Scenario: Writing a row to a table in the MySql database via an ODBC connection
1.
Drop tDBOutput and tRowGenerator from the Palette to the design workspace.
2.
3.
4.
5.
Double-click tDBOutput to open its Basic settings view in the Component tab.
6.
In the Database field, enter the name of the data source defined during the configuration of the MySql ODBC
connection.
To configure an ODBC connection, click
7.
In the Username and Password fields, enter the database authentication credentials.
8.
In the Table field, enter the table name, Date in this example.
9.
In the Action on data field, select Insert to insert a line to the table.
10. Select the check box Clear data in table to clear the table before the insertion.
11. Save the Job and press F6 to run.
As shown above, the table now has only one line about the current date and time.
1263
tDBSQLRow
tDBSQLRow
tDBSQLRow properties
Component family
Databases/DB Generic
Function
tDBSQLRow is the generic component for database query. It executes the SQL query stated onto
the specified database. The row suffix means the component implements a flow in the job design
although it does not provide output.
For performance reasons, specific DB component should always be preferred to the
generic component.
Purpose
Depending on the nature of the query and the database, tDBSQLRow acts on the actual DB
structure or on the data (although without handling data). The SQLBuilder tool helps you write
easily your SQL statements.
To use this component, relevant DBMSs' ODBC drivers should be installed and the
corresponding ODBC connections should be configured via the database connection
configuration wizard.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Datasource
Table Name
Query type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: Fill in the query statement manually or build it graphically using SQLBuilder.
Advanced settings
Query
Die on error
Additional parameters
Propagate QUERY's recordset: Select this check box to insert the result of the query into a COLUMN of the current flow. Select this column from the use column list.
Use PreparedStatement
Select this check box if you want to query the database using
a PreparedStatement. In the Set PreparedStatement Parameter
table, define the parameters represented by ? in the SQL instruction
of the Query field in the Basic Settings tab.
Parameter Index: Enter the parameter position in the SQL
instruction.
Parameter Type: Enter the parameter type.
Parameter Value: Enter the parameter value.
This option is very useful if you need to execute the same query several times, as performance levels are increased.
Commit every
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Usage
This component offers the flexibility of the DB query and covers all possible SQL queries.
Note that the relevant DBRow component should be preferred according to your DBMSs. Most of
the DBMSs have their specific DBRow components.
1.
Drag and drop a tDBSQLRow component from the Palette to the design workspace.
2.
For Talend Open Studio for Big Data, the Property type, Schema and Query Type of components are always Built-in. For further information about how to edit a Built-in schema, see Talend Studio User Guide.
3.
4.
The Schema is built-in for this Job and it does not really matter in this example as the action is made on the
table auto-increment and not on data.
5.
The Query type is also built-in. Click the [...] button next to the Query statement box to launch the SQLBuilder editor, or type the statement directly in the box:
Alter table <TableName> auto_increment = 1
6.
1266
tEXAInput
tEXAInput
tEXAInput properties
Component family
Databases/EXA
Function
Purpose
tEXAInput executes queries in databases according to a strict order which must correspond exactly
to that defined in the schema. The list of fields retrieved is then transmitted to the following
component via a Main row link.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No properties stored centrally
Host name
Port
Schema name
Username and Password
Advanced settings
Table Name
Enter your database query, taking care to ensure that the order of the
fields corresponds exactly to that defined in the schema.
Guess Query
Guess schema
Additional parameters
Trim all the String/Char columns: Select this check box to delete the spaces at the start and end of fields in all of the columns containing strings.
Trim column
Deletes the spaces from the start and end of fields in the selected columns.
tStatCatcher Statistics
Select this check box to collect the log data at a component level.
Usage
This component covers all possible SQL queries for EXA databases.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For
details, see the section about external modules in the Talend Installation and Upgrade Guide.
Related scenarios
For scenarios in which tEXAInput might be used, see the following tDBInput scenarios:
tEXAOutput
tEXAOutput
tEXAOutput properties
Component family
Databases/EXA
Function
Purpose
tEXAOutput executes the action defined on the table and/or on the table data, based on the flow incoming from the preceding component in the Job.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No properties stored centrally.
Host
Port
Schema name
Table
Name of the table to be created. You can only create one table at a
time.
Action on table
Action on data
It is necessary to specify at least one column as a primary key on which the Update and Delete operations are based. You can do that by clicking Edit Schema and selecting the check box(es) next to the column(s) you want to set as primary key(s). For an advanced use, click the Advanced settings view where you can simultaneously define primary keys for the update and delete operations. To do that: Select the Use field options check box and then in the Key in update column, select the check boxes next to the column name on which you want to base the update operation. Do the same in the Key in delete column for the deletion operation.
Schema and Edit schema
Die on error
This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.
Advanced settings
Commit
Select this check box to display the Commit every field, in which you can define the number of rows to be processed before committing.
Additional JDBC parameters
Additional Columns
This option is not offered if you create (with or without drop) the DB table. This option allows you to call SQL functions to perform actions on columns other than insert, update or delete, or actions that require particular preprocessing.
Name: Enter the name of the column to be modified or inserted.
SQL expression: Enter the SQL expression to be executed to modify
or insert data in the corresponding columns.
Position: Select Before, Replace or After, depending on the action to be carried out on the reference column.
Reference column: Type in a column of reference that the
tDBOutput can use to place or replace the new or altered column.
Select this check box to display each step of the process by which the data is written in the database.
tStatCatcher Statistics
Select this check box to collect the log data at a component level.
Usage
This component offers the flexibility benefit of the DB query and covers all of the SQL queries
possible.
This component must be used as an output component. It allows you to carry out actions on a table
or on the data of a table in an EXA database. It also allows you to create a reject flow using a Row
> Rejects link to filter data in error. For a user scenario, see section Scenario 3: Retrieve data in
error with a Reject link.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For
details, see the section about external modules in the Talend Installation and Upgrade Guide.
Related scenario
For a scenario in which tEXAOutput might be used, see:
section Scenario: Writing a row to a table in the MySql database via an ODBC connection.
section Scenario 1: Adding a new column and altering data in a DB table.
tEXARow
tEXARow properties
Component family
Databases/EXA
Function
The tEXARow component is specific to this type of database. It executes SQL queries on specified
databases. The Row suffix indicates that it is used to channel a flow in a Job although it does not
produce any output data.
Purpose
Depending on the nature of the query and the database, tEXARow acts on the actual structure of
the database, or indeed the data, although without modifying them.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No properties stored centrally.
Host
Port
Schema name
Table Name
Query type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: Enter the query manually or build it with the help of SQLBuilder.
Guess Query
Click the Guess Query button to generate the query that corresponds
to the table schema in the Query field.
Query
Enter your query, taking care to ensure that the field order matches
that defined in the schema.
Die on error
This check box is selected by default. Clear the check box to skip the
row on error and complete the process for error-free rows. If needed,
you can retrieve the rows on error via a Row > Rejects link.
Advanced settings
Additional JDBC parameters
Propagate QUERY's recordset
Commit every
tStatCatcher Statistics
Select this check box to collect the log data at a component level.
Usage
This component offers query flexibility as it covers all possible SQL query requirements.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For
details, see the section about external modules in the Talend Installation and Upgrade Guide.
Related scenarios
For a scenario in which tEXARow might be used, see:
section Scenario: Resetting a DB auto-increment
section Scenario 1: Removing and regenerating a MySQL table index
tEXistConnection
tEXistConnection properties
This component is closely related to tEXistGet and tEXistPut. Once you have set the connection properties in
this component, you have the option of reusing the connection without having to set the properties again for each
tEXist component used in the Job.
Component family
Databases/eXist
Function
Purpose
Opens a connection to an eXist database in order that a transaction may be carried out.
Basic settings
URI
Collection
Driver
Advanced settings
tStatCatcher Statistics
Select this check box to gather the job processing metadata at a Job
level as well as at each component level.
Usage
This component is to be used along with the other tEXist components such as tEXistGet and
tEXistPut.
eXist-db is an open source database management system built using XML technology. It
stores XML data according to the XML data model and features efficient, index-based XQuery
processing.
For further information about XQuery, see XQuery.
For further information about the XQuery update extension, see XQuery update extension.
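By way of illustration, the following sketch shows what opening such a connection looks like with the standard XML:DB API, on which the tEXist components rely. The URI, collection path and credentials are placeholders, and the eXist driver org.exist.xmldb.DatabaseImpl is assumed to be on the classpath.

import org.xmldb.api.DatabaseManager;
import org.xmldb.api.base.Collection;
import org.xmldb.api.base.Database;

public class EXistConnectionSketch {
    public static void main(String[] args) throws Exception {
        // Register the eXist driver (the Driver field of the component).
        Database database = (Database) Class.forName("org.exist.xmldb.DatabaseImpl").newInstance();
        DatabaseManager.registerDatabase(database);
        // Open the collection (the URI and Collection fields); credentials are placeholders.
        Collection collection = DatabaseManager.getCollection(
                "xmldb:exist://localhost:8080/exist/xmlrpc/db/talend", "admin", "talend");
        try {
            System.out.println("Connected to collection: " + collection.getName());
        } finally {
            collection.close();
        }
    }
}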
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For
details, see the section about external modules in the Talend Installation and Upgrade Guide.
Related scenarios
This component is closely related to tEXistGet and tEXistPut. It usually does not make much sense to use one
of these without using a tEXistConnection component to open a connection for the current transaction.
For a tEXistConnection related scenario, see section tMysqlConnection.
tEXistDelete
tEXistDelete properties
Component family
Databases/eXist
Function
Purpose
Basic settings
Use an existing connection / Component List
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child
Job, Component list presents only the connection
components in the same Job level.
URI
Collection
Driver
Target Type
Files
Click the plus button to add the lines you want to use as filters:
Filemask: enter the filename or filemask using wildcard characters (*) or regular expressions.
Advanced settings
tStatCatcher Statistics
Select this check box to gather the job processing metadata at a job
level as well as at each component level.
Usage
This component is typically used as a single component sub-job but can also be used as an output or end object. eXist-db is an open source database management system built using XML technology. It stores XML data according to the XML data model and features efficient, index-based XQuery processing.
For further information about XQuery, see XQuery.
For further information about the XQuery update extension, see XQuery update extension.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not
provided. You can easily find out and add such JARs in the Integration perspective of your
studio. For details, see the section about external modules in the Talend Installation and Upgrade
Guide.
Related scenario
No scenario is available for this component yet.
tEXistGet
tEXistGet properties
Component family
Databases/eXist
Function
Purpose
tEXistGet downloads selected resources from a remote DB server to a defined local directory.
Basic settings
Use an existing connection / Component List
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child
Job, Component list presents only the connection
components in the same Job level.
URI
Collection
Driver
Local directory
Files
Click the plus button to add the lines you want to use as filters:
Filemask: enter the filename or filemask using wildcard characters (*) or regular expressions.
Advanced settings
tStatCatcher Statistics
Select this check box to gather the job processing metadata at a job
level as well as at each component level.
Usage
This component is typically used as a single component sub-job but can also be used as an output or end object. eXist-db is an open source database management system built using XML technology. It stores XML data according to the XML data model and features efficient, index-based XQuery processing.
For further information about XQuery, see XQuery.
For further information about the XQuery update extension, see XQuery update extension.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not
provided. You can easily find out and add such JARs in the Integration perspective of your
studio. For details, see the section about external modules in the Talend Installation and Upgrade
Guide.
1. Drop the tEXistGet component from the Palette into the design workspace.
2. Double-click the tEXistGet component to open the Component view and define the properties in its Basic settings view.
3. Fill in the URI field with the URI of the eXist database you want to connect to.
In this scenario, the URI is xmldb:exist://192.168.0.165:8080/exist/xmlrpc. Note that the URI used in this use case is for demonstration purposes only and is not an active address.
4. Fill in the Collection field with the path to the collection of interest on the database server, /db/talend in this scenario.
5. Fill in the Driver field with the driver for the XML database, org.exist.xmldb.DatabaseImpl in this scenario.
6. Fill in the Username and Password fields by typing in admin and talend respectively in this scenario.
7. Click the three-dot button next to the Local directory field to set a path for saving the XML file downloaded from the remote database server.
In this scenario, set the path to your desktop, for example C:/Documents and Settings/galano/Desktop/ExistGet.
8. In the Files field, click the plus button to add a new line in the Filemask area, and fill it with a complete file name to retrieve data from a particular file on the server, or a filemask to retrieve data from a set of files.
In this scenario, fill in dictionary_en.xml.
9.
The XML file dictionary_en.xml is retrieved and downloaded to the defined local directory.
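A rough equivalent of this Job in plain XML:DB API code, using the values from the scenario, might look like the sketch below. The driver registration is the same as in the tEXistConnection sketch, and the local output path is a placeholder whose directory must already exist.

import java.io.FileWriter;
import org.xmldb.api.DatabaseManager;
import org.xmldb.api.base.Collection;
import org.xmldb.api.modules.XMLResource;

public class EXistGetSketch {
    public static void main(String[] args) throws Exception {
        DatabaseManager.registerDatabase(
                (org.xmldb.api.base.Database) Class.forName("org.exist.xmldb.DatabaseImpl").newInstance());
        // URI, Collection and credentials as defined in the scenario.
        Collection col = DatabaseManager.getCollection(
                "xmldb:exist://192.168.0.165:8080/exist/xmlrpc/db/talend", "admin", "talend");
        try {
            // Filemask: retrieve one particular resource.
            XMLResource res = (XMLResource) col.getResource("dictionary_en.xml");
            try (FileWriter out = new FileWriter("ExistGet/dictionary_en.xml")) { // Local directory (placeholder)
                out.write((String) res.getContent()); // XMLResource content is returned as a String
            }
        } finally {
            col.close();
        }
    }
}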
tEXistList
tEXistList properties
Component family
Databases/eXist
Function
Purpose
Basic settings
Use an existing connection / Component List
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child
Job, Component list presents only the connection
components in the same Job level.
URI
Collection
Driver
Files
Click the plus button to add the lines you want to use as filters:
Filemask: enter the filename or filemask using wildcard characters (*) or regular expressions.
Target Type
Advanced settings
tStatCatcher Statistics
Select this check box to gather the job processing metadata at a job
level as well as at each component level.
Usage
This component is typically used along with a tEXistGetcomponent to retrieve the files listed,
for example.
eXist-db is an open source database management system built using XML technology. It
stores XML data according to the XML data model and features efficient, index-based XQuery
processing.
For further information about XQuery, see XQuery.
For further information about the XQuery update extension, see XQuery update extension.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not
provided. You can easily find out and add such JARs in the Integration perspective of your
studio. For details, see the section about external modules in the Talend Installation and Upgrade
Guide.
Related scenario
No scenario is available for this component yet.
tEXistPut
tEXistPut properties
Component family
Databases/eXist
Function
Purpose
tEXistPut uploads specified files from a defined local directory to a remote DB server.
Basic settings
Use an existing connection / Component List
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child
Job, Component list presents only the connection
components in the same Job level.
URI
Collection
Driver
Local directory
Files
Click the plus button to add the lines you want to use as filters:
Filemask: enter the filename or filemask using wildcard characters (*) or regular expressions.
Advanced settings
tStatCatcher Statistics
Select this check box to gather the job processing metadata at a job
level as well as at each component level.
Usage
This component is typically used as a single component sub-job but can also be used as an output
or end object.
eXist-db is an open source database management system built using XML technology. It
stores XML data according to the XML data model and features efficient, index-based XQuery
processing.
For further information about XQuery, see XQuery.
For further information about the XQuery update extension, see XQuery update extension.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not
provided. You can easily find out and add such JARs in the Integration perspective of your
studio. For details, see the section about external modules in the Talend Installation and Upgrade
Guide.
Related scenario
No scenario is available for this component yet.
tEXistXQuery
tEXistXQuery properties
Component family
Databases/eXist
Function
This component uses local files containing XPath queries to query XML files stored on remote
databases.
Purpose
tEXistXQuery queries XML files located on remote databases and outputs the results to an
XML file stored locally.
Basic settings
Use an existing connection / Component List
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child
Job, Component list presents only the connection
components in the same Job level.
URI
Collection
Driver
Local Output
Advanced settings
tStatCatcher Statistics
Select this check box to gather the job processing metadata at a job
level as well as at each component level.
Usage
This component is typically used as a single component Job but can also be used as part of a
more complex Job.
eXist-db is an open source database management system built using XML technology. It
stores XML data according to the XML data model and features efficient, index-based XQuery
processing.
For further information about XQuery, see XQuery.
For further information about the XQuery update extension, see XQuery update extension.
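To give an idea of the underlying mechanism, the following sketch runs an XPath expression against a collection through the standard XML:DB XPathQueryService and prints each result. The URI, credentials and the query itself are placeholders; driver registration is as in the tEXistConnection sketch above.

import org.xmldb.api.DatabaseManager;
import org.xmldb.api.base.Collection;
import org.xmldb.api.base.Resource;
import org.xmldb.api.base.ResourceIterator;
import org.xmldb.api.base.ResourceSet;
import org.xmldb.api.modules.XPathQueryService;

public class EXistXQuerySketch {
    public static void main(String[] args) throws Exception {
        DatabaseManager.registerDatabase(
                (org.xmldb.api.base.Database) Class.forName("org.exist.xmldb.DatabaseImpl").newInstance());
        Collection col = DatabaseManager.getCollection(
                "xmldb:exist://localhost:8080/exist/xmlrpc/db/talend", "admin", "talend");
        try {
            XPathQueryService service =
                    (XPathQueryService) col.getService("XPathQueryService", "1.0");
            ResourceSet results = service.query("//entry[@lang='en']"); // placeholder query
            for (ResourceIterator it = results.getIterator(); it.hasMoreResources(); ) {
                Resource r = it.nextResource();
                System.out.println(r.getContent()); // each hit, serialized as a String
            }
        } finally {
            col.close();
        }
    }
}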
Limitation
Due to license incompatibility, one or more JARs required to use this component are not
provided. You can easily find out and add such JARs in the Integration perspective of your
studio. For details, see the section about external modules in the Talend Installation and Upgrade
Guide.
Related scenario
No scenario is available for this component yet.
tEXistXUpdate
tEXistXUpdate properties
Component family
Databases/eXist
Function
This component processes XML file records and updates the records on the DB server.
Purpose
tEXistXUpdate processes XML file records and updates the existing records on the DB server.
Basic settings
Use an existing connection / Component List
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child
Job, Component list presents only the connection
components in the same Job level.
URI
Collection
Enter the path to the collection and file of interest on the database
server.
Driver
Update File
Advanced settings
tStatCatcher Statistics
Select this check box to gather the job processing metadata at a job
level as well as at each component level.
Usage
This component is typically used as a single component Job but can also be used as part of a
more complex Job.
eXist-db is an open source database management system built using XML technology. It
stores XML data according to the XML data model and features efficient, index-based XQuery
processing.
For further information about XQuery, see XQuery.
For further information about the XQuery update extension, see XQuery update extension.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not
provided. You can easily find out and add such JARs in the Integration perspective of your
studio. For details, see the section about external modules in the Talend Installation and Upgrade
Guide.
Related scenario
No scenario is available for this component yet.
tFirebirdClose
tFirebirdClose properties
Component family
Databases/Firebird
Function
Purpose
Close a transaction.
Basic settings
Component list
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
Limitation
n/a
Related scenario
No scenario is available for this component yet.
tFirebirdCommit
tFirebirdCommit Properties
This component is closely related to tFirebirdConnection and tFirebirdRollback. It usually does not make much sense to use these components independently in a transaction.
Component family
Databases/Firebird
Function
Validates the data processed through the Job into the connected DB.
Purpose
Using a unique connection, this component commits a global transaction in one go, instead of committing every row or every batch, and thus provides a gain in performance.
Basic settings
Component list
Close Connection
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to choose your database connection dynamically from multiple connections planned in your Job. This feature is useful when you need to access database tables having the same data structure but in different databases, especially when you are working in an environment where you cannot change your Job settings, for example, when your Job has to be deployed and executed independent of Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
Limitation
n/a
Related scenario
This component is closely related to tFirebirdConnection and tFirebirdRollback. It usually does not make much sense to use one of these without using a tFirebirdConnection component to open a connection for the current transaction.
For a tFirebirdCommit related scenario, see section tMysqlConnection.
tFirebirdConnection
tFirebirdConnection properties
This component is closely related to tFirebirdCommit and tFirebirdRollback. It usually does not make much
sense to use one of these without using a tFirebirdConnection to open a connection for the current transaction.
Component family
Databases/Firebird
Function
Purpose
This component allows you to commit all of the Job data to an output database in just a single
transaction, once the data has been validated.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Host name
Database
Use or register a shared DB Connection
Select this check box to share your connection or fetch a connection shared by a parent or child Job. This allows you to share one single DB connection among several DB connection components from different Job levels that can be either parent or child.
This option is incompatible with the Use dynamic job and
Use an independent process to run subjob options of the
tRunJob component. Using a shared database connection
together with a tRunJob component with either of these
two options enabled will cause your Job to fail.
Shared DB Connection Name: set or type in the shared connection
name.
Advanced settings
Auto commit
tStatCatcher Statistics
Select this check box to gather the job processing metadata at a Job level as well as at each component level.
Usage
This component is to be used along with Firebird components, especially with tFirebirdCommit and tFirebirdRollback.
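As a rough illustration of what such a connection with a single final commit amounts to in plain JDBC, here is a sketch using the open source Jaybird driver. The host, database path, credentials and the log_table used in the statements are placeholders.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class FirebirdConnectionSketch {
    public static void main(String[] args) throws Exception {
        Class.forName("org.firebirdsql.jdbc.FBDriver"); // Jaybird JDBC driver
        try (Connection conn = DriverManager.getConnection(
                "jdbc:firebirdsql://localhost:3050/C:/data/demo.fdb", "SYSDBA", "masterkey")) {
            conn.setAutoCommit(false); // corresponds to leaving Auto commit unchecked
            try (Statement stmt = conn.createStatement()) {
                stmt.executeUpdate("INSERT INTO log_table (msg) VALUES ('first row')");
                stmt.executeUpdate("INSERT INTO log_table (msg) VALUES ('second row')");
            }
            conn.commit(); // one global commit, the role played by tFirebirdCommit
        }
    }
}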
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For
details, see the section about external modules in the Talend Installation and Upgrade Guide.
Related scenarios
This component is closely related to tFirebirdCommit and tFirebirdRollback. It usually does not make much
sense to use one of these without using a tFirebirdConnection component to open a connection for the current
transaction.
For a tFirebirdConnection related scenario, see section tMysqlConnection.
tFirebirdInput
tFirebirdInput properties
Component family
Databases/Firebird
Function
Purpose
tFirebirdInput executes a DB query with a strictly defined order which must correspond to the
schema definition. Then it passes on the field list to the next component via a Main row link.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Use an existing connection
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
Host
Port
Database
Trim all the String/Char columns
Select this check box to remove leading and trailing whitespace from all the String/Char columns.
Trim column
Select this check box to remove leading and trailing whitespace from the selected columns.
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is
selected in the Basic settings view. Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component covers all possible SQL queries for Firebird databases.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For
details, see the section about external modules in the Talend Installation and Upgrade Guide.
Related scenarios
For related topics, see the tDBInput scenarios:
section Scenario 1: Displaying selected data from DB table.
section Scenario 2: Using StoreSQLQuery variable.
See also related topic: section Scenario: Reading data from different MySQL databases using dynamically loaded
connection parameters.
tFirebirdOutput
tFirebirdOutput properties
Component family
Databases/Firebird
Function
Purpose
tFirebirdOutput executes the action defined on the table and/or on the data contained in the table,
based on the flow incoming from the preceding component in the Job.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Use an existing connection
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection to
be shared in the Basic settings view of the connection
component which creates that very database connection.
2. In the child level, use a dedicated connection component
to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
Host
Port
Database
Table
Name of the table to be written. Note that only one table can be written at a time.
Action on table
Action on data
Insert or update: inserts a new record. If the record with the given reference already exists, an update is made.
Update or insert: updates the record with the given reference. If the record does not exist, a new record is inserted.
Delete: Remove entries corresponding to the input flow.
You must specify at least one column as a primary key
on which the Update and Delete operations are based.
You can do that by clicking Edit Schema and selecting
the check box(es) next to the column(s) you want to set as
primary key(s). For an advanced use, click the Advanced
settings view where you can simultaneously define primary
keys for the update and delete operations. To do that: Select
the Use field options check box and then in the Key in
update column, select the check boxes next to the column
name on which you want to base the update operation.
Do the same in the Key in delete column for the deletion
operation.
Schema and Edit schema
Die on error
This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.
Advanced settings
Commit every
Additional Columns
This option is not offered if you create (with or without drop) the DB table. This option allows you to call SQL functions to perform actions on columns other than insert, update or delete, or actions that require particular preprocessing.
Name: Type in the name of the schema column to be altered or inserted as a new column.
SQL expression: Type in the SQL statement to be executed in order
to alter or insert the relevant column data.
Position: Select Before, Replace or After following the action to be
performed on the reference column.
Reference column: Type in a column of reference that the
tDBOutput can use to place or replace the new or altered column.
Select this check box to display each step during processing entries
in a database.
Support null in SQL WHERE statement
Select this check box if you want to deal with the Null values contained in a DB table.
Make sure the Nullable check box is selected for the corresponding columns in the schema.
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is
selected in the Basic settings view. Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component offers the flexibility benefit of the DB query and covers all of the SQL queries
possible.
This component must be used as an output component. It allows you to carry out actions on a table
or on the data of a table in a Firebird database. It also allows you to create a reject flow using a
Row > Rejects link to filter data in error. For an example of tMySqlOutput in use, see section
Scenario 3: Retrieve data in error with a Reject link.
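As an aside on the Insert or update style actions described above, Firebird (2.1 and later) offers a native UPDATE OR INSERT statement whose MATCHING clause plays the same role as the key columns selected in Edit Schema or in the Field options table. The following is only a hedged JDBC sketch with a hypothetical customers table, not the statement Talend itself generates.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class FirebirdUpsertSketch {
    public static void main(String[] args) throws Exception {
        Class.forName("org.firebirdsql.jdbc.FBDriver");
        try (Connection conn = DriverManager.getConnection(
                "jdbc:firebirdsql://localhost:3050/C:/data/demo.fdb", "SYSDBA", "masterkey")) {
            // MATCHING lists the key column(s) the update-or-insert decision is based on.
            String sql = "UPDATE OR INSERT INTO customers (id, name, status) "
                       + "VALUES (?, ?, ?) MATCHING (id)";
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                ps.setInt(1, 42);
                ps.setString(2, "Ada");
                ps.setString(3, "active");
                ps.executeUpdate(); // updates the row with id 42, or inserts it if absent
            }
        }
    }
}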
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For
details, see the section about external modules in the Talend Installation and Upgrade Guide.
Related scenarios
For related topics, see:
section Scenario: Writing a row to a table in the MySql database via an ODBC connection.
section Scenario 1: Adding a new column and altering data in a DB table.
tFirebirdRollback
tFirebirdRollback properties
This component is closely related to tFirebirdCommit and tFirebirdConnection. It usually does not make much
sense to use these components independently in a transaction.
Component family
Databases/Firebird
Function
Purpose
Basic settings
Component list
Close Connection
Clear this check box to continue to use the selected connection once
the component has performed its task.
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
Limitation
n/a
Related scenario
For tFirebirdRollback related scenario, see section Scenario: Rollback from inserting data in mother/daughter
tables.
tFirebirdRow
tFirebirdRow properties
Component family
Databases/Firebird
Function
tFirebirdRow is the specific component for this database query. It executes the SQL query stated on the specified database. The Row suffix means the component implements a flow in the Job design although it does not provide output.
Purpose
Depending on the nature of the query and the database, tFirebirdRow acts on the actual DB
structure or on the data (although without handling data). The SQLBuilder tool helps you write your SQL statements easily.
Basic settings
Property type
Built-in: No property data stored centrally.
Use an existing connection
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
Host
Database
Query type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: Enter the query statement manually, or build it graphically using SQLBuilder.
Query
Die on error
This check box is selected by default. Clear the check box to skip the
row on error and complete the process for error-free rows. If needed,
you can retrieve the rows on error via a Row > Rejects link.
Advanced settings
Propagate QUERY's recordset
Select this check box to insert the result of the query into a COLUMN of the current flow. Select this column from the use column list.
This option allows the component to have a different schema from that of the preceding component. Moreover, the column that holds the QUERY's recordset should be set to the type of Object and this component is usually followed by tParseRecordSet.
Use PreparedStatement
Commit every
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is
selected in the Basic settings view. Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component offers the flexibility benefit of the DB query and covers all possible SQL queries.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For
details, see the section about external modules in the Talend Installation and Upgrade Guide.
Related scenarios
For related topics, see:
section Scenario 3: Combining two flows for selective output
section Scenario: Resetting a DB auto-increment.
section Scenario 1: Removing and regenerating a MySQL table index.
tHBaseClose
tHBaseClose component belongs to two component families: Big Data and Databases. For more information
about tHBaseClose, see section tHBaseClose.
tHBaseConnection
tHBaseConnection component belongs to two component families: Big Data and Databases. For more information about tHBaseConnection, see section tHBaseConnection.
tHBaseInput
tHBaseInput component belongs to two component families: Big Data and Databases. For more information
about tHBaseInput, see section tHBaseInput.
tHBaseOutput
tHBaseOutput component belongs to two component families: Big Data and Databases. For more information
about tHBaseOutput, see section tHBaseOutput.
tHiveClose
tHiveClose component belongs to two component families: Big Data and Databases. For more information about
tHiveClose, see section tHiveClose.
tHiveConnection
tHiveConnection component belongs to two component families: Big Data and Databases. For more information
about tHiveConnection, see section tHiveConnection.
tHiveCreateTable
tHiveCreateTable belongs to two component families: Big Data and Databases. For more information on it, see section tHiveCreateTable.
tHiveInput
tHiveInput component belongs to two component families: Big Data and Databases. For more information about
tHiveInput, see section tHiveInput.
tHiveLoad
tHiveLoad belongs to two component families: Big Data and Databases. For more information on it, see section tHiveLoad.
tHiveRow
tHiveRow component belongs to two component families: Big Data and Databases. For more information about
tHiveRow, see section tHiveRow.
tHSQLDbInput
tHSQLDbInput properties
Component family
Databases/HSQLDb
Function
Purpose
tHSQLDbInput executes a DB query with a strictly defined order which must correspond to the
schema definition. Then it passes on the field list to the next component via a Main row link.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Running Mode
Host
Port
Database Alias
DB path
Specify the directory of the database you want to connect to. This field is available only for the HSQLDb In Process Persistent running mode.
By default, if the database you specify in this field does not exist, it will be created automatically. If you want to change this default setting, modify the connection parameter set in the Additional JDBC parameter field in the Advanced settings view.
Db name
Enter the database name that you want to connect to. This field is available only for the HSQLDb In Process Persistent running mode and the HSQLDb In Memory running mode.
Additional JDBC parameters
Trim all the String/Char columns
Select this check box to remove leading and trailing whitespace from all the String/Char columns.
Trim column
Select this check box to remove leading and trailing whitespace from the selected columns.
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Usage
This component covers all possible SQL queries for HSQLDb databases.
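For reference, the running modes described above map onto different HSQLDB JDBC URL schemes; the Host, Port and Database Alias fields correspond to HSQLDB's server mode. A hedged sketch follows, in which the database names and paths are placeholders and SA with an empty password is HSQLDB's default account.

import java.sql.Connection;
import java.sql.DriverManager;

public class HsqldbModesSketch {
    public static void main(String[] args) throws Exception {
        Class.forName("org.hsqldb.jdbcDriver");
        // Server mode: Host, Port and Database Alias.
        String serverUrl = "jdbc:hsqldb:hsql://localhost:9001/mydbalias";
        // In Process Persistent mode: DB path plus Db name, stored on disk.
        String fileUrl = "jdbc:hsqldb:file:/data/hsqldb/mydb";
        // In Memory mode: Db name only, nothing is persisted.
        String memUrl = "jdbc:hsqldb:mem:mydb";
        // The ifexists=true connection property forbids automatic creation of a missing database.
        try (Connection conn = DriverManager.getConnection(fileUrl + ";ifexists=true", "SA", "")) {
            System.out.println("Connected: " + conn.getMetaData().getURL());
        }
    }
}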
Global Variables
Connections
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For
details, see the section about external modules in the Talend Installation and Upgrade Guide.
Related scenarios
For related topics, see the tDBInput scenarios:
section Scenario 1: Displaying selected data from DB table.
section Scenario 2: Using StoreSQLQuery variable.
tHSQLDbOutput
tHSQLDbOutput properties
Component family
Databases/HSQLDb
Function
Purpose
tHSQLDbOutput executes the action defined on the table and/or on the data contained in the
table, based on the flow incoming from the preceding component in the Job.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Running Mode
Host
Port
Database
DB path
Specify the directory of the database you want to connect to. This field is available only for the HSQLDb In Process Persistent running mode.
By default, if the database you specify in this field does not exist, it will be created automatically. If you want to change this default setting, modify the connection parameter set in the Additional JDBC parameter field in the Advanced settings view.
Db name
Enter the database name that you want to connect to. This field is available only for the HSQLDb In Process Persistent running mode and the HSQLDb In Memory running mode.
Table
Name of the table to be written. Note that only one table can be written at a time.
Action on table
Action on data
Insert: Add new entries to the table. If duplicates are found, Job
stops.
Update: Make changes to existing entries
Insert or update: inserts a new record. If the record with the given reference already exists, an update is made.
Update or insert: updates the record with the given reference. If the record does not exist, a new record is inserted.
Delete: Remove entries corresponding to the input flow.
It is necessary to specify at least one column as a primary
key on which the Update and Delete operations are based.
You can do that by clicking Edit Schema and selecting
the check box(es) next to the column(s) you want to
set as primary key(s). For an advanced use, click the
Advanced settings view where you can simultaneously
define primary keys for the Update and Delete operations.
To do that: Select the Use field options check box and then
in the Key in update column, select the check boxes next to
the column names you want to use as a base for the Update
operation. Do the same in the Key in delete column for the
Delete operation.
Schema and Edit schema
Die on error
This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.
Advanced settings
Additional JDBC parameters
Specify additional connection properties for the DB connection you are creating. When the running mode is HSQLDb In Process Persistent, this additional property is set to ifexists=true by default, meaning that the connection is established only if the database already exists; remove it to have the database created automatically when needed.
You can press Ctrl+Space to access a list of predefined global variables.
Commit every
Additional Columns
This option is not offered if you create (with or without drop) the DB table. This option allows you to call SQL functions to perform actions on columns other than insert, update or delete, or actions that require particular preprocessing.
Name: Type in the name of the schema column to be altered or inserted as a new column.
SQL expression: Type in the SQL statement to be executed in order
to alter or insert the relevant column data.
Position: Select Before, Replace or After following the action to be
performed on the reference column.
Reference column: Type in a column of reference that the
tDBOutput can use to place or replace the new or altered column.
Select this check box to display each step during processing entries in a database.
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Usage
This component offers the flexibility benefit of the DB query and covers all of the SQL queries possible.
This component must be used as an output component. It allows you to carry out actions on a table or on the data of a table in an HSQLDb database. It also allows you to create a reject flow using a Row > Rejects link to filter data in error. For an example of tMySqlOutput in use, see section Scenario 3: Retrieve data in error with a Reject link.
Global Variables
Connections
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For
details, see the section about external modules in the Talend Installation and Upgrade Guide.
Related scenarios
For related topics, see
section Scenario: Writing a row to a table in the MySql database via an ODBC connection.
section Scenario 1: Adding a new column and altering data in a DB table.
tHSQLDbRow
tHSQLDbRow properties
Component family
Databases/HSQLDb
Function
tHSQLDbRow is the specific component for this database query. It executes the SQL query stated on the specified database. The Row suffix means the component implements a flow in the Job design although it does not provide output.
Purpose
Depending on the nature of the query and the database, tHSQLDbRow acts on the actual DB
structure or on the data (although without handling data). The SQLBuilder tool helps you write your SQL statements easily.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Running Mode
Host
Port
Database Alias
DB path
Specify the directory of the database you want to connect to. This field is available only for the HSQLDb In Process Persistent running mode.
By default, if the database you specify in this field does not exist, it will be created automatically. If you want to change this default setting, modify the connection parameter set in the Additional JDBC parameter field in the Advanced settings view.
Database
Enter the database name that you want to connect to. This field is available only for the HSQLDb In Process Persistent running mode and the HSQLDb In Memory running mode.
Query type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: Enter the query statement manually, or build it graphically using SQLBuilder.
Query
Die on error
This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.
Advanced settings
Additional JDBC parameters
Propagate QUERY's recordset
Commit every
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Usage
This component offers the flexibility of the DB query and covers all possible SQL queries.
Global Variables
Connections
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For
details, see the section about external modules in the Talend Installation and Upgrade Guide.
Related scenarios
For related topics, see:
section Scenario: Resetting a DB auto-increment.
section Scenario 1: Removing and regenerating a MySQL table index.
tInterbaseClose
tInterbaseClose properties
Component family
Databases/Interbase
Function
Purpose
Close a transaction.
Basic settings
Component list
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
Limitation
n/a
Related scenario
No scenario is available for this component yet.
tInterbaseCommit
tInterbaseCommit Properties
This component is closely related to tInterbaseConnection and tInterbaseRollback. It usually does not make much sense to use these components independently in a transaction.
Component family
Databases/Interbase
Function
Validates the data processed through the Job into the connected DB.
Purpose
Using a unique connection, this component commits a global transaction in one go, instead of committing every row or every batch, and thus provides a gain in performance.
Basic settings
Component list
Close Connection
Clear this check box to continue to use the selected connection once
the component has performed its task.
If you want to use a Row > Main connection to
link tInterbaseCommit to your Job, your data will be
committed row by row. In this case, do not select the Close
connection check box or your connection will be closed
before the end of your first row commit.
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to choose your database connection dynamically from multiple connections planned in your Job. This feature is useful when you need to access database tables having the same data structure but in different databases, especially when you are working in an environment where you cannot change your Job settings, for example, when your Job has to be deployed and executed independent of Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component is to be used along with Interbase components, especially with the
tInterbaseConnection and tInterbaseRollback components.
Limitation
n/a
Related scenario
This component is closely related to tInterbaseConnection and tInterbaseRollback. It usually does not make much sense to use one of these without using a tInterbaseConnection component to open a connection for the current transaction.
For a tInterbaseCommit related scenario, see section tMysqlConnection.
tInterbaseConnection
tInterbaseConnection properties
This component is closely related to tInterbaseCommit and tInterbaseRollback. It usually does not make much
sense to use one of these without using a tInterbaseConnection to open a connection for the current transaction.
Component family
Databases/Interbase
Function
Purpose
This component allows you to commit all of the Job data to an output database in just a single
transaction, once the data has been validated.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Host name
Database
Use or register a shared DB Connection
Select this check box to share your connection or fetch a connection shared by a parent or child Job. This allows you to share one single DB connection among several DB connection components from different Job levels that can be either parent or child.
This option is incompatible with the Use dynamic job and
Use an independent process to run subjob options of the
tRunJob component. Using a shared database connection
together with a tRunJob component with either of these
two options enabled will cause your Job to fail.
Shared DB Connection Name: set or type in the shared connection
name.
Advanced settings
Auto commit
tStatCatcher Statistics
Select this check box to gather the job processing metadata at a Job
level as well as at each component level.
Usage
This component is to be used along with Interbase components, especially with tInterbaseCommit
and tInterbaseRollback.
Limitation
This component requires installation of its related jar files. For more information about the installation of these missing jar files, see the section describing how to configure the Studio in the Talend Installation and Upgrade Guide.
Related scenarios
This component is closely related to tInterbaseCommit and tInterbaseRollback. It usually does not make much
sense to use one of these without using a tInterbaseConnection component to open a connection for the current
transaction.
For a tInterbaseConnection related scenario, see section tMysqlConnection.
tInterbaseInput
tInterbaseInput properties
Component family
Databases/Interbase
Function
Purpose
tInterbaseInput executes a DB query with a strictly defined order which must correspond to the
schema definition. Then it passes on the field list to the next component via a Main row link.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Use an existing connection
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
Host
Database
Trim all the String/Char columns
Select this check box to remove leading and trailing whitespace from all the String/Char columns.
Trim column
Select this check box to remove leading and trailing whitespace from the selected columns.
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is
selected in the Basic settings view. Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component covers all possible SQL queries for Interbase databases.
Limitation
This component requires installation of its related jar files. For more information about the installation of these missing jar files, see the section describing how to configure the Studio in the Talend Installation and Upgrade Guide.
Related scenarios
For related topics, see the tDBInput scenarios:
section Scenario 1: Displaying selected data from DB table.
section Scenario 2: Using StoreSQLQuery variable.
See also the related topic in tContextLoad: section Scenario: Reading data from different MySQL databases using
dynamically loaded connection parameters.
tInterbaseOutput
tInterbaseOutput properties
Component family
Databases/Interbase
Function
Purpose
tInterbaseOutput executes the action defined on the table and/or on the data contained in the
table, based on the flow incoming from the preceding component in the Job.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Use an existing connection
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
Host
Database
Table
Name of the table to be written. Note that only one table can be
written at a time.
Action on table
Action on data
Die on error
This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.
Advanced settings
Commit every
Additional Columns
This option is not available if you create (with or without drop) the DB table. It allows you to call SQL functions to perform actions on columns other than insert, update or delete actions, or actions that require particular preprocessing.
Name: Type in the name of the schema column to be altered or
inserted as a new column.
SQL expression: Type in the SQL statement to be executed in order
to alter or insert the relevant column data.
Position: Select Before, Replace or After following the action to be
performed on the reference column.
Reference column: Type in a reference column that the component can use to place or replace the new or altered column.
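As a purely illustrative example (the column and function names are hypothetical), to add an upper-cased copy of a name column you could fill in a row with Name set to name_upper, SQL expression set to UPPER(name), Position set to After and Reference column set to name; the component would then write the computed column immediately after name.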
Debug query mode
Select this check box to display each step during processing entries in a database.
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is
selected in the Basic settings view. Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component offers the flexibility of the DB query and covers all possible SQL queries.
This component must be used as an output component. It allows you to carry out actions on a table or on the data of a table in an Interbase database. It also allows you to create a reject flow using a Row > Rejects link to filter data in error. For an example of tMySqlOutput in use, see section Scenario 3: Retrieve data in error with a Reject link.
Limitation
This component requires installation of its related jar files. For more information about the
installation of these missing jar files, see the section describing how to configure the Studio of the
Talend Installation and Upgrade Guide.
Related scenarios
For related topics, see:
section Scenario: Writing a row to a table in the MySql database via an ODBC connection.
section Scenario 1: Adding a new column and altering data in a DB table.
tInterbaseRollback
tInterbaseRollback properties
This component is closely related to tInterbaseCommit and tInterbaseConnection. It usually does not make
much sense to use these components independently in a transaction.
Component family
Databases/Interbase
Function
Purpose
Basic settings
Component list
Close Connection
Clear this check box to continue to use the selected connection once
the component has performed its task.
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
Limitation
n/a
Related scenarios
For tInterbaseRollback related scenario, see section Scenario: Rollback from inserting data in mother/daughter
tables.
tInterbaseRow
tInterbaseRow properties
Component family
Databases/Interbase
Function
tInterbaseRow is the specific component for this database query. It executes the SQL query stated on the specified database. The row suffix means the component implements a flow in the Job design although it does not provide output.
Purpose
Depending on the nature of the query and the database, tInterbaseRow acts on the actual DB structure or on the data (although without handling data). The SQLBuilder tool helps you write your SQL statements easily.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Use an existing connection
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
Host
Database
Query type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: Fill in the query statement manually or build it graphically using SQLBuilder.
Query
Die on error
This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.
Advanced settings
Propagate QUERY's recordset
Select this check box to insert the result of the query into a COLUMN of the current flow. Select this column from the use column list.
This option allows the component to have a schema different from that of the preceding component. Moreover, the column that holds the QUERY's recordset should be set to the type Object, and this component is usually followed by tParseRecordSet.
Use PreparedStatement
Commit every
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is
selected in the Basic settings view. Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component offers the flexibility of the DB query and covers all possible SQL queries.
Limitation
This component requires installation of its related jar files. For more information about the
installation of these missing jar files, see the section describing how to configure the Studio of the
Talend Installation and Upgrade Guide.
Related scenarios
For related scenarios, see:
section Scenario 3: Combining two flows for selective output
For tDBSQLRow related scenario: see section Scenario: Resetting a DB auto-increment
For tMySQLRow related scenario: see section Scenario 1: Removing and regenerating a MySQL table index.
tJavaDBInput
tJavaDBInput properties
Component family
Databases/JavaDB
Function
Purpose
tJavaDBInput executes a DB query with a strictly defined order which must correspond to the
schema definition. Then it passes on the field list to the next component via a Main row link.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Framework
Database
DB root path
Trim all the String/Char columns
Select this check box to remove leading and trailing whitespace from all the String/Char columns.
Trim column
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Usage
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For
details, see the section about external modules in the Talend Installation and Upgrade Guide.
Related scenarios
For related topics, see the tDBInput scenarios:
section Scenario 1: Displaying selected data from DB table.
section Scenario 2: Using StoreSQLQuery variable.
See also the related topic in tContextLoad: section Scenario: Reading data from different MySQL databases using
dynamically loaded connection parameters.
tJavaDBOutput
tJavaDBOutput properties
Component family
Databases/JavaDB
Function
Purpose
tJavaDBOutput executes the action defined on the table and/or on the data contained in the table,
based on the flow incoming from the preceding component in the Job.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Framework
Database
DB root path
Table
Name of the table to be written. Note that only one table can be
written at a time.
Action on table
Action on data
To do that: Select the Use field options check box and then
in the Key in update column, select the check boxes next to
the column names you want to use as a base for the Update
operation. Do the same in the Key in delete column for the
Delete operation.
Schema and Edit schema
Die on error
This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.
Advanced settings
Commit every
Additional Columns
This option is not available if you create (with or without drop) the DB table. It allows you to call SQL functions to perform actions on columns other than insert, update or delete actions, or actions that require particular preprocessing.
Name: Type in the name of the schema column to be altered or
inserted as a new column.
SQL expression: Type in the SQL statement to be executed in order
to alter or insert the relevant column data.
Position: Select Before, Replace or After following the action to be
performed on the reference column.
Reference column: Type in a reference column that the component can use to place or replace the new or altered column.
Debug query mode
Select this check box to display each step during processing entries in a database.
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Usage
This component offers the flexibility of the DB query and covers all possible SQL queries.
This component must be used as an output component. It allows you to carry out actions on a table
or on the data of a table in a Java database. It also allows you to create a reject flow using a Row >
Rejects link to filter data in error. For an example of tMysqlOutput in use, see section Scenario 3:
Retrieve data in error with a Reject link.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For
details, see the section about external modules in the Talend Installation and Upgrade Guide.
Related scenarios
For related topics, see:
section Scenario: Writing a row to a table in the MySql database via an ODBC connection.
section Scenario 1: Adding a new column and altering data in a DB table.
tJavaDBRow
tJavaDBRow properties
Component family
Databases/JavaDB
Function
tJavaDBRow is the specific component for this database query. It executes the SQL query stated on the specified database. The row suffix means the component implements a flow in the Job design although it does not provide output.
Purpose
Depending on the nature of the query and the database, tJavaDBRow acts on the actual DB structure or on the data (although without handling data). The SQLBuilder tool helps you write your SQL statements easily.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Framework
Database
DB root path
Query type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: Fill in the query statement manually or build it graphically using SQLBuilder.
Query
Die on error
This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.
Advanced settings
Propagate QUERY's recordset
Select this check box to insert the result of the query into a COLUMN of the current flow. Select this column from the use column list.
Use PreparedStatement
Commit every
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Usage
This component offers the flexibility of the DB query and covers all possible SQL queries.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For
details, see the section about external modules in the Talend Installation and Upgrade Guide.
Related scenarios
For related topics, see:
section Scenario: Resetting a DB auto-increment.
section Scenario 1: Removing and regenerating a MySQL table index.
tJDBCColumnList
tJDBCColumnList Properties
Component family
Databases/JDBC
Function
Purpose
Basic settings
Component list
Table name
Usage
This component is to be used along with JDBC components, especially with tJDBCConnection.
Limitation
n/a
Related scenario
For tJDBCColumnList related scenario, see section Scenario: Iterating on a DB table and listing its column
names.
tJDBCClose
tJDBCClose properties
Component family
Databases/JDBC
Function
Purpose
Close a transaction.
Basic settings
Component list
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component is to be used along with JDBC components, especially with tJDBCConnection
and tJDBCCommit.
Limitation
n/a
Related scenario
No scenario is available for this component yet.
tJDBCCommit
tJDBCCommit Properties
This component is closely related to tJDBCConnection and tJDBCRollback. It usually doesn't make much sense to use JDBC components independently in a transaction.
Component family
Databases/JDBC
Function
Validates the data processed through the Job into the connected DB.
Purpose
Using a unique connection, this component commits a global transaction in one go, instead of committing on every row or every batch, and thus provides a performance gain.
Basic settings
Component list
Close Connection
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to choose your database connection dynamically from multiple connections planned in your Job. This feature is useful when you need to access database tables having the same data structure but in different databases, especially when you are working in an environment where you cannot change your Job settings, for example, when your Job has to be deployed and executed independent of Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component is to be used along with JDBC components, especially with the tJDBCConnection
and tJDBCRollback components.
Limitation
n/a
Related scenario
This component is closely related to tJDBCConnection and tJDBCRollback. It usually doesn't make much sense to use JDBC components without using the tJDBCConnection component to open a connection for the current transaction.
For tJDBCCommit related scenario, see section tMysqlConnection.
tJDBCConnection
tJDBCConnection Properties
This component is closely related to tJDBCCommit and tJDBCRollback. It usually doesn't make much sense to use one of the JDBC components without using the tJDBCConnection component to open a connection for the current transaction.
Component family
Databases/JDBC
Function
Purpose
This component allows you to commit all of the Job data to an output database in just a single
transaction, once the data has been validated.
Basic settings
JDBC URL
Enter the JDBC URL to connect to the desired DB. For example, enter jdbc:mysql://IP address/database name to connect to a MySQL database.
Driver JAR
Click the plus button under the table to add as many lines as needed for loading multiple driver JARs. Then, on each line, click the three-dot button to open the Select Module wizard, from which you can select the driver JAR you want for that line.
Driver Class
Use or register a shared DB Connection
Select this check box to share your connection or fetch a connection shared by a parent or child Job. This allows you to share one single DB connection among several DB connection components from different Job levels that can be either parent or child.
This option is incompatible with the Use dynamic job and
Use an independent process to run subjob options of the
tRunJob component. Using a shared database connection
together with a tRunJob component with either of these
two options enabled will cause your Job to fail.
Shared DB Connection Name: set or type in the shared connection
name.
Advanced settings
Select this check box and specify the alias of a data source created
on the side to use the shared connection pool defined in the data
source configuration. This option works only when you deploy and
run your Job in .
Use Auto-Commit
Select this check box to display the Auto Commit check box. Select
it to activate auto commit mode.
Once you clear the Use Auto-Commit check box, the auto-commit statement will be removed from the generated code.
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Usage
This component is to be used along with JDBC components, especially with the tJDBCCommit
and tJDBCRollback components.
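As an informal illustration of what this connection sharing amounts to, the following plain JDBC sketch mirrors the tJDBCConnection/tJDBCCommit/tJDBCClose lifecycle. It is not the code the Studio generates; the URL and credentials are hypothetical and the driver JAR must be on the classpath:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class JdbcConnectionSketch {
    public static void main(String[] args) throws SQLException {
        // Hypothetical MySQL URL; substitute your own driver, host and database.
        String url = "jdbc:mysql://192.168.0.1:3306/talend_demo";
        Connection conn = DriverManager.getConnection(url, "user", "password");
        conn.setAutoCommit(false); // corresponds to clearing Use Auto-Commit
        try {
            // ... subjobs reusing the connection would do their work here ...
            conn.commit();   // what tJDBCCommit does for the whole transaction
        } catch (SQLException e) {
            conn.rollback(); // what tJDBCRollback does on failure
            throw e;
        } finally {
            conn.close();    // what tJDBCClose (or Close Connection) does
        }
    }
}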
Limitation
n/a
Related scenario
This component is closely related to tJDBCCommit and tJDBCRollback. It usually doesn't make much sense to use one of the JDBC components without using the tJDBCConnection component to open a connection for the current transaction.
For tJDBCConnection related scenario, see section tMysqlConnection.
tJDBCInput
tJDBCInput properties
Component family
Databases/JDBC
Function
tJDBCInput reads any database using a JDBC API connection and extracts fields based on a query.
If you have subscribed to one of the Talend solutions with Big Data, you are able to use
this component in a Talend Map/Reduce Job to generate Map/Reduce code. In that situation,
tJDBCInput belongs to the MapReduce component family. For further information, see section
tJDBCInput in Talend Map/Reduce Jobs.
Purpose
tJDBCInput executes a database query with a strictly defined order which must correspond to the
schema definition. Then it passes on the field list to the next component via a Main row link.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Use an existing connection
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
JDBC URL
Driver JAR
Click the plus button under the table to add as many lines as needed for loading multiple driver JARs. Then, on each line, click the three-dot button to open the Select Module wizard, from which you can select the driver JAR you want for that line.
Class Name
Table Name
Select this check box and specify the alias of a data source created
on the side to use the shared connection pool defined in the data
source configuration. This option works only when you deploy and
run your Job in .
If you use the component's own DB configuration, your
data source connection will be closed at the end of the
component. To prevent this from happening, use a shared
DB connection with the data source alias specified.
This check box is not available when the Use an existing connection
check box is selected.
Advanced settings
Use cursor
When selected, lets you decide the size of the row set to work with at a time, thus optimizing performance.
Trim all the String/Char columns
Select this check box to remove leading and trailing whitespace from all the String/Char columns.
Trim column
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is
selected in the Basic settings view. Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component covers all possible SQL queries for any database using a JDBC connection.
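As a rough, non-generated equivalent, reading a table over JDBC the way tJDBCInput does looks like the sketch below; the connection details, table and columns are hypothetical, and setFetchSize approximates what the Use cursor option tunes:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class JdbcInputSketch {
    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://192.168.0.1:3306/talend_demo", "user", "password");
             Statement stmt = conn.createStatement()) {
            stmt.setFetchSize(1000); // work with a row set at a time, as Use cursor does
            // The SELECT column order must match the schema definition,
            // just as the component's query must.
            try (ResultSet rs = stmt.executeQuery("SELECT id, name FROM customers")) {
                while (rs.next()) {
                    System.out.println(rs.getInt(1) + "|" + rs.getString(2));
                }
            }
        }
    }
}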
In a Talend Map/Reduce Job, tJDBCInput, as well as the other Map/Reduce components preceding it, generates
native Map/Reduce code. This section presents the specific properties of tJDBCInput when it is used in that
situation. For further information about a Talend Map/Reduce Job, see Talend Open Studio for Big Data Getting
Started Guide.
Component family
MapReduce/Input
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Repository: Select the repository file in which the properties are stored. The fields that follow are completed automatically using the data retrieved.
Click this icon to open a database connection wizard and store the
database connection parameters you set in the component Basic
settings view.
For more information about setting up and storing database
connection parameters, see Talend Studio User Guide.
JDBC URL
For example, if the database name is Talend, located on a server with IP address XX.XX.XX.XX and the port is 3306, then the URL should be jdbc:mysql://XX.XX.XX.XX:3306/Talend.
Driver JAR
Click the plus button under the table to add as many lines as needed for loading multiple driver JARs. Then, on each line, click the three-dot button to open the Select Module wizard, from which you can select the driver JAR you want for that line.
Class Name
Table Name
Type in the name of the table from which you need to read data.
Usage
Hadoop Connection
You need to use the Hadoop Configuration tab in the Run view to define the connection to a
given Hadoop distribution for the whole Job.
This connection is effective on a per-Job basis.
Limitation
We recommend using the following databases with the Map/Reduce version of this component:
DB2, Informix, MSSQL, MySQL, Netezza, Oracle, Postgres, Teradata and Vertica.
It may work with other databases as well, but these may not necessarily have been tested.
Related scenarios
Related topics in tDBInput scenarios:
section Scenario 1: Displaying selected data from DB table.
section Scenario 2: Using StoreSQLQuery variable.
Related topic in tContextLoad: see section Scenario: Reading data from different MySQL databases using
dynamically loaded connection parameters.
tJDBCOutput
tJDBCOutput properties
Component family
Databases/JDBC
Function
tJDBCOutput writes, updates, makes changes or suppresses entries in any type of database
connected to a JDBC API.
If you have subscribed to one of the Talend solutions with Big Data, you are able to use this
component in a Talend Map/Reduce Job to generate Map/Reduce code. In that situation, this
component belongs to the MapReduce component family and can only write data in a database.
For further information, see section tJDBCOutput in Talend Map/Reduce Jobs.
Purpose
tJDBCOutput executes the action defined on the data contained in the table, based on the flow
incoming from the preceding component in the Job.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Use an existing connection
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
JDBC URL
Driver JAR
Click the plus button under the table to add as many lines as needed for loading multiple driver JARs. Then, on each line, click the three-dot button to open the Select Module wizard, from which you can select the driver JAR you want for that line.
Class Name
Table
Name of the table to be written. Note that only one table can be
written at a time.
Action on data
Update or insert: updates the record with the given reference. If the record does not exist, a new record is inserted.
Delete: Remove entries corresponding to the input flow.
It is necessary to specify at least one column as a primary
key on which the Update and Delete operations are based.
You can do that by clicking Edit Schema and selecting
the check box(es) next to the column(s) you want to set as
primary key(s). For an advanced use, click the Advanced
settings view where you can simultaneously define primary
keys for the Update and Delete operations. To do that:
Select the Use field options check box and then in the Key
in update column, select the check boxes next to the column
names you want to use as a base for the Update operation.
Do the same in the Key in delete column for the Delete
operation.
Schema and Edit schema
Die on error
This check box is selected by default. Clear the check box to skip the
row on error and complete the process for error-free rows. If needed,
you can retrieve the rows on error via a Row > Rejects link.
Select this check box and specify the alias of a data source created
on the side to use the shared connection pool defined in the data
source configuration. This option works only when you deploy and
run your Job in .
If you use the component's own DB configuration, your
data source connection will be closed at the end of the
component. To prevent this from happening, use a shared
DB connection with the data source alias specified.
This check box is not available when the Use an existing connection
check box is selected.
Advanced settings
Commit every
Additional Columns
This option is not available if you create (with or without drop) the database table. It allows you to call SQL functions to perform actions on columns other than insert, update or delete actions, or actions that require particular preprocessing.
Name: Type in the name of the schema column to be altered or
inserted as a new column.
SQL expression: Type in the SQL statement to be executed in order
to alter or insert the relevant column data.
Position: Select Before, Replace or After following the action to be
performed on the reference column.
Reference column: Type in a reference column that the component can use to place or replace the new or altered column.
Debug query mode
Select this check box to display each step during processing entries in a database.
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is
selected in the Basic settings view. Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component offers the flexibility of the database query and covers all possible SQL queries.
This component must be used as an output component. It allows you to carry out actions on a table
or on the data of a table in a JDBC database. It also allows you to create a reject flow using a
Row > Rejects link to filter data in error. For an example of tMySqlOutput in use, see section
Scenario 3: Retrieve data in error with a Reject link.
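To picture how the Commit every setting interacts with the write actions, here is a minimal hand-written JDBC sketch (not the generated code; table, columns and connection details are hypothetical) that commits in batches rather than per row:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class JdbcOutputSketch {
    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://192.168.0.1:3306/talend_demo", "user", "password");
             PreparedStatement ps = conn.prepareStatement(
                     "INSERT INTO customers (id, name) VALUES (?, ?)")) {
            conn.setAutoCommit(false);
            int commitEvery = 10000; // mirrors the Commit every field
            int pending = 0;
            for (int id = 1; id <= 25000; id++) {
                ps.setInt(1, id);
                ps.setString(2, "name_" + id);
                ps.addBatch();
                if (++pending == commitEvery) {
                    ps.executeBatch();
                    conn.commit(); // one commit per batch rather than per row
                    pending = 0;
                }
            }
            ps.executeBatch();
            conn.commit();
        }
    }
}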
In a Talend Map/Reduce Job, tJDBCOutput, as well as the other Map/Reduce components preceding it, generates
native Map/Reduce code. This section presents the specific properties of tJDBCOutput when it is used in that
situation. For further information about a Talend Map/Reduce Job, see Talend Open Studio for Big Data Getting
Started Guide.
Component family
MapReduce/Output
Function
Basic settings
Property type
.
If you are using Talend Open Studio for Big Data, only the Builtin mode is available.
Built-in: No property data stored centrally.
Click this icon to open a database connection wizard and store the
database connection parameters you set in the component Basic
settings view.
For more information about setting up and storing database
connection parameters, see Talend Studio User Guide.
JDBC URL
Driver JAR
Click the [+] button under the table to add lines of the count of your
need for the purpose of loading several JARs. Then on each line,
click the [...] button to open the Select Module wizard from which
you can select a driver JAR of your interest for each line.
Class Name
Table name
Name of the table to be written. Note that this must exist and only
one table can be written at a time.
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: The schema is created and stored locally for this component
only. Related topic: see Talend Studio User Guide.
Repository: The schema already exists and is stored in the
Repository, hence can be reused. Related topic: see Talend Studio
User Guide.
Advanced settings
Usage
Hadoop Connection
You need to use the Hadoop Configuration tab in the Run view to define the connection to a
given Hadoop distribution for the whole Job.
This connection is effective on a per-Job basis.
Limitation
We recommend using the following databases with the Map/Reduce version of this component:
DB2, Informix, MSSQL, MySQL, Netezza, Oracle, Postgres, Teradata and Vertica.
It may work with other databases as well, but these may not necessarily have been tested.
Related scenarios
For tJDBCOutput related topics, see:
section Scenario: Writing a row to a table in the MySql database via an ODBC connection.
section Scenario 1: Adding a new column and altering data in a DB table.
If you are a subscription-based Big Data user, you can also consult a Talend Map/Reduce Job using the Map/Reduce version of tJDBCOutput:
section Scenario 2: Deduplicating entries using Map/Reduce components
tJDBCRollback
tJDBCRollback properties
This component is closely related to tJDBCCommit and tJDBCConnection. It usually does not make much sense
to use JDBC components independently in a transaction.
Component family
Databases/JDBC
Function
Purpose
Basic settings
Component list
Close Connection
Clear this check box to continue to use the selected connection once
the component has performed its task.
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component is to be used along with JDBC components, especially with tJDBCConnection
and tJDBCCommit components.
Limitation
n/a
Related scenario
This component is closely related to tJDBCConnection and tJDBCCommit. It usually does not make much
sense to use JDBC components without using the tJDBCConnection component to open a connection for the
current transaction.
For tJDBCRollback related scenario, see section tMysqlRollback.
tJDBCRow
tJDBCRow properties
Component family
Databases/JDBC
Function
tJDBCRow is the component for any type of database using a JDBC API. It executes the SQL query stated on the specified database. The row suffix means the component implements a flow in the Job design although it doesn't provide output.
Purpose
Depending on the nature of the query and the database, tJDBCRow acts on the actual DB structure or on the data (although without handling data). The SQLBuilder tool helps you write your SQL statements easily.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Use an existing connection
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
JDBC URL
Driver JAR
Click the plus button under the table to add as many lines as needed for loading multiple driver JARs. Then, on each line, click the three-dot button to open the Select Module wizard, from which you can select the driver JAR you want for that line.
Class Name
Table Name
Query type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Die on error
This check box is selected by default. Clear the check box to skip the
row on error and complete the process for error-free rows. If needed,
you can retrieve the rows on error via a Row > Rejects link.
Select this check box and specify the alias of a data source created
on the side to use the shared connection pool defined in the data
source configuration. This option works only when you deploy and
run your Job in .
If you use the component's own DB configuration, your
data source connection will be closed at the end of the
component. To prevent this from happening, use a shared
DB connection with the data source alias specified.
This check box is not available when the Use an existing connection
check box is selected.
Advanced settings
Propagate QUERY's recordset
Select this check box to insert the result of the query into a COLUMN of the current flow. Select this column from the use column list.
This option allows the component to have a schema different from that of the preceding component. Moreover, the column that holds the QUERY's recordset should be set to the type Object, and this component is usually followed by tParseRecordSet.
Use PreparedStatement
Commit every
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is
selected in the Basic settings view. Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component offers the flexibility of the DB query for any database using a JDBC connection
and covers all possible SQL queries.
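For readers wondering what the Use PreparedStatement option corresponds to, a hand-written equivalent would bind parameters instead of concatenating a literal query; a minimal sketch with a hypothetical table and parameters:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class JdbcRowSketch {
    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://192.168.0.1:3306/talend_demo", "user", "password");
             PreparedStatement ps = conn.prepareStatement(
                     "UPDATE customers SET active = ? WHERE id = ?")) {
            ps.setBoolean(1, true); // values the component takes from its parameter table
            ps.setInt(2, 42);
            ps.executeUpdate(); // the statement runs without producing an output flow
        }
    }
}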
Related scenarios
For related topics, see:
section Scenario 3: Combining two flows for selective output.
section Scenario: Resetting a DB auto-increment.
section Scenario 1: Removing and regenerating a MySQL table index.
tJDBCSP
tJDBCSP Properties
Component family
Databases/JDBC
Function
Purpose
tJDBCSP offers a convenient way to centralize multiple or complex queries in a database and call
them easily.
Basic settings
JDBC URL
Driver JAR
Click the plus button under the table to add as many lines as needed for loading multiple driver JARs. Then, on each line, click the three-dot button to open the Select Module wizard, from which you can select the driver JAR you want for that line.
Class Name
SP Name
Is Function / Return result in
Select this check box if only a value is to be returned. Select from the list the schema column on which the returned value is based.
Parameters
Click the Plus button and select the various Schema Columns that will be required by the procedure. Note that the SP schema can hold more columns than there are parameters used in the procedure.
Select the Type of parameter:
IN: Input parameter.
OUT: Output parameter/return value.
IN OUT: Input parameter to be returned as a value, likely after modification through the procedure (function).
RECORDSET: Input parameter to be returned as a set of values rather than a single value.
Check section Scenario: Inserting data in mother/
daughter tables, if you want to analyze a set of records
from a database table or DB query and return single
records.
Select this check box and specify the alias of a data source created
on the side to use the shared connection pool defined in the data
source configuration. This option works only when you deploy and
run your Job in .
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to choose your database connection dynamically from multiple connections planned in your Job. This feature is useful when you need to access database tables having the same data structure but in different databases, especially when you are working in an environment where you cannot change your Job settings, for example, when your Job has to be deployed and executed independent of Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is
selected in the Basic settings view. Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component is used as an intermediary component. It can be used as a start component, but in that case only input parameters are allowed.
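In plain JDBC terms, the IN and OUT parameter types described above map onto a CallableStatement; the following sketch (procedure name, parameters and connection details are hypothetical) shows one IN and one OUT parameter:

import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Types;

public class JdbcSpSketch {
    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://192.168.0.1:3306/talend_demo", "user", "password");
             CallableStatement cs = conn.prepareCall("{call get_customer_count(?, ?)}")) {
            cs.setString(1, "FR");                     // IN parameter
            cs.registerOutParameter(2, Types.INTEGER); // OUT parameter / return value
            cs.execute();
            System.out.println("count = " + cs.getInt(2));
        }
    }
}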
Limitation
Related scenario
For related scenarios, see:
section Scenario: Executing a stored procedure in the MDM Hub.
section Scenario: Checking number format using a stored procedure.
Check as well section Scenario: Inserting data in mother/daughter tables if you want to analyze a set of records
from a database table or DB query and return single records.
tJDBCTableList
tJDBCTableList Properties
Component family
Databases/JDBC
Function
Purpose
Lists the names of a given set of JDBC tables using a select statement based on a Where clause.
Basic settings
Database type
Component list
Advanced settings
Use filter
Regular expression on table names
Filter criteria
tStatCatcher Statistics
Select this check box to collect the log data at the component level.
Usage
This component is to be used along with JDBC components, especially with tJDBCConnection.
Global Variables
Related scenario
For tJDBCTableList related scenario, see section Scenario: Iterating on a DB table and listing its column names.
tLDAPAttributesInput
tLDAPAttributesInput Properties
Component family
Databases/LDAP
Function
tLDAPAttributesInput analyses each object found via the LDAP query and lists a collection of
attributes associated with the object.
Purpose
tLDAPAttributesInput executes an LDAP query based on the given filter and corresponding to
the schema definition. Then it passes on the field list to the next component via a Main row link.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Host
Port
Base DN
Protocol
Authentication, Password and User
Select the Authentication check box if LDAP login is required. Note that the login must match the LDAP syntax requirement to be valid, e.g. cn=Directory Manager.
Filter
Type in the filter as expected by the LDAP directory db.
Multi valued field separator
Type in the value separator in multi-value fields.
Alias dereferencing
Referral handling
Limit
Time Limit
Paging
Die on error
This check box is selected by default. Clear the check box to skip the
row on error and complete the process for error-free rows. If needed,
you can retrieve the rows on error via a Row > Rejects link.
Advanced settings
tStatCatcher Statistics
Select this check box to gather the job processing metadata at a job level as well as at each component level.
Usage
Related scenario
The tLDAPAttributesInput component follows a usage similar to that of tLDAPInput. Hence, for a tLDAPInput related scenario, see section Scenario: Displaying LDAP directory's filtered content.
tLDAPClose
tLDAPClose properties
Component family
Databases/LDAP
Function
Purpose
tLDAPClose is used to close a connection to the LDAP Directory server so as to release the occupied resources.
Basic settings
Component list
Advanced settings
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at the
Job level as well as at each component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component is to be used along with other LDAP components, especially with
tLDAPConnection.
Limitation
n/a
Related scenario
No scenario is available for this component yet.
tLDAPConnection
tLDAPConnection Properties
Component family
Databases/LDAP
Function
Purpose
This component creates a connection to an LDAP Directory server. Then it can be invoked by other
components that need to access the LDAP Directory server, e.g., tLDAPInput, tLDAPOutput,
etc.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Host
Port
Protocol
Base DN
Alias dereferencing
Referral handling
Advanced settings
tStatCatcher Statistics
Select this check box to gather the job processing metadata at a job
level as well as at each component level.
Usage
This component is to be used with other LDAP components, especially with tLDAPInput and
tLDAPOutput.
Related scenarios
This component is closely related to tLDAPInput and tLDAPOutput as it frees you from filling in the connection
details repeatedly if multiple LDAP input/output components exist.
For tLDAPConnection related scenarios, see section Scenario: Inserting data in mother/daughter tables.
tLDAPInput
tLDAPInput Properties
Component family
Databases/LDAP
Function
tLDAPInput reads a directory and extracts data based on the defined filter.
Purpose
tLDAPInput executes an LDAP query based on the given filter and corresponding to the schema
definition. Then it passes on the field list to the next component via a Main row link.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Use an existing connection
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job,
Component list presents only the connection components
in the same Job level.
Host
Port
Base DN
Protocol
Authentication, Password and User
Select the Authentication check box if LDAP login is required. Note that the login must match the LDAP syntax requirement to be valid, e.g. cn=Directory Manager.
Filter
Type in the filter as expected by the LDAP directory db.
Multi valued field separator
Type in the value separator in multi-value fields.
Alias dereferencing
Limit
Time Limit
Paging
Die on error
This check box is selected by default. Clear the check box to skip the
row on error and complete the process for error-free rows. If needed,
you can retrieve the rows on error via a Row > Rejects link.
Global Variables
NB_LINE: Indicates the number of rows read by an input component or transferred to an output
component. This is an After variable and it returns an integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and
choose the variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable
means it functions after the execution of a component.
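For example, once the component has run, a tJava component placed after the subjob could print this count with the line below (a minimal sketch assuming the default component name tLDAPInput_1):

System.out.println("Rows read: " + ((Integer) globalMap.get("tLDAPInput_1_NB_LINE")));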
Usage
Scenario: Displaying LDAP directory's filtered content
Drop the tLDAPInput component along with a tLogRow from the Palette to the design workspace.
Set the tLDAPInput properties.
Set the Property type on Repository if you stored the LDAP connection details in the Metadata Manager in
the Repository. Then select the relevant entry on the list.
For Talend Open Studio for Big Data, the Property type, Schema and Query Type of components are always Built-in.
For further information about how to edit a Built-in schema, see Talend Studio User Guide.
In Built-In mode, fill in the Host and Port information manually. Host can be the IP address of the LDAP
directory server or its DNS name.
No particular Base DN is to be set.
Then select the relevant Protocol on the list. In this example: a simple LDAP protocol is used.
Select the Authentication check box and fill in the login information if required to read the directory. In this
use case, no authentication is needed.
In the Filter area, type in the command the data selection is based on. In this example, the filter is: (&(objectClass=inetorgperson)(uid=PIERRE DUPONT)).
Fill in Multi-valued field separator with a comma as some fields may hold more than one value, separated
by a comma.
As we do not know if some aliases are used in the LDAP directory, select Always on the list.
Set Ignore as Referral handling.
Set the limit to 100 for this use case.
Set the Schema as required by your LDAP directory. In this example, the schema is made of 6 columns including
the objectClass and uid columns which get filtered on.
In the tLogRow component, no particular setting is required.
Only one entry of the directory corresponds to the filter criteria given in the tLDAPInput component.
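Outside the Studio, the same filtered search can be approximated with the standard javax.naming API. The sketch below is only an illustration: the host and base DN are hypothetical, and the filter from the scenario is reused:

import java.util.Hashtable;
import javax.naming.NamingEnumeration;
import javax.naming.directory.InitialDirContext;
import javax.naming.directory.SearchControls;
import javax.naming.directory.SearchResult;

public class LdapInputSketch {
    public static void main(String[] args) throws Exception {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(javax.naming.Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(javax.naming.Context.PROVIDER_URL, "ldap://192.168.0.1:389"); // hypothetical host
        InitialDirContext ctx = new InitialDirContext(env); // anonymous bind: no Authentication
        SearchControls controls = new SearchControls();
        controls.setSearchScope(SearchControls.SUBTREE_SCOPE);
        controls.setCountLimit(100); // mirrors the Limit field of the scenario
        NamingEnumeration<SearchResult> results = ctx.search(
                "o=directoryRoot", "(&(objectClass=inetorgperson)(uid=PIERRE DUPONT))", controls);
        while (results.hasMore()) {
            System.out.println(results.next().getNameInNamespace());
        }
        ctx.close();
    }
}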
tLDAPOutput
tLDAPOutput Properties
Component family
Databases/LDAP
Function
Purpose
tLDAPOutput executes an LDAP query based on the given filter and corresponding to the schema
definition. Then it passes on the field list to the next component via a Main row link.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Use an existing connection
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job,
Component list presents only the connection components
in the same Job level.
Host
Port
Base DN
Protocol
Multi valued field separator
Character, string or regular expression to separate data in a multi-value field.
Alias dereferencing
Referral handling
DN Column Name
Select in the list the type of the LDAP input entity used.
Die on error
This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Reject link.
Advanced settings
Use Attribute Options (for update mode)
Select this check box to choose the desired attribute (including dn, dc, ou, objectClass, mail and uid) and the corresponding operation (including Add, Replace, Remove Attribute and Remove Value).
tStatCatcher Statistics
Select this check box to gather the job processing metadata at a job level as well as at each component level.
Usage
Scenario: Editing data in a LDAP directory
1. Drop the tLDAPInput, tLDAPOutput, tMap and tLogRow components from the Palette to the design workspace.
2. In the tLDAPInput Component view, set the connection details to the LDAP directory server as well as the filter, as described in section Scenario: Displaying LDAP directory's filtered content.
3. Change the schema to make it simpler, by removing the unused fields: dc, ou, objectclass.
4. In the Expression field of the dn column (output), fill in the exact expression expected by the LDAP server to reach the target tree leaf and allow directory writing, unless you have already set it in the Base DN field of the tLDAPOutput component. In this use case, the GetResultName global variable is used to retrieve this path automatically. Press Ctrl+Space to access the variable list and select tLDAPInput_1_RESULT_NAME.
5. In the mail column's expression field, type in the new email that will overwrite the current data in the LDAP directory. In this example, we change to Pierre.Dupont@talend.com. Click OK to validate the changes.
6. Then select the tLDAPOutput component to set the directory writing properties.
7. Set the Port and Host details manually if they aren't stored in the Repository.
For Talend Open Studio for Big Data, the Property type, Schema and Query Type of components are always Built-in. For further information about how to edit a Built-in schema, see Talend Studio User Guide.
8. In the Base DN field, set the highest tree leaf you have the rights to access. If you have not previously set the exact and full path of the target DN you want to access, fill it in here. In this use case, the full DN is provided by the dn output from the tMap component, therefore only the highest accessible leaf is given: o=directoryRoot.
9. Use the default settings of the Alias Dereferencing and Referral Handling fields, respectively Always and Ignore. The Insert mode for this use case is Update (the email address). The schema was provided by the previous component through the propagation operation.
10. In the Advanced settings view, select the Use Attribute Options (for update mode) check box to show the Attribute Options table. Select the attribute mail under the Attribute Name part and choose Replace under the Option part.
tLDAPRenameEntry
tLDAPRenameEntry properties
Component family
Databases/LDAP
Function
Purpose
The tLDAPRenameEntry component renames one or more entries in a specific LDAP directory.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Host
Port
Base DN
Protocol
Alias dereferencing
Referrals handling
Select from the list the schema column that holds the old DN
(Previous DN) and the column that holds the new DN (New DN).
Related scenarios
Die on error
This check box is selected by default. Clear the check box to skip the
row on error and complete the process for error-free rows. If needed,
you can retrieve the rows on error via a Row > Reject link.
Global Variables
NB_LINE: Indicates the number of rows read by an input component or transferred to an output
component. This is an After variable and it returns an integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and
choose the variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable
means it functions after the execution of a component.
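For example, once the subjob has finished, the NB_LINE After variable can be read from the globalMap in a
tJava component. This is a minimal sketch; the instance name tLDAPRenameEntry_1 is an assumption and must
match the component used in your own Job:

// Print the number of entries processed by the rename component.
// Talend exposes After variables through the globalMap, keyed by the component instance name.
System.out.println("Entries renamed: "
    + ((Integer) globalMap.get("tLDAPRenameEntry_1_NB_LINE")));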
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Usage
This component covers all possible LDAP queries. It is usually used as a one-component subjob
but you can use it with other components as well.
Note: Press Ctrl + Space bar to access the global variable list, including the GetResultName
variable to retrieve automatically the relevant DN Base.
Related scenarios
For use cases in relation with tLDAPRenameEntry, see the following scenarios:
section Scenario: Displaying LDAP directory's filtered content.
section Scenario: Editing data in a LDAP directory.
tMaxDBInput
tMaxDBInput
tMaxDBInput properties
Component family
Databases/MaxDB
Function
Purpose
tMaxDBInput executes a DB query with a strictly defined order which must correspond to the
schema definition. Then it passes on the field list to the next component via a Main row link.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Host name
Port
Database
Advanced settings
Table name
Guess Query
Guess schema
Trim all the String/Char columns
Select this check box to remove leading and trailing whitespace from
all the String/Char columns.
Trim column
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Usage
This component offers the flexibility of the DB query and covers all possible SQL queries.
Limitation
This component requires installation of its related jar files. For more information about the
installation of these missing jar files, see the section describing how to configure the Studio of the
Talend Installation and Upgrade Guide.
Related scenario
For a related scenario, see:
section Scenario 1: Displaying selected data from DB table.
section Scenario 2: Using StoreSQLQuery variable.
tMaxDBOutput
tMaxDBOutput
tMaxDBOutput properties
Component family
Databases/MaxDB
Function
Purpose
tMaxDBOutput executes the action defined on the table and/or on the data contained in the table,
based on the flow incoming from the preceding component in the job.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Host
Port
Database
Table
Name of the table to be written. Note that only one table can be
written at a time and that the table must exist for the insert operation
to succeed.
Action on table
Action on data
Advanced settings
Die on error
This check box is selected by default. Clear the check box to skip the
row on error and complete the process for error-free rows. If needed,
you can retrieve the rows on error via a Row > Rejects link.
Commit every
Additional Columns
This option is not offered if you create (with or without drop) the
DB table. This option allows you to call SQL functions to perform
actions on columns, which are not insert, nor update or delete actions,
or actions that require particular preprocessing.
Name: Type in the name of the schema column to be altered or
inserted as a new column.
SQL expression: Type in the SQL statement to be executed in order
to alter or insert the relevant column data.
Position: Select Before, Replace or After following the action to be
performed on the reference column.
Reference column: Type in a column of reference that the
tDBOutput can use to place or replace the new or altered column.
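As an illustration, to have the database fill a timestamp column that is not part of the incoming flow, the
Additional Columns table could be filled as follows; the column and function names are invented for the
example and must exist in your own table and target database:

Name: last_update
SQL expression: "now()"
Position: After
Reference column: id

The SQL expression is inserted as is into the generated statement, so any function accepted by the target
database can be used.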
Select this check box to display each step during processing entries
in a database.
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Usage
This component offers the flexibility benefit of the DB query and covers all of the SQL queries
possible.
This component must be used as an output component. It allows you to carry out actions on a table
or on the data of a table in a database. It also allows you to create a reject flow using a Row >
Rejects link to filter data in error. For an example of tMySqlOutput in use, see section Scenario 3:
Retrieve data in error with a Reject link.
Limitation
This component requires installation of its related jar files. For more information about the
installation of these missing jar files, see the section describing how to configure the Studio of the
Talend Installation and Upgrade Guide.
Related scenario
For a related scenario, see:
section Scenario: Writing a row to a table in the MySql database via an ODBC connection.
section Scenario 1: Adding a new column and altering data in a DB table.
tMaxDBRow
tMaxDBRow
tMaxDBRow properties
Component family
Databases/MaxDB
Function
tMaxDBRow is the specific component for this database query. It executes the SQL query stated
onto the specified database. The row suffix means the component implements a flow in the job
design although it doesn't provide output.
Purpose
Depending on the nature of the query and the database, tMaxDBRow acts on the actual DB
structure or on the data (although without handling data). The SQLBuilder tool helps you write
easily your SQL statements.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Host
Port
Database
Advanced settings
Table name
Guess Query
Die on error
This check box is selected by default. Clear the check box to skip the
row on error and complete the process for error-free rows. If needed,
you can retrieve the rows on error via a Row > Rejects link.
Propagate QUERY's recordset
Select this check box to insert the result of the query into a COLUMN
of the current flow. Select this column from the use column list.
Use PreparedStatement
Commit every
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Usage
This component offers the flexibility benefit of the DB query and covers all possible SQL queries.
Limitation
This component requires installation of its related jar files. For more information about the
installation of these missing jar files, see the section describing how to configure the Studio of the
Talend Installation and Upgrade Guide.
Related scenario
For a related scenario, see:
section Scenario 1: Displaying selected data from DB table
section Scenario 2: Using StoreSQLQuery variable
tMongoDBBulkLoad
tMongoDBBulkLoad
tMongoDBBulkLoad belongs to two component families: Big Data and Databases. For more information about
tMongoDBBulkLoad, see section tMongoDBBulkLoad.
tMongoDBClose
tMongoDBClose
tMongoDBClose belongs to two component families: Big Data and Databases. For more information about
tMongoDBClose, see section tMongoDBClose.
tMongoDBConnection
tMongoDBConnection
tMongoDBConnection belongs to two component families: Big Data and Databases. For more information about
tMongoDBConnection, see section tMongoDBConnection.
tMongoDBInput
tMongoDBInput
tMongoDBInput belongs to two component families: Big Data and Databases. For more information about
tMongoDBInput, see section tMongoDBInput.
tMongoDBOutput
tMongoDBOutput
tMongoDBOutput belongs to two component families: Big Data and Databases. For more information about
tMongoDBOutput, see section tMongoDBOutput.
tMongoDBRow
tMongoDBRow
tMongoDBRow belongs to two component families: Big Data and Databases. For more information about
tMongoDBRow, see section tMongoDBRow.
tNeo4jClose
tNeo4jClose
tNeo4jClose belongs to two component families: Big Data and Databases. For more information about
tNeo4jClose, see section tNeo4jClose.
tNeo4jConnection
tNeo4jConnection
tNeo4jConnection belongs to two component families: Big Data and Databases. For more information about
tNeo4jConnection, see section tNeo4jConnection.
tNeo4jInput
tNeo4jInput
tNeo4jInput belongs to two component families: Big Data and Databases. For more information about
tNeo4jInput, see section tNeo4jInput.
tNeo4jOutput
tNeo4jOutput
tNeo4jOutput belongs to two component families: Big Data and Databases. For more information about
tNeo4jOutput, see section tNeo4jOutput.
tNeo4jOutputRelationship
tNeo4jOutputRelationship
tNeo4jOutputRelationship belongs to two component families: Big Data and Databases. For more information
about tNeo4jOutputRelationship, see section tNeo4jOutputRelationship.
tNeo4jRow
tNeo4jRow
tNeo4jRow belongs to two component families: Big Data and Databases. For more information about tNeo4jRow,
see section tNeo4jRow.
tParseRecordSet
tParseRecordSet
You can find this component at the root of Databases group of the Palette of the Integration perspective of
Talend Studio. tParseRecordSet covers needs related indirectly to the use of any database.
tParseRecordSet properties
Component family
Databases
Function
tParseRecordSet parses a set of records from a database table or DB query and possibly returns
single records.
Purpose
Basic settings
Set the column from the database that holds the recordset.
Attribute table
Set the position value of each column for single records from the
recordset.
Usage
This component is used as an intermediary component. It can be used as a start component, but in
that case only input parameters are allowed.
Limitation
This component is mainly designed for use with the SP component Recordset feature.
Related Scenario
For an example of tParseRecordSet in use, see section Scenario 2: Using PreparedStatement objects to query
data.
tPostgresPlusBulkExec
tPostgresPlusBulkExec
tPostgresPlusBulkExec properties
The tPostgresplusOutputBulk and tPostgresplusBulkExec components are generally used together as part of a
two step process. In the first step, an output file is generated. In the second step, this file is used in the INSERT
operation used to feed a database. These two steps are fused together in the tPostgresPlusOutputBulkExec
component, detailed in a separate section. The advantage of using two separate components is that the data can
be transformed before it is loaded in the database.
Component family
Databases/PostgresPlus
Function
Purpose
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is
available.
Built-in: No property data stored centrally.
Use an existing connection
Select this check box and in the Component List click the relevant connection
component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if you need
to share an existing connection between the two levels, for example,
to share the connection created by the parent Job with the child Job,
you have to:
1. In the parent level, register the database connection to be shared
in the Basic settings view of the connection component which
creates that very database connection.
2. In the child level, use a dedicated connection component to read
that registered database connection.
For an example about how to share a database connection across Job
levels, see Talend Studio User Guide.
DB Version
Host
Port
Database
Schema
Name of the table to be written. Note that only one table can be written at a time
Action on table
On the table defined, you can perform one of the following operations:
None: No operation is carried out.
Drop and create table: The table is removed and created again.
Create table: The table does not exist and gets created.
Create table if not exists: The table is created if it does not exist.
Drop table if exists and create: The table is removed if it already exists and
created again.
Schema and Edit Schema
A schema is a row description, i.e., it defines the number of fields to be
processed and passed on to the next component.
If you are using Talend Open Studio for Big Data, only the Built-in mode is
available.
Click Edit Schema to make changes to the schema.
Built-in: You create the schema and store it locally for this component only.
Related topic: see Talend Studio User Guide.
Advanced settings
Action
Dynamic settings
Field terminated by
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Click the [+] button to add a row in the table and fill the Code field with a context variable to choose
your database connection dynamically from multiple connections planned in your Job. This feature is
useful when you need to access database tables having the same data structure but in different databases,
especially when you are working in an environment where you cannot change your Job settings, for
example, when your Job has to be deployed and executed independent of Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is selected
in the Basic settings view. Once a dynamic parameter is defined, the Component List box in the Basic
settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
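As an illustration, the Code field of the dynamic parameter typically receives a context variable whose
run-time value names the connection component to use. The variable and component instance names below
are assumptions:

Code: context.pgConnection
Value of pgConnection in the test context: tPostgresPlusConnection_1
Value of pgConnection in the production context: tPostgresPlusConnection_2

Switching the execution context then switches the connection used by this component without editing the Job.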
Usage
This dedicated component offers performance and flexibility for PostgresPlus query handling.
Related scenarios
For tPostgresPlusBulkExec related topics, see:
section Scenario: Inserting transformed data in MySQL database.
section Scenario: Truncating and inserting file data into Oracle DB.
tPostgresPlusClose
tPostgresPlusClose
tPostgresPlusClose properties
Component family
Databases/Postgres
Function
Purpose
Close a transaction.
Basic settings
Component list
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
Limitation
n/a
Related scenario
No scenario is available for this component yet.
tPostgresPlusCommit
tPostgresPlusCommit
tPostgresPlusCommit Properties
This component is closely related to tPostgresPlusConnection and tPostgresPlusRollback. It usually does not
make much sense to use these components independently in a transaction.
Component family
Databases/PostgresPlus
Function
Validates the data processed through the Job into the connected DB.
Purpose
Using a unique connection, this component commits in one go a global transaction instead of doing
that on every row or every batch and thus provides gain in performance.
Basic settings
Component list
Close Connection
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component is to be used along with PostgresPlus components, especially with the
tPostgresPlusConnection and tPostgresPlusRollback components.
Limitation
n/a
Related scenario
This component is closely related to tPostgresPlusConnection and tPostgresPlusRollback. It usually doesn't
make much sense to use PostgresPlus components without using the tPostgresPlusConnection component to
open a connection for the current transaction.
For tPostgresPlusCommit related scenario, see section tMysqlConnection.
tPostgresPlusConnection
tPostgresPlusConnection
tPostgresPlusConnection Properties
This component is closely related to tPostgresPlusCommit and tPostgresPlusRollback. It usually doesn't make
much sense to use one of PostgresPlus components without using the tPostgresPlusConnection component to
open a connection for the current transaction.
Component family
Databases/PostgresPlus
Function
Purpose
This component allows you to commit all of the Job data to an output database in just a single
transaction, once the data has been validated.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
DB Version
Host
Port
Database
Schema
Use or register a shared DB Connection
Select this check box to share your connection or fetch a connection
shared by a parent or child Job. This allows you to share one single
DB connection among several DB connection components from
different Job levels that can be either parent or child.
This option is incompatible with the Use dynamic job and
Use an independent process to run subjob options of the
tRunJob component. Using a shared database connection
together with a tRunJob component with either of these
two options enabled will cause your Job to fail.
Shared DB Connection Name: set or type in the shared connection
name.
Advanced settings
Auto commit
tStatCatcher Statistics
Select this check box to gather the job processing metadata at a Job
level as well as at each component level.
Usage
This component is to be used along with PostgresPlus components, especially with the
tPostgresPlusCommit and tPostgresPlusRollback components.
Limitation
n/a
Related scenario
This component is closely related to tPostgresPlusCommit and tPostgresPlusRollback. It usually doesn't make
much sense to use one of PostgresPlus components without using the tPostgresPlusConnection component to
open a connection for the current transaction.
tPostgresPlusInput
tPostgresPlusInput
tPostgresPlusInput properties
Component family
Databases/PostgresPlus
Function
Purpose
tPostgresPlusInput executes a DB query with a strictly defined order which must correspond to the schema definition.
Then it passes on the field list to the next component via a Main row link.
Basic
settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Use an existing connection
Select this check box and in the Component List click the relevant connection component to reuse
the connection details you already defined.
When a Job contains the parent Job and the child Job, if you need to share an existing
connection between the two levels, for example, to share the connection created by the parent
Job with the child Job, you have to:
1. In the parent level, register the database connection to be shared in the Basic settings view
of the connection component which creates that very database connection.
2. In the child level, use a dedicated connection component to read that registered database
connection.
For an example about how to share a database connection across Job levels, see Talend Studio
User Guide.
DB Version
Host
Port
Database
Schema
Query type and Query
Enter your DB query, paying particular attention to properly sequence the fields in order to match
the schema definition.
Advanced
settings
Use cursor
When selected, helps to decide the row set to work with at a time and thus optimize performance.
Trim all the Select this check box to remove leading and trailing whitespace from all the String/Char columns.
String/Char
columns
Trim column
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic
settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to choose your database
connection dynamically from multiple connections planned in your Job. This feature is useful when you need to access
database tables having the same data structure but in different databases, especially when you are working in an
environment where you cannot change your Job settings, for example, when your Job has to be deployed and executed
independent of Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Global
Variables
NB_LINE: Indicates the number of rows read by an input component or transferred to an output component. This is
an After variable and it returns an integer.
QUERY: Indicates the query to be processed. This is a Flow variable and it returns a string.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable
to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable means it functions
after the execution of a component.
Usage
This component covers all possible SQL queries for Postgresql databases.
Related scenarios
For related scenarios, see:
section Scenario 1: Displaying selected data from DB table.
section Scenario 2: Using StoreSQLQuery variable.
tPostgresPlusOutput
tPostgresPlusOutput
tPostgresPlusOutput properties
Component
family
Databases/
PostgresPlus
Function
Purpose
tPostgresPlusOutput executes the action defined on the table and/or on the data contained in the table, based on
the flow incoming from the preceding component in the job.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Use an existing connection
Select this check box and in the Component List click the relevant connection component to
reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if you need to share an existing
connection between the two levels, for example, to share the connection created by
the parent Job with the child Job, you have to:
1. In the parent level, register the database connection to be shared in the Basic
settings view of the connection component which creates that very database
connection.
2. In the child level, use a dedicated connection component to read that registered
database connection.
For an example about how to share a database connection across Job levels, see
Talend Studio User Guide.
DB Version
Host
Port
Database
Schema
Username
Password
Table
Name of the table to be written. Note that only one table can be written at a time
Action on table
On the table defined, you can perform one of the following operations:
None: No operation is carried out.
Drop and create table: The table is removed and created again.
Create table: The table does not exist and gets created.
Create table if not exists: The table is created if it does not exist.
Drop table if exists and create: The table is removed if it already exists and created again.
Clear table: The table content is deleted.
Truncate table: The table content is deleted. You do not have the possibility to roll back the
operation.
Action on data
Advanced
settings
Die on error
This check box is selected by default. Clear the check box to skip the row on error and complete
the process for error-free rows. If needed, you can retrieve the rows on error via a Row >
Rejects link.
Commit every
Enter the number of rows to be completed before committing batches of rows together into
the DB. This option ensures transaction quality (but not rollback) and, above all, better
performance at execution.
Additional
Columns
This option is not offered if you create (with or without drop) the DB table. This option allows
you to call SQL functions to perform actions on columns, which are not insert, nor update or
delete actions, or actions that require particular preprocessing.
Name: Type in the name of the schema column to be altered or inserted as a new column.
SQL expression: Type in the SQL statement to be executed in order to alter or insert the
relevant column data.
Position: Select Before, Replace or After following the action to be performed on the
reference column.
Reference column: Type in a column of reference that the tDBOutput can use to place or
replace the new or altered column.
Select this check box to customize a request, especially when there is double action on data.
Debug query mode
Select this check box to display each step during processing entries in a database.
Support null in SQL WHERE statement
Select this check box if you want to deal with the Null values contained in a DB table.
Ensure that the Nullable check box is selected for the corresponding columns in the
schema.
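The reason this option matters is that, in standard SQL, a comparison with NULL never matches a row; only the
IS NULL form does. A minimal illustration with an invented table and column:

SELECT * FROM customer WHERE city = NULL    -- never returns a row
SELECT * FROM customer WHERE city IS NULL   -- returns the rows whose city is null

Selecting this check box is meant to let the generated WHERE clause handle null key values in this way,
provided the corresponding schema columns are nullable.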
Use batch size
Select this check box to activate the batch mode for data processing. In the Batch Size field
that appears when this check box is selected, you can type in the number you need to define
the batch size to be processed.
This check box is available only when you have selected the Insert, the Update or
the Delete option in the Action on data field.
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Global
Variables
NB_LINE: Indicates the number of rows read by an input component or transferred to an output component. This
is an After variable and it returns an integer.
NB_LINE_UPDATED: Indicates the number of rows updated. This is an After variable and it returns an integer.
NB_LINE_INSERTED: Indicates the number of rows inserted. This is an After variable and it returns an integer.
NB_LINE_DELETED: Indicates the number of rows deleted. This is an After variable and it returns an integer.
NB_LINE_REJECTED: Indicates the number of rows rejected. This is an After variable and it returns an integer.
QUERY: Indicates the query to be processed. This is an After variable and it returns a string.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable
to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable means it functions
after the execution of a component.
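For example, these counters can be read once the output subjob has completed, typically in a tJava component
triggered with OnSubjobOk. The sketch below assumes an instance named tPostgresPlusOutput_1:

// Read the After variables exposed by the output component through the globalMap.
Integer inserted = (Integer) globalMap.get("tPostgresPlusOutput_1_NB_LINE_INSERTED");
Integer updated = (Integer) globalMap.get("tPostgresPlusOutput_1_NB_LINE_UPDATED");
Integer rejected = (Integer) globalMap.get("tPostgresPlusOutput_1_NB_LINE_REJECTED");
System.out.println("inserted=" + inserted + ", updated=" + updated + ", rejected=" + rejected);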
Dynamic
settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to choose your database
connection dynamically from multiple connections planned in your Job. This feature is useful when you need to
access database tables having the same data structure but in different databases, especially when you are working
in an environment where you cannot change your Job settings, for example, when your Job has to be deployed and
executed independent of Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is selected in the
Basic settings view. Once a dynamic parameter is defined, the Component List box in the Basic settings view
becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component offers the flexibility benefit of the DB query and covers all of the SQL queries possible.
This component must be used as an output component. It allows you to carry out actions on a table or on the data of
a table in a PostgresPlus database. It also allows you to create a reject flow using a Row > Rejects link to filter data
in error. For an example of tMySqlOutput in use, see section Scenario 3: Retrieve data in error with a Reject link.
Related scenarios
For tPostgresPlusOutput related topics, see:
section Scenario: Writing a row to a table in the MySql database via an ODBC connection.
section Scenario 1: Adding a new column and altering data in a DB table.
tPostgresPlusOutputBulk
tPostgresPlusOutputBulk
tPostgresPlusOutputBulk properties
The tPostgresplusOutputBulk and tPostgresplusBulkExec components are generally used together as part of a
two step process. In the first step, an output file is generated. In the second step, this file is used in the INSERT
operation used to feed a database. These two steps are fused together in the tPostgresPlusOutputBulkExec
component, detailed in a separate section. The advantage of using two separate components is that the data can
be transformed before it is loaded in the database.
Component family
Databases/PostgresPlus
Function
Writes a file with columns based on the defined delimiter and the PostgresPlus standards
Purpose
Prepares the file to be used as parameter in the INSERT query to feed the PostgresPlus database.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
File Name
Append
Select this check box to add the new rows at the end of the file
Advanced settings
Global Variables
Row separator
Field separator
Include header
Select this check box to include the column header to the file.
Encoding
Select the encoding from the list or select Custom and define it
manually. This field is compulsory for DB data handling.
tStatCatcher Statistics
Select this check box to collect log data at the component level.
NB_LINE: Indicates the number of rows read by an input component or transferred to an output
component. This is an After variable and it returns an integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list
and choose the variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable
means it functions after the execution of a component.
Usage
Related scenarios
For use cases in relation with tPostgresplusOutputBulk, see the following scenarios:
section Scenario: Inserting transformed data in MySQL database.
section Scenario: Inserting data in MySQL database.
section Scenario: Truncating and inserting file data into Oracle DB.
tPostgresPlusOutputBulkExec
tPostgresPlusOutputBulkExec
tPostgresPlusOutputBulkExec properties
The tPostgresplusOutputBulk and tPostgresplusBulkExec components are generally used together as part of a
two step process. In the first step, an output file is generated. In the second step, this file is used in the INSERT
operation used to feed a database. These two steps are fused together in the tPostgresPlusOutputBulkExec
component.
Component family
Databases/PostgresPlus
Function
Purpose
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
DB Version
Host
Port
Database
Schema
Table
Name of the table to be written. Note that only one table can be
written at a time and that the table must exist for the insert operation
to succeed.
Action on table
File Name
Schema and Edit Schema
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Click Edit Schema to make changes to the schema.
Advanced settings
Action
File type
Null string
Row separator
Field terminated by
Text enclosure
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Usage
This component is mainly used when no particular transformation is required on the data to be
loaded onto the database.
Limitation
The database server must be installed on the same machine where the Studio is installed or
where the Job using tPostgresPlusOutputBulkExec is deployed, so that the component functions
properly.
Related scenarios
For use cases in relation with tPostgresPlusOutputBulkExec, see the following scenarios:
section Scenario: Inserting transformed data in MySQL database.
section Scenario: Inserting data in MySQL database.
section Scenario: Truncating and inserting file data into Oracle DB.
tPostgresPlusRollback
tPostgresPlusRollback
tPostgresPlusRollback properties
This component is closely related to tPostgresPlusCommit and tPostgresPlusConnection. It usually does not
make much sense to use these components independently in a transaction.
Component family
Databases/PostgresPlus
Function
Purpose
Basic settings
Component list
Close Connection
Clear this check box to continue to use the selected connection once
the component has performed its task.
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
Limitation
n/a
Related scenarios
For tPostgresPlusRollback related scenario, see section Scenario: Rollback from inserting data in mother/
daughter tables.
tPostgresPlusRow
tPostgresPlusRow
tPostgresPlusRow properties
Component
family
Databases/
Postgresplus
Function
tPostgresPlusRow is the specific component for the database query. It executes the SQL query stated onto the
specified database. The row suffix means the component implements a flow in the job design although it doesn't
provide output.
Purpose
Depending on the nature of the query and the database, tPostgresPlusRow acts on the actual DB structure or on the
data (although without handling data). The SQLBuilder tool helps you write easily your SQL statements.
Basic
settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Use an existing connection
Select this check box and in the Component List click the relevant connection component to reuse
the connection details you already defined.
When a Job contains the parent Job and the child Job, if you need to share an existing
connection between the two levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection to be shared in the Basic settings
view of the connection component which creates that very database connection.
2. In the child level, use a dedicated connection component to read that registered
database connection.
For an example about how to share a database connection across Job levels, see Talend
Studio User Guide.
DB Version
Host
Port
Database
Schema
Username
Password
Schema and Edit Schema
A schema is a row description, i.e., it defines the number of fields to be processed and passed on
to the next component.
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Click Edit Schema to make changes to the schema.
Built-in: The schema is created and stored locally for this component only. Related topic: see
Talend Studio User Guide.
Table name
Query type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: Fill in manually the query statement or build it graphically using SQLBuilder
Query
Enter your DB query, paying particular attention to properly sequence the fields in order to match
the schema definition.
Advanced
settings
Die on error
This check box is selected by default. Clear the check box to skip the row on error and complete the
process for error-free rows. If needed, you can retrieve the rows on error via a Row > Rejects link.
Propagate QUERY's recordset
Select this check box to insert the result of the query into a COLUMN of the current flow. Select
this column from the use column list.
This option allows the component to have a different schema from that of the preceding
component. Moreover, the column that holds the QUERY's recordset should be set to
the type of Object and this component is usually followed by tParseRecordSet.
Use PreparedStatement
Select this check box if you want to query the database using a PreparedStatement. In the Set
PreparedStatement Parameter table, define the parameters represented by "?" in the SQL
instruction of the Query field in the Basic Settings tab.
Parameter Index: Enter the parameter position in the SQL instruction.
Parameter Type: Enter the parameter type.
Parameter Value: Enter the parameter value.
This option is very useful if you need to execute the same query several times. Performance
levels are increased.
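As an illustration, a parameterized query and its matching Set PreparedStatement Parameter rows could look as
follows; the table, columns and values are invented for the example, and the parameter types shown are only
indicative of the entries offered in the type list:

Query: "UPDATE invoice SET status = ? WHERE invoice_id = ?"

Parameter Index: 1    Parameter Type: String    Parameter Value: "PAID"
Parameter Index: 2    Parameter Type: Int       Parameter Value: 42

Each "?" is replaced at execution time by the value whose Parameter Index matches its position in the statement.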
Commit every
Number of rows to be completed before committing batches of rows together into the DB.
This option ensures transaction quality (but not rollback) and above all better performance on
executions.
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic
settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to choose your database
connection dynamically from multiple connections planned in your Job. This feature is useful when you need to
access database tables having the same data structure but in different databases, especially when you are working
in an environment where you cannot change your Job settings, for example, when your Job has to be deployed and
executed independent of Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is selected in the Basic
settings view. Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Global
Variables
QUERY: Indicates the query to be processed. This is an After variable and it returns a string.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable
to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable means it functions
after the execution of a component.
Usage
This component offers the flexibility of the DB query and covers all possible SQL queries.
Related scenarios
For related topics, see:
section Scenario 3: Combining two flows for selective output
section Scenario: Resetting a DB auto-increment.
section Scenario 1: Removing and regenerating a MySQL table index.
tPostgresPlusSCD
tPostgresPlusSCD
tPostgresPlusSCD belongs to two component families: Business Intelligence and Databases. For more
information on it, see section tPostgresPlusSCD.
tPostgresPlusSCDELT
tPostgresPlusSCDELT
tPostgresPlusSCDELT belongs to two component families: Business Intelligence and Databases. For more
information on it, see section tPostgresPlusSCDELT.
tRiakBucketList
tRiakBucketList
tRiakBucketList belongs to two component families: Big Data and Databases. For more information about
tRiakBucketList, see section tRiakBucketList.
tRiakClose
tRiakClose
tRiakClose belongs to two component families: Big Data and Databases. For more information about tRiakClose,
see section tRiakClose.
tRiakConnection
tRiakConnection
tRiakConnection belongs to two component families: Big Data and Databases. For more information about
tRiakConnection, see section tRiakConnection.
tRiakInput
tRiakInput
tRiakInput belongs to two component families: Big Data and Databases. For more information about tRiakInput,
see section tRiakInput.
tRiakKeyList
tRiakKeyList
tRiakKeyList belongs to two component families: Big Data and Databases. For more information about
tRiakKeyList, see section tRiakKeyList.
tRiakOutput
tRiakOutput
tRiakOutput belongs to two component families: Big Data and Databases. For more information about
tRiakOutput, see section tRiakOutput.
tSAPHanaClose
tSAPHanaClose
tSAPHanaClose properties
Component family
Function
Purpose
Close a transaction.
Basic settings
Component list
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Usage
This component is to be used along with SAP Hana components, especially with
tSAPHanaConnection and tSAPHanaCommit.
Limitation
n/a
Related scenario
No scenario is available for this component yet.
tSAPHanaCommit
tSAPHanaCommit
tSAPHanaCommit Properties
Component family
Function
tSAPHanaCommit validates the data processed through the Job into the connected database.
Purpose
Using a unique connection, this component commits in one go a global transaction instead of doing
that on every row or every batch and thus provides gain in performance.
Basic settings
Component list
Close Connection
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Usage
This component is to be used along with other SAP Hana components, especially with
tSAPHanaConnection and tSAPHanaRollback. It usually does not make much sense to use these
components independently in a transaction or without using a tSAPHanaConnection component
to open a connection for the current transaction.
Use this component if the Auto Commit option of the tSAPHanaConnection component is
cleared.
Limitation
n/a
Related scenario
No scenario is available for this component yet.
tSAPHanaConnection
tSAPHanaConnection
tSAPHanaConnection properties
Component family
Function
Purpose
This component allows you to establish a SAP Hana connection to be reused by other SAP Hana
components in your Job.
Basic settings
DB Version
Select the SAP Hana Database (HDB) version you are using.
Host
Port
Table Schema
Additional
Parameters
Advanced settings
Auto commit
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at a Job
level as well as at each component level.
Usage
This component is to be used along with other SAP Hana components, especially with
tSAPHanaClose and tSAPHanaRollback. It usually does not make much sense to use these
components independently in a transaction or without using a tSAPHanaClose component to close
a connection for the current transaction.
Limitation
n/a
Related scenario
No scenario is available for this component yet.
tSAPHanaInput
tSAPHanaInput
tSAPHanaInput Properties
Component family
Function
Purpose
tSAPHanaInput executes a database query with a defined command which must correspond to
the schema definition. Then it passes on rows to the next component via a Main row link.
Basic settings
Use an existing connection
Select this check box and in the Component List click the relevant
connection component to reuse the connection details you already
defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
DB Version
Select the SAP Hana Database (HDB) version you are using.
Host
Port
Schema
Table Name
Name of the table to be written. Note that only one table can be
written at a time.
Query Type
.
If you are using Talend Open Studio for Big Data, only the Builtin mode is available.
Built-in: Fill in manually the query statement or build it graphically
using SQLBuilder.
Guess Query
Guess schema
Query
Advanced settings
Additional
Parameters
Trim all the String/Char columns
Select this check box to remove leading and trailing whitespaces
from all the String/Char columns.
Trim column
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is
selected in the Basic settings view. Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component is standalone as it includes the SAP Hana engine. This is a startable component
that can initiate a data flow processing.
Related scenario
No scenario is available for this component yet.
tSAPHanaOutput
tSAPHanaOutput
tSAPHanaOutput Properties
Component family
Function
tSAPHanaOutput writes, updates, makes changes or suppresses entries in a SAP Hana database.
Purpose
tSAPHanaOutput executes the action defined on the table and/or on the data contained in the
table, based on the flow incoming from the preceding component in the Job.
Basic settings
Use an existing connection
Select this check box and in the Component List click the relevant
connection component to reuse the connection details you already
defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
DB Version
Select the SAP Hana Database (HDB) version you are using.
Host
Port
Table Schema
Table
Name of the table to be written. Note that only one table can be
written at a time.
Action on table
This option is only available if you create (with or without drop) the
database table. This option allows you to define the way the data is
stored in the table. The following types of table storage organization
are available:
Row: Data is stored in rows. It is preferable to use this table type if
the majority of table access involves selecting a few records, with
all attributes selected.
Die on error
This check box is cleared by default. This means that Die on error
skips the row when an error is encountered and completes the process
for rows without errors.
Advanced settings
Additional JDBC Parameters
Specify additional connection properties in the database connection
you are creating. This option is not available if you have selected the
Use an existing connection check box in the Basic settings.
Commit every
Additional Columns
Select this check box to display each step during processing entries
in a database.
Support null in SQL WHERE statement
Select this check box to validate null in SQL WHERE statement.
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is
selected in the Basic settings view. Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component must be connected to an Input component. It allows you to carry out actions on
a table or on the data of a table in an SAP Hana database. It also allows you to create reject flows
using a Row > Reject link to filter erroneous data.
Related scenario
No scenario is available for this component yet.
tSAPHanaRollback
tSAPHanaRollback
tSAPHanaRollback properties
Component family
Function
Purpose
Basic settings
Component list
Close Connection
Clear this check box to continue to use the selected connection once
the component has performed its task.
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Usage
This component is to be used along with SAP Hana components, especially with
tSAPHanaConnection and tSAPHanaCommit. It usually does not make much sense to use these
components independently in a transaction.
Limitation
n/a
Related scenario
No scenario is available for this component yet.
tSAPHanaRow
tSAPHanaRow
tSAPHanaRow Properties
Component family
Function
tSAPHanaRow is the specific component for this database query. It executes the SQL query stated
onto the specified database. The row suffix means the component implements a flow in the Job
design although it does not provide output.
Purpose
Depending on the nature of the query and the database, tSAPHanaRow acts on the actual database
structure or on the data (although without handling data). The SQLBuilder tool helps you write
easily your SQL statements.
Basic settings
Use an existing connection
Select this check box and in the Component List click the relevant
connection component to reuse the connection details you already
defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
DB Version
Select the SAP Hana Database (HDB) version you are using.
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Host
Port
Table Name
Name of the table to be written. Note that only one table can be
written at a time.
Query Type
Advanced settings
Guess Query
Query
Die on error
This check box is cleared by default. This means that Die on error
skips the row when an error is encountered and completes the process
for rows without errors.
Additional
Parameters
Propagate QUERY's recordset
Commit every
Use PreparedStatement
Select this check box if you want to query the database using
a PreparedStatement. In the Set PreparedStatement Parameter
table, define the parameters represented by ? in the SQL instruction
of the Query field in the Basic Settings tab.
Parameter Index: Enter the parameter position in the SQL
instruction.
Parameter Type: Enter the parameter type.
Parameter Value: Enter the parameter value.
This option is very useful if you need to execute the same
query several times. Performance levels are increased.
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Usage
This component offers the flexibility of the database query and covers all possible SQL queries.
Related scenario
No scenario is available for this component yet.
tSasInput
tSasInput
Before being able to benefit from all functional objectives of the SAS components, make sure to install the following three
modules: sas.core.jar, sas.intrnet.javatools.jar and sas.svc.connection.jar in the path lib > java in your Talend Studio
directory. You can later verify, if needed, whether the modules are successfully installed through the Modules view of the
Studio.
tSasInput properties
Component family
Databases/SAS
Function
Purpose
tSasInput executes a DB query with a strictly defined statement which must correspond to the
schema definition. Then it passes on the field list to the component that follows via a Row > Main
connection.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Host name
Port
Librefs
Enter the directory name that holds the table to read followed by its
access path. For example:
TpSas C:/SAS/TpSas
Table Name
Enter the name of the table to read preceded by the directory name
that holds it. For example: TpSas.Customers.
Query type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Query
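Building on the Librefs and Table Name examples above, the Query field could contain a statement such as the
following; the column names are invented for the example:

"SELECT id, name, city FROM TpSas.Customers"

As for the other input components, the fields selected in the query must be sequenced to match the schema
defined on the component.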
Advanced settings
tStatCatcher Statistics
Select this check box to gather the job processing metadata at a Job
level as well as at each component level.
Usage
This component covers all possible SQL queries for databases using SAS connections.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For
details, see the section about external modules in the Talend Installation and Upgrade Guide.
Related scenarios
For related topics, see:
section Scenario 1: Displaying selected data from DB table.
section Scenario 2: Using StoreSQLQuery variable.
section Scenario: Reading data from different MySQL databases using dynamically loaded connection
parameters.
tSasOutput
tSasOutput
Before being able to benefit from all functional objectives of the SAS components, make sure to install the following three
modules: sas.core.jar, sas.intrnet.javatools.jar and sas.svc.connection.jar in the path lib > java in your Talend Studio
directory. You can later verify, if needed, whether the modules are successfully installed through the Modules view of the
Studio.
tSasOutput properties
Component family
Databases/SAS
Function
Purpose
tSasOutput executes the action defined on the table and/or on the data contained in the table, based
on the incoming flow from the preceding component in the Job.
Basic settings
Use an existing connection
Select this check box and in the Component List click the relevant
connection component to reuse the connection details you already
defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
SAS URL
Driver JAR
Class Name
Table
Action on data
Select this check box to delete data in the selected table before any
operation.
Advanced settings
Die on error
This check box is selected by default. Clear the check box to skip the
row on error and complete the process for error-free rows. If needed,
you can retrieve the rows on error via a Row > Rejects link.
Commit every
Additional Columns
This option is not offered if you create (with or without drop) the DB table. This option allows you to call SQL functions to perform actions on columns, which are not insert, update or delete actions, or actions that require particular preprocessing.
Name: Type in the name of the schema column to be altered or
inserted as a new column.
SQL expression: Type in the SQL statement to be executed in order
to alter or insert the relevant column data.
Position: Select Before, Replace or After following the action to be
performed on the reference column.
Reference column: Type in a column of reference that the
tSasOutput can use to place or replace the new or altered column.
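For illustration, a hypothetical Additional Columns entry (the column names below are invented for this sketch, not taken from an actual scenario) could look like this:

Name: name_upper
SQL expression: "upper(last_name)"
Position: After
Reference column: last_name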
Debug query mode
Select this check box to display each step during processing entries in a database.
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Usage
This component offers the flexibility of the DB query and covers all possible SQL queries.
This component must be used as an output component. It allows you to carry out actions on a
table or on the data of a table in a SAS database. It also allows you to create a reject flow using a
Row > Rejects link to filter data in error. For an example of tMySQLOutput in use, see section
Scenario 3: Retrieve data in error with a Reject link.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For
details, see the section about external modules in the Talend Installation and Upgrade Guide.
Related scenarios
For scenarios in which tSasOutput might be used, see:
section Scenario: Writing a row to a table in the MySql database via an ODBC connection.
tSQLiteClose
tSQLiteClose properties
Component family
Databases/SQLite
Function
Purpose
Close a transaction.
Basic settings
Component list
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
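For example, assuming your Job defines a context variable named sqliteConnection (a hypothetical name) that holds the name of the connection component to use, you would enter context.sqliteConnection in the Code field so that the connection is resolved at runtime.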
Usage
This component is to be used along with SQLite components, especially with tSQLiteConnection
and tSQLiteCommit.
Limitation
n/a
Related scenario
No scenario is available for this component yet.
tSQLiteCommit
tSQLiteCommit Properties
This component is closely related to tSQLiteConnection and tSQLiteRollback. It usually does not make much
sense to use these components independently in a transaction.
Component family
Databases/SQLite
Function
tSQLiteCommit validates the data processed through the Job into the connected DB.
Purpose
Using a unique connection, this component commits a global transaction in one go, instead of committing on every row or every batch, and thus provides a gain in performance.
Basic settings
Component list
Close Connection
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component is to be used along with SQLite components, especially with tSQLiteConnection
and tSQLiteRollback.
Limitation
n/a
Related scenario
This component is closely related to tSQLiteConnection and tSQLiteRollback. It usually does not make much
sense to use one of these without using a tSQLiteConnection component to open a connection for the current
transaction.
For tSQLiteCommit related scenario, see section Scenario: Inserting data in mother/daughter tables.
tSQLiteConnection
tSQLiteConnection properties
This component is closely related to tSQLiteCommit and tSQLiteRollback. It usually does not make much sense
to use one of these without using a tSQLiteConnection to open a connection for the current transaction.
Component family
Databases/SQLite
Function
Purpose
This component allows you to commit all of the Job data to an output database in just a single
transaction, once the data has been validated.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Database
Use or register a shared DB Connection
Select this check box to share your connection or fetch a connection shared by a parent or child Job. This allows you to share one single DB connection among several DB connection components from different Job levels that can be either parent or child.
This option is incompatible with the Use dynamic job and Use an independent process to run subjob options of the tRunJob component. Using a shared database connection together with a tRunJob component with either of these two options enabled will cause your Job to fail.
Shared DB Connection Name: set or type in the shared connection name.
Advanced settings
Auto commit
tStatCatcher Statistics
Select this check box to gather the job processing metadata at a Job
level as well as at each component level.
Usage
This component is to be used along with SQLite components, especially with tSQLiteCommit
and tSQLiteRollback.
Limitation
n/a
Related scenarios
This component is closely related to tSQLiteCommit and tSQLiteRollback. It usually does not make much sense
to use one of these without using a tSQLiteConnection component to open a connection for the current transaction.
For tSQLiteConnection related scenario, see section tMysqlConnection.
tSQLiteInput
tSQLiteInput Properties
Component family
Databases
Function
tSQLiteInput reads a database file and extracts fields based on an SQL query. As it embeds the SQLite engine, there is no need to connect to any database server.
Purpose
tSQLiteInput executes a DB query with a defined command which must correspond to the schema
definition. Then it passes on rows to the next component via a Main row link.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Use an existing connection
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
Database
Query type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Query
Advanced settings
Dynamic settings
Trim all the String/Char columns
Select this check box to remove leading and trailing whitespace from all the String/Char columns.
Trim column
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is
selected in the Basic settings view. Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component is standalone as it includes the SQLite engine. It is a startable component that can initiate data flow processing.
Drop a tSQLiteInput and a tSQLiteOutput component from the Palette to the design workspace.
Connect the input to the output using a Row > Main link.
On the tSQLiteInput Basic settings, type in or browse to the SQLite Database input file.
The file contains hundreds of lines and includes an ip column on which the select statement will be based.
On the tSQLiteInput Basic settings, edit the schema for it to match the table structure.
In the Query field, type in your select statement based on the ip column.
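A minimal sketch of such a statement, assuming a hypothetical table named traffic holding the rows:

select * from traffic where ip like '192.168.%'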
On the tSQLiteOutput component Basic settings panel, select the Database filepath.
tSQLiteOutput
tSQLiteOutput Properties
Component family
Databases
Function
Purpose
tSQLiteOutput executes the action defined on the table and/or on the data contained in the table,
based on the flow incoming from the preceding component in the job.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Use an existing connection
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
Database
Table
Name of the table to be written. Note that only one table can be written at a time.
Action on table
Action on data
Insert or update: inserts a new record. If the record with the given
reference already exists, an update would be made.
Update or insert: updates the record with the given reference. If the
record does not exist, a new record would be inserted.
Delete: Remove entries corresponding to the input flow.
It is necessary to specify at least one column as a primary
key on which the Update and Delete operations are based.
You can do that by clicking Edit Schema and selecting
the check box(es) next to the column(s) you want to
set as primary key(s). For an advanced use, click the
Advanced settings view where you can simultaneously
define primary keys for the Update and Delete operations.
To do that: Select the Use field options check box and then
in the Key in update column, select the check boxes next to
the column names you want to use as a base for the Update
operation. Do the same in the Key in delete column for the
Delete operation.
Schema and Edit Schema
Advanced settings
Die on error
This check box is selected by default. Clear the check box to skip the
row on error and complete the process for error-free rows. If needed,
you can retrieve the rows on error via a Row > Rejects link.
Commit every
Additional Columns
This option is not offered if you create (with or without drop) the DB table. This option allows you to call SQL functions to perform actions on columns, which are not insert, update or delete actions, or actions that require particular preprocessing.
Name: Type in the name of the schema column to be altered or inserted as a new column.
SQL expression: Type in the SQL statement to be executed in order
to alter or insert the relevant column data.
Position: Select Before, Replace or After following the action to be
performed on the reference column.
Reference column: Type in a column of reference that the
tDBOutput can use to place or replace the new or altered column.
Debug query mode
Select this check box to display each step during processing entries in a database.
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is
selected in the Basic settings view. Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component must be connected to an Input component. It allows you to carry out actions on a
table or on the data of a table in an SQLite database. It also allows you to create reject flows using
a Row > Reject link to filter erroneous data. For an example of tSQLiteOutput in use, see section
Scenario 3: Retrieve data in error with a Reject link.
Related Scenario
For scenarios related to tSQLiteOutput, see section Scenario 3: Retrieve data in error with a Reject link.
tSQLiteRollback
tSQLiteRollback properties
This component is closely related to tSQLiteCommit and tSQLiteConnection. It usually does not make much
sense to use these components independently in a transaction.
Component family
Databases/SQLite
Function
Purpose
Basic settings
Component list
Close Connection
Clear this check box to continue to use the selected connection once
the component has performed its task.
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component is to be used along with SQLite components, especially with tSQLiteConnection
and tSQLiteCommit.
Limitation
n/a
Related scenarios
For tSQLiteRollback related scenario, see section Scenario: Rollback from inserting data in mother/daughter
tables.
tSQLiteRow
tSQLiteRow Properties
Component family
Databases
Function
tSQLiteRow executes the defined query on the specified database and uses the parameters bound with the columns.
Purpose
A prepared statement uses the input flow to replace the placeholders with the values defined for each parameter. This component can be very useful for updates.
Basic settings
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Use an existing connection
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
Query type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: Fill in manually the query statement or build it graphically using SQLBuilder.
Query
Advanced settings
Die on error
Clear this check box to skip the row on error and complete the
process for error-free rows.
Propagate QUERY's recordset
Select this check box to insert the result of the query into a COLUMN of the current flow. Select this column from the use column list.
This option allows the component to have a different schema from that of the preceding component. Moreover, the column that holds the QUERY's recordset should be set to the type Object.
Commit every
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is
selected in the Basic settings view. Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component offers the flexibility of the DB query and covers all possible SQL queries.
Drop a tFileInputDelimited and a tSQLiteRow component from the Palette to the design workspace.
On the tFileInputDelimited Basic settings panel, browse to the input file that will be used to update rows in
the database.
There is no Header nor Footer. The Row separator is a carriage return and the Field separator is a semi-colon.
Click the [...] button next to Edit schema and define the schema structure.
Make sure the type of each column is correct and its length large enough for the data.
Then in the tSQLiteRow Basic settings panel, set the Database filepath to the file to be updated.
In the Input parameters table, add as many lines as necessary to cover all placeholders. In this scenario, type_os
and id are to be defined.
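For illustration, the prepared statement could take the following shape (a sketch: the download table name is the one mentioned at the end of this scenario, and the exact statement depends on your data):

update download set type_os=? where id=?

The question marks are the placeholders; the Input parameters table binds type_os and id to them, in order, for each incoming row.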
Set the Commit every field.
Save the Job and press F6 to run it.
The download table from the SQLite database is thus updated with new type_os code according to the delimited
input file.
Related scenarios
For a related scenario, see:
section Scenario 3: Combining two flows for selective output
DotNET components
This chapter details the main components which you can find in the DotNET family of the Palette in the
Integration perspective of Talend Studio.
The DotNET family groups together components that integrate with .NET objects.
For Talend Open Studio for Big Data, the Property type, Schema and Query Type of components are always Built-in. For
further information about how to edit a Built-in schema, see Talend Studio User Guide.
tDotNETInstantiate
tDotNETInstantiate properties
Component family
DotNET
Function
Purpose
tDotNETInstantiate invokes the constructor of a .NET object that is intended for later reuse.
Basic settings
Dll to load
Fully qualified class name (i.e. ClassLibrary1.NameSpace2.Class1)
Enter a fully qualified name for the class of interest.
Value(s) to pass to the constructor
Click the plus button to add one or more values to be passed to the
constructor for the object. Or, leave this table empty to call a default
constructor for the object.
The valid value(s) should be the parameters required by the class to
be used.
Advanced settings
tStatCatcher Statistics
Usage
Select this check box to collect log data at the component level.
To use this component, you must first install the runtime DLLs, for example janet-win32.dll for Windows
32-bit version and janet-win64.dll for Windows 64-bit version, from the corresponding Microsoft Visual
C++ Redistributable Package. This allows you to avoid errors like the UnsatisfiedLinkError on dependent
DLL.
So ensure that the runtime and all of the other DLLs which the DLL to be called depends on are installed
and their versions are consistent among one another.
The required DLLs can be installed in the System32 folder or in the bin folder of the Java runtime
to be used.
If you need to export a Job using this component to run it outside the Studio, you have to specify
the runtime container of interest by setting the -Djava.library.path argument accordingly.
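For example, you could add something like -Djava.library.path="C:/Windows/System32" (an illustrative path; point it to the folder that actually holds the runtime DLLs) to the JVM arguments of the exported Job.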
Related scenario
For a related scenario, see section Scenario: Utilizing .NET in Talend.
tDotNETRow
tDotNETRow properties
Component family
DotNET
Function
tDotNETRow sends data to and from libraries and classes within .NET or other custom DLL files.
Purpose
tDotNETRow helps you facilitate data transformation by utilizing custom or built-in .NET classes.
Basic settings
Select this check box to invoke a static method in .NET; this will disable the Use an existing instance check box.
Dll to load
Fully qualified class name (i.e. ClassLibrary1.NameSpace2.Class1)
Enter a fully qualified name for the class of interest.
Method name
Fill this field with the name of the method to be invoked in .NET.
Click the plus button to add one or more lines for values to be passed
to the constructor for the object. Or, leave this table empty to call a
default constructor for the object.
The valid value(s) should be the parameters required by the class to
be used.
Advanced settings
Method Parameters
Click the plus button to add one or more lines for parameters to be
passed to the method.
Select a column in the output row from the list to put value into it.
Select this check box to create a new instance at each row that passes
through the component.
Returns an instance of a .NET Object
Select this check box to return an instance of a .NET object as the result of an invoked method.
Store the returned value for later use
Select this check box to store the returned value of a method for later reuse in another tDotNETRow component.
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Usage
Scenario: Utilizing .NET in Talend
Prerequisites
Before replicating this scenario, you first need to build up your runtime environment.
Create the DLL to be loaded by tDotNETInstantiate
This example class built into .NET reads as follows:
using System;
using System.Collections.Generic;
using System.Text;

namespace Test1
{
    public class Class1
    {
        string s = null;

        public Class1(string s)
        {
            this.s = s;
        }

        public string getValue()
        {
            return "Return Value from Class1: " + s;
        }
    }
}
This class reads the input value and adds the text Return Value from Class1: in front of this value. It is compiled
using the latest .NET.
Install the runtime DLL from the latest .NET. In this scenario, we use janet-win32.dll on Windows 32-bit version
and place it in the System32 folder.
Thus the runtime DLL is compatible with the DLL to be loaded.
Connecting components
1.
Drop the following components from the Palette to the design workspace: tDotNETInstantiate,
tDotNETRow and tLogRow.
2.
3.
Configuring tDotNETInstantiate
1.
Double-click tDotNETInstantiate to display its Basic settings view and define the component properties.
2.
Click the three-dot button next to the Dll to load field and browse to the DLL file to be loaded. Alternatively, you can fill the field with an assembly. In this example, we use:
"C:/Program Files/ClassLibrary1/bin/Debug/ClassLibrary1.dll"
3.
Fill the Fully qualified class name field with a valid class name to be used. In this example, we use:
"Test1.Class1"
4.
Click the plus button beneath the Value(s) to pass to the constructor table to add a new line for the value
to be passed to the constructor.
Configuring tDotNETRow
1.
Double-click tDotNETRow to display its Basic settings view and define the component properties.
2.
3.
Select Use an existing instance check box and select tDotNETInstantiate_1 from the Existing instance
to use list on the right.
4.
Fill the Method Name field with a method name to be used. In this example, we use "getValue", a custom
method.
5.
Click the three-dot button next to Edit schema to add one column to the schema.
Click the plus button beneath the table to add a new column to the schema and click OK to save the setting.
6.
Configuring tLogRow
1.
Double-click tLogRow to display its Basic settings view and define the component properties.
2.
Click Sync columns button to retrieve the schema defined in the preceding component.
3.
From the result, you can read that the text Return Value from Class1 is added in front of the retrieved value
Hello world.
ELT components
This chapter details the main components that you can find in the ELT family of the Palette in the Integration
perspective of Talend Studio.
The ELT family groups together the most popular database connectors and processing components, all dedicated
to the ELT mode where the target DBMS becomes the transformation engine.
This mode supports all of the most popular databases including Teradata, Oracle, Vertica, Netezza, Sybase, etc.
For Talend Open Studio for Big Data, the Property type, Schema and Query Type of components are always Built-in. For
further information about how to edit a Built-in schema, see Talend Studio User Guide.
tAccessConnection
tAccessConnection belongs to two component families: Databases and ELT. For more information on it, see
section tAccessConnection.
tAS400Connection
tAS400Connection belongs to two component families: Databases and ELT. For more information on it, see
section tAS400Connection.
tCombinedSQLAggregate
tCombinedSQLAggregate properties
Component family
ELT/CombinedSQL
Function
Purpose
Basic settings
Group by
Define the aggregation sets, the values of which will be used for
calculations.
Output Column: Select the column label in the list offered
according to the schema structure you defined. You can add
as many output columns as you wish to make more precise
aggregations.
Input Column: Select the input column label to match the
output columns expected content, in case the output label of the
aggregation set needs to be different.
Operations
Select the type of operation along with the value to use for the
calculation and the output field.
Output Column: Select the destination field in the list.
Function: Select any of the following operations to perform on
data: count, min, max, avg, sum, first, last, distinct and count
(distinct).
Input column: Select the input column from which you want to
collect the values to be aggregated.
Advanced settings
tStatCatcher Statistics
Usage
This component is an intermediary component. The use of the corresponding connection and
commit components is recommended when using this component to allow a unique connection
to be open and then closed during the Job execution.
Limitation
n/a
Scenario: Filtering and aggregating table columns directly on the DBMS
In the design workspace, select tMysqlConnection and click the Component tab to define its basic settings.
In the Basic settings view, set the database connection details manually.
In the design workspace, select tCombinedSQLInput and click the Component tab to access the configuration
panel.
Enter the source table name in the Table field, and click the three-dot button next to Edit schema to define
the data structure.
The schema defined through tCombinedSQLInput can be different from that of the source table as you can just instantiate
the desired columns of the source table. Therefore, tCombinedSQLInput also plays a role of column filtering.
In this scenario, the source database table has seven columns: id, first_name, last_name, city, state, date_of_birth,
and salary while tCombinedSQLInput only instantiates four columns that are needed for the aggregation: id,
state, date_of_birth, and salary from the source table.
In the design workspace, select tCombinedSQLFilter and click the Component tab to access the configuration
panel.
Click the Sync columns button to retrieve the schema from the previous component, or configure the schema
manually by selecting Built-in from the Schema list and clicking the [...] button next to Edit schema.
When you define the data structure for tCombinedSQLFilter, column names automatically appear in the Input column
list in the Conditions table.
In this scenario, the tCombinedSQLFilter component instantiates four columns: id, state, date_of_birth, and
salary.
In the Conditions table, set input parameters, operators and expected values in order to only extract the records
that fulfill these criteria.
In this scenario, the tCombinedSQLFilter component filters the state and date_of_birth columns in the source
table to extract the employees who were born after Oct. 19, 1960 and who live in the states Utah, Ohio and Iowa.
Select And in the Logical operator between conditions list to apply the two conditions at the same time. You
can also customize the conditions by selecting the Use custom SQL box and editing the conditions in the code
box.
In the design workspace, select tCombinedSQLAggregate and click the Component tab to access the
configuration panel.
Click the Sync columns button to retrieve the schema from the previous component, or configure the schema
manually by selecting Built-in from the Schema list and clicking on the [...] button.
The tCombinedSQLAggregate component instantiates four columns: id, state, date_of_birth, and salary, coming
from the previous component.
The Group by table helps you define the data sets to be processed based on a defined column. In this example:
State.
In the Group by table, click the [+] button to add one line.
In the Output column drop-down list, select State. This column will be used to hold the data filtered on State.
The Operations table helps you define the type of aggregation operations to be performed. The Output column list available depends on the schema you want to output (through the tCombinedSQLOutput component). In this scenario, we want to group employees based on the state they live in. We then want to count the number of employees per state and calculate the average/lowest/highest salaries, as well as find the oldest/youngest employees for each state.
In the Operations table, click the [+] button to add one line and then click in the Output column list to select
the output column that will hold the computed data.
In the Function field, select the relevant operation to be carried out.
In the design workspace, select tCombinedSQLOutput and click the Component tab to access the
configuration panel.
Click the three-dot button next to Edit schema to define the data structure of the target table.
In this scenario, tCombinedSQLOutput instantiates seven columns coming from the previous component in the
Job design (tCombinedSQLAggregate): state, empl_count, avg_salary, min_salary, max_salary, oldest_empl
and youngest_empl.
In the design workspace, select tCombinedSQLCommit and click the Component tab to access the
configuration panel.
On the Component list, select the relevant database connection component if more than one connection is used.
Save your Job and press F6 to execute it.
Rows are inserted into a seven-column table empl_by_state in the database. The table shows, per defined state,
the number of employees, the average salary, the lowest and highest salaries as well as the oldest and youngest
employees.
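The statement generated by this Job corresponds roughly to the following SQL (a sketch only: the source table name employees is assumed here, and the actual statement is built by the components from your settings):

insert into empl_by_state (state, empl_count, avg_salary, min_salary, max_salary, oldest_empl, youngest_empl)
select state, count(id), avg(salary), min(salary), max(salary), min(date_of_birth), max(date_of_birth)
from employees
where state in ('Utah', 'Ohio', 'Iowa') and date_of_birth > '1960-10-19'
group by state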
tCombinedSQLFilter
tCombinedSQLFilter Properties
Component family
ELT/CombinedSQL
Function
tCombinedSQLFilter allows you to alter the schema of a source table through column name
mapping and to define a row filter on that table. Therefore, it can be used to filter columns and
rows at the same time. This component has real-time capabilities since it runs the data filtering
on the DBMS itself.
Purpose
Helps to filter data by reorganizing, deleting or adding columns based on the source table and
to filter the given data source using the filter conditions.
Basic settings
Logical operator between conditions
Select the logical operator between the filter conditions defined in the Conditions panel.
Two operators are available: Or, And.
Conditions
Select the type of WHERE clause along with the values and the
columns to use for row filtering.
Input Column: Select the column to filter in the list.
Operator: Select the type of the WHERE clause: =, < >, >, <, >=,
<=, LIKE, IN, NOT IN, and EXIST IN.
Values: Type in the values to be used in the WHERE clause.
Negate: Select this check box to enable the condition that is
opposite to the current setting.
Advanced settings
tStatCatcher Statistics
Usage
This component is an intermediary component. The use of the corresponding connection and
commit components is recommended when using this component to allow a unique connection
to be open and then closed during the Job execution.
Limitation
n/a
Related Scenario
For a related scenario, see section Scenario: Filtering and aggregating table columns directly on the DBMS.
tCombinedSQLInput
tCombinedSQLInput properties
Component family
ELT/CombinedSQL
Function
tCombinedSQLInput extracts fields from a database table based on its schema. This component
also has column filtering capabilities since its schema can be different from that of the database
table.
Purpose
tCombinedSQLInput extracts fields from a database table based on its schema definition.
Then it passes on the field list to the next component via a Combine row link. The schema
of tCombinedSQLInput can be different from that of the source database table but must
correspond to it in terms of the column order.
Basic settings
Table
Schema
Advanced settings
tStatCatcher Statistics
Usage
This component is an intermediary component. The use of the corresponding connection and
commit components is recommended when using this component to allow a unique connection
to be open and then closed during the Job execution.
Limitation
n/a
Related scenario
For a related scenario, see section Scenario: Filtering and aggregating table columns directly on the DBMS.
tCombinedSQLOutput
tCombinedSQLOutput properties
Component family
ELT/CombinedSQL
Function
Purpose
tCombinedSQLOutput inserts records from the incoming flow to an existing database table.
Basic settings
Database Type
Component list
Table
Schema
Action on data
Select INSERT from the list to insert the records from the
incoming flow to the target database table.
Advanced settings
tStatCatcher Statistics
Usage
This component is an intermediary component. The use of the corresponding connection and
commit components is recommended when using this component to allow a unique connection
to be open and then closed during the Job execution.
Limitation
n/a
Related scenario
For a related scenario, see section Scenario: Filtering and aggregating table columns directly on the DBMS.
tDB2Connection
tDB2Connection belongs to two component families: Databases and ELT. For more information on it, see section
tDB2Connection.
tELTGreenplumInput
tELTGreenplumInput properties
The three ELT Greenplum components are closely related, in terms of their operating conditions. These
components should be used to handle Greenplum DB schemas to generate Insert statements, including clauses,
which are to be executed in the DB output table defined.
Component family
ELT/Map/Greenplum
Function
Provides the table schema to be used for the SQL statement to execute.
Purpose
Allows you to add as many Input tables as required for the most complicated Insert statement.
Basic settings
Advanced settings
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at a Job
level as well as at each component level.
Usage
tELTGreenplumInput is to be used along with tELTGreenplumMap. Note that the Output link to be used with these components must correspond strictly to the syntax of the table name.
Note that the ELT components do not handle actual data flow but only schema information.
Related scenarios
For use cases in relation with tELTGreenplumInput, see:
section Scenario: Mapping data using a simple implicit join
section Scenario 1: Aggregating table columns and filtering
section Scenario 2: ELT using an Alias table
tELTGreenplumMap
tELTGreenplumMap properties
The three ELT Greenplum components are closely related, in terms of their operating conditions. These
components should be used to handle Greenplum DB schemas to generate Insert statements, including clauses,
which are to be executed in the DB output table defined.
Component family
ELT/Map/Greenplum
Function
Helps you to build the SQL statement graphically, using the table provided as input.
Purpose
Uses the tables provided as input, to feed the parameter in the built statement. The statement can
include inner or outer joins to be implemented between tables or between one table and its aliases.
Basic settings
Use an existing connection
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
ELT Greenplum Map Editor
The ELT Map editor allows you to define the output schema and make a graphical build of the SQL statement to be executed. The column names of the schema can be different from the column names in the database.
Style link
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Host
Port
Database
Advanced settings
Additional
parameters
tStatCatcher Statistics
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is
selected in the Basic settings view. Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
Scenario: Mapping data using a simple implicit join
Dropping components
1.
Drop tGreenplumConnection, tELTGreenplumInput (two), tELTGreenplumMap, tELTGreenplumOutput, tGreenplumCommit, tGreenplumInput and tLogRow from the Palette onto the workspace.
2.
3.
4.
Double-click tGreenplumConnection to open its Basic settings view in the Component tab.
In the Host and Port fields, enter the context variables for the Greenplum server.
In the Database field, enter the context variable for the Greenplum database.
In the Username and Password fields, enter the context variables for the authentication credentials.
For more information on context variables, see Talend Studio User Guide.
2.
Double-click employee+statecode to open its Basic settings view in the Component tab.
In the Default table name field, enter the name of the source table, namely employee_by_statecode.
Click the [...] button next to the Edit schema field to open the schema editor.
Click the [+] button to add three columns, namely id, name and statecode, with the data type as INT4,
VARCHAR, and INT4 respectively.
Click OK to close the schema editor.
Link employee+statecode to tELTGreenplumMap using the output employee_by_statecode.
3.
Double-click statecode to open its Basic settings view in the Component tab.
In the Default table name field, enter the name of the lookup table, namely statecode.
4.
Click the [...] button next to the Edit schema field to open the schema editor.
Click the [+] button to add two columns, namely state and statecode, with the data type as VARCHAR and
INT4 respectively.
Click OK to close the schema editor.
Link statecode to tELTGreenplumMap using the output statecode.
5.
Click tELTGreenplumMap to open its Basic settings view in the Component tab.
Click the [...] button next to the ELT Greenplum Map Editor field to open the map editor.
7.
Click the [+] button on the upper left corner to open the table selection box.
On the upper right corner, click the [+] button to add an output table, namely employee_by_state.
Click Ok to close the map editor.
9.
Double-click tELTGreenplumOutput to open its Basic settings view in the Component tab.
In the Default table name field, enter the name of the output table, namely employee_by_state.
10. Click the [...] button next to the Edit schema field to open the schema editor.
Click the [+] button to add three columns, namely id, name and state, with the data type as INT4, VARCHAR,
and VARCHAR respectively.
Click OK to close the schema editor.
Link tELTGreenplumMap to tELTGreenplumOutput using the table output employee_by_state.
Click OK on the pop-up window below to retrieve the schema of tELTGreenplumOutput.
Now the map editor's output table employee_by_state shares the same schema as that of
tELTGreenplumOutput.
11. Double-click tELTGreenplumMap to open the map editor.
Drop the column statecode from table employee_by_statecode to its counterpart of the table statecode,
looking for the records in the two tables that have the same statecode values.
Drop the columns id and name from table employee_by_statecode as well as the column statecode from table
statecode to their counterparts in the output table employee_by_state.
Click Ok to close the map editor.
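The mapping defined above corresponds roughly to the following implicit-join statement (a sketch; the actual statement is generated by the ELT components):

insert into employee_by_state (id, name, state)
select employee_by_statecode.id, employee_by_statecode.name, statecode.state
from employee_by_statecode, statecode
where employee_by_statecode.statecode = statecode.statecode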
12. Double-click tGreenplumInput to open its Basic settings view in the Component tab.
In the Mode area, select Table (print values in cells of a table) for a better display.
2.
As shown above, the desired employee records have been written to the table employee_by_state, presenting
clearer geographical information about the employees.
Related scenario:
For related scenarios, see tELTMysqlMap scenarios:
section Scenario 1: Aggregating table columns and filtering.
section Scenario 2: ELT using an Alias table.
tELTGreenplumOutput
tELTGreenplumOutput properties
The three ELT Greenplum components are closely related, in terms of their operating conditions. These
components should be used to handle Greenplum DB schemas to generate Insert statements, including clauses,
which are to be executed in the DB output table defined.
Component family
ELT/Map/Greenplum
Function
Carries out the action on the table specified and inserts the data according to the output schema defined in the ELT Mapper.
Purpose
Executes the SQL Insert, Update and Delete statements on the Greenplum database.
Basic settings
Action on data
On the data of the table defined, you can perform the following
operation:
Insert: Adds new entries to the table.
Update: Updates entries in the table.
Delete: Deletes the entries which correspond to the entry flow.
Where clauses (for UPDATE and DELETE only)
Enter a clause to filter the data to be updated or deleted during the update or delete operations.
Select this check box to define a different output table name, between double quotation marks, in the Table name field which appears.
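For instance, a clause such as state = 'Ohio' (an illustrative value only) would restrict the update or delete operation to the matching rows.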
Advanced settings
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at a Job
level as well as at each component level.
Usage
tELTGreenplumOutput is to be used along with tELTGreenplumMap. Note that the Output link to be used with these components must correspond strictly to the syntax of the table name.
Note that the ELT components do not handle actual data flow but only schema
information.
Related scenarios
For use cases in relation with tELTGreenplumOutput, see:
section Scenario: Mapping data using a simple implicit join
section Scenario 1: Aggregating table columns and filtering
tELTHiveInput
tELTHiveInput properties
The three ELT Hive components are closely related, in terms of their operating conditions. These components
should be used to handle Hive DB schemas to generate Insert statements, including clauses, which are to be
executed in the DB output table defined.
Component family
ELT/Map/Hive
Function
This component provides, for the tELTHiveMap component that follows, the input schema of the
Hive table to be used.
Purpose
This component helps to replicate the schema, which the tELTHiveMap component that follows
will use, of the input Hive table.
Basic settings
Schema
Built-in: You create and store the schema locally for this component
only. Related topic: see Talend Studio User Guide.
Repository: You have already created the schema and stored it in
the Repository. You can reuse it in various projects and Job designs.
Related topic: see Talend Studio User Guide.
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Edit schema
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Usage
tELTHiveMap is used along with a tELTHiveInput and tELTHiveOutput. Note that the Output
link to be used with these components must correspond strictly to the syntax of the table name.
If the Studio used to connect to a Hive database is operated on Windows, you must manually create
a folder called tmp in the root of the disk where this Studio is installed.
The ELT components do not handle actual data flow but only schema information.
Related scenario
For a related scenario, see section Scenario: Joining table columns and writing them into Hive.
tELTHiveMap
tELTHiveMap properties
The three ELT Hive components are closely related, in terms of their operating conditions. These components
should be used to handle Hive DB schemas to generate Insert statements, including clauses, which are to be
executed in the DB output table defined.
Component family
ELT/Map/Hive
Function
This component uses the tables provided as input, to feed the parameter in the built statement. The
statement can include inner or outer joins to be implemented between tables or between one table
and its aliases.
Purpose
This component helps to graphically build the Hive QL statement in order to transform data.
Basic settings
Property type
Use an existing connection
Select this check box and in the Component List click the relevant connection component to reuse the connection details you already defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
ELT Hive Map editor
The ELT Map editor helps you to define the output schema as well as graphically build the Hive QL statement to be executed. The column names of the schema can be different from the column names in the database.
Style link
Distribution
Select the product you are using as the Hadoop distribution from
the drop-down list. The options in the list vary depending on the
component you are using. Among these options, the Custom option
allows you to connect to a custom Hadoop distribution rather than one of the distributions proposed in this list.
Version
Select the version of the Hadoop distribution you are using. Note
that if you use Hortonworks Data Platform V2.0.0, the type of the
operating system for running the distribution and a Talend Job must
be the same, such as Windows or Linux.
Hive server
Select the Hive server through which you want the Job using this component to execute queries on Hive.
This Hive server list is available only when the Hadoop distribution to be used, such as HortonWorks Data Platform V1.2.0 (Bimota), supports HiveServer2. It allows you to select HiveServer2 (Hive 2), which better supports concurrent connections of multiple clients than HiveServer (Hive 1).
For further information about HiveServer2, see https://cwiki.apache.org/Hive/setting-up-hiveserver2.html.
Connection mode
Select a connection mode from the list. The options vary depending
on the distribution you are using.
Host
Port
Database
Use kerberos authentication
If you are accessing a Hive Metastore running with Kerberos security, select this check box and then enter the relevant parameters in the fields that appear.
The values of those parameters can be found in the hive-site.xml file
of the Hive system to be used.
1. Hive principal uses the value of hive.metastore.kerberos.principal. This is the service principal of the Hive Metastore.
2. Metastore URL uses the value of javax.jdo.option.ConnectionURL. This is the JDBC connection string to the Hive Metastore.
3. Driver class uses the value of javax.jdo.option.ConnectionDriverName. This is the name of the driver for the JDBC connection.
4. Username uses the value of javax.jdo.option.ConnectionUserName. This, as well as the Password parameter, is the user credential for connecting to the Hive Metastore.
5. Password uses the value of javax.jdo.option.ConnectionPassword.
Set Jobtracker URI
Select this check box to indicate the location of the Jobtracker service
within the Hadoop cluster to be used. For example, we assume that
you have chosen a machine called machine1 as the JobTracker, then
set its location as machine1:portnumber. A Jobtracker is the service
that assigns Map/Reduce tasks to specific nodes in a Hadoop cluster.
Note that the notion job in this term JobTracker does not designate a
Talend Job, but rather a Hadoop job described as MR or MapReduce
job in Apache's Hadoop documentation on http://hadoop.apache.org.
This property is required when the query you want to use is executed
in Windows and it is a Select query. For example, SELECT
your_column_name FROM your_table_name
Temporary path
If you do not want to set the Jobtracker and the NameNode when you
execute the query select * from your_table_name, you need
to set this temporary path. For example, /C:/select_all in Windows.
Hadoop properties
Hive properties
Mapred job map memory mb and Mapred job reduce memory mb
If the Hadoop distribution to be used is Hortonworks Data Platform V1.2 or Hortonworks Data Platform V1.3, you need to set proper memory allocations for the map and reduce computations to be performed by the Hadoop system.
In that situation, you need to enter the values you need in the Mapred job map memory mb and the Mapred job reduce memory mb fields, respectively. By default, the values are both 1000, which is normally appropriate for running the computations.
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is
selected in the Basic settings view. Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
tELTHiveMap is used along with a tELTHiveInput and tELTHiveOutput. Note that the Output
link to be used with these components must correspond strictly to the syntax of the table name.
If the Studio used to connect to a Hive database is operated on Windows, you must manually create
a folder called tmp in the root of the disk where this Studio is installed.
The ELT components do not handle actual data flow but only schema information.
Prerequisites
The Hadoop distribution must be properly installed, so as to guarantee the interaction with Talend
Studio. The following list presents MapR related information for example.
Ensure that you have installed the MapR client in the machine where the Studio is, and added
the MapR client library to the PATH variable of that machine. For Windows, this library is lib
\MapRClient.dll in the MapR client jar file; without adding it, you may encounter the following
error: no MapRClient in java.library.path.
Set the -Djava.library.path argument. This argument provides the Studio with the path to the native library of that MapR client.
the Data viewer to view locally in the Studio the data stored in MapR. For further information
about how to set this argument, see the section describing how to view data of Talend Open
Studio for Big Data Getting Started Guide.
For further information about how to install a Hadoop distribution, see the manuals corresponding
to the Hadoop distribution you are using.
Scenario: Joining table columns and writing them into Hive
1.
Create the Hive table you want to write data in. In this scenario, this table is named agg_result, and you can create it using the following statement in tHiveRow:
create table agg_result (id int, name string, address string, sum1 string, postal
string, state string, capital string, mostpopulouscity string) partitioned by (type
string) row format delimited fields terminated by ';' location
'/user/ychen/hive/table/agg_result'
In this statement, '/user/ychen/hive/table/agg_result' is the directory used in this scenario to store this created
table in HDFS. You need to replace it with the directory you want to use in your environment.
For further information about tHiveRow, see section tHiveRow.
2.
Create two input Hive tables containing the columns you want to join and aggregate these columns into the
output Hive table, agg_result. The statements to be used are:
create table customer (id int, name string, address string, idState int, id2 int,
regTime string, registerTime string, sum1 string, sum2 string) row format delimited
fields terminated by ';' location '/user/ychen/hive/table/customer'
and
create table state_city (id int, postal string, state string, capital int,
mostpopulouscity string) row format delimited fields terminated by ';' location '/
user/ychen/hive/table/state_city'
3.
Use tHiveRow to load data into the two input tables, customer and state_city. The statements to be used are:
"LOAD DATA LOCAL INPATH 'C:/tmp/customer.csv' OVERWRITE INTO TABLE customer"
and
"LOAD DATA LOCAL INPATH 'C:/tmp/State_City.csv' OVERWRITE INTO TABLE state_city"
The two files, customer.csv and State_City.csv, are two local files we created for this scenario. You need to
create your own files to provide data to the input Hive tables. The data schema of each file should be identical
with that of its corresponding table.
You can use tRowGenerator and tFileOutputDelimited to create these two files easily. For further
information about these two components, see section tRowGenerator and section tFileOutputDelimited.
For further information about the Hive query language, see https://cwiki.apache.org/Hive/languagemanual.html.
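If you want to check that the two tables have been loaded correctly before going further, you can run a simple
query in tHiveRow, for example:
"select * from customer"
This verification step is not part of the original scenario; it is only a quick, optional sanity check.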
1.
In the Integration perspective of Talend Studio, create an empty Job from the Job Designs node in the
Repository tree view.
For further information about how to create a Job, see Talend Studio User Guide.
2.
Drop two tELTHiveInput components and tELTHiveMap and tELTHiveOutput onto the workspace.
3.
Double-click the tELTHiveInput component using the customer link to open its Component view.
2.
Click the [...] button next to Edit schema to open the schema editor.
3.
Click the [+] button as many times as required to add columns and rename them to replicate the schema
of the customer table we created earlier in Hive.
4.
In the Default table name field, enter the name of the input table, customer, to be processed by this
component.
5.
Double-click the other tELTHiveInput component using the state_city link to open its Component view.
6.
Click the [...] button next to Edit schema to open the schema editor.
7.
Click the [+] button as many times as required to add columns and rename them to replicate the schema
of the state_city table we created earlier in Hive.
8.
In the Default table name field, enter the name of the input table, state_city, to be processed by this
component.
2.
In the Version area, select the Hadoop distribution you are using and the Hive version.
3.
In the Connection mode list, select the connection mode you want to use. If your distribution is HortonWorks,
this mode is Embedded only.
4.
In the Host field and the Port field, enter the authentication information for the component to connect to
Hive. For example, the host is talend-hdp-all and the port is 9083.
5.
Select the Set Jobtracker URI check box and enter the location of the Jobtracker. For example, talend-hdp-all:50300.
6.
Select the Set NameNode URI check box and enter the location of the NameNode. For example,
hdfs://talend-hdp-all:8020.
2.
On the input side (left in the figure), click the Add alias button to add the table to be used.
3.
In the pop-up window, select the customer table, then click OK.
4.
5.
Drag and drop the idstate column from the customer table onto the id column of the state_city table. Thus
an inner join is created automatically.
6.
On the output side (the right side in the figure), the agg_result table is empty at first. Click the [+] button
at the bottom of this side to add as many columns as required and rename them to replicate the schema of the
agg_result table you created earlier in Hive.
The type column is the partition column of the agg_result table and should not be replicated in this schema. For further
information about the partition column of the Hive table, see the Hive manual.
7.
From the customer table, drop id, name, address, and sum1 to the corresponding columns in the agg_result
table.
8.
From the state_city table, drop postal, state, capital and mostpopulouscity to the corresponding columns in
the agg_result table.
9.
2.
If this component does not have the same schema as the preceding component, a warning icon appears. In
this case, click the Sync columns button to retrieve the schema from the preceding one; once done, the
warning icon disappears.
3.
In the Default table name field, enter the output table you want to write data in. In this example, it is
agg_result.
4.
In the Field partition table, click the [+] button to add one row. This allows you to write data in the partition
column of the agg_result table.
This partition column was defined the moment we created the agg_result table using partitioned by (type
string) in the Create statement presented earlier. This partition column is type, which describes the type
of a customer.
5.
In Partition column, enter type without any quotation marks and in Partition value, enter prospective in
single quotation marks.
This figure presents only a part of the table. You can see that the selected input columns are aggregated and written
into the agg_result table and the partition column is filled with the value prospective.
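For reference, the mapping defined above corresponds roughly to a Hive statement of the following form. This is
only an illustrative sketch based on the join and partition settings of this scenario; the query actually generated by
the ELT components may be written differently:
INSERT INTO TABLE agg_result PARTITION (type = 'prospective')
SELECT customer.id, customer.name, customer.address, customer.sum1,
state_city.postal, state_city.state, state_city.capital, state_city.mostpopulouscity
FROM customer JOIN state_city ON (customer.idState = state_city.id)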
tELTHiveOutput
tELTHiveOutput properties
The three ELT Hive components are closely related, in terms of their operating conditions. These components
should be used to handle Hive DB schemas to generate Insert statements, including clauses, which are to be
executed in the DB output table defined.
Component family
ELT/Map/Hive
Function
This component executes the query built by the preceding tELTHiveMap component to write data
into the specified Hive table.
Purpose
This component works alongside tELTHiveMap to write data into the Hive table.
Basic settings
Action on data
Schema
Built-in: You create and store the schema locally for this component
only. Related topic: see Talend Studio User Guide.
Repository: You have already created the schema and stored it in
the Repository. You can reuse it in various projects and Job designs.
Related topic: see Talend Studio User Guide.
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Edit schema
Enter the default name of the output table you want to write data in.
Select this check box to define a different output table name, between
double quotation marks, in the Table name field that appears.
Field Partition
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Usage
tELTHiveOutput is used along with tELTHiveMap. Note that the Output
link to be used with these components must correspond strictly to the syntax of the table name.
If the Studio used to connect to a Hive database is operated on Windows, you must manually create
a folder called tmp in the root of the disk where this Studio is installed.
The ELT components do not handle actual data flow but only schema information.
Related scenario
For a related scenario, see section Scenario: Joining table columns and writing them into Hive
tELTJDBCInput
tELTJDBCInput properties
The three ELT JDBC components are closely related, in terms of their operating conditions. These components
should be used to handle JDBC DB schemas to generate Insert statements, including clauses, which are to be
executed in the DB output table defined.
Component family
ELT/Map/JDBC
Function
Provides the table schema to be used for the SQL statement to execute.
Purpose
Allows you to add as many Input tables as required for the most complicated Insert statement.
Basic settings
Advanced settings
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at a Job
level as well as at each component level.
Usage
tELTJDBCInput is to be used along with the tELTJDBCMap. Note that the Output link to be
used with these components must correspond strictly to the syntax of the table name
Note that the ELT components do not handle actual data flow but only schema
information.
Related scenarios
For use cases in relation with tELTJDBCInput, see tELTMysqlMap scenarios:
section Scenario 1: Aggregating table columns and filtering
section Scenario 2: ELT using an Alias table
tELTJDBCMap
tELTJDBCMap properties
The three ELT JDBC components are closely related, in terms of their operating conditions. These components
should be used to handle JDBC DB schemas to generate Insert statements, including clauses, which are to be
executed in the DB output table defined.
Component family
ELT/Map/JDBC
Function
Helps to graphically build the SQL statement using the table provided as input.
Purpose
Uses the tables provided as input, to feed the parameter in the built statement. The statement can
include inner or outer joins to be implemented between tables or between one table and its aliases.
Basic settings
Use an existing connection
Select this check box and in the Component List click the relevant
connection component to reuse the connection details you already
defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
The ELT Map editor allows you to define the output schema and
make a graphical build of the SQL statement to be executed. The
column names of schema can be different from the column names
in the database.
Style link
Property type
Host
Advanced settings
Port
Database
Additional
parameters
tStatCatcher Statistics
Dynamic settings
Select this check box to gather the Job processing metadata at a Job
level as well as at each component level.
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is
selected in the Basic settings view. Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
tELTJDBCMap is used along with tELTJDBCInput and tELTJDBCOutput. Note that the
Output link to be used with these components must correspond strictly to the syntax of the table
name.
Note that the ELT components do not handle actual data flow but only schema
information.
Related scenario:
For related scenarios, see tELTMysqlMap scenarios:
section Scenario 1: Aggregating table columns and filtering.
section Scenario 2: ELT using an Alias table.
tELTJDBCOutput
tELTJDBCOutput properties
The three ELT JDBC components are closely related, in terms of their operating conditions. These components
should be used to handle JDBC DB schemas to generate Insert statements, including clauses, which are to be
executed in the DB output table defined.
Component family
ELT/Map/JDBC
Function
Carries out the action on the table specified and inserts the data according to the output schema
defined in the ELT Mapper.
Purpose
Executes the SQL Insert, Update and Delete statement to the JDBC database
Basic settings
Action on data
On the data of the table defined, you can perform the following
operation:
Insert: Adds new entries to the table. If duplicates are found, Job
stops.
Update: Updates entries in the table.
Delete: Deletes the entries which correspond to the entry flow.
Where clauses (for UPDATE and DELETE only)
Enter a clause to filter the data to be updated or deleted during the update or delete operations.
Select this check box to define a different output table name, between
double quotation marks, in the Table name field which appears.
Advanced settings
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at a Job
level as well as at each component level.
Usage
tELTJDBCOutput is to be used along with the tELTJDBCMap. Note that the Output link to be
used with these components must correspond strictly to the syntax of the table name.
Note that the ELT components do not handle actual data flow but only schema
information.
Related scenarios
For use cases in relation with tELTJDBCOutput, see tELTMysqlMap scenarios:
section Scenario 1: Aggregating table columns and filtering
section Scenario 2: ELT using an Alias table
tELTMSSqlInput
tELTMSSqlInput properties
The three ELT MSSql components are closely related, in terms of their operating conditions. These components
should be used to handle MSSql DB schemas to generate Insert statements, including clauses, which are to be
executed in the DB output table defined.
Component family
ELT/Map/MSSql
Function
Provides the table schema to be used for the SQL statement to execute.
Purpose
Allows you to add as many Input tables as required for the most complicated Insert statement.
Basic settings
Advanced settings
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at a Job
level as well as at each component level.
Usage
tELTMSSqlInput is to be used along with the tELTMSSqlMap. Note that the Output link to
be used with these components must correspond strictly to the syntax of the table name.
Note that the ELT components do not handle actual data flow but only schema
information.
Related scenarios
For use cases in relation with tELTMSSqlInput, see tELTMysqlMap scenarios:
section Scenario 1: Aggregating table columns and filtering
section Scenario 2: ELT using an Alias table
tELTMSSqlMap
tELTMSSqlMap properties
The three ELT MSSql components are closely related, in terms of their operating conditions. These components
should be used to handle MSSql DB schemas to generate Insert statements, including clauses, which are to be
executed in the DB output table defined.
Component family
ELT/Map/MSSql
Function
Helps you to build the SQL statement graphically, using the table provided as input.
Purpose
Uses the tables provided as input, to feed the parameter in the built statement. The statement can
include inner or outer joins to be implemented between tables or between one table and its aliases.
Basic settings
Use an existing connection
Select this check box and in the Component List click the relevant
connection component to reuse the connection details you already
defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
The ELT Map editor allows you to define the output schema and
make a graphical build of the SQL statement to be executed. The
column names of schema can be different from the column names
in the database.
Style link
Property type
Host
Advanced settings
Port
Database
Additional
parameters
tStatCatcher Statistics
Dynamic settings
Select this check box to gather the Job processing metadata at a Job
level as well as at each component level.
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is
selected in the Basic settings view. Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
tELTMSSqlMap is used along with a tELTMSSqlInput and tELTMSSqlOutput. Note that the
Output link to be used with these components must correspond strictly to the syntax of the table
name.
Note that the ELT components do not handle actual data flow but only schema
information.
Related scenario:
For related scenarios, see tELTMysqlMap scenarios:
section Scenario 1: Aggregating table columns and filtering.
section Scenario 2: ELT using an Alias table.
tELTMSSqlOutput
tELTMSSqlOutput properties
The three ELT MSSql components are closely related, in terms of their operating conditions. These components
should be used to handle MSSql DB schemas to generate Insert statements, including clauses, which are to be
executed in the DB output table defined.
Component family
ELT/Map/MSSql
Function
Carries out the action on the table specified and inserts the data according to the output schema
defined in the ELT Mapper.
Purpose
Executes the SQL Insert, Update and Delete statement to the MSSql database
Basic settings
Action on data
On the data of the table defined, you can perform the following
operation:
Insert: Adds new entries to the table. If duplicates are found, Job
stops.
Update: Updates entries in the table.
Delete: Deletes the entries which correspond to the entry flow.
Where clauses (for UPDATE and DELETE only)
Enter a clause to filter the data to be updated or deleted during the update or delete operations.
Select this check box to define a different output table name, between
double quotation marks, in the Table name field which appears.
Advanced settings
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at a Job
level as well as at each component level.
Usage
tELTMSSqlOutput is to be used along with the tELTMSSqlMap. Note that the Output link to
be used with these components must correspond strictly to the syntax of the table name.
Note that the ELT components do not handle actual data flow but only schema
information.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For
details, see the section about external modules in the Talend Installation and Upgrade Guide.
Related scenarios
For use cases in relation with tELTMSSqlOutput, see tELTMysqlMap scenarios:
section Scenario 1: Aggregating table columns and filtering
section Scenario 2: ELT using an Alias table
tELTMysqlInput
tELTMysqlInput properties
The three ELT Mysql components are closely related, in terms of their operating conditions. These components
should be used to handle Mysql DB schemas to generate Insert statements, including clauses, which are to be
executed in the DB output table defined.
Component family
ELT/Map/Mysql
Function
Provides the table schema to be used for the SQL statement to execute.
Purpose
Allows you to add as many Input tables as required for the most complicated Insert statement.
Basic settings
Usage
tELTMysqlInput is to be used along with the tELTMysqlMap. Note that the Output link to be
used with these components must correspond strictly to the syntax of the table name
Note that the ELT components do not handle actual data flow but only schema
information.
Related scenarios
For use cases in relation with tELTMysqlInput, see tELTMysqlMap scenarios:
section Scenario 1: Aggregating table columns and filtering
section Scenario 2: ELT using an Alias table
tELTMysqlMap
tELTMysqlMap properties
The three ELT Mysql components are closely related, in terms of their operating conditions. These components
should be used to handle Mysql DB schemas to generate Insert statements, including clauses, which are to be
executed in the DB output table defined.
Component family
ELT/Map/Mysql
Function
Helps to graphically build the SQL statement using the table provided as input.
Purpose
Uses the tables provided as input, to feed the parameter in the built statement. The statement can
include inner or outer joins to be implemented between tables or between one table and its aliases.
Basic settings
Use an existing connection
Select this check box and in the Component List click the relevant
connection component to reuse the connection details you already
defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
The ELT Map editor allows you to define the output schema as well
as build graphically the SQL statement to be executed. The column
names of schema can be different from the column names in the
database.
Style link
Property type
Host
Dynamic settings
Port
Database
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is
selected in the Basic settings view. Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
tELTMysqlMap is used along with a tELTMysqlInput and tELTMysqlOutput. Note that the
Output link to be used with these components must correspond strictly to the syntax of the table
name.
The ELT components do not handle actual data flow but only schema information.
Scenario 1: Aggregating table columns and filtering
This scenario describes a Job that gathers together several input DB table schemas and implements a clause to
filter the output using an SQL statement.
Drop the following components from the Palette onto the design workspace: three tELTMysqlInput
components, a tELTMysqlMap, and a tELTMysqlOutput. Label these components to best describe their
functionality.
Double-click the first tELTMysqlInput component to display its Basic settings view.
Select Repository from the Schema list, click the three dot button preceding Edit schema, and select your DB
connection and the desired schema from the [Repository Content] dialog box.
The selected schema name appears in the Default Table Name field automatically.
In this use case, the DB connection is Talend_MySQL and the schema for the first input component is owners.
Set the second and third tELTMysqlInput components in the same way but select cars and resellers
respectively as their schema names.
In this use case, all the involved schemas are stored in the Metadata node of the Repository tree view for easy retrieval.
For further information concerning metadata, see Talend Studio User Guide.
You can also select the three input components by dropping the relevant schemas from the Metadata area onto the design
workspace and double-clicking tELTMysqlInput from the [Components] dialog box. Doing so allows you to skip the
steps of labeling the input components and defining their schemas manually.
Connect the three tELTMysqlInput components to the tELTMysqlMap component using links named
following strictly the actual DB table names: owners, cars and resellers.
Connect the tELTMysqlMap component to the tELTMysqlOutput component and name the link agg_result,
which is the name of the database table you will save the aggregation result to.
Click the tELTMysqlMap component to display its Basic settings view.
Select Repository from the Property Type list, and select the same DB connection that you use for the input
components.
All the database details are automatically retrieved.
Leave all the other settings as they are.
Double-click the tELTMysqlMap component to launch the ELT Map editor to set up joins between the input
tables and define the output flow.
Add the input tables by clicking the green plus button at the upper left corner of the ELT Map editor and selecting
the relevant table names in the [Add a new alias] dialog box.
Drop the ID_Owner column from the owners table to the corresponding column of the cars table.
In the cars table, select the Explicit join check box in front of the ID_Owner column.
As the default join type, INNER JOIN is displayed on the Join list.
Drop the ID_Reseller column from the cars table to the corresponding column of the resellers table to set up
the second join, and define the join as an inner join in the same way.
Select the columns to be aggregated into the output table, agg_result.
Drop the ID_Owner, Name, and ID_Insurance columns from the owners table to the output table.
Drop the Registration, Make, and Color columns from the cars table to the output table.
Drop the Name_Reseller and City columns from the resellers table to the output table.
With the relevant columns selected, the mappings are displayed in yellow and the joins are displayed in dark
violet.
Set up a filter in the output table. Click the Add filter row button on top of the output table to display the
Additional clauses expression field, drop the City column from the resellers table to the expression field, and
complete a WHERE clause that reads resellers.City ='Augusta'.
Click the Generated SQL Select query tab to display the corresponding SQL statement.
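The displayed query should be equivalent to a statement of the following shape, assuming the joins and the filter
configured above; the SQL actually produced by the ELT Mapper may be formatted differently:
SELECT owners.ID_Owner, owners.Name, owners.ID_Insurance,
cars.Registration, cars.Make, cars.Color,
resellers.Name_Reseller, resellers.City
FROM owners
INNER JOIN cars ON (cars.ID_Owner = owners.ID_Owner)
INNER JOIN resellers ON (cars.ID_Reseller = resellers.ID_Reseller)
WHERE resellers.City = 'Augusta'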
Scenario 2: ELT using an Alias table
This scenario describes a Job that maps information from two input tables and an alias table, serving as a virtual
input table, to an output table. The employees table contains employees' IDs, their department numbers, their
names, and the IDs of their respective managers. The managers are also considered as employees and hence
included in the employees table. The dept table contains the department information. The alias table retrieves the
names of the managers from the employees table.
Select Repository from the Schema list, and define the DB connection and schema by clicking the three dot
button preceding Edit schema.
The DB connection is Talend_MySQL and the schema for the first input component is employees.
In this use case, all the involved schemas are stored in the Metadata node of the Repository tree view for easy retrieval.
For further information concerning metadata, see Talend Studio User Guide.
Set the second tELTMysqlInput component in the same way but select dept as its schema.
Select an action from the Action on data list as needed, Insert in this use case.
Select Repository as the schema type, and define the output schema in the same way as you defined the input
schemas. In this use case, select result as the output schema, which is the name of the database table used to
store the mapping result.
The output schema contains all the columns of the input schemas plus a ManagerName column.
Leave all the other parameters as they are.
Connect the two tELTMysqlInput components to the tELTMysqlMap component using Link connections
named strictly after the actual input table names, employees and dept in this use case.
Connect the tELTMysqlMap component to the tELTMysqlOutput component using a Link connection.
When prompted, click Yes to allow the ELT Mapper to retrieve the output table structure from the output
schema.
Click the tELTMysqlMap component and select the Component tab to display its Basic settings view.
Select Repository from the Property Type list, and select the same DB connection that you use for the input
components.
All the DB connection details are automatically retrieved.
Leave all the other parameters as they are.
Click the three-dot button next to ELT Mysql Map Editor or double-click the tELTMysqlMap component
on the design workspace to launch the ELT Map editor.
With the tELTMysqlMap component connected to the output component, the output table is displayed in the
output area.
Add the input tables, employees and dept, in the input area by clicking the green plus button and selecting the
relevant table names in the [Add a new alias] dialog box.
Create an alias table based on the employees table by selecting employees from the Select the table to use list
and typing in Managers in the Type in a valid alias field in the [Add a new alias] dialog box.
Drop the DeptNo column from the employees table to the dept table.
Select the Explicit join check box in front of the DeptNo column of the dept table to set up an inner join.
Drop the ManagerID column from the employees table to the ID column of the Managers table.
Select the Explicit join check box in front of the ID column of the Managers table and select LEFT OUTER
JOIN from the Join list to allow the output rows to contain Null values.
Drop all the columns from the employees table to the corresponding columns of the output table.
Drop the DeptName and Location columns from the dept table to the corresponding columns of the output table.
Drop the Name column from the Managers table to the ManagerName column of the output table.
Click on the Generated SQL Select query tab to display the SQL query statement to be executed.
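The displayed query should be equivalent in substance to the sketch below, which uses the column names of this
use case and writes the alias table as employees AS Managers; the statement actually generated by the ELT Mapper
may be expressed differently:
SELECT employees.ID, employees.Name, employees.DeptNo, employees.ManagerID,
dept.DeptName, dept.Location, Managers.Name AS ManagerName
FROM employees
INNER JOIN dept ON (employees.DeptNo = dept.DeptNo)
LEFT OUTER JOIN employees AS Managers ON (employees.ManagerID = Managers.ID)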
tELTMysqlOutput
tELTMysqlOutput properties
The three ELT Mysql components are closely related, in terms of their operating conditions. These components
should be used to handle Mysql DB schemas to generate Insert statements, including clauses, which are to be
executed in the DB output table defined.
Component family
ELT/Map/Mysql
Function
Carries out the action on the table specified and inserts the data according to the output schema
defined in the ELT Mapper.
Purpose
Executes the SQL Insert, Update and Delete statement to the Mysql database
Basic settings
Action on data
On the data of the table defined, you can perform the following
operation:
Use tCreateTable as
substitute for this
function.
Insert: Add new entries to the table. If duplicates are found, Job
stops.
Update: Updates entries in the table.
Delete: Deletes the entries which correspond to the entry flow.
Schema and Edit schema
Where clauses (for UPDATE and DELETE only)
Enter a clause to filter the data to be updated or deleted during the update or delete operations.
Select this check box to define a different output table name, between
double quotation marks, in the Table name field which appears.
Usage
tELTMysqlOutput is to be used along with the tELTMysqlMap. Note that the Output link to be
used with these components must correspond strictly to the syntax of the table name.
Note that the ELT components do not handle actual data flow but only schema
information.
Related scenarios
For use cases in relation with tELTMysqlOutput, see tELTMysqlMap scenarios:
section Scenario 1: Aggregating table columns and filtering
section Scenario 2: ELT using an Alias table
tELTNetezzaInput
tELTNetezzaInput properties
The three ELT Netezza components are closely related, in terms of their operating conditions. These components
should be used to handle Netezza DB schemas to generate Insert statements, including clauses, which are to be
executed in the DB output table defined.
Component family
ELT/Map/Netezza
Function
Provides the table schema to be used for the SQL statement to execute.
Purpose
Allows you to add as many Input tables as required for the most complicated Insert statement.
Basic settings
Advanced settings
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at a Job
level as well as at each component level.
Usage
tELTNetezzaInput is to be used along with the tELTNetezzaMap. Note that the Output link to
be used with these components must correspond strictly to the syntax of the table name
Note that the ELT components do not handle actual data flow but only schema
information.
Related scenarios
For related scenarios, see:
section Scenario: Mapping data using a simple implicit join
section Scenario 1: Aggregating table columns and filtering
section Scenario 2: ELT using an Alias table
tELTNetezzaMap
tELTNetezzaMap properties
The three ELT Netezza components are closely related, in terms of their operating conditions. These components
should be used to handle Netezza DB schemas to generate Insert statements, including clauses, which are to be
executed in the DB output table defined.
Component family
ELT/Map/Netezza
Function
Helps you to build the SQL statement graphically, using the table provided as input.
Purpose
Uses the tables provided as input, to feed the parameter in the built statement. The statement can
include inner or outer joins to be implemented between tables or between one table and its aliases.
Basic settings
Use an existing connection
Select this check box and in the Component List click the relevant
connection component to reuse the connection details you already
defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
The ELT Map editor allows you to define the output schema and
make a graphical build of the SQL statement to be executed. The
column names of schema can be different from the column names
in the database.
Style link
Property type
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: No property data stored centrally.
Host
Port
Database
Advanced settings
Additional
parameters
tStatCatcher Statistics
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is
selected in the Basic settings view. Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
Related scenarios
For related scenarios, see:
section Scenario: Mapping data using a simple implicit join.
section Scenario 1: Aggregating table columns and filtering.
section Scenario 2: ELT using an Alias table.
tELTNetezzaOutput
tELTNetezzaOutput properties
The three ELT Netezza components are closely related, in terms of their operating conditions. These components
should be used to handle Netezza DB schemas to generate Insert statements, including clauses, which are to be
executed in the DB output table defined.
Component family
ELT/Map/Netezza
Function
Carries out the action on the table specified and inserts the data according to the output schema
defined in the ELT Mapper.
Purpose
Executes the SQL Insert, Update and Delete statement to the Netezza database
Basic settings
Action on data
On the data of the table defined, you can perform the following
operation:
Insert: Adds new entries to the table.
Update: Updates entries in the table.
Delete: Deletes the entries which correspond to the entry flow.
Where clauses (for UPDATE and DELETE only)
Enter a clause to filter the data to be updated or deleted during the update or delete operations.
Select this check box to define a different output table name, between
double quotation marks, in the Table name field that appears.
Advanced settings
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at a Job
level as well as at each component level.
Usage
tELTNetezzaOutput is to be used along with the tELTNetezzaMap. Note that the Output link to
be used with these components must correspond strictly to the syntax of the table name.
Note that the ELT components do not handle actual data flow but only schema
information.
Related scenarios
For related scenarios, see:
section Scenario: Mapping data using a simple implicit join
section Scenario 1: Aggregating table columns and filtering
section Scenario 2: ELT using an Alias table
tELTOracleInput
tELTOracleInput properties
The three ELT Oracle components are closely related, in terms of their operating conditions. These components
should be used to handle Oracle DB schemas to generate Insert statements, including clauses, which are to be
executed in the DB output table defined.
Component family
ELT/Map/Oracle
Function
Provides the table schema to be used for the SQL statement to execute.
Purpose
Allows you to add as many Input tables as required for the most complicated Insert statement.
Basic settings
Advanced settings
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at a Job
level as well as at each component level.
Usage
tELTOracleInput is to be used along with the tELTOracleMap. Note that the Output link to be
used with these components must correspond strictly to the syntax of the table name.
The ELT components do not handle actual data flow but only schema information.
Related scenarios
For use cases in relation with tELTOracleInput, see section Scenario: Updating Oracle DB entries.
tELTOracleMap
tELTOracleMap properties
The three ELT Oracle components are closely related, in terms of their operating conditions. These components
should be used to handle Oracle DB schemas to generate Insert statements, including clauses, which are to be
executed in the DB output table defined.
Component family
ELT/Map/Oracle
Function
Helps to graphically build the SQL statement using the table provided as input.
Purpose
Uses the tables provided as input, to feed the parameter in the built statement. The statement can
include inner or outer joins to be implemented between tables or between one table and its aliases.
Basic settings
Use an existing connection
Select this check box and in the Component List click the relevant
connection component to reuse the connection details you already
defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
The ELT Map editor allows you to define the output schema and
make a graphical build of the SQL statement to be executed. The
column names of schema can be different from the column names
in the database.
Style link
Auto: By default, the links between the input and output schemas
and the Web service parameters are in the form of curves.
Bezier curve: Links between the schema and the Web service
parameters are in the form of curve.
Line: Links between the schema and the Web service parameters are
in the form of straight lines.
This option slightly optimizes performance.
Property type
Connection type
DB Version
Host
Advanced settings
Port
Database
Mapping
Additional
Parameters
Select this check box to activate the hint configuration area to help
you optimize a query's execution. In this area, parameters are:
- HINT: specify the hint you need, using the syntax /*+ */.
- POSITION: specify where you put the hint in a SQL statement.
- SQL STMT: select the SQL statement you need to use.
An illustrative hinted statement is given at the end of this section.
tStatCatcher Statistics
Dynamic settings
Select this check box to gather the Job processing metadata at a Job
level as well as at each component level.
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is
selected in the Basic settings view. Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
tELTOracleMap is used along with a tELTOracleInput and tELTOracleOutput. Note that the
Output link to be used with these components must correspond strictly to the syntax of the table
name.
Note that the ELT components do not handle actual data flow but only schema
information.
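As an illustration of the HINT and POSITION parameters mentioned above, Oracle expects the /*+ */ block to
follow the SQL verb, so a hinted query typically looks like the sketch below. The table and columns are purely
hypothetical; only the placement of the hint matters here:
SELECT /*+ FULL(owners) */ owners.ID_OWNER, owners.NAME
FROM owners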
Scenario: Updating Oracle DB entries
As described in section Scenario 1: Aggregating table columns and filtering, set up a Job for data
aggregation using the corresponding ELT components for Oracle DB, tELTOracleInput, tELTOracleMap,
and tELTOracleOutput, and execute the Job to save the aggregation result in a database table named
agg_result.
When defining filters in the ELT Map editor, note that strings are case sensitive in Oracle DB.
Launch the ELT Map editor and add a new output table named update_data.
Add a filter row to the update_data table to set up a relationship between input and output tables:
owners.ID_OWNER = agg_result.ID_OWNER.
Drop the MAKE column from the cars table to the update_data table.
Drop the NAME_RESELLER column from the resellers table to the update_data table.
Add a model enclosed in single quotation marks, A8 in this use case, to the MAKE column from the cars table,
preceded by a double pipe.
Add Sold by enclosed in single quotation marks in front of the NAME_RESELLER column from the resellers
table, with a double pipe in between.
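With the output component performing an Update, the Job ends up executing a correlated update along the lines
of the sketch below. This is only an indicative example: it assumes the update is applied to the agg_result table
using the filter and the concatenation expressions defined above, and the statement actually generated by the ELT
Mapper may be structured differently:
UPDATE agg_result
SET (make, name_reseller) =
(SELECT cars.MAKE || 'A8', 'Sold by' || resellers.NAME_RESELLER
FROM owners, cars, resellers
WHERE cars.ID_OWNER = owners.ID_OWNER
AND cars.ID_RESELLER = resellers.ID_RESELLER
AND owners.ID_OWNER = agg_result.ID_OWNER)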
tELTOracleOutput
tELTOracleOutput properties
The three ELT Oracle components are closely related, in terms of their operating conditions. These components
should be used to handle Oracle database schemas to generate Insert, Update or Delete statements, including
clauses, which are to be executed in the database output table defined.
Component family
ELT/Map/Oracle
Function
Carries out the action on the table specified and inserts the data according to the output schema
defined in the ELT Mapper.
Purpose
Executes the SQL Insert, Update and Delete statement to the Oracle database.
Basic Settings
Action on data
On the data of the table defined, you can perform the following
operation:
Insert: Add new entries to the table. If duplicates are found, the Job
stops.
Update: Updates entries in the table.
Delete: Deletes the entries which correspond to the entry flow.
MERGE: Updates and/or adds data to the table. Note that the options
available for the MERGE operation are different to those available
for the Insert, Update or Delete operations.
Following global variables are available:
NB_LINE_INSERTED: Number of lines inserted
during the Insert operation.
NB_LINE_UPDATED: Number of lines updated
during the Update operation.
NB_LINE_DELETED: Number of lines deleted during
the Delete operation.
NB_LINE_MERGED: Number of lines inserted and/or
updated during the MERGE operation.
Where clauses (for UPDATE and DELETE only)
Enter a clause to filter the data to be updated or deleted during the update or delete operations.
Use Merge Update (for MERGE)
Select this check box to update the data in the output table.
Use Merge Insert (for MERGE)
Select this check box to insert the data in the table.
Column: Lists the entry flow columns.
Check All: Select the check box corresponding to the name of the
column you want to insert.
Use Merge Update Where Clause: Select this check box and enter
the WHERE clause required to filter the data to be inserted.
Advanced settings
Enter a default name for the table, between double quotation marks.
Select this check box to define a different output table name, between
double quotation marks, in the Table name field which appears.
Select this check box to activate the hint configuration area when
you want to use a hint to optimize a query's execution. In this area,
parameters are:
- HINT: specify the hint you need, using the syntax /*+ */.
- POSITION: specify where you put the hint in a SQL statement.
- SQL STMT: select the SQL statement you need to use.
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at a Job
level as well as at each component level.
Usage
Scenario: Using the Oracle MERGE function to update and add data simultaneously
1.
Drop the following components from the Palette to the design workspace: tELTOracleInput,
tELTOracleMap, and tELTOracleOutput.
2.
3.
Link tELTOracleInput to tELTOracleMap using a Row > New Output (table) connection.
In the pop-up box, enter NEW_CUSTOMERS as the table name, which should be the actual database table
name.
4.
Link tELTOracleMap to tELTOracleOutput using a Row > New Output (table) connection.
In the pop-up box, enter customers_merge as the name of the database table, which holds the merge results.
2.
Select Repository from the Schema list, click the [...] button preceding Edit schema, and select your database
connection and the desired schema from the [Repository Content] dialog box.
The selected schema name appears in the Default Table Name field automatically.
In this use case, the database connection is Talend_Oracle and the schema is new_customers.
In this use case, the input schema is stored in the Metadata node of the Repository tree view for easy retrieval. For
further information concerning metadata, see Talend Studio User Guide.
You can also select the input component by dropping the relevant schema from the Metadata area onto the design
workspace and double-clicking tELTOracleInput from the [Components] dialog box. Doing so allows you to skip
the steps of labeling the input component and defining its schema manually.
3.
4.
Select Repository from the Property Type list, and select the same database connection that you use for
the input components.
All the database details are automatically retrieved.
Leave the other settings as they are.
5.
Double-click the tELTOracleMap component to launch the ELT Map editor to set up the data transformation
flow.
Display the input table by clicking the green plus button at the upper left corner of the ELT Map editor and
selecting the relevant table name in the [Add a new alias] dialog box.
In this use case, the only input table is new_customers.
6.
Select all the columns in the input table and drop them to the output table.
7.
Click the Generated SQL Select query tab to display the query statement to be executed.
Click OK to validate the ELT Map settings and close the ELT Map editor.
8.
9.
In the table that appears, select the check boxes for the columns you want to update.
In this use case, we want to update all the data according to the customer ID. Therefore, select all the check
boxes except the one for the ID column.
The columns defined as the primary key cannot and must not be made subject to updates.
10. Select the Use Merge Insert check box to insert new data while updating the existing data by leveraging
Oracle's MERGE function.
In the table that appears, select the check boxes for the columns into which you want to insert new data.
In this use case, we want to insert all the new customer data. Therefore, select all the check boxes by clicking
the Check All check box.
11. Fill the Default Table Name field with the name of the target table already existing in your database. In this
example, fill in customers_merge.
12. Leave the other parameters as they are.
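Behind the scenes, these settings make tELTOracleOutput build an Oracle MERGE statement. The sketch below
is only indicative: the id join column comes from this scenario, while name and address are hypothetical
placeholders for whatever other columns your new_customers schema contains; the statement actually generated
by the Job may differ:
MERGE INTO customers_merge USING new_customers
ON (customers_merge.id = new_customers.id)
WHEN MATCHED THEN UPDATE SET
customers_merge.name = new_customers.name,
customers_merge.address = new_customers.address
WHEN NOT MATCHED THEN INSERT (id, name, address)
VALUES (new_customers.id, new_customers.name, new_customers.address)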
2.
tELTPostgresqlInput
tELTPostgresqlInput properties
The three ELT Postgresql components are closely related, in terms of their operating conditions. These components
should be used to handle Postgresql DB schemas to generate Insert statements, including clauses, which are to be
executed in the DB output table defined.
Component family
ELT/Map/Postgresql
Function
Provides the table schema to be used for the SQL statement to execute.
Purpose
Allows you to add as many Input tables as required for the most complicated Insert statement.
Basic settings
Advanced settings
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at a Job
level as well as at each component level.
Usage
tELTPostgresqlInput is to be used along with the tELTPostgresqlMap. Note that the Output
link to be used with these components must correspond strictly to the syntax of the table name
Note that the ELT components do not handle actual data flow but only schema
information.
Related scenarios
For use cases in relation with tELTPostgresqlInput, see tELTMysqlMap scenarios:
section Scenario 1: Aggregating table columns and filtering
section Scenario 2: ELT using an Alias table
tELTPostgresqlMap
tELTPostgresqlMap properties
The three ELT Postgresql components are closely related, in terms of their operating conditions. These components
should be used to handle Postgresql DB schemas to generate Insert statements, including clauses, which are to be
executed in the DB output table defined.
Component family
ELT/Map/Postgresql
Function
Helps to build the SQL statement graphically, using the table provided as input.
Purpose
Uses the tables provided as input, to feed the parameter in the built statement. The statement can
include inner or outer joins to be implemented between tables or between one table and its aliases.
Basic settings
Use an existing connection
Select this check box and in the Component List click the relevant
connection component to reuse the connection details you already
defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
ELT Postgresql Map Editor The ELT Map editor allows you to define the output schema and
make a graphical build of the SQL statement to be executed. The
column names of schema can be different from the column names
in the database.
Style link
Property type
Host
Port
Database
Additional
parameters
tStatCatcher Statistics
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independent of
Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is
selected in the Basic settings view. Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
Related scenario:
For related scenarios, see tELTMysqlMap scenarios:
section Scenario 1: Aggregating table columns and filtering.
section Scenario 2: ELT using an Alias table.
tELTPostgresqlOutput
tELTPostgresqlOutput properties
The three ELT Postgresql components are closely related, in terms of their operating conditions. These components
should be used to handle Postgresql DB schemas to generate Insert statements, including clauses, which are to be
executed in the DB output table defined.
Component family
ELT/Map/Postgresql
Function
Carries out the action on the table specified and inserts the data according to the output schema
defined in the ELT Mapper.
Purpose
Executes the SQL Insert, Update and Delete statement to the Postgresql database
Basic settings
Action on data
On the data of the table defined, you can perform the following
operation:
Insert: Add new entries to the table. If duplicates are found, Job
stops.
Update: Updates entries in the table.
Delete: Deletes the entries which correspond to the entry flow.
Where clauses (for UPDATE and DELETE only)
Enter a clause to filter the data to be updated or deleted during the update or delete operations.
Select this check box to enter a different output table name, between
double quotation marks, in the Table name field which appears.
Advanced settings
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at a Job
level as well as at each component level.
Usage
tELTPostgresqlOutput is to be used along with the tELTPostgresqlMap. Note that the Output
link to be used with these components must correspond strictly to the syntax of the table name.
Note that the ELT components do not handle actual data flow but only schema
information.
Related scenarios
For use cases in relation with tELTPostgresqlOutput, see tELTMysqlMap scenarios:
section Scenario 1: Aggregating table columns and filtering
section Scenario 2: ELT using an Alias table
tELTSybaseInput
tELTSybaseInput properties
The three ELT Sybase components are closely related, in terms of their operating conditions. These components
should be used to handle Sybase DB schemas to generate Insert statements, including clauses, which are to be
executed in the DB output table defined.
Component family
ELT/Map/Sybase
Function
Purpose
Allows you to add as many Input tables as required, for Insert statements which can be complex.
Basic settings
Enter a default name for the table, between double quotation marks.
Advanced settings
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at a Job
level as well as at each component level.
Usage
tELTSybaseInput is intended for use with tELTSybaseMap. Note that the Output link to be used
with these components must correspond strictly to the syntax of the table name.
The ELT components only handle schema information. They do not handle actual data flow.
Limitation
This component requires installation of its related jar files. For more information about the
installation of these missing jar files, see the section describing how to configure the Studio of the
Talend Installation and Upgrade Guide.
Related scenarios
For scenarios in which tELTSybaseInput may be used, see tELTMysqlMap scenarios:
section Scenario 1: Aggregating table columns and filtering
section Scenario 2: ELT using an Alias table.
tELTSybaseMap
tELTSybaseMap properties
The three ELT Sybase components are closely related in terms of their operating conditions. These components
should be used to handle Sybase DB schemas to generate Insert statements, including clauses, which are to be
executed in the DB output table defined.
Component family
ELT/Map/Sybase
Function
Allows you to graphically build the SQL statement using the table provided as input.
Purpose
Uses the tables provided as input to feed the parameters required to execute the SQL statement.
The statement can include inner or outer joins to be implemented between tables or between a table
and its aliases.
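For orientation, a statement joining a table with an alias of itself, of the kind the mapper can build, might look like the sketch below; all table and column names are illustrative assumptions:

    SELECT a.id, b.name
    FROM employees a
    INNER JOIN employees b ON a.manager_id = b.id;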
Basic settings
Use an existing connection
Select this check box and in the Component List click the relevant
connection component to reuse the connection details you already
defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
The ELT Map editor allows you to define the output schema and
make a graphical build of the SQL statement to be executed. The
column names of schema can be different from the column names
in the database.
Style link
Property type
Host
Port
Database
Username and Password
Advanced settings
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at a Job
level as well as at component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independently of
Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is
selected in the Basic settings view. Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
tELTSybaseMap is intended for use with tELTSybaseInput and tELTSybaseOutput. Note that
the Output link to be used with these components must correspond strictly to the syntax of the
table name.
The ELT components only handle schema information. They do not handle actual data
flow.
Limitation
This component requires installation of its related jar files. For more information about the
installation of these missing jar files, see the section describing how to configure the Studio of the
Talend Installation and Upgrade Guide.
Related scenarios
For scenarios in which tELTSybaseMap may be used, see the following tELTMysqlMap scenarios:
section Scenario 1: Aggregating table columns and filtering.
section Scenario 2: ELT using an Alias table.
tELTSybaseOutput
tELTSybaseOutput properties
The three ELT Sybase components are closely related in terms of their operating conditions. These components
should be used to handle Sybase DB schemas to generate Insert statements, including clauses, which are to be
executed in the DB output table defined.
Component family
ELT/Map/Sybase
Function
Carries out the action on the table specified and inserts the data according to the output schema
defined in the ELT Mapper.
Purpose
Executes SQL Insert, Update and Delete statements on the Sybase database.
Basic settings
Action on data
On the data of the table defined, you can perform the following
operation:
Insert: Add new entries to the table. If duplicates are found, the Job
stops.
Update: Updates entries in the table.
Delete: Deletes the entries which correspond to the entry flow.
Schema and Edit schema
Where clauses (for UPDATE and DELETE only)
Enter a clause to filter the data to be updated or deleted during the
update or delete operations.
Enter a default name for the table, between double quotation marks.
Use different table name
Select this check box to enter a different output table name, between
double quotation marks, in the Table name field which appears.
Advanced settings
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at a Job
level as well as at component level.
Usage
tELTSybaseOutput is intended for use with tELTSybaseMap. Note that the Output link to be used
with these components must correspond strictly to the syntax of the table name.
The ELT components only handle schema information. They do not handle actual data flow.
Limitation
This component requires installation of its related jar files. For more information about the
installation of these missing jar files, see the section describing how to configure the Studio of the
Talend Installation and Upgrade Guide.
Related scenarios
For scenarios in which tELTSybaseOutput may be used, see the following tELTMysqlMap scenarios:
section Scenario 1: Aggregating table columns and filtering.
section Scenario 2: ELT using an Alias table.
tELTTeradataInput
tELTTeradataInput properties
The three ELT Teradata components are closely related, in terms of their operating conditions. These components
should be used to handle Teradata DB schemas to generate Insert statements, including clauses, which are to be
executed in the DB output table defined.
Component family
ELT/Map/Teradata
Function
Provides the table schema to be used for the SQL statement to execute.
Purpose
Allows you to add as many Input tables as required for the most complicated Insert statement.
Basic settings
Enter a default name for the table, between double quotation marks.
Advanced settings
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at a Job
level as well as at component level.
Usage
tELTTeradataInput is to be used along with the tELTTeradataMap. Note that the Output link
to be used with these components must correspond strictly to the syntax of the table name.
Note that the ELT components do not handle actual data flow but only schema
information.
Related scenarios
For use cases in relation with tELTTeradataInput, see tELTMysqlMap scenarios:
section Scenario 1: Aggregating table columns and filtering
section Scenario 2: ELT using an Alias table
tELTTeradataMap
tELTTeradataMap properties
The three ELT Teradata components are closely related, in terms of their operating conditions. These components
should be used to handle Teradata DB schemas to generate Insert statements, including clauses, which are to be
executed in the DB output table defined.
Component family
ELT/Map/Teradata
Function
Helps to graphically build the SQL statement using the table provided as input.
Purpose
Uses the tables provided as input, to feed the parameter in the built statement. The statement can
include inner or outer joins to be implemented between tables or between one table and its aliases.
Basic settings
Use an existing connection
Select this check box and in the Component List click the relevant
connection component to reuse the connection details you already
defined.
When a Job contains the parent Job and the child Job, if
you need to share an existing connection between the two
levels, for example, to share the connection created by the
parent Job with the child Job, you have to:
1. In the parent level, register the database connection
to be shared in the Basic settings view of the
connection component which creates that very database
connection.
2. In the child level, use a dedicated connection
component to read that registered database connection.
For an example about how to share a database connection
across Job levels, see Talend Studio User Guide.
The ELT Map editor allows you to define the output schema as well
as build graphically the SQL statement to be executed. The column
names of schema can be different from the column names in the
database.
Style link
Property type
Host
Port
Database
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independently of
Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is
selected in the Basic settings view. Once a dynamic parameter is defined, the Component List
box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
tELTTeradataMap is intended for use with tELTTeradataInput and tELTTeradataOutput. Note that
the Output link to be used with these components must correspond strictly to the syntax of the
table name.
The ELT components only handle schema information. They do not handle actual data flow.
Related scenarios
For use cases in relation with tELTTeradataMap, see tELTMysqlMap scenarios:
section Scenario 1: Aggregating table columns and filtering.
section Scenario 2: ELT using an Alias table.
tELTTeradataOutput
tELTTeradataOutput properties
The three ELT Teradata components are closely related, in terms of their operating conditions. These components
should be used to handle Teradata DB schemas to generate Insert statements, including clauses, which are to be
executed in the DB output table defined.
Component family
ELT/Map/Teradata
Function
Carries out the action on the table specified and inserts the data according to the output schema
defined in the ELT Mapper.
Purpose
Executes SQL Insert, Update and Delete statements on the Teradata database.
Basic settings
Action on data
On the data of the table defined, you can perform the following
operation:
Insert: Add new entries to the table. If duplicates are found, the Job
stops.
Update: Updates entries in the table.
Delete: Deletes the entries which correspond to the entry flow.
Schema and Edit schema
Where clauses (for UPDATE and DELETE only)
Enter a clause to filter the data to be updated or deleted during the
update or delete operations.
Enter a default name for the table, between double quotation marks.
Use different table name
Select this check box to enter a different output table name, between
double quotation marks, in the Table name field which appears.
Advanced settings
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at a Job
level as well as at component level.
Usage
tELTTeradataOutput is to be used along with the tELTTeradataMap. Note that the Output link
to be used with these components must correspond strictly to the syntax of the table name.
Note that the ELT components do not handle actual data flow but only schema
information.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For
details, see the section about external modules in the Talend Installation and Upgrade Guide.
Related scenarios
For use cases in relation with tELTTeradataOutput, see tELTMysqlMap scenarios:
section Scenario 1: Aggregating table columns and filtering.
section Scenario 2: ELT using an Alias table.
tFirebirdConnection
tFirebirdConnection belongs to two component families: Databases and ELT. For more information on it, see
section tFirebirdConnection.
tGreenplumConnection
tGreenplumConnection belongs to two component families: Databases and ELT. For more information on it,
see section tGreenplumConnection.
tHiveConnection
tHiveConnection belongs to two component families: Databases and ELT. For more information on it, see section
tHiveConnection.
tIngresConnection
tIngresConnection belongs to two component families: Databases and ELT. For more information on it, see
section tIngresConnection.
tInterbaseConnection
tInterbaseConnection belongs to two component families: Databases and ELT. For more information on it, see
section tInterbaseConnection.
tJDBCConnection
tJDBCConnection belongs to two component families: Databases and ELT. For more information on it, see
section tJDBCConnection.
tMSSqlConnection
tMSSqlConnection belongs to two component families: Databases and ELT. For more information on it, see
section tMSSqlConnection.
tMysqlConnection
tMysqlConnection belongs to two component families: Databases and ELT. For more information on it, see
section tMysqlConnection.
tNetezzaConnection
tNetezzaConnection belongs to two component families: Databases and ELT. For more information on it, see
section tNetezzaConnection.
tOracleConnection
tOracleConnection belongs to two component families: Databases and ELT. For more information on it, see
section tOracleConnection.
tParAccelConnection
tParAccelConnection belongs to two component families: Databases and ELT. For more information on it, see
section tParAccelConnection.
tPostgresPlusConnection
tPostgresPlusConnection belongs to two component families: Databases and ELT. For more information on it,
see section tPostgresPlusConnection.
tPostgresqlConnection
tPostgresqlConnection belongs to two component families: Databases and ELT. For more information on it, see
section tPostgresqlConnection.
tSQLiteConnection
tSQLiteConnection belongs to two component families: Databases and ELT. For more information on it, see
section tSQLiteConnection.
tSQLTemplate
tSQLTemplate properties
Component family
ELT/SQLTemplate
Function
tSQLTemplate offers a range of SQL statement templates for a number of DBMSs to facilitate
some of the most common database actions. You can also customize the SQL statement
templates as needed.
Purpose
Helps users to conveniently execute the common database actions or customized SQL statement
templates, for example to drop/create a table. Note that such templates are accessible via the
SQL Template view.
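As a sketch only (the actual templates are provided per database type in the SQL Template view and may differ in syntax), a drop/create template could expand to statements such as the following; the table and columns are assumptions:

    DROP TABLE IF EXISTS customer_info;
    CREATE TABLE customer_info (
      id INT,
      name VARCHAR(50)
    );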
Basic settings
Database Type
Select the database type you want to connect to from the list.
Component List
Database name
Table name
Advanced settings
tStatCatcher Statistics
SQL Template
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your database connection dynamically from multiple connections planned in your Job. This
feature is useful when you need to access database tables having the same data structure but in
different databases, especially when you are working in an environment where you cannot change
your Job settings, for example, when your Job has to be deployed and executed independently of
Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box
is selected in the Basic settings view. Once a dynamic parameter is defined, the Component
List box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Global Variables
NB_LINE: Indicates the number of rows read by an input component or transferred to an output
component. This is an After variable and it returns an integer.
QUERY: Indicates the query to be processed. This is a Flow variable and it returns a string.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and
choose the variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable
means it functions after the execution of a component.
Usage
As a start component, this component is used with other database components, especially the
database connection and commit components.
Related scenarios
For a related scenario, see section Scenario: Filtering and aggregating table columns directly on the DBMS.
tSQLTemplateAggregate
tSQLTemplateAggregate properties
Component family
ELT/SQLTemplate
Function
tSQLTemplateAggregate collects data values from one or more columns with the intent to
manage the collection as a single unit. This component has real-time capabilities since it runs
the data transformation on the DBMS itself.
Purpose
Basic settings
Database Type
Select the database type you want to connect to from the list.
Component List
Database name
Source table name
Name of the table holding the data you want to collect values from.
Target table name
Name of the table you want to write the collected and transformed
data in.
Operations
Select the type of operation along with the value to use for the
calculation and the output field.
Output Column: Select the destination field in the list.
Function: Select any of the following operations to perform on
data: count, min, max, avg, sum, and count (distinct).
Input column position: Select the input column from which you
want to collect the values to be aggregated.
Group by
Define the aggregation sets, the values of which will be used for
calculations.
Output Column: Select the column label in the list offered
according to the schema structure you defined. You can add
as many output columns as you wish to make more precise
aggregations.
Input Column position: Match the input column label with your
output columns, in case the output label of the aggregation set
needs to be different.
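Taken together, these settings describe an aggregation of the following general shape; the table and column names here are illustrative assumptions, not values taken from the component:

    INSERT INTO target_table (group_col, total)
    SELECT group_col, SUM(amount)
    FROM source_table
    GROUP BY group_col;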
Advanced settings
tStatCatcher Statistics
SQL Template
Click in the SQL template field and then click the arrow to display
the system SQL template list. Select the desired system SQL
template provided by Talend.
Note: You can create your own SQL templates and add them to the
SQL Template list.
To create a user-defined SQL template:
-Select a system template from the SQL Template list and click
on its code in the code box. You will be prompted by the system
to create a new template.
-Click Yes to open the SQL template wizard.
-Define your new SQL template in the corresponding fields and
click Finish to close the wizard. An SQL template editor opens
where you can enter the template code.
-Click the Add button to add the newly created template to the SQL
Template list.
For more information, see Talend Studio User Guide.
Usage
Limitation
n/a
In the design workspace, select tMysqlConnection and click the Component tab to define the basic settings
for tMysqlConnection.
In the Basic settings view, set the database connection details manually or select Repository from the Property
Type list and select your DB connection if it has already been defined and stored in the Metadata area of the
Repository tree view.
For more information about Metadata, see Talend Studio User Guide.
In the design workspace, select tSQLTemplateFilterColumns and click the Component tab to define its basic
settings.
When you define the data structure for the source table, column names automatically appear in the Column list in the
Column filters panel.
In this scenario, the source table has five columns: id, First_Name, Last_Name, Address, and id_State.
In the Column filters panel, set the column filter by selecting the check boxes of the columns you want to
extract from the source table.
In this scenario, the tSQLTemplateFilterColumns component instantiates only three columns: id, First_Name,
and id_State from the source table.
In the Component view, you can click the SQL Template tab and add system SQL templates or create your own and
use them within your Job to carry out the coded operation. For more information, see section tSQLTemplateFilterColumns
Properties.
In the design workspace, select tSQLTemplateFilterRows and click the Component tab to define its basic
settings.
In the Operations panel, click the plus button to add one or more lines and then click in the Output column
line to select the output column that will hold the counted data.
Click in the Function line and select the operation to be carried out.
In the Group by panel, click the plus button to add one or more lines and then click in the Output column line
to select the output column that will hold the aggregated data.
In the design workspace, select tSQLTemplateCommit and click the Component tab to define its basic
settings.
On the Database type list, select the relevant database.
On the Component list, select the relevant database connection component if more than one connection is used.
Do the same for tSQLTemplateRollback.
Save your Job and press F6 to execute it.
A two-column table aggregate_customers is created in the database. It groups customers according to their marital
status and counts the number of customers in each marital group.
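The aggregation this Job performs corresponds roughly to the statement below; the column names are assumptions, since the scenario does not list the source schema in full:

    INSERT INTO aggregate_customers (marital_status, customer_count)
    SELECT marital_status, COUNT(id)
    FROM customers
    GROUP BY marital_status;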
tSQLTemplateCommit
tSQLTemplateCommit properties
This component is closely related to tSQLTemplateRollback and to the ELT connection component for the
database you work with. tSQLTemplateCommit, tSQLTemplateRollback and the ELT database connection
component are usually used together in a transaction.
Component family
ELT/SQLTemplate
Function
Purpose
Using a single connection, this component commits a global action in one go instead of doing so
for every row or every batch of rows separately. This provides a gain in performance.
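In plain SQL terms, the pattern managed by the connection, commit and rollback components looks like the generic sketch below; this is not code emitted by the component, and the statements are placeholders:

    BEGIN;
    -- any number of statements executed through the shared connection
    UPDATE accounts SET balance = balance - 100 WHERE id = 1;
    UPDATE accounts SET balance = balance + 100 WHERE id = 2;
    COMMIT; -- issued once for the whole unit of work; tSQLTemplateRollback would issue ROLLBACK instead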
Basic settings
Database Type
Select the database type you want to connect to from the list.
Component List
Close Connection
Clear this check box to continue to use the selected connection once
the component has performed its task.
Advanced settings
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at a Job
level as well as at each component level.
SQL Template
Usage
This component is to be used with ELT components, especially with tSQLTemplateRollback and
the relevant database connection component.
Limitation
n/a
Related scenario
This component is closely related to tSQLTemplateRollback and to the ELT connection component depending
on the database you are working with. It usually does not make much sense to use ELT components without using
the relevant ELT database connection component as its purpose is to open a connection for a transaction.
For more information on tSQLTemplateCommit, see section Scenario: Filtering and aggregating table columns
directly on the DBMS.
tSQLTemplateFilterColumns
tSQLTemplateFilterColumns Properties
Component family
ELT/SQLTemplate
Function
Purpose
Basic settings
Database Type
Select the type of database you want to work on from the drop-down list.
Component List
Database name
Name of the table you want to write the filtered data in.
Column Filters
In the table, click the Filter check box to filter all of the columns.
To select specific columns for filtering, select the check box(es)
which correspond(s) to the column name(s).
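In SQL terms, a column filter amounts to projecting a subset of columns from the source table into the target table, roughly as sketched below; the table names are assumptions, and the column names are borrowed from the scenario described earlier:

    INSERT INTO customer_filtered (id, First_Name, id_State)
    SELECT id, First_Name, id_State
    FROM customers;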
Advanced settings
tStatCatcher Statistics
SQL Template
Limitation
n/a
Related Scenario
For a related scenario, see section Scenario: Filtering and aggregating table columns directly on the DBMS.
tSQLTemplateFilterRows
tSQLTemplateFilterRows Properties
Component family
ELT/SQLTemplate
Function
tSQLTemplateFilterRows allows you to define a row filter on one table. This component has
real-time capabilities since it runs the data filtering on the DBMS itself.
Purpose
Helps to set row filters for any given data source, based on a WHERE clause.
Basic settings
Database Type
Select the type of database you want to work on from the drop-down
list.
Component List
Database name
Name of the table you want to write the filtered data in.
Where condition
Use a WHERE clause to set the criteria that you want the rows to
meet.
You can use the WHERE clause to select specific rows from the
table that match specified criteria or conditions.
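A minimal sketch of what such a row filter executes on the DBMS; the table names, column name and condition are all illustrative assumptions:

    INSERT INTO customer_filtered
    SELECT *
    FROM customers
    WHERE customers.id_State = 10;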
Advanced settings
tStatCatcher Statistics
SQL Template
Limitation
n/a
Related Scenario
For a related scenario, see section Scenario: Filtering and aggregating table columns directly on the DBMS.
tSQLTemplateMerge
tSQLTemplateMerge properties
Component family
ELT/SQLTemplate
Function
This component creates an SQL MERGE statement to merge data into a database table.
Purpose
This component is used to merge data into a database table directly on the DBMS by creating and
executing a MERGE statement.
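On DBMSs that support a native MERGE statement (Oracle, for example), the generated statement has the general shape sketched below; the table and column names are placeholders, not defaults of the component:

    MERGE INTO target_table t
    USING source_table s
    ON (t.ID = s.ID)
    WHEN MATCHED THEN
      UPDATE SET t.name = s.name
    WHEN NOT MATCHED THEN
      INSERT (ID, name) VALUES (s.ID, s.name);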
Basic settings
Database Type
Select the type of database you want to work on from the drop-down
list.
Component list
Select the relevant DB connection component from the list if you use
more than one connection in the current Job.
Source table name
Name of the database table holding the data you want to merge into
the target table.
Merge ON
Specify the target and source columns you want to use as the primary
keys.
Use UPDATE (WHEN MATCHED)
Select this check box to update existing records. With the check
box selected, the UPDATE Columns table appears, allowing you to
define the columns in which records are to be updated.
Specify additional output columns
Select this check box to update records in additional columns other
than those listed in the UPDATE Columns table. With this check
box selected, the Additional UPDATE Columns table appears,
allowing you to specify additional columns.
Specify UPDATE WHERE clause
Select this check box and type in a WHERE clause in the WHERE
clause field to filter data during the update operation.
This option may not work with certain database versions,
including Oracle 9i.
Use INSERT (WHEN NOT MATCHED)
Select this check box to insert new records. With the check box
selected, the INSERT Columns table appears, allowing you to
specify the columns to be involved in the insert operation.
Specify additional output columns
Select this check box to insert records to additional columns other
than those listed in the INSERT Columns table. With this check box
selected, the Additional INSERT Columns table appears, allowing
you to specify additional columns.
Specify INSERT WHERE clause
Select this check box and type in a WHERE clause in the WHERE
clause field to filter data during the insert operation.
Advanced settings
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at a Job
level as well as at component level.
SQL Template
Usage
Set the database connection details manually or select Repository from the Property Type list and select your
DB connection if it has already been defined and stored in the Metadata area of the Repository tree view.
For more information about Metadata, see Talend Studio User Guide.
Double-click the first tMysqlInput component to display its Basic settings view.
Select the Use an existing connection check box. If you are using more than one DB connection component in
your Job, select the component you want to use from the Component List.
Click the three-dot button next to Edit schema and define the data structure of the target table, or select
Repository from the Schema list and select the target table if the schema has already been defined and stored
in the Metadata area of the Repository tree view.
In this scenario, we use built-in schemas.
Define the columns as shown above, and then click OK to propagate the schema structure to the output
component and close the schema dialog box.
Fill the Table Name field with the name of the target table, customer_info_merge in this scenario.
Click the Guess Query button, or type in SELECT * FROM customer_info_merge in the Query area, to
retrieve all the table columns.
Define the properties of the second tMysqlInput component, using exactly the same settings as for the first
tMysqlInput component.
In the Basic settings view of each tLogRow component, select the Table option in the Mode area so that the
contents will be displayed in table cells on the console.
Type in the names of the source table and the target table in the relevant fields.
In this scenario, the source table is new_customer_info, which contains eight records; the target table is
customer_info_merge, which contains five records, and both tables have the same data structure.
The source table and the target table may have different schema structures. In this case, however, make sure that the source
column and target column specified in each line of the Merge ON table, the UPDATE Columns table, and the INSERT
Columns table are identical in data type and the target column length allows the insertion of the data from the corresponding
source column.
Define the source schema manually, or select Repository from the Schema list and select the relevant table if
the schema has already been defined and stored in the Metadata area of the Repository tree view.
In this scenario, we use built-in schemas.
Define the columns as shown above and click OK to close the schema dialog box, and do the same for the
target schema.
Click the green plus button beneath the Merge ON table to add a line, and select the ID column as the primary
key.
Select the Use UPDATE check box to update existing data during the merge operation, and define the columns
to be updated by clicking the green plus button and selecting the desired columns.
In this scenario, we want to update all the columns according to the customer IDs. Therefore, we select all the
columns except the ID column.
The columns defined as the primary key CANNOT and MUST NOT be made subject to updates.
Select the Specify UPDATE WHERE clause check box and type in customer_info_merge.ID >= 4 within
double quotation marks in the WHERE clause field so that only those existing records with an ID equal to or
greater than 4 will be updated.
Select the Use INSERT check box and define the columns to take data from and insert data to in the INSERT
Columns table.
In this example, we want to insert all the records that do not exist in the target table.
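Since this scenario runs against MySQL, which has no native MERGE statement, the component's MergeUpdate and MergeInsert templates (listed in the SQL Template view below) presumably expand to an UPDATE plus an INSERT ... SELECT pair. A rough sketch with the scenario's settings, where the name column is an assumption because the full schema is not listed:

    UPDATE customer_info_merge t, new_customer_info s
    SET t.name = s.name
    WHERE t.ID = s.ID
      AND t.ID >= 4;

    INSERT INTO customer_info_merge (ID, name)
    SELECT s.ID, s.name
    FROM new_customer_info s
    WHERE s.ID NOT IN (SELECT ID FROM customer_info_merge);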
Select the SQL Template view to display and add the SQL templates to be used.
By default, the SQLTemplateMerge component uses two system SQL templates: MergeUpdate and
MergeInsert.
In the SQL Template tab, you can add system SQL templates or create your own and use them within your Job to carry out
the coded operation. For more information, see section tSQLTemplateFilterColumns Properties.
Click the Add button to add a line and select Commit from the template list to commit the merge result to
your database.
Alternatively, you can connect the tSQLTemplateMerge component to a tSQLTemplateCommit or
tMysqlCommit component using a Trigger > OnSubjobOK connection to commit the merge result to your
database.
Save your Job and press F6 to run it.
Both the original contents of the target table and the merge result are displayed on the console. In the target
table, records No. 4 and No. 5 contain the updated information, and records No.6 through No. 8 contain the
inserted information.
tSQLTemplateRollback
tSQLTemplateRollback properties
This component is closely related to tSQLTemplateCommit and to the ELT connection component relative to
the database you work with. tSQLTemplateRollback, tSQLTemplateCommit and the ELT database connection
component are usually used together in a transaction.
Component family
ELT/SQLTemplate
Function
tSQLTemplateRollback cancels the transaction in progress in the database you connect to, preventing the changes from being committed.
Purpose
Basic settings
Database Type
Select the database type you want to connect to from the list.
Component List
Close Connection
Clear this check box to continue to use the selected connection once
the component has performed its task.
Advanced settings
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at a Job
level as well as at each component level.
SQL Template
Usage
This component is to be used with ELT components, especially with tSQLTemplateCommit and
the relevant database connection component.
Limitation
n/a
Related scenarios
For a tSQLTemplateRollback related scenario, see section Scenario: Filtering and aggregating table columns
directly on the DBMS.
tSybaseConnection
tSybaseConnection belongs to two component families: Databases and ELT. For more information on it, see
section tSybaseConnection.
tTeradataConnection
tTeradataConnection belongs to two component families: Databases and ELT. For more information on it, see
section tTeradataConnection.
tVectorWiseConnection
tVectorWiseConnection belongs to two component families: Databases and ELT. For more information on it,
see section tVectorWiseConnection.
ESB components
This chapter details the main components that you can find in the ESB family of the Palette in the Integration
perspective of Talend Studio.
The ESB component family groups together the components dedicated to ESB related tasks.
For Talend Open Studio for Big Data, the Property type, Schema and Query Type of components are always Built-in. For
further information about how to edit a Built-in schema, see Talend Studio User Guide.
tESBConsumer
tESBConsumer properties
Component family
ESB/Web Services
Function
Calls the defined method from the invoked Web service and returns the class as defined, based
on the given parameters.
Purpose
Basic settings
Service configuration
Connection time out (second)
Set a value in seconds for Web service connection time out.
Receive time out (second)
Input Schema and Edit schema
Response Schema and Edit schema
A schema is a row description, i.e., it defines the number of fields
to be processed and passed on to the next component.
If you are using Talend Open Studio for Big Data, only the Built-in
mode is available.
Click Edit schema to make changes to the schema.
Built-in: The schema is created and stored locally for this
component only. Related topic: see Talend Studio User Guide.
Fault Schema and Edit schema
Trust server with SSL
Select this check box to validate the server certificate to the client
via an SSL protocol and fill in the corresponding fields:
TrustStore file: Enter the path (including filename) to the
certificate TrustStore file that contains the list of certificates that
the client trusts.
TrustStore password: Enter the password used to check the
integrity of the TrustStore data.
Die on error
Select this check box to kill the Job when an error occurs.
Advanced settings
Service Locator Custom Properties
This table appears when Use Service Locator is selected. You
can add as many lines as needed in the table to customize the
relevant properties. Enter the name and the value of each property
between double quotation marks in the Property Name field and
the Property Value field respectively.
Service Activity Custom Properties
Log messages
Select this check box to log the message exchange between the
service provider and the consumer.
tStatCatcher Statistics
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
turn on or off the Use Service Locator or Use Service Activity Monitor option dynamically at
runtime. You can add two rows in the table to set both options.
Once a dynamic parameter is defined, the corresponding option becomes highlighted and
unusable in the Basic settings view.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
Limitation
1. Drop the following components from the Palette onto the design workspace: a tFixedFlowInput, a
tXMLMap, a tESBConsumer, and two tLogRow components.
2. Right-click the tFixedFlowInput component, select Row > Main from the contextual menu and click the
tXMLMap component.
3. Right-click the tXMLMap component, select Row > *New Output* (Main) from the contextual menu and
click the tESBConsumer component. Enter payload in the popup dialog box to name this row and accept
the propagation that prompts you to get the schema from the tESBConsumer component.
4. Right-click the tESBConsumer component, select Row > Response from the contextual menu and click one
of the tLogRow components.
5. Right-click the tESBConsumer component again, select Row > Fault from the contextual menu and click
the other tLogRow component.
1. Double-click the tFixedFlowInput component to open its Basic settings view in the Component tab.
2. Click the three-dot button next to Edit Schema. In the schema dialog box, click the plus button to add a new
line of String type and name it payloadString. Click OK to close the dialog box.
3. In the Mode area, select Use Single Table and input the following request in double quotation marks into
the Value field:
nomatter@gmail.com
1. In the design workspace, double-click the tXMLMap component to open the Map Editor.
2. On the lower right part of the map editor, click [+] to add a row of Document type to the output table and
name it payload.
3. In the output table, right-click the root node and select Rename from the contextual menu. Enter IsValidEmail
in the dialog box that appears.
4. Right-click the IsValidEmail node and select Set A Namespace from the contextual menu. Enter http://
www.webservicex.net in the dialog box that appears.
5. Right-click the IsValidEmail node again and select Create Sub-Element from the contextual menu. Enter
Email in the dialog box that appears.
6. Right-click the Email node and select As loop element from the contextual menu.
7. Click the payloadString node in the input table and drop it to the Expression column in the row of the Email
node in the output table.
8. In the design workspace, double-click the tESBConsumer component to open its Basic settings view in the
Component tab.
9. In the dialog box that appears, type in http://www.webservicex.net/ValidateEmail.asmx?WSDL in the WSDL
field and click the refresh button to retrieve the port name and operation name. In the Port Name list, select the
port you want to use, ValidateEmailSoap in this example. Click Finish to validate your settings and close
the dialog box.
The tLogRow components will monitor the message exchanges and do not need any configuration. Press Ctrl+S
to save your Job.
1. Drop the following components from the Palette onto the design workspace: a tESBConsumer, a tMap,
two tFixedFlowInput, two tXMLMap, and two tLogRow components.
2. Connect each tFixedFlowInput to a tXMLMap using a Row > Main connection.
3. Right-click the first tXMLMap, select Row > *New Output* (Main) from the contextual menu and click
tMap. Enter payload in the popup dialog box to name this row.
Repeat this operation to connect the other tXMLMap to tMap and name the output row header.
4. Right-click the tMap component, select Row > *New Output* (Main) from the contextual menu and click
the tESBConsumer component. Enter request in the popup dialog box to name this row and accept the
propagation that prompts you to get the schema from the tESBConsumer component.
5. Right-click the tESBConsumer component, select Row > Response from the contextual menu and click one
of the tLogRow components.
6. Right-click the tESBConsumer component again, select Row > Fault from the contextual menu and click
the other tLogRow component.
1. Double-click the first tFixedFlowInput component to open its Basic settings view in the Component tab.
2. Click the [...] button next to Edit Schema. In the schema dialog box, click the [+] button to add a new line
of String type and name it payload. Click OK to close the dialog box.
3. In the Mode area, select Use Single Table and enter "nomatter@gmail.com" into the Value field, which
is the payload of the request message.
4. Configure the second tFixedFlowInput as the first one, except for its schema.
Add two rows of String type to the schema and name them id and company respectively.
Give the value Hello world! to id and Talend to company, which are the headers of the request message.
1. In the design workspace, double-click the first tXMLMap component to open the Map Editor.
2. On the lower right part of the map editor, click [+] to add a row of Document type to the output table and
name it payload.
3. In the output table, right-click the root node and select Rename from the contextual menu. Enter IsValidEmail
in the dialog box that appears.
4. Right-click the IsValidEmail node and select Set A Namespace from the contextual menu. Enter http://
www.webservicex.net in the dialog box that appears.
5. Right-click the IsValidEmail node again and select Create Sub-Element from the contextual menu. Enter
Email in the dialog box that appears.
6. Right-click the Email node and select As loop element from the contextual menu.
7. Click the payload node in the input table and drop it to the Expression column in the row of the Email node
in the output table.
8. Configure the other tXMLMap in the same way. Add a row of Document type to the output table and name
it header. Create two sub-elements to it, id and company. Map the id and the company nodes in the input
table to the corresponding nodes in the output table.
1. In the design workspace, double-click the tMap component to open the Map Editor.
2. On the lower right part of the map editor, click [+] to add two rows of Document type to the output table and
name them payload and headers respectively.
3. Click the payload node in the input table and drop it to the Expression column in the row of the payload
node in the output table.
4. Click the header node in the input table and drop it to the Expression column in the row of the headers
node in the output table.
1. In the design workspace, double-click the tESBConsumer component to open its Basic settings view in the
Component tab.
2. In the dialog box that appears, type in http://www.webservicex.net/ValidateEmail.asmx?WSDL in the WSDL
field and click the refresh button to retrieve the port name and operation name. In the Port Name list, select
the port you want to use, ValidateEmailSoap in this example. Click OK to validate your settings and close
the dialog box.
3. In the Advanced settings view, select the Log messages check box to log the content of the messages.
The tLogRow components will monitor the message exchanges and do not need any configuration. Press Ctrl+S
to save your Job.
As shown in the execution log, the SOAP header is sent with the request to the service.
tRESTClient
tRESTClient properties
Component family
ESB/REST
Function
The tRESTClient component sends HTTP and HTTPS requests to a REpresentational State
Transfer (REST) Web service provider and gets the corresponding responses. This component
integrates well with CXF to get HTTPS support, with more QoS features to be supported in time.
Purpose
The tRESTClient component is used to interact with RESTful Web service providers by
sending HTTP and HTTPS requests using CXF (JAX-RS).
Basic settings
URL
Relative Path
HTTP Method
From this list, select an HTTP method that describes the desired
action. The specific meanings of the HTTP methods are subject
to definitions of your Web service provider. Listed below are the
generally accepted HTTP method definitions:
- GET: retrieves data from the server end based on the given
parameters.
- POST: uploads data to the server end based on the given
parameters.
- PUT: updates data based on the given parameters, or if the data
does not exist, creates it.
- DELETE: removes data based on the given parameters.
Content Type
Accept Type
Select the media type the client end is prepared to accept for the
response from the server end.
Use Authentication
Use Service Locator
Use Service Activity Monitor
Input Schema
Schema for the input data. This schema contains two columns:
- body: stores the content of structured input data
- string: stores the input content when it is, or is handled as, a
string.
Response Schema
Schema for server response. This schema is passed onto the next
component via a Row > Response link, and it contains three
columns:
- statusCode: stores the HTTP status code from the server end.
- body: stores the content of a structured response from the server
end.
- string: stores the response content from the server end when it
is, or is handled as, a string.
Error Schema
Schema for error information. This schema is passed onto the next
component via a Row > Error link, and it contains two columns:
- errorCode: stores the HTTP status code from the server
end when an error occurs during the invocation process. The
specific meanings of the error codes are subject to definitions
of your Web service provider. For reference information, visit
en.wikipedia.org/wiki/List_of_HTTP_status_codes.
- errorMessage: stores the error message corresponding to the error
code.
Advanced settings
Die on error
Select this check box to kill the Job when an error occurs.
Clear the check box to skip the row on error and complete the
process for error-free rows.
Connection timeout
Set the amount of time, in seconds, that the client will attempt to
establish a connection before it times out. If set to 0, the client will
continue to attempt to open a connection indefinitely. (default:
30)
Receive timeout
Set the amount of time, in seconds, that the client will wait for
a response before it times out. If set to 0, the client will wait
indefinitely. (default: 60)
Log messages
Select this check box to log the message exchange between the
service provider and the consumer.
Convert Response To DOM Document
Select this check box to convert the response from the server to
document type.
Drop JSON Request Root
HTTP Headers
Service Locator Custom Properties
This option appears when Use Service Locator is enabled in the
Basic settings tab. Click [+] to add as many properties as needed
to the table. Enter the name and the value of each property in the
Property Name field and the Property Value field respectively
to identify the service.
Service Activity Custom Properties
This option appears when Use Service Activity Monitor is
enabled in the Basic settings tab. Click [+] to add as many
properties as needed to the table. Enter the name and the value
of each property in the Property Name field and the Property
Value field respectively to identify the service.
Use a proxy server
Select this check box if you are using a proxy server. Once
selected, you need to provide the connection details: host, port,
username and password.
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at the
Job level as well as at each component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
turn on or off the Use Service Locator or Use Service Activity Monitor option dynamically
at runtime. You can add two rows in the table to set both options.
Once a dynamic parameter is defined, the corresponding option becomes highlighted and
unusable in the Basic settings view.
For more information on Dynamic settings and context variables, see Talend Studio User
Guide.
Usage
This component is used as a RESTful Web service client to communicate with a RESTful
service provider, with the ability to input a request to a service into a Job and return the Job
result as a service response. Depending on the actions to perform, it usually works as a start
or middle component in a Job or subjob.
Connections
Outgoing links:
n/a
1. In the Windows Run box, enter the full path of the RestServer.jar.
2. In the browser address bar, enter the URL of the RESTful Web service, namely http://127.0.0.1:8080/
customerservice/customers.
You can find the service deployed and its original records.
1. Drop the following components from the Palette onto the design workspace: a tFixedFlowInput, two
tXMLMap, two tRESTClient components, and a tFileOutputDelimited component.
2. Connect the tFixedFlowInput to the first tXMLMap component using a Row > Main connection.
3. Connect the first tXMLMap component to the first tRESTClient component using a Row > Main
connection, and give it a name, out in this example.
4. Connect the second tRESTClient to the second tXMLMap using a Row > Response connection, which will
retrieve the customer information from the server end.
5. Connect the second tXMLMap to the tFileOutputDelimited using a Row > Main connection, and give it
a name, out2 in this example, to write the customer information into a CSV file.
1. Click the [...] button next to Edit schema and then set up the schema of the input data in the [Schema] dialog
box, and click OK to close the [Schema] dialog box.
In this example, the input schema has only one column of string type, name.
2. In the Basic settings view of tFixedFlowInput, fill the Number of rows field with 1.
In the Values table under the Use Single Table option, fill the Value field with a customer name, Gerald
Wilson for example, between double quotation marks.
3. Double-click the tXMLMap component labeled Map to XML to open the Map Editor.
4. In the output table, right-click the default root node of the body column, select Rename from the contextual
menu, and rename it to Customer. Make sure Customer is the loop element because the XML structure of the
Web service to be invoked is looped on this element.
Right-click the Customer node, select Create Sub-Element from the contextual menu, and create a sub-element
named name.
5. Drop the name column in the input table to the name node in the output table, and then click OK to validate
the mapping and close the Map Editor.
6. Double-click the tRESTClient component labeled HTTP POST to open its Basic settings view.
7. Fill the URL field with the URL of the Web service you are going to invoke. Note that the URL provided in
this use case is for demonstration purposes only and is not a live address.
8. From the HTTP Method list, select POST to send an HTTP request for creating a new record.
From the Content Type list, select the type of the content to be uploaded to the server end, XML in this
example.
From the Accept Type list, select the type the client end is prepared to accept for the response from the server
end, XML in this example. Leave the rest of the settings as they are.
9. Click the Advanced settings view of the HTTP POST component. Select the Log messages and the Convert
Response To DOM Document check boxes to log the message exchange to the server and convert the
response from the server to document type.
1. Double-click the tRESTClient component labeled HTTP GET to open its Basic settings view.
2. Fill the URL field with the same URL as in the first tRESTClient component.
3. From the HTTP Method list, select GET to send an HTTP request for retrieving the existing records, and
select XML from the Accept Type list. Leave the rest of the settings as they are.
4. In the Advanced settings view of the HTTP GET component, select the Log messages and the Convert
Response To DOM Document check boxes to log the message exchange to the server and convert the
response from the server to document type.
5. Double-click the tXMLMap component labeled Extract Response to open the Map Editor.
6. In the input table, right-click the default root node of the body column, select Rename from the contextual
menu, and rename it to Customers.
Right-click the Customers node, select Create Sub-Element from the contextual menu, and create a sub-element
named Customer. Make sure Customer is the loop element because the XML structure of the Web
service to be invoked is looped on this element.
Repeat this operation to create two sub-elements under the Customer node, id and name.
7. Drop the id and name columns in the input table to the output table, and then click OK to validate the mapping
and close the Map Editor.
8. In the File Name field, specify the path to the output file to save the GET result.
1. Go to the web console and you can find that a new record is added.
2. Go to the output file path to view the customer information in the CSV file.
File components
This chapter details the main components that you can find in the File family of the Palette in the Integration
perspective of Talend Studio.
The File family groups together components that read and write data in all types of files, from the most popular
to the most specific format (in the Input and Output subfamilies). In addition, the Management subfamily groups
together File-dedicated components that perform various tasks on files, including unarchiving, deleting, copying,
comparing files and so on.
For Talend Open Studio for Big Data, the Property type, Schema and Query Type of components are always Built-in. For
further information about how to edit a Built-in schema, see Talend Studio User Guide.
tAdvancedFileOutputXML
tAdvancedFileOutputXML belongs to two component families: File and XML. For more information on
tAdvancedFileOutputXML, see section tAdvancedFileOutputXML.
tApacheLogInput
tApacheLogInput properties
Component family
File/Input
Function
Purpose
tApacheLogInput helps to effectively manage the Apache HTTP Server. It is necessary to get feedback
about the activity and performance of the server as well as any problems that may be occurring.
Basic settings
Property type
File Name
Advanced settings
Die on error
Select this check box to stop the execution of the Job when an error occurs.
Clear the check box to skip the row on error and complete the process for
error-free rows. If needed, you can collect the rows on error using a Row
> Reject link.
Encoding
Select the encoding type from the list or select Custom and define it
manually. This field is compulsory for DB data handling.
tStatCatcher Statistics
Select this check box to gather the processing metadata at the Job level as
well as at each component level.
Usage
tApacheLogInput can be used with other components or as a standalone component. It allows you to
create a data flow using a Row > Main connection, or to create a reject flow to filter specified data
using a Row > Reject connection. For an example of how to use these two links, see section Scenario 2:
Extracting correct and erroneous data from an XML field in a delimited file.
Limitation
n/a
1. Drop a tApacheLogInput component and a tLogRow component from the Palette onto the design
workspace.
2. Right-click the tApacheLogInput component and connect it to the tLogRow component using a Main
Row link.
3. Click the Component tab to define the basic settings for tApacheLogInput.
4. If desired, click the Edit schema button to see the read-only columns.
5. In the File Name field, enter the file path or browse to the access-log file you want to read.
6. In the design workspace, select tLogRow and click the Component tab to define its basic settings. For more
information, see section tLogRow.
7. Press F6 to execute the Job.
The log lines of the defined file are displayed on the console.
tCreateTemporaryFile
tCreateTemporaryFile properties
Component family
File/Management
Function
Purpose
tCreateTemporaryFile creates a temporary file and puts it in a defined directory. This
component allows you to either keep the temporary file or delete it after Job execution.
Basic settings
Remove file when execution is over
Select this check box to delete the temporary file after Job
execution.
Use default temporary system directory
Select this check box to create the file in the system's default
temporary directory.
Directory
Template
Suffix
Enter the filename extension to indicate the file format you want
to give to the temporary file.
Usage
tCreateTemporaryFile provides the possibility to manage temporary files so that the memory
can be freed for other ends, thus optimizing system performance.
Global Variables
FILEPATH: Retrieves the path where the file was created. This
is an After variable and it returns a string.
For further information about variables, see Talend Studio User
Guide.
A Flow variable means it functions during the execution
of a component while an After variable means it
functions after the execution of a component.
Connections
Limitation
n/a
1607
Drop the following components from the Palette onto the design workspace: tCreate temporaryFile,
tRowGenerator, tFileOutputDelimited, tFileInputDelimited and tLogRow.
2.
3.
4.
5.
2.
Click the Component tab to define the basic settings for tCreateTemporaryFile.
3.
Select the Remove file when execution is over check box to delete the created temporary file when Job
execution is over.
4.
Click the three-dot button next to the Directory field to browse to the directory where temporary files will
be stored, or enter the path manually.
5.
In the Template field, enter a name for the temporary file respecting the template format.
6.
In the Suffix field, enter a filename extension to indicate the file format you want to give to the temporary file.
7.
In the design workspace, select tRowGenerator and click the Component tab to define its basic settings.
8. Click the Edit schema three-dot button to define the data to pass on to the tFileOutputDelimited component, one column in this scenario, value.
9. Click OK to close the dialog box.
10. Click the RowGenerator Editor three-dot button to open the editor dialog box.
11. In the Number of Rows for Rowgenerator field, enter 5 to generate five rows and click Ok to close the
dialog box.
12. In the design workspace, select tFileOutputDelimited and click the Component tab to define its basic
settings.
14. Click in the File Name field and use the Ctrl+Space bar combination to access the variable completion list.
To output data in the created temporary file, select tCreateTemporaryFile_1.FILEPATH on the global
variable list.
15. Set the row and field separators in their corresponding fields as needed.
16. Set Schema to Built-In and click Sync columns to synchronize input and output columns. Note that the row
connection feeds automatically the output schema.
For more information about schema types, see Talend Studio User Guide.
17. In the design workspace, select the tFileInputDelimited component.
18. Click the Component tab to define the basic settings of tFileInputDelimited.
19. Click in the File Name field and use the Ctrl+Space bar combination to access the variable completion
list. To read data in the created temporary file, select tCreateTemporaryFile_1.FILEPATH on the global
variable list.
20. Set the row and field separators in their corresponding fields as needed.
21. Set Schema to Built-In and click Edit schema to define the data to pass on to the tLogRow component. The schema consists of one column here, value.
22. Press F6 to execute the Job or click the Run button of the Run tab.
The temporary file is created in the defined directory during Job execution and the five generated rows are written
in it. The temporary file is deleted when Job execution is over.
tChangeFileEncoding
tChangeFileEncoding Properties
Component family
File/Management
Function
Purpose: tChangeFileEncoding transforms the character encoding of a given file and generates a new file with the transformed character encoding.
Basic settings
Use Custom Input Encoding: Select this check box to customize the input encoding type. When it is selected, a list of input encoding types appears, allowing you to select an input encoding type or specify one by selecting CUSTOM.
Encoding: From this list of character encoding types, you can select one of the offered options or customize the character encoding by selecting CUSTOM and specifying a character encoding type.
Usage
Limitation
n/a
3. Select the Use Custom Input Encoding check box and set the Encoding type to GB2312.
4. In the Input File Name field, enter the file path or browse to the input file.
5. In the Output File Name field, enter the file path or browse to the output file.
6. Select CUSTOM from the second Encoding list and enter UTF-16 in the text field.
7. Save your Job and press F6 to execute it.
The encoding type of the file in.txt is transformed and out.txt is generated with the UTF-16 encoding type.
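For reference, the transformation performed here can be sketched in plain Java (an illustration only, not the code Talend generates; the file names in.txt and out.txt are taken from this scenario):

import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ChangeFileEncoding {
    public static void main(String[] args) throws IOException {
        // Read the source file using its original encoding (GB2312 in this scenario).
        byte[] raw = Files.readAllBytes(Paths.get("in.txt"));
        String content = new String(raw, Charset.forName("GB2312"));
        // Write the same content back out encoded as UTF-16.
        Files.write(Paths.get("out.txt"), content.getBytes(Charset.forName("UTF-16")));
    }
}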
tFileArchive
tFileArchive properties
Component family: File/Management
Function: tFileArchive zips one or several files according to the parameters defined and places the created archive in the selected directory.
Purpose
Basic settings
Directory
Archive file
Compress level
All files: Select this check box if you want all files in the directory to be zipped. Clear it to specify the file(s) you want to zip in the Files table.
Filemask: type in a file name or a file mask using a special character or a regular expression.
Create directory if not exists: This check box is selected by default. It creates a destination folder for the output if it does not already exist.
Encoding: Select the encoding type from the list or select Custom and define it manually. This field is compulsory for DB data handling.
Overwrite Existing Archive: This check box is selected by default. This allows you to save an archive by replacing the existing one. If you clear the check box, an error is reported, the replacement fails and the new archive cannot be saved. When the replacement fails, the Job nevertheless continues to run.
Encrypt files
ZIP64 mode
Advanced settings
tStatCatcher Statistics
Select this check box to gather the processing metadata at the Job
level as well as at each component level.
Global Variables
ARCHIVE_FILEPATH: Retrieves the path to the archive file. This is an After variable and
it returns a string.
ARCHIVE_FILENAME: Retrieves the name of the archive file. This is an After variable
and it returns a string.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list
and choose the variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable
means it functions after the execution of a component.
Connections
Usage
Limitation
n/a
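For example, the two After variables listed above can be read in a downstream tJava component (a sketch; the instance name tFileArchive_1 and the tJava placement are assumptions):

// Hypothetical instance name; adjust to your Job.
String archivePath = (String) globalMap.get("tFileArchive_1_ARCHIVE_FILEPATH");
String archiveName = (String) globalMap.get("tFileArchive_1_ARCHIVE_FILENAME");
System.out.println("Created " + archiveName + " at " + archivePath);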
1. Drop the tFileArchive component from the Palette onto the workspace.
2. Double-click the tFileArchive component to display its Basic settings view.
3.
In the Directory field, click the [...] button, browse your directory and select the directory or the file you
want to compress.
4.
Select the Subdirectories check box if you want to include the subfolders and their files in the archive.
5.
Then, set the Archive file field, by filling the destination path and the name of your archive file.
6.
Select the Create directory if not exists check box if you do not have a destination directory yet and you
want to create it.
7.
In the Compress level list, select the compression level you want to apply to your archive. In this example,
we use the normal level.
8.
Clear the All Files check box if you only want to zip specific files.
9.
Add a row in the table by clicking the [+] button and click the name which appears. Between two star symbols (e.g. *RG*), type part of the name of the file that you want to compress.
tFileCompare
tFileCompare properties
Component family
File/Management
Function: Compares two files and provides comparison data (based on a read-only schema).
Purpose
Basic settings
File to compare
Reference file
If differences are detected, display and If no difference detected, display: Type in a message to be displayed in the Run console based on the result of the comparison.
Print to console
Advanced settings
Encoding: Select the encoding from the list or select Custom and define it manually. This field is compulsory for DB data handling.
tStatCatcher Statistics
Usage: This component can be used as a standalone component but it is usually linked to an output component to gather the log data.
Global Variables
DIFFERENCE: Checks whether two files are identical or not. This is a Flow variable and it returns a boolean value:
- true if the two files are identical.
- false if there is a difference between them.
For further information about variables, see Talend Studio User
Guide.
A Flow variable means it functions during the execution
of a component while an After variable means it
functions after the execution of a component.
Connections
Limitation
n/a
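For example, the DIFFERENCE Flow variable can drive a Run If connection (a sketch; the instance name tFileCompare_1 is an assumption). Following the definition above, where true means the two files are identical, the negated condition fires when they differ:

// Hypothetical Run If condition: true when the compared files differ.
!((Boolean) globalMap.get("tFileCompare_1_DIFFERENCE"))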
1.
Drag and drop the following components: tFileUnarchive, tFileCompare, and tFileOutputDelimited.
2.
3.
Connect the tFileCompare to the output component, using a Main row link.
4.
In the tFileUnarchive component Basic settings, fill in the path to the archive to unzip.
5.
In the Extraction Directory field, fill in the destination folder for the unarchived file.
6.
In the tFileCompare Basic settings, set the File to compare. Press Ctrl+Space bar to
display the list of global variables. Select $_globals{tFileUnarchive_1}{CURRENT_FILEPATH} or
"((String)globalMap.get("tFileUnarchive_1_CURRENT_FILEPATH"))" according to the language you
work with, to fetch the file path from the tFileUnarchive component.
7.
8.
In the messages fields, set the messages you want to see if the files differ or if the files are identical, for
example: "[job " + JobName + "] Files differ".
9.
Select the Print to Console check box, for the message defined to display at the end of the execution.
10. The schema is read-only and contains standard information data. Click Edit schema to have a look at it.
11. Then set the output component as usual, with semicolon as the data separator.
12. Save your Job and press F6 to run it.
The message set is displayed to the console and the output shows the schema information data.
tFileCopy
tFileCopy Properties
Component family
File/Management
Function: Copies a source file into a target directory and can remove the source file if required.
Purpose: Helps to streamline processes by automating recurrent and tedious tasks such as copy.
Basic settings
File Name
Destination
Replace existing file: Select this check box to overwrite any existing file with the newly copied file.
Advanced settings
tStatCatcher Statistics
Usage
Global Variables
Connections
Limitation
n/a
1619
1.
Drop a tFileList and a tFileCopy from the Palette to the design workspace.
2.
3.
In the tFileList Basic settings, set the directory for the iteration loop.
4.
Set the Filemask to *.txt to catch all files with this extension. In this use case, the filemask is not case sensitive.
5.
6.
In the File Name field, press Ctrl+Space bar to access the list of variables.
7.
8.
Select the Remove Source file check box to get rid of the files that have been copied.
9.
Select the Replace existing file check box to overwrite any file possibly present in the destination directory.
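For clarity, the behaviour assembled here can be sketched in plain Java (an illustration, not Talend's generated code; the paths are invented):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class FileCopyDemo {
    public static void main(String[] args) throws IOException {
        Path source = Paths.get("input/report.txt");   // hypothetical source file
        Path target = Paths.get("archive/report.txt"); // hypothetical destination
        Files.createDirectories(target.getParent());
        // "Replace existing file" corresponds to REPLACE_EXISTING.
        Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
        // "Remove source file" deletes the original after a successful copy.
        Files.delete(source);
    }
}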
tFileDelete
tFileDelete Properties
Component family
File/Management
Function
Purpose: Helps to streamline processes by automating recurrent and tedious tasks such as delete.
Basic settings
File Name
Path to the file to be deleted. This field is hidden when you select
the Delete folder check box or the Delete file or folder check box.
Directory
Enter the path to the file or to the folder you want to delete. This
field is available only when you select the Delete file or folder
check box.
Fail on error
Select this check box to prevent the main Job from being executed
if an error occurs, for example, if the file to be deleted does not
exist.
Delete Folder
Select this check box to display the Directory field, where you can indicate the path to the folder to be deleted.
Advanced settings
tStatCatcher Statistics
Usage
Global Variables
Connections
Limitation
n/a
1.
Drop the following components: tFileList, tFileDelete, tJava from the Palette to the design workspace.
2.
In the tFileList Basic settings, set the directory to loop on in the Directory field.
3.
4.
In the tFileDelete Basic settings panel, set the File Name field so that the file currently selected by the tFileList component is deleted. This deletes all the files contained in the directory, as specified earlier.
5.
Press Ctrl+Space to access the list of global variables. In Java, the relevant variable to collect the current file is: ((String)globalMap.get("tFileList_1_CURRENT_FILEPATH")).
6.
Then in the tJava component, define the message to be displayed in the standard output
(Run console). In this Java use case, type in the Code field, the following script:
System.out.println(((String)globalMap.get("tFileList_1_CURRENT_FILE")) + " has been deleted!");
7. Save your Job and press F6 to execute it.
The message set in the tJava component displays in the log, for each file that has been deleted through the
tFileDelete component.
tFileExist
tFileExist Properties
Component family
File/Management
Function
Purpose: tFileExist helps to streamline processes by automating recurrent and tedious tasks such as checking if a file exists.
Basic settings
File Name
Advanced settings
tStatCatcher Statistics
Usage
Global Variables
Connections
Limitation
n/a
Scenario: Checking for the presence of a file and creating it if it does not exist
1. Drop the following components from the Palette onto the design workspace: tFileExist, tFileInputDelimited, tFileOutputDelimited, and tMsgBox.
2.
Connect tFileExist to tFileInputDelimited using an OnSubjobOk link and to tMsgBox using a Run If link.
3.
In the design workspace, select tFileExist and click the Component tab to define its basic settings.
2.
In the File name field, enter the file path or browse to the file you want to check if it exists or not.
3.
In the design workspace, select tFileInputDelimited and click the Component tab to define its basic settings.
4.
Browse to the input file you want to read to fill out the File Name field.
If the path of the file contains some accented characters, you will get an error message when executing your Job.
For more information regarding the procedures to follow when the support of accented characters is missing, see the
Talend Installation and Upgrade Guide of the Talend solution you are using.
5.
6.
Set the header, footer and number of processed rows as needed. In this scenario, there is one header in our
table.
7.
Set Schema to Built-in and click the Edit schema button to define the data to pass on to the
tFileOutputDelimited component. Define the data present in the file to read, file2 in this scenario.
For more information about schema types, see Talend Studio User Guide.
The schema in file2 consists of five columns: Num, Ref, Price, Quant, and tax.
16. Click the Component tab to define the basic settings of tMsgBox.
17. Click the If link to display its properties in the Basic settings view.
18. In the Condition panel, press Ctrl+Space to access the variable list and select the global variable EXISTS.
Type an exclamation mark before the variable to negate the meaning of the variable.
19. Press F6 or click the Run button in the Run tab to execute the Job.
A dialog box appears to confirm that the file does not exist.
Click OK to close the dialog box and continue the Job execution process. The missing file, file1 in this scenario, is written as a delimited file in the defined place.
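The negated If condition described in step 18 looks like this in Java (a sketch; the instance name tFileExist_1 is an assumption):

// Hypothetical Run If condition: true when the checked file does NOT exist.
!((Boolean) globalMap.get("tFileExist_1_EXISTS"))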
tFileInputARFF
tFileInputARFF properties
Component Family
File/Input
Function
tFileInputARFF reads an ARFF file row by row, with simple separated fields.
Purpose
This component opens a file and reads it row by row in order to divide it into fields and to send these fields to the next component, as defined in the schema, through a Row connection.
Basic settings
Property type
File Name
Advanced settings
Usage
Encoding
Select the encoding type from the list or select Custom and define it
manually. This field is compulsory for DB data handling.
tStatCatcher Statistics
Select this check box to gather the processing metadata at the Job level
as well as at each component level.
Use this component to read a file and separate the fields with the specified separator.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided. You
can easily find out and add such JARs in the Integration perspective of your studio. For details, see
the section about external modules in the Talend Installation and Upgrade Guide.
An ARFF file is generally made of two parts: the first part describes the data structure, that is to say the rows that begin with @attribute, and the second part comprises the raw data, which follows the expression @data.
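For illustration, a minimal ARFF file following that structure might look as below (the relation, attribute names and values are invented for this example):

@relation people
@attribute name string
@attribute age numeric

@data
'John',45
'Mike',30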
1. Drop the tFileInputARFF component from the Palette onto the workspace.
2.
3.
Right-click the tFileInputARFF and select Row > Main in the menu. Then, drag the link to the tLogRow,
and click it. The link is created and appears.
2.
In the Component view, in the File Name field, browse your directory in order to select your .arff file.
3.
4.
Click the [...] button next to Edit schema to add column descriptions corresponding to the file to be read.
5.
Click on the [+] button as many times as required to create the needed number of columns, according to the source file. Name the columns as follows.
6.
For every column, the Nullable check box is selected by default. Leave the check boxes selected, for all of
the columns.
7.
Click OK.
9.
Click the [...] button next to Edit schema to check that the schema has been propagated. If not, click the
Sync columns button.
10. Save your Job and press F6 to execute it.
The console displays the data contained in the ARFF file, delimited by a vertical line (the default separator).
tFileInputDelimited
tFileInputDelimited properties
Component family
File/Input
Function
tFileInputDelimited reads a given file row by row with simple separated fields.
Purpose
Opens a file and reads it row by row to split each row up into fields, then sends the fields, as defined in the Schema, to the next Job component via a Row link.
Basic settings
Property type
File Name/Stream
Row separator
Field separator
CSV options
Select this check box to include CSV specific parameters such as Escape
char and Text enclosure.
Header
Footer
Limit
Die on error
Select this check box to stop the execution of the Job when an error occurs.
Clear the check box to skip the row on error and complete the process for
error-free rows. If needed, you can collect the rows on error using a Row
> Reject link.
To catch the FileNotFoundException, you also need to select this check
box.
Advanced settings
Advanced separator (for numbers): Select this check box to modify the separators used for numbers:
Thousands separator: define separators for thousands.
Decimal separator: define separators for decimals.
Extract lines at random: Select this check box to set the number of lines to be extracted randomly.
Encoding: Select the encoding type from the list or select Custom and define it manually. This field is compulsory for DB data handling.
Trim all columns: Select this check box to remove leading and trailing whitespace from all columns.
Check each row structure against schema: Select this check box to synchronize every row against the input schema.
Check date: Select this check box to check the date format strictly against the input schema.
Check column to trim: Select the check box next to the column name you want to remove leading and trailing whitespace from.
tStatCatcher Statistics: Select this check box to gather the processing metadata at the Job level as well as at each component level.
Usage: Use this component to read a file and separate fields contained in this file using a defined separator. It allows you to create a data flow using a Row > Main link, or via a Row > Reject link in which case the data that does not correspond to the type defined is filtered out. For further information, please see section Scenario 2: Extracting correct and erroneous data from an XML field in a delimited file.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided. You
can easily find out and add such JARs in the Integration perspective of your studio. For details, see the
section about external modules in the Talend Installation and Upgrade Guide.
1. Drop a tFileInputDelimited component and a tLogRow component from the Palette to the design
workspace.
2.
Right-click on the tFileInputDelimited component and select Row > Main. Then drag it onto the tLogRow
component and release when the plug symbol shows up.
Select the tFileInputDelimited component again, and define its Basic settings:
For Talend Open Studio for Big Data, the Property type, Schema and Query Type of components are always Built-in. For further information about how to edit a Built-in schema, see Talend Studio User Guide.
2.
Fill in a path to the file in the File Name field. This field is mandatory.
If the path of the file contains some accented characters, you will get an error message when executing your Job.
For more information regarding the procedures to follow when the support of accented characters is missing, see the
Talend Installation and Upgrade Guide of the Talend Solution you are using.
3.
Define the Row separator allowing to identify the end of a row. Then define the Field separator used to
delimit fields in a row.
4.
In this scenario, the header and footer limits are not set, and the limit of processed rows is set to 50.
5.
Edit the schema according to the structure of your input file via the Edit Schema function to define the data
to pass on to the tLogRow component.
Related topics: see Talend Studio User Guide.
6.
Enter the encoding standard the input file is encoded in. This setting is meant to ensure encoding consistency
throughout all input and output files.
7.
Select the tLogRow and define the Field separator to use for the output display. Related topic: section
tLogRow.
8.
Select the Print schema column name in front of each value check box to retrieve the column labels in
the output displayed.
9. Save your Job and press F6 to execute it.
The Log sums up all parameters in a header followed by the result of the Job.
1. Drop the following components onto the workspace: tFileFetch, tSleep, tFileInputDelimited, and tLogRow.
2.
Connect tSleep and tFileInputDelimited using a Trigger > OnComponentOk link and connect
tFileInputDelimited to tLogRow using a Row > Main link.
Double-click tFileFetch to display the Basic settings tab in the Component view and set the properties.
2.
From the Protocol list, select the appropriate protocol to access the server on which your data is stored.
3.
In the URI field, enter the URI required to access the server on which your file is stored.
4.
Select the Use cache to save the resource check box to add your file data to the cache memory. This option
allows you to use the streaming mode to transfer the data.
5.
In the workspace, click tSleep to display the Basic settings tab in the Component view and set the properties. By default, tSleep's Pause field is set to 1 second. Do not change this setting. It pauses the second Job in order to give the first Job, containing tFileFetch, the time to read the file data.
6.
In the workspace, double-click tFileInputDelimited to display its Basic settings tab in the Component view
and set the properties.
7.
8.
From the Schema list, select Built-in and click [...] next to the Edit schema field to describe the structure
of the file that you want to fetch. The US_Employees file is composed of six columns: ID, Employee, Age,
Address, State, EntryDate.
Click [+] to add the six columns and set them as indicated in the above screenshot. Click OK.
9.
In the workspace, double-click tLogRow to display its Basic settings in the Component view and click Sync
Columns to ensure that the schema structure is properly retrieved from the preceding component.
2.
Select the Multi thread execution check box in order to run the two Jobs at the same time. Bear in mind that the second Job has a one-second delay according to the properties set in tSleep. This option allows you to fetch the data almost as soon as it is read by tFileFetch, thanks to the tFileInputDelimited component.
tFileInputExcel
tFileInputExcel properties
Component family
File/Input
Function
tFileInputExcel reads an Excel file (.xls or .xlsx) and extracts data line by line.
Purpose
tFileInputExcel opens a file and reads it row by row to split data up into fields using regular
expressions. Then sends fields as defined in the schema to the next component in the Job via a
Row link.
Basic settings
Property type
Read excel2007 file format (xlsx): Select this check box to read the .xlsx file format of Excel 2007.
File Name/Stream
All sheets
Select this check box to process all sheets of the Excel file.
Sheet list
Click the plus button to add as many lines as needed to the list of the
excel sheets to be processed:
Sheet (name or position): enter the name or position of the excel
sheet to be processed.
Use Regex: select this check box if you want to use a regular
expression to filter the sheets to process.
Header
Footer
Limit
Affect each sheet (header & footer): Select this check box if you want to apply the parameters set in the Header and Footer fields to all excel sheets to be processed.
Die on error: Select this check box to stop the execution of the Job when an error occurs. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can collect the rows on error using a Row > Reject link.
First column and Last column: Define the range of the columns to be processed by setting the first and last columns in the First column and Last column fields respectively.
Advanced settings
Advanced separator (for numbers)
Trim all columns: Select this check box to remove the leading and trailing whitespaces from all columns. When this check box is cleared, the Check column to trim table is displayed, which lets you select particular columns to trim.
Check column to trim: This table is filled automatically with the schema being used. Select the check box(es) corresponding to the column(s) to be trimmed.
Convert date column to string: Available when Read excel2007 file format (xlsx) is selected in the Basic settings view. Select this check box to show the Check need convert date column table. Here you can parse the string columns that contain date values based on the given date pattern.
Column: all the columns available in the schema of the source .xlsx file.
Convert: select this check box to choose all the columns for conversion (on the condition that they are all of the string type). You can also select the individual check box next to each column for conversion.
Date pattern: set the date format here.
Encoding: Select the encoding from the list or select Custom and define it manually. This field is compulsory for DB data handling.
Read real values for numbers: Select this check box to read numbers in real values. This check box becomes unavailable when you select Read excel2007 file format (xlsx) in the Basic settings view.
Stop reading on encountering empty rows: Select this check box to ignore any empty line encountered and, if there are any, the lines that follow this empty line. This check box becomes unavailable when you select Read excel2007 file format (xlsx) in the Basic settings view.
Generation mode
Don't validate the cells: Select this check box in order not to validate data. This check box becomes unavailable when you select Read excel2007 file format (xlsx) in the Basic settings view.
tStatCatcher Statistics: Select this check box to gather the Job processing metadata at a Job level as well as at each component level.
Usage
Use this component to read an Excel file and to output the data separately depending on the schemas
identified in the file. You can use a Row > Reject link to filter the data which doesn't correspond to
the type defined. For an example of how to use these two links, see section Scenario 2: Extracting
correct and erroneous data from an XML field in a delimited file.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For
details, see the section about external modules in the Talend Installation and Upgrade Guide.
Related scenarios
No scenario is available for this component yet.
tFileInputFullRow
tFileInputFullRow properties
Component family: File/Input
Function
Purpose: tFileInputFullRow opens a file, reads it row by row, and sends complete rows, as defined in the Schema, to the next Job component via a Row link.
Basic settings
File Name
Advanced settings
Usage
Row separator
Header
Footer
Limit
Die on error
Select this check box to stop the execution of the Job when an error occurs.
Clear the check box to skip the row on error and complete the process for
error-free rows. If needed, you can collect the rows on error using a Row
> Reject link.
Encoding
Select the encoding from the list or select Custom and define it manually.
This field is compulsory for DB data handling.
Extract lines at random: Select this check box to set the number of lines to be extracted randomly.
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at a Job level
as well as at each component level.
Use this component to read full rows in delimited files that can get very large. You can also create a
rejection flow using a Row > Reject link to filter the data which does not correspond to the type defined.
For an example of how to use these two links, see section Scenario 2: Extracting correct and erroneous
data from an XML field in a delimited file.
1.
Drop a tFileInputFullRow and a tLogRow from the Palette onto the design workspace.
2.
Right-click on the tFileInputFullRow component and connect it to tLogRow using a Row Main link.
3.
4.
Click the Component tab to define the basic settings for tFileInputFullRow.
5.
6.
Click the three-dot [...] button next to the Edit schema field to see the data to pass on to the tLogRow
component. Note that the schema is read-only and it consists of one column, line.
7.
Fill in a path to the file to process in the File Name field, or click the three-dot [...] button. This field is mandatory. In this scenario, the file to read is test5. It holds three rows, where each row consists of two fields separated by a semicolon.
9. Set the Header to 1; in this scenario, the footer and the number of processed rows are not set.
10. From the design workspace, select tLogRow and click the Component tab to define its basic settings. For
more information, see section tLogRow.
11. Save your Job and press F6 to execute it.
tFileInputFullRow reads the three rows one by one ignoring field separators, and the complete rows are
displayed on the Run console.
To extract only fields from rows, you must use tExtractDelimitedFields, tExtractPositionalFields, and
tExtractRegexFields. For more information, see section tExtractDelimitedFields, section tExtractPositionalFields
and section tExtractRegexFields.
tFileInputJSON
tFileInputJSON properties
Component Family
File / Input
Function
tFileInputJSON extracts JSON data from a file according to the JSONPath query.
If you have subscribed to one of the Talend solutions with Big Data, you are able to use
this component in a Talend Map/Reduce Job to generate Map/Reduce code. For further
information, see section tFileInputJSON in Talend Map/Reduce Jobs. In that situation,
tFileInputJSON belongs to the MapReduce component family.
Purpose
tFileInputJSON extracts JSON data from a file according to the JSONPath query, then transfers the data to a file, a database table, etc.
Basic settings
Property type
Read by XPath
Select this check box to show the Loop JSONPath query field
and the Get nodes check box in the Mapping table.
Use URL
Select this check box to retrieve data directly from the Web.
URL: type in the URL path from which you will retrieve data.
Filename
This field is not available if you select the Use URL check box.
Click the [...] button next to the field to browse to the file from
which you will retrieve data or enter the full path to the file
directly.
Mapping
Advanced settings
Advanced separator (for numbers): Select this check box to modify the separators used for numbers:
Thousands separator: define separators for thousands.
Decimal separator: define separators for decimals.
Encoding: Select the encoding type from the list or select Custom and define it manually. This field is compulsory for DB data handling.
tStatCatcher Statistics
Usage
Global Variables
In a Talend Map/Reduce Job, tFileInputJSON, as well as the whole Map/Reduce Job using it, generates native
Map/Reduce code. This section presents the specific properties of tFileInputJSON when it is used in that situation.
For further information about a Talend Map/Reduce Job, see the Talend Open Studio for Big Data Getting Started
Guide.
Component family
MapReduce / Input
Function
In a Map/Reduce Job, tFileInputJSON extracts data from one or more JSON files on HDFS
and sends it to the following transformation component.
Basic settings
Property type
Built-in: You create and store the schema locally for this
component only. Related topic: see Talend Studio User Guide.
Folder/File
Enter the path to the file or folder on HDFS from which the data
will be extracted.
If the path you entered points to a folder, all files stored in that
folder will be read.
Mapping
Advanced settings
Usage
Advanced separator (for numbers): Select this check box to change the separators used for numbers. By default, the thousands separator is a comma (,) and the decimal separator is a period (.).
Validate date
Select this check box to check the date format strictly against the
input schema.
Encoding
Select the encoding from the list or select Custom and define it
manually.
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Hadoop Connection
You need to use the Hadoop Configuration tab in the Run view to define the connection to a
given Hadoop distribution for the whole Job.
This connection is effective on a per-Job basis.
Prerequisites
The Hadoop distribution must be properly installed, so as to guarantee the interaction with Talend
Studio. The following list presents MapR related information for example.
Ensure that you have installed the MapR client in the machine where the Studio is, and added
the MapR client library to the PATH variable of that machine. For Windows, this library
is lib\MapRClient.dll in the MapR client jar file; without adding it, you may encounter the
following error: no MapRClient in java.library.path.
Set the -Djava.library.path argument. This argument provides to the Studio the path to
the native library of that MapR client. This allows the subscription-based users to make full
use of the Data viewer to view locally in the Studio the data stored in MapR. For further
information about how to set this argument, see the section describing how to view data of
Talend Open Studio for Big Data Getting Started Guide.
For further information about how to install a Hadoop distribution, see the manuals
corresponding to the Hadoop distribution you are using.
1. Drop tFileInputJSON and tLogRow from the Palette onto the Job designer.
2.
Click the [...] button next to the Edit schema field to open the schema editor.
3.
Click the [+] button to add five columns, namely type, movie_name, release, rating and starring, with the
type of String except for the column rating, which is Double.
Click OK to close the editor.
4.
In the pop-up Propagate box, click Yes to propagate the schema to the subsequent components.
5.
6.
In the Mapping table, the schema automatically appears in the Column part. For each column, type in the
JSONPath query to retrieve data from the JSON node under the JSONPath query part.
7.
For the column type, enter the JSONPath query "type", which is the first node of the JSON data.
8.
For columns movie_name, release and rating, enter the JSONPath queries "$..name", "$..release" and
"$..rating" respectively.
Here, "$" stands for the root node relative to the nodes name, release and rating, namely detail. ".." stands for the recursive descent within the node detail, namely movies.
Therefore, the query is still valid if you replace "$..name" with "detail.movies.name".
9.
For the column starring, enter the JSONPath query "detail.starring". Note that type and detail are two nodes
of the same level in the hierarchy.
11. Select Table (print values in cells of a table) for a better display of the results.
12. Save your Job and press F6 to execute it.
As shown above, the source JSON data is collected in a flat file table.
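The JSONPath semantics used in steps 7 to 9 can be checked outside the Studio, for instance with the Jayway JsonPath library (a sketch; the sample JSON merely mirrors the type/detail/movies structure described above, and the values are invented):

import com.jayway.jsonpath.JsonPath;
import java.util.List;

public class JsonPathDemo {
    public static void main(String[] args) {
        String json = "{\"type\":\"movie list\",\"detail\":{\"movies\":["
                + "{\"name\":\"Alien\",\"release\":\"1979\",\"rating\":8.5}]}}";
        // "$..name" recursively descends to every "name" node...
        List<String> names = JsonPath.read(json, "$..name");
        // ...which here matches the explicit path through detail.movies.
        List<String> same = JsonPath.read(json, "$.detail.movies[*].name");
        System.out.println(names + " " + same);
    }
}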
1. Drop the following components from the Palette onto the design workspace: tFileInputJSON, tExtractJSONFields and tLogRow.
2.
Click the [...] button next to the Edit schema field to open the schema editor.
Click the [+] button to add one column, namely friends, of the String type.
Click OK to close the editor.
3.
Clear the Read by XPath check box and select the Use Url check box.
In the URL field, enter the JSON file URL, "http://localhost:8080/docs/facebook.json" in this case.
The JSON file is as follows:
{ "user": { "id": "9999912398",
"name": "Kelly Clarkson",
"friends": [
{ "name": "Tom Cruise",
"id": "55555555555555",
"likes": {
"data": [
{ "category": "Movie",
"name": "The Shawshank Redemption",
"id": "103636093053996",
"created_time": "2012-11-20T15:52:07+0000"
},
{ "category": "Community",
"name": "Positiveretribution",
"id": "471389562899413",
"created_time": "2012-12-16T21:13:26+0000"
}
]
}
},
{ "name": "Tom Hanks",
"id": "88888888888888",
"likes": {
"data": [
{ "category": "Journalist",
"name": "Janelle Wang",
"id": "136009823148851",
"created_time": "2013-01-01T08:22:17+0000"
},
{ "category": "Tv show",
"name": "Now With Alex Wagner",
"id": "305948749433410",
"created_time": "2012-11-20T06:14:10+0000"
}
]
}
}
]
}
4.
Enter the URL in a browser. If the Tomcat server is running, the browser displays:
5.
In the Studio, in the Mapping table, enter the JSONPath query "$.user.friends[*]" next to the friends column,
retrieving the entire friends node from the source file.
6.
7.
Click the [...] button next to the Edit schema field to open the schema editor.
8.
Click the [+] button in the right panel to add five columns, namely id, name, like_id, like_name and
like_category, which will hold the data of relevant nodes in the JSON field friends.
Click OK to close the editor.
9.
In the pop-up Propagate box, click Yes to propagate the schema to the subsequent components.
Select Table (print values in cells of a table) for a better display of the results.
As shown above, the friends data of the Facebook user Kelly Clarkson is extracted correctly.
tFileInputLDIF
tFileInputLDIF Properties
Component Family
File/Input
Function
Purpose: tFileInputLDIF opens a file, reads it row by row, and passes the full rows to the next component as defined in the schema, using a Row connection.
Basic settings
Property type
File Name
Add operation as prefix when the entry is modify type: Select this check box to display the operation mode.
Value separator
Type in the separator required for parsing data in the given file. By default,
the separator used is ,.
Die on error
Select this check box to stop the execution of the Job when an error occurs.
Clear the check box to skip the row on error and complete the process for
error-free rows. If needed, you can collect the rows on error using a Row
> Reject link.
Advanced settings
Encoding
Select the encoding type from the list or select Custom and define it
manually. This field is compulsory for DB data handling.
Use field options (for Base64 decode checked): Select this check box to specify the Base64-encoded columns of the input flow. Once selected, this check box activates the Decode Base64 encoding values table to enable you to specify the columns to be decoded from Base64.
The data type of the columns to be handled by this check box is
byte[] that you define in the input schema editor.
tStatCatcher Statistics: Select this check box to gather the Job processing metadata at a Job level as well as at each component level.
Usage
Use this component to read full rows in a voluminous LDIF file. This component enables you to create a data flow, using a Row > Main link, and to create a reject flow with a Row > Reject link filtering the data whose type does not match the defined type. For an example of usage, see section Scenario 2: Extracting erroneous XML data via a reject flow from tFileInputXML.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided. You
can easily find out and add such JARs in the Integration perspective of your studio. For details, see the
section about external modules in the Talend Installation and Upgrade Guide.
Related scenario
For a related scenario, see section Scenario: Writing DB data into an LDIF-type file.
tFileInputMail
tFileInputMail properties
Component family
File/Input
Function
tFileInputMail reads the header and content parts of a defined email file.
Purpose
Basic settings
File name
Attachment export directory: Enter the path to the directory where you want to export email attachments.
Mail parts
Die on error
Select this check box to stop the execution of the Job when an error
occurs. Clear the check box to skip the row on error and complete
the process for error-free rows.
Advanced settings
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at the
Job level as well as at each component level.
Usage
This component handles a flow of data and therefore requires output; it is defined as an intermediary step.
Limitation
n/a
1.
Drop a tFileInputMail and a tLogRow component from the Palette to the design workspace.
2. Connect tFileInputMail to tLogRow using a Row > Main link.
3.
Double-click tFileInputMail to display its Basic settings view and define the component properties.
4.
Click the three-dot button next to the File Name field and browse to the mail file to be processed.
5.
Set schema type to Built-in and click the three-dot button next to Edit schema to open a dialog box where
you can define the schema including all columns you want to retrieve on your output.
6.
Click the plus button in the dialog box to add as many columns as you want to include in the output flow. In
this example, the schema has four columns: Date, Author, Object and Status.
7.
Once the schema is defined, click OK to close the dialog box and propagate the schema into the Mail parts
table.
8.
Click the three-dot button next to Attachment export directory and browse to the directory in which you want
to export email attachments, if any.
9.
In the Mail part column of the Mail parts table, type in the actual header or body standard keys that will
be used to retrieve the values to be displayed.
10. Select the Multi Value check box next to any of the standard keys if more than one value for the relative
standard key is present in the input file.
11. If needed, define a separator for the different values of the relative standard key in the Separator field.
12. Double-click tLogRow to display its Basic settings view and define the component properties in order for
the values to be separated by a carriage return. On Windows OS, type in \n between double quotes.
13. Save your Job and press F6 to execute it and display the output flow on the console.
The header key values are extracted as defined in the Mail parts table. Mail reception date, author, subject and
status are displayed on the console.
tFileInputMSDelimited
tFileInputMSDelimited properties
Component family
File/Input
Function
Purpose: tFileInputMSDelimited opens a complex multi-structured file, reads its data structures (schemas) and then uses Row links to send fields, as defined in the different schemas, to the next Job components.
Basic settings
The [Multi Schema Editor] helps to build and configure the data flow in
a multi-structure delimited file to associate one schema per output.
For more information, see section The Multi Schema Editor.
Output: Lists all the schemas you define in the [Multi Schema Editor], along with the related record type and the field separator that corresponds to every schema, if different field separators are used.
Die on error: Select this check box to stop the execution of the Job when an error occurs. Clear the check box to skip the row on error and complete the process for error-free rows.
Advanced settings
Trim all columns: Select this check box to remove leading and trailing whitespaces from defined columns.
Validate date: Select this check box to check the date format strictly against the input schema.
Advanced separator (for numbers): Select this check box to modify the separators used for numbers:
Thousands separator: define separators for thousands.
Decimal separator: define separators for decimals.
tStatCatcher Statistics
Usage
Select this check box to gather the Job processing metadata at a Job level
as well as at each component level.
Use this component to read multi-structured delimited files and separate fields contained in these files
using a defined separator.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided. You
can easily find out and add such JARs in the Integration perspective of your studio. For details, see the
section about external modules in the Talend Installation and Upgrade Guide.
The [Multi Schema Editor] also helps to declare the schema that should act as the source schema (primary key) from the incoming data to ensure its unicity. The editor uses this mapping to associate all schemas processed in the delimited file to the source schema in the same file.
The editor opens with the first column, that usually holds the record type indicator, selected by default. However, once the
editor is open, you can select the check box of any of the schema columns to define it as a primary key.
For detailed information about the usage of the Multi Schema Editor, see section Scenario: Reading a multi
structure delimited file.
1. Drop a tFileInputMSDelimited component and three tLogRow components from the Palette onto the design workspace.
2.
In the design workspace, right-click tFileInputMSDelimited and connect it to tLogRow1, tLogRow2, and
tLogRow3 using the row_A_1, row_B_1, and row_C_1 links respectively.
2.
Click Browse... next to the File name field to locate the multi schema delimited file you need to process.
3.
Select the Use Multiple Separator check box and define the fields that follow accordingly if different field separators
are used to separate schemas in the source file.
A preview of the source file data displays automatically in the Preview panel.
Column 0 that usually holds the record type indicator is selected by default. However, you can select the check box
of any of the other columns to define it as a primary key.
4.
Click Fetch Codes to the right of the Preview panel to list the type of schema and records you have in the
source file. In this scenario, the source file has three schema types (A, B, C).
Click each schema type in the Fetch Codes panel to display its data structure below the Preview panel.
5.
Click in the name cells and set column names for each of the selected schema.
In this scenario, column names read as the following:
-Schema A: Type, DiscName, Author, Date,
-Schema B: Type, SongName,
-Schema C: Type, LibraryName.
You need now to set the primary key from the incoming data to ensure its unicity (DiscName in this scenario).
To do that:
6.
In the Fetch Codes panel, select the schema holding the column you want to set as the primary key (schema
A in this scenario) to display its data structure.
7.
Click in the Key cell that corresponds to the DiscName column and select the check box that appears.
8.
Click anywhere in the editor and the false in the Key cell will become true.
You need now to declare the parent schema by which you want to group the other children schemas
(DiscName in this scenario). To do that:
9.
In the Fetch Codes panel, select schema B and click the right arrow button to move it to the right. Then,
do the same with schema C.
The Cardinality field is not compulsory. It helps you to define the number (or range) of fields in children schemas
attached to the parent schema. However, if you set the wrong number or range and try to execute the Job, an error
message will display.
10. In the [Multi Schema Editor], click OK to validate all the changes you did and close the editor.
The three defined schemas along with the corresponding record types and field separators display
automatically in the Basic settings view of tFileInputMSDelimited.
The three schemas you defined in the [Multi Schema Editor] are automatically passed to the three tLogRow
components.
11. If needed, click the Edit schema button in the Basic settings view of each of the tLogRow components to
view the input and output data structures you defined in the Multi Schema Editor or to modify them.
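To make the structure concrete, a multi-schema input of this shape could look as follows (values invented; the first field of each row is the record type indicator A, B or C, matching the schemas defined above):

A;Kind of Blue;Miles Davis;1959
B;So What
B;Blue in Green
C;City Library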
tFileInputMSPositional
tFileInputMSPositional properties
Component family
File/Input
Function
Purpose: tFileInputMSPositional opens a complex multi-structured file, reads its data structures (schemas) and then uses Row links to send fields, as defined in the different schemas, to the next Job components.
Basic settings
Property type
File Name
Row separator
Records
Limit
Die on unknown header type
Pattern: Length values separated by commas, interpreted as a string between quotes. Make sure the values entered in this field are consistent with the schema defined.
Advanced settings
Process long rows (needed for processing rows longer than 100,000 characters wide): Select this check box to process long rows (this is necessary to process rows longer than 100,000 characters).
Advanced separator (for numbers): Select this check box to modify the separators used for numbers:
Thousands separator: define separators for thousands.
Decimal separator: define separators for decimals.
Trim all columns: Select this check box to remove leading and trailing whitespaces from defined columns.
Validate date: Select this check box to check the date format strictly against the input schema.
Encoding: Select the encoding type from the list or select Custom and define it manually. This field is compulsory for DB data handling.
tStatCatcher Statistics: Select this check box to gather the Job processing metadata at a Job level as well as at each component level.
Usage: Use this component to read a multi-schema positional file and separate fields using a position separator value. You can also create a rejection flow using a Row > Reject link to filter the data which does not correspond to the type defined. For an example of how to use these two links, see section Scenario 2: Extracting correct and erroneous data from an XML field in a delimited file.
The input file to read describes two schemas and their data rows, for example:
(car_owner):schema_id;car_make;owner;age
(car_insurance):schema_id;car_owner;age;car_insurance
followed by fixed-width rows holding the owner records (John, 45 and Mike, 30) and the insurance records (45 yes and 50 No).
1. Drop one tFileInputMSPositional and two tLogRow components from the Palette to the design workspace.
2.
Double-click the tFileInputMSPositional component to show its Basic settings view and define its
properties.
2.
In the File name/Stream field, type in the path to the input file. Also, you can click the [...] button to browse
and choose the file.
3.
In the Header Field Position field, enter the start-end position for the schema identifier in the input file, 0-1
in this case as the first character in each row is the schema identifier.
4.
Click the [+] button twice to add two rows in the Records table.
5.
Click the cell under the Schema column to show the [...] button.
6. Click the [...] button to show the schema naming box.
7.
Define the schema car_owner, which has four columns: schema_id, car_make, owner and age.
8.
Repeat the steps to define the schema car_insurance, which has four columns: schema_id, car_owner, age
and car_insurance.
9.
Connect tFileInputMSPositional to the car_owner component with the Row > car_owner link, and the
car_insurance component with the Row > car_insurance link.
10. In the Header value column, type in the schema identifier value for the schema, 1 for the schema car_owner
and 2 for the schema car_insurance in this case.
11. In the Pattern column, type in the length of each field in the schema, i.e. the number of characters in each field: 1,8,10,3 for the schema car_owner and 1,10,3,3 for the schema car_insurance in this case.
12. In the Skip from header field, type in the number of beginning rows to skip, 2 in this case, as the two rows at the beginning just describe the two schemas instead of holding values.
13. Choose Table (print values in cells of a table) in the Mode area of the components car_owner and
car_insurance.
14. Save your Job and press F6 to execute it.
The file is read row by row based on the length values defined in the Pattern field and output in two tables with different schemas.
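The fixed-width parsing described by the Pattern 1,8,10,3 can be sketched in plain Java (an illustration only, not Talend's generated code; the sample line is invented):

public class PositionalSplit {
    public static void main(String[] args) {
        String line = "1MyMake  John      45 "; // hypothetical fixed-width row
        int[] lengths = {1, 8, 10, 3};          // Pattern for the schema car_owner
        int pos = 0;
        for (int len : lengths) {
            // Cut each field at its declared width and strip padding spaces.
            String field = line.substring(pos, Math.min(pos + len, line.length())).trim();
            System.out.println("[" + field + "]");
            pos += len;
        }
    }
}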
tFileInputMSXML
tFileInputMSXML Properties
Component family
XML or File/Input
Function
tFileInputMSXML reads and outputs multiple schemas within an XML structured file.
Purpose
tFileInputMSXML opens a complex multi-structured file, reads its data structures (schemas)
and then uses Row links to send fields as defined in the different schemas to the next Job
components.
Basic settings
File Name
Root XPath query: The root of the XML tree, which the query is based on.
Enable XPath in column Schema XPath loop but lose the order: Select this check box if you want to define an XPath path in the Schema XPath loop field of the Outputs array.
This option is only available with the dom4j generation mode. Make sure this mode is selected in the Generation mode list, in the Advanced settings tab of your component. If you use this option, the data will not be returned in order.
Outputs
Advanced settings
Die on error: Select this check box to stop the execution of the Job when an error occurs. Clear the check box to skip the row on error and complete the process for error-free rows.
Validate date: Select this check box to check the date format strictly against the input schema.
Ignore DTD file: Select this check box to ignore the DTD file indicated in the XML file being processed.
Generation mode
Encoding: Select the encoding type from the list or select CUSTOM and define it manually. This field is compulsory for DB data handling.
tStatCatcher Statistics
Limitation
n/a
1.
Drop a tFileInputMSXML and two tLogRow components from the Palette onto the design workspace.
2.
3.
4.
In the Root XPath query field, enter the root of the XML tree, which the query will be based on.
5.
Select the Enable XPath in column Schema XPath loop but lose the order check box if you want to define an XPath path in the Schema XPath loop field, in the Outputs array. In this scenario, we do not use this option.
6.
Click the plus button to add lines in the Outputs table where you can define the output schema, two lines
in this scenario: record and book.
7.
In the Outputs table, click in the Schema cell and then click a three-dot button to display a dialog box where
you can define the schema name.
8.
Enter a name for the output schema and click OK to close the dialog box.
The tFileInputMSXML schema editor displays.
9.
10. Do the same for all the output schemas you want to define.
11. In the design workspace, right-click tFileInputMSXML and connect it to tLogRow1 and tLogRow2 using the record and book links respectively.
12. In the Basic settings view and in the Schema XPath loop cell, enter the node of the XML tree, which the
loop is based on.
13. In the XPath Queries cell, enter the fields to be extracted from the structured XML input.
14. Select the check boxes next to the schema names for which you want to create empty rows.
15. Save your Job and press F6 to execute it. The defined schemas are extracted from the multi schema XML
structured file and displayed on the console.
The multi schema XML file is read row by row and the extracted fields are displayed on the Run Job console
as defined.
tFileInputPositional
tFileInputPositional properties
Component family
File/Input
Function
tFileInputPositional reads a given file row by row and extracts fields based on a pattern.
Purpose
This component opens a file and reads it row by row to split each row up into fields, then sends the fields, as defined in the schema, to the next Job component via a Row link.
Basic settings
Property type
File Name/Stream
Row separator
Use byte length as the cardinality: Select this check box to enable the support of double-byte characters in this component. JDK 1.6 is required for this feature.
Customize: Select this check box to customize the data format of the positional file and define the table columns:
Column: Select the column you want to customize.
Size: Enter the column size.
Padding char: Type in, between inverted commas, the padding character used, in order for it to be removed from the field. A space by default.
Alignment: Select the appropriate alignment parameter.
Pattern
Die on error
Select this check box to stop the execution of the Job when an error
occurs. Clear the check box to skip the row on error and complete
the process for error-free rows. If needed, you can collect the rows
on error using a Row > Reject link.
Header
Footer
Limit
Advanced settings
Needed to process rows longer than 100 000 characters: Select this check box if the rows to be processed in the input file are longer than 100 000 characters.
Advanced separator (for numbers): Select this check box to modify the separators used for numbers:
Thousands separator: define separators for thousands.
Decimal separator: define separators for decimals.
Validate date
Select this check box to check the date format strictly against the input schema.
Encoding
Select the encoding type from the list or select Custom and define it manually. This field is compulsory for DB data handling.
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at a Job level as well as at each component level.
Usage
Use this component to read a file and separate fields using a position separator value. You can also create a rejection flow using a Row > Reject link to filter the data which does not correspond to the type defined. For an example of how to use these two links, see section Scenario 2: Extracting correct and erroneous data from an XML field in a delimited file.
Global Variables
Contract   CustomerRef   InsuranceNr
           8200          50330
           8201          50331
           8202          50332
00002      8203          50333
1.
2.
Drop a tFileOutputXML component as well. The output file is meant to receive the references in a structured way.
3.
Right-click the tFileInputPositional component and select Row > Main. Then drag it onto the
tFileOutputXML component and release when the plug symbol shows up.
Double-click the tFileInputPositional component to show its Basic settings view and define its properties.
2.
Define the Job Property type if needed. For this scenario, we use the built-in Property type.
As opposed to the Repository, this means that the Property type is set for this station only.
3.
Fill in a path to the input file in the File Name field. This field is mandatory.
4.
Define the Row separator identifying the end of a row if needed; by default, it is a carriage return.
5.
If required, select the Use byte length as the cardinality check box to enable the support of double-byte characters.
6.
Define the Pattern to delimit fields in a row. The pattern is a series of length values corresponding to the fields of your input file. The values should be entered between quotes and separated by commas. Make sure the values you enter match the schema defined.
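For instance, assuming three fixed-width columns of 8, 12 and 8 characters (widths chosen purely for illustration, not taken from this scenario), the Pattern field would read:
"8,12,8"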
7.
Fill in the Header, Footer and Limit fields according to your input file structure and your need. In this
scenario, we only need to skip the first row when reading the input file. To do this, fill the Header field with
1 and leave the other fields as they are.
8.
Next to Schema, select Repository if the input schema is stored in the Repository. In this use case, we use
a Built-In input schema to define the data to pass on to the tFileOutputXML component.
9.
You can load and/or edit the schema via the Edit Schema function. For this schema, define three columns,
respectively Contract, CustomerRef and InsuranceNr matching the structure of the input file. Then, click OK
to close the [Schema] dialog box and propagate the changes.
2.
3.
Define the row tag that will wrap each row of data, in this use case ContractRef.
4.
Click the three-dot button next to Edit schema to view the data structure, and click Sync columns to retrieve
the data structure from the input component if needed.
5.
Switch to the Advanced settings tab view to define other settings for the XML output.
6.
Click the plus button to add a line in the Root tags table, and enter a root tag (or more) to wrap the XML
output structure, in this case ContractsList.
7.
Define parameters in the Output format table if needed. For example, select the As attribute check box for a column if you want to use its name and value as an attribute for the parent XML element, or clear the Use schema column name check box for a column to reuse the column label from the input schema as the tag label. In this use case, we keep all the default output format settings as they are.
8.
To group output rows according to the contract number, select the Use dynamic grouping check box, add
a line in the Group by table, select Contract from the Column list field, and enter an attribute for it in the
Attribute label field.
9.
Press Ctrl+S to save your Job to ensure that all the configured parameters take effect.
2.
The file is read row by row based on the length values defined in the Pattern field and output as an XML
file as defined in the output settings. You can open it using any standard XML editor.
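Leaving the dynamic grouping aside, the output resembles the following sketch (the values are illustrative only):
<ContractsList>
  <ContractRef>
    <Contract>00002</Contract>
    <CustomerRef>8203</CustomerRef>
    <InsuranceNr>50333</InsuranceNr>
  </ContractRef>
</ContractsList>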
tFileInputProperties
tFileInputProperties properties
Component family
File/Input
Function
tFileInputProperties reads a text file row by row and extracts the fields.
Purpose
tFileInputProperties opens a text file and reads it row by row, then separates the fields according to the model key = value.
Basic settings
File format
Select from the list your file format, either: .properties or .ini.
.properties: data in the configuration file is written line by line and structured as key = value.
.ini: data in the configuration file is written line by line, structured as key = value, and grouped in sections.
Section Name: enter the section name on which the iteration is based.
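As an illustration, the same two hypothetical entries in each format (keys and values are invented for the example):
# .properties format: one key = value pair per line
MenuFile = File
MenuEdit = Edit
; .ini format: the same pairs grouped in a named section
[Menu]
MenuFile = File
MenuEdit = Edit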
File Name
Name or path to the file to be processed. Related topic: see Talend Studio User Guide.
Advanced settings
Encoding
Select the encoding type from the list or select Custom and define it manually. This field is compulsory for DB data handling.
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at a Job level as well as at each component level.
Usage
Use this component to read a text file and separate data according to the structure key = value.
Scenario: Reading and matching the keys and the values of different .properties files and outputting the results in a glossary
Drop the following components from the Palette onto the design workspace: tFileInputProperties (x2),
tMap, and tLogRow.
2.
Connect the components together using Row > Main links. The second properties file, FR, is used as a lookup flow.
Double-click the first tFileInputProperties component to open its Basic settings view and define its
properties.
2.
3.
In the File Name field, click the three-dot button and browse to the input .properties file you want to use.
4.
Do the same with the second tFileInputProperties and browse to the French .properties file this time.
5.
6.
Select all columns from the English_terms table and drop them to the output table.
Select the key column from the English_terms table and drop it to the key column in the French_terms table.
7.
In the glossary table in the lower right corner of the tMap editor, rename the value field to EN because it
will hold the values of the English file.
8.
Click the plus button to add a line to the glossary table and rename it to FR.
9.
10. In the upper left corner of the tMap editor, select the value column in the French_terms table and drop it to the FR column in the glossary table. When done, click OK to validate your changes, close the map editor, and propagate the changes to the next component.
2.
Press F6 or click the Run button from the Run tab to execute it.
The glossary displays on the console listing three columns holding: the key name in the first column, the English
term in the second, and the corresponding French term in the third.
tFileInputRegex
tFileInputRegex properties
Component family
File/Input
Function
tFileInputRegex is a powerful feature that can replace a number of other components of the File family. It requires some advanced knowledge of regular expression syntax.
Purpose
Opens a file and reads it row by row, splitting rows up into fields using regular expressions, then sends the fields, as defined in the schema, to the next Job component via a Row link.
Basic settings
Property type
File Name/Stream
Row separator
Regex
Header
Footer
Limit
Advanced settings
Die on error
Select this check box to stop the execution of the Job when an error
occurs. Clear the check box to skip the row on error and complete
the process for error-free rows. If needed, you can collect the rows
on error using a Row > Reject link.
Encoding
Select the encoding from the list or select Custom and define it
manually. This field is compulsory for DB data handling.
tStatCatcher Statistics
Usage
Use this component to read a file and separate fields contained in this file according to the defined Regex. You can also create a rejection flow using a Row > Reject link to filter the data which doesn't correspond to the type defined. For an example of how to use these two links, see section Scenario 2: Extracting correct and erroneous data from an XML field in a delimited file.
Limitation
n/a
2.
3.
Right-click on the tFileInputRegex component and select Row > Main. Drag this main row link onto the
tFileOutputPositional component and release when the plug symbol displays.
Select the tFileInputRegex again so the Component view shows up, and define the properties:
2.
The Property type is Built-In for this scenario. Hence, the properties are set for this station only.
3.
Fill in a path to the file in the File Name field. This field is mandatory.
4.
5.
Then define the Regular expression in order to delimit fields of a row, which are to be passed on to the next component. You can type in a regular expression using Java code, on multiple lines if needed.
Regex syntax requires double quotes.
6.
In this expression, make sure you include all subpatterns matching the fields to be extracted.
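For example, a hypothetical expression splitting rows of the form id;name;amount into three fields (this field layout is invented for the example, not taken from this scenario) could be:
"^(\\d+);(\\w+);(\\d+)$"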
7.
8.
Select a local (Built-in) Schema to define the data to pass on to the tFileOutputPositional component.
9.
You can load or create the schema through the Edit Schema function.
2.
Now go to the Run tab, and click on Run to execute the Job.
The file is read row by row and split up into fields based on the Regular Expression definition. You can
open it using any standard file editor.
tFileInputXML
tFileInputXML belongs to two component families: File and XML. For more information on tFileInputXML,
see section tFileInputXML.
tFileList
tFileList properties
Component family
File/Management
Function
tFileList provides a list of files or folders from a defined directory on which it iterates.
Purpose
tFileList retrieves a set of files or folders based on a filemask pattern and iterates on each unit.
Basic settings
Directory
FileList Type
Select the type of input you want to iterate on from the list:
Files if the input is a set of files,
Directories if the input is a set of directories,
Both if the input is a set of the above two types.
Include subdirectories
Select this check box if the selected input source type includes
sub-directories.
Case Sensitive
Set the case mode from the list to create, or not, a case-sensitive filter on filenames.
Generate Error if no file found
Select this check box to generate an error message if no files or directories are found.
Use Glob Expressions as Filemask
This check box is selected by default. It filters the results using a Global Expression (Glob Expressions).
Files
Order by
The folders are listed first of all, then the files. You can choose to
prioritise the folder and file order either:
By default: alphabetical order, by folder then file;
By file name: alphabetical order or reverse alphabetical order;
By file size: smallest to largest or largest to smallest;
By modified date: most recent to least recent or least recent to
most recent.
If ordering by file name, in the event of identical file
names then modified date takes precedence. If ordering
by file size, in the event of identical file sizes then
file name takes precedence. If ordering by modified
date, in the event of identical dates then file name takes
precedence.
Order action
Advanced settings
Global Variables
Connections
Limitation
n/a
Drop the following components from the Palette to the design workspace: tFileList, tFileInputDelimited,
and tLogRow.
2.
Right-click the tFileList component, and pull an Iterate connection to the tFileInputDelimited component.
Then pull a Main row from the tFileInputDelimited to the tLogRow component.
Double-click tFileList to display its Basic settings view and define its properties.
2.
Browse to the Directory that holds the files you want to process. To display the path on the Job itself, use the label (__DIRECTORY__) that shows up when you put the pointer anywhere in the Directory field. Type this label into the Label Format field, which you can find on the View tab of the Basic settings view.
3.
In the Basic settings view and from the FileList Type list, select the source type you want to process, Files
in this example.
4.
In the Case sensitive list, select a case mode, Yes in this example, to create a case-sensitive filter on file names.
5.
Keep the Use Glob Expressions as Filemask check box selected if you want to use global expressions to
filter files, and define a file mask in the Filemask field.
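For instance, a hypothetical filemask keeping only CSV files would be:
"*.csv"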
6.
Double-click tFileInputDelimited to display its Basic settings view and set its properties.
7.
Enter the File Name field using a variable containing the current filename path, as you filled in the Basic
settings of tFileList. Press Ctrl+Space bar to access the autocomplete list of variables, and select the
global variable ((String)globalMap.get("tFileList_1_CURRENT_FILEPATH")) . This way, all files in
the input directory can be processed.
8.
Fill in all other fields as detailed in the tFileInputDelimited section. Related topic: section
tFileInputDelimited.
9.
Select the last component, tLogRow, to display its Basic settings view and fill in the separator to be used to
distinguish field content displayed on the console. Related topic: section tLogRow.
The Job iterates on the defined directory, and reads all included files. Then delimited data is passed on to the last
component which displays it on the console.
From the Palette, drop two tFileList components, two tIterateToFlow components, two
tFileOutputDelimited components, a tFileInputDelimited component, a tUniqRow component, and a
tLogRow component onto the design workspace.
2.
Link the first tFileList component to the first tIterateToFlow component using a Row > Iterate connection, and then connect the first tIterateToFlow component to the first tFileOutputDelimited component using a Row > Main connection to form the first subjob.
3.
Link the second tFileList component to the second tIterateToFlow component using a Row > Iterate connection, and then connect the second tIterateToFlow component to the second tFileOutputDelimited component using a Row > Main connection to form the second subjob.
4.
Link the tFileInputDelimited to the tUniqRow component using a Row > Main connection, and the
tUniqRow component to the tLogRow component using a Row > Duplicates connection to form the third
subjob.
5.
Link the three subjobs using Trigger > On Subjob Ok connections so that they will be triggered one after
another, and label the components to better identify their roles in the Job.
In the Basic settings view of the first tFileList component, fill the Directory field with the path to the first
folder you want to read filenames from, E:/DataFiles/DI/images in this example, and leave the other settings
as they are.
2.
Double-click the first tIterateToFlow component to show its Basic settings view.
3.
Double-click the [...] button next to Edit schema to open the [Schema] dialog box and define the schema of
the text file the next component will write filenames to. When done, click OK to close the dialog box and
propagate the schema to the next component.
In this example, the schema contains only one column: Filename.
4.
In the Value field of the Mapping table, press Ctrl+Space to access the autocomplete list of variables, and
select the global variable ((String)globalMap.get("tFileList_1_CURRENT_FILE")) to read the name
of each file in the input directory, which will be put into a data flow to pass to the next component.
5.
In the Basic settings view of the first tFileOutputDelimited component, fill the File Name field with the path
of the text file that will store the filenames from the incoming flow, D:/temp/tempdata.csv in this example.
This completes the configuration of the first subjob.
6.
Repeat the steps above to complete the configuration of the second subjob, but:
fill the Directory field in the Basic settings view of the second tFileList component with the other folder
you want to read filenames from, E:/DataFiles/DQ/images in this example.
select the Append check box in the Basic settings view of the second tFileOutputDelimited component
so that the filenames previously written to the text file will not be overwritten.
7.
In the Basic settings view of the tFileInputDelimited component, fill the File name/Stream field with the
path of the text file that stores the list of filenames, D:/temp/tempdata.csv in this example, and define the file
schema, which contains only one column in this example, Filename.
8.
In the Basic settings view of the tUniqRow component, select the Key attribute check box for the only
column, Filename in this example.
9.
In the Basic settings view of the tLogRow component, select the Table (print values in cells of a table)
option for better display effect.
2.
tFileOutputARFF
tFileOutputARFF properties
Component family
File/Output
Function
Purpose
This component writes an ARFF file that holds data organized according to the defined schema.
Basic settings
Property type
File name
Attribute Define
Relation
Append
Select this check box to add the new rows at the end of the file.
Create directory if not exists
This check box is selected by default. It creates a directory to hold the output table if it does not exist.
Advanced settings
Don't generate empty file
Select this check box if you do not want to generate empty files.
tStatCatcher Statistics
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to choose your HDFS connection dynamically from multiple connections planned in your Job. This feature is useful when you need to access files in different HDFS systems or different distributions, especially when you are working in an environment where you cannot change your Job settings, for example, when your Job has to be deployed and executed independent of Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is selected in the Basic settings view. Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Global Variables
NB_LINE: Indicates the number of rows read by an input component or transferred to an output
component. This is an After variable and it returns an integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable means it functions after the execution of a component.
Connections
Usage
Use this component along with a Row link to collect data from another component and to rewrite the data to an ARFF file.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not
provided. You can easily find out and add such JARs in the Integration perspective of your
studio. For details, see the section about external modules in the Talend Installation and
Upgrade Guide.
Related scenario
For tFileOutputARFF related scenario, see section Scenario: Display the content of a ARFF file.
tFileOutputDelimited
tFileOutputDelimited properties
Component family
File/Output
Function
Purpose
This component writes a delimited file that holds data organized according to the defined
schema.
Basic settings
Property type
Use Output Stream
Select this check box to process the data flow of interest. Once you have selected it, the Output Stream field displays and you can type in the data flow of interest.
The data flow to be processed must be added to the flow in order for this component to fetch these data via the corresponding representative variable.
This variable could be already pre-defined in your Studio or provided by the context or the components you are using along with this component; otherwise, you could define it manually and use it according to the design of your Job, for example, using tJava or tJavaFlex.
To avoid writing it by hand, you could select the variable of interest from the auto-completion list (Ctrl+Space) to fill the current field, on condition that this variable has been properly defined.
For further information about how to use a stream, see section Scenario 2: Reading data from a remote file in streaming mode.
File name
Row Separator
Field Separator
Append
Select this check box to add the new rows at the end of the file.
Include Header
Select this check box to include the column header to the file.
Compress as zip file
Select this check box to compress the output file in zip format.
Built-in: You can create the schema and store it locally for this
component. Related topic: see Talend Studio User Guide.
Sync columns
Advanced settings
Advanced separator (for numbers)
Select this check box to modify the separators used for numbers:
Thousands separator: define separators for thousands.
Decimal separator: define separators for decimals.
CSV options
Select this check box to take into account all parameters specific
to CSV files, in particular Escape char and Text enclosure
parameters.
Create directory if not exists
This check box is selected by default. It creates the directory that holds the output delimited file, if it does not already exist.
Split output in several files
In case of very big output files, select this check box to divide the
output delimited file into several files.
Rows in each output file: set the number of lines in each of the
output files.
Custom the flush buffer size
Select this check box to define the number of lines to write before emptying the buffer.
Row Number: set the number of lines to write.
Encoding
Select the encoding from the list or select Custom and define it manually. This field is compulsory for DB data handling.
Don't generate empty file
Select this check box if you do not want to generate empty files.
tStatCatcher Statistics
Usage
Use this component to write a delimited file and separate fields using a field separator value.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not
provided. You can easily find out and add such JARs in the Integration perspective of your
studio. For details, see the section about external modules in the Talend Installation and
Upgrade Guide.
2.
For Talend Open Studio for Big Data, the Property type, Schema and Query Type of components are always Built-in. For further information about how to edit a Built-in schema, see Talend Studio User Guide.
2.
Click the [...] button next to the File Name field and browse to the input file, customer.csv in this example.
If the path of the file contains some accented characters, you will get an error message when executing your Job.
For more information regarding the procedures to follow when the support of accented characters is missing, see the
Talend Installation and Upgrade Guide of the Talend solution you are using.
3.
In the Row Separator and Field Separator fields, enter respectively "\n" and ";" as line and field separators.
4.
If needed, set the number of lines used as header and the number of lines used as footer in the corresponding
fields and then set a limit for the number of processed rows.
In this example, Header is set to 6 while Footer and Limit are not set.
5.
Click the [...] button next to Edit Schema to open the [Schema] dialog box and define the input schema as
shown below, and then click OK to close the dialog box.
2.
In the tMap editor, click the [+] button at the top of the panel on the right to open the [Add a new output table] dialog box.
3.
Enter a name for the table you want to create, row2 in this example.
4.
5.
In the table to the left, row1, select the first three lines (Id, CustomerName and CustomerAddress) and drop them to the table to the right.
6.
In the Schema editor view situated in the lower left corner of the tMap editor, change the type of
RegisterTime to String in the table to the right.
7.
In the design workspace, double-click tFileOutputDelimited to open its Basic settings view and define the
component properties.
2.
In the Property Type field, set the type to Built-in and fill in the fields that follow manually.
3.
Click the [...] button next to the File Name field and browse to the output file you want to write data in,
customerselection.txt in this example.
4.
In the Row Separator and Field Separator fields, set \n and ; respectively as row and field separators.
5.
Select the Include Header check box if you want to output columns headers as well.
6.
Click Edit schema to open the schema dialog box and verify that the retrieved schema corresponds to the input schema. If not, click Sync Columns to retrieve the schema from the preceding component.
2.
The three specified columns Id, CustomerName and CustomerAddress are output in the defined output file.
2.
2.
3.
4.
Select the Use Output Stream check box to enable the Output Stream field, in which you can define the output stream using a command.
Fill in the Output Stream field with the following command:
(java.io.OutputStream)globalMap.get("out_file")
You can customize the command in the Output Stream field by pressing Ctrl+Space to select a built-in command from the list, or by typing the command into the field manually, according to your needs. In this scenario, the command in the Output Stream field calls the java.io.OutputStream class to output the filtered data stream to a local file, which is defined in the Code area of tJava in this scenario.
5.
Click Sync columns to retrieve the schema defined in the preceding component.
6.
2.
tFileOutputExcel
tFileOutputExcel Properties
Component family
File/Output
Function
Purpose
tFileOutputExcel writes an MS Excel file with separated data values according to a defined schema.
Basic settings
Write excel2007 file format (xlsx)
Select this check box to write the processed data into the .xlsx format of Excel 2007.
Use Output Stream
Select this check box to process the data flow of interest. Once you
have selected it, the Output Stream field displays and you can
type in the data flow of interest.
The data flow to be processed must be added to the flow in
order for this component to fetch these data via the corresponding
representative variable.
This variable could be already pre-defined in your Studio or
provided by the context or the components you are using along
with this component; otherwise, you could define it manually and
use it according to the design of your Job, for example, using
tJava or tJavaFlex.
To avoid writing it by hand, you could select the variable of interest from the auto-completion list (Ctrl+Space) to fill the current field, on condition that this
variable has been properly defined.
For further information about how to use a stream, see section
Scenario 2: Reading data from a remote file in streaming mode.
File name
Sheet name
Include header
Select this check box to include a header row in the output file.
Append existing file
Select this check box to add the new lines at the end of the file.
Append existing sheet: Select this check box to add the new lines at the end of the Excel sheet.
Is absolute Y pos.
Font
Define all columns auto size
Select this check box if you want the size of all your columns to be defined automatically. Otherwise, select the Auto size check boxes next to the columns whose size you want to be defined automatically.
Schema and Edit Schema
Advanced settings
Create directory if not exists
This check box is selected by default. This option creates the directory that will hold the output files if it does not already exist.
Custom the flush buffer size
Available when Write excel2007 file format (xlsx) is selected in the Basic settings view.
Select this check box to set the maximum number of rows in the Row number field that are allowed in the buffer.
Advanced separator (for numbers)
Select this check box to modify the separators you want to use for numbers:
Thousands separator: define separators for thousands.
Decimal separator: define separators for decimals.
Encoding
Select the encoding type from the list or select Custom and define it manually. This field is compulsory for DB data handling.
tStatCatcher Statistics
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to choose your HDFS connection dynamically from multiple connections planned in your Job. This feature is useful when you need to access files in different HDFS systems or different distributions, especially when you are working in an environment where you cannot change your Job settings, for example, when your Job has to be deployed and executed independent of Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is selected in the Basic settings view. Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Global Variables
NB_LINE: Indicates the number of rows read by an input component or transferred to an output
component. This is an After variable and it returns an integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list
and choose the variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable
means it functions after the execution of a component.
Usage
Use this component to write an MS Excel file with data passed on from other components using
a Row link.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not
provided. You can easily find out and add such JARs in the Integration perspective of your
studio. For details, see the section about external modules in the Talend Installation and
Upgrade Guide.
Related scenario
For tFileOutputExcel related scenario, see section tSugarCRMInput;
For a scenario about the usage of the Use Output Stream check box, see section Scenario 2: Utilizing Output Stream to save filtered data to a local file.
tFileOutputJSON
tFileOutputJSON properties
Component Family
File / Output
Function
Purpose
tFileOutputJSON receives data and rewrites it in a JSON structured data block in an output
file.
Basic settings
File Name
Sync columns
Advanced settings
Create directory if not exists
This check box is selected by default. This option creates the directory that will hold the output files if it does not already exist.
tStatCatcher Statistics
Usage
Use this component to rewrite received data in a JSON structured output file.
If you have subscribed to one of the Talend solutions with Big Data, you can also use this
component as a Map/Reduce component. In a Talend Map/Reduce Job, this component is
used as an intermediate step and other components used along with it must be Map/Reduce
components, too. They generate native Map/Reduce code that can be executed directly in
Hadoop.
You need to use the Hadoop Configuration tab in the Run view to define the connection to
a given Hadoop distribution for the whole Job.
For further information about a Talend Map/Reduce Job, see the sections describing how to
create, convert and configure a Talend Map/Reduce Job of the Talend Open Studio for Big
Data Getting Started Guide.
Note that in this documentation, unless otherwise explicitly stated, a scenario presents only
Standard Jobs, that is to say traditional Talend data integration Jobs, and non Map/Reduce
Jobs.
Limitation
n/a
In a Talend Map/Reduce Job, tFileOutputJSON, as well as the whole Map/Reduce Job using it, generates native
Map/Reduce code. This section presents the specific properties of tFileOutputJSON when it is used in that
situation. For further information about a Talend Map/Reduce Job, see the Talend Open Studio for Big Data
Getting Started Guide.
Component family
MapReduce / Output
Function
Basic settings
Folder
Enter the folder on HDFS where you want to store the JSON output
file(s).
The folder will be created automatically if it does not exist.
Output type
Type in the name of the data block for the JSON output file(s).
This field will be available only if you select All in one
block from the Output type list.
Action
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Usage
Hadoop Connection
You need to use the Hadoop Configuration tab in the Run view to define the connection to a
given Hadoop distribution for the whole Job.
This connection is effective on a per-Job basis.
Prerequisites
The Hadoop distribution must be properly installed, so as to guarantee the interaction with Talend
Studio. The following list presents MapR related information for example.
Ensure that you have installed the MapR client in the machine where the Studio is, and added
the MapR client library to the PATH variable of that machine. For Windows, this library
is lib\MapRClient.dll in the MapR client jar file; without adding it, you may encounter the
following error: no MapRClient in java.library.path.
Set the -Djava.library.path argument. This argument provides to the Studio the path to
the native library of that MapR client. This allows the subscription-based users to make full
use of the Data viewer to view locally in the Studio the data stored in MapR. For further
information about how to set this argument, see the section describing how to view data of
Talend Open Studio for Big Data Getting Started Guide.
For further information about how to install a Hadoop distribution, see the manuals
corresponding to the Hadoop distribution you are using.
1.
Drop a tRowGenerator and a tFileOutputJSON component onto the workspace from the Palette.
2.
3.
Double-click tRowGenerator to define its Basic Settings properties in the Component view.
4.
Click [...] next to Edit Schema to display the corresponding dialog box and define the schema.
5.
6.
7.
8.
9.
Click [+] next to RowGenerator Editor to open the corresponding dialog box.
10. Under Functions, select pre-defined functions for the columns, if required, or select [...] to set customized
function parameters in the Function parameters tab.
11. Enter the number of rows to be generated in the corresponding field.
12. Click OK to close the dialog box.
13. Click tFileOutputJSON to set its Basic Settings properties in the Component view.
14. Click [...] to browse to where you want the output JSON file to be generated and enter the file name.
15. Enter a name for the data block to be generated in the corresponding field, between double quotation marks.
16. Select Built-In as the Schema type.
17. Click Sync Columns to retrieve the schema from the preceding component.
18. Press F6 to run the Job.
The data from the input schema is written in a JSON structured data block in the output file.
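For instance, with a data block named "results" and a two-column schema, the output file has the following shape (names and values are illustrative):
{"results":[{"id":1,"name":"Alice"},{"id":2,"name":"Bob"}]}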
tFileOutputLDIF
tFileOutputLDIF Properties
Component family
File/Output
Function
tFileOutputLDIF outputs data to an LDIF type of file which can then be loaded into a LDAP
directory.
Purpose
tFileOutputLDIF writes or modifies an LDIF file with data separated into respective entries based on the schema defined, or else deletes content from an LDIF file.
Basic settings
File name
Wrap
Change type
Advanced settings
Sync columns
Append
Select this check box to add the new rows at the end of the file.
Create directory if not exists
This check box is selected by default. It creates the directory that holds the output delimited file, if it does not already exist.
Custom the flush buffer size
Select this check box to define the number of lines to write before emptying the buffer.
Row Number: set the number of lines to write.
Encoding
Select the encoding from the list or select Custom and define it manually. This field is compulsory for DB data handling.
Don't generate empty file
Select this check box if you do not want to generate empty files.
tStatCatcher Statistics
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to choose your HDFS connection dynamically from multiple connections planned in your Job. This feature is useful when you need to access files in different HDFS systems or different distributions, especially when you are working in an environment where you cannot change your Job settings, for example, when your Job has to be deployed and executed independent of Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is selected in the Basic settings view. Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Global Variables
NB_LINE: Indicates the number of rows read by an input component or transferred to an output
component. This is an After variable and it returns an integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list
and choose the variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable
means it functions after the execution of a component.
Usage
Use this component to write an LDIF file with data passed on from other components using a Row link.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided. You can easily find out and add such JARs in Talend Studio. For details, see the section about external modules in the Talend Installation and Upgrade Guide.
Drop a tMysqlInput component and a tFileOutputLDIF component from the Palette to the design area.
2.
Select the tMysqlInput component, and go to the Component panel then select the Basic settings tab.
For Talend Open Studio for Big Data, the Property type, Schema and Query Type of components are always Built-in. For further information about how to edit a Built-in schema, see Talend Studio User Guide.
2.
3.
4.
Browse to the folder where you store the Output file. In this use case, a new LDIF file is to be created. Thus
type in the name of this new file.
5.
In the Wrap field, enter the number of characters held on one line. The text coming afterwards will get
wrapped onto the next line.
6.
Select Add as the Change type, as the newly created file is by definition empty. In the case of a modification type of change, you'll need to define the nature of the modification you want to make to the file.
7.
As the Schema type, select Built-in and use the Sync Columns button to retrieve the input schema definition.
2.
The LDIF file created contains the data from the DB table and the type of change made to the file, in this
use case, addition.
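As an illustration, an entry of change type add in the generated LDIF file has the following shape (the DN and attribute values are hypothetical):
dn: cn=John Doe,ou=customers,dc=example,dc=com
changetype: add
cn: John Doe
customerRef: 8200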
tFileOutputMSDelimited
tFileOutputMSDelimited properties
Component family
File/ Output
Function
Purpose
Basic settings
File Name
Name and path to the file to be created and/or the variable to be used.
Related topic: see Talend Studio User Guide.
Row Separator
Field Separator
Select this check box to set a different field separator for each of the
schemas using the Field separator field in the Schemas area.
Schemas
Advanced settings
Advanced separator (for numbers)
Select this check box to modify the separators used for numbers:
Thousands separator: define separators for thousands.
Decimal separator: define separators for decimals.
CSV options
Select this check box to take into account all parameters specific to CSV files, in particular Escape char and Text enclosure parameters.
Create directory if not exists
This check box is selected by default. It creates the directory that holds the output delimited file, if it does not already exist.
Encoding
Select the encoding from the list or select Custom and define it manually. This field is compulsory for DB data handling.
Don't generate empty file
Select this check box if you do not want to generate empty files.
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at a Job level as well as at each component level.
Usage
Use this component to write a multi-schema delimited file and separate fields using a field separator
value.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided. You
can easily find out and add such JARs in the Integration perspective of your studio. For details, see
the section about external modules in the Talend Installation and Upgrade Guide.
Related scenarios
No scenario is available for this component yet.
tFileOutputMSPositional
tFileOutputMSPositional properties
Component family
File/Output
Function
Purpose
Basic settings
File Name
Row separator
Schemas
Advanced settings
Advanced separator (for numbers)
Select this check box to modify the separators used for numbers:
Thousands separator: define separators for thousands.
Decimal separator: define separators for decimals.
Create directory if not exists
This check box is selected by default. It creates the directory that holds the output delimited file, if it does not already exist.
Encoding
Select the encoding from the list or select Custom and define it manually. This field is compulsory for DB data handling.
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at a Job level as well as at each component level.
Usage
Use this component to write a multi-schema positional file and separate fields using a position separator
value.
Related scenario
No scenario is available for this component yet.
tFileOutputMSXML
tFileOutputMSXML Properties
Component family
File/Output
Function
Purpose
Basic settings
File Name
Advanced settings
Create directory only if not exists
This check box is selected by default. It creates the directory that holds the output delimited file, if it does not already exist.
Advanced separator (for numbers)
Select this check box to modify the separators used for numbers:
Thousands separator: define separators for thousands.
Decimal separator: define separators for decimals.
Encoding
Select the encoding from the list or select Custom and define it manually. This field is compulsory for DB data handling.
Don't generate empty file
Select this check box if you do not want to generate empty files.
Trim the whitespace characters
Select this check box to remove leading and trailing whitespace from the columns.
Escape text
tStatCatcher Statistics
Limitation
n/a
To the left of the mapping interface, under Linker source, the drop-down list includes all the input schemas that
should be added to the multi-schema output XML file (on the condition that more than one input flow is connected
to the tFileOutputMSXML component).
Under Schema List are listed all the columns retrieved from the input data flow in selection.
On the right side of the interface are all the XML structures you want to create in the output XML file.
You can create the XML structures manually or import them easily. Then map the input schema columns onto each element of the XML tree, respectively for each of the input schemas in selection under Linker source.
Rename the root tag that displays by default on the XML tree panel, by clicking on it once.
2.
3.
4.
The XML Tree column is hence automatically filled out with the correct elements. You can remove and
insert elements or sub-elements from and to the tree:
5.
6.
7.
Select Delete to remove the selection from the tree or select the relevant option among: Add sub-element,
Add attribute, Add namespace to enrich the tree.
Rename the root tag that displays by default on the XML tree panel, by clicking on it once.
2.
3.
On the menu, select Add sub-element to create the first element of the structure.
You can also add an attribute or a child element to any element of the tree or remove any element from the tree.
4.
5.
Right-click to the left of the element name to display the contextual menu.
6.
On the menu, select the relevant option among: Add sub-element, Add attribute, Add namespace or Delete.
2.
3.
A light blue link displays that illustrates this mapping. If available, use the Auto-Map button, located to the
bottom left of the interface, to carry out this operation automatically.
You can disconnect any mapping on any element of the XML tree:
4.
Select the element of the XML tree, that should be disconnected from its respective schema column.
5.
Right-click to the left of the element name to display the contextual menu.
6.
Loop element
The loop element allows you to define the iterating object. Generally the Loop element is also the row generator.
To define an element as loop element:
1.
2.
Right-click to the left of the element name to display the contextual menu.
3.
Group element
The group element is optional; it represents a constant element where the Group by operation can be performed. A group element can be defined on the condition that a loop element was defined before.
When using a group element, the rows should be sorted, in order to be able to group by the selected node.
To define an element as group element:
1.
2.
Right-click to the left of the element name to display the contextual menu.
3.
The Node Status column shows the newly added status, and any required group status is defined automatically, if needed.
Click OK once the mapping is complete to validate the definition for this source and perform the same operation
for the other input flow sources.
Related scenario
No scenario is available for this component yet.
tFileOutputPositional
tFileOutputPositional Properties
Component Family
File/Output
Function
tFileOutputPositional writes a file row by row according to the length and the format of the fields
or columns in a row.
Purpose
It writes a file row by row, according to the data structure (schema) coming from the input flow.
Basic settings
Property type
Use Output Stream
Select this check box to process the data flow of interest. Once you have selected it, the Output Stream field displays and you can type in the data flow of interest.
The data flow to be processed must be added to the flow in order for this component to fetch these data via the corresponding representative variable.
This variable could be already pre-defined in your Studio or provided by the context or the components you are using along with this component; otherwise, you could define it manually and use it according to the design of your Job, for example, using tJava or tJavaFlex.
To avoid writing it by hand, you could select the variable of interest from the auto-completion list (Ctrl+Space) to fill the current field, on condition that this variable has been properly defined.
For further information about how to use a stream, see section Scenario 2: Reading data from a remote file in streaming mode.
File Name
Row separator
Append
Select this check box to add the new rows at the end of the file.
Include header
Select this check box to include the column header in the file.
Compress as zip file
Select this check box to compress the output file in zip format.
Formats
Customize the positional file data format and fill in the columns in the
Formats table.
Advanced settings
Advanced separator (for numbers)
Select this check box to modify the separators used for numbers:
Thousands separator: define separators for thousands.
Decimal separator: define separators for decimals.
Custom the flush buffer size
Select this check box to define the number of lines to write before emptying the buffer.
Row Number: set the number of lines to write.
Encoding
Select the encoding type from the list or select Custom and define it manually. This field is compulsory for DB data handling.
Don't generate empty file
Select this check box if you do not want to generate empty files.
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at a Job level as well as at each component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to choose your HDFS connection dynamically from multiple connections planned in your Job. This feature is useful when you need to access files in different HDFS systems or different distributions, especially when you are working in an environment where you cannot change your Job settings, for example, when your Job has to be deployed and executed independent of Talend Studio.
The Dynamic settings table is available only when the Use an existing connection check box is selected in the Basic settings view. Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Global Variables
NB_LINE: Indicates the number of rows read by an input component or transferred to an output
component. This is an After variable and it returns an integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose
the variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable means
it functions after the execution of a component.
Usage
Use this component to write a file and separate fields using the specified positions.
Related scenario
For a related scenario, see section Scenario: Regex to Positional file.
For a scenario about the usage of the Use Output Stream check box, see section Scenario 2: Utilizing Output Stream to save filtered data to a local file.
tFileOutputProperties
tFileOutputProperties properties
Component family
File/Output
Function
Purpose
tFileOutputProperties writes a configuration file containing text data organized according to the model key = value.
Basic settings
Schema and Edit Schema
A schema is a row description, i.e. it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
For this component, the schema is read-only. It is made of two columns, Key and Value, corresponding to the parameter name and the parameter value to be copied.
File format
File Name
Advanced settings
Encoding
Select the encoding from the list or select Custom and define it manually. This field is compulsory for DB data handling.
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at a Job level as well as at each component level.
Usage
Use this component to write files where data is organized according to the structure key = value.
Related scenarios
For a related scenario, see section Scenario: Reading and matching the keys and the values of different .properties
files and outputting the results in a glossary of section tFileInputProperties.
tFileOutputXML
tFileOutputXML belongs to two component families: File and XML. For more information on tFileOutputXML, see section tFileOutputXML.
tFileProperties
tFileProperties Properties
Component family
File/Management
Function
tFileProperties creates a single row flow that displays the properties of the processed file.
Purpose
Basic settings
Schema
File
Select this check box to check the MD5 of the downloaded file.
Advanced settings
tStatCatcher Statistics
Usage
Connections
Limitation
n/a
Drop a tFileProperties component and a tLogRow component from the Palette onto the design workspace.
2.
3.
4.
5.
6.
If desired, click the Edit schema button to see the read-only columns.
7.
In the File field, enter the file path or browse to the file you want to display the properties for.
8.
In the design workspace, select tLogRow and click the Component tab to define its basic settings. For more
information, see section tLogRow.
9.
tFileRowCount
tFileRowCount properties
Component Family
File/Management
Function
Purpose
tFileRowCount opens a file and reads it row by row in order to determine the number of rows inside.
Basic settings
File Name
Name and path of the file to be processed and/or the variable to be used.
See also: Talend Studio User Guide.
Row separator
Ignore empty rows
Select this check box to ignore the empty rows while the component is counting the rows in the file.
Encoding
Select the encoding type from the list or select Custom and define it
manually. This field is compulsory for DB data handling.
Advanced settings
tStatCatcher Statistics
Select this check box to gather the processing metadata at the Job level
as well as at each component level.
Global Variables
COUNT: Returns the number of rows in a file. This is a Flow variable and it returns an integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose
the variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable means
it functions after the execution of a component.
Connections
Usage
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided. You
can easily find out and add such JARs in the Integration perspective of your studio. For details, see
the section about external modules in the Talend Installation and Upgrade Guide.
Scenario: Writing a file to MySQL if the number of its records matches a reference value
Drop tFileRowCount, tJava, tFileInputDelimited, and tMysqlOutput from the Palette onto the design workspace.
2.
3.
4.
2.
In the File Name field, type in the full path of the .txt file. You can also click the [...] button to browse for
this file.
In the Code box, enter the function to print out the number of rows in the file:
System.out.println(globalMap.get("tFileRowCount_1_COUNT"));
4.
In the Condition box, enter the statement to judge if the number of rows is 2:
((Integer)globalMap.get("tFileRowCount_1_COUNT"))==2
This if trigger means that if the row count equals 2, the rows of the .txt file will be written to MySQL.
5.
In the File name/Stream field, type in the full path of the .txt file. You can also click the [...] button to
browse for this file.
6.
7.
Click the [+] button to add two columns, namely id and name, respectively of the integer and string type.
8.
Click the Yes button in the pop-up box to propagate the schema setup to the following component.
9.
10. In the Host and Port fields, enter the connection details.
In the Database field, enter the database name.
2.
As shown above, the Job has been executed successfully and the number of rows in the .txt file has been
printed out.
3.
As shown above, the table has been created with the two records inserted.
tFileTouch
tFileTouch properties
Component Family
File/Management
Function
tFileTouch either creates an empty file or, if the specified file already exists, updates its date of
modification and of last access while keeping the contents unchanged.
Purpose
This component creates an empty file or updates the details of an existing file for further operations,
and creates the destination directory if it does not exist.
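Conceptually, its behavior is close to the following Java sketch (the path is illustrative, and the code assumes an enclosing method that declares java.io.IOException):
java.io.File f = new java.io.File("D:/temp/flag.txt");
if (!f.exists()) {
    f.getParentFile().mkdirs();   // create the destination directory if needed
    f.createNewFile();            // create an empty file
} else {
    f.setLastModified(System.currentTimeMillis());   // refresh the modification date
}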
Basic settings
File Name
Path and name of the file to be created and/or the variable to be used.
Create directory if not exists
This check box is selected by default. It creates a directory to hold the output file if it does not exist.
Advanced settings
tStatCatcher Statistics
Select this check box to gather the processing metadata at the Job level as well as at each component level.
Usage
Connections
Related scenario
No scenario is available for this component yet.
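Although no Job scenario is documented yet, the behavior of tFileTouch can be approximated with plain Java, as in the minimal sketch below; the target path is a hypothetical example:

import java.io.File;
import java.io.IOException;

public class FileTouchSketch {
    public static void main(String[] args) throws IOException {
        File target = new File("C:/talend/touched.txt"); // hypothetical file path
        // Create the destination directory if it does not exist,
        // mirroring the "Create directory if not exists" option.
        File parent = target.getParentFile();
        if (parent != null && !parent.exists()) {
            parent.mkdirs();
        }
        if (!target.exists()) {
            target.createNewFile(); // create an empty file
        } else {
            // Update the date of modification while keeping the contents unchanged.
            target.setLastModified(System.currentTimeMillis());
        }
    }
}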
1740
tFileUnarchive
tFileUnarchive
tFileUnarchive Properties
Component family
File/Management
Function
Decompresses the archive file provided as parameter and puts it in the extraction directory.
Purpose
Decompresses an archive file for further processing. Such formats are supported: *.tar.gz ,
*.tgz, *.tar, *.gz and *.zip.
Basic settings
Archive file
Extraction Directory
Use archive name as root directory
Select this check box to create a folder named after the archive, if it does not exist, under the specified directory and extract the zipped file(s) to that folder.
Check the integrity before unzip
Select this check box to run an integrity check before unzipping the archive.
Extract file paths
Select this check box to reproduce the file path structure zipped
in the archive.
Need a password
Select this check box and provide the correct password if the
archive to be unzipped is password protected. Note that the
encrypted archive must be one created by the tFileArchive
component; otherwise you will see error messages or get nothing
extracted even if no error message is displayed.
Advanced settings
tStatCatcher Statistics
Select this check box to gather the processing metadata at the Job
level as well as at each component level.
Usage
This component can be used as a standalone component but it can also be used within a Job
as a Start component using an Iterate link.
Global Variables
Connections
1741
Limitation
Such files can be decompressed: *.tar.gz , *.tgz, *.tar, *.gz and *.zip.
Related scenario
For tFileUnarchive related scenario, see section tFileCompare.
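For the *.zip case, the extraction performed by this component is conceptually close to the standalone Java sketch below; the archive and extraction paths are hypothetical, and password-protected archives are not covered:

import java.io.File;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class UnzipSketch {
    public static void main(String[] args) throws Exception {
        File extractionDir = new File("C:/talend/extracted");      // hypothetical extraction directory
        try (ZipFile zip = new ZipFile("C:/talend/archive.zip")) { // hypothetical archive file
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                // Reproduce the file path structure zipped in the archive.
                File out = new File(extractionDir, entry.getName());
                if (entry.isDirectory()) {
                    out.mkdirs();
                    continue;
                }
                out.getParentFile().mkdirs();
                try (InputStream in = zip.getInputStream(entry);
                     FileOutputStream fos = new FileOutputStream(out)) {
                    in.transferTo(fos); // requires Java 9 or later
                }
            }
        }
    }
}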
1742
tGPGDecrypt
tGPGDecrypt
tGPGDecrypt Properties
Component family
File/Management
Function
Decrypts a GnuPG-encrypted file and saves the decrypted file in the specified target directory.
Purpose
This component calls the gpg -d command to decrypt a GnuPG-encrypted file and saves the
decrypted file in the specified directory.
Basic settings
Passphrase
No TTY Terminal
Advanced settings
tStatCatcher Statistics
Select this check box to gather the processing metadata at the Job
level as well as at each component level.
Usage
Limitation
n/a
Drop a tGPGDecrypt component, a tFileInputDelimited component, and a tLogRow component from the
Palette to the design workspace.
1743
2.
Connect the tGPGDecrypt component to the tFileInputDelimited component using a Trigger >
OnSubjobOk link, and connect the tFileInputDelimited component to the tLogRow component using a
Row > Main link.
Double-click the tGPGDecrypt to open its Component view and set its properties:
2.
3.
In the Output decrypted file field, enter the path to the decrypted file.
If the file path contains accented characters, you will get an error message when running the Job. For more information
on what to do when the accents are not supported, see Talend Installation and Upgrade Guide of the Talend solution
you are using.
4.
In the GPG binary path field, browse to the GPG command file.
5.
In the Passphrase field, enter the passphrase used when encrypting the input file.
6.
Double-click the tFileInputDelimited component to open its Component view and set its properties:
7.
In the File name/Stream field, define the path to the decrypted file, which is the output path you have defined
in the tGPGDecrypt component.
8.
In the Header, Footer and Limit fields, define respectively the number of rows to be skipped in the beginning
of the file, at the end of the file and the number of rows to be processed.
9.
Use a Built-In schema. This means that it is available for this Job only.
10. Click Edit schema and edit the schema for the component. Click the [+] button twice to add two columns that you will call idState and labelState.
11. Click OK to validate your changes and close the editor.
1744
2.
The specified file is decrypted and the defined number of rows of the decrypted file are printed on the Run console.
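The decryption step itself boils down to invoking the gpg command line, roughly as in the sketch below; the file paths and passphrase are hypothetical, and the exact flags may vary with your GnuPG version:

import java.io.IOException;

public class GpgDecryptSketch {
    public static void main(String[] args) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(
                "gpg", "--batch", "--yes",
                "--passphrase", "myPassphrase",            // hypothetical passphrase
                "-o", "C:/talend/decrypted/states.csv",    // hypothetical output (decrypted) file
                "-d", "C:/talend/encrypted/states.csv.gpg" // hypothetical encrypted input file
        );
        pb.inheritIO(); // show gpg messages on the console
        int exitCode = pb.start().waitFor();
        System.out.println("gpg exited with code " + exitCode);
    }
}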
1745
tHDFSCompare
tHDFSCompare
tHDFSCompare component belongs to two component families: Big Data and File. For more information about
tHDFSCompare, see section tHDFSCompare.
1746
tHDFSConnection
tHDFSConnection
tHDFSConnection component belongs to two component families: Big Data and File. For more information about
tHDFSConnection, see section tHDFSConnection.
1747
tHDFSCopy
tHDFSCopy
tHDFSCopy belongs to two component families: Big Data and File. For more information on tHDFSCopy, see
section tHDFSCopy.
1748
tHDFSDelete
tHDFSDelete
tHDFSDelete component belongs to two component families: Big Data and File. For more information about
tHDFSDelete, see section tHDFSDelete.
1749
tHDFSExist
tHDFSExist
tHDFSExist component belongs to two component families: Big Data and File. For more information about
tHDFSExist, see section tHDFSExist.
1750
tHDFSGet
tHDFSGet
tHDFSGet component belongs to two component families: Big Data and File. For more information about
tHDFSGet, see section tHDFSGet.
1751
tHDFSList
tHDFSList
tHDFSList belongs to two component families: Big Data and File. For more information on tHDFSList, see
section tHDFSList.
1752
tHDFSInput
tHDFSInput
tHDFSInput component belongs to two component families: Big Data and File. For more information about
tHDFSInput, see section tHDFSInput.
1753
tHDFSOutput
tHDFSOutput
tHDFSOutput component belongs to two component families: Big Data and File. For more information about
tHDFSOutput, see section tHDFSOutput.
1754
tHDFSProperties
tHDFSProperties
tHDFSProperties component belongs to two component families: Big Data and File. For more information about
tHDFSProperties, see section tHDFSProperties.
1755
tHDFSPut
tHDFSPut
tHDFSPut component belongs to two component families: Big Data and File. For more information about
tHDFSPut, see section tHDFSPut.
1756
tHDFSRename
tHDFSRename
tHDFSRename component belongs to two component families: Big Data and File. For more information about
tHDFSRename, see section tHDFSRename.
1757
tHDFSRowCount
tHDFSRowCount
tHDFSRowCount component belongs to two component families: Big Data and File. For more information about
tHDFSRowCount, see section tHDFSRowCount.
1758
tNamedPipeClose
tNamedPipeClose
tNamedPipeClose properties
Component family
File/Input
Function
Purpose
Basic settings
Pipe
Advanced settings
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at a Job
level as well as at each component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to choose
your pipe connection dynamically from multiple connections planned in your Job.
When a dynamic parameter is defined, the Pipe box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For details,
see the section about external modules in the Talend Installation and Upgrade Guide.
Related scenario
For a related scenario, see section Scenario: Writing and loading data through a named-pipe.
1759
tNamedPipeOpen
tNamedPipeOpen
tNamedPipeOpen properties
Component family
File/Input
Function
Purpose
This component is used in inter-process communication; it opens a named pipe into which data can be written.
Basic settings
Name
Advanced settings
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at a Job
level as well as at each component level.
Usage
This component is usually used as the starting component in an inter-process communication Job.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For details,
see the section about external modules in the Talend Installation and Upgrade Guide.
Related scenario
For a related scenario, see section Scenario: Writing and loading data through a named-pipe.
1760
tNamedPipeOutput
tNamedPipeOutput
tNamedPipeOutput properties
Component family
File/Input
Function
Purpose
This component allows you to write data into an existing open named-pipe.
Basic settings
Pipe component
Pipe name
Row separator
Field separator
CSV options
Select this check box to take into account all parameters specific
to CSV files, in particular Escape char and Text enclosure
parameters.
Advanced settings
Dynamic settings
Boolean type
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at a Job
level as well as at each component level.
Click the [+] button to add a row in the table and fill the Code field with a context variable to choose
your pipe connection dynamically from multiple connections planned in your Job.
The Dynamic settings table is available only when the Use existing pipe connection check box is
selected in the Basic settings view. When a dynamic parameter is defined, the Pipe component list
box in the Basic settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component is usually connected to another component in a subjob that reads data from a source.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not provided.
You can easily find out and add such JARs in the Integration perspective of your studio. For details,
see the section about external modules in the Talend Installation and Upgrade Guide.
1761
Drop the following components from the Palette to the design workspace: tNamedPipeOpen, tParallelize,
tNamedPipeClose, tFileInputDelimited, tSleep, tLogRow, tRowGenerator and tNamedPipeOutput.
2.
3.
4.
5.
6.
Connect tParallelize to tNamedPipeClose using a Trigger > Synchronize (Wait for all) connection.
7.
8.
1762
2.
3.
4.
((String)globalMap.get("tNamedPipeOpen_1_PIPE_NATIVE_NAME"))
5.
1763
6.
Click the plus button to add three columns for tFileInputDelimited. Fill the three Column fields with id,
first_name and last_name and set the Type of id to Integer. Keep the rest of the settings as default.
7.
8.
Keep the rest of the settings in the Basic settings view of tFileInputDelimited as default.
9.
Double-click tSleep and fill the Pause (in seconds) field with 1.
10. Double-click tRowGenerator to define its properties in its Basic settings view.
11. Click RowGenerator Editor to define the schema.
12. Click the plus button to add three columns for tRowGenerator. Fill the three Column fields with id,
first_name and last_name and set the Type of id to Integer. Keep the rest of the settings of Type as default.
13. Select sequence from the list in the Functions field for id.
14. Select getFirstName from the list in the Functions field for Column first_name.
15. Select TalendDataGenerator.getLastName from the list in the Functions field for Column last_name.
16. Select id, fill the Value field under Function parameters tab with s1 for sequence identifier, 1001 for start
value and 1 for step.
1764
2.
Select the Use existing pipe connection checkbox and select tNamedPipeOpen_1 from the Pipe component
list.
3.
4.
Click Sync columns to retrieve the schema from the preceding component.
5.
6.
7.
Click Sync columns to retrieve the schema from the preceding component.
8.
9.
1765
The data written into the named-pipe is displayed onto the console.
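Outside the Studio, the same write-then-read flow can be sketched with plain Java on a POSIX system, assuming the pipe has already been created with mkfifo /tmp/talend_pipe; the pipe path and the record are hypothetical:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;

public class NamedPipeSketch {
    public static void main(String[] args) throws Exception {
        String pipePath = "/tmp/talend_pipe"; // hypothetical pipe created beforehand with mkfifo

        // Writer thread: plays the role of tNamedPipeOutput.
        Thread writer = new Thread(() -> {
            try (FileWriter out = new FileWriter(pipePath)) {
                out.write("1001;John;Doe\n"); // field separator ";" and row separator "\n"
            } catch (Exception e) {
                e.printStackTrace();
            }
        });
        writer.start();

        // Reader: plays the role of the component that consumes the pipe.
        try (BufferedReader in = new BufferedReader(new FileReader(pipePath))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
        writer.join();
    }
}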
1766
tPivotToColumnsDelimited
tPivotToColumnsDelimited
tPivotToColumnsDelimited Properties
Component family
File/Output
Function
Purpose
Basic settings
Pivot column
Select the column from the incoming flow that will be used as
pivot for the aggregation operation.
Aggregation column
Select the column from the incoming flow that contains the data
to be aggregated.
Aggregation function
Group by
Define the aggregation sets, the values of which will be used for
calculations.
Input Column: Match the input column label with your output
columns, in case the output label of the aggregation set needs to
be different.
File Name
Field separator
Row separator
Global Variables
NB_LINE: Indicates the number of rows read by an input component or transferred to an output component. This is an After variable and it returns an integer.
NB_LINE_OUT: Indicates the number of rows written to the file by the component. This is an After variable and it returns an integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list
and choose the variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable
means it functions after the execution of a component.
Usage
Limitation
Due to license incompatibility, one or more JARs required to use this component are not
provided. You can easily find out and add such JARs in the Integration perspective of your
studio. For details, see the section about external modules in the Talend Installation and
Upgrade Guide.
1767
Drop the following component from the Palette to the design workspace: tFileInputDelimited,
tPivotToColumnsDelimited.
2.
2.
Browse to the input file to fill out the File Name field.
The file to use as the input file is made of three columns: ID, Question and the corresponding Answer.
1768
3.
Define the Row and Field separators; in this example, they are a carriage return and a semicolon respectively.
4.
5.
Set the schema describing the three columns: ID, Questions, Answers.
2.
In the Pivot column field, select the pivot column from the input schema. This is often the column presenting the most duplicates (pivot aggregation values).
3.
In the Aggregation column field, select the column from the input schema that should get aggregated.
4.
In the Aggregation function field, select the function to be used in case duplicates are found.
5.
In the Group by table, add an Input column that will be used to group by the aggregation column.
6.
In the File Name field, browse to the output file path. In the Row and Field separator fields, set the separators for the aggregated output rows and data.
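Before looking at the execution, the pivot operation itself can be pictured with the short, self-contained Java sketch below; the sample (ID, Question, Answer) rows are hypothetical and the aggregation function is simply "first":

import java.util.LinkedHashMap;
import java.util.Map;

public class PivotSketch {
    public static void main(String[] args) {
        // Hypothetical input rows: ID;Question;Answer
        String[][] rows = {
                {"1", "Name", "Alice"},
                {"1", "City", "Paris"},
                {"2", "Name", "Bob"},
                {"2", "City", "London"}
        };
        // Group by ID, pivot on Question and aggregate the Answer column.
        Map<String, Map<String, String>> pivoted = new LinkedHashMap<>();
        for (String[] row : rows) {
            pivoted.computeIfAbsent(row[0], k -> new LinkedHashMap<>())
                   .putIfAbsent(row[1], row[2]); // "first" used as the aggregation function
        }
        // One delimited output line per group, for example "1;Alice;Paris".
        for (Map.Entry<String, Map<String, String>> e : pivoted.entrySet()) {
            System.out.println(e.getKey() + ";" + String.join(";", e.getValue().values()));
        }
    }
}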
2.
1769
tSqoopExport
tSqoopExport
tSqoopExport component belongs to two component families: Big Data and File. For more information about
tSqoopExport, see section tSqoopExport.
1770
tSqoopImport
tSqoopImport
tSqoopImport component belongs to two component families: Big Data and File. For more information about
tSqoopImport, see section tSqoopImport.
1771
tSqoopImportAllTables
tSqoopImportAllTables
tSqoopImportAllTables component belongs to two component families: Big Data and File. For more information
about tSqoopImportAllTables, see section tSqoopImportAllTables.
1772
tSqoopMerge
tSqoopMerge
tSqoopMerge component belongs to two component families: Big Data and File. For more information about
tSqoopMerge, see section tSqoopMerge.
1773
Internet components
This chapter details the main components which belong to the Internet family in the Palette in the Integration
perspective of Talend Studio.
The Internet family comprises all of the components which help you to access information via the Internet, through
various means including Web services, RSS flows, SCP, Emails, FTP etc.
For Talend Open Studio for Big Data, the Property type, Schema and Query Type of components are always Built-in. For
further information about how to edit a Built-in schema, see Talend Studio User Guide.
tFileFetch
tFileFetch
tFileFetch properties
Component family
Internet
Function
Purpose
tFileFetch allows you to retrieve file data according to the protocol which is in place.
Basic settings
Protocol
Select the protocol you want to use from the list and fill in the
corresponding fields: http, https, ftp, smb.
The properties differ slightly depending on the type of protocol
selected. The additional fields are defined in this table, after the
basic settings.
URI
Type in the URI of the site from which the file is to be fetched.
Use cache to save resource Select this check box to save the data in the cache.
This option allows you to process the file data flow (in streaming
mode) without saving it on your drive. This is faster and improves
performance.
Domain
Destination Directory
Destination Filename
Create full path according to URI
This check box is selected by default. It allows you to reproduce the URI directory path. To save the file at the root of your destination directory, clear the check box.
Available for the http, https and ftp protocols.
Add header
Select this check box if you want to add one or more HTTP request
headers as fetch conditions. In the Headers table, enter the name(s)
of the HTTP header parameter(s) in the Headers field and the
corresponding value(s) in the Value field.
Available for the http and https protocols.
POST method
This check box is selected by default. It allows you to use the POST
method. In the Parameters table, enter the name of the variable(s)
in the Name field and the corresponding value in the Value field.
Clear the check box if you want to use the GET method.
Available for the http and https protocols.
Die on error
Clear this check box to skip the rows in error and to complete the process for the error-free rows.
Available for the http, https and ftp protocols.
Read Cookie
1776
Save Cookie
Select this check box to save the web page authentication cookie.
This means you will not have to log on to the same web site in the
future.
Available for the http, https, ftp and smb protocols.
Cookie directory
Click [...] and browse to where you want to save the cookie in your
directory, or to where the cookie is already saved.
Available for the http, https, ftp and smb protocols.
Cookie policy
Check this box to put all cookies into one request header for
maximum compatibility among different servers.
Available for the http, https, ftp and smb protocols.
Advanced settings
tStatCatcher Statistics
Select this check box to collect the log data at each component
level.
Timeout
Select this check box to print the server response in the console.
Available for the http and https protocols.
Upload file
Select this check box to upload one or more files to the server. In
the Name field, enter the name of the file you want to upload and
in the File field, indicate the path.
Available for the http and https protocols.
Select this check box if you are connecting via a proxy and
complete the fields which follow with the relevant information.
Available for the http, https and ftp protocols.
Enable NTLM Credentials Select this check box if you are using an NTLM authentication
protocol.
Domain: The client domain name.
Host: The client's IP address.
Available for the http and https protocols.
Need authentication
Select this check box and enter the username and password in the
relevant fields, if they are required to access the protocol.
Available for the http and https protocols.
Support redirection
Usage
This component is generally used as a start component to feed the input flow of a Job and is
often connected to the Job using an OnSubjobOk or OnComponentOk link, depending on the
context.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not
provided. You can easily find out and add such JARs in the Integration perspective of your
studio. For details, see the section about external modules in the Talend Installation and Upgrade
Guide.
1777
2.
3.
In the Basic settings view of tFileFetch, select the protocol you want to use from the list. Here, use the
HTTP protocol.
2.
Type in the URI where the file to be fetched can be retrieved from.
3.
In the Destination directory field, browse to the folder where the fetched file is to be stored.
4.
In the Filename field, type in a new name for the file if you want it to be changed. In this example, filefetch.txt.
5.
If needed, select the Add header check box and define one or more HTTP request headers as fetch conditions.
For example, to fetch the file only if it has been modified since 19:43:31 GMT, October 29, 1994, fill in the
Name and Value fields with "If-Modified-Since" and "Sat, 29 Oct 1994 19:43:31 GMT" respectively in the
Headers table. For details about HTTP request header definitions, see Header Field Definitions.
6.
Select the tFileInputRegex, set the File name so that it corresponds to the file fetched earlier.
7.
Using a regular expression, in the Regex field, select the relevant data from the fetched file. In this example:
<td(?: class="leftalign")?> \s* (t\w+) \s* </td>
Regex syntax requires double quotation marks.
8.
Define the header, footer and limit if need be. In this case, ignore these fields.
9.
Define the schema describing the flow to be passed on to the final output.
The schema should be automatically propagated to the final output, but to be sure, check the schema in the
Basic settings panel of the tFileOutputDelimited component.
1778
2.
Then press F6 or click Run on the Run tab to execute the Job.
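The fetch configured above, including the If-Modified-Since header of step 5, corresponds roughly to the following standalone Java sketch; the URI and destination path are hypothetical:

import java.io.FileOutputStream;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class FileFetchSketch {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://example.com/data/file.txt"); // hypothetical URI
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        // Fetch the file only if it has been modified since the given date.
        conn.setRequestProperty("If-Modified-Since", "Sat, 29 Oct 1994 19:43:31 GMT");

        if (conn.getResponseCode() == HttpURLConnection.HTTP_OK) {
            try (InputStream in = conn.getInputStream();
                 FileOutputStream out = new FileOutputStream("C:/talend/filefetch.txt")) { // hypothetical destination
                in.transferTo(out); // requires Java 9 or later
            }
        } else {
            System.out.println("Server answered " + conn.getResponseCode() + ", nothing fetched.");
        }
        conn.disconnect();
    }
}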
2.
Link the two components as subjobs using a Trigger > On Subjob Ok connection.
1779
2.
In the Protocol field, select the protocol you want to use from the list. Here, we use the HTTP protocol.
3.
In the URI field, type in the URI through which you can log in to the website and fetch the web page accordingly.
In this example, the URI is http://www.codeproject.com/script/Membership/LogOn.aspx?rp=http
%3a%2f%2fwww.codeproject.com%2fKB%2fcross-platform%2fjavacsharp.aspx&download=true.
4.
In the Destination directory field, browse to the folder where the fetched file is to be stored. This folder will
be created on the fly if it does not exist. In this example, type in C:/Logpage.
5.
In the Destination Filename field, type in a new name for the file if you want it to be changed. In this
example, webpage.html.
6.
Under the Parameters table, click the plus button to add two rows.
7.
In the Name column of the Parameters table, type in a new name respectively for the two rows. In this example, they are Email and Password, which are required by the website you are logging in to.
8.
9.
Select the Save cookie check box to activate the Cookie directory field.
10. In the Cookie directory field, browse to the folder where you want to store the cookie file and type in a name for the cookie to be saved. This folder must already exist. In this example, the directory is C:/temp/Cookie.
1780
Related scenario
2.
3.
In the URI field, type in the address from which you fetch the files of your interest. In this example, the
address is http://www.codeproject.com/KB/java/RemoteShell/RemoteShell.zip.
4.
In the Destination directory field, type in the directory or browse to the folder where you want to store the
fetched files. This folder can be automatically created if it does not exist yet during the execution process.
In this example, type in C:/JavaProject.
5.
In the Destination Filename field, type in a new name for the file if you want it to be changed. In this
example, RemoteShell.zip.
6.
Clear the Post method check box to deactivate the Parameter table.
7.
Select the Read cookie check box to activate the Cookie directory field.
8.
In the Cookie directory field, type in the directory or browse to the cookie file you have saved and need to
use. In this example, the directory is C:/temp/Cookie.
2.
Then press F6 to run the Job, and check each folder you have used to store the fetched files.
Related scenario
For an example of transferring data in streaming mode, see section Scenario 2: Reading data from a remote file
in streaming mode
1781
tFileInputJSON
tFileInputJSON
tFileInputJSON belongs to two different component families: Internet and File. For further information, see
section tFileInputJSON.
1782
tFTPConnection
tFTPConnection
tFTPConnection properties
Component family
Internet/FTP
Function
tFTPConnection opens an FTP connection in order that a transaction may be carried out.
Purpose
tFTPConnection allows you to open an FTP connection to transfer files in a single transaction.
Basic settings
Property type
Host
Port
SFTP Support
FTPS Support
Connect mode
Usage
This component is typically used as a single-component sub-job. It is used along with other FTP
components.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not
provided. You can easily find out and add such JARs in the Integration perspective of your
studio. For details, see the section about external modules in the Talend Installation and Upgrade
Guide.
Related scenarios
For a related scenario, see section Scenario: Putting files on a remote FTP server.
For a related scenario, see section Scenario: Iterating on a remote directory.
For a related scenario using a different protocol, see section Scenario: Getting files from a remote SCP server.
1783
tFTPDelete
tFTPDelete
tFTPDelete properties
Component family
Internet/FTP
Function
Purpose
Basic settings
Property type
Host
FTP IP address
Port
Remote directory
SFTPSupport/
Authentication method
Select this check box and then in the Authentication method list,
select the SFTP authentication method:
Password: Type in the password required in the relevant field.
Public key: Type in the private key or click the three dot button
next to the Private key field to browse to it.
If you select Public Key as the SFTP authentication
method, make sure that the key is added to the agent or
that no passphrase (secret phrase) is required.
Use Perl5 Regex Expression as Filemask
Select this check box if you want to use Perl5 regular expressions in the Files field as file filters.
For information about Perl5 regular expression syntax, see Perl5 Regular Expression Syntax.
Files
Usage
This component is typically used as a single-component sub-job but can also be used as an output
or end object.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not
provided. You can easily find out and add such JARs in the Integration perspective of your
studio. For details, see the section about external modules in the Talend Installation and Upgrade
Guide.
Related scenario
For tFTPDelete related scenario, see section Scenario: Putting files on a remote FTP server.
For tFTPDelete related scenario using a different protocol, see section Scenario: Getting files from a remote SCP
server.
1784
tFTPFileExist
tFTPFileExist
tFTPFileExist properties
Component family
Internet/FTP
Function
Purpose
Basic settings
Property type
Use an existing connection / Component List
Select this check box and, in the Component List, click the relevant connection component to reuse the connection details you already defined.
Host
FTP IP address.
Port
File Name
SFTPSupport/
Authentication method
Select this check box and then in the Authentication method list,
select the SFTP authentication method:
Password: Type in the password required in the relevant field.
Public key: Type in the private key or click the three dot button
next to the Private key field to browse to it.
If you select Public Key as the SFTP authentication
method, make sure that the key is added to the agent or
that no passphrase (secret phrase) is required.
Connection Mode
Advanced settings
Encoding Type
Select an encoding type from the list, or select Custom and define
it manually. This field is compulsory for DB data handling.
Select this check box if you want to use a proxy. Then, set the Host,
Port, User and Password proxy fields.
Ignore Failure At Quit (FTP)
Select this check box to ignore library closing errors or FTP closing errors.
tStatCatcher Statistics
Usage
This component is typically used as a single-component sub-job but can also be used with other
components.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not
provided. You can easily find out and add such JARs in the Integration perspective of your
studio. For details, see the section about external modules in the Talend Installation and Upgrade
Guide.
Related scenario
For tFTPFileExist related scenario, see section Scenario: Putting files on a remote FTP server.
For tFTPFileExist related scenario using a different protocol, see section Scenario: Getting files from a remote
SCP server.
1786
tFTPFileList
tFTPFileList
tFTPFileList properties
Component family
Internet/FTP
Function
Objective
tFTPFileList retrieves files and/or folders based on a defined filemask pattern and iterates on each of them by connecting to a remote directory via an FTP protocol.
Basic settings
Property type
Use an existing connection / Component List
Select this check box and, in the Component List, click the relevant connection component to reuse the connection details you already defined.
Host
FTP IP address.
Port
File detail
Select this check box if you want to display the details of each of
the files or folders on the remote host. These informative details
include:
type of rights on the file/folder, name of the author, name of the group of users that have read-write rights, file size and date of last modification.
SFTPSupport/
Authentication method
Select this check box and then in the Authentication method list,
select the SFTP authentication method:
Password: Type in the password required in the relevant field.
Public key: Type in the private key or click the three dot button
next to the Private key field to browse to it.
If you select Public Key as the SFTP authentication
method, make sure that the key is added to the agent or
that no passphrase (secret phrase) is required.
Files
Click the plus button to add the lines you want to use as filters:
Filemask: enter the filename or filemask using wildcard characters (*) or regular expressions.
Connect Mode
Usage
This component is typically used as a single-component sub-job but can also be used with other
components.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not
provided. You can easily find out and add such JARs in the Integration perspective of your
studio. For details, see the section about external modules in the Talend Installation and Upgrade
Guide.
Drop the following components from the Palette to the design workspace: tFTPConnection, tFTPFileList
and tFTPGet.
2.
Double-click tFTPConnection to display its Basic settings view and define the component properties.
2.
3.
4.
In the Username and Password fields, enter your authentication information for the FTP server.
5.
In the Connect Mode list, select the FTP connection mode you want to use, Passive in this example.
1788
Double-click tFTPFileList to open its Basic settings view and define the component properties.
2.
Select the Use an existing connection check box and in the Component list, click the relevant FTP connection component, tFTPConnection_1 in this scenario. Connection information is automatically filled in.
3.
In the Remote directory field, enter the relative path of the directory that holds the files to be listed.
4.
In the Filemask field, click the plus button to add one line and then define a file mask to filter the data to be retrieved. You can use special characters if need be. In this example, we only want to retrieve delimited files (*.csv).
5.
In the Connect Mode list, select the FTP server connection mode you want to use, Active in this example.
Double-click tFTPGet to display its Basic settings view and define the component's properties.
2.
Select the Use an existing connection check box and in the Component list, click the relevant FTP connection component, tFTPConnection_1 in this scenario. Connection information is automatically filled in.
1789
3.
In the Local directory field, enter the relative path for the local output directory where you want to write the retrieved files.
4.
In the Remote directory field, enter the relative path of the remote directory that holds the file to be retrieved.
5.
In the Transfer Mode list, select the FTP transfer mode you want to use, ascii in this example.
6.
In the Overwrite file field, select the option you want to use for the transferred files.
7.
In the Files area, click the plus button to add a line in the Filemask list, then click in the added line and press Ctrl+Space to access the variable list. In the list, select the global variable ((String)globalMap.get("tFTPFileList_1_CURRENT_FILEPATH")) to process all files in the remote directory.
8.
In the Connect Mode list, select the connection mode to the FTP server you want to use.
2.
All .csv files held in the remote directory on the FTP server are listed in the defined directory, as defined in
the filemask. Then the files are retrieved and saved in the defined local output directory.
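Outside the Studio, the same list-then-get pattern can be sketched with the Apache Commons Net library, one common way of talking FTP from Java; the host, credentials and directories below are hypothetical:

import java.io.FileOutputStream;

import org.apache.commons.net.ftp.FTPClient;
import org.apache.commons.net.ftp.FTPFile;

public class FtpListAndGetSketch {
    public static void main(String[] args) throws Exception {
        FTPClient ftp = new FTPClient();
        ftp.connect("ftp.example.com", 21); // hypothetical host and port
        ftp.login("user", "password");      // hypothetical credentials
        ftp.enterLocalPassiveMode();        // Passive connect mode

        String remoteDir = "/input";        // hypothetical remote directory
        for (FTPFile file : ftp.listFiles(remoteDir)) {
            if (file.isFile() && file.getName().endsWith(".csv")) { // filemask *.csv
                try (FileOutputStream out =
                        new FileOutputStream("C:/talend/in/" + file.getName())) { // hypothetical local directory
                    ftp.retrieveFile(remoteDir + "/" + file.getName(), out);
                }
            }
        }
        ftp.logout();
        ftp.disconnect();
    }
}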
1790
tFTPFileProperties
tFTPFileProperties
tFTPFileProperties Properties
Component family
Internet
Function
Purpose
tFTPFileProperties retrieves files and/or folders based on a defined filemask pattern and iterates on each of them by connecting to a remote directory via an FTP protocol.
Basic settings
Property type
Host
FTP IP address
Port
Username
Password
FTP password.
Remote directory
File
SFTP Support and Authentication method
Select this check box and then in the Authentication method list, select the SFTP authentication method:
Password: Type in the password required in the relevant field.
Public key: Type in the private key or click the three dot button
next to the Private key field to browse to it.
If you select Public Key as the SFTP authentication
method, ensure that the key is added to the agent or that
no passphrase (secret phrase) is required.
If you do not select the check box, choose the connection mode
you want to use:
Active: You determine the connection port to use to allow data
transfer.
Passive: the FTP server determines the connection port to use to
allow data transfer.
Advanced settings
Encoding
Select an encoding type from the list, or select Custom and define
it manually. This field is compulsory for DB data handling.
Select this check box to check the MD5 of the downloaded files.
Select this check box if you want to use a proxy. Then, set the
Host, Port, User and Password proxy fields.
1791
Ignore Failure At Quit (FTP)
Select this check box to ignore library closing errors or FTP closing errors.
tStatCatcher Statistics
Usage
Limitation
Due to license incompatibility, one or more JARs required to use this component are not
provided. You can easily find out and add such JARs in the Integration perspective of your
studio. For details, see the section about external modules in the Talend Installation and
Upgrade Guide.
Related scenario
For a related scenario, see section Scenario: Displaying the properties of a processed file
1792
tFTPGet
tFTPGet
tFTPGet properties
Component family
Internet/FTP
Function
Purpose
tFTPGet retrieves selected files from a defined remote FTP directory and copies them to a local directory.
Basic settings
Property type
Use an existing connection / Component List
Select this check box and, in the Component List, click the relevant connection component to reuse the connection details you already defined.
Host
FTP IP address.
Port
Username
Password
FTP password.
Local directory
Remote directory
Transfer mode
Overwrite file
SFTP Support
When you select this check box, the Overwrite file and Authentication method options appear.
Overwrite file: Offers three options:
Overwrite: Overwrite the existing file.
Resume: Resume downloading the file from the point of interruption.
Append: Add data to the end of the file without overwriting data.
Authentication method: Offers two means of authentication:
Public key: Enter the access path to the public key.
Password: Enter the password.
FTPS Support
Use Perl5 Regex Expression as Filemask
Select this check box if you want to use Perl5 regular expressions in the Files field as file filters.
1793
Advanced settings
Usage
Files
Die on error
This check box is selected by default. Clear the check box to skip
the row on error and complete the process for error-free rows.
tStatCatcher Statistics
Select this check box to gather the job processing metadata at a Job
level as well as at each component level.
Print message
Select this check box to display in the Console the list of files
downloaded.
This component is typically used as a single-component sub-job but can also be used as output
or end object.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not
provided. You can easily find out and add such JARs in the Integration perspective of your
studio. For details, see the section about external modules in the Talend Installation and Upgrade
Guide.
Related scenario
For an tFTPGet related scenario, see section Scenario: Putting files on a remote FTP server.
For an tFTPGet related scenario, see section Scenario: Iterating on a remote directory.
For an tFTPGet related scenario using a different protocol, see section Scenario: Getting files from a remote
SCP server.
1794
tFTPPut
tFTPPut
tFTPPut properties
Component family
Internet/FTP
Function
Purpose
tFTPPut copies selected files from a defined local directory to a destination remote FTP
directory.
Basic settings
Property type
Use an existing connection / Component List
Select this check box and, in the Component List, click the relevant connection component to reuse the connection details you already defined.
Host
FTP IP address.
Port
Username
Password
FTP password.
Local directory
Remote directory
Transfer mode
SFTPSupport/
Authentication method
Select this check box and then in the Authentication method list,
select the SFTP authentication method:
Password: Type in the password required in the relevant field.
Public key: Type in the private key or click the three dot button
next to the Private key field to browse to it.
If you select Public Key as the SFTP authentication
method, make sure that the key is added to the agent or
that no passphrase (secret phrase) is required.
Use Perl5 Regex Expression as Filemask
Select this check box if you want to use Perl5 regular expressions in the Files field as file filters.
For information about Perl5 regular expression syntax, see Perl5 Regular Expression Syntax.
Files
Click the [+] button to add a new line, then fill in the columns.
Filemask: file names or path to the files to be transferred.
New name: name to give the FTP file after the transfer.
Die on error
This check box is selected by default. Clear the check box to skip
the row on error and complete the process for error-free rows.
Advanced settings
tStatCatcher Statistics
Usage
This component is typically used as a single-component sub-job but can also be used as output
component.
1795
Limitation
Due to license incompatibility, one or more JARs required to use this component are not
provided. You can easily find out and add such JARs in the Integration perspective of your
studio. For details, see the section about external modules in the Talend Installation and Upgrade
Guide.
Drop tFTPConnection and tFTPPut from the Palette onto the design workspace. tFTPConnection allows
you to perform all operations in one transaction.
2.
1796
Double-click tFTPConnection to display its Basic settings view and define its properties.
2.
3.
4.
In the Username and Password fields, enter your login and password for the remote server.
5.
From the Connect Mode list, select the FTP connection mode you want to use, Active in this example.
In the design workspace, double-click tFTPPut to display its Basic settings view and define its properties.
2.
Select the Use an existing connection check box and then select tFTPConnection_1 from the Component
List. The connection information is automatically filled in.
3.
In the Local directory field, enter the path to the local directory containing the files, if all your files are in
the same directory. If the files are in different directories, enter the path for each file in the Filemask column
of the Files table.
4.
In the Remote directory field, enter the path to the destination directory on the remote server.
5.
From the Transfer mode list, select the transfer mode to be used.
6.
From the Overwrite file list, select an option for the transferred file.
7.
In the Files table, click the plus button twice to add two lines to the Filemask column and then fill in the filemasks of all files to be copied onto the remote directory.
2.
1797
1798
tFTPRename
tFTPRename
tFTPRename Properties
Component Family
Internet/FTP
Function
Purpose
Basic settings
Property type
Use an existing connection / Component List
Select this check box and, in the Component List, click the relevant connection component to reuse the connection details you already defined.
Host
FTP IP address.
Port
Username
Password
Remote directory
Overwrite file
SFTPSupport/
Authentication method
Select this check box and then in the Authentication method list,
select the SFTP authentication method:
Password: Type in the password required in the relevant field.
Public key: Type in the private key or click the three dot button
next to the Private key field to browse to it.
If you select Public Key as the SFTP authentication
method, make sure that the key is added to the agent or
that no passphrase (secret phrase) is required.
Files
Click the [+] button to add the lines you want to use as filters:
Filemask: enter the filename or filemask using wildcard characters (*) or regular expressions.
New name: name to give to the FTP file after the transfer.
Connection Mode
Encoding type
Select an encoding type from the list, or select Custom and define
it manually. This field is compulsory for DB data handling.
Die on error
This check box is selected by default. Clear the check box to skip
the row in error and complete the process for error-free rows.
1799
Advanced settings
Select this check box if you want to use a proxy. Then, set the Host,
Port, User and Password proxy fields.
Ignore Failure At Quit (FTP)
Select this check box to ignore library closing errors or FTP closing errors.
tStatCatcher Statistics
Usage
This component is generally used as a subjob with one component, but it can also be used as
an output or end component..
Limitation
Due to license incompatibility, one or more JARs required to use this component are not
provided. You can easily find out and add such JARs in the Integration perspective of your
studio. For details, see the section about external modules in the Talend Installation and Upgrade
Guide.
Global Variables
Drop tFTPConnection and tFTPRename from the Palette onto the design workspace.
2.
1800
2.
3.
4.
5.
6.
In the Remote directory field, enter the directory on the FTP server where the file exists.
7.
8.
9.
2.
3.
1801
As shown above, the file on the FTP server has been renamed from movies.json to action_movies.json.
Related scenario
For a related scenario, see section Scenario: Putting files on a remote FTP server .
1802
tFTPTruncate
tFTPTruncate
tFTPTruncate properties
Component family
Internet/FTP
Function
Objective
tFTPTruncate truncates the selected files in a defined directory on a remote FTP server.
Basic settings
Property type
Use an existing connection / Component List
Select this check box and, in the Component List, click the relevant connection component to reuse the connection details you already defined.
Host
FTP IP address.
Port
SFTPSupport/
Authentication method
Select this check box and then in the Authentication method list,
select the SFTP authentication method:
Password: Type in the password required in the relevant field.
Public key: Type in the private key or click the three dot button
next to the Private key field to browse to it.
If you select Public Key as the SFTP authentication
method, make sure that the key is added to the agent or
that no passphrase (secret phrase) is required.
Use Perl5 Regex Expression as Filemask
Select this check box if you want to use Perl5 regular expressions in the Files field as file filters.
For information about Perl5 regular expression syntax, see Perl5 Regular Expression Syntax.
Files
Click the plus button to add the lines you want to use as filters:
Filemask: enter the filename or filemask using wildcards (*) or
regular expressions.
Connection Mode
Advanced settings
Encoding type
Select an encoding type from the list, or select Custom and define
it manually. This field is compulsory for DB data handling.
Select this check box if you want to use a proxy. Then, set the Host,
Port, User and Password proxy fields.
Ignore Failure At Quit (FTP)
Select this check box to ignore library closing errors or FTP closing errors.
1803
tStatCatcher Statistics
Usage
This component is typically used as a single-component sub-job but can also be used with other
components.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not
provided. You can easily find out and add such JARs in the Integration perspective of your
studio. For details, see the section about external modules in the Talend Installation and Upgrade
Guide.
Related scenario
For a related scenario, see section Scenario: Putting files on a remote FTP server.
1804
tHttpRequest
tHttpRequest
tHttpRequest properties
Component family
Internet
Function
This component sends an HTTP request to the server end and gets the corresponding response
information from the server end.
Purpose
The tHttpRequest component allows you to send an HTTP request to the server and output
the response information locally.
Basic settings
Sync columns
URI
Method
Write response content to Select this check box to save the HTTP response to a local file.
file
You can either type in the file path in the input field or click the
three-dot button to browse to the file path.
Headers
Need authentication
Select this check box to fill in a user name and a password in the
corresponding fields if authentication is needed:
user: Fill in the user name for the authentication.
password: Fill in the password for the authentication.
Advanced settings
tStatCatcher Statistics
Usage
This component can be used to send HTTP requests to a server and save the response information. It can be used as a standalone component.
Limitation
N/A
1805
Scenario: Sending a HTTP request to the server and saving the response information to a local file
Connect the tHttpRequest component to the tLogRow component using a Row > Main connection.
Double-click the tHttpRequest component to open its Basic settings view and define the component properties.
Fill in the URI field with http://192.168.0.63:8081/testHttpRequest/build.xml. Note that this URI is for demonstration purposes only and is not a live address.
Select GET from the Method list.
Select the Write response content to file check box and fill in the input field on the right with the file path
by manual entry, D:/test.txt for this use case.
Select the Need authentication check box and fill in the user and password, both tomcat in this use case.
Double-click the tLogRow component to open its Basic settings view and select Table in the Mode area.
Save your Job and press F6 to execute it.
Then the response information from the server is saved and displayed.
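The request configured in this scenario is roughly equivalent to the Java sketch below; the URI, credentials and output path reuse the demonstration values above and are not live:

import java.io.FileOutputStream;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Base64;

public class HttpRequestSketch {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://192.168.0.63:8081/testHttpRequest/build.xml"); // demonstration URI, not live
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        // Basic authentication with the tomcat/tomcat credentials of the use case.
        String token = Base64.getEncoder().encodeToString("tomcat:tomcat".getBytes("UTF-8"));
        conn.setRequestProperty("Authorization", "Basic " + token);

        // Write the response content to a local file, as the check box does.
        try (InputStream in = conn.getInputStream();
             FileOutputStream out = new FileOutputStream("D:/test.txt")) {
            in.transferTo(out); // requires Java 9 or later
        }
        conn.disconnect();
    }
}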
1806
1807
tJMSInput
tJMSInput
tJMSInput properties
Component Family
Internet
Function
Purpose
Using a JMS server, tJMSInput makes it possible to have loosely coupled, reliable, and
asynchronous communication between different components in a distributed application.
Basic settings
Module List
Context Provider
Type in the context URL, for example "com.tibco.tibjms.naming.TibjmsInitialContextFactory". However, be careful: the syntax can vary according to the JMS server used.
Server URL
Message From
Timeout for Next Message (in sec)
Type in the number of seconds before passing to the next message.
Maximum Messages
Message
Expression
Processing Mode
Advanced settings
Global Variables
Properties
Click the plus button underneath the table to add lines that contain the username and password required for user authentication.
tStatCatcher Statistics
NB_LINE: Indicates the number of rows read by an input component or transferred to an output
component. This is an After variable and it returns an integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list
and choose the variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable
means it functions after the execution of a component.
1808
Related scenarios
Usage
Limitation
Related scenarios
No scenario is available for this component yet.
1809
tJMSOutput
tJMSOutput
tJMSOutput properties
Component Family
Internet
Function
Purpose
Using a JMS server, tJMSOutput makes it possible to have loosely coupled, reliable, and
asynchronous communication between different components in a distributed application.
Basic settings
Module List
Context Provider
Type in the context URL, for example "com.tibco.tibjms.naming.TibjmsInitialContextFactory". However, be careful: the syntax can vary according to the JMS server used.
Server URL
To
Processing Mode
Advanced settings
Delivery Mode
Select a delivery mode from this list to ensure the quality of data
delivery:
Not Persistent: This mode allows data loss during the data
exchange.
Persistent: This mode ensures the integrity of message delivery.
Global Variables
Properties
Click the plus button underneath the table to add lines that contain the username and password required for user authentication.
tStatCatcher Statistics
NB_LINE: Indicates the number of rows read by an input component or transferred to an output
component. This is an After variable and it returns an integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list
and choose the variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable
means it functions after the execution of a component.
Usage
1810
Limitation
2.
3.
4.
1811
2.
3.
Click the [+] button to add one column, namely messageContent, of the string type.
Click OK to validate the setup and close the editor.
4.
A pop-up box now appears, asking whether to propagate the schema.
6.
In the Module List list, select the library to be used, namely the activemq jar in this case.
7.
In the Context Provider field, enter the context URI, "org.apache.activemq.jndi.ActiveMQInitialContextFactory" in this case.
8.
9.
In the Connection Factory JNDI Name field, enter the JNDI name, "QueueConnectionFactory" in this case.
10. Select the Use Specified User Identity check box to show the User Name and Password fields, where you
can enter the authentication data.
11. In the Message type list, select Queue.
12. In the Processing Mode list, select Message Content.
13. Perform the same setup in the Basic settings view of JMSInput.
1813
2.
Press F6 to run the Job. Note that the ActiveMQ server has started at tcp://192.168.30.46:61616.
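At the API level, the send side of this Job corresponds roughly to the javax.jms sketch below against ActiveMQ; the JNDI names and broker URL mirror the values used above, while the queue name is hypothetical:

import java.util.Properties;

import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import javax.naming.Context;
import javax.naming.InitialContext;

public class JmsSendSketch {
    public static void main(String[] args) throws Exception {
        Properties env = new Properties();
        env.put(Context.INITIAL_CONTEXT_FACTORY,
                "org.apache.activemq.jndi.ActiveMQInitialContextFactory");
        env.put(Context.PROVIDER_URL, "tcp://192.168.30.46:61616"); // broker URL from the scenario
        env.put("queue.myQueue", "myQueue");                        // hypothetical queue bound in JNDI

        Context ctx = new InitialContext(env);
        ConnectionFactory factory = (ConnectionFactory) ctx.lookup("QueueConnectionFactory");
        Queue queue = (Queue) ctx.lookup("myQueue");

        Connection connection = factory.createConnection();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        MessageProducer producer = session.createProducer(queue);
        producer.send(session.createTextMessage("hello from the sketch")); // message content
        session.close();
        connection.close();
    }
}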
Related scenarios
No scenario is available for this component yet.
1814
tMicrosoftMQInput
tMicrosoftMQInput
tMicrosoftMQInput Properties
Component family
Function
This component retrieves the first message in a given Microsoft message queue (String type only).
Purpose
This component allows you to fetch messages one by one in the ID sequence of these messages
from the Microsoft message queue. Each execution retrieves only one message.
Basic settings
Property type
Host
Queue
Advanced settings
tStatCatcher Statistics
Usage
This component is generally used as a start component of a Job or Subjob. It must be linked
to an output component.
Connections
Limitation
This component supports only String type. Also, it only works with the Windows systems.
This component requires installation of its related jar files. For more information about the
installation of these missing jar files, see the section describing how to configure the Studio of
the Talend Installation and Upgrade Guide.
1815
Scenario: Writing and fetching queuing messages from Microsoft message queue
Drop the three components required for the first Job from the Palette onto the design workspace.
2.
2.
Click the plus button to add three rows into the schema table.
3.
In the Column column, type in a new name for each row to rename it. Here, we type in ID, Name and Address.
4.
In the Type column, select Integer for the ID row from the drop-down list and leave the other rows as String.
5.
In the Functions column, select random for the ID row, getFirstName for the Name row and getUsCity
for the Address row.
6.
In the Number of Rows for RowGenerator field on the right end of the toolbar, type in 12 to limit the number of rows to be generated. Then, click OK to validate this setting.
In a real-life case, you may use an input component to load the data of interest instead of the tRowGenerator component.
7.
1816
8.
In the Host field, type in the host address. In this example, it is localhost.
9.
In the Queue field, type in the queue name you want to write message in. In this example, name it
AddressQueue.
10. In the Message column (String Type) field, select Address from the drop-down list to determine the message body to be written.
2.
You can see that this queue has been created automatically and that the messages have been written.
1817
Drop tMicrosoftMQInput and tLogRow from the Palette to the design workspace.
2.
2.
In the Host field, type in the host name or address. Here, we type in localhost.
3.
In the Queue field, type in the queue name from which you want to fetch the message. In this example, it
is AddressQueue.
2.
The message body Atlanta fetched from the queue is displayed on the console.
1818
1819
tMicrosoftMQOutput
tMicrosoftMQOutput
tMicrosoftMQOutput Properties
Component family
Function
This component writes a defined column of the incoming data flow to a Microsoft message queue (String type only).
Purpose
Basic settings
Property type
Usage
Host
Queue
Type in the name of the queue in which you want to write a given message. This queue is created automatically on the fly if it does not already exist.
Message column
Connections
Limitation
Related scenario
For a related scenario, see section Scenario: Writing and fetching queuing messages from Microsoft message queue
1820
tPOP
tPOP
tPOP properties
Component family
Internet
Function
The tPOP component fetches one or more email messages from a server using the POP3 or
IMAP protocol.
Purpose
The tPOP component uses the POP or IMAP protocol to connect to a specific email server. It then fetches one or more email messages and writes the recovered information to the specified files. Parameters in the Advanced settings view allow you to filter your selection.
Basic settings
Host
Port
Output directory
Enter the path to the file in which you want to store the email messages you retrieve from the email server, or click the three-dot button next to the field to browse to the file.
Filename pattern
Define the syntax of the names of the files that will hold each of
the email messages retrieved from the email server, or press Ctrl
+Space to display the list of predefined patterns.
Select this check box if you do not want to keep the retrieved
email messages on the server.
For Gmail servers, this option does not work for the
pop3 protocol. Select the imap protocol and ensure that
the Gmail account is configured to use imap.
From the list, select the protocol to be used to retrieve the email
messages from the server. This protocol is the one used by the
email server. If you choose the imap protocol, you will be able to
select the folder from which you want to retrieve your emails.
Use SSL
Select this check box if your email server uses this protocol for
authentication and communication confidentiality.
This option is obligatory for users of Gmail.
Advanced settings
tStatCatcher Statistics
Filter
Click the plus button to add as many lines as needed to filter email
messages and retrieve only a specific selection:
Filter item: select one of the following filter types from the list:
From: email messages are filtered according to the sender email
address.
1821
Select the type of logical relation you want to use to combine the specified filters:
and: the conditions set by the filters are combined, so the search is more restrictive.
or: the conditions set by the filters are independent, so the search is broader.
Usage
This component does not handle a data flow; it can be used on its own.
Limitation
When the Use SSL check box or the imap protocol is selected, tPOP cannot work with IBM
Java 6.
In the Output directory field, enter the path to the output directory manually, or click the three-dot button
next to the field and browse to the output directory where the email messages retrieved from the email server
are to be stored.
In the Filename pattern field, define the syntax you want to use to name the output files that will hold
the messages retrieved from the email server, or press Ctrl+Space to display a list of predefined patterns.
The syntax used in this example is the following: TalendDate.getDate("yyyyMMdd-hhmmss") + "_" +
(counter_tPOP_1 + 1) + ".txt".
The output files are stored as .txt files and are named by date, time, and order of arrival.
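The pattern is an ordinary Java expression evaluated once per retrieved message. A minimal standalone sketch of the naming logic, using plain Java in place of the Talend TalendDate routine (the counter value and the resulting timestamp are illustrative only):

import java.text.SimpleDateFormat;
import java.util.Date;

public class FilenamePatternSketch {
    public static void main(String[] args) {
        // Rough equivalent of TalendDate.getDate("yyyyMMdd-hhmmss") at execution time
        String stamp = new SimpleDateFormat("yyyyMMdd-hhmmss").format(new Date());
        int counter_tPOP_1 = 0; // illustrative value of the tPOP message counter
        String fileName = stamp + "_" + (counter_tPOP_1 + 1) + ".txt";
        System.out.println(fileName); // e.g. 20130519-091530_1.txt
    }
}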
Clear the Retrieve all emails? field and in the Number of emails to retrieve field, enter the number of email
messages you want to retrieve, 10 in this example.
Select the Delete emails from server check box to delete the email messages from the email server once they
are retrieved and stored locally.
In the Choose the protocol field, select the protocol type you want to use. This depends on the protocol used
by the email server. Certain email suppliers, like Gmail, use both protocols. In this example, the protocol used
is pop3.
Save your Job and press F6 to execute it.
The tPOP component retrieves the 10 most recent messages from the specified email server.
In the locally stored tPOP directory, a .txt file is created for each retrieved message. Each file holds the email header metadata (sender's address, recipient's address, subject) in addition to the message content.
tREST
tREST
tREST properties
Component family
Internet
Function
The tREST component sends HTTP requests to a REpresentational State Transfer (REST) Web
service provider and gets responses correspondingly.
Purpose
The tREST component serves as a REST Web service client that sends HTTP requests to a
REST Web service provider and gets the responses.
Basic settings
URL
HTTP Method
From this list, select an HTTP method that describes the desired
action. The specific meanings of the HTTP methods are subject
to definitions of your Web service provider. Listed below are the
generally accepted HTTP method definitions:
- GET: retrieves data from the server end based on the given
parameters.
- POST: creates and uploads data based on the given parameters.
- PUT: updates data based on the given parameters, or if the data
does not exist, creates it.
- DELETE: removes data based on the given parameters.
HTTP Headers
HTTP Body
Advanced settings
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at the
Job level as well as at each component level.
Usage
Use this component as a REST Web service client to communicate with a REST Web service
provider. It must be linked to an output component.
Limitation
Double-click the first tREST component to open its Basic settings view.
Fill the URL field with the URL of the Web service you are going to invoke. Note that the URL provided in this use case is for demonstration purposes only and is not a live address.
From the HTTP Method list, select POST to send an HTTP request for creating a new record.
Click the plus button to add a line in the HTTP Headers table, and type in the appropriate name-value key pair,
which is subject to definition of your service provider, to indicate the media type of the payload to send to the
server end. In this use case, type in Content-Type and application/xml. For reference information about Internet
media types, visit www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.7.
Fill the HTTP Body field with the payload to be uploaded to the server end. In this use case, type in
<Customer><name>Steven</name></Customer> to create a record for a new customer named Steven.
If you want to include double quotation marks in your payload, be sure to use a backslash escape character before each of the
quotation marks. In this use case, for example, type in <Customer><name>\"Steven\"</name></Customer> if you want
to enclose the name Steven in a pair of double quotation marks.
Double-click the second tREST component to open its Basic settings view.
Fill the URL field with the same URL.
From the HTTP Method list, select GET to send an HTTP request for retrieving the existing records.
In the Basic settings view of each tLogRow, select the Print component unique name in front of each output
row and Print schema column name in front of each value check boxes for better identification of the output
flows.
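To summarize what this Job sends over the wire, the two tREST components amount to one POST that creates the record and one GET that reads the records back. The following standalone Java sketch reproduces the same two calls for reference only; it is not how tREST is implemented, and the URL is a hypothetical placeholder because the address used in this use case is not live:

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class RestCallSketch {
    public static void main(String[] args) throws Exception {
        String url = "http://rest.example.com/customers"; // hypothetical address

        // POST: create the new customer record, as the first tREST component does
        HttpURLConnection post = (HttpURLConnection) new URL(url).openConnection();
        post.setRequestMethod("POST");
        post.setRequestProperty("Content-Type", "application/xml"); // the HTTP Headers line
        post.setDoOutput(true);
        try (OutputStream out = post.getOutputStream()) {
            out.write("<Customer><name>Steven</name></Customer>".getBytes("UTF-8")); // the HTTP Body
        }
        System.out.println("POST status: " + post.getResponseCode());

        // GET: retrieve the existing records, as the second tREST component does
        HttpURLConnection get = (HttpURLConnection) new URL(url).openConnection();
        get.setRequestMethod("GET");
        System.out.println("GET status: " + get.getResponseCode());
    }
}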
tRSSInput
tRSSInput
tRSSInput Properties
Component family
Internet
Function
Purpose
tRSSInput makes it possible to keep track of blog entries on websites to gather and organize
information for quick and easy access.
Basic settings
Usage
RSS URL
Die on error
This check box is selected by default. Clear the check box to skip
the row on error and complete the process for error-free rows.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not
provided. You can easily find out and add such JARs in the Integration perspective of your
studio. For details, see the section about external modules in the Talend Installation and
Upgrade Guide.
Drop the following components from the Palette onto the design workspace: tRSSInput and tLogRow.
2.
3.
In the design workspace, select tRSSInput, and click the Component tab to define the basic settings for
tRSSInput.
4.
Enter the URL for the RSS_Feed to access. In this scenario, tRSSInput links to the Talend RSS_Feed: http://
feeds.feedburner.com/Talend.
5.
Select/clear the other check boxes as required. In this scenario, we want to display the information about two
articles dated from July 20, 2008.
6.
In the design workspace, select tLogRow and click the Component tab to define its basic settings. For more
information about tLogRow properties, see section tLogRow properties.
7.
The tRSSInput component accessed the RSS feed of the Talend website on your behalf and organized the information for you.
Two blog entries are displayed on the console. Each entry has its own title, description, publication date, and the corresponding RSS feed URL address. Blogs show the latest entry first, and you can scroll down to read earlier entries.
tRSSOutput
tRSSOutput
tRSSOutput Properties
Component family
Internet
Function
Purpose
tRSSOutput makes it possible to create XML files that hold RSS or Atom feeds.
Basic settings
File name
Name or path to the output XML file. Related topic: see Talend
Studio User Guide.
Encoding
Select an encoding type from the list, or select Custom and define
it manually. This field is compulsory for DB data handling.
Append
Select this check box to add the new rows to the end of the file.
Mode
Channel
Optional Channel Elements
Click the [+] button below the table to add new lines and enter the information relative to the RSS flow metadata:
Element Name: name of the metadata.
Element Value: content of the metadata.
Advanced settings
tStatCatcher Statistics
Usage
Limitation
n/a
Drop the following components from the Palette onto the design workspace: tMysqlInput, tRSSOutput,
and tFTPPut.
2.
Right-click tMysqlInput and connect it to tRSSOutput using a Row > Main link.
3.
Right-click tMysqlInput and connect it to tFTPPut using a Trigger > OnSubjobOk link.
In the design workspace, select tMysqlInput, and click the Component tab to define the basic settings for
tMysqlInput.
For Talend Open Studio for Big Data, the Property type, Schema and Query Type of components are always Built-in. For further information about how to edit a Built-in schema, see Talend Studio User Guide.
2.
3.
In the Table Name field, either type in your table name or click the three-dot button [...] and select your table name from the list. In this scenario, the MySQL input table is called rss_talend and the schema is made up of four columns: TITLE, Description, PUBDATE, and LINK.
4.
In the Query field, enter your DB query paying particular attention to properly sequence the fields in order
to match the schema definition, or click Guess Query.
In the design workspace, select tRSSOutput, and click the Component view to define the basic settings
for tRSSOutput.
2.
In the File name field, use the default file name and path, or browse to set your own for the output XML file.
3.
4.
5.
In the Channel panel, enter a title, a description, a publication date, and a link to define your input data as
a whole.
6.
7.
The tRSSOutput component created an output RSS flow in an XML format for the defined files.
In the design workspace, select tFTPPut, and click the Component tab to define the basic settings for tFTPPut.
2.
Enter the host name and the port number in their corresponding fields.
3.
Enter your connection details in the corresponding Username and Password fields.
4.
Browse to the local directory, or enter it manually in the Local directory field.
5.
6.
7.
On the Files panel, click on the plus button to add new lines and fill in the filemasks of all files to be copied
onto the remote directory. In this scenario, the files to be saved on the FTP server are all text files.
8.
Drop tRSSInput and tRSSOutput from the Palette to the design workspace.
2.
Connect the two components together using a Row > Main link.
Double-click tRSSInput to open its Basic settings view and define the component properties.
2.
Enter the URL for the RSS_Feed to access. In this scenario, tRSSInput links to the Talend RSS_Feed: http://
feeds.feedburner.com/Talend.
3.
In the design workspace, double-click tRSSOutput to display its Basic settings view and define the
component properties.
4.
In the File name field, use the default file name and path, or browse to set your own for the output XML file.
5.
6.
7.
In the Channel panel, enter a title, a description, a publication date and a link to define your input data as
a whole.
8.
In the Optional Channel Element, define the RSS flow metadata. In this example, the flow has two metadata elements: copyright, whose value is tos, and language, whose value is en_us.
9.
2.
The defined data are written to the output XML file and the metadata are displayed under the <channel> node, above the information about the RSS flow.
Drop the following components from the Palette onto the design workspace: tFixedFlowInput of the Misc component group and tRSSOutput of the Internet component group.
2.
3.
When asked whether you want to pass on the schema of tRSSOutput to tFixedFlowInput, click Yes.
In the design workspace, double-click tFixedFlowInput to display its corresponding Component view and
define its basic settings.
2.
In the Number of rows field, leave the default setting to 1 to only generate one line of data.
3.
In the Mode area, leave the Use Single Table option selected and fill in the Values table. Note that the Column field of the Values table is automatically filled with the columns of the schema defined in the component.
4.
In the Value field of the Values table, type in the data you want to be sent to the following component.
5.
In the design workspace, double-click tRSSOutput to display its corresponding Component view and define
its basic settings.
6.
Click the [...] button next to the File Name field to set the output XML file directory and name.
7.
In the Mode area, select ATOM to generate an ATOM feed XML file.
As the ATOM feed format is strict, some default information is required to create the XML file. Therefore, the schema of tRSSOutput contains default columns that hold this information. Those default columns are greyed out to indicate that they must not be modified. If you choose to modify the schema of the component, the ATOM XML file created will not be valid.
8.
In the Feed area, enter a title, link, id, update date, and author name to define your input data as a whole.
2.
tSCPClose
tSCPClose
tSCPClose Properties
Component family
Internet/SCP
Function
Purpose
Basic settings
Component list
Advanced settings
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at a Job
level as well as at each component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
choose your SCP connection dynamically from multiple connections planned in your Job.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
Limitation
n/a
Related scenario
This component is closely related to tSCPConnection and tSCPRollback. It is generally used with tSCPConnection as it allows you to close a connection for the transaction in progress.
For a related scenario see section tMysqlConnection.
tSCPConnection
tSCPConnection
tSCPConnection properties
Component family
Internet/SCP
Function
Purpose
tSCPConnection allows you to open an SCP connection to transfer files in one transaction.
Basic settings
Host
Port
Username
Authentication method
Password
Usage
This component is typically used as a single-component sub-job. It is used along with other SCP
components.
Limitation
n/a
Related scenarios
For a related scenario, see section Scenario: Putting files on a remote FTP server.
For a related scenario using a different protocol, see section Scenario: Getting files from a remote SCP server.
tSCPDelete
tSCPDelete
tSCPDelete properties
Component family
Internet/SCP
Function
This component deletes files from remote hosts over a fully encrypted channel.
Purpose
tSCPDelete allows you to remove a file from the defined SCP server.
Basic settings
Host
SCP IP address.
Port
Username
Authentication method
Password
SCP password.
Filelist
Usage
This component is typically used as a single-component sub-job but can also be used with other
components.
Limitation
n/a
Related scenario
For tSCPDelete related scenario, see section Scenario: Getting files from a remote SCP server.
For tSCPDelete related scenario using a different protocol, see section Scenario: Putting files on a remote FTP
server.
tSCPFileExists
tSCPFileExists
tSCPFileExists properties
Component family
Internet/SCP
Function
This component checks, over a fully encrypted channel, if a file exists on a remote host.
Purpose
tSCPFileExists allows you to verify the existence of a file on the defined SCP server.
Basic settings
Host
SCP IP address.
Port
Username
Authentication method
Password
SCP password.
Remote directory
Filename
Usage
This component is typically used as a single-component sub-job but can also be used with other
components.
Limitation
n/a
Related scenario
For tSCPFileExists related scenario, see section Scenario: Getting files from a remote SCP server.
For tSCPFileExists related scenario using a different protocol, see section Scenario: Putting files on a remote
FTP server.
tSCPFileList
tSCPFileList
tSCPFileList properties
Component family
Internet/SCP
Function
This component iterates, over a fully encrypted channel, on files of a given directory on a remote
host.
Purpose
tSCPFileList allows you to list files from the defined SCP server.
Basic settings
Host
SCP IP address.
Port
Username
Authentication method
Password
SCP password.
Command separator
Filelist
Usage
This component is typically used as a single-component sub-job but can also be used with other
components.
Limitation
n/a
Related scenario
For tSCPFileList related scenario, see section Scenario: Getting files from a remote SCP server.
For tSCPFileList related scenario using a different protocol, see section Scenario: Putting files on a remote FTP
server.
tSCPGet
tSCPGet
tSCPGet properties
Component family
Internet/SCP
Function
This component transfers defined files via an SCP connection over a fully encrypted channel.
Purpose
tSCPGet allows you to copy files from the defined SCP server.
Basic settings
Host
SCP IP address.
Port
Username
Authentication method
Password
SCP password.
Local directory
Overwrite or Append
Filelist
Usage
This component is typically used as a single-component sub-job but can also be used with other
components.
Limitation
n/a
Fill in the Host IP address, the listening Port number, and the user name in the corresponding fields.
On the Authentication method list, select the appropriate authentication method.
Note that the field that follows changes according to the selected authentication method. The authentication method used in this scenario is password.
Fill in the local directory details where you want to copy the fetched file.
On the Overwrite or Append list, select the action to be carried out.
In the Filelist area, click the plus button to add a line in the Source list and fill in the path to the given file
on the remote SCP server.
In this scenario, the file to copy from the remote SCP server to the local disk is backport.
Save the Job and press F6 to execute it.
The given file on the remote server is copied to the local disk.
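For orientation only, the same kind of transfer can be scripted directly with a generic SSH library such as JSch; the sketch below uses JSch's SFTP channel rather than the SCP protocol proper, so it is not what tSCPGet runs internally, and the host, credentials and paths are placeholders:

import com.jcraft.jsch.ChannelSftp;
import com.jcraft.jsch.JSch;
import com.jcraft.jsch.Session;

public class SshGetSketch {
    public static void main(String[] args) throws Exception {
        JSch jsch = new JSch();
        Session session = jsch.getSession("user", "ssh.example.com", 22); // placeholder host and user
        session.setPassword("password");                                  // password authentication, as in the scenario
        session.setConfig("StrictHostKeyChecking", "no");                 // for the sketch only
        session.connect();
        ChannelSftp channel = (ChannelSftp) session.openChannel("sftp");
        channel.connect();
        channel.get("/remote/path/backport", "/local/dir/backport");      // remote file, local target
        channel.disconnect();
        session.disconnect();
    }
}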
tSCPPut
tSCPPut
tSCPPut properties
Component family
Internet/SCP
Function
This component copies defined files to a remote SCP server over a fully encrypted channel.
Purpose
Basic settings
Host
SCP IP address.
Port
Username
Authentication method
Password
SCP password.
Remote directory
Filelist
Usage
This component is typically used as a single-component sub-job but can also be used with other
components.
Limitation
n/a
Related scenario
For tSCPPut related scenario, see section Scenario: Getting files from a remote SCP server.
For a tSCPPut related scenario using a different protocol, see section Scenario: Putting files on a remote FTP server.
tSCPRename
tSCPRename
tSCPRename properties
Component family
Internet/SCP
Function
Purpose
Basic settings
Host
SCP IP address.
Port
Username
Authentication method
Password
SCP password.
File to rename
Rename to
Usage
This component is typically used as a single-component sub-job but can also be used with other
components.
Limitation
n/a
Related scenario
For tSCPRename related scenario, see section Scenario: Getting files from a remote SCP server.
tSCPTruncate
tSCPTruncate
tSCPTruncate properties
Component family
Internet/SCP
Function
This component removes all the data from a file via an SCP connection.
Purpose
tSCPTruncate allows you to remove data from file(s) on the defined SCP server.
Basic settings
Host
SCP IP address.
Port
Username
Authentication method
Password
SCP password.
Remote directory
Filelist
Usage
This component is typically used as a single-component sub-job but can also be used with other
components.
Limitation
n/a
Related scenario
For tSCPTruncate related scenario, see section Scenario: Getting files from a remote SCP server.
tSendMail
tSendMail
tSendMail Properties
Component family
Internet
Function
Purpose
The purpose of tSendMail is to notify recipients about a particular state of a Job or about possible errors.
Basic settings
To
From
Select this check box if you want the sender name to show in the
messages.
Cc
Bcc
Subject
Message
Die if the attachment file doesn't exist
This check box is selected by default. Clear this check box if you want the message to be sent even if there are no attachments.
Attachments / File and Content Transfer Encoding
Click the plus button to add as many lines as needed where you can put a filemask or the path to the file to be sent along with the mail, if any. Two options are available for content transfer encoding: Default and Base64.
Other Headers
Click the plus button to add as many lines as needed where you
can type the Key and the corresponding Value of any header
information that does not belong to the standard header.
SSL Support
Select this check box to authenticate the server at the client side
via an SSL protocol.
STARTTLS Support
Select this check box to authenticate the server at the client side
via a STARTTLS protocol.
Importance
Need authentication / Username and Password
Select this check box and enter a username and a password in the corresponding fields if this is necessary to access the service.
Die on error
Advanced settings
This check box is selected by default. Clear the check box to skip
the row on error and complete the process for error-free rows.
MIME subtype from the text MIME type
Select in the list the structural form for the text of the message.
Encoding type
Select the encoding from the list or select Custom and define it
manually.
tStatCatcher Statistics
Usage
This component is typically used as a one-component sub-job but can also be used as an output or end object. It can be connected to other components with either Row or Iterate links.
Limitation
n/a
Drop the following components from your Palette to the design workspace: tFileInputDelimited,
tFileOutputXML, tSendMail.
Define tFileInputDelimited properties. Related topic: section tFileInputDelimited.
Right-click on the tFileInputDelimited component and select Row > Main. Then drag it onto the
tFileOutputXML component and release when the plug symbol shows up.
Define tFileOutputXML properties.
Drag a Run on Error link from tFileInputDelimited to the tSendMail component.
Define the tSendMail component properties:
Enter the recipient and sender email addresses, as well as the email subject.
Enter a message containing the error code produced using the corresponding global variable. Access the list of
variables by pressing Ctrl+Space.
Add attachments and extra header information if any. Type in the SMTP information.
In this scenario, the file containing the data to be transferred to the XML output cannot be found. tSendMail runs on this error and sends a notification email to the defined recipient.
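The Message field accepts an ordinary Java string expression, so fixed text can be concatenated with globalMap entries. A hedged example is shown below; the globalMap key is only illustrative, so pick the actual error variable of the failing component from the Ctrl+Space list:

// Example content of the Message field (the globalMap key is a hypothetical example)
"The Job failed while producing the XML output.\n"
  + "Error: " + globalMap.get("tFileInputDelimited_1_ERROR_MESSAGE")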
tSetKerberosConfiguration
tSetKerberosConfiguration
tSetKerberosConfiguration properties
Component family
Internet
Function
Purpose
Basic settings
KDC Server
Realm
Advanced settings
tStatCatcher Statistics
Usage
This component is typically used as a sub-job by itself and is used along with tSoap.
Limitation
Related scenarios
No scenario is available for this component.
tSetKeystore
tSetKeystore
tSetKeystore properties
Component family
Internet
Function
Purpose
This component allows you to set the authentication data type between PKCS 12 and JKS.
Basic settings
TrustStore type
TrustStore file
TrustStore password
Need Client authentication
Select this check box to validate the keystore data. Once you do so, you need to complete three fields:
- KeyStore type: select the type of the keystore to be used. It may
be PKCS 12 or JKS.
- KeyStore file: type in the path, or browse to the file (including
filename) containing the keystore data.
- KeyStore password: type in the password for this keystore.
Advanced settings
tStatCatcher Statistics
Usage
Connections
Limitation
n/a.
<wsdl:port name="CustomerServiceHttpSoap11Endpoint"
binding="ns:CustomerServiceSoap11Binding">
<soap:address location="https://192.168.0.22:8443/axis2/services/
CustomerService.CustomerServiceHttpSoap11Endpoint/"/>
</wsdl:port>
So we need keystore files to connect to this WSDL file. To replicate this Job, proceed as follows:
Drop the following components from the Palette onto the design workspace: tSetKeystore, tWebService, and
tLogRow.
In the TrustStore type field, select PKCS12 from the drop-down list.
In the TrustStore file field, browse to the corresponding truststore file. Here, it is server.p12.
In the TrustStore password field, type in the password for this truststore file. In this example, it is password.
Select the Need Client authentication check box to activate the keystore configuration fields.
In the KeyStore type field, select JKS from the drop-down list.
In the KeyStore file field, browse to the corresponding keystore file. Here, it is server.keystore.
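For context, the truststore/keystore pair configured here plays the same role as the standard JSSE system properties shown in the sketch below. This is only to clarify what the settings mean, not the component's internal implementation; the keystore password is a placeholder because it is not given in this scenario:

public class KeystoreSettingsSketch {
    public static void main(String[] args) {
        // Truststore: the certificates the client trusts (PKCS 12 file from this scenario)
        System.setProperty("javax.net.ssl.trustStore", "server.p12");
        System.setProperty("javax.net.ssl.trustStoreType", "PKCS12");
        System.setProperty("javax.net.ssl.trustStorePassword", "password");
        // Keystore: the client's own key material used for client authentication (JKS file from this scenario)
        System.setProperty("javax.net.ssl.keyStore", "server.keystore");
        System.setProperty("javax.net.ssl.keyStoreType", "JKS");
        System.setProperty("javax.net.ssl.keyStorePassword", "keystore-password"); // placeholder
        // Any HTTPS call made afterwards, such as fetching the private WSDL,
        // uses these stores for server validation and client authentication.
    }
}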
Double-click tWebService to open the component editor, or select the component in the design workspace and
in the Basic settings view, click the three-dot button next to Service configuration.
In the WSDL field, browse to the private WSDL file to be used. In this example, it is CustomerService.wsdl.
Click the refresh button next to the WSDL field to retrieve the WSDL description and display it in the fields
that follow.
In the Port Name list, select the port you want to use, CustomerServiceHttpSoap11Endpoint in this example.
In the Operation list, select the service you want to use. In this example the selected service is
getCustomer(parameters):Customer.
Click Next to open a new view in the editor.
In the panel to the right of the Input mapping view, the input parameter of the service displays automatically.
However, you can add other parameters if you select [+] parameters and then click the plus button on top to
display the [Parameter Tree] dialog box where you can select any of the listed parameters.
The Web service in this example has only one input parameter, ID.
In the Expression column of the parameters.ID row, type in the customer ID of your interest between quotation
marks. In this example, it is A00001.
Click Next to open a new view in the editor.
In the Element list to the left of the view, the output parameter of the web service displays automatically. However,
you can add other parameters if you select [+] parameters and then click the plus button on top to display the
[Parameter Tree] dialog box where you can select any of the parameters listed.
The Web service in this example has four output parameters: return.address, return.email, return.name and return.phone.
You now need to create a connection between the output parameter of the defined Web service and the schema
of the output component. To do so:
In the panel to the right of the view, click the three-dot button next to Edit Schema to open a dialog box in
which you can define the output schema.
In the schema editing dialog box, click the plus button to add four columns to the output schema.
Click in each column and type in the new names, Name, Phone, Email and Address in this example. This will
retrieve the customer information of your interest.
Click OK to validate your changes and to close the schema editing dialog box.
In the Element list to the right of the editor, drag each parameter to the field that corresponds to the column
you have defined in the schema editing dialog box.
If available, use the Auto map! button, located at the bottom left of the interface, to carry out the mapping operation
automatically.
tSetProxy
tSetProxy
tSetProxy properties
Component family
Internet
Function
Purpose
tSetProxy allows you to enter the relevant information for proxy setup.
Basic settings
Proxy type
Proxy host
Proxy port
Proxy user
Proxy password
Advanced settings
tStatCatcher Statistics
Usage
Typically used as a sub-job by itself, tSetProxy is deployed along with other Internet
components.
Limitation
n/a
Related scenarios
No scenario is available for this component.
tSocketInput
tSocketInput
tSocketInput properties
Component family
Internet
Function
The tSocketInput component opens a socket port and listens for incoming data.
Purpose
tSocketInput is a listening component that allows data to be passed via a defined port.
Host name
Port
Timeout
Uncompress
Die on error
This check box is selected by default. Clear the check box to skip
the row on error and complete the process for error-free rows. If
needed, you can retrieve the rows on error via a Row > Rejects
link.
Field separator
Row separator
Escape Char
Text enclosure
Schema type and Edit Schema
Encoding type
Select the encoding from the list or select Custom and define it
manually. This field is compulsory for DB data handling.
tStatCatcher Statistics
Usage
This component opens a point of access to a workstation or server. This component starts a Job and stops only when the timeout is reached.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not
provided. You can easily find out and add such JARs in the Integration perspective of your
studio. For details, see the section about external modules in the Talend Installation and Upgrade
Guide.
For the first Job, drop a tSocketInput component and a tLogRow component from the Palette to the design
workspace, and link them using a Row > Main connection.
2.
For the second Job, drop a tFileInputDelimited component and a tSocketOutput component from the
Palette to the design workspace, and link them using a Row > Main connection.
On the second Job, select the tFileInputDelimited and on the Basic Settings tab of the Component view,
set the access parameters to the input file.
For Talend Open Studio for Big Data, the Property type, Schema and Query Type of components are always Built-in. For further information about how to edit a Built-in schema, see Talend Studio User Guide.
2.
In File Name, browse to the file, and fill the Row, Field separators, and Header fields according to the
input file used.
3.
Select the tSocketOutput component and set the parameters on the Basic Settings tab of the Component
view.
5.
Define the Host IP address and the Port number where the data will be passed on to.
6.
Set the number of retries in the Retry field and the amount of time (in seconds) after which the Job will
time out.
7.
Now on the other Job (SocketInput) design, define the parameters of the tSocketInput component.
8.
Define the Host IP address and the listening Port number where the data are passed on to.
9.
Set the amount of time (in seconds) after which the Job will time out.
10. Edit the schema and set it to reflect the whole or part of the other Job's schema.
Press F6 to execute this Job (SocketInput) first, in order to open the listening port and prepare it to receive
the passed data.
2.
Before the time-out, launch the other Job (SocketOutput) to pass on the data.
The result displays on the Run view, along with the opening socket information.
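To make the listen-then-send mechanism more concrete, here is a minimal standalone Java sketch of the same pattern: one side opens a server socket and waits with a timeout, the other side connects before the timeout expires and writes delimited rows. It only illustrates the principle, not the components' implementation; port and data values are placeholders:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;

public class SocketPassSketch {

    // "Input" side: listens like tSocketInput, with a timeout.
    static void listen(int port) throws Exception {
        try (ServerSocket server = new ServerSocket(port)) {
            server.setSoTimeout(30000); // give up after 30 s, like the Timeout field
            try (Socket client = server.accept();
                 BufferedReader in = new BufferedReader(new InputStreamReader(client.getInputStream()))) {
                String row;
                while ((row = in.readLine()) != null) { // one delimited row per line
                    System.out.println(row);            // tLogRow equivalent
                }
            }
        }
    }

    // "Output" side: connects like tSocketOutput and pushes the rows.
    static void send(String host, int port) throws Exception {
        try (Socket socket = new Socket(host, port);
             PrintWriter out = new PrintWriter(socket.getOutputStream(), true)) {
            out.println("1;John;Doe"); // placeholder delimited rows
            out.println("2;Jane;Doe");
        }
    }

    public static void main(String[] args) throws Exception {
        Thread listener = new Thread(() -> {
            try {
                listen(4444);
            } catch (Exception e) {
                e.printStackTrace();
            }
        });
        listener.start();
        Thread.sleep(500);        // let the listening port open first, as in the scenario
        send("localhost", 4444);
        listener.join();
    }
}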
tSocketOutput
tSocketOutput
tSocketOutput properties
Component family
Internet
Function
Purpose
tSocketOutput sends out the data from the incoming flow to a listening socket port.
Basic settings
Host name
Port
Compress
Retry times
Timeout
Die on error
Clear this check box to skip the row on error and complete the
process for error-free rows.
Field separator
Row separator
Escape Char
Text enclosure
Encoding type
Usage
Select the encoding from the list or select Custom and define it
manually. This field is compulsory for DB data handling.
This component opens a point of access to a workstation or server. This component starts a Job and stops only when the timeout is reached.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not
provided. You can easily find out and add such JARs in the Integration perspective of your
studio. For details, see the section about external modules in the Talend Installation and Upgrade
Guide.
Related Scenario
For use cases in relation with tSocketOutput, see section Scenario: Passing on data to the listening port.
tSOAP
tSOAP
tSOAP properties
Component family
Internet
Function
tSOAP sends the defined SOAP message with the given parameters to the invoked Web service
and returns the value as defined, based on the given parameters.
Purpose
This component calls a method via a Web service in order to retrieve the values of the parameters
defined in the component editor.
Basic settings
Use NTLM
Select this check box if you want to use the NTLM authentication
protocol.
Domain: Name of the client domain.
Need authentication
Select this check box and enter a user name and a password in the
corresponding fields if this is necessary to access the service.
Select this check box if you are using a proxy server and fill in the
necessary information.
Select this check box to validate the server certificate to the client
via an SSL protocol and fill in the corresponding fields:
TrustStore file: enter the path (including filename) to the
certificate TrustStore file that contains the list of certificates that
the client trusts.
TrustStore password: enter the password used to check the
integrity of the TrustStore data.
ENDPOINT
SOAP Action
SOAP version
SOAP message
Advanced settings
Use Kerberos
tStatCatcher Statistics
Usage
Connections
Limitation
N/A
Drop the following components from the Palette onto the design workspace: tSOAP and tLogRow.
2.
3.
Double-click tSOAP to open its Basic settings view and define the component properties.
4.
In the ENDPOINT field, type in or copy-paste the URL address of the Web service to be used between the quotation marks: http://localhost:8200/airport.service.
5.
In the SOAP Action field, type in or copy-paste the URL address of the SOAPAction HTTP header
field that indicates that you want to retrieve the airport information: http://airportsoap.sopera.de/
getAirportInformationByISOCountryCode.
You can see this address by looking at the WSDL for the Web service you are calling. For the Web service of this
example, in a web browser, append ?wsdl on the end of the URL of the Web service used in the ENDPOINT field,
open the corresponding web page, and then see the SOAPAction defined under the operation node:
<wsdl:operation name="getAirportInformationByISOCountryCode">
<soap:operation soapAction="http://airportsoap.sopera.de/
getAirportInformationByISOCountryCode" style="document"/>
6.
From the SOAP Version list, select the version of the SOAP system being used. In this scenario, the version
is SOAP 1.1.
7.
In the SOAP message field, enter the XML-format message used to retrieve the airport information from
the invoked Web service. In this example, the airport information of China (whose country code is CN) is
needed, so the message is:
"<soapenv:Envelope xmlns:soapenv=\"http://schemas.xmlsoap.org/soap/envelope/\"
xmlns:web=\"http://airportsoap.sopera.de\">
<soapenv:Header/>
<soapenv:Body>
<web:getAirportInformationByISOCountryCode>
<web:CountryAbbrviation>CN</web:CountryAbbrviation>
</web:getAirportInformationByISOCountryCode>
</soapenv:Body>
</soapenv:Envelope>"
8.
Scenario 2: Using a SOAP message from an XML file to get airport information and saving the information to an XML file
Drop the following components from the Palette onto the design workspace: tFileInputXML, tSOAP, and
tFileOutputXML.
2.
2.
Click the [...] button next to Edit schema to open the [Schema] dialog box.
3.
Click the [+] button to add a column, give it a name, getAirport in this example, and select Document from
the Type list. Then, click OK to close the dialog box.
4.
In the File name/Stream field, enter the path to the input XML file that contains the SOAP message to be
used, or browse to the path by clicking the [...] button.
The input file contains the following SOAP message:
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:web="http://airportsoap.sopera.de">
<soapenv:Header/>
<soapenv:Body>
<web:getAirportInformationByISOCountryCode>
<web:CountryAbbrviation>CN</web:CountryAbbrviation>
</web:getAirportInformationByISOCountryCode>
</soapenv:Body>
</soapenv:Envelope>
5.
In the Loop XPath query field, enter / to define the root as the loop node of the input file structure.
6.
In the Mapping table, fill the XPath query column with . to extract all data from the context node of the source, and select the Get Nodes check box to build a Document type data flow.
2.
In the ENDPOINT field, enter or copy-paste the URL address of the Web service to be used between the quotation marks: http://localhost:8200/airport.service.
3.
In the SOAP Action field, enter or copy-paste the URL address of the SOAPAction HTTP header
field that indicates that you want to retrieve the airport information: http://airportsoap.sopera.de/
getAirportInformationByISOCountryCode.
4.
Select the Use a message from the schema check box, and select a Document type column from the SOAP
Message list to read the SOAP message from the input file to send to the Web service. In this example, the
input schema has only one column, getAirport.
5.
Select the Output in Document check box to output the response message in XML format.
2.
In the File Name field, enter the path to the output XML file.
3.
Select the Incoming record is a document check box to retrieve the incoming data flow as an XML document. Note that a Column list appears, allowing you to choose a column to retrieve data from. In this example, the schema contains only one column.
2.
Press F6, or click Run on the Run tab to execute the Job.
The airport information of China is returned and the information is saved in the defined XML file.
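For orientation only, the HTTP exchange behind this Job can be sketched in standalone Java as a POST of the SOAP envelope with the SOAPAction header set. This is not tSOAP's internal implementation; the input file name is hypothetical, while the endpoint and SOAPAction values are the ones configured above:

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Paths;

public class SoapCallSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical file holding the SOAP envelope shown earlier
        byte[] envelope = Files.readAllBytes(Paths.get("input_soap_message.xml"));

        HttpURLConnection conn =
                (HttpURLConnection) new URL("http://localhost:8200/airport.service").openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "text/xml; charset=UTF-8"); // SOAP 1.1
        conn.setRequestProperty("SOAPAction",
                "http://airportsoap.sopera.de/getAirportInformationByISOCountryCode");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(envelope);
        }
        System.out.println("HTTP status: " + conn.getResponseCode());
        // The response body is the SOAP envelope that tSOAP exposes as a Document-type flow.
    }
}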
tWebService
tWebService
tWebService properties
Component family
Internet
Function
tWebService calls the defined method from the invoked Web service and returns the class as defined, based on the given parameters.
Purpose
This component calls a method via a Web service in order to retrieve the values of the
parameters defined in the component editor.
Basic settings
Property type
Service configuration
Click the three-dot button next to the field to open the component
editor.
In this editor, you can:
-select the Web service you want to use,
-configure the input parameters of the Web service,
-configure the output parameters of the Web service. These
parameters will be used to retrieve and output specific data.
Auto: By default, the links between the input and output schemas
and the Web service parameters are in the form of curves.
Curves: Links between the schema and the Web service parameters are in the form of curves.
Lines: Links between the schema and the Web service parameters
are in the form of straight lines. This option slightly optimizes
performance.
Input schema
Edit Schema
Click the [...] button to make changes to the schema. Note that if
you make changes, the schema automatically becomes built-in.
Sync columns
This button is available when an input link has been created. Click
this button to retrieve the schema from the previous component
connected in the Job.
Output schema
Use NTLM
Select this check box if you want to use the NTLM authentication
protocol.
Domain: Name of the client domain,
Host: Client IP address.
Need authentication
Select this check box and enter a username and a password in the
corresponding fields if this is necessary to access the service.
Select this check box if you are using a proxy server and fill in
the necessary information.
Select this check box to validate the server certificate to the client
via an SSL protocol and fill in the corresponding fields:
TrustStore file: enter the path (including filename) to the
certificate TrustStore file that contains the list of certificates that
the client trusts.
TrustStore password: enter the password used to check the
integrity of the TrustStore data.
Die on error
Clear this check box to skip the rows in error and to complete the process for the error-free rows.
Advanced settings
Temporary folder (wsdl2java)
tStatCatcher Statistics
Usage
Limitation
Linking components
1.
Drop the following components from the Palette onto the design workspace: tFixedFlowInput,
tWebService, and tLogRow.
2.
Double-click tFixedFlowInput to open its Basic settings view and define the component properties.
2.
Click the [...] button next to the Edit schema field to open a dialog box where you can define the input schema.
3.
In the open dialog box, click the [+] button to add a column to the schema.
4.
5.
Click OK to close the schema definition dialog box. The Country column displays in the Values table in the
component Basic settings view.
6.
In the Values table, click in the Value column and enter the value of the Country column, ITALY in this
example. This will retrieve the list of defenders of the Italian football team.
Double-click tWebService to open the component editor, or select the component in the design workspace
and in the Basic settings view, click the [...] button next to Service configuration.
2.
3.
In the WSDL field, enter the Web service address or browse to it, if the WSDL is locally stored, by clicking
the [Browse...] button.
4.
Click the refresh button next to the WSDL field to retrieve the WSDL description and display it in the fields that follow.
5.
In the Port Name list, select the port you want to use, FootballPoolWebServiceSoap in this example.
6.
In the Operation list, select the service you want to use. In this example the selected service is AllDefenders(parameters):ArrayOfString.
2.
In the panel to the right of the Input mapping view, the input parameter of the service displays automatically.
However, you can add other parameters if you select [+] parameters and then click the [+] button on top to
display the [Parameter Tree] dialog box where you can select any of the listed parameters.
The Web service in this example has only one input parameter, sCountryName.
If available, use the Auto map! button, located at the bottom left of the interface, to carry out the mapping operation
automatically.
You now need to create a connection between the input schema and the input parameter of the defined Web
service.
3.
In the Column list, drag the column in the input schema you want to link to the input parameter of the Web
service to the corresponding parameter in the panel to the right.
In the Element list to the left of the view, the output parameter of the web service displays automatically.
However, you can add other parameters if you select [+] parameters and then click the [+] button on top to
display the [Parameter Tree] dialog box where you can select any of the parameters listed.
The Web service in this example has only one output parameter: AllDefendersResult.string.
You now need to create a connection between the output parameter of the defined Web service and the schema
of the output component.
2.
In the panel to the right of the view, click the [+] button next to Edit Schema to open a dialog box in which
you can define the output schema.
3.
In the Output list to the right of the dialog box, click the [+] button to add a column to the output schema.
4.
Click in the column and type in a name, Name in this example. This will retrieve the names of the defenders.
5.
Click OK to validate your changes and to close the schema definition dialog box.
6.
In the Element list to the right of the editor, drag parameters.AllDefendersResult.string to the field that
corresponds to the Name column to the right of the editor.
If available, use the Auto map! button, located at the bottom left of the interface, to carry out the mapping operation
automatically.
7.
Select this row in the panel to the right and click Denormalize in order to denormalize the returned data.
Note that the Normalize or the Denormalize button is activated only when it is required.
8. Add [*] after the parameter in order to have the following code: denormalize(parameters.AllDefendersResult.string[*],:). This will retrieve all data separated by a colon (:). A short illustration with hypothetical values is given after this procedure.
9.
In the design workspace, double-click tLogRow to open its Basic settings view and define its properties.
2.
Click Sync columns to retrieve the schema from the preceding component.
3.
The names of all defenders of the Italian football team are returned and displayed in the console of Talend Studio.
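The illustration announced in step 8 above, with hypothetical return values (the real list depends on the Web service response):

public class DenormalizeSketch {
    public static void main(String[] args) {
        String[] defenders = { "Cannavaro", "Nesta", "Zambrotta" }; // hypothetical values of AllDefendersResult.string[*]
        String name = String.join(":", defenders);                  // same shape as the denormalize(...) output
        System.out.println(name);                                   // Cannavaro:Nesta:Zambrotta
    }
}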
tWebServiceInput
tWebServiceInput
tWebServiceInput Properties
Component family
Internet
Function
Calls the defined method from the invoked Web service, and returns the class as defined, based
on the given parameters.
Purpose
Basic settings
Property type
WSDL
Need authentication / Username and Password
Select this check box and:
- enter a username and a password in the corresponding fields if this is necessary to access the service; or,
- select the Windows authentication check box and enter the Windows domain in the corresponding field if this is necessary to access the service.
Use http proxy
Select this check box if you are using a proxy server and fill in
the necessary information.
Select this check box to validate the server certificate to the client
via an SSL protocol and fill in the corresponding fields:
TrustStore file: enter the path (including filename) to the
certificate TrustStore file that contains the list of certificates that
the client trusts.
TrustStore password: enter the password used to check the
integrity of the TrustStore data.
Method Name
Advanced Use
Select this check box to display the fields dedicated for the
advanced use of tWebServiceInput:
WSDL2java: click the three-dot button to generate Talend
routines that hold the Java code necessary to connect and query
the Web service.
Code: replace the generated model Java code with the code
necessary to connect and query the specified Web service using
the code in the generated Talend routines.
Match Brackets: select the number of brackets to be used to
close the for loop based on the number of open brackets.
tStatCatcher Statistics
Usage
Limitation
n/a
Drop a tWebServiceInput component and a tLogRow component from the Palette onto the design
workspace.
2.
Double-click tWebServiceInput to open its Basic settings view in the Component tab.
2.
Click the [...] button next to Edit schema to define the structure of the data to be received.
3.
In the WSDL field, type in the URL through which you can browse the Web service WSDL, "http://
localhost:8200/airport.service?wsdl" in this example.
4.
In the Method name field, type in the name of the method to be invoked from the Web service,
getAirportInformationByISOCountryCode in this example.
5.
In the Parameters table, click the [+] button to add one row, and enter the expected parameter. In this
example, type in CN, which is a country code abbreviation.
6.
2.
The airport information corresponding to the country code CN is displayed on the console.
Scenario 2: Reading the data published on a Web service using the tWebServiceInput advanced features
Drop the following components from the Palette onto the design workspace: tWebServiceInput and tLogRow.
Link the two components together using a Row Main connection.
Double-click tWebServiceInput to show the Component view and set the component properties:
Select the check box next to Advanced Use to display the advanced configuration fields.
Click the [...] button next to the WSDL2Java field in order to generate routines from the WSDL Web service.
The routines generated display automatically under Code > Routines in the Repository tree view. These routines
can thus easily be called in the code to build the function required to fetch complex hierarchical data from the
Web Service.
Enter the relevant function in the Code field. By default, two examples of code are provided in the Code field.
The first example returns one piece of data, and the second example returns several.
In this scenario, several pieces of data are to be returned. Therefore, remove the first example of code and use the second example of code to build the function.
Replace the pieces of code provided as examples with the relevant routines that have been automatically
generated from the WSDL.
Change TalendJob_PortType to the routine name ending with _PortType, such as: XigniteFundHoldingsSoap_PortType.
Replace the various instances of TalendJob with a more relevant name, such as the name of the method in use. In this use case: fundHoldings.
Replace TalendJobServiceLocator with the name of the routine ending with Locator, such as: XigniteFundHoldingsLocator.
Replace both instances of TalendJobSoapBindingStub with the routine name ending with BindingStub, such
as: XigniteFundHoldingsSoap_BindingStub.
Within the brackets corresponding to the pieces of code: stub.setUsername and stub.setPassword, enter your
username and password respectively, between quotes.
For the sake of confidentiality or maintenance, you can store your username and password in context variables.
The list of funds provided by the Xignite Web service is identified using so-called symbols, which are of string type. In this example, we intend to fetch the list of funds whose symbol is between I and J. To do so, define the following statements: String startSymbol = "I" and String endSymbol = "J".
Then enter the piece of code to create the result table showing the list of funds (listFunds) of fund holdings, using the statements defined earlier on: routines.Fund[] result = fundHoldings.listFunds(startSymbol, endSymbol);
Run a loop on the fund list to fetch the funds ranging from I to J: for(int i = 0; i < result.length;
i++) {.
Define the results to return, for example: fetch the CIK data from the Security schema using the code
getSecurity().getCIK(), then pass them on to the CIK output schema.
The function that operates the Web service should read as follows:
routines.XigniteFundHoldingsSoap_PortType fundHoldings =
    new routines.XigniteFundHoldingsLocator().getXigniteFundHoldingsSoap();
routines.XigniteFundHoldingsSoap_BindingStub stub =
    (routines.XigniteFundHoldingsSoap_BindingStub) fundHoldings;
stub.setUsername("username");  // your user name, between quotes
stub.setPassword("password");  // your password, between quotes
String startSymbol = "I";
String endSymbol = "J";
routines.Fund[] result = fundHoldings.listFunds(startSymbol, endSymbol);
for (int i = 0; i < result.length; i++) {
    output_row.CIK = (result[i]).getSecurity().getCIK();
    output_row.cusip = (result[i]).getSecurity().getCusip();
    output_row.symbol = (result[i]).getSecurity().getSymbol();
    output_row.ISIN = (result[i]).getSecurity().getISIN();
    output_row.valoren = (result[i]).getSecurity().getValoren();
    output_row.name = (result[i]).getSecurity().getName();
    output_row.market = (result[i]).getSecurity().getMarket();
    output_row.category = (result[i]).getSecurity().getCategoryOrIndustry();
    output_row.asOfDate = (result[i]).getAsOfDate();
The outputs defined in the Java function (output_row.<column name>) must match the columns defined in the component schema exactly. The case must also match in order for the data to be retrieved.
In the Match Brackets field, select the number of brackets to use to end the For loop, based on the number of
open brackets. For this scenario, select one bracket only as only one bracket has been opened in the function.
Double-click the tLogRow component to display the Component view and set its parameters.
Click the [...] button next to the Edit Schema field in order to check that the preceding component schema was
properly propagated to the output component. If needed, click the Sync Columns button to retrieve the schema.
Save your Job and press F6 to run it.
The funds comprised between I and J are returned and displayed in the Talend Studio console.
tXMLRPCInput
tXMLRPCInput
tXMLRPCInput Properties
Component family
Internet
Function
Calls the defined method from the invoked RPC service, and returns the class as defined, based
on the given parameters.
Purpose
Invokes a method through a Web service for the described purpose.
Basic settings
Server URL
Need
authentication
/ Select this check box and fill in a username and password if
Username and Password
required to access the service.
Method Name
Return class
Parameters
Usage
Limitation
n/a
Drop the tXMLRPCInput and tLogRow components from the Palette to the design workspace.
Then set the Server URL. For this demo, use: http://phpxmlrpc.sourceforge.net/server.php
No authentication details are required in this use case.
The Method to be called is: examples.getStateName
The return class is not compulsory for this method but might be strictly required for another. Leave the default
setting for this use case.
Then set the input Parameters required by the method called. The Name field is not used in the code but the
value should follow the syntax expected by the method. In this example, the Name used is State Nr and the
value randomly chosen is 42.
The class does not have much impact with this demo method but could with another method, so leave the default setting.
In the Component view of tLogRow, select the Print schema column name in front of each value check box.
Then save the Job and press F6 to execute it.
South Dakota is the state name found using the getStateName RPC method; it corresponds to the 42nd State of the United States, as defined by the input parameter.
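As a point of comparison only, the same call can be made from plain Java with the Apache XML-RPC client library; this is an assumption about tooling, not necessarily what tXMLRPCInput uses internally. The server URL, method name and parameter value are the ones from this demo:

import java.net.URL;

import org.apache.xmlrpc.client.XmlRpcClient;
import org.apache.xmlrpc.client.XmlRpcClientConfigImpl;

public class XmlRpcSketch {
    public static void main(String[] args) throws Exception {
        XmlRpcClientConfigImpl config = new XmlRpcClientConfigImpl();
        config.setServerURL(new URL("http://phpxmlrpc.sourceforge.net/server.php"));

        XmlRpcClient client = new XmlRpcClient();
        client.setConfig(config);

        // examples.getStateName expects the state number; 42 is the value used in this scenario.
        Object state = client.execute("examples.getStateName", new Object[] { 42 });
        System.out.println(state); // expected: South Dakota
    }
}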
tAssert
tAssert
tAssert Properties
The tAssert component works alongside tAssertCatcher to evaluate the status of a Job execution. It concludes with a boolean result based on an assertive statement related to the execution and feeds the result to tAssertCatcher for proper Job status presentation.
Component family
Function
Purpose
Generates a boolean evaluation of the Job execution status. The status includes:
- Ok: the Job execution succeeds.
- Fail: the Job execution fails. The tested Job's result does not match the expectation or an execution error occurred at runtime.
Basic settings
Description
Expression
Usage
This component follows the action the assertive condition is directly related to. It can be the
intermediate or end component of the main Job, or the start, intermediate or end component
of the secondary Job.
Limitation
Scenario 1: Viewing product orders status (on a daily basis) against a benchmark number
Drop tFixedFlowInput, tMysqlOutput, tAssert, tAssertCatcher, and tLogRow onto the workspace.
2.
Rename tFixedFlowInput as orders, tAssert as orders >=20, tAssertCatcher as catch comparison result
and tLogRow as ok or failed.
3.
4.
5.
Berry Juice;2013-02-19 11:14:15;3.6
Berry Juice;2013-02-19 12:14:15;3.6
Berry Juice;2013-02-19 13:14:15;3.6
Berry Juice;2013-02-19 14:14:15;3.6
Berry Juice;2013-02-19 12:14:15;3.6
Berry Juice;2013-02-19 12:14:15;3.6
Berry Juice;2013-02-19 12:14:15;3.6
Berry Juice;2013-02-19 12:14:15;3.6
Berry Juice;2013-02-19 12:14:15;3.6
Berry Juice;2013-02-19 12:14:15;3.6
Berry Juice;2013-02-19 12:14:15;3.6
Note that the orders listed are just for illustration of how tAssert functions and the number here is less than 20.
2.
3.
Click the [+] button to add four columns, namely product_id, product_name, date and price, of the String, String, Date and Float types respectively.
Click OK to validate the setup and close the editor.
4.
5.
In the Host, Port, Database, Username and Password fields, enter the connection details and the
authentication credentials.
6.
In the Table field, enter the name of the table, for example order.
7.
In the Action on table list, select the option Drop table if exists and create.
8.
9.
10. In the description field, enter the descriptive information for the purpose of tAssert in this case.
11. In the expression field, enter the expression allowing you to compare the data to a fixed number:
((Integer)globalMap.get("tMysqlOutput_1_NB_LINE_INSERTED"))>=20
13. In the Mode area, select Table (print values in cells of a table) for a better display.
2.
As shown above, the orders status indicates Failed as the number of orders is less than 20.
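For clarity, the expression entered in tAssert is a plain boolean Java expression evaluated after tMysqlOutput finishes, as it would read inside the generated Job code (the value 11 comes from the sample orders listed above):

// What the tAssert expression computes, spelled out
Integer inserted = (Integer) globalMap.get("tMysqlOutput_1_NB_LINE_INSERTED"); // 11 with the sample orders
boolean ok = inserted >= 20; // false here, so the status reported through tAssertCatcher is Failed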
tFileInputDelimited and tFileOutputDelimited. The two components compose the main Job of which
the execution status is evaluated. For the detailed information on the two components, see section
tFileInputDelimited and section tFileOutputDelimited.
tFileCompare. It realizes the comparison between the output file of the main Job and a standard reference file.
The comparative result is evaluated by tAssert against the assertive condition set up in its settings. For more
detailed information on tFileCompare, see section tFileCompare.
tAssertCatcher. It captures the evaluation generated by tAssert. For more information on tAssertCatcher,
see section tAssertCatcher.
tLogRow. It allows you to read the captured evaluation. For more information on tLogRow, see section
tLogRow.
First proceed as follows to design the main Job:
Prepare a delimited .csv file as the source file read by your main Job.
Edit two rows in the delimited file. The contents you edit are not important, so feel free to simplify them.
Name it source.csv.
In Talend Studio, create a new job JobAssertion.
Place tFileInputDelimited and tFileOutputDelimited on the workspace.
Connect them with a Row Main link to create the main Job.
Still in the Component view, set Property Type to Built-In and click the [...] button next to Edit schema to define the data to pass on to tFileOutputDelimited. In this scenario, define the data presented in the source.csv file you created.
For more information about schema types, see Talend Studio User Guide.
Define the other parameters in the corresponding fields according to source.csv you created.
Press F6 to execute the main Job. It reads source.csv, passes the data to tFileOutputDelimited, and outputs a delimited file, out.csv.
Then continue to edit the Job to see how tAssert evaluates the execution status of the main Job.
Rename out.csv as reference.csv. This file is used as the expected result the main Job should output.
Place tFileCompare, tAssert and tLogRow on the workspace.
Connect them with a Row > Main link.
Connect tFileInputDelimited to tFileCompare with OnSubjobOk link.
In the Component view, enter the assertion row2.differ==0 in the expression field and the descriptive message of the assertion in the description field.
In the expression field, row2 is the data flow transmitted from tFileCompare to tAssert, differ is one of the columns of the tFileCompare schema and indicates whether the compared files are identical, and 0 means that no difference is detected between out.csv and reference.csv by tFileCompare. Hence, when the compared files are identical, the assertive condition is fulfilled and tAssert concludes that the main Job succeeds; otherwise, it concludes failure.
The differ column is in the read-only tFileCompare schema. For more information on its schema, see section tFileCompare.
The console shows the comparison result of tFileCompare: Files are identical. But the evaluation result of tAssert is nowhere to be found.
So you need tAssertCatcher to capture the evaluation.
Place tAssertCatcher and tLogRow on the workspace.
Connect them with a Row > Main link.
The descriptive information on JobAssertion in the console is organized according to the tAssertCatcher schema.
This schema includes, in the following order: the execution time, the process ID, the project name, the Job name, the code language, the evaluation origin, the evaluation result, detailed information about the evaluation, and the descriptive message of the assertion. For more information on the schema of tAssertCatcher, see section tAssertCatcher.
The console indicates that the execution status of Job JobAssertion is Ok. In addition to the evaluation, you can also see other descriptive information about JobAssertion, including the descriptive message you edited in the Basic settings of tAssert.
Then you will perform operations to make the main Job fail to generate the expected file. To do so, proceed as
follows in the same Job you have executed:
Delete a row in reference.csv.
Press F6 to execute the Job again.
Check the result presented in Run view.
2010-02-01 19:47:43|GeHJNO|TASSERT|JobAssertion|tAssert_1|Failed|Test
logically failed|The output file should be identical with the reference
file
The console shows that the execution status of the main Job is Failed. The detailed explanation for this status immediately follows it, reading Test logically failed.
You can thus get a basic idea about your present Job status: it fails to generate the expected file because of a logical
failure. This logical failure could come from a logical mistake during the Job design.
The status and its explanatory information are presented respectively in the status and the substatus columns of
the tAssertCatcher schema. For more information on the columns, see section tAssertCatcher.
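Because the captured line is pipe-delimited in the order just described, it is easy to post-process outside the Studio. Here is a minimal parsing sketch; the field positions are inferred from the sample output above, not from an official specification:

public class AssertLogParseSketch {
    public static void main(String[] args) {
        // The status line as captured by tAssertCatcher in the failing run.
        String line = "2010-02-01 19:47:43|GeHJNO|TASSERT|JobAssertion|tAssert_1"
                + "|Failed|Test logically failed"
                + "|The output file should be identical with the reference file";

        // Split on the pipe separator; -1 keeps any trailing empty field.
        String[] fields = line.split("\\|", -1);

        System.out.println("executed at : " + fields[0]);
        System.out.println("origin      : " + fields[4]);
        System.out.println("status      : " + fields[5]);
        System.out.println("substatus   : " + fields[6]);
        System.out.println("description : " + fields[7]);
    }
}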
tAssertCatcher
tAssertCatcher Properties
Component family
Function
Based on its pre-defined schema, fetches the execution status information from repository, Job
execution and tAssert.
Purpose
Generates a data flow consolidating the status information of a Job execution and transfers the data into defined output files.
Basic settings
Catch Java Exception
Select this check box to capture Java exception errors and show the message in the Description column (Get original exception not selected) or in the Exception column (Get original exception selected).
Catch tAssert
Usage
This component is the start component of a secondary Job which fetches the execution status information from several sources. It generates a data flow to transfer the information to the component that follows.
Limitation
Related scenarios
For a use case in relation with tAssertCatcher, see the tAssert scenario:
section Scenario 2: Setting up the assertive condition for a Job execution
tChronometerStart
tChronometerStart Properties
Component family
Function
Purpose
Operates as a chronometer device that starts calculating the processing time of one or more
subjobs in the main Job, or that starts calculating the processing time of part of your subjob.
Usage
You can use tChronometerStart as a start or middle component. It can precede one or more
processing tasks in the subjob. It can precede one or more subjobs in the main Job.
Limitation
n/a
Related scenario
For related scenario, see section Scenario: Measuring the processing time of a subjob and part of a subjob.
tChronometerStop
tChronometerStop Properties
Component family
Function
Purpose
Operates as a chronometer device that stops calculating the processing time of one or more
subjobs in the main Job, or that stops calculating the processing time of part of your subjob.
tChronometerStop displays the total execution time.
Basic settings
Since options
Display duration in console When selected, it displays subjob execution information on the
console.
Display component name
Caption
Limitation
n/a
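Conceptually, a tChronometerStart/tChronometerStop pair simply brackets part of a Job between two timestamps and reports the difference. The sketch below illustrates the idea in plain Java; the printed format is illustrative, not the component's exact output:

public class ChronometerSketch {
    public static void main(String[] args) throws InterruptedException {
        // tChronometerStart: record the starting timestamp.
        long start = System.currentTimeMillis();

        // ... the subjob, or the part of the subjob, being measured ...
        Thread.sleep(250);

        // tChronometerStop: compute and display the elapsed time,
        // optionally preceded by the component name or a caption.
        long elapsed = System.currentTimeMillis() - start;
        System.out.println("tChronometerStop_1: subjob duration = " + elapsed + " ms");
    }
}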
Click Edit schema to define the schema of the tRowGenerator. For this Job, the schema is composed of two columns, First_Name and Last_Name, so click the [+] button twice to add two columns and rename them.
Click the RowGenerator Editor three-dot button to open the editor and define the data to be generated.
In the RowGenerator Editor, specify the number of rows to be generated in the Number of Rows for
RowGenerator field and click OK. The RowGenerator Editor closes.
You will be prompted to propagate changes. Click Yes in the popup message.
Double-click on the tMap component to open the Map editor. The Map editor opens displaying the input
metadata of the tRowGenerator component.
In the Schema editor panel of the Map editor, click the plus button of the output table to add two rows and
define them.
In the Map editor, drag the First_Name row from the input table to the Last_Name row in the output table and
drag the Last_Name row from the input table to the First_Name row in the output table.
Click Apply to save changes.
You will be prompted to propagate changes. Click Yes in the popup message.
Click OK to close the editor.
Select tFileOutputDelimited and click the Component tab to display the component view.
In the Basic settings view, set tFileOutputDelimited properties as needed.
Select tChronometerStop and click the Component tab to display the component view.
In the Since options panel of the Basic settings view, select the Since the beginning option to measure the duration of the subjob as a whole.
Select/clear the other check boxes as needed. In this scenario, we want to display the subjob duration on the
console preceded by the component name.
If needed, enter a text in the Caption field.
Save your Job and press F6 to execute it.
You can measure the duration of the subjob the same way by placing tChronometerStop below tRowGenerator, and
connecting the latter to tChronometerStop using an OnSubjobOk link.
tDie
tDie properties
Both tDie and tWarn components are closely related to the tLogCatcher component. They generally make sense when used alongside a tLogCatcher in order for the log data collected to be encapsulated and passed on to the output defined.
Component family
Function
This component throws an error and kills the Job. If you simply want to throw a warning, see section tWarn.
Purpose
Triggers the tLogCatcher component for an exhaustive log before killing the Job.
Basic settings
Die message
Error code
Priority
Usage
This component cannot be used as a start component and it is generally used with a tLogCatcher for logging purposes.
Limitation
n/a
Related scenarios
For use cases in relation with tDie, see tLogCatcher scenarios:
section Scenario 1: warning & log on entries
section Scenario 2: Log & kill a Job
tFlowMeter
tFlowMeter Properties
Component family
Function
Purpose
The number of rows is then meant to be caught by the tFlowMeterCatcher for logging purpose.
Basic settings
Use input connection name as label
Select this check box to reuse the name given to the input main row flow as the label in the logged data.
Mode
Select the type of values for the data measured:
Absolute: the actual number of rows is logged.
Relative: a ratio (%) of the number of rows is logged. When this option is selected, a Connections List shows to let you select a reference connection.
Thresholds
Usage
Limitation
n/a
If you need logs, statistics or other measurements of your data flows, see Talend Studio User Guide.
Related scenario
For related scenario, see section Scenario: Catching flow metrics from a Job
tFlowMeterCatcher
tFlowMeterCatcher Properties
Component family
Function
Based on a defined schema, the tFlowMeterCatcher catches the processing volumetrics from the tFlowMeter component and passes them on to the output component.
Purpose
Operates as a log function triggered by the use of a tFlowMeter component in the Job.
Basic settings
Usage
This component is the start component of a secondary Job which triggers automatically at the
end of the main Job.
Limitation
The use of this component cannot be separated from the use of tFlowMeter. For more information, see section tFlowMeter.
Drop the following components from the Palette to the design workspace: tMysqlInput, tFlowMeter (x2),
tMap, tLogRow, tFlowMeterCatcher and tFileOutputDelimited.
Link the components using Row > Main connections and click each label to give consistent names throughout the Job, such as US_States for the flow from the input component and filtered_states for the output flow from the tMap component.
Link the tFlowMeterCatcher to the tFileOutputDelimited component using a row main link also as data is
passed.
On the tMysqlInput Component view, set the connection properties to Repository if the table metadata are stored in the Repository. Otherwise, set the Type to Built-in and manually configure the connection and schema details for this Job.
For Talend Open Studio for Big Data, the Property type, Schema and Query Type of components are always Built-in.
For further information about how to edit a Built-in schema, see Talend Studio User Guide.
The 50 states of the USA are recorded in the table states. In order for all 50 entries of the table to be selected, the query to run on the MySQL database is as follows:
select * from states.
Select the relevant encoding type on the Advanced settings vertical tab.
Then select the following component which is a tFlowMeter and set its properties.
Select the check box Use input connection name as label, in order to reuse the label you chose in the log
output file (tFileOutputDelimited).
The mode is Absolute as there is no reference flow to meter against, and no Threshold is to be set for this example.
Then launch the tMap editor to set the filtering properties.
For this use case, drag and drop the ID and State columns from the Input area of the tMap towards the Output
area. No variable is used in this example.
On the Output flow area (labelled filtered_states in this example), click the arrow & plus button to activate the
expression filter field.
Drag the State column from the Input area (row2) towards the expression filter field and type in the rest of
the expression in order to filter the state labels starting with the letter M. The final expression looks like:
row2.State.startsWith("M")
Select the Append check box in order to log all tFlowMeter measures.
Then save your Job and press F6 to execute it.
The Run view shows the filtered state labels as defined in the Job.
In the delimited CSV file, the number of rows shown in the count column differs between tFlowMeter1 and tFlowMeter2, because the filtering has been carried out in between. The reference column also shows this difference.
tLogCatcher
tLogCatcher properties
Both tDie and tWarn components are closely related to the tLogCatcher component. They generally make sense when used alongside a tLogCatcher in order for the log data collected to be encapsulated and passed on to the output defined.
Component family
Function
Fetches set fields and messages from Java Exception, tDie and/or tWarn and passes them on
to the next component.
Purpose
Operates as a log function triggered by one of the three: Java exception, tDie or tWarn, to
collect and transfer log data.
Basic settings
Catch Java Exception
Select this check box to trigger the tCatch function when a Java Exception occurs in the Job
Catch tDie
Select this check box to trigger the tCatch function when a tDie
is called in a Job
Catch tWarn
Select this check box to trigger the tCatch function when a tWarn
is called in a Job
Usage
This component is the start component of a secondary Job which automatically triggers at the
end of the main Job
Limitation
n/a
Drop a tRowGenerator, a tWarn, a tLogCatcher and a tLogRow from the Palette onto your design workspace.
Connect the tRowGenerator to the tWarn component.
Connect separately the tLogCatcher to the tLogRow.
On the tRowGenerator editor, set the random entries creation using a basic function:
On the tWarn Component view, set your warning message, the code, and the priority level. In this case, the message is this is a warning.
For this scenario, we will concatenate a function to the message above, in order to collect the first value from
the input table.
On the Basic settings view of tLogCatcher, select the tWarn check box in order for the message from the
latter to be collected by the subjob.
Click Edit Schema to view the schema used as log output. Notice that the log is comprehensive.
Press F6 to execute the Job. Notice that the Log produced is exhaustive.
Drop all required components from various folders of the Palette to the design workspace: tRowGenerator,
tFileOutputDelimited, tDie, tLogCatcher, tLogRow.
On the tRowGenerator Component view, define the setting of the input entries to be handled.
Edit the schema and define the following columns as random input examples: id, name, quantity, flag and
creation.
Set the Number of rows to 0. This will constitute the error on which the Die operation is based.
On the Values table, define the functions to feed the input flow.
Define the tFileOutputDelimited to hold the possible output data. The row connection from the tRowGenerator automatically feeds the output schema. The separator is a simple semicolon.
Connect this output component to the tDie using a Trigger > If connection. Double-click on the newly created
connection to define the if:
((Integer)globalMap.get("tRowGenerator_1_NB_LINE")) <=0
Then double-click to select and define the Basic settings of the tDie component.
Enter your Die message to be transmitted to the tLogCatcher before the actual kill-job operation happens.
Next to the Job but not physically connected to it, drop a tLogCatcher from the Palette to the design workspace
and connect it to a tLogRow component.
Define the tLogCatcher Basic settings. Make sure the tDie box is selected in order to add the Die message to
the Log information transmitted to the final component.
Press F6 to run the Job and notice that the log contains a black message and a red one.
The black log data come from the tDie and are transmitted by the tLogCatcher. In addition, the normal Java Exception message displays in red, as the Job died abnormally.
tLogRow
tLogRow properties
Component family
Function
Purpose
Basic settings
Sync columns
Click to synchronize the output file schema with the input file
schema. The Sync function is available only when the component is
linked with the preceding component using a Row connection.
Mode
Basic
Table (print values in cells of a table)
Vertical (each row is a key/value list)
Separator
Enter the separator which will delimit data on the Log display.
Print header
Select this check box to include the header of the input flow in the output display.
Use fixed length
Select this check box to set a fixed width for the value display.
This component can be used as an intermediate step in a data flow or as an end object in the Job flowchart.
If you have subscribed to one of the Talend solutions with Big Data, you can also use this component
as a Map/Reduce component. In a Talend Map/Reduce Job, this component is used as an intermediate
or an end step. It generates native Map/Reduce code that can be executed directly in Hadoop.
You need to use the Hadoop Configuration tab in the Run view to define the connection to a given
Hadoop distribution for the whole Job.
This connection is effective on a per-Job basis.
For further information about a Talend Map/Reduce Job, see the sections describing how to create,
convert and configure a Talend Map/Reduce Job of the Talend Open Studio for Big Data Getting
Started Guide.
Note that in this documentation, unless otherwise explicitly stated, a scenario presents only Standard
Jobs, that is to say traditional Talend data integration Jobs, and non Map/Reduce Jobs.
Limitation
n/a
Related scenarios
For related scenarios, see:
section Scenario: Reading master data in an MDM hub.
section Scenario: Reading data from different MySQL databases using dynamically loaded connection
parameters.
section Scenario 1: warning & log on entries.
section Scenario 2: Log & kill a Job.
tStatCatcher
tStatCatcher Properties
Component family
Function
Based on the pre-defined schema, tStatCatcher gathers the Job processing metadata at the Job
level and at the component level when the tStatCatcher Statistics check box is selected.
Purpose
Gathers the Job processing metadata at the Job level and at the component level when the
tStatCatcher Statistics check box is selected and transfers the log data to the subsequent
component for display or storage.
Basic settings
Schema
Usage
This component is the start component of a secondary Job which triggers automatically at the
end of the main Job. The processing time is also displayed at the end of the log.
Limitation
n/a
3. Click the [+] button to add three columns, namely ID_Owners, Name_Customer and ID_Insurance, of the Integer and String types respectively.
5. In the dialog box that appears, click Yes to propagate the changes to the subsequent component.
9. In the File Name field, enter the full name of the file to save the statistics data.
11. Select Vertical (each row is a key/value list) for a better display of the results.
As shown above, the statistics log of the Job execution is correctly generated.
tWarn
tWarn Properties
Both tDie and tWarn components are closely related to the tLogCatcher component. They generally make sense when used alongside a tLogCatcher in order for the log data collected to be encapsulated and passed on to the output defined.
Component family
Function
This component provides a priority-rated message to the next component. It does not stop your
Job in case of error. If you want to kill a Job in case of error, see section tDie.
Purpose
Triggers a warning often caught by the tLogCatcher component for an exhaustive log.
Basic settings
Warn message
Code
Priority
Usage
Limitation
n/a
Related scenarios
For use cases in relation with tWarn, see tLogCatcher scenarios:
section Scenario 1: warning & log on entries
section Scenario 2: Log & kill a Job
tAddLocationFromIP
tAddLocationFromIP Properties
Component family
Misc
Function
Purpose
Basic settings
Schema type and Edit schema
A schema is a row description, i.e. it defines the number of fields to be processed and passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Built-in: You create and store the schema locally for this component only. Related topic: see Talend Studio User Guide.
Database Filepath
Input parameters
Input column: Select the input column from which the input values
are to be taken.
input value is a hostname: Check if the input column holds hostnames.
input value is an IP address: Check if the input column holds IP
addresses.
Location type
Usage
This component is an intermediary step in the data flow, allowing you to replace an IP address with geolocation information. It cannot be a start component as it requires an input flow. It also requires an output component.
Limitation
Due to license incompatibility, the following JAR required to use this component is not
provided. You can easily add the JAR by following the How to install external modules section
of Talend Studio User Guide.
geoip.jar
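To give an idea of what the component does with this JAR, here is a minimal lookup sketch written against the legacy MaxMind GeoIP Java API that geoip.jar provides, as we understand it; the database path and IP address are placeholders:

import com.maxmind.geoip.Country;
import com.maxmind.geoip.LookupService;

public class GeoLookupSketch {
    public static void main(String[] args) throws Exception {
        // Open the GeoIP.dat database (placeholder path).
        LookupService lookup =
                new LookupService("/path/to/GeoIP.dat", LookupService.GEOIP_MEMORY_CACHE);

        // Resolve a sample IP address to a country name.
        Country country = lookup.getCountry("8.8.8.8");
        System.out.println("country: " + country.getName());

        lookup.close();
    }
}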
Drop the following components from the Palette onto the design workspace: tFixedFlowInput,
tAddLocationFromIP, and tLogRow.
In the design workspace, select tFixedFlowInput, and click the Component tab to define the basic settings
for tFixedFlowInput.
2. Click the [...] button next to Edit Schema to define the structure of the data you want to use as input. In this scenario, the schema is made of one column that holds an IP address.
3. Click OK to close the dialog box, and accept propagating the changes when prompted by the system. The defined column is displayed in the Values panel of the Basic settings view.
4. In the Number of rows field, enter the number of rows to be generated, then click in the Value cell and set the value for the IP address.
5. In the design workspace, select tAddLocationFromIP and click the Component tab to define the basic settings for tAddLocationFromIP.
6. Click the Sync columns button to synchronize the schema with the input schema set with tFixedFlowInput.
7. Browse to the GeoIP.dat file to set its path in the Database filepath field.
Make sure you download the latest version of the IP address lookup database file from the relevant site, as indicated in the Basic settings view of tAddLocationFromIP.
8. In the Input parameters panel, set your input parameters as needed. In this scenario, the input column is the ip column defined earlier that holds an IP address.
9. In the Location type panel, set the location type as needed. In this scenario, we want to display the country name.
10. In the design workspace, select tLogRow and click the Component tab and define the basic settings for
tLogRow as needed. In this scenario, we want to display values in cells of a table.
One row is generated to display the country name that is associated with the set IP address.
tBufferInput
tBufferInput properties
Component family
Misc
Function
Purpose
The tBufferInput component retrieves data bufferized via a tBufferOutput component, for
example, to process it in another subjob.
Basic settings
Usage
This component is the start component of a secondary Job which is triggered automatically at
the end of the main Job.
Drop the following components from the Palette onto the design workspace: tFileInputDelimited and
tBufferOutput.
Select the tFileInputDelimited and on the Basic Settings tab of the Component view, set the access parameters
to the input file.
In the File Name field, browse to the delimited file holding the data to be bufferized.
Define the Row and Field separators, as well as the Header.
Click [...] next to the Edit schema field to describe the structure of the file.
Generally speaking, the schema is propagated from the input component and automatically fed into the tBufferOutput
schema. But you can also set part of the schema to be bufferized if you want to.
Drop the tBufferInput and tLogRow components from the Palette onto the design workspace below the subjob
you just created.
Connect tFileInputDelimited and tBufferInput via a Trigger > OnSubjobOk link and connect tBufferInput
and tLogRow via a Row > Main link.
Double-click tBufferInput to set its Basic settings in the Component view.
In the Basic settings view, click [...] next to the Edit Schema field to describe the structure of the file.
Use the schema defined for the tFileInputDelimited component and click OK.
The schema of the tBufferInput component is automatically propagated to the tLogRow. Otherwise, double-click tLogRow to display the Component view and click Sync columns.
Save your Job and press F6 to execute it.
The standard console returns the data retrieved from the buffer memory.
tBufferOutput
tBufferOutput properties
Component family
Misc
Function
This component collects data in a buffer in order to access it later, via a Webservice for example.
Purpose
This component allows a Webservice to access data. Indeed, it was designed to be exported as a Webservice in order to access data on the web application server directly. For more information, see Talend Studio User Guide.
Basic settings
Usage
This component is not startable (green background) and it requires an output component.
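The buffering idea behind tBufferOutput and tBufferInput can be pictured as a shared in-memory list that one part of a Job fills and another part reads back. The sketch below is only an analogy, as the real components manage their buffer internally:

import java.util.ArrayList;
import java.util.List;

public class BufferSketch {
    // Shared in-memory buffer standing in for the component's internal storage.
    private static final List<String[]> BUFFER = new ArrayList<String[]>();

    public static void main(String[] args) {
        // "tBufferOutput": append incoming rows to the buffer.
        BUFFER.add(new String[] {"1", "Andrew"});
        BUFFER.add(new String[] {"2", "Benjamin"});

        // "tBufferInput": retrieve the buffered rows later, e.g. in another subjob.
        for (String[] row : BUFFER) {
            System.out.println(row[0] + ";" + row[1]);
        }
    }
}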
Create two Jobs: a first Job (BufferFatherJob) runs the second Job and displays its content onto the Run console.
The second Job (BufferChildJob) stores the defined data into a buffer memory.
On the first Job, drop the following components: tRunJob and tLogRow from the Palette to the design
workspace.
On the second Job, drop the following components: tFileInputDelimited and tBufferOutput the same way.
Let's set the parameters of the second Job first:
Select the tFileInputDelimited and on the Basic Settings tab of the Component view, set the access parameters
to the input file.
In File Name, browse to the delimited file whose data are to be bufferized.
Define the Row and Field separators, as well as the Header.
Generally the schema is propagated from the input component and automatically fed into the tBufferOutput
schema. But you could also set part of the schema to be bufferized if you want to.
Now, in the other Job (BufferFatherJob), define the parameters of the tRunJob component.
Edit the Schema if relevant and select the column to be displayed. The schema can be identical to the bufferized
schema or different.
You could also define context parameters to be used for this particular execution. To keep it simple, the default
context with no particular setting is used for this use case.
Press F6 to execute the parent Job. The tRunJob takes care of executing the child Job and returns the data onto the standard console:
Click the plus button to add the three parameter lines and define your variables.
Click OK to close the dialog box and accept propagating the changes when prompted by the system. The three
defined columns display in the Values panel of the Basic settings view of tFixedFlowInput.
Click in the Value cell of each of the first two defined columns and press Ctrl+Space to access the global
variable list.
From the global variable list, select TalendDate.getCurrentDate() and TalendDataGenerator.getFirstName(), for the now and firstname columns respectively.
For this scenario, we want to define two context variables: nb_lines and lastname. In the first we set the number
of lines to be generated, and in the second we set the last name to display in the output list. The tFixedFlowInput
component will generate the number of lines set in the context variable with the three columns: now, firstname
and lastname. For more information about how to create and use context variables, see Talend Studio User Guide.
To define the two context variables:
Select tFixedFlowInput and click the Contexts tab.
In the Variables view, click the plus button to add two parameter lines and define them.
Click the Values as table tab and define the first parameter to set the number of lines to be generated and the
second to set the last name to be displayed.
Click the Component tab to go back to the Basic settings view of tFixedFlowInput.
Click in the Value cell of lastname column and press Ctrl+Space to access the global variable list.
From the global variable list, select context.lastname, the context variable you created for the last name column.
Click the Browse... button to select a directory to archive your Job in.
In the Build type panel, select the build type you want to use in the Tomcat webapp directory (WAR in this
example) and click Finish. The [Build Job] dialog box disappears.
Copy the War folder and paste it in a Tomcat webapp directory.
The Job uses the default values of the context variables nb_lines and lastname; that is, it generates three lines with the current date, a first name, and Ford as the last name.
You can modify the values of the context variables directly from your browser. To call the Job from your browser
and modify the values of the two context variables, type the following URL:
http://localhost:8080//export_job/services/export_job3?method=runJob&arg1=--context_param
%20lastname=MASSY&arg2=--context_param%20nb_lines=2.
%20 stands for a blank space in URL encoding. In the first argument, arg1, you set the value of the context variable to display MASSY as the last name. In the second argument, arg2, you set the value of the context variable to 2 to generate only two lines.
Press Enter to execute your Job from your browser.
Set the Schema Type to Built-In and click the three-dot [...] button next to Edit Schema to describe the data
structure you want to call from the exported Job. In this scenario, the schema is made of three columns, now,
firstname, and lastname.
Click the plus button to add the three parameter lines and define your variables. Click OK to close the dialog box.
In the WSDL field of the Basic settings view of tWebServiceInput, enter the URL http://localhost:8080/export_job/services/export_job3?WSDL, where export_job is the name of the webapp directory where the Job to call is stored and export_job3 is the name of the Job itself.
The system generates three columns with the current date, first name, and last name and displays them onto the
log console in a tabular mode.
tContextDump
tContextDump properties
Component family
Misc
Function
tContextDump dumps the context setup of the current Job to the subsequent component.
Purpose
tContextDump copies the context setup of the current Job to a flat file, a database table, etc.,
which can then be used by tContextLoad. Together with tContextLoad, this component makes
it simple to apply the context setup of one Job to another.
Basic settings
Hide Password
Usage
As a start component, tContextDump dumps the context setup of the current Job to a file, a
database table, etc.
Limitation
n/a
Related Scenario
No scenario is available for this component yet.
tContextLoad
tContextLoad properties
Component family
Misc
Function
Purpose
Basic settings
If a variable loaded, but not in the context
If a variable is loaded but does not appear in the context, select how the notification must be displayed: as an error, a warning or an information message (info).
If a variable in the context, but not loaded
If a variable appears in the context but is not loaded, select how the notification must be displayed: as an error, a warning or an information message (info).
Print operations
Select this check box to display the context parameters set in the
Run view.
Disable errors
Disable warnings
Disable infos
Die on error
This check box is selected by default. Clear the check box to skip
the row on error and complete the process for error-free rows.
Advanced settings
tStatCatcher Statistics
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to
turn on or off the Print operations option dynamically at runtime.
When a dynamic parameter is defined, the corresponding Print operations option in the Basic
settings view becomes unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
This component relies on the data flow to load the context values to be used, therefore it requires
a preceding input component and thus cannot be a start component.
Limitation
tContextLoad does not create any non-defined variable in the default context.
Scenario: Reading data from different MySQL databases using dynamically loaded connection parameters
1. Drop a tFileInputDelimited component and a tContextLoad component from the Palette onto the design workspace, and link them using a Row > Main connection to form the first subjob.
2. Drop a tMysqlInput component and a tLogRow component onto the design workspace, and link them using a Row > Main connection to form the second subjob.
3. Create two delimited files corresponding to the two contexts in this scenario, namely the two databases we will access, and name them test_connection.txt and prod_connection.txt. They contain the database connection details for testing and actual production purposes respectively. Each file is made of two columns, containing the parameter names and the corresponding values. Below is an example:
host;localhost
port;3306
database;test
username;root
password;talend
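What tContextLoad does with such a file can be pictured as reading the key;value pairs and overwriting the matching context variables. The rough sketch below illustrates that mechanism; the file name and context map are illustrative, and the real component also applies the notification options described in its properties:

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.HashMap;
import java.util.Map;

public class ContextLoadSketch {
    public static void main(String[] args) throws Exception {
        Map<String, String> context = new HashMap<String, String>();

        // Read one key;value pair per line, as in test_connection.txt.
        BufferedReader reader = new BufferedReader(new FileReader("test_connection.txt"));
        String line;
        while ((line = reader.readLine()) != null) {
            String[] pair = line.split(";", 2);
            if (pair.length == 2) {
                context.put(pair[0], pair[1]); // e.g. host -> localhost
            }
        }
        reader.close();

        System.out.println("connecting to " + context.get("host") + ":" + context.get("port"));
    }
}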
Select the Contexts view of the Job, click the Variables tab, and click the [+] button at the bottom of the table to define the following parameters:
3. Click the icon at the upper right corner of the panel to open the [Configure Contexts] dialog box.
4. Select the default context, click the Edit button and rename the context to Test.
5. Click New to add a new context named Production. Then click OK to close the dialog box.
6. Back in the Values as tree tab view, expand the filename variable node, type in the prompt message in the Prompt field, type in the full paths to the delimited files for the two contexts in the respective Value fields, and select the Prompt check box for each context.
7. Expand the printOperations variable node, type in the prompt message in the Prompt field, select false as the variable value for the Production context and true for the Test context, and select the Prompt check box for each context.
In the tFileInputDelimited component Basic settings panel, fill the File name/Stream field with the relevant
context variable we just defined: context.filename.
2. Define the file schema manually (Built-in). It contains two columns, defined as Key and Value.
6. Fill the Host, Port, Database, Username, and Password fields with the relevant variables stored in the delimited files and defined in the Contexts tab view: context.host, context.port, context.database, context.username, and context.password respectively in this example, and fill the Table Name field with the actual database table name to read data from, customers for both databases in this example.
7. Then fill in the Schema information. If you stored the schema in the Repository Metadata, you can retrieve it by selecting Repository and the relevant entry in the list.
For Talend Open Studio for Big Data, the Property type, Schema and Query Type of components are always Built-in. For further information about how to edit a Built-in schema, see Talend Studio User Guide.
In this example, the schema of both database tables is made of four columns: id (INT, 2 characters
long), firstName (VARCHAR, 15 characters long), lastName (VARCHAR, 15 characters long), and city
(VARCHAR, 15 characters long).
8. In the Query field, type in the SQL query to be executed on the DB table specified. In this example, simply click Guess Query to retrieve all the columns of the table, which will be displayed on the Run tab through the tLogRow component.
9. In the Basic settings view of the tLogRow component, select the Table option to display data records in the form of a table.
Press Ctrl+S to save the Job, and press F6 to run the Job using the default context, which is Test in this
use case.
A dialog box appears to prompt you to specify the delimited file to read and decide whether to display the
set context parameters on the console.
You can specify a file other than the default one if needed, and clear the Show loaded variables check box
if you do not want to see the set context variables on the console. To run the Job using the default settings,
click OK.
The context parameters and content of the database table in the Test context are all displayed on the Run
console.
2. Now select the Production context and press F6 to launch the Job again. When the prompt dialog box appears, simply click OK to run the Job using the default settings.
The content of the database table in the Production context is displayed on the Run console. Because the
printOperations variable is set to false, the set context parameters are not displayed on the console this time.
tFixedFlowInput
tFixedFlowInput properties
Component family
Misc
Function
tFixedFlowInput generates as many lines and columns as you want using the context variables.
Purpose
Basic settings
Mode
From the three options, select the mode that you want to use.
Use Single Table: Enter the data that you want to generate in the relevant value field.
Use Inline Table: Add the row(s) that you want to generate.
Use Inline Content: Enter the data that you want to generate, separated by the separators that you have already defined in the Row and Field Separator fields.
Number of rows
Values
Advanced settings
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at a Job
level as well as at each component level.
Usage
This component can be used as a start or intermediate component, and it requires an output component.
Related scenarios
For related scenarios, see:
section Scenario 2: Buffering output data on the webapp server.
section Scenario: Iterating on a DB table and listing its column names.
section Scenario: Filtering and searching a list of names.
tMemorizeRows
tMemorizeRows properties
Component family
Misc
Function
Purpose
tMemorizeRows memorizes a sequence of rows that pass this component and then allows its
following component(s) to perform operations of your interest on the memorized rows.
Basic settings
Columns to memorize
Advanced settings
tStatCatcher Statistics
Usage
This component can be used as an intermediate step in a data flow or as the last step before beginning a subjob.
Note: You can use the global variable NB_LINE_ROWS to retrieve the value of the Row count
to memorize field of the tMemorizeRows component.
Connections
Drop tRowGenerator, tSortRow, tMemorizeRows, tJavaFlex and tJava on the design workspace.
3. Do the same to link together tSortRow, tMemorizeRows and tJavaFlex using the Row > Main link.
2. In this editor, click the plus button three times to add three columns and name them id, name and age.
5. In the Functions column, select random for id and age, then select getFirstName for name.
7. In the Column column, click age to open its corresponding Function parameters view in the lower part of this editor.
In the Value column of the Function parameters view, type in the minimum age and maximum age that
will be generated for the 12 customers. In this example, they are 10 and 25.
2. In the Criteria table, click the plus button to add one row.
3. In the Schema column column, select the data column you want to base the sorting operation on. In this example, select age, as it is the ages that should be compared and counted.
4. In the Sort num or alpha column, select the type of the sorting operation. In this example, select num, that is, numerical, as age is an integer.
5. In the Order asc or desc column, select desc as the sorting order for this scenario.
2. In the Row count to memorize field, type in the maximum number of rows to be memorized at any given time. As you need to compare the ages of two customers each time, enter 2. Thus, this component memorizes two rows at maximum at any given moment, and always indexes the newly incoming row as 0 and the previously incoming row as 1.
3. In the Memorize column of the Columns to memorize table, select the check box(es) to determine the column(s) to be memorized. In this example, select the check box corresponding to age.
2. In the Start code area, enter the Java code that will be called during the initialization phase. In this example, type in int count=0; in order to declare a variable count and assign it the value 0.
3. In the Main code area, enter the Java code to be applied to each row in the data flow. In this scenario, type in
if(!age_tMemorizeRows_1[0].equals(age_tMemorizeRows_1[1]))
{
count++;
}
System.out.println(age_tMemorizeRows_1[0]);
This code compares the two ages memorized by tMemorizeRows each time and counts one change whenever the ages are found different. It then displays the age that has been indexed as 0 by tMemorizeRows.
4. In the End code area, enter the Java code that will be called during the closing phase. In this example, type in globalMap.put("count", count); to output the count result.
6. In the Code area of the tJava component, enter the code to be executed to display the number of different ages:
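The exact snippet for this step is not preserved here, but since the End code above stored the counter in globalMap under the key count, something along the following lines would display it; this is an assumption, not the documented code:

System.out.println(globalMap.get("count") + " different ages"); // reads the value stored by tJavaFlex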
2. Press F6, or click Run on the Run console, to execute the Job.
In the console, you can read that there are 10 different ages within the group of 12 customers.
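To recap the whole counting logic in one place, here is a self-contained sketch that simulates the two-row memory together with the Start, Main and End code of this scenario. The sorted ages are sample values chosen to yield the same result; in the real Job, the age_tMemorizeRows_1 array is provided by tMemorizeRows:

public class MemorizeCountSketch {
    public static void main(String[] args) {
        // Ages as they could arrive from tSortRow, sorted in descending order.
        int[] sortedAges = {25, 24, 22, 21, 21, 19, 18, 16, 14, 13, 13, 10};

        // Start code: declare a variable count and assign it the value 0.
        int count = 0;

        // Two-slot memory: index 0 is the newest row, index 1 the previous one.
        Integer[] age = new Integer[2];

        for (int current : sortedAges) {
            age[1] = age[0];
            age[0] = current;

            // Main code: count one change each time the memorized ages differ
            // (the very first row, compared against an empty slot, also counts).
            if (!age[0].equals(age[1])) {
                count++;
            }
            System.out.println(age[0]);
        }

        // End code equivalent: output the count result.
        System.out.println(count + " different ages");
    }
}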
tMsgBox
tMsgBox properties
Component family
Misc
Function
Opens a dialog box with an OK button requiring action from the user.
Purpose
Basic settings
Title
Text entered shows on the title bar of the dialog box created.
Buttons
Icon
Message
Usage
This component can be used as an intermediate step in a data flow or as a start or an end object in the Job flowchart.
It can be connected to the next/previous component using either a Row or Iterate link.
Limitation
n/a
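The behavior can be pictured with a few lines of standard Swing code. This is only an analogy for what the component does with its Title, Message, Buttons and Icon settings, using the standard JOptionPane API rather than the component's documented internals:

import javax.swing.JOptionPane;

public class MsgBoxSketch {
    public static void main(String[] args) {
        // Open a dialog box with an OK button; execution pauses until
        // the user clicks OK, after which the Job would resume.
        JOptionPane.showMessageDialog(
                null,                               // no parent window
                "Job done. Click OK to continue.",  // Message
                "tMsgBox",                          // Title
                JOptionPane.INFORMATION_MESSAGE);   // Icon
        System.out.println("User clicked OK.");
    }
}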
After the user clicks OK, the Run log is updated accordingly.
Related topic: see Talend Studio User Guide.
tRowGenerator
tRowGenerator properties
Component family
Misc
Function
tRowGenerator generates as many rows and fields as are required using random values taken
from a list.
Purpose
It can be used to create an input flow in a Job for testing purposes, in particular for boundary
test sets
Basic settings
Schema and Edit schema
RowGenerator editor
The editor allows you to define the columns and the nature of the data to be generated. You can use predefined routines or type in the function to be used to generate the data specified.
Usage
The tRowGenerator Editor's ease of use allows users without any Java knowledge to generate random data for test purposes.
Limitation
n/a
Make sure you then define the nature of the data contained in the column, by selecting the Type in the list. Depending on the type you select, the list of Functions offered will differ. This information is therefore compulsory.
Some extra information, although not required, might be useful such as Length, Precision or Comment. You
can also hide these columns, by clicking on the Columns drop-down button next to the toolbar, and unchecking
the relevant entries on the list.
In the Function area, you can select a predefined routine/function if one of them corresponds to your needs. You can also add to this list any routine you stored in the Routine area of the Repository, or type in the function you want to use in the Function definition panel. Related topic: see Talend Studio User Guide.
Click Refresh to have a preview of the data generated.
Type in a number of rows to be generated. The more rows to be generated, the longer the generation operation will take.
In the Value area, type in the Java function to be used to generate the data specified.
Click on the Preview tab and click Preview to check out a sample of the data generated.
Drop a tRowGenerator and a tLogRow component from the Palette to the design workspace.
Right-click tRowGenerator and select Row > Main. Drag this main row link onto the tLogRow component
and release when the plug symbol displays.
Double click tRowGenerator to open the Editor.
Define the fields to be generated.
The random ID column is of integer type, the First and Last names are of string type and the Date is of date type.
In the Function list, select the relevant function, or click the three-dot button to set a custom function.
On the Function parameters tab, define the Values to be randomly picked up.
First_Name and Last_Name columns are to be generated using the getAsciiRandomString function that is predefined in the system routines. By default, the defined length is 6 characters. You can change this if need be.
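For the curious, a random ASCII string routine of this kind boils down to picking characters at random up to the requested length. The sketch below is illustrative and is not the source of the getAsciiRandomString system routine:

import java.util.Random;

public class RandomStringSketch {
    private static final Random RANDOM = new Random();

    // Build a random string of ASCII letters of the given length.
    public static String asciiRandomString(int length) {
        String letters = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
        StringBuilder builder = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            builder.append(letters.charAt(RANDOM.nextInt(letters.length())));
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        System.out.println(asciiRandomString(6)); // 6 is the default length mentioned above
    }
}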
The Date column calls the predefined getRandomDate function. You can edit the parameter values in the
Function parameters tab.
Set the Number of Rows to be generated to 50.
Click OK to validate the setting.
Double click tLogRow to view the Basic settings. The default setting is retained for this Job.
Press F6 to run the Job.
The 50 rows are generated according to the settings defined in the tRowGenerator editor, and the output is displayed in the Run console.
Orchestration components
This chapter details the main components that you can find in Orchestration family of the Palette in the
Integration perspective of Talend Studio.
The Orchestration family groups together components that help you to sequence or orchestrate tasks or processing
in your Jobs or subjobs and so on.
For Talend Open Studio for Big Data, the Property type, Schema and Query Type of components are always Built-in. For
further information about how to edit a Built-in schema, see Talend Studio User Guide.
tFileList
tFileList belongs to two component families: File and Orchestration. For more information on tFileList, see
section tFileList.
tFlowToIterate
tFlowToIterate Properties
Component family
Orchestration
Function
Purpose
This component is used to read data line by line from the input flow and store the data entries
in iterative global variables.
Basic settings
Use the default (key, value) in global variables
When selected, the system uses the default (key, value) of the global variables in the current Job.
Customize
key: Type in a name for the new global variable. Press Ctrl+Space
to access all available variables either global or user-defined.
value: Click in the cell to access a list of the columns attached to
the defined global variable.
Usage
You cannot use this component as a start component. tFlowToIterate requires an output
component.
Global Variables
Connections
Limitation
n/a
Drop the following components from the Palette onto the design workspace: two tFileInputDelimited
components, a tFlowToIterate, and a tLogRow.
2. Connect the first tFileInputDelimited to tFlowToIterate using a Row > Main link, tFlowToIterate to the second tFileInputDelimited using an Iterate link, and the second tFileInputDelimited to tLogRow using a Row > Main link.
2. Click the [...] button next to the File Name field to select the path to the input file.
The File Name field is mandatory.
The input file used in this scenario is Customers.txt. It is a text file that contains a list of the names of three other simple text files: Name.txt, E-mail.txt and Address.txt. The first text file, Name.txt, is made of one column holding customers' names. The second text file, E-mail.txt, is made of one column holding customers' e-mail addresses. The third text file, Address.txt, is made of one column holding customers' postal addresses.
Fill in all other fields as needed. For more information, see section tFileInputDelimited properties. In this
scenario, the header and the footer are not set and there is no limit for the number of processed rows.
3. Click Edit schema to describe the data structure of this input file. In this scenario, the schema is made of one column, FileName.
4. Click the plus button to add new parameter lines and define your variables, and click in the key cell to enter the variable name as desired. In this scenario, one variable is defined: "Name_of_File".
Alternatively, you can select the Use the default (key, value) in global variables check box to use the default
in global variables.
5. In the File name field, enter the directory of the files to be read, and then press Ctrl+Space to select the global variable "Name_of_File". In this scenario, the syntax is as follows:
"C:/scenario/flow_to_iterate/"+((String)globalMap.get("Name_of_File"))
Click Edit schema to define the schema column name. In this scenario, it is RowContent.
Fill in all other fields as needed. For more information, see section tFileInputDelimited properties.
6. In the design workspace, select the last component, tLogRow, and click the Component tab to define its basic settings.
Define your settings as needed. For more information, see section tLogRow properties.
Customers' names, e-mail addresses, and postal addresses appear on the console, preceded by the schema column name.
tForeach
tForeach Properties
Component Family
Orchestration
Function
Purpose
Basic settings
Values
Use the [+] button to add rows to the Values table. Then click on the fields
to enter the list values to be iterated upon, between double quotation marks.
Advanced settings
tStatCatcher
Statistics
Select this check box to collect the log data at a component level.
Usage
tForeach is an input component and requires an Iterate link to connect it to another component.
Limitation
n/a
Click the [+] button to add as many rows to the Values list as required.
Click on the Value fields to enter the list values, between double quotation marks.
Double-click tJava to open its Basic settings view:
Enter the following Java code in the Code area:
System.out.println(globalMap.get("tForeach_1_CURRENT_VALUE")+"_out");
tInfiniteLoop
tInfiniteLoop Properties
Component Family
Orchestration
Function
Purpose
Basic settings
Advanced settings
tStatCatcher Statistics
Select this check box to collect the log data at a component level.
Usage
tInfiniteLoop is an input component and requires an Iterate link to connect it to the following component.
Global Variables
Connections
Limitation
n/a
Related scenario
For an example of the kind of scenario in which tInfiniteLoop might be used, see section Scenario: Job execution in a loop, regarding the tLoop component.
tIterateToFlow
tIterateToFlow Properties
Component family
Orchestration
Function
Purpose
Basic settings
Mapping
Advanced settings
tStatCatcher Statistics
Select this check box to collect the log data at a component level.
Usage
This component is not startable (green background) and it requires an output component.
Connections
Drop the following components: tFileList, tIterateToFlow and tLogRow from the Palette to the design
workspace.
Connect the tFileList to the tIterateToFlow using an Iterate link, and connect the Job to the tLogRow using a Row > Main connection.
In the tFileList Component view, set the directory where the list of files is stored.
In this example, the files are three simple .txt files held in one directory: Countries.
No need to care about the case, hence clear the Case sensitive check box.
Leave the Include Subdirectories check box unchecked.
Then select the tIterateToFlow component and click Edit schema to set the new schema.
Add two new columns: Filename of String type and Date of date type. Make sure you define the correct pattern
in Java.
Click OK to validate.
Notice that the newly created schema shows on the Mapping table.
In each cell of the Value field, press Ctrl+Space bar to access the list of global and user-specific variables.
For the Filename column, use the global variable tFileList_1_CURRENT_FILEPATH. It retrieves the current file path in order to catch the name of each file the Job iterates on.
For the Date column, use the Talend routine TalendDate.getCurrentDate() (in Java).
Then on the tLogRow component view, select the Print values in cells of a table check box.
Save your Job and press F6 to execute it.
The file path displays in the Filename column and the current date displays in the Date column.
tLoop
tLoop Properties
Component family
Orchestration
Function
Purpose
Basic settings
Loop Type
For
While
From
Type in the first instance number which the loop should start from. A start
instance number of 2 with a step of 2 means the loop takes on every even
number instance.
To
Type in the last instance number which the loop should finish with.
Step
Type in the step the loop should be incremented of. A step of 2 means
every second instance.
Declaration
Condition
Type in the condition that should be met for the loop to end.
Iteration
Values are increasing
Select this check box to only allow an increasing sequence. Deselect this check box to only allow a decreasing sequence.
Usage
tLoop is to be used as a start component and can only be used with an Iterate connection to the next component.
Global Variables
Current value: Indicates the current value. This is available as a Flow
variable.
Returns an integer.
Current iteration: Indicates the number of the current iteration. This is
available as a Flow variable
Returns an integer.
The CURRENT_VALUE variable is available only in case of a For type
loop.
For further information about variables, see Talend Studio User Guide.
Connections
Limitation
n/a
In the parent Job, drop a tLoop, a tRunJob and a tSleep component from the Palette to the design workspace.
Connect the tLoop to the tRunJob using an Iterate connection.
Then connect the tRunJob to a tSleep component using a Row connection.
In the child Job, drop the following components the same way: tPOP, tFileInputMail and tLogRow.
On the Basic settings panel of the tLoop component, type in the instance number to start from (1), the instance number to finish with (5), and the step (1), as sketched below.
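With these settings, the tLoop drives the tRunJob and tSleep pair exactly like a plain Java for loop from 1 to 5 with a step of 1; the child Job call is represented here by a print statement:

public class LoopSketch {
    public static void main(String[] args) throws InterruptedException {
        // From 1, To 5, Step 1 - as set in the tLoop Basic settings.
        for (int i = 1; i <= 5; i++) {
            // tRunJob: execute the child Job popinputmail.
            System.out.println("iteration " + i + ": running child Job popinputmail");

            // tSleep: pause for 3 seconds between executions.
            Thread.sleep(3000);
        }
    }
}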
On the Basic settings panel of the tRunJob component, select the child Job in the list of stored Jobs offered.
In this example: popinputmail
Select the context if relevant. In this use case, the context is default with no variables stored.
In the tSleep Basic settings panel, type in the time-off value in seconds. In this example, type in 3 seconds in the Pause field.
Then, in the child Job, define the connection parameters to the POP server on the Basic settings panel.
In the tFileInputMail Basic settings panel, select a global variable as File Name, to collect the current file in
the directory defined in the tPOP component. Press Ctrl+Space bar to access the variable list. In this example,
the variable to be used is: ((String)globalMap.get("tPOP_1_CURRENT_FILEPATH"))
Define the Schema, for it to include the mail element to be processed, such as author, topic, delivery date and
number of lines.
In the Mail Parts table, type in the corresponding mail part for each column defined in the schema, e.g. author comes from the From part of the email file.
Then connect the tFileInputMail to a tLogRow to check out the execution result on the Run view.
Press F6 to run the Job.
tPostjob
tPostjob Properties
Component family
Orchestration
Function
Purpose
Usage
tPostjob is a start component and can only be used with an iterate connection to the next
component.
Connections
Limitation
n/a
For more information about the tPostjob component, see Talend Studio User Guide.
Related scenarios
For a scenario that uses the tPostjob component, see section Scenario: Handling files before and after the
execution of a data Job.
tPrejob
tPrejob properties
Component family
Orchestration
Function
Purpose
Usage
tPrejob is a start component and can only be used with an iterate connection to the next
component.
Connections
Limitation
n/a
For more information about the tPrejob component, see Talend Studio User Guide.
Scenario: Handling files before and after the execution of a data Job
With the main Job open on the design workspace, add a tPrejob, a tPostjob, a tFileDelete, and two tFileCopy
components to the Job.
2. Link the tPrejob component to the first tFileCopy component using a Trigger > On Component Ok connection to build the pre-job.
3. Link the tPostjob component to the tFileDelete component using a Trigger > On Component Ok connection, and link the tFileDelete component to the other tFileCopy component to build the post-job.
4.
2.
Fill the File Name field with the path and filename of the temporary text file to be renamed, D:/temp/
tempdata.csv in this example.
3.
In the Destination directory field, specify or browse to the destination directory. In this example, we will save the backup copy in the same directory, D:/temp/.
4.
Select the Rename check box, and specify the new filename in the Destination filename field, backup-tempdata.csv. Leave the other parameters as they are.
In the Basic settings view of the tFileDelete component, fill the File Name field with the path and filename
of the temporary file to be deleted, D:/temp/tempdata.csv in this example, and leave the other parameters
as they are.
2.
3.
Fill the File Name field with the path and filename of the backup file, D:/temp/backup-tempdata.csv in this
example.
4.
In the Destination directory field, specify or browse to the destination directory, D:/temp/ in this example.
5.
Select the Rename check box, and specify the original name of the temporary file in the Destination filename field, tempdata.csv.
6.
Select the Remove source file check box to remove the backup file after the renaming action. Leave the
other parameters as they are.
If the temporary file does not exist, the two tFileCopy components will generate an error, but this does not prevent
the main data Job from being executed.
For the execution result of the main data Job, see section Scenario 2: Finding duplicate files between two folders.
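For readers who prefer to see the file handling itself, the following plain Java sketch (java.nio.file) mirrors what the pre-job and post-job parts do in this scenario. It is an illustration only, using the paths from the example, and is not the code generated by the Studio.

// Illustrative sketch only: the pre-job/post-job file handling expressed in plain Java.
import java.nio.file.*;

public class PrePostJobSketch {
    public static void main(String[] args) throws Exception {
        Path temp = Paths.get("D:/temp/tempdata.csv");
        Path backup = Paths.get("D:/temp/backup-tempdata.csv");

        // Pre-job: back up the temporary file (first tFileCopy with Rename).
        Files.copy(temp, backup, StandardCopyOption.REPLACE_EXISTING);

        // ... the main data Job would run here ...

        // Post-job: delete the temporary file, then restore it from the backup
        // (tFileDelete followed by the second tFileCopy with Remove source file).
        Files.deleteIfExists(temp);
        Files.move(backup, temp, StandardCopyOption.REPLACE_EXISTING);
    }
}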
tReplicate
tReplicate Properties
Component family
Orchestration
Function
Purpose
Basic settings
Usage
This component is not startable (green background); it requires an input component and an output component.
If you have subscribed to one of the Talend solutions with Big Data, you can also use this
component as a Map/Reduce component. In a Talend Map/Reduce Job, this component is
used as an intermediate step and other components used along with it must be Map/Reduce
components, too. They generate native Map/Reduce code that can be executed directly in
Hadoop.
For further information about a Talend Map/Reduce Job, see the sections describing how to
create, convert and configure a Talend Map/Reduce Job of the Talend Open Studio for Big Data
Getting Started Guide.
Note that in this documentation, unless otherwise explicitly stated, a scenario presents only
Standard Jobs, that is to say traditional Talend data integration Jobs, and non Map/Reduce Jobs.
Connections
Drop the following components from the Palette to the design workspace: one tFileInputDelimited
component, one tReplicate component, two tSortRow components, and two tLogRow components.
2.
3.
Repeat the step above to connect tReplicate to two tSortRow components respectively and connect
tSortRow to tLogRow.
4.
Double-click the tFileInputDelimited component to open its Basic settings view in the Component tab.
2.
Click the [...] button next to the File name/Stream field to browse to the file from which you want to read the
input flow. In this example, the input file is Names&States.csv, which contains two columns: name and state.
name;state
Andrew Kennedy;Mississippi
Benjamin Carter;Louisiana
Benjamin Monroe;West Virginia
Bill Harrison;Tennessee
Calvin Grant;Virginia
Chester Harrison;Rhode Island
Chester Hoover;Kansas
Chester Kennedy;Maryland
Chester Polk;Indiana
Dwight Nixon;Nevada
Dwight Roosevelt;Mississippi
Franklin Grant;Nebraska
3.
Fill in the Header, Footer and Limit fields according to your needs. In this example, type in 1 in the Header
field to skip the first row of the input file.
4.
Click Edit schema to define the data structure of the input flow.
5.
Double-click the first tSortRow component to open its Basic settings view.
6.
In the Criteria panel, click the [+] button to add one row and set the sorting parameters for the schema
column to be processed. To sort the input data by name, select name under Schema column. Select alpha
as the sorting type and asc as the sorting order.
For more information about those parameters, see section tSortRow properties.
7.
Double-click the second tSortRow component and repeat the step above to define the sorting parameters
for the state column.
8.
In the Basic settings view of each tLogRow component, select Table in the Mode area for a better view
of the Job execution result.
2.
The data sorted by name and state are both displayed on the console.
tRunJob
tRunJob belongs to two component families: System and Orchestration. For more information on tRunJob, see
section tRunJob.
tSleep
tSleep Properties
Component family
Orchestration
Function
Purpose
Allows you to identify possible bottlenecks using a time break in the Job for testing or tracking purposes. In production, it can be used for any needed pause in the Job, to feed the input flow for example.
Basic settings
Usage
tSleep component is generally used as a middle component to make a break/pause in the Job,
before resuming the Job.
Connections
Limitation
n/a
Related scenarios
For use cases in relation with tSleep, see section Scenario: Job execution in a loop.
tUnite
tUnite Properties
Component family
Orchestration
Function
Purpose
Basic settings
Advanced settings
tStatCatcher Statistics
Usage
This component is not startable and requires one or several input components and an output
component.
Global Variables
Select this check box to collect log data at the component level.
Connections
Limitation
n/a
Drop the following components onto the design workspace: tFileList, tFileInputDelimited, tUnite and
tLogRow.
2.
Connect the tFileList to the tFileInputDelimited using an Iterate connection and connect the other components using a Row main link.
In the tFileList Basic settings view, browse to the directory, where the files to merge are stored.
The files are pretty basic and contain a list of countries and their respective score.
2.
In the Case Sensitive field, select Yes to consider the letter case.
3.
Select the tFileInputDelimited component, and display this component's Basic settings view.
4.
Fill in the File Name/Stream field by using the Ctrl+Space bar combination to access the variable
completion list, and selecting tFileList.CURRENT_FILEPATH from the global variable list to process all
files from the directory defined in the tFileList.
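When picked from the completion list, this variable typically appears in the field as a globalMap expression of the following form, assuming the component is named tFileList_1:

((String)globalMap.get("tFileList_1_CURRENT_FILEPATH"))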
5.
Click the Edit Schema button and manually set the two-column schema to reflect the content of the input files.
For this example, the two columns are Country and Points. They are both nullable. The Country column is of String type and the Points column is of Integer type.
6.
Click OK to validate the setting and accept to propagate the schema throughout the Job.
7.
Then select the tUnite component and display the Component view. Notice that the output schema strictly
reflects the input schema and is read-only.
8.
In the Basic settings view of tLogRow, select the Table option to display properly the output values.
2.
Press F6, or click Run on the Run console to execute the Job.
The console shows the data from the various files, merged into one single table.
tWaitForFile
tWaitForFile properties
Component family
Orchestration
Function
tWaitForFile component iterates on a given folder for file insertion or deletion then triggers a
subjob to be executed when the condition is met.
Purpose
This component allows a subjob to be triggered given a condition linked to file presence or
removal.
Basic settings
Time (in seconds) between iterations: Set the time interval in seconds between each check for the file.
Max. number of iterations (infinite loop if empty): Number of checks for the file before the Job times out.
Directory to scan
File mask
Include subdirectories
Case sensitive
Then: Select the action to be carried out: either stop the iterations when the condition is met (exit loop) or continue the loop until the end of the max iteration number (continue loop).
Advanced Settings
Usage
This component plays the role of the start (or trigger) component of the subjob which gets
executed under the condition described. Therefore this component requires a subjob to be
connected to via an Iterate link.
Global Variables
Select this check box so that the subjob only triggers after the
file insertion/update/removal operation is complete. In case the
operation is incomplete, the subjob will not trigger.
Returns a string.
Created File Name: Indicates the name and path to a newly
created file which activated the trigger. This is available as a Flow
variable.
Returns a string.
Updated File: Indicates the name and path to a file which has been
updated, thereby activating the trigger. This is available as a Flow
variable.
Returns a string.
File Name: Indicates the name of a file which has been created,
deleted or updated, thereby activating the trigger. This is available
as a Flow variable.
Returns a string.
Not Updated File Name: Indicates the names of files which have
not been updated, thereby activating the trigger. This is available
as a Flow variable.
Returns a string.
For further information about variables, see Talend Studio User
Guide.
Connections
Limitation
n/a
This use case only requires two components from the Palette: tWaitForFile and tMsgBox.
Click and place these components on the design workspace and connect them using an Iterate link to implement
the loop.
Then select the tWaitForFile component, and on the Basic Settings view of the Component tab, set the
condition and loop properties:
In the Time (in seconds) between iterations field, set the time in seconds you want to wait before the next iteration starts. In this example, the directory will be scanned every 5 seconds.
In the Max. number of iterations (infinite loop if empty) field, fill in the maximum number of iterations you want to have before the whole Job is forced to end. In this example, the directory will be scanned a maximum of 5 times.
In the Directory to scan field, type in the path to the folder to scan.
In the Trigger action when field, select the condition to be met, for the subjob to be triggered. In this use case,
the condition is a file is deleted (or moved) from the directory.
In the Then field, select the action to be carried out when the condition is met before the defined number of iterations is reached. In this use case, as soon as the condition is met, the loop should be ended.
Then set the subjob to be executed when the condition set is met. In this use case, the subjob simply displays
a message box.
Select the tMsgBox component, and on the Basic Setting view of the Component tab, set the message to be
displayed.
Fill out the Title and Message fields.
Select the type of Buttons and the Icon
In the Message field, you can write any type of message you want to display and use global variables available
in the auto-completion list via Ctrl+Space combination.
The message is:
"Deleted file: "+((String)globalMap.get("tWaitForFile_1_DELETED_FILE"))+"
on iteration Nr:"+((Integer)globalMap.get("tWaitForFile_1_CURRENT_ITERATION"))
Then execute the Job via the F6 key. While the loop is executing, remove a file from the location defined. The
message pops up and shows the defined message.
tWaitForSocket
tWaitForSocket properties
Component Family
Orchestration
Function
tWaitForSocket component makes a loop on a defined port, to look for data, and triggers a
subjob when the condition is met.
Purpose
Basic settings
Port
Then
Advanced settings
tStatCatcher Statistics
Select this check box to collect the log data at a component level.
Usage
This is an input, trigger component for the subjob executed depending on the condition set.
Hence, it needs to be connected to a subjob via an Iterate link.
Global Variables
Client input data: Returns the data transmitted by the client. This
is available as a Flow variable.
Returns a string.
For further information about variables, see Talend Studio User
Guide.
Connections
Limitation
n/a
Related scenario
No scenario is available for this component yet.
tWaitForSqlData
tWaitForSqlData properties
Component family
Orchestration
Function
Purpose
This component allows a subjob to be triggered given a condition linked to SQL data presence.
Basic settings
Wait at each iteration (in seconds): Set the time interval in seconds between each check for the SQL data.
Max. iterations (infinite if empty): Number of checks for the SQL data before the Job times out.
Use an existing connection / Component List: A connection needs to be open to allow the loop to check for SQL data on the defined DB.
When a Job contains the parent Job and the child Job, Component list presents only the connection components in the same Job level, so if you need to use an existing connection from the other level, you can do either of the following:
From the available database connection components in the level where the current component is, select the Use or register a shared DB connection check box. For more information about this check box, see Databases - traditional components, Databases - appliance/datawarehouse components, or Databases - other components for the connection components according to the database you are using.
Otherwise, still in the level of the current component, deactivate the connection components and use the Dynamic settings of the component to specify the intended connection manually. In this case, make sure the connection name is unique and distinctive throughout the two Job levels. For more information about Dynamic settings, see Talend Studio User Guide.
Table to scan
Trigger action when rowcount is
Value
Then: Select the action to be carried out: either stop the iterations when the condition is met (exit loop) or continue the loop until the end of the max iteration number (continue loop).
Usage
Although this component requires a Connection component to open the DB access, it also plays the role of the start (or trigger) component of the subjob which gets executed under the condition described. Therefore, this component requires a subjob to be connected to it via an Iterate link.
Global Variables
CURRENT_ITERATION: Returns the number of the current iteration. This is a Flow variable and it returns an integer.
ROW_COUNT: Indicates the number of records detected in the table. This is a Flow variable and it returns an integer.
n/a
This scenario describes a Job reading a DB table and waiting for data to be put in this table in order for a subjob
to be executed. When the condition of the data insertion in the table is met, then the subjob performs a Select* on
the table and simply displays the content of the inserted data onto the standard console.
Drop the following components from the Palette onto the design workspace: tMySqlConnection,
tWaitForSqlData, tMysqlInput, tLogRow.
Connect the tMysqlConnection component to the tWaitForSqlData using an OnSubjobOK link, available on the right-click menu.
Then connect the tWaitForSqlData component to the subjob using an Iterate link as no actual data is
transferred in this part. Indeed, simply a loop is implemented by the tWaitForSqlData until the condition is met.
On the subjob to be executed if the condition is met, a tMysqlInput is connected to the standard console
component, tLogRow. As the connection passes on data, use a Row main link.
Now, set the connection to the table to check at regular intervals. On the Basic Settings view of the
tMySqlConnection Component tab, set the DB connection properties.
Fill out the Host, Port, Database, Username, Password fields to open the connection to the Database table.
Select the relevant Encoding if needed.
Then select the tWaitForSqlData component, and on the Basic Setting view of the Component tab, set its
properties.
In the Wait at each iteration field, set the time in seconds you want to wait before the next iteration starts.
In the Max iterations field, fill in the maximum number of iterations you want to have before the whole Job is forced to end.
The tWaitForSqlData component requires a connection to be open in order to loop on the defined number of iterations. Select the relevant connection (if several) in the Component List combo box.
In the Table to scan field, type in the name of the table in the DB to scan. In this example: test_datatypes.
In the Trigger action when rowcount is and Value fields, select the condition to be met, for the subjob to be
triggered. In this use case, the number of rows in the scanned table should be greater or equal to 1.
In the Then field, select the action to be carried out when the condition is met before the defined number of iterations is reached. In this use case, as soon as the condition is met, the loop should be ended.
Then set the subjob to be executed when the condition set is met. In this use case, the subjob simply selects the
data from the scanned table and displays it on the console.
Select the tMySqlInput component, and on the Basic Setting view of the Component tab, set the connection
to the table.
If the connection is set in the Repository, select the relevant entry on the list. Or alternatively, select the Use an
existing connection check box and select the relevant connection component on the list.
In this use case, the schema corresponding to the table structure is stored in the Repository.
Fill out the Table Name field with the table the data is extracted from, Test_datatypes.
Then in the Query field, type in the Select statement to extract the content from the table.
No particular setting is required in the tLogRow component for this use case.
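As an illustration only, and assuming the table used in this example, the Query field could simply contain:

"SELECT * FROM test_datatypes"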
Then before executing the Job, make sure the table to scan (test_datatypes) is empty, so that the condition (greater than or equal to 1) is only met once data is inserted. Then execute the Job by pressing the F6 key on your keyboard. Before the end of the iterating loop, feed the test_datatypes table with one or more rows in order to meet the condition.
The Job ends when this table insert is detected during the loop, and the table content is thus displayed on the
console.
Processing components
This chapter details the main components that you can find in the Processing family of the Palette in the Integration perspective of Talend Studio.
The Processing family gathers together components that help you to perform all types of processing tasks on data
flows, including aggregation, mapping, transformation, denormalizing, filtering and so on.
For Talend Open Studio for Big Data, the Property type, Schema and Query Type of components are always Built-in. For
further information about how to edit a Built-in schema, see Talend Studio User Guide.
tAggregateRow
tAggregateRow properties
Component family
Processing
Function
tAggregateRow receives a flow and aggregates it based on one or more columns. For each output line, the aggregation key and the relevant result of set operations (min, max, sum...) are provided.
Purpose
Basic settings
Group by
Define the aggregation sets, the values of which will be used for
calculations.
Output Column: Select the column label in the list offered based
on the schema structure you defined. You can add as many output
columns as you wish to make more precise aggregations.
Ex: Select Country to calculate an average of values for each country of a list, or select Country and Region if you want to compare one country's regions with another country's regions.
Input Column: Match the input column label with your output
columns, in case the output label of the aggregation set needs to
be different.
Operations
Select the type of operation along with the value to use for the
calculation and the output field.
Output Column: Select the destination field in the list.
Function: Select the operator among: count, min, max, avg, sum,
first, last, list, list(objects), count(distinct), standard deviation.
Input column: Select the input column from which the values are
taken to be aggregated.
Ignore null values: Select the check boxes corresponding to the
names of the columns for which you want the NULL value to be
ignored.
Advanced settings
Delimiter (only for list operation): Enter the delimiter you want to use to separate the different operations.
Use financial precision, this is the max precision for sum and avg operations, checked option heaps more memory and slower than unchecked: Select this check box to use financial precision. This is a max precision but it consumes more memory and slows down the processing. We advise you to use the BigDecimal type for the output in order to obtain precise results.
Check type overflow (slower): Checks the type of data to ensure that the Job doesn't crash.
Check ULP (Unit in the Last Place), ensure that a ...: Select this check box to ensure the most precise results possible for the Float and Double types.
Usage
This component handles flow of data therefore it requires input and output, hence is defined
as an intermediary step. Usually the use of tAggregateRow is combined with the tSortRow
component.
If you have subscribed to one of the Talend solutions with Big Data, you can also use this
component as a Map/Reduce component. In a Talend Map/Reduce Job, this component is
used as an intermediate step and other components used along with it must be Map/Reduce
components, too. They generate native Map/Reduce code that can be executed directly in
Hadoop.
For further information about a Talend Map/Reduce Job, see the sections describing how to
create, convert and configure a Talend Map/Reduce Job of the Talend Open Studio for Big Data
Getting Started Guide.
Note that in this documentation, unless otherwise explicitly stated, a scenario presents only
Standard Jobs, that is to say traditional Talend data integration Jobs, and non Map/Reduce Jobs.
Limitation
n/a
From the File folder in the Palette, drop a tFileInputDelimited component to the design workspace.
Click the label and rename it as Countries. Or rename it from the View tab panel
In the Basic settings tab panel of this component, define the filepath and the delimitation criteria.
Click Edit schema... and set the columns: Countries and Points to match the file structure.
Then from the Processing folder in the Palette, drop a tAggregateRow component to the design workspace.
Rename it as Calculation.
Connect Countries to Calculation via a right-click and select Row > Main.
Double-click Calculation (tAggregateRow component) to set the properties. Click Edit schema and define the
output schema. You can add as many columns as you need to hold the set operations results in the output flow.
In this example, we'll calculate the average notation value per country and we will display the max and the min notation for each country, given that each country holds several notations. Click OK when the schema is complete.
To carry out the various set operations, back in the Basic settings panel, define the sets holding the operations
in the Group By area. In this example, select Country as the group-by column. Note that the output column needs to be defined as a key field in the schema. The first column mentioned as output column in the Group By table is
the main set of calculation. All other output sets will be secondary by order of display.
Select the input column which the values will be taken from.
Then fill in the various operations to be carried out. The functions are average, min, max for this use case.
Select the input columns, where the values are taken from and select the check boxes in the Ignore null values
list as needed.
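To make the configured operations concrete, the following plain Java sketch computes the same avg, min and max per country from a few hypothetical Country/Points rows. It only illustrates the logic and is not the code produced by tAggregateRow.

// Illustrative sketch only: average, min and max per country, in plain Java.
import java.util.*;
import java.util.stream.*;

public class AggregateSketch {
    record Row(String country, double points) {}

    public static void main(String[] args) {
        List<Row> rows = List.of(
                new Row("France", 8), new Row("France", 6), new Row("Spain", 9));

        Map<String, DoubleSummaryStatistics> byCountry = rows.stream()
                .collect(Collectors.groupingBy(Row::country,
                        Collectors.summarizingDouble(Row::points)));

        // One output line per group: the aggregation key plus avg, min and max.
        byCountry.forEach((country, stats) ->
                System.out.printf("%s;%.2f;%.1f;%.1f%n",
                        country, stats.getAverage(), stats.getMin(), stats.getMax()));
    }
}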
Drop a tSortRow component from the Palette onto the design workspace. For more information regarding this
component, see section tSortRow properties.
Connect the tAggregateRow to this new component using a row main link.
On the Component view of the tSortRow component, define the column the sorting is based on, the sorting
type and order.
In this case, the column to be sorted by is Country, the sort type is alphabetical and the order is ascending.
Drop a tFileOutputDelimited from the Palette to the design workspace and define it to set the output flow.
Connect the tSortRow component to this output component.
In the Component view, enter the output filepath. Edit the schema if need be. In this case, the delimited file is of CSV type. Select the Include Header check box to reuse the schema column labels in your output flow.
Press F6 to execute the Job. The CSV file thus created contains the aggregation result.
tAggregateSortedRow
tAggregateSortedRow properties
Component family
Processing
Function
tAggregateSortedRow receives a sorted flow and aggregates it based on one or more columns. For each output line, the aggregation key and the relevant result of set operations (min, max, sum...) are provided.
Purpose
Helps to provide a set of metrics based on values or calculations. As the input flow is meant to be sorted already, the performance is greatly optimized.
Basic settings
Group by
Define the aggregation sets, the values of which will be used for
calculations.
Output Column: Select the column label in the list offered based
on the schema structure you defined. You can add as many output
columns as you wish to make more precise aggregations.
Ex: Select Country to calculate an average of values for each country of a list, or select Country and Region if you want to compare one country's regions with another country's regions.
Input Column: Match the input column label with your output
columns, in case the output label of the aggregation set needs to be
different.
Operations
Select the type of operation along with the value to use for the
calculation and the output field.
Output Column: Select the destination field in the list.
Function: Select the operator among: count, min, max, avg, first, last.
Input column: Select the input column from which the values are
taken to be aggregated.
Ignore null values: Select the check boxes corresponding to the names
of the columns for which you want the NULL value to be ignored.
Advanced settings
tStatCatcher Statistics
Usage
This component handles flow of data therefore it requires input and output, hence is defined as an
intermediary step.
Limitation
n/a
Related scenario
For related use case, see section Scenario: Aggregating values and sorting data.
tConvertType
tConvertType properties
Component family
Processing
Function
tConvertType allows specific conversions at run time from one Talend java type to another.
Purpose
Helps to automatically convert one Talend java type to another and thus avoid compiling errors.
Basic settings
Auto Cast
Manual Cast
This mode is not visible if the Auto Cast check box is selected. It allows you to manually specify the columns where a Java type conversion is needed.
Set empty values to Null before converting: This check box is selected to set the empty values of String or Object type to null for the input data.
Die on error: This check box is selected to kill the Job when an error occurs. Not available for Map/Reduce Jobs.
Advanced settings
tStatCatcher Statistics
Usage
This component cannot be used as a start component as it requires an input flow to operate.
If you have subscribed to one of the Talend solutions with Big Data, you can also use this
component as a Map/Reduce component. In a Talend Map/Reduce Job, this component is
used as an intermediate step and other components used along with it must be Map/Reduce
components, too. They generate native Map/Reduce code that can be executed directly in
Hadoop.
For further information about a Talend Map/Reduce Job, see the sections describing how to
create, convert and configure a Talend Map/Reduce Job of the Talend Open Studio for Big Data
Getting Started Guide.
Note that in this documentation, unless otherwise explicitly stated, a scenario presents only
Standard Jobs, that is to say traditional Talend data integration Jobs, and non Map/Reduce Jobs.
Limitation
n/a
This Java scenario describes a four-component Job where the tConvertType component is used to convert Java
types in three columns, and a tMap is used to adapt the schema and have as an output the first of the three columns
and the sum of the two others after conversion.
Drop the following components from the Palette onto the design workspace: tConvertType, tMap, and
tLogRow.
2.
In the Repository tree view, expand Metadata and from File delimited drag the relevant node, JavaTypes
in this scenario, to the design workspace.
The [Components] dialog box displays.
3.
4.
2.
Set Property Type to Repository since the file details are stored in the repository. The fields to follow are
pre-defined using the fetched data.
For Talend Open Studio for Big Data, the Property type, Schema and Query Type of components are always Built-in. For further information about how to edit a Built-in schema, see Talend Studio User Guide.
The input file used in this scenario is called input. It is a text file that holds string, integer, and float java types.
Fill in all other fields as needed. For more information, see section tFileInputDelimited. In this scenario, the
header and the footer are not set and there is no limit for the number of processed rows.
3.
Click Edit schema to describe the data structure of this input file. In this scenario, the schema is made of
three columns, StringtoInteger, IntegerField, and FloatToInteger.
4.
5.
6.
Set Schema Type to Built in, and click Sync columns to automatically retrieve the columns from the
tFileInputDelimited component.
7.
Click Edit schema to describe manually the data structure of this processing component.
In this scenario, we want to convert a string type data into an integer type and a float type data into an integer
type.
Click OK to close the [Schema of tConvertType] dialog box.
8.
9.
In the Schema editor panel of the Map editor, click the plus button of the output table to add two rows and name them StringToInteger and Sum.
10. In the Map editor, drag the StringToInteger row from the input table to the StringToInteger row in the output
table.
11. In the Map editor, drag each of the IntegerField and the FloatToInteger rows from the input table to the Sum
row in the output table and click OK to close the Map editor.
12. In the design workspace, select tLogRow and click the Component tab to define its basic settings. For more
information, see section tLogRow.
2.
The string type data is converted into an integer type and displayed in the StringToInteger column on the
console. The float type data is converted into an integer and added to the IntegerField value to give the
addition result in the Sum column on the console.
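As an illustration only, the following plain Java sketch reproduces the conversions and the sum for the sample line used in this scenario; the truncation of 456.21 to 456 reflects a plain Java cast and is an assumption of this sketch, not an excerpt of the generated code.

// Illustrative sketch only: the conversions and the sum for the sample line "3;123;456.21".
public class ConvertTypeSketch {
    public static void main(String[] args) {
        String[] fields = "3;123;456.21".split(";");
        int stringToInteger = Integer.parseInt(fields[0]);        // String -> Integer
        int integerField    = Integer.parseInt(fields[1]);        // already an Integer
        int floatToInteger  = (int) Float.parseFloat(fields[2]);  // Float -> Integer (456.21 -> 456)
        int sum = integerField + floatToInteger;
        System.out.println(stringToInteger + ";" + sum);          // 3;579 under these assumptions
    }
}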
The sample data used in this scenario is the same as in the scenario explained earlier.
3;123;456.21
Since Talend Studio allows you to convert a Job between its Map/Reduce and Standard (Non Map/Reduce)
versions, you can convert the previous scenario to create this Map/Reduce Job. This way, many components used
can keep their original settings so as to reduce your workload in designing this Job.
Before starting to replicate this scenario, ensure that you have appropriate rights and permissions to access the
Hadoop distribution to be used. Then proceed as follows:
In the Repository tree view of the Integration perspective of Talend Studio, right-click the Job you have
created in the earlier scenario to open its contextual menu and select Edit properties.
Then the [Edit properties] dialog box is displayed. Note that the Job must be closed before you are able to
make any changes in this dialog box.
This dialog box looks like the image below:
Note that you can change the Job name as well as the other descriptive information about the Job from this
dialog box.
2.
Click Convert to Map/Reduce Job. Then a Map/Reduce Job using the same name appears under the Map/
Reduce Jobs sub-node of the Job Design node.
If you need to create this Map/Reduce Job from scratch, you have to right-click the Job Design node or the Map/
Reduce Jobs sub-node and select Create Map/Reduce Job from the contextual menu. Then an empty Job is
opened in the workspace. For further information, see the section describing how to create a Map/Reduce Job of
the Talend Open Studio for Big Data Getting Started Guide.
Double-click this new Map/Reduce Job to open it in the workspace. The Map/Reduce components' Palette is
opened accordingly and in the workspace, the crossed-out components, if any, indicate that those components
do not have the Map/Reduce version.
2.
Right-click each of those components in question and select Delete to remove them from the workspace.
3.
Drop a tHDFSInput component in the workspace. The tHDFSInput component reads data from the Hadoop
distribution to be used.
If from scratch, you have to drop tConvertType, tMap and tLogRow, too.
4.
Connect tHDFSInput to tConvertType using the Row > Main link and accept to get the schema of
tConvertType.
Click Run to open its view and then click the Hadoop Configuration tab to display its view for configuring
the Hadoop connection for this Job.
This view looks like the image below:
2.
From the Property type list, select Built-in. If you have created the connection to be used in Repository,
then select Repository and thus the Studio will reuse that set of connection information for this Job.
For further information about how to create an Hadoop connection in Repository, see the chapter describing
the Hadoop cluster node of the Talend Open Studio for Big Data Getting Started Guide.
3.
In the Version area, select the Hadoop distribution to be used and its version. If you cannot find from the list
the distribution corresponding to yours, select Custom so as to connect to a Hadoop distribution not officially
supported in the Studio.
For a step-by-step example about how to use this Custom option, see section Connecting to a custom Hadoop
distribution.
Note that if you use Hortonworks Data Platform V2.0.0, the type of the operating system for running the
distribution and a Talend Job must be the same, such as Windows or Linux.
4.
In the Name node field, enter the location of the master node, the NameNode, of the distribution to be used.
For example, hdfs://talend-cdh4-namenode:8020.
5.
In the Job tracker field, enter the location of the JobTracker of your distribution. For example, talend-cdh4-namenode:8021.
Note that the notion Job in this term JobTracker designates the MR or the MapReduce jobs described in
Apache's documentation on http://hadoop.apache.org/.
6.
If the distribution to be used requires Kerberos authentication, select the Use Kerberos authentication check
box and complete the authentication details. Otherwise, leave this check box clear.
If you need to use a Kerberos keytab file to log in, select Use a keytab to authenticate. A keytab file contains
pairs of Kerberos principals and encrypted keys. You need to enter the principal to be used in the Principal
field and the access path to the keytab file itself in the Keytab field.
Note that the user that executes a keytab-enabled Job is not necessarily the one a principal designates but
must have the right to read the keytab file being used. For example, the user name you are using to execute
a Job is user1 and the principal to be used is guest; in this situation, ensure that user1 has the right to read
the keytab file to be used.
7.
In the User name field, enter the login user name for your distribution. If you leave it empty, the user name
of the machine hosting the Studio will be used.
8.
In the Temp folder field, enter the path in HDFS to the folder where you store the temporary files generated
during Map/Reduce computations.
9.
Leave the default value of the Path separator in server as it is, unless you have changed the separator used
by your Hadoop distribution's host machine for its PATH variable or in other words, that separator is not a
colon (:). In that situation, you must change this value to the one you are using in that host.
10. Leave the Clear temporary folder check box selected, unless you want to keep those temporary files.
11. If the Hadoop distribution to be used is Hortonworks Data Platform V1.2 or Hortonworks Data Platform
V1.3, you need to set proper memory allocations for the map and reduce computations to be performed by
the Hadoop system.
In that situation, you need to enter the values you need to in the Mapred job map memory mb and
the Mapred job reduce memory mb fields, respectively. By default, the values are both 1000 which are
normally appropriate for running the computations.
For further information about this Hadoop Configuration tab, see the section describing how to configure the
Hadoop connection for a Talend Map/Reduce Job of the Talend Open Studio for Big Data Getting Started Guide.
Configuring components
Configuring tHDFSInput
1.
2.
Click the [...] button next to Edit schema to verify that the schema received in the earlier steps is properly defined.
Note that if you are creating this Job from scratch, you need to click the [...] button to manually define
the schema; otherwise, if the schema has been defined in Repository, you can select the Repository option
from the Schema list in the Basic settings view to reuse it. For further information about how to define a
schema in Repository, see the chapter describing metadata management in the Talend Studio User Guide or
the chapter describing the Hadoop cluster node in Repository of Talend Open Studio for Big Data Getting
Started Guide.
3.
If you make changes in the schema, click OK to validate these changes and accept the propagation prompted
by the pop-up dialog box.
4.
In the Folder/File field, enter the path, or browse to the source file you need the Job to read.
If this file is not in the HDFS system to be used, you have to place it in that HDFS, for example, using
tFileInputDelimited and tHDFSOutput in a Standard Job.
This component keeps both the Basic settings and Advanced settings used by the original Job. Therefore, as its original one does, it converts the string type and the float type into integer.
Reviewing tMap
Double-click tMap to open its editor. The mapping configuration remains as it is in the original Job, that
is to say, to output the converted StringtoInteger column and to make the sum of the IntegerField and the
FloatToInteger columns.
If you want to configure the presentation mode on its Component view, double-click the tLogRow component of interest to open the Component view, then, in the Mode area, select the Table (print values in cells of a table) option.
2.
During the execution, the Run view is automatically opened, where you can read how this Job progresses,
including the status of the Map/Reduce computation the Job is performing.
In the meantime in the workspace, progress bars automatically appear under the components performing Map/
Reduce to graphically show the same status of the Map/Reduce computation.
If you need to obtain more details about the Job, it is recommended to use the web console of the Jobtracker
provided by the Hadoop distribution you are using.
tDenormalize
tDenormalize Properties
Component family
Processing/Fields
Function
Purpose
Basic settings
To denormalize
Advanced settings
tStatCatcher Statistics
Usage
Select this check box to collect the log data at component level.
If you have subscribed to one of the Talend solutions with Big Data, you can also use this
component as a Map/Reduce component. In a Talend Map/Reduce Job, this component is
used as an intermediate step and other components used along with it must be Map/Reduce
components, too. They generate native Map/Reduce code that can be executed directly in
Hadoop.
For further information about a Talend Map/Reduce Job, see the sections describing how to
create, convert and configure a Talend Map/Reduce Job of the Talend Open Studio for Big Data
Getting Started Guide.
Note that in this documentation, unless otherwise explicitly stated, a scenario presents only
Standard Jobs, that is to say traditional Talend data integration Jobs, and non Map/Reduce Jobs.
Limitation
n/a
Drop the following components: tFileInputDelimited, tDenormalize, tLogRow from the Palette to the design
workspace.
Connect the components using Row main connections.
On the tFileInputDelimited Component view, set the filepath to the file to be denormalized.
In the Basic settings of tDenormalize, define the column that contains multiple values to be grouped.
In this use case, the column to denormalize is Children.
Set the Delimiter to separate the grouped values. Beware as only one column can be denormalized.
Select the Merge same value check box, if you know that some values to be grouped are strictly identical.
Save your Job and press F6 to execute it.
All values from the column Children (set as column to denormalize) are grouped by their Fathers column. Values
are separated by a comma.
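The grouping logic can be pictured with the following plain Java sketch, which joins the child values per father with a comma; the sample pairs are hypothetical and this is not the code generated by tDenormalize.

// Illustrative sketch only: group Children values by the Fathers column and join with a comma.
import java.util.*;
import java.util.stream.*;

public class DenormalizeSketch {
    public static void main(String[] args) {
        List<String[]> rows = List.of(
                new String[]{"Smith", "Anna"},
                new String[]{"Smith", "Tom"},
                new String[]{"Brown", "Lee"});

        Map<String, String> grouped = rows.stream().collect(Collectors.groupingBy(
                r -> r[0],                                            // Fathers column
                LinkedHashMap::new,
                Collectors.mapping(r -> r[1], Collectors.joining(",")))); // Children joined

        grouped.forEach((father, children) -> System.out.println(father + "|" + children));
    }
}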
Drop the following components: tFileInputDelimited, tDenormalize, tLogRow from the Palette to the design
workspace.
Connect all components using a Row main connection.
On the tFileInputDelimited Basic settings panel, set the filepath to the file to be denormalized.
Define the Row and Field separators, the Header and other information if required.
The file schema is made of four columns including: Name, FirstName, HomeTown, WorkTown.
In the tDenormalize component Basic settings, select the columns that contain the repetition. These are the columns which are meant to occur multiple times in the document. In this use case, FirstName, HomeCity and WorkCity are the columns against which the denormalization is performed.
Add as many lines to the table as you need using the plus button. Then select the relevant columns in the drop-down list.
In the Delimiter column, define the separator between double quotes, to split concatenated values. For the FirstName column, type in #, for HomeCity, type in , and for WorkCity, type in .
Save your Job and press F6 to execute it.
This time, the console shows the results with no duplicate instances.
tDenormalizeSortedRow
tDenormalizeSortedRow properties
Component family
Processing/Fields
Function
tDenormalizeSortedRow combines in a group all input sorted rows. Distinct values of the
denormalized sorted row are joined with item separators.
Purpose
Basic settings
To denormalize
Advanced settings
tStatCatcher Statistics
Select this check box to collect the log data at component level.
Usage
This component handles flows of data therefore it requires input and output components.
Limitation
n/a
Click the Component tab to define the basic settings for tFileInputDelimited.
If needed, define row and field separators, header and footer, and the number of processed rows.
Set Schema to Built in and click the three-dot button next to Edit Schema to define the data to pass on to the
next component. The schema in this example consists of two columns, id and name.
Set the Schema Type to Built-In and click Sync columns to retrieve the schema from the tFileInputDelimited
component.
In the Criteria panel, use the plus button to add a line and set the sorting parameters for the schema column to
be processed. In this example we want to sort the id columns in ascending order.
In the design workspace, select tDenormalizeSortedRow.
Click the Component tab to define the basic settings for tDenormalizeSortedRow.
Set the Schema Type to Built-In and click Sync columns to retrieve the schema from the tSortRow component.
In the Input rows count field, enter the number of the input rows to be processed or press Ctrl+Space to access the context variable list and select the variable: tFileInputDelimited_1_NB_LINE.
In the To denormalize panel, use the plus button to add a line and set the parameters for the column to be denormalized. In this example, we want to denormalize the name column.
In the design workspace, select tLogRow and click the Component tab to define its basic settings. For more
information about tLogRow, see section tLogRow.
Save your Job and press F6 to execute it.
The result displayed on the console shows how the name column was denormalized.
tExternalSortRow
tExternalSortRow properties
Component family
Processing
Function
Uses an external sort application to sort input data based on one or several columns, by sort type
and order
Purpose
Basic settings
File Name
Field separator
External command sort path: Enter the path to the external file containing the sorting algorithm to use.
Criteria
Click the plus button to add as many lines as required for the sort
to be complete. By default the first column defined in your schema
is selected.
Schema column: Select the column label from your schema,
which the sort will be based on. Note that the order is essential as
it determines the sorting priority.
Sort type: Numerical and Alphabetical order are proposed. More
sorting types to come.
Order: Ascending or descending order.
Advanced settings
Maximum memory
Temporary directory
Set temporary input file directory: Select the check box to activate the field in which you can specify the directory to handle your temporary input file.
Add a dummy EOF line
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at the
Job level as well as at each component level.
Usage
This component handles flow of data therefore it requires input and output, hence is defined as
an intermediary step.
Limitation
n/a
Related scenario
For related use case, see section tSortRow.
tExtractDelimitedFields
tExtractDelimitedFields properties
Component family
Processing/Fields
Function
Purpose
tExtractDelimitedFields helps to extract fields from within a string to write them elsewhere
for example.
Basic settings
Field to split
Ignore NULL as the source data: Select this check box to ignore the Null value in the source data. Clear this check box to generate the Null records that correspond to the Null value in the source data.
Field separator
Die on error
This check box is selected by default. Clear the check box to skip
the row on error and complete the process for error-free rows. If
needed, you can retrieve the rows on error via a Row > Reject
link.
Advanced settings
Advanced separator (for number): Select this check box to modify the separators used for numbers.
Trim column: Select this check box to remove leading and trailing whitespace from all columns.
Check each row structure against schema: Select this check box to synchronize every row against the input schema.
Validate date: Select this check box to check the date format strictly against the input schema.
tStatCatcher Statistics: Select this check box to gather the processing metadata at the Job level as well as at each component level.
Global Variables
NB_LINE: Indicates the number of rows read by an input component or transferred to an output component. This is an After variable and it returns an integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and
choose the variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable
means it functions after the execution of a component.
Usage
This component handles a flow of data; therefore, it requires input and output components. It allows you to extract data from a delimited field, using a Row > Main link, and enables you to create a reject flow to filter out data whose type does not match the defined type.
Limitation
n/a
Drop the following components from the Palette onto the design workspace: tFileInputDelimited,
tExtractDelimitedFields, and tLogRow.
2.
2.
3.
Click the [...] button next to the File Name field to select the path to the input file.
The File Name field is mandatory.
The input file used in this scenario is called test5. It is a text file that holds comma-delimited data.
4.
In the Basic settings view, fill in all other fields as needed. For more information, see section
tFileInputDelimited. In this scenario, the header and the footer are not set and there is no limit for the number
of processed rows
5.
Click Edit schema to describe the data structure of this input file. In this scenario, the schema is made of
one column, name.
6.
7.
From the Field to split list, select the column to split, name in this scenario.
8.
9.
Click Edit schema to describe the data structure of this processing component.
10. In the output panel of the [Schema of tExtractDelimitedFields] dialog box, click the plus button to add two
columns for the output schema, firstname and lastname.
In this scenario, we want to split the name column into two columns in the output flow, firstname and
lastname.
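As a plain Java illustration of this split (assuming a comma separator, in line with the comma-delimited input file, and a hypothetical value of the name column):

// Illustrative sketch only: splitting a delimited field into two output columns.
public class ExtractDelimitedSketch {
    public static void main(String[] args) {
        String name = "John,Doe";            // hypothetical value of the name column
        String[] parts = name.split(",");
        String firstname = parts[0];
        String lastname  = parts[1];
        System.out.println(firstname + " | " + lastname);
    }
}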
11. Click OK to close the [Schema of tExtractDelimitedFields] dialog box.
12. In the design workspace, select tLogRow and click the Component tab to define its basic settings. For more
information, see section tLogRow.
2.
tExtractEBCDICFields
tExtractEBCDICFields properties
Component family
Processing/Fields
Function
Purpose
tExtractEBCDICFields allows you to use regular expressions to extract data from a formatted
string.
Basic settings
Sync columns
Advanced settings
Field
Die on error
This check box is selected by default. Clear the check box to skip
the row on error and complete the process for error-free rows. If
needed, you can retrieve the rows on error via a Row > Reject
connection.
Encoding
Select the encoding type from the list or select Custom and define
it manually. This field is compulsory for DB data handling.
tStatCatcher Statistics
Select this check box to gather the processing metadata at the Job
level as well as at each component level.
Usage
This component handles a flow of data; therefore, it requires input and output components. It allows you to extract data from a delimited field, using a Row > Main link, and enables you to create a reject flow to filter out data whose type does not match the defined type.
Limitation
n/a
Related scenario
For a related scenario, see section Scenario: Extracting name, domain and TLD from e-mail addresses.
tExtractJSONFields
tExtractJSONFields properties
Component family
Processing/Fields
Function
tExtractJSONFields extracts the desired data from incoming JSON fields based on the XPath
query.
If you have subscribed to one of the Talend solutions with Big Data, you are able to use
this component in a Talend Map/Reduce Job to generate Map/Reduce code. In that situation,
tExtractJSONFields belongs to the MapReduce component family.
Purpose
tExtractJSONFields extracts the data from JSON fields stored in a file, a database table, etc.,
based on the XPath query.
Basic settings
Property type
JSON field
Mapping
Die on error
Select this check box to throw exceptions and kill the Job during
the extraction process.
Clear this check box to show error alerts (instead of exceptions)
on the console and continue the Job execution. In this case, error
messages can be collected via a Row > Reject link.
Advanced settings
Encoding
Select the encoding from the list or select Custom and define it
manually. This field is compulsory for database data handling.
tStatCatcher Statistics
Usage
If you have subscribed to one of the Talend solutions with Big Data, you can also use this
component as a Map/Reduce component. In a Talend Map/Reduce Job, this component is
used as an intermediate step and other components used along with it must be Map/Reduce
components, too. They generate native Map/Reduce code that can be executed directly in
Hadoop.
You need to use the Hadoop Configuration tab in the Run view to define the connection to
a given Hadoop distribution for the whole Job.
For further information about a Talend Map/Reduce Job, see the sections describing how to
create, convert and configure a Talend Map/Reduce Job of the Talend Open Studio for Big
Data Getting Started Guide.
Note that in this documentation, unless otherwise explicitly stated, a scenario presents only
Standard Jobs, that is to say traditional Talend data integration Jobs, and non Map/Reduce
Jobs.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not
provided. You can easily find out and add such JARs in the Integration perspective of your
studio. For details, see the section about external modules in the Talend Installation and
Upgrade Guide.
Scenario 1: Retrieving error messages while extracting data from JSON fields
Drop the following components from the Palette onto the design workspace: tFixedFlowInput,
tWriteJSONField, tExtractJSONFields, and tLogRow (X2). The two tLogRow components are renamed
as data_extracted and reject_info.
2.
3.
4.
5.
2.
Click the [+] button to add three columns, namely firstname, lastname and dept, with the type of string.
Click OK to close the editor.
3.
Select Use Inline Content and enter the data below in the Content box:
Andrew;Wallace;Doc
John;Smith;R&D
Christian;Dior;Sales
2.
In the Linker target panel, click the default rootTag and type in staff, which is the root node of the JSON
field to be generated.
4.
Right-click staff and select Add Sub-element from the context menu.
5.
Repeat the steps to add two more sub-nodes, namely lastname and dept.
6.
Right-click firstname and select Set As Loop Element from the context menu.
7.
Drop firstname from the Linker source panel to its counterpart in the Linker target panel.
In the pop-up dialog box, select Add linker to target node.
9.
10. Click the [+] button in the right panel to add one column, namely staff, which will hold the JSON data
generated.
Click OK to close the editor.
2.
3.
Click the [+] button in the right panel to add three columns, namely firstname, lastname and dept, which will
hold the data of their counterpart nodes in the JSON field staff.
Click OK to close the editor.
4.
In the pop-up Propagate box, click Yes to propagate the schema to the subsequent components.
5.
In the Loop XPath query field, enter "/staff", which is the root node of the JSON data.
6.
In the Mapping area, type in the node name of the JSON data under the XPath query part. The data of those
nodes will be extracted and passed to their counterpart columns defined in the output schema.
7.
Specifically, define the XPath query "firstname" for the column firstname, "lastname" for the column
lastname, and "" for the column dept. Note that "" is not a valid XPath query and will lead to execution errors.
2.
Select Table (print values in cells of a table) for a better display of the results.
3.
Perform the same setup on the other tLogRow component, namely reject_info.
2.
As shown above, the reject row offers such details as the data extracted, the JSON fields whose data is not
extracted and the cause of the extraction failure.
Drop the following components from the Palette onto the design workspace: tFileInputJSON,
tExtractJSONFields and tLogRow.
2.
3.
2.
Click the [+] button to add one column, namely friends, of the String type.
Click OK to close the editor.
3.
Click the [...] button to browse for the JSON file, facebook.json in this case:
{ "user": { "id": "9999912398",
"name": "Kelly Clarkson",
"friends": [
{ "name": "Tom Cruise",
"id": "55555555555555",
"likes": {
"data": [
{ "category": "Movie",
"name": "The Shawshank Redemption",
"id": "103636093053996",
"created_time": "2012-11-20T15:52:07+0000"
},
{ "category": "Community",
"name": "Positiveretribution",
"id": "471389562899413",
"created_time": "2012-12-16T21:13:26+0000"
}
]
}
},
{ "name": "Tom Hanks",
"id": "88888888888888"
"likes": {
"data": [
{ "category": "Journalist",
"name": "Janelle Wang",
"id": "136009823148851",
"created_time": "2013-01-01T08:22:17+0000"
},
{ "category": "Tv show",
"name": "Now With Alex Wagner",
"id": "305948749433410",
"created_time": "2012-11-20T06:14:10+0000"
}
]
}
}
]
}
}
4.
5.
6.
7.
Click the [+] button in the right panel to add five columns, namely id, name, like_id, like_name and
like_category, which will hold the data of relevant nodes in the JSON field friends.
Click OK to close the editor.
8.
In the pop-up Propagate box, click Yes to propagate the schema to the subsequent components.
9.
10. In the Mapping area, type in the queries of the JSON nodes in the XPath query column. The data of those
nodes will be extracted and passed to their counterpart columns defined in the output schema.
11. Specifically, define the XPath query "../../id" (querying the "/friends/id" node) for the column id, "../../name"
(querying the "/friends/name" node) for the column name, "id" for the column like_id, "name" for the column
like_name, and "category" for the column like_category.
12. Double-click tLogRow to display its Basic settings view.
13. Select Table (print values in cells of a table) for a better display of the results.
As shown above, the friends data of the Facebook user Kelly Clarkson is extracted correctly.
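Based on the sample facebook.json above, the console output should resemble the following sketch, with one row per element of a likes data array and the friend's id and name repeated on each row:
55555555555555|Tom Cruise|103636093053996|The Shawshank Redemption|Movie
55555555555555|Tom Cruise|471389562899413|Positiveretribution|Community
88888888888888|Tom Hanks|136009823148851|Janelle Wang|Journalist
88888888888888|Tom Hanks|305948749433410|Now With Alex Wagner|Tv show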
tExtractPositionalFields
tExtractPositionalFields properties
Component family
Processing/Fields
Function
tExtractPositionalFields generates multiple columns from one column using positional fields.
Purpose
tExtractPositionalFields allows you to use a positional pattern to extract data from a formatted
string.
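To illustrate the principle outside of the Studio, here is a minimal plain-Java sketch of positional extraction (not Talend-generated code; the record layout and field widths are hypothetical): the formatted string is cut at known offsets, then the padding character is trimmed.

// Minimal sketch of positional-field extraction (hypothetical layout):
// cut the record at fixed offsets, then strip the padding character.
public class PositionalSketch {
    public static void main(String[] args) {
        String record = "John      Smith     42"; // field widths: 10, 10, 2
        int[] sizes = {10, 10, 2};
        int offset = 0;
        String[] fields = new String[sizes.length];
        for (int i = 0; i < sizes.length; i++) {
            // trim() removes the default padding character, a space
            fields[i] = record.substring(offset, offset + sizes[i]).trim();
            offset += sizes[i];
        }
        System.out.println(String.join("|", fields)); // prints: John|Smith|42
    }
}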
Basic settings
Field
Ignore NULL as the source data: Select this check box to ignore the Null value in the source data. Clear this check box to generate the Null records that correspond to the Null value in the source data.
Customize: Select this check box to customize the data format of the positional file and define the table columns:
Column: Select the column you want to customize.
Size: Enter the column size.
Padding char: Type in, between inverted commas, the padding character used, in order for it to be removed from the field. A space by default.
Alignment: Select the appropriate alignment parameter.
Pattern
Die on error: This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Reject link.
Advanced settings
Advanced separator (for number): Select this check box to modify the separators used for numbers.
Trim Column
Check each row structure against schema: Select this check box to synchronize every row against the input schema.
tStatCatcher Statistics: Select this check box to gather the processing metadata at the Job level as well as at each component level.
Global Variables
NB_LINE: Indicates the number of rows read by an input component or transferred to an output component. This is an After variable and it returns an integer.
A Flow variable means it functions during the execution of a component while an After variable means it functions after the execution of a component.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it.
For further information about variables, see Talend Studio User Guide.
Usage
This component handles flows of data, therefore it requires input and output components. It allows you to extract data from a positional field, using a Row > Main link, and enables you to create a reject flow filtering data whose type does not match the defined type.
Limitation
n/a
Related scenario
For a related scenario, see section Scenario: Extracting name, domain and TLD from e-mail addresses.
tExtractRegexFields
tExtractRegexFields properties
Component family
Processing/Fields
Function
tExtractRegexFields generates multiple columns from a given column using regex matching.
Purpose
tExtractRegexFields allows you to use regular expressions to extract data from a formatted
string.
Basic settings
Field to split: Select the column to split from the list.
Regex: Enter the regular expression to be used for data matching.
Die on error: This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows on error via a Row > Reject link.
Advanced settings
Check each row structure against schema: Select this check box to synchronize every row against the input schema.
tStatCatcher Statistics: Select this check box to gather the processing metadata at the Job level as well as at each component level.
Usage
This component handles flows of data, therefore it requires input and output components. It allows you to extract data from a string field, using a Row > Main link, and enables you to create a reject flow filtering data whose type does not match the defined type.
Limitation
n/a
The name, domain and TLD are extracted and displayed on the console in three separate columns. Data in the other two input columns, id and age, is extracted and routed to the destination as well.
Drop the following components from the Palette onto the design workspace: tFileInputDelimited,
tExtractRegexFields, and tLogRow.
2. Connect tFileInputDelimited to tExtractRegexFields using a Row > Main link, and do the same to connect tExtractRegexFields to tLogRow.
Double-click the tFileInputDelimited component to open its Basic settings view in the Component tab.
2. Click the [...] button next to the File name/Stream field to browse to the file from which you want to extract information.
The input file used in this scenario is called test4. It is a text file that holds three columns: id, email, and age.
id;email;age
1;anna@yahoo.net;24
2;diana@sohu.com;31
3;fiona@gmail.org;20
3. Click Edit schema to define the data structure of this input file.
4. Double-click tExtractRegexFields to open its Basic settings view.
5. Select the column to split from the Field to split list: email in this scenario.
6. Enter the regular expression you want to use to perform data matching in the Regex panel. In this scenario, the regular expression "([a-z]*)@([a-z]*)\\.([a-z]*)" is used to match the three parts of an email address: user name, domain name and TLD name. Note the escaped dot: "." is a special character in regular expressions and would otherwise match any character.
For more information about regular expressions, see http://en.wikipedia.org/wiki/Regular_expression.
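To see what this expression captures outside of the Studio, here is a minimal plain-Java illustration (not the code the Job generates) applying the same pattern to one of the sample addresses:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmailRegexSketch {
    public static void main(String[] args) {
        // Same pattern as in the Regex field; the dot is escaped so it
        // matches a literal "." rather than any character.
        Pattern p = Pattern.compile("([a-z]*)@([a-z]*)\\.([a-z]*)");
        Matcher m = p.matcher("anna@yahoo.net");
        if (m.matches()) {
            System.out.println(m.group(1)); // anna  -> name
            System.out.println(m.group(2)); // yahoo -> domain
            System.out.println(m.group(3)); // net   -> tld
        }
    }
}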
7. Click Edit schema to open the [Schema of tExtractRegexFields] dialog box, and click the plus button to add five columns for the output schema. In this scenario, we want to split the input email column into three columns in the output flow: name, domain, and tld. The two other input columns will be extracted as they are.
The tExtractRegexFields component matches all given e-mail addresses with the defined regular expression, extracts the name, domain, and TLD, and displays them on the console in three separate columns. The two other columns, id and age, are extracted as they are.
tExtractXMLField
tExtractXMLField belongs to two component families: Processing and XML. For more information on tExtractXMLField, see section tExtractXMLField.
tFilterColumns
tFilterColumns Properties
Component family
Processing
Function
Makes specified changes to the schema defined, based on column name mapping.
Purpose
Helps homogenize schemas either on the columns order or by removing unwanted columns or
adding new columns.
Basic settings
Advanced settings
tStatCatcher Statistics: Select this check box to gather the Job processing metadata at the Job level as well as at each component level.
Usage
This component is not startable (green background) and it requires an output component.
If you have subscribed to one of the Talend solutions with Big Data, you can also use this
component as a Map/Reduce component. In a Talend Map/Reduce Job, this component is
used as an intermediate step and other components used along with it must be Map/Reduce
components, too. They generate native Map/Reduce code that can be executed directly in
Hadoop.
For further information about a Talend Map/Reduce Job, see the sections describing how to
create, convert and configure a Talend Map/Reduce Job of the Talend Open Studio for Big Data
Getting Started Guide.
Note that in this documentation, unless otherwise explicitly stated, a scenario presents only
Standard Jobs, that is to say traditional Talend data integration Jobs, and non Map/Reduce Jobs.
Related Scenario
For more information regarding the tFilterColumns component in use, see section Scenario 1: Multiple
replacements and column filtering.
tFilterRow
tFilterRow Properties
Component family
Processing
Function
Purpose
Basic settings
Logical operator used to combine conditions: In the case you want to combine simple filtering and advanced mode, select the operator to combine both modes.
Conditions: Click the plus button to add as many conditions as needed. The conditions are performed one after the other for each row.
Input column: Select the column of the schema the function is to be operated on.
Function: Select the function on the list.
Operator: Select the operator to bind the input column with the value.
Value: Type in the filtered value, between quotes if need be.
Use advanced mode: Select this check box when the operation you want to perform cannot be carried out through the standard functions offered. In the text field, type in the regular expression as required.
Advanced settings
tStatCatcher Statistics: Select this check box to gather the Job processing metadata at the Job level as well as at each component level.
Usage
This component is not startable (green background) and it requires an output component.
Usage in Map/Reduce Jobs
If you have subscribed to one of the Talend solutions with Big Data, you can also use this component
as a Map/Reduce component. In a Talend Map/Reduce Job, this component is used as an intermediate
step and other components used along with it must be Map/Reduce components, too. They generate
native Map/Reduce code that can be executed directly in Hadoop.
For further information about a Talend Map/Reduce Job, see the sections describing how to create,
convert and configure a Talend Map/Reduce Job of the Talend Open Studio for Big Data Getting
Started Guide.
Note that in this documentation, unless otherwise explicitly stated, a scenario presents only Standard
Jobs, that is to say traditional Talend data integration Jobs, and non Map/Reduce Jobs.
The first output table will list the records that match the filter conditions; the second will list all rejected records. An error message for each rejected record will be displayed in the same table to explain why that record has been rejected.
Drop tFixedFlowInput, tFilterRow and tLogRow from the Palette onto the design workspace.
Connect the tFixedFlowInput to the tFilterRow, using a Row > Main link. Then, connect the tFilterRow to
the tLogRow, using a Row > Filter link.
Drop tLogRow from the Palette onto the design workspace and rename it as reject. Then, connect the
tFilterRow to the reject, using a Row > Reject link.
Double-click tFixedFlowInput to display its Basic settings view and define its properties.
Select the Use Inline Content (delimited file) option in the Mode area to define the input mode.
Set the row and field separators in the corresponding fields. The row separator is a carriage return and the field
separator is a semi-colon.
Click the three-dot button next to Edit schema to define the schema for the input file. In this example, the
schema is made of the following four columns: firstname, gender, language and frequency. In the Type column,
select String for the first three rows and select Integer for frequency.
Click OK to validate and close the editor. A dialog box opens and asks you if you want to propagate the schema.
Click Yes.
Type in content in the Content multiline textframe according to the setting in the schema.
Double-click tFilterRow to display its Basic settings view and define its properties.
In the Conditions table, fill in the filtering parameters based on the firstname column.
In InputColumn, select firstname; in Function, select Length; in Operator, select Lower than.
In the Value column, type in 6 to filter only first names whose length is lower than six characters.
In the Value column, you must type in your values between double quotes for all data types, except for the Integer type,
which does not need quotes.
Then to implement the search on names whose language is italian, select the Use advanced mode check
box and type in the following regular expression that includes the name of the column to be searched:
input_row.language.equals("italian")
To combine both conditions (simple and advanced), select And as logical operator for this example.
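For illustration, the combined filter is equivalent to the following plain-Java predicate (a sketch only, not the code the Studio generates):

// Sketch of the combined tFilterRow condition: simple mode
// (Length(firstname) < 6) AND advanced mode (language equals "italian").
public class FilterSketch {
    static boolean accept(String firstname, String language) {
        return firstname.length() < 6 && language.equals("italian");
    }

    public static void main(String[] args) {
        System.out.println(accept("Paolo", "italian"));      // true  -> filter output
        System.out.println(accept("Alessandro", "italian")); // false -> reject output
    }
}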
In the Basic settings of tLogRow components, select Table (print values in cells of a table) in the Mode area.
Save your Job and press F6 to execute it.
Thus, the first table lists records that have Italian names made up of less than six characters, and the second table lists all the rejected records that do not match the filter condition. Each rejected record has a corresponding error message that explains the reason for rejection.
tJoin
tJoin properties
Component family
Processing
Function
tJoin joins two tables by doing an exact match on several columns. It compares columns from
the main flow with reference columns from the lookup flow and outputs the main flow data and/
or the rejected data.
Purpose
This component helps you ensure the data quality of any source data against a reference data
source.
Basic settings
Include lookup columns in output: Select this check box to include the lookup columns you define in the output flow.
Key definition:
Input key attribute: Select the column(s) from the main flow that need to be checked against the reference (lookup) key column.
Lookup key attribute: Select the lookup key columns that you will use as a reference against which to compare the columns from the input flow.
Inner join (with reject output): Select this check box to join the two tables first and gather the rejected data from the main flow.
Advanced settings
tStatCatcher Statistics: Select this check box to collect log data at the component level.
Usage
This component is not startable and it requires two input components and one or more output
components.
Limitation/prerequisite
n/a
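Before walking through the scenario below, it may help to picture what tJoin does with Inner join (with reject output) enabled. As a rough plain-Java analogy (a sketch under hypothetical data, not the code the Studio generates), the lookup flow is indexed by its key and each main flow row is either matched or rejected:

import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class JoinSketch {
    public static void main(String[] args) {
        // Lookup (reference) flow indexed by its key column.
        Map<String, String> lookup = new HashMap<>();
        lookup.put("Smith", "ref-row-1");
        lookup.put("Brown", "ref-row-2");

        // Main flow rows checked against the lookup key.
        for (String key : List.of("Smith", "Doe")) {
            if (lookup.containsKey(key)) {
                System.out.println(key + " -> main output");       // exact match
            } else {
                System.out.println(key + " -> inner join reject"); // rejected data
            }
        }
    }
}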
This scenario describes a five-component Job that carries out an exact match between the firstnameClient and lastnameClient columns of an input file and the corresponding columns of a reference input file. The outputs of this exact match are written to two separate files: matched data is written to an Excel file, and rejected data is written to a delimited file.
In the Repository tree view, expand Metadata and the file node where you have stored the input schemas
and drop the relevant file onto the design workspace.
The [Components] dialog box appears.
2. Select tFileInputDelimited from the list and click OK to close the dialog box.
The tFileInputDelimited component displays in the workspace. The input file used in this scenario is called
ClientSample. It holds four columns including the two columns firstnameClient and lastnameClient we want
to do the exact match on.
3. Do the same for the second input file you want to use as a reference, ClientSample_Update in this scenario.
4. Drop the following components from the Palette onto the design workspace: tJoin, tFileOutputExcel, and tFileOutputDelimited.
5. Connect the main and reference input files to tJoin using Main links. The link between the reference input file and tJoin appears as a lookup link on the design workspace.
6. Connect tJoin to tFileOutputExcel using the Main link and tJoin to tFileOutputDelimited using the Inner join reject link.
If needed, double-click the main and reference input files to display their Basic settings views. All their
property fields are automatically filled in. If you do not define your input files in the Repository, fill in the
details manually after selecting Built-in in the Property Type field.
2. Double-click tJoin to display its Basic settings view and define its properties.
3. Click the Edit schema button to open a dialog box that displays the data structure of the input files. Define the data you want to pass to the output components, three columns in this scenario: idClient, firstnameClient and lastnameClient. Then click OK to validate the schema and close the dialog box.
4. In the Key definition area of the Basic settings view of tJoin, click the plus button to add two columns to the list, and then select the input columns and the lookup columns you want to do the exact matching on from the Input key attribute and Lookup key attribute lists respectively, firstnameClient and lastnameClient in this example.
5. Select the Inner join (with reject output) check box to define one of the outputs as the inner join reject table.
6. Double-click tFileOutputExcel to display its Basic settings view and define its properties.
7. Set the destination file name and the sheet name, and select the Include header check box.
8. Double-click tFileOutputDelimited to display its Basic settings view and define its properties.
9. Set the destination file name, and select the Include header check box.
Press F6, or click Run on the Run tab, to execute the Job.
The output of the exact match on the firstnameClient and lastnameClient columns is written to the defined
Excel file.
tMap
tMap properties
Component family
Processing
Function
Purpose
tMap transforms and routes data from single or multiple sources to single or multiple destinations.
Basic settings
Map editor
Advanced settings
Temp data directory path: Enter the path where you want to store the temporary data generated for lookup loading. For more information on this folder, see Talend Studio User Guide.
Preview
Ignore trailing zeros for BigDecimal: Select this check box to ignore trailing zeros for BigDecimal data.
tStatCatcher Statistics: Select this check box to gather the Job processing metadata at the Job level as well as at each component level.
Usage
Possible uses are from a simple reorganization of fields to the most complex Jobs of data multiplexing or demultiplexing transformation, concatenation, inversion, filtering and more.
Usage in Map/Reduce Jobs
If you have subscribed to one of the Talend solutions with Big Data, you can also use this component as a Map/Reduce component. In a Talend Map/Reduce Job, this component is used as an intermediate step and other components used along with it must be Map/Reduce components, too. They generate native Map/Reduce code that can be executed directly in Hadoop.
As explained earlier, if you need to use multiple expression keys to join different input tables, use multiple tMap components one after another.
For further information about a Talend Map/Reduce Job, see the sections describing how to create, convert and configure a Talend Map/Reduce Job of the Talend Open Studio for Big Data Getting Started Guide.
Note that in this documentation, unless otherwise explicitly stated, a scenario presents only Standard Jobs, that is to say traditional Talend data integration Jobs, and non Map/Reduce Jobs.
Limitation
The use of tMap supposes minimum Java knowledge in order to fully exploit its functionalities.
This component is a junction step, and for this reason it cannot be a start or an end component in the Job.
The Job described below aims at reading data from a CSV file whose schema is stored in the Repository, looking up a reference file whose schema is also stored in the Repository, and then extracting data from these two files, based on a defined filter, to an output file and reject files.
Drop two tFileInputDelimited components, tMap and three tFileOutputDelimited components onto the
design workspace.
2. Rename the two tFileInputDelimited components as Cars and Owners, either by double-clicking the label in the design workspace or via the View tab of the Component view.
3. Connect the two input components to tMap using Row > Main connections and label the connections Cars_data and Owners_data respectively.
4. Connect tMap to the three output components using Row > New Output (Main) connections and name the output connections Insured, Reject_NoInsur and Reject_OwnerID respectively.
Double-click the tFileInputDelimited component labelled Cars to display its Basic settings view.
2. Select Repository from the Property type list and select the component's schema, cars in this scenario, from the [Repository Content] dialog box. The rest of the fields are filled automatically.
3. Double-click the component labelled Owners and repeat the setting operation, selecting the appropriate metadata entry, owners in this scenario.
4. Double-click the tMap component to open the Map Editor.
5. Create a join between the two tables on the ID_Owner column by simply dropping the ID_Owner column from the Cars_data table onto the ID_Owner column in the Owners_data table.
6. Define this join as an inner join by clicking the tMap settings button, clicking in the Value field for Join Model, clicking the small button that appears in the field, and selecting Inner Join from the [Options] dialog box.
7. Drag all the columns of the Cars_data table to the Insured table.
8. Drag the ID_Owner, Registration, and ID_Reseller columns of the Cars_data table and the Name column of the Owners_data table to the Reject_NoInsur table.
9. Drag all the columns of the Cars_data table to the Reject_OwnerID table.
For more information regarding data mapping, see Talend Studio User Guide.
10. Click the plus arrow button at the top of the Insured table to add a filter row.
Drag the ID_Insurance column of the Owners_data table to the filter condition area and enter the formula
meaning not undefined: Owners_data.ID_Insurance != null.
With this filter, the Insured table will gather all the records that include an insurance ID.
11. Click the tMap settings button at the top of the Reject_NoInsur table and set Catch output reject to true to
define the table as a standard reject output flow to gather the records that do not include an insurance ID.
12. Click the tMap settings button at the top of the Reject_OwnerID table and set Catch lookup inner join reject
to true so that this output table will gather the records from the Cars_data flow with missing or unmatched
owner IDs.
Select the Include header check box to reuse the column labels from the schema as header row in the output
file.
This scenario, based on scenario 1, adds one input file containing details about resellers and extra fields in the
main output table. Two filters on inner joins are added to gather specific rejections.
2. Connect it to the Mapper using a Row > Main connection, and label the connection as Resellers_data.
3. Connect the tMap component to the new tFileOutputDelimited component using the Row connection named Reject_ResellerID.
2. Select Repository from the Property type list and select the component's schema, resellers in this scenario, from the [Repository Content] dialog box. The rest of the fields are filled automatically.
3. Double-click the tMap component to open the Map Editor.
4. Create a join between the main input flow and the new input flow by dropping the ID_Reseller column of the Cars_data table to the ID_Reseller column of the Resellers_data table.
5. Click the tMap settings button at the top of the Resellers_data table and set Join Model to Inner Join.
6. Drag all the columns except ID_Reseller of the Resellers_data table to the main output table, Insured.
When two inner joins are defined, you need to define two different inner join reject tables to differentiate the two rejections; if there is only one inner join reject output, both inner join rejections will be stored in the same output.
7. Click the [+] button at the top of the output area to add a new output table, and name this new output table Reject_ResellerID.
8. Drag all the columns of the Cars_data table to the Reject_ResellerID table.
9. Click the tMap settings button and set Catch lookup inner join reject to true to define this new output table as an inner join reject output.
If the defined inner join cannot be established, the information about the relevant cars will be gathered through
this output flow.
10. Now apply filters on the two Inner Join reject outputs, in order to distinguish the two types of rejection.
In the first Inner Join output table, Reject_OwnerID, click the plus arrow button to add a filter line and fill it
with the following formula to gather only owner ID related rejection: Owners_data.ID_Owner==null
11. In the second Inner Join output table, Reject_ResellerID, repeat the same operation using the following
formula: Resellers_data.ID_Reseller==null
Click OK to validate the map settings and close the Mapper Editor.
12. Double-click the No_Reseller_ID component to display its Basic settings view.
Specify the output file path and select the Include Header check box, and leave the other parameters as
they are.
13. To demonstrate the work of the Mapper, in this example, remove reseller IDs 5 and 8 from the input file
Resellers.csv.
This scenario introduces a Job that allows you to find BMW owners who have two to six children (inclusive), for
sales promotion purpose for example.
2. Connect the input components to the tMap using Row > Main connections.
Pay attention to the file you connect first as it will automatically be set as Main flow, and all the other
connections will be Lookup flows. In this example, the connection for the input component Owners is the
Main flow.
Define the properties of each input component in its respective Basic settings view, starting with Owners.
2. Select Repository from the Property type list and select the component's schema, owners in this scenario, from the [Repository Content] dialog box. The rest of the fields are filled automatically.
In the same way, set the properties of the other input components: Cars and Resellers. These two Lookup
flows will fill in secondary (lookup) tables in the input area of the Map Editor.
3. Then double-click the tMap component to launch the Map Editor and define the mappings and filters. Set an explicit join between the Main flow Owners and the Lookup flow Cars by dropping the ID_Owner column of the Owners table to the ID_Owner column of the Cars table.
The explicit join is displayed along with a hash key.
4. In the Expr. Key field of the Make column, type in a filter. In this use case, simply type in BMW as the search is focused on the owners of this particular make.
5. Implement a cascading join between the two lookup tables Cars and Resellers on the ID_Reseller column in order to retrieve reseller information.
6. As you want to reject the null values into a separate table and exclude them from the standard output, click the tMap settings button and set Join Model to Inner Join in each of the Lookup tables.
7. In the tMap settings, you can set Match Model to Unique match, First match, or All matches. In this use case, the All matches option is selected. Thus, if several matches are found in the Inner Join, i.e. rows matching the explicit join as well as the filter, all of them will be added to the output flow (either in the rejection or the regular output).
The Unique match option functions as a Last match. The First match and All matches options function as named.
8. On the output area of the Map Editor, click the plus button to add two tables, one for the full matches and the other for the rejections.
9. Drag all the columns of the Owners table, the Registration, Make and Color columns of the Cars table, and the ID_Reseller and Name_Reseller columns of the Resellers table to the main output table.
10. Drag all the columns of the Owners table to the reject output table.
11. Click the Filter button at the top of the main output table to display the Filter expression area.
Type in a filter statement to narrow down the number of rows loaded in the main output flow. In this use
case, the statement reads: Owners.Children_Nr >= 2 && Owners.Children_Nr <= 6.
12. In the reject output table, click the tMap settings button and set the reject types.
Set Catch output reject to true to collect data about BMW car owners who have less than two or more
than six children.
Set Catch lookup inner join reject to true to collect data about owners of other car makes and owners for
whom the reseller information is not found.
Take the same Job as in section Scenario 4: Advanced mapping using filters, explicit joins and rejections.
2. Drop a new tFileOutputDelimited component from the Palette onto the design workspace, and name it Rejects_BMW_Mercedes to present its functionality.
3. Connect the tMap component to the new output component using a Row connection and label the connection according to the functionality of the output component.
This connection label will appear as the name of the new output table in the Map Editor.
4. Relabel the existing output connections and output components to reflect their functionality.
The existing output tables in the Map Editor will be automatically renamed according to the connection
labels. In this example, relabel the existing output connections BMW_Mercedes_withChildren and
Owners_Other_Makes respectively.
Double-click the tMap component to launch the Map Editor to change the mappings and the filters.
Note that the output area contains a new, empty output table named Rejects_BMW_Mercedes. You can adjust
the position of the table by selecting it and clicking the Up or Down arrow button at the top of the output area.
2. Remove the Expr. key filter (BMW) from the Cars table in the input area.
3. Click the Filters button to display the Filter field, and type in a new filter to limit the search to BMW or Mercedes car makes. The statement reads as follows: Cars.Make.equals("BMW") || Cars.Make.equals("Mercedes")
4. Select all the columns of the main output table and drop them down to the new output table. Alternatively, you can also drag the corresponding columns from the relevant input tables to the new output table.
5. Click the tMap settings button at the top of the new output table and set Catch output reject to true to collect data about BMW and Mercedes owners who have less than two or more than six children.
6. In the Owners_Other_Makes table, set Catch lookup inner join reject to true to collect data about owners of other car makes and owners for whom the reseller information is not found.
7. Click OK to validate the map settings and close the Map Editor.
8. Define the properties of the output components in their respective Basic settings views. In this use case, simply specify the output file paths, select the Include Header check box, and leave the other parameters as they are.
The following scenario describes a Job that retrieves people details from a lookup database, based on a join on
the age. The main flow source data is read from a MySQL database table called people_age that contains people
details such as numeric id, alphanumeric first name and last name and numeric age. The people age is either 40
or 60. The number of records in this table is intentionally restricted.
The reference or lookup information is also stored in a MySQL database table called large_data_volume. This
lookup table contains a number of records including the city where people from the main flow have been to. For
the sake of clarity, the number of records is restricted but, in a normal use, the usefulness of the feature described
in the example below is more obvious for very large reference data volume.
To optimize performance, a database connection component is used at the beginning of the Job to open the connection to the lookup database table, so that the connection does not have to be reopened every time a row is loaded from the lookup table.
An Expression Filter is applied to this lookup source flow, in order to select only data from people whose age is
equal to 60 or 40. This way only the relevant rows from the lookup database table are loaded for each row from
the main flow.
Therefore this Job shows how, from a limited number of main flow rows, the lookup join can be optimized to load
only results matching the expression key.
Generally speaking, as the lookup loading is performed for each main flow row, this option is mainly interesting when a limited number of rows is processed in the main flow while a large number of reference rows are to be looked up.
The join is solved on the age field. Then, using the relevant loading option in the tMap component editor, the
lookup database information is loaded for each main flow incoming row.
This Job is formed with five components, four database components and a mapping component.
Drop the DB Connection under the Metadata node of the Repository to the design workspace. In this
example, the source table is called people_age.
2. Select tMysqlInput from the list that pops up when dropping the component.
3. Drop the lookup DB connection table from the Metadata node to the design workspace, selecting tMysqlInput from the list that pops up. In this Job, the lookup is called large_data_volume.
4. In the same way, drop the DB connection from the Metadata node to the design workspace, selecting tMysqlConnection from the list that pops up. This component creates a permanent connection to the lookup database table, so that the connection does not have to be reopened every time a row is loaded from the lookup table.
5. Then pick the tMap component from the Processing family, and the tMysqlOutput and tMysqlCommit components from the Database family in the Palette to the right hand side of the editor.
6. Now connect all the components together. To do so, right-click the tMysqlInput component corresponding to the people table and drag the link towards tMap.
7. Release the link over the tMap component; the main row flow is automatically set up.
8. Rename the Main row link to people, to identify the main flow data more easily.
9. Perform the same operation to connect the lookup table (large_data_volume) to the tMap component and the tMap to the tMysqlOutput component.
10. A dialog box prompts for a name for the output link. In this example, the output flow is named people_mixandmatch.
11. Also rename the lookup row connection link to large_volume, to help identify the reference data flow.
12. Connect tMysqlConnection to tMysqlInput using the trigger link OnSubjobOk.
13. Connect the tMysqlInput component to the tMysqlCommit component using the trigger link OnSubjobOk.
2. The Output table (created automatically when you linked tMap to tMysqlOutput) will be formed by the matching rows from the lookup flow (large_data_volume) and the main flow (people_age). Select the main flow rows that are to be passed on to the output and drag them over to paste them in the Output table (on the right hand side of the Map Editor). In this example, the selection from the main flow includes the following fields: id, first_name, last_Name and age. From the lookup table, the following column is selected: city. Drop the selected columns from the input tables (people and large_volume) to the output table.
3. Now set up the join between the main and lookup flows. Select the age column of the main flow table (on top) and drag it towards the age column of the lookup flow table (large_volume in this example). A key icon appears next to the linked expression on the lookup table. The join is now established.
4. Click the tMap settings button, click the three-dot button corresponding to Lookup Model, and select the Reload at each row option from the [Options] dialog box in order to reload the lookup for each row being processed.
5. In the same way, set Match Model to All matches in the Lookup table, in order to gather all instances of age matches in the output flow.
6. Now implement the filtering, based on the age column, in the Lookup table. The GlobalMapKey field was automatically created when you selected the Reload at each row option. Indeed, you can use this expression to dynamically filter the reference data in order to load only the relevant information when joining with the main flow.
As mentioned in the introduction of the scenario, the main flow data contains only people whose age is either
40 or 60. To avoid the pain of loading all lookup rows, including ages that are different from 40 and 60, you
can use the main flow age as global variable to feed the lookup filtering.
7. Drop the Age column from the main flow table to the Expr. field of the lookup table.
8. Then, in the globalMap Key field, put in the variable name as an expression. In this example, it reads: people.Age. Click OK to save the mapping settings and go back to the design workspace.
9. To finalize the implementation of the dynamic filtering of the lookup flow, you now need to add a WHERE clause to the query of the database input.
10. At the end of the Query field, following the Select statement, type in the following WHERE clause: WHERE
AGE ='"+((Integer)globalMap.get("people.Age"))+"'"
11. Make sure that the type corresponds to the column used as the variable. In this use case, Age is of Integer type. And use the variable the way you set it in the globalMap key field of the map editor.
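Assembled with the rest of the statement, the content of the Query field would then look along these lines (a sketch; the exact SELECT clause depends on your lookup table definition):

"SELECT * FROM large_data_volume WHERE AGE = '" + ((Integer) globalMap.get("people.Age")) + "'"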
12. Double-click the tMysqlOutput component to define its properties.
13. Select the Use an existing connection check box to leverage the created DB connection.
Define the target table name and relevant DB actions.
2. Click the Run tab at the bottom of the design workspace to display the Job execution tab.
3. From the Debug Run view, click the Traces Debug button to view the data processing progress.
For more comfort, you can maximize the Job design view while executing by simply double-clicking on the
Job name tab.
The lookup data is reloaded for each of the main flow's rows, corresponding to the age constraint. All age matches are retrieved in the lookup rows and grouped together in the output flow.
Therefore, if you check out the data contained in the newly created people_mixandmatch table, you will find all the age duplicates corresponding to different individuals whose age is 60 or 40, along with the city where they have been.
The following scenario describes a Job that processes reject flows without separating them from the main flow.
In the Repository tree view, click Metadata > File delimited. Drag and drop the customers metadata onto
the workspace.
The customers metadata contains information about customers, such as their ID, their name or their address,
etc.
For more information about centralizing metadata, see Talend Studio User Guide.
2. In the dialog box that asks you to choose which component type you want to use, select tFileInputDelimited and click OK.
3. Drop the states metadata onto the design workspace. Select the same component in the dialog box and click OK.
The states metadata contains the ID of the state, and its name.
4. Drop a tMap and two tLogRow components from the Palette onto the design workspace.
5. Connect the customers component to the tMap, using a Row > Main connection.
6. Connect the states component to the tMap, using a Row > Main connection. This flow will automatically be defined as Lookup.
2. Click the Property Settings button at the top of the input area to open the [Property Settings] dialog box, and clear the Die on error check box in order to handle the execution errors. The ErrorReject table is automatically created.
3. Select the id, idState, RegTime and RegisterTime columns in the input table and drag them to the ErrorReject table.
4. Click the [+] button at the top right of the editor to add an output table. In the dialog box that opens, select New output. In the field next to it, type in the name of the table, out1. Click OK.
5. Drag the following columns from the input tables to the out1 table: id, CustomerName, idState, and LabelState. Add two columns, RegTime and RegisterTime, to the end of the out1 table and set their date formats: "dd/MM/yyyy HH:mm" and "yyyy-MM-dd HH:mm:ss.SSS" respectively.
6. Click in the Expression field for the RegTime column, and press Ctrl+Space to display the autocompletion list. Find and double-click TalendDate.parseDate. Change the pattern to ("dd/MM/yyyy HH:mm",row1.RegTime).
7. Do the same thing for the RegisterTime column, but change the pattern to ("yyyy-MM-dd HH:mm:ss.SSS",row1.RegisterTime).
8. Click the [+] button at the top of the output area to add an output table. In the dialog box that opens, select Create join table from, choose Out1, and name it rejectInner. Click OK.
9. Click the tMap settings button and set Catch lookup inner join reject to true in order to handle rejects.
10. Drag the id, CustomerName, and idState columns from the input tables to the corresponding columns of the
rejectInner table.
Click in the Expression field for the LabelState column, and type in UNKNOWN.
11. Click in the Expression field for the RegTime column, press Ctrl+Space, and select TalendDate.parseDate, changing the pattern to ("dd/MM/yyyy HH:mm",row1.RegTime).
12. Click in the Expression field for the RegisterTime column, press Ctrl+Space, and select TalendDate.parseDate, but change the pattern to ("yyyy-MM-dd HH:mm:ss.SSS",row1.RegisterTime).
If the data from row1 has a wrong pattern, it will be returned by the ErrorReject flow.
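This behaviour relies on strict date parsing: a value that does not conform to the pattern throws an exception, which is routed to the ErrorReject flow. Here is a plain-Java illustration of the same effect, using java.text.SimpleDateFormat instead of Talend's TalendDate routine:

import java.text.ParseException;
import java.text.SimpleDateFormat;

public class ParseDateSketch {
    public static void main(String[] args) {
        SimpleDateFormat fmt = new SimpleDateFormat("dd/MM/yyyy HH:mm");
        fmt.setLenient(false); // reject values that do not match the pattern exactly
        try {
            System.out.println(fmt.parse("20/11/2012 15:52")); // parses fine
            System.out.println(fmt.parse("2012-11-20 15:52")); // wrong pattern
        } catch (ParseException e) {
            // In the Job, a row causing this kind of error is routed
            // to the ErrorReject flow instead of stopping the execution.
            System.out.println("rejected: " + e.getMessage());
        }
    }
}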
The Run console displays the main output flow and the ErrorReject flow. The main output flow unites both valid data and inner join rejects, while the ErrorReject flow contains the error information about rows with unparseable date formats.
tNormalize
tNormalize Properties
Component family
Processing/Fields
Function
Purpose
tNormalize helps improve data quality and thus eases the data update.
Basic settings
Column to normalize: Select the column from the input flow on which the normalization is based.
Item separator: Enter the separator which will delimit data in the input flow. The item separator is based on regular expressions, so the character "." (a special character for regular expressions) should be avoided or used carefully here.
Advanced settings
Get rid of duplicated rows from output: Select this check box to deduplicate rows in the data of the output flow. This feature is not available for the Map/Reduce version of this component.
Use CSV parameters
Discard the trailing empty strings: Select this check box to discard the trailing empty strings.
Trim resulting values: Select this check box to trim leading and trailing whitespace from the resulting data. When both the Discard the trailing empty strings and Trim resulting values check boxes are selected, the former works first.
tStatCatcher Statistics: Select this check box to gather the Job processing metadata at the Job level as well as at each component level.
Global Variables
NB_LINE: Indicates the number of rows read by an input component or transferred to an output component. This is an After variable and it returns an integer.
A Flow variable means it functions during the execution of a component while an After variable means it functions after the execution of a component.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the variable to use from it.
For further information about variables, see Talend Studio User Guide.
Usage
If you have subscribed to one of the Talend solutions with Big Data, you can also use this component as a Map/Reduce component. In a Talend Map/Reduce Job, this component is used as an intermediate step and other components used along with it must be Map/Reduce components, too. They generate native Map/Reduce code that can be executed directly in Hadoop.
For further information about a Talend Map/Reduce Job, see the sections describing how to create, convert and configure a Talend Map/Reduce Job of the Talend Open Studio for Big Data Getting Started Guide.
For a scenario demonstrating a Map/Reduce Job using this component, see section Scenario 2: Normalizing data using Map/Reduce components.
Note that in this documentation, unless otherwise explicitly stated, a scenario presents only Standard Jobs, that is to say traditional Talend data integration Jobs, and non Map/Reduce Jobs.
Limitation
Due to license incompatibility, one or more JARs required to use this component are not
provided. You can easily find out and add such JARs in the Integration perspective of your
studio. For details, see the section about external modules in the Talend Installation and Upgrade
Guide.
Drop the following components from the Palette to the design workspace: tFileInputDelimited, tNormalize,
tLogRow.
2. In the File name field, specify the path to the input file to be normalized.
3. Click the [...] button next to Edit schema to open the [Schema] dialog box, and set up the input schema by adding one column named Tags. When done, click OK to validate your schema setup and close the dialog box, leaving the rest of the settings as they are.
4. Double-click tNormalize to open its Basic settings view.
5. Check the schema, and if necessary, click Sync columns to get the schema synchronized with the input component.
6. Select the column to normalize. In this use case, the input schema has only one column, Tags, so just accept the default setting.
7. In the Advanced settings view, select the Get rid of duplicated rows from output, Discard the trailing empty strings, and Trim resulting values check boxes.
8. In the tLogRow component, select the Print values in cells of a table radio button.
The list is tidied up, with duplicate tags, leading and trailing whitespace, and trailing empty strings removed, and the result is displayed in a table cell on the console.
Note that the Talend Map/Reduce components are available to subscription-based Big Data users only and this
scenario can be replicated only with Map/Reduce components.
The sample data used in this scenario is the same as in the scenario explained earlier.
ldap,
db2, jdbc driver,
grid computing, talend architecture
content, environment,,
tmap,,
eclipse,
database,java,postgresql,
tmap,
database,java,sybase,
deployment,,
repository,
database,informix,java
Since Talend Studio allows you to convert a Job between its Map/Reduce and Standard (Non Map/Reduce)
versions, you can convert the scenario explained earlier to create this Map/Reduce Job. This way, many
components used can keep their original settings so as to reduce your workload in designing this Job.
Before starting to replicate this scenario, ensure that you have appropriate rights and permissions to access the
Hadoop distribution to be used. Then proceed as follows:
In the Repository tree view of the Integration perspective of Talend Studio, right-click the Job you have
created in the earlier scenario to open its contextual menu and select Edit properties.
Then the [Edit properties] dialog box is displayed. Note that the Job must be closed before you are able to
make any changes in this dialog box.
Note that you can change the Job name as well as the other descriptive information about the Job from this
dialog box.
2. Click Convert to Map/Reduce Job. Then a Map/Reduce Job using the same name appears under the Map/Reduce Jobs sub-node of the Job Design node.
If you need to create this Map/Reduce Job from scratch, you have to right-click the Job Design node or the Map/
Reduce Jobs sub-node and select Create Map/Reduce Job from the contextual menu. Then an empty Job is
opened in the workspace. For further information, see the section describing how to create a Map/Reduce Job of
the Talend Open Studio for Big Data Getting Started Guide.
Double-click this new Map/Reduce Job to open it in the workspace. The Map/Reduce components' Palette is
opened accordingly and in the workspace, the crossed-out components, if any, indicate that those components
do not have the Map/Reduce version.
2. Right-click each of those components in question and select Delete to remove them from the workspace.
3. Drop a tHDFSInput component and a tHDFSOutput component in the workspace. The tHDFSInput component reads data from the Hadoop distribution to be used, and the tHDFSOutput component, replacing tLogRow, writes data to that distribution.
If you are creating the Job from scratch, you have to drop a tNormalize component, too.
4. Connect tHDFSInput to tNormalize using the Row > Main link and accept to get the schema of tNormalize.
5. Click Run to open its view and then click the Hadoop Configuration tab to display its view for configuring the Hadoop connection for this Job.
2. From the Property type list, select Built-in. If you have created the connection to be used in the Repository, then select Repository and the Studio will reuse that set of connection information for this Job.
For further information about how to create a Hadoop connection in the Repository, see the chapter describing the Hadoop cluster node of the Talend Open Studio for Big Data Getting Started Guide.
3. In the Version area, select the Hadoop distribution to be used and its version. If you cannot find the distribution corresponding to yours in the list, select Custom so as to connect to a Hadoop distribution not officially supported in the Studio.
For a step-by-step example about how to use this Custom option, see section Connecting to a custom Hadoop
distribution.
Note that if you use Hortonworks Data Platform V2.0.0, the type of the operating system for running the
distribution and a Talend Job must be the same, such as Windows or Linux.
4. In the Name node field, enter the location of the master node, the NameNode, of the distribution to be used. For example, hdfs://talend-cdh4-namenode:8020.
5. In the Job tracker field, enter the location of the JobTracker of your distribution. For example, talend-cdh4-namenode:8021.
Note that the notion Job in this term JobTracker designates the MR or the MapReduce jobs described in
Apache's documentation on http://hadoop.apache.org/.
6. If the distribution to be used requires Kerberos authentication, select the Use Kerberos authentication check box and complete the authentication details. Otherwise, leave this check box clear.
If you need to use a Kerberos keytab file to log in, select Use a keytab to authenticate. A keytab file contains
pairs of Kerberos principals and encrypted keys. You need to enter the principal to be used in the Principal
field and the access path to the keytab file itself in the Keytab field.
Note that the user that executes a keytab-enabled Job is not necessarily the one a principal designates but
must have the right to read the keytab file being used. For example, the user name you are using to execute
a Job is user1 and the principal to be used is guest; in this situation, ensure that user1 has the right to read
the keytab file to be used.
7. In the User name field, enter the login user name for your distribution. If you leave it empty, the user name of the machine hosting the Studio will be used.
8. In the Temp folder field, enter the path in HDFS to the folder where you store the temporary files generated during Map/Reduce computations.
9. Leave the default value of the Path separator in server as it is, unless you have changed the separator used by your Hadoop distribution's host machine for its PATH variable; in other words, if that separator is not a colon (:), you must change this value to the one used on that host.
10. Leave the Clear temporary folder check box selected, unless you want to keep those temporary files.
11. If the Hadoop distribution to be used is Hortonworks Data Platform V1.2 or Hortonworks Data Platform
V1.3, you need to set proper memory allocations for the map and reduce computations to be performed by
the Hadoop system.
In that situation, you need to enter the values you need to in the Mapred job map memory mb and
the Mapred job reduce memory mb fields, respectively. By default, the values are both 1000 which are
normally appropriate for running the computations.
For further information about this Hadoop Configuration tab, see the section describing how to configure the
Hadoop connection for a Talend Map/Reduce Job of the Talend Open Studio for Big Data Getting Started Guide.
Click the [...] button next to Edit schema to verify that the schema received in the earlier steps is properly defined.
Note that if you are creating this Job from scratch, you need to click the [...] button to manually define the schema; otherwise, if the schema has been defined in the Repository, you can select the Repository option from the Schema list in the Basic settings view to reuse it. For further information about how to define a schema in the Repository, see the chapter describing metadata management in the Talend Studio User Guide or the chapter describing the Hadoop cluster node in the Repository of the Talend Open Studio for Big Data Getting Started Guide.
3. If you make changes in the schema, click OK to validate these changes and accept the propagation prompted by the pop-up dialog box.
4. In the Folder/File field, enter the path, or browse to the source file you need the Job to read.
If this file is not in the HDFS system to be used, you have to place it in that HDFS, for example, using
tFileInputDelimited and tHDFSOutput in a Standard Job.
This component keeps both the Basic settings and the Advanced settings used in the original Job. It normalizes the Tags column of the input flow.
Configuring tHDFSOutput
1. Double-click tHDFSOutput to open its Basic settings view.
2. As explained earlier for verifying the schema of tHDFSInput, do the same to verify the schema of tHDFSOutput. If it is not consistent with that of its preceding component, tNormalize, click Sync columns to retrieve the schema of tNormalize.
3. In the Folder field, enter the path, or browse to the folder you want to write data in.
4. From the Action list, select the operation you need to perform on the folder in question. If the folder already exists, select Overwrite; otherwise, select Create.
If you need to obtain more details about the Job, it is recommended to use the web console of the JobTracker provided by the Hadoop distribution you are using.
tPigAggregate
tPigAggregate component belongs to two component families: Big Data and Processing. For more information
about tPigAggregate, see section tPigAggregate.
tPigCode
tPigCode component belongs to two component families: Big Data and Processing. For more information about
tPigCode, see section tPigCode.
tPigCross
tPigCross component belongs to two component families: Big Data and Processing. For more information about
tPigCross, see section tPigCross.
tPigDistinct
tPigDistinct component belongs to two component families: Big Data and Processing. For more information
about tPigDistinct, see section tPigDistinct.
tPigFilterColumns
tPigFilterColumns component belongs to two component families: Big Data and Processing. For more
information about tPigFilterColumns, see section tPigFilterColumns.
tPigFilterRow
tPigFilterRow component belongs to two component families: Big Data and Processing. For more information
about tPigFilterRow, see section tPigFilterRow.
tPigJoin
tPigJoin component belongs to two component families: Big Data and Processing. For more information about
tPigJoin, see section tPigJoin.
tPigLoad
tPigLoad component belongs to two component families: Big Data and Processing. For more information about
tPigLoad, see section tPigLoad.
tPigMap
tPigMap component belongs to two component families: Big Data and Processing. For more information about
tPigMap, see section tPigMap.
tPigReplicate
tPigReplicate component belongs to two component families: Big Data and Processing. For more information
about tPigReplicate, see section tPigReplicate.
tPigSort
tPigSort component belongs to two component families: Big Data and Processing. For more information about
tPigSort, see section tPigSort.
tPigStoreResult
tPigStoreResult component belongs to two component families: Big Data and Processing. For more information
about tPigStoreResult, see section tPigStoreResult.
tReplace
tReplace Properties
Component family
Processing
Function
Carries out a Search & Replace operation in the input columns defined.
Purpose
Basic settings
Search / Replace (Simple Mode): Click the [+] button to add as many conditions as needed. The conditions are performed one after the other for each row.
Input column: Select the column of the schema the search & replace is to be operated on.
Search: Type in the value to search in the input column.
Replace with: Type in the substitution value.
Whole word: Select this check box if the searched value is to be considered as whole.
Case sensitive: Select this check box to care about the case.
Note that you cannot use regular expressions in these columns.
Advanced mode: Select this check box when the operation you want to perform cannot be carried out through the simple mode. In the text field, type in the regular expression as required.
Advanced settings
tStatCatcher Statistics: Select this check box to gather the Job processing metadata at the Job level as well as at each component level.
Usage
This component is not startable as it requires an input flow, and it requires an output component.
If you have subscribed to one of the Talend solutions with Big Data, you can also use this
component as a Map/Reduce component. In a Talend Map/Reduce Job, this component is
used as an intermediate step and other components used along with it must be Map/Reduce
components, too. They generate native Map/Reduce code that can be executed directly in
Hadoop.
For further information about a Talend Map/Reduce Job, see the sections describing how to
create, convert and configure a Talend Map/Reduce Job of the Talend Open Studio for Big Data
Getting Started Guide.
Note that in this documentation, unless otherwise explicitly stated, a scenario presents only
Standard Jobs, that is to say traditional Talend data integration Jobs, and non Map/Reduce Jobs.
2118
Drop the following components from the Palette onto the design workspace: tFileInputDelimited, tReplace,
tFilterColumns and tFileOutputDelimited.
Connect the components together using Row > Main connections, available on the right-click menu of each component.
Select the tFileInputDelimited component and set the input flow parameters.
The File is a simple csv file stored locally. The Row Separator is a carriage return and the Field Separator is
a semi-colon. The Header holds the column names, and no Footer nor Limit are set.
The file contains characters such as *t and ., as well as Nikson, which we want to turn into Nixon, and streat, which
we want to turn into Street.
The schema for this file is also built-in and is made of four columns of various types (string or int).
Now select the tReplace component to set the search & replace parameters.
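The search & replace rules configured here amount to a chain of substitutions applied to each row. As a rough Java sketch of the equivalent logic (illustrative only, not the component's generated code):

public class ReplaceSketch {

    // Illustrative only: the chain of substitutions configured in tReplace,
    // applied to a field value, one condition after the other for each row.
    static String clean(String value) {
        return value
                .replace("*t", "")          // drop the parasitical characters
                .replace("Nikson", "Nixon")
                .replace("streat", "Street");
    }

    public static void main(String[] args) {
        System.out.println(clean("streat;Richad;Nikson;78.23$"));
        // prints: Street;Richad;Nixon;78.23$
    }
}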
2119
The tFilterColumns component holds a schema editor allowing you to build the output schema based on the column
names of the input schema. In this use case, add one new column named empty_field and change the order of
the input schema columns to obtain a schema as follows: empty_field, Firstname, Name, Street, Amount.
Click OK to validate.
2120
The first column is empty, the rest of the columns have been cleaned of the parasitical characters, and Nikson
was replaced with Nixon. The Street column was moved, and the decimal delimiter has been changed from a dot
to a comma, along with the currency sign.
Note that the Talend Map/Reduce components are available to subscription-based Big Data users only and this
scenario can be replicated only with Map/Reduce components.
2121
The sample data to be used in this scenario is the same as in the Job described earlier, reading as follows:
streat;John;Kennedy;98.30$
streat;Richad;Nikson;78.23$
streat;Richard;Nikson;78.2$
streat;toto;Nikson;78.23$
streat;Richard;Nikson;78.23$
street;Georges *t;bush;99.99$
Since Talend Studio allows you to convert a Job between its Map/Reduce and Standard (Non Map/Reduce)
versions, you can convert the scenario explained earlier to create this Map/Reduce Job. This way, many
components used can keep their original settings so as to reduce your workload in designing this Job.
Before starting to replicate this scenario, ensure that you have appropriate rights and permissions to access the
Hadoop distribution to be used. Then proceed as follows:
In the Repository tree view of the Integration perspective of Talend Studio, right-click the Job you have
created in the earlier scenario to open its contextual menu and select Edit properties.
Then the [Edit properties] dialog box is displayed. Note that the Job must be closed before you are able to
make any changes in this dialog box.
This dialog box looks like the image below:
2122
Note that you can change the Job name as well as the other descriptive information about the Job from this
dialog box.
2.
Click Convert to Map/Reduce Job. Then a Map/Reduce Job using the same name appears under the Map/
Reduce Jobs sub-node of the Job Design node.
If you need to create this Map/Reduce Job from scratch, you have to right-click the Job Design node or the Map/
Reduce Jobs sub-node and select Create Map/Reduce Job from the contextual menu. Then an empty Job is
opened in the workspace. For further information, see the section describing how to create a Map/Reduce Job of
the Talend Open Studio for Big Data Getting Started Guide.
Double-click this new Map/Reduce Job to open it in the workspace. The Map/Reduce components' Palette is
opened accordingly and in the workspace, the crossed-out components, if any, indicate that those components
do not have the Map/Reduce version.
2.
Right-click each of those components in question and select Delete to remove them from the workspace.
3.
Drop a tHDFSInput component and a tHDFSOutput component in the workspace. The tHDFSInput
component reads data from the Hadoop distribution to be used and the tHDFSOutput component writes data
in that distribution.
If you are creating the Job from scratch, you have to drop a tReplace component and a tFilterColumns component, too.
4.
Connect tHDFSInput to tReplace using the Row > Main link and accept to get the schema of tReplace.
5.
Click Run to open its view and then click the Hadoop Configuration tab to display its view for configuring
the Hadoop connection for this Job.
This view looks like the image below:
2123
2.
From the Property type list, select Built-in. If you have created the connection to be used in the Repository,
select Repository instead, so that the Studio reuses that set of connection information for this Job.
For further information about how to create a Hadoop connection in the Repository, see the chapter describing
the Hadoop cluster node of the Talend Open Studio for Big Data Getting Started Guide.
3.
In the Version area, select the Hadoop distribution to be used and its version. If you cannot find the
distribution corresponding to yours in the list, select Custom so as to connect to a Hadoop distribution not officially
supported in the Studio.
For a step-by-step example about how to use this Custom option, see section Connecting to a custom Hadoop
distribution.
Note that if you use Hortonworks Data Platform V2.0.0, the type of the operating system for running the
distribution and a Talend Job must be the same, such as Windows or Linux.
4.
In the Name node field, enter the location of the master node, the NameNode, of the distribution to be used.
For example, hdfs://talend-cdh4-namenode:8020.
5.
In the Job tracker field, enter the location of the JobTracker of your distribution. For example, talend-cdh4-namenode:8021.
Note that the notion of Job in the term JobTracker designates the MR or MapReduce jobs described in
Apache's documentation at http://hadoop.apache.org/.
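For reference, the NameNode and JobTracker locations entered above likely correspond to standard Hadoop client-side properties, which the Studio sets for you from this tab. A minimal sketch, assuming an MR1-style distribution such as the CDH4 example (illustrative only):

import org.apache.hadoop.conf.Configuration;

public class HadoopConnectionSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Name node field of the Hadoop Configuration tab
        conf.set("fs.default.name", "hdfs://talend-cdh4-namenode:8020");
        // Job tracker field of the Hadoop Configuration tab (MR1-style property name)
        conf.set("mapred.job.tracker", "talend-cdh4-namenode:8021");
        System.out.println(conf.get("fs.default.name"));
    }
}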
6.
If the distribution to be used requires Kerberos authentication, select the Use Kerberos authentication check
box and complete the authentication details. Otherwise, leave this check box clear.
If you need to use a Kerberos keytab file to log in, select Use a keytab to authenticate. A keytab file contains
pairs of Kerberos principals and encrypted keys. You need to enter the principal to be used in the Principal
field and the access path to the keytab file itself in the Keytab field.
Note that the user that executes a keytab-enabled Job is not necessarily the one a principal designates but
must have the right to read the keytab file being used. For example, the user name you are using to execute
a Job is user1 and the principal to be used is guest; in this situation, ensure that user1 has the right to read
the keytab file to be used.
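In plain Hadoop code, a keytab-based login of this kind is usually performed through the UserGroupInformation API. A hedged sketch reusing the guest principal from the example above; the keytab path is a placeholder:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

public class KeytabLoginSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);
        // The OS user running the Job (user1 in the example) only needs read access
        // to this keytab file; the Kerberos identity used is the principal (guest).
        UserGroupInformation.loginUserFromKeytab("guest", "/path/to/guest.keytab");
        System.out.println(UserGroupInformation.getLoginUser());
    }
}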
7.
In the User name field, enter the login user name for your distribution. If you leave it empty, the user name
of the machine hosting the Studio will be used.
8.
In the Temp folder field, enter the path in HDFS to the folder where you store the temporary files generated
during Map/Reduce computations.
9.
Leave the default value of the Path separator in server field as it is, unless you have changed the separator used
by your Hadoop distribution's host machine for its PATH variable; in other words, unless that separator is not a
colon (:). In that situation, you must change this value to the one used on that host.
10. Leave the Clear temporary folder check box selected, unless you want to keep those temporary files.
11. If the Hadoop distribution to be used is Hortonworks Data Platform V1.2 or Hortonworks Data Platform
V1.3, you need to set proper memory allocations for the map and reduce computations to be performed by
the Hadoop system.
In that situation, enter the appropriate values in the Mapred job map memory mb and
the Mapred job reduce memory mb fields, respectively. By default, both values are 1000, which is
normally appropriate for running the computations.
For further information about this Hadoop Configuration tab, see the section describing how to configure the
Hadoop connection for a Talend Map/Reduce Job of the Talend Open Studio for Big Data Getting Started Guide.
2124
2.
Click the [...] button next to Edit schema to verify that the schema received in the earlier steps is properly defined.
Note that if you are creating this Job from scratch, you need to click the [+] button to manually add these
schema columns; otherwise, if the schema has been defined in Repository, you can select the Repository
option from the Schema list in the Basic settings view to reuse it. For further information about how to
define a schema in Repository, see the chapter describing metadata management in the Talend Studio User
Guide or the chapter describing the Hadoop cluster node in Repository of the Talend Open Studio for Big
Data Getting Started Guide.
3.
If you make changes in the schema, click OK to validate these changes and accept the propagation prompted
by the pop-up dialog box.
4.
In the Folder/File field, enter the path, or browse to the source file you need the Job to read.
2125
If this file is not in the HDFS system to be used, you have to place it in that HDFS, for example, using
tFileInputDelimited and tHDFSOutput in a Standard Job.
This component keeps its configuration used by the original Job. It searches incoming entries and replaces
the ones you have specified in the Search column with the values given in the Replace with column.
2.
The component keeps its schema from the original Job, but the order of its columns no longer stays as it
was rearranged in the earlier scenario: it has automatically changed back to its original order.
2126
Configuring tHDFSOutput
1.
2.
As explained earlier for verifying the schema of tHDFSInput, do the same to verify the schema of
tHDFSOutput. If it is not consistent with that of its preceding component, tFilterColumns, click Sync
columns to retrieve the schema of tFilterColumns.
3.
In the Folder field, enter the path, or browse to the folder you want to write the unique entries in.
2127
4.
From the Action list, select the operation you need to perform on the folder in question. If the folder already
exists, select Overwrite; otherwise, select Create.
If you need to obtain more details about the Job, it is recommended to use the web console of the JobTracker
provided by the Hadoop distribution you are using.
2128
tSampleRow
tSampleRow
tSampleRow properties
Component family: Processing

Function

Purpose: tSampleRow helps to select rows according to a list of single lines and/or a list of groups of lines.

Basic settings

Range

Usage: This component handles flows of data, therefore it requires input and output components.

Limitation: n/a
Drop the following components from the Palette onto the design workspace: tRowGenerator, tSampleRow,
and tLogRow.
2.
2129
In the design workspace, select tRowGenerator, and click the Component tab to define the basic settings
for tRowGenerator.
2.
Click the [...] button next to Edit Schema to define the data you want to use as input. In this scenario, the
schema is made of five columns.
3.
In the Basic settings view, click RowGenerator Editor to define the data to be generated.
4.
In the RowGenerator Editor, specify the number of rows to be generated in the Number of Rows for
RowGenerator field and click OK. The RowGenerator Editor closes.
5.
In the design workspace, select tSampleRow and click the Component tab to define the basic settings for
tSampleRow.
2130
6.
In the Basic settings view, set the Schema to Built-In and click Sync columns to retrieve the schema from
the tRowGenerator component.
7.
In the Range panel, set the filter to select your rows using the correct syntax as explained. In this scenario,
we want to select the first and fifth lines along with the group of lines between 9 and 12.
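The selection described here (single rows 1 and 5, plus the group of rows 9 through 12) boils down to the row-number test sketched below. This is only an illustration of the filtering logic, not the component's Range syntax:

public class SampleRowSketch {
    // Illustrative only: keep single rows 1 and 5, plus the group of rows 9 to 12.
    static boolean keepRow(int rowNumber) {
        return rowNumber == 1
                || rowNumber == 5
                || (rowNumber >= 9 && rowNumber <= 12);
    }

    public static void main(String[] args) {
        for (int row = 1; row <= 15; row++) {
            if (keepRow(row)) {
                System.out.println("row " + row + " is kept");
            }
        }
    }
}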
8.
In the design workspace, select tLogRow and click the Component tab to define its basic settings. For more
information about tLogRow, see section tLogRow.
2.
Press F6, or click Run on the Run tab to execute the Job.
The filtering result displayed on the console shows the first and fifth rows and the group of rows between
9 and 12.
2131
tSortRow
tSortRow
tSortRow properties
Component family
Processing
Function
Sorts input data based on one or several columns, by sort type and order
Purpose
Basic settings
Criteria
Advanced settings

Sort on disk: Not available for the Map/Reduce version of this component.
Temp data directory path: Set the location where the temporary files should be stored.
Create temp data directory if not exists: Select this check box to create the directory if it does not exist.
Buffer size of external sort: Type in the size of physical memory you want to allocate to sort processing.
tStatCatcher Statistics: Select this check box to gather the Job processing metadata at the Job level as well as at each component level.

Usage

This component handles a flow of data, therefore it requires input and output components, and hence is defined as an intermediary step.

Usage in Map/Reduce Jobs

If you have subscribed to one of the Talend solutions with Big Data, you can also use this component as a
Map/Reduce component. In a Talend Map/Reduce Job, this component is used as an intermediate step and
other components used along with it must be Map/Reduce components, too. They generate native Map/
Reduce code that can be executed directly in Hadoop.
For further information about a Talend Map/Reduce Job, see the sections describing how to create, convert
and configure a Talend Map/Reduce Job of the Talend Open Studio for Big Data Getting Started Guide.
For a scenario demonstrating a Map/Reduce Job using this component, see section Scenario 2: Deduplicating
entries using Map/Reduce components.
Note that in this documentation, unless otherwise explicitly stated, a scenario presents only Standard Jobs,
that is to say traditional Talend data integration Jobs, and non Map/Reduce Jobs.
2132
Limitation
n/a
Drop the three components required for this use case: tRowGenerator, tSortRow and tLogRow from the
Palette to the design workspace.
Connect them together using Row main links.
On the tRowGenerator editor, define the values to be randomly used in the Sort component. For more
information regarding the use of this particular component, see section tRowGenerator.
In this scenario, we want to rank each salesperson according to their Sales value and their number of years in
the company.
Double-click tSortRow to display the Basic settings tab panel. Set the sort priority on the Sales value and, as
the secondary criterion, on the number of years in the company.
Use the plus button to add the number of rows required. Set the type of sorting: in this case, both criteria being
integers, the sort is numerical. Finally, given that the wanted output is a rank classification, set the order as
descending.
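The ordering described above can be pictured as the following comparator. This is only a sketch of the sort logic (record and field names are made up for the illustration), not the component's generated code:

import java.util.Arrays;
import java.util.Comparator;

public class SortRowSketch {
    record SalesRecord(String name, int sales, int years) {}

    public static void main(String[] args) {
        SalesRecord[] rows = {
                new SalesRecord("A", 120, 3),
                new SalesRecord("B", 120, 7),
                new SalesRecord("C", 90, 10)
        };
        // Numerical sort, descending: first on Sales, then on the number of years.
        Arrays.sort(rows, Comparator
                .comparingInt((SalesRecord r) -> r.sales)
                .thenComparingInt(r -> r.years)
                .reversed());
        System.out.println(Arrays.toString(rows)); // B, A, C
    }
}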
2133
Display the Advanced Settings tab and select the Sort on disk check box to modify the temporary memory
parameters. In the Temp data directory path field, type the path to the directory where you want to store the
temporary data. In the Buffer size of external sort field, set the maximum buffer value you want to allocate
to the processing.
The default buffer value is 1000000 but the more rows and/or columns you process, the higher the value needs to be to
prevent the Job from automatically stopping. In that event, an out of memory error message displays.
Make sure you connected this flow to the output component, tLogRow, to display the result in the Job console.
Press F6 to run the Job. The ranking is based first on the Sales value and then on the number of years of
experience.
2134
tSplitRow
tSplitRow
tSplitRow properties
Component family
Processing/Fields
Function
Purpose
This component helps to split one input row into several output rows.
Basic settings
Columns mapping
Advanced settings
tStatCatcher Statistics
Select this check box to gather the Job processing metadata at the
Job level as well as at each component level.
Usage
This component splits one input row into multiple output rows by mapping input columns onto
output columns.
Limitation
n/a
1. Drop the following components required for this use case: tFixedFlowInput, tSplitRow and tLogRow from
the Palette to the design workspace.
2. Connect them together using Row Main connections.
3. Double-click tFixedFlowInput to open its Basic settings view.
2135
7. Click the plus button to add twelve lines for the input columns: Company, City, State, CountryCode, Street,
Industry, Company2, City2, State2, CountryCode2, Street2 and Industry2.
8. Click OK to close the dialog box.
9. Double-click tSplitRow to open its Basic settings view.
2136
10.Click Edit schema to set the schema for the output data.
11.Click the plus button beneath the tSplitRow_1(Output) table to add four lines for the output columns: Company,
CountryCode, Address and Industry.
12.Click OK to close the dialog box. Then an empty table with column names defined in the preceding step will
appear in the Columns mapping area:
13.Click the plus button beneath the empty table in the Columns mapping area to add two lines for the output rows.
14.Fill the table in the Columns mapping area by columns with the following values:
Company: row1.Company, row1.Company2;
Country: row1.CountryCode, row1.CountryCode2;
2137
The value in Address column, for example, row1.Street+","+row1.City+","+row1.State, will display an absolute address
by combining values in Street column, City column and State column together. The "row1" used in the values of each
column refers to the input row from tFixedFlowInput.
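To make the two mapping lines concrete, the sketch below shows the same split in plain Java: each input row yields two output rows, with the Address columns concatenated as described in the note above. Column positions follow the twelve input columns of this scenario; the sample values are placeholders, and this is not the component's generated code:

import java.util.Arrays;

public class SplitRowSketch {
    // Illustrative only: one input row produces two output rows, the first built from the
    // Company/City/State/CountryCode/Street/Industry columns and the second from their
    // *2 counterparts, with Address concatenated as Street,City,State.
    static String[][] split(String[] in) {
        String address1 = in[4] + "," + in[1] + "," + in[2];
        String address2 = in[10] + "," + in[7] + "," + in[8];
        return new String[][] {
                { in[0], in[3], address1, in[5] },   // Company, CountryCode, Address, Industry
                { in[6], in[9], address2, in[11] }
        };
    }

    public static void main(String[] args) {
        String[] input = { "CompanyA", "CityA", "StateA", "US", "StreetA", "Retail",
                "CompanyB", "CityB", "StateB", "FR", "StreetB", "Retail" };
        System.out.println(Arrays.deepToString(split(input)));
    }
}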
16.Click Sync columns to retrieve the schema defined in the preceding component.
17.Select Table in the Mode area.
18.Save the Job and press F6 to run it.
The input data in one row is split into two rows of data containing the same company information.
2138
tWriteJSONField
tWriteJSONField
tWriteJSONField properties
Component family
Processing/Fields
Function
Purpose
tWriteJSONField transforms the incoming data into JSON fields and transfers them to a file,
a database table, etc.
Basic settings
Output Column: List of the columns defined in the output schema to hold the JSON field generated.

Sync columns

Group by

Remove root node: Select this check box to remove the root node from the JSON field generated.
Advanced settings
tStatCatcher Statistics
Usage
Preceded by an input component, this component wraps the incoming data into a JSON field.
If you have subscribed to one of the Talend solutions with Big Data, you can also use this
component as a Map/Reduce component. In a Talend Map/Reduce Job, this component is
used as an intermediate step and other components used along with it must be Map/Reduce
components, too. They generate native Map/Reduce code that can be executed directly in
Hadoop.
You need to use the Hadoop Configuration tab in the Run view to define the connection to
a given Hadoop distribution for the whole Job.
For further information about a Talend Map/Reduce Job, see the sections describing how to
create, convert and configure a Talend Map/Reduce Job of the Talend Open Studio for Big
Data Getting Started Guide.
Note that in this documentation, unless otherwise explicitly stated, a scenario presents only
Standard Jobs, that is to say traditional Talend data integration Jobs, and non Map/Reduce
Jobs.
2139
Global Variables
Drop the following components from the Palette onto the design workspace: tFixedFlowInput,
tWriteJSONField and tLogRow.
2.
3.
2140
2.
Click the [+] button to add three columns, namely firstname, lastname and dept, with the type of string.
Click OK to close the editor.
3.
Select the Use Inline Content option and enter the data below in the Content box:
Andrew;Wallace;Doc
John;Smith;R&D
Christian;Dior;Sales
4.
Select the Remove root node option to remove the root node setting from the JSON fields generated.
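To give an idea of the expected result, the sketch below builds, for each of the three input lines above, a JSON field of roughly the shape this scenario is meant to produce once the root node is removed. The exact layout depends on the XML tree defined in the following steps; the snippet is only an illustration:

public class WriteJsonFieldSketch {
    public static void main(String[] args) {
        String[] rows = { "Andrew;Wallace;Doc", "John;Smith;R&D", "Christian;Dior;Sales" };
        for (String row : rows) {
            String[] f = row.split(";");
            // Roughly the JSON field expected for each row once the root node is removed.
            System.out.println("{\"firstname\":\"" + f[0] + "\",\"lastname\":\"" + f[1]
                    + "\",\"dept\":\"" + f[2] + "\"}");
        }
    }
}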
5.
2141
In the Linker target panel, click the default rootTag and type in staff, which is the root node of the JSON
field to be generated.
7.
Right-click staff and select Add Sub-element from the context menu.
8.
Repeat the steps to add two more sub-nodes, namely lastname and dept.
9.
Right-click firstname and select Set As Loop Element from the context menu.
10. Drop firstname from the Linker source panel to its counterpart in the Linker target panel.
In the pop-up dialog box, select Add linker to target node.
2142
12. Click the [+] button in the right panel to add one column, namely staff, which will hold the JSON data
generated.
Click OK to close the editor.
13. Double-click tLogRow to display its Basic settings view.
Select Table (print values in cells of a table) for a better display of the results.
2.
2143
As shown above, the JSON fields have been generated correctly, with the root node settings removed.
Related Scenarios
For related scenarios, see:
section Scenario 1: Retrieving error messages while extracting data from JSON fields.
section Scenario: Extracting the structure of an XML file and inserting it into the fields of a database table.
section Mapping XML data.
2144
tXMLMap
tXMLMap
tXMLMap properties
Component family
Processing/XML
Function
tXMLMap is an advanced component fine-tuned for transforming and routing XML data flow
(data of the Document type), especially when processing numerous XML data sources, with
or without flat data to be joined.
Purpose
tXMLMap transforms and routes data from single or multiple sources to single or multiple
destinations.
Basic settings
Map editor
Advanced settings
tStatCatcher Statistics: Select this check box to gather the Job processing metadata at the Job
level as well as at each component level.
Usage
Possible uses range from a simple reorganization of fields to the most complex Jobs of data
multiplexing or demultiplexing, transformation, concatenation, inversion, filtering and so on.
When need be, you can define a sophisticated output strategy for the output XML flows
using the group element, the aggregate element, the empty element and many other features such as All
in one. For further information about these features, see Talend Studio User Guide.
It is used as an intermediate component and fits perfectly in processes requiring many XML
data sources, such as the ESB request-response processes.
Limitation
The following sections present several generic use cases showing how to use the tXMLMap component. If
you need specific examples that use this component along with the ESB components to build data services,
see the scenarios for the ESB components:
section Scenario 2: Using tESBConsumer with custom SOAP Headers
If you need further information about the principles of mapping multiple input and output flows, see Talend Studio
User Guide.
2145
tXMLMap: this component maps and transforms the received XML data flows into one single XML data flow.
tLogRow: this component is used to display the output data.
2. Right-click the tFileInputXML component labelled Customers to open its contextual menu.
3. From this menu, select Row > Main link to connect this component to tXMLMap.
4. Repeat this operation to connect tXMLMap to tLogRow using Row > *New output* (Main) link. A dialog
box pops up to prompt you to name this output link. In this scenario, name it as Customer_States.
2146
2. Next to Edit schema, click the three-dot button to open the schema editor.
3. In the schema editor, click the plus button to add one row.
4. In the Column column, type in a new name for this row. In this scenario, it is Customer.
5. In the Type column, select the data type of this row. In this scenario, it is Document. The document data type
is essential for making full use of tXMLMap. For further information about this data type, see Talend Studio
User Guide.
6. Click OK to validate this editing and accept the propagation prompted by the popup dialog box. One row is
added automatically to the Mapping table.
7. In the File name / Stream field, browse to, or type in the path to the XML source that provides the customer
data.
8. In the Loop XPath query field, type in / to replace the default one. This means the source data is queried
from the root.
9. In the XPath query column of the Mapping table, type in the XPath. In this scenario, type in ., meaning
that all of the data from source are queried.
10.In the Get Nodes column of the Mapping table, select the check box.
In order to build the Document type data flow, it is necessary to get the nodes from this component.
2147
3. From this contextual menu, select Import From File and in the pop-up dialog box, browse to the corresponding
source file in order to import therefrom the XML structure used by the data to be received by tXMLMap. In
this scenario, the source file is Customer.xml, which is the data input to tFileInputXML (Customers).
You can also import an XML tree from an XSD file. When importing either an input or an output XML tree structure
from an XSD file, you can choose an element as the root of your XML tree. For more information on importing an XML
tree from an XSD file, see Talend Studio User Guide.
4. In the imported XML tree, right click the Customer node and select As loop element to set it as the loop element.
5. On the lower part of this map editor, click the schema editor tab to display the corresponding view.
6. On the right side of this view, click the plus button to add one row to the Customer table and rename this row
as Customer_States.
7. In the Type column of this Customer_States row, select Document as the data type. The corresponding XML
root is added automatically to the top table on the right side which represents the output flow.
8. On the right side in the top table labelled Customer_States, import the XML data structure that you need to use
from the corresponding XML source file. In this scenario, it is Customer_State.xml.
2148
9. Right click the customer node and select As loop element from the contextual menu.
Then you can begin to map the input flow to the output flow.
10.In the top table on the input side (left) of the map editor, click the id node and drop it to the Expression column
in the row corresponding to the output row you need to map. In this scenario, it is the @id node.
11.Do the same to map CustomerName to CustomerName, CustomerAddress to CustomerAddress and idState to
idState from the left side to the right side.
In a real project, you may have to keep empty elements in your output XML tree. If so, you can use tXMLMap to
manage them. For further information about how to manage empty elements using tXMLMap, see Talend Studio User
Guide.
12.If you need to generate a single XML flow, click the wrench icon on top of the output side to open the setting
panel and set the All in one feature to true. In this example, this option is set to true. For further information
about the All in one feature, see Talend Studio User Guide.
2149
2150
2. Right-click the tFileInputXML component labelled USstates to open its contextual menu and select Row >
Main connection to connect this component to tXMLMap. As you create this connection second, it is of the
Lookup type.
3. Double click the tFileInputXML component labelled USstates to open its Component view.
4. Next to Edit schema, click the three-dot buttons to open the schema editor.
5. Click the plus button to add one row and rename it, for example, as USState.
6. In the Type column, select the Document option from the drop-down list.
2151
7. Click OK to validate this editing and accept the propagation prompted by the pop-up dialog box.
8. In the File name/Stream field, browse to or type in the path to the USStates.xml file.
9. In the Loop XPath query field, type in "/" to replace the default value. This means the loop is based on the root.
10.In the Mapping table, where one row is already added automatically, enter "." in the XPath query column
to retrieve US States data from the source file.
11.In the Get Nodes column, select the check box. This retrieves the XML structure for the Document type data.
2152
Then you can begin to modify the mapping you have done in the previous scenario to join the complementary
data into the input flow. This mapping should then look as follows:
3. In the lookup table on the input side (left) of the map editor, click the LabelState row and drop it on the customer
node on the output side. A dialog box pops up.
4. In this dialog box, select Create as sub-element of target node and click OK. This operation adds a new sub-element to the output XML tree and maps it with LabelState on the input side at the same time.
5. If you need to generate a single XML flow, click the wrench icon on top of the output side to open the setting
panel and set the All in one feature to true. In this example, this option is set to true. For further information
about the All in one feature, see Talend Studio User Guide.
2153
The US state labels that correspond to the state IDs provided as the lookup key by the main data flow are selected
and outputted.
A step-by-step tutorial related to this Join topic is available on the Talend Technical Community Site. For further
information, see http://talendforge.org/tutorials/tutorial.php?language=english&idTuto=101.
2154
1. In your Studio, open the Job used in the previous scenario to display it in the Design workspace.
2. Double click tXMLMap to open its editor. In this editor, the input and output data flows have been mapped
since the replication of the previous scenario.
3.
On the output side (right), click the filter button at the top of the output table to display the filter area.
2155
4. In this filter area, drop the idState node from the tree view of the input data flow. The Xpath of idState is added
automatically to this filter area.
5. Still in this area, write down the filter condition of interest in Java. In this scenario, this condition reads:
"9".equals([row1.Customer:/Customers/Customer/Address/idState])
6. If you need to generate a single XML flow, click the wrench icon on top of the output side to open the setting
panel and set the All in one feature to true. In this example, this option is set to true. For further information
about the All in one feature, see Talend Studio User Guide.
2156
The result says that the customer Pivot Point College is selected as its state ID is 9, representing the Florida state
in this scenario.
2157
9. In the Reject table presented on the right part of this Schema editor view, rename each of the four newly added
rows. They are: ID, Customer, idState, LabelState.
2158
In this scenario, the Reject output flow uses flat data type. However, you can create an XML tree view for this flow
using the Document data type. For further information about how to use this Document type, see section Scenario 1:
Mapping and transforming XML data.
The Reject table is completed and thus you have defined the schema of the output flow used to carry the captured
rejected data. Then you need to set up the condition(s) to catch the rejected data of interest.
10.On the upper part of the output side in this Map editor, select the Reject table.
11.
At the top of this table, click the wrench icon to display the setting area.
12.In the Catch Output Reject row of the setting area, select true from the drop-down list. Thus tXMLMap
outputs the data rejected by the filter set up in the previous scenario for the Customer output flow.
13.Do the same thing to switch the Catch Lookup Inner Join Reject row to the true option.
14.Click OK to validate this editing and close this editor.
15.Press F6 to run this Job.
The captured data rejected by the filter and the lookup reads as follows in the Run view:
None of the State IDs of these customers is 9. The customer BBQ Smiths Tex Mex is marked with the state ID 60.
This number does not exist in the idState column of USState.txt, on which the lookup was defined, so the data
of this customer is rejected by the lookup, while the other data is rejected by the filter.
The data selected by the filter you set up in the previous scenario reads as follows in XML format.
2159
To replicate this scenario, you can reuse the Job in section Scenario 2: Launching a lookup in a second XML flow
to join complementary data.
In this Job, double click tXMLMap to open the Map editor.
The objective of this scenario is to group the customer id and the customer name information according to the States
the customers come from. To do this, you need to adjust the XML structure, taking the following factors into consideration:
The elements tagging the customer id and the customer name information should be located under the loop
element. Thus they are the sub-elements of the loop element.
The loop element and its sub-elements should be dependent directly on the group element.
The element tagging the States used as grouping condition should be dependent directly on the group element.
2160
Based on this analysis, the structure of the output data should read as follows:
2161
In this figure, the customers node is the root, the Customer element is set as group element and the output data
is grouped according to the LabelState element.
To set a group element, two restrictions must be respected:
the root node cannot be set as group element;
the group element must be the parent of the loop element.
Once the group element is set, the first element other than the loop element is used as the condition to group the output data.
2. Again in the XML tree view of the output side, right-click the root node customers to open the contextual menu
and select Create sub-element. Then a dialog box pops up.
2162
9. Do the same to map the id element and the CustomerName element between both sides. This completes the
modification.
10.If you need to generate a single XML flow, click the wrench icon on top of the output side to open the setting
panel and set the All in one feature to true. In this example, this option is set to true. For further information
about the All in one feature, see Talend Studio User Guide.
11.Click OK to validate this modification and close this editor.
If you close the Map Editor without having set the required loop elements as described earlier in this scenario, the root
element will be automatically set as loop element.
The id element and the CustomerName element contained in the loop are grouped according to the LabelState
element. The group element Customer tags the start and the end of each group.
tXMLMap provides group element and aggregate element to classify data in the XML tree structure. When handling one
row of data ( one complete XML flow), the behavioral difference between them is:
The group element processes the data always within one single flow.
The aggregate element splits this flow into separate and complete XML flows.
2163
On the Design workspace, double-click the tXMLMap component to open its Map editor. There the output side
reads as follows:
The objective of this scenario is to classify the customer information using an aggregate element, in accordance with
the States they come from, and then to send these classes separately, in different XML flows, to the component
that follows.
To put an aggregate element into effect, the XML data to be processed should have been sorted, for example via your XML
tools, around the element you need to use as the aggregating condition. The figure below presents part of the sorted source
data used in this scenario. The customers possessing the same State id are already grouped together.
2164
2165
tXMLMap outputs three separate XML flows, each of which carries the information of one State and the
customers from that State.
tXMLMap provides group element and aggregate element to classify data in the XML tree structure. When handling one
row of data ( one complete XML flow), the behavioral difference between them is:
The group element processes the data always within one single flow.
The aggregate element splits this flow into separate and complete XML flows.
2166
The objective of this restructuring is to streamline the presentation of the products information to serve the
manufacturing operations.
The output flow is expected to read as follows:
2167
In the output flow, the root element is changed to manufactures, the sales information is selected and consolidated
into the sale element and the manufacture element is reduced to one single level.
To replicate this scenario, proceed as follows:
2168
1.
On the workspace, drop tFileInputXML, tXMLMap, tLogRow and tFileOutputDelimited from the
Palette.
2.
Right-click tFileInputXML to open its contextual menu and select the Row > Main link from this menu to
connect this component to the tXMLMap component.
3.
Repeat this operation to connect tXMLMap to tLogRow using Row > *New output* (Main) link. A dialog
box pops up to prompt you to name this output link. In this scenario, name it as outDoc.
4.
Do the same to connect tLogRow to tFileOutputDelimited using the Row > Main link.
2.
Click the [...] button next to Edit schema to open the schema editor.
3.
Click the [+] button to add one row to the editor and rename it as doc.
4.
In the Type column, select Document from the drop-down list as the type of the input flow.
5.
In the File name / Stream field, browse to, or type in the path to the XML source that provides the customer
data.
2169
6.
In the Loop XPath query field, type in / to replace the default one. This means the source data is queried
from the root.
7.
In the XPath query column of the Mapping table, type in the XPath. In this scenario, type in ., meaning
that all of the data from source are queried.
8.
In the Get Nodes column of the Mapping table, select the check box.
2.
3.
From this contextual menu, select Import From File and in the pop-up dialog box, browse to the
corresponding source file in order to import therefrom the XML structure used by the data to be received
by tXMLMap. In this scenario, the source file is input2.xml, which provides the data read and loaded by
tFileInputXML.
4.
In the imported XML tree, right-click the manufacture node and select As loop element to set it as the loop
element. Then do the same to set the types node and the sale node as loop element, respectively.
5.
On the lower part of this map editor, click the schema editor tab to display the corresponding view.
6.
On the right side of this view, click the [+] button to add one row to the outDoc table and rename this row
as outDoc.
7.
In the Type column of this outDoc row, select Document as the data type. The corresponding XML root is
added automatically to the top table on the right side which represents the output flow.
2170
8.
On the right side in the top table labelled outDoc, import the XML data structure that you need to use from the
corresponding XML source file. In this scenario, it is ref.xml. This file provides the expected XML structure
mentioned earlier.
9.
Right-click the manufacture node and select As loop element from the contextual menu. Then do the same
to set the types node and the sale node as loop element, respectively.
Then you can begin to map the input flow to the output flow.
10. In the top table on the input side (left) of the map editor, click the @category node and drop it to the
Expression column in the row corresponding to the output row you need to map. In this scenario, it is the
@category node.
2171
When a loop element receives mappings from more than one loop element of the input flow, a [...] button appears next
to this receiving loop element and allows you to set the sequence of the input loops. For example, in this scenario the
2172
types loop element of the output flow is mapped with @id and type which belong to the manufacture loop element
and the types loop element, respectively, so the [...] button appears beside this types loop element.
If the receiving flow is flat data, once it receives mappings from more than one loop element, this [...] button appears
as well, though on the head of the table representing the flat data flow.
14. Click OK to validate the mappings and close the Map Editor.
If you close the Map Editor without having set the required loop elements as described earlier in this scenario, the
root element will be automatically set as loop element.
2.
If this component does not have the same schema as the preceding component, a warning icon appears. In
this case, click the Sync columns button to retrieve the schema from the preceding component; once done, the
warning icon disappears.
3.
Click OK to validate these changes and accept the propagation prompted by the pop-up dialog box.
4.
5.
In the File Name field, browse to, or enter the path to the file you need to generate the output flow in.
Open the file generated, and you will see the expected products data restructured for manufacturing.
2173
System components
This chapter details the main components that you can find in the System family of the Palette in the Integration
perspective of Talend Studio.
The System family groups together components that help you to interact with the operating system.
For Talend Open Studio for Big Data, the Property type, Schema and Query Type of components are always Built-in. For
further information about how to edit a Built-in schema, see Talend Studio User Guide.
tRunJob
tRunJob
tRunJob Properties
Component
family
System
Function
tRunJob executes the Job called in the components properties, in the frame of the context defined.
Purpose
tRunJob helps to master complex Job systems which need to execute one Job after another.
Basic settings

Use dynamic job: Select this check box to allow multiple Jobs to be called and processed. When this
option is enabled, only the latest version of the Jobs can be called and processed.
An independent process will be used to run the subjob. The Context and the Use an
independent process to run subjob options disappear.
The Use dynamic job option is not compatible with the JobServer cache.
Therefore, the execution may fail if you run a Job that contains tRunJob
with this check box selected in Talend Administration Center.
This option is incompatible with the Use or register a shared DB
Connection option of database connection components. When tRunJob
works together with a database connection component, enabling both
options will cause your Job to fail.
Context job
This field is visible only when the Use dynamic job option is selected. Enter the
name of the Job that you want to call from the list of Jobs selected.
Job
Select the Job to be called and processed. Make sure you have already executed the called Job
once beforehand, in order to ensure a smooth run through tRunJob.
Context
If you defined contexts and variables for the Job to be run by the tRunJob, select
the applicable context entry on the list.
Use an independent process to run subjob: Select this check box to use an independent process to run the subjob.
This helps in solving issues related to memory limits.
This option is not compatible with the JobServer cache. Therefore, the
execution may fail if you run a Job that contains tRunJob with this check
box selected in Talend Administration Center.
This option is incompatible with the Use or register a shared DB
Connection option of database connection components. When tRunJob
works together with a database connection component, enabling both
options will cause your Job to fail.
2176
Die on child error: Clear this check box to execute the parent Job even though there is an error when
executing the child Job.
Transmit whole context: Select this check box to get all the context variables from the parent Job. Deselect it
to get all the context variables from the child Job.
If this check box is selected when the parent and child Jobs have the same context
variables defined:
- variable values for the parent Job will be used during the child Job execution if no
relevant values are defined in the Context Param table;
- otherwise, values defined in the Context Param table will be used during the
child Job execution.
Context Param
You can change the value of selected context parameters. Click the [+] button to add
the parameters defined in the Context tab of the child Job. For more information on
context parameters, see Talend Studio User Guide.
The values defined here will be used during the child Job execution even if Transmit
whole context is selected.
Advanced settings

Print Parameters: Select this check box to display the internal and external parameters in the Console.

tStatCatcher Statistics: Select this check box to gather the processing metadata at the Job level as well as
at each component level.

Global Variables
CHILD_RETURN_CODE: Indicates the Java return code of the child Job. This is an After variable and it returns
an integer. If there are no errors, the code value is 0. If an error occurs, an exception message shows.
CHILD_EXCEPTION_STACKTRACE: Returns a Java stack trace from a child Job. This is an After variable
and it returns a string.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the
variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable means it functions
after the execution of a component.
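These After variables are typically read back from the globalMap once the child Job has finished, for example in a Run if condition or a tJava component. A hedged sketch (tRunJob_1 is an assumed component label; adjust it to the one used in your Job; globalMap is only available inside the generated Job code):

// Illustrative only, as it could be written in a tJava component after tRunJob_1 has run.
Integer childCode = (Integer) globalMap.get("tRunJob_1_CHILD_RETURN_CODE");
if (childCode != null && childCode.intValue() != 0) {
    String trace = (String) globalMap.get("tRunJob_1_CHILD_EXCEPTION_STACKTRACE");
    System.err.println("Child Job failed with code " + childCode + ": " + trace);
}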
Connections
Usage
This component can be used as a standalone Job, or it can help to clarify a complex Job by avoiding having too many
subjobs grouped together in one Job.
Limitation
n/a
2177
Drop a tFileInputDelimited and a tLogRow from the Palette to the design workspace.
2.
Connect the two components together using a Row > Main link.
Double-click tFileInputDelimited to open its Basic settings view and define its properties.
2.
Click in the File Name field and then press F5 to open the [New Context Parameter] dialog box and
configure the context variable.
2178
3.
In the Name field, enter a name for this new context variable, File in this example.
4.
In the Default value field, enter the full path to the default input file.
5.
Click Finish to validate the context parameter setup and fill the File Name field with the context variable.
You can also create or edit a context parameter in the Context tab view beneath the design workspace. For more
information, see Talend Studio User Guide.
6.
Click the [...] button next to Edit schema to open the [Schema] dialog box where you can configure the
schema manually.
7.
In the dialog box, click the [+] button to add columns and name them according to the input file structure.
In this example, this component will actually read files defined in the parent Job, and these files contain up to
five columns. Therefore, add five string type columns and name them col_1, col_2, col_3, col_4, and col_5
respectively, and then click OK to validate the schema configuration and close the [Schema] dialog box.
8.
Double-click tLogRow to display its Basic settings view and define its properties.
9.
Drop a tFileList and a tRunJob from the Palette to the design workspace.
2.
2179
Double-click tFileList to open its Basic settings view and define its properties.
2.
In the Directory field, specify the path to the directory that holds the files to be processed, or click the [...]
button next to the field to browse to the directory.
In this example, the directory is called tRunJob and it holds three delimited files with up to five columns.
3.
4.
Check that the Use Glob Expressions as Filemask check box is selected, and then click the [+] button to
add a line in the Files area and define a filter to match files. In this example, enter *.csv to retrieve all
delimited files.
5.
Double-click tRunJob to display its Basic settings view and define its properties.
6.
Click the [...] button next to the Job field to open the [Find a Job] dialog box.
2180
7.
Select the child Job you want to execute and click OK to close the dialog box. The name of the selected Job
appears in the Job field.
8.
In the Context Param area, click the plus button to add a line and define the context parameter. The only
context parameter defined in the child Job, named File, appears in the Parameter cell.
9.
Click in the Values cell, press Ctrl+Space on your keyboard to access the list of context variables, and select
tFileList-1.CURRENT_FILEPATH.
The corresponding context variable appears in the Values cell:
((String)globalMap.get(tFileList-1.CURRENT_FILEPATH)).
For more information on context variables, see Talend Studio User Guide.
2.
The parent Job calls the child Job, which reads the files defined in the parent Job, and the content of the files
is displayed on the Run console.
Related topic: section tLoop, and section Scenario 1: Buffering data of the tBufferOutput component.
2181
tSetEnv
tSetEnv
tSetEnv Properties
Component family
System
Function
tSetEnv adds variables temporarily to the system environment during the execution of a Job.
Purpose
tSetEnv allows you to create variables and execute a Job script by communicating information
about the newly created variables between subjobs. After the Job execution, the newly
created variables are deleted.
Basic settings
Parameters
Click the plus button to add the variables needed for the job.
name: Enter the syntax for the new variable.
value: Enter a parameter value according to the context.
append: Select this check box to add the new variable at the end.
Advanced settings
tStatCatcher Statistics
Select this check box to gather the processing metadata at the Job
level as well as at each component level.
Usage
Limitation
n/a
2182
2183
4. Select the tRunJob component and click the Component tab. In the Job field, type the name of your child
Job, here ChildJob. This will run the child Job when you run the parent Job.
5. Now double-click the tRunJob component to open the child Job ChildJob.
6. Select the tSetEnv component, and click the Component tab. Add a variable row by clicking the [+] button to
set the initial value of the variable. Type Variable_1 in the Name field, and Child job value in the Value field.
7. Select the tMsgBox component and click the Component tab. In the Message field, type the message
displayed in the info-box which confirms that your variable has properly been taken into account. For example:
"Son:"+System.getProperty("Variable_1") displays the variable set in the tSetEnv component (here
Child job value).
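The message expression above suggests that the variable set by tSetEnv is exposed as a Java system property for the duration of the execution. A minimal standalone sketch of that mechanism, outside Talend, assuming this is indeed the case:

public class SetEnvSketch {
    public static void main(String[] args) {
        // Set the property for the duration of the execution (the role tSetEnv plays here).
        System.setProperty("Variable_1", "Child job value");
        // What the tMsgBox message expression reads back.
        System.out.println("Son:" + System.getProperty("Variable_1"));
    }
}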
8. Save your Job, go back to parentJob, then run the Job by pressing F6.
2184
tSSH
tSSH
tSSH Properties
Component
family
System
Function
Returns data from a remote computer, based on the secure shell command defined.
Purpose
Allows you to establish a communication with a distant server and return sensitive information securely.
Basic settings

Schema and Edit Schema: A schema is a row description, i.e. it defines the number of fields to be processed and
passed on to the next component. The schema is either Built-in or stored remotely in the Repository.
If you are using Talend Open Studio for Big Data, only the Built-in mode is available.
Click Edit Schema to make changes to the schema.
Click Sync columns to retrieve the schema from the preceding component in the Job.
Built-in: You create and store the schema locally for this component only. Related topic:
see Talend Studio User Guide.

Host: IP address

Port

User

Authentication method: Select the relevant option.

Public Key / Key Passphrase / Private Key: In case of Public Key, type in the passphrase, if required, in the Key
Passphrase field and then, in the Private key field, type in the private key or click the three-dot button next to
the Private key field to browse to it.

Password

Keyboard Interactive / Password: In case of Keyboard Interactive, type in the required password in the Password field.

Pseudo terminal: Select this check box to call the interactive shell that performs the terminal operations.

Command separator: Type in the command separator required. Once the Pseudo terminal check box is selected,
this field becomes unavailable.

Commands: Type in the command for the relevant information to be returned from the remote computer.
When you select the Pseudo terminal check box, this table becomes a terminal emulator
and each row in this table is a single command.

Use timeout / timeout in seconds: Define the timeout period. A timeout message will be generated if the actual
response time exceeds this expected processing time.

Standard Output: Select the destination to which the standard output is returned. The output may be returned:
- to console: the output is displayed in the console of the Run view.
- to global variable: the output is returned to the corresponding global variable.
- both to console and global variable: the output is returned by both means.
- normal: the output is a standard ssh output.

Error Output: Select the destination to which the error output is returned. The output may be returned:
- to console: the output is displayed in the console of the Run view.
2185
Advanced settings

tStatCatcher Statistics: Select this check box to gather the processing metadata at the Job level as well as at each
component level.
Connections

Global Variables

STDOUT: Indicates the standard execution output of the remote command. This is an After variable and it
returns a string.
STDERR: Indicates the error execution output of the remote command. This is an After variable and it returns
a string.
EXIT_CODE: Indicates the exit status of the remote command. This is an After variable and it returns an integer.
To fill up a field or expression with a variable, press Ctrl + Space to access the variable list and choose the
variable to use from it.
For further information about variables, see Talend Studio User Guide.
A Flow variable means it functions during the execution of a component while an After variable means it
functions after the execution of a component.
Usage
Limitation
2186
1.
Type in the name of the Host to be accessed through SSH as well as the Port number.
2.
3.
Select the Authentication method on the list. For this use case, the authentication method used is the public
key.
4.
5.
In the Command field, type in the following command. For this use case, type in hostname; date between
double quotes.
6.
Select the Use timeout check box and set the time before falling in error to 5 seconds.
The remote machine returns the host name and the current date and time as defined on its system.
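If you return the standard output to global variable, a following component can read it from the globalMap. A hedged sketch as it might appear in a tJava component (tSSH_1 is an assumed component label; use your component's actual name; globalMap is only available inside the generated Job code):

// Illustrative only: reading the remote command's output after tSSH_1 has run.
String remoteOutput = (String) globalMap.get("tSSH_1_STDOUT");
String remoteErrors = (String) globalMap.get("tSSH_1_STDERR");
System.out.println("Remote output: " + remoteOutput);
if (remoteErrors != null && !remoteErrors.isEmpty()) {
    System.err.println("Remote errors: " + remoteErrors);
}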
2187
tSystem
tSystem
tSystem Properties
Component family
System
Function
Purpose
tSystem can call other processing commands, already up and running in a larger Job.
Basic settings

Select this check box to change the name and path of a dedicated directory.

Use Single Command: When the required command is very simple, to the degree that, for example, only one
parameter is used and without any space, select this option to activate its Command field. In this field, enter
the simple system command. Note that the syntax is not checked.
In Windows, the MS-DOS commands do not allow you to pass directly from the current folder to the folder
containing the file to be launched. To launch a file, you must therefore use an initial command to change the
current folder, then a second one to launch the file.

Use Array Command: Select this option to activate its Command field. In this field, enter the system command
in array, one parameter per line.
For example, enter the following command with consecutive spaces in array for Linux:
"cp"
"/temp/source.txt"
"/temp/copy to/"
Standard Output and Error Output: Select the type of output for the processed data to be transferred to.
to console: data is passed on to be viewed in the Run view.
to global variable: data is passed on to an output variable linked to the tSystem component.
to console and to global variable: data is passed on to the Run view and to an output variable linked to the tSystem component.
normal: data is passed on to the component that comes next.
Schema and Edit Schema
Environment variables
2188
Advanced settings
tStatCatcher Statistics
Usage
This component can typically be used by companies that have already implemented other applications
which they want to integrate into their processing flow through Talend.
Global Variables
Connections
Limitation
n/a
2.
2189
3.
Select the Use Single Command option to activate its Command field and type in "cmd /c echo Hello
World!".
4.
In the Standard Output drop-down list, select to console and to global variable.
5.
The Job executes an echo command and shows the output in the Console of the Run view.
2190
tMDMBulkLoad
tMDMBulkLoad
tMDMBulkLoad properties
Component family
Talend MDM
Function
tMDMBulkLoad writes XML structured master data into the MDM hub in bulk mode.
Purpose
This component uses bulk mode to write data so that big batches of data or data of high complexity
can be quickly uploaded onto the MDM server.
Basic settings
XML field
Select the name of the column in which you want to write the XML data.
URL
Version
Type in the name of the Version of master data you want to connect to,
for which you have the required user rights.
Leave this field empty if you want to display the default Version of
master data.
Data Model
Type in the name of the data model against which the data to be written
is validated.
Data Container
Type in the name of the data container where you want to write the master
data.
Entity
Type in the name of the entity that holds the data record(s) you want to
write.
Type
Validate
Select this checkbox to validate the data you want to write onto the MDM
server against validation rules defined for the current data model.
Note that for the PROVISIONING Data Container, validation checks
will always be performed on incoming records, regardless of whether or
not this checkbox is selected.
For more information on how to set the validation rules, see Talend
Studio User Guide.
If you need faster loading performance, do not select this
checkbox.
Generate ID
Select this check box to generate an ID number for all of the data written.
If you need faster loading performance, do not select this
checkbox.
Advanced settings
Connections
Commit size
Type in the row count of each batch to be written onto the MDM server.
tStatCatcher Statistics
Select this check box to gather the processing metadata at the Job level
as well as at each component level.
Outgoing links (from this component to another):
Row: Main,
Trigger: Run if; On Component Ok; On Component Error, On Subjob
Ok, On Subjob Error.
Incoming links (from one component to this one):
Row: Main
Trigger: Run if, On Component Ok, On Component Error, On Subjob
Ok, On Subjob Error
For further information regarding connections, see Talend Studio User
Guide.
Usage
This component always needs an incoming link that provides XML-structured data. If your data is not yet in the XML structure, you need to use components such as tWriteXMLField to transform it into the XML structure. For further information about tWriteXMLField, see section tWriteXMLField.
In such a scenario, the tMDMBulkLoad component waits for XML data as an input. You must manually format
this incoming data to match the entity schema defined in the MDM perspective of Talend Studio. Most of the
time, the data you want to import is in a flat format, and you have to transform it into XML.
As XML parsing is memory consuming, you can work around this problem by splitting your source file into several files using the tAdvancedFileOutputXML component. To do this, you select the Split output in several files option in the Advanced settings view of the component and then set the number of rows in each output file through a context variable (context.chunkSize), for example.
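The sketch below illustrates the chunking idea in plain Java, outside the Studio; chunkSize plays the role of the context.chunkSize variable and the file names are purely hypothetical. It is not the code generated by tAdvancedFileOutputXML.

import java.util.ArrayList;
import java.util.List;

public class ChunkSketch {
    public static void main(String[] args) {
        int chunkSize = 1000; // would come from context.chunkSize in the Job
        List<String> xmlRecords = new ArrayList<String>();
        for (int i = 0; i < 3500; i++) {
            xmlRecords.add("<record><id>" + i + "</id></record>"); // placeholder records
        }
        int fileIndex = 0;
        for (int start = 0; start < xmlRecords.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, xmlRecords.size());
            // Each slice corresponds to one temporary XML file produced by the
            // "Split output in several files" option.
            System.out.println("chunk_" + (fileIndex++) + ".xml holds rows " + start + " to " + (end - 1));
        }
    }
}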
The XML schema you must define in the XML editor of this component should be an exact match of the business
entity defined in the MDM perspective of Talend Studio. The XML schema in the editor must represent a single
<root> element which contains all the other elements, so that you can loop on each of the elements. The path of
the file should be defined in a temporary folder.
Use a tFileList component to read all the XML files that have just been created. This component enables you to
parallelize the process. Connect it to a tFileInputXML component using the Iterate link.
For the Iterate link, it is recommended that you set as many threads as the number of physical cores of the computer. You can obtain that number using Runtime.getRuntime().availableProcessors().
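For example, the value can be obtained with the standard Java call below (for instance from a tJava component, or typed as the expression for the Iterate link's number of parallel executions). Note that it returns the number of logical processors reported by the JVM.

// Logical processors reported by the JVM; a reasonable default for the
// number of parallel executions of the Iterate link.
int threads = Runtime.getRuntime().availableProcessors();
System.out.println("Suggested number of parallel executions: " + threads);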
The tFileInputXML component will read the data from the XML files you have created, by defining a loop on
the elements, and getting all the nodes that are already formatted as XML. You must then select the Get Nodes
check box.
Ensure that you set the commit size to the same value you defined in tAdvancedFileOutputXML through the context.chunkSize context variable.
The tFileDelete component in such a scenario deletes all the temporary data at the end of the Job.
For further information about how to create a data container, a data model, and a business entity along with its
attributes, see Talend Studio User Guide.
The Job in this scenario uses three components.
tFixedFlowInput: this component generates the records to be loaded into the ProductFamily business entity.
In a real-world case, the records to be loaded are often voluminous and stored in a dedicated file; to simplify the replication of this scenario, this Job uses tFixedFlowInput to generate four sample records.
tWriteXMLField: this component transforms the incoming data into XML structure.
tMDMBulkLoad: this component writes the incoming data into the ProductFamily business entity in bulk mode, generating an ID value for each data record.
For the time being, tWriteXMLField has some limitations when used with very large datasets. Another scenario is possible
to enhance the MDM bulk data load. For further information, see section Enhancing the MDM bulk data load.
Click the three-dot button next to Edit schema to open the schema editor.
In the schema editor, click the plus button to add one row.
In the schema editor, click the new row and type in the new name: family.
Click OK.
In the Mode area of the Basic settings view, select the Use inline table option.
Under the inline table, click the plus button four times to add four rows in the table.
In the inline table, click each of the added rows and type in their names between the quotation marks: Shirts,
Hats, Pets, Mugs.
Double click tWriteXMLField to open its Basic settings view.
Click the three-dot button next to the Edit schema field to open the schema editor where you can add a row
by clicking the plus button.
Click the newly added row to the right view of the schema editor and type in the name of the output column
where you want to write the XML content. In this example, type in xmlRecord.
Click OK to validate this output schema and close the schema editor.
In the dialog box that pops up, click OK to propagate this schema to the following component.
On the Basic settings view, click the three-dot button next to Configure Xml Tree to open the interface that
helps to create the XML structure.
In the Link Target area, click rootTag and rename it as ProductFamily, which is the name of the business
entity used in this scenario.
In the Linker source area, drop family onto ProductFamily in the Link target area.
A dialog box displays asking what type of operation you want to do.
Select Create as sub-element of target node to create a sub-element of the ProductFamily node. Then the
family element appears under the ProductFamily node.
In the Link target area, click the family node and rename it as Name, which is one of the attributes of the
ProductFamily business entity.
Right-click the Name node and select from the contextual menu Set As Loop Element.
Click OK to validate the XML structure you defined.
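With the XML tree configured as above (ProductFamily root, Name loop element), each incoming family value is wrapped in its own XML record. The value written to the xmlRecord column would therefore look roughly like the following sketch; this is an illustration based on the structure defined above, not output captured from an actual run.

// Illustrative only: one xmlRecord value produced for the input row "Shirts".
String xmlRecord = "<ProductFamily><Name>Shirts</Name></ProductFamily>";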
Double-click tMDMBulkLoad to open its Basic settings view.
In XML Field, click this field and select xmlRecord from the drop-down list.
In the URL field, enter the bulk loader URL, between quotes: for example, http://localhost:8080/datamanager/loadServlet.
In the Username and Password fields, enter your login and password to connect to the MDM server.
In the Data Model and the Data Container fields, enter the names corresponding to the data model and the
data container you need to use. Both are Product for this scenario.
In the Entity field, enter the name of the business entity into which the records are to be loaded. In this example, type in ProductFamily.
Select the Generate ID check box in order to generate ID values for the records to be loaded.
In the Commit size field, type in the batch size to be written into the MDM hub in bulk mode.
Press F6 to run the Job.
Log into your Talend MDM Web User Interface to check the newly added records for the ProductFamily
business entity.
tMDMClose
tMDMClose properties
Component family
Talend MDM
Function
Purpose
This component is used to terminate an open MDM server connection after the execution of the preceding subjob.
Basic settings
Component List
Select the tMDMConnection component from the list if more than one
connection is planned for the current Job.
Advanced settings
tStatCatcher Statistics
Select this check box to gather the processing metadata at the Job level
as well as at each component level.
Dynamic settings
Click the [+] button to add a row in the table and fill the Code field with a context variable to choose
your MDM server connection dynamically from multiple connections planned in your Job.
Once a dynamic parameter is defined, the Component List box in the Basic settings view becomes
unusable.
For more information on Dynamic settings and context variables, see Talend Studio User Guide.
Usage
Related scenario
For a related use case, see section Scenario: Deleting master data from an MDM Hub.
tMDMCommit
tMDMCommit properties
Component family
Talend MDM
Function
tMDMCommit explicitly commits all changes to the database made within the scope of a transaction
in MDM.
Purpose
This component is used to control the point in an MDM Job at which any changes made to the database
within the scope of an MDM transaction are committed, for example to prevent partial commits if an
error occurs.
Basic settings
Component List
Close Connection
Select this check box to close the session for this connection to the MDM
Server after committing the changes. Note that even if you do not select
this check box, the connection can still not be used in a subsequent subjob
unless the Autocommit mode has been enabled.
Advanced settings
tStatCatcher Statistics
Select this check box to gather the processing metadata at the Job level
as well as at each component level.
Usage
This component is to be used along with the tMDMConnection, tMDMRollback, tMDMSP, tMDMViewSearch, tMDMInput, tMDMDelete, tMDMRouteRecord, tMDMOutput, and tMDMClose components.
Related scenario
For a related use case, see section Scenario: Deleting master data from an MDM Hub.
tMDMConnection
tMDMConnection properties
Component family
Talend MDM
Function
tMDMConnection opens an MDM server connection for convenient reuse in the current transaction.
Purpose
This component is used to open a connection to an MDM server that can then be reused in the
subsequent subjob or subjobs, to avoid having to specify the connection details in each component.
Basic settings
URL
Version
Type in the name of the Version of master data you want to connect to.
Leave this field empty if you want to display the default Version of
master data.
Auto Commit
Advanced settings
tStatCatcher Statistics
Select this check box to gather the processing metadata at the Job level
as well as at each component level.
Usage
Related scenario
For a related use case, see section Scenario: Deleting master data from an MDM Hub.
tMDMDelete
tMDMDelete properties
Component family
Talend MDM
Function
tMDMDelete deletes data records from specific entities in the MDM Hub.
Purpose
Basic settings
URL
Version
Type in the name of the Version of master data you want to connect to,
for which you have the required user rights.
Leave this field empty if you want to display the default Version of
master data.
Entity
Type in the name of the entity that holds the data record(s) you want to
delete.
Data Container
Type in the name of the data container that holds the data record(s) you
want to delete.
Type
Select this check box to filter the master data to be deleted, using certain
conditions.
Xpath: Enter between quotes the path and the XML node to which you
want to apply the condition.
Function: Select the condition to be used from the list.
Value: Enter between inverted commas the value you want to use.
Predicate: Select a predicate if you use more than one condition.
Specify the field(s) (in sequence order) composing the key when the entity has a multiple key.
Logical delete
Select this check box to send the master data to the Recycle bin and fill
in the Recycle bin path. Once in the Recycle bin, the master data can
be definitely deleted or restored. If you leave this check box clear, the
master data will be permanently deleted.
Die on error
This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows in error via a Row > Rejects link.
Advanced settings
tStatCatcher Statistics
Select this check box to gather the processing metadata at the Job level
as well as at each component level.
Usage
Use this component to delete master data records from specific entities in the MDM hub.
1. Double-click tMDMConnection to display its Basic settings view and define the component properties.
2. In the URL field, enter the MDM server URL, between quotation marks: for example, "http://localhost:8180/talend/TalendPort".
3. In the Username and Password fields, enter your user name and password to connect to the MDM server.
4. In the Version field, enter the name of the master data Version you want to access, between quotation marks.
Leave this field empty to access the default master data Version.
5. Double-click tMDMCommit to display its Basic settings view and define the component properties.
This component commits the changes made to the database on successful completion of the preceding subjob.
6. From the Component List list, select the component for the server connection you want to close if you have
configured more than one MDM server connection. In this use case, there is only one MDM server connection
open, so simply use the default setting.
7. Deselect the Close Connection check box if it is selected. In this example, the tMDMClose component closes
the connection to the MDM Server.
8. Double-click tMDMRollback to display its Basic settings view and define the component properties.
This component rolls back any changes and returns the database to its previous state if the preceding subjob fails.
9. From the Component List list, select the component for the server connection you want to close if you have
configured more than one MDM server connection. In this use case, there is only one MDM server connection
open, so simply use the default setting.
10. Deselect the Close Connection check box if it is selected. In this example, the tMDMClose component closes the connection to the MDM Server.
11. Double-click tMDMClose to display its Basic settings view and define the component properties.
The tMDMClose component is used to close the connection after the successful execution of the Job. You can also close
the connection by selecting the Close Connection check box in the tMDMCommit and tMDMRollback components,
but for the purposes of this scenario the tMDMClose component is used instead.
12. From the Component List list, select the component for the server connection you want to close if you have
configured more than one MDM server connection. In this use case, there is only one MDM server connection
open, so simply use the default setting.
2. From the Schema list, select Built-in and click [...] next to Edit schema to open a dialog box.
Here you can define the structure of the master data you want to read in the MDM hub.
3. The master data is collected in a three-column schema of the type String: Id, Name and Price. Click OK to
close the dialog box and proceed to the next step.
4. Select the Use an existing connection check box, and from the Component List list that appears, select the
component you have configured to open your MDM server connection.
In this scenario, only one MDM server connection exists, so simply use the default selection.
5. In the Entity field, enter the name of the business entity that holds the data record(s) you want to read, between
quotation marks. Here, we want to access the Product entity.
6. In the Data Container field, enter the name of the data container that holds the master data you want to read,
between quotation marks. In this example, we use the Product container.
The Use multiple conditions check box is selected by default.
7. In the Operations table, define the conditions to filter the master data you want to delete as follows:
Click the plus button to add a new line.
In the Xpath column, enter the Xpath and the tag of the XML node on which you want to apply the filter,
between quotation marks. In this example, we work with the Product entity, so enter Product/Name.
In the Function column, select the function you want to use. In this scenario, we use the Contains function.
In the Value column, enter the value of your filter. Here, we want to filter the master data where the Name
contains mug.
8. In the Component view, click Advanced settings to set the advanced parameters.
9. In the Loop XPath query field, enter the structure and the name of the XML node on which the loop is to be
carried out, between quotation marks.
10. In the Mapping table and in the XPath query column, enter the name of the XML tag in which you want to
collect the master data, next to the corresponding output column name, between quotation marks.
2. From the Schema list, select Built-in and click the three-dot button next to the Edit Schema field to describe
the structure of the master data in the MDM hub.
3. Click the plus button to the right to add one column of the type String. In this example, name this column
outputXML. Click OK to close the dialog box and proceed to the next step.
4. Select the Use an existing connection check box, and from the Component List list that appears, select the
component you have configured to open your MDM server connection.
In this scenario, only one MDM server connection exists, so simply use the default selection.
5. In the Entity field, enter the name of the business entity that holds the master data you want to delete, the
Product entity in this example.
6. In the Data Container, enter the name of the data container that holds the data to be deleted, Product in this
example.
7. In the Keys table, click the plus button to add a new line. In the Keys column, select the column that holds the
key of the Product entity. Here, the key of the Product entity is set on the Id field.
If the entity has multiple keys, add as many lines as required for the keys and select them in sequential order.
8. Select the Logical delete check box if you do not want to delete the master data permanently. This will send
the deleted data to the Recycle bin. Once in the Recycle bin, the master data can be restored or permanently
deleted. If you leave this check box clear, the master data will be permanently deleted.
9. Fill in the Recycle bin path field. Here, we left the default path but if your recycle bin is in a path different
from the default, specify the path.
tMDMInput
tMDMInput properties
Component family
Talend MDM
Function
Purpose
This component reads master data in an MDM Hub and thus makes it possible to process this data.
Basic Settings
Property Type
A schema is a row description, i.e., it defines the number of fields that will
be processed and passed on to the next component. The schema is either
built-in or remote in the Repository.
If you are using Talend Open Studio for Big Data, only the Built-in mode
is available.
Click Edit Schema to modify the schema.
If you modify the schema, it automatically becomes built-in.
Built-in: The schema will be created and stored for this component only.
Related Topic: see Talend Studio User Guide.
Use an existing connection
Select this check box if you want to use a configured tMDMConnection component.
URL
Version
Type in the name of the master data Version you want to connect to and
to which you have access rights.
Leave this field empty if you want to display the default Version.
Entity
Type in the name of the business entity that holds the master data you want
to read.
Data Container
Type in the name of the data container that holds the master data you want
to read.
Type
Select this check box to filter the master data using certain conditions.
Xpath: Enter between quotes the path and the XML node to which you
want to apply the condition.
Function: Select the condition to be used from the list. Depending on the
type of field pointed to by the XPath, only certain operators may apply;
for instance, if the field is a boolean only the Equal or Not Equal operators
are appropriate.
The following operators are available:
Contains: Returns a result which contains the word or words entered.
Joins With: Returns a result which functions as a join operator.
Starts With: Returns a result which begins with the string entered.
Strict Contains: Returns a result which contains the exact regular
expression entered. Applies only to XML databases.
Equal: Returns a result which matches the boolean entered; that is, True
or False.
Not Equal: Returns a result of any value other than the boolean entered;
that is, True or False.
is greater than: Returns a result which is greater than the numerical
value entered. Applies to number fields only.
is greater or equal: Returns a result which is greater than or equal to the
numerical value entered. Applies to number fields only.
is lower than: Returns a result which is less than the numerical value
entered. Applies to number fields only.
is lower or equal: Returns a result which is less than or equal to the
numerical value entered. Applies to number fields only.
whole content contains: Performs a plain text search in all the fields of
the entity. For SQL databases, a "Starts with" search is performed; for
XML databases, a "Contains" search is performed.
contains a word like: Performs a fuzzy search to return a similar word
to the word entered.
is empty or null: Returns a result where the field is empty or returns a
null value.
Value: Enter between inverted commas the value you want to use. Note
that if the value contains XML special characters such as /, you must
also enter the value in single quotes ("'ABC/XYZ'") or the value will be
considered as an XPath.
Predicate: Select a predicate if you use more than one condition.
The following predicates are available:
Default: Interpreted as an and.
or: One of the conditions applies.
and: Both or all of the conditions apply.
The other predicates are reserved for future use and may be subject to
unpredictable behavior.
If you clear this check box, you have the option of selecting particular IDs
to be displayed in the ID value column of the IDS table.
If you clear the Use multiple conditions check box, the Batch Size option in the Advanced Settings tab will no longer be available.
Advanced settings
Skip Rows
Max Rows
Die on error
This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows in error via a Row > Rejects link.
Batch Size
Mapping
XPath query: Type in the name of the fields to extract from the input
XML structure.
Get Nodes: Select this check box to retrieve the Xml node together with
the data.
tStatCatcher Statistics
Select this check box to gather the processing metadata at the Job level as well as at each component level.
Usage
From the Palette, drop tMDMInput and tLogRow onto the design workspace.
Connect the two components together using a Row Main link.
Double-click tMDMInput to open the Basic settings view and define the component properties.
In the Schema list, select Built-In and click the three-dot button next to Edit schema to open a dialog box.
Here you can define the structure of the master data you want to read on the MDM server.
The master data is collected in a three column schema of the type String: ISO2Code, Name and Currency. Click
OK to close the dialog box and proceed to the next step.
In the URL field, enter between inverted commas the URL of the MDM server.
In the Username and Password fields, enter your login and password to connect to the MDM server.
In the Version field, enter between inverted commas the name of the master data Version you want to access.
Leave this field empty to display the default Version.
In the Entity field, enter between inverted commas the name of the business entity that holds the master data
you want to read.
In the Data Container field, enter between inverted commas the name of the data container that holds the
master data you want to read.
In the Component view, click Advanced settings to set the advanced parameters.
In the Loop XPath query field, enter between inverted commas the structure and the name of the XML node
on which the loop is to be carried out.
In the Mapping table and in the XPath query column, enter between inverted commas the name of the XML
tag in which you want to collect the master data, next to the corresponding output column name.
In the design workspace, click on the tLogRow component to display the Basic settings in the Component
view and set the properties.
Click on Edit Schema and ensure that the schema has been collected from the previous component. If not, click
Sync Columns to fetch the schema from the previous component.
Save the Job and press F6 to run it.
The list of different countries along with their codes and currencies is displayed on the console of the Run view.
tMDMOutput
tMDMOutput properties
Component family
Talend MDM
Function
Purpose
Basic settings
Property Type
Input Schema and Edit schema
An input schema is a row description, i.e., it defines the number of fields that will be processed and passed on to the next component. The schema is either built-in or remote in the Repository.
If you are using Talend Open Studio for Big Data, only the Built-in mode
is available.
Click Edit Schema to modify the schema. Note that if you modify the
schema, it automatically becomes built-in.
Click Sync columns to collect the schema from the previous component.
Built-in: You create the schema and store it locally for this component
only. Related topic: see Talend Studio User Guide.
Build the document
Select this check box if you want to build the document from a flat schema.
If this is the case, double-click the component and map your schema in
the dialog box that opens.
If the check box is not selected, you must select the column in your schema
that contains the document from the Predefined XML document list.
Result of the XML serialization
Lists the name of the XML output column that will hold the XML data.
Use an existing connection Select this check box if you want to use a configured tMDMConnection
component.
URL
Version
Type in the name of the master data management Version you want to
connect to, for which you have the user rights required.
Leave this field empty if you want to display the default perspective.
Data Model
Type in the name of the data model against which the data to be written
is validated.
Data Container
Type in the name of the data container where you want to write the master
data.
This data container must already exist.
Type
Return Keys
Is Update
Select this check box to add the actions carried out to a modification
report.
Source Name: Between quotes, enter the name of the application to be
used to carry out the modifications.
Enable verification by before saving process: Select this check box to verify the commit that has just been added, prior to saving.
then the Xpath you enter in this Pivot field must read as follows: /Person/Children/Child, where the Overwrite check box is set to false.
And, if you need to replace a child sub-element in an existing item:
<Person>
<Id>1</Id>
<Addresses>
<Address>
<Type>office</Type>
(...address elements
are here....)
</Address>
<Address>
<Type>home</Type>
(...address elements
are here....)
</Address>
</Addresses>
</Person>
then the Xpath you enter in this Pivot field must read as follows: /Person/Addresses/Address, where the Overwrite check box is set to true, and the Key field is set to /Type.
In such an example, assuming the item in MDM only has an office address, the office address will be replaced, and the home address will be added.
- Overwrite: select this check box if you need to replace or update the
original sub-elements with the input sub-elements. Leave unselected if
you want to add a sub-element.
- Key: type in the XPath relative to the pivot that will help match a sub-element of the source XML with a sub-element of the item. If a key is not
Advanced settings
Die on error
This check box is selected by default. Clear the check box to skip the row on error and complete the process for error-free rows. If needed, you can retrieve the rows in error via a Row > Rejects link.
Extended Output
Select this check box to commit master data in batches. You can specify
the number of lines per batch in the Rows to commit field.
Opens the interface which helps create the XML structure of the master
data you want to write.
Group by
Create empty element if This check box is selected by default. If the content of the interface's
needed
Related Column which enables creation of the XML structure is null,
or if no column is associated with the XML node, this option creates an
opening and closing tag at the required places.
Advanced separator (for number)
Select this check box to modify the separators used by default for numbers.
- Thousands separator: enter between inverted commas the separator
for thousands.
- Decimal separator: enter between inverted commas the decimal
separator.
Generation mode
Encoding
Select the encoding type from the list or else select Custom and define
it manually. This is an obligatory field for the manipulation of data on
the server.
tStatCatcher Statistics
Select this check box to gather the processing metadata at the Job level
as well as at each component level.
Usage
Use this component to write master data records to the MDM hub.
In this example, we want to load a new agency in the Agency business entity. This new agency should have an
id, a name and a city.
From the Palette, drop tFixedFlowInput and tMDMOutput onto the design workspace.
In the Schema list, select Built-In and click the three-dot button next to Edit schema to open a dialog box in
which you can define the structure of the master data you want to write on the MDM server.
Click the plus button and add three columns of the type String. Name the columns: Id, Name and City.
Click OK to validate your changes and proceed to the next step.
In the Number of rows field, enter the number of rows you want to generate.
In the Mode area, select the Use Single Table option to generate just one table.
In the Value fields, enter between inverted commas the values which correspond to each of the schema columns.
In the design workspace, click tMDMOutput to open its Basic settings view and set the component properties.
In the Schema list, select Built-In and, if required, click on the three dot button next to the Edit Schema field
to see the structure of the master data you want to load on the MDM server.
The tMDMOutput component basically generates an XML document, writes it in an output field, and then sends
it to the MDM server, so the output schema always has a read-only xml column.
Click OK to proceed to the next step.
The Result of the XML serialization list in the Basic settings view is automatically filled in with the output
xml column.
In the URL field, enter the URL of the MDM server.
In the Username and Password fields, enter the authentication information required to connect to the MDM
server.
In the Version field, enter between inverted commas the name of the master data Version you want to access,
if more than one exists on the server. Leave the field blank to access the default Version.
In the Data Model field, enter between inverted commas the name of the data model against which you want
to validate the master data you want to write.
In the Data Container, enter between inverted commas the name of the data container into which you want
to write the master data.
In the Component view, click Advanced settings to set the advanced parameters for the tMDMOutput
component.
Select the Extended Output check box if you want to commit master data in batches. You can specify the
number of lines per batch in the Rows to commit field.
Click the three-dot button next to Configure Xml Tree to open the tMDMOutput editor.
In the Link target area to the right, click in the Xml Tree field and then replace rootTag with the name of the
business entity in which you want to insert the data record, Agency in this example.
In the Linker source area, select your three schema columns and drop them on the Agency node.
The [Selection] dialog box displays.
Select the Create as sub-element of target node option so that the three columns are linked to the three XML
sub-elements of the Agency node and then click OK to close the dialog box.
Right-click the element in the Link Target area you want to set as a loop element and select Set as Loop
Element from the contextual menu. In this example, we want City to be the iterating object.
Click OK to validate your changes and close the dialog box.
Save your Job and press F6 to run it.
The new data record is inserted in the Agency business entity in the DStar data container on the MDM server. This data record holds, as you defined in the schema, the agency id, the agency name and the agency city.
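Under the hood, the read-only xml column generated by tMDMOutput holds a document shaped after the Agency structure defined above. The sketch below is illustrative only; the actual values come from the tFixedFlowInput rows.

// Illustrative only: the xml output column generated for one Agency record.
String xml = "<Agency><Id>1</Id><Name>AgencyName</Name><City>Paris</City></Agency>";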
tMDMReceive
tMDMReceive properties
Component family
Talend MDM
Function
tMDMReceive receives an MDM record in XML from MDM triggers or MDM processes.
Purpose
This component decodes a context parameter holding MDM XML data and transforms it into a flat
schema.
Basic Settings
Property Type
XML Record
Enter the context parameter that allows you to retrieve the last changes made
to the MDM server. For more information about creating and using a
context parameter, see Talend Studio User Guide.
XPath Prefix
If required, select from the list the looping xpath expression which is a
concatenation of the prefix + looping xpath.
/item: select this xpath prefix when the component receives the record
from a process because processes encapsulate the record within an item
element only.
/exchange/item: select this xpath prefix when the component receives
the record from a trigger because triggers encapsulate the record within
an item element which is within an exchange element.
Mapping
Limit
Die on error
This check box is selected by default. Clear the check box to skip the row
on error and complete the process for error-free rows. If needed, you can
retrieve the rows on error via a Row > Reject link.
Advanced settings
tStatCatcher Statistics
Select this check box to gather the processing metadata at the Job level
as well as at each component level.
Usage
Scenario prerequisites
A data container Product and a data model Product are created and deployed to the MDM server. The Product and
Store data entities are defined and some data records already exist in them.
The entities Product and Store are linked by a foreign key which is the Name of the Store.
This example is designed to obtain the store information for a new product. Therefore, when you create a new
Product record, make sure that the Store information is also added for the new Product record.
The entities and their attributes are shown below.
For more information about MDM working principles, see the MDM part in Talend Studio User Guide.
1. Drop the following components from the Palette onto the design workspace: tMDMReceive and tLogRow.
2.
3.
1. From the Variables view of the Contexts tab, click the [+] button to add one variable and name it exchangeMessage.
2. Click Values as table. Fill in the variable value from the Default column.
Note that the XML record must conform to a particular schema. For more information about the schema, see
the description of processes and schemas used in MDM processes to call Jobs in Talend Studio User Guide.
One sample of XML record from the Update Report is as follows:
<exchange xmlns:mdm="java:com.amalto.core.plugin.base.xslt.MdmExtension">
<report>
<Update>
<UserName>administrator</UserName>
<Source>genericUI</Source>
<TimeInMillis>1381486872930</TimeInMillis>
<OperationType>ACTION</OperationType>
<RevisionID>null</RevisionID>
<DataCluster>Product</DataCluster>
<DataModel>Product</DataModel>
<Concept>Product</Concept>
<Key>2</Key>
</Update>
</report>
<item><Product><Id>001</Id><Name>Computer</Name><Description>Laptop series</Description><Availability>true</Availability><Price>400</Price><OnlineStore>TalendShop@@http://www.cafepress.com/Talend.2</OnlineStore><Stores><Store>[Dell]</Store><Store>[Lenovo]</Store></Stores></Product></item>
</exchange>
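To make the role of the XPath Prefix concrete, the plain Java sketch below extracts fields from a trigger-style record, where the Product element is wrapped in /exchange/item. It is an illustration only, not code generated by tMDMReceive; with a record coming from a process, the prefix would be /item instead.

import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

public class XPathPrefixDemo {
    public static void main(String[] args) throws Exception {
        // Minimal trigger-style record: the Product item is nested under /exchange/item.
        String record = "<exchange><item><Product><Id>001</Id><Name>Computer</Name>"
                + "<Stores><Store>[Dell]</Store></Stores></Product></item></exchange>";
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(record.getBytes(StandardCharsets.UTF_8)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        // The looping path is the concatenation of the prefix and the record path.
        System.out.println(xpath.evaluate("/exchange/item/Product/Id", doc));   // 001
        System.out.println(xpath.evaluate("/exchange/item/Product/Name", doc)); // Computer
    }
}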
3.
Double-click the tMDMReceive component to open its Basic settings view in the Component tab.
2.
Click the [...] button next to Edit schema to define the desired data structure. In this example, three columns
are added: Product_ID, Product_Name, and Store_Name.
3.
4.
5. In the Loop XPath query field, type in the name of the XML tree root tag. In this example, type in "/Product/Stores/Store".
6. The Column column in the Mapping table is populated with the columns defined in the schema. In the XPath query column, enter the XPath query accordingly. In this example, the information of product ID, product name and store name will be extracted.
7. Double-click the tLogRow component to open its Basic settings view in the Component tab.
8.
2.
tMDMRollback
tMDMRollback properties
Component family
Talend MDM
Function
tMDMRollback returns a database to its original state before a Job was run, instead of committing
any changes.
Purpose
This component is used as part of an overall MDM transaction to roll back any changes made in the
database rather than definitively committing them, for example to prevent partial commits if an error
occurs.
Basic settings
Component List
Close Connection
Select this check box to close the session for this connection to the MDM
Server after rolling back the changes. Note that even if you do not select
this check box, the connection can still not be used in a subsequent subjob
unless the Autocommit mode has been enabled.
Advanced settings
tStatCatcher Statistics
Select this check box to gather the processing metadata at the Job level
as well as at each component level.
Usage
Related scenario
For a related use case, see section Scenario: Deleting master data from an MDM Hub.
tMDMRouteRecord
tMDMRouteRecord properties
Component family
Talend MDM
Function
tMDMRouteRecord submits the primary key of a record stored in your MDM Hub to Event Manager
in order for Event Manager to trigger the due process(es) against some specific conditions that you
can define in the process or trigger pages of the MDM Studio.
For more information on Event Manager and on an MDM process, see Talend Studio User Guide.
Purpose
This component helps Event Manager identify the changes which you have made on your data so that
correlative actions can be triggered.
Basic Settings
URL
Version
Type in the name of the master data management Version you want to
connect to, for which you have the user rights required.
Leave this field empty if you want to display the default perspective.
Advanced settings
Connections
Data Container
Type in the name of the data container that holds the record you want
Event Manager to read.
Type
Entity Name
Type in the name of the business entity that holds the record you want
Event Manager to read.
IDs
Specify the primary key(s) of the record(s) you want Event Manager
to read.
tStatCatcher Statistics
Select this check box to gather the processing metadata at the Job level
as well as at each component level.
Outgoing links (from this component to another):
Row: Iterate
Trigger: Run if; On Component Ok; On Component Error, On Subjob
Ok, On Subjob Error.
Incoming links (from one component to this one):
Row: Iterate;
Trigger: Run if, On Component Ok, On Component Error, On Subjob
Ok, On Subjob Error
For further information regarding connections, see Talend Studio User
Guide.
Global Variables
Scenario prerequisites
The following prerequisites must be met in order to replicate this scenario:
A data container stores several records using a specific model. In this scenario, the container is named Product,
and a record in the container is entered against the model named Product:
This figure shows one of the stored product records with all of its viewable attributes.
For further information about how to create a data container and a data model, see your Talend Studio User Guide.
For further information about how to create a record and access its viewable attributes, see the Talend MDM Web
User Interface User Guide.
3. On the Actions panel on the right, select the required data container and data model in which is the record to
be updated. In this scenario, the data container and the data model are both Product.
4. Click Save to save the selected data container and data model.
5. In the Master Data Browser view, select the Product entity.
6. Double-click one of the product records to display its viewable attributes in a new view dedicated to this product.
For example, open the product Talend Mug with the unique Id 231035938.
7. In this view, modify one of the attribute values. You can, for example, update this product and make it available
by selecting the Availability check box.
8. Click Save to validate this update.
9. Open your Talend Studio and make sure you are connected to the MDM server. For further information about
how to launch the Talend Studio and connect it to the MDM server, see Talend Studio User Guide.
10. In the MDM Repository tree view, under the Job Designs node, right click the message Job.
11. In the contextual menu, select Generate Talend Job Caller Process, accept the default options and click Generate. The process used to call this Job is generated and stored under Event Management > Process.
12. Deploy your newly-created Process to the MDM Server.
13. Under the Event Management node, right click Trigger.
14. In the contextual menu, select New.
15. In the pop-up New Trigger wizard, name the trigger, for example, TriggerMessage.
16. Click OK to open the new trigger view in the workspace of your studio.
17. Configure the trigger to make it launch the process that calls the message Job once an update is done.
18. In the Description field, enter, for example, Trigger that calls the Talend Job: message_0.1.war to describe the trigger being created.
19. In the Entity field, select or type in the business entity you want to trigger the process on. In this example, it is exactly Update.
20. In the Service JNDI Name field, select callprocess from the drop-down list.
21. In the Service Parameters field, select the CallJob_message check box.
22. In the Trigger xPath Expressions area, click the [+] button to add a new line.
23. In the newly added line, click the three-dot button to open a dialog box where you can select the entity or element on which you want to define conditions. In this example, it is Update/OperationType.
24. In the Value column, enter a value for this line. In this example, it is exactly UPDATE.
25. In the Condition Id column, enter a unique identifier for the condition you want to set, for example, C1.
26. In the Conditions area, enter the query you want to undertake on the data record using the condition ID C1 you set earlier.
27. Press Ctrl+S to save the trigger, and then deploy the trigger to the MDM Server.
28. In the MDM Repository tree view, double click Data Container > System > UpdateReport to open the Data Container Browser UpdateReport view. An Update Report is a complete track of all create, update or delete actions on any master data.
Note that, if the Update Report data container is not available, you may first have to import it from your MDM Server. For details of how to import system objects from the MDM Server to your local repository, see the Talend Studio User Guide.
29. Next to the Entity field of this view, click the button to search all the action records in the UpdateReport.
Note that the Update entity does not necessarily mean that the corresponding action recorded is an update, as it is just the entity name defined by the data model of UpdateReport and may record different actions including create, delete, and update.
The last record corresponds to what you did to the product record at the beginning of the scenario. The primary key of this record is genericUI.1283244014172 and this is the record that will be routed to Event Manager.
30. In the Integration perspective, right-click Job Designs in the Repository tree view. In the contextual menu, select Create Job.
31. A wizard opens. In the Name field, type in RouteRecord, and click Finish.
32. Drop the tMDMRouteRecord component from the Palette onto the design workspace.
33. Double-click this component to open its Component view.
34. In the URL field, enter the address of your MDM server. This example uses http://localhost:8180/talend/TalendPort.
35. In the Username and the Password fields, type in the relevant information.
36. In the Data Container field, enter the data container name that stores the record you want to route. It is UpdateReport in this example.
37. In the Entity Name field, enter the entity name that the record you want to route belongs to. In this example, the entity name is Update.
38. In the IDS area, click the plus button under the table to add a new line.
39. In the newly added line, fill in the primary key of the record to be routed to Event Manager, that is, genericUI.1283244014172, as was read earlier from the Data Container Browser UpdateReport.
40. Press F6 to run this Job. Event Manager calls the process to execute the message Job and generate the dialog box informing the user that this record has been updated.
This component submits the primary key of the record noting the update to Event Manager. When Event Manager checks this record and finds that it meets the conditions you have defined in the configuration view of the trigger TriggerMessage, it calls the process that launches the message Job to pop up the dialog box informing the user of this update.
tMDMSP
tMDMSP Properties
Component family
Talend MDM
Function
Purpose
tMDMSP offers a convenient way to centralize multiple or complex queries in an MDM Hub and
call them easily.
Basic settings
URL
Version
Type in the name of the master data management Version you want
to connect to, for which you have the user rights required.
Leave this field empty if you want to display the default perspective.
Data Container
Type in the name of the data container that stores the procedure you
want to call.
Type
Procedure Name
Click the Plus button and select the various Input Columns that will
be required by the procedures.
The SP schema can hold more columns than there are
parameters used in the procedure.
Advanced settings
Connections
tStatCatcher Statistics
Select this check box to gather the processing metadata at the Job
level as well as at each component level.
Outgoing links (from this component to another):
Row: Main
Trigger: Run if; On Component Ok; On Component Error, On
Subjob Ok, On Subjob Error.
Incoming links (from one component to this one):
Row: Main, Iterate;
Trigger: Run if, On Component Ok, On Component Error, On
Subjob Ok, On Subjob Error
This component is used as an intermediary component. It can be used as a start component, but in that case no input parameters are needed for the procedure to be called. An output link is required.
Limitation
N/A
This Job will generate parameters used to execute a stored procedure in the MDM Hub, then extract the desired
data from the returned XML-format result and present the extracted data in the studio.
The products of which the prices are to be treated are listed on your MDM Web UI.
This Job requires you to have previously created a stored procedure called PriceAddition in the MDM Repository
tree view and deployed this stored procedure to the server. The procedure reads as follows:
for $d in distinct-values(//Product/Name)
let $product := //Product[Name= $d and Price >= %0 and Price <=%1]
order by $d
return <result><Name>{$d}</Name><Sum>{sum($product/Price)}</Sum></result>
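The %0 and %1 placeholders are positional: they are filled, in order, with the values selected in the Parameters (in order) table of tMDMSP (here min and max). The short Java sketch below only illustrates that substitution; it is not the component's actual code.

// Illustrative only: positional substitution of the stored procedure placeholders.
String condition = "Price >= %0 and Price <= %1";
String resolved = condition.replace("%0", "10").replace("%1", "17"); // min = 10, max = 17
System.out.println(resolved); // Price >= 10 and Price <= 17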
For more information on working with stored procedures, see Talend Studio User Guide.
To create this Job, proceed as follows:
1. Drag and drop the following components used in this example: tFixedFlowInput, tMDMSP,
tExtractXMLField, tLogRow.
2. Connect the components using the Row Main link.
3. The tFixedFlowInput is used to generate the price range of your interest for this calculation. In this example,
define 10 as the minimum and 17 as the maximum in order to cover all of the products. To begin, double-click
on tFixedFlowInput to open its Component view.
4. On the Component view, click the [...] button next to Edit schema to open the schema editor of this component.
5. In the schema editor, add the two parameters min and max that are used to define the price range.
6. Click OK.
In the Values table, in the Mode area of the Component view, the two parameters min and max that you have
defined in the schema editor of this component display.
7. In the Value column of the Values table, enter 10 for the min parameter and 17 for the max parameter.
9. In the URL field of the Component view, type in the MDM server address, in this example, http://localhost:8080/talend/TalendPort.
10. In Username and Password, enter the authentication information, in this example, admin and talend.
11. In Data Container and Procedure Name, enter the exact names of the data container Product and of the stored procedure PriceAddition.
12. Under the Parameters (in order) table, click the plus button two times to add two rows in this table.
13. In the Parameters (in order) table, click each of the two rows you have added and, from the drop-down list, select the min parameter for one and the max parameter for the other.
14. Double-click tExtractXMLField to open its Component view.
15. On the Component view, click the [...] button next to Edit schema to open the schema editor of this component.
16. In the schema editor, add two columns to define the structure of the outgoing data. These two columns are name and sum. They represent respectively the name and the total price of each kind of product recorded in the MDM Web UI.
17. Click OK to validate the configuration; the two columns display in the Mapping table of the Component view.
18. In the Loop XPath query field, type in the node of the XML tree on which the loop is based. In this example, the node is /result, as you can read in the procedure code: return <result><Name>{$d}</Name><Sum>{sum($product/Price)}</Sum></result>.
19. In the XPath query column of the Mapping table, enter the exact node names to extract. They are /result/Name, used to extract the product names, and /result/Sum, used to extract the total prices.
20. Finally, double-click tLogRow to open its Component view.
The output lists the four kinds of products recorded in the MDM Web UI and the total price for each of them.
tMDMTriggerInput
tMDMTriggerInput properties
Component family
Talend MDM
Function
Once executed, tMDMTriggerInput reads the XML message (Document type) sent by MDM and passes it to the component that follows.
This component works alongside the new trigger service and process plug-in in MDM version
5.0 and higher. The MDM Jobs, triggers and processes developed in previous MDM versions
remain supported. However, we recommend using this component when designing new
MDM Jobs.
Purpose
Every time you save a change in MDM, the corresponding change record is generated in XML format. At runtime, this component reads this record and sends the relevant information to the following component.
With this component, you no longer need to configure your Job in order to communicate the data changes from MDM to your Job.
Basic settings
Property Type
Advanced settings
tStatCatcher Statistics
Select this check box to gather the processing metadata at the Job level
as well as at each component level.
Usage
Limitation
During the deployment of this component on the MDM server, you need to select the Hosted (Zip) type as the format of the deployed Job. If you deploy it in the Distributed (War) type, the Job cannot be invoked. For further information about the available types, see Talend Studio User Guide.
In this scenario, a four-component Job is used to exchange the event information about a product record. Using an
established MDM connection from the Repository, this Job is triggered by Talend Studio once you have updated
a product record.
To replicate this scenario, accomplish the following tasks sequentially:
1. Create an MDM connection of the Receive type in the Repository of the Studio. This connection is to the
MDM hub holding the record you want to update.
2. Create the Job receiving and sending the MDM update message.
3. Generate the process invoking this Job created.
4. Update a specific MDM record.
To create the MDM records, model and container used in this scenario, you can execute the Jobs in the MDM demo project in Talend Studio and then update the MDM server to deploy the objects thus created so that they are taken into account at runtime. You will use this server throughout this scenario.
For further information about how to import a demo project, see Talend Studio User Guide.
For further information about how to update the server for deploying objects, see Talend Studio User Guide.
For further information about an MDM event and the event management, see Talend Studio User Guide.
1. Launch the MDM server with which you need to communicate the update message.
2. In the Integration perspective of Talend Studio, expand the Metadata node in the Repository.
3. Right-click the Talend MDM item and select Create MDM connection.
4. Enter the Name you want to use for this connection and, if required, add the Purpose and the Description in the corresponding fields. For example, we name this connection receive_update.
5. In the next step, enter the authentication information used to connect to the MDM web service through which you manage the record to be updated.
Once you click the Check button and the connection is shown successful, the Next button becomes clickable.
6. In the next step, select the Version, the Data model and the Data Container used by the record to be updated. In this scenario, the model and the container are both Product.
7. Click Finish to validate the creation. The connection created appears under the Metadata node in the Repository.
Retrieving entities
1. Right-click the connection created and from the contextual menu, select Retrieve entities. Then the wizard appears.
2.
3. Select the entity to be retrieved. In this scenario, it is Product. Then the name field is entered automatically.
4. In the next step, drop the elements you need to retrieve from the Source Schema area to the Target Schema area. In this scenario, the Features element is the loop and the Id, the Name and the Description elements are the fields to extract.
5. In the next step, if required, change the description of the schema retrieved; otherwise, click Finish to finalize retrieving this entity. In this scenario, we keep the default schema description and click Finish.
The schema of the product entity is retrieved. For further information about the container and the data model used in this scenario, see Talend Studio User Guide.
1. In the Integration perspective of Talend Studio, select Create Job from the Job Designs node in the Repository tree view. Then the New Job wizard appears.
2. Name this new Job and click Finish to close the wizard and validate the creation. An empty Job is opened on the workspace of the Studio.
3. Drop tMDMTriggerInput, tXMLMap, tMDMTriggerOutput and tLogRow from the Palette onto the workspace.
4. Right-click tMDMTriggerInput and from the contextual menu, select the Row > Main link to connect it to tXMLMap.
5. Do the same to connect tXMLMap to tMDMTriggerOutput. When doing so, a dialog box appears to prompt you to name this link created.
6.
7.
8.
Select the single pre-defined column of tMDMTriggerOutput, then, click
on the input side (left).
2.
In the table representing the input flow (up-left of the editor), right-click the column name MDM_Message
on the top of the XML tree and select Import from repository. The [Metadata] wizard appears.
3.
Select the entity schema retrieved earlier using the Receive MDM model, then click OK. In this scenario,
the entity schema is ProductReceive.
4.
A dialog box appears prompting you to add the schema of the Update Report to the input XML tree. Click OK
to accept it. This builds a complete input document for an MDM event. In the input XML tree, the Features
element is set as loop element automatically.
5.
In the table representing the output flow (upper right of the editor), develop the output XML tree as presented in
the figure below. This tree is constructed according to the required static model of the MDM output report.
6.
Map the OperationType element on the input side with the message element on the output side. This will
output the information about the type of the event occurring on the MDM record.
To get more information, you can build the concatenation of the input elements you need to extract in the
Expression column of this message element. Both tMap and tXMLMap allow you to edit expressions using
the expression editor. For further information about how to edit an expression, see Talend Studio User Guide.
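For illustration, such a concatenation might look like the following sketch. The bracketed node references are generated by the editor when you drag input nodes into the expression; the exact paths shown here are hypothetical and depend on your input tree.

    // Hypothetical expression for the message element of the output tree;
    // the bracketed references are produced by dragging input XML nodes.
    "Operation " + [row1.MDM_Message:/Update/OperationType]
        + " on record " + [row1.MDM_Message:/Update/Key]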
8.
Click the pincer icon to display the output settings panel, then set the All in one option to true.
10. Double-click tLogRow to open its Component view, then click Sync columns.
This Job is finalized. For further information about the input document and the output report of an MDM event,
see Talend Studio User Guide.
Switch to the MDM perspective by clicking the corresponding button in the upper-right corner of the Studio.
2.
In MDM Repository, click the refresh button so that the Job created appears under the Job Designs node
of this Repository's tree view.
3.
Right-click the newly created Job, update_product in this scenario, and from the contextual menu, select Deploy
to in order to deploy it to the MDM server.
4.
The deployment wizard appears. From the server list, select the MDM server you are using, then click OK.
5.
In the [Deploy to Talend MDM] window that pops up, select the Export type and the Context scripts for
the Job to be deployed. In this scenario, keep the default settings: Export type is Hosted (zip) and Context
scripts is Default.
For further information about these settings, see Talend Studio User Guide.
6.
Click Finish to validate these settings and start the deployment. When the deployment is done, a message
box pops up to indicate that the deployment is successful.
7.
Click OK to close this message box, then a window pops up to list the objects deployed. In this scenario,
it is the Job, update_product.
8.
Right-click the Job update_product again and select Generate Talend Job Caller Process from the
contextual menu.
2.
In the pop-up window, keep the default settings for this scenario: Integrated and Embedded. For further
information about the available options in this window, see Talend Studio User Guide.
3.
Click Generate to start the generation. Once done, a process named CallJob_update_product appears under
the Process node in MDM Repository.
4.
Right-click this process, then select Deploy to from the contextual menu to deploy it onto the MDM server.
5.
In the pop-up wizard, select the server you are using, then click OK to open the window listing the objects
deployed.
6.
Click OK to close this window and finalize the deployment. The question mark disappears from the icon
of this process.
7.
In MDM Repository, right-click the CallJob_update_product process, then select Rename from the
contextual menu.
8.
In the pop-up window, rename this process beforeSaving_update_product, according to the required
process naming pattern. Then click OK to validate it.
Log in to the web interface of the MDM hub you are using.
2.
In the Actions panel on the right side, verify the Data Container and the Data Model you are using are
both Product.
3.
In the Data Browser page, launch a search on the product entities so as to list all the available product
records.
4.
Select the product record you need to update from the list, for example, Talend Trucker Hat. The details of
this record appear in the Product tab view.
5.
Update one of its attributes. For example, update the price to 11.00, then click Save.
The message about the operation type of this event has been sent to the MDM server and, thanks to tLogRow,
this message is displayed in the console window of this MDM server.
For further information about how to use the MDM Web User Interface, see the Talend MDM Web User Interface User Guide.
tMDMTriggerOutput
tMDMTriggerOutput properties
Component family
Talend MDM
Function
tMDMTriggerOutput receives an XML flow (Document type) from its preceding component.
This component works alongside the new trigger service and process plug-in in MDM version
5.0 and higher. The MDM Jobs, triggers and processes developed in previous MDM versions
remain supported. However, we recommend using this component when designing new
MDM Jobs.
Purpose
This component receives an XML flow to set the MDM message so that MDM retrieves this message at
runtime. With this component, you no longer need to configure your Job in order to communicate
the data changes from MDM to your Job.
Basic settings
Property Type
Advanced settings
tStatCatcher Statistics
Select this check box to gather the processing metadata at the Job level
as well as at each component level.
Usage
Limitation
During the deployment of this component on the MDM server, you need to select the Hosted (Zip)
type as the format of the deployed Job. If you deploy it in the Distributed (War) type, the corresponding Job
cannot be invoked. For further information about the available types, see Talend Studio User Guide.
Related scenario
For a related scenario, see section Scenario: Exchanging the event information about an MDM record.
tMDMViewSearch
tMDMViewSearch properties
Component family
Talend MDM
Function
tMDMViewSearch selects records from an MDM Hub by applying filtering criteria you have created
in a specific view. The resulting data is in XML structure.
For more information on a view on which you can define filtering criteria, see Talend Studio User
Guide.
Purpose
This component allows you to retrieve the MDM records from an MDM hub.
Basic settings
XML Field
Select the name of the column in which you want to write the XML data.
URL
Version
Type in the name of the master data management Version you want to connect to, for which you have the required user rights. Leave this field empty if you want to use the default Version.
Data Container
Type in the name of the data container that holds the master data you
want to read.
Type
View Name
Type in the name of the view whose filters will be applied to process
the records.
Operations
Advanced settings
Usage
Spell Threshold
Skip Rows
Type in the count of rows to be ignored to specify from which row the
process should begin. For example, if you type 8 in the field, the process
will begin from the 9th row.
Max Rows
Batch Size
Number of lines in each processed batch. By default, the batch size is set
to -1, meaning that all the lines are processed in one batch.
tStatCatcher Statistics
Select this check box to gather the processing metadata at the Job level
as well as at each component level.
Connections
Global Variables
Limitation
n/a
In this example, you will select the T-shirt information from the Product entity via the Browse_items_Product
view created from Talend Studio. Each record in the entity contains the details defined as filtering criteria: Id,
Name, Description and Price.
From the Palette, drop tMDMViewSearch and tLogRow onto the design workspace.
Connect the components using a Row Main link.
Double-click tMDMViewSearch to view its Basic settings, in the Component tab and set the component
properties.
In the Schema list, select Built-In and click the three-dot button next to Edit schema to open a dialog box in
which you can define the structure of the XML data you want to write in.
Click the plus button and add one column of the type String. Name the column as Tshirt.
Click OK to validate your creation and proceed to the next step.
In the XML Field field, select Tshirt as the column you will write the retrieved data in.
Use your MDM server address in the URL field and type in the corresponding connection data in the Username
and the Password fields. In this example, use the default URL, and enter admin as both the username and the password.
In the Data Container field, type in the container name: Product.
In the View Name field, type in the view name: Browse_items_Product.
Below the Operations table, click the plus button to add one row in this table.
In the Operations table, define the XPath as Product/Name, meaning that the filtering operation will be
performed at the Name node, then select Contains in the Function column and type in Tshirt in the Value
column.
Below the Order (One Row) table, click the plus button to add one row in this table.
In the Order (One Row) table, define the XPath as Product/Id and select the asc order for the Order column.
In the design workspace, click tLogRow to open its Basic settings view and set the properties.
Next to the three-dot button used for editing schema, click Sync columns to acquire the schema from the
preceding component.
In the console docked in the Run view, you can read the retrieved Tshirt records in XML structure, sorted in
ascending order.
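For illustration, one retrieved record written into the Tshirt column might look like the following XML; the values shown are hypothetical, but the structure follows the Id, Name, Description and Price details defined as filtering criteria:

    <Product>
      <Id>1</Id>
      <Name>Talend Tshirt</Name>
      <Description>Round-neck cotton Tshirt</Description>
      <Price>10.00</Price>
    </Product>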
Technical components
This chapter details the components you can find in the Technical group of the Palette in the Integration
perspective of Talend Studio.
The Technical components are Java-oriented components that perform very technical actions, such as loading data
into memory (in small subsets of information) and keeping it available for reuse at various stages of the processing.
For Talend Open Studio for Big Data, the Property type, Schema and Query Type of components are always Built-in. For
further information about how to edit a Built-in schema, see Talend Studio User Guide.
tHashInput
tHashInput Properties
This component is used along with tHashOutput. It reads from the cache memory data loaded by tHashOutput.
Together, these twin components offer high-speed data access to facilitate transactions involving a massive amount
of data.
Component family
Technical
Function
tHashInput reads from the cache memory data loaded by tHashOutput to offer a high-speed data
stream.
Purpose
This component reads from the cache memory data loaded by tHashOutput to offer high-speed
data feed, facilitating transactions involving a large amount of data.
Basic settings
Component list
Select this check box to clear the cache after reading the data
loaded by a certain tHashOutput component. This way, the following
tHashInput components, if any, will not be able to read the cached
data loaded by that tHashOutput component.
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Usage
This component is used along with tHashOutput. It reads from the cache memory data loaded
by tHashOutput. Together, these twin components offer high-speed data access to facilitate
transactions involving a massive amount of data.
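Conceptually, the twin components behave like a shared in-memory buffer keyed by the name of the tHashOutput component that filled it. The following minimal Java sketch illustrates this idea only; it is not the code Talend actually generates, and all names in it are hypothetical.

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class HashBufferSketch {
        // Shared buffer: one row list per tHashOutput component name.
        private static final Map<String, List<Object[]>> CACHE = new HashMap<>();

        // tHashOutput role: append a row to the named buffer.
        static void write(String name, Object[] row) {
            CACHE.computeIfAbsent(name, k -> new ArrayList<>()).add(row);
        }

        // tHashInput role: read the buffer back; optionally clear it so that
        // any later reader can no longer see the cached data.
        static List<Object[]> read(String name, boolean clearAfterReading) {
            List<Object[]> rows = CACHE.getOrDefault(name, new ArrayList<>());
            if (clearAfterReading) {
                CACHE.remove(name);
            }
            return rows;
        }

        public static void main(String[] args) {
            write("tHashOutput_1", new Object[]{1, 3}); // e.g. ID, ID_Insurance
            write("tHashOutput_1", new Object[]{2, 4});
            System.out.println(read("tHashOutput_1", true).size() + " rows read");
        }
    }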
Limitation
n/a
Scenario 1: Reading data from the cache memory for high-speed data access
Drag and drop the following components from the Palette to the workspace: tFixedFlowInput (X2),
tHashOutput (X2), tHashInput and tFileOutputDelimited.
2.
Connect the first tFixedFlowInput to the first tHashOutput using a Row > Main link.
3.
Connect the second tFixedFlowInput to the second tHashOutput using a Row > Main link.
4.
Connect the first subjob (from tFixedFlowInput_1) to the second subjob (to tFixedFlowInput_2) using an
OnSubjobOk link.
6.
Connect the second subjob to the last subjob using an OnSubjobOk link.
Double-click the first tFixedFlowInput component to display its Basic settings view.
3.
Click Edit schema to define the data structure of the input flow; in this case, the input has two columns: ID
and ID_Insurance. Then click OK to close the dialog box.
4.
Fill in the Number of rows field to specify the entries to output, e.g. 50000.
5.
Select the Use Single Table check box. In the Values table and in the Value column, assign values to the
columns, e.g. 1 for ID and 3 for ID_Insurance.
6.
Perform the same operations for the second tFixedFlowInput component, with the only difference in the
values. That is, 2 for ID and 4 for ID_Insurance in this case.
8.
Select Built-In from the Schema drop-down list and click Sync columns to retrieve the schema from the
previous component. Select Keep all from the Keys management drop-down list and keep the Append
check box selected.
9.
Perform the same operations for the second tHashOutput component, and select the Link with a
tHashOutput check box.
2.
Select Built-In from the Schema drop-down list. Click Edit schema to define the data structure, which is
the same as that of tHashOutput.
5.
Select Built-In from the Property Type drop-down list. In the File Name field, enter the full path and name
of the file, e.g. "E:/Allr70207V5.0/Talend-All-r70207-V5.0.0NB/workspace/out.csv".
6.
Select the Include Header check box and click Sync columns to retrieve the schema from the previous
component.
Press F6, or click Run on the Run tab to execute the Job.
You can find that mass entries are written and read very rapidly.
Scenario 2: Clearing the memory before loading data to it in case an iterator exists in the same subjob
Drag and drop the following components from the Palette to the workspace: tLoop, tFixedFlowInput,
tHashOutput, tHashInput and tLogRow.
2.
Select For as the loop type. Type in 1, 2 and 1 in the From, To and Step fields, respectively. Keep the Values
are increasing check box selected.
5.
Click Edit schema to define the data structure of the input flow. In this case, the input has one column: Name.
7.
Fill in the Number of rows field to specify the entries to output, for example 1.
8.
Select the Use Single Table check box. In the Values table, assign a value to the Name field, e.g. Marx.
10. Select Built-In from the Schema drop-down list and click Sync columns to retrieve the schema from the
previous component. Select Keep all from the Keys management drop-down list and deselect the Append
check box.
2.
Select Built-In from the Schema drop-down list. Click Edit schema to define the data structure, which is
the same as that of tHashOutput.
5.
Select Built-In from the Schema drop-down list and click Sync columns to retrieve the schema from the
previous component. In the Mode area, select Table (print values in cells of a table).
Press F6, or click Run on the Run tab to execute the Job.
You can find that only one row was output although two rows were generated by tFixedFlowInput.
tHashOutput
tHashOutput Properties
This component writes data to the cache memory and is closely related to tHashInput. Together, these twin
components offer high-speed data access to facilitate transactions involving a massive amount of data.
Component family
Technical
Function
Purpose
This component loads data to the cache memory to offer high-speed access, facilitating transactions
involving a large amount of data.
Basic settings
Component list
Keys management
Append
Advanced settings
tStatCatcher Statistics
Select this check box to collect log data at the component level.
Usage
This component writes data to the cache memory and is closely related to tHashInput. Together,
these twin components offer high-speed data access to facilitate transactions involving a massive
amount of data.
Limitation
n/a
Related scenarios
For related scenarios, see:
section Scenario 1: Reading data from the cache memory for high-speed data access.
section Scenario 2: Clearing the memory before loading data to it in case an iterator exists in the same subjob.
XML components
This chapter details the main components that you can find in the XML family of the Palette in the Integration
perspective of Talend Studio.
The XML family groups together the components dedicated to XML related tasks such as parsing, validation,
XML structure creation and so on.
For Talend Open Studio for Big Data, the Property type, Schema and Query Type of components are always Built-in. For
further information about how to edit a Built-in schema, see Talend Studio User Guide.
tAdvancedFileOutputXML
tAdvancedFileOutputXML properties
Component family
XML or File/Output
Function
tAdvancedFileOutputXML outputs data to an XML type of file and offers an interface to deal
with loop and group by elements if needed.
Purpose
Basic settings
Property type
Select this check box to process the data flow of interest. Once you
have selected it, the Output Stream field displays and you can
type in the data flow of interest.
The data flow to be processed must be added to the flow in
order for this component to fetch this data via the corresponding
representative variable.
This variable could be already pre-defined in your Studio or
provided by the context or the components you are using along
with this component; otherwise, you could define it manually and
use it according to the design of your Job, for example, using
tJava or tJavaFlex.
To avoid writing it by hand, you can select the variable of interest
from the auto-completion list (Ctrl+Space) to fill the current field,
provided that this variable has been properly defined.
For further information about how to use a stream, see section
Scenario 2: Reading data from a remote file in streaming mode.
File name
Opens the dedicated interface to help you set the XML mapping.
For details about the interface, see section Defining the XML tree.
Sync columns
Advanced settings
Select this check box to add the new lines at the end of your source
XML file.
Select this check box to generate a file that does not have any
empty space or line separators. All elements are then presented
on a single line, which considerably reduces the file size.
If the XML file output is big, you can split the file every given
number of rows.
Trim data
This check box is activated when you are using the dom4j
generation mode. Select this check box to trim the leading or
trailing whitespace from the value of an XML element.
Create directory only if not exists
This check box is selected by default. It creates a directory to hold the output XML files if required.
Create empty element if needed
This box is selected by default. If no column is associated to an XML node, this option will create an open/close tag in place of the expected tag.
Create attribute even if its value is NULL
Select this check box to generate an XML tag attribute for the associated input column whose value is null.
Create attribute even if it is unmapped
Select this check box to generate an XML tag attribute for the associated input column that is unmapped.
Create associated XSD file
If one of the XML elements is defined as a Namespace element, this option will create the corresponding XSD file.
To use this option, you must select Dom4J as the generation mode.
Advanced separator (for number)
Select this check box to change the expected data separators.
Thousands separator: define the thousands separator between inverted commas.
Decimal separator: define the decimal separator between inverted commas.
Generation mode
Encoding
Select the encoding from the list or select Custom and define it
manually. This field is compulsory for DB data handling.
tStatCatcher Statistics
Select the check box to collect the log data at a Job level as well
as at each component level.
Usage
Use this component to write an XML file with data passed on from other components using
a Row link.
Limitation
n/a
To the left of the mapping interface, under Schema List, all of the columns retrieved from the incoming data flow
are listed (on the condition that an input flow is connected to the tAdvancedFileOutputXML component).
To the right of the interface, define the XML structure you want to obtain as output.
You can easily import the XML structure or create it manually, then map the input schema columns onto each
corresponding element of the XML tree.
Rename the root tag that displays by default on the XML tree panel, by clicking on it once.
The XML Tree column is hence automatically filled out with the correct elements. You can remove and insert
elements or sub-elements from and to the tree:
Select Delete to remove the selection from the tree or select the relevant option among: Add sub-element,
Add attribute, Add namespace to enrich the tree.
Rename the root tag that displays by default on the XML tree panel, by clicking on it once.
Then right-click to the left of the tag name to display the contextual menu and select Add sub-element to create
the first element of the structure.
You can also add an attribute or a child element to any element of the tree or remove any element from the tree.
2.
Right-click to the left of the element name to display the contextual menu.
3.
On the menu, select the relevant option among: Add sub-element, Add attribute, Add namespace or Delete.
tAdvancedFileOutputXML properties
A light blue link displays to illustrate this mapping. If available, use the Auto-Map button, located at the bottom
left of the interface, to carry out this operation automatically.
You can disconnect any mapping on any element of the XML tree:
1.
Select the element of the XML tree, that should be disconnected from its respective schema column.
2.
Right-click to the left of the element name to display the contextual menu.
Loop element
The loop element allows you to define the iterating object. Generally, the Loop element is also the row generator.
To define an element as loop element:
1.
Select the element of the XML tree that you want to set as the loop element.
2.
Right-click to the left of the element name to display the contextual menu.
3.
On the menu, select Set as loop element.
tAdvancedFileOutputXML properties
Group element
The group element is optional; it represents a constant element in which the group-by operation can be performed. A
group element can be defined only if a loop element was defined before.
When using a group element, the rows should be sorted, in order to be able to group by the selected node.
To define an element as group element:
1.
Select the element of the XML tree that you want to set as the group element.
2.
Right-click to the left of the element name to display the contextual menu.
3.
On the menu, select Set as group element.
The Node Status column shows the newly added statuses, and any required group statuses are defined automatically,
if needed.
Click OK once the mapping is complete to validate the definition and continue the Job configuration where needed.
Drop a tFileInputDelimited and a tAdvancedFileOutputXML from the Palette onto the design workspace.
2.
Right-click the input component and drag a Row > Main link towards the tAdvancedFileOutputXML
component to implement a connection.
3.
Select the tFileInputDelimited component and display the Component settings tab located in the tab system
at the bottom of the Studio.
4.
Fill out the fields displayed on the Basic settings vertical tab.
The input file contains the following type of columns separated by semi-colons: id, name, category, year,
language, director and cast.
In this simple use case, the Cast field gathers different values and the id increments when the movie changes.
6.
Once you have checked that the schema of the input file meets your expectations, click OK to validate.
Then select the tAdvancedFileOutputXML component and click the Component settings tab to
configure the basic settings as well as the mapping. Note that double-clicking the component will directly
open the mapping interface.
2.
In the File Name field, browse to the file to be written if it exists or type in the path and file name that needs
to be created for the output.
By default, the schema (file description) is automatically propagated from the input flow, but you can edit
it if needed.
3.
Then click on the three-dot button or double-click on the tAdvancedFileOutputXML component on the
design workspace to open the dedicated mapping editor.
To the left of the interface are listed the columns from the input file description.
4.
To the right of the interface, set the XML tree panel to reflect the expected XML structure output.
You can create the structure node by node. For more information about the manual creation of an XML tree,
see section Defining the XML tree.
In this example, an XML template is used to populate the XML tree automatically.
5.
Right-click on the root tag displaying by default and select Import XML tree at the end of the contextual
menu options.
6.
Browse to the XML file to be imported and click OK to validate the import operation.
You can import an XML tree from files in XML, XSD and DTD formats.
7.
Then drag & drop each column name from the Schema List to the matching (or relevant) XML tree elements
as described in section Mapping XML data.
The mapping is shown as blue links between the left and right panels.
Finally, define the node status where the loop should take place. In this use case, the Cast element is the changing
element on which the iteration should operate, so it will be the loop element.
Right-click on the Cast element on the XML tree, and select Set as loop element.
8.
To group by movie, this use case needs also a group element to be defined.
Right-click on the Movie parent node of the XML tree, and select Set as group element.
The newly defined node statuses show on the corresponding element lines.
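With Cast set as the loop element and Movie as the group element, the output for one movie takes roughly the following shape. Whether each column maps to an attribute or a sub-element depends on the XML tree you defined; the layout and values below are assumptions for illustration:

    <root>
      <Movie id="1" name="..." director="...">
        <Cast>first actor</Cast>
        <Cast>second actor</Cast>
      </Movie>
    </root>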
9.
Save your Job and press F6 to execute it.
tDTDValidator
tDTDValidator Properties
Component family
XML
Function
Validates the XML input file against a DTD file and sends the validation log to the defined
output.
Purpose
Basic settings
DTD file
XML file
If XML is valid, display / If XML is invalid, display
Type in a message to be displayed in the Run console based on the result of the comparison.
Print to console
Advanced settings
tStatCatcher Statistics
Select this check box to gather the processing metadata at the Job
level as well as at each component level.
Usage
This component can be used as standalone component but it is usually linked to an output
component to gather the log data.
Limitation
n/a
1.
Drop the following components from the Palette to the design workspace: tFileList, tDTDValidator, tMap,
tFileOutputDelimited.
2.
Connect the tFileList to the tDTDValidator with an Iterate link, and connect the remaining components using
Row > Main links.
3.
Set the tFileList component properties, to fetch an XML file from a folder.
Click the plus button to add a filemask line and enter the filemask: *.xml. Remember that Java code requires
double quotes.
Set the path of the XML files to be verified.
Select No from the Case Sensitive drop-down list.
4.
In the tDTDValidator Component view, the schema is read-only as it contains standard log information
related to the validation process.
In the DTD file field, browse to the DTD file to be used as reference.
5.
Click in the XML file field, press Ctrl+Space bar to access the variable list, and double-click the current
filepath global variable: tFileList.CURRENT_FILEPATH.
6.
In the various messages to display in the Run tab console, use the jobName variable
to recall the Job name tag. Recall the filename using the relevant global variable:
((String)globalMap.get("tFileList_1_CURRENT_FILE")). Remember that Java code requires double
quotes.
Select the Print to Console check box.
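For example, the message for the If XML is invalid, display field could be built as the following sketch (the exact wording is up to you):

    jobName + ": " + ((String)globalMap.get("tFileList_1_CURRENT_FILE")) + " is invalid"

The same pattern, with different wording, fits the If XML is valid, display field.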
7.
In the tMap component, drag and drop the information data from the standard schema that you want to pass
on to the output file.
8.
Once the Output schema is defined as required, add a filter condition to only select the log information data
when the XML file is invalid.
Following best practice, type first the wanted value, then the operator based on the type
of data filtered, and then the variable that should meet the requirement. In this case: 0 == row1.validate.
9.
Then connect (if not already done) the tMap to the tFileOutputDelimited component using a Row > Main
connection. Name it as relevant, in this example: log_errorsOnly.
10. In the tFileOutputDelimited Basic settings, define the destination filepath, the field delimiters and the
encoding.
11. Save your Job and press F6 to run it.
On the Run console, the defined messages display for each of the files. At the same time, the output file is
filled with the log data for the invalid files.
tExtractXMLField
tExtractXMLField properties
Component family
XML
Function
tExtractXMLField reads an input XML field of a file or a database table and extracts desired
data.
Purpose
tExtractXMLField opens an input XML field, reads the XML structured data directly without
having first to write it out to a temporary file, and finally sends data as defined in the schema
to the following component via a Row link.
Basic settings
Property type
Schema and Schema type
XML field
Mapping
Limit
Die on error
This check box is selected by default. Clear the check box to skip
the row on error and complete the process for error-free rows. If
needed, you can retrieve the rows on error via a Row > Reject
link.
Advanced settings
tStatCatcher Statistics
Usage
Limitation
n/a
Drop the following components from the Palette onto the design workspace: tMysqlInput,
tExtractXMLField, and tFileOutputDelimited.
Connect the three components using Main links.
2.
Double-click tMysqlInput to display its Basic settings view and define its properties.
3.
Enter the database connection and the data structure information manually. For more information about
tMysqlInput properties, see section tMysqlInput.
4.
In the Table Name field, enter the name of the table holding the XML data, customerdetails in this example.
Click Guess Query to display the query corresponding to your schema.
5.
Double-click tExtractXMLField to display its Basic settings view and define its properties.
6.
Click Sync columns to retrieve the schema from the preceding component. You can click the three-dot button
next to Edit schema to view/modify the schema.
The Column field in the Mapping table will be automatically populated with the defined schema.
7.
In the Xml field list, select the column from which you want to extract the XML data. In this example, the
field holding the XML data is called CustomerDetails.
In the Loop XPath query field, enter the node of the XML tree on which to loop to retrieve data.
In the Xpath query column, enter between inverted commas the node of the XML field holding the data you
want to extract, CustomerName in this example.
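For illustration, the XML stored in each CustomerDetails field might look like the following hypothetical record; with such content, you would loop on the CustomerDetails node and query "CustomerName":

    <CustomerDetails>
      <CustomerName>Griffith Paving</CustomerName>
      <CustomerAddress>Talend, Montana</CustomerAddress>
    </CustomerDetails>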
8.
Double-click tFileOutputDelimited to display its Basic settings view and define its properties.
9.
In the File Name field, define or browse to the path of the output file you want to write the extracted data in.
Click Sync columns to retrieve the schema from the preceding component. If needed, click the three-dot
button next to Edit schema to view the schema.
tExtractXMLField reads and extracts the client names under the CustomerName node of the CustomerDetails
field of the defined database table.
Scenario 2: Extracting correct and erroneous data from an XML field in a delimited file
Drop the following components from the Palette to the design workspace: tFileInputDelimited,
tExtractXMLField, tFileOutputDelimited and tLogRow.
Connect the first three components using Row > Main links.
2.
Double-click tFileInputDelimited to open its Basic settings view and define the component properties.
3.
Click the three-dot button next to Edit schema to display a dialog box where you can define the structure
of your data.
Click the plus button to add as many columns as needed to your data structure. In this example, we have one
column in the schema: xmlStr.
Click OK to validate your changes and close the dialog box.
4.
In the File Name field, click the three-dot button and browse to the input delimited file you want to process,
CustomerDetails_Error in this example.
This delimited file holds a number of simple XML lines separated by double carriage return.
Set the row and field separators used in the input file in the corresponding fields, double carriage return for
the first and nothing for the second in this example.
If needed, set Header, Footer and Limit. None is used in this example.
5.
In the design workspace, double-click tExtractXMLField to display its Basic settings view and define the
component properties.
6.
Click Sync columns to retrieve the schema from the preceding component. You can click the three-dot button
next to Edit schema to view/modify the schema.
The Column field in the Mapping table will be automatically populated with the defined schema.
7.
In the Xml field list, select the column from which you want to extract the XML data. In this example, the
field holding the XML data is called xmlStr.
In the Loop XPath query field, enter the node of the XML tree on which to loop to retrieve data.
8.
In the design workspace, double-click tFileOutputDelimited to open its Basic settings view and display the
component properties.
9.
In the File Name field, define or browse to the output file you want to write the correct data in,
CustomerNames_right.csv in this example.
Click Sync columns to retrieve the schema of the preceding component. You can click the three-dot button
next to Edit schema to view/modify the schema.
10. In the design workspace, double-click tLogRow to display its Basic settings view and define the component
properties.
Click Sync Columns to retrieve the schema of the preceding component. For more information on this
component, see section tLogRow.
11. Save your Job and press F6 to execute it.
tExtractXMLField reads and extracts into the output delimited file, CustomerNames_right, the client information
for which the XML structure is correct, and also displays the erroneous data on the console of the Run view.
tFileInputXML
tFileInputXML Properties
Component family
XML or File/Input
Function
tFileInputXML reads an XML structured file and extracts data row by row.
Purpose
Opens an XML structured file and reads it row by row, splitting it up into fields, then sends
the fields, as defined in the schema, to the next component via a Row link.
Basic settings
Property type
File name/Stream
Mapping
Advanced settings
Limit
Die on error
This check box is selected by default and stops the Job in the
event of an error. Clear the check box to skip erroneous rows; the
process will still be completed for error-free rows. If needed, you
can retrieve the erroneous rows using a Row > Reject link.
Select this check box to ignore the DTD file indicated in the XML
file being processed.
Advanced separator (for number)
Select this check box to change the data separators for numbers:
Thousands separator: define the separator to use for thousands.
Decimal separator: define the separator to use for decimals.
Ignore the namespaces
Use Separator for mode Xerces
Select this check box if you want to separate concatenated children node values.
This field can only be used if the selected Generation mode is Xerces.
The following field displays:
Field separator: Define the delimiter to be used to separate the children node values.
Encoding
Select the encoding type from the list or select Custom and define
it manually. This field is compulsory for DB data handling.
Generation mode
From the drop-down list select the generation mode for the XML
file, according to the memory available and the desired speed:
Slow and memory-consuming (Dom4j)
This option allows you to use dom4j to process the
XML files of high complexity.
Memory-consuming (Xerces).
Fast with low memory consumption (SAX)
Validate date
Select this check box to check the date format strictly against the
input schema.
tStatCatcher Statistics
Usage
tFileInputXML is used as an entry component. It allows you to create a flow of XML data
using a Row > Main link. You can also create a rejection flow using a Row > Reject link
to filter the data which doesn't correspond to the type defined. For an example of how to use
these two links, see section Scenario 2: Extracting correct and erroneous data from an XML
field in a delimited file.
Limitation
n/a
This scenario describes a basic Job that reads a defined XML file, extracts specific information, and
outputs it to the Run console via a tLogRow component.
1.
Drop tFileInputXML and tLogRow from the Palette to the design workspace.
3.
Double-click tFileInputXML to open its Basic settings view and define the component properties.
4.
As the street dir file used as the input file has been previously defined in the Metadata area, select Repository
as the Property type. This way, the properties are automatically retrieved and the rest of the property fields
are filled in (apart from Schema).
5.
In the same way, select the relevant schema from the Repository metadata list. Edit the schema if you want to make
any change to the schema loaded.
7.
In the Loop XPath query field, change, if needed, the node of the structure on which the loop is based.
8.
In the Mapping table, fill in the fields to be extracted and displayed in the output.
10. Enter the encoding if needed, then double-click tLogRow to define the separator character.
11. Save your Job and press F6 to execute it.
The fields defined in the input properties are extracted from the XML structure and displayed on the console.
This Java scenario describes a three-component Job that reads an XML file and: first, returns the correct XML
data in an output XML file; second, displays on the console the erroneous XML data whose type does not
correspond to the one defined in the schema.
1.
Drop the following components from the Palette to the design workspace: tFileInputXML,
tFileOutputXML and tLogRow.
Right-click tFileInputXML and select Row > Main in the contextual menu and then click tFileOutputXML
to connect the components together.
Right-click tFileInputXML and select Row > Reject in the contextual menu and then click tLogRow to
connect the components together using a reject link.
2.
Double-click tFileInputXML to display the Basic settings view and define the component properties.
3.
In the Property Type list, select Repository and click the three-dot button next to the field to display the
[Repository Content] dialog box where you can select the metadata relative to the input file if you have
already stored it in the File xml node under the Metadata folder of the Repository tree view. The fields that
follow are automatically filled with the fetched data. If not, select Built-in and fill in the fields that follow
manually.
4.
In the Schema Type list, select Repository and click the three-dot button to open the dialog box where you
can select the schema that describe the structure of the input file if you have already stored it in the Repository
tree view. If not, select Built-in and click the three-dot button next to Edit schema to open a dialog box
where you can define the schema manually.
The schema in this example consists of five columns: id, CustomerName, CustomerAddress, idState and id2.
5.
Click the three-dot button next to the Filename field and browse to the XML file you want to process.
6.
In the Loop XPath query, enter between inverted commas the path of the XML node on which to loop in
order to retrieve data.
In the Mapping table, Column is automatically populated with the defined schema.
In the XPath query column, enter between inverted commas the node of the XML file that holds the data
you want to extract from the corresponding column.
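For example, with an input file shaped as follows (all element names here are assumptions; use the ones from your actual file), the Loop XPath query would be "/customers/customer" and the XPath query for the CustomerName column would be "CustomerName":

    <customers>
      <customer>
        <id>1</id>
        <CustomerName>...</CustomerName>
        <CustomerAddress>...</CustomerAddress>
        <idState>...</idState>
        <id2>...</id2>
      </customer>
    </customers>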
7.
In the Limit field, enter the number of lines to be processed, the first 10 lines in this example.
8.
Double-click tFileOutputXML to display its Basic settings view and define the component properties.
9.
Click the three-dot button next to the File Name field and browse to the output XML file you want to collect
data in, customer_data.xml in this example.
In the Row tag field, enter between inverted commas the name you want to give to the tag that will hold
the retrieved data.
Click Edit schema to display the schema dialog box and make sure that the schema matches that of the
preceding component. If not, click Sync columns to retrieve the schema from the preceding component.
10. Double-click tLogRow to display its Basic settings view and define the component properties.
Click Edit schema to open the schema dialog box and make sure that the schema matches that of the preceding
component. If not, click Sync columns to retrieve the schema of the preceding component.
In the Mode area, select the Vertical option.
11. Save your Job and press F6 to execute it.
The output file customer_data.xml holding the correct XML data is created in the defined path and erroneous
XML data is displayed on the console of the Run view.
tFileOutputXML
tFileOutputXML properties
Component family
XML or File/Output
Function
Purpose
tFileOutputXML writes an XML file with separated data values according to a defined schema.
Basic settings
File name
Incoming record is a document
Select this check box if the data from the preceding component is in XML format.
When this check box is selected, a Column list appears allowing you to select a Document type column of the schema that holds the data, and the Row tag field disappears.
Row tag
Specify the tag that will wrap data and structure per row.
Sync columns
Advanced settings
Split output in several files
If the output is big, you can split the output into several files, each containing the specified number of rows.
Rows in each output file: Specify the number of rows in each output file.
Create directory if not exists
This check box is selected by default. It creates a directory to hold the output XML files if required.
Root tags
Specify one or more root tags to wrap the whole output file
structure and data. The default root tag is root.
Output format
Use schema column name
This check box is selected by default for all columns, so that the column labels from the input schema are used as data wrapping tags. If you want to use a different tag than from the input schema for any column, clear this check box for that column and specify a tag label between quotation marks in the Label field.
Use dynamic grouping
Select this check box if you want to dynamically group the output
columns. Click the plus button to add one or more grouping
criteria in the Group by table.
Column: Select a column you want to use as a wrapping element
for the grouped output rows.
Attribute label: Enter an attribute label for the group wrapping
element, between quotation marks.
Custom the flush buffer size
Select this check box to define the number of rows to buffer before the data is written into the target file and the buffer is emptied.
Row Number: Specify the number of rows to buffer.
Advanced separator (for numbers)
Select this check box to modify the separators used for numbers:
Thousands separator: define separators for thousands.
Decimal separator: define separators for decimals.
Encoding
Select the encoding from the list or select Custom and define it manually. This field is compulsory for DB data handling.
tStatCatcher Statistics
Usage
Use this component to write an XML file with data passed on from other components using a Row link.
Global Variables
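For illustration, with the default root tag and a Row tag named row, the output file takes roughly the following shape; the column tags are assumptions that depend on your schema:

    <root>
      <row>
        <CustomerName>...</CustomerName>
        <CustomerAddress>...</CustomerAddress>
      </row>
      <row>
        ...
      </row>
    </root>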
Related scenarios
For related scenarios using tFileOutputXML, see section Scenario: From Positional to XML file and section
Scenario 2: Using a SOAP message from an XML file to get airport information and saving the information to
an XML file.
tWriteXMLField
tWriteXMLField properties
Component family
XML
Function
Purpose
tWriteXMLField reads an input XML file and extracts the structure to insert it in defined
fields of the output file.
Basic settings
Output Column
Advanced settings
Sync columns
Group by
Remove the XML declaration
Select this check box if you do not want to include the XML header.
Create empty element if needed
This check box is selected by default. If the Related Column in the XML tree editor has null values, or if no column is associated with the XML node, this option creates an open/close tag in the expected place.
Expand Empty Element if needed (for dom4j)
Select this option to allow a null element to appear in the form of a tag pair, e.g. <element></element>. Otherwise, such an element appears as a solo tag, e.g. <element/>. For more information about XML tags, see http://www.tizag.com/xmlTutorial/xmltag.php.
To use this option, you must select the Dom4J generation mode.
Available when Create empty element if needed is selected.
Create associated XSD file
If one of the XML elements is defined as a Namespace element, this option will create the corresponding XSD file.
To use this option, you must select the Dom4J generation mode.
Advanced separator (for number)
Select this check box if you want to modify the separators used by default for numbers.
Thousands separator: enter between brackets the separators to use for thousands.
Encoding
Select the encoding type in the list or select Custom and define it manually. This field is compulsory when working with databases.
tStatCatcher Statistics
Usage
Global Variables
Scenario: Extracting the structure of an XML file and inserting it into the fields of a database table
This three-component scenario reads an XML file, extracts the XML structure, and finally outputs the
structure to the fields of a database table.
1.
Drop the following components from the Palette onto the design workspace: tFileInputXML,
tWriteXMLField, and tMysqlOutput.
Connect the three components using Main links.
2.
Double-click tFileInputXML to open its Basic settings view and define its properties.
3.
If you have already stored the input schema in the Repository tree view, select Repository first from the
Property Type list and then from the Schema list to display the [Repository Content] dialog box where
you can select the relevant metadata.
4.
If you have not stored the input schema locally, select Built-in in the Property Type and Schema fields and
fill in the fields that follow manually. For more information about tFileInputXML properties, see section
tFileInputXML.
If you have selected Built-in, click the three-dot button next to the Edit schema field to open a dialog box
where you can manually define the structure of your file.
5.
In the Loop XPath query field, enter the node of the structure where the loop is based. In this example, the
loop is based on the customer node. Column in the Mapping table will be automatically populated with the
defined file content.
In the Xpath query column, enter between inverted commas the node of the XML file that holds the data
corresponding to each of the Column fields.
6.
In the design workspace, click tWriteXMLField and then in the Component view, click Basic settings to
open the relevant view where you can define the component properties.
7.
Click the three-dot button next to the Edit schema field to open a dialog box where you can add a line by
clicking the plus button.
8.
Click in the line and enter the name of the output column where you want to write the XML content,
CustomerDetails in this example.
Define the type and length in the corresponding fields, String and 255 in this example.
Click OK to validate your output schema and close the dialog box.
In the Basic settings view and from the Output Column list, select the column you already defined where
you want to write the XML content.
9.
Click the three-dot button next to Configure Xml Tree to open the interface that helps to create the XML
structure.
10. In the Link Target area, click rootTag and rename it as CustomerDetails.
In the Linker source area, drop CustomerName and CustomerAddress to CustomerDetails. A dialog box
displays asking what type of operation you want to do.
Select Create as sub-element of target node to create a sub-element of the CustomerDetails node.
Right-click CustomerName and select from the contextual menu Set As Loop Element.
Click OK to validate the XML structure you defined.
11. Double-click tMysqlOutput to open its Basic settings view and define its properties.
12. If you have already stored the schema in the DB Connection node in the Repository tree view, select
Repository from the Schema list to display the [Repository Content] dialog box where you can select the
relevant metadata.
If you have not stored the schema locally, select Built-in in the Property Type and Schema fields and enter
the database connection and data structure information manually. For more information about tMysqlOutput
properties, see section tMysqlOutput.
In the Table field, enter the name of the database table to be created, where you want to write the extracted
XML data.
From the Action on table list, select Create table to create the defined table.
From the Action on data list, select Insert to write the data.
Click Sync columns to retrieve the schema from the preceding component. You can click the three-dot button
next to Edit schema to view the schema.
13. Save your Job and press F6 to execute it.
tWriteXMLField fills every field of the CustomerDetails column with the XML structure of the input file:
the XML processing instruction <?xml version="1.0" encoding="ISO-8859-15"?>, the first node
that separates each client <CustomerDetails>, and finally the customer information <CustomerAddress> and
<CustomerName>.
tXMLMap
tXMLMap belongs to two component families: Processing and XML. For more information on it, see section
tXMLMap.
tXSDValidator
tXSDValidator Properties
Component family
XML
Function
Validates an input XML file or an input XML flow against an XSD file and sends the validation
log to the defined output.
Purpose
Helps control the data and structure quality of the file or flow to be processed.
Basic settings
Mode
XSD file
Filepath to the reference XSD file. HTTP URL also supported, e.g.
http://localhost:8080/book.xsd.
XML file
If XML is valid, display / If XML is invalid, display
Type in a message to be displayed in the Run console based on the result of the comparison.
Print to console
Allocate
Encoding
tStatCatcher Statistics
Advanced settings
Usage
When used in File mode, this component can be used as a standalone component, but it is usually
linked to an output component to gather the log data.
Limitation
n/a
2.
Double-click the tFileInputDelimited to open its Component view and set its properties:
3.
Browse to the input file, and define the number of rows to be skipped in the beginning of the file.
Click Edit schema and edit the schema according to the input file. In this scenario, the input file has only two
columns: ID and ShipmentInfo. The ShipmentInfo column is an XML column and needs to be validated.
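For illustration, a minimal XSD of the kind the ShipmentInfo column could be validated against might look as follows; the elements inside ShipmentInfo are pure assumptions:

    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="ShipmentInfo">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="Carrier" type="xs:string"/>
            <xs:element name="TrackingNumber" type="xs:string"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>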
4.
On your design workspace, connect the tFileInputDelimited component to the tXSDValidator component
using a Row > Main link.
8.
Connect the tXSDValidator component to the other tFileOutputDelimited component using a Row >
Rejects link to output the information about invalid XML rows.
9.
Double-click each of the two tFileOutputDelimited components and configure the component properties.
In the File Name field, enter the output file path or, if you want to use an existing output file, browse to it.
10. Click Sync columns to retrieve the schema from the preceding component.
The output files contain the validation information about the valid and invalid XML rows of the specified column
respectively.
tXSLT
tXSLT Properties
Component family
XML
Function
Refers to an XSL stylesheet, to transform an XML source file into a defined output file.
Purpose
Basic settings
XML file
XSL file
Output file
File path to the output file. If the file does not exist, it will be
created. The output file can be any structured or unstructured file
such as html, xml, txt or even pdf or edifact depending on your xsl.
Parameters
Click the plus button to add new lines in the Parameters list and
define the transformation parameters of the XSLT file. Click in
each line and enter the key in the name list and its associated value
in the value list.
Advanced settings
tStatCatcher Statistics
Select this check box to gather the processing metadata at the Job
level as well as at each component level.
Usage
Limitation
Due to license incompatibility, one or more JARs required to use this component are not
provided. You can easily find out and add such JARs in the Integration perspective of your
studio. For details, see the section about external modules in the Talend Installation and Upgrade
Guide.
Drop the tXSLT and tMsgBox components from the Palette onto the design workspace.
2.
Double-click tXSLT to open its Basic settings view where you can define the component properties.
3.
In the XML file field, set the path or browse to the xml file to be transformed. In this example, the xml file
holds a list of MP3 song titles and related information including artist names, company etc.
4.
In the XSL file field in the Basic settings view, set the path or browse to the relevant xsl file.
5.
In the Output file field, set the path or browse to the output html file.
In this example, we want to convert the xml data into an html file holding a table heading followed by a table
listing artist names next to song titles.
6.
In the Parameters area of the Basic settings view, click the plus button to add a line where you can define the
name and value of the transformation parameter of the xsl file. In this example, the name of the transformation
parameter we want to use is bgcolor and the value is green.
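For illustration, the stylesheet might declare and use this parameter as in the following sketch; apart from the bgcolor parameter name and the green value set in the Studio, everything here is an assumption about your xsl file:

    <xsl:stylesheet version="1.0"
                    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <!-- The default value below is overridden at run time by the value
           defined in the tXSLT Parameters table. -->
      <xsl:param name="bgcolor">white</xsl:param>
      <xsl:template match="/">
        <html>
          <body>
            <table>
              <tr bgcolor="{$bgcolor}">
                <th>Artist</th>
                <th>Title</th>
              </tr>
              <!-- one row per song would be generated here -->
            </table>
          </body>
        </html>
      </xsl:template>
    </xsl:stylesheet>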
7.
Double-click the tMsgBox to display its Basic settings view and define its display properties as needed.
8.
Save the Job and press F6 to execute it. A message box displays, confirming that the output html file has been
created and stored in the defined path.
9.
You can now open the output html file to check the transformation of the xml data and the background
color of the table heading.