DataStage usage experience and a UNIX application example

Ascential DataStage is a well-known third-party ETL tool produced by Ascential. Its main
features are:
1. Visual operation, which avoids a lot of hand-written code
2. As a third-party tool, it is good at dealing with complex data sources
3. Good monitoring, so problems in the ETL process can be found and solved quickly
Beginners can get started quickly by working through the official training materials
(available everywhere online); after all, DataStage is a visual tool, and little of its
content is obscure or difficult to understand. In real use, however, a variety of
problems may be encountered. Here are some of the questions that puzzled me in
practice:
1. Job granularity. When an ETL process has multiple steps, should the design be
coarse, implemented as a small number of complex jobs, or fine, implemented as many
simple jobs? Personally, I think relatively fine granularity is more conducive to
development. In the early stage of development, fine-grained jobs appear more
complicated, but in the later testing phase of the project they make it possible to
locate errors more precisely and are easier to modify.
2. Parallel versus serial. In the later stage of development, when we are ready to chain
multiple jobs together, we find that parallelism becomes the key to making the ETL run
efficiently, yet this factor is often overlooked at the design stage. An ETL process may
involve multiple tables from multiple data sources, and multiple jobs may contend for
the same data source or the same table. Contention on a data source reduces the
execution efficiency of the ETL. When contention on a table cannot be resolved, the
jobs can only be run serially. A good structure and process design can greatly reduce
this contention and thereby improve the efficiency of the ETL.
3. Combining DataStage with external code. DataStage is not a stand-alone
development tool; it needs an external control program as its carrier before it can
serve users well. DataStage is not a panacea either; put simply, it is just a visual
carrier for SQL. Some functions therefore do not have to be implemented in DataStage
and should instead be placed in an external program, written as SQL code, to ensure
the stability and security of the whole process.
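The trade-off in point 2 can be illustrated with a small scheduling sketch. It is not DataStage code and not the author's design: the job names, table names, and the `run_job` callback are hypothetical, and each job is assumed to write to exactly one table. Jobs that share a target table are chained serially, while groups that touch different tables run in parallel.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def plan_batches(job_tables):
    """Group job names by the table they write to, preserving input order."""
    groups = defaultdict(list)
    for job, table in job_tables:
        groups[table].append(job)
    return list(groups.values())


def run_plan(groups, run_job, max_workers=4):
    """Run the jobs of one group serially; run different groups in parallel."""
    def run_group(group):
        return [run_job(job) for job in group]

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input groups.
        return list(pool.map(run_group, groups))


if __name__ == "__main__":
    # Hypothetical jobs: two of them contend for the ORDERS table.
    jobs = [
        ("load_orders", "ORDERS"),
        ("load_customers", "CUSTOMERS"),
        ("fix_orders", "ORDERS"),
    ]
    groups = plan_batches(jobs)
    print(groups)
    print(run_plan(groups, lambda job: f"{job}: ok"))
```

The fewer jobs share a table, the more groups there are and the more of the run can proceed in parallel, which is why the structure chosen at design time matters.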
The points above are broad questions of direction. In practice there are also many
tedious small problems; I will try to list some:
1. Character set: setting the character set of both input and output to none is a good
choice. At the very least it ensures that a run will not abort because of garbled characters.
2. The column delimiter for text files cannot be set to three characters. In theory, only a
three-character delimiter can guarantee that garbled characters are not misidentified as
a delimiter, so this is a DataStage defect.
3. To use custom SQL, first configure the table manually in the non-custom mode and
then switch back to the custom mode. If you write the custom SQL directly, DataStage
cannot resolve the table name and reports an error; this is probably a bug.
4. Get into the habit of using View Data each time you finish configuring an input or
output; do not wait until run time and then go back to hunt for the error.
5. For inputs, avoid options such as insert or update as far as possible; the speed
difference from insert-only is huge. An option such as insert or update is equivalent to
using a cursor and comparing row by row: every inserted row first requires a full table
scan, so the speed can be imagined. If you need this functionality, use another
method, such as first deleting from the target table all records that duplicate records in
the source table and then inserting the data from the source table.
6. Date data is troublesome, because the DataStage date format is timestamp. You can
change the type to date, but that often causes errors. When both the source table and
the target table are in an Oracle database, date data needs no conversion and the
defaults can be used directly. For some other databases, such as Informix, you need to
convert with the oconv and iconv functions and change the date format in the output
SQL accordingly. For specific usage, search online or check the DataStage help.
7. As long as the data types and lengths at the input and at the output are correct, the
data types and lengths inside DataStage in between can be changed freely, and custom
columns can also be added.
8. Half-width spaces in strings need trimb rather than the trim function, which is often
overlooked. Strings may also mix Chinese characters with half-width characters, so
string length and character set often cause errors inside DataStage. Try to ensure that
the length of the string before the insert is less than the length allowed after the insert.
The length you see before the insert is not necessarily the true length inside
DataStage, so using the trimb function in the input SQL to impose a limit is the safest
method.
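The delete-then-insert alternative from item 5 can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module rather than DataStage or the author's databases; the table names `source` and `target` and the key column `id` are assumptions made for the example.

```python
import sqlite3


def delete_then_insert(conn):
    """Remove target rows that also exist in source, then bulk-insert source."""
    with conn:  # one transaction: both statements commit or roll back together
        conn.execute("DELETE FROM target WHERE id IN (SELECT id FROM source)")
        conn.execute("INSERT INTO target SELECT * FROM source")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE source (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE target (id INTEGER PRIMARY KEY, name TEXT);
        INSERT INTO target VALUES (1, 'old'), (2, 'keep');
        INSERT INTO source VALUES (1, 'new'), (3, 'added');
    """)
    delete_then_insert(conn)
    print(conn.execute("SELECT id, name FROM target ORDER BY id").fetchall())
```

Both statements are set-based, so the database can handle the duplicates in one pass instead of checking the target once per incoming row.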
To conclude, here is an example of a DataStage application on UNIX for your reference.
A complete ETL run consists of these steps:
1. The business user interface (a user-friendly interface in Java, JSP, etc.) triggers the run
2. The shell script runs
3. The shell starts the controljob
4. The controljob starts the jobs
5. The getstatus controljob monitors job status (looping until all jobs have finished)
6. The job results are returned to the shell
7. The shell returns the results to the business interface
8. The user gets the results
As can be seen, there are several major elements: the business interface, the shell, the
controljob, the getstatus controljob, and the jobs.
Here I list only the controljob, the getstatus controljob, and the shell's scheduling of
the controljob; the other parts are not covered in detail.
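Steps 5 to 7 of the flow above, monitoring until every job has finished and then handing a result back to the shell, can be sketched as a polling loop. This is an illustration only, not the author's getstatus controljob: the status labels and the `get_status` callback are hypothetical stand-ins for a real status query.

```python
import time

FINISHED = {"OK", "FAILED"}  # hypothetical status labels, not DataStage constants


def wait_for_jobs(job_names, get_status, poll_seconds=1.0, sleep=time.sleep):
    """Poll every job until none is still running; return the final statuses."""
    final = {}
    pending = list(job_names)
    while pending:
        still_running = []
        for job in pending:
            status = get_status(job)
            if status in FINISHED:
                final[job] = status
            else:
                still_running.append(job)
        pending = still_running
        if pending:
            sleep(poll_seconds)  # wait before the next round of status checks
    return final


def exit_code(final):
    """Return 0 to the calling shell only if every job finished OK."""
    return 0 if all(status == "OK" for status in final.values()) else 1


if __name__ == "__main__":
    # Scripted statuses stand in for a real status query.
    script = {
        "job_a": iter(["RUNNING", "OK"]),
        "job_b": iter(["RUNNING", "RUNNING", "FAILED"]),
    }
    final = wait_for_jobs(script, lambda job: next(script[job]), poll_seconds=0)
    print(final, exit_code(final))
```

Returning a single exit code keeps the contract with the calling shell simple: the shell only needs to test for zero before reporting back to the business interface.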
First, the common control job
1. For the parts in italic, underlined bold, consider whether the job runs in parallel;
if the sentence is not parallel to
2. The blue lines need to be added for the original layer
3. Red marks a job name; replace it with your own job name
[B1] The marked line corresponds to a job in the original layer: when data needs to be
loaded from a text file, the sh script is invoked here.

* Setup DXrtInc, run it, wait for it to finish, and test for success
hJobDXrtInc1 = DSAttachJob("DXrtInc", DSJ.ERRFATAL)
If NOT(hJobDXrtInc1) Then
   Call DSLogFatal("Job Attach Failed: DXrtInc", "JobControl")
   Abort
End
Call DSExecute("UNIX", "/essbase/script/dwcorp/system/t[b1].sh", Output, SystemReturnCode)
*If FAIL Then RESET
Status = DSGetJobInfo(hJobDXrtInc1, DSJ.JOBSTATUS)
If Status = DSJS.RUNFAILED Then
   ErrCode = DSRunJob(hJobDXrtInc1, DS