
1. What is a Sequential File stage? A. The Sequential File stage lets you read data from, or write data to, one or more flat files. The stage can have a single input link or a single output link, plus a single reject link.

2. How do you read and write a file in the Sequential File stage? A.
Writing to a file:
- In the Input Link Properties tab, specify the pathname of the file being written to (repeat this for writing to multiple files). The other properties all have default values, which you can change as required.
- In the Input Link Format tab, specify format details for the file(s) you are writing to, or accept the defaults (variable-length columns enclosed in double quotes and delimited by commas, rows delimited with UNIX newlines).
- Ensure column metadata has been specified for the file(s); this can be supplied via a schema file if required.
Reading from a file:
- In the Output Link Properties tab, set the Read Method to read specific files (the default) or all files whose names fit a pattern. If you are reading specific files, specify the pathname of the file being read (repeat this for reading multiple files). If you are reading files that fit a pattern, specify the name pattern to match. Accept the defaults for the remaining options or specify new settings (the available options depend on the Read Method).
- In the Output Link Format tab, specify format details for the file(s) you are reading, or accept the defaults (variable-length columns enclosed in double quotes and delimited by commas, rows delimited with UNIX newlines).
- Ensure column metadata has been specified for the file(s); this can be supplied via a schema file if required.
3. What are the read methods in the Sequential File stage? A. Specific File(s) and File Pattern.
4. What are Type Defaults? A. Type Defaults are properties that apply to all columns of a specific data type unless specifically overridden at the column level. They are divided into a number of subgroups according to data type; for example, the TIME subgroup includes Format string and Is midnight seconds, and the TIMESTAMP subgroup has its own Format string.
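Since questions 2 and 4 both mention supplying column metadata and format details through a schema file, here is a minimal sketch of one (for example a file saved as /schemas/customers.schema). The path, column names, and types are hypothetical; the record-level properties mirror the defaults described above (comma delimiter, double quotes, newline row terminator).

    record {record_delim='\n', delim=',', quote=double, final_delim=end}
    (
        CUST_ID:    int32;
        CUST_NAME:  string[max=50];
        START_DATE: date;
        BALANCE:    decimal[10,2];
    )

In the stage you would then point the Schema File property at this file (with runtime column propagation enabled) instead of entering the columns by hand.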

5. In my source .csv file the date is in the format ddmonyy. I tried viewing the file through the Sequential File stage by giving the column the data types Date, Timestamp, and Numeric, but I am not able to view it. What data type should I set so that I can view it? Can it be Varchar? A. The Sequential File stage does not understand nulls by default, so you have to handle them. In the stage's Columns tab, double-click to the left of the required column name to open its properties; there you will find the Nullable setting. Set a Null field value for that column, which can be a valid value or an empty string.
6. How do you read 5 text files in one Sequential File stage? A. Use the File Pattern read method.
7. I have a sequential file with 2,000 million records which is already sorted on key1, key2, and key3. I am reading this file using a Sequential File stage running in sequential mode and then hash partitioning the data on key1 after reading: Seq stage -> Copy stage (input hash on key1) -> Data Set stage. From the test I did, the data going into the data set is sorted within each partition. Does the sorted data remain sorted? A. Because the file is read in sequential mode and then repartitioned, the data will remain sorted within a given partition, due to the first-in, first-out pipeline processing nature of the stage and because the data is moving from a sequential to a parallel stage.
8. I am writing to a pipe-delimited file. The data looks like '1234'|'NY'. This is the warning message I am getting in the log: APT_CombinedOperatorController(1),0: Field 'Affis' from input dataset '0' is NULL. Record dropped. I am actually doing a NullToValue in the Transformer, and while reading the sequential file I have set the null value in the format to the empty string ''. A. In the server job's Sequential File output, open the edit column metadata (double-click on the column number); there is a Null field value property there, so enter spaces based on the length of the column. Do the same in the parallel source Sequential File stage.
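To make the combination in question 8 concrete, here is a minimal sketch. The link name lnk_in is hypothetical; the column name Affis comes from the warning in the post, and NullToValue is the function the poster says they are already using.

    Transformer derivation for the output column Affis:
        NullToValue(lnk_in.Affis, '')

    Target Sequential File stage -> Format tab -> Field defaults:
        Null field value = ''

With the derivation guaranteeing a non-null value and the format property telling the stage how to represent any null that still arrives, the "Record dropped" warning should no longer occur.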

9. I am capturing database rejections by adding a reject link to an Oracle Connector stage. At the other end I am restructuring the same rejects as tab-separated values through a Column Export stage and writing them into a single column. My job is RCP enabled. Since a few columns have the Decimal data type in the Oracle database, on any database rejection the decimal values arrive with leading and trailing zeros. Is there any property in the Column Export stage that removes the leading and trailing zeros and keeps only the actual decimal value? I am currently using the default decimal precision and scale [38,10]. For example, if I insert the value "44.00" into the database and the row is rejected due to a key violation, I get the value "0000000000000000000000000044.0000000000". A. The actual tables load correctly; the problem appears only when a record is rejected and all of the column data is written into a single column of the reject table as tab-separated values through the Column Export stage. In that tab-separated column the leading and trailing zeros are carried through for the decimal data.
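The thread does not name a Column Export property that strips the padding; one option (an assumption on my part, not from the post) is to convert the decimal columns to strings before the Column Export stage, using the same suppress_zero conversion shown later in question 19. The column name AMOUNT is hypothetical, and with RCP enabled you would need to declare such columns explicitly on that link.

    Modify stage specification (or the equivalent DecimalToString(col, "suppress_zero") in a Transformer):
        AMOUNT:string[max=41] = string_from_decimal[suppress_zero](AMOUNT)

suppress_zero drops the leading and trailing zeros, so "44.00" would be exported as "44" rather than the fully padded [38,10] form.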

10. I am using an Excel sheet saved as .csv as a source file, containing 30 records, and reading it with a Sequential File stage. My job reads only 15 records from this file and then throws some warnings. I physically checked the .csv file and can't find any problem after record 15, and I can't understand why this is happening. Any help is highly appreciated. The job log follows (DataStage Report - Summary Log for job: first; produced on 3/21/2011 10:52:17 AM; project: test; host system: MAC01; items 1 - 29, sorted on date, entries filtered):

    10:49:22 AM  3/21/2011  Reset    Log cleared by user
    10:49:44 AM  3/21/2011  Control  Starting Job first.
    10:49:50 AM  3/21/2011  Info     Environment variable settings: (...)
    10:49:50 AM  3/21/2011  Info     Parallel job initiated
    10:49:51 AM  3/21/2011  Info     main_program: IBM WebSphere DataStage Enterprise Edition 8.0.1.4458 (...)
    10:49:52 AM  3/21/2011  Info     main_program: orchgeneral: loaded (...)
    10:49:57 AM  3/21/2011  Info     main_program: APT configuration file: C:/IBM/InformationServer/Server/Configurations/default.apt (...)

    10:49:57 AM  3/21/2011  Warning  Input,0: Delimiter for field "IMPACTED_USERS___FTA" not found; input: {0}, at offset: 422
    10:49:57 AM  3/21/2011  Warning  Input,0: Import warning at record 15.
    10:49:57 AM  3/21/2011  Warning  Input,0: Import unsuccessful at record 15.
    10:49:57 AM  3/21/2011  Warning  Input,0: Delimiter for field "IMPACTED_USERS___FTA" not found; input: {0}, at offset: 396
    10:49:57 AM  3/21/2011  Warning  Input,0: Import warning at record 16.
    10:49:57 AM  3/21/2011  Warning  Input,0: Import unsuccessful at record 16.
    10:49:57 AM  3/21/2011  Warning  Input,0: Delimiter for field "IMPACTED_USERS___FTA" not found; input: {0}, at offset: 406
    10:49:57 AM  3/21/2011  Warning  Input,0: Import warning at record 17.
    10:49:57 AM  3/21/2011  Warning  Input,0: Import unsuccessful at record 17.
    10:49:57 AM  3/21/2011  Warning  Input,0: Delimiter for field "IMPACTED_USERS___FTA" not found; input: {0}, at offset: 346
    10:49:57 AM  3/21/2011  Warning  Input,0: Import warning at record 18.
    10:49:57 AM  3/21/2011  Warning  Input,0: Import unsuccessful at record 18.
    10:49:57 AM  3/21/2011  Warning  Input,0: Delimiter for field "IMPACTED_USERS___FTA" not found; input: {0}, at offset: 350
    10:49:57 AM  3/21/2011  Warning  Input,0: Import warning at record 19.
    10:49:57 AM  3/21/2011  Warning  Input,0: Import unsuccessful at record 19.
    10:49:57 AM  3/21/2011  Info     Input,0: No further reports will be generated from this partition until a successful import.
    10:49:57 AM  3/21/2011  Info     Input,0: Import complete; 15 records imported successfully, 19 rejected.
    10:49:57 AM  3/21/2011  Info     Sequential_File_35,0: Export complete; 15 records exported successfully, 0 rejected.
    10:49:57 AM  3/21/2011  Info     main_program: Step execution finished with status = OK.
    10:49:57 AM  3/21/2011  Info     main_program: Startup time, 0:05; production run time, 0:00.
    10:49:57 AM  3/21/2011  Info     Parallel job reports successful completion

    10:49:58 AM  3/21/2011  Control  Finished Job first.

End of report.
A. If you are looking at the file in Excel, you won't see the problem. Open the CSV file in a text editor and count the commas. I have found that it is generally a bad idea to save an Excel sheet as a CSV and expect it to work in DataStage. I have written an Excel VBA macro that processes Excel sheets and writes them out as CSV files, with optional formatting (e.g. date formats, zero padding) that I use to generate CSV files for DataStage job inputs.
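As a complement to checking the file in a text editor, the failing rows can also be captured inside DataStage itself. This is a sketch of the standard Sequential File stage option, not something suggested in the original thread:

    Source Sequential File stage -> Output -> Properties:
        Reject Mode = Output
    Reject link -> Peek stage (or another flat file)

With Reject Mode set to Output, the 19 records that fail import are sent down the reject link as raw records, so you can see exactly which field is missing its delimiter.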

11. How do you rerun a job if it failed after loading some records? For example, I have 1,000 records in the source and the job aborted after loading 600; how do I load the remaining 400 records? A. Normally, if a job aborts you may find no records at all in the target (a record count of zero), so first verify whether the 600 records were actually committed. Then, in whichever output database stage you use, choose an upsert write method (Update then Insert) in place of a simple insert, and do not forget to mark the KEY fields. On the rerun it checks each key: if the row exists it updates the remaining fields (which are unchanged, since the source file is the same), and otherwise it inserts the row, so effectively only the missing records are added.
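For illustration, this is roughly what Update then Insert amounts to for a hypothetical target table TGT_ORDERS keyed on ORDER_ID; the connector generates equivalent parameterised statements when the write mode is Update then insert and ORDER_ID is flagged as a key column:

    UPDATE TGT_ORDERS
       SET CUST_NAME = ?, AMOUNT = ?
     WHERE ORDER_ID = ?;

    INSERT INTO TGT_ORDERS (ORDER_ID, CUST_NAME, AMOUNT)
    VALUES (?, ?, ?);

Rows whose key already exists are simply refreshed with identical values; rows whose update matches nothing fall through to the insert, which is what reloads the missing 400 records.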

12. I am faced with the need to parse a text file that has no record separator. The file is a variable-length file with comma-separated fields, but the records are not separated, for example: "Row1_field1",row1_field2,"row1_field3",row1_field4"row2_field1",row2_field2,"row2_field3",row2_field4. A. You do not need to parse it yourself; set up the format properties in the Sequential File stage and let the stage do its job: Record type = implicit, Delimiter = comma, Quote = double.
13. DataStage can't handle timestamps saved with fractional seconds, for example 2007-08-16-00.00.00.000000. When I try to view the data in the Sequential File stage, the following error occurs: Field "DATE_DOLP" delimiter not seen, at offset: 288, where DATE_DOLP is the field corresponding to the timestamp. DataStage expects the field to finish after the 00.00.00 position and the next field to start.

When I try to define the timestamp format %yyyy-%mm-%dd-%hh.%nn.%ss.%SSSSSS, the following error occurs: At field "DATE_DOLP": When validating import/export function: "timestamp_format" property value (%yyyy-%mm-%dd-%hh.%nn.%ss.%SSSSSS) is not a valid format. A. The timestamp format should probably be %yyyy-%mm-%dd-%hh.%nn.%ss.6, or define the column as a string in the file and convert it inside the job.
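A short sketch of both options from the answer; the link name lnk_in is hypothetical, and the format string is the one given above:

    Column-level format (Edit Row on DATE_DOLP, or the Format tab):
        Format string = %yyyy-%mm-%dd-%hh.%nn.%ss.6

    Or read DATE_DOLP as string[26] and convert in a Transformer:
        StringToTimestamp(lnk_in.DATE_DOLP, "%yyyy-%mm-%dd-%hh.%nn.%ss.6")

If you keep the column as a timestamp, remember to set its Extended attribute to Microseconds so the metadata allows the six fractional digits.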

14. My job design is: Data Set -----> Transformer -----> ODBC Connector. The job has 169,000 records and I am using Update then Insert to load the data, which is taking 45 minutes. I am using hash partitioning in the Transformer and ODBC stages. Can anyone suggest how to improve the performance? A. You can prove the location of the bottleneck by writing to a sequential file rather than to Oracle. If that is no faster, the problem is in DataStage; if it is markedly faster, the problem is in the Oracle database.
15. I have one file that contains 9 records (ExceptionDb). If I open the file in .csv format I can see 9 records. Then I designed a job using a Sequential File stage as the source, simply loading into a target. While reading the data it shows 26 records, although the file actually contains only 9, so the Sequential File stage is not reading it properly. In the metadata I have a description column (Comment_c); because of this field the rows are splitting into multiple records. My source columns are Id, Org_Name, Website, Comment_c, Client_Id, ....... The Comment_c field has a size of 3200 and holds a comment about the client. If the user pressed Enter while typing the comment, the Sequential File stage treats that embedded newline as a record terminator, and everything from that point onwards is read as another line, so even though my source contains 9 records it shows more than 9.

A. Use a server job: the Sequential File stage there has a 'Contains Terminators' option that the parallel one lacks for some reason, and it will read your 9 records properly.
16. I have a job design as follows: Sequential File --> Aggregator stage --> Sequential File. The source file has 2 columns, which are read through the Sequential File stage as follows.

GroupNO -- varchar(10)
GDate -- varchar(12)
In the Aggregator stage I used "GroupNO" as the group, "Calculation" as the aggregation type, "GDate" as the column for calculation, and "GDate" as the minimum-value output column. I am getting an improper result (something like "2" in GDate) and the job throws the warnings below:

    Aggregator_File: When checking operator: When binding input interface field "GDate" to field "Gdate": Implicit conversion from source type "string[max=12]" to result type "dfloat": Converting string to number.
    Aggregator_File: When checking operator: When binding output interface field "GDate" to field "GDate": Implicit conversion from source type "dfloat" to result type "string[max=12]": Converting number to string.
    Aggregator_File: When checking operator: When binding output interface field "Gdate" to field "Gdate": Converting a nullable source to a non-nullable result; a fatal runtime error could occur; use the modify operator to specify a value to which the null should be converted.

How do I handle this situation? Are any additional stages required in between? I need to pick the minimum-date record from each group. The test data is:
GroupNO, GDate
200, 2009-03-02
100, 2010-01-01
100, 2009-02-02
200, 2008-03-02
and the output should be:
GroupNO, GDate
100, 2009-02-02
200, 2008-03-02
A. The input we have is:
GroupNO, GDate
200, 2009-03-02
100, 2010-01-01
100, 2009-02-02
200, 2008-03-02
The output we need is:
GroupNO, GDate
100, 2009-02-02

200, 2008-03-02
One way of doing this is with a Remove Duplicates stage. The input to the Remove Duplicates stage should be sorted on GroupNO and GDate in ascending order and should be hash partitioned on GroupNO only; this can be achieved either with a link sort on the Remove Duplicates stage or with a separate Sort stage. Then deduplicate the data on GroupNO and keep the first record in each group. This gives the desired result without needing an Aggregator stage; the settings are sketched below.
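A sketch of the stage settings just described, using the column names from the post:

    Sort stage (or link sort on the Remove Duplicates input), hash partitioned on GroupNO:
        Sorting Keys: GroupNO (Ascending), GDate (Ascending)

    Remove Duplicates stage:
        Key = GroupNO
        Duplicate To Retain = First

Because the rows in each GroupNO partition arrive in ascending GDate order, retaining the first duplicate keeps the minimum date per group.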

17. I have come across a rare kind of requirement that I have not worked on until now. The source is a sequential file (pipe delimited) and the data looks like abc|xyz|a,b,c,d|123, but I need to produce a file with 7 columns for the record above, like this: abc|xyz|a|b|c|d|123. One more thing: the third column in the source data may vary, so I have to write records to the target based on the number of delimiters. The target is again a sequential file. A. If the ONLY thing you need to do is replace the commas "," with pipes "|", read the entire record into one column and use Convert() in a Transformer, or use awk or sed as a filter to do the replacement, or process the file outside DataStage altogether using awk, sed, or perl (as was suggested in the earlier post). If you need to do additional processing to or with the "a,b,c,d[,e,f]" data, you can read those values into a column by themselves and use the Convert() function without breaking them out into separate columns (keep them in a single column), then write the file back out at the end of the processing stream. A derivation sketch follows.
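A minimal sketch of the Convert() approach; the link and column names are hypothetical, and the whole input record is read as one varchar column:

    Transformer derivation for the single output column:
        Convert(',', '|', lnk_in.rec_line)

Convert replaces every occurrence of the characters in the first argument with the corresponding characters in the second, so the comma-separated third field becomes pipe-delimited regardless of how many values it contains.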

18. I have a requirement to produce a file in the following format. All the incoming data is "INPUT_DATA", and only one record comes at a time from the source; anything other than "INPUT_DATA" is a literal. Is there any function to break the line of the record to get the following format?
Required output format
==================
Attention Partner #"INPUT_DATA"
The Following order was not in stock at Warehouse.
Warehouse Location# "INPUT_DATA"
Reason for Cancellation, OUT OF STOCK. "INPUT_DATA"
Product# "INPUT_DATA"
PO# "INPUT_DATA"
Order Date | Prod Number | Ordered Quantity
"INPUT_DATA" "INPUT_DATA" "INPUT_DATA"
A. Instead of reading the input as separate columns, I read the fixed-width file as a single varchar [length 65] and wrote it into a single varchar field [length not specified] along with the literals, but did not get the output format I was looking for. The output will be sent as an email attachment, so it has to be in that format. Input read: iRecVarchar, with length 65.
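The thread leaves this unresolved; one possible approach (my assumption, not from the post) is to build the whole report block as one string with embedded line terminators. In a server-job Transformer (BASIC) that could look like the sketch below, where In.PartnerID and In.WarehouseLoc are hypothetical columns and Char(10) is a line feed; the target Sequential File stage would need quoting turned off so the text is written verbatim.

    Out.ReportLine = 'Attention Partner #"' : In.PartnerID : '"' : Char(10) :
                     'The Following order was not in stock at Warehouse.' : Char(10) :
                     'Warehouse Location# "' : In.WarehouseLoc : '"'

The same idea works in a parallel job if an equivalent newline constant can be introduced, but I have not verified a parallel function for that here.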

19. We are extracting data from DB2, converting it to string with a Modify stage, and finally loading it into a sequential file.
Job design: DB2 stage --> Modify stage --> Sequential File stage
Source column: Cust_amount, Decimal(20,2)
Target column: Cust_amount, Varchar(25)
Modify stage specification:
    Cust_amount:string[max=25] = string_from_decimal[suppress_zero](Cust_amount)
Sample data: the sum of the column at the database level is 1223454545903.30, but the sum at the file level (after loading into the sequential file) is 1223454545901.50. We observe the loss at the column-sum level, not at the row level; we expected to get 1223454545903.30 after loading into the sequential file, but we are getting a different value.

We also tried the following option: DB2 stage --> Sequential File stage (we still see some difference). A. The check was done by first loading into a comma-separated file, then moving the file to Windows with WinSCP, opening it in Excel, and summing all the values (cell format: Number, -1234.10, with 2 decimal places).

20. What are the basic differences between File Sets, Data Sets, and a normal Sequential File?

A. A sequential file can only be accessed on one node; in general it can only be accessed sequentially by a single process, so the benefit of parallelism is lost. A data set preserves partitioning: it stores the data on the processing nodes, so when you read from a data set you do not have to repartition the data. You cannot use the Unix cp or rm commands to copy or delete a data set, because DataStage represents a single data set with multiple files; using rm simply removes the descriptor file, leaving the much larger data files behind. A file set sits in between: like a data set it stores its data in multiple files across nodes behind a descriptor (.fs) file, but the data files are written in a readable flat-file format rather than the internal data set format.
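To manage data sets from the command line, DataStage ships the orchadmin utility, which understands the descriptor-plus-data-files layout described above. The exact option set varies by version, so treat these as indicative and check orchadmin help on your installation; mydata.ds is a hypothetical descriptor path.

    orchadmin describe mydata.ds      (report on the data set behind the descriptor)
    orchadmin rm mydata.ds            (delete the descriptor and all of its data files)

The utility needs the usual parallel-engine environment (APT_CONFIG_FILE and so on) to resolve the data files on each node.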

21. Is there a way to get more than the "RAW" data from the Sequential File stage reject link? A. You will notice that you cannot influence the metadata on this link, so the answer is no as far as the Sequential File stage is concerned: the rows on this link simply do not match the metadata.

22. I am confused about the effect of defining multiple files as targets in the Sequential File stage. My current design is DB2 -> Sort -> Sequential File, with this data:
NAME    AGE
ADARSH  23
AKSHAY  25
ADARSH  24
ADARSH  22
ANUP    22
LAMBA   34
ADARSH  27
adarsh  12
In the Sort stage I am sorting on NAME (with some additional properties). In the Sequential File stage, under Properties -> Target, I define two files: File=/xyz/a1.txt and File=/xyz/a2.txt. The outputs are as follows:
a1: ADARSH,23  ADARSH,24  ADARSH,22  ADARSH,27  adarsh,12
a2: AKSHAY,25  ANUP,22  LAMBA,34
Any explanation as to why this is happening?
A. It's because, intentionally or otherwise, you designed your job to do that. You have not provided anything like enough information for anyone to diagnose this. For example: how many processing nodes are defined in the configuration file under which the job ran? Have you tested it with a single processing node? Where do you specify the sorting, and exactly what sorting? Exactly what partitioning do you specify, and where? What null handling/representation do you specify in the Sequential File stage? Computers are dumb; they do exactly what you tell them to do, so you need some skill in expressing your requirements exactly, in a form the computer can use. In this case the Sequential File target was set to Auto partitioning; setting it to Entire instead gives the required result.

23. I have duplicate data in a sequential file. There is a timestamp field in the file, and I want to extract the most recent data based on that timestamp. A. Use a Sort stage with two keys: 1. some unique field or ID, 2. the timestamp (descending); then retain the first record in each group.
24. Can I create a sequential file on the fly? A. You could make the sequential file name (or pathname) a job parameter and use that parameter in the Sequential File stage.
25. I have a requirement where I need to create a sequential file on the fly, but the name of the file should be a concatenation of a few columns, like SequentialFileName = SSN : DateTime. A. Create a routine and call it in the job when necessary; I found that the easiest way to do it. One common pattern is sketched below.
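A sketch of one common pattern for question 25 (the parameter names and path are hypothetical): build the name from job parameters that a routine or calling sequence fills in at run time, and reference them in the File property.

    Sequential File stage -> Target -> File:
        /data/out/extract_#pSSN#_#pRunTimestamp#.txt

    Job Activity in the calling sequence:
        pSSN          = <value derived by a routine or upstream activity>
        pRunTimestamp = <value derived by a routine or upstream activity>

DataStage substitutes #parameter# references in stage properties at run time, which is what makes the on-the-fly name possible.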

26. What is the main advantage of using a data set over a sequential file? A. Parallelism.
27. I have a sequential file with 2 columns and 10 rows. I wish to call a job once for each row and pass the column values as parameters to that job. If possible, please include brief detail about the procedure or stages to use. A. You can use the looping activities in a job sequence to loop 10 times and pass the value of n (1-10) to the job, so that you can retrieve the nth row on each iteration. Pass this value on either through the user status area, or by writing it to an external file and reading it back as command output through an Execute Command activity. A sequence sketch follows.
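A sketch of the sequence wiring for question 27; the activity and parameter names are hypothetical:

    StartLoop_Activity:   From = 1, Step = 1, To = 10
    Job_Activity:         parameter pRowNum = StartLoop_Activity.$Counter
    EndLoop_Activity:     loop link back to StartLoop_Activity

Inside the called job (or in an Execute Command activity before it), use #pRowNum# to pick the nth row of the file and hand its column values to the job as parameters.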

28. I have a problem with a stage output in a job that has many columns with null values. I want to write this data to a sequential file, and I need a way to handle nulls for every column, not just one; there are many columns that might be null and I don't want to edit the string properties for each of them. Is there any way to say: if any column in any row is null, write a special character for that value into the sequential file, otherwise just write the value? Or do you have any routine or function that implements this, or any advice?
A.

You can specify how null is to be represented as a record level property. You can override that at the individual column level using Edit Row (right click on that column in the Columns grid to expose the menu from which you can open the Edit Row dialog).
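Concretely, the record-level property lives on the target Sequential File stage's Format tab; the marker value here is just an example:

    Format tab -> Field defaults:
        Null field value = 'NULL'

    Per-column override: Columns grid -> right-click the column -> Edit Row -> Null field value

Every nullable column that carries a null is then written as the chosen marker, without editing each column's properties individually.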
