
A comparative analysis between SQL*LOADER and UTL_FILE utility.
Anunaya Shrivastava
Computech Corporation
Anunaya Shrivastava has been working with Oracle technology for more than five years. He is currently working with Oracle Applications. He works for Computech Answers, a Detroit-based company with branches all across the US. You can reach Anunaya at anunaya@hotmail.com.

EXECUTIVE SUMMARY

In implementing new systems we come across the problem of importing "alien" data. This data may come from a legacy system or from an on-going system. It is transported via extract files from the legacy system to the Oracle system. The usual gateway to Oracle for this data is SQL Loader, which loads the data into tables under the direction of a control file.

Typically, the older systems do not have very normalized data, nor have they been operating with fully implemented database constraints. The lack of constraints over the years in a legacy system can let bad data creep in. Therefore, while bringing external data into an Oracle system we need a refined set of checks and balances to ensure that we get good data. This requires a lot of programmatic control in the process of data loading.

The approach applied in the case of SQL Loader is as follows:

1. Load the data into temporary tables via SQL Loader and a control file, making the data native to ORACLE.
2. Write a PL/SQL program to do the processing.
3. Load the data into live tables.

This approach has a lot of dependencies as well as a strong lack of integration of steps and programmatic control. To overcome this, we have analyzed another facility that has been available from Oracle release 7.3.x onwards. It is called the UTL_FILE package. With some creative use of this package we can achieve whatever SQL LOADER offers and, in addition to that, do some high-level validation and complex data loading. In the following discussion the two tools are studied, with an example case for UTL_FILE.

A BRIEF OVERVIEW OF SQL*Loader

SQL Loader is a server utility for loading data from external data files into an Oracle database.
The basic advantage of SQL Loader is that it handles simple loads and loads data fast. It can load data in myriad data formats, perform elementary filtering, load data into multiple tables, and create one logical record from one or more physical records. It creates a detailed log file, a bad file that contains rejected records, and a discard file to hold the records that are selectively not loaded. The tool is executed from the command line; a username and password and the control file name and location are required to run it. (See Figure 1.)

Typical syntax of a control file for loading data with SQL Loader:

    1.  LOAD DATA
    2.  INFILE file_name / *
    3.  INSERT/APPEND/REPLACE/TRUNCATE
    4.  CONTINUEIF THIS (1) = character
    5.  INTO TABLE EMP
    6.  ( empid       SEQUENCE (MAX,1),
    7.    first_name  POSITION (01:30) CHAR,
    8.    last_name   POSITION (31:60) CHAR,
    9.    hire_date   POSITION (61:72) DATE,
    10.   emp_no      POSITION (73:85) CHAR )
    11. INTO TABLE DEPT
    12. WHEN DEPTNO != 'GH'
    13. ( emp_no      POSITION (73:85) CHAR,
    14.   dept_no     POSITION (86:95) INTEGER EXTERNAL )
    15. BEGINDATA     -- if you are using * on line 2.

Table A.

    Line #    Explanation
    1         This line is the starting syntax for the control file.
    2         This line gives the location of an input file. If the '*' option is chosen, the data is appended in the control file itself.
    3         This line contains the DML operation we want to perform with SQL Loader.
    4         The character (for instance, '*' or another marker character) specifies that the next physical record should be appended, until a record without that character is found, to form a single logical record.
    5 & 11    These lines provide the table names where the records need to be inserted.
    12        This line filters (does not load) the records where deptno = GH.
    15        This line is used if one is using the '*' option on line # 2.

FIGURE 1. Schematic representation of SQL Loader operations for data loading. [Diagram labels: Datafile Extracts, Control File, SQL Loader, Logfile, Bad File, rdbms.]

OUTPUT FILES. Three types of file are written during the execution of a SQL Loader process. They are:

LOG FILE - The log file gives the status of affairs as the SQL Loader program is executing and contains a summary of the data load execution. It also has table load and summary statistics.

BAD FILE - The bad file contains records that fail the format specified in the control file or are rejected by the database. The bad file is written in the same format as the data file, so that rejected records can be reloaded with the same control file.

DISCARD FILE - The discard file is created only when specified or needed. It holds the records that failed the filter criteria; hence these records are discarded and not inserted. They are written in the same format as the data file.

EXECUTION - The command to execute SQL Loader in a UNIX shell script is as follows. The files are supplied as shell script variables, which makes the script more generic.

    sqlldr userid=${LOGIN_CONV} control=${CTRL_FILE} log=${LOG_FILE} \
           discard=${DIS_FILE} bad=${BAD_FILE} data=${DAT_FILE} errors=1000000

    # code to check the success of the invoking of sqlloader
    if [ $? -ne 0 ]
    then
        echo "Error ! The sql loader call failed"
        exit 1
    fi

For a complex load we can use trigger logic to bring in PL/SQL programmatic control. A point to note here is that if you want a very high-speed load you can use the direct path method, which is faster than the conventional path method. However, the clean-up time required later on by the direct path load method offsets the benefits obtained from the accelerated load.

UTL_FILE

PL/SQL does not have text file input/output capabilities per se but acquires them via the UTL_FILE package. It provides a rudimentary utility for reading (as well as writing) files from within a PL/SQL program. The general outline of the process of reading a file is shown in Figure 2. The lines in the file are read sequentially, and this affects the performance of the program.

The UTL_FILE package can be wrapped in a PL/SQL program, and since this package is integrated with PL/SQL it gives us tremendous ability for flexing our "programming muscles." (See Figure 3.) Some procedures and functions can be added to this wrapper program that serve as handy "tools" for doing normal file reading operations. With this approach we can achieve whatever SQL Loader can do and much more.

The security mechanism for UTL_FILE is achieved by defining a parameter called utl_file_dir in the INIT<SID>.ora file. The directories that UTL_FILE can read from and write to need to carry permissions for the Oracle instance owner and for the user running the package.

Figure 2. Schematic representation of file operations with UTL_FILE. [Diagram steps: declare a file handle; declare a variable to hold a line; open the file with FOPEN; get a line into the line variable; close the file.]
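The steps in Figure 2 can be sketched as a small anonymous PL/SQL block. This is only an illustration under stated assumptions, not the article's loader: the file name emp_extract.dat is hypothetical, the directory is the one used as an example for utl_file_dir in this article, and the block simply counts the lines it reads.

```plsql
DECLARE
    -- Assumption: this directory is listed in utl_file_dir.
    l_dir    VARCHAR2(100) := '/home/oracle/inbound';
    -- Hypothetical file name, used for illustration only.
    l_name   VARCHAR2(100) := 'emp_extract.dat';
    l_file   UTL_FILE.FILE_TYPE;          -- the file handle
    l_line   VARCHAR2(1023);              -- variable to hold one line
    l_count  NUMBER := 0;
BEGIN
    -- Open the file in read mode.
    l_file := UTL_FILE.FOPEN(l_dir, l_name, 'r');

    LOOP
        BEGIN
            -- GET_LINE raises NO_DATA_FOUND at end of file.
            UTL_FILE.GET_LINE(l_file, l_line);
            l_count := l_count + 1;
        EXCEPTION
            WHEN NO_DATA_FOUND THEN
                EXIT;
        END;
    END LOOP;

    -- Close the file.
    UTL_FILE.FCLOSE(l_file);
    DBMS_OUTPUT.PUT_LINE('Lines read: ' || l_count);
EXCEPTION
    WHEN OTHERS THEN
        -- Make sure the handle is released before re-raising.
        IF UTL_FILE.IS_OPEN(l_file) THEN
            UTL_FILE.FCLOSE(l_file);
        END IF;
        RAISE;
END;
/
```

The inner BEGIN ... EXCEPTION ... END block is what lets the loop end cleanly: the end of the file is signalled by an exception rather than by a return value, so it has to be caught inside the loop. Run it with SET SERVEROUTPUT ON to see the line count.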

FIGURE 3. UTL_FILE wrapper (tool programs for processing records and inserting into database). [Diagram labels: UTL_FILE inside a wrapper program, Processing, tables.]

A NOTE ON utl_file_dir

This is a very important parameter that needs to be set before you can use the UTL_FILE package. It controls the security of the UTL_FILE package. This parameter should be added to the init<SID>.ora file. If one wants to write to, or read from, numerous directories, all of them should be listed in this parameter, separated by commas or spaces. For example:

    utl_file_dir = /home/oracle/inbound

IMPORTANT - The Oracle instance must be brought down and restarted for changes to the init<SID>.ora file to take effect.

CASE STUDY OF UTL_FILE

A case study is developed on UTL_FILE to perform functions similar to those performed by the SQL Loader tool. A set of PL/SQL procedures and functions is developed that can be used for performing the actions that SQL Loader can perform, and much more. The focus here is on providing technical insight into a schematic approach for loading data via UTL_FILE (figure 4) rather than on writing a PL/SQL package per se.

FILE LAYOUT

    o-empnum        pic x(11)
    o-name          pic x(60)
    o-sex           pic x(1)
    o-ssn           pic x(11)
    o-hiredate      pic x(9)
    o-termdate      pic x(9)
    o-addressline1  pic x(35)
    o-addressline2  pic x(35)
    o-city          pic x(35)
    o-county        pic x(30)
    o-state         pic x(2)
    o-zipcode       pic x(10)
    o-dept          pic x(3)

FILE FORMAT (FIXED)


00000000001ALAN BAXTER, K Luca !"#$606 % % % % % % % % 12$ M 228967890821011 701 Kentucky Avenue Toledo

FIGURE 4. FILE LAYOUTS AND FORMAT

CASE PROBLEM/OPPORTUNITY CONSTRUCT WITH A SOLUTION STRATEGY

The case problem is a "snapshot" of a typical conversion operation where you have to load data into tables. A data extract file is taken as input. The layout of the file and the positioning of the records is shown in figure 4. On the database side this data needs to be loaded into two tables. In addition, there is a table that is used for lookup and an error table that holds the disqualified records (figure 5). The mapping of the two systems (file and database) is shown in figure 6. The tool programs and the main program are developed calling the UTL_FILE package. An approach towards handling and reloading the error records is also included. These programs are included in the following discussion. The complete picture of the approach is shown in figure 7.

There are lines in the code where I have used a shorthand comment outlining the action to be performed, for instance "-- Insert records". This has been done to avoid writing out the entire insert statement; these shorthands are self-explanatory and keep the focus on the important points.

The approach can be summarized in the following steps:

1. Analysis of the datafile and its structure.
2. Mapping of the data elements.
3. Description of the data objects storing transferred data.
4. Data validation and data massaging tool programs.
5. Main program using the tool programs for data load and writing records to error tables.
6. Error correction and reload of data.

    CREATE TABLE EMP_DETAILS (
        empno            VARCHAR2(11),
        fullname         VARCHAR2(150),
        firstname        VARCHAR2(30),
        lastname         VARCHAR2(30),
        middlenames      VARCHAR2(30),
        hiredate         DATE,
        terminationdate  DATE,
        sex              VARCHAR2(5),
        ssn              VARCHAR2(30),
        costcenter       VARCHAR2(10));

    CREATE TABLE EMP_ADDRESSES (
        empno            VARCHAR2(11),
        addressline1     VARCHAR2(50),
        addressline2     VARCHAR2(50),
        city             VARCHAR2(50),
        county           VARCHAR2(50),
        state            VARCHAR2(30),
        zipcode          VARCHAR2(15),
        country          VARCHAR2(50));

    CREATE TABLE EMP_DEPT_LOOKUPS (
        value_id         NUMBER,
        dept_no          NUMBER,
        costcenter       VARCHAR2(20));

    CREATE TABLE EMP_ERRORS (
        empno            VARCHAR2(11),
        fullname         VARCHAR2(150),
        firstname        VARCHAR2(30),
        lastname         VARCHAR2(30),
        middlenames      VARCHAR2(30),
        hiredate         DATE,
        terminationdate  DATE,
        sex              VARCHAR2(5),
        ssn              VARCHAR2(30),
        costcenter       VARCHAR2(10),
        addressline1     VARCHAR2(50),
        addressline2     VARCHAR2(50),
        city             VARCHAR2(50),
        county           VARCHAR2(50),
        state            VARCHAR2(30),
        zipcode          VARCHAR2(15),
        country          VARCHAR2(50),
        error_reason     VARCHAR2(200),
        status           VARCHAR2(15),
        tabname          VARCHAR2(20));

FIGURE 5. DESCRIPTION OF TABLES INVOLVED.

    File field          Table column

    EMP_DETAILS
    EMP_NUMBER          Empno
    NAME                Firstname, Lastname, Middlenames
    SEX                 Sex
    SSN                 Ssn
    HIREDATE            Hiredate
    TERMINATIONDATE     Termination
    DEPTNO              Costcenter

    EMP_ADDRESSES
    EMP_NUMBER          Empno
    ADDRESSLINE1        Addressline1
    ADDRESSLINE2        Addressline2
    CITY                City
    COUNTY              County
    STATE               State
    ZIPCODE             Zipcode

FIGURE 6. Data mapping for the tables.

FIGURE 7. Details of the program for loading data into tables. [Flowchart labels: MAIN PROGRAM; Read record; Parse & validation; criteria; YES: Data tables; NO: Error tables; Reload program.]

TOOL PROGRAMS

    /** This function returns the converted datatype for the extract data **/
    FUNCTION GET_ORACLE_DATE (ps_date IN VARCHAR2) RETURN DATE IS
        oracle_formatted_date  DATE;
        ps_date_format         VARCHAR2(8) := 'YYYYMMDD';  -- Peoplesoft date format
    BEGIN
        oracle_formatted_date :=
            TO_DATE(TO_NUMBER(LTRIM(RTRIM(ps_date))), ps_date_format);
        RETURN oracle_formatted_date;
    EXCEPTION
        WHEN OTHERS THEN
            -- Error handler: insert the record in the error table.
            RETURN NULL;  -- placeholder so that the function always returns a value
    END;

    /** Procedure to set employee name. This proc breaks the name into
        middle name, first name and last name. The format of the incoming
        name is lastname,firstname middlename, e.g. Shrivastava,Anunaya K.
        (the offsets below assume no space after the comma). The parts are
        returned through the OUT variables and can be read inside the
        calling program. **/
    PROCEDURE SET_EMP_NAME (name_in         IN     VARCHAR2,
                            firstname_out   IN OUT VARCHAR2,
                            middlename_out  IN OUT VARCHAR2,
                            lastname_out    IN OUT VARCHAR2) IS
    BEGIN
        -- Everything before the first comma is the last name.
        lastname_out := RTRIM(SUBSTR(name_in, 1, INSTR(name_in, ',', 1, 1) - 1));
        -- Everything after the first space that follows the comma is the middle name.
        middlename_out := RTRIM(SUBSTR(name_in,
            INSTR(name_in, ',', 1, 1)
            + INSTR(SUBSTR(name_in, INSTR(name_in, ',', 1, 1) + 1, 50), ' ', 1, 1)
            + 1));
        -- The text between the comma and that first space is the first name.
        firstname_out := RTRIM(SUBSTR(name_in,
            INSTR(name_in, ',', 1, 1) + 1,
            INSTR(SUBSTR(name_in, INSTR(name_in, ',', 1, 1) + 1, 50), ' ', 1, 1)));
    EXCEPTION
        WHEN OTHERS THEN
            -- Error handler: insert the record in the error table.
            NULL;  -- placeholder statement
    END;

    /* Function to get home center. This function does a dynamic lookup
       and returns the corresponding cost center for the dept. */
    FUNCTION GET_COST_CENTER (in_dept_no IN VARCHAR2) RETURN VARCHAR2 IS
        l_cost_center  VARCHAR2(10);
    BEGIN
        SELECT costcenter
          INTO l_cost_center
          FROM EMP_DEPT_LOOKUPS
         WHERE dept_no = in_dept_no;
        RETURN l_cost_center;
    EXCEPTION
        WHEN OTHERS THEN
            -- Error handler: insert the rec in the error table.
            RETURN NULL;  -- placeholder so that the function always returns a value
    END;
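To make the nested INSTR/SUBSTR arithmetic in SET_EMP_NAME easier to follow, here is a small standalone sketch that applies the same three expressions to one sample value. It is an illustration only: the local variable names are hypothetical, and the sample name is written without a space after the comma, which is what the offsets assume.

```plsql
DECLARE
    -- Sample value in the form lastname,firstname middlename.
    l_name    VARCHAR2(60) := 'Shrivastava,Anunaya K.';
    l_comma   NUMBER;   -- position of the comma
    l_space   NUMBER;   -- position of the first space, counted from after the comma
    l_first   VARCHAR2(30);
    l_middle  VARCHAR2(30);
    l_last    VARCHAR2(30);
BEGIN
    l_comma := INSTR(l_name, ',', 1, 1);                               -- 12
    l_space := INSTR(SUBSTR(l_name, l_comma + 1, 50), ' ', 1, 1);      -- 8

    l_last   := RTRIM(SUBSTR(l_name, 1, l_comma - 1));
    l_first  := RTRIM(SUBSTR(l_name, l_comma + 1, l_space));
    l_middle := RTRIM(SUBSTR(l_name, l_comma + l_space + 1));

    DBMS_OUTPUT.PUT_LINE('last='    || l_last   ||
                         ' first='  || l_first  ||
                         ' middle=' || l_middle);
    -- prints: last=Shrivastava first=Anunaya middle=K.
END;
/
```

Two limitations follow directly from the offsets. If the name has a space right after the comma, l_space is 1 and the first name comes back empty. If there is no middle name (no space after the first name), l_space is 0 and the first name is also empty. A production version would probably need to handle both cases before trusting the split.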

    /** The error handler inserts the records into the error tables with a
        reason and the source table so that they can be reloaded after
        correction **/
    PROCEDURE error_handler (err_emp_rec  IN emp_details%ROWTYPE,
                             err_add_rec  IN emp_addresses%ROWTYPE,
                             err_msg      IN emp_errors.error_reason%TYPE,
                             err_table    IN emp_errors.tabname%TYPE) IS
    BEGIN
        -- Insert into emp_errors
        NULL;  -- placeholder statement
    END;

    /** This procedure parses the input string into various records and fields **/
    PROCEDURE PARSE_REC (v_rec_line  IN     VARCHAR2,
                         v_rec       IN OUT emp_details%ROWTYPE,
                         d_rec       IN OUT emp_addresses%ROWTYPE) IS
    BEGIN
        -- *** parse for details table ***
        v_rec.empno    := SUBSTR(v_rec_line, 1, 11);
        v_rec.fullname := SUBSTR(v_rec_line, 12, 60);
        v_rec.sex      := SUBSTR(v_rec_line, 73, 1);
        v_rec.ssn      := SUBSTR(v_rec_line, 74, 11);
        -- Translate the datatype and get it into the Oracle table; use the tool function
        v_rec.hiredate        := GET_ORACLE_DATE(SUBSTR(v_rec_line, 86, 9));
        v_rec.terminationdate := GET_ORACLE_DATE(SUBSTR(v_rec_line, 96, 9));
        -- Get the firstname, lastname and middle name (data translation with functions)
        set_emp_name(v_rec.fullname, v_rec.firstname, v_rec.middlenames, v_rec.lastname);
        -- Get the cost center
        v_rec.costcenter := GET_COST_CENTER(SUBSTR(v_rec_line, 262, 3));
        -- *** parse for address table ***
        d_rec.empno        := SUBSTR(v_rec_line, 1, 11);
        d_rec.addressline1 := SUBSTR(v_rec_line, 106, 35);
        d_rec.addressline2 := SUBSTR(v_rec_line, 142, 35);
        d_rec.city         := SUBSTR(v_rec_line, 178, 35);
        d_rec.county       := SUBSTR(v_rec_line, 214, 30);
        d_rec.state        := SUBSTR(v_rec_line, 245, 2);
        d_rec.zipcode      := SUBSTR(v_rec_line, 248, 10);
        d_rec.country      := SUBSTR(v_rec_line, 259, 3);
    EXCEPTION
        WHEN OTHERS THEN
            -- Call the error_handler procedure for the erring record and insert
            -- it into the error table (the 'BOTH' tag here is an assumption).
            error_handler(v_rec, d_rec, SQLERRM, 'BOTH');
    END;

    /* Insert into tables. Only one sample program is shown here; the same
       logic can be applied for other tables like emp_addresses and
       emp_errors. */
    PROCEDURE insert_emp_details (grec IN emp_details%ROWTYPE) AS
    BEGIN
        INSERT INTO EMP_DETAILS
            (empno, fullname, firstname, lastname, middlenames,
             hiredate, terminationdate, sex, ssn, costcenter)
        VALUES
            (grec.empno, grec.fullname, grec.firstname, grec.lastname, grec.middlenames,
             grec.hiredate, grec.terminationdate, grec.sex, grec.ssn, grec.costcenter);
    END;

MAIN PROGRAM

    PROCEDURE LOAD_DATA (loc IN VARCHAR2, file IN VARCHAR2) IS
        emp_det_rec  emp_details%ROWTYPE;
        emp_add_rec  emp_addresses%ROWTYPE;
        -- UTL_FILE related variables
        file_handle  UTL_FILE.FILE_TYPE;
        data_line    VARCHAR2(1023);
    BEGIN
        -- Open the file in read mode
        file_handle := UTL_FILE.FOPEN(loc, file, 'R');
        -- Get the lines in the loop and do the processing
        LOOP
            BEGIN
                UTL_FILE.GET_LINE(file_handle, data_line);
                data_line := RTRIM(LTRIM(data_line));
                -- Parse the record and load it into the rec variables (tool program parse_rec)
                parse_rec(data_line, emp_det_rec, emp_add_rec);
                -- Load the data into the respective tables (tool programs for inserting)
                -- Insert into emp details
                insert_emp_details(emp_det_rec);
                -- Insert into emp addresses (written the same way as insert_emp_details)
                insert_emp_addresses(emp_add_rec);
            EXCEPTION
                WHEN NO_DATA_FOUND THEN
                    EXIT;  -- end of file
            END;
        END LOOP;
        -- UTL_FILE close
        UTL_FILE.FCLOSE(file_handle);
    EXCEPTION
        WHEN UTL_FILE.INVALID_PATH THEN
            UTL_FILE.FCLOSE(file_handle);
            dbms_output.put_line('Invalid path for the file');
        WHEN UTL_FILE.INVALID_MODE THEN
            UTL_FILE.FCLOSE(file_handle);
            dbms_output.put_line('Invalid mode for the file');
        WHEN UTL_FILE.INVALID_FILEHANDLE THEN
            UTL_FILE.FCLOSE(file_handle);
            dbms_output.put_line('Invalid file handle');
        WHEN UTL_FILE.READ_ERROR THEN
            UTL_FILE.FCLOSE(file_handle);
            dbms_output.put_line('Read error for the file');
        WHEN OTHERS THEN
            UTL_FILE.FCLOSE(file_handle);
            dbms_output.put_line('Error in Load data procedure');
    END;

RELOAD PROGRAM

    /* This procedure reloads the records after correction. The correction
       can be done with a small user interface that can be developed in
       Developer2000 Forms 4.5; the status is then changed for those
       records from the user interface. */
    PROCEDURE Reload_correct_recs IS
        CURSOR correct_rec IS
            SELECT *
              FROM emp_errors
             WHERE status = 'CORRECTED'
               FOR UPDATE OF status;
    BEGIN
        FOR i IN correct_rec LOOP
            IF UPPER(i.tabname) = 'EMP_DETAILS' THEN
                -- Insert the records in Emp_details with appropriate datatypes
                NULL;
            ELSIF UPPER(i.tabname) = 'EMP_ADDRESSES' THEN
                -- Insert the records in Emp_addresses with appropriate datatypes
                NULL;
            ELSIF UPPER(i.tabname) = 'BOTH' THEN
                -- Insert the records in Emp_details with appropriate datatypes
                -- Insert the records in Emp_addresses with appropriate datatypes
                NULL;
            END IF;
            UPDATE emp_errors
               SET status = 'RELOADED'
             WHERE CURRENT OF correct_rec;
        END LOOP;
    END;
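The branches of the reload loop are only outlined in the listing. One way they might be filled in is sketched below as an anonymous block. This is an assumption about what the elided inserts look like, not the author's code: it relies only on the column names from the table descriptions in figure 5 and on the status values 'CORRECTED' and 'RELOADED' used in the reload program, and it folds the 'BOTH' case into the two single-table cases.

```plsql
DECLARE
    CURSOR correct_rec IS
        SELECT *
          FROM emp_errors
         WHERE status = 'CORRECTED'
           FOR UPDATE OF status;
BEGIN
    FOR i IN correct_rec LOOP
        -- Rows tagged EMP_DETAILS or BOTH go back into emp_details.
        IF UPPER(i.tabname) IN ('EMP_DETAILS', 'BOTH') THEN
            INSERT INTO emp_details
                (empno, fullname, firstname, lastname, middlenames,
                 hiredate, terminationdate, sex, ssn, costcenter)
            VALUES
                (i.empno, i.fullname, i.firstname, i.lastname, i.middlenames,
                 i.hiredate, i.terminationdate, i.sex, i.ssn, i.costcenter);
        END IF;

        -- Rows tagged EMP_ADDRESSES or BOTH go back into emp_addresses.
        IF UPPER(i.tabname) IN ('EMP_ADDRESSES', 'BOTH') THEN
            INSERT INTO emp_addresses
                (empno, addressline1, addressline2, city,
                 county, state, zipcode, country)
            VALUES
                (i.empno, i.addressline1, i.addressline2, i.city,
                 i.county, i.state, i.zipcode, i.country);
        END IF;

        -- Mark the error row so that it is not picked up again.
        UPDATE emp_errors
           SET status = 'RELOADED'
         WHERE CURRENT OF correct_rec;
    END LOOP;

    -- Commit only after the loop: committing inside it would release
    -- the FOR UPDATE locks and break the next fetch.
    COMMIT;
END;
/
```

Because emp_errors already stores hiredate and terminationdate as DATE columns, no conversion is needed on reload in this sketch; if the error table held the raw extract text instead, the tool functions would have to be applied again at this point.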

TABLE B

    Parameter: Backward compatibility
        SQL*Loader: High. Mature and stable product.
        UTL_FILE:   Low. Is applicable from Oracle 7.3 onwards.

    Parameter: Security
        SQL*Loader: High. Database level security.
        UTL_FILE:   Low. Has to be achieved by utl_file_dir.

    Parameter: Integration with PL/SQL
        SQL*Loader: Very low. Very difficult to integrate.
        UTL_FILE:   Very high. Seamless integration with PL/SQL.

    Parameter: Performance
        SQL*Loader: High speed loading. Good for very high volume of data, as in a conversion.
        UTL_FILE:   Low speed loading. Good for low volume of data, as in an interface.

    Parameter: Updates while loading
        SQL*Loader: Cannot perform the updates.
        UTL_FILE:   Can perform updates.

    Parameter: Enforce referential integrity
        SQL*Loader: Low. Very hard to implement.
        UTL_FILE:   Easy to implement.

    Parameter: Data transformation
        SQL*Loader: Hard to do a conditional transformation.
        UTL_FILE:   Easy to do a data transformation.

    Parameter: Complex data load conditions
        SQL*Loader: Difficult to handle.
        UTL_FILE:   Easy to handle with the PL/SQL programmatic control.

    Parameter: Error correction
        SQL*Loader: Bad file needs re-editing through some editor or programming effort via sed, awk, or Perl.
        UTL_FILE:   An easy user interface can be developed in Forms 4.5 to edit the records from the error tables.

    Parameter: Record length
        SQL*Loader: Length of a line of a record can be more than 1022 characters.
        UTL_FILE:   Length of a line of a record cannot be more than 1022 characters.

    Parameter: File operations
        SQL*Loader: Does not have any direct file operations capabilities, but these can be done with a Unix shell script.
        UTL_FILE:   UTL_FILE is integrated with PL/SQL and has limited file operations capabilities.

CONCLUSIONS

The comparative analysis of SQL*Loader and UTL_FILE reveals that the suitability of these tools to your environment depends on your needs. If the data load is complex (as is the case in relational databases), UTL_FILE seems to be the tool of choice. This tool does require programmatic effort in terms of writing a wrapper package, but the subsequent effort is greatly reduced once the initial tool kit is built for your environment. UTL_FILE tightly integrates the file input with the programmatic control and the data manipulation inside a single PL/SQL unit. UTL_FILE is slower at loading, but this disadvantage is offset by the programmatic control it offers and the integration it brings in. Thus we find that the UTL_FILE tool bridges the gap left by SQL*Loader for complex data loads.
