
Moving Data

Advanced Database Management System Using Oracle 10g

Moving Data : General Architecture


Directory Object : Overview

Directory objects are logical structures that represent a physical directory on the server's file system. They contain the location of a specific operating system directory. They provide greater file management flexibility because the names of the directory objects can be used in Enterprise Manager, so you are not required to hard-code directory path specifications. Directory objects are owned by the SYS user. Directory names are unique across the database because all the directories are located in a single namespace (that is, SYS).

Directory objects are required when you specify the file locations for Data Pump, because Data Pump accesses files on the server rather than on the client. In Enterprise Manager, select Administration -> Directory Objects. To edit or delete a directory object, select the directory object and click the appropriate button.

Creating Directory Objects
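
The slides show this step in Enterprise Manager. The same directory object can also be created in SQL, as sketched below. The directory name dpump_dir, the server path, and the grantee scott are hypothetical placeholders. The statements must be run by a user who holds the CREATE ANY DIRECTORY privilege.

```sql
-- Hypothetical directory object name and server path.
-- Oracle does not create or verify the operating system directory;
-- it must already exist and be accessible to the Oracle server process.
CREATE OR REPLACE DIRECTORY dpump_dir AS '/u01/app/oracle/dpump';

-- Allow a (hypothetical) user to read and write files through the object.
GRANT READ, WRITE ON DIRECTORY dpump_dir TO scott;

-- Confirm the directory object and the path it points to.
SELECT directory_name, directory_path
  FROM dba_directories
 WHERE directory_name = 'DPUMP_DIR';
```

A Data Pump client then refers to the object by name rather than by path, for example `expdp scott/tiger DIRECTORY=dpump_dir DUMPFILE=scott.dmp` (login and dump file name are again placeholders).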


SQL*Loader : Overview

SQL*Loader loads data from external files into tables of an Oracle database. It has a powerful data parsing engine that puts few limitations on the format of the data in the data file.

The files that are used by SQL*Loader are as follows.

Input Data Files:
SQL*Loader reads data from one or more files that are specified in the control file. From SQL*Loader's perspective, the data in the data file is organized as records. A particular data file can be in fixed record format, variable record format, or stream record format. The record format can be specified in the control file with the INFILE parameter. If no record format is specified, the default is stream record format.
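
The INFILE clause accepts an optional string that names the record format. The fragment below is a sketch showing the three forms side by side. The file names and lengths are hypothetical. A real control file would continue with a load method (INSERT, APPEND, and so on) and an INTO TABLE clause.

```sql
LOAD DATA
-- Fixed record format: every record is exactly 100 bytes long
-- (the count includes any line terminator stored in the file).
INFILE 'feeding_fixed.dat' "fix 100"
-- Variable record format: each record begins with a 3-byte field
-- that gives the length of the rest of the record.
INFILE 'feeding_var.dat' "var 3"
-- Stream record format (the default): each record ends with a
-- terminator string, here an explicit newline.
INFILE 'feeding_stream.dat' "str '\n'"
```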

Control File:
The control file is a text file that is written in a language that SQL*Loader understands. The control file indicates to SQL*Loader where to find the data, how to parse and interpret the data, where to insert the data, and so on. Although not precisely defined, a control file can be said to have three sections.

The first section contains session-wide information, for example:
- Global options, such as the input data file name and records to be skipped
- INFILE clauses to specify where the input data is located
- Data to be loaded

The second section consists of one or more INTO TABLE blocks. Each of these blocks contains information about the table (such as the table name and the columns of the table) into which the data is to be loaded. The third section is optional and, if present, contains input data.
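
Below is a minimal sketch that labels the three sections. It assumes the animal_feeding table and the fixed-width record layout that are used throughout this material. The ERRORS value is an arbitrary choice made only for illustration.

```sql
-- Section 1: session-wide information
OPTIONS (ERRORS=10)       -- terminate the load after more than 10 rejected records
LOAD DATA
INFILE *                  -- "*" means the data follows BEGINDATA below
APPEND
-- Section 2: one INTO TABLE block per table being loaded
INTO TABLE animal_feeding
TRAILING NULLCOLS
(
  animal_id     POSITION (1:3)   INTEGER EXTERNAL,
  feeding_date  POSITION (4:14)  DATE "dd-mon-yyyy",
  pounds_eaten  POSITION (15:19) ZONED (5,2),
  note          POSITION (20:99) CHAR
)
-- Section 3 (optional): the input data itself
BEGINDATA
10010-jan-200002350Flipper seemed unusually hungry today.
```

With this layout, the single data record would load as animal 100, fed on 10-Jan-2000, 23.50 pounds, with the note text.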

Log file:
When SQL*Loader begins execution, it creates a log file. If it cannot create a log file, execution terminates. The log file contains a detailed summary of the load, including a description of any errors that occurred during the load.

Bad file:
The bad file contains the records that are rejected, either by SQL*Loader or by the Oracle database. Data file records are rejected by SQL*Loader when the input format is invalid.

After a data file record is accepted for processing by SQL*Loader, it is sent to the Oracle database for insertion into a table as a row. If the Oracle database determines that the row is valid, then the row is inserted into the table. If the row is determined to be invalid, then the record is rejected and SQL*Loader puts it in the bad file.

Discard file:
This file is created only when it is needed, and only if you have specified that a discard file should be enabled. The discard file contains records that are filtered out of the load because they do not match any record-selection criteria specified in the control file.
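
The sketch below shows where the bad and discard files are named in a control file, and how a log file name can be passed on the command line. All of the following are hypothetical: the file names, the scott/tiger login, and the WHEN condition that excludes an ID of 999. A WHEN clause is included because the discard file only receives records that fail a record-selection criterion.

```sql
-- Run with, for example:
--   sqlldr userid=scott/tiger control=animal_feeding.ctl log=animal_feeding.log
LOAD DATA
INFILE 'animal_feeding.csv'
BADFILE 'animal_feeding.bad'       -- records rejected by SQL*Loader or the database
DISCARDFILE 'animal_feeding.dsc'   -- records that fail the WHEN condition
APPEND
INTO TABLE animal_feeding
WHEN (animal_id != '999')          -- hypothetical record-selection criterion
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
(
  animal_id     INTEGER EXTERNAL,
  feeding_date  DATE "dd-mon-yyyy",
  pounds_eaten  DECIMAL EXTERNAL,
  note          CHAR
)
```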

Loading Data with SQL*Loader

Using SQL*Loader

SQL*Loader is an Oracle utility that enables you to efficiently load large amounts of data into a database. If you have data in a flat file, such as a comma-delimited text file, and you need to get that data into the Oracle database, SQL*Loader is the tool to use.

Introducing SQL*Loader

Using SQL*Loader, you can do the following:
- Load data from a delimited text file, such as a comma-delimited file
- Load data from a fixed-width text file
- Load data from a binary file
- Combine multiple input records into one logical record
- Store data from one logical record into one table or into several tables
- Write SQL expressions to validate and transform data as it is being read from a file
- Combine data from multiple files into one table
- Filter the data in the input file, loading only selected rows
- Collect bad records (that is, records that won't load) into a separate file where you can fix them
- And more!
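
The SQL-expression item in the list works like this: a field definition can be followed by a SQL expression in double quotes, in which :note refers to the value read for that field. The sketch below assumes the fixed-width animal_feeding layout used in this material. Converting the note to upper case is an arbitrary example of a transformation.

```sql
LOAD DATA
INFILE 'animal_feeding_fixed_1.dat'
APPEND
INTO TABLE animal_feeding
TRAILING NULLCOLS
(
  animal_id     POSITION (1:3)   INTEGER EXTERNAL,
  feeding_date  POSITION (4:14)  DATE "dd-mon-yyyy",
  pounds_eaten  POSITION (15:19) ZONED (5,2),
  -- The SQL expression is applied to the field value before it is inserted.
  note          POSITION (20:99) CHAR "UPPER(:note)"
)
```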

Understanding the SQL*Loader Control File

To use SQL*Loader, you need to have:
- a database,
- a flat file to load, and
- a control file to describe the contents of the flat file.


Control files, such as the one illustrated in the previous figure, contain a number of commands and clauses describing the data that SQL*Loader is reading. Control files also tell SQL*Loader where to store that data, and they can define validation expressions for that data. The control file is aptly named, because it controls almost every aspect of how SQL*Loader operates. The control file describes the format of the data in the input file and tells SQL*Loader which tables and columns to populate with this data.

When you write a control file, you need to be concerned with the following questions:
- What file, or files, contain the data you want to load?
- What table, or tables, are you loading?
- What is the format of the data that you are loading?
- What do you want to do with records that won't load?

All of these items represent things that you specify when you write a SQL*Loader control file.

Generally, control files consist of one long command that starts out like this:

    LOAD DATA

The keyword DATA is optional. Everything else in a control file is a clause of some sort that is added onto this command.

Create the following table:

    CREATE TABLE animal_feeding (
       animal_id     NUMBER,
       feeding_date  DATE,
       pounds_eaten  NUMBER (5,2),
       note          VARCHAR2 (80)
    );

Specifying the input file


You use the INFILE clause to identify the file containing the data that you want to load. The data can be in a file separate from the control file, which is usually the case, or you can place the data within the control file itself. Use multiple INFILE clauses if your data is spread across several files.

Control File Data: If you are loading the data from a text file, you have the option of placing the LOAD command at the beginning of that file, which then becomes the control file.

Control File Data: To specify that SQL*Loader should look in the control file for the data, supply an asterisk (*) for the file name in the INFILE clause. For example:

    LOAD DATA
    INFILE *
    ...
    BEGINDATA
    data
    data
    data

If you do include your data in the control file, the last clause of your LOAD command must be the BEGINDATA clause. This tells SQL*Loader where the command ends and where your data begins. SQL*Loader will begin reading data from the line immediately following BEGINDATA.

Data in a Separate File: Although you can have data in the control file, it's more common to have it in a separate file. In that case, you place the file name after the keyword INFILE, as shown in this example:

    LOAD DATA
    INFILE 'animal_feeding.csv'
    ...

Data in Multiple Files: You can use multiple INFILE clauses to load data from several files at once. The clauses must follow each other, as shown below:

    LOAD DATA
    INFILE 'animal_feeding_fixed_1.dat'
    INFILE 'animal_feeding_fixed_2.dat'

Loading data into nonempty tables


After listing the input file, or files, in the control file, you need to specify whether you expect the table that you are loading to be empty. By default, SQL*Loader expects that you are loading data into a completely empty table. If, when the load starts, SQL*Loader finds even one row in the table, the load will be aborted. Four keywords control SQL*Loader's behavior when it comes to dealing with empty vs. nonempty tables:

- INSERT: Specifies that you are loading an empty table. SQL*Loader will abort the load if the table contains data to start with.
- APPEND: Specifies that you are adding data to a table. SQL*Loader will proceed with the load even if preexisting data is in the table.
- REPLACE: Specifies that you want to replace the data in the table. Before loading, SQL*Loader will delete any existing data.
- TRUNCATE: Specifies the same as REPLACE, but SQL*Loader uses the TRUNCATE statement instead of a DELETE statement to delete the existing data.

Place the keyword for whichever option you choose after the INFILE clause, as shown in this example:

    LOAD DATA
    INFILE 'animal_feeding.csv'
    APPEND
    ...

If you don't specify an option, then INSERT is assumed by default.

Specifying the table to load


In SQL*Loader, you use the INTO TABLE clause to specify which table or tables you want to load. It also specifies the format of the data contained in the input file. The INTO TABLE clause is the most complex of all the clauses.

Loading One Table:

Example:

    LOAD DATA
    INFILE 'load1.csv'
    INSERT
    INTO TABLE LOAD_TEST
    (
       eno    char terminated by ",",
       ename  char terminated by ",",
       city   char terminated by ","
    )

Using SQL*Loader Data Types

SQL*Loader supports a variety of data types. Some of the most useful data types for loading data from text files are given below:

- CHAR: Identifies character data. If you are loading data into any type of text field, such as VARCHAR2, CHAR, or CLOB, use the SQL*Loader CHAR data type.
- DATE [format]: Identifies a date. Even though the format is optional, specify it to avoid problems.
- INTEGER EXTERNAL: Identifies an integer value that is stored in character form. For example, the character string 123 is a valid INTEGER EXTERNAL value.
- DECIMAL EXTERNAL: Identifies a numeric value that is stored in character form and that may include a decimal point. The string -123.45 is a good example of this data type.
- ZONED (precision, scale): Zoned decimal fields are numeric values represented as character strings that contain an assumed decimal point. For example, a definition of ZONED (5,2) would cause 12345 to be interpreted as 123.45.


Describing fixed-width columns


The INTO TABLE clause contains a field list within parentheses. This list defines the fields being loaded from the flat file into the table. Each entry in the field list has this general format:

    column_name POSITION (start:end) datatype

- column_name: the name of a column in the table that you are loading.
- POSITION (start:end): the position of the column within the record. The values for start and end represent the character positions for the first and last characters of the column. The first character of a record is always position 1.

- datatype: a SQL*Loader data type that identifies the type of data being loaded.

You will need to write one field list entry for each column that you are loading. As an example, consider the following record:

    10010-jan-200002350Flipper seemed unusually hungry today.

This record contains a three-digit ID number, followed by a date, followed by a five-digit number, followed by a text field. The ID number occupies character positions 1 through 3 and is an integer, so its definition would look like this:

    animal_id POSITION (1:3) INTEGER EXTERNAL

The date field is next. It occupies character positions 4 through 14, and its definition looks like this:

    feeding_date POSITION (4:14) DATE "dd-mon-yyyy"

Example: Loading fixed-width data


    LOAD DATA
    INFILE 'animal_feeding_fixed_1.dat'
    APPEND
    INTO TABLE animal_feeding
    TRAILING NULLCOLS
    (
       animal_id     POSITION (1:3)   INTEGER EXTERNAL,
       feeding_date  POSITION (4:14)  DATE "dd-mon-yyyy",
       pounds_eaten  POSITION (15:19) ZONED (5,2),
       note          POSITION (20:99) CHAR
    )

Describing Delimited Columns


The format for describing delimited data, such as comma-delimited data, is similar to that used for fixed-width data. The difference is that you need to specify the delimiter being used. The general format of a delimited column definition looks like this:

    column_name datatype TERMINATED BY delim [OPTIONALLY ENCLOSED BY delim]

The elements of this column definition are described as follows:
- column_name: the name of a column in the table that you are loading.
- datatype: a SQL*Loader data type.
- TERMINATED BY delim: identifies the delimiter that marks the end of the column.
- OPTIONALLY ENCLOSED BY delim: specifies an optional enclosing character. Many text values, for example, are enclosed by quotation marks.

When describing delimited fields, you must be careful to describe them in the order in which they occur. Take a look at the following record, which contains delimited data:

    100,1-jan-2000,23.5,Flipper seemed unusually hungry today.

It can be defined as below:

    animal_id     INTEGER EXTERNAL TERMINATED BY ',',
    feeding_date  DATE "dd-mon-yyyy" TERMINATED BY ',',
    pounds_eaten  DECIMAL EXTERNAL TERMINATED BY ',',
    note          CHAR TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'

Working with short records


When dealing with delimited data, you occasionally run into cases where not all fields are present in each record in a data file. For example, look at these two records:

    100,1-jan-2000,23.5,Flipper seemed unusually hungry today.
    151,1-jan-2000,55

The first record contains a note, while the second does not. SQL*Loader's default behavior is to consider the second record an error because not all fields are present. You can change this behavior and cause SQL*Loader to treat missing values at the end of a record as nulls by using the TRAILING NULLCOLS clause.

The TRAILING NULLCOLS clause is part of the INTO TABLE clause, and it appears as follows:

    INTO TABLE animal_feeding
    TRAILING NULLCOLS
    ( ... )
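
The delimited column definitions and TRAILING NULLCOLS can be combined into a complete control file for the two sample records. This is a sketch that assumes those records are stored in a file named animal_feeding.csv. Because TRAILING NULLCOLS is specified, the second record should load with a null note instead of being rejected.

```sql
LOAD DATA
INFILE 'animal_feeding.csv'
APPEND
INTO TABLE animal_feeding
TRAILING NULLCOLS
(
  animal_id     INTEGER EXTERNAL TERMINATED BY ',',
  feeding_date  DATE "dd-mon-yyyy" TERMINATED BY ',',
  pounds_eaten  DECIMAL EXTERNAL TERMINATED BY ',',
  -- Missing in the second record, so it is set to NULL.
  note          CHAR TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
)
```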

Converting Blanks to Nulls


When dealing with data in fixed-width columns, you will find that missing values appear as blanks in the data file. For example:

    100120-mar-2012good morning all
    11100223-mar-2012this is demo

The first record is missing the two-digit ID value. If this case is not handled, the record will be rejected from the load. If you prefer to treat a blank field as a null, you can use the NULLIF clause to tell SQL*Loader to interpret it as a null value. The NULLIF clause comes after the data type and takes the following form:

    NULLIF field_name=BLANKS

For example:

    cid POSITION (1:3) INTEGER EXTERNAL NULLIF cid=BLANKS,
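
Applied to the fixed-width animal_feeding layout used earlier, NULLIF might look like the sketch below. The file name follows the earlier fixed-width example. Adding NULLIF to pounds_eaten as well is an assumption made for illustration; it is not something the text requires.

```sql
LOAD DATA
INFILE 'animal_feeding_fixed_1.dat'
APPEND
INTO TABLE animal_feeding
TRAILING NULLCOLS
(
  -- A blank ID becomes NULL instead of causing the record to be rejected.
  animal_id     POSITION (1:3)   INTEGER EXTERNAL NULLIF animal_id=BLANKS,
  feeding_date  POSITION (4:14)  DATE "dd-mon-yyyy",
  pounds_eaten  POSITION (15:19) ZONED (5,2) NULLIF pounds_eaten=BLANKS,
  note          POSITION (20:99) CHAR
)
```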