
Oracle 10g

All That Is New & Exciting

‫איציק הדר‬
‫יונתן טולדנו‬
Certified Oracle 10g Technician
Agenda
• Regular Expressions
• Native Floating-Point Data Type
• LOB Enhancements
• PL/SQL Performance Improvement
• Data Pump
Regular Expressions

The regular expressions feature in Oracle Database 10g is a powerful tool
for manipulating textual data. It vastly improves your ability to search and
manipulate character data.
What Are Regular Expressions?
• A language, or syntax, you can use to
describe patterns in text
• Comprises one or more character
literals and/or metacharacters
• Metacharacters provide algorithms that
specify how Oracle should process the
characters that make up a regular
expression
What Is It Good For?
• Validate formats (Constraints):
– phone numbers
– zip codes
– email addresses
– Social Security numbers
– IP addresses
– filenames and pathnames
– and more…

• Search & locate patterns such as:


– HTML tags
– Numbers
– Dates
– or anything that fits any pattern within any textual data
Pre-Oracle Database 10g
Find parks with acreage in their descriptions:

SELECT *
FROM park
WHERE description LIKE '%acre%';

Finds '217-acre' and '27 acres', but also 'few acres', 'more acres than all
other parks', 'the location of a massacre', etc.
Oracle Database 10g
• Four regular expression functions
– REGEXP_LIKE: does the pattern match?
– REGEXP_INSTR: where does it match?
– REGEXP_SUBSTR: what does it match?
– REGEXP_REPLACE: replace what matched.
• POSIX (Portable Operating System Interface) Extended Regular Expressions
– POSIX character classes let you be very specific about the type of character
you are looking for: alphabetic, numeric, punctuation …
– A character class must be enclosed in a character list, indicated by
square brackets ([]).
REGEXP_LIKE - Syntax
REGEXP_LIKE(source_string, pattern [, match_parameter])

source_string – column name or string (CHAR, VARCHAR2, CLOB, NCHAR,
NVARCHAR2, NCLOB, but not LONG).
pattern – regular expression
match_parameter – optional parameters such as handling
the newline character, retaining multiline formatting, and
providing control over case-sensitivity.
REGEXP_LIKE - Example
• Determine whether a pattern exists in a string
• Revisiting the acreage problem:
SELECT *
FROM park
WHERE REGEXP_LIKE(description,
'[0-9]+(-| )acre');
• Finds '217-acre' and '27 acres'
• Rejects 'few acres', 'more acres than all other
parks', 'the location of a massacre', etc.
REGEXP_LIKE - Explain
SELECT *
FROM park
WHERE REGEXP_LIKE(description,
'[0-9]+(-| )acre');

[]   Defines a character list
0-9  Range 0 to 9 ('-' indicates a range)
+    Matches the character list 1 or more times
()   Subexpression
|    Alternation (OR): '-' OR ' ' (a space)
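To try the pattern without a park table, the query below is a minimal sketch that builds the sample descriptions inline. The WITH-clause rows are illustrative data taken from the strings quoted on the slides, not rows from a real table.

```sql
-- Sketch: the park rows below are made-up sample data.
WITH park AS (
  SELECT 'A 217-acre reserve'               AS description FROM dual UNION ALL
  SELECT 'Covers 27 acres'                  FROM dual UNION ALL
  SELECT 'Just a few acres'                 FROM dual UNION ALL
  SELECT 'The location of a massacre'       FROM dual
)
SELECT description
FROM   park
-- one or more digits, then a dash or a space, then the literal 'acre'
WHERE  REGEXP_LIKE(description, '[0-9]+(-| )acre');
```

Only the first two rows should come back, since the other two contain no digits immediately before 'acre'.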
REGEXP_LIKE - Constraint Example
SQL> CREATE TABLE t1 (c1 VARCHAR2(20),
CHECK (REGEXP_LIKE(c1, '^[[:alpha:]]+$')));
SQL> INSERT INTO t1 VALUES ('newuser');
1 row created.
SQL> INSERT INTO t1 VALUES ('newuser1');
ORA-02290: check constraint violated

^          Anchors the expression to the start of a line
[:alpha:]  POSIX class: alphabetic characters
$          Anchors the expression to the end of a line
REGEXP_INSTR - Syntax
REGEXP_INSTR(source_string, pattern [, start_position
[, occurrence
[, return_option
[, match_parameter]]]])
Returns the position at which the pattern matches

start_position – where to begin the search
occurrence – which occurrence of the pattern to look for
return_option – 0 returns the position of the first character of the match;
                1 returns the position of the character following the match
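The optional arguments are easier to see side by side. The sketch below uses one of the sample addresses from the next slide as an inline literal; the column aliases are illustrative names only.

```sql
-- Sketch: address literal borrowed from the slides, aliases are made up.
SELECT REGEXP_INSTR(address, '[[:digit:]]{5}', 1, 1, 0) AS match_start,       -- first char of the match
       REGEXP_INSTR(address, '[[:digit:]]{5}', 1, 1, 1) AS after_match,       -- char following the match
       REGEXP_INSTR(address, '[[:digit:]]{4}', 1, 2, 0) AS second_occurrence  -- start of the 2nd 4-digit run
FROM  (SELECT 'Jo, NY-NY 14357-6543' AS address FROM dual);
```

With this input the three columns should be 11, 16 and 17: '14357' occupies positions 11 to 15, and the second four-digit match is '6543'.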
REGEXP_INSTR - Example
• Find out where a match occurs:
SELECT REGEXP_INSTR(address,
'[[:digit:]]{5}(-[[:digit:]]{4})?$')
"inStr", address FROM park;

inStr ADDRESS
----- -------------------------------------
27 Mr. Smith, Good St., CA – 98765-9876
11 Jo, NY-NY 14357-6543
23 Helen, sun street, FL 87876

REGEXP_INSTR - Explain
SELECT REGEXP_INSTR(address,
'[[:digit:]]{5}(-[[:digit:]]{4})?$')

[ ]        Character list
[:digit:]  POSIX class: numeric digits only
{5}        Match exactly 5 times
( )        Subexpression
-          Literal dash
{4}        Match exactly 4 times
?          Match 0 or 1 time
$          End of line

REGEXP_SUBSTR - Syntax
REGEXP_SUBSTR(source_string, pattern [, position
[, occurrence
[, match_parameter]]])

Returns the substring that matches the pattern
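As a small illustration of the occurrence argument, the sketch below pulls each comma-separated item out of the sample string used on the next slide. The aliases are illustrative.

```sql
-- Sketch: '[^,]+' matches a run of characters that are not commas.
SELECT REGEXP_SUBSTR('first, second , third', '[^,]+', 1, 1) AS item_1,
       REGEXP_SUBSTR('first, second , third', '[^,]+', 1, 2) AS item_2,
       REGEXP_SUBSTR('first, second , third', '[^,]+', 1, 3) AS item_3
FROM   dual;
```

Note that the surrounding spaces are part of each match, so item_2 comes back as ' second ' with a leading and a trailing space.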


REGEXP_SUBSTR - Example
SELECT REGEXP_SUBSTR
('first, second , third', ', [^,]*,') rs
FROM dual;

RS
-------------
, second ,
REGEXP_SUBSTR - Explain
SELECT REGEXP_SUBSTR
('first, second , third', ', [^,]*,') rs
FROM dual;

', '    Search for a comma followed by a space
[^,]*   Then zero or more characters that are not commas
,       Lastly, look for another comma

^: If you use '^' as the first character inside a character list,
it means the negation of the character list.
REGEXP_REPLACE - Syntax
REGEXP_REPLACE(source_string, pattern [, replace_string
[, position
[, occurrence
[, match_parameter]]]])
Returns the new string, with each match of the pattern replaced by the
specified replace_string


REGEXP_REPLACE - Example
SELECT REGEXP_REPLACE(
'Smith Hildi Ellen',
'(.*) (.*) (.*)',
'\3, \2 : \1') rr
FROM dual;

RR
------------------------
Ellen, Hildi : Smith
REGEXP_REPLACE - Explain
SELECT REGEXP_REPLACE(
'Smith Hildi Ellen',
'(.*) (.*) (.*)',
'\3, \2 : \1') rr
FROM dual;

( ) - Subexpression
(.) - Match any character within the subexpression
(.*) - Match any character in the subexpression zero or more times
(.*) (.*) - Whitespace between subexpressions must be matched as well
\digit - Refers to the value captured by the corresponding subexpression.
This is called a "backreference" and is used in the example to
return the subexpressions in a new format (including whitespace,
comma & colon)
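The position and occurrence arguments control which matches are replaced. The following is a minimal sketch on a made-up string with uneven spacing; the aliases are illustrative.

```sql
-- Sketch: ' {2,}' matches a run of two or more spaces.
SELECT REGEXP_REPLACE('Smith   Hildi    Ellen', ' {2,}', ' ')       AS all_runs,        -- occurrence defaults to 0 = all
       REGEXP_REPLACE('Smith   Hildi    Ellen', ' {2,}', ' ', 1, 2) AS second_run_only  -- only the 2nd match
FROM   dual;
```

The first column should collapse every run to a single space; the second leaves the run after 'Smith' untouched and collapses only the run after 'Hildi'.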
Practical Uses: Constraints
ALTER TABLE students ADD CONSTRAINT std_snn_chk
CHECK (REGEXP_LIKE (snn,
'^([[:digit:]]{3}-[[:digit:]]{2}-[[:digit:]]{4}|[[:digit:]]{9})$'));

^               Anchor the expression to the start of a line
[[:digit:]]{3}  3 digits
-               Dash
[[:digit:]]{2}  2 digits
-               Dash
[[:digit:]]{4}  4 digits
|               OR
[[:digit:]]{9}  9 digits
$               Anchor the expression to the end of a line

Legal values:
123-45-6789
OR
123456789
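Before adding such a constraint, the pattern can be tried against a few candidate values. The sketch below is one way to do that; the sample values and the verdict labels are made up for illustration.

```sql
-- Sketch: sample values are illustrative, not from a real students table.
WITH samples AS (
  SELECT '123-45-6789' AS snn FROM dual UNION ALL
  SELECT '123456789'   FROM dual UNION ALL
  SELECT '123-456789'  FROM dual UNION ALL   -- only one dash
  SELECT '12345678'    FROM dual             -- only 8 digits
)
SELECT snn,
       CASE
         WHEN REGEXP_LIKE(snn,
              '^([[:digit:]]{3}-[[:digit:]]{2}-[[:digit:]]{4}|[[:digit:]]{9})$')
         THEN 'legal'
         ELSE 'rejected'
       END AS verdict
FROM   samples;
```

The first two values should be reported as legal and the last two as rejected.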
Practical Uses: Indexes
• Use function-based indexes:
CREATE INDEX acre_ind
ON park (REGEXP_SUBSTR(
REGEXP_SUBSTR(description,
'[0-9]+(-| )acre'),'[0-9]+'));
• To support regular expression queries:
SELECT * FROM park
WHERE
REGEXP_SUBSTR(REGEXP_SUBSTR(description,
'[0-9]+(-| )acre'),'[0-9]+') = '217';
Practical Uses: Views
• Hide the complexity from users:
CREATE VIEW park_acreage as
SELECT park_name,
REGEXP_SUBSTR(
REGEXP_SUBSTR(
description,
'[0-9]+(-| )acre'),
'[0-9]+') acreage
FROM park;
Practical Uses: PL/SQL
• REGEXP_LIKE acts as a Boolean function in
PL/SQL:
IF REGEXP_LIKE(description,'[0-9]+(-| )acre') THEN
acres := REGEXP_SUBSTR(
REGEXP_SUBSTR(description,
'[0-9]+(-| )acre'),'[0-9]+');
...

• All other functions act identically in PL/SQL and SQL.


• Replace hundreds of lines of code
– String manipulation functions can be simplified
Match Parameter
• All functions take an optional match
parameter:
– Is matching case sensitive?
– Does period (.) match newlines?
– Is the source string one line or many?
• The match parameter comes last
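The effect of each flag can be checked on a single two-line value. This is a minimal sketch; the inline row mirrors the 'Park 6' description used on a later slide, and the column aliases are illustrative.

```sql
-- Sketch: one value containing a newline, tested under three settings.
WITH park AS (
  SELECT '640' || CHR(10) || 'ACRE' AS description FROM dual
)
SELECT CASE WHEN REGEXP_LIKE(description, '[0-9]+.acre')
            THEN 'match' ELSE 'no match' END AS default_rules,
       CASE WHEN REGEXP_LIKE(description, '[0-9]+.acre', 'i')
            THEN 'match' ELSE 'no match' END AS case_insensitive,
       CASE WHEN REGEXP_LIKE(description, '[0-9]+.acre', 'in')
            THEN 'match' ELSE 'no match' END AS case_insensitive_dot_newline
FROM   park;
```

Only the last column should report a match: 'i' alone handles the upper-case 'ACRE', but the period matches the newline only when 'n' is also given.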
Case-sensitivity
• Case-insensitive search:
SELECT c1 FROM t1
WHERE
REGEXP_LIKE(c1, 'Joh?n Ste(ph|v)ens?', 'i');
i  Case insensitivity
?  Match 0 or 1 time
→ Jon Stevens
→ JOHN Stephens
→ John StEVen
Newline matching
INSERT INTO park VALUES ('Park 6',
'640' || CHR(10) || 'ACRE');

SELECT *
FROM park
WHERE REGEXP_LIKE(
description,
'[0-9]+.acre',
'in');
String anchors
INSERT INTO employee (surname)
VALUES ('Ellison' || CHR(10) ||
'Gennick');

SELECT * FROM
EMPLOYEE
WHERE REGEXP_LIKE(
surname,'^Ellison');

Yes! '^' matches at the start of the string.
String anchors
INSERT INTO employee (surname)
VALUES ('Ellison' || CHR(10) ||
'Gennick');

SELECT * FROM
EMPLOYEE
WHERE REGEXP_LIKE(
surname,'^Gennick');

No! By default the source string is treated as a single line, so '^'
matches only at the very start.
String anchors
INSERT INTO employee (surname)
VALUES ('Ellison' || CHR(10) ||
'Gennick');

SELECT * FROM
EMPLOYEE
WHERE REGEXP_LIKE(
surname,'^Gennick','m');

Yes! With 'm' the source string is treated as multiple lines, so '^'
also matches after each newline.
Locale Support
• Full Locale Support
– All character sets
– All languages
• Case and accent insensitive searching
• Linguistic range
• Character classes
• Equivalence classes
Character Sets and Languages
• For example, you can search for Hebrew
names beginning with ע and ending with ל:
SELECT *
FROM employee
WHERE REGEXP_LIKE(
surname,
'^ע[[:alpha:]]*ל$');
Case- and Accent-Insensitive Searching
• Respect for NLS settings:
ALTER SESSION
SET NLS_SORT = GENERIC_BASELETTER;
• With this sort, case won't matter and an
expression such as:
REGEXP_INSTR(x,'resume')
will find "resume", "résumé", "Résume", etc.
Linguistic Range
• Ranges respect NLS_SORT settings. The range [a-z] covers:
– NLS_SORT=GERMAN: a,b,c…z
– NLS_SORT=GERMAN_CI: a,A,b,B,c,C…z,Z
Character Classes
• Character classes such as [:alpha:] and
[:digit:] encompass more than just Latin
characters.
• For example, [:digit:] matches:
– Latin 0 through 9
– Arabic-Indic ٠ through ٩
– And more
Equivalence Classes
• Ignore case and accents without changing
NLS_SORT:
REGEXP_INSTR(x,'r[[=e=]]sum[[=e=]]')
• Finds 'resume', 'résumé', and 'rEsumE'
Regular Expression Conclusion
• If 80% of application logic is string processing, then
it's hard to think of an application that could not
make use of regular expressions.
• Oracle Regular Expressions provide versatile string
manipulation in the database instead of
externalizing it in middle-tier logic
• They are locale-sensitive and support character
large objects (CLOBs)
• Available in both SQL and PL/SQL
Native Floating-Point Data Type

• Two new numeric data types: BINARY_FLOAT, BINARY_DOUBLE
– IEEE 754 Standard for binary floating point
arithmetic
– Part of numerous other standards (e.g.,
Java, XML Schema) and hardware
platforms
– Prevalent in Business Intelligence, Life
Sciences, Engineering/Scientific
Computation, etc.
Key Benefits
• Usability
– Well known and widely accepted by the majority of numerical
computation users
– Works well for database client applications such as XML and Java,
and maps to their native data types
• Efficiency
– Oracle NUMBERs are implemented in software, while native
floating-point numbers are implemented in hardware
– Faster, and may also use less disk space
• Seamless Support
– Both SQL and PL/SQL offer full support for these new types. They
can be used in all contexts where a scalar type may be used
PL/SQL Example
• Calculate π using the Euler series
• Approx 300,000 iterations

• NUMBER takes ~ 27.7 sec

• BINARY_DOUBLE takes ~ 3.7 sec

• Improvement factor: ~7x
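The benchmark code itself is not shown on the slide. The block below is only a sketch of what such a loop might look like, assuming "the Euler series" means the sum of 1/k² converging to π²/6; the variable names are made up. Changing both declarations to NUMBER gives the comparison case.

```sql
-- Sketch under stated assumptions; not the code behind the timings above.
DECLARE
  k     BINARY_DOUBLE := 1;   -- current term index, kept as a double
  total BINARY_DOUBLE := 0;   -- running sum of 1/k^2
BEGIN
  FOR i IN 1 .. 300000 LOOP
    total := total + 1 / (k * k);
    k := k + 1;
  END LOOP;
  -- pi^2 / 6 = sum of 1/k^2, so pi is roughly SQRT(6 * sum)
  DBMS_OUTPUT.PUT_LINE('pi is approximately ' || TO_CHAR(SQRT(6 * total)));
END;
/
```

Keeping the index in a BINARY_DOUBLE variable avoids squaring a large integer loop counter and keeps all of the arithmetic in the native type.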


Native Floating-Point Functions
• New type conversion functions
– TO_BINARY_FLOAT, TO_BINARY_DOUBLE
– TO_NUMBER
• SQL function support
– Numeric functions (sin, cos, etc.)
– Aggregate functions (sum, avg, stddev, etc.)
– Analytic functions (sum, avg, stddev, etc.)
Native Floating-Point Constraints
create table floating_point_table1 (
fltNnull binary_float constraint flt_null not null,
dblNnull binary_double constraint dbl_null not null,
fltUnq binary_float constraint flt_unq unique,
dblUnq binary_double constraint dbl_unq unique,
fltChk binary_float constraint
flt_chk check ( fltChk is not NaN ) ,
dblChk binary_double constraint
dbl_chk check ( dblChk is not infinite) ,
fltPrm binary_float constraint flt_prm primary key);

* NaN (Not a Number) – e.g. 0/0, infinity/infinity
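The IS [NOT] NAN and IS [NOT] INFINITE conditions used in the check constraints can be tried directly against literals. This is a minimal sketch; the result labels and column aliases are illustrative.

```sql
-- Sketch: evaluates the same conditions the flt_chk and dbl_chk constraints use.
SELECT CASE WHEN BINARY_FLOAT_NAN IS NAN
            THEN 'rejected by flt_chk' ELSE 'accepted' END AS nan_value,
       CASE WHEN BINARY_DOUBLE_INFINITY IS INFINITE
            THEN 'rejected by dbl_chk' ELSE 'accepted' END AS infinite_value,
       CASE WHEN 1.5d IS NOT NAN AND 1.5d IS NOT INFINITE
            THEN 'accepted' ELSE 'rejected' END            AS ordinary_value
FROM   dual;
```

BINARY_FLOAT_NAN and BINARY_DOUBLE_INFINITY are the built-in literals for the special values, and the d suffix marks 1.5 as a BINARY_DOUBLE literal.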


Native Floating-Point Constraints (continued)
create table floating_point_table2 (
dblPrm binary_double constraint
dbl_prm primary key,
fltFrn binary_float constraint flt_frn
references floating_point_table1(fltPrm)
on delete cascade);
LOB Enhancements
• LOBs are prevalent in storing unstructured data (text, AVI,
genomic/proteomic sequences, etc.)

• Implicit charset conversion between CLOB and NCLOB
Ultra-Sized LOBs
• Terabyte-sized LOBs (8 – 128 TB)
– DB_BLOCK_SIZE (2 – 32 KB) x (4GB – 1)
• New DBMS_LOB.GET_STORAGE_LIMIT
function
• OCI, JDBC, and DBMS_LOB now support
LOBs greater than 4GB
– OCILobRead2(), OCILobWriteAppend2(), and
OCILobWrite2() functions
– Same APIs for JDBC and DBMS_LOB
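The 8 – 128 TB range follows from the formula above. The query below is a sketch of that arithmetic, assuming "4GB – 1" means 2^32 – 1 blocks; the three block sizes and the aliases are illustrative.

```sql
-- Sketch: maximum LOB size = block size in bytes x (2^32 - 1).
SELECT block_size_kb,
       block_size_kb * 1024 * (POWER(2, 32) - 1)                            AS max_lob_bytes,
       ROUND(block_size_kb * 1024 * (POWER(2, 32) - 1) / POWER(1024, 4), 2) AS max_lob_tb
FROM  (SELECT 2 AS block_size_kb FROM dual UNION ALL
       SELECT 8  FROM dual UNION ALL
       SELECT 32 FROM dual);
```

For block sizes of 2, 8 and 32 KB this works out to roughly 8, 32 and 128 TB.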
Massive Database
• Ultra large database: 8 million terabytes (8 exabytes)
• Ultra large number of tablespaces or files: 64K
• Ultra large data files: 4 terabytes in a single file
• Ultra large LOB columns: 4 gigabytes * block size

– Berkeley, 2001: All of human data over history will occupy 6 exabytes
– Berkeley, 2003: All of human data over history will occupy 24 exabytes
LOB Performance Improvements
• 5x performance gain for
accessing inline LOBs

• Temporary LOBs use reference counting to provide orders of
magnitude performance gain
– Reference on Read
– Copy on Write
New PL/SQL Features
• All the SQL language features just
discussed

• Compiler warnings

• New Utl_Mail and Utl_Compress packages
Explicit vs. Implicit Compilation
• Explicit compilation is where you tell Oracle
to compile a program unit:
– CREATE PACKAGE dbrx_util…
– CREATE OR REPLACE TRIGGER customers_t1…
– ALTER FUNCTION valid_email_address COMPILE;
• Implicit compilation is where Oracle needs to
access a PL/SQL program unit that has been
invalidated. In this case Oracle recompiles
the program without being told to do so.
PL/SQL compilation and execution 101
PL/SQL Source Code

Front-end

IR == Diana

Back-end

Object code == MCode

PVM
PL/SQL Native Compilation
• Starting in Oracle9i Release 1, PL/SQL
program units can be compiled directly into
machine code.
– Stored procedures, functions, packages, types,
and triggers
• Alternatively, PL/SQL code can be
interpreted as in Oracle8i and earlier.

Oracle9i:   plsql_compiler_flags = INTERPRETED | NATIVE
Oracle 10g: plsql_code_type = INTERPRETED | NATIVE


How PL/SQL Code is Natively Compiled
• When you compile a PL/SQL program unit,
Oracle parses the code, validates it, and
generates byte codes for interpretation at
runtime.
• With native compilation Oracle generates a C
code source file. The C code is compiled using
your C compiler, and linked into a shared library
callable by the oracle executable.
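In Oracle 10g, switching a single unit to native code might look like the sketch below. It assumes the instance is already configured for native compilation (a C compiler and the plsql_native_library_dir parameter), and it reuses the valid_email_address function named on the earlier compilation slide as a stand-in for any stored unit.

```sql
-- Sketch: assumes native compilation prerequisites are already in place.
ALTER SESSION SET plsql_code_type = 'NATIVE';

-- Explicit recompilation picks up the session setting.
ALTER FUNCTION valid_email_address COMPILE;

-- Check how the unit was compiled.
SELECT name, type, plsql_code_type
FROM   user_plsql_object_settings
WHERE  name = 'VALID_EMAIL_ADDRESS';
```

The last query should report NATIVE for the function once the recompilation succeeds.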
PL/SQL compilation and execution 101
PL/SQL Source Code

Front-end

IR == Diana

Back-end

Object code == MCode     or     Native machine code

PVM                             Hardware
Testing by Oracle Team
• The next slide shows much more detail

• The baseline is Oracle Version 8.0.6

• Speed-up factors are shown for


– 8i
– 9iR2 INTERPRETED
– 9iR2 NATIVE
– 10g INTERPRETED
– 10g NATIVE
Testing by Oracle Team
• 8i faster than 8.0.6

• 9iR2 faster than 8i

• NATIVE always faster than INTERPRETED


– at 9iR2
– at 10g
• 10g always faster than 9iR2
– INTERPRETED vs INTERPRETED
– NATIVE vs NATIVE
Testing by Oracle Team
• Most of the 10g Native test programs speed
up by a factor of more than 2x 9i interpreted

• Some speed up by very much more than that

• The 10x speed up is for a program which uses BINARY_INTEGER
and which has idioms that are particularly susceptible to
optimization
Data Pump: What is it?
• Server-based facility for high performance
loading and unloading of data and metadata
• Callable: DBMS_DATAPUMP. Internally uses
DBMS_METADATA
• Data written in Direct Path stream format. Metadata
written as XML
• New clients expdp and impdp: Supersets of original
exp / imp.
• Foundation for Streams, Logical Standby, Grid,
Transportable Tablespaces and Data Mining initial
instantiation.
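Because Data Pump is callable, a schema export can be started from PL/SQL without the expdp client. The block below is a minimal sketch; the job name, dump file name, HR schema and the DATA_PUMP_DIR directory object are all assumptions for illustration.

```sql
-- Sketch: names below are illustrative; a writable directory object must exist.
DECLARE
  h NUMBER;   -- job handle returned by OPEN
BEGIN
  h := DBMS_DATAPUMP.OPEN(operation => 'EXPORT',
                          job_mode  => 'SCHEMA',
                          job_name  => 'HR_EXPORT_DEMO');

  DBMS_DATAPUMP.ADD_FILE(handle    => h,
                         filename  => 'hr_demo.dmp',
                         directory => 'DATA_PUMP_DIR');

  -- Restrict the job to one schema.
  DBMS_DATAPUMP.METADATA_FILTER(handle => h,
                                name   => 'SCHEMA_EXPR',
                                value  => 'IN (''HR'')');

  DBMS_DATAPUMP.START_JOB(h);
  DBMS_DATAPUMP.DETACH(h);   -- the job keeps running on the server
END;
/
```

Since the job runs on the server, the session that started it can detach right away, and a client can attach to the job again later.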
Features: Performance!!
• Automatic, two-level parallelism
– Direct Path for inter-partition parallelism
– External Tables for intra-partition parallelism
– Simple: parallel=<number of active threads>
– Dynamic: Workers can be added and removed from a running
job in Enterprise Edition
– Index builds automatically “parallelized” up to degree of job
• Simultaneous data and metadata unload
• Single thread of data unload: 1.5-2X exp
• Single thread of data load: 15X-40X imp
• With index builds: 4-10X imp
Features: Checkpoint / Restart
• Job progress recorded in a “Master Table”
• May be explicitly stopped and restarted later:
– Stop after current item finishes or stop immediate
• Abnormally terminated job is also restartable
• Current objects can be skipped on restart if
problematic
Features: Monitoring
• Flexible GET_STATUS call
• Per-worker status showing current object and
percent done
• Initial job space estimate and overall percent done
• Job state and description
• Work-in-progress and errors
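Besides the GET_STATUS call, a quick overview of jobs is also available from the data dictionary. The query below is a sketch of that alternative route, not of GET_STATUS itself, and it needs access to the DBA views.

```sql
-- Sketch: lists Data Pump jobs known to the database and their state.
SELECT owner_name,
       job_name,
       operation,
       job_mode,
       state,
       degree,
       attached_sessions
FROM   dba_datapump_jobs
ORDER  BY owner_name, job_name;
```

The view reports job-level state only; per-worker detail still comes from GET_STATUS or the STATUS command of the clients.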
Features: Network Mode
• Network import: Load one database
directly from another
• Network export: Unload a remote database to a local
dumpfile set
– Allows export of read-only databases
• Data Pump runs locally, Metadata API runs remotely.
• Uses DB links / listener service names, not pipes. Data
is moved as ‘insert into <local table> select from <remote
table>@service_name’
• Direct path engine is used on both ends
• It’s easy to swamp network bandwidth: Be careful!
Features: Fine-Grained Object Selection
• All object types are supported for both operations:
export and import
• Exclude: Specified object types are excluded from the
operation
• Include: Only the specified object types are included.
E.g., just retrieve packages, functions and procedures
• More than one of each can be specified, but use of both
is prohibited by new clients
• Both take an optional name filter for even finer
granularity:
– INCLUDE=PACKAGE:"LIKE 'PAYROLL%'"
– EXCLUDE=TABLE:"IN ('FOO','BAR', … )"
New Clients – expdp / impdp
• Similar (but not identical) look and feel to exp / imp
• All modes supported: full, schema, table, tablespace,
transportable. Superset of exp / imp
• Flashback is supported
• Query supported by both expdp and impdp… and on a
per-table basis!
• Detach from and attach to running jobs
• Multiple clients per job allowed; but a single client can
attach to only one job at a time
• If privileged, attach to and control other users’ jobs
New Clients – expdp / impdp
• Interactive mode entered via Ctrl-C:
– ADD_FILE: Add dump files and wildcard specs to the job
– PARALLEL: Dynamically add or remove workers
– STATUS: Get detailed per-worker status and change the reporting
interval
– STOP_JOB[=IMMEDIATE]: Stop the job, leaving it restartable.
IMMEDIATE doesn't wait for workers to finish their current work
items… they'll be re-done at restart
– START_JOB: Restart a previously stopped job
– KILL_JOB: Stop the job and delete all its resources (master table,
dump files), leaving it unrestartable
– CONTINUE_CLIENT: Leave interactive mode, continue logging
– EXIT_CLIENT: Exit the client, leave the job running
Features: Other Cool Stuff…

• Can extract and load just data, just metadata or both


• SQLFILE operation generates executable DDL script
• If a table pre-exists at load time, you can: skip it
(default), replace it, truncate then load or append to it.
• Space estimates based on allocated blocks (default) or
statistics if available
• Enterprise Manager interface integrates 9i and 10g
• Callable!
Large Internet Company
2 fact tables: 16.2M rows, 2 GB

Program                                             Elapsed
--------------------------------------------------  ------------------
exp out of the box: direct=y                        0 hr 10 min 40 sec
exp tuned: direct=y buffer=2M recordlength=64K      0 hr 04 min 08 sec
expdp out of the box: Parallel=1                    0 hr 03 min 12 sec
imp out of the box                                  2 hr 26 min 10 sec
imp tuned: buffer=2M recordlength=64K               2 hr 18 min 37 sec
impdp out of the box: Parallel=1                    0 hr 03 min 05 sec
Keep in Mind:

• Designed for *big* jobs with lots of data.


– Metadata performance is about the same
– More complex infrastructure, longer startup
• XML is bigger than DDL, but much more flexible
• Data format in dump files is ~15% more
compact than exp
• Import subsetting is accomplished by pruning
the Master Table
Original exp and imp

• Original imp will be supported forever to allow loading of
V5 – V9i dump files
• Original exp will ship at least in 10g, but may
not support all new functionality.
• 9i exp may be used for downgrades from 10g
• Original and Data Pump dump file formats are
not compatible
FIN

Thank You

hadar@hi-tech.co.il
toledano@hi-tech.co.il
