Combined Con6359

Querying Oracle Table from Hive and Querying HDFS from PL/SQL
CON6359
Kuassi Mensah, Director, Product Management, Oracle Server Technologies
Nicholas Van Wyen, CTO, MTI
September 19, 2016
Copyright 2016, Oracle and/or its affiliates. All rights reserved.

Safe Harbor Statement
The following is intended to outline our general product direction. It is
intended for information purposes only, and may not be incorporated
into any contract. It is not a commitment to deliver any material, code,
or functionality, and should not be relied upon in making purchasing
decisions. The development, release, and timing of any features or
functionality described for Oracle's products remains at the sole
discretion of Oracle.

Program Agenda
1. Querying Oracle Table from Hadoop/Hive
2. Querying Hadoop/HDFS from PL/SQL

Querying Oracle Table from Hive: Oracle Datasource for Hadoop
Kuassi Mensah
Director, Product Management
Oracle Server Technologies
September 19, 2016

Agenda
1. Big Data Analytics & Requirements
2. Oracle Datasource for Hadoop (OD4H)
3. Summary and Demo

Big Data Analytics
Goal: furnish actionable information to support business decision-making.
Example: Which of our products got a rating of four stars or higher on social media in the last quarter?

Big Data Analytics and Requirements
Big Data (Weblogs, Facts, Scans, Events, IoT)
Master Data

Two Approaches for Accessing Master Data in Oracle
ETL Copy: Oracle -> Hadoop
  Preplanned/scheduled
  What to copy and when?
  Always behind
  Copy is protected using Hadoop file-level security
  Oracle CopyToBDA 2.0
Direct Access
  Ad-hoc queries, always current
  Hive SQL, Spark SQL, Impala*, other SQL engines
  Hadoop APIs
  Oracle database security
  Oracle Big Data SQL (not covered here)
  Oracle Datasource for Hadoop (OD4H)
  Hive-ODCI (part II of this presentation)
Direct Access From Hadoop
Example: Hive query joining tables from Big Data and Oracle
SELECT HadoopT.First_Name, HadoopT.Last_Name, OracleT.bonus
FROM HadoopT join OracleT on (HadoopT.Emp_ID = OracleT.Emp_ID)
WHERE salary > 70000 and bonus > 7000;

Program Agenda
2. Oracle Datasource for Hadoop & Spark (OD4H)
3. Summary and Demo

Hadoop 2.0 Architecture
[Diagram: Compute layer with Hive SQL, Batch (MapReduce), Big Data SQL, Spark (In-Memory) and Mahout (ML libs); Resources layer with the YARN scheduler; Data access through HCatalog, InputFormat and StorageHandler; Storage layer with HDFS (redundant storage), NoSQL, and an External Table + Handler over Oracle table(s)]

Direct, parallel, fast, secure and consistent (SCN) access to master data
[Diagram: Hive, Impala, Spark, Mahout and other engines running on YARN access the Oracle Database through HCatalog, StorageHandler and InputFormat]
Oracle Table as Hive External Table
DDL
CREATE EXTERNAL TABLE Hadoop_employees (
  EMPLOYEE_ID INT, FIRST_NAME STRING, LAST_NAME STRING, SALARY DOUBLE, ...)
STORED BY 'oracle.hcat.osh.storagehandler.OracleStorageHandler'
TBLPROPERTIES (
  ...
  'mapreduce.jdbc.input.table.name' = 'EMPLOYEES',
  ...
);
Parallel Access to Oracle Table: Splitter Patterns
SINGLE_SPLITTER
ROW_SPLITTER: the number of rows per split is set in oracle.hcat.osh.rowsPerSplit
BLOCK_SPLITTER: the maximum number of splits is directed by oracle.hcat.osh.maxStorageBasedSplits
PARTITION_SPLITTER
CUSTOM_SPLITTER: a user-defined SELECT statement, set in oracle.hcat.osh.chunkSQL, that emits the ROWIDs corresponding to the start and end of each split
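The splitter patterns above decide how the rows of the Oracle table are divided among parallel tasks. The Python sketch below illustrates the arithmetic behind two of them under simple assumptions: ROW_SPLITTER is modeled as consecutive row ranges of at most `rowsPerSplit` rows, and BLOCK_SPLITTER as storage blocks spread evenly over at most `maxStorageBasedSplits` splits. The functions `row_splits` and `block_splits` are hypothetical and not part of OD4H, which computes its splits from database metadata.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RowSplit:
    first_row: int  # 1-based, inclusive
    last_row: int   # inclusive


def row_splits(total_rows: int, rows_per_split: int) -> list[RowSplit]:
    """ROW_SPLITTER-style (hypothetical): consecutive ranges of at most rows_per_split rows."""
    if total_rows < 0 or rows_per_split <= 0:
        raise ValueError("need total_rows >= 0 and rows_per_split > 0")
    splits = []
    start = 1
    while start <= total_rows:
        end = min(start + rows_per_split - 1, total_rows)
        splits.append(RowSplit(start, end))
        start = end + 1
    return splits


def block_splits(total_blocks: int, max_splits: int) -> list[tuple[int, int]]:
    """BLOCK_SPLITTER-style (hypothetical): spread storage blocks over at most
    max_splits splits; returns 0-based inclusive (first_block, last_block) ranges."""
    if total_blocks < 0 or max_splits <= 0:
        raise ValueError("need total_blocks >= 0 and max_splits > 0")
    if total_blocks == 0:
        return []
    n = min(max_splits, total_blocks)
    base, extra = divmod(total_blocks, n)
    ranges, first = [], 0
    for i in range(n):
        size = base + (1 if i < extra else 0)  # the first `extra` splits get one more block
        ranges.append((first, first + size - 1))
        first += size
    return ranges
```

For example, 10 rows with `rowsPerSplit` = 4 give three splits (rows 1-4, 5-8 and 9-10), and 10 blocks capped at 3 splits give ranges of 4, 3 and 3 blocks.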
OD4H Steps
[Diagram: a Hive or Spark query on the Hadoop or Spark cluster hands part of its execution plan to OD4H]
1. Gets a secure connection to the database
2. Generates database splits (DLDL) with an SCN
3. Rewrites HiveQL or Spark SQL into Oracle SQL for each split
4. Each split is processed by a Hadoop/Spark task
5. Matching rows are returned to the Hadoop/Spark query coordinator
Oracle Confidential
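Steps 2 and 3 can be illustrated with a small string-building sketch. It assumes that every split reads the same snapshot through an Oracle flashback query (`AS OF SCN`) and that each split is bounded by a ROWID range, as with CUSTOM_SPLITTER. The slides do not show the exact SQL that OD4H generates, so the query shape, the function names and the ROWID values are all hypothetical. The column list and the WHERE clause show where column projection pushdown and predicate pushdown would land.

```python
from __future__ import annotations


def rewrite_for_split(table: str, columns: list[str], predicate: str | None,
                      scn: int, rowid_range: tuple[str, str]) -> str:
    """Build the Oracle SQL that one task might run for its split (hypothetical shape).

    columns:     only the columns the Hive query needs (projection pushdown)
    predicate:   the Hive WHERE clause translated to Oracle SQL (predicate pushdown)
    scn:         one SCN shared by every split, so all tasks read the same snapshot
    rowid_range: hypothetical start/end ROWIDs bounding this split
    """
    lo, hi = rowid_range
    conditions = [f"ROWID BETWEEN CHARTOROWID('{lo}') AND CHARTOROWID('{hi}')"]
    if predicate:
        conditions.append(f"({predicate})")
    return (f"SELECT {', '.join(columns)} FROM {table} AS OF SCN {scn} "
            f"WHERE {' AND '.join(conditions)}")


def rewrite_all(table: str, columns: list[str], predicate: str | None,
                scn: int, rowid_ranges: list[tuple[str, str]]) -> list[str]:
    """Step 3 applied to every split produced in step 2, all pinned to one SCN."""
    return [rewrite_for_split(table, columns, predicate, scn, r) for r in rowid_ranges]
```

Only trusted, already-validated identifiers should be interpolated like this; in real code, values belong in bind variables.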
Putting Everything Together
[Diagram: Hive DDL registers the Oracle table in HCatalog; the Oracle Storage Handler produces a rewritten Oracle query for each granule split; the Job Tracker runs a MapReduce job with one MapTask per split]
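Steps 4 and 5, and the MapReduce job in the diagram, follow a scatter/gather pattern: one task per split, with the results merged by a coordinator. This Python sketch models that pattern with threads standing in for MapTasks and in-memory lists standing in for splits of the Oracle table. `map_task` and `run_job` are hypothetical names, and the `salary > min_salary` filter stands in for whatever predicate the rewritten query carries.

```python
from concurrent.futures import ThreadPoolExecutor


def map_task(split_rows, min_salary):
    """One MapTask: scan its own split and keep the matching (emp_id, salary) rows."""
    return [row for row in split_rows if row[1] > min_salary]


def run_job(splits, min_salary, workers=4):
    """Job-tracker role: run one task per split in parallel, then let the
    coordinator concatenate the partial results in split order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_results = list(pool.map(map_task, splits, [min_salary] * len(splits)))
    return [row for part in partial_results for row in part]
```

`Executor.map` returns results in input order, so the merged output stays ordered by split even when tasks finish out of order.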

Program Agenda
3. Summary and Demo

OD4H Summary
Support for Hadoop & Spark query engines: Hive SQL, Spark SQL, Impala*
Support for Hadoop programming models: Pig, MapReduce, etc.
Secure and reliable authentication: Kerberos authentication, SSL, Oracle Wallet
Efficient translation of HiveQL (HQL) to Oracle SQL
Scalability: splits based on database metadata
Column projection pushdown
Predicate pushdown
Partition pruning
Connection caching
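Partition pruning means that splits are only generated for partitions whose key range can satisfy the pushed-down predicate. The sketch below assumes range partitioning on a date column with Oracle-style exclusive upper bounds (`VALUES LESS THAN`) and a predicate of the form `key BETWEEN lo AND hi`. The class, the function and the partition names are hypothetical; this is not OD4H's implementation.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RangePartition:
    name: str
    low: date   # inclusive lower bound
    high: date  # exclusive upper bound, like Oracle's VALUES LESS THAN


def prune(partitions: list[RangePartition], lo: date, hi: date) -> list[str]:
    """Return the partitions that can contain keys with lo <= key <= hi;
    every other partition is skipped without being read."""
    return [p.name for p in partitions if p.low <= hi and p.high > lo]
```

With quarterly partitions for 2016, a predicate from 15 May to 10 July touches only Q2 and Q3, so Q1 and Q4 produce no splits at all.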
OpenWorld 2016
Querying Hadoop/HDFS from PL/SQL
CON6359
Nicholas Van Wyen
MTI
September 18, 2016
Confidential Oracle Internal/Restricted/Highly Restricted
Agenda
1. The Real World
2. The Problem
3. The Solution
4. Considerations
5. Questions

Let's get started

The Real World
Different solutions for different requirements
[Diagram: Spark, Analytics, Sqoop, ES, C, C++, C#, Pro*C, Java, PL/SQL]

Moving on

The Problem
Changes
[Diagram labels: {;}, PL/SQL, Application, available storage capacity]
The Problem
Example
$ beeline -u jdbc:hive2://hive.corp.com:10000 \
          -n oracle -w welcome1.passwd
0: jdbc:hive2://localhost:10000> desc user_log;
+------------------+------------+---------+
| col_name         | data_type  | comment |
+------------------+------------+---------+
| stamp            | date       |         |
| account          | string     |         |
| message          | string     |         |
+------------------+------------+---------+

SQL> desc SCOTT.USER_LOG
 Name       Null?    Type
 ---------- -------- ---------------
 STAMP      NOT NULL DATE
 ACCOUNT             VARCHAR2(30)
 MESSAGE             VARCHAR2(4000)

procedure user_report( p out xmltype ) is
begin
  for rec in ( select account,
                      message
                 from scott.user_log
                order by account ) loop
  ...
end user_report;

create view user_log_monthly
as
select stamp,
       account,
       message
  from scott.user_log
 where stamp between sysdate - 30
                 and sysdate;

Next

The Solution
Introduction
Presenting Hive-ODCI
Built on the Oracle Data Cartridge Interface (ODCI)
Inspired by DBPrism and internal projects using ODCI
Initial Requirements
Dynamically access Hadoop/Hive from within the Oracle 12c RDBMS
Allow for first-class Oracle objects
Leverage existing RBAC
Support active bind variables: user-defined, static, or saved
Support Oracle SQL and PL/SQL
Easy to use for developers and administrators
The Solution
Overview
[Diagram: a client Application sends select/DML/DDL to the database; in the database, PL/SQL and ODCI objects (session, hive_t, binding, hive_q, view) use the Hive JDBC driver (hive.jar, org.apache.hive.jdbc.HiveDriver) in the OJVM to reach the Hive server on Hadoop]
The Solution
Example
[Diagram: Hive-ODCI components hive_t, hive_q, session, binding and param; data between Hive-ODCI and the Application is pipelined, parallel and bidirectional]
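Hive-ODCI is used as `table( hive_q( ... ) )`, and the slide labels its data flow as pipelined. In Oracle, a pipelined table function hands rows to the calling query (via `PIPE ROW`) as they are produced rather than after the whole result is built. The Python generator below is only a rough analogy under that assumption; `fetch_batch` is a hypothetical stand-in for a remote Hive fetch and is not part of the Hive-ODCI API.

```python
from __future__ import annotations

from typing import Callable, Iterator, List


def pipelined(fetch_batch: Callable[[int, int], List[str]],
              batch_size: int = 2) -> Iterator[str]:
    """Yield rows as each batch arrives instead of materializing the full
    result first, loosely like PIPE ROW in an Oracle pipelined function."""
    offset = 0
    while True:
        batch = fetch_batch(offset, batch_size)  # hypothetical remote fetch
        if not batch:
            return
        yield from batch
        offset += len(batch)
```

The consumer receives its first row after a single fetch; later batches are requested only as the consumer keeps reading.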

The Solution
Example
1. Set the Hive connection parameters
param( 'hive_jdbc_url', 'jdbc:hive2://hive.corp.com:10000' );
param( 'hive_jdbc_url.1', 'user=oracle' );
param( 'hive_jdbc_url.2', 'password=welcome1' );

2. Replace the table with a view over Hive
create or replace view scott.user_log
( stamp, account, message )
as
select *
  from table( hive_q( q'[ select stamp,
                                 account,
                                 message
                            from user_log
                           order by stamp ]' ) );

3. Recompile the unchanged procedure
procedure user_report( p out xmltype ) is
begin
  for rec in ( select account,
                      message
                 from scott.user_log
                order by account ) loop
  ...
end user_report;

SQL> alter procedure scott.user_report compile;
Procedure altered.

The Solution
Example
4.
create or replace view scott.user_log_monthly
( stamp, account, message )
as
select *
  from table( hive_q( q'[ select stamp,
                                 account,
                                 message
                            from user_log
                           where stamp between ? and ? ]',
                      hive_binds( hive_bind( to_char( sysdate - 30, 'yyyy-mm-dd' ),
                                             1 /* type_date */,
                                             1 /* ref_in */ ),
                                  hive_bind( to_char( sysdate, 'yyyy-mm-dd' ),
                                             1 /* type_date */,
                                             1 /* ref_in */ ) ) ) );
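The view above passes two IN binds whose order matches the two `?` placeholders. To show that positional contract, here is a hypothetical Python sketch that renders the final HiveQL text by substituting literals client-side. It is a simplification for illustration, not how Hive-ODCI passes `hive_binds`. The type tags, `Bind`, `render` and `last_30_days` are assumptions, and the quoting assumes backslash escaping inside single-quoted Hive string literals.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, timedelta

TYPE_STRING, TYPE_DATE, TYPE_NUMBER = "string", "date", "number"  # hypothetical tags


@dataclass(frozen=True)
class Bind:
    value: str
    kind: str


def to_literal(b: Bind) -> str:
    if b.kind == TYPE_NUMBER:
        float(b.value)  # raises ValueError if the value is not numeric
        return b.value
    # Strings and 'yyyy-mm-dd' dates become quoted literals; escape \ and '.
    escaped = b.value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def render(query: str, binds: list[Bind]) -> str:
    """Replace each ? with the next bind, in order.
    Assumes no literal ? appears inside quoted strings in the query."""
    parts = query.split("?")
    if len(parts) - 1 != len(binds):
        raise ValueError(f"{len(parts) - 1} placeholders but {len(binds)} binds")
    out = [parts[0]]
    for b, tail in zip(binds, parts[1:]):
        out.append(to_literal(b))
        out.append(tail)
    return "".join(out)


def last_30_days(today: date) -> list[Bind]:
    """Same window as to_char(sysdate - 30, ...) and to_char(sysdate, ...)."""
    fmt = "%Y-%m-%d"  # matches the 'yyyy-mm-dd' mask
    return [Bind((today - timedelta(days=30)).strftime(fmt), TYPE_DATE),
            Bind(today.strftime(fmt), TYPE_DATE)]
```

For 19 September 2016 the window renders as `'2016-08-20'` to `'2016-09-19'`, matching the `sysdate - 30` and `sysdate` bounds formatted with `'yyyy-mm-dd'`.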

The Solution
Example
5.
create or replace trigger scott.user_log_dml
instead of insert or update or delete on scott.user_log
for each row
declare
  cmd varchar2( 4000 );
  bnd hive_binds := hive_binds();
begin
  if ( inserting ) then
    cmd := q'[ insert into user_log
                 ( stamp, account, message )
               values
                 ( ?, ?, ? ) ]';

    bnd.extend;
    bnd( bnd.count ) := hive_bind( to_char( :new.stamp, 'yyyy-mm-dd' ),
                                   hive_binding.type_date,
                                   hive_binding.ref_in );
    ...
  elsif ( updating ) then
    ...
  end if;

  hive_remote.dml( cmd, bnd );
end user_log_dml;
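The trigger's job is to turn each row change on the view into a remote statement plus binds in placeholder order. The sketch below mirrors only the `inserting` branch, in Python. The slide elides the binds after the first one, so the `account` and `message` binds typed as strings are assumptions based on the `desc user_log` output from the problem example. `on_insert` is a hypothetical name, and nothing here reproduces `hive_remote.dml`.

```python
from __future__ import annotations

from datetime import datetime

# Command text as in the trigger's inserting branch.
INSERT_CMD = " insert into user_log ( stamp, account, message ) values ( ?, ?, ? ) "


def on_insert(stamp: datetime, account: str, message: str) -> tuple[str, list[tuple[str, str]]]:
    """Return the remote command and its ordered (value, type) binds for one
    inserted row; the list order must match the ? placeholders."""
    binds = [
        (stamp.strftime("%Y-%m-%d"), "date"),  # to_char(:new.stamp, 'yyyy-mm-dd')
        (account, "string"),  # assumption: this bind is elided on the slide
        (message, "string"),  # assumption: this bind is elided on the slide
    ]
    if INSERT_CMD.count("?") != len(binds):
        raise ValueError("placeholder/bind count mismatch")
    return INSERT_CMD, binds
```

Note that the `'yyyy-mm-dd'` mask drops the time of day from the Oracle `DATE`, so a row stamped at 14:30 arrives in Hive as a plain date.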

Wrapping it up

Considerations
In Oracle
Become familiar with the Hive-ODCI API
Read the documentation
Ask questions and test, test, test
Use session isolation whenever possible
Particularly for authentication: set it at the session level, not the system level
Lean on your experience and your DBA team
Keep signatures consistent
Change code if necessary
Become familiar with the database wait events
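One way to picture "set at the session, not the system" is a two-level lookup: shared, non-secret defaults at the system level, and credentials supplied per session that shadow them without ever being written back. The Python sketch below uses `collections.ChainMap` for that. `SYSTEM_PARAMS` and `open_session` are hypothetical, and Hive-ODCI's own parameter scoping may work differently.

```python
from collections import ChainMap

# Hypothetical system-wide defaults: shared, non-secret settings only.
SYSTEM_PARAMS = {"hive_jdbc_url": "jdbc:hive2://hive.corp.com:10000"}


def open_session(overrides: dict) -> ChainMap:
    """Session values shadow system defaults. Writes to the returned ChainMap
    land in the session's own dict, never in SYSTEM_PARAMS."""
    return ChainMap(dict(overrides), SYSTEM_PARAMS)
```

Two sessions can then authenticate as different users against the same URL, and neither credential leaks into the shared defaults.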
Considerations
In Hive
Analytics over in-line views
Review queries and use common sense
Leverage the cost-based optimizer (CBO) and gather statistics
Use best practices
ORCFile: Optimized Row Columnar file format, a highly efficient Hive storage format
Apache Tez: an extensible framework for high-performance batch and interactive processing, coordinated by YARN; it is substantially faster than MapReduce while keeping the ability to scale
Vectorized queries: a Hive feature that greatly reduces CPU usage for scans, filters, aggregates, and joins by processing rows in batches instead of one at a time, which avoids metadata interpretation in the inner loop of execution code paths
Lean on your experience and your BDA team
Considerations
Reach out
If you have questions, concerns or comments
Feel free to contact me
Available on GitHub
https://github.com/nvanwyen/hive-odci
https://github.com/nvanwyen/hive-odci/releases/latest
Contact
nvanwyen@mtihq.com
That's it

