Hive/HBase Integration
or, MaybeSQL?
April 2010
John Sichi
Facebook
Agenda
» Use Cases
» Architecture
» Storage Handler
» Load via INSERT
» Query Processing
» Bulk Load
» Q&A
Motivations
How Can HBase Help?
Use Case 1: HBase As ETL Data Target
Use Case 2: HBase As Data Source
[Diagram labels: HBase, Other Files/Tables]
Use Case 3: Low Latency Warehouse
[Diagram labels: Continuous Update, HBase, Hive Queries, Other Files/Tables, Periodic Load]
HBase Architecture
All Together Now!
Hive CLI With HBase
hive \
--auxpath hive_hbase-handler.jar,hbase.jar,zookeeper.jar \
-hiveconf hbase.zookeeper.quorum=zk1,zk2…
Storage Handler
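The slide body did not survive extraction. As a minimal sketch of how a storage handler is attached to a table, the statement below follows the pattern documented on the HBaseIntegration wiki page. The table name `hbase_users`, its columns, the column family `cf`, and the HBase table name are assumptions chosen for illustration, and the `:key` mapping token assumes a Hive release that supports it.

```sql
-- Hypothetical table: names and columns are illustrative only.
CREATE TABLE hbase_users(user_id string, user_type string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  -- Map user_id to the HBase row key and user_type to column cf:user_type.
  "hbase.columns.mapping" = ":key,cf:user_type"
)
TBLPROPERTIES ("hbase.table.name" = "users");
```

The `STORED BY` clause is what routes reads and writes for this table through the handler rather than through Hive's native file formats.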
Load Via INSERT
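A sketch of what such a load might look like, assuming an HBase-backed Hive table named `hbase_users` already exists and that `users_src` is an ordinary Hive table with matching columns (both names are hypothetical):

```sql
-- Hypothetical tables: hbase_users is HBase-backed, users_src is a native Hive table.
-- Each selected row is written to HBase as a put through the storage handler.
INSERT OVERWRITE TABLE hbase_users
SELECT user_id, user_type
FROM users_src;
```

Because rows are written as individual puts, this path suits modest data volumes; the bulk load procedure later in the deck targets larger loads.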
Map-Reduce Job for INSERT
HBase
From http://blog.maxgarfinkel.com/wp-uploads/2010/02/mapreduceDIagram.png
Map-Only Job for INSERT
HBase
Query Processing
Metastore Integration
Bulk Load
Ideally…
SET hive.hbase.bulk=true;
INSERT OVERWRITE TABLE users SELECT … ;
But for now, you have to do some work and issue multiple Hive commands:
1. Sample source data for range partitioning
2. Save sampling results to a file
3. Run CLUSTER BY query using HiveHFileOutputFormat and TotalOrderPartitioner (sorts data, producing a large number of region files)
4. Import HFiles into HBase
5. HBase can merge files if necessary
Range Partitioning During Sort
[Diagram labels: TotalOrderPartitioner with split keys (H) and (R); key ranges A-G, H-Q, R-Z; loadtable.rb; HBase]
Sampling Query For Range Partitioning
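The query on this slide was lost in extraction. The sketch below shows one way to pick split keys, in the spirit of the HBaseBulkLoad wiki page. It assumes a hypothetical source table `users_src` keyed by `user_id`; the jar path, sampling rate, row limit, and stride are all placeholders. With 12 reduce tasks in the sorting step, 11 split keys are needed.

```sql
-- Assumption: the jar path is a placeholder for your hive-contrib jar.
add jar /path/to/hive-contrib.jar;
-- A single reducer so that row_sequence() numbers rows in one global order.
set mapred.reduce.tasks=1;
create temporary function row_sequence as
  'org.apache.hadoop.hive.contrib.udf.UDFRowSequence';

-- Sample the keys, sort them, then keep every Nth key as a range boundary.
select user_id from
  (select user_id
   from users_src tablesample(bucket 1 out of 100 on user_id) s
   order by user_id
   limit 1000000) x
where (row_sequence() % 90000) = 0   -- hypothetical stride; tune to the sample size
order by user_id
limit 11;                            -- one fewer than the number of reducers
```

The stride should be chosen so that the sampled rows divided by the stride yields at least as many keys as the `limit` asks for; otherwise some reducers receive no range.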
Sorting Query For Bulk Load
set mapred.reduce.tasks=12;
set hive.mapred.partitioner=org.apache.hadoop.mapred.lib.TotalOrderPartitioner;
set total.order.partitioner.path=/tmp/hb_range_key_list;
set hfile.compression=gz;
create table hbsort(user_id string, user_type string, ...)
stored as inputformat 'org.apache.hadoop.mapred.TextInputFormat'
outputformat 'org.apache.hadoop.hive.hbase.HiveHFileOutputFormat'
tblproperties ('hfile.family.path' = '/tmp/hbsort/cf');
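With those settings and the `hbsort` table in place, the sort itself is an ordinary insert with a CLUSTER BY on the row key. A sketch, assuming `hbsort` has exactly the two columns shown and that `users_src` is a hypothetical source table:

```sql
-- Hypothetical source table users_src; hbsort is assumed to have only these two columns.
-- CLUSTER BY sends each key range to one reducer and sorts within it,
-- so each reducer writes HFiles covering a single contiguous key range.
insert overwrite table hbsort
select user_id, user_type
from users_src
cluster by user_id;
```

The resulting files land under the `hfile.family.path` directory, ready for the import step.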
Deployment
Questions?
» hive-user@hadoop.apache.org
» jsichi@facebook.com
» http://wiki.apache.org/hadoop/Hive/HBaseIntegration
» http://wiki.apache.org/hadoop/Hive/HBaseBulkLoad