Deepak Sharma
September-09-2015
Why Sqoop?
A way is needed to transfer processed data from HDFS to an RDBMS
Parallelism (multiple mapper processes) is needed when loading data into an RDBMS
Some applications need to move data from an RDBMS to Hadoop
Using hand-written scripts to transfer data is inefficient and time-consuming
Sqoop Installation
Prerequisite: the machine must have a Hadoop server or client installed and configured (for example, one of the slave nodes).
Download sqoop-1.4.4.bin, or the latest release that matches your Hadoop version, from a mirror listed at
http://sqoop.apache.org/
Untar the downloaded file
tar xvzf sqoop-1.4.4.bin_hadoop-1.0.0.tar.gz
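The untar step can be wrapped in a small helper that also sets SQOOP_HOME and PATH. This is a minimal sketch, not the official install procedure: the function name install_sqoop, the install directory, and the assumption that the archive unpacks into one top-level directory named after the tarball are all illustrative.

```shell
#!/bin/sh
# Sketch: unpack a Sqoop tarball and expose its bin directory on PATH.
# install_sqoop and the directory layout are assumptions for illustration.

install_sqoop() {
    tarball="$1"        # path to the downloaded .tar.gz
    install_dir="$2"    # directory to unpack into (hypothetical choice)

    [ -f "$tarball" ] || { echo "tarball not found: $tarball" >&2; return 1; }
    mkdir -p "$install_dir" || return 1
    tar xzf "$tarball" -C "$install_dir" || return 1

    # Assumption: the archive contains a single top-level directory
    # whose name is the tarball name without the .tar.gz suffix.
    SQOOP_HOME="$install_dir/$(basename "$tarball" .tar.gz)"
    export SQOOP_HOME
    PATH="$SQOOP_HOME/bin:$PATH"
    export PATH
    echo "SQOOP_HOME=$SQOOP_HOME"
}

# Example call (commented out so the sketch has no side effects):
# install_sqoop ./sqoop-1.4.4.bin_hadoop-1.0.0.tar.gz "$HOME/opt"
```

To keep the settings across sessions, the two export lines would normally go into the shell profile of the user who runs Sqoop.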
Sqoop Configuration
Sqoop Commands
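As a hedged illustration of the two transfer directions and the mapper parallelism described earlier, the sketch below only prints an import and an export command instead of running them, so it works without a cluster. The JDBC URL, table names, HDFS paths, user name, and the helper names import_cmd and export_cmd are hypothetical placeholders.

```shell
#!/bin/sh
# Sketch: print (dry run) the Sqoop commands for an import and an export.
# All connection details, tables, and paths are hypothetical placeholders.

JDBC_URL="jdbc:mysql://dbhost/sales"   # assumed database
DB_USER="scott"
MAPPERS=4                              # number of parallel map tasks

# RDBMS -> HDFS: each mapper pulls a slice of the table.
import_cmd() {
    echo sqoop import \
        --connect "$JDBC_URL" --username "$DB_USER" -P \
        --table orders --target-dir /data/orders \
        --num-mappers "$MAPPERS"
}

# HDFS -> RDBMS: each mapper writes part of the processed data.
export_cmd() {
    echo sqoop export \
        --connect "$JDBC_URL" --username "$DB_USER" -P \
        --table order_summary --export-dir /data/order_summary \
        --num-mappers "$MAPPERS"
}

import_cmd
export_cmd
```

Here -P makes Sqoop prompt for the password, and --num-mappers controls how many map tasks share the transfer.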
Sqoop Limitations
Sqoop has some limitations, including:
Error-prone syntax (cryptic, context-dependent command-line arguments)
Client-only architecture
Tight coupling to the JDBC model, which is not a good fit for non-RDBMS systems
Poor support for security
$ sqoop import --username scott --password tiger ..
Sqoop can read command-line options from an options file, but this still leaves a security hole
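One way to narrow the hole, sketched under assumptions: keep the connection options in an options file readable only by its owner, leave the password out of the file, and let Sqoop prompt for it with -P. The file location, connection values, and the helper name write_options_file are hypothetical; the command is printed rather than executed.

```shell
#!/bin/sh
# Sketch: owner-only options file without a password, plus a prompt via -P.
# Paths and connection values are hypothetical placeholders.

write_options_file() {
    opts_file="$1"
    # Create the file under a restrictive umask, one option or value per line.
    (
        umask 077
        printf '%s\n' \
            '# Options file: one option or value per line' \
            'import' \
            '--connect' \
            'jdbc:mysql://dbhost/sales' \
            '--username' \
            'scott' > "$opts_file"
    )
    # Also tighten permissions in case the file already existed.
    chmod 600 "$opts_file"
}

OPTS_FILE="${TMPDIR:-/tmp}/sqoop-import-options.txt"
write_options_file "$OPTS_FILE"

# -P asks for the password interactively instead of reading it from the file.
echo "sqoop --options-file $OPTS_FILE --table orders -P"
```

This keeps the password out of the process list and out of the file, but it does not suit unattended jobs, which is part of why the security model is listed as a limitation.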
Fortunately..
Sqoop 2 (incubating) will address many of these limitations
Add a web-based GUI.
Centralized configuration (client-server architecture)
More flexible model.
Improved security model
Sqoop 2 Architecture (Proposed)
QUESTIONS?