
What is Sqoop?

Sqoop is a tool for transferring data to and from relational databases. Taking advantage of MapReduce,
Hadoop's execution engine, Sqoop performs the transfers in parallel.

Advantages of Sqoop 2
The motivation behind Sqoop 2 was to make Sqoop easier to use by running it as a web
application. This lets you install Sqoop once and use it from anywhere.
Having a REST API for operation and management also lets Sqoop integrate better
with external systems such as Apache Oozie.

Installation options
Binary tarball
Package

Package benefits
Seamless integration with the operating system: for example, configuration files are
stored in /etc/ (the Sqoop configuration in /etc/sqoop/conf) and logs in /var/log.
Installing packages is simpler than using tarballs. Packages are already integrated with
the operating system and automatically download and install most of the
required dependencies during the Sqoop installation.

sudo yum install sqoop        RedHat/CentOS
sudo apt-get install sqoop    Ubuntu

Sqoop main configuration

sqoop-site.xml

Sqoop requires the JDBC driver for your specific database server (MySQL, Oracle,
etc.). Copy the driver JAR into Sqoop's lib directory:
lib/                  Tarball
/usr/lib/sqoop/lib    Package

Installing Specialized Connectors

Some database systems provide special connectors, which are not part of the Sqoop
distribution and which take advantage of advanced database features.
Connector registration goes under the Sqoop configuration directory:
/etc/sqoop    Package
/conf/        Tarball

connector.fully.qualified.class.name=/full/path/to/the/jar

Sqoop Command

sqoop TOOL PROPERTY_ARGS SQOOP_ARGS [-- EXTRA_ARGS]

TOOL: the operation to run, such as import or export
PROPERTY_ARGS: Java properties in the form -Dname=value
SQOOP_ARGS: Sqoop parameters
EXTRA_ARGS: arguments for specialized connectors
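
The four parts of the command line can be seen more easily when they are assembled one by one. The following is a minimal dry-run sketch: it only prints the resulting command and never calls Sqoop. The variable names are illustrative, and the --export-dir argument and its path are assumptions that do not appear in these notes.

```shell
#!/bin/sh
# Assemble a Sqoop command line from its four parts and print it.
# Dry run only: nothing is executed against a database or Hadoop.

TOOL="export"
PROPERTY_ARGS="-Dsqoop.export.records.per.statement=1"
# --export-dir and its path are assumptions added for illustration.
SQOOP_ARGS="--connect jdbc:mysql://mysql.example.com/sqoop --username sqoop -P --table cities --export-dir /etl/input/cities"
EXTRA_ARGS=""   # connector-specific arguments, if any, go after a bare "--"

CMD="sqoop $TOOL $PROPERTY_ARGS $SQOOP_ARGS"
if [ -n "$EXTRA_ARGS" ]; then
    # The "--" separator is only added when there is something to pass.
    CMD="$CMD -- $EXTRA_ARGS"
fi

echo "$CMD"
```

Note the order: the -D property arguments come directly after the tool name and before the Sqoop parameters.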

Transferring an Entire Table (import: RDBMS to HDFS)

sqoop import | import-all-tables | export
-Dsqoop.export.records.per.statement=1
--connect jdbc:mysql://mysql.example.com/sqoop
--username sqoop
--password sqoop

-P                                  Protect the password by prompting for it
--password-file my-sqoop-password   Protect the password with a password file
--table cities                      Import the table's content into HDFS
--where "country = 'USA'"           Import a subset of the data
--map-column-java id=Long           Override the Java type of a column
--as-sequencefile                   Binary format (SequenceFile)
--as-avrodatafile                   Binary format (Avro)
--compress                          Compress the imported data
--direct                            Speed up the transfer
--num-mappers 10                    Control parallelism
--null-string '\\N'                 Encode NULL values in string columns
--null-non-string '\\N'             Encode NULL values in non-string columns
--exclude-tables cities,countries   Exclude tables when importing all tables
--target-dir /etl/input/cities      Specify the target directory
--warehouse-dir /etl/input/         Specify the parent directory for all your jobs
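
Several of the options above are usually combined in one command. The sketch below, which is only one possible arrangement, creates a password file and prints an import command that uses it. It is a dry run and does not call Sqoop. The temporary directory and variable names are assumptions for illustration, and whether Sqoop resolves the password file path locally or on HDFS depends on your setup.

```shell
#!/bin/sh
# Create a password file and print (not run) an import command that uses it.

# Hypothetical location for the password file; any private directory works.
SECRET_DIR=$(mktemp -d)
PASSWORD_FILE="$SECRET_DIR/my-sqoop-password"

# printf writes the password without a trailing newline, so the file
# holds exactly the password and nothing else.
printf '%s' 'sqoop' > "$PASSWORD_FILE"
chmod 400 "$PASSWORD_FILE"   # readable by the owner only

# Dry run: build the command as a single string and print it.
IMPORT_CMD="sqoop import \
--connect jdbc:mysql://mysql.example.com/sqoop \
--username sqoop \
--password-file $PASSWORD_FILE \
--table cities \
--where \"country = 'USA'\" \
--num-mappers 10 \
--target-dir /etl/input/cities"

echo "$IMPORT_CMD"
```

The password file replaces both --password and -P, so the password never appears on the command line or in the shell history.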

Incremental loading
sqoop import \
--connect jdbc:mysql://mysql.example.com/sqoop \
--username sqoop \
--password sqoop \
--table visits \
--incremental append \
--check-column id \
--last-value 1
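
In append mode, --check-column names the column Sqoop inspects, and only rows whose value in that column is greater than --last-value are imported. The sketch below simulates that selection on a few hypothetical rows with awk. It does not call Sqoop, and the sample data is invented purely for illustration.

```shell
#!/bin/sh
# Simulate the row selection of "--incremental append" on sample data.

# Hypothetical rows of the visits table, in the form "id,city".
ROWS="1,Palo Alto
2,Brno
3,Berlin"

LAST_VALUE=1   # corresponds to --last-value 1

# Keep only rows whose check column (field 1, the id) exceeds LAST_VALUE.
NEW_ROWS=$(printf '%s\n' "$ROWS" | awk -F, -v last="$LAST_VALUE" '$1 + 0 > last + 0')

# The highest id just imported is the --last-value for the next run.
NEXT_LAST_VALUE=$(printf '%s\n' "$NEW_ROWS" | awk -F, '$1 + 0 > max { max = $1 + 0 } END { print max }')

echo "$NEW_ROWS"
echo "next --last-value: $NEXT_LAST_VALUE"
```

Here rows 2 and 3 are selected, and the next run would start from 3. Tracking this value by hand is error-prone, which is what saved jobs are for.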

Incrementally Importing Mutable Data


sqoop import \
--connect jdbc:mysql://mysql.example.com/sqoop \
--username sqoop \
--password sqoop \
--table visits \
--incremental lastmodified \
--check-column last_update_date \
--last-value "2013-05-22 01:01:01"

Preserving the Last Imported Value


sqoop job \
--create visits \
-- \
import \
--connect jdbc:mysql://mysql.example.com/sqoop \
--username sqoop \
--password sqoop \
--table visits \
--incremental append \
--check-column id \
--last-value 0

sqoop job --exec visits
sqoop job --list
sqoop job --delete visits
sqoop job --show visits

Storing Passwords in the Metastore


vi sqoop-site.xml
<configuration>
<property>
<name>sqoop.metastore.client.record.password</name>
<value>true</value>
</property>
</configuration>

Overriding the Arguments to a Saved Job


sqoop job --exec visits -- --verbose

Sharing the Metastore Between Sqoop Clients


sqoop metastore

sqoop job \
--create visits \
--meta-connect jdbc:hsqldb:hsql://metastore.example.com:16000/sqoop \
-- \
import \
--table visits
<configuration>
<property>
<name>sqoop.metastore.client.autoconnect.url</name>
<value>jdbc:hsqldb:hsql://your-metastore:16000/sqoop</value>
</property>
</configuration>
