Hive and Impala

Big Data Hadoop and Spark Developer
Lesson 4—Basics of Hive and Impala
© Simplilearn. All rights reserved.

Learning Objectives
Identify the features of Hive and Impala
Understand the methods to interact with Hive and Impala

Basics of Hive and Impala
Topic 1: Features of Hive and Impala
Introduction to Hive and Impala
• Hive and Impala provide an SQL-like interface for users to extract data from the Hadoop system.
• They reside on top of Hadoop and can be used to query data from the underlying storage components.
SELECT t1.a1 AS c1, t2.b1 AS c2
FROM t1 JOIN t2 ON (t1.a2 = t2.b2);
[Diagram: Hadoop stack with processing (batch, interactive SQL), resource management, and storage (HDFS, HBase) layers]
Hive and Impala: Similarities
• Hive is very similar to Impala in the following ways:

Hive and Impala: Differences
Hive:
• Hive was developed by Facebook around 2007.
• It is an open source Apache project.
• It has a high-level abstraction layer on top of MapReduce and Apache Spark.
• It uses HiveQL to query the structured data in a metastore.
• It is suitable for structured data.
Impala:
• Impala was developed by Cloudera around 2012.
• It is an incubating Apache project.
• It has a high-performance dedicated SQL engine.
• It uses Impala SQL for ad hoc queries.
• It is designed for high concurrency and ad hoc queries.
Hive and Impala: Comparison
Hive:
• Provides more features than Impala
• Is highly extensible
• Used mostly for batch processing
Impala:
• Comprises a specialized SQL engine that offers five to fifty times faster performance than Hive
• Used mainly for interactive queries and data analysis
• Accommodates many concurrent users
Relational Databases vs. Hive vs. Impala
Use Case: Hive and Impala
Hive and Impala are commonly used to analyze social media coverage.
Basics of Hive and Impala
Topic 2: Interacting with Hive and Impala
Executing a Query in Hive and Impala
Hive (after receiving an SQL query):
1. Parse HiveQL
2. Make optimizations
3. Plan execution
4. Submit job(s) to cluster
5. Monitor progress
6. Process data (MapReduce or Apache Spark)
7. Store the data in HDFS
Impala (after receiving an SQL query):
1. Parse Impala SQL
2. Make optimizations
3. Plan execution
4. Execute query on cluster
5. Store the data in HDFS

Interfaces to Run Hive and Impala Queries
Hive and Impala offer numerous interfaces to run queries:
• Command-line shell:
– Impala: Impala shell
– Hive: Beeline
• Hue Web UI:
– Hive Query Editor
– Impala Query Editor
– Metastore Manager
• ODBC/JDBC
Impala Lab Access Details
• The steps to start Impala in the lab are as follows:
Step 1: Log in to the cloud lab web console with your credentials.
Step 2: Connect to any daemon server with the help of the command below:
impala-shell -i cloudera-slavenode3.cloudlab.com
Demo
Starting Impala Lab
Demonstrate the method to start and connect to the Impala lab from the command line.
Impala Lab Access Details
Connecting with Hive and Impala Shell
• To execute Impala commands from Impala shell:
• To run Hive using Beeline:

Running Impala Queries from Command Line
To check all options of Impala using the help option:
impala-shell --help
To run direct queries from the shell using the -q option:
impala-shell -q 'select * from simple'
To select a database on startup using the -d option:
impala-shell -d Simplilearn
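The options above can also be assembled from a script. The following is a minimal Python sketch, not part of the lesson: the helper name impala_shell_cmd and the host name impala-host are assumptions for illustration. It only builds and prints the argument list; running it for real requires impala-shell on the PATH and a reachable Impala daemon.

```python
import shlex


def impala_shell_cmd(host=None, database=None, query=None):
    """Build an impala-shell argument list (hypothetical helper)."""
    cmd = ["impala-shell"]
    if host:
        cmd += ["-i", host]        # -i: Impala daemon to connect to
    if database:
        cmd += ["-d", database]    # -d: select this database on startup
    if query:
        cmd += ["-q", query]       # -q: run a single query, then exit
    return cmd


# "impala-host" is a placeholder host name.
cmd = impala_shell_cmd(host="impala-host", database="Simplilearn",
                       query="select * from simple")
print(shlex.join(cmd))
# To execute it: subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids having to quote the query by hand for the shell.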
Demo
Connecting with Hive and Impala Shell
Demonstrate the method to connect with Hive and Impala shell, along with some basic
operations.
Sample Queries
To explore a new Impala instance:
SELECT version();
SELECT current_database();
To create a database:
CREATE DATABASE IF NOT EXISTS sample;
To verify a database:
SHOW databases;
To specify the location where the database is to be created:
CREATE DATABASE IF NOT EXISTS database_name LOCATION hdfs_path;
Sample Queries
To switch the current session to another database:
USE db_name;
To create a table in Parquet format (Parquet is a binary columnar format, so field and line delimiters do not apply):
CREATE TABLE stockprice
(stock_id INT,
date STRING,
open_price FLOAT,
high_price FLOAT,
low_price FLOAT,
close_price FLOAT,
stock_volume INT,
adjclose_price FLOAT)
STORED AS PARQUET
LOCATION '/home/singh25nov_gmail/input';
Sample Queries
To load CSV data from files already stored at an HDFS location:
CREATE EXTERNAL TABLE stop_loss
(
stock_id INT,
stock_volume FLOAT,
stock_current_rate DOUBLE,
stock_trigger_price DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/cloudera/sample_data/tab1';
To list all tables in the current database in Impala:
SHOW tables;
Sample Queries
To insert a single row (columns left out of the list are set to NULL):
INSERT INTO stockprice
(date, open_price, high_price, low_price, stock_volume, adjclose_price)
VALUES ('15112017', 102, 105, 98.6, 100154711, 100);
To run an existing SQL script, for example when migrating from another SQL system:
impala-shell -i <impala-daemon-uri> -f <filename>.sql
Sample Queries
To aggregate and join:
SELECT stockprice.open_price,
MAX(stockprice.stock_volume),
MIN(stop_loss.stock_current_rate)
FROM stop_loss JOIN stockprice USING (stock_id)
GROUP BY stockprice.open_price
ORDER BY 1
LIMIT 5;
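To see what a join with aggregation returns without a cluster, the query shape can be tried locally. The sketch below is an illustration only: it uses Python's built-in sqlite3 module as a stand-in for Impala, reduced versions of the two tables, and made-up sample rows. It groups by the selected open_price column, since every non-aggregated column in the SELECT list has to appear in GROUP BY.

```python
import sqlite3

# In-memory SQLite database as a local stand-in for Impala (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stockprice (stock_id INT, open_price FLOAT, stock_volume INT);
    CREATE TABLE stop_loss  (stock_id INT, stock_current_rate DOUBLE);
    -- Hypothetical sample rows, for illustration only.
    INSERT INTO stockprice VALUES (1, 102.0, 500), (1, 102.0, 800), (2, 98.5, 300);
    INSERT INTO stop_loss  VALUES (1, 101.5), (1, 99.0), (2, 97.0);
""")

# Join the tables on stock_id, then aggregate per opening price.
rows = conn.execute("""
    SELECT stockprice.open_price,
           MAX(stockprice.stock_volume),
           MIN(stop_loss.stock_current_rate)
    FROM stop_loss JOIN stockprice USING (stock_id)
    GROUP BY stockprice.open_price
    ORDER BY 1
    LIMIT 5
""").fetchall()

for row in rows:
    print(row)
```

Each output row holds one opening price with the largest volume and the lowest current rate among the joined rows that share it.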
To drop a database:
DROP (DATABASE|SCHEMA) [IF EXISTS] database_name [RESTRICT | CASCADE];
Sample Queries
To query the Impala table:
• Interactive mode:
SELECT count(*) FROM stockprice;
• Set of commands contained in a file:
impala-shell -i impala-host -f <filename>.sql
• Single command to the impala-shell:
impala-shell -i impala-host -q 'select count(*) from stockprice'
Executing Queries in the Impala Shell
[localhost.localdomain:21000] > select * from webpage where page_id > 40
> LIMIT 5;
Query: select * from webpage where page_id > 40
LIMIT 5
+---------+--------------------------+--------------------------------------+
| page_id | name                     | assoc_files                          |
+---------+--------------------------+--------------------------------------+
| 41      | sorrento_f31l_sales.html | theme2.css,code.js,sorrento_f31l.jpg |
| 45      | titanic_2400_sales.html  | theme1.css,code.js,titanic_2400.jpg  |
+---------+--------------------------+--------------------------------------+
Fetched 5 row(s) in 1.32s
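When shell output like the table above is captured by a script, the boxed layout has to be split back into rows. A minimal Python sketch, assuming the layout shown here; parse_impala_table is a hypothetical helper, not part of impala-shell, and it would mis-split any value that itself contains a "|" character.

```python
def parse_impala_table(text):
    """Turn boxed table output into a list of dicts (hypothetical helper)."""
    lines = [ln.strip() for ln in text.splitlines()]
    # Keep only lines that hold cells; skip the +---+ border lines.
    cell_lines = [ln for ln in lines if ln.startswith("|")]
    rows = [[cell.strip() for cell in ln.strip("|").split("|")]
            for ln in cell_lines]
    if not rows:
        return []
    header, data = rows[0], rows[1:]
    return [dict(zip(header, values)) for values in data]


sample = """\
+---------+--------------------------+--------------------------------------+
| page_id | name                     | assoc_files                          |
+---------+--------------------------+--------------------------------------+
| 41      | sorrento_f31l_sales.html | theme2.css,code.js,sorrento_f31l.jpg |
| 45      | titanic_2400_sales.html  | theme1.css,code.js,titanic_2400.jpg  |
+---------+--------------------------+--------------------------------------+
"""

for record in parse_impala_table(sample):
    print(record["page_id"], record["name"])
```

All values come back as strings, so numeric columns such as page_id still need converting.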
Demo
Impala Queries
Demonstrate the sample Impala queries.

Running Hive Queries Using Beeline
• The character “!” is used to execute Beeline commands.
Commonly used Beeline commands:
• !exit: Used to exit the shell
• !help: Shows list of all commands
• !verbose: Shows added details of queries
Demo
Running Hive Queries Using Beeline
Demonstrate the method to connect with Beeline and execute basic queries.
Running Beeline from Command Line
To execute a file using the -f option:
beeline -u … -f simplilearn.hql
To use HiveQL directly from the command line using the -e option:
beeline -u … -e 'SELECT * FROM users'
To continue running a script even after an error:
beeline -u … --force=true
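The same Beeline invocations can be built from a script. This is a minimal Python sketch under stated assumptions: beeline_cmd is a hypothetical helper, and the JDBC URL is a placeholder, because the slides elide the real connection string. It prints the commands rather than running them.

```python
import shlex


def beeline_cmd(jdbc_url, script=None, query=None, force=False):
    """Build a beeline argument list (hypothetical helper)."""
    cmd = ["beeline", "-u", jdbc_url]   # -u: JDBC URL to connect to
    if script:
        cmd += ["-f", script]           # -f: run the statements in a file
    if query:
        cmd += ["-e", query]            # -e: run a single statement
    if force:
        cmd.append("--force=true")      # continue the script after an error
    return cmd


# Placeholder URL; substitute the connection string for your cluster.
url = "jdbc:hive2://hive-host:10000/default"
print(shlex.join(beeline_cmd(url, script="simplilearn.hql", force=True)))
print(shlex.join(beeline_cmd(url, query="SELECT * FROM users")))
# To execute one: subprocess.run(beeline_cmd(url, query="..."), check=True)
```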

Running Hive Query
• All SQL commands are terminated with a semicolon “;”
hive> select * from device
> LIMIT 5;
OK
1 2008-10-21 00:00:00 Sorrento F00L phone
2 2010-04-19 00:00:00 Titanic 2100 phone
3 2011-02-18 00:00:00 MeeToo 3.0 phone
4 2011-09-21 00:00:00 MeeToo 3.1 phone
5 2008-10-21 00:00:00 iFruit 1 phone
Time taken: 0.296 seconds, Fetched: 5 row(s)
Connecting Hive and Impala Shell with Hue
• Hue can be used to write Hive and Impala queries from the user interface.
Demo
Connecting Hive and Impala Shell with Hue
Demonstrate the method to connect Hive and Impala shell using Hue.
Hive and Impala Editors in Hue
[Diagrams 1 and 2: the Hive and Impala editors in Hue]
Key Takeaways
• Hive and Impala are tools to perform SQL queries on data residing on HDFS or HBase.
• Hive and Impala are easy to learn for experienced SQL developers.
• Hive and Impala solve the Big Data problem but cannot replace a traditional RDBMS.
• Hive runs MapReduce or Spark jobs on Hadoop based on HiveQL statements.
• Impala uses a very fast specialized SQL engine that is faster than MapReduce.
Quiz
QUIZ 1
Which of the following components can be used to accept command inputs from users?
a. Command Line Interface
b. Query compiler
c. Execution engine
d. Thrift server
The correct answer is a.
The Command Line Interface is used as an input medium to accept command input from users.
QUIZ 2
Hive can be accessed from Hue using ________.
a. Impala editor
b. Hive Editor
c. File browser
d. YARN UI
The correct answer is b.
Hive can be accessed through the Hive editor in Hue.
QUIZ 3
Impala can be accessed from Hue using ________.
a. Impala editor
b. Hive Editor
c. File browser
d. YARN UI
The correct answer is a.
Impala can be accessed through the Impala editor in Hue.
QUIZ 4
Updating an individual record is possible in ______.
a. Impala
b. Hive
c. RDBMS
d. All of the above
The correct answer is c.
Hive and Impala cannot update individual records, but an RDBMS can.
QUIZ 5
Deleting an individual record is possible in _______.
a. RDBMS
b. Hive
c. Impala
d. All of the above
The correct answer is a.
Hive and Impala cannot delete individual records, but an RDBMS can.
This concludes “Basics of Hive and Impala.”
The next lesson is “Working with Hive and Impala.”