HiveLab1-Exploring Hive v4 0

IBM Software An IBM Proof of Technology
Accessing Hadoop Data Using Hive

Unit 1: Exploring Hive
© Copyright IBM Corporation, 2015

US Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
Contents
Lab 1 Exploring Hive
1.1 Getting Started
1.2 Hive and Ambari
1.3 Exploring the Hive environment
1.3.1 Investigating Hive directory structure with the console
1.3.2 Exploring the Hive Beeline Command Line Interface (CLI)
1.4 Summary

Lab 1 Exploring Hive

The overwhelming trend toward digital services, combined with cheap storage, has generated massive
amounts of data that enterprises need to gather, process, and analyze effectively. Data analysis
techniques from the data warehouse and high-performance computing communities are invaluable for
many enterprises, but their cost or the complexity of scaling them up often discourages the accumulation
of data without an immediate need. Because valuable knowledge may nevertheless be buried in this data,
related scaled-up technologies have been developed. Examples include Google’s MapReduce and its
open-source implementation, Apache Hadoop.
Writing MapReduce programs to analyze your Big Data can get complex. Apache Hive can help make
querying your data much easier. Hive, first created at Facebook, is a data warehouse system for Hadoop
that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in
Hadoop-compatible file systems. Hive provides a mechanism to project structure onto this data and
query the data using a SQL-like language called HiveQL.
After completing this hands-on lab, you will be able to:
• Start and stop Hive from the Ambari UI.

• Use the Linux command line to explore the Hive directory structure.
• Interact with the Hive Beeline CLI in interactive mode.
Allow 30 minutes to complete this section of the lab.
This version of the lab was designed using the IBM BigInsights 4.0 Quick Start Edition. Throughout this
lab it is assumed that you will be using the following account login information:
Account                Username  Password
VM image setup screen  root      password
Linux                  virtuser  password
Ambari                 admin     admin

1.1 Getting Started

To prepare for the contents of this lab, you must go through the process of getting all of the Hadoop
components started. These instructions assume you have already followed the IBM BigInsights Quick
Start Edition, v4.0 README setup guide.
__1. Start the VMware image by clicking the Play virtual machine button in the VMware Player if it is
not already on.
__2. Select virtuser from the list.
__3. Log in to the VMware virtual machine using the virtuser credentials.
__4. After you log in, you will see the Red Hat desktop. Ambari will be open in a browser window.
Before we can start working with Hive and the Hadoop Distributed File System, we must first start the
necessary BigInsights components. We will log in to Ambari to do this.

__5. Log in to Ambari using the admin credentials listed above.
__6. Ambari will start up the various Hadoop services and applications. View the 4.0 README guide
for more details.
__7. You can also manually start or stop the services by clicking the Actions drop-down menu.
__8. Once all components have started successfully you may move on to the next section.

1.2 Hive and Ambari

Let’s learn more about our installation of Hive by browsing Hive details from within the Ambari web
interface.
__1. Click Hive in the left-hand Ambari menu. This loads information about our Hive installation.
Notice the two tabs at the top of the screen: a Summary tab and a Configs tab. On the
Summary tab, we can see the status of the various Hive components.
__2. Click on the Configs tab. You will see all the Hive configuration settings. These settings are
stored on the file system in the /usr/iop/4.0.0.0/hive/conf directory. The Ambari GUI makes it
easy for us to view and edit them.
__3. Go back to the Summary tab. Let’s take a look at one of the Hive components. Notice that the
WebHCat Server has a status of “Started”. WebHCat (formerly known as Templeton) provides a
REST-like web API for HCatalog (a component of Hive).
__4. To prove that WebHCat is available, let’s try a couple of WebHCat URLs in the web browser. In a
new browser window, enter this URL:
http://localhost:50111/templeton/v1/status
We get a response that the status is “ok”.
__5. Next try the following URL:
http://localhost:50111/templeton/v1/ddl/database?user.name=hive
WebHCat returns a list of databases. Right now, the only database in the system
is the “default” database.
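
The two browser checks above can also be run from the Linux terminal. The following is a minimal sketch, assuming that curl is installed on the VM. The names WEBHCAT_BASE, webhcat_url and webhcat_get are hypothetical helpers introduced here for illustration; they are not part of WebHCat.

```shell
# Base URL of the WebHCat server used in this lab (port 50111 on the VM).
WEBHCAT_BASE="http://localhost:50111/templeton/v1"

# Build a WebHCat URL from a resource path and an optional user name.
# (Hypothetical helper, not a WebHCat command.)
webhcat_url() {
  if [ -n "${2:-}" ]; then
    printf '%s/%s?user.name=%s\n' "$WEBHCAT_BASE" "$1" "$2"
  else
    printf '%s/%s\n' "$WEBHCAT_BASE" "$1"
  fi
}

# Fetch a WebHCat resource with curl; -s hides the progress meter.
# Assumes curl is available on the VM.
webhcat_get() {
  curl -s "$(webhcat_url "$@")"
}
```

With these definitions loaded in the shell, webhcat_get status requests the same status URL as step 4, and webhcat_get ddl/database hive requests the database list from step 5.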
1.3 Exploring the Hive environment
1.3.1 Investigating Hive directory structure with the console
Let’s navigate to the Hive home directory on the Linux file system and investigate the directories that
make up Hive.
__1. Open the Linux terminal by selecting Applications->System Tools->Terminal from the upper Red
Hat menu.
The terminal will open up.
__2. In the terminal, change to the Hive directory:
$ cd /usr/iop/4.0.0.0/hive

This directory is where Hive is set up on this BigInsights virtual machine.
__3. Explore the directory structure inside the hive folder by running the ls command.
$ ls
__4. You will notice the following directories of interest:
• bin – executables to start/stop/configure/check status of hive, various scripts
• conf – Hive environment, metastore, security, and log configuration files
• doc – Hive documentation and Hive examples
• lib – server’s JAR files
• man – man page information
• scripts – scripts for upgrading Derby and MySQL metastores from one version of Hive to the next
__5. Now cd into the /usr/iop/4.0.0.0/hive-hcatalog directory. This is where the HCatalog files are
stored.
$ cd /usr/iop/4.0.0.0/hive-hcatalog
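
To compare the Hive and HCatalog directories without scanning the full ls output, you can print only the subdirectories of each. This is a small sketch; list_subdirs is a hypothetical helper name, not a command shipped with Hive.

```shell
# List the immediate subdirectories of a directory, one name per line.
# (Hypothetical helper for this lab.)
list_subdirs() {
  for entry in "$1"/*/; do
    # Skip the unexpanded pattern when there are no subdirectories.
    [ -d "$entry" ] || continue
    basename "$entry"
  done
}
```

For example, list_subdirs /usr/iop/4.0.0.0/hive prints the subdirectories of the Hive directory, and list_subdirs /usr/iop/4.0.0.0/hive-hcatalog does the same for the HCatalog directory.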
1.3.2 Exploring the Hive Beeline Command Line Interface (CLI)
From the Hive Beeline CLI shell you can perform queries, DML, DDL and more. We will be doing much
work in the Hive Beeline CLI so let’s briefly check it out!
__1. In the Linux terminal, change into the Hive bin directory:
$ cd /usr/iop/4.0.0.0/hive/bin
__2. Inside the bin directory we will run a command that shows us the command line options for the
Beeline CLI.
$ ./beeline --help

__3. Start an interactive Hive shell session.
$ ./beeline
__4. Connect to Hive.
beeline> !connect jdbc:hive2://rvm.svl.ibm.com:10000 virtuser password org.apache.hive.jdbc.HiveDriver
__5. Run the SHOW DATABASES statement from within the interactive Hive session.
beeline> SHOW DATABASES;
You can see the default database is listed.
__6. Quit the Hive Beeline session.
beeline> !quit
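
The interactive steps above can also be collapsed into a single non-interactive command. The sketch below assumes Beeline’s -u (JDBC URL), -n (user name), -p (password) and -e (statement to run) options; check the output of ./beeline --help to confirm they are available in your version. The variable names and the run_hiveql function are hypothetical and introduced only for illustration.

```shell
# Connection details taken from this lab; adjust them for another cluster.
HIVE_JDBC_URL="jdbc:hive2://rvm.svl.ibm.com:10000"
HIVE_USER="virtuser"
HIVE_PASSWORD="password"

# Path to the Beeline launcher on the lab VM (override BEELINE if needed).
BEELINE="${BEELINE:-/usr/iop/4.0.0.0/hive/bin/beeline}"

# Run one HiveQL statement and return to the Linux shell.
# (Hypothetical helper; it only wraps the beeline command line.)
run_hiveql() {
  "$BEELINE" -u "$HIVE_JDBC_URL" -n "$HIVE_USER" -p "$HIVE_PASSWORD" -e "$1"
}
```

For example, run_hiveql 'SHOW DATABASES;' connects, runs the same statement as step 5, and exits. Passing the password on the command line is acceptable for this lab VM, but it is visible to other users of a shared machine.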
1.4 Summary
Congratulations! You now know how to start and stop Hive using the Ambari Web UI. You can navigate
to the Hive directories and understand the contents of those directories. You also know how to interact
with the Hive Beeline CLI. You may move on to the next Unit.

© Copyright IBM Corporation 2015.
The information contained in these materials is provided for
informational purposes only, and is provided AS IS without warranty
of any kind, express or implied. IBM shall not be responsible for any
damages arising out of the use of, or otherwise related to, these
materials. Nothing contained in these materials is intended to, nor
shall have the effect of, creating any warranties or representations
from IBM or its suppliers or licensors, or altering the terms and
conditions of the applicable license agreement governing the use of
IBM software. References in these materials to IBM products,
programs, or services do not imply that they will be available in all
countries in which IBM operates. This information is based on
current IBM product plans and strategy, which are subject to change
by IBM without notice. Product release dates and/or capabilities
referenced in these materials may change at any time at IBM’s sole
discretion based on market opportunities or other factors, and are not
intended to be a commitment to future product or feature availability
in any way.
IBM, the IBM logo and ibm.com are trademarks of International
Business Machines Corp., registered in many jurisdictions
worldwide. Other product and service names might be trademarks of
IBM or other companies. A current list of IBM trademarks is
available on the Web at “Copyright and trademark information” at
www.ibm.com/legal/copytrade.shtml.
