
IBM Software An IBM Proof of Technology

Accessing Hadoop Data Using Hive


Unit 1: Exploring Hive

© Copyright IBM Corporation, 2015


US Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

Contents
Lab 1 Exploring Hive
1.1 Getting Started
1.2 Hive and Ambari
1.3 Exploring the Hive environment
1.3.1 Investigating Hive directory structure with the console
1.3.2 Exploring the Hive Beeline Command Line Interface (CLI)
1.4 Summary


Lab 1 Exploring Hive


The overwhelming trend towards digital services, combined with cheap storage, has generated massive
amounts of data that enterprises need to effectively gather, process, and analyze. Data analysis
techniques from the data warehouse and high-performance computing communities are invaluable for
many enterprises; however, the cost or complexity of scaling them up often discourages the accumulation
of data that has no immediate use. Because valuable knowledge may nevertheless be buried in this data,
related scaled-up technologies have been developed. Examples include Google’s MapReduce and its
open-source implementation, Apache Hadoop.

Writing MapReduce programs to analyze your Big Data can get complex. Apache Hive can help make
querying your data much easier. Hive, first created at Facebook, is a data warehouse system for Hadoop
that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in
Hadoop-compatible file systems. Hive provides a mechanism to project structure onto this data and
query the data using a SQL-like language called HiveQL.

After completing this hands-on lab, you will be able to:

• Start and stop Hive from the Ambari UI.
• Use the Linux command line to explore the Hive directory structure.
• Interact with the Hive Beeline CLI in interactive mode.

Allow 30 minutes to complete this section of the lab.

This version of the lab was designed using the IBM BigInsights 4.0 Quick Start Edition. Throughout this
lab it is assumed that you will be using the following account login information:

                        Username   Password
VM image setup screen   root       password
Linux                   virtuser   password
Ambari                  admin      admin


1.1 Getting Started


To prepare for the contents of this lab, you must go through the process of getting all of the Hadoop
components started. These instructions assume you have already followed the IBM BigInsights Quick
Start Edition, v4.0 README setup guide.

__1. Start the VMware image by clicking the Play virtual machine button in the VMware Player if it is
not already on.

__2. Select virtuser from the list.


__3. Log in to the VMware virtual machine using the virtuser credentials.

__4. After you log in, you will see the Red Hat desktop. Ambari will be open in a browser window.

Before we can start working with Hive and the Hadoop Distributed File System, we must first start the
necessary BigInsights components. We will log in to Ambari to do this.


__5. Log in to Ambari using the admin credentials.

__6. Ambari will start up the various Hadoop services and applications. View the 4.0 README guide
for more details.


__7. You can also manually start or stop the services by clicking the Actions drop-down.

__8. Once all components have started successfully, you may move on to the next section.
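
The start and stop actions from the Actions drop-down can also be scripted against the Ambari REST API. The following is only a rough sketch under stated assumptions: it assumes Ambari listens on localhost:8080 and that the cluster is named BI_CLUSTER, neither of which comes from this lab, so check both against your own installation. The function names are hypothetical helpers. The admin credentials are the ones listed in the login table.

```shell
#!/bin/sh
# Sketch: ask Ambari to start or stop the Hive service over REST.
# ASSUMPTIONS (not from the lab): Ambari at localhost:8080 and a cluster
# named BI_CLUSTER. Both are placeholders; adjust them for your VM.
AMBARI_URL="${AMBARI_URL:-http://localhost:8080}"
CLUSTER_NAME="${CLUSTER_NAME:-BI_CLUSTER}"

# Build the JSON request body. Ambari represents a stopped service
# as state INSTALLED and a running service as state STARTED.
hive_state_payload() {
    case "$1" in
        start) state=STARTED ;;
        stop)  state=INSTALLED ;;
        *) echo "usage: hive_state_payload start|stop" >&2; return 1 ;;
    esac
    printf '{"RequestInfo":{"context":"%s Hive via REST"},"Body":{"ServiceInfo":{"state":"%s"}}}' "$1" "$state"
}

# Send the request. Ambari expects the X-Requested-By header on
# requests that modify state.
set_hive_state() {
    payload=$(hive_state_payload "$1") || return 1
    curl -s -u admin:admin -H 'X-Requested-By: ambari' -X PUT \
        -d "$payload" \
        "$AMBARI_URL/api/v1/clusters/$CLUSTER_NAME/services/HIVE"
}
```

After sourcing this file, running set_hive_state stop or set_hive_state start should queue the same kind of operation that the Actions drop-down does. The Ambari UI remains the reference way to do this in the lab.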


1.2 Hive and Ambari


Let’s learn more about our installation of Hive by browsing Hive details from within the Ambari web
interface.

__1. Click Hive in the left-hand Ambari menu. This will load information about our Hive installation.

Notice there are two tabs at the top of the screen: a Summary tab and a Configs tab. On the
Summary tab, we can see the status of the various Hive components.

__2. Click on the Configs tab. You will see all the Hive configuration settings. These settings are
stored on the file system in the /usr/iop/4.0.0.0/hive/conf directory. The Ambari GUI makes it
easy for us to view and edit them.


__3. Go back to the Summary tab. Let’s take a look at one of the Hive components. Notice that the
WebHCat Server has a status of “Started”. WebHCat (formerly known as Templeton) provides a
REST-like web API for HCatalog (a component of Hive).

__4. To prove that WebHCat is available, let’s try a couple of WebHCat URLs in the web browser. In a
new browser window, enter this URL:


http://localhost:50111/templeton/v1/status

We get a response that the status is “ok”.

__5. Next try the following URL:

http://localhost:50111/templeton/v1/ddl/database?user.name=hive

WebHCat gives us back a list of databases. Right now, the only database we have in the system
is the “default” database.
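
The two WebHCat URLs above can also be requested from the Linux terminal instead of the browser. Here is a minimal sketch using curl. The host, port, and paths are the ones used in this lab, while the helper names webhcat_url and webhcat_get are hypothetical and exist only for illustration.

```shell
#!/bin/sh
# Sketch: call the WebHCat endpoints from this lab with curl.
# Base URL is taken from the lab; helper names are hypothetical.
WEBHCAT_BASE="${WEBHCAT_BASE:-http://localhost:50111/templeton/v1}"

# Print the full URL for a WebHCat resource.
# $1 = resource path, $2 = optional value for the user.name parameter.
webhcat_url() {
    if [ -n "$2" ]; then
        printf '%s/%s?user.name=%s\n' "$WEBHCAT_BASE" "$1" "$2"
    else
        printf '%s/%s\n' "$WEBHCAT_BASE" "$1"
    fi
}

# Fetch a resource; -s hides curl's progress meter.
webhcat_get() {
    curl -s "$(webhcat_url "$1" "$2")"
}
```

After sourcing this file, webhcat_get status and webhcat_get ddl/database hive should return the same JSON responses that the browser displayed.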

1.3 Exploring the Hive environment

1.3.1 Investigating Hive directory structure with the console

Let’s navigate to the Hive home directory on the Linux file system and investigate the directories that
make up Hive.

__1. Open the Linux terminal by selecting Applications->System Tools->Terminal from the upper Red
Hat menu.


The terminal will open up.

__2. In the terminal, change to the Hive directory:

$ cd /usr/iop/4.0.0.0/hive


This directory is where Hive is set up on this BigInsights virtual machine.

__3. Explore the directory structure inside the hive folder by running the ls command.

$ ls

__4. You will notice the following directories of interest:

• bin – executables to start/stop/configure/check status of hive, various scripts

• conf – Hive environment, metastore, security, and log configuration files

• doc – Hive documentation and Hive examples

• lib – server’s JAR files

• man – man page information

• scripts – scripts for upgrading Derby and MySQL metastores from one version of Hive to the next

__5. Now cd into the /usr/iop/4.0.0.0/hive-hcatalog directory. This is where the HCatalog files are
stored.

$ cd /usr/iop/4.0.0.0/hive-hcatalog
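
To confirm quickly that the directories of interest listed in step 4 are present, a small script can check for each one. This is only a sketch: the default path is the Hive directory from this lab, and check_hive_dirs is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: report which of the directories listed in this lab exist
# under a given Hive home. The default path is the one from the lab.
HIVE_HOME_DIR="${HIVE_HOME_DIR:-/usr/iop/4.0.0.0/hive}"

# $1 = directory to inspect. Prints one line per expected directory
# and returns non-zero if at least one of them is missing.
check_hive_dirs() {
    base=$1
    missing=0
    for d in bin conf doc lib man scripts; do
        if [ -d "$base/$d" ]; then
            echo "found:   $d"
        else
            echo "missing: $d"
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}
```

After sourcing this file, run check_hive_dirs "$HIVE_HOME_DIR" to inspect the lab's Hive directory.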


1.3.2 Exploring the Hive Beeline Command Line Interface (CLI)

From the Hive Beeline CLI shell you can perform queries, DML, DDL and more. We will be doing much
work in the Hive Beeline CLI so let’s briefly check it out!

__1. In the Linux terminal, change into the Hive bin directory:

$ cd /usr/iop/4.0.0.0/hive/bin

__2. Inside the bin directory, we will run a command that shows us the command line options for the
Hive Beeline CLI.

$ ./beeline --help


__3. Start an interactive Hive shell session.

$ ./beeline

__4. Connect to Hive.

beeline> !connect jdbc:hive2://rvm.svl.ibm.com:10000 virtuser password org.apache.hive.jdbc.HiveDriver

__5. Run the SHOW DATABASES statement from within the interactive Hive session.

beeline> SHOW DATABASES;


You can see the default database is listed.

__6. Quit the Beeline session.

beeline> !quit
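
Beeline can also run a statement without opening an interactive session, which is handy for scripting. The sketch below reuses the host, port, user, and password from the connect step. The helper names are hypothetical. The -u, -n, -p, and -e options are the ones listed by ./beeline --help.

```shell
#!/bin/sh
# Sketch: run a single statement through Beeline non-interactively.
# Host, port, user, and password are the values used in this lab;
# the function names are hypothetical helpers.
HS2_HOST="${HS2_HOST:-rvm.svl.ibm.com}"
HS2_PORT="${HS2_PORT:-10000}"

# Build the HiveServer2 JDBC URL that Beeline connects to.
hive_jdbc_url() {
    printf 'jdbc:hive2://%s:%s\n' "$HS2_HOST" "$HS2_PORT"
}

# -u gives the JDBC URL, -n the user name, -p the password,
# and -e the statement to execute before Beeline exits.
run_hive_statement() {
    /usr/iop/4.0.0.0/hive/bin/beeline -u "$(hive_jdbc_url)" \
        -n virtuser -p password -e "$1"
}
```

After sourcing this file, run_hive_statement "SHOW DATABASES;" should list the same databases as the interactive session above.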

1.4 Summary
Congratulations! You now know how to start and stop Hive using the Ambari Web UI. You can navigate
to the Hive directories and understand the contents of those directories. You also know how to interact
with the Hive Beeline CLI. You may move on to the next Unit.

© Copyright IBM Corporation 2015.

The information contained in these materials is provided for
informational purposes only, and is provided AS IS without warranty
of any kind, express or implied. IBM shall not be responsible for any
damages arising out of the use of, or otherwise related to, these
materials. Nothing contained in these materials is intended to, nor
shall have the effect of, creating any warranties or representations
from IBM or its suppliers or licensors, or altering the terms and
conditions of the applicable license agreement governing the use of
IBM software. References in these materials to IBM products,
programs, or services do not imply that they will be available in all
countries in which IBM operates. This information is based on
current IBM product plans and strategy, which are subject to change
by IBM without notice. Product release dates and/or capabilities
referenced in these materials may change at any time at IBM’s sole
discretion based on market opportunities or other factors, and are not
intended to be a commitment to future product or feature availability
in any way.

IBM, the IBM logo and ibm.com are trademarks of International
Business Machines Corp., registered in many jurisdictions
worldwide. Other product and service names might be trademarks of
IBM or other companies. A current list of IBM trademarks is
available on the Web at “Copyright and trademark information” at
www.ibm.com/legal/copytrade.shtml.
