Contents
Lab 1 Exploring Hive
1.1 Getting Started
1.2 Hive and Ambari
1.3 Exploring the Hive Environment
1.3.1 Investigating Hive Directory Structure with the Console
1.3.2 Exploring the Hive Beeline Command Line Interface (CLI)
1.4 Summary
IBM Software
Writing MapReduce programs to analyze your Big Data can get complex. Apache Hive can help make
querying your data much easier. Hive, first created at Facebook, is a data warehouse system for Hadoop
that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in
Hadoop-compatible file systems. Hive provides a mechanism to project structure onto this data and
query the data using a SQL-like language called HiveQL.
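To make "projecting structure onto data" concrete, here is a minimal sketch that writes a short HiveQL script and, only if Beeline is on the PATH, runs it. Everything specific in it is hypothetical: the HDFS directory /user/virtuser/sales, the table name sales and its columns, the script file name, and the JDBC URL jdbc:hive2://localhost:10000/default. Substitute data and connection details that actually exist on your system.

```shell
# Hypothetical HDFS directory holding comma-separated text files.
DATA_DIR=${DATA_DIR:-/user/virtuser/sales}
# Hypothetical local file to hold the HiveQL script.
HQL_FILE=${HQL_FILE:-./project_sales.hql}

# CREATE EXTERNAL TABLE only records a schema in the Hive metastore;
# the files under DATA_DIR stay where they are. The SELECT then queries
# those files through the schema using ordinary SQL-like syntax.
cat > "$HQL_FILE" <<EOF
CREATE EXTERNAL TABLE IF NOT EXISTS sales (
  order_id  INT,
  region    STRING,
  amount    DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '${DATA_DIR}';

SELECT region, SUM(amount) AS total
FROM sales
GROUP BY region;
EOF

# Run the script only when beeline is available (assumed JDBC URL).
if command -v beeline >/dev/null 2>&1; then
  beeline -u "jdbc:hive2://localhost:10000/default" -f "$HQL_FILE"
fi
```

Because the table is declared EXTERNAL, a later DROP TABLE removes only the metastore entry and leaves the underlying files in place.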
This version of the lab was designed using the IBM BigInsights 4.0 Quick Start Edition. Throughout this
lab, it is assumed that you will be using the following account login information:
Username Password
1.1 Getting Started
__1. If the VMware image is not already running, start it by clicking the Play virtual machine button in
VMware Player.
Hands-on-Lab Page 5
__3. Log in to the VMware virtual machine using the virtuser credentials.
__4. After you log in, you will see the Red Hat desktop. Ambari will be open in a browser window.
Before we can start working with Hive and the Hadoop Distributed File System (HDFS), we must first start
the necessary BigInsights components. We will log in to Ambari to do this.
__6. Ambari starts the various Hadoop services and applications. See the BigInsights 4.0 README
for more details.
__7. You can also start and stop services manually from the Actions drop-down menu.
__8. Once all components have started successfully, you may move on to the next section.
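Starting and stopping services from the Actions menu also has a scripted counterpart in Ambari's REST API. The sketch below assumes Ambari listens on http://localhost:8080, uses a hypothetical cluster name (mycluster), and reads credentials from AMBARI_USER and AMBARI_PASS; look up the real cluster name in the Ambari UI before using it. In Ambari's service model, the state STARTED starts a service and INSTALLED stops it. The helper function names are hypothetical.

```shell
# Assumed Ambari endpoint and hypothetical cluster name; override as needed.
AMBARI_URL=${AMBARI_URL:-http://localhost:8080}
CLUSTER=${CLUSTER:-mycluster}

# Build the JSON body for a service state change.
# $1 = target state (STARTED to start, INSTALLED to stop)
# $2 = free-text label that Ambari shows for the request
ambari_state_body() {
  printf '{"RequestInfo":{"context":"%s"},"Body":{"ServiceInfo":{"state":"%s"}}}\n' "$2" "$1"
}

# PUT a new state for one service, e.g. HIVE.
# Ambari rejects modifying requests that lack the X-Requested-By header.
# Fails early if AMBARI_USER or AMBARI_PASS is not set.
ambari_set_service_state() {
  curl -s -u "${AMBARI_USER:?set AMBARI_USER}:${AMBARI_PASS:?set AMBARI_PASS}" \
    -H 'X-Requested-By: ambari' \
    -X PUT \
    -d "$(ambari_state_body "$2" "Set $1 to $2")" \
    "${AMBARI_URL}/api/v1/clusters/${CLUSTER}/services/$1"
}
```

For example, `ambari_set_service_state HIVE STARTED` asks Ambari to start Hive. The request is asynchronous, so follow its progress in the Ambari UI just as you would after using the Actions menu.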
1.2 Hive and Ambari
__1. Click Hive in the left-hand Ambari menu. This loads information about our Hive installation.
Notice the two tabs at the top of the screen: Summary and Configs. On the Summary tab, we can
see the status of the various Hive components.
__2. Click the Configs tab to see all the Hive configuration settings. On disk, these settings are
stored in the /usr/iop/4.0.0.0/hive/conf directory; the Ambari GUI makes it easy to view and
edit them.
__3. Go back to the Summary tab. Let’s take a look at one of the Hive components. Notice that the
WebHCat Server has a status of “Started”. WebHCat (formerly known as Templeton) provides a
REST-like web API for HCatalog (a component of Hive).
__4. To confirm that WebHCat is available, let’s try a couple of WebHCat URLs in the web browser. In a
new browser window, enter the following URLs one at a time. The first reports the server status; the
second lists the databases:
http://localhost:50111/templeton/v1/status
http://localhost:50111/templeton/v1/ddl/database?user.name=hive
WebHCat returns a list of databases. Right now, the only database in the system is the
“default” database.
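The same two requests can be made from the terminal, which is handy for scripted health checks. This is a minimal sketch assuming curl is installed; it simply wraps the two URLs above, and the helper names webhcat_url and webhcat_get are hypothetical, not part of WebHCat.

```shell
# Host and port taken from the URLs above; override via the environment.
WEBHCAT_HOST=${WEBHCAT_HOST:-localhost}
WEBHCAT_PORT=${WEBHCAT_PORT:-50111}

# Build a WebHCat v1 URL.
# $1 = resource below /templeton/v1 (e.g. status, ddl/database)
# $2 = optional value for the user.name query parameter
webhcat_url() {
  url="http://${WEBHCAT_HOST}:${WEBHCAT_PORT}/templeton/v1/$1"
  if [ -n "$2" ]; then
    url="${url}?user.name=$2"
  fi
  printf '%s\n' "$url"
}

# Fetch a WebHCat resource and print the JSON response body.
webhcat_get() {
  curl -s "$(webhcat_url "$1" "$2")"
}
```

Running `webhcat_get status` and then `webhcat_get ddl/database hive` should print the same JSON responses that the browser displays for the two URLs.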
1.3 Exploring the Hive Environment
1.3.1 Investigating Hive Directory Structure with the Console
Let’s navigate to the Hive home directory on the Linux file system and investigate the directories that
make up Hive.
__1. Open the Linux terminal by selecting Applications->System Tools->Terminal from the upper Red
Hat menu.
$ cd /usr/iop/4.0.0.0/hive
__3. Explore the directory structure inside the hive folder by running the ls command.
$ ls
• scripts – scripts for upgrading Derby and MySQL metastores from one version of Hive to the next
__5. Now cd into the /usr/iop/4.0.0.0/hive-hcatalog directory. This is where the HCatalog files are
stored.
$ cd /usr/iop/4.0.0.0/hive-hcatalog
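The configuration directory behind the Ambari Configs tab, /usr/iop/4.0.0.0/hive/conf, can be inspected from this same terminal. This sketch assumes the settings live in a hive-site.xml file there, as in a standard Hive layout, and uses hive.metastore.uris only as an example property name. The awk lookup also assumes each <value> element sits on a single line, so treat it as a quick check rather than a real XML parser; the helper name hive_conf_value is hypothetical.

```shell
# Configuration directory named in the Ambari Configs step of this lab.
HIVE_CONF_DIR=${HIVE_CONF_DIR:-/usr/iop/4.0.0.0/hive/conf}

# Print the <value> that follows a given <name> in a Hadoop-style XML
# config file. Assumes each <value>...</value> is on one line.
# $1 = property name, $2 = path to the XML file
hive_conf_value() {
  awk -v key="$1" '
    index($0, "<name>" key "</name>") { found = 1 }
    found && match($0, /<value>[^<]*<\/value>/) {
      # Strip the 7-char "<value>" prefix and 8-char "</value>" suffix.
      print substr($0, RSTART + 7, RLENGTH - 15)
      exit
    }
  ' "$2"
}

# Only touch the real file if it exists on this machine.
if [ -r "$HIVE_CONF_DIR/hive-site.xml" ]; then
  ls "$HIVE_CONF_DIR"
  hive_conf_value hive.metastore.uris "$HIVE_CONF_DIR/hive-site.xml"
fi
```

If a value shown here differs from what Ambari displays, remember that Ambari manages these files and may overwrite manual edits, so make changes through the Configs tab.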
1.3.2 Exploring the Hive Beeline Command Line Interface (CLI)
From the Hive Beeline CLI shell, you can run queries, DML, DDL, and more. We will do much of our
work in the Hive Beeline CLI, so let’s briefly check it out!
__1. In the Linux terminal, change into the Hive bin directory.
$ cd /usr/iop/4.0.0.0/hive/bin
__2. From the bin directory, run the following command to display the command-line options for the
Hive Beeline CLI.
$ ./beeline --help
$ ./beeline
__5. Run the SHOW DATABASES statement from within the interactive Hive session.
hive> !quit
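Beeline can also run a single statement and exit instead of opening an interactive session, which is useful in scripts. This sketch uses Beeline's -u (JDBC URL), -n (user name), and -e (statement) options. It assumes HiveServer2 is listening on its usual port 10000 on localhost and connects as virtuser; the host, port, user, and the helper names hs2_jdbc_url and beeline_exec are assumptions to adjust for your VM.

```shell
# Hive bin directory used in this lab; connection details are assumptions.
HIVE_BIN=${HIVE_BIN:-/usr/iop/4.0.0.0/hive/bin}
HS2_HOST=${HS2_HOST:-localhost}
HS2_PORT=${HS2_PORT:-10000}
HIVE_USER=${HIVE_USER:-virtuser}

# Build a HiveServer2 JDBC URL.
# $1 = database name (defaults to "default")
hs2_jdbc_url() {
  printf 'jdbc:hive2://%s:%s/%s\n' "$HS2_HOST" "$HS2_PORT" "${1:-default}"
}

# Run one HiveQL statement non-interactively, then exit.
beeline_exec() {
  "$HIVE_BIN/beeline" -u "$(hs2_jdbc_url)" -n "$HIVE_USER" -e "$1"
}

# Only attempt a connection when the beeline script is present.
if [ -x "$HIVE_BIN/beeline" ]; then
  beeline_exec 'SHOW DATABASES;'
fi
```

On the lab VM this should print the same database list as the interactive SHOW DATABASES step, without leaving you at a prompt that needs !quit.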
1.4 Summary
Congratulations! You now know how to start and stop Hive using the Ambari Web UI. You can navigate
to the Hive directories and understand their contents. You also know how to interact with the Hive
Beeline CLI. You may move on to the next unit.