Data operations in Hive


Data operations are the operations that can be performed on the databases and tables present in Hive, such as DML and DDL commands.

In this tutorial, you will learn:

Data types in Hive
Creation and dropping of Database in Hive
Create, Drop and altering of tables in Hive
Table types and its Usage
Partitions
Buckets

Data types in Hive


Data types are very important elements in the Hive query language and in data modeling. To define table column types, we must know the data types and their usage.
The following gives a brief overview of the data types present in Hive:

Numeric Types
String Types
Date/Time Types
Complex Types

Numeric Types:

Type      Memory allocation
TINYINT   1-byte signed integer (-128 to 127)
SMALLINT  2-byte signed integer (-32,768 to 32,767)
INT       4-byte signed integer (-2,147,483,648 to 2,147,483,647)
BIGINT    8-byte signed integer
FLOAT     4-byte single-precision floating point number
DOUBLE    8-byte double-precision floating point number
DECIMAL   Precision and scale can be defined for this type
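The signed ranges above follow directly from the number of bytes: an n-bit signed integer spans -2^(n-1) to 2^(n-1)-1. A quick Python sketch (not Hive code, just arithmetic) that reproduces the table:

```python
# Compute the value range of an n-byte signed integer, as used by
# Hive's TINYINT (1 byte), SMALLINT (2), INT (4), and BIGINT (8).
def signed_range(n_bytes):
    bits = n_bytes * 8
    return (-2 ** (bits - 1), 2 ** (bits - 1) - 1)

for name, size in [("TINYINT", 1), ("SMALLINT", 2), ("INT", 4), ("BIGINT", 8)]:
    lo, hi = signed_range(size)
    print(f"{name}: {lo} to {hi}")
```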

String Types:

Type     Length
CHAR     255
VARCHAR  1 to 65,535
STRING   No length needs to be defined (no limit)

Date/Time Types:

Type       Usage
TIMESTAMP  Supports the traditional UNIX timestamp, with optional nanosecond precision
DATE       In YYYY-MM-DD format. The range of supported values is 0000-01-01 to 9999-12-31, depending on support by the primitive Java Date type.

Complex Types:

Type     Usage
ARRAY    ARRAY<data_type>. Negative values and non-constant expressions are not allowed.
MAP      MAP<primitive_type, data_type>. Negative values and non-constant expressions are not allowed.
STRUCT   STRUCT<col_name : data_type, ...>
UNION    UNIONTYPE<data_type, data_type, ...>
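To see what these complex types model, here is a rough Python analogy (an illustration only, not Hive code; the row values are made up): an ARRAY maps to a list, a MAP to a dict, and a STRUCT to named fields.

```python
# Rough Python analogues of one Hive row using complex types:
# ARRAY<STRING>, MAP<STRING, INT>, STRUCT<city: STRING, zip: STRING>.
row = {
    "name": "john",                                     # STRING
    "phone_numbers": ["555-1234", "555-5678"],          # ARRAY<STRING>
    "scores": {"math": 90, "hive": 85},                 # MAP<STRING, INT>
    "address": {"city": "Bangalore", "zip": "560001"},  # STRUCT<...>
}

# Access mirrors HiveQL: phone_numbers[0], scores['math'], address.city
print(row["phone_numbers"][0])   # first array element
print(row["scores"]["math"])     # map lookup by key
print(row["address"]["city"])    # struct field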

Creation and dropping of Database in Hive:

Create Database:
To create a database in the Hive shell, we have to use the command shown in the syntax below:

Syntax:

Create database <DatabaseName>

Example:

Create database guru99

From the above screenshot, we are doing two things:

Creating the database "guru99" in Hive
Displaying the existing databases using the "show" command

On the same screen, the database name "guru99" is displayed at the end of the output when we execute the "show" command, which means the database "guru99" was successfully created.
Drop Database:
To drop a database in the Hive shell, we have to use the "drop" command shown in the syntax below:

Syntax:

Drop database <DatabaseName>

Example:

Drop database guru99

In the above screenshot, we are doing two things:

Dropping database 'guru99' from Hive
Cross-checking the same with the "show" command

On the same screen, after checking the databases with the "show" command, the database "guru99" no longer appears inside Hive, so we can confirm that the database "guru99" has been dropped.

Create, Drop and altering of tables in Hive

Table operations such as creating, altering, and dropping tables in Hive are covered in this section.

In the above screenshot, we are creating a table with columns and altering the table name.

1. Creating table guru_sample with two columns, "empid" and "empname"
2. Displaying the tables present in the guru99 database
3. guru_sample is displayed under the tables
4. Altering table "guru_sample" to "guru_sampleNew"
5. When you execute the "show" command again, it will display the new name guru_sampleNew

Dropping table guru_sampleNew:

Table types and its Usage:

Hive tables are just like the tables we create in traditional relational databases. Functionality such as filtering and joins can be performed on these tables.
Hive deals with two types of table structures, Internal and External tables, depending on how the data is loaded and how the schema is designed in Hive.

Internal tables

An Internal Table is tightly coupled in nature. In this type of table, we first have to create the table and then load the data.
We can call this "data on schema".
By dropping this table, both the data and the schema are removed.
The storage location of this table is /user/hive/warehouse.

When to choose an Internal Table:

If the processing data is available in the local file system
If we want Hive to manage the complete lifecycle of the data, including deletion
Sample code snippet for an Internal Table

1. To create the internal table

Hive>CREATE TABLE guruhive_internaltable (id INT, Name STRING)
Row format delimited
Fields terminated by '\t';

2. Load the data into the internal table

Hive>LOAD DATA INPATH '/user/guru99hive/data.txt' INTO table guruhive_internaltable;

3. Display the content of the table

Hive>select * from guruhive_internaltable;

4. To drop the internal table

Hive>DROP TABLE guruhive_internaltable;


If you dropped the guruhive_internaltable, including its metadata and
its data will be deleted from Hive.
From the following screenshot, we can observe the output

In above code and from screen shot we do following things,


Create the internal table
Load the data into internal table
Display the content of the table
To drop the internal table
External tables

An External Table is loosely coupled in nature. The data is available in HDFS, and the table is created on top of the HDFS data.
In other words, we can say it creates a schema on the data.
Dropping the table drops only the schema; the data is still available in HDFS as before.
External tables provide an option to create multiple schemas for the data stored in HDFS, instead of deleting the data every time the schema updates.

When to choose an External Table:

If the processing data is available in HDFS
Useful when the files are being used outside of Hive

Sample code snippet for an External Table

1. Create the external table

Hive>CREATE EXTERNAL TABLE guruhive_external(id INT, Name STRING)
Row format delimited
Fields terminated by '\t'
LOCATION '/user/guru99hive/guruhive_external';

2. If we do not specify the location at the time of table creation, we can load the data manually

Hive>LOAD DATA INPATH '/user/guru99hive/data.txt' INTO TABLE guruhive_external;

3. Display the content of the table

Hive>select * from guruhive_external;

4. To drop the external table

Hive>DROP TABLE guruhive_external;

From the following screenshot, we can observe the output.


In the above code, we do the following:

Create the external table
Load the data into the external table
Display the content of the table
Drop the external table
Difference between Internal and External tables

Feature            Internal                      External
Schema             Data on Schema                Schema on Data
Storage Location   /user/hive/warehouse          HDFS location
Data availability  Within the local file system  Within HDFS

Tables, Partitions, and Buckets are the parts of Hive data modeling.

Partitions:

Hive organizes tables into partitions. Partitioning is one of the ways of dividing a table into parts based on partition keys, and it is helpful when the table has one or more partition keys. Partition keys are the basic elements that determine how the data is stored in the table.
For example: a client has some e-commerce data that belongs to India operations, in which each state's (38 states) operations are mentioned as a whole. If we take the state column as the partition key and partition the India data as a whole, we get a number of partitions (38 partitions) equal to the number of states (38) present in India, so that each state's data can be viewed separately in the partition tables.
Sample code snippet for partitions

1. Creation of table allstates

create table allstates(state string, District string, Enrolments string)
row format delimited
fields terminated by ',';

2. Loading data into the created table allstates

Load data local inpath '/home/hduser/Desktop/AllStates.csv' into table allstates;

3. Creation of the partition table

create table state_part(District string, Enrolments string)
PARTITIONED BY(state string);

4. For partitioning we have to set this property:

set hive.exec.dynamic.partition.mode=nonstrict

5. Loading data into the partition table

INSERT OVERWRITE TABLE state_part PARTITION(state)
SELECT district, enrolments, state from allstates;

6. Actual processing and formation of the partition tables based on state as the partition key
7. There are going to be 38 partition outputs in HDFS storage, with the file names as the state names. We will check this in this step.

The following screenshots show the execution of the above code.


From the above code, we do the following:

1. Create table allstates with 3 columns: state, district, and enrolments
2. Load data into table allstates
3. Create the partition table with state as the partition key
4. Set the partition mode to non-strict (this mode activates dynamic partitioning)
5. Load data into the partition table state_part
6. Actual processing and formation of the partition tables based on state as the partition key
7. There are going to be 38 partition outputs in HDFS storage, with the file names as the state names. In this step, we see the 38 partition outputs in HDFS.
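Dynamic partitioning writes one HDFS subdirectory per distinct partition-key value, of the form state=<value>. A minimal Python sketch of that directory-naming scheme; the warehouse path and sample rows are illustrative assumptions, not output from a real cluster:

```python
# Sketch of how dynamic partitioning groups rows into per-state
# HDFS directories of the form <table_dir>/state=<value>/.
table_dir = "/user/hive/warehouse/state_part"

# Illustrative (district, enrolments) rows keyed by state.
rows = [
    ("Bihar", "Patna", "1500"),
    ("Goa", "Panaji", "300"),
    ("Bihar", "Gaya", "900"),
]

partitions = {}
for state, district, enrolments in rows:
    # Each distinct partition-key value gets its own directory.
    path = f"{table_dir}/state={state}"
    partitions.setdefault(path, []).append((district, enrolments))

for path in sorted(partitions):
    print(path, "->", len(partitions[path]), "row(s)")
```

With 38 distinct state values, the same scheme would yield the 38 partition directories the tutorial describes.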

Buckets

Buckets are used for efficient querying. Buckets in Hive provide an effective way of segregating Hive table data into multiple files or directories.

The data present in the partitions can be divided further into buckets
The division is performed based on a hash of particular columns that we select in the table
Buckets use a hashing algorithm at the back end to read each record and place it into a bucket
In Hive, we have to enable bucketing by using:

set hive.enforce.bucketing=true;
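The bucket assignment described above is essentially hash(bucketing column) modulo the number of buckets, so equal column values always land in the same bucket. A small Python sketch of that idea; it is illustrative only, since Hive uses its own hash function, and the sample job_id values are made up:

```python
# Illustration of bucketing: each record lands in bucket
# hash(column_value) % num_buckets. Hive's real hash differs;
# a simple deterministic hash is used here for reproducibility.
NUM_BUCKETS = 4

def bucket_for(value):
    h = sum(ord(c) for c in str(value))
    return h % NUM_BUCKETS

job_ids = ["IT_PROG", "SA_REP", "IT_PROG", "FI_ACCOUNT"]
buckets = [bucket_for(j) for j in job_ids]
print(buckets)  # identical job_ids map to the same bucket
```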
Step 1) Creating Bucket as shown below.


From the above screenshot:

We are creating sample_bucket with the columns first_name, job_id, department, salary, and country
We are creating 4 buckets here
Once the data is loaded, it is automatically placed into the 4 buckets

Step 2) Loading data into table sample_bucket

We assume that the "employees" table has already been created in the Hive system. In this step, we will see the loading of data from the employees table into table sample_bucket.
Before we start moving employees data into buckets, make sure that it consists of the columns first_name, job_id, department, salary, and country.
Here we are loading data into sample_bucket from the employees table.

Step 3) Displaying the 4 buckets created in Step 1

From the above screenshot, we can see that the data from the employees table has been transferred into the 4 buckets created in Step 1.
Views and Indexes

Views:

Views are similar to tables and are generated based on requirements.
We can save any result-set data as a view in Hive
Usage is similar to views in SQL
All types of DML operations can be performed on a view

Creation of a View:

Syntax:

Create VIEW <VIEWNAME> AS SELECT

Example:

Hive>Create VIEW Sample_View AS SELECT * FROM employees WHERE salary>25000

In this example, we create the view Sample_View, which will display all the rows with a salary greater than 25000.
Index:

Indexes are pointers to a particular column of a table.
The user has to manually define the index
Wherever we create an index, it means we are creating a pointer to a particular column of the table
Any changes made to the column present in the tables are stored using the index value created on the column name.

Syntax:

Create INDEX <INDEX_NAME> ON TABLE <TABLE_NAME(column names)>

Example:

Create INDEX sample_Index ON TABLE guruhive_internaltable(id)

Here we create an index on table guruhive_internaltable for the column id.

Copyright - Guru99 2016
