Data operations in Hive


Data operations are the operations that can be performed on the databases and tables present in Hive, such as DML and DDL commands.

In this tutorial, you will learn:

Data types in Hive
Creation and dropping of Database in Hive
Create, Drop and altering of tables in Hive
Table types and its Usage
Partitions
Buckets

Data types in Hive


Data types are very important elements in the Hive query language and in data modeling. To define table column types, we must know the data types and their usage.
The following gives a brief overview of the data types present in Hive:

Numeric Types
String Types
Date/Time Types
Complex Types

Numeric Types:

Type      Memory allocation
TINYINT   1-byte signed integer (-128 to 127)
SMALLINT  2-byte signed integer (-32,768 to 32,767)
INT       4-byte signed integer (-2,147,483,648 to 2,147,483,647)
BIGINT    8-byte signed integer
FLOAT     4-byte single-precision floating point number
DOUBLE    8-byte double-precision floating point number
DECIMAL   Precision and scale can be defined for this type
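The signed ranges above follow directly from the number of bytes: an n-bit signed integer spans -2^(n-1) to 2^(n-1)-1. A quick Python sketch (not Hive code, just arithmetic) that reproduces the table:

```python
# Compute the value range of an n-byte signed integer, as used by
# Hive's TINYINT (1 byte), SMALLINT (2), INT (4), and BIGINT (8).
def signed_range(n_bytes):
    bits = n_bytes * 8
    return (-2 ** (bits - 1), 2 ** (bits - 1) - 1)

for name, size in [("TINYINT", 1), ("SMALLINT", 2), ("INT", 4), ("BIGINT", 8)]:
    lo, hi = signed_range(size)
    print(f"{name}: {lo} to {hi}")
```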

String Types:

Type     Length
CHAR     255
VARCHAR  1 to 65,535
STRING   No length needs to be defined (no limit)

Date/Time Types:

Type       Usage
TIMESTAMP  Supports the traditional UNIX timestamp, with optional nanosecond precision
DATE       In YYYY-MM-DD format. The range of supported values is 0000-01-01 to 9999-12-31, depending on support by the primitive Java Date type.

Complex Types:

Type     Usage
ARRAY    ARRAY<data_type>. Negative values and non-constant expressions are not allowed.
MAP      MAP<primitive_type, data_type>. Negative values and non-constant expressions are not allowed.
STRUCT   STRUCT<col_name : data_type, ...>
UNION    UNIONTYPE<data_type, data_type, ...>
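To see what these complex types model, here is a rough Python analogy (an illustration only, not Hive code; the row values are made up): an ARRAY maps to a list, a MAP to a dict, and a STRUCT to named fields.

```python
# Rough Python analogues of one Hive row using complex types:
# ARRAY<STRING>, MAP<STRING, INT>, STRUCT<city: STRING, zip: STRING>.
row = {
    "name": "john",                                     # STRING
    "phone_numbers": ["555-1234", "555-5678"],          # ARRAY<STRING>
    "scores": {"math": 90, "hive": 85},                 # MAP<STRING, INT>
    "address": {"city": "Bangalore", "zip": "560001"},  # STRUCT<...>
}

# Access mirrors HiveQL: phone_numbers[0], scores['math'], address.city
print(row["phone_numbers"][0])   # first array element
print(row["scores"]["math"])     # map lookup by key
print(row["address"]["city"])    # struct field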

Creation and dropping of Database in Hive:

Create Database:
To create a database in the Hive shell, we have to use the command shown in the syntax below:

Syntax:

Create database <DatabaseName>

Example:

Create database guru99

From the above screenshot, we are doing two things:

Creating the database "guru99" in Hive
Displaying the existing databases using the "show" command

On the same screen, the database name "guru99" is displayed at the end of the output when we execute the "show" command, which means the database "guru99" was successfully created.
Drop Database:
To drop a database in the Hive shell, we have to use the "drop" command shown in the syntax below:

Syntax:

Drop database <DatabaseName>

Example:

Drop database guru99

In the above screenshot, we are doing two things:

Dropping database 'guru99' from Hive
Cross-checking the same with the "show" command

On the same screen, after checking the databases with the "show" command, the database "guru99" no longer appears inside Hive, so we can confirm that the database "guru99" has been dropped.

Create, Drop and altering of tables in Hive

Table operations such as creating, altering, and dropping tables in Hive are covered in this section.

In the above screenshot, we are creating a table with columns and altering the table name.

1. Creating table guru_sample with two columns, "empid" and "empname"
2. Displaying the tables present in the guru99 database
3. guru_sample is displayed under the tables
4. Altering table "guru_sample" to "guru_sampleNew"
5. When you execute the "show" command again, it will display the new name guru_sampleNew

Dropping table guru_sampleNew:

Table types and its Usage:

Hive tables are just like the tables we create in traditional relational databases. Functionality such as filtering and joins can be performed on these tables.
Hive deals with two types of table structures, Internal and External tables, depending on how the data is loaded and how the schema is designed in Hive.

Internal tables

An Internal Table is tightly coupled in nature. In this type of table, we first have to create the table and then load the data.
We can call this "data on schema".
By dropping this table, both the data and the schema are removed.
The storage location of this table is /user/hive/warehouse.

When to choose an Internal Table:

If the processing data is available in the local file system
If we want Hive to manage the complete lifecycle of the data, including deletion
Sample code snippet for an Internal Table

1. To create the internal table

Hive>CREATE TABLE guruhive_internaltable (id INT, Name STRING)
Row format delimited
Fields terminated by '\t';

2. Load the data into the internal table

Hive>LOAD DATA INPATH '/user/guru99hive/data.txt' INTO table guruhive_internaltable;

3. Display the content of the table

Hive>select * from guruhive_internaltable;

4. To drop the internal table

Hive>DROP TABLE guruhive_internaltable;


If you dropped the guruhive_internaltable, including its metadata and
its data will be deleted from Hive.
From the following screenshot, we can observe the output

In above code and from screen shot we do following things,


Create the internal table
Load the data into internal table
Display the content of the table
To drop the internal table
External tables

An External Table is loosely coupled in nature. The data is available in HDFS, and the table is created on top of the HDFS data.
In other words, we can say it creates a schema on the data.
Dropping the table drops only the schema; the data is still available in HDFS as before.
External tables provide an option to create multiple schemas for the data stored in HDFS, instead of deleting the data every time the schema updates.

When to choose an External Table:

If the processing data is available in HDFS
Useful when the files are being used outside of Hive

Sample code snippet for an External Table

1. Create the external table

Hive>CREATE EXTERNAL TABLE guruhive_external(id INT, Name STRING)
Row format delimited
Fields terminated by '\t'
LOCATION '/user/guru99hive/guruhive_external';

2. If we do not specify the location at the time of table creation, we can load the data manually

Hive>LOAD DATA INPATH '/user/guru99hive/data.txt' INTO TABLE guruhive_external;

3. Display the content of the table

Hive>select * from guruhive_external;

4. To drop the external table

Hive>DROP TABLE guruhive_external;

From the following screenshot, we can observe the output.


In the above code, we do the following:

Create the external table
Load the data into the external table
Display the content of the table
Drop the external table
Difference between Internal and External tables

Feature            Internal                      External
Schema             Data on Schema                Schema on Data
Storage Location   /user/hive/warehouse          HDFS location
Data availability  Within the local file system  Within HDFS

Tables, Partitions, and Buckets are the parts of Hive data modeling.

Partitions:

Hive organizes tables into partitions. Partitioning is one of the ways of dividing a table into parts based on partition keys, and it is helpful when the table has one or more partition keys. Partition keys are the basic elements that determine how the data is stored in the table.
For example: a client has some e-commerce data that belongs to India operations, in which each state's (38 states) operations are mentioned as a whole. If we take the state column as the partition key and partition the India data as a whole, we get a number of partitions (38 partitions) equal to the number of states (38) present in India, so that each state's data can be viewed separately in the partition tables.
Sample code snippet for partitions

1. Creation of table allstates

create table allstates(state string, District string, Enrolments string)
row format delimited
fields terminated by ',';

2. Loading data into the created table allstates

Load data local inpath '/home/hduser/Desktop/AllStates.csv' into table allstates;

3. Creation of the partition table

create table state_part(District string, Enrolments string)
PARTITIONED BY(state string);

4. For partitioning we have to set this property:

set hive.exec.dynamic.partition.mode=nonstrict

5. Loading data into the partition table

INSERT OVERWRITE TABLE state_part PARTITION(state)
SELECT district, enrolments, state from allstates;

6. Actual processing and formation of the partition tables based on state as the partition key
7. There are going to be 38 partition outputs in HDFS storage, with the file names as the state names. We will check this in this step.

The following screenshots show the execution of the above code.


From the above code, we do the following:

1. Create table allstates with 3 columns: state, district, and enrolments
2. Load data into table allstates
3. Create the partition table with state as the partition key
4. Set the partition mode to non-strict (this mode activates dynamic partitioning)
5. Load data into the partition table state_part
6. Actual processing and formation of the partition tables based on state as the partition key
7. There are going to be 38 partition outputs in HDFS storage, with the file names as the state names. In this step, we see the 38 partition outputs in HDFS.
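Dynamic partitioning writes one HDFS subdirectory per distinct partition-key value, of the form state=<value>. A minimal Python sketch of that directory-naming scheme; the warehouse path and sample rows are illustrative assumptions, not output from a real cluster:

```python
# Sketch of how dynamic partitioning groups rows into per-state
# HDFS directories of the form <table_dir>/state=<value>/.
table_dir = "/user/hive/warehouse/state_part"

# Illustrative (district, enrolments) rows keyed by state.
rows = [
    ("Bihar", "Patna", "1500"),
    ("Goa", "Panaji", "300"),
    ("Bihar", "Gaya", "900"),
]

partitions = {}
for state, district, enrolments in rows:
    # Each distinct partition-key value gets its own directory.
    path = f"{table_dir}/state={state}"
    partitions.setdefault(path, []).append((district, enrolments))

for path in sorted(partitions):
    print(path, "->", len(partitions[path]), "row(s)")
```

With 38 distinct state values, the same scheme would yield the 38 partition directories the tutorial describes.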

Buckets

Buckets are used for efficient querying. Buckets in Hive provide an effective way of segregating Hive table data into multiple files or directories.

The data present in the partitions can be divided further into buckets
The division is performed based on a hash of particular columns that we select in the table
Buckets use a hashing algorithm at the back end to read each record and place it into a bucket
In Hive, we have to enable bucketing by using:

set hive.enforce.bucketing=true;
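The bucket assignment described above is essentially hash(bucketing column) modulo the number of buckets, so equal column values always land in the same bucket. A small Python sketch of that idea; it is illustrative only, since Hive uses its own hash function, and the sample job_id values are made up:

```python
# Illustration of bucketing: each record lands in bucket
# hash(column_value) % num_buckets. Hive's real hash differs;
# a simple deterministic hash is used here for reproducibility.
NUM_BUCKETS = 4

def bucket_for(value):
    h = sum(ord(c) for c in str(value))
    return h % NUM_BUCKETS

job_ids = ["IT_PROG", "SA_REP", "IT_PROG", "FI_ACCOUNT"]
buckets = [bucket_for(j) for j in job_ids]
print(buckets)  # identical job_ids map to the same bucket
```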
Step 1) Creating Bucket as shown below.


From the above screenshot:

We are creating sample_bucket with the columns first_name, job_id, department, salary, and country
We are creating 4 buckets here
Once the data is loaded, it is automatically placed into the 4 buckets

Step 2) Loading data into table sample_bucket

We assume that the "employees" table has already been created in the Hive system. In this step, we will see the loading of data from the employees table into table sample_bucket.
Before we start moving employees data into buckets, make sure that it consists of the columns first_name, job_id, department, salary, and country.
Here we are loading data into sample_bucket from the employees table.

Step 3) Displaying the 4 buckets created in Step 1

From the above screenshot, we can see that the data from the employees table has been transferred into the 4 buckets created in Step 1.
Views and Indexes

Views:

Views are similar to tables and are generated based on requirements.
We can save any result-set data as a view in Hive
Usage is similar to views in SQL
All types of DML operations can be performed on a view

Creation of a View:

Syntax:

Create VIEW <VIEWNAME> AS SELECT

Example:

Hive>Create VIEW Sample_View AS SELECT * FROM employees WHERE salary>25000

In this example, we create the view Sample_View, which will display all the rows with a salary greater than 25000.
Index:

Indexes are pointers to a particular column of a table.
The user has to manually define the index
Wherever we create an index, it means we are creating a pointer to a particular column of the table
Any changes made to the column present in the tables are stored using the index value created on the column name.

Syntax:

Create INDEX <INDEX_NAME> ON TABLE <TABLE_NAME(column names)>

Example:

Create INDEX sample_Index ON TABLE guruhive_internaltable(id)

Here we create an index on table guruhive_internaltable for the column id.

Copyright - Guru99 2016
