Learn Hive in 24 Hours
Ebook, 113 pages, 36 minutes

About this ebook

Apache Hive is a member of the database family that works within the Hadoop ecosystem. It provides features such as data summarization, ad-hoc querying, and analysis of large datasets. If you are not an experienced programmer, this edition will teach you how to use Hive queries without writing complex code.


Most users struggle to find a dedicated course on Hive. The goal of this e-book is to cover Hive, and only Hive, with minimal jargon. The notes, lessons, and hands-on examples in this short e-book are simplified and carefully presented to answer your Hive questions. Instead of writing long MapReduce programs in Java, the e-book shows how to write the same program with a minimal code snippet.


Beginners and experienced peers alike will enjoy this book. They will discover and learn more Hive patterns for data processing and data integration. Unlike other e-books, which skip basic details on the assumption that readers have prior subject knowledge, this edition pays close attention to every small aspect of Hive, such as how to set up and configure Hive in your environment.


This e-book is also helpful for those who just want to explore Hive and don't want to spend heavily on short courses. You will quickly learn, apply, and share your Hive knowledge with this e-book.


Table of contents


Chapter 1: Introduction


What is Hive?


Hive Architecture


Different modes of Hive


What is Hive Server2 (HS2)?


Hive vs Map Reduce


Chapter 2: Installation and Configuration


Installation of Hive


Hive shell commands


Install and configure MYSQL database


Chapter 3: Data operations


Data types in Hive


Creation and dropping of Database in Hive


Create, Drop and altering of tables in Hive


Table types and its Usage


Partitions


Buckets


Chapter 4: Queries and Implementation


Order by query


Group by query


Sort by


Cluster By


Distribute By


Join queries


Different type of joins


Sub queries


Embedding custom scripts


UDFs (User-Defined Functions)


Chapter 5: Query Language, Built-in Operators and Functions


Hive Query Language (HQL)


Built-in operators


Built-in functions


Chapter 6: Data Extraction


Working with Structured Data using Hive


Working with Semi structured data using Hive (XML, JSON)


Hive in Real time projects – When and Where to Use

Language: English
Publisher: Publishdrive
Release date: Nov 12, 2021

    Book preview

    Learn Hive in 24 Hours

    By Alex Nordeen

    Copyright 2021 - All Rights Reserved – Alex Nordeen

    ALL RIGHTS RESERVED. No part of this publication may be reproduced or transmitted in any form whatsoever, electronic, or mechanical, including photocopying, recording, or by any informational storage or retrieval system without express written, dated and signed permission from the author.

    Chapter 1: Introduction

    Hive is developed on top of Hadoop. It is a data warehouse framework for querying and analyzing data stored in HDFS. Hive is open-source software that lets programmers analyze large data sets on Hadoop.

    What is Hive?

    Hive is an ETL and data warehousing tool developed on top of the Hadoop Distributed File System (HDFS). Hive simplifies operations such as:

    Data encapsulation

    Ad-hoc queries

    Analysis of huge datasets

    Important characteristics of Hive

    In Hive, tables and databases are created first and then data is loaded into these tables.
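
    The create-then-load order described above can be sketched in HiveQL. This is only a minimal illustration: the database name demo_db, the table employees, its columns, and the path /tmp/employees.csv are hypothetical and do not come from the book.

    ```sql
    -- Hypothetical names and path, for illustration only.
    CREATE DATABASE IF NOT EXISTS demo_db;
    USE demo_db;

    -- Step 1: define the table and its schema; the table starts out empty.
    CREATE TABLE IF NOT EXISTS employees (
      id     INT,
      name   STRING,
      salary DOUBLE
    )
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ','
    STORED AS TEXTFILE;

    -- Step 2: load a comma-separated file from the local file system into the table.
    LOAD DATA LOCAL INPATH '/tmp/employees.csv' INTO TABLE employees;

    -- Step 3: check a few of the loaded rows.
    SELECT * FROM employees LIMIT 10;
    ```

    Hive does not validate the file against the schema at load time; fields that do not match the declared types are typically read back as NULL when the table is queried.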

    As a data warehouse, Hive is designed for managing and querying only structured data that is stored in tables.

    When dealing with structured data, MapReduce lacks optimization and usability features such as UDFs, whereas the Hive framework provides them. Query optimization here means executing a query in a way that is efficient in terms of performance.

    Hive's SQL-inspired language shields the user from the complexity of MapReduce programming. For ease of learning, it reuses familiar concepts from the relational database world, such as tables, rows, columns, and schemas.

    Hadoop's programming model works on flat files, so Hive can use directory structures to partition data and improve performance on certain queries.
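
    As a rough sketch of how directory-based partitioning looks in HiveQL, the example below declares a partitioned table and then queries a single partition. The table page_views, its columns, the partition column view_date, and the file path are all hypothetical.

    ```sql
    -- Hypothetical table, columns, and path, for illustration only.
    CREATE TABLE IF NOT EXISTS page_views (
      user_id BIGINT,
      url     STRING
    )
    PARTITIONED BY (view_date STRING)
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY '\t'
    STORED AS TEXTFILE;

    -- Each partition value gets its own subdirectory under the table's
    -- directory, for example .../page_views/view_date=2021-11-12/
    LOAD DATA LOCAL INPATH '/tmp/views_2021_11_12.tsv'
    INTO TABLE page_views PARTITION (view_date = '2021-11-12');

    -- Filtering on the partition column lets Hive read only the matching
    -- directory instead of scanning the whole table.
    SELECT url, COUNT(*) AS hits
    FROM page_views
    WHERE view_date = '2021-11-12'
    GROUP BY url;
    ```

    The partition column is not stored inside the data file; its value comes from the directory name, which is why the loaded file holds only user_id and url.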

    An important component of Hive is the Metastore, which stores schema information. The Metastore typically resides in a relational database. We can interact with Hive using methods such as:

    Web GUI

    Java Database Connectivity (JDBC) interface

    Most interactions tend to take place over a command line interface (CLI). Hive provides a CLI for writing Hive queries in Hive Query Language (HQL).

    Generally, HQL syntax is similar to the SQL syntax that most data analysts are familiar with. The sample query below displays all the records in the named table.

    Sample query: SELECT * FROM <table_name>;

    Hive supports several file formats, including TEXTFILE, SEQUENCEFILE, ORC, and RCFILE (Record Columnar File).
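
    The file format is chosen per table with the STORED AS clause. The sketch below assumes a Hive version with ORC support (0.11 or later), and the table names sales_text and sales_orc are hypothetical.

    ```sql
    -- Hypothetical tables, for illustration only.
    -- A plain-text staging table.
    CREATE TABLE IF NOT EXISTS sales_text (
      order_id BIGINT,
      amount   DOUBLE
    )
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ','
    STORED AS TEXTFILE;

    -- The same schema stored in the columnar ORC format.
    CREATE TABLE IF NOT EXISTS sales_orc (
      order_id BIGINT,
      amount   DOUBLE
    )
    STORED AS ORC;

    -- Copy the rows across; Hive rewrites them in ORC format as part of the insert.
    INSERT OVERWRITE TABLE sales_orc
    SELECT order_id, amount FROM sales_text;
    ```

    Going through a staging table is one common pattern because LOAD DATA only moves files into place and does not convert them, so a text file loaded directly into an ORC table would not be readable.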

    For single-user metadata storage, Hive uses an embedded Derby database; for multi-user or shared metadata, Hive uses an external relational database such as MySQL.
