ICITST 2014 - Presentation
Data · December 2014
Available at: https://www.researchgate.net/publication/309616918
Authors: Pedro Roger Magalhães Vasconcelos and Gisele Azevedo de Araújo Freitas (Universidade Federal do Ceará)
Uploaded by Pedro Roger Magalhães Vasconcelos on 02 November 2016.

Performance Analysis of Hadoop MapReduce on an OpenNebula Cloud with KVM and OpenVZ Virtualizations

Pedro Roger Magalhães Vasconcelos (pedro.roger@alu.ufc.br)
Gisele Azevedo de Araújo Freitas (gisele@lia.ufc.br)

Postgraduate Program in Electrical and Computer Engineering
Federal University of Ceará
Sobral, Ceará - Brazil

Agenda

- Introduction
  - Cloud Computing
  - Virtualization
    - KVM
    - OpenVZ
- MapReduce
  - Hadoop
  - HDFS
- Experimental Evaluation
- Conclusions

The 9th International Conference for Internet Technology and Secured Transactions (ICITST-2014)

Cloud Computing

Cloud computing provides access to a set of resources, such as virtual machines, storage and network, as services.
- On-demand unlimited data storage
- On-demand computation power, mainly represented as virtual machines
- Uses the internet to access, use and process the resources
- Multitenancy
- Massive scalability

Cloud Computing

Characteristics:
- Typically hosted on a server farm
- Large number of computers and resources
- The cloud provider offers an interface for users:
- Pay for a certain amount of processing power, storage or computers
- Based on a business model
- The resources can be increased or decreased based on demand

Cloud Computing

Nowadays, there are many open source cloud computing solutions for providing infrastructure environments:
- OpenStack
- Eucalyptus
- OpenNebula
- CloudStack

Cloud Computing

OpenNebula is an open source solution that makes it easy to deploy private/hybrid infrastructure clouds based on the IaaS model.
- Great flexibility regarding hypervisor usage
- Natively supports KVM, Xen and VMware ESXi
- Drivers provided by the OpenNebula community add support for OpenVZ OS-level virtualization

Virtualization

- Logical representation of a computer in software
- Allows several operating systems to run inside a single machine

Main advantages:
- Effective use of hardware
- Virtual machine isolation
- Requires less physical hardware and dissipates less heat

KVM - Kernel-based Virtual Machine

KVM is a full-virtualization solution for the Linux kernel.
- Requires a processor with hardware virtualization extensions

Full virtualization:
- A layer, commonly called the hypervisor, exists between the virtualized operating systems and the hardware
- This layer multiplexes the system resources between competing operating system instances
- Provides total abstraction of the physical hardware
- Does not require modification of the guest OS

OpenVZ

OpenVZ is an operating system-level virtualization technology based on the Linux kernel and operating system.

OS-level virtualization:
- Allows a physical server to run multiple isolated OS instances, known as containers
- Works at the OS layer
- In practice, hypervisors work at the hardware abstraction level, while OS-level virtualization works at the system call layer

Introduction

This paper evaluates the performance of a Hadoop MapReduce cluster on an OpenNebula cloud under two different types of virtualization: full virtualization and operating system-level virtualization.

MapReduce

- Programming model that works on large datasets
- Many organizations use the MapReduce (MR) model for computing when they have huge datasets and need to process them within a short time
- Works by breaking the processing into two phases: the map phase and the reduce phase

MapReduce

Map phase:
- Processes the input in the form of key/value pairs and generates intermediate key/value pairs

Reduce phase:
- Processes all intermediate values associated with the same intermediate key generated by the map function

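The two phases can be illustrated with a minimal, single-process sketch in Python. It is not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample documents are assumptions made for illustration, and the grouping step that the framework performs between the phases is simulated with a dictionary.

```python
from collections import defaultdict

def map_phase(doc_id, text):
    """Map: take one (key, value) input pair and emit intermediate (word, 1) pairs."""
    for word in text.lower().split():
        yield word, 1

def shuffle(pairs):
    """Group intermediate values by key (the framework does this between the phases)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(word, counts):
    """Reduce: process all intermediate values that share the same key."""
    return word, sum(counts)

def word_count(documents):
    """Run map over every document, group by key, then reduce each group."""
    intermediate = []
    for doc_id, text in documents.items():
        intermediate.extend(map_phase(doc_id, text))
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())

# Hypothetical input: file name -> file content.
docs = {"a.txt": "the quick brown fox", "b.txt": "the lazy dog the end"}
counts = word_count(docs)
print(counts)
```

In a real cluster the map calls run in parallel on different nodes, and each reduce call receives every value emitted for its key regardless of which map task produced it.
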
Hadoop

- Hadoop is a distributed programming framework and an execution environment for MapReduce programs
- An MR job consists of multiple map and reduce tasks that are scheduled to run on the Hadoop cluster nodes
- Two types of nodes control the job execution process:
  - A JobTracker
  - A number of TaskTrackers

Hadoop Distributed File System - HDFS

- HDFS is the main storage system used by Hadoop
- HDFS creates multiple replicas of data blocks and distributes them among the cluster nodes
- All data is stored as HDFS files composed of data blocks of fixed size (64MB) distributed across multiple nodes
- Two types of nodes: a NameNode and a number of DataNodes
  - The NameNode maintains the metadata about the files and the directory tree
  - DataNodes store the data blocks themselves

Experimental Evaluation

To evaluate the performance of a Hadoop cluster on each virtualization platform, we propose the establishment of two private OpenNebula clouds.
- 2x IBM BladeCenter HS21
  - Intel Xeon E5-2620 CPUs at 2.00GHz (with 6 cores and HT technology in each)
  - 48GB of RAM
  - Connected to a SAN via Fibre Channel
  - Running Ubuntu GNU/Linux 14.04.1 LTS amd64
  - OpenNebula 4.8.0
- Each Hadoop cluster consists of 6 VMs
  - 2 vCPUs, 2GB of vRAM, 1GB of swap, 10GB of disk
  - Ubuntu GNU/Linux 12.04 amd64

Experimental Evaluation

- QEMU/KVM 2.0.0
  - For best performance, the VMs used KVM VirtIO paravirtualized drivers for disk and network
- OpenVZ kernel 2.6.32-openvz-amd64-042stab093.5
- Hadoop 1.2.1
- Oracle Java 1.7.0_45

- HDFS size: 60GB (6 x 10GB per virtual machine)
- HDFS block size: 64MB
- HDFS replication factor: 3

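As a rough illustration of what these HDFS settings imply, the sketch below works out how many blocks and replicas a file occupies and how much distinct data the cluster can hold at most. The helper names are assumptions, and the capacity figure is an upper bound that ignores space used by the guest OS and by non-HDFS data on each 10GB disk.

```python
import math

MB = 1024 ** 2
GB = 1024 ** 3

BLOCK_SIZE = 64 * MB        # HDFS block size from the slide
REPLICATION = 3             # HDFS replication factor from the slide
RAW_CAPACITY = 6 * 10 * GB  # 6 VMs x 10GB of disk each

def blocks_for(file_size):
    """Number of HDFS blocks needed for a file (the last block may be partial)."""
    return math.ceil(file_size / BLOCK_SIZE)

def replicas_for(file_size):
    """Total block replicas stored across the cluster for one file."""
    return blocks_for(file_size) * REPLICATION

# Upper bound on distinct user data: every byte is stored REPLICATION times.
usable_capacity_gb = RAW_CAPACITY / REPLICATION / GB

largest_input = 2 * GB  # largest input size used in the experiments
print(blocks_for(largest_input), replicas_for(largest_input), usable_capacity_gb)
```

Under these assumptions a 2GB input splits into 32 blocks stored as 96 replicas, and at most 20GB of distinct data fits in the 60GB of raw space.
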
WordCount

- Application that reads text files as input and computes the number of occurrences of each word in a file
- WordCount ships with the default Hadoop installation and is widely used as a method of comparing performance between different Hadoop clusters
- Input files:
  - 64MB, 128MB, 256MB, 512MB, 1GB and 2GB
  - Generated from the concatenation of random text files downloaded from Project Gutenberg
- OpenVZ reached the lowest WordCount execution time for all input data sizes

Figure: WordCount Results

TeraSort

- The goal of the TeraSort benchmark is to sort a given volume of data as quickly as possible
- It is a benchmark that combines the use of the HDFS layer and the MapReduce layer
- TeraSort consists of 3 MR applications:
  - TeraGen is an MR program that generates the data
  - TeraSort samples the input data and uses MR to sort the data into a total order
  - TeraValidate is an MR program that validates that the output is sorted
- We used variable sizes for the generation of input data through TeraGen: 512MB, 1GB and 2GB

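The idea of sampling the input to obtain a total order can be sketched in a few lines of Python. This is a simplified, in-memory illustration and not the TeraSort source: the function names, the sample size, and the number of partitions are assumptions. Sampled keys define range boundaries, each key is routed to the partition that owns its range, and every partition is then sorted independently, as a reducer would do.

```python
import random
from bisect import bisect_right

def choose_split_points(keys, num_partitions, sample_size, rng):
    """Sample the input and pick num_partitions - 1 boundaries from the sorted sample."""
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    step = len(sample) / num_partitions
    return [sample[int(step * i)] for i in range(1, num_partitions)]

def partition(keys, split_points):
    """Route each key to the range partition (reducer) that owns it."""
    parts = [[] for _ in range(len(split_points) + 1)]
    for key in keys:
        parts[bisect_right(split_points, key)].append(key)
    return parts

def total_order_sort(keys, num_partitions=4, sample_size=100, seed=0):
    """Sort keys by range-partitioning them and sorting each partition on its own."""
    rng = random.Random(seed)
    splits = choose_split_points(keys, num_partitions, sample_size, rng)
    sorted_parts = [sorted(p) for p in partition(keys, splits)]
    # Every key in partition i is <= every key in partition i + 1,
    # so concatenating the sorted partitions yields a globally sorted list.
    return [k for part in sorted_parts for k in part]

# Hypothetical input standing in for generated records.
data_rng = random.Random(42)
data = [data_rng.randrange(10**6) for _ in range(1000)]
result = total_order_sort(data)
print(result[:5], result[-5:])
```

A representative sample keeps the partitions close to equal in size, which is what lets the sort work be spread evenly across reducers.
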
Figure: TeraSort Results

TeraSort

- OpenVZ performs better than KVM in the TeraSort and TeraValidate benchmarks
  - TeraSort and TeraValidate are CPU-intensive and I/O read-intensive tests
- KVM outperforms OpenVZ in all the TeraGen tests
  - TeraGen is an I/O write-intensive test

TestDFSIO

- TestDFSIO is a read/write test for HDFS
- Useful to perform stress tests on HDFS, to find bottlenecks and to evaluate the cluster I/O rate
- TestDFSIO consists of 2 tests:
  - The first generates and writes the files in an HDFS directory
  - The second reads the files created by the first run and performs measurements
- In our runs, we instructed TestDFSIO to create 10 files of 100MB in HDFS

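TestDFSIO reports both a throughput and an average I/O rate, and the two figures can differ. The sketch below shows one common way to read them, assuming throughput is the total data divided by the sum of the per-file I/O times and the average I/O rate is the mean of the per-file rates. The function names and timings are hypothetical and are not measurements from this work.

```python
def throughput_mb_s(sizes_mb, times_s):
    """Aggregate throughput: total data divided by the sum of per-file I/O times."""
    return sum(sizes_mb) / sum(times_s)

def average_io_rate_mb_s(sizes_mb, times_s):
    """Average I/O rate: mean of the individual per-file rates."""
    rates = [size / t for size, t in zip(sizes_mb, times_s)]
    return sum(rates) / len(rates)

# Hypothetical timings for 10 files of 100MB, with one slow file.
sizes = [100.0] * 10
times = [4.0] * 9 + [10.0]

throughput = throughput_mb_s(sizes, times)
avg_rate = average_io_rate_mb_s(sizes, times)
print(round(throughput, 2), round(avg_rate, 2))
```

With these made-up numbers the single slow file pulls the throughput down more than the average I/O rate, so a gap between the two metrics hints at uneven per-file performance.
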
Figure: TestDFSIO Results

TestDFSIO

- OpenVZ performance in the write tests was much lower than that of KVM
- However, in the read tests OpenVZ performs better than KVM

NNBench

- This test generates various requests to HDFS, normally with very small payloads, for the purpose of stressing it
- The benchmark can simulate requests for creating, reading, renaming and deleting files on HDFS
- In our implementation, NNBench created 1000 files using 6 maps and 2 reducers
- OpenVZ achieved better performance than KVM in the sequential creation of files

Figure: NNBench Results

MRBench

- MRBench loops a small job a number of times
- CPU-intensive and network-intensive test
- It focuses on the MapReduce layer, as its impact on the HDFS layer is very limited
- We ran MRBench with a loop of 50 small test jobs
- OpenVZ completed the test run in less time
  - Almost 50% faster

Figure: MRBench Results

Pi

- The Hadoop Pi benchmark is a MapReduce program that estimates Pi using the Monte Carlo method
- This benchmark is focused on computation and involves nearly no storage I/O or network traffic
- In our runs, Pi calculated 10 billion samples spread across 6 map tasks
- OpenVZ completed the test run in less time
  - More than 2x faster

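The Monte Carlo idea behind this benchmark can be sketched in plain Python. This is a simplified illustration that uses ordinary pseudo-random sampling and is not Hadoop's implementation: the function names and the reduced sample count are assumptions. Each simulated map task counts how many random points in the unit square fall inside the quarter circle, and the final step combines the counts.

```python
import random

def count_inside(num_samples, seed):
    """Map task: count random points in the unit square that land inside the quarter circle."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return inside

def estimate_pi(num_maps, samples_per_map):
    """Reduce step: combine the per-map counts into a single estimate of Pi."""
    inside = sum(count_inside(samples_per_map, seed) for seed in range(num_maps))
    return 4.0 * inside / (num_maps * samples_per_map)

# 6 map tasks as in the experiment, with far fewer samples so the sketch runs quickly.
pi_estimate = estimate_pi(num_maps=6, samples_per_map=50_000)
print(pi_estimate)
```

Because each map task only needs a seed and a sample count and returns a single integer, the job is dominated by CPU work, which is consistent with the near absence of storage and network traffic noted above.
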
Figure: Pi Results

Conclusions

- OpenVZ performs better than KVM in the CPU-intensive tests, in both the MR layer and the HDFS layer, such as WordCount, TeraSort, TeraValidate, MRBench and Pi
  - The CPU overhead in OpenVZ is much smaller than the KVM CPU overhead
- OpenVZ also reaches better results than KVM in the I/O read tests, as shown by the Read Throughput and Read Average I/O Rate values of the TestDFSIO benchmark
- However, OpenVZ performs worse than KVM in the I/O write tests of large files in TeraGen, and it reached low Write Throughput and Write Average I/O Rate values in TestDFSIO
- But in the sequential creation of numerous small files in the NNBench test, the time elapsed in the KVM run was almost twice the OpenVZ time

Conclusions

- By using OpenVZ, a Hadoop cluster can achieve high performance on virtualized systems when running jobs that make intensive use of CPU, network and I/O reads. Jobs that perform intense disk write operations should be executed with caution or executed natively

Thank you!
pedro.roger@alu.ufc.br gisele@lia.ufc.br