ICITST 2014 - Presentation

Data · December 2014


Performance Analysis of Hadoop MapReduce on an
OpenNebula Cloud with KVM and OpenVZ
Virtualizations

Pedro Roger Magalhães Vasconcelos


pedro.roger@alu.ufc.br

Gisele Azevedo de Araújo Freitas


gisele@lia.ufc.br
Postgraduate Program in Electrical and Computer Engineering
Federal University of Ceará
Sobral, Ceará - Brazil
Agenda

- Introduction
  - Cloud Computing
  - Virtualization
    - KVM
    - OpenVZ
- MapReduce
  - Hadoop
  - HDFS
- Experimental Evaluation
- Conclusions

The 9th International Conference for Internet Technology and Secured Transactions (ICITST-2014)

Introduction
Cloud Computing

Cloud computing provides access to a set of resources, such as virtual machines, storage and networks, as services.

- On-demand, unlimited data storage
- On-demand computation power, mainly in the form of virtual machines
- Resources are accessed, used and processed over the Internet
- Multitenancy
- Massive scalability

Characteristics:

- Typically hosted on a server farm
- Large number of computers and resources
- The cloud provider offers an interface for users:
  - Pay for a certain amount of processing power, storage or computers
- Based on a business model
- Resources can be increased or decreased based on demand

Nowadays, there are many open source cloud computing solutions for providing infrastructure environments:

- OpenStack
- Eucalyptus
- OpenNebula
- CloudStack

OpenNebula is an open source solution that makes it easy to deploy private/hybrid infrastructure clouds based on the IaaS model.

- Great flexibility regarding hypervisor usage
- Natively supports KVM, Xen and VMware ESXi
- Drivers provided by the OpenNebula community add support for OpenVZ OS-level virtualization

Introduction
Virtualization

- Logical representation of a computer in software
- Allows several operating systems to run inside a single machine

Main advantages:

- Effective use of hardware
- Virtual machine isolation
- Requires less physical hardware and dissipates less heat

KVM - Kernel-based Virtual Machine

KVM is a full-virtualization solution for the Linux kernel.

- Requires a processor with hardware virtualization extensions

Full virtualization:

- A layer, commonly called the hypervisor, sits between the virtualized operating systems and the hardware
- This layer multiplexes the system resources between competing operating system instances
- Provides total abstraction of the physical hardware
- Does not require modifications to the guest OS

OpenVZ

OpenVZ is an operating system-level virtualization technology based on the Linux kernel and operating system.

OS-level virtualization:

- Allows a physical server to run multiple isolated OS instances, known as containers
- Works at the OS layer
- In practice, hypervisors work at the hardware abstraction level, while OS-level virtualization works at the system call layer

This paper evaluates the performance of a Hadoop MapReduce cluster on an OpenNebula cloud under two different types of virtualization: full virtualization and operating system-level virtualization.

MapReduce

- Programming model for processing large datasets
- Many organizations use the MapReduce (MR) model when they have huge datasets and need to process them in a short time
- Works by breaking the processing into two phases: the map phase and the reduce phase

Map phase:

- Processes the input in the form of key/value pairs and generates intermediate key/value pairs

Reduce phase:

- Processes all intermediate values associated with the same intermediate key generated by the map function

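The two phases can be illustrated with a small in-memory sketch. This is not Hadoop code: the function names (`map_words`, `reduce_counts`, `run_mapreduce`) and the sample lines are hypothetical, and the grouping step stands in for the shuffle that a real framework performs between the map and reduce phases.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_words(key: int, line: str) -> Iterator[Tuple[str, int]]:
    """Map: turn one (line number, line) input pair into (word, 1) pairs."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Reduce: combine all values that share the same intermediate key."""
    return word, sum(counts)


def run_mapreduce(
    records: Iterable[Tuple[int, str]],
    map_fn: Callable[[int, str], Iterator[Tuple[str, int]]],
    reduce_fn: Callable[[str, Iterable[int]], Tuple[str, int]],
) -> Dict[str, int]:
    """Run map, group values by key (a stand-in for the shuffle), then reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in records:
        for out_key, out_value in map_fn(key, value):
            groups[out_key].append(out_value)
    return dict(reduce_fn(k, v) for k, v in sorted(groups.items()))


# Hypothetical input: each record is a (line number, text) pair.
lines = ["the quick brown fox", "the lazy dog", "the fox"]
result = run_mapreduce(enumerate(lines), map_words, reduce_counts)
print(result)
```

In a real cluster the map calls run in parallel on different nodes and the grouped values are moved over the network, which is why later benchmarks distinguish CPU, network and I/O costs.
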
Hadoop

- Hadoop is a distributed programming framework and an execution environment for MapReduce programs
- An MR job consists of multiple map and reduce tasks that are scheduled to run on the Hadoop cluster nodes
- Two types of nodes control the job execution process:
  - A JobTracker
  - A number of TaskTrackers

Hadoop Distributed File System - HDFS

- HDFS is the main storage system used by Hadoop
- HDFS creates multiple replicas of data blocks and distributes them among the cluster nodes
- All data is stored as HDFS files composed of data blocks of fixed size (64MB) distributed across multiple nodes
- Two types of nodes: a NameNode and a number of DataNodes
  - The NameNode maintains the metadata about the files and the directory tree
  - DataNodes store the data blocks themselves

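The block and replica arithmetic described above can be sketched as follows. The round-robin placement is a deliberate simplification for illustration and is not HDFS's actual placement policy; the node names and the 256MB file size are hypothetical, and the replication factor of 3 is the value used later in the experiments.

```python
import math
from typing import Dict, List

BLOCK_SIZE_MB = 64  # fixed block size stated on the slide
REPLICATION = 3     # replication factor used in the experiments


def block_count(file_size_mb: float, block_size_mb: int = BLOCK_SIZE_MB) -> int:
    """Number of blocks needed to hold a file (the last block may be partial)."""
    return math.ceil(file_size_mb / block_size_mb)


def place_replicas(
    n_blocks: int, datanodes: List[str], replication: int = REPLICATION
) -> Dict[int, List[str]]:
    """Toy round-robin placement: each block goes to `replication` distinct nodes."""
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds the number of DataNodes")
    return {
        block: [datanodes[(block + r) % len(datanodes)] for r in range(replication)]
        for block in range(n_blocks)
    }


nodes = [f"datanode{i}" for i in range(1, 7)]  # hypothetical node names
blocks = block_count(256)                      # hypothetical 256MB file
layout = place_replicas(blocks, nodes)
print(blocks, layout[0])
```

The NameNode would hold a mapping like `layout` as metadata, while the DataNodes hold the block contents.
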
Experimental Evaluation

To evaluate the performance of a Hadoop cluster on each virtualization platform, we propose the establishment of two private OpenNebula clouds.

- 2x IBM BladeCenter HS21
  - Intel Xeon E5-2620 CPUs at 2.00GHz (6 cores each, with HT technology)
  - 48GB of RAM
  - Connected to a SAN via Fibre Channel
  - Running Ubuntu GNU/Linux 14.04.1 LTS amd64
  - OpenNebula 4.8.0
- Each Hadoop cluster consists of 6 VMs
  - 2 vCPUs, 2GB of vRAM, 1GB of swap, 10GB of disk
  - Ubuntu GNU/Linux 12.04 amd64

- QEMU/KVM 2.0.0
  - For best performance, the VMs used KVM VirtIO paravirtualized drivers for disk and network
- OpenVZ kernel 2.6.32-openvz-amd64-042stab093.5
- Hadoop 1.2.1
- Oracle Java 1.7.0_45
- HDFS size: 60GB (6 virtual machines x 10GB each)
- HDFS block size: 64MB
- HDFS replication factor: 3

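The cluster totals implied by this configuration can be checked with a short calculation. This is a sketch under two assumptions that the slides do not state: that all six VMs contribute their full 10GB to HDFS, and that usable capacity is simply raw capacity divided by the replication factor (an upper bound that ignores space taken by the operating system and temporary files). The `Cluster` class is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    vms: int
    vcpus_per_vm: int
    vram_gb_per_vm: int
    disk_gb_per_vm: int
    replication: int

    @property
    def total_vcpus(self) -> int:
        return self.vms * self.vcpus_per_vm

    @property
    def total_vram_gb(self) -> int:
        return self.vms * self.vram_gb_per_vm

    @property
    def raw_hdfs_gb(self) -> int:
        return self.vms * self.disk_gb_per_vm

    @property
    def usable_hdfs_gb(self) -> float:
        # Upper bound: every block is stored `replication` times.
        return self.raw_hdfs_gb / self.replication


# Values taken from the experimental setup described above.
cluster = Cluster(vms=6, vcpus_per_vm=2, vram_gb_per_vm=2, disk_gb_per_vm=10, replication=3)
print(cluster.total_vcpus, cluster.total_vram_gb, cluster.raw_hdfs_gb, cluster.usable_hdfs_gb)
```

Under these assumptions the 60GB of raw HDFS space holds at most about 20GB of distinct data, which is comfortably above the largest 2GB inputs used in the benchmarks.
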
WordCount

- Application that reads text files as input and counts the number of occurrences of each word
- WordCount ships with the default Hadoop installation and is widely used to compare performance between different Hadoop clusters
- Input files:
  - 64MB, 128MB, 256MB, 512MB, 1GB and 2GB
  - Generated by concatenating random text files downloaded from Project Gutenberg
- OpenVZ reached the lowest WordCount execution time for all input data sizes:

Figure: WordCount Results

TeraSort

- The goal of the TeraSort benchmark is to sort a given volume of data as quickly as possible
- It is a benchmark that exercises both the HDFS layer and the MapReduce layer
- TeraSort consists of 3 MR applications:
  - TeraGen is an MR program that generates the data
  - TeraSort samples the input data and uses MR to sort the data into a total order
  - TeraValidate is an MR program that validates that the output is sorted
- We generated input data of varying sizes with TeraGen: 512MB, 1GB and 2GB

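TeraGen is driven by a row count rather than a byte size, with each generated row being 100 bytes long. The sketch below converts the input sizes listed above into row counts. It assumes binary units (1MB = 1024 x 1024 bytes), which the slides do not specify; the helper name `teragen_rows` is hypothetical.

```python
ROW_BYTES = 100  # TeraGen writes fixed-size 100-byte rows


def teragen_rows(size_mb: int, bytes_per_mb: int = 1024 * 1024) -> int:
    """Largest number of 100-byte rows that fits in the requested size."""
    return (size_mb * bytes_per_mb) // ROW_BYTES


# Input sizes from the slide, expressed in MB (assuming 1GB = 1024MB).
sizes_mb = {"512MB": 512, "1GB": 1024, "2GB": 2048}
rows = {label: teragen_rows(mb) for label, mb in sizes_mb.items()}
for label, count in rows.items():
    print(label, count)
```

If decimal units were intended instead, passing `bytes_per_mb=1_000_000` gives the corresponding counts.
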
Figure: TeraSort Results

TeraSort

- OpenVZ performs better than KVM in the TeraSort and TeraValidate benchmarks
  - TeraSort and TeraValidate are CPU-intensive and I/O read-intensive tests
- KVM outperforms OpenVZ in all TeraGen tests
  - TeraGen is an I/O write-intensive test

TestDFSIO

- TestDFSIO is a read/write test for HDFS
- Useful for stress testing HDFS, finding bottlenecks and evaluating the cluster I/O rate
- TestDFSIO consists of 2 tests:
  - The first generates and writes the files in an HDFS directory
  - The second reads the files created by the first run and performs measurements
- In our runs, we instructed TestDFSIO to create 10 files of 100MB each in HDFS

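The two rates that TestDFSIO reports, throughput and average I/O rate, are easy to confuse. The sketch below assumes they are derived as follows: throughput is the total data volume divided by the summed per-file times, while the average I/O rate is the mean of the individual per-file rates. The per-file timings are hypothetical and are not measurements from this work.

```python
from statistics import mean
from typing import List, Tuple


def dfsio_metrics(files: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Return (throughput, average I/O rate) in MB/s.

    `files` holds one (size in MB, elapsed seconds) pair per file.
    """
    total_mb = sum(size for size, _ in files)
    total_s = sum(seconds for _, seconds in files)
    throughput = total_mb / total_s
    average_io_rate = mean(size / seconds for size, seconds in files)
    return throughput, average_io_rate


# Hypothetical run: 10 files of 100MB, nine finish in 4s and one straggler takes 10s.
timings = [(100.0, 4.0)] * 9 + [(100.0, 10.0)]
throughput, average_io_rate = dfsio_metrics(timings)
print(round(throughput, 2), round(average_io_rate, 2))
```

With these definitions a single slow file lowers the throughput more than the average I/O rate, so a gap between the two values hints at uneven performance across files.
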
Figure: TestDFSIO Results

TestDFSIO

- OpenVZ performance in the write tests was much lower than KVM's
- However, in the read tests OpenVZ performs better than KVM

NNBench

- This test generates various requests to HDFS, normally with very small payloads, in order to stress it
- The benchmark can simulate requests for creating, reading, renaming and deleting files on HDFS
- In our runs, NNBench created 1000 files using 6 maps and 2 reducers
- OpenVZ achieved better performance than KVM in the sequential creation of files

Figure: NNBench Results

MRBench

- MRBench loops a small job a number of times
- CPU-intensive and network-intensive test
- It focuses on the MapReduce layer, as its impact on the HDFS layer is very limited
- We ran MRBench with a loop of 50 small test jobs
- OpenVZ completed the test run in less time
  - Almost 50% faster

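A statement such as "50% faster" can be read either as a speedup ratio or as a reduction in elapsed time, and the two give different numbers. The sketch below shows both calculations. The elapsed times are hypothetical placeholders, not the measured results, and the helper names are ours.

```python
def speedup(t_slow: float, t_fast: float) -> float:
    """How many times faster the quicker run is (ratio of elapsed times)."""
    return t_slow / t_fast


def time_reduction_pct(t_slow: float, t_fast: float) -> float:
    """Percentage of elapsed time saved by the quicker run."""
    return 100.0 * (t_slow - t_fast) / t_slow


# Hypothetical elapsed times in seconds, for illustration only.
kvm_seconds, openvz_seconds = 300.0, 200.0
ratio = speedup(kvm_seconds, openvz_seconds)
saved = time_reduction_pct(kvm_seconds, openvz_seconds)
print(ratio, round(saved, 1))
```

Here a 1.5x speedup corresponds to only about a 33% shorter run, so it helps to state which of the two measures a comparison uses.
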
Figure: MRBench Results

Pi

- The Hadoop Pi benchmark is a MapReduce program that estimates Pi using a Monte Carlo method
- This benchmark is focused on computation and involves nearly no storage I/O or network traffic
- In our runs, Pi computed 10 billion samples spread across 6 map tasks
- OpenVZ completed the test run in less time
  - More than 2x faster

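The idea behind the benchmark can be sketched in plain Python. Each call to `count_inside` plays the role of one map task and the final sum plays the role of the reduce step. This is an illustration only: it uses Python's pseudo-random generator, which is not necessarily how the Hadoop example samples its points, and it uses far fewer samples than the runs described above.

```python
import random


def count_inside(samples: int, seed: int) -> int:
    """One 'map task': count random points of the unit square inside the quarter circle."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return inside


def estimate_pi(n_maps: int, samples_per_map: int) -> float:
    """The 'reduce' step: sum the per-map counts and scale by 4."""
    inside = sum(count_inside(samples_per_map, seed) for seed in range(n_maps))
    return 4.0 * inside / (n_maps * samples_per_map)


# Six map tasks as in the experiments, with a much smaller sample count.
estimate = estimate_pi(n_maps=6, samples_per_map=100_000)
print(estimate)
```

Because each task only needs a seed and returns a single integer, the workload is almost purely CPU-bound, which matches the description of the benchmark above.
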
Figure: Pi Results

Conclusions

- OpenVZ performs better than KVM in the CPU-intensive tests, in both the MR layer and the HDFS layer, such as WordCount, TeraSort, TeraValidate, MRBench and Pi
  - The CPU overhead in OpenVZ is much smaller than in KVM
- OpenVZ also achieves better results than KVM in the I/O read tests, as shown by the Read Throughput and Read Average I/O Rate values of the TestDFSIO benchmark
- However, OpenVZ performs worse than KVM in the I/O write tests of large files in TeraGen, and reached low Write Throughput and Write Average I/O Rate in TestDFSIO
- In the sequential creation of numerous small files in the NNBench test, on the other hand, the elapsed time of the KVM run was almost twice that of OpenVZ

- By using OpenVZ, a Hadoop cluster can achieve high performance on virtualized systems when running jobs that make intensive use of CPU, network and I/O reads. Jobs that perform intense disk write operations should be executed with caution or executed natively.

Thank you!

pedro.roger@alu.ufc.br gisele@lia.ufc.br
