Assn - No:1 Cloud Computing Assignment 13.10.2019

MAP REDUCE:
MapReduce is a core component of the Apache Hadoop software framework.
Hadoop enables resilient, distributed processing of massive unstructured data
sets across commodity computer clusters, in which each node of the cluster
includes its own storage. MapReduce serves two essential functions: it filters
and parcels out work to the various nodes within the cluster (the map
function, sometimes referred to as the mapper), and it organizes and reduces
the results from each node into a cohesive answer to a query (the reduce
function, referred to as the reducer).
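To make the two functions concrete, here is a minimal single-process sketch in Python of the map, group-by-key and reduce flow. It is not Hadoop's API: run_map_reduce, the mapper and reducer signatures and the sensor readings are hypothetical, chosen only for illustration. A real cluster would run mappers and reducers on different nodes and move the grouped data between them over the network.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple


def run_map_reduce(
    records: Iterable[Any],
    mapper: Callable[[Any], Iterable[Tuple[Any, Any]]],
    reducer: Callable[[Any, List[Any]], Any],
) -> Dict[Any, Any]:
    """Single-process simulation of map -> group by key -> reduce."""
    grouped: Dict[Any, List[Any]] = defaultdict(list)
    # Map phase: every record may emit zero or more (key, value) pairs.
    for record in records:
        for key, value in mapper(record):
            # Shuffle: gather all values emitted under the same key.
            grouped[key].append(value)
    # Reduce phase: combine each key's values into a single answer.
    return {key: reducer(key, values) for key, values in sorted(grouped.items())}


# Hypothetical example: highest reading per sensor from "sensor,value" lines.
def max_reading_mapper(line: str) -> Iterator[Tuple[str, float]]:
    sensor, value = line.split(",")
    yield sensor, float(value)


def max_reading_reducer(sensor: str, values: List[float]) -> float:
    return max(values)


readings = ["pump-1,71.5", "fan-2,40.0", "pump-1,75.25", "fan-2,38.5"]
print(run_map_reduce(readings, max_reading_mapper, max_reading_reducer))
# {'fan-2': 40.0, 'pump-1': 75.25}
```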
How MapReduce works:
The original version of MapReduce involved several component daemons,
including:
• JobTracker -- the master daemon that manages all the jobs and resources in a
cluster;
• TaskTrackers -- agents deployed to each machine in the cluster to run the
map and reduce tasks; and
• JobHistory Server -- a component that tracks completed jobs and is typically
deployed as a separate function or with JobTracker.
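The division of labor between JobTracker and TaskTrackers can be sketched as a toy scheduler. In Hadoop 1, each TaskTracker advertised a fixed number of map and reduce slots; the classes below model only that idea. The class names mirror the daemons, but the methods, host names and first-fit policy are assumptions for illustration, not the real JobTracker logic.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TaskTracker:
    """Toy worker agent with a fixed number of free map and reduce slots."""
    host: str
    map_slots: int
    reduce_slots: int
    running: List[str] = field(default_factory=list)


class JobTracker:
    """Toy master: gives each task to the first TaskTracker with a free slot."""

    def __init__(self, trackers: List[TaskTracker]) -> None:
        self.trackers = trackers
        self.assignments: Dict[str, str] = {}  # task id -> host

    def assign(self, task_id: str, kind: str) -> Optional[str]:
        for tracker in self.trackers:
            if kind == "map" and tracker.map_slots > 0:
                tracker.map_slots -= 1
            elif kind == "reduce" and tracker.reduce_slots > 0:
                tracker.reduce_slots -= 1
            else:
                continue  # no free slot of this kind on this tracker
            tracker.running.append(task_id)
            self.assignments[task_id] = tracker.host
            return tracker.host
        return None  # every slot is busy; the task waits in the queue


jt = JobTracker([
    TaskTracker("node-a", map_slots=2, reduce_slots=1),
    TaskTracker("node-b", map_slots=1, reduce_slots=1),
])
for task in ["m1", "m2", "m3", "m4"]:
    print(task, jt.assign(task, "map"))  # node-a, node-a, node-b, then None
print("r1", jt.assign("r1", "reduce"))  # node-a
```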
With the introduction of Hadoop version 2 (MapReduce 2), the earlier
JobTracker and TaskTracker daemons were replaced by components of
Yet Another Resource Negotiator (YARN), called ResourceManager and
NodeManager.
• ResourceManager runs on a master node and handles the submission and
scheduling of jobs on the cluster. It also monitors jobs and allocates
resources.
• NodeManager runs on slave nodes and interoperates with ResourceManager
to run tasks and track resource usage. NodeManager can employ other
daemons to assist with task execution on the slave node.
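Under YARN, work is granted as containers sized by memory and virtual cores rather than as fixed slots. The sketch below assumes a simplified ResourceManager that places each request on the NodeManager with the most free memory; real YARN schedulers (such as the Capacity and Fair schedulers) use queues and more elaborate policies, and the method and host names here are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NodeManager:
    """Toy per-node agent reporting free memory (MB) and virtual cores."""
    host: str
    free_memory_mb: int
    free_vcores: int


class ResourceManager:
    """Toy master: grants a container on a node that can fit the request."""

    def __init__(self, nodes: List[NodeManager]) -> None:
        self.nodes = nodes

    def allocate(self, memory_mb: int, vcores: int) -> Optional[str]:
        fits = [n for n in self.nodes
                if n.free_memory_mb >= memory_mb and n.free_vcores >= vcores]
        if not fits:
            return None  # request waits until a NodeManager frees resources
        # Assumed policy: choose the node with the most free memory.
        node = max(fits, key=lambda n: n.free_memory_mb)
        node.free_memory_mb -= memory_mb
        node.free_vcores -= vcores
        return node.host


rm = ResourceManager([NodeManager("node-a", 4096, 4), NodeManager("node-b", 8192, 2)])
grants = [rm.allocate(2048, 1) for _ in range(3)] + [rm.allocate(4096, 1)]
print(grants)  # ['node-b', 'node-b', 'node-a', None]
```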
To distribute input data and collate results, MapReduce operates in
parallel across clusters of massive size. Because cluster size doesn't affect
a processing job's final results, jobs can be split across almost any number
of servers without changing the program. In this way, MapReduce and the
overall Hadoop framework simplify software development.
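The claim that cluster size doesn't change the result can be checked with a small simulation. The sketch below stands in for real servers with a plain loop (an assumption made for illustration): it deals the input lines into any number of chunks, counts words per chunk and merges the partial counts. Because adding counts is associative and commutative, every chunking produces the same totals.

```python
from collections import Counter
from typing import List


def split(items: List[str], n_workers: int) -> List[List[str]]:
    """Deal items round-robin into n_workers chunks (some may be empty)."""
    return [items[i::n_workers] for i in range(n_workers)]


def count_words(lines: List[str]) -> Counter:
    """Map step for one worker: count the words in its chunk."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def run_job(lines: List[str], n_workers: int) -> Counter:
    # Each chunk would run on a different server; here they run in a loop.
    partials = [count_words(chunk) for chunk in split(lines, n_workers)]
    # Reduce step: merge the partial counts into one total.
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)
    return total


lines = ["the quick brown fox", "jumps over the lazy dog", "The end"]
results = [run_job(lines, n) for n in (1, 2, 3, 8)]
print(all(r == results[0] for r in results))  # True
print(results[0]["the"])  # 3
```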
MapReduce programs can be written in several languages, including C, C++,
Java, Ruby, Perl and Python. Programmers can use MapReduce libraries to create
tasks without dealing with communication or coordination between nodes.
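One common route for non-Java languages is Hadoop Streaming, which runs any executable as the mapper or reducer: it feeds input lines on standard input, reads tab-separated key/value lines from standard output and delivers reducer input sorted by key. The word-count script below is a minimal sketch under those assumptions; the file name wordcount.py and the map/reduce command-line arguments are hypothetical. Note that the script contains no networking code at all.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit one 'word<TAB>1' line per word (tab is Streaming's default separator)."""
    for line in lines:
        for word in line.lower().split():
            yield f"{word}\t1"


def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Sum the counts per word; assumes input is already sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__" and sys.argv[1:2] in (["map"], ["reduce"]):
    # Run as "python wordcount.py map" or "python wordcount.py reduce".
    stage = mapper if sys.argv[1] == "map" else reducer
    for out in stage(sys.stdin):
        print(out)
```

The same script can be tried locally without a cluster, with the Unix sort command standing in for Hadoop's shuffle: cat novel.txt | python wordcount.py map | sort | python wordcount.py reduce (where novel.txt is any text file).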
MapReduce is also fault-tolerant, with each node periodically reporting its
status to a master node. If a node doesn't respond as expected, the master node
reassigns that piece of the job to other available nodes in the cluster. This
creates resiliency and makes it practical for MapReduce to run on inexpensive
commodity servers.
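Heartbeat-based failure detection can be sketched in a few lines. In this simplified model, which is an assumption rather than Hadoop's implementation, the master records each node's last report time, treats any node silent for longer than a hypothetical timeout_s as failed, and hands that node's tasks to live nodes round-robin. Times are passed in explicitly so the behavior is deterministic.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Master:
    """Toy master that moves tasks off nodes which stop sending heartbeats."""
    timeout_s: float  # hypothetical: silence longer than this means "failed"
    last_heartbeat: Dict[str, float] = field(default_factory=dict)
    tasks: Dict[str, str] = field(default_factory=dict)  # task id -> node

    def heartbeat(self, node: str, now: float) -> None:
        self.last_heartbeat[node] = now

    def reassign_failed(self, now: float) -> List[str]:
        alive = [n for n, t in self.last_heartbeat.items() if now - t <= self.timeout_s]
        moved: List[str] = []
        if not alive:
            return moved  # nowhere to send the work
        for task, node in sorted(self.tasks.items()):
            if node not in alive:  # node is silent (or never reported)
                # Round-robin the orphaned task onto the live nodes.
                self.tasks[task] = alive[len(moved) % len(alive)]
                moved.append(task)
        return moved


master = Master(timeout_s=10.0)
for node in ["node-a", "node-b", "node-c"]:
    master.heartbeat(node, now=0.0)
master.tasks = {"map-1": "node-a", "map-2": "node-b", "map-3": "node-b"}
master.heartbeat("node-a", now=12.0)  # node-b stays silent after t=0
master.heartbeat("node-c", now=12.0)
print(master.reassign_failed(now=15.0))  # ['map-2', 'map-3']
print(master.tasks)  # {'map-1': 'node-a', 'map-2': 'node-a', 'map-3': 'node-c'}
```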
MapReduce examples and uses:
The power of MapReduce is in its ability to tackle huge data sets by distributing
processing across many nodes, and then combining or reducing the results of
those nodes.
As a basic example, a single server application could list every word in a
novel and count how many times each one appears, but that is time-consuming.
By contrast, users can split the task among 26 people: each person takes a
page, writes each word on a separate sheet of paper and takes a new page when
finished. This is the map aspect of MapReduce. If a person leaves, another
person takes his or her place. This exemplifies MapReduce's fault-tolerant
element.
When all the pages are processed, the workers sort their single-word sheets
into 26 boxes, one for each first letter of a word. Each person takes a box
and sorts the words in the stack alphabetically. Counting the number of
sheets that carry the same word is an example of the reduce aspect of
MapReduce.
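The 26-person analogy maps almost line for line onto code. The sketch below (function names and sample text are illustrative) writes each word on its own "sheet", drops the sheets into 26 boxes by first letter, then sorts each box so identical words sit next to each other and counts the runs.

```python
import string
from itertools import groupby
from typing import Dict, List


def map_page(page: str) -> List[str]:
    """One person's job: copy each word of a page onto its own 'sheet'."""
    words = (w.strip(string.punctuation).lower() for w in page.split())
    return [w for w in words if w]


def partition(sheets: List[str]) -> Dict[str, List[str]]:
    """Drop each sheet into one of 26 boxes by first letter (others skipped)."""
    boxes: Dict[str, List[str]] = {c: [] for c in string.ascii_lowercase}
    for word in sheets:
        if word[0] in boxes:
            boxes[word[0]].append(word)
    return boxes


def reduce_box(box: List[str]) -> Dict[str, int]:
    """One person's box: sort so equal words are adjacent, then count each run."""
    return {word: len(list(run)) for word, run in groupby(sorted(box))}


pages = ["It was the best of times,", "it was the worst of times."]
sheets = [w for page in pages for w in map_page(page)]
counts: Dict[str, int] = {}
for box in partition(sheets).values():
    counts.update(reduce_box(box))
print(counts["times"], counts["best"])  # 2 1
```

Partitioning by first letter mirrors the analogy; Hadoop's default partitioner instead hashes the key, which usually spreads work across reducers more evenly than initial letters do.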
There is a broad range of real-world uses for MapReduce involving complex
and seemingly unrelated data sets. For example, a social networking site could
use MapReduce to determine users' potential friends, colleagues and other
contacts based on site activity, names, locations, employers and many other data
elements. A booking website could use MapReduce to examine the search
criteria and historical behaviors of users and create customized offerings
for each. An industrial facility could collect equipment data from different
sensors across the installation and use MapReduce to tailor maintenance
schedules or predict equipment failures, improving overall uptime and
reducing costs.
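The friend-suggestion use case is often explained with a mutual-friends pattern, sketched here under simplifying assumptions: the input is a hypothetical in-memory graph with symmetric friend lists, the mapper emits a marker for every existing friendship and a vote for every pair of a user's friends, and the reducer counts the votes for pairs who are not yet connected. A production system would weigh many more signals than this.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterator, List, Optional, Tuple

Pair = Tuple[str, str]


def friend_mapper(user: str, friends: List[str]) -> Iterator[Tuple[Pair, Optional[str]]]:
    for friend in friends:
        # None marks a pair that is already connected.
        yield (min(user, friend), max(user, friend)), None
    for a, b in combinations(sorted(friends), 2):
        # `user` is a mutual friend of a and b.
        yield (a, b), user


def friend_reducer(pair: Pair, values: List[Optional[str]]) -> Optional[int]:
    if None in values:
        return None  # already friends: nothing to suggest
    return len(values)  # number of mutual friends


def suggest(graph: Dict[str, List[str]]) -> Dict[Pair, int]:
    grouped: Dict[Pair, List[Optional[str]]] = defaultdict(list)
    for user, friends in graph.items():
        for key, value in friend_mapper(user, friends):
            grouped[key].append(value)
    scores = {pair: friend_reducer(pair, vals) for pair, vals in grouped.items()}
    return {pair: s for pair, s in scores.items() if s is not None}


graph = {
    "ann": ["bob", "cat"],
    "bob": ["ann", "cat", "dan"],
    "cat": ["ann", "bob", "dan"],
    "dan": ["bob", "cat"],
}
print(suggest(graph))  # {('ann', 'dan'): 2}
```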
MapReduce services and alternatives:
One challenge with MapReduce is the infrastructure it requires to run. Many
businesses that could benefit from big data tasks can't sustain the capital and
overhead needed for such an infrastructure. As a result, some organizations rely
on public cloud services for Hadoop and MapReduce, which offer enormous
scalability with minimal capital costs or maintenance overhead.
For example, Amazon Web Services (AWS) provides Hadoop as a service
through its Amazon Elastic MapReduce (EMR) offering. Microsoft Azure
offers its HDInsight service, which enables users to provision Hadoop, Apache
Spark and other clusters for data processing tasks. Google Cloud Platform
provides its Cloud Dataproc service to run Spark and Hadoop clusters.
For organizations that prefer to build and maintain private, on-premises big data
infrastructures, Hadoop and MapReduce represent only one option.
Organizations can opt to deploy other platforms, such as Apache Spark,
High-Performance Computing Cluster and Hydra. The big data framework an
enterprise chooses will depend on the types of processing tasks required,
supported programming languages, and performance and infrastructure
demands.