
Distributed Computing

Varun Thacker
Linux Users Group Manipal

April 8, 2010


Outline I

1 Introduction
    LUG Manipal
    Points To Remember
2 Distributed Computing
    Distributed Computing
    Technologies to be covered
    Idea
    Data
    Why Distributed Computing is Hard
    Why Distributed Computing is Important
    Three Common Distributed Architectures
3 Distributed File System
    GFS
    What a Distributed File System Does
    Google File System Architecture
    GFS Architecture: Chunks

Outline II

    GFS Architecture: Master
    GFS: Life of a Read
    GFS: Life of a Write
    GFS: Master Failure
4 MapReduce
    MapReduce
    Do We Need It?
    Bad News!
    MapReduce
    Map Reduce Paradigm
    MapReduce Paradigm
    Working
    Under the hood: Scheduling
    Robustness

Outline III

5 Hadoop
    What is Hadoop
    Who uses Hadoop?
    Mapper
    Combiners
    Reducer
    Some Terminology
    Job Distribution
6 Contact Information
    Attribution
    Copying

Who are we?

Linux Users Group Manipal

Life, Universe and FOSS!!
Believers in knowledge sharing
The most technologically focused group in the University
LUG Manipal is a non-profit group that runs entirely on voluntary work!!
http://lugmanipal.org


Points To Remember!!!

If you have problem(s), don't hesitate to ask.
The slides are based on documentation, so the discussions are really important; the slides are for later reference!!
Please don't treat the sessions as classes (classes are boring!!).
The speaker is just like any person sitting next to you.
Documentation is really important.
Google is your friend.
If you have questions after this workshop, mail me or come to LUG Manipal's forums: http://forums.lugmanipal.org


Distributed Computing


Technologies to be covered

Distributed computing refers to the use of distributed systems to solve computational problems.
A distributed system consists of multiple computers that communicate through a network.
MapReduce is a framework that implements the idea of distributed computing.
GFS is the distributed file system on which distributed programs at Google store and process data. Its free implementation is HDFS.
Hadoop is an open-source framework, written in Java, that implements the MapReduce technology.


Idea

While the storage capacities of hard drives have increased massively over the years, access speeds (the rate at which data can be read from drives) have not kept up.
One-terabyte drives are the norm, but the transfer speed is around 100 MB/s, so it takes more than two and a half hours to read all the data off the disk.
The obvious way to reduce the time is to read from multiple disks at once. Imagine if we had 100 drives, each holding one hundredth of the data. Working in parallel, we could read the data in under two minutes.
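
A back-of-the-envelope check of those figures (treating 1 TB as roughly 10^6 MB):

    \frac{10^{6}\ \mathrm{MB}}{100\ \mathrm{MB/s}} = 10^{4}\ \mathrm{s} \approx 2.8\ \mathrm{hours},
    \qquad
    \frac{10^{4}\ \mathrm{s}}{100\ \mathrm{drives}} = 100\ \mathrm{s} < 2\ \mathrm{minutes}.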


Data

We live in the data age. An IDC estimate put the size of the digital universe at 0.18 zettabytes in 2006.
By 2011 there will be a tenfold growth to 1.8 zettabytes.
One zettabyte is one million petabytes, or one billion terabytes.
The New York Stock Exchange generates about one terabyte of new trade data per day.
Facebook hosts approximately 10 billion photos, taking up one petabyte of storage.
The Large Hadron Collider near Geneva produces about 15 petabytes of data per year.


Why Distributed Computing is Hard

Computers crash.
Network links crash.
Talking is slow (even Ethernet has about 300 microseconds of latency, during which time your 2 GHz PC can execute 600,000 cycles).
Bandwidth is finite.
Internet scale: the computers and the network are heterogeneous, untrustworthy, and subject to change at any time.


Why Distributed Computing is Important

Can be more reliable.
Can be faster.
Can be cheaper (a $30 million Cray versus 100 $1,000 PCs).


Three Common Distributed Architectures

Hope: have N computers do separate pieces of work. Speed-up < N.
  Probability of failure = 1 - (1 - p)^N ≈ Np (p = probability of an individual crash).
Replication: have N computers do the same thing. Speed-up < 1.
  Probability of failure = p^N.
Master-servant: have 1 computer hand out pieces of work to N - 1 servants, and re-hand out pieces of work if servants fail. Speed-up < N - 1. Probability of failure ≈ p.
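
A quick check of the failure-probability claims above: for independent crashes with small individual probability p, a binomial expansion gives

    1 - (1 - p)^{N} \;=\; Np - \binom{N}{2}p^{2} + \dots \;\approx\; Np \qquad (Np \ll 1),

while replication fails only if all N copies fail, with probability p^N. For example, with p = 0.001 and N = 100, splitting the work fails with probability about 1 - 0.999^{100} ≈ 0.095 ≈ Np, whereas replication fails with probability 10^{-300}.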


GFS


What a Distributed File System Does

1. Usual file system stuff: create, read, move & find files.
2. Allow distributed access to files.
3. Files are themselves stored in a distributed way.
If you just do #1 and #2, you are a network file system.
To do #3, it's a good idea to also provide fault tolerance.


GFS Architecture

(Figure: GFS architecture diagram.)

GFS Architecture: Chunks

Files are divided into 64 MB chunks (the last chunk of a file may be smaller).
Each chunk is identified by a unique 64-bit id.
Chunks are stored as regular files on local disks.
By default, each chunk is stored thrice, preferably on more than one rack.
To protect data integrity, each 64 KB block gets a 32-bit checksum that is checked on all reads.
When idle, a chunkserver scans inactive chunks for corruption.
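
To make the chunk and checksum geometry concrete, here is a small illustrative calculation (not GFS code; the class name is made up) using the 64 MB chunk size and 64 KB checksum blocks described above:

    // Illustrative only: maps a byte offset in a GFS-style file to the
    // 64 MB chunk and the 64 KB checksum block that would hold it.
    public class ChunkMath {
        static final long CHUNK_SIZE = 64L * 1024 * 1024;    // 64 MB chunks
        static final long CHECKSUM_BLOCK = 64L * 1024;        // 64 KB checksummed blocks

        public static void main(String[] args) {
            long fileOffset = 200_000_000L;                    // the "200 millionth byte"
            long chunkIndex = fileOffset / CHUNK_SIZE;         // which chunk holds this byte
            long offsetInChunk = fileOffset % CHUNK_SIZE;      // where inside that chunk
            long checksumBlock = offsetInChunk / CHECKSUM_BLOCK; // which 64 KB block gets verified
            System.out.printf("chunk %d, offset %d, checksum block %d%n",
                    chunkIndex, offsetInChunk, checksumBlock);
        }
    }

With the 200 millionth byte as the offset (the example used in the read walkthrough below), this prints chunk 2 (zero-based).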


GFS Architecture: Master

Stores all metadata (namespace, access control).
Stores the (file → chunks) and (chunk → locations) mappings.
Clients get chunk locations for a file from the master, and then talk directly to the chunkservers for the data.
Advantage of a single master: simplicity.
Disadvantages of a single master:
  Metadata operations are bottlenecked.
  The maximum number of files is limited by the master's memory.


GFS: Life of a Read

The client program asks for 1 GB of file A starting at the 200 millionth byte.
The client GFS library asks the master for chunks 3, ..., 16387 of file A.
The master responds with all of the locations of chunks 2, ..., 20000 of file A.
The client caches all of these locations (with their cache time-outs).
The client reads chunk 2 from the closest location.
The client reads chunk 3 from the closest location.
...


GFS: Life of a Write

The client gets the locations of the chunk replicas as before.
For each chunk, the client sends the write data to the nearest replica.
This replica sends the data on to the nearest replica that has not yet received it.
When all of the replicas have received the data, it is safe for them to actually write it.
Tricky details:
  The master hands out a short-term (about 1 minute) lease for a particular replica to be the primary one.
  This primary replica assigns a serial number to each mutation so that every replica performs the mutations in the same order.


GFS: Master Failure

The master stores its state via periodic checkpoints and a mutation log.
Both are replicated.
Master election and notification are implemented using an external lock server.
The new master restores its state from the checkpoint and the log.


MapReduce


Do We Need It?

Yes: otherwise some problems are too big.
Example: 20+ billion web pages × 20 KB = 400+ terabytes.
One computer can read 30-35 MB/s from disk, so it would take about four months to read the web.
The same problem with 1000 machines: less than 3 hours.
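
Roughly where those numbers come from, assuming about 35 MB/s per disk:

    \frac{4\times10^{14}\ \mathrm{B}}{3.5\times10^{7}\ \mathrm{B/s}} \approx 1.1\times10^{7}\ \mathrm{s} \approx 4\ \mathrm{months},
    \qquad
    \frac{1.1\times10^{7}\ \mathrm{s}}{1000\ \mathrm{machines}} \approx 1.1\times10^{4}\ \mathrm{s} \approx 3\ \mathrm{hours}.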


Bad News!

Bad news I: you now have to deal with
  communication and coordination
  recovering from machine failure (all the time!)
  debugging
  optimization
  locality
Bad news II: repeat for every problem you want to solve.
Good news I and II: MapReduce and Hadoop!


MapReduce

A simple programming model that applies to many large-scale computing problems.
Hide the messy details in the MapReduce runtime library:
  automatic parallelization
  load balancing
  network and disk transfer optimization
  handling of machine failures
  robustness
Therefore we can write application-level programs and let MapReduce insulate us from many concerns.


Map Reduce Paradigm

Read a lot of data.
Map: extract something you care about from each record.
Shuffle and Sort.
Reduce: aggregate, summarize, filter, or transform.
Write the results.


MapReduce Paradigm

Basic data type: the key-value pair (k, v).
For example, key = URL, value = HTML of the web page.
The programmer specifies two primary methods:
  Map: (k, v) → <(k1, v1), (k2, v2), (k3, v3), ..., (kn, vn)>
  Reduce: (k, <v1, v2, ..., vn>) → <(k, v1), (k, v2), ..., (k, vn)>
All values v with the same key k are reduced together.
(Remember the invisible Shuffle and Sort step.)
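
As a concrete instance of those signatures, here is word count sketched in plain Java (a toy model, not the Hadoop API; the in-memory TreeMap simply stands in for the framework's shuffle and sort):

    import java.util.*;

    // Word count expressed in the (k, v) style above.
    // map:    (document name, document text) -> list of (word, 1)
    // reduce: (word, [1, 1, ...])            -> (word, total count)
    public class WordCountModel {

        static List<Map.Entry<String, Integer>> map(String docName, String text) {
            List<Map.Entry<String, Integer>> out = new ArrayList<>();
            for (String word : text.toLowerCase().split("\\W+")) {
                if (!word.isEmpty()) out.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
            return out;
        }

        static Map.Entry<String, Integer> reduce(String word, List<Integer> counts) {
            int sum = 0;
            for (int c : counts) sum += c;
            return new AbstractMap.SimpleEntry<>(word, sum);
        }

        public static void main(String[] args) {
            // "Shuffle and sort": group all values emitted for the same key.
            Map<String, List<Integer>> grouped = new TreeMap<>();
            for (Map.Entry<String, Integer> kv : map("doc1", "the quick fox and the lazy dog")) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
            for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
                System.out.println(reduce(e.getKey(), e.getValue()));
            }
        }
    }

Running it prints each distinct word with its total count, for example the=2.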


Working

(Figure slides: diagrams of the MapReduce execution flow.)


Under the hood: Scheduling

One master, many workers.
Input data is split into M map tasks (typically 64 MB in size).
The reduce phase is partitioned into R reduce tasks (R = the number of output files).
Tasks are assigned to workers dynamically:
  The master assigns each map task to a free worker.
  It considers the locality of the data to a worker when assigning a task.
  The worker reads the task input (often from its local disk!).
  The worker produces R local files containing intermediate (k, v) pairs.
  The master assigns each reduce task to a free worker.
  The worker reads intermediate (k, v) pairs from the map workers.
  The worker sorts them and applies the user's Reduce op to produce the output.
The user may specify a Partition function: which intermediate keys go to which reducer (see the sketch below).
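
The Partition step in the last bullet is usually just a hash of the key modulo R; a minimal sketch (Hadoop's default HashPartitioner uses the same rule):

    // Minimal sketch of the default partitioning rule: an intermediate key
    // is sent to reducer (hash(key) mod R).
    public class SimplePartitioner {
        static int partitionFor(String key, int numReduceTasks) {
            // Mask off the sign bit so the result is always in [0, R).
            return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
        }

        public static void main(String[] args) {
            int R = 4;
            for (String key : new String[] {"apple", "banana", "cherry"}) {
                System.out.println(key + " -> reducer " + partitionFor(key, R));
            }
        }
    }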

Robustness

One master, many workers.
Detect failure via periodic heartbeats.
Re-execute completed and in-progress map tasks.
Re-execute in-progress reduce tasks.
The master assigns each map task to a free worker.
Master failure:
  State is checkpointed to a replicated file system.
  A new master recovers and continues.
Very robust: Google once lost 1,600 of 1,800 machines, but the job finished fine.


Hadoop


What is Hadoop

Apache Hadoop is a Java software framework that supports data-intensive distributed applications under a free license.
Hadoop was inspired by Google's MapReduce and Google File System (GFS) papers.
A Map/Reduce job usually splits the input data set into independent chunks which are processed by the map tasks in a completely parallel manner.
The map output is then fed as input to the reduce tasks.
The framework takes care of scheduling tasks, monitoring them, and re-executing the failed tasks.


Who uses Hadoop?

Adobe
AOL
Baidu - the leading Chinese-language search engine
Cloudera, Inc. - provides commercial support and professional training for Hadoop
Facebook
Google
IBM
Twitter
Yahoo!
The New York Times, Last.fm, Hulu, LinkedIn


Mapper

Mapper maps input key/value pairs to a set of intermediate key/value pairs.
The Hadoop Map/Reduce framework spawns one map task for each InputSplit generated by the InputFormat.
Output pairs do not need to be of the same types as the input pairs.
Mapper implementations are passed the JobConf for the job.
The framework then calls the map method for each key/value pair.
Applications can use the Reporter to report progress.
All intermediate values associated with a given output key are subsequently grouped by the framework, and passed to the Reducer(s) to determine the final output.
The intermediate, sorted outputs are always stored in a simple (key-len, key, value-len, value) format.
The number of maps is usually driven by the total size of the inputs, that is, the total number of blocks of the input files.
Users can optionally specify a combiner to perform local aggregation of the intermediate outputs.
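
As a concrete example, here is the map side of the classic word-count program written against the old org.apache.hadoop.mapred API, which uses the JobConf, OutputCollector and Reporter types mentioned above (the class name is ours):

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    // One map task is spawned per InputSplit; map() is called once per record
    // (one line, for text input) and emits an intermediate (word, 1) pair per token.
    public class WordCountMap extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {

        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                output.collect(word, ONE);   // intermediate key/value pair
            }
        }
    }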

Combiners

When the map operation outputs its pairs, they are already available in memory.
If a combiner is used, the map key-value pairs are not immediately written to the output.
They are collected in lists, one list per key value.
When a certain number of key-value pairs have been written, this buffer is flushed by passing all the values of each key to the combiner's reduce method and outputting the key-value pairs of the combine operation as if they were created by the original map operation.
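
A toy, self-contained illustration of that buffering-and-flushing behaviour (plain Java, not the Hadoop API):

    import java.util.*;

    // Toy illustration of map-side combining: the map output buffer collects
    // values per key and is "flushed" through a local reduce before anything
    // is written out for the shuffle.
    public class CombinerSketch {
        static Map<String, List<Integer>> buffer = new HashMap<>();

        static void emit(String key, int value) {
            buffer.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        }

        static Map<String, Integer> flushWithCombiner() {
            Map<String, Integer> combined = new HashMap<>();
            for (Map.Entry<String, List<Integer>> e : buffer.entrySet()) {
                int sum = 0;                     // combiner = local sum, same logic as reduce
                for (int v : e.getValue()) sum += v;
                combined.put(e.getKey(), sum);
            }
            buffer.clear();
            return combined;
        }

        public static void main(String[] args) {
            for (String w : "to be or not to be".split(" ")) emit(w, 1);
            System.out.println(flushWithCombiner());   // e.g. {not=1, to=2, or=1, be=2}
        }
    }

In Hadoop the same effect is obtained by registering a combiner class on the JobConf, very often the reducer class itself, as in the driver sketch under Job Distribution below.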


Reducer

Reducer reduces a set of intermediate values which share a key to a smaller set of values.
Reducer implementations are passed the JobConf for the job.
The framework then calls the reduce(WritableComparable, Iterator, OutputCollector, Reporter) method for each <key, (list of values)> pair in the grouped inputs.
The reducer has 3 primary phases:
  Shuffle: The input to the Reducer is the sorted output of the mappers. In this phase the framework fetches the relevant partition of the output of all the mappers, via HTTP.
  Sort: The framework groups Reducer inputs by key (since different mappers may have output the same key) in this stage.
  Reduce: In this phase the reduce method is called for each <key, (list of values)> pair in the grouped inputs.
The generated output is a new, smaller set of values.
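
The matching reduce side of the word-count example in the same old API (again, the class name is ours):

    import java.io.IOException;
    import java.util.Iterator;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;

    // Called once per <key, (list of values)> pair of the grouped, sorted map
    // output; for word count it simply sums the counts seen for each word.
    public class WordCountReduce extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {

        public void reduce(Text key, Iterator<IntWritable> values,
                           OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }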

Some Terminology

Job: a full program - an execution of a Mapper and Reducer across a data set.
Task: an execution of a Mapper or a Reducer on a slice of data.
Task Attempt: a particular instance of an attempt to execute a task on a machine.


Job Distribution

MapReduce programs are contained in a Java jar file plus an XML file containing serialized program configuration options.
Running a MapReduce job places these files into HDFS and notifies the TaskTrackers where to retrieve the relevant program code.
Data distribution: implicit in the design of MapReduce!
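
A minimal driver tying the earlier word-count sketches together (old mapred API; the jar and class names are ours, not from the slides):

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    // Driver that packages the job configuration; running it from the jar ships
    // the jar plus the serialized configuration to the cluster as described above.
    public class WordCountJob {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(WordCountJob.class);
            conf.setJobName("wordcount");

            conf.setOutputKeyClass(Text.class);           // types of the final (k, v) output
            conf.setOutputValueClass(IntWritable.class);

            conf.setMapperClass(WordCountMap.class);      // classes sketched on earlier slides
            conf.setCombinerClass(WordCountReduce.class); // reducer reused as the combiner
            conf.setReducerClass(WordCountReduce.class);

            FileInputFormat.setInputPaths(conf, new Path(args[0]));   // HDFS input
            FileOutputFormat.setOutputPath(conf, new Path(args[1]));  // HDFS output

            JobClient.runJob(conf);                       // submit and wait for completion
        }
    }

Submitting it with hadoop jar wordcount.jar WordCountJob <in> <out> uploads the jar and the serialized configuration, and the framework hands the map and reduce tasks out to the TaskTrackers as described above.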


Contact Information

Varun Thacker
varunthacker1989@gmail.com
http://varunthacker.wordpress.com

Linux Users Group Manipal
http://lugmanipal.org
http://forums.lugmanipal.org


Attribution

Google
Used under the Creative Commons Attribution-Share Alike 2.5 Generic license.


Copying

Creative Commons Attribution-Share Alike 2.5 India License
http://creativecommons.org/licenses/by-sa/2.5/in/
