CORONA
nagarjuna@outlook.com
What is happening in Facebook
1,000
>
What is happening in Facebook
Largest
More
Data warehouse now = 2500 × the data warehouse of the past
Limitations of Hadoop
MR scheduling
Job Tracker responsibilities:
Managing cluster resources
Scheduling all user jobs
Limitation:
The Job Tracker is unable to handle these dual responsibilities adequately.
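Another problem: pull-based scheduling
Task trackers send a periodic heartbeat status to the job tracker in order to get tasks to run.
Because the heartbeat is periodic, a task can be handed out only at the next heartbeat, which adds a built-in scheduling delay.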
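The delay that a periodic heartbeat adds can be sketched with a little arithmetic. This is a minimal illustration, not Hadoop code: the function names and the 3-second interval are hypothetical, and it assumes heartbeats arrive at exact multiples of the interval.

```python
import math


def next_heartbeat(ready_time: float, interval: float, first_beat: float = 0.0) -> float:
    """Return the time of the first heartbeat at or after ready_time.

    Assumes heartbeats fire at first_beat, first_beat + interval, and so on.
    """
    if ready_time <= first_beat:
        return first_beat
    beats = math.ceil((ready_time - first_beat) / interval)
    return first_beat + beats * interval


def scheduling_delay(ready_time: float, interval: float, first_beat: float = 0.0) -> float:
    """Time a ready task waits before a heartbeat can pick it up."""
    return next_heartbeat(ready_time, interval, first_beat) - ready_time


if __name__ == "__main__":
    interval = 3.0  # hypothetical heartbeat period in seconds
    for ready in (1.0, 3.0, 4.5):
        delay = scheduling_delay(ready, interval)
        print(f"task ready at {ready}s waits {delay}s")
```

Under these assumptions a task waits anywhere from zero up to one full interval, so lengthening the heartbeat period directly lengthens the worst-case scheduling delay.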
Another problem: static slot-based resource management
A MapReduce cluster is divided into a fixed number of map and reduce slots based on a static configuration.
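Why a static split wastes capacity can be sketched as follows. The class name and the slot counts are hypothetical and chosen only for illustration; the point is that a slot reserved for one task type cannot serve the other.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StaticSlots:
    """A cluster whose map/reduce slot counts are fixed by configuration."""

    map_slots: int
    reduce_slots: int

    def running(self, map_demand: int, reduce_demand: int) -> tuple[int, int]:
        """Tasks that can run now: each type is capped by its own slot count."""
        return min(map_demand, self.map_slots), min(reduce_demand, self.reduce_slots)

    def idle(self, map_demand: int, reduce_demand: int) -> int:
        """Slots left unused even though work of the other type may be waiting."""
        maps, reduces = self.running(map_demand, reduce_demand)
        return (self.map_slots - maps) + (self.reduce_slots - reduces)


if __name__ == "__main__":
    cluster = StaticSlots(map_slots=8, reduce_slots=4)  # hypothetical sizes
    # A map-heavy phase: 20 map tasks waiting, no reduce tasks yet.
    print(cluster.running(20, 0))
    print(cluster.idle(20, 0))
```

In this example 12 map tasks are queued while the reduce slots sit idle, which is the utilization problem a static configuration creates when the workload mix shifts.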
Another problem:
The job tracker design required hard downtime (all running jobs are killed) during a software upgrade.
Every software upgrade resulted in significant wasted computation.
Another problem:
Traditional analytic databases have had advanced resource-based scheduling for a long time. Hadoop needs this.
A better scheduling framework
Better
CORONA
Cluster Manager
Tracks nodes and free resources in the cluster.
Job Tracker
A dedicated job tracker for each and every job.
It runs either in the client process or as a separate process in the cluster.
CORONA
Push-based implementation
The Cluster Manager gets resource requests from the Job Tracker.
The Cluster Manager pushes resource grants back to the Job Tracker.
The Job Tracker then creates tasks and pushes them to task trackers for execution.
No periodic heartbeat is involved in scheduling, so scheduling latency is minimized.
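The push-based flow described above can be sketched as three small cooperating objects. This is a toy model, not Corona's actual API: the class and method names (`request`, `on_grant`, `run`) and the node names are hypothetical.

```python
from collections import deque


class TaskTracker:
    """Receives tasks pushed to it; it does not poll for work."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.received: list[str] = []

    def run(self, task: str) -> None:
        self.received.append(task)


class JobTracker:
    """One tracker per job: turns each resource grant into a pushed task."""

    def __init__(self, job_id: str, task_names: list[str], trackers: dict[str, TaskTracker]) -> None:
        self.job_id = job_id
        self.pending = deque(task_names)
        self.trackers = trackers

    def on_grant(self, node: str) -> None:
        # Called by the cluster manager as soon as a resource is granted.
        if self.pending:
            task = self.pending.popleft()
            self.trackers[node].run(f"{self.job_id}:{task}")


class ClusterManager:
    """Tracks free resources per node; knows nothing about job progress."""

    def __init__(self, free_slots: dict[str, int]) -> None:
        self.free = dict(free_slots)

    def request(self, job_tracker: JobTracker, count: int) -> int:
        """Grant up to count resources, pushing each grant immediately."""
        granted = 0
        for node in sorted(self.free):
            while self.free[node] > 0 and granted < count:
                self.free[node] -= 1
                granted += 1
                job_tracker.on_grant(node)
        return granted


if __name__ == "__main__":
    trackers = {n: TaskTracker(n) for n in ("node-a", "node-b")}
    manager = ClusterManager({"node-a": 1, "node-b": 2})
    job = JobTracker("job-1", ["map-0", "map-1", "map-2"], trackers)
    print(manager.request(job, 3))
    print(trackers["node-b"].received)
```

Every step here is a direct call from one component to the next, so a task starts as soon as a resource is free rather than waiting for a polling interval.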
CORONA
The Cluster Manager doesn't track the progress of jobs.
The Cluster Manager is agnostic about MapReduce.
Benefits of Corona
Greater scalability
Lower latency
No-downtime upgrades
Better resource management
Utilization
Improvements in:
Scheduling fairness
Job latency
Why not YARN?
Corona usage
Storage: 100 PB of data
Analyzes: 105 TB every 30 minutes
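To put the analysis figure in perspective, it can be converted to a sustained rate. This is a back-of-the-envelope sketch that assumes decimal units (1 TB = 1000 GB, 1 PB = 1000 TB), reads the figure as terabytes, and assumes a steady rate; the function names are hypothetical.

```python
def throughput_gb_per_s(terabytes: float, minutes: float) -> float:
    """Average rate in GB/s, assuming 1 TB = 1000 GB."""
    return terabytes * 1000 / (minutes * 60)


def daily_volume_pb(terabytes: float, minutes: float) -> float:
    """Volume processed per day in PB if the same rate holds all day."""
    windows_per_day = 24 * 60 / minutes
    return terabytes * windows_per_day / 1000


if __name__ == "__main__":
    print(f"{throughput_gb_per_s(105, 30):.1f} GB/s")
    print(f"{daily_volume_pb(105, 30):.2f} PB/day")
```

Under those assumptions, 105 TB every 30 minutes works out to roughly 58 GB/s, or about 5 PB per day.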
similar concept
More about Avatar:
http://gigaom.com/cloud/how-facebook-keeps-100-petabytes-of-hadoop-data-online/
https://www.facebook.com/notes/facebook-engineering/under-the-hood-had
Corona : Concerns
But
Those
Solutions
What
networks so fast
Limitation of the present architecture:
All the machines of the cluster should be close enough
Solutions
Feasibility
Introducing tens of milliseconds of delay
Prism
Prism
Can
Not
Prism status
Still in development
October
http://www.theregister.co.uk/2009/10/23/google_spanner/
Google: Google Spanner, instamatic redundancy for 10 million servers?
Google Spanner
Is Prism similar to Spanner?
Spanner,