
YARN

YARN: Yet Another Resource Negotiator. It acts like an operating system for the cluster, managing resources across multiple machines together.
Resource Manager (master node): runs on the master node, typically the same machine as the Name Node.
Node Manager (Data Node): runs on each Data Node.
Master-slave architecture: Client -> Resource Manager -> Node Manager -> a container of resources is
allocated and the Application Master (AM) is created in it -> if the resources are not sufficient, the AM
talks to the Resource Manager and tries to get more (hence "Resource Negotiator") -> further containers
are requested by the AM and launched by Node Managers. The Node Manager manages the containers on its
node, including the one running the AM.
In yarn-site.xml we can change the scheduling policy (the default is the Capacity Scheduler).
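
As an illustration of the scheduler setting, here is a minimal sketch that generates the yarn-site.xml property for a chosen scheduler. The property name yarn.resourcemanager.scheduler.class and the two scheduler class names come from Apache Hadoop; the helper scheduler_config is a hypothetical name used only for this example.

```python
import xml.etree.ElementTree as ET

# Scheduler class names as shipped with Apache Hadoop.
CAPACITY = "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler"
FAIR = "org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler"


def scheduler_config(scheduler_class):
    """Return a yarn-site.xml fragment selecting the given scheduler (hypothetical helper)."""
    configuration = ET.Element("configuration")
    prop = ET.SubElement(configuration, "property")
    ET.SubElement(prop, "name").text = "yarn.resourcemanager.scheduler.class"
    ET.SubElement(prop, "value").text = scheduler_class
    return ET.tostring(configuration, encoding="unicode")


# Switch from the default Capacity Scheduler to the Fair Scheduler.
print(scheduler_config(FAIR))
```

The Resource Manager reads this property at startup, so a change normally requires restarting it.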

Sequence file: values are stored as key-value pairs. It is specific to Hadoop, and the file carries no
schema metadata, so other systems will not understand it.

+ JSON (metadata)

Apache Avro file format: schema + data. The schema is defined in JSON and stored with the data.
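
To make "schema + data" concrete, here is a small sketch of an Avro-style record schema written as JSON, next to the records it describes. It uses only the standard json module and does not perform real Avro binary encoding; the Employee record, its fields, and the helper field_names are made-up examples.

```python
import json

# An Avro-style record schema: plain JSON describing the fields and their types.
schema = {
    "type": "record",
    "name": "Employee",  # hypothetical record name
    "fields": [
        {"name": "id", "type": "int"},
        {"name": "name", "type": "string"},
    ],
}

# The data the schema describes (illustrative values).
records = [{"id": 1, "name": "Asha"}, {"id": 2, "name": "Ravi"}]


def field_names(schema_json):
    """Recover the column names from the JSON schema alone."""
    return [field["name"] for field in json.loads(schema_json)["fields"]]


schema_json = json.dumps(schema)
# A reader that has never seen the data can still learn its structure.
print(field_names(schema_json))
```

Because the schema travels with the data, a system other than Hadoop can interpret the file, which is the contrast with the sequence file.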


All of these are row-based. With a row-based format we have to read the full row to fetch a single value,
which is not efficient.

Row-based formats are preferable for "SELECT * FROM" queries, which read whole rows anyway.
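
The cost difference can be sketched with a toy in-memory table: the same data laid out row by row and column by column, counting how many values must be touched to read one column. The table contents and function names are illustrative assumptions, not any real file format.

```python
rows = [
    {"id": 1, "name": "Asha", "salary": 100},
    {"id": 2, "name": "Ravi", "salary": 200},
    {"id": 3, "name": "Mina", "salary": 300},
]


def to_columns(rows):
    """Re-arrange row-based records into one list per column."""
    columns = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    return columns


def read_row_based(rows, wanted):
    """Every full row is read, even though only one field is needed."""
    touched = sum(len(row) for row in rows)
    return [row[wanted] for row in rows], touched


def read_column_based(columns, wanted):
    """Only the requested column is read."""
    values = list(columns[wanted])
    return values, len(values)


columns = to_columns(rows)
print(read_row_based(rows, "salary"))        # touches all 9 values
print(read_column_based(columns, "salary"))  # touches only 3 values
```

For a query that needs every column, both layouts touch the same number of values, which is why the row layout is not a disadvantage there.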

ORC File - Optimized Row Columnar file.


It originated in Hive and is most closely tied to it. When a user needs only some columns, a
column-based format is preferable.

+ Metadata

Parquet

Sqoop

Internally it runs as a MapReduce job with no reducers (Reducer = 0, map-only). By default 4 mappers are used.
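
A rough sketch of how a map-only job can share the work among 4 mappers: the key range of the source table is cut into contiguous slices, one per mapper. This assumes an integer key with known minimum and maximum; split_ranges is a hypothetical helper, and Sqoop's real splitter differs in detail.

```python
def split_ranges(min_id, max_id, num_mappers=4):
    """Cut the inclusive range [min_id, max_id] into up to num_mappers slices."""
    total = max_id - min_id + 1
    base, extra = divmod(total, num_mappers)
    ranges = []
    start = min_id
    for i in range(num_mappers):
        # The first `extra` mappers take one additional key each.
        size = base + (1 if i < extra else 0)
        if size == 0:
            continue  # fewer keys than mappers
        ranges.append((start, start + size - 1))
        start += size
    return ranges


# Each tuple is the slice of keys one mapper would import; no reducer is needed
# because the slices are written out independently.
print(split_ranges(1, 100))
```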


YARN

YOUTUBE Lecture: https://www.youtube.com/watch?v=ZFbkNY6Xn94


Overview:

Map Reduce 1 Execution Framework



YARN ARCHITECTURE

Resource Manager: It runs as the master daemon and tracks which nodes are live and how many resources are
available.

Node Manager: It provides computational resources in the form of containers and manages the processes
running in those containers. A container executes an application-specific process.

Container: It can run different types of tasks, such as a map task or a reduce task.

Let’s discuss this in detail:

The Application Master and the Map Reduce tasks run in containers, which are scheduled by the Resource
Manager and managed by the Node Manager.

The Node Manager ensures that a container does not use more resources than it was allocated.
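
A toy model of this check, assuming memory is the only tracked resource: each container has an allocation, usage is reported, and any container over its limit is killed. The class and method names are invented for illustration and are not the YARN API.

```python
from dataclasses import dataclass


@dataclass
class Container:
    container_id: str
    allocated_mb: int
    used_mb: int = 0


class NodeManager:
    """Toy Node Manager that only tracks memory per container."""

    def __init__(self):
        self.containers = {}

    def launch(self, container):
        self.containers[container.container_id] = container

    def report_usage(self, container_id, used_mb):
        self.containers[container_id].used_mb = used_mb

    def enforce_limits(self):
        """Kill every container that exceeds its allocation; return the killed ids."""
        killed = [cid for cid, c in self.containers.items() if c.used_mb > c.allocated_mb]
        for cid in killed:
            del self.containers[cid]
        return killed


nm = NodeManager()
nm.launch(Container("c1", allocated_mb=1024))
nm.launch(Container("c2", allocated_mb=1024))
nm.report_usage("c1", 900)    # within its allocation
nm.report_usage("c2", 1500)   # over its allocation
print(nm.enforce_limits())
```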

Running WordCount Application in MR2

Client contacts the Resource Manager -> asks it to run the Application Master process -> the Resource
Manager finds a Node Manager that can launch the Map Reduce Application Master in a container -> the Map
Reduce Application Master requests resources from the Resource Manager -> finally, containers are
launched on the Data Nodes for the Map Reduce tasks. (The Application Master and the Map Reduce tasks run
in containers that are scheduled by the Resource Manager and managed by the Node Manager.)
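
The sequence above can be sketched as a small simulation. It is a simplification under stated assumptions: each Node Manager has a fixed number of container slots, the Resource Manager picks the first node with a free slot, and all class and function names are invented for the example rather than taken from the YARN API.

```python
class NodeManager:
    """Toy Node Manager with a fixed number of container slots."""

    def __init__(self, name, free_containers):
        self.name = name
        self.free_containers = free_containers

    def launch(self, process):
        if self.free_containers == 0:
            return None
        self.free_containers -= 1
        return f"{self.name}:{process}"


class ResourceManager:
    """Toy Resource Manager: places a process on the first node with a free slot."""

    def __init__(self, node_managers):
        self.node_managers = node_managers

    def allocate(self, process):
        for nm in self.node_managers:
            container = nm.launch(process)
            if container is not None:
                return container
        raise RuntimeError("no free containers in the cluster")


def run_wordcount(rm, num_map_tasks, num_reduce_tasks):
    """Client submits the job; return the containers in launch order."""
    launched = [rm.allocate("MRAppMaster")]  # first container runs the Application Master
    # The Application Master then requests one container per task.
    for i in range(num_map_tasks):
        launched.append(rm.allocate(f"map-{i}"))
    for i in range(num_reduce_tasks):
        launched.append(rm.allocate(f"reduce-{i}"))
    return launched


rm = ResourceManager([NodeManager("node1", 2), NodeManager("node2", 2)])
print(run_wordcount(rm, num_map_tasks=2, num_reduce_tasks=1))
```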
