
max executor/node   mem/executor   numExecutor   coresPer       totExecutorMem
1                   Err:511        3             8              Err:511
2                   Err:511        7             4              Err:511
4                   Err:511        15            2              Err:511
8                   Err:511        31            1              Err:511
16                  Err:511        63            0.5            Err:511
19                  2048           75            0.4210526316   153600

Nodes in Cluster   4
vCores/node        8      Note: On EMR the number of virtual cores is usually 2 * number of advertised cores
Mem*/node          5600   *yarn.nodemanager.resource.memory-mb from http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/TaskConfiguration_H2.html

                                      mem overhead   minimum
spark.yarn.executor.memoryOverhead    0.9            384
You probably don't want to change these. The default settings are 0.9 and 384.

NOTE: yarn.scheduler.capacity.resource-calculator must be set to org.apache.hadoop.yarn.util.resource.DominantResourceCalculator to receive more than 1 core per executor.

[
  {
    "Classification": "capacity-scheduler",
    "Properties": {
      "yarn.scheduler.capacity.resource-calculator": "org.apache.hadoop.yarn.util.resource.DominantResourceCalculator"
    }
  }
]
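One way to apply this classification (a workflow assumption, not something stated in this sheet): save the JSON above to a file and pass it at cluster-creation time via the --configurations option of aws emr create-cluster, or enter it as software settings in the EMR console.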
driverCores    driverMem   memOverhead
8              Err:511     Err:511
4              Err:511     Err:511
2              Err:511     Err:511
1              Err:511     Err:511
0.5            Err:511     Err:511
0.4210526316   2048        -133248
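Several cells above show Err:511, so the exact spreadsheet formulas are lost. The populated cells here and in the copies of this sheet further down are consistent with the arithmetic sketched below; treat it as a reconstruction, not the original formulas. size_executors and its parameter names are made up for this sketch; the 5600 MB node memory, the 0.9 overhead factor and the 384 MB minimum are the inputs listed above.

# Hedged reconstruction of the sizing arithmetic used in these sheets.
def size_executors(nodes, vcores_per_node, mem_per_node_mb, execs_per_node,
                   overhead_factor=0.9, overhead_min_mb=384):
    # One YARN container per executor; the overhead kept back is whichever is
    # larger, 10% of the container or the 384 MB minimum, so the executor gets
    # at most overhead_factor of its container.
    container_mb = mem_per_node_mb / float(execs_per_node)
    executor_mem_mb = int(min(container_mb * overhead_factor,
                              container_mb - overhead_min_mb))
    # One container across the cluster is presumably held back for the
    # driver / application master, hence the "- 1".
    num_executors = nodes * execs_per_node - 1
    cores_per_executor = vcores_per_node / float(execs_per_node)
    tot_executor_mem_mb = executor_mem_mb * num_executors
    return executor_mem_mb, num_executors, cores_per_executor, tot_executor_mem_mb

# Example: 4 nodes, 8 vCores and 5600 MB per node, 2 executors per node
# -> (2416, 7, 4.0, 16912). numExecutor and coresPer match the table above,
# and 2416 MB is the mem/executor figure the sheets below show for this split.
print(size_executors(4, 8, 5600, 2))

In the sheets below, totExecutorMem is simply mem/executor * numExecutor (for example 2416 * 19 = 45904).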


import json

textFile = sc.textFile("s3://swiggy-analytics-data/click-stream/dt=2016-04-17")

def convertToMap(line):
    # Turn one raw click-stream record into a dict (assumed: one JSON document per line).
    return json.loads(line)

def getEvent(event_name, data):
    # Event filter; only the final "return True" survives in the source,
    # the matching logic itself is missing.
    return True

def getRestaurantsV2(rdd):
    # Aggregate click-stream events per restaurant: event count and average rating.
    fin_res = dict()
    li = 0
    for ev in rdd.collect():   # ev is assumed to be the list of events for one restaurant
        res = dict()
        cnt = 0
        for e in ev:
            cnt += e["avg_rating"]
        res['count'] = len(ev)
        res['avg_rating'] = cnt / len(ev)
        fin_res[li] = res
        li += 1
    return fin_res

def getRests(e):
    pass   # body missing in the source sheet

def func(e):
    pass   # body missing in the source sheet

launchCommand Client Mode
pyspark --num-executors 75 --executor-memory 2048M --driver-memory 3600M --executor-cores 0.4210526315

totalTask
24
28
30
31
31.5
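The export does not show how these functions are wired together. Below is a hypothetical pipeline consistent with the signatures above; the event name "menu" and the grouping key "restaurant_id" are placeholders, not field names taken from the data.

# Hypothetical driver pipeline; not present in the source sheet.
parsed = textFile.map(convertToMap)                           # dict per click-stream record
menu_events = parsed.filter(lambda d: getEvent("menu", d))    # "menu" is a placeholder event name
by_restaurant = menu_events.groupBy(lambda d: d.get("restaurant_id")) \
                           .map(lambda kv: list(kv[1]))       # list of events per restaurant
print(getRestaurantsV2(by_restaurant))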
max executor/node   mem/executor   numExecutor   coresPer   totExecutorMem
1                   5040           9             16         45360
2                   2416           19            8          45904
4                   1016           39            4          39624
8                   316            360           2          113760

Nodes in Cluster   10
vCores/node        16
Mem*/node          53000   *yarn.nodemanager.resource.memory-mb from http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/TaskConfiguration_H2.html

                                      mem overhead   minimum
spark.yarn.executor.memoryOverhead    0.9            384

NOTE: yarn.scheduler.capacity.resource-calculator must be set to org.apache.hadoop.yarn.util.resource.DominantResourceCalculator to receive more than 1 core per executor.

[
  {
    "Classification": "capacity-scheduler",
    "Properties": {
      "yarn.scheduler.capacity.resource-calculator": "org.apache.hadoop.yarn.util.resource.DominantResourceCalculator"
    }
  }
]
driverCores   driverMem   memOverhead
16            5040        5600
8             2416        7680
4             1016        15360
2             316         30720
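The memOverhead column here appears to be a cluster-wide total rather than a per-container value: with 10 nodes and 2 executors per node there are 20 containers, and 20 * 384 MB (the minimum above) = 7680 MB, matching the second row; the first row works out as 10 * (0.1 * 5600) = 5600 MB, since 10% of a whole-node container exceeds the 384 MB floor.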

launchCommand Cluster Mode
pyspark --num-executors 9 --executor-memory 5040M --driver-memory 5040M --executor-cores 16 --driver-cores 16
pyspark --num-executors 19 --executor-memory 2416M --driver-memory 2416M --executor-cores 8 --driver-cores 8
pyspark --num-executors 39 --executor-memory 1016M --driver-memory 1016M --executor-cores 4 --driver-cores 4
pyspark --num-executors 360 --executor-memory 316M --driver-memory 316M --executor-cores 2 --driver-cores 2

launchCommand Client Mode
pyspark --num-executors 9 --executor-memory 5040M --driver-memory 3600M --executor-cores 16 --driver-cores 14
pyspark --num-executors 19 --executor-memory 2416M --driver-memory 3600M --executor-cores 8 --driver-cores 14
pyspark --num-executors 39 --executor-memory 1016M --driver-memory 3600M --executor-cores 4 --driver-cores 14
pyspark --num-executors 360 --executor-memory 316M --driver-memory 3600M --executor-cores 2 --driver-cores 14
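The client-mode commands differ from the cluster-mode ones only in the driver settings: driver memory is pinned at 3600M and the driver gets two fewer cores than a node has vCores (14 of 16 here; 34 of 36 and 62 of 64 in the sheets below). The small sketch below rebuilds these command lines from the table values; client_mode_command is made up for this sketch, and the 3600M constant and the vCores-minus-2 rule are read off these sheets, not taken from Spark or EMR documentation.

# Hedged helper: regenerate the client-mode launch commands from the sizing table.
def client_mode_command(num_executors, executor_mem_mb, executor_cores,
                        node_vcores, driver_mem_mb=3600):
    driver_cores = node_vcores - 2   # pattern observed in these sheets
    return ("pyspark --num-executors {0} --executor-memory {1}M "
            "--driver-memory {2}M --executor-cores {3} --driver-cores {4}"
            .format(num_executors, executor_mem_mb, driver_mem_mb,
                    executor_cores, driver_cores))

# Reproduces the second command above (19 executors, 2416M, 8 cores, 16-vCore nodes).
print(client_mode_command(19, 2416, 8, 16))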
max executor/node   mem/executor   numExecutor   coresPer   totExecutorMem
1                   5040           9             36         45360
2                   2416           19            18         45904
4                   1016           39            9          39624
8                   316            360           4.5        113760

Nodes in Cluster   10
vCores/node        36
Mem*/node          53248   *yarn.nodemanager.resource.memory-mb from http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/TaskConfiguration_H2.html

                                      mem overhead   minimum
spark.yarn.executor.memoryOverhead    0.9            384

NOTE: yarn.scheduler.capacity.resource-calculator must be set to org.apache.hadoop.yarn.util.resource.DominantResourceCalculator to receive more than 1 core per executor.

[
  {
    "Classification": "capacity-scheduler",
    "Properties": {
      "yarn.scheduler.capacity.resource-calculator": "org.apache.hadoop.yarn.util.resource.DominantResourceCalculator"
    }
  }
]
driverCores   driverMem   memOverhead
36            5040        5600
18            2416        7680
9             1016        15360
4.5           316         30720

launchCommand Cluster Mode
pyspark --num-executors 9 --executor-memory 5040M --driver-memory 5040M --executor-cores 36 --driver-cores 36
pyspark --num-executors 19 --executor-memory 2416M --driver-memory 2416M --executor-cores 18 --driver-cores 18
pyspark --num-executors 39 --executor-memory 1016M --driver-memory 1016M --executor-cores 9 --driver-cores 9
pyspark --num-executors 360 --executor-memory 316M --driver-memory 316M --executor-cores 4.5 --driver-cores 4.5

launchCommand Client Mode
pyspark --num-executors 9 --executor-memory 5040M --driver-memory 3600M --executor-cores 36 --driver-cores 34
pyspark --num-executors 19 --executor-memory 2416M --driver-memory 3600M --executor-cores 18 --driver-cores 34
pyspark --num-executors 39 --executor-memory 1016M --driver-memory 3600M --executor-cores 9 --driver-cores 34
pyspark --num-executors 360 --executor-memory 316M --driver-memory 3600M --executor-cores 4.5 --driver-cores 34
max executor/node   mem/executor   numExecutor   coresPer   totExecutorMem
1                   5040           4             16         20160
2                   2416           9             8          21744
4                   1016           19            4          19304
8                   316            360           2          113760

Nodes in Cluster   5
vCores/node        16
Mem*/node          54272   *yarn.nodemanager.resource.memory-mb from http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/TaskConfiguration_H2.html

                                      mem overhead   minimum
spark.yarn.executor.memoryOverhead    0.9            384

NOTE: yarn.scheduler.capacity.resource-calculator must be set to org.apache.hadoop.yarn.util.resource.DominantResourceCalculator to receive more than 1 core per executor.

[
  {
    "Classification": "capacity-scheduler",
    "Properties": {
      "yarn.scheduler.capacity.resource-calculator": "org.apache.hadoop.yarn.util.resource.DominantResourceCalculator"
    }
  }
]
driverCores   driverMem   memOverhead
16            5040        2800
8             2416        3840
4             1016        7680
2             316         15360

launchCommand Cluster Mode
pyspark --num-executors 4 --executor-memory 5040M --driver-memory 5040M --executor-cores 16 --driver-cores 16
pyspark --num-executors 9 --executor-memory 2416M --driver-memory 2416M --executor-cores 8 --driver-cores 8
pyspark --num-executors 19 --executor-memory 1016M --driver-memory 1016M --executor-cores 4 --driver-cores 4
pyspark --num-executors 360 --executor-memory 316M --driver-memory 316M --executor-cores 2 --driver-cores 2

launchCommand Client Mode
pyspark --num-executors 4 --executor-memory 5040M --driver-memory 3600M --executor-cores 16 --driver-cores 14
pyspark --num-executors 9 --executor-memory 2416M --driver-memory 3600M --executor-cores 8 --driver-cores 14
pyspark --num-executors 19 --executor-memory 1016M --driver-memory 3600M --executor-cores 4 --driver-cores 14
pyspark --num-executors 360 --executor-memory 316M --driver-memory 3600M --executor-cores 2 --driver-cores 14
max executor/node   mem/executor   numExecutor   coresPer   totExecutorMem
1                   5040           4             64         20160
2                   2416           9             32         21744
4                   1016           19            16         19304
8                   316            360           8          113760

Nodes in Cluster   5
vCores/node        64
Mem*/node          241664   *yarn.nodemanager.resource.memory-mb from http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/TaskConfiguration_H2.html

                                      mem overhead   minimum
spark.yarn.executor.memoryOverhead    0.9            384

NOTE: yarn.scheduler.capacity.resource-calculator must be set to org.apache.hadoop.yarn.util.resource.DominantResourceCalculator to receive more than 1 core per executor.

[
  {
    "Classification": "capacity-scheduler",
    "Properties": {
      "yarn.scheduler.capacity.resource-calculator": "org.apache.hadoop.yarn.util.resource.DominantResourceCalculator"
    }
  }
]
driverCores   driverMem   memOverhead
64            5040        2800
32            2416        3840
16            1016        7680
8             316         15360

launchCommand Cluster Mode
pyspark --num-executors 4 --executor-memory 5040M --driver-memory 5040M --executor-cores 64 --driver-cores 64
pyspark --num-executors 9 --executor-memory 2416M --driver-memory 2416M --executor-cores 32 --driver-cores 32
pyspark --num-executors 19 --executor-memory 1016M --driver-memory 1016M --executor-cores 16 --driver-cores 16
pyspark --num-executors 360 --executor-memory 316M --driver-memory 316M --executor-cores 8 --driver-cores 8

launchCommand Client Mode
pyspark --num-executors 4 --executor-memory 5040M --driver-memory 3600M --executor-cores 64 --driver-cores 62
pyspark --num-executors 9 --executor-memory 2416M --driver-memory 3600M --executor-cores 32 --driver-cores 62
pyspark --num-executors 19 --executor-memory 1016M --driver-memory 3600M --executor-cores 16 --driver-cores 62
pyspark --num-executors 360 --executor-memory 316M --driver-memory 3600M --executor-cores 8 --driver-cores 62
