
March 31, 2010

.................................................................................................................... 1-1
1.1

........................................................................... 1-1

1.2

............................................................. 1-2

1.2.1

................................................................. 1-2

1.2.2

................................................................. 1-4

1.3
2

..................................................................................................... 1-6

MapReduce .............................................................. 2-1


2.1

........................................................................ 2-1

2.2

.................................................................... 2-2

2.2.1

............................................................................ 2-2

2.2.2

....................................................................................... 2-2

2.2.3

............................................................................ 2-2

2.2.4

............................................... 2-3

2.3

MapReduce ....................................................... 2-6

2.3.1

.................................................. 2-7

2.3.2

MapReduce ...................................................................................... 2-19

2.4

MapReduce ............................................................ 2-30

2.4.1

Map ....................................................................................... 2-31

2.4.2

Reduce ................................................................................... 2-43

2.4.3

MapReduce ....................................................................... 2-47

2.5
3

.............................................................................................................. 2-49

........................................................................... 3-1
3.1

Hadoop ................................................... 3-1

3.1.1

Hadoop ................................................................................ 3-1

3.1.2

MapReduce Hadoop ................................. 3-2

3.1.3

Hadoop ...................................................... 3-3

3.2

MapReduce ................................................................................ 3-5

3.2.1

.......................................................... 3-5

3.2.2

Map ........................................................................... 3-7

3.2.3

Reduce ...................................................................... 3-7

3.3

MapReduce .................................................................. 3-7

3.3.1

............................................... 3-8

3.3.2

MapReduce ............................................................... 3-8


3.3.3
3.4

.................................. 3-13

3.4.2

..................................... 3-15

3.4.3

............................................. 3-16

.............................................................................................................. 3-17

................................................................................... 4-1
4.1

................................................................ 4-1

4.1.1

..................................................................... 4-1

4.1.2

Hadoop ....................................................... 4-1

4.1.3

Hadoop ................................................... 4-6

4.1.4

L3 ........................................................... 4-8

4.1.5

Hadoop .............................................................. 4-8

4.1.6

Hadoop ................................................ 4-9

4.1.7

L2 ................................................. 4-10

4.2

............................................................................................... 4-10

4.2.1

............................................................... 4-10

4.2.2

............................................................... 4-12

4.2.3

Hadoop ................................................. 4-13

4.2.4

Hadoop ................................................. 4-14

4.3

Hadoop ........................................................ 4-15

4.3.1

FT ................................................................ 4-15

4.3.2

FT ................................................................... 4-18

4.4

.................................................................................... 3-13

3.4.1

3.5
4

MapReduce .................................................................... 3-10

........................................................................................ 4-19

4.4.1

Hadoop ........................... 4-19

4.4.2

................................................................................................ 4-19

........................................................................ 5-1
5.1

........................................................................... 5-1

5.1.1

............................................................................ 5-1

5.1.2

....................................... 5-2

5.1.3

.......................................................... 5-2

5.1.4

............................................... 5-3

5.1.5

............................................................................ 5-4

5.1.6

.............................................................................................. 5-4

5.2

............................................................................................................ 5-5

5.2.1

..................................................................................................... 5-5


5.3

5.3.1

..................................................................... 5-6

5.3.2

.......................................................... 5-9

5.3.3

......................................... 5-12

5.3.4

............................................................... 5-14

5.4

................................................................................. 5-18

5.4.2

................................................................................................ 5-20

.................................................................................................................... 6-1
............................................................................................................ 6-1

6.1.1

........................................................................................... 6-1

6.1.2

................................................................................... 6-2

6.2

..................................................................................................... 6-5

6.3

MapReduce .......................... 6-5

6.3.1

...................................................... 6-6

6.3.2

......................................... 6-13

6.4

................................................ 6-17

6.4.1

........................................................... 6-17

6.4.2

............................................. 6-18

6.4.3

................................................ 6-18

6.4.4

1 ............................................. 6-19

6.4.5

Kemari ..................................................... 6-20

6.5

................................................ 6-21

6.5.1

............................................. 6-22

6.5.2

........................................................... 6-28

6.6

............................................................................................ 6-31

.................................................................................................................... 7-1

2
8

........................................................................................ 5-17

5.4.1

6.1

......................................................................................................... 5-6

MapReduce .................................................................. 8-1


8.1

MapReduce ............................................................................................... 8-1

8.1.1

MapReduce ........................................................................................ 8-1

8.1.2

MapReduce .................................................................................... 8-2

8.2

MapReduce .................................................................................... 8-2

8.2.1

MapReduce ...................................................................... 8-3

8.2.2

MapReduce ............................................................................. 8-3

8.2.3

MapReduce ....................................................... 8-5


8.2.4
8.3

Hadoop MapReduce ......................... 8-9

8.3.1

MapReduce ........................................................................... 8-10

8.3.2

Map .................................................................................................. 8-14

8.3.3

Reduce .............................................................................................. 8-17

8.3.4

................................................................................................... 8-18

8.3.5

Shuffle ...................................................................................................... 8-22

8.3.6

.......................................................................... 8-23

8.3.7

.......................................................................... 8-26

8.3.8

................ 8-27

8.4

MapReduce ................................................. 8-28

8.4.1

Pig............................................................................................................. 8-28

8.4.2

Hive .......................................................................................................... 8-31

8.4.3

Pig Hive ............................................... 8-33

8.5

MapReduce ........................... 8-33

8.5.1

Map Reduce ................................................................... 8-33

8.5.2

.................................................... 8-35

8.5.3

MapReduce ................................... 8-35

8.5.4

MapReduce .......................................... 8-36

8.5.5

Map Reduce ................................. 8-37

8.6
9

MapReduce ................................................................................. 8-6

................................................................................................... 8-38

Hadoop ..................................................................... 9-1


9.1

.............................................................................................. 9-1

9.1.1

.......................................................... 9-1

9.1.2

.................................... 9-2

9.1.3

..................................................................... 9-3

9.2

............................................................................................................ 9-5

9.2.1

...................................................... 9-5

9.2.2

....................................... 9-6

9.2.3

Hadoop ......................................................................... 9-6

9.3

Hadoop .......................................................... 9-6

9.3.1

Hadoop MapReduce ....................................................... 9-7

9.3.2

MapReduce ................................. 9-8

9.3.3

Hadoop ............... 9-12

9.3.4

Hadoop ........................ 9-16

9.4

Hadoop ............................................................ 9-19


9.4.1

Hadoop ..................................... 9-19

9.4.2

Hadoop ........................ 9-23

9.5

Hadoop ...................... 9-28

9.5.1

Map Hadoop ............................. 9-28

9.5.2

Reduce Hadoop ........................ 9-30

9.5.3

MapReduce ......................................................... 9-31

9.6

MapReduce ................................................................ 9-47

9.6.1

............................................. 9-48

9.6.2

MapReduce ............................................................. 9-48

9.6.3

MapReduce .................................................. 9-50

9.7

Hadoop ............................................. 9-53

9.7.1

Hadoop MapReduce .............................................. 9-53

9.7.2

Hadoop ................................................. 9-53

9.7.3

MapReduce Hadoop ........................ 9-53

9.7.4

MapReduce .......................................... 9-53

10

Hadoop .................................................... 10-1

10.1

......................................................................... 10-1

10.2

....................................................... 10-3

10.3

................................................................................. 10-4

10.3.1 ..................................................................................... 10-5


10.3.2 .................................................................................. 10-5
10.3.3 .............................................................................. 10-5
10.4

........................................................................................ 10-5

10.4.1 .............................................................................. 10-6


10.4.2

Heartbeat DRBD HA ....................................... 10-9

10.4.3

HA Kemari FT ................... 10-10

10.5

........................................................................... 10-11

10.5.1

HA .............................................................. 10-12

10.5.2 FT ....................................................... 10-18


10.6

........................................................................... 10-25

10.6.1

HA .............................................................. 10-25

10.6.2 FT ....................................................... 10-25


10.6.3 .............................................................................................. 10-27
11

Hadoop ................................................................... 11-1

11.1

Hadoop .................................................... 11-1

11.1.1

Hadoop ........................................................... 11-1


11.1.2

Hadoop ............. 11-3

11.1.3

......................................................................................... 11-4

11.2

....................................................................................................... 11-6

11.2.1

............................................................................................ 11-6

11.2.2

Hadoop ................................................................... 11-6

11.3

Hadoop ................................................................... 11-7

11.3.1

................................................................... 11-7

11.3.2

........................... 11-7

11.3.3

................ 11-8

11.3.4

11-10

11.3.5

......................11-11

11.4

............................................................................... 11-13

11.4.1

........................................................................ 11-13

11.4.2

Hadoop ............................................... 11-14

11.4.3

Hadoop .............................................................. 11-14

11.4.4

Hadoop ...................................................... 11-14

11.4.5

Hadoop ............................. 11-16

11.4.6

Hadoop ...................... 11-18

11.5

Hadoop ..................................................................... 11-19

11.5.1

.................................................................... 11-19

11.5.2

......................................................... 11-19

11.5.3

........................................... 11-20

11.5.4

................................................................. 11-22

11.6

.................................................................................. 11-23

11.6.1

..................................................................................................... 11-23

11.6.2

.............................................................................................. 11-23

12

Hadoop ............................................................ 12-1

12.1

...................................................................... 12-1

12.1.1

Hadoop .................................................................................. 12-1

12.1.2

Hadoop ................................... 12-2

12.2

....................................................................................................... 12-3

12.2.1

.......................................................................... 12-3

12.2.2

.................................................... 12-3

12.2.3

.............................................................................. 12-4

12.3
12.3.1

.................................................................................... 12-4
Hadoop ............. 12-4


12.3.2
12.4

......................................................................................... 12-5

Kickstart puppet .................................................. 12-7

12.4.1

Kickstart .................................................. 12-7

12.4.2

Kickstart .............................................. 12-8

12.4.3

Puppet ......................................................... 12-9

12.4.4

Kickstart Puppet ............................. 12-9

12.4.5

............................................................................... 12-10

12.4.6

...................................................... 12-12

12.4.7

.................................................. 12-13

12.4.8

......................... 12-14

12.4.9

Kickstart Puppet . 12-17

12.4.10 Puppet ........ 12-18


12.5

........................................................................... 12-19

12.5.1

Hadoop ...................... 12-19

12.5.2

............................................................................... 12-21

13

Hadoop ............................................................... 13-1

13.1

........................................................................................ 13-1

13.1.1

................................................................................. 13-1

13.1.2

............................................................................................ 13-1

13.2

....................................................................................................... 13-3

13.2.1
13.3

.......................................................................... 13-3
.................................................................................... 13-3

13.3.1

1Hadoop ................................ 13-4

13.3.2

2 ......................................... 13-9

13.3.3

3 ..
............................................................................................................ 13-12

13.3.4

..................... 13-14

13.3.5

....................................................................................... 13-20

13.4
13.4.1

..................................................................................................... 13-27

13.4.2

.............................................................................................. 13-28

3
I

.................................................................................. 13-27

............................................................................................................. I-1
I.1

..................................................................... I-1

I.2

......................................................................... I-3

I.2.1

............................................................................................... I-3


I.2.2
II

...................................................................................... I-15

................................................................................................................... II-1


1.1

(
)

1-1

Hadoop
MPI(Message Passing Interface)
MPICH, Open MPI
HDFS
MapReduce Hadoop
Hadoop Google MapReduce

1.2

1.2.1

2 1-1

1-1


2
Hadoop

(1)
Google
MapReduce Hadoop
MapReduce Map
Reduce 2 Map

Reduce Map
MapReduce
MapReduce MapReduce Map
Reduce

MAP

SHUFFLE

REDUCE

1-2 MapReduce
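To make the Map, Shuffle and Reduce stages of the figure concrete, the following is a minimal in-memory sketch in plain Java. It is not Hadoop code: the record format (`id,value`), the class name `MiniMapReduce`, and the choice of averaging in the reduce step are assumptions made purely for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: an in-memory Map -> Shuffle -> Reduce pipeline.
// Names and the "id,value" record format are hypothetical.
final class MiniMapReduce {

    // Map: turn one input record into a (key, value) pair.
    static Map.Entry<String, Double> map(String record) {
        String[] fields = record.split(",");
        return new SimpleEntry<>(fields[0], Double.valueOf(fields[1]));
    }

    // Shuffle: group all values that share a key; TreeMap keeps keys sorted.
    static Map<String, List<Double>> shuffle(List<Map.Entry<String, Double>> pairs) {
        Map<String, List<Double>> grouped = new TreeMap<>();
        for (Map.Entry<String, Double> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce: collapse the values of one key into one result (here, the mean).
    static double reduce(List<Double> values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.size();
    }

    // Run the three stages in order over a small list of records.
    static Map<String, Double> run(List<String> records) {
        List<Map.Entry<String, Double>> pairs = new ArrayList<>();
        for (String record : records) {
            pairs.add(map(record));
        }
        Map<String, Double> result = new TreeMap<>();
        for (Map.Entry<String, List<Double>> group : shuffle(pairs).entrySet()) {
            result.put(group.getKey(), reduce(group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("1,40", "2,10", "1,20")));
    }
}
```

In a real cluster the three stages run on different machines and the shuffle moves data over the network; the sketch only shows how keys tie the stages together.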


(2)
Hadoop HDFS(Hadoop Distributed File
System) 64MB
1
1

HDFS NameNode
DataNode 2

Client
NameNode

SW

SW

SW

DataNodes

Rack

1-3 HDFS

1.2.2


1.2.2.1

Hadoop

100 Hadoop

1.2.2.2

Hadoop

1.2.2.3
1 1


Hadoop

Hadoop
Hadoop
100 Hadoop

1.3
2 1 7
8 13

1-4 26

8 MapReduce
3 Hadoop
9

2
4 Hadoop
Hadoop
10
5 Hadoop
3 11 Hadoop
12 Hadoop 13
Hadoop


2 5

2 MapReduce

8 MapReduce

9 Hadoop

10 Hadoop

11 Hadoop

12 Hadoop
6
13 Hadoop
7

BA

1-4


2 MapReduce

MapReduce
MapReduce

MapReduce
MapReduce
8

2.1

GPS

Hadoop


2.2
Hadoop MapReduce

2.2.1

HDFS

2.2.2

2.2.3
5


2.2.4

2.2.4.1

10

2-1
2-1
No.

110

ID


2-2
2-2
No.

12

ID

2-3
2-3
No.

105

388

2.2.4.2
5

2-4
2-1


2-4
No.

ID

ID

2-1


2.3 MapReduce
MapReduce

MapReduce
Map Reduce

MapReduce Map Reduce

MapReduce MapReduce

Map
Reduce


MapReduce


2.3.1

2-2

2-2


2.3.1.1
2-2
2-3

ID

(1)

(2)

(3)

(4)

(5)

2-3
(1)


(2)
ID

2-4

Step1

ID:1

ID:2

Step2

500m
230m

ID:1

Step3

230m

500m

2-4


(3)

(2)

(4)

2-5

10:00:00

10:00:00

10:05:00

10:05:00

10:10:00

10:10:00

10:15:00

10:03:02

10:00:00
2-5
(5)
ID ID

(2)

(
)(2)
ID ID
1


2-6

Step1

1
2

ID:1
:

ID:1
:

ID:1
:

Step2
)
::40km::20km::10km
1
40km

2
40km
20km

Step3

1
40km

2
(40km+20km)/2=30km

Step4

ID

ID

09/11/01
12:00:00

40km30km

09/11/01
12:00:00

10km2km,1km

2-6


2.3.1.2
2-2
2-7

ID

(1)

(2)

(3)

(4)

(5)

(6)

2-7
(1)


(2)

2-8

10:00:00

10:00:00

10:05:00

10:05:00

10:10:00

10:10:00

10:15:00

10:03:02

10:00:00
2-8
(3)

ID
ID


(4)
ID

2-9
Step1

ID:1

ID:2

Step2

500m
230m

ID:1

Step3

230m

500m

2-9
(5)
ID


2-10

()

12:00:08

12:00:04

12:00:00

ID

2-10
(6)

2-11


Step1

Step2

1
2
3

12:00:45

12:00:30

12:00:25

1
36km

12:00:10

12:00:00

2
26km

Step3

ID

ID

09/11/01
12:00:00

0001

36km26km

09/11/01
12:00:00

0002

10km2km,1km

2-11


2.3.1.3
2-2
2-12

(1)

(2)

2-12
(1)


ID

ID

ID


(2)
(1)

(1)
2-13

ID

2009/11/30 10:00:00

1:,2:

2009/11/30 10:00:00

1:,2:

ID

12:00

1:,2:

10:00

1:,2:

ID

2009/5/3

10:00

1:,2:

2009/5/3

11:00

1:,2:

2-13


2.3.2 MapReduce
2.3.1 MapReduce

Map Reduce

MapReduce Map
Reduce
(Gane-Sarson ) 2-14
MapReduce
2.3.1.1

(1)
(2)
(3) Map
(4) Reduce
(5)
(6)
(7)

2-14 MapReduce
Gane-Sarson
MapReduce
2-15

2-15


(1)

2-16 2.3.1.1

(1)

(2)

(3)

(4)

(5)

2-16

2.3.1.1

2.3.1.1


( ID)
ID

2-17
ID

ID

ID
ID

ID
ID

ID

ID

2-17


(2)
2-17

ID ID

2-17
2-18
ID

ID

ID

ID

ID
ID

ID
ID

ID
ID

ID
ID

ID

ID

ID

ID

2-18


(3) Map
2-18 Map Map

2-18
2-18

Map
Map 2-19
Map
ID

ID

ID

ID

ID
ID

ID
ID

ID
ID

ID
ID

ID

ID

ID

ID

2-19 Map


(4) Reduce
2-18 Reduce
Reduce
2-18

2-18

Reduce
Reduce
2-20
ID

ID

Map

ID

ID

ID
ID

ID
ID

ID
ID

ID
ID

ID

ID

ID

ID

Reduce

2-20 Reduce


(5)
2-18 KeyValue
KeyValue
Map Reduce
Map Reduce
Key Value Key Reduce
Key
Reduce
Reduce
2.3.1.1 (5)
ID
ID
Key ID ID
Map KeyValue 2-21
Key
ID

ID

Map

ID

ID

ID
ID

ID
ID

ID
ID

ID
ID

ID

ID

Reduce

ID

ID

2-21


(6)
2-18 KeyValue
KeyValue
Map Map

Key Value
Key

Key Value

2-22

ID

ID

Map

ID

ID

ID
ID

ID
ID

ID
ID

ID
ID

ID

ID

Reduce

ID

ID

2-22


(7)
2-18 KeyValue
KeyValue
Reduce Reduce

Key Value
Key
Reduce Key
Value
2-23
ID

ID

Map

ID

ID

ID
ID

ID
ID

ID
ID

ID
ID

ID

ID

Reduce

ID

ID

2-23


MapReduce
2-24 MapReduce Map Reduce

ID

Map

Reduce

ID

ID

2-24 MapReduce

MapReduce
MapReduce 2-25 2-26


ID

Map

ID

Reduce

ID

2-25 MapReduce

Map

Reduce

ID

IDID

ID

2-26 MapReduce


2.4 MapReduce
MapReduce

MapReduce MapReduce
MapReduce 2.3.2 MapReduce
MapReduce
2-27

ID

(Key)

LongWritable

(Value)

Text

TextInputFormat

Map

TaxiProbeAnalysisMapper

(Key)

TaxiProbeAnalysisKeyWritableComparable

(Value)

TaxiProbeAnalysisValueWritable

Reduce

TaxiProbeAnalysisReducer

(Key)

NullWritable

(Value)

ProbeAnalysisInfo

TextOutputFormat

Map

ID

Reduce

ID

2-27


2.4.1 Map
Map Map

Map
Map 2-28 2-29

2-28 Map (1/2)


2-29 Map (2/2)


:

Map Mapper Mapper


Key Value Key Value
Key Value
TextInputFormat TextInputFormat LongWritable
Text Key Value
TaxiProbeAnalysisKeyWritableComparable
TaxiProbeAnalysisValueWritable


setup()Map
Map

A) Context getConfiguration() Configuration


B) Configuration get()()

Configuration MapReduce

map()2.3.2 Map
map() Key 2.3.2
Value TextInputFormat setPaths()

A) map() value

B) Key(TaxiProbeAnalysisKeyWritableComparable)
Value(TaxiProbeAnalysisValueWritable)
C) Key Value Context write()

KeyValue Hadoop Text

Text


Key
2.3.2 2-30 2-31

2-30 Key (1/2)


2-31 Key (2/2)


:

Key WritableComparable
WritableComparable

2.3.2 Value ( ID
ID)
get()set()


write()
2-5
2-5 write()

ID

int

IntWritable write()

boolean

BooleanWritable write()

String

Text write()

ID

String

Text write()

int( ID)boolean()String( ID)


Hadoop
Comparator
compare()
:

read()read()
write()
2-6
2-6 read()

ID

int

IntWritable readFields()

boolean

BooleanWritable readFields ()

String

Text readFields ()

ID

String

Text readFields ()

compareTo()Key ID
ID Hadoop
Key Comparator


Comparator
Comparator Key

Hadoop Comparator Key Comparator


Key compareTo()
Comparator

Comparator
Comparator

Key Comparator
Key TaxiProbeAnalysisKeyWritableComparable
Comparator 2-32 2-33

2-32 Comparator (1/2)


2-33 Comparator (2/2)


:

Key TaxiProbeAnalysisComparator
Comparator TaxiProbeAnalysisComparator


Comparator WritableComparator

TaxiProbeAnalysisKeyWritableComparable

compare() Key
compare() b1 b2
Key s1
s2 Key
b1 b2 s1s2
Key write()

2-7
2-7 compare()

()

ID

int

s1,s2

Integer.SIZE/8

boolean

ID + ID

ID

string

string

WritableUtils.decodeVIntSize()+WritableUtils.readVInt()

ID +

WritableUtils.decodeVIntSize()+WritableUtils.readVInt()

ID

Key(TaxiProbeAnalysisKeyWritableComparable) write()
ID ID

int( ID)IntWritable Comparator.compare()


IntWritable write()

( ID
)
int


WritableUtils.decodeVIntSize()+WritableUtils.readVInt()
Text write()
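The idea of comparing Keys directly on their serialized bytes, using the start offsets s1 and s2 and the field sizes listed above, can be sketched with the standard library alone. This is an assumption-laden illustration, not the document's TaxiProbeAnalysisComparator: the class `RawKeyCompareSketch` and its two-field layout (an int followed by a boolean) are hypothetical, and the variable-length Text fields are left out.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

// Illustrative only: compares two serialized keys without deserializing them.
// Hypothetical layout: a 4-byte big-endian int followed by a 1-byte boolean.
final class RawKeyCompareSketch {

    // Size of the leading int field in bytes.
    static final int INT_BYTES = Integer.SIZE / 8;

    // Serialize the hypothetical key the same way DataOutput-based write() methods would.
    static byte[] serialize(int id, boolean flag) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            out.writeInt(id);
            out.writeBoolean(flag);
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Same parameter shape as a raw comparator: buffer, start offset and length per key.
    // The lengths are unused here because both compared fields have a fixed size.
    static int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        // The int field starts at the key's start offset.
        int id1 = ByteBuffer.wrap(b1, s1, INT_BYTES).getInt();
        int id2 = ByteBuffer.wrap(b2, s2, INT_BYTES).getInt();
        int byId = Integer.compare(id1, id2);
        if (byId != 0) {
            return byId;
        }
        // The boolean field starts right after the int field.
        return Byte.compare(b1[s1 + INT_BYTES], b2[s2 + INT_BYTES]);
    }
}
```

Working on raw bytes avoids creating a Key object for every comparison during the sort, which is the usual motivation for this style of comparator.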

Partition
Key Partition
Hadoop Partition Key hashCode()
Key hashCode()
Key
Key
Reduce Key
TaxiProbeAnalysisKeyWritableComparable Partition
2-34

2-34 Partition
:

Partition Partitioner Partitioner


Key
TaxiProbeAnalysisKeyWritableComparable

getPartition()TaxiProbeAnalysisKeyWritableComparable


TaxiProbeAnalysisKeyWritableComparable
hashCode()
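As a rough sketch of hash-based partitioning, the snippet below picks a Reduce task from the hash code of one field of a composite key, so that all keys sharing that field land on the same Reduce task. The class `PartitionSketch`, the `Key` type and its field names are hypothetical; which fields the document's Partitioner actually hashes is not recoverable from the text, so this is an assumption.

```java
// Illustrative only: hash partitioning on one field of a composite key.
final class PartitionSketch {

    // Hypothetical composite key: groupId decides the Reduce task, seq only affects ordering.
    static final class Key {
        final String groupId;
        final int seq;

        Key(String groupId, int seq) {
            this.groupId = groupId;
            this.seq = seq;
        }
    }

    // Returns a partition index in the range [0, numReduceTasks).
    static int getPartition(Key key, int numReduceTasks) {
        // Mask the sign bit so that a negative hash code never yields a negative index.
        return (key.groupId.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}
```

Because `seq` is ignored, two keys with the same `groupId` always map to the same index regardless of their other fields.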

Value
2.3.2 Value 2-35

2-35 Value
:

Value Writable

2.3.2 Value (


)
get()set()
:

write()
2-8
2-8 write()

double

DataOutput writeDouble()

double

DataOutput writeDouble()

int

DataOutput writeInt()

double

DataOutput writeDouble()

read()
write()
2-9
2-9 read()

double

DataInput readDouble()

double

DataInput readDouble()

int

DataInput readInt()

double

DataInput readDouble()
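The pairing of write() and read() calls in the two tables above can be sketched without Hadoop on the classpath, since `DataOutput` and `DataInput` are standard `java.io` interfaces. The class `ValueRecordSketch` and its field names are hypothetical; only the field types and their order (double, double, int, double) are taken from the tables.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Illustrative only: a value type with Writable-style serialization methods.
// Field names are placeholders; types and order follow the tables above.
final class ValueRecordSketch {
    double first;
    double second;
    int count;
    double third;

    // Serialize the fields in a fixed order.
    void write(DataOutput out) throws IOException {
        out.writeDouble(first);
        out.writeDouble(second);
        out.writeInt(count);
        out.writeDouble(third);
    }

    // Deserialize in exactly the same order as write().
    void readFields(DataInput in) throws IOException {
        first = in.readDouble();
        second = in.readDouble();
        count = in.readInt();
        third = in.readDouble();
    }

    // Helper for experiments: serialize to bytes and read back into a new object.
    static ValueRecordSketch roundTrip(ValueRecordSketch original) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            original.write(new DataOutputStream(buffer));
            ValueRecordSketch copy = new ValueRecordSketch();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The key design point is that the read order must mirror the write order exactly, because the byte stream carries no field names.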


2.4.2 Reduce
Reduce Reduce

Reduce
2.3.2 Reduce 2-36

2-36 Reduce

Reduce Reducer Reducer


Key Value Key Value
Key Value
TaxiProbeAnalysisKeyWritableComparable
TaxiProbeAnalysisValueWritable Key
NullWritable
TextOutputFormat Key Value
Value
ProbeAnalysisInfo

setup()Reduce Reduce
2.3.1.1

A) Context getConfiguration() Configuration


B) Configuration get()(
)
Configuration MapReduce

reduce()2.3.2 Reduce
reduce() KeyValue Map map()
Context.write() Key

A) reduce() KeyValue
2.3.2 Reduce
B) Value(ProbeAnalysisInfo)
C) KeyValue Context write()
Key NullWritable
TextOutputFormat Key
Value Key NullWritable


KeyValue

TextOutputFormat Key NullWritable


Key 2.3.2
Key NullWritable
Value 2.3.2
Value 2-37

2-37 Value


2.3.2 Value ( ID
ID)
get()set()

toString()
TextOutputFormat Value
toString()


2.4.3 MapReduce
2.3.2 MapReduce 2-38

2-38 Reduce
:

MapReduce Configured Tool

set()
Hadoop


Configuration
set() MapReduce

MapReduce
Configuration
MapReduce

set()
:

run()MapReduce
2-10
2-10 MapReduce

No. | Setting | Class | Method | Value
1 | Job | Job | |
2 | InputSplit | TextInputFormat | setMaxInputSplitSize |
3 | Input path | TextInputFormat | setInputPaths |
4 | Map | Job | setMapperClass | TaxiProbeAnalysisMapper
5 | Map output Key | Job | setMapOutputKeyClass | TaxiProbeAnalysisKeyWritableComparable
6 | Map output Value | Job | setMapOutputValueClass | TaxiProbeAnalysisValueWritable
7 | Shuffle | Job | setPartitionerClass | TaxiProbeAnalysisPartitioner
8 | Reduce | Job | setReducerClass | TaxiProbeAnalysisReducer
9 | Output Key | Job | setOutputKeyClass | NullWritable
10 | Output Value | Job | setOutputValueClass | ProbeAnalysisInfo
11 | Output path | TextOutputFormat | setOutputPath |
12 | Reduce task count | Job | setNumReduceTasks | (number of Reduce tasks)

Map Reduce Configuration set()


Key Value Map Reduce
Key Map Reduce
Key Map
Reduce Key Key

MapReduce Job waitForCompletion()


MapReduce
waitForCompletion()


true MapReduce
2.5
Hadoop
MapReduce
Hadoop Map Reduce

MapReduce

Map Reduce

Hadoop MapReduce

Hadoop

Hadoop

MapReduce Hadoop

MapReduce

2-49

2 Hadoop

Hadoop
Hadoop
MapReduce

MapReduce

MapReduce

3.1 Hadoop
2 Hadoop

Hadoop

3.1.1

MapReduce 3 3-1 3
3-2
3-1 3-2 24 48
Hadoop

3-1 Hadoop
No.

24

24

48

48

93

3-1

3-2
No.

CPU

S1

Core 2 Duo T9400

2GB

SATA

48

24 , 48 ,

2.53GHz 2
2

S2

Xeon E5504

250GB2
6GB

SAS

2GHz 4
3

S3
S4

Xeon 5148

2GB

SAS

S5

Xeon X5460

6GB

SAS

16

Xeon E5345

146GB2
8GB

SAS

2.33GHz 4 2

3.1.2

72GB2

3.16GHz 4
5

17

300GB2

2.33GHz 2
4

146GB2

MapReduce Hadoop

MapReduce Hadoop
Hadoop MapReduce 3-1 2

Map

Reduce

Map

Map

Reduce

3-1 Hadoop MapReduce


(1) Map : Map
(2) Reduce : Key Reduce


MapReduce
Map Reduce Hadoop
9
2


Hadoop MapReduce JavaVM


MapReduce
MapReduce

RAID-0 1

3.1.3

Hadoop
Hadoop

Map Reduce
Map Reduce ()
CPU
9 Map Reduce
Map 11.5
Reduce 1

Map Reduce
Hadoop HDFSMapReduce


9
HDFS

HDFS

MapReduce
HDFS MapReduce
MapReduce

Hadoop
3.1.2 Hadoop
Hadoop


Map Reduce JavaVM

2
JavaVM
Map Reduce
JavaVM
Map Reduce TaskTracker

MapReduce
Map Reduce
JavaVM
9 3-2
200MB to 450MB JavaVM


Hadoop 3-2 2
OS 2 1
1
Hadoop OS
60GB 1 3-2
S3 GB Hadoop
MapReduce Map Reduce
S3 RAID-0
2 1 Hadoop

Hadoop 3-3
3-3 Hadoop
No.

Map

Reduce

S1

S2

S3

RAID-0 2
1

S4

S5

JavaVM 200MB to 450MB

3.2 MapReduce
MapReduce Map Reduce

3.2.1

2
MapReduce

3-4


3-4 MapReduce
No.

MapReduce

Map

Reduce

No.1,No.2

MapReduce

100,000
MapReduce Map Reduce 1
3-5
3-5
No.
1

Map

Reduce

Map

Reduce

Map

Reduce

670

25

141

19

MapReduce Map
MapReduce Reduce
Map Reduce


MapReduce Map
Map
MapReduce Reduce
Reduce

3.2.2 Map
Map

Map Map

Map 9 Map
CPU
CPU Map 9 Map
30 Map
3-5 HDFS Map

Map Map

3.2.3 Reduce
Reduce MapReduce

MapReduce Reduce
9 Reduce
Reduce JavaVM
Reduce
MapReduce MapReduce
Reduce JavaVM
Reduce

3.3 MapReduce
MapReduce
GB


3.3.1
( 10MB GB )MapReduce

Map : Map
Reduce : Reduce
Map :
Reduce : Reduce

Map : Map
Reduce : Reduce
Map : Map
Reduce : Reduce

3.3.2 MapReduce
MapReduce
MapReduce

Map
Map : Map
Reduce : Reduce
Map : Map
Reduce : Reduce

Map
Map Map
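[measured] time per Map wave = [measured] Map time ÷ ([measured] Map task count ÷ [measured] Map parallelism)
[estimated] Map time = [measured] time per Map wave × [target] Map task count ÷ [target] Map parallelism
[measured]/[target] input ratio per Map task = ([measured] Map input ÷ [measured] Map task count) ÷ ([target] Map input ÷ [target] Map task count)
Map time = [estimated] Map time ÷ [measured]/[target] input ratio per Map task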

Reduce
Reduce
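[target] Reduce input = [target] Map input × ([measured] Reduce input ÷ [measured] Map input)
Reduce
[measured] time per Reduce wave = [measured] Reduce time ÷ ([measured] Reduce task count ÷ [measured] Reduce parallelism)
[measured] input per Reduce task = [measured] Reduce input ÷ [measured] Reduce task count
[target] input per Reduce task = [target] Reduce input ÷ [target] Reduce task count
[estimated] time per Reduce wave = [measured] time per Reduce wave × [target] input per Reduce task ÷ [measured] input per Reduce task
Reduce time = [estimated] time per Reduce wave × [target] Reduce task count ÷ [target] Reduce parallelism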

MapReduce
Hadoop MapReduce Reduce
Map
Reduce Map ( 5%)MapReduce

MapReduce time = Map time × fraction of Map completed when Reduce starts + Reduce time

3.3.3 MapReduce
MapReduce
MapReduce

1 MapReduce

1 MapReduce

1 ( 5GB)
24

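[measured] Map time : 74 (s)
[measured] Reduce time : 896 (s)
[measured] Map input (1 day) : 5.82×10^9 Byte
[measured] Reduce input : 7.27×10^9 Byte
[measured] Map task count : 87
[measured] Reduce task count : 260
[measured] Map parallelism (24 nodes) : 48
[measured] Reduce parallelism (24 nodes) : 48

[target] Map input (1 month) : 1.86×10^11 Byte
[target] Map task count : 2782
[target] Reduce task count : 1300
[target] Map parallelism : 260
[target] Reduce parallelism : 260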

Map
Map
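[measured] time per Map wave = [measured] Map time ÷ ([measured] Map task count ÷ [measured] Map parallelism)
= 74 ÷ (87 ÷ 48)
= 40.828 (s)
[estimated] Map time = [measured] time per Map wave × [target] Map task count ÷ [target] Map parallelism
= 40.828 × 2782 ÷ 260
= 436.86 (s)
[measured]/[target] input ratio per Map task = ([measured] Map input ÷ [measured] Map task count) ÷ ([target] Map input ÷ [target] Map task count)
= (5.82×10^9 ÷ 87) ÷ (1.86×10^11 ÷ 2782)
= 1.000026
Map time = [estimated] Map time ÷ [measured]/[target] input ratio per Map task
= 436.86 ÷ 1.000026
= 436.84 (s)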

Reduce
Reduce

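[target] Reduce input = [target] Map input × ([measured] Reduce input ÷ [measured] Map input)
= 1.86×10^11 × (7.27×10^9 ÷ 5.82×10^9)
= 2.32×10^11 (Byte)
Reduce
[measured] time per Reduce wave = [measured] Reduce time ÷ ([measured] Reduce task count ÷ [measured] Reduce parallelism)
= 896 ÷ (260 ÷ 48)
= 165.41 (s)
[measured] input per Reduce task = [measured] Reduce input ÷ [measured] Reduce task count
= 7.27×10^9 ÷ 260
= 2.79×10^7 (Byte)
[target] input per Reduce task = [target] Reduce input ÷ [target] Reduce task count
= 2.32×10^11 ÷ 1300
= 1.79×10^8 (Byte)
[estimated] time per Reduce wave = [measured] time per Reduce wave × [target] input per Reduce task ÷ [measured] input per Reduce task
= 165.41 × 1.79×10^8 ÷ 2.79×10^7
= 1057.87 (s)
Reduce time = [estimated] time per Reduce wave × [target] Reduce task count ÷ [target] Reduce parallelism
= 1057.87 × 1300 ÷ 260
= 5289.35 (s)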

MapReduce
Reduce starts when Map is 5% complete.
MapReduce time = Map time × fraction of Map completed when Reduce starts + Reduce time
= 436.84 × 0.05 + 5289.35
= 5311.19 (s)

Measured Map time : 492 s (error: 13%)
Measured Reduce time : 5812 s (error: 10%)
Measured MapReduce time : 5841 s (error: 10%)
MapReduce 10%
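
The estimation steps of Sections 3.3.2 and 3.3.3 can be sketched in plain Java as follows. This is only an illustration under stated assumptions: the class and method names are hypothetical, the unit is taken to be seconds, and the inputs are the rounded figures quoted in the worked example, so the results differ from the document's 436.84 / 5289.35 / 5311.19 by a few seconds.

```java
/** Sketch of the Section 3.3 estimation: scale a small measured run up to a target run. */
final class MapReduceEstimate {

    /** Time of one wave of tasks: phase time divided by (task count / parallelism). */
    static double timePerWave(double phaseSeconds, double taskCount, double parallelism) {
        return phaseSeconds / (taskCount / parallelism);
    }

    /** Map time of the target run, corrected by the per-task input ratio. */
    static double mapSeconds(double measuredSeconds, double measuredInput, double measuredTasks,
                             double measuredParallelism, double targetInput, double targetTasks,
                             double targetParallelism) {
        double uncorrected = timePerWave(measuredSeconds, measuredTasks, measuredParallelism)
                * targetTasks / targetParallelism;
        double perTaskInputRatio = (measuredInput / measuredTasks) / (targetInput / targetTasks);
        return uncorrected / perTaskInputRatio;
    }

    /** Reduce time of the target run, scaled by the input handled per Reduce task. */
    static double reduceSeconds(double measuredSeconds, double measuredMapInput,
                                double measuredReduceInput, double measuredTasks,
                                double measuredParallelism, double targetMapInput,
                                double targetTasks, double targetParallelism) {
        double targetReduceInput = targetMapInput * (measuredReduceInput / measuredMapInput);
        double measuredPerTask = measuredReduceInput / measuredTasks;
        double targetPerTask = targetReduceInput / targetTasks;
        double estimatedWave = timePerWave(measuredSeconds, measuredTasks, measuredParallelism)
                * targetPerTask / measuredPerTask;
        return estimatedWave * targetTasks / targetParallelism;
    }

    /** Whole job: Reduce is assumed to start once the given fraction of Map has finished. */
    static double totalSeconds(double mapTime, double reduceTime, double reduceStartFraction) {
        return mapTime * reduceStartFraction + reduceTime;
    }

    public static void main(String[] args) {
        double map = mapSeconds(74, 5.82e9, 87, 48, 1.86e11, 2782, 260);
        double reduce = reduceSeconds(896, 5.82e9, 7.27e9, 260, 48, 1.86e11, 1300, 260);
        System.out.printf("Map %.1f s, Reduce %.1f s, total %.1f s%n",
                map, reduce, totalSeconds(map, reduce, 0.05));
    }
}
```

Keeping each step as its own method mirrors the order of the hand calculation, which makes it easy to substitute a different measured run or a different Reduce start fraction.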


3.4
2 3-1

105
388

3.4.1
Hadoop 3-6 3


3-6 Hadoop
No.

24

24

48

48

93

1
9

93 Map Reduce 260


1Map 640KB
Reduce 1300
JavaVM 400MB

[Figure 3-2: vertical axis 0 to 8000, horizontal axis 0 to 100; data labels 7537, 3856, 2285, 1744, 1230, 634]

3.4.2

9 30 days (1 month), 90 days (3 months), 365 days (1 year)

93 Map Reduce 260


JavaVM 400MB

[Figure 3-3: vertical axis 0 to 80000, horizontal axis 0 to 400; data labels 1744, 5841, 16564, 74604]

3.4.3

1
9
2
93 Map Reduce 260
1Map 640KB
Reduce 1300
JavaVM 400MB

[Figure 3-4: vertical axis 0 to 2000; data labels 1744, 902, 634, 281; horizontal axis categories 105 and 388]

3.5
2 MapReduce

MapReduce

MapReduce
MapReduce
1.9

24
10%


Hadoop

Hadoop 10

4.1
Hadoop

4.1.1
2 5
5

1
5


Hadoop

4.1.2 Hadoop
1.2.2.2
Hadoop

Hadoop 2

HDFS
MapReduce

4.1.2.1 HDFS
HDFS 4-1 HDFS
DataNode NameNode DataNode

4-1

DataNode DataNode
RackAwareness

NameNode HDFS

Client

NameNode

Rack
SW

SW

SW

DataNodes

4-1 HDFS
HDFS 2


SecondaryNameNode
4-2 SecondaryNameNode
SecondaryNameNode NameNode

4-2

NameNode

SecondaryNameNode

NameNode

editFSImage

edits

fsimage

fsimage

edits

edits.new

edit

fsimage.chkpt

FSImage

fsimage.chkpt
edit
FSImage
fsimage

edits

4-2 SecondaryNameNode
Hadoop

4.1.2.2 Hadoop
MapReduce 4-3
MapReduce TaskTracker
JobTracker
TaskTracker TaskTracker
JobTracker TaskTracker
MapReduce Hadoop

4-3

MAP

SHUFFLE

REDUCE

4-3 MapReduce
4.1.2.3 Hadoop
Hadoop
(1) HDFS
HDFS 4-1
4-1 HDFS
No.

DataNode

10
()
Hadoop

2
3

DataNode

DataNode

Hadoop

2 × heartbeat.recheck.interval (300 s) + 10 × dfs.heartbeat.interval (3 s) = 630 s
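
As a small worked example of the DataNode failure-detection time quoted above, the following sketch evaluates 2 × heartbeat.recheck.interval + 10 × dfs.heartbeat.interval. The class and method names are hypothetical, and both intervals are assumed to be given in seconds, as in the text.

```java
/** Sketch: time after which a silent DataNode is treated as failed. */
final class DataNodeTimeout {

    /** 2 x recheck interval + 10 x heartbeat interval, all in seconds. */
    static long deadAfterSeconds(long recheckIntervalSeconds, long heartbeatIntervalSeconds) {
        return 2 * recheckIntervalSeconds + 10 * heartbeatIntervalSeconds;
    }

    public static void main(String[] args) {
        long seconds = deadAfterSeconds(300, 3);
        // 2 x 300 + 10 x 3 = 630 seconds, i.e. 10 minutes 30 seconds
        System.out.println(seconds + " s = " + (seconds / 60) + " min " + (seconds % 60) + " s");
    }
}
```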


(2) MapReduce
MapReduce 4-2 MapReduce
4-4

Hadoop Terasort
4-2 MapReduce
No.

Map

TaskTracker

TaskTracker

Map Map

TaskTracker

Reduce

TaskTracker

TaskTracker

Reduce Reduce

TaskTracker

Hadoop Hadoop

Reduce
10
4-4
MapReduce
Map Reduce Hadoop
Shuffle Reduce

JobTracker Hadoop Reduce


Map Shuffle
Reduce Shuffle Map

4-5

4-4 MapReduce Shuffle


JobTracker Shuffle Map

shuffle

Failed fetch notification #1 for task attempt_201001130758_1351_m_000053_0


Failed fetch notification #2 for task attempt_201001130758_1351_m_000053_0
Failed fetch notification #3 for task attempt_201001130758_1351_m_000053_0
Too many fetch-failures for output of task: attempt_201001130758_1351_m_000053_0 ... killing it
Error from attempt_201001130758_1351_m_000053_0: Too many fetch-failures
Choosing a non-local task task_201001130758_1351_m_000053
Adding task 'attempt_201001130758_1351_m_000053_1' to tip task_201001130758_1351_m_000053, for
tracker 'tracker_r7-2-0-16.example.net:localhost.localdomain/127.0.0.1:44354'
map

4.1.3 Hadoop
Hadoop
4-5 4-3

4-6

Job

L3

Hadoop (DataNode/TaskTracker)

L2

L2

L2

L2

L2

NameNode

Hadoop 100
JobTracker

Core2 Duo
40

Xeon QuadCore XeonQuadCore Xeon DualCore


12
18
16

Hadoop

Core2 Duo
10

4-5
4-3
No.

L3

L2

Hadoop

Hadoop

HDFS

Hadoop

MapReduce

JobTracker

Hadoop

HDFS

NameNode
4
5

MapReduce

DHCP/DNS

4-7


L3 Hadoop

L2 Hadoop
MapReduce

4.1.4 L3
L3
4-4
4-4 L3
No.

Hadoop - Hadoop

LAN

4.1.5 Hadoop
Hadoop HDFS DataNode
Hadoop
Hadoop MapReduce
2
4-5

4-5 Hadoop
No.

CPU

MapReduce

MapReduce

4-8

No.

3
4

HDD

MapReduce

RAID1

Hadoop

bonding

NIC

MapReduce

OS

7
8

MapReduce

OS

NameNode

HDFS

JobTracker

MapReduce

MapReduce

4.1.6 Hadoop
Hadoop Hadoop

4-6 Hadoop Hadoop

Hadoop
HDFS 3
4-6 Hadoop
No.

HDFS

Hadoop
Hadoop

MapReduce

Hadoop MapReduce

4-9

4.1.7 L2
L2 Hadoop
Hadoop L2
Hadoop



MapReduce

2 Hadoop
6 L2
1/6

4.2

Hadoop

4.2.1

Hadoop

4.1.4 L3 4-6 L3
L2 L3

4-10

L3

L2-L3

L3

L2

4-6

L3
L3 4-7 VRRP
L3

4-8

VRRP(Active)

VRRP(Standby)

L3
L2

4-7 VRRP

4-11

4-8 L3
No.

Active/Standby

VRRP

10

OS

L2
L3 L2 L2
(STP)

4-7
L3

L2

4-7

4.2.2

4-9
4-9
No.

L3 ()

4-12

No.

L3 (

)
3

L2

L3

L2

1 Hadoop

4.2.3 Hadoop
4.1.3 Hadoop

HA

HA Heartbeat
DRBD HA
4-8

LAN
NameNode

NameNode

heartbeat
DRBD

heartbeat
DRBD

NameNode()

NameNode()

edits

Heartbeat/ LAN

4-8 HA

4-13

edits

NameNode 4-10
SecondaryNameNode NameNode
SecondaryNameNode JobTracker
4-10
No.

NFS

NFS

SecondaryNameNode

JobTracker

4.2.4 Hadoop
Hadoop
4-11
4-11 Hadoop
No.

()

bonding

()

()

STONITH

()

NameNode

NameNode

4-14

No.

10

JobTracker

JobTracker

NameNode
Heartbeat 2 NameNode
Safemode HDFS
3

JobTracker
Heartbeat

Heartbeat
Hadoop 3
JobTracker MapReduce

HA

4.3 Hadoop
4.2.4 Hadoop HA

NameNode Safemode
JobClient
FT

4.3.1 FT
FT

4.3.1.1 FT
FT 2

FT 4-9

OS

OS

OS

OS

CPU

CPU

CPU

LAN

LAN

CPU

LAN

LAN

4-9 FT

FT Kemari Kemari
FT I/O

Kemari 4-10 4-12


FT CPU Kemari
Kemari

LAN

OS

OS

LAN

4-10 Kemari


4-12 Kemari
No.

FT

OS CPU

4.3.1.2 FT
Kemari Hadoop

Hadoop
Hadoop

Hadoop
Hadoop

(1) NameNode
HDFS HDFS
1 200byte
8GB HDFS 40TB

NameNode
(2) JobTracker
JobTracker MapReduce
TaskTracker

Hadoop Kemari


FT

4.3.2

Kemari

Hadoop FT
4-13
4-13 FT
No.

()

bonding

()

STONITH

()

()
Hadoop 1
Hadoop

Kemari 4-14 Kemari


/Kemari
4-14 Kemari
No.

NNBench

Kemari

268tps

Map :1

Kemari

30tps

:5000

Kemari

17.5

:10GB

Kemari

16

2
3
4

Terasort


NNBench NameNode 1/10


Terasort MapReduce
Terasort TaskTracker
Hadoop

4.4

4.4.1 Hadoop
Hadoop

Hadoop HA

FT Kemari

4.4.2
Hadoop Hadoop
Hadoop0.21.0 SecondaryNameNode StandbyNode
NameNode
4-11
Hadoop
DRBD


Active
NameNode

SNN Registration
RPC: NameNodeProtocol

Standby
NameNode

SNN Edits Stream


RPC: NameNodeProtocol

Journal
Spool

Edit OutputStream

Edit OutputStream

Image transfer channel


HTTP:NameNodeServlet

4-11 StandbyNameNode


Hadoop

Hadoop

11 12 13

5.1
Hadoop
Hadoop

5.1.1
5-1 Hadoop

5-1
No.

OS
Hadoop

5-1

5.1.2

Hadoop

Hadoop

Hadoop

5.1.3

Hadoop

5-1

5-2

5-1
11 12 13

5-1
5-2
No

5.1.4
Hadoop
Hadoop

5-3

5.1.5

Hadoop

CPU

5.1.6

5.1.1

5.1.2

11 12 13

5.1.3, 5.1.4, 5.1.5
5-3
5-3
No

5.3.1.3

5.3.1.4

5.3.1.5

5.3.2.3

5-4

No

5.3.2.4

10

5.3.2.5

5.3.3.3

11

12

13

14

5.3.3.4

15

5.3.3.5

5.3.4.3

16

17

18

19

5.3.4.4

20

5.3.4.5

5.2

5.2.1
5-2

Job

L3
L3

Hadoop (DataNode/TaskTracker)
L2

L2

L2

L2

L2

NameNode
Namenode

Hadoop 100
JobTracker
JobTracker

5-2

5-5

Core2 Duo
40

Xeon QuadCore XeonQuadCore Xeon DualCore


12
18
16

Hadoop

Core2 Duo
10

5.3

Hadoop
5-3
Hadoop

5.3.1
Hadoop Hadoop

5.3.1.1
Hadoop 96
Hadoop
12 5-1
Hadoop 96

Kickstart

DHCP/TFTP/HTTP

DHCP/DNS

96

5-3 11

5.3.1.2
96 90

10 Hadoop

5-4 CPU

5-6

5-4 CPU

5.3.1.3
96

10
4 380

CPU 25%
4 400 Hadoop
HTTP

CPU 5%
20 2000 Hadoop

Puppet

5-4

5-7

5-4
No.

10

4 380

90 96
HTTP CPU
90 400

Hadoop

5.3.1.4
Hadoop

5.3.1.5
96 Hadoop 5-5
5

5-5
No. | Vendor | CPU | Memory | HDD
1 | HP | Xeon QuadCore/2.33GHz x 2 | 8GB | SAS 146GB x 2
2 | HP | Xeon QuadCore/3.16GHz | 6GB | SAS 146GB x 2
3 | HP | Xeon DualCore/2.33GHz | 2GB | SAS 72GB x 2
4 | HP | Xeon QuadCore/2GHz | 6GB | SAS 300GB x 2
5 | NEC | Core2 Duo T9400 | 2GB | SATA 250GB x 2

5-8

CPU

Hadoop

5.3.2
Hadoop Hadoop

5.3.2.1
Hadoop

11 Ganglia

12 Ganglia
5-5

gmond
gmond
gmond

gmond
gmond

gmond

gmond
gmond

gmond

gmond
gmond
gmond

5-5 Ganglia

5.3.2.2
Ganglia Hadoop
CPU Hadoop

5-9

5-6 4 r4-1-0-01
4

5-6 Ganglia
4
5-7 Ganglia

5-7 Ganglia
5-8 Hadoop Ganglia

1.

2.

3.Heap

4.swap-inout

5.

6.

5-8 Ganglia Hadoop

5-10

5.3.2.3
96

Web

1 Hadoop
100 Hadoop
CPU 35%
300

CPU 10%
10 1000
0.25%WAIT CPU
WAIT CPU 10% CPU
90 / 0.25 = 360 CPU Idle
13
5-6
No.

Web

100 1
CPU
300

5.3.2.4
Hadoop
Hadoop

Hadoop
3
5-11

5.3.2.5

5.3.3
Hadoop Hadoop Hadoop

5.3.3.1

Hadoop
11
Puppet
5-9

Hadoop

OS

(puppetrun)

Puppet

-Hadoop NameNode

-Hadoop DataNode
-Hadoop

Ganglia

-Hadoop

CPU/

5-9 Puppet

5.3.3.2
Puppet Hadoop
1
Hadoop Puppet facter
Hadoop

5-12

5.3.3.3
Hadoop

Hadoop

3 1 Hadoop

1 10
CPU 1
10
5-7
5-7
No.

100

100 3

5.3.3.4
Hadoop

Hadoop

5.3.3.5
96 Hadoop 5-5

Puppet facter


5.3.4

Hadoop

5.3.4.1
3 70

5.3.4.2
5-10

[Figure: vertical axis 0 to 30; horizontal axis elapsed time from 0:00 to 1:40; series labels "SAS 46" and "SATA 24"]

5-10
2 Hadoop
CPU 5-11


5-11 CPU

11
5-8
No.

11/27

HDD RAID

OS

BIOS

12/1 RAID

PhysicalVolume

12/13

HDD RAID

OS

BIOS

12/15 HDDHDD

PhysicalVolume

1/ 26

swap


No.

end_request I/O error dev, sda,


sector NNNNN
OS fsck

1/ 28

OS

5.3.4.3

10
1 5
2 70
90
5-9
No.

70 1

40
30

5
2

5.3.4.4
5-10 5-11
5-10
No..

70


10

91

No..

HDFS

5-11
No.

32

OS
2

OS

Ganglia,

OS

Hadoop

5.3.4.5
Hadoop 5-5

Puppet

5.4
Hadoop


5.4.1
Hadoop

5-12
Hadoop

5-12
No.

96 Hadoop

380

90 96

100 1

Hadoop


No.

10

11

100

12

13

100 3

14

15

Puppet facter

16

70 1

17

18

5 2

19

20


5.4.2


5 Hadoop
MapReduce

6.1

6.1.1
2 5 MapReduce Hadoop
Hadoop

2 3

4 Hadoop
FT
Kemari
Hadoop
6-1

(1 )Kemari

5 Hadoop

6.1.2

6.1.2.1
5
5
0 5 10 55

5
5
2

6-1

6-2

6-1

()

No.
1

22

(:

730 )

25

900

(:

30000 )

93

900

(:

30000 )

()
3
100
2
3

6.1.2.2
1

6.1.2.1
( 10 )

2
6-2

6-3

6-2

93

()

No.
1

2.1TB

140

6-4

6.2
2 5 6-1

Hadoop/
Job
L3
L2

NameNode
Namenode

JobTracker
JobTracker

r2

L2

L2

L2

L2

L2

L2

Hadoop
(DataNode/TaskTracker)

r6
10

r5
18

r4
16

r3
12

r7
40

6-1

6.3 MapReduce
2 6-1

6-2

6-5

6.3.1

6-1

6-3

6-3

()

No.
1

22

(:

730 )

25

900

(:

30000 )

93

900

(:

30000 )

()

6-6

6.3.1.1
()

6-4
6-4 1
No.

122

()
(300 ) 122

6-2

6-2

6-7

6.3.1.2

900

( 1 )

25 ()
6-5
6-5 2
No.

1,223

25

240

()
1 1223
240

6-6
6-6
No.

()

()

22 ( 1 )

12,215

900 ( 2 )

295,745

24

6-8

6-3 6-4

6-3 1

6-4 2
6-9

6.3.1.3

25 2
93 ( 3 ) 6-7

6-7 3
No.

25

548

93

221

()
( 25 )548

221

3
6-8
6-8
No.

( 2 )

295,745

( 3 )

457,816

6-10

154%

6-5 6-6

6-5 2

6-11

6-6 3

6-12

6.3.2

6-2

9.6 MapReduce

9.6.2 MapReduce
6-9
6-9

93

()

No.
1

2.1TB

140

6.3.2.1

9.6.2 MapReduce

3.5

9.6.2
6-10

6-13

6-10
No.

Map

546

492

119

Reduce

572

5349

128

Map

850MB

173GB

7.8GB

Reduce

750MB

217GB

21GB

Map

1386

2782

2600

Reduce

1300

1300

50

Map

260

260

260

Reduce

260

260

260

Hadoop
MapReduce
6-11
6-11
No.

4.6GB

1.9TB

85GB

Map

3900

31840

2600

Reduce

1300

1300

100

Map

260

260

260

Reduce

260

260

260

6-11 9.6.2 MapReduce

9.6

MapReduce

9.6.2 MapReduce
6-12

6-14

6-12MapReduce
No.

Map

Reduce

Total

51 2

46 37

56 11

1 33 48

16 59 43

17 25

21 37

41 26

42 31

6.3.2.2
6-9 1 1

MapReduce

6-13
6-13
No.

93

21 53 10

10
MapReduce

10
Hadoop Map Map
HDFS
Map
Map
HDFS Map

6-15

6-14
6-14
No.

(-)

43 45

51 2

-14.5%

44 8

46 37

-17.7%

46 41

56 11

-16.9%

21.5%

Map

54 1

33 48

20

16

Reduce

40 11

59 43

20

17

Total

43 24

25

11 2

21 37

-48.9%

20 51

41 26

-49.6%

23 5

42 31

-45.7%

21 53

18 43 7

16.9%

10

Map

Reduce

Total

5
6
7

21.6%
21.3%

Map
8

Reduce

Total

10

Total

16.9%9
MapReduce
MapReduce

Map Reduce -14.5%


-17.7%9 30%
Map Reduce
6-16

10

Map Reduce

Map 1.7%

6.4
4

FT Kemari

(1)
(2)

(3) Kemari
(4) Kemari

(1) (6.4.1)
(2) (6.4.2)
(3) (JobTracker/NameNode) (6.4.3)
(4) L2 (6.4.4)

6.4.1

6-17

6.4.2

1 (r4-1-0-01)

Reduce

6-15
(r5-1-0-12)
Map
6-15 Reduce
No

r4-1-0-01

11:10:22

35

r5-1-0-12

6.4.3

11:11:41

10 9

10 21

FT

JobTracker NameNode
FT Gratuitous ARP

6.4.3.1 JobTracker
JobTracker
4 JobTracker
Hadoop
6-16
6-16 MapReduce
No

Map

184

Reduce

50

6.4.3.2 NameNode
NameNode
2 NameNode
Hadoop
6-17
6-18

6-17 MapReduce
No

Map

184

Reduce

268

FT

6.4.4

1
(r6) 10

1 6-18
6.4.1
1

6-18
No

11:17:01

1 46

11:21:21

20 1

11:41:24

28

Hadoop CPU HDFS


6-7 10
15 Hadoop I/O

6-19

HDFS
I/O

6-7 1 CPU

6.4.5 Kemari
FT Kemari

I/O Kemari
6-1
Kemari 6-19
6-19 Kemari
No.

Kemari

180

256

29%

25

249

485

48%

93

258

553

53%

6-20

Kemari

Kemari 6-8
4
40Mbps

25

93

6-8

6.5
5 6-1

(1)
(2)

6-21

6.5.1

12

6.5.1.1

Hadoop 25

100

6.5.1.2

6-20
LAN

6-20
No

70

10

96

70

5 (

TaskTracker

Hadoop HDFS
6-22

No

1,2

6-9

[Figure: vertical axis 0 to 30; horizontal axis elapsed time from 0:00 to 1:40; series labels "SAS 46" and "SATA 24"]

6-9

6-21

6-23

6-21 tiobench I/O


No.

CPU

(Mbyte/s1 )
seq read / seq write

1
2

SAS (

2.33GHz2 8

SAS 146GB

SATA

Core2 Duo T9400

SATA 250GB

()

2.53GHz 2

60.939 / 62.358
95.419/ 26.693

6-10

JobClient

6-10 CPU

6-24

Ganglia IO

JobClient

6.5.1.3

6-22
No

233 ( 8 )

149 ( 19 )

98

( 9 )

CPU
CPU

19

6-11 Hadoop CPU


6-25

4
Ganglia 4

OS

6-12 4 CPU
OS
Hadoop

6.5.1.4
Hadoop JobClient

Hadoop

6-13


JobClient

TaskTracker

TaskTracker

TaskTracker

TaskTracker

TaskTracker

6-13

TaskTracker TaskTracker JobClient


TaskTracker
rsync

6-10 CPU

JobClient CPU


CPU
Linux rsync
rsync SSH
CPU Hadoop

RSH
CPU
5GB

2GB
JobClient CPU

6.5.2
Hadoop
5 10

6.5.2.1
6-23

6-23
No.

r4-1-0-09

Hadoop OS

6.5.2.2
11
6-24
6-24
No.

16:50

r4-1-0-09

16:52

Ganglia ( 6-14)

16:55

( 6-15)

17:00


No.

32

17:02

17:03

OS

17:35

Ganglia,

OS

Hadoop

6.5.2.3
6-24 No2
Ganglia 6-14

6-14 Ganglia
Ganglia


Ganglia 6-15
4
4

6-15
Nagios, Ganglia

Ganglia

6-16

6-16


Ganglia

6.5.2.4

6-25
6-25
No

85.6 ( 4 )

84.5 ( 2 )

86.3 ( 7 )

86.8 ( 5 )

() 5

6.6


FT Kemari Hadoop
Hadoop
Kemari

Hadoop
Hadoop

Hadoop 10 2,3
8 (3 25 )
5 1(1223 240 )
Hadoop

Hadoop

FT Kemari 93 53%
FT


100

MapReduce

MapReduce
Map Reduce

Map Reduce


Map Reduce
Map Reduce
Hadoop
Map Reduce CPU
GB
MapReduce GB
MapReduce

Hadoop

Hadoop Hadoop
Hadoop Hadoop L3


Hadoop HA FT
FT Kemari
Hadoop
HA
Kemari
Hadoop 3 3 100

Hadoop

Hadoop
Hadoop
Kickstart Puppet
100 90

Hadoop

Ganglia
100 Hadoop

PC 400

Hadoop 10 2,3
8 (3 25 )
5 1(1223 240 )
Hadoop

Hadoop

FT Kemari 93 53%
FT


8 MapReduce

MapReduce
MapReduce
MapReduce MapReduce
Hadoop
MapReduce PageRank
Hadoop MapReduce Hadoop
MapReduce

Hadoop MapReduce Pig Hive

8.1 MapReduce
MapReduce

8.1.1 MapReduce
MapReduce Google MapReduce

MapReduce KeyValue MapReduce


8-1 3
(1) Map
KeyValue
(2) Shuffle
Map KeyValue Key
Reduce
(3) Reduce
Key KeyValue
MapReduce Map Reduce


Map

Shuffle
<key1, Value1>
<key2, Value2>
...

Reduce
<key1, Value1>
<key1, Value3>
...
<key2, Value2>

<key1, Value3>
<key3, Value4>
...

<key3, Value4>
...

<Key, Value>
Key <Key, Value>
Map,Reduceworker

8-1 MapReduce

8.1.2 MapReduce
MapReduce

MapReduce


Map Reduce


Map Reduce

8.2 MapReduce
MapReduce


Map Reduce Key


Value MapReduce

8.2.1 MapReduce
MapReduce 3
(1)

(2)
(1)

(3) Map Reduce


Reduce
Map
MapReduce Map
Reduce

MapReduce
MapReduce

MapReduce
MapReduce
MapReduce
MapReduce

Key
MapReduce KeyValue Key
Key MapReduce

8.2.2 MapReduce
MapReduce 2
(1) Web

Web 8-2
MapReduce Web Map <
, 1>Shuffle
Reduce <, 1><, >

Welcome to My HomePage.
Thank you.
Where is your house? ....

Web

Map

<homepage, 1>
<homepage, 1>
<house, 1>

<welcome, 1>
<homepage, 1>
<you, 1>
<go, 1>
<where, 1>
<your, 1>
<house, 1>
<homepage, 1>

<, 1>

Shuffle

<welcome, 1>
<welcome, 1>
<where, 1>

<you, 1>
<your, 1>
<your, 1>

Reduce

<go, 2>
<homepage, 10>
<house, 3>
<welcome, 8>
<where, 7>
<you, 4>
<your, 5>

8-2 Web
MapReduce
Web :
: Web
(2)

MapReduce 8-3 Map


1
Reduce


abc@example.com hello...
def@example.net adadafa

<abc@example.com, 10>
<def@example.net, 200>
<ghi@example.org, 0>
<aaa@example.jp, 0>
<abc@example.com, 0>
<def@example.net, 100>
<def@example.net, 50>

Map

Shuffle

Reduce

<, >

<aaa@example.jp, false>
<abc@example.com, false>
<def@example.net, true>
<ghi@example.org, false>

8-3
Map Reduce
Map Reduce
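
The Web word count of example (1) can be sketched without Hadoop as three plain Java methods, one per phase. This is a minimal illustration, not the Hadoop API: the class name, the tokenization rule (lower-casing and splitting on non-letters), and the in-memory Shuffle are all assumptions made for the sketch.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

final class WordCountSketch {

    /** Map: emit <word, 1> for every word of one page. */
    static List<Map.Entry<String, Integer>> map(String page) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : page.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    /** Shuffle: collect the values of each key, with keys in sorted order. */
    static SortedMap<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        SortedMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    /** Reduce: sum the values of each key into <word, count>. */
    static SortedMap<String, Integer> reduce(SortedMap<String, List<Integer>> grouped) {
        SortedMap<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            int sum = 0;
            for (int value : entry.getValue()) {
                sum += value;
            }
            counts.put(entry.getKey(), sum);
        }
        return counts;
    }

    /** Run Map on every page, then Shuffle and Reduce over all emitted pairs. */
    static SortedMap<String, Integer> count(List<String> pages) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String page : pages) {
            pairs.addAll(map(page));
        }
        return reduce(shuffle(pairs));
    }

    public static void main(String[] args) {
        System.out.println(count(Arrays.asList(
                "Welcome to My HomePage.", "Thank you.", "Where is your house?")));
    }
}
```

In a real cluster each call to `map` runs on a different worker and the Shuffle moves the pairs over the network; the data flow is otherwise the same.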

8.2.3 PageRank MapReduce

8.2.3 MapReduce
8.2.1 8-4
PageRank MapReduce PageRank

Web Web
Web PageRank

Web Web


PageRank

PageRank
Web
(1)
(2) (1/)
(3)

Web

example.com

100.5

example.net

200.5

hogehoge.jp

300.5

fugafuga.jp

0.25

example.jp

8-4 MapReduce
PageRank
(1) Web 1 1
(2) Web 1
(1/)

(3) (2) Web PageRank


8-4 PageRank MapReduce

8.2.4 MapReduce
8.2.3 PageRank MapReduce PageRank
PageRank 8-5 3

8-5 PageRank
8-5


8-6
Web

Web
URL

8-6 PageRank
8-6 PageRank

8-7

Web
URL

Web

Web
URL

Web

8-7 PageRank
Map Reduce


Reduce
Map
PageRank
Key Reduce

Map 8-8


Reduce Map
MapReduce

Web

Web
URL

Map

Reduce

Web

Web
URL

8-8 PageRank
Map Reduce PageRank
8-9 MapReduce
: 10
My Homepage Link.
example.net : example
example.com : example

<example.net, 1/10>
<example.com, 1/10>

Web

Map

: 25
example.net Link.
My Homepage Link :
example.com : example

<mypage.html, 1/25>
<example.com, 1/25>

Shuffle

1Web
Web1

Reduce

<example.com, 100.25>
<example.net, 50.66>
<mypage.html, 0.2>

Web
PageRank

8-9 PageRank MapReduce

MapReduce
PageRank


PageRank MapReduce
(: X (X-1) )
Web
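
One pass of the simplified PageRank described in this section (each link on a page contributes 1 / number of links on that page, and a page's score is the sum of the contributions it receives) might be sketched as follows. This is a plain-Java illustration, not the report's implementation: the class and method names are hypothetical, and each page is represented only by the list of link targets it contains.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

final class SimplePageRankSketch {

    /** Map: for one page, emit <link target, 1 / number of links on the page>. */
    static List<Map.Entry<String, Double>> map(List<String> linksOnPage) {
        List<Map.Entry<String, Double>> pairs = new ArrayList<>();
        for (String target : linksOnPage) {
            pairs.add(new AbstractMap.SimpleEntry<>(target, 1.0 / linksOnPage.size()));
        }
        return pairs;
    }

    /** Shuffle and Reduce: add up the contributions received by each link target. */
    static SortedMap<String, Double> reduce(List<Map.Entry<String, Double>> pairs) {
        SortedMap<String, Double> scores = new TreeMap<>();
        for (Map.Entry<String, Double> pair : pairs) {
            scores.merge(pair.getKey(), pair.getValue(), Double::sum);
        }
        return scores;
    }

    /** Run Map over every page and Reduce over all emitted pairs. */
    static SortedMap<String, Double> score(List<List<String>> pages) {
        List<Map.Entry<String, Double>> pairs = new ArrayList<>();
        for (List<String> links : pages) {
            pairs.addAll(map(links));
        }
        return reduce(pairs);
    }

    public static void main(String[] args) {
        System.out.println(score(Arrays.asList(
                Arrays.asList("example.net", "example.com"),
                Arrays.asList("mypage.html", "example.com"))));
    }
}
```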

8.3 Hadoop MapReduce


Hadoop MapReduce
Hadoop MapReduce 8-10

(1)
Map

Reduce

Shuffle

(7)

(1) MapReduce
(2)
(3)
(4) Map
(5) Shuffle
(6) Reduce
(7)

(3)
<Key, Value>
(2)

(4)

(5)

(6)

8-10 Hadoop MapReduce


MapReduce : MapReduce
: MapReduce
: MapReduce
Map : MapReduce Map
Shuffle : Shuffle Partition
Reduce : MapReduce Reduce
: MapReduce
Hadoop MapReduce Java API
8.2.3, 8.2.4
PageRank Hadoop MapReduce


Hadoop MapReduce 2010 2


0.20.1 Hadoop
MapReduce Hadoop
(HDFS) MapReduce
Hadoop MapReduce MapReduce Map
Reduce Hadoop

8.3.1 MapReduce
Hadoop MapReduce MapReduce MapReduce
8.3.7 Map Reduce
MapReduce
MapReduce Job
(org.apache.hadoop.mapreduce.Job)
Map Reduce
Job MapReduce
Job
Job.getConfiguration().set()
MapReduce

8.3.1.1
MapReduce

(1)
FileInputFormat setInputPaths /

(2)
FileOutputFormat setOutputPath (
)

FileAlreadyExistsException MapReduce

(3)
8.3.6 8.3.7
(4) Map Reduce
8.3.2 Map
8.3.3 Reduce

8.3.1.2
MapReduce

(1) Map Reduce Java


JavaVM mapred-site.xml
MapReduce mapred.child.java.opts

( 200MB)


(2) Map Reduce
Map Reduce TaskTracker Child
Child TaskTracker

Failed
mapred.task.timeout
(
60000 )
(3) Map Reduce
OS

mapred.child.ulimit
OS

8.3.1.3
MapReduce


(1) Reduce
Job setNumReduceTasks
Reduce
0 Map Hadoop Reduce

Reduce Hadoop

Reduce < Hadoop

Hadoop
Reduce >= Reduce

(2) Map
Map Map

Map < Hadoop

mapred.max.split.sizeMap 1
1 Map

Map
16MB(16777216)
Job job = new Job(); // create the job
job.getConfiguration().set("mapred.max.split.size", "16777216"); // at most 16MB per Map input split
Hadoop
Hadoop
Map >=

(3) MapReduce
HDFS
dfs.replication
HDFS 3 HDFS
MapReduce
3


8.3.1.4
MapReduce Map
Job.getConfiguration().set()MapReduce
MapReduce
Map

8.3.1.5 MapReduce
MapReduce

MapReduce
MapReduce
org.apache.hadoop.conf.Configured
org.apache.hadoop.util.Tool
main : run
run : MapReduce
MapReduce : setJobName
: setInputFormatClass
: setOutputFormatClass
Map : setMapperClass
Reduce : setReducerClass
: FileInputFormat.setInputPaths (HDFS
)
: FileOutputFormat.setOutputPath(HDFS
)
Key : setOutputKeyClass
Value : setOutputValueClass
Map Key : setMapOutputKeyClass
Map Value : setMapOutputValueClass
configuration :
execute : MapReduce
main MapReduce run
configurationexecute


8.2.4 PageRank MapReduce

8-11 MapReduce
Hadoop MapReduce
Map Reduce

8.3.2 Map
Map Map KeyValue Map
setup Map cleanup
(: Map )

Map
Map
org.apache.hadoop.mapreduce.Mapper
setup : Map
map : Map

cleanup : Map
run : setup, map, cleanup Map
()
Map 8.3.5
(: TextInputFormat Key
LongWritable Value Text ) KeyValue
8.3.1.5 KeyValue
Map KeyValue
8.2.4 PageRank Map PageRank
Web
Map
8-12
PageRank Map


8-12 Map
PageRankMapper Map setup Key
Web countTotalLinks
map
Map
Hadoop

Map

Hadoop Map Failed


8.3.3 Reduce
8.2 Reduce Hadoop Reduce
Reduce setup Reduce
cleanup

Reduce
Reduce
org.apache.hadoop.mapreduce.Reducer
setup : Reduce
reduce : Reduce
cleanup : Reduce
run : setup, reduce, cleanup Reduce
()
Reduce Reduce

setup cleanup Reduce

8.2.4 PageRank Reduce PageRank


Reduce
8-13 PageRank Reduce


8-13 Reduce
PageRankReducer reduce Key
Reduce Reduce
Key Iterable reduce
Reduce Map

8.3.4
Hadoop MapReduce KeyValue

(1) : Map
(2) : Map Reduce (Shuffle )
(3) : Reduce
Hadoop
Hadoop


8.3.4.1
Hadoop
()
java.util.Map

(1) (Text )
(2) (IntWritable , DoubleWritable )
(3) (BytesWritable )
(4) (BooleanWritable )
(5) (ArrayWritable , TwoDArrayWritable )
(6) Map (MapWritable )

8.3.4.2
Key Hadoop
Key Value WritableComparable
implement Value Writable
implement

org.apache.hadoop.io.Writable (Value )
org.apache.hadoop.io.WritableComparable (Key Value
)
set : (: )
get : ( : )
write : (:
)
readFields : (:
)
compareTo : (: Object
, : -1:, 0:, 1:
)WritableComparable
toString :
equals : (: Object
, : true / false)
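
The methods listed above can be illustrated with a small stand-alone class. This is a sketch only: it does not implement Hadoop's `Writable` or `WritableComparable` interfaces (so that it compiles without the Hadoop jars), the class name `WebDataSketch` is hypothetical, and the `score` field is an assumption added next to the `address` field mentioned in this chapter. It uses `java.io.DataOutput` and `java.io.DataInput`, the same stream types that `write` and `readFields` receive in Hadoop.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class WebDataSketch implements Comparable<WebDataSketch> {
    private String address = "";
    private double score; // assumed field, for illustration only

    void set(String address, double score) { this.address = address; this.score = score; }
    String getAddress() { return address; }
    double getScore() { return score; }

    /** write: serialize the fields in a fixed order. */
    void write(DataOutput out) throws IOException {
        out.writeUTF(address);
        out.writeDouble(score);
    }

    /** readFields: restore the fields in the same order as write. */
    void readFields(DataInput in) throws IOException {
        address = in.readUTF();
        score = in.readDouble();
    }

    /** compareTo: negative, zero, or positive; orders by address, then by score. */
    @Override
    public int compareTo(WebDataSketch other) {
        int byAddress = address.compareTo(other.address);
        return byAddress != 0 ? byAddress : Double.compare(score, other.score);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof WebDataSketch && compareTo((WebDataSketch) o) == 0;
    }

    @Override
    public int hashCode() { return address.hashCode() * 31 + Double.hashCode(score); }

    @Override
    public String toString() { return address + "\t" + score; }

    /** Serialize and deserialize once, as happens when data moves between tasks. */
    static WebDataSketch roundTrip(WebDataSketch source) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            source.write(new DataOutputStream(bytes));
            WebDataSketch copy = new WebDataSketch();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The important property is that `readFields` reads exactly what `write` wrote, in the same order; a mismatch corrupts every later field in the stream.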

Map Reduce
Key
Comparator Key

Comparator :
org.apache.hadoop.io.WritableComparator
compare :
Comparator
Hadoop MapReduce Shuffle Key
Key

8.2.4 PageRank
(1) Web
(2)
(1) Web Web
(2)Hadoop
DoubleWritable 8-14 Web
WebDataWritable


8-14
WebDataWritable WebData 8-15
Web
WebData WebDataWritable

WebData MapReduce


8-15 WebData
Map Reduce Hadoop
MapReduce MapReduce
8.3.5 MapReduce

8.3.5 Shuffle
Shuffle Map Key Reduce Hadoop
Shuffle 3
Partition : Reduce Key Value

Grouping : Key Reduce


Sort : Partition Shuffle Key
Map Partition Key
Reduce Hadoop
Key Partition Partition
Partition
Key
Sort 8.3.4

Grouping

Partitioner
Partitioner
org.apache.hadoop.mapreduce.Partitioner
getPartition : Key Value Reduce Key
Reduce
Hadoop Partition Key Value

8.2.4 PageRank Partition 8-16

8-16 Partitioner
8-16 PageRank Web Key
Web Partitioner
getPartition
Key int float Partition
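
For reference, the rule used by Hadoop's default `HashPartitioner` is `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. The sketch below reproduces that rule in a stand-alone class so the behaviour of `getPartition` can be tried without a cluster; the class name is hypothetical and the method takes a plain `Object` key rather than Hadoop's Key and Value types.

```java
/** Sketch of hash partitioning: choose the Reduce task that receives a given Key. */
final class HashPartitionSketch {

    /** Returns a partition number in the range 0 .. numReduceTasks - 1. */
    static int getPartition(Object key, int numReduceTasks) {
        // Masking with Integer.MAX_VALUE clears the sign bit, so a negative
        // hashCode cannot produce a negative partition number.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        for (String key : new String[] {"example.com", "example.net", "mypage.html"}) {
            System.out.println(key + " -> Reduce task " + getPartition(key, 4));
        }
    }
}
```

Because the partition depends only on the Key's `hashCode`, equal Keys always reach the same Reduce task, but a poorly distributed `hashCode` concentrates the load on a few Reduce tasks.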

Grouping
Grouping
org.apache.hadoop.io.RawComparator
compare : KeyValue Key

8.2.4 PageRank WebData address


address Key Grouping

8.3.6
Hadoop Map
Hadoop


8.3.6.1
Hadoop
(1)
1 TextInputFormat

TextInputFormat KeyValue
Key LongWritable
Value Text Key Value

(2)
Hadoop SequenceFile
SequenceFileInputFormat

8.3.6.2
FileInputFormat Hadoop

org.apache.hadoop.mapreduce.lib.input.FileInputFormat
createRecordReader : (RecordReader)
isSplitable :

1
(RecordReader)
RecordReader

org.apache.hadoop.mapreduce.RecordReader
initialize : KeyValue KeyValue
nextKeyValue : Map KeyValue


getCurrentKey : Key
getCurrentValue : Value
close :
8.2.4 PageRank TextInputFormat
PageRank
8-17


8-17

8.3.7
Hadoop Reduce
Hadoop


8.3.7.1
Hadoop
(1) : TextOutputFormat
KeyValue
Key Value
(2) : SequenceFileOutputFormat
Hadoop SequenceFile
MapReduce SequenceFile

8.2.4 PageRank Hadoop


TextOutputFormat
Key Value NullWritable
NullWritable WritableComparable

8.3.7.2
FileOutputFormat (TextOutputFormat
)TextOutputFormat

(1)
(2) KeyValue
(3) ()

8.3.8
MapReduce
MapReduce
(1) Map Reduce
MapReduce Hadoop

(2)
Key


8.4 MapReduce
Hadoop MapReduce
MapReduce Hadoop
MapReduce
Key
MapReduce
Hadoop MapReduce MapReduce

Hadoop MapReduce

8.4.1 Pig
Pig Pig Latin Hadoop MapReduce

Pig Key Key


JOIN Key GROUP Hadoop MapReduce

Pig Latin Pig MapReduce


Hadoop MapReduce
2010 2 Hadoop
Pig-0.5.0

8.4.1.1 Pig
Pig

(1) Key (GROUP , COGROUP )


(2) Key (DISTINCT )
(3) Key (JOIN , JOIN OUTER )
(4) Key (SPLIT )
(5) (UNION , CROSS )
(6) (FILTER )
(7) (ORDER )
(8) (FOREACHGENERATE )
Pig Latin


A = LOAD 'dataa.csv' AS (id:chararray, test1:int, test2:int);
B = LOAD 'datab.csv' AS (id:chararray, test3:int, test4:int);
JOINDATA = JOIN A BY id, B BY id;
DISTINCTDATA = DISTINCT JOINDATA;
FILTERDATA = FILTER DISTINCTDATA BY (test1 > test3);
-- after the JOIN both inputs carry an id field, so it is qualified as A::id
GENDATA = FOREACH FILTERDATA GENERATE A::id, test1, test2, test4;
ORDERDATA = ORDER GENDATA BY test4 DESC;
JOIN
Hadoop MapReduce Pig

Key

LOG = LOAD 'score.csv' AS (id:chararray, test1:int, test2:int);
SCORELIST = FOREACH LOG GENERATE id, test1+test2, test1-test2;

Key
(AND, OR, NOT)

LOG = LOAD 'hogehoge.csv' AS (id:chararray, score:int);
IDLIST = FILTER LOG BY (score >= 50) AND (NOT (score > 100));

Key (MAX)(MIN)(AVG)(SUM)(COUNT)
(CONCAT)
Key
LOG = LOAD 'hogehoge.csv' AS (id:chararray, score:int);
IDLIST = GROUP LOG BY id;
MAXSCORE = FOREACH IDLIST GENERATE group, MAX(LOG.score);

Pig Pig LOAD


STORE LOAD STORE
PigStorage BinStorage
DUMP
DATA = LOAD '/path/to' USING PigStorage(',');
STORE DATA INTO '/path/to2' USING BinStorage();

HDFS
Pig HDFS Hadoop
cat,cd,copyFromLocal,copyToLocal,cp,ls,mkdir,mv,pwd,rm,rmf
Pig exec, run
Job kill
set
grunt > cat hogehoge.txt
aaa bbb
ccc ddd
grunt > run pigscript.pig
# Pig

Pig
KEY Java
Pig REGISTER Java
(jar )

8.4.1.2 Pig
Pig (Hadoop ) Pig
Java Java Pig
Pig Pig


Pig Java
Java Pig

# Pig
user $ java -cp $PIGDIR/pig.jar:$HADOOP_CONF_DIR org.apache.pig.Main
grunt > Pig
# Pig
user $ java -cp $PIGDIR/pig.jar:$HADOOP_CONF_DIR org.apache.pig.Main
script.pig
script.pig MapReduce

8.4.2 Hive
Hive Pig HiveQL SQL
MapReduce Hive RDBMS
HiveQL

HiveQL
Hive MapReduce
Hadoop MapReduce

8.4.2.1 Hive
Hive

CREATE TABLE ()/ DROP TABLE ()/ ALTER


TABLE() SQL Hive

Hive SQL
CREATE TABLE sample(id STRING, score INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY 0A
STORED AS TEXTFILE;

LOAD INSERT OVERWRITE HDFS


SQL
UPDATE DELETE Hive

LOAD DATA INPATH 'test.csv' INTO TABLE sample;

SELECT SELECT
GROUP BY DISTINCT SORT BY
ORDER BY
JOIN UNION

SELECT DISTINCT a.id, b.score FROM sample a JOIN sample2 b ON (a.id = b.id)
WHERE b.score > 80 LIMIT 10;
SELECT id, score FROM (
SELECT name AS id, point AS score
FROM sample3
) result;

8.4.2.2 Hive
Hive Hive (Hadoop ) Hive
Hive
Hive metadata
metadata Hive

user $ cd $HIVE_HOME
user $ ./bin/hive

Hive

hive > create table table_name (col1 INT, ) ;


hive >


8.4.3 Pig Hive


Pig Hive MapReduce

(JOIN, UNION, GROUP) Pig


Hive
Pig Hive


()
MapReduce Pig
Hive

MapReduce MapReduce
Key Value MapReduce

8.5 MapReduce
Pig Hive MapReduce

8.5.1

Map Reduce
Hadoop MapReduce MapReduce

Hadoop Hadoop

8.5.1.1 Map
Hadoop Map
64MB 1
Map 64MB



FileInputFormat setMaxSplitSize
mapred.max.split.size1
Map
mapred.max.split.size


HDFS 1 Map
HDFS

dfs.block.size
1 HDFS
HDFS HDFS
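
The relationship between the HDFS block size, `mapred.max.split.size`, and the number of Map tasks can be sketched as below. The split-size rule `max(minSize, min(maxSize, blockSize))` is the one used by `FileInputFormat` in the new `org.apache.hadoop.mapreduce` API; the per-file task count is a simplification (Hadoop lets the last split of a file be slightly larger than the split size rather than creating a tiny extra split), and the class and method names are hypothetical.

```java
/** Sketch: estimate how many Map tasks a single input file produces. */
final class MapTaskCountSketch {

    /** Effective split size: the block size, clamped between minSize and maxSize. */
    static long splitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    /** Approximate number of splits (and Map tasks) for one non-empty file. */
    static long splitCount(long fileBytes, long splitBytes) {
        return (fileBytes + splitBytes - 1) / splitBytes; // ceiling division
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long defaultSplit = splitSize(64 * mb, 1, Long.MAX_VALUE); // 64MB block, no maximum set
        long smallSplit = splitSize(64 * mb, 1, 16 * mb);          // mapred.max.split.size = 16MB
        System.out.println("1GB file, 64MB splits: " + splitCount(1024 * mb, defaultSplit) + " Map tasks");
        System.out.println("1GB file, 16MB splits: " + splitCount(1024 * mb, smallSplit) + " Map tasks");
    }
}
```

Lowering the maximum split size below the block size is therefore the way to raise the number of Map tasks for a fixed input.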

8.5.1.2 Reduce
Reduce
mapred.reduce.tasksMapReduce
Job setNumReduceTasks
Reduce 1
Reduce 1

8.5.1.3
Map Reduce
JobTracker Web
Web Map Reduce

JobTracker
JobTracker MapReduce

MapReduce API
Job getTaskCompletionEvent MapReduce

getTaskCompletionEvent

8.5.2
Map Reduce
3
stdout : Map Reduce System.out.println

JavaVM stdout
stderr : Map Reduce System.err.println

syslog : API Log4J


Log4J
Log4J
MapReduce

Failed
MapReduce 24

8.5.3

MapReduce
Hadoop 1 MapReduce

Hadoop
Job Counters
Map Reduce Map
Map Map
Map
FileSystemCounters
HDFS

Map-Reduce Framework

Map Reduce
(2)FileSystem Counters

Hadoop
Context getCounter

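/* Example: incrementing a user-defined counter in a Map task */
void map(Key key, Value value, Context context) {
    // "RECORDS" is a placeholder counter name (assumption)
    context.getCounter(TestMapper.class.getSimpleName(), "RECORDS").increment(1);
}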
JobTracker Web Job

8.5.4 MapReduce
MapReduce

2
: MB MB- MB, GB
:

(1) :

(2) memcached : memcached

(3) RDBMS : Hadoop RDBMS

(4) KeyValue : Hadoop KeyValue


(HBase, Hypertable)


(5) MapReduce : MapReduce

MapReduce
8-1
8-1
MB

MB MB

GB

()

()

()

memcached

RDBMS

KV

MapReduce-AP

No.

MapReduce

RDBMS KV

memcached KV
RDBMS
KV
RDBMS

8.5.5 Map Reduce


Hadoop Map Partition Reduce
Reduce
Reduce Map Key

Reduce
Partition
Reduce Key Value
Reduce


8.5.1 Reduce

Reduce Key Value


Partition

8.6
MapReduce
Hadoop MapReduce
PageRank MapReduce Pig
Hive MapReduce MapReduce

Hadoop MapReduce
MapReduce

MapReduce
Hadoop


9 Hadoop

Hadoop
Hadoop
MapReduce Hadoop
Hadoop
Hadoop MapReduce
MapReduce Hadoop Hadoop

Hadoop
MapReduce Hadoop

MapReduce Hadoop
Hadoop MapReduce Map
Reduce
MapReduce
MapReduce
MapReduce

Hadoop
Hadoop

9.1

9.1.1

2
(1)

(2)
100%


9.1.2
CPU

9-1
9-1
No.

CPU

OS

OS

9-1

20

A
A

30

9-1

9-2


15

A
A

20

9-2

9-2
9-2
No.

9.1.3

9.1.3.1
9-3


9-3
No.

CPU (

)
2

()

CPU

()

()

9.1.3.2

9.1.3.1

9.1.3.3

9-4
9-4
No.


9.2

9-2

9.2.1
9-5
9-5
No.

CPU HDD

No.1 9-3
No.2

9-6 5

9-6
No.

CPU

NIC

S1

S2

S3

Xeon E5504

6GB

SAS

2GHz

300GB

Xeon E5345

8GB

SAS

2.33GHz

146GB

4 2

Xeon 5148

2GB

SAS

1Gbps

1Gbps

1Gbps

18

16

HP

2009

DL360G6

HP

2008

DL380G5

HP

2006
10

2.33GHz

72GB

DL360G5

AH480A


No.

CPU

NIC

S4

Core 2 Duo

2GB

SATA

T9400

250GB

2.53GHz

1Gbps

50

NEC

2009

Express5800

HP

2008
2

2
5

S5

9.2.2

Xeon X5460

6GB

SAS

1Gbps

3.16GHz

146GB

DL360G5

AK839A

Hadoop Hadoop

HDFS(Hadoop Distributed File System)


MapReduce Hadoop
Hadoop MapReduce

2010 2 Hadoop 0.20.1

9.2.3 Hadoop
9.1.1 Hadoop MapReduce

(1)
1 MapReduce MapReduce

(2)
1 MapReduce

9.3 Hadoop
Hadoop


9.3.1 Hadoop MapReduce


MapReduce KeyValue

MapReduce 9-7 3
9-7 MapReduce
No.

Map

Shuffle

Map
Key

Reduce

Shuffle

Hadoop MapReduce 9-3 Map Reduce


TaskTracker TaskTracker JobTracker Map
Reduce 1 TaskTracker

Map

Reduce

TaskTracker
Map
TaskTracker

Shuffle

Reduce

JobTracker

TaskTracker

9-3 Hadoop MapReduce


Hadoop MapReduce HDFS HDFS
NameNode DataNode
Hadoop MapReduce 9-4
HDFS


DataNode TaskTracker
DataNode TaskTracker
NameNode

Reduce

Map

Shuffle
Reduce

TaskTracker

JobTracker
(MapReduce)
(HDFS)

9-4 MapReduce HDFS

9.3.2 MapReduce
MapReduce Hadoop
9-7 9-3 9-4 MapReduce

Map
Map 9-5
HDFS
Map
Map
Hadoop
DataNode
Map
DataNode

Map


JobTracker
T T T

JobTracker
HDFS
(DataNode
DataNode)
Map
Map
JobTracker
()

Map

HDFS

9-5 Map
Map 9-8
9-8 Map
No.

CPU

Map

Map
Map

Map

No.3 Hadoop No.1 No.2


Hadoop

Shuffle
Shuffle Reduce 9-4 9-6

Map

Key
JavaVM

IO


JobTracker
T T T

Sort


Map
Map
Map
ShuffleJobTracker
1

9-6 Shuffle
Shuffle 9-9
9-9 Shuffle
No.

Map

Map

Reduce
Reduce 9-7 Shuffle
Reduce Reduce
(HDFS )


JobTracker

Reduce
ReduceHDFS
Reduce

Reduce

HDFS

HDFS

9-7 Reduce
Reduce 9-10
9-10 Reduce
No.

CPU

Reduce

Reduce

Reduce

Reduce

Reduce

No.3 MapReduce No.4


No.1, No.2 Hadoop

Hadoop MapReduce
CPU

MapReduce
Hadoop


9.3.3

Hadoop
Hadoop

9.3.3.1 CPU
CPU Map Reduce
MapReduce Map Reduce
CPU
MapReduce CPU
9-8

CPU

Map

Reduce

CPU

Map

Reduce

CPU

CPU

Map

CPU

Reduce

Map

Reduce

9-8 CPU MapReduce


9.2.3
Map Reduce CPU
Map Reduce
CPU 9-9
CPU
MapReduce


CPU

Map

CPU

Reduce

Map

Reduce

M
Map

MMMMMM
RRRRRR
CPU

M
CPU

Reduce

CPU

9-9 Map Reduce

9.3.3.2
9-10

[Figure 9-10. Diagram labels: Map, Reduce, TaskTracker, DataNode, OS, Java VM (200MB).]

Hadoop Map
Reduce JavaVM
MapReduce

Map 9-11 Map

GC java.lang.OutOfMemoryError

[Figure 9-11 Map. Diagram labels: MapBuffer, Data Buffer, Record Buffer, index Buffer.]
Reduce Map Reduce

Map Reduce Java VM


( 200MB)Map Reduce

Hadoop Map Reduce


CPU

Map Reduce
MapReduce
CPU1
MapReduce
CPU1
Map Reduce
CPU


9.3.3.3
MapReduce
Map Reduce
Hadoop Map Reduce ( 10%)
JobTracker 1
JobTracker Map Reduce

JobTracker
# Map - JobTracker
2010-02-02 17:14:24,230 WARN org.apache.hadoop.mapred.JobInProgress: No room for map task. Node slave001 has 14135296 bytes free; but we expect map to take 109193991
# Reduce - JobTracker
2010-02-02 17:14:24,231 WARN org.apache.hadoop.mapred.JobInProgress: No room for reduce task. Node tracker_slave001:localhost.localdomain/127.0.0.1:44741 has 14135296 bytes free; but we expect reduce input to take 1091939917
JobTracker 1

# -
2010-02-02 16:51:37,581 FATAL org.apache.hadoop.mapred.TaskRunner: Task attempt_201001061158_0367_m_000003_0 failed : org.apache.hadoop.fs.FSError: java.io.IOException: No space left on device
    at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:192)
    (...)
Caused by: java.io.IOException: No space left on device
    ... 8 more
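The JobTracker warnings quoted above carry the node name, the free bytes and the expected task size in a fixed wording, so they can be picked out of a log mechanically. The following is a minimal sketch, assuming the message wording is exactly as in the two sample lines; the function name `parse_no_room` and the regular expression are illustrative and not part of Hadoop.

```python
import re

# Regular expression for the "No room for ... task" warning, written to
# match the two sample lines quoted in this section (an assumption: other
# Hadoop versions may word the message differently).
NO_ROOM = re.compile(
    r"No room for (?P<kind>map|reduce) task\. "
    r"Node (?P<node>\S+) has (?P<free>\d+) bytes free; "
    r"but we expect (?:map|reduce input) to take (?P<need>\d+)"
)


def parse_no_room(line):
    """Return (kind, node, free_bytes, expected_bytes), or None if no match."""
    match = NO_ROOM.search(line)
    if match is None:
        return None
    return (
        match.group("kind"),
        match.group("node"),
        int(match.group("free")),
        int(match.group("need")),
    )


MAP_LINE = (
    "2010-02-02 17:14:24,230 WARN org.apache.hadoop.mapred.JobInProgress: "
    "No room for map task. Node slave001 has 14135296 bytes free; "
    "but we expect map to take 109193991"
)
REDUCE_LINE = (
    "2010-02-02 17:14:24,231 WARN org.apache.hadoop.mapred.JobInProgress: "
    "No room for reduce task. Node "
    "tracker_slave001:localhost.localdomain/127.0.0.1:44741 has 14135296 "
    "bytes free; but we expect reduce input to take 1091939917"
)

if __name__ == "__main__":
    for sample in (MAP_LINE, REDUCE_LINE):
        print(parse_no_room(sample))
```

Comparing the third and fourth fields shows how far the node is from having enough local disk for the task.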
MapReduce



Map Reduce ( 4
) mapred-site.xml MapReduce

mapred.map.max.attempts, mapred.reduce.max.attempts

Hadoop MapReduce
MapReduce

( 4)

mapred.max.tracker.blacklists

9.3.3.4
9-6

100Mbps, 1Gbps, 10Gbps 1Gbps


NIC 1Gbps NIC

Hadoop Shuffle Reduce HDFS


Shuffle Hadoop 100Mbps

9.3.4

Hadoop
Hadoop

(CPU IO)

9.3.4.1 CPU
CPU CPU (top sar
%user %system ) 100%
(%iowait) CPU 100%CPU


Hadoop Map
Reduce CPU 100%
CPU
Map Reduce
Map Reduce CPU

CPU

()
CPU
Hadoop (PiEstimator)
PiEstimator 9-12 PiEstimator
CPU
CPU

PiEstimator CPU

Map

Shuffle

Reduce

9-12 PiEstimator

9.3.4.2
IO
iostat
await( IO ) avgqu-sz(IO
)%util( IO CPU )
MapReduce Map Reduce
MapReduce


Hadoop TeraSort
TeraSort 9-13 TeraSort KeyValue
MapReduce

TeraSort

Map

Shuffle

Reduce

9-13 TeraSort

9.3.4.3

sar
(sar -n DEV)(rxbyt/stxbyt/s)

Hadoop
Shuffle Map

Reduce HDFS
Shuffle Reduce


Shuffle
Shuffle
Hadoop
Shuffle
Map Shuffle

Map

Shuffle

Reduce

Reduce Reduce

9-14

9.4 Hadoop
MapReduce Hadoop

9.3.2 Map Reduce CPU


IO
Hadoop
Hadoop

9.4.1 Hadoop
Hadoop Map Reduce
Shuffle Reduce

Map
9.3.2 Map 2
(1) Map CPU


(2) Map IO
(1) Map top sar
CPU 100%
CPU
9-15 Map
CPU

[Figure 9-15 Map CPU. Diagram labels: (CPU4), M = Map, CPU.]
(2)Map

IO 9-16 CPU
Map

[Figure 9-16 Map. Diagram labels: (2), M = Map (CPU), m = Map.]


Reduce
9-9 9-10 Reduce

(1) Reduce CPU


(2) Reduce IO
(3) Map
(1)Map CPU 9-17
Reduce CPU

[Figure 9-17 Reduce CPU. Diagram labels: (CPU4), R = Reduce, CPU.]
(2)Reduce HDFS
HDFS Reduce
MapReduce

9-18 HDFS
IO


[Figure 9-18 Reduce. Diagram labels: (2), R = Reduce (CPU), r = Reduce (HDFS).]
(3)Shuffle Map

[Figure 9-19 Map. Diagram labels: Map (m), R.]
Hadoop MapReduce


9-11
9-11 Hadoop
No.

Map

CPU

Map

Map

Map

Reduce

CPU

Reduce

Reduce

Reduce

Reduce

Map

Hadoop

9.4.2 Hadoop
Hadoop
9-11 Hadoop
9-12
9-12 MapReduce Hadoop
No.

Map

CPU

mapred.tasktracker.map.tasks.maximum

Map

mapred.local.dir

Map

Map

Reduce

CPU

mapred.tasktracker.reduce.tasks.maximum

Reduce(Shuffle)

Reduce

dfs.data.dir

Reduce

HDFS

tasktracker.http.threads

Shuffle Map

Reduce

mapred.reduce.parallel.copies

Map


CPU

9.4.2.1
Hadoop
9-6

MapReduce 500GB TeraSort


S3

9-13 TeraSort

S1

18

S2

S4

S4

19

S5

58

96

107

116

( 1 )

(100%)

(165.5%)

(184.5%)

(200%)

2531

1961

1811

1775

( 1 )

(100%)

(77.5%)

(71.6%)

(70.1%)

1 4 MapReduce
(: 2531 4: 1775 )

Hadoop MapReduce
Hadoop
Hadoop


9-14 Hadoop
No.

mapred.local.dir

dfs.data.dir

9.4.2.2
Map Reduce Map Reduce

Map : Map CPU


PiEstimator
Reduce : Reduce
TeraSort
Map PiEstimator PiEstimator

88 (S1 17 , S2 4 , S3 16 , S4 43 , S5 8 )
Map CPU 1, 1.5, 2, 3, 4
5 5

PiEstimator 9-15
9-15 PiEstimator
No.  Map          ()     ()
1    CPU × 1      483    487    3.35
2    CPU × 1.5    481    483    1.27
3    CPU × 2      485    497    6.68
4    CPU × 3      500    505    2.93
5    CPU × 4      481    511    17.65

No.1 No.2 No.5


Map
Map CPU CPU 1.5


Reduce TeraSort

8 (S1 8 ), CPU4
40GB TeraSort
Map : 4(CPU 1) , 6 (CPU 1.5)
Reduce : 2(CPU 0.5) 6(CPU 1.5)

TeraSort 9-20

[Figure 9-20 TeraSort. Series: Map4, Map6. X-axis: Reduce. Y-axis: 0 to 900.]
Map Reduce 35
(CPU
9-21


No

Ma

Reduce 3

Reduce 4

Reduce5

9-21 TeraSort
CPU (Map Reduce
4 )
Idle CPUCPU
CPU
100%Reduce
+1
9-16
9-16
No.  Parameter                                  Value
1    mapred.tasktracker.map.tasks.maximum       CPU × 1 to CPU × 1.5
2    mapred.tasktracker.reduce.tasks.maximum    CPU to CPU + 1
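The slot guideline in Table 9-16 can be turned into numbers for a given node. This is a small sketch under stated assumptions: "CPU" is read as the number of CPU cores, the upper Map bound is rounded down to a whole slot, and the function name `recommended_slots` is made up for illustration.

```python
import math


def recommended_slots(cpu_cores):
    """Return (low, high) slot ranges per TaskTracker for a node.

    Map:    CPU cores x 1 up to CPU cores x 1.5 (rounded down, an assumption).
    Reduce: CPU cores up to CPU cores + 1.
    """
    if cpu_cores < 1:
        raise ValueError("cpu_cores must be at least 1")
    return {
        "mapred.tasktracker.map.tasks.maximum": (
            cpu_cores,
            math.floor(cpu_cores * 1.5),
        ),
        "mapred.tasktracker.reduce.tasks.maximum": (cpu_cores, cpu_cores + 1),
    }


if __name__ == "__main__":
    for cores in (2, 4, 8):
        print(cores, recommended_slots(cores))
```

For a 4-core node this gives 4 to 6 Map slots, which matches the Map slot counts used in the TeraSort measurement above.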

Map Reduce
(1) Hadoop(DataNodeTaskTracker) OS 1GB

(2) JavaVM Permanent C


50MB
(
1GB)(Map Reduce +50MB)


(3) {(Map Reduce )} > (2)


Map CPU 1
(Map
Reduce )(Map Reduce )2
(2)Map Reduce

(4) (3)Reduce
Map
Map Reduce 9-17
1 JavaVM 200MB
(3)1 JavaVM
450MB
9-17
No.

S1

S2

S3

S4

S5

Map

Reduce

JavaVM MapReduce
MapReduce

9.5 Hadoop
MapReduce Hadoop

9.5.1 Map Hadoop


MapReduce Map 9-22

Map 9-18


Map

Map

Map

M
M

M Map
CPU

Map

9-22 Map
9-18 Map
No.

Hadoop

Map

MapReduce

(0.19 mapred.map.tasks
)

mapred.child.java.opts

1 Map

( 200MB)

Map Partition

io.sort.mb

Key

( 100)
4

io.sort.record.percent

Map
( 0.05)

Map Map Map


Map Map

1 Map Map
2 Hadoop Map

Hadoop 1 Map JavaVM 200MB


Map 100MB

9.5.2 Reduce Hadoop


Reduce 9-23
9-19
Reduce Map Shuffle
Reduce 25%
25%

R Reduce

Reduce Reduce

Reduce
CPU

9-23 Reduce
9-19 Reduce
No.

Hadoop

Reduce

mapred.reduce.tasks

Reduce (
1)

mapred.child.java.opts

1 Reduce
(
200MB)

3
4
5

Map

mapred.job.shuffle.merge.percent

Map

(default 0.66)

mapred.job.shuffle.input.buffer.percent

Shuffle

(default 0.70)

mapred.reduce.parallel.copies

1 Shuffle Map

(default 5)


9.5.3 MapReduce
9.5.2
9.5.2 Map
Reduce Map Reduce

9.5.3.1 Map
Map CPU
MapReduce Map
CPU PiEstimator Map
TeraSort
GB Hadoop

Hadoop Map Map

Map
(1) Map 1 (
) Map

Map
(2) 1 Map JavaVM 5 1
Map
(1)
(3) MapReduce Map
30
(1)(3) Map

CPU Map
CPU Map PiEstimator 1Map


1Map ()
1 4 : 2.5 PiEstimator
2 : 5.0 PiEstimator
3 : 7.5 PiEstimator
13 : 88 (S1 17 , S2 4 , S3 16 , S4 43 , S5 8 ),
250
4 : 44 (S1 9 , S2 2 , S3 8 , S4 21 , S5 4 ),
126
1Map PiEstimator 9-24

[Figure 9-24 Map (PiEstimator). X-axis: Map/Map, 0 to 80. Y-axis: 0 to 1600.]


PiEstimator Map
Map
9-25
Map Map


[Figure 9-25 Map (PiEstimator). X-axis: Map/Map, 0 to 80. Y-axis: Map (), 0 to 180000.]


9-25 Map 1 Map Map
Map Hadoop

Map Map
30

Map
Map TeraSort Map

1Map () Map
Reduce
1 : 8 (S4 : CPU2 ) , 16, 40GB
2 : 8 (S4 : CPU2 ) , 16, 80GB
3 : 16 (S4 : CPU2 ) , 32, 40GB
1Map JavaVM 200MB( 100MB )

Map 1Map 9-20


9-20 Map

Input per Map (MB)   Case 1: Maps (per slot)   Case 2: Maps (per slot)   Case 3: Maps (per slot)
500                  80 (5)                    160 (10)                  80 (2.5)
250                  160 (10)                  304 (19)                  160 (5)
130                  304 (19)                  608 (38)                  304 (9.5)
100                  400 (25)                  800 (50)                  400 (12.5)
83.3                 480 (30)                  960 (60)                  480 (15)
65.8                 608 (38)                  1200 (75)                 608 (19)
33.3                 1200 (75)                 2400 (150)                1200 (37.5)
21.9                 1824 (114)                3648 (228)                1824 (57)

9-26
[Figure 9-26 Map (TeraSort). Series: 1-, 2-, 3-. X-axis: Map/Map, 0 to 160. Y-axis: 0 to 4000.]


Map
PiEstimator
Map Map

Map


1( 9-27 ) 9-28
3
CPU
(1) Map
(2) Map
(3) Map

[Figure 9-27 TeraSort (case 1). Series: 1-, with regions (1), (2), (3). X-axis: Map/Map, 0 to 160. Y-axis: 0 to 4000.]

(1) CPU

(2)

(3)

9-28 TeraSort
2
(1) Map CPU WAIT CPU
IO

(2) Map
MapReduce Shuffle Shuffle

(1)(2)Hadoop 1

(1) 1Map ()
9-29 Map
9-30 Map Spill Records


[Figure 9-29 Map (case 1). X-axis: Map/Map, 0 to 40. Left Y-axis: 0 to 1800. Right Y-axis: 1Map (B), 0.0E+00 to 2.5E+09. Marker: 250MB.]
[Figure 9-30 Map Spill Records (case 1). Title: Spill Records - Map. X-axis: Map/Map, 0 to 40. Left Y-axis: 0 to 1800. Right Y-axis: 1Map Spill Records, 0.0E+00 to 1.2E+09.]
Map Spill Records
IO


(1) Map Data Buffer


Data Buffer Hadoop io.sort.mb
JavaVM ( 100MB)Map
Data Buffer Data Buffer
(Hadoop io.sort.spill.percent 0.8)
(80MB )Spill Data Buffer
Record Buffer
(2) Map Record Buffer

Record Buffer
io.sort.mbio.sort.record.percent
16 JavaVM (
330KB)Record Buffer Map (Hadoop
io.sort.spill.percent)Spill Data Buffer
Record Buffer
(3) Map Reduce
(1)(2) Spill
Map Reduce
1 (Hadoop io.sort.factor)
1
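The buffer sizes named in items (1) and (2) follow from `io.sort.mb`, `io.sort.record.percent` and `io.sort.spill.percent`. The sketch below works through that arithmetic with the defaults quoted in this chapter (100MB, 0.05, 0.8) and 16 bytes of bookkeeping per record. The way the record area is subtracted from the total before the spill threshold is applied is an assumption about the implementation, and the function name is illustrative only.

```python
RECORD_ACCOUNTING_BYTES = 16  # bookkeeping bytes per record, as quoted above


def map_sort_buffers(io_sort_mb=100, record_percent=0.05, spill_percent=0.8):
    """Split the Map-side sort buffer into a Data Buffer and a Record Buffer.

    Assumption: the Record Buffer takes record_percent of io.sort.mb and the
    Data Buffer takes the rest; a spill starts when either one reaches
    spill_percent of its own capacity.
    """
    total_bytes = io_sort_mb * 1024 * 1024
    record_bytes = int(total_bytes * record_percent)
    record_bytes -= record_bytes % RECORD_ACCOUNTING_BYTES
    data_bytes = total_bytes - record_bytes
    record_capacity = record_bytes // RECORD_ACCOUNTING_BYTES
    return {
        "data_buffer_bytes": data_bytes,
        "record_capacity": record_capacity,
        "data_spill_threshold_bytes": int(data_bytes * spill_percent),
        "record_spill_threshold": int(record_capacity * spill_percent),
    }


if __name__ == "__main__":
    for name, value in map_sort_buffers().items():
        print(name, value)
```

With the defaults the Record Buffer holds 327,680 records (roughly the 330K figure above), so a job with many small records fills it long before the Data Buffer is full. The data-side threshold here comes out a little under the approximately 80MB quoted above because the record area is subtracted first.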

9-28 CPU 9-30 Spill Records Map


1 Map

Map
1 Map Data Buffer Map
Data Buffer(io.sort.mb)JavaVM

1 Map Reduce Record Buffer


Map Record Buffer
(io.sort.record.percent)
1 Map 1

1 Record Buffer
MapReduce 1


Record Buffer
9-30 Map 2 Spill Records
(3) Spill Records
10 9 Spill
Out (3) Spill Out Record Buffer

1 10 Map
1 100 260MB

9-28(3) Map
Reduce
1Reduce Map
Reduce
1 1Reduce Shuffle 9-31
Shuffle Map
Hadoop Map
Map Shuffle

[Figure 9-31 Map Shuffle (case 1). X-axis: Map/Map, 0 to 160. Left Y-axis: 0 to 4000. Right Y-axis: 1Reduce Shuffle (), 0 to 80.]


1Map 1


3 9-32

[Figure 9-32 Map Map. Series: 1-, 2-, 3-, 1-1Map, 2-1Map, 3-1Map. X-axis: Map/Map, 0 to 160. Left Y-axis: 0 to 4000. Right Y-axis: 1Map (MB), 0 to 900. Marker: 40MB.]


9-32 3
1Map
1Map 40MB
MapReduce
1 Map
1Map
200MB 40MB

PiEstimator TeraSort Map

Map

1Map Reduce Map Data Buffer


Record Buffer


IO
1Map 1Reduce JavaVM
(5 1 )

Shuffle
Hadoop Map HDFS
Map
1 Map
HDFS Map

9.5.3.2 Reduce
Reduce Map
TeraSort
Reduce Map
Reduce
Reduce Reduce

Reduce
(1) Shuffle Reduce

(2) Shuffle JavaVM


Hadoop (mapred.job.shuffle.input.buffer.percent)

(1)(2) Reduce

Reduce
Reduce
Map Reduce Reduce
1 JavaVM 200MB
1: 8 (S4: CPU2 ), 16, 40GB
2: 8 (S4: CPU2 ), 16, 80GB
3: 16 (S4: CPU2 ), 32, 40GB


Reduce 9-21
9-21 Reduce
No.

Reduce

0.5

0.5

0.25

16

0.5

32

64

128

256

16

16

512

32

32

16

1192

74.5

74.5

37.25

9-33
2

[Figure 9-33 Reduce. Series: 1-, 2-, 3-. X-axis: Reduce/Reduce, 0 to 100. Y-axis: 0 to 4000.]
Reduce TeraSort


Map CPU 3
9-34 9-35
3
Reduce Reduce ( (1) )
Reduce Reduce ( (4) )
Reduce Reduce ( (2),(3) )

[Figure 9-34 Reduce (case 3). Regions (1) to (4). X-axis: Reduce/Reduce, 0 to 80. Y-axis: 0 to 1600.]

[Figure 9-35 Reduce (case 3). Panels (1), (2), (3), (4).]


Reduce CPU WAIT CPU


IO CPU

CPU
Reduce CPU
(Idle CPU)
Reduce 9-36
1Reduce 9-35
CPU WAIT CPU
Map Spill Records
Reduce Spill Records 9-37

[Figure 9-36 Reduce (case 3). Series: -40GB/32, 1Reduce-40GB/32. X-axis: Reduce/Reduce, 0 to 80. Left Y-axis: 0 to 1600. Right Y-axis: 1Reduce (MB), 0 to 6000.]

[Figure 9-37 Reduce Spill Records (case 3). Title: Spill Records - Reduce. Series: -40GB/32, Reduce Spill Records. X-axis: Reduce/Reduce, 0 to 80. Left Y-axis: 0 to 1600. Right Y-axis: 0.0E+00 to 8.0E+08.]


9-37 Reduce Spill Records

Shuffle Map Map


Key

Spill Records

Map

(1) 1Reduce JavaVM (: 200MB)


(2) mapred.job.shuffle.input.buffer.percent : Map
Hadoop (:0.7)
(3) 1 Map (0.25 )
1 Map = (1)(2)(3)
( 200MB) 32MB
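The product (1) × (2) × (3) above can be checked with a few lines of arithmetic. This is a sketch using the nominal values from the list (200MB heap, 0.7, 0.25); the function and parameter names are illustrative. With the nominal 200MB the product is 35MB, while the text quotes about 32MB, presumably because the heap actually usable by the JavaVM is somewhat smaller than the configured maximum (an assumption, not stated in the source).

```python
def max_in_memory_map_output_mb(
    heap_mb=200.0,
    shuffle_input_buffer_percent=0.7,
    single_output_fraction=0.25,
):
    """Largest single Map output (MB) that a Reduce keeps in memory.

    heap_mb:                       (1) JavaVM heap per Reduce task
    shuffle_input_buffer_percent:  (2) mapred.job.shuffle.input.buffer.percent
    single_output_fraction:        (3) share of the shuffle buffer that one
                                       Map output may occupy
    """
    return heap_mb * shuffle_input_buffer_percent * single_output_fraction


if __name__ == "__main__":
    print(max_in_memory_map_output_mb())       # nominal 200MB heap
    print(max_in_memory_map_output_mb(400.0))  # effect of doubling the heap
```

Map outputs larger than this limit go to local disk during Shuffle, which is why the per-Map output size matters for the Reduce-side Spill Records.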


Reduce 1 Map
Spill RecordsCPU WAIT CPU

Reduce
Spill Records
mapred.job.reduce.input.buffer.percentReduce
Map
0.0 Reduce Map
Spill Records

Reduce
Reduce Reduce Shuffle
Shuffle
3 Reduce
Shuffle 9-38
Shuffle Map

[Figure 9-38 Reduce Shuffle (case 3). X-axis: Reduce/Reduce, 0 to 80. Left Y-axis: 0 to 1600. Right Y-axis: Shuffle (), 0 to 30000.]


9-36 9-38 Reduce Shuffle


Map Hadoop
Reduce Shuffle
Reduce Reduce
2
Reduce
Reduce 1 Map Shuffle
Map
IO

Reduce Reduce Shuffle


Shuffle Map
Hadoop

9.6 MapReduce
MapReduce
MapReduce

9-39

[Figure 9-39 MapReduce. Diagram labels: (Map, Reduce), (Map), (Reduce), MapReduce.]

9.6.1
( 10MB GB )MapReduce
9-39

Map : Map
Reduce : Reduce
Map :
Reduce : Reduce

Map : Map
Reduce : Reduce
Map : Map
Reduce : Reduce
Map Reduce Map
Reduce MapReduce Web

9.6.2 MapReduce
MapReduce

Map : Map
Map : Map
Reduce : Reduce
Map : Map
Reduce : Reduce

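Map
Map time estimate:
[measured] time per 1 Map = [measured] Map time ÷ ([measured] number of Maps ÷ [measured] number of Map slots)
[estimated] Map time = [measured] time per 1 Map × [estimated] number of Maps ÷ [estimated] number of Map slots
[measured]/[estimated] input size ratio per 1 Map = ([measured] Map input size ÷ [measured] number of Maps) ÷ ([estimated] Map input size ÷ [estimated] number of Maps)
Map time estimate = [estimated] Map time ÷ input size ratio per 1 Map

Reduce
Reduce time estimate:
[estimated] Reduce input size = [estimated] Map input size × ([measured] Reduce input size ÷ [measured] Map input size)
[measured] time per 1 Reduce = [measured] Reduce time ÷ ([measured] number of Reduces ÷ [measured] number of Reduce slots)
[measured] input size per 1 Reduce = [measured] Reduce input size ÷ [measured] number of Reduces
[estimated] input size per 1 Reduce = [estimated] Reduce input size ÷ [estimated] number of Reduces
[estimated] time per 1 Reduce = [measured] time per 1 Reduce × [estimated] input size per 1 Reduce ÷ [measured] input size per 1 Reduce
Reduce time estimate = [estimated] time per 1 Reduce × ([estimated] number of Reduces ÷ [estimated] number of Reduce slots)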

MapReduce
Hadoop MapReduce Reduce
Map
Reduce Map
MapReduce
Map
mapred.reduce.slowstart.completed.maps
0.05

MapReduce Map Reduce Map


+ Reduce


9.6.3 MapReduce
MapReduce
TeraSort

40GB TeraSort
8 (S4 :8 )

Map time: 571 (s)
Reduce time: 1364 (s)
Map input size: 4.0×10^10 Byte
Reduce input size: 4.0×10^10 Byte
Number of Maps: 608
Number of Reduces: 256
Number of Map slots: 16
Number of Reduce slots: 16

93 (S1:17 , S2: 4 , S3: 16 , S4:48 ,


S5:8 ) 500GB TeraSort

Map input size: 5.0×10^11 Byte

Number of Maps: 7456
Number of Reduces: 1300
Number of Map slots: 260
Number of Reduce slots: 260

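Map
Map time estimate:
[measured] time per 1 Map = [measured] Map time ÷ ([measured] number of Maps ÷ [measured] number of Map slots)
= 571 ÷ (608 ÷ 16)
≈ 15.03 (s)
[estimated] Map time = [measured] time per 1 Map × [estimated] number of Maps ÷ [estimated] number of Map slots
= 15.03 × 7456 ÷ 260
≈ 430.90 (s)
[measured]/[estimated] input size ratio per 1 Map = ([measured] Map input size ÷ [measured] number of Maps) ÷ ([estimated] Map input size ÷ [estimated] number of Maps)
= (4.0×10^10 ÷ 608) ÷ (5.0×10^11 ÷ 7456)
≈ 0.98
Map time estimate = [estimated] Map time ÷ input size ratio per 1 Map
= 430.9 ÷ 0.98
≈ 439.7 (s)

Reduce
Reduce time estimate:
[estimated] Reduce input size = [estimated] Map input size × ([measured] Reduce input size ÷ [measured] Map input size)
= 5.0×10^11 × (4.0×10^10 ÷ 4.0×10^10)
= 5.0×10^11 (Byte)
[measured] time per 1 Reduce = [measured] Reduce time ÷ ([measured] number of Reduces ÷ [measured] number of Reduce slots)
= 1364 ÷ (256 ÷ 16)
= 85.25 (s)
[measured] input size per 1 Reduce = [measured] Reduce input size ÷ [measured] number of Reduces
= 4.0×10^10 ÷ 256
≈ 1.56×10^8 (Byte)
[estimated] input size per 1 Reduce = [estimated] Reduce input size ÷ [estimated] number of Reduces
= 5.0×10^11 ÷ 1300
≈ 3.85×10^8 (Byte)
[estimated] time per 1 Reduce = [measured] time per 1 Reduce × [estimated] input size per 1 Reduce ÷ [measured] input size per 1 Reduce
= 85.25 × 3.85×10^8 ÷ 1.56×10^8
= 209.85 (s)
Reduce time estimate = [estimated] time per 1 Reduce × ([estimated] number of Reduces ÷ [estimated] number of Reduce slots)
= 209.85 × (1300 ÷ 260)
= 1049.25 (s)

MapReduce
Reduce starts once 0.05 (5%) of the Maps have completed, so:
MapReduce time estimate = Map time estimate × 0.05 + Reduce time estimate
= 439.7 × 0.05 + 1049.25
≈ 1071.2 (s)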
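The estimation steps of this section can be collected into one function. The sketch below is a reading of the worked example, not a definitive implementation: the dictionary keys and the function name are invented for illustration, and the input values are the 40GB measurement and the 500GB target given above. Because it does not round intermediate results, its output differs slightly from the hand calculation (about 439.2, 1049.2 and 1071.2 seconds).

```python
def estimate_job_time(measured, target, slowstart=0.05):
    """Estimate Map, Reduce and total job time (seconds) for a target job.

    slowstart is the fraction of Maps that must finish before Reduces start
    (mapred.reduce.slowstart.completed.maps, 0.05 in this chapter).
    """
    # Map side: time per Map, scaled by task count, slots and input size.
    time_per_map = measured["map_time"] / (measured["maps"] / measured["map_slots"])
    map_time = time_per_map * target["maps"] / target["map_slots"]
    input_ratio = (measured["map_input"] / measured["maps"]) / (
        target["map_input"] / target["maps"]
    )
    map_estimate = map_time / input_ratio

    # Reduce side: same idea, with the Reduce input derived from the Map input.
    reduce_input = target["map_input"] * (
        measured["reduce_input"] / measured["map_input"]
    )
    time_per_reduce = measured["reduce_time"] / (
        measured["reduces"] / measured["reduce_slots"]
    )
    measured_in = measured["reduce_input"] / measured["reduces"]
    target_in = reduce_input / target["reduces"]
    scaled_time = time_per_reduce * target_in / measured_in
    reduce_estimate = scaled_time * (target["reduces"] / target["reduce_slots"])

    total = map_estimate * slowstart + reduce_estimate
    return map_estimate, reduce_estimate, total


MEASURED_40GB = {
    "map_time": 571.0, "reduce_time": 1364.0,
    "map_input": 4.0e10, "reduce_input": 4.0e10,
    "maps": 608, "reduces": 256, "map_slots": 16, "reduce_slots": 16,
}
TARGET_500GB = {
    "map_input": 5.0e11,
    "maps": 7456, "reduces": 1300, "map_slots": 260, "reduce_slots": 260,
}

if __name__ == "__main__":
    m, r, t = estimate_job_time(MEASURED_40GB, TARGET_500GB)
    print(round(m, 1), round(r, 1), round(t, 1))
```

Keeping the measured and target figures in separate dictionaries makes it easy to re-run the estimate for another cluster size.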

500GB TeraSort
Map time: 588 (s) (error: 34%)
Reduce time: 1325 (s) (error: 26%)
MapReduce time: 1353 (s) (error: 26%)
3


9.7 Hadoop
Hadoop
9.7.1 Hadoop MapReduce
Hadoop MapReduce
Hadoop
Hadoop

9.7.2 Hadoop
MapReduce Hadoop CPU
Hadoop
Hadoop PiEstimator TeraSort
TaskTracker Map Reduce
Map CPU CPU
1.5 Reduce CPU CPU +1

Hadoop
Map Reduce

9.7.3

MapReduce Hadoop
MapReduce

MapReduce
Hadoop Map Reduce
PiEstimator TeraSort

9.7.4 MapReduce
GB MapReduce GB
MapReduce
MapReduce TeraSort
TeraSort


9.6


10 Hadoop

10 Hadoop
Hadoop Hadoop

Hadoop

()
Hadoop

2
HA

FT

10.1

Hadoop HDFS NameNode

MapReduce JobTracker 1 Hadoop


NameNode/JobTracker
2 1
Hadoop
10-1 HDFS

[Figure 10-1 HDFS. Diagram labels: HDFS client, namenode, datanode, steps (1) to (5), (2) DataNode.]

HDFS
(1) HDFS NameNode
NameNode

(2) HDFS NameNode DataNode


NameNode
DataNode HDFS
(3) HDFS DataNode
DataNode
DataNode DataNode
(4) DataNode
DataNode HDFS

(5) HDFS NameNode

NameNode HDFS NameNode


NameNode DataNode
HDFS
10-2 MapReduce

[Figure 10-2 Map/Reduce. Diagram labels: Job client, HDFS client, jobtracker, tasktracker, steps (1) JobID, (2), (3) Job, (4) Job, (5), (6) Task, (7) JAR, (8) Task, (9) Job.]

Job
(1) Job JobTracker JobID
JobTracker
JobID
(2) (Job )HDFS Job
JobTracker Job
Job JAR
10-1 NameNode
(3) Job JobTracker Job Job
JobTracker
(4) JobTracker Job Job

(5) (JobTracker )HDFS


Task Task
TaskID
(6) JobTracker TaskTracker Task Task tasktracker
Heartbeat
(7) (TaskTracker )HDFS Job JAR
JAR Task
(8) TaskTracker JobTracker Task Task
TaskTracker Task
(9) Job JobTracker Job Job
JobTracker Job
Job Job JobTracker
JobTracker Job

Hadoop

10.2

10.1

Hadoop



(10.3)
(10.4)

(10.5)

10.3

Hadoop

10-3 NameNode

LAN

namenode

Hadoop

10-3
10.1 HDFS namenode
LAN DataNode NameNode
namenode LAN
3 1 HDFS
3
JobTracker 3

10.3.1
Hadoop

Hadoop

10.3.2
Hadoop

10.3.3
-

10.4

10.2

3 10-1
10-1
No.

HA


No.

FT

CPU

OS

FT

10.2 FT
HA FT
2
Heartbeat DRBD HA
HA Kemari FT

10.4.1
HA FT

10.4.1.1 Heartbeat
Heartbeat HA Heartbeat

Heartbeat
1 1 N 1






Heartbeat 10-4 /
heartbeat

[Figure 10-4 Heartbeat. Diagram labels: (NIC), Heartbeat, heartbeat, LAN.]

10.4.1.2 Kemari
Kemari
Kemari
HA

Kemari
Xen 2
LAN
I/O I/O

CPU
Kemari 10-5

[Figure 10-5 Kemari. Diagram labels: Heartbeat, LAN, OS.]

10.4.1.3 DRBD
DRBD

DRBD


/


DRBD 10-6
DRBD

/ RAID


[Figure 10-6 DRBD. Diagram labels: LAN, drbd.]

10.4.2 Heartbeat DRBD HA


Heartbeat HA

10.2

DRBD Heartbeat DRBD HA


10-7


[Figure 10-7 HA. Diagram labels: namenode, Hadoop, (1) Heartbeat, (2) DRBD (drbd), (3) Heartbeat + LAN, Heartbeat/LAN.]
10.3
(1) Heartbeat Hadoop

(2) DRBD Hadoop

(3) Heartbeat LAN

NameNodeJobTracker 10-7

10.4.3 HA Kemari FT
10.4.2 HA Kemari FT
HA
FT
Kemari FT Hadoop

Kemari FT 10-8


[Figure 10-8 FT. Diagram labels: namenode, Hadoop, OS, (VM), (1), (2) Kemari, (3) Heartbeat, (4) DRBD (drbd), (5) Heartbeat + LAN, Heartbeat / Kemari LAN.]
10.3
(1) Hadoop
(2) Kemari

(3) Heartbeat Kemari

(4) DRBD
Hadoop

(5) Heartbeat LAN

NameNodeJobTracker 10-8

10.5

10.4 HA FT


10.5.1 HA
Heartbeat DRBD HA

10.5.1.1
HA 10.3
10-2
10-2 (HA )
No.

Hadoop Heartbeat

DRBD
2

DRBD
(
)

IP

Heartbeat

Bonding()

() Bonding

DRBD Heartbeat /

DRBD
Heartbeat 10-9


[Figure 10-9 (HA). Diagram labels: IP, Hadoop, DRBD.]

10.5.1.2
HA

Hadoop
Heartbeat Hadoop
(RA)

HA LAN

Heartbeat 3

VIPCheck IP
SFEX
STONITH
STONITH
STONITH
OS HP
iLO2IBM IMM
10.2
OS ssh STONITH


STONITH
STONITH 10-10

[Figure 10-10 STONITH. Diagram labels: LAN, Heartbeat, heartbeat, steps 1. to 5.]

(STONITH )

10.5.1.3
HA

DRBD
10-3 HA DRBD


10-3 DRBD (HA )


No.

Protocol C

10.5.1.1 DRBD Heartbeat


Heartbeat

Hadoop

DRBD

Heartbeat
3 ProtocolC
ProtocolC

ProtocolA:TCP

ProtocolB:

ProtocolC:

10.5.1.4
HA

LAN
Heartbeat LAN


LAN
IP

LAN
LAN 2
bonding
Heartbeat
Heartbeat

10.5.1.5
HA

10.5.1.6
10-4
10-11


[Figure 10-11 (HA). Diagram labels: namenode, Hadoop, (1) Heartbeat (heartbeat), (2) DRBD (drbd), (3) Heartbeat + LAN, Heartbeat/LAN, items 3, 4, 5.]
10-4 (HA )
No.

LAN

LAN

Heartbeat LAN

10.5.1.7
HA

Hadoop TeraSort
10-5


10-5 (HA )
No.

LAN

LAN

Heartbeat LAN

10.5.2 FT
HA Kemari FT

10.5.2.1
FT 10.3
10-6
10-6 ( FT )
No.

Kemari

Heartbeat
Kemari

DRBD

Heartbeat

Bonding


10.5.2.2
FT

Kemari
Kemari FT Kemari
( Kemari RA)Heartbeat Kemari RA

(1) Kemari RA
(2) Heartbeat RA Kemari
(3) Kemari
pause
(4) Kemari
(5) NameNode/JobTracker

[Figure 10-12 FT. Diagram labels: NameNode, OS (Dom-U), OS (Dom-0), Xen, Kemari (xc_kemari_save, xc_kemari_restore), Kemari RA, Heartbeat, DRBD, (VM).]


(1) Heartbeat
(2) Kemari RA
(3) Gratuitous ARP MAC
(4) Kemari
(5) Gratuitous ARP MAC
(6)
Kemari FT 10-13

[Figure 10-13 FT. Diagram labels: NameNode, OS (Dom-U), OS (Dom-0), Xen, Kemari (xc_kemari_save, xc_kemari_restore), Kemari RA, Heartbeat, DRBD, (VM).]
IP
ARP Heartbeat
Gratuitous ARP

10.5.2.3
FT

DRBD
Kemari Xen Xen
()/
()

10.2
DRBD DRBD
10-7
10-7 DRBD ( FT )
No.

Protocol C

Heartbeat //

DRBD
DRBD Heartbeat
OS

10.5.2.4
FT

LAN
Heartbeat LAN
LAN

LAN
Kemari FT 3

Heartbeat:
DRBD:
Kemari:
Hadoop NameNode/JobTracker I/O I/O

LAN 10-14

10-14
LAN

LAN LAN

LAN 10GNIC
1GNIC
LAN Bonding

3 LAN
Kemari RA
LAN 3

10.5.2.5
HA

Xen
Kemari Xen Xen Kemari
FT Xen
(1)
Kemari FT


CPU (Intel-VT, AMD-V)

PV PV

I/O

(2)

NameNode/JobTracker CPU
I/O
Kemari CPU
CPU 1
NameNode HDFS HDFS

JobTracker MapReduce

NameNode/JobTracker 2.5GB
8G

10.5.2.6
10-8
10-15


[Figure 10-15 (FT). Diagram labels: namenode, OS, (VM), (2) Kemari, (3) Heartbeat (heartbeat), (4) DRBD (drbd), (5) Heartbeat + LAN, Heartbeat / Kemari LAN, items 3, 4.]
10-8 ( FT )
No.

LAN

LAN

Heartbeat LAN

10.5.2.7 Kemari
FT

Hadoop TeraSort
10-9


10-9 ( FT )
No.

LAN

LAN

Heartbeat LAN

10.6

HA Kemari FT

10.6.1 HA
Heartbeat NameNode JobTracker /

JobTracker
NameNode HDFS DataNode
Safemode Safemode
NameNode

10.6.2 FT
Kemari
Hadoop


FT
10-10 Kemari

Kemari 10-10

Netperf: LAN

NNBench: NameNode
TeraSort: Hadoop
Kemari

10-10 Kemari
No.  Benchmark                                   Without Kemari   With Kemari
1    Netperf (LAN)                               750 (Mbps)       20 (Mbps)
2    NNBench (NameNode)                          203 (s)          2393 (s)
     [Map: 4, : 1 [byte], : 1 [byte], : 5000]
3    Terasort (Hadoop) [: 10G]                   17.5             16

Kemari FT I/O
LAN
10-10 1/40


NNBench
10 Kemari
64MB(Hadoop ) Terasort

Hadoop MB
Hadoop

10.6.3
FT OS
Hadoop
Hadoop

FT
Hadoop

FT CPU

Kemari FT


11 Hadoop

11 Hadoop
Hadoop Hadoop

Hadoop
12
13

11.1

Hadoop
Hadoop

Hadoop

11.1.1 Hadoop

Hadoop

11-1


11-1 Hadoop

Hadoop
Hadoop Hadoop
Hadoop NameNode, JobTrackerHadoop

Hadoop
Hadoop

11-2


11-2 Hadoop
Hadoop 11-1

11-1 Hadoop
No.

Hadoop

Hadoop

Hadoop

11.1.2 Hadoop
Hadoop Hadoop
Hadoop
Hadoop
Hadoop

11.3 11.4
11.5


11.1.3
11-2

11-2
No.

OS

Hadoop

Hadoop

12
Hadoop
Hadoop HDD NIC

Hadoop
Hadoop

Hadoop


Hadoop
Hadoop

Hadoop
Hadoop
Hadoop
Hadoop
Hadoop


11.2

11.2.1
Hadoop 11-3
11-3
No.

24H/365D.

9:00-17:00

11.2.2 Hadoop
11-3 Hadoop

[Figure 11-3 Hadoop. Network diagram labels: Job, L3, L2, NameNode, JobTracker, Hadoop (DataNode/TaskTracker), Hadoop 100. Server groups and counts as listed: Core2 Duo 40, Xeon QuadCore 12, Xeon QuadCore 18, Xeon DualCore 16, Core2 Duo 10.]

11.3

Hadoop
Hadoop

11.3.1
Hadoop Hadoop

Hadoop

11.3.2
OS

11-4
11-4
No.

L3

Hadoop Hadoop

L2

Hadoop

Hadoop

NameNode, JobTracker

Hadoop

DataNode, TaskTracker

Hadoop
Hadoop Hadoop


Hadoop
Hadoop

11.3.3

Hadoop

Hadoop Hadoop
Hadoop

Hadoop
11-5
11-5
No.

Hadoop

Hadoop

Hadoop

Hadoop

Hinemos, Nagios

SNMP SNMP-TRAP


Hadoop
Hadoop Hadoop

20 60
Hadoop 120
1 40

Hadoop
Hadoop

11-6
11-6
No.

Hadoop

MapReduce

3
4

HDFS

DataNode

HDFS

TaskTracker


11.3.4
Hadoop Hadoop

CPU
Hadoop
/

11-7
11-7
No.

Hadoop
10 100 Hadoop
1000

Hadoop


11-8

11-8
No.

Hadoop

Hadoop

Hadoop

Hadoop

HDFS

HDFS HDFS

Hadoop

10

11

HDFS

HDFS

11.3.5
Hadoop
Hadoop

Hadoop

Hadoop
Hadoop


Hadoop
11-9
11-9
No.

L3

2
3

L2

4
5

Hadoop

OS

NIC

Hadoop

OS

NIC

OS

NIC

10

11-10

11-10
No.


11.4

11.4.1
10
11-11
11-11 Hadoop
No.

Hadoop

Hadoop

Hadoop

10

Hadoop

11-12
11-12

No.

( 11-11 No)

Hadoop

1,9,10

Hadoop

Hadoop

Hadoop

1,4,5,6

Hadoop

7,8,


11.4.2 Hadoop
Hadoop

Hadoop RedHat Enterprise Linux


Kickstart
Puppet 12
Puppet 11.4.5

11.4.3 Hadoop
Hadoop Hadoop Hadoop
Hadoop

Ganglia Hadoop
12

11.4.4 Hadoop
Hadoop Hadoop Hadoop

Hadoop
Hadoop
Hadoop
11-13
11-13 Hadoop
No.

Hadoop

Hadoop


Ganglia
Ganglia 12

Ganglia Ganglia
gmond

gmond

Ganglia
Ganglia 11-13
Ganglia
gmond (XML)
11-4 11-14

[Figure 11-4 Ganglia. Diagram labels: gmond (one per node).]
11-14 Ganglia
No.

Ganglia metric

HDD

metric gmond

Ganglia

3
4

gmond

gmond


No.

Hadoop Ganglia
Hadoop
11-5

11-5 Ganglia
r7-1-0-01
7

11.4.5 Hadoop
Hadoop

Hadoop
OS Hadoop 30
Hadoop
10 100 Hadoop

100

Puppet
Puppet

11-6 Puppet


[Figure 11-6 Puppet. Diagram labels: Puppet, (puppetrun), OS, Ganglia, Hadoop NameNode, Hadoop DataNode, Hadoop, CPU/.]
Puppet 11-5
11-15 Puppet
No.

Puppet

push

md5

Puppet Puppet
Hadoop


MapReduce HDFS ...


Hadoop Puppet

facter, puppetrun
Puppet
15.4.9

11.4.6 Hadoop
Hadoop
Puppet Puppet
Hadoop
11-16

11-16
No.

Puppet
Puppet
Hadoop
Puppet

11-7


11.5

Hadoop
11.3 Hadoop 11.4

11.5.1
11-17
No6
11-17
No.

L3

L2

Hadoop

Hadoop

11.4 Hadoop

11.5.2
11-18
11-18
No
1

SNMP

2
3

Hadoop

Hadoop

14.4 Hadoop

11.4 Hadoop
JobTracker


No

NameNode

11-8 Hadoop JobTracker


JobTracker

11-8 Hadoop
11-9 r7-1-0-01
7

11-9 Hadoop

11.5.3
11-19
11-19
No.

NIC eth0 MAC

3
4

11.4 Hadoop


No.
5

11.4 Hadoop

11.4 Hadoop

11-20
11-20
No.

Hadoop 11.4 Hadoop

Hadoop

Hadoop 11.4 Hadoop

Hadoop

Hadoop

Hadoop

HDFS

NameNode HDFS

OS

OS

11.4

Hadoop

10
11

11.4 Hadoop

HDFS

SecondaryNameNode JobTracker


11.5.4
11-21 11-22
11-21
No.
1

L3

OS

2
3

L2

4
5

Hadoop

Hadoop

OS

OS

10

11-22
No.


11.6

11.6.1
Hadoop
Hadoop

Hadoop
Hadoop
Hadoop
Hadoop

11.6.2
Hadoop
Hadoop Hadoop
Hadoop

Hadoop


12 Hadoop

12 Hadoop
Hadoop

Hadoop
RedHat Enterprise
Linux Kickstart
Puppet

12.1

Hadoop Hadoop

12.1.1 Hadoop
Hadoop Hadoop NameNode, JobTracker
Hadoop DataNode, TaskTracker 12-1
Hadoop
12-1 100 Hadoop
No.

Hadoop

NameNode , JobTracker

Hadoop

DataNode, TaskTracker

96

Hadoop

IP

Hadoop
Hadoop
Hadoop 96
Hadoop
1000 Hadoop

80
Hadoop 1000


Hadoop L2
Hadoop Hadoop
Hadoop

12.1.2 Hadoop
Hadoop
Hadoop
OS

Hadoop IA
Hadoop

Hadoop

Hadoop
12-2

12-2 Hadoop
No.

Hadoop


12.2

12.2.1
12-1

[Figure 12-1. Network diagram labels: Job, L3, L2, NameNode, JobTracker, Hadoop (DataNode/TaskTracker), Hadoop 100. Server groups and counts as listed: Core2 Duo 40, Xeon QuadCore 12, Xeon QuadCore 18, Xeon DualCore 16, Core2 Duo 10.]

L3

L3 DHCP

12.2.2
Hadoop 12.1
Hadoop
Hadoop


Hadoop

12.2.3
OS

12.3

12.3.1 Hadoop
Hadoop Hadoop
12-2
Hadoop

Hadoop

Hadoop
12-3
12-3
No.


No.

12.3.2
HPC
HPC
Hadoop HPC

Hadoop

Kickstart
Kickstart+ Puppet
Kickstart
Puppet
rocks
Kickstart

OSCAR
Kickstart GUI
Kickstart


12-4
12-4
No.

Kickstart+Puppet

rocks

OSCAR

OS

GUI

GUI

GUI

Roll

Kickstart
rocks OSCAR

Rocks, OSCAR


Kickstart Puppet

12.4

Kickstart puppet
KickStart

DNS DHCP
TFTP
HTTP
Puppet

12.4.1 Kickstart
Kickstart

Kickstart

OS

Kickstart
OS
MAC

IP ,

96 MAC
Kickstart


12.4.2 Kickstart
Kickstart 12-5 12-6
DHCP DNS 12.4.8

12-5 Kickstart
No.

ON

PXE

DHCP IP

DHCPDISCOVER

DHCP

TFTP

OS

OS

IP .

HTTP
(Kickstart )

Kickstart
OS

Kickstart
12-6 Kickstart
No.

IP

PXE

DHCP

PXE

TFTP

OS
1
3

OS

OS

HTTP

4
5

OS

Kickstart

OS

12-8

HTTP
HTTP

12 Hadoop

No.

12.4.3 Puppet
Puppet
Ruby Puppet
12-2


[Figure 12-2 Puppet. Diagram labels: Puppet, (puppetrun), OS, Ganglia, Hadoop NameNode, Hadoop DataNode, Hadoop, CPU/.]

12.4.4 Kickstart Puppet


Hadoop RedHat Enterprise Linux
Kickstart
Puppet

12-7 Kickstart No1


No2


12-7 Kickstart
No
1

1-1

1-2

2
2-1

Hadoop

1-1 Hadoop
Kickstart
IP IP

1-2 IP

IP 192.168.3.40

2-1 Hadoop

CPU

12.4.5
1-1

Kickstart IP MAC
IP


NIC MAC
MAC Kickstart IP



DNS
MAC

NIC MAC
NIC

MAC IP
DHCP hosts

Kickstart IP
IP
IP
DNS

12-8 MAC
DDNS
12-8
No.
1
2

MAC

DDNS

hosts

DNS A

DHCP

BIND A

hosts
3

MAC


No.
5

MAC

DDNS
MAC

DNS

12.4.6
1-2
IP Hadoop

12-9
No.
1

rack1-13u.example.net

rack1switch-port13.example.net

Hadoop

12-10


12-10
No.

MAC

Hadoop L2 mac-address-table(MAC
)
MAC

IP DNS
3
1/0/12

DNS IP A

# /root/scripts/myhostname
r3-1-0-12.example.net
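The sample output above suggests that a node's host name is built from its rack number and the switch port it is connected to (rack 3, port 1/0/12 gives r3-1-0-12). The sketch below shows that mapping under stated assumptions: the naming rule is inferred from this one example and from host names such as r6-1-0-01, and the function name and parameter names are hypothetical.

```python
def hostname_for(rack, switch_port, domain="example.net"):
    """Build a host name such as r3-1-0-12.example.net.

    rack:        rack number (3 in the example above)
    switch_port: switch port in "x/y/z" form (for example "1/0/12")
    The last port component is zero-padded to two digits, as in r6-1-0-01.
    """
    parts = switch_port.split("/")
    if len(parts) != 3:
        raise ValueError("expected a port of the form x/y/z: %r" % switch_port)
    first, second, port = (int(p) for p in parts)
    return "r{}-{}-{}-{:02d}.{}".format(rack, first, second, port, domain)


if __name__ == "__main__":
    print(hostname_for(3, "1/0/12"))
    print(hostname_for(6, "1/0/1"))
```

Deriving the name from the physical location means a replaced server in the same rack position comes up under the same host name.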

12-11
12-11
No.

12.4.7
12-7 2-1


12-12
No.

(i.e. /dev/sda, /dev/cciss/c0p0)

CPU Hadoop

MapReduce

12-13 OS Kickstart
12-14
Puppet facter
12-13 OS

No.

OS
(%pre )
OS

(%include
)
12-14
No.

(CPU //)
Puppet facter

Hadoop

12.4.8
12-7 1-1, 1-2, 2-1
Kickstart DNS
DHCP TFTP HTTP Puppet


DNS
NW
example.net.NW

DHCP DNS
DNS
Hadoop
DHCP DynamicDNS

DHCP
DHCP
TFTP DHCP
dhcpd DHCP
DHCP Hadoop
L3 DHCP

DHCP IP
Hadoop IP
IP DHCP
DNS DHCP

IP IP

DHCP DNS A

TFTP
tftpd TFTP
RedHat Enterprise Linux syslinux
TFTP


HTTP
HTTP Apache

HTTP
OS

HTTP Ganglia Nagios


Web

Kickstart
Kickstart Kickstart

Linux
Kickstart %pre

%post puppet

Puppet
Puppet
Ruby Puppet

Hadoop
Hadoop

12-15
No.  Service  Software  Type
1    DNS      BIND      OSS
2    DHCP     dhcpd     OSS
3    TFTP     tftpd     OSS
4    HTTP     Apache    OSS
5    Puppet   Puppet    OSS

12-3


[Figure 12-3 Hadoop. Diagram labels: ON, IP, OS, DHCP, TFTP, HTTP, Puppet, DNS.]

12.4.9 Kickstart Puppet


Kickstart OS
Kickstart
Puppet Kickstart
Puppet

12.4.9.1 Kickstart Puppet


Kickstart OS Puppet

Puppet Kickstart
Kickstart
12-16 Kickstart
No.
1

2
3

OS

OS

Hadoop Puppet OS


No.

DNS

Puppet

Puppet

Puppet

Puppet

6
Puppet

12.4.10 Puppet
Puppet
12-7 2-1
Puppet

Puppet
12-17 Puppet
No.

common

common

OS
NTP cron
OS

facter

hadoop

hadoop

Hadoop

namenode/jobtracker

ganglia

gmond

gmond


No.

gmetad

gmetad

web

ganglia

Puppet
Puppet
manifest
facter
facter

12-18 facter
facter

No.

racknum

diskcount

mygmetad metad

disklist

12.5

12.5.1 Hadoop
100 Hadoop 50

12-19


12-19
No.
1

46

0.75

1 4

SAS
(72300GBx2)

50

1.75

1 5

SATA
(250GBx2)

12-19 No2

12-20 50

12-20

Host        Start   End     Elapsed
r6-1-0-01   11:33   12:42   1:09
r6-1-0-02   11:30   12:42   1:12
r6-1-0-03   11:33   12:42   1:09
r6-1-0-04   11:30   12:42   1:12
r6-1-0-05   11:33   12:42   1:09
r6-1-0-06   11:30   12:42   1:12
r6-1-0-07   11:33   12:42   1:09
r6-1-0-08   11:30   12:42   1:12
r6-1-0-09   11:33   12:42   1:09
r6-1-0-10   11:30   12:42   1:12
r7-1-0-01   11:57   13:03   1:06
r7-1-0-02   11:54   13:03   1:09
r7-1-0-03   11:57   13:03   1:06
r7-1-0-04   11:54   13:02   1:08
r7-1-0-05   11:57   13:03   1:06
r7-1-0-06   11:54   12:57   1:03
r7-1-0-07   11:57   12:57   1:00
r7-1-0-08   11:54   12:57   1:03
r7-1-0-09   11:57   12:57   1:00
r7-1-0-10   11:54   12:56   1:02
r7-2-0-01   11:45   13:06   1:21
r7-2-0-02   11:42   13:05   1:23
r7-2-0-03   11:45   13:06   1:21
r7-2-0-04   11:42   13:05   1:23
r7-2-0-05   11:45   13:05   1:20
r7-2-0-06   11:42   13:00   1:18
r7-2-0-07   11:45   13:00   1:15
r7-2-0-08   11:42   13:00   1:18
r7-2-0-09   11:45   13:00   1:15
r7-2-0-10   11:42   13:00   1:18
r7-2-0-11   11:39   12:48   1:09
r7-2-0-12   11:36   12:46   1:10
r7-2-0-13   11:39   12:48   1:09
r7-2-0-14   11:36   12:45   1:09
r7-2-0-15   11:39   12:48   1:09
r7-2-0-16   11:36   12:46   1:10
r7-2-0-17   11:39   12:49   1:10
r7-2-0-18   11:36   12:45   1:09
r7-2-0-19   11:39   12:49   1:10
r7-2-0-20   11:36   12:46   1:10
r7-1-0-11   11:51   12:54   1:03
r7-1-0-12   11:48   12:50   1:02
r7-1-0-13   11:51   12:54   1:03
r7-1-0-14   11:48   12:51   1:03
r7-1-0-15   11:51   12:55   1:04
r7-1-0-16   11:48   12:52   1:04
r7-1-0-17   11:51   12:55   1:04
r7-1-0-18   11:48   12:51   1:03
r7-1-0-19   11:51   12:55   1:04
r7-1-0-20   11:48   12:52   1:04


[Figure 12-4. Histogram for the 50 nodes. X-axis: [HH:MM], 0:50 to 1:40. Y-axis: 0 to 16.]

12.5.2

Hadoop
Hadoop
12-21 12-2

12-21 Hadoop
No.

Hadoop

Hadoop Kickstart Puppet


100 Hadoop 3
12-21 No.1

12-21 No.2


12-21 No.3 Hadoop

Kickstart Puppet

Hadoop
Hadoop
Hadoop Hadoop
100
1000
10


13 Hadoop

13 Hadoop
Hadoop

Hadoop

Ganglia JobTracker
WebUI
100 Hadoop

13.1

Hadoop

13.1.1

Hadoop

Hadoop
Hadoop

13.1.2
Hadoop 3

(1)
(2)(3)
(1)~(3)


(1)

Hadoop

Hadoop

(2)

Hadoop

(3)
1 1

Hadoop 1 1


13.2

13.2.1

Namenode

JobTracker
NameNode
Namenode

JobTracker
JobTracker

Hadoop (DataNode/TaskTracker)

13-1
13-1

Hadoop

13.3

13.1


13.3.1 1Hadoop

Hadoop

13-1
13-1
No.
1

2
3

Hadoop

Hadoop

13-1

13.3.1.1 Hadoop
Hadoop MapReduce MapReduce
3 MapReduce
MapReduce MapReduce
MapReduce
MapReduce Map Reduce
4
MapReduce Map Reduce
MapReduce Map Reduce Map
Reduce MapReduce Map
MapReduce Reduce
Hadoop MapReduce Hadoop
speculative execution MapReduce

MapReduce


Hadoop
13-2
13-2 Hadoop
No.

MapReduce

MapReduce

MapReduce

MapReduce Map

MapReduce Reduce

MapReduce Map

7
8

MapReduce

MapReduce

Reduce

MapReduce Map

JobTracker

MapReduce

Reduce
10

MapReduce Map

11

MapReduce
Reduce

13.3.1.2

Hadoop

CPU

CPU CPU

system, user, iowait

system
user iowait CPU I/O
I/O

swap-in swap-out
Used, Cached, Buffered, Swapped

I/O

NIC
NIC
13-3

13-3
No.

CPU

NameNode

CPU system, user

JobTracker

iowait

Hadoop

MapReduce

Client

NameNode

Used

Cached, Buffered, Swapped JobTracker

swap-in

Hadoop

swap-out

MapReduce
Client

No.

NameNode

JobTracker
Hadoop
MapReduce
Client

NameNode

bytes received

JobTracker

Hadoop

bytes sent

MapReduce
Client

Hadoop
NameNode, JobTracker JVM
FullGC

HDFS HDFS

HDFS

Hadoop
13-1

13-2

:
()

13-2
Hadoop 13-4
13-4 Hadoop
No.

JVM

Heap New

NameNode

Heap Old

JobTracker

Heap Permanent
FullGC
2

HDFS

HDFS

NameNode

UnderReplicatedBlocks ( NameNode

)
MissingBlocks

HDFS

CorruptBlocks (

NameNode

NameNode

13.3.1.3 Hadoop
Hadoop Hadoop

13-5 Hadoop
No.

Hadoop

Hadoop

Hadoop

13.3.1.4
MapReduce MapReduce
13-2
13-3~ 13-4

13.3.1.5 1 Hadoop
Hadoop 13-2
13-5

13.3.2 2

13.3.2.1
Hadoop

13-3

13-4

1 1

13-5

13.3.2.2 2

13.3.3 3
Hadoop

Hadoop

13.3.3.1 Hadoop

Hadoop Hadoop
Hadoop 1
Hadoop

Hadoop
Hadoop 1

Hadoop

Hadoop Hadoop
Hadoop
13-6

13-6

13-6
Hadoop 1 13.2

Hadoop MapReduce
MapReduce Hadoop
MapReduce
Hadoop Hadoop

MapReduce 2

13.3.3.2 Hadoop
Hadoop JobTracker, NameNode
Domain- 14
12.1.1
1
13.3.3.3 3
13-6
13-6
No.

Hadoop

Hadoop MapReduce

13.3.4
13.3.1 13.3.3

13.3.4.1
1Hadoop 13-2
13-5
2

3
Hadoop
MapReduce Hadoop
2
13-7
13-7
No.

13-2 13-5

Hadoop MapReduce

No.

Hadoop
13-2 13-7 JobTracker
WebUI JobTracker WebUI Hadoop

No.2 No.3 MapReduce


No.4 MapReduce

13-7 JobTracker WebUI
( 13-3
13-4)Hadoop 13-5

13.3.4.2
Ganglia, Munin, Cacti
13-7 4

1 Ganglia, Munin, Cacti

2 Ganglia

Ganglia

1
Ganglia

3 Ganglia, Munin, Cacti
MapReduce
MapReduce
Ganglia, Munin, Cacti
4 Ganglia
Cacti, Munin

13-8
13-8
No.

Ganglia

Munin

Cacti

13-2 13-5

MapReduce

Ganglia

13.3.4.3 Ganglia
Ganglia

(Diagram: Client, WebFrontend, gmetad with HDD storage, and multiple gmond daemons)

13-8 Ganglia
13-8 Ganglia
gmond

gmetad gmond

WebFrontend gmetad

13.3.4.4 Ganglia
Ganglia CPU
1

Ganglia 2
Hadoop Ganglia
gmetric

Hadoop Ganglia
Hadoop Ganglia MapReduce HDFS Hadoop

Hadoop Ganglia (https://issues.apache.org/jira/browse/HADOOP-4675) Hadoop

gmetric
gmetric ganglia-gmond rpm

gmetric
gmond gmond
13-9

(Diagram: cron, gmetric, and multiple gmond daemons)

13-9 gmond
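The section above pairs gmetric with cron and gmond as the route for metrics that Ganglia does not collect on its own, such as swap-in and swap-out. The following is a minimal sketch of a script that cron could run periodically, assuming the counters are read from `/proc/vmstat` on Linux and that the `gmetric` command is on the PATH. The metric names `swap_in` and `swap_out` and all function names are hypothetical and do not come from the original.

```python
import subprocess


def parse_vmstat(text):
    """Return the pswpin/pswpout counters from /proc/vmstat-style text."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("pswpin", "pswpout"):
            counters[parts[0]] = int(parts[1])
    return counters


def gmetric_command(name, value, units, metric_type="uint32"):
    """Build the gmetric argument list for a single metric value."""
    return [
        "gmetric",
        f"--name={name}",
        f"--value={value}",
        f"--type={metric_type}",
        f"--units={units}",
    ]


def publish_swap_counters(vmstat_path="/proc/vmstat"):
    """Read the swap counters and hand each one to gmetric."""
    with open(vmstat_path) as f:
        counters = parse_vmstat(f.read())
    # The kernel counters are cumulative since boot; a production script
    # would probably send the difference between two runs instead.
    for key, metric in (("pswpin", "swap_in"), ("pswpout", "swap_out")):
        subprocess.run(gmetric_command(metric, counters[key], "pages"), check=True)
```

Calling `publish_swap_counters()` from a cron entry would push both values to the local gmond. Command construction is kept separate from execution so it can be checked without Ganglia installed.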

13.3.4.5 Ganglia
13.3.4.4 1

Ganglia
php conf.php $optional_graphs

php conf.php
13-10

13-10

host_extra.tpl
host_extra.tpl

13-9
13-9
No.

Heap New

Heap

No.

Heap Old

Heap Permanent
4
5

swap-inout

swap-in

swap-out

13.3.5
1 3 Ganglia
2

13.3.5.1 1 3 Ganglia
Hadoop
13.3.1
13-10 Hadoop

Hadoop 13.3.3 Ganglia


JobTracker WebUI MapReduce

13-10
No.

MapReduce

MapReduce

JobTracker WebUI

MapReduce

JobTracker WebUI

MapReduce

No.

MapReduce

MapReduce

JobTracker WebUI

MapReduce

JobTracker WebUI

MapReduce

JobTracker WebUI

MapReduce

JobTracker WebUI

MapReduce

JobTracker WebUI

MapReduce

JobTracker WebUI

MapReduce

JobTracker WebUI

MapReduce

JobTracker WebUI

MapReduce

JobTracker WebUI

Ganglia

Ganglia

Ganglia

Ganglia

Ganglia

Ganglia

Ganglia

Ganglia

MapReduce
Map

MapReduce
Reduce

MapReduce
Map

MapReduce
Reduce

MapReduce
Map

MapReduce
Reduce

10

MapReduce
Map

11

MapReduce
Reduce

12

13

CPU
System, User, iowait

14
15

16

17

Used, Cached, Buffered, Swapped
18

swap-in

19

swap-out

No.

Ganglia

Ganglia

Ganglia

20

21

bytes received
22

bytes sent
23

Heap New

Ganglia

24

Heap Old

Ganglia

25

Heap Permanent

Ganglia

26

FullGC

Ganglia

27

Ganglia

Ganglia

28

29

HDFS

Ganglia

30

UnderReplicatedBlocks

Ganglia

31

MissingBlocks

Ganglia

32

CorruptBlocks

Ganglia

33

JobTracker WebUI

( 13-3~ 13-4)Hadoop
13-5

Hadoop MapReduce
13-11

13-11 Hadoop MapReduce


320KBytes/sec (2.56Mbits/sec)

13.3.3 Hadoop

13-12

JobTracker WebUI MapReduce


13-7

13-12 Hadoop Ganglia


13-11
No.

Hadoop

Hadoop
MapReduce

Ganglia
13-9 Ganglia 13-10

1.

2.

3.Heap

4.swap-inout

5.

6.

13-13 Ganglia
JobTracker WebUI Ganglia
Hadoop
1
2
3
Hadoop

13.3.5.2
Hadoop Hadoop
Hadoop

100 WAIT CPU 25%


( 13-14) gmetad
gmond I/O
Hadoop gmond

I/O

13-14 CPU
10, 26, 48, 66, 88, 96, 105
CPU
Hadoop 200

13-15 13-12

13-15 CPU (chart of CPU usage in %, 0 to 100, by number of nodes: 0, 10, 26, 48, 66, 88, 96, 105; series: WAIT CPU, System CPU, Nice CPU, User CPU, Idle)

13-12 CPU (values in %)

Nodes  WAIT CPU  System CPU  Nice CPU  User CPU  Idle      WAIT CPU/node
0      1.160761  1.041576    0.000054  4.151413  93.64630  -
10     2.679405  2.284270    0.037351  13.59621  81.40313  0.267941
26     6.125225  1.852921    0.000618  6.108033  85.91297  0.235586
48     11.54683  2.636881    0.000990  5.971237  79.84410  0.240559
66     15.88108  2.928797    0.000633  6.108101  75.08151  0.240622
88     22.68472  3.345284    0.001136  6.439261  67.52965  0.257781
96     24.52599  3.175769    0.001099  6.010329  66.28692  0.255479
105    26.91957  3.889626    0.038984  6.484171  62.66791  0.256377

WAIT CPU
0.25%WAIT CPUWAIT CPU
10% CPU 90 / 0.25 = 360 CPU
Idle
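The estimate above (90 / 0.25 = 360) appears to rest on WAIT CPU growing by roughly 0.25 percentage points per monitored node. The following is a minimal sketch of that arithmetic using the WAIT CPU measurements from the table, assuming the growth stays linear beyond the measured range. The function names are chosen for illustration only.

```python
# Measured WAIT CPU (%) keyed by number of monitored nodes (table 13-12).
wait_cpu = {
    10: 2.679405,
    26: 6.125225,
    48: 11.54683,
    66: 15.88108,
    88: 22.68472,
    96: 24.52599,
    105: 26.91957,
}


def wait_per_node(samples):
    """WAIT CPU divided by node count for every measurement."""
    return {nodes: wait / nodes for nodes, wait in samples.items()}


def max_nodes(budget_percent, per_node_percent):
    """Number of nodes at which WAIT CPU alone would use up the budget."""
    return int(budget_percent / per_node_percent)


ratios = wait_per_node(wait_cpu)
mean_ratio = sum(ratios.values()) / len(ratios)

print(round(mean_ratio, 3))   # average WAIT CPU per node, close to 0.25
print(max_nodes(90, 0.25))    # 360
```

The per-node ratios stay between about 0.24 and 0.27 across the measurements, which is what makes a linear extrapolation plausible. Whether it holds up to several hundred nodes is not something the data can confirm.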

13.4

13.4.1
Hadoop

Hadoop

Hadoop
Ganglia
Ganglia
Hadoop Ganglia
Hadoop
Ganglia
100 Hadoop

13.4.2
3

13.4.2.1
13-10 CPU
MapReduce
Ganglia MapReduce
MapReduce

13.4.2.2 I/O
13.3.5
gmetad I/O
Hadoop Hadoop
Hadoop
gmetad I/O

gmetad I/O

SSD

RAM
PC
SSD Hadoop
I/O

RAM gmetad
I/O RAM RAM

13.4.2.3
Ganglia 1 1 1 1 1 5
2
1

6
2

I.1

I-1

Job
traffic-report.jar

traffic-report.properties

HDFS

I-1
traffic-report.jar
traffic-report.properties

MapReduce 2

I-1

traffic-report.properties

I-1
I-1
No.

InputSplit

InputSplit

Reduce

Reduce

10

Reduce

HDFS
2

I-2

I.2

I.2.1

I.2.1.1
I-2

Hadoop/
Job
L3
L2

NameNode
Namenode

JobTracker
JobTracker

r2

L2

L2

L2

L2

L2

L2

Hadoop
(DataNode/TaskTracker)

r6
10

I-2

I-3

r5
18

r4
16

r3
12

r7
40

I.2.1.2
Hadoop Hadoop

Hadoop
Hadoop I-2
1 r2
I-2 Hadoop
No.

JobTracker

Kemari FT

JobTracker

Kemari

JobTracker

jt

JobTracker

hjt1

Kemari

jt

DL380G5

QC XE5345

jt

QuadCore/2.33GHz x2

32GB
HDD 146GB x2

JobTracker

hjt2

Kemari

jt

DL380G5

QC XE5345

hjt1 jt

QuadCore/2.33GHz x2

32GB
HDD 146GB x2

NameNode

nn

NameNode

Kemari FT

NameNode

Kemari

NameNode
Kemari

hnn1

nn

DL380G5

QC XE5345

nn

QuadCore/2.33GHz x2

32GB

HDD 146GB x2
6

NameNode

hnn2

Kemari

jt

DL380G5

QC XE5345

hnn nn

QuadCore/2.33GHz x2

32GB
HDD 146GB x2

JobClient

job

JobClient

DELL R410
Intel(R) Xeon(R) CPU E5506 2.13GHz x 8
8GB
HDD 13GB

Hadoop
CPU
Hadoop CPU
CPU

NameNode HDFS
NameNode Hadoop
JobTracker Job
CPU
Hadoop

I-3 NameNode
No.

250Byte

150Byte

180Byte


NameNode JobTracker
Hadoop

SAS RAID1


Hadoop
LAN FT

LAN
FT LAN

10Gbps 4

FT 10G NIC

Hadoop

OS
OS
iLO

Hadoop
Hadoop
CPU
I-4
I-4 Hadoop
No.

CPU

HDD

DL380G5

Xeon

8GB

SAS 146GB x 2

r3

XE5345

QuadCore/

2.33GHz x2

DL360G5

Xeon

XX5460

QuadCore/

6GB

SAS 146GB x 2

r3

2GB

SAS 72GB x 2

16

r4

6GB

SAS300GB x 2

18

r5

2GB

SATA 250GB x 2

10

r6

40

r7

3.16G

DL360G5

Xeon

LV DC X5148

DualCore/
2.33G

DL360G6

Xeon

XE 5504

QuadCore/2

1P4C

Express

Core2 Duo

5800

T9400

iR110a-1
Hadoop
CPU

Hadoop
(2010 1 )Intel
Xeon 5500 2GHz 4
CPU
CPU
CPU

spec.org TPC
TPC


Hadoop (Map Reduce )
Java VM
200MB()CPU 1 Map 1
Reduce 1 1

(Map 2 + Reduce 1) x 200MB = 600MB

JavaVM Hadoop
(TaskTracker, DataNode) OS
CPU1 1GB
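The memory estimate above multiplies the number of concurrent task slots by the heap given to each task's Java VM. The following is a minimal sketch of that calculation, assuming each Map or Reduce task runs in its own Java VM with a 200 MB heap, as in the formula. The function name is hypothetical, and memory for the TaskTracker, DataNode and OS is deliberately left out.

```python
def task_heap_mb(map_slots, reduce_slots, heap_per_task_mb=200):
    """Total heap needed when every task slot runs its own Java VM."""
    return (map_slots + reduce_slots) * heap_per_task_mb


# The case from the text: 2 Map slots and 1 Reduce slot.
print(task_heap_mb(2, 1))  # 600
```

Keeping the per-task heap as a parameter makes it easy to see how the requirement changes when the heap setting or the slot counts are adjusted.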


Hadoop Shuffle Reduce
1Gbps
LAN
1 Hadoop
PXE
BIOS BIOS
PXE
PXE
OS

I-2
r2
I-5
No.

HP Compaq dc7800 SFF

E8400

Nagios, Ganglia

Intel(R)Pentium(R)4CPU

3.00GHZ x 2

mg1

2G
HDD 120GB
2

pp1

Puppet DNS HP Compaq dc7800 SFF


DHCP, TFTP

E8400

Intel(R)Core(TM)2 Duo CPU

E8400 3.00GHZ x 2
:2GB

HDD:42GB x1
3

pp2

NEC MATE ME-8 MY30A/E-8


Intel(R) Core(TM)2 Duo CPU
E8400 3.00GHz x2
GB
HDD 120GB

I-6
I-6
No.
1

L3

WS-C3750G

10/100/1000

24

(EtherChannel)

SFP

-24TS-E


4
WS-C3750E

-24TD-S

10/100/1000

24X2

(EtherChannel)

10

2

L2

WS-C3750G
-24TS-E

10/100/1000

(EtherChannel)

24
SFP

4

I-7 -1
No.

Hadoop Hadoop

Gigabit

Hadoop

Telnet SSH

L3

L3 L3

DHCP

IP

IP DHCP IP

DHCP

L3 DHCP

I-8 -2
No.

I-7

I-7

Hadoop Hadoop
1 LAN

L3

I.2.1.3
I-9 OS
CentOS 5.3
I-9
No.

JobTracker

hadoop0.20.1
ganglia-gmond3.1.2
nagios-plugin1.4.14
puppet0.24.8

NameNode

hadoop0.20.1
ganglia-gmond3.1.2
nagios-plugin1.4.14
puppet0.24.8

JobTracker xen3.0.3
JobTracker drbd8.3.2
NameNode heartbeat(2.1.4)

NameNode kemari(v1)
ganglia-gmond3.1.2
nagios-plugin1.4.14
puppet (0.24.8)
4

Job

hadoop0.20.1
ganglia-gmond3.1.2
nagios-plugin1.4.14
puppet0.24.8

puppet-server0.24.8

bind-chroot(9.3.4)
bind-libs(9.3.4)
bind-utils(9.3.4)
bind(9.3.4)
caching-nameserver(9.3.4)
ganglia-gmond3.1.2
nagios-plugin1.4.14
puppet0.24.8
ypbind(1.19)

nagios-3.2.0-1
ganglia-gmetad(3.1.2)
ganglia-web(3.1.1)
ganglia-gmond3.1.2
libganglia3_1_0-3.1.2
nagios-plugin1.4.14
net-snmp(5.3.2.2)
puppet0.24.8

puppet-server0.24.8-1
bind-chroot(9.3.4)
bind-libs(9.3.4)
bind-utils(9.3.4)
bind(9.3.4)
caching-nameserver(9.3.4)
bind 9.3.4
ganglia-gmond3.1.2
nagios-plugin1.4.14

puppet0.24.8
ypbind(1.19-11)

Hadoop

hadoop0.20.1

ganglia-gmond3.1.2
nagios-plugin1.4.14
puppet0.24.8

I-10
I-10
No.

hadoop-0.20.1

Hadoop

http://www.apache.org/dyn/closer.cgi/hadoop/core/
http://issues.apache.org/jira/browse/MAPREDUCE-112
http://issues.apache.org/jira/browse/MAPREDUCE-118
http://issues.apache.org/jira/browse/MAPREDUCE-1182
http://issues.apache.org/jira/browse/HADOOP-5759
https://issues.apache.org/jira/browse/HADOOP-4675
2

BIND

ypbind-1.19-11.el5.x86_64.rpm
bind-chroot-9.3.4-10.P1.el5.x86_64.rpm
bind-libs-9.3.4-10.P1.el5.x86_64.rpm
bind-utils-9.3.4-10.P1.el5.x86_64.rpm
bind-9.3.4-10.P1.el5.x86_64.rpm

CentOS5.3

No.

caching-nameserver-9.3.4-10.P1.el5.x86_64.rpm
3

DRBD

drbd-8.3.2.tar.gz

http://oss.linbit.com/drbd/

Ganglia

ganglia-3.1.2.tar.gz

http://sourceforge.net/projects/ganglia/files/ganglia%20monitoring%20core/

Heartbeat

Heartbeat

Heartbeat

heartbeat-2.1.4-1.rhel5.x86_64.RPMS.tar.gz

http://www.linux-ha.org/wiki/Download/ja

hb-monitor-1.02-1.hb214.x86_64.rpm

http://www.linux-ha.org/wiki/Contrib/ja

Kemari

Kemari

kemari-xen-testing.tar.bz2

http://sourceforge.net/projects/kemari/files/-kemari-v1
Kemari RA
ha-tools.tar.bz2
7

Nagios

nagios-3.2.0.tar.gz

http://www.nagios.org/download/core

Net-SNMP

net-snmp-5.3.2.2-5.el5

CentOS5.3

net-snmp-utils-5.3.2.2-5.el5
net-snmp-perl-5.3.2.2-5.el5
net-snmp-libs-5.3.2.2-5.el5
9

Puppet

puppet-server-0.24.8-1.el5.1.noarch.rpm
puppet-0.24.8-1.el5.1.noarch.rpm
facter-1.5.2-2.el5.noarch.rpm

http://download.fedora.redhat.com/pub/epel/5/x86_64/repoview/letter_p.group.html
10

I.2.2

Xen

kemari-xen-testing.tar.bz2

http://sourceforge.net/projects/kemari/files/

I.2.2.1
I-3

Hadoop

r2

eth1

:
192.168.10.0/24
: 192.168.102.0/24
: 192.168.102.1
: 192.168.102.2 - 192.168.102.50
(IP)

()

eth0

/()

()

eth0

()

Hadoop

10Gbps

eth2
eth2
eth3
eth3

eth0

Hadoop
eth1

JobClient

JobTracker
U27-26
eth1
eth0

JobTracker
U25-24
eth1
eth0

NameNode()
U22-21
eth1
eth0

10Gbps

eth2
eth2
eth3
eth3

eth0

eth0

eth0

:
:
IP:

NameNode
U20-19
eth1
eth0

Member

WS-C3750G-24TS-E

stack

WS-C3750G-24TS-E

: 192.168.107.0/24
: 192.168.107.1

Master

IP:
192.168.107.16 - 192.168.107.128
(DHCP)

WS-C3750G-24TS-E
L3SW

WS-C3750E-24TD-S
L3SW

DL380G5(M8G)
U11-10
DL380G5(M8G)
U13-12
DL380G5(M8G)
U16-15
DL380G5(M8G)
U18-17
DL360G5(QC3.16)
U23
DL360G5(QC3.16)
U25
DL360G5(QC3.16)
U27
DL360G5(QC3.16)
U29
DL360G5(QC3.16)
U31
DL360G5(QC3.16)
U33
DL360G5(QC3.16)
U35
DL360G5(QC3.16)
U37

eth0
eth0
eth0
eth0
eth0
eth0
eth0
eth0
eth0
eth0
eth0
eth0

WS-C3750E-24TD-S
L3SW

DL360G5(DC2.33)
U12
DL360G5(DC2.33)
U13
DL360G5(DC2.33)
U15
DL360G5(DC2.33)
U16
DL360G5(DC2.33)
U18
DL360G5(DC2.33)
U19
DL360G5(DC2.33)
U21
DL360G5(DC2.33)
U22
DL360G5(DC2.33)
U27
DL360G5(DC2.33)
U28
DL360G5(DC2.33)
U30
DL360G5(DC2.33)
U31
DL360G5(DC2.33)
U33
DL360G5(DC2.33)
U34
DL360G5(DC2.33)
U36
DL360G5(DC2.33)
U37

eth0
eth0
eth0
eth0
eth0
eth0
eth0
eth0
eth0
eth0
eth0
eth0
eth0
eth0
eth0
eth0

WS-C3750G-24TS-E
L3SW

DL360G5(DC2.33)
U12
DL360G5(DC2.33)
U13
DL360G5(DC2.33)
U15
DL360G5(DC2.33)
U16
DL360G5(DC2.33)
U18
DL360G5(DC2.33)
U19
DL360G5(DC2.33)
U21
DL360G5(DC2.33)
U22
DL360G5(DC2.33)
U27
DL360G5(DC2.33)
U28
DL360G5(DC2.33)
U30
DL360G5(DC2.33)
U31
DL360G5(DC2.33)
U33
DL360G5(DC2.33)
U34
DL360G5(DC2.33)
U36
DL360G5(DC2.33)
U37

eth0
eth0
eth0
eth0
eth0
eth0
eth0
eth0
eth0
eth0

WS-C3750E-24TD-S
L3SW

Express5800
U30F
Express5800
U30B
Express5800
F31F
Express5800
U31B
Express5800
U32F
Express5800
U32B
Express5800
U33F
Express5800
U33B
Express5800
U34F
Express5800
U34B

eth0

eth1

eth0

eth1

eth0

eth1

eth0

eth1

eth0

eth1

eth0

eth1

eth0

eth1

eth0

eth1

eth0

eth1

eth0

eth1

eth0

eth1

eth0

eth1

eth0

eth1

eth0

eth1

eth0

eth1

eth0

eth1

: 192.168.103.0/24
: 192.168.103.1

: 192.168.104.0/24
: 192.168.104.1

: 192.168.105.0/24
: 192.168.105.1

: 192.168.106.0/24
: 192.168.106.1

eth1

IP:
192.168.103.16 - 192.168.103.128
(DHCP)

IP:
192.168.104.16 - 192.168.104.128
(DHCP)

IP:
192.168.105.16 - 192.168.105.128
(DHCP)

IP:
192.168.106.16 - 192.168.106.128
(DHCP)

eth1

r3

r4

r5

r6

eth1

eth1

Member

WS-C3750E-24TD-S
stack

Express5800
U28F
Express5800
U28B
Express5800
U29F
Express5800
U29B
Express5800
U30F
Express5800
U30B
Express5800
U31F
Express5800
U31B
Express5800
U32F
Express5800
U32B
Express5800
U33F
Express5800
U33B
Express5800
U34F
Express5800
U34B
Express5800
U35F
Express5800
U35B
Express5800
U36F
Express5800
U36B
Express5800
U37F
Express5800
U37B

Master
eth1
eth1
eth1
eth1
eth1
eth1
eth1
eth1
eth1
eth1
eth1
eth1
eth1
eth1
eth1
eth1
eth1
eth1
eth1
eth1

r7

Hadoop

I-3

I.2.2.2
I-11

Express5800
U18F
Express5800
U18B
Express5800
U19F
Express5800
U19B
Express5800
U20F
Express5800
U20B
Express5800
U21F
Express5800
U21B
Express5800
U22F
Express5800
U22B
Express5800
U23F
Express5800
U23B
Express5800
U24F
Express5800
U24B
Express5800
U25F
Express5800
U25B
Express5800
U26F
Express5800
U26B
Express5800
U27F
Express5800
U27B

I-11

r2

SWNo

r2
r2

U19-20

r2

U21-22

r2
r2

U24-25

r2

U26-27

r2
r2
r2

r2

r3
r3
r3
r3
r3
r3
r3
r3
r3
r3
r3
r3
r3
r4
r4
r4
r4
r4
r4
r4
r4
r4
r4
r4
r4
r4
r4
r4
r4
r4
r5
r5
r5
r5
r5
r5
r5
r5
r5
r5
r5
r5
r5
r5
r5
r5
r5
r5
r5

U11-10
U13-12
U16-15
U18-17
U23
U25
U27
U29
U31
U33
U35
U37

U13
U15
U16
U18
U19
U21
U22
U27
U28
U30
U31
U33
U34
U36
U37

Gi0/1
Gi0/2
Gi0/3
Gi0/4
Gi0/5
Gi0/6
Gi0/7
Gi0/8
Gi0/9
Gi0/10
Gi0/11
Gi0/12
Gi0/1
Gi0/2
Gi0/3
Gi0/4
Gi0/5
Gi0/6
Gi0/7
Gi0/8
Gi0/9
Gi0/10
Gi0/11
Gi0/12
Gi0/13
Gi0/14
Gi0/15
Gi0/16
-

U10
U11
U13
U14
U16
U17
U20
U21
U23
U24
U27
U30
U31
U33
U34
U36
U37
U37

WS-C3750G-24TS-E x2

Gi1/0/21(eth0)
DL380G5(M32G)
Gi2/0/21(eth1)
Gi1/0/22(eth0)
DL380G5(M32G)
Gi2/0/22(eth1)

Gi1/0/23(eth0)
DL380G5(M32G)
Gi2/0/23(eth1)
Gi1/0/24(eth0)
DL380G5(M32G)
Gi2/0/24(eth1)
Gi1/0/12
Compaq dc7800 SFF
Gi2/0/12
MATE ME-8 MY30A/E-8
Gi1/0/13
Compaq dc7800 SFF
Gi1/0/17(eth0)
DELL R410
Gi1/0/17(eth1)

U12

Gi0/1
Gi0/2
Gi0/3
Gi0/4
Gi0/5
Gi0/6
Gi0/7
Gi0/8
Gi0/9
Gi0/10
Gi0/11
Gi0/12
Gi0/13
Gi0/14
Gi0/15
Gi0/16
Gi0/17
Gi0/18

WS-C3750G-24TS-E
DL380G5 XE5345
DL380G5 XE5345
DL380G5 XE5345
DL380G5 XE5345
DL360G5 XX5460
DL360G5 XX5460
DL360G5 XX5460
DL360G5 XX5460
DL360G5 XX5460
DL360G5 XX5460
DL360G5 XX5460
DL360G5 XX5460
WS-C3750E-24TD-S
DL360G5 LV DC X5148
DL360G5 LV DC X5148
DL360G5 LV DC X5148
DL360G5 LV DC X5148
DL360G5 LV DC X5148
DL360G5 LV DC X5148
DL360G5 LV DC X5148
DL360G5 LV DC X5148
DL360G5 LV DC X5148
DL360G5 LV DC X5148
DL360G5 LV DC X5148
DL360G5 LV DC X5148
DL360G5 LV DC X5148
DL360G5 LV DC X5148
DL360G5 LV DC X5148
DL360G5 LV DC X5148
WS-C3750E-24TD-S
DL360G6 XE 5504 1P4C
DL360G6 XE 5504 1P4C
DL360G6 XE 5504 1P4C
DL360G6 XE 5504 1P4C
DL360G6 XE 5504 1P4C
DL360G6 XE 5504 1P4C
DL360G6 XE 5504 1P4C
DL360G6 XE 5504 1P4C
DL360G6 XE 5504 1P4C
DL360G6 XE 5504 1P4C
DL360G6 XE 5504 1P4C
DL360G6 XE 5504 1P4C
DL360G6 XE 5504 1P4C
DL360G6 XE 5504 1P4C
DL360G6 XE 5504 1P4C
DL360G6 XE 5504 1P4C
DL360G6 XE 5504 1P4C
DL360G6 XE 5504 1P4C

nn

NameNode

IP
192.168.102.1
192.168.103.1
192.168.104.1
192.168.105.1
192.168.106.1
192.168.107.1
192.168.102.10

r2

L3


MEMGB
r2.example.netvlan102

vlan103/vlan104/vlan105/vlan107
CISCO
VLANGWIP
IPDHCP

hnn1

NameNode

192.168.102.11

ILO: 192.168.102.201(Gi1/0/19)

HP

32

hnn2

NameNode 192.168.102.12

ILO: 192.168.102.202(Gi1/0/20)

HP

32

jt

JobTracker

192.168.102.20

hjt1

JobTracker

192.168.102.21

ILO: 192.168.102.203(Gi2/0/19)

HP

32

hjt2

JobTracker 192.168.102.22

ILO: 192.168.102.204(Gi2/0/20)

HP

32

pp1
pp2
mg1

192.168.102.2
/( 192.168.102.3

192.168.102.5

puppet/DNS/DHCP/TFTP
puppet/DNS/DHCP/TFTP
Nagios/Ganglia

HP
NEC
HP

4
4
4

2
2
2

job2

Job

r3
r3-1-0-01
r3-1-0-02
r3-1-0-03
r3-1-0-04
r3-1-0-05
r3-1-0-06
r3-1-0-07
r3-1-0-08
r3-1-0-09
r3-1-0-10
r3-1-0-11
r3-1-0-12
r4
r4-1-0-01
r4-1-0-02
r4-1-0-03
r4-1-0-04
r4-1-0-05
r4-1-0-06
r4-1-0-07
r4-1-0-08
r4-1-0-09
r4-1-0-10
r4-1-0-11
r4-1-0-12
r4-1-0-13
r4-1-0-14
r4-1-0-15
r4-1-0-16
r5
r5-1-0-01
r5-1-0-02
r5-1-0-03
r5-1-0-04
r5-1-0-05
r5-1-0-06
r5-1-0-07
r5-1-0-08
r5-1-0-09
r5-1-0-10
r5-1-0-11
r5-1-0-12
r5-1-0-13
r5-1-0-14
r5-1-0-15
r5-1-0-16
r5-1-0-17
r5-1-0-18

DELL

192.168.103.254 IP

CISCO
HP
HP
HP
HP
HP
HP
HP
HP
HP
HP
HP
HP
CISCO
HP
HP
HP
HP
HP
HP
HP
HP
HP
HP
HP
HP
HP
HP
HP
HP
CISCO
HP
HP
HP
HP
HP
HP
HP
HP
HP
HP
HP
HP
HP
HP
HP
HP
HP
HP

192.168.104.254 IP

192.168.105.254 IP

8
-

8
-

8
8
8
8
4
4
4
4
4
4
4
4

8
8
8
8
6
6
6
6
6
6
6
6

2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2

2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2

4
4
4
4
4
4
4
4
4
4
4
4
4
4
4
4
4
4

6
6
6
6
6
6
6
6
6
6
6
6
6
6
6
6
6
6

SWNo
r6
U30F Gi0/1
r6
U30B Gi0/2
r6
U31F Gi0/3
r6
U31B Gi0/4
r6
U32F Gi0/5
r6
U32B Gi0/6
r6
U33F Gi0/7
r6
U33B Gi0/8
r6
U34F Gi0/9
r6
U34B Gi0/10
r6
r7
U18F Gi1/0/1
r7
U18B Gi1/0/2
r7
U19F Gi1/0/3
r7
U19B Gi1/0/4
r7
U20F Gi1/0/5
r7
U20B Gi1/0/6
r7
U21F Gi1/0/7
r7
U21B Gi1/0/8
r7
U22F Gi1/0/9
r7
U22B Gi1/0/10
r7
U23F Gi1/0/11
r7
U23B Gi1/0/12
r7
U24F Gi1/0/13
r7
U24B Gi1/0/14
r7
U25F Gi1/0/15
r7
U25B Gi1/0/16
r7
U26F Gi1/0/17
r7
U26B Gi1/0/18
r7
U27F Gi1/0/19
r7
U27B Gi1/0/20
r7
U28F Gi2/0/1
r7
U28B Gi2/0/2
r7
U29F Gi2/0/3
r7
U29B Gi2/0/4
r7
U30F Gi2/0/5
r7
U30B Gi2/0/6
r7
U31F Gi2/0/7
r7
U31B Gi2/0/8
r7
U32F Gi2/0/9
r7
U32B Gi2/0/10
r7
U33F Gi2/0/11
r7
U33B Gi2/0/12
r7
U34F Gi2/0/13
r7
U34B Gi2/0/14
r7
U35F Gi2/0/15
r7
U35B Gi2/0/16
r7
U36F Gi2/0/17
r7
U36B Gi2/0/18
r7
U37F Gi2/0/19
r7
U37B Gi2/0/20
r7

WS-C3750G-24TS-E
Express5800
Express5800
Express5800
Express5800
Express5800
Express5800
Express5800
Express5800
Express5800
Express5800
WS-C3750E-24TD-S x2
Express5800
Express5800
Express5800
Express5800
Express5800
Express5800
Express5800
Express5800
Express5800
Express5800
Express5800
Express5800
Express5800
Express5800
Express5800
Express5800
Express5800
Express5800
Express5800
Express5800
Express5800
Express5800
Express5800
Express5800
Express5800
Express5800
Express5800
Express5800
Express5800
Express5800
Express5800
Express5800
Express5800
Express5800
Express5800
Express5800
Express5800
Express5800
Express5800
Express5800

r6
r6-1-0-01
r6-1-0-02
r6-1-0-03
r6-1-0-04
r6-1-0-05
r6-1-0-06
r6-1-0-07
r6-1-0-08
r6-1-0-09
r6-1-0-10
r7
r7-1-0-01
r7-1-0-02
r7-1-0-03
r7-1-0-04
r7-1-0-05
r7-1-0-06
r7-1-0-07
r7-1-0-08
r7-1-0-09
r7-1-0-10
r7-1-0-11
r7-1-0-12
r7-1-0-13
r7-1-0-14
r7-1-0-15
r7-1-0-16
r7-1-0-17
r7-1-0-18
r7-1-0-19
r7-1-0-20
r7-2-0-01
r7-2-0-02
r7-2-0-03
r7-2-0-04
r7-2-0-05
r7-2-0-06
r7-2-0-07
r7-2-0-08
r7-2-0-09
r7-2-0-10
r7-2-0-11
r7-2-0-12
r7-2-0-13
r7-2-0-14
r7-2-0-15
r7-2-0-16
r7-2-0-17
r7-2-0-18
r7-2-0-19
r7-2-0-20

IP
192.168.106.254

IP

192.168.107.254

IP


MEMGB
CISCO
NEC
2
2
NEC
2
2
NEC
2
2
NEC
2
2
NEC
2
2
NEC
2
2
NEC
2
2
NEC
2
2
NEC
2
2
NEC
2
2
CISCO
NEC
2
2
NEC
2
2
NEC
2
2
NEC
2
2
NEC
2
2
NEC
2
2
NEC
2
2
NEC
2
2
NEC
2
2
NEC
2
2
NEC
2
2
NEC
2
2
NEC
2
2
NEC
2
2
NEC
2
2
NEC
2
2
NEC
2
2
NEC
2
2
NEC
2
2
NEC
2
2
NEC
2
2
NEC
2
2
NEC
2
2
NEC
2
2
NEC
2
2
NEC
2
2
NEC
2
2
NEC
2
2
NEC
2
2
NEC
2
2
NEC
2
2
NEC
2
2
NEC
2
2
NEC
2
2
NEC
2
2
NEC
2
2
NEC
2
2
NEC
2
2
NEC
2
2
NEC
2
2

II

II
No.

Google GFS /MapReduce


GFS/MapReduce Hadoop
MPI(Message Passing Interface) MPICH
Open MPI

GPS

Hadoop Hadoop

Hadoop Hadoop

/
IT PC
IA

Open
Source Initiative
10
GPL, Apache

Hadoop

Apache Software Foundation


1 Hadoop
HDFS
/
MapReduce 2

MapReduce

Map Reduce 2
Map

Reduce Map

Map HDFS
Reduce HDFS
MapReduce
Map Reduce
MapReduce
JobTracker TaskTracker 2

HDFS

Hadoop
64MB
1

1 3

HDFS
NameNode
DataNode 2

Hadoop

Hadoop
JobTracker NameNode

Hadoop

Hadoop

TaskTracker
DataNode

FT

Kemari

FT
I/O I/O

Kemari
HA

http://www.osrg.net/kemari/

HA

2
2

Heartbeat

HA

http://linux-ha.org

Kickstart

Red Hat Linux Linux


Linux
Linux

Kickstart
Kickstart

Puppet

Puppet Puppet

http://reductivelabs.com/products/puppet/

Ganglia

CPU

Web

http://ganglia.sourceforge.net/
