
HDFS

The NameNode (NN) stores the file system metadata, and the DataNodes (DN) store the actual data blocks.
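The NameNode logs every change to the namespace in the EditLog. The EditLog is periodically merged into the
FsImage, a persistent snapshot of the namespace.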
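The Secondary NameNode (SN) periodically merges the EditLog into the FSImage (a checkpoint) and sends the new FSImage back to the NN.
The SN is not a standby: it cannot take over if the NN fails.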
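DNs send periodic heartbeats and block reports to the NN; if the NN stops receiving heartbeats from a DN, it marks that DN
as dead and re-replicates its blocks on other DNs.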
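To read or write a file, a client asks the NN for the block locations and then transfers the data directly to or from
the DNs; the data itself never passes through the NN.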
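Deleted files are first moved to /trash (if the trash is enabled) and are removed permanently after a configurable
interval.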
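The FSImage stores the namespace: the directory tree, file attributes (permissions, replication factor, etc.), and the list of blocks of each file.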
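A toy model of the EditLog/FsImage relationship may help. This is a Python sketch, not HDFS code: the namespace is assumed to be a dict from path to metadata, and the operation tuples and the `apply_edit`/`checkpoint` helpers are hypothetical.

```python
from copy import deepcopy

def apply_edit(namespace, edit):
    """Apply one logged namespace operation (hypothetical tuple format) to a dict path -> metadata."""
    op, path, *args = edit
    if op == "create":
        namespace[path] = {"replication": args[0], "blocks": []}
    elif op == "add_block":
        namespace[path]["blocks"].append(args[0])
    elif op == "delete":
        namespace.pop(path, None)
    else:
        raise ValueError(f"unknown op {op!r}")

def checkpoint(fsimage, editlog):
    """Replay the EditLog on top of the old FsImage; return the new image and an empty log."""
    new_image = deepcopy(fsimage)
    for edit in editlog:
        apply_edit(new_image, edit)
    return new_image, []

fsimage = {}
editlog = [("create", "/a.txt", 3), ("add_block", "/a.txt", "blk_1"),
           ("create", "/tmp/b", 1), ("delete", "/tmp/b")]
fsimage, editlog = checkpoint(fsimage, editlog)
print(fsimage)  # {'/a.txt': {'replication': 3, 'blocks': ['blk_1']}}
print(editlog)  # []
```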

SPOF
The NN is a single point of failure (SPOF).
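Failover process:
There are two NNs: one ACTIVE and one STANDBY (SBN). The ActiveNN writes every edit to a quorum of
Journal Nodes (JN); the SBN continuously reads the edits from the JNs, so its namespace stays in sync with the ACTIVE.
When the ACTIVE fails, the SBN applies the remaining edits from the JNs and becomes ACTIVE.
Failure detection and the switch-over are coordinated by ZooKeeper.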
Checkpoint Node
Periodically merges the EditLog into the FSImage and uploads the new FSImage to the NN.
Backup Node
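Keeps an up-to-date in-memory copy of the namespace by receiving the stream of edits from the NN, so it can create
checkpoints without downloading the FSImage and EditLog from the NN.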
Federation
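Several independent NNs, each managing its own part of the namespace, share a common pool of DataNodes.
This scales the namespace horizontally.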
Archival Storage
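Data is placed on storage media according to how hot it is: frequently used data on faster media, cold data on
cheaper, denser media (storage types RAM_DISK, SSD, DISK, ARCHIVE).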
HADOOP ARCHIVE

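Packs many small files into a single HAR file, reducing the number of objects whose metadata
the NN has to keep in memory.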

Local Mode/Distributed Mode/Pseudo Distributed


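Local: everything runs in a single JVM on the local file system. Distributed: the daemons run on a cluster of machines. Pseudo Distributed: all daemons run on one machine, each in its own JVM.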

MapReduce
Data -> Split -> Map -> Combine -> Shuffle (Partition) -> Reduce -> Result
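The pipeline above can be simulated in a few lines of plain Python on a word count. This is a sketch of the data flow only, not Hadoop's API; the function names are hypothetical, and the partitioning rule `hash(key) % R` mirrors what Hadoop's default HashPartitioner does.

```python
from collections import Counter, defaultdict

def make_splits(lines, n):
    """Split: divide the input into n chunks, one per mapper."""
    return [lines[i::n] for i in range(n)]

def map_phase(split):
    """Map: emit (word, 1) for every word in the split."""
    return [(word, 1) for line in split for word in line.split()]

def combine(pairs):
    """Combine: pre-aggregate one mapper's output locally to shrink the shuffle."""
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return list(counts.items())

def shuffle(mapper_outputs, num_reducers):
    """Shuffle: send each key to reducer hash(key) % R, then group and sort by key per reducer."""
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for pairs in mapper_outputs:
        for key, value in pairs:
            partitions[hash(key) % num_reducers][key].append(value)
    return [sorted(p.items()) for p in partitions]

def reduce_phase(grouped):
    """Reduce: sum the values of each key."""
    return [(key, sum(values)) for key, values in grouped]

data = ["a b a", "b c", "a c c"]
mapper_outputs = [combine(map_phase(s)) for s in make_splits(data, 2)]
result = dict(kv for part in shuffle(mapper_outputs, 2) for kv in reduce_phase(part))
print(sorted(result.items()))  # [('a', 3), ('b', 2), ('c', 3)]
```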

Secondary Sort
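MR sorts only by key, so the values of a key reach the Reducer in arbitrary order. SS makes them arrive
sorted.
Options:
1. Sort the values in memory inside the Reducer; for large groups this can cause OutOfMemory.
2. Move the value (or the field to sort by) into a composite key, sort with a SortComparator on the whole composite key,
partition on the natural key, and group with a GroupingComparator on the natural key only, so that one reduce() call
receives all values of the natural key, already sorted.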
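A plain-Python simulation of option 2 follows, assuming records of the form (station, temperature). The functions `partition`, `sort_key` and `group_key` are hypothetical stand-ins for the Partitioner, SortComparator and GroupingComparator; no Hadoop API is used.

```python
from itertools import groupby

records = [("s2", 15), ("s1", 30), ("s2", 5), ("s1", 10), ("s1", 20)]
NUM_REDUCERS = 2

def partition(composite_key):   # Partitioner: by natural key only
    natural_key, _ = composite_key
    return hash(natural_key) % NUM_REDUCERS

def sort_key(composite_key):    # SortComparator: natural key, then value
    return composite_key

def group_key(composite_key):   # GroupingComparator: natural key only
    return composite_key[0]

# Map: move the value into the key -> ((station, temp), temp)
mapped = [((station, temp), temp) for station, temp in records]

results = {}
for r in range(NUM_REDUCERS):
    part = [kv for kv in mapped if partition(kv[0]) == r]
    part.sort(key=lambda kv: sort_key(kv[0]))          # done by the framework during the shuffle
    for station, group in groupby(part, key=lambda kv: group_key(kv[0])):
        results[station] = [v for _, v in group]       # one reduce() call; values already sorted

print(sorted(results.items()))  # [('s1', [10, 20, 30]), ('s2', [5, 15])]
```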

MapSideJoin
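The join is done in the mappers, with no reduce phase.
The smaller dataset is shipped to every mapper via the distributed cache and loaded into memory.
It works only if that dataset fits in memory (e.g. as a hash map); otherwise use
ReduceSideJoin.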
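A minimal sketch of the idea in plain Python: a dict stands in for the small table that Hadoop would ship through the distributed cache, and the function `map_side_join` is hypothetical. It performs an inner join entirely on the map side.

```python
# Small table, assumed to fit in memory (in Hadoop: a file from the distributed cache)
departments = {10: "Sales", 20: "R&D"}

# Large table, processed split by split by the mappers
employees = [("alice", 10), ("bob", 20), ("carol", 30), ("dave", 10)]

def map_side_join(split, small_table):
    """Inner join done in the mapper: a hash lookup per record, no shuffle, no reducer."""
    for name, dept_id in split:
        dept = small_table.get(dept_id)
        if dept is not None:
            yield name, dept

joined = list(map_side_join(employees, departments))
print(joined)  # [('alice', 'Sales'), ('bob', 'R&D'), ('dave', 'Sales')]
```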

ReduceSideJoin
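Each input is read by its own mapper (MultipleInputs), which emits the join key and the record tagged with its source.
All records with the same key meet in one reducer, which combines them.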
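The same flow in plain Python, as a sketch rather than Hadoop code: the two mapper functions and the tags "U"/"O" are hypothetical, and a dict of lists plays the role of the shuffle that groups tagged values by join key.

```python
from collections import defaultdict

users = [(1, "alice"), (2, "bob")]
orders = [(1, "book"), (1, "pen"), (3, "lamp")]

# One mapper per input (like MultipleInputs); each tags its records with the source
def users_mapper(rows):
    for user_id, name in rows:
        yield user_id, ("U", name)

def orders_mapper(rows):
    for user_id, item in rows:
        yield user_id, ("O", item)

# Shuffle: group all tagged values by join key
groups = defaultdict(list)
for key, value in list(users_mapper(users)) + list(orders_mapper(orders)):
    groups[key].append(value)

def join_reducer(key, values):
    """Inner join for one key: cross product of the two tagged sides."""
    left = [v for tag, v in values if tag == "U"]
    right = [v for tag, v in values if tag == "O"]
    for u in left:
        for o in right:
            yield key, u, o

result = [row for key in sorted(groups) for row in join_reducer(key, groups[key])]
print(result)  # [(1, 'alice', 'book'), (1, 'alice', 'pen')]
```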

MR LifeCycle
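The client submits the job to the JobTracker. The JobTracker splits the job into map and reduce tasks and assigns them to
TaskTrackers, preferring nodes that hold the input data. The TaskTracker launches each task via a TaskRunner in a separate JVM
and reports progress to the JobTracker through heartbeats.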

Yarn LifeCycle
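The client submits the job to the ResourceManager (RM). The Scheduler allocates a container, and the RM launches the
AppMaster in it on a NodeManager. The AppMaster computes the tasks to run (e.g. from the input splits) and requests
containers for them from the RM; once they are granted, it launches the tasks on the corresponding NodeManagers and
monitors them. When the job finishes, the AppMaster unregisters from the RM and its containers are
released.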

Hive
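Vectorization: rows are processed in batches (1024 rows by default) instead of one at a time,
which reduces per-row overhead.
Works with the ORC format.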

ObjectInspector
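Describes the structure and type of a value and how to access its fields, independently of how the data is serialized.
UDFs use it to inspect and read their arguments.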

Tez
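Executes a query as a Directed Acyclic Graph (DAG) of tasks instead of a chain of separate MapReduce jobs.
Intermediate data is passed between stages directly rather than written to HDFS after each job,
which makes queries faster.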
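To illustrate only the scheduling idea, here is a hypothetical query plan expressed as a DAG and walked in dependency order with Python's standard `graphlib` (3.9+). This is not Tez's API; the vertex names are made up, and each "wave" lists vertices whose inputs are all ready and could run in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical plan: vertex -> set of upstream vertices it reads from
dag = {
    "scan_orders": set(),
    "scan_users": set(),
    "join": {"scan_orders", "scan_users"},
    "aggregate": {"join"},
    "sort": {"aggregate"},
}

ts = TopologicalSorter(dag)
ts.prepare()
waves = []
while ts.is_active():
    ready = sorted(ts.get_ready())  # all inputs of these vertices are available
    waves.append(ready)
    ts.done(*ready)

print(waves)  # [['scan_orders', 'scan_users'], ['join'], ['aggregate'], ['sort']]
```

With MapReduce, the same plan would be several jobs, each writing its output to HDFS before the next one starts.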

SPARK
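RDD (Resilient Distributed Dataset): an immutable collection of records partitioned across the cluster.
Transformations are lazy; computation happens only when an action is called.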
DataFrame: a distributed collection of data organized into named columns, like a table in a relational
database.
Optimizations: 1) Tungsten: off-heap binary storage, avoiding GC overhead and Java serialization; 2)
Catalyst Optimizer: builds and optimizes the logical and physical query plans.
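Drawback: columns are referenced by String names, so typos and type errors surface only at runtime, not at compile time;
this applies both to SQL and to the ExpressionBuilder.
The Datasets API adds compile-time type safety.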

Lineage
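Each RDD records how it was derived from its parent RDDs (its lineage), so a lost partition can be recomputed from the parents instead of being restored from a replica.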
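A toy model of lineage-based recovery, written in plain Python and deliberately not using Spark's API: the class `ToyRDD` and its methods are hypothetical. A derived RDD keeps only a reference to its parent and the function to apply, so a lost partition is rebuilt by recomputation.

```python
class ToyRDD:
    """Toy lineage model, not Spark's API: a derived RDD stores only its parent and a function."""

    def __init__(self, source=None, parent=None, fn=None):
        self.source = source  # base RDD: partitions in stable storage (think HDFS blocks)
        self.parent = parent  # derived RDD: the RDD it was computed from
        self.fn = fn          # function applied to one parent partition
        self.cache = {}       # materialized partitions; may be lost

    def map(self, f):
        return ToyRDD(parent=self, fn=lambda part: [f(x) for x in part])

    def filter(self, pred):
        return ToyRDD(parent=self, fn=lambda part: [x for x in part if pred(x)])

    def num_partitions(self):
        return len(self.source) if self.parent is None else self.parent.num_partitions()

    def partition(self, i):
        # For simplicity every computed partition is cached; Spark does that only for persisted RDDs
        if i not in self.cache:
            if self.parent is None:
                self.cache[i] = list(self.source[i])               # re-read from stable storage
            else:
                self.cache[i] = self.fn(self.parent.partition(i))  # recompute via the lineage
        return self.cache[i]

    def collect(self):
        return [x for i in range(self.num_partitions()) for x in self.partition(i)]


base = ToyRDD(source=[[1, 2, 3], [4, 5, 6]])
evens_squared = base.filter(lambda x: x % 2 == 0).map(lambda x: x * x)
print(evens_squared.collect())     # [4, 16, 36]

evens_squared.cache.clear()        # simulate losing the materialized partitions
print(evens_squared.partition(1))  # [16, 36], rebuilt from the parent via the lineage
```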

Spark Streaming
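DStream: a continuous sequence (collection) of RDDs, one per batch interval.
Data is received by a Receiver, which runs as a long-running task on an executor and stores the received data in
Spark's memory.
The stream is divided into batches by time interval, and each batch is processed as an ordinary
RDD.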

Join
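Two ways to join pair RDDs:
rdd1.union(rdd2).groupByKey() (the values must be tagged with their source to tell the two sides apart)
PairRDDFunctions.join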

MLLib
AUROC
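TPR = True Positives / All Positives
FPR = False Positives / All Negatives
The ROC curve plots FPR on the x-axis and TPR on the y-axis as the classification threshold varies.
AUROC is the area under this curve: 1.0 for a perfect classifier, 0.5 for random guessing.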
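A plain-Python sketch of how the curve and its area can be computed, assuming 0/1 labels with both classes present. The helpers `roc_points` and `auc` are hypothetical; in Spark MLlib the same number is available from `BinaryClassificationMetrics` (`areaUnderROC`).

```python
def roc_points(labels, scores):
    """(FPR, TPR) points, lowering the threshold through each distinct score."""
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Everything scoring >= threshold is now predicted positive (ties are added together)
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))  # x = FPR, y = TPR
    return points

def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))

labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
area = auc(roc_points(labels, scores))
print(round(area, 4))  # 0.7778, i.e. 7/9
```

The result equals the fraction of (positive, negative) pairs in which the positive example gets the higher score: 7 of the 9 pairs here.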

Sqoop
Transfers data from an RDBMS to HDFS. Can be scheduled with Oozie. Runs as a map-only
job.

Flume
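Agent = Source -> Channel -> Sink.
An Event is the unit of data being transferred; a Flow is the movement of events from their origin to the destination.
Source: receives events from an external system and puts them into a Channel.
Channel: a buffer that stores events until a Sink consumes them (e.g. in memory or in files).
Sink: takes events from the Channel and delivers them to the destination (e.g. HDFS) or to the next
agent.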
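A single-threaded sketch of one agent, for illustration only: the classes `MemoryChannel`, `ListSource` and `ListSink` are hypothetical stand-ins, not Flume's Java API. Real Flume channels are transactional; this sketch only shows the buffering between source and sink and the event shape (headers plus a byte body).

```python
from collections import deque

class MemoryChannel:
    """Bounded buffer between source and sink."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.events = deque()

    def put(self, event):
        if len(self.events) >= self.capacity:
            raise OverflowError("channel full")  # a real source would back off and retry
        self.events.append(event)

    def take_batch(self, n):
        batch = []
        while self.events and len(batch) < n:
            batch.append(self.events.popleft())
        return batch

class ListSource:
    """Stand-in source: turns raw lines into events and puts them into the channel."""
    def __init__(self, lines, channel):
        self.lines, self.channel = lines, channel

    def poll(self):
        for line in self.lines:
            self.channel.put({"headers": {}, "body": line.encode()})

class ListSink:
    """Stand-in for an HDFS sink: drains the channel in batches and records what it 'wrote'."""
    def __init__(self, channel, batch_size):
        self.channel, self.batch_size, self.written = channel, batch_size, []

    def process(self):
        batch = self.channel.take_batch(self.batch_size)
        self.written.extend(e["body"].decode() for e in batch)
        return len(batch)

channel = MemoryChannel(capacity=100)
source = ListSource(["log line 1", "log line 2", "log line 3"], channel)
sink = ListSink(channel, batch_size=2)

source.poll()
while sink.process():
    pass
print(sink.written)  # ['log line 1', 'log line 2', 'log line 3']
```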

Oozie
Workflow scheduler for Hadoop jobs.

Zookeeper
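A centralized coordination service for distributed applications: configuration, naming, synchronization, leader election.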

Cassandra
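A distributed NoSQL database with no master: all nodes are equal and form a ring.
Rows are distributed across nodes by partition key: the partitioner (KeyPartitioner) hashes the key to a token that
determines the owning node.
Each row is replicated to several nodes (replication factor), and the consistency level is chosen per request:
ALL (all replicas must respond), ONE (a single replica is enough), QUORUM (a majority of replicas).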
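A sketch of replica placement and the quorum arithmetic, with loudly stated assumptions: tokens come from MD5 (Cassandra's default partitioner uses Murmur3), each node has a single token derived from its name (real clusters use vnodes), and replicas are the next nodes clockwise, similar to SimpleStrategy. All names are hypothetical. The key rule is that a read is guaranteed to see the latest write when R + W > RF, because the two replica sets must overlap.

```python
import bisect
import hashlib

def token(key):
    """Map a key to a position on a 2**32 ring (MD5 here, as an assumption)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % 2**32

class Ring:
    def __init__(self, nodes, replication_factor):
        self.rf = replication_factor
        # Simplification: one token per node, derived from the node name (no vnodes)
        self.ring = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.ring]

    def replicas(self, key):
        """Owner = first node clockwise from the key's token; then the next rf-1 nodes."""
        start = bisect.bisect_left(self.tokens, token(key)) % len(self.ring)
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(self.rf)]

def replicas_required(level, rf):
    """How many replicas must acknowledge a request at a given consistency level."""
    return {"ONE": 1, "QUORUM": rf // 2 + 1, "ALL": rf}[level]

def reads_see_latest_write(read_level, write_level, rf):
    """True if every read replica set must overlap every write replica set: R + W > RF."""
    return replicas_required(read_level, rf) + replicas_required(write_level, rf) > rf

ring = Ring(["n1", "n2", "n3", "n4", "n5"], replication_factor=3)
print(ring.replicas("user:42"))                       # three distinct nodes
print(replicas_required("QUORUM", 3))                 # 2
print(reads_see_latest_write("QUORUM", "QUORUM", 3))  # True: 2 + 2 > 3
print(reads_see_latest_write("ONE", "ONE", 3))        # False: 1 + 1 <= 3
```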
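Write path: a write is first appended to the CommitLog and then applied to an in-memory MemTable;
full MemTables are flushed to disk as immutable SSTables.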

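Compaction
Deleted data is not removed right away: a delete writes a tombstone marker, and the data is physically removed when
SSTables are merged.
Compaction merges several SSTables into one, keeping only the newest version of each value (by timestamp) and
discarding obsolete versions and, once they are old enough, tombstones.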
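A toy model of the write path and compaction in plain Python; the class `ToyLSM` is hypothetical and uses a logical clock instead of real timestamps. It compacts all SSTables at once and drops tombstones immediately, whereas Cassandra keeps them until `gc_grace_seconds` has passed so that deletes reach all replicas.

```python
TOMBSTONE = object()  # marker written by a delete

class ToyLSM:
    """Toy write path: commit log -> memtable -> immutable SSTables; reads merge by timestamp."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []   # append-only, used for crash recovery
        self.memtable = {}     # key -> (timestamp, value)
        self.sstables = []     # immutable dicts, sorted by key
        self.limit = memtable_limit
        self.clock = 0         # logical clock standing in for write timestamps

    def _write(self, key, value):
        self.clock += 1
        self.commit_log.append((self.clock, key, value))
        self.memtable[key] = (self.clock, value)
        if len(self.memtable) >= self.limit:
            self.flush()

    def put(self, key, value):
        self._write(key, value)

    def delete(self, key):
        self._write(key, TOMBSTONE)  # a delete is just another write

    def flush(self):
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()      # flushed data no longer needs the log

    def get(self, key):
        candidates = [t[key] for t in self.sstables if key in t]
        if key in self.memtable:
            candidates.append(self.memtable[key])
        if not candidates:
            return None
        _, value = max(candidates, key=lambda tv: tv[0])  # newest timestamp wins
        return None if value is TOMBSTONE else value

    def compact(self):
        """Merge all SSTables into one: keep the newest version of each key, drop tombstones."""
        merged = {}
        for table in self.sstables:
            for key, (ts, value) in table.items():
                if key not in merged or ts > merged[key][0]:
                    merged[key] = (ts, value)
        live = {k: tv for k, tv in merged.items() if tv[1] is not TOMBSTONE}
        self.sstables = [dict(sorted(live.items()))]

db = ToyLSM(memtable_limit=2)
db.put("a", 1)
db.put("b", 2)    # memtable full -> flushed to SSTable 1
db.put("a", 10)
db.delete("b")    # flushed to SSTable 2
print(len(db.sstables), db.get("a"), db.get("b"))  # 2 10 None
db.compact()
print(db.sstables)  # [{'a': (3, 10)}]
```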
KeySpace = Database, ColumnFamily ~= table.
An insert can specify a TTL, after which the data expires.
In CAP terms Cassandra is AP (availability and partition tolerance).

HBase
A distributed, column-oriented NoSQL database that stores its data in HDFS.
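Tables are split by row-key range into regions, each served by a RegionServer; the data itself is stored in HDFS.
A RegionServer serves many regions.
The cluster is coordinated via ZooKeeper.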
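Data is stored in HFiles, sorted by row key.
A write first goes to the HLog (write-ahead log), then to the in-memory MemStore, which is flushed to a new HFile
when it is full.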
Deleted cells are marked with a tombstone and physically removed during major compaction.
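HBase is CP: each region is served by exactly one RegionServer, so when a RegionServer fails, its regions are
unavailable until they are reassigned to other RegionServers.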

Bloom filters
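An array of N bits and k hash functions, each returning an index between 0 and N-1. Adding an element sets the bits
at all k indexes; a lookup checks those bits.
If any of them is 0, the element is definitely not in the set; if all are 1, it is probably in the set.
False positives are possible, false negatives are not.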
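A minimal Bloom filter in Python, as a sketch: deriving the k hash functions from SHA-256 salted with the function index is an assumption made for simplicity (real implementations such as those in HBase and Cassandra use faster non-cryptographic hashes), and a `bytearray` with one byte per bit keeps the code readable.

```python
import hashlib

class BloomFilter:
    """N-bit array with k hash functions derived from salted SHA-256 (an assumption)."""

    def __init__(self, n_bits, k):
        self.n, self.k = n_bits, k
        self.bits = bytearray(n_bits)  # one byte per bit, for clarity

    def _indexes(self, item):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n  # index in [0, N-1]

    def add(self, item):
        for idx in self._indexes(item):
            self.bits[idx] = 1

    def might_contain(self, item):
        """False: definitely absent. True: possibly present (false positives are possible)."""
        return all(self.bits[idx] for idx in self._indexes(item))

bf = BloomFilter(n_bits=1024, k=3)
for key in ["row1", "row2", "row3"]:
    bf.add(key)
print(all(bf.might_contain(k) for k in ["row1", "row2", "row3"]))  # True: no false negatives
print(bf.might_contain("row999"))  # usually False, but a false positive is possible
```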


Contents
HDFS
SPOF
Local Mode/Distributed Mode/Pseudo Distributed
MapReduce
Secondary Sort
MapSideJoin
ReduceSideJoin
MR LifeCycle
Yarn LifeCycle
Hive
ObjectInspector
Tez
SPARK
Lineage
Spark Streaming
Join
MLLib
AUROC
Sqoop
Flume
Oozie
Zookeeper
Cassandra
HBase
Bloom filters