Hadoop

HDFS --> Hadoop Distributed File System


MapReduce --> Distributed Compute Framework (written in Java)
HDFS ( Hadoop Distributed File System )
Master Node --> NameNode
Slave Nodes --> DataNodes
All daemons (one per node) run in separate JVMs. Daemon: a process that runs
continuously & waits for instructions.
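Because each daemon runs in its own JVM, each one shows up as a separate entry in jps (sample output from a pseudo-distributed single-node setup; PIDs are illustrative):
# jps
2481 NameNode
2613 DataNode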
NameNode
It doesn't store any file data itself; it keeps track of where the data is stored.
Stores metadata.
DataNodes
All file data is stored on the DataNodes.
Each DataNode needs to connect to the NameNode; for that, the NameNode info must be added to the
core-site.xml file of every DataNode (/etc/hadoop/conf).
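A minimal core-site.xml entry for this might look as follows ("nn-host" and the port are placeholders for the real NameNode address; older Hadoop 1.x used the property name fs.default.name):
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://nn-host:8020</value>
  </property>
</configuration>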
When the DataNode daemon starts, it looks for the NameNode & registers with it (the NN gets the
disk-space info from all DNs).
The NN then presents the total space of all disks as a single disk (virtual disk).
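That aggregated capacity can be checked with a standard admin command (the output includes total configured capacity, DFS used/remaining, and per-DN details):
# hdfs dfsadmin -report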
Important Points

Hadoop splits the file into blocks (chunks) of 64 MB to 128 MB and stores each block on 3
DataNodes (1 original + 2 replicas).
Writes are slower & reads are faster (a write goes to 3 locations; reads can be served from different locations).
Parallel disk reads give increased throughput.
Data replication & Fault Tolerance
The NN stores everything (metadata) in memory; if the metadata doesn't fit in memory, it cannot be
stored, so NN memory limits the size of the namespace.
All put & get operations are done by the Hadoop client (which connects to the NN & the DNs).
The NN stores metadata in memory, the DNs store the blocks on their local file systems, and the Hadoop client
presents a virtual view to the end user.
URI: Uniform Resource Identifier --> the path where the data lives on the virtual disk.
The NameNode makes an entry for each URI in memory.
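An HDFS path can be given as a full URI naming the NameNode explicitly ("nn-host" is a placeholder); with fs.defaultFS set, the short form is equivalent:
# hadoop fs -ls hdfs://nn-host:8020/abc
# hadoop fs -ls /abc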
hdfs-site.xml holds the block-size information; based on it, the NN decides how to
split the file & where to store the blocks.
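For illustration, hdfs-site.xml might set the block size and replication factor like this (the values shown are the common defaults; Hadoop 1.x used the property name dfs.block.size):
<configuration>
  <property>
    <name>dfs.blocksize</name>
    <value>134217728</value>  <!-- 128 MB -->
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>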
Accordingly, the Hclient splits the file into blocks & ships the first block to the mentioned DN; that DN
sends the block to another DN, and that DN to another (the write pipeline).
Each DN sends heartbeats to the NN at a short regular interval (3 seconds by default, set by
dfs.heartbeat.interval) to report its alive status; whenever a block is stored on a DN, a block report carries that information to the NN.
When the NN gets confirmation from the DN, it adds the entry into memory (updates the
metadata in memory).
Once one block is copied, the Hclient gets the status & then copies the next block.
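The resulting block placement can be inspected with fsck (the path is a placeholder):
# hadoop fsck /abc/edf -files -blocks -locations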

Namenode Crash Scenarios


Whenever the NN crashes, all the metadata info would be lost; to handle this, the 'editlog' was introduced.
The NN first appends the change to the editlog file & then updates the metadata in memory; then, if the NN crashes, we
can recover the metadata info from the editlog file.
Edit logs are files on the NN which keep track of any changes happening in NN memory (this is
similar to transaction logs in an RDBMS).
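The editlog is a binary file, but it can be inspected with the offline edits viewer that ships with HDFS (file names are placeholders):
# hdfs oev -i <edits file> -o edits.xml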

Whenever the NN crashes, replaying the entire editlog to rebuild the metadata info in memory takes a long time.

For this we can take memory snapshots called FSImage: whenever the NN crashes we
can recover the metadata in memory immediately from the closest snapshot/FSImage, & the changes missed
after the snapshot can be replayed from the editlogs.
FSImage: a periodic snapshot of the NN's in-memory metadata.
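A checkpoint (writing the current in-memory metadata out as a fresh FSImage) can also be forced by hand with standard dfsadmin commands; saveNamespace requires safe mode:
# hdfs dfsadmin -safemode enter
# hdfs dfsadmin -saveNamespace
# hdfs dfsadmin -safemode leave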

File --> Hclient --> checks with NN for block size & block locations <--> NN
        --> splits the file into blocks
        --> ships the first block to the mentioned DN
DN  --> that DN sends the block to another DN, from that DN to another
    --> sends heartbeats at the regular interval --> NN
    --> whenever a block is stored in a DN, the report carries that information to --> NN
NN  --> when the NN gets confirmation from the DN, it adds the entry into memory (after first
        recording it in the editlog)
# hadoop fs -mkdir /abc/edf
# hadoop fs -ls /abc/
# hadoop fs -put <file> /abc/edf
Check the file in the GUI and check the blocks assigned.
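The "GUI" here is the NameNode web UI; on Hadoop 1.x/2.x it listens on port 50070 by default ("nn-host" is a placeholder), and it lets you browse the file system and see each file's blocks and their DN locations:
http://nn-host:50070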
