The Google File System
Alexandru Costan
Outline

(Figure omitted: a single centralized fileserver, captioned "Doesn't scale!")
Motivation
• Commodity hardware
  • Inexpensive
• Multiple chunkservers
  • Grouped into racks
  • Connected through switches
• Multiple clients
• Master/chunkserver coordination
  • HeartBeat messages
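The master and chunkservers coordinate through periodic HeartBeat messages that both signal liveness and piggyback state. A minimal Python sketch of that pattern; the class names, message contents, and timeout value are illustrative assumptions, not details from the paper:

```python
import time

HEARTBEAT_TIMEOUT = 10.0  # seconds; illustrative value, not from the paper

class Master:
    """Tracks chunkserver liveness from periodic HeartBeat messages."""
    def __init__(self):
        self.last_seen = {}   # chunkserver id -> timestamp of last heartbeat
        self.chunk_map = {}   # chunkserver id -> chunk handles it reported

    def heartbeat(self, server_id, chunk_handles, now=None):
        # A HeartBeat both signals liveness and piggybacks state
        # (here: the chunks the server currently holds).
        now = time.time() if now is None else now
        self.last_seen[server_id] = now
        self.chunk_map[server_id] = list(chunk_handles)

    def live_servers(self, now=None):
        # Servers that have reported within the timeout window.
        now = time.time() if now is None else now
        return [s for s, t in self.last_seen.items()
                if now - t < HEARTBEAT_TIMEOUT]

master = Master()
master.heartbeat("cs-1", ["chunk-a", "chunk-b"], now=100.0)
master.heartbeat("cs-2", ["chunk-a"], now=100.0)
# cs-2 goes silent; cs-1 keeps reporting.
master.heartbeat("cs-1", ["chunk-a", "chunk-b"], now=105.0)
print(master.live_servers(now=111.0))  # ['cs-1']
```

A real master would also use the piggybacked chunk lists to detect stale or missing replicas and schedule re-replication.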
GFS Architecture
• Problem:
  • Single point of failure
  • Scalability bottleneck
• GFS solutions:
  • Shadow masters
  • Minimize master involvement
    • Never move data through it; use it only for metadata
    • Cache metadata at clients
    • Large chunk size
    • Master delegates authority to primary replicas for data mutations (chunk leases)
• Simple, and good enough for Google’s concerns
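The "metadata only" role of the master can be sketched as follows: the client asks the master for chunk locations once, caches them, and then moves data directly to and from chunkservers. All class and method names here are illustrative assumptions:

```python
class Master:
    """Serves only metadata: which chunkservers hold which chunk handle."""
    def __init__(self, locations):
        self.locations = locations  # chunk handle -> list of chunkserver ids
        self.lookups = 0            # count metadata RPCs, for illustration

    def lookup(self, handle):
        self.lookups += 1
        return self.locations[handle]

class Client:
    """Caches chunk locations and moves data directly to/from chunkservers,
    so file data never flows through the master."""
    def __init__(self, master, chunkservers):
        self.master = master
        self.chunkservers = chunkservers  # chunkserver id -> {handle: bytes}
        self.cache = {}                   # client-side metadata cache

    def read(self, handle):
        if handle not in self.cache:          # metadata RPC only on a miss
            self.cache[handle] = self.master.lookup(handle)
        replica = self.cache[handle][0]       # then talk to a replica directly
        return self.chunkservers[replica][handle]

servers = {"cs-1": {"c1": b"hello"}, "cs-2": {"c1": b"hello"}}
m = Master({"c1": ["cs-1", "cs-2"]})
c = Client(m, servers)
c.read("c1"); c.read("c1")
print(m.lookups)  # 1: the second read was served from cached locations
```

Large chunks make this caching effective: one location lookup covers many megabytes of subsequent reads.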
Metadata
• Metadata storage
  • Holds all metadata in RAM
  • Very fast operations on file system metadata
• Namespace management / locking
• Periodic communication with chunkservers
  • Give instructions, collect state, track cluster health
• Garbage collection
  • Simpler and more reliable than traditional file delete
  • Master logs the deletion and renames the file to a hidden name
  • Lazily garbage-collects hidden files
• API:
  • open, delete, read, write (as expected)
  • snapshot: quickly create a copy of a file
  • append: at least once, possibly with gaps and/or inconsistencies among clients
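The "at least once" contract of record append means a retried append can leave duplicate records that readers must filter out themselves. A self-contained sketch of that failure mode and an application-level deduplication pass; the record layout and names are hypothetical:

```python
class Chunk:
    """Append-only chunk; an append can succeed on the replica but lose its
    acknowledgement, forcing the client to retry (hence at-least-once)."""
    def __init__(self):
        self.records = []

    def append(self, record, fail=False):
        self.records.append(record)    # the write lands...
        if fail:
            raise IOError("ack lost")  # ...but the acknowledgement is lost

def append_at_least_once(chunk, record, fail_first=False):
    try:
        chunk.append(record, fail=fail_first)
    except IOError:
        chunk.append(record)           # retry: the record now appears twice

chunk = Chunk()
append_at_least_once(chunk, ("id-1", "payload-a"))
append_at_least_once(chunk, ("id-2", "payload-b"), fail_first=True)

# Readers must tolerate duplicates, e.g. by deduplicating on a record id.
seen, unique = set(), []
for rid, payload in chunk.records:
    if rid not in seen:
        seen.add(rid)
        unique.append(payload)
print(unique)  # ['payload-a', 'payload-b']
```

GFS applications handled this in practice with self-identifying records (unique ids, checksums) rather than expecting exactly-once delivery from the file system.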
Fault Tolerance
• High availability
  • Fast recovery: master and chunkservers restartable in a few seconds
  • Chunk replication: 3 replicas by default
  • Shadow masters
• Data integrity
  • Checksum every 64KB block in each chunk
• Limitations
  • Security
    • Trusted environment, trusted users
    • That doesn’t stop users from interfering with each other…
  • Does not mask all forms of data corruption: requires application-level checksums
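Per-block checksumming can be sketched with Python's `zlib.crc32` standing in for whatever checksum function GFS actually used; the 64 KB granularity matches the slide, everything else is an illustrative assumption:

```python
import zlib

BLOCK = 64 * 1024  # each chunk is checksummed in 64 KB blocks

def checksum_blocks(chunk_data):
    """One CRC32 per 64 KB block of a chunk."""
    return [zlib.crc32(chunk_data[i:i + BLOCK])
            for i in range(0, len(chunk_data), BLOCK)]

def verify(chunk_data, checksums):
    """Return indices of corrupted blocks; the chunkserver would re-fetch
    those from another replica rather than serve bad data."""
    return [i for i, c in enumerate(checksum_blocks(chunk_data))
            if c != checksums[i]]

data = bytes(200 * 1024)          # a 200 KB chunk -> 4 blocks
sums = checksum_blocks(data)
corrupted = data[:70 * 1024] + b"\x01" + data[70 * 1024 + 1:]
print(verify(data, sums))         # []
print(verify(corrupted, sums))    # [1]: only the damaged block fails
```

Small blocks localize the damage: a single flipped byte invalidates one 64 KB block, not the whole 64 MB chunk.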
HDFS (Hadoop Distributed File System)
• Namenode (Master)
  • Metadata:
    • Where file blocks are stored (namespace image)
    • Edit (operation) log
• Secondary namenode (Shadow master)
• Datanode (Chunkserver)
  • Stores and retrieves blocks, as requested by clients or the namenode
  • Reports to the namenode with the list of blocks it is storing
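The namenode does not persist block locations; it reconstructs the block-to-datanode map from the reports datanodes send. A minimal sketch of that bookkeeping (names hypothetical, not the real HDFS API):

```python
class Namenode:
    """Builds the block -> datanode map from datanode block reports;
    the map is not persisted, only rebuilt from incoming reports."""
    def __init__(self):
        self.block_map = {}  # block id -> set of datanode ids

    def block_report(self, datanode_id, block_ids):
        for b in block_ids:
            self.block_map.setdefault(b, set()).add(datanode_id)

    def locate(self, block_id):
        # All datanodes known to hold this block.
        return sorted(self.block_map.get(block_id, set()))

nn = Namenode()
nn.block_report("dn-1", ["b1", "b2"])
nn.block_report("dn-2", ["b2", "b3"])
print(nn.locate("b2"))  # ['dn-1', 'dn-2']
```

This is why a restarted namenode must wait for block reports before it can serve reads reliably (HDFS's "safe mode").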
Noticeable differences from GFS
• Open source
• Provides many interfaces and libraries for different file systems
  • S3, KFS, etc.
  • Thrift (C++, Python, ...), libhdfs (C), FUSE
Anatomy of a file read
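The read-path figure is not reproduced here. In outline: the client fetches the file's block locations from the namenode, then streams each block directly from a datanode, so file data bypasses the namenode. A toy simulation of those steps; the data structures and names are illustrative, not the real HDFS API:

```python
def read_file(namenode, datanodes, path):
    """Sketch of the HDFS read path:
    1. ask the namenode for the file's blocks and their locations;
    2. read each block directly from one datanode holding it;
    3. concatenate the blocks in order to reconstruct the file."""
    data = b""
    for block_id, locations in namenode[path]:  # step 1: one metadata lookup
        replica = locations[0]                  # ideally the closest replica
        data += datanodes[replica][block_id]    # step 2: data bypasses namenode
    return data

namenode = {"/logs/a": [("b1", ["dn-1", "dn-2"]), ("b2", ["dn-2"])]}
datanodes = {"dn-1": {"b1": b"hello "},
             "dn-2": {"b1": b"hello ", "b2": b"world"}}
print(read_file(namenode, datanodes, "/logs/a"))  # b'hello world'
```

The real client picks the replica nearest in the network topology and fails over to another replica if a datanode read errors out.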
Anatomy of a file write
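The write-path figure is likewise missing. The key idea is the replication pipeline: the client ships a block to the first datanode, and each datanode forwards it to the next in the chain, so the client sends the data only once. A toy sketch (names hypothetical):

```python
def pipeline_write(datanodes, pipeline, block_id, data):
    """Sketch of the HDFS write pipeline: the first datanode stores the
    block locally, then forwards it downstream until the chain is done."""
    if not pipeline:
        return
    first, rest = pipeline[0], pipeline[1:]
    datanodes[first][block_id] = data                # store locally...
    pipeline_write(datanodes, rest, block_id, data)  # ...then forward

datanodes = {"dn-1": {}, "dn-2": {}, "dn-3": {}}
pipeline_write(datanodes, ["dn-1", "dn-2", "dn-3"], "b7", b"payload")
print(sorted(d for d in datanodes if "b7" in datanodes[d]))
# ['dn-1', 'dn-2', 'dn-3']
```

In the real protocol acknowledgements flow back up the pipeline, and the write succeeds only once every datanode in the chain has acked.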
Additional features
• Replica placement:
  • Replicas spread across different nodes, racks, and data centers
• Coherency model:
  • Describes data visibility
  • The block currently being written may not be visible to other readers
• Web interface
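Rack-aware placement can be sketched as a greedy choice of nodes that do not share a rack (here, also a data center) with an already-chosen replica. This illustrates the idea only; it is not HDFS's actual placement policy, and all names are hypothetical:

```python
def place_replicas(nodes, n=3):
    """Pick up to n replica holders, avoiding nodes whose (data center, rack)
    pair is already used, so one rack or center failure loses at most one copy.
    `nodes` maps node id -> (data center, rack)."""
    chosen, used_racks = [], set()
    for node, (center, rack) in nodes.items():
        if (center, rack) not in used_racks:
            chosen.append(node)
            used_racks.add((center, rack))
        if len(chosen) == n:
            break
    return chosen

nodes = {
    "n1": ("dc1", "r1"), "n2": ("dc1", "r1"),  # n2 shares n1's rack: skipped
    "n3": ("dc1", "r2"), "n4": ("dc2", "r1"),
}
print(place_replicas(nodes))  # ['n1', 'n3', 'n4']
```

The real HDFS default trades this strict spreading against write bandwidth (e.g. two replicas on one remote rack), so treat this purely as a sketch of the fault-domain idea.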
Bibliography
• https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html