Dr T.N.Sharma
• Data that is beyond the storage and processing capacity of conventional database systems
• Characteristics of Big Data
◦ Volume
◦ Velocity
◦ Variety
• An Apache top-level project: an open-source implementation of frameworks for reliable, scalable, distributed computing and data storage.
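Hadoop's distributed-computing framework, MapReduce, is not named on this slide. The sketch below is a minimal, single-process illustration of the MapReduce idea on a word count: a map step emits (word, 1) pairs, a shuffle groups them by key, and a reduce step sums each group. The function names are hypothetical, and everything runs sequentially here, whereas Hadoop would run map tasks in parallel across the cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    # Emit (word, 1) for every word in one input line.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group intermediate values by key, as the framework does between map and reduce.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Combine all partial counts for one word.
    return key, sum(values)


def word_count(lines: List[str]) -> Dict[str, int]:
    # Hypothetical driver: in Hadoop, map tasks would run in parallel on the
    # nodes that store each input block; here all phases run in one process.
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data big cost", "data time"]))
```

Because map and reduce only see their own inputs, each can run on a different machine; the shuffle is the only step that moves data between them.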
• It is a flexible and highly available architecture for large-scale computation and data processing on a network of commodity hardware.
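High availability on unreliable commodity machines is usually achieved by splitting data into blocks and keeping several copies of each block on different nodes. The sketch below assumes this scheme and uses a toy 4-byte block size with a replication factor of 3 (HDFS in Hadoop 2 and later defaults to 128 MB blocks and 3 replicas). The round-robin placement and all function names are hypothetical simplifications; real HDFS uses a rack-aware placement policy.

```python
from typing import Dict, List

BLOCK_SIZE = 4    # toy block size in bytes (illustrative only)
REPLICATION = 3   # copies kept of each block


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> List[bytes]:
    # Cut the file into fixed-size blocks; the last block may be shorter.
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: List[str],
                   replication: int = REPLICATION) -> Dict[int, List[str]]:
    # Hypothetical round-robin placement: block b goes to `replication`
    # consecutive, distinct nodes.
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    return {
        b: [nodes[(b + r) % len(nodes)] for r in range(replication)]
        for b in range(num_blocks)
    }


def readable_after_failure(placement: Dict[int, List[str]], failed: str) -> bool:
    # Every block stays readable if at least one replica is on a live node.
    return all(any(n != failed for n in replicas) for replicas in placement.values())


if __name__ == "__main__":
    blocks = split_into_blocks(b"big data on commodity hardware")
    nodes = ["node1", "node2", "node3", "node4"]
    placement = place_replicas(len(blocks), nodes)
    print(len(blocks), placement[0])
    print(readable_after_failure(placement, "node2"))
```

With three replicas, any single machine can fail without making data unavailable, which is what lets the cluster tolerate cheap hardware.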
• Designed to answer the question: “How to process big data with reasonable cost and time?”
2005: Doug Cutting and Michael J. Cafarella
developed Hadoop to support distribution for
the Nutch search engine project.
[Photo: Doug Cutting]
The project was funded by Yahoo.
HDFS
• Responsible for storing data on the cluster
◦ Clients communicate with the NameNode to determine which blocks make up a file and on which DataNodes those blocks are stored
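The lookup described above can be sketched in a few lines. This is a simplified model, not the HDFS API: the `NameNode` and `DataNode` classes, `get_block_locations`, `read_file`, and the block IDs are all hypothetical. The NameNode holds only metadata (file → block list, block → DataNode list); the client asks it where the blocks are and then reads each block directly from a DataNode that holds a copy, falling back to another replica if a node is unavailable.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class NameNode:
    # Metadata only: which blocks form each file, and which DataNodes hold each block.
    file_blocks: Dict[str, List[str]] = field(default_factory=dict)
    block_locations: Dict[str, List[str]] = field(default_factory=dict)

    def get_block_locations(self, path: str) -> List[Tuple[str, List[str]]]:
        # Answer a client query with ordered (block_id, datanodes) pairs.
        if path not in self.file_blocks:
            raise FileNotFoundError(path)
        return [(b, self.block_locations[b]) for b in self.file_blocks[path]]


@dataclass
class DataNode:
    # Holds the actual block contents.
    blocks: Dict[str, bytes] = field(default_factory=dict)


def read_file(path: str, namenode: NameNode, datanodes: Dict[str, DataNode]) -> bytes:
    # 1. Ask the NameNode which blocks make up the file and where they live.
    # 2. Fetch each block from the first live DataNode that has it;
    #    file contents never pass through the NameNode.
    parts: List[bytes] = []
    for block_id, locations in namenode.get_block_locations(path):
        for node_name in locations:
            node = datanodes.get(node_name)
            if node is not None and block_id in node.blocks:
                parts.append(node.blocks[block_id])
                break
        else:
            raise OSError(f"no live replica for {block_id}")
    return b"".join(parts)


if __name__ == "__main__":
    nn = NameNode(
        file_blocks={"/logs/a.txt": ["blk_1", "blk_2"]},
        block_locations={"blk_1": ["dn1", "dn2"], "blk_2": ["dn2", "dn3"]},
    )
    # dn1 is absent (e.g. failed), so blk_1 is read from its replica on dn2.
    dns = {
        "dn2": DataNode({"blk_1": b"hello ", "blk_2": b"world"}),
        "dn3": DataNode({"blk_2": b"world"}),
    }
    print(read_file("/logs/a.txt", nn, dns))
```

Keeping the NameNode out of the data path means it only serves small metadata requests, while the bulk transfer is spread across the DataNodes.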