
CLOUDSTORE

Introduction

CloudStore (formerly the Kosmos filesystem, KFS) was initially designed and implemented at Kosmix in 2006 by two developers, Sriram Rao and Blake Lewis. KFS was released as an open-source project in September 2007. Quantcast is now the primary sponsor of the project.

Web-scale applications require a scalable storage infrastructure to process vast amounts of data.
CloudStore (formerly, Kosmos filesystem) is an open-source high performance distributed file system
designed to meet such an infrastructure need.

CloudStore is implemented in C++ using standard system components such as the STL (Standard Template Library), Boost libraries, aio, and log4cpp. CloudStore is integrated with Hadoop and Hypertable.
This enables applications built on those systems to seamlessly use CloudStore as the underlying data
store. CloudStore is deployed on Solaris and Linux platforms for storing web log data, crawler data,
etc. CloudStore source code is released under the terms of the Apache License Version 2.0.

Web search engines are required to process large volumes of data. This entails having a scalable backend storage infrastructure built on commodity hardware (such as a cluster of PCs running Linux). To address this infrastructure need, Kosmix released KFS as an open-source project under the terms of the Apache 2.0 license. The initial release was KFS version 0.1, which was designated "alpha".

KFS was designed and implemented for a growing class of applications that process large volumes of data: web search, web log analysis, Web 2.0 applications, and grid computing. The key requirement for these applications is a cost-efficient, scalable compute/storage infrastructure; KFS focuses on the scalable storage side of that infrastructure.

The big differences appear to be that KFS is a more complete clone of GFS:

* KFS supports atomic append.
* KFS supports rebalancing.
* KFS exports a POSIX file interface; GFS does not.

ABOUT CLOUDSTORE

CloudStore builds upon ideas from Google's well-known filesystem project, GFS. The system
consists of 3 components:

● Meta server: This provides the global namespace for the filesystem.

● Chunkserver (block server): Files in KFS are split into chunks; each chunk is 64MB in size. Chunks are replicated and striped across chunkservers. A chunkserver stores its chunks as files in the underlying file system (such as ZFS on Solaris or XFS on Linux).

● Client library: The client library is linked with applications and enables them to read/write files stored in KFS. It provides the file system API through which applications interface with CloudStore. To use CloudStore, an application must be modified and relinked with the CloudStore client library.
A file consists of a set of chunks, but applications are oblivious to chunks: I/O is done on files, and the translation from a file offset to a chunk and an offset within that chunk is transparent to the application. Each chunk is fixed in size at 64MB. While CloudStore can be accessed natively from C++ applications, support is also provided for Java and Python: JNI (Java Native Interface) glue code and a Python module are included in the release to let those applications access CloudStore via the CloudStore client library APIs.
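
The file-offset-to-chunk translation mentioned above is simple arithmetic. Below is a minimal C++ sketch of the idea, assuming only the fixed 64MB chunk size; it is illustrative and is not the actual client-library code.

#include <cstdint>
#include <utility>

// Fixed chunk size used by KFS/CloudStore (64 MB).
const int64_t kChunkSize = 64LL * 1024 * 1024;

// Translate an absolute file offset into (chunk index, offset within chunk).
// Illustrative only; the real client library performs this mapping internally.
std::pair<int64_t, int64_t> OffsetToChunk(int64_t fileOffset) {
    int64_t chunkIndex    = fileOffset / kChunkSize;  // which chunk of the file
    int64_t offsetInChunk = fileOffset % kChunkSize;  // position inside that chunk
    return std::make_pair(chunkIndex, offsetInChunk);
}

For example, an offset of 100MB falls in chunk 1, at offset 36MB within that chunk.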

Releases of Kosmos File System

The various releases are:

■ KFS 0.2.3 release (Latest)
■ KFS 0.2.2 release
■ KFS 0.2.1 release
■ KFS 0.2.0 release
■ KFS 0.1.3 release
■ KFS 0.1.2 release
■ Kosmos 0.1.1 release
■ KFS 0.1 release

KFS version 0.1 release

Release Date: 2007-09-28 05:15

KFS is a scalable distributed file system designed for applications with large data processing needs
(such as, web search, text mining, grid computing, etc). This is the first release of the software.

Kosmos version 0.1.1 release

Release Date: 2007-11-13 06:44

Update to the initial release with additional features and stability fixes.

KFS version 0.1.2 release

Release Date: 2007-12-21 05:59

This release is a bug-fix release. We are rolling in the fix to the client library code from kfs-0.1.1.

KFS version 0.1.3 release

Release Date: 2008-06-02 01:18

We are pleased to release KFS v. 0.1.3. The primary change from the previous release is support for 32-bit platforms.

KFS version 0.2.1 release

Release Date: 2008-08-11 18:03

This is a new release that updates kfs-0.2.0 with minor fixes and support for Mac OS X (Leopard).

KFS version 0.2.2 release

Release Date: 2008-09-24 06:32

We are pleased to announce release 0.2.2 of KFS. This release has stability improvements over the previous release and also includes a Web UI for monitoring KFS servers.

KFS version 0.2.3 release

Release Date: 2009-01-12 21:32

We are pleased to announce release 0.2.3 of KFS. The new capabilities added in this release are:
(1) support for JBOD on the chunkservers (chunks are placed round-robin across drives, and the available space on a given drive is also used to determine placement), and
(2) kfsfsck, a tool for checking KFS health.
In addition, the release contains stability fixes as well as fixes to FUSE.

CloudStore Features

Incremental scalability: New chunkserver nodes can be added as storage needs increase; the
system automatically adapts to the new nodes.

Availability: Replication is used to provide availability in the face of chunkserver failures. Typically, files are replicated 3-way.

Per-file degree of replication: The degree of replication is configurable on a per-file basis, with a maximum of 64.

Re-replication: Whenever the degree of replication for a file drops below the configured amount (such as due to an extended chunkserver outage), the metaserver forces the affected blocks to be re-replicated on the remaining chunkservers. Re-replication is done in the background without overwhelming the system.

Rack-aware data placement: The chunk placement algorithm is rack-aware. Wherever possible, it
places chunks on different racks.
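
As an illustration of the rack-aware idea only (the server and rack types below are hypothetical, and this is not the actual metaserver placement code), a greedy pass that prefers chunkservers on racks not yet holding a replica might look like this:

#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical description of a chunkserver: its address and the rack it sits on.
struct ChunkServer {
    std::string host;
    std::string rackId;
};

// Pick up to numReplicas servers, preferring servers on racks that do not
// already hold a copy of the chunk.
std::vector<ChunkServer> PlaceChunk(const std::vector<ChunkServer>& servers,
                                    size_t numReplicas) {
    std::vector<ChunkServer> chosen;
    std::set<std::string> usedRacks, usedHosts;
    // First pass: at most one replica per rack, wherever possible.
    for (size_t i = 0; i < servers.size() && chosen.size() < numReplicas; ++i) {
        if (usedRacks.count(servers[i].rackId) == 0) {
            chosen.push_back(servers[i]);
            usedRacks.insert(servers[i].rackId);
            usedHosts.insert(servers[i].host);
        }
    }
    // Second pass: if there are fewer racks than replicas, fill the remainder
    // with servers that have not been chosen yet.
    for (size_t i = 0; i < servers.size() && chosen.size() < numReplicas; ++i) {
        if (usedHosts.count(servers[i].host) == 0) {
            chosen.push_back(servers[i]);
            usedHosts.insert(servers[i].host);
        }
    }
    return chosen;
}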

Re-balancing: Periodically, the meta-server may rebalance the chunks amongst chunkservers. This is
done to help with balancing disk space utilization amongst nodes.

Data integrity: To handle disk corruption of data blocks, blocks are checksummed. Checksum verification is done on each read; whenever there is a checksum mismatch, re-replication is used to recover the corrupted chunk.

File writes: The system follows the standard model. When an application creates a file, the filename
becomes part of the filesystem namespace. For performance, writes are cached at the CloudStore
client library. Periodically, the cache is flushed and data is pushed out to the chunkservers. Also,
applications can force data to be flushed to the chunkservers. In either case, once data is flushed to
the server, it is available for reading.

Leases: CloudStore client library uses caching to improve performance. Leases are used to support
cache consistency.

Chunk versioning: Versioning is used to detect stale chunks.

Client side fail-over: The client library is resilient to chunkserver failures. During reads, if the client
library determines that the chunkserver it is communicating with is unreachable, the client library will
fail-over to another chunkserver and continue the read. This fail-over is transparent to the application.
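
The fail-over loop can be pictured with the following sketch. The readFn parameter stands in for the per-chunkserver read call and is hypothetical; the point is only that the client walks the replica list until a read succeeds, invisibly to the application.

#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Try each replica in turn until one read succeeds.  readFn returns false when
// the chunkserver is unreachable or the read fails.
bool ReadWithFailover(const std::vector<std::string>& replicas,
                      char* buf, size_t nbytes,
                      const std::function<bool(const std::string&, char*, size_t)>& readFn) {
    for (size_t i = 0; i < replicas.size(); ++i) {
        if (readFn(replicas[i], buf, nbytes)) {
            return true;   // this replica served the read
        }
        // Replica unreachable or read failed: transparently try the next one.
    }
    return false;          // every replica failed
}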

Language support: The CloudStore client library can be accessed from C++, Java, and Python.

FUSE support on Linux: Mounting CloudStore via FUSE allows existing Linux utilities (such as ls) to interface with CloudStore.

Tools: A shell binary is included in the set of tools. This allows users to navigate the filesystem tree using utilities such as cp, ls, mkdir, rmdir, rm, and mv. Tools to monitor the chunkservers and metaserver are also provided.

Deploy scripts: To simplify launching CloudStore servers, a set of scripts is also provided to:

1. install CloudStore binaries on a set of nodes, and
2. start/stop CloudStore servers on a set of nodes.

Job placement support: The CloudStore client library exports an API to determine the location of a
byte range of a file. Job placement systems built on top of CloudStore can leverage this API to
schedule jobs appropriately.
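
The exact API name is not reproduced here; the sketch below only illustrates the arithmetic a scheduler would rely on, namely mapping a byte range onto the chunk indices it spans (the metaserver can then be asked which chunkservers hold those chunks). The helper and the 64MB constant are assumptions for illustration.

#include <cstdint>
#include <vector>

const int64_t kChunkSize = 64LL * 1024 * 1024;  // 64 MB chunks

// Return the chunk indices covered by the byte range [offset, offset + length).
// A job placement system would schedule work near the servers holding them.
std::vector<int64_t> ChunksForRange(int64_t offset, int64_t length) {
    std::vector<int64_t> chunks;
    if (length <= 0) return chunks;
    int64_t first = offset / kChunkSize;
    int64_t last  = (offset + length - 1) / kChunkSize;
    for (int64_t c = first; c <= last; ++c) {
        chunks.push_back(c);
    }
    return chunks;
}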

Local read optimization: When applications are run on the same nodes as chunkservers, the
CloudStore client library contains an optimization for reading data locally. That is, if the chunk is stored
on the same node as the one on which the application is executing, data is read from the local node.

How KFS Works


Storage Virtualization

KFS constructs a global namespace by decoupling storage from the filesystem namespace:
● Build a ''disk'' by aggregating the storage from individual nodes in the cluster.
● To improve performance, stripe a file across multiple nodes in the cluster.
● Use replication to tolerate failures.
● Simplify storage management: the system automatically balances storage utilization across all nodes.
● Any file can be accessed from any machine in the network.

System Architecture

The system consists of a single meta-data server that maintains the global namespace, multiple chunkservers that enable access to data, and a client library linked with applications for accessing files in KFS. The system is implemented in C++.

Inter-process communication is via non-blocking TCP sockets. The communication protocol is text-based and patterned after HTTP. Connections between the meta-server and the chunkservers are persistent. The failure model is simple:
● A connection break implies failure.
● This works for LAN settings.

KFS consists of three components:

Meta Server

The meta server is the repository for all file meta-data. It stores the file meta-data, such as directory information and the blocks of a file, in memory in a B-tree. Operations that mutate the tree are logged to
a log file. Periodically, via an off-line process, the log files are compacted to create a checkpoint file.
Whenever the metaserver is restarted, it rebuilds the B-tree from the latest checkpoint; it then applies
mutations to the tree from the log files to recover system state.
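
The recovery sequence (load the latest checkpoint, then replay logged mutations) can be sketched as follows. The on-disk formats, the std::map standing in for the B-tree, and the operation names are all hypothetical; this is not the actual metaserver code.

#include <fstream>
#include <map>
#include <sstream>
#include <string>

// Toy metadata tree: path -> serialized file metadata.
typedef std::map<std::string, std::string> MetaTree;

// Load the most recent checkpoint (assumed here to be one "path meta" pair per line).
void LoadCheckpoint(const std::string& checkpointFile, MetaTree& tree) {
    std::ifstream in(checkpointFile.c_str());
    std::string path, meta;
    while (in >> path >> meta) {
        tree[path] = meta;
    }
}

// Replay mutations recorded after the checkpoint.  Each line is a logged
// operation, assumed here to be "create <path> <meta>" or "remove <path>".
void ReplayLog(const std::string& logFile, MetaTree& tree) {
    std::ifstream in(logFile.c_str());
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream op(line);
        std::string verb, path, meta;
        op >> verb >> path;
        if (verb == "create") {
            op >> meta;
            tree[path] = meta;
        } else if (verb == "remove") {
            tree.erase(path);
        }
    }
}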

For fault-tolerance, the meta server's logs and checkpoint files should be backed up to a remote node.
The source code contains scripts that use rsync to backup system meta-data.

Chunkserver

Chunkservers store chunks, which are blocks of a file. Each chunk is 64MB in size. A chunkserver stores chunks as files in the underlying filesystem. To protect against data corruption, Adler-32 checksums are used:
● On writes, checksums are computed on 64KB block boundaries and saved in the chunk meta-data.
● On reads, checksum verification is done using the saved checksums.

Internally, each chunk file is named [file-id].[chunk-id].[version]. Each chunk file has a 16K header that contains the chunk checksum information. The checksum information is updated during writes.
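
A minimal sketch of the per-block checksumming scheme, using zlib's Adler-32 implementation and the 64KB block boundary described above (the on-disk header layout is not reproduced; this is illustrative only):

#include <cstddef>
#include <vector>
#include <zlib.h>   // adler32()

const size_t kChecksumBlockSize = 64 * 1024;  // checksums cover 64 KB blocks

// Compute one Adler-32 checksum per 64 KB block of a chunk's data.  The
// chunkserver keeps such checksums in the chunk-file header and re-verifies
// them on every read.
std::vector<unsigned long> BlockChecksums(const unsigned char* data, size_t len) {
    std::vector<unsigned long> sums;
    for (size_t off = 0; off < len; off += kChecksumBlockSize) {
        size_t n = (len - off < kChecksumBlockSize) ? (len - off) : kChecksumBlockSize;
        unsigned long s = adler32(0L, Z_NULL, 0);          // seed value
        s = adler32(s, data + off, static_cast<uInt>(n));  // checksum this block
        sums.push_back(s);
    }
    return sums;
}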

When a chunkserver is restarted, it scans the directory containing the chunks to determine the chunks
it has. It then sends that information to the meta server. The meta server validates the blocks and
notifies the chunkserver of any _stale_ blocks. These blocks are those that are not owned by any file
in the system. Whenever a chunkserver receives a stale chunk notification, it deletes those chunks.

Client library

The client library enables applications to access files stored in KFS. For file operations, the client
library interfaces with the meta-data server to get file meta-data:
● On reads, the client library interfaces with the meta server to determine chunk locations; it then downloads the block from one of the replicas.
● On writes, the client library interfaces with the meta server to determine where to write the chunk; the client library then forwards the data to the first replica; the first replica forwards the data to the next replica, and so on (the chain is illustrated below).
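
The write chain in the last bullet can be pictured with the following conceptual sketch. In the real system the forwarding happens over TCP between chunkservers; here it is reduced to a loop that prints each hop, purely to show the ordering.

#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

// Conceptual illustration of the daisy-chain write path: the client hands the
// data to the first replica; each replica keeps a local copy and forwards the
// data to the next server in the chain.
void IllustrateWriteChain(const std::vector<std::string>& replicaChain, size_t nbytes) {
    for (size_t i = 0; i < replicaChain.size(); ++i) {
        std::cout << "replica " << replicaChain[i] << " stores " << nbytes << " bytes";
        if (i + 1 < replicaChain.size()) {
            std::cout << " and forwards to " << replicaChain[i + 1];
        }
        std::cout << "\n";
    }
}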

KFS virtualizes disk storage on a cluster of machines providing a global namespace. Files are striped
across nodes in the cluster and are replicated for fault tolerance/availability. KFS consists of a client
library that enables user applications to read/write files stored in KFS. KFS supports the familiar
filesystem interfaces/programming model. The functionality of the KFS API is similar to the model
exposed by operating systems such as Linux. To illustrate (a code sketch follows the list below):
• When a file is created, the filename is visible in the global namespace.
• As data is written to a block of a file, it gets flushed out to the set of servers storing that block. Data
written to servers can now be read by other processes.
• For writing/reading, a process can seek to any point in the file and read/write from there.
• Files can be opened for writing multiple times.
• Data can be appended to existing files by opening the file for writing in append mode.
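
To make the programming model above concrete, here is a hedged sketch against a hypothetical file-handle interface; the real KFS client-library class and method names may differ, and only the write/flush/seek/read pattern from the list is being illustrated.

#include <cstddef>
#include <string>

// Hypothetical file handle mirroring the model described above.
class CloudFile {
public:
    virtual ~CloudFile() {}
    virtual size_t Write(const char* buf, size_t n) = 0;  // buffered by the client library
    virtual void   Sync() = 0;                            // force buffered data to chunkservers
    virtual void   Seek(long long offset) = 0;            // reposition within the file
    virtual size_t Read(char* buf, size_t n) = 0;         // read from the current position
};

// The usage pattern the bullets above describe: write, flush, seek back, read.
void Example(CloudFile& f) {
    const char msg[] = "hello cloudstore";
    f.Write(msg, sizeof(msg) - 1);  // cached at the client library
    f.Sync();                       // now visible to other readers
    f.Seek(0);                      // seek to an arbitrary point
    char buf[64];
    f.Read(buf, sizeof(buf));       // read the data back
}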

When blocks of a file are striped across nodes in the cluster, KFS stores individual blocks of the file as files in the underlying file system (such as XFS on Linux). To guard against disk corruption, checksums are computed on the blocks and verified on each read. If disk corruption is detected by a checksum mismatch, the system discards the corrupted block and uses re-replication to recover the lost data. Each file stored in KFS is typically replicated 3-way; depending on application needs, the degree of replication for files can be changed on the fly.

KFS also contains rudimentary support for block rebalancing. To help with better disk utilization across nodes, the system may periodically migrate data from over-utilized to under-utilized nodes.

The KFS client library provides support for job placement systems. For instance, a job scheduler can determine the location(s) of a byte range within a file and schedule jobs appropriately.

KFS is implemented in C++. In addition to C++ applications, KFS also contains support for Java (via JNI) and Python applications. To enable a large class of applications to evaluate KFS, we have integrated KFS as the backing store for other open source projects:

• Hadoop: Hadoop is an open-source project that provides a Map/Reduce implementation. It contains a Filesystem API that allows alternate implementations to be used as the backing store. For example, the current set of choices for a backing store are the local filesystem, HDFS, and S3. As a new alternative to these choices, KFS is integrated with Hadoop using Hadoop's Filesystem API. This allows existing Hadoop Map/Reduce applications to use KFS seamlessly; that is, by changing some Hadoop configuration parameters, KFS can be used as the backing store. We have submitted the necessary “glue” code to the Hadoop code-base; it will be included in the next Hadoop release.

• Hypertable: Hypertable is an open source project (being developed at Zvents Inc.) that provides a
Big-Table interface. KFS is integrated with Hypertable as the backing store.

Using KFS as the filesystem for Map/Reduce jobs


KFS is integrated with Hadoop so that it can be used as the backing store for Map/Reduce jobs. This integration is done using the filesystem APIs provided by Hadoop. In Hadoop's conf directory (such as hadoop/conf), edit site.xml:

<property>
<name>fs.kfs.metaServerHost</name>
<value><server></value>
<description>The location of the meta server.</description>
</property>

<property>
<name>fs.kfs.metaServerPort</name>
<value><port></value>
<description>The location of the meta server's port.</description>
</property>
This enables path URIs of the form kfs://host1:20000/foo to be sent to KFS.


If you want KFS to be the default filesystem with Hadoop, update site.xml:

<property>
<name>fs.default.name</name>
<value>kfs://<server:port></value>
</property>
The Hadoop distribution contains a KFS jar file. To use the latest files, build the jar file (see HowToCompile). The remaining steps are:

1. Copy the jar file.
2. Update the LD_LIBRARY_PATH environment variable so that libkfsClient.so can be loaded.
3. Start the Map/Reduce job trackers.

cp ~/code/kfs/build/kfs-0.2.0.jar hadoop/lib
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:~/code/kfs/build/lib
hadoop/bin/start-mapred.sh

The Hadoop utilities can be used to view the KFS directory. For example,

bin/hadoop fs -fs kfs://<server:port> -ls /

can be used to list the contents of the root directory.

Using KFS with Hadoop 0.15x, 0.16x, 0.17x

There are a few bug fixes that have been checked into KFS+Hadoop glue code (the code is in:
src/.../org/apache/hadoop/fs/kfs/). These fixes will be part of Hadoop-0.18x. To enable KFS to work
properly with prior Hadoop releases, the bug fixes need to be backported. Code with the backport is
included in the kfs-0.2.0 release. For example, to use with Hadoop-0.17x (similarly for other releases):

cp ~/code/kfs/kfs-hadoop/0.17x/org/apache/hadoop/fs/kfs/* <hadoop-dir>/src/java/org/apache/hadoop/fs/kfs
rm <hadoop-dir>/lib/kfs-0.1*.jar
cp ~/code/kfs/build/kfs-0.2.0.jar <hadoop-dir>/lib
ant jar

This will build the Hadoop jar files to include the new "glue" code. After the build finishes, restart the Map/Reduce job trackers.

Limitations in CloudStore

Currently, the single point of failure is the one meta-data server. The GFS authors argued that a single master is a great feature, but they keep live backup masters ready to go. I'm sure this will be fixed soon in KFS.
From the GFS paper:
Having a single master vastly simplifies our design and enables the master to make sophisticated
chunk placement and replication decisions using global knowledge. The master state is replicated for
reliability. Its operation log and checkpoints are replicated on multiple machines.

KFS currently does not support atomic record append.

Most complaints so far are along the lines of "It doesn't work with Hadoop", which is silly, because it does. However, KFS is not really attractive until there is a MapReduce-alike to go with it; you have to be able to do something with all that data, after all. Perhaps KFS can be integrated with Hadoop's MapReduce to make up for the current lack of such a system from Kosmix.
