CLUSTER COMPUTING
Introduction:
A computer cluster is a group of linked computers, working together closely so that in many
respects they form a single computer. The components of a cluster are commonly, but not
always, connected to each other through fast local area networks. Clusters are usually deployed
to improve performance and/or availability over that of a single computer, while typically
being much more cost-effective than single computers of comparable speed or availability.
High Performance Computing (HPC) allows scientists and engineers to deal with very
complex problems using fast computer hardware and specialized software. Since these
problems often require hundreds or even thousands of processor-hours to complete, an
approach based on the use of supercomputers has traditionally been adopted. The recent
tremendous increase in the speed of PC-type computers has opened up a relatively cheap and
scalable solution for HPC using cluster technologies. Conventional MPP (Massively Parallel
Processing) supercomputers are oriented toward the very high end of performance. As a
result, they are relatively expensive and require special, and also expensive, maintenance
support. Better understanding of applications and algorithms, as well as significant
improvements in communication network technologies and processor speeds, led to the
emergence of a new class of systems, called clusters of SMPs (symmetric multiprocessors) or
networks of workstations (NOW), which are able to compete in performance with MPPs and
have excellent price/performance ratios for certain types of applications.
A cluster is a group of independent computers working together as a single system to ensure
that mission-critical applications and resources are as highly available as possible. The group is
managed as a single system, shares a common namespace, and is specifically designed to
tolerate component failures, and to support the addition or removal of components in a way
that's transparent to users.
Development of new materials and production processes based on high technologies requires
the solution of increasingly complex computational problems. However, even as computer
power, data storage, and communication speed continue to improve exponentially, available
computational resources often fail to keep up with what users demand of them. Therefore a
high-performance computing (HPC) infrastructure becomes a critical resource for research
and development as well as for many business applications. Traditionally, HPC applications
were oriented toward the use of high-end computer systems - so-called "supercomputers".
Before considering the amazing progress in this field, some attention should be paid to the
classification of existing computer architectures.
SISD (Single Instruction stream, Single Data stream) computers. These are the conventional
systems that contain one central processing unit (CPU) and hence can accommodate one
instruction stream that is executed serially. Nowadays many large mainframes may have more
than one CPU, but each of these executes instruction streams that are unrelated. Therefore,
such systems should still be regarded as a set of SISD machines acting on different data
spaces. Examples of SISD machines are most workstations, such as those of DEC, IBM,
Hewlett-Packard, and Sun Microsystems, as well as most personal computers.
SIMD (Single Instruction stream, Multiple Data stream) computers. Such systems often have
a large number of processing units that all execute the same instruction on different data in
lock-step. Thus, a single instruction manipulates many data items in parallel. Examples of
SIMD machines are the CPP DAP Gamma II and the Alenia Quadrics.
Vector processors, a subclass of SIMD systems. Vector processors act on arrays of similar
data rather than on single data items, using specially structured CPUs. When data can be
manipulated by these vector units, results can be delivered at a rate of one, two, and in
special cases three per clock cycle (a clock cycle being defined as the basic internal unit of
time for the system). So, vector processors execute on their data in an almost parallel way,
but only when running in vector mode; in that case they are several times faster than when
executing in conventional scalar mode. For practical purposes, vector processors are
therefore mostly regarded as SIMD machines. Examples of such systems are the Cray 1 and
the Hitachi S3600.
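The lock-step behaviour described above can be sketched in a few lines of plain Python. This is an illustration of the programming model only, not of real SIMD hardware: one operation is applied to many data elements at once.

```python
# Illustrative sketch only: real SIMD units apply one hardware instruction
# to many elements simultaneously; here a single Python operation is
# mapped over a whole array to show the programming model.

def simd_apply(op, data):
    """One 'instruction' (op) manipulating many data items in parallel."""
    return [op(x) for x in data]

def vector_add(a, b):
    """Vector-style elementwise addition, as a vector unit would deliver it."""
    return [x + y for x, y in zip(a, b)]

doubled = simd_apply(lambda x: 2 * x, [1, 2, 3, 4])   # one op, four elements
summed = vector_add([1, 2, 3], [10, 20, 30])
```

On real hardware the four multiplications would complete in a single instruction rather than four loop iterations; the point here is only that one instruction stream drives multiple data streams.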
MIMD (Multiple Instruction stream, Multiple Data stream) computers. These machines
execute several instruction streams in parallel on different data. The difference from the
multiprocessor SISD machines mentioned above lies in the fact that here the instructions and
data are related, because they represent different parts of the same task to be executed. So,
MIMD systems may run many sub-tasks in parallel in order to shorten the time-to-solution
for the main task. There is a large variety of MIMD systems, from a four-processor NEC
SX-5 to a thousand-processor SGI/Cray T3E supercomputer. Besides the classification above,
another important distinction between classes of computing systems can be made according
to the type of memory access.
Shared memory (SM) systems have multiple CPUs all of which share the same address space.
This means that the knowledge of where data is stored is of no concern to the user as there is
only one memory accessed by all CPUs on an equal basis. Shared memory systems can be both
SIMD and MIMD. Single-CPU vector processors can be regarded as an example of the former,
while the multi-CPU models of these machines are examples of the latter.
Distributed memory (DM) systems. In this case each CPU has its own associated memory.
The CPUs are connected by some network and may exchange data between their respective
memories when required. In contrast to shared memory machines the user must be aware of the
location of the data in the local memories and will have to move or distribute these data
explicitly when needed. Again, distributed memory systems may be either SIMD or MIMD.
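The difference in memory access can be made concrete with a toy model, in which plain Python dictionaries stand in for memories (no real hardware or processes are involved): in a shared-memory system every CPU reads one address space, while in a distributed-memory system data must be moved explicitly between node memories.

```python
# Toy model: dictionaries stand in for memories. Not real hardware.

# Shared memory: one address space, accessed by every CPU on an equal basis.
shared_memory = {"x": 42}

def cpu_read(addr):
    # any CPU reads the same memory directly; data location is of no concern
    return shared_memory[addr]

# Distributed memory: each node owns a private memory.
node_a_memory = {"x": 42}
node_b_memory = {}

def send(src_memory, dst_memory, addr):
    """Explicit data movement, which the user must program on a DM system."""
    dst_memory[addr] = src_memory[addr]

send(node_a_memory, node_b_memory, "x")  # B cannot see A's memory otherwise
```

The `send` call is the toy analogue of the explicit distribution of data that the text describes: without it, node B simply has no copy of `x`.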
Supercomputers are defined as the fastest, most powerful computers in terms of CPU power
and I/O capabilities. Since computer technology is continually evolving, this is always a
moving target: this year's supercomputer may well be next year's entry-level personal
computer. In fact, today's commonly available personal computers deliver performance that
easily bests the supercomputers that were on the market in the 1980s. A strong limitation on
the further scalability of vector computers was their shared-memory architecture. Therefore,
massively parallel processing (MPP) systems using distributed memory were introduced by
the end of the 1980s. The main advantage of such systems is the possibility of dividing a
complex job into several parts, which are executed in parallel by several processors, each
having dedicated memory. The communication between the parts of the main job occurs
within the framework of the so-called message-passing paradigm, which was standardized in
the Message Passing Interface (MPI). The message-passing paradigm is flexible enough to
support a variety of applications and is also well adapted to the MPP architecture. In recent
years, a tremendous improvement in the performance of standard workstation processors has
led to their use in MPP supercomputers, resulting in significantly lower price/performance
ratios.
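The divide-compute-combine pattern described above can be sketched as follows. Worker threads stand in for the processors here; a real MPP or cluster would run each part on a separate node with its own memory, exchanging results via an MPI library.

```python
# A sketch of the message-passing pattern: divide a job into parts, have
# workers compute the parts independently, and combine the partial results.
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    # the work done by one processor on its own part of the job
    return sum(chunk)

def parallel_sum(data, nworkers=4):
    # scatter: divide the job into roughly equal parts, one per worker
    chunks = [data[i::nworkers] for i in range(nworkers)]
    with ThreadPoolExecutor(max_workers=nworkers) as pool:
        # gather: partial results come back to the master and are combined
        return sum(pool.map(partial_sum, chunks))
```

For example, `parallel_sum(list(range(1000)))` splits the thousand numbers over four workers and combines their four partial sums into the final answer.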
Traditionally, conventional MPP supercomputers are oriented toward the very high end of
performance. As a result, they are relatively expensive and require special, and also
expensive, maintenance support. To meet the requirements of the lower and medium market
segments, symmetric multiprocessing (SMP) systems were introduced in the early 1990s to
address commercial users with applications such as databases, scheduling tasks in the
telecommunications industry, data mining, and manufacturing. Better understanding of
applications and algorithms, as well as significant improvements in communication network
technologies and processor speeds, led to the emergence of a new class of systems, called
clusters of SMPs or networks of workstations (NOW), which are able to compete in
performance with MPPs and have excellent price/performance ratios for certain types of
applications. In practice, clustering technology can be used for any arbitrary group of
computers, allowing homogeneous or heterogeneous systems to be built. Even bigger
performance can be achieved by combining groups of clusters into a hyper-cluster or even a
Grid-type system.
Extraordinary technological improvements over the past few years in areas such as
microprocessors, memory, buses, networks, and software have made it possible to assemble
groups of inexpensive personal computers and/or workstations into a cost-effective system
that functions in concert and possesses tremendous processing power. Cluster computing is
not new, but in company with other technical capabilities, particularly in the area of
networking, this class of machines is becoming a high-performance platform for parallel and
distributed applications. Scalable computing clusters, ranging from a cluster of
(homogeneous or heterogeneous) PCs or workstations to SMPs (Symmetric Multi
Processors), are rapidly becoming the standard platforms for high-performance and
large-scale computing. A cluster is a group of independent computer systems and thus forms
a loosely coupled multiprocessor system as shown in Figure
An MPP node typically cannot serve as a standalone computer; a cluster node, by contrast,
usually contains its own disk and is equipped with a complete operating system, and
therefore it can also handle interactive jobs. In a distributed system, each node can function
only as an individual resource, while a cluster system presents itself as a single system to the
user.
Beowulf clusters:
The concept of Beowulf clusters originated at the Center of Excellence in Space Data and
Information Sciences (CESDIS), located at the NASA Goddard Space Flight Center in
Maryland. The goal of building a Beowulf cluster is to create a cost-effective parallel
computing system from commodity components to satisfy specific computational
requirements of the earth and space sciences community. The first Beowulf cluster was built
from 16 Intel DX4 processors connected by a channel-bonded 10 Mbps Ethernet, and it ran
the Linux operating system. It was an instant success, demonstrating the concept of using a
commodity cluster as an alternative choice for high-performance computing (HPC). After the
success of the first Beowulf cluster, several more were built by CESDIS using several
generations and families of processors and networks. Beowulf is a concept of clustering
commodity computers to form a parallel, virtual supercomputer. It is easy to build a unique
Beowulf cluster from the components that you consider most appropriate for your
applications. Such a system can provide a cost-effective way to gain features and benefits
(fast and reliable services) that have historically been found only on more expensive
proprietary shared-memory systems. The typical architecture of a cluster is shown in Figure
3. As the figure illustrates, numerous design choices exist for building a Beowulf cluster. For
example, the bold line indicates our cluster configuration from bottom to top. No Beowulf
cluster is general enough to satisfy the needs of everyone.
The question may arise why clusters are designed and built when perfectly good commercial
supercomputers are available on the market. The answer is that the latter are expensive,
while clusters are surprisingly powerful. The supercomputer has come to play a larger role in
business applications; in areas from data mining to fault-tolerant performance, clustering
technology has become increasingly important. Commercial products have their place, and
there are perfectly good reasons to buy a commercially produced supercomputer, if it is
within our budget and our applications can keep the machine busy all the time. We will also
need a data center to keep it in, and then there is the budget to keep up with the maintenance
and upgrades required to keep our investment up to par. However, many who need to harness
supercomputing power do not buy supercomputers because they cannot afford them, and
such machines are also very difficult to upgrade. Clusters, on the other hand, are a cheap and
easy way to take off-the-shelf components and combine them into a single supercomputer. In
some areas of research clusters are actually faster than commercial supercomputers. Clusters
also have the distinct advantage that they are simple to build using components available
from hundreds of sources. We do not even have to use new equipment to build a cluster.
Cluster Styles:
There are many kinds of clusters that may be used for different applications.
Homogeneous Clusters
If we have a lot of identical systems, or a lot of money at our disposal, we will be building a
homogeneous cluster. This means that we will be putting together a cluster in which every
single node is exactly the same. Homogeneous clusters are very easy to work with, because
no matter what way we decide to tie them together, all of our nodes are interchangeable and
we can be sure that all of our software will work the same way on all of them.
Heterogeneous Clusters
They come in two general forms. The first and most common are heterogeneous clusters made
from different kinds of computers. It does not matter what the actual hardware is except that
there are different makes and models. A cluster made from such machines will have several
very important characteristics.
A visualization tool allows a user to examine the state of the machines allocated to their job,
as well as providing a means of studying message flows between nodes.
Design Considerations:
Before attempting to build a cluster of any kind, think about the type of problems you are
trying to solve. Different kinds of applications will actually run at different levels of
performance on different kinds of clusters. Beyond the brute force characteristics of memory
speed, I/O bandwidth, disk seek/latency time and bus speed on the individual nodes of your
cluster, the way you connect your cluster together can have a great impact on its efficiency.
Architecture:
A cluster is a type of parallel or distributed processing system, which consists of
a collection of interconnected stand-alone computers working together as a single, integrated
computing resource. A computer node can be a single or multiprocessor system (PCs,
workstations, or SMPs) with memory, I/O facilities, and an operating system. A cluster
generally refers to two or more computers (nodes) connected together. The nodes can exist
in a single cabinet or be physically separated and connected via a LAN. An interconnected
(LAN-based) cluster of computers can appear as a single system to users and applications.
Such a system can provide a cost-effective way to gain features and benefits (fast and
reliable services) that have historically been found only on more expensive proprietary
shared-memory systems. The typical architecture of a cluster is shown in Figure
• Applications (sequential, parallel, or distributed)
The network interface hardware acts as a communication processor and is responsible for
transmitting and receiving packets of data between cluster nodes via a network/switch.
Communication software offers a means of fast and reliable data communication among
cluster nodes and to the outside world. Often, clusters with a special network/switch like
Myrinet use communication protocols such as active messages for fast communication
among their nodes. These potentially bypass the operating system and thus remove the
critical communication overheads, providing direct user-level access to the network
interface. The cluster nodes can work collectively, as an integrated computing resource, or
they can operate as individual computers. The cluster middleware is responsible for offering
the illusion of a unified system image (single system image) and availability out of a
collection of independent but interconnected computers. Programming environments can
offer portable, efficient, and easy-to-use tools for the development of applications, including
message-passing libraries and debuggers. It should not be forgotten that clusters can be used
for the execution of sequential as well as parallel applications.
Network clustering connects otherwise independent computers to work together in some
coordinated fashion. Because clustering is a term used broadly, the hardware configuration of
clusters varies substantially depending on the networking technologies chosen and the purpose
(the so-called "computational mission") of the system. Clustering hardware comes in three
basic flavors: so-called "shared disk," "mirrored disk," and "shared nothing" configurations.
Shared Disk Clusters
One approach to clustering utilizes central I/O devices accessible to all computers ("nodes")
within the cluster. We call these systems shared-disk clusters as the I/O involved is typically
disk storage for normal files and/or databases. Shared-disk cluster technologies include Oracle
Parallel Server (OPS) and IBM's HACMP.
Shared-disk clusters rely on a common I/O bus for disk access but do not require shared
memory. Because all nodes may concurrently write to or cache data from the central disks, a
synchronization mechanism is needed to preserve a coherent view of the data.
Across a community of nodes, it is possible in this way to scale the number of read requests
that can be dealt with per second in a linear fashion.
Render Farms
Render farms are a special form of batch-processing cluster, with less of an emphasis on
responsiveness, since most of the processing jobs will take more than a minute. Low-cost
hardware and the quantity of available processing power are most important. Rendering is
used in the
visual effects, computer modeling and CGI industries and refers to the process of creating an
image from what are essentially mathematical formulae. Rendering engines provide numerous
different features, which in combination can produce a scene with the desired effects.
take an hour or more! Additionally, it is important to provide the means to develop
embedded applications, where compiling on the host may be painfully slow. Ainkaboot
believe in making things as simple as possible, so our systems can be integrated into your
existing code management system, or we can deploy an entirely new system with the cluster
managing your CVS (Concurrent Versions System) and your development, verification,
pre-production and production environments as well.
MPI Architecture
MPI stands for Message Passing Interface, and numerous implementations exist, all with
their own particular advantages. However, an MPI standard has been agreed upon, and
Ainkaboot support all the open-source MPI implementations available.
The architecture of an MPI cluster depends on the specific application, and many
supercomputing clusters are designed specifically with a couple of applications in mind.
However, there are some general points to note.
A key feature of MPI systems is low-latency networks for inter-node communication. As
such, the network switching technology is important in determining the eventual
performance of the system. Additionally, the application must be designed to take advantage
of the system, and should also take advantage of the processor architecture in use.
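One such general point can be made concrete. In a typical MPI program every process runs the same code and uses its rank and the communicator size to select its share of the work; the helper below is a hypothetical illustration of that decomposition, not part of any MPI library.

```python
# Hypothetical helper illustrating MPI-style work decomposition by rank.
# In a real MPI program, rank and size would come from the MPI library
# (e.g. the world communicator); here they are plain arguments.

def local_range(rank, size, n):
    """Half-open [start, stop) range of the n items owned by this rank."""
    base, extra = divmod(n, size)          # spread any remainder evenly
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop

# simulate 4 ranks, as `mpiexec -n 4` would start four copies of the program
ranges = [local_range(r, 4, 10) for r in range(4)]
```

With 10 items over 4 ranks, the first two ranks own 3 items each and the last two own 2 each, so the whole range is covered exactly once with no coordination between processes.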
Figure: MPI architecture
This clustering model provides redundancy of applications and data. This, of course,
requires at least two nodes: a primary and a backup. In this model, the nodes can be
active/passive or active/active. In the
active/passive scenario, one server is doing most of the work while the second server is
spending most of its time on replication work. In the active/active scenario, both servers are
doing primary work and both are accomplishing replication tasks so that each server always
"looks" just like the other. In both instances, instant failover is achievable should the primary
node (or the primary node for a particular application) experience a system or application
outage. As with the previous model, this model easily scales up (through application
replication) as the overall volume of users and transactions goes up. The scale-up happens
through simple application replication, requiring little or no application modification or
alteration.
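The active/passive scenario can be sketched as follows. This is a toy simulation with hypothetical node names; real cluster software detects failures via heartbeats over the network rather than a boolean flag.

```python
# Toy active/passive failover simulation; node names are hypothetical.

class Node:
    def __init__(self, name):
        self.name = name
        self.alive = True   # in reality, determined by a heartbeat check

def select_active(primary, backup):
    """The node that should serve requests: the primary unless it has failed."""
    return primary if primary.alive else backup

primary, backup = Node("node-a"), Node("node-b")
before = select_active(primary, backup)   # normal operation: primary serves
primary.alive = False                     # simulated outage of the primary
after = select_active(primary, backup)    # failover to the backup
```

The point of the active/passive design is that `select_active` needs no state migration at failover time, because the backup has been kept "looking" like the primary all along through replication.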
When two or more computers are used together to solve a problem, this is called a computer
cluster. There are then several ways of implementing the cluster; Beowulf is maybe the
best-known way to do it, but basically it is just cooperation between computers in order to
solve a task or a problem. Cluster computing is then simply what you do when you use a
computer cluster.
Grid computing is similar to cluster computing: it makes use of several computers, connected
in some way, to solve a large problem. There is often some confusion about the difference
between grid and cluster computing.
• The big difference is that a cluster is typically homogeneous while grids are heterogeneous.
• The computers that are part of a grid can run different operating systems and have
different hardware, whereas the cluster computers all have the same hardware and OS.
• A grid can make use of spare computing power on a desktop computer, while the
machines in a cluster are dedicated to working as a single unit and nothing else.
• A grid is inherently distributed by its nature over a LAN, a metropolitan area network, or
a WAN. On the other hand, the computers in a cluster are normally contained in a single
location or complex.
• Another difference lies in the way resources are handled. In the case of a cluster, the
whole system (all nodes) behaves like a single system, and resources are managed by a
centralized resource manager. In the case of a grid, every node is autonomous, i.e. it has
its own resource manager and behaves like an independent entity.
Characteristics of Grid Computing
Loosely coupled (Decentralization)
Diversity and Dynamism
Distributed Job Management & scheduling
Characteristics of Cluster computing
Tightly coupled systems
Single system image
On the Windows operating system compute clusters are supported by Windows Compute
Cluster Server 2003 and grid computing is supported by the Digipede Network™.
Cluster computing on Windows is provided by Windows Compute Cluster Server 2003
(CCS) from Microsoft. CCS is a 64-bit version of the Windows Server 2003 operating
system, packaged with various software components that greatly ease the management of
traditional cluster computing.
With a dramatically simplified cluster deployment and management experience, CCS
removes many of the obstacles imposed by other solutions.
CCS enables users to integrate with existing Windows infrastructure, including Active
Directory and SQL Server.
CCS supports a cluster of servers that includes a single head node and one or more compute
nodes. The head node controls and mediates all access to the cluster resources and is the single
point of management, deployment, and job scheduling for the compute cluster. All nodes
running in the cluster must have a 64-bit CPU.
How Does It Work?
A user submits a job to the head node. The job identifies the application to run on the cluster.
The job scheduler on the head node assigns each task defined by the job to a node and then
starts each application instance on the assigned node.
Results from each of the application instances are returned to the client via files or databases.
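The scheduling step above can be sketched as follows. This is a simplified model with hypothetical task and node names; the real CCS job scheduler also accounts for node state, priorities, and resource requirements.

```python
# Simplified sketch of a head node's job scheduler: assign each task of a
# submitted job to a compute node in round-robin fashion.
from itertools import cycle

def schedule(tasks, nodes):
    """Map each task in the job to the compute node that will run it."""
    assignment = {}
    node_cycle = cycle(nodes)
    for task in tasks:
        assignment[task] = next(node_cycle)
    return assignment

# a job with four tasks submitted to a two-node compute cluster
plan = schedule(["t1", "t2", "t3", "t4"], ["node1", "node2"])
```

Once the placement is decided, the head node would start each application instance on its assigned node and collect the results, as described above.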
Application parallelization is provided by Microsoft MPI (MSMPI), which supports
communication between tasks running in concurrent processes. MSMPI is a “tuned” MPI
implementation, optimized to deliver high performance on the 64-bit Windows Server OS.
MSMPI calls can be placed within an application, and the mpiexec.exe utility is available to
control applications from the command line. Because MSMPI enables communication
between the concurrently executing application instances, the nodes are often connected by a
high-speed interconnect such as Gigabit Ethernet or InfiniBand.
Reduced Cost: The price of off-the-shelf consumer desktops has plummeted in recent years,
and this drop in price has corresponded with a vast increase in their processing power and
performance. The average desktop PC today is many times more powerful than the first
mainframe computers.
Processing Power: The parallel processing power of a high-performance cluster can, in many
cases, prove more cost effective than a mainframe with similar power. This reduced price per
unit of power enables enterprises to get a greater ROI from their IT budget.
Improved Network Technology: Driving the development of computer clusters has been a
vast improvement in the technology related to networking, along with a reduction in the price.
Computer clusters are typically connected via a single virtual local area network (VLAN), and
the network treats each computer as a separate node. Information can be passed throughout
these networks with very little lag, ensuring that data doesn’t bottleneck between nodes.
Scalability: Perhaps the greatest advantage of computer clusters is the scalability they offer.
While mainframe computers have a fixed processing capacity, computer clusters can be easily
expanded as requirements change by adding additional nodes to the network.
Availability: When a mainframe computer fails, the entire system fails. However, if a node in
a computer cluster fails, its operations can be simply transferred to another node within the
cluster, ensuring that there is no interruption in service.
There are many projects investigating the development of supercomputing-class machines
using commodity off-the-shelf components.
• Solaris-MC project at Sun Labs, Sun Microsystems, Inc., Palo Alto, CA.
The class of applications that a cluster can typically cope with would be considered grand
challenge or supercomputing applications. GCAs (Grand Challenge Applications) are
fundamental problems in science and engineering with broad economic and scientific
impact; they are generally considered intractable without the use of state-of-the-art parallel
computers. GCAs are distinguished by the scale of their resource requirements, such as
processing time, memory, and communication needs.
A typical example of a grand challenge problem is the simulation of some phenomena that
cannot be measured through experiments. GCAs include massive crystallographic and
microtomographic structural problems, protein dynamics and biocatalysts, relativistic quantum
chemistry of actinides, virtual materials design and processing, global climate modeling, and
discrete event simulation.
• Crystallographic problems
• Protein dynamics
• Biocatalysts etc.
Disadvantages:
References:
http://technet2.microsoft.com/WindowsServer/en/Library/23afa6ab-bdaa-4c8d-9d89-44ac67196d5b1033.mspx?mfr=true
http://www.digipede.net/downloads/Digipede_Network_Whitepaper.pdf
http://www.digipede.net/downloads/Digipede_SDK_Whitepaper.pdf