
Sun Clusters

Ira Pramanick
Sun Microsystems, Inc.
Outline
Today's vs. tomorrow's clusters
- How they are used today and how this will change
Characteristics of future clusters
- Clusters as general-purpose platforms
How they will be delivered
- Sun's Full Moon architecture
Summary & conclusions
Clustering Today
(Diagram: nodes on the LAN/WAN behind duplicated IP switches)
Mostly for HA
Little sharing of resources
Exposed topology
Hard to use
Layered on OS
Reactive solution
Clustering Tomorrow
(Diagram: cluster with a central console, global networking to the LAN/WAN, and global storage)
Sun Full Moon architecture
Turns clusters into general-purpose platforms
- Cluster-wide file systems, devices, networking
- Cluster-wide load-balancing and resource management
Integrated solution
- HW, system SW, storage, applications, support/service
Embedded in Solaris 8
Builds on existing Sun Cluster line
- Sun Cluster 2.2 -> Sun Cluster 3.0
Characteristics of tomorrow's clusters
High-availability
Cluster-wide resource sharing: files, devices, LAN
Flexibility & Scalability
Close integration with the OS
Load-balancing & Application management
Global system management
Integration of all parts: HW, SW, applications, support, HA guarantees
High Availability
End-to-end application availability
- What matters: applications as seen by network clients are highly available
- Enable Service Level Agreements
Failures will happen
- SW, HW, operator errors, unplanned maintenance, etc.
- Mask failures from applications as much as possible
- Mask application failures from clients
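To make the client's view concrete, here is a minimal sketch (not Sun Cluster code; the service name and port are invented) of what masking failures from clients looks like from the outside: the client talks to one logical service address and retries, so a failover to another node is indistinguishable from a brief network hiccup.

```python
import socket
import time

SERVICE = ("svc.example.com", 8080)  # logical address; hosted by whichever node is active

def request(payload: bytes, retries: int = 5, delay: float = 2.0) -> bytes:
    """Send one request; retry while the service fails over to another node."""
    for _ in range(retries):
        try:
            with socket.create_connection(SERVICE, timeout=5) as sock:
                sock.sendall(payload)
                return sock.recv(4096)
        except OSError:
            # Connection refused/reset during failover: wait and retry the same
            # logical address -- the client never tracks physical nodes.
            time.sleep(delay)
    raise RuntimeError("service unavailable after retries")
```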
High Availability...
No single point of failure
- Use multiple components for HA & scalability
Need strong HA foundation integrated into OS
- Node group membership, with quorum
- Well-defined failure boundaries--no shared memory
- Communication integrated with membership
- Storage fencing
- Transparently restartable services
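As a toy illustration of the membership-with-quorum point (not the Solaris implementation), the rule is a simple majority vote: a partition may keep running only if it holds more than half of the configured votes, so two halves of a split cluster can never both claim the shared storage.

```python
def has_quorum(votes_present: int, votes_configured: int) -> bool:
    """Majority quorum: strictly more than half of all configured votes."""
    return votes_present > votes_configured // 2

# A 4-node cluster (one vote each) that splits 2/2 gives neither side quorum,
# which is why real clusters add quorum-device votes to break such ties.
print(has_quorum(2, 4))  # False -- neither half may proceed
print(has_quorum(3, 4))  # True  -- the majority continues and fences the minority
```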
High Availability...
Applications are the key
- Most applications are not cluster-aware
- Mask most errors from applications
- Restart when a node fails, with no recompile
- Provide support for cluster-aware apps: cluster APIs, fast communication
Disaster recovery
- Campus separation and geographical data replication
Resource Sharing
What is important to applications?
- Ability to run on any node in cluster
- Uniform global access to all storage and network
- Standard system APIs
What to hide?
- Hardware topology, disk interconnect, LAN adapters, hardwired physical names
Resource Sharing...
What is needed?
- Cluster-wide access to existing file systems, volumes, devices, tapes
- Cluster-wide access to LAN/WAN
- Standard OS APIs: no application rewrite/recompile
Use SMP model
- Apps run on the machine (not on CPU 5, board 3, bus 2)
- Logical resource names independent of actual path
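A hedged sketch of what "standard OS APIs, no rewrite/recompile" means for an application: it keeps issuing ordinary POSIX calls, and only the path is a cluster-wide logical name. The /global prefix follows Sun Cluster 3.0's global-namespace convention; the rest of the path is made up.

```python
import os

# The app opens a logical, cluster-wide path; which node actually hosts the
# underlying disks is resolved by the cluster file system, not by the app.
GLOBAL_LOG = "/global/webfarm/logs/access.log"   # hypothetical global mount

def append_record(line: str) -> None:
    """Plain POSIX I/O -- no cluster API, no recompile, runs the same on any node."""
    fd = os.open(GLOBAL_LOG, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        os.write(fd, (line + "\n").encode())
    finally:
        os.close(fd)
```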
Resource Sharing...
Cluster-wide location-independent resource access
- Run applications on any node
- Failover/switchover apps to any node
- Global job/work queues, print queues, etc.
- Change/maintain hardware topology without affecting applications
But this need not require a fully-connected SAN
- The main interconnect can be used through software support
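To make the "global job/work queues" bullet concrete, here is a minimal sketch under assumed paths (the /global/queue layout is invented, not a Sun API): because every node sees the same storage, any node can claim work, and an atomic rename keeps two nodes from claiming the same job.

```python
import os
from typing import Optional

QUEUE_DIR = "/global/queue/pending"   # hypothetical globally mounted directories
CLAIM_DIR = "/global/queue/claimed"

def claim_next_job() -> Optional[str]:
    """Claim one pending job file; rename() is atomic within a file system,
    so exactly one node ends up owning each job."""
    for name in sorted(os.listdir(QUEUE_DIR)):
        src = os.path.join(QUEUE_DIR, name)
        dst = os.path.join(CLAIM_DIR, name)
        try:
            os.rename(src, dst)   # fails if another node already claimed it
            return dst
        except OSError:
            continue
    return None
```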
Flexibility
Business needs change all the time
- Therefore, platform must be flexible
System must be dynamic -- all done on-line
- Resources can be added and removed
- Dynamic reconfiguration of each node: hot-plug in and out of IO, CPUs, memory, storage, etc.
- Dynamic reconfiguration between nodes: more nodes, load-balancing, application reconfiguration (see the sketch after this slide)
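A small sketch of the between-nodes case referenced above (invented node and resource-group names; not the Solaris reconfiguration machinery): when nodes are added or removed on-line, applications are simply redistributed over the current membership rather than the cluster being taken down.

```python
from collections import defaultdict

def rebalance(resource_groups, nodes):
    """Spread resource groups round-robin over whichever nodes are currently up."""
    placement = defaultdict(list)
    for i, group in enumerate(resource_groups):
        placement[nodes[i % len(nodes)]].append(group)
    return dict(placement)

groups = ["oracle-rg", "web-rg", "nfs-rg", "ldap-rg"]
print(rebalance(groups, ["node1", "node2"]))            # before adding a node
print(rebalance(groups, ["node1", "node2", "node3"]))   # after hot-adding node3
```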
Scalability
Cluster SMP nodes
Choose nodes as big as needed to scale application
- Need expansion room within nodes too
Don't use clustering exclusively to scale applications
- Interconnect speed slower than backplane speed
- Few cluster-aware applications
- Clustering a large number of small nodes is like herding chickens
Close integration with OS
Currently: multi-CPU SMP support in OS
- It would not make sense anywhere else
Next step: cluster support in the OS
- Next dimension of OS support: across nodes
Clustering will become part of the OS
- Not a loosely-integrated layer
Advantages of OS integration
Ease of use
- Same administration model, commands, installation
Availability
- Integrated heartbeat, membership, fencing, etc.
Performance
- In-kernel support, inter-node/process messaging, etc.
Leverage
- All OS features/support available for clustering
Load-balancing
Load-balancing done at various levels
- Built-in network load-balancing: for example, incoming HTTP requests; TCP/IP bandwidth (see the sketch after this slide)
- Transactions at the middleware level
- Global job queues
All nodes have access to all storage and network
- Therefore any node can be eligible to perform the work
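A hedged sketch of the network-level case referenced above (node names and policies are assumptions, not the Solaris implementation): one node answers on the shared service address and spreads incoming connections across all healthy nodes, either round-robin or sticky per client.

```python
import itertools
import zlib

NODES = ["node1", "node2", "node3", "node4"]   # hypothetical healthy cluster nodes
_round_robin = itertools.cycle(NODES)

def pick_node(client_ip: str, sticky: bool = False) -> str:
    """Choose the node that will service an incoming connection.

    sticky=True keeps a given client on the same node (hash of its address),
    which matters when connection state must stay put; otherwise round-robin.
    """
    if sticky:
        return NODES[zlib.crc32(client_ip.encode()) % len(NODES)]
    return next(_round_robin)

for ip in ("10.0.0.5", "10.0.0.6", "10.0.0.5"):
    print(ip, "->", pick_node(ip))
```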
Resource management
Cluster-wide resource management
- CPU, network, interconnect, IO bandwidth
- Cluster-wide application priorities
Global resource requirements guaranteed locally
- Need per-node resource management
High availability is not just making sure an application is started
- Must guarantee the resources to finish the job
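As a toy illustration of "guaranteed locally" (all numbers and names invented): before an application is started on, or failed over to, a node, that node should be able to reserve the CPU and memory the application needs to actually finish its work.

```python
# Hypothetical free capacity per node and requirements for one application.
node_free = {"node1": {"cpu": 4, "mem_gb": 8}, "node2": {"cpu": 16, "mem_gb": 64}}
app_needs = {"cpu": 8, "mem_gb": 32}

def can_host(node: str, needs: dict) -> bool:
    """Admission check: only place the app where its resource guarantees hold."""
    free = node_free[node]
    return all(free[r] >= amount for r, amount in needs.items())

eligible = [n for n in node_free if can_host(n, app_needs)]
print("eligible nodes:", eligible)   # ['node2']
```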
Global cluster management
System management
- Perform administrative functions once
- Maintain same model as single node
- Same tools/commands as base OS--minimize retraining
Hide complexity
- Most administrative operations should not deal with HW topology
- But still enable low-level diagnostics and management
A Total Clustering Solution
(Diagram: integration of all components)
- Applications
- Middleware
- Cluster OS software
- Servers
- Storage
- Cluster interconnect
- System management
- Service and support
- HA guarantee practice
Roadmap
Sun Cluster 2.2: currently shipping
- Solaris 2.6, Solaris 7, Solaris 8 3/00
- 4 nodes
- Year 2000 compliant
- Choice of servers, storage, interconnects, topologies, networks
- 10 km separation
Sun Cluster 3.0
- External Alpha 6/99, Beta Q1 CY00, GA 2H CY00
- 8 nodes
- Extensive set of new features: cluster file system, global devices, network load-balancing, new APIs (RGM), diskless application failover, SyMON integration
Wide Range of Applications
Agents developed, sold, and supported by Sun
- Databases (Oracle, Sybase, Informix, Informix XPS), SAP
- Netscape (http, news, mail, LDAP), Lotus Notes
- NFS, DNS, Tivoli
Sold and supported by 3rd parties
- IBM DB2 and DB2 PE, BEA Tuxedo
Agents developed and supported by Sun Professional Services
- A large list, including many in-house applications
Toolkit for agent development
- Application management API, training, Sun PS support
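A rough sketch of what a data-service agent amounts to, with made-up method names and paths (the toolkit's real API differs): the framework needs callbacks to start the unmodified application, stop it, and probe its health from the client's side, and it drives restart or failover from the probe results.

```python
import subprocess
import urllib.request

class WebServerAgent:
    """Hypothetical agent for an off-the-shelf HTTP server; the application
    binary itself is unchanged -- only this wrapper knows about the cluster."""

    def __init__(self, logical_host: str, port: int = 80):
        self.url = f"http://{logical_host}:{port}/"
        self.proc = None

    def start(self) -> None:
        # Launch the unmodified server (binary and config paths are assumptions).
        self.proc = subprocess.Popen(
            ["/usr/local/bin/httpd", "-f", "/global/web/httpd.conf"])

    def stop(self) -> None:
        if self.proc:
            self.proc.terminate()
            self.proc.wait()

    def probe(self) -> bool:
        # Health check as a client would see it; repeated failures tell the
        # framework to restart the service here or fail it over to another node.
        try:
            with urllib.request.urlopen(self.url, timeout=5) as resp:
                return resp.status == 200
        except OSError:
            return False
```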
Full Moon clustering
(Diagram: Full Moon feature summary)
- Embedded in Solaris 8
- Built-in load balancing
- Single management console
- Dynamic domains
- Global networking
- Global storage
- Global file system
- Global devices
- Global resource management
- Global application management
- Cluster APIs
- Wide range of HW
Summary
Clusters as general-purpose platforms
- Shift from reactive to proactive clustering solution
Clusters must be built on a strong foundation
- Embed into a solid operating system
- Full Moon -- bakes clustering technology into Solaris
Make clusters easy to use
- Hide complexity, hardware details
Must be an integrated solution
- From platform, service/support, to HA guarantees
