Ira Pramanick
Sun Microsystems, Inc.
Outline
Today's vs. tomorrow's clusters
- How they are used today and how this will change
Characteristics of future clusters
- Clusters as general-purpose platforms
How they will be delivered
- Sun's Full Moon architecture
Summary & conclusions
Clustering Today
[Diagram: nodes connected through redundant IP switches to the LAN/WAN]
Mostly for HA
Little sharing of resources
Exposed topology
Hard to use
Layered on OS
Reactive solution
Clustering Tomorrow
[Diagram: cluster attached to the LAN/WAN, with global networking, global storage, and a central console]
Sun Full Moon architecture
Turns clusters into general-purpose platforms
- Cluster-wide file systems, devices, networking
- Cluster-wide load-balancing and resource management
Integrated solution
- HW, system SW, storage, applications, support/service
Embedded in Solaris 8
Builds on existing Sun Cluster line
- Sun Cluster 2.2 -> Sun Cluster 3.0
Characteristics of tomorrow's clusters
High-availability
Cluster-wide resource sharing: files, devices, LAN
Flexibility & Scalability
Close integration with the OS
Load-balancing & Application management
Global system management
Integration of all parts: HW, SW, applications, support, HA guarantees
High Availability
End-to-end application availability
- What matters: applications, as seen by network clients, are highly available
- Enable Service Level Agreements
Failures will happen
- SW, HW, operator errors, unplanned maintenance, etc.
- Mask failures from applications as much as possible
- Mask application failures from clients
High Availability...
No single point of failure
- Use multiple components for HA & scalability
Need strong HA foundation integrated into OS
- Node group membership, with quorum
- Well-defined failure boundaries--no shared memory
- Communication integrated with membership
- Storage fencing
- Transparently restartable services
High Availability...
Applications are the key
- Most applications are not cluster-aware
- Mask most errors from applications
- Restart when node fails, with no recompile
Disaster recovery
- Campus-separation and geographical data replication
Resource Sharing
What is important to applications?
- Ability to run on any node in cluster
- Uniform global access to all storage and network
- Standard system APIs
What to hide?
- Hardware topology, disk interconnect, LAN adapters, hardwired physical names
Resource Sharing...
What is needed?
- Cluster-wide access to existing file systems, volumes, devices, tapes
- Cluster-wide access to LAN/WAN
- Standard OS APIs: no application rewrite/recompile
Use SMP model
- Apps run on the machine (not on CPU 5, board 3, bus 2)
- Logical resource names independent of actual path
Resource Sharing...
Cluster-wide location-independent resource access
- Run applications on any node
- Failover/switchover apps to any node
- Global job/work queues, print queues, etc.
- Change/maintain hardware topology without affecting applications
But this need not require a fully-connected SAN
- Main interconnect can be used through software support
Flexibility
Business needs change all the time
- Therefore, platform must be flexible
System must be dynamic -- all done on-line
- Resources can be added and removed
- Dynamic reconfiguration of each node
- Hot-plugging of I/O, CPUs, memory, storage, etc.
Global System Management
[Diagram: wide-range application of HW management; cluster APIs spanning the global file system, global devices, and global storage]
Summary
Clusters as general-purpose platforms
- Shift from reactive to proactive clustering solution
Clusters must be built on a strong foundation
- Embed into a solid operating system
- Full Moon -- bakes clustering technology into Solaris
Make clusters easy to use
- Hide complexity, hardware details
Must be an integrated solution
- From platform, service/support, to HA guarantees