
GPFS & StoRM

Jon Wakelin
University of Bristol
Pre-Amble
GPFS Basics
What it is & what it does
GPFS Concepts
More in-depth technical concepts
GPFS Topologies

HPC Facilities at Bristol


How we are using GPFS
Creating a mock-up/staging-service for GridPP

StoRM
Recap & References
GPFS Basics
IBM's General Parallel File System
Scaleable high-performance parallel file system
Numerous HA features
Life-cycle Management Tools
Provides POSIX and extended interfaces to data

Available for AIX and Linux


Only supported on AIX, RHEL and SuSE
Installed successfully on SL3.x (ask me if you are interested)
GPFS can run on a mix of these OSs

Pricing - per processor


Free version available through IBM's Scholars program
IBM is currently developing a new licensing model
GPFS Basics
Provides High-performance I/O
Divides files into blocks and stripes the blocks across disks (on multiple storage devices)
Reads/Writes the blocks in parallel
Tuneable block sizes (depends on your data)
Block-level locking mechanism
Multiple applications can access the same file concurrently
Multiple editors can work on different parts of a single file simultaneously, eliminating the additional storage, merging and management overhead typically required to maintain multiple copies
Client-side data caching
Where is data cached? (in the GPFS pagepool on each node)

Multi-Cluster Configuration
Join GPFS clusters together
Encrypted data and authentication, or authentication only
OpenSSL and keys (see sketch below)
Different security contexts (root squash à la NFS)
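
A minimal sketch of joining two clusters; the mm* commands are standard GPFS administration commands, but the cluster names, contact nodes, key paths and mount point here are invented:

# On the cluster that owns the file system: generate keys, require authentication,
# and grant the remote cluster access to the file system
mmauth genkey new
mmauth update . -l AUTHONLY
mmauth add remote.cluster -k /var/mmfs/ssl/remote_id_rsa.pub
mmauth grant remote.cluster -f gpfs0

# On the accessing cluster: register the owning cluster and mount its file system
mmremotecluster add owning.cluster -n node1,node2 -k /var/mmfs/ssl/owning_id_rsa.pub
mmremotefs add rgpfs0 -f gpfs0 -C owning.cluster -T /gpfs/remote
mmmount rgpfs0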
GPFS Basics
Information Life-cycle Management
Tiered storage
Create groups of disks within a file system, based on reliability, performance, location, etc.
Policy driven automation
Automatically move, delete or replicate files - based on filename, username, or fileset.
e.g. Keep newest files on fastest hardware, migrate them to older hardware over time
e.g. Direct files to appropriate resource upon creation.
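
For illustration, a policy file might look like the sketch below. The rule syntax is the GPFS policy language, but the pool names and file pattern are invented for this example:

/* Place new ROOT files on the fast pool, everything else on the slow pool */
RULE 'fast-placement' SET POOL 'fast' WHERE UPPER(NAME) LIKE '%.ROOT'
RULE 'default' SET POOL 'slow'

/* When the fast pool passes 85% full, migrate the least recently accessed
   files down to the slow pool until usage drops to 60% */
RULE 'age-out' MIGRATE FROM POOL 'fast' THRESHOLD(85,60)
     WEIGHT(CURRENT_TIMESTAMP - ACCESS_TIME) TO POOL 'slow'

Placement rules are installed with mmchpolicy; migration and deletion rules are typically run periodically with mmapplypolicy.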

Other notable points


Can specify user, group and fileset quotas
POSIX and NFS v4 ACL support
Can specify different IPs for GPFS and non-GPFS traffic
Maximum limit of 268 million disks (2048 is default max)
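
A couple of hedged one-liners to illustrate these points (the user name, file system name and subnet are placeholders):

mmedquota -u alice                    # edit per-user block and file quotas
mmrepquota -u gpfs0                   # report user quota usage for file system gpfs0
mmchconfig subnets="192.168.10.0"     # prefer this subnet for GPFS daemon traffic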
GPFS Topologies

SAN-Attached
All nodes are physically attached to all NSDs
High performance but expensive!
GPFS Topologies

Network Shared Disk (NSD) Server


Subset of nodes are physically attached to NSDs
Other nodes forward their I/O requests to the NSD servers, which perform the I/O and pass back the data
GPFS Topologies
[Diagram: client nodes, each running application / Linux / GPFS as NSD clients, connected over a Local Area Network to NSD servers that are attached to the shared disks]

In practice, often have a mixed NSD + SAN environment


Nodes use the SAN if they can and NSD servers if they can't
If SAN connectivity fails, a SAN-attached node can fall back to using the remaining NSD servers
GPFS Redundancy & HA
Non-GPFS
Redundant power supplies
Redundant hot swap fans

RAID with hot swappable disks (multiple IBM DS4700s)
FC with redundant paths (GPFS knows how to use this)

HA Features in GPFS
Primary and secondary Configuration Servers
Primary and secondary NSD Servers for each Disk
Replicate Metadata
Replicate data
Failure Groups
Specify which disks share a common point of failure
GPFS uses this info to ensure that replicas of the same data are placed in different failure groups (sketch below)
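
A sketch under assumed disk and server names: failure groups are assigned in the disk descriptors fed to mmcrnsd, and replication levels are set when the file system is created:

# DiskName:PrimaryServer:BackupServer:DiskUsage:FailureGroup
sdb:nsd01:nsd02:dataAndMetadata:1
sdc:nsd03:nsd04:dataAndMetadata:2

mmcrnsd -F disks.desc
mmcrfs /gpfs gpfs0 -F disks.desc -B 1024K -m 2 -M 2 -r 2 -R 2   # two replicas of data and metadata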
GPFS Quorum
Quorum
A majority of the nodes must be present before access to the shared disks is allowed
Prevents subgroups from making conflicting decisions
In the event of a failure, disks in the minority suspend and those in the majority continue

Quorum Nodes
These nodes are counted to determine if the system is quorate
If the system is no longer quorate
GPFS unmounts the filesystem
waits until quorum is established
and then recovers the FS.

Quorum Nodes with Tie-Breaker Disks
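
With only a small number of quorum nodes, one to three tiebreaker disks let the cluster stay quorate with as little as one quorum node still up. Roughly, with invented NSD names (GPFS must be stopped while changing this setting):

mmshutdown -a
mmchconfig tiebreakerDisks="nsd01;nsd02;nsd03"
mmstartup -a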


GPFS Performance
Preliminary results using

time dd if=/dev/zero of=testfile bs=1k count=2000000

Multiple write processes on same node

1 process: 90 MB/s
2 processes: 51 MB/s
4 processes: 18 MB/s

Multiple write processes from different nodes

1 process: 90 MB/s
2 processes: 58 MB/s
4 processes: 28 MB/s
5 processes: 23 MB/s
GPFS Performance
In a hybrid environment (SAN-attached and NSD Server nodes)
Read/Writes from SAN-attached nodes place little load on the NSD servers
Read/Writes from other nodes place a high load on the NSD servers

SAN-attached
[root@bf39 gpfs]# time dd if=/dev/zero of=file_zero count=2048 bs=1024k
real 0m31.773s

[root@bf40 GPFS]# top -p 26651


26651 root 0 -20 1155m 73m 7064 S 0 1.5 0:10.78 mmfsd

Via NSD Server

[root@bfa-se /]# time dd if=/dev/zero of=/gpfs/file_zero count=2048 bs=1024k


real 0m31.381s

[root@bf40 GPFS]# top -p 26651


26651 root 0 -20 1155m 73m 7064 S 34 1.5 0:10.78 mmfsd
Bristol HPC Facilities
Bristol, IBM, ClearSpeed and ClusterVision
BabyBlue - installed Apr 2007
Currently undergoing acceptance trials
BlueCrystal ~Dec 2007

Testing
A number of pump-priming projects have been identified
Majority of users will develop, or port code, directly on the HPC system
Only make changes at the Application level
GridPP
System-level changes
Pool accounts, world-addressable slaves, NAT, running services and daemons

Instead we will build a testing/staging system for GridPP


In-house and loan equipment from IBM
Reasonable Analogue of HPC facilities
No InfiniBand (but you wouldn't use it anyway)
Bristol HPC Facilities
BabyBlue
Torque/Maui, SL 4 Worker Node, RHEL4 (maybe AIX) on Head-Nodes
IBM 3455,
96 dual-core, dual-socket 2.6GHz, AMD Opterons
4? ClearSpeed Accelerator boards
8GB RAM per node (2GB per core)
IBM DS4700 + EXP810, 15TB Transient storage
SAN/FC network running GPFS

BlueCrystal c. Dec 2007


Torque/Moab
512 dual-core, dual-socket nodes (or quad-core depending on timing)
8GB RAM per node (1GB or 2GB per core)
50TB Storage, SAN/FC Network running GPFS

Server Room
48 water-cooled APC racks; 18 will be occupied by HPC, Physics servers may be co-located
3 x 270 kW chillers (space for 3 more)
GPFS BabyBlue
GPFS MiniBlue

[Diagram: primary and secondary Configuration Servers, primary and secondary NSD Servers, and three quorum nodes, attached to an IBM DS4500 with hot spares configured]
StoRM
StoRM is a storage resource manager for disk-based storage systems.
Implements the SRM interface version 2.2
StoRM is designed to support guaranteed space reservation and direct access (using native POSIX I/O calls)
StoRM takes advantage of high performance parallel file systems
GPFS, XFS and Lustre???
Standard POSIX file systems are also supported
Direct access to files from Worker Nodes
Compare with CASTOR, dCache and DPM
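
As an illustration (not part of StoRM itself), a file can be copied to a StoRM SRM v2.2 endpoint from a UI or worker node with lcg-utils; the hostname and path below are placeholders:

lcg-cp -b -D srmv2 -v file:/home/user/test.root \
  "srm://storm-se.example.ac.uk:8444/srm/managerv2?SFN=/dteam/test.root"

Because StoRM sits on a parallel file system, jobs on the worker nodes can also open the same file directly with POSIX calls (the file protocol), rather than pulling it through a transfer daemon.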
StoRM architecture
Front end (FE):
Exposes the web service interface
Manages user authentication
Sends the request to the BE

Database (DB):


Stores SRM request and status
Stores file and space information

Back end (BE):


Binds with the underlying file systems
Enforces authorization policy on files
Manages SRM file and space metadata
StoRM miscellaneous
Scalability and high availability.
FE, DB, and BE can be deployed on different machines
StoRM is designed to be configured with n FE and m BE, using a common DB

Installation (relatively straightforward)


RPM & Yaim (FE, BE and DB all on one server)
Additional manual configuration steps
e.g. namespace.xml, Information Providers
Not completely documented yet
Mailing list

CNAF x2 and Bristol


Basic tests - http://lxdev25.cern.ch/s2test/basic/history/
Use Case tests - http://lxdev25.cern.ch/s2test/usecase/history/
Currently there are still differences between the Bristol and CNAF installations
StoRM usage model
Summary
GPFS
Scalable high-performance file system
Highly Available, built on redundant components
Tiered storage or multi-cluster configuration for GridPP work

HPC
University-wide facility, not just for PP
GridPP requirements are rather different from those of general/traditional HPC users
Build an analogue of the HPC system for GridPP

StoRM
Better performance, because StoRM builds directly on parallel file systems such as GPFS
Also a more appropriate data transfer model: POSIX and the file protocol
References
GPFS
http://www-03.ibm.com/systems/clusters/software/gpfs.pdf
http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/topic/com.ibm.cluster.gpfs.doc/gpfs_faqs/gpfsclustersfaq.pdf
http://www-03.ibm.com/systems/clusters/software/whitepapers/gpfs_intro.pdf

StoRM
http://hst.home.cern.ch/hst/publications/storm_chep06.pdf
http://agenda.cnaf.infn.it/getFile.py/access?contribId=10&resId=1&materialId=slides&confId=0
