Steve Jones
Technology Operations Manager
Institute for Computational and Mathematical Engineering
Stanford University
Larry Jones
Vice President, Product Marketing
Panasas Inc.
Research Groups
Funding
• Sponsored Research (AFOSR/ONR/DARPA/DURIP/ASC)
Affiliates Program
The Research: Molecules to Planets!
Tsunami Modeling: preliminary calculations
Landslide Modeling (9/12/97)
[Figure: Mach Number comparison — 3D Simulation (Clean Wing) vs. Flight Test Data (Clean Wing)]
Databases?
[Figure: Potential vs. Mach Number]
A Brief Introduction to Clustering and Rocks
Types of Clusters
• Highly Available (HA)
– Generally small, fewer than 8 nodes
– Redundant components
– Multiple communication paths
– This is NOT Rocks
• Visualization clusters
– Each node drives a display
– OpenGL machines
– This is not core Rocks
– But there is a Viz Roll
• Computing (HPC clusters)
– AKA Beowulf
– This is core Rocks
• Not cost-effective if every cluster “burns” a person just for care and feeding
• Programming environment could be vastly improved
• Technology is changing rapidly
– Scaling up is becoming commonplace (128-256 nodes)
Minimum Components
• Power
• Local hard drive
• Ethernet
• Server architecture: i386 (Athlon/Pentium), x86_64 (Opteron/EM64T), or ia64 (Itanium)
Optional Components
• High performance network
– Myrinet
– Infiniband (SilverStorm or Voltaire)
• Network addressable power distribution unit
• Keyboard/video/mouse network not required
– Non-commodity
– How do you manage your network?
• Rocks is a cluster on a CD
– Red Hat Enterprise Linux (open source and free)
– Clustering software (PBS, SGE, Ganglia, NMI)
– Highly programmatic software configuration management
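As a rough illustration of that programmatic workflow, the sketch below shows how a Rocks-era frontend typically brought compute nodes online; insert-ethers and cluster-fork shipped with Rocks 3.x/4.x, though exact options vary by release:

    # Run on the frontend: watch for PXE/DHCP requests from new nodes,
    # add each node to the cluster database, and kickstart it automatically.
    insert-ethers

    # Once the nodes are installed, run a command across all of them.
    cluster-fork uname -r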
Philosophy
• Care and feeding of a system is not fun
• System administrators cost more than clusters
– 1 TFLOP cluster is less than $200,000 (US)
– Close to the annual cost of a full-time administrator
Philosophy (continued)
• All nodes are 100% automatically configured
– Zero “hand” configuration
– This includes site-specific configuration
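In Rocks of this era, site-specific settings went into XML node profiles rather than onto individual machines. A minimal sketch, assuming a Rocks 4.x layout (the version directory and paths vary by release):

    # Start a site profile from the shipped skeleton.
    cd /home/install/site-profiles/4.1/nodes
    cp skeleton.xml extend-compute.xml
    # ...edit extend-compute.xml to add packages and post-install steps...

    # Rebuild the distribution that compute nodes install from.
    cd /home/install
    rocks-dist dist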
Philosophy (continued)
• Optimize for installation
– Get the system up quickly
– In a consistent state
– Build supercomputers in hours, not months
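Because installation is the optimized path, repairing or updating a node usually means reinstalling it. A sketch using Rocks-era helpers (command names vary by release):

    # Reinstall a single node; it PXE-boots back to a known-consistent state.
    shoot-node compute-0-0

    # Or drive a reinstall of every compute node.
    cluster-fork /boot/kickstart/cluster-kickstart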
The Clusters
Iceberg
• 600-processor Intel Xeon 2.8 GHz
• Fast Ethernet
• Install date: 2002
• 1 TB storage
• Physical installation: 1 week
• Rocks installation tuning: 1 week
Nivation
[Diagram: Nivation cluster — frontend server and Tools-1 through Tools-4 on the campus backbone; GigE net and Myrinet interconnects; the NFS appliance (400 MBytes/sec) marked as a huge bottleneck and single point of failure; the redesign eliminated bottlenecks and added redundancy]
#!/bin/bash
#PBS -N BONNIE
#PBS -e Log.d/BONNIE.panfs.err
#PBS -o Log.d/BONNIE.panfs.out
#PBS -m aeb
#PBS -M hpcclusters@gmail.com
#PBS -l nodes=1:ppn=2
#PBS -l walltime=30:00:00

# Override PBS_O_WORKDIR so logs and scratch files land in the benchmark tree.
PBS_O_WORKDIR='/home/sjones/benchmarks'
export PBS_O_WORKDIR

### ---------------------------------------
### BEGINNING OF EXECUTION
### ---------------------------------------
cd "$PBS_O_WORKDIR"

# bonnie++: 8000 MB test size, no file-creation tests (-n 0), fast mode
# skipping per-character tests (-f), writing into the PanFS-mounted directory.
cmd="/home/tools/bonnie++/sbin/bonnie++ -s 8000 -n 0 -f -d /home/sjones/bonnie"
echo "running bonnie++ with: $cmd in directory $(pwd)"

# Capture stdout and stderr in a per-job log file.
$cmd > "$PBS_O_WORKDIR/Log.d/run9/log.bonnie.panfs.$PBS_JOBID" 2>&1
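The script is submitted to the batch system in the usual way; a quick usage sketch (the script file name here is illustrative):

    qsub bonnie.pbs        # submit the benchmark job
    qstat -u sjones        # watch its state in the queue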
NFS - 8 Nodes
17.80 MB/sec for concurrent write using NFS with 8 dual-processor jobs
PanFS - 8 Nodes
154 MB/sec for concurrent write using PanFS with 8 dual-processor jobs
NFS - 16 Nodes
bonnie++ Version 1.03 (K/sec = throughput, %CP = CPU utilization)

                     --Sequential Output--   Sequential Input   Random Seeks
                     -Block-     -Rewrite-       -Block-
Machine        Size  K/sec  %CP  K/sec  %CP   K/sec  %CP         /sec  %CP
compute-3-82  8000M   1403    0    127    0    2210    0        274.0    2
compute-3-81  8000M   1395    0    132    0    1484    0         72.1    0
compute-3-80  8000M   1436    0    135    0    1342    0         49.3    0
compute-3-79  8000M   1461    0    135    0    1330    0         53.7    0
compute-3-78  8000M   1358    0    135    0    1291    0         54.7    0
compute-3-77  8000M   1388    0    127    0    2417    0         45.5    0
compute-3-74  8000M   1284    0    133    0    1608    0         71.9    0
compute-3-73  8000M   1368    0    128    0    2055    0         54.2    0
compute-3-54  8000M   1295    0    131    0    1650    0         47.4    0
compute-2-53  8000M   1031    0    176    0     737    0         18.3    0
compute-2-52  8000M   1292    0    128    0    2124    0        104.1    0
compute-2-51  8000M   1307    0    129    0    2115    0         48.1    0
compute-2-50  8000M   1281    0    130    0    1988    0         92.2    1
compute-2-49  8000M   1240    0    135    0    1488    0         54.3    0
compute-2-47  8000M   1273    0    128    0    2446    0         52.7    0
compute-2-46  8000M   1282    0    131    0    1787    0         52.9    0
20.59 MB/sec for concurrent write using NFS with 16 dual-processor jobs (the sum of the per-node block-write rates above)
27.41 MB/sec during the read process
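The aggregate write figure is just the sum of the per-node block-write (K/sec) rates above; assuming the result lines were collected into a single log file (the file name here is hypothetical), a one-liner reproduces it:

    # Sum column 3 (block-write K/sec) and convert to MB/sec.
    grep '^compute-' nfs-16node.log | awk '{sum += $3} END {printf "%.2f MB/sec\n", sum/1024}'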
PanFS - 16 Nodes
bonnie++ Version 1.03 (K/sec = throughput, %CP = CPU utilization)

                     --Sequential Output--   Sequential Input   Random Seeks
                     -Block-     -Rewrite-       -Block-
Machine        Size  K/sec  %CP  K/sec  %CP   K/sec  %CP         /sec  %CP
compute-1-26  8000M  14330    5   3392    2   28129    9         54.1    0
compute-1-25  8000M  14603    5   3294    2   30990    9         60.3    0
compute-1-24  8000M  14414    5   3367    2   28834    9         55.1    0
compute-1-23  8000M   9488    3   2864    2   17373    5        121.4    0
compute-1-22  8000M   8991    3   2814    2   21843    7        116.5    0
compute-1-21  8000M   9152    3   2881    2   20882    6         80.6    0
compute-1-20  8000M   9199    3   2865    2   20783    6         85.2    0
compute-1-19  8000M  14593    5   3330    2   29275    9         61.0    0
compute-1-18  8000M   9973    3   2797    2   18153    5        121.6    0
compute-1-17  8000M   9439    3   2879    2   22270    7         64.9    0
compute-1-16  8000M   9307    3   2834    2   21150    6         99.1    0
compute-1-15  8000M   9774    3   2835    2   20726    6         77.1    0
compute-1-14  8000M  15097    5   3259    2   32705   10         60.6    0
compute-1-13  8000M  14453    5   2907    2   36321   11        126.0    0
compute-1-12  8000M  14512    5   3301    2   32841   10         60.4    0
compute-1-11  8000M  14558    5   3256    2   33096   10         62.2    0
187 MB/sec for concurrent write using PanFS with 16 dual-processor jobs
405 MB/sec during the read process
Capacity imbalances across jobs: only a 33 MB/sec increase from the 8-job run to the 16-job run
• Performance
– High read concurrency for parallel application and data sets
– High write bandwidth for memory checkpointing, interim and final output
• Scalability
– More difficult problems typically mean larger data sets
– Scaling cluster nodes requires scalable I/O performance
• Management
– Single system image maximizes utility for user community
– Minimize operations and capital costs
– 16-port GE switch
– Redundant power + battery
KEY: the hardware is designed to maximize the next-generation file system
Ease of Management
Panasas Addresses Key Drivers of TCO
Source: Gartner
Industry-Leading Performance
Thank you
For more information about Steve Jones:
High Performance Computing Clusters
http://www.hpcclusters.org