A Tutorial at Supercomputing '09 by
Dhabaleswar K. (DK) Panda, The Ohio State University. E-mail: panda@cse.ohio-state.edu, http://www.cse.ohio-state.edu/~panda
Matthew Koop, NASA Goddard. E-mail: matthew.koop@nasa.gov, http://www.cse.ohio-state.edu/~koop
Pavan Balaji, Argonne National Laboratory. E-mail: balaji@mcs.anl.gov, http://www.mcs.anl.gov/~balaji
Presentation Overview
Introduction
Why InfiniBand and 10-Gigabit Ethernet?
Overview of IB, 10GE, their Convergence and Features
IB and 10GE HW/SW Products and Installations
Sample Case Studies and Performance Numbers
Conclusions and Final Q&A
Clusters are increasingly popular for designing next-generation computing systems
Scalability, modularity, and upgradeability with compute and network technologies
[Figure: typical cluster environment: compute nodes connected through a switch and LAN/WAN, a frontend and application servers on the LAN, and a storage cluster with a meta-data manager and I/O server nodes holding meta-data and data]
Hardware Components and Software Components
[Figure: a compute node with multi-core processors (Core 0-3), memory, an I/O bus, a network adapter, and network switches, plus the communication software; the processing, I/O, and network bottlenecks are marked]
Not scalable:
Cross-talk between bits, skew between wires, and signal-integrity issues make it difficult to increase bus width significantly, especially at high clock speeds
Network Bottleneck:
Bit serial differential signaling
Independent pairs of wires to transmit independent data (called a lane)
Scalable to any number of lanes
Easy to increase the clock speed of lanes (since each lane consists only of a pair of wires); a rough arithmetic example follows
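As a rough arithmetic check (not from the slides, and assuming the usual 8b/10b encoding on each lane), the aggregate data rate of a multi-lane link scales directly with the lane count:

\[
B_{\mathrm{data}} = N_{\mathrm{lanes}} \times R_{\mathrm{signal}} \times \tfrac{8}{10},
\qquad \text{e.g. } 4 \times 2.5\,\mathrm{Gbps} \times 0.8 = 8\,\mathrm{Gbps}\ \text{(IB 4X SDR)}
\]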
Both IB and 10GE today come as network adapters that plug into existing I/O technologies
Presentation Overview
Introduction
Why InfiniBand and 10-Gigabit Ethernet?
Overview of IB, 10GE, their Convergence and Features
IB and 10GE HW/SW Products and Installations
Sample Case Studies and Performance Numbers
Conclusions and Final Q&A
[Figure: Intel QuickPath, 2009]
Myricom GM
Proprietary protocol stack from Myricom
These network stacks set the trend for high-performance communication requirements:
Hardware-offloaded protocol stack
Support for fast and secure user-level access to the protocol stack
IB Trade Association
The IB Trade Association was formed with seven industry leaders (Compaq, Dell, HP, IBM, Intel, Microsoft, and Sun)
Goal: to design a scalable and high-performance communication and I/O architecture by taking an integrated view of computing, networking, and storage technologies
Many other industry members participated in the effort to define the IB architecture specification
IB Architecture (Volume 1, Version 1.0) was released to the public on Oct 24, 2000; latest version 1.2.1 released January 2008
http://www.infinibandta.org
IB Hardware Acceleration
Some IB models have multiple hardware accelerators
E.g., Mellanox IB adapters
Jumbo Frames
No latency impact; incompatible with existing switches
Presentation Overview
Introduction
Why InfiniBand and 10-Gigabit Ethernet?
Overview of IB, 10GE, their Convergence and Features
IB and 10GE HW/SW Products and Installations
Sample Case Studies and Performance Numbers
Conclusions and Final Q&A
IB Overview
InfiniBand
Architecture and Basic Hardware Components
Communication Model and Semantics
Memory registration and protection
Channel and memory semantics
Novel Features
Hardware Protocol Offload
Link, network and transport layer features
A Typical IB Network
Relay packets from one link to another
Switches: intra-subnet; Routers: inter-subnet
May support multicast
Intel Connects: optical cables with copper-to-optical conversion hubs (acquired by Emcore)
Up to 100m length; 550 picoseconds copper-to-optical conversion latency
Available from other vendors (Luxtera)
Repeaters (Vol. 2 of InfiniBand specification)
(Courtesy Intel)
IB Overview
InfiniBand
Architecture and Basic Hardware Components
Communication Model and Semantics
Memory registration and protection
Channel and memory semantics
Novel Features
Hardware Protocol Offload
Link, network and transport layer features
[Figure: a Queue Pair (QP) on the InfiniBand device, consisting of a Send queue and a Receive queue, with an associated Completion Queue (CQ)]
Work Queue Elements (WQEs) are posted to the QP's send and receive queues; completed WQEs are placed in the CQ with additional information and are then called CQEs (cookies)
Memory Registration
Before we do any communication, all memory used for communication must be registered
Registration starts with a registration request from the process to the kernel, which pins the pages and hands the mapping to the adapter
Memory Protection
For security, keys are required for all operations that touch buffers
The l_key is used by the HCA/RNIC to verify local accesses to a registered buffer
For RDMA, the initiator must have the r_key for the remote virtual address, possibly exchanged with a send/recv
The r_key is not encrypted in IB
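A minimal libibverbs sketch of the registration step described above (illustrative only, not code from the tutorial; it assumes a protection domain pd has already been allocated):

#include <stdlib.h>
#include <infiniband/verbs.h>

struct ibv_mr *register_buffer(struct ibv_pd *pd, size_t len)
{
    void *buf = malloc(len);
    if (!buf)
        return NULL;
    /* Pin and register the region; allow local writes and remote RDMA access */
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, len,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_READ |
                                   IBV_ACCESS_REMOTE_WRITE);
    if (!mr)
        return NULL;
    /* mr->lkey protects local accesses by the adapter;
     * mr->rkey must be given to the peer before it can issue RDMA to this buffer */
    return mr;
}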
[Figure: channel (send/recv) semantics: two processors, each with memory, a QP (Send and Recv queues) and a CQ on its InfiniBand device; a receive buffer on the target side; data moves between the devices with a hardware ACK]
1. Post receive WQE
2. Post send WQE
3. Pull out completed CQEs from the CQ
The receive WQE contains information on the receive buffer; incoming messages have to be matched to a receive WQE to know where to place the data
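The three steps above map directly onto verbs calls. A minimal sketch, assuming the QP, CQ, registered memory region, and connection setup have already been done (illustrative only, not tutorial code):

#include <stdint.h>
#include <infiniband/verbs.h>

int post_recv(struct ibv_qp *qp, struct ibv_mr *mr, void *buf, uint32_t len)
{
    struct ibv_sge sge = { .addr = (uintptr_t)buf, .length = len, .lkey = mr->lkey };
    struct ibv_recv_wr wr = { .wr_id = 1, .sg_list = &sge, .num_sge = 1 }, *bad;
    return ibv_post_recv(qp, &wr, &bad);            /* 1. post receive WQE */
}

int post_send(struct ibv_qp *qp, struct ibv_mr *mr, void *buf, uint32_t len)
{
    struct ibv_sge sge = { .addr = (uintptr_t)buf, .length = len, .lkey = mr->lkey };
    struct ibv_send_wr wr = { .wr_id = 2, .sg_list = &sge, .num_sge = 1,
                              .opcode = IBV_WR_SEND,
                              .send_flags = IBV_SEND_SIGNALED }, *bad;
    return ibv_post_send(qp, &wr, &bad);            /* 2. post send WQE */
}

void wait_completion(struct ibv_cq *cq)
{
    struct ibv_wc wc;
    while (ibv_poll_cq(cq, 1, &wc) == 0)            /* 3. pull completed CQEs */
        ;                                           /* busy-poll until one CQE arrives */
}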
[Figure: memory (RDMA) semantics: the initiator's InfiniBand device writes directly into the target's receive buffer; a hardware ACK is returned]
The initiator processor is involved only to:
1. Post send WQE
2. Pull out the completed CQE from the send CQ
No involvement from the target processor
The send WQE contains information about both the send buffer and the receive buffer
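A corresponding sketch for memory semantics, assuming a connected RC QP and that the target's buffer address and r_key were exchanged beforehand (illustrative only, not tutorial code):

#include <stdint.h>
#include <infiniband/verbs.h>

int rdma_write(struct ibv_qp *qp, struct ibv_mr *mr, void *buf, uint32_t len,
               uint64_t remote_addr, uint32_t rkey)
{
    struct ibv_sge sge = { .addr = (uintptr_t)buf, .length = len, .lkey = mr->lkey };
    struct ibv_send_wr wr = {
        .wr_id      = 3,
        .sg_list    = &sge,
        .num_sge    = 1,
        .opcode     = IBV_WR_RDMA_WRITE,            /* no receive WQE needed at the target */
        .send_flags = IBV_SEND_SIGNALED,
        .wr.rdma.remote_addr = remote_addr,         /* where to place the data remotely */
        .wr.rdma.rkey        = rkey,                /* remote protection key */
    }, *bad;
    return ibv_post_send(qp, &wr, &bad);            /* completion appears only in the sender's CQ */
}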
IB Overview
InfiniBand
Architecture and Basic Hardware Components
Communication Model and Semantics
Memory registration and protection
Channel and memory semantics
Novel Features
Hardware Protocol Offload
Link, network and transport layer features
Has no relation to the number of messages, but only to the total amount of data being sent
One 1MB message is equivalent to 1024 1KB messages (except for rounding off at message boundaries)
Virtual Lanes
Multiple virtual links within same physical link
Between 2 and 16
VL15: reserved for management
Each port supports one or more data VLs
SL to VL mapping:
The SL determines which VL on the next link is to be used
Each port (switches, routers, end nodes) has an SL-to-VL mapping table configured by the subnet manager
Partitions:
Fabric administration (through Subnet Manager) may assign specific SLs to different partitions to isolate traffic flows
[Figure: groups of servers and an IP network (routers, switches, VPNs, DSLAMs) sharing one physical InfiniBand link]
InfiniBand virtual lanes allow the multiplexing of multiple independent logical traffic flows on the same physical link, providing the benefits of independent, separate networks while eliminating the cost and difficulties associated with maintaining two or more networks
For multicast packets, the switch needs to maintain multiple output ports to forward the packet to
Packet is replicated to each appropriate output port Ensures at-most once delivery & loop-free forwarding There is an interface for a group management protocol
Create, join/leave, prune, delete group
Destination-based Switching/Routing
[Figure: fat-tree with spine blocks and leaf blocks]
Routing: unspecified by the IB spec; Up*/Down* and Shift are popular routing engines supported by OFED
Fat-tree is a popular topology for IB clusters; different over-subscription ratios may be used
IB Switching/Routing: An Example
[Figure: fat-tree with spine and leaf blocks; end ports such as P1 and P2 are assigned LIDs (e.g., LID 2, LID 4); each switch holds a forwarding table mapping the DLID to an out-port]
Someone has to setup these tables and give every port an LID
Subnet Manager does this work (more discussion on this later)
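The forwarding step itself is conceptually just a table lookup. An illustrative sketch (not an actual IB API; the table layout and example values are hypothetical):

#include <stdint.h>

#define LFT_SIZE 49152                  /* unicast LIDs occupy 0x0001 - 0xBFFF */

struct ib_switch {
    uint8_t out_port[LFT_SIZE];         /* linear forwarding table: DLID -> output port */
};

static int forward(const struct ib_switch *sw, uint16_t dlid)
{
    /* e.g., the SM programs out_port[2] = 4 so packets destined to LID 2 leave on port 4 */
    return sw->out_port[dlid];
}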
IB Multicast Example
IB WAN Capability
Getting increased attention for:
Remote storage, remote visualization
Cluster aggregation (cluster-of-clusters)
Flow labels allow routers to specify which packets belong to the same connection
Switches can optimize communication by sending packets with the same label in order
Flow labels can change in the router, but all packets belonging to one label will always change together
IB Transport Services p
Service Type            | Connection Oriented | Acknowledged | Transport
Reliable Connection     | Yes                 | Yes          | IBA
Unreliable Connection   | Yes                 | No           | IBA
Reliable Datagram       | No                  | Yes          | IBA
Unreliable Datagram     | No                  | No           | IBA
RAW Datagram            | No                  | No           | Raw
Each transport service can have zero or more QPs associated with it
e.g., you can have 4 QPs based on RC and one based on UD
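In verbs, the service type is chosen when the QP is created. A minimal sketch of the "4 QPs based on RC and one based on UD" idea, assuming the protection domain and CQ already exist (sizes are arbitrary, illustrative only):

#include <infiniband/verbs.h>

struct ibv_qp *make_qp(struct ibv_pd *pd, struct ibv_cq *cq, enum ibv_qp_type type)
{
    struct ibv_qp_init_attr attr = {
        .send_cq = cq,
        .recv_cq = cq,
        .cap     = { .max_send_wr = 64, .max_recv_wr = 64,
                     .max_send_sge = 1, .max_recv_sge = 1 },
        .qp_type = type,                 /* IBV_QPT_RC or IBV_QPT_UD */
    };
    return ibv_create_qp(pd, &attr);
}

/* Usage: struct ibv_qp *rc = make_qp(pd, cq, IBV_QPT_RC);
 *        struct ibv_qp *ud = make_qp(pd, cq, IBV_QPT_UD);  */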
One RQ per connection: with M processes in the job and N processes per node, a node needs on the order of (M-1)*N receive queues and their posted buffers
SRQ is a hardware mechanism for a process to share receive resources (memory) across multiple connections
Introduced in specification v1.2
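A minimal sketch of using an SRQ with libibverbs, assuming a protection domain and CQ already exist (sizes are arbitrary, illustrative only):

#include <infiniband/verbs.h>

struct ibv_qp *make_qp_with_srq(struct ibv_pd *pd, struct ibv_cq *cq)
{
    struct ibv_srq_init_attr srq_attr = {
        .attr = { .max_wr = 1024, .max_sge = 1 },   /* one shared pool of receive WQEs */
    };
    struct ibv_srq *srq = ibv_create_srq(pd, &srq_attr);
    if (!srq)
        return NULL;

    struct ibv_qp_init_attr qp_attr = {
        .send_cq = cq,
        .recv_cq = cq,
        .srq     = srq,                  /* receives are taken from the SRQ, not a per-QP RQ */
        .cap     = { .max_send_wr = 64, .max_send_sge = 1 },
        .qp_type = IBV_QPT_RC,
    };
    return ibv_create_qp(pd, &qp_attr);  /* every QP attached this way shares the same buffers */
}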
IB Overview
InfiniBand
Architecture and Basic Hardware Components
Communication Model and Semantics
Memory registration and protection
Channel and memory semantics
Novel Features
Hardware Protocol Offload
Link, network and transport layer features
Concepts in IB Management
Agents
Processes or hardware units running on each adapter, switch, and router (everything on the network)
Provide the capability to query and set parameters
Managers
Make high-level decisions and implement it on the network fabric using the agents
Messaging schemes
Used for interactions between the manager and agents (or between agents)
Messages
Subnet Manager
[Figure: the subnet manager discovering and configuring the fabric, activating inactive links and handling multicast joins]
10GE Overview
10-Gigabit Ethernet Family
Architecture and Components
Stack Layout
Out-of-Order Data Placement
Dynamic and Fine-grained Data Rate Control
iWARP/10GE
Supported (for TOE and iWARP)
Supported (for iWARP)
Not supported
Supported
Out-of-order (for iWARP)
Dynamic and Fine-grained (for TOE and iWARP)
Prioritization and Fixed Bandwidth QoS
[Figure: iWARP stack: RDMAP and RDDP over MPA, SCTP, TCP, and IP, implemented in hardware on the network adapter (e.g., 10GigE) above the device driver]
Remote Direct Data Placement (RDDP): data placement and delivery, multi-stream semantics, connection management
Marker PDU Aligned (MPA): middle-box fragmentation, data integrity (CRC)
Issues to consider:
Can we guarantee that the frame will be unchanged?
What happens when intermediate switches segment data?
Switch Splicing
[Figure: an intermediate switch splicing frames]
Intermediate Ethernet switches (e.g., those which support splicing) can segment a frame to multiple segments or coalesce multiple segments to a single segment
Each segment independently has enough information about where it needs to be placed
[Figure: MPA/RDDP segment format: RDDP header, segment length, payload, pad, CRC]
Dynamic bandwidth allocation to flows based on interval between two packets in a flow
E.g., one stall for every packet sent on a 10 Gbps network corresponds to a bandwidth allocation of 5 Gbps (see the worked example below)
Complicated because of TCP windowing behavior
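One way to read the example above (assuming, on my part, that each stall slot takes as long as one packet transmission): with s stall slots inserted per packet on a link of raw rate R, the allocated bandwidth is

\[
B_{\mathrm{alloc}} = \frac{R}{1+s},
\qquad \text{e.g. } \frac{10\,\mathrm{Gbps}}{1+1} = 5\,\mathrm{Gbps}
\]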
10GE Overview
10-Gigabit Ethernet Family
Architecture and Components
Stack Layout
Out-of-Order Data Placement
Dynamic and Fine-grained Data Rate Control
[Figure: software iWARP running over regular Ethernet and TOE adapters in an Ethernet environment]
Fully compatible with hardware iWARP; internally utilizes the host TCP/IP stack
[Figure: software iWARP placement: user-level iWARP (application, sockets, TCP/IP, device driver, network adapter) and kernel-level iWARP (application, sockets, TCP modified with MPA, IP, device driver, network adapter)]
Single network firmware to support both IB and Ethernet, with autosensing of the layer-2 protocol
Can be configured to automatically work with either IB or Ethernet networks
[Figure: VPI adapter stack: IB verbs over the IB transport, network, and link layers on the IB port; sockets over TCP/IP on the Ethernet port; both implemented in hardware]
Multi-port adapters can use one port on IB and another on Ethernet
Multiple use modes: datacenters with IB inside the cluster and Ethernet outside; clusters with an IB network and Ethernet management
Native convergence of the IB network and transport layers with the Ethernet link layer
IB packets are encapsulated in Ethernet frames; the IB network layer already uses IPv6 frames
Pros: works natively in Ethernet environments; has all the benefits of IB verbs
Cons:
Network bandwidth limited to Ethernet switches (currently 10 Gbps), even though IB can provide 32 Gbps
Some IB native link-layer features are optional in (regular) Ethernet
[Figure: application and IB verbs over the IB transport and network layers, with a CEE link layer]
Pros: works natively in Ethernet environments; has all the benefits of IB verbs; CEE is very similar to the link layer of native IB, so there are no missing features
Cons: network bandwidth limited to Ethernet switches (currently 10 Gbps), even though IB can provide 32 Gbps
iWARP/10GE: Yes, Yes, No, No, Out-of-order, Optional, Optional, Yes, Yes
IBoE: Yes, Yes, Yes, Optional, Ordered, Optional, Optional, Yes, No
RoCEE: Yes, Yes, Yes, Optional, Ordered, Yes, Yes, Yes, No
Presentation Overview
Introduction
Why InfiniBand and 10-Gigabit Ethernet?
Overview of IB, 10GE, their Convergence and Features
IB and 10GE HW/SW Products and Installations
Sample Case Studies and Performance Numbers
Conclusions and Final Q&A
IB Hardware Products
Many IB vendors: Mellanox, Voltaire and Qlogic
Aligned with many server vendors: Intel, IBM, Sun, Dell
And many integrators: Appro, Advanced Clustering, Microway
MemFree Adapter
No memory on the HCA; uses system memory (through PCIe)
Good for LOM designs (Tyan S2935, Supermicro 6015T-INFB)
Different speeds
SDR (8 Gbps), DDR (16 Gbps) and QDR (32 Gbps)
Some 12X SDR adapters exist as well (24 Gbps each way)
(Courtesy Tyan)
Similar boards from Supermicro with LOM features are also available
[Figure: large-scale IB switch, as used at TACC]
New IB switch silicon from QLogic introduced at SC '08; up to an 846-port QDR switch from QLogic
Switch routers with gateways: IB-to-FC, IB-to-IP
IB Software Products
Low-level software stacks
VAPI (Verbs-Level API) from Mellanox; modified and customized VAPI from other vendors
New initiative: OpenFabrics (formerly OpenIB)
http://www.openfabrics.org
Open-source code available with Linux distributions
Initially IB; later extended to incorporate iWARP
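A minimal sketch of how an application discovers and opens an adapter through the OpenFabrics user-level verbs (standard libibverbs calls; device names will vary by system):

#include <stdio.h>
#include <infiniband/verbs.h>

int main(void)
{
    int n;
    struct ibv_device **list = ibv_get_device_list(&n);   /* enumerate RDMA-capable devices */
    if (!list || n == 0) {
        fprintf(stderr, "no RDMA devices found\n");
        return 1;
    }
    for (int i = 0; i < n; i++)
        printf("device %d: %s\n", i, ibv_get_device_name(list[i]));

    struct ibv_context *ctx = ibv_open_device(list[0]);   /* handle used for all later verbs calls */
    /* ... allocate PD, CQs, QPs here ... */
    if (ctx)
        ibv_close_device(ctx);
    ibv_free_device_list(list);
    return 0;
}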
10GE switch vendors: Fujitsu, Woven Systems (144 ports), Myricom (512 ports), Quadrics (96 ports), Force10, Cisco, Arista (formerly Arastra)
40GE and 100GE switches: Nortel Networks, with 10GE downlinks and 40GE and 100GE uplinks
(Courtesy Mellanox)
OpenFabrics
Open source organization (formerly OpenIB)
www.openfabrics.org
[Figure: OpenFabrics user-level verbs providers and the adapters they drive: Mellanox (libmthca) for Mellanox adapters, QLogic (libipathverbs) for QLogic adapters, Chelsio (libcxgb3) for Chelsio adapters, plus IBM adapters]
For IBoE and RoCEE, the upper-level stacks remain completely unchanged
Within the hardware: the transport and network layers remain completely unchanged
[Figure: user-level and kernel-level OpenFabrics stacks over ConnectX adapters with both IB and 10GE ports]
Both IB and Ethernet (or CEE) link layers are supported on the network adapter
Note: the OpenFabrics stack is not valid for the Ethernet path in VPI; that still uses sockets and TCP/IP
[Figure: OpenFabrics software stack: at the application level, various MPIs, clustered DB access, user APIs, and the SDP library; upper-layer protocols including SDP, SRP (SCSI RDMA Protocol initiator), iSER (iSCSI RDMA Protocol initiator), RDS (Reliable Datagram Service), NFS-RDMA RPC, and the User Direct Access Programming Library; management via MAD (Management Datagram) and SMA; a mid-layer with the Connection Manager Abstraction (CMA), SA/MAD clients, and connection managers, with kernel-bypass paths; providers and hardware covering InfiniBand Host Channel Adapters (HCAs) and iWARP R-NICs (RDMA NICs)]
10GE Installations
Several Enterprise Computing Domains
Enterprise datacenters (HP, Intel) and financial markets
Animation firms (e.g., Universal Studios created The Hulk and many new movies using 10GE)
Integrated Systems
BG/P uses 10GE for I/O (ranks 3, 7, 9, 14, 24 in the Top 25)
ESnet to install a $62M 100GE infrastructure for US DOE
Presentation Overview
Introduction
Why InfiniBand and 10-Gigabit Ethernet?
Overview of IB, 10GE, their Convergence and Features
IB and 10GE HW/SW Products and Installations
Sample Case Studies and Performance Numbers
Conclusions and Final Q&A
Case Studies
Low-level Performance
Message Passing Interface (MPI)
File Systems
[Figure: verbs-level latency for small and large messages; legend: Native IB, VPI-IB, VPI-Eth, IBoE; x-axis: message size (bytes), y-axis: latency (us)]
ConnectX-DDR: 2.4 GHz Quad-core (Nehalem) Intel with IB and 10GE switches
[Figure: additional verbs-level comparison of VPI-Eth and IBoE]
ConnectX-DDR: 2.4 GHz Quad-core (Nehalem) Intel with IB and 10GE switches
Case Studies
Low-level Performance
Message Passing Interface (MPI)
File Systems
MPI has been implemented for various past commodity networks (Myrinet, Quadrics)
How can it be designed and efficiently implemented for InfiniBand and iWARP?
MVAPICH/MVAPICH2 Software
High Performance MPI Library for IB and 10GE
MVAPICH (MPI-1) and MVAPICH2 (MPI-2)
Used by more than 975 organizations in 51 countries
More than 34,000 downloads from the OSU site directly
Empowering many TOP500 clusters
8th ranked 62,976-core cluster (Ranger) at TACC
Available with the software stacks of many IB, 10GE and server vendors, including the OpenFabrics Enterprise Distribution (OFED)
Also supports a uDAPL device to work with any network supporting uDAPL
http://mvapich.cse.ohio-state.edu/
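The latency numbers that follow come from ping-pong style micro-benchmarks. A minimal, generic MPI sketch of such a test (not the actual OSU benchmark code):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, iters = 1000, size = 8;
    char buf[8];
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double t0 = MPI_Wtime();
    for (int i = 0; i < iters; i++) {
        if (rank == 0) {            /* rank 0 sends, then waits for the echo */
            MPI_Send(buf, size, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, size, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {     /* rank 1 echoes the message back */
            MPI_Recv(buf, size, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, size, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    if (rank == 0)
        printf("avg one-way latency: %.2f us\n",
               (MPI_Wtime() - t0) * 1e6 / (2.0 * iters));
    MPI_Finalize();
    return 0;
}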
Available with many software distributions
Integrated with the ROMIO MPI-IO implementation and the MPE profiling library
http://www.mcs.anl.gov/research/projects/mpich2
InfiniHost III and ConnectX-DDR: 2.33 GHz Quad-core (Clovertown) Intel with IB switch
[Figure: MPI unidirectional bandwidth for MVAPICH on InfiniHost III-DDR, QLogic-SDR, ConnectX-DDR, ConnectX-QDR-PCIe2, and QLogic-DDR-PCIe2; values shown include 936.5, 1389.4, 1399.8, and 1952.9 MB/s]
InfiniHost III and ConnectX-DDR: 2.33 GHz Quad-core (Clovertown) Intel with IB switch
[Figure: 10GE unidirectional bandwidth (MillionBytes/sec) and large-message latency (us) for Chelsio (TCP/IP) vs Chelsio (iWARP)]
ConnectX-DDR: 2.4 GHz Quad-core (Nehalem) Intel with IB and 10GE switches
[Figure: uni-directional and bi-directional bandwidth (MBps) vs message size (bytes) for Native IB, VPI-IB, VPI-Eth, and IBoE]
ConnectX-DDR: 2.4 GHz Quad-core (Nehalem) Intel with IB and 10GE switches
Case Studies
Low-level Performance
Message Passing Interface (MPI)
File Systems
[Figure: parallel file system architecture: computing nodes, metadata servers, and I/O servers]
Lustre Performance
[Figure: write and read throughput (MBps) with 4 OSSs, and CPU utilization (%), vs number of clients, comparing native (verbs) and IPoIB transports for read and write]
4 OSS nodes, IOzone record size 1MB; offers potential for greater scalability
NFS/RDMA Performance
[Figure: NFS/RDMA write (tmpfs) and read (tmpfs) throughput (MB/s) vs number of threads (1-15), comparing Read-Read and Read-Write designs]
IOzone read bandwidth up to 913 MB/s (Sun x2200s with x8 PCIe)
Read-Write design by OSU, available with the latest OpenSolaris
NFS/RDMA is being added into OFED 1.4
R. Noronha, L. Chai, T. Talpey and D. K. Panda, Designing NFS With RDMA For Security, Performance and Scalability, ICPP 07
Presentation Overview
Introduction
Why InfiniBand and 10-Gigabit Ethernet?
Overview of IB, 10GE, their Convergence and Features
IB and 10GE HW/SW Products and Installations
Sample Case Studies and Performance Numbers
Conclusions and Final Q&A
Concluding Remarks
Presented network architectures and trends in clusters
Presented background and details of IB and 10GE
Highlighted the main features of IB and 10GE and their convergence
Gave an overview of IB and 10GE hardware/software products
Discussed sample performance numbers in designing various high-end systems with IB and 10GE
IB and 10GE are emerging as new architectures leading to a new generation of networked computing systems, opening many research issues needing novel solutions
Funding Acknowledgments
Our research is supported by the following organizations
Funding support by
Equipment support by
Personnel Acknowledgments
Current Students
M. Kalaiya (M.S.), K. Kandalla (M.S.), P. Lai (Ph.D.), M. Luo (Ph.D.), G. Marsh (Ph.D.), X. Ouyang (Ph.D.), S. Potluri (M.S.), H. Subramoni (Ph.D.)
Past Students
P. Balaji (Ph.D.), D. Buntinas (Ph.D.), S. Bhagvat (M.S.), L. Chai (Ph.D.), B. Chandrasekharan (M.S.), T. Gangadharappa (M.S.), K. Gopalakrishnan (M.S.), W. Huang (Ph.D.), W. Jiang (M.S.), S. Kini (M.S.), M. Koop (Ph.D.), R. Kumar (M.S.), S. Krishnamoorthy (M.S.), P. Lai (Ph.D.), J. Liu (Ph.D.)
A. Mamidala (Ph.D.), S. Naravula (Ph.D.), R. Noronha (Ph.D.), G. Santhanaraman (Ph.D.), J. Sridhar (M.S.), S. Sur (Ph.D.), K. Vaidyanathan (Ph.D.), A. Vishnu (Ph.D.), J. Wu (Ph.D.), W. Yu (Ph.D.)
Current Post-Doc
E. Mancini
Current Programmer
J. Perkins
Web Pointers
http://www.cse.ohio-state.edu/~panda
http://www.mcs.anl.gov/~balaji
http://www.cse.ohio-state.edu/~koop
http://nowlab.cse.ohio-state.edu
MVAPICH Web Page: http://mvapich.cse.ohio-state.edu