
User's Guide to the UoW HPC Cluster

Information Technology Services


University of Wollongong
(Last updated on October 10, 2011)
Contents
1 Getting Started
2 Hardware
3 File System
4 Software Installation
5 Software Environment
6 Queue Structure
7 Working with Queue System
7.1 Submit a batch job
7.2 Check the status of a job/queue
7.3 Submit an interactive job
7.4 Delete submitted jobs
8 Parallel Environment
8.1 Submit OpenMP Jobs
8.2 Set up MPI environment for future shells
8.3 Set up MPI environment for current shell
8.4 Build parallel codes using MPI
8.5 Submit MPI job to the batch queue system
8.5.1 LAM/MPI sample script
8.5.2 MPICH2 sample script
8.5.3 OpenMPI sample script
9 Usage Policy
10 Contact Information
1 Getting Started
Contact HPC admin to request an account. Students must get their supervisor to request
an account on the HPC cluster.
Once the account is enabled, the user may use their normal ITS-supplied username and
password to sign onto the head node of the UoW HPC cluster: gur.its.uow.edu.au.
Users can use secure shell (ssh) to connect to the cluster from on campus only, i.e.
ssh username@gur.its.uow.edu.au
If working off campus, users can first log in to wumpus.uow.edu.au and then
ssh to the HPC cluster from there (see the example below). However, the home directory
on the HPC is NOT the same as on other ITS machines such as wumpus.
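For example, the two-hop login from an off-campus Linux or Mac machine might look like
the following, where username is a placeholder for your ITS username:

# First hop: from your off-campus machine to wumpus
ssh username@wumpus.uow.edu.au
# Second hop: from the wumpus prompt to the HPC head node
ssh username@gur.its.uow.edu.au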
For details on logging on to the HPC cluster from a Windows desktop, please refer to the
UOW HPC Quick Start.
Note The head node is ONLY for job creation and not job execution. Users MUST submit
jobs to the queue system by using the qsub command (more on qsub in a moment) to
run jobs on one or more compute nodes.
2 Hardware
The UoW HPC cluster consists of a single head node and 30 homogeneous compute
nodes. Each node has two quad-core AMD Opteron processors (2.3 GHz) and 16 GB of
local memory. In addition, the cluster has a storage node with approximately 28 TB
of storage. The storage array is split into four main pools which can be accessed by all
nodes.
3 File System
/home/username This is the user's home directory. Most of the user's activities should
take place within it. It is globally accessible from all nodes within the system.
/hpc/software Used for installing system-wide software packages accessible
by multiple users. Users need to obtain permission from ITS to deploy software
here.
3
5 SOFTWARE ENVIRONMENT
/tmp This is the local directory attached to each node for storing intermediate
files from various system commands and programs. Since it has very limited space,
please do NOT set the TMPDIR environment variable of user applications to /tmp
or put scratch data in it.
/hpc/tmp This is the directory that TMPDIR should be set to and where all scratch
or temporary data should be stored during a calculation (see the example after this
list). At present there is no per-user quota limitation within this directory. However,
to keep it available, this directory will be cleaned regularly and all scratch files older
than 3 months will be automatically removed.
/hpc/data Reserved for special data storage requirements.
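For example, a job script can point TMPDIR at a per-job scratch directory under /hpc/tmp;
the directory layout below is only an illustration:

# Create a per-job scratch directory under /hpc/tmp and point TMPDIR at it
export TMPDIR=/hpc/tmp/$USER/$PBS_JOBID
mkdir -p $TMPDIR
# ... run the application here ...
# Remove the scratch directory once the job no longer needs it
rm -rf $TMPDIR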
Note The storage on the HPC is protected via RAID (so loss of a single disk does not lose
data) but is NOT backed up in any way. It is the user's responsibility to ensure that any
important data is backed up. Contact ITS if you have special storage requirements.
4 Software Installation
Users can install software under their own home directory. If a user requires software that
is not installed on the HPC, please send an email to the ITS admin team.
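As a rough sketch only, a typical autotools-based package can be installed under the home
directory by pointing configure at a prefix there; the package name and paths below are
placeholders:

# Unpack the source and build it with an installation prefix under $HOME
tar xzf mypackage-1.0.tar.gz
cd mypackage-1.0
./configure --prefix=$HOME/software/mypackage-1.0
make
make install
# Make the new binaries visible in the current shell
export PATH=$HOME/software/mypackage-1.0/bin:$PATH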
Note As there is currently no central budget for software licenses, any costs incurred for
requested software must be covered by the user's department or research group. Please
see the Software List section of the UOW HPC Software Guide for details of the available
software.
5 Software Environment
The operating system on the HPC cluster is Scientific Linux. The Environment Modules
package is deployed on the cluster to allow easy customization of the user's shell
environment to the requirements of whatever software you wish to use. The module
command syntax is the same no matter which command shell you are using, as listed below:
module avail will show you a list of the software environments which can be
loaded via the module load package command. Example:
4
5 SOFTWARE ENVIRONMENT
-sh-3.2$ module avail
------- /opt/env-switcher/share/env-switcher ---
mpi/mpich-ch_p4-gcc-1.2.7 mpi/openmpi-1.3
mpi/lam-7.1.4 mpi/openmpi-1.4.2
mpi/mpich2-1.3.1-gcc mpi/openmpi-1.4.3
-------- /opt/modules/oscar-modulefiles --------
default-manpath/1.0.1(default) switcher/1.0.13(default)
pvm/3.4.5+6(default) torque/2.1.10
-------- /opt/modules/version -------------------
3.2.5
-------- /opt/modules/Modules/3.2.5/modulefiles -----
dot module-cvs module-info modules null use.own
-------- /opt/modules/modulefiles --------------
R/2.12.0/R-2.12.0
abinit/6.4.1/abinit-6.4.1
blcr/0.8.2/blcr-0.8.2
cmake/2.8.3/cmake-2.8.3
cpmd/3.13.2/cpmd-3.13.2
emacs/23.2.1/emacs-23.2.1
fftw/2.1.5/fftw-2.1.5
fftw/3.2.2/fftw-3.2.2
ghostscript/ghostscript-8.15
gromacs/4.5.1/gromacs-4.5.1
imagemagick/6.6.4/imagemagick-6.6.4
intel-cc/10.1.023/intel-cc-10.1.023
intel-fc/10.1.023/intel-fc-10.1.023
intel-fc/11.1.046/intel-fc-11.1.046
intel-mkl/10.2/intel-mkl-10.2
lam/lam-oscar-7.1.4
lammps/05Nov10/lammps-05Nov10
mpich2/1.2.7-itl/mpich2-1.2.7-itl
mpich2/1.3.1-itl/mpich2-1.3.1-itl
namd/2.7b4/namd-2.7b4
oscar-modules/1.0.5(default)
perl/5.12.2/perl-5.12.2
quantum_espresso/4.2.1/quantum_espresso-4.2.1
xcrysden/1.5.21/xcrysden-1.5.21
5
5 SOFTWARE ENVIRONMENT
module load package will load the software environments for you. Example:
-sh-3.2$ module load R/2.12.0
module help package should give you a little information about what the
module load package will achieve for you. Example:
-sh-3.2$ module help R/2.12.0
------- Module Specific Help for R/2.12.0/R-2.12.0 ------
This modulefile provides R (2.12.0, x86-64)
More information about R can be found at:
http://www.r-project.org/
-sh-3.2$
module show package will detail the commands in the module file. Example:
-sh-3.2$ module show R/2.12.0
------------------------------------------------------------
/opt/modules/modulefiles/R/2.12.0/R-2.12.0:
module-whatis Sets the environment for R (2.12.0, x86-64)
conflict R
append-path PATH /hpc/software/packages/R/2.12.0/bin
append-path MANPATH /hpc/software/packages/R/2.12.0/share/man
------------------------------------------------------------
module list prints out the currently loaded modules. Example:
-sh-3.2$ module list
Currently Loaded Modulefiles:
1) torque/2.1.10         3) switcher/1.0.13         5) oscar-modules/1.0.5
2) pvm/3.4.5+6           4) default-manpath/1.0.1   6) R/2.12.0/R-2.12.0
Note The available software packages are subject to change and are upgraded from time to
time. Please check the latest status of all available packages by using the module avail
command.
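Modules can also be loaded inside a batch job script. As the sample scripts later in this
guide do, first source the modules initialisation script and then load the required package,
for example:

#Initialise the module command in a non-interactive shell
source /etc/profile.d/00-modules.sh
#Load the required software environment, e.g. R 2.12.0
module load R/2.12.0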
6 Queue Structure
Users must submit their jobs to the queue system. The queue system software in use is
the TORQUE (OpenPBS) resource management system in conjunction with the Maui
scheduler. TORQUE is a networked subsystem for submitting, monitoring, and controlling a
workload of jobs on the cluster. The present version of TORQUE is 3.0. Users submit jobs to
a queue by specifying the number of CPUs, the amount of memory, and the length of time
needed (and, possibly, other resources). The Maui scheduler then runs each job according
to its priority when the resources are available, subject to constraints on maximum re-
source usage. Maui is capable of very sophisticated scheduling and will be tuned over
time to meet the requirements of the user community while maximising overall through-
put. The present version of Maui is 3.3.
At present there are two execution queues, short and workq, at two levels of priority
respectively:
short:
* higher-priority queue for testing, debugging and interactive jobs
* resource limits of 48 h CPU time and 4 GB memory per job
workq:
* the default queue, designed for all production use
* long running-time limit (1000 h CPU time)
* default cput=48h (if neither cput nor walltime is requested)
* allows the largest resource requests
The basic scheduling policy is FIFO (first in, first out) within each queue, i.e. jobs are
queued in the order that they arrive. However, the fewer resources (walltime, cput, mem,
nodes, etc.) a job requests, the higher its priority to be put into the run. Since jobs are
allocated based on the resources requested and available, make sure that your requests are
reasonable; this will help your jobs run sooner.
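For example, a small test job can state its modest requirements explicitly when it is
submitted, which helps the scheduler start it sooner; the resource values below are purely
illustrative:

-sh-3.2$ qsub -q short -l nodes=1,cput=01:00:00,pvmem=1GB jobscript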
Note In some circumstances, such as when a package cannot restart from a checkpoint or a
job needs more memory, ITS can create special queues with a longer execution time limit
or a larger memory limit. Please contact ITS to request a special queue.
7 Working with Queue System
7.1 Submit a batch job
Use the qsub command to submit a batch job script to the queue, i.e.
-sh-3.2$ qsub jobscript
where jobscript is an ASCII file containing the TORQUE options and the shell script
that runs commands and programs (not the compiled executable, which is a binary file):
#!/bin/sh
#PBS -N test_job
#PBS -l nodes=1
#PBS -l cput=03:00:10
#PBS -l pvmem=400MB
#PBS -q workq
#PBS -o test.out
#PBS -e test.err
#PBS -m abe
#PBS -V
cd $PBS_O_WORKDIR
./a.out > output
Explanations of the above example job script are listed below:
#!/bin/sh
The shell environment in use.
#PBS -N test_job
Use the -N flag to specify the name of the job.
#PBS -l nodes=1
Use the -l flag to request resources. This example job requests 1 core. Use #PBS
-l nodes=1:ppn=N to request N (at most 8) cores within the same node if you are
running an OpenMP parallelized program.
#PBS -l cput=03:00:10
Request CPU time in the format hh:mm:ss (here 3 hours and 10 seconds). Users can
also request wall-clock time with walltime.
8
7.1 Submit a batch job 7 WORKING WITH QUEUE SYSTEM
#PBS -l pvmem=400MB
Request 400 MB of virtual memory per process. Users can also request memory per
process by using pmem. pvmem and pmem can also be given in GB.
#PBS -q workq
Use the -q flag to specify the destination queue of the job. At present workq is the
default queue, with very long walltime and cput limits.
#PBS -o test.out
Use the -o flag to specify the name and path of the standard output file.
#PBS -e test.err
Use the -e flag to specify the name and path of the standard error file.
#PBS -m abe
Use the -m flag to request email notification when the job aborts, begins and/or
finishes.
#PBS -V
Export the user's environment variables to the job.
cd $PBS_O_WORKDIR
Changes the current working directory to the directory from which the job was
submitted; PBS_O_WORKDIR is the PBS variable holding the qsub working
directory.
./a.out > output
The user's program runs here.
Notice that the PBS directives all start with #PBS and appear at the start of
the script, that there are no blank lines between them, and that there are no other non-PBS
commands until after all the PBS directives.
Note
Always redirect the output message to a file within your home directory, i.e. > output.
TORQUE also handles the ncpus resource. With ncpus, all CPUs have to be
allocated on the same node. Thus -l ncpus=N is equivalent to -l nodes=1:ppn=N,
and N should always be at most 8 on the HPC cluster. However, since combining
nodes and ncpus tends to give unexpected CPU allocations, always use
nodes=M:ppn=N to request appropriate CPU resources.
Other PBS flags for the qsub command are listed below:
#PBS -j [eo|oe] Merge STDOUT and STDERR. With eo they are merged into the
standard error file; with oe into the standard output file.
#PBS -v Export selected user-defined environment variables to the job.
Type man qsub for more details on using the qsub command.
When a batch job starts execution, a number of environment variables are predefined,
which include:
variables defined on the execution host.
variables exported from the submission host with -v (selected variables) and
-V (all variables).
variables defined by PBS.
The following variables reflect the environment where the user ran qsub:
PBS_O_HOST The host where you ran the qsub command.
PBS_O_LOGNAME Your user ID where you ran qsub.
PBS_O_HOME Your home directory where you ran qsub.
PBS_O_WORKDIR The working directory where you ran qsub.
These variables reflect the environment where the job is executing:
PBS_ENVIRONMENT Set to PBS_BATCH to indicate the job is a batch job, or to
PBS_INTERACTIVE to indicate the job is a PBS interactive job.
PBS_O_QUEUE The original queue you submitted to.
PBS_QUEUE The queue the job is executing from.
PBS_JOBID The job's PBS identifier.
PBS_JOBNAME The job's name.
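These variables can be used directly in a job script, for instance to record where and
under which ID a job ran. The snippet below is only an illustrative sketch:

#!/bin/sh
#PBS -N env_demo
#PBS -l nodes=1,cput=00:05:00
#PBS -q short
cd $PBS_O_WORKDIR
#Record some of the predefined PBS variables in a log file
echo "Job $PBS_JOBID ($PBS_JOBNAME) was submitted from $PBS_O_HOST" > env.log
echo "Original queue: $PBS_O_QUEUE, executing queue: $PBS_QUEUE" >> env.log
echo "Environment type: $PBS_ENVIRONMENT" >> env.log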
7.2 Check the status of a job/queue
Job progress can be monitored using the command qstat. It gives the following
information: job identifier, job name, username, elapsed CPU time, status of the job, and
the queue in which the job resides. The status can be one of the following:
E - job is exiting after having run
H - job is held
Q - job is queued, eligible to be run or routed
R - job is running
T - job is being moved to new location
W - job is waiting for its execution time to be reached
Other qstat ags:
qstat -u username Display all jobs belonging to a specific user.
For example, qstat -u ruiy will check the status of jobs belonging to user
ruiy.
qstat -f jobid Full display of the job with the specified jobid.
Check the values of resources_used.cput and resources_used.walltime
in the output. Generally, resources_used.cput should be around
50-95% of resources_used.walltime multiplied by the number of requested CPU cores.
Otherwise, please check your job input file and job script file to make sure they
request a consistent number of CPU cores.
Type man qstat for more details on using the qstat command.
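For example, to inspect the CPU time and walltime actually consumed by a running job
(reusing a job ID such as 54718 from the interactive example in the next subsection), the
resources_used fields can be filtered out of the full listing:

-sh-3.2$ qstat -f 54718 | grep resources_used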
7.3 Submit an interactive job
Interactive batch jobs are typically used for debugging large or parallel programs and
especially for running time-consuming, memory-consuming and I/O-intensive commands.
They use the CPU and memory of a compute node, which greatly reduces the workload
on the head node. An example of working with an interactive job is shown below:
Suppose user ruiy is working at the head node, i.e.
-sh-3.2$ hostname
gur.its.uow.edu.au
Submit an interactive job to short queue
-sh-3.2$ qsub -I -q short
qsub: waiting for job 54718.gur.its.uow.edu.au to start
qsub: job 54718.gur.its.uow.edu.au ready
-sh-3.2$
An interactive shell is started on the allocated compute node(s) once the job starts
-sh-3.2$ hostname
hpc30.its.uow.edu.au
Initially the user is in their home directory
-sh-3.2$ pwd
/home/ruiy
Check job status
-sh-3.2$ qstat -u ruiy
gur.its.uow.edu.au:
Reqd Reqd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
-------------------- -------- ------ -------- ------ ---- --- ------ ----- - -----
54718.gur.its.uow.ed ruiy short STDIN 11701 1 1 429496 48:00 R --
Terminate the job
-sh-3.2$ exit
logout
qsub: job 54718.gur.its.uow.edu.au completed
Returned to the head node
-sh-3.2$ hostname
gur.its.uow.edu.au
A submission script cannot be used in this mode - the user must provide all qsub options
on the command line for the interactive job. Submitted interactive jobs are subject to
all the same constraints and management as any other job in the same queue. Don't forget
to exit the interactive batch session by typing exit, to avoid leaving CPUs idle on the
machine.
If the user needs to work with a GUI program in an X session, it is necessary to submit
the interactive job with -X; type:
-sh-3.2$ qsub -I -X -q short
The user also needs to log in to the cluster with the -X flag in the ssh command
when using a Linux desktop, or enable an X server on a Windows desktop (see the
example below). Note Users should submit their interactive jobs to the short queue ONLY
and avoid submitting them to the workq queue.
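For example, from a Linux desktop the connection with X11 forwarding might look like this:

# Enable X11 forwarding when connecting from a Linux desktop
ssh -X username@gur.its.uow.edu.au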
7.4 Delete submitted jobs
To delete a specific batch job, type qdel jobid on the command line, where jobid is
the job's identifier produced by the qsub command. However, qdel has no effect on an
interactive job; you need to type exit to quit an interactive job.
Type man qdel for more details on using the qdel command.
To delete all jobs belonging to a user, use the following shell commands:
qselect -u $USER | xargs qdel
qselect prints out a job list based on specific criteria; xargs takes multi-line input
and runs the command you give it repeatedly until it has consumed the input list.
Delete all running jobs of a user:
qselect -u $USER -s R | xargs qdel
Delete all queued jobs of a user:
qselect -u $USER -s Q | xargs qdel
8 Parallel Environment
Users can run multiple instances of the same program by submitting several jobs, with
each instance using only one core. Users are, however, particularly encouraged
to develop and run programs with parallel APIs such as OpenMP and the Message Passing
Interface (MPI).
8.1 Submit OpenMP Jobs
OpenMP is an API that supports a multi-platform shared-memory multiprocessing
programming model. An OpenMP program can only utilize CPUs within a single node.
Specify #PBS -l nodes=1:ppn=N with N at most 8 in the job script to run an OpenMP
program. An example job script is shown below:
#!/bin/sh
#PBS -N test_job
#PBS -l nodes=1:ppn=4
#PBS -l cput=0:10:00,pvmem=400MB
#PBS -q workq
#PBS -o test.out
#PBS -e test.err
#PBS -m abe
#Get the number of cores allocated
NP=`wc -l < $PBS_NODEFILE`
export OMP_NUM_THREADS=$NP
./a.out
Note that the environment variable OMP_NUM_THREADS is set to the number of allocated
CPU cores, which is the total number of CPUs the OpenMP program can use.
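As a sketch, an OpenMP program written in C can be compiled on the head node with the
GNU compiler before the job is submitted; the source file name below is a placeholder:

#Compile an OpenMP C program; -fopenmp enables the OpenMP directives
gcc -fopenmp -O2 -o a.out my_openmp_program.c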
8.2 Set up MPI environment for future shells
MPI is an API specification that allows nodes to communicate with one another. Thus
it enables a program to utilize CPU resources across different nodes. There are three MPI
environments installed on the cluster: LAM/MPI, MPICH and OpenMPI. The cluster
uses the switcher and module packages to set the user's preferred MPI environment.
By using the env-switcher package, users can set their favorite MPI environment once
for all future shells.
Step 1: To check if you are working in an MPI environment, type
-sh-3.2$ which mpicc
/opt/mpich-ch_p4-gcc-1.2.7/bin/mpicc
If a path is returned by the which mpicc command, as above, you have an MPI
environment set up. Otherwise, you are not working in any MPI environment. In the
above example output, mpich-ch_p4-gcc-1.2.7 is the current MPI
environment.
Step 2: To check the available MPI environments on the system, type
-sh-3.2$ switcher mpi --list
mpich2-1.3.1-gcc
openmpi-1.4.2
openmpi-1.3
mpich-ch_p4-gcc-1.2.7
openmpi-1.4.3
lam-7.1.4
As listed in the above example output, there are three versions of OpenMPI, two versions
of MPICH and one version of LAM/MPI available (the output is subject to change as the
MPI environments are upgraded in future).
Step 3: To set the user's preferred MPI environment, type
-sh-3.2$ switcher mpi = openmpi-1.4.3
If you have set the MPI environment previously, you will get the following warning
message reminding you that mpi already has a value:
Warning: mpi:default already has a value:
mpich-ch_p4-gcc-1.2.7
Replace old attribute value (y/N)?
Type y and press Return; you will then get
Attribute successfully set; new attribute setting will be
effective for future shells
This means that using the switcher command to change the default MPI implementation
will modify PATH, LD_LIBRARY_PATH, MANPATH, etc. for all future
shell invocations - it does not change the environment of the shell in which it was
invoked. For example, now type
-sh-3.2$ which mpicc
/opt/mpich-ch_p4-gcc-1.2.7/bin/mpicc
This gives the same MPI environment as in Step 1. You will need to log out and log in
again for the above settings to take effect. Under the new MPI environment,
you should get the following output from the command which mpicc:
-sh-3.2$ which mpicc
/hpc/software/openmpi/1.4.3/bin/mpicc
You are working in the OpenMPI 1.4.3 environment from now on and for any future
shells.
8.3 Set up MPI environment for current shell
If it is necessary to work in different MPI environments from time to time, users can
use the Environment Modules tool to load the MPI environment for the current login
shell only.
Step 1: Make sure you are not working under a switcher-set MPI environment. Otherwise,
clear it using
-sh-3.2$ switcher mpi = none
Step 2: Load MPI environment using one of the following commands:
-sh-3.2$ module load mpi/openmpi-1.4.3
or
-sh-3.2$ module load mpi/mpich2-1.3.1-gcc
or
-sh-3.2$ module load mpi/lam-7.1.4
You can also move between the three MPI parallel environments by loading one and then
using the module switch command. For example,
-sh-3.2$ which mpicc
/usr/bin/which: no mpicc in (/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:
/opt/pbs/bin:/opt/pvm3/lib:/opt/pvm3/lib/LINUX64:/opt/pvm3/bin/LINUX64:
/opt/env-switcher/bin:/opt/c3-4/
-sh-3.2$ module load mpi/lam-7.1.4
-sh-3.2$ which mpicc
/opt/lam-7.1.4/bin/mpicc
-sh-3.2$ module switch mpi/mpich2-1.3.1-gcc
-sh-3.2$ which mpicc
/hpc/software/packages/mpich2/1.3.1-gcc/bin/mpicc
-sh-3.2$ module switch mpi/openmpi-1.4.3
-sh-3.2$ which mpicc
/hpc/software/openmpi/1.4.3/bin/mpicc
You can unload all MPI environments by using
-sh-3.2$ module unload mpi
8.4 Build parallel codes using MPI
Please refer to tutorials or course materials for details on programming with MPI. An
online course on MPI programming can be found on NCI's training website.
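As a minimal sketch, once one of the MPI modules described above has been loaded, an MPI
source file is compiled with the mpicc wrapper and then launched from a job script with
mpirun, as shown in the sample scripts in the next section; the file names below are
placeholders:

#Load an MPI environment for the current shell
module load mpi/openmpi-1.4.3
#Compile the MPI program with the MPI compiler wrapper
mpicc -O2 -o my_mpi_program my_mpi_program.c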
8.5 Submit MPI job to the batch queue system
The user needs to set an appropriate MPI environment in the job script by using module
load your-mpi-environment. A sample job script for running the program cpi
to calculate Pi is given for each MPI environment.
8.5.1 LAM/MPI sample script
#!/bin/sh
#PBS -N test_job
#PBS -l nodes=8
#PBS -l cput=0:10:00,pvmem=400MB
#PBS -q workq
#PBS -o test.out
#PBS -e test.err
#PBS -m abe
#Load the module of lam mpi
source /etc/profile.d/00-modules.sh
module load mpi/lam-7.1.4
#Get the number of cores allocated
NP=`wc -l < $PBS_NODEFILE`
#Enter the directory where you ran qsub
cd $PBS_O_WORKDIR
# boot up MPI
lamboot -v $PBS_NODEFILE
#Executing command line
mpirun -np $NP ./cpi >& cpi.out
lamhalt -v $PBS_NODEFILE
This script uses LAM/MPI to execute the MPI program cpi on the nodes provided by
TORQUE. The program cpi is simply built as mpicc -g cpi.c -o cpi in
the LAM/MPI environment.
After running lamboot on the nodes listed in the file $PBS_NODEFILE, the script moves
to the target directory and runs cpi using mpirun with 8 processes. The script cleans
up after itself by running lamhalt as soon as the calculation finishes. Note that we set a
shell variable NP to store the number of allocated cores and use it on the execution line
to keep consistency between the resources requested and those actually used.
8.5.2 MPICH2 sample script
#!/bin/sh
#PBS -l cput=0:10:00,pvmem=400MB
#PBS -N test_job
#PBS -l nodes=8
#PBS -q workq
#PBS -o test.out
#PBS -e test.err
#PBS -m abe
#Load the module of mpich2
source /etc/profile.d/00-modules.sh
module load mpi/mpich2-1.3.1-gcc
#Get the number of cores allocated
NP=`wc -l < $PBS_NODEFILE`
#Enter the directory where you ran qsub
cd $PBS_O_WORKDIR
mpiexec ./cpi >& cpi.out
This script uses the gcc build of MPICH2 to execute the MPI program cpi on the nodes
provided by TORQUE. The program cpi is built as mpicc -o cpi cpi.c in
the MPICH2 environment.
If you are going to use the Intel Compiler build of MPICH2 1.3.1, please use
module load mpich2-1.3.1-itl
in your job script instead.
The above example job can be found at /hpc/tmp/examples/mpich2-1.3.1.
8.5.3 OpenMPI sample script
It is recommended to use OpenMPI 1.4.3.
#!/bin/sh
#PBS -l cput=0:10:00,pvmem=400MB
#PBS -N test_job
#PBS -l nodes=8
#PBS -q workq
#PBS -o test.out
#PBS -e test.err
#PBS -m abe
#Load the module of openMPI 1.4.3
source /etc/profile.d/00-modules.sh
module load mpi/openmpi-1.4.3
#Get the number of cores allocated
NP=`wc -l < $PBS_NODEFILE`
#Enter the directory where you ran qsub
cd $PBS_O_WORKDIR
mpirun -machinefile $PBS_NODEFILE -np $NP cpi >& cpi.out
This script uses OpenMPI-1.4.3 to execute the MPI program cpi on the nodes provided
by TORQUE, as listed in $PBS_NODEFILE. The program cpi is built as mpicc
-g cpi.c -o cpi in the OpenMPI 1.4.3 environment. The above example job can
be found in /hpc/tmp/examples/openmpi-1.4.3.
To use OpenMPI 1.3 or 1.4.2, please add the following lines to your job script instead:
module load mpi/openmpi-1.3
mpirun -prefix /hpc/software/openmpi/1.3 -x PATH -machinefile \
  $PBS_NODEFILE -np $NP cpi >& cpi.out
or
module load mpi/openmpi-1.4.2
mpirun -prefix /hpc/software/openmpi/1.4.2 -x PATH -machinefile \
  $PBS_NODEFILE -np $NP cpi >& cpi.out
9 Usage Policy
Users must be mindful that the HPC is a shared resource. In particular, users must not
use excessive resources on the HPC in a way that locks out others for long periods of time.
The following guidelines must be observed:
no user should at any time be using more than 30% of the available CPU resources
of the HPC. If a user requires more than 30% of the available CPU resources, contact
ITS.
users must be mindful of other resources which must be shared with other users,
such as storage and memory. The set of processes running on a node should not
consume more than 2 times the amount of physical memory on that node.
users must not use software which is licensed by another user or group without prior
approval of the user or group which has paid for the licenses.
users agree to be on the HPC USERS mailing list and to read all emails sent to the
above list. ITS will communicate information via this list and will convene regular
user group meetings. Users should attend such meetings where possible.
if a user has problems with the operation of the cluster or notices any failures of the
hardware or software, they must report such problems to ITS as soon as possible.
ITS may allow a user or group of users to have sole access to the HPC for a short
time in special circumstances.
users must submit all large jobs through the job queuing system on the control
node and should avoid signing onto compute nodes. Small test jobs (less than a
few minutes) may be run one at a time on the control node. Time-consuming or
memory-consuming commands must be run as either batch or interactive PBS jobs.
if cluster admins observe unreasonable user behavior, they will first contact the user
by email, but if there is no response within an appropriate time, they may delete the
user's jobs.
10 Contact Information
If you have any problems with or comments on this document, please contact ITS. The
following email addresses may be used:
hpc_users@uow.edu.au: a mailing list comprising the HPC users. May be
moderated.
hpc_admin@uow.edu.au: the HPC administrators. Use this address for account
requests or technical issues.
