Québec
Daniel Stubbs
June 17, 2014
Summary
1. Brief History of Canadian HPC
2. Calcul Québec: Organization, Machines and Staff
3. Obtaining an Account
4. Access, File Transfer and Storage
5. Modules
6. Job Submission
7. Job Monitoring
8. ANSYS
9. MATLAB
10. Getting Help
Calcul Québec
Formed out of the merger of RQCHP and CLUMEQ, Calcul
Québec now covers all of the province's universities.
Hardware is installed at the Université de Montréal, the
Université de Sherbrooke, Université Laval, Concordia and
the École de technologie supérieure (McGill).
Calcul Québec has staff at the UdeM, UdeS, Laval and McGill,
with the UdeM and McGill employees also supporting users
at Concordia and UQAM.
Any researcher at any Quebec-based university has the right
to a free account with Calcul Québec.
Obtaining an Account
Getting an account on a Calcul Québec machine like Briarée
is a little more complicated than on Cirrus.
We don't personally know the faculty at Concordia, and the
granting agencies that fund us require us to gather a variety
of statistics on our users.
The first step is for the faculty member or PI to register with
the Compute Canada database.
There is an acceptable use agreement and a small Web-based
form to fill out which asks for a few personal details along
with some information about your research.
Cluster Access
The only way to connect to CQ clusters is by ssh (Secure
Shell).
Using ssh, your password is encrypted before being sent over
the network to the server.
To connect, you need to know the host name (the machine,
e.g. briaree.calculquebec.ca), your username and finally your
password.
The first time that you connect to a machine, ssh asks if you
would like to store the server's key; typically you would
answer yes.
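As a minimal sketch of the connection step, the command can be wrapped in a small shell function. The username `jdoe` and the function name `cq_login` are hypothetical; only the host name comes from the slide above.

```shell
# Hypothetical username; only the host name is taken from the slide above.
CQ_USER="jdoe"
CQ_HOST="briaree.calculquebec.ca"

# Open an interactive session on the cluster.
# ssh prompts for the password and, on first contact, asks whether
# to store the server's key.
cq_login() {
    ssh "${CQ_USER}@${CQ_HOST}"
}
```

Calling `cq_login` from a terminal is then equivalent to typing the full ssh command by hand.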
As for sftp, you use it like ftp: cd to move around, and
put or get to transfer files.
For Windows there are also programs like WinSCP which are
very similar to Windows Explorer (graphical interface etc.).
Modules
In general we would prefer that you ask a CQ analyst to
install the software that you need.
You can then use the command module, which will modify
the necessary environment variables so that you can use the
program in question.
The most common subcommands are:
module list
module avail
module load module_name
module unload module_name
module purge
module swap old_module new_module
Modules, cont.
With the module command, you can choose a particular
version of a program.
You can automatically load modules by adding the module
load line to the end of the .bashrc file in your $HOME.
There may be dependencies for the modules you load, so
that you will need to first do module load A and then
module load B.
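The dependency ordering described above could be handled with a small helper, sketched here. The function name `load_in_order` is hypothetical, A and B are the placeholder module names from the slide, and the early exit assumes that `module load` returns a nonzero status on failure, which not every version of the tool does.

```shell
# Load modules one after another in the order given.
# Assumption: `module load` returns nonzero on failure; if it does,
# the loop stops at the first module that cannot be loaded.
load_in_order() {
    local name
    for name in "$@"; do
        module load "$name" || {
            echo "could not load module: $name" >&2
            return 1
        }
    done
}

# Example for the end of ~/.bashrc, with A a prerequisite of B:
# load_in_order A B
```

Keeping the order in one place makes it harder to load B before A by accident.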
Job Submission
The machine that you connect to when you use ssh is called
the head node.
It functions as the gateway to the cluster and as such is
shared by everyone who connects to this machine.
You should therefore not use it for any significant
computations beyond compiling your software; the real
work should take place on the cluster's compute nodes.
You first use a text editor to create a small "job script" that
specifies what resources the job needs (e.g. number of CPUs,
amount of memory, runtime) as well as what actions need to
be carried out, step by step.
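A job script of this kind might look like the sketch below, which writes the script to a file so that it can be inspected before submission. The job name, core count, walltime and program name are all placeholders, and no memory request is shown because the syntax Briarée expects for it is not given here.

```shell
# Write a minimal job script to job.pbs; every value is a placeholder.
cat > job.pbs <<'EOF'
#!/bin/bash
#PBS -N my_job
#PBS -l nodes=1:ppn=12
#PBS -l walltime=02:00:00
#PBS -j oe

# Move to the directory from which qsub was run.
cd "$PBS_O_WORKDIR"

# The actual work, step by step.
./my_code > result.txt
EOF

# Submit it with: qsub job.pbs
```

The `#PBS` lines describe the resources requested, and everything after them is an ordinary shell script that runs on the compute node.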
which in this case will ask for a single node with 24 GB.
There are significantly fewer 48 and 96 GB nodes on Briarée,
so you should only request them when you need them.
Job Monitoring
With qstat you can verify that your job is running (its state
is listed as "R") but that's about it.
If you have redirected its standard output to a file (i.e. you
have written ./my_code > result.txt), you can also look
there to see what progress the job has made.
Another technique is to go directly to the node(s) where the
job is running.
With qstat -n job_id you can learn which node(s) your job
has been assigned and connect to them by ssh.
Once you're on the node, there are several commands for
observing what's happening.
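The two monitoring steps could be wrapped as follows. This is a sketch only: the function names `job_nodes` and `node_snapshot` are hypothetical, the node name has to be read from the qstat output by hand, and top is just one of the commands available for this purpose.

```shell
# List the node(s) assigned to a job, given its job id.
job_nodes() {
    qstat -n "$1"
}

# Take a one-shot snapshot of your own processes on a compute node.
# $1 is a node name read from the output of job_nodes.
node_snapshot() {
    ssh "$1" "top -b -n 1 -u $(id -un)"
}
```

Running top in batch mode (`-b -n 1`) prints a single snapshot and returns, so no session is left open on the compute node.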
ANSYS/Fluent
Commercial software widely used in engineering, here and at
other universities.
A couple of small research teams from École Polytechnique
have been using ANSYS on the cluster at the UdeM for two
or three years now.
The ANSYS license is flexible enough that you can use your
license to run ANSYS on machines that are located at other
institutions.
To use ANSYS on Briarée, you first need to ask us to add
your name to the ANSYS/Concordia group by sending us an
e-mail at briaree@calculquebec.ca
ANSYS/Fluent, cont.
Once we've confirmed your status at Concordia, we will add
your name and you can run ANSYS jobs on Briarée using the
Concordia licenses.
Note that Concordia has a single shared license pool for ANSYS,
which covers both instances of ANSYS run on your workstations
and Briarée jobs.
The job scheduler on Briarée knows nothing about whether
there are enough licenses available; it will run the job when
the CPUs become available, and if the necessary licenses can't
be checked out then your job will crash immediately.
ANSYS/Fluent, cont.
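#!/bin/bash
# Write the job's output to output.txt, with stderr merged into stdout
#PBS -o output.txt
#PBS -j oe
# Two nodes with 12 cores each, for at most 12 hours
#PBS -l nodes=2:ppn=12
#PBS -l walltime=12:00:00
module load ANSYS_CONC/v145
cd my/research/directory
# 3D double-precision solver on 24 processes, no GUI, driven by journal.jou
fluent 3ddp -t24 -ssh -pib -mpi=pcmpi -g -i journal.jou > output1.dat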
ANSYS/Fluent, cont.
In addition to the need to monitor ANSYS license usage,
another difference with Cirrus concerns interactive jobs.
On Cirrus you could use interactive jobs to run ANSYS and
its components using a graphical interface, much like you
would on a Windows workstation.
This practice is very strongly discouraged on Briarée. It is
possible to submit an interactive job using the -I option to
qsub, but this is normally just for short (less than one hour)
tests and debugging.
When Briarée is busy, as it often is, an interactive job that
asks for significant resources may wait for hours or even days
to start.
ANSYS/Fluent, cont.
What this means of course is that your interactive job could
well start at 3:40 AM on a Sunday or 8:30 PM on a Friday, so
that the resources sit there idle waiting for your input using
keyboard and mouse.
For this reason, you will need to become accustomed to
running ANSYS using a text file to control its behavior so
that it can run in a regular "batch mode" job on Briarée.
This may require some adjustment in your workflow but it's
important for ensuring that Briarée's resources are used
efficiently and shared fairly among the many different
individuals on the system.
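For Fluent, the controlling text file is a journal of text-interface commands. The sketch below writes a very small one; the case file names and the iteration count are placeholders, and a real journal will depend on how your case is set up.

```shell
# Write a small Fluent journal; file names and the iteration count
# are placeholders to adapt to your own case.
cat > journal.jou <<'EOF'
; read the case and data files prepared on the workstation
/file/read-case-data my_case.cas
; run 500 iterations of the solver
/solve/iterate 500
; save the results under a new name, then quit
/file/write-case-data my_case_done.cas
/exit yes
EOF
```

A file like this is what the `-i journal.jou` option of the fluent command reads in a batch job.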
MATLAB
Various options exist on Calcul Québec machines for using
this software, whose license is much more restrictive than
that of ANSYS.
The open source MATLAB clone Octave is installed on all
Calcul Québec machines and it duplicates a great deal of the
functionality of the MATLAB kernel.
It costs nothing to try using your MATLAB script with
Octave, which also has a very wide variety of packages for
specialized calculations, similar to MATLAB's toolboxes.
However, it's certainly possible your MATLAB script won't
run in Octave.
MATLAB, cont.
A further option is to use the copy of MATLAB installed on
your workstation here at Concordia to compile your script
into a standalone binary executable.
This binary file can then be copied to another computer
running the same operating system and executed using only
the freely available MATLAB runtime library.
This runtime library is installed on Briarée as a module, but
you will need to remember to do the MATLAB compilation
under Linux, not Windows, on your workstation.
Once again though, this solution isn't ideal for some users.
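The compile-then-run workflow could look roughly like this, assuming the MATLAB Compiler (mcc) is available on the Linux workstation. The function names and the variable `MCR_ROOT` are hypothetical; the real location of the runtime library on the cluster comes from its module.

```shell
# On the Linux workstation: compile a script into a standalone
# executable. mcc also generates a wrapper named run_<name>.sh.
compile_standalone() {
    mcc -m "$1"
}

# On the cluster: run the compiled program through the generated
# wrapper. MCR_ROOT is a hypothetical variable holding the install
# directory of the MATLAB runtime library.
run_standalone() {
    local name="$1"
    shift
    "./run_${name}.sh" "$MCR_ROOT" "$@"
}
```

For a script called my_script.m, this would mean `compile_standalone my_script.m` on the workstation and `run_standalone my_script` inside the job script, after copying over both the executable and the wrapper.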
MATLAB, cont.
MATLAB is installed on the Guillimin cluster managed by
McGill but with a restricted license.
Only McGill users can run MATLAB in serial mode.
Outside users can run MATLAB on Guillimin using the
Distributed Computing Toolkit (DCT), assuming that they
have MATLAB installed on their workstation.
Using the DCT may require some changes to your MATLAB
script(s) and not every algorithm lends itself to running in
parallel.
A final alternative...
MATLAB, cont.
Rewrite your script, or at least the most compute-intensive
part of it, using open tools like C or Fortran in conjunction
with libraries such as BLAS/LAPACK, FFTW, GSL and so on.
This may seem like a lot of work but much depends on the
particular MATLAB script.
The staff at Calcul Québec are here to help as well, both in
making a rough estimate of how much work it might be to convert
a script to C or Fortran and then in aiding the conversion.
After an initial time investment, you'll have software that can
run anywhere, on any number of processors, without paying a
cent to a foreign corporation.
Getting Help
The first place to look is the Calcul Québec wiki, which you
can find at
https://wiki.calculquebec.ca
It has extensive documentation in English and French on
many different topics.
If however you can't find the answer there, you can write an
e-mail to
briaree@calculquebec.ca
assuming the problem is related to using Briarée.
You'll normally get a response within a few hours from one of
the analysts or sysadmins at the UdeM.