Table of Contents

1.1 Introduction
1.2 Installation
1.2.1 Linux
1.2.2 Windows
1.3 Running the Notebooks
1.4 Keeping the installation up-to-date
1.5 Running iSDM on a cluster
1.6 Frequent Problems

Introduction

iSDM
Towards a species-by-species approach to global biodiversity modelling.

Installation
Windows
Linux

Linux
Step 1: Download Anaconda for Python 3.
During installation, answer 'yes' when asked whether to append Anaconda to your .bashrc
file; this adds Anaconda to your PATH environment variable permanently. Otherwise, to make
Anaconda visible in your current shell session only, run:
export PATH=~/anaconda3/bin:$PATH # assuming you installed Anaconda at ~/anaconda3/

Step 2: Open a terminal and create a Python environment with the following packages
installed:
conda create --name=biodiversity six pandas ipython-notebook scikit-learn git basemap matplotlib xlrd numba gdal rasterio python=3.4

This will list the packages and dependencies that need to be installed and ask you for
confirmation. Type y and press Enter:
Proceed ([y]/n)? y

Now you have an isolated environment called biodiversity, in which the listed python libraries
are installed. You can activate it any time you want to use the iSDM framework.
source activate biodiversity
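
If you are not sure whether the environment is active in your current session, a small check like the one below can help. It is only a sketch: it assumes that conda exposes the name of the active environment in the CONDA_DEFAULT_ENV environment variable, and the helper name is_env_active is made up for this example.

```python
import os


def is_env_active(name, environ=None):
    """Return True if the conda environment called `name` appears to be active."""
    environ = os.environ if environ is None else environ
    # Assumption: conda stores the active environment's name in CONDA_DEFAULT_ENV.
    return environ.get("CONDA_DEFAULT_ENV") == name


if __name__ == "__main__":
    if is_env_active("biodiversity"):
        print("The biodiversity environment is active.")
    else:
        print("biodiversity is not active; run the activate command first.")
```

If the environment was activated by path rather than by name, the variable may hold a path instead, so treat a negative answer as a hint rather than proof.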

Step 3: Install these additional packages:


pip install pygbif geopy geopandas

Step 3a (optional, only if you have R already installed): Install rpy2 to be able to
access R objects directly from Python:
pip install rpy2

Step 4: Install iSDM:


git clone https://github.com/remenska/iSDM.git
cd iSDM/
python setup.py install

Step 5: Test if everything works OK:


$ ipython
Python |Continuum Analytics, Inc.| (default, Dec 7 2015, 11:16:01)
[...]
In [1]: import iSDM
In [2]: iSDM.__version__
Out[2]: '0.0.1'
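
Beyond importing iSDM itself, it can be useful to check that the libraries from the earlier steps are visible to Python. The sketch below is not part of iSDM: the list of import names is an assumption derived from the packages installed above (for example, scikit-learn is imported as sklearn and gdal as osgeo), and missing_modules is a hypothetical helper.

```python
import importlib.util

# Assumed import names for the packages installed in the previous steps.
REQUIRED_MODULES = [
    "six", "pandas", "sklearn", "matplotlib", "xlrd", "numba",
    "osgeo", "rasterio", "pygbif", "geopy", "geopandas", "iSDM",
]


def missing_modules(names):
    """Return the top-level module names that Python cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Not found: " + ", ".join(missing))
    else:
        print("All required modules were found.")
```

The check only locates the modules without importing them, so it will not catch a package that is installed but broken.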

Windows
Step 1: Download Anaconda for Python 3.
Do not install as Administrator unless admin privileges are required. If you encounter any
issues during installation, please temporarily disable your anti-virus software during install,
then immediately re-enable it.
During installation, in the dialog Advanced Options, check both options: Add Anaconda to
my PATH environment variable, and Register Anaconda as my default Python.
Step 2: Open a command prompt and create a Python environment with the following
packages installed:
conda create --name=biodiversity six pandas ipython-notebook scikit-learn git matplotlib xlrd numba gdal rasterio python=3.4

This will list the packages and dependencies that need to be installed and ask you for
confirmation. Type y and press Enter:
Proceed ([y]/n)? y

Now you have an isolated environment called biodiversity, in which the listed python libraries
are installed. You can activate it any time you want to use the iSDM framework.
activate biodiversity

Step 3: Install these additional packages:


conda install -c conda-forge basemap
pip install pygbif geopy

Step 3a (optional, only if you have R already installed): Install rpy2 to be able to
access R objects directly from Python:
pip install rpy2

Step 4: Install Geopandas

Geopandas is a very cool Open Source library to make working with geospatial data in
python easier. It is also used in the iSDM framework, and installing it on Windows is a little
bit more involved than on Linux or MacOS. You can always try the shortcut and see if it
works:
conda install -c conda-forge geopandas

If all goes well without any errors, skip to Step 5. If you get errors, install Geopandas and
its dependencies manually. In summary, from the same command prompt as above (make
sure you have run activate biodiversity, as at the end of Step 2):
1. Download the wheels for GDAL, Fiona, pyproj, and shapely. Make sure you choose the
files that match your architecture (64-bit) and Python version (3.4). The -cpXY
indicates the Python version (so choose the -cp34 files). Do not install them yet, just
download them.
2. If the website mentions any prerequisites in the descriptions of those 4 packages, install
the prerequisites now.
3. If OSGeo4W, GDAL, Fiona, pyproj, or shapely is already installed, uninstall it now. The
GDAL wheel contains a complete GDAL installation; do not use it alongside OSGeo4W
or other distributions.
4. Open a command prompt and cd (change directory) to the folder where you
downloaded these 4 wheel files. For example: cd /My/Cool/Folder/Downloads
5. Repeat the pip install <filename.whl> command where <filename.whl> is the name
of each of the 4 downloaded files, in the following order: GDAL, Fiona, pyproj,
shapely.
6. Now that Geopandas dependencies are all installed, you can just run pip install
geopandas from the command prompt, to install Geopandas.
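
The install order and the Python-version check from the list above can be sketched in a few lines of Python. This is an illustration only: order_wheels is a hypothetical helper, the file names in the usage example are made up, and the check simply assumes that the Python tag (such as cp34) appears between dashes in the wheel file name.

```python
# Install order taken from item 5 of the list above.
INSTALL_ORDER = ["GDAL", "Fiona", "pyproj", "shapely"]


def order_wheels(filenames, python_tag="cp34"):
    """Return the wheel file names in install order, checking the Python tag."""
    ordered = []
    for package in INSTALL_ORDER:
        prefix = package.lower() + "-"
        matches = [f for f in filenames
                   if f.lower().startswith(prefix) and f.endswith(".whl")]
        if len(matches) != 1:
            raise ValueError("expected exactly one wheel for %s, found %d"
                             % (package, len(matches)))
        if "-%s-" % python_tag not in matches[0]:
            raise ValueError("%s is not built for %s" % (matches[0], python_tag))
        ordered.append(matches[0])
    return ordered


if __name__ == "__main__":
    # Hypothetical file names; use the names of the files you downloaded.
    downloaded = [
        "Shapely-1.5.13-cp34-none-win_amd64.whl",
        "pyproj-1.9.5-cp34-none-win_amd64.whl",
        "GDAL-2.0.2-cp34-none-win_amd64.whl",
        "Fiona-1.6.3-cp34-none-win_amd64.whl",
    ]
    for wheel in order_wheels(downloaded):
        print("pip install " + wheel)
```

The script only prints the pip commands, so you can inspect them before running anything.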

Step 5: Install iSDM:


From the same command prompt, run this:
git clone https://github.com/remenska/iSDM.git
cd iSDM/
python setup.py install

Step 6: Test if everything works OK:


Open the interactive python console, by typing:
ipython

and then input these 2 lines:


In [1]: import iSDM
In [2]: iSDM.__version__
Out[2]: '0.0.1'

Running the Notebooks


The IPython Notebook (now known as the Jupyter Notebook, as it supports more languages
besides Python, like R) is an interactive computational environment, in which you can
combine code execution, rich text, mathematics, plots and rich media. You can create and
share IPython documents in a similar way to regular "paper" notebooks. Others can then run
your notebook steps, and inspect the results, in an interactive fashion.
To open and run the notebooks, you need to start a notebook server. First make sure you
are in the iSDM folder AND have activated the biodiversity environment. Run the
following command on the command line:
ipython notebook

This will print some information about the notebook server in your console, and open a web
browser to the URL of the web application (by default, http://127.0.0.1:8888). All the example
notebooks are in the notebooks folder.

You may want to stop the notebook server eventually, for example to fetch fresh code and
examples from the iSDM repository. You can do this by pressing Ctrl+C .
To fetch/install the latest code and examples from the repository, run the following on the
command line:
git stash
git pull
python setup.py install

Keeping the installation up-to-date


Any time you want to use or update the iSDM framework, you need to activate the
biodiversity environment as described in the Installation chapter (end of Step 2).

Make sure you are in the iSDM folder AND have activated the biodiversity
environment. To fetch/install the latest code and examples from the repository, run the
following on the command line:
git stash
git pull
python setup.py install
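
If you update often, the three commands can be wrapped in a small script. The following is a minimal sketch, not something shipped with iSDM: update_isdm is a hypothetical helper, and it assumes that git and python are on your PATH with the biodiversity environment active.

```python
import subprocess

# The same three commands as above, in the same order.
UPDATE_COMMANDS = [
    ["git", "stash"],
    ["git", "pull"],
    ["python", "setup.py", "install"],
]


def update_isdm(repo_dir, run=subprocess.check_call):
    """Run the update commands inside repo_dir, stopping at the first failure."""
    for command in UPDATE_COMMANDS:
        # check_call raises CalledProcessError if a command exits with an error.
        run(command, cwd=repo_dir)


if __name__ == "__main__":
    update_isdm(".")  # assumes the current directory is the iSDM folder
```

The run parameter exists only so the sequence can be tried out with a stand-in function before touching a real checkout.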

Running iSDM on a cluster


DAS5/SLURM specific
I did a small "tryout" on the DAS5 cluster (shared among several universities).
The "batch queueing system" behind the cluster is SLURM. For admins this is probably a
known thing. To put it simply, this is the software that takes care of reserving and
orchestrating the resource (CPU/memory) assignment among users and the programs they
are running. Many Linux clusters use it. Below are some useful things to know. I used DAS5
@ UvA, as there is a "fat" node there with a lot of memory. iSDM is installed (as
explained in the Installation chapter of this manual) in the directory /var/scratch/danielar/iSDM .
This is the "job description" file that is used to submit a job to run on a cluster (called
iSDM_step2.job , entire contents below):

$ cat iSDM_step2.job
#!/bin/sh
#SBATCH --time=00:15:00
#SBATCH -N 1
#SBATCH -C fatnode
srun python scripts/step2_finegrained.py --output-location=/var/scratch/danielar/iSDM/data/fish/step2/verify

As you can see, the lines starting with #SBATCH set the configuration parameters: the time
limit ( --time ), the number of nodes to use ( -N ), and a constraint on the type of node
( -C , in this case "fatnode"). The rest is just running the python script, with a special
output folder location that is accessible on the cluster ( /var/scratch/your-username/ gives
you a lot of disk space, typically far more than your "home" partition).
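
If you need several job files that differ only in a few parameters, they can be generated instead of edited by hand. The sketch below is an illustration under my own assumptions: render_job is a hypothetical helper, and it only reproduces the layout of the job file shown above.

```python
def render_job(command, time_limit="00:15:00", nodes=1, constraint=None):
    """Return the text of a SLURM job file laid out like iSDM_step2.job."""
    lines = [
        "#!/bin/sh",
        "#SBATCH --time=%s" % time_limit,  # maximum run time
        "#SBATCH -N %d" % nodes,           # number of nodes
    ]
    if constraint:
        lines.append("#SBATCH -C %s" % constraint)  # node type, e.g. fatnode
    lines.append("srun %s" % command)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    job_text = render_job(
        "python scripts/step2_finegrained.py "
        "--output-location=/var/scratch/danielar/iSDM/data/fish/step2/verify",
        constraint="fatnode",
    )
    print(job_text)
```

Write the returned text to a file (for example iSDM_step2.job) and submit it in the usual way.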
Some useful commands:
1. Submit a job:
$ sbatch --nodelist=node203 -p fatq iSDM_step2.job; squeue

(At the moment of writing, node203 is a "fat" node with a lot of memory available.) The
cluster documentation lists the resources and special types of machines available in the cluster.
2. Check the resource usage:
$ sstat 320851   # the number is the job ID

This will give you everything the job is using in terms of resources. RSS and VMEM are
interesting, as they show the memory usage.
3. Check the output so far (from the directory where you submitted it):
$ tail -n 100 slurm-320851.out

This will show the last 100 lines of the log file slurm-320851.out, where 320851 is the
job ID.
4. Check the queue status of the cluster. This will list all users that use resources, and
JOBIDs, something like this:
$ squeue -a
JOBID   PARTITION  NAME      USER      ST  TIME         NODES  NODELIST(REASON)
319634  defq       job       vsivadas  PD  0:00         1      (Resources)
319578  defq       vishnu20  vsivadas  R   97-01:59:10  1      node205
319579  defq       vishnu20  vsivadas  R   97-01:58:39  1      node206
319585  defq       bash      tscohen   R   96-05:50:15  1      node201
319586  defq       bash      tscohen   R   96-05:50:03  1      node202
319604  fatq       bash      tscohen   R   90-22:54:15  1      node204

There you should find your jobID in case you need it.
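
On a busy cluster the list can get long. As a rough sketch, the output can be filtered for one user in Python; jobs_for_user is a hypothetical helper, and it assumes the eight-column layout shown above with no spaces inside the job name.

```python
def jobs_for_user(squeue_output, user):
    """Return a list of dicts describing the jobs owned by `user`."""
    jobs = []
    for line in squeue_output.splitlines():
        # Split into the eight columns; the last one may contain "(Reason)".
        fields = line.split(None, 7)
        if len(fields) < 8 or fields[0] == "JOBID":
            continue  # skip the header and incomplete lines
        job_id, partition, name, owner, state, elapsed, nodes, nodelist = fields
        if owner == user:
            jobs.append({
                "job_id": job_id,
                "partition": partition,
                "state": state,
                "nodelist": nodelist.strip(),
            })
    return jobs


if __name__ == "__main__":
    sample = """JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
319634 defq job vsivadas PD 0:00 1 (Resources)
319604 fatq bash tscohen R 90-22:54:15 1 node204"""
    for job in jobs_for_user(sample, "tscohen"):
        print(job["job_id"], job["state"], job["nodelist"])
```

For a quick look at the command line, squeue -u followed by your username restricts the listing to your own jobs.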

Frequent Problems
Windows
Some institutes with ICT-managed accounts make it a bit more difficult by not allowing
regular users to modify their environment variables. As a consequence, your Anaconda
installation will not be "seen" by Windows when you open the command prompt. So you may
experience:
> activate biodiversity
'activate' is not recognized as an internal or external command,
operable program or batch file.

This can be easily fixed. Assuming your installation folder for Anaconda is C:\Anaconda2 ,
you need to issue this command each time you open a new command prompt (the setting
only lasts for that session):
set Path=C:\Anaconda2\Scripts;%Path%

You can afterwards continue with the usual commands, like activate biodiversity and
ipython notebook .
