The goal of this short tutorial is to introduce new users to the ScienceCluster computing service. It assumes that readers already have experience with remote Linux servers but not necessarily with clusters. For a longer tutorial, consider following the training handout or joining a training course.
Connecting to the cluster
You can connect to the cluster using your UZH shortname and Active Directory (AD) password, like so:
ssh -l shortname cluster.s3it.uzh.ch
After running the command, you will be prompted for your password. Please note that nothing is displayed as you type your password. If you need to update your AD password, you can do so in the Identity Manager.
For security reasons, you can access ScienceCluster only from the UZH internal network. Please use the UZH VPN if you are connecting from off-campus.
For more detailed instructions about how to connect, read here.
There are four filesystems where you can store your data.
Your home filesystem (/home/$USER) has a quota of 15 GB and 100,000 files. Typically, it is used to store configuration and small important files.
For persistent storage of larger files, you can use the data filesystem (/data/$USER). It has a limit of 200 GB and it is not backed up. This filesystem is also appropriate for software installation (e.g., Python modules or R packages).
Large input data and computational results can be stored on the scratch filesystem (/scratch/$USER), which has a quota of 20 TB and is not backed up. Please note that this filesystem is meant for temporary storage and the files may be automatically deleted if they have not been accessed within one month.
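To see which files might be affected by the automatic clean-up, something like the following sketch could help. It relies only on the standard find command; the function name list_stale_files and the 30-day threshold (standing in for "one month") are assumptions made for illustration, not part of the ScienceCluster tooling.

```shell
#!/usr/bin/env bash
# List regular files below a directory whose last access is older than a
# given number of days (default 30, used here as a stand-in for one month).
# find counts whole 24-hour periods, so "+30" matches files whose last
# access is at least 31 days old.
list_stale_files() {
    dir=$1
    days=${2:-30}
    find "$dir" -type f -atime +"$days" -print
}

# Run the check only if the scratch directory actually exists.
if [ -n "${USER:-}" ] && [ -d "/scratch/$USER" ]; then
    list_stale_files "/scratch/$USER"
fi
```

The actual deletion policy may be based on different criteria, so the output should be treated as a rough indication only.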
If you need additional space for persistent data beyond the scratch filesystems, you can use scalable storage. It is not subject to quota but it requires cost contributions based on the actual usage.
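A rough way to compare your current usage with the quotas mentioned above is sketched below. It uses only du and find; the function names are hypothetical, and the sketch counts regular files only, which may differ from how the quota system accounts for files.

```shell
#!/usr/bin/env bash
# Count regular files below a directory.
count_files() {
    find "$1" -type f | wc -l | tr -d ' '
}

# Total size of a directory in kilobytes (1024-byte units).
size_kb() {
    du -sk "$1" | awk '{ print $1 }'
}

# Print usage next to the limits passed as arguments
# (size limit in GB, then the maximum number of files).
usage_report() {
    dir=$1
    limit_gb=$2
    limit_files=$3
    used_kb=$(size_kb "$dir")
    files=$(count_files "$dir")
    echo "$dir: $((used_kb / 1024 / 1024)) GB of $limit_gb GB, $files of $limit_files files"
}

# Example for the home filesystem limits (15 GB and 100,000 files):
# usage_report "$HOME" 15 100000
```

The size is rounded down to whole gigabytes, so small directories are reported as 0 GB.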
Jobs are submitted with the sbatch command. The default values for resource allocations are very low. If you do not specify any parameters, Slurm (the automatic job allocation system) will allocate 1 vCPU, 1 MB of memory, and 1 second for execution time. Therefore, you need to specify at minimum the amount of memory and the expected runtime. For example, to run a hostname command on the cluster, you can create a file named test.job with the following contents:
#!/usr/bin/env bash
hostname
Then you can submit it for execution with the following command (assuming that you have already loaded a partition module).
sbatch --time=0:10:0 --mem=7800 --cpus-per-task=2 test.job
This will request 2 CPUs and 7800 MB of RAM for 10 minutes. Alternatively, you can specify these parameters in your job file; e.g.,
#!/usr/bin/env bash
#SBATCH --time=0:10:0
#SBATCH --mem=7800
#SBATCH --cpus-per-task=2
hostname
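As a slightly extended sketch of such a job file, the script below also prints a few details about the allocation. SLURM_JOB_ID and SLURM_CPUS_PER_TASK are standard Slurm environment variables; the fallback values are an addition for illustration so that the script can also be tried outside of Slurm.

```shell
#!/usr/bin/env bash
#SBATCH --time=0:10:0
#SBATCH --mem=7800
#SBATCH --cpus-per-task=2

# Slurm exports these variables inside a job. The fallbacks apply when the
# script is run outside Slurm, for example for a quick local test.
job_id="${SLURM_JOB_ID:-none}"
cpus="${SLURM_CPUS_PER_TASK:-1}"

echo "Job ID: $job_id"
echo "vCPUs available to this task: $cpus"
echo "Running on: $(hostname)"
```

Because the #SBATCH lines are ordinary comments to the shell, the same file can be run directly with bash to check for syntax errors before submitting it.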
You can use modules to request specific hardware types such as GPUs and an Infiniband network. For example, if you want to use A100 GPUs, you can load the a100 module, i.e.
module load a100
This will automatically limit the job submission to A100 nodes. Loading this module pre-selects defaults for the libraries optimised for multi-GPU multi-node workflows (OpenMPI). If you load t4 instead, you would get the same version of OpenMPI but compiled without Infiniband support.
It is necessary to load one of the GPU flavour modules (a100, etc.) in order to be able to load
It is recommended to load GPU flavour modules outside of batch scripts. They set constraints that may interfere with resource allocation for job steps.
For testing or debugging purposes, you can run your job in an interactive session. Any other use of interactive sessions is generally discouraged. You can start an interactive session with the following command.
srun --pty --time=1:0:0 --mem-per-cpu=7800 --cpus-per-task=2 bash -l
An example with a GPU could be
module load t4
srun --pty --time=1:0:0 --mem-per-cpu=7800 --cpus-per-task=2 --gpus=1 bash -l
For more detailed information on job submission, click here.
Maximum running time
You should strive to split your calculations into jobs that can finish in fewer than 24 hours. Short jobs are easier to schedule; i.e., they are likely to start earlier than long jobs. If something goes wrong, you might be able to detect it earlier. In case of a failure, you will be able to restart calculations from the last checkpoint rather than from the beginning. Finally, long jobs fill up the queue for extended periods and prevent other users from running their smaller jobs.
A job's runtime is controlled by the --time parameter. If your job runs beyond the specified time limit, Slurm will terminate it. Depending on the value of the --time parameter, Slurm automatically places jobs into one of the quality of service (QOS) groups, which in turn affects job scheduling priority as well as some other limits and properties. ScienceCluster has four QOS groups, each with a maximum running time:
- normal: 24 hours
- medium: 48 hours
- long: 7 days
- verylong: 28 days
By default, users get access only to normal, medium and long. If you would like to use the verylong QOS (i.e., running times over 7 days), please contact Science IT with your request. A single user can run only one job with the verylong QOS at a time. If you schedule multiple verylong jobs, they will run serially regardless of the resource availability.
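The mapping from a requested time limit to a QOS group can be illustrated with a small sketch. The function names time_to_seconds and qos_for_time are hypothetical, only the H:M:S and D-H:M:S forms of --time are handled, and the sketch assumes that each limit is inclusive (for example, exactly 24 hours still counts as normal). Slurm performs the actual assignment itself; this only mirrors the limits listed above.

```shell
#!/usr/bin/env bash
# Convert a time limit written as H:M:S or D-H:M:S into seconds.
time_to_seconds() {
    spec=$1
    days=0
    case $spec in
        *-*) days=${spec%%-*}; spec=${spec#*-} ;;
    esac
    # awk treats missing fields as 0 and reads "08" as decimal 8.
    echo "$days:$spec" | awk -F: '{ print $1 * 86400 + $2 * 3600 + $3 * 60 + $4 }'
}

# Print the QOS group whose limit covers the requested time.
qos_for_time() {
    secs=$(time_to_seconds "$1")
    if [ "$secs" -le 86400 ]; then
        echo normal        # up to 24 hours
    elif [ "$secs" -le 172800 ]; then
        echo medium        # up to 48 hours
    elif [ "$secs" -le 604800 ]; then
        echo long          # up to 7 days
    elif [ "$secs" -le 2419200 ]; then
        echo verylong      # up to 28 days
    else
        echo "requested time exceeds 28 days" >&2
        return 1
    fi
}

qos_for_time 0:10:0        # normal
qos_for_time 1-12:00:00    # medium
qos_for_time 10-00:00:00   # verylong
```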
You can view the list of currently scheduled and running jobs with the squeue command. Without any parameters, it will display all the jobs that are currently scheduled or running on the cluster. If you loaded a partition module, then the output will be limited to the jobs scheduled or running on that particular partition. To see only your jobs, you need to specify the -u parameter:
squeue -u $USER
If you want to delete a job from the queue, you can do so with scancel, and you need to specify the Job ID as an argument. The Job ID is always reported when you schedule a job. You can also find it in the output of squeue. Multiple jobs can be deleted at once. For example,
scancel 2850610 2850611
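If you submit jobs from a script, it can be handy to capture the Job ID for later use with scancel. The sketch below parses the confirmation line that sbatch normally prints ("Submitted batch job" followed by the ID). The function name job_id_from_output is hypothetical, and a sample line is used in place of a real submission so that the snippet can be tried anywhere.

```shell
#!/usr/bin/env bash
# Extract the Job ID from sbatch's confirmation line,
# which has the form "Submitted batch job <id>".
job_id_from_output() {
    awk '/^Submitted batch job/ { print $4 }'
}

# Stand-in for real sbatch output. On the cluster, the sbatch command
# itself would be piped into the function instead.
sample_output="Submitted batch job 2850610"
job_id=$(echo "$sample_output" | job_id_from_output)
echo "Job ID: $job_id"

# The captured ID could then be passed to scancel:
# scancel "$job_id"
```

Alternatively, sbatch has a --parsable option that prints the Job ID without the surrounding text, which avoids the parsing step.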
You can also cancel all your jobs at once without specifying any Job IDs. The following two commands delete all your jobs or all your pending jobs, respectively.
scancel -u $USER
scancel -u $USER --state=PENDING
For more information about job management, click here.
There are four main approaches to parallelisation.
- Single program that runs multiple processes with shared memory
- Several program instances that run in parallel (i.e., job arrays)
- Single master program that launches several slave programs
- Single program that runs multiple processes each with private memory allocation (MPI)
For the first approach, you do not need to do anything special. You just submit a job requesting the number of vCPUs that your program can efficiently use. The other three approaches are described in the Job Scheduling section of the documentation.
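For the first approach, a job file might pass the number of allocated vCPUs on to the program, as in the sketch below. It assumes a program that uses OpenMP and therefore reads OMP_NUM_THREADS; my_program is a hypothetical name, and other programs may expect the thread count through a different variable or a command-line option.

```shell
#!/usr/bin/env bash
#SBATCH --time=0:10:0
#SBATCH --mem=7800
#SBATCH --cpus-per-task=2

# Slurm sets SLURM_CPUS_PER_TASK when --cpus-per-task is given.
# Fall back to 1 when the script runs outside Slurm.
threads="${SLURM_CPUS_PER_TASK:-1}"
export OMP_NUM_THREADS="$threads"
echo "Running with $OMP_NUM_THREADS thread(s)"

# Hypothetical multi-threaded program (assumption, not part of the cluster):
# ./my_program
```

Taking the thread count from the Slurm variable keeps the program in step with the allocation, so changing --cpus-per-task does not require a second edit elsewhere in the file.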
In addition to the documentation provided on this site, you may also find the following external resources useful.
- Slurm Quick Start Guide
- Slurm Documentation
- CECI Slurm Quick Start Tutorial (Although it is written specifically for CECI users, the tutorial is excellent and can be used as a general Slurm guide.)