Job submission¶
CPU jobs¶
Jobs are typically submitted as bash scripts. At the top of such a script, you can specify various SBATCH parameters for your job submission, such as the amount of memory and the number of CPUs that you want to request. After that, you include the commands you want to execute. The script below simply writes the name of the node where the job will run to a file named job.out.
#!/usr/bin/env bash
#SBATCH --cpus-per-task=1
#SBATCH --mem=100
#SBATCH --time=2:00
#SBATCH --output=job.out
srun hostname
The first line is a so-called shebang that specifies the interpreter for the file. In this case, the interpreter is bash. However, it could be any other interpreter such as tcsh, python, or R.
The job above requests 1 CPU (--cpus-per-task=1) and 100 MB of RAM (--mem=100) for 2 minutes (--time=2:00). These and other parameters are described below in greater detail.
It is essential to place all SBATCH directives immediately after the shebang. Otherwise, they will be ignored. If you do not specify any parameters, Slurm will allocate 1 vCPU, 1 MB of memory, and 1 second of execution time. Since such an allocation is insufficient for any real job, at a minimum you should specify the amount of memory and the execution time.
If you save the script to a file named myjob, you can schedule its execution with
sbatch myjob
Upon successful submission, Slurm will print the Job ID.
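For example, submitting the script above might look as follows, with 123456 standing in for the actual Job ID:
sbatch myjob
Submitted batch job 123456
You will need this ID if you want to inspect or cancel the job later.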
If you have a compiled application, you can replace the hostname command with your application call. In addition, you can run several commands sequentially within the same job script. Consider two hypothetical applications: convert_data and process_data. Each accepts certain parameters, and both reside in your ~/data/bin directory. You can run process_data after convert_data with the following script.
#!/usr/bin/env bash
#SBATCH --cpus-per-task=2
#SBATCH --mem=7700
#SBATCH --time=30:00
#SBATCH --output=job.out
srun ~/data/bin/convert_data -i input.csv -o input.txt
srun ~/data/bin/process_data -i input.txt --threads=2 -o results.txt
If your application is available as a module, you need to load the module either in the job script before you make the call or on the command line before you submit the job. Below is an example with Mathematica where the module is loaded in the job script.
#!/usr/bin/env bash
#SBATCH --cpus-per-task=1
#SBATCH --mem=3850
#SBATCH --time=30:00
#SBATCH --output=job.out
module load mathematica
srun wolframscript -file myscript.wls
To run a script written in an interpreted language, it is often necessary to configure the environment first. We provide examples for common languages, including R and Python.
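For instance, a Python job might activate a virtual environment before running a script. Below is a minimal sketch, assuming a hypothetical virtual environment at ~/myenv and a script named myscript.py; adjust the paths and resource requests to your setup.
#!/usr/bin/env bash
#SBATCH --cpus-per-task=1
#SBATCH --mem=3850
#SBATCH --time=30:00
#SBATCH --output=job.out
# Activate the (hypothetical) virtual environment, then run the script
source ~/myenv/bin/activate
srun python myscript.py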
GPU jobs¶
To schedule a GPU job, you would typically need to load a GPU module (gpu, multigpu) or a module for a specific GPU type (t4, v100, a100). CUDA and related modules become available only after you load one of these modules, but you can specify all necessary modules in a single command.
module load multigpu cuda/11.4.4
In most cases, the cudnn module is required as well. Since a specific cudnn version is compatible only with certain cuda versions, we also provide modules that load the compatible combinations.
module load multigpu cudnn/8.2.4
GPU jobs should explicitly request GPU resources by passing --gres=gpu:1. The number at the end of the parameter value indicates the number of requested GPU devices.
GPU jobs normally need a relatively small amount of system memory. In most cases, it would be sufficient to request 4000 MB or even less; e.g., --mem=4000.
The sample script below requests a single GPU device and 4000 MB of system memory for 1 hour.
#!/usr/bin/env bash
#SBATCH --gres=gpu:1
#SBATCH --mem=4000
#SBATCH --time=01:00:00
#SBATCH --output=job.out
nvidia-smi
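In practice, your job script would also load the required modules and call your application instead of nvidia-smi. The sketch below assumes a hypothetical CUDA-enabled executable at ~/bin/train_model; the module versions are those shown above.
#!/usr/bin/env bash
#SBATCH --gres=gpu:1
#SBATCH --mem=4000
#SBATCH --time=01:00:00
#SBATCH --output=job.out
# Load the GPU and CUDA modules, then run the (hypothetical) application
module load multigpu cuda/11.4.4
srun ~/bin/train_model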
Low priority queue¶
A special queue lowprio is available for GPU jobs that have a limited duration and a lower priority. In situations where the cluster is highly utilized, your GPU job might start earlier with this option. Submit your job with the --partition lowprio flag and a maximum time of 24 hours.
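For example, assuming the myjob script from above:
sbatch --partition lowprio --time=24:00:00 myjob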
Parameters¶
Slurm parameters can be specified either at the top of the job submission script with the #SBATCH prefix or on the command line. Parameters indicated on the command line override those in the job script. For example, the script below requests a 2-hour execution time and 2000 MB of RAM.
#!/usr/bin/env bash
#SBATCH --cpus-per-task=2
#SBATCH --mem=2000
#SBATCH --time=02:00:00
srun my_computations -t 2
However, you can schedule it to run for up to 4 hours with 4000 MB of RAM using the following command.
sbatch --time=04:00:00 --mem=4000 myscript
CPUs¶
The --cpus-per-task flag controls the number of CPUs that will be made available for each task of the job. By default, your job has a single task. Jobs with multiple tasks are described in the Parallelisation section.
Warning
If you try to use more threads than the number of CPUs you have requested, those threads will not all run simultaneously but will compete with each other for CPU time. This can significantly degrade the performance of your job: it will run slower than if the number of threads matched the number of requested CPUs. This is true even when each thread uses less than 100% of the CPU time.
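One way to keep the thread count in sync with your allocation is to derive it from Slurm's environment. The sketch below assumes a hypothetical OpenMP-based application my_app that reads OMP_NUM_THREADS.
#!/usr/bin/env bash
#SBATCH --cpus-per-task=4
#SBATCH --mem=4000
#SBATCH --time=01:00:00
#SBATCH --output=job.out
# Match the number of OpenMP threads to the number of allocated CPUs
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun my_app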
Memory¶
There are two ways to request memory. You can specify the total amount with the --mem flag, as in the examples above. Alternatively, you can use the --mem-per-cpu flag to request a certain amount for each requested CPU. The value is in MB, but GB can be specified with the G suffix; e.g., --mem-per-cpu=4G. If your job allocates more memory than requested, Slurm may terminate it.
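For example, the following directives request 4 CPUs with 2 GB each, i.e., 8 GB in total (the values are illustrative):
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=2G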
Time¶
You should strive to split your calculations into jobs that can finish in less than 24 hours. Short jobs are easier to schedule; i.e., they are likely to start earlier than long jobs. If something goes wrong, you might be able to detect it earlier. In case of a failure, you will be able to restart calculations from the last checkpoint rather than from the beginning. Finally, long jobs fill up the queue for extended periods and prevent other users from running their smaller jobs.
A job's runtime is controlled by the --time parameter. The value is formatted as dd-hh:mm:ss, where dd is the number of days, hh the hours, mm the minutes, and ss the seconds. If the leading values are 0, they can be omitted. Thus, --time=2:00 means 2 minutes, --time=36:00:00 stands for 36 hours, and --time=1-12:30:00 requests 1 day, 12 hours, and 30 minutes.
If your job runs beyond the specified time limit, Slurm will terminate it. Depending on the value of the --time parameter, Slurm automatically places jobs into one of the Quality of Service (QOS) groups, which in turn affects job scheduling priority as well as some other limits and properties. ScienceCluster has four QOS groups.
- normal: 24 hours
- medium: 48 hours
- long: 7 days
- verylong: 28 days
To be able to use the verylong QOS (i.e., running times over 7 days), please request access via the S3IT issue tracker. A single user can run only one job with the verylong QOS at a time. If you schedule multiple verylong jobs, they will run serially regardless of resource availability.
You can view the details of each QOS using the sacctmgr show qos command. For example,
sacctmgr show qos format=name,priority,maxwall,maxsubmit,maxtrespu%30,maxjobspu
will show the name of each QOS, its priority, the maximum wall time, the maximum number of jobs per user that can be submitted, the maximum resources a user can request, and the maximum number of jobs a user can run.
Output and errors¶
A job's output can be saved to a custom file using the --output parameter. For example, --output=job.out indicates that all output that the script would have printed to the screen or console should be directed to the job.out file in your current working directory.
If you want the output file to be in a different location, you can use either an absolute path (e.g., --output=/scratch/$USER/logs/job.out) or a path relative to your working directory (e.g., --output=logs/job.out). In addition, you can specify a placeholder for the Job ID denoted with %j; e.g., --output=logs/job_%j.out. If you do not specify the --output parameter, the output will be directed to slurm-%j.out in your working directory.
Warning
The directory where you plan to write the output file must already exist. If it does not, the job will fail.
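One way to avoid this failure is to create the directory before submitting the job. For example, assuming the myjob script from above:
mkdir -p logs
sbatch --output=logs/job_%j.out myjob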
By default, Slurm writes error messages to the same file. However, you can redirect error messages to a different file by adding the extra parameter --error=job.err.
GPUs¶
GPUs can be requested in the list of generic consumable resources via the --gres parameter. Specifically, a single GPU device can be requested as --gres=gpu:1. It is possible to request multiple GPUs as well, but please ensure that your job can actually consume them. Requesting more than 1 GPU without changing the way your applications run will not make them run faster and may increase the time your job waits in the queue.
Some nodes have GPUs with differing amounts of GPU memory. If your job fails because it runs out of GPU memory, you can specifically request the higher-memory nodes by specifying the node type and a memory constraint; i.e., --gres=gpu:V100:1 --constraint=GPUMEM32GB. As with the number of GPU devices, you only need to do so when your application runs out of GPU memory on the nodes with 16 GB of on-board memory. Your job will not run faster on a high-memory node. For convenience, we also provide a module for V100s with 32 GB of GPU RAM: simply call module load v100-32g.
If you need at least 32 GB of GPU RAM and you don't have a preference between the 32 GB V100 or 80 GB A100 GPUs, you can use the following two flags when submitting your job or interactive session: --gres=gpu:1 --constraint="GPUMEM32GB|GPUMEM80GB". You will then receive whichever GPU is first available (and cost contributions will apply according to the GPU you receive).
It is important to understand the difference between GPU and system memory. Each GPU device has its own on-board memory that is available only to the code that runs on that device. Code that runs directly on a GPU device does not consume system memory. However, other portions of the application may use system CPUs and require system memory. In such cases, it may be necessary to request a higher amount of system memory with --mem.
Project¶
If you have access to the cluster under two different projects (i.e., tenants), you can choose which project should be billed for the job by setting the --account parameter. You can find more details in the Account Info section.
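For example, to bill the job to a hypothetical project named project_a:
sbatch --account=project_a myjob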