Parallelisation

There are a few main approaches to parallelisation.

  • Multithreaded application: A single program that runs multiple threads within one process, all sharing the same memory
  • Job arrays: Several independent program instances that run in parallel
  • MPI: A single program that runs multiple processes, each with its own private memory, which communicate by passing messages; please contact us if you would like assistance using MPI

Multithreaded application

This is the simplest case. You only need to request the number of vCPUs that your application can take advantage of. In most cases, you also need to pass the number of threads to the application through a parameter when calling it; the application's documentation should explain which parameter to use. You can find sample job scripts in the Job Submission section.
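
As a minimal sketch of such a job script (not a copy of the samples in the Job Submission section), the following requests four vCPUs and passes the same number to the application. The application name myapp and its --threads option are placeholders, and exporting OMP_NUM_THREADS only matters if the application happens to use OpenMP. The echo in front of srun makes the script print the command instead of running it, so it can be tried outside Slurm; remove it for a real job.

```shell
#!/bin/bash
#SBATCH --job-name=threadedJob
#SBATCH --time=01:00:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=3850

# Slurm sets SLURM_CPUS_PER_TASK inside the job when --cpus-per-task is given;
# fall back to 1 thread when the script is run outside Slurm.
threads=${SLURM_CPUS_PER_TASK:-1}

# OpenMP-based programs read their thread count from this variable
# (assumption: the application uses OpenMP).
export OMP_NUM_THREADS=$threads

# "myapp" and "--threads" are hypothetical; echo only prints the command.
echo srun myapp --threads="$threads"
```

Deriving the thread count from SLURM_CPUS_PER_TASK keeps the resource request and the application setting in sync when --cpus-per-task is changed later.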

Job arrays

In general, job arrays are useful for applying the same processing routine to a collection of input data files, or different sets of parameters to the same input data files. Job arrays offer a very simple way to submit a large number of independent processing jobs.

In the example below, the --array=1-16 option causes 16 array tasks (numbered 1, 2, ..., 16) to be spawned when the master job script is submitted. The array tasks are simply copies of the master script that are automatically submitted to the scheduler on your behalf, so each array task requests exactly the same amount of resources. In each array task, however, the environment variable SLURM_ARRAY_TASK_ID is set to a unique value, here a number in the range 1, 2, ..., 16. In your script, you can use this value to select, for example, the specific data file that the array task is responsible for processing.

#!/bin/bash
#SBATCH --job-name=arrayJob
#SBATCH --output=arrayJob_%A_%a.out
#SBATCH --error=arrayJob_%A_%a.err
#SBATCH --array=1-16
#SBATCH --time=01:00:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=3850

srun your_application $SLURM_ARRAY_TASK_ID
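
One possible way to turn the task ID into a data file is to index a bash array with it. The sketch below assumes three hypothetical input files and therefore an --array=1-3 range; the file names and the select_input helper are illustrative only.

```shell
#!/bin/bash
# Hypothetical list of input files, one per array task.
inputs=(sample_01.csv sample_02.csv sample_03.csv)

# Return the input file for a given task ID.
# Task IDs from --array=1-3 start at 1, while bash array indices start at 0.
select_input() {
   local task_id=$1
   echo "${inputs[$((task_id - 1))]}"
}

# Slurm sets SLURM_ARRAY_TASK_ID in each array task; default to 1 for local runs.
task_id=${SLURM_ARRAY_TASK_ID:-1}
input_file=$(select_input "$task_id")

echo "Task $task_id would process $input_file"
```

If the array range starts at 0 instead, the subtraction can be dropped and the task ID used as the index directly.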

Job array indices can be specified in a number of ways. For example:

  • A job array with index values between 0 and 31

    #SBATCH --array=0-31

  • A job array with index values of 1, 2, 5, 19, 27

    #SBATCH --array=1,2,5,19,27

  • A job array with index values between 1 and 7 with a step size of 2 (i.e. 1, 3, 5, 7)

    #SBATCH --array=1-7:2

In the arrayJob script above, the output and error file names contain the special placeholders %A and %a. For each array task, they are replaced with the job ID and the array task ID, respectively.

Warning

Please make sure that each array task writes to its own unique set of files. For example, you can add SLURM_ARRAY_TASK_ID as a file name suffix or append it to the output directory name.
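
Both suggestions can be sketched in a few lines of bash. The file and directory names here are hypothetical, and the default of 0 is only there so the snippet can be tried outside Slurm.

```shell
#!/bin/bash
# Slurm sets SLURM_ARRAY_TASK_ID in each array task; default to 0 for local runs.
task_id=${SLURM_ARRAY_TASK_ID:-0}

# Option 1: task ID as a file name suffix (hypothetical file name).
out_file="result_${task_id}.txt"

# Option 2: task ID appended to the output directory name (hypothetical directory).
out_dir="results_${task_id}"
mkdir -p "$out_dir"

echo "Task $task_id writes to $out_file or to $out_dir/result.txt"
```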

Below is a sample script that runs a parameter sweep with a hypothetical application. It iterates over two parameters: alpha and gamma. The first parameter takes values from 0 to 6 with a step of 2 (i.e., 0, 2, 4, 6). If a step of 1 is desired, the step value can be omitted; e.g., {0..10}. The second parameter can take values 'Aa', 'Bb', or 'Cc'. In total, there are 4 * 3 = 12 parameter combinations. So, the value for the --array parameter should be 0-11.

The nested loops generate two arrays that allow reconstruction of all parameter combinations. In this case, alphaArr contains 0 0 0 2 2 2 4... while gammaArr has Aa Bb Cc Aa Bb Cc Aa.... Later, $SLURM_ARRAY_TASK_ID is used to retrieve a particular combination. The array indices start with 0. Thus, when the $SLURM_ARRAY_TASK_ID variable is 2, for example, alpha is set to 0 and gamma takes the value Cc, which are subsequently passed to the myapp application.

The output file is saved to the results directory. To make the output file unique, both parameter values are added to its name. For example, when $SLURM_ARRAY_TASK_ID is 2, the output path will be results/output_a0_gCc.txt. Note the use of curly braces (i.e., {}) around the variable names. Since variable names can contain an underscore, without the braces bash would read the first variable as $alpha_g, which is not defined and expands to an empty string. The curly braces around gamma are technically redundant, but they help to prevent a potential bug if another variable is later added to the file name.
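
The effect of the braces can be checked in isolation. This short sketch uses the values for task ID 2 and builds the path with and without braces; the unset line only guarantees that no variable named alpha_g is left over from the environment.

```shell
#!/bin/bash
alpha=0
gamma=Cc
unset alpha_g   # make sure no variable of this name exists

# With braces, bash knows where each variable name ends.
with_braces="results/output_a${alpha}_g${gamma}.txt"

# Without braces, bash looks up $alpha_g (unset, so empty) instead of $alpha.
without_braces="results/output_a$alpha_g$gamma.txt"

echo "$with_braces"      # results/output_a0_gCc.txt
echo "$without_braces"   # results/output_aCc.txt
```

In the second path, the alpha value is lost, so several array tasks would end up writing to the same file.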

The backslash (i.e., \) is a line continuation character. There must be no characters, not even spaces, after a backslash on the same line.

Slurm exports several environment variables when it submits a job for execution. You may find SLURM_CPUS_PER_TASK particularly useful. In this example, the variable is used to specify the number of threads available to the application.

#!/usr/bin/env bash
#SBATCH --time=1:00:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --mem-per-cpu=3850
#SBATCH --job-name=param_sweep
#SBATCH --output=param_sweep_%A_%a.out
#SBATCH --array=0-11

alphas=({0..6..2})
gammas=(Aa Bb Cc)

# Build two parallel lookup arrays with one entry per parameter combination
alphaArr=()
gammaArr=()
for alpha in "${alphas[@]}"; do
   for gamma in "${gammas[@]}"; do
      alphaArr+=("$alpha")
      gammaArr+=("$gamma")
   done
done

alpha=${alphaArr[$SLURM_ARRAY_TASK_ID]}
gamma=${gammaArr[$SLURM_ARRAY_TASK_ID]}
srun myapp \
   --alpha=$alpha \
   --gamma=$gamma \
   --threads=$SLURM_CPUS_PER_TASK \
   --output=results/output_a${alpha}_g${gamma}.txt
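
To see the complete mapping from task IDs to parameter combinations, the lookup arrays can also be built and printed on their own, outside Slurm. The following is a standalone sketch that reuses the loops from the script and additionally derives the matching --array range from the number of combinations.

```shell
#!/bin/bash
alphas=({0..6..2})
gammas=(Aa Bb Cc)

# Same nested loops as in the job script.
alphaArr=()
gammaArr=()
for alpha in "${alphas[@]}"; do
   for gamma in "${gammas[@]}"; do
      alphaArr+=("$alpha")
      gammaArr+=("$gamma")
   done
done

# Number of combinations and the corresponding zero-based array range.
n=${#alphaArr[@]}
echo "combinations: $n, use --array=0-$((n - 1))"

# One line per task ID, e.g. "2: alpha=0 gamma=Cc".
for id in "${!alphaArr[@]}"; do
   echo "$id: alpha=${alphaArr[$id]} gamma=${gammaArr[$id]}"
done
```

Deriving the range this way helps to keep --array consistent with the parameter lists when values are added or removed.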

Before submitting the job, it is helpful to test the script to ensure that it translates the task ID correctly into the parameters. This can be done by setting SLURM_ARRAY_TASK_ID to the first command line argument before its first use and adding echo before srun. Outside a Slurm job, SLURM_CPUS_PER_TASK is not set, so the --threads value will be printed empty.

# ...
SLURM_ARRAY_TASK_ID=$1
alpha=${alphaArr[$SLURM_ARRAY_TASK_ID]}
gamma=${gammaArr[$SLURM_ARRAY_TASK_ID]}
echo srun myapp \
   --alpha=$alpha \
   --gamma=$gamma \
   --threads=$SLURM_CPUS_PER_TASK \
   --output=results/output_a${alpha}_g${gamma}.txt

Suppose the job script has been saved as myjob and made executable with chmod u+x myjob. Now, when you call it with different IDs, it prints the application call that would have been executed. Note that you need to run the script directly, without sbatch.

./myjob 2
./myjob 5
./myjob 9

Caution

Make sure you remove both changes before submitting the job!


Last update: January 7, 2022