Job management

Job information

You can use the squeue utility to list jobs in the queue. Without any parameters, it shows all jobs that are running or pending on the whole cluster.

squeue

Since the queue is typically very long, it helps to pipe the output to the less command. You can then scroll with the up/down arrow keys and page forward and back with Ctrl+F and Ctrl+B respectively. Press G to jump to the bottom of the output and q to exit less.

squeue | less

You can limit the output to your own jobs with -u $USER.

squeue -u $USER

The output includes basic job information such as job id, user, and requested resources. Jobs cannot run past their END_TIME, but they can terminate earlier. The NODELIST(REASON) column shows the list of nodes for running jobs or the reason why a pending job has not started. The most common reason is (Priority), which means that the job is not running because it has lower priority than some other scheduled jobs. Pending jobs with the highest priority have (Resources) as the reason. Occasionally, you may see (ReqNodeNotAvail). In most cases, this means that a reservation has been placed on the partition nodes for upcoming maintenance and your job cannot start because its runtime may overlap with the maintenance window.
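To get a quick overview of why your jobs are waiting, you can count them by state or by pending reason. The sketch below is one possible approach. count_values is a hypothetical helper name, not part of Slurm, and it only tallies the lines it receives on standard input.

```shell
# Hypothetical helper: tally identical input lines and print "<value> <count>",
# sorted by value. It reads standard input and does not call Slurm itself.
count_values() {
    awk '{ n[$0]++ } END { for (v in n) print v, n[v] }' | LC_ALL=C sort
}

# Intended use on the cluster (-h drops the header, %T is the job state,
# %r is the reason a job is in its current state):
#   squeue -h -u "$USER" -o "%T" | count_values
#   squeue -h -u "$USER" -t PENDING -o "%r" | count_values
```

With three jobs of which two are pending, the first command would print one line per state followed by its count.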

By default, the jobs are sorted by increasing step id, which is not very convenient. To make the output more informative, you can sort by job state t (pending, running) and priority Q (low to high).

squeue -S t,Q | less

Other useful sorting options are node name N and expected end time e. For example,

squeue -S t,N,e | less

Note

Some pending jobs may already show their estimated end time. This is a very rough estimate and the actual completion time may be either sooner or later depending on many factors. For example, additional jobs may be submitted at any time and they may delay currently pending jobs that have lower priority.

Information about running jobs can also be obtained with the sstat utility. For example, the MaxRSS column shows the maximum amount of RAM your job has consumed so far:

sstat -a <jobid> -o Jobid,MaxRSS,AveCPU

Use the sstat --helpformat command to see the list of all available fields and check man sstat to find out exactly what each field means.

Information about jobs that ran previously can be obtained with the sacct utility. The most common parameters are listed below.

-S <date> displays jobs that started after the specified date. Date should be in ISO format, e.g. '2023-01-01'. You can also specify time, e.g. '2023-01-01 14:30'.
-s <state> limits the output to jobs in specific states, e.g. -s FAILED,TIMEOUT would show jobs that failed or timed out.
-j <jobid> shows the information for the specified job only.

For example,

sacct -S '2023-01-01' -s COMPLETED
sacct -j 2905691

Jobs that successfully finished should have COMPLETED state and 0:0 exit code.

Among the default output columns, you may find MaxRSS particularly useful. It shows the maximum amount of RAM your job consumed at some point during its execution. This information can be used to adjust the amount of requested RAM for similar jobs in the future.
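As a rough illustration of that adjustment, the following sketch converts a MaxRSS value to MiB and adds a safety margin. suggest_mem_mib is a hypothetical helper, the 20% default margin is an assumption rather than a ScienceCluster recommendation, and only values with a K, M, or G suffix are handled.

```shell
# Hypothetical helper: turn a MaxRSS value (e.g. 2054224K, 512M, 1.5G) into a
# memory request in MiB, padded by a margin in percent (assumed default: 20).
# usage: suggest_mem_mib <maxrss> [margin_percent]
suggest_mem_mib() {
    awk -v v="$1" -v pct="${2:-20}" 'BEGIN {
        unit = substr(v, length(v), 1)   # last character is the unit suffix
        num  = v + 0                     # awk keeps the leading numeric part
        if      (unit == "K") mib = num / 1024
        else if (unit == "M") mib = num
        else if (unit == "G") mib = num * 1024
        else { print "unsupported unit in: " v > "/dev/stderr"; exit 1 }
        need = mib * (100 + pct) / 100
        out = int(need)
        if (out < need) out++            # round up to a whole MiB
        print out
    }'
}
```

For example, suggest_mem_mib 2054224K prints 2408, which could then be requested with --mem=2408M.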

There are many other fields that you can request. You can see the whole list by running sacct --helpformat. The output format can be controlled with the -o parameter, which accepts a comma-separated list of fields.

sacct -S '2023-01-01' -s COMPLETED -o jobid,start,reqtres,reqmem,maxrss

In some cases, a column may not be wide enough to fit entire values. sacct appends a plus sign to the end of truncated values. You can increase column width by adding %x to the column names specified with -o. Here, x is the width of the corresponding column in characters. For example, the following command expands the width of JobID, ReqTres, and ReqMem columns to 9, 25, and 15 characters respectively.

sacct -S '2023-01-01' -s COMPLETED -o jobid%9,start,reqtres%25,reqmem%15,maxrss

You may also find it useful to compare CPUTime (the CPU time allocated to the job, Elapsed*AllocCPUs) with TotalCPU (the CPU time the job actually consumed). The two values should be comparable. For example, if CPUTime is twice TotalCPU, the job kept only about half of its allocated CPUs busy, and you can try halving the number of requested CPUs.

sacct -S '2023-01-01' -s COMPLETED -o jobid,start,reqtres,reqmem,maxrss,alloccpus,cputime,totalcpu
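The comparison between CPUTime and TotalCPU can be expressed as a percentage. The sketch below assumes the usual Slurm time notation ([DD-[HH:]]MM:SS, optionally with fractional seconds). to_seconds and cpu_efficiency are hypothetical helper names, and fractional seconds are truncated.

```shell
# Hypothetical helper: convert a Slurm-style duration to whole seconds.
to_seconds() {
    awk -v t="$1" 'BEGIN {
        days = 0
        if (split(t, d, "-") == 2) { days = d[1]; t = d[2] }   # DD-HH:MM:SS
        n = split(t, p, ":")
        if      (n == 3) s = p[1] * 3600 + p[2] * 60 + p[3]    # HH:MM:SS
        else if (n == 2) s = p[1] * 60 + p[2]                  # MM:SS
        else             s = p[1]                              # SS
        printf "%d\n", days * 86400 + s
    }'
}

# Hypothetical helper: TotalCPU as an integer percentage of CPUTime.
# usage: cpu_efficiency <cputime> <totalcpu>
cpu_efficiency() {
    alloc_s=$(to_seconds "$1")
    used_s=$(to_seconds "$2")
    if [ "$alloc_s" -le 0 ]; then echo "n/a"; return 0; fi
    echo $(( used_s * 100 / alloc_s ))
}
```

For instance, cpu_efficiency 02:00:00 01:00:00 prints 50, which matches the case above where halving the number of requested CPUs is worth trying.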

Job priority

On ScienceCluster, the order in which jobs are executed is primarily determined by the job's priority. Slurm assigns the initial priority when the job is submitted. This initial priority depends on the user's fair share, which is the difference between the promised resources and the resources already consumed by the user. In other words, the more resources that have been allocated for the user's jobs in the past, the lower the initial priority will be. All users initially start with the same fair share value, which begins to decrease once the user's jobs start running. The record of usage has a half-life decay of 7 days. This implies that the user's fair share may increase over time, consequently increasing the priority of pending jobs.
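To make the half-life concrete: under a plain exponential-decay model, usage recorded d days ago counts with weight 0.5^(d/7). The sketch below evaluates that weight. It illustrates the decay only and is not Slurm's full fair-share calculation; decay_weight is a hypothetical helper name.

```shell
# Hypothetical helper: weight of usage recorded <days> ago, assuming simple
# exponential decay with the given half-life in days (default 7, as above).
# usage: decay_weight <days> [half_life_days]
decay_weight() {
    awk -v d="$1" -v h="${2:-7}" 'BEGIN { printf "%.2f\n", 0.5 ^ (d / h) }'
}
```

Under this model, decay_weight 7 prints 0.50 and decay_weight 14 prints 0.25, so usage from two weeks ago weighs a quarter as much as usage from today.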

In addition to the fair share value, a job's priority depends on the amount of time the job remains in the queue. The longer it stays in the queue, the higher the priority bonus.

At regular intervals, Slurm re-evaluates the priority of jobs and checks whether there are enough resources to run the jobs with the highest priority. If so, the jobs are assigned to nodes with available resources for execution. Additionally, there is a backfilling mechanism that schedules lower-priority jobs if available resources are insufficient for higher-priority jobs, and scheduling these lower-priority jobs does not delay the scheduling of higher-priority jobs.

You can see the priority of pending jobs with the sprio -l command. When you run squeue, the job with the highest priority is indicated by "(Resources)" in the NODELIST(REASON) column, while jobs with lower priority have "(Priority)" in that column.

When will my job run?

We receive this question rather frequently. Unfortunately, it has neither a definitive nor even an approximate answer, due to the complexity of the scheduling algorithm and the highly dynamic nature of the environment.

On ScienceCluster, Slurm schedules jobs based on priority, which primarily depends on the user's previous resource consumption. This impact from consumption decays over time. Additionally, a job's priority increases the longer it sits in the queue, but it can also decrease if the user has other jobs running.

We have also enabled the job backfilling feature. This allows lower-priority jobs to run earlier as long as they do not delay higher-priority jobs. Backfilling is particularly beneficial for shorter jobs that require fewer resources. This might encourage users to request only the resources their jobs can actually use. However, it is still advisable to request a small buffer especially when it comes to memory and runtime.

High-priority jobs may lose their top position in the queue if another user submits jobs with higher priority. Given the large number of users on ScienceCluster, who may work at various hours, this can happen at any time. Although Slurm can provide an estimated completion time for your job, actual completion may be delayed due to newly submitted jobs. Conversely, jobs often finish earlier than their requested time suggests, which can help your job start sooner.

Even though there is no definitive answer to the question, there are several commands you can use to check the cluster's current load and your job's position in the queue.
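One possible combination of such commands is sketched below. The selection is an assumption about what is useful here, not an official recipe: sinfo -s summarises node availability per partition, squeue --start shows the scheduler's current start-time estimates for pending jobs, and sprio -l lists the priority components. queue_overview is a hypothetical wrapper name.

```shell
# Hypothetical wrapper: print cluster load, estimated start times, and
# priority components for the current user's jobs.
queue_overview() {
    for cmd in sinfo squeue sprio; do
        # Stop early if a Slurm client command is not available.
        command -v "$cmd" >/dev/null 2>&1 || {
            echo "missing command: $cmd" >&2
            return 127
        }
    done
    sinfo -s                     # partition summary (allocated/idle nodes)
    squeue --start -u "$USER"    # rough start-time estimates for pending jobs
    sprio -l -u "$USER"          # priority breakdown of pending jobs
}
```

The start times reported this way are estimates and can shift whenever other jobs are submitted or finish early.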

Jobs in queue during maintenance

To maintain the ScienceCluster, Science IT administrators may create a reservation that includes all compute nodes. This ensures no jobs are running during maintenance periods, as hardware or software updates may interfere with running workloads, and reboots are often required.

Pending jobs that cannot finish before the start of the reservation, based on their requested runtime, will remain in the queue.
These jobs will not lose priority and will be scheduled as usual once the reservation ends.
The reservation prevents jobs from starting, so no job will run during the maintenance window.
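The first point can be pictured as a simple comparison between the requested runtime and the time left until the reservation starts. The sketch below is a simplified model of that check, not the scheduler's actual logic. fits_before_reservation is a hypothetical helper that takes all times as seconds since the epoch.

```shell
# Hypothetical helper: succeed if a job started at <now> with a time limit of
# <limit_seconds> would end no later than <reservation_start>.
# usage: fits_before_reservation <now_epoch> <limit_seconds> <reservation_start_epoch>
fits_before_reservation() {
    [ $(( $1 + $2 )) -le "$3" ]
}

# Possible use with a 2-hour time limit (date -d assumes GNU date):
#   start=$(date -d '2025-06-04T06:00:00' +%s)
#   fits_before_reservation "$(date +%s)" 7200 "$start" && echo "could still start"
```

If the check fails, requesting a shorter runtime, where the job allows it, may let the job start before the maintenance.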

Checking for upcoming reservations

You can check for upcoming reservations using:

scontrol show reservations

This command displays the reservation start and end times, affected nodes, and other details. For example:

ReservationName=s3it.uzh_51 StartTime=2025-06-04T06:00:00 EndTime=2025-06-04T18:00:00 Duration=12:00:00
   Nodes=u20-cha0tm0-[601-602],u20-chaiam0-[611-615],u20-chi0000-[301-302],u20-chii000-[401-417],u20-chiihm0-[616-617],u20-chiivm0-[603-610],u20-cva0ts0-[501-510],u20-cva0000-[001-010,101-128,201-209] NodeCnt=93 CoreCnt=3910 Features=(null) PartitionName=(null) Flags=MAINT,IGNORE_JOBS,SPEC_NODES,ALL_NODES
   TRES=cpu=5054
   Users=(null) Groups=(null) Accounts=s3it.uzh Licenses=(null) State=INACTIVE BurstBuffer=(null)
   MaxStartDelay=(null)

Reservations typically span from 06:00 to 18:00 on the day of maintenance, but may start earlier or end later. The exact end time may also change during the maintenance.
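Since the full output is verbose, it can be convenient to extract only the name and time window of each reservation. The sketch below does this with awk, assuming the key=value layout shown in the example above. reservation_windows is a hypothetical helper that reads the scontrol output from standard input.

```shell
# Hypothetical helper: print "<name> <start> <end>" for each reservation found
# in `scontrol show reservations` output read from standard input.
reservation_windows() {
    awk '{
        for (i = 1; i <= NF; i++) {
            split($i, kv, "=")
            if (kv[1] == "ReservationName") name  = kv[2]
            if (kv[1] == "StartTime")       start = kv[2]
            if (kv[1] == "EndTime")         print name, start, kv[2]
        }
    }'
}

# Intended use on the cluster:
#   scontrol show reservations | reservation_windows
```

For the example reservation above, this prints s3it.uzh_51 2025-06-04T06:00:00 2025-06-04T18:00:00.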

During maintenance, access to the login nodes may also be restricted. If so:

A maintenance message will be shown at login.
Login attempts may fail temporarily.
All active tmux and screen sessions will be terminated, as login nodes may be rebooted.

Similarly, ScienceApps and Globus may not be available during maintenance periods.

Cancelling jobs

You can remove a pending or running job from the queue with scancel. Typically, you would use it with specific job ids.

scancel 2905690 2905691

However, it is possible to delete all your jobs that satisfy certain criteria. For example, you can delete all jobs that are pending.

scancel --state=PENDING

The command also has an interactive mode, in which it asks you to confirm the cancellation of each job before actually cancelling it. The mode is enabled with the -i flag.

scancel -i --state=RUNNING