How to run an interactive session on ScienceCluster
What is an interactive session and why should I use one?
The main way to submit jobs to ScienceCluster is the batch queuing system: a user submits a job to the queue and then waits for its output, which may not be available until the job has finished running. However, there are cases when your workflow may benefit from more rapid feedback.
For example, if you want to debug your code on the cluster, you may need to run and inspect it repeatedly without submitting a new job to the queue each time. If you need this kind of interactive experience while developing your code and ScienceApps do not meet your requirements, consider using an interactive session.
In addition, an interactive session may be necessary for tasks that require a lot of resources (memory or CPU time) because login nodes restrict resource usage. These tasks may include, for instance, creation of conda environments and software installation.
An interactive session is an active terminal session on a compute node for the time period that you request and with the resources you request. You submit an interactive session request to the queue, just as you would a job; when the available resources can be allocated, your interactive session opens, and you can run or debug your code in real time.
Warning
There is an important distinction between running code interactively from the login node and running code from an interactive session.
You should not simply log in to the cluster and start running your code from the command line. When you do so, you are using the login node's resources. Login nodes are designed for smaller tasks such as submitting jobs or checking job output. If your task consumes too much memory or CPU time on a login node, it will fail, often with a cryptic error message.
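One way to tell the two situations apart is to check whether the shell is running inside a Slurm job. The sketch below relies on the SLURM_JOB_ID environment variable, which Slurm sets inside a job allocation; the function names are hypothetical and only for illustration.

```shell
# Returns success if the current shell runs inside a Slurm job.
in_slurm_job() {
    [ -n "${SLURM_JOB_ID:-}" ]
}

# Prints a one-line description of where the shell is running.
describe_location() {
    if in_slurm_job; then
        echo "inside Slurm job ${SLURM_JOB_ID} on $(hostname)"
    else
        echo "not inside a Slurm job on $(hostname); avoid heavy work here"
    fi
}

describe_location
```

If the second message appears, the shell is most likely on a login node, and resource-intensive commands should wait until an interactive session has started.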
How do I request an interactive session?
Requesting an interactive session is straightforward. For example, you could run the following command from a login node:
srun --pty -n 1 -c 2 --time=01:00:00 --mem=7G bash -l
This would request an interactive session with the following attributes:
- a single task with 2 cores, requested via -n 1 -c 2
- a 1 hour time limit, requested via --time=01:00:00
- 7 GiB of RAM, requested via --mem=7G
- the session would be a standard bash login session, requested via bash -l
You should customize these parameters to tailor the interactive session to your needs. For example, you can request a GPU device by inserting --gres=gpu:1 before bash -l, or increase memory by changing the corresponding parameter to --mem=15G. The following command does both:
srun --pty -n 1 -c 2 --time=01:00:00 --mem=15G --gres=gpu:1 bash -l
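How these options combine can be sketched with a small helper that only prints the command, so it can be reviewed before it is run. The function name build_interactive_cmd and its argument order are hypothetical and not part of ScienceCluster or Slurm; the flags are the ones used above.

```shell
# Hypothetical helper: print (not run) an srun command for an interactive session.
# Arguments: cores, time limit, memory, number of GPUs (all optional).
build_interactive_cmd() {
    cores="${1:-2}"
    time_limit="${2:-01:00:00}"
    mem="${3:-7G}"
    gpus="${4:-0}"
    cmd="srun --pty -n 1 -c ${cores} --time=${time_limit} --mem=${mem}"
    # Add a GPU request only when at least one GPU is asked for.
    if [ "${gpus}" -gt 0 ]; then
        cmd="${cmd} --gres=gpu:${gpus}"
    fi
    # The shell to start always goes last.
    echo "${cmd} bash -l"
}

build_interactive_cmd
# prints: srun --pty -n 1 -c 2 --time=01:00:00 --mem=7G bash -l
build_interactive_cmd 2 01:00:00 15G 1
# prints: srun --pty -n 1 -c 2 --time=01:00:00 --mem=15G --gres=gpu:1 bash -l
```

Printing the command rather than running it keeps the sketch safe to try anywhere; the printed line can then be pasted into a login node terminal.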
After the interactive session begins, your prompt will change to show the host name of the compute node where resources have been allocated. All modules that you previously loaded on the login node will remain loaded, and you can load additional modules as needed. The interactive session will also inherit all the environment variables that were set on the login node. However, shell functions that you defined on the login node will not be exported to the session.
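Because functions are not carried over, one possible workaround is to keep them in a file and source that file once the session has started. The sketch below assumes the file lives on a filesystem visible from both login and compute nodes; the helper name load_functions and the default path are hypothetical.

```shell
# Hypothetical helper: source a file of shell function definitions if it exists.
load_functions() {
    funcs_file="${1:-$HOME/my_functions.sh}"   # assumed location, adjust as needed
    if [ -r "$funcs_file" ]; then
        # Sourcing defines the functions in the current shell.
        . "$funcs_file"
    else
        echo "no readable function file at $funcs_file" >&2
        return 1
    fi
}
```

Inside the interactive session, calling load_functions (optionally with a path) would make the functions from that file available again.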
Note
Depending on the amount of requested resources, your recent usage, and the current cluster utilization, you may need to wait for your interactive session to begin. Keep your terminal open while you wait, or your interactive session will be cancelled.
How do I connect to a running job?
You cannot connect to the compute node of a running job using ssh. However, you can open a shell on that node with a command similar to the one used to request an interactive session:
srun --pty --interactive --jobid <JOBID> bash -l
If you run a multi-node job, you can request a given node using --nodelist=<NODE>.
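To find the values for <JOBID> and <NODE>, the running jobs can be listed with squeue. The following is a minimal sketch; the function name list_my_running_jobs is hypothetical, and the output format (job ID followed by node list) is just one reasonable choice.

```shell
# Hypothetical helper: list the current user's running jobs as "<JOBID> <NODELIST>".
list_my_running_jobs() {
    if ! command -v squeue >/dev/null 2>&1; then
        echo "squeue not found; run this on a cluster login node" >&2
        return 1
    fi
    # -h: no header, -t RUNNING: only running jobs,
    # -o "%i %N": print the job ID and the allocated node list
    squeue -u "${USER:-$(id -un)}" -h -t RUNNING -o "%i %N"
}
```

The first column of each output line is the value to use for <JOBID>; for a multi-node job, the second column shows the allocated nodes from which a <NODE> can be chosen.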