Skip to content

FAQs

What will happen to my queued jobs during maintenance?

When ScienceCluster maintenance occurs, Science IT admins will "drain" the ScienceCluster nodes so that the hardware and/or software used within the cluster can be updated. When a node is "drained", all currently running jobs will be allowed to finished and no additional jobs in the queue will be accepted to run. The maintenance will then be performed once a node has completed all running jobs (i.e., there is no activity on the node).

During this process, the SLURM queue will continue to hold all jobs with their assigned priority. As soon as the ScienceCluster maintenance window has closed, and the nodes are freed from their "drained" status, all jobs in the queue will continue to run normally.

Of note, it will not be possible to schedule jobs with time frames that overlap with a scheduled maintenance window on a node. These jobs will simply be rejected from the queue when you attempt submitting them via sbatch. When this situation occurs, you should either adjust the time limit so it doesn't overlap with a maintenance window or simply submit the job(s) after the maintenance has been completed.

I am over-quota. How can I clean up my file storage?

Consider storing large files in your scalable storage folder, which is in your project space and can be found by running the command quota.

Folders that typically grow in size with cache or temporary files are .local and .cache. To find the storage used (in GB) in all subfolders in your /home/$USER and /data/$USER folders, run:

ls -lha

Anaconda / Mamba

To clean up cached installation packages from Anaconda, run the following commands:

module load anaconda3
conda clean -a
pip cache purge

Or with Mamba:

module load mamba
mamba clean -a
pip cache purge

Singularity

Singularity stores its cache by default in a user's home folder. To determine your cache folder for Singularity:

echo $SINGULARITY_CACHEDIR

To clean the cache:

module load singularityce
singularity cache clean

You can change your singularity cache path with this command

export SINGULARITY_CACHEDIR=/scratch/$USER/

Or add it to your .bashrc file so that it is set each time you log in.

echo "export SINGULARITY_CACHEDIR=/scratch/$USER/" >> ~/.bashrc
source ~/.bashrc
echo $SINGULARITY_CACHEDIR

Framework folders

Certain software frameworks (e.g., HuggingFace) cache files programmatically, which can be cleaned with their own commands. For example, with HuggingFace consider using:

huggingface-cli delete-cache
You can find corresponding information on such commands in the framework specific documentation, such as this page for information on HuggingFace cache cleaning.

What to do if I have a broken conda or mamba environment?

There are a variety of possible causes that a conda (or mamba) virtual environment might no longer function, even if it worked in the past, so there is not a single answer to cover all cases. There are two general approaches: either to start over with a new environment, or to repair the existing environment.

Start fresh with a new environment

One approach, and generally the simplest and most reliable, is to create a new environment and start again following the methods outlined in this how-to article.

In some cases, that may not be sufficient. For example, if one has inadvertently installed packages using pip while not within an activated virtual environment, those packages may end up in .local, and they may conflict with packages within a virtual environment. In that case, it may be needed clean up .local/lib and .local/bin. Check whether either of those directories exist with ls .local, then ls .local/lib to see whether that directory contains folder or files with names containing "python". If so, one can clean these directories in a reversible way (to avoid deleting something that may be needed by another application) by renaming (instead of deleting) those directories

mv .local/lib .local/lib_bak
mv .local/bin .local/bin_bak
This issue can be avoided in the future by first conda install pip (or mamba install pip) within your activated virtual environment before installing any packages with pip. Do NOT modify .local/share because that directory may contain important configuration settings for other applications.

Check version compatibility: Sometimes, in order to get packages working in a new environment, a specific package might require an older (or newer) version of python; check documentation about that package. In that case, one can create a new environment with a specific python version, e.g.:

conda create --name myenv python=3.10
In other cases, a specific version of a package may be needed for compatibility with other packages in an environment, which can be done as:
conda install <package_name>=<version_number>

Repair the environment

Another approach, though not guaranteed to work, is to attempt to repair the virtual environment. Some possible steps (not a comprehensive guide) that may help in some cases are below. Update packages:

conda update --all
Remove and re-install a specific package that is giving errors:
conda remove <package_name>
conda install <package_name>
Also, check version compatibility of packages, and reinstall specific packages, if needed.