Containers (Singularity) Tutorial

This tutorial demonstrates how to use Singularity to create a container based software environment for use on the ScienceCluster (or elsewhere).

Workflow Overview

Before beginning, it's helpful to understand the basic parts of a Singularity workflow:

  1. Acquire / make a Singularity image file and optionally customize it for your needs.
    • In this step you either use a pre-existing image from Docker Hub without any customization, or customize such an image (i.e., add new software to it) using a definition file.
  2. Transfer the image file to the ScienceCluster and create a "sandbox" directory from it.
    • This step involves transferring the image file to the ScienceCluster (if it's not already there) and running a single command to "unpack" it for production use.
  3. Prepare your code to run from the Singularity environment.
    • This step will involve augmenting your Slurm submission script to use the Singularity container.

Acquiring / Making a Singularity Image File

Pulling an Image File from DockerHub

The first step in a Singularity workflow is to acquire or make, and potentially customize, a Singularity Image File (.sif). A Singularity Image File is the basis of the container technology: it contains an encapsulated environment that can be transferred to, and then run on, any other computer system that has Singularity installed. As a general rule, pre-built images are downloaded or referenced from Docker Hub.

For example, to acquire the latest container version of TensorFlow, you would run the following code on the ScienceCluster (for the GPU-enabled build, append the :latest-gpu tag to the image name, which yields a file named tensorflow_latest-gpu.sif):

# Load the necessary modules in the cluster
module load generic singularity
# Change to the data directory (relative to the current directory)
cd data
# Pull the image directly from DockerHub
singularity pull docker://tensorflow/tensorflow
Running a singularity pull command will result in the specified Singularity Image File being downloaded to the present working directory in the ScienceCluster. In the example above, you'll have a file titled tensorflow_latest.sif in your /data directory.

You can find available Docker/Singularity images from their relevant pages on Docker Hub. For TensorFlow, the official Docker Hub page is here.

Once you've found the specific version of software that you want, copy the relevant container ID from the provided docker pull ... command. Specifically, the singularity pull docker://tensorflow/tensorflow command was adapted from docker pull tensorflow/tensorflow, which is found directly on the TensorFlow DockerHub page. Moreover, you can view various available versions of TensorFlow via the Tags subpage. The text that appears after docker pull must be inserted after the docker:// part of the singularity pull command.
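The mapping from a docker pull reference to a singularity pull command, and to the file name that the pull produces, can be sketched in a few lines of shell. This is a minimal illustration and not part of Singularity: the function names to_singularity_uri and default_sif_name are hypothetical, and the naming rule (image name, underscore, tag, with latest assumed when no tag is given) is inferred from the tensorflow_latest.sif and tidyverse_4.0.3.sif examples in this tutorial.

```shell
# Hypothetical helpers (not provided by Singularity).

# Turn the text that follows "docker pull" into a URI for "singularity pull".
to_singularity_uri() {
    printf 'docker://%s\n' "$1"
}

# Predict the default .sif file name for a Docker reference.
# The function body runs in a subshell so its variables stay private.
default_sif_name() (
    ref="$1"
    name="${ref##*/}"            # drop the namespace, e.g. "tensorflow/"
    case "$name" in
        *:*) tag="${name##*:}"   # text after the colon is the tag
             name="${name%%:*}" ;;
        *)   tag="latest" ;;     # assumed default when no tag is given
    esac
    printf '%s_%s.sif\n' "$name" "$tag"
)
```

For example, default_sif_name tensorflow/tensorflow:latest-gpu prints tensorflow_latest-gpu.sif, and to_singularity_uri tensorflow/tensorflow prints docker://tensorflow/tensorflow.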

If you're confident that all of the software required for your analysis is within the downloaded image file, you can move forward with building the sandbox directory so that you can efficiently use the container. If you need to install additional software into the container, you'll need to use a definition file to add code to the image file instead of pulling it directly via singularity pull ....

Creating a Custom Image from a Definition File

If you need to add software to a pre-existing Singularity Image File, you'll need to "bootstrap" a Singularity image using a definition file. "Bootstrapping" the image means selecting the desired pre-existing image from Docker Hub and then installing additional software into it. Because it requires superuser (sudo) privileges, this specific process needs to be done either on a ScienceCloud VM (using a source VM image that has Singularity preinstalled; e.g., ***Singularity 3.8 Ubuntu 20.04 (2021-07-06)) or on your own computer. The installation directions for Singularity can be found here. After the *.sif file has been created, you'll transfer it to the ScienceCluster for production use.

Once you have Singularity available via a ScienceCloud VM or via your own laptop, you'll need to make a definition file. The definition file is a plain text file that specifies the starting image that you'll bootstrap from, followed by the commands that add more software to the container.

An example Singularity definition file might look like the following:

Bootstrap: docker
From: tensorflow/tensorflow

%post
    pip install pandas
This example uses the same TensorFlow container from Docker Hub as above but includes a %post section. This section allows users to define specific commands that augment the container. In this definition file, pip is simply used to install the pandas package in the tensorflow/tensorflow container's Python environment. The pip program is available because it is already installed in the TensorFlow container.

To find out what software is available in a container, you can either research the existing Docker Hub information on the container or use the singularity shell command to explore the container interactively. The singularity shell command can be used either directly on a Singularity image file or on a Singularity sandbox directory (described in the section below).
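As a non-interactive complement to singularity shell, a single program can be looked up with singularity exec. The sketch below assumes that the image provides sh; the function name has_tool is hypothetical. The command prefix that runs things inside the container is passed as the trailing arguments, so the same function can be tried on the host by passing env as the prefix.

```shell
# Hypothetical helper: succeed if TOOL is on the PATH inside the environment
# reached through the given command prefix.
# usage: has_tool TOOL PREFIX...
has_tool() (
    tool="$1"
    shift
    # "$@" is the prefix, e.g.: singularity exec tensorflow_latest.sif
    "$@" sh -c "command -v $tool" >/dev/null 2>&1
)
```

A possible call is has_tool pip singularity exec tensorflow_latest.sif && echo "pip is available".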

A more complex example of a Singularity definition file might look like the following:

Bootstrap: docker
From: rocker/tidyverse:4.0.3

%post
    # Refresh the package index and load the system-wide environment variables
    apt-get update && . /etc/environment
    # Download, unpack, and build JAGS 4.3.0 from source
    wget sourceforge.net/projects/mcmc-jags/files/JAGS/4.x/Source/JAGS-4.3.0.tar.gz  -O jags.tar.gz
    tar -xf jags.tar.gz
    cd JAGS* && ./configure && make -j4 && make install
    cd ~
    # Download, unpack, and build the JAGS Wiener module from source
    apt-get update && . /etc/environment
    wget sourceforge.net/projects/jags-wiener/files/JAGS-WIENER-MODULE-1.1.tar.gz  -O jagswiener.tar.gz
    tar -xf jagswiener.tar.gz
    cd JAGS-WIENER-MODULE-1.1 && ./configure && make -j4 && make install
    # Install the runjags R package
    R -e "install.packages('runjags')"

Notice that this example uses many operating system commands to prepare/install system-level packages; for example, apt-get update and make install. You can use these commands because the rocker/tidyverse:4.0.3 container is built using Ubuntu 20 as the operating system. To determine this, you can pull the pre-built Singularity Image File from Docker Hub using singularity pull docker://rocker/tidyverse:4.0.3, then use singularity shell tidyverse_4.0.3.sif to open a command line directly within the container. From that command line, you can run lsb_release -a to find the operating system version.
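If lsb_release is not installed in an image, the operating system can usually be read from /etc/os-release instead. The sketch below assumes that the image ships this file (Ubuntu-based images do); the function name os_pretty_name is hypothetical. It reads os-release content on standard input and prints the PRETTY_NAME value.

```shell
# Hypothetical helper: print the PRETTY_NAME value from os-release content
# supplied on standard input, with the surrounding quotes removed.
os_pretty_name() {
    sed -n 's/^PRETTY_NAME="\{0,1\}\([^"]*\)"\{0,1\}$/\1/p'
}
```

One way to use it is singularity exec tidyverse_4.0.3.sif cat /etc/os-release | os_pretty_name, which avoids opening an interactive shell.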

Once you've saved your definition file as a text file (e.g., using the file name recipe.def), you can then try to build a Singularity Image File from it using:

sudo singularity build tensorflow.sif recipe.def
In this example, the output *.sif file will be named tensorflow.sif. However, you can change this name to whatever you'd like.

Note

The process of creating a Singularity Image File from a definition file will often take a significant amount of trial and error. Be patient and persistent. Use singularity shell on each of the image files you create to open your environment and confirm whether you can load your software/packages of interest. Do not run your code from the singularity shell; instead, see the final section below on how to augment your cluster submission script to run your code.
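During this trial-and-error loop, a quick scripted check after each build can complement the interactive inspection. The sketch below is one possible approach, assuming the image provides a python executable: it tries to import each listed module with singularity exec and reports the result. The function name verify_imports and the SING_CMD variable are hypothetical; SING_CMD exists only so that the command can be swapped out when testing the script itself.

```shell
# Hypothetical setting: the command used to invoke Singularity.
SING_CMD="${SING_CMD:-singularity}"

# Try to import each module inside the image and report the outcome.
# usage: verify_imports IMAGE MODULE...
# Returns non-zero if any import fails.
verify_imports() (
    image="$1"
    shift
    status=0
    for module in "$@"; do
        if "$SING_CMD" exec "$image" python -c "import $module"; then
            echo "ok: $module"
        else
            echo "FAILED: $module"
            status=1
        fi
    done
    exit "$status"
)
```

For the image built from the first definition file above, a call such as verify_imports tensorflow.sif tensorflow pandas would check both the base package and the added one.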

Transferring the Image to the Cluster and Creating a Sandbox Directory

Once you've created a Singularity Image File, you should transfer it to the ScienceCluster.
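One common way to copy a large image file is rsync, which can resume an interrupted transfer. The sketch below is only an illustration: the host name cluster.example.org, the user name, and the destination directory are placeholders, not the real ScienceCluster address, and it assumes rsync is installed on both machines. The RSYNC_CMD variable and the function name transfer_image are hypothetical.

```shell
# Placeholder values: replace with your own user name and the cluster's address.
REMOTE="${REMOTE:-username@cluster.example.org}"
# Hypothetical setting: the command used to invoke rsync.
RSYNC_CMD="${RSYNC_CMD:-rsync}"

# Copy an image file to a directory on the remote host.
# usage: transfer_image IMAGE REMOTE_DIR
transfer_image() {
    # -a preserves file attributes, -v is verbose,
    # --partial keeps partially transferred files so a retry can resume
    "$RSYNC_CMD" -av --partial --progress "$1" "$REMOTE:$2"
}
```

A call might look like transfer_image tensorflow.sif data/.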

When the file is on the ScienceCluster, you should then create a "sandbox" directory from it. A "sandbox" directory is an "unpacked" Singularity Image File. When you use a Singularity Image File without sandboxing it, your workflow needs to "unpack" the files each time in order to access and run them; if you create a sandbox directory from the image file, you "unpack" it once (and only once) and save yourself from unpacking it during future uses of the software.

To unpack a Singularity Image file titled tensorflow_latest-gpu.sif into a directory titled tensorflow_latest_sandbox, your command would be:

singularity build --sandbox tensorflow_latest_sandbox tensorflow_latest-gpu.sif

Once this command has been run, which might take anywhere from several seconds to several minutes, you'll end up with a directory titled tensorflow_latest_sandbox in the place where you ran this command. At this point, you're ready to use (and re-use) your prepared Singularity container environment.
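Because the point of the sandbox is to unpack only once, a script can guard the build so that it is skipped when the directory already exists. This is a minimal sketch under that assumption; the function name ensure_sandbox and the SING_CMD variable are hypothetical, and the check does not detect a sandbox left incomplete by an interrupted build.

```shell
# Hypothetical setting: the command used to invoke Singularity.
SING_CMD="${SING_CMD:-singularity}"

# Build the sandbox directory only if it does not exist yet.
# usage: ensure_sandbox IMAGE SANDBOX_DIR
ensure_sandbox() {
    if [ -d "$2" ]; then
        echo "sandbox exists: $2"
    else
        "$SING_CMD" build --sandbox "$2" "$1"
    fi
}
```

For the example above, the call would be ensure_sandbox tensorflow_latest-gpu.sif tensorflow_latest_sandbox.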

Preparing Your Code to Run from the Singularity Environment

The final step to a Singularity workflow is preparing your code to use the environment. This step involves editing the Slurm submission script. Take for example the following submission script (from this documentation page):

#!/bin/bash
#SBATCH --time=00:10:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=3000
#SBATCH --gres gpu:1
module load vesta
module load anaconda3
source activate tensorflowexample
srun python examplecode.py

This script loads the anaconda3 module, activates a previously prepared Conda environment named tensorflowexample using the source activate tensorflowexample line, and runs the examplecode.py script using python. To change this submission script workflow to use a Singularity container environment, consider the following submission script:

#!/bin/bash
#SBATCH --time=00:10:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=3000
#SBATCH --gres gpu:1
module load vesta
module load singularity
srun singularity exec \
                    -B /data -B /scratch \
                    -B /net/cephfs/data \
                    -B /net/cephfs/scratch \
                    --nv \
                    tensorflow_latest_sandbox \
                    python examplecode.py

The principal edits to this script are:

  • The module load anaconda3 and source activate tensorflowexample lines have been replaced by module load singularity, which shows that Anaconda is no longer being used as the environment manager; Singularity will instead be used.
  • Instead of simply using srun python examplecode.py to run the script of interest, there is now an extended singularity exec command. This command executes an arbitrary command of interest using the specified Singularity container environment. The flags to this command are crucial to understand:
    • The various -B flags "bind" directories to the container; in other words, they make these directories available inside the container, which by default runs as an isolated environment;
    • If you need to allow your Singularity container to access files in the /data and/or /scratch locations on the ScienceCluster, it's recommended that you bind in all of the locations referenced in this tutorial. E.g., -B /data -B /net/cephfs/data for the /data directory, and -B /scratch -B /net/cephfs/scratch for the /scratch directory.
    • The --nv flag allows the container to access NVIDIA drivers so that the container can take advantage of available GPUs (see here).
    • The tensorflow_latest_sandbox argument specifies the sandbox directory created from the Singularity Image File.
    • The \ characters are used to continue a single Bash command across multiple lines.
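The bind options can also be assembled programmatically, which is convenient when a script should run on systems where some of these directories are absent. The sketch below relies on the -B option accepting a comma-separated list of paths; the function name bind_list is hypothetical, and skipping missing directories is a design choice of this example rather than something Singularity does itself.

```shell
# Hypothetical helper: print a comma-separated list of the given directories
# that actually exist, suitable as the value of a single -B option.
# usage: bind_list DIR...
bind_list() (
    list=""
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            list="${list:+$list,}$dir"   # add a comma only between entries
        fi
    done
    printf '%s\n' "$list"
)
```

In a submission script this could be used as srun singularity exec -B "$(bind_list /data /scratch /net/cephfs/data /net/cephfs/scratch)" --nv tensorflow_latest_sandbox python examplecode.py. If none of the directories exist the list is empty, so a script should check for that case before passing it to -B.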

Once you've augmented your Slurm Submission script, your code is ready to be submitted using a standard submission workflow (i.e., via sbatch after loading modules of interest).

Using an Interactive Session to Explore a Singularity Container

As mentioned previously, it's often helpful to explore a Singularity container's environment interactively rather than via a submission script, especially when creating the Singularity Image File. To do so, first request an appropriate interactive session on a ScienceCluster node, which ensures that you don't use too many resources on a login node.

Once you've received an interactive session that meets your computational needs, and after you've loaded the required modules (e.g., module load generic singularity), you can then request a Singularity shell prompt from the Command Line that will open inside of the container environment. For example:

singularity shell -B /data -B /scratch -B /net/cephfs/data -B /net/cephfs/scratch --nv tensorflow_latest_sandbox

will open a singularity shell using the tensorflow_latest_sandbox Singularity sandbox container.

Singularity Cache Cleaning

If you commonly unpack Singularity Image Files on the ScienceCluster, you may notice that your /home folder is filling up with data. This is likely due to the Singularity cache being filled with reference files from the unpacking processes. To clean the cache, simply run:

singularity cache clean
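To see whether the cache is actually what fills the home folder, its size can be checked before and after cleaning. The sketch below assumes the default cache location of $HOME/.singularity/cache; if the SINGULARITY_CACHEDIR environment variable has been set, the cache lives elsewhere and the path must be passed explicitly. The function name cache_usage is hypothetical.

```shell
# Hypothetical helper: report the disk usage of the Singularity cache.
# usage: cache_usage [DIR]   (DIR defaults to the assumed standard location)
cache_usage() (
    dir="${1:-$HOME/.singularity/cache}"
    if [ -d "$dir" ]; then
        du -sh "$dir"
    else
        echo "no cache directory at $dir"
    fi
)
```

Running cache_usage, then singularity cache clean, then cache_usage again shows how much space was recovered; singularity cache list gives a summary from Singularity itself.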


Last update: March 21, 2022