
Python Example Script with TensorFlow

This tutorial shows how to create a Python environment on ScienceCluster with specific packages of interest, in this case TensorFlow with GPU support.

Preparing the environment

After connecting from a terminal, work through the following steps:

# load the gpu module

module load gpu

# request an interactive session, which allows the package installer to see the GPU hardware

srun --pty -n 1 -c 2 --time=01:00:00 --gres=gpu:1 --mem=8G bash -l

# (optional) confirm the gpu is loaded

nvidia-smi

# use mamba (drop-in replacement for conda)

module load mamba

# create a virtual environment and install packages

mamba create -n venv-tf tensorflow cudatoolkit

# use the virtual environment

source activate venv-tf

# confirm that the GPU is correctly detected

python -c 'import tensorflow as tf; print("Num GPUs Available:", len(tf.config.list_physical_devices("GPU")));print("TF version:",tf.__version__)'

# when finished with your test, close the interactive cluster job

conda deactivate
exit

You can always use the srun command above to create an interactive shell with GPU hardware.

If you would like to use your TensorFlow environment with Jupyter and ScienceApps, see the documentation about installing the environment as an IPython kernel.

Preparing a job submission script

Once the virtual environment has been created and the packages installed, it can be activated from within the job submission script.

First, create a file called examplecode.py with the following command:

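cat << 'EOF' > examplecode.py
import tensorflow as tf
from tensorflow.python.client import device_lib

# GPUs visible to TensorFlow
print(tf.config.list_physical_devices('GPU'))
print()
# name of the default GPU device (empty string if none is found)
print(tf.test.gpu_device_name())
print()
# whether this TensorFlow build includes CUDA support
print(tf.test.is_built_with_cuda())
print()
# all local devices (CPU and GPU)
print(device_lib.list_local_devices())
EOF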

Then, similarly create the submission script:

cat << EOF > tfsubmission.sh
#!/bin/bash
#SBATCH --time=00:10:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=4GB
#SBATCH --gres gpu:1
module load gpu
module load mamba
source activate venv-tf
srun python examplecode.py
EOF

You can check the contents of these files with cat examplecode.py and cat tfsubmission.sh.
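
Beyond listing devices, a short computation can confirm that the GPU is actually usable. The following is a minimal sketch and not part of the original example: the helper names `pick_device` and `run_small_matmul` and the matrix size are illustrative assumptions. The script falls back to the CPU when no GPU is listed, and it returns without error if TensorFlow is not installed.

```python
def pick_device(gpus):
    """Return a TensorFlow device string: the first GPU if any are listed, else the CPU."""
    return "/GPU:0" if gpus else "/CPU:0"


def run_small_matmul(size=256):
    """Multiply two random matrices on the chosen device and report where it ran."""
    try:
        import tensorflow as tf
    except ImportError:
        # Assumption: outside the venv-tf environment TensorFlow may be missing.
        print("TensorFlow is not installed in this environment")
        return None

    gpus = tf.config.list_physical_devices("GPU")
    device = pick_device(gpus)
    with tf.device(device):
        a = tf.random.uniform((size, size))
        b = tf.random.uniform((size, size))
        product = tf.matmul(a, b)
    # The tensor's device attribute names the device that holds the result.
    print("matmul ran on:", product.device)
    return product.device


if __name__ == "__main__":
    run_small_matmul()
```

Saved under a file name of your choice, this could be run in place of examplecode.py in the submission script.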

Note

⚠️ Please observe that the --gres gpu:1 flag is included in this batch submission script. Slurm will reject any jobs submitted to the GPU nodes without this flag.
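
Because a missing GPU request causes the job to be rejected, it can be convenient to check a script before submitting it. The sketch below is a hypothetical helper, not a ScienceCluster or Slurm tool: `requests_gpu` only scans for an `#SBATCH --gres` line that names `gpu`, written either with a space or with an equals sign.

```python
import re

# Matches "#SBATCH --gres gpu:1" as well as "#SBATCH --gres=gpu:1".
GRES_GPU_PATTERN = re.compile(r"^#SBATCH[ \t]+--gres[= \t]+gpu\b", re.MULTILINE)


def requests_gpu(script_text):
    """Return True if the batch script contains an #SBATCH --gres directive for a GPU."""
    return GRES_GPU_PATTERN.search(script_text) is not None


if __name__ == "__main__":
    example = "#!/bin/bash\n#SBATCH --time=00:10:00\n#SBATCH --gres gpu:1\n"
    print("GPU requested:", requests_gpu(example))
```

This is a text check only; Slurm itself remains the authority on whether a job is accepted.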

Submitting the job

To submit this script for processing (once the modules have been loaded and the Conda environment has been created), run:

sbatch tfsubmission.sh

When submitted, the console should print a message similar to

Submitted batch job <jobid>

where <jobid> is the numeric job ID assigned by the Slurm batch system.
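
If you drive submissions from a script, the job ID can be extracted from that message. This is a minimal sketch that assumes the message has exactly the form shown above; `parse_job_id` and `submit` are hypothetical helper names, and `submit` only works on a machine where the `sbatch` command is available.

```python
import re
import subprocess


def parse_job_id(sbatch_output):
    """Extract the numeric job ID from a 'Submitted batch job <jobid>' message."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_output)
    if match is None:
        raise ValueError(f"no job ID found in: {sbatch_output!r}")
    return int(match.group(1))


def submit(script_path):
    """Run sbatch on the given script and return the job ID (requires Slurm)."""
    result = subprocess.run(
        ["sbatch", script_path], capture_output=True, text=True, check=True
    )
    return parse_job_id(result.stdout)


if __name__ == "__main__":
    # Parsing a sample message; no job is submitted here.
    print(parse_job_id("Submitted batch job 123456"))
```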

Understanding job outputs

When the job runs to completion (provided your code does not produce any errors), any files written by your script should be in their designated locations. In addition, a file named slurm-<jobid>.out should exist in the directory from which you submitted the script, unless you specified otherwise. This file contains the printed output from your job.
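
The default naming rule described above can be sketched in a few lines. The helper names `slurm_output_path` and `read_job_output` are assumptions for illustration, and the code applies only when the output location was not changed in the submission script.

```python
from pathlib import Path


def slurm_output_path(job_id, submit_dir="."):
    """Build the default output file path, slurm-<jobid>.out, in the submission directory."""
    return Path(submit_dir) / f"slurm-{job_id}.out"


def read_job_output(job_id, submit_dir="."):
    """Return the printed output of a job, or None if the file does not exist (yet)."""
    path = slurm_output_path(job_id, submit_dir)
    if not path.exists():
        return None
    return path.read_text()


if __name__ == "__main__":
    # 123456 is a placeholder job ID, not a real job.
    output = read_job_output(123456)
    print(output if output is not None else "output file not found")
```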


Last update: January 11, 2023