Python Example Script with TensorFlow
This tutorial demonstrates the basics of how to create a Python environment on ScienceCluster with specific packages of interest, in this case TensorFlow with GPU compute.
Preparing the environment
After connecting to ScienceCluster from a terminal, work through the following steps:
# load the gpu module
module load gpu
# request an interactive session, which allows the package installer to see the GPU hardware
srun --pty -n 1 -c 2 --time=01:00:00 --gres=gpu:1 --mem=8G bash -l
# (optional) confirm that the GPU is visible
nvidia-smi
# use mamba (drop-in replacement for conda)
module load mamba
# create a virtual environment and install packages
mamba create -n venv-tf tensorflow cudatoolkit
# activate the virtual environment
source activate venv-tf
# confirm that the GPU is correctly detected
python -c 'import tensorflow as tf; print("Num GPUs Available:", len(tf.config.list_physical_devices("GPU")));print("TF version:",tf.__version__)'
# when finished with your test, close the interactive cluster job
conda deactivate
exit
You can always use the srun command above to create an interactive shell with GPU hardware.
If you would like to use your TensorFlow environment with Jupyter and ScienceApps, see the documentation on installing the environment as an IPython kernel.
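The one-line GPU check from the steps above can also be kept as a small script, which is easier to rerun and extend. This is a minimal sketch, not part of the original instructions: the function name `gpu_report` is hypothetical, and the TensorFlow import is guarded so that the script still runs (and says so) in an environment where TensorFlow is not installed.

```python
import importlib.util


def gpu_report():
    """Return TensorFlow's version and visible GPU count, or None if TensorFlow is missing."""
    # find_spec checks for the package without raising ImportError
    if importlib.util.find_spec("tensorflow") is None:
        return None
    import tensorflow as tf  # imported lazily so the script still runs without it

    gpus = tf.config.list_physical_devices("GPU")
    return {"tf_version": tf.__version__, "num_gpus": len(gpus)}


if __name__ == "__main__":
    report = gpu_report()
    if report is None:
        print("TensorFlow is not installed in this environment")
    else:
        print("TF version:", report["tf_version"])
        print("Num GPUs Available:", report["num_gpus"])
```

Run it inside the interactive GPU session with the venv-tf environment activated. A GPU count of 0 there suggests that the gpu module was not loaded or that the session was requested without --gres=gpu:1.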
Preparing a job submission script
Once the virtual environment has been created and its packages installed, it can be activated from within the job submission script.
First, create a file called examplecode.py with the following command:
cat << EOF > examplecode.py
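import tensorflow as tf
from tensorflow.python.client import device_lib

# GPUs visible to TensorFlow
print(tf.config.list_physical_devices('GPU'))
print()
# name of the default GPU device (empty string if none is found)
print(tf.test.gpu_device_name())
print()
# whether this TensorFlow build was compiled with CUDA support
print(tf.test.is_built_with_cuda())
print()
# all local devices (CPU and GPU)
print(device_lib.list_local_devices())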
EOF
Then create the submission script in the same way:
cat << EOF > tfsubmission.sh
#!/bin/bash
#SBATCH --time=00:10:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=4GB
#SBATCH --gres gpu:1
module load gpu
module load mamba
source activate venv-tf
srun python examplecode.py
EOF
You can check the contents of these files with cat examplecode.py and cat tfsubmission.sh.
Note
⚠️ Please observe that the --gres gpu:1 flag is included in this batch submission script. Slurm will reject any jobs submitted to the GPU nodes without this flag.
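Because a missing GPU request causes the job to be rejected, it can help to check a submission script before sending it. The following is a minimal sketch under stated assumptions: `requests_gpu` is a hypothetical helper, and it only recognises the `#SBATCH --gres` form used in this tutorial (written with either a space or an equals sign), not other ways of requesting GPUs.

```python
import re

# Matches "#SBATCH --gres gpu:1", "#SBATCH --gres=gpu:2", and similar lines
GRES_PATTERN = re.compile(r"^#SBATCH\s+--gres[=\s]+gpu(:\S+)?\s*$", re.MULTILINE)


def requests_gpu(script_text: str) -> bool:
    """Return True if the script text contains an #SBATCH --gres directive for a GPU."""
    return GRES_PATTERN.search(script_text) is not None


if __name__ == "__main__":
    example = "#!/bin/bash\n#SBATCH --time=00:10:00\n#SBATCH --gres gpu:1\n"
    print(requests_gpu(example))
```

To check the real file, pass its contents to the helper, for example `requests_gpu(open("tfsubmission.sh").read())`.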
Submitting the job
To submit this script for processing (after the modules have been loaded and the Conda environment has been created), run:
sbatch tfsubmission.sh
When submitted, the console should print a message similar to:
Submitted batch job <jobid>
where <jobid> is the numeric job ID assigned by the Slurm batch system.
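If you submit jobs from a script, the job ID can be extracted from that message for later use. This is a small sketch that assumes the message has exactly the "Submitted batch job <jobid>" form shown above; `parse_job_id` is a hypothetical helper name.

```python
import re
from typing import Optional


def parse_job_id(sbatch_output: str) -> Optional[int]:
    """Extract the numeric job ID from sbatch's confirmation message, if present."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_output)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    # Sample message with a made-up job ID, for illustration only
    print(parse_job_id("Submitted batch job 123456"))
```

The text passed to the helper could come from capturing the standard output of sbatch, for example with `subprocess.run(["sbatch", "tfsubmission.sh"], capture_output=True, text=True).stdout`.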
Understanding job outputs
When the job runs to completion (provided your submitted code does not produce any errors), any files written by your script should be in their designated locations, and a file named slurm-<jobid>.out should exist in the directory from which you submitted the script, unless you specified otherwise. This file contains the printed output from your job.
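The naming rule for the output file can be turned into a short helper for locating and reading it. This sketch assumes the default slurm-<jobid>.out name described above (it does not apply if you set a different output file), and the function names are hypothetical.

```python
from pathlib import Path
from typing import Optional


def slurm_output_path(job_id: int, submit_dir: str = ".") -> Path:
    """Build the default output file path for a job submitted from submit_dir."""
    return Path(submit_dir) / f"slurm-{job_id}.out"


def read_job_output(job_id: int, submit_dir: str = ".") -> Optional[str]:
    """Return the job's printed output, or None if the file does not exist yet."""
    path = slurm_output_path(job_id, submit_dir)
    if not path.exists():
        return None
    return path.read_text()


if __name__ == "__main__":
    # Made-up job ID, for illustration only
    print(slurm_output_path(123456))
```

A return value of None from `read_job_output` usually means the job has not started writing output yet, or that it was submitted from a different directory.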