GPUs on ScienceCloud
It is possible to launch an instance with GPU support.
To make use of a GPU device on your ScienceCloud instance, you need to select one of the "GPU enabled" flavors in the launch wizard.
"GPU enabled" flavors behave differently from the "normal" ones: please read the section "GPU instances specific caveats" before launching one.
GPU models available on the cloud
There are currently two GPU models that you can choose from:
- NVIDIA Tesla P4 with 8 GB of onboard RAM
- NVIDIA Tesla T4 with 16 GB of onboard RAM
You can find more information, as well as the respective datasheets, on the NVIDIA website.
GPU enabled flavors
The "GPU enabled" flavors are public, i.e. any user can launch an instance with a GPU attached without the need for further interaction with S3IT support.
The naming of the flavors follows the usual scheme with the addition of the suffix "-gpu" followed by the model of the GPU device:
- gpuP4 for the NVIDIA Tesla P4
- gpuT4 for the NVIDIA Tesla T4
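The naming scheme above can be checked programmatically. The following is a minimal sketch, assuming that the GPU suffix is always the last dash-separated part of the flavor name, as in "32cpu-128ram-hpcv3-gpuT4". The function name `gpu_model` is hypothetical, and the non-GPU flavor name in the usage note is a placeholder, not a flavor confirmed by this page.

```python
from typing import Optional


def gpu_model(flavor_name: str) -> Optional[str]:
    """Return the GPU model encoded in a flavor name, or None.

    Assumption: the GPU suffix ("gpuP4", "gpuT4") is the last
    dash-separated component of the flavor name.
    """
    last_part = flavor_name.rsplit("-", 1)[-1]
    if last_part.startswith("gpu") and len(last_part) > len("gpu"):
        return last_part[len("gpu"):]
    return None
```

For example, `gpu_model("32cpu-128ram-hpcv3-gpuT4")` returns `"T4"`, while a name without the suffix, such as the placeholder `"8cpu-32ram-hpcv3"`, returns `None`.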
Available GPU flavors
- CPU: Xeon Gold 6126; GPU: 1x NVIDIA Tesla P4
- 32cpu-128ram-hpcv3-gpuT4 (on request). CPU: AMD EPYC 7702; GPU: 1x NVIDIA Tesla T4
There is an additional cost for using a GPU enabled flavor as outlined in the "Service Description: ScienceCloud" document, reachable from the S3IT terms and conditions page (UZH shortname login required).
The T4 and P4 models have the same cost.
You can estimate the total cost by adding the GPU cost specified in the pricing document to the price of the "regular" flavor.
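The arithmetic is a simple sum. The sketch below illustrates it under the assumption that both rates are expressed per the same billing unit. The function name `estimate_total_cost` and all numbers in the usage note are hypothetical placeholders; the real rates are in the pricing document.

```python
def estimate_total_cost(regular_flavor_rate: float,
                        gpu_rate: float,
                        billing_units: float = 1.0) -> float:
    """Estimate the cost of a GPU enabled flavor.

    Assumption: both rates use the same billing unit (taken from the
    pricing document), and the GPU rate is the same for T4 and P4.
    """
    return (regular_flavor_rate + gpu_rate) * billing_units
```

With made-up rates of 2.0 for the regular flavor and 1.5 for the GPU, `estimate_total_cost(2.0, 1.5, billing_units=10)` gives 35.0.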
Images with NVIDIA Driver and CUDA preinstalled
S3IT provides public images intended for use with NVIDIA GPUs.
They come with the NVIDIA driver and CUDA preinstalled and ready to use.
You can find the latest image version by searching for a public image whose name starts with "***CUDA" (for example "***CUDA 10.2 on Ubuntu 18.04 (2020-03-27)").
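If you already have a list of public image names (for example copied from the dashboard), the selection can be sketched as follows. This assumes every "CUDA" image name ends with a release date in parentheses, as in the example above. The function name `latest_cuda_image` is hypothetical.

```python
import re
from datetime import date
from typing import Iterable, Optional

# Matches a trailing "(YYYY-MM-DD)" as seen in the example image name.
_DATE_SUFFIX = re.compile(r"\((\d{4})-(\d{2})-(\d{2})\)\s*$")


def latest_cuda_image(names: Iterable[str],
                      prefix: str = "***CUDA") -> Optional[str]:
    """Return the most recently dated image name starting with `prefix`."""
    dated = []
    for name in names:
        match = _DATE_SUFFIX.search(name)
        if name.startswith(prefix) and match:
            year, month, day = map(int, match.groups())
            dated.append((date(year, month, day), name))
    # Tuples sort by date first, so max() picks the newest image.
    return max(dated)[1] if dated else None
```

Names that lack the prefix or the trailing date are ignored, and the function returns `None` when nothing matches.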
We regularly update the "CUDA" images with the latest packages and retire the oldest ones, as we do for the other public images.
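Once an instance is running from one of these images, you may want to confirm that the GPU is visible. The sketch below is one way to do that from Python, assuming the `nvidia-smi` tool shipped with the NVIDIA driver is on the `PATH`. The function names are hypothetical, and the exact memory figure reported depends on the driver.

```python
import shutil
import subprocess
from typing import List, Tuple

# Ask nvidia-smi for one CSV line per GPU: "<name>, <total memory>".
QUERY = ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"]


def parse_gpu_list(csv_text: str) -> List[Tuple[str, str]]:
    """Parse nvidia-smi CSV output into (name, memory) pairs."""
    gpus = []
    for line in csv_text.splitlines():
        if "," not in line:
            continue  # skip blank or unexpected lines
        name, memory = (field.strip() for field in line.split(",", 1))
        gpus.append((name, memory))
    return gpus


def visible_gpus() -> List[Tuple[str, str]]:
    """Return the GPUs reported by nvidia-smi, or an empty list."""
    if shutil.which("nvidia-smi") is None:
        return []  # driver tools not installed on this machine
    result = subprocess.run(QUERY, capture_output=True, text=True, check=False)
    return parse_gpu_list(result.stdout) if result.returncode == 0 else []
```

An empty list from `visible_gpus()` suggests that either the image has no driver or the instance was not launched with a "GPU enabled" flavor.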
GPU instances specific caveats
Instances with GPU support behave differently from the regular ones. The principal differences are:
Pause, Suspend, Shelve and Resize actions are not supported with GPU flavors.
These actions are not guaranteed to work.
Triggering one of them on a GPU enabled instance might result in an unrecoverable Error state and should not be attempted.
If you resize an instance with no GPU support to a GPU enabled flavor, the resulting instance will lack GPU support even if running on a GPU enabled flavor.
If you need to adjust the size of your instance, or add GPU support to an existing instance, you need to shut down the instance, take a snapshot, and launch a new instance using the snapshot as the boot source.
Once you have checked everything works as expected you can then delete the old instance.
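The rules above can be encoded as a small pre-flight check in your own automation scripts. This is only a sketch: the function names are hypothetical, it relies on the "-gpu" naming suffix described earlier to detect GPU flavors, and the non-GPU flavor name in the usage note is a placeholder.

```python
from typing import Optional

# Actions this page lists as unsupported on GPU enabled flavors.
UNSUPPORTED_ON_GPU = frozenset({"pause", "suspend", "shelve", "resize"})


def has_gpu(flavor_name: str) -> bool:
    """Assumption: GPU flavors end in a "-gpu<model>" component."""
    return flavor_name.rsplit("-", 1)[-1].startswith("gpu")


def unsafe_reason(action: str,
                  current_flavor: str,
                  target_flavor: Optional[str] = None) -> Optional[str]:
    """Return why an action should be avoided, or None if no rule applies."""
    action = action.lower()
    if has_gpu(current_flavor) and action in UNSUPPORTED_ON_GPU:
        return f"'{action}' is not supported on GPU enabled flavors"
    if action == "resize" and target_flavor and has_gpu(target_flavor):
        return ("resizing to a GPU enabled flavor does not add GPU support; "
                "shut down, snapshot, and launch a new instance instead")
    return None
```

For example, `unsafe_reason("suspend", "32cpu-128ram-hpcv3-gpuT4")` returns a warning string, while `unsafe_reason("pause", "8cpu-32ram-hpcv3")` returns `None`.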
Migration and live-migration actions are not possible. This means that it is not possible for us to move your instance from one physical server to another when we need to perform maintenance on the underlying hardware.
When system maintenance needs to be performed on these servers, the instance must be shut down in the best case, or deleted in the worst case.
You are thus strongly advised to take regular backups of your work on a GPU enabled instance.
In total, 80 T4 devices and 4 P4 devices are available, shared across all GPU enabled flavors regardless of their CPU/RAM size.
You can check the current availability of GPU enabled flavors in the ScienceCloud availability report.
Feedback is welcome and needed
Any kind of feedback regarding GPUs on ScienceCloud is very welcome. This service is new and we are looking for ways to improve it: your feedback will enable us both to optimize our service and to better meet your needs.