Our cluster provides the following hardware resources.
If you do not load any hardware modules, your job will be placed on the first available node, which may have either an AMD or an Intel CPU. If you need a specific CPU type, load the corresponding module, e.g.
module load amd
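As a sketch, assuming a Slurm-style batch environment (the sbatch directives and job-script layout below are assumptions; this page only documents the module commands), a minimal job requesting an AMD node might look like:

```shell
#!/bin/bash
# Hypothetical Slurm-style job script; adapt to your site's scheduler.
#SBATCH --job-name=amd-test
#SBATCH --ntasks=1

# Pin the job to an AMD CPU node via the hardware module.
module load amd

# Verify which CPU model the job actually landed on.
lscpu | grep "Model name"
```

Without the `module load amd` line, the same script would run on whichever node became free first, AMD or Intel.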
Some nodes have Infiniband connections, which makes them ideal for running multi-node jobs. To request such nodes, load the Infiniband module, e.g.
module load infiniband
Specific CPU features, such as the avx512 instruction set, can also be selected via modules.
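Hardware modules can be combined. For example (the avx512 module name below is inferred from the feature name above and may differ on your system; check module avail):

```shell
# Request nodes with an Infiniband interconnect for a multi-node job.
module load infiniband

# Additionally restrict the job to nodes supporting a CPU feature;
# the exact module name (e.g. avx512) is an assumption here.
module load avx512
```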
There are three types of GPUs available: NVIDIA A100, NVIDIA V100, and NVIDIA T4. The A100 and V100 nodes have NVLink, a high-bandwidth interconnect between GPUs, which makes them suitable for multi-GPU workloads. They are also suitable for multi-node multi-GPU jobs, as each node is equipped with four Infiniband network interfaces, one connection per two GPUs. T4 nodes are meant for single-GPU jobs only.
For regular single-GPU jobs, where you are not concerned about the GPU type, you can load the gpu module:
module load gpu
Your job will be scheduled on the first available GPU node. If you need a specific GPU type, request it by loading the corresponding hardware module, e.g.
module load a100
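For instance, inside the job you can confirm which GPU was actually allocated (nvidia-smi ships with the NVIDIA driver; the module names come from the text above):

```shell
# Any free GPU node:
module load gpu

# ...or a specific GPU type, e.g. an NVIDIA A100:
module load a100

# Report the allocated GPU's model and memory.
nvidia-smi --query-gpu=name,memory.total --format=csv
```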
V100 nodes come in two flavours, with either 16 GB or 32 GB of onboard GPU memory. A100 nodes have 80 GB. If your job requires a lot of GPU memory, you may want to request an A100 specifically.
For multi-GPU and multi-node multi-GPU jobs, request either V100 or A100 nodes specifically. Alternatively, load the multigpu module:
module load multigpu
When you load any of the GPU modules, you gain access to additional hardware-specific modules, e.g. cuda, cudnn, and OpenMPI (built with GPU and Infiniband support).
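Putting this together, a multi-node multi-GPU job might load modules in this order. This is a sketch under assumptions: the Slurm-style directives, the openmpi module name, and my_app are all illustrative, not confirmed by this page.

```shell
#!/bin/bash
# Hypothetical multi-node multi-GPU job script (Slurm-style directives assumed).
#SBATCH --nodes=2

# Request A100 nodes; this also exposes the hardware-specific modules.
module load a100

# GPU-specific software stack made available by the GPU module.
module load cuda
module load cudnn
module load openmpi   # built with GPU and Infiniband support

# Launch the MPI program across the allocated nodes
# (my_app is a placeholder for your own executable).
mpirun ./my_app
```

Note that the cuda, cudnn, and openmpi modules only become visible after a GPU module (gpu, a100, v100, or multigpu) has been loaded.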
[Hardware table: CPUs per Node | CPU Mem per Node (GB) | GPUs per Node | GPU Mem (GB)]