Our cluster provides the following hardware resources.
If you do not load any hardware modules, your job will be placed on the first available node, which may have either an AMD or an Intel CPU. If you need a specific CPU type, load the corresponding module, e.g.
module load amd
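As a sketch, assuming a Slurm-style batch environment (the sbatch directives and job-script layout below are assumptions; this page only documents the module commands), a minimal job requesting an AMD node might look like:

```shell
#!/bin/bash
# Hypothetical Slurm-style job script; adapt to your site's scheduler.
#SBATCH --job-name=amd-test
#SBATCH --ntasks=1

# Pin the job to an AMD CPU node via the hardware module.
module load amd

# Verify which CPU model the job actually landed on.
lscpu | grep "Model name"
```

Without the `module load amd` line, the same script would run on whichever node became free first, AMD or Intel.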
Some nodes have Infiniband connections, which makes them ideal for running multi-node jobs. To request such nodes, load the Infiniband module, e.g.
module load infiniband
Specific CPU features, such as the avx512 instruction set, can also be selected via modules.
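Hardware modules can be combined. For example (the avx512 module name below is inferred from the feature name above and may differ on your system; check module avail):

```shell
# Request nodes with an Infiniband interconnect for a multi-node job.
module load infiniband

# Additionally restrict the job to nodes supporting a CPU feature;
# the exact module name (e.g. avx512) is an assumption here.
module load avx512
```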
There are three types of GPUs available: NVIDIA A100, NVIDIA V100, and NVIDIA T4. The A100 and V100 nodes have NVLink, a high-bandwidth interconnect between GPUs, which makes them suitable for multi-GPU workloads. They are also suitable for multi-node multi-GPU jobs, as each node is equipped with four Infiniband network interfaces, one connection per two GPUs. T4 nodes are meant for single-GPU jobs only.
For regular single-GPU jobs, where you are not concerned about the GPU type, you can load the gpu module:
module load gpu
Your job will be scheduled on the first available GPU node. If you need a specific GPU type, request it by loading the corresponding hardware module, e.g.
module load a100
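For instance, inside the job you can confirm which GPU was actually allocated (nvidia-smi ships with the NVIDIA driver; the module names come from the text above):

```shell
# Any free GPU node:
module load gpu

# ...or a specific GPU type, e.g. an NVIDIA A100:
module load a100

# Report the allocated GPU's model and memory.
nvidia-smi --query-gpu=name,memory.total --format=csv
```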
V100 nodes come in two flavours, with either 16 GB or 32 GB of onboard GPU memory. A100 nodes have 80 GB. If your job requires a lot of GPU memory, you may want to request an A100 specifically.
For multi-GPU and multi-node multi-GPU jobs, request either V100 or A100 nodes specifically. Alternatively, load the multigpu module:
module load multigpu
When you load any of the GPU modules, you gain access to additional hardware-specific modules, e.g. cuda, cudnn, and OpenMPI (built with GPU and Infiniband support).
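Putting this together, a multi-node multi-GPU job might load modules in this order. This is a sketch under assumptions: the Slurm-style directives, the openmpi module name, and my_app are all illustrative, not confirmed by this page.

```shell
#!/bin/bash
# Hypothetical multi-node multi-GPU job script (Slurm-style directives assumed).
#SBATCH --nodes=2

# Request A100 nodes; this also exposes the hardware-specific modules.
module load a100

# GPU-specific software stack made available by the GPU module.
module load cuda
module load cudnn
module load openmpi   # built with GPU and Infiniband support

# Launch the MPI program across the allocated nodes
# (my_app is a placeholder for your own executable).
mpirun ./my_app
```

Note that the cuda, cudnn, and openmpi modules only become visible after a GPU module (gpu, a100, v100, or multigpu) has been loaded.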
[Hardware table: CPUs per Node | CPU Mem per Node (GB) | GPUs per Node | GPU Mem (GB)]