Resources

Linux HPC provides several kinds of HPC resources to its users. These consist mainly of compute queues (i.e. SLURM partitions), some of which provide different hardware capabilities. In addition, there are two separate parallel filesystems, meant as scratch space for application I/O when loading or generating data.

Partitions

There are several partitions available in Linux HPC:

inf-short
inf-long
photon
phodev
muon
mudev

Short partitions are meant for shorter, more interactive runs. We recommend using them mostly for trying out your application and for basic performance or scalability testing. Long partitions are meant for heavier, longer-running jobs, once you are confident that your application works on a larger number of nodes or is stable enough to run for extended periods. This is only a guideline, and you may run on any partition. In particular, if one partition is full and another is free, you may submit to the one with free resources, as long as your job fits its time limit.

Hence inf-short (5 days) is recommended for typical jobs on the inf- partitions, while mudev and phodev are intended for short test and development jobs on the muon and photon partitions, respectively.

Note that nodes within each partition are homogeneous.
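
The guideline above can be sketched as a small shell helper. This is only an illustration: the function name pick_partition and the job-kind labels (test, production, long) are made up for this example and are not part of Linux HPC or SLURM. Only the partition names come from the list above.

```shell
#!/bin/bash
# Illustrative sketch only: map a hardware family and a job kind to one of
# the partition names listed in this document. pick_partition and the kind
# labels "test", "production" and "long" are hypothetical.
pick_partition() {
    local family="$1"   # inf, photon or muon
    local kind="$2"     # test, production or long
    case "$family:$kind" in
        inf:long)                      echo "inf-long" ;;
        inf:test|inf:production)       echo "inf-short" ;;
        photon:test)                   echo "phodev" ;;
        photon:production|photon:long) echo "photon" ;;
        muon:test)                     echo "mudev" ;;
        muon:production|muon:long)     echo "muon" ;;
        *) echo "unknown family/kind: $family:$kind" >&2; return 1 ;;
    esac
}

# Example: partition for a short development run on muon hardware.
pick_partition muon test   # prints mudev
```

The result could then be passed to sbatch through its --partition option, for example sbatch --partition="$(pick_partition muon test)" job.sh, where job.sh is a placeholder for your own job script.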

Photon partition nodes (EL9)

CPU: 2x AMD EPYC 7302 16-Core Processor (total of 32 physical cores, no hyperthreading)
Memory: 512GB DDR4 3200MHz
Network:
- Infiniband 100Gbps
- Mellanox ConnectX-6 HCAs
Storage:
- Hyperconverged CephFS for /hpcscratch (over 10GbE)
MPI:
- OpenMPI-4
- MVAPICH2-2.3
Operating System: RHEL9.3

Inf partition nodes (EL9)

CPU: 2x Intel(R) Xeon(R) CPU E5-2630 v4 (20 physical cores, 40 hyperthreaded)
Memory: 128GB DDR4 2400MHz (8x 16GiB 18ASF2G72PDZ-2G3B1 DIMMs)
Network:
- Infiniband interconnect, Mellanox MT27500 ConnectX-3
- Integrated Intel 10Gbit ethernet for storage interconnect, system services
Storage:
- Hyperconverged CephFS for /hpcscratch (over 10GbE)
- 960GB Intel S3520 SATA3 for local scratch
MPI:
- OpenMPI-4
- MVAPICH2-2.3
Operating System: RHEL9.3

Muon partition nodes (EL9)

CPU: 2x Intel(R) Xeon(R) Gold 6442Y (48 physical cores, 96 hyperthreaded)
Memory: 512GB DDR5 4800MHz (16x 32GB DDR5)
Network:
- Infiniband HDR interconnect, Mellanox ConnectX-6 HCAs
- Integrated Intel 10Gbit ethernet for storage interconnect, system services
Storage:
- Hyperconverged CephFS for /hpcscratch (over 10GbE)
- 1.8 TB NVMe SSD for local scratch
MPI:
- OpenMPI-4
- MVAPICH2-2.3
Operating System: RHEL9.3

The old "hpc-qcd" nodes have been retired.
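
As a rough illustration of how the node descriptions above feed into job sizing, the sketch below maps each node type to its physical core count and computes a total MPI rank count. It assumes one rank per physical core, which is a common starting point but not a site requirement, and both function names are hypothetical.

```shell
#!/bin/bash
# Illustrative sketch: physical cores per node, taken from the node
# descriptions in this document. Function names are hypothetical.
physical_cores_per_node() {
    case "$1" in
        photon) echo 32 ;;   # 2x AMD EPYC 7302, 32 physical cores per node
        inf)    echo 20 ;;   # 2x Intel Xeon E5-2630 v4, 20 physical cores per node
        muon)   echo 48 ;;   # 2x Intel Xeon Gold 6442Y, 48 physical cores per node
        *) echo "unknown node type: $1" >&2; return 1 ;;
    esac
}

# Total MPI ranks for a job, ASSUMING one rank per physical core.
total_ranks() {
    local node_type="$1" nodes="$2" cores
    cores="$(physical_cores_per_node "$node_type")" || return 1
    echo $(( cores * nodes ))
}

# Example: 4 muon nodes.
total_ranks muon 4   # prints 192
```

In a SLURM job script these two numbers would typically correspond to the --nodes and --ntasks-per-node options of sbatch.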

MPI versions

We strongly recommend moving from OpenMPI-3 to OpenMPI-4, as it will provide much better performance. We will only support the latest mvapich2 and OpenMPI-4 versions, so older versions may be removed. If your application requires an older MPI version to work, please get in touch with us.

Note that OpenMPI-3 and OpenMPI-4 are ABI compatible. Similarly, mvapich2/2.2 and mvapich2/2.3 are also ABI compatible. This means that if your application was compiled for one version, it should run without issues on the other ABI-compatible version.

For instance, suppose your application was compiled with mvapich2/2.2. You may still run it on Photon, where only mvapich2/2.3 is installed, provided you run module load mpi/mvapich2/2.3 as part of your job submission script.
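
A minimal sketch of such a job submission script follows. Only the partition name and the module load line come from this page. The file name photon_job.sh, the job name, the resource numbers, the srun launcher line and ./my_app are placeholders chosen for illustration, so adapt them to your own job.

```shell
#!/bin/bash
# Illustrative sketch: write a minimal SLURM job script that loads the
# MVAPICH2 2.3 module before launching the application. The file name,
# job name, resource numbers, launcher line and ./my_app are placeholders,
# not site defaults.
cat > photon_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=abi-example
#SBATCH --partition=photon
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=32
#SBATCH --time=01:00:00

# Load the MPI runtime installed on Photon. A binary built against
# mvapich2/2.2 is expected to run with it, since the two are ABI compatible.
module load mpi/mvapich2/2.3

# Launcher line is an assumption; use the launcher your application needs.
srun ./my_app
EOF

# Check the generated script for shell syntax errors without running it.
bash -n photon_job.sh && echo "photon_job.sh written"
```

The generated file would then be submitted with sbatch photon_job.sh.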

I/O Scratch spaces

I/O scratch spaces for project data are all based on CephFS. The main scratch space is /hpcscratch, which is where user home directories and project directories are located. Parallel programs are expected to perform I/O on this space, as no tokens are required.

The home and scratch space area /hpcscratch is a hyperconverged CephFS cluster. This means that the compute nodes of the inf partitions also act as the storage nodes for /hpcscratch data. This has two immediate consequences.

First, I/O access is faster because the data is closer (especially compared to the "old" /hpcscratch), and the service is more resilient to datacenter network incidents.

Second, if you are measuring the performance of highly CPU-optimized codes running at 100% utilization, you may observe some noise or performance variability. This is because another user's I/O-intensive job may cause a CephFS process co-located with your job to compete for CPU. The same applies to I/O performance. For most users this will not be noticeable, but it may become visible in such measurements.

For QCD users, the legacy /cephfs mount from the earlier QCD setup is also available.

Note that while running applications installed on AFS or EOS is fully supported, writing program outputs directly to AFS or EOS is not supported. Users are expected to transfer result files from the scratch space to EOS.

If your application does local caching or I/O on each worker node, we recommend using a local disk such as /tmp for that I/O, and copying only results and snapshots back to the shared scratch space.
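
One way to follow this pattern is sketched below: run the program inside a node-local working directory under /tmp, then copy the output to the shared scratch space. The helper name run_with_local_scratch and the paths in the usage comment are hypothetical. Treat this as a starting point under those assumptions, not a site-provided tool.

```shell
#!/bin/bash
# Illustrative sketch: run a command inside a node-local working directory
# and copy its output to a shared results directory afterwards.
# run_with_local_scratch is a hypothetical helper, not a site-provided tool.
run_with_local_scratch() {
    local results_dir="$1"; shift
    local workdir status
    # Create a private working directory on the local disk (/tmp by default).
    workdir="$(mktemp -d "${TMPDIR:-/tmp}/localscratch.XXXXXX")" || return 1

    # Run the command in a subshell so the caller's working directory is
    # left untouched.
    ( cd "$workdir" && "$@" )
    status=$?

    # Copy back whatever the command left behind. A real job would usually
    # copy only final results and snapshots, not temporary cache files.
    mkdir -p "$results_dir" && cp -r "$workdir"/. "$results_dir"/

    # Clean up the local disk and report the command's own exit status.
    rm -rf "$workdir"
    return "$status"
}

# Example usage (paths and program name are placeholders). Use an absolute
# path for the program, because it is started from the temporary directory:
#   run_with_local_scratch "$HOME/results/run01" "$HOME/bin/my_app"
```

Because the command runs from the temporary directory, any relative paths it writes to end up on the local disk, and only the final copy touches the shared filesystem.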

IMPORTANT: Please note that while the scratch space provides redundancy to prevent data loss, there are NO BACKUPS.

Last update: October 8, 2024