
Quickstart

This section describes the bare minimum needed to start submitting jobs to the SLURM Linux HPC resource. For access permission to Linux HPC, please see the Access section.

First, log in to hpc-batch over SSH:

ssh hpc-batch.cern.ch
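If the username on your local machine differs from your CERN account name, you can specify it explicitly (the username below is a placeholder):

your_cern_username@hpc-batch.cern.ch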

Before launching a job, you need to select an MPI environment. Running module avail will list the available MPI distributions. To load MVAPICH2 2.3, a stable distribution that works well on this cluster, use the following:

module load mpi/mvapich2/2.3
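If you only want to see the MPI-related modules, you can pass a substring to module avail to narrow the listing (a minimal example; the exact module names may change as the cluster is updated):

module avail mpi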

Submit your job to one of the available partitions. For short jobs (less than 48 hours), you may submit to the short partition; otherwise, please submit to the long partition. You can see the state of the cluster, including the partitions and how many nodes are available in each of them, using sinfo. Running squeue will display the currently running (or queued) jobs for every partition.
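For example, to check the overall cluster state and then restrict the output to a single partition (the partition name here is taken from the examples further down and may differ on your cluster):

sinfo
sinfo -p inf-short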

When submitting a job, as a bare minimum you are required to specify:

  • The partition (-p argument)
  • The maximum runtime (walltime) for your job (-t argument)
  • The number of tasks (-n argument)

For instance, to submit a 64-task job to the inf-short partition with a time limit of one hour, you may use the following:

srun -p inf-short -t 1:00:00 -n 64 ./mpi_program parameters

For more information regarding srun parameters, please refer to the srun SLURM documentation.
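As an illustration only (the node and task counts are arbitrary and the program name is a placeholder), the same 64 tasks could be distributed over a fixed number of nodes using the standard -N and --ntasks-per-node options:

srun -p inf-short -t 1:00:00 -N 4 --ntasks-per-node=16 ./mpi_program parameters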

The main limitation of srun is that it blocks your terminal until the job has finished running. For more traditional batch submission, you would use sbatch instead of srun and put the parameters into a batch submission file. This is the recommended way of working, although srun can be useful for quick test runs. The equivalent way of launching the above command using sbatch is as follows. Imagine we have the following batch submission file, called myjob:

#!/usr/bin/bash
#SBATCH -p inf-short
#SBATCH -t 1:00:00
#SBATCH -n 64

srun ./mpi_program parameters

And you would submit this job as follows:

sbatch myjob
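A slightly fuller submission file might also name the job and redirect its output and error streams; the directives below are standard sbatch options, but the job and file names are only placeholders:

#!/usr/bin/bash
#SBATCH -p inf-short
#SBATCH -t 1:00:00
#SBATCH -n 64
#SBATCH -J mpi_test           # job name shown in squeue
#SBATCH -o mpi_test_%j.out    # standard output, %j expands to the job ID
#SBATCH -e mpi_test_%j.err    # standard error

srun ./mpi_program parameters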

When you run sbatch, SLURM will immediately queue the job into inf-short, print the newly created job ID, and return you to the shell. At this point it is possible to check the state of your submitted jobs using:

squeue -u $USER

Or a specific job, say JobID 100, using:

squeue -j 100
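squeue also accepts filtering options; for instance, the following standard options restrict the output to a single partition or to your own currently running jobs (the partition name again matches the examples above):

squeue -p inf-short
squeue -u $USER -t RUNNING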

You may also cancel a job at any time using scancel. To cancel a job, just append the (comma-separated list of) job ID(s) to scancel:

scancel 100
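To cancel all of your own queued and running jobs in one go, scancel also accepts the standard -u option:

scancel -u $USER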

For more information on these SLURM commands, please refer to the official SLURM documentation. You may access it via the man pages from your terminal (e.g. man squeue) or on the web.


Last update: May 2, 2022