This section describes the bare minimum needed to start submitting jobs to the SLURM Linux HPC resource. For access permission to Linux HPC, please see Section Access.
First, log in by connecting over ssh to hpc-batch:
Before launching a job, you need to select an MPI environment. Running module avail will list available MPI distributions.
To load MVAPICH2-2.3, a stable distribution that works well on this cluster, use the following:
module load mpi/mvapich2/2.3
Submit your job to one of the available partitions. For short jobs (under 48h), you may submit to the short partition; otherwise, please submit to the long partition. You can see the state of the cluster, including the partitions and how many nodes are available on each of them, using sinfo.
squeue will display currently running (or queued) jobs for every partition.
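The partition rule above can be sketched as a small shell helper. This is only an illustration: the function name choose_partition is hypothetical, and it assumes the partitions are literally named short and long, which may differ on the actual cluster (compare with the names that sinfo reports).

```shell
#!/bin/bash
# Hypothetical helper: pick a partition from the requested walltime in whole hours.
# Assumes partitions named "short" (under 48h) and "long"; adjust to the real names.
choose_partition() {
    if [ "$1" -lt 48 ]; then
        echo "short"
    else
        echo "long"
    fi
}

choose_partition 1     # a 1h job fits in the short partition
choose_partition 72    # a 72h job has to go to the long partition

# Show partitions and node availability, but only where SLURM is installed.
if command -v sinfo >/dev/null 2>&1; then
    sinfo || echo "sinfo failed" >&2
fi
```

The guard around sinfo simply lets the sketch run on a machine without SLURM; on the login node it can be dropped.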
When submitting a job, as a bare minimum you are required to specify:
- The partition (-p)
- The maximum runtime (walltime) for your job (-t)
- The number of tasks (-n)
For instance, to submit a 64-task job to the inf-short partition with a time limit of 1h, you may use the following:
srun -p inf-short -t 1:00:00 -n 64 ./mpi_program parameters
For more information regarding srun parameters, please refer to the srun SLURM documentation.
The main limitation of srun is that it will block your terminal until the job is finished running. For more traditional batch submission, you would use sbatch instead of srun, and you would put the parameters into a batch submission file. This would be the recommended way of working, although srun can be useful for trying quick runs.
The equivalent way of launching the above command using sbatch would be as follows. Imagine we have the following batch submission file:
#!/bin/bash
#SBATCH -p inf-short
#SBATCH -t 1:00:00
#SBATCH -n 64
# Launch the MPI program on the 64 allocated tasks
srun ./mpi_program parameters
And you would submit this job as follows:
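A minimal sketch of the submission step, assuming the batch file is saved under the hypothetical name job.sh (the actual filename is up to you). Note that sbatch expects the script to start with an interpreter line such as #!/bin/bash. The sketch writes the file, then submits it only if sbatch is available on the current machine.

```shell
#!/bin/bash
# Write the batch submission file.
# "job.sh" is a hypothetical filename chosen for this sketch.
cat > job.sh <<'EOF'
#!/bin/bash
#SBATCH -p inf-short
#SBATCH -t 1:00:00
#SBATCH -n 64
srun ./mpi_program parameters
EOF

# Submit only where sbatch exists (i.e. on the cluster login node).
if command -v sbatch >/dev/null 2>&1; then
    # --parsable makes sbatch print only the job ID
    # (followed by ";cluster" on multi-cluster setups).
    jobid=$(sbatch --parsable job.sh)
    echo "Submitted job ${jobid%%;*}"
else
    echo "sbatch not found; job.sh was written but not submitted" >&2
fi
```

A plain sbatch job.sh works just as well; it reports the new job ID in a line of the form "Submitted batch job <id>". The --parsable variant is only convenient when the ID is needed later in a script.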
At this point, SLURM will immediately queue the job into the inf-short partition, print the newly created job ID, and return us to the shell. We can then check the state of our submitted jobs using:
squeue -u $USER
Or a specific job, say JobID 100, using:
squeue -j 100
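When a script needs to wait for a job, the squeue query above can be wrapped in a polling loop. The following is a sketch under stated assumptions: wait_for_job is a hypothetical helper name, the 30-second default interval is arbitrary, and a job is considered finished as soon as squeue no longer lists it.

```shell
#!/bin/bash
# Hypothetical helper: block until the given job no longer appears in squeue.
# Usage: wait_for_job JOBID [POLL_SECONDS]
wait_for_job() {
    jobid=$1
    interval=${2:-30}
    # -h suppresses the header line, so empty output means the job is not queued.
    # Errors (e.g. an unknown or already purged job ID) are treated the same way.
    while [ -n "$(squeue -h -j "$jobid" 2>/dev/null)" ]; do
        sleep "$interval"
    done
    echo "Job $jobid is no longer in the queue"
}

# Example (on the cluster): wait_for_job 100 60
```

Keep the polling interval generous, since every iteration sends a query to the scheduler.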
You may also cancel a job at any time using scancel. To cancel a job, just append the (comma-separated list of) JobID(s) to scancel.
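As an illustration, the sketch below cancels several jobs given as one comma-separated string. The helper name cancel_jobs is hypothetical; it splits the list and hands each JobID to scancel as a separate argument, a form scancel is known to accept.

```shell
#!/bin/bash
# Hypothetical helper: cancel all jobs in a comma-separated list of JobIDs.
# Usage: cancel_jobs 100,101,102
cancel_jobs() {
    # Turn "100,101,102" into "100 101 102"; the unquoted expansion then
    # passes each JobID to scancel as its own argument.
    scancel $(echo "$1" | tr ',' ' ')
}

# Example (on the cluster): cancel_jobs 100,101
```

For a single job, scancel 100 is all that is needed.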
For more information on these SLURM commands, please refer to the official SLURM documentation. You may access it via the man pages from your terminal (e.g. man squeue) or on the web.