This section describes the bare minimum needed to start submitting jobs to the SLURM Linux HPC resource. For access permission to the Linux HPC, please see the Access section.
First, log in by connecting over SSH to hpc-batch:
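For example (username here is a placeholder; substitute your own cluster account name):

```shell
# 'username' is a placeholder -- replace it with your own account name.
ssh username@hpc-batch
```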
Before launching a job, you need to select an MPI environment. Running module avail will list the available MPI distributions.
To load MVAPICH2 2.3, a stable distribution that works well on this cluster, use the following:
module load mpi/mvapich2/2.3
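To confirm the module was loaded, you can inspect your environment (module list is a standard Environment Modules command):

```shell
# Show all currently loaded modules; mpi/mvapich2/2.3 should appear in the list.
module list
```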
Submit your job to one of the available partitions. For short jobs (less than 48 hours of runtime), you may submit to the short partition; otherwise, please submit to the long partition. You can see the state of the cluster, including the partitions and how many nodes are available in each, using:
sinfo
squeue will display currently running (or queued) jobs for every partition.
When submitting a job, you are required to specify, at a bare minimum:
* The partition (-p)
* The maximum runtime (walltime) for your job (-t)
* The number of tasks (-n)
For instance, to submit a 64-task job to the batch-short partition with a time limit of one hour, you may use the following:
srun -p batch-short -t 1:00:00 -n 64 ./mpi_program parameters
For more information regarding srun parameters, please refer to the srun SLURM documentation.
The main limitation of srun is that it blocks your terminal until the job has finished running. For more traditional batch submission, use sbatch instead of srun, and put the parameters into a batch submission file. This is the recommended way of working, although srun can be useful for quick test runs.
The equivalent way of launching the above command using sbatch is as follows. Imagine we have the following batch submission file:
#!/usr/bin/bash
#SBATCH -p batch-short
#SBATCH -t 1:00:00
#SBATCH -n 64
srun ./mpi_program parameters
And you would submit this job as follows:
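Assuming the batch submission file above was saved as job.sh (an example name, not prescribed by the cluster):

```shell
# Submit the batch script; sbatch prints the new job ID and returns immediately.
sbatch job.sh
```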
At this moment, SLURM will immediately queue the job into batch-short, print the newly created job ID, and return us to the shell. We can then check the state of our submitted jobs using:
squeue -u $USER
Or a specific job, say JobID 100, using:
squeue -j 100
You may also cancel a job at any time using scancel. To cancel a job, just append the JobID (or a comma-separated list of JobIDs) to the scancel command, for instance:
scancel 100
For more information on these SLURM commands, please refer to the official SLURM documentation, which you may access via the man pages from your terminal (e.g. man squeue) or on the web.