Quick Start Guide

Prerequisites

Log in to a machine that is configured to submit jobs to the HTCondor batch system. For most people, that means logging in to lxplus.cern.ch.

You need a valid Kerberos ticket, as this authenticates you and your job. Run kinit to refresh your ticket as necessary.

Submit file

Here's an example submit file, referred to below as hello.sub:

executable            = hello_world.sh
arguments             = $(ClusterId) $(ProcId)
output                = output/hello.$(ClusterId).$(ProcId).out
error                 = error/hello.$(ClusterId).$(ProcId).err
log                   = log/hello.$(ClusterId).log
queue

Let's go through that line by line.

executable: The script or command you want HTCondor to run.
arguments: Any arguments to pass to the command. We're using interpolated values here. ClusterId will normally be unique to each submit file. There are circumstances in which one submit file can create multiple clusters, but let's ignore that for now. ProcId starts at 0 and is incremented by one for each job within a cluster. In this simple example, which defines a single job, the value of ProcId will be 0.
output: Where the STDOUT of the command or script should be written. This can be a relative or absolute path. HTCondor won't create the directory for you, though, and will report an error if it doesn't exist. Note the use of interpolation again to give each job its own output file.
error: Where the STDERR of the command or script should be written. The same rules apply as for output.
log: Where HTCondor writes its own log for your jobs, not any logging your job itself performs. It shows the submission time, the execution host and times, and, on termination, usage statistics.
queue: This schedules the job. It becomes more important (along with the interpolation) when queue is given an integer value to schedule multiple jobs.
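
Since HTCondor will not create the output, error and log directories, it can help to prepare them together with the script before submitting. The following is a minimal sketch, not part of the original example: the contents of hello_world.sh are hypothetical, and the directory names are simply those used in the submit file above.

```shell
# Create the directories referenced by output, error and log in the submit file.
mkdir -p output error log

# Write a minimal hello_world.sh (hypothetical contents, for illustration only).
cat > hello_world.sh <<'EOF'
#!/bin/bash
# $1 is the ClusterId and $2 is the ProcId, as passed by the "arguments" line.
echo "Hello from job $1.$2"
EOF

# Set the execute bit so the script can also be tested locally before submitting.
chmod +x hello_world.sh
```

Running ./hello_world.sh 70 0 locally should print the same text that would end up in output/hello.70.0.out for job 70.0.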

Submitting the job

On lxplus (or another configured submit host) you simply need to run the condor_submit command:

$ condor_submit hello.sub
Submitting job(s).
1 job(s) submitted to cluster 70.

Note the reference again to "cluster": this output shows the "ClusterId" referred to in the submit file. Normally, you get one cluster per run of condor_submit.
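
If you want to use the cluster id in a script, for example to work out the name of the log file, one option is to parse it from the summary line. This is a rough sketch that assumes the output format shown above; the function name cluster_id_from_submit is hypothetical, and the demonstration feeds in the sample output rather than running a live submission.

```shell
# Extract the cluster id from condor_submit's summary line, read from stdin.
# Assumes the "N job(s) submitted to cluster M." format shown above.
cluster_id_from_submit() {
    sed -n 's/^[0-9][0-9]* job(s) submitted to cluster \([0-9][0-9]*\)\.$/\1/p'
}

# Demonstration using the sample output instead of a real submission.
cluster=$(printf '%s\n' 'Submitting job(s).' '1 job(s) submitted to cluster 70.' | cluster_id_from_submit)

# With the example submit file, this is the log file for the cluster.
echo "log/hello.${cluster}.log"
```

On a submit host, the same function could be used as condor_submit hello.sub | cluster_id_from_submit.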

Monitoring the job

You can see the job and its current state using the condor_q command:

$ condor_q


-- Schedd: bigbird01.cern.ch : <128.142.194.108:9618?...
OWNER   BATCH_NAME       SUBMITTED   DONE   RUN    IDLE  TOTAL JOB_IDS
bejones CMD: hello.sh  10/3  14:08      _      _      1      _ 70.0

1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended

Note the job id "70.0", which shows the cluster id (70) and the process id (0). We only have one job in this submit file, so we only have process id 0.
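
To see how the job ids and file names would look with more than one job, the sketch below mimics the interpolation in plain shell. It does not call HTCondor; it assumes a hypothetical variant of the submit file ending in "queue 3" and reuses cluster id 70 from the example.

```shell
# Hypothetical values: cluster 70 as in the example, and "queue 3" in the submit file.
cluster_id=70
n_jobs=3

# ProcId runs from 0 to n_jobs - 1. Each job gets its own output and error file,
# while all jobs in the cluster share a single log file.
proc_id=0
while [ "$proc_id" -lt "$n_jobs" ]; do
    echo "job ${cluster_id}.${proc_id}: output/hello.${cluster_id}.${proc_id}.out error/hello.${cluster_id}.${proc_id}.err"
    proc_id=$((proc_id + 1))
done
echo "shared log: log/hello.${cluster_id}.log"
```

The job ids in this case would be 70.0, 70.1 and 70.2.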

Rather than monitoring the job using repeated invocations of condor_q, you can use condor_wait:

$ condor_wait -status log/hello.70.log
70.0.0 submitted
70.0.0 executing on host <188.185.180.233:9618?addrs=188.185.180.233-9618+[--1]-9618&noUDP&sock=23729_b2e3_13>
70.0.0 completed
All jobs done.

Last update: November 26, 2019