Quick Start Guide
Log in to a machine that is configured to submit jobs to the HTCondor batch system. For most people, that means logging in to lxplus.cern.ch.
Make sure you have valid Kerberos tickets, as these authenticate you and your job. Run kinit to refresh your tickets as necessary.
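The ticket check can be scripted. The following is a minimal sketch, assuming a klist that supports the -s (silent) flag, as MIT Kerberos does; the function name ensure_kerberos_ticket is made up for illustration and is not part of any CERN tooling.

```shell
# Hypothetical helper: check for a valid Kerberos ticket, and run kinit if none is found.
# Assumes "klist -s" prints nothing and exits non-zero when the credentials
# cache is missing or expired (MIT Kerberos behaviour).
ensure_kerberos_ticket() {
    if klist -s; then
        echo "Kerberos ticket is valid"
    else
        echo "No valid Kerberos ticket; running kinit"
        kinit
    fi
}
```

Calling ensure_kerberos_ticket before submitting would prompt for a password only when no valid ticket is found.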
Here's an example submit file:
    executable = hello_world.sh
    arguments  = $(ClusterId) $(ProcId)
    output     = output/hello.$(ClusterId).$(ProcId).out
    error      = error/hello.$(ClusterId).$(ProcId).err
    log        = log/hello.$(ClusterId).log
    queue
Let's go through that line by line.
- executable: The script or command you want HTCondor to run.
- arguments: Any arguments to pass to the command. We're using interpolated values here. ClusterId will normally be unique to each submit file. There are circumstances in which one submit file can create multiple clusters, but let's ignore that for now. ProcId is incremented by one for each job in each cluster. In this simple example, which defines a single job, the value of ProcId will be 0.
- output: Where the STDOUT of the command or script should be written. This can be a relative or absolute path. HTCondor won't create the directory for you, though, and will report an error if it doesn't exist. Note the use of interpolation again to give each job its own output file.
- error: Where the STDERR of the command or script should be written. The same rules apply as for output.
- log: This is the output of HTCondor's logs for your jobs, not any logging your job itself will perform. It will show the submission times, execution host and times, and on termination will show stats.
- queue: This schedules the job. It becomes more important (along with the interpolation) when it is given an integer value to schedule multiple jobs, for example queue 3.
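Because HTCondor will not create the output, error and log directories, it helps to prepare the working directory before submitting. The sketch below does that and writes a placeholder hello_world.sh. The script contents are an assumption made for illustration; the guide does not show the real script.

```shell
# Create the directories named in the submit file; HTCondor will not create them.
mkdir -p output error log

# Write a placeholder hello_world.sh (hypothetical contents, not from the guide).
cat > hello_world.sh <<'EOF'
#!/bin/bash
# $1 receives $(ClusterId) and $2 receives $(ProcId), as set by "arguments".
echo "Hello from cluster $1, process $2"
EOF

# Mark the script executable so it can also be run by hand as a local check.
chmod +x hello_world.sh
```

Running ./hello_world.sh 70 0 locally is a quick way to confirm the script behaves as expected before handing it to the batch system.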
Submitting the job
On lxplus (or another configured submit host) you simply need to run the condor_submit command:
    $ condor_submit hello.sub
    Submitting job(s).
    1 job(s) submitted to cluster 70.
Note the reference to "cluster" again: this output shows the ClusterId referred to in the submit file. Normally, you get one cluster per run of condor_submit.
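When submission is driven from a script, the cluster id is useful for later steps. The sketch below extracts it from the summary line printed by condor_submit. It assumes the line has exactly the "submitted to cluster N." form shown here, and the function name is hypothetical.

```shell
# Read condor_submit output on stdin and print the cluster id.
# Assumes a summary line of the form "... submitted to cluster N."
cluster_id_from_submit_output() {
    sed -n 's/.*submitted to cluster \([0-9][0-9]*\)\..*/\1/p'
}

# Demonstration using the sample output from this guide.
sample_output='Submitting job(s).
1 job(s) submitted to cluster 70.'
printf '%s\n' "$sample_output" | cluster_id_from_submit_output
```

On a submit host the same function could be fed directly, for example cluster_id=$(condor_submit hello.sub | cluster_id_from_submit_output).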
Monitoring the job
You can see the job and its current state using the condor_q command:
    $ condor_q

    -- Schedd: bigbird01.cern.ch : <220.127.116.11:9618?...
    OWNER    BATCH_NAME     SUBMITTED    DONE  RUN  IDLE  TOTAL  JOB_IDS
    bejones  CMD: hello.sh  10/3 14:08   _     _    1     _      70.0

    1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended
Note the job id "70.0", which combines the cluster id (70) and the process id (0). This submit file defines only one job, so the only process id is 0.
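Since the submit file builds its file names from ClusterId and ProcId, a job id is enough to locate that job's files. This is a minimal sketch, assuming the path patterns from the example submit file and a two-part job id such as 70.0; job_files is a hypothetical helper name.

```shell
# Print the files belonging to one job, given its id as "ClusterId.ProcId".
# The path patterns are copied from the example submit file.
job_files() {
    local job_id=$1
    local cluster=${job_id%%.*}   # text before the first dot
    local proc=${job_id#*.}       # text after the first dot
    echo "output/hello.${cluster}.${proc}.out"
    echo "error/hello.${cluster}.${proc}.err"
    echo "log/hello.${cluster}.log"
}

job_files 70.0
```

The log path uses only the cluster id, so every job in a cluster shares one log file, while each job gets its own output and error files.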
Rather than monitoring the job using repeated invocations of condor_q, you can use condor_wait:
    $ condor_wait -status log/hello.70.log
    70.0.0 submitted
    70.0.0 executing on host <18.104.22.168:9618?addrs=22.214.171.124-9618+[--1]-9618&noUDP&sock=23729_b2e3_13>
    70.0.0 completed
    All jobs done.