Exercise 1a: Job Submission
The aim of this exercise is to submit a simple job. To do so, it is important to understand the submit description file, which describes the requirements and characteristics of the job. It is here that aspects of job submission are controlled, such as placing restrictions on machine characteristics or specifying the number of times that an executable should be run.
The basic commands in a simple submit file are:
- executable: The fully qualified name of the executable to be run.
- arguments: Any arguments that are to be passed to the executable.
- output: Where the STDOUT of the executable is written. This can be a relative or absolute path. HTCondor will not create the directory, so an error will occur if the specified directory does not exist.
- error: Where the STDERR of the executable is written. The same rules apply as for output.
- log: Where HTCondor writes logging information about the job lifecycle (not the job's own output). It records the submission time, the execution machine and times, and, on termination, some statistics.
- queue: This command submits the job.
The commands and attributes are case insensitive, hence OUTPUT and output, or Queue and queue, are equivalent. Comments can be added to the submit file with #. Interpolated values can be used in the submit file; two useful ones are ClusterId and ProcId. ClusterId is unique to each submission, while ProcId is incremented by one for each instance of the executable in that submission. When submitting a single job, the value of ProcId is 0.
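To make the interpolation concrete, the hypothetical fragment below (file names are illustrative) queues three instances of one executable. ProcId takes the values 0, 1, and 2, so each instance writes to its own output and error files while all three share one log:

```
# Hypothetical fragment: "queue 3" submits three instances in one cluster.
# ProcId expands to 0, 1 and 2, giving each instance distinct files.
executable = welcome.sh
output     = output/welcome.$(ClusterId).$(ProcId).out
error      = error/welcome.$(ClusterId).$(ProcId).err
log        = log/welcome.$(ClusterId).log
queue 3
```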
The following is a submit description file for a simple job. This job executes the welcome.sh script and adds only one job to the queue.
The script welcome.sh contains a simple command:
#!/bin/bash
echo "welcome to HTCondor tutorial"
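The script can be created and sanity-checked locally before submission. The commands below are a sketch; the chmod step assumes the script should be directly executable:

```shell
# Create welcome.sh on the submit machine
cat > welcome.sh <<'EOF'
#!/bin/bash
echo "welcome to HTCondor tutorial"
EOF

# Make it executable
chmod +x welcome.sh

# Quick local check before handing it to HTCondor
./welcome.sh   # prints: welcome to HTCondor tutorial
```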
The submit description file, exercise01.sub:
executable = welcome.sh
arguments = $(ClusterId)$(ProcId)
output = output/welcome.$(ClusterId).$(ProcId).out
error = error/welcome.$(ClusterId).$(ProcId).err
log = log/welcome.$(ClusterId).log
queue
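As noted above, HTCondor will not create the output, error, and log directories referenced by the submit file, so the job would fail if they are missing. Create them on the submit machine first:

```shell
# HTCondor does not create missing directories; make them before condor_submit
mkdir -p output error log
```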
Submitting the Job
On the submit machine, create the welcome.sh script and the exercise01.sub submission file, then run the following command:
condor_submit exercise01.sub
This should produce the following output, which shows that the job was submitted with ClusterId 2464.
Submitting job(s).
1 job(s) submitted to cluster 2464.
Monitoring the Job
The command condor_q can be used to see the current status of the jobs in the queue:
-- Schedd: bigbird04.cern.ch : <128.142.194.115:9618?... @ 12/07/16 15:10:09
OWNER BATCH_NAME SUBMITTED DONE RUN IDLE TOTAL JOB_IDS
fprotops CMD: welcome.sh 12/6 15:08 _ _ 1 1 2464.0
1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended
The condor_q command provides information regarding the current state of the jobs, the name of the schedd, the name of the owner, etc.
The progress of a job can be followed by executing:
condor_q
condor_q -nobatch
The -nobatch option displays one line per job rather than the batched summary shown above:
-- Schedd: bigbird04.cern.ch : <128.142.194.115:9618?... @ 03/28/17 17:13:42
ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
21847.0 fprotops 3/28 17:13 0+00:00:00 I 0 0.0 welcome.sh
Further detail is recorded in the job's log file (for this submission, log/welcome.2465.log), which lists lifecycle events such as execution start, image size updates, and termination:
001 (2465.000.000) 12/07 15:18:17 Job executing on host: <188.185.177.87:9618?addrs=188.185.177.87-9618+[--1]-9618&noUDP&sock=2869_c9b5_3>
...
006 (2465.000.000) 12/07 15:18:18 Image size of job updated: 1
0 - MemoryUsage of job (MB)
0 - ResidentSetSize of job (KB)
...
005 (2465.000.000) 12/07 15:18:18 Job terminated.
(1) Normal termination (return value 0)
Usr 0 00:00:00, Sys 0 00:00:00 - Run Remote Usage
Usr 0 00:00:00, Sys 0 00:00:00 - Run Local Usage
Usr 0 00:00:00, Sys 0 00:00:00 - Total Remote Usage
Usr 0 00:00:00, Sys 0 00:00:00 - Total Local Usage
28 - Run Bytes Sent By Job
47 - Run Bytes Received By Job
28 - Total Bytes Sent By Job
47 - Total Bytes Received By Job
Partitionable Resources : Usage Request Allocated
Cpus : 1 1
Disk (KB) : 15 1 501507
Memory (MB) : 0 2000 2000
To block until a job completes, use condor_wait on the job's log file:
condor_wait <path_to_log_file> <JobId>
For example, condor_wait log/welcome.2464.log 2464.0 would wait for the single job submitted above.