Exercise 7: Periodic Removal
This exercise examines how to force jobs to be removed, held, or released. The following commands can be set in the submit file:
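periodic_remove = (expression)
- If the expression is true, HTCondor removes the job.
periodic_release = (expression)
- If the expression is true, HTCondor releases a held job.
periodic_hold = (expression)
- If the expression is true, HTCondor puts the job on hold.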
The condor_schedd scheduler evaluates these expressions periodically (every 5 minutes), and if an expression is true, the corresponding action is taken. Job attributes such as EnteredCurrentStatus, JobStatus, and JobCurrentStartDate can be used in the expressions; a non-exhaustive list can be found here.
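The expressions usually compare JobStatus against a numeric code. As a reading aid, the sketch below maps the most common codes to their names. The helper name job_status_name is hypothetical and not part of HTCondor; only the five basic states are covered, and any other value is reported as "Other".

```shell
#!/bin/bash
# Hypothetical helper (not an HTCondor command): translate a numeric
# JobStatus value into the name of the corresponding job state.
job_status_name() {
    case "$1" in
        1) echo "Idle" ;;
        2) echo "Running" ;;
        3) echo "Removed" ;;
        4) echo "Completed" ;;
        5) echo "Held" ;;
        *) echo "Other" ;;   # states not covered by this sketch
    esac
}

job_status_name 5   # the state tested by (JobStatus == 5)
```

The same numeric value can be read for a queued job with condor_q and its autoformat option, for example condor_q -af JobStatus EnteredCurrentStatus.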
The submit file for this exercise queues a job for the executable welcome.sh. If the job enters the held state (JobStatus == 5) and remains there for more than 60 seconds, the scheduler releases it the next time it evaluates periodic_release.
The script welcome.sh contains a simple command:
#!/bin/bash
echo "welcome to HTCondor tutorial"
The submit file contains:
executable = welcome.sh
arguments = $(ClusterId)$(ProcId)
output = output/welcome.$(ClusterId).$(ProcId).out
error = error/welcome.$(ClusterId).$(ProcId).err
log = log/welcome.log
periodic_release = ((JobStatus == 5) && (time() - EnteredCurrentStatus) > 60)
queue
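To make the release condition concrete, the following sketch mirrors the periodic_release expression in plain bash. It is only an illustration of the logic, not how HTCondor evaluates ClassAd expressions internally; the function name would_release and the sample timestamps are made up for the example.

```shell
#!/bin/bash
# Illustrative mirror of:
#   (JobStatus == 5) && (time() - EnteredCurrentStatus) > 60
# $1 = JobStatus, $2 = EnteredCurrentStatus (Unix time),
# $3 = current time (defaults to now)
would_release() {
    local status="$1"
    local entered="$2"
    local now="${3:-$(date +%s)}"
    if [ "$status" -eq 5 ] && [ $(( now - entered )) -gt 60 ]; then
        echo "release"
    else
        echo "keep"
    fi
}

would_release 5 1000 1061   # held for 61 s          -> release
would_release 5 1000 1060   # held for exactly 60 s  -> keep
would_release 2 1000 2000   # running, not held      -> keep
```

Because the scheduler only checks the expression at its evaluation interval, a job can stay held noticeably longer than 60 seconds before it is actually released.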
Note: periodic_release is useful only if the executable itself is correct and the job fails under specific circumstances, such as problems on the execute machines. Otherwise, HTCondor will reschedule (not resubmit) the job, and it will end up in the held state again.