Exercise 7: Periodic Removal
This exercise examines how HTCondor can be told to remove, hold, or release jobs automatically. The following expressions can be set in the submit file:
- **If the expression is true, HTCondor removes the job.**

```Ini
periodic_remove = (expression)
```

- **If the expression is true, HTCondor releases the held job, putting it back in the idle state.**

```Ini
periodic_release = (expression)
```

- **If the expression is true, HTCondor puts the job in the hold state.**

```Ini
periodic_hold = (expression)
```
The condor_schedd evaluates these expressions periodically (every 5 minutes) and, whenever an expression is true, carries out the corresponding action. Job attributes such as EnteredCurrentStatus, JobStatus, and JobCurrentStartDate can be used in the expressions; a non-exhaustive list can be found at http://research.cs.wisc.edu/htcondor/manual/latest/12_Appendix_A.html.
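As a reading aid for expressions that test JobStatus, the sketch below maps the numeric codes to their usual names. The helper name `job_status_name` is hypothetical, and the table reflects the codes commonly documented for the JobStatus attribute (5 meaning held); it is worth checking against the attribute list for the HTCondor version in use.

```bash
#!/bin/bash
# Hypothetical helper: translate a numeric JobStatus code into a name.
# The mapping is an assumption based on the commonly documented codes.
job_status_name() {
    case "$1" in
        1) echo "Idle" ;;
        2) echo "Running" ;;
        3) echo "Removed" ;;
        4) echo "Completed" ;;
        5) echo "Held" ;;
        6) echo "Transferring Output" ;;
        7) echo "Suspended" ;;
        *) echo "Unknown" ;;
    esac
}

job_status_name 5   # prints "Held"
```

With this table in mind, an expression containing `JobStatus == 5` can be read as "the job is currently on hold".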
In this exercise a job running the executable welcome.sh is submitted to the queue. If the job goes on hold and stays there for more than 60 seconds, the scheduler releases it the next time it evaluates periodic_release.
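The release condition used in this exercise, `(JobStatus == 5) && (time() - EnteredCurrentStatus) > 60`, can be traced by hand with ordinary shell arithmetic. The sketch below is only an illustration of the logic: the function name `should_release` and the timestamps are hypothetical, and HTCondor itself evaluates the real ClassAd expression, not this script.

```bash
#!/bin/bash
# Emulates the logic of the periodic_release expression
#   (JobStatus == 5) && (time() - EnteredCurrentStatus) > 60
# Usage: should_release JOB_STATUS ENTERED_CURRENT_STATUS NOW
# Returns exit status 0 (true) only when the job is held (status 5)
# and has been in that state for more than 60 seconds.
should_release() {
    local job_status=$1
    local entered=$2
    local now=$3
    [ "$job_status" -eq 5 ] && [ $((now - entered)) -gt 60 ]
}

# Hypothetical timestamps, in seconds since the epoch.
if should_release 5 1000 1061; then echo "release"; else echo "keep"; fi  # held for 61 s: release
if should_release 5 1000 1060; then echo "release"; else echo "keep"; fi  # held for exactly 60 s: keep
if should_release 2 1000 2000; then echo "release"; else echo "keep"; fi  # running, not held: keep
```

Because the expressions are only checked at intervals, a job that satisfies the condition is released at the next evaluation, not at the exact moment the 60 seconds elapse.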
The script welcome.sh contains a simple command:

    #!/bin/bash
    echo "welcome to HTCondor tutorial"

Submit the job by running condor_submit exercise7.sub with the following submit description file:

    executable = welcome.sh
    arguments = $(ClusterId)$(ProcId)
    output = output/welcome.$(ClusterId).$(ProcId).out
    error = error/welcome.$(ClusterId).$(ProcId).err
    log = log/welcome.log
    periodic_release = ((JobStatus == 5) && (time() - EnteredCurrentStatus) > 60)
    queue

Note: periodic_release is useful only if the executable itself is correct and fails under specific circumstances, such as problems with the execute machines. Otherwise, HTCondor will reschedule (not resubmit) the job, and the job will end up on hold again.
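The steps of the exercise can be collected into a small script. This is a sketch under a few assumptions: the function name `run_exercise7` is hypothetical, welcome.sh and exercise7.sub are in the current directory, and the output/, error/ and log/ directories named in the submit file have to exist before submission. The commands are wrapped in a function so that nothing is submitted until it is called on a submit node.

```bash
#!/bin/bash
# Sketch only: run the exercise from the directory holding
# welcome.sh and exercise7.sub.
run_exercise7() {
    # Assumption: the directories referenced in exercise7.sub
    # must be created before the job is submitted.
    mkdir -p output error log
    chmod +x welcome.sh

    condor_submit exercise7.sub   # submit the job
    condor_q                      # list jobs currently in the queue
    condor_q -hold                # list held jobs and the reason for the hold
}
```

Calling `run_exercise7` and then repeating `condor_q -hold` over a few minutes shows whether a held job has been released again.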