We saw the basics of a submit file in the quick start guide; here we drill down into some of the specifics. This is by no means an exhaustive guide, and as ever the HTCondor documentation is the recommended reference. Aside from some differences in the exact memory and CPU amounts you can request, you should be able to use an HTCondor submit file from elsewhere here at CERN. That said, we have also added some local shortcuts to help people, especially with the transition from the old batch system. We detail those differences here too.
Default schedd mapping
Launching jobs from central services like aiadm will benefit from our internal configuration that automatically maps a user to a specific schedd. This mapping ensures that users get a consistent view of their jobs by always interacting with the same schedd.
To see the current mapping, or to see how to change it, please refer to the documentation for myschedd.
HTCondor allows submission to different platforms or architectures with the use of something it calls universes. In the submit file, you can specify the universe, like so:
universe = vanilla
Input, output and logs
Most jobs require input and output in order to run. HTCondor provides job logging using the following job submission directives:
input = jobinput.txt
output = joboutput.txt
error = joberr.txt
log = joblog.txt
Although HTCondor doesn't require a shared filesystem, using one is supported at CERN and enables some additional features, such as being able to run condor_wait on the log file to monitor job state transitions. In the example above the paths are relative, and are presumed to be on a shared filesystem that both the submission node and the HTCondor scheduler can access. At CERN this currently means AFS in practice, though other shared filesystems will be supported as they become available.
There are a number of variables that can be used in the generation of the filenames, which is useful when a submit file is being used to generate multiple jobs. This is an example of such a submit file excerpt:
input = input/job.$(ClusterId).$(ProcId).txt
output = output/job.$(ClusterId).$(ProcId).output
log = log/job.$(ClusterId).$(ProcId).log
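To make the substitution concrete, here is a minimal Python sketch of how such a filename template expands for the first few jobs of a cluster. It is an illustration only, not HTCondor's own implementation: the helper `expand_macros` and the cluster id 1234 are hypothetical, and only the two macros used above are handled.

```python
import re

def expand_macros(template: str, cluster_id: int, proc_id: int) -> str:
    """Substitute $(ClusterId) and $(ProcId) in a filename template.

    Hypothetical helper for illustration; any other macro is left untouched.
    """
    values = {"clusterid": str(cluster_id), "procid": str(proc_id)}

    def replace(match: re.Match) -> str:
        # Macro names are matched case-insensitively.
        name = match.group(1).lower()
        return values.get(name, match.group(0))

    return re.sub(r"\$\((\w+)\)", replace, template)

template = "output/job.$(ClusterId).$(ProcId).output"
# ProcId starts at 0 and increases by one for each job in the cluster.
names = [expand_macros(template, cluster_id=1234, proc_id=p) for p in range(3)]
print(names)
# ['output/job.1234.0.output', 'output/job.1234.1.output', 'output/job.1234.2.output']
```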
Operating system requirements
The default operating system is set system-wide and is currently CentOS7, though we will soon start a migration to Alma 9. To select a non-default operating system, please use a requirements expression in your job submit file:
requirements = (OpSysAndVer =?= "CentOS7")
requirements = (OpSysAndVer =?= "CentOS8")
OS selection via Containers
Given that we are about to go through a period in which our operating systems are less homogeneous, there are options to have the OS managed for you. Containers are one such option.
The following can be added to your submit file:
MY.WantOS = "el9"
...with valid choices currently being
This will ensure your job runs in the correct OS version for these, whatever flavour of Enterprise Linux clone is run for that release. Your job will match to a machine; if that machine is running a different base OS, then your job will run in a container.
Using OpSysAndVer as in the previous example will ensure that you match on the base OS of the worker node. Combining the two options doesn't really make sense: at best, it reduces the pool of workers that could run your OS-version-independent container.
Some requirements can be used to target particular parts of the batch system. These should be used with care (and usually only on the advice of the Batch support team) since, by definition, they limit the nodes that are available for your job. It is sometimes necessary, however: for example, if you need a new package that has so far only been deployed to the QA batch nodes.
Requirements can use any ClassAd attribute, and expressions can be chained:
requirements = ( (OpSysAndVer =?= "CentOS7") && (CERNEnvironment =?= "qa") )
In order to help scheduling, and to provide priority to jobs that are shorter and more efficient, the maximum runtime of a job should be set. This can either be set directly in seconds, or a job can be assigned a "flavour" which will bucket the job into a max runtime. Jobs which exceed the maximum runtime will be terminated. The runtime is the wall time of the job (the elapsed actual time) rather than a calculated cpu time.
The job flavours are as follows:
espresso = 20 minutes
microcentury = 1 hour
longlunch = 2 hours
workday = 8 hours
tomorrow = 1 day
testmatch = 3 days
nextweek = 1 week
Setting the job flavour in the submit file is achieved like this:
+JobFlavour = "longlunch"
Setting manually can be achieved by placing the following in your submit file:
+MaxRuntime = Number of seconds
Note that if a job runs out of time, the partial stdout/stderr is not normally copied back. Benchmark jobs can help you choose an appropriate time limit for your jobs.
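As a rough aid for choosing a limit from a benchmark, the sketch below converts the flavour table into seconds and picks the shortest flavour that covers an estimated runtime. The helper `pick_flavour` and its 1.5x safety margin are our own assumptions for illustration, not part of HTCondor or of the CERN configuration.

```python
# Flavour limits in seconds, taken from the list of job flavours.
FLAVOUR_SECONDS = {
    "espresso": 20 * 60,
    "microcentury": 60 * 60,
    "longlunch": 2 * 60 * 60,
    "workday": 8 * 60 * 60,
    "tomorrow": 24 * 60 * 60,
    "testmatch": 3 * 24 * 60 * 60,
    "nextweek": 7 * 24 * 60 * 60,
}

def pick_flavour(expected_seconds: float, safety_factor: float = 1.5) -> str:
    """Return the shortest flavour whose limit covers the padded estimate.

    Hypothetical helper; the default safety factor is an arbitrary assumption.
    """
    needed = expected_seconds * safety_factor
    for name, limit in sorted(FLAVOUR_SECONDS.items(), key=lambda kv: kv[1]):
        if needed <= limit:
            return name
    raise ValueError("estimate exceeds the longest flavour (nextweek)")

# A benchmark that took 50 minutes, padded to 75 minutes, needs longlunch.
flavour = pick_flavour(50 * 60)
print(f'+JobFlavour = "{flavour}"')   # +JobFlavour = "longlunch"
print(FLAVOUR_SECONDS[flavour])       # 7200
```

The printed number is the same limit expressed in seconds, which is the form +MaxRuntime expects if you prefer to set the runtime directly.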
Resources and limits
As with any system of finite resources, there are limits which you need to be aware of. The time limit of a job is one such limit: the largest MaxRuntime a job can have is that of the longest JobFlavour (nextweek, one week).
By default, a job will get one slot with one CPU core, 2 GB of memory and 20 GB of disk space. It is possible to ask for more CPUs or more memory, but the system will scale the number of CPUs you receive to respect the 2 GB per core limit. To ask for more CPUs you can do the following in the submit file:
RequestCpus = 4
Note that memory is applied as a soft limit. This means that your job will be terminated if there is memory pressure on the node. If you need additional memory, the safe thing to do is to request more slots.
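The relationship between memory and cores can be estimated with a little arithmetic. The following sketch assumes the 2 GB per core figure quoted above and assumes that the core count is rounded up to cover the requested memory; the exact rule the scheduler applies may differ, and `cpus_for_memory` is a hypothetical helper.

```python
import math

MEMORY_PER_CORE_GB = 2  # per-core figure quoted in this section

def cpus_for_memory(memory_gb: float, requested_cpus: int = 1) -> int:
    """Estimate how many cores are needed to cover a memory requirement.

    Assumption: the core count is rounded up so that
    cores * 2 GB >= memory_gb, and is never below the requested CPUs.
    """
    cores_for_memory = math.ceil(memory_gb / MEMORY_PER_CORE_GB)
    return max(requested_cpus, cores_for_memory)

print(cpus_for_memory(7))                    # 4
print(cpus_for_memory(1, requested_cpus=4))  # 4
```

Under these assumptions, a job needing about 7 GB would set RequestCpus = 4 in its submit file.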
Submitting multiple jobs
We've talked about submitting multiple jobs in passing, and how the interpolated values such as $(ClusterId) and $(ProcId) can help. The mechanics of how to do it are quite simple. The queue directive can take an integer to submit multiple jobs, for example with the following submit file:
executable = runmore.sh
input = input/mydata.$(ProcId)
arguments = $(ClusterID) $(ProcId)
output = output/hello.$(ClusterId).$(ProcId).out
error = error/hello.$(ClusterId).$(ProcId).err
log = log/hello.$(ClusterId).log
queue 150
Note that the queue command has many other features, which are described in the HTCondor documentation.
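Before submitting a large batch like this, it can be worth checking that every input file exists and that the directories for output, error and log files are in place. The sketch below is one way to do that in Python. Both helpers are hypothetical, and we assume the directories need to exist before submission; the demo runs in a temporary directory so that it does not touch real files.

```python
import tempfile
from pathlib import Path

def missing_inputs(base: Path, n_jobs: int) -> list[str]:
    """List the input/mydata.<ProcId> files that `queue n_jobs` would read but that are absent."""
    return [
        f"input/mydata.{proc_id}"
        for proc_id in range(n_jobs)  # ProcId runs from 0 to n_jobs - 1
        if not (base / "input" / f"mydata.{proc_id}").is_file()
    ]

def ensure_output_dirs(base: Path) -> None:
    """Create the output, error and log directories used by the submit file."""
    for name in ("output", "error", "log"):
        (base / name).mkdir(parents=True, exist_ok=True)

# Demo in a scratch directory with only one of three inputs present.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    ensure_output_dirs(base)
    (base / "input").mkdir()
    (base / "input" / "mydata.0").write_text("example\n")
    print(missing_inputs(base, 3))  # ['input/mydata.1', 'input/mydata.2']
```

For the submit file above you would call `missing_inputs(Path("."), 150)` from the submission directory and expect an empty list.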