EosSubmit schedds

NOTE: this is still an experimental feature. Please use it with care and report any problems.

As discussed in Data Flow, submission to the HTCondor schedds at CERN normally relies on a shared filesystem, i.e. AFS. This is convenient, but shared filesystems also introduce instability. There are two main methods to avoid them: spool job submission (condor_submit -spool) and the xrootd file transfer plugin.

Spool submission is described here. It has the inconvenience that files are copied to the schedds' local disks, which may sometimes fill up. Still, it is a useful submission method if input/output files are not very large or numerous.
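
As a point of comparison, spool submission is driven entirely from the command line and needs no special schedds. A minimal sketch (the submit file name and the cluster id 1234 are placeholders):

```shell
# Submit with input files spooled to the schedd's local disk
condor_submit -spool myjob.sub

# Later, once the job has finished, fetch stdout/stderr/log back
# from the schedd (they are not written back automatically)
condor_transfer_data 1234
```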

The xrootd file transfer plugin can be used to transfer files directly between EOS and the execution nodes. The plugin can be enabled by configuring it explicitly in the submit file, as described here, or by using the new (experimental) EosSubmit schedds, which automatically apply the plugin to any input/output file in EOS, as described on this page.
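
For reference, explicit use of the plugin on a normal schedd looks roughly like the following submit-file fragment (the root:// URLs and paths are illustrative; the EosSubmit schedds generate equivalent settings for you):

```
# Input files fetched from EOS via xrootd (illustrative paths)
transfer_input_files = root://eosuser.cern.ch//eos/user/d/delgadop/condor/in/file.txt

# All output files pushed back to EOS via xrootd
output_destination   = root://eosuser.cern.ch//eos/user/d/delgadop/condor/out/
```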

Usage

The basics for using EosSubmit schedds are fairly simple.

  • To select the schedds:

    module load lxbatch/eossubmit

    Note that, to switch back to the normal schedds, you would need either:

    module unload lxbatch/eossubmit or module load lxbatch/share

  • To submit to the EosSubmit schedds, use the normal command:

    condor_submit <submit file> [...]
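
Putting the two steps together, a typical session might look like the following (job.sub is a placeholder submit file):

```shell
# Select the EosSubmit schedds for this shell session
module load lxbatch/eossubmit

# Submit as usual
condor_submit job.sub

# Switch back to the normal schedds when done
module unload lxbatch/eossubmit
```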

Submit file

Submit files for jobs using the EosSubmit schedds do not require any special settings, but remember that only EOS files will use the xrootd transfer plugin. In particular, mixing EOS and non-EOS files in either the input or the output file lists is not supported, and such jobs will be rejected by the schedds.

The easiest way to use these schedds is either to cd into the EOS directory that holds our executable and input files (and where we want our output files written back), or to set the initialdir attribute to that EOS directory. Some examples follow.

1. From an AFS directory

We could use the following submit file:

executable     = /eos/user/d/delgadop/condor/output.sh

initialdir = /eos/user/d/delgadop/condor

log            = logs/log.$(ClusterId)
error          = $(ClusterId)/stderr
output         = $(ClusterId)/stdout

transfer_input_files = in/file.txt, in/file2.txt

queue
The log file will be written to the logs subdirectory of /eos/user/d/delgadop/condor. The stdout and stderr files will be written to a subdirectory within it named after the job cluster ID. The input files will be read from /eos/user/d/delgadop/condor/in.

Notes:

  • The initialdir setting is used for all I/O files but not for the executable. For that reason, we have set an absolute path for it; otherwise, it would be looked for in AFS (which would also work, but we wanted to avoid AFS here).

  • While the $(ClusterId) subdirectory for the stdout/err files is created on the fly by the xrootd plugin, the directories for initialdir and the log file must exist beforehand. We should not use $(ClusterId) (or $(ProcId)) for subdirectories in these cases since, in general, we do not know the value of those variables in advance. We can however use per-job subdirectories if we create them beforehand (see example 3 at the end of this section).

Notice also that when multiple jobs are submitted with the same submit file, as in this example, all their output files are written to the same directory (/eos/user/d/delgadop/condor). This might be a problem if different jobs produce files with the same name (this is no different when AFS is used for output). For more precise control of the target destination of output files, explicitly use the output_destination attribute, as described here, or use transfer_output_remaps. Both may include the ClusterId or ProcId variables. E.g. the submit file might include something like the following to separate the output of jobs per directory:

transfer_output_remaps = "f1=$(ClusterId)/f1.$(ProcId); f2=$(ClusterId)/f2.$(ProcId)"

2. From an EOS directory

Let us see the equivalent of the previous example, assuming we have previously cd'ed into /eos/user/d/delgadop/condor:

executable     = output.sh

log            = logs/log.$(ClusterId)
error          = $(ClusterId)/stderr
output         = $(ClusterId)/stdout

transfer_input_files = in/file.txt, in/file2.txt

queue
Notice that in this case the executable is found directly in the current EOS directory, not in AFS, and it is thus also transferred to the execution node using the xrootd transfer plugin.

3. Submission of multiple jobs

Finally, let us see an example that submits several jobs at once, each using a different I/O directory.

Let us assume we start at /eos/user/d/delgadop/condor and that we have previously created a number of directories named jobXX, where XX are consecutive numbers. Let us also assume that each jobXX directory contains a different specific.txt input file. We could then use a submit file like the following:

executable = multi.sh

initialdir = $(subdir)

log            = /eos/user/d/delgadop/condor/logs/log.$(ClusterId)
error          = stderr
output         = stdout

transfer_input_files = specific.txt, /eos/user/d/delgadop/condor/in/common.txt

queue subdir matching dirs job*

This submit file will send a job per existing jobXX subdirectory. All jobs will use /eos/user/d/delgadop/condor/multi.sh as executable and use /eos/user/d/delgadop/condor/in/common.txt as one of their input files. However, each job will consume a different specific.txt file located in its specific jobXX directory and copy stdout/stderr and any other output files to the same particular directory. The executable and all I/O files are transferred using the xrootd plugin.
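
The directory layout assumed in this example can be prepared beforehand with a few shell commands. A minimal sketch (BASE is a placeholder: it stands in for your EOS area, e.g. /eos/user/d/delgadop/condor, and defaults here to a local path for illustration):

```shell
# Create per-job directories jobXX, each holding its own specific.txt.
# BASE is a placeholder: point it at your EOS area in real use.
BASE="${BASE:-/tmp/eossubmit-demo}"

for i in 01 02 03; do
    mkdir -p "$BASE/job$i"
    echo "input for job $i" > "$BASE/job$i/specific.txt"
done

# The shared input file read by every job
mkdir -p "$BASE/in"
echo "common input" > "$BASE/in/common.txt"
```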

Notice that there is a single log file for all the submitted jobs (i.e. for the whole job cluster). This is by design, since the user log is the only file that is written directly by the schedd, using a FUSE mount of EOS rather than the xrootd plugin (see Limitations).

Comparison to other submission strategies

The EosSubmit schedds have been set up to make it easier for users to submit from EOS. There are however some trade-offs. We can compare them to other submission methods:

  1. Simple submission from AFS. It is easy to use, but AFS directories often fill up or respond slowly (especially with numerous/large files), which may cause problems for the user's jobs and, more importantly, for other users of HTCondor.

  2. Submission using -spool. Files are copied to the schedd and no shared filesystem is used. It has the drawback that schedds may fill up if many/large files are moved. It also requires that the user explicitly call condor_transfer_data to get the stdout/stderr/log files back.

  3. Explicit use of the xrootd transfer plugin with non-EosSubmit schedds. I/O files use neither a shared filesystem nor the schedd's local disk. However, in this case submit files must explicitly set the output_destination attribute. This has the additional advantage of allowing users to use $(ClusterId) and $(ProcId), and thus gives more control over where output files are written (but notice that the output_destination attribute may also be set with the EosSubmit schedds). A drawback of non-EosSubmit schedds is that they do not allow the user log file to be in EOS, so it is either written in AFS or it must be spooled.

Limitations

  1. Mixing of EOS and non-EOS files. While the EosSubmit schedds are capable of automatically applying the xrootd transfer plugin selectively (e.g. to output files only, or to input files only), jobs mixing EOS and non-EOS files within either the input or the output category are not supported. This is most relevant for input files. A job submitted from AFS with a setting like transfer_input_files = fileA, /eos/user/d/delgadop/condor/in/fileB will be rejected by the schedd, because fileA is located in AFS while fileB lies within EOS.

  2. Single log file per job cluster. The EosSubmit schedds try to use the xrootd transfer plugin for all EOS files. However, the user log file is an exception to this rule because it is handled exclusively by the schedd (not the execution nodes). Therefore, the log file is written directly to an EOS mount on the schedd. This might become a problem if a user submitted thousands of jobs with a different log file for each job.

For this reason, users should set a single log file per job cluster (HTCondor will include the information of all jobs in the cluster in the same log file).

In practice, this means that the log filename may depend on $ClusterId (constant for all jobs in the cluster) but not on $ProcId (which is different for each individual job). The schedds will automatically reject any submission where the log file matches $(ClusterId).*$(ProcId). Users must not attempt to circumvent this requirement.

Notice however that $(ProcId) takes values from 0 to N-1 (N being the number of jobs in the cluster); it is therefore possible that your log filename matches the rejection expression by chance. This should however be easy to fix, e.g. by avoiding numbers right after a $(ClusterId) clause in the name.
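
To illustrate the rule, the following submit-file fragments (paths are illustrative) show a log setting the schedds accept and one they reject:

```
# Accepted: one log file shared by the whole cluster
log = logs/log.$(ClusterId)

# Rejected: matches $(ClusterId).*$(ProcId), i.e. one log per job
log = logs/log.$(ClusterId).$(ProcId)
```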


Last update: March 24, 2025