EosSubmit schedds

Note

This is still an experimental feature. Please use it with care and report any problems.

Note

New: since November 2025, EosSubmit schedds can no longer be used to submit jobs referencing AFS files; only /eos paths are accepted.

Introduction

As discussed in Data Flow, submission to the HTCondor schedds at CERN normally makes use of a shared filesystem, i.e. AFS. This is convenient, but shared filesystems also introduce instability. There are two main methods to avoid them: spool job submission (condor_submit -spool) and the xrootd file transfer plugin.

  • Spool submission has the inconvenience that files are copied to the schedds' local disks, which may sometimes fill up. It is therefore not the recommended submission method, but it may still be useful for cases where input/output files are not very large or numerous (a minimal example follows this list).

  • The xrootd transfer plugin can be used to transfer files directly between EOS and the execution nodes. The plugin can be used by indicating it explicitly in the submit file, or by using the EosSubmit schedds, described below, which automatically set up the plugin for referenced EOS files.
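
For reference, a minimal spool submission could look like the following sketch (the submit file name job.sub is an illustrative assumption); note that condor_transfer_data must be called afterwards to retrieve the stdout/err/log files:

    # Submit with spooling: job files are copied to the schedd's local disk
    condor_submit -spool job.sub

    # After the jobs complete, retrieve stdout/err/log from the schedd
    condor_transfer_data -all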

Usage

The main point of EosSubmit schedds is that, contrary to standard schedds, all files defined for the job must be located in EOS. This includes the executable, the user log, the stdout/err/input files, and the files to transfer as input. The destination path of output files to transfer must also be in EOS.

The easiest way to achieve this is to cd to some EOS path (/eos/...) on your submission machine and use relative paths in the submit file. Alternatives are to set initialdir to an EOS directory, or to use absolute /eos/... paths. Several examples below illustrate these possibilities.

Once the submit file is ready, using EosSubmit schedds is fairly simple:

  • First, select the schedds:

    module load lxbatch/eossubmit

    Note that to switch back to normal schedds you would need to run module unload lxbatch/eossubmit or module load lxbatch/share.

  • Then, just use the normal submit command:

    condor_submit <submit file> [...]
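
Putting it together, a complete session might look like this sketch (the directory and the submit file name job.sub are illustrative assumptions):

    cd /eos/user/d/delgadop/condor    # work from an EOS directory
    module load lxbatch/eossubmit     # select the EosSubmit schedds
    condor_submit job.sub             # submit as usual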

Everything should work out fine for your job, and xrootd will be used transparently for all file transfers, avoiding FUSE mounts or shared filesystems altogether.

There are, however, a few limitations to be aware of when using EosSubmit schedds.

Examples

Submitting from an EOS directory

First, cd into some EOS directory where your executable and input files (perhaps within a subdirectory) are located.

We could use the following submit file:

executable     = myexec.sh

log            = logs/log.$(ClusterId)
error          = $(ClusterId)/stderr.$(ProcId)
output         = $(ClusterId)/stdout.$(ProcId)

transfer_input_files = in/file.txt, in/file2.txt
transfer_output_files = fout1, subdir/fout2

queue

So if the initial submission directory was /eos/user/d/delgadop/condor, the input files will be read from the /eos/user/d/delgadop/condor/in directory, the log file will be created within /eos/user/d/delgadop/condor/logs/, the stdout and stderr files will be copied to /eos/user/d/delgadop/condor/$(ClusterId)/, and the output files to /eos/user/d/delgadop/condor/fout1 and /eos/user/d/delgadop/condor/subdir/fout2.

N.B.: Condor no longer supports directories within transfer_input_files (they can still be used with transfer_output_files), so please list files only. For many files or folders, it may be helpful to transfer a tarball and uncompress it within the job (this is also more efficient). If you really need to transfer directories, you can do so manually by running the xrdcp command within the job itself.
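
As an illustration, a job wrapper along the following lines could unpack a tarball of inputs and copy a whole output directory back manually. The tarball name, the output directory, and the eosuser.cern.ch endpoint are illustrative assumptions:

    #!/bin/bash
    # Unpack inputs that were transferred as a single tarball
    tar -xzf inputs.tar.gz

    # ... run the actual payload here ...

    # Copy a whole output directory back to EOS manually
    xrdcp -r outdir root://eosuser.cern.ch//eos/user/d/delgadop/condor/outdir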

For the output files it is worth noting the following:

  • The specified paths are the same for source (relative to the job's working directory on the execution node) and destination (relative to the submission dir). Any specified subdirectory is always preserved. This is different from the default with standard schedds: with EosSubmit schedds, preserve_relative_paths=True is always assumed.

  • It is possible to change the destination path for the output files. For that, the transfer_output_remaps directive may be used. See example below.

Notice also that while the $(ClusterId) subdir for the stdout/err files is created by the xrootd plugin on the fly, the directories for the log file must exist beforehand. We should therefore not use $(ClusterId) in subdirectories for the log since, in general, we do not know its value in advance.
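
For instance, a fixed log directory can simply be created before submitting (the path is an illustrative assumption):

    mkdir -p /eos/user/d/delgadop/condor/logs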

Finally, $ProcId cannot be used in the log directive. Please see the limitations below.

Submitting from an AFS directory

If you want to submit to an EosSubmit schedd from a non-EOS directory, e.g. from AFS, you need to set the initialdir attribute to an EOS directory, as in:

executable     = /eos/user/d/delgadop/condor/exec.sh

initialdir = /eos/user/d/delgadop/condor

log            = logs/log.$(ClusterId)
error          = $(ClusterId)/stderr.$(ProcId)
output         = $(ClusterId)/stdout.$(ProcId)

transfer_input_files = in/file.txt, in/file2.txt
transfer_output_files = fout1, subdir/fout2

queue

This will produce exactly the same job and file transfers as the previous example, but can be submitted from anywhere (no need to cd into /eos/user/d/delgadop/condor first).

Important: the initialdir setting is used for all I/O files but not for the executable. For that reason, we have set an absolute path for it. Otherwise, it would be looked for in AFS and, even if found, the submission would be rejected.

Output remap

By default, job output files transferred to EOS are copied to the same relative paths at which they were produced in the job execution directory. It is possible to change this using the transfer_output_remaps directive, as in the following example:

transfer_output_files = f1, subdir/f2
transfer_output_remaps = "f1=outdir1/f1.remap; f2=/eos/user/d/delgadop/outdir2/f2.remap;"

In this case, for a job submitted from /eos/user/d/delgadop/condor, the job produces f1, which is copied to /eos/user/d/delgadop/condor/outdir1/f1.remap, and subdir/f2, which is copied to /eos/user/d/delgadop/outdir2/f2.remap. This illustrates how both relative and absolute paths can be used in the remaps (while only relative paths can be used with transfer_output_files).

Notice also that the keys for the remaps are the basenames of the files, not their entire paths (e.g. f2, not subdir/f2).

Comparison to other submission strategies

The EosSubmit schedds have been set up to make it easier for users to submit from EOS. There are however some trade-offs. We can compare them to other submission methods:

  1. Simple submission from AFS. It is easy to use, but AFS directories often fill up or respond slowly (especially for numerous/large files), which may cause problems for the user's jobs and, more importantly, for other users of HTCondor.

  2. Submission using -spool. Files are copied to the schedd and no shared filesystem is used. It has the drawback that the schedds may fill up if many/large files are moved. It also requires the user to explicitly call condor_transfer_data to get the stdout/stderr/log files back.

  3. Explicit use of the xrootd transfer plugin with non-EosSubmit schedds. I/O files use neither a shared filesystem nor the schedd's local disk. However, in this case submit files must explicitly use the output_destination attribute and set root:// URLs for input files (see the sketch after this list). This has the additional drawback that the executable, the user log file, and the standard input (if set) will be transferred from AFS, not EOS, or must be spooled.
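
For comparison, a minimal sketch of such an explicit submit file might look as follows; the eosuser.cern.ch endpoint and all paths are illustrative assumptions, not a tested recipe:

    executable           = /afs/cern.ch/user/d/delgadop/exec.sh
    log                  = /afs/cern.ch/user/d/delgadop/log.$(ClusterId)

    # Input files are pulled from EOS via root:// URLs
    transfer_input_files = root://eosuser.cern.ch//eos/user/d/delgadop/condor/in/file.txt

    # All produced output files are pushed to this EOS URL by the plugin
    output_destination   = root://eosuser.cern.ch//eos/user/d/delgadop/condor/out/

    queue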

Limitations

1. Don't reference non-EOS files

The EosSubmit schedds will not allow any file to be transferred to/from, or be written to, a non-EOS filesystem (usually AFS). A job submission will be rejected if any of the log, stdout/err/in, or input/output files uses a non-EOS path. This can happen inadvertently, e.g. when submitting from AFS without setting initialdir.

Please refer to the examples above for details about how to create your submit file.

2. Single log file per job cluster

The EosSubmit schedds try to use the xrootd transfer plugin for all EOS files. However, the user log file is an exception to this rule because it is handled exclusively by the schedd (not the execution nodes). Therefore, the log file is written directly to an EOS mount on the schedd. This might become a problem if a user submitted thousands of jobs with a different log file for each job.

For this reason, users should set a single log file per job cluster (HTCondor will include the information of all jobs in the cluster in the same log file).

In practice, this means that the log filename may depend on $(ClusterId) (constant for all jobs in the cluster) but not on $(ProcId) (which is different for each individual job). The schedds will automatically reject any submission where the log file matches $(ClusterId).*$(ProcId). Users must not attempt to circumvent this requirement.

Notice, however, that since $ProcId takes values from 0 to N-1 (N being the number of jobs in the cluster), it is possible that your log filename matches the rejection expression by chance. This should be easy to fix, e.g. by avoiding numbers right after a $(ClusterId) clause in the name.
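
As an illustration, the first log directive below would be rejected while the second is fine:

    # Rejected: matches $(ClusterId).*$(ProcId), i.e. one log per job
    log = logs/log.$(ClusterId).$(ProcId)

    # Accepted: a single log for the whole cluster
    log = logs/log.$(ClusterId)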

3. Preservation of relative paths

By default, job output files transferred to EOS are copied to the same relative paths at which they were produced in the job execution directory (contrary to the default behaviour with normal schedds). The preserve_relative_paths directive is assumed to be True and is ignored if present in the submit file. It is, however, possible to change this by using the transfer_output_remaps directive, as shown in the examples.


Last update: November 17, 2025