File Transfer to xrootd URL
EOS is widely used at CERN, whereas the HTCondor service uses AFS as the shared filesystem between your submissions and the schedd. While we do not allow EOS FUSE to serve as this filesystem, we do provide a file transfer plugin that can help integrate EOS into your workflow for local batch submission. This is achieved using an HTCondor file transfer plugin that runs xrdcp to copy files to EOS.
Note
Note that this is authenticated with Kerberos, and should therefore be viewed as a way to use EOS at CERN, not as a way to access arbitrary xrootd URIs.
The xrootd file transfer plugin may be explicitly configured in the submit file, or automatically imposed by the use of the new (experimental) EosSubmit schedds.
Usage in the submit file
Output files
To use the xrootd transfer plugin for the output produced by the job, specify the output_destination
attribute in the submit file. A correctly formatted xrootd URI tells HTCondor to write your output files to that destination. Here is a basic example that writes files to /eos/user/b/bejones/condor/xfer
:
executable = script.sh
log = xfer.$(ClusterId).log
error = yf.$(ClusterId).$(ProcId).err
output = yf.$(ClusterId).$(ProcId).out
output_destination = root://eosuser.cern.ch//eos/user/b/bejones/condor/xfer/
queue
Using this example, all files written to the working directory (aka the output sandbox
) will be written to that directory in EOS.
However, we can also ask for any missing directories to be created, which can help avoid having too many files in one directory. We do this by adding the additional special attribute MY.XRDCP_CREATE_DIR = true
. Full example:
executable = script.sh
log = xfer.$(ClusterId).log
error = yf.$(ClusterId).$(ProcId).err
output = yf.$(ClusterId).$(ProcId).out
output_destination = root://eosuser.cern.ch//eos/user/b/bejones/condor/xfer/$(ClusterId)/
MY.XRDCP_CREATE_DIR = True
queue
Note the addition of a subdirectory named after the ClusterId
in the output_destination.
We can also include the transfer_output_files
attribute to explicitly indicate which output files to transfer (instead of all the files written in the job working directory), e.g.:
transfer_output_files = fout1, fout2
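Combining this with the earlier example, a complete submit file might look like the following sketch (the file names fout1 and fout2 are placeholders for whatever your job actually produces):

```
executable = script.sh
log = xfer.$(ClusterId).log
error = yf.$(ClusterId).$(ProcId).err
output = yf.$(ClusterId).$(ProcId).out
output_destination = root://eosuser.cern.ch//eos/user/b/bejones/condor/xfer/$(ClusterId)/
transfer_output_files = fout1, fout2
MY.XRDCP_CREATE_DIR = True
queue
```

With this submit file, only fout1 and fout2 are copied to the EOS destination; any other files left in the working directory are discarded.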
Input files
We can ask the plugin to fetch input files by setting the transfer_input_files
attribute, though every included file needs a full xrootd URL, for example:
executable = script.sh
log = xfer.$(ClusterId).log
error = yf.$(ClusterId).$(ProcId).err
output = yf.$(ClusterId).$(ProcId).out
output_destination = root://eosuser.cern.ch//eos/user/b/bejones/condor/xfer/$(ClusterId)/
transfer_input_files = root://eosuser.cern.ch//eos/user/b/bejones/condor/file.txt, root://eosuser.cern.ch//eos/user/b/bejones/condor/sub/file2.txt
MY.XRDCP_CREATE_DIR = True
queue
Notice that you could also use a root://
URL for the input
attribute (the stdin file) if desired, but not for the executable
or initialdir
attributes.
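As an illustration, here is a hedged sketch of a submit file that reads stdin from EOS; the file stdin.txt is a hypothetical example file, not something created by the service:

```
executable = script.sh
# stdin.txt is a hypothetical example file on EOS
input = root://eosuser.cern.ch//eos/user/b/bejones/condor/stdin.txt
log = xfer.$(ClusterId).log
error = yf.$(ClusterId).$(ProcId).err
output = yf.$(ClusterId).$(ProcId).out
output_destination = root://eosuser.cern.ch//eos/user/b/bejones/condor/xfer/$(ClusterId)/
queue
```

Note that the executable itself (script.sh here) must still be accessible from the submit side, since a root:// URL is not accepted for that attribute.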
User log file and spool submission
Even when output_destination
is set, the user log file (the log
attribute of the submit file) is not transferred by the plugin, since it is handled directly by the schedd. In the normal case the log file would therefore be a file on AFS. There is, however, a way to avoid using AFS at all: submitting with condor_submit -spool
.
As described in the dataflow page, condor_submit -spool
causes input/output/log files to be staged to the schedd beforehand, avoiding the use of shared filesystems. However, if output_destination
is set and xrootd URLs are used in transfer_input_files
, input/output files are transferred by the plugin directly between EOS and the execution nodes, skipping the schedd. This means that the only files written on the schedd are the executable and the user log file. This is not perfect, but it at least avoids the use of shared filesystems (AFS) altogether.
In this case, after a job terminates, if you wish to consult the log you must transfer it back with the following command:
condor_transfer_data <job-id>
This will transfer back the log file to the submit machine (in the initial working directory of the submitted job).
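The full spool workflow could then look like the following command sequence; the submit file name job.sub and the job ID 1234.0 are placeholders for your own values:

```
condor_submit -spool job.sub      # stage the executable to the schedd spool
condor_q 1234.0                   # check that the job has completed
condor_transfer_data 1234.0       # fetch the user log back to the current directory
```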
Automatic usage by EosSubmit schedds
A new (experimental) feature allows users to submit jobs from EOS using the xrootd transfer plugin without explicitly
setting output_destination
or using xrootd URLs for input files. This can be very convenient if submit files are
generated programmatically or are in general difficult to modify.
This method has some advantages and disadvantages over explicit plugin usage in the submit file or, e.g., the usage of
-spool
.
Please refer to EosSubmit schedds for information on how to use this feature.