Migration from LSF to HTCondor: practical

This page describes the practical steps needed to migrate your current LSF setup to HTCondor. It should be read in conjunction with the more detailed Submit guide and Quickstart guide.

Design of your framework / job-layouts

In general, since the shared filesystems between lxplus and HTCondor are the same as with LSF, you don't need to change your job frameworks or the way you lay out the job input data on the underlying filesystem. You should be able to craft a Condor submit file that does the same job as your LSF submits.

Since Condor offers a more flexible way to submit and control multiple jobs that belong to the same set (e.g. to glob over multiple input files), you may subsequently wish to take advantage of those features to streamline your job submissions - but there is no requirement to do so in order to migrate over.

Create a submit file

Unlike LSF jobs, HTCondor jobs cannot be submitted directly on the command line. Each job (or set of jobs) requires a submit file. If you already use LSF submit files, the concept is the same.

The Quickstart guide describes the basic format. A skeleton submit file is copied here:

executable            =
arguments             = $(ClusterId) $(ProcId)
output                = output/hello.$(ClusterId).$(ProcId).out
error                 = error/hello.$(ClusterId).$(ProcId).err
log                   = log/hello.$(ClusterId).log
queue

There is one directive (or ClassAd) per line. Each of the ClassAds described below (if needed) should be added on its own line before the queue statement.

Choose a time limit

This part replaces the bsub -q queuename specification.

As described, the time limit on HTCondor is in real time. Check the real time ("Run time") reported in the email from one of your LSF jobs:

[Image: LSF email]

Then either:

  1. Choose the next largest job flavour (as described in the Submit section) and set it in your submit file like this:

+JobFlavour = "longlunch"

The possible options are:

espresso     = 20 minutes
microcentury = 1 hour
longlunch    = 2 hours
workday      = 8 hours
tomorrow     = 1 day
testmatch    = 3 days
nextweek     = 1 week

  2. Alternatively, set the real run time manually. The advantage of specifying a (smaller) run time more precisely is that your job is more likely to fit into spare/opportunistic/draining resources, so it has a chance of running sooner.

+MaxRuntime = NNN

Here NNN is the run time in seconds.

We will soon provide a proper benchmarking setup that allows you to measure your job run times more precisely.
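As an illustration of option 1, the following minimal Python sketch picks the smallest flavour from the table above that covers a given run time. The names `JOB_FLAVOURS`, `pick_flavour` and `time_limit_line` are hypothetical helpers invented for this example; they are not part of HTCondor. Any safety margin on top of the LSF "Run time" is left to you.

```python
# Job flavours and their limits in seconds, copied from the table above.
JOB_FLAVOURS = [
    ("espresso", 20 * 60),
    ("microcentury", 60 * 60),
    ("longlunch", 2 * 60 * 60),
    ("workday", 8 * 60 * 60),
    ("tomorrow", 24 * 60 * 60),
    ("testmatch", 3 * 24 * 60 * 60),
    ("nextweek", 7 * 24 * 60 * 60),
]


def pick_flavour(run_time_seconds):
    """Return the smallest flavour whose limit covers the given run time."""
    for name, limit in JOB_FLAVOURS:
        if run_time_seconds <= limit:
            return name
    raise ValueError("run time exceeds the longest flavour (nextweek)")


def time_limit_line(run_time_seconds):
    """Render the +JobFlavour line to add to a submit file."""
    return '+JobFlavour = "{}"'.format(pick_flavour(run_time_seconds))


# Example: an LSF job that reported a run time of 90 minutes.
print(time_limit_line(90 * 60))
```

For a 90-minute job this selects longlunch, the first flavour with a limit of at least 5400 seconds.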

Choose the job limits (#cores, etc)

This part replaces the bsub -n option (for #cores) and the -M option (for the memory limit).

As with LSF, the number of cores defaults to 1, with 2 GB of memory. You may choose a different multiple by specifying the number of cores, as described in the Submit section. To respect the agreed WLCG ratios (which determine what we purchase for lxbatch), you can't choose the number of cores and memory independently.

In the "vanilla" HTCondor universe (normal jobs), all cores are allocated on the same node, so the span directive (used to ensure this on LSF) is not needed for HTCondor.

Choose the STDOUT, STDERR and state log locations

This part replaces the STDOUT and STDERR file handling of LSF, and if used, the bsub -o and bsub -e options.

The skeleton file above gives the locations of:

  • output - standard out of your job script (as specified by executable)
  • error - standard error of your job script
  • log - the condor state log, as described in the Submit guide. Note this is purely the Condor state log and not any output from your job.

If the specified paths are relative, they are relative to the filesystem directory from which the job is submitted.

Main differences from LSF:

  • It was very rare for people to specify these files explicitly with LSF. For HTCondor, they must be specified.
  • HTCondor does not auto-create a directory in the style of LSFJOB_789482027/. If a simple filename is specified, the logs will be written to the submission directory rather than one level below.
  • If a relative directory path (or a full file path) is specified, those directories must exist at job submit time (or the submit will fail).
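Because the directories must already exist at submit time, a submission script can create them first. The sketch below is one way to do that in Python; `ensure_parent_dirs` is a hypothetical helper name, and it assumes that any $(...) macros appear only in the filename part of each path, not in the directory part.

```python
import os


def ensure_parent_dirs(paths):
    """Create the parent directory of each path if it does not exist yet."""
    for path in paths:
        parent = os.path.dirname(path)
        if parent:  # a bare filename lives in the submission directory
            os.makedirs(parent, exist_ok=True)


# The output, error and log locations from the skeleton submit file.
ensure_parent_dirs([
    "output/hello.$(ClusterId).$(ProcId).out",
    "error/hello.$(ClusterId).$(ProcId).err",
    "log/hello.$(ClusterId).log",
])
```

Run this from the directory you submit from, since relative paths are resolved against it.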

Other parameters

For now, if you have other bsub parameters that you need to migrate to a Condor submit file, please contact the Batch Service team to discuss them. We can document popular ones as they come along.

Code changes

The general procedure is:

  • Look at the bsub lines in your code, and replace them with code that emits the equivalent submit file (SUB_FILE), using the sections above as a guide. Then replace the whole bsub command with:

condor_submit SUB_FILE
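As a rough sketch of what such a code change could look like in a Python framework, the example below builds the submit-file text from the skeleton above and then calls condor_submit on it. The function names (`render_submit_file`, `submit`), the parameter names and the `hello` file prefix are assumptions made for this illustration; adapt them to your own framework. The output/, error/ and log/ directories are assumed to exist already.

```python
import subprocess


def render_submit_file(executable, arguments, flavour, name="hello"):
    """Return submit-file text standing in for a single bsub call."""
    lines = [
        "executable = {}".format(executable),
        "arguments = {}".format(arguments),
        "output = output/{}.$(ClusterId).$(ProcId).out".format(name),
        "error = error/{}.$(ClusterId).$(ProcId).err".format(name),
        "log = log/{}.$(ClusterId).log".format(name),
        '+JobFlavour = "{}"'.format(flavour),
        "queue",  # a bare queue statement submits one job
    ]
    return "\n".join(lines) + "\n"


def submit(sub_file_path, text):
    """Write the submit file, then run: condor_submit SUB_FILE."""
    with open(sub_file_path, "w") as handle:
        handle.write(text)
    # check=True raises CalledProcessError if condor_submit fails.
    subprocess.run(["condor_submit", sub_file_path], check=True)
```

A call such as `submit("job.sub", render_submit_file("myjob.sh", "input.dat", "longlunch"))` would then replace the old bsub invocation; `myjob.sh`, `input.dat` and `job.sub` are placeholder names.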

Multiple job submits

The previous submit examples show a single queue command which submits a single Condor "cluster" [of jobs] containing just one job - this is equivalent to a single bsub command. See [LSF migration: concepts] for the terminology differences ("cluster" vs. "job").

Condor also makes it easy to submit multiple jobs from a single submit file, parameterised by $(ProcId), as described in the Submit guide. This is the equivalent of LSF array jobs.
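To make the role of $(ProcId) concrete, the short Python sketch below mimics how the output filename from the skeleton would be filled in for a cluster of three jobs (for example, one submitted with `queue 3`). This is only an illustration of the substitution, not how HTCondor is implemented; `expand_names` is a hypothetical helper and 1234 is a made-up cluster number.

```python
def expand_names(template, cluster_id, n_jobs):
    """Mimic how $(ClusterId) and $(ProcId) are filled in for each job."""
    names = []
    for proc_id in range(n_jobs):  # ProcId counts up from 0
        # Simple textual substitution; the macros must be spelled exactly
        # as in the template for this illustration to match them.
        name = template.replace("$(ClusterId)", str(cluster_id))
        name = name.replace("$(ProcId)", str(proc_id))
        names.append(name)
    return names


# 1234 is a made-up cluster number; the real one is assigned at submit time.
for name in expand_names("output/hello.$(ClusterId).$(ProcId).out", 1234, 3):
    print(name)
```

Each job in the cluster shares the same cluster number and gets its own ProcId, so the three jobs write to three distinct output files.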