Migration from LSF to HTCondor: concepts
This page aims to describes the (few) conceptual differences between LSF and HTCondor together with instructions on how to translate your existing submit scripts.
The way you interact with the HTCondor system is basically the same as LSF, although the commands are different. The nodes are the same and the fair-share you get is the same. There are differences: queues do not exist (you just say what you want for your job), terminology is a bit different and we've simplified the job time limits. Similarly to LSF, shorter jobs are better for the batch system and will typically get scheduled faster, even on a busy system.
Things that are broadly the same as LSF
Pattern The submit-poll batch pattern is the same: you submit a job, it sits waiting until there is a spare slot on the system and you have a share on the system, whereupon it starts your job on a worker node.
Worker-node The worker-node package-sets and available libraries are the same as on LSF worker nodes.
Mounted filesystems AFS, EOS and CVMFS are all available in the same way. Locally submitted jobs have a kerberos token as normal.
Pool directory As per LSF, the job runs in a 'pool directory' (i.e. a job-specific directory on the worker node). As per LSF, this is the CWD the job starts in. As per LSF, locally written files are not copied at the end of the job unless your script does so at the end (though you can request Condor to copy them in your submit file).
Fair-share A fair-share system is in place to ensure CERN resource agreements are respected. As in LSF, the system is hierarchical, with agreed shares given to experiments/projects and sub-shares divided up between users, as the experiment/projects coordinators see fit. As experiments/projects migrate, we'll move compute resources over to HTCondor and set the fair-share appropriately on the new system to ensure the same resources are delivered as before.
Shortish jobs win As per LSF, reasonably short jobs (1-4 hours) have a larger phase space to run in, so will typically be scheduled faster. This is because, at any one time, we have a fair number of systems draining for reboot (e.g. to apply security upgrades) and these only accept shorter jobs. We also, where possible, backfill dedicated prompt resources with short jobs (i.e. resources which are often unused but need to be given to the rightful owner on short-notice). If you have a choice, chopping your jobs into smaller pieces will get it through faster.
Some subtle differences
Stdout/Stderr In LSF the standard out (stdout) and standard error (stderr) files locations are implicit (at the end of the job, they're copied to the directory from where you submitted the job). In HTCondor, the stdout and stderr file locations must be explicitly specified in the job submit file - though, if relative paths are used, they are relative to the directory from where you submitted the job. As per LSF, the two files are copied into place at the end of the job (though differently from LSF, if a job runs out of time, the partial stdout/stderr will not be copied back).
Queues LSF exposes the concept of different 'queues'. Inside LSF, these queues apply a set of requirements and restrictions to the job (notably on maximum time-limit). Despite the apparently different queues, the LSF scheduler really looks at all public queues together when deciding which job to place next. In Condor, there is no concept of distinct queues - job submit files specify the needs of the job (e.g. maximum time limit) which allow for a more flexible specification of requirements. As per LSF, all jobs are considered together when the scheduler decides which job to place next.
RUNjob state in LSF is called
Running (R)in HTCondor (i.e. executing on woker node).
PENDjob state in LSF is called
Idle (I)in HTCondor (i.e. waiting to be scheduled). Somewhat confusing since 'Idle' in LSF-speak means something else.
PSUSP/USUSP/SSUSPsuspended states in LSF are referred to as
Hold (H)in HTCondor (i.e. waiting for someone to reschedule the job).
What is a 'job' called?
- An LSF
jobis called a
- A single, simple job identifier in Condor has the format
<ClusterId>.0(e.g. 1234567.0). The integer value after the . is called the
ProcId. In a simple job, the integer
ProcIdis set to 0.
- LSF has the concept of an
array job, where the same executable is used, but multiple sub-jobs are submitted, differing only by a numeric parameter that is passed to the job as an environment variable. Condor has a similar feature allowing a single submit script to submit multiple sub-jobs (e.g. each iterating over a different input file in the same directory). The sub-jobs can all be managed as a group and all have the same
ClusterID, differing only by their
ProcId. The feature is more powerful than LSF array jobs and is typically used a lot more.
- An LSF
Time Limits In LSF, the time limits are specified in normalised CPU-time relative to a rather old normalisation (consequently the 1 week queue typically gives you less than 2 days CPU time on a modern system). In HTCondor, the limits exposed are on unnormalised real-time. We recommend you test your jobs on the special benchmark setup (see later), determine how many events/whatever you can fit in 1-4 hours on those systems, and chop up your jobs appropriately. There is an example of how to do this later in the docs.
Memory limits Limits, notably on memory usages are enforced more rigourously by HTCondor using Linux
cgroups. This is good news, since it prevents badly behaving jobs from stealing resources from other jobs. The limits themselves (CPU vs. memory vs. disk) have not changed (we use the agreed WLCG ratios of 1 core to 2 GB memory to 20 GB disk).