
Custom python libraries and LxBatch

There are many curated sources of Python libraries that you can use in your LxBatch jobs. For example, the lxplus environment provides access to the system Python and its libraries, which may suit many purposes, and also to the LCG Python libraries, which are optimised for HEP use cases. The package lists for LCG can be found here.

The various views for LCG can be seen in /cvmfs/sft.cern.ch/lcg/views, and additional packages can be requested via Jira. Because batch jobs may need to scale to many nodes, having a cached read-only filesystem provide the libraries is good practice.

However, you may want to use custom Python libraries, either because they do not exist in cvmfs (yet), because of incompatibilities, or because you need fast iterations. Whilst it's not a good idea to use an entire Python distribution of your own, an overlay that injects specific libraries can work well.

The temptation here is to run code on LxBatch directly from distributed filesystems. This can often strain those filesystems, get jobs throttled, and lead to a bad experience. This doc therefore shows an alternative that is still very simple and has a fast turnaround.

Installing libraries to EOS

Having the libraries in EOS means that they can be used for testing on, for example, lxplus, but they can also be transferred by HTCondor to the worker node, used, and discarded.

Here's how to install on EOS:

Firstly, if using LCG as the base, source an LCG release for your platform, for example:

[bejones@aiadm84 bejones]$ . /cvmfs/sft.cern.ch/lcg/views/LCG_102/x86_64-centos9-gcc11-opt/setup.sh

The python3 and pip3 commands should then come from cvmfs:

[bejones@aiadm84 bejones]$ which python3
/cvmfs/sft.cern.ch/lcg/views/LCG_102/x86_64-centos9-gcc11-opt/bin/python3
[bejones@aiadm84 bejones]$ which pip3
/cvmfs/sft.cern.ch/lcg/views/LCG_102/x86_64-centos9-gcc11-opt/bin/pip3
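
Before installing anything, it can be worth checking in a script that python3 and pip3 really resolve to the LCG view rather than to the system copies. The helper below is a minimal sketch of such a check; the function name resolves_under is hypothetical and not part of any LCG or lxplus tooling.

```shell
# Succeed only if the given command resolves to a path under the given prefix.
# Usage: resolves_under <command> <prefix>
resolves_under() {
    # "command -v" prints the path the shell would run for an external command
    resolved=$(command -v "$1") || return 1
    case "$resolved" in
        "$2"*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Example: warn before installing if pip3 is not the one from the LCG view.
#   resolves_under pip3 /cvmfs/sft.cern.ch/lcg/views/ || echo "pip3 is not from the LCG view" >&2
```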

We can then use pip to install packages, but we need to make sure they are installed in EOS, which probably isn't the $HOME directory:

[bejones@aiadm84 bejones]$ PYTHONUSERBASE=/eos/user/b/bejones/.local/ pip3 install --user dask-lxplus
[...]
Installing collected packages: dask-jobqueue, dask-lxplus
Successfully installed dask-jobqueue-0.8.2 dask-lxplus-0.3.2
[bejones@aiadm84 bejones]$ ls .local/lib/python3.9/site-packages/
dask_jobqueue  dask_jobqueue-0.8.2.dist-info  dask_lxplus  dask_lxplus-0.3.2.dist-info
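
The site-packages path contains the Python minor version (python3.9 above), which depends on the Python shipped in the chosen LCG release. Rather than hard-coding it, one option is to ask Python's own site module where a --user install lands for a given PYTHONUSERBASE. This is a minimal sketch, and the function name user_site_for is hypothetical.

```shell
# Print the site-packages directory that "pip3 install --user" targets
# for a given PYTHONUSERBASE, as computed by Python's site module.
# Usage: user_site_for <user base directory>
user_site_for() {
    # "python3 -m site --user-site" exits non-zero when the user site
    # directory is disabled (for example inside a virtualenv) but still
    # prints the path, so its exit status is ignored here.
    PYTHONUSERBASE="$1" python3 -m site --user-site || true
}

# Example, using the user base from the install step above:
#   user_site_for /eos/user/b/bejones/.local
```

The printed directory is the one to put on PYTHONPATH when testing and the one to transfer with the job.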

Testing with libraries in EOS

To test with libraries in EOS, just make sure that PYTHONPATH starts with the directory they are in when running a script, for example:

[bejones@aiadm84 bejones]$ PYTHONPATH=/eos/user/b/bejones/.local/lib/python3.9/site-packages/:$PYTHONPATH python3 myscript.py
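
To confirm that a package is really picked up from EOS and not from the LCG view or the system, you can print the file it was imported from. The helper below is a minimal sketch of that check; module_origin is a hypothetical name.

```shell
# Print the file a module is imported from when an extra directory is
# prepended to PYTHONPATH.
# Usage: module_origin <directory> <module name>
module_origin() {
    # ${PYTHONPATH:+...} appends the existing PYTHONPATH only if it is set
    PYTHONPATH="$1${PYTHONPATH:+:$PYTHONPATH}" \
        python3 -c 'import importlib, sys; print(importlib.import_module(sys.argv[1]).__file__)' "$2"
}

# Example, using the package installed earlier:
#   module_origin /eos/user/b/bejones/.local/lib/python3.9/site-packages dask_lxplus
```

If the printed path does not start with the EOS directory, another copy of the package is taking precedence.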

Submitting a job with the custom libraries

To submit a job, we just need to ensure that the submit file transfers the libraries and that PYTHONPATH is set correctly.

In my example above, the root:// path for the libraries is root://eosuser.cern.ch//eos/user/b/bejones/.local/lib/python3.9/site-packages/

In the submit file, I'd therefore just need to make sure I included the line:

transfer_input_files = root://eosuser.cern.ch//eos/user/b/bejones/.local/lib/python3.9/site-packages/, myscript.py
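
Put together, a complete submit file might look like the sketch below, written here as a shell heredoc so that it can be pasted into a terminal. The wrapper name run.sh and the output, error and log file names are placeholders rather than names from this doc, and should_transfer_files = YES is an assumption about how you want file transfer to behave.

```shell
# Write a minimal HTCondor submit file. The heredoc delimiter is quoted so
# that the shell leaves $(ClusterId) and $(ProcId) for HTCondor to expand.
cat > job.sub <<'EOF'
# run.sh is a hypothetical wrapper script; the file names are placeholders
executable            = run.sh
transfer_input_files  = root://eosuser.cern.ch//eos/user/b/bejones/.local/lib/python3.9/site-packages/, myscript.py
should_transfer_files = YES
output                = job.$(ClusterId).$(ProcId).out
error                 = job.$(ClusterId).$(ProcId).err
log                   = job.$(ClusterId).log
queue
EOF
```

The file can then be submitted with condor_submit job.sub.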

Note that I also include a script, here myscript.py, since it's likely that our executable will be a shell script that does things like source the LCG view. In that case we'd also set up PYTHONPATH in that script, with something like:

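#!/bin/bash

# Set up the LCG view so that python3 comes from cvmfs
. /cvmfs/sft.cern.ch/lcg/views/LCG_102/x86_64-centos9-gcc11-opt/setup.sh
# site-packages is the directory transferred into the job's working directory
export PYTHONPATH=./site-packages:$PYTHONPATH
# Run the transferred script
python3 myscript.py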
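
A slightly more defensive variant of the PYTHONPATH step is sketched below. It fails early if the library directory did not arrive on the worker node, and it uses an absolute path so that the setting survives a later cd in the wrapper. The function name prepend_pythonpath is hypothetical.

```shell
# Prepend a directory to PYTHONPATH as an absolute path, failing early
# if the directory is missing from the job's working directory.
# Usage: prepend_pythonpath <directory>
prepend_pythonpath() {
    if [ ! -d "$1" ]; then
        echo "missing library directory: $1" >&2
        return 1
    fi
    # ${PYTHONPATH:+...} avoids a trailing colon when PYTHONPATH is unset
    PYTHONPATH="$(cd "$1" && pwd)${PYTHONPATH:+:$PYTHONPATH}"
    export PYTHONPATH
}

# In the wrapper script, after sourcing the LCG view:
#   prepend_pythonpath ./site-packages || exit 1
```

Exiting with a non-zero status when the directory is missing makes a failed transfer visible in the job's exit code, instead of surfacing later as an ImportError.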

Last update: August 4, 2023