Custom python libraries and LxBatch
There are many curated sources of Python libraries that you can use in your LxBatch jobs. For example, the lxplus environment provides the system Python and its libraries, which may suit many purposes, as well as the LCG Python libraries, which are optimised for HEP use cases. The package lists for LCG can be found here.
The various views for LCG can be seen in /cvmfs/sft.cern.ch/lcg/views, and additional packages can be requested via Jira. Because batch jobs may need to scale to many nodes, serving the libraries from a cached, read-only filesystem is good practice.
However, you may want to use custom Python libraries that do not exist in cvmfs (yet), you may run into incompatibilities, or you may need fast iterations. Whilst it's not a good idea to use an entire Python distribution of your own, an overlay that injects specific libraries can work well.
The temptation here is to run code on LxBatch directly from distributed filesystems. This can often strain those filesystems, get jobs throttled, and lead to a bad experience. This doc therefore shows an alternative that is still very simple and has a fast turnaround.
Installing libraries to EOS
Having the libraries in EOS means that they can be used for testing on, for example, lxplus, but they can also be transferred by HTCondor to the worker node, used, and discarded.
Here's how to install on EOS:
Firstly, if using LCG as the base, source an LCG release for your platform, e.g.:
[bejones@aiadm84 bejones]$ . /cvmfs/sft.cern.ch/lcg/views/LCG_102/x86_64-centos9-gcc11-opt/setup.sh
Then, the python and pip commands should come from cvmfs, i.e.:
[bejones@aiadm84 bejones]$ which python3
/cvmfs/sft.cern.ch/lcg/views/LCG_102/x86_64-centos9-gcc11-opt/bin/python3
[bejones@aiadm84 bejones]$ which pip3
/cvmfs/sft.cern.ch/lcg/views/LCG_102/x86_64-centos9-gcc11-opt/bin/pip3
We can then use pip to install packages, but we need to make sure they are installed in EOS, which probably isn't the
[bejones@aiadm84 bejones]$ PYTHONUSERBASE=/eos/user/b/bejones/.local/ pip3 install --user dask-lxplus
[...]
Installing collected packages: dask-jobqueue, dask-lxplus
Successfully installed dask-jobqueue-0.8.2 dask-lxplus-0.3.2
[bejones@aiadm84 bejones]$ ls .local/lib/python3.9/site-packages/
dask_jobqueue dask_jobqueue-0.8.2.dist-info dask_lxplus dask_lxplus-0.3.2.dist-info
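The directory that pip fills under the user base contains the Python version (python3.9 in the listing above), so it changes if you switch to an LCG view built on a different Python. The following is a minimal sketch of how that path is put together on Linux; the helper name user_site_packages is hypothetical and only illustrates the layout, it is not part of pip or LCG.

```python
import sys
from pathlib import PurePosixPath


def user_site_packages(user_base, version=None):
    """Return the site-packages directory that `pip install --user`
    populates under `user_base` on a POSIX system.

    `version` is a (major, minor) tuple; it defaults to the version of
    the interpreter running this code.
    """
    major, minor = version if version is not None else sys.version_info[:2]
    return str(
        PurePosixPath(user_base) / "lib" / f"python{major}.{minor}" / "site-packages"
    )


# Example call, using the user base from the pip command above:
# user_site_packages("/eos/user/b/bejones/.local/", version=(3, 9))
```

Running it with the interpreter from the sourced LCG view (and no explicit version) gives the path that should later be put on PYTHONPATH.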
Testing with libraries in EOS
To test with libraries in EOS, just make sure that PYTHONPATH is prepended with the directory they are in when running a script, for example:
[bejones@aiadm84 bejones]$ PYTHONPATH=/eos/user/b/bejones/.local/lib/python3.9/site-packages/:$PYTHONPATH python3 myscript.py
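When testing, it can be useful to confirm that a package really is picked up from the EOS directory and not from the LCG view or the system Python. The sketch below uses the standard importlib machinery for that; the function names module_origin and loaded_from are hypothetical helpers that you could paste into myscript.py or run interactively.

```python
import importlib.util
import os


def module_origin(name):
    """Return the file a top-level module would be imported from,
    or None if the module cannot be found on the current search path."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


def loaded_from(name, directory):
    """Return True if module `name` resolves to a file inside `directory`."""
    origin = module_origin(name)
    if origin is None:
        return False
    directory = os.path.realpath(directory)
    # The module lives under `directory` when the longest common path
    # of the two is the directory itself.
    return os.path.commonpath([os.path.realpath(origin), directory]) == directory


# Example call, using the path from the command above:
# loaded_from("dask_lxplus", "/eos/user/b/bejones/.local/lib/python3.9/site-packages")
```

If loaded_from returns False for a package you installed, PYTHONPATH is probably not set as intended, or another copy of the package comes earlier on the search path.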
Submitting a job with the custom libraries
To submit a job, we just need to ensure that the libraries are listed in the submit file, and that we set the PYTHONPATH correctly.
In my example above, the ROOT path for the libraries is
In the submit file, I'd therefore just need to make sure I included the line:
transfer_input_files = root://eosuser.cern.ch//eos/user/b/bejones/.local/lib/python3.9/site-packages/, myscript.py
Note that I also include a script, here myscript.py, since it's likely that our executable will be a shell script that does things like source the LCG view. In that case we'd also set up PYTHONPATH in that script, so something like:
#!/bin/bash
. /cvmfs/sft.cern.ch/lcg/views/LCG_102/x86_64-centos9-gcc11-opt/setup.sh
export PYTHONPATH=./site-packages:$PYTHONPATH
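To show how the pieces might fit together, here is a small sketch that renders a complete submit file around the transfer_input_files line given earlier. It rests on some assumptions: the wrapper script name run.sh, the "job" prefix for the output files, and the helper render_submit are all hypothetical, and the output, error and log lines are ordinary HTCondor submit commands added for illustration, not something this page prescribes.

```python
def render_submit(executable, input_files, tag="job"):
    """Return the text of a minimal HTCondor submit file.

    `executable` is the wrapper shell script, `input_files` the list of
    files or URLs HTCondor should transfer to the worker node, and `tag`
    a prefix for the stdout, stderr and log file names.
    """
    lines = [
        f"executable = {executable}",
        f"transfer_input_files = {', '.join(input_files)}",
        # $(ClusterId) and $(ProcId) are expanded by HTCondor at submit time.
        f"output = {tag}.$(ClusterId).$(ProcId).out",
        f"error = {tag}.$(ClusterId).$(ProcId).err",
        f"log = {tag}.$(ClusterId).log",
        "queue",
    ]
    return "\n".join(lines) + "\n"


# Example call: run.sh is an assumed name for the wrapper script.
# print(render_submit(
#     "run.sh",
#     ["root://eosuser.cern.ch//eos/user/b/bejones/.local/lib/python3.9/site-packages/",
#      "myscript.py"],
# ))
```

The rendered text can be written to a file and passed to condor_submit. The wrapper script would then need to end by running python3 myscript.py after setting PYTHONPATH.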