
HTMap

HTMap is a Python library that wraps the process of mapping Python function calls out to an HTCondor pool.
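To make the analogy concrete: htmap.map(func, iterable) plays the same role as Python's built-in map, except that each call is executed as a job in the pool. The snippet below is a purely local counterpart of the first example on this page, reusing its double function. Running a function this way before submitting it is a cheap way to catch errors. The remote results, collected with list(mapped), are expected to match this list.

```python
def double(x):
    """Same function as in the First Steps example."""
    return 2 * x


# Local equivalent of htmap.map(double, range(10)): every call runs in this
# process, one after another, instead of as separate HTCondor jobs.
local_result = list(map(double, range(10)))
print(local_result)  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```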

This section covers some examples that explain how to use HTMap with the CERN HTCondor cluster. We strongly recommend following the upstream documentation to discover everything this library offers.

Tip

Support for HTMap was added to our infrastructure only recently.

If you have any questions, or a use case you plan to implement with HTMap, feel free to contact us for further discussion. This will help us improve its support.

Installing HTMap

At the time of writing, the latest published release of HTMap (v0.6.1) has a bug that prevents it from working properly on AFS filesystems, which we use at CERN.

To work around this issue, we recommend using the latest version available in the master branch, which can be installed as follows:

lxplus:$ pip3 install --user git+https://github.com/htcondor/htmap.git@master
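After installing, you may want to confirm which HTMap version Python actually picks up, for example to check that the build from master is the one in use. Below is a minimal sketch that uses only the standard library. importlib.metadata requires Python 3.8 or later. The helper name installed_version is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str = "htmap") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("htmap:", installed_version("htmap") or "not installed")
```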

First Steps

Running a basic script with HTMap is straightforward. Taking the example from the upstream documentation as a reference, we can adapt it to the CERN infrastructure as follows:

#!/usr/bin/python3

import htcondor
import htmap

def double(x):
    return 2 * x

# Send your Kerberos credentials to the credd daemon
credd = htcondor.Credd()
print("[CREDD] Adding user credentials to credd daemon")
credd.add_user_cred(htcondor.CredTypes.Kerberos, None)

# Run directly on the worker nodes (no Docker), then create and run the map
htmap.settings["DELIVERY_METHOD"] = "assume"
mapped = htmap.map(double, range(10),
    map_options=htmap.MapOptions(custom_options={
        "MY.SendCredential": "true",  # forward credentials to the worker node
        "JobFlavour": '"espresso"',   # ClassAd string, hence the inner quotes
    }))

print(mapped)
doubled = list(mapped)
print(doubled)

Compared to the upstream example, a few things differ:

  • add_user_cred: this call submits your Kerberos credentials to the credd daemon used by the schedd. This ensures that your job has a valid Kerberos environment.

  • MY.SendCredential: setting this option to true ensures that HTCondor transfers your credentials from the scheduler machine to the worker node when the job starts running.

  • DELIVERY_METHOD: the default delivery method is Docker, which at the time of writing has known problems that are still being investigated. The assume delivery method instead runs the maps directly on the worker nodes, like any other vanilla HTCondor job.

  • JobFlavour: the default maximum runtime is 20 minutes. If you expect your jobs to run longer, set JobFlavour to a flavour with a longer MaxRuntime.
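The two custom options used on this page follow HTCondor's ClassAd syntax. JobFlavour is a string, so its value must include its own double quotes. MY.SendCredential is a boolean, so it stays unquoted. A small helper can keep that quoting in one place. The function name cern_custom_options is hypothetical, and this sketch does not check whether a flavour name is valid: consult the CERN batch documentation for the available flavours.

```python
from typing import Dict


def cern_custom_options(flavour: str = "espresso",
                        send_credential: bool = True) -> Dict[str, str]:
    """Build a custom_options dict like the ones used on this page.

    JobFlavour holds a ClassAd string, so the value is wrapped in double
    quotes ('"espresso"'); MY.SendCredential is a ClassAd boolean and is
    passed unquoted.
    """
    if '"' in flavour:
        # An embedded quote would break the ClassAd string literal
        raise ValueError(f"invalid flavour name: {flavour!r}")
    options = {"JobFlavour": f'"{flavour}"'}
    if send_credential:
        options["MY.SendCredential"] = "true"
    return options


if __name__ == "__main__":
    print(cern_custom_options("espresso"))
```

The result can then be passed as htmap.MapOptions(custom_options=cern_custom_options("espresso")).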

Wrapping External Programs

In some circumstances we might need to run external programs in our maps. This scenario is described in the upstream HTMap documentation.

Adapting this example to our infrastructure gives us the following result:

import htcondor
import htmap
import subprocess

def add_credentials():
    # Send your Kerberos credentials to the credd daemon
    credd = htcondor.Credd()
    credd.add_user_cred(htcondor.CredTypes.Kerberos, None)

# Ship dbl.sh with every job, set the job flavour and ask HTCondor
# to forward your credentials to the worker node
@htmap.mapped(map_options=htmap.MapOptions(fixed_input_files="dbl.sh",
    custom_options={"JobFlavour": '"espresso"', "MY.SendCredential": "true"}))
def jobs(i):
    # Run the external script and capture what it prints
    process = subprocess.run(
        ["bash", "dbl.sh", str(i)],
        stdout=subprocess.PIPE,
    )

    if process.returncode != 0:
        raise Exception("call to dbl.sh failed!")

    # int() accepts the bytes output directly, ignoring the trailing newline
    return int(process.stdout)

if __name__ == "__main__":
    add_credentials()
    htmap.settings["DELIVERY_METHOD"] = "assume"
    new_map = jobs.map(range(5))
    new_map.wait(show_progress_bar=True)
    print(list(new_map))

The dbl.sh script reads the multiplication factor from a file on EOS and multiplies the map input by it:

#!/bin/bash

FACTOR=$(cat /eos/project/l/lxbatch/public/batchdocs/factor)
echo $(($FACTOR * $1))
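Before submitting, you can exercise the wrapper logic of jobs(i) locally. This sketch is a stand-in and not part of the upstream example. It writes a hypothetical copy of dbl.sh into a temporary directory, with the factor hard-coded to 2 instead of read from EOS. It invokes the script through sh rather than bash so that it also works on minimal systems. The script only uses POSIX arithmetic, so the result is the same. The helper names run_wrapper and check_locally are illustrative.

```python
import os
import subprocess
import tempfile

# Hypothetical stand-in for dbl.sh: FACTOR is fixed to 2 instead of being
# read from the EOS file, so the check works on any machine.
SCRIPT = "FACTOR=2\necho $(($FACTOR * $1))\n"


def run_wrapper(script_path: str, i: int) -> int:
    """Same logic as jobs(i): run the script and parse its stdout as an int."""
    process = subprocess.run(
        ["sh", script_path, str(i)],
        stdout=subprocess.PIPE,
    )
    if process.returncode != 0:
        raise RuntimeError(f"call to {script_path} failed!")
    return int(process.stdout)


def check_locally(n: int) -> list:
    """Write the stand-in script to a temporary directory and run it n times."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "dbl.sh")
        with open(path, "w") as f:
            f.write(SCRIPT)
        return [run_wrapper(path, i) for i in range(n)]


print(check_locally(5))  # [0, 2, 4, 6, 8]
```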

Running this example should produce output similar to the following (the map name and timings will differ):

lxplus:$ python3 docs.py
bland-quick-frog:   0%|                  | 0/5 [00:04<?, ?component
bland-quick-frog:  60%|############2     | 3/5 [00:58<00:02,  1.08s/compone
bland-quick-frog: 100%|##################| 5/5 [01:13<00:00, 14.71s/component]
[0, 2, 4, 6, 8]