GPU application examples with HTCondor
Below you will find some basic examples of how to run popular GPU applications under HTCondor. You may also try them interactively on lxplus.
Matrix
This exercise runs an element-wise vector operation on the GPU. Create a Python file called matrix.py with the following contents:
import numpy as np
from timeit import default_timer as timer
from numba import vectorize

# Compile an element-wise function (a ufunc) for the GPU via the CUDA target
@vectorize(['float32(float32, float32)'], target='cuda')
def vec_pow(a, b):
    return a ** b

vec_size = 100000000
# Two independent random vectors of the same size
a = np.random.sample(vec_size).astype(np.float32)
b = np.random.sample(vec_size).astype(np.float32)

start = timer()
c = vec_pow(a, b)  # includes the host<->device copies
duration = timer() - start
print(duration)
This takes a vector of 100000000 random numbers and raises each element to the power of the corresponding element of a second, independently generated vector of the same size. The @vectorize decorator with target='cuda' compiles the function into a CUDA kernel, so the element-wise operation runs on the GPU.
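As a rough estimate of the job's footprint: a float32 element takes 4 bytes, so each vector of 100000000 elements occupies about 400 MB, and the two inputs plus the result need about 1.2 GB of host memory (and roughly the same on the GPU while the kernel runs, since Numba copies the inputs to the device and allocates the output there). The arithmetic as a small Python sketch:

```python
# Rough memory estimate for the arrays in matrix.py (assumes float32, 4 bytes per element)
vec_size = 100_000_000
bytes_per_element = 4  # np.float32
n_arrays = 3  # the two input vectors and the result

bytes_per_array = vec_size * bytes_per_element
total_bytes = n_arrays * bytes_per_array

print(f"per array: {bytes_per_array / 1e6:.0f} MB")  # prints "per array: 400 MB"
print(f"total:     {total_bytes / 1e9:.1f} GB")      # prints "total:     1.2 GB"
```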
Create a file matrix.sh with the following contents:
#!/bin/bash
python3 -m venv myvenv
source myvenv/bin/activate
pip install numba
python matrix.py
Make this script executable:
chmod +x matrix.sh
This creates a Python virtual environment on the worker node, activates it, installs the required Python packages and then runs the program.
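If you prepare job files from a Python script rather than by hand, the chmod +x step can also be done with the standard library. A minimal sketch (make_executable is a hypothetical helper name) that adds execute permission for user, group and others, like chmod a+x:

```python
import os
import stat


def make_executable(path):
    """Add execute permission for user, group and others (like `chmod a+x path`)."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


# Only touch the script if it exists in the current directory
if os.path.exists("matrix.sh"):
    make_executable("matrix.sh")
```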
Create a submit file named matrix.sub with the following contents:
executable              = matrix.sh
arguments               = $(ClusterId)$(ProcId)
output                  = matrix.$(ClusterId).$(ProcId).out
error                   = matrix.$(ClusterId).$(ProcId).err
log                     = matrix.$(ClusterId).log
should_transfer_files   = YES
transfer_input_files    = matrix.py 
when_to_transfer_output = ON_EXIT
request_GPUs = 1
request_CPUs = 1
queue
Note the request_GPUs line, which ensures that the job is matched to a node with a GPU.
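To confirm from inside the job that the requested GPU is actually visible to Numba, you could add a check near the top of matrix.py. A minimal sketch using numba.cuda.is_available(); the helper name cuda_available is hypothetical, and the ImportError branch only matters when Numba is not installed:

```python
def cuda_available():
    """Return True if Numba is installed and can see a usable CUDA device."""
    try:
        from numba import cuda
    except ImportError:
        # Numba is not installed in this environment
        return False
    return bool(cuda.is_available())


print("CUDA available:", cuda_available())
```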
Submit the job with the following command:
condor_submit matrix.sub
For comparison, you can run the same code without GPU acceleration by removing the @vectorize line (a ** b is then evaluated by NumPy on the CPU) and compare the run times.
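A single timer reading includes one-off costs such as CUDA initialisation and copying the vectors to and from the GPU, so a fairer comparison times each version a few times and keeps the fastest run. A minimal sketch of such a helper (best_time is a hypothetical name, not part of Numba or NumPy):

```python
from timeit import default_timer as timer


def best_time(func, *args, repeats=3):
    """Call func(*args) `repeats` times and return the fastest wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = timer()
        func(*args)
        best = min(best, timer() - start)
    return best


# Example with a trivial CPU function
print(best_time(lambda x: [v ** 2 for v in x], list(range(1000))))
```

In matrix.py you might pass the @vectorize-decorated function together with a and b, and then lambda x, y: x ** y with the same a and b for the NumPy version.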
TensorFlow
This is an example of how to submit a simple TensorFlow job.
Create a python file tf_matmul.py (based on the example found here):
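import time
import tensorflow as tf

# This script uses the TensorFlow 1.x graph/session API; with TensorFlow 2.x
# (what `pip install tensorflow` installs) it is available through tf.compat.v1.
tf.compat.v1.disable_eager_execution()

print("GPU Available: ", len(tf.config.list_physical_devices("GPU")) > 0)

matrix1 = tf.constant([1.0, 2.0, 3.0, 4.0], shape=[2, 2])
matrix2 = tf.linalg.inv(matrix1)
product = tf.matmul(matrix1, matrix2)

# Creates a session with log_device_placement set to True, so the device
# (CPU or GPU) that each op runs on is logged.
config = tf.compat.v1.ConfigProto(log_device_placement=True)
with tf.compat.v1.Session(config=config) as sess:
    # First run: initialises the device and prints the result
    print(sess.run(product))
    # Second run: timed
    t0 = time.time()
    result = sess.run(product)
    t1 = time.time()
    print("result of matrix multiplication")
    print("===============================")
    print(result)
    print("Time: ", t1 - t0)
print("===============================")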
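Since matrix2 is the inverse of matrix1, the printed result should be the 2x2 identity matrix, up to float32 rounding on the GPU. A plain-Python check of that expectation (no TensorFlow involved), using the closed-form inverse of a 2x2 matrix; inverse_2x2 and matmul are illustrative helpers:

```python
def inverse_2x2(m):
    """Closed-form inverse of a 2x2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c  # -2.0 for [[1, 2], [3, 4]]
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul(x, y):
    """Naive matrix product of two lists of lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


# Same values and row-major layout as tf.constant([1.0, 2.0, 3.0, 4.0], shape=[2, 2])
matrix1 = [[1.0, 2.0], [3.0, 4.0]]
product = matmul(matrix1, inverse_2x2(matrix1))
print(product)  # prints [[1.0, 0.0], [0.0, 1.0]]
```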
Prepare the python environment prior to submission:
Note that the following commands will not work on lxplus7. On lxplus7, you would first need to run scl enable devtoolset-9 bash to activate a modern gcc version; on lxplus this gcc is already present. Current TensorFlow 2.x releases ship GPU support in the tensorflow package itself, so the separate tensorflow-gpu package is no longer needed.
python3 -m venv myvenv
source myvenv/bin/activate
pip install tensorflow
Create a bash script tf_matmul.sh with the following contents, making sure you use the full absolute path that points to the myvenv directory you just created.
#!/bin/bash
source /afs/cern.ch/user/LETTER/YOURUSER/PATH/myvenv/bin/activate
python tf_matmul.py
Make this script executable:
chmod +x tf_matmul.sh
The virtual environment lives on AFS, which is also available on the batch nodes, so the script only activates it before running the Python script; no packages are installed at run time. Create the submit file tf_matmul.sub:
executable              = tf_matmul.sh
arguments               = $(ClusterId)$(ProcId)
output                  = tf_matmul.$(ClusterId).$(ProcId).out
error                   = tf_matmul.$(ClusterId).$(ProcId).err
log                     = tf_matmul.$(ClusterId).log
should_transfer_files   = YES
transfer_input_files    = tf_matmul.py
when_to_transfer_output = ON_EXIT
request_GPUs = 1
request_CPUs = 1
queue
Submit the job with the following command:
condor_submit tf_matmul.sub
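If the job fails because tensorflow cannot be imported, the most likely cause is a wrong activate path in tf_matmul.sh. A minimal, standard-library-only check you could put at the top of tf_matmul.py; in_virtualenv is a hypothetical helper, and it relies on sys.prefix differing from sys.base_prefix inside an environment created with python3 -m venv:

```python
import sys


def in_virtualenv():
    """True when running inside a venv (sys.prefix differs from the base installation)."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("Python executable:", sys.executable)
print("Inside a virtual environment:", in_virtualenv())
```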
Docker
This is an example of how to run a TensorFlow job using the TensorFlow-gpu docker image. This example is based on this blogpost.
Create a python file docker.py with the following contents:
#!/usr/bin/env python
import tensorflow as tf
from datetime import datetime
print("GPU Available: ", tf.test.is_gpu_available())
device_name = "/gpu:0"
with tf.device(device_name):
    random_matrix = tf.random.uniform(shape=[4,4], minval=0, maxval=1)
    dot_operation = tf.matmul(random_matrix, tf.transpose(random_matrix))
    sum_operation = tf.reduce_sum(dot_operation)
startTime = datetime.now()
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as session:
    result = session.run(sum_operation)
    print(result)
# It can be hard to see the results on the terminal with lots of output -- add some newlines to improve readability.
print("\n" * 5)
print("Device:", device_name)
print("Time taken:", str(datetime.now() - startTime))
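docker.py sums all entries of A·A^T for a random 4x4 matrix A. Entry (i, j) of A·A^T is the dot product of rows i and j of A, so the total equals the sum over columns k of (column sum of column k)², which is a handy way to check the result by hand. A plain-Python sketch of the same computation (no TensorFlow involved), verifying that identity:

```python
import random

# Plain-Python version of the docker.py computation: sum of all entries of A @ A^T
A = [[random.uniform(0, 1) for _ in range(4)] for _ in range(4)]

dot = [[sum(A[i][k] * A[j][k] for k in range(4)) for j in range(4)] for i in range(4)]
total = sum(sum(row) for row in dot)

# Same value via column sums: sum_k (sum_i A[i][k]) ** 2
col_sums = [sum(A[i][k] for i in range(4)) for k in range(4)]
total_check = sum(s * s for s in col_sums)

print(total, total_check)
```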
Make this script executable:
chmod +x docker.py
Create the submit file docker.sub with the following contents:
universe                = docker
docker_image            = tensorflow/tensorflow:1.15.0-gpu
executable              = docker.py
arguments               = /etc/hosts
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
output                  = $(ClusterId).$(ProcId).out
error                   = $(ClusterId).$(ProcId).err
log                     = $(ClusterId).$(ProcId).log
request_memory          = 100M
request_gpus            = 1
+Requirements           = OpSysAndVer =?= "CentOS7"
queue 1
HTCondor pulls the GPU-enabled TensorFlow image and runs docker.py inside it on a node with a GPU. Submit the job with the following command:
condor_submit docker.sub
Singularity
Singularity is installed on all batch nodes, including those with GPUs, so all that is required is to specify the location of the image on CVMFS. Note that only pre-approved Singularity images are available.
Create a submit file sing.sub with the following contents:
executable              = sing.sh
arguments               = $(ClusterId)$(ProcId)
output                  = sing.$(ClusterId).$(ProcId).out
error                   = sing.$(ClusterId).$(ProcId).err
log                     = sing.$(ClusterId).log
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
request_GPUs = 1
request_CPUs = 1
+SingularityImage = "/cvmfs/unpacked.cern.ch/registry.hub.docker.com/library/centos:centos7"
queue
This requests a GPU and specifies that the job runs inside a CentOS7 image hosted on /cvmfs/. Create the script sing.sh with the following contents:
#!/bin/bash
cat /etc/centos-release
nvidia-smi
Make this script executable:
chmod +x sing.sh
This prints the CentOS release and runs the nvidia-smi tool, which lists the NVIDIA devices and so shows that they are available inside the Singularity container.
Submit the job with the following command:
condor_submit sing.sub
Interactive Jobs
Interactive jobs give you access to the resources for testing and development. Interactive access can be gained in two ways: requested at job submission, or once the job has already started. If you need interactive access from the moment the job is assigned a machine, specify the -interactive option when submitting the job:
condor_submit -interactive gpu_job.sub
The user will then be presented with the following statement whilst the job is waiting to be assigned a machine:
Waiting for job to start...
Once the machine has been assigned, the user will have access to the terminal:
Welcome to slot1_1@b7g47n0004.cern.ch!
[jfenech@b7g47n0004 dir_21999]$ nvidia-smi
Mon Dec  9 16:39:16 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00    Driver Version: 418.87.00    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  Off  | 00000000:00:05.0 Off |                    0 |
| N/A   35C    P0    28W / 250W |     11MiB / 32480MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
Alternatively, if you would rather submit a batch job and access the machine while it is running, for debugging or general monitoring, you can use condor_ssh_to_job to open a terminal on the machine that is running the job. The -auto-retry flag makes it periodically check whether the job has been assigned to a machine and set up the connection once it has.
condor_ssh_to_job -auto-retry jobid
For example:
[jfenech@lxplus752 Int_Hel_Mul]$ condor_ssh_to_job -auto-retry 2257625.0 
Waiting for job to start...
Welcome to slot1_1@b7g47n0003.cern.ch!
Your condor job is running with pid(s) 12564.
[jfenech@b7g47n0003 dir_12469]$ nvidia-smi
Mon Dec  9 16:59:44 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  Off  | 00000000:00:05.0 Off |                    0 |
| N/A   37C    P0    36W / 250W |      0MiB / 32510MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
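The jobid passed to condor_ssh_to_job has the form ClusterId.ProcId, as in 2257625.0 above (ProcId 0 of cluster 2257625). If you script your submissions, a helper along these lines could build the command from condor_submit's output. This is a sketch: the helper name is hypothetical, and it assumes condor_submit's usual "N job(s) submitted to cluster C." summary line:

```python
import re


def ssh_command_from_submit_output(output, proc_id=0):
    """Build a condor_ssh_to_job command from condor_submit's summary line (format assumed)."""
    match = re.search(r"submitted to cluster (\d+)", output)
    if match is None:
        raise ValueError("no cluster id found in condor_submit output")
    job_id = f"{match.group(1)}.{proc_id}"
    return ["condor_ssh_to_job", "-auto-retry", job_id]


sample = "Submitting job(s).\n1 job(s) submitted to cluster 2257625.\n"
print(" ".join(ssh_command_from_submit_output(sample)))  # prints "condor_ssh_to_job -auto-retry 2257625.0"
```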