GPUs

GPU resources are available for use and can be accessed using HTCondor. To view what resources are available, run the following command:

condor_status -constraint '(DetectedGPUs > 0)' -compact -af Machine CUDADeviceName DetectedGPUs

Requesting GPU resources requires a single addition to the submit file.

request_gpus            = 1

To request more than one GPU in HTCondor, replace the number with n, where 0 < n < 5 (i.e. at most four GPUs per job):

request_gpus            = n
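On more recent HTCondor versions (9.x and later), the submit language also provides require_gpus to constrain which GPUs the job will match. The property name below is an illustrative sketch; check the HTCondor version and GPU attributes available at your site:

```
request_gpus = 1
# Illustrative constraint: only match GPUs with at least 4 GB of device memory
require_gpus = GlobalMemoryMb >= 4096
```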

The following examples show how to submit a Hello World GPU job, a simple TensorFlow job, a job in Docker and a job in Singularity.

Matrix

This exercise will run a matrix operation job using the GPU. Create a python file called matrix.py with the following contents:

import numpy as np
from timeit import default_timer as timer
from numba import vectorize

@vectorize(['float32(float32, float32)'], target='cuda')
def pow(a, b):
    return a ** b

vec_size = 100000000
a = b = np.array(np.random.sample(vec_size), dtype=np.float32)
c = np.zeros(vec_size, dtype=np.float32)
start = timer()
c = pow(a,b)
duration = timer() - start
print(duration)

This creates a vector of 100000000 random numbers and raises each element to the power of the corresponding element of a second vector of the same size (note that a and b here refer to the same array). The @vectorize decorator, with target='cuda', ensures that the computation runs on the GPU.
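Conceptually, @vectorize promotes the scalar function into an elementwise operation over whole arrays; a plain-Python sketch of the same semantics (no numba or GPU involved, for illustration only):

```python
# Elementwise application of a scalar function: this is what @vectorize
# compiles and runs on the GPU when target='cuda' is given.
def pow_scalar(a, b):
    return a ** b

a = [0.5, 1.5, 2.0]
b = [2.0, 2.0, 3.0]
c = [pow_scalar(x, y) for x, y in zip(a, b)]
print(c)  # [0.25, 2.25, 8.0]
```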

Create a file matrix.sh with the following contents:

#!/bin/bash
python -m virtualenv myvenv
source myvenv/bin/activate
pip install numba
python matrix.py

Make this script executable:

chmod +x matrix.sh

This will create a Python virtual environment, activate it, install the necessary Python packages and then run our program.

Create a submit file named matrix.sub with the following contents:

executable              = matrix.sh
arguments               = $(ClusterId)$(ProcId)
output                  = matrix.$(ClusterId).$(ProcId).out
error                   = matrix.$(ClusterId).$(ProcId).err
log                     = matrix.$(ClusterId).log
should_transfer_files   = YES
transfer_input_files    = matrix.py 
when_to_transfer_output = ON_EXIT
request_GPUs = 1
request_CPUs = 1
queue

Note the request_GPUs line which will ensure the job is assigned to a node with a GPU.

Submit the job with the following command:

condor_submit matrix.sub
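Once the job is running, you can check which device it was given from inside the job script. A minimal sketch, assuming a recent HTCondor version that exports the assigned devices to the job environment via CUDA_VISIBLE_DEVICES:

```shell
#!/bin/bash
# HTCondor restricts a GPU job to its assigned devices by setting
# CUDA_VISIBLE_DEVICES; print it, with a placeholder outside a GPU job.
echo "Assigned GPUs: ${CUDA_VISIBLE_DEVICES:-<none set>}"
```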

For comparison, we could run the same code without GPU acceleration, by removing the @vectorize decorator (or changing its target to 'cpu'), and compare the run times.

TensorFlow

This is an example of how to submit a simple TensorFlow job.

Create a python file tf_matmul.py (based on the example found here):

import tensorflow as tf
import time

print("GPU Available: ", tf.test.is_gpu_available())

matrix1 = tf.constant([1.0,2.0,3.0,4.0], shape=[2, 2])
matrix2 = tf.matrix_inverse(matrix1)
product = tf.matmul(matrix1, matrix2)

# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(product))

with tf.Session() as sess:
    t0 = time.time()
    result = sess.run(product)
    t1 = time.time()
    print("result of matrix multiplication")
    print("===============================")
    print(result)
    print("Time: ", t1-t0)
print("===============================")

Create a bash script tf_matmul.sh with the following contents:

#!/bin/bash
python -m virtualenv myvenv
source myvenv/bin/activate
# tensorflow-gpu provides the GPU-enabled build; installing the plain
# tensorflow package as well is unnecessary
pip install tensorflow-gpu
python tf_matmul.py

Make this script executable:

chmod +x tf_matmul.sh

This will set up a Python environment and install the necessary packages before running the Python script. Create the submit file tf_matmul.sub:

executable              = tf_matmul.sh
arguments               = $(ClusterId)$(ProcId)
output                  = tf_matmul.$(ClusterId).$(ProcId).out
error                   = tf_matmul.$(ClusterId).$(ProcId).err
log                     = tf_matmul.$(ClusterId).log
transfer_input_files    = tf_matmul.py 
when_to_transfer_output = ON_EXIT
request_GPUs = 1
request_CPUs = 1
queue

Submit the job with the following command:

condor_submit tf_matmul.sub

Docker

This is an example of how to run a TensorFlow job using the TensorFlow-gpu docker image. This example is based on this blogpost.

Create a python file docker.py with the following contents:

import tensorflow as tf
from datetime import datetime

print("GPU Available: ", tf.test.is_gpu_available())

device_name = "/gpu:0"
shape = [4, 4]
with tf.device(device_name):
    random_matrix = tf.random.uniform(shape=shape, minval=0, maxval=1)
    dot_operation = tf.matmul(random_matrix, tf.transpose(random_matrix))
    sum_operation = tf.reduce_sum(dot_operation)

startTime = datetime.now()
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as session:
    result = session.run(sum_operation)
    print(result)

# It can be hard to see the results on the terminal with lots of output -- add some newlines to improve readability.
print("\n" * 5)
print("Shape:", shape, "Device:", device_name)
print("Time taken:", str(datetime.now() - startTime))

Create a bash script docker.sh:

#!/bin/bash
# The tensorflow/tensorflow:latest-gpu image already ships TensorFlow,
# so no virtual environment or pip install is needed inside the container.
python docker.py

Make this script executable:

chmod +x docker.sh

Create the submit file docker.sub with the following contents:

universe                = docker
docker_image            = tensorflow/tensorflow:latest-gpu
executable              = docker.sh
transfer_input_files    = docker.py
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
output                  = $(ClusterId).$(ProcId).out
error                   = $(ClusterId).$(ProcId).err
log                     = $(ClusterId).$(ProcId).log
request_memory          = 100M
request_gpus            = 1
+Requirements           = OpSysAndVer =?= "CentOS7"
queue 1

This will pull the latest GPU-enabled TensorFlow image and run the job on a node with a GPU. Submit the job with the following command:

condor_submit docker.sub

Singularity

Singularity is installed on all batch nodes, including those with GPUs. All that is required is to specify the location of the desired image on CVMFS. Note that only pre-approved Singularity images are available.

Create the following submit file sing.sub with the following contents:

executable              = sing.sh
arguments               = $(ClusterId)$(ProcId)
output                  = sing.$(ClusterId).$(ProcId).out
error                   = sing.$(ClusterId).$(ProcId).err
log                     = sing.$(ClusterId).log
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
request_GPUs = 1
request_CPUs = 1
+SingularityImage = "/cvmfs/unpacked.cern.ch/registry.hub.docker.com/library/centos:centos7"
queue

This requests a GPU and specifies that a CentOS7 image hosted on /cvmfs will be mounted. Create the script sing.sh with the following contents:

#!/bin/bash
cat /etc/centos-release
nvidia-smi

Make this script executable:

chmod +x sing.sh

This will print the CentOS version and run the nvidia-smi tool, which lists the available NVIDIA devices, showing that they are accessible from within the Singularity container.

Submit the job with the following command:

condor_submit sing.sub

Interactive Jobs

Interactive jobs can be run to gain access to the resource for testing and development. Interactive access can be obtained in two ways: requested at the time of job submission, or once the job has already started. If interactive access is required from the moment the user is first allocated to a machine, specify the -interactive parameter when the job is submitted:

condor_submit -interactive gpu_job.sub

The user will then be presented with the following statement whilst the job is waiting to be assigned a machine:

Waiting for job to start...

Once the machine has been assigned, the user will have access to the terminal:

Welcome to slot1_1@b7g47n0004.cern.ch!

[jfenech@b7g47n0004 dir_21999]$ nvidia-smi
Mon Dec  9 16:39:16 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00    Driver Version: 418.87.00    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  Off  | 00000000:00:05.0 Off |                    0 |
| N/A   35C    P0    28W / 250W |     11MiB / 32480MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Alternatively, if you would rather submit a batch job and access the machine whilst it is running, for debugging or general monitoring, you can use condor_ssh_to_job to gain access to a terminal on the machine that is currently running the job. The -auto-retry flag periodically checks whether the job has been assigned to a machine and sets up the connection when it has.

condor_ssh_to_job -auto-retry jobid

For example:

[jfenech@lxplus752 Int_Hel_Mul]$ condor_ssh_to_job -auto-retry 2257625.0 
Waiting for job to start...
Welcome to slot1_1@b7g47n0003.cern.ch!
Your condor job is running with pid(s) 12564.
[jfenech@b7g47n0003 dir_12469]$ nvidia-smi
Mon Dec  9 16:59:44 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  Off  | 00000000:00:05.0 Off |                    0 |
| N/A   37C    P0    36W / 250W |      0MiB / 32510MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Running GPU-accelerated GUI applications

In some cases, you may want to run GUI applications that are GPU-accelerated, and probably also interact with the GUI. Condor GPU nodes are configured to support this via TurboVNC.
The procedure consists of:

  • Launching an interactive GPU condor job that spawns a TurboVNC server
  • Connecting to the workernode's virtual display using a TurboVNC client

This virtual display will be able to run GPU-accelerated GUI applications efficiently. However, keep in mind that while the application may run smoothly on the GPU, the framerate you see on your own computer will depend greatly on the quality of your connection to the CERN datacentre. On-site, the experience should be quite good in most cases; off-site, it will depend on your internet connection's latency and bandwidth.

To run GPU accelerated GUI applications, you will need to:

  1. Download and install TurboVNC on your desktop from https://turbovnc.org/.
  2. In your Condor script (or in the job's shell when using condor_submit -interactive), launch the turbovnc server as follows: /opt/TurboVNC/bin/vncserver -fg -wm mate-session

  3. Once the job is running, the job output will contain something similar to the following:

    Desktop 'TurboVNC: b7g47n0005.cern.ch:1 (pllopiss)' started on display b7g47n0005.cern.ch:1
    
    Starting applications specified in /afs/cern.ch/user/p/pllopiss/.vnc/xstartup.turbovnc
    Log file is /afs/cern.ch/user/p/pllopiss/.vnc/b7g47n0005.cern.ch:1.log
    

    The first time that this is run, it may also ask you to set a TurboVNC password. When asked to provide a view-only password, it is safe to answer no: n.

  4. From this output, please note the workernode and display number. In the example above, b7g47n0005.cern.ch:1.

    The workernode is b7g47n0005.cern.ch and the display number is 1. The display number will determine the port we have to connect to: 5900 + display_number. In this case this would correspond to port 5901.

  5. Finally, open TurboVNC on your desktop, and connect to workernode:port. In this case, b7g47n0005:5901. You will be prompted for a password. Use the one that was set the first time you launched the vncserver job. If you do not remember it, you may reset it by deleting your file ~/.vnc/passwd.
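The host, display and port bookkeeping in steps 4 and 5 can be sketched with a small helper (hypothetical, for illustration only):

```python
# Hypothetical helper: split the "host:display" string reported by the
# TurboVNC server and compute the TCP port to connect to (5900 + display).
def vnc_endpoint(display_spec):
    host, display = display_spec.rsplit(":", 1)
    return host, 5900 + int(display)

host, port = vnc_endpoint("b7g47n0005.cern.ch:1")
print(host, port)  # b7g47n0005.cern.ch 5901
```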

At this point, you should have a virtual desktop with GPU acceleration for the duration of your job. To run an application, open the terminal and prefix the command with vglrun to get the best performance, e.g. vglrun glxgears, or vglrun /afs/cern.ch/user/u/my_gpu_application. If the command hangs, try without vglrun.

As a special case, if you are connecting from outside of the CERN network, an additional step is required after Step 4: you will need to establish an SSH tunnel. From your desktop, open a tunnel to the workernode and port obtained in Step 4, for example: ssh -L 5901:b7g47n0005:5901 lxplus.cern.ch. You do not need to type any commands in this SSH session; it simply has to stay open for the tunnel to exist. It forwards any connection to your desktop's port 5901 to the workernode's port 5901. Once the session is established, you may proceed with Step 5, except that you connect to localhost instead of the workernode; in the example above, this corresponds to localhost:5901. Please note that closing this SSH session tears down the tunnel, and therefore also the TurboVNC session.

Tip: If your virtual desktop session is laggy, clunky or slow, you may have to lower the virtual desktop's quality to get a smoother framerate. In TurboVNC, before opening the connection, press the Options... button to open the settings window. In the settings window's Encoding tab, you may choose an encoding method that fits your internet connection. For instance, choosing Custom and lowering the JPEG image quality to, say, 20 may give good results even when connecting remotely over a slow internet connection.

Please note that X11 forwarding does not provide a GPU-accelerated solution, hence the use of TurboVNC.

Monitoring

You can monitor your jobs via condor_q in the normal way, and you can see how many other jobs are currently in the queue requesting GPUs:

condor_q -global -all -const '(requestGPUs > 0)'

There is a custom Grafana dashboard depicting information on the GPUs here. There are two time-series heatmaps dedicated to short and long time periods respectively: changing the time period in the drop-down menu at the top right will affect which of the two graphs has data. Another drop-down menu at the top left specifies the metric displayed in the heatmaps.

Notes

Please note that benchmark jobs are currently not compatible with GPU nodes: all GPU VMs are identical, so benchmark jobs are redundant. Jobs marked as both benchmark and GPU jobs will not be scheduled.

Machines are currently separated into sub-hostgroups dedicated to different job lengths. This prevents long-running jobs from blocking the resources, so that less demanding jobs have a chance to be scheduled. The flavours available are 'espresso', 'longlunch' and 'nextweek' (see the appropriate section in the tutorial for more information on job flavours). Short jobs can run on long-job nodes but long jobs cannot run on short-job nodes, so you will maximise the chance of your job being scheduled by using as short a job flavour as you can.

For example, include in your submit file:

+JobFlavour = "espresso"
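Putting the pieces together, a minimal GPU submit file combining a flavour with a GPU request might look like the following sketch (the executable name is illustrative; the maximum run time of each flavour is listed in the batch tutorial):

```
executable   = my_gpu_job.sh
output       = gpu.$(ClusterId).$(ProcId).out
error        = gpu.$(ClusterId).$(ProcId).err
log          = gpu.$(ClusterId).log
request_gpus = 1
+JobFlavour  = "espresso"
queue
```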