GPUs
GPU resources are available and can be accessed through HTCondor. To see which GPU resources are available, run the following command:
condor_status -constraint '!isUndefined(DetectedGPUs)' -compact -af Machine CUDADeviceName DetectedGPUs
NB: This command lists nodes running CC7 with HTCondor 9. With HTCondor 10 on AlmaLinux 9, the ClassAd attribute CUDADeviceName is renamed to GPUs_DeviceName, so the command becomes:
condor_status -constraint '!isUndefined(DetectedGPUs)' -compact -af Machine GPUs_DeviceName DetectedGPUs
Requesting GPU resources requires a single addition to the submit file.
request_gpus = 1
To request more than one GPU, specify a number n, where 0 < n < 5:
request_gpus = n
The following examples show how to submit a Hello World GPU job, a simple TensorFlow job, a job in Docker and a job in Singularity.
Running on specific platforms
The Batch Service offers a variety of GPU platforms in HTCondor. Depending on your use case, you may want to run on specific models with the right capabilities for your jobs.
HTCondor automatically publishes some CUDA attributes on our GPU machines that you can use in the requirements attribute of your submit file. The following examples show some of the possible options:
- Requirements based on device: you can use the machine ClassAd attribute CUDADeviceName (GPUs_DeviceName with HTCondor 10, as noted above) to match the GPU type you need:
    - requirements = regexp("V100", TARGET.GPUs_DeviceName): this expression will allow your job to run on our V100 or V100S cards.
    - requirements = TARGET.CUDADeviceName =?= "Tesla T4": this expression will make your job run only on Tesla T4 cards.
    - requirements = regexp("A100", TARGET.CUDADeviceName): this expression will allow your job to run on our A100 cards.
- Requirements based on compute capabilities: the compute capability version of the device is published by HTCondor in the machine ClassAd attribute CUDACapability and can also be used in your submit file:
    - requirements = TARGET.CUDACapability =?= 7.0: this expression will allow your job to run on our V100 or V100S cards.
    - requirements = TARGET.CUDACapability =?= 7.5: this expression will make your job run only on Tesla T4 cards.
While device names are friendlier in terms of human readability, using compute capabilities can be more flexible in the long term, as you won't have to update your jobs if we add more hardware that matches the desired capability version.
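For example, combining one of the expressions above with a GPU request, a minimal submit file could look like the following sketch (the executable name gpu_job.sh is illustrative; substitute whichever requirements expression matches the cards you need):
executable = gpu_job.sh
output = gpu_job.$(ClusterId).$(ProcId).out
error = gpu_job.$(ClusterId).$(ProcId).err
log = gpu_job.$(ClusterId).log
request_GPUs = 1
request_CPUs = 1
requirements = TARGET.CUDACapability =?= 7.0
queue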
Matrix
This exercise will run a matrix operation job using the GPU. Create a python file called matrix.py with the following contents:
import numpy as np
from timeit import default_timer as timer
from numba import vectorize

@vectorize(['float32(float32, float32)'], target='cuda')
def pow(a, b):
    return a ** b

vec_size = 100000000

a = b = np.array(np.random.sample(vec_size), dtype=np.float32)
c = np.zeros(vec_size, dtype=np.float32)

start = timer()
c = pow(a, b)
duration = timer() - start

print(duration)
This takes a vector of 100000000 random numbers and raises each element to the power of the corresponding element of a second, identically sized vector (in this example both operands are the same random vector). The vectorize decorator with target='cuda' ensures that the GPU is used for the computation.
Create a file matrix.sh with the following contents:
#!/bin/bash
python -m virtualenv myvenv
source myvenv/bin/activate
pip install numba
python matrix.py
Make this script executable:
chmod +x matrix.sh
This will create a virtual Python environment, activate it, install the necessary Python packages and then run our program.
Create a submit file named matrix.sub with the following contents:
executable = matrix.sh
arguments = $(ClusterId)$(ProcId)
output = matrix.$(ClusterId).$(ProcId).out
error = matrix.$(ClusterId).$(ProcId).err
log = matrix.$(ClusterId).log
should_transfer_files = YES
transfer_input_files = matrix.py
when_to_transfer_output = ON_EXIT
request_GPUs = 1
request_CPUs = 1
queue
Note the request_GPUs line, which ensures the job is assigned to a node with a GPU.
Submit the job with the following command:
condor_submit matrix.sub
For comparison, the same code can be run without the GPU optimisation (the @vectorize decorator) to compare the run times, as in the sketch below.
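A minimal CPU-only variant (a sketch, saved for instance as matrix_cpu.py, a hypothetical file name) simply drops the decorator and relies on NumPy's element-wise power:
import numpy as np
from timeit import default_timer as timer

vec_size = 100000000

# Same operands as before; both names refer to the same random vector.
a = b = np.array(np.random.sample(vec_size), dtype=np.float32)

start = timer()
c = a ** b  # element-wise power computed on the CPU by NumPy
duration = timer() - start

print(duration)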
TensorFlow
This is an example of how to submit a simple TensorFlow job.
Create a python file tf_matmul.py (based on the example found here):
import tensorflow as tf
import time

print("GPU Available: ", tf.test.is_gpu_available())

matrix1 = tf.constant([1.0, 2.0, 3.0, 4.0], shape=[2, 2])
matrix2 = tf.matrix_inverse(matrix1)
product = tf.matmul(matrix1, matrix2)

# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(product))

with tf.Session() as sess:
    t0 = time.time()
    result = sess.run(product)
    t1 = time.time()

print("result of matrix multiplication")
print("===============================")
print(result)
print("Time: ", t1 - t0)
print("===============================")
Prepare the python environment prior to submission:
Note that the following commands will not work on lxplus7. On lxplus7, you would first need to run scl enable devtoolset-9 bash to activate a modern gcc version. On lxplus, this modern gcc is already present.
python3 -m venv myvenv
source myvenv/bin/activate
pip install tensorflow
pip install tensorflow-gpu
Create a bash script tf_matmul.sh with the following contents, making sure you use the full absolute path that points to the myvenv directory you just created.
#!/bin/bash
source /afs/cern.ch/user/LETTER/YOURUSER/PATH/myvenv/bin/activate
python tf_matmul.py
Make this script executable:
chmod +x tf_matmul.sh
This will activate the Python environment prepared above and run the Python script. Create the submit file tf_matmul.sub:
executable = tf_matmul.sh
arguments = $(ClusterId)$(ProcId)
output = tf_matmul.$(ClusterId).$(ProcId).out
error = tf_matmul.$(ClusterId).$(ProcId).err
log = tf_matmul.$(ClusterId).log
transfer_input_files = tf_matmul.py
when_to_transfer_output = ON_EXIT
request_GPUs = 1
request_CPUs = 1
queue
Submit the job with the following command:
condor_submit tf_matmul.sub
Docker
This is an example of how to run a TensorFlow job using the TensorFlow-gpu docker image. This example is based on this blogpost.
Create a python file docker.py with the following contents:
#!/usr/bin/env python
import sys
import numpy as np
import tensorflow as tf
from datetime import datetime

print("GPU Available: ", tf.test.is_gpu_available())

device_name = "/gpu:0"

with tf.device(device_name):
    random_matrix = tf.random.uniform(shape=[4, 4], minval=0, maxval=1)
    dot_operation = tf.matmul(random_matrix, tf.transpose(random_matrix))
    sum_operation = tf.reduce_sum(dot_operation)

startTime = datetime.now()
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as session:
    result = session.run(sum_operation)
    print(result)

# It can be hard to see the results on the terminal with lots of output -- add some newlines to improve readability.
print("\n" * 5)
print("Device:", device_name)
print("Time taken:", str(datetime.now() - startTime))
Make this script executable:
chmod +x docker.py
Create the submit file docker.sub with the following contents:
universe = docker
docker_image = tensorflow/tensorflow:1.15.0-gpu
executable = docker.py
arguments = /etc/hosts
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
output = $(ClusterId).$(ProcId).out
error = $(ClusterId).$(ProcId).err
log = $(ClusterId).$(ProcId).log
request_memory = 100M
request_gpus = 1
requirements = OpSysAndVer =?= "CentOS7"
queue 1
This will pull a TensorFlow GPU image that is enabled to handle GPU jobs and assign them to GPU resources. Submit the job with the following command:
condor_submit docker.sub
Singularity
Singularity is installed on all batch nodes, including those with GPUs. All that is required is to specify the location of the necessary image on CVMFS. Note that only pre-approved Singularity images are available.
Create a submit file sing.sub with the following contents:
executable = sing.sh
arguments = $(ClusterId)$(ProcId)
output = sing.$(ClusterId).$(ProcId).out
error = sing.$(ClusterId).$(ProcId).err
log = sing.$(ClusterId).log
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
request_GPUs = 1
request_CPUs = 1
+SingularityImage = "/cvmfs/unpacked.cern.ch/registry.hub.docker.com/library/centos:centos7"
queue
This requests a GPU and specifies that a CentOS7 image hosted on /cvmfs/ will be mounted. Create the script sing.sh with the following contents:
#!/bin/bash
cat /etc/centos-release
nvidia-smi
Make this script executable:
chmod +x sing.sh
This will print the CentOS version and run the nvidia-smi tool, which lists the available NVIDIA devices to show that they are accessible from within the Singularity image.
Submit the job with the following command:
condor_submit sing.sub
Interactive Jobs
Interactive jobs can be run to gain access to the resources for testing and development. There are two ways to gain interactive access: request it when the job is submitted, or attach to the job once it has already started. If interactive access is required from the moment the job is first allocated to a machine, the -interactive parameter must be specified when the job is submitted:
condor_submit -interactive gpu_job.sub
The user will then be presented with the following statement whilst the job is waiting to be assigned a machine:
Waiting for job to start...
Once the machine has been assigned, the user will have access to the terminal:
Welcome to slot1_1@b7g47n0004.cern.ch!
[jfenech@b7g47n0004 dir_21999]$ nvidia-smi
Mon Dec 9 16:39:16 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00 Driver Version: 418.87.00 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla V100-PCIE... Off | 00000000:00:05.0 Off | 0 |
| N/A 35C P0 28W / 250W | 11MiB / 32480MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
Alternatively, if you would rather submit a batch job and access the machine whilst it is running for debugging or general monitoring, you can use condor_ssh_to_job to gain access to the terminal on the machine that is currently running the job. The -auto-retry flag will periodically check whether the job has been assigned to a machine and set up the connection when it has.
condor_ssh_to_job -auto-retry jobid
For example:
[jfenech@lxplus752 Int_Hel_Mul]$ condor_ssh_to_job -auto-retry 2257625.0
Waiting for job to start...
Welcome to slot1_1@b7g47n0003.cern.ch!
Your condor job is running with pid(s) 12564.
[jfenech@b7g47n0003 dir_12469]$ nvidia-smi
Mon Dec 9 16:59:44 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01 Driver Version: 440.33.01 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla V100-PCIE... Off | 00000000:00:05.0 Off | 0 |
| N/A 37C P0 36W / 250W | 0MiB / 32510MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
Running GPU-accelerated GUI applications
In some cases, you may want to run GUI applications that are GPU-accelerated, and probably also interact with this GUI.
Condor GPU nodes are configured to support this using TurboVNC.
The procedure consists of:
- Launching an interactive GPU condor job that spawns a TurboVNC server
- Connecting to the workernode's virtual display using a TurboVNC client
This virtual display will be able to run GPU-accelerated GUI applications efficiently. However, keep in mind that while the application may run smoothly on the GPU, the framerate you see on your own computer will depend greatly on the quality of your connection to the CERN data centre. On-site, the experience should be quite good in most cases, while off-site it will depend largely on your internet connection's latency and bandwidth.
To run GPU-accelerated GUI applications, you will need to:
1. Download and install TurboVNC on your desktop from https://turbovnc.org/.
2. In your Condor script (or in the job's shell when using condor_submit -interactive), launch the TurboVNC server as follows (an example job script and submit file are sketched after this list): /opt/TurboVNC/bin/vncserver -fg -wm mate-session
3. Once the job is running, the job output will contain something similar to the following:
   Desktop 'TurboVNC: b7g47n0005.cern.ch:1 (pllopiss)' started on display b7g47n0005.cern.ch:1
   Starting applications specified in /afs/cern.ch/user/p/pllopiss/.vnc/xstartup.turbovnc
   Log file is /afs/cern.ch/user/p/pllopiss/.vnc/b7g47n0005.cern.ch:1.log
   The first time this is run, it may also ask you to set a TurboVNC password. When asked to provide a view-only password, it is safe to answer n.
4. From this output, note the workernode and display number. In the example above, b7g47n0005.cern.ch:1: the workernode is b7g47n0005.cern.ch and the display number is 1. The display number determines the port we have to connect to: 5900 + display_number. In this case that corresponds to port 5901.
5. Finally, open TurboVNC on your desktop and connect to workernode:port, in this case b7g47n0005:5901. You will be prompted for a password. Use the one that was set the first time you launched the vncserver job. If you do not remember it, you may reset it by deleting the file ~/.vnc/passwd.
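As mentioned in step 2, a minimal sketch of such a job could look like this (the file names vnc_gui.sh and vnc_gui.sub as well as the chosen job flavour are illustrative, not a prescribed setup):
#!/bin/bash
# vnc_gui.sh: run the TurboVNC server in the foreground for the lifetime of the job
/opt/TurboVNC/bin/vncserver -fg -wm mate-session
with a submit file along these lines:
executable = vnc_gui.sh
output = vnc_gui.$(ClusterId).$(ProcId).out
error = vnc_gui.$(ClusterId).$(ProcId).err
log = vnc_gui.$(ClusterId).log
request_GPUs = 1
request_CPUs = 1
+JobFlavour = "longlunch"
queue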
At this point, you should have a virtual desktop with GPU acceleration for the duration of your job.
To run an application, open the terminal in the virtual desktop and prefix the command with vglrun to get the best performance, e.g. vglrun glxgears or vglrun /afs/cern.ch/user/u/my_gpu_application. If the command hangs, try without vglrun.
As a special case, if you are connecting from outside of the CERN network, an additional step is required after Step 4: you will need to establish an ssh tunnel.
From your desktop, open an ssh tunnel to the workernode and port obtained in Step 4, for example: ssh -L 5901:b7g47n0005:5901 lxplus.cern.ch. This ssh session is not for typing commands; it simply keeps the tunnel open, so that any connection to your desktop's port 5901 is tunneled to the workernode's port 5901.
Once this ssh session is established, you may proceed with Step 5, except that you always use localhost instead of the workernode. In the example above, this corresponds to connecting to localhost:5901.
Please note that closing this ssh session will cut the ssh tunnel, and therefore also the TurboVNC session.
Tip: If your virtual desktop session is laggy, clunky or slow, you may have to lower the virtual desktop's quality to get a smoother framerate. In TurboVNC, before opening the connection, press the Options... button to open the settings window. In the settings window's Encoding tab, you can choose an encoding method that fits your internet connection. For instance, choosing Custom and lowering the JPEG image quality to, say, 20 may provide good results even when connecting remotely over a slow internet connection.
Please note that X11 forwarding does not provide a GPU-accelerated solution, hence the use of TurboVNC.
Monitoring
You can monitor your jobs via condor_q in the normal way, and you can see how many other jobs are currently in the queue requesting GPUs:
condor_q -global -all -const '(requestGPUs > 0)'
There is a custom Grafana dashboard depicting information on the GPUs here. There are two time-series heatmaps dedicated to short and long time periods respectively: changing the time period in the drop-down menu at the top right determines which of the two graphs has data. Another drop-down menu at the top left selects the metric displayed in the heatmaps.
Notes
Please note that benchmark jobs are currently not compatible with GPU nodes: all GPU VMs are identical, so benchmark jobs are redundant. Jobs marked as both benchmark and GPU jobs will not be scheduled.
Machines are currently separated into sub-hostgroups dedicated to different job lengths. This prevents long-running jobs from blocking the resources, so that less demanding jobs have a chance to be scheduled. The flavours available are 'espresso', 'longlunch' and 'nextweek' (see the appropriate section of the tutorial for more information on job flavours). Short jobs can run on long-job nodes, but long jobs cannot run on short-job nodes, so you will maximise the chance that your job is scheduled by using as short a job flavour as you can.
For example, include in your submit file:
+JobFlavour = "espresso"