Running a TensorFlow app with GPUs and HPC


Teaching: 45 min
Exercises: 0 min
  • How can we run a container on an HPC resource?

  • Run a task on an HPC facility using Singularity.

Setting up on Zeus

First, log in using your supplied credentials.


Now we need to get the MNIST benchmark, as we did in the Nimbus section. This time we will download it directly to the scratch directory.

git clone
cd models/tutorials/image/mnist
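
Next, create a SLURM jobscript named minst.slm (the file name used in the sbatch command later in this section) with the following contents:

#!/bin/bash --login
#SBATCH --job-name=minst_test
#SBATCH --nodes=1
#SBATCH --time=00:15:00
#SBATCH --account=courses01
#SBATCH --gres=gpu:1
#SBATCH --partition=gpuq
#SBATCH --reservation=courseq-gpu

module load broadwell
module swap gcc/4.8.5 gcc/5.5.0
module load cuda

# Scratch space for temporary files; -p avoids an error if it already exists
mkdir -p $MYSCRATCH/tmp
# Note: as given, this path ends at the mnist directory rather than at a script file
export MY_WORKSCRIPT=${MYSCRATCH}/models/tutorials/image/mnist/
export MY_CONTAINER=docker://tensorflow/tensorflow:latest-gpu

# Create the Singularity cache directory if it does not exist yet
# (the body of this check is missing in the source; mkdir -p is the presumed intent)
if [ ! -d "$SINGULARITY_CACHEDIR" ]; then
    mkdir -p "$SINGULARITY_CACHEDIR"
fi

# Launch one task on one node, running Python inside the GPU-enabled container
srun -N 1 -n 1 --export=ALL singularity exec --nv --bind /scratch ${MY_CONTAINER} python ${MY_WORKSCRIPT}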

The job launch command is not too different from a normal SLURM command line. We still launch with the number of nodes (-N 1) and tasks (-n 1) we want to use. Instead of simply running the Python script, we now launch a Singularity container, and from within that container we run Python.
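To make the difference concrete, the sketch below prints a plain SLURM launch next to the containerised one without running either. The `APP` value (`python --version`) is a placeholder chosen for illustration and is not part of the lesson; the srun and singularity options are copied from the jobscript.

```shell
# Dry-run sketch: build both launch commands as strings and print them.
# APP is a placeholder workload (an assumption, not the lesson's script).
MY_CONTAINER=docker://tensorflow/tensorflow:latest-gpu
APP="python --version"

# A conventional SLURM launch: srun starts the program directly on the node
bare_cmd="srun -N 1 -n 1 --export=ALL ${APP}"

# The containerised launch: srun starts singularity, which starts the program
container_cmd="srun -N 1 -n 1 --export=ALL singularity exec --nv --bind /scratch ${MY_CONTAINER} ${APP}"

echo "$bare_cmd"
echo "$container_cmd"
```

The only change is the `singularity exec ... ${MY_CONTAINER}` segment inserted between the srun options and the program, so the resource request (`-N`, `-n`) is handled by SLURM exactly as before.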

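The other important Singularity-specific directives include:

  • --nv: makes the host's NVIDIA driver libraries and GPU devices available inside the container.

  • --bind /scratch: mounts the host's /scratch filesystem inside the container, so the files stored there are visible to Python.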

We now need to submit our jobscript to the scheduler:

sbatch minst.slm
Submitted batch job 2498240
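The job ID in that message is useful for following the job. As a minimal sketch, the snippet below extracts the ID from the sample message and derives the default SLURM output file name, assuming the jobscript does not set `--output` (the one above does not). The `squeue` and `tail` lines are left as comments because they only work on a system with SLURM.

```shell
# Sample message copied from the sbatch output above
submit_msg="Submitted batch job 2498240"

# The job ID is the last space-separated field of the message
jobid=${submit_msg##* }
echo "Job ID: ${jobid}"

# By default SLURM writes a batch job's output to slurm-<jobid>.out
# in the directory the job was submitted from
outfile="slurm-${jobid}.out"
echo "Output file: ${outfile}"

# On the cluster, the job state and log could then be inspected with:
# squeue -j "${jobid}"
# tail -f "${outfile}"
```

Alternatively, `sbatch --parsable minst.slm` prints the job ID without the surrounding text, which avoids the string handling.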

It may take a little bit the first time, as the TensorFlow image is pulled down. You should see output similar to this:

Docker image path:
Cache folder set to /group/courses01/cou000/singularity/docker
Creating container runtime…
Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
2018-02-18 20:33:59.677580: W tensorflow/core/platform/] The TensorFlow library wasn’t compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2018-02-18 20:33:59.677620: W tensorflow/core/platform/] The TensorFlow library wasn’t compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2018-02-18 20:34:00.148531: I tensorflow/stream_executor/cuda/] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-02-18 20:34:00.148926: I tensorflow/core/common_runtime/gpu/] Found device 0 with properties:
major: 3 minor: 0 memoryClockRate (GHz) 0.8885
pciBusID 0000:03:00.0
Total memory: 2.95GiB
Free memory: 2.92GiB
2018-02-18 20:34:00.148954: I tensorflow/core/common_runtime/gpu/] DMA: 0
2018-02-18 20:34:00.148965: I tensorflow/core/common_runtime/gpu/] 0: Y
Initialized!
Step 0 (epoch 0.00), 21.7 ms
Minibatch loss: 8.334, learning rate: 0.010000
Minibatch error: 85.9%
Validation error: 84.6%
Step 100 (epoch 0.12), 20.9 ms
Minibatch loss: 3.235, learning rate: 0.010000
Minibatch error: 4.7%
Validation error: 7.8%
Step 200 (epoch 0.23), 20.5 ms
Minibatch loss: 3.363, learning rate: 0.010000
Minibatch error: 9.4%
Validation error: 4.2%
[…snip…]
Step 8500 (epoch 9.89), 20.5 ms
Minibatch loss: 1.602, learning rate: 0.006302
Minibatch error: 0.0%
Validation error: 0.9%
Test error: 0.8%
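Once the job has finished, the headline numbers can be pulled out of the log rather than read by eye. The sketch below works on a short excerpt of the sample output held in a shell variable; it assumes each metric is printed on its own line, and the variable names are illustrative only.

```shell
# Excerpt of the sample output above, one metric per line (assumed layout)
log="Step 8500 (epoch 9.89), 20.5 ms
Minibatch loss: 1.602, learning rate: 0.006302
Minibatch error: 0.0%
Validation error: 0.9%
Test error: 0.8%"

# Final test error: print only what follows "Test error: "
test_error=$(printf '%s\n' "$log" | sed -n 's/^Test error: //p')
echo "Test error: ${test_error}"

# Most recent validation error: keep the last matching line
val_error=$(printf '%s\n' "$log" | sed -n 's/^Validation error: //p' | tail -n 1)
echo "Last validation error: ${val_error}"
```

On the cluster, the same `sed` expressions could be pointed at the job's output file instead of the `log` variable.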

Key Points

  • Containers can be used to support cross-platform workflows.