Installation with SLURM

Using cellfinder with the SLURM workload manager

These instructions are based on SWC SLURM cluster, and so most of the command syntax will likely vary. Specifically, you are unlikely to have modules configured in exactly the same way as us.

If you are from the SWC, and using the SWC cluster, please see the specific instructions here.

Prepare the environment

On our cluster, modules are only available on a compute node, so start an interactive job on a GPU node, and request a GPU for testing.

srun -p gpu --gres=gpu:1 --pty bash

Load miniconda

module load miniconda

Set up conda environment and install cellfinder

Create and activate new minimal conda environment

conda create --name cellfinder python=3.9
conda activate cellfinder

Install CUDA and cuDNN

conda install -c conda-forge cudnn cudatoolkit

Install cellfinder

pip install cellfinder

Check that tensorflow and CUDA are configured properly:

python
import tensorflow as tf
tf.test.is_gpu_available()

If you see something like this, then all is well.

2019-06-26 10:51:34.697900: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX512F
2019-06-26 10:51:34.881183: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: TITAN RTX major: 7 minor: 5 memoryClockRate(GHz): 1.77
pciBusID: 0000:2d:00.0
totalMemory: 23.62GiB freeMemory: 504.25MiB
2019-06-26 10:51:34.881217: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-06-26 10:51:35.251465: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-06-26 10:51:35.251505: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2019-06-26 10:51:35.251511: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2019-06-26 10:51:35.251729: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/device:GPU:0 with 195 MB memory) -> physical GPU (device: 0, name: TITAN RTX, pci bus id: 0000:2d:00.0, compute capability: 7.5)
True

Exit python

exit()

End your interactive job

exit

Running cellfinder

Once installed on a cluster with SLURM, cellfinder can be run interactively, but it is more convenient (in the long run) to submit a batch job.

A SLURM tutorial is out of scope of the cellfinder documentation (look online or ask your sysadmin), but an example batch script is here:

#!/bin/bash
#SBATCH -p gpu # partition (queue)
#SBATCH -N 1 # number of nodes
#SBATCH --mem 40G # memory pool for all cores
#SBATCH --gres=gpu:1
#SBATCH -n 10
#SBATCH -t 1-0:0 # time (D-HH:MM)
#SBATCH -o cellfinder.out
#SBATCH -e cellfinder.err
#SBATCH --mail-type=ALL
cell_file='/path/to/signal/channel'
background_file='path/to/background/channel'
output_dir='/path/to/output/directory'
echo "Loading conda environment"
module load miniconda
conda activate cellfinder
echo "Running cellfinder"
# Just an example. See the user guide for the specific parameters
cellfinder -s $cell_file -b $background_file -o $output_dir -v 5 2 2 --orientation psl

You can then submit the batch job to the scheduler:

sbatch cellfinder_sbatch.sh

If you use the example script, you will receive an email when the job is done. To watch the progress, log onto a node with the same storage drive mounted and run:

watch tail -n 100 /path/to/cellfinder_log.log