Using the OneCellPipe pipeline

This is a sub-page of the multi-page documentation for the OneCellPipe pipeline of OneCellBio.

The configuration files

If you run into trouble or your setup is non-standard, the main configuration file to look at is nextflow.config:

The “profiles” section lets you predefine options for a specific environment, e.g. standard to run on the local machine and slurm to run on a SLURM cluster. Activate a profile with the -profile option on the command line. Some more details are given below.

The rest of the configuration provides default values, e.g. for the container names and processing parameters. You should not have to modify these.
All the settings in the “params” section can also be provided on the command line if necessary, e.g. --dir <fastq-directory>
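
As an illustration only (some_parameter is a made-up placeholder, not an actual OneCellPipe setting), the “params” section has roughly this shape, and every entry in it can be overridden with a double-dash option of the same name:

params {
  // fastq directory, normally supplied as --dir on the command line
  dir = 'sampledata'
  // hypothetical processing parameter, shown only to illustrate the override syntax
  some_parameter = 42
}

nextflow onecellpipe.nf --dir sampledata --some_parameter 10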

The file “indrop_fixed.yaml” in the “bin” directory provides the default settings for indrops. You don’t have to modify these, but they might be of interest to you. Some indrops settings can be passed through as command-line arguments. The pipeline uses this file to construct the full config passed to indrops during the analysis. Once the analysis has started, this full config can be found as “indrop_config_used.yaml” in the “resources” directory of your fastq directory.
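
For example, with the sampledata directory used in the commands below, the configuration actually passed to indrops can be inspected once the run is underway:

cat sampledata/resources/indrop_config_used.yaml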

 

Specific configurations

The pipeline can easily be configured to run in different environments by adjusting the “profiles” section of the nextflow.config file. There you define specific settings for, e.g., execution on the local machine (standard) or on a particular environment such as a high-performance compute (HPC) cluster. A profile is then activated with the -profile {profile-name} option on the command line (note the single dash), e.g.:

nextflow onecellpipe.nf --dir sampledata -profile slurm
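
As a rough sketch of that structure (the actual file contains more settings; the full slurm block is shown further below), the profiles section looks like this:

profiles {
  standard {
    // run on the local machine
    process.executor = 'local'
  }
  slurm {
    // run on a SLURM cluster, see the full example below
    process.executor = 'slurm'
  }
}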

Shared directory on compute clusters

In general there should be a directory in your cluster system that is shared between all nodes; it could be named /tmp or /shared. It is used to store a copy of the container used for the pipeline. If /tmp is not shared in your system but you want to use e.g. /shared, please set

cacheDir = '/shared' in bin/nextflow.singularity.config

and add "--cache /shared" on the command line.
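
Putting both adjustments together, a run that keeps the container cache in /shared could then be started like this (using the sampledata directory from the other examples):

nextflow onecellpipe.nf --dir sampledata --cache /shared -profile slurm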

SLURM HPC

NextFlow options for the SLURM workload manager are described in the NextFlow documentation. An example profile section with the “executor” slurm is included in the configuration file:

slurm {
  process.executor = 'slurm'
  process.queue = 'general'
  // optional settings
  process.clusterOptions = '-n 3 -N 1 --mem 6000 --job-name F'
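  // per-process settings: the entries below override the defaults above for
  // the individual analysis steps (queue and resource requests)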
  process.$analysis_3.queue = 'serial_requeue'
  process.$analysis_3.clusterOptions = '-n 4 -N 1 --mem 10000 --job-name S'
  process.$analysis_4.queue = 'serial_requeue'
  process.$analysis_4.clusterOptions = '-n 3 -N 1 --mem 3000 --job-name Q'
  process.$analysis_5.queue = 'serial_requeue'
  process.$analysis_5.clusterOptions = '-n 4 -N 1 --mem 8000 --job-name A'
}

Adjust the queue name and the other parameters as you would when using your SLURM system directly. You can then select the slurm profile on the command line. Please note the single dash for -profile:

nextflow onecellpipe.nf --dir sampledata -profile slurm

 

LSF

For Platform LSF or OpenLava, an example profile section with the “executor” lsf is included in the configuration file (nextflow.config):

lsf {
  process.executor = 'lsf'
  process.queue = 'general'
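  // default memory and scratch directory applied to all processes (adjust to your site)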
  process.memory = '6 GB'
  process.scratch = '/scratch'
}

Adjust the queue name, the temporary scratch directory, and any other parameter you would set when using your LSF system directly. You can then select the lsf profile on the command line. Please note the single dash for -profile:

nextflow onecellpipe.nf --dir sampledata -profile lsf

Some more details can be found in the NextFlow documentation.