More tips for running the OneCellPipe

This is a sub-page of the multi-page documentation for the OneCellPipe pipeline of OneCellBio.

Command-line parametersss

You can run the entire raw data processing with the following command after navigating to the onecellpipe folder:

nextflow onecellpipe.nf --dir {path/to/your/fastq-files}

Here are all the options you can use if you do need to modify these defaults (use double dashes “–” for these):

[table id=2 /]

[table id=3 /]

Additional Nextflow parameters

Please use single dash “-” for these!

[table id=5 /]

Using screen for remote sessions

The control script will continue to run for a significant time as the different processing steps are executed. You would not want to interrupt the script to avoid loss of data or time. On a remote computer (login node or Amazon cloud machine) it is therefore a good idea to run the command within a “screen session” in case your connection gets interrupted, sending the output to a file and detaching from the screen once you see the pipeline is running:

screen

nextflow onecellpipe.nf --dir {path/to/your/fastq-files} >& 1cellpipe.out &

[ Press <ctrl><a><d> to detach ]

You can re-attach to the session with

screen -r

You can list all running screen sessions with

screen -ls
There is a screen on:
	48717	(Detached)
1 Socket in /var/folders/tc/j1_xv_rx3k1535wnv0000qb40000gn/T/.screen.

more details about the screen program are e.g. here.

Using the Docker container software system

As an alternative to Singularity you can use the Docker container system.
If you are using the AWS AMI we are providing you simply have to activate the service first:

sudo systemctl start docker

Otherwise the installation of the free community edition is described here: https://docs.docker.com/engine/installation/.
For example on an Ubuntu amd64 system this would mean the following sequence of commands:

sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu xenial stable"
sudo apt-get update
sudo apt-get install -y docker-ce
sudo systemctl enable docker
sudo systemctl start docker

Start the Docker software if it is not already running. To test type

docker ps

on the command line. The result should be a list starting with “Container ID”. (Note: This reporting is not working well on all systems)

To switch Onecellpipe from Singularity to Docker you then only add the parameter –docker 1 on the command line. (This will re-write the nextflow.config file with the correct settings). The pipeline would then be run like this:

nextflow onecellpipe.nf --dir data-dir  --docker 1

If you see permission issues, you will need to run Docker commands as sudo (user with administrator privileges). In this case:

  • Run sudo docker ps to test again
  • Add the parameter –sudo 1 on the onecellpipe command line.

To switch back to Singularity simply use –docker 0  (this is a “zero”) or skip the parameter. Stop the Docker service with:

sudo systemctl stop docker

Using a config file

If your file set up different from the default expected by the pipeline,  you can still use the onecellpipe by using the –config parameter  providing it with your prepared configuration file. The file has to be in the expected format, e.g like here.

If there are problems with mixed files and libraries, move the fastq files into separate directories and run the pipeline on each library individually.

Power users: Using the container for additional analysis

The onecellpipe software image has all the software installed to run the indrops pipeline and the other analysis parts described here within an Ubuntu Linux system. If you are familiar with this environment, you can also manually create a container from the image, log in and run these or other analysis steps on the command line. Using Singularity with the image file already available this would be:

singularity shell --workdir /tmp onecellpipe/bin/onecellpipe-25-2.img

Using Docker, the same can be achieved with:

DATA=/path/to/your/data
docker import onecellpipe-25.2.tar.gz onecellpipe
docker run -it --rm -m 4g -e USER --mount type=bind,source=$DATA,target=/tmp --name oncellpipe-container oncellpipe /bin/bash

Make sure you set DATA to your fastq directory. Adjust the version of onecellpipe-25.2.tar.gz if necessary.

 

Parallelization

To enable or increase parallel processing and therefore increase the analysis speed (and when using a HPC), increase the number of “worker” jobs on the command line, e.g. 100 for the quantification step and 5 for other steps with the OneCellPipe parameters:

--worker 5 --worker2 100

Please be aware that the indrops software sometimes seems to fail at the next step of the pipeline if the number of workers is too high.

In order to check for jobs running in parallel you can watch for Nextflow processes being submitted directly after one another:

If you execute the pipeline on a multi-core/CPU machine, you can check for multiple processes, e.g. alignment steps, being executed:

top

When running with Docker containers you will also see multiple containers being created (for Singularity this is not as visible):

sudo docker ps

 

Uploading & downloading data to S3

To process data on Amazon cloud machines, it is convenient to use the cloud storage S3. In order to get your data there and back again your can use the browser interface of S3. If you don’t like this or you need to transfer many files it might be more convenient to use use other tools:

  1. The AWSCLI tools allow command-line access to S3 and other cloud services. For example to copy a directory “myfolder” into the bucket “mybucket” you would do:

    aws s3 cp myfolder s3://mybucket/myfolder --recursive

    Details are here.

  2. Cyberduck interface

    Cyberduck is a free GUI program to interact with S3 and similar storage systems.
    It’s available for Mac and Windows.