Common question about the OneCellPipe system

This is a sub-page of the multi-page documentation for the OneCellPipe pipeline of OneCellBio.

Here is a collection of frequently asked questions and potential error messages when running the single-cell processing pipeline.

Questions

  1. How do I free up disk space after running the pipeline?
    Answer:
    Once you are done, you can remove temporary files in your analysis folder. For the sample data set this would be:

    rm -rf sampledata/A5/{filtered_parts,quant_dir}

    You can also remove the Nextflow cache:

    nextflow clean
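
Before deleting anything, it can help to see how much space the temporary folders actually occupy. The following is a minimal sketch and not part of OneCellPipe: list_temp_dirs is a hypothetical helper name, and the two folder names are simply the ones used in the rm command above.

```shell
# Hypothetical helper (not shipped with OneCellPipe).
# Prints the size of each temporary folder found below a library folder.
# The folder names are taken from the rm command above; nothing is deleted.
list_temp_dirs() (
    lib_dir=$1
    for name in filtered_parts quant_dir; do
        if [ -d "$lib_dir/$name" ]; then
            du -sh "$lib_dir/$name"
        fi
    done
)

# Example call for the sample data set:
#   list_temp_dirs sampledata/A5
```

The function prints one du line per folder that exists and stays silent if none is found, so it is safe to run before the actual clean-up.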
  2. How do I save / store / backup my results?
    Answer:
    The easiest way is to compress and copy the entire output folder (if specified with --out) or the working folder (if only --dir was specified), e.g. with

    tar czvf onecellpipe-results.tgz result-folder

    Alternatively just save the main result files, replacing LIB-NAME with your library name:

    tar czvf onecellpipe-results.tgz result-folder/resources result-folder/*.* result-folder/LIB-NAME/*.*
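
If you plan to delete the original folder after the backup, it is worth confirming that the archive can be read back. This is a minimal sketch under that assumption: backup_results is a hypothetical helper name, and it only uses the standard tar options c, z, f, t and -C.

```shell
# Hypothetical helper (not shipped with OneCellPipe).
# Creates a compressed archive of a result folder and then lists it once
# to confirm the archive is readable. Returns non-zero if either step fails.
backup_results() (
    archive=$1
    folder=$2
    # -C switches to the parent folder so the archive stores relative paths.
    tar czf "$archive" -C "$(dirname "$folder")" "$(basename "$folder")" &&
        tar tzf "$archive" > /dev/null
)

# Example call:
#   backup_results onecellpipe-results.tgz result-folder
```

Storing relative paths means the archive unpacks into a result-folder directory wherever it is extracted, rather than recreating the original absolute path.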
  3. I started the pipeline, but nothing seems to happen.
    Answer:
    When run for the first time the container image has to be downloaded. This can take several minutes.
    This can also happen when the container image was removed from the system.
  4. I have a lot of fastq files to process in the cloud. How can I get them into S3?
    Answer:
    Have a look at these options.
  5. Should I use Docker or Singularity?
    Answer:
    We provide both options to accommodate integration into different IT infrastructures. There is no significant performance difference. If you have local system administrators, ask for their preference. On compute clusters Singularity is sometimes preferable because it avoids granting sudo permissions. The memory footprint of Singularity is also slightly smaller.
  6. How can I speed up the pipeline?
    – Increase parallel processing by using a compute cluster or a machine with more CPUs / cores. You can then adjust the number of parallel jobs.
    – Don’t run the QC steps if you don’t need them: --qc 0 (default)
    – Don’t create the transposed count matrix if you don’t need it: --transpose 0 (default)
    – Don’t create the BAM files after quantification if you don’t need them: --bam 0 (default)
  7. I accidentally stopped the pipeline, what can I do?
    As long as no files in the Nextflow cache have changed, it is often possible to jump right back to the last successful step by repeating the same command and adding -resume.
  8. I lost the connection to my server, what can I do?
    Reconnect, start a session with the screen command and try to resume the pipeline inside it (by repeating the same command and adding -resume). Detach from the screen session by pressing Ctrl+a followed by d to avoid another interruption.
  9. I specified --email <email@address.com> but did not receive a notification!
    - Does your system support sendmail?
    - Unless you set up SMTP details at the bottom of the nextflow.config file, mails often get blocked as spam. Try a Gmail address. To see whether your notification got stuck, have a look at your local mail file, e.g. with less /var/mail/<username>

 

Potential Error Messages

    1. Bowtie error 1
      Error: Could not allocate ChunkPool of 1048576000 bytes
      Warning: Exhausted best-first chunk memory for read 
      ...
      Exception: 
      === Error on piping data to bowtie ===

      Solution:
      Bowtie requires at least 2 GB of RAM just to load the genome index.
      Please use a machine with at least 4 GB of RAM.
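
To check a machine against this requirement before starting, the total memory can be read from the system. The sketch below assumes a Linux system that provides /proc/meminfo; has_min_ram is a hypothetical helper name and is not part of OneCellPipe.

```shell
# Hypothetical helper (not shipped with OneCellPipe). Linux only.
# Succeeds if the MemTotal value in a meminfo-style file is at least
# min_gb gigabytes. The file defaults to /proc/meminfo.
has_min_ram() (
    min_gb=$1
    meminfo=${2:-/proc/meminfo}
    # MemTotal is reported in kB, e.g. "MemTotal:  8167848 kB".
    total_kb=$(awk '/^MemTotal:/ {print $2}' "$meminfo")
    [ -n "$total_kb" ] && [ "$total_kb" -ge $((min_gb * 1024 * 1024)) ]
)

# Example call:
#   has_min_ram 4 || echo "Less than 4 GB of RAM, Bowtie may fail"
```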

    2. Container software error
      Container software singularity could not be found, please make sure it is running or download 
       and install it ...
      

      Solution 1:
      You did not install container software on the machine you are running the pipeline on or you did not activate / start it.
      Solution 2:
      You are trying to use Docker as container software, but you did not specify --docker 1 on the command line.

    3. Container software error
      Pipeline execution stopped with the following message: Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get  (...) dial unix /var/run/docker.sock: connect: permission denied
      

      Solution:
      You are running on a system where the Docker software requires higher permissions.
      Add the following to your command line:

      --sudo 1
    4. Input error 1
      INFO:	Scanning directory testdata2
      ERROR:	Could not find fastq files.
      

      Solution:
      Make sure your project directory contains fastq files and provide its absolute path with the --dir parameter, e.g. --dir /home/jdoe/data/fastqs.

    5. Input error 2

      ERROR:	Could not identify file name pattern.
        Please use configuration file

      Solution:
      The names of your fastq files differ from the default pattern expected by the automatic setup script:

      {library_prefix}_{split_affix}_{read}_001.fastq.gz

      Either change the file names or provide a configuration file using the --config option.
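
A quick way to spot offending file names is to test each one against a shell pattern. This is only a rough sketch: matches_default_pattern is a hypothetical helper, and the glob below is an approximation of the pattern above (any prefix, any affix, a read field of the form R plus one digit), not the check the setup script actually performs.

```shell
# Hypothetical helper (not shipped with OneCellPipe).
# Approximates {library_prefix}_{split_affix}_{read}_001.fastq.gz with a glob.
# Assumption: the read field looks like R1, R2, ...
matches_default_pattern() {
    case $1 in
        ?*_?*_R[0-9]_001.fastq.gz) return 0 ;;
        *) return 1 ;;
    esac
}

# Example: report the files in a folder that do not match.
#   for f in /home/jdoe/data/fastqs/*.fastq.gz; do
#       matches_default_pattern "$(basename "$f")" || echo "unexpected name: $f"
#   done
```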

    6. Alignment error
      Traceback (most recent call last):
        File "/home/onecellbio/indrops/indrops.py", line 1724, in 
          no_bam=args.no_bam, run_filter=target_runs)
        File "/home/onecellbio/indrops/indrops.py", line 818, in quantify_expression
          min_counts = min_counts, run_filter=run_filter)
        File "/home/onecellbio/indrops/indrops.py", line 928, in quantify_expression_for_barcode
          raise Exception("\n === No aligned bam was output for barcode %s ===" % barcode)
      Exception: 
       === No aligned bam was output for barcode bcDFJI ===
      

      Solution:
      This indrops error seems to occur when the number of jobs for the last step is not appropriate for the amount of data. Try reducing the --workers2 number.

    7. AWS connection error
      When trying to connect to an Amazon cloud machine using your SSH key file you might see the following error:

      @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
      @         WARNING: UNPROTECTED PRIVATE KEY FILE!          @
      @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
      Permissions 0644 for '/Users/fred/.ssh/aws-key.pem' are too open.
      

      Solution:
      Change permission so that only you are allowed to access this file:

      chmod 400 /Users/fred/.ssh/aws-key.pem
    8. Setup error
      INFO: Setting up analysis container onecellpipe-25-2.tar.gz.
      ERROR: There was a problem importing the Docker image.Command error:
      Error response from daemon: Error processing tar file(exit status 1):
        write /home/onecellbio/ref/Mus_musculus.GRCm38.91.annotated.n2g.idx.fa: no space left on device
      

      Solution:
      You need more disk space in order to run the pipeline since the genome index files are large and you will need additional space for your results!
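
The free space can be checked up front with df. The following is a minimal sketch: free_kb and has_free_space are hypothetical helper names, and the threshold is left to you because the required amount depends on the container image and the size of your data.

```shell
# Hypothetical helpers (not shipped with OneCellPipe).
# Print the free space, in kilobytes, of the file system that holds a path.
# Uses the portable df output format (-P); the fourth column is "Available".
free_kb() {
    df -Pk "$1" | awk 'NR == 2 {print $4}'
}

# Succeed if at least min_kb kilobytes are free at the given path.
has_free_space() (
    path=$1
    min_kb=$2
    avail=$(free_kb "$path")
    [ -n "$avail" ] && [ "$avail" -ge "$min_kb" ]
)

# Example call with a threshold of your choice (value in kB):
#   has_free_space /home/jdoe/data 52428800 || echo "Not enough disk space"
```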

    9. Parameter error
      Unknown option: XXXX -- Check the available commands and options and syntax with 'help'
         OR
      ERROR ~ Unknown parameter "XXXX"

      Solution:
      Check which parameters can be used.
      General Nextflow parameters are passed with a single dash, e.g. -with-timeline.
      OneCellPipe parameters are passed using double dashes, e.g. --dir /fastq/dir

    10. Input problems

      gzip: /home/ubuntu/somefolder/sampledata/A5_S12_L001_R2_001.fastq.gz: No such file or directory
      ...
      Command error:
        .command.run.1: line 99:    12 Terminated              nxf_trace "$pid" .command.trace

      Solution:
      On some systems problems occur if the input folder is not located at the same level as the Nextflow pipeline. Move the fastq folder to the current directory and start again.
      This can also happen if the fastq files are only provided via links.
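
To find out whether an input folder contains linked fastq files, each file can be tested with the -L check of the shell. This is a minimal sketch; list_linked_fastqs is a hypothetical helper name and not part of OneCellPipe.

```shell
# Hypothetical helper (not shipped with OneCellPipe).
# Prints every *.fastq.gz entry in a folder that is a symbolic link.
list_linked_fastqs() (
    dir=$1
    for f in "$dir"/*.fastq.gz; do
        if [ -L "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
)

# Example call:
#   list_linked_fastqs sampledata
```

If the function prints any paths, replacing those links with copies of the real files is one thing to try before starting the pipeline again.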