Indrops config 2

March 21, 2018

OneCellBio

An example of running OneCellBio's OneCellPipe pipeline with a pre-defined configuration file, for a sequencing run with a single part and a single library.
Files:

R1.fastq.gz
R2.fastq.gz

Command line:

nextflow onecellpipe.nf  --config /data/onecellpipe/data_results/indrop_config_to_use.yaml --out /data/onecellpipe/data_results

indrop_config_to_use.yaml:

# project and library settings
project_name : "libA5"
project_dir : "/data/onecellpipe/data_results"
sequencing_runs :
  - name : "libA5"
    version : "v2"
    dir : "/data/onecellpipe/data"
    fastq_path : "{read}.fastq.gz"
    library_name: "libA5"
# standard indrops config
# part 1: general software paths within the container, do not change
paths : 
  bowtie_index : '/home/onecellbio/ref/Homo_sapiens.GRCh38.91.annotated'
  bowtie_dir : '/home/onecellbio/bowtie'
  rsem_dir : '/home/onecellbio/RSEM/bin'
  python_dir : '/home/onecellbio/pyndrops/bin'
  indrops_dir : '/home/onecellbio/indrops'
  java_dir : '/usr/bin'
  samtools_dir : '/home/onecellbio/samtools-1.3.1'
# part 2: analysis parameters
parameters : 
  umi_quantification_arguments:
    m : 10 #Ignore reads with more than M alignments, after filtering on distance from transcript end.
    u : 1 #Ignore counts from UMI that should be split among more than U genes.
    d : 600 #Maximal distance from transcript end, NOT INCLUDING THE POLYA TAIL
    split-ambigs : False  #If a UMI is assigned to m genes, add 1/m to each gene's count (instead of 1)
    min_non_polyA : 15  #Require reads to align to this much non-polyA sequence. (Set to 0 to disable filtering on this parameter.)
  output_arguments :
    output_unaligned_reads_to_other_fastq : False
    filter_alignments_to_softmasked_regions : False
  bowtie_arguments :
    m : 200
    n : 1
    l : 15
    e : 80
  trimmomatic_arguments :
    LEADING : "28"
    SLIDINGWINDOW : "4:20"
    MINLEN : "30"
    argument_order : ['LEADING', 'SLIDINGWINDOW', 'MINLEN']
  low_complexity_filter_arguments :
    max_low_complexity_fraction : 0.50

A more complicated example, for a single part but multiple libraries, follows.
Files:

A5_S12_L001_R1_001.fastq.gz
A5_S12_L001_R2_001.fastq.gz
A5_S12_L002_R1_001.fastq.gz
A5_S12_L002_R2_001.fastq.gz
A6_S1_L001_R1_001.fastq.gz
A6_S1_L001_R2_001.fastq.gz
A6_S1_L002_R1_001.fastq.gz
A6_S1_L002_R2_001.fastq.gz

Command line:

nextflow onecellpipe.nf  --config /data/onecellpipe/data_results/indrop_config_to_use.yaml --out /data/onecellpipe/data_results_2

indrop_config_to_use.yaml:

# project and library settings
project_name : "libA5"
project_dir : "/data/onecellpipe/data_results_2"
sequencing_runs :
  - name : "libA5"
    version : "v2"
    dir : "/data/onecellpipe/more_data"
    fastq_path : "{library_prefix}_{split_affix}_{read}_001.fastq.gz"
    split_affixes : ["L001", "L002"]
    libraries : 
      - {library_name: "A5", library_prefix: "A5_S12"}
      - {library_name: "A6", library_prefix: "A6_S1"}
# standard indrops config
# part 1: general software paths within the container, do not change
paths : 
  bowtie_index : '/home/onecellbio/ref/Homo_sapiens.GRCh38.91.annotated'
  bowtie_dir : '/home/onecellbio/bowtie'
  rsem_dir : '/home/onecellbio/RSEM/bin'
  python_dir : '/home/onecellbio/pyndrops/bin'
  indrops_dir : '/home/onecellbio/indrops'
  java_dir : '/usr/bin'
  samtools_dir : '/home/onecellbio/samtools-1.3.1'
# part 2: analysis parameters
parameters : 
  umi_quantification_arguments:
    m : 10 #Ignore reads with more than M alignments, after filtering on distance from transcript end.
    u : 1 #Ignore counts from UMI that should be split among more than U genes.
    d : 600 #Maximal distance from transcript end, NOT INCLUDING THE POLYA TAIL
    split-ambigs : False  #If a UMI is assigned to m genes, add 1/m to each gene's count (instead of 1)
    min_non_polyA : 15  #Require reads to align to this much non-polyA sequence. (Set to 0 to disable filtering on this parameter.)
  output_arguments :
    output_unaligned_reads_to_other_fastq : False
    filter_alignments_to_softmasked_regions : False
  bowtie_arguments :
    m : 200
    n : 1
    l : 15
    e : 80
  trimmomatic_arguments :
    LEADING : "28"
    SLIDINGWINDOW : "4:20"
    MINLEN : "30"
    argument_order : ['LEADING', 'SLIDINGWINDOW', 'MINLEN']
  low_complexity_filter_arguments :
    max_low_complexity_fraction : 0.50
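
The file names listed for this example are built from a library prefix, a lane affix and a read, as in A5_S12_L001_R1_001.fastq.gz. The following is a minimal sketch of how such names can be enumerated from the run settings, for instance to check that every expected file is present before starting the pipeline. It uses plain Python string formatting and is not the pipeline's own code; the template string and the expected_fastq_files helper are assumptions made for illustration.

```python
from itertools import product


def expected_fastq_files(template, prefixes, affixes, reads=("R1", "R2")):
    """Expand a file-name template for every prefix/affix/read combination."""
    return [
        template.format(library_prefix=p, split_affix=a, read=r)
        for p, a, r in product(prefixes, affixes, reads)
    ]


# Assumed template, chosen to match the file names listed in this example.
template = "{library_prefix}_{split_affix}_{read}_001.fastq.gz"
files = expected_fastq_files(template, ["A5_S12", "A6_S1"], ["L001", "L002"])
for name in files:
    print(name)
```

With the two library prefixes and two lane affixes from this example, the sketch yields the eight file names listed above.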

Even more complicated: an example for multiple runs (the same samples, in different directories) and multiple libraries follows.
Files:

Run1/A5_S12_L001_R1_001.fastq.gz
Run1/A5_S12_L001_R2_001.fastq.gz
Run1/A5_S12_L002_R1_001.fastq.gz
Run1/A5_S12_L002_R2_001.fastq.gz
Run1/A6_S1_L001_R1_001.fastq.gz
Run1/A6_S1_L001_R2_001.fastq.gz
Run1/A6_S1_L002_R1_001.fastq.gz
Run1/A6_S1_L002_R2_001.fastq.gz

and

Run2/A5_S12_L001_R1_001.fastq.gz
Run2/A5_S12_L001_R2_001.fastq.gz
Run2/A5_S12_L002_R1_001.fastq.gz
Run2/A5_S12_L002_R2_001.fastq.gz
Run2/A6_S1_L001_R1_001.fastq.gz
Run2/A6_S1_L001_R2_001.fastq.gz
Run2/A6_S1_L002_R1_001.fastq.gz
Run2/A6_S1_L002_R2_001.fastq.gz

Command line:

nextflow onecellpipe.nf  --config /data/onecellpipe/data_results/indrop_config_to_use.yaml

indrop_config_to_use.yaml:

# project and library settings
project_name : "libA5"
project_dir : "/data/onecellpipe/data_results_2"
sequencing_runs :
  - name : "Run1"
    version : "v2"
    dir : "/data/onecellpipe/more_data/Run1"
    fastq_path : "{library_prefix}_{split_affix}_{read}_001.fastq.gz"
    split_affixes : ["L001", "L002"]
    libraries : 
      - {library_name: "A5", library_prefix: "A5_S12"}
      - {library_name: "A6", library_prefix: "A6_S1"}
  - name : "Run2"
    version : "v2"
    dir : "/data/onecellpipe/more_data/Run2"
    fastq_path : "{library_prefix}_{split_affix}_{read}_001.fastq.gz"
    split_affixes : ["L001", "L002"]
    libraries : 
      - {library_name: "A5", library_prefix: "A5_S12"}
      - {library_name: "A6", library_prefix: "A6_S1"}
# standard indrops config
# part 1: general software paths within the container, do not change
paths : 
  bowtie_index : '/home/onecellbio/ref/Homo_sapiens.GRCh38.91.annotated'
  bowtie_dir : '/home/onecellbio/bowtie'
  rsem_dir : '/home/onecellbio/RSEM/bin'
  python_dir : '/home/onecellbio/pyndrops/bin'
  indrops_dir : '/home/onecellbio/indrops'
  java_dir : '/usr/bin'
  samtools_dir : '/home/onecellbio/samtools-1.3.1'
# part 2: analysis parameters
parameters : 
  umi_quantification_arguments:
    m : 10 #Ignore reads with more than M alignments, after filtering on distance from transcript end.
    u : 1 #Ignore counts from UMI that should be split among more than U genes.
    d : 600 #Maximal distance from transcript end, NOT INCLUDING THE POLYA TAIL
    split-ambigs : False  #If a UMI is assigned to m genes, add 1/m to each gene's count (instead of 1)
    min_non_polyA : 15  #Require reads to align to this much non-polyA sequence. (Set to 0 to disable filtering on this parameter.)
  output_arguments :
    output_unaligned_reads_to_other_fastq : False
    filter_alignments_to_softmasked_regions : False
  bowtie_arguments :
    m : 200
    n : 1
    l : 15
    e : 80
  trimmomatic_arguments :
    LEADING : "28"
    SLIDINGWINDOW : "4:20"
    MINLEN : "30"
    argument_order : ['LEADING', 'SLIDINGWINDOW', 'MINLEN']
  low_complexity_filter_arguments :
    max_low_complexity_fraction : 0.50
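
Because the files of this example sit in separate Run1 and Run2 directories, each entry under sequencing_runs is expected to carry its own name and its own dir. Copying a run entry and forgetting to adjust one of the two is an easy mistake, so a small check can help. The sketch below is illustrative only: the duplicated_values helper is hypothetical, and the run entries are written out by hand rather than read from the YAML file.

```python
from collections import Counter


def duplicated_values(runs, key):
    """Return the values of `key` that occur in more than one run entry."""
    counts = Counter(run[key] for run in runs)
    return sorted(value for value, n in counts.items() if n > 1)


# Run entries written out by hand for illustration; in practice they would
# come from the sequencing_runs section of the configuration file.
runs = [
    {"name": "Run1", "dir": "/data/onecellpipe/more_data/Run1"},
    {"name": "Run2", "dir": "/data/onecellpipe/more_data/Run2"},
]

for key in ("name", "dir"):
    dupes = duplicated_values(runs, key)
    if dupes:
        print(f"duplicate {key}: {dupes}")
```

The check prints nothing when every run has a distinct name and directory.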

Felix Kokocinski
