An example of running the OneCellPipe pipeline from OneCellBio with a pre-defined configuration file, for a sequencing run with a single part and a single library.
Files:
- R1.fastq.gz
- R2.fastq.gz
Command line:
nextflow onecellpipe.nf --config /data/onecellpipe/data_results/indrop_config_to_use.yaml --out /data/onecellpipe/data_results
indrop_config_to_use.yaml:
# project and library settings
project_name : "libA5"
project_dir : "/data/onecellpipe/data_results"
sequencing_runs :
  - name : "libA5"
    version : "v2"
    dir : "/data/onecellpipe/data"
    fastq_path : "{read}.fastq.gz"
    library_name: "libA5"
# standard indrops config
# part 1: general software paths within the container, do not change
paths :
  bowtie_index : '/home/onecellbio/ref/Homo_sapiens.GRCh38.91.annotated'
  bowtie_dir : '/home/onecellbio/bowtie'
  rsem_dir : '/home/onecellbio/RSEM/bin'
  python_dir : '/home/onecellbio/pyndrops/bin'
  indrops_dir : '/home/onecellbio/indrops'
  java_dir : '/usr/bin'
  samtools_dir : '/home/onecellbio/samtools-1.3.1'
# part 2: analysis parameters
parameters :
  umi_quantification_arguments:
    m : 10 #Ignore reads with more than M alignments, after filtering on distance from transcript end.
    u : 1 #Ignore counts from UMI that should be split among more than U genes.
    d : 600 #Maximal distance from transcript end, NOT INCLUDING THE POLYA TAIL
    split-ambigs : False #If umi is assigned to m genes, add 1/m to each genes count (instead of 1)
    min_non_polyA : 15 #Require reads to align to this much non-polyA sequence. (Set to 0 to disable filtering on this parameter.)
  output_arguments :
    output_unaligned_reads_to_other_fastq : False
    filter_alignments_to_softmasked_regions : False
  bowtie_arguments :
    m : 200
    n : 1
    l : 15
    e : 80
  trimmomatic_arguments :
    LEADING : "28"
    SLIDINGWINDOW : "4:20"
    MINLEN : "30"
    argument_order : ['LEADING', 'SLIDINGWINDOW', 'MINLEN']
  low_complexity_filter_arguments :
    max_low_complexity_fraction : 0.50
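Before launching the run, it can help to confirm that the files named by dir and fastq_path actually exist. The following Python sketch is not part of OneCellPipe. It assumes that {read} is the only placeholder in fastq_path and that it is replaced by R1 and R2, as the file list above suggests; the helper name expected_fastqs is hypothetical.

```python
import os
import posixpath


def expected_fastqs(run_dir, fastq_path, reads=("R1", "R2")):
    """Return the FASTQ paths a run entry points to (hypothetical helper)."""
    # Assumption: "{read}" is the only placeholder and expands to R1 / R2.
    return [posixpath.join(run_dir, fastq_path.format(read=read)) for read in reads]


if __name__ == "__main__":
    # Values copied from the single-library configuration above.
    for path in expected_fastqs("/data/onecellpipe/data", "{read}.fastq.gz"):
        status = "found" if os.path.exists(path) else "MISSING"
        print(f"{status}: {path}")
```

Any line reported as MISSING points at a mismatch between dir, fastq_path, and the files on disk.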
A more complicated example, for a single part but multiple libraries, follows.
Files:
- A5_S12_L001_R1_001.fastq.gz
- A5_S12_L001_R2_001.fastq.gz
- A5_S12_L002_R1_001.fastq.gz
- A5_S12_L002_R2_001.fastq.gz
- A6_S1_L001_R1_001.fastq.gz
- A6_S1_L001_R2_001.fastq.gz
- A6_S1_L002_R1_001.fastq.gz
- A6_S1_L002_R2_001.fastq.gz
Command line:
nextflow onecellpipe.nf --config /data/onecellpipe/data_results/indrop_config_to_use.yaml --out /data/onecellpipe/data_results_2
indrop_config_to_use.yaml:
# project and library settings
project_name : "libA5"
project_dir : "/data/onecellpipe/data_results_2"
sequencing_runs :
  - name : "libA5"
    version : "v2"
    dir : "/data/onecellpipe/more_data"
    fastq_path : "{read}.fastq.gz"
    split_affixes : ["L001", "L002"]
    libraries :
      - {library_name: "A5", library_prefix: "A5_S12"}
      - {library_name: "A6", library_prefix: "A6_S1"}
# standard indrops config
# part 1: general software paths within the container, do not change
paths :
  bowtie_index : '/home/onecellbio/ref/Homo_sapiens.GRCh38.91.annotated'
  bowtie_dir : '/home/onecellbio/bowtie'
  rsem_dir : '/home/onecellbio/RSEM/bin'
  python_dir : '/home/onecellbio/pyndrops/bin'
  indrops_dir : '/home/onecellbio/indrops'
  java_dir : '/usr/bin'
  samtools_dir : '/home/onecellbio/samtools-1.3.1'
# part 2: analysis parameters
parameters :
  umi_quantification_arguments:
    m : 10 #Ignore reads with more than M alignments, after filtering on distance from transcript end.
    u : 1 #Ignore counts from UMI that should be split among more than U genes.
    d : 600 #Maximal distance from transcript end, NOT INCLUDING THE POLYA TAIL
    split-ambigs : False #If umi is assigned to m genes, add 1/m to each genes count (instead of 1)
    min_non_polyA : 15 #Require reads to align to this much non-polyA sequence. (Set to 0 to disable filtering on this parameter.)
  output_arguments :
    output_unaligned_reads_to_other_fastq : False
    filter_alignments_to_softmasked_regions : False
  bowtie_arguments :
    m : 200
    n : 1
    l : 15
    e : 80
  trimmomatic_arguments :
    LEADING : "28"
    SLIDINGWINDOW : "4:20"
    MINLEN : "30"
    argument_order : ['LEADING', 'SLIDINGWINDOW', 'MINLEN']
  low_complexity_filter_arguments :
    max_low_complexity_fraction : 0.50
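The file names listed for this example combine a library prefix, a split affix, and the read name. As a rough Python sketch, the expected files can be enumerated from the libraries and split_affixes entries. The pattern string below is inferred from the listed file names and is an assumption, not something taken from the pipeline; the function name expected_file_names is hypothetical.

```python
from itertools import product

# Values copied from the multi-library configuration above.
LIBRARIES = [
    {"library_name": "A5", "library_prefix": "A5_S12"},
    {"library_name": "A6", "library_prefix": "A6_S1"},
]
SPLIT_AFFIXES = ["L001", "L002"]
READS = ["R1", "R2"]

# Assumption: pattern inferred from names such as A5_S12_L001_R1_001.fastq.gz
NAME_PATTERN = "{library_prefix}_{split_affix}_{read}_001.fastq.gz"


def expected_file_names(libraries, split_affixes, reads, pattern=NAME_PATTERN):
    """List one file name per (library, split affix, read) combination."""
    return [
        pattern.format(
            library_prefix=lib["library_prefix"], split_affix=affix, read=read
        )
        for lib, affix, read in product(libraries, split_affixes, reads)
    ]


if __name__ == "__main__":
    for name in expected_file_names(LIBRARIES, SPLIT_AFFIXES, READS):
        print(name)
```

With two libraries, two split affixes, and two reads, this yields the eight file names listed for this example.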
An even more complicated example, for multiple runs (the same samples sequenced into different directories) and multiple libraries, follows.
Files:
- Run1/A5_S12_L001_R1_001.fastq.gz
- Run1/A5_S12_L001_R2_001.fastq.gz
- Run1/A5_S12_L002_R1_001.fastq.gz
- Run1/A5_S12_L002_R2_001.fastq.gz
- Run1/A6_S1_L001_R1_001.fastq.gz
- Run1/A6_S1_L001_R2_001.fastq.gz
- Run1/A6_S1_L002_R1_001.fastq.gz
- Run1/A6_S1_L002_R2_001.fastq.gz
and
- Run2/A5_S12_L001_R1_001.fastq.gz
- Run2/A5_S12_L001_R2_001.fastq.gz
- Run2/A5_S12_L002_R1_001.fastq.gz
- Run2/A5_S12_L002_R2_001.fastq.gz
- Run2/A6_S1_L001_R1_001.fastq.gz
- Run2/A6_S1_L001_R2_001.fastq.gz
- Run2/A6_S1_L002_R1_001.fastq.gz
- Run2/A6_S1_L002_R2_001.fastq.gz
Command line:
nextflow onecellpipe.nf --config /data/onecellpipe/data_results/indrop_config_to_use.yaml
indrop_config_to_use.yaml:
# project and library settings
project_name : "libA5"
project_dir : "/data/onecellpipe/data_results_2"
sequencing_runs :
  - name : "Run1"
    version : "v2"
    dir : "/data/onecellpipe/more_data/Run1"
    fastq_path : "{read}.fastq.gz"
    split_affixes : ["L001", "L002"]
    libraries :
      - {library_name: "A5", library_prefix: "A5_S12"}
      - {library_name: "A6", library_prefix: "A6_S1"}
  - name : "Run2"
    version : "v2"
    dir : "/data/onecellpipe/more_data/Run2"
    fastq_path : "{read}.fastq.gz"
    split_affixes : ["L001", "L002"]
    libraries :
      - {library_name: "A5", library_prefix: "A5_S12"}
      - {library_name: "A6", library_prefix: "A6_S1"}
# standard indrops config
# part 1: general software paths within the container, do not change
paths :
  bowtie_index : '/home/onecellbio/ref/Homo_sapiens.GRCh38.91.annotated'
  bowtie_dir : '/home/onecellbio/bowtie'
  rsem_dir : '/home/onecellbio/RSEM/bin'
  python_dir : '/home/onecellbio/pyndrops/bin'
  indrops_dir : '/home/onecellbio/indrops'
  java_dir : '/usr/bin'
  samtools_dir : '/home/onecellbio/samtools-1.3.1'
# part 2: analysis parameters
parameters :
  umi_quantification_arguments:
    m : 10 #Ignore reads with more than M alignments, after filtering on distance from transcript end.
    u : 1 #Ignore counts from UMI that should be split among more than U genes.
    d : 600 #Maximal distance from transcript end, NOT INCLUDING THE POLYA TAIL
    split-ambigs : False #If umi is assigned to m genes, add 1/m to each genes count (instead of 1)
    min_non_polyA : 15 #Require reads to align to this much non-polyA sequence. (Set to 0 to disable filtering on this parameter.)
  output_arguments :
    output_unaligned_reads_to_other_fastq : False
    filter_alignments_to_softmasked_regions : False
  bowtie_arguments :
    m : 200
    n : 1
    l : 15
    e : 80
  trimmomatic_arguments :
    LEADING : "28"
    SLIDINGWINDOW : "4:20"
    MINLEN : "30"
    argument_order : ['LEADING', 'SLIDINGWINDOW', 'MINLEN']
  low_complexity_filter_arguments :
    max_low_complexity_fraction : 0.50
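With several sequencing runs it is easy to copy an entry and forget to update its dir, so that two runs silently read the same files. A small Python sketch of a sanity check follows. The function name duplicate_run_dirs is hypothetical, and the run entries are written out as plain dictionaries with illustrative values rather than read from the YAML file.

```python
from collections import defaultdict


def duplicate_run_dirs(sequencing_runs):
    """Map each dir used by more than one run to the names of those runs."""
    runs_by_dir = defaultdict(list)
    for run in sequencing_runs:
        # Strip trailing slashes so "/x/Run1/" and "/x/Run1" compare equal.
        runs_by_dir[run["dir"].rstrip("/")].append(run["name"])
    return {d: names for d, names in runs_by_dir.items() if len(names) > 1}


if __name__ == "__main__":
    # Illustrative entries, one directory per run.
    runs = [
        {"name": "Run1", "dir": "/data/onecellpipe/more_data/Run1"},
        {"name": "Run2", "dir": "/data/onecellpipe/more_data/Run2"},
    ]
    clashes = duplicate_run_dirs(runs)
    print(clashes if clashes else "each run has its own directory")
```

An empty result means every run points at its own directory; otherwise the returned mapping names the runs that share one.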