An example of running the OneCellBio OneCellPipe pipeline with a predefined configuration file, for a sequencing run with a single part and a single library.
Files:
- R1.fastq.gz
- R2.fastq.gz
Command line:
nextflow onecellpipe.nf --config /data/onecellpipe/data_results/indrop_config_to_use.yaml --out /data/onecellpipe/data_results
indrop_config_to_use.yaml:
# project and library settings
project_name : "libA5"
project_dir : "/data/onecellpipe/data_results"
sequencing_runs :
  - name : "libA5"
    version : "v2"
    dir : "/data/onecellpipe/data"
    fastq_path : "{read}.fastq.gz"
    library_name: "libA5"

# standard indrops config
# part 1: general software paths within the container, do not change
paths :
  bowtie_index : '/home/onecellbio/ref/Homo_sapiens.GRCh38.91.annotated'
  bowtie_dir : '/home/onecellbio/bowtie'
  rsem_dir : '/home/onecellbio/RSEM/bin'
  python_dir : '/home/onecellbio/pyndrops/bin'
  indrops_dir : '/home/onecellbio/indrops'
  java_dir : '/usr/bin'
  samtools_dir : '/home/onecellbio/samtools-1.3.1'

# part 2: analysis parameters
parameters :
  umi_quantification_arguments:
    m : 10 #Ignore reads with more than M alignments, after filtering on distance from transcript end.
    u : 1 #Ignore counts from UMI that should be split among more than U genes.
    d : 600 #Maximal distance from transcript end, NOT INCLUDING THE POLYA TAIL
    split-ambigs : False #If umi is assigned to m genes, add 1/m to each genes count (instead of 1)
    min_non_polyA : 15 #Require reads to align to this much non-polyA sequence. (Set to 0 to disable filtering on this parameter.)
  output_arguments :
    output_unaligned_reads_to_other_fastq : False
    filter_alignments_to_softmasked_regions : False
  bowtie_arguments :
    m : 200
    n : 1
    l : 15
    e : 80
  trimmomatic_arguments :
    LEADING : "28"
    SLIDINGWINDOW : "4:20"
    MINLEN : "30"
    argument_order : ['LEADING', 'SLIDINGWINDOW', 'MINLEN']
  low_complexity_filter_arguments :
    max_low_complexity_fraction : 0.50
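To see which files a single-library entry points at, the `{read}` placeholder in `fastq_path` can be expanded by hand. The sketch below is illustrative only: the helper `expected_fastq_files` is hypothetical (not part of OneCellPipe or indrops), and it assumes that `{read}` takes the values `R1` and `R2`, which matches the two files listed for this example.

```python
import posixpath


def expected_fastq_files(run, reads=("R1", "R2")):
    """Return the FASTQ paths implied by one sequencing_runs entry.

    Assumption: the {read} placeholder is filled with R1 and R2.
    """
    return [
        posixpath.join(run["dir"], run["fastq_path"].format(read=read))
        for read in reads
    ]


# Values copied from the sequencing_runs entry of the config above.
run = {
    "name": "libA5",
    "version": "v2",
    "dir": "/data/onecellpipe/data",
    "fastq_path": "{read}.fastq.gz",
    "library_name": "libA5",
}

for path in expected_fastq_files(run):
    print(path)
```

Comparing the printed paths with the contents of the data directory before launching the pipeline is a cheap way to catch a mistyped `dir` or `fastq_path`.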
Another, more complicated example follows, for a single part but multiple libraries.
Files:
- A5_S12_L001_R1_001.fastq.gz
- A5_S12_L001_R2_001.fastq.gz
- A5_S12_L002_R1_001.fastq.gz
- A5_S12_L002_R2_001.fastq.gz
- A6_S1_L001_R1_001.fastq.gz
- A6_S1_L001_R2_001.fastq.gz
- A6_S1_L002_R1_001.fastq.gz
- A6_S1_L002_R2_001.fastq.gz
Command line:
nextflow onecellpipe.nf --config /data/onecellpipe/data_results/indrop_config_to_use.yaml --out /data/onecellpipe/data_results_2
indrop_config_to_use.yaml:
# project and library settings
project_name : "libA5"
project_dir : "/data/onecellpipe/data_results_2"
sequencing_runs :
  - name : "libA5"
    version : "v2"
    dir : "/data/onecellpipe/more_data"
    fastq_path : "{read}.fastq.gz"
    split_affixes : ["L001", "L002"]
    libraries :
      - {library_name: "A5", library_prefix: "A5_S12"}
      - {library_name: "A6", library_prefix: "A6_S1"}

# standard indrops config
# part 1: general software paths within the container, do not change
paths :
  bowtie_index : '/home/onecellbio/ref/Homo_sapiens.GRCh38.91.annotated'
  bowtie_dir : '/home/onecellbio/bowtie'
  rsem_dir : '/home/onecellbio/RSEM/bin'
  python_dir : '/home/onecellbio/pyndrops/bin'
  indrops_dir : '/home/onecellbio/indrops'
  java_dir : '/usr/bin'
  samtools_dir : '/home/onecellbio/samtools-1.3.1'

# part 2: analysis parameters
parameters :
  umi_quantification_arguments:
    m : 10 #Ignore reads with more than M alignments, after filtering on distance from transcript end.
    u : 1 #Ignore counts from UMI that should be split among more than U genes.
    d : 600 #Maximal distance from transcript end, NOT INCLUDING THE POLYA TAIL
    split-ambigs : False #If umi is assigned to m genes, add 1/m to each genes count (instead of 1)
    min_non_polyA : 15 #Require reads to align to this much non-polyA sequence. (Set to 0 to disable filtering on this parameter.)
  output_arguments :
    output_unaligned_reads_to_other_fastq : False
    filter_alignments_to_softmasked_regions : False
  bowtie_arguments :
    m : 200
    n : 1
    l : 15
    e : 80
  trimmomatic_arguments :
    LEADING : "28"
    SLIDINGWINDOW : "4:20"
    MINLEN : "30"
    argument_order : ['LEADING', 'SLIDINGWINDOW', 'MINLEN']
  low_complexity_filter_arguments :
    max_low_complexity_fraction : 0.50
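With several libraries and lane affixes, every combination of `library_prefix`, `split_affixes` entry and read corresponds to one file. The sketch below enumerates these combinations. Note the assumption: the template `{library_prefix}_{split_affix}_{read}_001.fastq.gz` is not the `fastq_path` value in the config above; it follows the placeholder convention described in the upstream indrops documentation and is used here only because it reproduces the eight file names listed for this example. The helper `expand_run` is hypothetical.

```python
from itertools import product

# Assumed template (NOT the fastq_path value from the config shown here).
ASSUMED_TEMPLATE = "{library_prefix}_{split_affix}_{read}_001.fastq.gz"


def expand_run(libraries, split_affixes, template=ASSUMED_TEMPLATE, reads=("R1", "R2")):
    """List one file name per (library, split affix, read) combination."""
    names = []
    for library, affix, read in product(libraries, split_affixes, reads):
        names.append(
            template.format(
                library_prefix=library["library_prefix"],
                split_affix=affix,
                read=read,
            )
        )
    return names


# Values copied from the libraries and split_affixes entries of the config above.
libraries = [
    {"library_name": "A5", "library_prefix": "A5_S12"},
    {"library_name": "A6", "library_prefix": "A6_S1"},
]
file_names = expand_run(libraries, ["L001", "L002"])

for name in file_names:
    print(name)
```

Two libraries, two affixes and two reads give eight names, which can be compared against the file list of the run directory.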
Even more complicated: an example follows for multiple runs (the same samples, stored in different directories) and multiple libraries.
Files:
- Run1/A5_S12_L001_R1_001.fastq.gz
- Run1/A5_S12_L001_R2_001.fastq.gz
- Run1/A5_S12_L002_R1_001.fastq.gz
- Run1/A5_S12_L002_R2_001.fastq.gz
- Run1/A6_S1_L001_R1_001.fastq.gz
- Run1/A6_S1_L001_R2_001.fastq.gz
- Run1/A6_S1_L002_R1_001.fastq.gz
- Run1/A6_S1_L002_R2_001.fastq.gz
and
- Run2/A5_S12_L001_R1_001.fastq.gz
- Run2/A5_S12_L001_R2_001.fastq.gz
- Run2/A5_S12_L002_R1_001.fastq.gz
- Run2/A5_S12_L002_R2_001.fastq.gz
- Run2/A6_S1_L001_R1_001.fastq.gz
- Run2/A6_S1_L001_R2_001.fastq.gz
- Run2/A6_S1_L002_R1_001.fastq.gz
- Run2/A6_S1_L002_R2_001.fastq.gz
Command line:
nextflow onecellpipe.nf --config /data/onecellpipe/data_results/indrop_config_to_use.yaml
indrop_config_to_use.yaml:
# project and library settings
project_name : "libA5"
project_dir : "/data/onecellpipe/data_results_2"
sequencing_runs :
  - name : "Run1"
    version : "v2"
    dir : "/data/onecellpipe/more_data/Run1"
    fastq_path : "{read}.fastq.gz"
    split_affixes : ["L001", "L002"]
    libraries :
      - {library_name: "A5", library_prefix: "A5_S12"}
      - {library_name: "A6", library_prefix: "A6_S1"}
  - name : "Run2"
    version : "v2"
    dir : "/data/onecellpipe/more_data/Run2"
    fastq_path : "{read}.fastq.gz"
    split_affixes : ["L001", "L002"]
    libraries :
      - {library_name: "A5", library_prefix: "A5_S12"}
      - {library_name: "A6", library_prefix: "A6_S1"}

# standard indrops config
# part 1: general software paths within the container, do not change
paths :
  bowtie_index : '/home/onecellbio/ref/Homo_sapiens.GRCh38.91.annotated'
  bowtie_dir : '/home/onecellbio/bowtie'
  rsem_dir : '/home/onecellbio/RSEM/bin'
  python_dir : '/home/onecellbio/pyndrops/bin'
  indrops_dir : '/home/onecellbio/indrops'
  java_dir : '/usr/bin'
  samtools_dir : '/home/onecellbio/samtools-1.3.1'

# part 2: analysis parameters
parameters :
  umi_quantification_arguments:
    m : 10 #Ignore reads with more than M alignments, after filtering on distance from transcript end.
    u : 1 #Ignore counts from UMI that should be split among more than U genes.
    d : 600 #Maximal distance from transcript end, NOT INCLUDING THE POLYA TAIL
    split-ambigs : False #If umi is assigned to m genes, add 1/m to each genes count (instead of 1)
    min_non_polyA : 15 #Require reads to align to this much non-polyA sequence. (Set to 0 to disable filtering on this parameter.)
  output_arguments :
    output_unaligned_reads_to_other_fastq : False
    filter_alignments_to_softmasked_regions : False
  bowtie_arguments :
    m : 200
    n : 1
    l : 15
    e : 80
  trimmomatic_arguments :
    LEADING : "28"
    SLIDINGWINDOW : "4:20"
    MINLEN : "30"
    argument_order : ['LEADING', 'SLIDINGWINDOW', 'MINLEN']
  low_complexity_filter_arguments :
    max_low_complexity_fraction : 0.50
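When run entries are created by copying one block and editing it, a `dir` that still points at another run's directory is easy to overlook. The following is a minimal sanity check, sketched under the assumption that each run is meant to have its own directory. The helper `shared_run_dirs` is hypothetical and not part of OneCellPipe; the `runs` list mirrors only the `name` and `dir` fields of a `sequencing_runs` section.

```python
from collections import defaultdict


def shared_run_dirs(sequencing_runs):
    """Map each directory used by more than one run to the names of those runs."""
    names_by_dir = defaultdict(list)
    for run in sequencing_runs:
        # Ignore a trailing slash so "/x/Run1" and "/x/Run1/" count as the same directory.
        names_by_dir[run["dir"].rstrip("/")].append(run["name"])
    return {d: names for d, names in names_by_dir.items() if len(names) > 1}


# Illustrative entries: two runs, each with its own directory.
runs = [
    {"name": "Run1", "dir": "/data/onecellpipe/more_data/Run1"},
    {"name": "Run2", "dir": "/data/onecellpipe/more_data/Run2"},
]

clashes = shared_run_dirs(runs)
if clashes:
    for directory, names in clashes.items():
        print(f"{directory} is used by several runs: {', '.join(names)}")
else:
    print("every run has its own directory")
```

An empty result means no two runs share a directory; any entry in the result names the runs whose `dir` values should be reviewed.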