Nextflow on Pegasus
===================
This is a simplified guide to running `nf-core/sarek `__, a Nextflow workflow designed to detect variants in whole-genome or targeted sequencing data.
If you are new to Nextflow on HPC, please see the `Software Carpentries Tutorials `__ provided by the `Nextflow Community `__.
Load Nextflow module
--------------------
::
[pdavila@login4 ~]$ module load nextflow/25.04.2
If you need to run an nf-core pipeline that requires DSL1 or an older version of Nextflow, you can install your own Nextflow executable by following the instructions below.
::
mkdir -p $HOME/bin && cd $HOME/bin
export NXF_VER=24.04.4
curl -s https://get.nextflow.io | bash
# Set as your default nextflow
echo 'export PATH=$HOME/bin:$PATH' >> $HOME/.bash_profile
source $HOME/.bash_profile
# Return to your previous $PWD
cd -
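Running the ``echo ... >> $HOME/.bash_profile`` line more than once appends a duplicate ``PATH`` entry each time. As a minimal sketch, the helper below prepends a directory only when it is missing. The name ``path_prepend_once`` is hypothetical and is not part of Nextflow or the Pegasus environment.

```shell
#!/bin/bash
# Hypothetical helper: prepend a directory to PATH only if it is not already there.
path_prepend_once() {
    local dir="$1"
    case ":$PATH:" in
        *":$dir:"*) ;;              # already present, nothing to do
        *) PATH="$dir:$PATH" ;;     # prepend so this directory is searched first
    esac
    export PATH
}

path_prepend_once "$HOME/bin"

# Report which nextflow executable the shell now resolves, if any.
command -v nextflow || echo "nextflow not found on PATH"
```

Placing the function and its call in ``$HOME/.bash_profile`` keeps ``PATH`` free of repeated entries across logins.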
Sample User Nextflow LSF Configuration
--------------------------------------
::
[pdavila@login4 nextflow]$ cat pedro_pegasus_hpc.config
params {
config_profile_description = 'UM SCCC pegasus HPC profile.'
config_profile_contact = 'pdavila@miami.edu'
max_memory = 384.GB
max_cpus = 64
max_time = 30.m
lsf_queue_size = 256
schema_ignore_params = 'genomes,lsf_opts,lsf_queue_size'
validationSchemaIgnoreParams = "genomes,lsf_opts,lsf_queue_size,schema_ignore_params"
}
singularity.enabled = true
aws {
client {
anonymous = true
}
}
process {
clusterOptions = '-P sccc_ceccarelli'
scratch = '/scratch/projects/sccc_ceccarelli/pdavila'
resourceLimits = [
memory: 6.GB,
cpus: 1,
time: 30.m
]
queue = 'sccc2'
executor = 'lsf'
maxRetries = 3
errorStrategy = { task.attempt <= 3 ? 'retry' : 'finish' }
cache = 'lenient'
}
executor {
perTaskReserve = true
perJobMemLimit = false
queueSize = params.lsf_queue_size
}
Nextflow Submission Script
--------------------------
::
[pdavila@login4 nextflow]$ cat pedro_run_nextflow.sh
#!/bin/bash
#BSUB -J nf-sarek # assign a name to job.
#BSUB -P [YOUR_PROJECT] # specify the project to use when submitting the job
#BSUB -e %J.err # redirect std error to a specified file
#BSUB -o %J.out # redirect std out to a specified file
#BSUB -W 1:00 # set wall clock run time limit of 1 hour, else queue default will be applied
#BSUB -q sccc # specify queue to be used, else 'general' queue will be applied
#BSUB -n 1 # specify number of processors. In this job, a single processor is requested
#BSUB -R "rusage[mem=6144]" # request 6 GiB per core (6144 MiB = 6 GiB) to match your Nextflow Pegasus config
#BSUB -B # send mail to specified email when the job is dispatched and begins execution
#BSUB -u [YOUR_EMAIL] # send notification to your email
#BSUB -N # send job statistics report through email when job finishes
# The nextflow/24.04.4 module is built in a miniforge3 conda env; a newer nextflow/25.04.2 module is also available.
module load nextflow/24.04.4
module load singularity
export SCRATCH="/scratch/projects/sccc_ceccarelli/pdavila"
export NXF_SINGULARITY_CACHEDIR="$HOME/SINGULARITY_CACHEDIR"
export SINGULARITY_CACHEDIR="$HOME/SINGULARITY_CACHEDIR"
mkdir -p $NXF_SINGULARITY_CACHEDIR $SINGULARITY_CACHEDIR
# Run pipeline with test data
nextflow run nf-core/sarek -r 3.4.4 \
-profile test,singularity \
--outdir ./results \
-c pedro_pegasus_hpc.config \
-resume -bg > run_pipeline.log
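The ``rusage[mem=6144]`` request in the script is given in MiB, as its comment notes. A small sketch, assuming a hypothetical helper named ``gib_to_mib``, derives that number from a GiB value so the ``#BSUB -R`` line and the ``memory`` entry in the sample configuration stay in step:

```shell
#!/bin/bash
# Hypothetical helper: convert whole GiB to MiB (1 GiB = 1024 MiB).
gib_to_mib() {
    echo $(( $1 * 1024 ))
}

mem_gib=6   # matches `memory: 6.GB` in the sample configuration
echo "#BSUB -R \"rusage[mem=$(gib_to_mib "$mem_gib")]\""
```

If you raise the memory limit in the configuration, recompute the value and update the ``#BSUB -R`` line to match.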
Submit Job to LSF Cluster
-------------------------
::
[pdavila@login4 nextflow]$ bsub < pedro_run_nextflow.sh
Job <28994646> is submitted to queue <sccc>.
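LSF confirms a submission with a line of the form ``Job <ID> is submitted to queue <name>.`` The sketch below captures the job ID from that line so it can be reused later. The helper name ``parse_job_id`` is hypothetical, and the sample line stands in for real ``bsub`` output.

```shell
#!/bin/bash
# Hypothetical helper: extract the numeric job ID from bsub's confirmation line.
parse_job_id() {
    sed -n 's/^Job <\([0-9][0-9]*\)> is submitted to .*/\1/p'
}

# Demonstrated on a sample line; on the cluster you would pipe bsub's output in,
# e.g.  bsub < pedro_run_nextflow.sh | parse_job_id
job_id=$(echo 'Job <28994646> is submitted to queue <sccc>.' | parse_job_id)
echo "$job_id"
```

The captured ID can then be passed to commands such as ``bpeek`` or ``bjobs``.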
Check status of running Job
---------------------------
::
[pdavila@login4 nextflow]$ bpeek 28994646
<< output from stdout >>
<< output from stderr >>
--------------------------------------------------------------------------------
This Conda env provides Nextflow 24.04.4, nc-core/tools 2.14.1, Java 17.0.11,
Python 3.12, and all their dependencies.
--------------------------------------------------------------------------------
You can also use the ``tail -f run_pipeline.log`` command to follow the log file as your job writes to it.
::
[pdavila@login4 nextflow]$ tail -f run_pipeline.log
N E X T F L O W ~ version 24.04.4
WARN: It appears you have never run this project before -- Option `-resume` is ignored
Launching `https://github.com/nf-core/sarek` [berserk_koch] DSL2 - revision: 5cc30494a6 [3.4.4]
...
[a6/256990] Submitted process > NFCORE_SAREK:SAREK:FASTQC (test-test_L1)
[ed/ef1b85] Submitted process > NFCORE_SAREK:SAREK:FASTQC (test-test_L2)
[84/f728c7] Submitted process > NFCORE_SAREK:PREPARE_GENOME:BWAMEM1_INDEX (genome.fasta)
[ca/46b83b] Submitted process > NFCORE_SAREK:PREPARE_INTERVALS:CREATE_INTERVALS_BED (genome.interval_list)
[18/2d0b64] Submitted process > NFCORE_SAREK:PREPARE_INTERVALS:GATK4_INTERVALLISTTOBED (genome)
Pulling Singularity image https://depot.galaxyproject.org/singularity/htslib:1.19.1--h81da01d_1 [cache /nethome/pdavila/SINGULARITY_CACHEDIR/depot.galaxyproject.org-singularity-htslib-1.19.1--h81da01d_1.img]
...
Pulling Singularity image https://depot.galaxyproject.org/singularity/multiqc:1.21--pyhdfd78af_0 [cache /nethome/pdavila/SINGULARITY_CACHEDIR/depot.galaxyproject.org-singularity-multiqc-1.21--pyhdfd78af_0.img]
[88/d69b7a] Submitted process > NFCORE_SAREK:SAREK:MULTIQC
-[nf-core/sarek] Pipeline completed successfully-
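Instead of watching the log interactively, you can test for the completion message shown in the output above. This is a minimal sketch: the helper name ``pipeline_succeeded`` is hypothetical, and it assumes the log is written to ``run_pipeline.log`` in the current directory, as in the submission script.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if the log contains the completion message
# that nf-core/sarek printed at the end of the run shown above.
pipeline_succeeded() {
    grep -q 'Pipeline completed successfully' "$1"
}

log="run_pipeline.log"
if [ -f "$log" ] && pipeline_succeeded "$log"; then
    echo "Pipeline finished successfully"
else
    echo "No success message in $log yet"
fi
```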
View your Results
-----------------
::
[pdavila@login4 nextflow]$ tree results/
results/
├── csv
│ ├── markduplicates.csv
│ ├── markduplicates_no_table.csv
│ ├── recalibrated.csv
│ └── variantcalled.csv
├── multiqc
│ ├── multiqc_data
│ │ ├── gatk_base_recalibrator.txt
│ │ ├── mosdepth_cov_dist.txt
│ │ ├── mosdepth_cumcov_dist.txt
│ │ ├── mosdepth_perchrom.txt
│ │ ├── multiqc_bcftools_stats.txt
│ │ ├── multiqc_citations.txt
│ │ ├── multiqc_data.json
│ │ ├── multiqc_fastqc.txt
│ │ ├── multiqc_general_stats.txt
│ │ ├── multiqc.log
│ │ ├── multiqc_picard_dups.txt
│ │ ├── multiqc_samtools_stats.txt
│ │ ├── multiqc_software_versions.txt
│ │ ├── multiqc_sources.txt
│ │ ├── picard_histogram_1.txt
│ │ ├── picard_histogram_2.txt
│ │ ├── picard_histogram.txt
│ │ ├── vcftools_tstv_by_count.txt
│ │ └── vcftools_tstv_by_qual.txt
│ ├── multiqc_plots
│ └── multiqc_report.html
├── pipeline_info
│ ├── execution_report_2024-10-14_16-17-26.html
│ ├── execution_timeline_2024-10-14_16-17-26.html
│ ├── execution_trace_2024-10-14_16-17-26.txt
│ ├── manifest_2024-10-14_16-17-26.bco.json
│ ├── nf_core_sarek_software_mqc_versions.yml
│ ├── params_2024-10-14_16-17-52.json
│ └── pipeline_dag_2024-10-14_16-17-26.html
├── preprocessing
│ ├── markduplicates
│ │ └── test
│ │ ├── test.md.cram
│ │ └── test.md.cram.crai
│ ├── recalibrated
│ │ └── test
│ │ ├── test.recal.cram
│ │ └── test.recal.cram.crai
│ └── recal_table
│ └── test
│ └── test.recal.table
├── reference
├── reports
│ ├── bcftools
│ │ └── strelka
│ │ └── test
│ │ └── test.strelka.variants.bcftools_stats.txt
│ ├── fastqc
│ │ ├── test-test_L1
│ │ │ ├── test-test_L1_1_fastqc.html
│ │ │ ├── test-test_L1_1_fastqc.zip
│ │ │ ├── test-test_L1_2_fastqc.html
│ │ │ └── test-test_L1_2_fastqc.zip
│ │ └── test-test_L2
│ │ ├── test-test_L2_1_fastqc.html
│ │ ├── test-test_L2_1_fastqc.zip
│ │ ├── test-test_L2_2_fastqc.html
│ │ └── test-test_L2_2_fastqc.zip
│ ├── markduplicates
│ │ └── test
│ │ └── test.md.cram.metrics
│ ├── mosdepth
│ │ └── test
│ │ ├── test.md.mosdepth.global.dist.txt
│ │ ├── test.md.mosdepth.region.dist.txt
│ │ ├── test.md.mosdepth.summary.txt
│ │ ├── test.md.regions.bed.gz
│ │ ├── test.md.regions.bed.gz.csi
│ │ ├── test.recal.mosdepth.global.dist.txt
│ │ ├── test.recal.mosdepth.region.dist.txt
│ │ ├── test.recal.mosdepth.summary.txt
│ │ ├── test.recal.regions.bed.gz
│ │ └── test.recal.regions.bed.gz.csi
│ ├── samtools
│ │ └── test
│ │ ├── test.md.cram.stats
│ │ └── test.recal.cram.stats
│ └── vcftools
│ └── strelka
│ └── test
│ ├── test.strelka.variants.FILTER.summary
│ ├── test.strelka.variants.TsTv.count
│ └── test.strelka.variants.TsTv.qual
└── variant_calling
└── strelka
└── test
├── test.strelka.genome.vcf.gz
├── test.strelka.genome.vcf.gz.tbi
├── test.strelka.variants.vcf.gz
└── test.strelka.variants.vcf.gz.tbi
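The variant calls are usually the files of most interest in this tree. As a sketch, the hypothetical helper ``list_vcfs`` below lists every compressed VCF under a results directory. It assumes the ``./results`` output directory used in the submission script.

```shell
#!/bin/bash
# Hypothetical helper: list compressed VCF files under a directory, sorted by path.
list_vcfs() {
    find "$1" -type f -name '*.vcf.gz' | sort
}

results_dir="./results"
if [ -d "$results_dir" ]; then
    list_vcfs "$results_dir"
else
    echo "$results_dir not found; run the pipeline first"
fi
```

The ``-name '*.vcf.gz'`` pattern skips the ``.tbi`` index files that sit next to each VCF.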
Sources
-------
The nextflow/25.04.2 module provides Nextflow and nf-core/tools. It was installed using Miniforge and Bioconda.
https://nf-co.re/docs/nf-core-tools/installation