Nextflow in Neurodesk#
Author: Steffen Bollmann
Date: 13 Feb 2026
License:
Note: If this notebook uses neuroimaging tools from Neurocontainers, those tools retain their original licenses. Please see Neurodesk citation guidelines for details.
Citation and Resources#
Workflow engine#
Di Tommaso, P., Chatzou, M., Floden, E. W., Barja, P. P., Palumbo, E., & Notredame, C. (2017). Nextflow enables reproducible computational workflows. Nature Biotechnology, 35(4), 316-319. https://doi.org/10.1038/nbt.3820
Tools included in this workflow#
FSL - Brain Extraction Tool (BET)
Jenkinson, M., Beckmann, C. F., Behrens, T. E., Woolrich, M. W., & Smith, S. M. (2012). FSL. NeuroImage, 62(2), 782-790.
Smith S. M. (2002). Fast robust automated brain extraction. Human brain mapping, 17(3), 143–155. https://doi.org/10.1002/hbm.10062
Dataset from OSF#
Shaw, T., & Bollmann, S. (2020). Dataset for Towards Optimising MRI Methods for ChAracterisation of Tissue (TOMCAT) [Data set]. OSF. https://doi.org/10.17605/OSF.IO/BT4EZ
Introduction#
Nextflow is a workflow engine that lets you write data-driven computational pipelines. It handles parallelism, file staging, and error recovery so you can focus on the analysis logic.
Why use Nextflow for neuroimaging?
Automatically runs independent subjects in parallel
Tracks which steps have finished so you can resume failed runs with -resume
Works the same way on your laptop, an HPC cluster, or in the cloud
Large ecosystem of ready-made pipelines at nf-core
This notebook teaches Nextflow from scratch through three progressively complex examples:
A “hello world” pipeline
Brain extraction on a single subject
A multi-subject pipeline with quality-control summary
Setup#
Load FSL via the Neurodesk module system.
import module
await module.load('fsl/6.0.7.16')
await module.list()
# Nextflow is installed inside Neurodesk
!nextflow -version
Download test data#
We download a T1-weighted MRI scan from the TOMCAT dataset on OSF, then create a copy as a simulated second subject for the multi-subject demo later.
%%bash
mkdir -p data
if [ -f ./data/sub-01.nii ]; then
echo "sub-01.nii already exists, skipping download"
else
if [ ! -f ./data/sub-01.nii.gz ]; then
echo "Downloading T1w from OSF..."
osf -p bt4ez fetch osfstorage/TOMCAT_DIB/sub-01/ses-01_7T/anat/sub-01_ses-01_7T_T1w_defaced.nii.gz ./data/sub-01.nii.gz
fi
echo "Decompressing..."
gunzip ./data/sub-01.nii.gz
fi
echo "Done."
ls -lh ./data/sub-01.nii
Create a simulated second subject by copying sub-01#
%%bash
if [ ! -f ./data/sub-02.nii ]; then
  cp ./data/sub-01.nii ./data/sub-02.nii
  echo "Created sub-02.nii"
else
  echo "sub-02.nii already exists"
fi
ls -lh ./data/
Hello Nextflow#
Every Nextflow pipeline is a .nf script containing processes (units of work) and a workflow that wires them together using channels (asynchronous data queues).
Let’s start with the simplest possible pipeline.
%%writefile hello.nf
process SAY_HELLO {
input:
val greeting
output:
stdout
script:
"""
echo "$greeting from Nextflow!"
"""
}
workflow {
greetings = Channel.of('Hello', 'Bonjour', 'Hola')
SAY_HELLO(greetings) | view
}
!nextflow run hello.nf
What just happened?
Channel.of(...) created a channel with three items
Nextflow launched the SAY_HELLO process three times, once per item, potentially in parallel
Each execution ran in its own isolated work/ subdirectory
Let’s peek inside the work directory to see how Nextflow organises task execution:
%%bash
echo "=== work directory structure ==="
# Show the first task directory as an example
TASK_DIR=$(find work -name '.command.sh' -print -quit 2>/dev/null | xargs dirname)
if [ -n "$TASK_DIR" ]; then
echo "Example task dir: $TASK_DIR"
echo ""
echo "--- .command.sh (the actual script that ran) ---"
cat "$TASK_DIR/.command.sh"
echo ""
echo "--- .command.log (stdout/stderr) ---"
cat "$TASK_DIR/.command.log"
else
echo "No work directory found (pipeline may not have run)"
fi
Core concepts#
| Concept | Description |
|---|---|
| Process | A unit of work with defined inputs, outputs, and a script. Runs in an isolated directory. |
| Channel | An asynchronous queue that connects processes. Data flows through channels. |
| Workflow | The top-level block that creates channels and wires processes together. |
| publishDir | Copies output files from the work directory to a permanent results folder. |
| params | Pipeline parameters that can be set on the command line with --<name> (e.g. --frac 0.3). |
Nextflow automatically handles:
Parallelism: If a channel has N items, the process runs N times (potentially concurrently)
File staging: Input files are symlinked into each task’s work directory
Resumability: Use -resume to skip already-completed tasks after a failure
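The fan-out behaviour can be pictured with a rough Python analogy (this is only an illustration of the concept; Nextflow schedules real isolated processes, not Python threads):

```python
# A rough analogy, NOT how Nextflow is implemented: a channel with
# N items behaves like mapping a function over a list, potentially concurrently.
from concurrent.futures import ThreadPoolExecutor

def say_hello(greeting):
    # Mirrors the SAY_HELLO process body
    return f"{greeting} from Nextflow!"

greetings = ["Hello", "Bonjour", "Hola"]  # like Channel.of('Hello', 'Bonjour', 'Hola')
with ThreadPoolExecutor(max_workers=3) as pool:
    # map preserves input order even though tasks may run concurrently
    results = list(pool.map(say_hello, greetings))
print(results)
```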
Single-subject brain extraction#
Now let’s do something useful: run FSL’s bet (Brain Extraction Tool) on a single T1w image via Nextflow.
This introduces:
path inputs (file handling)
params for configurable settings
publishDir to save outputs to a results folder
%%writefile bet_single.nf
params.input = './data/sub-01.nii'
params.outdir = './results_single'
params.frac = 0.4
process BET {
publishDir params.outdir, mode: 'copy'
input:
path t1w
output:
path '*_brain.*'
script:
"""
bet ${t1w} ${t1w.baseName}_brain -f ${params.frac} -m
"""
}
workflow {
input_ch = Channel.fromPath(params.input)
BET(input_ch)
}
!nextflow run bet_single.nf
!ls -lh results_single/
Key points:
Channel.fromPath(...) creates a channel from a file path
Inside the script block, ${t1w} refers to the staged input file
publishDir copies the outputs matching '*_brain.*' to our results folder
We could override any parameter from the command line, e.g. nextflow run bet_single.nf --frac 0.3
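To see what `${t1w.baseName}` expands to, here is a small Python sketch using pathlib's `.stem`, which behaves like `baseName` for single-extension files (note: both strip only one extension, so for `sub-01.nii.gz` you would get `sub-01.nii`):

```python
from pathlib import Path

# .stem strips one extension, like Nextflow's baseName property
t1w = Path("sub-01.nii")
brain_output = f"{t1w.stem}_brain"
print(brain_output)  # sub-01_brain
```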
Multi-subject pipeline#
Real neuroimaging studies have multiple subjects. Nextflow makes this easy — we just put multiple files into a channel and Nextflow fans out automatically.
This pipeline has two processes:
BET — runs brain extraction per subject (parallel fan-out)
QC_SUMMARY — collects all results and generates a summary table (runs once after all BET tasks finish)
%%writefile bet_multi.nf
params.inputs = './data/sub-*.nii'
params.outdir = './results_multi'
params.frac = 0.4
process BET {
publishDir "${params.outdir}/bet", mode: 'copy'
input:
path t1w
output:
path '*_brain.nii.gz'
path '*_brain_mask.nii.gz'
script:
"""
bet ${t1w} ${t1w.baseName}_brain -f ${params.frac} -m
"""
}
process QC_SUMMARY {
publishDir params.outdir, mode: 'copy'
input:
path brain_images
path mask_images
output:
path 'qc_summary.tsv'
script:
"""
echo -e "subject\tbrain_volume_voxels" > qc_summary.tsv
for mask in *_brain_mask.nii.gz; do
subj=\$(echo \$mask | sed 's/_brain_mask.nii.gz//')
nvox=\$(fslstats \$mask -V | awk '{print \$1}')
echo -e "\${subj}\t\${nvox}" >> qc_summary.tsv
done
echo "=== QC Summary ==="
cat qc_summary.tsv
"""
}
workflow {
t1w_ch = Channel.fromPath(params.inputs)
BET(t1w_ch)
QC_SUMMARY(
BET.out[0].collect(),
BET.out[1].collect()
)
}
!nextflow run bet_multi.nf
%%bash
echo "=== Output files ==="
ls -lh results_multi/bet/
echo ""
echo "=== QC Summary ==="
cat results_multi/qc_summary.tsv
What’s new here?
Channel.fromPath('data/sub-*.nii') picks up both sub-01.nii and sub-02.nii
Nextflow runs BET twice in parallel (one per subject)
.collect() gathers all per-subject outputs into a single list and passes them to QC_SUMMARY
QC_SUMMARY runs once, after all BET tasks complete, and generates a combined table
This fan-out/collect pattern is the foundation of most neuroimaging Nextflow pipelines.
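The qc_summary.tsv produced above can be consumed downstream like any TSV. A minimal sketch, parsing an inline sample with made-up voxel counts rather than the real file:

```python
import csv
import io

# Inline sample in the same shape as qc_summary.tsv (voxel counts are made up)
sample = "subject\tbrain_volume_voxels\nsub-01\t1400000\nsub-02\t1400000\n"

# DictReader uses the header row as keys; delimiter="\t" for TSV
reader = csv.DictReader(io.StringIO(sample), delimiter="\t")
volumes = {row["subject"]: int(row["brain_volume_voxels"]) for row in reader}
print(volumes)
```

To read the real file, replace the `io.StringIO(sample)` with `open('results_multi/qc_summary.tsv')`.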
Visualize results#
Use ipyniivue to visualize results. Uncomment the first entry in load_volumes to overlay the original T1w image for comparison.
from ipyniivue import NiiVue
nv = NiiVue()
nv.load_volumes([
# {"path": "./data/sub-01.nii", "colormap": "gray"},
{"path": "./results_multi/bet/sub-01_brain.nii.gz", "colormap": "red"}
])
nv
Next steps#
You now know the core Nextflow patterns. Here are some ways to extend what you’ve learned:
-resume: Add this flag to skip already-completed tasks when re-running a pipeline after a failure or parameter change
nextflow.config: Move parameters, executor settings (local/SLURM/PBS), and resource limits (CPUs, memory) into a separate config file
Containers: Nextflow can pull and run Docker/Singularity containers per process; set container in a process or config
nf-core: Browse nf-co.re for production-grade neuroimaging pipelines and community best practices
More modalities: Extend the glob pattern (params.inputs) to pick up T2w, FLAIR, or functional data
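A quick way to sanity-check a broadened glob before handing it to params.inputs is to test it against expected filenames. A sketch with hypothetical names (Python's fnmatch covers the `*` and `?` wildcards; Nextflow's glob patterns additionally support brace sets such as `{T1w,T2w}`):

```python
import fnmatch

# Hypothetical filenames for illustration
names = ["sub-01_T1w.nii", "sub-01_T2w.nii", "sub-02_T1w.nii", "notes.txt"]

pattern = "sub-*_*.nii"  # a broadened glob, analogous to params.inputs
matched = sorted(n for n in names if fnmatch.fnmatch(n, pattern))
print(matched)  # notes.txt does not match
```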
Dependencies and environment capture#
We use the watermark package to document the system environment and software versions used in this notebook.
%load_ext watermark
%watermark
%watermark --iversions