Nextflow in Neurodesk#
Author: Steffen Bollmann
Date: 13 Feb 2026
License:
Note: If this notebook uses neuroimaging tools from Neurocontainers, those tools retain their original licenses. Please see Neurodesk citation guidelines for details.
Citation and Resources#
Workflow engine#
Di Tommaso, P., Chatzou, M., Floden, E. W., Barja, P. P., Palumbo, E., & Notredame, C. (2017). Nextflow enables reproducible computational workflows. Nature Biotechnology, 35(4), 316-319. https://doi.org/10.1038/nbt.3820
Tools included in this workflow#
FSL - Brain Extraction Tool (BET)
Jenkinson, M., Beckmann, C. F., Behrens, T. E., Woolrich, M. W., & Smith, S. M. (2012). FSL. NeuroImage, 62(2), 782-790.
Smith S. M. (2002). Fast robust automated brain extraction. Human brain mapping, 17(3), 143–155. https://doi.org/10.1002/hbm.10062
Dataset from OSF#
Shaw, T., & Bollmann, S. (2020). Dataset for Towards Optimising MRI Methods for ChAracterisation of Tissue (TOMCAT) [Data set]. OSF. https://doi.org/10.17605/OSF.IO/BT4EZ
Introduction#
Nextflow is a workflow engine that lets you write data-driven computational pipelines. It handles parallelism, file staging, and error recovery so you can focus on the analysis logic.
Why use Nextflow for neuroimaging?
Automatically runs independent subjects in parallel
Tracks which steps have finished so you can resume failed runs with -resume
Works the same way on your laptop, an HPC cluster, or in the cloud
Large ecosystem of ready-made pipelines at nf-core
This notebook teaches Nextflow from scratch through three progressively complex examples:
A “hello world” pipeline
Brain extraction on a single subject
A multi-subject pipeline with quality-control summary
Setup#
Load FSL via the Neurodesk module system.
import module
await module.load('fsl/6.0.7.16')
await module.list()
# Nextflow is installed inside Neurodesk
!nextflow -version
Download test data#
We download a T1-weighted MRI scan from the TOMCAT dataset on OSF, then create a copy as a simulated second subject for the multi-subject demo later.
%%bash
mkdir -p data
if [ -f ./data/sub-01.nii ]; then
echo "sub-01.nii already exists, skipping download"
else
if [ ! -f ./data/sub-01.nii.gz ]; then
echo "Downloading T1w from OSF..."
osf -p bt4ez fetch osfstorage/TOMCAT_DIB/sub-01/ses-01_7T/anat/sub-01_ses-01_7T_T1w_defaced.nii.gz ./data/sub-01.nii.gz
fi
echo "Decompressing..."
gunzip ./data/sub-01.nii.gz
fi
echo "Done."
ls -lh ./data/sub-01.nii
Create a simulated second subject by copying sub-01#
%%bash
if [ ! -f ./data/sub-02.nii ]; then
  cp ./data/sub-01.nii ./data/sub-02.nii
  echo "Created sub-02.nii"
else
  echo "sub-02.nii already exists"
fi
ls -lh ./data/
Hello Nextflow#
Every Nextflow pipeline is a .nf script containing processes (units of work) and a workflow that wires them together using channels (asynchronous data queues).
Let’s start with the simplest possible pipeline.
%%writefile hello.nf
process SAY_HELLO {
input:
val greeting
output:
stdout
script:
"""
echo "$greeting from Nextflow!"
"""
}
workflow {
greetings = Channel.of('Hello', 'Bonjour', 'Hola')
SAY_HELLO(greetings) | view
}
!nextflow run hello.nf
What just happened?
Channel.of(...) created a channel with three items
Nextflow launched the SAY_HELLO process three times, once per item, potentially in parallel
Each execution ran in its own isolated work/ subdirectory
Let’s peek inside the work directory to see how Nextflow organises task execution:
%%bash
echo "=== work directory structure ==="
# Show the first task directory as an example
TASK_DIR=$(find work -name '.command.sh' -print -quit 2>/dev/null | xargs dirname)
if [ -n "$TASK_DIR" ]; then
echo "Example task dir: $TASK_DIR"
echo ""
echo "--- .command.sh (the actual script that ran) ---"
cat "$TASK_DIR/.command.sh"
echo ""
echo "--- .command.log (stdout/stderr) ---"
cat "$TASK_DIR/.command.log"
else
echo "No work directory found (pipeline may not have run)"
fi
Core concepts#
| Concept | Description |
|---|---|
| Process | A unit of work with defined inputs, outputs, and a script. Runs in an isolated directory. |
| Channel | An asynchronous queue that connects processes. Data flows through channels. |
| Workflow | The top-level block that creates channels and wires processes together. |
| publishDir | Copies output files from the work directory to a permanent results folder. |
| params | Pipeline parameters that can be set on the command line with --<name> (e.g. --frac 0.3). |
Nextflow automatically handles:
Parallelism: If a channel has N items, the process runs N times (potentially concurrently)
File staging: Input files are symlinked into each task’s work directory
Resumability: Use -resume to skip already-completed tasks after a failure
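The fan-out behaviour can be pictured with a rough Python analogy (this is only an illustration of the concept; Nextflow schedules real isolated processes, not Python threads):

```python
# A rough analogy, NOT how Nextflow is implemented: a channel with
# N items behaves like mapping a function over a list, potentially concurrently.
from concurrent.futures import ThreadPoolExecutor

def say_hello(greeting):
    # Mirrors the SAY_HELLO process body
    return f"{greeting} from Nextflow!"

greetings = ["Hello", "Bonjour", "Hola"]  # like Channel.of('Hello', 'Bonjour', 'Hola')
with ThreadPoolExecutor(max_workers=3) as pool:
    # map preserves input order even though tasks may run concurrently
    results = list(pool.map(say_hello, greetings))
print(results)
```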
Single-subject brain extraction#
Now let’s do something useful: run FSL’s bet (Brain Extraction Tool) on a single T1w image via Nextflow.
This introduces:
path inputs (file handling)
params for configurable settings
publishDir to save outputs to a results folder
%%writefile bet_single.nf
params.input = './data/sub-01.nii'
params.outdir = './results_single'
params.frac = 0.4
process BET {
publishDir params.outdir, mode: 'copy'
input:
path t1w
output:
path '*_brain.*'
script:
"""
bet ${t1w} ${t1w.baseName}_brain -f ${params.frac} -m
"""
}
workflow {
input_ch = Channel.fromPath(params.input)
BET(input_ch)
}
!nextflow run bet_single.nf
!ls -lh results_single/
Key points:
Channel.fromPath(...) creates a channel from a file path
Inside the script block, ${t1w} refers to the staged input file
publishDir copies the outputs matching '*_brain.*' to our results folder
We could override any parameter from the command line, e.g. nextflow run bet_single.nf --frac 0.3
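To see what `${t1w.baseName}` expands to, here is a small Python sketch using pathlib's `.stem`, which behaves like `baseName` for single-extension files (note: both strip only one extension, so for `sub-01.nii.gz` you would get `sub-01.nii`):

```python
from pathlib import Path

# .stem strips one extension, like Nextflow's baseName property
t1w = Path("sub-01.nii")
brain_output = f"{t1w.stem}_brain"
print(brain_output)  # sub-01_brain
```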
Multi-subject pipeline#
Real neuroimaging studies have multiple subjects. Nextflow makes this easy — we just put multiple files into a channel and Nextflow fans out automatically.
This pipeline has two processes:
BET — runs brain extraction per subject (parallel fan-out)
QC_SUMMARY — collects all results and generates a summary table (runs once after all BET tasks finish)
%%writefile bet_multi.nf
params.inputs = './data/sub-*.nii'
params.outdir = './results_multi'
params.frac = 0.4
process BET {
publishDir "${params.outdir}/bet", mode: 'copy'
input:
path t1w
output:
path '*_brain.nii.gz'
path '*_brain_mask.nii.gz'
script:
"""
bet ${t1w} ${t1w.baseName}_brain -f ${params.frac} -m
"""
}
process QC_SUMMARY {
publishDir params.outdir, mode: 'copy'
input:
path brain_images
path mask_images
output:
path 'qc_summary.tsv'
script:
"""
echo -e "subject\tbrain_volume_voxels" > qc_summary.tsv
for mask in *_brain_mask.nii.gz; do
subj=\$(echo \$mask | sed 's/_brain_mask.nii.gz//')
nvox=\$(fslstats \$mask -V | awk '{print \$1}')
echo -e "\${subj}\t\${nvox}" >> qc_summary.tsv
done
echo "=== QC Summary ==="
cat qc_summary.tsv
"""
}
workflow {
t1w_ch = Channel.fromPath(params.inputs)
BET(t1w_ch)
QC_SUMMARY(
BET.out[0].collect(),
BET.out[1].collect()
)
}
!nextflow run bet_multi.nf
%%bash
echo "=== Output files ==="
ls -lh results_multi/bet/
echo ""
echo "=== QC Summary ==="
cat results_multi/qc_summary.tsv
What’s new here?
Channel.fromPath('data/sub-*.nii') picks up both sub-01.nii and sub-02.nii
Nextflow runs BET twice in parallel (one per subject)
.collect() gathers all per-subject outputs into a single list and passes them to QC_SUMMARY
QC_SUMMARY runs once, after all BET tasks complete, and generates a combined table
This fan-out/collect pattern is the foundation of most neuroimaging Nextflow pipelines.
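The qc_summary.tsv produced above can be consumed downstream like any TSV. A minimal sketch, parsing an inline sample with made-up voxel counts rather than the real file:

```python
import csv
import io

# Inline sample in the same shape as qc_summary.tsv (voxel counts are made up)
sample = "subject\tbrain_volume_voxels\nsub-01\t1400000\nsub-02\t1400000\n"

# DictReader uses the header row as keys; delimiter="\t" for TSV
reader = csv.DictReader(io.StringIO(sample), delimiter="\t")
volumes = {row["subject"]: int(row["brain_volume_voxels"]) for row in reader}
print(volumes)
```

To read the real file, replace the `io.StringIO(sample)` with `open('results_multi/qc_summary.tsv')`.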
Visualize results#
Use ipyniivue to visualize results. Uncomment the first entry in load_volumes to overlay the original T1w image for comparison.
from ipyniivue import NiiVue
nv = NiiVue()
nv.load_volumes([
# {"path": "./data/sub-01.nii", "colormap": "gray"},
{"path": "./results_multi/bet/sub-01_brain.nii.gz", "colormap": "red"}
])
nv
Next steps#
You now know the core Nextflow patterns. Here are some ways to extend what you’ve learned:
-resume: Add this flag to skip already-completed tasks when re-running a pipeline after a failure or parameter change
nextflow.config: Move parameters, executor settings (local/SLURM/PBS), and resource limits (CPUs, memory) into a separate config file
Containers: Nextflow can pull and run Docker/Singularity containers per process; set container in a process or config
nf-core: Browse nf-co.re for production-grade neuroimaging pipelines and community best practices
More modalities: Extend the glob pattern (params.inputs) to pick up T2w, FLAIR, or functional data
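A quick way to sanity-check a broadened glob before handing it to params.inputs is to test it against expected filenames. A sketch with hypothetical names (Python's fnmatch covers the `*` and `?` wildcards; Nextflow's glob patterns additionally support brace sets such as `{T1w,T2w}`):

```python
import fnmatch

# Hypothetical filenames for illustration
names = ["sub-01_T1w.nii", "sub-01_T2w.nii", "sub-02_T1w.nii", "notes.txt"]

pattern = "sub-*_*.nii"  # a broadened glob, analogous to params.inputs
matched = sorted(n for n in names if fnmatch.fnmatch(n, pattern))
print(matched)  # notes.txt does not match
```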
Dependencies and environment capture#
We use the watermark package to document the system environment and software versions used in this notebook.
%load_ext watermark
%watermark
%watermark --iversions