https://doi.org/10.5281/zenodo.18717009

Nextflow in Neurodesk#


Author: Steffen Bollmann

Date: 13 Feb 2026


License:

Note: If this notebook uses neuroimaging tools from Neurocontainers, those tools retain their original licenses. Please see Neurodesk citation guidelines for details.

Citation and Resources#

Workflow engine#

  • Di Tommaso, P., Chatzou, M., Floden, E. W., Barja, P. P., Palumbo, E., & Notredame, C. (2017). Nextflow enables reproducible computational workflows. Nature Biotechnology, 35(4), 316-319. https://doi.org/10.1038/nbt.3820

Tools included in this workflow#

FSL - Brain Extraction Tool (BET)

Dataset from OSF#

Table of contents#

  1. Introduction

  2. Setup

  3. Download test data

  4. Hello Nextflow

  5. Core concepts

  6. Single-subject brain extraction

  7. Multi-subject pipeline

  8. Visualize results

  9. Next steps

  10. Cleanup

  11. Dependencies and environment capture

Introduction#

Nextflow is a workflow engine that lets you write data-driven computational pipelines. It handles parallelism, file staging, and error recovery so you can focus on the analysis logic.

Why use Nextflow for neuroimaging?

  • Automatically runs independent subjects in parallel

  • Tracks which steps have finished so you can resume failed runs with -resume

  • Works the same way on your laptop, an HPC cluster, or in the cloud

  • Large ecosystem of ready-made pipelines at nf-core

This notebook teaches Nextflow from scratch through three progressively complex examples:

  1. A “hello world” pipeline

  2. Brain extraction on a single subject

  3. A multi-subject pipeline with quality-control summary

Setup#

Load FSL via the Neurodesk module system.

import module
await module.load('fsl/6.0.7.16')
await module.list()
# Nextflow is installed inside Neurodesk
!nextflow -version

Download test data#

We download a T1-weighted MRI scan from the TOMCAT dataset on OSF, then create a copy as a simulated second subject for the multi-subject demo later.

%%bash
mkdir -p data

if [ -f ./data/sub-01.nii ]; then
    echo "sub-01.nii already exists, skipping download"
else
    if [ ! -f ./data/sub-01.nii.gz ]; then
        echo "Downloading T1w from OSF..."
        osf -p bt4ez fetch osfstorage/TOMCAT_DIB/sub-01/ses-01_7T/anat/sub-01_ses-01_7T_T1w_defaced.nii.gz ./data/sub-01.nii.gz
    fi
    echo "Decompressing..."
    gunzip ./data/sub-01.nii.gz
fi

echo "Done."
ls -lh ./data/sub-01.nii

Create a simulated second subject by copying sub-01#

%%bash
if [ ! -f ./data/sub-02.nii ]; then
    cp ./data/sub-01.nii ./data/sub-02.nii
    echo "Created sub-02.nii"
else
    echo "sub-02.nii already exists"
fi

ls -lh ./data/

Hello Nextflow#

Every Nextflow pipeline is a .nf script containing processes (units of work) and a workflow that wires them together using channels (asynchronous data queues).

Let’s start with the simplest possible pipeline.

%%writefile hello.nf
process SAY_HELLO {
    input:
    val greeting

    output:
    stdout

    script:
    """
    echo "$greeting from Nextflow!"
    """
}

workflow {
    greetings = Channel.of('Hello', 'Bonjour', 'Hola')
    SAY_HELLO(greetings) | view
}
!nextflow run hello.nf

What just happened?

  1. Channel.of(...) created a channel with three items

  2. Nextflow launched the SAY_HELLO process three times — once per item, potentially in parallel

  3. Each execution ran in its own isolated work/ subdirectory

Let’s peek inside the work directory to see how Nextflow organises task execution:

%%bash
echo "=== work directory structure ==="
# Show the first task directory as an example
TASK_DIR=$(find work -name '.command.sh' -print -quit 2>/dev/null | xargs -r dirname)
if [ -n "$TASK_DIR" ]; then
    echo "Example task dir: $TASK_DIR"
    echo ""
    echo "--- .command.sh (the actual script that ran) ---"
    cat "$TASK_DIR/.command.sh"
    echo ""
    echo "--- .command.log (stdout/stderr) ---"
    cat "$TASK_DIR/.command.log"
else
    echo "No work directory found (pipeline may not have run)"
fi

Core concepts#

| Concept | Description |
| --- | --- |
| Process | A unit of work with defined inputs, outputs, and a script. Runs in an isolated directory. |
| Channel | An asynchronous queue that connects processes. Data flows through channels. |
| Workflow | The top-level block that creates channels and wires processes together. |
| publishDir | Copies output files from the work directory to a permanent results folder. |
| params | Pipeline parameters that can be set on the command line with --name value. |

Nextflow automatically handles:

  • Parallelism: If a channel has N items, the process runs N times (potentially concurrently)

  • File staging: Input files are symlinked into each task’s work directory

  • Resumability: Use -resume to skip already-completed tasks after a failure
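
To see resumability in action, re-run the hello pipeline with -resume. Tasks whose script and inputs are unchanged are restored from the cache instead of being executed again, so the run should finish almost immediately and report the tasks as cached:

!nextflow run hello.nf -resume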

Single-subject brain extraction#

Now let’s do something useful: run FSL’s bet (Brain Extraction Tool) on a single T1w image via Nextflow.

This introduces:

  • path inputs (file handling)

  • params for configurable settings

  • publishDir to save outputs to a results folder

%%writefile bet_single.nf
params.input = './data/sub-01.nii'
params.outdir = './results_single'
params.frac  = 0.4

process BET {
    publishDir params.outdir, mode: 'copy'

    input:
    path t1w

    output:
    path '*_brain*.nii.gz'

    script:
    """
    bet ${t1w} ${t1w.baseName}_brain -f ${params.frac} -m
    """
}

workflow {
    input_ch = Channel.fromPath(params.input)
    BET(input_ch)
}
!nextflow run bet_single.nf
!ls -lh results_single/

Key points:

  • Channel.fromPath(...) creates a channel from a file path

  • Inside the script block, ${t1w} refers to the staged input file

  • publishDir copies the outputs matching '*_brain*.nii.gz' (the extracted brain plus the binary mask produced by -m) to our results folder

  • We could override any parameter from the command line, e.g. nextflow run bet_single.nf --frac 0.3
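
For example, let's re-run with a smaller fractional intensity threshold without editing the script (smaller -f values make bet less aggressive, keeping more tissue):

!nextflow run bet_single.nf --frac 0.3

Note that changing --frac changes the generated task script, so even with -resume Nextflow would correctly re-run BET here.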

Multi-subject pipeline#

Real neuroimaging studies have multiple subjects. Nextflow makes this easy — we just put multiple files into a channel and Nextflow fans out automatically.

This pipeline has two processes:

  1. BET — runs brain extraction per subject (parallel fan-out)

  2. QC_SUMMARY — collects all results and generates a summary table (runs once after all BET tasks finish)

%%writefile bet_multi.nf
params.inputs = './data/sub-*.nii'
params.outdir = './results_multi'
params.frac   = 0.4

process BET {
    publishDir "${params.outdir}/bet", mode: 'copy'

    input:
    path t1w

    output:
    path '*_brain.nii.gz'
    path '*_brain_mask.nii.gz'

    script:
    """
    bet ${t1w} ${t1w.baseName}_brain -f ${params.frac} -m
    """
}

process QC_SUMMARY {
    publishDir params.outdir, mode: 'copy'

    input:
    path brain_images
    path mask_images

    output:
    path 'qc_summary.tsv'

    script:
    """
    echo -e "subject\tbrain_volume_voxels" > qc_summary.tsv
    for mask in *_brain_mask.nii.gz; do
        subj=\$(echo \$mask | sed 's/_brain_mask.nii.gz//')
        nvox=\$(fslstats \$mask -V | awk '{print \$1}')
        echo -e "\${subj}\t\${nvox}" >> qc_summary.tsv
    done
    echo "=== QC Summary ==="
    cat qc_summary.tsv
    """
}

workflow {
    t1w_ch = Channel.fromPath(params.inputs)

    BET(t1w_ch)

    QC_SUMMARY(
        BET.out[0].collect(),
        BET.out[1].collect()
    )
}
!nextflow run bet_multi.nf
%%bash
echo "=== Output files ==="
ls -lh results_multi/bet/
echo ""
echo "=== QC Summary ==="
cat results_multi/qc_summary.tsv

What’s new here?

  • Channel.fromPath('data/sub-*.nii') picks up both sub-01.nii and sub-02.nii

  • Nextflow runs BET twice in parallel (one per subject)

  • .collect() gathers all per-subject outputs into a single list and passes it to QC_SUMMARY

  • QC_SUMMARY runs once, after all BET tasks complete, and generates a combined table

This fan-out/collect pattern is the foundation of most neuroimaging Nextflow pipelines.
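
If the behaviour of .collect() feels abstract, here is a minimal sketch, independent of the BET pipeline, showing how it turns a queue of individual emissions into a single list:

%%writefile collect_demo.nf
workflow {
    // Three separate channel emissions...
    Channel.of(1, 2, 3)
        .collect()   // ...become a single emission: the list [1, 2, 3]
        .view()
}

!nextflow run collect_demo.nf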

Visualize results#

Use ipyniivue to visualize the results. Uncomment the first volume entry to load the original T1w underneath the extracted brain for comparison.

from ipyniivue import NiiVue

nv = NiiVue()
nv.load_volumes([
    # {"path": "./data/sub-01.nii", "colormap": "gray"},
    {"path": "./results_multi/bet/sub-01_brain.nii.gz", "colormap": "red"}
])
nv

Next steps#

You now know the core Nextflow patterns. Here are some ways to extend what you’ve learned:

  • -resume: Add this flag to skip already-completed tasks when re-running a pipeline after a failure or parameter change

  • nextflow.config: Move parameters, executor settings (local/SLURM/PBS), and resource limits (CPUs, memory) into a separate config file (see the sketch after this list)

  • Containers: Nextflow can pull and run Docker/Singularity containers per process — set container in a process or config

  • nf-core: Browse nf-co.re for production-grade neuroimaging pipelines and community best practices

  • More modalities: Extend the glob pattern (params.inputs) to pick up T2w, FLAIR, or functional data
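
As a concrete starting point, here is a minimal nextflow.config sketch. The executor, resource values, and the commented-out container image are placeholders to adapt to your own system, not tested defaults:

%%writefile nextflow.config
// Minimal sketch of a pipeline configuration (placeholder values)
params.frac = 0.4

process {
    executor = 'local'      // e.g. 'slurm' or 'pbspro' on an HPC cluster
    cpus     = 2
    memory   = '4 GB'
    // container = 'example/fsl:6.0.7'   // hypothetical image name
}

// singularity.enabled = true   // or docker.enabled = true when using containers

With this file next to the pipeline scripts, nextflow run bet_multi.nf picks up the settings automatically; command-line flags such as --frac still take precedence over config values.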

Dependencies and environment capture#

  • Use the watermark package to document the system environment and software versions used in this notebook

%load_ext watermark

%watermark
%watermark --iversions