GenTract: Generative Global Brain Tractography

Updated 24 November 2025
  • GenTract is a conditional generative model for global brain tractography that uses dMRI-derived SH volumes to produce anatomically plausible streamline sets.
  • It integrates diffusion-based and flow-matching paradigms to efficiently generate full tractograms while minimizing error propagation and noise sensitivity.
  • The framework leverages a transformer generator and VAE-based anatomical encoders to achieve state-of-the-art precision, robust performance, and fast inference.

GenTract is a conditional generative model for global brain tractography based on diffusion magnetic resonance imaging (dMRI). It reframes the process of tractogram reconstruction as a conditional generative modeling task, learning a direct mapping from dMRI-derived spherical harmonic (SH) coefficient volumes to complete, anatomically plausible sets of streamlines. GenTract represents the first generative approach for global tractography, integrating recent advances in diffusion-based and flow-matching generative paradigms to produce full tractograms in parallel, conditioned on the global dMRI signal (Sargood et al., 17 Nov 2025).

1. Background and Problem Formulation

Tractography aims to reconstruct trajectories of white-matter fiber bundles by inferring streamlines from dMRI data. Classical local methods generate streamlines incrementally by advancing along estimated local fiber orientations. These approaches suffer from error accumulation and high false positive rates, particularly under noisy or low-resolution conditions. Global tractography methods jointly optimize entire collections of streamlines for consistency with local fiber orientations but incur significant computational costs and may converge to suboptimal solutions.

Formally, GenTract operates between spaces:

  • $\mathcal{X} = \{x \in \mathbb{R}^{H \times W \times D \times m}\}$: 3D dMRI-derived SH coefficient volumes with $m$ coefficients per voxel.
  • $\mathcal{Y} = \{Y = \{s^{(i)}\}_{i=1}^N,\ s^{(i)} \in \mathbb{R}^{p \times 3}\}$: tractograms comprising $N$ streamlines, each a sequence of $p$ 3D points.

The generative objective is to learn a conditional generator $G: \mathcal{X} \rightarrow \mathcal{Y}$ such that

$G(x) = \{s^{(i)}\}_{i=1}^N \sim p_\theta(Y \mid x),$

with the parameter vector $\theta$ optimized to minimize the negative log-likelihood: $\theta^* = \arg\min_\theta \mathbb{E}_{X,Y \sim \text{data}}\left[-\log p_\theta(Y \mid X)\right]$.
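The shapes involved can be made concrete with a small sketch; the grid, coefficient, and streamline sizes below are illustrative placeholders, not values taken from the paper.

```python
# Minimal shape sketch of the mapping G: X -> Y. All concrete sizes (H, W, D, m, N, p)
# are hypothetical placeholders, not values reported in the paper.
import torch

H, W, D, m = 96, 96, 96, 28    # SH coefficient volume grid and number of SH coefficients
N, p = 10_000, 32              # streamlines per tractogram and points per streamline

x = torch.randn(H, W, D, m)    # one element of X: a dMRI-derived SH coefficient volume
Y = torch.randn(N, p, 3)       # one element of Y: N streamlines, each p ordered 3-D points

def G(x: torch.Tensor, n_streamlines: int = N) -> torch.Tensor:
    """Conditional generator stub: would return n_streamlines samples from p_theta(Y | x)."""
    raise NotImplementedError
```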

2. Generative Modeling Paradigms

GenTract implements two families of conditional generative models for $\mathcal{Y}$ given $\mathcal{X}$: diffusion-based generative modeling and flow-matching.

2.1 Diffusion-Based Generative Modeling

This approach employs a continuous-time stochastic differential equation (SDE) framework. The forward (noising) SDE is defined as

$dx_t = f(x_t, t)\, dt + g(t)\, dw_t, \quad x_0 \sim p_\text{data},$

where $w_t$ is standard Brownian motion. Generation proceeds via the reverse-time SDE $dx_t = [f(x_t, t) - g(t)^2 \nabla_x \log p_t(x_t)]\,dt + g(t)\,d\bar{w}_t$. A score network $s_\theta(x_t, t)$ is trained to approximate $\nabla_x \log p_t(x_t)$ using the denoising score matching objective $\mathcal{L}_D(\theta) = \mathbb{E}_{t \sim U[0,1],\, x_0 \sim p_\text{data},\, \epsilon \sim \mathcal{N}(0, I)} \left[\lVert \epsilon_\theta(x_t, t) - \epsilon \rVert^2 \right]$, where $x_t = \alpha_t x_0 + \sigma_t \epsilon$ and $\epsilon_\theta$ predicts $\epsilon$.
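A minimal training-step sketch of this objective in the epsilon-prediction form is given below; the network signature `eps_model(x_t, t, z)` and the cosine noise schedule are assumptions for illustration rather than the paper's exact choices.

```python
# Sketch of the denoising score-matching objective L_D (epsilon-prediction form).
import torch

def diffusion_loss(eps_model, x0, z):
    """x0: clean streamlines (B, p, 3); z: conditioning context; eps_model is epsilon_theta."""
    B = x0.shape[0]
    t = torch.rand(B, device=x0.device)                      # t ~ U[0, 1]
    alpha_t = torch.cos(0.5 * torch.pi * t).view(B, 1, 1)    # assumed cosine schedule
    sigma_t = torch.sin(0.5 * torch.pi * t).view(B, 1, 1)
    eps = torch.randn_like(x0)                               # eps ~ N(0, I)
    x_t = alpha_t * x0 + sigma_t * eps                       # forward noising
    return ((eps_model(x_t, t, z) - eps) ** 2).mean()        # || eps_theta - eps ||^2
```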

2.2 Flow-Matching Generative Modeling

In flow-matching, a neural network $v_\theta(x, t)$ learns a vector field $u_t(x)$ that transports noise $x_0 \sim \mathcal{N}(0, I)$ to data $x_1$ along the ODE $\frac{dx_t}{dt} = u_t(x_t)$, with the linear interpolation path $x_t = (1-t)x_0 + t x_1$. The training objective minimizes $\mathcal{L}_{FM}(\theta) = \mathbb{E}_{t, x_0, x_1} \left[\lVert v_\theta(x_t, t) - (x_1 - x_0) \rVert^2\right]$.
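A corresponding sketch of the flow-matching objective under the linear path is shown below; the velocity-network signature `v_model(x, t, z)` is an assumed interface.

```python
# Sketch of the conditional flow-matching objective L_FM along x_t = (1 - t) x0 + t x1.
import torch

def flow_matching_loss(v_model, x1, z):
    """x1: clean streamlines (B, p, 3); z: conditioning context; v_model is v_theta."""
    B = x1.shape[0]
    t = torch.rand(B, device=x1.device).view(B, 1, 1)
    x0 = torch.randn_like(x1)                      # noise sample
    x_t = (1 - t) * x0 + t * x1                    # point on the straight-line path
    target = x1 - x0                               # constant velocity of the path
    return ((v_model(x_t, t.view(B), z) - target) ** 2).mean()
```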

3. Model Architecture

GenTract’s architecture consists of two principal modules.

3.1 Anatomical Conditioning Encoder

For each of the $m$ SH coefficient volumes, an independent variational autoencoder (VAE) is trained:

  • Encoder $E^{(i)}: \mathbb{R}^{H\times W\times D} \to \mathbb{R}^{C_z\times H_z\times W_z\times D_z}$,
  • Decoder $D^{(i)}: \mathbb{R}^{C_z\times H_z\times W_z\times D_z} \to \mathbb{R}^{H\times W\times D}$.

Each VAE is optimized with a composite objective: $L_{VAE}^{(i)} = L_{rec}^{(i)} + L_{perc}^{(i)} + L_{adv}^{(i)} + \beta\, \mathrm{KL}(q(z \mid x)\,\|\,p(z))$. The resulting $m$ latent tensors $z^{(i)}$ are fused via a shared 3D ResNet-style encoder and concatenated into a global context vector $\mathbf{z}$.
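The conditioning path can be sketched as follows, with each per-coefficient VAE encoder reduced to a single strided convolution and the shared fusion trunk reduced to one block; all layer widths are assumptions.

```python
# Simplified sketch of the anatomical conditioning path: m per-coefficient encoders,
# a shared fusion trunk, and a flattened global context vector z (VAE decoders omitted).
import torch
import torch.nn as nn

class AnatomicalEncoder(nn.Module):
    def __init__(self, m: int, c_z: int = 4, fused_dim: int = 256):
        super().__init__()
        # One latent encoder per SH coefficient volume (stand-in for the full VAE encoder).
        self.encoders = nn.ModuleList(
            [nn.Conv3d(1, c_z, kernel_size=3, stride=2, padding=1) for _ in range(m)]
        )
        # Shared ResNet-style fusion trunk, reduced to a single block for brevity.
        self.fuse = nn.Sequential(
            nn.Conv3d(m * c_z, fused_dim, kernel_size=3, stride=2, padding=1),
            nn.AdaptiveAvgPool3d(1),
        )

    def forward(self, x):                          # x: (B, m, H, W, D)
        latents = [enc(x[:, i : i + 1]) for i, enc in enumerate(self.encoders)]
        z = self.fuse(torch.cat(latents, dim=1))   # (B, fused_dim, 1, 1, 1)
        return z.flatten(1)                        # global context vector (B, fused_dim)
```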

3.2 Conditional Transformer Generator

Inputs to the generator are:

  • Noisy streamline $x_t \in \mathbb{R}^{p \times 3}$,
  • Generation time $t$,
  • Conditioning context $z$.

Inputs are linearly projected into an embedding space of dimension $n$ and supplemented with sinusoidal position encodings and learned time embeddings. The core comprises $M=8$ Transformer layers with self-attention over streamline points and cross-attention to inject global anatomical context. The output is a residual 3D offset or denoised 3D point sequence.

Key architectural details include an embedding dimension of $n=256$ and the use of 10 DDIM inference steps for diffusion-based generation.
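A compact sketch of this generator is given below. The embedding width $n=256$ and depth $M=8$ follow the text, while the head count, time embedding, and context projection are illustrative assumptions.

```python
# Sketch of the conditional transformer generator: points of a noisy streamline attend to
# each other (self-attention) and to the global anatomical context (cross-attention).
import math
import torch
import torch.nn as nn

class StreamlineGenerator(nn.Module):
    def __init__(self, n: int = 256, depth: int = 8, heads: int = 8, ctx_dim: int = 256):
        super().__init__()
        self.n = n
        self.in_proj = nn.Linear(3, n)                               # embed 3-D points
        self.time_mlp = nn.Sequential(nn.Linear(1, n), nn.SiLU(), nn.Linear(n, n))
        self.ctx_proj = nn.Linear(ctx_dim, n)
        layer = nn.TransformerDecoderLayer(d_model=n, nhead=heads,
                                           batch_first=True, norm_first=True)
        self.blocks = nn.TransformerDecoder(layer, num_layers=depth)  # self- + cross-attention
        self.out_proj = nn.Linear(n, 3)                              # per-point 3-D prediction

    def _pos_enc(self, p: int, device) -> torch.Tensor:
        pos = torch.arange(p, device=device, dtype=torch.float32)[:, None]
        freqs = torch.exp(torch.arange(0, self.n, 2, device=device, dtype=torch.float32)
                          * (-math.log(10000.0) / self.n))
        ang = pos * freqs[None, :]
        return torch.cat([torch.sin(ang), torch.cos(ang)], dim=-1)   # (p, n)

    def forward(self, x_t, t, z):
        # x_t: noisy streamline (B, p, 3); t: generation time (B,); z: context (B, ctx_dim)
        B, p, _ = x_t.shape
        h = self.in_proj(x_t) + self._pos_enc(p, x_t.device)
        h = h + self.time_mlp(t.view(B, 1, 1)).expand(B, p, -1)      # learned time embedding
        memory = self.ctx_proj(z).unsqueeze(1)                       # context token for cross-attention
        return self.out_proj(self.blocks(h, memory))                 # denoised points or offsets
```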

4. Training Protocol and Data

4.1 Loss Functions

The overall training objective aggregates:

  • VAE per-coefficient losses: $L_{rec}$, $L_{perc}$, $L_{adv}$, $L_{KL}$,
  • Generative loss: $\mathcal{L}_D(\theta)$ or $\mathcal{L}_{FM}(\theta)$.

Gradients are backpropagated through both the transformer-based generator and the anatomical encoder.

4.2 Datasets and Augmentation

GenTract is trained on the HCP Young Adult dataset (1,042 subjects), targeting PyAFQ-filtered tractograms with 24 known bundles. The dataset is split into training (75%), validation (10%), and test (15%) partitions at the subject level.

Data augmentations include deterministic rotations of both dMRI volumes and streamlines by ±15°, ±30°, and ±45°. Synthetic corruptions are introduced at test time (a minimal sketch of both steps follows the list):

  • Rician noise with $\sigma = 0.005$ on diffusion-weighted images,
  • Downsampling to $3\,\text{mm}^3$ with added noise to simulate clinical image quality.
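The sketch below illustrates the streamline side of the rotation augmentation and the Rician corruption; rotating the paired dMRI/SH volume additionally requires rotating the SH coefficients themselves (e.g. with Wigner-D matrices), which is omitted here.

```python
# Sketch of the rotation augmentation (streamline side only) and the Rician corruption.
import numpy as np
from scipy.spatial.transform import Rotation as R

def rotate_streamlines(streamlines: np.ndarray, degrees: float, axis: str = "z") -> np.ndarray:
    """streamlines: (N, p, 3); degrees drawn from {±15, ±30, ±45} per the protocol."""
    rot = R.from_euler(axis, degrees, degrees=True).as_matrix()      # (3, 3) rotation matrix
    return streamlines @ rot.T                                       # rotate every point

def add_rician_noise(dwi: np.ndarray, sigma: float = 0.005, seed: int = 0) -> np.ndarray:
    """Add Rician noise to a magnitude diffusion-weighted image."""
    rng = np.random.default_rng(seed)
    real = dwi + rng.normal(0.0, sigma, dwi.shape)
    imag = rng.normal(0.0, sigma, dwi.shape)
    return np.sqrt(real**2 + imag**2)
```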

An external robustness evaluation was performed on TractoInferno (28 subjects) with similar degradation.

4.3 Model Implementation

  • VAE initialized via fine-tuned MAISI backbone (Guo et al. 2025).
  • All modules implemented in MONAI and run on NVIDIA H100 accelerators.
  • Inference proceeds in batches with 10 DDIM steps (a minimal sampling sketch follows).
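A deterministic DDIM sampling loop with 10 steps might look like the following, reusing the epsilon-prediction parameterization and the assumed cosine schedule from the earlier training sketch; the model signature `eps_model(x, t, z)` is likewise an assumption.

```python
# Sketch of deterministic DDIM sampling with 10 steps.
import torch

@torch.no_grad()
def ddim_sample(eps_model, z, n_streamlines: int, p: int, steps: int = 10, device: str = "cuda"):
    x = torch.randn(n_streamlines, p, 3, device=device)        # start from pure noise
    ts = torch.linspace(0.98, 0.0, steps + 1, device=device)   # start just below t = 1 so alpha_t > 0
    for i in range(steps):
        t, t_next = ts[i], ts[i + 1]
        a, s = torch.cos(0.5 * torch.pi * t), torch.sin(0.5 * torch.pi * t)
        a_n, s_n = torch.cos(0.5 * torch.pi * t_next), torch.sin(0.5 * torch.pi * t_next)
        eps = eps_model(x, t.expand(n_streamlines), z)          # predicted noise at time t
        x0_hat = (x - s * eps) / a                              # implied clean sample
        x = a_n * x0_hat + s_n * eps                            # deterministic DDIM update
    return x                                                    # (n_streamlines, p, 3)
```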

5. Evaluation Metrics and Quantitative Results

5.1 Metrics

Given the lack of direct fiber “ground truth,” GenTract is evaluated on:

  • Precision: $\text{Precision} = \text{TP}/(\text{TP}+\text{FP})$, where TP/FP are streamlines retained/discarded by reference filtering (sketched in code below).
  • Bundle count: Number of bundles (out of 24 or 51) populated by at least one plausible streamline.
  • Inference time: Mean time for full tractogram generation.

General metrics defined but not reported in the study include recall and coverage.
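A minimal sketch of the two reported metrics under this filter-based protocol, with assumed input conventions:

```python
# Sketch of the filter-based metrics: precision as the kept fraction of generated
# streamlines, and bundle count as the number of target bundles (out of 24 or 51)
# that receive at least one plausible streamline.
import numpy as np

def precision(kept: np.ndarray) -> float:
    """kept[i] is True if the reference filter retained generated streamline i (TP vs FP)."""
    return float(kept.mean())

def bundle_count(bundle_labels: list) -> int:
    """bundle_labels[i] is the bundle name assigned to streamline i, or None if discarded."""
    return len({b for b in bundle_labels if b is not None})
```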

5.2 Comparative Performance

| Method | BS Precision (%) | BS Bundles | TO-Net Precision (%) |
| --- | --- | --- | --- |
| tckglobal | 0.19 | 42.91 | 17.83 |
| iFOD2 | 1.96 | 48.88 | 6.30 |
| SD Stream | 4.71 | 47.85 | 11.10 |
| DDTracking | 0.49 | 9.30 | 8.35 |
| TractOracle | 28.93 | 48.20 | 39.55 |
| GenTract | 61.95 | 36.62 | 56.35 |

  • On clean HCP test data, GenTract achieves 61.95% precision versus 28.93% for TractOracle (a $2.1\times$ improvement).
  • Under Rician noise, GenTract maintains 60.32% precision vs. 22.06% for TractOracle.
  • On low-resolution, noisy data, GenTract’s precision is 15.73%, exceeding the next-best (TractOracle, 1.12%) by an order of magnitude.
  • On external TractoInferno (low-res + noise), GenTract retains 24.94% precision (TractOracle: 9.74%).

5.3 Inference Speed

| Model Type | Mean Inference Time (s) |
| --- | --- |
| Classical global/local | ~2,000–3,500 |
| Deep-learning local | ~150–300 |
| GenTract | ~230 |

This suggests GenTract matches or exceeds deep local approaches in efficiency and is an order of magnitude faster than classical global methods.

6. Robustness and Qualitative Behavior

Qualitative analysis indicates that GenTract reliably retains anatomically plausible bundles across a range of acquisition conditions. For example, in the Right Superior Longitudinal Fasciculus (SLFR), GenTract maintains plausible streamline geometry under clean, noisy, and low-resolution/noisy conditions, while other methods (SD Stream, iFOD2, tckglobal) fail entirely in challenging scenarios. Error heatmaps and streamline overlays demonstrate GenTract’s robust anatomical plausibility and resistance to noise and resolution degradation.

7. Limitations and Future Directions

GenTract advances tractography by directly sampling tractograms conditioned on full-brain SH embeddings, thereby eliminating stepwise error propagation and the need for manual seeding masks. The global latent context enhances robustness against noise and resolution effects.

However, limitations include:

  • Supervision is currently limited by the PyAFQ 24-bundle labeling, contributing to a modest false negative rate.
  • Results depend on proxy “ground truth” and synthetic corruptions, which may introduce evaluation biases.

Future research directions include:

  • Expanding the training set with more bundles or moving toward unsupervised/global training paradigms that avoid explicit streamline labels.
  • Validation on real clinical datasets with genuine low-resolution and noise characteristics.
  • Incorporating anatomically-informed priors during training and adopting alternative evaluation metrics such as topological fidelity and coverage.

GenTract establishes a new generative framework for global tractography, providing state-of-the-art precision on research-grade datasets and demonstrating robustness to clinically relevant data degradations (Sargood et al., 17 Nov 2025).
