Dynamic Stream Network (DySNet)
- Dynamic Stream Network (DySNet) is a feature-modeling backbone designed to address the combinatorial explosion in deformable medical image registration by dynamically adjusting receptive fields and aggregation weights.
- Its key modules, Adaptive Stream Basin (AdSB) and Dynamic Stream Attention (DySA), enable spatially adaptive sampling and data-dependent feature aggregation for improved anatomical correspondence.
- Quantitative evaluations on CT and MRI benchmarks demonstrate enhanced Dice scores and reduced feature interference, highlighting its robust performance and clinical relevance.
Dynamic Stream Network (DySNet) is a feature-modeling backbone specifically designed to address the combinatorial explosion in deformable medical image registration (DMIR), where dual-image input leads to exponential growth in potential feature correspondences and a proliferation of interfering feature matches. DySNet dynamically adapts both the receptive fields ("where" to sample) and feature aggregation weights ("how much" to aggregate) in response to input data, via two key modules: Adaptive Stream Basin (AdSB) and Dynamic Stream Attention (DySA). This approach enables more precise modeling of relevant anatomical correspondences while suppressing the aggregation of less-informative or interfering features, leading to improved registration performance and generalization across modalities and scales (Bi et al., 22 Dec 2025).
1. Architectural Overview
At the core of DySNet are Dynamic Stream Blocks (DSBs), which symmetrically process image pairs, each consisting of a fixed image and a moving image, in a bidirectional manner. Each DSB performs the following steps:
- Linear projections, typically convolutions, generate query, key, and value tensors from the feature maps of the fixed and moving images.
- Adaptive Stream Basin (AdSB) predicts spatial offsets to dynamically deform the sampling window for each spatial location.
- Dynamic Stream Attention (DySA) computes spatially-constrained, data-dependent attention weights within the deformed receptive field.
- Feature aggregation is performed based on these dynamically weighted and positioned samples.
DSBs are stacked bidirectionally and symmetrically across both registration directions; symmetry enforces inverse relationships between forward and backward deformation fields, promoting anatomical topology preservation.
2. Adaptive Stream Basin (AdSB): Dynamic Receptive Fields
The AdSB module adapts the shape and position of local feature sampling windows on a per-pixel, per-kernel-point basis. For a kernel with a fixed number of sampling points, AdSB predicts a spatial offset for every kernel point at every location. Each sampling position is the corresponding canonical grid position plus its predicted offset, and the offsets are produced by a compact CNN operating on the concatenated features.
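The sampling rule can be written compactly as follows. The notation is an assumption made here for illustration, since the source symbols are not available: $p$ is a spatial location, $g_k$ is the $k$-th canonical grid displacement of a $K$-point kernel, $\Delta p_k(p)$ is the offset predicted for that kernel point, and $\tilde{p}_k(p)$ is the resulting sampling position.

$$
\tilde{p}_k(p) = p + g_k + \Delta p_k(p), \qquad k = 1, \dots, K
$$

With all offsets equal to zero, this reduces to an ordinary fixed local window around $p$.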
Keys and values are sampled at the offset, generally fractional, coordinates using bilinear interpolation. This deformable strategy lets each sampling window focus on the most structurally relevant areas in response to local image content.
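The sampling step can be illustrated with a minimal 2D NumPy sketch. It is not the authors' implementation: the function names `bilinear_sample` and `deformed_neighbourhood`, the `(H, W, K, 2)` offset layout, the square window, and the border clamping are all assumptions chosen for illustration, and the offsets are passed in directly rather than predicted by a CNN.

```python
import numpy as np

def bilinear_sample(feat, ys, xs):
    """Sample feat (H, W, C) at fractional coordinates ys, xs (same shape).

    Coordinates are clamped to the image border (an assumption of this sketch).
    Returns an array of shape ys.shape + (C,).
    """
    H, W, _ = feat.shape
    ys = np.clip(ys, 0, H - 1)
    xs = np.clip(xs, 0, W - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, H - 1)
    x1 = np.minimum(x0 + 1, W - 1)
    wy = (ys - y0)[..., None]  # vertical interpolation weight
    wx = (xs - x0)[..., None]  # horizontal interpolation weight
    top = (1 - wx) * feat[y0, x0] + wx * feat[y0, x1]
    bottom = (1 - wx) * feat[y1, x0] + wx * feat[y1, x1]
    return (1 - wy) * top + wy * bottom

def deformed_neighbourhood(feat, offsets, radius=1):
    """Gather a deformed K-point window for every pixel.

    feat:    (H, W, C) key or value feature map
    offsets: (H, W, K, 2) per-pixel, per-kernel-point (dy, dx) offsets,
             with K = (2 * radius + 1) ** 2 (hypothetical layout)
    Returns: (H, W, K, C) sampled features.
    """
    H, W, _ = feat.shape
    r = np.arange(-radius, radius + 1)
    gy, gx = np.meshgrid(r, r, indexing="ij")
    grid = np.stack([gy.ravel(), gx.ravel()], axis=-1)  # canonical grid, (K, 2)
    py, px = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    ys = py[..., None] + grid[:, 0] + offsets[..., 0]   # (H, W, K)
    xs = px[..., None] + grid[:, 1] + offsets[..., 1]
    return bilinear_sample(feat, ys, xs)
```

With zero offsets the centre kernel point reproduces the input feature map, which is a convenient sanity check; a 3D version would use trilinear interpolation over three coordinates instead.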
3. Dynamic Stream Attention (DySA): Data-Dependent Aggregation
DySA computes the relative importance of points within each deformed local neighborhood for every spatial location. The query and dynamically sampled keys yield similarity scores via scaled dot-product attention. A softmax over these scores produces spatial attention weights. These weights are fused with a learned channel gate to produce feature- and channel-dependent dynamic kernels. This two-factor mechanism adaptively filters out less relevant correspondences and accentuates meaningful alignments.
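A minimal NumPy sketch of this aggregation is given below. It assumes the keys and values have already been sampled from the deformed window, that the channel gate is supplied as an array of values in [0, 1] (for example, the output of a sigmoid layer), and that the fusion is an outer product of spatial weights and gate; the source does not specify the fusion operator, so that choice and the function name `dynamic_stream_attention` are hypothetical.

```python
import numpy as np

def dynamic_stream_attention(q, k_s, v_s, gate):
    """Aggregate sampled values with spatial attention and a channel gate.

    q:    (N, C)    one query vector per spatial location
    k_s:  (N, K, C) keys sampled from the deformed K-point window
    v_s:  (N, K, C) values sampled from the same window
    gate: (N, C)    channel gate in [0, 1] (assumed to be given)
    Returns (out, attn) with shapes (N, C) and (N, K).
    """
    C = q.shape[-1]
    # Scaled dot-product similarity between each query and its K sampled keys.
    scores = np.einsum("nc,nkc->nk", q, k_s) / np.sqrt(C)
    # Numerically stable softmax over the K kernel points.
    scores = scores - scores.max(axis=-1, keepdims=True)
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=-1, keepdims=True)
    # Assumed fusion: outer product of spatial weights and channel gate.
    kernel = attn[:, :, None] * gate[:, None, :]  # (N, K, C) dynamic kernel
    out = (kernel * v_s).sum(axis=1)
    return out, attn
```

When the gate is all ones and the queries carry no information (all zeros), the attention is uniform and the output is the plain average of the sampled values, so the mechanism degrades gracefully to local mean pooling.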
4. Symmetry, Loss Function, and Training Objective
DySNet employs a symmetric bidirectional loss to jointly optimize registration in both directions, moving-to-fixed and fixed-to-moving, with each directional term combining a similarity constraint and a smoothness constraint. The similarity term employs negative Dice or cross-correlation losses, and the smoothness term regularizes large gradients and negative Jacobians, thereby penalizing non-diffeomorphic warps. Symmetric optimization encourages the learned deformations to respect anatomical invertibility, thereby increasing reliability for clinical contexts (Bi et al., 22 Dec 2025).
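One plausible way to write this objective is shown below. The symbols are assumptions introduced here, not the paper's notation: $I_f$ and $I_m$ are the fixed and moving images, $\phi_{m \to f}$ and $\phi_{f \to m}$ are the two deformation fields, and $\lambda$ is a hypothetical weight balancing the two constraints.

$$
\mathcal{L} = \mathcal{L}_{\mathrm{dir}}(I_m, I_f, \phi_{m \to f}) + \mathcal{L}_{\mathrm{dir}}(I_f, I_m, \phi_{f \to m})
$$

$$
\mathcal{L}_{\mathrm{dir}}(A, B, \phi) = \mathcal{L}_{\mathrm{sim}}(A \circ \phi, B) + \lambda \, \mathcal{L}_{\mathrm{smooth}}(\phi)
$$

Here $A \circ \phi$ denotes image $A$ warped by $\phi$, so each directional term rewards agreement between the warped source and its target while penalizing irregular deformations.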
5. Experimental Protocol and Baseline Comparisons
DySNet's efficacy was rigorously evaluated on diverse DMIR benchmarks:
- 3D Cardiac CT: MM-WHS, ASOCA, CAT08 datasets (total: 152 volumes)
- 3D Brain MRI: PPMI (837 train) and CANDI (103 test, 28 tissues)
- 2D Brain MRI: OASIS-1 (361 train/53 test)
Baselines spanned CNN-only (VoxelMorph), transformer-only (ViT-V-Net), hybrid (ModeT, XMorpher, TransMorph), and deformable attention architectures. Key evaluation metrics were the Dice Similarity Coefficient (DSC) and the fraction of folding voxels, i.e., voxels at which the deformation has a non-positive Jacobian determinant. The implementation used PyTorch with the AdamW optimizer and batch size 1, training for up to 200 epochs on an RTX 6000.
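Both metrics are straightforward to compute. The sketch below is a 2D NumPy illustration under stated assumptions: the function names are hypothetical, segmentations are integer label maps, the displacement field is stored as `(H, W, 2)` in pixel units with `(dy, dx)` ordering, and derivatives are approximated with `np.gradient`. The evaluation code used in the paper may differ.

```python
import numpy as np

def dice(seg_a, seg_b, label):
    """Dice Similarity Coefficient for one label in two label maps."""
    a = seg_a == label
    b = seg_b == label
    denom = a.sum() + b.sum()
    if denom == 0:
        return float("nan")  # label absent from both maps: DSC undefined
    return 2.0 * np.logical_and(a, b).sum() / denom

def folding_fraction(disp):
    """Fraction of pixels where the deformation folds.

    disp: (H, W, 2) displacement field (dy, dx), so phi(p) = p + disp(p).
    A pixel counts as folded when det(J_phi) <= 0.
    """
    duy_dy, duy_dx = np.gradient(disp[..., 0])  # derivatives of the y-displacement
    dux_dy, dux_dx = np.gradient(disp[..., 1])  # derivatives of the x-displacement
    det = (1.0 + duy_dy) * (1.0 + dux_dx) - duy_dx * dux_dy
    return float((det <= 0).mean())
```

The identity deformation (zero displacement) has a Jacobian determinant of 1 everywhere and therefore a folding fraction of 0, while a field that mirrors one axis folds at every pixel.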
6. Quantitative Results and Ablation Analyses
DySNet demonstrated consistently superior results across all benchmarks:
| Task | DySNet-M/DSC (%) | Best Baseline/DSC (%) | Fold (%) |
|---|---|---|---|
| 3D Cardiac CT | 84.1 | ModeT (83.6) | ~0.7–0.9 |
| 3D Brain MRI | 79.7 | ModeT (77.4) | – |
| 2D Brain MRI | 83.0 | XMorpher (76.5) | – |
| Avg (DySNet-M) | 82.0 | ModeT (81.0) | – |
| Avg (DySNet-X) | 83.0 | XMorpher (73.4) | – |
Ablation Studies:
- Bidirectional symmetry yields a 4.5% DSC gain over baseline.
- Isolating DySA provides an additional 1–2% DSC improvement; AdSB adds 1%.
- Kernel size is not critical; DSC remains nearly constant (82.9–83.0%) across the range of kernel sizes tested.
- DySA's attention maps concentrate on correct anatomical correspondences, confirmed by heat-map visualization.
- DySNet maintains substantial accuracy and low folding even with as little as 5% of training data, demonstrating strong sample efficiency.
7. Generalization Capability and Integration
DySNet generalizes robustly across modality (CT vs. MRI), dimensionality (2D vs. 3D), and anatomical complexity (heart vs. brain). Performance is stable across different kernel sizes and remains strong with reduced training data, down to 5% of the training set. The dynamic and modular design of AdSB and DySA supports integration into existing DMIR frameworks as "plug-and-play" components (e.g., as replacements in XMorpher or ModeT pipelines).
A salient practical impact is the reduction of spurious or interfering feature matches by adaptively excluding uninformative neighborhoods from aggregation. The combined effect of dynamic receptive fields and attention promotes anatomically meaningful correspondences, thereby simultaneously enhancing registration accuracy and preserving deformation plausibility (Bi et al., 22 Dec 2025).
References
- "Dynamic Stream Network for Combinatorial Explosion Problem in Deformable Medical Image Registration" (Bi et al., 22 Dec 2025)