
DDoS-UNet for Dynamic MRI Super-Resolution

Updated 28 January 2026
  • DDoS-UNet is a deep learning architecture that leverages dual-channel inputs to combine spatial and temporal information for high-resolution dynamic MRI reconstruction.
  • It employs a modified 3D UNet with trilinear upsampling and 1×1×1 convolutions, effectively reducing artefacts and preserving anatomical consistency.
  • Quantitative results demonstrate significant improvements over baseline methods, with higher SSIM, PSNR, and lower NRMSE under extreme undersampling.

DDoS-UNet is a deep learning architecture specifically designed for super-resolution in dynamic magnetic resonance imaging (MRI), addressing the central limitation posed by the spatio-temporal trade-off inherent to rapid MR data acquisition. Traditional approaches either recover spatial detail at the expense of temporal resolution or treat each temporal frame independently, thus losing temporal coherence. DDoS-UNet introduces a dynamic dual-channel 3D UNet framework that explicitly leverages both spatial and temporal information, yielding enhanced reconstructions of dynamic MRI sequences even under extreme undersampling conditions (Chatterjee et al., 2022).

1. Spatio-Temporal Trade-off in Dynamic MRI

Dynamic MRI seeks to visualize organ motion or physiological changes, necessitating the acquisition of image volumes on sub-second timescales. The main limitation is the spatio-temporal trade-off: acquiring a small percentage of k-space per time frame (e.g., 4% of lines in the phase-encode direction) yields a theoretical acceleration factor of 25 but leads to pronounced spatial blurring due to the loss of high-frequency information. Standard super-resolution approaches based on single-image models process each time point as an independent entity, neglecting inter-frame redundancy and temporal consistency. DDoS-UNet circumvents this by incorporating information from both a high-resolution static “planning” scan and previously reconstructed frames, effectively deploying spatial and temporal priors at each step (Chatterjee et al., 2022).
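As a rough illustration of what keeping 4% of the phase-encode lines means, the sketch below selects a contiguous block of lines around the k-space centre. The function name `central_lines`, the matrix size of 256 lines, and the centred layout (zero frequency at index `num_lines // 2`) are assumptions made for this example, not details taken from the paper.

```python
def central_lines(num_lines, percent_kept):
    """Indices of the phase-encode lines kept around the k-space centre.

    Assumes a centred k-space layout in which the zero-frequency line sits
    at index num_lines // 2 (hypothetical convention for this sketch).
    """
    if not 0 < percent_kept <= 100:
        raise ValueError("percent_kept must be in (0, 100]")
    # Round down to a whole number of lines, but always keep at least one.
    kept = max(1, int(num_lines * percent_kept / 100))
    start = num_lines // 2 - kept // 2
    return range(start, start + kept)


lines = central_lines(256, 4)
print(len(lines), lines[0], lines[-1])  # 10 123 132
print(256 / len(lines))                 # effective acceleration: 25.6
```

Because only whole lines can be kept, the effective acceleration for a given matrix size may differ slightly from the theoretical factor of 25.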

2. Architecture: Modified 3D UNet with Dual-Channel Input

The DDoS-UNet design builds upon the standard 3D UNet encoder-decoder with skip-connections, introducing two pivotal architectural modifications:

  • Dual-channel input: The first convolutional layer receives as input two channels—the current low-resolution (LR) volume (trilinearly upsampled to the high-resolution grid) and a high-resolution prior, which is the static planning scan for the initial time frame and the network’s previous super-resolved (SR) output for subsequent frames.
  • Decoder with trilinear upsampling and 1×1×1 convolution: Instead of transposed convolutions, each decoder up-block performs trilinear upsampling by a factor of two, followed by a 1×1×1 convolution (to mitigate checkerboard artefacts), and then two standard 3×3×3 convolutions with ReLU activations.

The network comprises three down-sampling and three up-sampling blocks. Each down-sampling block consists of two 3×3×3 convolutions with ReLU activations followed by average pooling (kernel size 2), and the number of feature maps doubles from block to block, starting at 64. The up-sampling pathway reduces the number of feature maps again and concatenates the corresponding skip-connections. A final 1×1×1 convolution produces the SR output volume (Chatterjee et al., 2022).
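To make the block structure concrete, the following sketch traces spatial sizes and encoder feature counts through three down-sampling and three up-sampling blocks. It is shape bookkeeping only, not the network itself. It assumes padded convolutions that preserve spatial size and skip connections taken before each pooling step, as in a standard UNet; neither detail is stated above, and the function name and the example volume size are hypothetical.

```python
def trace_unet_shapes(volume_shape, base_features=64, num_blocks=3):
    """Trace (block name, feature count, spatial shape) through the network.

    For "down" blocks the feature count is the block's output width; for
    "up" blocks it is the width of the skip connection concatenated there.
    Assumes size-preserving convolutions and skips taken before pooling.
    """
    factor = 2 ** num_blocks
    if any(dim % factor for dim in volume_shape):
        raise ValueError(f"every dimension must be divisible by {factor}")

    shape = tuple(volume_shape)
    features = base_features
    trace, skips = [], []
    for block in range(1, num_blocks + 1):
        # Two 3x3x3 conv + ReLU layers, then average pooling (kernel size 2).
        trace.append((f"down{block}", features, shape))
        skips.append((features, shape))
        shape = tuple(dim // 2 for dim in shape)
        features *= 2  # feature maps double from one block to the next

    for block in range(1, num_blocks + 1):
        # Trilinear upsampling by 2, 1x1x1 conv, then two 3x3x3 conv + ReLU.
        shape = tuple(dim * 2 for dim in shape)
        skip_features, skip_shape = skips.pop()
        assert shape == skip_shape  # concatenation needs matching sizes
        trace.append((f"up{block}", skip_features, shape))
    return trace


for name, features, shape in trace_unet_shapes((32, 256, 256)):
    print(f"{name:6s} features={features:4d} spatial={shape}")
```

Under these assumptions, three pooling steps mean that each volume dimension should be divisible by 8 for the skip connections to line up without cropping or padding.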

3. Stepwise Temporal Inference

DDoS-UNet operates in two distinct temporal phases for each sequence:

  • Antipasto (Initial) Phase: For the first time-point ($\mathrm{TP}_0$), Channel 1 is the LR input and Channel 2 is the static HR planning scan. The network outputs the initial super-resolved frame $\mathrm{SR}_0$.
  • Recursive Phase: For subsequent time-points $n = 1, \ldots, N$, Channel 1 is the current LR frame, and Channel 2 is the SR output from the previous frame ($\mathrm{SR}_{n-1}$). This enables temporal propagation of anatomical detail and consistency throughout the series.

This recursive inference bootstraps the reconstruction at each time step using previously inferred high-resolution information, thus regularizing reconstructions for both spatial fidelity and temporal stability (Chatterjee et al., 2022).
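The two phases amount to a simple loop in which the prior channel is swapped after the first frame. The sketch below shows that control flow only. `model(lr, prior)` is a hypothetical callable standing in for the trained network, and the toy averaging model exists purely so the loop can be run.

```python
def super_resolve_sequence(model, lr_frames, planning_scan):
    """Run the two-phase inference and return one SR output per LR frame.

    `model(lr, prior)` is a hypothetical stand-in for the trained network;
    `lr_frames` are assumed to be already upsampled to the HR grid.
    """
    sr_frames = []
    prior = planning_scan  # initial phase: static HR planning scan as prior
    for lr in lr_frames:
        sr = model(lr, prior)
        sr_frames.append(sr)
        prior = sr  # recursive phase: previous SR output becomes the prior
    return sr_frames


def toy_model(lr, prior):
    """Illustration only: average the two channels."""
    return 0.5 * (lr + prior)


outputs = super_resolve_sequence(toy_model, [0.0, 0.0, 0.0], planning_scan=8.0)
print(outputs)  # [4.0, 2.0, 1.0]
```

In the toy run, the influence of the planning scan decays from frame to frame, which illustrates how information from the prior propagates through the sequence.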

4. Mathematical Formulation and Loss Functions

In single-frame super-resolution, the mapping is $\hat{HR}_t = \mathcal{F}(LR_t; \theta)$. DDoS-UNet extends this to a dual-channel formulation:

$$\hat{HR}_t = \mathcal{F}(LR_t,\ \hat{HR}_{t-1};\ \theta).$$

The objective is enforced via a perceptual loss. Multi-scale features are extracted using a pre-trained, frozen UNet MSS; the loss $\mathcal{L}_p$ is computed as the $L_1$ norm of the differences between predicted and ground-truth features at multiple scales. This penalizes discrepancies in perceptual structure as opposed to pixel-wise error, encouraging sharper output and structural fidelity.
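The structure of such a loss can be sketched without the actual feature network. In the example below, `extract_features` is a hypothetical callable returning one feature list per scale. The average-pooling pyramid over a 1D signal is only a stand-in for the frozen UNet MSS, and summing the per-scale distances with equal weight is an assumption.

```python
def l1_distance(a, b):
    """Mean absolute difference between two equally sized feature lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def multiscale_l1_loss(extract_features, prediction, target):
    """Sum of per-scale L1 distances between extracted features.

    Equal weighting of the scales is an assumption of this sketch.
    """
    pred_feats = extract_features(prediction)
    target_feats = extract_features(target)
    return sum(l1_distance(p, t) for p, t in zip(pred_feats, target_feats))


def toy_pyramid(signal, levels=3):
    """Stand-in extractor: average-pooling pyramid of a 1D signal."""
    feats = [list(signal)]
    for _ in range(levels - 1):
        prev = feats[-1]
        feats.append([(prev[i] + prev[i + 1]) / 2
                      for i in range(0, len(prev) - 1, 2)])
    return feats


loss = multiscale_l1_loss(toy_pyramid, [1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0])
print(loss)  # 3.0 (a distance of 1.0 at each of the three scales)
```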

Evaluation of reconstruction quality utilizes the structural similarity index (SSIM):

$$\mathrm{SSIM}(x,y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)},$$

where $\mu_x$ and $\mu_y$ are local means, $\sigma_x^2$ and $\sigma_y^2$ are local variances, $\sigma_{xy}$ is the local covariance of $x$ and $y$, and $C_1$, $C_2$ are small constants that stabilize the division.
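The SSIM expression can be evaluated directly from means, variances, and the covariance. The sketch below is a simplification: it uses global statistics over a flat list of intensities instead of local windows. The constants $C_1 = (0.01\,L)^2$ and $C_2 = (0.03\,L)^2$, with $L$ the data range, follow the commonly used defaults and are an assumption here, since the values used in the paper are not given above.

```python
def global_ssim(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """SSIM from global statistics of two equally long intensity lists.

    Simplification: practical SSIM averages this quantity over local windows.
    """
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) * (a - mu_x) for a in x) / n
    var_y = sum((b - mu_y) * (b - mu_y) for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    c1 = (k1 * data_range) ** 2  # stabilizes the luminance term
    c2 = (k2 * data_range) ** 2  # stabilizes the contrast/structure term
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x * mu_x + mu_y * mu_y + c1) * (var_x + var_y + c2)
    return numerator / denominator


reference = [0.1, 0.4, 0.8, 0.3]
blurred = [0.2, 0.4, 0.6, 0.4]
print(global_ssim(reference, reference))  # identical inputs give 1.0
print(global_ssim(reference, blurred))    # below 1.0 for differing inputs
```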

Retaining a percentage $p$ of k-space corresponds to a theoretical acceleration factor $AF = 100/p$; 4% sampling therefore gives $AF = 25$ (Chatterjee et al., 2022).
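As a quick check of this relation for the three sampling levels discussed in this article, a minimal sketch (the function name is illustrative):

```python
def acceleration_factor(percent_sampled):
    """Theoretical acceleration factor for a given percentage of k-space."""
    if not 0 < percent_sampled <= 100:
        raise ValueError("percent_sampled must be in (0, 100]")
    return 100.0 / percent_sampled


for p in (10, 6.25, 4):
    print(f"{p}% of k-space -> AF = {acceleration_factor(p)}")
# 10% of k-space -> AF = 10.0
# 6.25% of k-space -> AF = 16.0
# 4% of k-space -> AF = 25.0
```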

5. Training Protocol and Data

The training set uses the CHAOS abdominal T1 dataset (40 subjects), “animated” by random elastic deformations into synthetic 25-frame sequences to simulate breathing motion, partitioned into 70% train and 30% validation. Real test data consists of five volunteers, each with a breath-hold static scan and a free-breathing 25-frame dynamic MRI series. All data are retrospectively undersampled in-plane to retain 10%, 6.25%, or 4% of central k-space.

Volumes are trilinearly upsampled prior to input. Training uses batch size 1, the Adam optimizer (learning rate $1 \times 10^{-4}$), and 100 epochs. The perceptual loss module is frozen throughout (Chatterjee et al., 2022).

6. Quantitative Performance and Comparative Analysis

Quantitative evaluation under the most aggressive undersampling (4% centre k-space, $AF \approx 25$) yields:

  • DDoS-UNet: SSIM = $0.951 \pm 0.017$, PSNR = $37.56 \pm 2.18$ dB, NRMSE = $0.024 \pm 0.006$
  • Baselines:
    • Trilinear interpolation: SSIM $0.765 \pm 0.022$
    • Zero-padding: SSIM $0.863 \pm 0.021$
    • Single-input UNet (static): SSIM $0.916 \pm 0.015$
    • Single-input UNet (dynamic): SSIM $0.914 \pm 0.012$

For less aggressive undersampling (6.25%, 10%), SSIM improves to approximately $0.967 \pm 0.011$ and $0.980 \pm 0.006$, respectively. All improvements over the baselines are statistically significant (Mann–Whitney U, $p < 0.001$). DDoS-UNet is computationally efficient, reconstructing volumes at approximately 0.36 seconds per frame (Chatterjee et al., 2022).
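For readers less familiar with the other two reported metrics, the sketch below computes PSNR and NRMSE for flat lists of intensities. The conventions are assumptions, since they are not specified above: PSNR is taken relative to a `data_range` of 1.0, and NRMSE is normalised by the intensity range of the reference. Other normalisations (for example by the mean) are also in use, so the values are not directly comparable to the reported figures.

```python
import math


def rmse(pred, ref):
    """Root-mean-square error between two equally long intensity lists."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))


def psnr(pred, ref, data_range=1.0):
    """Peak signal-to-noise ratio in dB (infinite for identical inputs)."""
    err = rmse(pred, ref)
    return math.inf if err == 0 else 20 * math.log10(data_range / err)


def nrmse(pred, ref):
    """RMSE normalised by the reference intensity range (one common choice)."""
    return rmse(pred, ref) / (max(ref) - min(ref))


reference = [0.0, 0.5, 1.0, 0.5]
prediction = [0.1, 0.5, 0.9, 0.5]
print(f"PSNR  = {psnr(prediction, reference):.2f} dB")  # PSNR  = 23.01 dB
print(f"NRMSE = {nrmse(prediction, reference):.4f}")    # NRMSE = 0.0707
```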

7. Architectural Innovations and Impact

Key architectural innovations include:

  • Dual-channel learning: Forces the network to learn two intertwined representations: a spatial SR mapping and a temporal regularizer that models inter-frame coherence. This is formalized by $\Psi(\theta) = \mathcal{F}_1(LR_t; HR_t) + \mathcal{F}_2(HR_{t-1}; HR_t)$, where $\mathcal{F}_1$ and $\mathcal{F}_2$ correspond to spatial and temporal mappings, respectively.
  • Trilinear upsampling plus $1 \times 1 \times 1$ convolution: Eliminates checkerboard artefacts without incurring prohibitive computational burden.
  • Whole-volume processing: Ensures anatomically consistent priors even under deformations (e.g., breathing), avoiding the misalignment problems inherent to patch-based techniques.
  • Perceptual loss: Regularizes outputs toward structural fidelity and enhanced sharpness, as opposed to pixel-wise $L_2$ minimization.

DDoS-UNet demonstrates that by orchestrating a lightweight 3D UNet backbone with explicit high-resolution and temporal priors, dynamic MRI can achieve rapid, high-fidelity reconstructions, effectively breaking the conventional spatio-temporal compromise (Chatterjee et al., 2022).
