
MoCo-INR: Unsupervised Cardiac MRI Reconstruction

Updated 21 November 2025
  • The paper presents an unsupervised framework that decomposes dynamic cardiac MRI into a static canonical image and continuous deformation fields using implicit neural representations.
  • It employs dual neural networks with hash-based coordinate encoding and a coarse-to-fine curriculum to optimize motion compensation from undersampled k-space data.
  • Results demonstrate improved PSNR and SSIM at high acceleration factors, yielding fine-grained, artifact-suppressed reconstructions suitable for clinical wall-motion assessment.

MoCo-INR is an unsupervised motion-compensated reconstruction framework that fuses explicit motion modeling with implicit neural representations (INRs) for accelerated dynamic cardiac MRI. This approach leverages the decomposition of a dynamic image sequence into a single time-invariant canonical image and a set of continuous, time-varying deformation fields, with both components parameterized as coordinate-based neural networks. The MoCo-INR paradigm is characterized by its continuous spatial–temporal modeling, unsupervised optimization directly from undersampled k-t space data, and integration of recent neural encoding techniques to enable fine-grained, artifact-suppressed reconstructions even at high acceleration factors, with robust clinical performance for cardiac wall-motion assessment (Tian et al., 14 Nov 2025).

1. Theoretical Motivation and Rationale

MoCo-INR is motivated by the limitations of standalone motion-compensated (MoCo) and pure INR approaches to highly accelerated dynamic MRI reconstruction. Traditional MoCo methods require fully sampled datasets or discrete image representations and tend to lose high-frequency anatomical detail because of aliasing during nonrigid warping. Pure INR-based dynamic MRI imposes a beneficial continuity prior but lacks explicit modeling of time-varying deformations, which leads to slow convergence and suboptimal resolution under severe undersampling.

MoCo-INR addresses these deficiencies by decomposing the reconstruction task into learning (a) a static canonical appearance field and (b) a continuous, temporally indexed deformation vector field (DVF). Explicit factorization reduces overparameterization and decouples texture from deformation, enabling unsupervised learning even when only highly undersampled, motion-corrupted data are available (Tian et al., 14 Nov 2025).

2. Mathematical Formulation

The MoCo-INR acquisition model for a spatiotemporal dynamic sequence is

$$y_{t,c} = M_t F S_c x_t + n_{t,c},$$

where $x_t$ is the image at time frame $t$, $S_c$ is the sensitivity map of coil $c$, $F$ is the Fourier transform, $M_t$ is a binary undersampling mask, $y_{t,c}$ is the acquired k-space data for coil $c$, and $n_{t,c}$ is Gaussian noise.
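The acquisition model can be illustrated with a minimal NumPy sketch. It is only an illustration under stated assumptions: the function name `forward_model`, the array shapes, the orthonormal FFT scaling, and the random test data are all hypothetical, the noise term is omitted, and no FFT centering (`fftshift`) is applied.

```python
import numpy as np

def forward_model(x_t, sens_maps, mask_t):
    """Noise-free y_{t,c} = M_t F S_c x_t for every coil c.

    x_t:       (H, W) complex image of one frame
    sens_maps: (C, H, W) complex coil sensitivity maps S_c
    mask_t:    (H, W) binary undersampling mask M_t
    """
    coil_images = sens_maps * x_t[None, :, :]        # S_c x_t, one image per coil
    kspace = np.fft.fft2(coil_images, norm="ortho")  # F, applied over the last two axes
    return mask_t[None, :, :] * kspace               # M_t zeroes unsampled locations

# Hypothetical toy data, only to exercise the function.
rng = np.random.default_rng(0)
H, W, C = 8, 8, 4
x_t = rng.standard_normal((H, W)) + 1j * rng.standard_normal((H, W))
sens_maps = rng.standard_normal((C, H, W)) + 1j * rng.standard_normal((C, H, W))
mask_t = (rng.random((H, W)) < 0.25).astype(float)
y_t = forward_model(x_t, sens_maps, mask_t)          # shape (C, H, W)
```

Unsampled k-space locations come out exactly zero, which is what restricts the data-consistency comparison to acquired samples.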

Motion compensation is achieved by reconstructing each frame as a nonrigid warp of the canonical image $x_{\text{cano}}$ via

$$x_t(p) = x_{\text{cano}}(p + u_t(p)),$$

where $u_t(p)$ is the deformation at spatial position $p$ and time $t$.

Both the canonical field and the DVF are realized as coordinate-based networks:

  • $g_\psi: \mathbb{R}^2 \rightarrow \mathbb{C}$ (canonical image, outputs real and imaginary channels)
  • $f_\phi: \mathbb{R}^2 \times [1,T] \rightarrow \mathbb{R}^2$ (DVF, outputs displacement for each coordinate and time)

The full prediction for frame $t$ is

$$x_t(p) = g_\psi(p + f_\phi(p, t)).$$
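The composition of the two fields can be sketched with simple analytic stand-ins. In the sketch below, `g_cano` (a Gaussian blob) and `f_dvf` (a periodic radial scaling) are hypothetical placeholders for the learned networks $g_\psi$ and $f_\phi$; only the way they are composed reflects the formula above.

```python
import numpy as np

def g_cano(p):
    """Hypothetical canonical field: Gaussian blob. p is (N, 2), output is complex (N,)."""
    r2 = np.sum(p**2, axis=-1)
    return np.exp(-r2 / 0.1).astype(complex)

def f_dvf(p, t, period=10.0):
    """Hypothetical DVF: radial displacement whose amplitude varies periodically in t."""
    scale = 0.2 * np.sin(2.0 * np.pi * t / period)
    return scale * p

def predict_frame(p, t):
    """x_t(p) = g(p + f(p, t)): query the canonical field at the displaced coordinates."""
    return g_cano(p + f_dvf(p, t))

# Coordinates of a 16x16 grid on [-1, 1]^2, flattened to (N, 2).
ys, xs = np.meshgrid(np.linspace(-1, 1, 16), np.linspace(-1, 1, 16), indexing="ij")
p = np.stack([xs.ravel(), ys.ravel()], axis=-1)
frame = predict_frame(p, 2.5).reshape(16, 16)
```

Because both fields are functions of continuous coordinates, a frame can be evaluated on any grid and at any time point without resampling a stored image.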

The data-consistency objective (L1 norm) is imposed in k-space for each frame and coil:

$$L_{\text{DC}} = \sum_{t,c} \left\| \hat{y}_{t,c} - y_{t,c} \right\|_1,$$

where

$$\hat{y}_{t,c} = M_t F S_c \big[ g_\psi(p + f_\phi(p, t)) \big].$$
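A minimal NumPy sketch of this loss follows. It assumes the predicted frames are already available as an array, takes the L1 norm of complex residuals as the sum of their magnitudes (the source does not say whether real and imaginary parts are treated separately), and uses hypothetical names, shapes, and toy data throughout.

```python
import numpy as np

def l1_data_consistency(x_frames, sens_maps, masks, y_meas):
    """L_DC = sum over t, c of || M_t F S_c x_t - y_{t,c} ||_1.

    x_frames:  (T, H, W) complex predicted frames
    sens_maps: (C, H, W) complex coil sensitivity maps
    masks:     (T, H, W) binary undersampling masks
    y_meas:    (T, C, H, W) measured (masked) k-space
    """
    coil_images = sens_maps[None, :, :, :] * x_frames[:, None, :, :]      # (T, C, H, W)
    y_hat = masks[:, None, :, :] * np.fft.fft2(coil_images, norm="ortho")
    return float(np.sum(np.abs(y_hat - y_meas)))                          # complex magnitudes

# Hypothetical toy problem: simulate measurements from known frames.
rng = np.random.default_rng(0)
T, C, H, W = 3, 2, 8, 8
x_true = rng.standard_normal((T, H, W)) + 1j * rng.standard_normal((T, H, W))
sens_maps = rng.standard_normal((C, H, W)) + 1j * rng.standard_normal((C, H, W))
masks = (rng.random((T, H, W)) < 0.3).astype(float)
y_meas = masks[:, None] * np.fft.fft2(sens_maps[None] * x_true[:, None], norm="ortho")

loss_at_truth = l1_data_consistency(x_true, sens_maps, masks, y_meas)
loss_at_zero = l1_data_consistency(np.zeros_like(x_true), sens_maps, masks, y_meas)
```

The loss vanishes at the frames that generated the data and is positive for an all-zero guess. Since only sampled locations contribute, many different images can reach a low value, which is why additional priors are needed.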

Regularization of the DVFs is critical for stability. The loss includes terms for sparsity ($\|u_t\|_1$), spatial smoothness ($\|\nabla u_t\|_1$), and curvature ($\|\nabla^2 u_t\|_1$):

$$L_{\text{DVF}} = \sum_t \left( \|u_t\|_1 + \|\nabla u_t\|_1 + \|\nabla^2 u_t\|_1 \right)$$

Total loss: $L = L_{\text{DC}} + \lambda\,L_{\text{DVF}}$, with $\lambda = 1$ in practice.
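The regularizer and total loss might be discretized as in the sketch below. The forward finite differences, the choice of second-order terms (two pure second differences plus one mixed difference), and the function names are assumptions made for illustration; the source does not specify the discretization.

```python
import numpy as np

def dvf_regularizer(u):
    """Sum over frames of ||u||_1 + ||grad u||_1 + ||grad^2 u||_1.

    u: (T, H, W, 2) displacement fields sampled on a pixel grid.
    """
    sparsity = np.abs(u).sum()
    d_row = np.diff(u, axis=1)                          # first difference along rows
    d_col = np.diff(u, axis=2)                          # first difference along columns
    smoothness = np.abs(d_row).sum() + np.abs(d_col).sum()
    curvature = (np.abs(np.diff(u, n=2, axis=1)).sum()  # second difference along rows
                 + np.abs(np.diff(u, n=2, axis=2)).sum()  # second difference along columns
                 + np.abs(np.diff(d_row, axis=2)).sum())  # mixed difference
    return float(sparsity + smoothness + curvature)

def total_loss(l_dc, u, lam=1.0):
    """L = L_DC + lambda * L_DVF, with lambda = 1 as the default."""
    return l_dc + lam * dvf_regularizer(u)

u_const = np.ones((2, 4, 4, 2))      # constant shift: only the sparsity term is nonzero
u_ramp = np.zeros((2, 4, 4, 2))
u_ramp[..., 0] = np.arange(4.0)      # linear ramp along columns: zero curvature
```

A constant shift is penalized only by the sparsity term, and a linear ramp by the sparsity and smoothness terms but not the curvature term, which shows how the three terms target different kinds of implausible motion.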

3. Network Architecture and Encoding Strategy

MoCo-INR employs two parallel neural networks, each leveraging hash-based coordinate encoding ("InstantNGP"-style):

  • The canonical image $g_\psi$ and the deformation field $f_\phi$ use separate multi-level hash grid encoders.
  • Each encoder provides features at multiple spatial resolutions ($\ell = 1 \ldots L$), concatenated to form the final representation.
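A simplified 2D multi-level hash encoding can be sketched in NumPy as follows. This is a rough illustration and not the paper's encoder: the table size, base resolution, growth factor, and random (rather than learned) table entries are hypothetical, and every level is hashed, whereas Instant-NGP indexes coarse levels directly when the grid fits in the table.

```python
import numpy as np

PRIME_Y = np.uint64(2654435761)  # second prime of the Instant-NGP spatial hash (first is 1)

def hash_encode(p, tables, base_res=4, growth=2.0):
    """Encode p (N, 2) in [0, 1]^2 into concatenated per-level features (N, L * F).

    tables: list of L arrays of shape (table_size, F), one feature table per level.
    """
    feats = []
    for level, table in enumerate(tables):
        res = int(base_res * growth**level)     # grid resolution of this level
        scaled = p * res
        p0 = np.floor(scaled).astype(np.int64)  # lower-left grid corner
        w = scaled - p0                         # bilinear interpolation weights
        level_feat = np.zeros((p.shape[0], table.shape[1]))
        for dx in (0, 1):
            for dy in (0, 1):
                cx = (p0[:, 0] + dx).astype(np.uint64)
                cy = (p0[:, 1] + dy).astype(np.uint64)
                idx = ((cx ^ (cy * PRIME_Y)) % np.uint64(table.shape[0])).astype(np.int64)
                wx = w[:, 0] if dx else 1.0 - w[:, 0]
                wy = w[:, 1] if dy else 1.0 - w[:, 1]
                level_feat += (wx * wy)[:, None] * table[idx]
        feats.append(level_feat)
    return np.concatenate(feats, axis=-1)

rng = np.random.default_rng(0)
tables = [0.01 * rng.standard_normal((64, 2)) for _ in range(3)]  # L = 3 levels, F = 2
p = rng.random((5, 2))
features = hash_encode(p, tables)                                 # shape (5, 6)
```

Each level contributes F features interpolated from its four surrounding grid corners, so the concatenated vector mixes coarse and fine spatial context for every query point.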

The encoded features are fed into compact 3-layer CNN decoders:

  • Each decoder employs two 3×3 convolutional layers (64 filters, ReLU) and a final output convolution (2 channels: Re, Im for $g_\psi$; $\Delta x$, $\Delta y$ for $f_\phi$).
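The forward pass of such a decoder can be sketched in plain NumPy. Several details are assumptions: zero "same" padding, a 3×3 kernel for the output layer (the source does not state its size), random initial weights, and the helper names. In practice a deep learning framework with automatic differentiation would be used instead.

```python
import numpy as np

def conv3x3(x, weight, bias):
    """3x3 cross-correlation with zero 'same' padding.

    x: (C_in, H, W), weight: (C_out, C_in, 3, 3), bias: (C_out,)
    """
    _, H, W = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((weight.shape[0], H, W))
    for i in range(3):
        for j in range(3):
            out += np.einsum("oc,chw->ohw", weight[:, :, i, j], xp[:, i:i + H, j:j + W])
    return out + bias[:, None, None]

def init_decoder(in_channels, hidden=64, out_channels=2, seed=0):
    """Random weights for two hidden conv layers and one 2-channel output layer."""
    rng = np.random.default_rng(seed)
    dims = [(hidden, in_channels), (hidden, hidden), (out_channels, hidden)]
    return [(0.05 * rng.standard_normal((o, c, 3, 3)), np.zeros(o)) for o, c in dims]

def decode(features, params):
    """conv -> ReLU -> conv -> ReLU -> conv (no activation on the output)."""
    h = np.maximum(conv3x3(features, *params[0]), 0.0)
    h = np.maximum(conv3x3(h, *params[1]), 0.0)
    return conv3x3(h, *params[2])

features = np.random.default_rng(1).standard_normal((6, 16, 16))  # e.g. L * F = 6 channels
params = init_decoder(in_channels=6)
out = decode(features, params)                                    # shape (2, 16, 16)
```

The two output channels would be read as (Re, Im) for the canonical image or as ($\Delta x$, $\Delta y$) for the deformation field.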

A "coarse-to-fine" curriculum is imposed:

  • Early optimization only unfreezes low-frequency (coarse) hash levels. High-frequency (fine) levels are progressively enabled, which stabilizes the estimation of large-scale motion before allowing the capture of detail.
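One way to express such a curriculum is a schedule that maps the iteration count to the number of enabled hash levels. The linear ramp, the `ramp_iters` value, and the feature-mask mechanism below are hypothetical; the source states only that coarse levels are enabled first and finer ones progressively.

```python
def num_active_levels(iteration, num_levels, ramp_iters, start_levels=1):
    """Number of hash levels enabled at a given iteration (assumed linear ramp)."""
    frac = min(1.0, iteration / ramp_iters)
    return start_levels + int(frac * (num_levels - start_levels))

def level_mask(iteration, num_levels, ramp_iters):
    """Per-level multiplier: 1.0 for enabled (coarse) levels, 0.0 for still-frozen fine ones."""
    n = num_active_levels(iteration, num_levels, ramp_iters)
    return [1.0 if level < n else 0.0 for level in range(num_levels)]

# Hypothetical setting: 8 levels, all enabled after 600 iterations.
schedule = [num_active_levels(it, 8, 600) for it in (0, 300, 600, 1199)]
mask_mid = level_mask(300, 8, 600)
```

Multiplying each level's features by its mask entry zeroes the contribution of fine levels and, with it, their gradients, which is one simple way to keep those levels frozen early in training.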

4. Unsupervised Optimization and Training Protocol

  • Coil sensitivity maps are estimated from a time-averaged reference via ESPIRiT.
  • The entire system (appearance and DVF networks) is trained end-to-end on the acquired data, without ground-truth or external supervision.
  • Training uses Adam for 1,200 iterations, with standard deep learning schedules:
    • Hash-level curricula: initial iterations activate only coarse grid levels; as training progresses, finer levels are incrementally unfrozen.
    • Learning rate: $10^{-3}$.
    • Regularization: $\lambda = 1$ for $L_{\text{DVF}}$; L1 data-consistency loss.
  • Optimization exploits only the data from the subject under examination; no pretraining or transfer learning is required.
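The protocol above can be collected into a small configuration object. The values come from the list; the class and field names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainConfig:
    """Per-scan optimization settings as listed above (field names are hypothetical)."""
    optimizer: str = "adam"
    iterations: int = 1200
    learning_rate: float = 1e-3
    lambda_dvf: float = 1.0               # weight of the DVF regularizer
    dc_loss: str = "l1"                   # data-consistency norm in k-space
    sensitivity_estimation: str = "espirit"
    pretrained: bool = False              # optimized from the subject's own scan only

cfg = TrainConfig()
```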

5. Performance, Ablation, and Comparative Analysis

Quantitative results on the OCMR cine dataset, using retrospective Cartesian VISTA (AF = 12×, 20×) and golden-angle radial (AF ≈ 26×, 69×) undersampling:

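| Sampling / AF | MoCo-INR PSNR (dB) | MoCo-INR SSIM | Next best method | Next best PSNR (dB) | Next best SSIM |
| --- | --- | --- | --- | --- | --- |
| VISTA 12× | 42.25 | 0.971 | ST-INR + L₀S | 41.35 | 0.972 |
| VISTA 20× | 39.53 | 0.957 | SOTA | ~36.6 | ~0.937 |
| Golden-angle 26× | 40.33 | 0.960 | - | - | - |
| Golden-angle 69× | 37.75 | 0.940 | - | - | - |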
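For reference, PSNR values like those reported here are commonly computed as in the sketch below. Comparing magnitude images and taking the reference maximum as the peak value are assumptions; the source does not state the normalization used in the paper.

```python
import numpy as np

def psnr(recon, reference):
    """Peak signal-to-noise ratio in dB between two (possibly complex) images.

    Assumes the peak value is the maximum magnitude of the reference.
    """
    rec_mag, ref_mag = np.abs(recon), np.abs(reference)
    mse = np.mean((rec_mag - ref_mag) ** 2)
    return float(10.0 * np.log10(ref_mag.max() ** 2 / mse))

reference = np.ones((4, 4))
recon = reference + 0.1          # uniform error of 10% of the peak
value_db = psnr(recon, reference)
```

Under this definition a uniform error of 10% of the peak corresponds to 20 dB, so the reported values near 40 dB correspond to root-mean-square errors of about 1% of the peak.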

Ablation studies demonstrate:

  • Replacing the CNN decoder with an MLP decreases PSNR by ≈3 dB and SSIM by ≈0.04; high-frequency ringing is introduced.
  • Training all hash levels from the start (removing the curriculum) results in ≈2 dB PSNR loss and spurious DVF estimation in static regions.
  • Omitting DVF regularization decreases PSNR by ≈3 dB and SSIM by ≈0.04, and produces implausible motion fields.

In prospective free-breathing clinical evaluation (VISTA AF = 9×, 65 frames, ∼26 Hz), MoCo-INR achieves diagnostic image quality, accurate wall-motion depiction, and stronger myocardium–blood-pool boundary definition than competing unsupervised approaches. Cine reconstruction takes ≈1.3 min for retrospective VISTA data and ≈5.5 min for golden-angle data, which is faster than or comparable to non-INR and hybrid INR baselines (Tian et al., 14 Nov 2025).

6. Motion-Compensated Decomposition and Separation of Appearance/Dynamics

MoCo-INR explicitly enforces a separation between anatomy and motion:

  • The canonical appearance field, parameterized by $g_\psi$, encodes time-invariant tissue texture and contrast.
  • The DVF, parameterized by $f_\phi$, encodes dynamic motion as a continuous spatiotemporal mapping.

This decomposition prevents entanglement of motion and appearance in the network's latent space, which is a documented failure mode in single-network INR or generative approaches under extreme undersampling. The explicit structure also supports interpretable motion tracking and recoverable deformation fields suitable for downstream analysis or reporting.

7. Broader Impact, Limitations, and Prospects

MoCo-INR demonstrates that explicit factorization, hierarchical encoding, and CNN-based decoders are synergistic for dynamic MRI from highly limited measurements. The framework is fully unsupervised and scan-adaptive, supporting a wide range of sampling patterns (Cartesian, radial), acceleration factors (up to 69×), and clinical conditions (including free breathing).

Major limitations are the requirement of pre-estimated coil sensitivities and the restriction to continuous but deterministic deformation fields, which may limit applicability to highly nonrigid, stochastic, or interleaved motion. Extensions to 3D, probabilistic motion models, or alternative encoding strategies are plausible directions for future research. The curriculum adopted for hash-level unfreezing is likely beneficial for other inverse problems involving deformation and appearance disentanglement in medical imaging (Tian et al., 14 Nov 2025).
