
Reconstruct Anything Model (RAM)

Updated 28 January 2026
  • RAM is a unified computational imaging model that leverages a multi-scale, non-iterative UNet backbone with a Krylov Subspace Module for explicit physical and noise integration.
  • It applies to diverse inverse problems such as denoising, deblurring, super-resolution, and medical imaging, demonstrating near state-of-the-art performance across tasks.
  • RAM’s efficient design enables rapid self-supervised adaptation to new modalities, achieving significant improvements in PSNR and SSIM with reduced computational overhead.

A Reconstruct Anything Model (RAM) is a unified computational imaging model designed to solve a broad spectrum of inverse problems in imaging, integrating physical knowledge of forward operators and noise, and enabling rapid adaptation to new modalities or acquisition settings. The following provides an exhaustive technical account centered on the RAM described in "Reconstruct Anything Model: a lightweight foundation model for computational imaging" (Terris et al., 11 Mar 2025).

1. Architectural Foundations

The core of RAM is a multiscale, non-iterative network that generalizes across forward models and noise types. The backbone is a bias-free residual UNet architecture (specifically a DRUNet variant), with all convolution kernels (except at network endpoints) shared among imaging modalities (grayscale, color, complex). The principal innovation is the Krylov Subspace Module (KSM), inserted at each scale to explicitly couple the features with knowledge of the physical image formation model and acquisition noise:

  • Multiscale Stages: At each scale $s$, an upsampling operator $U_s$ generates a coarse-to-fine mapping; the resulting $A_s = A U_s$ enables the network to reason at multiple spatial resolutions.
  • Krylov Subspace Module (KSM): The KSM at layer $\ell$ takes image-space features $x^\ell$ and forms a Krylov basis,

$$\left\{ (A_s^T A_s)^k x^\ell,\; (A_s^T A_s)^k A_s^T y \right\}_{k=0}^{K}$$

with learnable mixing coefficients $\{\alpha_k, \beta_k\}$. A $3 \times 3$ convolution then mixes the Krylov basis, and the result is re-encoded and residually combined with the latent features,

$$h^{\ell+1} = h^{\ell} + \mathrm{Conv}_{\mathrm{enc}}\left( \sum_{k=0}^{K} \alpha_k (A_s^T A_s)^k x^\ell + \beta_k (A_s^T A_s)^k A_s^T y \right)$$

where $h^{\ell}$ are the latent features.
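The KSM update can be sketched in a few lines of PyTorch. This is a minimal illustration under stated assumptions, not the released architecture: the class name, the callable `A`/`AT` operator interface, and the zero initialization of the mixing coefficients are choices made here.

```python
import torch
import torch.nn as nn

class KrylovSubspaceModule(nn.Module):
    """Sketch of a Krylov Subspace Module: builds the basis
    {(A^T A)^k x, (A^T A)^k A^T y} for k = 0..K, mixes it with learnable
    coefficients alpha_k, beta_k, and re-encodes it with a 3x3 convolution.
    The result is meant to be residually added to the latent features h."""

    def __init__(self, channels, K=3):
        super().__init__()
        self.K = K
        # Mixing coefficients are learned; zero-initialized here so the
        # residual branch starts as the identity (an assumption of this sketch).
        self.alpha = nn.Parameter(torch.zeros(K + 1))
        self.beta = nn.Parameter(torch.zeros(K + 1))
        self.conv = nn.Conv2d(channels, channels, 3, padding=1, bias=False)

    def forward(self, x, y, A, AT):
        # bx, by track successive powers of the normal operator A^T A
        # applied to x and to the back-projection A^T y, respectively.
        bx, by = x, AT(y)
        out = self.alpha[0] * bx + self.beta[0] * by
        for k in range(1, self.K + 1):
            bx, by = AT(A(bx)), AT(A(by))
            out = out + self.alpha[k] * bx + self.beta[k] * by
        return self.conv(out)  # caller adds this residually to h
```

In the full model this block would be instantiated once per scale, with `A` replaced by the rescaled operator $A_s = A U_s$.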

2. Physics and Noise Integration

RAM operates under the non-blind linear inverse problem formalism:

$$y \sim p(y \mid Ax), \quad y \in \mathbb{R}^m,\; x \in \mathbb{R}^n,\; A: \mathbb{R}^n \to \mathbb{R}^m$$

with $p(y \mid z)$ either Gaussian with variance $\sigma^2 I$, Poisson, or their combination (Poisson–Gaussian).
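As an illustration of this measurement model, the following NumPy sketch samples $y \sim p(y \mid Ax)$ under the Poisson–Gaussian case, of which pure Gaussian and pure Poisson noise are limiting cases. The function name and the clipping of negative intensities before Poisson sampling are assumptions of this sketch, not taken from the paper.

```python
import numpy as np

def poisson_gaussian_measure(x, A, sigma, gamma, rng=None):
    """Simulate y = gamma * z + sigma * n with z ~ Poisson(Ax / gamma)
    and n ~ N(0, I).  Negative intensities are clipped before Poisson
    sampling (an implementation choice made here for robustness)."""
    rng = rng if rng is not None else np.random.default_rng()
    Ax = A @ x
    z = rng.poisson(np.clip(Ax, 0.0, None) / gamma)   # shot-noise component
    n = rng.standard_normal(Ax.shape)                 # read-noise component
    return gamma * z + sigma * n
```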

Key conditioning mechanisms:

  • Proximal Input Initialization: RAM computes

$$u_0 = \operatorname{prox}_{\lambda \|A \cdot - y\|_2^2}(A^T y) = \arg\min_u\, \lambda \|A u - y\|_2^2 + \|u - A^T y\|_2^2$$

with a learnable scaling parameter $\eta$, setting $\lambda = \sigma \eta / \|y\|_1$.

  • Explicit Noise Conditioning: For Poisson–Gaussian models,

$$y = \gamma z + \sigma n, \quad z \sim \mathrm{Poisson}(x / \gamma),\; n \sim \mathcal{N}(0, I)$$

RAM encodes $\sigma$ and $\gamma$ as constant-valued maps in the input channels and omits bias terms for scale equivariance.
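Both conditioning mechanisms above admit short sketches. Here a dense NumPy solve stands in for whatever matrix-free solver the real model would use, and the plain float `eta` stands in for the learnable scaling parameter; function names are hypothetical.

```python
import numpy as np

def proximal_init(A, y, sigma, eta=1.0):
    """Sketch of the proximal input initialization:
        u0 = argmin_u  lam * ||A u - y||^2 + ||u - A^T y||^2
    with lam = sigma * eta / ||y||_1.  The stationarity condition gives
    (lam * A^T A + I) u = (1 + lam) * A^T y, solved densely for clarity."""
    lam = sigma * eta / np.sum(np.abs(y))
    ATy = A.T @ y
    n = A.shape[1]
    return np.linalg.solve(lam * (A.T @ A) + np.eye(n), (1 + lam) * ATy)

def noise_channels(shape, sigma, gamma):
    """Encode (sigma, gamma) as constant-valued maps to append to the
    network input, mirroring the explicit noise conditioning."""
    return np.stack([np.full(shape, sigma), np.full(shape, gamma)])
```

For an orthogonal or identity-like $A$, the initialization reduces to the back-projection $A^T y$ itself, as one would expect.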

3. Training Methodology and Task Portfolio

RAM is jointly trained on a diverse collection of imaging tasks, each defined by a forward operator $A_g$ and noise parameters $(\sigma_g, \gamma_g)$:

  • Natural Image Tasks: Gaussian and Poisson–Gaussian denoising, motion and Gaussian deblurring (with various blur kernel sizes), inpainting (block or pixel masks), super-resolution (bicubic/bilinear upscaling, factors 2–4), pansharpening (Brovey transform).
  • Medical Imaging Tasks: Single-coil MRI ($y = M F x + \sigma n$, with Cartesian masks), multi-coil MRI ($y_\ell = M F S_\ell x + \sigma n_\ell$, 15 channels), CT ($y = A x + \sigma n$; Radon transform with 10–51 projections).
  • Datasets: LSDIR (natural images), fastMRI (MRI), LIDC-IDRI (CT).

The supervised training loss for each task is

$$\mathcal{L}_g(\theta, x_{i,g}) = \mathbb{E}_{(\sigma_g, \gamma_g) \sim p}\, \mathbb{E}_{y \mid x_{i,g}}\left[ \omega_g \left\| R_\theta(y, A_g, \sigma_g, \gamma_g) - x_{i,g} \right\|_1 \right]$$

where $\omega_g = \|A_g^T y\| / \sigma_g$ and $R_\theta$ is the RAM reconstruction function. The total loss aggregates over all tasks and training images:

$$\mathcal{L}(\theta) = \sum_{g=1}^{G} \sum_{i=1}^{N_g} \mathcal{L}_g(\theta, x_{i,g})$$

Training proceeds from DRUNet-initialized weights, using a batch size of 16 per task, random $128 \times 128$ patches, and the Adam optimizer (learning rate $10^{-4}$, decayed by a factor of 10 at 180k steps, for 200k steps in total).
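The per-task objective is a weighted L1 loss and can be sketched directly. A minimal PyTorch sketch, assuming `R_theta` is a callable reconstruction network and `AT_op` applies the adjoint $A_g^T$; all names are hypothetical:

```python
import torch

def task_loss(R_theta, x, y, A_op, AT_op, sigma, gamma):
    """Sketch of the per-task supervised objective
        omega_g * || R_theta(y, A_g, sigma_g, gamma_g) - x ||_1
    with omega_g = ||A_g^T y|| / sigma_g, as given in the text."""
    omega = torch.linalg.vector_norm(AT_op(y)) / sigma
    x_hat = R_theta(y, A_op, sigma, gamma)
    return omega * torch.sum(torch.abs(x_hat - x))
```

In joint training, one such term would be evaluated per task in the batch and the results summed, matching the aggregated loss above.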

4. Self-Supervised Adaptation to Novel Operators

RAM can rapidly adapt to new, potentially unseen forward operators or noise statistics with self-supervised fine-tuning. Given only measurements $\{y_i\}$ from the unknown setting, the training objective is:

$$\mathcal{L}(\theta) = \sum_{i=1}^{N} \left[ \mathcal{L}_{\mathrm{MC}}(\theta, y_i) + \omega\, \mathcal{L}_{\mathrm{NULL}}(\theta, y_i) \right]$$

  • Measurement Consistency ($\mathcal{L}_{\mathrm{MC}}$): Uses SURE (if the Gaussian noise level is known), i.e.,

$$\mathcal{L}_{\mathrm{MC}} = \|A R_\theta(y) - y\|_2^2 + 2 \sigma^2\, \mathrm{div}(A \circ R_\theta)(y)$$

or SPLIT, which employs partial masking and expectation over masks.

  • Null-Space Regularization ($\mathcal{L}_{\mathrm{NULL}}$): For single operators, Equivariant Imaging regularization compares reconstructions of transformed inputs, while for families of operators, Multi-Operator Imaging is used.

Fine-tuning typically converges in minutes, e.g., using only a few images on a single mid-range GPU.
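Both loss terms can be sketched in PyTorch. The divergence in the SURE term is approximated here with a standard Monte Carlo probe (a common practical choice, assumed rather than taken from the paper), and the null-space term uses the Equivariant Imaging form; `R_theta` and `transform` are hypothetical callables.

```python
import torch

def sure_mc_loss(R_theta, y, A_op, sigma, eps=1e-3):
    """Sketch of the SURE measurement-consistency loss.  The divergence
    div(A o R)(y) is estimated with a single Monte Carlo probe:
        div ~ b^T [A R(y + eps*b) - A R(y)] / eps,  b ~ N(0, I)."""
    b = torch.randn_like(y)
    Ax = A_op(R_theta(y))
    Ax_pert = A_op(R_theta(y + eps * b))
    div = torch.sum(b * (Ax_pert - Ax)) / eps
    return torch.sum((Ax - y) ** 2) + 2 * sigma**2 * div

def ei_loss(R_theta, y, A_op, transform):
    """Sketch of Equivariant Imaging null-space regularization: the
    reconstruction composed with a distribution-preserving transform T
    (e.g., a flip or rotation) should be a fixed point of R o A."""
    x_hat = R_theta(y)
    x_t = transform(x_hat)
    return torch.sum((R_theta(A_op(x_t)) - x_t) ** 2)
```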

5. Quantitative Performance and Efficiency

RAM achieves strong quantitative and computational performance compared to iterative and unrolled architectures. Key metrics are summarized:

| Method | Params | CBSD68 | Urban100 | Div2K |
|---|---|---|---|---|
| PDNet | – | 30.54 | 28.36 | 32.61 |
| Restormer | – | 30.96 | 28.21 | 28.56 |
| DPIR | 32 M | 33.45 | 34.39 | 36.32 |
| uDPIR-tied | 32 M | 33.78 | 33.07 | 35.79 |
| uDPIR-un | 256 M | 33.64 | 32.72 | 35.38 |
| RAM | 36 M | 34.04 | 33.61 | 36.18 |

On medical tasks:

| Method | MRI ×4 (PSNR/SSIM) | MRI ×8 (PSNR/SSIM) | CT (PSNR/SSIM) |
|---|---|---|---|
| PDNet | 28.25/0.719 | 24.54/0.641 | 23.09/0.713 |
| DPIR | 30.54/0.784 | 25.28/0.661 | n.a. |
| uDPIR-un | 33.73/0.848 | 30.20/0.792 | 27.00/0.772 |
| uDPIR-tied | 34.14/0.851 | 30.86/0.805 | 28.35/0.779 |
| RAM | 34.39/0.853 | 31.50/0.813 | 28.83/0.798 |

Efficiency (deblurring 256×256 color):

| Method | GFLOPs | Test Mem (MB) | Train Mem (MB) |
|---|---|---|---|
| PDNet | 60 | 52 | 5815 |
| uDPIR-tied | 2234 | 298 | 36374 |
| uDPIR-un | 2234 | 1213 | 37288 |
| RAM | 360 | 354 | 9670 |

In compute and memory profile, RAM is approximately $6\times$ faster than an 8-step unrolled network, with a parameter count ($\sim$36 M) comparable to that of DRUNet.

Out-of-distribution experiments confirm robust adaptation: e.g., for multi-coil MRI ($r = 8$), RAM yields 35.62/0.889 (PSNR/SSIM) versus 36.06/0.894 for uDPIR-tied; for CT with Poisson noise, RAM achieves 28.83/0.798, far outperforming baselines.

Self-supervised fine-tuning on a single Sentinel-2 satellite image for compressed sensing or demosaicing elevates PSNR from $\sim$10 dB (zero-shot) to $\sim$30–35 dB within seconds.

6. Ablations, Memory Footprint, and Limitations

  • Model Complexity: Approximately 36 M parameters, 360 GFLOPs for 256×256 images.
  • GPU Memory: $\sim$354 MB inference memory (batch 1); $\sim$9.7 GB for batch 8 during training.
  • Krylov Iterations and Scales: Ablations find the best trade-off at $K = 3$ Krylov terms and 4 multiscale levels.
  • Limitations:
    • Training RAM from scratch is GPU-intensive.
    • While RAM strongly minimizes distortion (PSNR/SSIM), it does not optimize perceptual metrics—GAN/diffusion-based priors are not targeted.
    • In settings where the forward operator is nearly invertible and well-characterized, bespoke single-task unrolled models may provide marginal gains over RAM.
    • RAM serves as a backbone for possible posterior-mean-based sampling, but is not itself primarily a generative model.

7. Broader Impact and Applicability

RAM provides a unified, non-iterative solution across diverse computational imaging challenges, with extensibility to new operators and noise distributions via light self-supervised adaptation. It is effective across both natural and medical image datasets, consistently yielding near–state-of-the-art results but with hardware requirements suitable for practical deployment. Its architectural paradigm—explicit forward model integration, Krylov-based conditioning, and modality-agnostic kernel sharing—serves as an extensible blueprint for future foundation models in inverse imaging (Terris et al., 11 Mar 2025).
