Reconstruct Anything Model (RAM)
- RAM is a unified computational imaging model that leverages a multi-scale, non-iterative UNet backbone with a Krylov Subspace Module for explicit physical and noise integration.
- It applies to diverse inverse problems such as denoising, deblurring, super-resolution, and medical imaging, demonstrating near state-of-the-art performance across tasks.
- RAM’s efficient design enables rapid self-supervised adaptation to new modalities, achieving significant improvements in PSNR and SSIM with reduced computational overhead.
A Reconstruct Anything Model (RAM) is a unified computational imaging model designed to solve a broad spectrum of inverse problems in imaging, integrating physical knowledge of forward operators and noise, and enabling rapid adaptation to new modalities or acquisition settings. The following provides an exhaustive technical account centered on the RAM described in "Reconstruct Anything Model: a lightweight foundation model for computational imaging" (Terris et al., 11 Mar 2025).
1. Architectural Foundations
The core of RAM is a multiscale, non-iterative network that generalizes across forward models and noise types. The backbone is a bias-free residual UNet architecture (specifically a DRUNet variant), with all convolution kernels (except at network endpoints) shared among imaging modalities (grayscale, color, complex). The principal innovation is the Krylov Subspace Module (KSM), inserted at each scale to explicitly couple the features with knowledge of the physical image formation model and acquisition noise:
- Multiscale Stages: At each scale $s$, down- and upsampling operators produce a coarse version $A_s$ of the forward operator, generating a coarse-to-fine mapping that enables the network to reason at multiple spatial resolutions.
- Krylov Subspace Module (KSM): At layer $\ell$, the KSM takes image-space features $x^\ell$ and forms a Krylov basis
$\left\{ x^\ell,\, (A_s^T A_s)\, x^\ell,\, \dots,\, (A_s^T A_s)^K x^\ell \right\} \cup \left\{ A_s^T y,\, (A_s^T A_s)\, A_s^T y,\, \dots,\, (A_s^T A_s)^K A_s^T y \right\}$
with learnable mixing coefficients $\alpha_k, \beta_k$. A convolution then mixes the Krylov basis, and the result is re-encoded and residually combined with the latent features,
$h^{\ell+1} = h^\ell + \mathrm{Conv}_{\mathrm{enc}}\left( \sum_{k=0}^K \alpha_k (A_s^T A_s)^k x^\ell + \beta_k (A_s^T A_s)^k A_s^T y \right)$
where $h^\ell$ are the latent features at layer $\ell$.
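The KSM update above can be sketched in NumPy. This is a minimal illustration, not the paper's implementation: the learned encoder convolution is omitted, the mixing coefficients are fixed toy values rather than trained parameters, and a dense matrix stands in for the forward operator.

```python
import numpy as np

rng = np.random.default_rng(0)

n, m, K = 64, 32, 3                           # signal size, measurement size, Krylov order
A = rng.standard_normal((m, n)) / np.sqrt(n)  # toy forward operator A_s
x = rng.standard_normal(n)                    # image-space features x^l
y = A @ x + 0.01 * rng.standard_normal(m)     # measurements

AtA = A.T @ A
Aty = A.T @ y

# "Learnable" mixing coefficients (here fixed toy values)
alpha = rng.standard_normal(K + 1)
beta = rng.standard_normal(K + 1)

# Build the Krylov bases {(A^T A)^k x} and {(A^T A)^k A^T y} iteratively,
# accumulating the mixed combination without forming matrix powers explicitly
update = np.zeros(n)
vx, vy = x.copy(), Aty.copy()
for k in range(K + 1):
    update += alpha[k] * vx + beta[k] * vy
    vx, vy = AtA @ vx, AtA @ vy

# Residual combination with latent features h^l (Conv_enc omitted for clarity)
h = rng.standard_normal(n)
h_next = h + update
```

Iterating `AtA @ v` costs one operator application per Krylov term, which is why the module stays cheap compared to unrolled optimization.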
2. Physics and Noise Integration
RAM operates under the non-blind linear inverse problem formalism
$y = A x + \epsilon,$
with $\epsilon$ either Gaussian with variance $\sigma^2$, Poisson, or their combination (Poisson–Gaussian).
Key conditioning mechanisms:
- Proximal Input Initialization: RAM computes a data-consistent input
$x^0 = \operatorname{prox}_{\lambda \|A\,\cdot\, - y\|^2}\!\left(A^T y\right)$
with a learnable scaling parameter $\lambda$, and feeds $x^0$ to the network.
- Explicit Noise Conditioning: For Poisson–Gaussian models with gain $\gamma$ and Gaussian standard deviation $\sigma$,
RAM encodes $\sigma$ and $\gamma$ as constant-valued maps in the input channels and omits bias terms for scale equivariance.
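The forward model and the constant-valued conditioning maps can be sketched as follows. This is a hedged toy example: the gain parametrization `gamma * Poisson(x / gamma)` is one common convention for Poisson–Gaussian noise, the identity is used as the forward operator, and the specific `sigma`/`gamma` values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

H, W = 16, 16
x = rng.uniform(0.1, 1.0, (H, W))   # clean image in [0, 1]
sigma, gamma = 0.05, 0.02           # Gaussian std, Poisson gain (assumed values)

# Forward model y = A x + noise; identity operator A for this toy example
Ax = x
y = gamma * rng.poisson(Ax / gamma) + sigma * rng.standard_normal((H, W))

# Constant-valued noise-level maps concatenated as extra input channels,
# giving the network explicit access to (sigma, gamma)
sigma_map = np.full((H, W), sigma)
gamma_map = np.full((H, W), gamma)
net_input = np.stack([y, sigma_map, gamma_map])   # shape (3, H, W)
```

Because the maps are constant and the network is bias-free, rescaling the input rescales the output consistently, which is what the scale-equivariance remark above refers to.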
3. Training Methodology and Task Portfolio
RAM is jointly trained on a diverse collection of imaging tasks, each defined by a forward operator $A$ and noise parameters $(\sigma, \gamma)$:
- Natural Image Tasks: Gaussian and Poisson–Gaussian denoising, motion and Gaussian deblurring (with various blur kernel sizes), inpainting (block or pixel masks), super-resolution (bicubic/bilinear upscaling, factors 2–4), pansharpening (Brovey transform).
- Medical Imaging Tasks: Single-coil MRI (masked Fourier operator with Cartesian masks), multi-coil MRI (15 coil channels), CT (Radon transform with 10–51 projections).
- Datasets: LSDIR (natural images), fastMRI (MRI), LIDC-IDRI (CT).
The supervised training loss for each task $t$ is
$\mathcal{L}_t = \mathbb{E}\left[\, \left\| R(y, A_t, \sigma_t, \gamma_t) - x \right\| \,\right],$
where $y = A_t x + \epsilon$ and $R$ is the RAM reconstruction function. The total loss aggregates over all tasks:
$\mathcal{L} = \sum_t \mathcal{L}_t.$
Training proceeds from DRUNet-initialized weights, using a batch size of 16 per task, random patches, and the Adam optimizer (learning-rate decay at $180$k steps, $200$k steps total).
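The multi-task objective can be sketched as below. This is a schematic under stated assumptions: `R` is a hypothetical stand-in for RAM (a regularized least-squares solve, not the actual network), the per-sample L1 distortion is one plausible choice of norm, and the two toy "tasks" merely illustrate summing over operators and noise levels.

```python
import numpy as np

rng = np.random.default_rng(2)

def R(y, A, sigma):
    # Hypothetical stand-in for the RAM reconstruction function:
    # a Tikhonov-regularized least-squares solve
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + sigma * np.eye(n), A.T @ y)

def task_loss(A, sigma, x_batch):
    # Per-task loss: expected distortion over a batch of ground-truth signals
    loss = 0.0
    for x in x_batch:
        y = A @ x + sigma * rng.standard_normal(A.shape[0])  # y = A x + eps
        loss += np.abs(R(y, A, sigma) - x).mean()            # L1 distortion (assumed norm)
    return loss / len(x_batch)

# Two toy "tasks" with different forward operators and noise levels
tasks = [(rng.standard_normal((20, 32)) / np.sqrt(32), 0.01),
         (rng.standard_normal((16, 32)) / np.sqrt(32), 0.05)]
x_batch = [rng.standard_normal(32) for _ in range(4)]

# Total loss aggregates over all tasks
total_loss = sum(task_loss(A, s, x_batch) for A, s in tasks)
```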
4. Self-Supervised Adaptation to Novel Operators
RAM can rapidly adapt to new, potentially unseen forward operators or noise statistics with self-supervised fine-tuning. Given only measurements from the unknown setting, the training objective combines a measurement-consistency term and a null-space term:
- Measurement Consistency ($\mathcal{L}_{\mathrm{MC}}$): Uses SURE (if the Gaussian noise level is known), an unbiased estimate of the measurement-domain reconstruction error, or SPLIT, which employs partial masking and expectation over masks.
- Null-Space Regularization ($\mathcal{L}_{\mathrm{null}}$): For a single operator, Equivariant Imaging regularization compares reconstructions of transformed inputs, while for families of operators, Multi-Operator Imaging is used.
Fine-tuning typically converges within minutes, using only a few images and a single mid-range GPU.
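A SURE-style measurement-consistency term can be sketched as follows. This is a generic measurement-domain SURE with a Monte-Carlo (Hutchinson) divergence estimate, not the paper's exact loss; `f` is a toy linear stand-in for the reconstruction pipeline, and the probe scale `eps` is an assumed hyperparameter.

```python
import numpy as np

rng = np.random.default_rng(3)

def f(y):
    # Toy "reconstructor" acting in measurement space (stand-in for A @ R(y, ...))
    return 0.9 * y

def sure_loss(y, sigma, eps=1e-3):
    # Stein's Unbiased Risk Estimate of E||f(y) - A x||^2 / m, using only y:
    #   ||f(y) - y||^2 - m sigma^2 + 2 sigma^2 div(f)(y)
    # with the divergence estimated by a single Rademacher probe
    m = y.size
    b = rng.choice([-1.0, 1.0], size=y.shape)            # Rademacher probe
    div = b @ (f(y + eps * b) - f(y)) / eps              # Monte-Carlo divergence
    return (np.sum((f(y) - y) ** 2) - m * sigma**2 + 2 * sigma**2 * div) / m

sigma = 0.1
y = rng.standard_normal(256) * np.sqrt(1 + sigma**2)     # toy noisy measurements
loss = sure_loss(y, sigma)
```

The divergence term is what lets the loss be computed from noisy measurements alone, without ground truth, which is the key to the self-supervised adaptation described above.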
5. Quantitative Performance and Efficiency
RAM achieves strong quantitative and computational performance compared to iterative and unrolled architectures. Representative PSNR results (in dB) on natural-image benchmarks:
| Method | Params | CBSD68 | Urban100 | DIV2K |
|---|---|---|---|---|
| PDNet | – | 30.54 | 28.36 | 32.61 |
| Restormer | – | 30.96 | 28.21 | 28.56 |
| DPIR | 32 M | 33.45 | 34.39 | 36.32 |
| uDPIR-tied | 32 M | 33.78 | 33.07 | 35.79 |
| uDPIR-un | 256 M | 33.64 | 32.72 | 35.38 |
| RAM | 36 M | 34.04 | 33.61 | 36.18 |
On medical tasks (PSNR in dB / SSIM):
| Method | MRI×4 | MRI×8 | CT |
|---|---|---|---|
| PDNet | 28.25/0.719 | 24.54/0.641 | 23.09/0.713 |
| DPIR | 30.54/0.784 | 25.28/0.661 | n.a. |
| uDPIR-un | 33.73/0.848 | 30.20/0.792 | 27.00/0.772 |
| uDPIR-tied | 34.14/0.851 | 30.86/0.805 | 28.35/0.779 |
| RAM | 34.39/0.853 | 31.50/0.813 | 28.83/0.798 |
Efficiency (deblurring 256×256 color):
| Method | GFLOPs | Test Mem (MB) | Train Mem (MB) |
|---|---|---|---|
| PDNet | 60 | 52 | 5815 |
| uDPIR-tied | 2234 | 298 | 36374 |
| uDPIR-un | 2234 | 1213 | 37288 |
| RAM | 360 | 354 | 9670 |
RAM’s computational profile is roughly 6× lighter than an 8-step unrolled network (360 vs. 2234 GFLOPs), with a parameter count (36 M) comparable to that of DRUNet.
Out-of-distribution experiments confirm robust adaptation: e.g., for multi-coil MRI, RAM yields 35.62/0.889 (PSNR/SSIM) versus 36.06/0.894 for uDPIR-tied; for CT with Poisson noise, RAM achieves 28.83/0.798, far outperforming baselines.
Self-supervised fine-tuning on a single Sentinel-2 satellite image for compressed sensing or demosaicing elevates PSNR from 10 dB (zero-shot) to 30–35 dB within seconds.
6. Ablations, Memory Footprint, and Limitations
- Model Complexity: Approximately 36 M parameters, 360 GFLOPs for 256×256 images.
- GPU Memory: 354 MB inference memory (batch 1); 9.7 GB for batch 8 during training.
- Krylov Iterations and Scales: Ablations empirically identify the best trade-off at a small number of Krylov terms $K$ and 4 multiscale levels.
- Limitations:
- Training RAM from scratch is GPU-intensive.
- While RAM strongly minimizes distortion (PSNR/SSIM), it does not optimize perceptual metrics—GAN/diffusion-based priors are not targeted.
- In settings where the forward operator is nearly invertible and well-characterized, bespoke single-task unrolled models may provide marginal gains over RAM.
- RAM serves as a backbone for possible posterior-mean-based sampling, but is not itself primarily a generative model.
7. Broader Impact and Applicability
RAM provides a unified, non-iterative solution across diverse computational imaging challenges, with extensibility to new operators and noise distributions via light self-supervised adaptation. It is effective across both natural and medical image datasets, consistently yielding near–state-of-the-art results but with hardware requirements suitable for practical deployment. Its architectural paradigm—explicit forward model integration, Krylov-based conditioning, and modality-agnostic kernel sharing—serves as an extensible blueprint for future foundation models in inverse imaging (Terris et al., 11 Mar 2025).