Reconstruct Anything Model (RAM)
- RAM is a unified computational imaging model that leverages a multi-scale, non-iterative UNet backbone with a Krylov Subspace Module for explicit physical and noise integration.
- It applies to diverse inverse problems such as denoising, deblurring, super-resolution, and medical imaging, demonstrating near state-of-the-art performance across tasks.
- RAM’s efficient design enables rapid self-supervised adaptation to new modalities, achieving significant improvements in PSNR and SSIM with reduced computational overhead.
A Reconstruct Anything Model (RAM) is a unified computational imaging model designed to solve a broad spectrum of inverse problems in imaging, integrating physical knowledge of forward operators and noise, and enabling rapid adaptation to new modalities or acquisition settings. The following provides an exhaustive technical account centered on the RAM described in "Reconstruct Anything Model: a lightweight foundation model for computational imaging" (Terris et al., 11 Mar 2025).
1. Architectural Foundations
The core of RAM is a multiscale, non-iterative network that generalizes across forward models and noise types. The backbone is a bias-free residual UNet architecture (specifically a DRUNet variant), with all convolution kernels (except at network endpoints) shared among imaging modalities (grayscale, color, complex). The principal innovation is the Krylov Subspace Module (KSM), inserted at each scale to explicitly couple the features with knowledge of the physical image formation model and acquisition noise:
- Multiscale Stages: At each scale $s$, down- and upsampling operators produce a coarse version $A_s$ of the forward operator, generating a coarse-to-fine mapping that enables the network to reason at multiple spatial resolutions.
- Krylov Subspace Module (KSM): At layer $\ell$, the KSM takes image-space features $x^\ell$ and forms a Krylov basis
$\left\{ x^\ell,\, (A_s^T A_s)\, x^\ell,\, \dots,\, (A_s^T A_s)^K x^\ell \right\} \cup \left\{ A_s^T y,\, (A_s^T A_s)\, A_s^T y,\, \dots,\, (A_s^T A_s)^K A_s^T y \right\}$
with learnable mixing coefficients $\alpha_k, \beta_k$. A convolution then mixes the Krylov basis, and the result is re-encoded and residually combined with the latent features,
$h^{\ell+1} = h^\ell + \mathrm{Conv}_{\mathrm{enc}}\left( \sum_{k=0}^K \alpha_k (A_s^T A_s)^k x^\ell + \beta_k (A_s^T A_s)^k A_s^T y \right)$
where $h^\ell$ are the latent features at layer $\ell$.
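The KSM update above can be sketched in NumPy. This is a minimal illustration, not the paper's implementation: the learned encoder convolution is omitted, the mixing coefficients are fixed toy values rather than trained parameters, and a dense matrix stands in for the forward operator.

```python
import numpy as np

rng = np.random.default_rng(0)

n, m, K = 64, 32, 3                           # signal size, measurement size, Krylov order
A = rng.standard_normal((m, n)) / np.sqrt(n)  # toy forward operator A_s
x = rng.standard_normal(n)                    # image-space features x^l
y = A @ x + 0.01 * rng.standard_normal(m)     # measurements

AtA = A.T @ A
Aty = A.T @ y

# "Learnable" mixing coefficients (here fixed toy values)
alpha = rng.standard_normal(K + 1)
beta = rng.standard_normal(K + 1)

# Build the Krylov bases {(A^T A)^k x} and {(A^T A)^k A^T y} iteratively,
# accumulating the mixed combination without forming matrix powers explicitly
update = np.zeros(n)
vx, vy = x.copy(), Aty.copy()
for k in range(K + 1):
    update += alpha[k] * vx + beta[k] * vy
    vx, vy = AtA @ vx, AtA @ vy

# Residual combination with latent features h^l (Conv_enc omitted for clarity)
h = rng.standard_normal(n)
h_next = h + update
```

Iterating `AtA @ v` costs one operator application per Krylov term, which is why the module stays cheap compared to unrolled optimization.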
2. Physics and Noise Integration
RAM operates under the non-blind linear inverse problem formalism
$y = A x + \epsilon,$
with $\epsilon$ either Gaussian with variance $\sigma^2$, Poisson, or their combination (Poisson–Gaussian).
Key conditioning mechanisms:
- Proximal Input Initialization: RAM computes a data-consistent input
$x^0 = \operatorname{prox}_{\lambda \|A\,\cdot\, - y\|^2}\!\left(A^T y\right)$
with a learnable scaling parameter $\lambda$, and feeds $x^0$ to the network.
- Explicit Noise Conditioning: For Poisson–Gaussian models with gain $\gamma$ and Gaussian standard deviation $\sigma$,
RAM encodes $\sigma$ and $\gamma$ as constant-valued maps in the input channels and omits bias terms for scale equivariance.
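The forward model and the constant-valued conditioning maps can be sketched as follows. This is a hedged toy example: the gain parametrization `gamma * Poisson(x / gamma)` is one common convention for Poisson–Gaussian noise, the identity is used as the forward operator, and the specific `sigma`/`gamma` values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

H, W = 16, 16
x = rng.uniform(0.1, 1.0, (H, W))   # clean image in [0, 1]
sigma, gamma = 0.05, 0.02           # Gaussian std, Poisson gain (assumed values)

# Forward model y = A x + noise; identity operator A for this toy example
Ax = x
y = gamma * rng.poisson(Ax / gamma) + sigma * rng.standard_normal((H, W))

# Constant-valued noise-level maps concatenated as extra input channels,
# giving the network explicit access to (sigma, gamma)
sigma_map = np.full((H, W), sigma)
gamma_map = np.full((H, W), gamma)
net_input = np.stack([y, sigma_map, gamma_map])   # shape (3, H, W)
```

Because the maps are constant and the network is bias-free, rescaling the input rescales the output consistently, which is what the scale-equivariance remark above refers to.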
3. Training Methodology and Task Portfolio
RAM is jointly trained on a diverse collection of imaging tasks, each defined by a forward operator $A$ and noise parameters $(\sigma, \gamma)$:
- Natural Image Tasks: Gaussian and Poisson–Gaussian denoising, motion and Gaussian deblurring (with various blur kernel sizes), inpainting (block or pixel masks), super-resolution (bicubic/bilinear upscaling, factors 2–4), pansharpening (Brovey transform).
- Medical Imaging Tasks: Single-coil MRI (masked Fourier operator with Cartesian masks), multi-coil MRI (15 coil channels), CT (Radon transform with 10–51 projections).
- Datasets: LSDIR (natural images), fastMRI (MRI), LIDC-IDRI (CT).
The supervised training loss for each task $t$ is
$\mathcal{L}_t = \mathbb{E}\left[\, \left\| R(y, A_t, \sigma_t, \gamma_t) - x \right\| \,\right],$
where $y = A_t x + \epsilon$ and $R$ is the RAM reconstruction function. The total loss aggregates over all tasks:
$\mathcal{L} = \sum_t \mathcal{L}_t.$
Training proceeds from DRUNet-initialized weights, using a batch size of 16 per task, random patches, and the Adam optimizer (learning-rate decay at $180$k steps, $200$k steps total).
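The multi-task objective can be sketched as below. This is a schematic under stated assumptions: `R` is a hypothetical stand-in for RAM (a regularized least-squares solve, not the actual network), the per-sample L1 distortion is one plausible choice of norm, and the two toy "tasks" merely illustrate summing over operators and noise levels.

```python
import numpy as np

rng = np.random.default_rng(2)

def R(y, A, sigma):
    # Hypothetical stand-in for the RAM reconstruction function:
    # a Tikhonov-regularized least-squares solve
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + sigma * np.eye(n), A.T @ y)

def task_loss(A, sigma, x_batch):
    # Per-task loss: expected distortion over a batch of ground-truth signals
    loss = 0.0
    for x in x_batch:
        y = A @ x + sigma * rng.standard_normal(A.shape[0])  # y = A x + eps
        loss += np.abs(R(y, A, sigma) - x).mean()            # L1 distortion (assumed norm)
    return loss / len(x_batch)

# Two toy "tasks" with different forward operators and noise levels
tasks = [(rng.standard_normal((20, 32)) / np.sqrt(32), 0.01),
         (rng.standard_normal((16, 32)) / np.sqrt(32), 0.05)]
x_batch = [rng.standard_normal(32) for _ in range(4)]

# Total loss aggregates over all tasks
total_loss = sum(task_loss(A, s, x_batch) for A, s in tasks)
```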
4. Self-Supervised Adaptation to Novel Operators
RAM can rapidly adapt to new, potentially unseen forward operators or noise statistics with self-supervised fine-tuning. Given only measurements from the unknown setting, the training objective combines a measurement-consistency term and a null-space term:
- Measurement Consistency ($\mathcal{L}_{\mathrm{MC}}$): Uses SURE (if the Gaussian noise level is known), an unbiased estimate of the measurement-domain reconstruction error, or SPLIT, which employs partial masking and expectation over masks.
- Null-Space Regularization ($\mathcal{L}_{\mathrm{null}}$): For a single operator, Equivariant Imaging regularization compares reconstructions of transformed inputs, while for families of operators, Multi-Operator Imaging is used.
Fine-tuning typically converges within minutes, using only a few images and a single mid-range GPU.
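A SURE-style measurement-consistency term can be sketched as follows. This is a generic measurement-domain SURE with a Monte-Carlo (Hutchinson) divergence estimate, not the paper's exact loss; `f` is a toy linear stand-in for the reconstruction pipeline, and the probe scale `eps` is an assumed hyperparameter.

```python
import numpy as np

rng = np.random.default_rng(3)

def f(y):
    # Toy "reconstructor" acting in measurement space (stand-in for A @ R(y, ...))
    return 0.9 * y

def sure_loss(y, sigma, eps=1e-3):
    # Stein's Unbiased Risk Estimate of E||f(y) - A x||^2 / m, using only y:
    #   ||f(y) - y||^2 - m sigma^2 + 2 sigma^2 div(f)(y)
    # with the divergence estimated by a single Rademacher probe
    m = y.size
    b = rng.choice([-1.0, 1.0], size=y.shape)            # Rademacher probe
    div = b @ (f(y + eps * b) - f(y)) / eps              # Monte-Carlo divergence
    return (np.sum((f(y) - y) ** 2) - m * sigma**2 + 2 * sigma**2 * div) / m

sigma = 0.1
y = rng.standard_normal(256) * np.sqrt(1 + sigma**2)     # toy noisy measurements
loss = sure_loss(y, sigma)
```

The divergence term is what lets the loss be computed from noisy measurements alone, without ground truth, which is the key to the self-supervised adaptation described above.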
5. Quantitative Performance and Efficiency
RAM achieves strong quantitative and computational performance compared to iterative and unrolled architectures. Representative PSNR results (in dB) on natural-image benchmarks:
| Method | Params | CBSD68 | Urban100 | DIV2K |
|---|---|---|---|---|
| PDNet | – | 30.54 | 28.36 | 32.61 |
| Restormer | – | 30.96 | 28.21 | 28.56 |
| DPIR | 32 M | 33.45 | 34.39 | 36.32 |
| uDPIR-tied | 32 M | 33.78 | 33.07 | 35.79 |
| uDPIR-un | 256 M | 33.64 | 32.72 | 35.38 |
| RAM | 36 M | 34.04 | 33.61 | 36.18 |
On medical tasks (PSNR in dB / SSIM):
| Method | MRI×4 | MRI×8 | CT |
|---|---|---|---|
| PDNet | 28.25/0.719 | 24.54/0.641 | 23.09/0.713 |
| DPIR | 30.54/0.784 | 25.28/0.661 | n.a. |
| uDPIR-un | 33.73/0.848 | 30.20/0.792 | 27.00/0.772 |
| uDPIR-tied | 34.14/0.851 | 30.86/0.805 | 28.35/0.779 |
| RAM | 34.39/0.853 | 31.50/0.813 | 28.83/0.798 |
Efficiency (deblurring 256×256 color):
| Method | GFLOPs | Test Mem (MB) | Train Mem (MB) |
|---|---|---|---|
| PDNet | 60 | 52 | 5815 |
| uDPIR-tied | 2234 | 298 | 36374 |
| uDPIR-un | 2234 | 1213 | 37288 |
| RAM | 360 | 354 | 9670 |
RAM’s computational profile is roughly 6× lighter than an 8-step unrolled network (360 vs. 2234 GFLOPs), with a parameter count (36 M) comparable to that of DRUNet.
Out-of-distribution experiments confirm robust adaptation: e.g., for multi-coil MRI, RAM yields 35.62/0.889 (PSNR/SSIM) versus 36.06/0.894 for uDPIR-tied; for CT with Poisson noise, RAM achieves 28.83/0.798, far outperforming baselines.
Self-supervised fine-tuning on a single Sentinel-2 satellite image for compressed sensing or demosaicing elevates PSNR from 10 dB (zero-shot) to 30–35 dB within seconds.
6. Ablations, Memory Footprint, and Limitations
- Model Complexity: Approximately 36 M parameters, 360 GFLOPs for 256×256 images.
- GPU Memory: 354 MB inference memory (batch 1); 9.7 GB for batch 8 during training.
- Krylov Iterations and Scales: Ablations empirically identify the best trade-off at a small number of Krylov terms $K$ and 4 multiscale levels.
- Limitations:
- Training RAM from scratch is GPU-intensive.
- While RAM strongly minimizes distortion (PSNR/SSIM), it does not optimize perceptual metrics—GAN/diffusion-based priors are not targeted.
- In settings where the forward operator is nearly invertible and well-characterized, bespoke single-task unrolled models may provide marginal gains over RAM.
- RAM serves as a backbone for possible posterior-mean-based sampling, but is not itself primarily a generative model.
7. Broader Impact and Applicability
RAM provides a unified, non-iterative solution across diverse computational imaging challenges, with extensibility to new operators and noise distributions via light self-supervised adaptation. It is effective across both natural and medical image datasets, consistently yielding near–state-of-the-art results but with hardware requirements suitable for practical deployment. Its architectural paradigm—explicit forward model integration, Krylov-based conditioning, and modality-agnostic kernel sharing—serves as an extensible blueprint for future foundation models in inverse imaging (Terris et al., 11 Mar 2025).