All Resolutions Inference (ARI)
- ARI is a framework that provides resolution-invariant inference through statistical guarantees and deep learning adaptations without pre-specified discretization.
- It enables rigorous lower bounds on the fraction of active signals in neuroimaging and ensures robustness in tasks like super-resolution, medical imaging, and audio processing.
- ARI integrates post hoc statistical inference with adaptive neural architectures, achieving efficient adaptive cluster thresholding and consistent performance across variable resolutions.
All Resolutions Inference (ARI) encompasses a family of statistical and deep learning methodologies enabling a single inferential model or statistical guarantee to remain valid or performant over all possible input resolutions, scales, or clusterings—without pre-specified discretization and without retraining or tuning for each possible input grid. In functional neuroimaging, ARI provides rigorous lower bounds on the fraction of truly active voxels in any, possibly data-driven, brain region or cluster, thereby circumventing the spatial specificity paradox of classical cluster thresholding. In the broader machine learning domain, ARI characterizes methods that yield models with robustness and explicit adaptability to unseen or continuously varying spatial, temporal, or sample resolutions. This framework unifies approaches in post hoc statistical inference, adaptive neural architectures, and resolution-invariant autoencoding with key demonstrations in fMRI, medical imaging, super-resolution, object and audio recognition.
1. Formal Statistical Foundations of ARI in Neuroimaging
ARI was first developed in functional MRI (fMRI) for rigorous inference over arbitrary spatial clusters of voxels (Peyrouset et al., 4 Nov 2025, 2206.13587). Formally, let $V = \{1, \dots, m\}$ index all voxels; $H_0 \subseteq V$ is the unknown subset of null (inactive) voxels. For any subset $S \subseteq V$, the true discovery proportion (TDP) is
$$\mathrm{TDP}(S) = \frac{|S \setminus H_0|}{|S|}.$$
ARI constructs, with no need to pre-specify regions, a simultaneous lower-confidence bound $\overline{\mathrm{TDP}}_\alpha(S)$ for all $S \subseteq V$, such that
$$\Pr\big(\mathrm{TDP}(S) \ge \overline{\mathrm{TDP}}_\alpha(S) \ \text{for all } S \subseteq V\big) \ge 1 - \alpha.$$
This is realized via Simes-type calibration: if the $p$-values are independent or PRDS (positively regression dependent on a subset), then Simes' inequality ensures that for any $\alpha \in (0,1)$,
$$\Pr\Big(\exists\, k : p_{(k:H_0)} \le \frac{\alpha k}{m}\Big) \le \alpha,$$
where $p_{(k:H_0)}$ denotes the $k$-th smallest $p$-value among the null voxels.
Given a non-decreasing vector of thresholds $t_1 \le \dots \le t_K$, the joint error rate (JER) is
$$\mathrm{JER}(t) = \Pr\big(\exists\, k \le K : p_{(k:H_0)} < t_k\big).$$
When $\mathrm{JER}(t) \le \alpha$, ARI transfers the calibrated thresholds to arbitrary regions via the interpolation bound of Blanchard et al. (2020):
$$\bar V(S) = \min_{1 \le k \le K} \Big( \big|\{ i \in S : p_i \ge t_k \}\big| + k - 1 \Big), \qquad \overline{\mathrm{TDP}}_\alpha(S) = \frac{|S| - \bar V(S)}{|S|}.$$
The canonical choice, Simes thresholds $t_k = \alpha k / m$, ensures FWER control and valid TDP bounds for all $S$ simultaneously, supporting fully post hoc cluster choice.
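As a concrete illustration, here is a minimal numerical sketch of the Simes-plus-interpolation pipeline; the function name and example data are illustrative, not taken from the cited papers.

```python
import numpy as np

def tdp_lower_bound(p_cluster, m, alpha=0.05):
    """ARI lower bound on the TDP of one cluster S (sketch).

    p_cluster : p-values of the voxels in S.
    m         : total number of voxels (tests) brain-wide.
    Uses Simes thresholds t_k = alpha*k/m and the interpolation bound
    V(S) <= min_k ( #{i in S : p_i >= t_k} + k - 1 ).
    """
    p = np.sort(np.asarray(p_cluster))
    s = len(p)
    k = np.arange(1, m + 1)
    t = alpha * k / m                       # Simes template
    exceed = s - np.searchsorted(p, t)      # #{i in S : p_i >= t_k}
    v_bar = min(np.min(exceed + k - 1), s)  # false-positive bound for S
    return (s - v_bar) / s                  # simultaneous TDP lower bound

# Example: a 100-voxel cluster inside a 50,000-voxel brain mask.
rng = np.random.default_rng(0)
p_cluster = np.concatenate([rng.uniform(0, 1e-6, 60),   # strong signal
                            rng.uniform(0, 1, 40)])     # pure noise
print(tdp_lower_bound(p_cluster, m=50_000))             # -> about 0.6
```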
2. Advances: Permutation-Based ARI Extensions (Notip and pARI)
To address fMRI’s strong voxelwise dependencies, permutation-based ARI extensions have been introduced (Peyrouset et al., 4 Nov 2025). Notip and pARI instantiate the “JER-calibration” principle of Blanchard et al. (2020).
- pARI [Andreella et al. 2023]: Defines a parametric (shifted-Simes) threshold family
$$t_k(\lambda, \delta) = \lambda\, \frac{k - \delta}{m - \delta} \ \text{ for } k > \delta, \qquad t_k = 0 \ \text{ otherwise},$$
where the integer shift $\delta$ sets the minimal cluster size for nontrivial bounds; $\lambda$ is chosen by permutation calibration (see the sketch after this list).
- Notip [Blain et al. 2022]: Builds data-driven thresholds from the empirical quantiles of the $k$-th smallest $p$-values across permutations; this focuses statistical power on likely-signal regimes and adjusts naturally for the empirical null.
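The pARI family above can be made concrete in a few lines. The following sketch assumes the shifted-Simes parametrization given in the bullet; `calibrate_lambda` is an illustrative permutation calibration, not the authors' reference implementation.

```python
import numpy as np

def pari_thresholds(m, delta, lam):
    """pARI's shifted-Simes template (sketch): t_k = lam*(k-delta)/(m-delta)
    for k > delta and 0 otherwise, so clusters of size <= delta can only
    receive the trivial (zero) TDP bound."""
    k = np.arange(1, m + 1)
    return np.where(k > delta, lam * (k - delta) / (m - delta), 0.0)

def calibrate_lambda(null_p_perms, delta, alpha=0.05):
    """Choose lam so the empirical JER over permutations is <= alpha.

    null_p_perms : (B, m) array of p-values from B permutation re-draws.
    Permutation b violates the template iff lam exceeds
    lam_b = min_{k>delta} p_(k) * (m-delta)/(k-delta), so taking the
    alpha-quantile of the lam_b controls the JER at level alpha."""
    B, m = null_p_perms.shape
    sorted_p = np.sort(null_p_perms, axis=1)
    k = np.arange(delta + 1, m + 1)
    lam_b = np.min(sorted_p[:, delta:] * (m - delta) / (k - delta), axis=1)
    return np.quantile(lam_b, alpha)
```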
Performance Regimes: pARI outperforms classic ARI and Notip for large clusters (especially with large $\delta$), but can yield vacuous (zero) bounds for small clusters (of size below $\delta$). In contrast, Notip consistently improves over ARI across all cluster sizes, preserving drill-down resolution (accurate inference in subclusters).
3. Adaptive Cluster Thresholding and Efficient Algorithms
ARI’s combinatorial flexibility introduces computational challenges for brain-wide scan sizes. Chen et al. (2022) proposed an efficient algorithm for adaptive cluster thresholding (2206.13587):
- Construct the forest of all possible supra-threshold clusters using a union-find data structure over the voxel adjacency graph (e.g., $6$-, $18$-, or $26$-neighbor connectivity), in near-linear time.
- Compute TDP lower bounds for all clusters via heavy-path decomposition, so that all bounds are available after a single near-linear preprocessing pass.
- Given any post hoc TDP threshold $\gamma$, efficiently enumerate all maximal clusters $S$ with $\overline{\mathrm{TDP}}(S) \ge \gamma$; output-sensitive complexity is linear in the total size of the returned clusters.
This approach yields sub-second query times on whole-brain datasets (on the order of $10^5$ voxels), supports data-driven exploration, and retains strong FWER control for arbitrary, anatomically or functionally motivated cluster choices.
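The core of the construction step can be sketched with a plain union-find sweep; a 1-D line graph stands in for the 3-D voxel grid, and all names are illustrative rather than taken from the reference implementation.

```python
import numpy as np

def cluster_merge_forest(stat):
    """Sketch of the cluster-forest construction via union-find.

    Voxels are activated in decreasing order of test statistic; each
    activation either starts a new leaf cluster or merges neighbouring
    clusters. The recorded merge events (threshold, roots) define the
    forest of all possible supra-threshold clusters."""
    stat = np.asarray(stat, dtype=float)
    m = len(stat)
    parent = list(range(m))
    active = np.zeros(m, dtype=bool)
    merges = []

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for v in np.argsort(-stat):
        active[v] = True
        for u in (v - 1, v + 1):            # line-graph neighbours here;
            if 0 <= u < m and active[u]:    # 6/18/26-connectivity in 3-D
                ru, rv = find(u), find(v)
                if ru != rv:
                    merges.append((stat[v], (ru, rv)))
                    parent[ru] = rv
    return merges

print(cluster_merge_forest([0.1, 2.3, 2.1, 0.2, 3.0, 2.9]))
```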
4. All Resolutions Inference in Deep Learning Architectures
The ARI principle has been embedded in a new class of neural models to solve the “resolution-variance” problem: classical convolutional networks operate on fixed grids, so mixing data of different native resolutions (medical images, remote sensing, video) generally forces lossy resampling.
- Resolution Invariant Autoencoder (RIAE) (Patel et al., 12 Mar 2025): Replaces fixed-size pooling and upsampling with learned, variable-resolution resizing blocks. The encoder always maps any input $x$ to a consistent latent space, regardless of $x$'s voxel spacing; similarly, the decoder can reconstruct to any chosen output grid. A latent consistency loss enforces alignment of latents across resolutions.
- For a 3-stage AE, the per-layer scale $s$ is set so that after all downsamplings, the latent grid matches that of the highest-resolution data:
$$r_{\mathrm{in}} \cdot s^3 = r_{\mathrm{latent}}, \qquad s = \left(\frac{r_{\mathrm{latent}}}{r_{\mathrm{in}}}\right)^{1/3},$$
where $r_{\mathrm{latent}}$ is the fixed latent physical spacing and $r_{\mathrm{in}}$ is the input's resolution (a minimal sketch follows these bullets).
- Downstream tasks, including uncertainty-aware super-resolution, latent-diffusion-based generation, and classification, can operate on these latents across any resolutions, avoid pixel-level resampling, and deliver consistent uncertainty quantification.
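A minimal sketch of the spacing computation above (function and variable names are illustrative):

```python
def per_stage_scale(r_in: float, r_latent: float, n_stages: int = 3) -> float:
    """Per-stage resize factor s with r_in * s**n_stages == r_latent,
    so inputs of any native spacing land on the same latent grid."""
    return (r_latent / r_in) ** (1.0 / n_stages)

# A 1 mm scan and a 2 mm scan both reach a 4 mm latent grid:
print(per_stage_scale(1.0, 4.0))  # ~1.59 downsampling per stage
print(per_stage_scale(2.0, 4.0))  # ~1.26 downsampling per stage
```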
- Adaptive Resolution Residual Networks (ARRN) (Demeule et al., 9 Dec 2024): Generalizes fixed-resolution networks by inserting Laplacian residual adapters at each scale, which isolate band-limited detail and can be skipped at coarser resolutions. Laplacian dropout randomly omits high-resolution adapters during training, improving the model's robustness to resolution shifts.
- Theoretical underpinning is given by neural operator invariance: for any input resolution, the residual-form network performs computation equivalent (modulo discretization error) to the full-resolution network.
- Empirically, ARRN provides substantial speedups on low-resolution data with minimal accuracy loss and a smooth accuracy–compute trade-off (a sketch of the adapter follows this list).
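The adapter mechanism can be sketched in a few lines of PyTorch; this is an illustrative reconstruction from the description above, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LaplacianResidualAdapter(nn.Module):
    """One ARRN-style scale adapter (sketch): it sees only the
    band-limited detail (Laplacian) at its resolution and adds a
    learned residual. Skipping it leaves the coarse path intact."""
    def __init__(self, channels: int, p_drop: float = 0.2):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.GELU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.p_drop = p_drop  # Laplacian dropout rate

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Laplacian dropout: occasionally skip the high-res adapter
        # during training so the coarse path alone stays accurate.
        if self.training and torch.rand(()) < self.p_drop:
            return x
        low = F.interpolate(F.avg_pool2d(x, 2), size=x.shape[-2:],
                            mode="bilinear", align_corners=False)
        detail = x - low                   # band-limited detail
        return x + self.body(detail)       # residual correction
```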
5. ARI in Vision, Audio, and Multimodal Applications
All-Resolutions Inference has practical instantiations in high-level vision and audio models:
- Self-supervised Object Detection (AERIS) (Cui et al., 2022): AERIS trains an encoder-decoder pipeline in which the encoder receives images under arbitrary degradation (random blur, downsampling, noise) and the decoder reconstructs the original content regardless of degradation. End-to-end object detection with a self-supervised equivariance loss induces a feature representation that is stably equivariant to scale and degradation. At test time, AERIS accepts any input size or degradation, bypassing the need for explicit super-resolution preprocessing.
- Arbitrary-Scale Super-Resolution (ARIS Plugin for SR) (Zhou et al., 2022): A transformer-based "plugin" injects coordinate-query cross-attention between LR features and upsampling layers, yielding a continuous implicit image representation. The training scheme combines standard paired supervision at known scales with unsupervised cycle-consistency for unseen (out-of-distribution) scale factors, enabling a single model to perform super-resolution at any scale, seen or unseen (a simplified sketch of the coordinate-query idea follows this list).
- Audio Spectrogram Transformers (ElasticAST) (Feng et al., 11 Jul 2024): By sequence packing, masked self-attention, and mask attention pooling, ElasticAST allows a transformer model to process audio clips of any length or mel-spectrogram resolution, obviating the padding/trimming paradigm of fixed-size transformers and yielding flat or even improved performance across a wide range of test-time input resolutions and durations.
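As noted above, the coordinate-query idea behind arbitrary-scale decoding can be sketched with bilinear feature sampling plus an MLP, a simplified stand-in for ARIS's full cross-attention; all module names here are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoordQueryUpsampler(nn.Module):
    """Continuous-scale decoder (sketch): for each output pixel, sample
    the LR feature map at its continuous coordinate and let an MLP
    predict RGB, so any output size is valid at test time."""
    def __init__(self, feat_dim: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 2, 256), nn.ReLU(),
            nn.Linear(256, 3),
        )

    def forward(self, feats: torch.Tensor, out_h: int, out_w: int):
        b = feats.shape[0]
        ys = torch.linspace(-1, 1, out_h)
        xs = torch.linspace(-1, 1, out_w)
        gy, gx = torch.meshgrid(ys, xs, indexing="ij")
        grid = torch.stack([gx, gy], dim=-1).expand(b, -1, -1, -1)
        sampled = F.grid_sample(feats, grid, align_corners=False)  # (B,C,H,W)
        sampled = sampled.permute(0, 2, 3, 1)                      # (B,H,W,C)
        return self.mlp(torch.cat([sampled, grid], dim=-1))        # (B,H,W,3)

# One LR feature map can be rendered at any output size:
up = CoordQueryUpsampler(feat_dim=64)
rgb = up(torch.randn(1, 64, 32, 32), out_h=119, out_w=119)  # arbitrary grid
```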
6. ARI: Theoretical Guarantees, Tradeoffs, and Empirical Results
Statistical Guarantees:
- In fMRI, ARI delivers simultaneous, post hoc lower confidence bounds on the TDP for all clusters, with strong FWER control under Simes’ inequality or permutation-based calibration (Notip, pARI).
- Adaptive cluster thresholding preserves statistical validity for arbitrary post hoc choice of clusters, sizes, or anatomical masks.
Deep Learning Properties:
- RIAE guarantees a shared latent space for all images regardless of native resolution, with a latent consistency loss enforcing cross-resolution alignment (sketched after this list); uncertainty estimates are built in via noise injection proportional to estimated information loss.
- ARRN achieves discretization invariance by design, with theoretical error bounds linked to the accuracy of approximated smoothing kernels. Laplacian dropout further enforces robustness to varying resolutions and numerical artifacts.
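A minimal sketch of the latent consistency idea; the `encoder(image, voxel_spacing)` signature is an assumption for illustration, not the published API.

```python
import torch
import torch.nn.functional as F

def latent_consistency_loss(encoder, x_hi, x_lo, spacing_hi, spacing_lo):
    """Encode the same volume at two native resolutions and penalise
    disagreement of the shared-grid latents (sketch)."""
    z_hi = encoder(x_hi, spacing_hi)
    z_lo = encoder(x_lo, spacing_lo)
    return F.mse_loss(z_lo, z_hi.detach())  # align low-res latent to high-res
```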
Empirical Results:
A sample of numerical findings across domains:
- fMRI: Notip and pARI reveal complementary power regimes; Notip dominates for small clusters, while pARI excels for large clusters (Peyrouset et al., 4 Nov 2025).
- Medical imaging: RIAE (ARI) matches or outperforms fixed-resolution networks at extreme upsampling rates, closes the classification gap when test and train resolutions mismatch, and allows mixing low- and high-resolution data for generative modeling, nearly recovering oracle performance (Patel et al., 12 Mar 2025).
- Vision: ARRN achieves near-constant accuracy from full down to strongly reduced resolutions, with substantial compute savings compared to fixed-resolution models (Demeule et al., 9 Dec 2024).
- Super-resolution: ARIS plugin improves PSNR by $0.3$–$0.6$ dB on out-of-distribution scales vs. all previous any-scale methods (Zhou et al., 2022).
- Audio: ElasticAST consistently outperforms fixed-length AST models by never discarding information, maintaining stable accuracy across all input lengths and resolutions (Feng et al., 11 Jul 2024).
7. Limitations, Best Practices, and Future Directions
Limitations:
- In classical ARI, the validity of simultaneous TDP inference hinges on Simes-type assumptions or permutation calibration; in highly dependent or structured data, conservative alternatives may be required.
- pARI can yield null bounds at fine (subcluster) resolution if the cluster size drops below the shift parameter $\delta$.
- In neural architectures, learned resizing or Laplacian residual blocks add parameter and runtime cost, and cannot recover lost high-frequency information at extreme downsampling.
- ARI models require that the highest-resolution (or finest-scale) structure is present during training, fixing the latent scale or operator structure.
Best Practices:
- In statistical ARI, predefine TDP thresholds for spatial specificity (e.g., $0.7$, $0.9$) and use dense TDP maps to explore local activity density before committing to reporting clusters (a workflow sketch follows this list).
- When targeting deployment or scaling, choose ARI variants tailored to expected operating resolutions and task requirements; use Laplacian dropout and explicit latent alignment for neural models to maximize cross-resolution robustness.
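For the statistical workflow, the recommended threshold sweep might look as follows, assuming the `tdp_lower_bound` sketch from Section 1 is in scope; `report_clusters` is purely illustrative.

```python
def report_clusters(clusters, m, targets=(0.7, 0.9), alpha=0.05):
    """Sweep predefined TDP targets over candidate clusters before
    committing to a report. clusters: dict name -> voxel p-values."""
    for name, p in clusters.items():
        tdp = tdp_lower_bound(p, m=m, alpha=alpha)  # from Section 1 sketch
        met = [g for g in targets if tdp >= g]
        print(f"{name}: TDP lower bound {tdp:.2f}; targets met: {met}")
```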
Future Directions:
- Generalization of ARI theory to arbitrary graphs or multi-modal data, beyond voxelwise lattices.
- Integration with closed-testing or Hommel-improved calibrations for arbitrary dependence structures.
- Extension of ARI-type architectures to domains such as remote sensing, spatiotemporal modeling, or cross-sensor fusion.
ARI thus formalizes and operationalizes the principle that inference and prediction should not be artificially constrained to a resolution or clustering dictated by computational expediency, but rather should robustly and validly adapt to the diversity of real-world inputs encountered in both the statistical and learning paradigms.