Zero-Cost Accuracy Estimators
- Zero-cost accuracy estimators are techniques that compute proxy scores from untrained, merely initialized models, using measures such as gradient and Jacobian norms that correlate with eventual accuracy.
- They are applied in neural architecture search, randomized low-rank approximations, and high-dimensional regression, reducing training computations by up to 1000× while maintaining performance ranking.
- Recent advancements using genetic programming and parametric Bayesian mixers have enhanced the estimators’ rank correlation and transferability across diverse datasets and search spaces.
A zero-cost accuracy estimator is a methodology that provides an accuracy estimate (or a correlated ranking signal) for a machine learning model, typically a neural architecture, without performing any model training or using only a minimal amount of computation (e.g., a single forward or backward pass). These estimators play a pivotal role in neural architecture search (NAS), model selection, randomized low-rank approximation, and high-dimensional statistical inference, enabling massive reductions in computational resource requirements and carbon footprint relative to training-based evaluation.
1. Core Principles and Mathematical Formalism
Zero-cost estimators operate under the principle of extracting task-relevant statistical or algorithmic information from untrained or only-initialized models. The central requirement is that the estimator $\hat{A}(a)$, computed from an architecture $a$ without any full or partial training, maintains a high (typically rank-order) correlation with the actual post-training accuracy $A(a)$. Formally: $\mathrm{corr}\big(\hat{A}(a), A(a)\big) \gg 0$, where the correlation is typically Spearman’s $\rho$ or Kendall’s $\tau$ (Akhauri et al., 2022, Abdelfattah et al., 2021).
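As a rough illustration of how this rank-order requirement is usually checked, the sketch below computes Spearman's and Kendall's coefficients between proxy scores and post-training accuracies in plain Python. The function names and the five score/accuracy pairs are hypothetical placeholders, not values from the cited papers, and the Kendall variant shown is tau-a, which ignores ties.

```python
from itertools import combinations


def rankdata(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman_rho(scores, accs):
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson(rankdata(scores), rankdata(accs))


def kendall_tau(scores, accs):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    pairs = list(combinations(range(len(scores)), 2))
    total = 0
    for i, j in pairs:
        prod = (scores[i] - scores[j]) * (accs[i] - accs[j])
        total += (prod > 0) - (prod < 0)
    return total / len(pairs)


# Hypothetical proxy scores and post-training accuracies for five architectures.
proxy_scores = [0.2, 1.5, 0.9, 3.1, 2.4]
final_accs = [88.1, 91.0, 92.3, 94.0, 93.2]
rho = spearman_rho(proxy_scores, final_accs)
tau = kendall_tau(proxy_scores, final_accs)
print(f"Spearman rho = {rho:.2f}, Kendall tau = {tau:.2f}")
```

Only the ordering of the scores matters here, which is why a proxy can be useful for ranking architectures even when its raw values are on an arbitrary scale.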
Instantiations of zero-cost accuracy estimators span several domains:
- Neural Architecture Scoring: Forward/backward statistics on randomly initialized neural networks (Abdelfattah et al., 2021), pruning-based saliencies, Hessian-based measures, and expressivity statistics (Lukasik et al., 2023).
- Randomized Low-Rank Approximation: Leave-one-out estimators (e.g., for Generalized Nyström) quantifying low-rank matrix approximation accuracy without additional passes over data (Lazzarino et al., 16 Jan 2026).
- High-Dimensional Regression Signal Estimation: Zero-estimator adjustments—statistics with expectation zero under the null—applied to naive variance estimators for variance reduction without bias (Livne, 2023).
2. Canonical Methodologies
Neural Architecture Search (NAS)
In NAS, zero-cost proxies use statistics of networks at initialization (that is, before any training) to distinguish between candidate architectures:
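- Gradient Norm: Sum of the L2-norms of the loss gradients with respect to the parameters, computed on a single mini-batch, $\sum_{\ell} \lVert \partial \mathcal{L} / \partial \theta_{\ell} \rVert_2$ (Abdelfattah et al., 2021).
- Pruning-Based Proxies (per-parameter saliencies $S_p(\theta)$, summed over all parameters to score an architecture):
  - SNIP: $S_p(\theta) = \left| \frac{\partial \mathcal{L}}{\partial \theta} \odot \theta \right|$.
  - GraSP: Utilizes second-order sensitivity, $S_p(\theta) = -\left( H \frac{\partial \mathcal{L}}{\partial \theta} \right) \odot \theta$, where $H$ is the Hessian of the loss (Abdelfattah et al., 2021).
  - SynFlow: Measures layerwise parameter sensitivity under data-agnostic forward passes, $S_p(\theta) = \frac{\partial \mathcal{L}}{\partial \theta} \odot \theta$, where $\mathcal{L}$ is a data-independent function of the parameters rather than a training loss (Abdelfattah et al., 2021).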
- Jacobian-based Proxies: Sample-wise Jacobian norms and their statistics (Lukasik et al., 2023).
- Piecewise-linear and Hessian-based: E.g., spectrum of input-output Jacobians or Hessian eigenvalues (Lukasik et al., 2023).
- Expressivity (e.g., NWOT, EPE-NAS): Binary region counts and within-class kernel similarities from ReLU masks or Jacobians (Lukasik et al., 2023).
- Node-wise Parametric Aggregation: Bayesian-mixed and differentiable ranking models (ParZC) learn weights over node-level zero-cost statistics, leveraging uncertainty and node-importance (Dong et al., 2024).
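To make the gradient-based proxies in this list concrete, here is a minimal NumPy sketch of gradient-norm, SNIP, and SynFlow scores on a bias-free two-layer ReLU MLP with hand-written backpropagation. The toy network, the random data, and all function names are assumptions for illustration; real implementations compute these scores with an autodiff framework on full architectures, and GraSP is omitted because it needs Hessian-vector products.

```python
import numpy as np


def init_mlp(d_in, d_hidden, n_classes, rng):
    """He-style random initialization of a bias-free two-layer ReLU MLP."""
    w1 = rng.standard_normal((d_hidden, d_in)) * np.sqrt(2.0 / d_in)
    w2 = rng.standard_normal((n_classes, d_hidden)) * np.sqrt(2.0 / d_hidden)
    return w1, w2


def loss_grads(w1, w2, x, y):
    """Gradients of the mean cross-entropy loss w.r.t. w1 and w2 (one mini-batch)."""
    pre = x @ w1.T                                  # (batch, hidden)
    h = np.maximum(pre, 0.0)
    logits = h @ w2.T                               # (batch, classes)
    logits = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    d_logits = probs.copy()
    d_logits[np.arange(len(y)), y] -= 1.0           # softmax minus one-hot
    d_logits /= len(y)
    g2 = d_logits.T @ h                             # (classes, hidden)
    d_pre = (d_logits @ w2) * (pre > 0)             # backprop through ReLU
    g1 = d_pre.T @ x                                # (hidden, d_in)
    return g1, g2


def grad_norm_score(w1, w2, x, y):
    """Sum over layers of the L2-norm of the loss gradient."""
    g1, g2 = loss_grads(w1, w2, x, y)
    return np.linalg.norm(g1) + np.linalg.norm(g2)


def snip_score(w1, w2, x, y):
    """Sum over parameters of |dL/dtheta * theta|."""
    g1, g2 = loss_grads(w1, w2, x, y)
    return np.abs(g1 * w1).sum() + np.abs(g2 * w2).sum()


def synflow_score(w1, w2):
    """Data-free score: all-ones input, absolute-valued weights, R = sum of outputs."""
    a1, a2 = np.abs(w1), np.abs(w2)
    ones = np.ones(a1.shape[1])
    h = a1 @ ones                                   # strictly positive, so ReLU is the identity
    g2 = np.outer(np.ones(a2.shape[0]), h)          # dR/d|w2|
    g1 = np.outer(a2.sum(axis=0), ones)             # dR/d|w1|
    return (a1 * g1).sum() + (a2 * g2).sum()


rng = np.random.default_rng(0)
x = rng.standard_normal((32, 16))                   # one random mini-batch (placeholder data)
y = rng.integers(0, 10, size=32)
scores = {}
for width in (8, 64):                               # two hypothetical candidate "architectures"
    w1, w2 = init_mlp(16, width, 10, rng)
    scores[width] = (grad_norm_score(w1, w2, x, y),
                     snip_score(w1, w2, x, y),
                     synflow_score(w1, w2))
    print(width, scores[width])
```

Each score needs at most one forward and one backward pass, which is the source of the cost savings; whether a given score actually tracks trained accuracy has to be validated empirically on a benchmark.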
Randomized Low-Rank Approximation
For randomized SVD or Nyström methods, zero-cost estimators are constructed by algebraic manipulation of sketching matrices:
- Leave-Pair-Out (LPO): Matrix norm of elementwise reciprocals of inverse core matrices (Lazzarino et al., 16 Jan 2026).
- Leave-Twin-Out (LTO): Restricted to diagonals; efficient estimator for the mean-squared error of the approximation (Lazzarino et al., 16 Jan 2026).
- Leave-Right-Out (LRO): Valid for any sketch dimension discrepancy, uses core matrix pseudoinverses for Frobenius norm error estimation; most robust empirically (Lazzarino et al., 16 Jan 2026).
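For context, the sketch below builds a Generalized Nyström approximation with Gaussian sketches and measures its true Frobenius error by brute force, which is the quantity the leave-one-out estimators above aim to predict without revisiting the data. It does not reproduce the LPO, LTO, or LRO formulas; the test matrix, sketch sizes, and function name are assumptions chosen for illustration.

```python
import numpy as np


def generalized_nystrom(a, rank, oversample, rng):
    """Return factors (AX, pinv(Y^T A X), Y^T A) of the approximation
    A ~ (A X) (Y^T A X)^+ (Y^T A) with Gaussian sketches X and Y."""
    m, n = a.shape
    x = rng.standard_normal((n, rank))               # right sketch
    y = rng.standard_normal((m, rank + oversample))  # left sketch, slightly larger
    ax = a @ x                                       # (m, rank)
    ya = y.T @ a                                     # (rank + oversample, n)
    core = y.T @ ax                                  # (rank + oversample, rank)
    return ax, np.linalg.pinv(core), ya


rng = np.random.default_rng(0)

# Synthetic 60 x 40 test matrix with geometrically decaying singular values.
u, _ = np.linalg.qr(rng.standard_normal((60, 40)))
v, _ = np.linalg.qr(rng.standard_normal((40, 40)))
sigma = 2.0 ** -np.arange(1, 41)
a = (u * sigma) @ v.T

ax, core_pinv, ya = generalized_nystrom(a, rank=15, oversample=10, rng=rng)
a_hat = ax @ core_pinv @ ya

# Brute-force relative error: needs the full matrix A, unlike a zero-cost estimator.
rel_err = np.linalg.norm(a - a_hat) / np.linalg.norm(a)
print(f"relative Frobenius error: {rel_err:.2e}")
```

The brute-force check requires a second look at the full matrix, which is exactly the cost that estimators built only from the sketches and core matrix are designed to avoid.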
High-Dimensional Regression
Zero-estimator approaches provide variance reduction:
- Baseline: Second-order U-statistics for regression signal estimation.
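- Zero-Estimators: Statistics $Z$ computed from unlabeled covariates with $\mathbb{E}[Z] = 0$, added as $T - cZ$ to a baseline estimator $T$, with $c$ chosen optimally to minimize variance, i.e., $c^{*} = \mathrm{Cov}(T, Z) / \mathrm{Var}(Z)$ (Livne, 2023).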
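The variance-reduction mechanism can be illustrated with a small Monte Carlo sketch. This is a simplified stand-in, not the estimator from the cited work: the baseline here is a plain sample mean of Y^2 rather than a U-statistic, the covariate is a single standard normal variable whose distribution is assumed known, and the coefficient is a plug-in estimate from the same sample. All names and parameter values are hypothetical.

```python
import numpy as np


def zero_estimator_adjust(t_stat, z_stat, c):
    """Subtract c times a mean-zero statistic from a baseline estimator."""
    return t_stat - c * z_stat


rng = np.random.default_rng(0)
reps, n, beta, sigma = 2000, 200, 2.0, 1.0           # hypothetical simulation settings

# One row per Monte Carlo replication.
x = rng.standard_normal((reps, n))                   # covariate with known N(0, 1) law
y = beta * x + sigma * rng.standard_normal((reps, n))

g = x**2 - 1.0                                       # E[g(X)] = 0 under the known law
t = (y**2).mean(axis=1)                              # baseline estimator of E[Y^2]
z = g.mean(axis=1)                                   # zero-estimator

# Plug-in coefficient: sample Cov(Y^2, g) / sample Var(g), per replication.
y2_centered = y**2 - t[:, None]
g_centered = g - z[:, None]
c_hat = (y2_centered * g_centered).sum(axis=1) / (g_centered**2).sum(axis=1)

t_adj = zero_estimator_adjust(t, z, c_hat)
var_base, var_adj = t.var(), t_adj.var()
print(f"true value        : {beta**2 + sigma**2:.2f}")
print(f"baseline mean/var : {t.mean():.3f} / {var_base:.4f}")
print(f"adjusted mean/var : {t_adj.mean():.3f} / {var_adj:.4f}")
```

Because the subtracted statistic has mean zero, the adjustment leaves the target unchanged when the coefficient is fixed; estimating the coefficient from the same sample, as done here for convenience, can introduce a small finite-sample bias.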
3. Automated and Parametric Zero-Cost Estimator Design
Manual design of zero-cost proxies is laborious and often fails to generalize. Recent developments include:
- Genetic Programming Frameworks: Automatic evolution of interpretable zero-cost estimators by maximizing rank correlation with true accuracy across diverse search spaces (EZNAS-A proxy) (Akhauri et al., 2022).
- Parametric Bayesian Mixer Models: Parametric frameworks (ParZC) embed classical proxies in differentiable, uncertainty-aware architectures trained to maximize a differentiable surrogate of Kendall's $\tau$ (DiffKendall loss), adapting node-wise weighting according to inferred importance and statistical noise (Dong et al., 2024).
- Differentiable Operation Scoring: Zero-cost scoring embedded into differentiable architecture search using perturbation-based finite-difference proxies, balancing computational speed with estimator fidelity (Xiang et al., 2021).
4. Empirical Efficacy and Benchmarking Landscape
Zero-cost estimators have been systematically evaluated on standardized NAS and approximation benchmarks:
- Rank Correlation: On NAS-Bench-201, automatically evolved (EZNAS-A) and parametric (ParZC) proxies now routinely reach higher Spearman’s $\rho$ and Kendall’s $\tau$ than expert-designed proxies like SynFlow (Akhauri et al., 2022, Dong et al., 2024).
- Generalization: Top-performing zero-cost proxies (ParZC, EZNAS-A) exhibit transferability across datasets (CIFAR-10/100, ImageNet-16-120), search spaces (DARTS, NASNet, ENAS), and even vision transformers (Dong et al., 2024).
- Search Efficiency: Substituting zero-cost proxies into standard NAS paradigms (random search, evolutionary, reinforcement learning, predictor-based) reduces the number of full trainings required by large factors while maintaining or improving final architecture accuracy (Abdelfattah et al., 2021, Akhauri et al., 2022).
- Low-Rank Matrix Approximation: Fast zero-cost LOO estimators for the Generalized Nyström method produce mean-squared error estimates indistinguishable from brute-force validation at a small fraction of the runtime cost (Lazzarino et al., 16 Jan 2026).
- Statistical Signal Estimation: Zero-estimator augmentation reduces variance in high-dimensional regression and can lower RMSE relative to plain U-statistic estimators (Livne, 2023).
5. Theoretical Guarantees, Limitations, and Caveats
- Consistency and Unbiasedness: Many zero-cost estimators (especially in randomized linear algebra and statistical regression) are provably unbiased and consistent under mild moment or sketching assumptions (Livne, 2023, Lazzarino et al., 16 Jan 2026).
- Empirical, Not Formal, Ranking Guarantees: For NAS, even the most robust proxies (e.g., SynFlow, ParZC, EZNAS-A) offer no formal guarantee of high accuracy ranking in arbitrary tasks; significant drops in rank correlation are observed in highly heterogeneous or non-vision domains (Abdelfattah et al., 2021, Akhauri et al., 2022, Lukasik et al., 2023).
- Robustness Prediction: Clean accuracy is effectively captured by single proxies, but robust accuracy under adversarial attack requires ensembles of 5–8 proxies for acceptable ranking fidelity (Lukasik et al., 2023).
- Top-10% Resolution: Most zero-cost estimators struggle to distinguish the absolute best architectures—the fidelity among the top 10% remains modest even for state-of-the-art methods (Akhauri et al., 2022).
6. Practical Guidance and Future Directions
- Proxy Selection: In homogeneous CNN NAS spaces, SynFlow and Jacobian-based proxies are preferred; for heterogeneous or transformer settings, ParZC or learned parametric models yield superior results (Abdelfattah et al., 2021, Lukasik et al., 2023, Dong et al., 2024).
- Ensembling: Simple voting or learned regression over multiple proxies increases robustness, especially for multi-objective (clean + robustness) ranking (Lukasik et al., 2023).
- Algorithmic Integration: Rank large candidate pools by a cheap zero-cost proxy, restrict expensive full training to the top-ranked candidates, and use proxies for local mutation proposals in evolutionary and differentiable search loops (Abdelfattah et al., 2021, Xiang et al., 2021).
- Automated Discovery: Augment the search spaces for proxies to include inter-block statistics and topological features for improved generalization (Akhauri et al., 2022, Dong et al., 2024).
- Extending to New Domains: Adapt hooks/statistics for object detection, NLP, graph neural networks, and transformers; actively research proxies sensitive to adversarial robustness, local Lipschitz smoothness, and loss landscape features (Lukasik et al., 2023, Dong et al., 2024).
- Randomized Linear Algebra: Leverage zero-cost LOO estimators for adaptive rank selection and real-time error assessment in large-scale matrix decompositions with no data revisiting (Lazzarino et al., 16 Jan 2026).
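The ensembling and algorithmic-integration advice above can be combined in a few lines. The sketch below averages per-proxy ranks (a simple voting scheme) and sends only the top-k candidates to full training. The candidate names, proxy values, accuracy table, and the `fake_train` stand-in are all hypothetical; in practice `train_fn` would launch a real training run.

```python
def rank_positions(scores):
    """Map each candidate to its rank under one proxy (0 = lowest score)."""
    order = sorted(scores, key=scores.get)
    return {name: r for r, name in enumerate(order)}


def ensemble_rank(proxy_tables):
    """Average the per-proxy ranks of every candidate."""
    names = list(proxy_tables[0])
    ranks = [rank_positions(table) for table in proxy_tables]
    return {n: sum(r[n] for r in ranks) / len(ranks) for n in names}


def shortlist_and_train(proxy_tables, train_fn, k):
    """Fully train only the k candidates with the best averaged rank."""
    avg = ensemble_rank(proxy_tables)
    top = sorted(avg, key=avg.get, reverse=True)[:k]
    trained = {name: train_fn(name) for name in top}
    best = max(trained, key=trained.get)
    return best, trained


# Hypothetical scores from two proxies for four candidate architectures.
proxy_a = {"a": 0.1, "b": 0.9, "c": 0.5, "d": 0.7}
proxy_b = {"a": 3.0, "b": 10.0, "c": 8.0, "d": 9.0}

# Stand-in for expensive training: a lookup table of made-up accuracies.
FAKE_ACC = {"a": 90.0, "b": 93.5, "c": 92.0, "d": 94.1}
calls = []


def fake_train(name):
    calls.append(name)          # record which candidates were actually trained
    return FAKE_ACC[name]


best, trained = shortlist_and_train([proxy_a, proxy_b], fake_train, k=2)
print(best, trained, f"trained {len(calls)} of {len(FAKE_ACC)} candidates")
```

Rank averaging sidesteps the fact that different proxies live on incomparable scales; a learned regression over proxy values is the heavier alternative mentioned above.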
Zero-cost accuracy estimators have transformed model evaluation and search efficiency in NAS, scalable linear algebra, and statistical inference. Continual improvements in proxy design, parameterization, and theoretical understanding remain active areas of research, with broad implications for efficient, environmentally responsible machine learning and optimization workflows.