
Dimension-free estimators of gradients of functions with(out) non-independent variables

Published 31 Dec 2025 in math.ST, math.OC, and math.PR | (2512.24527v1)

Abstract: This study proposes a unified stochastic framework for approximating and computing the gradient of every smooth function evaluated at non-independent variables, using $\ell_p$-spherical distributions on $\mathbb{R}^d$ with $d, p \geq 1$. The upper bounds on the bias of the gradient surrogates do not suffer from the curse of dimensionality for any $p \geq 1$. Also, the mean squared errors (MSEs) of the gradient estimators are bounded by $K_0 N^{-1} d$ for any $p \in [1, 2]$, and by $K_1 N^{-1} d^{2/p}$ when $2 \leq p \ll d$, with $N$ the sample size and $K_0, K_1$ some constants. Taking $\max\{2, \log(d)\} < p \ll d$ allows for achieving dimension-free upper bounds on the MSEs. In the case where $d \ll p < +\infty$, the upper bound $K_2 N^{-1} d^{2-2/p}/(d+2)^2$ is reached, with $K_2$ a constant. Such results lead to dimension-free MSEs of the proposed estimators, which boil down to estimators of the traditional gradient when the variables are independent. Numerical comparisons show the efficiency of the proposed approach.

Summary

  • The paper introduces a dimension-free stochastic framework that uses ℓp-spherical distributions to estimate gradients in high-dimensional, dependent settings.
  • It achieves statistically optimal bias and mean squared error rates by carefully choosing the ℓp parameter relative to the ambient dimension to overcome the curse of dimensionality.
  • Numerical validations on benchmark functions confirm that the estimators reduce computational costs and deliver robust performance even under variable dependencies.

Dimension-Free Estimators for Gradients with (Non-)Independent Variables

Introduction and Motivation

Computing gradients of multivariate functions is foundational in optimization, sensitivity analysis, and statistical inference, particularly in high-dimensional regimes. For functions depending on non-independent input variables, traditional gradient definitions (based on Euclidean geometry) are insufficient, and a non-Euclidean or "dependent" gradient—accounting for the input dependencies—is essential. Existing gradient estimators, especially in derivative-free and zeroth-order optimization settings, often encounter the curse of dimensionality, with estimation error scaling poorly in the ambient dimension.

The paper "Dimension-free estimators of gradients of functions with(out) non-independent variables" (2512.24527) introduces a unified stochastic framework for approximating gradients, both Euclidean and non-Euclidean, achieving dimension-independent bias and mean squared error (MSE) under suitable conditions. The key technical innovation is the use of random directions sampled from $\ell_p$-spherical distributions, with a carefully chosen $p$ yielding statistical efficiency regardless of the input dimension.

Framework for Dependent Gradients

A central contribution is the formal treatment of gradients in the presence of dependent (potentially correlated) input variables. Let $f: \mathbb{R}^d \to \mathbb{R}$ be a smooth function, and let $\mathbf{X}$ denote a random vector whose components are not necessarily independent. The "dependent" gradient is defined as

$$\mathrm{grad}\, f(\mathbf{x}) = G^{-1}(\mathbf{x})\, \nabla f(\mathbf{x}),$$

where $G(\mathbf{x})$ is the metric tensor induced by the dependency structure, generalizing the Fisher information or covariance matrix. For independent variables, $G = I$ and the dependent gradient reduces to the classical gradient. The metric $G$ can be constructed explicitly when the joint distribution is known (e.g., via copulas or a transformation of independent variables).
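As a concrete illustration of the definition above, the snippet below evaluates the dependent gradient $G^{-1}(\mathbf{x})\, \nabla f(\mathbf{x})$ for a toy bivariate Gaussian dependency. Taking $G$ to be the inverse covariance matrix is one plausible instance of such a metric, assumed here for illustration; it is not necessarily the exact construction used in the paper.

```python
import numpy as np

# Hypothetical metric for a bivariate Gaussian with correlation rho:
# G is taken as the inverse covariance (an illustrative choice).
rho = 0.6
Sigma = np.array([[1.0, rho], [rho, 1.0]])
G = np.linalg.inv(Sigma)

def f(x):
    """A smooth test function f(x1, x2) = x1^2 + x1*x2."""
    return x[0] ** 2 + x[0] * x[1]

def euclidean_grad(x):
    """Analytic Euclidean gradient of f."""
    return np.array([2 * x[0] + x[1], x[0]])

x = np.array([1.0, 2.0])
# Dependent gradient: G^{-1} applied to the Euclidean gradient.
dep_grad = np.linalg.solve(G, euclidean_grad(x))
```

With $G = \Sigma^{-1}$, the dependent gradient is simply $\Sigma \nabla f(\mathbf{x})$; setting $\rho = 0$ recovers the classical gradient, matching the reduction $G = I$ noted above.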

Stochastic Estimation via $\ell_p$-Spherical Distributions

The proposed estimators sample perturbation directions from $\ell_p$-spherical distributions or uniform distributions over $\ell_p$-balls. This generalizes prior approaches relying on $\ell_2$ or $\ell_1$ spheres/balls, leading to the following key surrogate for the gradient (for an appropriately chosen bandwidth $h$ and scaling $\sigma$):

$$\widehat{\mathrm{grad}\, f}(\mathbf{x}) = \frac{G^{-1}(\mathbf{x})}{N h \sigma^2} \sum_{i=1}^N \sum_{\ell=1}^L C_\ell\, f(\mathbf{x} + \beta_\ell h \mathbf{V}_i)\, \mathbf{V}_i,$$

where the $\mathbf{V}_i$ are independent draws from an $\ell_p$-spherical (or related) distribution, and the $C_\ell, \beta_\ell$ are determined by a system of moment-matching constraints (e.g., centered finite differences).

A crucial observation is that the joint moment properties of $\ell_p$-spherical distributions allow for control of both the bias and the variance of the estimators, provided $p$ is chosen appropriately relative to the dimension $d$.
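A minimal Python sketch of this construction, for the independent-variable case ($G = I$) with a two-point central-difference scheme; the function names, defaults, and the empirical estimate of $\sigma^2$ are illustrative choices, not the authors' implementation. Sampling uniformly on the $\ell_p$-sphere uses the classical generalized-Gaussian construction.

```python
import numpy as np

def sample_lp_sphere(n, d, p, rng):
    """Draw n directions uniformly on the unit l_p-sphere in R^d.

    Classical construction: i.i.d. components with density
    proportional to exp(-|w|^p) (|W|^p ~ Gamma(1/p, 1)),
    normalised by the l_p norm.
    """
    radii = rng.gamma(shape=1.0 / p, scale=1.0, size=(n, d)) ** (1.0 / p)
    signs = rng.choice([-1.0, 1.0], size=(n, d))
    w = radii * signs
    return w / np.linalg.norm(w, ord=p, axis=1, keepdims=True)

def grad_estimate(f, x, p=2.0, h=1e-3, n=5000, seed=0):
    """Central-difference random-direction estimate of the Euclidean
    gradient (the independent case, G = I).

    sigma2 is the per-coordinate second moment E[V_j^2] of the
    l_p-spherical direction, estimated empirically for simplicity.
    """
    rng = np.random.default_rng(seed)
    v = sample_lp_sphere(n, x.size, p, rng)
    sigma2 = np.mean(v ** 2)
    diffs = np.array([f(x + h * vi) - f(x - h * vi) for vi in v])
    return (diffs[:, None] * v).sum(axis=0) / (2.0 * n * h * sigma2)
```

For a quadratic such as $f(\mathbf{x}) = \Vert \mathbf{x} \Vert_2^2$ the central difference is exact in $h$, so the residual error of the estimate is purely Monte Carlo and shrinks at the parametric $N^{-1}$ rate in mean square.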

Theoretical Guarantees: Dimension-Free Bias and MSE

Bias Analysis

For smooth (noiseless, $\mathcal{H}_2$-class) functions, the bias of the proposed estimators can be upper-bounded independently of $d$ by appropriately choosing $h$ and $\sigma$. Formally, for suitable choices,

$$\Vert \mathrm{bias} \Vert \leq M_2 h,$$

where $M_2$ is a second-order smoothness constant (see Eqns. (44) and (52) of the paper). This dimension-free bias holds even with dependent inputs and for any $p \geq 1$, breaking the curse of dimensionality at the bias level.

MSE and the Curse of Dimensionality

The mean squared error (MSE) admits the following upper bounds, depending on $p$ and $d$:

  • For $1 \leq p \leq 2$: $\mathrm{MSE} \leq K_0 N^{-1} d$.
  • For $2 < p \ll d$: $\mathrm{MSE} \leq K_1 N^{-1} d^{2/p}$.
  • For $d \ll p < \infty$: $\mathrm{MSE} \leq K_2 N^{-1} d^{2-2/p}/(d+2)^2$.

Choosing $p = \max\{2, \log d\} + 1$ secures an essentially dimension-free MSE, i.e., the estimation error no longer scales with $d$ even as $d$ grows, provided directions for such a moderate $p$ can be sampled efficiently.
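The piecewise bounds and the recommended choice of $p$ can be written down directly. In the sketch below the constants $K_0, K_1, K_2$ are placeholders (their values are not specified in this summary), so the function is only useful for comparing how the bounds scale with $d$ and $p$.

```python
import math

def recommended_p(d):
    """The choice p = max(2, log d) + 1 highlighted in the paper
    for an essentially dimension-free MSE bound."""
    return max(2.0, math.log(d)) + 1.0

def mse_bound(n, d, p, K0=1.0, K1=1.0, K2=1.0):
    """Piecewise MSE upper bounds from the paper.

    Constants K0, K1, K2 are placeholders; the middle branch
    assumes the regime 2 < p << d, the last one d << p.
    """
    if p <= 2:
        return K0 * d / n
    if p < d:
        return K1 * d ** (2.0 / p) / n
    return K2 * d ** (2.0 - 2.0 / p) / ((d + 2) ** 2 * n)
```

For example, at $d = 10^4$ the recommended $p \approx 10.2$ gives a bound of order $d^{2/p} \approx 6$, versus a bound of order $d = 10^4$ for $p = 2$, which is the dimension-free effect described above.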

Implications for Derivative-Free Optimization

These results demonstrate that zeroth-order gradient estimation (one or two function evaluations per direction) can attain oracle convergence rates (parametric $N^{-1}$ scaling) without exponential or even polynomial dependence on $d$, a substantial advance over classical approaches (finite differences, Spall/SPSA, and standard random directions). Notably, dimension-independent error is achievable even in the presence of arbitrarily strong dependence among variables.

Numerical Validation

Comprehensive numerical experiments underscore both the statistical and computational behavior predicted by theory. Benchmarks include the Rosenbrock function and a high-dimensional synthetic trigonometric-quadratic function, under independent and correlated settings. The following conclusions are supported:

  • For $d$ as large as $1000$, the dimension-free regime is empirically observed.
  • The error does not increase with $d$ in the recommended $p$ regime, outperforming both finite-difference and previously proposed Monte Carlo/randomized techniques.
  • Gram-Schmidt orthogonalization is used to enforce empirical uncorrelatedness of the sample directions, which is critical for accuracy when the sample size is small relative to $d$.
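One plausible reading of the orthogonalization step in the last bullet, sketched as an $\ell_2$ Gram-Schmidt pass (via QR) over the sampled directions; the paper's exact procedure may differ.

```python
import numpy as np

def orthogonalize_directions(v):
    """Gram-Schmidt (QR) pass over sampled directions.

    v has shape (n, d) with n <= d; rows are the directions.
    Returns n exactly orthonormal directions spanning the same
    subspace, which removes empirical correlation among them.
    """
    q, _ = np.linalg.qr(v.T)  # reduced QR: columns of q are orthonormal
    return q.T

rng = np.random.default_rng(0)
v = rng.standard_normal((8, 100))   # 8 raw directions in R^100
u = orthogonalize_directions(v)     # 8 orthonormal directions
```

For small samples relative to $d$, raw random directions have non-negligible empirical correlations of order $1/\sqrt{d}$; forcing exact orthonormality removes this source of error at negligible cost ($O(n^2 d)$).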

Practical and Theoretical Implications

Practical Implications: The dimension-free stochastic gradient estimators can reduce computational costs for high-dimensional optimization, global sensitivity analysis, and uncertainty quantification in complex models, especially when automatic differentiation or analytic gradients are unavailable. They are also readily adaptable to cases with strong dependencies among input variables, which frequently arise in scientific computing and probabilistic modeling.

Theoretical Implications: This work clarifies the minimax statistical behavior of random-direction gradient estimators under broad conditions, resolving the tension between theory and numerical practice in high-dimensional Monte Carlo gradient approximation. It also demonstrates the centrality of $\ell_p$-spherical distributions in enabling favorable concentration phenomena, suggesting new directions for randomized algorithm design beyond derivative-free optimization.

Future Directions

The authors indicate several avenues for further research:

  • Efficient empirical generation of "uncorrelated" random directions in high dimensions with small sample sizes.
  • Extensions to noisy, non-smooth, or stochastic functions.
  • Integration with one-point feedback schemes in online optimization, stochastic control, and bandit problems.

Conclusion

The paper (2512.24527) delivers a principled framework for dimension-free gradient estimation via $\ell_p$-spherical randomized directions, with rigorously justified statistical guarantees for both independent and non-independent variable regimes. These results set a new benchmark in high-dimensional gradient approximation, with significant ramifications for optimization, sensitivity analysis, and machine learning.
