RESZO: Regression-Based Single-Point ZO
- Regression-Based Single-Point Zeroth-Order Optimization (RESZO) is a derivative-free method that uses regression on historical function evaluations to construct surrogate models and estimate gradients with reduced variance.
- It employs both linear and quadratic surrogate models to capture gradient and curvature information, achieving convergence rates similar to two-point methods while requiring only one function query per iteration.
- RESZO is particularly effective in online, black-box, and simulation-driven scenarios where obtaining multiple function evaluations is impractical or costly.
Regression-Based Single-Point Zeroth-Order Optimization (RESZO) is a class of derivative-free optimization algorithms designed for settings where only a single function evaluation is feasible at each iteration, such as online, black-box, and simulation-driven optimization. The key innovation of RESZO is the use of regression over multiple historical function evaluations to construct local surrogate models, whose gradients serve as low-variance descent directions. This approach achieves convergence rates and query complexities comparable to two-point zeroth-order methods while maintaining the practical and statistical efficiency of single-point evaluations (Chen et al., 6 Jul 2025).
1. Core Principles and Algorithmic Framework
Traditional single-point zeroth-order (SZO) methods estimate the gradient from a single sample, e.g., $\hat{g}_t = \tfrac{d}{\delta} f(x_t + \delta u_t)\, u_t$ for $u_t$ drawn uniformly from the unit sphere (with an analogous form for Gaussian perturbations), discarding all previous information. This produces high-variance estimates, leading to slow convergence and a query complexity substantially worse than that of two-point methods on smooth nonconvex objectives. In contrast, RESZO reuses the $m$ most recent function evaluations to fit a local surrogate model by least-squares regression, then takes the surrogate's gradient as a descent direction. By aggregating historical information, both variance and bias are controlled, accelerating convergence with only one new function evaluation per step.
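For concreteness, the following is a minimal NumPy sketch of the classic one-point estimator described above (the function name and sampling details are illustrative, not taken from the paper); it shows why each step, taken in isolation, yields a high-variance direction.

```python
import numpy as np

def one_point_grad_estimate(f, x, delta, rng=None):
    """Classic single-point ZO gradient estimate (standard textbook form, not RESZO).

    One query f(x + delta*u) with u uniform on the unit sphere; the d/delta
    scaling yields, in expectation, the gradient of a smoothed version of f,
    but the estimate has high variance because previous queries are discarded.
    """
    rng = rng or np.random.default_rng()
    d = x.shape[0]
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)                 # uniform direction on the unit sphere
    return (d / delta) * f(x + delta * u) * u
```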
There are two principal RESZO variants:
- Linear RESZO (L-RESZO): Fits a local linear surrogate around the current perturbed point using recent samples.
- Quadratic RESZO (Q-RESZO): Fits a local quadratic surrogate with a diagonal Hessian to capture basic curvature information.
At each iteration $t$:
- Sample a random direction $u_t$ (uniformly from the unit sphere or from a standard normal distribution) and set $\hat{x}_t = x_t + \delta u_t$.
- Query $y_t = f(\hat{x}_t)$.
- Fit a surrogate function $\hat{f}_t$ to the $m$ most recent pairs $(\hat{x}_{t-i}, y_{t-i})$, $i = 0, \dots, m-1$, via least-squares regression.
- Update $x_{t+1} = x_t - \eta\, g_t$, where $g_t$ is the gradient of the fitted surrogate.
This regression strategy allows RESZO to leverage the information content of multiple, costly function queries for each update, closing the gap to multi-query (two-point) methods (Chen et al., 6 Jul 2025).
2. Surrogate Model Construction and Algorithmic Implementation
The surrogate at time $t$ is built from the $m$ most recent perturbed points $\hat{x}_{t-i}$ and their corresponding function values $y_{t-i}$, $i = 0, \dots, m-1$; a small least-squares sketch of both fits is given just below this list.
- Linear surrogate: $\hat{f}_t(x) = g_t^\top x + c_t$.
The coefficient vector $g_t$ (the gradient estimate) and offset $c_t$ are given by the least-squares solution $[g_t;\, c_t] = (X_t^\top X_t)^{\dagger} X_t^\top Y_t$,
where $X_t \in \mathbb{R}^{m \times (d+1)}$ stacks the rows $[\hat{x}_{t-i}^\top,\ 1]$ and $Y_t = (y_{t-m+1}, \dots, y_t)^\top$ collects the corresponding function values.
- Quadratic surrogate: $\hat{f}_t(x) = \tfrac{1}{2}\sum_{j=1}^{d} h_{t,j}\, x_j^2 + g_t^\top x + c_t$.
Fits a diagonal-Hessian quadratic form,
with the regression matrix extended accordingly (each row of $X_t$ augmented with the squared coordinates of $\hat{x}_{t-i}$).
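As referenced above, here is a minimal NumPy sketch of both regression fits, using `np.linalg.lstsq` in place of the pseudoinverse formula; the function names and the exact parameterization of the quadratic fit are illustrative assumptions, not the paper's reference implementation.

```python
import numpy as np

def fit_linear_surrogate(X_hat, y):
    """Fit f_hat(x) ~ g.x + c to m perturbed points X_hat (m x d) and values y (m,)."""
    m, d = X_hat.shape
    A = np.hstack([X_hat, np.ones((m, 1))])       # rows [x_hat_i^T, 1]
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)  # least-squares solution
    return coef[:d], coef[d]                      # (gradient estimate g, offset c)

def fit_quadratic_surrogate(X_hat, y):
    """Fit a diagonal-Hessian quadratic f_hat(x) ~ 0.5*sum_j h_j x_j^2 + g.x + c."""
    m, d = X_hat.shape
    A = np.hstack([0.5 * X_hat**2, X_hat, np.ones((m, 1))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[:d], coef[d:2 * d], coef[2 * d]   # (h, g, c)
```

For L-RESZO the returned `g` is used directly as the descent direction; for Q-RESZO the surrogate's gradient at the current iterate, `h * x_t + g`, would play that role.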
Pseudocode for L-RESZO:

```
Input: initial x₀, smoothing radius δ, step size η, window size m, horizon T
For t = 0 to m−1:
    Run a standard one-point or residual-feedback SZO update
For t = m to T−1:
    Sample u_t; set hat_x_t = x_t + δ·u_t; query y_t = f(hat_x_t)
    Collect the past m points (hat_x_{t−i}, y_{t−i}), i = 0, ..., m−1
    Construct the regression matrices X_t, Y_t as above
    Solve [g_t; c_t] = (X_tᵀ X_t)† X_tᵀ Y_t
    Update x_{t+1} = x_t − η·g_t
```
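Below is a small, self-contained NumPy sketch that follows this pseudocode end to end; the warm-up rule (simply filling the buffer rather than running one-point updates), the buffer handling, and the default hyperparameters are illustrative choices rather than the paper's reference implementation.

```python
import numpy as np

def l_reszo(f, x0, delta=1e-2, eta=1e-3, m=None, T=1000, seed=0):
    """Illustrative L-RESZO loop: one function query per iteration."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    d = x.size
    m = m if m is not None else d + 1            # window large enough for a full-rank linear fit
    pts, vals = [], []                           # sliding buffers of perturbed points / values

    for t in range(T):
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)                   # random direction on the unit sphere
        x_hat = x + delta * u
        y = f(x_hat)                             # the single function query of this iteration
        pts.append(x_hat); vals.append(y)
        if len(pts) > m:
            pts.pop(0); vals.pop(0)

        if len(pts) < m:
            # Warm-up: just fill the buffer. (The paper's pseudocode instead runs
            # one-point or residual-feedback SZO updates during these iterations.)
            continue
        X = np.hstack([np.vstack(pts), np.ones((len(pts), 1))])
        coef, *_ = np.linalg.lstsq(X, np.asarray(vals), rcond=None)
        g = coef[:d]                             # surrogate gradient = descent direction
        x = x - eta * g
    return x
```

On a toy quadratic, e.g. `l_reszo(lambda z: float(z @ z), np.ones(20), T=5000)`, the iterate contracts toward the origin while issuing exactly one query per iteration.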
3. Theoretical Guarantees and Convergence Analysis
Under standard smoothness assumptions, the regression-based gradient $g_t$ approximates the true gradient $\nabla f(x_t)$ with error controlled by the window size $m$, the step size $\eta$, the smoothing radius $\delta$, and the geometry of the perturbations. Theoretical results for L-RESZO include:
- Gradient-Error Control: Under $L$-smoothness, for all iterations after the warm-up window, the gradient estimation error $\|g_t - \nabla f(x_t)\|$ is bounded in terms of the step size $\eta$, the smoothing radius $\delta$, and a dimension- and schedule-dependent regression constant.
- Smooth Nonconvex Case: For suitable choices of the step size $\eta$ and smoothing radius $\delta$, L-RESZO drives the (averaged) squared gradient norm toward stationarity at a rate of the same order as two-point ZO methods.
- Strongly Convex Case: For smooth $\mu$-strongly convex objectives, L-RESZO converges at a linear rate to a neighborhood of the optimum whose size is governed by the smoothing radius $\delta$.
- Query Complexity: In both the smooth nonconvex and the smooth strongly convex settings, L-RESZO attains the same order of query complexity as two-point ZO methods, up to a moderate dimension-dependent factor.
Empirically, this factor is observed to grow only mildly with the dimension $d$. This suggests that in high dimensions, L-RESZO achieves query complexity comparable (up to a moderate factor) to two-point ZO methods, outperforming standard SZO by a significant margin (Chen et al., 6 Jul 2025).
4. Empirical Performance and Practical Considerations
Comprehensive experiments on noiseless ridge regression, logistic regression, the Rosenbrock function, and neural network training with dimensions up to $d = 200$ confirm that both L-RESZO and Q-RESZO converge at essentially the same per-iteration rate as two-point ZO while using only one query per step. In terms of function query complexity, RESZO is therefore approximately twice as efficient. Both variants also substantially outperform residual-feedback SZO, and Q-RESZO converges slightly faster than L-RESZO thanks to its access to basic curvature information.
Stability and precision are sensitive to the perturbation radius $\delta$:
- Choosing $\delta$ too small causes oscillations or divergence.
- A small, positive $\delta$ increases precision but can hurt stability if taken too small.
- Adapting $\delta$ over the course of the run provides a balance between stability and optimality.
- Window size: at least $m \ge d + 1$ samples are needed for a full-rank linear surrogate fit (and correspondingly more for the quadratic variant); window sizes on the order of the dimension are used in practice.
- Overhead: each iteration requires solving an $m \times (d+1)$ least-squares problem, roughly $O(m d^2)$ work, which can be maintained efficiently via rank-one matrix updates; an illustrative sketch of such an incremental update follows this list.
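As referenced in the overhead bullet, the following is an illustrative sketch of maintaining the windowed regression incrementally with Sherman-Morrison rank-one updates of ridge-regularized normal equations; the class and its interface are hypothetical, since the source only notes that rank-one updates are possible.

```python
import numpy as np

class SlidingLstsq:
    """Windowed least squares with O(d^2) per-step updates (illustrative sketch).

    Maintains P = (AᵀA + ridge·I)⁻¹ and b = Aᵀy via Sherman-Morrison updates as
    one regression row (an augmented point [x_hat, 1]) enters and one leaves
    the window, instead of re-solving the full m x (d+1) problem every step.
    """
    def __init__(self, dim, ridge=1e-8):
        self.P = np.eye(dim) / ridge          # inverse of the (initially tiny) Gram matrix
        self.b = np.zeros(dim)

    def _rank_one(self, v, sign):
        Pv = self.P @ v                       # Sherman-Morrison: AᵀA <- AᵀA + sign·v·vᵀ
        self.P -= sign * np.outer(Pv, Pv) / (1.0 + sign * (v @ Pv))

    def add_row(self, v, y):
        self._rank_one(v, +1.0)
        self.b += y * v

    def remove_row(self, v, y):
        self._rank_one(v, -1.0)
        self.b -= y * v

    def solve(self):
        return self.P @ self.b                # current regression coefficients [g; c]
```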
5. Advantages, Limitations, and Comparison
Advantages
- Single function query per step, with far lower variance and far better convergence properties than classic one-point estimators.
- Systematic reuse of historical data: the window of recent function evaluations contributes to every gradient estimate rather than being discarded.
- Rates matching two-point ZO, up to a moderate, empirically mild dimension-dependent factor.
Limitations
- Assumption A2 dependence: The full theoretical guarantee requires that the regression error, as encapsulated by a problem-dependent constant, remains bounded, a property observed empirically but without sharp theoretical bounds for large dimension $d$.
- Noiseless analysis: Current convergence results apply only in deterministic function settings.
- Storage and batch size: a buffer of at least $m$ past queries must be maintained for the surrogate regression.
Comparison with other ZO methods
| Method | Queries per step | Uses history | Query complexity (nonconvex) |
|---|---|---|---|
| Classic SZO | 1 | No | Substantially higher than two-point ZO |
| Two-point ZO | 2 | Not required | Baseline |
| Residual-feedback SZO | 1 | Previous evaluation only | Improved over classic SZO, but not regression-based |
| RESZO (proposed) | 1 | Yes (window of $m$ evaluations) | Matches two-point ZO up to a moderate factor |
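For reference, the two baseline estimators in the table have the following standard forms from the ZO literature (shown here as illustrative sketches; neither is part of RESZO itself):

```python
import numpy as np

def two_point_grad(f, x, delta, rng):
    """Two-point ZO estimate: two queries per step, no reliance on history."""
    d = x.size
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)
    return (d / (2 * delta)) * (f(x + delta * u) - f(x - delta * u)) * u

def residual_feedback_grad(f, x, delta, prev_value, rng):
    """Residual-feedback SZO: one query per step, reusing only the previous value.

    Returns the gradient estimate and the fresh value to carry into the next step.
    """
    d = x.size
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)
    y = f(x + delta * u)
    return (d / delta) * (y - prev_value) * u, y
```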
6. Applications and Extensions
RESZO is particularly advantageous in settings where only single function queries are feasible at each iteration:
- Online and dynamic optimization, where the objective may change over time and repeated querying is impossible.
- Bandit settings, expensive simulation, and hyperparameter tuning.
- Reinforcement learning and power systems control, where function evaluation is costly or resource-limited.
- Safety-critical control systems, where repeated, identical actions are not permissible.
A plausible implication is the applicability of RESZO to reinforcement learning and simulation-based policy optimization under severe query limitations.
7. Open Problems and Future Directions
Although RESZO marks a substantial advance for single-point ZO, several technical challenges remain:
- Extending theory to noisy evaluations (stochastic objectives).
- Developing high-probability regret/convergence bounds.
- Rigorous bounding of the regression constant for high-dimensional regimes.
- Improving adaptive strategies for window size and perturbation radius.
- Incorporating variance reduction and acceleration mechanisms.
Potential extensions may include mirror-descent variants, non-Euclidean sampling schemes, or combination with control-oriented feedback designs (Chen et al., 6 Jul 2025).
For the definitive introduction, formal algorithmic details, theoretical analysis, and empirical comparisons, see "Regression-Based Single-Point Zeroth-Order Optimization" (Chen et al., 6 Jul 2025).