
An objective-function-free algorithm for nonconvex stochastic optimization with deterministic equality and inequality constraints

Published 31 Mar 2026 in math.OC | (2603.29685v1)

Abstract: An algorithm is proposed for solving optimization problems with stochastic objective and deterministic equality and inequality constraints. This algorithm is objective-function-free in the sense that it only uses the objective's gradient and never evaluates the function value. It is based on an adaptive selection of function-decreasing and constraint-improving iterations, the first ones using an Adagrad-type stepsize. When applied to problems with full-rank Jacobian, the combined primal-dual optimality measure is shown to decrease at the rate of O(1/sqrt{k}), which is identical to the convergence rate of first-order methods in the unconstrained case.

Authors (2)

Summary

  • The paper introduces STRADIC, an algorithm that alternates between function-decreasing and constraint-improving steps without using objective function evaluations.
  • It leverages componentwise AdaGrad-based stepsizes and projected gradients to achieve O(1/√k) convergence under full-rank Jacobian assumptions.
  • The method effectively handles nonlinear equality and inequality constraints while offering optimal iteration complexity for stochastic nonconvex problems.

Objective-Function-Free Stochastic Optimization for Constrained Nonconvex Problems

Introduction

The paper "An objective-function-free algorithm for nonconvex stochastic optimization with deterministic equality and inequality constraints" (2603.29685) introduces STRADIC (Stochastic Trust-Region AdaGrad with Inequality Constraints), an objective-function-free optimization (OFFO) method for general nonconvex stochastic problems with deterministic constraints. The algorithm distinguishes itself by using exclusively the gradient of the stochastic objective (never the function value) and by systematically alternating between function-decreasing and constraint-improving iterations. Notably, its convergence rate matches that of unconstrained first-order methods, achieving an expected O(1/√k) decrease in optimality measures under full-rank Jacobian assumptions.

Algorithmic Framework

STRADIC is rooted in a trust-funnel paradigm and employs a componentwise adaptive stepsize based on AdaGrad. At each iteration:

  • Gradient Processing: A stochastic gradient g(x) is computed, which may leverage approximate or full second-order information.
  • Projected Gradient: Iterates are updated by orthogonal projection onto the tangent space defined by active equality and inequality constraints.
  • Function-Decreasing Steps: Adaptive tangential steps use AdaGrad-type stepsizes α_{k,i} for each variable component.
  • Constraint-Improving Steps: Normal steps are constructed independently to reduce constraint violations, exploiting standard nonlinear least-squares techniques and trust-region strategies.
  • Switching Mechanism: The algorithm adaptively alternates between tangential and normal (constraint) steps based on a specified switching condition related to gradient magnitudes and constraint violation levels.

This approach is fundamentally "objective-function-free," meaning that only the stochastic gradient is accessed, never the objective value—a significant advantage in large-scale or high-noise settings where function evaluation is impractical, such as deep learning and PINN applications.
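To make the alternation concrete, the loop below sketches a much-simplified variant for equality constraints only: a Gauss-Newton-style normal step when the constraint violation is large, and a componentwise AdaGrad step on the projected gradient otherwise. The function names, the switching threshold `zeta`, and the projection-by-least-squares shortcut are illustrative assumptions, not the paper's actual STRADIC specification (which also handles inequality constraints and trust-region safeguards).

```python
import numpy as np

def tangent_project(g, J):
    """Orthogonally project g onto the null space of the Jacobian J,
    i.e. the tangent space of the equality constraints c(x) = 0."""
    lam, *_ = np.linalg.lstsq(J.T, g, rcond=None)
    return g - J.T @ lam

def stradic_sketch(x0, grad, c, jac, iters=100, zeta=1e-2, eps=1e-8):
    """Toy alternation of constraint-improving (normal) and
    function-decreasing (tangential) steps.  The switching rule and
    step computations are simplified stand-ins, not the paper's."""
    x = x0.astype(float).copy()
    G = np.zeros_like(x)                 # running sums of squared gradients
    for _ in range(iters):
        ck, Jk = c(x), jac(x)
        if np.linalg.norm(ck) > zeta:
            # normal step: Gauss-Newton least-squares decrease of ||c||
            dn, *_ = np.linalg.lstsq(Jk, -ck, rcond=None)
            x = x + dn
        else:
            # tangential step: componentwise AdaGrad on the projected gradient
            # (componentwise scaling can leave the tangent space in general;
            # the real algorithm controls this more carefully)
            gt = tangent_project(grad(x), Jk)
            G += gt**2
            x = x - gt / (np.sqrt(G) + eps)
    return x

# usage: minimize f(x) = ||x||^2 subject to x_0 + x_1 = 1 (optimum at (0.5, 0.5))
x = stradic_sketch(np.array([2.0, 3.0]),
                   grad=lambda x: 2 * x,
                   c=lambda x: np.array([x[0] + x[1] - 1.0]),
                   jac=lambda x: np.array([[1.0, 1.0]]))
```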

Theoretical Guarantees and Complexity

The central contribution is a rigorous complexity analysis in the stochastic, constrained setting:

  • First-order Criticality: STRADIC measures dual/primal criticality using projected gradient residuals and constraint violations, guaranteeing that Ω_T(x_k) (the projected gradient norm) and ω_N(x_k) (the primal violation) both approach zero at critical points.
  • Componentwise AdaGrad Adaptivity: Each variable uses its own learning rate, theoretically ensuring superior adaptivity compared to full-space methods, particularly in heterogeneous variable landscapes.
  • Global Convergence Rate: Under regularity assumptions (AS.1–AS.11), including a full-rank Jacobian and root-mean-square error conditions on gradient approximations, the expected optimality measures decrease at the rate O(1/√k):

$$\frac{1}{k+1}\sum_{j=0}^{k} \mathbb{E}\big[\,\Omega_T(x_j) + \|c_j\|\,\big] \;\le\; \frac{\mathrm{STRAD}}{\sqrt{k+1}} \;+\; O\!\left(\frac{1}{k+1}\right)$$

where STRAD aggregates constant factors depending on problem parameters and noise characteristics.

  • Evaluation Complexity: Achieving ε-approximate criticality requires at most O(ε⁻²) iterations, matching optimal bounds for unconstrained deterministic problems. This is established via a telescoping, Lyapunov-function-based analysis that leverages componentwise AdaGrad bounds and novel extensions to stochastic constraint regimes.
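For completeness, the O(ε⁻²) iteration count is simply the inversion of the averaged rate above, ignoring the lower-order O(1/(k+1)) term, which is dominated for small ε:

```latex
% Forcing the dominant term of the averaged bound below \epsilon:
\frac{\mathrm{STRAD}}{\sqrt{k+1}} \le \epsilon
\quad\Longleftrightarrow\quad
k+1 \ge \left(\frac{\mathrm{STRAD}}{\epsilon}\right)^{2},
% so some iterate j \le k satisfies
% \mathbb{E}[\Omega_T(x_j) + \|c_j\|] \le \epsilon
% after O(\epsilon^{-2}) iterations.
```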

The authors claim that these complexity bounds are optimal for this problem class, and that the convergence proofs avoid much of the intricacy typical of constrained stochastic optimization analyses.

Comparative Analysis and Innovations

STRADIC advances over previous methods in several key respects:

  • Support for Nonlinear Equality and Inequality Constraints: Unlike many existing OFFO methods restricted to convex constraints or equality only, STRADIC handles the general case.
  • Componentwise Stepsizes: Prior works such as [CurtRobiZhou24], [FangNaMahoKola24], and [WangPierZhouCurt26] use full-space or isotropic stepsizes, whereas STRADIC leverages per-component adaptation, which is theoretically and empirically preferable in high-dimensional or ill-conditioned settings.
  • Second-Order Information: STRADIC systematically allows incorporation of (approximate or nonconvex) second-order information via Cauchy/trust-region strategies, while also functioning robustly in strictly first-order regimes.
  • Stochastic Noise Modeling: The analysis adopts a root-mean-square noise assumption along the step direction, which is weaker than the classical unbiasedness or strong-growth conditions used in unconstrained optimization [WangZhanMaChen23], and carries this assumption through the complexity characterization.
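The componentwise-versus-isotropic distinction is easy to see on a toy ill-conditioned quadratic. The snippet below is an illustrative comparison of my own construction, not an experiment from the paper: it pits per-coordinate AdaGrad against an AdaGrad-Norm-style single stepsize.

```python
import numpy as np

def run_adagrad(scales, iters=200, componentwise=True, eps=1e-8):
    """Minimize f(x) = 0.5 * sum_i scales[i] * x_i^2 starting from x = 1,
    with either per-coordinate AdaGrad or a single AdaGrad-Norm stepsize."""
    x = np.ones_like(scales)
    G = np.zeros_like(scales) if componentwise else 0.0
    for _ in range(iters):
        g = scales * x                        # gradient of the quadratic
        G = G + (g**2 if componentwise else g @ g)
        x = x - g / (np.sqrt(G) + eps)        # scalar G broadcasts over x
    return 0.5 * np.sum(scales * x**2)        # final objective value

scales = np.array([1e-3, 1.0, 1e3])           # three very different curvatures
f_cw  = run_adagrad(scales, componentwise=True)
f_iso = run_adagrad(scales, componentwise=False)
# componentwise scaling adapts to each curvature, while the isotropic
# stepsize is dictated by the stiffest coordinate and crawls on the flat ones
```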

Numerical Results and Claims

While the paper is primarily theoretical, its central claims about convergence rate are strongly quantitative:

  • Convergence Rate: STRADIC achieves O(1/√k) convergence in expected optimality for constrained stochastic nonconvex problems, identical to AdaGrad in unconstrained settings.
  • Iteration Complexity: At most O(ε⁻²) iterations to reach ε-approximate criticality.
  • Adaptivity: Fully componentwise, per-variable stepsizes, which the paper argues outperform full-space alternatives in adaptivity and complexity.

These claims are explicitly contrasted with penalty-based (e.g., PINN-style), augmented Lagrangian, and sequential quadratic programming approaches, which respectively suffer from ill-conditioning, unreliable multiplier estimates, or weaker complexity bounds.

Practical and Theoretical Implications

The STRADIC framework has several implications:

  • Scalability and Robustness: Its OFFO nature, componentwise adaptivity, and independence from function evaluation make it a promising candidate for large-scale stochastic optimization (including deep learning paradigms).
  • Flexible Constraint Handling: The ability to integrate both approximate second-order and strict first-order methods, while supporting nonlinear deterministic equality and inequality constraints, broadens its applicability to advanced machine learning and scientific computing settings.
  • Optimal Complexity in Stochastic Regimes: The theoretically optimal complexity results invite reevaluation of penalty-centric or augmented Lagrangian methods, particularly in settings where exact function evaluation or perfect multiplier estimates are unattainable.
  • Relaxation of Stochastic Assumptions: The discussion section contemplates relaxed noise models that accommodate noise from past iterations and permit slower sample-size growth, thereby better matching practical sampling protocols.

Speculation on Future Developments

Future directions prompted by this research include:

  • Relaxation of Regularity Assumptions: Weakening boundedness, full-rank, and strict variance assumptions to handle broader classes of stochastic optimization problems.
  • Approximate Projection Schemes: Efficient approximation of tangent-space projections, crucial for further scalability and practical deployment with very large constraint sets.
  • Hybridization with Deep Learning Architectures: Embedding STRADIC within neural network training paradigms, especially for constrained or physics-informed models, leveraging its OFFO design.
  • Expanded Empirical Evaluation: Benchmarking STRADIC against state-of-the-art penalty and augmented Lagrangian-based solvers in practical settings.

Conclusion

STRADIC constitutes a formal advance in objective-function-free stochastic optimization for general nonlinear constrained problems. Its theoretical analysis delivers optimal complexity in stochastic regimes, with componentwise adaptivity and principled constraint handling. The algorithm's flexibility and rigorous convergence properties position it as a strong candidate for scalable optimization in high-dimensional machine learning and beyond, while its theoretical insights encourage ongoing refinement of stochastic constraint optimization theory.
