Descent-Net: Neural Optimization for Constraints
- Descent-Net is a neural network framework that embeds trainable descent modules to compute feasible descent directions for constrained optimization.
- It employs unrolled, trainable projected subgradient solvers with adaptive step sizes and explicit penalty mechanisms to maintain constraint satisfaction.
- Empirical results demonstrate fast runtimes and high accuracy across diverse applications like convex QPs, portfolio optimization, and AC optimal power flow.
Descent-Net is a neural network-based framework designed for learning efficient descent directions to solve constrained optimization problems, with a central focus on executing updates that improve objective values while preserving feasibility. It achieves this by embedding iterative optimization procedures and constraint preservation within modular, trainable network structures. Two distinct lines of Descent-Net research exist: one line focuses on constrained optimization using projection and penalty mechanisms (Zhou et al., 12 Dec 2025), and another on enforcing layer-wise stochastic descent within deep unrolled architectures to guarantee robustness and convergence (Hadou et al., 2023).
1. Constrained Optimization Problem Formulation
Descent-Net addresses parametric constrained optimization problems of the form
$$\min_{x \in \mathbb{R}^n} f(x;\theta) \quad \text{s.t.} \quad g(x;\theta) \le 0, \quad h(x;\theta) = 0,$$
where $\theta$ encodes instance-specific data and $f$, $g$, $h$ are continuously differentiable in $x$. The feasible set is assumed non-empty, bounded sublevel sets of $f$ are required, the Linear Independence Constraint Qualification (LICQ) holds at all feasible points, and all gradients involved are uniformly Lipschitz. A nontrivial margin is enforced on inactive inequalities so that local feasibility can be robustly preserved (Zhou et al., 12 Dec 2025).
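As a concrete illustration of this parametric setting, the sketch below generates a hypothetical instance family (a random convex QP; the generator and variable names are illustrative, not taken from the paper) together with the residual functions referenced throughout this summary:

```python
import numpy as np

def make_instance(n=50, m_ineq=20, m_eq=5, seed=0):
    """Hypothetical parametric QP instance: the data theta = (Q, c, A, b, C, e)
    defines f(x) = 0.5 x'Qx + c'x, g(x) = Ax - b <= 0, and h(x) = Cx - e = 0."""
    rng = np.random.default_rng(seed)
    M = rng.standard_normal((n, n))
    Q = M @ M.T + n * np.eye(n)                      # positive definite, so f is strongly convex
    c = rng.standard_normal(n)
    A = rng.standard_normal((m_ineq, n))
    b = np.abs(rng.standard_normal(m_ineq)) + 0.1    # x = 0 is strictly feasible for the inequalities
    C = rng.standard_normal((m_eq, n))
    e = np.zeros(m_eq)                               # x = 0 also satisfies the equalities
    return Q, c, A, b, C, e

def f(x, Q, c):  return 0.5 * x @ Q @ x + c @ x      # objective value
def g(x, A, b):  return A @ x - b                    # inequality residuals (feasible when <= 0)
def h(x, C, e):  return C @ x - e                    # equality residuals (feasible when == 0)
```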
In the context of unconstrained or regularized problems, related Descent-Net architectures target minimizers of smooth objectives over $\mathbb{R}^n$, phrased as a bi-level problem: the outer level trains the model to minimize a downstream loss measuring distance to (approximate) optima, while the inner level is the minimization of the objective itself for a given instance (Hadou et al., 2023).
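Under illustrative notation (the symbols $w$, $\Phi_w$, $z$, and $\ell$ below are not taken verbatim from the paper), the bi-level structure can be written as
$$\min_{w}\ \mathbb{E}_{z}\big[\,\ell\big(\Phi_w(z),\,x^\star(z)\big)\big] \quad \text{s.t.} \quad x^\star(z) \in \arg\min_{x\in\mathbb{R}^n} f(x;\,z),$$
where $z$ denotes a problem instance, $\Phi_w$ the unrolled network with weights $w$, and $\ell$ the downstream loss against the (approximate) optimum $x^\star(z)$.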
2. Descent-Net Architecture
The core of the Descent-Net approach resides in its stack of “Descent Modules,” each engineered to compute a feasible descent direction and update the solution accordingly. At stage $k$, with current iterate $x^k$, the module produces a direction $d^k$ via an unrolled, trainable projected subgradient solver and performs the update $x^{k+1} = x^k + \alpha_k d^k$. Each module consists of several “Descent Layers” that iteratively solve a penalized direction-finding subproblem, in which the linearized objective is minimized together with a penalty on linearized constraint violations. Subgradients, learned linear transformations, and non-linearities (notably ReLU) are composed at each layer, followed by a projection step. All transformation matrices, biases, per-layer step sizes, and module-specific scalars are treated as trainable network parameters (Zhou et al., 12 Dec 2025).
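A minimal PyTorch sketch of one Descent Module under these assumptions is given below; the module takes the objective gradient and the equality-constraint Jacobian at the current iterate, and the layer count, ReLU parameterization, and null-space projection are illustrative simplifications rather than the paper's exact architecture:

```python
import torch
import torch.nn as nn

class DescentModule(nn.Module):
    """One Descent-Net module (sketch): a few trainable 'Descent Layers' refine a direction d,
    which is then projected to be orthogonal to the equality-constraint gradients."""
    def __init__(self, n, num_layers=3):
        super().__init__()
        self.layers = nn.ModuleList([nn.Linear(n, n) for _ in range(num_layers)])
        self.step_logit = nn.Parameter(torch.zeros(()))   # learned shrinkage scalar (used via sigmoid)

    def forward(self, grad_f, jac_h):
        # Initialize the direction from the negative objective gradient.
        d = -grad_f
        for layer in self.layers:
            d = d + torch.relu(layer(d))                  # learned linear map + ReLU refinement
        # Project d onto the null space of the equality Jacobian so that jac_h @ d = 0.
        JJt = jac_h @ jac_h.T
        lam = torch.linalg.solve(JJt, jac_h @ d)
        d = d - jac_h.T @ lam
        return d, torch.sigmoid(self.step_logit)          # direction and shrinkage factor in (0, 1)
```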
In the unrolled network context, Descent-Net is realized as an $L$-layer architecture in which each layer $l = 1, \dots, L$ applies a differentiable, trainable map to the current iterate, with injected Gaussian noise for decorrelation across layers (Hadou et al., 2023).
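A sketch of such a noise-injected unrolled layer is shown below; the layer body and noise scale are placeholders, since the actual map in Hadou et al. depends on the specific unrolled algorithm (e.g., LISTA):

```python
import torch
import torch.nn as nn

class NoisyUnrolledLayer(nn.Module):
    """One layer of an L-layer unrolled network with additive Gaussian noise injection (sketch)."""
    def __init__(self, n, noise_std=0.01):
        super().__init__()
        self.lin = nn.Linear(n, n)      # placeholder trainable map; real unrollings use
        self.noise_std = noise_std      # algorithm-specific updates instead of this residual block

    def forward(self, x):
        x = x + torch.tanh(self.lin(x))                        # differentiable layerwise map
        if self.training:
            x = x + self.noise_std * torch.randn_like(x)       # injected Gaussian noise
        return x
```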
3. Constraint Handling, Objective Design, and Feasibility Guarantees
Feasibility and optimality are enforced through explicit penalization in the training loss, which combines the objective value with a ReLU-term penalizing violated inequalities and a norm term penalizing equality residuals. Crucially, module-wise updates are constructed to leave equality constraints invariant to first order by enforcing $\nabla h(x^k;\theta)^\top d^k = 0$, and the step size is selected to guarantee non-violation of the linearized inequalities.
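A minimal sketch of such a penalized training loss, assuming the residual conventions above and quadratic penalties with hypothetical weights `lam_g`, `lam_h`:

```python
import torch

def penalized_loss(f_val, g_res, h_res, lam_g=10.0, lam_h=10.0):
    """Training loss (sketch): objective value plus penalties on constraint violations.
    g_res: inequality residuals (feasible when <= 0); h_res: equality residuals.
    The quadratic penalties and the weights lam_g, lam_h are illustrative choices."""
    ineq_pen = torch.relu(g_res).square().sum(dim=-1)   # only violated inequalities contribute
    eq_pen = h_res.square().sum(dim=-1)                 # any nonzero equality residual is penalized
    return (f_val + lam_g * ineq_pen + lam_h * eq_pen).mean()
```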
The maximum admissible step is the largest step length for which every linearized inequality remains satisfied,
$$\bar{\alpha}_k = \min_{i:\ \nabla g_i(x^k;\theta)^\top d^k > 0} \frac{-\,g_i(x^k;\theta)}{\nabla g_i(x^k;\theta)^\top d^k},$$
and the actual step $\alpha_k = \sigma(\beta_k)\,\bar{\alpha}_k$ employs a learned shrinkage factor, obtained by applying the sigmoid function $\sigma$ to a trainable scalar $\beta_k$. This ensures that for sufficiently small steps, all active constraints remain feasible under the linearized update (Zhou et al., 12 Dec 2025).
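A sketch of this step-size safeguard under the linearized-inequality convention above (the ratio test and the single learned logit are a simplified reading of the rule; the paper's exact parameterization may differ):

```python
import torch

def feasible_step(x, d, g_res, g_jac, step_logit, eps=1e-8):
    """Largest step keeping linearized inequalities satisfied, shrunk by a learned sigmoid factor.
    g_res = g(x) (<= 0 at a feasible x), g_jac = Jacobian of g at x, d = descent direction."""
    slope = g_jac @ d                                   # directional derivative of each g_i along d
    increasing = slope > eps                            # only constraints moving toward violation bind
    if increasing.any():
        alpha_max = (-g_res[increasing] / slope[increasing]).min()
    else:
        alpha_max = torch.tensor(1.0)                   # illustrative cap when no constraint binds
    alpha = torch.sigmoid(step_logit) * alpha_max       # learned shrinkage factor in (0, 1)
    return x + alpha * d
```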
In stochastically-descending unrolled networks, constraint enforcement is accomplished via explicit descent constraints (e.g., on the gradient norm or the distance to the optimum), which require each layer to decrease the chosen metric, in expectation over the training batch, relative to the previous layer. These constraints are imposed as Lagrangian constraints during training, with primal-dual optimization of the network weights and Lagrange multipliers (Hadou et al., 2023).
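A schematic primal-dual update enforcing such layerwise descent constraints in expectation; the metric, margin `eps`, learning rates, and the use of the final-layer metric as the primal objective are illustrative choices in a generic constrained-learning sketch, not the authors' exact training procedure:

```python
import torch

def primal_dual_step(metrics, weights_opt, lambdas, eta_dual=0.01, eps=0.05):
    """One primal-dual step (sketch) enforcing layerwise descent constraints in expectation.
    metrics: list of per-layer batch-averaged descent metrics m_l (e.g., gradient norms),
    so each constraint reads m_{l+1} <= (1 - eps) * m_l.
    lambdas: nonnegative multiplier tensor of shape (L-1,), updated outside weights_opt."""
    violations = torch.stack([metrics[l + 1] - (1 - eps) * metrics[l]
                              for l in range(len(metrics) - 1)])
    # Primal: minimize the final-layer metric plus the Lagrangian term over the network weights.
    lagrangian = metrics[-1] + (lambdas.detach() * violations).sum()
    weights_opt.zero_grad()
    lagrangian.backward()
    weights_opt.step()
    # Dual: gradient ascent on the multipliers, projected onto the nonnegative orthant.
    with torch.no_grad():
        lambdas.add_(eta_dual * violations.detach()).clamp_(min=0.0)
    return lagrangian.item()
```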
4. Theoretical Properties and Convergence
Descent-Net's design aims for both feasibility preservation and objective-value improvement. In the constrained setting, enforcing each direction $d^k$ to satisfy constraint-orthogonality ($\nabla h(x^k;\theta)^\top d^k = 0$) and updating with provably feasible steps ensures the trajectory remains within the feasible set and achieves descent in the objective, subject to the regularity and margin conditions stated above (Zhou et al., 12 Dec 2025).
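The first-order equality-invariance argument is a standard Taylor-expansion step (restated here for completeness rather than quoted from the paper): since $\nabla h(x^k;\theta)^\top d^k = 0$,
$$h\big(x^k + \alpha_k d^k;\theta\big) = h(x^k;\theta) + \alpha_k \nabla h(x^k;\theta)^\top d^k + O(\alpha_k^2) = h(x^k;\theta) + O(\alpha_k^2),$$
so equality residuals are unchanged to first order, while a descent direction with $\nabla f(x^k;\theta)^\top d^k < 0$ decreases the objective for sufficiently small $\alpha_k$.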
For stochastically-descending architectures, the methodology guarantees, under mild regularity and absence of distribution shift, layerwise in-expectation descent of critical metrics (e.g., gradient norm, sub-optimality distance). Constrained learning theory shows that primal-dual training converges to solutions that are near-optimal and nearly feasible, with explicit finite-sample generalization bounds whose error terms arise from finite data and constraint violation rates (Hadou et al., 2023).
5. Empirical Evaluation and Applications
Descent-Net demonstrates substantial empirical performance on a diverse set of optimization tasks (Zhou et al., 12 Dec 2025):
- Convex quadratic programs (QPs): For problems with up to 5000 variables, Descent-Net achieves small relative objective errors with runtimes orders of magnitude faster than OSQP (e.g., 0.01 s vs. 2–100 s for the larger problems).
- Nonconvex QPs: Achieves low relative objective error at runtimes competitive with, and often faster than, IPOPT (0.014 s vs. 0.34 s on the reported instances).
- Portfolio optimization: For up to 4000 assets, yields small relative errors with batch runtimes under 2 ms, significantly outpacing classical solvers.
- AC optimal power flow (OPF): On 30- and 118-bus models with nonlinear equality constraints, achieves small relative errors in 0.04–0.16 s, compared to 0.29–0.64 s for the physics-based PYPOWER solver.
Stochastically-descending Descent-Nets exhibit improved robustness and steady descent in practical tasks such as sparse coding (LISTA) and image restoration (GLOW-Prox), outperforming unconstrained unrolled networks in the presence of noise and distributional perturbations (Hadou et al., 2023).
6. Scalability, Generalization, and Limitations
Descent-Net exhibits strong scalability, supporting problem sizes with several thousand variables and constraints (Zhou et al., 12 Dec 2025). The architecture is instance-agnostic, relying on shared parameters across all problem instances. A key empirical finding is its generalization capability: it maintains performance and feasibility for larger problem instances than encountered during training.
The architecture accommodates nonlinear equality constraints, such as those arising in AC-OPF, by exploiting an equality-elimination preprocessing step. However, the following limitations are identified:
- Performance degrades on highly ill-conditioned problems due to approximation difficulty for the learned mapping.
- For nonconvex constraints, descent is local and no global optimality guarantee is provided.
- The current projection and step-size computation presumes linear or well-behaved constraints; more sophisticated handling may be necessary for arbitrary nonconvex settings.
A summary of validated tasks and scaling behavior is provided:
| Problem Type | Size | Error | Runtime (s) | Baseline (s) |
|---|---|---|---|---|
| Convex QP | 100–5000 variables | – | 0.01 | 2–100 (OSQP) |
| Nonconvex QP | 100 variables | – | 0.014 | 0.34 (IPOPT) |
| Portfolio Opt. | 100–4000 assets | – | 0.002 | 0.6 (OSQP) |
| AC-OPF | 30–118 buses | – | 0.04–0.16 | 0.29–0.64 (PYPOWER) |
7. Relationship to Broader Literature and Practical Considerations
Descent-Net integrates the geometric interpretability of feasible-direction and projected subgradient optimization methods with the representational benefits of deep learning. Stochastically-descending Descent-Nets formalize layer-wise descent constraints within unrolled architectures and provide a framework for robust, distributionally-stable learning-to-optimize across signal processing and inverse problems (Hadou et al., 2023).
Key practical considerations include the level of injected noise regularization, the primal-dual learning rates, the adaptive step-size parameterization, and the choice of unroll depth for the target problem. Empirical results highlight the benefit of explicit constraint enforcement for both standard and adversarially perturbed instances.
Descent-Net constitutes a significant contribution to scalable, constraint-aware learning-to-optimize frameworks, with rigorous convergence and robustness guarantees in both classical constrained and deep unrolling paradigms.