
Descent-Net: Neural Optimization for Constraints

Updated 16 December 2025
  • Descent-Net is a neural network framework that embeds trainable descent modules to compute feasible descent directions for constrained optimization.
  • It employs unrolled, trainable projected subgradient solvers with adaptive step sizes and explicit penalty mechanisms to maintain constraint satisfaction.
  • Empirical results demonstrate fast runtimes and high accuracy across diverse applications like convex QPs, portfolio optimization, and AC optimal power flow.

Descent-Net is a neural network-based framework designed for learning efficient descent directions to solve constrained optimization problems, with a central focus on executing updates that improve objective values while preserving feasibility. It achieves this by embedding iterative optimization procedures and constraint preservation within modular, trainable network structures. Two distinct lines of Descent-Net research exist: one line focuses on constrained optimization using projection and penalty mechanisms (Zhou et al., 12 Dec 2025), and another on enforcing layer-wise stochastic descent within deep unrolled architectures to guarantee robustness and convergence (Hadou et al., 2023).

1. Constrained Optimization Problem Formulation

Descent-Net addresses parametric constrained optimization problems of the form

\begin{aligned} &\min_{y\in\mathbb{R}^n}\; f_x(y), \\ &\text{s.t.}\quad g_x(y)\le 0\in\mathbb{R}^l,\quad h_x(y)=0\in\mathbb{R}^m, \end{aligned}

where $x\in\mathcal X\subset\mathbb R^p$ encodes instance-specific data and $f_x, g_x, h_x$ are continuously differentiable in $y$. The feasible set $\mathcal C$ is assumed non-empty, the sublevel sets of $f_x$ are bounded, the Linear Independence Constraint Qualification (LICQ) holds at all feasible points, and all gradients involved are uniformly Lipschitz. A nontrivial margin is enforced on inactive inequalities so that local feasibility can be robustly preserved (Zhou et al., 12 Dec 2025).
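
To make the problem class concrete, the sketch below builds one hypothetical parametric convex QP instance of this form; `make_instance`, its matrices, and the way $x$ enters the objective and constraints are illustrative choices, not details taken from the paper.

```python
import numpy as np

def make_instance(x, n=4, l=2, m=1, seed=0):
    """Hypothetical parametric instance (f_x, g_x, h_x) of the problem class above.

    The parameter vector x shifts the linear objective term and the
    constraint offsets; Q, A, E are fixed for the problem family.
    """
    rng = np.random.default_rng(seed)
    Q = rng.standard_normal((n, n))
    Q = Q @ Q.T + n * np.eye(n)                 # positive-definite quadratic term
    A = rng.standard_normal((l, n))             # inequality constraint matrix
    E = rng.standard_normal((m, n))             # equality constraint matrix

    f = lambda y: 0.5 * y @ Q @ y + x[:n] @ y   # objective f_x(y)
    g = lambda y: A @ y - x[n:n + l]            # inequality residuals, g_x(y) <= 0
    h = lambda y: E @ y - x[n + l:]             # equality residuals, h_x(y) = 0
    grad_f = lambda y: Q @ y + x[:n]
    return f, g, h, grad_f, A, E

# One instance, parameterized by x, evaluated at y = 0
x = np.ones(4 + 2 + 1)
f, g, h, grad_f, A, E = make_instance(x)
y0 = np.zeros(4)
print(f(y0), g(y0), h(y0))
```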

In the context of unconstrained or regularized problems, related Descent-Net architectures target minimizers of smooth mappings $f(\cdot;x)$ over $\mathbb{R}^d$, phrased as a bi-level optimization: the outer problem trains the model to minimize a downstream loss at (approximate) optima, and the inner problem optimizes the function $f$ given $x$ (Hadou et al., 2023).

2. Descent-Net Architecture

The core of the Descent-Net approach is a stack of “Descent Modules,” each engineered to compute a feasible descent direction and update the solution accordingly. The update performed by the module at stage $s$, with current iterate $y_s$, is

y_{s+1} = y_s + \alpha_s d_s,

where the direction $d_s$ is produced by an unrolled, trainable projected subgradient solver. Each module contains $K$ “Descent Layers,” which iteratively solve the penalized subproblem

\min_{d\in\mathcal D}\ \nabla f_x(y_s)^\top d + \sum_{j=1}^l c_j\max\{\nabla g_{x,j}(y_s)^\top d,\ -M g_{x,j}(y_s)\}

with $\mathcal D = \{d:\,\|d\|\le1,\,\nabla h_x(y_s)^\top d = 0\}$. Each layer composes subgradients, learned linear transformations, and non-linearities (notably ReLU), followed by projection onto $\mathcal D$. All transformation matrices, biases, per-layer step sizes, and module-specific scalars are treated as trainable network parameters (Zhou et al., 12 Dec 2025).
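
A minimal NumPy sketch of one such Descent Layer under this description is given below; `project_onto_D`, `descent_layer`, and the parameters `W`, `b`, `c`, `M`, `eta` are hypothetical stand-ins for the trainable quantities, so this is an illustrative reconstruction rather than the authors' exact layer.

```python
import numpy as np

def project_onto_D(d, Jh):
    """Project onto D = {d : ||d|| <= 1, grad h_x(y_s)^T d = 0}; Jh is the m x n Jacobian of h_x."""
    if Jh.size:
        P = np.eye(d.size) - np.linalg.pinv(Jh) @ Jh   # orthogonal projector onto null(Jh)
        d = P @ d
    nrm = np.linalg.norm(d)
    return d / nrm if nrm > 1.0 else d

def descent_layer(d, grad_f, g_val, Jg, Jh, W, b, c, M, eta):
    """One unrolled projected-subgradient step on the penalized subproblem.

    Takes a subgradient of  grad_f^T d + sum_j c_j max{Jg_j d, -M g_j}  with
    respect to d, passes it through a learned affine map plus ReLU, steps,
    and projects back onto D.
    """
    active = (Jg @ d > -M * g_val).astype(float)       # which branch of each max term is active
    subgrad = grad_f + Jg.T @ (c * active)             # a subgradient w.r.t. d
    update = np.maximum(W @ subgrad + b, 0.0)          # learned transform + ReLU
    return project_onto_D(d - eta * update, Jh)

# K unrolled layers produce the module's direction d_s (toy dimensions)
n, l, m, K = 4, 2, 1, 3
rng = np.random.default_rng(1)
grad_f, g_val = rng.standard_normal(n), -np.abs(rng.standard_normal(l))
Jg, Jh = rng.standard_normal((l, n)), rng.standard_normal((m, n))
d = np.zeros(n)
for _ in range(K):
    d = descent_layer(d, grad_f, g_val, Jg, Jh,
                      W=np.eye(n), b=np.zeros(n), c=np.ones(l), M=10.0, eta=0.5)
print(d, Jh @ d)   # direction in the unit ball, orthogonal to grad h up to numerical error
```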

In the unrolled network context, Descent-Net is realized as an $L$-layer architecture, where each layer performs a differentiable map with injected Gaussian noise for decorrelation. The layerwise maps are

y_l = \phi_l(y_{l-1},x;W_l) + n_l, \quad n_l\sim\mathcal N(0,\sigma_l^2 I)

(Hadou et al., 2023).
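
A minimal sketch of one such noisy layer follows; the affine-plus-ReLU choice for $\phi_l$ and the toy dimensions are illustrative assumptions, since in practice $\phi_l$ is whatever differentiable update the unrolled algorithm prescribes.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_layer(y_prev, x, W, sigma):
    """y_l = phi_l(y_{l-1}, x; W_l) + n_l,  with  n_l ~ N(0, sigma_l^2 I).

    phi_l is modeled here as an affine map of [y_{l-1}; x] followed by ReLU.
    """
    z = np.concatenate([y_prev, x])
    y = np.maximum(W @ z, 0.0)                         # placeholder differentiable map
    return y + sigma * rng.standard_normal(y.shape)    # injected Gaussian noise

# Unroll L layers with per-layer weights and noise levels
d_y, d_x, L = 3, 2, 4
Ws = [rng.standard_normal((d_y, d_y + d_x)) for _ in range(L)]
sigmas = [0.1] * L
y, x = np.zeros(d_y), np.ones(d_x)
for W, s in zip(Ws, sigmas):
    y = noisy_layer(y, x, W, s)
print(y)
```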

3. Constraint Handling, Objective Design, and Feasibility Guarantees

Feasibility and optimality are enforced through explicit penalization in the training loss

\ell(y) = f_x(y) + \lambda_g\|\mathrm{ReLU}(g_x(y))\|_1 + \lambda_h\|h_x(y)\|_1,

where the ReLU term penalizes violated inequalities and the $\ell_1$ term on $h_x$ penalizes equality-constraint violations. Crucially, module-wise updates are constructed to leave equality constraints invariant to first order by enforcing $d_s\in\mathcal D$, and the step size $\alpha_s$ is selected to guarantee that the linearized inequalities are not violated.
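
Before turning to the step-size rule, here is a minimal sketch of this training loss; the penalty weights `lam_g` and `lam_h` are illustrative values, not ones reported in the paper.

```python
import numpy as np

def penalized_loss(y, f, g, h, lam_g=10.0, lam_h=10.0):
    """l(y) = f_x(y) + lam_g * ||ReLU(g_x(y))||_1 + lam_h * ||h_x(y)||_1."""
    return (f(y)
            + lam_g * np.sum(np.maximum(g(y), 0.0))   # only violated inequalities contribute
            + lam_h * np.sum(np.abs(h(y))))           # l1 penalty on equality residuals
```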

The maximum admissible step is

\alpha_{\max} = \min_{j:\,\nabla g_{x,j}(y_s)^\top d_s>0} \frac{-g_{x,j}(y_s)}{\nabla g_{x,j}(y_s)^\top d_s}

and the actual step

\alpha_s = \sigma(\beta_s)\,\alpha_{\max}

employs a learned shrinkage factor $\sigma(\beta_s)$ obtained from the sigmoid function. This ensures that, for sufficiently small $\alpha_s$, all active constraints remain feasible under the linearized update (Zhou et al., 12 Dec 2025).
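
A sketch of this step-size rule follows, assuming the current iterate is feasible; the fallback cap used when no inequality increases along the direction, and the value of the learned scalar `beta`, are assumptions made for illustration.

```python
import numpy as np

def feasible_step(y, d, g_val, Jg, beta, alpha_cap=1.0):
    """y_{s+1} = y_s + sigma(beta) * alpha_max * d, with alpha_max from the linearized inequalities.

    alpha_max = min over {j : Jg_j d > 0} of  -g_j(y) / (Jg_j d);
    if no inequality increases along d, the step is capped at alpha_cap (an assumption).
    """
    slopes = Jg @ d
    rising = slopes > 0
    alpha_max = np.min(-g_val[rising] / slopes[rising]) if np.any(rising) else alpha_cap
    alpha = alpha_max / (1.0 + np.exp(-beta))          # sigmoid(beta) shrinkage
    return y + alpha * d, alpha

# Toy usage: strictly feasible point (g <= 0) and a random direction
rng = np.random.default_rng(2)
y, d = np.zeros(4), rng.standard_normal(4)
Jg = rng.standard_normal((2, 4))
g_val = np.array([-0.5, -1.0])
y_next, alpha = feasible_step(y, d, g_val, Jg, beta=0.0)
print(alpha, g_val + Jg @ (y_next - y))                # linearized inequalities stay <= 0
```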

In stochastically descending unrolled networks, descent is enforced through explicit layerwise constraints (e.g., on the gradient norm or the distance to the optimum), imposed as Lagrangian constraints during training over the batch expectation,

\mathbb{E}[C_l(y_l, y_{l-1})] \le 0,

with primal-dual optimization of the network weights and Lagrange multipliers (Hadou et al., 2023).
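
A structural sketch of one primal-dual update is shown below; the flattened weight vector, the learning rates, and the way batch gradients are supplied are assumptions, and in practice the primal step would be taken by an autodiff optimizer over the network weights.

```python
import numpy as np

def primal_dual_step(w, lambdas, loss_grad, C_vals, C_grads, eta_w=1e-3, eta_l=1e-2):
    """One step on the Lagrangian  E[loss] + sum_l lambda_l * E[C_l(y_l, y_{l-1})].

    loss_grad, C_grads: batch-estimated gradients w.r.t. the (flattened) weights w;
    C_vals: batch means of the descent constraints C_l.
    """
    # Primal: gradient descent on the Lagrangian in the weights
    lag_grad = loss_grad + sum(lam * gc for lam, gc in zip(lambdas, C_grads))
    w = w - eta_w * lag_grad
    # Dual: projected gradient ascent keeps the multipliers nonnegative
    lambdas = np.maximum(lambdas + eta_l * np.asarray(C_vals), 0.0)
    return w, lambdas

# Toy usage with flattened weights and two layerwise constraints
rng = np.random.default_rng(3)
w, lambdas = rng.standard_normal(10), np.zeros(2)
loss_grad = rng.standard_normal(10)
C_vals = [0.3, -0.1]                                 # one violated, one satisfied constraint
C_grads = [rng.standard_normal(10) for _ in range(2)]
w, lambdas = primal_dual_step(w, lambdas, loss_grad, C_vals, C_grads)
print(lambdas)                                       # multiplier grows only for the violated constraint
```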

4. Theoretical Properties and Convergence

Descent-Net's design aims for both feasibility preservation and objective improvement. In the constrained setting, enforcing each $d_s$ to satisfy constraint orthogonality and updating with provably feasible steps ensures that the trajectory remains within the feasible set and achieves descent in the objective, subject to the regularity and margin conditions stated above (Zhou et al., 12 Dec 2025).

For stochastically descending architectures, the methodology guarantees, under mild regularity and in the absence of distribution shift, layerwise in-expectation descent of critical metrics (e.g., the gradient norm or the distance to the optimum). Constrained learning theory shows that primal-dual training converges to solutions that are near-optimal and nearly feasible, with explicit finite-sample generalization bounds:

\mathbb{E}[\|\nabla f(y_L; x)\|] \le (1-\delta)^L (1-\epsilon)^L\, \mathbb{E}\|\nabla f(y_0;x)\| + O(\zeta(N,\delta)),

where the error terms arise from finite data and constraint violation rates (Hadou et al., 2023).
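
As a rough numeric illustration of how the layerwise rates compound in this bound, with hypothetical values of $\delta$, $\epsilon$, and $L$ that are not figures from the paper:

```python
# Hypothetical rates; illustrates only the geometric decay term of the bound.
delta, epsilon, L = 0.05, 0.02, 20
decay = (1 - delta) ** L * (1 - epsilon) ** L
print(f"the expected gradient norm contracts by a factor of about {decay:.2f} over {L} layers")
```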

5. Empirical Evaluation and Applications

Descent-Net demonstrates substantial empirical performance on a diverse set of optimization tasks (Zhou et al., 12 Dec 2025):

  • Convex quadratic programs (QPs): For problems with up to $n=5000$ variables, Descent-Net achieves objective errors in the $10^{-3}$–$10^{-4}$ range, with runtimes $30\times$–$100\times$ faster than OSQP (e.g., 0.01 s vs. 2–100 s for larger problems).
  • Nonconvex QPs: Achieves a relative objective error of $2.3\times10^{-4}$ at runtimes competitive with, and often faster than, IPOPT.
  • Portfolio optimization: For up to 4000 assets, yields errors near $10^{-6}$ and batch runtimes under 2 ms, significantly outpacing classical solvers.
  • AC optimal power flow (OPF): On 30- and 118-bus models with nonlinear equality constraints, achieves errors of $3\times10^{-4}$ in 0.04–0.16 s, compared to 0.29–0.64 s for the physics-based PYPOWER solver.

Stochastically-descending Descent-Nets exhibit improved robustness and steady descent in practical tasks such as sparse coding (LISTA) and image restoration (GLOW-Prox), outperforming unconstrained unrolled networks in the presence of noise and distributional perturbations (Hadou et al., 2023).

6. Scalability, Generalization, and Limitations

Descent-Net exhibits strong scalability, supporting problem sizes with several thousand variables and constraints (Zhou et al., 12 Dec 2025). The architecture is instance-agnostic, relying on shared parameters across all problem instances. A key empirical finding is its generalization capability: it maintains performance and feasibility for larger problem instances than encountered during training.

The architecture accommodates nonlinear equality constraints, as in AC-OPF, via an equality-elimination preprocessing step. However, the following limitations are identified:

  • Performance degrades on highly ill-conditioned problems due to approximation difficulty for the learned mapping.
  • For nonconvex constraints, descent is local and no global optimality guarantee is provided.
  • The current projection and step-size computation presumes linear or well-behaved constraints; more sophisticated handling may be necessary for arbitrary nonconvex settings.

A summary of validated tasks and scaling behavior is provided:

| Problem Type | Size ($n$) | Error | Runtime (s) | Baseline (s) |
| --- | --- | --- | --- | --- |
| Convex QP | 100–5000 | $10^{-3}$–$10^{-4}$ | 0.01 | 2–100 (OSQP) |
| Nonconvex QP | 100 | $2.3\times 10^{-4}$ | 0.014 | 0.34 (IPOPT) |
| Portfolio Opt. | 100–4000 | $10^{-6}$ | 0.002 | 0.6 (OSQP) |
| AC-OPF | 30–118 bus | $3\times 10^{-4}$ | 0.04–0.16 | 0.29–0.64 (PYPOWER) |

7. Relationship to Broader Literature and Practical Considerations

Descent-Net integrates the geometric interpretability of feasible-direction and projected subgradient optimization methods with the representational benefits of deep learning. Stochastically-descending Descent-Nets formalize layer-wise descent constraints within unrolled architectures and provide a framework for robust, distributionally-stable learning-to-optimize across signal processing and inverse problems (Hadou et al., 2023).

Key practical considerations include noise regularization, the choice of primal-dual learning rates, adaptive step sizes, and an unroll depth appropriate to the target problem. Empirical results highlight the benefit of explicit constraint enforcement for both standard and adversarially perturbed instances.

Descent-Net constitutes a significant contribution to scalable, constraint-aware learning-to-optimize frameworks, with rigorous convergence and robustness guarantees in both classical constrained and deep unrolling paradigms.
