Input Convex Neural Networks

Updated 14 April 2026
  • Input Convex Neural Networks are a specialized class of feedforward models that enforce convexity through nonnegative weight constraints and convex, nondecreasing activations.
  • They enable tractable, globally optimal solutions by guaranteeing that designated inputs produce convex outputs, which is crucial for optimization, control, and inverse problem applications.
  • ICNNs require tailored training and initialization strategies and have demonstrated superior convergence and reliability compared to traditional nonconvex architectures.

Input Convex Neural Networks (ICNNs) are a specialized class of feedforward neural architectures that enforce convexity with respect to some or all of their inputs by design. Developed to bridge the expressivity of deep networks with the tractability and regularity of convex optimization, ICNNs have become foundational in modern data-driven optimization, control, inverse problems, and physics-inspired modeling. Their defining feature is a set of architectural and parametric constraints guaranteeing that the output is a convex function of designated input variables, which ensures global optima for input inference and enables their use as function surrogates in structured convex programs.

1. Formal Definition and Architectural Principles

Let $f_\theta\colon \mathbb{R}^d \rightarrow \mathbb{R}$ be a scalar-valued function modeled by a feed-forward network parameterized by $\theta$. $f_\theta$ is convex in $x$ if, for any $x, y \in \mathbb{R}^d$ and $t \in [0,1]$,

$$f_\theta(t x + (1-t) y) \leq t f_\theta(x) + (1-t) f_\theta(y).$$

An Input Convex Neural Network (ICNN) guarantees this property via layerwise constraints. For an $L$-layer ICNN with input $x$,

$$\begin{aligned}
z_0 &= x \\
z_1 &= \sigma_0(W_0 x + b_0) \\
z_{k+1} &= \sigma_k(W_k z_k + U_k x + b_k), \quad k = 1, \ldots, L-1 \\
f(x) &= z_{L+1} = \sigma_L(W_L z_L + U_L x + b_L)
\end{aligned}$$

where

  • $W_k \geq 0$ (entrywise) for $k = 1, \ldots, L$ (and optionally for $W_0$),
  • the pass-through weights $U_k$ are unconstrained for $k = 1, \ldots, L$,
  • each $\sigma_k$ is convex and nondecreasing (e.g., ReLU, leaky-ReLU, softplus).

Convexity follows by induction: each $z_k$ is a convex function of $x$, and compositions with convex, nondecreasing activations preserve convexity. The same approach extends to partially input-convex networks (PICNNs), where convexity is enforced only with respect to a subset of inputs (Amos et al., 2016, Parolini et al., 2024).
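
To make the recursion concrete, the following is a minimal PyTorch sketch of a fully input-convex network. The layer sizes, the softplus activation, and the softplus-based nonnegative weight parametrization are illustrative choices, not mandated by the formulation above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ICNN(nn.Module):
    """Minimal fully input-convex network following the layer recursion above."""
    def __init__(self, dim_in, dim_hidden, n_layers):
        super().__init__()
        # First layer: unconstrained affine map of x.
        self.W0 = nn.Linear(dim_in, dim_hidden)
        # Hidden-to-hidden weights W_k: stored unconstrained, mapped through
        # softplus in forward() so the effective weights are nonnegative.
        self.Wz = nn.ParameterList(
            [nn.Parameter(torch.randn(dim_hidden, dim_hidden) * 0.1)
             for _ in range(n_layers - 1)])
        # Pass-through weights U_k acting directly on x: no sign constraint.
        self.Ux = nn.ModuleList(
            [nn.Linear(dim_in, dim_hidden) for _ in range(n_layers - 1)])
        self.Wout = nn.Parameter(torch.randn(1, dim_hidden) * 0.1)
        self.Uout = nn.Linear(dim_in, 1)

    def forward(self, x):
        z = F.softplus(self.W0(x))  # convex in x: affine map + convex activation
        for Wz, Ux in zip(self.Wz, self.Ux):
            # softplus(Wz) >= 0, so the preactivation is a nonnegative
            # combination of convex functions plus an affine term; softplus is
            # convex and nondecreasing, so the composition stays convex.
            z = F.softplus(z @ F.softplus(Wz).T + Ux(x))
        return z @ F.softplus(self.Wout).T + self.Uout(x)

# Quick numerical sanity check of Jensen's inequality on random points:
f = ICNN(4, 64, 3)
x, y, t = torch.randn(8, 4), torch.randn(8, 4), 0.3
assert (f(t * x + (1 - t) * y) <= t * f(x) + (1 - t) * f(y) + 1e-5).all()
```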

2. Parameter Constraints and Convexity Guarantees

Core to the ICNN is the parametric structure:

  • Nonnegativity of the hidden-layer weights $W_1, \ldots, W_L$ ensures that each hidden layer’s preactivation is a nonnegative weighted sum of convex functions, preserving convexity.
  • Convex, nondecreasing activations (e.g. ReLU, leaky-ReLU, softplus, ELU) are required. These may be chosen for additional regularization properties or smoothness (Amos et al., 2016, Parolini et al., 2024).

Parameter enforcement is typically achieved via two methods: (a) post-gradient-step clipping ($W_k \leftarrow \max(W_k, 0)$ entrywise), or (b) a nonnegative parametrization (e.g. $W_k = g(V_k)$ for a fixed nonnegative map $g$ such as softplus, where $V_k$ is unconstrained) (Parolini et al., 2024).

Where strong convexity is desired (e.g., in modeling mirror potentials), a positive definite quadratic term may be added (Tan et al., 2022).
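
The sketch below illustrates both enforcement methods and the optional strongly convex quadratic term. The helper names (`clip_nonnegative`, `nonneg`, `strongly_convex`) and the choice of softplus are illustrative assumptions, not APIs from the cited works.

```python
import torch
import torch.nn.functional as F

# (a) Post-gradient-step clipping: project weights back onto W >= 0.
#     Applies when the hidden weights are stored directly (no reparametrization).
def clip_nonnegative(weights):
    with torch.no_grad():
        for W in weights:
            W.clamp_(min=0.0)

# (b) Nonnegative parametrization: store V freely, apply a nonnegative map
#     inside forward(); exp(V) or V**2 are common alternatives to softplus.
def nonneg(V):
    return F.softplus(V)

# Optional strong convexity: add a positive definite quadratic term (mu > 0),
# as used e.g. when modeling mirror potentials.
def strongly_convex(f, x, mu=1e-2):
    return f(x) + 0.5 * mu * (x * x).sum(dim=-1, keepdim=True)
```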

3. Theoretical Expressivity and Limitations

ICNNs are universal approximators of continuous convex functions. Any Lipschitz convex function on a compact domain can be approximated arbitrarily well by an ICNN with sufficient width and depth, provided nonnegative weights and convex, nondecreasing activations (Amos et al., 2016, Chen et al., 2018).

However, ICNNs represent only a subset of the convex functions implementable with ReLU multi-layer perceptrons (MLPs): for 1-hidden-layer ReLU networks, ICNNs suffice to realize any convex continuous piecewise-linear (CPWL) function. At depth $\geq 2$, there exist convex ReLU networks not reparameterizable into the ICNN form due to the structural constraints; ICNNs cover only a small subset of convex ReLU networks in this regime (Gagneux et al., 6 Jan 2025).

ICNNs cannot faithfully approximate nonconvex functions; the minimal sup-norm error can be significant for highly nonconvex targets. They also cannot capture some system structures (e.g., linear time delays) without input expansion or lifting (Sankaranarayanan et al., 2021).

4. Training, Initialization, and Inference

Training follows standard supervised paradigms (e.g., mean-squared error minimization for regression), employing stochastic optimization (Adam, SGD). After each update, nonnegativity constraints are re-imposed as above (Parolini et al., 2024, Chen et al., 2018). For structured learning tasks (e.g., max-margin structured prediction, Q-function learning), the architecture enables convex inference and loss computation (Amos et al., 2016).
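
A minimal training-loop sketch on a synthetic convex regression target, reusing the `ICNN` sketch from Section 1; the closing comment marks where nonnegativity would be re-imposed if weights were stored directly rather than reparametrized.

```python
import torch
import torch.nn.functional as F

model = ICNN(dim_in=4, dim_hidden=64, n_layers=3)  # sketch from Section 1
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
X = torch.randn(1024, 4)
Y = (X ** 2).sum(dim=1, keepdim=True)              # a convex synthetic target

for step in range(1000):
    loss = F.mse_loss(model(X), Y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    # With directly stored weights, re-impose nonnegativity here, e.g.
    # clip_nonnegative(model.Wz); the softplus-parametrized sketch above
    # needs no projection.
```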

Initialization of nonnegative weights deviates from classical methods (e.g. Xavier, He) due to nonzero mean and strictly positive entries. Recent principled schemes adjust mean, variance, and correlations to control signal propagation (e.g. centering preactivation means, using nonnegative log-normal samples), resulting in accelerated learning and obviating the need for skip connections under proper statistical calibration (Hoedt et al., 2023).
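
One plausible instantiation of these ideas, offered as an illustrative sketch rather than the exact recipe of Hoedt et al. (2023): nonnegative log-normal weight samples scaled by fan-in, with biases chosen to center preactivation means on a calibration batch.

```python
import torch

def lognormal_init(weight, fan_in, sigma=0.5):
    # Nonnegative log-normal samples, scaled by fan-in so the preactivation
    # variance stays bounded as width grows (He-style scaling analogue).
    with torch.no_grad():
        weight.copy_(torch.randn_like(weight).mul(sigma).exp() / fan_in)

def center_bias(linear, calib_batch):
    # Set b = -mean(Wx) on a calibration batch, counteracting the nonzero
    # preactivation mean forced by all-positive weights.
    with torch.no_grad():
        linear.bias.copy_(-(calib_batch @ linear.weight.T).mean(dim=0))
```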

Inference exploits convexity: input optimization becomes a convex program, solvable by projected gradient, bundle methods, or converted to linear/quadratic programs as needed (Amos et al., 2016, Christianson et al., 2024). For partially input-convex architectures, convexity guarantees tractable optimization over the targeted variables, even when nonconvexity remains in other coordinates (Mallick et al., 16 May 2025).
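
A minimal sketch of input inference by projected gradient descent over a box constraint: since $f_\theta$ is convex in $x$ and the box is a convex set, with a suitable step size the iterates approach a global minimizer. The bounds, step size, and iteration count are illustrative.

```python
import torch

def minimize_input(f, x0, lo=-1.0, hi=1.0, lr=0.05, steps=500):
    """Minimize the convex function f over the box [lo, hi]^d starting at x0."""
    x = x0.clone().requires_grad_(True)
    for _ in range(steps):
        (grad,) = torch.autograd.grad(f(x).sum(), x)
        with torch.no_grad():
            x = (x - lr * grad).clamp(lo, hi)  # gradient step + box projection
        x.requires_grad_(True)
    return x.detach()
```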

5. Integration in Optimization, Control, and Model Reduction

ICNNs are widely used to learn convex surrogates for control, hybrid optimization, and inverse problems, particularly when global optimality or robust feasibility is required. For instance:

  • In constrained parametric optimization, ICNNs coupled with augmented Lagrangian methods can learn solution mappings that converge to $\varepsilon$-KKT points, empirically rivaling classic QP and ACOPF solvers in optimality gap (0.15–2%) and runtime ($\sim$1 ms) (Liu et al., 7 May 2025).
  • In energy systems and contingency screening, ICNNs enable the data-driven characterization of polyhedral feasible sets, with convex programs enforcing reliability constraints (e.g., zero-false-negative guarantees in $N{-}1$ screening) (Christianson et al., 2024, Mallick et al., 16 May 2025).
  • In model predictive control (MPC), ICNNs enable convex optimization over control inputs, facilitating fast and globally optimal receding-horizon control in physical systems (e.g. MuJoCo benchmarks and large-scale building HVAC), with up to 5× less compute time than nonconvex shooting methods and >10% higher empirical reward or energy savings (Chen et al., 2018, Xu et al., 23 Mar 2026); a minimal sketch follows this list.
  • Recent architectures combine input convexity with modern sequence models: e.g. the Input Convex Encoder-only Transformer (IC-EoT) maintains input convexity across temporal horizons, overcoming gradient instability in recurrent ICNNs and delivering 3–8× faster MPC solution times at similar predictive accuracy (Xu et al., 23 Mar 2026).
  • For structure-preserving physical modeling, convexity ensures mathematical well-posedness of PDEs (e.g. non-Newtonian Stokes flow), as data-driven ICNN surrogates automatically obey monotonicity and growth conditions necessary for unique solvability (Parolini et al., 2024).
  • In nonlinear model reduction for real-time deformable simulation, symmetric ICNNs encode both convexity and physical oddness to ensure stability and plausible generalization under out-of-distribution loads or sparse training (Huang et al., 23 Nov 2025).
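
As referenced in the MPC bullet above, here is a minimal receding-horizon step, assuming a learned cost $f(x, u)$ that is convex in the control $u$ (a partially input-convex network) and reusing `minimize_input` from Section 4; the function name and box bounds are illustrative.

```python
def mpc_step(f, x_t, u0, u_lo, u_hi):
    # With the state x_t frozen, u -> f(x_t, u) is convex, so the
    # projected-gradient routine returns a global minimizer over the box.
    return minimize_input(lambda u: f(x_t, u), u0, lo=u_lo, hi=u_hi)
```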

6. Application to Convex Learning, Inverse Problems, and Optimal Transport

ICNNs are routinely applied to the learning of convex potentials (e.g., in optimal transport), where gradient maps of an ICNN instantiate monotone transport operators. In Wasserstein distance estimation, adversarial training with ICNNs over Kantorovich dual pairs directly yields Brenier maps, with rigorous consistency and uniqueness guarantees—empirically outperforming or matching regularized GAN and classical OT approaches across a range of synthetic and high-dimensional benchmarks (Makkuva et al., 2019).
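
A short sketch of the gradient map: since $f$ is convex, $T = \nabla f$ is monotone and serves as a candidate Brenier-type transport map. Passing `create_graph=True` keeps $T$ differentiable, which is what adversarial training over dual pairs requires.

```python
import torch

def transport_map(f, x):
    """Evaluate T(x) = grad f(x) for a batch of points x."""
    x = x.clone().requires_grad_(True)
    (grad,) = torch.autograd.grad(f(x).sum(), x, create_graph=True)
    return grad  # monotone map, differentiable for adversarial training
```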

In mirror descent and learned optimization, ICNNs serve as expressive, data-driven mirror potentials, enabling the acceleration of iterative optimization with learnable Bregman distances and provable regret bounds. Learned mirror maps on image denoising, inpainting, and large-scale classification outperform classical solvers by one order of magnitude in convergence speed and final accuracy (Tan et al., 2022).
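
A conceptual sketch, assuming a strongly convex ICNN potential $\psi$: the inverse mirror map $\nabla\psi^*$ is computed on the fly by maximizing $\langle x, y\rangle - \psi(x)$ (concave in $x$), whereas Tan et al. instead learn this inverse with a second network; the inner loop here is purely illustrative.

```python
import torch

def grad(f, x):
    x = x.clone().requires_grad_(True)
    (g,) = torch.autograd.grad(f(x).sum(), x)
    return g

def inv_mirror(psi, y, lr=0.1, steps=200):
    # Gradient ascent on the concave problem max_x <x, y> - psi(x),
    # whose maximizer is grad psi*(y) when psi is strongly convex.
    x = y.clone()
    for _ in range(steps):
        x = x + lr * (y - grad(psi, x))
    return x

def mirror_step(loss, psi, x, eta=0.1):
    # Mirror descent: map to dual space, take a gradient step, map back.
    return inv_mirror(psi, grad(psi, x) - eta * grad(loss, x))
```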

In mathematical finance, variants that realize convexity as the supremum of affine forms (max-of-hyperplanes, or log-sum-exp smoothing thereof) are employed for option pricing and path-dependent payoffs, with provable approximation theorems (rates explicit in the input dimension) and observed sub-percent relative errors against Monte Carlo estimators (Lemaire et al., 2024).
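
Both constructions are one-liners: a pointwise maximum of affine functions and its log-sum-exp smoothing, each convex in $x$ by construction. The hyperplane matrix `A` (shape m×d), offsets `b`, and temperature `tau` are illustrative.

```python
import torch

def max_of_affine(A, b, x):
    # max_i <a_i, x> + b_i: pointwise max of affine maps, hence convex.
    return (x @ A.T + b).max(dim=-1).values

def lse_smooth(A, b, x, tau=0.1):
    # Smooth-max: tau * log sum_i exp((<a_i, x> + b_i) / tau), also convex,
    # and converging to max_of_affine as tau -> 0.
    return tau * torch.logsumexp((x @ A.T + b) / tau, dim=-1)
```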

7. Advances, Extensions, and Theoretical Developments

Recent work has generalized the ICNN construction to spline-based (ICKAN) and Kolmogorov–Arnold architectures, achieving convexity via sums of convex 1D shape bases. Piecewise-linear and cubic-spline ICKANs enjoy universal approximation properties (where proven) and in practice match or slightly outperform classical ICNNs in regression and learning transport potentials, at lower parameter counts in low to moderate dimensions (Deschatre et al., 27 May 2025).
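
A minimal convex 1-D shape basis in this spirit (an illustrative construction, not the exact ICKAN parametrization): nonnegative kink coefficients make the slope nondecreasing, hence the piecewise-linear spline convex.

```python
import torch
import torch.nn.functional as F

def convex_pwl(x, knots, raw_coef, s0, y0):
    # y(x) = y0 + s0 * x + sum_j softplus(raw_coef_j) * relu(x - knot_j).
    # Each hinge relu(x - t_j) is convex and enters with a nonnegative
    # coefficient, so the sum is convex; the slope increases at each knot.
    hinge = torch.relu(x.unsqueeze(-1) - knots)           # (..., n_knots)
    return y0 + s0 * x + (F.softplus(raw_coef) * hinge).sum(-1)
```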

There is ongoing research into the expressivity of general convex ReLU nets vis-à-vis ICNNs, with precise necessary and sufficient convexity characterizations—showing that as network depth increases, ICNNs become a vanishingly small subset of all convex ReLU functions (Gagneux et al., 6 Jan 2025). Difference-of-convex architectures (CDiNNs) have been proposed to enlarge the representational class, modeling arbitrary smooth functions as $g - h$ with $g, h$ convex ReLU nets, while retaining subproblem tractability in optimization (Sankaranarayanan et al., 2021).
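
A minimal difference-of-convex sketch reusing the `ICNN` class from Section 1: the resulting model is no longer convex overall, but each component remains so, which is what DC-style optimization algorithms exploit.

```python
import torch

class DCNet(torch.nn.Module):
    """Difference-of-convex network: f(x) = g(x) - h(x) with g, h convex."""
    def __init__(self, dim_in, dim_hidden, n_layers):
        super().__init__()
        self.g = ICNN(dim_in, dim_hidden, n_layers)  # sketch from Section 1
        self.h = ICNN(dim_in, dim_hidden, n_layers)

    def forward(self, x):
        return self.g(x) - self.h(x)
```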
