Input-Convex Neural Networks (ICNNs)
- Input-Convex Neural Networks (ICNNs) are specialized neural architectures that ensure the scalar output remains convex with respect to designated inputs by constraining the hidden-to-hidden weights to be non-negative and using convex, non-decreasing activation functions.
- They serve as universal approximators for continuous convex functions, enabling efficient optimization in domains such as optimal transport, structured prediction, and model predictive control.
- Recent advances such as SOC-ICNN and HyCNN increase expressivity and overcome approximation bottlenecks in high-dimensional settings, while training methods such as the lift reparametrization address traditional pathologies like dead weights.
Input-Convex Neural Networks (ICNNs) are a class of neural architectures designed to guarantee convexity of the scalar-valued output function with respect to (some or all of) the inputs. This property is achieved through architectural constraints, enabling tractable and certifiable optimization, inference, and learning in domains where convexity is essential, such as optimal transport, structured prediction, stochastic programming, and convex surrogate learning. ICNNs provide universal approximation for continuous convex functions and underpin scalable convex-analytic solutions for high-dimensional, structure-rich problems. Recent advances include architectural enhancements addressing expressiveness bottlenecks and efficient training methodologies for constrained parameter spaces.
1. Architectural Principles and Mathematical Definition
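An ICNN is a feed-forward neural network, typically scalar-valued, parameterized to preserve convexity in the designated input variables. The canonical fully input-convex architecture (Amos et al., 2016) realizes a mapping $f(y;\theta)$ as a sequence of $k$ layers
$$z_{i+1} = g_i\left(W_i^{(z)} z_i + W_i^{(y)} y + b_i\right), \qquad i = 0, \dots, k-1, \qquad f(y;\theta) = z_k,$$
with $z_0 \equiv 0$ and $W_0^{(z)} \equiv 0$, where
- $W_i^{(z)}$ are inter-layer weights (constrained),
- $W_i^{(y)}$ are input-skip (unconstrained) weights,
- $b_i$ are biases,
- $g_i$ are convex, non-decreasing activations (e.g., ReLU, leaky-ReLU, softplus; the last activation is typically the identity).
Convexity in $y$ is ensured by imposing $W_i^{(z)} \ge 0$ elementwise for all $i \ge 1$, together with the activation constraints. The reasoning is layer by layer: a non-negative combination of convex functions plus an affine term is convex, and a convex, non-decreasing function of a convex function is convex. The skip terms $W_i^{(y)} y$ do not violate convexity because they are affine in $y$. Variants such as partially input-convex networks (PICNNs) extend the architecture to allow mixed convex/nonconvex dependence on partitioned input dimensions (Amos et al., 2016).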
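The layer recursion $z_{i+1} = g_i(W_i^{(z)} z_i + W_i^{(y)} y + b_i)$ can be sketched in plain Python. This assumes ReLU hidden layers, an affine output layer, and random weights. The helper names `icnn_forward`, `random_icnn`, and `count_midpoint_violations` are hypothetical. The midpoint test is only a numerical sanity check of convexity, not a proof.

```python
import random

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def icnn_forward(y, Wz, Wy, b):
    """Evaluate z_{i+1} = g_i(Wz[i] z_i + Wy[i] y + b[i]) and return the scalar z_k.
    Hidden layers use ReLU, the last layer is affine; Wz[0] is None (there is no z_0)."""
    z = []
    k = len(Wy)
    for i in range(k):
        pre = [p + q for p, q in zip(matvec(Wy[i], y), b[i])]
        if i > 0:
            pre = [p + q for p, q in zip(pre, matvec(Wz[i], z))]
        z = [max(0.0, t) for t in pre] if i < k - 1 else pre
    return z[0]

def random_icnn(dim_y, widths, seed=0):
    """Random weights obeying the ICNN sign constraint: every Wz[i] with i >= 1 is >= 0."""
    rng = random.Random(seed)
    Wz, Wy, b, prev = [], [], [], None
    for n in widths + [1]:
        Wy.append([[rng.gauss(0, 1) for _ in range(dim_y)] for _ in range(n)])
        b.append([rng.gauss(0, 1) for _ in range(n)])
        Wz.append(None if prev is None else
                  [[abs(rng.gauss(0, 1)) for _ in range(prev)] for _ in range(n)])
        prev = n
    return Wz, Wy, b

def count_midpoint_violations(f, dim, trials=1000, seed=1, tol=1e-9):
    """Count random pairs (a, c) with f((a + c) / 2) > (f(a) + f(c)) / 2 + tol."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        a = [rng.uniform(-3, 3) for _ in range(dim)]
        c = [rng.uniform(-3, 3) for _ in range(dim)]
        mid = [(ai + ci) / 2 for ai, ci in zip(a, c)]
        if f(mid) > 0.5 * (f(a) + f(c)) + tol:
            bad += 1
    return bad

params = random_icnn(dim_y=3, widths=[8, 8], seed=0)
# 0 violations are expected by construction (up to floating-point tolerance).
print(count_midpoint_violations(lambda y: icnn_forward(y, *params), dim=3))
```

Flipping the sign of entries in `Wz` removes the guarantee, though a random test may or may not detect the resulting nonconvexity.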
ICNNs are universal approximators for convex functions over compact sets: any continuous convex function can be approximated by an ICNN of sufficient width and depth (Makkuva et al., 2019, Liu et al., 8 May 2025). Alternative basis formulations, such as input-convex Kolmogorov–Arnold networks (ICKAN), represent convex functions via sums of univariate convex splines, also achieving universal approximation (Deschatre et al., 27 May 2025).
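The approximation intuition is easiest to see in one dimension. A convex piecewise-linear (P1) function can be written as $a + s_0 y + \sum_j c_j \,\mathrm{ReLU}(y - t_j)$ with non-negative slope increments $c_j$. That is exactly a one-hidden-layer ReLU-ICNN whose only constrained weights are the $c_j$.

The sketch below interpolates $y^2$ at uniform knots. Its helper names are hypothetical, and it is not the ICKAN parameterization itself. For knot spacing $h$, the chord error of $y^2$ on each interval peaks at the midpoint at $h^2/4$, so refining the knots drives the error to zero.

```python
def convex_p1_spline(knots, values):
    """Return (a, s0, terms) such that
    f(y) = a + s0 * y + sum(c * relu(y - t) for t, c in terms)
    interpolates (knots, values). Every slope increment c must be >= 0 (convex data)."""
    slopes = [(values[j + 1] - values[j]) / (knots[j + 1] - knots[j])
              for j in range(len(knots) - 1)]
    s0 = slopes[0]
    a = values[0] - s0 * knots[0]
    incs = [slopes[j] - slopes[j - 1] for j in range(1, len(slopes))]
    if any(c < 0 for c in incs):
        raise ValueError("data are not convex: a slope decreases")
    return a, s0, list(zip(knots[1:-1], incs))

def eval_spline(params, y):
    a, s0, terms = params
    # Hidden ReLU units with input weight 1 and bias -t, combined by the
    # non-negative output weights c: a one-hidden-layer ReLU-ICNN.
    return a + s0 * y + sum(c * max(0.0, y - t) for t, c in terms)

m = 10  # number of intervals on [-1, 1], so h = 0.2
knots = [-1 + 2 * j / m for j in range(m + 1)]
spline = convex_p1_spline(knots, [t * t for t in knots])
grid = [-1 + k / 1000 for k in range(2001)]
max_err = max(abs(eval_spline(spline, y) - y * y) for y in grid)
print(max_err)  # close to h**2 / 4 = 0.01
```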
2. Training Methodologies and Weight Constraints
Standard training employs one of two mechanisms to maintain $W_i^{(z)} \ge 0$ (Hoedt et al., 2023, Siahkoohi et al., 22 May 2026). The first is projected gradient descent (PGD), which projects all inter-layer weights onto the non-negative orthant after each update. The second is a softplus reparametrization $W_i^{(z)} = \operatorname{softplus}(V_i)$ of unconstrained parameters $V_i$.

Both have drawbacks. The projection is non-smooth, which is incompatible with standard deep-learning convergence guarantees. Softplus attenuates gradients exponentially when the underlying parameters are strongly negative, since $\operatorname{softplus}'(v) = \sigma(v) \approx e^{v}$ for $v \ll 0$. This produces "dead" weights and plateaued losses (Siahkoohi et al., 22 May 2026).

To mitigate these pathologies, the "lift" method emits the constrained weights $W_i^{(z)}$ from a hypernetwork conditioned on the input batch. This introduces batch-coupled stochasticity that smooths the loss landscape and improves convergence (Siahkoohi et al., 22 May 2026).
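The two classical constraint mechanisms can be compared on a single weight. The names are hypothetical, and the lift hypernetwork is not modeled. The loss gradient wants the weight to grow:

- Under PGD, a weight clamped at zero recovers in one step.
- Under softplus, a weight whose raw parameter sits at $v = -10$ moves by only about $\sigma(-10)^2$ times the step, because the chain rule multiplies the gradient by $\sigma(v)$ and the weight's response to $v$ scales by $\sigma(v)$ again.

```python
import math

def project_nonneg(w):
    """PGD projection onto the non-negative orthant: an elementwise clamp at 0."""
    return [max(0.0, wi) for wi in w]

def softplus(v):
    """Numerically stable log(1 + exp(v)); maps any real v to a positive weight."""
    return max(v, 0.0) + math.log1p(math.exp(-abs(v)))

def sigmoid(v):
    """Derivative of softplus; behaves like exp(v) for very negative v."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)

lr = 0.1
grad_w = -1.0  # dL/dw < 0: the loss wants this weight to grow

# PGD: a weight sitting at the boundary w = 0 recovers after a single step.
w_pgd = project_nonneg([0.0 - lr * grad_w])[0]

# Softplus: w = softplus(v) with v = -10; chain rule gives dL/dv = dL/dw * sigmoid(v).
v = -10.0
v_new = v - lr * grad_w * sigmoid(v)
dw_softplus = softplus(v_new) - softplus(v)

print(w_pgd, dw_softplus)
```

The PGD weight jumps to 0.1. The softplus weight changes by roughly $10^{-10}$, which illustrates the plateau behavior described above.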
Principled initialization schemes for ICNNs address the non-centered (positive) weight distribution and preserve the signal propagation statistics through depth, yielding faster loss descent and robust training without the need for input skip connections (Hoedt et al., 2023). General fixed-point formulas for mean and variance are derived for the non-negative weight setting, and positive weights are sampled from log-normal distributions matching these moments.
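Sampling positive weights with prescribed first and second moments takes only two closed-form log-normal identities. For a log-normal, the mean is $e^{\mu + \sigma^2/2}$ and the variance is $(e^{\sigma^2} - 1)\,\text{mean}^2$. Solving for the parameters gives $\sigma^2 = \ln(1 + \text{var}/\text{mean}^2)$ and $\mu = \ln \text{mean} - \sigma^2/2$.

In the sketch below, the target moments are placeholder values, not the fixed-point moments derived by Hoedt et al. (2023), and the function names are hypothetical.

```python
import math
import random

def lognormal_params(mean, var):
    """(mu, sigma) of a log-normal with the given mean > 0 and variance.
    Uses mean = exp(mu + sigma^2 / 2) and var = (exp(sigma^2) - 1) * mean^2."""
    sigma2 = math.log1p(var / mean ** 2)
    mu = math.log(mean) - 0.5 * sigma2
    return mu, math.sqrt(sigma2)

def init_nonneg_weights(fan_out, fan_in, mean, var, seed=0):
    """Sample a fan_out x fan_in matrix of strictly positive weights."""
    rng = random.Random(seed)
    mu, sigma = lognormal_params(mean, var)
    return [[rng.lognormvariate(mu, sigma) for _ in range(fan_in)]
            for _ in range(fan_out)]

# Placeholder targets; a principled scheme would derive them from fixed-point conditions.
target_mean, target_var = 0.05, 0.0025
W = init_nonneg_weights(256, 256, target_mean, target_var)
flat = [w for row in W for w in row]
emp_mean = sum(flat) / len(flat)
emp_var = sum((w - emp_mean) ** 2 for w in flat) / len(flat)
print(emp_mean, emp_var)
```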
Training losses are application-dependent: mean-squared error for regression and convex surrogate fitting (Liu et al., 8 May 2025, Omidi et al., 2024), cross-entropy or adversarial objectives in classification, and adversarial minimax objectives for Kantorovich-potential-based optimal transport (Makkuva et al., 2019). Convexity constraints are always enforced, either by explicit projection or reparametrization.
3. Convex Surrogate Modeling and Downstream Optimization
The primary motivation for ICNNs is to enable end-to-end tractable optimization over inputs $y$ given a context $x$, for an objective $f(x, y; \theta)$ that is convex in $y$ (Amos et al., 2016, Liu et al., 8 May 2025). The input-convex structure lets convex optimization methods (including subgradient methods, the bundle entropy method, and argmin differentiation) reach a global optimum of the inference problem
$$y^\star(x) \in \arg\min_{y} f(x, y; \theta).$$
It also supports differentiation through the argmin via the implicit function theorem, which is essential for structured prediction and decision-focused learning (Amos et al., 2016, Liu et al., 8 May 2025, Liu et al., 24 Apr 2026).

ICNNs are particularly well-suited as convex surrogates where convexity is required for solution quality, feasibility, and scalability. Examples include stochastic programming (Liu et al., 8 May 2025), model predictive control (MPC) (Wang et al., 2024), rheological surrogate modeling (Parolini et al., 2024), and battery optimization (Omidi et al., 2024).
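A toy sketch of the inference problem follows. It assumes a hand-written objective $f(x, y)$ that depends nonconvexly on the context $x$ but is strongly convex in $y$. It is not a trained PICNN, and plain gradient descent stands in for the bundle-entropy and argmin-differentiation machinery cited above. Because $f$ is convex in $y$, descent from very different starting points lands on the same minimizer, and that minimizer matches a brute-force grid search.

```python
import math

def softplus(t):
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))

def sigmoid(t):
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def f(x, y):
    """Hypothetical objective: nonconvex in the context x, convex in the decision y."""
    return softplus(y - math.sin(3 * x)) + softplus(-2 * y + math.cos(x)) + 0.05 * y * y

def df_dy(x, y):
    return sigmoid(y - math.sin(3 * x)) - 2 * sigmoid(-2 * y + math.cos(x)) + 0.1 * y

def argmin_gd(x, y0=0.0, step=0.5, iters=2000):
    """Plain gradient descent in y; convexity in y makes any stationary point global."""
    y = y0
    for _ in range(iters):
        y -= step * df_dy(x, y)
    return y

def argmin_grid(x, lo=-10.0, hi=10.0, n=20001):
    """Brute-force reference minimizer on a uniform grid (spacing 0.001)."""
    ys = (lo + (hi - lo) * k / (n - 1) for k in range(n))
    return min(ys, key=lambda y: f(x, y))

for x in (-1.0, 0.3, 2.0):
    print(x, argmin_gd(x, y0=-8.0), argmin_gd(x, y0=8.0), argmin_grid(x))
```

The step size 0.5 is chosen below $2/L$. Here $L \le 1.35$ bounds $\partial^2 f / \partial y^2$, since each $\sigma'$ is at most $1/4$.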
For instance, embedding an ICNN as a recourse function surrogate in two-stage stochastic programming maps the entire decision process to a linear program (LP) without introducing integer variables, resulting in significant computational speedups over generic MIP-embedded feedforward networks and improved solution quality (Liu et al., 8 May 2025). In explicit ML-based MPC, ICNN models guarantee that the explicit regions and candidate policies can be embedded within convex MIQP selection, maintaining real-time feasibility and control performance (Wang et al., 2024).
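For a ReLU-ICNN, a standard epigraph argument shows why no integer variables are needed. The sketch assumes:

- ReLU hidden layers $z_{i+1} = \max(0, W_i^{(z)} z_i + W_i^{(y)} y + b_i)$ with $W_0^{(z)} = 0$,
- an affine output layer,
- a hypothetical polyhedral feasible set $\mathcal{Y}$.

Under these assumptions, minimizing the network output over $y \in \mathcal{Y}$ is equivalent to the LP

```latex
\begin{aligned}
\min_{y,\, z_1, \dots, z_{k-1}} \quad & W_{k-1}^{(z)} z_{k-1} + W_{k-1}^{(y)} y + b_{k-1} \\
\text{s.t.} \quad & z_1 \ge W_0^{(y)} y + b_0, \quad z_1 \ge 0, \\
& z_{i+1} \ge W_i^{(z)} z_i + W_i^{(y)} y + b_i, \quad z_{i+1} \ge 0, \quad i = 1, \dots, k-2, \\
& y \in \mathcal{Y}.
\end{aligned}
```

The relaxation is tight because every $W_i^{(z)} \ge 0$. Lowering any $z_{i+1}$ to its ReLU value keeps all later constraints feasible and cannot increase the objective. A generic ReLU network lacks this monotonicity, which is why its usual MIP encoding needs a binary variable per ReLU. In the two-stage setting, the same construction applies with the ICNN output standing in for the recourse term of the first-stage objective.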
4. Extensions: Expressivity, Duality, and Parameter Efficiency
ReLU-ICNNs parameterize the set of continuous, convex, piecewise-linear (CPWL) functions. This restriction implies an exponential complexity barrier for approximating high-curvature targets in high dimension. Approximating a $\mu$-strongly convex quadratic on a bounded domain in $\mathbb{R}^d$ to uniform error $\varepsilon$ requires a number of affine pieces exponential in $d$ (Liu et al., 24 Apr 2026). This expressiveness barrier has driven several architectural advances:
- SOC-ICNN (Liu et al., 24 Apr 2026, Liu et al., 6 May 2026): Extends the architecture by augmenting the ReLU-ICNN LP value-function with smooth, convex quadratic and second-order-cone norm modules, resulting in value functions of SOCPs. SOC-ICNNs strictly expand the representable convex function class while preserving forward-pass complexity. All first- and second-order geometric information (subdifferentials, gradients, local Hessians) can be read off from SOCP dual variables exactly, facilitating white-box optimization and robust inference (Liu et al., 6 May 2026).
- Hyper Input Convex Neural Networks (HyCNNs) (Hundrieser et al., 29 Apr 2026): Integrate per-layer maxout (or log-sum-exp) gates over multiple affine branches. This yields exponential growth in the number of linear regions with depth and a logarithmic dependence of the parameter count on the approximation error, an exponential reduction in parameters relative to standard ICNNs when approximating smooth convex targets.
- ICKAN (Deschatre et al., 27 May 2025): Compose univariate convex spline functions in a Kolmogorov–Arnold-style architecture, with explicit convexity constraints at the 1D level. P1-ICKAN achieves a constructive universal approximation guarantee for convex regression, providing representational alternatives and potential improvements in smoothness or derivative fidelity.
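A maxout-gated convex layer in the spirit of the HyCNN description can be sketched as follows. It assumes three affine branches per unit, non-negative weights on the previous layer, and optional log-sum-exp smoothing with temperature `tau`. All names and sizes are hypothetical, and HyCNN's actual parameterization may differ.

Convexity is preserved because each branch is convex in $y$ and non-decreasing in the previous layer, and a pointwise max (or log-sum-exp) of such functions is again convex. Per unit, the log-sum-exp value exceeds the max by at most `tau * log(number of branches)`.

```python
import math
import random

def affine(wz, wy, bias, z, y):
    return bias + sum(a * t for a, t in zip(wz, z)) + sum(a * t for a, t in zip(wy, y))

def gated_layer(z, y, units, tau=None):
    """Each output unit combines several affine branches (with wz >= 0) by a max
    (maxout gate) or, if tau is given, by log-sum-exp with temperature tau."""
    out = []
    for branches in units:
        pre = [affine(wz, wy, bias, z, y) for wz, wy, bias in branches]
        m = max(pre)
        if tau is None:
            out.append(m)
        else:
            out.append(m + tau * math.log(sum(math.exp((p - m) / tau) for p in pre)))
    return out

def random_units(n_out, n_branch, dim_z, dim_y, rng):
    return [[([abs(rng.gauss(0, 1)) for _ in range(dim_z)],  # wz >= 0 keeps convexity
              [rng.gauss(0, 1) for _ in range(dim_y)],        # input weights unconstrained
              rng.gauss(0, 1))
             for _ in range(n_branch)]
            for _ in range(n_out)]

def gated_icnn(y, layers, tau=None):
    z = []  # the first layer sees only y
    for units in layers:
        z = gated_layer(z, y, units, tau)
    return z[0]

rng = random.Random(0)
layers = [random_units(6, 3, 0, 2, rng),
          random_units(6, 3, 6, 2, rng),
          random_units(1, 3, 6, 2, rng)]

def midpoint_violations(f, trials=500, seed=1):
    r = random.Random(seed)
    bad = 0
    for _ in range(trials):
        a = [r.uniform(-3, 3), r.uniform(-3, 3)]
        c = [r.uniform(-3, 3), r.uniform(-3, 3)]
        mid = [(a[0] + c[0]) / 2, (a[1] + c[1]) / 2]
        bad += f(mid) > 0.5 * (f(a) + f(c)) + 1e-9
    return bad

print(midpoint_violations(lambda y: gated_icnn(y, layers)),
      midpoint_violations(lambda y: gated_icnn(y, layers, tau=0.5)))
```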
A summary comparison of select expressivity-enhancing ICNN variants is provided below:
| Architecture | Additional Modules | Expressivity Gain | Readout/Inference |
|---|---|---|---|
| ReLU-ICNN | None | CPWL only | Fast, LP-valued |
| SOC-ICNN | Quadratic, norm-conic branches | Exact smooth & conic curvature | SOCP-valued, exact dual |
| HyCNN | Maxout over affine branches | Exponential in depth for region count | Simple feedforward |
| ICKAN | Univariate convex spline bases | Constructive universal approx., spline | Direct evaluation |
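The CPWL barrier can be made concrete with one simple construction. This is an upper-bound illustration only, not the lower-bound argument of Liu et al.: approximate $\|y\|^2$ on $[-1, 1]^d$ by the maximum of its tangent planes $2 p \cdot y - \|p\|^2$ at a tensor grid of $m$ points per axis.

Since $\|y\|^2 - (2 p \cdot y - \|p\|^2) = \|y - p\|^2$, the error at $y$ is the squared distance to the nearest grid point, which is at most $d/(m-1)^2$. Reaching error $\varepsilon$ with this grid therefore takes $m^d$ pieces with $m - 1 \ge \sqrt{d/\varepsilon}$. By contrast, a single quadratic term of the kind SOC-ICNN adds represents $\|y\|^2$ with zero error. The function names below are hypothetical.

```python
import itertools
import math

def max_tangent_approx(y, grid_pts):
    """Max over grid points p of the tangent planes of ||y||^2: 2 p.y - ||p||^2."""
    return max(sum(2 * pi * yi for pi, yi in zip(p, y)) - sum(pi * pi for pi in p)
               for p in grid_pts)

def tangent_grid(d, m):
    """Uniform tensor grid with m points per axis on [-1, 1]^d."""
    axis = [-1 + 2 * j / (m - 1) for j in range(m)]
    return list(itertools.product(axis, repeat=d))

def pieces_needed(d, eps):
    """Affine pieces this grid construction uses to reach uniform error eps.
    Worst-case error is d / (m - 1)^2, so take m - 1 = ceil(sqrt(d / eps))."""
    m = 1 + math.ceil(math.sqrt(d / eps))
    return m ** d

for d in (1, 2, 5, 10):
    print(d, pieces_needed(d, 1e-2))
```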
5. Theoretical Guarantees and Analytical Properties
The universal approximation power of sufficiently wide or deep ICNNs for continuous convex functions on compact domains is established in both regression and classification settings (Amos et al., 2016, Pfrommer et al., 2023, Deschatre et al., 27 May 2025).

In optimal transport, the Kantorovich potential is parameterized with an ICNN (or ICKAN), and the Monge map is recovered as its gradient, as in Brenier's theorem. This yields provably optimal 2-Wasserstein ($W_2$) transport maps whose discontinuities can match disconnected target supports. Generator-based GANs, by contrast, impose a continuity that such maps do not have (Makkuva et al., 2019, Deschatre et al., 27 May 2025).
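The discontinuity argument can be illustrated in one dimension with a hand-picked convex potential $\varphi(y) = |y| + y^2/2$ rather than a trained ICNN.

Its gradient $T(y) = y + \operatorname{sign}(y)$ is non-decreasing. In 1D, any non-decreasing map is $W_2$-optimal between a measure and its push-forward. Yet $T$ pushes the uniform distribution on $[-1, 1]$ onto $[-2, -1] \cup [1, 2]$, and no continuous map of the connected interval can produce that disconnected support.

The names are hypothetical, and no OT problem is solved here.

```python
import random

def potential(y):
    # Hypothetical convex potential: |y| written as relu(y) + relu(-y), plus 0.5 * y^2.
    return max(y, 0.0) + max(-y, 0.0) + 0.5 * y * y

def transport_map(y):
    # Gradient of the potential (almost everywhere); at y = 0 any value in [-1, 1]
    # is a subgradient, and 0.0 is returned.
    return y + (1.0 if y > 0 else -1.0 if y < 0 else 0.0)

rng = random.Random(0)
samples = sorted(rng.uniform(-1.0, 1.0) for _ in range(10000))
pushed = [transport_map(s) for s in samples]
print(min(pushed), max(p for p in pushed if p < 0), min(p for p in pushed if p > 0), max(pushed))
```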
SOC-ICNNs admit explicit dual characterizations: every first-order subgradient, directional derivative, and local Hessian can be recovered from SOCP dual optimal multipliers (Liu et al., 6 May 2026). Local nondegeneracy ensures positive-definiteness and continuous behavior of second derivatives. Moreover, the lift method for training ICNNs introduces an implicit Moreau-envelope smoothing of the loss landscape, provably accelerating escape from saddle or shoulder plateaus induced by classical softplus or PGD-enforced positivity (Siahkoohi et al., 22 May 2026).
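The Moreau envelope $e_\lambda f(x) = \min_u \{ f(u) + \tfrac{1}{2\lambda}(x - u)^2 \}$ replaces kinks with smooth curvature. The example below is generic and does not reproduce the lift method's implicit smoothing. For $f = |\cdot|$, the minimizer is the soft-thresholding operator and the envelope is the Huber function. The sketch checks this closed form against brute-force minimization on a grid (function names hypothetical).

```python
def soft_threshold(x, lam):
    """Proximal operator of lam * |.|."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0

def moreau_abs(x, lam):
    """Moreau envelope of |.| evaluated through its proximal point."""
    p = soft_threshold(x, lam)
    return abs(p) + (x - p) ** 2 / (2 * lam)

def huber(x, lam):
    return x * x / (2 * lam) if abs(x) <= lam else abs(x) - lam / 2

def moreau_bruteforce(f, x, lam, lo=-5.0, hi=5.0, n=20001):
    """Reference value: minimize f(u) + (x - u)^2 / (2 lam) over a uniform grid of u."""
    best = float("inf")
    for k in range(n):
        u = lo + (hi - lo) * k / (n - 1)
        best = min(best, f(u) + (x - u) ** 2 / (2 * lam))
    return best

for x in (-2.0, -0.3, 0.0, 0.25, 1.7):
    print(x, moreau_abs(x, 0.5), huber(x, 0.5), moreau_bruteforce(abs, x, 0.5))
```

The envelope agrees with $|x|$ up to a constant shift away from the kink and is quadratic near it. This is the kind of flattening of sharp features that Moreau smoothing provides.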
Applications to Wasserstein gradient flows via the JKO scheme (Alvarez-Melis et al., 2021), model reduction with symmetry constraints (Huang et al., 23 Nov 2025), and certified adversarial robustness (feature-convex networks) (Pfrommer et al., 2023) further showcase the versatility and analytical benefits of convexity enforcement in neural scalar function modeling.
6. Applications, Empirical Findings, and Limitations
ICNNs have been deployed in a range of high-impact domains:
- Optimal Transport: ICNN-based minimax learning of the Kantorovich potential recovers expressive, discontinuous Monge maps in high dimensions and outperforms WGAN and regularized-OT baselines on both synthetic and image data (Makkuva et al., 2019).
- Stochastic Programming and MPC: ICNN surrogates enable convex and computationally efficient embedding in two-stage stochastic programming and explicit MPC frameworks, outperforming MIP-NN surrogates in solution time and stability, especially as problem scales increase (Liu et al., 8 May 2025, Wang et al., 2024).
- Physics-Informed Modeling: Data-driven ICNN surrogates for viscosity laws guarantee well-posedness and enable stable FEM discretizations for non-Newtonian Stokes flows, achieving near-zero train/test error and convergence rates matching theory (Parolini et al., 2024).
- Adversarial and Certified Robustness: Feature-convex classifiers composed of an ICNN and a Lipschitz feature map achieve closed-form, deterministic certified radii for adversarial perturbations with state-of-the-art accuracy and orders-of-magnitude faster certification times (Pfrommer et al., 2023).
- Mirror Descent Optimization: Parameterizing the mirror potential with an ICNN within learned mirror descent solvers accelerates convergence over generic gradient methods in imaging and classification tasks, retaining convergence guarantees and adaptability (Tan et al., 2022).
Empirical limitations include:

- expressivity barriers for classical ICNNs (addressed by HyCNN, SOC-ICNN, and ICKAN),
- extra training-time cost for hypernetwork-based parameterizations (lift),
- the need for careful initialization or regularization to avoid dead or stalled units.

Nonconvex target functions cannot be represented exactly by an ICNN. For such targets, extensions such as CDiNN model a function as the difference of two ICNN-based convex functions and optimize it with the convex-concave procedure (CCP). This recovers universal approximation while retaining LP-based tractability (Sankaranarayanan et al., 2021).
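The convex-concave procedure can be sketched on a scalar difference of convex functions, using the hand-picked pair $g(y) = y^2$ and $h(y) = 4 \log\cosh y$. These are not ICNNs, and this is not CDiNN's code.

The objective $f = g - h$ is nonconvex, with $f''(0) = -2$. Each step linearizes $h$ at the current iterate and minimizes the convex surrogate $g(y) - h'(y_k)\,y$, which here has the closed form $y_{k+1} = h'(y_k)/2 = 2\tanh(y_k)$. Because the linearization underestimates $h$, $f$ never increases.

```python
import math

def g(y):
    """Convex part."""
    return y * y

def h(y):
    """Convex part that is subtracted."""
    return 4.0 * math.log(math.cosh(y))

def f(y):
    """Nonconvex difference of convex functions."""
    return g(y) - h(y)

def ccp(y0, iters=50):
    """Convex-concave procedure: linearize h at y_k, then minimize the convex
    surrogate y^2 - h'(y_k) * y, whose minimizer is h'(y_k) / 2."""
    ys = [y0]
    for _ in range(iters):
        slope = 4.0 * math.tanh(ys[-1])  # h'(y_k)
        ys.append(slope / 2.0)
    return ys

ys = ccp(0.5)
print(ys[:5], ys[-1], [round(f(y), 6) for y in ys[:5]])
```

In CDiNN, as described above, $g$ and $h$ would both be ICNN outputs, and each surrogate minimization would be a convex program rather than a closed-form update.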
7. Outlook: Open Problems and Architectural Evolution
Recent advances position ICNNs and their extensions as general-purpose convex function approximators with strong theoretical guarantees and scalable empirical performance. Research directions include:
- Tightening convergence analysis for parametrized JKO schemes in high dimensions (Alvarez-Melis et al., 2021).
- Extending dual-geometry analysis to more generic conic or even SDP-valued ICNNs.
- Developing domain-specialized ICNN layers (e.g., for graph, sequence, or physics-structured data) using cone-lifted or maxout-based constraints (Liu et al., 24 Apr 2026, Hundrieser et al., 29 Apr 2026).
- Further reducing training complexity and overcoming rare-case pathologies in deep-layered architectures or large-scale decision-focused training.
The interplay between convexity, scalable optimization, and neural expressivity embodied in ICNNs underpins a growing suite of neural architectures for scientifically grounded, interpretable, and optimization-ready learning systems (Amos et al., 2016, Makkuva et al., 2019, Liu et al., 24 Apr 2026, Siahkoohi et al., 22 May 2026, Hundrieser et al., 29 Apr 2026).