Optimizer-Expressivity Duality

Updated 19 July 2025
  • Optimizer-Expressivity Duality is the interplay between optimization algorithms and model expressivity, where optimizers encode inductive biases to shape solution spaces.
  • It bridges classical logic and modern duality frameworks, linking set-valued, gauge, and measure-space optimizations with deep learning and quantum applications.
  • Practical implementations span robust risk management, neural architecture design, and combinatorial optimization, guiding innovations in sparsity and low-rank solution structures.

Optimizer-Expressivity Duality refers to the interplay between the optimization algorithm (“optimizer”) and the range of behaviors, solutions, or representational capacity (“expressivity”) effectively realized by a learning or optimization system. In classical theory, the optimizer is often treated as a tool for minimizing loss or maximizing performance within a fixed hypothesis class determined by model architecture or logic. Recent perspectives challenge this, particularly in the context of complex models such as deep neural networks (DNNs), set-valued or gauge optimization, and beyond, by recognizing that the optimizer itself can encode inductive biases, shape the structure of solutions, and restrict or expand the set of functions the model effectively realizes.

1. Foundations in Logic, Complexity, and Optimization

The distinction between decision and optimization problems is fundamental to understanding optimizer-expressivity duality. Syntactic logic expressions, such as existential second order (ESO) universal Horn formulae, suffice to capture all polynomial-time decidable problems when augmented with a built-in successor relation. However, optimization problems (e.g., MaxHorn2Sat) defy such a characterization, with even quantifier-free Horn expressions able to capture NP-hard behavior, indicating a profound separation between expressibility and computational tractability in optimization as opposed to decision settings (0904.4331). The theoretical framework thus exposes a limit: syntactic simplicity in problem specification does not guarantee easy solvability, with expressivity disconnected from computational efficiency.

Syntactic descriptions can sometimes be leveraged in optimization duality. For certain classes, such as linear programs with strong duality, optimality conditions can be formulated as ESO logic statements, enabling polynomial-time solution via a single decision computation, in contrast with classical iterative or binary-search approaches. This connects descriptive complexity with optimization duality, unifying the logic-based expressivity of problem specification with the computational procedure used to solve it.
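For concreteness, here is a minimal numerical check of the single-computation idea on a toy linear program (hypothetical data, using scipy.optimize.linprog; not taken from the cited work): when strong duality holds, the primal and dual optimal values coincide, so optimality can be certified by one equality test rather than a search over objective values.

```python
import numpy as np
from scipy.optimize import linprog

# Primal:  min c^T x   s.t.  A x >= b,  x >= 0
c = np.array([2.0, 3.0])
A = np.array([[1.0, 1.0],
              [1.0, 2.0]])
b = np.array([4.0, 6.0])

primal = linprog(c, A_ub=-A, b_ub=-b, bounds=[(0, None)] * 2, method="highs")

# Dual:    max b^T y   s.t.  A^T y <= c,  y >= 0   (written as a minimization)
dual = linprog(-b, A_ub=A.T, b_ub=c, bounds=[(0, None)] * 2, method="highs")

print("primal optimum:", primal.fun)         # c^T x*
print("dual optimum:  ", -dual.fun)          # b^T y*
assert np.isclose(primal.fun, -dual.fun)     # strong duality: zero gap
```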

2. Duality Theory Across Generalized Settings

Duality theory has been systematically extended to cover set-valued, gauge, and measure-space optimization, each framework bringing unique insight into the optimizer-expressivity dichotomy.

  • Set Optimization: Lagrangian duality has been generalized using complete-lattice structures for set-valued functions (1207.4433). Here, strong duality holds under mild conditions (even with an ordering cone admitting minimal structure) and the notions of “saddle sets” supplant classical saddle points. The saddle set—the pair of primal and dual solution sets—cements the correspondence between primal optimality and the expressive power of the dual, with dual variables capturing geometric properties of the original set-optimization problem.
  • Gauge Optimization: Gauge functions, defined as convex, nonnegative, positively homogeneous functions, extend the concept of norms and encode regularizations for sparsity, low rank, or atomic decompositions (1310.2639, Yamanaka et al., 2017). Optimizer-expressivity duality is evident in the multiplicative nature of gauge duality, where the dual inequality $\langle x, y \rangle \leq \kappa(x)\,\kappa^{\circ}(y)$ expresses constraints on the reachable solution set, with primal and dual gauges tightly controlling solution structure (a small numerical check of this inequality appears after this list). The dual formulation, using polars and antipolars, often yields more computationally tractable algorithms and deeper sensitivity analysis, showing that the optimizer (through its dual variables) reveals hidden structure in the solution class.
  • Optimization Over Measure Spaces: Duality for optimization over measure spaces, notably in risk management and financial applications, ties integrability and measure-theoretic expressivity to optimizer structure (1501.04243). Strong duality is established using $L^p$ density functions, admitting very general constraint types and guaranteeing that the optimizer, via dual variables in function space, expresses an exact balance with problem complexity.
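As a concrete instance of the multiplicative gauge inequality mentioned above, the $\ell_1$ norm is a gauge whose polar is the $\ell_\infty$ norm. The short check below (an illustrative toy setup, not code from the cited papers) verifies $\langle x, y\rangle \leq \kappa(x)\,\kappa^{\circ}(y)$ on random vectors.

```python
import numpy as np

rng = np.random.default_rng(0)

def kappa(x):          # gauge: l1 norm (convex, nonnegative, positively homogeneous)
    return np.abs(x).sum()

def kappa_polar(y):    # polar gauge of the l1 norm: the l_infinity norm
    return np.abs(y).max()

for _ in range(1000):
    x, y = rng.normal(size=5), rng.normal(size=5)
    assert x @ y <= kappa(x) * kappa_polar(y) + 1e-12

print("gauge duality inequality <x,y> <= kappa(x)*kappa_polar(y) held on all samples")
```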

3. Implications in Learning and Neural Networks

In deep learning, optimizer-expressivity duality manifests in several important phenomena:

  • Architecture-Kernel Duality: The computation skeleton framework associates every neural network architecture with a reproducing kernel Hilbert space (RKHS), providing a one-to-one correspondence between network structure and a class of functions (1602.05897). Random initialization “covers” the RKHS, so that even before training begins, the set of achievable functions (expressivity) is determined by the architecture and initialization, with the optimizer’s role (e.g., last-layer training) acting as a selector within this space; the first sketch after this list illustrates this selection in a toy random-features setting.
  • Optimizer-Induced Inductive Biases: The optimizer not only determines convergence speed but shapes solution structure in high-dimensional non-convex landscapes (Pascanu et al., 16 Jul 2025). For example, first-order optimizers (like SGD or Adam) with diagonal preconditioners can produce diffuse representations using more of the parameter space, whereas second-order methods with non-diagonal preconditioners (e.g., Shampoo) guide learning toward low-rank, less interfering representations, aiding continual learning. The update rule

$$\theta^{(t+1)} = \theta^{(t)} - \eta \, P^{-1}(\theta^{(t)}) \, \nabla_\theta \mathcal{L}(\theta^{(t)})$$

reflects how the choice of preconditioner $P$ modulates paths through parameter space, affecting the effective expressivity of the learned solution.

  • Direct Control of Sparsity and Other Properties: Techniques such as Powerpropagation, initially formulated through parameter reparameterization, can be equivalently realized with optimizer modifications, specifically by tailoring the preconditioner so that each parameter's update is scaled by a power of its current magnitude. Using a magnitude-dependent diagonal preconditioner, so that the effective step on each parameter scales as $|\theta|^{\beta}$ (equivalently, $P^{-1} = \mathrm{diag}(|\theta|^{\beta})$ with $\beta > 0$ in the update above), shrinks the updates of small-magnitude weights and biases training directly toward sparse solutions; the second sketch after this list illustrates both the general preconditioned update and this sparsity bias.
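The first sketch below makes the architecture-kernel correspondence concrete: a randomly initialized ReLU layer is frozen, and only the last layer is fit (here by closed-form ridge regression), so the "optimizer" merely selects a function inside the feature space fixed by architecture and initialization. The dimensions, target function, and regularization strength are hypothetical choices for illustration, not the construction from the cited paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, width = 200, 10, 512

X = rng.normal(size=(n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=n)   # toy regression target

W = rng.normal(size=(d, width)) / np.sqrt(d)      # random, frozen first layer
features = np.maximum(X @ W, 0.0)                 # ReLU random features

# "Training" = ridge regression on the last layer only; the reachable function
# class is fixed by the random features, the optimizer picks coefficients in it.
lam = 1e-3
w_out = np.linalg.solve(features.T @ features + lam * np.eye(width),
                        features.T @ y)
pred = features @ w_out
print("train MSE:", np.mean((pred - y) ** 2))
```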
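The second sketch contrasts plain gradient descent with a magnitude-scaled diagonal preconditioner on an underdetermined least-squares problem. It is a toy illustration of the update rule above; the data, step sizes, and the specific choice $P^{-1} = \mathrm{diag}(|\theta|^{\beta})$ are assumptions rather than the authors' experiments, and it shows how the preconditioned run tends to settle on a sparser solution among the many that fit the data.

```python
import numpy as np

rng = np.random.default_rng(1)
m, d = 15, 40                                  # underdetermined: many exact solutions
A = rng.normal(size=(m, d))
x_true = np.zeros(d); x_true[:3] = np.array([1.5, -2.0, 1.0])   # sparse ground truth
b = A @ x_true

def grad(theta):                               # gradient of 0.5 * ||A theta - b||^2
    return A.T @ (A @ theta - b)

def run(beta, steps=4000, eta=2e-3, eps=1e-12):
    theta = 0.01 * rng.normal(size=d)          # small initialization
    for _ in range(steps):
        scale = (np.abs(theta) + eps) ** beta  # entries of P^{-1} = diag(|theta|^beta)
        theta = theta - eta * scale * grad(theta)   # theta <- theta - eta P^{-1} grad L
    return theta

plain  = run(beta=0.0)                         # P = I: ordinary gradient descent
biased = run(beta=1.0)                         # magnitude-scaled, sparsity-biased updates

for name, sol in [("plain GD", plain), ("|theta|-preconditioned", biased)]:
    print(f"{name}: residual {np.linalg.norm(A @ sol - b):.2e}, "
          f"l1 norm {np.abs(sol).sum():.2f}, "
          f"entries below 1e-2: {np.sum(np.abs(sol) < 1e-2)}")
```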

These findings suggest that the optimizer’s design constitutes a critical avenue for encoding domain-tailored inductive biases, complementing data and architecture.

4. Duality in Quantum and Combinatorial Optimization

  • Quantum Algorithms: In variational quantum algorithms (VQAs), expressivity is rigorously quantified using covering numbers from statistical learning theory, which scale exponentially with circuit complexity parameters (number of gates, entangling degree, observable norm) (Du et al., 2021). The optimizer must be matched to the hypothesis space’s expressivity: overly expressive quantum circuits are hard to train due to barren plateaus and generalization issues, while too constrained architectures lack representational capacity. This underscores a delicate optimizer–expressivity trade-off in quantum learning.
  • Combinatorial and Polymatroid Optimization: Certain linear programming (LP) duality tricks reveal that the complex, expressive class of polymatroid functions can, under specific constraints (acyclic or simple difference constraints), be “compressed” via dual projection into simpler functional forms (modular or coverage functions) (Im et al., 2022). The duality therefore reflects the optimizer-expressivity interplay: high expressivity is sometimes unnecessary for optimality and efficiency, yet remains necessary in more general or harder instances.

5. Abstract Convexity and Composite Optimization

The theory of abstract convexity broadens optimizer-expressivity duality by enabling zero duality gap results for composite problems defined by general “elementary functions” rather than only classic convexity notions (Tran et al., 2022). By employing classes of support functions (Φ, Ψ), one can fine-tune the expressivity of the dual formulation, demonstrating that optimizer structure can align tightly with the problem’s geometry. Zero duality gap can be assured under conditions on approximate subdifferentials and intersection properties among support functions, even extending to weakly convex or nonconvex components.

Multiple dualities (conjugate, Lagrange, and their interrelations) provide parallel expressive channels through which optimizer structure is matched to the complexity and variety of the original problem, with explicit examples shown for both convex and weakly convex scenarios.
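For orientation, the classical conjugate-duality instance that abstract convexity generalizes can be written as follows (standard convex analysis, not notation specific to the cited paper):

$$f^{*}(y) = \sup_{x}\,\{\langle x, y\rangle - f(x)\}, \qquad \inf_{x}\,\{f(x) + g(x)\} \;\geq\; \sup_{y}\,\{-f^{*}(y) - g^{*}(-y)\},$$

with equality (a zero duality gap) under standard qualification conditions. Abstract convexity replaces the affine minorants $x \mapsto \langle x, y\rangle - c$ by members of a chosen class of elementary functions, so that the generalized conjugation machinery yields zero-duality-gap statements under the subdifferential and intersection conditions described above.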

6. Practical Applications and Prospects

In practice, optimizer-expressivity duality informs the design of algorithms and models in several domains:

  • Robust Risk Management and Option Pricing: Duality frameworks guarantee that the solutions produced by optimizers over measure spaces or under semi-infinite constraints are not only tractable but exactly expressive of the rich underlying stochastic structure (1501.04243).
  • Global Optimization of Nonconvex Functions: Techniques such as SAGE relaxations, leveraging convex duality via relative entropy, enable both tractable computation and explicit solution recovery for nonconvex signomial and polynomial optimization, illustrating the tight balance between the expressive capacity of the relaxation and the information preserved through the optimizer (Murray et al., 2019).
  • Bilevel and Hierarchical Optimization: For complex problems such as optimistic bilevel programs, value function reformulations and sophisticated dual frameworks (Fenchel-Lagrange, Toland-Fenchel-Lagrange) demonstrate that optimizer-expressivity duality persists even without standard regularity or convexity assumptions, ensuring that dual solutions retain sufficient expressivity to represent the structure of the hierarchical optimization (En-Naciri et al., 2022).
  • Infinite and Semi-Infinite Programs: Reverse strong duality and generalizations of Farkas’ lemma provide finite representations and refined optimality conditions for problems with infinitely many constraints, again embodying the interplay between dual formulation expressivity and optimizer power (Dinh et al., 2021).

7. Broader Implications and Future Directions

Research increasingly recognizes that optimizers are not merely computational mechanisms but can drastically reshape learning outcomes, influence solution expressivity, and encode inductive biases previously attributed to other sources (model architecture or data). Designing optimizers with explicit goals—whether sparsity, low-rankness, robustness, or other properties—beyond mere convergence rate presents a growing research direction. The richer understanding of optimizer-expressivity duality encourages joint consideration of model, data, and algorithm as inseparable contributors to the solution space, with foundational, algorithmic, and practical consequences across optimization, machine learning, operations research, quantum computing, and beyond.
