Aggregation-Based Convexification Methods
- Aggregation-based convexification is a technique that aggregates functions, constraints, or model components to transform nonconvex problems into equivalent or relaxed convex formulations.
- It leverages convex hull methods in tasks like empirical risk minimization and Bayesian estimation to achieve optimal statistical rates and adaptive performance.
- The approach underpins advances in global optimization, inverse problems, and mixed-integer programming by generating strong relaxations and facet-defining inequalities.
Aggregation-based convexification techniques form a family of methods in mathematical optimization and statistical learning that leverage the aggregation of functions, constraints, or model components to enable or improve convexification—the transformation of a (possibly nonconvex) problem into an equivalent or relaxed convex formulation. These methods exploit structure available through model aggregation, whether at the level of empirical risk, constraints, estimators, or representation, to achieve optimal statistical rates, stronger relaxations, or efficient numerical algorithms. The concept encompasses both classical approaches, such as empirical risk minimization over convex hulls, and modern convexification of nonconvex systems, with applicability across regression, classification, inverse problems, polynomial optimization, and mixed-integer programming.
1. Statistical Aggregation and Empirical Risk Convexification
A core instantiation of aggregation-based convexification arises in statistical learning, where the objective is to aggregate a dictionary $F = \{f_1, \dots, f_M\}$ of functions in order to compete with the best function in the convex hull $\mathrm{conv}(F)$. The fundamental result (Lecué, 2013) establishes that empirical risk minimization (ERM) over $\mathrm{conv}(F)$ in the bounded regression model, with squared loss risk $R(f) = \mathbb{E}(Y - f(X))^2$, achieves uniform oracle inequalities: with probability at least $1-\delta$, $R(\hat f_{\mathrm{ERM}}) \le \min_{f \in \mathrm{conv}(F)} R(f) + C_\delta\,\psi_n(M)$, where $\psi_n(M) \asymp M/n$ if $M \le \sqrt{n}$, and $\psi_n(M) \asymp \sqrt{\tfrac{1}{n}\log\bigl(\tfrac{eM}{\sqrt{n}}\bigr)}$ if $M > \sqrt{n}$. These rates are minimax-optimal for the convex aggregation problem.
This technique is both geometrically and algorithmically fundamental: aggregating over the convex hull transforms combinatorial selection over $F$ into convex optimization, enabling the application of concentration inequalities and empirical process theory. Maurey's empirical method is used to discretize $\mathrm{conv}(F)$ and extend concentration results from finite sets to the convex hull, ensuring uniform deviation bounds.
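As an illustration of the optimization step, the following minimal Python sketch performs ERM over the simplex of aggregation weights (equivalently, over $\mathrm{conv}(F)$) by projected gradient descent with a sort-based simplex projection; the dictionary matrix, step size, and toy data are illustrative assumptions, not part of the cited analysis.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto the probability simplex (sort-based)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def convex_aggregate(F, y, steps=2000, lr=None):
    """ERM over conv{f_1,...,f_M}: minimize (1/n)||F w - y||^2 over the simplex.

    F : (n, M) matrix whose columns are the dictionary predictions f_j(X_i).
    y : (n,) response vector.
    """
    n, M = F.shape
    w = np.full(M, 1.0 / M)                       # start at the barycenter
    if lr is None:
        lr = n / (2 * np.linalg.norm(F, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    for _ in range(steps):
        grad = 2.0 / n * F.T @ (F @ w - y)        # gradient of the empirical squared risk
        w = project_to_simplex(w - lr * grad)
    return w

# toy usage: M = 50 base predictors on n = 200 points
rng = np.random.default_rng(0)
n, M = 200, 50
F = rng.normal(size=(n, M))
y = 0.6 * F[:, 0] + 0.4 * F[:, 3] + 0.1 * rng.normal(size=n)
w_hat = convex_aggregate(F, y)
print(np.round(w_hat[:5], 3))
```

Because the feasible set is the probability simplex, any off-the-shelf convex solver could replace the projected gradient loop; the point is that the aggregation step is an ordinary convex program.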
Key consequences:
- In low-complexity regimes ($M \le \sqrt{n}$), the excess risk is linear in the dictionary cardinality $M$.
- In high-complexity regimes ($M > \sqrt{n}$), the excess risk is governed by complexity measures reflecting the entropy of $\mathrm{conv}(F)$.
2. Bayesian and Probabilistic Aggregation
Bayesian convex and linear aggregation extends the aggregation-based convexification framework through hierarchical prior modeling (Yang et al., 2014). By placing Dirichlet priors over the simplex of aggregation weights (for convex aggregation), or double Dirichlet Gamma priors (for linear aggregation), one obtains posterior contraction rates matching the minimax rates derived for frequentist convex aggregation, without requiring explicit tuning parameters or prior knowledge of the sparsity level.
Specifically, for regression with a sparse truth, the rate improves to the order of $\tfrac{s}{n}\log\tfrac{eM}{s}$ when only $s$ dictionary elements are relevant. The construction of the prior, a symmetric Dirichlet whose concentration parameter decays polynomially in the dictionary size $M$, ensures that the posterior adapts to unknown sparsity without penalty selection.
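A minimal sketch of the mechanism, assuming a Gaussian regression likelihood with known noise variance and using crude prior importance sampling in place of the MCMC or variational schemes mentioned below; the dictionary, data, and the choice of concentration parameter $\alpha = 1/M$ are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n, M = 50, 5
F = rng.normal(size=(n, M))                                   # dictionary predictions f_j(X_i)
y = 0.7 * F[:, 0] + 0.3 * F[:, 3] + 0.5 * rng.normal(size=n)  # truth uses 2 of the 5 elements
sigma2 = 0.25                                                 # assumed known noise variance

# Symmetric Dirichlet prior on the weight simplex; a small concentration
# parameter pushes prior mass toward sparse corners of the simplex.
alpha = 1.0 / M
S = 50000
W = rng.dirichlet(np.full(M, alpha), size=S)                  # S prior draws of weights

# Gaussian log-likelihood of each draw of aggregation weights
resid = y[None, :] - W @ F.T                                  # (S, n) residual matrix
loglik = -0.5 * np.sum(resid ** 2, axis=1) / sigma2

# Self-normalized importance weights give a crude posterior approximation.
iw = np.exp(loglik - loglik.max())
iw /= iw.sum()
w_post = iw @ W                                               # posterior-mean aggregation weights
print(np.round(w_post, 2))                                    # mass should concentrate on j = 0, 3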
The Bayesian formulation deepens the theoretical foundations of aggregation-based convexification, revealing adaptation properties under “M-open” settings (i.e., model misspecification), and providing practical methods for computation via MCMC or variational inference.
3. Aggregative Convexification in Optimization
Aggregation-based convexification appears in optimization when one works with sets or objectives that are nonconvex due to nonlinear (especially bilinear or quadratic) structure, or mixed integer constraints. The technique is to aggregate constraints—usually via nonnegative linear combinations—so as to produce single (often simpler) constraints whose convex hull can be explicitly described or more tightly relaxed.
Quadratic and Bilinear Aggregation
For quadratic inequalities, under a positive definite linear combination (PDLC) condition, the convex hull of the set defined by three quadratic inequalities can be obtained via (possibly infinite) intersections of the convex hulls of aggregated constraints (Dey et al., 2021). For two such inequalities, it suffices to consider only two aggregations (a sharp extension of Farkas’ Lemma and the S-lemma). However, for four or more, or when PDLC fails, aggregation may be insufficient, revealing a sharp limitation of the technique.
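The following sketch illustrates the elementary aggregation step for two quadratics $q_i(x) = x^\top A_i x + b_i^\top x + c_i$: it scans nonnegative weights $(t, 1-t)$ and reports those for which the aggregated Hessian is positive semidefinite, so that the aggregated inequality is convex. This shows only the basic ingredient, not the convex-hull characterization of the cited work; the example quadratics are assumptions.

```python
import numpy as np

def aggregate(quads, lam):
    """Nonnegative aggregation sum_i lam_i * (A_i, b_i, c_i)."""
    A = sum(l * A_i for l, (A_i, _, _) in zip(lam, quads))
    b = sum(l * b_i for l, (_, b_i, _) in zip(lam, quads))
    c = sum(l * c_i for l, (_, _, c_i) in zip(lam, quads))
    return A, b, c

def convex_aggregations(quads, grid=51, tol=1e-9):
    """Scan lam = (t, 1-t), t in [0, 1], and keep weights whose aggregated
    Hessian is positive semidefinite, i.e. for which the aggregated constraint
    lam1*q1(x) + lam2*q2(x) <= 0 is convex."""
    keep = []
    for t in np.linspace(0.0, 1.0, grid):
        A, _, _ = aggregate(quads, (t, 1.0 - t))
        if np.linalg.eigvalsh(A).min() >= -tol:
            keep.append(t)
    return keep

# two nonconvex quadratics in R^2 that admit convex aggregations
q1 = (np.array([[1.0, 0.0], [0.0, -1.0]]), np.zeros(2), -1.0)   #  x^2 - y^2 <= 1
q2 = (np.array([[-1.0, 0.0], [0.0, 2.0]]), np.zeros(2), -1.0)   # -x^2 + 2y^2 <= 1
print(convex_aggregations([q1, q2]))   # weights t giving a convex aggregated constraint
```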
For bilinear bipartite equality constraint systems, aggregation by weights yields relaxations as intersections of convex hulls of single (aggregated) constraints. In the two-dimensional setting, a finite set of aggregations suffices for exact convexification, while in higher dimensions either infinite or approximate intersections are necessary (Dey et al., 18 Oct 2024). Adding a handful of aggregated cuts yields substantial tightening in branch-and-bound algorithms for structured model updating.
In chance-constrained programming, aggregation-based convexification over bilinear extended formulations with simplex constraints leads to facet-defining inequalities that unify and strengthen polyhedral relaxations of mixing sets with knapsack-type constraints (Davarnia et al., 17 Oct 2025). The aggregation procedure in a lifted space generates a very broad family of facet-defining inequalities, with computational results showing over 90% coverage of all convex hull facets in standard benchmark sets.
Practical Techniques and Computational Heuristics
When selecting aggregation multipliers, practical heuristics range from grid search (maximizing violation over a grid) and "simple" dual search (maximizing separation from a relaxed solution) to surrogate-duality-inspired multipliers. Tightening via aggregation is often used in concert with McCormick or one-row relaxations, complementing classical convexification strategies.
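A toy version of the violation-maximizing grid search: among aggregations of two quadratic constraints whose aggregated Hessian is PSD (hence valid convex cuts), pick the multipliers whose cut is most violated at a relaxation point $\hat x$. The constraints, grid, and $\hat x$ are illustrative assumptions, not a specific procedure from the cited papers.

```python
import numpy as np

def best_convex_aggregation(quads, x_hat, grid=201, tol=1e-9):
    """Among aggregations lam = (t, 1-t) of two quadratic constraints
    q_i(x) = x^T A_i x + b_i^T x + c_i <= 0 whose aggregated Hessian is PSD
    (so the aggregated inequality is a valid convex cut), return the weights
    whose cut is most violated at the relaxation point x_hat."""
    best = None
    for t in np.linspace(0.0, 1.0, grid):
        lam = (t, 1.0 - t)
        A = lam[0] * quads[0][0] + lam[1] * quads[1][0]
        if np.linalg.eigvalsh(A).min() < -tol:
            continue                                   # aggregated constraint not convex
        b = lam[0] * quads[0][1] + lam[1] * quads[1][1]
        c = lam[0] * quads[0][2] + lam[1] * quads[1][2]
        violation = x_hat @ A @ x_hat + b @ x_hat + c  # q_lam(x_hat); > 0 means x_hat is cut off
        if best is None or violation > best[1]:
            best = (lam, violation)
    return best

# Illustrative data: two nonconvex quadratics and an infeasible relaxation point.
q1 = (np.array([[1.0, 0.0], [0.0, -1.0]]), np.zeros(2), -1.0)   #  x^2 - y^2 <= 1
q2 = (np.array([[-1.0, 0.0], [0.0, 2.0]]), np.zeros(2), -1.0)   # -x^2 + 2y^2 <= 1
x_hat = np.array([2.0, 1.0])
lam_star, viol = best_convex_aggregation([q1, q2], x_hat)
print(lam_star, viol)    # multipliers of a convex aggregated cut separating x_hat
```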
4. Aggregation in Learning with Constraints and Program Extensions
In regularized empirical risk minimization with constrained labels, a setting unifying semi-supervised, unsupervised, and multi-label learning, aggregation-based convexification provides a mechanism for constructing the tightest convex extension (the Legendre–Fenchel biconjugate) of a nonconvex cost function defined jointly over continuous model parameters and a discrete label set (Shcherbatyi et al., 2016). Exact convex extensions are NP-hard to compute, but by decomposing the objective additively, aggregation over single components yields efficiently computable, looser convex envelopes. For binary labels and convex losses/regularizers, the extension reduces to one-dimensional convex programs, admitting closed forms in some cases (e.g., squared loss with standard norm regularization).
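In one dimension the biconjugate can be computed directly as the lower convex hull of the sampled graph, which also recovers the binary-label special case (linear interpolation between the two label costs). The sketch below is a generic construction under that one-dimensional assumption, not the multi-variable machinery of the cited work.

```python
import numpy as np

def lower_convex_envelope(xs, gs):
    """Biconjugate (tightest convex extension) of a function given by samples
    (xs[i], gs[i]) on a 1-D grid: the lower convex hull of the points,
    evaluated by linear interpolation between hull vertices."""
    order = np.argsort(xs)
    xs, gs = np.asarray(xs, float)[order], np.asarray(gs, float)[order]
    hull = []                                   # indices of lower-hull vertices
    for i in range(len(xs)):
        while len(hull) >= 2:
            x1, g1 = xs[hull[-2]], gs[hull[-2]]
            x2, g2 = xs[hull[-1]], gs[hull[-1]]
            # drop the middle point if it lies on or above the chord to the new point
            if (g2 - g1) * (xs[i] - x1) >= (gs[i] - g1) * (x2 - x1):
                hull.pop()
            else:
                break
        hull.append(i)
    hx, hg = xs[hull], gs[hull]
    return lambda x: np.interp(x, hx, hg)

# binary-label special case: the extension of g on {0, 1} is linear interpolation
env = lower_convex_envelope([0.0, 1.0], [2.0, -1.0])
print(env(0.25))         # 1.25 = 0.75 * g(0) + 0.25 * g(1)

# a nonconvex cost on a grid: the envelope convexifies it
xs = np.linspace(0.0, 1.0, 101)
env2 = lower_convex_envelope(xs, np.cos(6 * np.pi * xs))
print(env2(0.5))         # -1.0, the global minimum of the sampled cost
```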
In multi-convex problems with block-structured variables, disciplined multi-convex programming (DMCP) frameworks facilitate variable aggregation by verifying convexity of the objective when all but one block of variables is fixed, enabling systematic block coordinate descent with guaranteed convexity of each subproblem (Shen et al., 2016).
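A minimal sketch of the block coordinate descent pattern that such frameworks certify, on an assumed biconvex objective (regularized matrix factorization): each block update is a convex ridge problem with a closed-form solution. This illustrates the convex-per-block structure only and does not use the DMCP package API.

```python
import numpy as np

# Biconvex objective: ||A - U V||_F^2 + lam * (||U||_F^2 + ||V||_F^2)
# is convex in U for fixed V and convex in V for fixed U, so block coordinate
# descent solves a convex (here, ridge) subproblem at every step.
rng = np.random.default_rng(3)
m, n, r, lam = 30, 20, 3, 0.1
A = rng.normal(size=(m, r)) @ rng.normal(size=(r, n)) + 0.05 * rng.normal(size=(m, n))

U = rng.normal(size=(m, r))
V = rng.normal(size=(r, n))
for _ in range(100):
    # U-block: min_U ||A - U V||^2 + lam ||U||^2   (closed-form ridge solution)
    U = A @ V.T @ np.linalg.inv(V @ V.T + lam * np.eye(r))
    # V-block: min_V ||A - U V||^2 + lam ||V||^2
    V = np.linalg.inv(U.T @ U + lam * np.eye(r)) @ U.T @ A

obj = np.linalg.norm(A - U @ V) ** 2 + lam * (np.linalg.norm(U) ** 2 + np.linalg.norm(V) ** 2)
print(round(obj, 3))     # monotonically non-increasing objective value after the sweeps
```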
5. Universal Function Representation and Generalized Convexity
Universal representation of generalized convex functions further extends aggregation-based convexification to function-space parameterization (Nehzati, 30 Aug 2025). Here, any generalized convex function (in the sense of optimal transport, economics, or mechanism design) is represented as a supremum (or log-sum-exp aggregation) of finitely many surplus functions $x \mapsto s(x, y) + w(y)$, parameterized by a finite set of points $y$ and a weighting function $w$. This finite "aggregation" parameterization is shown to be dense (a universal approximation property, UAP) in the space of all generalized convex functions and their gradients, under mild regularity conditions. Applications include single-level reformulation of bilevel problems and direct convexification of mechanism design objectives.
This parameterization is directly implemented in computational packages, enabling optimization over convex parameter domains and direct enforcement of generalized convexity constraints, which is crucial in economics, optimal transport, and machine learning.
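For the classical surplus $s(x, y) = xy$, the finite aggregation is a maximum (or log-sum-exp) of affine functions, and the weights can be taken from the Legendre conjugate evaluated on a grid of slopes. The sketch below is a minimal illustration under these assumptions; the function and parameter names are not taken from the cited package.

```python
import numpy as np

def finite_surplus_representation(f, slopes, xs_grid):
    """Finite aggregation f_hat(x) = max_j [ s(x, y_j) + w(y_j) ] with the
    classical surplus s(x, y) = x * y and weights w(y_j) = -f*(y_j)
    (negative Legendre conjugate, estimated on a grid)."""
    # w_j = -f*(y_j) = min_x [ f(x) - y_j * x ], approximated over xs_grid
    w = np.array([np.min(f(xs_grid) - y * xs_grid) for y in slopes])

    def f_hat(x, softness=None):
        vals = np.outer(np.atleast_1d(x), slopes) + w          # s(x, y_j) + w_j for each j
        if softness is None:
            return vals.max(axis=1)                            # max-aggregation
        m = vals.max(axis=1, keepdims=True)                    # stable log-sum-exp aggregation
        lse = m + softness * np.log(np.exp((vals - m) / softness).sum(axis=1, keepdims=True))
        return lse[:, 0]
    return f_hat

f = lambda x: x ** 2                        # target convex function
xs = np.linspace(-2, 2, 401)
f_hat = finite_surplus_representation(f, slopes=np.linspace(-4, 4, 9), xs_grid=xs)
print(np.max(np.abs(f_hat(xs) - f(xs))))    # uniform error of the finite max-representation
```

The max-aggregation is convex by construction (a pointwise maximum of affine functions), and refining the slope grid shrinks the uniform error, which is the finite-parameterization density property described above in its simplest instance.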
6. Applications across Inverse Problems, Polynomial Optimization, and MINLP
Aggregation-based convexification underpins recent global optimization algorithms in polynomial and mixed-integer nonlinear optimization. For box-constrained polynomial optimization, aggregation by monomial patterns (e.g., multilinear, chain, truncated submonoid) allows decomposition of high-dimensional moment relaxations into tractable, structured pieces, trading off relaxation tightness versus computational cost (Averkov et al., 2019).
In coefficient inverse problems arising in PDEs (inverse parabolic, radiative transport, frequency-dependent scattering), aggregation emerges as expansion over a suitable basis (e.g., truncated Fourier-like series) of the unknown, followed by minimization of a weighted (often Carleman) convexified least squares functional (Klibanov et al., 2020, Klibanov et al., 2022, Le et al., 2023). The aggregation stabilizes the inherently nonconvex estimation problem, enabling globally convergent numerical algorithms.
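The two mechanical ingredients, truncated basis expansion of the unknown and a Carleman-type weighted least-squares functional, can be sketched on a deliberately linear toy forward map (so the functional is already quadratic); in the cited works the forward maps are nonlinear and the weight is what renders the functional strictly convex. All problem data, the basis, and the weight below are illustrative assumptions.

```python
import numpy as np

# Toy sketch: data g(t) = \int_0^t f(s) ds observed with noise; the unknown f is
# "aggregated" into a truncated cosine basis, and a Carleman-type exponential
# weight shapes the least-squares residual.
rng = np.random.default_rng(4)
N, K, lam = 200, 8, 2.0
t = np.linspace(0.0, 1.0, N)
dt = t[1] - t[0]
f_true = np.cos(3 * t) + 0.5 * t
g_obs = np.cumsum(f_true) * dt + 0.01 * rng.normal(size=N)    # noisy antiderivative data

basis = np.cos(np.pi * np.arange(K)[None, :] * t[:, None])    # phi_k(t) = cos(pi * k * t)
Phi = np.cumsum(basis, axis=0) * dt                           # forward images of the basis

w = np.exp(2 * lam * (1.0 - t))                               # Carleman-type weight
lhs = Phi.T @ (w[:, None] * Phi) + 1e-6 * np.eye(K)           # weighted normal equations
coef = np.linalg.solve(lhs, Phi.T @ (w * g_obs))
f_rec = basis @ coef                                          # reconstructed coefficient
print(float(np.max(np.abs(f_rec - f_true))))                  # reconstruction error
```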
In mixed-integer nonlinear programming, aggregation-based convexification can be incorporated into outer-approximation frameworks: convexification cuts derived from aggregation, combined with rigorous domain reduction, tighten MILP master relaxations and accelerate branch-and-bound convergence (Peng et al., 30 Jul 2024).
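A minimal sketch of an aggregated outer-approximation cut: a nonnegative combination of convex constraints is linearized at a candidate point, yielding a valid linear inequality for the MILP master. The constraint functions and the point $\hat x$ are illustrative assumptions.

```python
import numpy as np

def aggregated_oa_cut(vals_and_grads, lam, x_hat):
    """Outer-approximation cut from the aggregated constraint
    sum_i lam_i * g_i(x) <= 0 (lam_i >= 0, g_i convex): linearizing at x_hat
    gives g_lam(x_hat) + grad_lam^T (x - x_hat) <= 0, i.e. a cut a^T x <= b."""
    g_lam = sum(l * g for l, (g, _) in zip(lam, vals_and_grads))
    grad_lam = sum(l * dg for l, (_, dg) in zip(lam, vals_and_grads))
    return grad_lam, float(grad_lam @ x_hat - g_lam)

# Illustrative constraints g1(x) = ||x||^2 - 1 and g2(x) = exp(x1) - 2, evaluated at x_hat.
x_hat = np.array([1.0, 1.0])
g1 = (float(x_hat @ x_hat - 1.0), 2.0 * x_hat)
g2 = (float(np.exp(x_hat[0]) - 2.0), np.array([np.exp(x_hat[0]), 0.0]))
a, b = aggregated_oa_cut([g1, g2], lam=(0.5, 0.5), x_hat=x_hat)
print(a, b)   # linear cut a^T x <= b to add to the MILP master relaxation
```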
7. Theoretical and Computational Impact
Aggregation-based convexification techniques bridge the gap between tight but expensive convexifications (e.g., sum-of-squares relaxations, copositive programming) and rapidly computable but loose relaxations. The structure provided by aggregation—whether of models, constraints, or functional representations—enables theoretically optimal statistical rates, adaptation to model misspecification and sparsity, efficient large-scale computation, and the recursive construction of facet-defining inequalities and representations in combinatorial and chance-constrained optimization.
These methods are now standard in statistical ensemble learning, high-dimensional regression, global polynomial optimization, mixed-integer programming, and inverse problem regularization, and remain under active development, particularly for problems combining combinatorial and analytical nonconvexity.
Summary Table: Aggregation-Based Convexification Techniques
| Application Area | Aggregation Mechanism | Key Outcome |
|---|---|---|
| Regression/Classification (ERM) | Convex hull of base functions | Minimax-optimal risk via convex empirical risk minimization |
| Bayesian Ensemble Learning | Dirichlet priors on aggregation | Adaptive, tuning-free, minimax-optimal Bayesian estimators |
| Quadratic/Bilinear Constraints | Weighted sum of constraints | Strong relaxations; facet-defining inequalities |
| Inverse Problems/PDEs | Basis expansion, aggregated residual | Strictly convex functional; global convergence |
| Polynomial Optimization | Aggregation by pattern (e.g. chain) | Tradeoff between relaxation quality and cost |
| Mixed-Integer Optimization (MINLP) | Aggregated convexification cuts | Tighter MILP relaxations in OA/branch-and-bound |
| Generalized Convex Function Approximation | Max/LSE over surplus functions | Universal approximators (functions and gradients) |
Aggregation-based convexification constitutes a rigorous and algorithmically diverse paradigm for rendering nonconvex problems tractable or optimally estimable via convex analytical and computational mechanisms.