Integrated Learning and Optimization (ILO)
- Integrated Learning and Optimization (ILO) is an advanced framework that couples predictive modeling with prescriptive tasks to systematically minimize decision regret and operational costs.
- It leverages bilevel programming, differentiable solvers, and decision-focused loss functions to integrate learning with optimization under uncertainty.
- Applications in power systems, air traffic management, and autonomous robotics demonstrate significant performance gains and cost reductions compared to sequential approaches.
Integrated Learning and Optimization (ILO) is a class of algorithmic methodologies and mathematical formulations that tightly couple predictive modeling—in particular, the estimation of uncertain or context-dependent parameters—with downstream optimization tasks, such that the learning procedure is explicitly oriented toward improving decision quality. Within ILO frameworks, model parameters are trained through direct integration with the prescriptive optimization problem, rather than solely with respect to prediction accuracy, enabling systematic minimization of post-decision regret, operational costs, constraint violations, or other domain-specific metrics. ILO generalizes traditional sequential pipelines by creating feedback between learning and optimization modules, and its foundations span stochastic bilevel programming, optimization networks, model-driven imitation learning, and robust control designs.
1. Mathematical Foundations and General Formulation
ILO is formalized as a stochastic bilevel program where the upper-level objective aggregates decision quality across realized data, and the lower-level problem is an optimization task whose input data or constraints are governed by the parameters or distributions predicted by a learner. Explicitly, given a family of predictors $h_\theta$ mapping contextual information $\xi$ to estimates of the uncertainty-conditioned variables, the canonical ILO program can be posed as

$$\min_{\theta} \ \mathbb{E}_{\xi}\Big[F\big(x^*(h_\theta(\xi)), \xi\big)\Big] \quad \text{s.t.} \quad x^*(h_\theta(\xi)) \in \operatorname*{arg\,min}_{x \in X(\xi)} f\big(x;\, h_\theta(\xi)\big)$$
(Tao et al., 23 Jan 2026). For convex lower-level structure, this reduces to a two-stage stochastic mathematical program with equilibrium constraints (SMPEC) involving variational inequalities. Necessary first-order optimality conditions leverage Mordukhovich subdifferential calculus, coderivative multipliers, and sensitivity analysis of the second-stage value function; in nonconvex settings, partial calmness and value-function penalization are required (Tao et al., 23 Jan 2026).
Integrated bilevel approaches generalize ML-in-the-loop optimization by training prediction parameters via implicit differentiation through the optimization layer, with KKT embeddings or value-function reformulations enabling chain rule propagation (Kolcu et al., 2021). Architecture features include parametric or deep models for the uncertain parameters, differentiable solvers for the lower-level program, and custom loss functions targeting post-decision regrets.
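As a minimal sketch of this training loop, the toy below collapses the predictor to a single parameter and the lower-level problem to an identity decision, so the decision-focused (sub)gradient can be written by hand. All names, cost coefficients, and the demand distribution are illustrative assumptions, not taken from the cited works.

```python
import random

# Toy decision-focused learning: the "predictor" is a single parameter theta,
# the downstream decision is x = theta (identity lower level), and the
# decision loss is an asymmetric ramp-style regret.

random.seed(0)
demands = [random.gauss(100, 10) for _ in range(2000)]
C_UP, C_DN = 4.0, 1.0   # under-prediction is 4x costlier than over-prediction

def regret(theta, d):
    return C_UP * max(0.0, d - theta) + C_DN * max(0.0, theta - d)

# Accuracy-only (SLO) estimate: minimizing MSE gives the sample mean.
theta_slo = sum(demands) / len(demands)

# Decision-focused (ILO) estimate: subgradient descent on average regret.
theta_ilo, lr = theta_slo, 0.05
for epoch in range(300):
    for d in demands:
        g = -C_UP if d > theta_ilo else C_DN   # subgradient of regret at theta
        theta_ilo -= lr * g / len(demands)

def avg_regret(theta):
    return sum(regret(theta, d) for d in demands) / len(demands)

print(f"SLO theta={theta_slo:.1f}  avg regret={avg_regret(theta_slo):.2f}")
print(f"ILO theta={theta_ilo:.1f}  avg regret={avg_regret(theta_ilo):.2f}")
```

Because under-prediction is priced higher, the regret-trained parameter is deliberately biased above the mean, trading forecast accuracy for lower decision cost.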
2. Algorithmic Realizations: Optimization Networks, Imitation Learning, and Model-Differentiation
ILO instantiations encompass diverse paradigms:
- Optimization Networks (ONs): Directed graphs of solver nodes (continuous, discrete, or mixed) exchange partial solutions, jointly optimizing learning (e.g., feature selection, hyperparameter tuning) and prescriptive modules (e.g., regression coefficients, subset selection) (Kommenda et al., 2021). The canonical MIQP for sparse linear regression couples binary selection variables $z_j$ with coefficients $\beta_j$, enforcing $|\beta_j| \le M z_j$ and $\sum_j z_j \le k$. Generalized ONs nest ML architectures and combinatorial decision nodes, facilitating end-to-end optimization for regression, classification, model selection, and inverse inference.
- Imitation Learning Optimization (ILO): In resource-constrained or latency-critical settings (e.g., edge Gaussian-splatting rendering), iterative optimizers such as Penalty Majorization Minimization (PMM) generate expert policies, which supervised DNNs then distill via imitation to execute optimized decisions in real time (roughly 100× speedups vs PMM with negligible loss in objective fidelity) (Wan et al., 26 Oct 2025). Offline demonstration datasets, hybrid loss functions (classification/regression), and scenario-specific features enable rapid duplication of expert behavior under domain constraints.
- Integrated Learning via Differentiable Optimization: Neural predictors for uncertain constraints (e.g., load and renewable generation in economic dispatch) are trained with regret-driven objectives that measure the post-decision market cost, not just forecast error. End-to-end SGD leverages differentiable interior-point solvers (e.g., IPOPT, custom KKT Jacobians), enabling gradient flow from decision regret through the optimization solution back to the predictor weights (Pervez et al., 13 Aug 2025, Kolcu et al., 2021).
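For tiny feature counts, the MIQP-style best-subset node of an Optimization Network can be mimicked exactly by brute-force enumeration of supports; the sketch below does so with ordinary least squares on each candidate subset. All data and the sparsity budget k are invented for illustration.

```python
import itertools

# Brute-force best-subset regression on a tiny synthetic problem, mimicking
# what an exact MIQP node would return for small feature counts.

# y depends only on features 0 and 2; feature 1 is noise.
X = [[1.0, 0.3, 2.0], [2.0, -0.5, 1.0], [3.0, 0.1, 0.5],
     [4.0, 0.9, -1.0], [5.0, -0.2, -2.5], [6.0, 0.4, -3.0]]
y = [2 * r[0] + 3 * r[2] for r in X]
k = 2  # sparsity budget: at most k nonzero coefficients

def ols_sse(cols):
    """Least squares on the selected columns via the normal equations."""
    A = [[r[c] for c in cols] for r in X]
    n = len(cols)
    # Form A^T A and A^T y, then solve the small system by elimination.
    G = [[sum(A[i][p] * A[i][q] for i in range(len(A))) for q in range(n)]
         for p in range(n)]
    b = [sum(A[i][p] * y[i] for i in range(len(A))) for p in range(n)]
    for p in range(n):                      # forward elimination
        for q in range(p + 1, n):
            f = G[q][p] / G[p][p]
            G[q] = [gq - f * gp for gq, gp in zip(G[q], G[p])]
            b[q] -= f * b[p]
    beta = [0.0] * n
    for p in reversed(range(n)):            # back substitution
        beta[p] = (b[p] - sum(G[p][q] * beta[q] for q in range(p + 1, n))) / G[p][p]
    preds = [sum(A[i][p] * beta[p] for p in range(n)) for i in range(len(A))]
    return sum((pi - yi) ** 2 for pi, yi in zip(preds, y)), beta

best = min((ols_sse(c) + (c,) for c in itertools.combinations(range(3), k)),
           key=lambda t: t[0])
sse, beta, cols = best
print("selected columns:", cols, "coefficients:", beta, "SSE:", round(sse, 6))
```

The enumeration recovers the true support {0, 2} with coefficients (2, 3); a real ON node replaces this enumeration with an MIQP solver so that the same search scales beyond a handful of features.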
3. Regret Functions and Decision-Focused Losses
A central tenet of ILO is the construction of application-specific regret or loss functions, quantifying the penalty incurred by post-hoc corrections, operational failures, or resource misalignments. For RTM electricity markets, the regret is explicitly formulated as the weighted sum of ramp-up and ramp-down costs, capturing the economic burden of prediction error on generation dispatch, often substantially exceeding the mere MSE of forecasts (Pervez et al., 13 Aug 2025). In DCOPF and ED for grid congestion, regret also incorporates penalties for constraint violations linked to mispredicted PTDF matrices (Pervez et al., 2024). In distributionally robust air traffic management, the objective is the worst-case recourse cost across Wasserstein ambiguity sets of predicted capacities (Wu et al., 23 Sep 2025).
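For concreteness, a ramp-style regret of this kind can be computed as below; two forecasts with identical MSE then incur very different decision costs. Prices and values are illustrative, not taken from the cited markets.

```python
# Two forecasts with identical MSE can incur very different ramp-cost regret
# when ramp-up (under-forecast) is priced higher than ramp-down.

C_UP, C_DN = 50.0, 10.0        # $/MWh for upward / downward correction
actual   = [100.0, 110.0, 120.0, 115.0]
fcast_lo = [95.0, 105.0, 115.0, 110.0]   # always 5 MWh under
fcast_hi = [105.0, 115.0, 125.0, 120.0]  # always 5 MWh over

def mse(f):
    return sum((fi - ai) ** 2 for fi, ai in zip(f, actual)) / len(actual)

def ramp_regret(f):
    return sum(C_UP * max(0.0, ai - fi) + C_DN * max(0.0, fi - ai)
               for fi, ai in zip(f, actual))

print("MSE (both forecasts):", mse(fcast_lo), mse(fcast_hi))   # identical
print("regret, under-forecast:", ramp_regret(fcast_lo))  # 4 * 5 * 50 = 1000.0
print("regret, over-forecast: ", ramp_regret(fcast_hi))  # 4 * 5 * 10 = 200.0
```

An accuracy-only loss cannot distinguish the two forecasts, while the regret functional prefers the over-forecast by a factor of five.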
The contrast with SLO (sequential learning–optimization) is sharp: SLO trains predictors purely for minimum prediction error, which can yield suboptimal decisions when uncertainty, nonconvexities, or non-monotonic regret surfaces dominate.
4. Feasible Region Geometry and Sensitivity
ILO frameworks jointly analyze the geometry of feasible regions in the lower-level optimization task as a function of predicted uncertainties. In economic dispatch, predicted load and renewables alter the polyhedral faces and balance hyperplanes, driving the solution into different active sets; nonconvex ridges of the regret surface often emerge, necessitating decision-biased forecasts for minimum operational cost (Pervez et al., 13 Aug 2025). For constrained iterative learning control (ILC), tightened feasible sets guarantee constraint satisfaction for all model uncertainties and disturbances, leveraging structured polytopic descriptions to guarantee robust performance (Liao-McPherson et al., 2022).
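A minimal merit-order dispatch sketch (costs and capacities invented for illustration) makes the active-set behavior concrete: as the predicted load crosses the cheap unit's capacity, the set of binding constraints changes.

```python
# Tiny two-generator economic dispatch solved by merit order, showing how the
# predicted load moves the solution between active sets, i.e. which capacity
# constraints bind.

GENS = [(20.0, 80.0), (50.0, 100.0)]   # (marginal cost $/MWh, capacity MWh)

def dispatch(load):
    """Greedy merit-order dispatch; returns (per-gen output, binding caps)."""
    out, remaining = [], load
    for cost, cap in sorted(GENS):
        x = min(cap, remaining)
        out.append(x)
        remaining -= x
    binding = [i for i, (x, (_, cap)) in enumerate(zip(out, sorted(GENS)))
               if x == cap]
    return out, binding

for load in (60.0, 80.0, 120.0):
    out, binding = dispatch(load)
    print(f"load={load:5.1f}  dispatch={out}  binding capacities={binding}")
```

Below 80 MWh no capacity constraint is active and the cheap unit is marginal; above it the cheap unit's capacity binds and the expensive unit sets the margin, so small forecast errors on either side of the breakpoint have very different cost consequences.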
Sensitivity of optimal decisions, and their dependence on predicted parameters, is rigorously characterized via coderivative multipliers and variational analysis; subgradient formulas for bilevel objectives explicitly link upper-level regret to lower-level solution sensitivities (Tao et al., 23 Jan 2026).
5. Generalization Control and Regularization
To avoid overfitting prediction models to the operational cost metric alone, several generalization control mechanisms are employed:
- Penalty Approaches: Convex combination of predictive loss and decision regret (λ-weighting), loss-budget constraints, or parameter shrinkage penalties enforce that predictor accuracy remains within statistically meaningful bounds (Kolcu et al., 2021). These penalties are tuned via validation splits and ensure out-of-sample robustness.
- Uncertainty and Scenario Analysis: Hybrid frameworks such as iCOIL employ on-line estimation of IL policy entropy and CO complexity to switch between fast-but-brittle learned policies and slow-but-reliable optimization solvers, achieving high operational robustness (Huang et al., 2023).
- Distributional Robustness: Wasserstein ambiguity sets safeguard against distribution shifts in predicted uncertainty, enabling adaptive decision rules for fluctuating contexts (Wu et al., 23 Sep 2025).
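The penalty-style convex combination of prediction loss and decision regret can be sketched as below: as the regret weight grows, the loss-minimizing forecast drifts from the accuracy-optimal mean toward a decision-biased value. The setup is entirely illustrative.

```python
import random

# Weighted combination of prediction loss (MSE) and decision regret: sweeping
# the regret weight shows the optimal forecast moving away from the mean.

random.seed(1)
demands = [random.gauss(100, 10) for _ in range(1000)]
C_UP, C_DN = 4.0, 1.0

def combined_loss(theta, lam):
    mse = sum((theta - d) ** 2 for d in demands) / len(demands)
    reg = sum(C_UP * max(0.0, d - theta) + C_DN * max(0.0, theta - d)
              for d in demands) / len(demands)
    return (1 - lam) * mse + lam * reg

grid = [80 + 0.2 * i for i in range(250)]   # candidate forecasts in [80, 130)
optima = {lam: min(grid, key=lambda t: combined_loss(t, lam))
          for lam in (0.0, 0.5, 1.0)}
for lam, theta in optima.items():
    print(f"regret weight={lam:.1f}  best forecast={theta:.1f}")
```

At weight 0 the minimizer is the sample mean; at weight 1 it is the regret-optimal quantile; intermediate weights interpolate, which is the generalization-control knob tuned on validation data.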
6. Implementation, Scalability, and Empirical Results
ILO algorithmic complexity typically exceeds that of decoupled learning and optimization. Mixed-integer nodes (e.g., MIQP in ONs) scale poorly with dimensionality, and bilevel programs are NP-hard in general. Sample-average approximations and stochastic gradient methods are practical for scalability; scenario decomposition (Progressive Hedging) aids computation for moderate-sized systems (Kolcu et al., 2021).
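A sample-average approximation of the expected regret is straightforward to sketch; the empirical mean over sampled scenarios tightens as the sample grows. The distribution and regret function below are illustrative.

```python
import random
import statistics

# Sample-average approximation (SAA): replace the expected regret under the
# scenario distribution with an empirical mean over sampled scenarios.

random.seed(2)
C_UP, C_DN, THETA = 4.0, 1.0, 105.0   # fixed candidate forecast

def regret(d):
    return C_UP * max(0.0, d - THETA) + C_DN * max(0.0, THETA - d)

def saa_estimate(n):
    """Monte Carlo estimate of E[regret] from n sampled demand scenarios."""
    return statistics.fmean(regret(random.gauss(100, 10)) for _ in range(n))

for n in (10, 100, 10000):
    print(f"N={n:6d}  SAA regret estimate={saa_estimate(n):.2f}")
```

Stochastic gradient methods amount to running this estimator with small batches inside each training step, which is what makes bilevel ILO training tractable at scale.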
Empirical findings consistently establish decision-focused gains:
- Ramp cost reductions of 20–30% vs SLO in power systems under significant forecast error (Pervez et al., 13 Aug 2025).
- DR-MAGHP yields up to 15.6% improvement over SP-MAGHP under distributional shifts, and >40% against deterministic baselines (Wu et al., 23 Sep 2025).
- iCOIL improves success rates by 20–30 percentage points over pure IL, with only modest increase in average task time (Huang et al., 2023).
- Optimization Networks produce sparser, more accurate models, at the cost of additional compute, relative to standard OLS and the elastic net (Kommenda et al., 2021).
7. Extensions and Ongoing Research Directions
Recent work has begun to address optimality conditions for stochastic bilevel ILO under general uncertainty, providing subgradient and stationarity formulas to facilitate algorithmic gradient-based solvers (Tao et al., 23 Jan 2026). Prospective areas include:
- End-to-end training across deep learning and prescriptive layers for full pipeline decision optimization (Wu et al., 23 Sep 2025).
- Adaptive selection of model architectures and topologies via meta-optimization (Kommenda et al., 2021).
- Enhanced scenario compression and clustering for tractable robust optimization in high-dimensional settings (Wu et al., 23 Sep 2025).
- Extension to dynamic or online settings with discrete switching between learned and optimization-based controls (Huang et al., 2023).
- Integration of robust learning and optimization in contexts with nonconvex constraints, requiring new calmness and penalty formulations.
Overall, ILO paradigms represent a systematic response to the limitations of sequential learning–optimization pipelines, providing rigorously-motivated, context-aware, and application-driven solutions across domains including electricity markets, air traffic management, autonomous robotics, and ML-augmented decision systems.