Predict-then-Optimize (PO) Approach

Updated 21 January 2026
  • Predict-then-Optimize (PO) is a data-driven paradigm that predicts uncertain parameters from features and uses embedded optimization to generate tailored decisions.
  • End-to-end and proxy-based methods refine the PO pipeline by incorporating differentiable optimization layers and surrogate losses to minimize decision regret.
  • Empirical benchmarks across finance, scheduling, and power management reveal that advanced PO techniques can achieve significantly lower regret and faster inference than traditional methods.

The Predict-then-Optimize (PO) Approach is a foundational paradigm in data-driven decision-making under uncertainty, combining statistical learning with mathematical optimization to yield high-quality decisions tailored to real-world tasks. At its core, PO leverages observed features to predict uncertain, task-defining parameters of an optimization model and then translates these predictions into decisions via an embedded optimization procedure. Recent developments have aimed to close the gap between predictive accuracy and actual decision quality by designing end-to-end learning algorithms, new loss functions, and efficient proxy architectures, as well as extending PO frameworks to non-convex, combinatorial, and sequential settings. This article provides a detailed technical overview of the PO methodology, its principal algorithmic variants, key advances in loss-driven and proxy-based approaches, empirical benchmarking, and outstanding research challenges.

1. Mathematical Framework and Standard Pipeline

Consider a stochastic decision process where $z \in Z \subseteq \mathbb{R}^d$ is an observable feature vector, $\zeta \in C \subseteq \mathbb{R}^p$ an unobserved parameter vector, and $x \in \mathbb{R}^n$ a decision vector constrained to a feasible set $X = \{x \mid g(x) \leq 0,\, h(x) = 0\}$. The downstream decision is given by the parametric optimization problem:

$$x^*(\zeta) = \arg\min_{x} \ f(x,\zeta) \quad \text{s.t. } g(x) \leq 0,\, h(x) = 0.$$

The classical Predict-then-Optimize pipeline consists of two stages:

  • Prediction: Learn a function $C_\theta(z)$ (often a neural network) to estimate $\zeta$ from $z$ via statistical regression/supervised learning.
  • Optimization: Given the predicted parameters $\hat{\zeta} = C_\theta(z)$, solve the optimization problem to generate the final decision $x^*(\hat{\zeta})$.
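
The two stages above can be sketched end to end on a toy problem. Everything here (the linear ground-truth mapping, the simplex-vertex feasible set where the argmin just picks the cheapest item) is an illustrative assumption, not a construction from the article:

```python
import numpy as np

# Toy two-stage Predict-then-Optimize sketch:
# features z -> predicted parameters zeta_hat -> argmin decision x*(zeta_hat).
rng = np.random.default_rng(0)

# Synthetic training data: true costs depend linearly on features, plus noise.
W_true = np.array([[1.0, -2.0], [0.5, 1.5], [-1.0, 0.3]])     # (p=3, d=2)
Z = rng.normal(size=(200, 2))                                  # feature vectors z
C = Z @ W_true.T + 0.1 * rng.normal(size=(200, 3))             # noisy costs zeta

# Stage 1 (Prediction): fit C_theta by ordinary least squares.
W_hat, *_ = np.linalg.lstsq(Z, C, rcond=None)                  # shape (d, p)

def predict(z):
    return z @ W_hat                                           # zeta_hat = C_theta(z)

# Stage 2 (Optimization): with X = the unit-vector vertices of the simplex,
# argmin_x zeta^T x reduces to selecting the single cheapest coordinate.
def optimize(zeta):
    x = np.zeros_like(zeta)
    x[np.argmin(zeta)] = 1.0
    return x

z_new = np.array([0.3, -0.8])
decision = optimize(predict(z_new))
```

In realistic settings Stage 2 would call a solver on the predicted parameters; the one-hot argmin here stands in for that solve.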

The decision quality is typically assessed by expected regret:

$$\min_\theta\ \mathbb{E}_{(z,\zeta) \sim \Omega} \big[ f\big(x^*(C_\theta(z)),\,\zeta\big) - f\big(x^*(\zeta),\,\zeta\big) \big].$$

In practice, the easier-to-compute $\mathbb{E}\big[f(x^*(C_\theta(z)),\zeta)\big]$ is also reported, since $f(x^*(\zeta),\zeta)$ is constant in $\theta$.

Classic two-stage PO regresses $C_\theta$ using an $\ell_2$ objective ($\|\hat{\zeta}-\zeta\|^2$), ignoring the downstream impact on $f$. This can yield suboptimal decisions due to the misalignment between predictive error and decision error, especially in the presence of non-convex decision boundaries or discrete variables (Kotary et al., 2024, Kotary et al., 2023).
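
The regret of a prediction can be computed explicitly for a linear objective $f(x,\zeta) = \zeta^T x$. The feasible set below (simplex vertices, so the argmin is a one-hot pick) and the numeric costs are toy assumptions for illustration:

```python
import numpy as np

# Regret of a predicted parameter vector for f(x, zeta) = zeta^T x,
# with x*(zeta) the one-hot argmin over simplex vertices.
def solve(zeta):
    x = np.zeros_like(zeta)
    x[np.argmin(zeta)] = 1.0
    return x

def regret(zeta_hat, zeta_true):
    # f(x*(zeta_hat), zeta_true) - f(x*(zeta_true), zeta_true) >= 0:
    # the true cost of acting on the prediction, minus the optimal true cost.
    return zeta_true @ solve(zeta_hat) - zeta_true @ solve(zeta_true)

zeta_true = np.array([1.0, 3.0, 2.0])
print(regret(np.array([2.0, 1.0, 3.0]), zeta_true))  # wrong ranking -> 2.0
print(regret(np.array([0.5, 9.0, 9.0]), zeta_true))  # wildly wrong values,
                                                     # but correct argmin -> 0.0
```

The second case illustrates the misalignment the paragraph above describes: a prediction with large $\ell_2$ error can still have zero regret if it preserves the decision.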

2. End-to-End and Surrogate Loss Approaches

The End-to-End Predict-then-Optimize (EPO) paradigm directly incorporates the optimization problem into the training loop. Rather than minimizing parameter prediction error, EPO minimizes expected regret with respect to the ground truth:

$$\min_\theta\ \mathbb{E}_{(z,\zeta)}\left[ f\left(x^*(C_\theta(z)),\, \zeta\right) \right].$$

Training thus requires differentiating through the argmin operator, for which several techniques have been proposed:

  • Implicit Differentiation of the KKT Conditions: Applied to convex problems (e.g., QPs, LPs) via differentiable optimization layers [Amos & Kolter 2017]. This requires explicit derivation of gradients for each optimization class.
  • Smoothing or Stochastic Perturbation Techniques: Replace non-differentiable argmin with smooth approximations, e.g., Fenchel-Young or SPO+ surrogate losses [Elmachtoub & Grigas 2020; Berthet et al. 2020].
  • Unrolling Fixed-Point Solvers: Back-propagate through iterative optimization algorithms such as Projected Gradient Descent or Newton methods [Wang et al. 2020; Tang & Khalil 2022].

While EPO can improve downstream decision quality and align predictions with decision boundaries, it is computationally expensive (due to solving the inner optimization at every iteration) and becomes less effective for nonconvex or discrete settings lacking well-defined gradients (Kotary et al., 2024, Kotary et al., 2023).
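
Among the smoothing techniques above, the SPO+ surrogate of Elmachtoub & Grigas admits a compact closed form for linear minimization $\min_{x \in X} c^T x$: $\ell_{\mathrm{SPO+}}(\hat{c}, c) = \max_{x \in X}\,(c - 2\hat{c})^T x + 2\hat{c}^T x^*(c) - c^T x^*(c)$. The sketch below evaluates it on a toy feasible set given as an explicit list of points (an assumption for illustration; real problems would call a solver for the inner max):

```python
import numpy as np

# SPO+ surrogate loss for min_{x in X} c^T x, with X enumerated explicitly.
X = np.eye(3)  # toy feasible set: the three unit vectors

def solve_min(c):
    # x*(c) = argmin_{x in X} c^T x
    return X[np.argmin(X @ c)]

def spo_plus(c_hat, c):
    # ell_SPO+(c_hat, c) = max_{x in X} (c - 2 c_hat)^T x
    #                      + 2 c_hat^T x*(c) - c^T x*(c)
    x_star = solve_min(c)
    return np.max(X @ (c - 2 * c_hat)) + 2 * c_hat @ x_star - c @ x_star

c_true = np.array([1.0, 3.0, 2.0])
print(spo_plus(c_true, c_true))                      # perfect prediction -> 0.0
print(spo_plus(np.array([3.0, 1.0, 2.0]), c_true))   # swapped costs -> 6.0
```

SPO+ upper-bounds the true decision regret (here the swapped prediction has true regret 2.0, and the surrogate returns 6.0) while remaining convex in $\hat{c}$, which is what makes it trainable by (sub)gradient methods.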

3. Learning-to-Optimize from Features (LtOF) and Proxy-based Methods

Proxy-based approaches address the computational and modeling limitations of EPO by learning to map features directly to optimal decisions, bypassing both parameter prediction and embedded solver differentiation. Formally, LtOF replaces the two-stage pipeline with a single function $J_\varphi: Z \rightarrow X$, trained to output near-optimal decisions $x^*(\zeta)$ directly from features $z$:

$$\min_\varphi\ \mathbb{E}_{(z,\zeta)\sim\Omega}\left[ \ell^{\mathrm{LtO}}\big( J_\varphi(z),\,\zeta \big) \right].$$

Losses $\ell^{\mathrm{LtO}}$ typically enforce both near-optimality and (approximate or exact) feasibility. Principal instantiations include:

  • Lagrangian Dual Learning (LD): $L_{\mathrm{LD}}(\hat{x}, \zeta) = \|\hat{x} - x^*(\zeta)\|^2 + \lambda^T [g(\hat{x})]_+ + \mu^T h(\hat{x})$, with dual ascent updates for the multipliers.
  • Primal–Dual Learning (PDL): Employs an augmented Lagrangian with an instance-specific dual network.
  • Deep Constraint Completion & Correction (DC3): The predictor emits a partial solution, and a deterministic completion operator restores feasibility (Kotary et al., 2024, Kotary et al., 2023).

This methodology eliminates solver calls during inference (yielding $10\times$–$100\times$ speedups), is agnostic to problem convexity or discreteness (so long as a proxy can be trained), and can leverage surrogates or projections for constraint satisfaction.
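
The LD loss and its dual ascent step can be sketched for a minimal case with a single inequality constraint $g(x) = \sum_i x_i - 1 \leq 0$ and no equality constraints (both the constraint and the numbers are toy assumptions; the target $x^*(\zeta)$ is assumed precomputed by an offline solver on training instances):

```python
import numpy as np

# Lagrangian Dual (LD) proxy loss: L_LD(x_hat, zeta) =
#   ||x_hat - x*(zeta)||^2 + lambda * [g(x_hat)]_+
# for the toy constraint g(x) = sum(x) - 1 <= 0.
def ld_loss(x_hat, x_star, lam):
    g = np.sum(x_hat) - 1.0                 # g(x_hat)
    violation = max(g, 0.0)                 # [g(x_hat)]_+
    loss = np.sum((x_hat - x_star) ** 2) + lam * violation
    return loss, violation

def dual_ascent(lam, violation, step=0.5):
    # Multiplier update between training epochs: lambda <- lambda + step * [g]_+
    return lam + step * violation

x_star = np.array([0.5, 0.5])               # offline-solved target decision
lam = 1.0
loss, v = ld_loss(np.array([0.8, 0.6]), x_star, lam)  # infeasible: sum = 1.4
lam = dual_ascent(lam, v)                   # violation raises the penalty weight
```

In training, `x_hat` would be the proxy network's output $J_\varphi(z)$ and the loss would be backpropagated through $\varphi$, while the multiplier update runs at a slower timescale, progressively pricing constraint violations.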

4. Empirical Evaluation and Benchmarks

Experimental validation of PO variants includes a series of controlled tasks:

  • Convex Quadratic Portfolio Optimization: LtOF (with LD/PDL/DC3) matches or outperforms EPO and dramatically outperforms two-stage regression as feature-mapping complexity increases, achieving up to $10\times$ lower regret for highly nonlinear mappings and $10$–$100\times$ faster inference (Kotary et al., 2024, Kotary et al., 2023).
  • Nonconvex QP and AC Optimal Power Flow: Only LtOF-based methods produce high-quality solutions, while EPO fails or converges to poor minima in nonconvex contexts.
  • Inference Runtime: LtOF methods yield 1–2 ms/sample, compared to 20–100 ms/sample for solver-based EPO approaches.
  • Empirical Soft Regret (ESR) in Black-Box Settings: For scenarios with partial feedback, the ESR surrogate minimizes regret in expectation and achieves statistically significant improvements on real-world news recommendation and healthcare datasets (Tan et al., 2024).

A comparative summary:

| Method | Regret (Complex Task) | Inference Time | Feasibility Handling |
|---|---|---|---|
| Two-Stage | High (grows with $k$) | Fast | Indirect (via solver) |
| EPO (Differentiable) | Low (only for convex, moderate nonlinearity) | Slow (solver call per sample) | Handcrafted, solver-specific |
| LtOF (LD/PDL/DC3) | Lowest (convex & nonconvex) | Fast | Surrogate/proxy function |

Regret is measured as $100 \times \mathbb{E}\big[ f(\hat{x}, \zeta) - f(x^*(\zeta), \zeta) \big] / |f(x^*(\zeta), \zeta)|$ (Kotary et al., 2024).

5. Applications and Problem Domains

The PO framework and its modern extensions have been adopted in diverse operational and scientific contexts:

  • Clinician Scheduling: Combines LLM-extracted constraints from unstructured data with hybrid classifier–MIP pipelines to achieve 100% schedule fill and improved equity (Jha et al., 2 Oct 2025).
  • Wildfire Response with Drone Swarms: PO is used for fast, robust convex-NN fire mapping and MIP-based drone scheduling with chance constraints, enabling real-time, scalable deployment (Pan et al., 2024).
  • Uplift Modeling with Continuous Treatments: PO enables integer programming–based optimal dose allocation under complex fairness or budget constraints (Vos et al., 2024).
  • Sequential Decision Problems (MDP/RL): Predict-then-Optimize is extended to RL through feature-based parameter prediction and decision-focused learning with sampled differentiation in high-dimensional policy spaces (Wang et al., 2021).
  • Seaport Power-Logistics Scheduling: Decision-focused continual learning preserves cross-task generalization in large operational ecosystems through Fisher regularization and differentiable surrogates (Pu et al., 11 Nov 2025).

6. Theoretical Properties and Open Challenges

Recent work has begun to establish regret bounds, calibration properties, and generalization results:

  • Calibration and Generalization: For classical PO with i.i.d. data, convex surrogates (e.g., SPO+) are Fisher consistent and enjoy generalization guarantees via Rademacher complexity; recent results extend uniform calibration and regret rates to dependent (mixing) data and autoregressive models (Liu et al., 2024).
  • Loss Function Design: Efficient global losses (EGL) replace restrictive local-loss assumptions, enabling feature-based parameterization and model-based sampling for robust, sample-efficient loss construction (Shah et al., 2023).
  • Theoretical Regret Bounds: ESR minimizes regret directly under mild assumptions, attaining regret $O(n^{-1/4})$ in black-box decision problems (Tan et al., 2024); similar surrogates have been proven to yield sublinear regret in convex, strongly convex, and general decision architectures (Liu et al., 2024).
  • Proxy-based Approximations: LtOF quality depends on the representational coverage of the proxy in training and requires careful constraint reproduction to avoid feasibility drift in complex or discrete domains.
  • Open Problems: Guaranteeing exact feasibility, extending PO frameworks for combinatorial and hybrid problems, and automating method selection for given problem structure remain active areas of research (Kotary et al., 2024, Kotary et al., 2023).

7. Tooling, Extensions, and Future Directions

  • Libraries and Implementations: PyEPO provides extensible PyTorch-based implementations of PO for linear and integer programming, supporting multiple loss architectures and workflows (Tang et al., 2022).
  • Accelerated Training: Novel methods such as cone-aligned vector estimation (CaVE) exploit convex relaxations to enable fast decision-focused training without repeated integer optimization, delivering state-of-the-art results on large VRP and TSP instances (Tang et al., 2023).
  • Multi-task and Continual Learning: PO has been generalized to multi-task regimes, balancing regret across diverse operational tasks through shared deep representations and adaptive loss weightings (Tang et al., 2022, Pu et al., 11 Nov 2025).
  • Risk and Robustness Integration: Frameworks such as Predict-then-Calibrate decouple prediction and uncertainty quantification, enabling robust and distributionally robust decisions with coverage guarantees by calibrating model residuals (Sun et al., 2023).

In summary, the Predict-then-Optimize approach is a unifying, extensible paradigm supporting high-stakes decision-making in complex, data-driven environments, with ongoing advances in loss function design, proxy learning, calibration, and scalable algorithmics (Kotary et al., 2024, Kotary et al., 2023, Jha et al., 2 Oct 2025, Tang et al., 2022).
