
Locally Optimized Decision Losses (LODL)

Updated 21 December 2025
  • LODL is a decision-focused learning methodology that constructs instance-specific, convex surrogate losses matching the local shape of the true decision loss.
  • It decouples the loss fitting and predictor training phases by using black-box oracle evaluations and efficient gradient-based optimization.
  • Empirical evaluations on resource allocation, web advertising, and portfolio optimization tasks show improved decision quality over traditional methods.

Locally Optimized Decision Losses (LODL) constitute a methodology for decision-focused learning that circumvents the need for differentiable end-to-end optimization or handcrafted, task-specific surrogates. LODL enables learning predictive models optimized for downstream decision quality using only a black-box oracle for the combinatorial or convex program. At each training instance, a convex surrogate loss is learned to match the local shape of the true task loss, enabling efficient gradient-based predictor training. The approach generalizes across a range of resource allocation and predict-then-optimize problems while retaining convexity guarantees and practical parallelizability (Shah et al., 2022).

1. Mathematical Formulation and Loss Construction

Given a dataset $\{(x_n, y_n)\}_{n=1}^N$, where $x_n$ are features and $y_n \in \mathbb{R}^D$ parameterize a downstream optimization problem, the decision variable is

$$z^*(\hat y) = \arg\min_z f(z;\hat y) \quad \text{s.t.} \quad g_i(z) \le 0,\ \forall i.$$

The decision loss for a prediction $\hat y$ is defined as

$$DL(\hat y, y) = f(z^*(\hat y); y).$$
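To make the oracle interface concrete, the following minimal sketch implements $DL(\hat y, y)$ for a toy top-1 selection problem (in the spirit of the Linear Model benchmark in Section 5, not the reference implementation); the solver is treated purely as a black box queried by forward evaluation.

```python
import numpy as np

def solve_top1(y_hat):
    """Black-box 'optimizer': select the single item with the largest
    predicted value. Returns a one-hot decision vector z*(y_hat)."""
    z = np.zeros_like(y_hat)
    z[np.argmax(y_hat)] = 1.0
    return z

def decision_loss(y_hat, y_true):
    """DL(y_hat, y) = f(z*(y_hat); y), where f is the negated realized value
    of the chosen item (lower is better, matching the minimization above)."""
    z = solve_top1(y_hat)
    return -float(z @ y_true)

# A misleading prediction selects item 0; the true values favor item 2.
y_true = np.array([0.1, 0.2, 0.9])
print(decision_loss(np.array([0.8, 0.1, 0.2]), y_true))  # -0.1 (poor decision)
print(decision_loss(y_true, y_true))                     # -0.9 (best achievable)
```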

LODL replaces this decision loss with a learned local surrogate

$$LODL_{\phi_n}(\hat y) \approx DL(\hat y, y_n) - DL(y_n, y_n),$$

where the parameters $\phi_n$ are fit to match the true decision loss in a neighborhood around $y_n$ and are chosen so that $LODL_{\phi_n}$ is convex in $\hat y$ and minimized at $\hat y = y_n$.

The predictive model $M_\theta$ is trained to minimize

$$\frac{1}{N} \sum_{n=1}^N LODL_{\phi_n}\bigl(M_\theta(x_n)\bigr)$$

without further calls to the black-box optimizer during training (Shah et al., 2022).

2. Offline Loss Fitting and Predictor Training Pipeline

LODL separates the learning process into two stages:

  • Offline loss fitting (per instance): For each $n$, sample $K$ perturbations $\{\hat y_n^k\}$ around $y_n$, use the oracle to compute $DL(\hat y_n^k, y_n)$, and fit $\phi_n^*$ via the least-squares regression

$$\arg\min_\phi \frac{1}{K} \sum_{k=1}^K \left[ LODL_\phi(\hat y_n^k) - DL(\hat y_n^k, y_n) \right]^2.$$

No gradient through the oracle is taken; only forward evaluation is required.

  • Predictor training: With all $\phi_n^*$ fixed, train $M_\theta$ via SGD on the convex surrogate loss as above.

This decoupling enables amortized optimization: once the local surrogates are fit, predictor training is efficient, and the fitted surrogates can be reused across multiple model architectures (Shah et al., 2022).
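As a concrete illustration, the sketch below runs the full two-stage pipeline on a tiny synthetic problem: it fits per-instance WeightedMSE surrogates (Section 3) to the toy top-1 oracle from Section 1 and then trains a linear predictor on the frozen surrogates. Helper names, hyperparameters, and the non-negative least-squares fit are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)

def decision_loss(y_hat, y_true):
    """Toy black-box top-1 oracle, as in the Section 1 sketch."""
    z = np.zeros_like(y_hat)
    z[np.argmax(y_hat)] = 1.0
    return -float(z @ y_true)

def fit_weighted_mse(y_n, K=1000, sigma=0.1):
    """Stage 1 (offline loss fitting): sample K perturbations around y_n,
    query the oracle, and fit non-negative per-dimension weights w so that
    sum_i w_i (y_hat_i - y_n_i)^2 approximates DL(y_hat, y_n) - DL(y_n, y_n)."""
    base = decision_loss(y_n, y_n)
    Y_hat = y_n + sigma * rng.standard_normal((K, y_n.shape[0]))
    targets = np.array([decision_loss(yh, y_n) - base for yh in Y_hat])
    feats = (Y_hat - y_n) ** 2          # one squared-error feature per dimension
    w, _ = nnls(feats, targets)         # w >= 0 keeps the surrogate convex
    return w

def train_predictor(X, Y, W, epochs=500, lr=0.01):
    """Stage 2 (predictor training): gradient descent on the frozen surrogates
    for a linear model M_theta(x) = x @ theta; no oracle calls occur here."""
    theta = np.zeros((X.shape[1], Y.shape[1]))
    for _ in range(epochs):
        preds = X @ theta                      # M_theta(x_n) for all n
        grad_pred = 2.0 * W * (preds - Y)      # d LODL_w / d y_hat, per instance
        theta -= lr * X.T @ grad_pred / X.shape[0]
    return theta

# Tiny synthetic predict-then-optimize dataset.
N, D, F = 20, 3, 5
X = rng.standard_normal((N, F))
Y = rng.random((N, D))
W = np.stack([fit_weighted_mse(Y[n]) for n in range(N)])  # per-instance phi_n
theta = train_predictor(X, Y, W)
```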

3. Loss Family Design and Theoretical Properties

LODL's success depends on selecting convex, instance-tuned losses. Four main families are proposed:

  1. Weighted MSE: $LODL_w(\hat y) = \sum_{i=1}^D w_i (\hat y_i - y_{n,i})^2$, with $w_i \ge 0$.
  2. Quadratic (second-order): $LODL_H(\hat y) = (\hat y - y_n)^T H (\hat y - y_n)$ with $H = L^T L \succeq 0$; captures cross-feature curvature.
  3. Directed Weighted MSE: Separate weights $w_i^+, w_i^-$ for over- and under-prediction, allowing asymmetric error costs.
  4. Directed Quadratic: Partition by the sign pattern of $\hat y - y_n$, fitting one PSD block per quadrant for further expressivity and error asymmetry.

All are convex (elementwise or blockwise), uniquely minimized at $y_n$, and enable stable gradient-based learning (Shah et al., 2022).
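The sketch below, assuming PyTorch conventions and illustrative class names, shows how two of these families can be parameterized so that convexity holds by construction: non-negative weights via a softplus reparameterization for Weighted MSE, and $H = L^T L$ for the Quadratic family. The Directed variants additionally branch on the sign pattern of $\hat y - y_n$.

```python
import torch

class WeightedMSE(torch.nn.Module):
    """LODL_w(y_hat) = sum_i w_i (y_hat_i - y_n_i)^2, with w_i >= 0 enforced
    through a softplus reparameterization. y_n is a 1-D tensor."""
    def __init__(self, y_n):
        super().__init__()
        self.register_buffer("y_n", y_n)
        self.raw_w = torch.nn.Parameter(torch.zeros(y_n.shape[0]))

    def forward(self, y_hat):
        w = torch.nn.functional.softplus(self.raw_w)   # w_i >= 0
        return torch.sum(w * (y_hat - self.y_n) ** 2)

class QuadraticLODL(torch.nn.Module):
    """LODL_H(y_hat) = (y_hat - y_n)^T H (y_hat - y_n) with H = L^T L, so the
    surrogate is PSD, convex, and uniquely minimized at y_n when L has full rank."""
    def __init__(self, y_n):
        super().__init__()
        self.register_buffer("y_n", y_n)
        D = y_n.shape[0]
        self.L = torch.nn.Parameter(0.1 * torch.randn(D, D))

    def forward(self, y_hat):
        diff = y_hat - self.y_n
        H = self.L.T @ self.L                          # PSD by construction
        return diff @ H @ diff

# Fitting either family to the K oracle samples of instance n reduces to a
# least-squares regression of surrogate(y_hat_k) onto the sampled targets
# DL(y_hat_k, y_n) - DL(y_n, y_n), e.g. with a few hundred Adam steps.
```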

4. Algorithmic Outline and Implementation Details

The LODL meta-algorithm can be summarized as follows:

| Stage | Operation | Oracle calls |
| --- | --- | --- |
| Loss fitting | Draw $K$ local samples, compute $DL$, fit $\phi_n$ per instance | $K$ forward calls per instance (parallelizable) |
| Predictor training | Train $M_\theta$ with SGD on $LODL_{\phi_n}$ | None |

During fitting:

  • Gaussian or coordinate perturbations are recommended for sampling, with $K \simeq 1{,}000$–$5{,}000$ per instance.
  • Enforce PSD constraints on quadratic parameters ($H = L^T L$) and sign-consistency for the directed forms.

Gradient computation is through the local parametric loss and the model only; backpropagation through the combinatorial optimizer is not needed.
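A minimal predictor-update sketch under these conventions (continuing the PyTorch example above) makes this explicit: the backward pass touches only the frozen surrogates and the model, and the downstream solver is never called.

```python
import torch

def training_step(model, optimizer, xs, lodl_losses):
    """One SGD step on the LODL objective. `model` is any torch.nn.Module
    mapping features to predictions; `lodl_losses[n]` is the frozen, fitted
    surrogate for instance n (e.g. WeightedMSE above). The optimizer holds
    only the model's parameters, so each phi_n stays fixed, and no oracle
    call occurs anywhere in this loop."""
    optimizer.zero_grad()
    total = 0.0
    for x_n, lodl_n in zip(xs, lodl_losses):
        y_hat = model(x_n)                 # forward pass through the predictor
        total = total + lodl_n(y_hat)      # convex local surrogate
    loss = total / len(xs)
    loss.backward()                        # backprop through surrogate + model only
    optimizer.step()
    return float(loss)

# Example usage (names illustrative):
# model = torch.nn.Linear(num_features, D)
# opt = torch.optim.SGD(model.parameters(), lr=1e-2)
# training_step(model, opt, xs, lodl_losses)
```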

5. Empirical Evaluation and Performance

LODL was evaluated on three canonical resource allocation tasks:

  • Linear Model (Top-1 selection): DirectedQuadratic LODL achieved normalized Decision Quality (DQ) $\approx 0.96$, versus $-0.95$ (2-stage) and $0.83$ (DFL surrogate).
  • Web Advertising (Submodular maximization): DQ $\approx 0.91$ for DirectedQuadratic, outperforming both 2-stage ($0.48$) and DFL ($0.85$).
  • Portfolio Optimization (Quadratic program): LODL DQ $0.33$, marginally below DFL ($0.35$) but above 2-stage ($0.32$).

Ablations confirm that decision quality improves and its variance shrinks as $K$ increases, and indicate that the optimal sampling scheme depends on the chosen loss family. The methodology is highly parallelizable (across samples) and amortizable (suitable for hyperparameter search or ensembling), often matching or beating DFL in wall-clock time because predictor training is decoupled from oracle calls (Shah et al., 2022).

6. Practical Usage Guidelines and Limitations

Key recommendations include:

  • Prefer DirectedQuadratic loss families for maximal expressivity (feature-wise and directional error modeling).
  • Use extensive, localized sampling around each $y_n$; Gaussian and coordinate perturbations are both effective.
  • Ensure PSD constraints and convexity for all loss parameters to guarantee stability.
  • Parallelize the oracle-based sample generation across hardware resources.
  • Monitor surrogate-vs-true decision loss fit (e.g., mean absolute error in the prediction neighborhood) as a leading indicator for downstream decision quality.
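One way to realize the last recommendation, sketched here under the same toy-oracle and WeightedMSE assumptions as the earlier examples: evaluate the fitted surrogate against fresh oracle calls in a neighborhood of $y_n$ and track the mean absolute error as a per-instance diagnostic.

```python
import numpy as np

def surrogate_fit_mae(surrogate, oracle, y_n, K=200, sigma=0.1, seed=1):
    """Mean absolute error between a fitted local surrogate and the true
    decision-loss differences DL(y_hat, y_n) - DL(y_n, y_n) on fresh
    perturbations around y_n. Large values flag instances whose surrogate
    is unlikely to translate into good downstream decisions."""
    rng = np.random.default_rng(seed)
    base = oracle(y_n, y_n)
    errs = []
    for _ in range(K):
        y_hat = y_n + sigma * rng.standard_normal(y_n.shape)
        true_delta = oracle(y_hat, y_n) - base
        errs.append(abs(surrogate(y_hat) - true_delta))
    return float(np.mean(errs))

# Example with the per-instance weights w fitted in the Section 2 sketch:
# mae = surrogate_fit_mae(lambda yh: float(np.sum(w * (yh - y_n) ** 2)),
#                         decision_loss, y_n)
```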

Limitations arise from the localness assumption: LODL presumes that predictions from $M_\theta$ remain near $y_n$. In regimes with high model bias or non-local errors, decision quality and sample efficiency can degrade. Each instance's surrogate is fit independently, which requires on the order of $N \times O(D^2)$ samples for the Quadratic families when the output dimension $D$ is large (Shah et al., 2023).

7. Evolution Beyond LODL: Context and Alternatives

LODL offers a significant advantage over handcrafted surrogates by providing a generic, convex, and automatable approach for decision-focused learning with black-box oracles. However, its localness and instance-wise fitting restrict sample efficiency and generalization. Subsequent work has introduced feature-based global loss parameterizations (e.g., Efficient Global Losses—EGL), relaxing the localness assumption and achieving similar or better performance with an order of magnitude fewer oracle calls. These global approaches leverage model features and realistic model-based prediction samples, overcoming key sample complexity and scalability barriers of the purely local LODL regime (Shah et al., 2023).

LODL remains a reference point for fully automated, convex decision-focused learning with learned rather than handcrafted surrogates, applicable whenever a black-box oracle is available and high-quality local prediction neighborhoods can be assumed. Its methodology is central to the paradigm shift from handcrafted relaxations to learned surrogates in predict-then-optimize pipelines (Shah et al., 2022).
