
Locally Optimized Decision Losses (LODL)

Updated 21 December 2025
  • LODL is a decision-focused learning methodology that constructs instance-specific, convex surrogate losses matching the local shape of the true decision loss.
  • It decouples the loss fitting and predictor training phases by using black-box oracle evaluations and efficient gradient-based optimization.
  • Empirical evaluations on resource allocation, web advertising, and portfolio optimization tasks show improved decision quality over traditional methods.

Locally Optimized Decision Losses (LODL) constitute a methodology for decision-focused learning that circumvents the need for differentiable end-to-end optimization or handcrafted, task-specific surrogates. LODL enables learning predictive models optimized for downstream decision quality using only a black-box oracle for the combinatorial or convex program. At each training instance, a convex surrogate loss is learned to match the local shape of the true task loss, enabling efficient gradient-based predictor training. The approach generalizes across a range of resource allocation and predict-then-optimize problems while retaining convexity guarantees and practical parallelizability (Shah et al., 2022).

1. Mathematical Formulation and Loss Construction

Given a dataset $\{(x_n, y_n)\}_{n=1}^N$, where $x_n$ are features and $y_n \in \mathbb{R}^D$ parameterize a downstream optimization problem, the decision variable is

$$z^*(\hat y) = \arg\min_z f(z;\hat y) \quad \text{s.t.} \quad g_i(z) \le 0,\ \forall i.$$

The decision loss for a prediction $\hat y$ is defined as

$$DL(\hat y, y) = f(z^*(\hat y); y).$$
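To make the oracle interface concrete, the following minimal sketch implements $DL(\hat y, y)$ for a toy top-1 selection problem (in the spirit of the Linear Model benchmark in Section 5, not the reference implementation); the solver is treated purely as a black box queried by forward evaluation.

```python
import numpy as np

def solve_top1(y_hat):
    """Black-box 'optimizer': select the single item with the largest
    predicted value. Returns a one-hot decision vector z*(y_hat)."""
    z = np.zeros_like(y_hat)
    z[np.argmax(y_hat)] = 1.0
    return z

def decision_loss(y_hat, y_true):
    """DL(y_hat, y) = f(z*(y_hat); y), where f is the negated realized value
    of the chosen item (lower is better, matching the minimization above)."""
    z = solve_top1(y_hat)
    return -float(z @ y_true)

# A misleading prediction selects item 0; the true values favor item 2.
y_true = np.array([0.1, 0.2, 0.9])
print(decision_loss(np.array([0.8, 0.1, 0.2]), y_true))  # -0.1 (poor decision)
print(decision_loss(y_true, y_true))                     # -0.9 (best achievable)
```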

LODL replaces this decision loss with a learned local surrogate

$$LODL_{\phi_n}(\hat y) \approx DL(\hat y, y_n) - DL(y_n, y_n),$$

where the parameters $\phi_n$ are fit to match the true decision loss in a neighborhood around $y_n$ and are chosen so that $LODL_{\phi_n}$ is convex in $\hat y$ and minimized at $\hat y = y_n$.

The predictive model $M_\theta$ is trained to minimize

$$\frac{1}{N} \sum_{n=1}^N LODL_{\phi_n}\bigl(M_\theta(x_n)\bigr)$$

without further calls to the black-box optimizer during training (Shah et al., 2022).

2. Offline Loss Fitting and Predictor Training Pipeline

LODL separates the learning process into two stages:

  • Offline loss fitting (per instance): For each $n$, sample $K$ perturbations $\{\hat y_n^k\}$ around $y_n$, use the oracle to compute $DL(\hat y_n^k, y_n)$, and fit $\phi_n^*$ via the least-squares regression

$$\arg\min_\phi \frac{1}{K} \sum_{k=1}^K \left[ LODL_\phi(\hat y_n^k) - DL(\hat y_n^k, y_n) \right]^2.$$

No gradient through the oracle is taken; only forward evaluation is required.

  • Predictor training: With all $\phi_n^*$ fixed, train $M_\theta$ via SGD on the convex surrogate loss as above.

This decoupling enables amortized optimization: once the local surrogates are fit, predictor training is efficient, and the fitted surrogates can be reused across multiple model architectures (Shah et al., 2022).
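As a concrete illustration, the sketch below runs the full two-stage pipeline on a tiny synthetic problem: it fits per-instance WeightedMSE surrogates (Section 3) to the toy top-1 oracle from Section 1 and then trains a linear predictor on the frozen surrogates. Helper names, hyperparameters, and the non-negative least-squares fit are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)

def decision_loss(y_hat, y_true):
    """Toy black-box top-1 oracle, as in the Section 1 sketch."""
    z = np.zeros_like(y_hat)
    z[np.argmax(y_hat)] = 1.0
    return -float(z @ y_true)

def fit_weighted_mse(y_n, K=1000, sigma=0.1):
    """Stage 1 (offline loss fitting): sample K perturbations around y_n,
    query the oracle, and fit non-negative per-dimension weights w so that
    sum_i w_i (y_hat_i - y_n_i)^2 approximates DL(y_hat, y_n) - DL(y_n, y_n)."""
    base = decision_loss(y_n, y_n)
    Y_hat = y_n + sigma * rng.standard_normal((K, y_n.shape[0]))
    targets = np.array([decision_loss(yh, y_n) - base for yh in Y_hat])
    feats = (Y_hat - y_n) ** 2          # one squared-error feature per dimension
    w, _ = nnls(feats, targets)         # w >= 0 keeps the surrogate convex
    return w

def train_predictor(X, Y, W, epochs=500, lr=0.01):
    """Stage 2 (predictor training): gradient descent on the frozen surrogates
    for a linear model M_theta(x) = x @ theta; no oracle calls occur here."""
    theta = np.zeros((X.shape[1], Y.shape[1]))
    for _ in range(epochs):
        preds = X @ theta                      # M_theta(x_n) for all n
        grad_pred = 2.0 * W * (preds - Y)      # d LODL_w / d y_hat, per instance
        theta -= lr * X.T @ grad_pred / X.shape[0]
    return theta

# Tiny synthetic predict-then-optimize dataset.
N, D, F = 20, 3, 5
X = rng.standard_normal((N, F))
Y = rng.random((N, D))
W = np.stack([fit_weighted_mse(Y[n]) for n in range(N)])  # per-instance phi_n
theta = train_predictor(X, Y, W)
```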

3. Loss Family Design and Theoretical Properties

LODL's success depends on selecting convex, instance-tuned losses. Four main families are proposed:

  1. Weighted MSE: $LODL_w(\hat y) = \sum_{i=1}^D w_i (\hat y_i - y_{n,i})^2$, with $w_i \ge 0$.
  2. Quadratic (second-order): $LODL_H(\hat y) = (\hat y - y_n)^T H (\hat y - y_n)$ with $H = L^T L \succeq 0$; captures cross-feature curvature.
  3. Directed Weighted MSE: Separate weights $w_i^+, w_i^-$ for over- and under-prediction, allowing asymmetric error costs.
  4. Directed Quadratic: Partition by the sign pattern of $\hat y - y_n$, fitting one PSD block per quadrant for further expressivity and error asymmetry.

All are convex (elementwise or blockwise), uniquely minimized at $y_n$, and enable stable gradient-based learning (Shah et al., 2022).
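The sketch below, assuming PyTorch conventions and illustrative class names, shows how two of these families can be parameterized so that convexity holds by construction: non-negative weights via a softplus reparameterization for Weighted MSE, and $H = L^T L$ for the Quadratic family. The Directed variants additionally branch on the sign pattern of $\hat y - y_n$.

```python
import torch

class WeightedMSE(torch.nn.Module):
    """LODL_w(y_hat) = sum_i w_i (y_hat_i - y_n_i)^2, with w_i >= 0 enforced
    through a softplus reparameterization. y_n is a 1-D tensor."""
    def __init__(self, y_n):
        super().__init__()
        self.register_buffer("y_n", y_n)
        self.raw_w = torch.nn.Parameter(torch.zeros(y_n.shape[0]))

    def forward(self, y_hat):
        w = torch.nn.functional.softplus(self.raw_w)   # w_i >= 0
        return torch.sum(w * (y_hat - self.y_n) ** 2)

class QuadraticLODL(torch.nn.Module):
    """LODL_H(y_hat) = (y_hat - y_n)^T H (y_hat - y_n) with H = L^T L, so the
    surrogate is PSD, convex, and uniquely minimized at y_n when L has full rank."""
    def __init__(self, y_n):
        super().__init__()
        self.register_buffer("y_n", y_n)
        D = y_n.shape[0]
        self.L = torch.nn.Parameter(0.1 * torch.randn(D, D))

    def forward(self, y_hat):
        diff = y_hat - self.y_n
        H = self.L.T @ self.L                          # PSD by construction
        return diff @ H @ diff

# Fitting either family to the K oracle samples of instance n reduces to a
# least-squares regression of surrogate(y_hat_k) onto the sampled targets
# DL(y_hat_k, y_n) - DL(y_n, y_n), e.g. with a few hundred Adam steps.
```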

4. Algorithmic Outline and Implementation Details

The LODL meta-algorithm can be summarized as follows:

| Stage | Operation | Oracle calls |
| --- | --- | --- |
| Loss fitting | Draw $K$ local samples, compute $DL$, fit $\phi_n$ per instance | $K$ forward calls per instance (parallelizable) |
| Predictor training | Train $M_\theta$ with SGD on $LODL_{\phi_n}$ | None |

During fitting:

  • Gaussian or coordinate perturbations are recommended for sampling, with $K \simeq 1{,}000$–$5{,}000$ per instance.
  • Enforce PSD constraints on quadratic parameters ($H = L^T L$) and sign-consistency for the directed forms.

Gradient computation is through the local parametric loss and the model only; backpropagation through the combinatorial optimizer is not needed.
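A minimal predictor-update sketch under these conventions (continuing the PyTorch example above) makes this explicit: the backward pass touches only the frozen surrogates and the model, and the downstream solver is never called.

```python
import torch

def training_step(model, optimizer, xs, lodl_losses):
    """One SGD step on the LODL objective. `model` is any torch.nn.Module
    mapping features to predictions; `lodl_losses[n]` is the frozen, fitted
    surrogate for instance n (e.g. WeightedMSE above). The optimizer holds
    only the model's parameters, so each phi_n stays fixed, and no oracle
    call occurs anywhere in this loop."""
    optimizer.zero_grad()
    total = 0.0
    for x_n, lodl_n in zip(xs, lodl_losses):
        y_hat = model(x_n)                 # forward pass through the predictor
        total = total + lodl_n(y_hat)      # convex local surrogate
    loss = total / len(xs)
    loss.backward()                        # backprop through surrogate + model only
    optimizer.step()
    return float(loss)

# Example usage (names illustrative):
# model = torch.nn.Linear(num_features, D)
# opt = torch.optim.SGD(model.parameters(), lr=1e-2)
# training_step(model, opt, xs, lodl_losses)
```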

5. Empirical Evaluation and Performance

LODL was evaluated on three canonical resource allocation tasks:

  • Linear Model (Top-1 selection): DirectedQuadratic LODL achieved normalized Decision Quality (DQ) $\approx 0.96$, versus $-0.95$ (2-stage) and $0.83$ (DFL surrogate).
  • Web Advertising (Submodular maximization): DQ $\approx 0.91$ for DirectedQuadratic, outperforming both 2-stage ($0.48$) and DFL ($0.85$).
  • Portfolio Optimization (Quadratic program): LODL DQ $0.33$, marginally below DFL ($0.35$) but above 2-stage ($0.32$).

Ablations confirm that decision quality improves and its variance shrinks as $K$ increases, and indicate that the optimal sampling scheme depends on the chosen loss family. The methodology is highly parallelizable (across samples) and amortizable (suitable for hyperparameter search or ensembling), often matching or beating DFL in wall-clock time because predictor training is decoupled from oracle calls (Shah et al., 2022).

6. Practical Usage Guidelines and Limitations

Key recommendations include:

  • Prefer DirectedQuadratic loss families for maximal expressivity (feature-wise and directional error modeling).
  • Use extensive, localized sampling around each $y_n$; Gaussian and coordinate perturbations are both effective.
  • Ensure PSD constraints and convexity for all loss parameters to guarantee stability.
  • Parallelize the oracle-based sample generation across hardware resources.
  • Monitor surrogate-vs-true decision loss fit (e.g., mean absolute error in the prediction neighborhood) as a leading indicator for downstream decision quality.
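One way to realize the last recommendation, sketched here under the same toy-oracle and WeightedMSE assumptions as the earlier examples: evaluate the fitted surrogate against fresh oracle calls in a neighborhood of $y_n$ and track the mean absolute error as a per-instance diagnostic.

```python
import numpy as np

def surrogate_fit_mae(surrogate, oracle, y_n, K=200, sigma=0.1, seed=1):
    """Mean absolute error between a fitted local surrogate and the true
    decision-loss differences DL(y_hat, y_n) - DL(y_n, y_n) on fresh
    perturbations around y_n. Large values flag instances whose surrogate
    is unlikely to translate into good downstream decisions."""
    rng = np.random.default_rng(seed)
    base = oracle(y_n, y_n)
    errs = []
    for _ in range(K):
        y_hat = y_n + sigma * rng.standard_normal(y_n.shape)
        true_delta = oracle(y_hat, y_n) - base
        errs.append(abs(surrogate(y_hat) - true_delta))
    return float(np.mean(errs))

# Example with the per-instance weights w fitted in the Section 2 sketch:
# mae = surrogate_fit_mae(lambda yh: float(np.sum(w * (yh - y_n) ** 2)),
#                         decision_loss, y_n)
```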

Limitations arise from the localness assumption: LODL presumes that predictions from $M_\theta$ remain near $y_n$. In regimes with high model bias or non-local errors, decision quality and sample efficiency can degrade. Each instance's surrogate is fit independently, which requires on the order of $N \times O(D^2)$ samples for the Quadratic families when the output dimension $D$ is large (Shah et al., 2023).

7. Evolution Beyond LODL: Context and Alternatives

LODL offers a significant advantage over handcrafted surrogates by providing a generic, convex, and automatable approach for decision-focused learning with black-box oracles. However, its localness and instance-wise fitting restrict sample efficiency and generalization. Subsequent work has introduced feature-based global loss parameterizations (e.g., Efficient Global Losses—EGL), relaxing the localness assumption and achieving similar or better performance with an order of magnitude fewer oracle calls. These global approaches leverage model features and realistic model-based prediction samples, overcoming key sample complexity and scalability barriers of the purely local LODL regime (Shah et al., 2023).

LODL remains a reference point for fully automated, convex decision-focused learning with learned rather than handcrafted surrogates, applicable whenever a black-box oracle is available and high-quality local prediction neighborhoods can be assumed. Its methodology is central to the paradigm shift from handcrafted relaxations to learned surrogates in predict-then-optimize pipelines (Shah et al., 2022).
