Locally Optimized Decision Losses (LODL)
- LODL is a decision-focused learning methodology that constructs instance-specific, convex surrogate losses matching the local shape of the true decision loss.
- It decouples the loss fitting and predictor training phases by using black-box oracle evaluations and efficient gradient-based optimization.
- Empirical evaluations on resource allocation, web advertising, and portfolio optimization tasks show improved decision quality over traditional methods.
Locally Optimized Decision Losses (LODL) constitute a methodology for decision-focused learning that circumvents the need for differentiable end-to-end optimization or handcrafted, task-specific surrogates. LODL enables learning predictive models optimized for downstream decision quality using only a black-box oracle for the combinatorial or convex program. A locally-learned, convex surrogate loss is constructed at each training instance, matching the local shape of the true task loss, facilitating efficient gradient-based predictor training. This approach generalizes across a range of resource allocation and predict-then-optimize problems while ensuring theoretical convexity and practical parallelizability (Shah et al., 2022).
1. Mathematical Formulation and Loss Construction
Given a dataset $\{(x_i, y_i)\}_{i=1}^N$, where $x_i$ are features and $y_i$ parameterize a downstream optimization problem, the decision variable is

$$z^\star(y) = \arg\min_{z \in \mathcal{Z}} f(z; y)$$

(maximization problems are handled by negating $f$). The decision loss for a prediction $\hat{y}$ is defined as

$$DL(\hat{y}, y) = f\big(z^\star(\hat{y}); y\big).$$

LODL replaces this with a learned local surrogate

$$LODL_{\phi_i}(\hat{y}) \approx DL(\hat{y}, y_i),$$

where the parameters $\phi_i$ are fit to match the true decision loss in a neighborhood around $y_i$ and are chosen so that $LODL_{\phi_i}$ is convex in $\hat{y}$ and minimized at $\hat{y} = y_i$. The predictive model $M_\theta$ is trained to minimize

$$\frac{1}{N} \sum_{i=1}^{N} LODL_{\phi_i}\big(M_\theta(x_i)\big)$$

without further calls to the black-box optimizer during training (Shah et al., 2022).
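The following Python sketch makes the formulation concrete for a toy single-item selection problem; the names `solve`, `f`, and `decision_loss` are hypothetical, and the oracle is only ever evaluated forward, never differentiated.

```python
import numpy as np

# Hypothetical black-box oracle for a toy downstream problem:
# choose exactly one item so as to minimize its cost.
def solve(y):
    """z*(y): one-hot indicator of the item with the lowest cost under y."""
    z = np.zeros_like(y)
    z[np.argmin(y)] = 1.0
    return z

def f(z, y):
    """f(z; y): cost of decision z evaluated under the cost vector y."""
    return float(z @ y)

def decision_loss(y_hat, y_true):
    """DL(y_hat, y_true) = f(z*(y_hat); y_true): the cost incurred when the
    decision is taken from the prediction but judged on the true parameters."""
    return f(solve(y_hat), y_true)
```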
2. Offline Loss Fitting and Predictor Training Pipeline
LODL separates the learning process into two stages:
- Offline loss fitting (per instance): For each $y_i$, sample perturbations $\hat{y}_{ij} = y_i + \epsilon_{ij}$ around $y_i$, use the oracle to compute $DL(\hat{y}_{ij}, y_i)$, and fit $\phi_i$ via least-squares regression to these oracle values.
No gradient through the oracle is taken; only forward evaluation is required.
- Predictor training: With all $\phi_i$ fixed, train $M_\theta$ via SGD on the convex surrogate loss as above.
This decoupling enables amortized optimization: once the local surrogates are fit, predictor training is cheap, and the fitted losses can be reused across multiple model architectures (Shah et al., 2022). A minimal loss-fitting sketch follows.
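The sketch below illustrates the per-instance loss-fitting stage, assuming the WeightedMSE family and the `decision_loss` oracle from above; the function name, sample count, `sigma`, and `w_min` are illustrative choices, not the paper's settings.

```python
import numpy as np

def fit_weighted_mse(y_true, decision_loss, n_samples=2048, sigma=0.1, w_min=1e-3):
    """Fit per-instance WeightedMSE weights w so that sum_d w_d * (y_hat_d - y_d)^2
    approximates the local decision loss around y_true (forward oracle calls only)."""
    d = len(y_true)
    # 1. Sample Gaussian perturbations around the true label.
    y_hats = y_true + sigma * np.random.randn(n_samples, d)
    # 2. Evaluate the oracle; regress onto the regret DL(y_hat, y) - DL(y, y) so the
    #    target is zero at y_hat = y (the constant offset does not affect gradients).
    baseline = decision_loss(y_true, y_true)
    targets = np.array([decision_loss(y_hat, y_true) - baseline for y_hat in y_hats])
    # 3. Least-squares fit: the regressors are squared per-dimension errors.
    sq_err = (y_hats - y_true) ** 2          # shape (n_samples, d)
    w, *_ = np.linalg.lstsq(sq_err, targets, rcond=None)
    # 4. Clip weights to stay positive, keeping the surrogate convex
    #    and uniquely minimized at y_hat = y_true.
    return np.maximum(w, w_min)
```

With the weights in hand, the per-instance surrogate is simply `lambda y_hat: float(np.sum(w * (y_hat - y_true) ** 2))`.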
3. Loss Family Design and Theoretical Properties
LODL's success depends on selecting convex, instance-tuned losses. Four main families are proposed:
- Weighted MSE: $LODL_{\phi_i}(\hat{y}) = \sum_{d} w_d\,(\hat{y}_d - y_{i,d})^2$, with $w_d \geq w_{\min} > 0$.
- Quadratic (second-order): $LODL_{\phi_i}(\hat{y}) = (\hat{y} - y_i)^\top H_i\,(\hat{y} - y_i)$, with $H_i \succeq 0$; captures cross-feature curvature.
- Directed Weighted MSE: Separate weights $w_d^{+}$ and $w_d^{-}$ for over- and under-prediction in each dimension, allowing asymmetric error costs.
- Directed Quadratic: Partition by the sign pattern of $\hat{y} - y_i$, fitting one PSD block per quadrant for further expressivity and error asymmetry.
All are convex (elementwise or blockwise), uniquely minimized at $\hat{y} = y_i$, and enable stable gradient-based learning (Shah et al., 2022).
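As a concrete illustration of the Quadratic family, the PyTorch-style sketch below keeps $H_i$ positive definite by construction via a low-rank factorization $H_i = L_i L_i^\top + \epsilon I$; the class name and the rank/epsilon defaults are hypothetical, and the matrix $L_i$ would be fit to the sampled $(\hat{y}, DL)$ pairs.

```python
import torch

class QuadraticLODL(torch.nn.Module):
    """Second-order LODL family (sketch): (y_hat - y_i)^T H_i (y_hat - y_i),
    with H_i = L_i L_i^T + eps * I positive definite by construction."""
    def __init__(self, y_true, rank=4, eps=1e-3):
        super().__init__()
        self.register_buffer("y_true", torch.as_tensor(y_true, dtype=torch.float32))
        d = self.y_true.numel()
        # Low-rank factor; any value of L yields a PSD matrix L @ L.T.
        self.L = torch.nn.Parameter(0.01 * torch.randn(d, rank))
        self.eps = eps

    def forward(self, y_hat):
        err = y_hat - self.y_true
        H = self.L @ self.L.T + self.eps * torch.eye(self.y_true.numel())
        # Convex in y_hat and uniquely minimized at y_hat = y_true, since H is
        # strictly positive definite thanks to the eps * I term.
        return err @ H @ err
```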
4. Algorithmic Outline and Implementation Details
The LODL meta-algorithm can be summarized as follows:
| Stage | Operation | Oracle Calls |
|---|---|---|
| Loss fitting | Draw $K$ local samples $\hat{y}_{ij}$, compute $DL(\hat{y}_{ij}, y_i)$, fit $\phi_i$ per instance | $K$ forward calls per instance (parallelizable) |
| Predictor training | Train $M_\theta$ with SGD on $\tfrac{1}{N}\sum_i LODL_{\phi_i}(M_\theta(x_i))$ | None |
During fitting:
- Gaussian or coordinate perturbations are recommended for sampling, with up to $5{,}000$ samples per instance.
- Enforce PSD constraints on quadratic parameters (e.g., by parameterizing $H_i = L_i L_i^\top$), and sign-consistency for directed forms.
Gradient computation is through the local parametric loss and the model only; backpropagation through the combinatorial optimizer is not needed.
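The predictor-training stage can then look like the sketch below (assuming per-instance surrogate modules such as `QuadraticLODL` above, a list of feature tensors `xs`, and a PyTorch model `model`; all names are illustrative). No oracle calls occur inside this loop.

```python
import torch

def train_predictor(model, xs, surrogates, epochs=100, lr=1e-3):
    """Stage 2 sketch: train the predictive model M_theta against the frozen
    per-instance surrogate losses; the optimization oracle is never called here."""
    for s in surrogates:
        s.requires_grad_(False)   # surrogate parameters phi_i stay fixed
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        # Average the instance-specific surrogate losses over the dataset.
        loss = torch.stack(
            [surrogate(model(x)) for x, surrogate in zip(xs, surrogates)]
        ).mean()
        loss.backward()   # gradients flow through the parametric loss and model only
        opt.step()
    return model
```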
5. Empirical Evaluation and Performance
LODL was evaluated on three canonical resource allocation tasks:
- Linear Model (Top-1 selection): DirectedQuadratic LODL achieved the highest normalized Decision Quality (DQ), versus $0.83$ for the DFL surrogate and a lower score for the 2-stage baseline.
- Web Advertising (Submodular maximization): DirectedQuadratic achieved the highest DQ, outperforming both 2-stage ($0.48$) and DFL ($0.85$).
- Portfolio Optimization (Quadratic program): LODL DQ $0.33$, marginally below DFL ($0.35$), but outperforming 2-stage ($0.32$).
Ablations confirm that decision quality improves and variance shrinks as the number of samples $K$ per instance grows, and indicate that the optimal sampling scheme depends on the chosen loss family. The methodology demonstrates high parallelizability (across samples) and amortizability (suitable for hyperparameter search or ensembling), often matching or beating DFL in wall-clock time because model training is detached from oracle calls (Shah et al., 2022).
6. Practical Usage Guidelines and Limitations
Key recommendations include:
- Prefer DirectedQuadratic loss families for maximal expressivity (feature-wise and directional error modeling).
- Use extensive, localized sampling around each $y_i$; Gaussian and coordinate perturbations are both effective.
- Ensure PSD constraints and convexity for all loss parameters to guarantee stability.
- Parallelize the oracle-based sample generation across hardware resources.
- Monitor the surrogate-vs-true decision loss fit (e.g., mean absolute error in the prediction neighborhood) as a leading indicator of downstream decision quality; a diagnostic sketch follows this list.
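The last recommendation can be operationalized with a small diagnostic that compares the fitted surrogate against fresh oracle evaluations. Here `surrogate` is any callable mapping a candidate prediction to a loss value, for example `lambda y_hat: float(np.sum(w * (y_hat - y_true) ** 2))` with the weights fit in Section 2; the sample count and noise scale are illustrative.

```python
import numpy as np

def surrogate_fit_mae(surrogate, decision_loss, y_true, n_val=256, sigma=0.1, seed=0):
    """Diagnostic sketch: mean absolute error between the fitted surrogate and the
    true (regret-form) decision loss on fresh perturbations around y_true.
    Large values warn that the local surrogate may mislead predictor training."""
    rng = np.random.default_rng(seed)
    baseline = decision_loss(y_true, y_true)   # same offset used during fitting
    errors = []
    for _ in range(n_val):
        y_hat = y_true + sigma * rng.standard_normal(len(y_true))
        true_regret = decision_loss(y_hat, y_true) - baseline
        errors.append(abs(surrogate(y_hat) - true_regret))
    return float(np.mean(errors))
```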
Limitations arise from the localness assumption: LODL presumes that predictions from $M_\theta$ remain near the true labels $y_i$. In regimes with high model bias or non-local errors, decision quality and sample efficiency can degrade. Each instance's surrogate is also fit independently, which requires on the order of $\dim(y)^2$ samples for the Quadratic families when the output dimension is large (Shah et al., 2023).
7. Evolution Beyond LODL: Context and Alternatives
LODL offers a significant advantage over handcrafted surrogates by providing a generic, convex, and automatable approach for decision-focused learning with black-box oracles. However, its localness and instance-wise fitting restrict sample efficiency and generalization. Subsequent work has introduced feature-based global loss parameterizations (e.g., Efficient Global Losses—EGL), relaxing the localness assumption and achieving similar or better performance with an order of magnitude fewer oracle calls. These global approaches leverage model features and realistic model-based prediction samples, overcoming key sample complexity and scalability barriers of the purely local LODL regime (Shah et al., 2023).
LODL remains a reference point for fully automated, convex decision-focused learning without handcrafted surrogates, applicable whenever a black-box oracle is available and high-quality local prediction neighborhoods can be assumed. Its methodology is central to the paradigm shift from handcrafted relaxations to learned surrogates in predict-then-optimize pipelines (Shah et al., 2022).