- The paper introduces a decision-focused method that minimizes both prediction and decision errors by integrating the ESS optimization model into the training loop.
- It quantifies decision error using surrogate regret and combines it with MSE in a hybrid loss function to balance accuracy with economic performance.
- The approach achieves up to 47% higher average daily benefits than standard MSE-trained models, highlighting its practical value in ESS arbitrage.
This paper proposes a decision-focused approach for electricity price prediction specifically tailored for optimizing Energy Storage System (ESS) arbitrage (2305.00362). It addresses the limitation of traditional prediction models that solely focus on minimizing prediction errors (like Mean Squared Error - MSE) without considering how these errors impact the downstream decision-making process (i.e., ESS charging/discharging schedule for maximizing profit).
The core idea is to integrate the downstream ESS arbitrage optimization model into the training loop of the upstream price prediction model. The goal is to minimize not just the prediction error but also the decision error, which is the difference in profitability between decisions made using predicted prices and decisions made using the actual (oracle) prices.
Methodology:
- Quantifying Decision Error (Regret): The paper uses regret to measure the decision error. Regret is defined as the difference between the optimal profit achievable with perfect knowledge of future prices (oracle profit) and the actual profit obtained using decisions based on predicted prices.
$$\mathrm{regret}(\hat{\lambda}, \lambda) = \lambda^\top P^*(\lambda) - \lambda^\top P^*(\hat{\lambda})$$
where $\lambda$ is the true price vector, $\hat{\lambda}$ is the predicted price vector, $P^*(\lambda)$ is the optimal ESS schedule under true prices, and $P^*(\hat{\lambda})$ is the optimal schedule under predicted prices.
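As a minimal numerical illustration, the regret definition can be sketched with a brute-force stand-in for the MILP solver (the three-hour candidate schedules and prices below are invented for the example):

```python
import numpy as np

def optimal_schedule(prices, candidates):
    """Toy stand-in for the ESS MILP: among a fixed set of candidate
    schedules, pick the one maximizing profit sum_t price_t * P_t
    (positive P_t = discharge, which earns revenue)."""
    return candidates[np.argmax(candidates @ prices)]

def regret(lam_hat, lam, candidates):
    """Oracle profit minus the profit of the schedule chosen under the
    predicted prices, both evaluated at the true prices."""
    p_oracle = optimal_schedule(lam, candidates)
    p_pred = optimal_schedule(lam_hat, candidates)
    return lam @ p_oracle - lam @ p_pred

# Invented 3-hour example: schedules are (charge, idle, discharge) patterns.
candidates = np.array([[ 1.0, 0.0, -1.0],   # discharge early, charge late
                       [-1.0, 0.0,  1.0],   # charge early, discharge late
                       [ 0.0, 0.0,  0.0]])  # stay idle
lam_true = np.array([10.0, 20.0, 30.0])     # prices rise through the day
lam_pred = np.array([30.0, 20.0, 10.0])     # prediction inverts the trend
print(regret(lam_pred, lam_true, candidates))  # -> 40.0
```

An inverted price forecast makes the ESS discharge cheap energy and charge at the peak, so the regret is the full profit swing even though the price magnitudes are not far off.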
- Tractable Surrogate Regret: Directly optimizing the prediction model to minimize regret is difficult because the regret function is discontinuous and non-differentiable with respect to the predicted price $\hat{\lambda}$, especially for optimization problems involving binary variables like the ESS arbitrage model (a Mixed-Integer Linear Program - MILP). To overcome this, the paper derives a tractable, differentiable upper bound called surrogate regret (Lregret), inspired by the Smart Predict-then-Optimize (SPO) framework [Elmachtoub2020].
$$L_{\mathrm{regret}}(\hat{\lambda}, \lambda) = (\lambda - 2\hat{\lambda})^\top P^*(\lambda - 2\hat{\lambda}) - 2\hat{\lambda}^\top P^*(\lambda) + c^*(\lambda)$$
where $c^*(\lambda)$ is the optimal oracle profit, and $P^*(\lambda - 2\hat{\lambda})$ is the optimal ESS schedule calculated using the modified cost vector $(\lambda - 2\hat{\lambda})$.
- Gradient Calculation: A key contribution is deriving the gradient of this surrogate regret with respect to the predicted price:
$$\frac{\partial L_{\mathrm{regret}}(\hat{\lambda}, \lambda)}{\partial \hat{\lambda}} = -2\left(P^*(\lambda) + P^*(\lambda - 2\hat{\lambda})\right)$$
This gradient requires solving the ESS arbitrage MILP twice: once with the true price $\lambda$ and once with the modified price $(\lambda - 2\hat{\lambda})$.
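A sketch of this two-solve gradient, using a brute-force candidate-set solver as a stand-in for the MILP (the schedules and prices are invented for illustration):

```python
import numpy as np

def optimal_schedule(prices, candidates):
    """Toy stand-in for the MILP solve P*(.): best candidate schedule."""
    return candidates[np.argmax(candidates @ prices)]

def surrogate_regret_grad(lam_hat, lam, candidates):
    """dL_regret/dlam_hat = -2 * (P*(lam) + P*(lam - 2*lam_hat)),
    i.e. two optimization solves per training example."""
    p_true = optimal_schedule(lam, candidates)
    p_mod = optimal_schedule(lam - 2.0 * lam_hat, candidates)
    return -2.0 * (p_true + p_mod)

# Invented example: prices rise through the day, prediction inverts the trend.
candidates = np.array([[1.0, 0.0, -1.0], [-1.0, 0.0, 1.0], [0.0, 0.0, 0.0]])
lam = np.array([10.0, 20.0, 30.0])
lam_hat = np.array([30.0, 20.0, 10.0])
g = surrogate_regret_grad(lam_hat, lam, candidates)
# Gradient descent on lam_hat moves the forecast toward the true price ordering.
```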
- Hybrid Loss Function: To balance prediction accuracy and decision quality, a hybrid loss function (Lcomb) is proposed, combining the surrogate regret with the traditional MSE loss, weighted by a hyperparameter ϵ:
$$L_{\mathrm{comb}}(\hat{\lambda}, \lambda) = L_{\mathrm{regret}}(\hat{\lambda}, \lambda) + \epsilon\, L_{\mathrm{MSE}}(\hat{\lambda}, \lambda)$$
The hyperparameter ϵ controls the trade-off: a higher ϵ emphasizes prediction accuracy (lower MSE), while a lower ϵ emphasizes decision quality (lower regret).
- Hybrid SGD Learning Method: A specialized Stochastic Gradient Descent (SGD) training procedure is introduced (Algorithm 1). In each training step for a batch of data:
- The gradient of the weighted MSE term (ϵLMSE) is calculated using standard back-propagation (e.g., via PyTorch's Autograd).
- The gradient of the surrogate regret term (Lregret) is calculated explicitly using the formula derived above (requiring MILP solves).
- Both gradients are accumulated.
- The prediction model's parameters are updated once using the combined gradient. This involves two separate back-propagation passes before a single parameter update.
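The steps above can be sketched in PyTorch as follows; the candidate-set `optimal_schedule` stands in for the MILP solve, and the model, dimensions, and hyperparameter values are invented for illustration:

```python
import torch

def optimal_schedule(prices, candidates):
    """Toy stand-in for the MILP solve: best candidate schedule for the prices."""
    return candidates[torch.argmax(candidates @ prices)]

def hybrid_step(model, opt, x, lam, candidates, eps):
    """One hybrid SGD step: autograd pass for the eps*MSE term, explicit
    surrogate-regret gradient injected via a second backward pass,
    then a single parameter update with the accumulated gradient."""
    opt.zero_grad()
    lam_hat = model(x)
    # (1) standard back-propagation for the weighted MSE term
    (eps * torch.mean((lam_hat - lam) ** 2)).backward(retain_graph=True)
    # (2) explicit regret gradient: -2 * (P*(lam) + P*(lam - 2*lam_hat))
    with torch.no_grad():
        g = -2.0 * (optimal_schedule(lam, candidates)
                    + optimal_schedule(lam - 2.0 * lam_hat, candidates))
    lam_hat.backward(g)  # second pass; gradients accumulate in .grad
    opt.step()           # single update using the combined gradient

torch.manual_seed(0)
model = torch.nn.Linear(4, 3)  # stand-in for the price prediction model
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
candidates = torch.tensor([[1.0, 0.0, -1.0], [-1.0, 0.0, 1.0], [0.0, 0.0, 0.0]])
hybrid_step(model, opt, torch.randn(4), torch.tensor([10.0, 20.0, 30.0]),
            candidates, eps=25.0)
```

Seeding `backward` with the explicit gradient `g` lets autograd carry the chain rule back through the model even though the MILP itself is not differentiated.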
Implementation Details:
- Prediction Models: The approach is demonstrated using both a simple Linear Regression model and a more complex Residual Neural Network (ResNet), showing applicability across models with different representational capacities.
- ESS Arbitrage Model: A standard MILP formulation for ESS arbitrage is used, considering charging/discharging efficiencies, power limits, energy capacity limits, and constraints preventing simultaneous charging and discharging (using binary variables and the big-M method).
- Maximize $\sum_{t=1}^{T} \lambda_t P_t \Delta t$
- Subject to energy balance, capacity limits (Emin, Emax), power limits (Pchmax, Pdismax), efficiency (ηch, ηdis), and binary constraints.
- Data & Features: Six years of hourly PJM market data (price, load, temperature) are used. Features include historical load/price, future temperature forecasts, and calendar features (day of week, holiday). Input features are standardized, and the output (price) is predicted in log scale.
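A minimal sketch of this preprocessing, assuming strictly positive prices and a numeric feature matrix (the column layout and function names are invented):

```python
import numpy as np

def prepare(features, prices):
    """Standardize input features column-wise and move the price target
    to log scale, as described above (assumes strictly positive prices)."""
    mu, sigma = features.mean(axis=0), features.std(axis=0)
    X = (features - mu) / sigma
    y = np.log(prices)        # the model is trained to predict log-price
    return X, y, (mu, sigma)

def to_price(y_log):
    """Map model predictions back from log scale to the price scale."""
    return np.exp(y_log)
```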
- Software: Implemented using Python with PyTorch for neural networks and Cvxpy (likely with a MILP solver like Gurobi or CPLEX) for the ESS optimization.
- Hyperparameters: Key hyperparameters include the ResNet architecture (hidden layers, dropout), optimizer settings (Adam, learning rate), batch size, the hybrid loss weight ϵ (tuned via experiments; set to 25 for the ResNet in the case study), and ESS parameters (capacity, power ratings, efficiencies).
Results and Practical Implications:
- The decision-focused prediction (DFP) model trained with the hybrid loss significantly outperforms models trained solely on MSE, especially in terms of realized arbitrage profit and reduced regret, even if its raw prediction accuracy (RMSE, MAPE) might be slightly worse than some highly tuned prediction-focused models (like MLP or Random Forest).
- The DFP model achieved ~47% higher average daily benefits than the MSE-based ResNet model and ~6% higher benefits than a tuned MLP model for a 500 kWh ESS example.
- The key reason for the improved performance is that the hybrid loss encourages the prediction model to capture the timing of price fluctuations more accurately, even at the cost of slightly larger errors in price magnitude. It flattens the distribution of prediction errors across the day, reducing large errors during the critical high/low-price periods that matter most for arbitrage decisions.
- The optimal value for ϵ depends on the prediction model's capacity and the data characteristics, representing a trade-off. For simple models (Linear), a higher ϵ might be needed to maintain reasonable prediction accuracy while improving decisions. For complex models (ResNet), a smaller ϵ can effectively guide the model towards better decisions without significantly sacrificing prediction accuracy.
- The approach provides a practical framework for aligning prediction model training directly with the economic objectives of the downstream task, leading to tangible improvements in application performance (higher profits for ESS operators).