Machine Learning and Deep Learning Models for Short Term Electricity Price Forecasting in Australia's National Electricity Market

Published 26 Apr 2026 in cs.LG and eess.SY | (2604.23908v1)

Abstract: Short term electricity price forecast is essential in competitive power markets, yet electricity price series exhibit high volatility, irregularity, and non-stationarity. This phenomenon is pronounced in the South Australian region of the National Electricity Market, where high renewable penetration drives price volatility and frequent negative price intervals, while structural changes such as the transition to five-minute settlement further complicate forecast. To address these challenges, this study develops a unified benchmark framework. Under identical data preprocessing, feature engineering with lag features, rolling statistics, cyclic temporal encodings, and so on, and an 85% to 15% chronological train test split, six algorithms are systematically compared, including AWMLSTM, CatBoost, GBRT, LSTM, LightGBM, and SVR. The results show that for price prediction, tree-based models, especially GBRT with an R squared value of 0.88, generally outperform LSTM and SVR. However, all models achieve a mean absolute percentage error above 90%, and more than 65% of GBRT predictions have relative errors above 10%, which highlights the inherent difficulty of price forecast. For demand prediction, all models perform substantially better than in price prediction. AWMLSTM and GBRT achieve an R2 value of 0.96 with mean absolute percentage error below 32%, and GBRT has 74.37% of samples within 5% error, while LSTM and SVR perform less accurately in both tasks. Future improvements should focus on hybrid models such as tree plus transformers, data augmentation for extreme events, and error correction to better capture price spikes.


Authors (6)

Summary

The paper benchmarks six machine learning and deep learning models for short-term electricity price and demand forecasting in the South Australian region of the Australian NEM.
It demonstrates that tree-based ensembles, particularly GBRT, achieve the highest R² and lowest MAE for price prediction despite high MAPE, highlighting challenges in volatile markets.
The analysis shows that while demand forecasting is more tractable with lower errors, traditional deep learning models struggle to capture extreme pricing events.

Comparative Evaluation of Machine Learning and Deep Learning Models for Short-Term Electricity Price Forecasting in the Australian NEM

Introduction

Short-term electricity price forecasting (EPF) is an operational necessity in competitive energy markets, and Australia's National Electricity Market (NEM) is a particularly volatile setting. In South Australia, high price volatility, non-stationary pricing patterns, frequent negative price events, and regulatory shifts such as the transition to five-minute settlement make EPF especially difficult. The analyzed study (2604.23908) presents a systematic benchmark of six machine learning and deep learning models (AWMLSTM, CatBoost, GBRT, LSTM, LightGBM, and SVR) using a unified dataset and a consistent preprocessing pipeline, which addresses reproducibility and comparability concerns common in the literature.

Methodology

The study builds a unified experimental framework: feature engineering (lags, rolling statistics, cyclic temporal encodings, interaction terms), data normalization, and a chronological 85%/15% train-test split. Splitting by time rather than at random keeps every test observation later than every training observation, which guards against look-ahead bias. All six models receive the same feature set and normalized data, enabling a controlled comparison across algorithmic paradigms.
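The feature construction and split described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's code: the function names `build_features` and `chronological_split`, the lag set, the window length, and the toy price series are all assumptions made for the example.

```python
import math

def build_features(prices, hours, lags=(1, 2), window=3):
    """Build lag, rolling-mean, and cyclic hour-of-day features.

    Row t uses only values strictly before t, so the target prices[t]
    never leaks into its own features.
    """
    rows = []
    start = max(max(lags), window)  # first index with a full history
    for t in range(start, len(prices)):
        past = prices[t - window:t]  # the `window` values before t
        row = {f"lag_{k}": prices[t - k] for k in lags}
        row["roll_mean"] = sum(past) / window
        angle = 2 * math.pi * hours[t] / 24  # map hour onto a circle
        row["hour_sin"] = math.sin(angle)
        row["hour_cos"] = math.cos(angle)
        row["target"] = prices[t]
        rows.append(row)
    return rows

def chronological_split(rows, train_frac=0.85):
    """Split in time order: the earliest rows train, the latest rows test."""
    cut = int(len(rows) * train_frac)
    return rows[:cut], rows[cut:]

# Hypothetical toy series: 40 steadily rising prices, one per hour.
prices = [float(p) for p in range(100, 140)]
hours = [i % 24 for i in range(40)]

rows = build_features(prices, hours)
train, test = chronological_split(rows)
```

The sine and cosine pair places hour 23 next to hour 0, which a raw integer hour would not do. No shuffling happens before the split, so every test row is later in time than every training row.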

Model Portfolio

AWMLSTM: Attention-augmented LSTM capturing long-term and salient temporal dependencies.
CatBoost: Categorical boosting trees leveraging ordered boosting and native categorical feature support.
GBRT: Classic gradient boosting regression trees, standard for ensemble-based nonlinear regression.
LSTM: Vanilla recurrent network for sequential modeling with cell-based memory.
LightGBM: Leaf-wise boosting with histogram-based optimization for scalability.
SVR: Kernel-based support vector regression, agnostic to sequential structure.

Each model is evaluated on both price and demand prediction targets with a suite of error metrics: MAE, MAPE, MSE, and the coefficient of determination ($R^2$).
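For reference, the four metrics can be written out under their standard textbook definitions. The source does not say how the study handles zero-valued actuals in MAPE, so the sketch below simply raises an error in that case; that choice, the function names, and the toy numbers are assumptions.

```python
def mae(y, yhat):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(y, yhat)) / len(y)

def mse(y, yhat):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(y, yhat)) / len(y)

def mape(y, yhat):
    """Mean absolute percentage error, in percent."""
    if any(a == 0 for a in y):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(y, yhat)) / len(y)

def r2(y, yhat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - p) ** 2 for a, p in zip(y, yhat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

# Hypothetical toy values, not taken from the study.
y_true = [100.0, 200.0, 300.0, 400.0]
y_pred = [110.0, 190.0, 310.0, 390.0]

scores = {
    "MAE": mae(y_true, y_pred),
    "MSE": mse(y_true, y_pred),
    "MAPE": mape(y_true, y_pred),
    "R2": r2(y_true, y_pred),
}
```

MAE and MSE are expressed in the units of the target, whereas MAPE and $R^2$ are scale-free. This matters when reading the results, because the two groups can rank the same model very differently.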

Empirical Results

Price Forecasting Performance

Tree-based models (GBRT, CatBoost, and LightGBM) consistently outperform both the baseline deep learning model (LSTM) and SVR on the reported error metrics. Notably, GBRT achieves the highest $R^2$ (0.88) and the lowest MAE (13.25) for price prediction. However, mean absolute percentage error (MAPE) is high for every model (all exceed 90%; SVR surpasses 300%), and over 65% of GBRT predictions incur relative errors above 10%. Errors of this size, despite a large feature set and advanced models, quantify how difficult short-term EPF is under South Australian market conditions.

The deep learning models (AWMLSTM, LSTM) are not competitive for EPF. LSTM yields the lowest $R^2$ (0.68) and an extreme MAPE, which points to poor adaptation to volatility and a poor fit to rare price spikes and abrupt regime shifts. SVR is also among the weakest performers, likely because of its limited temporal modeling capacity and its sensitivity to kernel selection in high dimensions.
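A high $R^2$ and a very high MAPE can coexist, and one mechanism that can produce this combination is a target that passes close to zero or below it, as prices in this market do. The sketch below illustrates the arithmetic with made-up numbers. It is not an analysis of the study's data, and the study is not known to attribute its MAPE values to this effect. The helper `ape` and the cutoff of 10 are assumptions.

```python
def ape(actual, predicted):
    """Absolute percentage error of a single sample, in percent."""
    return 100.0 * abs(actual - predicted) / abs(actual)

# Hypothetical prices in $/MWh: two intervals sit near zero or below it.
actual = [50.0, 60.0, 0.5, -2.0, 300.0]
predicted = [55.0, 58.0, 5.0, 3.0, 280.0]

errors = [ape(a, p) for a, p in zip(actual, predicted)]
mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
mape_all = sum(errors) / len(errors)

# Same metric restricted to intervals with |price| >= 10 (arbitrary cutoff).
kept = [e for a, e in zip(actual, errors) if abs(a) >= 10.0]
mape_kept = sum(kept) / len(kept)
```

In this toy case the absolute errors are all of similar size (MAE is 7.3), yet MAPE is about 234% over all five intervals and about 6.7% once the two near-zero intervals are set aside. A handful of small-denominator samples can therefore dominate the percentage metric.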

Demand Forecasting Performance

In contrast to price forecasting, demand prediction is substantially more tractable. AWMLSTM and GBRT both achieve an $R^2$ of 0.96, MAE below 61.1, and MAPE below 32%. GBRT has the highest proportion of predictions within 5% (74.37%) and 10% (84.87%) of the ground-truth values. LightGBM and CatBoost are also highly competitive, with slightly larger bias and variance, particularly at extreme peaks and troughs, likely because rare demand events are sparsely represented in the data.

While AWMLSTM shows strength in capturing overall demand periodicity, the tree ensembles outperform in accuracy and robustness to unusual patterns. LSTM and SVR fail to provide reliable demand estimates, suffering from both systematic bias and poor tracking of extremes.
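The "within 5%" and "within 10%" figures are shares of test samples whose relative error falls under a tolerance. A minimal sketch of that calculation follows, assuming relative error is measured against the absolute actual value and that zero actuals count as misses. The function name `share_within` and the demand values are hypothetical.

```python
def share_within(y, yhat, tol):
    """Percentage of samples whose relative error is at most `tol`.

    Samples with a zero actual value cannot be scored and count as misses.
    """
    hits = sum(
        1 for a, p in zip(y, yhat)
        if a != 0 and abs(a - p) / abs(a) <= tol
    )
    return 100.0 * hits / len(y)

# Hypothetical demand values in MW, not taken from the study.
demand_true = [1000.0, 1200.0, 1500.0, 1800.0, 2000.0]
demand_pred = [1020.0, 1290.0, 1480.0, 2000.0, 1950.0]

within_5 = share_within(demand_true, demand_pred, 0.05)
within_10 = share_within(demand_true, demand_pred, 0.10)
```

Unlike MAPE, this share is bounded between 0 and 100 and is not inflated by a few very large percentage errors, which makes it a useful complement when the target has a heavy tail.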

Discussion: Numerical Benchmarks and Implications

The study's key numerical findings are:

For price forecasting, no model achieves operationally low errors: GBRT is best on $R^2$ (0.88) and MAE (13.25), but high MAPE (>90%) and large shares of predictions with >10% relative error are common to all models.
For demand forecasting, both GBRT and AWMLSTM achieve $R^2$ = 0.96, with significantly lower error rates (MAPE < 32% for top models) and most test points predicted within ±10% error.
LSTM and SVR are substantially inferior for both tasks, suggesting limited value from classic deep learning or kernel-based methods for these datasets and feature regimes in the NEM context.

The finding that deep learning (LSTM, AWMLSTM) underperforms tree ensembles in price forecasting is counterintuitive and particularly notable given widespread claims about the superiority of deep learning for sequential data. The results point to major shortcomings of LSTM variants in highly volatile, nonstationary, sparse-event domains such as electricity pricing in deregulated, renewable-dominated markets.

The high MAPE across all price prediction models suggests that current methodological approaches—despite advanced preprocessing—fail to capture extreme, abrupt market events and price spikes, which remain a forecasting bottleneck in the NEM setting.

Theoretical and Practical Implications

The empirical evidence implies several directions for both research and practice:

Tree-based ensembles (GBRT, LightGBM, CatBoost) retain dominance in structured, tabular time series where regime changes and volatility dominate the error landscape.
Deep learning approaches require new architectures (e.g., transformer hybrids) or auxiliary mechanisms to cope with rare-event forecasting and high-frequency volatility.
The extreme inaccuracy of price forecasts (>90% MAPE) underscores the need for hybrid models, robust error correction schemes, and the inclusion of additional exogenous factors (weather, renewables, macroeconomic signals).
Future systems should implement dynamic model switching or regime-aware ensembles to mitigate the inadequacy of single-model deployments in volatile markets.
For demand prediction, integrating additional external features (weather, event calendars) and focusing on extreme-point accuracy may further narrow the already low error rates.
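One very simple form of the dynamic model switching mentioned in the list above is to pick a forecaster according to recent volatility. The sketch below is purely illustrative and is not a method from the study: the function `regime_switch_forecast`, the volatility threshold, the window length, and the two stand-in models are all hypothetical.

```python
import statistics

def regime_switch_forecast(history, calm_model, volatile_model,
                           window=4, threshold=20.0):
    """Route the forecast to one of two models based on recent volatility.

    Volatility is the population standard deviation of the last `window`
    observations; above `threshold` the volatile-regime model is used.
    """
    recent = history[-window:]
    volatility = statistics.pstdev(recent)
    if volatility > threshold:
        return volatile_model(history), "volatile"
    return calm_model(history), "calm"

# Stand-in models (assumptions): persistence for calm periods and a
# median of the last four values, which resists spikes, for volatile ones.
def persistence(history):
    return history[-1]

def recent_median(history):
    return statistics.median(history[-4:])

calm_result = regime_switch_forecast(
    [50.0, 52.0, 51.0, 49.0], persistence, recent_median)
volatile_result = regime_switch_forecast(
    [50.0, 300.0, -20.0, 90.0], persistence, recent_median)
```

In practice the two callables would be trained models, for example a tree ensemble per regime, and the regime signal could come from a richer detector than a rolling standard deviation.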

Future Directions

The study identifies several avenues for improving EPF:

Hybrid models (tree ensembles + transformer layers) to specifically target episodic price spikes and nonstationarity.
Data augmentation and synthetic resampling around extreme events to increase tail event awareness.
Advanced error correction layers to systematically debias model tendencies towards under/overestimation.
Explicit modeling of volatility regimes, potentially via regime-switching or meta-learning frameworks, for both price and demand forecasts.
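As an illustration of the resampling idea in the list above, the simplest variant replicates training rows whose target lies in the upper tail, so that a learner sees spikes more often. This is a sketch under stated assumptions and not the study's proposal in detail: the function `oversample_extremes`, the 90th-percentile cutoff, and the replication factor are hypothetical. It must be applied to training data only, never to the test set.

```python
def oversample_extremes(rows, target_key="target",
                        quantile_pct=90, copies=3):
    """Return rows with upper-tail rows repeated `copies` times in total.

    A row is in the tail when its target is at or above the empirical
    `quantile_pct` percentile, taken as an order statistic.
    """
    targets = sorted(r[target_key] for r in rows)
    idx = min(len(targets) * quantile_pct // 100, len(targets) - 1)
    cutoff = targets[idx]
    out = []
    for r in rows:
        out.append(r)
        if r[target_key] >= cutoff:
            # Add shallow copies so later edits do not alias one dict.
            out.extend(dict(r) for _ in range(copies - 1))
    return out

# Hypothetical training targets with a single price spike.
train_rows = [{"target": t} for t in
              [40.0, 55.0, 60.0, 45.0, 50.0, 900.0, 52.0, 48.0, 58.0, 47.0]]

augmented = oversample_extremes(train_rows)
spike_count = sum(1 for r in augmented if r["target"] == 900.0)
```

Plain replication only reweights existing spikes. Synthetic resampling, which the study also mentions, would instead generate new tail samples and needs more care to stay physically plausible.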

Conclusion

This study delivers a directly comparable analysis of six learning architectures for short-term price and demand forecasting in the South Australian region of the NEM. Tree-based ensemble models, GBRT in particular, outperform deep recurrent and kernel methods on price forecasting and match or exceed them on demand forecasting, in a volatile environment with high renewable penetration. However, price forecasting remains fundamentally challenging, with all models failing to achieve actionable accuracy on volatile, nonstationary series. Future progress depends on hybrid model innovations, robust data engineering for tail events, and adaptive frameworks that respond to rapid regime changes in both price and demand contexts.
