A Principled Framework to Evaluate Quality of AC-OPF Datasets for Machine Learning: Benchmarking a Novel, Scalable Generation Method

Published 26 Aug 2025 in eess.SY and cs.SY | (2508.19083v1)

Abstract: Several methods have been proposed in the literature to improve the quality of AC optimal power flow (AC-OPF) datasets used to train ML models. Yet scalability to large power systems remains unaddressed, and comparing generation approaches is still hindered by the absence of widely accepted metrics for quantifying AC-OPF dataset quality. In this work, we tackle both limitations. We provide a simple heuristic that samples load setpoints uniformly in total load active power, rather than maximizing volume coverage, and solves an AC-OPF formulation with load slack variables to improve convergence. For quality assessment, we formulate a multi-criteria framework based on three metrics, which measure variability in the marginal distributions of AC-OPF primal variables, diversity in constraint activation patterns among AC-OPF instances, and activation frequency of variable bounds. By comparing four open-source methods on these metrics, we show that our heuristic consistently outperforms uniform random sampling, whether independent or constrained to a convex polytope, and offers the best balance between dataset quality and scalability.
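The sampling idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the total-load range (`low`, `high`), the per-bus spread, and the rescaling step are all assumptions made for illustration. The sketch draws a target total active load uniformly, perturbs the nominal per-bus loads, and rescales each sample to hit its target. A second helper counts distinct constraint activation patterns, as one possible proxy for the diversity criterion; the paper's actual metric definitions may differ.

```python
import numpy as np


def sample_loads_uniform_total(p_nominal, n_samples, low=0.6, high=1.0,
                               spread=0.2, rng=None):
    """Hypothetical sketch: load samples whose TOTAL active power is uniform.

    p_nominal : nominal active load per bus (assumed strictly positive)
    low, high : assumed range of total load, as fractions of the nominal total
    spread    : assumed relative per-bus perturbation before rescaling
    """
    rng = np.random.default_rng(rng)
    p_nominal = np.asarray(p_nominal, dtype=float)
    total_nominal = p_nominal.sum()

    # Target total load for each sample, uniform over the assumed range.
    totals = rng.uniform(low * total_nominal, high * total_nominal,
                         size=n_samples)

    # Independent per-bus perturbations around the nominal profile.
    factors = rng.uniform(1.0 - spread, 1.0 + spread,
                          size=(n_samples, p_nominal.size))
    raw = factors * p_nominal

    # Rescale every sample so that its sum matches the target total.
    return raw * (totals / raw.sum(axis=1))[:, None]


def count_activation_patterns(active):
    """Number of distinct rows in a boolean (instances x constraints) array."""
    active = np.asarray(active, dtype=bool)
    return np.unique(active, axis=0).shape[0]


if __name__ == "__main__":
    loads = sample_loads_uniform_total([10.0, 20.0, 30.0], n_samples=5, rng=0)
    print(loads.sum(axis=1))  # totals lie within 60%..100% of the nominal 60
    print(count_activation_patterns([[1, 0], [1, 0], [0, 1]]))
```

Each sampled load vector would then be passed to an AC-OPF solver. The load slack variables mentioned in the abstract belong to that solve step and are not modeled here.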