Safe Bottom-Up Flexibility Provision from Distributed Energy Resources

Published 29 Apr 2025 in eess.SY | (2504.20529v1)

Abstract: Modern renewables-based power systems need to tap into the flexibility of Distributed Energy Resources (DERs) connected to distribution networks. It is important, however, that DER owners/users remain in control of their assets, decisions, and objectives. At the same time, the dynamic landscape of DER-penetrated distribution networks calls for agile, data-driven flexibility management frameworks. In the face of these developments, the Multi-Agent Reinforcement Learning (MARL) paradigm is gaining significant attention as a distributed and data-driven decision-making policy. This paper addresses the need for bottom-up DER management decisions to account for the distribution network's safety-related constraints. While the related literature on safe MARL typically assumes that network characteristics are available and incorporated into the policy's safety layer, which implies active DSO engagement, this paper ensures that self-organized DER communities are enabled to provide distribution-network-safe flexibility services without relying on the aspirational and problematic requirement of bringing the DSO into the decision-making loop.


Summary

The paper proposes a decentralized MADRL framework for DER flexibility that employs a model-free, data-driven safety layer to predict and enforce voltage limits.
Agents act on local observations and respect intertemporal constraints; the voltage predictor behind the safety layer explains 99.3% of voltage variance with a mean absolute error of 0.00138 p.u.
Empirical results show faster convergence than the baselines and a net benefit within 12% of the OPF optimum with zero voltage violations, enabling privacy-preserving DER operation.

Introduction

The paper "Safe Bottom-Up Flexibility Provision from Distributed Energy Resources" (2504.20529) addresses the integration of Distributed Energy Resources (DERs) into modern, renewables-based power systems. Focusing on bottom-up, data-driven decision-making, the work proposes a Multi-Agent Deep Reinforcement Learning (MADRL) framework that ensures distribution network safety constraints are satisfied while allowing DER owners to retain local asset control and privacy. A key innovation is the introduction of a model-free, data-driven safety layer that predicts voltage levels, enabling network-safe flexibility provision without Distribution System Operator (DSO) participation or disclosure of network parameters.

Coordinating DERs to provide flexibility, through demand response and ancillary service frameworks, is critical for reliable power systems with high renewable shares. Most conventional approaches rely on centralized optimal power flow (OPF) models or their distributed counterparts and frequently assume full visibility of network topology and parameters. However, such model-centric frameworks struggle to adapt to changing system dynamics and to scale under high DER penetration. Recent MARL methods allow decentralized agents to learn scalable, adaptive policies, but safety constraint violations, such as voltage overshoots, remain a significant issue.

Prior safe MARL approaches either rely on explicit knowledge of grid models ("shielding," physics-informed safety layers) or indirectly penalize constraint violations in the reward function. Reliance on grid models limits policy applicability in realistic scenarios where model parameters are incomplete, privacy is a concern, or DSOs are not involved in real-time dispatch. Moreover, methods that enforce constraints only in expectation do not guarantee instantaneous safety, and those that embed penalties require delicate, problem-specific reward engineering.

Proposed Framework

The paper proposes a bottom-up, decentralized framework wherein each DER agent autonomously coordinates charging/discharging (for energy storage) and load reductions (for buildings) to provide up-regulation services, maximizing a composite economic objective. The key design features are:

Decentralized Agent Control: Each node operates its own assets, deciding actions based solely on local states and historical observations without global network information.
Explicit Handling of Intertemporal Constraints: Agents enforce long-term energy and flexibility usage bounds for both buildings and storage systems.
Model-Free Safety Layer for Voltage Compliance: A data-driven regressor is trained to map joint agent state-action profiles to node voltages, effectively learning the system’s operational manifold without explicit network parameterization.
Safety Layer Projection: At each time step, agents’ joint actions are minimally modified by a projection layer such that predicted voltages (computed by the regressor) remain within operational bounds, guaranteeing constraint satisfaction without active DSO involvement.
Figure 1: Illustration of the safety layer used in combination with the MADDPG networks.
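
The projection step in the last item above can be written as a small optimization problem. The notation here is an assumption introduced for illustration, since this summary does not give the paper's symbols: $\hat{a}_t$ is the joint action proposed by the actors, $s_t$ the joint state, $f_k$ the regressor's predicted voltage at bus $k$, and $[\underline{v}, \overline{v}]$ the admissible voltage band.

```latex
a_t^{\text{safe}} \;=\; \arg\min_{a} \; \tfrac{1}{2}\,\lVert a - \hat{a}_t \rVert_2^2
\quad \text{s.t.} \quad
\underline{v} \;\le\; f_k(s_t, a) \;\le\; \overline{v} \qquad \forall k
```

If $f_k$ is linear in $a$, as with a linear regressor, this is a convex quadratic program, and $a_t^{\text{safe}} = \hat{a}_t$ whenever the proposed action is already predicted to be safe. The constraint acts on predicted voltages, so its strength depends on the accuracy of the regressor.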

Technical Architecture

System and Decision Model

The MARL framework is formulated as a Constrained Markov Decision Process (C-MDP) where each agent's observation includes only its local energy storage state, net demand, local price signals, and recent flexibility market data. Actions are normalized to ensure that operational limits are enforced by construction, while terminal and cumulative constraints (e.g., for storage state-of-charge and total building reduction) are incorporated via post-action penalties.
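
As a rough illustration of "limits enforced by construction" and "post-action penalties", the sketch below maps a normalized storage action to a feasible power and computes a terminal penalty. All names, the sign convention, the penalty weight, and the lossless storage model are assumptions made for this example, not details taken from the paper.

```python
def scale_action(a_norm, soc, soc_min, soc_max, p_max, dt=1.0):
    """Map a normalized action in [-1, 1] to a feasible storage power.

    Assumed convention: positive values charge, negative values discharge.
    Charging/discharging losses are ignored in this sketch.
    """
    a = max(-1.0, min(1.0, a_norm))                   # clip the actor output
    max_charge = min(p_max, (soc_max - soc) / dt)     # headroom until full
    max_discharge = min(p_max, (soc - soc_min) / dt)  # energy still available
    return a * max_charge if a >= 0 else a * max_discharge


def terminal_penalty(soc_end, soc_target, weight=10.0):
    """Penalty applied after the last step if the final state of charge
    falls short of a (hypothetical) end-of-horizon target."""
    return weight * max(0.0, soc_target - soc_end)
```

Because the physical limits are folded into the scaling, any actor output yields a power that respects the rating and the state-of-charge bounds, while the terminal requirement only shows up in the reward.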

The operational constraint most challenging to decentralize is the voltage bound, since the physical coupling between nodes is unknown to the agents. To address this, the voltage regressor learns to predict real-time voltages from data, and during policy execution a projection layer automatically corrects any action profile that would push predicted voltages beyond safe bounds.
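
A minimal sketch of such a correction step is given below, assuming a linear voltage model in which the predicted voltage at bus k is `sum_j W[k][j] * a[j] + c[k]`, with `c` collecting the state-dependent part. It uses Dykstra's alternating projection onto half-spaces, which is one standard way to compute a Euclidean projection onto an intersection of linear constraints; the paper's actual solver is not described in this summary. The names `W`, `c`, the 0.95 to 1.05 p.u. band, and the [-1, 1] action box are illustrative assumptions.

```python
def project_halfspace(y, g, h):
    """Euclidean projection of y onto the half-space {x : g.x <= h}."""
    excess = sum(gi * yi for gi, yi in zip(g, y)) - h
    if excess <= 0.0:
        return list(y)                      # already feasible
    scale = excess / sum(gi * gi for gi in g)
    return [yi - scale * gi for gi, yi in zip(g, y)]


def safe_projection(a_hat, W, c, v_min=0.95, v_max=1.05, sweeps=200):
    """Closest action to a_hat whose predicted voltages stay in band.

    Runs a fixed number of Dykstra sweeps over all half-spaces.
    """
    n = len(a_hat)
    halfspaces = []
    for w_k, c_k in zip(W, c):
        halfspaces.append((list(w_k), v_max - c_k))          # w.a + c <= v_max
        halfspaces.append(([-w for w in w_k], c_k - v_min))  # w.a + c >= v_min
    for j in range(n):                                       # keep -1 <= a_j <= 1
        unit = [1.0 if i == j else 0.0 for i in range(n)]
        halfspaces.append((unit, 1.0))
        halfspaces.append(([-u for u in unit], 1.0))

    x = list(a_hat)
    corrections = [[0.0] * n for _ in halfspaces]            # Dykstra increments
    for _ in range(sweeps):
        for i, (g, h) in enumerate(halfspaces):
            y = [xj + pj for xj, pj in zip(x, corrections[i])]
            x = project_halfspace(y, g, h)
            corrections[i] = [yj - xj for yj, xj in zip(y, x)]
    return x
```

For example, with two agents, one monitored bus, `W = [[0.02, 0.02]]` and `c = [1.03]`, the proposal `[1.0, 1.0]` is predicted to give 1.07 p.u.; the projection shifts both actions equally until the predicted voltage sits on the upper bound. A proposal that is already predicted to be safe is returned unchanged.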

MADDPG with Safety Layer

The MADDPG algorithm is implemented with a centralized critic (for stability in policy training) and decentralized actors, each acting only on local observations. The training loop iterates over episodes of policy rollout, safety-layer action projection, environment interaction, reward allocation, and network updates (with separate target networks and a replay buffer). Because the critic sees all agents’ actions and observations during training, learning remains sample-efficient despite the non-stationarity that simultaneously learning agents introduce.
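
The bookkeeping pieces of this loop can be sketched without a deep learning library. The block below shows a joint replay buffer, the concatenated input a centralized critic would receive, and a soft (Polyak) target update. Class and function names, the buffer capacity, the concatenation order, and the value of `tau` are assumptions for illustration; soft updates are the usual choice in DDPG-style methods, but this summary does not state which update rule the paper uses.

```python
import random
from collections import deque


class JointReplayBuffer:
    """Stores joint transitions so a centralized critic can see all agents."""

    def __init__(self, capacity=100_000, seed=0):
        self.buffer = deque(maxlen=capacity)   # oldest transitions are evicted
        self.rng = random.Random(seed)

    def add(self, obs, actions, rewards, next_obs, done):
        # obs, actions, next_obs: one list per agent; rewards: one float per agent.
        # Assumption: the actions stored are the ones actually executed,
        # i.e. after the safety-layer projection.
        self.buffer.append((obs, actions, rewards, next_obs, done))

    def sample(self, batch_size):
        return self.rng.sample(list(self.buffer), batch_size)


def critic_input(obs, actions):
    """Flatten all agents' observations, then all agents' actions."""
    flat = []
    for o in obs:
        flat.extend(o)
    for a in actions:
        flat.extend(a)
    return flat


def soft_update(target, online, tau=0.01):
    """Move target parameters a fraction tau toward the online parameters."""
    return [(1.0 - tau) * t + tau * w for t, w in zip(target, online)]
```

At execution time only the actors are needed, each fed its own local observation; the joint buffer and the critic input exist purely for training.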

Evaluation and Results

Voltage Predictor Performance

A linear multi-output regressor (one output per bus) is trained using simulated power flows under diverse operating conditions. The regressor achieves a mean absolute error of 0.00138 p.u. and explains 99.3% of voltage variance, validating its suitability for safety-layer application. Figure 2 shows the close match between predicted and observed voltage at a test bus over a representative 24-hour period.

Figure 2: Predicted and actual voltage levels for bus 5 over 24 hours.
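
To make the regressor and its two reported metrics concrete, here is a small self-contained sketch: an ordinary least-squares fit with an intercept (solved through the normal equations) plus mean absolute error and R². The function names are hypothetical, and averaging R² uniformly over buses is an assumption, since this summary does not say how the 99.3% figure was aggregated. A production version would normally use a numerical library rather than hand-written elimination.

```python
def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [[M[i][n + j] / M[i][i] for j in range(len(B[0]))] for i in range(n)]


def fit_linear(X, Y):
    """Least-squares weights mapping feature rows X to bus-voltage rows Y."""
    Xb = [list(row) + [1.0] for row in X]               # append intercept term
    d, m = len(Xb[0]), len(Y[0])
    XtX = [[sum(r[i] * r[j] for r in Xb) for j in range(d)] for i in range(d)]
    XtY = [[sum(r[i] * y[k] for r, y in zip(Xb, Y)) for k in range(m)]
           for i in range(d)]
    return solve(XtX, XtY)                              # (features + 1) x buses


def predict(W, x):
    """Predicted voltage at every bus for one feature vector x."""
    xb = list(x) + [1.0]
    return [sum(xb[i] * W[i][k] for i in range(len(xb))) for k in range(len(W[0]))]


def mae(Y, P):
    """Mean absolute error pooled over all samples and buses."""
    errors = [abs(a - b) for y, p in zip(Y, P) for a, b in zip(y, p)]
    return sum(errors) / len(errors)


def r2_mean(Y, P):
    """R^2 computed per bus, then averaged uniformly over buses."""
    scores = []
    for k in range(len(Y[0])):
        y = [row[k] for row in Y]
        p = [row[k] for row in P]
        mean = sum(y) / len(y)
        ss_res = sum((a - b) ** 2 for a, b in zip(y, p))
        ss_tot = sum((a - mean) ** 2 for a in y)
        scores.append(1.0 - ss_res / ss_tot)
    return sum(scores) / len(scores)
```

In use, `X` would hold joint state-action features from simulated power flows and `Y` the corresponding bus voltages in p.u.; the metrics should be computed on held-out samples rather than on the training set.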

Safe-MADDPG Training and Benchmarking

The Safe-MADDPG, MADDPG (no safety layer), and MAPPO algorithms are compared on an IEEE 33-bus network testbed with five flexible buildings equipped with PV and storage. Wide-ranging load, flexibility, and price signals are simulated based on real and synthetic data.

Training Results: The Safe-MADDPG framework consistently achieves higher episode rewards and converges faster than baselines. The safety layer guarantees zero voltage violations from initialization, while other methods show lingering constraint violations and require extensive training to minimize them.
Testing Results: On one week of unseen test data, Safe-MADDPG yields a net benefit within 12% of the omniscient OPF optimum while strictly enforcing all voltage constraints (zero violations) without knowledge of the network topology. In contrast, the baselines without a safety layer exhibit 1–3 violations and larger optimality gaps.
Figure 3: (a) Episode reward over training (left) and (b) normalized voltage-violation cost over training (right).
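
The two quantities reported in the testing results can be computed as in the sketch below. The exact definitions are not given in this summary, so both functions are assumptions: the gap is taken relative to the OPF benefit, a violation is counted per bus and time step, and the 0.95 to 1.05 p.u. band is a commonly used default rather than a value taken from the paper.

```python
def optimality_gap(policy_benefit, opf_benefit):
    """Relative shortfall of the learned policy versus the OPF benchmark."""
    return (opf_benefit - policy_benefit) / abs(opf_benefit)


def count_violations(voltages, v_min=0.95, v_max=1.05):
    """Number of (time step, bus) entries whose voltage leaves the band.

    `voltages` is a list of time steps, each a list of bus voltages in p.u.
    """
    return sum(1 for step in voltages for v in step if v < v_min or v > v_max)
```

With purely hypothetical numbers, a policy earning 88 against an OPF benchmark of 100 has a gap of 0.12, which is what "within 12%" means under this definition.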

Policy Behavior Analysis

Policy traces reveal that Safe-MADDPG is more conservative than the OPF optimum, trading off some immediate flexibility gain to stay robust under uncertainty. Power reduction and energy storage charging/discharging profiles demonstrate a close tracking of the optimal pattern but feature "cautious" flexibility harvesting compared to the aggressive (oracle) OPF, reflecting the algorithm’s prioritization of operational safety.

Figure 4: Comparison of power reduction per building between the OPF solution and the Safe-MADDPG policy for a single day, alongside flexibility price and buy prices.

Figure 5: Comparison of ESS energy levels per building between the OPF solution and the Safe-MADDPG policy for a single day, alongside flexibility and buy prices.

Implications and Future Directions

This research demonstrates that self-organized DER communities can safely and efficiently participate in flexibility markets, without central DSO intervention or disclosure of network configuration. By ensuring constraint satisfaction via a data-driven safety layer, the proposed scheme sidesteps the operational, regulatory, and privacy bottlenecks of traditional grid-aware aggregation. This paradigm is particularly compelling as distribution systems become increasingly heterogeneous and DSO visibility decreases.

Future directions include the extension of the voltage predictor under partial observability (e.g., sparse measurement availability), online regressor adaptation to cope with topology reconfiguration or faults, and integration of multi-period bidding strategies under uncertainty for participation in advanced ancillary service markets.

Conclusion

The paper presents a MARL approach for DER flexibility management in distribution networks that enforces voltage safety with decentralized execution. The model-free voltage predictor and safety-layer projection free DER agents from the need for network or DSO information, facilitating scalable, privacy-preserving bottom-up flexibility aggregation. Empirical results show operation close to the OPF optimum with strict constraint adherence, advancing the practical feasibility of autonomous grid-supportive DER policies in future power systems.
