FairMarket-RL: Fair P2P Electricity Markets
- Peer-to-peer electricity markets are decentralized systems where distributed energy resources trade locally, and FairMarket-RL leverages multiagent reinforcement learning (MARL) with a large language model (LLM) critic to ensure fairness.
- The framework employs explicit fairness metrics—FTG for grid reliance, FBS for seller equity, and FPP for pricing fairness—to balance economic and social objectives.
- Empirical evaluations demonstrate that LLM-guided reward shaping enhances local trading, reduces consumer costs, and sustains fairness metrics above 0.8 across diverse market conditions.
Peer-to-peer (P2P) electricity markets are rapidly advancing as distributed energy resources such as rooftop photovoltaics and home energy management systems become pervasive in residential grids. However, conventional market structures and reinforcement learning (RL)-driven bidding protocols have predominantly focused on optimizing economic efficiency or private profit. Such designs lack explicit mechanisms to guarantee equity or procedural fairness among market participants, especially under uncertainty and partial observability. The FairMarket-RL framework introduces a scalable, fairness-aware multiagent reinforcement learning (MARL) system in which a large language model (LLM) operates as a fairness critic. This LLM-based critic computes three explicit fairness metrics during ongoing market operation and injects fairness-driven guidance into the reward function through parameterized reward shaping, thereby balancing economic and social objectives.
1. FairMarket-RL Architecture and LLM Critic Integration
FairMarket-RL consists of a population of RL agents representing buyers and sellers within a local market mechanism for peer-to-peer electricity exchange. Each agent acts under partial observability and selects from discrete price-quantity bids at each trading slot. The central novelty lies in the integration of an LLM critic in the closed loop: after each trading slot, the LLM receives raw transaction traces and state summaries. It then computes and outputs normalized scalar measures corresponding to three fairness dimensions:
- Fairness-to-Grid (FTG): Evaluates the share of total energy transacted locally versus procured from the external grid.
- Fairness-Between-Sellers (FBS): Quantifies equitable opportunity and participation among sellers.
- Fairness-of-Pricing (FPP): Assesses the distributional fairness of settled prices relative to all participant valuations.
The LLM thus acts as an adaptive, socially aware reward signal generator, augmenting or modulating the native economic reward in each agent’s policy update.
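As a concrete illustration, this closed loop can be sketched in a few lines of Python. Everything here is a stub: the market-clearing rule, the stand-in "LLM critic", and all function names are assumptions for illustration, not the paper's actual API. The point is the shape of the loop: clear the slot, score the resulting trace with the critic, and fold the fairness scores into each agent's reward.

```python
# Illustrative sketch of one FairMarket-RL trading slot (all names and the
# clearing rule are assumptions, not the paper's actual implementation).

def stub_llm_critic(trace):
    """Stand-in for the LLM critic: emit three normalized fairness scores
    from the slot's transaction trace (FBS/FPP are fixed here for brevity;
    the real critic derives all three from the trace)."""
    e_local = sum(t["qty"] for t in trace["trades"])
    e_total = e_local + trace["grid_energy"]
    ftg = e_local / e_total if e_total else 0.0
    return {"FTG": ftg, "FBS": 0.9, "FPP": 0.85}

def run_slot(bids, asks, grid_price, lam=0.2, w=None):
    """Clear a toy local market (highest bids against lowest asks), then
    shape each trade's economic reward with the critic's fairness scores."""
    w = w or {"FTG": 1 / 3, "FBS": 1 / 3, "FPP": 1 / 3}
    trades, grid_energy = [], 0.0
    for (b_price, b_qty), (a_price, a_qty) in zip(sorted(bids, reverse=True),
                                                  sorted(asks)):
        if b_price >= a_price:                      # local match: split the spread
            trades.append({"price": (b_price + a_price) / 2,
                           "qty": min(b_qty, a_qty)})
        else:                                       # unmet demand falls to the grid
            grid_energy += b_qty
    scores = stub_llm_critic({"trades": trades, "grid_energy": grid_energy})
    bonus = lam * sum(w[k] * scores[k] for k in scores)    # fairness shaping term
    econ = [t["qty"] * (grid_price - t["price"]) for t in trades]
    return [r + bonus for r in econ], scores
```

Even in this toy form, the separation of concerns is visible: the market clears on purely economic logic, and the critic's scores enter only through the additive shaping term.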
2. Fairness Metric Definitions and LLM Normalization
The three fairness metrics computed by the LLM are defined formally as follows, with all values normalized to $[0, 1]$:
- Fairness-to-Grid (FTG):
$$\mathrm{FTG} = 1 - \frac{E_{\text{grid}}}{E_{\text{total}}}$$
where $E_{\text{grid}}$ is the total energy procured from the grid and $E_{\text{total}}$ is the system-wide total energy consumed in the trading slot.
- Fairness-Between-Sellers (FBS):
$$\mathrm{FBS} = 1 - \frac{\sum_{i=1}^{N} |s_i - \bar{s}|}{2N\bar{s}}$$
where $s_i$ is the sales volume for seller $i$, $\bar{s}$ the average sales per seller, and $N$ the number of sellers; the denominator $2N\bar{s}$ provides proper normalization.
- Fairness-of-Pricing (FPP):
$$\mathrm{FPP} = 1 - \frac{1}{T}\sum_{t=1}^{T} \frac{|p_t - p_{\text{fair}}|}{p_{\text{fair}}}$$
with $p_t$ the settled price for trade $t$ and $p_{\text{fair}}$ the fair price reference (e.g., the median of all acceptable valuations).
Each output is clipped (if necessary) and normalized by the LLM to the $[0, 1]$ range, supporting direct inclusion in reward modifications.
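Assuming deviation-based functional forms consistent with the prose descriptions (one minus the grid share, one minus the normalized mean absolute deviation of seller volumes, one minus the mean relative price deviation), the three metrics can be computed as:

```python
# Hypothetical implementations of the three fairness metrics; the exact
# functional forms are assumptions consistent with their prose definitions.

def ftg(e_grid: float, e_total: float) -> float:
    """Fairness-to-Grid: share of total consumption served locally
    rather than procured from the external grid."""
    return (1.0 - e_grid / e_total) if e_total > 0 else 0.0

def fbs(sales: list) -> float:
    """Fairness-Between-Sellers: 1 minus the total absolute deviation of
    per-seller sales volumes, normalized by 2 * N * mean to land in [0, 1]."""
    n, mean = len(sales), sum(sales) / len(sales)
    if mean == 0:
        return 1.0                      # no sales at all: trivially equitable
    return 1.0 - sum(abs(s - mean) for s in sales) / (2 * n * mean)

def fpp(prices: list, p_fair: float) -> float:
    """Fairness-of-Pricing: 1 minus the mean relative deviation of settled
    prices from the fair reference price, floored at 0."""
    dev = sum(abs(p - p_fair) for p in prices) / (len(prices) * p_fair)
    return max(0.0, 1.0 - dev)
```

Each function returns a scalar in $[0, 1]$, which is what allows the scores to be mixed directly into the reward without per-metric rescaling.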
3. Reward Aggregation: Economic-Fairness Composition
The FairMarket-RL reward shaping function combines each agent's base economic reward $r^{\text{econ}}$ with the LLM-generated normalized fairness scores through a set of tunable, ramped mixing coefficients $w_{\text{FTG}}, w_{\text{FBS}}, w_{\text{FPP}}$ and a global scaling factor $\lambda$:

$$r = r^{\text{econ}} + \lambda \left( w_{\text{FTG}} \, \mathrm{FTG} + w_{\text{FBS}} \, \mathrm{FBS} + w_{\text{FPP}} \, \mathrm{FPP} \right)$$

- The coefficients $w_k$ (possibly scheduled over training epochs) allow the fairness guidance to be ramped up (or down) to avoid overwhelming the underlying market incentives.
- $\lambda$ globally controls the influence of fairness feedback and is set by the practitioner based on the desired social/economic balance.
This architecture preserves reward interpretability and tunability, supporting ablation and sensitivity analysis.
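A minimal sketch of this composition, with hypothetical parameter names (the specific weights and scale are the practitioner's choice, not values from the paper):

```python
def shaped_reward(r_econ: float, scores: dict, w: dict, lam: float) -> float:
    """Fairness-shaped reward: base economic reward plus the lambda-scaled,
    w-weighted sum of the normalized fairness scores."""
    return r_econ + lam * sum(w[k] * scores[k] for k in scores)

# Example: economic reward 1.0, moderate fairness scores, illustrative weights.
r = shaped_reward(1.0,
                  {"FTG": 0.8, "FBS": 1.0, "FPP": 0.5},
                  {"FTG": 0.5, "FBS": 0.3, "FPP": 0.2},
                  lam=1.0)
```

Because the composition is a plain weighted sum, setting $\lambda = 0$ recovers the unshaped economic baseline, which is what makes ablations against the fairness-free policy straightforward.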
4. LLM Inference Protocol, Output Normalization, and Stability Controls
Following each trading slot, the transaction archive for that slot (including bid/ask distributions and settlement details) is summarized, then submitted to the LLM critic in a zero-shot or few-shot prompt template explicitly requesting fairness evaluation. The LLM returns a triple of scalar outputs $(\mathrm{FTG}, \mathrm{FBS}, \mathrm{FPP})$. These scores undergo:
- Normalization: Explicit clipping/scaling to $[0, 1]$ to avoid accidental reward domination.
- Integration: Immediate injection into the next policy update via the economic-fairness composition.
- Policy-Update Stability: Physical constraints are enforced to prevent infeasible outcomes (e.g., trade mismatch, price-bound violations), and the reward shaping coefficients (especially $\lambda$) are bounded to prevent destabilizing the RL learning dynamics or inducing oscillatory learning.
Such safeguards are necessary to ensure stable MARL convergence under augmented, dynamic reward landscapes.
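Two of these safeguards, score clipping and a bounded ramp on the fairness scale, are simple to state. The function names and the linear warmup schedule here are illustrative assumptions:

```python
def normalize_score(raw: float) -> float:
    """Clip an LLM-emitted fairness score into [0, 1] so a stray output
    cannot dominate the shaped reward."""
    return min(1.0, max(0.0, raw))

def ramp_lambda(epoch: int, warmup_epochs: int, lam_max: float) -> float:
    """Linearly ramp the global fairness scale from 0 to a bounded lam_max
    over a warmup period, then hold it, to avoid destabilizing learning."""
    return lam_max * min(1.0, epoch / warmup_epochs)
```

Clipping guards against occasional out-of-range critic outputs, while the bounded ramp lets agents first learn viable economic behavior before fairness pressure is applied in full.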
5. Empirical Evaluation and Impact of LLM-Guided Reward Shaping
Experimental results span progressive deployments:
- Pilot and Community-Scale Simulations: Demonstrate that LLM-guided fairness shaping consistently increases local trading fractions, decreases mean consumer costs relative to grid-only procurement, and exhibits sustained fairness metrics (maintained FBS, FPP, and FTG > 0.8 across runs).
- Real-World Asset Dataset: Validates robustness—policies shaped by LLM guidance achieve both high social equity and utility viability.
- Sensitivity Studies: Analyze the impact of solar availability and aggregate demand fluctuations, revealing that the framework remains robust under distribution shift and across variable supply/demand regimes.
Ablations comparing RL with and without the LLM critic indicate that the inclusion of LLM-based fairness rewards engenders: (i) a shift toward local, socially optimal exchanges, (ii) reduced profit/utility disparities between participants, and (iii) sustained overall system efficiency. The architecture successfully avoids the pathologies often associated with hard rule-based fairness constraints or naïvely additive fairness penalties.
6. Scalability, Limitations, and Outlook
FairMarket-RL operates with scalable LLM inference: the LLM critic is invoked per trading slot, incurring communication overhead but leveraging advances in local LLM deployment for tractability. The modular reward shaping interface generalizes to expanded communities and diverse asset mixes, as validated on large, mixed-asset datasets. However, the framework's efficacy relies on the ongoing calibration of fairness-economic trade-offs and the empirical validity of LLM-generated fairness metrics under domain shift and participant heterogeneity. Additional research may address automated adaptation of $\lambda$ and the mixing coefficients $w_k$, and extensions to other sectors where procedural fairness is critical under decentralized operation.
The work establishes a practical, scalable paradigm for integrating LLM-informed fairness shaping into multiagent market-RL, achieving Pareto improvements across economic, technical, and equity objectives (Jadhav et al., 26 Aug 2025).