
Fair-GNE: Adaptive Fairness in MARL

Updated 25 November 2025
  • Fair-GNE is a framework that integrates Jain's fairness index into multi-agent reinforcement learning to ensure equitable workload distribution.
  • It uses a decentralized primal-dual algorithm to adaptively enforce fairness constraints while optimizing collective utility in a generalized Nash equilibrium setting.
  • Empirical results demonstrate that Fair-GNE improves Jain's fairness index by 168% over fixed-penalty baselines while maintaining competitive task success rates in a multi-agent hospital simulation.

The Fair-GNE (Generalized Nash Equilibrium-Seeking Fairness) model establishes a principled framework for incorporating rigorous, adaptive, and self-enforceable fairness into multi-agent reinforcement learning (MARL) systems. Positioned at the intersection of game theory, constrained optimization, and learning in cooperative multi-agent systems, Fair-GNE addresses workload equity by embedding Jain’s fairness index as a shared constraint within a generalized Nash equilibrium (GNE) game. The prescribed equilibrium is one in which no agent can unilaterally improve its utility while respecting the shared fairness constraint, yielding a workload allocation that is not only efficient but certifiably fair and robust to individual deviation at runtime (Ekpo et al., 18 Nov 2025).

1. Mathematical Structure and Problem Setting

Fair-GNE is formulated as a constrained multi-agent Markov game with $n$ agents, each with a finite primitive action space ($|\mathcal{A}_i| = 8$ in the empirical studies), operating over a high-dimensional, factored state space. The environment dynamics are governed by the transition kernel $T(s_{t+1} \mid s_t, \mathbf{a}_t)$, and the reward is supplied as the differential of a task-progress metric, $r(s_t, \mathbf{a}_t) = H(s_t) - H(s_{t-1})$, with discount factor $\gamma \in (0,1)$.

Each agent accumulates a workload counter $w_{i,t}$ based on completed subtasks. To quantify workload allocation equity, Jain’s fairness index is employed:

$$\mathsf{F}(w_t) = \frac{\left(\sum_{i=1}^{n} w_{i,t}\right)^2}{n \sum_{i=1}^{n} w_{i,t}^2} \in [0,1],$$

where $\mathsf{F}=1$ indicates perfect equity and lower values indicate disparity. The fairness constraint is implemented via a trajectory-level relaxation of the per-timestep fairness threshold $\tau$:

$$\bar{g}(\pi) = \mathbb{E}_\pi\Big[\sum_{t=0}^{\infty} \gamma^t \big(\tau - \mathsf{F}(w_t)\big)\Big] \leq 0.$$

Policy generation is modeled as a GNE problem, where each agent solves

$$\max_{\pi_i \in \Pi_i} J(\pi_i, \pi_{-i}) \quad \text{subject to} \quad \bar{g}(\pi_i, \pi_{-i}) \leq 0,$$

with $J(\pi) = \mathbb{E}\big[\sum_{t=0}^{\infty} \gamma^t r(s_t, \mathbf{a}_t)\big]$ identical for all agents due to the potential game structure.
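
As a concrete illustration of these quantities, here is a minimal sketch (assuming NumPy; the function names `jain_index` and `discounted_violation` are ours, not the paper’s):

```python
import numpy as np

def jain_index(w) -> float:
    """Jain's fairness index F(w) = (sum_i w_i)^2 / (n * sum_i w_i^2), in [0, 1]."""
    w = np.asarray(w, dtype=float)
    denom = len(w) * np.sum(w ** 2)
    return float(np.sum(w) ** 2 / denom) if denom > 0 else 1.0

def discounted_violation(workload_traj, tau: float, gamma: float) -> float:
    """Single-rollout estimate of g_bar(pi) = E[sum_t gamma^t (tau - F(w_t))]."""
    return sum((gamma ** t) * (tau - jain_index(w_t))
               for t, w_t in enumerate(workload_traj))

# Example: three agents, one doing most of the work -> low fairness.
print(jain_index([10, 1, 1]))  # ~0.47
print(jain_index([4, 4, 4]))   # 1.0 (perfect equity)
```

A negative discounted violation indicates the rollout satisfied the fairness threshold on average, which is what drives the dual update described in Section 2.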

2. Algorithmic Solution: Primal-Dual MARL

Fair-GNE employs a decentralized primal-dual algorithm. The primal step updates agents’ policies using any MARL backbone (e.g., QMIX, IPPO, MAPPO), with shaped rewards incorporating the Lagrange multiplier $\lambda$:

$$\tilde{r}_t = r(s_t, \mathbf{a}_t) - \lambda^{(k)}\big(\tau - \mathsf{F}(w_t)\big).$$

The dual step updates $\lambda$ by projected gradient ascent based on the constraint violation:

$$\lambda^{(k+1)} = \mathrm{clip}\big(\lambda^{(k)} + \eta_\lambda\, \bar{g}^{(k+1)},\ 0,\ \lambda_{\max}\big),$$

where $\bar{g}^{(k+1)}$ is the empirical discounted constraint violation averaged over $M$ rollouts.

The overall scheme alternates between these two updates, with the dual variable responding dynamically to the degree of violation or satisfaction of the fairness constraint. This adaptive update mechanism ensures the collective policy remains close to the solution of the Karush-Kuhn-Tucker (KKT) system corresponding to the GNE with fairness constraint.
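
A schematic of this alternation, reusing `jain_index` and `discounted_violation` from the sketch in Section 1, might look as follows; `marl_update` and `collect_rollouts` stand in for the MARL backbone and environment interaction, and the hyperparameter names are illustrative rather than taken from the paper:

```python
import numpy as np

def fair_gne_training(marl_update, collect_rollouts, tau=0.85, gamma=0.99,
                      eta_lambda=1e-2, lambda_max=50.0, n_iters=100, n_rollouts=8):
    lam = 0.0  # dual variable lambda^(0)
    for _ in range(n_iters):
        rollouts = collect_rollouts(n_rollouts)
        # Primal step: train policies on the shaped reward
        #   r_tilde_t = r_t - lam * (tau - F(w_t)).
        for traj in rollouts:
            shaped = [r - lam * (tau - jain_index(w))
                      for r, w in zip(traj["rewards"], traj["workloads"])]
            marl_update(traj, shaped)
        # Dual step: projected gradient ascent on the averaged violation.
        g_bar = np.mean([discounted_violation(traj["workloads"], tau, gamma)
                         for traj in rollouts])
        lam = float(np.clip(lam + eta_lambda * g_bar, 0.0, lambda_max))
    return lam

# Tiny smoke test with stub interfaces (no real MARL backbone).
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    def collect_rollouts(m):
        return [{"rewards": rng.normal(size=50),
                 "workloads": np.cumsum(rng.integers(0, 2, size=(50, 3)), axis=0)}
                for _ in range(m)]
    marl_update = lambda traj, shaped: None  # no-op policy update
    print("final lambda:", fair_gne_training(marl_update, collect_rollouts, n_iters=10))
```

The dual step is where the adaptivity lives: a positive averaged violation pushes $\lambda$ up, while sustained satisfaction lets it relax back toward zero.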

3. Theoretical Guarantees

Under standard convexity and compactness assumptions and Slater’s condition, a stationary policy–multiplier pair $(\pi^*, \lambda^*)$ can be shown to satisfy the KKT conditions of the Lagrangian

$$\mathcal{L}(\pi, \lambda) = J(\pi) - \lambda\, \bar{g}(\pi),$$

which implies that $\pi^*$ is a stationary Markov GNE (SM-GNE).
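
For concreteness, with a single shared inequality constraint these KKT conditions take the standard form (a textbook statement, included here for reference rather than a result specific to the paper):

$$
\begin{aligned}
&\pi_i^* \in \arg\max_{\pi_i \in \Pi_i} \mathcal{L}\big(\pi_i, \pi_{-i}^*, \lambda^*\big) \ \ \forall i &&\text{(best response / stationarity)}\\
&\bar{g}(\pi^*) \leq 0 &&\text{(primal feasibility)}\\
&\lambda^* \geq 0 &&\text{(dual feasibility)}\\
&\lambda^*\, \bar{g}(\pi^*) = 0 &&\text{(complementary slackness)}
\end{aligned}
$$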

With a finite policy space and an exact solution to each primal update, the two-timescale stochastic approximation (with $\eta_\lambda \ll \eta_\pi$) ensures almost sure convergence of the iterates to a saddle point corresponding to constraint-satisfying (self-enforcing) fairness (Ekpo et al., 18 Nov 2025).
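
Here “saddle point” carries its usual Lagrangian meaning: the limiting pair $(\pi^*, \lambda^*)$ satisfies

$$\mathcal{L}(\pi, \lambda^*) \;\leq\; \mathcal{L}(\pi^*, \lambda^*) \;\leq\; \mathcal{L}(\pi^*, \lambda) \quad \text{for all } \pi \in \Pi,\ \lambda \in [0, \lambda_{\max}],$$

so the policies maximize the penalized objective at $\lambda^*$ while $\lambda^*$ minimizes it over admissible multipliers.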

4. Empirical Performance and Comparative Results

Within the MARLHospital simulator, in which three specialized agents cooperate on a basic life-support task (50-timestep episodes, state dimension 174, joint discrete action space of size 512, i.e., $8^3$), Fair-GNE demonstrates substantial empirical advantages.

| Method | Task Success | $\lambda$ | Workload JFI | Constraint Sat. | KKT Sat. |
|---|---|---|---|---|---|
| Gini ($\lambda=0$) | 0.83 ± 0.00 | 0 | 0.33 ± 0.00 |  |  |
| Gini ($\lambda=50$) | 0.96 ± 0.01 | 50 | 0.33 ± 0.00 |  |  |
| Fair-GNE ($\tau=0.85$) | 0.86 ± 0.05 | 19.0 ± 0.4 | 0.89 ± 0.09 | 0.88 | 0.95 |

Fair-GNE increases Jain’s fairness index by 168% over the best fixed-penalty baseline (0.89 vs. 0.33, $p=0.0082$) while maintaining 86% task success ($p=0.49$ vs. QMIX). Lowering $\tau$ modulates the trade-off between fairness and task efficiency, but all tested settings outperform the fixed-penalty alternatives.

5. Interpretation, Mechanisms, and Extensions

Fair-GNE’s core innovation is adaptive constraint enforcement. The dual variable $\lambda$ rises automatically when fairness drops below the specified threshold $\tau$, thereby intensifying the fairness penalty, and decreases when fairness is achieved. This in-situ adjustment yields certified, self-sustaining fairness at runtime, in sharp contrast to conventional post hoc reward shaping, which guarantees neither runtime enforceability nor constraint satisfaction.

The framework can be extended by:

  • Incorporating additional fairness constraints, such as skill–task alignment.
  • Introducing alternative fairness metrics (e.g., Gini, max-min) via smooth surrogate constraints (a sketch follows this list).
  • Enabling workload equity in mixed human–AI teams.
  • Deriving finite-sample statistical bounds in the presence of function approximation.
  • Investigating equilibrium uniqueness in complex or overparameterized models.
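
As an illustration of the alternative-metric extension, a Gini-style surrogate could take the place of $\tau - \mathsf{F}(w_t)$ in the shaped reward. The sketch below is purely hypothetical (the paper does not specify this construction); `gini` and `eps` are our names:

```python
import numpy as np

def gini(w) -> float:
    """Gini coefficient of a nonnegative workload vector; 0 = perfect equality."""
    w = np.sort(np.asarray(w, dtype=float))
    n, total = len(w), np.sum(w)
    if total == 0:
        return 0.0
    i = np.arange(1, n + 1)  # 1-indexed ranks of the sorted workloads
    return float(2.0 * np.sum(i * w) / (n * total) - (n + 1) / n)

# Constraint analogous to tau - F(w_t) <= 0: require gini(w_t) <= eps, so the
# shaped reward would become r_tilde = r - lam * (gini(w_t) - eps).
print(gini([10, 1, 1]))  # 0.5 (high inequality)
print(gini([4, 4, 4]))   # 0.0 (perfect equality)
```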

6. Relation to Broader GNE-Based Fairness Research

Fair-GNE operationalizes fairness in MARL by selecting equilibria that satisfy user-specified, quantitative fairness constraints at the policy level. In contrast, broader GNE notions such as the variational GNE (v-GNE) ensure fairness only under stringent cost comparability assumptions, and do so by enforcing a uniform shadow price across agents. Recent work has highlighted the fragility of this approach when agents’ cost functions are not cardinally comparable, and proposes the “f-GNE” solution concept, which selects a GNE by optimizing a pre-chosen fairness metric (e.g., max-min, Nash bargaining) over the set of all GNEs (Hall et al., 4 Apr 2025). Thus, Fair-GNE’s explicit, metric-driven fairness aligns with approaches that expose the fairness criterion to the designer and treat global equitable allocation as a constraint or selection principle, rather than an emergent property of undifferentiated coupling.

7. Significance and Impact

Fair-GNE advances the state of the art in certified runtime fairness for MARL in complex, resource-constrained environments. By embedding fairness notions directly into the policy optimization and equilibrium selection process, it enables robust and adaptive workload balance, demonstrably improving group-level equity without compromising task efficiency. This model establishes a generalizable framework for equity in cooperative AI systems, with potential applicability to any multi-agent system governed by shared-resource constraints and group-level goals (Ekpo et al., 18 Nov 2025).
