Fairness-Aware Hierarchical Control Framework
- The framework is a multi-layer approach that integrates fairness metrics into sequential decision-making by decoupling high-level allocation from real-time execution.
- It employs algorithmic techniques including RL, combinatorial optimization, and projected-gradient methods to enforce fairness across agents and tasks.
- Applications span wireless resource allocation, traffic management, and federated learning, demonstrating improved equity and scalable multi-objective optimization.
A fairness-aware hierarchical control framework is a structured, multi-level architecture for sequential decision-making in dynamical systems that explicitly encodes fairness among agents, tasks, or system components as a core design constraint at one or more levels of the control or learning stack. Such frameworks integrate fairness metrics—either as objective terms, constraints, or through specialized reward shaping—into the logic of hierarchical optimization, ranging from discrete planning over combinatorial decision spaces to real-time continuous control, with applications spanning wireless resource allocation, traffic management, federated learning, and competitive multi-agent systems. They frequently combine principled fairness objectives (e.g., utility variance, generalized multicalibration, or inequity aversion) with hierarchical decompositions to manage the complexity of multi-objective optimization in large-scale, safety- or latency-critical domains.
1. Hierarchical Structure and Decoupling of Fairness Objectives
Fairness-aware hierarchical control frameworks typically decompose the system into at least two interacting layers, each responsible for a class of decisions at a distinct temporal or logical scale:
- Top Layer (Planning/Allocation): Executes discrete, combinatorial decisions such as the assignment of control authority (vehicle scheduling at intersections (Shi et al., 8 Nov 2025)), pairing and clustering in federated or multi-agent systems (Huang et al., 5 Aug 2024), or enforcing admission and prioritization rules in competitive resource environments. Fairness mechanisms—such as the inequity-aversion utility in traffic (Shi et al., 8 Nov 2025), coefficient-of-variation penalties (Jiang et al., 2019), or group-risk constraints (Zhang et al., 3 May 2024)—are encoded directly into the decision criterion or allocation logic.
- Bottom Layer (Execution/Tracking): Manages continuous or fast-timescale decisions, typically tracking reference trajectories, executing local policies, or implementing refined safety or efficiency corrections. Normative control (e.g., LQR, HOCBF (Shi et al., 8 Nov 2025)) or learning-based policies handle real-time environmental response under the allocation from the top layer, while respecting the fairness-constrained system envelope.
This vertical separation enables strict fairness guarantees at ingress points (authority assignment, aggregation weighting), with fast feedback and potential correction of small-scale unfairness at the execution layer.
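The two-layer pattern above can be sketched in miniature. The scoring weights, the eligibility flag, and the proportional tracking gain below are illustrative stand-ins, not the actual controllers from the cited works:

```python
def fairness_score(wait, urgency, recent_access, w=(1.0, 1.0, 0.5)):
    """Toy allocation priority: agents with long waits or high urgency
    and little recent control authority score higher (weights are
    illustrative, not taken from any cited paper)."""
    return w[0] * wait + w[1] * urgency - w[2] * recent_access

def top_layer_allocate(agents):
    """Discrete allocation step: pick the eligible agent that maximizes
    the fairness-oriented score (the slow, combinatorial layer)."""
    eligible = [a for a in agents if not a["locked_out"]]
    return max(eligible, key=lambda a: fairness_score(
        a["wait"], a["urgency"], a["recent_access"]))

def bottom_layer_track(state, reference, gain=0.5):
    """Continuous execution step: proportional tracking toward the
    reference issued by the top layer (a stand-in for LQR/HOCBF)."""
    return state + gain * (reference - state)

agents = [
    {"id": 0, "wait": 3.0, "urgency": 0.2, "recent_access": 2.0, "locked_out": False},
    {"id": 1, "wait": 9.0, "urgency": 0.8, "recent_access": 0.0, "locked_out": False},
]
chosen = top_layer_allocate(agents)       # fairness decided at ingress
new_state = bottom_layer_track(0.0, 1.0)  # fast feedback at execution
```

The long-waiting, never-served agent wins the allocation, while the execution layer independently closes the loop on the continuous state.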
2. Mathematical Formalism and Fairness Metrics
Frameworks operationalize fairness using metrics suited to the problem structure and agent interactions:
- Variance-based fairness (e.g., coefficient of variation of per-agent long-run utility (Jiang et al., 2019)):

$$\mathrm{CV} = \frac{1}{\bar{u}}\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(u_i - \bar{u})^2}, \qquad \bar{u} = \frac{1}{n}\sum_{i=1}^{n} u_i,$$

where $u_i$ denotes agent $i$'s long-run average utility. This objective is sometimes decomposed for decentralized optimization, approximating joint variance minimization by local squared deviations $(u_i - \bar{u})^2$.
- Generalized multi-dimensional calibration for multi-group fairness (Zhang et al., 3 May 2024): a predictor $f$ satisfies $(s,\mathcal{G},\alpha)$-GMC when

$$\sup_{G \in \mathcal{G}} \Bigl| \mathbb{E}\bigl[\, s\bigl(f(X), Y\bigr)\, \mathbf{1}\{X \in G\} \,\bigr] \Bigr| \le \alpha,$$

where $s$ and $\mathcal{G}$ encode the task-specific residuals and group selection.
- Inequity aversion, adapting Fehr–Schmidt-style penalties (Shi et al., 8 Nov 2025):

$$U_i = x_i - \frac{\alpha}{n-1}\sum_{j \neq i} \max(x_j - x_i,\, 0) - \frac{\beta}{n-1}\sum_{j \neq i} \max(x_i - x_j,\, 0),$$

with $x_i$ summarizing individual performance factors (queueing delay, urgency, historical access).
- Dynamic reward shaping in deep RL-based hierarchical federated learning (Huang et al., 5 Aug 2024): a reward combining an average-performance term with a penalty on the deviation of per-task accuracies from their mean, balancing short-term task performance with long-term fairness across heterogeneous tasks.
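Two of these metrics are simple enough to state directly in code. The following minimal Python sketch computes the coefficient of variation and a Fehr–Schmidt-style inequity-averse utility; the parameter values are illustrative, not calibrated to any cited system:

```python
import statistics

def coefficient_of_variation(utilities):
    """CV = sigma / mu of per-agent utilities; 0 means a perfectly
    even utility distribution."""
    mu = statistics.mean(utilities)
    return statistics.pstdev(utilities) / mu

def fehr_schmidt_utility(i, x, alpha=1.0, beta=0.5):
    """Fehr-Schmidt inequity-averse utility for agent i: own payoff
    minus an envy term (others ahead) and a guilt term (others behind),
    each averaged over the n-1 other agents."""
    n = len(x)
    envy = sum(max(x[j] - x[i], 0.0) for j in range(n) if j != i) / (n - 1)
    guilt = sum(max(x[i] - x[j], 0.0) for j in range(n) if j != i) / (n - 1)
    return x[i] - alpha * envy - beta * guilt

even, skew = [4.0, 4.0, 4.0], [1.0, 4.0, 7.0]
cv_even = coefficient_of_variation(even)   # equal utilities -> 0
cv_skew = coefficient_of_variation(skew)
fs = fehr_schmidt_utility(1, skew)         # middle agent: envy toward 7, guilt toward 1
```

The middle agent's utility of 4 is discounted to 1.75 by envy and guilt terms, illustrating how inequity aversion reshapes an allocation criterion.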
3. Algorithmic Approaches: Hybrid and Hierarchical Optimization
Three algorithmic paradigms are prominent:
- Centralized combinatorial allocation (authority scheduling or task-to-agent assignment). For instance, at intersections, the eligible vehicle set is filtered by recent-access constraints, and the vehicle maximizing a fairness-oriented IAU is chosen per-step (Shi et al., 8 Nov 2025). In the federated edge learning setting, pairings, path-planning, and aggregation weights are part of a hybrid discrete-continuous action vector optimized via distributional actor–critic DRL (Huang et al., 5 Aug 2024).
- Distributional Soft Actor-Critic with Hybrid Action Decoupling (Huang et al., 5 Aug 2024): Actions are split into discrete (allocation, clustering, routing) and continuous (weight, trajectory, aggregation) components, $a_t = (a_t^{\mathrm{disc}}, a_t^{\mathrm{cont}})$. Learning and optimization are performed on each component separately, followed by a MAP recoupling under KL constraints.
- Decentralized multi-agent RL with hierarchical architectures. The Fair-Efficient Network (FEN) paradigm (Jiang et al., 2019) equips each agent with a policy hierarchy: a top-level controller selects among low-level sub-policies—some focused on exploitation, others on exploration or diversity—using per-agent fair-efficient rewards. Local consensus (gossip) mechanisms allow fully decentralized learning of global fairness.
- Projected-gradient methods for fairness calibration (Zhang et al., 3 May 2024): By casting fairness targets as linear constraints on a potential functional, a simple iterative projective update enforces fairness over high-dimensional group or hierarchical error spaces (cf. hierarchical classification or image segmentation).
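A toy instance of such an iterative projection, assuming scalar predictions and mean-residual group constraints (a deliberate simplification of the general potential-functional setting in the cited work):

```python
def calibrate_groups(preds, labels, groups, alpha=0.01, max_iter=1000):
    """While any group's mean residual exceeds alpha, shift that group's
    predictions by the residual: a projection onto the most violated
    linear fairness constraint."""
    preds = list(preds)
    for _ in range(max_iter):
        worst, worst_res = None, 0.0
        for G in groups:
            res = sum(labels[i] - preds[i] for i in G) / len(G)
            if abs(res) > abs(worst_res):
                worst, worst_res = G, res
        if worst is None or abs(worst_res) <= alpha:
            break  # all group constraints satisfied
        for i in worst:
            preds[i] += worst_res  # projection step
    return preds

labels = [1.0, 0.0, 1.0, 1.0]
preds  = [0.5, 0.5, 0.5, 0.5]
groups = [[0, 1], [2, 3], [0, 1, 2, 3]]  # overlapping group structure
calibrated = calibrate_groups(preds, labels, groups)
```

Here the second group is maximally miscalibrated, so its predictions are shifted first; after one projection all three (overlapping) group residuals fall below the tolerance.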
4. Case Study Applications
| Domain | High-Layer Fairness Principle | Hierarchical Structure |
|---|---|---|
| SAGIN-based HFL (Huang et al., 5 Aug 2024) | Dynamic task fairness via RL reward | UAV–satellite–ground, DRL on pairing/weight/trajectory |
| Real-time CAV intersection (Shi et al., 8 Nov 2025) | Fehr–Schmidt inequity aversion | Control allocation (top), LQR/HOCBF (bottom) |
| Multi-agent jobs/plant (Jiang et al., 2019) | CV, fair-efficient reward | Controller/sub-policies, decentralized PPO |
| Hierarchical classification (Zhang et al., 3 May 2024) | Generalized multi-group calibration | Post-processing calibration, tree-structured groupings |
| Autonomous racing (Thakkar et al., 2022) | Hard lane/safety constraints, soft MARL penalties | High-level waypoints/game, low-level RL/LQNG |
- In federated edge learning over SAGIN (Huang et al., 5 Aug 2024), a hybrid hierarchical DRL agent jointly selects cluster assignments, trajectory plans, and HFL aggregation weights, using a dynamically adaptive reward function that penalizes both poor average task performance and fairness deviation. This enables convergence to balanced accuracy, mitigating adverse effects of non-IID data distributions or fleeting communication windows.
- Connected vehicle intersection management (Shi et al., 8 Nov 2025) implements a two-layer system: centralized fair allocation according to history-sensitive measures (recent control, urgency, waiting time), and decentralized, real-time LQR tracking with formal quadratic-program safety filtering to guarantee collision avoidance and policy compliance. High fairness (Jain’s Index ≈ 0.98) and throughput gains (2.4× improvement) are demonstrated in simulation.
- Multi-agent resource domains such as grid-based job scheduling (Jiang et al., 2019) deploy FEN’s per-agent hierarchical learning with agent-gossip to align local and global fairness, achieving substantial utility variance reduction with minimal loss in system throughput.
- Multi-group calibration in hierarchical classification (Zhang et al., 3 May 2024) is formalized via (s,G,α)-GMC, leading to iterative post-processing updates that guarantee group-wise bounds for false-negative rate or prediction-set conditional coverage in hierarchical structures.
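The fairness-shaped reward idea from the SAGIN case can be illustrated with a deliberately simplified stand-in; `lam` and the max-min spread penalty are hypothetical choices, not the paper's exact shaping function:

```python
def shaped_reward(task_accs, lam=0.5):
    """Illustrative shaped reward: mean task accuracy minus a penalty
    on the accuracy spread across tasks, so the RL agent is rewarded
    for balanced (not just high-average) performance."""
    mean_acc = sum(task_accs) / len(task_accs)
    spread = max(task_accs) - min(task_accs)
    return mean_acc - lam * spread

# Same average accuracy (0.90), very different balance across tasks:
r_balanced = shaped_reward([0.90, 0.91, 0.89])
r_skewed   = shaped_reward([0.99, 0.95, 0.76])
```

Both accuracy profiles average 0.90, but the shaped reward ranks the balanced profile strictly higher, which is the mechanism by which slow-converging tasks get pulled up.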
5. Evaluation Methodologies and Empirical Results
Evaluation typically considers both system efficiency (throughput, accuracy, convergence) and explicit fairness metrics:
- Statistical fairness indices: Jain’s Index, Gini coefficient, coefficient of variation of utility, minimum/maximum agent utility, deviation-from-target coverage.
- Task-specific metrics: Average delay, convergence speed, violation rates of fairness constraints (e.g., illegal lane changes (Thakkar et al., 2022)), accuracy distribution across tasks.
- Empirical benchmarks: Comparison against non-fair, min-oriented, or naïve RL or optimization baselines.
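The statistical indices above are inexpensive to compute from per-agent utilities; for instance, Jain's index and the Gini coefficient:

```python
def jains_index(x):
    """Jain's fairness index: 1.0 for a perfectly equal allocation,
    1/n when a single agent receives everything."""
    n = len(x)
    return sum(x) ** 2 / (n * sum(v * v for v in x))

def gini(x):
    """Gini coefficient via the mean absolute pairwise difference,
    normalized by twice the mean; 0 = perfect equality."""
    n, mu = len(x), sum(x) / len(x)
    diff = sum(abs(a - b) for a in x for b in x)
    return diff / (2 * n * n * mu)

equal  = jains_index([5.0, 5.0, 5.0, 5.0])   # -> 1.0
skewed = jains_index([10.0, 0.0, 0.0, 0.0])  # -> 0.25 (= 1/n)
g = gini([1.0, 2.0, 3.0])
```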
For example, in (Huang et al., 5 Aug 2024), H-DSAC achieves ~91.4% average task accuracy in 100s (compared to 85–90% for baselines), with 10–15% higher accuracy for slow-converging tasks attributed to the fairness shaping in the RL reward. Intersection control (Shi et al., 8 Nov 2025) yields JFI ≈ 0.98 versus 0.93 (all-way-stop), and zero safety violations across broad demand and heterogeneity conditions. FEN achieves CV ≈ 0.17 in job scheduling (vs. CV ≈ 1.57 for independent agents) and both high resource utilization and fairness in all tested domains (Jiang et al., 2019).
6. Challenges, Scalability, and Extensions
A key advantage is the ability of hierarchical frameworks to decompose computationally intractable mixed-integer, non-convex optimization (e.g., trajectory pairing, cluster assignment, resource scheduling) into manageable subproblems, allowing real-time feasibility via fast combinatorial searches or parallelized local control.
Notable challenges include:
- Scalability: Although per-step logic is efficient (often $O(n)$ for $n$ agents), some frameworks rely on single-agent-at-a-time allocation, which may limit absolute throughput in under-loaded regimes (Shi et al., 8 Nov 2025).
- Reliance on assumptions: Some decentralized approaches depend on fast consensus via gossip, which may be slow or fail in sparse or unreliable networks (Jiang et al., 2019).
- Extensibility: Current frameworks focus on fixed fairness criteria; extensions to adaptive, user-specific, or context-dependent fairness calibration (via learned weightings or multi-objective criteria) are logical next steps.
- Generalization limitations: In some domains (e.g., autonomous racing (Thakkar et al., 2022)), residual fairness violations persist if low-level planners are insufficiently aligned with high-level rules.
A plausible implication is that future hierarchical fairness-aware control systems will integrate dynamic, data-driven fairness calibration, exploit more expressive low-level planners (e.g., deep RL with explicit fairness regularization), and support compositional architectures for interconnected systems (e.g., networked intersections or federated clusters across administrative domains).
7. Significance and Generalization Across Domains
The fairness-aware hierarchical control paradigm unifies approaches from federated learning, real-time traffic management, decentralized multi-agent systems, and calibrated ML post-processing. All share a commitment to decomposing fairness-constrained decision-making into tractable, layered structures, enabling provable, empirically validated guarantees on both performance and equity—critical for scalable socio-technical systems. Common features such as explicit fairness-oriented objectives, metric-driven architecture, and systematic algorithmic decoupling suggest broad applicability to emerging fairness- and safety-critical domains in networked autonomy, distributed AI, and intelligent infrastructure.