Pareto–Nash equilibria in individual-reward settings with unknown utilities

Establish general algorithms and theoretical guarantees to identify the Pareto–Nash set of joint policies in multi-objective multi-agent decision-making models with individual reward functions and unknown utility functions, including (but not limited to) multi-objective normal-form games, multi-objective stochastic games, and multi-objective partially observable stochastic games. Specifically, characterize conditions for existence and provide methods to compute undominated joint policies when agents’ scalarisation functions are unknown and symmetry or other structural assumptions are not imposed.

Background

Within the axiomatic approach for multi-objective multi-agent settings, extending Pareto optimality from team-reward to individual-reward cases leads to the Pareto–Nash equilibrium concept, where each player’s value vector should be Pareto optimal given the opponents’ fixed policies. However, when agents have individual rewards and their utility (scalarisation) functions are unknown, identifying the full set of undominated joint policies becomes challenging.
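The deviation test behind this concept can be made concrete for the simplest case. Below is a minimal sketch, assuming a hypothetical two-player, two-action, two-objective normal-form game with invented payoff vectors; it enumerates pure joint strategies and keeps those where no player has a unilateral deviation whose value vector Pareto-dominates their current one. This is only an illustration of the definition, not a general algorithm of the kind the open problem asks for (mixed strategies, stochastic games, and guarantees are all omitted).

```python
import itertools

# Hypothetical payoff tables (all numbers invented for illustration):
# payoffs[player][(a0, a1)] -> that player's 2-objective value vector
# under joint pure action (a0, a1).
payoffs = {
    0: {(0, 0): (3, 1), (0, 1): (0, 2), (1, 0): (2, 0), (1, 1): (1, 3)},
    1: {(0, 0): (1, 3), (0, 1): (2, 1), (1, 0): (0, 0), (1, 1): (3, 2)},
}

def dominates(u, v):
    """True if vector u Pareto-dominates v (>= in every objective, > in some)."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))

def pareto_nash_pure(payoffs, n_actions=2):
    """Enumerate pure-strategy Pareto-Nash equilibria of a 2-player game.

    Because utility (scalarisation) functions are unknown, the deviation
    test compares value vectors by Pareto dominance rather than by a
    scalarised utility.
    """
    equilibria = []
    for joint in itertools.product(range(n_actions), repeat=2):
        stable = True
        for player in (0, 1):
            current = payoffs[player][joint]
            for deviation in range(n_actions):
                alt = list(joint)
                alt[player] = deviation
                if dominates(payoffs[player][tuple(alt)], current):
                    stable = False  # player prefers the deviation for sure
                    break
            if not stable:
                break
        if stable:
            equilibria.append(joint)
    return equilibria
```

For the toy payoffs above, `pareto_nash_pure(payoffs)` returns the joint actions `(0, 0)` and `(1, 1)`: in each, neither player's unilateral deviation yields a Pareto-dominating value vector.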

The paper notes that only limited methods exist for computing Pareto–Nash sets under such general conditions; prior work typically relies on additional structure (e.g., symmetry) or falls back to team rewards or known utility functions (i.e., ordinary Nash equilibria). A general solution would require algorithms that operate without such restrictive assumptions and come with guarantees on the existence and computation of the undominated policy set.

References

We note that there is little work so far on the individual reward setting with unknown utility functions, so this more general setting remains an important open challenge in MOMARL.

MOMAland: A Set of Benchmarks for Multi-Objective Multi-Agent Reinforcement Learning (2407.16312 - Felten et al., 23 Jul 2024) in Section 3.2 (Solution Concepts), Individual reward setting