Adapting reinforcement learning to deliberative policy negotiations
Determine which methodological adaptations are needed to apply common reinforcement learning algorithms, such as Q-learning, to dynamic and complex multi-agent policy negotiations and deliberation procedures, and demonstrate that the adapted algorithms are suitable for non-stationary social decision-making contexts.
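To make the starting point concrete, the following is a minimal sketch, not taken from the cited paper, of the unadapted baseline: two independent tabular Q-learners in a toy bargaining game (the Nash demand game). The choice of game, the function name run_independent_q_learning, and all parameter values are assumptions made purely for illustration. Each agent treats the other as part of its environment, so the payoff of a given demand shifts as the other agent learns. This is one simple form of the non-stationarity that the problem statement refers to.

```python
import random


def run_independent_q_learning(episodes=5000, pie=10, alpha=0.1, epsilon=0.1, seed=0):
    """Two independent, stateless Q-learners in a repeated Nash demand game.

    Illustrative assumption: each round both agents demand a share of `pie`.
    If the demands fit within the pie, each agent receives its demand;
    otherwise negotiation fails and both receive 0.
    """
    rng = random.Random(seed)
    actions = list(range(pie + 1))
    # One Q-table per agent: q[agent][demand] estimates the payoff of that demand.
    q = [[0.0] * len(actions) for _ in range(2)]

    def choose(agent):
        # Epsilon-greedy selection with random tie-breaking among best actions.
        if rng.random() < epsilon:
            return rng.choice(actions)
        best = max(q[agent])
        return rng.choice([a for a in actions if q[agent][a] == best])

    history = []
    for _ in range(episodes):
        demands = [choose(0), choose(1)]
        agreed = sum(demands) <= pie
        for agent in range(2):
            reward = float(demands[agent]) if agreed else 0.0
            # Stateless Q-learning update: there is no next state, so no bootstrap term.
            q[agent][demands[agent]] += alpha * (reward - q[agent][demands[agent]])
        history.append((demands[0], demands[1], agreed))
    return q, history


if __name__ == "__main__":
    q_tables, rounds = run_independent_q_learning()
    agreement_rate = sum(agreed for _, _, agreed in rounds[-500:]) / 500
    print("agreement rate over last 500 rounds:", agreement_rate)
```

The sketch deliberately omits what a real deliberation procedure would add, such as multi-step dialogue states, more than two parties, communication, and changing preferences. Those omissions are where the adaptations asked for above would be needed.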
References
Although, so far, it is far from clear how common learning algorithms, for instance Q-learning, can be adapted to dynamic and complex policy negotiations and deliberation procedures.
From "Artificial Utopia: Simulation and Intelligent Agents for a Democratised Future" (2503.07364 - Oswald, 10 Mar 2025), Section 3.4 (Reinforcement learning)