Degenerate equilibria in self-contained co-evolving agent–coach systems
Determine whether a fully self-contained multiagent system, in which the agents and a trainable coach co-evolve without any external supervision (such as meta-evaluation by a stronger external model, agreement with outcome-based verification, or human feedback), can avoid converging to degenerate equilibria.
References
Whether a fully self-contained system, where agents and coaches co-evolve without external supervision, can avoid degenerate equilibria remains an open question.
— Scaling Multiagent Systems with Process Rewards
(2601.23228 - Li et al., 30 Jan 2026) in Section: Future Directions, subsection "Trainable Coach"