Mechanisms of Success in Multi-Agent Debate (MAD)
Determine the exact mechanisms and conditions under which Multi-Agent Debate (MAD) with large language models succeeds. In particular, distinguish whether performance gains arise primarily from scaling test-time compute or from emergent capabilities produced by specific combinations of agent personas, response generators, discussion paradigms, and decision protocols.
References
"Yet, we have not understood the exact mechanisms of when and why MAD is successful. Different hypotheses exist around whether MAD is another way to scale test-time compute, or whether the combination of individual components has emergent capabilities."
From "MALLM: Multi-Agent Large Language Models Framework" (Becker et al., arXiv:2509.11656, 15 Sep 2025), Section 1, Introduction.