When multi-agent coordination outperforms single-agent tool use

Determine the conditions under which coordination among multiple language-model agents provides value over a single strong language model equipped with tool use, and identify the task properties and architectural configurations that give multi-agent systems an advantage over single-agent baselines.

Background

The authors survey prior claims that multi-agent collaboration universally improves performance and note mixed findings, including reports that benefits diminish with stronger base models. They highlight the lack of a principled framework for predicting when multi-agent coordination is advantageous.

They explicitly state that the question of when multi-agent coordination provides value over single strong models with tool use remains empirically open, which motivates their controlled evaluation and scaling principles.

References

The question of when multi-agent coordination provides value over single strong models with tool use remains empirically open, with \citet{qian2024scaling}'s proposed scaling laws showing no significant universal pattern \citep{wang2024survey}, motivating our systematic evaluation.

Towards a Science of Scaling Agent Systems (2512.08296 - Kim et al., 9 Dec 2025) in Related Work, Multi-Agent Systems (MAS) versus Single-Agent Systems (SAS)