Automated Adversarial Collaboration for Advancing Theory Building in the Cognitive Sciences

Published 28 Apr 2026 in cs.AI | (2604.25521v1)

Abstract: Cognitive science often evaluates theories through narrow paradigms and local model comparisons, limiting the integration of evidence across tasks and realizations. We introduce an automated adversarial collaboration framework for adjudicating among competing theories even when the candidate models and experiments must be discovered during the adjudication process. The system combines LLM-based theory agents, program synthesis, and information-theoretic experimental design in a closed loop. In a simulation study spanning three classic categorization theories, the framework recovered the ground-truth theory across noise settings with weaker reliability in the hardest settings. Together, the framework and findings provide a concrete proof of concept for closed-loop, in-silico theory adjudication in cognitive science.


Authors (3)

Summary

The paper introduces a closed-loop framework that combines LLM agents, program synthesis, and expected information gain (EIG) maximization for automated theory adjudication in cognitive science.
It demonstrates high recoverability for GCM and SUSTAIN across noise levels while exposing the fragility of RULEX recovery under moderate noise.
The approach automates adversarial collaboration, paving the way for scalable theory development and reduced human bias in model evaluation.

Automated Adversarial Collaboration for Cognitive Science Theory Adjudication

Framework Overview

This study presents an integrated, closed-loop framework for automated, in-silico theory adjudication in cognitive science. Combining LLM-based theory agents, program synthesis (notably GeCCo), and information-theoretic experimental design, the system iteratively compares and refines competing theoretical models. Each agent is aligned with a distinct theoretical claim, provides executable candidate models, and autonomously proposes experiments intended to maximize discriminability between theories. Each cycle consists of agent-driven debate, experiment proposal and selection by maximizing expected information gain (EIG), synthetic data generation, and posterior updating of the support for each theory. Agents may then revise their models or claims, allowing the debate to adapt from one cycle to the next.
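The numerical core of one adjudication cycle (select the most informative experiment, generate data, update support) can be illustrated with a toy loop. This is a minimal sketch under stated assumptions, not the authors' implementation: the two theories (`theory_A`, `theory_B`), the one-dimensional stimulus grid, single binary responses per experiment, and a uniform prior are all hypothetical, and the LLM debate and model-revision steps are omitted entirely.

```python
import math
import random


def binary_entropy(p):
    """Entropy (in nats) of a Bernoulli(p) response."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))


def expected_information_gain(posterior, models, design):
    """Mutual information between theory identity and one binary response at `design`."""
    preds = {name: model(design) for name, model in models.items()}
    marginal = sum(posterior[name] * preds[name] for name in models)
    expected_entropy = sum(posterior[name] * binary_entropy(preds[name]) for name in models)
    return binary_entropy(marginal) - expected_entropy


def update_posterior(posterior, models, design, response):
    """Bayes' rule over theories after one response (1 = 'category A')."""
    unnormalized = {}
    for name, model in models.items():
        p = model(design)
        likelihood = p if response == 1 else 1 - p
        unnormalized[name] = posterior[name] * likelihood
    total = sum(unnormalized.values())
    return {name: w / total for name, w in unnormalized.items()}


def adjudicate(models, designs, ground_truth, n_rounds=100, seed=0):
    """Toy closed loop: EIG-based design selection, simulated data, posterior update."""
    rng = random.Random(seed)
    posterior = {name: 1 / len(models) for name in models}  # uniform prior over theories
    for _ in range(n_rounds):
        # 1. Experiment selection: the candidate design with maximal EIG.
        design = max(designs, key=lambda d: expected_information_gain(posterior, models, d))
        # 2. Synthetic data generation from the (hidden) ground-truth model.
        response = 1 if rng.random() < ground_truth(design) else 0
        # 3. Posterior updating of the support for each theory.
        posterior = update_posterior(posterior, models, design, response)
        # (Agent debate and model/claim revision are omitted in this sketch.)
    return posterior


# Hypothetical theories: each maps a 1-D stimulus value to P(response = category A).
models = {
    "theory_A": lambda x: 0.2 + 0.6 * x,            # graded, similarity-like
    "theory_B": lambda x: 0.9 if x > 0.5 else 0.1,  # threshold, rule-like
}
designs = [i / 10 for i in range(11)]  # candidate stimuli
posterior = adjudicate(models, designs, ground_truth=models["theory_B"])
print(posterior)
```

Because the loop only ever queries the stimulus where the theories are expected to disagree most informatively, the posterior tends to concentrate on the generating theory after relatively few trials; in the real system each "design" would be a full experiment proposed by an agent rather than a single stimulus value.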

Figure 1: The closed-loop adversarial collaboration protocol (left) and recovery performance of the adjudication system across noise levels for three theory families (right).

The framework is distinctive in that neither the candidate models nor the distinguishing experiments are predetermined. Instead, LLM theory agents synthesize models and propose optimal experiments on the fly, guided by analyses of where the models' predictions diverge. The system's modularity permits straightforward replacement of the behavioral data generation process, either with foundation models of human cognition or with real participant data, yielding a generalizable pipeline for theory evaluation and development.
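The summary does not say which divergence measure the agents use. As an illustration only, the sketch below assumes each model outputs a probability of choosing category A for a stimulus and scores every stimulus by the largest pairwise Jensen-Shannon divergence between the models' predicted response distributions; the theories, stimulus grid, and function names are hypothetical.

```python
import math
from itertools import combinations


def _plogp_ratio(a, b):
    """a * log(a / b), with the convention 0 * log(0 / b) = 0."""
    return 0.0 if a == 0 else a * math.log(a / b)


def js_divergence(p, q):
    """Jensen-Shannon divergence (nats) between Bernoulli(p) and Bernoulli(q)."""
    m = 0.5 * (p + q)
    kl_pm = _plogp_ratio(p, m) + _plogp_ratio(1 - p, 1 - m)
    kl_qm = _plogp_ratio(q, m) + _plogp_ratio(1 - q, 1 - m)
    return 0.5 * kl_pm + 0.5 * kl_qm


def rank_discriminating_stimuli(models, stimuli, top_k=3):
    """Rank stimuli by the largest pairwise JS divergence among model predictions."""
    scored = []
    for s in stimuli:
        preds = [model(s) for model in models.values()]
        score = max((js_divergence(p, q) for p, q in combinations(preds, 2)), default=0.0)
        scored.append((score, s))
    scored.sort(reverse=True)  # most divergent stimuli first
    return [s for _, s in scored[:top_k]]


# Hypothetical theories mapping a 1-D stimulus to P(category A).
models = {
    "theory_A": lambda x: 0.2 + 0.6 * x,
    "theory_B": lambda x: 0.9 if x > 0.5 else 0.1,
}
stimuli = [i / 10 for i in range(11)]
print(rank_discriminating_stimuli(models, stimuli))
```

A divergence screen like this is cheap and model-agnostic, which makes it a plausible way for agents to shortlist candidate experiments before the more expensive, prior-weighted EIG comparison.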

Experimental Evaluation and Numerical Findings

Evaluation was performed in the domain of categorization, targeting three canonical theory families: the Generalized Context Model (GCM), Rule-plus-Exception (RULEX), and SUSTAIN. For each family, GeCCo (with Anthropic's Claude Sonnet 4 as the underlying LLM) instantiated the models, and the adversarial loop was tasked with recovering the generating theory under progressively increasing synthetic behavioral noise (lapse rates from $\epsilon = 0$ to $\epsilon = 0.4$).
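A common way to implement a lapse rate, assumed here because the summary does not give the exact noise model, is a mixture: with probability $\epsilon$ the simulated participant ignores the model and guesses uniformly among the categories, so the observed choice probability becomes $(1-\epsilon)\,p + \epsilon/K$ for $K$ categories. The function names below are hypothetical.

```python
import random


def apply_lapse(p_model, epsilon, n_categories=2):
    """Response probability when, with probability epsilon, the response is a uniform guess."""
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must lie in [0, 1]")
    return (1 - epsilon) * p_model + epsilon / n_categories


def simulate_responses(p_model, epsilon, n_trials, seed=0):
    """Draw binary responses (1 = category A) from a model corrupted by lapses."""
    rng = random.Random(seed)
    responses = []
    for _ in range(n_trials):
        if rng.random() < epsilon:
            responses.append(rng.randrange(2))  # lapse: ignore the model, guess at random
        else:
            responses.append(1 if rng.random() < p_model else 0)  # model-driven response
    return responses


for eps in (0.0, 0.1, 0.2, 0.4):
    # The gap between two models' predictions shrinks by the factor (1 - eps).
    gap = apply_lapse(0.9, eps) - apply_lapse(0.5, eps)
    print(f"eps={eps:.1f}  P(A)={apply_lapse(0.9, eps):.2f}  gap={gap:.2f}")
```

Under this assumption every difference between two models' predictions is scaled by $(1-\epsilon)$, which is one simple reason why theories whose predictions differ only slightly become harder to separate as the lapse rate rises.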

Key empirical findings:

Perfect recovery in noiseless conditions: The framework accurately and unambiguously recovers the underlying theory family when the synthetic data contain no lapse noise ($\epsilon = 0$).
Robust recoverability for GCM and SUSTAIN under moderate noise: Across all tested lapse rates, GCM was consistently recovered; SUSTAIN had high recoverability except at maximal noise.
Fragility for RULEX models: Even under moderate noise ($\epsilon > 0.1$), the system's ability to recover RULEX collapsed, with both accuracy and winning margins declining sharply.
Winning margin analysis: Signed margins between candidate theories (Figure 1B) quantify the degree of separation; GCM and SUSTAIN maintain positive separation under most noise regimes, whereas RULEX rapidly loses distinguishability.
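The summary does not define the signed margin precisely. One natural reading, assumed in this sketch, is the support (here, posterior probability) for the ground-truth theory minus the support for the strongest competitor, so that positive values mean correct recovery and recovery accuracy is the fraction of runs with a positive margin. The theory names and all numbers below are made up for illustration and are not results from the paper.

```python
def signed_margin(support, true_theory):
    """Support for the true theory minus the best competitor (> 0 means correct recovery)."""
    best_competitor = max(v for name, v in support.items() if name != true_theory)
    return support[true_theory] - best_competitor


def recovery_summary(runs):
    """runs: list of (support dict, true theory name). Returns (accuracy, mean signed margin)."""
    margins = [signed_margin(support, truth) for support, truth in runs]
    accuracy = sum(m > 0 for m in margins) / len(margins)  # ties (margin 0) count as failures
    return accuracy, sum(margins) / len(margins)


# Made-up posterior values for three hypothetical simulation runs (not results from the paper).
runs = [
    ({"theory_A": 0.70, "theory_B": 0.20, "theory_C": 0.10}, "theory_A"),
    ({"theory_A": 0.15, "theory_B": 0.80, "theory_C": 0.05}, "theory_B"),
    ({"theory_A": 0.50, "theory_B": 0.10, "theory_C": 0.40}, "theory_C"),
]
print(recovery_summary(runs))
```

Reporting the mean margin alongside accuracy distinguishes a theory that wins narrowly from one that wins decisively, which is the distinction the margin analysis draws between GCM/SUSTAIN and RULEX.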

These results demonstrate that, while the automated framework is capable of strong, theory-agnostic model recovery, performance is modulated by the inherent recoverability and expressiveness of the synthesized models. In particular, biases in the model synthesis mechanism (GeCCo) may favor certain model families unless the empirical signal is overwhelming.

Theoretical and Practical Implications

Implications for Theory Development

This framework represents a concrete step toward scalable, automated cognitive science, a critical direction given the combinatorial explosion of theoretical, modeling, and experimental possibilities in modern research. By closing the loop between model synthesis, experimental design, and theory adjudication, the approach supports a research paradigm that can systematically search broader theory and experiment spaces, potentially reducing human bottlenecks and bias. However, the observed fragility for RULEX models under noise underscores the importance of the model synthesis component: the system currently inherits the expressivity and biases of the generative LLM, which may not adequately span the full range of plausible cognitive models.

Methodological Implications

Adversarial collaboration has so far been a manual, human-driven process; here it is automated via LLM agents that drive both model and experiment innovation. Selecting experiments by information-theoretic EIG operationalizes model discriminability: each new experiment is the one expected to be most informative about which model generated the data. Because the simulations have full access to the ground-truth generating models, the demonstration is strong but idealized; future work integrating cognitive foundation models or empirical data is required for ecological validation.
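For reference, the standard definition of expected information gain for model discrimination is the mutual information between the identity of the generating model $M$ and the outcome $Y$ of a candidate experiment $d$. The summary does not give the paper's exact formulation (for example, whether the prior ranges over theory families or over individual synthesized models), so the following is the textbook form rather than the authors' equation:

$$
\mathrm{EIG}(d) = \sum_{m} p(m) \sum_{y} p(y \mid m, d) \log \frac{p(y \mid m, d)}{p(y \mid d)}, \qquad p(y \mid d) = \sum_{m'} p(m')\, p(y \mid m', d)
$$

$$
d^{*} = \arg\max_{d} \mathrm{EIG}(d), \qquad p(m \mid y, d^{*}) \propto p(m)\, p(y \mid m, d^{*})
$$

The second line pairs the selection rule with the Bayesian update applied once the outcome $y$ of the chosen experiment $d^{*}$ has been observed.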

Prospects for Future AI-Driven Science

Further development could replace or augment model synthesis modules with more powerful foundation models tailored to human-like behavioral data [see discussion in binz2025foundation]. The pipeline's modularity would also allow adaptation to domains beyond categorization, such as language, memory, and reasoning. As the framework matures, it can serve as a testbed for meta-theoretic questions concerning the structure of model spaces, recoverability under noise, the limitations of LLM-driven theory instantiation, and the cumulative integration of evidence across tasks.

Conclusion

The presented automated adversarial collaboration framework constitutes a proof of concept for closed-loop, in-silico theory adjudication in cognitive science. The results highlight both the promise and current limitations of LLM-based program synthesis and adversarial experimental design for scalable theory evaluation. The framework’s capacity to discover, propose, and discriminate among models without predefining the candidate spaces opens new possibilities for developing integrative, robust theories in cognitive science and allied domains. Continued progress will depend on advances in foundation modeling of human behavior and on reducing the synthesis bias inherent in LLMs, with the ultimate goal of fully automating the scientific discovery cycle.
