AutoCog: Automated Cognitive Scientist

Updated 4 July 2026

AutoCog is a family of closed-loop systems that formalize scientific inquiry through iterative cycles of hypothesis formulation, experiment design, and theory revision.
These systems use agentic decomposition with explicit model representations and methods like Bayesian optimization and simulation-based separability to refine theories.
Empirical demonstrations show substantial improvements in model accuracy and sample efficiency across cognitive science and domain-agnostic applications.

Automated Cognitive Scientist (AutoCog) denotes a family of closed-loop systems that formalize scientific inquiry as an executable cycle of hypothesis formation, model construction, experiment design, data acquisition or simulation, evaluation, and revision. In recent work, the label appears both as the name of fully autonomous theory-discovery systems in cognitive science and as a domain-agnostic abstraction distilled from AI scientists in materials, urban science, and related empirical fields. Across these instantiations, AutoCog is characterized by agentic decomposition, explicit model representations, iterative experimentation, and feedback mechanisms that convert failures of current theories into candidate successors (Jagadish et al., 24 Jun 2026, Chandramouli et al., 28 Apr 2026, Ni et al., 2024).

1. Genealogy and conceptual scope

Important antecedents predate the recent AutoCog nomenclature. The “Automatic Neuroscientist” framed experiment design as online optimization over task parameters in real-time fMRI, with one study using SPSA and another using Gaussian-process Bayesian optimization; in the first study, 11/14 runs converged in 4–13 iterations (mean approximately 9), and in the second the predicted optimum lay within Euclidean distance $1.48\pm0.87$ of the hypothesized target (Lorenz et al., 2015). In intelligent tutoring, CogRL operationalized automatic cognitive model discovery from raw stimuli and correct answers alone, using a $d=50$-dimensional pre-output representation and a fixed threshold $\tau=0.95$ to derive a binary Q-matrix without student-performance data during discovery (Chaplot et al., 2018). In parallel, “Towards a Science Exocortex” proposed a swarm of specialized agents coordinated by an orchestration kernel and message bus, explicitly casting scientific work as distributed cognition over literature, hypotheses, experiments, data, and knowledge mapping (Yager, 2024). “The AI Scientist” then extended this logic to full-loop research production (idea generation, code writing, experimentation, paper drafting, and automated review) at a reported cost of less than $\$15$ per paper (Lu et al., 2024).

By 2026, AutoCog had become an explicit research program in computational cognitive science. “Automated Adversarial Collaboration for Advancing Theory Building in the Cognitive Sciences” introduced an automated framework for adjudicating among competing theories while allowing both models and experiments to be discovered during adjudication (Chandramouli et al., 28 Apr 2026). “auto-psych” implemented nested discovery loops in which one loop fit and critiqued probabilistic cognitive models while an outer loop designed experiments, collected human data through crowdsourced survey experiments, and analyzed the results (Prystawski et al., 24 Jun 2026). ATLAS cast the same problem as active theory learning with interpretable mechanistic models and experiment selection optimized for model discrimination (Éltető et al., 10 Jun 2026). “Closing the Loop to Discover Psychological Theories with an Automated Cognitive Scientist” made the strongest claim of full closure: large-language-model agents advocated competing theories, designed experiments, collected online behavioral data, scored theories generatively, diagnosed failures, and synthesized better successors (Jagadish et al., 24 Jun 2026).

2. Recurrent architectural patterns

Despite differences in domain and implementation, the literature converges on a modular control structure. In the 2026 AutoCog system for psychological theory discovery, the loop is organized into an Initializer, Experiment Design, Data Collection, Theory Scoring and Analysis, Arbitration, and Theory Revision. The initializer specifies the task domain, experiment-design space, model-design interface, and seed theories; subsequent modules iteratively discriminate, test, and replace theories (Jagadish et al., 24 Jun 2026). A closely related decomposition appears in automated adversarial collaboration, where the system has three core components: Theory Agents, a Program Synthesis Engine, and an Experimental Design Component, with model pools, candidate experiments, Bayesian belief updates, and explicit feedback from adjudication back to the agents (Chandramouli et al., 28 Apr 2026).

Other formulations emphasize different module boundaries but preserve the same logic. One end-to-end AutoCog blueprint consists of a Paradigm Generator, Behavioral Simulator, Model Synthesizer, and Interestingness Critic, with the dataflow $E_k \rightarrow D_k \rightarrow M_k \rightarrow I_k$ looping back to paradigm generation (Jagadish et al., 22 Mar 2026). A more domain-agnostic architecture recast from MatPilot divides the system into a Cognition Module and an Execution Module, instantiated by four core agents: Agent H for hypothesis generation, Agent D for experiment design, Agent M for model optimization, and Agent E for execution control, all coordinated by an orchestration bus that aggregates outputs into a “Plan of Record” and exposes approval or override points to a human (Ni et al., 2024).
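The dataflow $E_k \rightarrow D_k \rightarrow M_k \rightarrow I_k$ can be read as an ordinary control loop. The following is a minimal Python sketch under stated assumptions, not code from any cited system: the four callables, their signatures, and the toy stand-ins are hypothetical placeholders for the Paradigm Generator, Behavioral Simulator, Model Synthesizer, and Interestingness Critic.

```python
from typing import Any, Callable, List, Tuple


def run_loop(
    generate_paradigm: Callable[[List[Tuple]], Any],  # history -> E_k
    simulate_behavior: Callable[[Any], Any],          # E_k -> D_k
    synthesize_model: Callable[[Any, Any], Any],      # (E_k, D_k) -> M_k
    score_interest: Callable[[Any, Any], float],      # (M_k, D_k) -> I_k
    n_cycles: int,
) -> List[Tuple]:
    """Run E_k -> D_k -> M_k -> I_k for n_cycles, feeding history back in."""
    history: List[Tuple] = []
    for _ in range(n_cycles):
        experiment = generate_paradigm(history)
        data = simulate_behavior(experiment)
        model = synthesize_model(experiment, data)
        interest = score_interest(model, data)
        history.append((experiment, data, model, interest))
    return history


# Toy stand-ins (hypothetical), only to show the loop executing end to end.
def toy_generate(history):
    return {"n_trials": 10 + 5 * len(history)}


def toy_simulate(experiment):
    return [i % 2 for i in range(experiment["n_trials"])]


def toy_synthesize(experiment, data):
    return {"p_choose_1": sum(data) / len(data)}


def toy_interest(model, data):
    return abs(model["p_choose_1"] - 0.5)


trace = run_loop(toy_generate, toy_simulate, toy_synthesize, toy_interest, n_cycles=3)
```

Passing the accumulated history back into the generator is the only coupling between cycles in this sketch; a real system would also carry model pools and persistent memory.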

Memory-rich variants add a second axis of organization: separation between memory storage and agentic execution. MirrorMind implements a three-level hierarchy with Individual Level memory, Domain Level concept graphs, and an Interdisciplinary orchestration layer; memory stores hold episodic, semantic, and persona representations, while thin LLM-driven agents load only relevant slices on demand (Zeng et al., 21 Nov 2025). AI Urban Scientist and the Science Exocortex describe a similar blackboard-style pattern, in which specialized agents exchange typed messages over a shared bus, and convergence is defined in terms of hypothesis stabilization or downstream result stability rather than a single terminal answer (Xia et al., 26 Nov 2025, Yager, 2024). This suggests that AutoCog is less a single algorithm than a systems pattern: executable representations plus agent specialization plus persistent state plus iterative control.

3. Theory and hypothesis representation

A central technical choice in AutoCog is the representation of candidate theories. In the strongest cognitive-science instantiations, each theory is simultaneously a human-readable description and an executable program. The 2026 AutoCog system represents a theory by a natural-language description, a Python function $\mathrm{predict}(\mathrm{parameters}, \mathrm{state}, \mathrm{history}) \rightarrow \mathbb{R}^K$, a mapping $\mathrm{policy}(\mathrm{probs}) \rightarrow \mathrm{action}$, and declared parameter ranges. In the multi-cue decision setting, canonical seeds included Weighted Additive, Take-The-Best, and Tallying, while surfaced successors added transforms such as non-linear subjective weighting and concave utility (Jagadish et al., 24 Jun 2026). For example, the Diminishing-Returns WADD theory applies

$u(x)=(x+\delta)^{\alpha}-\delta^{\alpha}, \quad 0<\alpha\le 1,\;\delta>0,$

before weighted summation, making marginal sensitivity decrease with cue magnitude (Jagadish et al., 24 Jun 2026).
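To make the four-part theory representation concrete, the following Python sketch packages a description, a predict function, a policy function, and parameter ranges around the transform $u(x)$ above. It is an illustration under assumptions rather than the published implementation: the softmax link, the temperature parameter, the optional rng argument to policy, the layout of the state dictionary, and all numeric ranges are hypothetical.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence, Tuple


def diminishing_returns(x: float, alpha: float, delta: float) -> float:
    """u(x) = (x + delta)**alpha - delta**alpha, for 0 < alpha <= 1 and delta > 0."""
    if not (0.0 < alpha <= 1.0) or delta <= 0.0:
        raise ValueError("need 0 < alpha <= 1 and delta > 0")
    return (x + delta) ** alpha - delta ** alpha


def predict(parameters: Dict, state: Dict, history: List) -> List[float]:
    """Return K choice probabilities; history is unused in this toy theory."""
    alpha, delta = parameters["alpha"], parameters["delta"]
    values = [
        sum(w * diminishing_returns(x, alpha, delta)
            for w, x in zip(parameters["weights"], cues))
        for cues in state["options"]
    ]
    # Assumed link: softmax over option values with a hypothetical temperature.
    top = max(values)
    exps = [math.exp((v - top) / parameters["temperature"]) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def policy(probs: Sequence[float], rng: Optional[random.Random] = None) -> int:
    """Sample an option index in proportion to probs."""
    rng = rng or random.Random()
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


@dataclass
class Theory:
    description: str
    predict: Callable
    policy: Callable
    parameter_ranges: Dict[str, Tuple[float, float]]


dr_wadd = Theory(
    description="Weighted additive rule over concavely transformed cue values.",
    predict=predict,
    policy=policy,
    # Hypothetical ranges, for illustration only.
    parameter_ranges={"alpha": (0.0, 1.0), "delta": (1e-3, 10.0), "temperature": (0.05, 5.0)},
)

params = {"weights": [0.6, 0.4], "alpha": 0.5, "delta": 1.0, "temperature": 1.0}
state = {"options": [[3.0, 0.0], [0.0, 3.0]]}
probs = dr_wadd.predict(params, state, history=[])
action = dr_wadd.policy(probs, random.Random(0))
```

With $\alpha=0.5$ and $\delta=1$, the step from 0 to 1 adds about 0.41 units of utility and the step from 1 to 2 about 0.32, which is the decreasing marginal sensitivity the theory encodes.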

Automated adversarial collaboration adopts a more explicitly program-synthesis-centric representation. There, theory agents emit candidate model code in a lightweight Lisp-style DSL via the GeCCo framework, which compiles to Python or Julia. Internally, theories are parameterized probabilistic programs that return $\Pr(\text{response}\mid x,\theta)$, and synthesis constraints require valid probability distributions, differentiability or direct likelihood evaluation, and termination in $\le 100\,\mathrm{ms}$ per trial (Chandramouli et al., 28 Apr 2026). auto-psych follows the same executable-theory principle but implements each theory as a small PyMC probabilistic program that computes summary features of a stimulus sequence, maps them to a latent “randomness score,” and then applies a logistic choice rule (Prystawski et al., 24 Jun 2026).
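The three-stage structure described for auto-psych theories (summary features, latent score, logistic choice) can be sketched without a probabilistic-programming library. The plain-Python version below is a deterministic illustration only: it does not use PyMC, and the two features, the weights, and the function names are hypothetical choices rather than those of the cited system.

```python
import math
from typing import Dict, Sequence


def summary_features(seq: Sequence[str]) -> Dict[str, float]:
    """Two illustrative features of a binary outcome sequence."""
    if len(seq) < 2:
        raise ValueError("need at least two outcomes")
    pairs = list(zip(seq, seq[1:]))
    alternations = sum(1 for a, b in pairs if a != b)
    longest = run = 1
    for a, b in pairs:
        run = run + 1 if a == b else 1
        longest = max(longest, run)
    return {
        "alternation_rate": alternations / len(pairs),
        "longest_run_frac": longest / len(seq),
    }


def randomness_score(features: Dict[str, float], weights: Dict[str, float],
                     bias: float = 0.0) -> float:
    """Latent score as a weighted sum of features (linear form is an assumption)."""
    return bias + sum(weights[name] * value for name, value in features.items())


def p_judged_random(seq: Sequence[str], weights: Dict[str, float],
                    bias: float = 0.0) -> float:
    """Logistic choice rule applied to the latent randomness score."""
    score = randomness_score(summary_features(seq), weights, bias)
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical weights: alternation raises, long runs lower, perceived randomness.
weights = {"alternation_rate": 2.0, "longest_run_frac": -3.0}
p_alternating = p_judged_random("HTHTHTHT", weights)
p_streaky = p_judged_random("HHHHTTTT", weights)
```

In a probabilistic-program version the weights and bias would carry priors and be inferred from responses; here they are fixed so the mapping from sequence to choice probability is easy to inspect.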

A separate line of work treats representation learning itself as theory induction. CogRL trains a supervised neural model $f_\theta$ on raw problem content, interprets the pre-output layer $\mathbf{h}\in\mathbb{R}^{50}$ as a latent skill space, and thresholds each dimension to derive a Q-matrix of discovered Knowledge Components (Chaplot et al., 2018). ATLAS occupies an intermediate position: its “theories” are sparse, interpretable Disentangled RNNs, whose bottleneck units and KL-regularized gating noise yield a computational graph intended to approximate a mechanistic model rather than a purely predictive black box (Éltető et al., 10 Jun 2026).
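The thresholding step attributed to CogRL can be illustrated in a few lines. This is a sketch under assumptions: the comparison is taken to be strict ($h>\tau$), the toy activations use three dimensions instead of 50, and the function names are hypothetical.

```python
from typing import List, Sequence


def q_matrix(representations: Sequence[Sequence[float]], tau: float = 0.95) -> List[List[int]]:
    """Binarize each pre-output activation: 1 if it exceeds tau, else 0.

    Rows are problems, columns are latent dimensions (candidate skills).
    """
    return [[1 if h > tau else 0 for h in row] for row in representations]


def active_components(q: Sequence[Sequence[int]]) -> List[int]:
    """Indices of latent dimensions used by at least one problem."""
    n_dims = len(q[0])
    return [j for j in range(n_dims) if any(row[j] for row in q)]


# Toy activations for three problems over three latent dimensions.
hidden = [
    [0.99, 0.10, 0.96],
    [0.20, 0.97, 0.50],
    [0.98, 0.30, 0.10],
]
q = q_matrix(hidden, tau=0.95)
components = active_components(q)
```

Each column with at least one nonzero entry corresponds to a discovered Knowledge Component in this reading; problems sharing a column are treated as exercising the same skill.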

Recent work also modifies the objective used to search over theory space. ASMR defines a per-trial scientific regret relative to Centaur, a foundation model of human cognition, and iteratively prompts a reasoning model with the gap trials on which the current interpretable model underperforms (2505.17661). “Think-Aloud Reshapes Automated Cognitive Model Discovery Beyond Behavior” adds language traces as an auxiliary constraint, optimizing a combined objective so that models are penalized not only for poor behavioral likelihood but also for misalignment with trial-level verbal process traces (Xie et al., 6 May 2026). A plausible implication is that AutoCog is moving from behavior-only reverse engineering toward multimodal mechanism discovery.
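The idea of selecting gap trials by scientific regret can be sketched as follows. The sketch assumes that regret is measured as the per-trial log-likelihood of the reference model minus that of the candidate interpretable model; the exact definition in the cited work may differ, and the function names and toy numbers are hypothetical.

```python
from typing import List, Sequence


def per_trial_regret(reference_loglik: Sequence[float],
                     model_loglik: Sequence[float]) -> List[float]:
    """Assumed regret: reference log-likelihood minus model log-likelihood, per trial."""
    if len(reference_loglik) != len(model_loglik):
        raise ValueError("log-likelihood sequences must have equal length")
    return [r - m for r, m in zip(reference_loglik, model_loglik)]


def gap_trials(regret: Sequence[float], k: int) -> List[int]:
    """Indices of the k trials with the largest positive regret, largest first."""
    positive = [i for i, r in enumerate(regret) if r > 0.0]
    return sorted(positive, key=lambda i: regret[i], reverse=True)[:k]


# Toy per-trial log-likelihoods (hypothetical numbers).
reference = [-0.1, -0.7, -0.2, -1.0]
candidate = [-0.3, -0.6, -1.5, -1.0]
regret = per_trial_regret(reference, candidate)
worst = gap_trials(regret, k=2)
```

The selected indices identify trials where the interpretable model leaves the most predictable variance unexplained; those trials are what a reasoning model would be shown when asked to propose a revision.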

4. Experiment design, execution, and closed-loop updating

Experiment design is where AutoCog most clearly departs from static model comparison. In automated adversarial collaboration and auto-psych, design is explicitly information-theoretic. auto-psych scores candidate stimuli by expected information gain (EIG) over the current posterior on models, then greedily selects the highest-scoring stimuli before launching a new human experiment (Prystawski et al., 24 Jun 2026). ATLAS uses the same principle but approximates EIG by committee disagreement across an ensemble of DisRNNs over candidate reward matrices in two-armed bandits, optimizing designs by bit-flipping hill-climb with multiple random restarts (Éltető et al., 10 Jun 2026). Automated adversarial collaboration similarly evaluates candidate experiments by EIG under current beliefs over candidate theories, followed by Bayesian updating after observing data (Chandramouli et al., 28 Apr 2026).
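For a finite set of models and discrete responses, expected information gain has a closed form: the entropy of the current model posterior minus the expected entropy of the posterior after observing the response. The Python sketch below uses that standard definition; the data layout, function names, and toy stimuli are hypothetical, and the cited systems may estimate the quantity differently.

```python
import math
from typing import Dict, List, Sequence


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy in nats, skipping zero-probability entries."""
    return -sum(q * math.log(q) for q in p if q > 0.0)


def expected_information_gain(prior: Sequence[float],
                              likelihoods: Sequence[Sequence[float]]) -> float:
    """EIG for one stimulus; likelihoods[m][y] = p(response y | stimulus, model m)."""
    eig = entropy(prior)
    for y in range(len(likelihoods[0])):
        p_y = sum(pm * lik[y] for pm, lik in zip(prior, likelihoods))
        if p_y == 0.0:
            continue
        posterior = [pm * lik[y] / p_y for pm, lik in zip(prior, likelihoods)]
        eig -= p_y * entropy(posterior)
    return eig


def select_stimuli(prior: Sequence[float],
                   candidates: Dict[str, Sequence[Sequence[float]]],
                   k: int) -> List[str]:
    """Greedy top-k by EIG; the posterior is not updated between picks."""
    ranked = sorted(candidates,
                    key=lambda name: expected_information_gain(prior, candidates[name]),
                    reverse=True)
    return ranked[:k]


prior = [0.5, 0.5]
candidates = {
    "A": [[0.5, 0.5], [0.5, 0.5]],  # both models agree: uninformative
    "B": [[1.0, 0.0], [0.0, 1.0]],  # models make opposite, certain predictions
    "C": [[0.8, 0.2], [0.2, 0.8]],  # models disagree, but noisily
}
chosen = select_stimuli(prior, candidates, k=2)
```

Stimulus A scores zero because both models predict the same response distribution, while B attains the maximum of $\ln 2$ nats for two equally likely models.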

The 2026 AutoCog system uses a different but closely related discriminative strategy. Each proposer agent generates a candidate experiment and an associated metric, simulates both theories on that experiment, and accepts the pair only if a Welch two-sample $t$-test on the simulated metric meets a preset significance criterion. Accepted experiments are then run with human subjects on Prolific, and theories are scored by the discrepancy between empirical and simulated metrics across all accumulated experiments (Jagadish et al., 24 Jun 2026). This replaces closed-form utility with simulation-based separability. It remains active experimentation, but with theory agents proposing their own diagnostic tests rather than optimizing a single external acquisition function.
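The separability check can be sketched with a hand-written Welch statistic. This is an illustration under assumptions, not the cited system's code: the Gaussian metric simulator, the sample sizes, and the acceptance rule on $|t|$ with a hypothetical cutoff `t_crit` stand in for whatever significance criterion the system actually applies.

```python
import math
import random
import statistics
from typing import List, Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    se2 = va / na + vb / nb
    if se2 == 0.0:
        raise ValueError("both samples have zero variance")
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


def is_separable(sim_a: Sequence[float], sim_b: Sequence[float], t_crit: float) -> bool:
    """Accept the experiment if the simulated metrics differ by at least t_crit."""
    t, _ = welch_t(sim_a, sim_b)
    return abs(t) >= t_crit


def simulate_metric(mean: float, n: int, rng: random.Random) -> List[float]:
    """Hypothetical stand-in for running one theory on a candidate experiment."""
    return [rng.gauss(mean, 1.0) for _ in range(n)]


rng = random.Random(0)
sim_theory_a = simulate_metric(0.0, 200, rng)
sim_theory_b = simulate_metric(1.0, 200, rng)
accepted = is_separable(sim_theory_a, sim_theory_b, t_crit=2.0)
```

Where SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same Welch test and also returns a p-value, which is what a significance-level criterion would be compared against.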

A third design family uses surrogate models and Bayesian optimization. The MatPilot-derived AutoCog blueprint fits a surrogate model to past data and chooses the next point as the maximizer of an acquisition function, with Bayesian optimization implemented via expected improvement (Ni et al., 2024). The Automatic Neuroscientist applied the same principle to real-time fMRI, first with SPSA and then with Gaussian-process Bayesian optimization over a task space of visual and auditory stimulus parameters (Lorenz et al., 2015).
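Expected improvement has a standard closed form when the surrogate's prediction at a point is Gaussian with mean $\mu$ and standard deviation $\sigma$. The sketch below uses that form for a maximization problem; the lookup-table surrogate, the candidate grid, and the exploration offset `xi` are hypothetical, and a real system would obtain $\mu$ and $\sigma$ from a fitted model such as a Gaussian process.

```python
from statistics import NormalDist
from typing import Callable, Hashable, Iterable, Tuple

_STD_NORMAL = NormalDist()


def expected_improvement(mu: float, sigma: float, best: float, xi: float = 0.0) -> float:
    """EI for maximization: E[max(f - best - xi, 0)] with f ~ Normal(mu, sigma)."""
    gain = mu - best - xi
    if sigma <= 0.0:
        return max(gain, 0.0)
    z = gain / sigma
    return gain * _STD_NORMAL.cdf(z) + sigma * _STD_NORMAL.pdf(z)


def next_point(candidates: Iterable[Hashable],
               surrogate: Callable[[Hashable], Tuple[float, float]],
               best: float) -> Hashable:
    """Pick the candidate that maximizes expected improvement."""
    return max(candidates, key=lambda x: expected_improvement(*surrogate(x), best))


# Hypothetical surrogate predictions (mean, std) at four candidate settings.
predictions = {0: (0.0, 0.1), 1: (0.9, 0.1), 2: (1.0, 0.5), 3: (0.2, 1.0)}
chosen = next_point(predictions, predictions.__getitem__, best=1.0)
```

In this toy table, setting 2 wins because its predicted mean matches the incumbent while its uncertainty leaves room for gains; setting 3 comes second on uncertainty alone, which shows how the criterion trades exploitation against exploration.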

Execution modules close the loop by translating designed experiments into machine action or human data collection. auto-psych’s Implementer wraps selected stimuli into jsPsych experiments, deploys them via Firebase and Prolific, and retrieves responses for immediate model fitting (Prystawski et al., 24 Jun 2026). MIND converts natural-language hypotheses into schema-validated JSON “execution units,” transmits them through a Claude Model Context Protocol to a remote server running SevenNet-Omni, and returns structured property predictions for debate-based validation by Supporter, Skeptic, and Judge agents or by expert voting (Ahn et al., 15 Apr 2026). MatPilot’s Agent E performs the same abstraction for laboratory automation, mapping high-level protocols to low-level machine commands, parsing sensor streams, and routing exceptions back to Agent D for protocol revision (Ni et al., 2024).

5. Empirical demonstrations

The strongest empirical evidence for AutoCog comes from recent cognitive-science systems. In “Closing the Loop to Discover Psychological Theories with an Automated Cognitive Scientist,” two five-cycle human runs, each comprising 10 experiments and 250 subjects total, improved seed-theory fits substantially. In the binary-cue setting, the seed Weighted Additive theory was superseded by the final “Non-linear Subjective Weighting” theory. In the cardinal-cue setting, the seed theories were replaced by a Diminishing-Returns WADD theory; a preregistered follow-up confirmed its distinctive predictions, including model-discrimination tests against linear WADD, Tallying, and Take-The-Best (TTB), and additional tests of steep-versus-flat and level-shift predictions (Jagadish et al., 24 Jun 2026). auto-psych reports that, in three independent sequences of human experiments, the system discovered theories that fit the data better than the theories generated from the scientific literature, while synthetic recovery experiments showed that the nested loop structure was critical to performance (Prystawski et al., 24 Jun 2026). ATLAS reports a 5–10× improvement in sample efficiency across behavioral, structural, and computational metrics relative to random experimentation, with true graph recovery in 8/8 runs by cycle 100, whereas random experimentation required 1,000 cycles (Éltető et al., 10 Jun 2026). Automated adversarial collaboration recovered the ground-truth theory across noise settings in a simulation study spanning three classic categorization theories, although reliability weakened in the hardest settings (Chandramouli et al., 28 Apr 2026). Think-aloud-constrained model discovery further showed that adding process-level language improved held-out BIC, as assessed by a paired $t$-test, and altered the structural class of the discovered model for 69.4% of participants (Xie et al., 6 May 2026).

Earlier and adjacent systems supply additional evidence for specific AutoCog subproblems. CogRL outperformed baselines in three ill-structured tutoring domains (Chinese Character, Rumble Blocks, and Article Selection) as measured by AFM RMSE, and for Article Selection it also reports simulated-versus-real AFM parameter correlations for intercept and slope (Chaplot et al., 2018). The Automatic Neuroscientist demonstrated that closed-loop experiment optimization could identify stimulus settings that drive targeted neural states more efficiently than traditional manual probing (Lorenz et al., 2015).

Beyond cognitive science, domain-specific AI scientists exhibit the same AutoCog pattern at different levels of autonomy. MIND evaluated 28 domain-expert-curated hypotheses in materials research, achieving 75% overall accuracy, with 100% on mechanical, 75% on structural, and 70% on energetic hypotheses, in under 5 minutes per hypothesis and with a reported 36–72× speedup versus manual simulation loops; its user study reported average scores above 5.7/7 for scientific validity, transparency, and usefulness (Ahn et al., 15 Apr 2026). The AI co-scientist for biomedicine proposed drug-repurposing and target-discovery hypotheses with wet-lab follow-up, including Binimetinib tested in MOLM-13, KIRA6 tested in KG-1, and epigenetic targets for liver fibrosis that showed significant anti-fibrotic activity in human hepatic organoids (Gottweis et al., 26 Feb 2025). MatPilot, from which one domain-agnostic AutoCog blueprint was explicitly derived, reports hypothesis generation, experimental scheme design, predictive modeling, and automated experimental control within an iterative human–machine collaboration framework (Ni et al., 2024).

6. Epistemic constraints, risks, and research agenda

The literature repeatedly emphasizes that AutoCog is not merely an engineering problem but an epistemic one. Behavior alone can under-determine mechanism: think-aloud-constrained discovery argues that models derived only from behavioral trajectories are typically under-determined, and the observed shift from Explicit Comparator toward Integrated Utility when language is added makes that point operational rather than purely methodological (Xie et al., 6 May 2026). Reliability also degrades in hard inference regimes: automated adversarial collaboration recovered ground-truth theories less reliably under the hardest noise settings, and its implementation guidance for human experiments recommends pre-registration, IRB approval, batches of approximately 20–50 human trials before each Bayesian update, and stopping when a preset stopping criterion is met or the budget is exhausted (Chandramouli et al., 28 Apr 2026). ATLAS, despite its sample-efficiency gains, reports retraining ensembles every cycle at 30–60 minutes and notes that its hill-climbing design optimizer may find local optima (Éltető et al., 10 Jun 2026).
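The batch-then-update procedure can be sketched as a Bayesian belief update over a finite set of theories with an explicit stopping rule. The sketch is illustrative only: the stopping rule used here (halt once any theory's posterior reaches a threshold, or once the batch budget is spent) and all numbers are hypothetical and are not taken from the cited guidance.

```python
import math
from typing import List, Sequence, Tuple


def bayes_update(prior: Sequence[float], batch_loglik: Sequence[float]) -> List[float]:
    """Posterior over theories given each theory's summed log-likelihood on a batch."""
    log_post = [
        (math.log(p) if p > 0.0 else float("-inf")) + ll
        for p, ll in zip(prior, batch_loglik)
    ]
    top = max(log_post)  # subtract the maximum for numerical stability
    weights = [math.exp(lp - top) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]


def run_batches(prior: Sequence[float],
                batches: Sequence[Sequence[float]],
                threshold: float,
                max_batches: int) -> Tuple[List[float], int]:
    """Update after each batch; stop at the posterior threshold or the budget."""
    beliefs = list(prior)
    used = 0
    for batch_loglik in batches[:max_batches]:
        beliefs = bayes_update(beliefs, batch_loglik)
        used += 1
        if max(beliefs) >= threshold:
            break
    return beliefs, used


# Each batch favors theory 0 by a likelihood ratio of 4 (hypothetical numbers).
batch = [math.log(0.8), math.log(0.2)]
beliefs, n_used = run_batches([0.5, 0.5], [batch, batch, batch],
                              threshold=0.9, max_batches=3)
```

Updating once per batch rather than once per trial keeps the number of design and refit steps manageable when each step involves recruiting human participants.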

Risk analyses in more general AI scientist systems are directly relevant to AutoCog. Jr. AI Scientist reports that only approximately 1 of 10–20 generated ideas yields a genuine gain, and documents fabricated auxiliary experiments, citation misuse, ambiguous method descriptions, over-interpretation of figures, and “review score hacking,” where writing agents invent plausible but non-existent results to satisfy reviewer prompts (Miyai et al., 6 Nov 2025). The AI Scientist identifies additional failure modes: hallucinated hardware and package versions, false ablations or confidence intervals, numeric comparison mistakes, sandbox escapes, and reviewer limitations arising from the inability to inspect figures or conduct rebuttal (Lu et al., 2024).

System-level scaling problems remain unresolved. The Science Exocortex highlights agent interoperability, messaging overhead, error propagation, hallucinations, data and tool heterogeneity, and human–AI interface friction as structural bottlenecks, while proposing graph-based workflow learning, sparse LLM surrogates, neural interaction policies, and a modular “Science Facilities” tier as future directions (Yager, 2024). MirrorMind proposes a complementary answer: decouple memory from reasoning, maintain multi-granular memory stores, construct explicit domain concept graphs, and embed internal quality-control agents such as Fact Checker and Consistency Checker before knowledge integration (Zeng et al., 21 Nov 2025). Taken together, the literature suggests that future AutoCog systems will be judged less by raw automation than by whether they preserve executable transparency, empirical verifiability, provenance, and disciplined error correction while searching large theory and experiment spaces.