Effect of extended experimental budget on the no–self-organization ablation

Determine whether running the AutoScientists ablation that disables self-organization (teams fixed at boot, with no mid-run reformation) beyond 47 experiments on the GPT nanochat training-optimization task would close the performance gap between the ablated system (val_bpb = 0.9833, 5 accepted improvements) and the full AutoScientists system (val_bpb = 0.9777, 11 accepted improvements).

Background

In the ablation study that removes self-organization, AutoScientists fixes teams at launch and prevents mid-run reformation. On the GPT nanochat training-optimization task, this ablated system achieved a best validation bits-per-byte (val_bpb) of 0.9833 with 5 accepted improvements (KEEPs) over 47 experiments, while the full AutoScientists system reached 0.9777 with 11 KEEPs over 71 experiments.
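The size of the gap and the rate of accepted improvements follow directly from the figures above. The short Python sketch below only restates that arithmetic; the dictionary layout and the helper name `keep_rate` are illustrative choices, not anything defined in the paper. Note that the two runs are not budget-matched (47 versus 71 experiments), so the rates are a rough comparison only.

```python
# Figures reported in the text above; the variable names are illustrative.
ablated = {"val_bpb": 0.9833, "keeps": 5, "experiments": 47}
full = {"val_bpb": 0.9777, "keeps": 11, "experiments": 71}

# Lower val_bpb is better, so a positive gap means the ablated run is worse.
gap = ablated["val_bpb"] - full["val_bpb"]


def keep_rate(run):
    """Fraction of experiments that ended in an accepted improvement (KEEP)."""
    return run["keeps"] / run["experiments"]


print(f"val_bpb gap:       {gap:.4f}")                 # 0.0056
print(f"ablated KEEP rate: {keep_rate(ablated):.1%}")  # 10.6%
print(f"full KEEP rate:    {keep_rate(full):.1%}")     # 15.5%
```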

Because the no–self-organization run was stopped at 47 experiments following a mid-run budget revision (the original target was 100), the authors explicitly leave open whether continuing the ablated run would have closed the observed performance gap relative to other ablation runs and the full system.
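To give a sense of scale, the back-of-the-envelope sketch below asks what the ablated run would have needed from the 53 experiments it never ran. It rests on a strong assumption that the source does not support: that the observed KEEP rate (5 in 47) would have stayed constant up to the original target. It is a framing of the question, not a prediction, and all variable names are illustrative.

```python
ORIGINAL_TARGET = 100    # original experiment target stated in the text
COMPLETED = 47           # experiments actually run before the budget revision
KEEPS_SO_FAR = 5         # accepted improvements in those 47 experiments
GAP = 0.9833 - 0.9777    # val_bpb gap to the full system (lower is better)

remaining = ORIGINAL_TARGET - COMPLETED

# ASSUMPTION (not from the paper): the KEEP rate stays at its observed value.
expected_extra_keeps = remaining * KEEPS_SO_FAR / COMPLETED

# Average val_bpb reduction each additional KEEP would need to close the gap.
required_gain_per_keep = GAP / expected_extra_keeps

print(f"remaining experiments:  {remaining}")
print(f"expected extra KEEPs:   {expected_extra_keeps:.1f}")
print(f"required gain per KEEP: {required_gain_per_keep:.4f} val_bpb")
```

Under that assumption, roughly five or six further KEEPs would each need to lower val_bpb by about 0.001 on average. Whether that is plausible depends on the per-KEEP gains in the run's trajectory, which are not given here.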

References

Note that this run was stopped at 47 experiments per a mid-run budget revision (the original HANDOVER target was 100); we leave investigation of whether further experiments would have closed the gap to subsequent ablation runs.

AutoScientists: Self-Organizing Agent Teams for Long-Running Scientific Experimentation (2605.28655 - Gao et al., 27 May 2026) in Appendix, Section: Per-Experiment Trajectories for Ablation Runs — No Self-Organization (abl-no-self-org)