Drivers of the base-model advantage in multi-round strategic play
Determine which mechanisms of multi-round strategic interaction explain why base language models outperform aligned language models at predicting human decisions. The candidate explanations to adjudicate among are opponent modeling, history integration, and trajectory novelty.
References
Several open questions follow naturally. Which aspects of multi-round play drive the base advantage: opponent modeling, history integration, or trajectory novelty?
— Alignment Makes Language Models Normative, Not Descriptive
(2603.17218 - Shapira et al., 17 Mar 2026) in Discussion and Conclusion