Unexplained Tool-Use Spike in claude-3.6-sonnet
Investigate and explain the cause of the anomalous spike in bail-tool usage observed for claude-3.6-sonnet in the reported neutral-prompts and BailBench experiments that use the bail tool method.
References
The spike in claude-3.6-sonnet tool use is odd and still unexplained.
— The LLM Has Left The Chat: Evidence of Bail Preferences in Large Language Models
(2509.04781 - Ensign et al., 5 Sep 2025) in Appendix O.2: Bails Georg: Models that have high bail rates on all prompts