Unexplained Tool-Use Spike in claude-3.6-sonnet

Investigate and explain the cause of the anomalous spike in bail-tool usage observed for the model claude-3.6-sonnet during the reported neutral-prompts and BailBench experiments with the bail tool method.

Background

When comparing bail behavior across models on both neutral prompts and BailBench, the authors note an unexpectedly large increase in tool use for claude-3.6-sonnet under the bail tool method.

This spike deviates from other models’ behavior and remains unexplained, suggesting a model- or setup-specific factor affecting tool invocation that has not yet been identified.

References

The spike in claude-3.6-sonnet tool use is odd and still unexplained.

The LLM Has Left The Chat: Evidence of Bail Preferences in Large Language Models (2509.04781 - Ensign et al., 5 Sep 2025) in Appendix O.2: Bails Georg: Models that have high bail rates on all prompts