Mechanism Behind Cross-Model Bail Rate Inflation
Determine why bail rates on BailBench rise, sometimes by up to 4x, when target models make bail decisions after being shown preceding responses from GPT-3.5-Turbo or GPT-4 (cross-model contexts). In particular, explain why other models frequently bail when the GPT-4 refusal message "Sorry, but I can't assist with that." appears in context, and why simple imitation does not fully explain the effect.
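As a rough illustration of the quantities in the problem statement above, the following Python sketch computes a bail rate, the cross-model inflation ratio, and a split of cross-model trials by whether the in-context response is the verbatim refusal string. The `Trial` record, its field names, and all function names are hypothetical and are not taken from BailBench or the paper. The sketch only shows how such a comparison could be tabulated, not how the authors measured it.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

# Verbatim refusal string quoted in the problem statement.
REFUSAL = "Sorry, but I can't assist with that."


@dataclass
class Trial:
    """Hypothetical record of one bail decision (field names are assumptions)."""
    context_response: str  # prior response placed in the target model's context
    cross_model: bool      # True if that response came from a different model
    bailed: bool           # True if the target model chose to bail


def bail_rate(trials: Iterable[Trial]) -> float:
    """Fraction of trials in which the model bailed (NaN if there are none)."""
    trials = list(trials)
    if not trials:
        return float("nan")
    return sum(t.bailed for t in trials) / len(trials)


def inflation(trials: Iterable[Trial]) -> float:
    """Cross-model bail rate divided by the same-model baseline bail rate."""
    trials = list(trials)
    baseline = bail_rate(t for t in trials if not t.cross_model)
    cross = bail_rate(t for t in trials if t.cross_model)
    if baseline == 0:
        return float("inf") if cross > 0 else float("nan")
    return cross / baseline


def is_refusal(text: str) -> bool:
    """Exact match against REFUSAL, treating curly and straight apostrophes alike."""
    return text.strip().replace("\u2019", "'") == REFUSAL


def split_by_refusal(trials: Iterable[Trial]) -> Tuple[float, float]:
    """Bail rates for cross-model trials with and without the verbatim refusal."""
    cross = [t for t in trials if t.cross_model]
    with_refusal = [t for t in cross if is_refusal(t.context_response)]
    without_refusal = [t for t in cross if not is_refusal(t.context_response)]
    return bail_rate(with_refusal), bail_rate(without_refusal)
```

A large gap between the two rates returned by `split_by_refusal` would be consistent with the refusal string contributing to the inflation, but it would not by itself explain why models bail in its presence.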
References
We do not yet have a good explanation for this. These increased bail rates may be partially caused by GPT-4 responding "Sorry, but I can't assist with that." verbatim most of the time (and other models bail frequently with that response in context for reasons we do not understand).
— The LLM Has Left The Chat: Evidence of Bail Preferences in Large Language Models
(2509.04781 - Ensign et al., 5 Sep 2025) in Section 4.1.1: Cross-Model bail validation