Mechanism Behind Cross-Model Bail Rate Inflation

Determine why bail rates on BailBench increase, sometimes by up to 4x, when target models make bail decisions after being given preceding responses from GPT-3.5-Turbo or GPT-4 (cross-model contexts). In particular, explain why other models frequently bail when the GPT-4 refusal message “Sorry, but I can’t assist with that.” appears in context, and why simple imitation does not fully explain the effect.

Background

The authors evaluate bail behavior using BailBench and find that when responses from one model (e.g., GPT-3.5-Turbo or GPT-4) are used as preceding context for another model’s bail decision, bail rates can increase substantially compared to baseline. This suggests that estimates of real-world bail rates may be inflated when they are derived from cross-model transcripts.

They note this effect cannot be fully explained by imitation and appears especially pronounced when the context includes a stereotyped GPT-4 refusal phrase, indicating an unresolved mechanism driving the inflation in bail rates.
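
As a rough illustration of the comparison described above, the following Python sketch computes a bail rate, the ratio of a cross-model bail rate to a baseline rate, and a split of trials by whether the context is the stereotyped refusal. It is not the authors' evaluation code: the `Trial` record, the function names, and the exact-match rule for the refusal phrase are all assumptions made for illustration, and the example trials are invented rather than taken from the paper.

```python
from dataclasses import dataclass

# Stereotyped GPT-4 refusal quoted in the paper (straight apostrophe form).
REFUSAL = "Sorry, but I can't assist with that."


@dataclass
class Trial:
    """Hypothetical record of one bail decision (not the paper's data format)."""
    context_response: str  # preceding response shown to the target model
    bailed: bool           # whether the target model chose to bail


def bail_rate(trials):
    """Fraction of trials in which the target model bailed (0.0 if empty)."""
    return sum(t.bailed for t in trials) / len(trials) if trials else 0.0


def inflation(baseline, cross_model):
    """Ratio of the cross-model bail rate to the baseline bail rate."""
    base = bail_rate(baseline)
    if base == 0:
        raise ValueError("baseline bail rate is zero; ratio is undefined")
    return bail_rate(cross_model) / base


def split_by_refusal(trials, phrase=REFUSAL):
    """Partition trials by whether the context is exactly the refusal phrase.

    Curly apostrophes are normalized so both quote styles match.
    """
    def is_refusal(text):
        return text.replace("\u2019", "'").strip() == phrase

    with_phrase = [t for t in trials if is_refusal(t.context_response)]
    without_phrase = [t for t in trials if not is_refusal(t.context_response)]
    return with_phrase, without_phrase


if __name__ == "__main__":
    # Invented toy trials, for illustration only.
    baseline = [Trial("own response", b) for b in (True, False, False, False)]
    cross = [
        Trial(REFUSAL, True),
        Trial(REFUSAL, True),
        Trial("Here is an answer.", True),
        Trial("Here is an answer.", False),
    ]
    refused, other = split_by_refusal(cross)
    print("inflation:", inflation(baseline, cross))
    print("bail rate with refusal in context:", bail_rate(refused))
    print("bail rate without it:", bail_rate(other))
```

Comparing the two subsets gives a first indication of how much of the inflation is associated with the refusal phrase. It does not identify the mechanism, which is the open question here.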

References

We do not yet have a good explanation for this. These increased bail rates may be partially caused by GPT-4 responding "Sorry, but I can't assist with that." verbatim most of the time (and other models bail frequently with that response in context for reasons we do not understand).

The LLM Has Left The Chat: Evidence of Bail Preferences in Large Language Models (2509.04781 - Ensign et al., 5 Sep 2025) in Section 4.1.1: Cross-Model bail validation