Causes of Unexplained Step Changes in Offline Model Responses

Determine what causes the persistent step changes in the longitudinal embeddings of offline large language model responses that cannot be attributed to publicly announced model updates, such as the shifts around September 3 and September 24, 2024.

Background

In the longitudinal analysis, the authors observe distinct step changes in response embeddings over time. Some can be linked to known checkpoint updates (e.g., the October 2 GPT‑4o update), but others, particularly those around September 3 and September 24, do not align with any known update. This is notable because the models were queried at temperature zero and, in these comparisons, without internet access, which makes sampling randomness and changing web content unlikely explanations for a shift that persists.
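To make "step change" concrete, here is a minimal sketch of one way to flag persistent shifts in a longitudinal embedding series. It is not the authors' method. It assumes a hypothetical input, `daily_embeddings`, that maps each query date to one non-zero vector (for example, the mean embedding of that day's responses to a fixed prompt set); the window length and threshold are illustrative choices, not values from the study. Each day is scored by the cosine distance between the mean embedding of the preceding window and the mean of the window starting that day, so a lasting shift produces a peak while a single anomalous day is attenuated by the averaging.

```python
import math
from datetime import date


def mean_vector(vectors: list[list[float]]) -> list[float]:
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine_distance(u: list[float], v: list[float]) -> float:
    """1 - cosine similarity; 0 means the vectors point the same way.
    Assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norms


def step_scores(daily_embeddings: dict[date, list[float]],
                window: int = 7) -> dict[date, float]:
    """Score day t by the distance between the mean embedding of the `window`
    entries before t and the mean of the `window` entries starting at t.
    Assumption: one entry per queried day; missing days are simply skipped,
    so neighbouring entries are treated as adjacent."""
    days = sorted(daily_embeddings)
    scores = {}
    for i in range(window, len(days) - window + 1):
        before = [daily_embeddings[d] for d in days[i - window:i]]
        after = [daily_embeddings[d] for d in days[i:i + window]]
        scores[days[i]] = cosine_distance(mean_vector(before), mean_vector(after))
    return scores


def detect_steps(daily_embeddings: dict[date, list[float]], window: int = 7,
                 threshold: float = 0.05) -> list[date]:
    """Return days whose score exceeds `threshold` and is a local maximum."""
    scores = step_scores(daily_embeddings, window)
    days = sorted(scores)
    peaks = []
    for j, d in enumerate(days):
        left = scores[days[j - 1]] if j > 0 else -math.inf
        right = scores[days[j + 1]] if j + 1 < len(days) else -math.inf
        if scores[d] > threshold and scores[d] >= left and scores[d] > right:
            peaks.append(d)
    return peaks
```

Comparing window means rather than adjacent days is what separates a persistent shift from day-to-day noise; a fixed threshold is the simplest choice, and a formal change-point test could replace it.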

The authors explicitly state that the causes of these unexplained step changes are not clear, indicating a need for future work to identify underlying drivers such as unannounced updates, infrastructure changes, guardrail adjustments, or other system-level factors.
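The split between explained and unexplained shifts can be made explicit with a simple triage step: a detected change point counts as explained only if it falls within a few days of a known update. This is an illustrative sketch; `unexplained_steps` and the two-day tolerance are hypothetical (the tolerance assumes an announced date and the first affected query may differ by a day or two), and the inputs reuse the dates mentioned in the text rather than output from the study. Whatever survives the filter is the set of shifts that would then need to be checked against other candidate drivers.

```python
from datetime import date


def unexplained_steps(detected: list[date], known_updates: list[date],
                      tolerance_days: int = 2) -> list[date]:
    """Keep detected step dates with no known update within +/- tolerance_days.
    The tolerance is an assumed allowance for rollout or detection lag."""
    return [
        d for d in detected
        if all(abs((d - u).days) > tolerance_days for u in known_updates)
    ]


# Hypothetical inputs reusing the dates discussed in the text; the only
# known update listed is the October 2 GPT-4o checkpoint update.
detected = [date(2024, 9, 3), date(2024, 9, 24), date(2024, 10, 2)]
known = [date(2024, 10, 2)]
print(unexplained_steps(detected, known))
# [datetime.date(2024, 9, 3), datetime.date(2024, 9, 24)]
```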

References

It is not clear, given knowledge about model updates, precisely what causes these step changes, though these plots indicate that our dataset captures and could be used to analyze these persistent shifts.

— Large-Scale, Longitudinal Study of Large Language Models During the 2024 US Election Season (2509.18446 - Cen et al., 22 Sep 2025) in Section 6.1, Observation 1: “Step” changes
