Unclear deprecation criteria under Chatbot Arena’s ‘and/or’ policy

Ascertain which specific criterion—pricing comparisons or quality comparisons (as measured by overall Arena Score)—is actually applied by Chatbot Arena when retiring models under its stated ‘and/or’ deprecation rule, especially in cases where many models are hosted for free on the platform.

Background

The authors analyze how Chatbot Arena’s deprecation policy influences data access and rating reliability. Under the stated policy, a model may be retired after 3000 votes if newer models in the same series exist and/or if more than three providers offer models that are cheaper or similarly priced and strictly better by overall Arena Score.

Because many models on the Arena are hosted for free and the rule uses an ‘and/or’ formulation, the authors cannot audit which condition (price or quality) triggers a given removal. Clarifying the operative criterion is necessary to evaluate whether deprecations are applied consistently and fairly across providers and license categories.
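To make the ambiguity concrete, the stated rule can be sketched as a predicate. This is only one possible reading of the policy, not Chatbot Arena's actual implementation. The 3000-vote and three-provider thresholds come from the policy text above. Everything else is a hypothetical assumption introduced for illustration: the Model fields, the release_order notion of "newer", treating "similarly priced" as price less than or equal, and counting only providers other than the model's own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    # All field names are hypothetical; the policy text does not define a schema.
    name: str
    provider: str
    series: str
    price: float          # assumed unit price; 0.0 for a model hosted for free
    arena_score: float    # overall Arena Score
    votes: int
    release_order: int    # assumption: larger value means more recent


MIN_VOTES = 3000       # from the stated policy
MIN_PROVIDERS = 3      # policy says "more than three providers"


def series_condition(model, pool):
    """True if a newer model in the same series exists in the pool."""
    return any(
        other.series == model.series
        and other.release_order > model.release_order
        for other in pool
    )


def provider_condition(model, pool):
    """True if more than three other providers offer a model that is
    no more expensive and strictly better by Arena Score."""
    providers = {
        other.provider
        for other in pool
        if other.provider != model.provider      # assumption: exclude own provider
        and other.price <= model.price           # assumption: "similarly priced" as <=
        and other.arena_score > model.arena_score
    }
    return len(providers) > MIN_PROVIDERS


def retirement_reasons(model, pool):
    """Return which conditions would permit retirement under this reading."""
    if model.votes < MIN_VOTES:
        return []
    reasons = []
    if series_condition(model, pool):
        reasons.append("series")
    if provider_condition(model, pool):
        reasons.append("provider")
    return reasons
```

Under this reading, when every model in the pool has a price of 0.0, the price comparison never excludes a candidate, so the provider condition depends on Arena Score alone. An outside observer also sees only whether a model was retired, not the list that retirement_reasons would return, which is the auditing gap described above.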

References

We note that the logic of this policy is difficult to audit in practice because many models are hosted for free on the Chatbot Arena, and the use of the "or" condition means it is not clear what criteria (price or quality) applies to decisions.

The Leaderboard Illusion (2504.20879 - Singh et al., 29 Apr 2025) in Section 4.1 Disparity in access to Chatbot Arena Data, item (3) Number of models publicly hosted on the arena