Determine causes of Elo discrepancy against humans vs. bots on Lichess
Determine which factors cause the discrepancy between the Lichess blitz Elo ratings that the 270M-parameter transformer action-value policy achieves when playing exclusively against human opponents and when playing against bots. In particular, quantify how much each of three candidate explanations contributes to the difference: resignation behavior, rating-pool miscalibration between humans and bots, and the different (and possibly more severe) way in which bots exploit the policy's occasional tactical mistakes.
References
While the precise reasons are not entirely clear, we have three plausible hypotheses: (i) humans tend to resign when our bot has an overwhelming win percentage but many bots do not (meaning that the previously described problem gets amplified when playing against bots); (ii) humans on Lichess rarely play against bots, meaning that the two player pools (humans and bots) are hard to compare and Elo ratings between pools may be miscalibrated~\citep{justaz2023exact}; and (iii) based on preliminary (but thorough) anecdotal analysis by a chess NM, our models make the occasional tactical mistake which may be penalized qualitatively differently (and more severely) by other bots compared to humans (see some of this analysis in \cref{ssec:tactics-analysis,ssec:playing-style-analysis}).
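To make hypothesis (ii) concrete, the following minimal sketch shows how a constant offset between two rating pools would carry over directly into a measured rating. It assumes the standard Elo logistic expected-score formula; Lichess itself computes ratings with Glicko-2, so this is only an approximation of the mechanism. The opponent ratings, the score, and the 150-point offset are hypothetical values chosen for illustration, and `expected_score` and `performance_rating` are illustrative helper names, not functions from the source.

```python
def expected_score(rating, opponent_rating):
    """Standard Elo logistic expected score for `rating` against `opponent_rating`."""
    return 1.0 / (1.0 + 10.0 ** ((opponent_rating - rating) / 400.0))


def performance_rating(opponent_ratings, total_score, lo=0.0, hi=4000.0, tol=1e-6):
    """Rating whose summed expected score equals `total_score`, found by bisection."""
    if not 0.0 < total_score < len(opponent_ratings):
        raise ValueError("undefined for all-win or all-loss records")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        # Expected score increases with rating, so move the bracket accordingly.
        if sum(expected_score(mid, r) for r in opponent_ratings) < total_score:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical numbers for illustration only; not taken from the source.
listed_ratings = [2300.0, 2400.0, 2500.0, 2600.0]
score = 2.5           # points scored over the four games
pool_offset = 150.0   # assumed inflation of one pool's listed ratings

baseline = performance_rating(listed_ratings, score)
inflated = performance_rating([r + pool_offset for r in listed_ratings], score)
gap = inflated - baseline  # equals pool_offset up to the bisection tolerance
```

Because the Elo expected score depends only on rating differences, identical results against opponents whose listed ratings are uniformly shifted yield a performance rating shifted by the same amount. Under this simplified model, separating miscalibration from hypotheses (i) and (iii) would require games that link the two pools, which the excerpt notes are rare.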