Mitigating teacher-forcing mismatch in rank-aware token-level training beyond the first token

Develop methods to mitigate the mismatch between rank-aware training under teacher forcing and inference-time generation at decoding timesteps $t>1$, when prefix-tree–based token-level target distributions are used within the SToICaL loss for autoregressive ranking.

Background

In the proposed SToICaL framework, rank-aware token-level targets are derived by marginalizing over a prefix tree of docIDs. During training, teacher forcing conditions the model on the gold previous tokens, whereas at inference the model conditions on its own previous choices; for $t>1$ this exposure mismatch means the training signal is computed at prefixes the model may never visit, potentially reducing rank-awareness beyond the first token. The authors explicitly flag mitigation of this mismatch as deferred work.
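One standard family of mitigations for this kind of exposure mismatch is scheduled sampling, which mixes gold and model-generated prefixes during training. The sketch below is purely illustrative and not from the paper: the toy docID tree, the `children` helper, and the `greedy` stand-in model are all hypothetical, and the sampling is constrained to the prefix tree so every training prefix remains a valid docID prefix.

```python
import random

# Hypothetical toy prefix tree of docIDs: each docID is a token sequence.
docids = [("a", "x"), ("a", "y"), ("b", "z")]

def children(prefix):
    """Tokens that extend `prefix` toward at least one docID in the tree."""
    return sorted({d[len(prefix)] for d in docids
                   if d[:len(prefix)] == prefix and len(d) > len(prefix)})

def sample_training_prefix(gold, model_step, p_sample):
    """Scheduled sampling over the prefix tree: at each step t>1, condition
    on the model's own (tree-constrained) choice with probability p_sample,
    otherwise on the gold token (standard teacher forcing)."""
    prefix = ()
    for t in range(len(gold)):
        if t > 0 and random.random() < p_sample:
            tok = model_step(prefix)   # model's own choice, valid in the tree
        else:
            tok = gold[t]              # teacher forcing on the gold token
        prefix = prefix + (tok,)
    return prefix

# Stand-in "model" that always picks the first valid child token.
greedy = lambda prefix: children(prefix)[0]

# p_sample=0.0 reproduces pure teacher forcing; p_sample=1.0 exposes the
# model to its own prefixes, i.e. the states it actually visits at inference.
print(sample_training_prefix(("a", "y"), greedy, p_sample=0.0))  # → ('a', 'y')
print(sample_training_prefix(("a", "y"), greedy, p_sample=1.0))  # → ('a', 'x')
```

With `p_sample=1.0` the training prefix diverges from the gold docID after the first token, which is exactly the regime where the rank-aware token-level targets would need to be recomputed at the model-visited node of the tree rather than at the gold node.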

References

We leave the question of mitigating this rank-aware training-inference mismatch for $t>1$ to future work.

Autoregressive Ranking: Bridging the Gap Between Dual and Cross Encoders  (2601.05588 - Rozonoyer et al., 9 Jan 2026) in Section 4.3 (Prefix Tree for Rank-Aware Token-Level Target Distributions), footnote