
Behavior of Gemma-2-2b-IT on long multi-turn dialogues

Determine how consistent Gemma-2-2b-IT remains over long multi-turn dialogues by measuring prompt-to-line consistency, line-to-line consistency, and Q&A consistency at extended conversation lengths, which the original study could not evaluate due to token length constraints.


Background

The authors analyze how consistency varies with conversation length for several models, reporting results up to very long contexts for some models. They observe model- and task-specific trends in consistency degradation and alignment across metrics.

However, for Gemma-2-2b-IT, they were unable to run long-context experiments because of token length constraints, leaving its long-horizon consistency behavior unassessed.
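One concrete way to close this gap, given a serving stack that supports longer contexts for gemma-2-2b-it, is to bucket each dialogue's model turns by length and score each bucket. The sketch below is a minimal illustration only: the `embed` helper is a hypothetical placeholder for any sentence encoder, and embedding cosine similarity is used as a simple proxy for the paper's prompt-to-line and line-to-line metrics (the paper's actual scoring, and its Q&A consistency metric, may rely on judge models instead).

```python
import numpy as np


def embed(text: str) -> np.ndarray:
    # Hypothetical sentence-embedding helper; substitute any encoder
    # (e.g., a sentence-transformer) that maps text to a fixed vector.
    raise NotImplementedError


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity with a small epsilon to avoid division by zero.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))


def consistency_by_length(persona_prompt: str,
                          model_lines: list[str],
                          bucket: int = 10) -> dict[int, dict[str, float]]:
    """Score growing prefixes of a dialogue's model turns.

    prompt_to_line: mean similarity of each model line to the persona prompt.
    line_to_line:   mean pairwise similarity among the model lines so far.
    """
    prompt_vec = embed(persona_prompt)
    vecs = [embed(line) for line in model_lines]
    results = {}
    for end in range(bucket, len(vecs) + 1, bucket):
        window = vecs[:end]
        p2l = np.mean([cosine(prompt_vec, v) for v in window])
        l2l = np.mean([cosine(window[i], window[j])
                       for i in range(len(window))
                       for j in range(i)]) if end > 1 else 1.0
        results[end] = {"prompt_to_line": float(p2l),
                        "line_to_line": float(l2l)}
    return results
```

Plotting the per-bucket scores against dialogue length would reproduce, for gemma-2-2b-it, the consistency-over-length curves the paper reports for other models.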

References

Due to token length constraints, we were unable to experiment with long dialogue lengths for gemma-2-2b-it.

Consistently Simulating Human Personas with Multi-Turn Reinforcement Learning (Abdulhai et al., arXiv:2511.00222, 31 Oct 2025), Appendix: Results, "Consistency over dialogue length before fine-tuning" (in support of Q2).