Behavior of Gemma-2-2b-IT on long multi-turn dialogues
Characterize the consistency behavior of Gemma-2-2b-IT over long multi-turn dialogues by measuring prompt-to-line, line-to-line, and Q&A consistency at extended conversation lengths, which the original work did not evaluate due to token-length constraints.
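As a minimal sketch of what such an evaluation might compute, the snippet below scores the three consistency metrics from per-turn judgments. The 0/1 judgment encoding, the pairwise definition of line-to-line consistency, and the toy dialogue data are all illustrative assumptions; in practice each judgment would come from an LLM judge comparing the persona prompt, the simulated user's lines, and probe Q&A answers.

```python
from itertools import combinations

def prompt_to_line(line_ok):
    """Fraction of dialogue lines judged consistent with the persona prompt."""
    return sum(line_ok) / len(line_ok)

def line_to_line(pair_ok):
    """Fraction of line pairs judged mutually consistent (assumed definition)."""
    return sum(pair_ok.values()) / len(pair_ok)

def qa_consistency(answer_ok):
    """Fraction of probe questions answered consistently with the persona."""
    return sum(answer_ok) / len(answer_ok)

# Toy example: an 8-turn dialogue with one inconsistent line (turn 5)
# and four persona probe questions, one answered inconsistently.
line_ok = [1, 1, 1, 1, 0, 1, 1, 1]
pair_ok = {(i, j): int(line_ok[i] == line_ok[j] == 1)
           for i, j in combinations(range(len(line_ok)), 2)}
answer_ok = [1, 1, 0, 1]

print(prompt_to_line(line_ok))    # 0.875
print(line_to_line(pair_ok))      # 0.75
print(qa_consistency(answer_ok))  # 0.75
```

Tracking these three scores as the number of turns grows would show whether consistency degrades at the dialogue lengths the original evaluation could not reach.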
References
Due to token length constraints, we were unable to experiment with long dialogue lengths for gemma-2-2b-it.
— Consistently Simulating Human Personas with Multi-Turn Reinforcement Learning
(2511.00222 - Abdulhai et al., 31 Oct 2025) in Appendix: Results, Consistency over dialogue length before fine-tuning (in support of Q2)