Joint optimization of all consistency metrics during fine-tuning
Investigate whether jointly optimizing all three consistency metrics (prompt-to-line, line-to-line, and Q&A consistency) as training objectives during multi-turn fine-tuning of the User Simulator large language model yields more robust persona consistency than optimizing prompt-to-line consistency alone.
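One way to picture the comparison is as a choice of weights over the three metric scores. The sketch below is illustrative only: it assumes each metric yields a scalar score in [0, 1] and that the scores are combined by a weighted average, neither of which is specified by the source. The names `ConsistencyScores` and `joint_reward`, the weights, and the example scores are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConsistencyScores:
    """Per-turn consistency scores, each assumed to lie in [0, 1]."""
    prompt_to_line: float
    line_to_line: float
    qa: float


def joint_reward(scores, weights=(1.0, 1.0, 1.0)):
    """Weighted average of the three consistency scores.

    The weights are hypothetical hyperparameters. Setting them to
    (1, 0, 0) recovers the single-metric (prompt-to-line only) objective.
    """
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    values = (scores.prompt_to_line, scores.line_to_line, scores.qa)
    return sum(w * v for w, v in zip(weights, values)) / total


# Made-up scores for one simulated user turn.
scores = ConsistencyScores(prompt_to_line=0.9, line_to_line=0.6, qa=0.3)
single = joint_reward(scores, weights=(1.0, 0.0, 0.0))  # prompt-to-line only
joint = joint_reward(scores)                            # equal weights
```

With these made-up scores, the single-metric signal looks only at prompt-to-line consistency, while the equal-weight signal is pulled down by the weaker line-to-line and Q&A scores. Whether training on such a combined signal actually improves robustness is the open question.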
References
However, jointly training with all consistency metrics may yield more robust behavior, which we leave for future work.
— Consistently Simulating Human Personas with Multi-Turn Reinforcement Learning
(2511.00222 - Abdulhai et al., 31 Oct 2025) in Section 6: Limitations