Downstream utility of the genuine-followup metric
Establish the practical utility of the genuine-followup metric for downstream applications, including best-of-N assistant response selection, response reranking, and the construction of self-play training datasets.
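As a rough illustration, the three applications could share one loop: score each candidate response and then act on the scores. The Python sketch below assumes a hypothetical scorer `score(context, candidate) -> float` that stands in for the genuine-followup metric. The metric's actual computation is not specified here. The helper names `rerank`, `best_of_n`, and `preference_pair` are also hypothetical. Turning the top- and bottom-scored candidates into a chosen/rejected pair is one assumed way to build self-play training data. It is not the paper's method.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical scorer: maps (conversation context, candidate response) to a float.
# It is a placeholder for the genuine-followup metric, whose exact definition is
# not given in this entry.
Scorer = Callable[[str, str], float]


def rerank(context: str, candidates: Sequence[str], score: Scorer) -> List[Tuple[str, float]]:
    """Return (candidate, score) pairs sorted by descending score.

    Python's sort is stable even with reverse=True, so tied candidates
    keep their original order.
    """
    scored = [(cand, score(context, cand)) for cand in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def best_of_n(context: str, candidates: Sequence[str], score: Scorer) -> str:
    """Best-of-N selection: keep the single highest-scoring candidate."""
    if not candidates:
        raise ValueError("best_of_n needs at least one candidate")
    return rerank(context, candidates, score)[0][0]


def preference_pair(context: str, candidates: Sequence[str], score: Scorer) -> Dict[str, str]:
    """Assumed self-play data format: highest- vs lowest-scoring candidate."""
    ranked = rerank(context, candidates, score)
    if len(ranked) < 2:
        raise ValueError("preference_pair needs at least two candidates")
    return {"prompt": context, "chosen": ranked[0][0], "rejected": ranked[-1][0]}
```

For a quick check, you can pass a toy scorer such as `lambda ctx, c: float(len(c))` in place of the real metric. With candidates `["ok", "sure thing", "yes"]`, `best_of_n` returns `"sure thing"`. Because the scorer is injected as a parameter, the selection logic does not depend on how the metric is computed.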
References
Downstream utility of the metric, for example, best-of-N assistant response selection, reranking, or self-play training data, is left as future work.
— Beyond the Assistant Turn: User Turn Generation as a Probe of Interaction Awareness in Language Models
(2604.02315 - Shekkizhar et al., 2 Apr 2026) in Discussion and Conclusion — Limitations