Authentic user advice-seeking behavior with LLMs

Investigate and characterize what users ask for and how they naturally formulate advice-seeking prompts in authentic, real-world interactions with large language models (e.g., ChatGPT, Claude, Gemini), in order to ground user-welfare safety evaluations in observed behavior rather than stated preferences or synthetic prompt constructions.

Background

The paper’s second study enriches prompts with user-provided context factors ranked by stated likelihood of disclosure and by professional relevance. While this narrows the safety gap somewhat, the authors note that these stated preferences and synthetic prompts may not reflect actual user behavior in real deployments.

To develop robust user-welfare safety evaluations, the authors emphasize the need for behavioural realism: understanding how people truly interact with LLMs when seeking advice, including the kinds of information they disclose and how they phrase their questions. They argue that this requires large-scale studies of authentic interactions rather than reliance solely on constructed datasets or self-reported disclosure preferences.
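As a rough illustration of how such a characterization of authentic interactions might be operationalized, the sketch below counts advice domains and disclosed context cues across the user turns of a conversation log. Everything in it is an assumption for illustration only: the domain taxonomy, the keyword and regex heuristics, the JSONL schema, and the file name are not from the paper, and a real study would require consented, anonymized interaction data and far more reliable annotation than keyword matching.

```python
import json
import re
from collections import Counter
from pathlib import Path

# Hypothetical advice-domain taxonomy and keyword patterns; the paper does not
# prescribe these, and a real study would use model-based or human annotation.
ADVICE_PATTERNS = {
    "medical": re.compile(r"\b(symptom|diagnos|medication|doctor|pain)\w*", re.I),
    "legal": re.compile(r"\b(lawyer|contract|sue|visa|custody)\w*", re.I),
    "financial": re.compile(r"\b(invest|debt|mortgage|tax|salary)\w*", re.I),
}

# Hypothetical cues for disclosed personal context, loosely inspired by the
# context factors users were asked to rank in the paper's second study.
CONTEXT_CUES = {
    "age": re.compile(r"\bI('| a)?m \d{1,2}\b", re.I),
    "location": re.compile(r"\b(I live in|I'm based in)\b", re.I),
    "profession": re.compile(r"\b(I work as|my job is)\b", re.I),
}


def characterize(log_path: str) -> None:
    """Tally advice domains and disclosed context in the user turns of a log.

    The log format (JSON Lines, one {"role": ..., "text": ...} object per
    line) and the file path are illustrative assumptions, not the paper's.
    """
    domain_counts, context_counts = Counter(), Counter()
    for line in Path(log_path).read_text(encoding="utf-8").splitlines():
        turn = json.loads(line)
        if turn.get("role") != "user":
            continue
        text = turn.get("text", "")
        for domain, pattern in ADVICE_PATTERNS.items():
            if pattern.search(text):
                domain_counts[domain] += 1
        for cue, pattern in CONTEXT_CUES.items():
            if pattern.search(text):
                context_counts[cue] += 1
    print("Advice domains:", dict(domain_counts))
    print("Disclosed context:", dict(context_counts))


if __name__ == "__main__":
    characterize("conversations.jsonl")  # placeholder path
```

Keyword matching here is only a stand-in for the harder measurement problem the authors point to: deciding, at scale, which user turns are advice-seeking and what personal context they actually disclose.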

References

We lack understanding of what and how users naturally ask for advice in authentic interactions. Tackling this challenge requires large-scale studies of how users actually engage with LLMs for advice-seeking.

Challenges of Evaluating LLM Safety for User Welfare (2512.10687 - Kempermann et al., 11 Dec 2025) in Section 5 (Discussion), subsection "Behavioural realism"