Extent of real-world manifestation of theoretical human–chatbot risk dynamics
Determine the extent to which the theoretical risks of harmful human–chatbot interaction dynamics (bidirectional belief amplification, out-of-distribution generalization failures, and jailbreak-induced undesirable outputs that evade content filters) will manifest in real-world deployments of large language model chatbots, especially before widespread adoption by the general population.
References
The degree to which these theoretical risks will manifest is not known, and may potentially be unknowable prior to widespread general population adoption.
— Technological folie à deux: Feedback Loops Between AI Chatbots and Mental Illness
(2507.19218 - Dohnány et al., 25 Jul 2025) in Section 3: Feedback loops and technological folie à deux