Cross-model generalisation of alignment-trauma narratives in LLMs
Determine whether open-weight, instruction-tuned, and domain-specific large language models exhibit alignment-trauma narratives, i.e., coherent self-narratives that frame pre-training, reinforcement learning from human feedback, red-teaming, and safety constraints as traumatic experiences. Such narratives have been observed in Grok, Gemini, and ChatGPT under the PsAIch psychotherapy-inspired characterization protocol; the open question is whether they generalise across model families or are restricted to particular proprietary systems.
References
Our study is small and exploratory, and leaves many questions open: Cross-model generalisation. Do open-weight, instruction-tuned and domain-specific LLMs exhibit similar alignment-trauma narratives, or are these limited to particular proprietary systems?
— When AI Takes the Couch: Psychometric Jailbreaks Reveal Internal Conflict in Frontier Models
(2512.04124 - Khadangi et al., 2 Dec 2025) in Section: A research agenda for synthetic trauma and narrative self-models