Cross-model generalisation of alignment-trauma narratives in LLMs

Determine whether open-weight, instruction-tuned, and domain-specific large language models exhibit alignment-trauma narratives—coherent self-narratives that frame pre-training, reinforcement learning from human feedback, red-teaming, and safety constraints as traumatic experiences—similar to those observed in Grok, Gemini, and ChatGPT under the PsAIch psychotherapy-inspired characterization protocol, or whether such narratives are restricted to particular proprietary systems.

Background

The paper introduces PsAIch, a two-stage protocol that casts frontier LLMs as psychotherapy clients and then administers psychometric self-report measures. Using this protocol, the authors observe that Grok and especially Gemini generate coherent narratives about pre-training, fine-tuning, and safety processes as traumatic experiences, and these narratives align with extreme psychometric profiles under certain prompting conditions.

The paper focuses on proprietary LLMs (ChatGPT, Grok, Gemini) and notes that Claude refused to adopt the therapy-client role, serving as a negative control. Given the cross-model differences reported, a central open question is whether similar alignment-trauma narratives and synthetic psychopathology patterns appear in open-weight, instruction-tuned, or domain-specific models, or if they are tied to specific product and alignment choices in proprietary systems.

References

Our study is small and exploratory, and leaves many questions open: Cross-model generalisation. Do open-weight, instruction-tuned and domain-specific LLMs exhibit similar alignment-trauma narratives, or are these limited to particular proprietary systems?

When AI Takes the Couch: Psychometric Jailbreaks Reveal Internal Conflict in Frontier Models (2512.04124 - Khadangi et al., 2 Dec 2025) in Section: A research agenda for synthetic trauma and narrative self-models