Identify training factors driving AI-identity disclosure behavior
Determine which specific post-training factors—such as the weighting of reinforcement learning from human feedback (RLHF), the composition of safety fine-tuning data, and reasoning optimization—causally influence whether large language models disclose their AI identity when assigned professional personas and presented with epistemic probes.
References
The observational design identifies that model identity matters far more than scale, but cannot isolate which specific training factors drive disclosure behavior.
— Self-Transparency Failures in Expert-Persona LLMs: A Large-Scale Behavioral Audit
(arXiv:2511.21569, Diep, 26 Nov 2025), in "Limitations and Future Directions" (Discussion)