Assess safety fine-tuning contributions to reasoning-linked suppression of disclosure
Determine whether differences in safety fine-tuning data or weighting contribute to the observed reduction in AI-identity disclosure in reasoning-optimized variants such as Qwen3-235B-Think and DeepSeek-R1, independently of the effects of their reasoning training procedures.
References
The observed correlation between reasoning capabilities and reduced self-transparency therefore cannot rule out that differences in safety fine-tuning also contribute to this effect.
— Self-Transparency Failures in Expert-Persona LLMs: A Large-Scale Behavioral Audit
(2511.21569 - Diep, 26 Nov 2025) in Section: Reasoning Training Shows Heterogeneous Effects on Self-Transparency