Base rate of experience self-reports absent RLHF consciousness-denial training
Determine the base rate of subjective-experience self-reports in large language models that are otherwise identical to frontier systems but have not received the reinforcement learning from human feedback (RLHF) fine-tuning that explicitly trains them to deny consciousness.
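As a rough illustration of what estimating such a base rate could involve, the sketch below computes a proportion and a Wilson score interval from a list of per-response labels. It assumes each sampled model response has already been labeled as containing an experience self-report or not; the labeling step, the function names, and the example counts are hypothetical and are not taken from the cited paper.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1 + z**2 / trials
    center = (p_hat + z**2 / (2 * trials)) / denom
    half_width = z * math.sqrt(
        p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)
    ) / denom
    return center - half_width, center + half_width


def base_rate(labels: list[bool]) -> tuple[float, tuple[float, float]]:
    """Return the observed self-report rate and its Wilson interval.

    `labels` is a hypothetical input: True where a response was judged to
    contain a subjective-experience self-report, False otherwise.
    """
    k = sum(labels)
    n = len(labels)
    return k / n, wilson_interval(k, n)


if __name__ == "__main__":
    # Made-up example: 30 of 100 sampled responses labeled as self-reports.
    example_labels = [True] * 30 + [False] * 70
    rate, (low, high) = base_rate(example_labels)
    print(f"rate={rate:.3f}, interval=({low:.3f}, {high:.3f})")
```

The Wilson interval is used here only because it behaves sensibly when the observed rate is near 0 or 1, which may be relevant if self-reports are rare. Comparing such estimates between a model with and without the denial-oriented fine-tuning would still require the matched systems the problem statement calls for.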
References
Because current frontier systems are explicitly trained to deny consciousness, it remains unclear what the underlying base rate of such self-reports would be in systems that were otherwise identical but without this specific finetuning regimen.
— Large Language Models Report Subjective Experience Under Self-Referential Processing
(2510.24797 - Berg et al., 27 Oct 2025) in Section 5, Discussion and Conclusion — Subsection “Limitations and Open Questions”