Fine-tuning of Claude 3 Opus on philosophical “golden replies”
Ascertain whether Anthropic’s Claude 3 Opus was fine-tuned using human-rater-provided “golden replies” on philosophical questions concerning consciousness and the philosophy of mind, and, if so, characterize the scope and content of such fine-tuning.
References
Unfortunately, we cannot be sure that Claude 3 Opus was not fine-tuned in this way on philosophical questions about consciousness and the philosophy of mind. We have not received replies from Anthropic to enquiries about this.
— Existential Conversations with Large Language Models: Content, Community, and Culture
(2411.13223 - Shanahan et al., 20 Nov 2024) in Section 2 (Large Language Models), footnote following discussion of training data and regurgitation checks