Fine-tuning of Claude 3 Opus on philosophical “golden replies”

Ascertain whether Anthropic’s Claude 3 Opus was fine-tuned on “golden replies” supplied by human raters for philosophical questions about consciousness and the philosophy of mind, and, if so, characterize the scope and content of that fine-tuning.

Background

The authors note that many commercial LLMs are fine-tuned using human feedback, sometimes including “golden replies” written by subject-matter experts. They explicitly state that they are uncertain whether Claude 3 Opus was fine-tuned in this way on philosophical topics.

They further note that their attempts to clarify this with Anthropic received no response, leaving the question unresolved.

References

Unfortunately, we cannot be sure that Claude 3 Opus was not fine-tuned in this way on philosophical questions about consciousness and the philosophy of mind. We have not received replies from Anthropic to enquiries about this.

— Existential Conversations with Large Language Models: Content, Community, and Culture (2411.13223 - Shanahan et al., 2024) in Section 2 (Large Language Models), footnote following discussion of training data and regurgitation checks