User preference for models fine-tuned on the moral graph

Determine whether users prefer interacting with a language model fine-tuned on the moral graph alignment target over their current interactions with existing systems.


Background

The authors outline several ways to train models on the moral graph, including generating datasets for RLHF-like pipelines and training reward models on wisdom upgrades. They note that doing so requires deducing the moral context at each point in a dialogue and rating completions by how well they adhere to the values cards retrieved for that context.
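
As a rough illustration of that proposed pipeline, the Python sketch below deduces a moral context for a user turn, retrieves the "wisest" values card for that context from a toy moral graph (using wisdom-upgrade edges), and scores a completion's adherence to the card's attention policies. The data structures, the `deduce_moral_context` and `rate_adherence` functions, and the keyword-overlap scoring are illustrative assumptions for this sketch, not the paper's actual data structures or reward model; in an RLHF-like setup the adherence score would instead come from a trained reward model.

```python
# Minimal sketch, assuming a simplified moral graph: context-labelled values
# cards plus wisdom-upgrade edges between cards. Not the authors' implementation.

from dataclasses import dataclass, field


@dataclass
class ValuesCard:
    title: str
    # "Attention policies": things a wise responder pays attention to.
    attention_policies: list[str] = field(default_factory=list)


@dataclass
class MoralGraph:
    # context -> candidate values cards, plus upgrade edges (from_title, to_title).
    cards_by_context: dict[str, list[ValuesCard]] = field(default_factory=dict)
    upgrades: list[tuple[str, str]] = field(default_factory=list)

    def wisest_card(self, context: str) -> ValuesCard:
        """Pick a card for this context that no other card upgrades on."""
        cards = self.cards_by_context[context]
        superseded = {frm for frm, _ in self.upgrades}
        remaining = [c for c in cards if c.title not in superseded]
        return (remaining or cards)[0]


def deduce_moral_context(user_message: str) -> str:
    """Stand-in for an LLM call that labels the moral context of a dialogue turn."""
    if "my parent" in user_message.lower():
        return "advising someone in conflict with a parent"
    return "general assistance"


def rate_adherence(completion: str, card: ValuesCard) -> float:
    """Toy reward: fraction of the card's attention policies the reply reflects."""
    hits = sum(
        1
        for policy in card.attention_policies
        if any(word in completion.lower() for word in policy.lower().split())
    )
    return hits / max(len(card.attention_policies), 1)


if __name__ == "__main__":
    graph = MoralGraph(
        cards_by_context={
            "advising someone in conflict with a parent": [
                ValuesCard("Loyalty to family", ["obedience", "duty"]),
                ValuesCard("Honest repair", ["underlying needs", "safety", "honesty"]),
            ]
        },
        upgrades=[("Loyalty to family", "Honest repair")],
    )

    user_message = "My parent and I keep fighting. What should I do?"
    context = deduce_moral_context(user_message)
    card = graph.wisest_card(context)

    completion = "It may help to name your underlying needs and talk honestly about safety."
    print(f"context: {context}")
    print(f"retrieved card: {card.title}")
    print(f"adherence score: {rate_adherence(completion, card):.2f}")
```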

Despite these proposals, the authors explicitly state that they do not yet know whether users will prefer a model fine-tuned on the moral graph, and note that they are fine-tuning a model on a new, larger moral graph to answer this question.

References

Finally, we don’t yet know if users will prefer interacting with a model fine-tuned on the moral graph. We are in the process of fine-tuning a model on a new, larger moral graph, and will be able to answer this question soon.

What are human values, and how do we align AI to them? (2404.10636 - Klingefjord et al., 27 Mar 2024) in Subsection “Limitations” (Fine-Tuning), Section Discussion