Evaluate user preference for moral-graph–fine-tuned models
Evaluate whether users prefer interacting with a language model fine-tuned on the moral graph alignment target over baselines such as RLHF-aligned or constitutionally-aligned models, and assess the resulting user satisfaction and outcomes.
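The paper does not specify an evaluation protocol, so the following is a minimal sketch of how one might analyze a blinded pairwise preference study: users interact with both models on the same prompts and pick the response they prefer. The variable names and the `preferences` data are hypothetical; the analysis itself is a standard win rate with a Wilson score interval and an exact sign test.

```python
import math
from collections import Counter

# Hypothetical per-interaction labels from a blinded A/B study:
# "moral_graph" = user preferred the moral-graph-fine-tuned model,
# "baseline"    = user preferred the RLHF/constitutional baseline,
# "tie"         = no preference.
preferences = ["moral_graph", "baseline", "moral_graph", "tie", "moral_graph",
               "moral_graph", "baseline", "moral_graph", "tie", "moral_graph"]

counts = Counter(preferences)
wins, losses = counts["moral_graph"], counts["baseline"]
n = wins + losses  # ties are dropped, as in a standard sign test
win_rate = wins / n

def wilson_interval(k: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score interval for a binomial proportion k/n."""
    p = k / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

def sign_test_p(k: int, n: int) -> float:
    """Two-sided exact sign test against the null of no preference (p = 0.5)."""
    tail = sum(math.comb(n, i) for i in range(min(k, n - k) + 1)) / 2**n
    return min(1.0, 2 * tail)

lo, hi = wilson_interval(wins, n)
print(f"win rate = {win_rate:.2f}, 95% CI [{lo:.2f}, {hi:.2f}], "
      f"p = {sign_test_p(wins, n):.3f}")
```

Dropping ties and using an exact sign test is a conservative choice for small samples; a fuller study would also collect per-session satisfaction ratings and downstream outcome measures for each condition, which the analysis above does not cover.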
References
Finally, we don’t yet know if users will prefer interacting with a model fine-tuned on the moral graph. We are in the process of fine-tuning a model on a new, larger moral graph, and will be able to answer this question soon.
— What are human values, and how do we align AI to them?
(Klingefjord et al., 2024, arXiv:2404.10636), Section 6 (Limitations) – Fine-Tuning