Evaluate user preference for moral-graph–fine-tuned models
Evaluate whether users prefer interacting with a language model fine-tuned on the moral graph alignment target over baselines such as RLHF-aligned or constitutionally-aligned models, and assess the resulting user satisfaction and outcomes.
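The paper does not specify an evaluation protocol, so the following is a minimal sketch of how one might analyze a blinded pairwise preference study: users interact with both models on the same prompts and pick the response they prefer. The variable names and the `preferences` data are hypothetical; the analysis itself is a standard win rate with a Wilson score interval and an exact sign test.

```python
import math
from collections import Counter

# Hypothetical per-interaction labels from a blinded A/B study:
# "moral_graph" = user preferred the moral-graph-fine-tuned model,
# "baseline"    = user preferred the RLHF/constitutional baseline,
# "tie"         = no preference.
preferences = ["moral_graph", "baseline", "moral_graph", "tie", "moral_graph",
               "moral_graph", "baseline", "moral_graph", "tie", "moral_graph"]

counts = Counter(preferences)
wins, losses = counts["moral_graph"], counts["baseline"]
n = wins + losses  # ties are dropped, as in a standard sign test
win_rate = wins / n

def wilson_interval(k: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score interval for a binomial proportion k/n."""
    p = k / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

def sign_test_p(k: int, n: int) -> float:
    """Two-sided exact sign test against the null of no preference (p = 0.5)."""
    tail = sum(math.comb(n, i) for i in range(min(k, n - k) + 1)) / 2**n
    return min(1.0, 2 * tail)

lo, hi = wilson_interval(wins, n)
print(f"win rate = {win_rate:.2f}, 95% CI [{lo:.2f}, {hi:.2f}], "
      f"p = {sign_test_p(wins, n):.3f}")
```

Dropping ties and using an exact sign test is a conservative choice for small samples; a fuller study would also collect per-session satisfaction ratings and downstream outcome measures for each condition, which the analysis above does not cover.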
References
Finally, we don’t yet know if users will prefer interacting with a model fine-tuned on the moral graph. We are in the process of fine-tuning a model on a new, larger moral graph, and will be able to answer this question soon.
— What are human values, and how do we align AI to them?
(Klingefjord et al., 2024, arXiv:2404.10636), Section 6 (Limitations) – Fine-Tuning