Develop a training algorithm for optimizing the moral graph alignment target
Develop a concrete algorithm for training large language models to optimize the moral graph alignment target produced by Moral Graph Elicitation. This requires converting the moral graph into an objective function suitable for post-training, then demonstrating that the resulting method is effective relative to existing alignment methods.
References
Finally, we need an algorithm for training a model to optimize this target; we leave this final stage for future work.
— What are human values, and how do we align AI to them?
(arXiv:2404.10636, Klingefjord et al., 27 Mar 2024), Section 1 (Introduction)