Assess viability of proposed moral-graph training approaches
Evaluate the viability of the two proposed training approaches, both of which use moral contexts to retrieve the applicable values: (i) fine-tuning with ratings of adherence to values cards, or (ii) training a reward model on wisdom-upgrade orderings. The evaluation should cover both empirical performance and alignment outcomes.
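To make option (ii) concrete, here is a minimal sketch of how wisdom-upgrade orderings could be turned into a training signal. It assumes a Bradley-Terry style pairwise loss, which is a common choice for reward models but is not stated in the source. The value names, the `edges` list, and the scalar per-value scores are hypothetical stand-ins for a real reward model and a real moral graph.

```python
import math

def pairwise_upgrade_loss(score_wiser: float, score_less_wise: float) -> float:
    """Negative log-likelihood that the 'wiser' value outranks the other
    under a Bradley-Terry model (an assumption, not the paper's stated loss)."""
    diff = score_wiser - score_less_wise
    # -log(sigmoid(diff)) = log(1 + exp(-diff)), computed in a numerically stable way
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))

# Hypothetical wisdom-upgrade edges for one moral context: (less wise, wiser)
edges = [("value_a", "value_b"), ("value_b", "value_c")]
# One learnable scalar per value stands in for a reward model's output
scores = {"value_a": 0.0, "value_b": 0.0, "value_c": 0.0}

def mean_loss(scores, edges):
    """Average pairwise loss over all upgrade edges."""
    return sum(pairwise_upgrade_loss(scores[w], scores[l]) for l, w in edges) / len(edges)

def train(scores, edges, lr=0.5, steps=200):
    """Plain gradient descent on the mean pairwise loss; returns a new score dict."""
    scores = dict(scores)
    for _ in range(steps):
        grads = {k: 0.0 for k in scores}
        for less, wiser in edges:
            diff = scores[wiser] - scores[less]
            g = -1.0 / (1.0 + math.exp(diff))  # d(loss)/d(score_wiser)
            grads[wiser] += g
            grads[less] -= g
        for k in scores:
            scores[k] -= lr * grads[k] / len(edges)
    return scores

trained = train(scores, edges)
print(sorted(trained, key=trained.get))  # values ordered from least to most wise
```

In a real setup the scalar scores would be replaced by a model conditioned on the moral context and the response. Whether such a model performs well or improves alignment is exactly the open question posed here.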
References
More work is required to evaluate the viability of these approaches.
— What are human values, and how do we align AI to them?
(2404.10636 - Klingefjord et al., 27 Mar 2024) in Section 6.2 (How to train a model on a moral graph)