Handling cycles in the moral graph arising from hard-power conflicts

Develop a principled method for resolving cycles in the moral graph that correspond to fundamentally win–lose power dynamics, where no balancing value exists. Candidate approaches include fracturing into separate personalized models or deciding which values to use via voting.

Background

The moral graph encodes directed relationships, each indicating that one value is wiser than another within a specific context, which enables PageRank-like aggregation of wisdom judgments. The authors note that most conflicts can be defused by clarifying the context or by identifying a balancing value.
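To make the aggregation idea concrete, here is a minimal sketch of plain power-iteration PageRank over "wiser than" edges. It is an illustration under assumptions, not the authors' exact aggregation: the value names are hypothetical, all edges are treated as belonging to a single context, and the damping factor of 0.85 is just the conventional default.

```python
from collections import defaultdict


def pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank over (less_wise, wiser) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = defaultdict(list)
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by values with no outgoing edge is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


# Hypothetical values; an edge (a, b) means "b is wiser than a" in one context.
edges = [
    ("follow rules", "weigh consequences"),
    ("weigh consequences", "informed autonomy"),
    ("follow rules", "informed autonomy"),
]
scores = pagerank(edges)
wisest = max(scores, key=scores.get)
```

In this acyclic example the rank flows toward the value that every path leads to. When the edges form a cycle, rank circulates among the cycle's members and no single value emerges as wisest on structural grounds alone.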

However, they recognize that some scenarios are fundamentally about win–lose power dynamics and may appear as cycles in the moral graph. They state explicitly that their process has no answer for such cycles, and they suggest personalization or voting as potential approaches without specifying a method for either.
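Any resolution method first has to locate the cycles. The sketch below is one way to do that, assuming the graph for a given context is available as a list of (less_wise, wiser) pairs: it uses Tarjan's strongly connected components algorithm and reports every component with more than one value. The function names and value labels are hypothetical, and the sketch only detects cycles; it does not resolve them.

```python
def strongly_connected_components(edges):
    """Tarjan's algorithm over a list of (src, dst) edges."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])

    index, lowlink = {}, {}
    stack, on_stack = [], set()
    components = []
    counter = 0

    def visit(node):
        nonlocal counter
        index[node] = lowlink[node] = counter
        counter += 1
        stack.append(node)
        on_stack.add(node)
        for nxt in graph[node]:
            if nxt not in index:
                visit(nxt)
                lowlink[node] = min(lowlink[node], lowlink[nxt])
            elif nxt in on_stack:
                lowlink[node] = min(lowlink[node], index[nxt])
        # A node whose lowlink equals its own index is the root of a component.
        if lowlink[node] == index[node]:
            component = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.append(member)
                if member == node:
                    break
            components.append(component)

    for node in graph:
        if node not in index:
            visit(node)
    return components


def find_cycles(edges):
    """Return the groups of values that are mutually 'wiser than' each other."""
    return [sorted(c) for c in strongly_connected_components(edges) if len(c) > 1]


# Hypothetical win-lose conflict: each side's value is judged wiser than the other's.
edges = [
    ("group A prevails", "group B prevails"),
    ("group B prevails", "group A prevails"),
    ("follow rules", "weigh consequences"),
]
cycles = find_cycles(edges)
```

The returned groups are the places where a personalization or voting step would have to act, since the graph structure alone gives no ordering within them. The recursive traversal is fine for small graphs; a very large graph would need an iterative version to avoid Python's recursion limit.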

References

"Our process has no answer what to do with these cycles."

Source: What are human values, and how do we align AI to them? (arXiv:2404.10636, Klingefjord et al., 2024), Discussion section, "Limitations" subsection (Hard Power)
