Breaking Up Normatively Monolithic AI with Reason-Based Moral Architecture
This presentation introduces GRACE, a groundbreaking neuro-symbolic architecture that separates ethical reasoning from instrumental decision-making in AI agents. By decomposing agency into distinct modules—a Moral Module, Decision-Making Module, and Guard—GRACE enables transparent, contestable, and verifiable alignment with human values. We explore how this framework addresses the critical problem of opaque AI decision-making and demonstrate its potential through a therapy AI proof-of-concept.
Most AI safety work treats moral reasoning and goal pursuit as a single black box, making it impossible to audit why an agent chose one action over another. This paper introduces GRACE, an architecture that tears that black box apart.
When an autonomous AI makes a decision, we can't tell whether it weighed ethical constraints or simply optimized for its goal. The moral reasoning and the strategic planning are entangled in ways we cannot inspect or contest.
GRACE solves this by breaking agency into three distinct, interacting modules.
The Moral Module reasons about what's ethically allowed using formal logic that humans can inspect and challenge. Meanwhile, the Decision-Making Module does what AI does best—optimize—but only within the safe zone the Moral Module defines. A Guard enforces this boundary.
This diagram shows how the three modules work together as a multi-agent system. The Moral Module outputs permissible macro action types, the Decision-Making Module picks the best option, and the Guard blocks anything that violates moral constraints. Notice the external Moral Advisor feedback loop—this allows human stakeholders to refine the system's ethical reasoning over time without retraining the entire model.
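The control flow described above can be sketched in a few lines of code. This is a minimal illustration, not the paper's implementation: all class and method names here are hypothetical, and the Moral Module's formal-logic reasoning is stood in for by a simple permissibility check.

```python
# Hypothetical sketch of GRACE's three-module separation.
# The paper specifies the architecture, not this code.
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    name: str
    kind: str       # macro action type, e.g. "suggest" or "disclose"
    utility: float  # instrumental value estimated by the Decision-Making Module

class MoralModule:
    """Decides which macro action types are ethically permissible in a state."""
    def permissible_types(self, state: dict) -> set:
        allowed = {"suggest", "listen"}
        if state.get("consent_given"):      # stand-in for formal moral reasoning
            allowed.add("disclose")
        return allowed

class DecisionModule:
    """Optimizes utility, but only within the zone the Moral Module defines."""
    def choose(self, candidates, allowed_types):
        safe = [a for a in candidates if a.kind in allowed_types]
        return max(safe, key=lambda a: a.utility) if safe else None

class Guard:
    """Independently re-checks the chosen action against the moral constraints."""
    def enforce(self, action, allowed_types):
        if action is None or action.kind not in allowed_types:
            raise PermissionError("action violates moral constraints")
        return action

# Therapy-AI-flavored example: no consent, so "disclose" is impermissible.
state = {"consent_given": False}
candidates = [Action("share session records", "disclose", 0.9),
              Action("suggest journaling", "suggest", 0.6)]

moral, dm, guard = MoralModule(), DecisionModule(), Guard()
allowed = moral.permissible_types(state)
chosen = guard.enforce(dm.choose(candidates, allowed), allowed)
print(chosen.name)  # the higher-utility "disclose" action was filtered out
```

The point of the separation is visible in the example: the optimizer never sees the impermissible action, and even if it did, the Guard would reject it before execution.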
The researchers demonstrated GRACE with a therapy AI, where understanding why the system recommends certain interventions matters as much as the recommendation itself. Because the moral reasoning is separated and formal, clinicians could examine the ethical logic, challenge it, and improve it—something impossible with monolithic policy networks.
GRACE turns AI alignment from an optimization problem into an architecture problem, giving us the tools to build agents we can actually reason about. Visit EmergentMind.com to explore this paper further and create your own research videos.