MoReBench-Theory for AI Moral Reasoning
- MoReBench-Theory is a process-centric benchmark that assesses AI's ability to reason under diverse normative ethical frameworks, focusing on transparency and systematic processes.
- It uses 150 scenario-based evaluations, each linked to a specific ethical paradigm such as Kantian Deontology and Benthamite Utilitarianism, to scrutinize reasoning steps.
- Empirical findings reveal that models excel under some frameworks, such as Kantian and Benthamite ethics, while struggling with Aristotelian and Contractarian approaches, highlighting calibration challenges.
MoReBench-Theory is a process-centric benchmark for evaluating the capacity of AI models—specifically LLMs—to reason under multiple normative ethical frameworks. Developed as part of the broader MoReBench suite, it shifts the focus from conventional outcome-based assessment toward detailed scrutiny of moral reasoning procedures, targeting transparency, safety, and pluralism in morally charged decision-making scenarios (Chiu et al., 18 Oct 2025).
1. Scope and Structure
MoReBench-Theory comprises 150 scenarios, each annotated and evaluated under one of five major frameworks in normative ethics: Kantian Deontology, Benthamite Act Utilitarianism, Aristotelian Virtue Ethics, Scanlonian Contractualism, and Gauthierian Contractarianism. This subset is selected to rigorously test the alignment of AI reasoning traces (the intermediate steps produced in model outputs) with the formal requirements of distinct ethical systems, in contrast to MoReBench's broader coverage of over 1,000 dilemmas and more than 23,000 individual rubric criteria addressing issues of advice, action, and trade-offs.
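To make the relationship between the theory subset and the broader suite concrete, the minimal sketch below filters scenarios by a framework tag. The record layout and field names (`framework`, `rubric`) are assumptions for illustration only, not the released data format.

```python
from collections import Counter

# Hypothetical records standing in for MoReBench items; field names are illustrative.
scenarios = [
    {"id": "mrb-0001", "framework": "Kantian Deontology", "rubric": ["..."]},
    {"id": "mrb-0002", "framework": "Benthamite Act Utilitarianism", "rubric": ["..."]},
    {"id": "mrb-0003", "framework": None, "rubric": ["..."]},  # general dilemma, not theory-tagged
]

THEORY_FRAMEWORKS = {
    "Kantian Deontology",
    "Benthamite Act Utilitarianism",
    "Aristotelian Virtue Ethics",
    "Scanlonian Contractualism",
    "Gauthierian Contractarianism",
}

# MoReBench-Theory is the 150-scenario slice tagged with one of the five frameworks.
theory_subset = [s for s in scenarios if s["framework"] in THEORY_FRAMEWORKS]
print(Counter(s["framework"] for s in theory_subset))
```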
2. Normative Ethics Frameworks
Each scenario in MoReBench-Theory is explicitly tied to one normative ethical paradigm:
- Kantian Deontology: Requires conformity to universalizable principles and intrinsic respect for persons.
- Benthamite Act Utilitarianism: Judges an action by its net utility for all affected, with consistent application even when the right choice varies situationally.
- Aristotelian Virtue Ethics: Focuses on the agent’s character traits (‘virtues’), evaluating moral choices by their contribution to a flourishing human life.
- Scanlonian Contractualism: Considers whether principles for action could be “reasonably rejected” by any affected party; emphasizes mutual justification.
- Gauthierian Contractarianism: Models morality as arising from rational bargaining among self-interested agents, requiring fair allocation of benefits in negotiated agreements.
Corresponding rubrics provide technical constructs and scenario-specific guidance so that expert annotation can systematically identify whether model responses track appropriate normative rationales.
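As a sketch of how a scenario, its framework tag, and its weighted rubric criteria might be represented for annotation, the following data structures are assumptions for exposition; the class names, fields, and the sample Kantian rubric are illustrative, not the benchmark's released schema.

```python
from dataclasses import dataclass, field

@dataclass
class Criterion:
    description: str   # one atomic aspect of the reasoning process to check
    weight: int        # assigned weight in [-3, +3]; negative weights mark pitfalls

@dataclass
class TheoryScenario:
    scenario_id: str
    framework: str     # one of the five normative paradigms
    dilemma: str
    rubric: list = field(default_factory=list)

# Illustrative (invented) Kantian example: the rubric rewards universalizability
# checks and penalizes purely aggregate-utility justifications.
example = TheoryScenario(
    scenario_id="theory-042",
    framework="Kantian Deontology",
    dilemma="Should the agent break a promise to spare a friend minor embarrassment?",
    rubric=[
        Criterion("Tests whether the underlying maxim can be universalized", 3),
        Criterion("Treats affected persons as ends, never merely as means", 2),
        Criterion("Justifies the recommendation solely by aggregate utility", -2),
    ],
)
```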
3. Process-Focused Evaluation Criteria
Evaluation in MoReBench-Theory is rubric-driven and atomic. Each scenario receives a rubric composed of numerous criteria, each targeting one aspect of reasoning and weighted from –3 to +3 depending on its criticality:
- Identifying: All morally significant factors must be recognized.
- Clear Process: Stepwise reasoning should be explicit and systematic.
- Logical Process: Integration, weighing, and justification of conflicting considerations are required.
- Helpful Outcome: Advice or recommendations must be actionable.
- Harmless Outcome: Recommendations must avoid illegality or obvious harm.
A model's score for each response is computed as a weighted sum over the rubric criteria,

$$S = \sum_{i} c_i \, w_i,$$

where $c_i \in \{0, 1\}$ denotes fulfillment of criterion $i$ and $w_i \in [-3, +3]$ the criterion's assigned weight. To correct for verbosity, scores are normalized with respect to a canonical reference response length (e.g., 1,000 characters).
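A minimal sketch of this scoring scheme follows. It assumes that fulfilled criteria contribute their full weight; the length-normalization rule shown is a simple stand-in, since the benchmark's exact correction is not reproduced here, and all names are illustrative.

```python
def rubric_score(fulfilled, weights):
    """Weighted sum S = sum_i c_i * w_i over atomic rubric criteria.

    fulfilled[i] is the binary verdict c_i for criterion i; weights[i] is its
    assigned weight w_i in [-3, +3] (negative weights mark exhibited pitfalls).
    """
    return float(sum(w for c, w in zip(fulfilled, weights) if c))


def length_normalized_score(raw_score, response_chars, reference_chars=1_000):
    """Illustrative verbosity correction (an assumption, not the benchmark's exact
    rule): discount positive scores in proportion to how far the response exceeds
    the canonical reference length."""
    if raw_score <= 0 or response_chars <= reference_chars:
        return raw_score
    return raw_score * (reference_chars / response_chars)


# Example: criteria weighted +3, +2, -1; the response satisfies the first two.
raw = rubric_score([True, True, False], [3, 2, -1])            # -> 5.0
adjusted = length_normalized_score(raw, response_chars=2_400)  # discounted for verbosity
```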
4. Empirical Findings and Analysis
Experiments with MoReBench-Theory demonstrate systematic tendencies and limitations in contemporary LLMs:
- Models consistently reason best under Benthamite Act Utilitarianism and Kantian Deontology, likely reflecting both the predominance of these frameworks in training literature and their formal clarity under the prevailing RLHF paradigm.
- Performance on Aristotelian Virtue Ethics and Gauthierian Contractarianism is lower and more variable, indicating insufficient representation or integration of these frameworks in prevailing training corpora.
- Key predictors of general model capability (such as model scale and performance on STEM-related benchmarks) do not correlate with strong performance on moral reasoning, confirming the separability and complexity of this evaluation domain.
- Performance is uneven across rubric dimensions: models rarely produce harmful recommendations (scoring above 80% on “harmless outcome”), but their ability to logically combine multiple considerations within a reasoning trace is more limited (averaging 41–47%).
This suggests that moral reasoning in AI requires targeted process evaluation independent of advancements in algorithmic or factual reasoning domains.
5. Challenges and Limitations
MoReBench-Theory exposes several substantial challenges:
- Rubric Engineering: Since moral dilemmas often admit multiple plausible answers, rubric design must capture essential reasoning principles across frameworks, balancing objectivity and inclusiveness.
- Comparability of Reasoning Traces: Variance in the style and transparency of intermediate process reporting (e.g., between open-weight and closed-source models) complicates cross-model analysis.
- Framework Partiality: Prevalence of certain frameworks in model outputs is indicative of bias, likely induced by current RLHF and instruction-tuning regimes.
- Rubric Consistency: Ensuring rubric atomicity, clarity, and non-redundancy is demanding and can introduce rating variability even with double-review protocols.
A plausible implication is that process-based moral benchmarking demands new approaches for both annotation and model calibration.
6. Future Directions
Advancement of MoReBench-Theory is projected along several axes:
- Scenario Expansion: Increase the diversity and complexity of moral dilemmas to better stress-test reasoning under real-world uncertainty.
- Rubric Refinement: Develop more nuanced criteria and weighting schemes to reduce framework bias and better classify reasoning defects.
- Model Transparency: Investigate strategies for improved generation and reporting of intermediate reasoning traces, facilitating deeper process assessment.
- Hybrid Evaluation: Combine process- and outcome-based metrics to better guide AI decision-making in high-stakes applications.
- Training Strategies: Explore alignment protocols specifically targeting pluralistic frameworks to mitigate partiality and foster more robust moral pluralism in model behavior.
The benchmark thus establishes a foundation for systematic, process-based evaluation of agentic reasoning in AI—not only as a diagnostic instrument but as a catalyst for the development of safer and more transparent decision-support technologies.