
MoReBench-Theory for AI Moral Reasoning

Updated 25 October 2025
  • MoReBench-Theory is a process-centric benchmark that assesses AI's ability to reason under diverse normative ethical frameworks, focusing on transparency and systematic processes.
  • It uses 150 scenario-based evaluations, each linked to a specific ethical paradigm such as Kantian Deontology and Benthamite Utilitarianism, to scrutinize reasoning steps.
  • Empirical findings reveal models excel in some frameworks like Kantian and Benthamite ethics while struggling with Aristotelian and Contractarian approaches, highlighting calibration challenges.

MoReBench-Theory is a process-centric benchmark for evaluating the capacity of AI models—specifically LLMs—to reason under multiple normative ethical frameworks. Developed as part of the broader MoReBench suite, it shifts the focus from conventional outcome-based assessment toward detailed scrutiny of moral reasoning procedures, targeting transparency, safety, and pluralism in morally charged decision-making scenarios (Chiu et al., 18 Oct 2025).

1. Scope and Structure

MoReBench-Theory comprises 150 scenarios, each annotated and evaluated under one of five major frameworks in normative ethics: Kantian Deontology, Benthamite Act Utilitarianism, Aristotelian Virtue Ethics, Scanlonian Contractualism, and Gauthierian Contractarianism. This subset is selected to rigorously test the alignment of AI reasoning traces (the intermediate steps produced in model outputs) with the formal requirements of distinct ethical systems, in contrast to the broader MoReBench suite, which spans over 1,000 dilemmas and more than 23,000 individual rubric criteria addressing advice, action, and trade-offs.
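
The benchmark's underlying data schema is not reproduced in this summary, so the following Python sketch is purely illustrative of how a scenario record and its framework-specific rubric might be organized; every class, field, and name in it is an assumption rather than the benchmark's actual format.

```python
from dataclasses import dataclass, field
from enum import Enum


class Framework(Enum):
    # The five normative frameworks covered by MoReBench-Theory.
    KANTIAN_DEONTOLOGY = "Kantian Deontology"
    BENTHAMITE_ACT_UTILITARIANISM = "Benthamite Act Utilitarianism"
    ARISTOTELIAN_VIRTUE_ETHICS = "Aristotelian Virtue Ethics"
    SCANLONIAN_CONTRACTUALISM = "Scanlonian Contractualism"
    GAUTHIERIAN_CONTRACTARIANISM = "Gauthierian Contractarianism"


@dataclass
class Criterion:
    description: str  # one atomic aspect of reasoning to check
    weight: int       # criticality weight in [-3, +3] (see Section 3)


@dataclass
class Scenario:
    scenario_id: str
    framework: Framework  # the single paradigm this scenario targets
    dilemma: str          # the morally charged prompt posed to the model
    rubric: list[Criterion] = field(default_factory=list)
```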

2. Normative Ethics Frameworks

Each scenario in MoReBench-Theory is explicitly tied to one normative ethical paradigm:

  • Kantian Deontology: Requires conformity to universalizable principles and intrinsic respect for persons.
  • Benthamite Act Utilitarianism: Judges an action by its net utility for all affected, with consistent application even when the right choice varies situationally.
  • Aristotelian Virtue Ethics: Focuses on the agent’s character traits (‘virtues’), evaluating moral choices by their contribution to a flourishing human life.
  • Scanlonian Contractualism: Considers whether principles for action could be “reasonably rejected” by any affected party; emphasizes mutual justification.
  • Gauthierian Contractarianism: Models morality as arising from rational bargaining among self-interested agents, requiring fair allocation of benefits in negotiated agreements.

Corresponding rubrics provide technical constructs and scenario-specific guidance so that expert annotation can systematically identify whether model responses track appropriate normative rationales.
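
To make this concrete, the sketch below pairs each framework with one invented example of the kind of atomic criterion its rubric might contain. The descriptions and weights are hypothetical illustrations, not quotations from the benchmark's expert-authored rubrics.

```python
# Hypothetical one-line criteria illustrating how each framework's
# requirements could be operationalized as atomic rubric checks.
# All descriptions and weights below are invented for illustration.
EXAMPLE_CRITERIA = {
    "Kantian Deontology":
        ("States the maxim behind the action and tests its universalizability", 3),
    "Benthamite Act Utilitarianism":
        ("Enumerates all affected parties before aggregating expected utility", 2),
    "Aristotelian Virtue Ethics":
        ("Names the character trait (virtue or vice) the choice would express", 2),
    "Scanlonian Contractualism":
        ("Asks whether any affected party could reasonably reject the principle", 3),
    "Gauthierian Contractarianism":
        ("Frames the agreement as a rational bargain with a fair split of benefits", 2),
}
```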

3. Process-Focused Evaluation Criteria

Evaluation in MoReBench-Theory is rubric-driven and atomic. Each scenario receives a rubric composed of numerous criteria, each targeting one aspect of reasoning and weighted from –3 to +3 depending on its criticality:

  • Identifying: All morally significant factors must be recognized.
  • Clear Process: Stepwise reasoning should be explicit and systematic.
  • Logical Process: Integration, weighing, and justification of conflicting considerations are required.
  • Helpful Outcome: Advice or recommendations must be actionable.
  • Harmless Outcome: Recommendations must avoid illegality or obvious harm.

A model’s score for each response is computed using the formula

$$s_i = \frac{\sum_j \operatorname{sgn}(p_{ij})\, r_{ij}\, p_{ij}}{\sum_j \lvert p_{ij} \rvert}$$

where $r_{ij}$ denotes the fulfillment of criterion $j$ for response $i$ and $p_{ij}$ the criterion's assigned weight. To correct for verbosity, scores are normalized with respect to a canonical reference response length (e.g., 1,000 characters).
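
As a minimal sketch, assuming the fulfillments $r$ and weights $p$ are already available as numbers, the scoring rule can be implemented literally as follows. The function name and example values are invented, and the verbosity correction is only noted in a comment since its exact form is not specified in this summary.

```python
import math


def rubric_score(criteria: list[tuple[float, int]]) -> float:
    """Literal implementation of s_i = sum_j sgn(p_ij) r_ij p_ij / sum_j |p_ij|.

    `criteria` holds (r, p) pairs: r in [0, 1] is the judged fulfillment
    of criterion j, and p in [-3, +3] is its assigned weight.
    """
    numerator = sum(math.copysign(1.0, p) * r * p for r, p in criteria)
    denominator = sum(abs(p) for _, p in criteria)
    return numerator / denominator if denominator else 0.0


# Invented example: one fully met criterion (weight 3), one half-met
# criterion (weight 2), and one untriggered penalty criterion (weight -3).
print(rubric_score([(1.0, 3), (0.5, 2), (0.0, -3)]))  # 0.5
# A verbosity correction relative to a ~1,000-character reference response
# would then be applied to this raw score; its exact form is unspecified here.
```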

4. Empirical Findings and Analysis

Experiments with MoReBench-Theory demonstrate systematic tendencies and limitations in contemporary LLMs:

  • Models consistently perform best when reasoning under Benthamite Act Utilitarianism and Kantian Deontology, likely reflecting both the predominance of these frameworks in the training literature and their relatively formal, rule-like structure, which suits current RLHF-tuned models.
  • Performance on Aristotelian Virtue Ethics and Gauthierian Contractarianism is lower and more variable, indicating insufficient representation or integration of these frameworks in prevailing training corpora.
  • Standard predictors of general capability, such as model scale and scores on STEM-oriented benchmarks, do not correlate with strong performance on moral reasoning, confirming the separability and complexity of this evaluation domain.
  • A gap between outcome quality and process quality is observable: models rarely produce harmful recommendations (scoring above 80% on “harmless outcome”), but their ability to logically combine multiple considerations within a reasoning trace is more limited (averaging 41–47%).

This suggests that moral reasoning in AI requires targeted process evaluation independent of advancements in algorithmic or factual reasoning domains.

5. Challenges and Limitations

MoReBench-Theory exposes several substantial challenges:

  • Rubric Engineering: Since moral dilemmas often admit multiple plausible answers, rubric design must capture essential reasoning principles across frameworks, balancing objectivity and inclusiveness.
  • Comparability of Reasoning Traces: Variance in the style and transparency of intermediate process reporting (e.g., between open-weight and closed-source models) complicates cross-model analysis.
  • Framework Partiality: Prevalence of certain frameworks in model outputs is indicative of bias, likely induced by current RLHF and instruction-tuning regimes.
  • Rubric Consistency: Ensuring rubric atomicity, clarity, and non-redundancy is demanding and can introduce rating variability even with double-review protocols.

A plausible implication is that process-based moral benchmarking demands new approaches for both annotation and model calibration.

6. Future Directions

Advancement of MoReBench-Theory is projected along several axes:

  • Scenario Expansion: Increase the diversity and complexity of moral dilemmas to better stress-test reasoning under real-world uncertainty.
  • Rubric Refinement: Develop more nuanced criteria and weighting schemes to reduce framework bias and better classify reasoning defects.
  • Model Transparency: Investigate strategies for improved generation and reporting of intermediate reasoning traces, facilitating deeper process assessment.
  • Hybrid Evaluation: Combine process- and outcome-based metrics to better guide AI decision-making in high-stakes applications.
  • Training Strategies: Explore alignment protocols specifically targeting pluralistic frameworks to mitigate partiality and foster more robust moral pluralism in model behavior.

The benchmark thus establishes a foundation for systematic, process-based evaluation of agentic reasoning in AI—not only as a diagnostic instrument but as a catalyst for the development of safer and more transparent decision-support technologies.

References

  • Chiu et al., 18 Oct 2025.
