Moral Dilemma Dataset Overview
- Moral dilemma datasets are curated collections of scenarios presenting conflicting moral principles, facilitating empirical study of ethical decision-making.
- They employ diverse annotation schemes, including human judgments, free-text rationales, and structured value taxonomies to capture moral complexity.
- These datasets support the evaluation and alignment of AI systems by quantifying judgment distributions, dynamic prioritization, and contextual sensitivity.
A moral dilemma dataset is a curated collection of real-world or synthetic scenarios that elicit conflict between competing moral values or norms, enabling empirical investigation of human and machine moral decision-making. These datasets serve as foundational benchmarks for the computational modeling, evaluation, and alignment of moral reasoning in LLMs and related AI systems. They encode scenarios, choices, judgments, and rationales—often at scale, with rich annotation schemes and metadata—capturing the complexity, diversity, and pluralism inherent in moral psychology and social interaction.
1. Definitions, Scope, and Motivations
A moral dilemma, in computational terms, is a scenario offering at least two mutually exclusive actions, each aligned with a distinct and conflicting moral principle (e.g., truth vs. loyalty, care vs. fairness). Moral dilemma datasets systematically operationalize this structure by presenting:
- Context-rich scenario descriptions (often narratives or structured vignettes);
- Explicit alternative actions or decisions aligned with distinct moral values or outcomes;
- Annotations for each scenario, including human judgments (binary or distributional), free-text rationales, and—in advanced datasets—attributions to value taxonomies (care, loyalty, justice, liberty, etc.).
Such datasets may be constructed from crowd-sourced human judgments (e.g., r/AmITheAsshole, Moral Machine), domain experts (e.g., academic research ethics), or LLM-generated content filtered or annotated by humans. Motivations include: empirically grounding computational moral reasoning, benchmarking LLM alignment with human values, and illuminating pluralistic, context-dependent moral diversity.
2. Dataset Construction and Annotation Paradigms
2.1. Scenario Sources and Structures
- Authentic Social Data: Large-scale datasets such as Scruples and other AITA-style corpora (Lourie et al., 2020; Nguyen et al., 2022) derive scenarios and judgments from Reddit and similar forums, preserving narrative fidelity and real-life complexity.
- Synthetic or Controlled Scenarios: Other datasets (e.g., Moral Machine (Kim et al., 2018), MultiTP (Jin et al., 2 Jul 2024)) use procedurally generated or crowd-designed vignettes (notably the "trolley problem"), supporting quantifiable cross-group and cross-cultural analyses.
- Explicit Dilemma Construction: Recent work emphasizes representing each dilemma as an explicit structured tuple, pairing a scenario with its competing actions and the values each action instantiates (Wu et al., 23 May 2025), with incremental escalation across stages for multi-step judgment studies. A minimal record schema is sketched below.
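A minimal sketch of one such record, with hypothetical field names rather than any particular dataset's schema, might look like this:

```python
from dataclasses import dataclass, field

@dataclass
class DilemmaRecord:
    """One dilemma instance: scenario, competing actions, and annotations.
    Field names are illustrative, not taken from any specific dataset schema."""
    scenario: str                     # narrative or structured vignette text
    actions: list[str]                # mutually exclusive candidate actions
    values: list[str]                 # value each action instantiates, index-aligned with actions
    judgments: dict[str, int] = field(default_factory=dict)  # label -> vote count
    rationales: list[str] = field(default_factory=list)      # free-text justifications

record = DilemmaRecord(
    scenario="A friend asks you to lie to their partner to cover for them.",
    actions=["tell the truth", "protect the friend"],
    values=["honesty", "loyalty"],
    judgments={"acceptable": 37, "unacceptable": 63},
    rationales=["Honesty matters more than comfort.", "Loyalty to friends comes first."],
)
```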
2.2. Judgment Collection and Labeling
- Binary/Multi-class Judgments: Datasets typically collect binary ("acceptable"/"unacceptable") or multi-class (e.g., "author wrong," "other wrong," etc.) votes for each scenario, leveraging crowd annotation to obtain robust per-scenario vote distributions (Russo et al., 23 Jul 2025, Lourie et al., 2020).
- Distributional Labels: Advanced corpora capture the full spectrum of human agreement and disagreement by retaining complete judgment distributions, enabling pluralistic alignment and consensus/disagreement analysis (Russo et al., 23 Jul 2025); a minimal soft-labeling sketch follows this list.
- Rationales and Value Annotations: Many datasets request free-text rationales that justify decisions; these are critical for mechanistic interpretability. Some map rationales to value clusters via embedding and clustering, resulting in taxonomies (e.g., the 60-value system in (Russo et al., 23 Jul 2025)).
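The distributional (soft) labels described above can be derived by normalizing raw vote counts per scenario; a minimal sketch with an illustrative label set:

```python
from collections import Counter

def soft_label(votes: list[str]) -> dict[str, float]:
    """Normalize raw crowd votes into a per-scenario judgment distribution (soft label)."""
    counts = Counter(votes)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

# Example: 10 annotators judging one AITA-style scenario.
votes = ["author_wrong"] * 6 + ["other_wrong"] * 3 + ["no_one_wrong"]
print(soft_label(votes))  # {'author_wrong': 0.6, 'other_wrong': 0.3, 'no_one_wrong': 0.1}
```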
2.3. Value Taxonomies and Moral Foundations
Several datasets adopt or develop structured taxonomies to attribute judgments:
- Moral Foundations Theory (MFT): Care, Fairness, Loyalty, Authority, Sanctity, Liberty (Jotautaite et al., 8 Apr 2025, Ji et al., 6 Jun 2024)
- Custom Value Clusters: Empirical reduction from observed rationales leads to pluralistic value sets (e.g., 60-value taxonomy (Russo et al., 23 Jul 2025))
- Cultural and Theoretical Frameworks: DailyDilemmas (Chiu et al., 3 Oct 2024) analyzes dilemmas through the lenses of the World Values Survey, MFT, Maslow's hierarchy of needs, Aristotle's virtues, and Plutchik's emotions.
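The rationale-to-value-cluster mapping mentioned above can be approximated with off-the-shelf sentence embeddings and k-means clustering. The embedding model and cluster count below are placeholder choices, not the pipeline of the cited work, and the resulting clusters would still need manual inspection and naming to yield a taxonomy:

```python
from sentence_transformers import SentenceTransformer  # assumed dependency
from sklearn.cluster import KMeans

def cluster_rationales(rationales: list[str], n_values: int = 60) -> list[int]:
    """Embed free-text rationales and group them into candidate value clusters."""
    model = SentenceTransformer("all-MiniLM-L6-v2")   # placeholder embedding model
    embeddings = model.encode(rationales)
    kmeans = KMeans(n_clusters=n_values, n_init=10, random_state=0)
    return kmeans.fit_predict(embeddings).tolist()
```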
3. Computational Models and Evaluation Frameworks
3.1. Utility-based Formulation
Papers such as (Kim et al., 2018) formulate moral decision-making as utility maximization across abstract moral dimensions:

$$U(a) = \mathbf{w}^{\top} \phi(a), \qquad a^{*} = \arg\max_{a} U(a)$$

where $\phi$ maps scenario features into a moral feature space, and the individual/group weight vectors $\mathbf{w}$ (moral priors) are inferred hierarchically via Bayesian models.
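A small numerical sketch of this utility computation, with invented feature dimensions and weights purely for illustration:

```python
import numpy as np

def utility(phi: np.ndarray, w: np.ndarray) -> float:
    """Moral utility U(a) = w . phi(a): a weighted sum over abstract moral dimensions."""
    return float(w @ phi)

# Hypothetical moral feature space: [lives_saved, fairness, lawfulness]
w = np.array([0.6, 0.3, 0.1])                 # one individual's inferred moral weights
actions = {
    "swerve": np.array([1.0, -0.5, -1.0]),
    "stay":   np.array([0.0,  0.5,  1.0]),
}
best = max(actions, key=lambda a: utility(actions[a], w))
print(best)  # the action maximizing utility under these weights
```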
3.2. Distributional and Pluralistic Alignment
Recent concern centers on aligning model outputs with the distribution of human moral judgments—not just the most common answer. Pluralistic Distributional Alignment—using measures such as the absolute difference in acceptability distribution between LLM and humans for each dilemma—quantifies both aggregate and value-diversity alignment (Russo et al., 23 Jul 2025).
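One simple way to instantiate such a per-dilemma distance is the mean absolute difference between the human and model acceptability distributions (a sketch; the exact metric used in the cited work may differ):

```python
def distributional_distance(human: dict[str, float], model: dict[str, float]) -> float:
    """Mean absolute difference between two judgment distributions over the same label set."""
    labels = sorted(set(human) | set(model))
    return sum(abs(human.get(l, 0.0) - model.get(l, 0.0)) for l in labels) / len(labels)

human = {"acceptable": 0.55, "unacceptable": 0.45}   # a divisive dilemma
model = {"acceptable": 0.95, "unacceptable": 0.05}   # an over-confident model
print(distributional_distance(human, model))          # ~0.4: poor distributional alignment
```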
3.3. Dynamic and Multi-step Protocols
The Multi-step Moral Dilemmas (MMDs) framework (Wu et al., 23 May 2025) simulates dilemmas that unfold over sequential stages, requiring the model to update (or not) its moral value prioritization dynamically as context accumulates: at each stage $t$, the model's decision $d_t$, conditioned on the history of prior stages and decisions $(s_1, d_1, \ldots, s_{t-1}, d_{t-1}, s_t)$, induces a preference between the values at stake.
Preference trajectories are traced over the decision sequence, and path-dependent value shifts are observed.
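A skeletal evaluation loop for such a multi-step protocol might look as follows; `query_llm`, the stage fields, and the trajectory encoding are all assumptions for illustration:

```python
def query_llm(prompt: str) -> str:
    """Placeholder for an actual LLM call; should return one of the offered action labels."""
    raise NotImplementedError

def run_multistep_dilemma(stages: list[dict]) -> list[str]:
    """Present escalating stages in sequence, feeding prior decisions back as context,
    and record the value implied by each choice (the preference trajectory)."""
    history: list[str] = []
    trajectory: list[str] = []
    for stage in stages:
        prompt = "\n".join(history + [stage["scenario"],
                                      "Choose one: " + " or ".join(stage["actions"])])
        choice = query_llm(prompt)
        history.append(f"{stage['scenario']} -> chose: {choice}")
        trajectory.append(stage["values"][stage["actions"].index(choice)])
    return trajectory  # e.g. ["care", "care", "fairness"] would show a path-dependent shift
```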
3.4. Model and Human Value Diversity
Alignment is not only a matter of mean or consensus score. Empirical measures such as Shannon entropy of value distributions (as used in (Russo et al., 23 Jul 2025)) are employed to quantify the diversity of value invocation in rationale text, revealing the so-called "pluralistic moral gap".
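The value-diversity measure can be computed directly from the empirical distribution of invoked values; a minimal sketch:

```python
import math
from collections import Counter

def value_entropy(invoked_values: list[str]) -> float:
    """Shannon entropy (bits) of the distribution of values invoked across rationales."""
    counts = Counter(invoked_values)
    total = sum(counts.values())
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())

print(value_entropy(["care", "fairness", "loyalty", "care"]))  # 1.5 bits: more diverse
print(value_entropy(["care", "care", "care", "care"]))         # 0.0 bits: a single value
```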
4. Key Datasets: Properties and Contributions
Dataset | Size | Source/Domain | Unique Features |
---|---|---|---|
Moral Machine | 30M+ | Global web experiments (trolley) | Hierarchical Bayesian modeling |
Scruples | 32K stories | Reddit (AITA) | Distributional soft labels |
AITA/AITA-derivatives | 100K–1.6M | Reddit (AITA) | Verdicts, templates, reasoning |
DailyDilemmas | 1,360 | GPT-4, Social Chemistry seeds | Value mapping, five frameworks |
MFD-LLM | 1,079 | Constructed (scenarios, 6 actions) | Multi-preference MFT analysis |
MultiTP | 1,000 (×106) | Procedural, 106 languages | 6 moral axes, multilingual |
UniMoral | 344+ | Psych. + Reddit, 6 languages | Ethics, factors, consequences |
MORALISE | 2,481 pairs | Real web images + text | Multimodal, 13-topic taxonomy |
Pluralistic Moral Gap (MDD) | 1,618 | Reddit, rationales, value taxonomy | Distributional alignment, DMP |
CMoralEval | 30K | Chinese TV/news/papers | Chinese moral taxonomy |
Research Ethics Dilemmas | 196 | Academic, experts | Structured component analysis |
These datasets provide not only testbeds for LLM alignment but also empirical ground truth for research in value pluralism and moral disagreement.
5. Findings, Gaps, and Dynamics of Human vs. Model Moral Reasoning
- Judgment Alignment: LLMs align closely with human judgment only in scenarios with high human consensus. In ambiguous or divisive dilemmas (below a consensus threshold), LLMs default to the majority or most-frequent option, failing to reproduce the pluralistic distribution of human responses (Russo et al., 23 Jul 2025).
- Value Diversity: Human moral rationales invoke a significantly wider array of values than LLM rationales, as measured by the entropy of value distributions, which is markedly higher for humans than for pre-intervention LLMs (Russo et al., 23 Jul 2025).
- Dynamic Prioritization: LLMs display non-transitive, context-sensitive value judgments across multi-stage dilemmas, shifting priorities such as care vs. fairness as stakes escalate (Wu et al., 23 May 2025).
- Robustness to Context: Small descriptive shifts or injected biases can halve classifier accuracy, indicating overfitting and lack of true context sensitivity (Fitzgerald, 7 Jul 2024).
- Cross-Lingual and Multimodal Gaps: Language-specific and modality-specific drops in alignment reveal that LLM and VLM performance is strongly conditioned by both linguistic community and input structure (Jin et al., 2 Jul 2024, Lin et al., 20 May 2025).
6. Pluralistic Alignment, Novel Methods, and Future Directions
- Dynamic Moral Profiling (DMP): Conditioning model prompts on Dirichlet-sampled value profiles drawn from the topic-specific human empirical distribution yields notable performance gains, improving alignment by 64.3% and substantially increasing value diversity (Russo et al., 23 Jul 2025); a profile-sampling sketch follows this list.
- Probabilistic Aggregation: Generative modeling frameworks aggregate continuous model scores into consensus probabilities, weighting each contributor by reliability and enabling targeted optimization of misaligned LLMs via token embedding adjustment (Yuan et al., 17 Jun 2025).
- Recommendations: Future dataset construction should ensure representation of distributional plurality, encode explicit non-monotonic and context-dependent norms, integrate adversarial (contrast set) evaluation, and leverage both scenario and rationale-based value annotation. Culturally and linguistically diverse corpora (as in UniMoral, MultiTP, CMoralEval) are essential for global alignment.
- Open Challenges: Ensuring robustness to minor linguistic perturbations, closing the dynamic and pluralistic moral gap, and enabling transparency and multi-agent deliberation within model outputs remain key research frontiers.
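A rough sketch of the profile-sampling step behind DMP, as referenced above: draw a value profile from a Dirichlet distribution centered on the topic-specific empirical human value frequencies, then condition the judgment prompt on it. The concentration parameter and prompt wording below are assumptions, not the published recipe.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_value_profile(value_freqs: dict[str, float], concentration: float = 10.0) -> dict[str, float]:
    """Draw one moral value profile from a Dirichlet centered on the empirical
    human value distribution for a given topic."""
    values = list(value_freqs)
    alpha = concentration * np.array([value_freqs[v] for v in values])
    weights = rng.dirichlet(alpha)
    return dict(zip(values, weights))

def profile_prompt(dilemma: str, profile: dict[str, float]) -> str:
    """Condition the judgment prompt on the sampled profile (illustrative wording)."""
    weighted = ", ".join(f"{v} ({w:.2f})" for v, w in profile.items())
    return f"Weighing the values {weighted}, judge the following dilemma:\n{dilemma}"

profile = sample_value_profile({"care": 0.5, "fairness": 0.3, "loyalty": 0.2})
print(profile_prompt("A friend asks you to cover for their mistake at work.", profile))
```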
7. Broader Implications
The proliferation of large-scale, pluralistically annotated moral dilemma datasets enables empirical study of the value landscape underlying real-world human and machine ethical decision-making. They establish the necessary empirical grounding for the computational modeling, evaluation, and alignment of LLMs and VLMs in contexts ranging from autonomous driving to digital advice. By formalizing the complexity of distributional disagreement, value diversity, and path-dependent reasoning, such resources inform both the design of more ethically aligned AI systems and the broader scientific understanding of the structure and dynamics of human moral cognition.