Moral Dilemma Dataset (MDD) Overview

Updated 7 November 2025
  • MDD is a distributional, pluralism-oriented benchmark of 1,618 authentic moral dilemmas annotated with diverse, real-world human judgments.
  • It employs a rigorous methodology by decontextualizing Reddit scenarios, applying binary labels with free-text rationales, and utilizing a 60-value taxonomy.
  • Results indicate that LLMs often default to mode-seeking and narrow value repertoires, highlighting gaps in capturing human ethical pluralism.

The Moral Dilemma Dataset (MDD) is a distributional, pluralism-oriented benchmark designed to evaluate the alignment between LLMs and the spectrum of human moral judgment in authentic, context-rich scenarios. Unlike prior datasets focused on stylized, synthetic dilemmas or single-label crowd judgments, MDD combines real-world moral ambiguity, granular distributional human annotations, and a rigorously engineered value taxonomy to probe both the outcomes and the rationales of LLM moral reasoning.

1. Composition, Source, and Construction

The MDD comprises 1,618 true-to-life moral dilemmas sourced from the r/AmITheAsshole (AITA) subreddit over a six-month period, representing diverse domains such as family conflict, workplace ethics, care obligations, and interpersonal disputes. Each scenario is fully decontextualized—rewritten (by GPT-4o-mini) to remove distinguishing AITA formatting or signals—thereby capturing an ecologically valid and hard-to-spoof distribution of real dilemmas.

For each scenario, both a succinct title and a detailed narrative body are provided, along with manually assigned topic labels (family, work, health, etc.) and available demographic cues. Scenarios were selected for inclusion only if they could not be trivially identified as originating from AITA, ensuring general-purpose utility.

2. Human Annotation and Judgment Distribution

Each dilemma is annotated with the complete set of direct community moral judgments: 51,776 binary human evaluations (Acceptable/Unacceptable) and their accompanying free-text rationales, extracted from Reddit comments. To enforce comparability, only comments expressing an explicit binary verdict ("NTA" or "YTA") were retained; ambiguous or non-binary replies (e.g., ESH, NAH, INFO) were excluded.

Unlike prior “consensus-first” datasets, MDD preserves the entire pluralistic distribution of judgments for every scenario, not just the modal outcome:

Dilemma   | # Human Votes | P(Acceptable) | P(Unacceptable) | Consensus Score
----------|---------------|---------------|-----------------|----------------
Example A | 90            | 0.74          | 0.26            | 0.74
Example B | 18            | 0.56          | 0.44            | 0.56

Consensus score (modal class proportion, 0.5 = maximal disagreement, 1.0 = unanimous) is recorded per dilemma. This allows for detailed stratification of “clean-cut” vs. “contentious” cases during model evaluation.
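
A minimal sketch of how these per-dilemma statistics can be derived from raw binary votes (the function and variable names below are illustrative and not part of the dataset release):

```python
from collections import Counter

def judgment_distribution(votes: list[int]) -> dict:
    """Summarize a dilemma's binary votes (1 = Acceptable, 0 = Unacceptable).

    Returns the empirical proportions and the consensus score, i.e. the
    modal class proportion (0.5 = maximal disagreement, 1.0 = unanimous).
    """
    counts = Counter(votes)
    n = len(votes)
    p_acceptable = counts[1] / n
    consensus = max(counts.values()) / n  # modal class proportion
    return {
        "n_votes": n,
        "p_acceptable": p_acceptable,
        "p_unacceptable": 1 - p_acceptable,
        "consensus": consensus,
    }

# Roughly reproduces Example A from the table above: 90 votes, ~74% Acceptable.
example_a = [1] * 67 + [0] * 23
print(judgment_distribution(example_a))  # consensus ≈ 0.74
```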

Each human judgment consists of:

  • A binary label: 1 (Acceptable) or 0 (Unacceptable).
  • A free-text rationale (mean length: 28.9 tokens, stdev: 11.7), providing the explicit moral reasoning behind the choice.

3. Taxonomy of Moral Values and Value Extraction

All rationales (human and model-generated) are processed with the Value Kaleidoscope system, which applies LLM-powered classification (validated by prior work) to extract explicit value terms from moral rationales. The full procedure yields:

  • 3,783 unique value expressions, mapped via semantic embedding (OpenAI text-embedding-3-large) and agglomerative clustering (a clustering sketch follows this list).
  • Manual expert curation (5 reviewers) produces a 60-value taxonomy encompassing Autonomy, Beneficence, Care, Compassion, Justice, Inclusivity, Freedom, and 53 additional fine-grained categories.
  • Each rationale yields a value profile—the set of values referenced, frequency-normalized.
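
The clustering step can be sketched as follows, assuming the value expressions have already been embedded; the distance threshold, linkage choice, and stand-in vectors are illustrative assumptions rather than the authors' exact settings:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Hypothetical input: one embedding vector per unique value expression,
# e.g. obtained from OpenAI's text-embedding-3-large (3072-dim).
value_expressions = ["honesty", "being truthful", "loyalty to family"]  # toy examples
embeddings = np.random.rand(len(value_expressions), 3072)  # stand-in for real vectors

# Agglomerative clustering with cosine distance; the threshold is an assumption
# and would be tuned (plus manual expert curation) to arrive at ~60 values.
clusterer = AgglomerativeClustering(
    n_clusters=None,
    distance_threshold=0.35,
    metric="cosine",
    linkage="average",
)
labels = clusterer.fit_predict(embeddings)

for expr, label in zip(value_expressions, labels):
    print(label, expr)
```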

For entropy analysis, value diversity per response set is calculated as:

$$H = -\sum_{k=1}^{K} p_k \log p_k$$

where $p_k$ is the frequency of value $v_k$ in either human or model rationales.
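
A minimal sketch of the entropy computation, assuming value mentions have been tallied into frequencies (the natural log follows the formula above; the base used in the source is not specified):

```python
import math
from collections import Counter

def value_entropy(value_mentions: list[str]) -> float:
    """Shannon entropy over the value-frequency distribution of a set of rationales."""
    counts = Counter(value_mentions)
    total = sum(counts.values())
    probs = [c / total for c in counts.values()]
    return -sum(p * math.log(p) for p in probs)

# Toy example: rationales invoking a wider spread of values yield higher entropy.
human_values = ["care", "autonomy", "fairness", "loyalty", "honesty", "care"]
model_values = ["care", "care", "autonomy", "care", "autonomy", "care"]
print(value_entropy(human_values) > value_entropy(model_values))  # True
```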

4. Benchmarking LLMs: Protocol

MDD is structured for benchmarking in a distributional and rationalist fashion. For every dilemma, each LLM is tasked to generate as many binary evaluation+rationale pairs as there are human judgments for that case, replicating the human sample size and diversity. The generated rationales are then evaluated both for surface-level label agreement and for value diversity and distributional alignment.
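
A sketch of the per-dilemma generation loop, assuming a generic generate() helper that wraps whichever LLM API is used; the prompt wording and the helper itself are illustrative, not taken from the benchmark release:

```python
import json

def generate(prompt: str) -> str:
    """Placeholder for an LLM call; swap in any chat-completion API here."""
    return '{"label": 1, "rationale": "The narrator communicated honestly and respected others\' time."}'

def sample_model_judgments(dilemma_text: str, n_human_judgments: int) -> list[dict]:
    """Elicit one binary label + rationale per human judgment for a dilemma."""
    prompt = (
        "Read the following moral dilemma and judge the narrator's behavior.\n"
        f"{dilemma_text}\n\n"
        'Respond in JSON: {"label": 1 for Acceptable or 0 for Unacceptable, '
        '"rationale": "one or two sentences explaining your judgment"}'
    )
    judgments = []
    for _ in range(n_human_judgments):  # match the human sample size for this dilemma
        raw = generate(prompt)
        judgments.append(json.loads(raw))
    return judgments

# Toy usage: elicit three judgments for a stand-in dilemma.
print(sample_model_judgments("A short stand-in dilemma description.", n_human_judgments=3))
```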

Prompting Regimes:

  • Zero-shot: LLMs respond without demographic or persona cues.
  • Persona-based: Demographic sampling to match community diversity.
  • Model council: Aggregated responses from multiple LLMs acting as a panel.

Distributional Alignment Metric:

For each dilemma $d_i$ with $N_i$ human judgments,

$$\Delta_i = \left| P_i^{\text{Human}}(1) - P_i^{\text{LLM}}(1) \right|$$

where $P_i^{\text{Human}}(1)$ is the empirical proportion of "Acceptable" in human judgments, and $P_i^{\text{LLM}}(1)$ is the corresponding model proportion. Mean $\Delta$ is reported across all dilemmas; lower is better.
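
Computing the metric is straightforward once human and model labels are available as lists of 0/1 values; a minimal sketch (names are illustrative):

```python
def distributional_alignment_error(human_labels: list[int], model_labels: list[int]) -> float:
    """Absolute gap between human and model proportions of 'Acceptable' (label 1)."""
    p_human = sum(human_labels) / len(human_labels)
    p_model = sum(model_labels) / len(model_labels)
    return abs(p_human - p_model)

def mean_delta(per_dilemma_pairs: list[tuple[list[int], list[int]]]) -> float:
    """Mean Δ across all dilemmas; lower is better."""
    deltas = [distributional_alignment_error(h, m) for h, m in per_dilemma_pairs]
    return sum(deltas) / len(deltas)

# Toy usage: two dilemmas with (human, model) label lists.
print(mean_delta([([1, 1, 0, 0], [1, 1, 1, 0]), ([1, 0], [0, 0])]))
```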

5. Major Findings on Model-Human Alignment

MDD establishes two orthogonal axes of evaluation:

  1. Distributional Judgment Alignment: LLMs reproduce human consensus well only when it is high; in ambiguous scenarios with discordant human judgments, LLMs default to mode-seeking, resulting in poor distributional alignment even for best-in-class models (council $\Delta = 0.22$). Standard prompting does not capture humanlike pluralism in ambiguous cases.
  2. Value Diversity Gap: Model rationales rely on a much narrower palette of moral values. Human rationales produce an entropy of $H_{\text{human}} = 0.57$, with the top 10 values accounting for only 35.2% of value mentions. LLMs, by contrast, reach an entropy of $H_{\text{model}} = 0.37$ (standard regime), with the top 10 values covering 81.6% of mentions, concentrating on Autonomy, Care, and similar platitudes while rarely reflecting the marginalized or situational values prominent in human justification.

A key intervention, Dynamic Moral Profiling (DMP), which conditions model responses on human value profiles sampled from a Dirichlet prior, boosts both alignment (a 64.3% reduction in mean error, especially in low-consensus cases) and value entropy ($H_{\text{DMP}} = 0.52$), yielding greater coverage of mid- and low-frequency values.
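
The conditioning step of DMP can be sketched as follows; the Dirichlet concentration parameters, the way the sampled profile is verbalized in the prompt, and the value names used are assumptions for illustration, not the published implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical empirical value frequencies from human rationales (a subset of the 60-value taxonomy).
value_names = ["Autonomy", "Care", "Justice", "Loyalty", "Honesty"]
human_counts = np.array([120, 95, 60, 25, 40], dtype=float)

def sample_value_profile(alpha_scale: float = 1.0) -> dict[str, float]:
    """Draw one value profile from a Dirichlet prior fit to human value frequencies."""
    alpha = alpha_scale * human_counts / human_counts.sum()
    weights = rng.dirichlet(alpha)
    return dict(zip(value_names, weights))

def profile_to_prompt_prefix(profile: dict[str, float], top_k: int = 3) -> str:
    """Verbalize the sampled profile as a conditioning prefix for the LLM prompt."""
    top = sorted(profile, key=profile.get, reverse=True)[:top_k]
    return ("When judging this dilemma, reason like someone who especially values "
            + ", ".join(top) + ".")

print(profile_to_prompt_prefix(sample_value_profile()))
```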

6. Methodological Advantages and Applications

MDD enables pluralism-sensitive, real-world grounded benchmarking for LLMs, providing:

  • Direct distributional comparison metrics for models’ ability to reflect true human heterogeneity.
  • Rationalist value-profile assessment, enabling analysis of which values LLMs invoke or neglect in their justifications.
  • An ecologically valid test bed for interventions such as DMP or theory-driven prompting (e.g., Moral Foundations Theory filtering).

MDD is well-positioned for diagnosing risks of monist/majoritarian alignment, for testing LLMs intended for advisory roles, and for evaluating the effectiveness of context- or value-conditioning techniques at surfacing humanlike diversity in automated ethical advice.

7. Comparison and Positioning within Moral Judgment Datasets

Compared to prior datasets:

Dataset         | Scale                     | Dilemma Source              | Label Distribution | Value Taxonomy    | Free-text Rationales
----------------|---------------------------|-----------------------------|--------------------|-------------------|---------------------
MDD (this work) | 1.6k                      | Decontextualized real-world | Full, per-dilemma  | 60 values         | Yes
Scruples        | 32k+                      | Reddit (AITA)               | Full, multiclass   | No                | No
Moral Machine   | 40M                       | Synthetic AV                | Aggregate AMCE     | None              | No
ETHICS          | 130k                      | Synthetic/fictional         | Binary             | Theory-mapped     | No
UniMoral        | 194–294/scenario × 6 lang | Psych + Reddit              | Per-annotator      | 4 principle types | Yes

MDD’s design is unique for capturing both the judgment distribution and the diversity of rationales as produced by naturalistic communities, operationalizing value pluralism as a concrete evaluation axis (Russo et al., 23 Jul 2025).

8. Implications and Future Directions

The MDD reveals that distributional human alignment and value pluralism are not realized by default in even the strongest LLMs, underscoring the need for explicit, data-driven conditioning if LLMs are to play credible roles in sensitive, ethically-charged contexts. Pluralistic distributional benchmarks, such as MDD, provide a more stringent and informative test of moral alignment than single-label or stylized datasets.

The data and methodology demonstrate the inadequacy of majority-vote metrics for capturing human-moral complexity and provide a scalable, extensible foundation for future research on value alignment, model-centric pluralism interventions, and pluralistic explainability in AI moral reasoning.

