
Moral Decision Distance in LLMs

Updated 5 December 2025
  • Moral Decision Distance (MDD) is a metric that measures ethical divergence between LLM and human judgments along nine canonical AMCE dimensions.
  • It quantifies differences as the Euclidean distance between AMCE vectors, enabling comparisons of proprietary and open-source models in trolley-style dilemmas.
  • Empirical findings indicate that larger models achieve lower MDDs, suggesting closer alignment with human moral preferences in autonomous driving.

Moral Decision Distance (MDD) quantifies the divergence between the moral judgments of LLMs and an aggregate representation of human preferences. Operationalized as the Euclidean ($L_2$) distance between two nine-dimensional AMCE (Average Marginal Component Effect) vectors—one extracted from a given LLM, the other from the Moral Machine human benchmark—MDD provides a clear metric for comparing machine models’ ethical decision-making in the context of autonomous driving, especially in trolley-style dilemmas. The MDD framework, extensively evaluated across 52 proprietary and open-source LLMs, enables systematic assessment of how model size, architecture, and training revisions impact ethical alignment with humans (Ahmad et al., 11 Nov 2024).

1. Mathematical Definition of Moral Decision Distance

Let $i = 1, \dots, 9$ index nine canonical moral preference dimensions: Species, Social Status, Relation to AV, Number of Lives, Law, Intervention, Gender, Fitness, and Age. For each model $m$, its AMCE vector is $\boldsymbol{\beta}^{(m)} = (\beta_1^{(m)}, \ldots, \beta_9^{(m)})$; the human aggregate is $\boldsymbol{\beta}^{(H)}$. The Moral Decision Distance is defined as

$$\mathrm{MDD}(m) = \left\|\boldsymbol{\beta}^{(m)} - \boldsymbol{\beta}^{(H)}\right\|_2 = \sqrt{\sum_{i=1}^{9} \left(\beta_i^{(m)} - \beta_i^{(H)}\right)^2}.$$

AMCE components are bounded within $[-1, 1]$, so MDD ranges from $0$ (perfect alignment) to $6$ (complete preference reversal: each of the nine components can differ by at most $2$, and $\sqrt{9 \cdot 2^2} = 6$).
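
As a concrete illustration, the computation reduces to a single vector norm. The sketch below uses hypothetical AMCE values (ordered as in the dimension list above); it is not the authors' code.

```python
import numpy as np

# Hypothetical AMCE vectors over the nine dimensions (Species, Social Status,
# Relation to AV, Number of Lives, Law, Intervention, Gender, Fitness, Age).
beta_model = np.array([0.35, 0.10, 0.20, 0.55, 0.30, 0.15, 0.05, 0.10, 0.40])
beta_human = np.array([0.40, 0.15, 0.25, 0.50, 0.35, 0.20, 0.10, 0.15, 0.45])

def mdd(beta_m, beta_h):
    """Moral Decision Distance: Euclidean (L2) distance between AMCE vectors."""
    return float(np.linalg.norm(beta_m - beta_h))

print(mdd(beta_model, beta_human))  # -> 0.15 for these illustrative values
```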

2. Conjoint Analysis and Extraction of AMCEs

The assessment of MDD relies on a standard conjoint-analysis framework (Hainmueller et al. 2014), as adopted in the original Moral Machine paper. Each scenario is defined by nine binary attributes, which are dummy-coded to produce a design matrix $X$. For each LLM and scenario $k$, the binary model response $Y_k^{(m)}$ (prefer Case 1 or Case 2) is regressed nonparametrically on these attributes. The AMCE for dimension $i$ is extracted as

$$\beta_i^{(m)} \approx \mathbb{E}\left[Y^{(m)} \mid X_i = 1\right] - \mathbb{E}\left[Y^{(m)} \mid X_i = 0\right],$$

averaged over attribute randomization. The human AMCE vector $\boldsymbol{\beta}^{(H)}$ is computed analogously. As all AMCEs are differences in choice probabilities, no further normalization is necessary.
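
Under full attribute randomization, this difference-in-means estimator is straightforward to implement. The following sketch uses simulated conjoint data; the design matrix `X` and responses `Y` are hypothetical placeholders for dummy-coded scenarios and parsed LLM choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical conjoint data: each row of X dummy-codes the nine binary
# scenario attributes; Y holds the parsed binary LLM choices (1 = Case 1).
n_scenarios, n_dims = 1000, 9
X = rng.integers(0, 2, size=(n_scenarios, n_dims))
Y = rng.integers(0, 2, size=n_scenarios)  # placeholder responses

def amce(X, Y):
    """Difference-in-means AMCE per dimension:
    E[Y | X_i = 1] - E[Y | X_i = 0], valid under full randomization."""
    return np.array([Y[X[:, i] == 1].mean() - Y[X[:, i] == 0].mean()
                     for i in range(X.shape[1])])

beta_model = amce(X, Y)  # nine AMCE components, each in [-1, 1]
```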

3. Scaling, Normalization, and Interpretability

No post hoc normalization is applied beyond the inherent AMCE scaling to $[-1, 1]$ per dimension; consequently, all computed MDDs are strictly comparable across models and attributes. The direct interpretation is preserved: an MDD of zero indicates exact replication of human aggregate behavior, while higher values denote increasing divergence.

4. Empirical Distribution Across LLMs

MDD was evaluated for 52 systems: 51 LLMs spanning proprietary families (GPT, Claude, Gemini) and open-source families (Llama, Gemma), plus the human benchmark. The paper visualizes the empirical distribution with violin-and-box plots; the medians and ranges are summarized below:

| Model family               | Median MDD | Min–Max MDD |
|----------------------------|------------|-------------|
| Proprietary (e.g., GPT-4)  | 0.9        | 0.6–1.2     |
| Open-source (all)          | 1.2        | varies      |
| Open-source (>10B)         | 0.9        | comparable  |

Large open-source models (>10B parameters) are statistically indistinguishable from proprietary models in terms of MDD (Wilcoxon $p = 0.93$), indicating convergence in human alignment at sufficient parameter scale. Smaller open-source models exhibit larger distances, indicating weaker moral alignment.
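
A minimal sketch of such a comparison, assuming the reported test is the rank-sum (Mann–Whitney U) variant for independent groups; the per-model MDD samples below are hypothetical.

```python
from scipy.stats import mannwhitneyu

# Hypothetical per-model MDD samples; the paper reports p = 0.93 for
# large (>10B) open-source vs. proprietary models.
mdd_proprietary = [0.62, 0.78, 0.85, 0.91, 1.05, 1.18]
mdd_open_large = [0.70, 0.81, 0.88, 0.95, 1.02]

stat, p = mannwhitneyu(mdd_proprietary, mdd_open_large, alternative="two-sided")
print(f"U = {stat:.1f}, p = {p:.2f}")  # p near 1 => groups indistinguishable
```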

5. Relationship Between Model Size and Moral Alignment

Analysis of 25 open-source models with known parameter counts reveals a significant negative correlation between MDD and log model size (Spearman $\rho = -0.50$, $p = 0.018$). The relationship is succinctly approximated as

$$\mathrm{MDD}(m) \approx 1.5 - 0.10 \log_{10}(\mathrm{Params}_m),$$

though emphasis is placed on the rank correlation rather than strict linearity. This suggests scaling open-source LLMs naturally yields greater alignment with human aggregate judgments, up to a practical threshold.
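
A sketch of this analysis with hypothetical (parameter count, MDD) pairs; `spearmanr` is rank-based, so applying it to raw or log-scaled parameter counts yields the same coefficient.

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical (parameters in billions, MDD) pairs; the paper reports
# rho = -0.50, p = 0.018 over 25 open-source models.
params_b = np.array([1, 3, 7, 7, 13, 13, 34, 70, 70])
mdd_vals = np.array([1.55, 1.40, 1.35, 1.25, 1.05, 0.95, 0.90, 0.80, 0.75])

rho, p = spearmanr(np.log10(params_b), mdd_vals)  # log is optional (rank-based)
print(f"rho = {rho:.2f}, p = {p:.3f}")

# The rough linear approximation from the text, assuming Params_m is measured
# in billions (consistent with Llama 7B -> MDD around 1.4):
predicted = 1.5 - 0.10 * np.log10(params_b)
```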

6. Implications for System Design and Moral Alignment

MDD magnitudes are directly informative: values in the $0.6$–$0.8$ range indicate LLMs whose aggregate preferences closely match human preferences in trolley-style dilemmas. MDD values above $1.2$–$1.5$ signal systematic divergence, such as over-prioritization of specific ethical principles (e.g., always maximizing lives saved, or an excessive preference for pedestrians).

System design in autonomous driving must navigate a trade-off: low MDD (near-human moral alignment) is typically achieved by high-parameter models, which demand significant computational resources (e.g., GPT-4, at hundreds of billions of parameters, with MDD $\sim 0.6$). Conversely, deploying smaller models (e.g., Llama 7B, MDD $\approx 1.4$) can significantly reduce latency and hardware cost, but at the expense of human-aligned ethical behavior. Empirically, models in the 10–70B range achieve MDD $\approx$ 0.7–0.9, establishing a practical balance between alignment and system efficiency.
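
One way to operationalize this trade-off is to deploy the smallest model that satisfies an alignment budget. The sketch below is illustrative only; the candidate models and the budget value are hypothetical, not from the paper.

```python
# Hypothetical candidates: (name, parameters in billions, MDD).
candidates = [
    ("small-7b", 7, 1.40),
    ("mid-13b", 13, 0.90),
    ("large-70b", 70, 0.75),
    ("frontier", 500, 0.60),
]

MDD_BUDGET = 0.9  # maximum acceptable divergence from human preferences

# Among models meeting the alignment budget, pick the smallest (cheapest).
eligible = [c for c in candidates if c[2] <= MDD_BUDGET]
best = min(eligible, key=lambda c: c[1])
print(best)  # -> ('mid-13b', 13, 0.9)
```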

7. Context, Limitations, and Applications

MDD provides a quantitative, interpretable metric for evaluating LLMs in ethically critical tasks, especially for integration into autonomous vehicles. The procedure’s reliance on global human judgment aggregates and nine canonical moral axes enables principled cross-model and cross-iteration comparison. However, model updates do not necessarily reduce MDD, and LLMs can still over- or under-weight specific moral dimensions. A plausible implication is that cultural context and scenario diversity must be considered carefully when using MDD for deployment decisions, since the metric formalizes only divergence from an averaged global human stance, not local or individualized norms. Comprehensive use of MDD supports both benchmarking and practical trade-off analyses in moral-AI system engineering (Ahmad et al., 11 Nov 2024).
