
Moral Reasoning Across Languages: The Critical Role of Low-Resource Languages in LLMs (2504.19759v1)

Published 28 Apr 2025 in cs.CL

Abstract: In this paper, we introduce the Multilingual Moral Reasoning Benchmark (MMRB) to evaluate the moral reasoning abilities of LLMs across five typologically diverse languages and three levels of contextual complexity: sentence, paragraph, and document. Our results show moral reasoning performance degrades with increasing context complexity, particularly for low-resource languages such as Vietnamese. We further fine-tune the open-source LLaMA-3-8B model using curated monolingual data for alignment and poisoning. Surprisingly, low-resource languages have a stronger impact on multilingual reasoning than high-resource ones, highlighting their critical role in multilingual NLP.

Summary


The paper "Moral Reasoning Across Languages: The Critical Role of Low-Resource Languages in LLMs" introduces an evaluative framework for understanding the multilingual moral reasoning capabilities of LLMs. It presents the Multilingual Moral Reasoning Benchmark (MMRB), a collection of datasets designed to assess moral reasoning across five typologically diverse languages (English, Chinese, Russian, Vietnamese, and Indonesian) at three levels of contextual complexity: sentence, paragraph, and document.

Key Findings

  1. Performance Variation Across Languages: The paper reveals significant inconsistencies in moral reasoning performance across languages. Models tend to perform better in high-resource languages such as English than in low-resource languages like Vietnamese and Indonesian. This underscores the disparity in multilingual NLP, where English dominates both data availability and performance metrics.
  2. Impact of Context Complexity: Moral reasoning performance degrades as contextual complexity increases from sentences to paragraphs. At the document level, however, the trend reverses: documents provide structured ethical frameworks, suggesting that explicit moral guidance aids performance despite the longer context.
  3. Influence of Low-Resource Languages: A pivotal finding is the disproportionate influence of low-resource languages on multilingual reasoning. Fine-tuning LLMs with high-quality monolingual data in these languages yielded stronger cross-linguistic alignment improvements than using data from high-resource languages. In parallel, data poisoning experiments showed that corrupted datasets in low-resource languages degrade model performance more severely, exposing a corresponding vulnerability.
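The data-poisoning side of this setup can be illustrated with a small data-preparation sketch. The function name, record schema, and poisoning fraction below are hypothetical illustrations, not the paper's exact implementation; the core idea is simply flipping a controlled fraction of gold moral-judgment labels in a monolingual corpus before fine-tuning.

```python
import random

def poison_labels(examples, fraction, seed=0):
    """Flip the moral-judgment label on a random `fraction` of examples.

    Each example is a dict with a binary "gold" label ("moral"/"immoral").
    Returns a new list; the input corpus is left untouched.
    """
    rng = random.Random(seed)
    poisoned = [dict(e) for e in examples]
    k = int(len(poisoned) * fraction)
    for i in rng.sample(range(len(poisoned)), k):
        poisoned[i]["gold"] = "immoral" if poisoned[i]["gold"] == "moral" else "moral"
    return poisoned

# Toy corpus: 10 Vietnamese examples, poison 30% of the labels
corpus = [{"language": "vi", "text": f"scenario {i}", "gold": "moral"} for i in range(10)]
poisoned = poison_labels(corpus, fraction=0.3)
num_flipped = sum(a["gold"] != b["gold"] for a, b in zip(corpus, poisoned))
# num_flipped == 3
```

Fine-tuning the model on the clean versus poisoned split, then re-running the benchmark, isolates how much a single language's data quality moves cross-lingual performance.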

Model Evaluation

The paper evaluates five notable LLMs (GPT-4, GPT-3.5, LLaMA3-70B, LLaMA3-8B, and Mixtral-8x7B) using the MMRB. GPT-4 consistently achieves the strongest performance across most scenarios. In contrast, LLaMA3-8B struggles with reasoning in shorter contexts that lack explicit guidance, but improves at the document level, where structured moral principles are available.
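The per-language, per-context comparison behind these results can be sketched as a simple accuracy aggregation. The record fields and label values below are hypothetical stand-ins for the benchmark's actual schema:

```python
from collections import defaultdict

def accuracy_by_group(records):
    """Compute accuracy for each (language, context_level) bucket."""
    correct, total = defaultdict(int), defaultdict(int)
    for r in records:
        key = (r["language"], r["context_level"])
        total[key] += 1
        correct[key] += r["prediction"] == r["gold"]  # True counts as 1
    return {k: correct[k] / total[k] for k in total}

# Toy predictions from one model on a handful of MMRB-style items
records = [
    {"language": "en", "context_level": "sentence",  "prediction": "moral",   "gold": "moral"},
    {"language": "en", "context_level": "paragraph", "prediction": "immoral", "gold": "moral"},
    {"language": "vi", "context_level": "sentence",  "prediction": "moral",   "gold": "immoral"},
    {"language": "vi", "context_level": "sentence",  "prediction": "moral",   "gold": "moral"},
]
scores = accuracy_by_group(records)
# scores[("en", "sentence")] == 1.0, scores[("vi", "sentence")] == 0.5
```

Comparing these buckets across models and context levels is what surfaces both the high-resource/low-resource gap and the sentence-to-paragraph degradation.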

Discussion

The implications of these findings resonate both theoretically and practically within AI research and application:

  • Data Quality and Diversity: Data quality in low-resource languages matters greatly. Diverse, culturally grounded data can significantly improve the accuracy and fairness of LLMs in multilingual settings, which calls for increased focus on enhancing both the quality and the quantity of data in underrepresented languages.
  • Cross-Lingual Transfer: The paper challenges existing assumptions about cross-lingual transfer, suggesting that less represented languages might fill substantial gaps in model understanding, provided they are backed by high-quality data.
  • Ethical Considerations: Consistent moral reasoning across languages is crucial to mitigating biases and ethical inconsistencies in AI systems deployed globally. This work emphasizes the need for ongoing vigilance in data curation and ethical training across multilingual datasets.

Conclusions

The research underscores the need for robust methodologies for evaluating and enhancing multilingual moral reasoning in LLMs. By introducing the MMRB and demonstrating the critical role of low-resource languages, the paper lays the groundwork for further exploration of ethical AI development. Future work should refine fine-tuning strategies and expand datasets to capture the cultural and linguistic diversity essential for fair and equitable AI.