- The paper demonstrates that closed-source LLMs generally outperform open-source models in idiom interpretation, while simile processing remains challenging in low-resource languages.
- It employs MABL, MAPS, and new Persian datasets along with varied prompting strategies such as zero-shot and chain-of-thought to rigorously evaluate model performance.
- The study reveals that advancing LLMs for culturally nuanced figurative language requires targeted training and broader, more diverse multilingual datasets.
Comparative Study of Multilingual Idioms and Similes in LLMs
The paper, "Comparative Study of Multilingual Idioms and Similes in LLMs," provides an in-depth analysis of how various LLMs handle figurative language across multiple languages. The research specifically focuses on similes and idioms, offering a nuanced examination of LLM performance in interpreting these forms of expression. The authors address a significant gap in the literature by investigating the comparative capabilities of LLMs in multilingual contexts, demonstrating both the strengths and limitations of these models.
Methodology and Datasets
The paper employs two existing datasets, MABL and MAPS, complemented by two newly developed Persian datasets. MABL primarily covers similes, while MAPS focuses on idioms. Together, the datasets span multiple languages, including high-resource languages such as English and low-resource languages such as Sundanese and Javanese. This diversity enables a comprehensive evaluation of both closed-source LLMs (e.g., GPT-3.5, GPT-4o mini, Gemini 1.5) and open-source models (e.g., Llama 3.1, Qwen2).
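To make the evaluation setup concrete, the sketch below shows one way such a benchmark loop could be implemented. It is a minimal illustration, not the paper's actual harness: the `Item` fields, the multiple-choice framing, and the `query_model` helper are all hypothetical stand-ins for whatever interface each model exposes.

```python
from dataclasses import dataclass

@dataclass
class Item:
    sentence: str       # figurative sentence (e.g., a simile) in its source language
    choices: list[str]  # candidate interpretations
    label: int          # index of the correct interpretation

def query_model(prompt: str) -> str:
    """Placeholder for a call to a closed- or open-source LLM."""
    raise NotImplementedError

def evaluate(items: list[Item]) -> float:
    """Return the model's accuracy on a list of multiple-choice items."""
    correct = 0
    for item in items:
        options = "\n".join(f"{i}. {c}" for i, c in enumerate(item.choices))
        prompt = (
            f"Sentence: {item.sentence}\n"
            f"Which interpretation is correct?\n{options}\n"
            "Answer with the option number only."
        )
        reply = query_model(prompt)
        # Take the first digit in the reply as the model's chosen option.
        digits = [ch for ch in reply if ch.isdigit()]
        if digits and int(digits[0]) == item.label:
            correct += 1
    return correct / len(items)
```

The same loop can be rerun per language and per model, which is how accuracy comparisons between high- and low-resource languages would typically be produced.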
Prompt Engineering and Techniques
The paper explores four prompting strategies: zero-shot, one-shot, chain-of-thought (CoT), and dialogue simulation. These techniques are applied both to native-language inputs and to inputs translated into English, as sketched below. The authors highlight that the success of CoT varies with the language and the model employed. The research further adapts these strategies to assess both literal and culturally nuanced figurative expressions, illustrating the complexity involved in processing such language.
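The sketch below illustrates how the four strategies might be templated in practice. The wording is a plausible reconstruction for illustration, not the paper's exact prompts; the translate-to-English variant would simply apply the same templates to a translated sentence.

```python
def zero_shot(sentence: str) -> str:
    # Ask directly, with no examples or reasoning scaffold.
    return f'Explain the figurative meaning of: "{sentence}"'

def one_shot(sentence: str, example: tuple[str, str]) -> str:
    # Prepend a single worked example before the target sentence.
    ex_sentence, ex_meaning = example
    return (
        f'Sentence: "{ex_sentence}"\nMeaning: {ex_meaning}\n\n'
        f'Sentence: "{sentence}"\nMeaning:'
    )

def chain_of_thought(sentence: str) -> str:
    # Elicit intermediate reasoning before the final answer.
    return (
        f'Sentence: "{sentence}"\n'
        "First identify the literal image, then reason step by step about "
        "what it conveys figuratively, and finally state the meaning."
    )

def dialogue_simulation(sentence: str) -> list[dict[str, str]]:
    # Frame the task as a conversation, e.g., a learner asking a native speaker.
    return [
        {"role": "system", "content": "You are a native speaker explaining expressions."},
        {"role": "user", "content": f'What does "{sentence}" mean here?'},
    ]
```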
Key Findings
The paper reveals several key insights:
- Model Performance: While closed-source models generally outperform open-source ones, specific open-source models demonstrate competitive performance, particularly in high-resource languages.
- Figurative Language Type: Idiom interpretation approaches saturation in many languages due to the presence of idioms in training data. In contrast, simile interpretation presents more challenges, particularly for low-resource languages.
- Prompting Strategies: CoT proves highly effective for simile interpretation, especially in smaller models. However, the success of a prompting technique varies significantly with model size, input language, and the type of figurative expression.
- Language and Cultural Nuance: LLMs struggle with idiomatic and culturally specific interpretations in low-resource and non-Latin-script languages.
Implications and Future Directions
The implications of this research are significant for both the practical deployment of LLMs and future theoretical work. Practically, the findings suggest that improving LLM performance on low-resource languages and simile interpretation requires targeted training and dataset development. Theoretically, the study opens avenues for future research into the cultural nuances of figurative language processing and into other figurative forms such as metaphor and sarcasm.
Future developments could focus on expanding the multilingual figurative language datasets to cover more languages consistently, which would facilitate better cross-linguistic comparisons. Additionally, creating more context-dependent and ambiguous test cases could push the boundaries of LLM capabilities further, ensuring models evolve to understand and generate figurative language with greater precision and cultural sensitivity.
In summary, this paper offers a rigorous comparative study that deepens our understanding of how LLMs handle complex and culturally loaded language constructs. Its results and insights provide a foundation for ongoing research and development in multilingual natural language processing.