Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
119 tokens/sec
GPT-4o
56 tokens/sec
Gemini 2.5 Pro Pro
43 tokens/sec
o3 Pro
6 tokens/sec
GPT-4.1 Pro
47 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Code-Mixer Ya Nahi: Novel Approaches to Measuring Multilingual LLMs' Code-Mixing Capabilities (2410.11079v1)

Published 14 Oct 2024 in cs.CL and cs.AI

Abstract: Multilingual LLMs have demonstrated exceptional performance in Machine Translation (MT) tasks. However, their MT abilities in the context of code-switching (the practice of mixing two or more languages in an utterance) remain under-explored. In this paper, we introduce Rule-Based Prompting, a novel prompting technique to generate code-mixed sentences. We measure and compare the code-mixed MT abilities of 3 popular multilingual LLMs: GPT-3.5-turbo, GPT-4, and Gemini Pro across five language pairs: English-{Hindi, Bengali, Gujarati, French, Spanish} using $k$-shot prompting ($k\in{0, 1, 10, 20}$) and Rule-Based Prompting. Our findings suggest that though $k$-shot prompting often leads to the best results, Rule-Based prompting shows promise in generating unique code-mixed sentences that vary in their style of code-mixing. We also use $k$-shot prompting to gauge the code-mixed to English translation abilities of multilingual LLMs. For this purpose, we create a gold-standard code-mixed dataset spanning five language pairs: English-{Hindi, Bengali, Gujarati, French, Spanish}. As a real-world application of our work, we create a code-mixed chatbot.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (3)
  1. Ayushman Gupta (2 papers)
  2. Akhil Bhogal (2 papers)
  3. Kripabandhu Ghosh (35 papers)

Summary

We haven't generated a summary for this paper yet.