Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
119 tokens/sec
GPT-4o
56 tokens/sec
Gemini 2.5 Pro Pro
43 tokens/sec
o3 Pro
6 tokens/sec
GPT-4.1 Pro
47 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

CALCS 2021 Shared Task: Machine Translation for Code-Switched Data (2202.09625v1)

Published 19 Feb 2022 in cs.CL

Abstract: To date, efforts in the code-switching literature have focused for the most part on language identification, POS, NER, and syntactic parsing. In this paper, we address machine translation for code-switched social media data. We create a community shared task. We provide two modalities for participation: supervised and unsupervised. For the supervised setting, participants are challenged to translate English into Hindi-English (Eng-Hinglish) in a single direction. For the unsupervised setting, we provide the following language pairs: English and Spanish-English (Eng-Spanglish), and English and Modern Standard Arabic-Egyptian Arabic (Eng-MSAEA) in both directions. We share insights and challenges in curating the "into" code-switching language evaluation data. Further, we provide baselines for all language pairs in the shared task. The leaderboard for the shared task comprises 12 individual system submissions corresponding to 5 different teams. The best performance achieved is 12.67% BLEU score for English to Hinglish and 25.72% BLEU score for MSAEA to English.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (5)
  1. Shuguang Chen (12 papers)
  2. Gustavo Aguilar (16 papers)
  3. Anirudh Srinivasan (9 papers)
  4. Mona Diab (71 papers)
  5. Thamar Solorio (67 papers)
Citations (14)

Summary

We haven't generated a summary for this paper yet.