
Bootstrapping Multilingual Semantic Parsers using Large Language Models (2210.07313v2)

Published 13 Oct 2022 in cs.CL and cs.LG

Abstract: Despite the cross-lingual generalization demonstrated by pre-trained multilingual models, the translate-train paradigm of transferring English datasets across multiple languages remains a key mechanism for training task-specific multilingual models. However, for many low-resource languages, building a reliable translation service requires significant amounts of costly human-annotated translation pairs. Further, translation services may remain brittle due to the domain mismatch between task-specific input text and the general-purpose text used to train translation models. For multilingual semantic parsing, we demonstrate the effectiveness and flexibility offered by LLMs for translating English datasets into several languages via few-shot prompting. Through extensive comparisons on two public datasets, MTOP and MASSIVE, spanning 50 languages and several domains, we show that our method of translating data using LLMs outperforms a strong translate-train baseline on 41 out of 50 languages. We study the key design choices that enable more effective multilingual data translation via prompted LLMs.
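The few-shot prompting approach the abstract describes can be sketched as follows: a handful of human-translated exemplar pairs are formatted into a prompt so that an LLM can translate task-specific English utterances into a target language. This is a minimal illustrative sketch; the exemplar pairs, placeholder translations, and prompt layout here are assumptions, not the paper's actual template.

```python
def build_translation_prompt(exemplars, source_text, target_language):
    """Format (English, translation) exemplar pairs into a few-shot
    translation prompt for an LLM. The layout is a hypothetical example,
    not the template used in the paper."""
    lines = [f"Translate English to {target_language}."]
    for english, translated in exemplars:
        lines.append(f"English: {english}")
        lines.append(f"{target_language}: {translated}")
    # The new utterance to translate goes last, leaving the target-language
    # slot empty for the LLM to complete.
    lines.append(f"English: {source_text}")
    lines.append(f"{target_language}:")
    return "\n".join(lines)

# Hypothetical exemplars for English -> Hindi; real usage would supply
# human-annotated translations in place of the placeholders.
exemplars = [
    ("set an alarm for 7 am", "<hindi translation 1>"),
    ("play some jazz music", "<hindi translation 2>"),
]
prompt = build_translation_prompt(exemplars, "remind me to call mom", "Hindi")
print(prompt)
```

In this setup, the resulting prompt string would be sent to an LLM, whose completion is taken as the translated utterance; repeating this over an English dataset yields the translated training data used in place of a conventional translation service.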

Authors (6)
  1. Abhijeet Awasthi (12 papers)
  2. Nitish Gupta (27 papers)
  3. Bidisha Samanta (14 papers)
  4. Shachi Dave (12 papers)
  5. Sunita Sarawagi (46 papers)
  6. Partha Talukdar (51 papers)
Citations (6)