BiToD: A Bilingual Multi-Domain Dataset For Task-Oriented Dialogue Modeling (2106.02787v1)

Published 5 Jun 2021 in cs.CL

Abstract: Task-oriented dialogue (ToD) benchmarks provide an important avenue to measure progress and develop better conversational agents. However, existing datasets for end-to-end ToD modeling are limited to a single language, hindering the development of robust end-to-end ToD systems for multilingual countries and regions. Here we introduce BiToD, the first bilingual multi-domain dataset for end-to-end task-oriented dialogue modeling. BiToD contains over 7k multi-domain dialogues (144k utterances) with a large and realistic bilingual knowledge base. It serves as an effective benchmark for evaluating bilingual ToD systems and cross-lingual transfer learning approaches. We provide state-of-the-art baselines under three evaluation settings (monolingual, bilingual, and cross-lingual). The analysis of our baselines in different settings highlights 1) the effectiveness of training a bilingual ToD system compared to two independent monolingual ToD systems, and 2) the potential of leveraging a bilingual knowledge base and cross-lingual transfer learning to improve system performance under low-resource conditions.

Authors (8)
  1. Zhaojiang Lin (45 papers)
  2. Andrea Madotto (64 papers)
  3. Genta Indra Winata (94 papers)
  4. Peng Xu (357 papers)
  5. Feijun Jiang (13 papers)
  6. Yuxiang Hu (25 papers)
  7. Chen Shi (55 papers)
  8. Pascale Fung (150 papers)
Citations (59)