
ViWOZ: A Multi-Domain Task-Oriented Dialogue Systems Dataset For Low-resource Language (2203.07742v1)

Published 15 Mar 2022 in cs.CL and cs.AI

Abstract: Most current task-oriented dialogue (ToD) systems, despite showing interesting results, are designed for a handful of languages such as Chinese and English. Their performance in low-resource languages therefore remains a significant problem, owing to the absence of a standard dataset and evaluation policy. To address this problem, we propose ViWOZ, a fully annotated Vietnamese task-oriented dialogue dataset. ViWOZ is the first multi-turn, multi-domain task-oriented dataset in Vietnamese, a low-resource language. The dataset consists of 5,000 dialogues comprising 60,946 fully annotated utterances. Furthermore, we provide a comprehensive benchmark of both modular and end-to-end models in low-resource language scenarios. With these characteristics, the ViWOZ dataset enables future studies on creating multilingual task-oriented dialogue systems.

Authors (5)
  1. Phi Nguyen Van
  2. Tung Cao Hoang
  3. Dung Nguyen Manh
  4. Quan Nguyen Minh
  5. Long Tran Quoc
Citations (2)
