The JDDC Corpus: A Large-Scale Multi-Turn Chinese Dialogue Dataset for E-commerce Customer Service (1911.09969v4)

Published 22 Nov 2019 in cs.CL, cs.AI, and cs.IR

Abstract: Human conversations are complicated, and building a human-like dialogue agent is an extremely challenging task. With the rapid development of deep learning techniques, data-driven models, which need a huge amount of real conversation data, have become more and more prevalent. In this paper, we construct a large-scale, real-scenario Chinese e-commerce conversation corpus, JDDC, with more than 1 million multi-turn dialogues, 20 million utterances, and 150 million words. The dataset reflects several characteristics of human-human conversations, e.g., being goal-driven and having long-term dependencies across the context. It also covers various dialogue types, including task-oriented, chitchat, and question-answering. Extra intent information and three well-annotated challenge sets are also provided. We then evaluate several retrieval-based and generative models to provide basic benchmark performance on the JDDC corpus. We hope JDDC can serve as an effective testbed and benefit the development of fundamental research in dialogue tasks.

Authors (8)
  1. Meng Chen (98 papers)
  2. Ruixue Liu (7 papers)
  3. Lei Shen (91 papers)
  4. Shaozu Yuan (4 papers)
  5. Jingyan Zhou (16 papers)
  6. Youzheng Wu (32 papers)
  7. Xiaodong He (162 papers)
  8. Bowen Zhou (141 papers)
Citations (56)