An Analysis of PLATO-XL: A Large-Scale Pre-Trained Model for Dialogue Generation
The paper "PLATO-XL: Exploring the Large-scale Pre-training of Dialogue Generation" explores the development and evaluation of PLATO-XL, a dialogue generation model with 11 billion parameters. This model is trained across English and Chinese social media conversations using a unified transformer architecture. Its design is characterized by an emphasis on computational and parameter efficiency, alongside a technique termed multi-party aware pre-training. The core objective of PLATO-XL is to achieve heightened performance in dialogue generation tasks, extending across open-domain chitchat and other conversational scenarios like knowledge-grounded dialogue and task-oriented conversation.
Model and Training
The authors underscore PLATO-XL's architecture: a unified transformer (also known as PrefixLM) rather than the traditional encoder-decoder setup, a choice that improves both computational and parameter efficiency because a single network handles context understanding and response generation. Through multi-party aware pre-training, which attaches role embeddings to the input, the model can distinguish the different speakers in the multi-turn, multi-party conversations ubiquitous on social media. Pre-training minimizes the negative log-likelihood of the target response given its context, grounding the model's capability to understand a conversation and generate a relevant continuation.
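To make these two ideas concrete, the sketch below reconstructs them in PyTorch: a prefix-LM attention mask in which context tokens attend bidirectionally while response tokens are decoded causally, and role embeddings added to token embeddings so the model can tell speakers apart. This is an illustrative reconstruction, not the authors' code; all names and dimensions here are assumptions.

```python
import torch

def build_prefix_lm_mask(context_len: int, response_len: int) -> torch.Tensor:
    """Boolean mask where True means 'position i may attend to position j'."""
    total = context_len + response_len
    mask = torch.zeros(total, total, dtype=torch.bool)
    # Every position attends bidirectionally to the full context prefix.
    mask[:, :context_len] = True
    # Response positions attend causally to themselves and earlier responses.
    mask[context_len:, context_len:] = torch.tril(
        torch.ones(response_len, response_len)
    ).bool()
    return mask

class RoleAwareEmbedding(torch.nn.Module):
    """Token embeddings plus role embeddings (multi-party awareness)."""

    def __init__(self, vocab_size: int, num_roles: int, d_model: int):
        super().__init__()
        self.token_emb = torch.nn.Embedding(vocab_size, d_model)
        # Role 0 could mark the target speaker (the bot); other ids mark
        # the remaining participants of the multi-party thread.
        self.role_emb = torch.nn.Embedding(num_roles, d_model)

    def forward(self, token_ids: torch.Tensor,
                role_ids: torch.Tensor) -> torch.Tensor:
        return self.token_emb(token_ids) + self.role_emb(role_ids)
```

With this mask, the same set of transformer parameters serves both understanding (the bidirectional prefix) and generation (the causal suffix), which is where the parameter savings over an encoder-decoder come from.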
The training regimen for PLATO-XL is substantial, drawing an extensive corpus from Reddit for English and from various social media platforms for Chinese, yielding 811M and 1.2B (context, response) samples respectively. This scale of data, paired with the multi-party aware pre-training approach, positions PLATO-XL to absorb the nuances of human interaction and to mitigate common failure modes of dialogue systems, including inconsistent and hallucinated responses.
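The paper does not publish its data pipeline, but the basic construction of (context, response) samples from threaded social media can be sketched: every root-to-node path through a comment tree yields one training pair. The Comment structure and traversal below are assumptions for illustration, not the paper's exact preprocessing.

```python
from dataclasses import dataclass, field

@dataclass
class Comment:
    author: str
    text: str
    replies: list["Comment"] = field(default_factory=list)

def extract_samples(root: Comment) -> list[tuple[list[str], str]]:
    """Return (context, response) pairs from every root-to-node path."""
    samples = []

    def walk(node: Comment, context: list[str]) -> None:
        if context:  # a response needs at least one preceding turn
            samples.append((list(context), node.text))
        for child in node.replies:
            walk(child, context + [node.text])

    walk(root, [])
    return samples
```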
Evaluation and Results
PLATO-XL's performance is assessed through both self-chat and human-bot evaluations. In the self-chat setting, where a model autonomously generates both sides of a dialogue, PLATO-XL outperforms other models on coherence, informativeness, and engagingness, and exhibits notably fewer hallucinations and inconsistencies. Comparisons include DialoGPT, Blender, and the prior iteration PLATO-2, against which PLATO-XL consistently achieves superior results.
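The self-chat protocol itself is simple to express: the same model plays both speakers in turn on a shared history, and human annotators later score the resulting transcripts. A minimal sketch, assuming a generic generate_response callable rather than any specific API:

```python
from typing import Callable

def self_chat(generate_response: Callable[[list[str]], str],
              seed_utterance: str,
              num_turns: int = 10) -> list[str]:
    """Let one model play both speakers, alternating on a shared history."""
    transcript = [seed_utterance]
    while len(transcript) < num_turns:
        # The model conditions on the full history; whose "turn" it is
        # follows from the parity of the history length.
        transcript.append(generate_response(transcript))
    return transcript
```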
In human-bot interactive evaluations, PLATO-XL also surpasses several commercial conversational agents in Chinese, such as Microsoft XiaoIce and Turing Robot. These evaluations reinforce the model's capabilities in maintaining engagement and generating coherent, contextually relevant dialogues.
Exploration of Additional Tasks
Beyond open-domain chitchat, the paper extends PLATO-XL's evaluation to knowledge-grounded dialogue and task-oriented scenarios. In these settings, PLATO-XL achieves state-of-the-art results on tasks such as DuConv (Chinese knowledge-grounded dialogue) and MultiWOZ 2.2 (task-oriented dialogue), showcasing adaptability and robustness across a spectrum of conversational AI tasks.
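One plausible reason this transfer is straightforward is that grounding knowledge can be serialized directly into the input context, letting the same generation objective be reused unchanged. The sketch below illustrates one such serialization; the separator tokens and field layout are assumptions for illustration, not the paper's exact format.

```python
def build_grounded_input(knowledge: str, history: list[str]) -> str:
    """Serialize background knowledge ahead of the dialogue context."""
    turns = " [SEP] ".join(history)
    return f"[KNOWLEDGE] {knowledge} [SEP] {turns}"

prompt = build_grounded_input(
    knowledge="The Eiffel Tower was completed in 1889.",
    history=["Do you know when the Eiffel Tower was built?"],
)
# -> "[KNOWLEDGE] The Eiffel Tower was completed in 1889. [SEP] Do you ..."
```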
Implications and Future Directions
The implications of PLATO-XL's advancements are twofold. Practically, the model establishes a foundation for more nuanced and responsive conversational AI applications, capable of sustaining complex, multi-turn conversations with fewer errors. Theoretically, the success of PLATO-XL demonstrates the efficacy of scaling transformers together with tailored pre-training strategies, advancing the capabilities of dialogue systems.
Future work may focus on continuing the model's expansion in conversational diversity and utility, possibly integrating cross-lingual capabilities and further enhancing its comprehension and response mechanisms. Additionally, addressing ethical considerations, such as bias and the generation of unsafe content, remains pivotal, guiding the development of responsible and reliable dialogue systems.
In conclusion, PLATO-XL represents a significant stride in dialogue model pre-training, combining large-scale datasets with high-capacity architectures and strategic training approaches to address contemporary challenges in conversational AI.