Analyzing Text-to-Text Pre-Training for Data-to-Text Tasks
The paper "Text-to-Text Pre-Training for Data-to-Text Tasks" by Mihir Kale and Abhinav Rastogi puts forth a paper on utilizing the pre-train + fine-tune paradigm specifically for data-to-text generation tasks. The research highlights the advantages of employing the T5 model architecture, released by Raffel et al. (2019), surpassing traditional neural architectures in terms of performance and robustness.
The central focus of the paper is the application of T5's text-to-text transfer learning to data-to-text tasks across a variety of benchmarks, including MultiWOZ, ToTTo, and WebNLG. These tasks involve converting structured inputs such as tables, knowledge graphs, and dialogue meaning representations into coherent natural language text. The findings are significant: fine-tuning T5 not only yields superior results across these benchmarks but also generalizes better to out-of-domain inputs than alternatives based on BERT and GPT-2.
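To make the task concrete, the sketch below shows how a tiny graph of subject-predicate-object triples might be flattened into the single input string a text-to-text model consumes, paired with its target sentence. The triples, the separator tokens, and the `linearize_triples` helper are illustrative assumptions, not the benchmarks' exact preprocessing.

```python
# Illustrative sketch of the data-to-text setup: a small RDF-style graph is
# flattened into a plain string paired with a target sentence. The triples,
# separator tokens, and helper name are hypothetical, not the benchmarks'
# exact preprocessing.

def linearize_triples(triples):
    """Join (subject, predicate, object) triples into one flat input string."""
    return " ".join(
        f"<subject> {s} <predicate> {p} <object> {o}" for s, p, o in triples
    )

triples = [
    ("Alan Turing", "birthPlace", "Maida Vale"),
    ("Alan Turing", "field", "computer science"),
]
source = linearize_triples(triples)
target = "Alan Turing, who was born in Maida Vale, worked in computer science."

print(source)
# <subject> Alan Turing <predicate> birthPlace <object> Maida Vale
# <subject> Alan Turing <predicate> field <object> computer science
```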
Key Contributions and Results
- End-to-End Transformer Performance: The research demonstrates that, with pre-training, a simple end-to-end transformer can outperform sophisticated multi-stage pipelined architectures as well as graph neural networks. For instance, on the WebNLG benchmark, the T5-Large model achieved 57.1 BLEU, outperforming existing models, including the then state-of-the-art DualEnc, by 4.3 BLEU.
- Generalization to Out-of-Domain Inputs: A remarkable aspect of T5's performance is its ability to adapt to unseen inputs effectively. On the WebNLG dataset’s unseen test set, T5 achieved a substantial improvement of 14 BLEU over the DualEnc model, indicating robust out-of-domain generalization capabilities.
- Comparison with Other Models: Across datasets, the T5 models consistently outperformed other pre-trained baselines such as BERT-to-BERT and SC-GPT2. For instance, on the ToTTo test set, T5-3B improved upon the BERT-to-BERT baseline by 5.5 BLEU and 5.8 PARENT, underscoring the efficacy of text-to-text pre-training; a short sketch of a BLEU evaluation recipe follows this list.
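Because the reported gains are expressed in BLEU (and PARENT for ToTTo), the following is a minimal sketch of how corpus-level BLEU is typically computed with the sacrebleu library; the hypothesis and reference sentences are invented placeholders, and the paper's numbers come from the benchmarks' official evaluation scripts.

```python
# Generic BLEU evaluation sketch using the sacrebleu library (pip install sacrebleu).
# The hypothesis/reference sentences are invented placeholders; the paper's
# reported scores come from the official benchmark evaluation pipelines.
import sacrebleu

hypotheses = [
    "Alan Turing was born in Maida Vale and worked in computer science.",
]
references = [
    ["Alan Turing, born in Maida Vale, was a computer scientist."],
]

# sacrebleu expects a list of hypotheses plus a list of reference streams,
# i.e. references transposed so each inner list aligns with the hypotheses.
ref_streams = list(map(list, zip(*references)))
bleu = sacrebleu.corpus_bleu(hypotheses, ref_streams)
print(f"BLEU: {bleu.score:.1f}")
```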
Methodology
The paper employs the T5 encoder-decoder transformer architecture, pre-trained with a multitask objective that combines unsupervised span masking with supervised tasks such as translation and summarization. The datasets span diverse domains: WebNLG is a graph-to-text task, ToTTo focuses on table-to-text generation, and MultiWOZ involves generating dialogue responses from meaning representations. The structured data is linearized into plain text, and the model is then fine-tuned on each dataset.
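A minimal fine-tuning sketch of this recipe is shown below, assuming the structured input has already been linearized into a string and using the Hugging Face transformers API purely for illustration; the authors' experiments used the original T5 models and their own training setup, and the example data and hyperparameters here are assumptions.

```python
# Minimal fine-tuning sketch with Hugging Face transformers, assuming a
# linearized input string. Illustrative only: the paper's experiments used
# the original T5 models and training configuration.
import torch
from transformers import T5ForConditionalGeneration, T5TokenizerFast

model = T5ForConditionalGeneration.from_pretrained("t5-small")
tokenizer = T5TokenizerFast.from_pretrained("t5-small")

# A hypothetical linearized graph-to-text example (not taken from WebNLG).
source = "Alan Turing | birthPlace | Maida Vale"
target = "Alan Turing was born in Maida Vale."

inputs = tokenizer(source, return_tensors="pt")
labels = tokenizer(target, return_tensors="pt").input_ids

# One fine-tuning step: standard teacher-forced cross-entropy on the target.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss = model(**inputs, labels=labels).loss
loss.backward()
optimizer.step()

# Inference: beam-search decoding from the linearized input.
generated = model.generate(**inputs, max_length=64, num_beams=4)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```

In practice one would loop this step over the full training set and tune the learning rate and decoding settings per dataset, but the core recipe of linearize, fine-tune with teacher forcing, and decode stays the same.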
Implications and Future Directions
This paper sets a new standard for data-to-text generation tasks by showcasing the potential of T5 pre-training, providing a robust baseline for future research endeavors. By illustrating significant improvements over existing methods, this work encourages further exploration into tailored unsupervised pre-training objectives that could enhance text generation from diverse structured data domains.
Further research could also extend these methodologies to other languages, especially low-resource ones, thereby broadening the accessibility and applicability of these text generation models. Additionally, dissecting the impact of model capacity on different datasets may yield insights into optimal configurations for specific task complexities.
In conclusion, the paper avoids overstating its results while presenting meaningful advances in data-to-text generation, emphasizing empirical evidence of the substantial impact of pre-training with the T5 architecture. As data-driven systems become increasingly integral to technology ecosystems, such contributions are pivotal in advancing natural language generation capabilities.