High-Fidelity Natural Language Generation from Logical Forms
The paper "Logic2Text: High-Fidelity Natural Language Generation from Logical Forms" addresses the challenges associated with generating natural language descriptions from structured data, particularly multi-row tables. The authors focus on enhancing the fidelity and controllability of descriptions produced by Natural Language Generation (NLG) systems, which traditionally have been limited to surface-level summaries of data sequences. To improve the semantic richness and accuracy of generated texts, the paper proposes leveraging logical forms as intermediate representations.
Logic2Text Dataset and Methodology
The authors introduce a new dataset, Logic2Text, which contains 10,753 descriptions paired with their underlying logical forms extracted from 5,600 tables. These logical forms are designed to capture diverse graph structures and common logic types such as count, superlative, comparative, aggregation, majority, unique, and ordinal operations. They argue that providing a logical form alongside table data allows models to produce more faithful and semantically richer outputs, effectively separating logical reasoning from linguistic realization.
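To make the pairing concrete, the sketch below shows a toy table with a superlative-type logical form and its description. The nested-tuple encoding and the function names (argmax, hop, eq) are illustrative assumptions modeled on the logic types listed above, not the dataset's literal syntax.

```python
# A hypothetical sketch of a Logic2Text-style (table, logical form, description)
# triple. The function vocabulary below is an assumption based on the paper's
# logic types, not the dataset's exact schema.

table = [
    {"team": "hawks", "wins": 12, "losses": 4},
    {"team": "lions", "wins": 9,  "losses": 7},
    {"team": "bears", "wins": 15, "losses": 1},
]

# Superlative logic type, encoded as nested (function, args...) tuples
# standing in for the dataset's Python-like syntax.
logical_form = (
    "eq",
    ("hop", ("argmax", "all_rows", "wins"), "team"),
    "bears",
)

description = "bears had the most wins of any team."
```

Separating the two views in this way is what lets the generation model focus on linguistic realization while the logical form pins down the reasoning.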
The dataset construction involves multiple stages, including data collection, logical form annotation, and execution verification. Annotation tasks are carefully designed to capture the logical relations embedded in table data. Logical forms are written in a Python-like syntax and executed against their source tables, which guarantees 100% execution correctness; expert evaluation further confirms roughly 90% semantic correctness.
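A minimal sketch of what execution verification could look like, continuing the toy tuple encoding from the previous snippet. A real verifier would need to cover the dataset's full function vocabulary; this one handles just enough operators to illustrate the idea.

```python
# Toy inputs, redefined here so the sketch is self-contained.
table = [{"team": "hawks", "wins": 12}, {"team": "bears", "wins": 15}]
logical_form = ("eq", ("hop", ("argmax", "all_rows", "wins"), "team"), "bears")

def execute(node, table):
    """Recursively evaluate a logical-form node against the table."""
    if not isinstance(node, tuple):          # leaf: literal or the table alias
        return table if node == "all_rows" else node
    op, *args = node
    if op == "argmax":                       # row with the largest value in a column
        rows, col = execute(args[0], table), args[1]
        return max(rows, key=lambda r: r[col])
    if op == "hop":                          # project one column from a single row
        row, col = execute(args[0], table), args[1]
        return row[col]
    if op == "count":                        # number of rows in a subset
        return len(execute(args[0], table))
    if op == "eq":                           # equality check -> True/False
        return execute(args[0], table) == execute(args[1], table)
    raise ValueError(f"unknown operator: {op}")

# Verification: a well-formed annotation must execute to True on its table.
assert execute(logical_form, table) is True
```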
Experimental Results
The paper evaluates several models under both fully-supervised and few-shot learning settings. These models include traditional seq2seq architectures with attention, pointer-generator networks, graph-to-sequence models, transformers with copy mechanisms, and the pre-trained GPT-2 language model. Across experiments, GPT-2 delivered the strongest results, achieving 31.44 BLEU-4 and 64.16 ROUGE-1 under full supervision. This highlights the benefits of leveraging pre-trained language models' implicit semantic understanding and generation capabilities.
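For reference, the snippet below shows one way such metrics can be computed for a single prediction using the nltk and rouge-score packages. This is not the paper's evaluation script; the tokenization and smoothing choices here are assumptions.

```python
# A sketch of per-example BLEU-4 and ROUGE-1 computation, assuming simple
# whitespace tokenization and NLTK's method1 smoothing.

from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from rouge_score import rouge_scorer

reference = "bears had the most wins of any team ."
candidate = "the bears recorded the most wins among all teams ."

# BLEU-4: uniform weights over 1- to 4-grams, smoothed for short sentences.
bleu4 = sentence_bleu(
    [reference.split()],
    candidate.split(),
    weights=(0.25, 0.25, 0.25, 0.25),
    smoothing_function=SmoothingFunction().method1,
)

# ROUGE-1: unigram-overlap F-measure.
scorer = rouge_scorer.RougeScorer(["rouge1"], use_stemmer=True)
rouge1 = scorer.score(reference, candidate)["rouge1"].fmeasure

print(f"BLEU-4: {bleu4:.4f}  ROUGE-1: {rouge1:.4f}")
```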
The authors further demonstrate the critical role of logical forms in ensuring logical fidelity by comparing results with and without logical form inputs. Results indicate that generating descriptions solely from tables results in low factual correctness rates, reinforcing the necessity of intermediate logical representations.
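A hypothetical sketch of how this ablation might be set up: serializing each example once with and once without the logical form before feeding it to the model. The separator tokens and field order are invented for illustration, not taken from the paper.

```python
# Build the flattened text input for a seq2seq / GPT-2 model, with the
# logical form as an optional field to support the table-only ablation.

def serialize(caption, table, logical_form=None):
    header = " | ".join(table[0].keys())
    rows = " ; ".join(" | ".join(str(v) for v in row.values()) for row in table)
    parts = [f"caption: {caption}", f"table: {header} ; {rows}"]
    if logical_form is not None:             # full model: table + logical form
        parts.append(f"logic: {logical_form}")
    return " <sep> ".join(parts)

table = [{"team": "hawks", "wins": 12}, {"team": "bears", "wins": 15}]

with_lf = serialize("2004 season", table,
                    "eq { hop { argmax { all_rows ; wins } ; team } ; bears }")
without_lf = serialize("2004 season", table)  # table-only ablation input
```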
Implications and Future Development
The ability to generate accurate and semantically rich natural language descriptions from multi-row tables has significant implications for automating reports in sectors like finance, healthcare, and scientific research, where the approach adopted in the paper can substantially improve the reliability and usability of automated text generation systems.
Looking ahead, the authors suggest avenues for future research including the development of parsers to automatically generate logical forms from text, extensions to the dataset for broader coverage, and exploring discourse planning for logical form selection. The Logic2Text dataset and findings also open up potential collaborations between semantic parsing and NLG, given the shared reliance on structured logical representations.
The paper lays substantial groundwork for advancing the capabilities of NLG systems over structured data inputs, encouraging continued exploration and refinement in this area.