PropSegmEnt: A Large-Scale Corpus for Proposition-Level Segmentation and Entailment Recognition

Published 21 Dec 2022 in cs.CL | (2212.10750v2)

Abstract: The widely studied task of Natural Language Inference (NLI) requires a system to recognize whether one piece of text is textually entailed by another, i.e. whether the entirety of its meaning can be inferred from the other. In current NLI datasets and models, textual entailment relations are typically defined on the sentence- or paragraph-level. However, even a simple sentence often contains multiple propositions, i.e. distinct units of meaning conveyed by the sentence. As these propositions can carry different truth values in the context of a given premise, we argue for the need to recognize the textual entailment relation of each proposition in a sentence individually. We propose PropSegmEnt, a corpus of over 45K propositions annotated by expert human raters. Our dataset structure resembles the tasks of (1) segmenting sentences within a document to the set of propositions, and (2) classifying the entailment relation of each proposition with respect to a different yet topically-aligned document, i.e. documents describing the same event or entity. We establish strong baselines for the segmentation and entailment tasks. Through case studies on summary hallucination detection and document-level NLI, we demonstrate that our conceptual framework is potentially useful for understanding and explaining the compositionality of NLI labels.

Authors (5)

Summary

The paper introduces a large-scale corpus with over 45K human-annotated propositions for enhanced proposition-level analysis in NLI.
It represents propositions as flexible subsets of sentence tokens and establishes T5 and BERT baselines for segmentation and entailment classification.
The dataset supports applications such as summary hallucination detection and is used to study cross-domain generalization.

Introduction

The paper introduces PropSegmEnt, a corpus for recognizing textual entailment at the proposition level rather than at the conventional sentence or paragraph level. The dataset consists of over 45K human-annotated propositions, each labeled for entailment with respect to a topically aligned document. The aim is finer-grained entailment recognition in Natural Language Inference (NLI) tasks.

Dataset and Methodology

Corpus Construction

PropSegmEnt is built from clusters of topically related documents drawn from Wikipedia and news sources. The dataset is designed to support two primary tasks:

Propositional Segmentation (T1): A system must identify and segment the meaningful propositions within a sentence.
Propositional Entailment (T2): Each proposition is classified as entailed, neutral, or contradicted with respect to a document from the same cluster.

Expert annotators perform the segmentation, marking each proposition as a subset of the tokens in its sentence.

Figure 1: Task overview for propositional segmentation and entailment classification within document clusters.

Proposition Representation

Unlike conventional methods, which rely on predicate-argument structures, PropSegmEnt represents each proposition as a subset of the tokens in a sentence. This more flexible representation is intended to address the granularity mismatches seen with OpenIE and SRL systems, so that the propositions relevant to entailment evaluation can be extracted comprehensively.
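The token-subset idea can be illustrated with a minimal Python sketch. The sentence, the chosen index sets, and the helper names `proposition_text` and `proposition_mask` are all hypothetical illustrations, not the corpus's actual file format or the authors' code. The 0/1 mask is one plausible way a token tagger could consume such labels.

```python
from typing import FrozenSet, List


def proposition_text(tokens: List[str], indices: FrozenSet[int]) -> str:
    """Render a proposition by joining its selected tokens in sentence order."""
    return " ".join(tokens[i] for i in sorted(indices))


def proposition_mask(num_tokens: int, indices: FrozenSet[int]) -> List[int]:
    """Encode a proposition as a 0/1 mask over the sentence tokens."""
    return [1 if i in indices else 0 for i in range(num_tokens)]


# Hypothetical example sentence (not taken from the corpus).
tokens = "Alice , a chemist , moved to Paris in 2010 .".split()

# Each proposition is a subset of token positions; subsets may overlap
# (position 0, "Alice", is shared by all three).
propositions = [
    frozenset({0, 2, 3}),     # Alice a chemist
    frozenset({0, 5, 6, 7}),  # Alice moved to Paris
    frozenset({0, 5, 8, 9}),  # Alice moved in 2010
]

for prop in propositions:
    print(proposition_text(tokens, prop), proposition_mask(len(tokens), prop))
```

Because the subsets need not be contiguous and may share tokens, one sentence can yield several propositions, each of which can later receive its own entailment label.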

Modeling and Baselines

The authors establish baseline models for both segmentation and entailment tasks using Seq2Seq frameworks like T5 and encoder-tagger models like BERT. The Seq2Seq models outperform the simpler BERT baseline, suggesting the importance of modeling the joint probability of proposition outputs.

Evaluation Metrics

Segmentation performance is measured with precision, recall, and F1, using Jaccard similarity to assess how well predicted propositions match gold ones. Entailment performance is reported as accuracy and balanced accuracy, the latter because of label imbalance in the dataset.
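As a rough illustration of these metrics, the sketch below scores propositions given as sets of token indices. It assumes a simple best-match scheme (each proposition is credited with its highest Jaccard similarity on the other side); the paper's exact matching procedure is not described here and may differ. The function names are hypothetical.

```python
from typing import FrozenSet, List, Sequence, Tuple

Prop = FrozenSet[int]  # a proposition as a set of token indices


def jaccard(a: Prop, b: Prop) -> float:
    """Jaccard similarity |a & b| / |a | b| (1.0 if both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def best_match_mean(side: List[Prop], other: List[Prop]) -> float:
    """Average, over `side`, of the best Jaccard match found in `other`."""
    if not side:
        return 0.0
    return sum(max((jaccard(p, q) for q in other), default=0.0) for p in side) / len(side)


def soft_prf(pred: List[Prop], gold: List[Prop]) -> Tuple[float, float, float]:
    """Soft precision, recall, and F1 under the best-match assumption."""
    precision = best_match_mean(pred, gold)
    recall = best_match_mean(gold, pred)
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def balanced_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Mean per-class recall over the classes that occur in y_true."""
    recalls = []
    for label in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == label]
        recalls.append(sum(y_pred[i] == label for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


gold = [frozenset({0, 1, 2}), frozenset({0, 3, 4})]
pred = [frozenset({0, 1, 2}), frozenset({0, 3})]
print(soft_prf(pred, gold))

# A majority-class predictor looks fine on accuracy but not on balanced accuracy.
y_true = ["entailed", "entailed", "entailed", "neutral"]
y_pred = ["entailed"] * 4
print(balanced_accuracy(y_true, y_pred))
```

The second example shows why balanced accuracy is informative under label imbalance: always predicting the majority label is correct on three of four items, yet it recovers none of the minority class.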

Applications and Analysis

Hallucination Detection

One proposed application of PropSegmEnt is detecting hallucinations in text summarization. Proposition-level entailment makes it possible to locate unfaithful content in a summary by identifying the propositions that are not entailed by the source document.
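A minimal sketch of this idea, assuming the summary has already been segmented into propositions and that some entailment classifier is available. The `flag_unsupported` helper and the word-overlap `toy_classifier` are hypothetical stand-ins for illustration only; a real system would plug in a trained proposition-level entailment model.

```python
from typing import Callable, List, Tuple


def flag_unsupported(
    propositions: List[str],
    source: str,
    classify: Callable[[str, str], str],
) -> List[Tuple[str, str]]:
    """Return (proposition, label) pairs whose label is not 'entailed'."""
    flagged = []
    for prop in propositions:
        label = classify(source, prop)
        if label != "entailed":
            flagged.append((prop, label))
    return flagged


def toy_classifier(premise: str, hypothesis: str) -> str:
    """Stand-in for a trained model: 'entailed' only if every
    hypothesis word also occurs in the premise, else 'neutral'."""
    premise_words = set(premise.lower().split())
    covered = all(w in premise_words for w in hypothesis.lower().split())
    return "entailed" if covered else "neutral"


# Hypothetical source document and summary propositions.
source = "the mayor opened the new bridge on monday"
summary_props = ["the mayor opened the bridge", "the bridge cost ten million"]

for prop, label in flag_unsupported(summary_props, source, toy_classifier):
    print(f"possibly unfaithful ({label}): {prop}")
```

Flagging individual propositions, rather than whole summary sentences, points to the specific claim that the source does not support.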

Cross-Domain Generalization

The paper investigates cross-domain generalization by evaluating models on domains unseen during training, examining how well performance holds up across the syntactically and semantically varied documents of the Wikipedia and news domains.

Conclusion

PropSegmEnt advances NLI by supporting proposition-level analysis of entailment relationships. The dataset aids the development of finer-grained models of text understanding and shows promise for applications in hallucination detection and document comparison. Future work could include more balanced coverage of contradiction examples and applications to broader text classification tasks.
