- The paper introduces a collaborative framework where human insight and GPT-3 generation combine to create a diverse NLI dataset that improves model robustness.
- The methodology uses dataset cartography to pinpoint ambiguous MultiNLI examples, which seed GPT-3's in-context generation of new instances that are then filtered and passed to humans for revision and labeling.
- Models trained on WaNLI outperform those trained on the roughly four-times-larger MultiNLI by up to 11% on out-of-domain benchmarks, demonstrating the value of human-AI collaboration.
Collaborative Dataset Creation in Natural Language Inference
The paper "WaNLI: Worker and AI Collaboration for Natural Language Inference Dataset Creation" introduces a novel framework for generating datasets with a collaborative approach involving both human annotators and AI models. This approach is specifically applied to the Natural Language Inference (NLI) task, resulting in the creation of the WaNLI dataset, which emphasizes linguistic diversity and challenging reasoning patterns.
The motivation for this approach arises from inherent limitations of large-scale crowdsourced datasets. While such datasets have fostered rapid progress in NLP, they often exhibit repetitive linguistic patterns, producing models that perform well on in-domain test sets but prove brittle on out-of-domain or adversarial examples. The authors trace the problem to the repetitive writing strategies of a relatively small pool of crowdworkers under the prevailing crowdsourcing paradigm, a repetition that constrains the diversity needed for robust generalization.
To address this, the authors combine the generative capabilities of large language models like GPT-3 with human skill at evaluating and refining examples. The process begins with dataset cartography, which maps an existing dataset (MultiNLI) by the training dynamics of a task model and surfaces its most ambiguous, challenging examples; these serve as in-context demonstrations that steer GPT-3 toward generating new examples with similar reasoning patterns. The generations are then automatically filtered to retain the most ambiguous ones before crowdworkers revise them and assign final labels.
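The pipeline is easy to sketch in code. The fragment below is a minimal, hypothetical illustration, not the authors' released implementation: the function names, the array layouts, and the seed count are all assumptions, and the filter only approximates the paper's "estimated max variability" score (which, because generations are unlabeled, takes the maximum variability over all candidate labels).

```python
import numpy as np

def cartography_stats(gold_probs: np.ndarray):
    """Data-map statistics in the spirit of dataset cartography
    (Swayamdipta et al., 2020). gold_probs has shape
    (n_epochs, n_examples): the probability a task model assigns to
    each example's gold label at the end of each training epoch."""
    confidence = gold_probs.mean(axis=0)   # mean gold-label probability
    variability = gold_probs.std(axis=0)   # spread across epochs
    return confidence, variability

def select_ambiguous_seeds(examples, variability, k=5):
    """Pick the k highest-variability (most ambiguous) MultiNLI
    examples to serve as in-context demonstrations for GPT-3."""
    top = np.argsort(-variability)[:k]
    return [examples[i] for i in top]

def build_prompt(seeds, instruction):
    """Few-shot prompt: show premise/hypothesis pairs that share one
    NLI label and ask the model to continue the pattern."""
    parts = [instruction, ""]
    for ex in seeds:
        parts += [f"Sentence 1: {ex['premise']}",
                  f"Sentence 2: {ex['hypothesis']}", ""]
    parts.append("Sentence 1:")  # GPT-3 completes a new pair
    return "\n".join(parts)

def estimated_max_variability(label_probs: np.ndarray) -> np.ndarray:
    """Rough stand-in for the paper's filter score. label_probs has
    shape (n_checkpoints, n_examples, n_labels); since generations
    are unlabeled, variability is computed per candidate label and
    the maximum over labels is kept."""
    return label_probs.std(axis=0).max(axis=-1)

def keep_for_human_review(generated, label_probs, budget):
    """Retain the generations the task model finds most ambiguous and
    route them to crowdworkers for revision and final labeling."""
    score = estimated_max_variability(label_probs)
    top = np.argsort(-score)[:budget]
    return [generated[i] for i in top]
```

A notable design choice this sketch highlights: the same model-derived ambiguity signal drives both ends of the pipeline, seeding generation and filtering its output, so no human labels are needed until the final review step.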
The resulting WaNLI dataset contains 107,885 examples. Despite being roughly a quarter the size of MultiNLI, it yields models that outperform MultiNLI-trained models across eight out-of-domain test sets, with gains of 11% on the HANS benchmark and 9% on Adversarial NLI. These results underscore WaNLI's efficacy and the value of integrating LLM-based generation into the data creation pipeline.
Several implications stem from this research. Practically, the paper offers a scalable, replicable method for constructing high-quality datasets that retain linguistic diversity and complexity. Theoretically, it recasts human annotators as refiners and evaluators rather than original authors of data, concentrating human effort where AI still falls short, such as nuanced revision and labeling.
Looking forward, this work could inspire similar methods across other NLP tasks, rejuvenating datasets whose annotation patterns models have overfit. And as AI models grow better at generating text that closely mimics human writing, the human-machine collaboration leveraged here could evolve to require even less manual revision.
Overall, the research advances the field by showing that pairing human cognitive strengths with the generative power of AI models can produce more resilient and adaptable NLP systems.