Edisum: Summarizing and Explaining Wikipedia Edits at Scale (2404.03428v2)

Published 4 Apr 2024 in cs.CL

Abstract: An edit summary is a succinct comment written by a Wikipedia editor explaining the nature of, and reasons for, an edit to a Wikipedia page. Edit summaries are crucial for maintaining the encyclopedia: they are the first thing seen by content moderators and they help them decide whether to accept or reject an edit. Additionally, edit summaries constitute a valuable data source for researchers. Unfortunately, as we show, for many edits, summaries are either missing or incomplete. To overcome this problem and help editors write useful edit summaries, we propose a model for recommending edit summaries generated by an LLM trained to produce good edit summaries given the representation of an edit diff. To overcome the challenges of mixed-quality training data and efficiency requirements imposed by the scale of Wikipedia, we fine-tune a small generative LLM on a curated mix of human and synthetic data. Our model performs on par with human editors. Commercial LLMs are able to solve this task better than human editors, but are not well suited for Wikipedia, while open-source ones fail on this task. More broadly, we showcase how language modeling technology can be used to support humans in maintaining one of the largest and most visible projects on the Web.

References (43)
  1. Do not have enough data? Deep learning to the rescue! In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pp.  7383–7390, 2020.
  2. Turbulent stability of emergent roles: The dualistic nature of self-organizing knowledge coproduction. Information Systems Research, 27(4):792–812, 2016.
  3. Automatically labeling low quality content on Wikipedia by leveraging patterns in editing behaviors. Proceedings of the ACM on Human-Computer Interaction, 5(CSCW2):1–23, 2021.
  4. Scaling instruction-finetuned language models, 2022.
  5. Text editing by command. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp.  5259–5274, Online, June 2021. Association for Computational Linguistics. doi: 10.18653/v1/2021.naacl-main.414. URL https://aclanthology.org/2021.naacl-main.414.
  6. ZeroGen+: Self-guided high-quality data generation in efficient zero-shot learning, 2022. URL https://arxiv.org/abs/2205.12679.
  7. The work of sustaining order in wikipedia: The banning of a vandal. In Proceedings of the 2010 ACM Conference on Computer Supported Cooperative Work, CSCW ’10, pp.  117–126, New York, NY, USA, 2010. Association for Computing Machinery. ISBN 9781605587950. doi: 10.1145/1718918.1718941. URL https://doi.org/10.1145/1718918.1718941.
  8. Trace ethnography: Following coordination through documentary practices. In 2011 44th Hawaii international conference on system sciences, pp.  1–10. IEEE, 2011.
  9. LongT5: Efficient text-to-text transformer for long sequences, 2022.
  10. Exploiting asymmetry for synthetic training data generation: SynthIE and the case of information extraction, 2023.
  11. The Stack: 3 TB of permissively licensed source code. Transactions on Machine Learning Research, 2022.
  12. Data augmentation using pre-trained transformer models. arXiv preprint arXiv:2003.02245, 2020.
  13. BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension, 2019.
  14. Self-prompting large language models for open-domain QA, 2022. URL https://arxiv.org/abs/2212.08635.
  15. Chin-Yew Lin. ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out, pp.  74–81, Barcelona, Spain, July 2004. Association for Computational Linguistics. URL https://aclanthology.org/W04-1013.
  16. Generating training data with language models: Towards zero-shot language understanding, 2022. URL https://arxiv.org/abs/2202.04538.
  17. Simulated chats for building dialog systems: Learning to generate conversations from instructions. arXiv preprint arXiv:2010.10216, 2020.
  18. Jonathan Morgan. Patrolling on wikipedia, 2019. URL https://meta.wikimedia.org/wiki/Research:Patrolling_on_Wikipedia.
  19. OpenAI. OpenAI model pricing. https://openai.com/pricing, 2024a. Accessed: 2024-03-12.
  20. OpenAI. OpenAI models documentation. https://platform.openai.com/docs/models, 2024b. Accessed: 2024-03-12.
  21. Wikipedians are born, not made: a study of power editors on wikipedia. In Proceedings of the 2009 ACM International Conference on Supporting Group Work, pp.  51–60, 2009.
  22. DARE: Data augmented relation extraction with GPT-2. arXiv preprint arXiv:2004.13845, 2020.
  23. BLEU: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, ACL ’02, pp.  311–318, USA, 2002. Association for Computational Linguistics. doi: 10.3115/1073083.1073135. URL https://doi.org/10.3115/1073083.1073135.
  24. Mind your POV: Convergence of articles and editors towards Wikipedia’s neutrality norm. Proceedings of the ACM on Human-Computer Interaction, 2(CSCW):1–23, 2018.
  25. Data augmentation for intent classification with off-the-shelf large language models, 2022. URL https://arxiv.org/abs/2204.01959.
  26. PEER: A collaborative language model, 2022.
  27. Synthetic prompting: Generating chain-of-thought demonstrations for large language models, 2023. URL https://arxiv.org/abs/2302.00618.
  28. Information quality work organization in Wikipedia. Journal of the American Society for Information Science and Technology, 59(6):983–1001, 2008.
  29. Edit wars in Wikipedia. In 2011 IEEE Third Int'l Conference on Privacy, Security, Risk and Trust and 2011 IEEE Third Int'l Conference on Social Computing. IEEE, October 2011. doi: 10.1109/passat/socialcom.2011.47. URL https://doi.org/10.1109%2Fpassat%2Fsocialcom.2011.47.
  30. Artificial artificial artificial intelligence: Crowd workers widely use large language models for text production tasks, 2023.
  31. GLUE: A multi-task benchmark and analysis platform for natural language understanding. arXiv preprint arXiv:1804.07461, 2018.
  32. Towards zero-label language learning. CoRR, abs/2109.09193, 2021. URL https://arxiv.org/abs/2109.09193.
  33. Visualizing activity on Wikipedia with chromograms. In Human-Computer Interaction–INTERACT 2007: 11th IFIP TC 13 International Conference, Rio de Janeiro, Brazil, September 10-14, 2007, Proceedings, Part II 11, pp.  272–287. Springer, 2007.
  34. Wikimedia. Wikimedia foundation guiding principles. https://foundation.wikimedia.org/wiki/Resolution:Wikimedia_Foundation_Guiding_Principles, 2024a. Accessed: 2024-03-12.
  35. Wikimedia. Edit summary. https://meta.wikimedia.org/wiki/Help:Edit_summary, 2024b. Accessed: 2024-03-12.
  36. Wikipedia. Automatic edit summaries. https://en.wikipedia.org/wiki/Help:Automatic_edit_summaries, 2024a. Accessed: 2024-03-12.
  37. Wikipedia. Canned edit summaries. https://en.wikipedia.org/wiki/Wikipedia:Canned_edit_summaries, 2024b. Accessed: 2024-03-12.
  38. Wikipedia. Wikipedia statistics on number of edits performed. https://stats.wikimedia.org/#/en.wikipedia.org/contributing/edits/normal|bar|2-year|editor_type~anonymous*group-bot*name-bot*user+(page_type)~content|monthly, 2024c. Accessed: 2024-03-12.
  39. Wikipedia. Wikipedia revision deletion. https://en.wikipedia.org/wiki/Wikipedia:Revision_deletion, 2024d. Accessed: 2024-03-12.
  40. Wikitech. Wikipedia: Access to GPUs. https://wikitech.wikimedia.org/wiki/Machine_Learning/AMD_GPU#Do_we_have_Nvidia_GPUs, 2024. Accessed: 2024-03-12.
  41. Identifying semantic edit intentions from revisions in wikipedia. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pp.  2000–2010, 2017.
  42. ZeroGen: Efficient zero-shot learning via dataset generation, 2022. URL https://arxiv.org/abs/2202.07922.
  43. MoverScore: Text generation evaluating with contextualized embeddings and earth mover distance. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp.  563–578, Hong Kong, China, November 2019. doi: 10.18653/v1/D19-1053. URL https://aclanthology.org/D19-1053.

Summary

  • The paper presents Edisum, a model that combines human and synthetic data to generate clear Wikipedia edit summaries.
  • It employs fine-tuned LongT5-based models and benchmarks performance against human editors and costlier LLMs like GPT-4.
  • The study highlights opportunities to enhance summary quality by refining context capture and exploring alternative edit diff representations.

Edisum: A Novel Approach for Generating Wikipedia Edit Summaries

Introduction to the Challenge of Edit Summaries

Edit summaries in Wikipedia provide concise explanations of the nature of, and reasoning behind, changes made to articles. These summaries are indispensable tools for content moderators and constitute a rich source of data for research into collaborative editing behaviors. Despite their importance, a considerable portion of edits either lack a summary altogether or carry descriptions that are too vague or misleading. Addressing this gap, the paper presents a model designed to assist editors in crafting useful edit summaries by leveraging an LLM trained on a mixed dataset of human-written and synthetically generated summaries.

Underlying Challenges and Model Development

The task of generating informative edit summaries is fraught with difficulties. For one, distinguishing between high- and low-quality summaries is non-trivial, so the training data risks being contaminated with misleading examples. Furthermore, ideal summaries should capture not only the changes made but also the motivations behind them, which often requires context beyond the edit itself. To circumvent these obstacles, the authors undertook a careful process of data curation and model fine-tuning.
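
To make the curation step concrete, here is a minimal sketch of how low-quality human summaries might be filtered out before training; the thresholds and canned-summary patterns are illustrative assumptions, not the authors' actual criteria.

```python
import re

# Hypothetical patterns for canned or automatic summaries (e.g., reverts), which
# carry little descriptive signal; the paper's real filtering rules may differ.
CANNED = re.compile(r"^(rv\b|revert|undid revision)", re.IGNORECASE)

def keep_summary(summary: str, min_words: int = 2, max_chars: int = 500) -> bool:
    """Return True if a human-written edit summary looks informative enough to train on."""
    text = summary.strip()
    if not text:                       # summary is missing entirely
        return False
    if CANNED.match(text):             # canned/automatic summary
        return False
    if len(text.split()) < min_words:  # too short to describe the edit
        return False
    if len(text) > max_chars:          # suspiciously long, likely pasted content
        return False
    return True

candidates = ["", "rv vandalism", "fix typo in infobox date", "updated population figure"]
print([s for s in candidates if keep_summary(s)])
# ['fix typo in infobox date', 'updated population figure']
```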

A combination of human-written summaries and synthetically generated data was used to train a range of smaller generative models based on LongT5. The synthetic data was produced by an LLM prompted specifically to generate summaries that concisely describe edits and, where possible, their motivations. This novel use of synthetic training data aimed to overcome the limitations posed by the mixed quality and frequent sparsity of human-provided summaries.
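
A minimal sketch of this fine-tuning setup is shown below, assuming Hugging Face Transformers and a public LongT5 checkpoint; the textual diff format and hyperparameters are illustrative rather than the authors' exact configuration, and synthetic examples produced by the larger LLM would simply be appended to the same training list.

```python
from datasets import Dataset
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

checkpoint = "google/long-t5-tglobal-base"  # one of the public LongT5 checkpoints
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# Hypothetical (edit diff, summary) pairs; the real corpus mixes curated human
# summaries with synthetic ones generated by a larger LLM.
examples = [
    {"diff": "removed: The town has 10,000 residents. added: The town has 12,000 residents.",
     "summary": "updated population figure"},
    {"diff": "added: He was awarded the Nobel Prize in Physics in 1921.",
     "summary": "added Nobel Prize award"},
]
dataset = Dataset.from_list(examples)

def preprocess(batch):
    # Serialize the diff as the encoder input and the summary as the target.
    model_inputs = tokenizer(batch["diff"], max_length=2048, truncation=True)
    labels = tokenizer(text_target=batch["summary"], max_length=64, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = dataset.map(preprocess, batched=True, remove_columns=dataset.column_names)

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(output_dir="edisum-longt5", num_train_epochs=3,
                                  per_device_train_batch_size=4, learning_rate=1e-4,
                                  predict_with_generate=True),
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```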

Key Results and Evaluation

The model's efficacy was benchmarked against human editors and commercial LLMs through both automatic and manual evaluations. Notably, the generative model, dubbed Edisum, demonstrated parity with human editors in generating summaries, as evidenced by comparative MoverScore ratings and human assessments. Remarkably, while commercial LLMs like GPT-4 outperformed human summarizers, their operational costs and scalability constraints on platforms as vast as Wikipedia rendered them impractical for everyday use. Edisum emerges as a viable alternative, capable of generating high-quality summaries at a fraction of the computational and financial cost.
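
For the automatic part of this comparison, the sketch below scores generated summaries against editor-written references with MoverScore; it assumes the open-source moverscore_v2 package and its documented helpers (get_idf_dict, word_mover_score), and the example strings are invented.

```python
from moverscore_v2 import get_idf_dict, word_mover_score  # MoverScore reference implementation

references = ["updated population figure", "fix typo in infobox date"]      # human-written summaries
hypotheses = ["update population count", "corrected date typo in infobox"]  # model outputs

# IDF weights are computed separately over the reference and hypothesis corpora,
# following the package's documented usage.
idf_ref = get_idf_dict(references)
idf_hyp = get_idf_dict(hypotheses)

scores = word_mover_score(references, hypotheses, idf_ref, idf_hyp,
                          stop_words=[], n_gram=1, remove_subwords=True)
print(sum(scores) / len(scores))  # mean MoverScore over the evaluation set
```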

Implications and Prospective Developments

This paper illustrates the potential of generative LLMs in enhancing the quality of Wikipedia edit summaries, thereby supporting the encyclopedia's maintenance and the broader research community. The successful application of synthetic data in training points to an exciting direction for future research, suggesting ways to bridge the gap between the advanced capabilities of LLMs and the practical limitations of deploying such models at scale.

Looking forward, refining the model to better capture the subtleties of "why" an edit was made could further enhance summary quality. Additionally, exploring alternative representations of edit diffs may provide the model with richer context, potentially improving its ability to generate more accurate and informative summaries. The development and deployment of Edisum represent a significant step toward harnessing the power of AI to support one of the largest collaborative knowledge projects online.
