Edisum: Summarizing and Explaining Wikipedia Edits at Scale (2404.03428v2)

Published 4 Apr 2024 in cs.CL

Abstract: An edit summary is a succinct comment written by a Wikipedia editor explaining the nature of, and reasons for, an edit to a Wikipedia page. Edit summaries are crucial for maintaining the encyclopedia: they are the first thing seen by content moderators and they help them decide whether to accept or reject an edit. Additionally, edit summaries constitute a valuable data source for researchers. Unfortunately, as we show, for many edits, summaries are either missing or incomplete. To overcome this problem and help editors write useful edit summaries, we propose a model for recommending edit summaries generated by an LLM trained to produce good edit summaries given the representation of an edit diff. To overcome the challenges of mixed-quality training data and efficiency requirements imposed by the scale of Wikipedia, we fine-tune a small generative LLM on a curated mix of human and synthetic data. Our model performs on par with human editors. Commercial LLMs are able to solve this task better than human editors, but are not well suited for Wikipedia, while open-source ones fail on this task. More broadly, we showcase how language modeling technology can be used to support humans in maintaining one of the largest and most visible projects on the Web.

References (43)
  1. Do not have enough data? Deep learning to the rescue! In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pp.  7383–7390, 2020.
  2. Turbulent stability of emergent roles: The dualistic nature of self-organizing knowledge coproduction. Information Systems Research, 27(4):792–812, 2016.
  3. Automatically labeling low quality content on Wikipedia by leveraging patterns in editing behaviors. Proceedings of the ACM on Human-Computer Interaction, 5(CSCW2):1–23, 2021.
  4. Scaling instruction-finetuned language models, 2022.
  5. Text editing by command. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp.  5259–5274, Online, June 2021. Association for Computational Linguistics. doi: 10.18653/v1/2021.naacl-main.414. URL https://aclanthology.org/2021.naacl-main.414.
  6. ZeroGen+: Self-guided high-quality data generation in efficient zero-shot learning, 2022. URL https://arxiv.org/abs/2205.12679.
  7. The work of sustaining order in wikipedia: The banning of a vandal. In Proceedings of the 2010 ACM Conference on Computer Supported Cooperative Work, CSCW ’10, pp.  117–126, New York, NY, USA, 2010. Association for Computing Machinery. ISBN 9781605587950. doi: 10.1145/1718918.1718941. URL https://doi.org/10.1145/1718918.1718941.
  8. Trace ethnography: Following coordination through documentary practices. In 2011 44th Hawaii international conference on system sciences, pp.  1–10. IEEE, 2011.
  9. LongT5: Efficient text-to-text transformer for long sequences, 2022.
  10. Exploiting asymmetry for synthetic training data generation: SynthIE and the case of information extraction, 2023.
  11. The Stack: 3 TB of permissively licensed source code. Transactions on Machine Learning Research, 2022.
  12. Data augmentation using pre-trained transformer models. arXiv preprint arXiv:2003.02245, 2020.
  13. BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension, 2019.
  14. Self-prompting large language models for open-domain QA, 2022. URL https://arxiv.org/abs/2212.08635.
  15. Chin-Yew Lin. ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out, pp.  74–81, Barcelona, Spain, July 2004. Association for Computational Linguistics. URL https://aclanthology.org/W04-1013.
  16. Generating training data with language models: Towards zero-shot language understanding, 2022. URL https://arxiv.org/abs/2202.04538.
  17. Simulated chats for building dialog systems: Learning to generate conversations from instructions. arXiv preprint arXiv:2010.10216, 2020.
  18. Jonathan Morgan. Patrolling on wikipedia, 2019. URL https://meta.wikimedia.org/wiki/Research:Patrolling_on_Wikipedia.
  19. OpenAI. OpenAI model pricing. https://openai.com/pricing, 2024a. Accessed: 2024-03-12.
  20. OpenAI. OpenAI models documentation. https://platform.openai.com/docs/models, 2024b. Accessed: 2024-03-12.
  21. Wikipedians are born, not made: a study of power editors on wikipedia. In Proceedings of the 2009 ACM International Conference on Supporting Group Work, pp.  51–60, 2009.
  22. DARE: Data augmented relation extraction with GPT-2. arXiv preprint arXiv:2004.13845, 2020.
  23. BLEU: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, ACL ’02, pp.  311–318, USA, 2002. Association for Computational Linguistics. doi: 10.3115/1073083.1073135. URL https://doi.org/10.3115/1073083.1073135.
  24. Mind your POV: Convergence of articles and editors towards Wikipedia’s neutrality norm. Proceedings of the ACM on Human-Computer Interaction, 2(CSCW):1–23, 2018.
  25. Data augmentation for intent classification with off-the-shelf large language models, 2022. URL https://arxiv.org/abs/2204.01959.
  26. PEER: A collaborative language model, 2022.
  27. Synthetic prompting: Generating chain-of-thought demonstrations for large language models, 2023. URL https://arxiv.org/abs/2302.00618.
  28. Information quality work organization in Wikipedia. Journal of the American Society for Information Science and Technology, 59(6):983–1001, 2008.
  29. Edit wars in Wikipedia. In 2011 IEEE Third Int'l Conference on Privacy, Security, Risk and Trust and 2011 IEEE Third Int'l Conference on Social Computing. IEEE, October 2011. doi: 10.1109/passat/socialcom.2011.47. URL https://doi.org/10.1109%2Fpassat%2Fsocialcom.2011.47.
  30. Artificial artificial artificial intelligence: Crowd workers widely use large language models for text production tasks, 2023.
  31. GLUE: A multi-task benchmark and analysis platform for natural language understanding. arXiv preprint arXiv:1804.07461, 2018.
  32. Towards zero-label language learning. CoRR, abs/2109.09193, 2021. URL https://arxiv.org/abs/2109.09193.
  33. Visualizing activity on Wikipedia with chromograms. In Human-Computer Interaction–INTERACT 2007: 11th IFIP TC 13 International Conference, Rio de Janeiro, Brazil, September 10-14, 2007, Proceedings, Part II 11, pp.  272–287. Springer, 2007.
  34. Wikimedia. Wikimedia foundation guiding principles. https://foundation.wikimedia.org/wiki/Resolution:Wikimedia_Foundation_Guiding_Principles, 2024a. Accessed: 2024-03-12.
  35. Wikimedia. Edit summary. https://meta.wikimedia.org/wiki/Help:Edit_summary, 2024b. Accessed: 2024-03-12.
  36. Wikipedia. Automatic edit summaries. https://en.wikipedia.org/wiki/Help:Automatic_edit_summaries, 2024a. Accessed: 2024-03-12.
  37. Wikipedia. Canned edit summaries. https://en.wikipedia.org/wiki/Wikipedia:Canned_edit_summaries, 2024b. Accessed: 2024-03-12.
  38. Wikipedia. Wikipedia statistics on number of edits performed. https://stats.wikimedia.org/#/en.wikipedia.org/contributing/edits/normal|bar|2-year|editor_type~anonymous*group-bot*name-bot*user+(page_type)~content|monthly, 2024c. Accessed: 2024-03-12.
  39. Wikipedia. Wikipedia revision deletion. https://en.wikipedia.org/wiki/Wikipedia:Revision_deletion, 2024d. Accessed: 2024-03-12.
  40. Wikitech. Wikipedia: Access to GPUs. https://wikitech.wikimedia.org/wiki/Machine_Learning/AMD_GPU#Do_we_have_Nvidia_GPUs, 2024. Accessed: 2024-03-12.
  41. Identifying semantic edit intentions from revisions in wikipedia. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pp.  2000–2010, 2017.
  42. ZeroGen: Efficient zero-shot learning via dataset generation, 2022. URL https://arxiv.org/abs/2202.07922.
  43. MoverScore: Text generation evaluating with contextualized embeddings and earth mover distance. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp.  563–578, Hong Kong, China, November 2019. doi: 10.18653/v1/D19-1053. URL https://aclanthology.org/D19-1053.

Summary

  • The paper presents Edisum, a model that combines human and synthetic data to generate clear Wikipedia edit summaries.
  • It employs fine-tuned LongT5-based models and benchmarks performance against human editors and costlier LLMs like GPT-4.
  • The study highlights opportunities to enhance summary quality by refining context capture and exploring alternative edit diff representations.

Edisum: A Novel Approach for Generating Wikipedia Edit Summaries

Introduction to the Challenge of Edit Summaries

Edit summaries in Wikipedia provide concise explanations of the nature of, and reasoning behind, changes made to articles. These summaries are indispensable tools for content moderators and constitute a rich source of data for research into collaborative editing behaviors. Despite their importance, a considerable portion of edits either lack a summary altogether or carry descriptions that are too vague or misleading. Addressing this gap, the paper presents a model designed to assist editors in crafting useful edit summaries by leveraging an LLM trained on a mixed dataset of human-written and synthetically generated summaries.

Underlying Challenges and Model Development

The task of generating informative edit summaries is fraught with difficulties. For one, distinguishing between high- and low-quality summaries is non-trivial, so the training data risks being contaminated with misleading examples. Furthermore, ideal summaries should capture not only the changes made but also the motivations behind them, which often requires context beyond the edit itself. To circumvent these obstacles, the authors undertook a careful process of data curation and model fine-tuning.
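
To make the curation step concrete, here is a minimal sketch of how low-quality human summaries might be filtered out before training; the thresholds and canned-summary patterns are illustrative assumptions, not the authors' actual criteria.

```python
import re

# Hypothetical patterns for canned or automatic summaries (e.g., reverts), which
# carry little descriptive signal; the paper's real filtering rules may differ.
CANNED = re.compile(r"^(rv\b|revert|undid revision)", re.IGNORECASE)

def keep_summary(summary: str, min_words: int = 2, max_chars: int = 500) -> bool:
    """Return True if a human-written edit summary looks informative enough to train on."""
    text = summary.strip()
    if not text:                       # summary is missing entirely
        return False
    if CANNED.match(text):             # canned/automatic summary
        return False
    if len(text.split()) < min_words:  # too short to describe the edit
        return False
    if len(text) > max_chars:          # suspiciously long, likely pasted content
        return False
    return True

candidates = ["", "rv vandalism", "fix typo in infobox date", "updated population figure"]
print([s for s in candidates if keep_summary(s)])
# ['fix typo in infobox date', 'updated population figure']
```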

A combination of human-written summaries and synthetically generated data was used to train a range of smaller generative models based on LongT5. The synthetic data was produced by an LLM prompted specifically to generate summaries that concisely describe edits and, where possible, their motivations. This novel use of synthetic training data aimed to overcome the limitations posed by the mixed quality and frequent sparsity of human-provided summaries.
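
A minimal sketch of this fine-tuning setup is shown below, assuming Hugging Face Transformers and a public LongT5 checkpoint; the textual diff format and hyperparameters are illustrative rather than the authors' exact configuration, and synthetic examples produced by the larger LLM would simply be appended to the same training list.

```python
from datasets import Dataset
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

checkpoint = "google/long-t5-tglobal-base"  # one of the public LongT5 checkpoints
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# Hypothetical (edit diff, summary) pairs; the real corpus mixes curated human
# summaries with synthetic ones generated by a larger LLM.
examples = [
    {"diff": "removed: The town has 10,000 residents. added: The town has 12,000 residents.",
     "summary": "updated population figure"},
    {"diff": "added: He was awarded the Nobel Prize in Physics in 1921.",
     "summary": "added Nobel Prize award"},
]
dataset = Dataset.from_list(examples)

def preprocess(batch):
    # Serialize the diff as the encoder input and the summary as the target.
    model_inputs = tokenizer(batch["diff"], max_length=2048, truncation=True)
    labels = tokenizer(text_target=batch["summary"], max_length=64, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = dataset.map(preprocess, batched=True, remove_columns=dataset.column_names)

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(output_dir="edisum-longt5", num_train_epochs=3,
                                  per_device_train_batch_size=4, learning_rate=1e-4,
                                  predict_with_generate=True),
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```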

Key Results and Evaluation

The model's efficacy was benchmarked against human editors and commercial LLMs through both automatic and manual evaluations. Notably, the generative model, dubbed Edisum, demonstrated parity with human editors in generating summaries, as evidenced by comparative MoverScore ratings and human assessments. Remarkably, while commercial LLMs like GPT-4 outperformed human summarizers, their operational costs and scalability constraints on platforms as vast as Wikipedia rendered them impractical for everyday use. Edisum emerges as a viable alternative, capable of generating high-quality summaries at a fraction of the computational and financial cost.
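
For the automatic part of this comparison, the sketch below scores generated summaries against editor-written references with MoverScore; it assumes the open-source moverscore_v2 package and its documented helpers (get_idf_dict, word_mover_score), and the example strings are invented.

```python
from moverscore_v2 import get_idf_dict, word_mover_score  # MoverScore reference implementation

references = ["updated population figure", "fix typo in infobox date"]      # human-written summaries
hypotheses = ["update population count", "corrected date typo in infobox"]  # model outputs

# IDF weights are computed separately over the reference and hypothesis corpora,
# following the package's documented usage.
idf_ref = get_idf_dict(references)
idf_hyp = get_idf_dict(hypotheses)

scores = word_mover_score(references, hypotheses, idf_ref, idf_hyp,
                          stop_words=[], n_gram=1, remove_subwords=True)
print(sum(scores) / len(scores))  # mean MoverScore over the evaluation set
```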

Implications and Prospective Developments

This paper illustrates the potential of generative LLMs in enhancing the quality of Wikipedia edit summaries, thereby supporting the encyclopedia's maintenance and the broader research community. The successful application of synthetic data in training points to an exciting direction for future research, suggesting ways to bridge the gap between the advanced capabilities of LLMs and the practical limitations of deploying such models at scale.

Looking forward, refining the model to better capture the subtleties of "why" an edit was made could further enhance summary quality. Additionally, exploring alternative representations of edit diffs may provide the model with richer context, potentially improving its ability to generate more accurate and informative summaries. The development and deployment of Edisum represent a significant step toward harnessing the power of AI to support one of the largest collaborative knowledge projects online.
