Edisum: Summarizing and Explaining Wikipedia Edits at Scale (2404.03428v2)
Abstract: An edit summary is a succinct comment written by a Wikipedia editor explaining the nature of, and reasons for, an edit to a Wikipedia page. Edit summaries are crucial for maintaining the encyclopedia: they are the first thing content moderators see, and they help moderators decide whether to accept or reject an edit. Additionally, edit summaries constitute a valuable data source for researchers. Unfortunately, as we show, for many edits, summaries are either missing or incomplete. To overcome this problem and help editors write useful edit summaries, we propose a model for recommending edit summaries generated by a language model trained to produce good edit summaries given a representation of the edit diff. To address the challenges of mixed-quality training data and the efficiency requirements imposed by the scale of Wikipedia, we fine-tune a small generative language model on a curated mix of human and synthetic data. Our model performs on par with human editors. Commercial LLMs solve this task better than human editors but are not well suited for deployment on Wikipedia, while open-source ones fail at it. More broadly, we showcase how language-modeling technology can be used to support humans in maintaining one of the largest and most visible projects on the Web.
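The core recipe in the abstract is a small generative sequence-to-sequence model fine-tuned to map a textual representation of an edit diff to an edit summary. The following is a minimal sketch of what such a setup could look like with HuggingFace Transformers; the backbone choice (`google/long-t5-tglobal-base`, a LongT5 variant), the `diff`/`summary` field names, the special tokens in the toy example, and all hyperparameters are illustrative assumptions, not the paper's exact configuration or data pipeline.

```python
# Sketch: fine-tune a small seq2seq LM on (edit-diff -> edit-summary) pairs.
# Assumptions: LongT5 backbone, hypothetical field names and toy data.
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSeq2SeqLM,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

model_name = "google/long-t5-tglobal-base"  # assumed small generative backbone
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Hypothetical training pairs: a textual diff representation as input,
# a human-written or synthetic edit summary as target.
pairs = [
    {"diff": "<before> born in 1975 <after> born in 1957",
     "summary": "fixed birth year typo"},
]
ds = Dataset.from_list(pairs)

def preprocess(example):
    # Tokenize the diff as the encoder input and the summary as labels.
    inputs = tokenizer(example["diff"], truncation=True, max_length=1024)
    labels = tokenizer(text_target=example["summary"],
                       truncation=True, max_length=64)
    inputs["labels"] = labels["input_ids"]
    return inputs

ds = ds.map(preprocess, remove_columns=["diff", "summary"])

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(output_dir="edisum-sketch",
                                  per_device_train_batch_size=2,
                                  num_train_epochs=1),
    train_dataset=ds,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```

At inference time, the fine-tuned model would recommend a summary for a new edit by running `model.generate` on the tokenized diff representation and decoding the result.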