CoditT5: Pretraining for Source Code and Natural Language Editing (2208.05446v2)

Published 10 Aug 2022 in cs.SE and cs.LG

Abstract: Pretrained LLMs have been shown to be effective in many software-related generation tasks; however, they are not well-suited for editing tasks as they are not designed to reason about edits. To address this, we propose a novel pretraining objective which explicitly models edits and use it to build CoditT5, an LLM for software-related editing tasks that is pretrained on large amounts of source code and natural language comments. We fine-tune it on various downstream editing tasks, including comment updating, bug fixing, and automated code review. By outperforming standard generation-based models, we demonstrate the generalizability of our approach and its suitability for editing tasks. We also show how a standard generation model and our edit-based model can complement one another through simple reranking strategies, with which we achieve state-of-the-art performance for the three downstream editing tasks.
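The abstract only names the reranking strategies that combine the generation model and the edit-based model; it does not spell them out. As a rough illustration of how such a combination might look, the sketch below pools candidates from both models and picks the one with the highest summed log-likelihood. The function, the scoring scheme, and the toy inputs are illustrative assumptions, not the paper's exact procedure.

```python
# Illustrative sketch (not the paper's exact method): rerank the pooled
# candidates from a generation model and an edit-based model by summing the
# log-likelihood each model assigns to every candidate.

def rerank(candidates, gen_scores, edit_scores):
    """Return the candidate with the highest combined score.

    candidates  -- output strings proposed by either model
    gen_scores  -- dict: candidate -> log-likelihood under the generation model
    edit_scores -- dict: candidate -> log-likelihood under the edit-based model
    """
    return max(candidates, key=lambda c: gen_scores[c] + edit_scores[c])


# Toy example with made-up scores:
cands = ["return x + 1", "return x - 1", "return abs(x)"]
gen = {"return x + 1": -1.2, "return x - 1": -2.5, "return abs(x)": -3.0}
edit = {"return x + 1": -0.8, "return x - 1": -2.9, "return abs(x)": -1.9}
print(rerank(cands, gen, edit))  # -> "return x + 1"
```

Under this kind of scheme, a candidate only wins if both models find it reasonably likely, which is one simple way two complementary models could correct each other's mistakes.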

Authors (5)
  1. Jiyang Zhang (11 papers)
  2. Sheena Panthaplackel (9 papers)
  3. Pengyu Nie (19 papers)
  4. Junyi Jessy Li (79 papers)
  5. Milos Gligoric (23 papers)
Citations (80)