CommitBART: A Large Pre-trained Model for GitHub Commits (2208.08100v2)

Published 17 Aug 2022 in cs.SE and cs.AI

Abstract: GitHub commits, which record code changes together with natural language messages describing them, play a critical role in helping software developers comprehend software evolution. To promote the development of the open-source software community, we collect a commit benchmark comprising over 7.99 million commits across 7 programming languages. Based on this benchmark, we present CommitBART, a large pre-trained encoder-decoder Transformer model for GitHub commits. The model is pre-trained with six pre-training tasks spanning three categories (i.e., denoising objectives, cross-modal generation, and contrastive learning) to learn commit fragment representations. Furthermore, we unify a "commit intelligence" framework with one understanding task and three generation tasks for commits. Comprehensive experiments on these tasks demonstrate that CommitBART significantly outperforms previous pre-trained models for code. Further analysis also reveals that each pre-training task enhances the model's performance.
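As a rough illustration of the denoising category of pre-training objectives the abstract mentions, the sketch below shows BART-style text infilling on a toy commit message: a span is masked in the input and the model is trained to reconstruct the original text. This is a minimal sketch, not CommitBART's actual pipeline; it uses `facebook/bart-base` as a stand-in checkpoint, since the abstract does not specify where CommitBART's weights or tokenizer are hosted.

```python
# Hedged sketch of a BART-style denoising (text-infilling) objective on a
# commit fragment, in the spirit of CommitBART's pre-training. The checkpoint
# "facebook/bart-base" is a stand-in assumption, not the paper's model.
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

# A toy commit message with a span corrupted by a single <mask> token.
corrupted = "Fix off-by-one error in <mask> when iterating over commits."
original = "Fix off-by-one error in the pagination loop when iterating over commits."

inputs = tokenizer(corrupted, return_tensors="pt")
labels = tokenizer(original, return_tensors="pt").input_ids

# Denoising loss: cross-entropy for reconstructing the original text
# from the corrupted input via the encoder-decoder.
loss = model(**inputs, labels=labels).loss
print(f"denoising loss: {loss.item():.4f}")
```

In actual pre-training this loss would be computed over batches of corrupted commit fragments (messages and code changes) and backpropagated; the single forward pass here only demonstrates the input/label structure of the objective.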

Authors (4)
  1. Shangqing Liu (28 papers)
  2. Yanzhou Li (5 papers)
  3. Xiaofei Xie (104 papers)
  4. Yang Liu (2253 papers)
Citations (15)
