CommitBART: A Large Pre-trained Model for GitHub Commits (2208.08100v2)

Published 17 Aug 2022 in cs.SE and cs.AI

Abstract: GitHub commits, which record code changes together with natural language messages describing them, play a critical role in helping software developers comprehend software evolution. To promote the development of the open-source software community, we collect a commit benchmark comprising over 7.99 million commits across 7 programming languages. Based on this benchmark, we present CommitBART, a large pre-trained encoder-decoder Transformer model for GitHub commits. The model is pre-trained with six pre-training tasks spanning three categories (i.e., denoising objectives, cross-modal generation, and contrastive learning) to learn commit fragment representations. Furthermore, we unify a "commit intelligence" framework with one understanding task and three generation tasks for commits. Comprehensive experiments on these tasks demonstrate that CommitBART significantly outperforms previous pre-trained models for code. Further analysis also reveals that each pre-training task enhances the model's performance.
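As a rough illustration of the denoising category of pre-training objectives the abstract mentions, the sketch below shows BART-style text infilling on a toy commit message: a span is masked in the input and the model is trained to reconstruct the original text. This is a minimal sketch, not CommitBART's actual pipeline; it uses `facebook/bart-base` as a stand-in checkpoint, since the abstract does not specify where CommitBART's weights or tokenizer are hosted.

```python
# Hedged sketch of a BART-style denoising (text-infilling) objective on a
# commit fragment, in the spirit of CommitBART's pre-training. The checkpoint
# "facebook/bart-base" is a stand-in assumption, not the paper's model.
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

# A toy commit message with a span corrupted by a single <mask> token.
corrupted = "Fix off-by-one error in <mask> when iterating over commits."
original = "Fix off-by-one error in the pagination loop when iterating over commits."

inputs = tokenizer(corrupted, return_tensors="pt")
labels = tokenizer(original, return_tensors="pt").input_ids

# Denoising loss: cross-entropy for reconstructing the original text
# from the corrupted input via the encoder-decoder.
loss = model(**inputs, labels=labels).loss
print(f"denoising loss: {loss.item():.4f}")
```

In actual pre-training this loss would be computed over batches of corrupted commit fragments (messages and code changes) and backpropagated; the single forward pass here only demonstrates the input/label structure of the objective.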

Authors (4)
  1. Shangqing Liu (28 papers)
  2. Yanzhou Li (5 papers)
  3. Xiaofei Xie (104 papers)
  4. Yang Liu (2253 papers)
Citations (15)
