
CoreGen: Contextualized Code Representation Learning for Commit Message Generation (2007.06934v3)

Published 14 Jul 2020 in cs.CL, cs.LG, and cs.SE

Abstract: Automatic generation of high-quality commit messages for code commits can substantially facilitate software developers' work and coordination. However, the semantic gap between source code and natural language poses a major challenge for the task. Several approaches have been proposed to alleviate the challenge, but none explicitly involves code contextual information during commit message generation. Specifically, existing research adopts static embeddings for code tokens, which map a token to the same vector regardless of its context. In this paper, we propose a novel Contextualized code representation learning strategy for commit message Generation (CoreGen). CoreGen first learns contextualized code representations that exploit the contextual information behind code commit sequences. The learned representations of code commits, built upon Transformer, are then fine-tuned for downstream commit message generation. Experiments on the benchmark dataset demonstrate the superior effectiveness of our model over baseline models, with at least a 28.18% improvement in BLEU-4 score. Furthermore, we highlight future opportunities in training contextualized code representations on larger code corpora as a solution to low-resource tasks, and in adapting the contextualized code representation framework to other code-to-text generation tasks.
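The abstract's core distinction is between static embeddings, where a token maps to one fixed vector, and contextualized representations, where a token's vector depends on the surrounding commit sequence. The following is a minimal NumPy sketch of that distinction only, not the paper's actual Transformer model: a static lookup table followed by a single self-attention layer with illustrative random weights. All names (`vocab`, `contextualize`, the toy token set) are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = {"if": 0, "x": 1, "==": 2, "0": 3, "return": 4}
d = 8

# Static embedding table: each token id maps to one fixed d-dim vector.
E = rng.normal(size=(len(vocab), d))

def static_embed(tokens):
    # Static embedding: "x" gets the same vector in every sequence.
    return np.stack([E[vocab[t]] for t in tokens])

# Illustrative self-attention projection matrices (randomly initialized).
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))

def contextualize(tokens):
    # One self-attention layer: each output vector is a weighted mix of
    # the whole sequence, so "x" now depends on its context.
    X = static_embed(tokens)
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(d)
    attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)
    return attn @ V

# "x" appears in two different contexts:
a = contextualize(["if", "x", "==", "0"])
b = contextualize(["return", "x"])
# Its static vector is identical in both, but its contextualized
# vectors (a[1] vs b[1]) differ.
```

In CoreGen's actual pipeline, such contextualized representations are learned over code commit sequences with a Transformer and then fine-tuned for commit message generation; this sketch only shows why the contextualized vectors carry information a static lookup cannot.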

Authors (6)
  1. Lun Yiu Nie (5 papers)
  2. Cuiyun Gao (97 papers)
  3. Zhicong Zhong (2 papers)
  4. Wai Lam (117 papers)
  5. Yang Liu (2253 papers)
  6. Zenglin Xu (145 papers)
Citations (41)
