Automatic Bi-modal Question Title Generation for Stack Overflow with Prompt Learning (2403.03677v1)
Abstract: When drafting question posts for Stack Overflow, developers may not accurately summarize the core problem in the question title, which can prevent these questions from receiving timely help. Therefore, improving the quality of question titles has attracted wide attention from researchers. An initial study aimed to automatically generate titles by analyzing only the code snippets in the question body, ignoring the helpful information in the corresponding problem descriptions. Therefore, we propose an approach, SOTitle+, that considers bi-modal information (i.e., the code snippets and the problem descriptions) in the question body. We formalize title generation for different programming languages as separate but related tasks and use multi-task learning to solve them. We then fine-tune the pre-trained language model CodeT5 to automatically generate the titles. However, the inconsistency of inputs and optimization objectives between the pre-training task and our investigated task may prevent fine-tuning from fully exploiting the knowledge in the pre-trained model. To address this issue, SOTitle+ further prompt-tunes CodeT5 with hybrid prompts (i.e., a mixture of hard and soft prompts). To verify the effectiveness of SOTitle+, we construct a large-scale, high-quality corpus from recent data dumps shared by Stack Overflow. Our corpus includes 179,119 high-quality question posts covering six popular programming languages. Experimental results show that SOTitle+ significantly outperforms four state-of-the-art baselines in both automatic and human evaluation. Our work indicates that considering bi-modal information and prompt learning for Stack Overflow title generation is a promising research direction.
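To make the bi-modal, prompt-based setup concrete, the following is a minimal sketch (not the authors' released code) of how a problem description and a code snippet can be combined behind a hard prompt and fed to CodeT5 to generate a title. The prompt wording, the `Salesforce/codet5-base` checkpoint, and the generation settings are illustrative assumptions; the soft (learnable) prompt tokens used in prompt-tuning are only described in a comment, since they live in embedding space rather than plain text.

```python
# A hedged sketch of bi-modal title generation with CodeT5 and a hard prompt.
# Checkpoint name, prompt template, and decoding parameters are assumptions,
# not the exact configuration reported in the paper.
from transformers import RobertaTokenizer, T5ForConditionalGeneration

# CodeT5 ships with a RoBERTa-style tokenizer; codet5-base is the public checkpoint.
tokenizer = RobertaTokenizer.from_pretrained("Salesforce/codet5-base")
model = T5ForConditionalGeneration.from_pretrained("Salesforce/codet5-base")

def build_hybrid_input(description: str, code: str, language: str) -> str:
    """Concatenate both modalities behind a hard prompt.

    In prompt-tuning, additional soft prompt tokens would be prepended as
    trainable embeddings; only the textual (hard) part is shown here.
    """
    return (
        f"Generate a {language} question title. "
        f"Description: {description} Code: {code}"
    )

description = "My list comprehension raises a NameError inside a class body."
code = "class A:\n    xs = [1, 2]\n    ys = [x for x in xs]"

inputs = tokenizer(
    build_hybrid_input(description, code, "Python"),
    return_tensors="pt", truncation=True, max_length=512,
)

# Beam search keeps the generated title short and fluent.
title_ids = model.generate(**inputs, max_length=32, num_beams=4, early_stopping=True)
print(tokenizer.decode(title_ids[0], skip_special_tokens=True))
```

In this sketch, multi-task learning across programming languages would amount to training one shared model on all six languages while signaling the language inside the prompt, as the `language` argument does above.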
Authors: Shaoyu Yang, Xiang Chen, Ke Liu, Guang Yang, Chi Yu