
A Comparative Study of Transformer-based Neural Text Representation Techniques on Bug Triaging (2310.06913v1)

Published 10 Oct 2023 in cs.SE, cs.CL, and cs.IR

Abstract: Often, the first step in managing bug reports is triaging each bug to the developer best suited to understand, localize, and fix it. Additionally, assigning a bug to a particular component of a software project can help expedite the fixing process. Despite their importance, these activities are challenging, and manual triaging can take days. Past studies have attempted to leverage the limited textual data of bug reports to train text classification models that automate this process, with varying degrees of success. However, the textual representations and machine learning models used in prior work are limited in their expressiveness, often failing to capture nuanced textual patterns that might otherwise aid in the triaging process. Recently, large, transformer-based, pre-trained neural text representation techniques such as BERT have achieved improved performance on several natural language processing tasks. However, the potential for using these techniques to improve upon prior approaches for automated bug triaging is not well studied or understood. Therefore, in this paper we offer one of the first investigations that fine-tunes transformer-based LLMs for the task of bug triaging on four open source datasets, spanning a collective 53 years of development history with over 400 developers and over 150 software project components. Our study includes both a quantitative and a qualitative analysis of effectiveness. Our findings illustrate that DeBERTa is the most effective technique across the triaging tasks of developer and component assignment, and the measured performance delta is statistically significant compared to other techniques. However, through our qualitative analysis, we also observe that each technique possesses unique abilities best suited to certain types of bug reports.
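
To make the task framing concrete, the sketch below treats developer assignment as text classification: a bug report's text goes in, a ranked list of candidate developers comes out. It is only an illustration under stated assumptions. It uses a small bag-of-words Naive Bayes classifier written with the Python standard library, which is closer in spirit to the earlier text classification approaches the abstract mentions than to the fine-tuned transformer models the paper studies. The reports, developer names, and function names (`train`, `rank_developers`) are hypothetical and do not come from the paper or its datasets.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def train(reports):
    """Count words per developer from (report_text, developer) pairs."""
    word_counts = defaultdict(Counter)  # developer -> word frequencies
    label_counts = Counter()            # developer -> number of reports
    vocab = set()
    for text, dev in reports:
        tokens = tokenize(text)
        word_counts[dev].update(tokens)
        label_counts[dev] += 1
        vocab.update(tokens)
    return word_counts, label_counts, vocab


def rank_developers(model, text):
    """Rank developers by Naive Bayes log-probability, best candidate first."""
    word_counts, label_counts, vocab = model
    total_reports = sum(label_counts.values())
    scores = {}
    for dev, n_reports in label_counts.items():
        # Laplace (add-one) smoothing over the shared vocabulary.
        denom = sum(word_counts[dev].values()) + len(vocab)
        score = math.log(n_reports / total_reports)
        for tok in tokenize(text):
            if tok in vocab:  # ignore words never seen in training
                score += math.log((word_counts[dev][tok] + 1) / denom)
        scores[dev] = score
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data, invented for illustration only.
TOY_REPORTS = [
    ("crash when rendering svg image in canvas", "alice"),
    ("canvas rendering glitch on resize", "alice"),
    ("login token expires too early", "bob"),
    ("oauth login fails with invalid token", "bob"),
]

model = train(TOY_REPORTS)
ranking = rank_developers(model, "svg canvas crash on resize")
print(ranking)
```

Component assignment has the same shape, with component names as labels instead of developers. A transformer-based approach replaces the word counts with a fine-tuned pre-trained encoder while keeping the same input and output.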
