An empirical study of ChatGPT-3.5 on question answering and code maintenance (2310.02104v1)
Abstract: Ever since the launch of ChatGPT in 2022, there has been rising concern that ChatGPT will replace programmers and eliminate jobs. Motivated by this widespread concern, we conducted an empirical study to systematically compare ChatGPT against programmers in question answering and software maintenance. We reused a dataset introduced by prior work, which includes 130 StackOverflow (SO) discussion threads referred to by the Java developers of 357 GitHub projects. We mainly investigated three research questions (RQs). First, how does ChatGPT compare with programmers when answering technical questions? Second, how do developers perceive the differences between ChatGPT's answers and SO answers? Third, how does ChatGPT compare with humans when revising code for maintenance requests? For RQ1, we provided the 130 SO questions to ChatGPT, and manually compared ChatGPT answers with the accepted/most popular SO answers in terms of relevance, readability, informativeness, comprehensiveness, and reusability. For RQ2, we conducted a user study with 30 developers, asking each developer to assess and compare 10 pairs of answers without knowing the information source (i.e., ChatGPT or SO). For RQ3, we distilled 48 software maintenance tasks from 48 GitHub projects citing the studied SO threads. We queried ChatGPT to revise a given Java file and to incorporate the code implementation for any prescribed maintenance requirement. Our study reveals interesting phenomena: for the majority of SO questions (97/130), ChatGPT provided better answers; in 203 of 300 ratings, developers preferred ChatGPT answers to SO answers; ChatGPT revised code correctly for 22 of the 48 tasks. Our research will expand people's knowledge of ChatGPT capabilities, and shed light on future adoption of ChatGPT by the software industry.
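To put the three headline counts on a common scale, the short sketch below converts them to percentages. The counts come directly from the abstract; the variable and function names (`results`, `share`, `summary`) are illustrative only and are not taken from the paper or its artifacts.

```python
# Counts as reported in the abstract: (favourable outcomes, total).
results = {
    "RQ1: SO questions where ChatGPT gave the better answer": (97, 130),
    "RQ2: ratings where developers preferred ChatGPT": (203, 300),
    "RQ3: maintenance tasks ChatGPT revised correctly": (22, 48),
}


def share(successes: int, total: int) -> float:
    """Return successes/total as a percentage rounded to one decimal place."""
    return round(100 * successes / total, 1)


# The 300 ratings in RQ2 follow from 30 developers each rating 10 answer pairs.
ratings_total = 30 * 10

summary = {label: share(s, t) for label, (s, t) in results.items()}

for label, pct in summary.items():
    print(f"{label}: {pct}%")
```

Under these counts, ChatGPT comes out ahead in roughly three quarters of the RQ1 comparisons and two thirds of the RQ2 ratings, but completes fewer than half of the RQ3 maintenance tasks correctly.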
- Md Mahir Asef Kabir
- Sk Adnan Hassan
- Xiaoyin Wang
- Ying Wang
- Hai Yu
- Na Meng