Exploring ChatGPT for Toxicity Detection in GitHub (2312.13105v1)

Published 20 Dec 2023 in cs.SE

Abstract: Fostering a collaborative and inclusive environment is crucial for the sustained progress of open source development. However, the prevalence of negative discourse, often manifested as toxic comments, poses significant challenges to developer well-being and productivity. To identify such negativity in project communications, especially within large projects, automated toxicity detection models are necessary. To train these models effectively, we need large software engineering-specific toxicity datasets. However, such datasets are limited in availability and often exhibit imbalance (e.g., only 6 in 1000 GitHub issues are toxic), posing challenges for training effective toxicity detection models. To address this problem, we explore a zero-shot LLM (ChatGPT) that is pre-trained on massive datasets but without being fine-tuned specifically for the task of detecting toxicity in software-related text. Our preliminary evaluation indicates that ChatGPT shows promise in detecting toxicity in GitHub, and warrants further investigation. We experimented with various prompts, including those designed for justifying model outputs, thereby enhancing model interpretability and paving the way for potential integration of ChatGPT-enabled toxicity detection into developer communication channels.
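
The core idea lends itself to a short illustration: a single zero-shot prompt that asks the model for a toxicity label plus a one-sentence justification, so that flagged comments arrive with an explanation. The sketch below uses the OpenAI Python client; the prompt wording, the `gpt-3.5-turbo` model choice, and the `classify` helper are illustrative assumptions, not the authors' exact experimental setup.

```python
# Minimal sketch of zero-shot toxicity classification with a
# justification prompt, in the spirit of the paper. The model name,
# prompt wording, and output format are illustrative assumptions,
# not the authors' exact prompts.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = (
    "You will be shown a comment from a GitHub issue discussion.\n"
    "Classify it as 'toxic' or 'non-toxic', then justify your answer "
    "in one sentence.\n\n"
    "Comment: {comment}"
)

def classify(comment: str) -> str:
    """Return the model's toxicity label and short justification."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumption: a ChatGPT-era model
        temperature=0,          # damp run-to-run nondeterminism
        messages=[
            {"role": "user", "content": PROMPT.format(comment=comment)}
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(classify("Did you even read my patch, or are you just lazy?"))
```

Setting `temperature=0` is one pragmatic way to make repeated classifications of the same comment more stable, which matters if such a detector were wired into a live developer communication channel.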

Authors (2)
  1. Shyamal Mishra (1 paper)
  2. Preetha Chatterjee (16 papers)