
Assessing the Influence of Toxic and Gender Discriminatory Communication on Perceptible Diversity in OSS Projects (2403.08113v2)

Published 12 Mar 2024 in cs.SE

Abstract: The presence of toxic and gender-identity derogatory language in open-source software (OSS) communities has recently become a focal point for researchers. Such comments not only lead to frustration and disengagement among developers but may also prompt them to leave OSS projects. Despite ample evidence suggesting that diverse teams enhance productivity, toxic or gender-identity discriminatory communication poses a significant threat to the participation of individuals from marginalized groups and, as such, may act as a barrier to fostering diversity and inclusion in OSS projects. However, there is a notable lack of research exploring the association between gender-based toxic and derogatory language and the perceptible diversity of OSS teams. Consequently, this study aims to investigate how such content influences the gender, ethnicity, and tenure diversity of OSS development teams. To achieve this, we extract data from active GitHub projects, assess various project characteristics, and identify instances of toxic and gender-discriminatory language within issue and pull request comments. Using these attributes, we construct a regression model to explore how they are associated with the perceptible diversity of those projects.
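The abstract does not say which diversity measure or regression form the study uses, so the following is only an illustrative sketch of the general idea: score each project's team with a diversity index, then regress that score on the share of toxic comments. The Gini-Simpson index, the single-predictor least-squares fit, the function names, and the toy data are all assumptions made for this example and are not taken from the paper.

```python
from collections import Counter


def gini_simpson(labels):
    """Gini-Simpson index: 1 - sum(p_i^2) over the group shares p_i.

    Returns 0.0 when every member has the same label and grows toward 1
    as members spread evenly over more groups.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one label")
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def ols_fit(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one predictor)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally sized samples of length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor has zero variance")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    # Hypothetical toy projects: invented values, NOT data from the study.
    projects = [
        {"toxic_ratio": 0.00, "genders": ["f", "m", "f", "m"]},
        {"toxic_ratio": 0.05, "genders": ["f", "m", "m", "m"]},
        {"toxic_ratio": 0.10, "genders": ["m", "m", "m", "m"]},
    ]
    xs = [p["toxic_ratio"] for p in projects]
    ys = [gini_simpson(p["genders"]) for p in projects]
    slope, intercept = ols_fit(xs, ys)
    print(f"slope={slope:.3f} intercept={intercept:.3f}")
```

A real analysis along these lines would need several predictors (project characteristics as well as toxicity measures) and a proper statistics package; the single-predictor fit here only shows the shape of the computation.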

Authors
  1. Sayma Sultana
  2. Gias Uddin
  3. Amiangshu Bosu