
Does Documentation Matter? An Empirical Study of Practitioners' Perspective on Open-Source Software Adoption (2403.03819v1)

Published 6 Mar 2024 in cs.SE

Abstract: In recent years, open-source software (OSS) has become increasingly prevalent in software product development. Although OSS documentation is the primary source of information that the developer community provides about a product, its role in the industry's adoption process has yet to be examined. We conducted semi-structured interviews and an online survey to provide insight into this area. Based on the interview and survey insights, we developed a topic model that automatically collects relevant information from OSS documentation. Additionally, in response to the challenges with OSS documentation reported in our survey, we propose DocMentor, a novel information augmentation approach that combines TF-IDF scores computed over an OSS documentation corpus with ChatGPT. By explaining technical terms and providing examples and references, our approach enriches the documentation context and improves practitioners' understanding. We assess the tool's effectiveness by surveying practitioners.
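The abstract says only that DocMentor combines TF-IDF scores over an OSS documentation corpus with ChatGPT. It does not give the scoring variant, the term-selection rule, or the prompt. The Python sketch below illustrates one plausible reading under stated assumptions: TF-IDF (term count divided by page length, times a smoothed IDF `ln((1 + N) / (1 + df)) + 1`) ranks the terms of one page against the rest of the corpus, and the highest-scoring non-stopword terms become candidates to send to a language model for explanation. The stopword list, the `k` cutoff, and the `build_prompt` template are hypothetical, and the actual ChatGPT call is omitted.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; a real system would likely use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "on", "for", "with", "is", "are", "by", "use"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep alphanumeric tokens, allowing internal
    dots or hyphens (e.g. "log4j", "tf-idf", "v2.1")."""
    return re.findall(r"[a-z0-9]+(?:[.\-][a-z0-9]+)*", text.lower())


def tfidf_scores(corpus: list[str]) -> list[dict[str, float]]:
    """Return one {term: score} dict per page, where score is
    (count / page length) * (ln((1 + N) / (1 + df)) + 1)."""
    docs = [tokenize(page) for page in corpus]
    n = len(docs)
    # Document frequency: on how many pages each term appears at least once.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc) or 1  # avoid division by zero on empty pages
        scores.append({
            term: (count / length) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return scores


def candidate_terms(corpus: list[str], page_index: int, k: int = 3) -> list[str]:
    """Return the k highest-scoring non-stopword terms of one page,
    breaking score ties alphabetically so the result is deterministic."""
    scores = tfidf_scores(corpus)[page_index]
    ranked = sorted((t for t in scores if t not in STOPWORDS),
                    key=lambda t: (-scores[t], t))
    return ranked[:k]


def build_prompt(term: str, page_text: str) -> str:
    """Hypothetical prompt template; the paper's actual prompt is not given."""
    return (f"Explain the technical term '{term}' as it is used in the "
            f"documentation below. Give a short example and a reference.\n\n"
            f"{page_text}")


if __name__ == "__main__":
    pages = [
        "Install the package with pip. The package supports asyncio.",
        "Configure the logger. The logger writes to stdout.",
        "Run the tests with pytest. The tests use fixtures.",
    ]
    terms = candidate_terms(pages, 0)
    print(terms)  # ['package', 'asyncio', 'install']
    print(build_prompt(terms[1], pages[0]))
```

Scoring each page against the whole corpus, rather than in isolation, means a word that appears on every page (such as "the") receives the minimum IDF of 1, while terms found on few pages score higher. This is why page-specific vocabulary tends to surface as candidates for explanation.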

Authors (4)
  1. Aaron Imani (4 papers)
  2. Shiva Radmanesh (2 papers)
  3. Iftekhar Ahmed (35 papers)
  4. Mohammad Moshirpour (6 papers)
