
Studying and Recommending Information Highlighting in Stack Overflow Answers (2401.01472v3)

Published 3 Jan 2024 in cs.CL, cs.IR, cs.LG, and cs.SE

Abstract: Context: Navigating the knowledge of Stack Overflow (SO) remains challenging. To make posts vivid to users, SO allows users to write and edit posts with Markdown or HTML so that they can use various formatting styles (e.g., bold, italic, and code) to highlight important information. Nonetheless, there have been limited studies on the highlighted information. Objective: In a recent study, we carried out the first large-scale exploratory study of the information highlighted in SO answers. To extend that work, we develop approaches that automatically recommend content to highlight, together with its formatting style, using neural network architectures initially designed for the Named Entity Recognition task. Method: We study 31,169,429 Stack Overflow answers. To train the recommendation models, we choose CNN-based and BERT-based models for each formatting type (i.e., Bold, Italic, Code, and Heading), using the information highlighting dataset we collected from SO answers. Results: Our models achieve a precision ranging from 0.50 to 0.72 across formatting types. It is easier to build a model to recommend Code than the other types. Models for the text formatting types (i.e., Heading, Bold, and Italic) suffer from low recall. Our analysis of failure cases indicates that the majority are due to missing identification. One explanation is that the models readily learn frequently highlighted words but struggle to learn less frequent ones (i.e., long-tail knowledge). Conclusion: Our findings suggest that it is possible to develop recommendation models for highlighting information in Stack Overflow answers with different formatting styles.
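To make the Named Entity Recognition framing concrete, the following is a minimal sketch of how inline Markdown formatting in an answer could be turned into token-level BIO labels, the kind of supervision an NER-style tagger consumes. It is an illustration under stated assumptions, not the authors' preprocessing: the function name `markdown_to_bio`, the regular expression, the whitespace tokenization, and the label names are all hypothetical, and Heading is left out for brevity.

```python
import re

# Hypothetical inline patterns (assumption, not from the paper).
# "**" is tried before "*" so bold is not misread as italic.
INLINE = re.compile(
    r"\*\*(?P<Bold>.+?)\*\*|\*(?P<Italic>.+?)\*|`(?P<Code>.+?)`"
)


def markdown_to_bio(text):
    """Strip inline Markdown markers and return (tokens, BIO labels).

    Tokens are split on whitespace, which is a simplifying assumption;
    a real pipeline would likely use a proper tokenizer.
    """
    tokens, labels = [], []
    pos = 0
    for match in INLINE.finditer(text):
        # Plain text before the highlighted span gets the "O" label.
        for tok in text[pos:match.start()].split():
            tokens.append(tok)
            labels.append("O")
        # The name of the group that matched is the formatting type.
        kind = match.lastgroup
        for i, tok in enumerate(match.group(kind).split()):
            tokens.append(tok)
            labels.append(("B-" if i == 0 else "I-") + kind)
        pos = match.end()
    # Trailing plain text after the last highlighted span.
    for tok in text[pos:].split():
        tokens.append(tok)
        labels.append("O")
    return tokens, labels


if __name__ == "__main__":
    answer = "Use `git rebase` but **never force push** here."
    for tok, lab in zip(*markdown_to_bio(answer)):
        print(f"{tok}\t{lab}")
```

A tagger trained on such pairs predicts a label per token, and contiguous `B-`/`I-` runs are then read back as recommended highlight spans. A highlighted span that the model labels entirely as `O` would count as a missed identification, the failure type the abstract reports as most common.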
