RepoAgent: An LLM-Powered Open-Source Framework for Repository-level Code Documentation Generation (2402.16667v1)
Abstract: Generative models have demonstrated considerable potential in software engineering, particularly in tasks such as code generation and debugging. However, their utilization in the domain of code documentation generation remains underexplored. To this end, we introduce RepoAgent, an LLM-powered open-source framework aimed at proactively generating, maintaining, and updating code documentation. Through both qualitative and quantitative evaluations, we have validated the effectiveness of our approach, showing that RepoAgent excels in generating high-quality repository-level documentation. The code and results are publicly accessible at https://github.com/OpenBMB/RepoAgent.