Lessons from Building StackSpot AI: A Contextualized AI Coding Assistant (2311.18450v3)

Published 30 Nov 2023 in cs.SE

Abstract: With their exceptional natural language processing capabilities, tools based on LLMs like ChatGPT and Co-Pilot have swiftly become indispensable resources in the software developer's toolkit. While recent studies suggest the potential productivity gains these tools can unlock, users still encounter drawbacks, such as generic or incorrect answers. Additionally, the pursuit of improved responses often leads to extensive prompt engineering efforts, diverting valuable time from writing code that delivers actual value. To address these challenges, a new breed of tools, built atop LLMs, is emerging. These tools aim to mitigate drawbacks by employing techniques like fine-tuning or enriching user prompts with contextualized information. In this paper, we delve into the lessons learned by a software development team venturing into the creation of such a contextualized LLM-based application, called CodeBuddy, which uses retrieval-based techniques. Over a four-month period, the team, despite lacking prior professional experience in LLM-based applications, built the product from scratch. Following the initial product release, we engaged with the development team responsible for the code-generative components. Through interviews and analysis of the application's issue tracker, we uncovered various intriguing challenges that teams working on LLM-based applications might encounter. For instance, we found three main groups of lessons: LLM-based lessons, User-based lessons, and Technical lessons. By understanding these lessons, software development teams could become better prepared to build LLM-based applications.
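
The abstract's central idea, enriching a user's prompt with retrieved, contextualized information before it reaches the LLM, can be illustrated with a minimal sketch in Python. This is not the StackSpot AI / CodeBuddy implementation: the Snippet, retrieve, and build_prompt names are hypothetical, the keyword-overlap scorer stands in for a real embedding-based retriever, and the final LLM call is omitted.

    # Minimal sketch of retrieval-based prompt enrichment (RAG-style).
    # Hypothetical names; a real system would use vector embeddings and an
    # actual LLM call rather than the toy keyword-overlap scorer shown here.
    from dataclasses import dataclass
    from typing import List

    @dataclass
    class Snippet:
        source: str  # e.g. a file path or doc URL in the team's knowledge base
        text: str    # contextual content that can be injected into a prompt

    def score(query: str, snippet: Snippet) -> int:
        # Toy relevance score: count of lowercase words shared by query and snippet.
        return len(set(query.lower().split()) & set(snippet.text.lower().split()))

    def retrieve(query: str, knowledge_base: List[Snippet], k: int = 3) -> List[Snippet]:
        # Rank snippets by relevance and keep the top k that match at all.
        ranked = sorted(knowledge_base, key=lambda s: score(query, s), reverse=True)
        return [s for s in ranked[:k] if score(query, s) > 0]

    def build_prompt(query: str, knowledge_base: List[Snippet]) -> str:
        # Prepend retrieved context so the model can answer with project-specific
        # detail instead of a generic response.
        context = retrieve(query, knowledge_base)
        context_block = "\n\n".join(f"[{s.source}]\n{s.text}" for s in context)
        return (
            "Use the following project context to answer.\n\n"
            f"{context_block}\n\n"
            f"Question: {query}\n"
        )

    if __name__ == "__main__":
        kb = [
            Snippet("docs/auth.md", "Authentication uses OAuth2 with the company SSO provider."),
            Snippet("docs/style.md", "Services are written in Kotlin and follow the internal style guide."),
        ]
        # The enriched prompt printed below would then be sent to the LLM of choice.
        print(build_prompt("How does authentication work in our services?", kb))

The design choice this sketch highlights is the one the abstract describes: the contextualization happens in the prompt-construction step, so the underlying LLM stays unchanged and only the retrieval index needs to reflect the organization's own code and documentation.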

Authors (6)
  1. Gustavo Pinto (33 papers)
  2. Cleidson de Souza (4 papers)
  3. João Batista Neto (2 papers)
  4. Alberto de Souza (6 papers)
  5. Tarcísio Gotto (1 paper)
  6. Edward Monteiro (3 papers)
Citations (3)