Building Your Own Product Copilot: Challenges, Opportunities, and Needs (2312.14231v1)
Abstract: A race is underway to embed advanced AI capabilities into products. These product copilots enable users to ask questions in natural language and receive relevant responses that are specific to the user's context. In fact, virtually every large technology company is looking to add these capabilities to their software products. However, for most software engineers, this is their first encounter with integrating AI-powered technology. Furthermore, software engineering processes and tools have not caught up with the challenges and scale involved in building AI-powered applications. In this work, we present the findings of an interview study with 26 professional software engineers responsible for building product copilots at various companies. From our interviews, we identified pain points at every step of the engineering process, as well as challenges that strained existing development practices. We then conducted group brainstorming sessions to collaborate on opportunities and tool designs for the broader software engineering community.
- Chris Parnin
- Gustavo Soares
- Rahul Pandita
- Sumit Gulwani
- Jessica Rich
- Austin Z. Henley