"We Need Structured Output": Towards User-centered Constraints on Large Language Model Output (2404.07362v1)
Abstract: LLMs can produce creative and diverse responses. However, to integrate them into current developer workflows, it is essential to constrain their outputs to specific formats or standards. In this work, we surveyed 51 experienced industry professionals to understand, from a user-centered perspective, the range of scenarios and motivations driving the need for output constraints. We identified 134 concrete use cases for constraints at two levels: low-level constraints, which ensure that the output adheres to a structured format and an appropriate length, and high-level constraints, which require the output to follow semantic and stylistic guidelines without hallucination. Critically, applying output constraints could not only streamline the currently repetitive process of developing, testing, and integrating LLM prompts for developers, but also enhance the user experience of LLM-powered features and applications. We conclude with a discussion of user preferences and needs for articulating intended constraints to LLMs, alongside an initial design for a constraint prototyping tool.
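To make the idea of a low-level constraint concrete, a developer integrating an LLM might check each response against a required structure before passing it downstream. The following is a minimal sketch, not from the paper itself; the function name and the specific checks (valid JSON, required keys, a length bound) are illustrative assumptions about what a format-and-length constraint could look like in practice.

```python
import json

def check_output(raw: str, required_keys: set, max_chars: int = 2000) -> dict:
    """Validate an LLM response against a low-level constraint:
    it must be valid JSON, contain the required keys, and stay
    within a length budget. Raises ValueError so the caller can
    retry the prompt or repair the output."""
    if len(raw) > max_chars:
        raise ValueError(f"output exceeds {max_chars} characters")
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"output is not valid JSON: {exc}")
    missing = required_keys - set(data)
    if missing:
        raise ValueError(f"output is missing keys: {sorted(missing)}")
    return data

# A response that satisfies the constraint parses cleanly:
parsed = check_output('{"title": "demo", "tags": ["a", "b"]}',
                      {"title", "tags"})
```

A check like this is the "test" half of the repetitive develop-test-integrate loop the abstract describes; constraint-aware generation (e.g., grammar-guided decoding or a provider's JSON mode) aims to make such post-hoc validation and retrying unnecessary.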