"We Need Structured Output": Towards User-centered Constraints on Large Language Model Output (2404.07362v1)

Published 10 Apr 2024 in cs.HC

Abstract: LLMs can produce creative and diverse responses. However, to integrate them into current developer workflows, it is essential to constrain their outputs to follow specific formats or standards. In this work, we surveyed 51 experienced industry professionals to understand the range of scenarios and motivations driving the need for output constraints from a user-centered perspective. We identified 134 concrete use cases for constraints at two levels: low-level, which ensures the output adheres to a structured format and an appropriate length, and high-level, which requires the output to follow semantic and stylistic guidelines without hallucination. Critically, applying output constraints could not only streamline the currently repetitive process of developing, testing, and integrating LLM prompts for developers, but also enhance the user experience of LLM-powered features and applications. We conclude with a discussion on user preferences and needs towards articulating intended constraints for LLMs, alongside an initial design for a constraint prototyping tool.
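The low-level constraints the abstract describes (structured format and appropriate length) are often enforced by validating model output after generation. The sketch below is illustrative only — the function name, the length budget, and the retry-free design are assumptions, not taken from the paper — but it shows the kind of post-hoc check a developer might wrap around an LLM call.

```python
import json

def enforce_low_level_constraints(raw_output: str, max_chars: int = 500) -> dict:
    """Check an LLM response against two low-level constraints:
    it must stay within a length budget and parse as a JSON object."""
    if len(raw_output) > max_chars:
        raise ValueError(f"output exceeds {max_chars} characters")
    try:
        parsed = json.loads(raw_output)
    except json.JSONDecodeError as err:
        raise ValueError(f"output is not valid JSON: {err}") from err
    if not isinstance(parsed, dict):
        raise ValueError("output must be a JSON object")
    return parsed

# A well-formed, short response passes validation...
ok = enforce_low_level_constraints('{"sentiment": "positive", "score": 0.9}')

# ...while free-form prose would raise ValueError and could trigger a retry.
```

In practice, such validators are paired with retry or constrained-decoding strategies (as surveyed in the paper's related work) so that malformed outputs are repaired rather than surfaced to end users.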
