
Interpreting User Requests in the Context of Natural Language Standing Instructions (2311.09796v2)

Published 16 Nov 2023 in cs.CL and cs.AI

Abstract: Users of natural language interfaces, generally powered by LLMs, often must repeat their preferences each time they make a similar request. We describe an approach to LLM-based dialogue modeling in which persistent user constraints and preferences -- collectively termed standing instructions -- are provided as additional context for such interfaces. For example, when a user states "I'm hungry", a previously expressed preference for Persian food can be automatically added to the LLM prompt, influencing the search for relevant restaurants. We develop NLSI, a language-to-program dataset consisting of over 2.4K dialogues spanning 17 domains, where each dialogue is paired with a user profile (a set of user-specific standing instructions) and corresponding structured representations (API calls). A key challenge in NLSI is to identify which subset of the standing instructions is applicable to a given dialogue. NLSI contains diverse phenomena, from simple preferences to interdependent instructions, such as triggering a hotel search whenever the user is booking tickets to an event. We conduct experiments on NLSI using prompting with LLMs and various retrieval approaches, achieving a maximum of 44.7% exact match on API prediction. Our results demonstrate the challenges in identifying the relevant standing instructions and in interpreting them into API calls.
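The pipeline described in the abstract can be sketched as: select the subset of standing instructions from the user profile that applies to the current dialogue, add them to the LLM prompt, and have the model emit structured API calls. The sketch below is an illustrative assumption, not the paper's released code; the token-overlap scorer merely stands in for the retrieval approaches the paper evaluates, and the function names (`select_instructions`, `build_prompt`) and example API (`GetRestaurants`) are hypothetical.

```python
# Illustrative sketch (not the authors' implementation) of the standing-
# instructions pipeline: a user profile is a set of standing instructions;
# the applicable subset is selected and prepended to the prompt from which
# an LLM would generate API calls.

import re
from dataclasses import dataclass


@dataclass
class StandingInstruction:
    text: str  # e.g. "If I ask for restaurants, prefer Persian cuisine."


def token_overlap(a: str, b: str) -> int:
    """Crude relevance score: number of shared lowercase word tokens."""
    ta = set(re.findall(r"[a-z0-9]+", a.lower()))
    tb = set(re.findall(r"[a-z0-9]+", b.lower()))
    return len(ta & tb)


def select_instructions(dialogue: str,
                        profile: list[StandingInstruction],
                        k: int = 3) -> list[StandingInstruction]:
    """Pick the k instructions most lexically similar to the dialogue.

    The paper's task is harder than this heuristic suggests: "I'm hungry"
    shares no tokens with a Persian-food preference, and interdependent
    instructions (book event tickets -> also search hotels) must be chained.
    """
    ranked = sorted(profile,
                    key=lambda ins: token_overlap(dialogue, ins.text),
                    reverse=True)
    return ranked[:k]


def build_prompt(dialogue: str,
                 selected: list[StandingInstruction]) -> str:
    """Assemble the prompt from which an LLM would emit API calls."""
    lines = ["Applicable standing instructions:"]
    lines += [f"- {ins.text}" for ins in selected]
    lines += ["", f"User: {dialogue}",
              'Output the API call(s), e.g. '
              'GetRestaurants(cuisine="Persian", city="...")']
    return "\n".join(lines)


if __name__ == "__main__":
    profile = [
        StandingInstruction("If I ask for restaurants, prefer Persian cuisine."),
        StandingInstruction("When I book event tickets, also search for nearby hotels."),
        StandingInstruction("My preferred city for searches is Edinburgh."),
    ]
    request = "Can you find some restaurants for dinner tonight?"
    chosen = select_instructions(request, profile, k=2)
    print(build_prompt(request, chosen))
```

Under these assumptions, the restaurant-related instruction ranks first for the example request; a real system would replace the overlap heuristic with the BM25, dense-retrieval, or LLM-based selection strategies the paper compares.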

Authors (6)
  1. Nikita Moghe (12 papers)
  2. Patrick Xia (26 papers)
  3. Jacob Andreas (116 papers)
  4. Jason Eisner (56 papers)
  5. Benjamin Van Durme (173 papers)
  6. Harsh Jhamtani (26 papers)
Citations (2)