FLAP: Flow-Adhering Planning with Constrained Decoding in LLMs (2403.05766v3)
Abstract: Planning is a crucial task for agents in task-oriented dialogs (TODs). Human agents typically resolve user issues by following predefined workflows, decomposing workflow steps into actionable items, and performing actions by executing APIs in order, all of which require reasoning and planning. With recent advances in LLMs, there have been increasing attempts to use them for task planning and API usage. However, LLMs do not guarantee that generated plans stay faithful to predefined workflows and API dependencies. Moreover, real-life workflows are often custom-defined and prone to change, so quick adaptation is desirable. To study this, we propose the problem of faithful planning in TODs, which requires resolving user intents by following predefined flows while preserving API dependencies. To solve this problem, we propose FLAP, a Flow-Adhering Planning algorithm based on constrained decoding with a lookahead heuristic for LLMs. Our algorithm alleviates the need to finetune LLMs on domain-specific (plan/dependency) data, enables quick adaptation to predefined flows, and outperforms other decoding- and prompting-based baselines. Further, it empowers smaller LLMs (7B) to perform on par with larger LLMs (30B-40B).
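Since the abstract only names the mechanism, the following is a minimal, self-contained Python sketch of what constrained decoding with a lookahead heuristic can look like for plan generation. The toy refund flow, the API dependency graph, the `model_score` stand-in, and all identifiers are hypothetical illustrations, not the paper's actual implementation.

```python
# Illustrative sketch of constrained decoding with a lookahead heuristic,
# in the spirit of FLAP. The flow, dependency graph, and mock scorer below
# are assumptions for demonstration, not the paper's implementation.

DEPENDENCIES = {  # api -> APIs that must already appear earlier in the plan
    "authenticate": set(),
    "lookup_order": {"authenticate"},
    "check_refund_policy": {"lookup_order"},
    "issue_refund": {"lookup_order", "check_refund_policy"},
}
FLOW = ["authenticate", "lookup_order", "check_refund_policy", "issue_refund"]

def model_score(plan, candidate):
    """Stand-in for the LLM's log-probability of `candidate` given the plan
    prefix (the prefix is unused in this toy scorer)."""
    return -FLOW.index(candidate)  # mildly prefer earlier flow steps

def admissible(plan, candidate):
    """Hard constraint: candidate is unused and all its dependencies are met."""
    return candidate not in plan and DEPENDENCIES[candidate] <= set(plan)

def lookahead(plan, depth=2):
    """Heuristic: how many further flow steps a short greedy rollout covers."""
    covered, simulated = 0, list(plan)
    for _ in range(depth):
        options = [a for a in FLOW if admissible(simulated, a)]
        if not options:
            break
        simulated.append(max(options, key=lambda a: model_score(simulated, a)))
        covered += 1
    return covered

def flap_decode(lam=1.0):
    """Greedy constrained decoding: model score plus weighted lookahead."""
    plan = []
    while len(plan) < len(FLOW):
        options = [a for a in FLOW if admissible(plan, a)]
        if not options:
            break  # no dependency-respecting continuation exists
        plan.append(max(options, key=lambda a: model_score(plan, a)
                                               + lam * lookahead(plan + [a])))
    return plan

if __name__ == "__main__":
    print(flap_decode())  # -> a plan that respects the API dependency order
```

In this sketch the dependency filter enforces the hard constraints exactly, while the weighted lookahead term plays the role of the A*-style heuristic: a candidate step is preferred when a short greedy rollout from it can still cover more of the predefined flow.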