
FLAP: Flow-Adhering Planning with Constrained Decoding in LLMs (2403.05766v3)

Published 9 Mar 2024 in cs.CL

Abstract: Planning is a crucial task for agents in task oriented dialogs (TODs). Human agents typically resolve user issues by following predefined workflows, decomposing workflow steps into actionable items, and performing actions by executing APIs in order; all of which require reasoning and planning. With the recent advances in LLMs, there have been increasing attempts to use them for task planning and API usage. However, the faithfulness of the plans to predefined workflows and API dependencies, is not guaranteed with LLMs. Moreover, workflows in real life are often custom-defined and prone to changes; hence, adaptation is desirable. To study this, we propose the problem of faithful planning in TODs that needs to resolve user intents by following predefined flows and preserving API dependencies. To solve this problem, we propose FLAP, a Flow-Adhering Planning algorithm based on constrained decoding with lookahead heuristic for LLMs. Our algorithm alleviates the need for finetuning LLMs using domain specific (plan/dependency) data, enables quick adaptation to predefined flows, and outperforms other decoding and prompting-based baselines. Further, our algorithm empowers smaller LLMs (7B) to perform at par larger LLMs (30B-40B).


Summary

  • The paper introduces FLAP, which leverages constrained decoding with lookahead heuristics to enforce workflow and API dependency adherence.
  • FLAP significantly reduces planning errors, enabling smaller LLMs to achieve performance comparable to much larger models in task-oriented dialogs.
  • FLAP uses dynamic dependency graphs to adapt plans in real time, ensuring reliable execution of customized workflows in practical applications.

Understanding FLAP: Enhancing Task-Oriented Dialogs with Flow Adhering Planning in LLMs

Planning in Task-Oriented Dialogs (TODs)

In task-oriented dialogs, agents must perform a sequence of actions, typically API calls, to fulfill user requests. This requires identifying not only which actions to perform but also the order in which they must be executed, which can be challenging. A common approach is to prompt LLMs directly to generate a plan that satisfies the request. However, a significant limitation of using LLMs this way is their tendency to deviate from the predefined workflows and API dependencies dictated by the task domain.
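To make the faithfulness requirement concrete, the check below sketches what it means for a plan to preserve API dependencies. The API names and their dependency graph are hypothetical illustrations, not from the paper:

```python
# Hypothetical flight-booking domain: each API maps to the set of APIs
# that must have been executed before it.
API_DEPS = {
    "search_flights": set(),
    "check_seat_availability": {"search_flights"},
    "book_flight": {"check_seat_availability"},
    "send_confirmation": {"book_flight"},
}

def plan_is_faithful(plan):
    """Return True iff every API in the plan is preceded by all of its dependencies."""
    executed = set()
    for api in plan:
        if not API_DEPS[api] <= executed:  # some prerequisite has not run yet
            return False
        executed.add(api)
    return True

print(plan_is_faithful(["search_flights", "check_seat_availability", "book_flight"]))  # True
print(plan_is_faithful(["book_flight", "search_flights"]))  # False: booking before searching
```

An unconstrained LLM can emit a fluent plan that fails exactly this kind of check; FLAP's goal is to guarantee it passes by construction rather than by post-hoc validation.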

The FLAP Algorithm

The paper introduces FLAP (Flow Adhering Planning), a novel algorithm that enhances LLMs' ability to generate plans for TODs by adhering to predefined workflows and API dependencies. Unlike traditional approaches that might require retraining or fine-tuning LLMs with domain-specific data, FLAP operates using constrained decoding based on lookahead heuristics. This is particularly beneficial in real-world scenarios where workflows and API dependencies are often customized and subject to change.

FLAP's constrained decoding maintains dynamic dependency graphs for both APIs and workflow steps, ensuring that plans generated by LLMs preserve these dependencies. Using beam search with lookahead, it scores potential next actions by their alignment with the permitted actions inferred from the dependency graphs. FLAP also incorporates several scoring components into its heuristic function, such as the alignment of generated thoughts with permitted workflow steps and APIs, thereby keeping the generated plan closely tied to the task context.
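The core idea above can be sketched as follows. This is an illustrative simplification, not the paper's implementation: the mock log-probabilities stand in for the LLM's beam scores, and the hard penalty for disallowed actions stands in for FLAP's alignment-based heuristic components:

```python
import math

def permitted_next(api_deps, executed):
    """APIs whose dependencies are all satisfied and that have not yet run."""
    return {api for api, deps in api_deps.items()
            if api not in executed and deps <= executed}

def score_candidate(lm_logprob, candidate_api, allowed):
    """Combine the LM's own score with a constraint-alignment term:
    disallowed actions are pruned outright (-inf)."""
    alignment = 0.0 if candidate_api in allowed else -math.inf
    return lm_logprob + alignment

# Hypothetical dependency graph: "b" requires "a", "c" requires "b".
api_deps = {"a": set(), "b": {"a"}, "c": {"b"}}
executed = {"a"}
allowed = permitted_next(api_deps, executed)   # only "b" is permitted now

# Mock beam candidates as (api, LM log-prob). The raw LM prefers "c",
# but "c" violates the dependency graph and is pruned.
candidates = [("b", -0.5), ("c", -0.1)]
best = max(candidates, key=lambda c: score_candidate(c[1], c[0], allowed))
print(best[0])  # "b"
```

In FLAP itself the alignment term is a soft heuristic computed from lookahead rollouts and thought/step similarity rather than a hard mask, but the selection principle, reranking beam candidates by LM score plus dependency-graph alignment, is the same.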

Performance and Evaluation

FLAP was evaluated on a novel dataset spanning multiple domains, intents, and associated workflows, showcasing its adaptability across scenarios. The results show that LLMs equipped with FLAP significantly outperform decoding- and prompting-based baselines in faithful plan generation, notably reducing errors related to API and workflow-step dependencies. Notably, applying FLAP to smaller LLMs (e.g., 7B parameters) yielded performance comparable to much larger models (30B-40B parameters), demonstrating that FLAP improves planning without requiring larger models.

Implications and Forward Look

The introduction of FLAP opens up several avenues for future work, particularly in dynamic planning scenarios where plans may need to be adjusted in real-time based on conversational context or external events. Moreover, FLAP's ability to effectively utilize smaller LLMs for complex planning tasks hints at broader applicability in resource-constrained environments.

Looking ahead, further refinement of FLAP's constrained decoding strategies could provide even finer control over the planning process, enabling more nuanced adherence to complex workflows and dependencies. Additionally, integrating external knowledge sources or real-time API call feedback into FLAP's planning process could further enhance its effectiveness and reliability in practical applications.

In summary, FLAP represents a significant advance in leveraging LLMs for task-oriented dialogs, emphasizing the importance of structure and constraint adherence in plan generation tasks. Its development marks a step forward in the journey toward more effective, efficient, and adaptable automated dialog agents.
