
The Case for Developing a Foundation Model for Planning-like Tasks from Scratch (2404.04540v1)

Published 6 Apr 2024 in cs.AI

Abstract: Foundation Models (FMs) have revolutionized many areas of computing, including Automated Planning and Scheduling (APS). For example, a recent study found them useful for planning problems: plan generation, language translation, model construction, multi-agent planning, interactive planning, heuristics optimization, tool integration, and brain-inspired planning. Beyond APS, there are many seemingly related tasks that involve generating a series of actions, with varying guarantees of executability, to achieve intended goals. We collectively call these planning-like (PL) tasks; examples include business processes, programs, workflows, and guidelines, where researchers have considered using FMs. However, previous works have primarily focused on pre-trained, off-the-shelf FMs, optionally fine-tuning them. This paper discusses the need for a comprehensive FM for PL tasks built from scratch and explores its design considerations. We argue that such an FM will open new and efficient avenues for PL problem-solving, just as LLMs are creating for APS.


Summary

  • The paper argues that current NLP foundation models lack the formalism required for generating precise action sequences in planning-like tasks.
  • It identifies a critical gap in handling execution semantics and task-specific complexities that general models fail to capture.
  • It proposes a novel training methodology incorporating custom tokenizers, pre-training tasks, and evaluation metrics to create efficient, domain-tailored models.

Overview of the Need for a Foundation Model for Planning-like Tasks

The paper "The Case for Developing a Foundation Model for Planning-like Tasks from Scratch" by Biplav Srivastava and Vishal Pallagani argues for developing a specialized Foundation Model (FM) for Planning-like (PL) tasks. The authors contend that existing Foundation Models, pre-trained primarily for NLP tasks, are inadequate for capturing the nuanced requirements of planning tasks, which involve generating specific action sequences with varied execution guarantees. The paper outlines a framework for designing and training a bespoke FM tailored to the demands of PL tasks, leveraging insights from Automated Planning and Scheduling (APS).

Key Contributions

  1. Clarification of Planning-like Tasks: The paper introduces the concept of PL tasks, encompassing business processes, dialogues, guidelines, instructions, design drawings, programs, and workflows. Each of these tasks involves generating sequences of actions or decisions to achieve specific goals. While existing FMs are being explored for these tasks, their efficacy remains limited due to the lack of proper formalism akin to that available in APS.
  2. Identifying the Gap: Current FMs are trained on general-purpose pre-training tasks and datasets, leaving them ill-equipped to handle the intricacies crucial to PL tasks, such as action sequence generation, execution semantics, and task-specific complexities. The paper highlights the limited success of fine-tuning off-the-shelf models for domain-specific applications and argues that training from scratch could yield models better aligned with PL objectives.
  3. Proposed Training Methodology: The paper proposes a comprehensive training procedure involving a specialized tokenizer, a tailored model architecture, and novel pre-training tasks designed to capture the complex requirements of PL tasks. It also suggests leveraging domain-specific datasets and proposes evaluation metrics tailored to assessing an FM's performance on PL tasks.
  4. Novel Pre-training Tasks: Drawing attention to limitations in current FMs, the paper suggests unique pre-training tasks such as Next Action Prediction, Execution Simulation, and Action and Effect Modeling. These tasks aim to impart an understanding of temporal planning, execution semantics, and action consequences, thereby enhancing the model's decision-making prowess.
  5. Implications and Practical Considerations: Beyond theoretical implications, the paper discusses practical considerations in developing such an FM, including strategies for pruning, quantization, and knowledge distillation to develop compact and efficient models that can be deployed in resource-constrained environments.
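The specialized tokenizer mentioned in point 3 can be pictured with a minimal sketch. This is an illustration under assumed conventions, not the paper's actual design: plan steps in a PDDL-like syntax are split so that action names and objects become atomic tokens (rather than generic subwords), with an assumed `<eoa>` marker closing each action.

```python
# Hypothetical plan-aware tokenizer: action names and objects are atomic tokens,
# so e.g. "pick-up" is never split across subword boundaries.

def plan_tokenize(plan_text, vocab):
    """Map a newline-separated plan to token ids, growing vocab as needed."""
    tokens = []
    for step in plan_text.strip().split("\n"):
        parts = step.strip("() ").split()   # "(pick-up b1)" -> ["pick-up", "b1"]
        tokens.extend(vocab.setdefault(p, len(vocab)) for p in parts)
        tokens.append(vocab.setdefault("<eoa>", len(vocab)))  # end-of-action marker
    return tokens

vocab = {}
ids = plan_tokenize("(pick-up b1)\n(stack b1 b2)", vocab)
# Repeated objects (here "b1") reuse the same token id across actions.
```

The design choice this sketch gestures at: a tokenizer aligned with plan structure keeps the model's vocabulary in one-to-one correspondence with the planning domain's actions and objects.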
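Similarly, the Next Action Prediction pre-training task from point 4 can be sketched as deriving (context, next action) supervision pairs from plan traces. The plan format and helper name below are assumptions for illustration, not taken from the paper:

```python
# Hypothetical construction of Next Action Prediction training pairs:
# each prefix of an executed plan becomes the context for predicting its next step.

def next_action_pairs(plan):
    """Yield (prefix-of-actions, next_action) pairs from an ordered plan."""
    pairs = []
    for i, action in enumerate(plan):
        prefix = tuple(plan[:i])   # actions executed so far (the context)
        pairs.append((prefix, action))
    return pairs

# A toy Blocksworld-style plan expressed as ground actions.
plan = ["pick-up b1", "stack b1 b2", "pick-up b3", "stack b3 b1"]
pairs = next_action_pairs(plan)
# The first pair asks the model to predict the opening action from an empty context.
```

A single n-step plan thus yields n training examples, which is what lets the objective teach temporal ordering rather than only surface text continuation.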
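Among the compression strategies in point 5, post-training quantization is the simplest to illustrate. The following is a minimal sketch of symmetric per-tensor 8-bit weight quantization, a generic technique and not the paper's specific recipe:

```python
import numpy as np

def quantize_int8(weights):
    """Symmetric per-tensor quantization of float weights to int8."""
    scale = np.abs(weights).max() / 127.0     # map the largest magnitude to 127
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Approximate reconstruction of the original float weights."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.27, 0.0, 1.0], dtype=np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)   # close to w, at a quarter of the storage cost
```

Storing int8 instead of float32 cuts weight memory by roughly 4x, which is the kind of saving that makes deployment in resource-constrained environments plausible.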

Implications for Future Research

The pursuit of a Foundation Model customized for PL tasks could significantly impact both theoretical approaches to AI planning and practical implementations across domains such as business process management, software engineering, and complex task orchestration. By incorporating multi-modal learning and domain-specific pre-training paradigms, future research could explore cross-domain applicability and the challenges related to grounding, alignment, and instructability, which are crucial for effective real-world deployment.

Conclusion

The paper presents a persuasive argument for the necessity of developing a Foundation Model specifically designed for Planning-like tasks. It addresses the inadequacies of existing models for this purpose and offers a detailed roadmap for creating more specialized, effective, and efficient systems tailored to the diverse needs of PL tasks. This work establishes the foundational considerations essential for advancing AI's capability in generating, executing, and validating plans across a broad spectrum of applications.