
System-1.x: Learning to Balance Fast and Slow Planning with Language Models (2407.14414v1)

Published 19 Jul 2024 in cs.AI, cs.CL, and cs.LG

Abstract: LLMs can be used to solve long-horizon planning problems in two distinct modes: a fast 'System-1' mode, directly generating plans without any explicit search or backtracking, and a slow 'System-2' mode, planning step-by-step by explicitly searching over possible actions. While System-2 is typically more effective, it is also more computationally expensive, making it infeasible for long plans or large action spaces. Moreover, isolated System-1 or 2 ignores the user's end goals, failing to provide ways to control the model's behavior. To this end, we propose the System-1.x Planner, a controllable planning framework with LLMs that is capable of generating hybrid plans and balancing between the two planning modes based on the difficulty of the problem at hand. System-1.x consists of (i) a controller, (ii) a System-1 Planner, and (iii) a System-2 Planner. Based on a user-specified hybridization factor (x) governing the mixture between System-1 and 2, the controller decomposes a problem into sub-goals, and classifies them as easy or hard to be solved by either System-1 or 2, respectively. We fine-tune all three components on top of a single base LLM, requiring only search traces as supervision. Experiments with two diverse planning tasks -- Maze Navigation and Blocksworld -- show that our System-1.x Planner outperforms a System-1 Planner, a System-2 Planner trained to approximate A* search, and also a symbolic planner (A*). We demonstrate the following key properties of our planner: (1) controllability: increasing the hybridization factor (e.g., System-1.75 vs 1.5) performs more search, improving performance, (2) flexibility: by building a neuro-symbolic variant with a neural System-1 and a symbolic System-2, we can use existing symbolic methods, and (3) generalizability: by being able to learn from different search algorithms, our method is robust to the choice of search algorithm.

Authors (6)
  1. Swarnadeep Saha (19 papers)
  2. Archiki Prasad (18 papers)
  3. Justin Chih-Yao Chen (9 papers)
  4. Peter Hase (29 papers)
  5. Elias Stengel-Eskin (49 papers)
  6. Mohit Bansal (304 papers)
Citations (6)

Summary

System-1.x: Learning to Balance Fast and Slow Planning with Language Models

The paper "System-1.x: Learning to Balance Fast and Slow Planning with Language Models" introduces a hybrid planning framework that combines fast, direct plan generation with slower, step-by-step search. It addresses the limitations of LLMs on long-horizon planning tasks by balancing speed and accuracy according to the difficulty of the problem.

Overview

LLM planning can operate in two distinct modes: System-1 and System-2. System-1 approaches generate a plan directly, without explicit search or backtracking; they are fast but often lack robustness and accuracy, especially on complex tasks. Conversely, System-2 approaches plan step by step by explicitly searching over possible actions, which yields higher accuracy at a greater computational cost and can become infeasible for long plans or large action spaces.

The System-1.x Planner, proposed in this paper, strikes a balance between these two paradigms. It consists of three primary components:

  1. Controller: Decomposes a planning problem into sub-goals and classifies them as either "easy" or "hard."
  2. System-1 Planner: Addresses the easier sub-goals by generating a plan directly, without explicit search.
  3. System-2 Planner: Deals with harder sub-goals through more deliberate, search-based methods.
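
The division of labor among the three components can be illustrated with a minimal sketch. It assumes the controller has already produced a list of labeled sub-goals; the names `SubGoal`, `solve_hybrid`, `fast_planner`, and `slow_planner` are hypothetical and do not come from the paper, and the two planners are stand-in functions rather than fine-tuned LLMs.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

State = Tuple[int, int]  # hypothetical state type: a grid cell
Plan = List[str]         # a plan is a list of action names
Planner = Callable[[State, State], Plan]


@dataclass
class SubGoal:
    start: State
    goal: State
    hard: bool  # label assigned by the controller


def solve_hybrid(sub_goals: List[SubGoal], system1: Planner, system2: Planner) -> Plan:
    """Route each sub-goal to System-1 (easy) or System-2 (hard) and concatenate the sub-plans."""
    plan: Plan = []
    for sg in sub_goals:
        planner = system2 if sg.hard else system1
        plan.extend(planner(sg.start, sg.goal))
    return plan


# Stand-in planners that only record which mode handled the sub-goal.
def fast_planner(start: State, goal: State) -> Plan:
    return [f"system1:{start}->{goal}"]


def slow_planner(start: State, goal: State) -> Plan:
    return [f"system2:{start}->{goal}"]


demo = [
    SubGoal((0, 0), (0, 2), hard=False),
    SubGoal((0, 2), (3, 2), hard=True),
    SubGoal((3, 2), (4, 4), hard=False),
]
hybrid_plan = solve_hybrid(demo, fast_planner, slow_planner)
```

In this sketch the final plan is simply the concatenation of the sub-plans in order, so each sub-goal's start state is expected to match the previous sub-goal's goal state.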

Methodology

The distinguishing feature of System-1.x is its controllability, governed by a user-specified hybridization factor x. This factor sets the mixture between System-1 and System-2 planning (for example, System-1.75 performs more search than System-1.5), and the controller assigns each sub-goal to one of the two planners based on its estimated difficulty.

Training

To facilitate the training of System-1.x, search traces from various classical planning problems are employed:

  • System-1 Data: Simple plans produced heuristically.
  • System-2 Data: Search trajectories generated by algorithms such as A*.
  • Controller Data: Derived by decomposing plans into sub-goals using a sliding window technique and classifying them according to their difficulty, defined by a heuristic function.
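
One way to picture the sliding-window step is sketched below. It assumes each plan step has a numeric hardness score and that the hybridization factor x sets the fraction of steps handed to System-2 as one contiguous window; both assumptions, along with the names `hardest_window` and `label_steps`, are illustrative and may differ from the paper's exact procedure.

```python
import math
from typing import List, Sequence, Tuple


def hardest_window(hardness: Sequence[float], x: float) -> Tuple[int, int]:
    """Return [start, end) of the contiguous window covering a fraction x of the
    steps that has the largest total hardness (assumed selection rule)."""
    n = len(hardness)
    width = min(n, math.floor(x * n + 0.5))  # round half up
    if width <= 0:
        return (0, 0)
    current = sum(hardness[:width])
    best_start, best_sum = 0, current
    for start in range(1, n - width + 1):
        # Slide the window one step to the right.
        current += hardness[start + width - 1] - hardness[start - 1]
        if current > best_sum:
            best_start, best_sum = start, current
    return (best_start, best_start + width)


def label_steps(hardness: Sequence[float], x: float) -> List[str]:
    """Label steps inside the selected window 'hard' (System-2) and the rest 'easy' (System-1)."""
    start, end = hardest_window(hardness, x)
    return ["hard" if start <= i < end else "easy" for i in range(len(hardness))]


# Hypothetical per-step hardness, e.g. the number of obstacles near each step.
scores = [0, 1, 3, 2, 0, 0]
labels = label_steps(scores, 0.5)
```

With x = 0 every step is labeled easy and with x = 1 every step is labeled hard, which corresponds to pure System-1 and pure System-2 planning, respectively.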

Evaluation

The paper evaluates System-1.x through experiments on two diverse planning tasks:

  • Maze Navigation: Involves navigating a 5x5 maze with obstacles.
  • Blocksworld: Requires reconfiguring blocks on a table to a predefined goal state.
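
For reference, the symbolic A* baseline on a small grid maze can be sketched as follows. The maze layout, the Manhattan-distance heuristic, and the convention of counting each expanded cell as one "state explored" are assumptions made for illustration; the paper's exact cost accounting may differ.

```python
import heapq
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]


def astar(grid: List[str], start: Cell, goal: Cell) -> Tuple[Optional[List[Cell]], int]:
    """A* over a 4-connected grid ('#' is a wall). Returns (path or None, states explored)."""
    rows, cols = len(grid), len(grid[0])

    def h(cell: Cell) -> int:
        # Manhattan distance: admissible for unit-cost 4-connected moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    frontier = [(h(start), 0, start)]  # entries are (f = g + h, g, cell)
    parent: Dict[Cell, Optional[Cell]] = {start: None}
    cost: Dict[Cell, int] = {start: 0}
    closed = set()
    explored = 0
    while frontier:
        _, g, cur = heapq.heappop(frontier)
        if cur in closed:
            continue  # stale queue entry
        closed.add(cur)
        explored += 1
        if cur == goal:
            path: List[Cell] = []
            node: Optional[Cell] = cur
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1], explored
        r, c = cur
        for nxt in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "#":
                ng = g + 1
                if ng < cost.get(nxt, float("inf")):
                    cost[nxt] = ng
                    parent[nxt] = cur
                    heapq.heappush(frontier, (ng + h(nxt), ng, nxt))
    return None, explored


# Hypothetical 5x5 maze; '.' is free and '#' is an obstacle.
MAZE = [
    "..#..",
    ".##..",
    ".....",
    ".#.#.",
    ".....",
]
path, states_explored = astar(MAZE, (0, 0), (4, 4))
```

The second return value is the kind of quantity a hybrid planner aims to reduce: sub-goals routed to System-1 contribute no search, so only the hard segments add to the count.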

Key Results and Analysis

Performance and Efficiency

System-1.x demonstrates superior performance across a range of budgets compared to pure System-1 and System-2 approaches. For instance, in the Maze Navigation task:

  • System-1.x achieves an accuracy of 70.4% at approximately 13.6 states explored, surpassing the System-1 and System-2 planners, which obtain 48.7% and 37.2% at comparable states explored.

In the Blocksworld task, which tests out-of-distribution generalization to longer plan lengths:

  • System-1.x maintains a higher accuracy at lower #States-Explored, showing significant improvements over System-2, especially when sub-goals simplify the planning process.

Controllability

A notable feature of System-1.x is its controllability, both at training and inference times:

  • By adjusting the hybridization factor x, users can tune the balance between speed and accuracy.
  • During inference, the controller can be biased towards more System-2 planning if higher accuracy is required, shifting the System-1.x Planner towards a full System-2 Planner without retraining.

Neuro-Symbolic Integration

The potential for combining neural and symbolic methods is also explored:

  • Neuro-symbolic System-1.x, which uses A* as the System-2 component, outperforms pure symbolic planners like A* at matched #States-Explored. For example, at 11.6 states, System-1.x achieves 70.5% accuracy compared to A*'s 31.0%.

Implications and Future Directions

The implications of this research are significant:

  • Practically: System-1.x offers a robust, flexible planning approach suitable for diverse and complex tasks where resource constraints vary.
  • Theoretically: It underscores the potential for hybrid models that leverage the best of heuristic-based and search-based planning, aligning with concepts from dual-process theories in cognitive science.

Future developments could explore:

  • Scalability: Extending System-1.x to handle larger-scale and more dynamic planning environments.
  • Adaptation to Uncertainty: Enhancing the controller to better manage partially observable and non-deterministic environments.
  • Further Integration: Seamlessly blending neural and symbolic methods to enhance the adaptability and generality of the planner.

In summary, the System-1.x Planner marks a significant advancement in the application of LLMs to planning tasks, providing a compelling blend of efficiency and accuracy through its hybrid, controllable approach. This sets a promising precedent for the development of more sophisticated, adaptive planning systems in the future.