
Large Language Models Can Solve Real-World Planning Rigorously with Formal Verification Tools (2404.11891v2)

Published 18 Apr 2024 in cs.AI, cs.CL, and cs.HC

Abstract: Despite their recent advancements, LLMs still struggle to directly generate correct plans for complex multi-constraint planning problems, even with self-verification and self-critique. For example, a U.S. domestic travel planning benchmark TravelPlanner was proposed in Xie et al. (2024), where the best LLM OpenAI o1-preview can only find travel plans that satisfy user requirements with a 10% success rate given all needed information. In this work, we tackle this difficult problem by proposing an LLM-based planning framework that formalizes and solves complex multi-constraint planning problems as constrained satisfiability problems, which are further consumed by sound and complete satisfiability solvers. We start with TravelPlanner as the primary use case and achieve a success rate of 93.9%. We demonstrate our framework's robustness by showing its effectiveness in diverse paraphrased prompts. More importantly, our framework has strong zero-shot generalizability: It can successfully handle unseen constraints in a completely unseen international travel dataset we created, and it can even generalize well to new domains such as routing and task allocation problems in a zero-shot manner. Moreover, when user input queries are infeasible, our framework can identify the unsatisfiable core, provide failure reasons, and offer personalized modification suggestions to users according to diverse human preferences. We show that our framework can modify and solve for an average of 81.6% and 91.7% unsatisfiable queries from two datasets and prove with ablations that all key components of our framework are effective and necessary.

Formal Verification in Travel Planning with LLMs

LLMs have recently emerged as powerful tools capable of handling a variety of tasks due to their extensive world knowledge and reasoning abilities. Despite these capabilities, LLMs struggle to directly solve complex combinatorial problems, such as travel planning, where multiple constraints must be satisfied at once. The paper "Large Language Models Can Solve Real-World Planning Rigorously with Formal Verification Tools" presents a framework that integrates LLMs with formal verification tools to solve such problems, with travel planning as the primary use case.

The authors propose a framework that leverages satisfiability modulo theories (SMT) solvers to address the shortcomings of LLMs in handling multi-constraint optimization. The framework transforms the travel planning challenge into a constraint satisfaction problem, enabling rigorous formulation and solution through SMT. By doing this, the framework ensures that all constraints are formally verified, guaranteeing a valid solution if one exists within the specified criteria.
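
To make the idea concrete, here is a minimal sketch of a trip encoded as a constraint satisfaction problem. It is not the authors' implementation: exhaustive enumeration stands in for the SMT solver, and the options, prices, and names (`FLIGHTS`, `HOTELS`, `solve`) are hypothetical.

```python
from itertools import product

# Hypothetical options: (name, cost in USD). Not taken from the paper's data.
FLIGHTS = [("morning", 320), ("evening", 210)]
HOTELS = [("downtown", 180), ("airport", 95)]  # hotel cost is per night


def solve(budget, nights, constraints):
    """Return the first (flight, hotel) plan satisfying every constraint, else None.

    Exhaustive search is sound and complete on this small finite domain,
    which is the property an SMT solver provides for much larger problems.
    """
    for flight, hotel in product(FLIGHTS, HOTELS):
        plan = {
            "flight": flight[0],
            "hotel": hotel[0],
            "total": flight[1] + hotel[1] * nights,
            "budget": budget,
        }
        if all(check(plan) for check in constraints):
            return plan
    return None


# Each constraint is a predicate over a candidate plan.
constraints = [
    lambda p: p["total"] <= p["budget"],   # hard budget constraint
    lambda p: p["hotel"] != "airport",     # user preference
]

plan = solve(budget=800, nights=2, constraints=constraints)
infeasible = solve(budget=500, nights=2, constraints=constraints)
```

With a budget of 800 the search returns a plan; with a budget of 500 no assignment satisfies both constraints, so `solve` returns `None`. In the framework described above, the LLM's role would be to produce the constraints from the user's natural-language request, while the solver decides satisfiability.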

The evaluation uses TravelPlanner, a benchmark designed for U.S. domestic travel planning, on which LLMs alone achieve a success rate of only 0.6%. In contrast, the proposed framework reaches a success rate of 97% on TravelPlanner's validation and test sets. This gap indicates the effectiveness of combining LLMs with formal verification tools for multi-constraint planning tasks.

The authors also extend the evaluation to a separate international travel dataset they created, reporting success rates of 85% for TravelPlanner and 78.6% for their own dataset. That the framework remains effective on a dataset with previously unseen constraints supports its claimed ability to generalize beyond the original benchmark.

A key component of the framework is its interactive plan repair capability. When a user's query is unsatisfiable, the framework identifies the conflicting constraints, explains why the query fails, and has the LLM suggest modifications to those constraints. This feature shows how LLMs can interact with users and adapt plans to diverse preferences and changing requirements.
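
One way to picture the repair step is as a search for a minimal set of conflicting constraints, which can then be presented to the user as candidates to relax. The sketch below uses a simple deletion-based procedure in plain Python. It is an illustrative stand-in, not the paper's method or any SMT solver's unsat-core API, and the options, constraint names, and thresholds are hypothetical.

```python
# Hypothetical search space: (city, nightly hotel price in USD, has direct flight).
OPTIONS = [("Lisbon", 120, True), ("Zurich", 260, True), ("Hanoi", 60, False)]

# Named constraints over a single option; names and thresholds are illustrative.
CONSTRAINTS = {
    "price_under_100": lambda o: o[1] < 100,
    "direct_flight": lambda o: o[2],
    "not_zurich": lambda o: o[0] != "Zurich",
}


def satisfiable(names):
    """True if some option meets every constraint in `names`."""
    return any(all(CONSTRAINTS[n](o) for n in names) for o in OPTIONS)


def minimal_unsat_subset(names):
    """Shrink an unsatisfiable constraint set by deletion.

    Assumes `names` is unsatisfiable. A constraint is dropped whenever the
    remaining set is still unsatisfiable, so the result is a minimal conflict.
    """
    core = list(names)
    for name in list(core):
        trial = [n for n in core if n != name]
        if not satisfiable(trial):
            core = trial  # `name` is not needed to explain the conflict
    return core


def repair_suggestions(names):
    """Constraints in the conflict whose removal alone makes the query feasible."""
    core = minimal_unsat_subset(names)
    return [n for n in core if satisfiable([m for m in names if m != n])]


all_names = list(CONSTRAINTS)
core = minimal_unsat_subset(all_names)
suggestions = repair_suggestions(all_names)
```

Here the conflict reduces to the price cap and the direct-flight requirement; relaxing either one makes the query feasible, while the Zurich exclusion plays no part in the failure. In an interactive setting, an LLM could phrase these candidates as suggestions and rank them by the user's stated preferences.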

The research presents several implications for AI development. Practically, this framework can assist in efficiently planning complex travel itineraries, facilitating both individual and commercial applications. Theoretically, it offers a pathway to enhance LLM capabilities by integrating them with formal methods, potentially expanding their utility in other domains requiring strict constraint satisfaction.

Looking to the future, the integration of LLMs with formal solvers could see broader applications beyond travel planning. Fields such as logistics, supply chain management, and automated scheduling may benefit from such a hybrid approach, offering solutions that balance flexibility with formal correctness. Further research may explore extending this framework to encompass machine learning techniques within the reasoning process itself, enhancing the adaptive capabilities of LLMs in real-world applications.

In summary, the paper shows how formal verification tools can compensate for the limitations of LLMs in complex planning scenarios, and it points toward further work on AI-driven planning and optimization.

Authors (4)
  1. Yilun Hao (12 papers)
  2. Yongchao Chen (18 papers)
  3. Yang Zhang (1129 papers)
  4. Chuchu Fan (81 papers)
Citations (4)