- The paper introduces TriFlow, a multi-agent system that incrementally enforces feasibility of itineraries using a retrieval-planning-governance pipeline.
- It employs modular rule-based planners and validator loops to ensure monotonic constraint satisfaction, achieving a 10.9× speedup over the strongest prior method on the TravelPlanner benchmark.
- Its design enhances hard-constraint compliance, interpretability, and scalability, providing a blueprint for LLM-powered planning in constrained environments.
TriFlow: A Progressive Multi-Agent Framework for Intelligent Trip Planning
Problem Context and Motivation
Real-world trip planning requires mapping open-ended, ambiguous user intents into executable itineraries that strictly satisfy operational, temporal, spatial, and budgetary constraints while aligning with personalized preferences. Despite recent advances, existing LLM-agent systems often fail to deliver consistent feasibility, rationality, or cost efficiency. Typical issues include hallucinated content, violations of real-world constraints, and high inference latency, which stem from the lack of integrated constraint handling and structured intermediate representations.
TriFlow introduces a staged, multi-agent architecture that unifies structured reasoning with LLM-based natural language flexibility through a retrieval–planning–governance pipeline. The architecture is designed to progressively narrow the search space, enforce monotonic constraint satisfaction, and allow for explainable, bounded iterative refinement.
Figure 1: Three-stage progressive architecture of TriFlow.
Architectural Design and Methodology
System Overview
TriFlow leverages a three-stage pipeline: retrieval, planning, and governance. Each stage progressively contracts the solution space and enforces increasingly tight constraints. The stages are modular and operate in a feasibility-first hierarchy: no optimization for itinerary quality proceeds before feasibility is ensured at the current stage.
Figure 2: Demonstration of TriFlow’s pipeline, from a natural-language query to iterative, governance-guided itinerary generation.
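To make the feasibility-first ordering concrete, here is a minimal Python sketch of a staged pipeline that halts at the first stage unable to produce a feasible state and records which stage failed. The names `StageResult`, `PipelineOutcome`, `run_pipeline`, and the toy stage functions are hypothetical illustrations, not TriFlow's actual interfaces; real stages would wrap LLM calls and validators.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional, Tuple


@dataclass
class StageResult:
    """Output of one stage: the (possibly contracted) state plus any violations."""
    state: Any
    violations: List[str] = field(default_factory=list)


@dataclass
class PipelineOutcome:
    state: Any
    failed_stage: Optional[str] = None  # stage that could not restore feasibility
    violations: List[str] = field(default_factory=list)


# Hypothetical stage type: a name plus a function from state to StageResult.
Stage = Tuple[str, Callable[[Any], StageResult]]


def run_pipeline(query: Any, stages: List[Stage]) -> PipelineOutcome:
    """Run stages in order, halting at the first stage that leaves violations.

    A later stage only ever sees a state that every earlier stage has already
    validated, which mirrors the feasibility-first hierarchy.
    """
    state = query
    for name, stage in stages:
        result = stage(state)
        if result.violations:
            # Stop here and record the failing stage so the fault stays localized.
            return PipelineOutcome(result.state, name, list(result.violations))
        state = result.state
    return PipelineOutcome(state)


# Toy stand-ins for retrieval, planning, and governance (hypothetical).
def retrieve(query: str) -> StageResult:
    return StageResult({"query": query, "hotels": ["hotel_a"]})


def plan(state: dict) -> StageResult:
    return StageResult({**state, "days": [["museum", "hotel_a"]]})


def govern(state: dict) -> StageResult:
    return StageResult(state)


def broken_plan(state: dict) -> StageResult:
    return StageResult(state, ["no accommodation for day 2"])


ok = run_pipeline("2 days in Rome", [("retrieval", retrieve), ("planning", plan), ("governance", govern)])
bad = run_pipeline("2 days in Rome", [("retrieval", retrieve), ("planning", broken_plan), ("governance", govern)])
print(ok.failed_stage, bad.failed_stage, bad.violations)
```

Halting instead of forwarding a partially infeasible state means no stage has to repair an earlier stage's failure, and the recorded stage name is what makes failures attributable to a single layer.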
Retrieval Stage
The first stage translates open-ended user queries into structured sub-tasks via LLM-driven query decomposition. A dedicated retrieval agent assembles a minimally sufficient subset of factual resources (flights, accommodations, POIs), which then undergoes spatial, temporal, and budgetary validation. Strict geometric, time-window, and price-boundary checks ensure that only mutually consistent candidates are passed downstream.
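The retrieval-stage checks can be sketched as a filter over candidate resources. This sketch assumes a hypothetical structured sub-task (`SubTask`) with a center point and radius for the spatial check, an hour window for the temporal check, and a per-item price cap for the budget check; TriFlow's actual representation and thresholds are not specified here. Distances use the standard haversine formula, and the candidate names and coordinates are illustrative.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SubTask:
    """Hypothetical structured sub-task produced by query decomposition."""
    city: str
    center: tuple      # (lat, lon) the user wants to stay near
    max_km: float      # spatial bound
    window: tuple      # (earliest, latest) hour for the activity
    max_price: float   # budget bound for this item


@dataclass(frozen=True)
class Candidate:
    name: str
    city: str
    location: tuple    # (lat, lon)
    open_hours: tuple  # (open, close) in hours
    price: float


def haversine_km(a: tuple, b: tuple) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def validate(task: SubTask, c: Candidate) -> List[str]:
    """Return the violated checks; an empty list means the candidate is consistent."""
    problems = []
    if c.city != task.city:
        problems.append("wrong city")
    if haversine_km(task.center, c.location) > task.max_km:
        problems.append("too far")
    # Temporal check: opening hours must overlap the requested window.
    if max(task.window[0], c.open_hours[0]) >= min(task.window[1], c.open_hours[1]):
        problems.append("closed during window")
    if c.price > task.max_price:
        problems.append("over budget")
    return problems


def retrieve_subset(task: SubTask, pool: List[Candidate]) -> List[Candidate]:
    """Keep only candidates that pass every check; only these go downstream."""
    return [c for c in pool if not validate(task, c)]


task = SubTask("Rome", (41.9028, 12.4964), max_km=3.0, window=(9, 12), max_price=20.0)
pool = [
    Candidate("museum_a", "Rome", (41.8986, 12.4769), (9, 19), 5.0),
    Candidate("villa_b", "Rome", (41.9420, 12.7740), (9, 18), 12.0),      # ~23 km away
    Candidate("night_tour", "Rome", (41.9000, 12.4800), (20, 23), 15.0),  # opens after the window
]
kept = retrieve_subset(task, pool)
print([c.name for c in kept])  # ['museum_a']
```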
Planning Stage
Modular rule-based planners construct a coarse-to-fine itinerary skeleton, allocating cities, days, and main activities, while strictly adhering to all previously satisfied constraints. Validator loops (suggest, validate, normalize) enforce monotonic consistency: constraints that are already satisfied cannot be violated by subsequent changes. User preferences are incorporated through interpretable arbitration and structured ranking within the feasible domain.
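A suggest, validate, normalize loop with monotonic protection might look like the following sketch. A proposal is accepted only if the set of satisfied constraints after the step contains every constraint satisfied before it, and proposals are tried in preference order, so ranking only ever operates inside the feasible domain. The constraint functions, the two-hour activity length, and the greedy acceptance order are illustrative assumptions, not the paper's rule-based planners.

```python
from typing import Callable, Dict, List, Set

Plan = List[dict]                    # one dict per scheduled activity
Constraint = Callable[[Plan], bool]  # True when the plan satisfies the constraint


def satisfied(plan: Plan, constraints: Dict[str, Constraint]) -> Set[str]:
    """Names of the constraints the plan currently satisfies."""
    return {name for name, check in constraints.items() if check(plan)}


def normalize(plan: Plan) -> Plan:
    """Canonical form: activities ordered by day, then start hour."""
    return sorted(plan, key=lambda a: (a["day"], a["start"]))


def suggest_validate_normalize(plan: Plan, proposals: List[dict],
                               constraints: Dict[str, Constraint],
                               score: Callable[[dict], float]) -> Plan:
    """Greedily add proposals in preference order, keeping satisfaction monotonic."""
    plan = normalize(plan)
    held = satisfied(plan, constraints)
    for proposal in sorted(proposals, key=score, reverse=True):  # suggest: best first
        candidate = plan + [proposal]
        now = satisfied(candidate, constraints)                    # validate
        if held <= now:  # accept only if no previously satisfied constraint is lost
            plan, held = normalize(candidate), now                 # normalize
    return plan


# Illustrative constraints (assumed): total budget of 100, 2-hour activities.
def within_budget(plan: Plan) -> bool:
    return sum(a["cost"] for a in plan) <= 100


def no_overlap(plan: Plan) -> bool:
    for i, x in enumerate(plan):
        for y in plan[i + 1:]:
            if x["day"] == y["day"] and abs(x["start"] - y["start"]) < 2:
                return False
    return True


proposals = [
    {"name": "museum", "day": 1, "start": 10, "cost": 30, "pref": 0.9},
    {"name": "tour", "day": 1, "start": 11, "cost": 20, "pref": 0.8},  # clashes with museum
    {"name": "dinner", "day": 1, "start": 19, "cost": 60, "pref": 0.7},
    {"name": "park", "day": 2, "start": 9, "cost": 0, "pref": 0.5},
]
result = suggest_validate_normalize([], proposals,
                                    {"budget": within_budget, "no_overlap": no_overlap},
                                    score=lambda a: a["pref"])
print([a["name"] for a in result])  # ['museum', 'dinner', 'park']
```

The higher-ranked "tour" is rejected because accepting it would break the already-satisfied no-overlap constraint, which is exactly the monotonicity guarantee: preference never overrides feasibility.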
Governance Stage
An LLM-based governance agent iterates on the feasible itinerary to close residual gaps, making incremental changes (e.g., cost reduction, timing deconfliction, improved preference alignment), each accepted only if it strictly preserves global feasibility. Refinement terminates on convergence or after a bounded number of rounds, whichever comes first.
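The governance acceptance rule and bounded iteration can be sketched as below. A hypothetical `propose` callable stands in for the LLM governance agent; an edit is kept only if the plan stays feasible and the objective (here, total cost) strictly decreases, and the loop stops early when a round produces no accepted edit. The hotel table and the minimum-rating requirement are made up for illustration.

```python
from typing import Callable, Iterable, List, Tuple

Plan = List[dict]

# Made-up hotel table: name -> (price per night, star rating).
HOTELS = {"grand": (220, 5), "central": (140, 4), "comfort": (95, 3),
          "budget_inn": (70, 2), "hostel": (40, 1)}


def govern(plan: Plan,
           propose: Callable[[Plan], Iterable[Plan]],
           is_feasible: Callable[[Plan], bool],
           cost: Callable[[Plan], float],
           max_rounds: int = 5) -> Tuple[Plan, int]:
    """Bounded refinement: keep an edit only if the plan stays feasible and
    the cost strictly drops; stop early after a round with no accepted edit."""
    if not is_feasible(plan):
        raise ValueError("governance expects an already-feasible plan")
    rounds = 0
    for _ in range(max_rounds):
        rounds += 1
        improved = False
        for candidate in propose(plan):  # stand-in for LLM-suggested edits
            if is_feasible(candidate) and cost(candidate) < cost(plan):
                plan, improved = candidate, True
                break  # recompute proposals from the updated plan
        if not improved:
            break  # converged: no feasible improving edit this round
    return plan, rounds


def total_cost(plan: Plan) -> float:
    return sum(HOTELS[s["hotel"]][0] for s in plan)


def min_three_stars(plan: Plan) -> bool:  # assumed user requirement
    return all(HOTELS[s["hotel"]][1] >= 3 for s in plan)


def swap_hotels(plan: Plan) -> Iterable[Plan]:
    """Yield single-night hotel swaps, cheapest alternatives first."""
    for i, stay in enumerate(plan):
        for name in sorted(HOTELS, key=lambda n: HOTELS[n][0]):
            if name != stay["hotel"]:
                yield plan[:i] + [{**stay, "hotel": name}] + plan[i + 1:]


start = [{"night": 1, "hotel": "grand"}, {"night": 2, "hotel": "grand"}]
final, rounds = govern(start, swap_hotels, min_three_stars, total_cost)
print([s["hotel"] for s in final], total_cost(final), rounds)  # ['comfort', 'comfort'] 190 3
```

The cheaper but infeasible hostels are never accepted, and the loop ends after the third round instead of running all five, illustrating both the feasibility guard and early termination.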
This design forms a closed-loop “generate–verify–assemble–recompute” system, which contrasts with monolithic LLM planning by yielding predictable, interpretable, and reproducible agentic behavior.
Experimental Evaluation
TriFlow was evaluated on TravelPlanner and TripTailor, currently the most authoritative large-scale benchmarks for intelligent trip planning.
TravelPlanner Results
On the TravelPlanner validation set (180 instances), TriFlow achieved:
- Delivery Rate: 100%
- Macro Commonsense Pass Rate: 95.0%
- Macro Hard Constraint Pass Rate: 96.1%
- Final Pass Rate (FPR): 91.1%
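The relationship among these metrics can be sketched as follows, assuming the TravelPlanner definitions as we understand them: a macro pass rate is the share of all tested plans that are delivered and pass every constraint in a category, and the final pass rate requires passing every commonsense and every hard constraint at once. The `PlanResult` record and the toy data are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PlanResult:
    """Per-query evaluation record (hypothetical format)."""
    delivered: bool
    commonsense: Dict[str, bool] = field(default_factory=dict)  # constraint -> passed?
    hard: Dict[str, bool] = field(default_factory=dict)


def delivery_rate(results: List[PlanResult]) -> float:
    return sum(r.delivered for r in results) / len(results)


def macro_pass_rate(results: List[PlanResult], category: str) -> float:
    """Share of ALL tested plans that were delivered and pass every constraint in `category`."""
    passed = [r.delivered and all(getattr(r, category).values()) for r in results]
    return sum(passed) / len(results)


def final_pass_rate(results: List[PlanResult]) -> float:
    """A plan counts only if it passes every commonsense AND every hard constraint."""
    passed = [r.delivered and all(r.commonsense.values()) and all(r.hard.values())
              for r in results]
    return sum(passed) / len(results)


results = [
    PlanResult(True, {"within_sandbox": True, "min_nights": True}, {"budget": True}),
    PlanResult(True, {"within_sandbox": True, "min_nights": True}, {"budget": False}),
    PlanResult(True, {"within_sandbox": False, "min_nights": True}, {"budget": True}),
    PlanResult(False),  # no plan delivered
]
print(delivery_rate(results), macro_pass_rate(results, "commonsense"),
      macro_pass_rate(results, "hard"), final_pass_rate(results))
# -> 0.75 0.5 0.5 0.25
```

Because a plan that passes the final check necessarily passes both macro checks, FPR can never exceed either macro rate, which is consistent with 91.1% sitting below 95.0% and 96.1%.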
Per-task runtime averaged 22.6 seconds, a 10.9× speedup over the strongest prior method, FormalVerify, which averaged 245.7 seconds. TriFlow reached pass rates of 95% or higher across all constraint categories and produced globally consistent, constraint-compliant itineraries even in long-horizon settings.
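The reported speedup follows directly from the two average runtimes. The totals in the last two lines additionally assume that all 180 validation instances run sequentially at those averages, which the section does not state.

```latex
$$
\begin{aligned}
\text{speedup} &= \frac{245.7\ \text{s}}{22.6\ \text{s}} \approx 10.87 \approx 10.9\times \\
180 \times 22.6\ \text{s} &= 4068\ \text{s} \approx 67.8\ \text{min} \\
180 \times 245.7\ \text{s} &= 44226\ \text{s} \approx 12.3\ \text{h}
\end{aligned}
$$
```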
TripTailor Results
For TripTailor, TriFlow delivered:
- Macro Feasibility: 99.1%
- Macro Rationality: 97.7%
- FPR: 97.7%
This outperformed the previous strongest workflow baseline by over 34%. Performance remained robust across task difficulties and all constraint dimensions, with all metrics at or above 97.6%, demonstrating domain generalization and stability of constraint handling.
Analysis of Constraint Satisfaction and Interpretation
TriFlow’s staged architecture yields its largest gains in strict hard-constraint satisfaction and final feasibility under dense spatiotemporal and budgetary constraints. Pass rates for “Within Sandbox,” “Reasonable City Route,” “Minimum Nights Stay,” and budget compliance rose from baseline levels of 4–53% to 95–100%. The retrieval stage's domain contraction and validation were decisive in eliminating factual inconsistencies early, and the planning stage's monotonic feasibility principle prevented newly introduced violations. Governance then resolved residual edge cases through targeted, explainable corrections.
The system’s modular design directly supports fault isolation, interpretability, and transparent diagnosis of failure modes in the retrieval, planning, and governance layers.
Implications and Future Directions
TriFlow’s framework represents a robust orchestration paradigm for agentic planning under strict multi-objective constraints. Its contributions include a composable architecture for constraint-governed agent workflows; a unified language–structure–constraint–iteration protocol; and empirical confirmation that explicit, progressive contraction of the solution space is critical for practical feasibility at scale.
Practically, the TriFlow architecture can serve as a blueprint for LLM-powered planners in other domains that require hard-constraint satisfaction and high cost efficiency (e.g., logistics, scheduling, personalized recommendation systems). Theoretically, the work motivates deeper integration between symbolic constraint programming and neural generation, particularly via closed-loop, validator-driven agent architectures.
Anticipated extensions include real-time integration of live data sources, adaptation to dynamic environments, and extension of the governance protocols to continuous online deployment and agile recomputation.
Conclusion
TriFlow establishes a new standard for LLM-driven, constraint-consistent trip planning by unifying staged retrieval, rule-based planning, and iterative governance. It achieves state-of-the-art feasibility and rationality on public benchmarks with an order-of-magnitude efficiency gain over previous systems, demonstrating that modular, feasibility-first multi-agent designs are essential for real-world deployment of planning agents. TriFlow’s design advances both practical and theoretical understanding of robust agent orchestration in high-stakes, constrained decision-making scenarios (2512.11271).