- The paper proposes a hierarchical Agentic Fast–Slow Planning (AFSP) framework that decouples perception, reasoning, planning, and control to enable real-time autonomous driving.
- It employs a two-stage VLM adaptation and semantic-guided A* search, achieving up to a 2.5x reduction in latency and enhanced decision consistency.
- The integrated approach demonstrates up to a 45% reduction in lateral deviation and improved safety in diverse CARLA-based driving scenarios.
Bridging Large-Model Reasoning and Real-Time Control via Agentic Fast–Slow Planning
Introduction and Motivation
The integration of large foundation models, especially VLMs and LLMs, into autonomous driving systems (ADSs) introduces new reasoning capabilities. However, translating semantic-level reasoning into real-time, verifiable control remains unresolved. Existing paradigms either delegate full trajectory synthesis to LLMs, which makes the output hard to verify and incurs latency too high for real-time use, or restrict LLMs to episodic MPC hyperparameter updates, which leaves the interface ambiguous and underuses neural reasoning in closed-loop control.
Agentic Fast–Slow Planning (AFSP) directly addresses the challenge by enforcing a principled decoupling of reasoning, planning, and control at their respective natural timescales. The architecture explicitly defines interfaces that are robust, interpretable, and practically deployable in resource-constrained edge–cloud topologies.
Figure 1: The AFSP hierarchy explicitly decouples perception, reasoning, planning, and closed-loop control for interpretable, robust autonomous driving.
System Architecture and Hierarchical Decomposition
The AFSP pipeline receives raw multi-modal sensory inputs plus navigation intent and emits real-time, feasible control actions. The hierarchy comprises three layers:
- Perception2Decision. An edge-deployed VLM Topology Detector compresses the scene into an ego-centric topology graph that encodes spatial and semantic attributes using quantized coordinates and attribute fusion. The graph is relayed to a cloud-hosted LLM Decision Maker, which maps it to symbolic directives via structured in-context prompting. Sending a compact graph instead of raw sensor data keeps the exchange interpretable and bandwidth-efficient.
- Decision2Trajectory. LLM-generated directives are passed to a Semantic-Guided A* planner, where intent is encoded as cost shaping, biasing geometric search toward intent-aligned, feasible trajectories. The Agentic Refinement Module automates hyperparameter selection and self-tuning based on structured feedback by leveraging LLM reasoning and scenario memory.
- Real-Time Control. The Model Predictive Controller is augmented to synthesize reference trajectories from either onboard or cloud sources, ensuring a fast safety loop with the option for global, LLM-informed trajectory adjustments in complex/ambiguous scenarios.
Figure 2: Detailed system architecture depicting the modularization of perception, semantic decision-making, intent-guided search, and continual agentic refinement.
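The Agentic Refinement Module in the Decision2Trajectory layer can be pictured as a propose-evaluate loop with a per-scenario memory. The sketch below is a minimal illustration under stated assumptions, not the paper's implementation: `refine`, `toy_evaluate`, `toy_propose`, the `w_semantic` parameter, and the feedback fields `score` and `hint` are all hypothetical names, and a rule-based function stands in for the LLM call.

```python
def refine(scenario, evaluate, propose, memory, init_params, rounds=5):
    """Keep the best-scoring parameters seen and store them per scenario."""
    best = dict(memory.get(scenario, init_params))  # warm start from scenario memory
    best_fb = evaluate(best)
    for _ in range(rounds):
        candidate = propose(best, best_fb)          # stand-in for an LLM suggestion
        fb = evaluate(candidate)
        if fb["score"] > best_fb["score"]:
            best, best_fb = candidate, fb
    memory[scenario] = best                         # remembered for the next call
    return best, best_fb["score"]


def toy_evaluate(params, target=3.0):
    """Hypothetical planner feedback: score peaks when w_semantic hits target."""
    w = params["w_semantic"]
    return {"score": -(w - target) ** 2,
            "hint": "increase" if w < target else "decrease"}


def toy_propose(params, feedback, step=1.0):
    """Rule-based placeholder for the LLM: nudge the weight in the hinted direction."""
    delta = step if feedback["hint"] == "increase" else -step
    return {**params, "w_semantic": params["w_semantic"] + delta}


memory = {}
best, score = refine("merge", toy_evaluate, toy_propose, memory, {"w_semantic": 1.0})
print(best, memory)
```

In a real system `evaluate` would run the planner and return structured feedback such as smoothness and directive alignment, and `propose` would prompt an LLM with that feedback; only the loop structure and the memory warm start are meant to carry over.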
Perception to Decision: Efficient, Interpretable Scene Abstraction
High-throughput cloud reasoning is bottlenecked by perceptual data volume. The AFSP Perception2Decision module compresses sensory input into a transferable and interpretable ego-polar topology graph. The VLM is finetuned with a two-stage schedule: (i) vision encoder adaptation with frozen language heads for stable grounding, and (ii) joint finetuning to recover language priors while optimizing spatial alignment.
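To make the ego-polar abstraction concrete, the following sketch quantizes detections given in the ego frame into range and bearing bins and serializes them as a short string. It is an illustration only: the bin sizes, the `TopologyNode` fields, the category labels, and the text format are assumptions, not the encoding used in the paper.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TopologyNode:
    category: str     # hypothetical labels such as "vehicle" or "pedestrian"
    range_bin: int    # quantized distance from the ego vehicle
    bearing_bin: int  # quantized direction around the ego vehicle


def to_ego_polar(x, y, range_step=2.0, bearing_bins=16):
    """Quantize an ego-frame offset (x forward, y left, metres) into polar bins."""
    distance = math.hypot(x, y)
    bearing = math.atan2(y, x) % (2 * math.pi)  # 0 = straight ahead, CCW positive
    sector = 2 * math.pi / bearing_bins
    range_bin = int(distance // range_step)
    # Shift by half a sector so bin 0 is centred on the heading direction.
    bearing_bin = int((bearing + sector / 2) // sector) % bearing_bins
    return range_bin, bearing_bin


def build_graph(detections, **kwargs):
    """detections: iterable of (category, x, y) tuples in the ego frame."""
    return [TopologyNode(cat, *to_ego_polar(x, y, **kwargs)) for cat, x, y in detections]


def serialize(nodes):
    """Compact text form that could be placed in an LLM prompt."""
    return ";".join(f"{n.category}@r{n.range_bin}b{n.bearing_bin}" for n in nodes)


nodes = build_graph([("vehicle", 10.0, 0.0), ("pedestrian", 0.0, 5.0)])
print(serialize(nodes))
```

A few bytes per object is the point of the abstraction: the cloud side receives discrete, human-readable tokens instead of images.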
Quantitative results show that the two-stage method achieves the highest bounding-box IoU (93.5%) together with the highest category and geometry accuracy. Transmitting the symbolic topology graph also reduces bandwidth and inference latency: the LLM Decision Maker operating on graphs matches VLM-inferred decisions at nearly 2.5x lower latency.
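The 2.5x factor and a percentage reduction describe the same ratio. As a quick check, writing $T_{\text{VLM}}$ for the latency of direct VLM inference and $T_{\text{graph}}$ for the latency of the graph-based LLM path (both symbols are introduced here for illustration):

```latex
\[
T_{\text{graph}} = \frac{T_{\text{VLM}}}{2.5}
\quad\Longrightarrow\quad
\frac{T_{\text{VLM}} - T_{\text{graph}}}{T_{\text{VLM}}}
  = 1 - \frac{1}{2.5} = 0.6 .
\]
```

A 2.5x latency reduction is therefore the same as a 60% reduction, the figure quoted under Empirical Results.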
Figure 3: VLM finetuning strategy: vision backbone adaptation followed by joint language/vision optimization for robust topology detection.
Figure 4: Symbolic decision transmission and graph abstraction result in both comparable decision alignment and significant latency savings in edge–cloud pipelines.
Decision to Trajectory: Semantic-Guided Graph Search and Agentic Refinement
Classical A* is brittle in unstructured or perturbed maps, lacking semantic awareness. AFSP's Decision2Trajectory bridges this gap via two mechanisms:
- Semantic-Guided A*: LLM-generated directive sequences are encoded as soft cost functions that influence A* expansion without introducing hard constraints, which could cause the search to fail. Cost shaping is implemented by augmenting each search state with the prior move and a directive progress index. Move patterns are categorized as correction, delay, contradiction, or overaction and penalized or rewarded according to a structured cost matrix, which empirically improves robustness to topological perturbations.
- Agentic Refinement: an LLM-driven module selects and self-tunes the planner's cost hyperparameters from structured feedback, drawing on LLM reasoning and scenario memory.
Figure 5: (a) Semantic pattern cost design categorizes directive alignment; (b) hyperparameter modulation yields distinct trajectory characteristics and robustness behaviors.
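The idea of directives as soft costs can be sketched on a small grid. This is a simplified illustration, not the paper's planner: directives are compass moves, the state carries only the cell and a directive progress index (the prior move is omitted), and a single penalty weight `w_semantic` replaces the four-category cost matrix. The function name, grid size, and weights are assumptions.

```python
import heapq

MOVES = {"N": (0, 1), "E": (1, 0), "S": (0, -1), "W": (-1, 0)}


def semantic_astar(start, goal, blocked, directives, size=6, w_semantic=2.0):
    """A* over (cell, directive index) states; off-directive moves cost extra."""
    def h(cell):
        # Manhattan distance stays admissible because penalties only add cost.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    frontier = [(h(start), 0.0, start, 0, [])]  # (f, g, cell, directive idx, moves)
    best = {}
    while frontier:
        _, g, cell, idx, path = heapq.heappop(frontier)
        if cell == goal:
            return path, g
        if best.get((cell, idx), float("inf")) <= g:
            continue
        best[(cell, idx)] = g
        for name, (dx, dy) in MOVES.items():
            nxt = (cell[0] + dx, cell[1] + dy)
            if not (0 <= nxt[0] < size and 0 <= nxt[1] < size) or nxt in blocked:
                continue
            if name == directives[idx]:
                penalty, nidx = 0.0, idx            # follows the current directive
            elif idx + 1 < len(directives) and name == directives[idx + 1]:
                penalty, nidx = 0.0, idx + 1        # advances to the next directive
            else:
                penalty, nidx = w_semantic, idx     # soft penalty, never forbidden
            ng = g + 1.0 + penalty
            heapq.heappush(frontier, (ng + h(nxt), ng, nxt, nidx, path + [name]))
    return None, float("inf")


path, cost = semantic_astar((0, 0), (3, 3), set(), ["E", "N"])
print(path, cost)
detour, detour_cost = semantic_astar((0, 0), (3, 3), {(1, 0)}, ["E", "N"])
print(detour, detour_cost)
```

In the second call the cell the directive points to is blocked. Because the directive is only a cost term, the search still returns a path and pays the penalty for the off-directive moves, where a hard constraint would have left it with no valid expansion.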
Closed-Loop Control: Cloud-Guided Flexible MPC
The final reference path, incorporating both geometric and semantic constraints, feeds into a switching MPC controller. At each control step, the controller uses the local reference when it is feasible and invokes cloud-supplied plan summaries only when environmental ambiguity or complexity is detected. This enables low-latency actuation in routine conditions and LLM-informed global reasoning on demand, preserving safety and optimizing long-horizon performance.
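The switching rule described above can be sketched as a small selection function that runs before each MPC solve. Everything here is a hypothetical illustration: the scalar `ambiguity` score, its threshold, the staleness check on the cloud plan, and the `"hold"` fallback are assumptions added to make the example complete and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Waypoint = Tuple[float, float]


@dataclass
class ReferenceChoice:
    source: str                    # "local", "cloud", or "hold"
    waypoints: Sequence[Waypoint]


def select_reference(
    local_ref: Optional[Sequence[Waypoint]],
    cloud_ref: Optional[Sequence[Waypoint]],
    ambiguity: float,
    cloud_age_s: float,
    ambiguity_threshold: float = 0.5,
    max_cloud_age_s: float = 1.0,
) -> ReferenceChoice:
    """Pick the reference trajectory the MPC should track at this step."""
    cloud_usable = cloud_ref is not None and cloud_age_s <= max_cloud_age_s
    # Routine case: a feasible local plan in an unambiguous scene (fast loop).
    if local_ref is not None and ambiguity < ambiguity_threshold:
        return ReferenceChoice("local", local_ref)
    # Ambiguous or locally infeasible: use the cloud plan if it is fresh enough.
    if cloud_usable:
        return ReferenceChoice("cloud", cloud_ref)
    # No usable cloud plan: keep tracking the local plan if one exists.
    if local_ref is not None:
        return ReferenceChoice("local", local_ref)
    return ReferenceChoice("hold", [])  # assumed safe fallback


local = [(0.0, 0.0), (1.0, 0.0)]
cloud = [(0.0, 0.0), (1.0, 0.5)]
print(select_reference(local, cloud, ambiguity=0.1, cloud_age_s=0.2).source)
print(select_reference(local, cloud, ambiguity=0.9, cloud_age_s=0.2).source)
```

Keeping the decision in a pure function of the current inputs means the fast loop never blocks on the cloud: a missing or stale cloud plan simply falls through to the local reference.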
Empirical Results
CARLA-based evaluations span perception, semantic decision consistency, trajectory generation under perturbation, and closed-loop driving. Key findings include:
- Perception2Decision: The two-stage finetuned VLM yields the highest IoU and accuracy, and symbolic graph transmission to the LLM preserves decision quality at a 60% reduction in inference latency.
- Decision2Trajectory: Semantic-Guided A* preserves directive adherence under map misalignments where classic A* fails. The agentic refinement process converges on cost parameters that maximize both smoothness and semantic alignment.
- End-to-end driving: Across diverse CARLA scenarios, AFSP achieves up to 45% reduction in maximum lateral deviation and 12% reduction in completion time compared to both pure MPC and A*-guided MPC baselines, with enhanced robustness to map perturbation and scenario diversity.


Figure 7: Comparative analysis of control schemes in the CARLA testbed; AFSP demonstrates reduced oscillation, enhanced safety corridors, and semantic adherence relative to classical planners.
Figure 8: AFSP achieves the lowest finish time and lateral tracking error across all scenarios, supporting the hierarchy and interface design.
Implications and Future Directions
AFSP establishes a verifiable, interpretable architecture for embedding large-model reasoning in real-time control-intensive domains. By separating and bridging perception, reasoning, planning, and actuation, it enables explicit interface formalization at each timescale, circumventing the opacity and rigidity of end-to-end neural policies and the brittleness of classical search.
Theoretically, the semantic-guided cost shaping in A* and agentic memory-based parameter refinement offer generalizable templates for integrating symbolic/LLM-derived priors in graph-based planners. Practically, the AFSP approach realizes scalable cloud–edge collaboration with meaningful reductions in both resource demand and real-time actuation latency.
Further research will likely extend AFSP with dynamic adaptive querying (e.g., learning when to invoke cloud-based reasoning), formal verification of symbolic directive execution, and larger memory and context capacity for lifelong agent self-improvement in open-world driving.
Conclusion
Agentic Fast–Slow Planning constitutes a hierarchical and interpretable solution for bridging the gap between large-model semantic reasoning and robust, real-time robot control. Its modular interfaces, cloud–edge semantic abstraction, cost-shaped search, and agentic self-tuning deliver quantifiable safety, efficiency, and adaptability improvements—laying the groundwork for transparent ADSs that can harness the strengths of both foundation models and classical planners (2604.01681).