Bridging Large-Model Reasoning and Real-Time Control via Agentic Fast-Slow Planning

Published 2 Apr 2026 in cs.RO and cs.AI | (2604.01681v1)

Abstract: Large foundation models enable powerful reasoning for autonomous systems, but mapping semantic intent to reliable real-time control remains challenging. Existing approaches either (i) let LLMs generate trajectories directly - brittle, hard to verify, and latency-prone - or (ii) adjust Model Predictive Control (MPC) objectives online - mixing slow deliberation with fast control and blurring interfaces. We propose Agentic Fast-Slow Planning, a hierarchical framework that decouples perception, reasoning, planning, and control across natural timescales. The framework contains two bridges. Perception2Decision compresses scenes into ego-centric topologies using an on-vehicle Vision-LLM (VLM) detector, then maps them to symbolic driving directives in the cloud with an LLM decision maker - reducing bandwidth and delay while preserving interpretability. Decision2Trajectory converts directives into executable paths: Semantic-Guided A* embeds language-derived soft costs into classical search to bias solutions toward feasible trajectories, while an Agentic Refinement Module adapts planner hyperparameters using feedback and memory. Finally, MPC tracks the trajectories in real time, with optional cloud-guided references for difficult cases. Experiments in CARLA show that Agentic Fast-Slow Planning improves robustness under perturbations, reducing lateral deviation by up to 45% and completion time by over 12% compared to pure MPC and an A*-guided MPC baseline. Code is available at https://github.com/cjychenjiayi/icra2026_AFSP.


Authors (4)

Summary

The paper proposes a hierarchical AFSP framework that decouples perception, reasoning, planning, and control to enable real-time autonomous driving.
It combines two-stage VLM fine-tuning for topology detection with Semantic-Guided A* search. Its symbolic edge–cloud interface reduces decision latency by nearly 2.5x while preserving decision consistency.
The integrated approach demonstrates up to a 45% reduction in lateral deviation and improved safety in diverse CARLA-based driving scenarios.


Introduction and Motivation

Integrating large foundation models, especially VLMs and LLMs, into autonomous driving systems (ADSs) brings strong reasoning capabilities. However, translating semantic-level reasoning into real-time, verifiable control remains unresolved. Existing paradigms fall into two groups:
- Some delegate full trajectory synthesis to LLMs. This makes the output hard to verify and incurs untenable real-time latency.
- Others let LLMs adjust MPC objectives online. This mixes slow deliberation with fast control, blurs the interface between them, and underuses neural reasoning in closed-loop control.

Agentic Fast–Slow Planning (AFSP) addresses this challenge by decoupling reasoning, planning, and control at their natural timescales. The architecture defines explicit interfaces between these layers that are interpretable, robust, and deployable on resource-constrained edge–cloud topologies.

Figure 1: The AFSP hierarchy explicitly decouples perception, reasoning, planning, and closed-loop control for interpretable, robust autonomous driving.

System Architecture and Hierarchical Decomposition

The AFSP pipeline receives raw multi-modal sensory inputs plus navigation intent and emits real-time, feasible control actions. The hierarchy comprises three layers:

Perception2Decision. An edge-deployed VLM Topology Detector compresses the scene into an ego-centric topology graph. The graph encodes spatial and semantic attributes using quantized coordinates and robust attribute fusion. It is sent to a cloud-hosted LLM Decision Maker, which maps it to symbolic directives through structured in-context prompting. This keeps decisions interpretable and reduces the bandwidth needed between vehicle and cloud.
Decision2Trajectory. The LLM-generated directives are passed to a Semantic-Guided A* planner. The planner encodes intent as soft cost terms that bias geometric search toward intent-aligned, feasible trajectories. The Agentic Refinement Module uses LLM reasoning and a memory of past scenarios to select and tune planner hyperparameters from structured feedback.
Real-Time Control. A Model Predictive Controller (MPC) tracks reference trajectories from either onboard or cloud sources. This keeps a fast safety loop on the vehicle while allowing LLM-informed global trajectory adjustments in complex or ambiguous scenarios.

Figure 2: Detailed system architecture depicting the modularization of perception, semantic decision-making, intent-guided search, and continual agentic refinement.
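The hierarchy rests on one idea: the layers run at different rates, and the fast layer never waits for the slow one. The Python sketch below illustrates only that scheduling pattern, under simplifying assumptions:
- `FastSlowScheduler`, `slow_period` and the callables are hypothetical names.
- The slow step returns instantly instead of running asynchronously in the cloud.
- The stand-in functions perform no real perception, planning, or control.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class FastSlowScheduler:
    """Hypothetical multi-rate loop: the slow layer refreshes a cached plan every
    `slow_period` ticks; the fast layer acts on every tick using the cached plan."""
    slow_period: int                          # ticks between slow (cloud) updates; assumed
    slow_step: Callable[[int], str]           # stand-in for perception -> decision -> planning
    fast_step: Callable[[int, str], float]    # stand-in for one MPC tracking step
    cached_plan: str = "keep_lane"            # plan used before the first slow update
    slow_calls: int = 0
    log: List[Tuple[int, str, float]] = field(default_factory=list)

    def tick(self, t: int) -> float:
        if t % self.slow_period == 0:         # slow timescale: refresh the plan
            self.cached_plan = self.slow_step(t)
            self.slow_calls += 1
        u = self.fast_step(t, self.cached_plan)  # fast timescale: act on every tick
        self.log.append((t, self.cached_plan, u))
        return u


# Usage: 25 control ticks with a slow update every 10 ticks.
sched = FastSlowScheduler(
    slow_period=10,
    slow_step=lambda t: f"plan@{t}",
    fast_step=lambda t, plan: 0.0,
)
for t in range(25):
    sched.tick(t)

print(sched.slow_calls)  # 3 slow updates (t = 0, 10, 20)
```

In a deployed system, the slow step would run in a separate thread or service and publish its result when ready. The fast loop would read the latest published plan the same way `cached_plan` is read here.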

Perception to Decision: Efficient, Interpretable Scene Abstraction

Cloud reasoning is bottlenecked by the volume of perceptual data that must be transmitted. The Perception2Decision module therefore compresses sensory input into a compact, interpretable ego-polar topology graph. The VLM is fine-tuned on a two-stage schedule:
(i) The vision encoder is adapted while the language heads stay frozen, which gives stable grounding.
(ii) Joint fine-tuning then recovers language priors while optimizing spatial alignment.
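The summary does not give the exact topology schema. The Python sketch below shows one plausible way to build a compact ego-polar representation: detections in the ego frame are quantized into range bins and bearing sectors, then serialized as a small JSON payload. Everything specific here is an assumption for illustration:
- the `Detection` type and its category labels;
- the bin sizes `RANGE_BIN_M`, `NUM_SECTORS` and `MAX_RANGE_M`;
- the star-shaped graph centered on the ego vehicle.

```python
import json
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple

RANGE_BIN_M = 5.0     # assumed radial resolution (meters per range bin)
NUM_SECTORS = 12      # assumed angular resolution (30 degrees per sector)
MAX_RANGE_M = 50.0    # assumed perception horizon; farther detections are dropped


@dataclass
class Detection:
    category: str     # hypothetical labels, e.g. "vehicle", "pedestrian"
    x: float          # meters ahead of the ego vehicle (ego frame)
    y: float          # meters to the left of the ego vehicle


def to_ego_polar(det: Detection) -> Tuple[int, int]:
    """Quantize an ego-frame position into (range bin, bearing sector)."""
    r = math.hypot(det.x, det.y)
    bearing = math.atan2(det.y, det.x) % (2 * math.pi)   # 0 = straight ahead, counterclockwise
    width = 2 * math.pi / NUM_SECTORS
    # Shift by half a sector so that sector 0 is centered on the heading.
    sector = int(((bearing + width / 2) % (2 * math.pi)) // width)
    return int(r // RANGE_BIN_M), sector


def build_topology(dets: List[Detection]) -> Dict[str, list]:
    """Star graph rooted at the (implicit) ego node: one quantized edge per detection."""
    nodes = []
    for d in dets:
        if math.hypot(d.x, d.y) > MAX_RANGE_M:
            continue
        rb, sec = to_ego_polar(d)
        nodes.append({"c": d.category, "r": rb, "s": sec})
    nodes.sort(key=lambda n: (n["r"], n["s"], n["c"]))   # deterministic order
    return {"nodes": nodes}


def serialize(graph: Dict[str, list]) -> bytes:
    """Compact JSON payload for the edge-to-cloud link."""
    return json.dumps(graph, separators=(",", ":")).encode("utf-8")


# Usage
example = [
    Detection("vehicle", 12.0, 0.5),     # ahead, slightly left
    Detection("pedestrian", 8.0, 6.0),   # ahead-left, 10 m away
    Detection("vehicle", 80.0, 0.0),     # beyond MAX_RANGE_M, dropped
]
graph = build_topology(example)
payload = serialize(graph)
```

Quantization trades geometric precision for a payload that is small, deterministic, and easy for an LLM to read. The bin sizes control that trade-off.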

Among the fine-tuning strategies compared, the two-stage method achieves the highest bounding-box IoU (93.5%). It also achieves the highest category and geometry accuracy. Transmitting the symbolic topology instead of raw perception substantially reduces bandwidth and inference latency. The LLM Decision Maker operating on graphs produces decisions consistent with those inferred directly by the VLM, at nearly 2.5x lower latency.
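The two latency figures quoted in this summary agree with each other. The notation below is introduced here only for illustration:
- $T_{\text{VLM}}$ is the latency of inferring decisions directly with the VLM.
- $T_{\text{graph}}$ is the latency of the graph-based LLM Decision Maker.

With this notation, a speedup of about 2.5x is the same as a 60% latency reduction:

$$
\frac{T_{\text{graph}}}{T_{\text{VLM}}} \approx \frac{1}{2.5} = 0.4
\qquad\Longrightarrow\qquad
1 - \frac{T_{\text{graph}}}{T_{\text{VLM}}} \approx 0.6 = 60\%
$$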

Figure 3: VLM finetuning strategy: vision backbone adaptation followed by joint language/vision optimization for robust topology detection.

Figure 4: Symbolic decision transmission and graph abstraction result in both comparable decision alignment and significant latency savings in edge–cloud pipelines.
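One way to keep the symbolic interface between the LLM Decision Maker and the planner verifiable is to restrict the model's reply to a closed vocabulary and reject anything else. The Python sketch below assumes:
- a JSON reply format;
- a hypothetical directive set (`DIRECTIVES`);
- a conservative `FALLBACK` sequence.

None of these come from the paper, and the LLM call itself is omitted.

```python
import json
from typing import List, Tuple

# Hypothetical closed vocabulary; the directive set used in AFSP is not given here.
DIRECTIVES = {"go_straight", "turn_left", "turn_right", "keep_lane", "slow_down", "stop"}
FALLBACK: List[str] = ["keep_lane", "slow_down"]   # assumed conservative default
MAX_LEN = 8                                        # assumed cap on sequence length


def build_prompt(topology_json: str, route_intent: str) -> str:
    """Structured prompt: fixed instructions, allowed vocabulary, scene graph, intent."""
    return (
        "You are a driving decision maker. Reply with JSON only, in the form "
        '{"directives": [...]}, using only: ' + ", ".join(sorted(DIRECTIVES)) + "\n"
        f"Route intent: {route_intent}\n"
        f"Scene topology: {topology_json}\n"
    )


def parse_directives(reply: str) -> Tuple[List[str], bool]:
    """Return (directives, ok). A malformed or out-of-vocabulary reply is rejected
    as a whole and replaced by FALLBACK, so the planner only ever sees valid symbols."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        return list(FALLBACK), False
    seq = data.get("directives") if isinstance(data, dict) else None
    if (not isinstance(seq, list) or not seq or len(seq) > MAX_LEN
            or not all(isinstance(d, str) and d in DIRECTIVES for d in seq)):
        return list(FALLBACK), False
    return seq, True


# Usage
prompt = build_prompt('{"nodes":[]}', "take the next exit on the right")
print(parse_directives('{"directives": ["keep_lane", "turn_right"]}'))  # (['keep_lane', 'turn_right'], True)
print(parse_directives('{"directives": ["turn_right", "fly"]}'))        # (['keep_lane', 'slow_down'], False)
```

Rejecting a malformed reply outright, rather than repairing it, ensures the planner never acts on a partially understood instruction.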

Decision to Trajectory: Semantic-Guided Search and Agentic Refinement

Classical A* is brittle in unstructured or perturbed maps because it lacks semantic awareness. AFSP's Decision2Trajectory module bridges this gap with two mechanisms:

Semantic-Guided A*: LLM-generated directive sequences are encoded as soft cost functions. These costs influence A* expansion without introducing hard constraints, which could cause search failures. To evaluate the costs, each search state is augmented with its prior moves and a directive progress index. Each move is categorized by how it relates to the directives (correction, delay, contradiction, or overaction). It is then penalized or rewarded according to a structured cost matrix, which empirically improves robustness to topological perturbations.
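The Python sketch below shows the general technique of semantic cost shaping in A*. Directive adherence enters only as extra edge cost, so the search still returns a path when a directive cannot be followed. As in the text, the search state carries the previous move and the directive progress index.

This is not the paper's cost matrix. The sketch assumes:
- a 4-connected grid;
- directives limited to `"left"`/`"right"` turns;
- hypothetical categories and weights: `contradiction`, `overaction`, `correction`, and a `miss` penalty for directives left unexecuted at the goal.

```python
import heapq
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]
MOVES = [(-1, 0), (0, 1), (1, 0), (0, -1)]     # headings: 0=N, 1=E, 2=S, 3=W
REL = {0: "straight", 1: "right", 3: "left"}   # (new - old) % 4; 2 (U-turn) is never expanded

# Hypothetical soft-cost weights, not the paper's values.
DEFAULT_W = {"contradiction": 4.0, "overaction": 2.0, "correction": 1.5, "miss": 10.0}


def reconstruct(parent: dict, state) -> List[Cell]:
    cells = []
    while state is not None:
        cells.append(state[0])
        state = parent[state]
    return cells[::-1]


def semantic_astar(grid: List[str], start: Cell, heading: int, goal: Cell,
                   directives: List[str], w: Dict[str, float] = DEFAULT_W
                   ) -> Optional[Tuple[float, List[Cell]]]:
    """A* over states (cell, heading, previous relative move, directive index).
    Directives ("left"/"right" turns, in order) only add soft cost; they never prune moves."""
    rows, cols, n = len(grid), len(grid[0]), len(directives)

    def h(cell: Cell) -> int:  # Manhattan distance; admissible since every step costs >= 1
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    s0 = (start, heading, "straight", 0)
    g = {s0: 0.0}
    parent = {s0: None}
    frontier = [(float(h(start)), 0, 0.0, s0)]  # (f, tie-breaker, g, state)
    counter = 1
    while frontier:
        _, _, gs, state = heapq.heappop(frontier)
        if state[0] == "T":                      # virtual terminal node: first pop is optimal
            return gs, reconstruct(parent, state[1])
        if gs > g[state]:
            continue                             # stale heap entry
        cell, hd, prev, k = state
        if cell == goal:                         # unexecuted directives: soft penalty, not failure
            total = gs + w["miss"] * (n - k)
            heapq.heappush(frontier, (total, counter, total, ("T", state)))
            counter += 1
            continue
        for nh, (dr, dc) in enumerate(MOVES):
            rel = REL.get((nh - hd) % 4)
            nr, nc = cell[0] + dr, cell[1] + dc
            if rel is None or not (0 <= nr < rows and 0 <= nc < cols) or grid[nr][nc] == "#":
                continue
            step, nk = 1.0, k
            if rel != "straight":
                if k < n and rel == directives[k]:
                    nk = k + 1                   # aligned turn: advance directive progress
                elif k < n:
                    step += w["contradiction"]   # turn opposite to the pending directive
                else:
                    step += w["overaction"]      # extra turn after all directives are done
                if prev != "straight" and prev != rel:
                    step += w["correction"]      # immediately undoing the previous turn
            ns = ((nr, nc), nh, rel, nk)
            if gs + step < g.get(ns, float("inf")):
                g[ns] = gs + step
                parent[ns] = state
                heapq.heappush(frontier, (gs + step + h((nr, nc)), counter, gs + step, ns))
                counter += 1
    return None


# Usage: 5x5 open grid ("#" would mark an obstacle), start bottom-left facing north.
GRID = ["....."] * 5
print(semantic_astar(GRID, (4, 0), 0, (0, 4), ["right"]))
print(semantic_astar(GRID, (4, 0), 0, (0, 4), ["left"])[0])  # 12.0
```

With `["right"]`, the cheapest plan drives north and turns right once at the top row, for a cost of 8. With `["left"]`, the vehicle cannot turn left until it has first turned right. The search therefore accepts one contradiction penalty and then executes the left turn, for a cost of 12, instead of failing. Manhattan distance remains an admissible heuristic because every step costs at least 1.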

Figure 5: (a) Semantic pattern cost design categorizes directive alignment; (b) hyperparameter modulation yields distinct trajectory characteristics and robustness behaviors.

Agentic Refinement Module: Hyperparameter adaptation is recast as a closed-loop agentic process. The LLM retrieves scenario-similar warm-start settings from a cloud memory, evaluates cost shaping outcomes, and iteratively proposes refinements until metrics converge. This continual learning pipeline replaces manual tuning, adapting search and cost parameters to evolving environments without sacrificing interpretability or verifiability.
Figure 6: The agentic refinement loop: an LLM coordinates memory recall, semantic feedback analysis, and cost parameter updates for optimal path generation.
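The agentic refinement loop can be sketched as four steps: warm-start recall, evaluation, proposal, and a convergence check. In the Python sketch below:
- `ScenarioMemory` recalls the parameters of the stored scenario whose features are nearest.
- `refine` iterates until a proposal no longer improves the score by at least `tol`.
- A hypothetical greedy proposer stands in for the LLM and perturbs one parameter at a time.
- `evaluate` is a toy objective.

All names, features, and values are assumptions for illustration.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Params = Dict[str, float]


@dataclass
class ScenarioMemory:
    """Hypothetical memory of (scenario features, tuned params, final score)."""
    entries: List[Tuple[List[float], Params, float]] = field(default_factory=list)

    def recall(self, features: List[float], default: Params) -> Params:
        if not self.entries:
            return dict(default)
        nearest = min(self.entries, key=lambda e: math.dist(e[0], features))
        return dict(nearest[1])                   # warm start from the most similar scenario

    def store(self, features: List[float], params: Params, score: float) -> None:
        self.entries.append((list(features), dict(params), score))


def refine(features: List[float], memory: ScenarioMemory,
           evaluate: Callable[[Params], float],
           propose: Callable[[Params, float], Params],
           default: Params, tol: float = 1e-3, max_iters: int = 20) -> Tuple[Params, float]:
    """Recall, evaluate, propose; stop when the gain drops below tol (lower score is better)."""
    params = memory.recall(features, default)
    best = evaluate(params)
    for _ in range(max_iters):
        candidate = propose(params, best)         # in AFSP this would be the LLM's suggestion
        score = evaluate(candidate)
        if best - score < tol:                    # no meaningful improvement: converged
            break
        params, best = candidate, score
    memory.store(features, params, best)
    return params, best


def make_greedy_proposer(evaluate: Callable[[Params], float], step: float = 1.0):
    """Hypothetical stand-in for the LLM: try +/- step on each parameter, keep the best."""
    def propose(params: Params, _score: float) -> Params:
        best_cand, best_s = dict(params), math.inf
        for key in params:
            for delta in (step, -step):
                cand = dict(params)
                cand[key] += delta
                s = evaluate(cand)
                if s < best_s:
                    best_cand, best_s = cand, s
        return best_cand
    return propose


# Usage with a toy objective (squared distance to an arbitrary target).
TARGET = {"contradiction": 4.0, "overaction": 2.0}
calls = {"n": 0}

def evaluate(p: Params) -> float:
    calls["n"] += 1
    return sum((p[k] - TARGET[k]) ** 2 for k in TARGET)

memory = ScenarioMemory()
propose = make_greedy_proposer(evaluate)
default = {"contradiction": 1.0, "overaction": 1.0}

params, score = refine([0.2, 0.8], memory, evaluate, propose, default)
cold_calls, calls["n"] = calls["n"], 0
params2, score2 = refine([0.25, 0.75], memory, evaluate, propose, default)
warm_calls = calls["n"]
print(cold_calls, warm_calls)  # 26 6
```

On the second, similar scenario, the recalled parameters are already at the optimum, so the loop stops after far fewer evaluations. That reduction is the practical payoff of the memory. In AFSP, the proposer would be an LLM reasoning over structured feedback rather than a local search.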

Closed-Loop Control: Cloud-Guided Flexible MPC

The final reference path reflects both geometric constraints and semantic preferences, and it feeds a switching MPC controller. At each step, the controller tracks the local reference when that reference is feasible. It requests a cloud-supplied reference only when it detects environmental ambiguity or complexity. This keeps actuation latency low in routine conditions and brings in LLM-informed global reasoning on demand. Safety is preserved while long-horizon performance improves.
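The per-step switching logic can be sketched as a small policy that decides which reference the MPC should track. The Python sketch below assumes an ambiguity score in [0, 1], a feasibility flag for the local plan, and a freshness limit on cloud plans. The thresholds and names (`select_reference`, `CloudReference`, `ambiguity_threshold`, `max_cloud_age`) are hypothetical, and the MPC itself is not shown.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class CloudReference:
    path: List[Point]
    stamp: float            # time (s) at which the cloud produced this plan


def select_reference(local_path: List[Point], local_feasible: bool, ambiguity: float,
                     cloud: Optional[CloudReference], now: float,
                     ambiguity_threshold: float = 0.5,
                     max_cloud_age: float = 1.0) -> Tuple[List[Point], str]:
    """Hypothetical per-step rule for choosing the MPC reference. Returns (path, source)."""
    if local_feasible and ambiguity < ambiguity_threshold:
        return local_path, "local"                  # routine case: onboard reference only
    if cloud is not None and now - cloud.stamp <= max_cloud_age:
        return cloud.path, "cloud"                  # hard case: fresh LLM-informed reference
    return local_path, "local_fallback"             # cloud missing or stale: never block the loop


# Usage
local = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
cloud = CloudReference(path=[(0.0, 0.0), (1.0, 0.5), (2.0, 1.0)], stamp=10.0)
print(select_reference(local, True, 0.1, cloud, now=10.2)[1])   # local
print(select_reference(local, True, 0.9, cloud, now=10.2)[1])   # cloud
print(select_reference(local, False, 0.1, cloud, now=12.0)[1])  # local_fallback
```

The key design choice is that a missing or stale cloud plan never stalls the fast loop. A real system would likely pair the fallback branch with a conservative maneuver.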

Empirical Results

CARLA-based evaluations span perception, semantic decision consistency, trajectory generation under perturbation, and closed-loop driving. Key findings include:

Perception2Decision: The two-stage fine-tuned VLM yields the highest IoU and accuracy. Transmitting the symbolic graph to the LLM preserves decision quality while reducing inference latency by 60%.
Decision2Trajectory: Semantic-Guided A* preserves directive adherence under map misalignments where classical A* fails. The agentic refinement process converges on cost parameters that improve both smoothness and semantic alignment.
End-to-end driving: Across diverse CARLA scenarios, AFSP is compared with both pure MPC and A*-guided MPC baselines. It reduces maximum lateral deviation by up to 45% and completion time by over 12%, and it is more robust to map perturbations and scenario diversity.
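The summary does not define these metrics precisely. A common assumption is that lateral deviation is the distance from each vehicle position to the reference polyline. Under that assumption, the Python sketch below computes maximum lateral deviation and a percentage reduction relative to a baseline. The same `percent_reduction` formula applies to completion times. The trajectories are toy values chosen for illustration, not data from the paper.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its endpoints.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def max_lateral_deviation(traj: List[Point], ref: List[Point]) -> float:
    """Worst-case distance from any trajectory point to the reference polyline."""
    return max(min(point_segment_distance(p, a, b) for a, b in zip(ref, ref[1:]))
               for p in traj)


def percent_reduction(baseline: float, method: float) -> float:
    return 100.0 * (baseline - method) / baseline


# Usage with toy trajectories (illustrative values only).
ref = [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0)]
baseline_traj = [(0.0, 0.0), (5.0, 1.2), (10.0, 2.0), (15.0, 0.8), (20.0, 0.0)]
method_traj = [(0.0, 0.0), (5.0, 0.9), (10.0, 1.5), (15.0, 0.5), (20.0, 0.0)]
base_dev = max_lateral_deviation(baseline_traj, ref)   # 2.0
new_dev = max_lateral_deviation(method_traj, ref)      # 1.5
print(percent_reduction(base_dev, new_dev))            # 25.0
```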

Figure 7: Comparative analysis of control schemes in the CARLA testbed; AFSP demonstrates reduced oscillation, enhanced safety corridors, and semantic adherence relative to classical planners.

Figure 8: Quantitatively, AFSP achieves the lowest finish time and lateral tracking error across all scenarios, supporting the hierarchy and interface design.

Implications and Future Directions

AFSP establishes a verifiable, interpretable architecture for embedding large-model reasoning in real-time control-intensive domains. By separating and bridging perception, reasoning, planning, and actuation, it enables explicit interface formalization at each timescale, circumventing the opacity and rigidity of end-to-end neural policies and the brittleness of classical search.

Conceptually, the semantic-guided cost shaping in A* and the agentic, memory-based parameter refinement offer generalizable templates for integrating symbolic or LLM-derived priors into graph-based planners. Practically, AFSP realizes scalable cloud–edge collaboration with meaningful reductions in bandwidth and decision latency.

Further research will likely extend AFSP with dynamic adaptive querying (e.g., learning when to invoke cloud-based reasoning), formal verification over symbolic directive execution, and increased memory/contextual capacity for life-long agent self-improvement in open-world driving.

Conclusion

Agentic Fast–Slow Planning constitutes a hierarchical and interpretable solution for bridging the gap between large-model semantic reasoning and robust, real-time robot control. Its modular interfaces, cloud–edge semantic abstraction, cost-shaped search, and agentic self-tuning deliver quantifiable safety, efficiency, and adaptability improvements—laying the groundwork for transparent ADSs that can harness the strengths of both foundation models and classical planners (2604.01681).
