Prediction-Driven Motion Planning
- Prediction-driven motion planning is a framework that treats forecasts of trajectories, occupancies, or maps as primary planning variables to optimize autonomous actions.
- It couples predictive models with planners through techniques like differentiable MPC and Monte Carlo rollouts, enabling interactive, uncertainty-aware, and goal-conditioned decision-making.
- Applications in autonomous driving, robotics, and human-robot interaction demonstrate improvements in conflict recall, closed-loop safety, and overall system efficiency.
Prediction-driven motion planning denotes a family of methods in which forecasts of future trajectories, occupancies, maps, or latent interaction states are treated as first-class planning variables rather than as auxiliary perception outputs. In these systems, the planner selects an ego action sequence while explicitly reasoning over predicted future evolution of other agents or of the environment, and in stronger formulations the predictor itself is conditioned on ego intent or embedded inside the planner. The resulting design space spans planning-aware trajectory prediction for autonomous driving, uncertainty-aware model predictive control for mobile robots, human–robot co-optimization with differentiable predictors, map prediction in unknown environments, and even motion planning cast as video prediction (Huang et al., 2022, Vazquez et al., 2022, Elhafsi et al., 2019).
1. Conceptual foundations
A common formalization is to optimize an ego control sequence $u_{0:T-1}$ against a forecast distribution over future scene realizations $Y$:
$$u^{*}_{0:T-1} = \arg\min_{u_{0:T-1}} \; \mathbb{E}_{Y \sim p_\theta(Y \mid X,\, u_{0:T-1})}\big[\, J(u_{0:T-1}, Y) \,\big]$$
Here $X$ denotes the observed scene history and context, $p_\theta(Y \mid X, u_{0:T-1})$ is a planning-conditioned predictor, and $J$ aggregates progress, safety, comfort, legality, or task costs. The defining distinction is the dependence of prediction on the planned ego action. In decoupled stacks, prediction is often modeled as $p_\theta(Y \mid X)$; prediction-driven formulations instead treat future motion as interactive and planning-dependent.
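The coupling described above can be sketched as a Monte Carlo estimate: for each candidate ego plan, sample futures from a predictor that takes the plan as input, average the cost, and select the cheapest plan. This is a minimal illustration, not any cited method. The names `reactive_predictor` and `merge_cost`, and every number in the toy merge scenario, are assumptions.

```python
import numpy as np

def expected_cost(plan, predictor, cost, n_samples, rng):
    """Monte Carlo estimate of E[cost(plan, Y)] with Y drawn from predictor(plan, rng)."""
    samples = [cost(plan, predictor(plan, rng)) for _ in range(n_samples)]
    return float(np.mean(samples))

def select_plan(candidates, predictor, cost, n_samples=256, seed=0):
    """Return the index of the candidate with the lowest estimated cost, plus all scores."""
    rng = np.random.default_rng(seed)
    scores = [expected_cost(p, predictor, cost, n_samples, rng) for p in candidates]
    return int(np.argmin(scores)), scores

# Toy merge scenario (hypothetical): a plan is the ego's constant speed in m/s
# toward a conflict point 20 m away.
def reactive_predictor(ego_speed, rng):
    """Other agent's arrival time at the conflict point; it yields if the ego arrives first."""
    ego_arrival = 20.0 / ego_speed
    nominal = rng.normal(3.0, 0.3)
    return nominal + 1.5 if ego_arrival < nominal else nominal

def merge_cost(ego_speed, other_arrival):
    """Ego travel time plus a large penalty when the two arrivals are within 1 s."""
    ego_arrival = 20.0 / ego_speed
    conflict = abs(ego_arrival - other_arrival) < 1.0
    return ego_arrival + (100.0 if conflict else 0.0)

best_index, scores = select_plan([4.0, 6.0, 8.0, 10.0], reactive_predictor, merge_cost)
```

Replacing `reactive_predictor` with one that ignores `ego_speed` gives the decoupled case, which makes the two formulations easy to compare on the same candidates.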
This distinction is explicit in autonomous driving. "Deep Interactive Motion Prediction and Planning" embeds a learned interactive multi-agent policy inside a game-theoretic MPC and teacher-forces the ego planned state sequence into the predictor, so that surrounding agents react to the candidate ego plan during optimization (Vazquez et al., 2022). "Differentiable Integrated Motion Prediction and Planning" uses a differentiable nonlinear optimizer that takes predicted trajectories of surrounding agents as input and jointly learns the planning cost weights, so the prediction module becomes planning-centric rather than merely forecast-centric (Huang et al., 2022). "Planning by Simulation: Motion Planning with Learning-based Parallel Scenario Prediction for Autonomous Driving" argues that some methods overlook the significant influence of the ego vehicle’s planning on the possible trajectories of other agents, and proposes Planning by Simulation with learning-based parallel scenario prediction, where predictions are deduced iteratively based on Monte Carlo Tree Search (Niu et al., 2024).
Prediction-driven planning is not limited to interaction conditioning. Goal and route information can also enter prediction. "Prediction-Driven Motion Planning: Route Integration Strategies in Attention-Based Prediction Models" extends an attention-based joint predictor with route polylines and a goal token, thereby addressing the mismatch between goal-conditioned planning and traditional prediction models that ignore navigation intent (Steiner et al., 3 Dec 2025). This suggests that, in integrated systems, the forecast object is not just “what others will do,” but “what futures remain compatible with the ego’s route, goal, and control authority.”
2. What is predicted
The predicted object in prediction-driven planning varies substantially across subfields. In many robotic and driving systems, the predictor outputs multimodal trajectories or occupancy sets. "Future-Oriented Navigation" uses a one-shot multimodal energy-based predictor that produces a stack of per-time-step probability maps for , with steps, corresponding to 0 ahead at sampling time 1; these maps are then clustered into modes and fitted with elliptical Gaussian occupancies (Zhang et al., 1 May 2025). CogDrive likewise decodes multiple trajectory modes with probabilities and per-step covariances, using a multimodal Gaussian formulation and explicit topological interaction modes such as yielding, neutral, and aggressive behavior (Huang et al., 2 Dec 2025).
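The step from a probability map to an elliptical Gaussian occupancy can be illustrated with a moment fit. The sketch below assumes a single mode per map, so the clustering step is omitted, and takes the ellipse as an n-sigma contour of the fitted covariance. Both choices are simplifying assumptions rather than the procedure of the cited paper.

```python
import numpy as np

def fit_gaussian(prob_map, cell_size=1.0):
    """Weighted mean and covariance of cell coordinates (x, y) under a 2D probability map."""
    h, w = prob_map.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.stack([xs.ravel(), ys.ravel()], axis=1) * cell_size
    wts = prob_map.ravel() / prob_map.sum()       # normalize to a distribution
    mean = wts @ pts
    centered = pts - mean
    cov = (centered * wts[:, None]).T @ centered
    return mean, cov

def ellipse_axes(cov, n_sigma=2.0):
    """Semi-axes (major first) and orientation of the n-sigma ellipse of a 2x2 covariance."""
    vals, vecs = np.linalg.eigh(cov)              # eigenvalues in ascending order
    semi_axes = n_sigma * np.sqrt(np.maximum(vals[::-1], 0.0))
    major = vecs[:, 1]                            # eigenvector of the largest eigenvalue
    angle = float(np.arctan2(major[1], major[0]))
    return semi_axes, angle
```

A multimodal map would first be split into per-mode masks, with `fit_gaussian` applied to each mask separately.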
A second representation is set-valued prediction. "Robust Predictive Motion Planning by Learning Obstacle Uncertainty" learns an intended control set for each obstacle by solving a linear program, then propagates forward reachable sets and position occupancies over the prediction horizon (Zhou et al., 2024). "A Hamilton-Jacobi Reachability-Based Framework for Predicting and Analyzing Human Motion for Safe Planning" goes further by augmenting the human state with a belief over model parameters and computing Belief-Augmented Forward Reachable Sets through a Hamilton–Jacobi PDE, with allowable action sets
5
thereby yielding prediction sets that are continuous in state and time and explicitly robust to prior misspecification (Bansal et al., 2019).
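Set-valued prediction can be illustrated with the simplest possible case: an obstacle modeled as a single integrator whose control lies in an axis-aligned box. The sketch below propagates position boxes forward and checks an ego trajectory against them. The box shape and single-integrator dynamics are assumptions chosen for brevity; the cited works use learned polytopic control sets and Hamilton–Jacobi reachability, respectively.

```python
import numpy as np

def propagate_boxes(p0, u_lo, u_hi, dt, horizon):
    """Forward reachable position boxes for p[k+1] = p[k] + dt * u, with u in [u_lo, u_hi]."""
    lo = np.asarray(p0, dtype=float).copy()
    hi = lo.copy()
    boxes = []
    for _ in range(horizon):
        lo = lo + dt * np.asarray(u_lo, dtype=float)
        hi = hi + dt * np.asarray(u_hi, dtype=float)
        boxes.append((lo.copy(), hi.copy()))
    return boxes

def ego_is_clear(ego_traj, boxes, margin=0.0):
    """True if each ego position lies outside the inflated obstacle box at the same step."""
    for p, (lo, hi) in zip(ego_traj, boxes):
        inside = np.all(p >= lo - margin) and np.all(p <= hi + margin)
        if inside:
            return False
    return True
```

The boxes grow linearly with the horizon, which shows directly why worst-case sets become conservative and why learning a tighter control set pays off.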
A third representation emphasizes calibrated uncertainty. "Adaptive Conformal Prediction for Motion Planning among Dynamic Agents" constructs online multistep uncertainty radii from delayed residuals
7
and uses an adaptive recursion
8
to maintain horizon-specific uncertainty sets with probabilistic coverage under distribution shift (Dixit et al., 2022).
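The flavor of such an adaptive recursion can be shown with a generic adaptive conformal inference update: the miscoverage level is raised after a covered step and lowered after a miss, and the radius is the matching empirical quantile of past residuals. The sketch below handles one prediction horizon and ignores the residual delay; it is an assumed textbook-style variant, not necessarily the exact recursion of Dixit et al., and all parameter values are illustrative.

```python
import numpy as np

def conformal_radius(residuals, alpha_t):
    """Empirical (1 - alpha_t) quantile of past residuals, with safe handling of the extremes."""
    if alpha_t <= 0.0:
        return float("inf")   # demand full coverage: no finite radius suffices
    if alpha_t >= 1.0:
        return 0.0
    return float(np.quantile(residuals, 1.0 - alpha_t))

def aci_update(alpha_t, residual, radius, target_alpha, step):
    """alpha[t+1] = alpha[t] + step * (target_alpha - err), with err = 1 on a miss."""
    err = 1.0 if residual > radius else 0.0
    return alpha_t + step * (target_alpha - err)

# Synthetic residual stream with a distribution shift halfway through (assumed data).
rng = np.random.default_rng(0)
history = list(np.abs(rng.normal(0.0, 0.2, size=50)))   # warm-up residuals
alpha_t, target_alpha, step = 0.1, 0.1, 0.02
misses = 0
for t in range(500):
    scale = 0.2 if t < 250 else 0.6
    radius = conformal_radius(history, alpha_t)
    residual = abs(rng.normal(0.0, scale))
    misses += int(residual > radius)
    alpha_t = aci_update(alpha_t, residual, radius, target_alpha, step)
    history.append(residual)
```

After the shift, repeated misses push `alpha_t` down, which enlarges the radius until the miss rate returns toward the target.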
Prediction-driven planning also includes non-trajectory forecasts. "Map-Predictive Motion Planning in Unknown Environments" predicts occupancies of unobserved map cells with a Conditional Neural Process and then plans all the way to the goal without heuristic frontier selection (Elhafsi et al., 2019). "Planning Robot Motion using Deep Visual Prediction" predicts up to 10 future egocentric frames from monocular video using PROM-Net and passes those predictions to an MPC, while "Robot Motion Planning as Video Prediction" recasts path generation itself as next-frame prediction over map, robot-state, and goal channels (Sarkar et al., 2019, Zang et al., 2022). In these cases, the predicted object is the future sensor stream or workspace occupancy rather than a list of agent trajectories.
3. How prediction enters the planner
The most common coupling mechanism is MPC with prediction-derived costs or constraints. In "Future-Oriented Navigation", predicted ellipses are used as hard avoidance constraints in the near term and as soft costs over the full horizon, with 9, 0, and a solve time budget of 1 per cycle (Zhang et al., 1 May 2025). In "Adaptive Conformal Prediction for Motion Planning among Dynamic Agents", the MPC enforces a Lipschitz-tightened safety constraint
2
which converts distribution-free conformal uncertainty radii into robust planning constraints (Dixit et al., 2022). In "Robust Predictive Motion Planning by Learning Obstacle Uncertainty", reachable sets derived from the learned control sets are converted into polytope separation constraints with slack variables, allowing the ego trajectory to remain outside obstacle occupancies over the horizon (Zhou et al., 2024).
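The hard-near-term, soft-full-horizon pattern can be sketched by scoring a fixed candidate trajectory against predicted ellipses. In an actual MPC these terms would be constraints and costs inside the solver; the function below only evaluates them. The hinge form of the soft cost and the `soft_level` threshold are assumptions introduced for illustration.

```python
import numpy as np

def ellipse_level(point, center, semi_axes, angle):
    """Quadratic level: < 1 inside the ellipse, 1 on its boundary, > 1 outside."""
    c, s = np.cos(angle), np.sin(angle)
    world_to_ellipse = np.array([[c, s], [-s, c]])
    local = world_to_ellipse @ (np.asarray(point, dtype=float) - np.asarray(center, dtype=float))
    return float(np.sum((local / np.asarray(semi_axes, dtype=float)) ** 2))

def evaluate_plan(ego_traj, ellipses, hard_steps, soft_level=2.0, soft_weight=1.0):
    """Return (feasible, soft_cost) for one ego position and one ellipse per time step.

    Steps k < hard_steps must stay outside the ellipse (hard constraint);
    every step pays a hinge cost when its level drops below soft_level.
    """
    feasible, soft_cost = True, 0.0
    for k, (p, (center, axes, angle)) in enumerate(zip(ego_traj, ellipses)):
        level = ellipse_level(p, center, axes, angle)
        if k < hard_steps and level < 1.0:
            feasible = False
        soft_cost += soft_weight * max(0.0, soft_level - level)
    return feasible, soft_cost
```

Limiting the hard constraint to the first few steps keeps the problem feasible when far-horizon predictions are diffuse, while the soft term still steers the plan away from likely occupancy.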
A second coupling mechanism is closed-loop interactive rollout inside a search or game formulation. "Deep Interactive Motion Prediction and Planning" uses an Interactive Multi-Agent Policy within an iterative leader-follower or iterative best-response MPC, and optimizes ego sequences with the Cross-Entropy Method while the predictor simulates other agents’ best responses (Vazquez et al., 2022). "Planning by Simulation" uses Monte Carlo Tree Search to balance and prune unreasonable actions and scenarios, explicitly exploring future interactions encoded within the prediction network (Niu et al., 2024). In both cases, prediction is not an exogenous input but part of the search semantics.
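A minimal sketch of this pattern pairs a Cross-Entropy Method optimizer with a cost function that rolls out the ego and a reactive agent jointly, so each sampled ego sequence produces its own predicted response. The hand-written follower rule below is a hypothetical stand-in for a learned interactive policy, and all gains, gaps, and penalties are assumed values.

```python
import numpy as np

def cem_plan(cost_fn, horizon, iters=20, pop=200, n_elite=20, seed=0):
    """Cross-Entropy Method over a control sequence with a diagonal Gaussian sampler."""
    rng = np.random.default_rng(seed)
    mean, std = np.zeros(horizon), np.ones(horizon)
    for _ in range(iters):
        samples = rng.normal(mean, std, size=(pop, horizon))
        costs = np.array([cost_fn(u) for u in samples])
        elite = samples[np.argsort(costs)[:n_elite]]      # lowest-cost samples
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mean

def interactive_cost(ego_accels, dt=0.2):
    """Joint rollout: the follower reacts at every step to the candidate ego motion."""
    ego_pos, ego_vel = 5.0, 4.0
    fol_pos, fol_vel = 0.0, 6.0
    cost = 0.0
    for a in ego_accels:
        ego_vel += a * dt
        ego_pos += ego_vel * dt
        gap = ego_pos - fol_pos
        # Hypothetical reactive rule: regulate toward a 6 m gap and matched speed.
        fol_acc = np.clip(0.5 * (gap - 6.0) + 1.0 * (ego_vel - fol_vel), -3.0, 2.0)
        fol_vel = max(0.0, fol_vel + fol_acc * dt)
        fol_pos += fol_vel * dt
        cost += 0.1 * a ** 2 + (ego_vel - 6.0) ** 2 + (1000.0 if gap < 2.0 else 0.0)
    return cost

ego_plan = cem_plan(interactive_cost, horizon=10)
```

Because the follower is simulated inside `interactive_cost`, the optimizer never sees a fixed forecast; the prediction is regenerated for every candidate.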
A third mechanism is end-to-end differentiable optimization. DIPP defines the planning problem as a sum of squared residuals for speed tracking, comfort, road adherence, traffic-light compliance, and safety, then solves it with a differentiable Gauss–Newton least-squares optimizer. Because the optimizer is unrolled during training, gradients from planning losses propagate back into both the predictor and the cost weights (Huang et al., 2022). This is a stronger form of prediction-driven planning: the prediction model is trained by the downstream planning objective rather than by open-loop forecast metrics alone.
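The inner solve of such a formulation can be sketched as a Gauss-Newton iteration over stacked, weighted residuals. The version below uses NumPy with a finite-difference Jacobian and only two residual types (speed tracking and comfort); it is not differentiable end to end. In a setup like the one described, the same loop would be written in an automatic-differentiation framework so that gradients reach the weights and the predictor. The residual definitions and weights are assumptions.

```python
import numpy as np

def residuals(speeds, weights, target_speed=10.0, dt=0.5):
    """Stack weighted residuals: speed tracking and acceleration (comfort)."""
    w_speed, w_comfort = weights
    track = w_speed * (speeds - target_speed)
    accel = w_comfort * np.diff(speeds) / dt
    return np.concatenate([track, accel])

def numerical_jacobian(fn, x, eps=1e-6):
    """Forward-difference Jacobian of fn at x."""
    r0 = fn(x)
    jac = np.zeros((r0.size, x.size))
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        jac[:, i] = (fn(x + step) - r0) / eps
    return jac

def gauss_newton(fn, x0, iters=10, damping=1e-8):
    """Minimize sum(fn(x)**2) with damped Gauss-Newton steps."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        r = fn(x)
        jac = numerical_jacobian(fn, x)
        delta = np.linalg.solve(jac.T @ jac + damping * np.eye(x.size), -jac.T @ r)
        x = x + delta
    return x

speed_plan = gauss_newton(lambda s: residuals(s, (1.0, 0.5)), np.linspace(4.0, 6.0, 8))
```

The tuple passed as `weights` plays the role of the learnable cost weights: changing it changes the optimum, which is what lets a planning loss shape those weights during training.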
Sampling-based and feedback planners also exploit predictive structure. "Adaptive Dual-Headway Unicycle Pose Control and Motion Prediction for Optimal Sampling-Based Feedback Motion Planning" derives an explicit convex bound on the future closed-loop trajectory,
3
and uses this shrinking convex set for constant-time safety verification of candidate edges (İşleyen et al., 2024). Prediction here is not a learned multimodal future but a feedback motion envelope, yet it serves the same planning function: candidate motions are accepted or rejected based on predicted future occupancy.
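The idea of verifying an edge through a bounding set, rather than by simulating the closed-loop trajectory, can be sketched with a deliberately crude bound. The code below assumes the robot never moves farther from the goal than its starting distance, so a disc centered at the goal contains the whole motion. This disc is a stand-in chosen for simplicity and is looser than the bound derived in the cited paper; circular obstacles are also an assumption.

```python
import numpy as np

def disc_is_safe(center, radius, obstacles):
    """True if the disc is disjoint from every circular obstacle given as (x, y, r)."""
    center = np.asarray(center, dtype=float)
    for ox, oy, r in obstacles:
        if np.linalg.norm(center - np.array([ox, oy])) <= radius + r:
            return False
    return True

def edge_is_safe(start, goal, obstacles):
    """Accept an edge if the goal-centered disc through the start is obstacle-free."""
    start = np.asarray(start, dtype=float)
    goal = np.asarray(goal, dtype=float)
    radius = float(np.linalg.norm(goal - start))
    return disc_is_safe(goal, radius, obstacles)
```

Each obstacle check is a single distance comparison, so the cost of verifying an edge does not depend on how long the closed-loop trajectory is.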
4. Representative domains and empirical evidence
Autonomous driving has supplied some of the clearest evidence that open-loop prediction accuracy is not sufficient for safe planning. "P4P: Conflict-Aware Motion Prediction for Planning in Autonomous Driving" reports that DenseTNT achieves minFDE 4 and minADE 5, but only Top-6 conflict recall 6 and a closed-loop collision rate of 7. By contrast, P4P records minFDE 8 and minADE 9, yet Top-6 conflict recall 0 and collision rate 1 (Sun et al., 2022). The critical planning signal is therefore conflict identification and relation inference, not merely pointwise displacement error.
Integrated driving systems show similar effects. DIPP attains open-loop prediction ADE/FDE of 2, and in closed-loop log-replay reports collision rate 3, off-route rate 4, and progress 5, improving over the separated planning-plus-prediction pipeline, which records collision rate 6 and progress 7 (Huang et al., 2022). In warehouse-style navigation, "Future-Oriented Navigation" reports success rates of 8 in Scenario 1, 9 in Scenario 2, 0 in Scenario 3, and 1 in Scenario 4, together with inference latency of about 2 per object and typical MPC solve times between 3 and 4 (Zhang et al., 1 May 2025).
Human-aware robotic planning provides a second major application area. "HMPO: Human Motion Prediction in Occluded Environments for Safe Motion Planning" couples an occlusion-aware CNN+LSTM predictor to an optimizer that inflates human capsules according to predicted visibility confidence. On Occlusion MoCap, HMPO reduces 3 s joint prediction error from 5 for Tracking+EKF to 6, a 7 reduction, and the qualitative planning results show that the planner first reduces occlusion and then completes the task with collision-free trajectories (Park et al., 2020). "Planning Coordinated Human-Robot Motions with Neural Network Full-Body Prediction Models" goes further by introducing latent modifiers to a differentiable recurrent human predictor and jointly optimizing robot controls and human prediction adjustments; on collision-avoidance tasks it reports success rate 8 for the joint method, compared with 9 for robot_avoids and 0 for human_avoids (Kratzer et al., 2022).
Prediction-driven planning also appears in compact and fully learned robotic systems. PROM-Net predicts the next 10 future frames in an unsupervised manner, uses about 1 million trainable parameters and a model size of about 2 Megabytes, and is intended to feed predicted frames and latent states into MPC for dynamic obstacle avoidance (Sarkar et al., 2019). STP-Net reformulates motion planning itself as video prediction and reports success rates from 3 to 4, while achieving at least 5, 6, and 7 faster speed with lower path cost on 2D Random Forest, 2D Maze, and 3D Random Forest environments, respectively (Zang et al., 2022). "Map-Predictive Motion Planning in Unknown Environments" uses learned occupancy prediction in unobserved space, achieving about 8 per planning iteration, roughly 9 faster than the comparison method, while maintaining time-efficient safe navigation (Elhafsi et al., 2019).
5. Trade-offs, misconceptions, and failure modes
A persistent misconception is that better open-loop trajectory metrics automatically imply better planning. P4P directly contradicts this: low minADE/minFDE predictors can still miss the conflicts that determine downstream collision risk (Sun et al., 2022). A plausible implication is that evaluation protocols for prediction-driven planning must include planning-centric metrics such as conflict recall, closed-loop success, or safety margins, not only forecast displacement.
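The gap between displacement metrics and planning-centric metrics is easy to see when both are computed side by side. The sketch below implements minADE and minFDE in their usual form, plus a simple conflict flag defined here as any predicted mode coming within a radius of the ego plan at the same time step. That conflict definition is an assumption made for illustration and is not the conflict-recall metric used by P4P.

```python
import numpy as np

def min_ade_fde(modes, truth):
    """modes: (K, T, 2) predicted trajectories; truth: (T, 2). Returns (minADE, minFDE)."""
    dists = np.linalg.norm(modes - truth[None], axis=-1)    # (K, T) pointwise errors
    return float(dists.mean(axis=1).min()), float(dists[:, -1].min())

def flags_conflict(modes, ego_plan, radius=1.0):
    """True if any mode comes within `radius` of the ego plan at the same time step."""
    gaps = np.linalg.norm(modes - ego_plan[None], axis=-1)  # (K, T) separations
    return bool((gaps < radius).any())
```

A predictor can score well on `min_ade_fde` because one mode tracks the ground truth closely, yet still return `False` from `flags_conflict` if none of its modes covers the maneuver that would actually cross the ego plan.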
A second trade-off is conservatism versus feasibility. In "Robust Predictive Motion Planning by Learning Obstacle Uncertainty", worst-case RMPC and the proposed learned-set method both achieve 0 collision-free rate in Monte Carlo trials, but RMPC has completion rate 1 while the proposed method reaches 2; DMPC, which assumes zero uncertainty, has completion rate 3 but only 4 collision-free rate (Zhou et al., 2024). Prediction-driven planning therefore lives between optimistic forecasting and worst-case reachability, and the representation of uncertainty largely determines where a system falls on that spectrum.
A third issue is uncertainty calibration. Adaptive conformal prediction addresses this by producing distribution-free, adaptive uncertainty sets with average coverage and an average closed-loop safety guarantee under recursive feasibility, rather than relying on fixed heuristic margins (Dixit et al., 2022). By contrast, "Future-Oriented Navigation" shows that loss design itself affects planning behavior: ENLL yields concentrated occupancy and higher success rates than BCE or KLD, whereas BCE and KLD overly inflate occupancy and hurt feasibility; grouping predicted obstacles is further used to mitigate the Freezing Robot Problem (Zhang et al., 1 May 2025).
Prediction-driven systems also inherit modality-specific failure modes. PROM-Net’s predictions blur with horizon under mean squared error training, although motion direction is preserved (Sarkar et al., 2019). Interactive planners depend on the fidelity of the behavioral model they embed; this suggests that planning-aware conditioning can improve interaction realism, but only insofar as the predictor remains calibrated under the actions the planner explores. In route-conditioned attention models, "SceneMotion-A1" improves open-loop planning score while navigation loss does not, indicating that route conditioning and closed-loop stability are distinct issues rather than interchangeable ones (Steiner et al., 3 Dec 2025).
6. Unification and current directions
Recent work increasingly treats prediction, planning, and simulation as variations of the same motion-modeling problem. "UniMotion" uses a decoder-only Transformer with dedicated interaction modes and joint training across simulation, prediction, and planning. After fine-tuning, it reports prediction minADE 5, minFDE 6, mAP 7, planning error 8 at 9, and collision 0, while also achieving a Realism Meta Metric of 1 in simulation (Song et al., 31 Jan 2026). This indicates a shift from modular pipelines toward shared motion backbones with task-specific decoding and masking.
A parallel direction is richer conditioning. Route integration in attention-based predictors improves both prediction and planning utility: SceneMotion-A1 reports average mAP 2, minADE 3, minFDE 4, and open-loop score 5, compared with a SceneMotion baseline open-loop score of 6 (Steiner et al., 3 Dec 2025). CogDrive adds cognitive interaction modes and a two-phase emergency trajectory tree with a short-term root branch that is safe across all modes and long-term branches that remain available under low-probability switching behaviors; it reports on Argoverse 2 b-minFDE 7, minFDE 8, miss rate 9, and minADE 0, as well as minJointFDE 1 and minJointADE 2 on INTERACTION (Huang et al., 2 Dec 2025).
These developments suggest that the field is converging on a few recurrent principles. Prediction must be action-aware or goal-aware when interaction matters. Uncertainty must be represented in forms planners can consume, whether as modes, ellipses, reachable sets, conformal radii, or convex feedback envelopes. Evaluation must be closed-loop and planning-centric. And, increasingly, the predictive model is no longer a detachable upstream module: it is part of the planning algorithm’s state, objective, or solver itself.