- The paper shows that integrating route and goal information into attention-based prediction models significantly enhances both prediction accuracy and planning feasibility.
- It introduces three fusion strategies, early fusion with and without a navigation loss and late fusion with a navigation loss, and benchmarks their performance on metrics such as mAP, minADE, and miss rate.
- Early fusion without the navigation loss (SceneMotion-A1) achieved the best outcomes, promoting route-adherent trajectories and improving open-loop planning scores.
Prediction-Driven Motion Planning via Attention-Based Route Integration
Introduction and Motivation
This paper investigates integrated approaches to motion prediction and motion planning for autonomous vehicles by enriching attention-based prediction models with explicit navigation information. The central claim is that conditioning trajectory prediction on the ego vehicle's navigation goal and intended route enhances both prediction accuracy and planning feasibility. The research addresses two distinct challenges in integrated prediction and planning (IPP): (1) the lack of navigation conditioning in conventional prediction models, and (2) the need for stable, kinematically feasible trajectories in planning. The focus here is on the first aspect, systematically studying architectural options for infusing goal and route information into transformer-based multi-agent predictors.
Motion prediction models aim to forecast trajectories for multiple agents by considering their past behavior and environmental context, with transformer-based methods leveraging polyline and attention-based scene encodings to capture complex interactions. Motion planning, on the other hand, seeks feasible, safe trajectories for the ego agent, with a recent trend toward deep learning-based planners, including transformer and diffusion-based generative models.
Prior IPP frameworks are categorized as sequential, undirected, and bidirectional. Most existing works treat prediction and planning as partially or fully decoupled, missing the benefits of bidirectional conditioning—especially explicit planning goal integration into prediction. Goal-conditioned prediction has shown promise in related domains, but this work provides the first architectural and empirical comparison of fusion strategies for integrating route and goal information into attention-based joint prediction.
Methodology
The core implementation is based on the SceneMotion model, a transformer-based architecture capable of joint trajectory prediction over multiple agents using a polyline scene representation. The model is extended with three route integration strategies, each differing in the stage and mechanism of fusing navigation data:
- Early Fusion without Navigation Loss (SceneMotion-A1): One-hot route membership is appended to map polylines, and the goal pose is included as an agent-centric token, so the prediction network can condition its representations on the intended route early in the feature pipeline (a minimal code sketch of this mechanism follows the list).
- Early Fusion with Navigation Loss (SceneMotion-A2): Identical architectural augmentation to A1, but incorporates a novel navigation loss. Unlike standard imitation losses, this penalizes only lateral deviation from the assigned route, decoupling adherence to ground truth from route alignment—a key distinction for planning contexts.
- Late Fusion with Navigation Loss (SceneMotion-A3): Route polylines are encoded in a parallel branch and summarized into route embeddings using a reduction decoder. These navigation features are concatenated with agent-centric features deeper in the transformer, enabling the model to consult navigation after initial scene encoding.
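For concreteness, the following is a minimal PyTorch-style sketch of the early-fusion idea behind A1/A2: a one-hot on-route flag is appended to each map polyline feature, and the goal pose enters as one extra agent-centric token before a shared attention encoder. All names, dimensions, and the two-layer encoder are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch of early fusion of route and goal information (assumed shapes/names).
import torch
import torch.nn as nn

class EarlyFusionSceneEncoder(nn.Module):
    def __init__(self, map_feat_dim: int, agent_feat_dim: int, d_model: int = 128):
        super().__init__()
        # +1 input channel: one-hot flag marking whether a map polyline lies on the ego route
        self.map_proj = nn.Linear(map_feat_dim + 1, d_model)
        self.agent_proj = nn.Linear(agent_feat_dim, d_model)
        # goal pose (x, y, heading) in the agent-centric frame becomes one extra token
        self.goal_proj = nn.Linear(3, d_model)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True), num_layers=2
        )

    def forward(self, map_polylines, on_route_mask, agent_tokens, goal_pose):
        # map_polylines: (B, P, map_feat_dim), on_route_mask: (B, P) in {0, 1}
        # agent_tokens:  (B, A, agent_feat_dim), goal_pose: (B, 3)
        map_in = torch.cat([map_polylines, on_route_mask.unsqueeze(-1)], dim=-1)
        tokens = torch.cat(
            [self.map_proj(map_in), self.agent_proj(agent_tokens),
             self.goal_proj(goal_pose).unsqueeze(1)], dim=1
        )
        # navigation cues are visible to every attention layer from the start
        return self.encoder(tokens)
```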
The navigation loss employs a robust distance-based term (approximating the Welsch loss) to encourage predicted trajectories to remain proximate to the given route, mitigating the over-emphasis on perfect trajectory imitation.
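A sketch of how such a Welsch-style penalty on lateral route deviation could look is given below; the scale constant c and the nearest-point approximation of lateral distance are assumptions made for illustration, not the paper's exact formulation.

```python
# Illustrative Welsch-style navigation loss: penalizes only deviation from the route,
# not deviation from the ground-truth trajectory.
import torch

def navigation_loss(pred_xy, route_xy, c: float = 2.0):
    """pred_xy: (B, T, 2) predicted ego waypoints; route_xy: (B, R, 2) route centerline points."""
    dists = torch.cdist(pred_xy, route_xy)      # (B, T, R) pairwise distances
    lateral = dists.min(dim=-1).values          # (B, T) distance to nearest route point
    # Welsch penalty: grows roughly quadratically near the route, saturates far away
    welsch = (c ** 2 / 2.0) * (1.0 - torch.exp(-(lateral / c) ** 2))
    return welsch.mean()
```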
Experimental Evaluation
Datasets and Baselines
Evaluations are conducted on the nuPlan dataset, leveraging both its validation split and the Val14 open-loop challenge subset. Up to eight focus agents per scene are selected using an interest scoring protocol inspired by the WOMD agent selection heuristic, ensuring coverage of diverse motion patterns. All methods are trained with consistent data configurations and model size constraints for comparability.
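As a rough illustration of the selection step, the snippet below picks up to eight agents with the highest interest score; the scoring function itself is left abstract because the text only states that it follows a WOMD-inspired heuristic.

```python
# Hypothetical focus-agent selection: keep the top-k agents by interest score.
import torch

def select_focus_agents(interest_scores, max_agents: int = 8):
    """interest_scores: (N,) one precomputed score per agent in the scene."""
    k = min(max_agents, interest_scores.numel())
    return torch.topk(interest_scores, k).indices  # indices of the selected focus agents
```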
Metrics
Prediction metrics follow the Waymo Motion Prediction Challenge: mean average precision (mAP), minimum average displacement error (minADE), minimum final displacement error (minFDE), miss rate (MR), and overlap rate (OR). Open-loop planning is scored via the nuPlan Open-Loop Score (OLS), which aggregates position errors, heading errors, and miss rates across prediction horizons.
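For reference, the displacement-based metrics can be sketched as follows for a set of K predicted modes; the 2 m miss threshold and the tensor shapes are generic assumptions rather than the exact challenge configuration.

```python
# Generic sketch of minADE, minFDE, and miss rate over multi-modal predictions.
import torch

def min_ade_fde_mr(pred, gt, miss_thresh: float = 2.0):
    """pred: (K, T, 2) trajectory modes; gt: (T, 2) ground-truth trajectory."""
    err = torch.linalg.norm(pred - gt.unsqueeze(0), dim=-1)   # (K, T) per-step error
    min_ade = err.mean(dim=-1).min()                          # best mode by average displacement
    min_fde = err[:, -1].min()                                # best mode by final displacement
    miss = (min_fde > miss_thresh).float()                    # miss if even the best mode ends far off
    return min_ade, min_fde, miss
```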
Prediction Results
All navigation-integrated models surpass the SceneMotion baseline in mAP, displacement errors, and miss rate for the ego agent, confirming the positive impact of even basic route conditioning. Notably:
- SceneMotion-A1 (early fusion, no navigation loss) achieves the strongest performance, improving mAP by 3.5%, minADE and minFDE by ~10%, and MR by 14.4% for the ego vehicle relative to the baseline.
- The addition of the navigation loss (A2) and late fusion (A3) does not yield further improvements, with A1 outperforming both despite nearly identical parameter counts.
- Late fusion (A3) increases model complexity and parameter count but does not translate to further prediction gains.
Open-Loop Planning Results
- SceneMotion-A1 increases the OLS by nearly 0.9% over the baseline, with other adaptations achieving minor improvements or regressions. This demonstrates that navigation augmentation remains beneficial for open-loop planning, even when post-processing heuristics are absent.
- The model produces more route-compliant trajectories in challenging intersection cases, reducing the incidence of invalid or traffic-rule-violating plans.
Limitations
While the integration of navigation data improves prediction and planning, the nuPlan benchmark's relative simplicity reduces the observable impact of these enhancements in many scenarios. Substantial score improvements may be masked when navigation challenges (e.g., lane changes, detours) are underrepresented.
Implications and Future Directions
From a practical standpoint, the results advocate for systematic incorporation of high-level navigation cues—such as route membership and goal position—directly into the representations learned by motion prediction models. The findings suggest that early fusion at the feature input stage is both sufficient and computationally efficient for extracting tangible benefits.
Theoretically, the research highlights the value of decoupling trajectory imitation from route-following in the loss design for integrated prediction-planning models, even though the navigation loss yielded no measurable gain on nuPlan; more intricate or adversarial environments (e.g., interPlan) may reveal additional advantages of navigation-aware prediction. Further, the potential of using high-level command-based navigation, as opposed to map-derived route data, warrants investigation for scaling IPP frameworks to resource-constrained or less-structured domains.
Future avenues include:
- Benchmarking on more complex datasets with richer navigation tasks
- Extending navigation integration to closed-loop or bidirectional architectures
- Exploring command- or language-based navigation input representations
- Unifying planning and prediction to enable on-policy, stable generation of feasible, safe trajectories under real-world constraints
Conclusion
This work substantiates the claim that integrating route and goal information into attention-based multi-agent motion predictors improves both prediction accuracy and planning utility. Among architectural choices, early-stage fusion of navigation is consistently most effective, obviating the need for complex navigation-specific loss terms or deep late-stage fusion. The outcomes underscore the value of cross-disciplinary synthesis between prediction and planning research domains and provide a foundation for future developments toward unified, goal-aware decision-making models for autonomous driving systems.