SparseDriveV2: Scoring is All You Need for End-to-End Autonomous Driving

Published 31 Mar 2026 in cs.CV | (2603.29163v1)

Abstract: End-to-end multi-modal planning has been widely adopted to model the uncertainty of driving behavior, typically by scoring candidate trajectories and selecting the optimal one. Existing approaches generally fall into two categories: scoring a large static trajectory vocabulary, or scoring a small set of dynamically generated proposals. While static vocabularies often suffer from coarse discretization of the action space, dynamic proposals provide finer-grained precision and have shown stronger empirical performance on existing benchmarks. However, it remains unclear whether dynamic generation is fundamentally necessary, or whether static vocabularies can already achieve comparable performance when they are sufficiently dense to cover the action space. In this work, we start with a systematic scaling study of Hydra-MDP, a representative scoring-based method, revealing that performance consistently improves as trajectory anchors become denser, without exhibiting saturation before computational constraints are reached. Motivated by this observation, we propose SparseDriveV2 to push the performance boundary of scoring-based planning through two complementary innovations: (1) a scalable vocabulary representation with a factorized structure that decomposes trajectories into geometric paths and velocity profiles, enabling combinatorial coverage of the action space, and (2) a scalable scoring strategy with coarse factorized scoring over paths and velocity profiles followed by fine-grained scoring on a small set of composed trajectories. By combining these two techniques, SparseDriveV2 achieves 92.0 PDMS and 90.1 EPDMS on NAVSIM, with 89.15 Driving Score and 70.00 Success Rate on Bench2Drive with a lightweight ResNet-34 as backbone. Code and model are released at https://github.com/swc-17/SparseDriveV2.

Authors (7)

Summary

The paper introduces a scalable factorized trajectory vocabulary with hierarchical scoring, challenging the necessity of dynamic trajectory generation.
It details a method using independent clustering of geometric paths and velocity profiles to generate a super-dense candidate set for effective planning.
Experimental results on NAVSIM and Bench2Drive benchmarks show improved driving metrics and efficiency, underscoring its practical applicability.

SparseDriveV2: Scoring-Based Super-Dense Planning for End-to-End Autonomous Driving

Motivation and Problem Formulation

End-to-end autonomous driving models require effective planning strategies that handle uncertainty and multi-modality inherent to the driving task. Historically, scoring-based approaches either utilize static trajectory vocabularies, which are restricted by coarse discretization and computational constraints, or leverage dynamic trajectory generation, which improves coverage but increases model complexity due to added network components and iterative processes. The central question is whether dense static vocabularies, with proper representation and scoring, suffice for high-performance planning or whether dynamic trajectory generation is a necessity.

A scaling study with Hydra-MDP shows that planning performance improves consistently as the trajectory vocabulary becomes denser, with no saturation observed before the computational limit is reached. This indicates that the limitations of static methods are not intrinsic but stem from insufficient coverage of the action space. The research focus therefore shifts toward scalable representations and efficient scoring mechanisms that can operate over super-dense candidate sets.

SparseDriveV2 Framework

SparseDriveV2 introduces a scalable factorized trajectory vocabulary, representing each candidate trajectory as the composition of a geometric path and a velocity profile. Because every path can be paired with every velocity profile, coverage grows combinatorially while the stored vocabulary stays compact and tractable. The framework is supported by a hierarchical scoring pipeline: an initial coarse scoring stage eliminates implausible paths and velocity profiles, and a fine-grained stage then scores the top-$k$ compositions to select the final plan.
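To make the composition step concrete, the following is a minimal sketch of one way a geometric path and a velocity profile could be combined into timed waypoints. It assumes the path is a 2D polyline, the velocity profile is a list of speeds held constant over each time step `dt`, and composition is done by integrating speed into arc length and interpolating along the path. These representations and the names `cumulative_arc_length`, `point_at`, and `compose` are illustrative assumptions, not the paper's implementation.

```python
from bisect import bisect_right
from math import hypot


def cumulative_arc_length(path):
    """Arc length from the first point to each point of a polyline."""
    lengths = [0.0]
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        lengths.append(lengths[-1] + hypot(x1 - x0, y1 - y0))
    return lengths


def point_at(path, lengths, s):
    """Linearly interpolate the polyline at arc length s (clamped to its ends)."""
    s = min(max(s, 0.0), lengths[-1])
    i = min(bisect_right(lengths, s), len(path) - 1)
    s0, s1 = lengths[i - 1], lengths[i]
    t = 0.0 if s1 == s0 else (s - s0) / (s1 - s0)
    (x0, y0), (x1, y1) = path[i - 1], path[i]
    return (x0 + t * (x1 - x0), y0 + t * (y1 - y0))


def compose(path, speeds, dt):
    """Turn a (geometric path, velocity profile) pair into timed waypoints."""
    lengths = cumulative_arc_length(path)
    s, waypoints = 0.0, []
    for v in speeds:
        s += v * dt  # distance travelled so far along the path
        waypoints.append(point_at(path, lengths, s))
    return waypoints


# Toy usage: a straight 3 m path driven with an accelerating speed profile.
straight = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
print(compose(straight, [1.0, 2.0, 4.0], dt=0.5))
# -> [(0.5, 0.0), (1.5, 0.0), (3.0, 0.0)]  (last point is clamped to the path end)
```

Under this reading, the same path yields different trajectories for different velocity profiles, which is what lets a small number of stored anchors cover a large set of trajectories.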

Figure 1: Overview of SparseDriveV2, showing trajectory factorization, vocabulary construction, and hierarchical scoring conditioned on scene context.

Factorized Vocabulary Construction

The trajectory vocabulary is constructed by separately clustering geometric paths and velocity profiles extracted from human driving demonstrations. Paths are sampled at fixed spatial intervals and clustered via K-Means, capturing diverse spatial motion patterns. Velocity profiles are extracted and clustered in the same way to represent temporal progressions along candidate paths. Final trajectories are composed as all possible path-velocity pairs, yielding a super-dense set of $N_p \times N_v$ candidates, where $N_p$ and $N_v$ are the numbers of path and velocity anchors. This greatly exceeds the coverage of prior monolithic vocabularies.
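The construction described above can be sketched as follows. This is a toy, standard-library version of Lloyd's K-Means applied independently to flattened path vectors and to speed profiles. The data layout, the helper names `kmeans`, `build_vocabulary`, and `vocabulary_size`, and all sizes are assumptions made for illustration; the released code may cluster differently.

```python
import random


def kmeans(vectors, k, iters=20, seed=0):
    """Plain Lloyd's K-Means on equal-length lists of floats; returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for v in vectors:
            dists = [sum((a - b) ** 2 for a, b in zip(v, c)) for c in centroids]
            buckets[dists.index(min(dists))].append(v)
        for j, members in enumerate(buckets):
            if members:  # keep the old centroid if a cluster goes empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def build_vocabulary(paths, speed_profiles, n_paths, n_speeds):
    """Cluster paths and speed profiles separately into two anchor sets."""
    return kmeans(paths, n_paths), kmeans(speed_profiles, n_speeds)


def vocabulary_size(path_anchors, speed_anchors):
    """Return (anchors stored, trajectories that can be composed from them)."""
    n_p, n_v = len(path_anchors), len(speed_anchors)
    return n_p + n_v, n_p * n_v


# Toy usage with synthetic "demonstrations" (random numbers, not driving data).
rng = random.Random(1)
paths = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(50)]    # flattened (x, y) points
speeds = [[rng.uniform(0, 10) for _ in range(6)] for _ in range(50)]   # speed per time step
path_anchors, speed_anchors = build_vocabulary(paths, speeds, n_paths=5, n_speeds=3)
print(vocabulary_size(path_anchors, speed_anchors))  # -> (8, 15)
```

The point of the last line is the gap between the two numbers: storage and clustering cost grow with $N_p + N_v$, while the set of composable trajectories grows with $N_p \times N_v$.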

Hierarchical Scoring Strategy

SparseDriveV2 first applies coarse scoring independently to path anchors and velocity anchors, conditioned on the scene and the vehicle status. Paths and velocity profiles interact with the scene via feature aggregation and cross-attention, yielding contextual scores. Top-$k$ selection then prunes each anchor set, so that only the surviving paths and velocity profiles are composed into trajectories for fine-grained joint scoring. A trajectory re-conditioning mechanism further improves the modeling of spatial-temporal dependencies, allowing for more expressive and accurate final scoring.
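The control flow of this coarse-to-fine selection can be sketched in a few lines. In the sketch below, the coarse scores are plain lists and `fine_score` is a hypothetical callable standing in for the learned joint scorer; the cross-attention and re-conditioning modules are not modeled. The function names and the toy numbers are assumptions for illustration only.

```python
import heapq
from itertools import product


def top_k(scores, k):
    """Indices of the k highest scores, best first."""
    return heapq.nlargest(k, range(len(scores)), key=scores.__getitem__)


def hierarchical_select(path_scores, speed_scores, fine_score, k_path, k_speed):
    """Coarse-prune each factor, then jointly score only the surviving pairs."""
    kept_paths = top_k(path_scores, k_path)
    kept_speeds = top_k(speed_scores, k_speed)
    candidates = list(product(kept_paths, kept_speeds))
    best = max(candidates, key=lambda pair: fine_score(*pair))
    return best, len(candidates)


# Toy usage: 4 path anchors and 3 velocity anchors give 12 possible pairs,
# but only 2 x 2 = 4 of them reach the (stand-in) fine scorer.
path_scores = [0.1, 0.9, 0.5, 0.7]
speed_scores = [0.2, 0.8, 0.6]


def toy_fine_score(path_idx, speed_idx):
    # Hypothetical joint score that happens to prefer path 3 with profile 2.
    return -abs(path_idx - 3) - abs(speed_idx - 2)


print(hierarchical_select(path_scores, speed_scores, toy_fine_score, 2, 2))
# -> ((3, 2), 4)
```

With this structure the coarse stage evaluates $N_p + N_v$ anchors and the fine stage evaluates only the product of the two kept-set sizes, rather than all $N_p \times N_v$ compositions.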

Experimental Results and Numerical Performance

SparseDriveV2 achieves strong numerical results across NAVSIM v1/v2 and Bench2Drive benchmarks:

NAVSIM v1: PDMS 92.0, outperforming all scoring-based and dynamic generation methods, including those utilizing larger backbone networks.
NAVSIM v2: EPDMS 90.1 (corrected protocol), with marked improvements in the efficiency metric (EP), indicating superior coverage and scoring precision.
Bench2Drive: Driving Score 89.15, Success Rate 70.00%, and Mean Multi-Ability Score 67.67, displaying robust generalization in closed-loop interactive scenarios.

SparseDriveV2 sets new records for scoring-based methods and exhibits strong alignment with expert trajectories in terms of high-level intent and adaptability across diverse driving abilities.

Figure 2: Trajectory output visualization in sharp-turning scenarios, illustrating smoother motion by SparseDriveV2 relative to baseline.

Figure 3: Traffic efficiency comparison, with SparseDriveV2 achieving higher throughput than baseline planners.

Figure 4: High-level intent alignment, showing SparseDriveV2's superior geometric path modeling for expert-matching behaviors.

Qualitative Analysis and Failure Modes

SparseDriveV2 consistently delivers smooth and efficient trajectories in complex scenarios, as verified by qualitative results on NAVSIM. The factorized modeling enhances intent alignment and trajectory plausibility. The notable failure modes stem from insufficient navigation information: when the scene context encoding is incomplete, the model can select an incorrect trajectory.

Figure 5: Failure case visualization, highlighting SparseDriveV2’s vulnerability in navigation-limited scenarios.

Implications and Future Prospects

The study demonstrates that scoring-based planning, when augmented with dense and factorized trajectory vocabularies together with scalable, hierarchical scoring mechanisms, can rival or surpass the performance of dynamic generation approaches. This result challenges prevailing assumptions about the necessity of generative planning paradigms for multi-modal driving, suggesting that optimally constructed static vocabularies are not fundamentally limited. The practical implication is a reduction in model complexity and inference overhead, enabling efficient, high-quality planning in resource-constrained real-world deployments.

Future research should investigate enhanced scene context encoding to address the navigational failure cases, and explore scaling factorized vocabularies with richer environmental priors. Theoretical analysis of convergence properties and of the optimal vocabulary density under a fixed computational budget would provide additional guidance. Hybrid approaches that selectively enable trajectory generation in coverage-critical scenarios present another path for advancing end-to-end autonomous driving planning frameworks.

Conclusion

SparseDriveV2 establishes that scoring is sufficient when supported by a scalable, factorized trajectory vocabulary and a hierarchical scoring pipeline. Achieving state-of-the-art results with a lightweight ResNet-34 backbone, SparseDriveV2 extends the limits of scoring-based planning, supporting both theoretical and practical advances for end-to-end autonomous driving systems (2603.29163).
