CLOVER: Closed-Loop Value Estimation & Ranking for End-to-End Autonomous Driving Planning

Published 14 May 2026 in cs.RO, cs.AI, and cs.CV | (2605.15120v1)

Abstract: End-to-end autonomous driving planners are commonly trained by imitating a single logged trajectory, yet evaluated by rule-based planning metrics that measure safety, feasibility, progress, and comfort. This creates a training-evaluation mismatch: trajectories close to the logged path may violate planning rules, while alternatives farther from the demonstration can remain valid and high-scoring. The mismatch is especially limiting for proposal-selection planners, whose performance depends on candidate-set coverage and scorer ranking quality. We propose CLOVER, a Closed-LOop Value Estimation and Ranking framework for end-to-end autonomous driving planning. CLOVER follows a lightweight generator-scorer formulation: a generator produces diverse candidate trajectories, and a scorer predicts planning-metric sub-scores to rank them at inference time. To expand proposal support beyond single-trajectory imitation, CLOVER constructs evaluator-filtered pseudo-expert trajectories and trains the generator with set-level coverage supervision. It then performs conservative closed-loop self-distillation: the scorer is fitted to true evaluator sub-scores on generated proposals, while the generator is refined toward teacher-selected top-$k$ and vector-Pareto targets with stability regularization. We analyze when an imperfect scorer can improve the generator, showing that scorer-mediated refinement is reliable when scorer-selected targets are enriched under the true evaluator and updates remain conservative. On NAVSIM, CLOVER achieves 94.5 PDMS and 90.4 EPDMS, establishing a new state of the art. On the more challenging NavHard split, it obtains 48.3 EPDMS, matching the strongest reported result. On supplementary nuScenes open-loop evaluation, CLOVER achieves the lowest L2 error and collision rate among compared methods. Code and data will be released at https://github.com/WilliamXuanYu/CLOVER.


Authors (4)

Summary

The paper introduces a closed-loop paradigm that couples proposal generation with scoring to improve autonomous driving planning.
It employs a two-stage training process with pseudo-expert coverage and conservative self-distillation to enhance both proposal diversity and quality.
Empirical results show state-of-the-art performance on NAVSIM (94.5 PDMS, 90.4 EPDMS) and the lowest L2 error and collision rate among compared methods on nuScenes open-loop evaluation, with seed-level training variation below 0.02 PDMS.

CLOVER: Closed-Loop Value Estimation & Ranking for End-to-End Autonomous Driving Planning

Motivation and Problem Formulation

End-to-end autonomous driving planners are predominantly trained on imitation of logged human trajectories but are evaluated using rule-based planning metrics that reflect safety, feasibility, progress, and comfort. This creates a mismatch between training and evaluation: adherence to logged paths does not guarantee satisfaction of planning metrics, and valid high-scoring alternatives may exist outside the demonstration set. Proposal-selection planners, which generate a set of candidate trajectories, often suffer from limited candidate diversity and inadequate ranking quality due to single-trajectory supervision. The performance ceiling is determined by both the coverage of high-quality proposals and the efficacy of the ranking mechanism.

CLOVER Framework

CLOVER introduces a closed-loop value estimation and ranking paradigm that couples proposal generation and scoring. The architecture consists of a generator producing diverse candidate trajectories and a scorer predicting planning-metric sub-scores for each proposal. Inference selects the top-ranked trajectory based on the composed score. Training proceeds in two stages:

Stage 1: Pseudo-expert coverage is achieved by constructing evaluator-filtered pseudo-expert trajectories from diverse action families (lateral offsets, speed profiles, etc.), providing set-level supervision to expand proposal diversity and quality. This increases the oracle upper bound for the candidate set.

Stage 2: Conservative closed-loop self-distillation alternates scorer fitting (to true evaluator scores) and generator refinement. The scorer is used to select top-k and vector-Pareto targets, and the generator is trained to cover these with stability regularization. This process avoids diversity collapse and prevents exploitation of scorer imperfections.
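The inference-time step described above (score every candidate, then pick the top-ranked one) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `SubScores` fields, the `compose_score` rule (two multiplicative gates times a weighted average, with 5/5/2 weights), and the `select_trajectory` helper are hypothetical and are not taken from the CLOVER code.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# A trajectory is a list of (x, y) waypoints in the ego frame (assumed format).
Trajectory = List[Tuple[float, float]]


@dataclass
class SubScores:
    """Hypothetical scorer outputs for one candidate, each in [0, 1]."""
    no_collision: float       # safety gate
    drivable_area: float      # feasibility gate
    progress: float
    time_to_collision: float
    comfort: float


def compose_score(s: SubScores) -> float:
    """Compose sub-scores into one scalar.

    Assumed form: the two gates multiply a weighted average of the
    remaining terms. The 5/5/2 weights are illustrative placeholders.
    """
    weighted = (5.0 * s.progress
                + 5.0 * s.time_to_collision
                + 2.0 * s.comfort) / 12.0
    return s.no_collision * s.drivable_area * weighted


def select_trajectory(candidates: Sequence[Trajectory],
                      predicted: Sequence[SubScores]) -> Tuple[int, float]:
    """Return (index, score) of the candidate with the highest composed score."""
    if not candidates or len(candidates) != len(predicted):
        raise ValueError("need one SubScores entry per candidate")
    scores = [compose_score(s) for s in predicted]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]
```

Because the gates are multiplicative in this sketch, a candidate predicted to collide scores zero regardless of how much progress it makes, which is the ranking behavior a rule-based planning metric would reward.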

Theoretical Guarantees

CLOVER’s refinement mechanism relies on a selected-set enrichment condition: if scorer-selected targets are statistically enriched for true high-quality trajectories relative to the existing proposal distribution, conservative set-level distillation increases the generator's support for high-quality candidates. CLOVER does not require a globally perfect scorer; target enrichment suffices for proposal set improvement. Empirical analysis supports this premise, showing substantial enrichment of true high-score proposals among scorer-selected candidates.
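One way to make the enrichment condition concrete is to compare the fraction of truly high-quality trajectories among scorer-selected targets with the same fraction in the full proposal pool. The sketch below is an assumed diagnostic, not the paper's formal statement: the function names, the quality threshold, and the ratio form are all hypothetical.

```python
from typing import List, Sequence


def top_k_indices(predicted_scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k proposals the (possibly imperfect) scorer ranks highest."""
    order = sorted(range(len(predicted_scores)),
                   key=lambda i: predicted_scores[i], reverse=True)
    return order[:k]


def enrichment_ratio(true_scores: Sequence[float],
                     selected: Sequence[int],
                     threshold: float) -> float:
    """P(true >= threshold | selected) / P(true >= threshold | all proposals).

    A ratio above 1 means the selected set is enriched for proposals that
    the true evaluator rates as high quality.
    """
    if not true_scores or not selected:
        raise ValueError("need non-empty scores and selection")
    base = sum(s >= threshold for s in true_scores) / len(true_scores)
    if base == 0.0:
        raise ValueError("no high-quality proposals in the pool")
    hit = sum(true_scores[i] >= threshold for i in selected) / len(selected)
    return hit / base


# Toy example: the scorer misranks proposal 3 but still enriches the targets.
true_eval = [0.95, 0.2, 0.9, 0.4, 0.1, 0.3, 0.85, 0.5]
predicted = [0.7, 0.3, 0.8, 0.75, 0.2, 0.1, 0.6, 0.4]
targets = top_k_indices(predicted, k=3)
ratio = enrichment_ratio(true_eval, targets, threshold=0.8)
```

In the toy data the scorer wrongly promotes one mediocre proposal, yet two of its three selected targets are truly high quality against three of eight in the pool, so the ratio stays above 1. This is the sense in which a globally perfect scorer is not needed.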

Empirical Results

CLOVER achieves state-of-the-art results across major closed-loop planning benchmarks:

NAVSIM v1: 94.5 PDMS, outperforming prior generator-scorer baselines and approaching the human-driver reference.
NAVSIM v2: 90.4 EPDMS with the updated evaluator, and 87.2 EPDMS* under the original code. On the challenging NavHard split, 48.3 EPDMS matches the strongest previously reported results.
nuScenes open-loop: CLOVER achieves the lowest L2 displacement error and collision rate among compared approaches.

Seed-level reproducibility studies indicate negligible training variation (<0.02 PDMS), attesting to robustness.

Proposal Quality and Diversity

Analysis of generated proposals demonstrates significant stage-wise improvements:

Stage 1: Expands proposal diversity and raises the oracle upper bound (Oracle@64 PDMS increases from 0.9933 to 0.9976), though it introduces a low-score tail.
Stage 2: Refines the expanded distribution: the mean proposal score rises from 0.7972 to 0.8277, low-score proposals (PDMS < 0.50) drop from 9.05 to 6.83, and diversity is preserved (Qualified Cluster Count@2m increases from 6.02 to 8.71 versus baseline).
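The proposal-set statistics quoted above can be computed from per-scene evaluator scores along the following lines. This is a sketch under assumptions: Oracle@K is taken to be the per-scene maximum score averaged over scenes, the low-score figure is taken to be a percentage of all proposals, and the function name and dictionary keys are hypothetical.

```python
from statistics import mean
from typing import Dict, Sequence


def proposal_set_diagnostics(scene_scores: Sequence[Sequence[float]],
                             low_threshold: float = 0.5) -> Dict[str, float]:
    """Summarize true evaluator scores of K proposals per scene.

    scene_scores[i][j] is the score of proposal j in scene i, in [0, 1].
    """
    if not scene_scores or any(len(s) == 0 for s in scene_scores):
        raise ValueError("every scene needs at least one proposal")
    # Oracle@K (assumed definition): best proposal per scene, averaged.
    oracle = mean([max(s) for s in scene_scores])
    flat = [x for s in scene_scores for x in s]
    # Share of proposals below the low-score threshold, in percent (assumed unit).
    low_pct = 100.0 * sum(x < low_threshold for x in flat) / len(flat)
    return {
        "oracle_at_k": oracle,
        "mean_score": mean(flat),
        "low_score_pct": low_pct,
    }


# Toy example with two scenes and K = 3 proposals each.
stats = proposal_set_diagnostics([[0.2, 0.9, 0.6], [0.4, 0.8, 1.0]])
```

Reading the three numbers together shows the trade-off described in the text: broader coverage can raise the oracle bound while also adding a low-score tail that pulls down the mean.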

Qualitative visualizations confirm that CLOVER produces broader candidate sets, covering multiple feasible driving modes, while maintaining high proposal quality.

Ablation and Diagnostic Studies

Critical ablations reinforce the effectiveness of CLOVER’s components:

Pseudo-expert coverage and closed-loop refinement are complementary; combining both in full CLOVER reaches 94.5 PDMS.
Vector-Pareto guidance outperforms scalar top-k targets and distance suppression, preserving diversity among high-scoring proposals.
Anchor-assisted soft reranking (for EPDMS) improves extended comfort and total score by reducing temporal selection jitter, without sacrificing progress.
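To illustrate why vector-Pareto targets can preserve more diversity than scalar top-k targets, the sketch below selects from a toy set of sub-score vectors in both ways. It is a generic Pareto-front computation under assumed conventions (higher is better on every axis, scalarization by plain sum); the paper's exact target construction may differ.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b on every axis and better on one."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def pareto_front(vectors: Sequence[Sequence[float]]) -> List[int]:
    """Indices of vectors that no other vector dominates."""
    return [i for i, a in enumerate(vectors)
            if not any(dominates(b, a)
                       for j, b in enumerate(vectors) if j != i)]


def scalar_top_k(vectors: Sequence[Sequence[float]], k: int) -> List[int]:
    """Indices of the k vectors with the largest summed score."""
    totals = [sum(v) for v in vectors]
    order = sorted(range(len(totals)), key=lambda i: totals[i], reverse=True)
    return order[:k]


# Toy (progress, comfort) sub-scores for five proposals.
proposals = [(0.9, 0.5), (0.88, 0.5), (0.5, 0.8), (0.3, 0.9), (0.4, 0.4)]
front = pareto_front(proposals)
top2 = scalar_top_k(proposals, k=2)
```

Here scalar top-2 picks proposals 0 and 1, which are near-duplicates favoring progress, while the Pareto front keeps proposals 0, 2, and 3, which span different progress-comfort trade-offs.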

Proposal count studies indicate diminishing returns beyond $K=64$ candidates. Fixed-proposal scorer diagnostics show that larger video and vision backbones (e.g., Wan2.2-5B) improve ranking quality, highlighting the importance of scorer design for final performance.

Limitations and Future Directions

CLOVER focuses on trajectory-level scoring and ranking at the per-scene level. Temporally aggregated metrics (e.g., extended comfort) depend on cross-frame consistency, currently mitigated only by optional anchor-assisted reranking. Future advances could integrate sequence-level or history-aware scorers for improved temporal coherence. Scaling the scorer architecture (with more powerful features or world models) may further enhance ranking quality, though computational costs must be considered.

Implications and Outlook

CLOVER narrows the gap between imitation-based training and rule-based evaluation in end-to-end autonomous driving by combining evaluator-guided proposal coverage with scorer-mediated conservative refinement. Its two-stage paradigm targets both diversity and quality in candidate generation, while conservative closed-loop distillation shifts probability mass toward high-value regions without requiring a perfect surrogate. Theoretical and empirical evidence indicates that even imperfect scorers suffice if target selection yields statistical enrichment.

Practically, CLOVER's training schema and efficient inference architecture are compatible with existing proposal-selection planners, enabling significant gains in safety, comfort, and planning robustness. The methodology is extensible to increasingly challenging metrics, more diverse driving environments, and richer evaluation schemes. The framework’s diagnostic protocols for scorer development provide a standardized path toward further improvements.

Conclusion

CLOVER establishes a closed-loop value estimation and ranking blueprint for end-to-end autonomous driving planning, achieving strong numerical results and improved proposal-set quality and diversity on benchmark evaluations. The framework couples evaluator-filtered pseudo-expert coverage and closed-loop self-distillation, demonstrating both practical efficacy and theoretical rigor in guiding proposal generation via imperfect but enriched scorer targets. This paves the way for more robust and generalizable autonomous planning systems with flexible integration of scoring and proposal-generation modules (2605.15120).
