
CdDrive: End-to-End Autonomous Driving Framework

Updated 10 February 2026
  • CdDrive is a dual-aspect concept, referring both to a leading autonomous driving paradigm that employs hybrid candidate generation and to the classic Compact Disk Drive in computing.
  • It utilizes a unified pipeline combining static vocabulary candidates from expert logs with vocabulary-conditioned diffusion refinement to adapt trajectories in complex driving scenarios.
  • Extensive experiments on NAVSIM benchmarks demonstrate that CdDrive achieves state-of-the-art safety, comfort, and progress metrics, ensuring robust performance in varied driving contexts.

CdDrive refers to two distinct but influential research and engineering concepts: (1) an advanced trajectory candidate generation and selection architecture for end-to-end autonomous driving, and (2) the canonical abbreviation for "Compact Disk Drive," particularly in the context of file system implementations and optical media management. The term is most prominently established as the name of a leading end-to-end autonomous driving paradigm, combining a fixed trajectory vocabulary with scene-adaptive refinement via diffusion, and further incorporating a noise-adaptation module to enhance trajectory smoothness. Detailed below are the key technical principles, workflow, architectural components, experiments, and practical implications as delineated by recent literature.

1. Unified Candidate Generation Pipeline in CdDrive

CdDrive establishes a multimodal planning workflow where a single perception backbone consumes multi-sensor input—specifically, bird’s-eye view (BEV) features synthesized from three front cameras and five LiDAR streams. The BEV feature map produced by a Transfuser-style ResNet-34 backbone drives two parallel candidate generation mechanisms:

  • Static Vocabulary Candidates: $K=256$ trajectories sampled from expert driving logs, clustered via flat (K-means) algorithms to cover canonical driving behaviors (lane keeping, gentle turns, go-straight).
  • Vocabulary-Conditioned Diffusion Candidates: Each anchor trajectory from the vocabulary is refined by a conditional diffusion model employing a forward noising process, a horizon-aware temporally smoothed noise adapter (HATNA), and a truncated denoising sequence from which refined, scenario-adaptive candidates are synthesized.

The joint candidate pool $C(z) = \text{Vocab} \cup \text{Diffusion}$ is scored by a unified, shared selection module which evaluates each candidate trajectory using a latent world model rollout and a differentiable scorer network, selecting the single best plan for execution (Wu et al., 3 Feb 2026).
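The pooling and selection step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `build_candidate_pool`, `select_best`, and the `refine` and `score` callables are hypothetical names, with `refine` standing in for the diffusion refinement and `score` standing in for the world-model rollout plus scorer network.

```python
def build_candidate_pool(vocabulary, refine):
    """Return static anchors plus one refined candidate per anchor.

    `refine` is a hypothetical stand-in for vocabulary-conditioned
    diffusion refinement; each entry is tagged with its source.
    """
    pool = [("vocab", anchor) for anchor in vocabulary]
    pool += [("diffusion", refine(anchor)) for anchor in vocabulary]
    return pool


def select_best(pool, score):
    """Pick the (source, trajectory) pair with the highest score.

    `score` is a hypothetical stand-in for the shared selection module.
    """
    return max(pool, key=lambda item: score(item[1]))


# Toy usage: two one-waypoint anchors, a refinement that shifts y by 0.5,
# and a score that prefers a final lateral offset of 0.5.
vocabulary = [[(1.0, 0.0)], [(1.0, 1.0)]]
shift = lambda traj: [(x, y + 0.5) for x, y in traj]
prefer_half = lambda traj: -abs(traj[-1][1] - 0.5)

pool = build_candidate_pool(vocabulary, shift)
source, plan = select_best(pool, prefer_half)
```

Tagging each candidate with its source is a design choice made here so the winner's origin can be inspected; the source text does not say whether CdDrive tracks provenance.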

2. Fixed Trajectory Vocabulary Extraction and Structure

Vocabulary anchors are established by extracting dense sequences of 2D waypoints (positions) from expert logs and clustering them in the waypoint space using K-means:

  • Each vocabulary element $T_k = \{(x_{k,i}, y_{k,i}, v_{k,i})\}_{i=1}^{n}$ encapsulates $n$ discretized future positions and headings, providing a discrete, stable coverage of expected routine behaviors.
  • This fixed candidate set delivers robust performance in low-interaction scenes by ensuring strong coverage of typical road-following and basic interaction behaviors.
  • The vocabulary thus anchors the candidate set, suppressing degeneracies due to dynamic over-refinement in simple contexts (Wu et al., 3 Feb 2026).
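The clustering step can be illustrated with a small, dependency-free K-means (Lloyd's algorithm) over flattened 2D waypoint sequences. This is a sketch under assumptions: the function names, the random initialization, the fixed iteration count, and the toy data are all illustrative, and the source does not specify these details.

```python
import random


def flatten(trajectory):
    """Turn [(x, y), ...] into one flat feature vector [x1, y1, x2, y2, ...]."""
    return [coord for point in trajectory for coord in point]


def sq_dist(a, b):
    """Squared Euclidean distance between two flat vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_vocabulary(trajectories, k, iters=20, seed=0):
    """Cluster expert trajectories and return k anchor trajectories."""
    rng = random.Random(seed)
    data = [flatten(t) for t in trajectories]
    centers = [list(v) for v in rng.sample(data, k)]  # assumed initialization
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for vec in data:
            nearest = min(range(k), key=lambda c: sq_dist(vec, centers[c]))
            buckets[nearest].append(vec)
        for c, members in enumerate(buckets):
            if members:  # an empty cluster keeps its previous center
                dim = len(members[0])
                centers[c] = [
                    sum(m[d] for m in members) / len(members) for d in range(dim)
                ]
    # Reshape each center back into a list of (x, y) waypoints.
    return [[(c[i], c[i + 1]) for i in range(0, len(c), 2)] for c in centers]


# Toy expert logs: five straight drives and five gentle left turns.
straight = [(float(i), 0.0) for i in range(1, 5)]
turn = [(float(i), 0.5 * i) for i in range(1, 5)]
anchors = kmeans_vocabulary([straight] * 5 + [turn] * 5, k=2)
```

With $K=256$ and real logs, a library implementation would normally replace this loop; the sketch only shows how anchors arise as cluster centers in waypoint space.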

3. Vocabulary-Conditioned Diffusion Denoising and HATNA

CdDrive’s vocabulary-conditioned diffusion module serves to generate scene-adaptive trajectories that can deviate from the static anchors in complex interaction settings:

  • Forward Process: For each anchor $p_k$, Gaussian noise is introduced at a truncated time step $t_{tr}$; the noise is adapted by HATNA, which:
    • Applies 1D Gaussian low-pass filtering along the waypoint axis to temporally regularize the sampled noise (kernel radius $r=2$).
    • Scales noise by a horizon-aware profile $s_i = (i/(n-1) + \epsilon_0)^{\alpha}$ (with typical $\alpha = 2.0$), optionally learning per-waypoint gains.
  • Reverse (Denoising) Process: DDIM-style time-reversal with refinement $\delta = A_\phi(P_t, z, t)$ at each denoising step allows the candidate to move off-manifold as required by the current scene.
  • Heading Prediction: Positions use the diffusion process, while headings are predicted directly by an auxiliary MLP without added noise.

Scene-adaptive candidates thus capture nontrivial behaviors—such as merges, blockage avoidance, and rare multimodal futures—while HATNA ensures geometric continuity and suppresses spurious trajectory artifacts at long time horizons (Wu et al., 3 Feb 2026).
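The two HATNA operations listed above (low-pass filtering along the waypoint axis, then horizon-aware scaling) can be sketched in plain Python. Several details are assumptions not stated in the source: the kernel standard deviation `sigma`, clamped edge handling, a zero-based waypoint index `i = 0..n-1`, and applying smoothing before scaling. The optional learned per-waypoint gains are omitted.

```python
import math
import random


def gaussian_kernel(radius, sigma=1.0):
    """Normalized 1D Gaussian kernel with 2 * radius + 1 taps (sigma assumed)."""
    taps = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-radius, radius + 1)]
    total = sum(taps)
    return [t / total for t in taps]


def smooth_along_waypoints(values, radius=2, sigma=1.0):
    """Low-pass filter a per-waypoint sequence, clamping indices at both ends."""
    kernel = gaussian_kernel(radius, sigma)
    n = len(values)
    out = []
    for i in range(n):
        acc = 0.0
        for j, weight in enumerate(kernel):
            idx = min(max(i + j - radius, 0), n - 1)
            acc += weight * values[idx]
        out.append(acc)
    return out


def horizon_scale(n, alpha=2.0, eps0=1e-6):
    """s_i = (i / (n - 1) + eps0) ** alpha for i = 0..n-1 (requires n >= 2)."""
    return [(i / (n - 1) + eps0) ** alpha for i in range(n)]


def hatna_noise(n, radius=2, alpha=2.0, eps0=1e-6, rng=None):
    """Sample smoothed, horizon-scaled (x, y) noise for n waypoints."""
    rng = rng or random.Random(0)
    scales = horizon_scale(n, alpha, eps0)
    channels = []
    for _ in range(2):  # independent noise for the x and y coordinates
        raw = [rng.gauss(0.0, 1.0) for _ in range(n)]
        smooth = smooth_along_waypoints(raw, radius)
        channels.append([s * v for s, v in zip(scales, smooth)])
    return list(zip(channels[0], channels[1]))


noise = hatna_noise(8)
```

Under this profile the first waypoints receive almost no perturbation and the last receives nearly full-strength noise, which matches the stated intent of keeping near-term geometry stable while allowing variation at long horizons.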

4. Shared Selection and World Model Rollout

Each candidate trajectory is assessed in the context of a learned latent world model (LWM):

  • Latent World Model: For each $(z, T)$ pair, the LWM predicts future BEV semantic maps conditioned on candidate trajectory rollouts.
  • Scoring Module: The scorer $f_\psi(T, z, z')$ produces a vector of metrics:
    • Safety: No collision (NC), time-to-collision (TTC), and drivable-area compliance (DAC).
    • Progress: Ego progress (EP) over the horizon.
    • Comfort: Lateral and longitudinal jerk metrics.

Scores are aggregated as $S(T; z) = w^\top s(T; z)$, and the plan with maximal aggregate score is selected for execution.

Weights $(w_{\text{NC}}, w_{\text{DAC}}, \ldots)$ are fixed, establishing a consistent optimization basis and ensuring reproducibility across diverse scenes (Wu et al., 3 Feb 2026).
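The fixed-weight aggregation amounts to a dot product between a weight vector and each candidate's sub-score vector, followed by an argmax. The sketch below assumes sub-scores are already available as dictionaries; the weight values and candidate scores are invented for illustration and are not the values used in CdDrive.

```python
def aggregate_score(sub_scores, weights):
    """Compute the weighted sum of sub-scores over the metrics in `weights`."""
    return sum(weights[name] * sub_scores[name] for name in weights)


def select_plan(scored_candidates, weights):
    """Return the (label, sub_scores) pair with the highest aggregate score."""
    return max(scored_candidates, key=lambda c: aggregate_score(c[1], weights))


# Hypothetical fixed weights and sub-scores, for illustration only.
weights = {"NC": 1.0, "DAC": 1.0, "EP": 0.5}
scored_candidates = [
    ("cautious", {"NC": 1.0, "DAC": 1.0, "EP": 0.2}),
    ("assertive", {"NC": 1.0, "DAC": 0.9, "EP": 0.8}),
]
best_label, best_scores = select_plan(scored_candidates, weights)
best_total = aggregate_score(best_scores, weights)
```

Because the weights are fixed rather than scene-dependent, two runs over the same candidates and sub-scores always select the same plan.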

5. Training Procedure and Hyperparameterization

CdDrive is trained on the NAVSIM v1 and v2 datasets, derived from nuPlan, with the following salient parameters:

  • Vocabulary and Diffusion: $K=256$ for both static and diffusion candidates; $n$ waypoints; $T=20$ diffusion steps with truncation $t_{tr}=16$.
  • HATNA: Gaussian smoothing ($r=2$), scale exponent $\alpha=2.0$, and $\epsilon_0 = 10^{-6}$.
  • Optimization: AdamW optimizer, learning rate $10^{-4}$, 30 epochs on 8 GPUs.
  • Losses: Weighted sum (with $A_{\text{traj}}=1.0$, $A_{\text{im}}=0.01$, $A_{\text{sim}}=0.1$, ...) over trajectory refinement (WTA $L_1$), imitation (cross-entropy), simulation compliance (BCE), world model supervision (focal loss), and BEV/agent semantic losses.

A plausible implication is that this weighting ensures the model simultaneously aligns candidate refinement to expert plans, maximizes compliance with semantic simulation rules, and maintains representational fidelity in scene prediction modules.
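To make the loss combination concrete, the sketch below shows one common reading of a winner-takes-all (WTA) $L_1$ term, where only the candidate closest to the expert trajectory is penalized, together with a weighted sum over the three weights the text states explicitly. The remaining weights are elided in the source and are left out here; the function names, the mean reduction, and the numeric loss values are assumptions.

```python
def l1_distance(candidate, expert):
    """Mean absolute coordinate difference between two waypoint lists."""
    diffs = [
        abs(c - e)
        for cp, ep in zip(candidate, expert)
        for c, e in zip(cp, ep)
    ]
    return sum(diffs) / len(diffs)


def wta_l1(candidates, expert):
    """Winner-takes-all L1: only the closest candidate contributes."""
    return min(l1_distance(c, expert) for c in candidates)


def total_loss(terms, weights):
    """Weighted sum of the loss terms named in `weights`."""
    return sum(weights[name] * terms[name] for name in weights)


# Weights stated in the text; other terms are elided there and omitted here.
weights = {"traj": 1.0, "im": 0.01, "sim": 0.1}

expert = [(1.0, 0.0), (2.0, 0.0)]
candidates = [
    [(1.0, 0.0), (2.0, 1.0)],
    [(1.0, 0.1), (2.0, 0.1)],
]
traj_loss = wta_l1(candidates, expert)
# Placeholder imitation and simulation loss values, for illustration only.
loss = total_loss({"traj": traj_loss, "im": 2.0, "sim": 0.5}, weights)
```

Penalizing only the best candidate lets the remaining candidates stay diverse instead of collapsing toward a single mode.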

6. Experimental Results and Component Analysis

CdDrive achieves state-of-the-art results on established benchmarks:

Dataset Metric CdDrive ResAD WoTE ARTEMIS
NAVSIM v1 PDMS 89.2 88.6 88.3 87.0
NAVSIM v2 EPDMS 86.4 88.2*

Sub-metrics on NAVSIM v1: NC=98.7, DAC=97.5, EP=82.7, TTC=95.4, Comf=100.0. The unified candidate set (256 vocabulary + 256 diffusion) outperforms either candidate source alone (vocab only: PDMS=87.4; diffusion only: PDMS=88.6). HATNA produces a further improvement (+0.2 PDMS), specifically enhancing time-to-collision and comfort metrics.

Qualitative analysis shows that static anchors prevent over-correction in routine scenes, while diffusion candidates enable context-driven escape from the vocabulary in complex traffic and rare situations. HATNA specifically reduces “kinks” and nonsmooth artifacts in trajectories, especially at long horizon (Wu et al., 3 Feb 2026).

7. Summary and Impact

CdDrive demonstrates that a hybridized candidate set approach—static vocabulary anchors augmented by vocabulary-conditioned diffusion and regulated noise adaptation—addresses the trade-off between mode coverage and proposal quality in end-to-end autonomous driving. The unified scoring architecture ensures stable safety and comfort, while scene-driven candidate expansion enables robust handling of complex, interactive scenarios.

A plausible implication is that such a layered candidate architecture can generalize to other trajectory planning domains (e.g., robotics, multi-agent motion planning) where discrete proposal coverage and context adaptation must be balanced efficiently. CdDrive represents the current best-known approach as per PDMS and EPDMS metrics on NAVSIM benchmarks (Wu et al., 3 Feb 2026).
