
TrajectoryNAS: Lidar Trajectory Prediction

Updated 26 January 2026
  • TrajectoryNAS is a NAS framework designed for Lidar-based object detection, tracking, and multi-step trajectory forecasting in autonomous driving.
  • It uses multi-objective simulated annealing to optimize prediction accuracy and computational latency, as validated on benchmarks like nuScenes.
  • Integrating end-to-end multi-task training and latency-aware design, TrajectoryNAS demonstrates significant improvements over conventional baselines.

TrajectoryNAS is a neural architecture search (NAS) method tailored for trajectory prediction from 3D Lidar point-cloud data, specifically designed for autonomous driving scenarios. It automates the end-to-end design of models that perform object detection, tracking, and multi-step trajectory forecasting in a unified manner. TrajectoryNAS leverages a multi-objective search strategy to optimize both prediction quality and computational latency. The framework demonstrates improved accuracy and efficiency over previous end-to-end baselines, as shown through extensive empirical evaluation on large-scale benchmarks such as the nuScenes dataset (Sharifi et al., 2024).

1. Formulation of Lidar-Based Trajectory Prediction

The fundamental task is to predict future trajectories for dynamic agents (e.g., cars, pedestrians) detected in Lidar point-cloud sequences. Given a temporally ordered set of point-cloud frames P = \{P^1, P^2, \ldots, P^{T_{\text{obs}}}\}, where P^t = \{p_i^t \mid i = 1, \ldots, N^t\} and p_i^t \in \mathbb{R}^3, the system detects M agents. Each agent j at time t is represented by its state vector

s_j^t = [x_j^t, y_j^t, z_j^t, l_j, w_j, h_j, \theta_j^t, v_j^t]

including its spatial location, bounding box dimensions, orientation, and speed. The goal is to predict future states

\hat{S}_j = \{\hat{s}_j^{t+1}, \hat{s}_j^{t+2}, \ldots, \hat{s}_j^{t+T_{\text{pred}}}\}

that closely approximate the true sequence S_j. Standard evaluation metrics are Average Displacement Error (ADE) and Final Displacement Error (FDE).
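The two metrics can be sketched as a minimal computation, assuming single-mode 2D (x, y) predictions for one agent; `ade_fde` is a hypothetical helper name, not a function from the paper:

```python
import numpy as np

def ade_fde(pred, gt):
    """Average and Final Displacement Error for one agent.

    pred, gt: arrays of shape (T_pred, 2) holding future (x, y) positions.
    ADE averages the per-step Euclidean error over the horizon;
    FDE is the error at the final predicted step only.
    """
    errors = np.linalg.norm(pred - gt, axis=-1)  # per-step distances, shape (T_pred,)
    return float(errors.mean()), float(errors[-1])
```

For multi-agent evaluation, these per-agent values would typically be averaged over all agents in the scene.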

2. Architecture Search Space and Objective Function

TrajectoryNAS constructs models on top of a 3D backbone (VoxelNet followed by a sparse Feature Pyramid Network, FPN). The search space includes:

  • Region Proposal Network (RPN) layers, drawn from {sparse 3D convolution, point convolution, self-attention, MLP}.
  • Five prediction heads: Velocity, Rotation, Dimension, Regression, and Height. Each head is a small 2D CNN, where both the layer types and the channel widths are subject to search.

Architectural choices are encoded as discrete parameters \alpha (operation selection per layer) and continuous width parameters \beta (channels per layer). Because detection, tracking, and forecasting are searched jointly, the compound search space encompasses approximately 2^{300} candidate architectures.

The multi-objective search seeks parameters (\alpha^*, \beta^*) that minimize an energy function E balancing prediction quality and runtime latency. TrajectoryNAS employs a multiplicative objective:

E(\alpha, \beta) = \text{Latency}(\alpha, \beta) \times \text{mAP}(\alpha, \beta)^{\alpha'} \times \text{ADE}(\alpha, \beta)^{\beta'} \times \text{FDE}(\alpha, \beta)^{\gamma'}

where mAP denotes mean Average Precision for future locations, and (\alpha', \beta', \gamma') are user-defined weights controlling the emphasis on each metric.
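A minimal sketch of this scalarized objective follows, with one interpretive assumption flagged: since E is minimized but mAP should be maximized, the sketch inverts the mAP factor so that higher accuracy lowers the energy. The parameters `a`, `b`, `g` play the role of the weights \alpha', \beta', \gamma'; none of the names below come from the paper:

```python
def energy(latency_ms, map_f, ade, fde, a=1.0, b=1.0, g=1.0):
    """Scalarized multi-objective energy: lower is better.

    latency_ms: measured inference time in milliseconds
    map_f: future mAP in (0, 1], inverted here so that higher accuracy
           reduces E (an interpretive assumption about the formula)
    ade, fde: displacement errors; a, b, g: user-defined weights
    """
    return latency_ms * (1.0 / map_f) ** a * ade ** b * fde ** g
```

Because the terms multiply, a candidate with slightly worse accuracy can still win overall if its latency is substantially lower, matching the trade-off described in Section 5.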

3. Search Strategy: Multi-Objective Simulated Annealing

TrajectoryNAS employs Multi-Objective Simulated Annealing (MOSA) as its search algorithm. At each iteration k, given a current architecture (\alpha_k, \beta_k) and temperature T_k, a neighboring architecture (\alpha', \beta') is sampled by mutating either a layer's operation or its channel width. The acceptance probability for a candidate is:

P_{\text{accept}} = \min\left(1, \exp\left(-\frac{E(\alpha', \beta') - E(\alpha_k, \beta_k)}{T_k}\right)\right)

The temperature is decreased geometrically via T_{k+1} = r T_k with r < 1, annealing from T_{\text{max}} to T_{\text{min}} to balance exploration and exploitation over the search iterations. This approach enables efficient navigation of the expansive, discrete search space.

4. End-to-End Multi-Task Training

For each candidate architecture, training is conducted end-to-end using a lightweight subset of nuScenes, with direct multi-task supervision. The composite training loss is:

L_{\text{total}} = \lambda_{\text{det}} L_{\text{det}} + \lambda_{\text{track}} L_{\text{track}} + \lambda_{\text{pred}} L_{\text{pred}}

where:

  • L_{\text{det}} is the standard object detection loss
  • L_{\text{track}} enforces temporal ID consistency
  • L_{\text{pred}} is an \ell_2 trajectory prediction loss

By tuning the weights (\lambda_{\text{det}}, \lambda_{\text{track}}, \lambda_{\text{pred}}), the system balances detection, tracking, and forecasting performance throughout the search phase.
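The composite loss is a weighted sum, sketched below with an explicit \ell_2 trajectory term. The detection and tracking terms are passed in precomputed, and the default weights are illustrative, not values reported in the paper:

```python
import numpy as np

def l2_pred_loss(pred_traj, gt_traj):
    """Mean squared displacement over all agents and future steps.

    pred_traj, gt_traj: arrays of shape (num_agents, T_pred, 2).
    """
    return float(np.mean(np.sum((pred_traj - gt_traj) ** 2, axis=-1)))

def total_loss(l_det, l_track, pred_traj, gt_traj, lam=(1.0, 1.0, 1.0)):
    """L_total = lam_det * L_det + lam_track * L_track + lam_pred * L_pred."""
    lam_det, lam_track, lam_pred = lam
    return (lam_det * l_det
            + lam_track * l_track
            + lam_pred * l2_pred_loss(pred_traj, gt_traj))
```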

5. Latency-Aware Optimization and Measurement

To ensure practical deployability, TrajectoryNAS incorporates real-world latency as a first-class metric. Unlike prior work relying on indirect measures (e.g., FLOPs), latency is taken as the observed average inference time over thousands of samples on an NVIDIA RTX A4000 GPU. The energy objective's multiplicative form ensures that efficient architectures with marginally lower prediction quality may be favored if they achieve substantially better latency.
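Measuring latency directly, rather than estimating it from FLOPs, can be sketched as a wall-clock average with warmup. `infer` and `sample` are hypothetical placeholders for the candidate model and a representative input; note that real GPU profiling would additionally require a device synchronization after each call, which this CPU-only sketch omits:

```python
import time

def measure_latency_ms(infer, sample, warmup=10, iters=1000):
    """Average wall-clock inference latency in milliseconds.

    Warmup iterations exclude one-time costs (allocator warm-up,
    kernel compilation) from the measured average.
    """
    for _ in range(warmup):
        infer(sample)
    start = time.perf_counter()
    for _ in range(iters):
        infer(sample)
    return (time.perf_counter() - start) * 1000.0 / iters
```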

6. Experimental Validation and Comparative Performance

Empirical evaluation on the nuScenes benchmark demonstrates significant advances over prior work. TrajectoryNAS attains at least a +4.8\% gain in mAP_f and a 1.1\times speedup in inference time relative to the end-to-end FutureDet baseline:

  • Cars (K=5 predictions):
    • FutureDet: mAP_f = 35.6\%, latency = 24 ms
    • TrajectoryNAS: mAP_f = 36.2\% (+1.7\% relative), latency = 22 ms (\approx -8\%)
  • Pedestrians (K=5 predictions):
    • FutureDet: mAP_f = 28.0\%, latency = 24 ms
    • TrajectoryNAS: mAP_f = 32.5\% (+16\% relative), latency = 22 ms

Ablation experiments show that replacing MOSA with Random Search or simple Local Search raises the best achieved energy minimum from 0.113 (MOSA) to approximately 0.19 and 0.186, respectively. Adjusting the weights (\alpha', \beta', \gamma') in the energy function demonstrates that increasing the weight on FDE prioritizes reducing worst-case displacement error at the cost of minor latency increases.

7. Architectural Insights, Limitations, and Extension Opportunities

Analysis of architectures discovered by TrajectoryNAS reveals critical design patterns:

  • Shallow 3D-to-2D heads in the RPN effectively reduce latency with negligible detection loss.
  • Cross-attention layers within the forecasting heads yield mAP gains (up to 0.5\% for non-linear trajectories) over conventional 2D convolutions.
  • Rotation and Dimension prediction heads tend to benefit from increased channel width, highlighting the importance of precise, high-capacity estimation for accurate placement.

Current limitations include the exclusion of map-based features (e.g., HD maps), which are effective in structured road environments. The search process itself is computationally expensive (\sim 1.2K GPU-hours for the 300-layer space), but accelerations such as weight sharing or a learned predictor for E(\alpha) are proposed. Integrating multi-sensor fusion (camera + Lidar + radar) is recognized as a potential extension, increasing representational richness at the cost of a larger search space.

In summary, TrajectoryNAS establishes a unified, latency-aware neural architecture search framework for Lidar-based trajectory prediction, achieving robust accuracy and efficient deployment for autonomous driving systems (Sharifi et al., 2024).

