
3DGS Cross-Trajectory Curation

Updated 10 December 2025
  • 3DGS-based cross-trajectory data curation is a method that uses Gaussian Splatting to generate geometry-aligned view pairs from distinct camera trajectories.
  • It overcomes the single-traversal limitation of standard driving datasets by synthesizing lateral and longitudinal view pairs, reducing camera-pose errors and providing richer supervision for video generation and 3D reconstruction.
  • The approach employs a robust algorithmic pipeline and mathematical framework that integrates multi-temporal and multi-trajectory data for improved video generation and scene updating.

3DGS-based cross-trajectory data curation refers to the process of leveraging 3D Gaussian Splatting (3DGS) scene representations to align, curate, and fuse visual data from distinct camera trajectories or capture sessions, thereby enabling scalable, richly supervised datasets for tasks such as controllable video generation, scene updating, and temporally consistent 3D reconstruction. This curation methodology addresses the limitations of traditional datasets—where true multi-trajectory ground-truth is often lacking—by systematically synthesizing or updating scene-consistent, geometry-aligned view pairs and robustly integrating information from both dense and sparse camera passes.

1. Motivation and Core Challenges

In autonomous driving and vision-based video generation, models require supervision from diverse camera motions, including non-longitudinal (lateral or side-shifted) viewpoints. However, canonical datasets such as Waymo Open Dataset and NuScenes typically provide only single traversals per scene, precluding the direct pairing of true cross-trajectory views. When limited to forward motion during training, camera-controllable video synthesis frameworks exhibit a train–test gap: they inadequately generalize to inference-time lateral camera offsets, often hallucinating inconsistent or physically implausible scene geometry. Approaches simulating new trajectories by temporal splits (early vs. late segments) fail to capture the full statistics of multi-directional motion, especially for the sideways translations required to inspect occluded or peripheral scene regions (Li et al., 3 Dec 2025).

Cross-trajectory curation using 3DGS overcomes these gaps by (1) enabling synthesis of plausible, geometry-aligned lateral viewpoint data from monocular sources and (2) supporting multi-trajectory scene updates using sparse-view priors, even across non-concurrent (temporally distinct) captures (Li et al., 3 Dec 2025, An et al., 29 Nov 2025).

2. Algorithmic Pipeline for Cross-Trajectory Pair Generation

Given a monocular sequence and its associated camera poses, the ReCamDriving framework (Li et al., 3 Dec 2025) implements a 3DGS-based pipeline to create paired data for supervised training:

  1. Scene Reconstruction: Optimize a scene-wide 3DGS using DriveStudio over the entire recorded trajectory, yielding a dense, artifact-prone splat representation.
  2. Underfitted Recorded-View Renderings: During 3DGS training, save snapshots at early iterations (100, 500, 1,000) and render each along the original camera path, capturing under-fitted renderings with characteristic reconstruction artifacts.
  3. Lateral-Offset Trajectory Synthesis: For each lateral offset $\Delta p \in \{\pm1, \pm2, \pm3, \pm4\}$ meters along the vehicle's local left–right axis, shift the original trajectory and render the fully converged (30,000-iteration) 3DGS scene along these new paths (a code sketch follows below):

$$T_{\mathrm{lat}}(t) = T_{\mathrm{rec}}(t) \begin{bmatrix} I_{3\times3} & [\Delta p,\ 0,\ 0]^\top \\ 0 & 1 \end{bmatrix}$$

  4. Pair Assembly: For each scene and lateral offset, assemble triplets:
    • Source: Lateral-offset 3DGS video ($V_{\mathrm{gs}}^{\mathrm{lat}}$) with structured artifacts.
    • Condition: Underfitted recorded-view rendering or relative pose ($\Delta T = T_{\mathrm{lat}}\, T_{\mathrm{rec}}^{-1}$, matching the definition in Section 3).
    • Target: Clean, real recorded video ($V_{\mathrm{rec}}$).

This pairing protocol enforces geometric alignment naturally by common scene context and exposes the model to diverse, physically valid camera transformations.
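
To make the pose construction concrete, here is a minimal sketch of the pose arithmetic in steps 3–4, assuming 4×4 homogeneous camera-to-world extrinsics stored as NumPy arrays; the function names and conventions are illustrative, not taken from the ReCamDriving codebase.

```python
import numpy as np

def lateral_offset_pose(T_rec: np.ndarray, delta_p: float) -> np.ndarray:
    """Shift a recorded pose by delta_p meters along its local left-right axis."""
    offset = np.eye(4)
    offset[:3, 3] = [delta_p, 0.0, 0.0]   # [Δp, 0, 0]^T in the camera's local frame
    return T_rec @ offset                 # T_lat(t) = T_rec(t) · [[I, [Δp,0,0]^T], [0, 1]]

def relative_pose(T_rec: np.ndarray, T_lat: np.ndarray) -> np.ndarray:
    """Conditioning signal ΔT(t) = T_lat(t) · T_rec(t)^{-1} (see Section 3)."""
    return T_lat @ np.linalg.inv(T_rec)

# Enumerate the eight ParaDrive offsets for one recorded pose.
T_rec = np.eye(4)  # placeholder; in practice, one pose per frame of the trajectory
lateral_poses = {dp: lateral_offset_pose(T_rec, dp)
                 for dp in (-4, -3, -2, -1, 1, 2, 3, 4)}
```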

3. Mathematical Framework and Losses

Precise mathematical formalism underpins 3DGS-based curation:

  • Camera Poses: $K \in \mathbb{R}^{3\times3}$ (intrinsics); $T_{\mathrm{rec}}(t),\, T_{\mathrm{lat}}(t) \in SE(3)$ (extrinsics).
  • Relative Transform: $\Delta T(t) = T_{\mathrm{lat}}(t)\, T_{\mathrm{rec}}(t)^{-1}$, encoding the transformation from the original to the offset trajectory.
  • Rendering Operator: $\mathrm{Render}_{\mathrm{3DGS}}(\mathrm{scene}, T)$ maps the 3DGS model and a camera pose to a video sequence $V$.

During training, a flow-matching loss over latent interpolations guides the learning of a diffusion prior:

$$\mathcal{L}_{\mathrm{FM}} = \mathbb{E}_{x_0, x_1, t}\left\|\epsilon_\theta(x_t;\, c_{\mathrm{cam}},\, t) - (x_1 - x_0)\right\|^2$$

where $x_0$ is a noisy latent from the source rendering, $x_1$ is a clean latent from the target video, $x_t$ is their linear interpolation in latent space, and $c_{\mathrm{cam}}$ encodes the camera conditioning signal (relative pose or 3DGS render).

No explicit geometric alignment loss is necessary; shared scene context ensures spatial correspondence.
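
As a concrete reading of the loss, the following is a minimal flow-matching training step, assuming a velocity-prediction network `eps_theta(x_t, c_cam, t)` over video latents; the tensor shapes and interface are assumptions, not the paper's actual architecture.

```python
import torch

def flow_matching_loss(eps_theta, x0, x1, c_cam):
    """L_FM = E_{x0,x1,t} || eps_theta(x_t; c_cam, t) - (x1 - x0) ||^2.

    x0: noisy latent of the source (3DGS) rendering, e.g. shape (B, C, T, H, W)
    x1: clean latent of the target recorded video, same shape
    c_cam: camera conditioning (relative-pose embedding or 3DGS-render latent)
    """
    b = x0.shape[0]
    t = torch.rand(b, device=x0.device)           # t ~ U[0, 1] per sample
    t_ = t.view(b, *([1] * (x0.dim() - 1)))       # broadcast over latent dims
    x_t = (1.0 - t_) * x0 + t_ * x1               # linear latent interpolation
    v_target = x1 - x0                            # straight-line velocity target
    return ((eps_theta(x_t, c_cam, t) - v_target) ** 2).mean()
```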

4. ParaDrive: Construction and Content

The ParaDrive dataset (Li et al., 3 Dec 2025), constructed with this curation framework, consists of over 110,000 parallel-trajectory video pairs. Key properties:

  • Data Sources: 1,600 real traffic scenes from Waymo Open Dataset and NuScenes.
  • Offsets: 8 lateral positions ($\pm\{1, 2, 3, 4\}$ m).
  • Clips: 3 non-overlapping 121-frame segments per offset (front/middle/rear).
  • Snapshots: For each scene, underfitted 3DGS models are saved at 100, 500, and 1,000 iterations, plus the fully fitted model at 30,000 iterations.
  • Quality Control: Scenes with poor 3DGS convergence or inconsistent LiDAR overlap (for external benchmarking) are excluded; visual checks ensure photometric consistency.

The resulting dataset enables supervised learning of both forward and lateral camera motions, promoting robust camera control and structural fidelity in generated video.
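
One plausible accounting of the pair count: 1,600 scenes × 8 offsets × 3 clips × 3 under-fitted conditioning snapshots = 115,200, consistent with "over 110,000". The sketch below enumerates such a manifest; the field names and pairing granularity are assumptions, not the released format.

```python
from itertools import product

SCENES = range(1600)                        # Waymo Open Dataset + NuScenes scenes
OFFSETS_M = (-4, -3, -2, -1, 1, 2, 3, 4)    # 8 lateral positions (meters)
CLIPS = ("front", "middle", "rear")         # 3 non-overlapping 121-frame segments
SNAPSHOT_ITERS = (100, 500, 1000)           # under-fitted conditioning renders

# Hypothetical manifest: one entry per (scene, offset, clip, conditioning snapshot).
manifest = [
    {"scene": s, "offset_m": dp, "clip": c, "cond_iter": it}
    for s, dp, c, it in product(SCENES, OFFSETS_M, CLIPS, SNAPSHOT_ITERS)
]
print(len(manifest))  # 1600 * 8 * 3 * 3 = 115200, matching "over 110,000" pairs
```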

5. Cross-Temporal 3DGS for Multi-Trajectory Scene Fusion

Beyond paired data creation, 3DGS-based cross-trajectory curation extends to the fusion and updating of multi-temporal or multi-pass captures (An et al., 29 Nov 2025):

Stages:

  1. Camera-Trajectory Alignment: Align all trajectories into a unified coordinate frame.
    • Coarse Fit: Minimize projection errors via a similarity transformation (scale, rotation, translation); a closed-form sketch follows this list.
    • ICP-based Refinement: Fine-tune alignment based on point-to-plane correspondences.
    • Pose-Graph Optimization: For multiple captures, enforce global consistency in pose estimates.
  2. Interference-Based Confidence Mapping: Briefly adapt the prior 3DGS model to the new views, compute a modified SSIM-based stability score over image patches, and generate confidence masks that distinguish static (unchanged) from dynamic (changed) regions (a sketch follows below).
  3. Progressive Optimization: Initialize the new 3DGS model from the prior, optimize using a composite loss (photometric, SSIM, geometric consistency, regularization), and iteratively expand the trusted region as more data become available.
  4. Gaussian Integration and Conflict Resolution: Merge or freeze Gaussians based on spatial and confidence thresholds—static priors remain locked, dynamic regions are permitted full optimization.
  5. Validation: Employ metrics such as PSNR, SSIM, and LPIPS to evaluate the reconstructed scene quality, demonstrating significant improvements in both efficiency and fidelity over baselines.
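
The coarse similarity fit of stage 1 can be illustrated with the classical Umeyama closed-form alignment over corresponding camera positions; since the paper minimizes projection errors, treat this point-based fit as an illustrative stand-in rather than the exact method.

```python
import numpy as np

def umeyama(src: np.ndarray, dst: np.ndarray):
    """Closed-form s, R, t minimizing ||dst - (s*R@src + t)||^2 over N x 3 point sets."""
    n = src.shape[0]
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_s, dst - mu_d
    cov = dst_c.T @ src_c / n                       # 3x3 cross-covariance
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:    # guard against reflections
        S[2, 2] = -1.0
    R = U @ S @ Vt
    s = (D * np.diag(S)).sum() / ((src_c ** 2).sum() / n)  # optimal isotropic scale
    t = mu_d - s * R @ mu_s
    return s, R, t

# Toy check: recover a pure translation between two camera-center trajectories.
A = np.random.rand(50, 3)
B = A + np.array([1.0, 0.0, 0.0])
s, R, t = umeyama(A, B)   # s ~= 1, R ~= I, t ~= [1, 0, 0]
```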

This suggests a unified approach to both dataset synthesis and persistent scene modeling in dynamic environments, leveraging the compositional properties of 3DGS.
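
Stage 2's confidence mapping can be sketched as below, substituting a plain patchwise SSIM map (via scikit-image) for the paper's modified stability score; grayscale float images in [0, 1], the 16-pixel patch size, and the 0.6 threshold are all assumptions.

```python
import numpy as np
from skimage.metrics import structural_similarity
from skimage.util import view_as_blocks

def confidence_mask(render: np.ndarray, capture: np.ndarray,
                    patch: int = 16, thresh: float = 0.6) -> np.ndarray:
    """Per-patch mask: True = static (trust the prior), False = dynamic (re-optimize).

    Both inputs are grayscale float images in [0, 1] of identical shape.
    """
    _, ssim_map = structural_similarity(render, capture, full=True, data_range=1.0)
    h, w = (d - d % patch for d in ssim_map.shape)     # crop to a patch multiple
    blocks = view_as_blocks(ssim_map[:h, :w], (patch, patch))
    scores = blocks.mean(axis=(2, 3))                  # patchwise stability score
    return scores >= thresh                            # assumed cutoff, not the paper's
```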

6. Empirical Results and Data Efficiency

Comprehensive ablation studies validate the efficacy of 3DGS-based curation (Li et al., 3 Dec 2025, An et al., 29 Nov 2025):

| Mode / Method | Pose Error (RErr° / TErr m) | FID↓ | FVD↓ | PSNR↑ | SSIM↑ | LPIPS↓ | Time (s) |
|---|---|---|---|---|---|---|---|
| Longitudinal only | 1.97 / 3.02 | 34.17 | 28.54 | — | — | — | — |
| Lateral (cross-trajectory) | 1.49 / 2.55 | 24.88 | 19.18 | — | — | — | — |
| Baseline, 3DGS / no priors | — | — | — | 15.87 | 0.683 | 0.447 | 160 |
| Ours (full pipeline) | — | — | — | 23.89 | 0.864 | 0.239 | 310 |

The upper two rows are camera-control ablations from ReCamDriving (Li et al., 3 Dec 2025); the lower two compare cross-temporal 3DGS updating against a from-scratch 3DGS baseline (An et al., 29 Nov 2025). Dashes mark metrics not reported for that setting.

Notable findings:

  • Lateral cross-trajectory pairs reduce both rotational and translational error, and significantly decrease distributional shift (FID/FVD reduction of ≈30%).
  • Robustness to large offsets: As requested camera offsets grow, models trained only on longitudinal motion exhibit >3× degradation in FID/FVD, while those trained with cross-trajectory curation remain far more stable (≈1.5× FID increase).
  • Cross-temporal 3DGS curation can reconstruct high-fidelity scenes from as few as 4 new sparse views, reusing priors to cut training time 40–50% and increase PSNR by 7–8 dB (An et al., 29 Nov 2025).

7. Broader Implications and Integration in Scene Modeling Pipelines

3DGS-based cross-trajectory curation is foundational for scalable supervision in vision-centric driving data, video generation, large-scale scene documentation, and temporal digital twins. By operationalizing geometry-aware alignment and multi-trajectory fusion, it enables:

  • Systematic elimination of train–test gaps in view synthesis tasks.
  • Data-efficient scene updating across time, capturing both stable and dynamic regions without requiring dense ground-truth.
  • Modular integration with downstream diffusion priors and generative architectures, which benefit from well-curated, geometrically regularized input–target pairs.

A plausible implication is that as camera-controlled generative models further scale, 3DGS-powered curation will form the backbone of future benchmarks and production systems in both research and industry for structured, controllable scene understanding and synthesis (Li et al., 3 Dec 2025, An et al., 29 Nov 2025).
