Pseudo Ground-Truth Trajectories

Updated 28 December 2025

Pseudo ground-truth trajectories are surrogate trajectory annotations generated via algorithmic, probabilistic, and sensor-fusion methods, offering effective substitutes when canonical ground truth is unavailable.
They use techniques like SfM, SLAM, EKF filtering, and latent code distillation to create analytically tractable and physically plausible trajectory estimates for training and evaluation.
These trajectories are pivotal in applications such as camera re-localisation, human pose estimation, and pedestrian trajectory prediction, enabling scalable and robust model development.

Pseudo ground-truth trajectories are surrogate or estimated trajectory annotations generated through algorithmic, self-supervised, or sensor-fusion methods rather than by direct physical measurement or canonical ground truth. These trajectories serve, either directly or indirectly, both as learning targets during model training and as baselines for evaluation in domains where canonical ground truth is infeasible, expensive, or ill-defined. Their construction spans probabilistic modeling, deep latent structure, sensor augmentation, and estimator-based smoothing, and they are a pivotal ingredient in contemporary trajectory prediction, pose estimation, and scene reconstruction workloads.

1. Methodological Foundations and Variants

Pseudo ground-truth (pGT) trajectories emerge wherever accurate ground-truth annotations are unavailable or prohibitively costly. The instantiation of pGT falls broadly into four methodological families:

A. Reference Pipeline Generation

In camera re-localisation and robotics, pseudo ground-truth trajectories are generated by established reference algorithms including Structure-from-Motion (SfM) and RGB-D SLAM. The outputs of these reference pipelines are then used as quasi-absolute benchmarks for further algorithm evaluation (Brachmann et al., 2021). For example:

SfM-based pGT: Inter-image feature matching, triangulation, and bundle adjustment yield camera pose sequences, later aligned to scale.
SLAM-based pGT: Depth fusion and tracking (e.g., KinectFusion, BundleFusion) are used to generate dense pose or map trajectories, often with global consistency.
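
The "aligned to scale" step for SfM output can be illustrated with a least-squares similarity alignment. The sketch below uses the Umeyama closed-form solution, which is one common choice and is an assumption here rather than a step confirmed by the cited work. The function name `umeyama_align` and its inputs (two arrays of corresponding camera positions) are hypothetical.

```python
import numpy as np

def umeyama_align(src, dst):
    """Least-squares similarity transform (s, R, t) such that dst ~= s * R @ src + t.

    src, dst: (N, 3) arrays of corresponding positions, e.g. SfM camera centres
    (arbitrary scale) and metric reference positions. Illustrative sketch only.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst

    # Cross-covariance between the centred point sets.
    cov = dst_c.T @ src_c / len(src)
    U, D, Vt = np.linalg.svd(cov)

    # Guard against a reflection so that R is a proper rotation.
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0

    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / len(src)
    s = np.trace(np.diag(D) @ S) / var_src
    t = mu_dst - s * R @ mu_src
    return s, R, t
```

Given the result, an up-to-scale trajectory would be mapped into the reference frame with `s * src @ R.T + t`.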

B. Probabilistic and Synthetic Approaches

Synthetic ground-truth distributions over trajectories are generated using structured probabilistic models (e.g., Gaussian process Bézier mixtures), as in probabilistic Bézier curves for multi-step trajectory prediction (Hug et al., 2024). Because the generating distribution is fully known, including all marginals and conditionals, predictors can be evaluated against it directly with distributional metrics such as the Wasserstein distance.
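
To make the "fully known distribution" concrete, the following sketch computes the Gaussian marginal of a single probabilistic Bézier curve at a parameter value t, assuming the control points are independent Gaussians. Under that assumption the curve point is a linear combination of Gaussians, so its mean and covariance follow directly from the Bernstein weights. The function names are hypothetical, and mixtures or composite segments are left out.

```python
import numpy as np
from math import comb

def bernstein(n, t):
    """Bernstein basis values b_{i,n}(t) for i = 0..n."""
    return np.array([comb(n, i) * t**i * (1 - t)**(n - i) for i in range(n + 1)])

def bezier_marginal(mus, sigmas, t):
    """Mean and covariance of the curve point X_t.

    mus:    (n+1, d) control-point means
    sigmas: (n+1, d, d) control-point covariances
    Assumes independent Gaussian control points (illustrative sketch).
    """
    mus = np.asarray(mus, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    b = bernstein(len(mus) - 1, t)
    mean = b @ mus                                # sum_i b_i(t) * mu_i
    cov = np.einsum("i,ijk->jk", b**2, sigmas)    # sum_i b_i(t)^2 * Sigma_i
    return mean, cov
```

Evaluating `bezier_marginal` on a grid of t values gives the per-step Gaussians against which a predictor's output distribution could be compared.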

C. Self-Supervised and Estimator-Based Trajectories

Filtering, estimator smoothing, or self-supervised dynamics propagate uncertain observations into smooth pGT labels. For instance, EKF-style state estimation on tracked objects’ positions and velocities generates high-quality pseudo-ground truth for downstream learning, circumventing the need for curated labels (Kliniewski et al., 13 Feb 2025).
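
As a rough illustration of estimator-based pGT, the sketch below runs a linear Kalman filter with a constant-velocity motion model over noisy 1-D position measurements. This is the linear special case of an EKF and is not the cited authors' estimator. The function name, the noise parameters `q` and `r`, and the initial covariance are all assumptions chosen for illustration.

```python
import numpy as np

def kalman_cv_filter(zs, dt=1.0, q=1e-2, r=1.0):
    """Filter 1-D position measurements with a constant-velocity model.

    State x = [position, velocity]. Returns the (N, 2) filtered states,
    whose position column could serve as a smoothed pseudo-label sequence.
    """
    zs = np.asarray(zs, dtype=float)
    F = np.array([[1.0, dt], [0.0, 1.0]])        # state transition
    H = np.array([[1.0, 0.0]])                   # only position is observed
    Q = q * np.array([[dt**4 / 4, dt**3 / 2],
                      [dt**3 / 2, dt**2]])       # process noise
    R = np.array([[r]])                          # measurement noise

    x = np.array([zs[0], 0.0])                   # start at first measurement, zero velocity
    P = np.eye(2) * 10.0                         # loose initial uncertainty
    states = []
    for z in zs:
        # Predict step.
        x = F @ x
        P = F @ P @ F.T + Q
        # Update step.
        y = z - H @ x                            # innovation
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)           # Kalman gain
        x = x + K @ y
        P = (np.eye(2) - K @ H) @ P
        states.append(x.copy())
    return np.array(states)
```

A larger `q` relative to `r` makes the output follow the measurements more closely, while a smaller `q` gives smoother but slower-reacting estimates.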

D. Learned or Latent Code Distillation

Learned inference schemes, such as the Pseudo Oracle Predictor (POP) for deep trajectory prediction and bioimpedance-augmented pose estimation, employ deep networks or sensor fusion to distill latent variables into trajectory pseudo-labels or to correct physically implausible estimates. These schemes can incorporate future information during training, or exploit external sensing (e.g., bioimpedance for contact detection) to refine initial estimates (Yang et al., 2020, Forte et al., 4 Dec 2025).

2. Mathematical Formulations and Key Constructs

The following table illustrates the principal mathematical definitions for pseudo ground-truth trajectories across key domains:

Method	Core Mathematical Construct	Reference
SfM / SLAM pGT	$\min \sum_{i,j}\delta_{ij}\,\rho\left(\|x_{ij} - \pi(R_iX_j + t_i)\|^2\right)$	(Brachmann et al., 2021)
Probabilistic Bézier	$P_l \sim \mathcal{N}(\mu_l, \Sigma_l),\; X_t \sim \mathcal{N}(\mu_{\mathcal{P}}(t), \Sigma_{\mathcal{P}}(t))$	(Hug et al., 2024)
EKF/State Est.	$x_{k+1} = f(x_k, u_k) + w_k,\;z_k = h(x_k) + v_k$	(Kliniewski et al., 13 Feb 2025)
Self-supervised	Pseudo-labels from finite differences, e.g., $\hat{v}_t = \hat{p}_{t+1} - \hat{p}_t$	(Huang et al., 31 Mar 2025)
Oracle Latent	$z_i = [\epsilon_{\hat{\mu}_i^k, \hat{\sigma}_i^k}\|\ldots]$ (train); $z_i = [\epsilon_{\mu_i^k, \sigma_i^k}\|\ldots]$ (test)	(Yang et al., 2020)
Bioimpedance Fusion	$\mathcal{L}(\theta) = \mathcal{L}_{2D} + \lambda_{reg} \mathcal{L}_{reg} + \lambda_{cont}(\mathcal{L}_{prox} + \mathcal{L}_{consist} + \mathcal{L}_{inter})$	(Forte et al., 4 Dec 2025)

These approaches provide either surrogate target values, analytic posteriors, or latent code distillations, constituting the operative “pseudo ground-truth trajectory”.

3. Application Domains

Pseudo ground-truth trajectories are entrenched in a wide spectrum of computer vision and robotics applications:

Visual Camera Re-localisation: Benchmarking of 6DoF pose methods relies on SfM or SLAM output as pGT. These are constructed via feature extraction, track formation, bundle adjustment (SfM), or dense voxel fusion and depth alignment (SLAM) (Brachmann et al., 2021).
Human Pose & Motion Capture: Large-scale datasets for 3D pose estimation leverage video-based pose estimators refined through physical sensors (bioimpedance, inertial) to provide robust pGT, especially for contact-rich scenarios (Forte et al., 4 Dec 2025).
Trajectory Prediction (Pedestrian, Vehicle): Self-supervised frameworks induce pseudo-labels via finite differences of predicted trajectories, enforcing physical motion consistency even absent ground-truth future samples (Huang et al., 31 Mar 2025, Yang et al., 2020).
Synthetic Dataset Generation: Probabilistic Bézier mixtures model the entire distribution of plausible futures, facilitating both training and rigorous evaluation of probabilistic predictors (Hug et al., 2024).
Hypernetwork Training: Estimation of pseudo weight trajectories replaces sample-wise target regressions in hypernetwork-based model adaptors (Hedlin et al., 2024).
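
For the self-supervised trajectory prediction case above, the finite-difference pseudo-labels and a consistency penalty can be sketched as follows. This is a minimal illustration of the idea, not the loss used in the cited work. The function names and the assumption of a separate velocity stream with one fewer time step than the position stream are hypothetical.

```python
import numpy as np

def pseudo_velocity(positions, dt=1.0):
    """Forward differences v_t = (p_{t+1} - p_t) / dt for a (T, d) position sequence."""
    positions = np.asarray(positions, dtype=float)
    return np.diff(positions, axis=0) / dt

def consistency_loss(pred_positions, pred_velocities, dt=1.0):
    """Mean squared gap between a predicted velocity stream, shape (T-1, d),
    and the pseudo-velocities implied by the predicted positions, shape (T, d)."""
    v_pseudo = pseudo_velocity(pred_positions, dt)
    pred_velocities = np.asarray(pred_velocities, dtype=float)
    return float(np.mean((pred_velocities - v_pseudo) ** 2))
```

The loss is zero when the two streams agree, so minimising it ties the streams together without requiring ground-truth future samples.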

4. Construction Protocols and Algorithmic Details

Procedures for pseudo ground-truth trajectory generation are highly domain-dependent:

Pipeline-based pGT: Image features or depth data are processed by reference optimizers (SfM, SLAM variants), typically with additional scale alignment or global bundle adjustment to mitigate drift and ambiguity (Brachmann et al., 2021).
Probabilistic Synthetic Generation: Control points of Bézier trajectories are independently drawn from Gaussian distributions; composite segments and mixture components produce known, analytically tractable trajectory distributions (Hug et al., 2024).
Filter-Based Estimator pGT: State estimators (EKF, factor-graph smoothers) absorb noisy sensor readings and delta-kinematics, parameterized by process (Q) and measurement (R) noise, with hyperparameter adjustment to tune the tradeoff between smoothness and responsiveness (Kliniewski et al., 13 Feb 2025).
Learned Self-supervision: Multi-stream models infer pseudo-velocities and pseudo-accelerations from position predictions; cross-stream consistency losses then act as indirect supervision (Huang et al., 31 Mar 2025).
Oracle Latent Distillation: During training, a ground-truth branch encodes real future behaviors into a latent (e.g., POP), which is then predicted only from history at test time by the observer branch, using explicit KL alignment (Yang et al., 2020).
Sensor Fusion Refinement: For scenes with ground-truth-deficient modalities (e.g., hands in contact), wearable sensors (bioimpedance) provide event triggers that selectively optimize pose estimates to achieve contact-aware fidelity, enforcing several constraint losses and leveraging masked parameter updates (Forte et al., 4 Dec 2025).
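
The KL alignment mentioned for oracle latent distillation can be illustrated with the closed-form divergence between two diagonal Gaussians, for example a latent inferred with access to the future and a latent predicted from history alone. This is a generic sketch: the direction of the divergence and the diagonal-Gaussian form are assumptions, and the function name is hypothetical.

```python
import numpy as np

def kl_diag_gaussians(mu_q, sigma_q, mu_p, sigma_p):
    """KL( N(mu_q, diag(sigma_q^2)) || N(mu_p, diag(sigma_p^2)) ), summed over dimensions.

    sigma_q and sigma_p are standard deviations and must be positive.
    """
    mu_q, sigma_q, mu_p, sigma_p = (
        np.asarray(a, dtype=float) for a in (mu_q, sigma_q, mu_p, sigma_p)
    )
    per_dim = (
        np.log(sigma_p / sigma_q)
        + (sigma_q**2 + (mu_q - mu_p) ** 2) / (2.0 * sigma_p**2)
        - 0.5
    )
    return float(np.sum(per_dim))
```

Used as a training penalty, a term of this kind pulls the history-only latent distribution toward the one that has seen the future.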

5. Evaluation Metrics and Limitations

The evaluation of models using pseudo ground-truth trajectories generally adopts metrics that are sensitive to the quality and type of surrogate trajectory:

Metrics: Average displacement error (ADE), final displacement error (FDE), negative log-likelihood (NLL), Wasserstein distance (closed-form or sliced), rotation/translation error for pose estimates, and more recent metrics such as absolute consistency error (ACE) and dense correspondence reprojection error (DCRE) (Hug et al., 2024, Kliniewski et al., 13 Feb 2025, Brachmann et al., 2021).
Bias and Overfitting: pGT inherits the biases of the generation algorithm. Method rankings can invert under different pGT references, and “fidelity to the reference pipeline” may not equate to physical or task accuracy. Empirical studies show that methods can overfit to SLAM- or SfM-specific error modes (Brachmann et al., 2021).
Consistency and Robustness: Recommendations include providing multiple pGT variants per benchmark, thresholding for pGT noise, publishing uncertainty estimates, and complementing classical error metrics with downstream success measures (Brachmann et al., 2021).
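
Two of the metrics listed above have short closed forms and can be sketched directly: ADE/FDE against a pseudo ground-truth trajectory, and the 2-Wasserstein distance between two Gaussians. The Wasserstein sketch is restricted to diagonal covariances, which is a simplifying assumption. The function names are hypothetical.

```python
import numpy as np

def ade_fde(pred, ref):
    """Average and final displacement error between (T, d) trajectories."""
    pred = np.asarray(pred, dtype=float)
    ref = np.asarray(ref, dtype=float)
    dists = np.linalg.norm(pred - ref, axis=-1)   # per-step Euclidean error
    return float(dists.mean()), float(dists[-1])

def w2_gaussian_diag(mu1, var1, mu2, var2):
    """2-Wasserstein distance between Gaussians with diagonal covariances.

    var1 and var2 hold per-dimension variances. For diagonal covariances:
    W2^2 = ||mu1 - mu2||^2 + sum_k (sqrt(var1_k) - sqrt(var2_k))^2.
    """
    mu1, var1, mu2, var2 = (
        np.asarray(a, dtype=float) for a in (mu1, var1, mu2, var2)
    )
    mean_term = np.sum((mu1 - mu2) ** 2)
    cov_term = np.sum((np.sqrt(var1) - np.sqrt(var2)) ** 2)
    return float(np.sqrt(mean_term + cov_term))
```

Unlike ADE and FDE, the Wasserstein term also penalises a mismatch in predicted spread, which is why it suits evaluation against a known ground-truth distribution.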

6. Empirical Benefits, Limitations, and Future Directions

Pseudo ground-truth trajectory approaches confer several practical and scientific benefits but also present caveats:

Scalability: pGT enables large-scale learning in domains with no physically annotated ground truth (e.g., in-the-wild pose, dynamic scenes, rare behaviors), supporting efficient transfer to novel applications (Forte et al., 4 Dec 2025).
Distributional Evaluation: Synthetic ground-truth with full probabilistic structure enables model comparison using metrics sensitive to both means and variances (e.g., Wasserstein), not only pointwise alignment (Hug et al., 2024).
Generalization and Physical Plausibility: Learned pseudo-ground-truth frameworks (e.g., multi-modal latent codes, self-consistency losses) can capture short-term intent and discontinuities (abrupt turns, contacts) that standard predictors miss (Yang et al., 2020, Huang et al., 31 Mar 2025).
Physical Sensing and Correction: Hybrid sensor fusion (bioimpedance, inertial, audio) bridges the gap between vision-based estimates and hard physical constraints, yielding pGT sequences that are both visually and physically consistent (Forte et al., 4 Dec 2025).
Known Limitations: The validity of pGT is not absolute. Using it for ranking or progress measurement is sound only when reference pipeline bias, application context, and the statistical properties of the pseudo-label set are taken into account.

Future efforts include further integration of physical sensor data for broad capture (e.g., contact, force, proprioception), domain-adaptive surrogate construction, multi-source uncertainty quantification, and the development of reference-free, self-consistent evaluation paradigms.

7. Representative Use Cases

To illustrate the diversity and typical protocols in the field, the following table summarizes representative use cases:

Application	pGT Source	Methodological Notes	Reference
Camera Re-localisation	SfM / SLAM	Benchmarking, pipeline-specific bias	(Brachmann et al., 2021)
Human Mesh Contact	Vision + Bioimpedance	Masked optimization during sensor-triggered frames	(Forte et al., 4 Dec 2025)
Pedestrian Trajectory	Self-supervised	Hierarchical streams, motion-consistency losses	(Huang et al., 31 Mar 2025)
Distribution Evaluation	Probabilistic Bézier	Analytic GMM over entire trajectories	(Hug et al., 2024)
Deep Latent Prediction	POP, LSTM-KL	Future intent distilled to history-only latent	(Yang et al., 2020)
Trajectory via Filtering	EKF / Factor-Graph	Noisy sensor smoothing, outlier suppression	(Kliniewski et al., 13 Feb 2025)
Hypernetwork Training	Grad-matching	Weight trajectory field in weight space	(Hedlin et al., 2024)

The central concept uniting these approaches is the deliberate, context-specific construction of surrogate trajectories with known statistical, physical, or algorithmic properties, which serve as learning targets or evaluation points wherever canonical ground truth is absent.