
T4P: Test-Time Training of Trajectory Prediction via Masked Autoencoder and Actor-specific Token Memory (2403.10052v1)

Published 15 Mar 2024 in cs.CV

Abstract: Trajectory prediction is a challenging problem that requires considering interactions among multiple actors and the surrounding environment. While data-driven approaches have been used to address this complex problem, they suffer from unreliable predictions under distribution shifts at test time. Accordingly, several online learning methods have been proposed that use a regression loss computed against the ground truth of observed data, leveraging the auto-labeling nature of the trajectory prediction task. We mainly tackle the following two issues. First, previous works underfit and overfit as they only optimize the last layer of the motion decoder. To this end, we employ the masked autoencoder (MAE) for representation learning to encourage complex interaction modeling under the shifted test distribution, which allows deeper layers to be updated. Second, utilizing the sequential nature of driving data, we propose an actor-specific token memory that enables the test-time learning of actor-wise motion characteristics. Our proposed method has been validated across various challenging cross-dataset distribution shift scenarios including nuScenes, Lyft, Waymo, and Interaction. Our method surpasses the performance of existing state-of-the-art online learning methods in terms of both prediction accuracy and computational efficiency. The code is available at https://github.com/daeheepark/T4P.
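
The "auto-labeling nature" mentioned in the abstract can be illustrated with a small sketch: a prediction made at time s becomes scorable once the following frames have been observed, so those frames serve as free ground truth. The code below is not the authors' implementation. The constant-velocity predictor, the function names, and the use of average displacement error as the regression loss are all assumptions made for illustration, and the sketch only computes the delayed loss; in an actual test-time training setup that loss would drive a model update.

```python
from collections import deque
from typing import Deque, List, Tuple

Point = Tuple[float, float]


def constant_velocity_predict(history: List[Point], horizon: int) -> List[Point]:
    """Hypothetical stand-in predictor: extrapolate the last observed velocity."""
    (x0, y0), (x1, y1) = history[-2], history[-1]
    vx, vy = x1 - x0, y1 - y0
    return [(x1 + vx * k, y1 + vy * k) for k in range(1, horizon + 1)]


def average_displacement_error(pred: List[Point], truth: List[Point]) -> float:
    """Mean Euclidean distance between predicted and observed positions."""
    dists = [
        ((px - tx) ** 2 + (py - ty) ** 2) ** 0.5
        for (px, py), (tx, ty) in zip(pred, truth)
    ]
    return sum(dists) / len(dists)


def online_auto_label_losses(track: List[Point], horizon: int) -> List[float]:
    """Score each past prediction once its whole horizon has been observed.

    A prediction made at time s covers frames s+1 .. s+horizon, so it can be
    compared against observed data as soon as the current time t >= s+horizon.
    """
    pending: Deque[Tuple[int, List[Point]]] = deque()  # (time made, prediction)
    losses: List[float] = []
    for t in range(1, len(track)):
        # Predict from everything observed up to and including frame t.
        pending.append((t, constant_velocity_predict(track[: t + 1], horizon)))
        # Resolve every stored prediction whose future is now fully observed.
        while pending and pending[0][0] + horizon <= t:
            s, pred = pending.popleft()
            truth = track[s + 1 : s + horizon + 1]  # auto-generated labels
            losses.append(average_displacement_error(pred, truth))
    return losses


if __name__ == "__main__":
    # An actor that drives straight, then turns: the loss spikes at the turn.
    turning_track = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (2.0, 2.0)]
    print(online_auto_label_losses(turning_track, horizon=1))
```

The spike at the turn is the kind of signal an online method could use to adapt: the error is available without any human annotation, only with a delay equal to the prediction horizon.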

