ViP3D: End-to-end Visual Trajectory Prediction via 3D Agent Queries (2208.01582v3)

Published 2 Aug 2022 in cs.CV and cs.RO

Abstract: Perception and prediction are two separate modules in existing autonomous driving systems. They interact with each other via hand-picked features such as agent bounding boxes and trajectories. Due to this separation, prediction, as a downstream module, receives only limited information from the perception module. To make matters worse, errors from the perception module can propagate and accumulate, adversely affecting the prediction results. In this work, we propose ViP3D, a query-based visual trajectory prediction pipeline that exploits rich information from raw videos to directly predict future trajectories of agents in a scene. ViP3D employs sparse agent queries to detect, track, and predict throughout the pipeline, making it the first fully differentiable vision-based trajectory prediction approach. Instead of using historical feature maps and trajectories, useful information from previous timestamps is encoded in agent queries, which makes ViP3D a concise streaming prediction method. Furthermore, extensive experimental results on the nuScenes dataset show the strong vision-based prediction performance of ViP3D over traditional pipelines and previous end-to-end models.
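
The streaming idea described in the abstract can be illustrated with a toy loop: a fixed set of agent queries is carried from frame to frame and updated from the current frame only, so no historical feature maps or trajectories are stored. This is a minimal sketch under loud assumptions. Every name here (`attend`, `update_queries`, `decode_trajectory`, `run_stream`), the dot-product attention update, the blending factor, and the constant-velocity decoder are hypothetical stand-ins chosen for illustration; none of them is taken from the ViP3D architecture or code.

```python
import math
import random


def attend(query, features):
    """Softmax dot-product attention of one query over the current frame's feature vectors."""
    scores = [sum(q * f for q, f in zip(query, feat)) for feat in features]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(query)
    return [
        sum(w * feat[d] for w, feat in zip(weights, features)) / total
        for d in range(dim)
    ]


def update_queries(queries, features, keep=0.5):
    """Blend each query with the context it attends to in the current frame.

    The query itself is the only state carried across time (hypothetical update rule).
    """
    updated = []
    for q in queries:
        ctx = attend(q, features)
        updated.append([keep * qi + (1.0 - keep) * ci for qi, ci in zip(q, ctx)])
    return updated


def decode_trajectory(query, horizon=3):
    """Toy decoder: treat dims 0-1 as position and dims 2-3 as velocity (an assumption)."""
    x, y, vx, vy = query[:4]
    return [(x + vx * t, y + vy * t) for t in range(1, horizon + 1)]


def run_stream(frames, num_queries=2, dim=4, seed=0):
    """Process frames one at a time, keeping only the agent queries between steps."""
    rng = random.Random(seed)
    queries = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(num_queries)]
    predictions = []
    for features in frames:
        queries = update_queries(queries, features)
        predictions.append([decode_trajectory(q) for q in queries])
    return queries, predictions
```

Calling `run_stream` on a list of frames, where each frame is a list of 4-dimensional feature vectors, returns the final queries and one set of predicted waypoints per query per frame. The point of the sketch is only the data flow: past frames influence later predictions solely through the query vectors.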
