SmartRefine: A Scenario-Adaptive Refinement Framework for Efficient Motion Prediction (2403.11492v2)
Abstract: Predicting the future motion of surrounding agents is essential for autonomous vehicles (AVs) to operate safely in dynamic, human-robot-mixed environments. Context information, such as road maps and surrounding agents' states, provides crucial geometric and semantic information for motion behavior prediction. To this end, recent works explore two-stage prediction frameworks in which coarse trajectories are first proposed and then used to select critical context information for trajectory refinement. However, these frameworks incur a large amount of computation, bring limited improvement, or both. In this paper, we introduce a novel scenario-adaptive refinement strategy, named SmartRefine, to refine predictions with minimal additional computation. Specifically, SmartRefine adapts its refinement configuration to each scenario's properties and chooses the number of refinement iterations using a quality score that measures the prediction quality and remaining refinement potential of each scenario. SmartRefine is designed as a generic and flexible approach that can be seamlessly integrated into most state-of-the-art motion prediction models. Experiments on Argoverse (1 & 2) show that our method consistently improves the prediction accuracy of multiple state-of-the-art prediction models. Specifically, by adding SmartRefine to QCNet, we outperform all published ensemble-free works on the Argoverse 2 leaderboard (single agent track) at the time of submission. Comprehensive studies are also conducted to ablate design choices and explore the mechanism behind multi-iteration refinement. Code is available at https://github.com/opendilab/SmartRefine/
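The idea of choosing the number of refinement iterations from a quality score can be illustrated with a small toy loop. This is only a sketch under stated assumptions, not the SmartRefine implementation: the names `adaptive_refine`, `refine_step`, `quality_score`, `good_enough`, and `min_gain` are hypothetical, the stopping rule (stop when the score is high enough or when an iteration brings too little gain) is one plausible reading of "prediction quality and remaining refinement potential", and the toy score below uses a known target trajectory as a stand-in for whatever score a real model would produce.

```python
import math
from typing import Callable, List, Tuple

Point = Tuple[float, float]
Trajectory = List[Point]


def adaptive_refine(
    coarse: Trajectory,
    refine_step: Callable[[Trajectory, int], Trajectory],
    quality_score: Callable[[Trajectory], float],
    max_iters: int = 5,
    good_enough: float = 0.9,   # hypothetical threshold: prediction already good
    min_gain: float = 0.01,     # hypothetical threshold: little potential left
) -> Tuple[Trajectory, int]:
    """Refine a coarse trajectory, stopping early based on a quality score.

    Returns the final trajectory and the number of refinement iterations used.
    """
    traj = coarse
    score = quality_score(traj)
    used = 0
    for i in range(max_iters):
        if score >= good_enough:
            break  # quality is already high; skip further computation
        candidate = refine_step(traj, i)
        new_score = quality_score(candidate)
        if new_score - score < min_gain:
            break  # refinement no longer pays off for this scenario
        traj, score = candidate, new_score
        used += 1
    return traj, used


# --- Toy setup (assumption: the target is known, purely for illustration) ---
TARGET: Trajectory = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]


def toy_quality(traj: Trajectory) -> float:
    """Map mean displacement error to a score in (0, 1]; higher is better."""
    err = sum(math.hypot(x - tx, y - ty)
              for (x, y), (tx, ty) in zip(traj, TARGET)) / len(TARGET)
    return 1.0 / (1.0 + err)


def toy_refine(traj: Trajectory, iteration: int) -> Trajectory:
    """Move every waypoint halfway toward the target."""
    return [((x + tx) / 2.0, (y + ty) / 2.0)
            for (x, y), (tx, ty) in zip(traj, TARGET)]


coarse_traj: Trajectory = [(1.0, 1.0), (2.0, 1.0), (3.0, 1.0)]
refined_traj, iters_used = adaptive_refine(coarse_traj, toy_refine, toy_quality)
easy_traj, easy_iters = adaptive_refine(list(TARGET), toy_refine, toy_quality)
```

In this toy, a scenario whose coarse prediction is already accurate spends no refinement iterations, while a poor coarse prediction is refined until the score crosses the threshold or the gain per iteration becomes negligible.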