Talk to Parallel LiDARs: A Human-LiDAR Interaction Method Based on 3D Visual Grounding (2405.15274v1)

Published 24 May 2024 in cs.CV and cs.HC

Abstract: LiDAR sensors play a crucial role in various applications, especially in autonomous driving. Current research primarily focuses on optimizing perceptual models with point cloud data as input, while the exploration of deeper cognitive intelligence remains relatively limited. To address this challenge, parallel LiDARs have emerged as a novel theoretical framework for the next-generation intelligent LiDAR systems, which tightly integrate physical, digital, and social systems. To endow LiDAR systems with cognitive capabilities, we introduce the 3D visual grounding task into parallel LiDARs and present a novel human-computer interaction paradigm for LiDAR systems. We propose Talk2LiDAR, a large-scale benchmark dataset tailored for 3D visual grounding in autonomous driving. Additionally, we present a two-stage baseline approach and an efficient one-stage method named BEVGrounding, which significantly improves grounding accuracy by fusing coarse-grained sentence and fine-grained word embeddings with visual features. Our experiments on Talk2Car-3D and Talk2LiDAR datasets demonstrate the superior performance of BEVGrounding, laying a foundation for further research in this domain.
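The abstract describes BEVGrounding as fusing a coarse-grained sentence embedding and fine-grained word embeddings with visual (BEV) features. The paper's actual architecture is not reproduced here; the following is only a minimal NumPy sketch of that two-granularity fusion idea, in which the cell/word attention and sentence gating scheme, function name, and shapes are all illustrative assumptions rather than the authors' method:

```python
import numpy as np

def fuse_text_with_bev(bev_feats, word_embs, sent_emb):
    """Toy two-granularity fusion (illustrative, not the paper's model).

    bev_feats: (N, D) flattened BEV grid features, one row per cell
    word_embs: (T, D) per-word embeddings (fine-grained cue)
    sent_emb:  (D,)   pooled sentence embedding (coarse-grained cue)
    Returns fused features of shape (N, D).
    """
    d = bev_feats.shape[1]
    # Fine-grained step: each BEV cell attends over the word embeddings.
    scores = bev_feats @ word_embs.T / np.sqrt(d)          # (N, T)
    scores -= scores.max(axis=1, keepdims=True)            # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)                # row-wise softmax
    word_ctx = attn @ word_embs                            # (N, D) word context
    # Coarse-grained step: sentence embedding gates how strongly the
    # word context modifies each cell's visual feature.
    gate = 1.0 / (1.0 + np.exp(-(bev_feats @ sent_emb)))   # (N,) sigmoid gate
    return bev_feats + gate[:, None] * word_ctx

rng = np.random.default_rng(0)
bev = rng.normal(size=(16, 8))      # 16 BEV cells, 8-dim features
words = rng.normal(size=(5, 8))     # 5-word command
sent = rng.normal(size=(8,))        # pooled sentence embedding
fused = fuse_text_with_bev(bev, words, sent)
print(fused.shape)                   # (16, 8)
```

In a real grounding head, the fused per-cell features would then feed a detection decoder that scores candidate boxes against the command; this sketch stops at the fusion step.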
