Language-Driven Active Learning for Diverse Open-Set 3D Object Detection

Published 19 Apr 2024 in cs.CV, cs.AI, and cs.LG (arXiv:2404.12856v2)

Abstract: Object detection is crucial for safe autonomous driving. However, data-driven approaches face challenges when encountering minority or novel objects in the 3D driving scene. In this paper, we propose VisLED, a language-driven active learning framework for diverse open-set 3D object detection. Our method leverages active learning techniques to query diverse and informative data samples from an unlabeled pool, enhancing the model's ability to detect underrepresented or novel objects. Specifically, we introduce the Vision-Language Embedding Diversity Querying (VisLED-Querying) algorithm, which operates in both open-world exploring and closed-world mining settings. In open-world exploring, VisLED-Querying selects the data points most novel relative to existing data, while in closed-world mining, it mines novel instances of known classes. We evaluate our approach on the nuScenes dataset and demonstrate its efficiency compared to random sampling and entropy-querying methods. Our results show that VisLED-Querying consistently outperforms random sampling and offers competitive performance compared to entropy-querying despite the latter's model-optimality, highlighting the potential of VisLED for improving object detection in autonomous driving scenarios. We make our code publicly available at https://github.com/Bjork-crypto/VisLED-Querying
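The core idea of embedding-diversity querying can be illustrated with a minimal sketch: embed each scene with a vision-language model (e.g. CLIP), then select the unlabeled samples whose embeddings are farthest from their nearest labeled neighbor. This is a generic diversity-querying illustration under assumed inputs (precomputed embedding matrices), not the paper's exact VisLED-Querying implementation; see the linked repository for that.

```python
import numpy as np

def diversity_query(labeled_emb: np.ndarray, unlabeled_emb: np.ndarray, k: int) -> np.ndarray:
    """Return indices of the k most novel unlabeled samples.

    Novelty is measured as cosine distance to the nearest labeled
    embedding -- a hedged sketch of open-world diversity querying,
    assuming embeddings come from a vision-language encoder.
    """
    # L2-normalize rows so dot products equal cosine similarity.
    l = labeled_emb / np.linalg.norm(labeled_emb, axis=1, keepdims=True)
    u = unlabeled_emb / np.linalg.norm(unlabeled_emb, axis=1, keepdims=True)
    sim = u @ l.T                  # (n_unlabeled, n_labeled) similarities
    nearest = sim.max(axis=1)      # similarity to the closest labeled sample
    novelty = 1.0 - nearest        # larger = farther from everything labeled
    return np.argsort(-novelty)[:k]

# Toy usage: one unlabeled sample resembles the labeled pool, one does not.
labeled = np.array([[1.0, 0.0], [0.9, 0.1]])
unlabeled = np.array([[1.0, 0.01], [0.0, 1.0]])
print(diversity_query(labeled, unlabeled, k=1))  # the dissimilar sample wins
```

The closed-world mining variant described in the abstract would apply the same distance logic per known class rather than over the whole labeled pool.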
