Multimodal-Enhanced Objectness Learner for Corner Case Detection in Autonomous Driving (2402.02026v2)

Published 3 Feb 2024 in cs.CV and cs.AI

Abstract: Previous works on object detection have achieved high accuracy in closed-set scenarios, but their performance in open-world scenarios is not satisfactory. One of the challenging open-world problems is corner case detection in autonomous driving. Existing detectors struggle with these cases, relying heavily on visual appearance and exhibiting poor generalization ability. In this paper, we propose a solution by reducing the discrepancy between known and unknown classes and introduce a multimodal-enhanced objectness notion learner. Leveraging both vision-centric and image-text modalities, our semi-supervised learning framework imparts objectness knowledge to the student model, enabling class-aware detection. Our approach, Multimodal-Enhanced Objectness Learner (MENOL) for Corner Case Detection, significantly improves recall for novel classes with lower training costs. By achieving a 76.6% mAR-corner and 79.8% mAR-agnostic on the CODA-val dataset with just 5100 labeled training images, MENOL outperforms the baseline ORE by 71.3% and 60.6%, respectively. The code will be available at https://github.com/tryhiseyyysum/MENOL.
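The abstract reports results as mean average recall (mAR) but does not define the metric. As a rough illustration only, the sketch below computes a class-agnostic recall averaged over IoU thresholds from 0.50 to 0.95, a COCO-style convention assumed here rather than taken from the paper. The function names (`iou`, `agnostic_recall`, `mean_average_recall`) and the greedy matching rule are hypothetical and are not the MENOL or CODA evaluation code.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed corner format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def agnostic_recall(gt: List[Box], preds: List[Box], thr: float = 0.5) -> float:
    """Fraction of ground-truth boxes matched by some prediction, ignoring class labels.

    Greedy one-to-one matching: each prediction can match at most one ground-truth box.
    """
    if not gt:
        return 0.0
    used = set()
    hits = 0
    for g in gt:
        best_j, best_v = -1, thr
        for j, p in enumerate(preds):
            if j in used:
                continue
            v = iou(g, p)
            if v >= best_v:
                best_j, best_v = j, v
        if best_j >= 0:
            used.add(best_j)
            hits += 1
    return hits / len(gt)


def mean_average_recall(gt: List[Box], preds: List[Box]) -> float:
    """Average the recall over IoU thresholds 0.50, 0.55, ..., 0.95 (assumed convention)."""
    thresholds = [0.5 + 0.05 * i for i in range(10)]
    return sum(agnostic_recall(gt, preds, t) for t in thresholds) / len(thresholds)


if __name__ == "__main__":
    gt_boxes = [(0, 0, 10, 10), (20, 20, 30, 30)]
    pred_boxes = [(0, 0, 10, 10), (50, 50, 60, 60)]
    print(mean_average_recall(gt_boxes, pred_boxes))
```

In this toy case one of the two ground-truth boxes is matched exactly and the other is missed, so the recall is one half at every threshold. Because class labels are ignored, a detector is credited for localizing an unknown object even when it cannot name it, which is the behavior a class-agnostic recall figure is meant to capture.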

