Multimodal-Enhanced Objectness Learner for Corner Case Detection in Autonomous Driving (2402.02026v2)
Abstract: Previous work on object detection has achieved high accuracy in closed-set scenarios, but performance in open-world scenarios remains unsatisfactory. One of the most challenging open-world problems is corner case detection in autonomous driving: existing detectors struggle with these cases because they rely heavily on visual appearance and generalize poorly. In this paper, we propose to reduce the discrepancy between known and unknown classes and introduce a learner that acquires a multimodal-enhanced notion of objectness. Leveraging both vision-centric and image-text modalities, our semi-supervised learning framework imparts objectness knowledge to the student model, enabling class-aware detection. Our approach, the Multimodal-Enhanced Objectness Learner (MENOL) for corner case detection, significantly improves recall on novel classes at lower training cost. MENOL achieves 76.6% mAR-corner and 79.8% mAR-agnostic on the CODA-val dataset with only 5,100 labeled training images, outperforming the baseline ORE by 71.3% and 60.6%, respectively. The code will be available at https://github.com/tryhiseyyysum/MENOL.
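The abstract describes a teacher-student pipeline in which multimodal teachers supply class-agnostic objectness pseudo-labels to a student detector. As a rough illustration of how proposals from two modalities might be fused into one pseudo-label set, here is a minimal Python sketch; all names (`Box`, `merge_pseudo_labels`) and the score/IoU thresholds are hypothetical placeholders, not taken from the MENOL paper or codebase.

```python
# Hypothetical sketch: fuse class-agnostic objectness proposals from a
# vision-centric teacher and an image-text teacher into one pseudo-label
# set for semi-supervised student training. Illustrative only.
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    x1: float
    y1: float
    x2: float
    y2: float
    score: float  # objectness confidence assigned by a teacher


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a.x1, b.x1), max(a.y1, b.y1)
    ix2, iy2 = min(a.x2, b.x2), min(a.y2, b.y2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a.x2 - a.x1) * (a.y2 - a.y1)
             + (b.x2 - b.x1) * (b.y2 - b.y1) - inter)
    return inter / union if union > 0 else 0.0


def merge_pseudo_labels(vision_boxes: List[Box],
                        text_boxes: List[Box],
                        score_thr: float = 0.5,
                        iou_thr: float = 0.5) -> List[Box]:
    """Keep confident proposals from either modality and greedily
    de-duplicate overlapping ones (NMS-style), highest score first."""
    candidates = sorted(
        (b for b in vision_boxes + text_boxes if b.score >= score_thr),
        key=lambda b: b.score, reverse=True)
    kept: List[Box] = []
    for box in candidates:
        if all(iou(box, k) < iou_thr for k in kept):
            kept.append(box)
    return kept


if __name__ == "__main__":
    vision = [Box(0, 0, 10, 10, 0.9), Box(1, 1, 10, 10, 0.6)]
    text = [Box(0, 0, 10, 10, 0.8), Box(50, 50, 60, 60, 0.7)]
    for b in merge_pseudo_labels(vision, text):
        print(b)  # two boxes survive: duplicates of the first are pruned
```

A student detector would then be trained on the labeled images plus these merged pseudo-boxes; the thresholds above are illustrative defaults, not values reported in the paper.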