
AIDE: An Automatic Data Engine for Object Detection in Autonomous Driving (2403.17373v1)

Published 26 Mar 2024 in cs.CV, cs.AI, and cs.LG

Abstract: Autonomous vehicle (AV) systems rely on robust perception models as a cornerstone of safety assurance. However, objects encountered on the road exhibit a long-tailed distribution, with rare or unseen categories posing challenges to a deployed perception model. This necessitates an expensive process of continuously curating and annotating data with significant human effort. We propose to leverage recent advances in vision-language models and large language models (LLMs) to design an Automatic Data Engine (AIDE) that automatically identifies issues, efficiently curates data, improves the model through auto-labeling, and verifies the model through generation of diverse scenarios. This process operates iteratively, allowing for continuous self-improvement of the model. We further establish a benchmark for open-world detection on AV datasets to comprehensively evaluate various learning paradigms, demonstrating our method's superior performance at a reduced cost.
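The abstract describes a four-stage loop (identify issues, curate data, auto-label, verify) that repeats. The sketch below only illustrates that control flow in Python and is not the authors' implementation: `EngineState`, `run_data_engine`, and every stage function are hypothetical names, and the toy stubs stand in for the vision-language and LLM components that the abstract mentions but does not specify.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class EngineState:
    """Hypothetical container for what the loop carries between rounds."""
    known_categories: List[str]
    training_data: List[dict] = field(default_factory=list)
    history: List[dict] = field(default_factory=list)


def run_data_engine(
    state: EngineState,
    unlabeled_pool: List[dict],
    find_issues: Callable[[List[dict], List[str]], List[str]],
    curate: Callable[[List[dict], List[str]], List[dict]],
    auto_label: Callable[[List[dict], List[str]], List[dict]],
    retrain: Callable[[List[dict]], None],
    verify: Callable[[List[str]], bool],
    max_rounds: int = 3,
) -> EngineState:
    """Run the identify -> curate -> auto-label -> verify loop (illustrative only)."""
    for round_idx in range(max_rounds):
        # 1. Identify issues: categories present in the pool but unknown to the model.
        missing = find_issues(unlabeled_pool, state.known_categories)
        if not missing:
            break  # nothing new to learn in this pool
        # 2. Curate: pull the items relevant to the missing categories.
        selected = curate(unlabeled_pool, missing)
        # 3. Auto-label the curated items and update the model.
        state.training_data.extend(auto_label(selected, missing))
        retrain(state.training_data)
        state.known_categories.extend(missing)
        # 4. Verify the updated model on the newly added categories.
        passed = verify(missing)
        state.history.append(
            {"round": round_idx, "new_categories": list(missing), "verified": passed}
        )
    return state


# Toy stand-ins (assumptions): each pool item carries a ground-truth "tag"
# so the stubs can run without any real model.
def toy_find_issues(pool, known):
    return sorted({item["tag"] for item in pool} - set(known))


def toy_curate(pool, missing):
    return [item for item in pool if item["tag"] in missing]


def toy_auto_label(items, missing):
    return [{"id": item["id"], "label": item["tag"]} for item in items]


def toy_retrain(data):
    pass  # placeholder: a real system would fine-tune the detector here


def toy_verify(missing):
    return True  # placeholder: a real system would evaluate on test scenarios


pool = [
    {"id": 0, "tag": "car"},
    {"id": 1, "tag": "stroller"},
    {"id": 2, "tag": "car"},
    {"id": 3, "tag": "scooter"},
]
state = run_data_engine(
    EngineState(known_categories=["car"]),
    pool, toy_find_issues, toy_curate, toy_auto_label, toy_retrain, toy_verify,
)
print(state.known_categories)
```

Passing the stages in as callables keeps the loop independent of any particular model, which matches the abstract's framing of an engine that orchestrates separate components; how each stage is actually realized is left to the paper.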
