
Bounding Box Stability against Feature Dropout Reflects Detector Generalization across Environments (2403.13803v1)

Published 20 Mar 2024 in cs.CV

Abstract: Bounding boxes uniquely characterize object detection, where a good detector gives accurate bounding boxes for the categories of interest. However, in the real world, where test ground truths are not provided, it is non-trivial to find out whether bounding boxes are accurate, which prevents us from assessing a detector's generalization ability. In this work, we find that under feature map dropout, good detectors tend to output bounding boxes whose locations do not change much, while the bounding boxes of poor detectors undergo noticeable position changes. We compute the box stability score (BoS score) to reflect this stability. Specifically, given an image, we compute a normal set of bounding boxes and a second set after feature map dropout. To obtain the BoS score, we use bipartite matching to find the corresponding boxes between the two sets and compute the average Intersection over Union (IoU) across the entire test set. We find that the BoS score has a strong, positive correlation with detection accuracy, measured by mean average precision (mAP), under various test environments. This relationship allows us to predict the accuracy of detectors on various real-world test sets without access to test ground truths, which we verify on canonical detection tasks such as vehicle detection and pedestrian detection. Code and data are available at https://github.com/YangYangGirl/BoS.
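
The scoring step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names (`iou`, `match_ious`, `bos_score`), the `(x1, y1, x2, y2)` box format, and the choice to average over matched box pairs rather than over images are all assumptions, and unmatched boxes are simply ignored. The two box sets per image (with and without feature map dropout) are taken as given inputs. For brevity the one-to-one matching is found by brute force over permutations, which is only practical for a handful of boxes; a Hungarian-style assignment solver would normally take its place.

```python
from itertools import permutations


def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_ious(original, perturbed):
    """IoUs of the one-to-one matching that maximizes total IoU.

    Brute force over permutations (assumption: small box counts only).
    Boxes left over in the larger set stay unmatched and are ignored.
    """
    if not original or not perturbed:
        return []
    if len(original) <= len(perturbed):
        small, large = original, perturbed
    else:
        small, large = perturbed, original
    best, best_total = [], -1.0
    for perm in permutations(range(len(large)), len(small)):
        ious = [iou(s, large[j]) for s, j in zip(small, perm)]
        total = sum(ious)
        if total > best_total:
            best, best_total = ious, total
    return best


def bos_score(per_image_pairs):
    """Mean matched IoU over all images.

    per_image_pairs: iterable of (original_boxes, perturbed_boxes),
    one pair per test image. Averaging per matched box pair is an
    assumption; the abstract only says "across the entire test set".
    """
    matched = []
    for original, perturbed in per_image_pairs:
        matched.extend(match_ious(original, perturbed))
    return sum(matched) / len(matched) if matched else 0.0


# Hypothetical toy data: one perfectly stable box, one box shifted by half its width.
pairs = [
    ([(0, 0, 10, 10)], [(0, 0, 10, 10)]),
    ([(0, 0, 10, 10)], [(5, 0, 15, 10)]),
]
score = bos_score(pairs)
```

In this toy case the first pair has an IoU of 1 and the second an IoU of 1/3, so a score near 1 would indicate boxes that barely move under dropout, while lower values indicate instability.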
