Improving Apple Object Detection with Occlusion-Enhanced Distillation (2409.01573v2)
Abstract: Apples growing in natural environments are often heavily occluded by leaves and branches, which increases the risk of false detections and makes object detection harder. To address this, we introduce Occlusion-Enhanced Distillation (OED), which uses occlusion information to regularize the learning of semantically aligned features on occluded datasets and employs an Exponential Moving Average (EMA) to improve training stability. Specifically, we first construct an occlusion-enhanced dataset: Grounding DINO and SAM are combined to extract occluding elements such as leaves and branches from each sample, producing occlusion examples that reflect the natural growth state of the fruit. We then propose a multi-scale knowledge distillation strategy in which the student network takes images with added occlusions as input, while the teacher network takes images without natural occlusions. This setup guides the student to align with the teacher's semantic and local features across scales, narrowing the feature distance between occluded and non-occluded targets and making detection more robust. Finally, to stabilize the student network, we introduce an EMA strategy that helps it learn more general feature representations that are less affected by the occlusion noise of individual images. Extensive comparative experiments show that our method significantly outperforms current state-of-the-art techniques.
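The two training ingredients named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not state the loss form or what the EMA is applied to, so the sketch assumes a mean-squared feature gap averaged over scales and a conventional EMA over named weights. The names `ema_update`, `multiscale_distill_loss`, `Params`, and `Features` are hypothetical, and plain Python floats stand in for tensors.

```python
from typing import Dict, List

Params = Dict[str, float]     # hypothetical: one scalar per named weight
Features = List[List[float]]  # hypothetical: one flat feature vector per scale


def ema_update(ema: Params, current: Params, decay: float = 0.999) -> Params:
    """Return decay * ema + (1 - decay) * current for every named weight."""
    return {k: decay * ema[k] + (1.0 - decay) * current[k] for k in ema}


def multiscale_distill_loss(student: Features, teacher: Features) -> float:
    """Average over scales of the mean squared student-teacher feature gap.

    Assumption: the student features come from the occlusion-augmented image
    and the teacher features from its counterpart without added occlusion.
    """
    if len(student) != len(teacher):
        raise ValueError("student and teacher need the same number of scales")
    per_scale = []
    for s, t in zip(student, teacher):
        if len(s) != len(t):
            raise ValueError("feature sizes must match at every scale")
        per_scale.append(sum((a - b) ** 2 for a, b in zip(s, t)) / len(s))
    return sum(per_scale) / len(per_scale)


# Toy usage with two scales of made-up features.
teacher_feats = [[1.0, 2.0], [0.0, 0.0, 3.0]]
student_feats = [[1.0, 0.0], [0.0, 0.0, 0.0]]
loss = multiscale_distill_loss(student_feats, teacher_feats)

# One smoothing step with an exaggerated decay so the effect is visible.
smoothed = ema_update({"w": 1.0}, {"w": 0.0}, decay=0.5)
```

Averaging the per-scale terms keeps any single feature level from dominating the loss. A real detector would more likely weight the scales and operate on spatial feature maps rather than flat vectors.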
- Liang Geng (3 papers)