MarineDet: Towards Open-Marine Object Detection (2310.01931v1)
Abstract: Marine object detection has gained prominence in marine research, driven by the pressing need to unravel oceanic mysteries and deepen our understanding of invaluable marine ecosystems. There is a profound need to efficiently and accurately identify and localize diverse, previously unseen marine entities in underwater imagery. Open-marine object detection (OMOD for short) addresses this need: it detects diverse and unseen marine objects, performing categorization and localization simultaneously. To achieve OMOD, we present \textbf{MarineDet}. We formulate a joint visual-text semantic space through pre-training and then perform marine-specific training to achieve in-air-to-marine knowledge transfer. Because no existing dataset is designed for OMOD, we construct the \textbf{MarineDet dataset}, consisting of 821 marine-related object categories, to promote and measure OMOD performance. The experimental results demonstrate the superior performance of MarineDet over existing generalist and specialist object detection algorithms. To the best of our knowledge, we are the first to present OMOD, which offers a more valuable and practical setting for marine ecosystem monitoring and management. Our research not only pushes the boundaries of marine understanding but also offers a standard pipeline for OMOD.
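The core mechanism behind a joint visual-text semantic space, scoring image regions against free-form category names so that unseen categories can still be recognized, can be illustrated with a short sketch. This is a minimal illustration assuming a stock CLIP model (Radford et al., ICML 2021) applied to cropped region proposals; the category list and the helper `classify_region` are hypothetical, and MarineDet's actual pre-training and marine-specific training are not reproduced here.

```python
# Minimal sketch: open-vocabulary classification of region proposals in a
# shared visual-text embedding space (CLIP-style). Illustrative only; this
# is not the authors' released code.
import torch
import clip  # OpenAI CLIP: https://github.com/openai/CLIP

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Hypothetical open vocabulary: any marine category names may be supplied,
# including categories never seen during detector training.
categories = ["clownfish", "sea cucumber", "plastic bottle", "coral"]
prompts = clip.tokenize([f"a photo of a {c}" for c in categories]).to(device)

with torch.no_grad():
    text_emb = model.encode_text(prompts)                 # (K, D) class embeddings
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)

def classify_region(region_crop):
    """Assign an open-vocabulary label to one cropped region proposal (PIL image)."""
    image = preprocess(region_crop).unsqueeze(0).to(device)
    with torch.no_grad():
        region_emb = model.encode_image(image)            # (1, D) region embedding
        region_emb = region_emb / region_emb.norm(dim=-1, keepdim=True)
    logits = 100.0 * region_emb @ text_emb.T              # scaled cosine similarity
    probs = logits.softmax(dim=-1).squeeze(0)
    return categories[probs.argmax().item()], probs.max().item()
```

Because classification reduces to nearest-neighbor matching between region and text embeddings, extending the vocabulary is a matter of encoding new category prompts; no detector retraining is needed, which is what makes the open-marine setting tractable.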