Synthetic Data-based Detection of Zebras in Drone Imagery (2305.00432v2)

Published 30 Apr 2023 in cs.CV and cs.RO

Abstract: Nowadays, many datasets are available for training common object detectors or human detectors. These come in the form of labelled real-world images and require either a significant amount of human effort, with a high probability of errors such as missing labels, or very constrained recording setups, e.g., VICON systems. On the other hand, data for uncommon scenarios such as aerial views, for animals such as wild zebras, or for difficult-to-obtain information such as human shapes is hardly available. To overcome this, synthetic data generation with realistic rendering technologies has recently gained traction and advanced research areas such as target tracking and human pose estimation. However, subjects such as wild animals are still usually not well represented in such datasets. In this work, we first show that a pre-trained YOLO detector cannot identify zebras in real images recorded from aerial viewpoints. To solve this, we present an approach for training an animal detector using only synthetic data. We start by generating a novel synthetic zebra dataset using GRADE, a state-of-the-art framework for data generation. The dataset includes RGB, depth, skeletal joint locations, pose, shape and instance segmentations for each subject. We use it to train a YOLO detector from scratch. Through extensive evaluations of our model on real-world data from i) the limited datasets available on the internet and ii) a new dataset collected and manually labelled by us, we show that zebras can be detected using only synthetic data during training. The code, results, trained models, and both the generated and the training data are provided as open source at https://eliabntt.github.io/grade-rr.
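
The abstract describes a pipeline of generating synthetic zebra imagery with GRADE, deriving detection labels from the per-subject annotations, training a YOLO detector from scratch, and evaluating it on real aerial footage. As a rough illustration of what such a data path can look like, the sketch below converts per-image instance segmentation masks into YOLO-format bounding boxes and then launches a from-scratch training run. The mask-to-box conversion, the grade_zebras.yaml dataset file, and the use of the ultralytics package (with a YOLOv8 config standing in for the paper's YOLO variant) are assumptions for illustration, not the authors' released code, which is available at the project page linked above.

```python
# Minimal sketch (not the authors' code): turn synthetic instance masks into
# YOLO labels and train a detector from scratch on them.
import numpy as np


def mask_to_yolo_label(instance_mask: np.ndarray, class_id: int = 0) -> list[str]:
    """Convert an HxW instance-ID mask (0 = background) into YOLO label lines
    of the form 'class cx cy w h', with coordinates normalized to [0, 1]."""
    h, w = instance_mask.shape
    lines = []
    for inst_id in np.unique(instance_mask):
        if inst_id == 0:  # background
            continue
        ys, xs = np.nonzero(instance_mask == inst_id)
        x0, x1 = xs.min(), xs.max() + 1
        y0, y1 = ys.min(), ys.max() + 1
        cx, cy = (x0 + x1) / (2 * w), (y0 + y1) / (2 * h)
        bw, bh = (x1 - x0) / w, (y1 - y0) / h
        lines.append(f"{class_id} {cx:.6f} {cy:.6f} {bw:.6f} {bh:.6f}")
    return lines


if __name__ == "__main__":
    # Hypothetical training call: `grade_zebras.yaml` would point at the
    # converted synthetic train split and a real, manually labelled test split.
    from ultralytics import YOLO

    model = YOLO("yolov8n.yaml")  # architecture config only, i.e. no pretrained weights
    model.train(data="grade_zebras.yaml", epochs=100, imgsz=640, pretrained=False)
    print(model.val(split="test").box.map50)  # mAP@0.5 on real drone imagery
```

Building the model from an architecture .yaml rather than from pretrained weights mirrors the from-scratch training mentioned in the abstract; the actual trained models and generated data are released by the authors at the URL above.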

Authors (2)
  1. Elia Bonetto (11 papers)
  2. Aamir Ahmad (28 papers)
Citations (5)
