Evaluating the method reproducibility of deep learning models in the biodiversity domain (2407.07550v1)
Abstract: AI is revolutionizing biodiversity research by enabling advanced data analysis, species identification, and habitat monitoring, thereby enhancing conservation efforts. Ensuring reproducibility in AI-driven biodiversity research is crucial for fostering transparency, verifying results, and promoting the credibility of ecological findings. This study investigates the reproducibility of deep learning (DL) methods within the biodiversity domain. We design a three-stage methodology for evaluating the reproducibility of biodiversity-related publications that employ DL techniques. We define ten variables essential for method reproducibility, divided into four categories: resource requirements, methodological information, uncontrolled randomness, and statistical considerations. These categories subsequently serve as the basis for defining different levels of reproducibility. We manually extract the availability of these variables from a curated dataset comprising 61 publications identified using the keywords provided by biodiversity experts. Our study shows that the dataset is shared in 47% of the publications; however, a significant number of the publications lack comprehensive information on deep learning methods, including details regarding randomness.
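The tallying step described in the abstract (recording which reproducibility variables each publication reports, then summarizing by variable and by category) can be illustrated with a minimal Python sketch. The four category names come from the abstract. Everything else is an assumption for illustration: the variable names under each category are hypothetical placeholders rather than the paper's ten variables, the toy records are invented rather than taken from the 61 publications, and the "every variable in the category is reported" rule is one possible reading, not the paper's definition of reproducibility levels.

```python
# Four categories named in the abstract. The variable names listed under each
# are HYPOTHETICAL placeholders, not the paper's actual ten variables.
CATEGORIES = {
    "resource_requirements": ["dataset", "source_code"],
    "methodological_information": ["hyperparameters", "model_architecture"],
    "uncontrolled_randomness": ["random_seed"],
    "statistical_considerations": ["averaged_results"],
}


def availability_share(records):
    """Return, per variable, the fraction of publications that report it."""
    if not records:
        return {}
    all_variables = [v for names in CATEGORIES.values() for v in names]
    return {
        v: sum(1 for r in records if r.get(v, False)) / len(records)
        for v in all_variables
    }


def categories_covered(record):
    """List the categories for which a publication reports every variable.

    Assumption: a category counts as covered only if all of its variables
    are available. The paper may define its levels differently.
    """
    return [
        category
        for category, names in CATEGORIES.items()
        if all(record.get(v, False) for v in names)
    ]


# Invented toy annotations (one dict per publication), for illustration only.
toy_records = [
    {"dataset": True, "source_code": True, "hyperparameters": True,
     "model_architecture": True, "random_seed": True, "averaged_results": False},
    {"dataset": True, "source_code": False, "hyperparameters": True,
     "model_architecture": False, "random_seed": False, "averaged_results": False},
    {"dataset": False, "source_code": False, "hyperparameters": False,
     "model_architecture": True, "random_seed": False, "averaged_results": False},
    {"dataset": False, "source_code": False, "hyperparameters": True,
     "model_architecture": True, "random_seed": False, "averaged_results": False},
]

shares = availability_share(toy_records)
```

With real annotations in place of `toy_records`, `shares["dataset"]` would correspond to the kind of figure the abstract reports for dataset sharing, and `categories_covered` gives one simple way to group publications by how many categories they satisfy.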
Authors:
- Waqas Ahmed
- Vamsi Krishna Kommineni
- Birgitta König-Ries
- Jitendra Gaikwad
- Luiz Gadelha
- Sheeba Samuel