
On the selection and effectiveness of pseudo-absences for species distribution modeling with deep learning (2401.02989v1)

Published 3 Jan 2024 in q-bio.QM, cs.LG, and q-bio.PE

Abstract: Species distribution modeling is a highly versatile tool for understanding the intricate relationship between environmental conditions and species occurrences. However, the available data often lack information on confirmed species absence and are limited to opportunistically sampled, presence-only observations. To overcome this limitation, a common approach is to employ pseudo-absences, which are specific geographic locations designated as negative samples. While pseudo-absences are well established for single-species distribution models, their application in the context of multi-species neural networks remains underexplored. Notably, the significant class imbalance between species presences and pseudo-absences is often left unaddressed. Moreover, the existence of different types of pseudo-absences (e.g., random and target-group background points) adds complexity to the selection process. Determining the optimal combination of pseudo-absence types is difficult and depends on the characteristics of the data, particularly considering that certain types of pseudo-absences can be used to mitigate geographic biases. In this paper, we demonstrate that these challenges can be effectively tackled by integrating pseudo-absences in the training of multi-species neural networks through modifications to the loss function. This adjustment involves assigning different weights to the distinct terms of the loss function, thereby addressing both the class imbalance and the choice of pseudo-absence types. Additionally, we propose a strategy to set these loss weights using spatial block cross-validation with presence-only data. We evaluate our approach using a benchmark dataset containing independent presence-absence data from six different regions and report improved results when compared to competing approaches.
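
To make the idea of weighting distinct loss terms concrete, the following is a minimal sketch and not the paper's actual loss function. It assumes a binary cross-entropy split into three terms: observed presences, target-group pseudo-absences (species not recorded at another species' occurrence location), and random background points treated as absences for all species. The function name `weighted_pa_loss`, the argument names, and the weights `w_pos`, `w_tg`, and `w_rand` are hypothetical and chosen only for illustration.

```python
import numpy as np

def weighted_pa_loss(p_occ, y_occ, p_bg, w_pos=1.0, w_tg=0.5, w_rand=0.5, eps=1e-7):
    """Illustrative weighted loss for a multi-species model (assumed form).

    p_occ : (n_occ, n_species) predicted probabilities at occurrence locations
    y_occ : (n_occ, n_species) 1 where a species was observed, else 0
    p_bg  : (n_bg, n_species) predicted probabilities at random background points
    """
    # Clip probabilities so the logarithms stay finite.
    p_occ = np.clip(p_occ, eps, 1 - eps)
    p_bg = np.clip(p_bg, eps, 1 - eps)

    # Presence term: penalize low probability for observed species.
    pos_term = -(y_occ * np.log(p_occ)).sum() / max(y_occ.sum(), 1)

    # Target-group term: species not observed at an occurrence location
    # are treated as pseudo-absences there.
    tg_mask = 1 - y_occ
    tg_term = -(tg_mask * np.log(1 - p_occ)).sum() / max(tg_mask.sum(), 1)

    # Random background term: every species is treated as absent.
    rand_term = -np.log(1 - p_bg).mean()

    # Each term is averaged separately, so the weights (not the raw sample
    # counts) control the balance between presences and pseudo-absences.
    return w_pos * pos_term + w_tg * tg_term + w_rand * rand_term


# Toy usage: two occurrence records, two species, three background points.
y_occ = np.array([[1, 0], [0, 1]])
p_occ = np.full((2, 2), 0.5)
p_bg = np.full((3, 2), 0.5)
loss = weighted_pa_loss(p_occ, y_occ, p_bg)
```

In a setup like this, the weights would be hyperparameters. The abstract states that the authors select their loss weights using spatial block cross-validation on presence-only data; that selection step is not shown here.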
