LD-SDM: Language-Driven Hierarchical Species Distribution Modeling (2312.08334v1)
Abstract: We focus on the problem of species distribution modeling using global-scale presence-only data. Most previous studies have mapped the range of a given species using geographical and environmental features alone. To capture a stronger implicit relationship between species, we encode the taxonomic hierarchy of species using a large language model (LLM). This enables range mapping for any taxonomic rank and for unseen species without additional supervision. Further, we propose a novel proximity-aware evaluation metric that enables evaluating species distribution models using any pixel-level representation of a ground-truth species range map. The proposed metric penalizes a model's predictions according to their proximity to the ground truth. We demonstrate the effectiveness of our model by systematically evaluating it on the tasks of species range prediction, zero-shot prediction, and geo-feature regression against the state of the art. Results show that our model outperforms strong baselines when trained with a variety of multi-label learning losses.
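To make the idea of a proximity-aware penalty concrete, the following is a minimal sketch and not the metric defined in the paper. It assumes binary presence masks on a regular pixel grid and plain Euclidean distance in pixel units; the function name `proximity_penalty` is hypothetical. Each predicted-presence pixel is charged its distance to the nearest ground-truth presence pixel, so a prediction just outside the true range costs less than one far away.

```python
import numpy as np

def proximity_penalty(pred, truth):
    """Mean pixel distance from each predicted-presence pixel to the nearest
    ground-truth presence pixel (illustrative assumption, not the paper's metric).
    Returns 0.0 when every predicted pixel lies inside the true range."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    pred_idx = np.argwhere(pred)    # (P, 2) row/col coordinates of predictions
    truth_idx = np.argwhere(truth)  # (T, 2) row/col coordinates of true range
    if len(pred_idx) == 0:
        return 0.0                  # nothing predicted, nothing to penalize
    if len(truth_idx) == 0:
        return float("inf")         # no true range to be close to
    # Brute-force pairwise distances; fine for small grids only.
    diff = pred_idx[:, None, :] - truth_idx[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return float(dist.min(axis=1).mean())

# Toy 5x5 range map with a single true presence pixel at the center.
truth = np.zeros((5, 5), dtype=bool)
truth[2, 2] = True

near = np.zeros_like(truth)
near[2, 3] = True                   # one pixel away from the true range
far = np.zeros_like(truth)
far[0, 0] = True                    # corner of the grid

near_score = proximity_penalty(near, truth)
far_score = proximity_penalty(far, truth)
```

A plain per-pixel error count would score `near` and `far` identically (one false positive and one false negative each), whereas the distance-based score separates them. A real global-scale evaluation would likely need geodesic rather than pixel distances and a faster nearest-neighbor search.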