Leveraging Habitat Information for Fine-grained Bird Identification (2312.14999v3)
Abstract: Traditional bird classifiers rely mostly on the visual characteristics of birds. Some prior works even train classifiers to be invariant to the background, completely discarding the birds' living environment. In contrast, we are the first to explore integrating habitat information, one of the four major cues ornithologists use to identify birds, into modern bird classifiers. We focus on two leading model types: (1) CNNs and ViTs trained on the downstream bird datasets; and (2) the original, multi-modal CLIP. Training CNNs and ViTs with habitat-augmented data improves accuracy by up to +0.83 and +0.23 points on NABirds and CUB-200, respectively. Similarly, adding habitat descriptors to the prompts for CLIP yields an accuracy gain of up to +0.99 and +1.1 points on NABirds and CUB-200, respectively. Accuracy improves consistently when habitat features are integrated into the image augmentation process and into the textual descriptors of vision-language CLIP classifiers. Code is available at: https://anonymous.4open.science/r/reasoning-8B7E/.
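As a rough illustration of what "adding habitat descriptors to the prompts for CLIP" could look like, the sketch below builds one text prompt per descriptor and appends an optional habitat prompt. The prompt templates, the example descriptors, and the helper names `build_prompts` and `class_score` are assumptions made for illustration; the abstract does not specify the exact wording or the scoring rule. Averaging the per-descriptor similarities is one common choice for descriptor-based zero-shot classification, not necessarily the one used here.

```python
from typing import List


def build_prompts(species: str, parts: List[str], habitat: str = "") -> List[str]:
    """Return one text prompt per descriptor; the habitat prompt is optional."""
    prompts = [f"a photo of a {species}, which has {p}" for p in parts]
    if habitat:
        # Hypothetical template: the exact prompt wording is an assumption.
        prompts.append(f"a photo of a {species}, which lives in {habitat}")
    return prompts


def class_score(similarities: List[float]) -> float:
    """Average per-descriptor image-text similarities into one class score."""
    return sum(similarities) / len(similarities)


# Made-up descriptors, for illustration only.
prompts = build_prompts(
    "Painted Bunting",
    parts=["a blue head", "a red belly"],
    habitat="thickets and woodland edges",
)
```

In practice, each prompt would be encoded by CLIP's text encoder and compared with the image embedding; the resulting similarities would then be passed to `class_score` for every candidate species, and the highest-scoring species would be predicted.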