CLIBD: Bridging Vision and Genomics for Biodiversity Monitoring at Scale (2405.17537v3)

Published 27 May 2024 in cs.AI, cs.CL, and cs.CV

Abstract: Measuring biodiversity is crucial for understanding ecosystem health. Prior work has developed machine learning models for taxonomic classification of photographic images and of DNA separately; in this work, we introduce a multimodal approach combining both, using CLIP-style contrastive learning to align images, barcode DNA, and text-based representations of taxonomic labels in a unified embedding space. This allows for accurate classification of both known and unknown insect species without task-specific fine-tuning, leveraging contrastive learning for the first time to fuse DNA and image data. Our method surpasses previous single-modality approaches in accuracy by over 8% on zero-shot learning tasks, showcasing its effectiveness in biodiversity studies.
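The abstract does not spell out the training objective, but CLIP-style alignment of three modalities is typically built from pairwise symmetric InfoNCE losses over a batch. The PyTorch sketch below illustrates the idea under stated assumptions: the function names (`info_nce`, `trimodal_loss`, `zero_shot_classify`), the fixed temperature, and the equal weighting of the three pairwise terms are illustrative choices, not the paper's exact formulation.

```python
# Minimal sketch of trimodal CLIP-style contrastive alignment (images, DNA
# barcodes, taxonomic-label text). Encoders are assumed to already produce
# batch-aligned embeddings of shape (B, D); architecture details are omitted.
import torch
import torch.nn.functional as F

def info_nce(a: torch.Tensor, b: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss between two batches of L2-normalized embeddings.

    Matching pairs sit on the diagonal of the similarity matrix; every other
    entry in the batch serves as a negative, as in CLIP.
    """
    logits = a @ b.t() / temperature                       # (B, B) similarities
    targets = torch.arange(a.size(0), device=a.device)     # diagonal = positives
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

def trimodal_loss(img_emb: torch.Tensor, dna_emb: torch.Tensor, txt_emb: torch.Tensor) -> torch.Tensor:
    # Project each modality onto the unit hypersphere before comparison.
    img = F.normalize(img_emb, dim=-1)
    dna = F.normalize(dna_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    # Align all three modality pairs in one shared embedding space
    # (equal weighting is an assumption of this sketch).
    return info_nce(img, dna) + info_nce(img, txt) + info_nce(dna, txt)

@torch.no_grad()
def zero_shot_classify(img_emb: torch.Tensor, class_emb: torch.Tensor) -> torch.Tensor:
    """Assign each query image to the nearest candidate class embedding,
    where class embeddings may come from DNA barcodes or label text."""
    sims = F.normalize(img_emb, dim=-1) @ F.normalize(class_emb, dim=-1).t()
    return sims.argmax(dim=-1)  # index of the best-matching taxon per image
```

In this sketch, classifying a known or unseen species reduces to a nearest-neighbor lookup in the shared embedding space, which is what makes classification possible without task-specific fine-tuning.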
