Beyond Human Vision: The Role of Large Vision Language Models in Microscope Image Analysis (2405.00876v1)

Published 1 May 2024 in cs.CV, cs.AI, and cs.LG

Abstract: Vision Language Models (VLMs) have recently emerged and gained the spotlight for their ability to comprehend the dual modality of image and textual data. VLMs such as LLaVA, ChatGPT-4, and Gemini have recently shown impressive performance on tasks such as natural image captioning, visual question answering (VQA), and spatial reasoning. Additionally, a universal segmentation model by Meta AI, the Segment Anything Model (SAM), shows unprecedented performance at isolating objects from unforeseen images. Since medical experts, biologists, and materials scientists routinely examine microscopy or medical images in conjunction with textual information in the form of captions, literature, or reports, and draw conclusions of great importance and merit, it is essential to test the performance of VLMs and foundation models such as SAM on these images. In this study, we charge ChatGPT, LLaVA, Gemini, and SAM with classification, segmentation, counting, and VQA tasks on a variety of microscopy images. We observe that ChatGPT and Gemini are impressively able to comprehend the visual features in microscopy images, while SAM is quite capable of isolating artefacts in a general sense. However, the performance is not close to that of a domain expert: the models are readily encumbered by the introduction of impurities, defects, artefact overlaps, and the diversity present in the images.
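The segmentation and counting experiments described in the abstract lend themselves to a short illustration. The sketch below is not the authors' code; it shows how SAM's automatic mask generator from Meta AI's open-source `segment_anything` package can be pointed at a micrograph to isolate and count artefacts. The checkpoint filename, image path, and area threshold are assumptions for illustration.

```python
# Minimal sketch: artefact counting on a micrograph with SAM's
# automatic mask generator. Assumes `segment_anything` is installed
# (pip install segment-anything) and a ViT-H checkpoint has been
# downloaded; the image path and area filter are placeholders.
import cv2
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

# Load the pretrained SAM backbone from a local checkpoint file.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
mask_generator = SamAutomaticMaskGenerator(sam)

# SAM expects an RGB image; OpenCV loads BGR, so convert.
image = cv2.cvtColor(cv2.imread("micrograph.png"), cv2.COLOR_BGR2RGB)

# Each returned dict holds a binary mask plus metadata such as `area`.
masks = mask_generator.generate(image)

# Discard tiny masks that are likely noise, then count the remainder.
MIN_AREA_PX = 100
artefacts = [m for m in masks if m["area"] >= MIN_AREA_PX]
print(f"Estimated artefact count: {len(artefacts)}")
```

A fixed pixel-area cutoff is a simplification; the paper's observation that SAM struggles with impurities and overlapping artefacts suggests that raw mask counts like this would need expert validation on real microscopy data.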

Authors (3)
  1. Prateek Verma (39 papers)
  2. Minh-Hao Van (12 papers)
  3. Xintao Wu (70 papers)
Citations (1)