Beyond Human Vision: The Role of Large Vision Language Models in Microscope Image Analysis (2405.00876v1)
Abstract: Vision Language Models (VLMs) have recently emerged and gained attention for their ability to jointly comprehend image and textual data. VLMs such as LLaVA, ChatGPT-4, and Gemini have shown impressive performance on tasks such as natural image captioning, visual question answering (VQA), and spatial reasoning. Additionally, Segment Anything Model (SAM), a universal segmentation model by Meta AI, shows unprecedented performance at isolating objects from unforeseen images. Since medical experts, biologists, and materials scientists routinely examine microscopy or medical images in conjunction with textual information in the form of captions, literature, or reports, and draw conclusions of great importance from them, it is essential to test the performance of VLMs and foundation models such as SAM on these images. In this study, we charge ChatGPT, LLaVA, Gemini, and SAM with classification, segmentation, counting, and VQA tasks on a variety of microscopy images. We observe that ChatGPT and Gemini comprehend the visual features in microscopy images impressively well, and that SAM is quite capable of isolating artefacts in a general sense. However, their performance falls well short of that of a domain expert: the models are readily confounded by impurities, defects, artefact overlaps, and the diversity present in the images.
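In the study itself, counting and segmentation are performed by prompting the VLMs and SAM directly. For readers unfamiliar with what "counting artefacts" in a micrograph involves, a classical baseline helps make the task concrete. The sketch below is a hypothetical illustration, not the paper's method: it binarizes a grayscale micrograph with Otsu's threshold (the thresholding method cited in the references) and counts connected bright regions with `scipy.ndimage`, discarding components smaller than an assumed `min_size` to suppress noise.

```python
import numpy as np
from scipy import ndimage

def otsu_threshold(image):
    """Otsu's method: choose the gray level that maximizes the
    between-class variance of the foreground/background split."""
    hist, _ = np.histogram(image, bins=256, range=(0, 256))
    probs = hist / hist.sum()
    cum_w = np.cumsum(probs)                      # class-0 weight up to each level
    cum_mean = np.cumsum(probs * np.arange(256))  # cumulative weighted mean
    global_mean = cum_mean[-1]
    best_t, best_var = 0, 0.0
    for t in range(1, 256):
        w0, w1 = cum_w[t - 1], 1.0 - cum_w[t - 1]
        if w0 == 0 or w1 == 0:
            continue
        mu0 = cum_mean[t - 1] / w0
        mu1 = (global_mean - cum_mean[t - 1]) / w1
        var = w0 * w1 * (mu0 - mu1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t

def count_particles(image, min_size=5):
    """Binarize with Otsu's threshold, label connected bright regions,
    and count those at least min_size pixels in area."""
    mask = image >= otsu_threshold(image)
    labels, n = ndimage.label(mask)
    sizes = ndimage.sum(mask, labels, range(1, n + 1))
    return int(np.sum(np.asarray(sizes) >= min_size))
```

A baseline like this fails for exactly the reasons the abstract lists for the learned models: overlapping artefacts merge into one component, and impurities or uneven illumination break the bimodal-histogram assumption behind Otsu's threshold.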
- OpenAI, “Gpt-4 technical report,” ArXiv, vol. abs/2303.08774, 2023. [Online]. Available: https://api.semanticscholar.org/CorpusID:257532815
- G. Team, R. Anil, S. Borgeaud, Y. Wu, J.-B. Alayrac, J. Yu, R. Soricut, J. Schalkwyk, A. M. Dai, A. Hauth et al., “Gemini: a family of highly capable multimodal models,” arXiv preprint arXiv:2312.11805, 2023.
- H. Liu, C. Li, Q. Wu, and Y. J. Lee, “Visual instruction tuning,” arXiv preprint arXiv:2304.08485, 2023.
- A. Kirillov, E. Mintun, N. Ravi, H. Mao, C. Rolland, L. Gustafson, T. Xiao, S. Whitehead, A. C. Berg, W.-Y. Lo et al., “Segment anything,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 4015–4026.
- C. Shen, C. Wang, M. Huang, N. Xu, S. van der Zwaag, and W. Xu, “A generic high-throughput microstructure classification and quantification method for regular SEM images of complex steel microstructures combining EBSD labeling and deep learning,” vol. 93, pp. 191–204. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1005030221003819
- W. Ma, E. J. Kautz, A. Baskaran, A. Chowdhury, V. Joshi, B. Yener, and D. J. Lewis, “Image-driven discriminative and generative machine learning algorithms for establishing microstructure–processing relationships,” vol. 128, no. 13, p. 134901. [Online]. Available: https://doi.org/10.1063/5.0013720
- A. Baskaran, G. Kane, K. Biggs, R. Hull, and D. Lewis, “Adaptive characterization of microstructure dataset using a two stage machine learning approach,” vol. 177, p. 109593. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0927025620300847
- J. Tang, X. Geng, D. Li, Y. Shi, J. Tong, H. Xiao, and F. Peng, “Machine learning-based microstructure prediction during laser sintering of alumina,” vol. 11, no. 1, p. 10724. [Online]. Available: https://www.nature.com/articles/s41598-021-89816-x
- K. Tsutsui, H. Terasaki, K. Uto, T. Maemura, S. Hiramatsu, K. Hayashi, K. Moriguchi, and S. Morito, “A methodology of steel microstructure recognition using SEM images by machine learning based on textural analysis,” vol. 25, p. 101514. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S2352492820325253
- R. Perera, D. Guzzetti, and V. Agrawal, “Optimized and autonomous machine learning framework for characterizing pores, particles, grains and grain boundaries in microstructural images,” vol. 196, p. 110524. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0927025621002512
- J. Yang and H. Yao, “Automated identification and characterization of two-dimensional materials via machine learning-based processing of optical microscope images,” vol. 39, p. 100771. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S2352431620301048
- B. Lee, S. Yoon, J. W. Lee, Y. Kim, J. Chang, J. Yun, J. C. Ro, J.-S. Lee, and J. H. Lee, “Statistical characterization of the morphologies of nanoparticles through machine learning based electron microscopy image analysis,” vol. 14, no. 12, pp. 17125–17133. [Online]. Available: https://doi.org/10.1021/acsnano.0c06809
- M. Ilett, J. Wills, P. Rees, S. Sharma, S. Micklethwaite, A. Brown, R. Brydson, and N. Hondow, “Application of automated electron microscopy imaging and machine learning to characterise and quantify nanoparticle dispersion in aqueous media,” vol. 279, no. 3, pp. 177–184. [Online]. Available: https://onlinelibrary.wiley.com/doi/abs/10.1111/jmi.12853
- N. Khatavkar, S. Swetlana, and A. K. Singh, “Accelerated prediction of vickers hardness of co- and ni-based superalloys from microstructure and composition using advanced image processing techniques and machine learning,” vol. 196, pp. 295–303. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1359645420304778
- S. Akers, E. Kautz, A. Trevino-Gavito, M. Olszta, B. E. Matthews, L. Wang, Y. Du, and S. R. Spurgeon, “Rapid and flexible segmentation of electron microscopy data using few-shot machine learning,” vol. 7, no. 1, pp. 1–9. [Online]. Available: https://www.nature.com/articles/s41524-021-00652-z
- A. Bihani, H. Daigle, J. E. Santos, C. Landry, M. Prodanović, and K. Milliken, “MudrockNet: Semantic segmentation of mudrock SEM images through deep learning,” vol. 158, p. 104952. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0098300421002387
- Z. Chen, X. Liu, J. Yang, E. Little, and Y. Zhou, “Deep learning-based method for SEM image segmentation in mineral characterization, an example from duvernay shale samples in western canada sedimentary basin,” vol. 138, p. 104450. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0098300419304819
- J. Rausch, D. Jaramillo-Vogel, S. Perseguers, N. Schnidrig, B. Grobéty, and P. Yajan, “Automated identification and quantification of tire wear particles (TWP) in airborne dust: SEM/EDX single particle analysis coupled to a machine learning classifier,” vol. 803, p. 149832. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S004896972104907X
- K. Kaufmann, H. Lane, X. Liu, and K. S. Vecchio, “Efficient few-shot machine learning for classification of EBSD patterns,” vol. 11, no. 1, p. 8172. [Online]. Available: https://www.nature.com/articles/s41598-021-87557-5
- M.-H. Van, P. Verma, and X. Wu, “On large visual language models for medical imaging analysis: An empirical study,” arXiv preprint arXiv:2402.14162, 2024.
- Y. Huang, X. Yang, L. Liu, H. Zhou, A. Chang, X. Zhou, R. Chen, J. Yu, J. Chen, C. Chen et al., “Segment anything model for medical images?” Medical Image Analysis, vol. 92, p. 103061, 2024.
- H. Touvron, L. Martin, K. Stone, P. Albert, A. Almahairi, Y. Babaei, N. Bashlykov, S. Batra, P. Bhargava, S. Bhosale et al., “Llama 2: Open foundation and fine-tuned chat models,” arXiv preprint arXiv:2307.09288, 2023.
- C. Li, C. Wong, S. Zhang, N. Usuyama, H. Liu, J. Yang, T. Naumann, H. Poon, and J. Gao, “Llava-med: Training a large language-and-vision assistant for biomedicine in one day,” Advances in Neural Information Processing Systems, vol. 36, 2024.
- Z. Yan, K. Zhang, R. Zhou, L. He, X. Li, and L. Sun, “Multimodal chatgpt for medical applications: an experimental study of gpt-4v,” arXiv preprint arXiv:2310.19061, 2023.
- S. Srivastav, R. Chandrakar, S. Gupta, V. Babhulkar, S. Agrawal, A. Jaiswal, R. Prasad, M. B. Wanjari, S. Agarwal, and M. Wanjari, “Chatgpt in radiology: the advantages and limitations of artificial intelligence for medical imaging diagnosis,” Cureus, vol. 15, no. 7, 2023.
- R. Aversa, M. H. Modarres, S. Cozzini, and R. Ciancio, “NFFA-EUROPE - 100% SEM dataset,” dataset of SEM images produced at CNR-IOM, Trieste, within the NFFA-EUROPE project (www.nffa.eu), funded by the European Union’s Horizon 2020 Research and Innovation Programme under grant agreement No. 654360. [Online]. Available: https://b2share.eudat.eu/records/80df8606fcdb4b2bae1656f0dc6db8ba
- V. Ljosa, K. L. Sokolnicki, and A. E. Carpenter, “Annotated high-throughput microscopy image sets for validation,” vol. 9, no. 7, p. 637. [Online]. Available: https://www.nature.com/articles/nmeth.2083
- M. W. Davidson. Mitosis in onion root tips. [Online]. Available: https://micro.magnet.fsu.edu/micro/gallery/mitosis/mitosis.html
- A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” Advances in neural information processing systems, vol. 30, 2017.
- J.-B. Alayrac, J. Donahue, P. Luc, A. Miech, I. Barr, Y. Hasson, K. Lenc, A. Mensch, K. Millican, M. Reynolds et al., “Flamingo: a visual language model for few-shot learning,” NeurIPS, 2022.
- A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark et al., “Learning transferable visual models from natural language supervision,” in ICML, 2021.
- J. Wei, X. Wang, D. Schuurmans, M. Bosma, F. Xia, E. Chi, Q. V. Le, D. Zhou et al., “Chain-of-thought prompting elicits reasoning in large language models,” NeurIPS, 2022.
- N. Otsu, “A threshold selection method from gray-level histograms,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 9, no. 1, pp. 62–66, 1979. [Online]. Available: https://ieeexplore.ieee.org/document/4310076
Authors: Prateek Verma, Minh-Hao Van, Xintao Wu