Analysis and Validation of Image Search Engines in Histopathology
Abstract: Searching for similar images in archives of histology and histopathology images is a crucial task that may aid in patient matching for various purposes, ranging from triaging and diagnosis to prognosis and prediction. Whole slide images (WSIs) are highly detailed digital representations of tissue specimens mounted on glass slides. Matching WSI to WSI can serve as the critical method for patient matching. In this paper, we report an extensive analysis and validation of four search methods: bag of visual words (BoVW), Yottixel, SISH, and RetCCL, along with some of their potential variants. We analyze their algorithms and structures and assess their performance. For this evaluation, we used four internal datasets ($1269$ patients) and three public datasets ($1207$ patients), totaling more than $200,000$ patches from $38$ different classes/subtypes across five primary sites. Some search engines, such as BoVW, are notably efficient and fast but suffer from low accuracy. Others, such as Yottixel, are efficient and fast while providing moderately accurate results. Recent proposals, including SISH, are inefficient and yield inconsistent outcomes, while alternatives like RetCCL prove inadequate in both accuracy and efficiency. Further research is imperative to address both accuracy and minimal storage requirements in histopathological image search.