Fast and Scalable Image Search For Histology (2107.13587v1)

Published 28 Jul 2021 in cs.CV, cs.AI, and q-bio.TO

Abstract: The expanding adoption of digital pathology has enabled the curation of large repositories of histology whole slide images (WSIs), which contain a wealth of information. Similar pathology image search offers the opportunity to comb through large historical repositories of gigapixel WSIs to identify cases with similar morphological features and can be particularly useful for diagnosing rare diseases, identifying similar cases for predicting prognosis, treatment outcomes, and potential clinical trial success. A critical challenge in developing a WSI search and retrieval system is scalability, which is uniquely challenging given the need to search a growing number of slides that each can consist of billions of pixels and are several gigabytes in size. Such systems are typically slow and retrieval speed often scales with the size of the repository they search through, making their clinical adoption tedious and are not feasible for repositories that are constantly growing. Here we present Fast Image Search for Histopathology (FISH), a histology image search pipeline that is infinitely scalable and achieves constant search speed that is independent of the image database size while being interpretable and without requiring detailed annotations. FISH uses self-supervised deep learning to encode meaningful representations from WSIs and a Van Emde Boas tree for fast search, followed by an uncertainty-based ranking algorithm to retrieve similar WSIs. We evaluated FISH on multiple tasks and datasets with over 22,000 patient cases spanning 56 disease subtypes. We additionally demonstrate that FISH can be used to assist with the diagnosis of rare cancer types where sufficient cases may not be available to train traditional supervised deep models. FISH is available as an easy-to-use, open-source software package (https://github.com/mahmoodlab/FISH).

Summary

The paper presents FISH, an image search system that leverages self-supervised learning and a vEB tree to achieve constant O(1) search time, regardless of dataset size.
The paper avoids the need for detailed annotations by using self-supervised deep learning, which lowers the cost and time of data preparation.
The paper demonstrates robust performance across diverse datasets, accurately retrieving slides to support clinical diagnostics and trial matching.

Fast Image Search for Histopathology: An Overview

The paper addresses the challenge of building a scalable and efficient image search system for digital pathology. As digital pathology is adopted more widely, institutions are amassing large databases of whole slide images (WSIs), each of which can be a gigapixel-scale image. Searching these datasets requires both speed and interpretability to support diagnosis, especially for rare diseases where too few cases exist to train traditional supervised models. The paper introduces Fast Image Search for Histopathology (FISH), an approach designed to provide a scalable framework and interpretable results for histology image retrieval.

Key Contributions

Infinite Scalability with Constant Time Complexity: The primary advantage of FISH is that search speed stays constant, denoted $O(1)$, regardless of the size of the dataset. FISH uses self-supervised learning to extract meaningful representations from WSIs and indexes them with a Van Emde Boas (vEB) tree. A vEB tree answers queries in $O(\log \log M)$ time, where $M$ is the size of the integer key universe rather than the number of stored items, so the lookup cost does not grow with the database once $M$ is fixed.
Avoidance of Detailed Annotations: Because FISH relies on self-supervised deep learning, it does not need exhaustive region of interest (ROI) annotations. This reduces both the time and the cost of data preparation.
Interpretability and Practical Utility: The system’s interpretability is emphasized through its retrieval process, which allows medical practitioners to examine the precise morphological regions upon which the retrieval decisions are based. This aligns with clinical needs, ensuring results are transparent and trustworthy.
Versatile Application and Robust Performance: FISH was evaluated on multiple datasets covering over 22,000 patient cases and 56 disease subtypes. Beyond retrieving slides from the same anatomical site, the summary presents retrieved cases as a way to help predict prognosis and identify potential clinical trials from existing data, which points to practical uses across oncology.
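
The constant-time claim in the first contribution rests on a general property of vEB trees: each operation recurses on a key universe whose bit width halves at every level, so the cost depends on the universe size and not on how many keys are stored. The sketch below is a generic textbook vEB tree in Python, not the FISH implementation. The class name VEBTree, the 16-bit universe, and the example keys are illustrative assumptions, and how FISH derives integer keys from slides is not shown here.

```python
class VEBTree:
    """Textbook van Emde Boas tree over integer keys in [0, 2**bits)."""

    def __init__(self, bits):
        self.bits = bits
        self.min = None              # the minimum is kept here, not in a cluster
        self.max = None
        self.lo_bits = bits // 2     # bit width of each cluster
        self.hi_bits = bits - self.lo_bits  # bit width of the summary
        self.clusters = {}           # clusters are allocated lazily
        self.summary = None          # tracks which clusters are non-empty

    def _split(self, x):
        return x >> self.lo_bits, x & ((1 << self.lo_bits) - 1)

    def _join(self, high, low):
        return (high << self.lo_bits) | low

    def insert(self, x):
        if self.min is None:
            self.min = self.max = x
            return
        if x == self.min:
            return                   # already present
        if x < self.min:
            x, self.min = self.min, x  # new minimum; push the old one down
        if self.bits > 1:
            high, low = self._split(x)
            cluster = self.clusters.get(high)
            if cluster is None:
                cluster = self.clusters[high] = VEBTree(self.lo_bits)
                if self.summary is None:
                    self.summary = VEBTree(self.hi_bits)
                self.summary.insert(high)
            cluster.insert(low)
        if x > self.max:
            self.max = x

    def successor(self, x):
        """Smallest stored key strictly greater than x, or None."""
        if self.min is not None and x < self.min:
            return self.min
        if self.bits == 1:
            return 1 if (x == 0 and self.max == 1) else None
        high, low = self._split(x)
        cluster = self.clusters.get(high)
        if cluster is not None and low < cluster.max:
            return self._join(high, cluster.successor(low))
        nxt = self.summary.successor(high) if self.summary else None
        if nxt is None:
            return None
        return self._join(nxt, self.clusters[nxt].min)

    def predecessor(self, x):
        """Largest stored key strictly smaller than x, or None."""
        if self.max is not None and x > self.max:
            return self.max
        if self.bits == 1:
            return 0 if (x == 1 and self.min == 0) else None
        high, low = self._split(x)
        cluster = self.clusters.get(high)
        if cluster is not None and low > cluster.min:
            return self._join(high, cluster.predecessor(low))
        prev = self.summary.predecessor(high) if self.summary else None
        if prev is None:
            # Only the minimum, which lives outside the clusters, can remain.
            if self.min is not None and x > self.min:
                return self.min
            return None
        return self._join(prev, self.clusters[prev].max)


# Hypothetical integer keys standing in for indexed database entries.
tree = VEBTree(16)
for key in [3, 17, 42, 1000]:
    tree.insert(key)
print(tree.successor(17))    # 42
print(tree.predecessor(17))  # 3
```

With a 16-bit universe the recursion depth is bounded by about four levels (16, 8, 4, 2, 1 bits) whether the tree holds four keys or sixty thousand, which is the sense in which lookup cost is independent of database size. Neighbouring keys found by successor and predecessor queries can then serve as candidate matches for a query key.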

Detailed Evaluation

The FISH pipeline was evaluated through two primary tasks on datasets from TCGA, CPTAC, and internal collections. In the disease subtype retrieval task, the system was tested with queries for both common and rare cancer types, and it remained effective across these diagnostic settings. FISH also retrieved relevant slides faster than comparable systems, such as Yottixel, in tests on more than 1,000 slides.

Qualitative analyses of the retrieval results for individual datasets showed where anatomically related subtypes are commonly confused. One example is the frequent confusion between Cholangiocarcinoma and Pancreatic Adenocarcinoma, which are morphologically similar.

Future Directions and Implications

While FISH provides a robust framework for histology image search, the limited expressiveness of the current indexing approach remains a significant area for improvement. Developing a more semantically relevant distance metric for the index is one way to improve accuracy without compromising speed. Large-scale patch-level retrieval, made more practical by advances in data annotation and storage technologies, is another promising direction for research.

Feedback mechanisms that let pathologists interactively refine results could narrow the gap between algorithmic predictions and clinical decision-making. This fits the current movement toward human-in-the-loop systems in medical AI and would make FISH easier to integrate into clinical workflows.

In essence, the paper presents a comprehensive solution to address the unique challenges posed by large-scale histopathology image databases. By ensuring scalability, speed, and interpretability, FISH offers a practical and forward-looking tool for the burgeoning field of digital pathology.

GitHub

GitHub - mahmoodlab/SISH: Fast and scalable search of whole-slide images via self-supervised deep learning - Nature Biomedical Engineering (99 stars)