
HEST-Library: Unified Spatial Omics Analysis

Updated 17 February 2026
  • HEST-Library is a Python package that unifies spatial transcriptomics and histological imaging data from diverse sources for integrated multi-omics analysis.
  • It employs advanced segmentation, image alignment, and batch-effect correction methods to ensure accurate data harmonization across varied samples.
  • The package streamlines workflows for downloading, preprocessing, visualization, and benchmarking, enabling reproducible and scalable multimodal research.

HEST-Library is a Python package for unifying, preprocessing, and analyzing spatial transcriptomics (ST) and histological image data. It is designed to operate on the HEST-1k dataset, a resource of 1,229 ST profiles paired with H&E whole slide images (WSIs) that spans diverse tissue types, species, and cancer states. The package provides streamlined data access, multimodal alignment, quantitative morphomolecular analysis, and foundation model benchmarking, with utilities for downloading, preprocessing, visualization, and batch-effect correction in spatial multi-omics research (Jaume et al., 2024).

1. Design Objectives and Scope

HEST-Library was developed to address the integrative and computational demands of large-scale multimodal studies involving legacy and contemporary ST datasets combined with digital pathology images. The principal goals are:

  • Data assembly and harmonization: Aggregating heterogeneous ST and histology data from diverse sources (153 cohorts, 26 organs, two species, 25 cancer types) and packaging the transcriptomics, WSIs, and metadata together for each sample.
  • Unified I/O: Metadata-driven sample download, image conversion to pyramidal TIFF (for scalable viewing), and multimodal spot-to-image alignment.
  • Preprocessing utilities: Automated workflows for tissue and nuclear segmentation using DeepLabV3 and CellViT, patch extraction emulating Visium/Xenium layouts, magnification inference, and batch-effect mitigation (ComBat, Harmony, MNN).
  • Support for advanced analysis: Enabling downstream applications such as foundation model benchmarking (“HEST-Benchmark”), biomarker/morphology-gene exploration, and multimodal representation learning.

2. Architecture and Module Organization

The HEST-Library structure is organized under the top-level hest namespace with modular subcomponents, facilitating both end-to-end workflows and granular task execution. The principal submodules and provided APIs are:

| Module | Key Functions/Classes | Purpose |
|---|---|---|
| hest.io | download_hest, list_samples, load_sample | Data access and sample loading |
| hest.core | HestSample, to_anndata, to_pyramidal_tiff, etc. | Core sample representation |
| hest.preprocess | align_visium, tissue_segmentation, tile_patches, normalize_expression, compute_spatial_stats | Preprocessing and stats |
| hest.batch | plot_batch_effect, correct_batch_effect | Batch-effect exploration/correction |
| hest.benchmark | run_hest_benchmark, BenchmarkResult | Foundation model evaluation |
| hest.utils | find_spot_under_patch, visualize_overlay | Utility functions |
Section 4 and Appendix Figure A1 of Jaume et al. (2024) provide a full schematic of these modules and their interactions.

3. Principal Functionalities and API Patterns

HEST-Library exposes a high-level API for typical spatial omics workflows:

  • Sample enumeration and download: Retrieve metadata (list_samples) and download filtered subsets by species, organ, or pathology.
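    from pathlib import Path
    from hest.io import list_samples, download_hest
    # List metadata for every available sample
    meta_df = list_samples()
    # Download only the samples matching the metadata filter
    download_hest({'species': 'Homo sapiens', 'organ': 'Breast', 'cancer_type': 'IDC'}, local_dir=Path('/data/hest1k/'))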
  • Sample loading and inspection: Encapsulated in the HestSample class, integrating WSI objects, AnnData transcriptomics, alignment, contours, and nuclei segmentation:
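    from pathlib import Path
    from hest.io import load_sample
    sample = load_sample('TENX111', data_dir=Path('/data/hest1k/'))
    adata = sample.to_anndata()         # transcriptomics as AnnData
    slide = sample.to_pyramidal_tiff()  # WSI as pyramidal TIFF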
  • Expression normalization and filtering: Total-count and log1p normalization of AnnData; gene filtering via Scanpy.
    from hest.preprocess import normalize_expression
    adata = normalize_expression(adata, method='total_count')
    adata = normalize_expression(adata, method='log1p')
  • Tissue and patch extraction: Automatic segmentation and Visium/Xenium-like patch assignment.
    from hest.preprocess import tissue_segmentation, tile_patches
    mask = tissue_segmentation(sample)
    patches = tile_patches(sample, size_px=224, mag=20.0)
  • Nuclear feature quantification: Extraction of per-nucleus morphometrics (area, perimeter, eccentricity).
    masks, classes = sample.nuclei.load()
    df_feats = sample.nuclei.compute_features(classes_of_interest=['neoplastic'], features=['area'])
  • Spatial-molecular correlation: Quantify relationships (e.g., a Pearson correlation coefficient (PCC) of ≈ 0.47 between GATA3 expression and nuclear area) and spatial statistics (e.g., Moran’s I).
    from hest.preprocess import compute_spatial_stats
    morans_i = compute_spatial_stats(adata, gene='GATA3', neighbors=8, metric='morans_i')
  • Visualization: Overlay gene expression or segmentation masks on WSIs for interpretability.
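    from hest.utils import visualize_overlay
    # coords (spot positions on the slide) and expr (per-spot expression values) are assumed to be prepared beforehand
    fig = visualize_overlay(slide, coords, expr, cmap='coolwarm', alpha=0.6)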
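
To make the spatial statistic mentioned above concrete, here is a minimal standard-library sketch of Moran's I with binary k-nearest-neighbour weights. It is not HEST-Library's compute_spatial_stats implementation; the helper names knn_weights and morans_i, the toy 4×4 grid, and the choice of k are assumptions made purely for illustration.

```python
import math


def knn_weights(coords, k):
    """Binary weights: w[i][j] = 1.0 if spot j is among the k nearest spots to i."""
    n = len(coords)
    weights = [[0.0] * n for _ in range(n)]
    for i, (xi, yi) in enumerate(coords):
        # Sort the other spots by Euclidean distance (ties broken by index)
        dists = sorted(
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(coords)
            if j != i
        )
        for _, j in dists[:k]:
            weights[i][j] = 1.0
    return weights


def morans_i(values, weights):
    """Moran's I = (N / W) * sum_ij w_ij (x_i - m)(x_j - m) / sum_i (x_i - m)^2."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("Moran's I is undefined for constant values")
    w_sum = sum(sum(row) for row in weights)
    num = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    return (n / w_sum) * (num / denom)


# Toy data (hypothetical): a 4x4 grid of spots
coords = [(x, y) for y in range(4) for x in range(4)]
weights = knn_weights(coords, k=4)
gradient = [float(x) for x, _ in coords]            # smooth left-to-right trend
checker = [float((x + y) % 2) for x, y in coords]   # alternating pattern
print(morans_i(gradient, weights), morans_i(checker, weights))
```

A smoothly varying pattern yields a positive value and an alternating pattern a negative one, which is the intuition behind using Moran's I to flag spatially structured genes.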

4. End-to-End Analytical Workflows

The library supports end-to-end, protocol-style analyses, with exemplar workflows described in Sections 6 and 7 of Jaume et al. (2024):

  • Biomarker exploration: Identify histomorphological correlates of expression in carcinoma samples by segmenting nuclei, averaging per-spot features, and correlating with transcript abundance (e.g., GATA3: nuclear area PCC ≈ 0.47).
  • Multimodal representation learning: Construction of paired patch-expression datasets, used to fine-tune vision-language foundation models (e.g., CONCH) with contrastive losses (InfoNCE), enabling subsequent transfer and evaluation on external image cohorts for biomarker classification tasks.
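
The biomarker-exploration steps above (average a nuclear feature per spot, then correlate it with expression) can be sketched in plain Python as follows. The function names and the toy numbers are hypothetical and are not taken from HEST-Library or from the paper's data; the sketch only illustrates the shape of the computation.

```python
import math
from collections import defaultdict


def mean_feature_per_spot(nucleus_spot_ids, nucleus_values):
    """Average a per-nucleus feature (e.g. area) over the nuclei assigned to each spot."""
    grouped = defaultdict(list)
    for spot_id, value in zip(nucleus_spot_ids, nucleus_values):
        grouped[spot_id].append(value)
    return {spot: sum(vals) / len(vals) for spot, vals in grouped.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


# Toy inputs (made up for illustration): which spot each nucleus falls under,
# each nucleus's area, and one gene's normalized expression per spot.
nucleus_spots = ["s1", "s1", "s2", "s2", "s3", "s4", "s4"]
nucleus_areas = [40.0, 60.0, 70.0, 90.0, 100.0, 30.0, 50.0]
expression = {"s1": 1.0, "s2": 2.0, "s3": 2.5, "s4": 0.5}

spot_area = mean_feature_per_spot(nucleus_spots, nucleus_areas)
spots = sorted(spot_area)
r = pearson([spot_area[s] for s in spots], [expression[s] for s in spots])
print(f"PCC between mean nuclear area and expression: {r:.3f}")
```

In a real analysis the spot assignment would come from the alignment and nuclear segmentation outputs, and the expression vector from the normalized AnnData matrix.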

These workflows are implemented with minimal boilerplate, leveraging HEST-Library’s integration with AnnData and major deep learning and visualization frameworks. In Jaume et al. (2024), Section 5 details the model benchmarking interfaces (hest.benchmark), and Section 6 (Figure 1) illustrates the biomarker studies.
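
For readers unfamiliar with the contrastive objective named in the representation-learning workflow, the following is a minimal sketch of an InfoNCE loss over paired patch and expression embeddings. This article does not specify the exact configuration used in the paper, so the one-directional (image-to-expression) form, the temperature of 0.07, and the function names are all assumptions; a real fine-tuning setup would use a tensor library rather than plain lists.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def info_nce(image_embs, expr_embs, temperature=0.07):
    """One-directional InfoNCE: each image embedding should pick out its own
    paired expression embedding among all expression embeddings in the batch."""
    imgs = [l2_normalize(v) for v in image_embs]
    exprs = [l2_normalize(v) for v in expr_embs]
    losses = []
    for i, img in enumerate(imgs):
        # Similarity of image i to every expression embedding, scaled by temperature
        logits = [sum(a * b for a, b in zip(img, e)) / temperature for e in exprs]
        # Numerically stable log-sum-exp over the batch
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the matching pair (index i) as the positive
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)


# Toy embeddings (hypothetical): correctly paired vs. deliberately mismatched
patches = [[1.0, 0.0], [0.0, 1.0]]
matched = [[1.0, 0.0], [0.0, 1.0]]
mismatched = [[0.0, 1.0], [1.0, 0.0]]
print(info_nce(patches, matched), info_nce(patches, mismatched))
```

The loss is small when each patch embedding is closest to its own expression embedding and large when the pairs are scrambled, which is what drives the two modalities into a shared space.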

5. Data Handling, Dependencies, and Performance Considerations

HEST-Library is optimized for high-throughput, interactive, and reproducible workflows:

  • Data formats: WSIs are converted to pyramidal TIFF and read through OpenSlide; transcript matrices are stored in AnnData objects compatible with scanpy>=1.9.
  • Alignment pipelines: Spot-to-tissue registration employs YOLOv8 for Visium (“faster_fiducial”) and VALIS for Xenium (Appendix Figure A2).
  • Preprocessing backends: Tissue and nuclear segmentation utilize DeepLabV3 and CellViT.
  • Batch correction: ComBat (pycombat), Harmony (harmonypy), and MNN (scanpy.external.pp.mnn_correct) are implemented for normalization across sample batches.
  • Scalability: Lazy WSI loading via OpenSlide; patch extraction and feature quantification support multiprocessing via num_workers or joblib.Parallel.
  • Software stack: Dependencies include torch>=1.10, torchvision, yolov8, openslide-python, scikit-image, geopandas, scikit-learn, and xgboost.
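
As a rough intuition for what batch correction does, the sketch below removes a per-batch mean shift from a single gene's values. It is deliberately a simplified, location-only adjustment and is not ComBat, Harmony, or MNN (ComBat, for example, additionally adjusts scale and applies empirical Bayes shrinkage). The function name center_per_batch and the toy values are hypothetical.

```python
from collections import defaultdict


def center_per_batch(values, batches):
    """Subtract each batch's mean from its values, then add back the global mean,
    so that every batch ends up with the same average."""
    if len(values) != len(batches) or not values:
        raise ValueError("values and batches must be non-empty and equally long")
    global_mean = sum(values) / len(values)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value, batch in zip(values, batches):
        sums[batch] += value
        counts[batch] += 1
    batch_mean = {b: sums[b] / counts[b] for b in sums}
    return [v - batch_mean[b] + global_mean for v, b in zip(values, batches)]


# Toy example (made-up numbers): batch "b" is shifted upward by 10 units
values = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]
batches = ["a", "a", "a", "b", "b", "b"]
print(center_per_batch(values, batches))
```

Note that this kind of centering also erases genuine biological differences whenever they are confounded with batch, which is one reason the dedicated methods listed above are preferred in practice.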

A plausible implication is that the modularity and lazy evaluation scheme favor interactive as well as large-scale batch analyses for both computational biologists and machine learning practitioners.
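
The lazy-evaluation idea can be illustrated with a generator that yields patch bounding boxes one at a time instead of materializing them all, so a caller can read each region from the slide only when needed. This is a sketch under stated assumptions, not the library's tile_patches: the name iter_patch_boxes, the regular-grid layout, and the decision to skip partial edge tiles are all illustrative choices.

```python
from typing import Iterator, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


def iter_patch_boxes(
    width: int, height: int, patch_px: int, stride_px: Optional[int] = None
) -> Iterator[Box]:
    """Lazily yield square patch boxes on a regular grid over an image.

    Boxes that would extend past the right or bottom edge are skipped.
    """
    stride = patch_px if stride_px is None else stride_px
    if patch_px <= 0 or stride <= 0:
        raise ValueError("patch_px and stride_px must be positive")
    for y in range(0, height - patch_px + 1, stride):
        for x in range(0, width - patch_px + 1, stride):
            yield (x, y, patch_px, patch_px)


# Nothing is computed until the generator is consumed, so even a very large
# (hypothetical) slide costs no memory up front.
boxes = iter_patch_boxes(width=100_000, height=80_000, patch_px=224)
first = next(boxes)
print(first)
```

Because each box is produced on demand, the same loop body works for a quick interactive look at a few patches and for a full batch run distributed across worker processes.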

6. References, Utility, and Broader Impact

HEST-Library is described in Section 4 (“HEST-Library”) of Jaume et al. (2024), with schematic and pipeline details in Appendix Figures A1–A2. It directly supports research in spatial genomics, digital pathology, and multimodal learning, as evidenced by its use in the HEST-Benchmark (Section 5), biomarker analyses (Section 6), and multimodal foundation model research (Section 7, Table 7). The resource is openly available, with tutorials and code at https://github.com/mahmoodlab/hest, and serves as a backbone for reproducible research and method development in spatial multi-omics.
