HEST-Library: Unified Spatial Omics Analysis

Updated 17 February 2026

HEST-Library is a Python package that unifies spatial transcriptomics and histological imaging data from diverse sources for integrated multi-omics analysis.
It provides tissue and nuclear segmentation, spot-to-image alignment, and batch-effect correction to harmonize data across heterogeneous samples.
The package streamlines workflows for downloading, preprocessing, visualization, and benchmarking, enabling reproducible and scalable multimodal research.

HEST-Library is a Python package engineered to unify, preprocess, and analyze spatial transcriptomics (ST) and histological image data, specifically designed to operate on the HEST-1k dataset—a resource of 1,229 ST profiles paired with H&E whole slide images (WSIs), encompassing diverse tissue types, species, and cancer states. It enables streamlined data access, multimodal alignment, quantitative morphomolecular analyses, and foundation model benchmarking, with utilities for downloading, preprocessing, visualization, and batch-effect correction in spatial multi-omics research (Jaume et al., 2024).

1. Design Objectives and Scope

HEST-Library was developed to address the integrative and computational demands of large-scale multimodal studies involving legacy and contemporary ST datasets combined with digital pathology images. The principal goals are:

Data assembly and harmonization: Aggregating heterogeneous ST and histology data from diverse sources (153 cohorts, 26 organs, two species, 25 cancer types) and packaging each sample's transcriptomics, WSI, and metadata together.
Unified I/O: Metadata-driven sample download, image conversion to pyramidal TIFF (for scalable viewing), and multimodal spot-to-image alignment.
Preprocessing utilities: Automated workflows for tissue and nuclear segmentation using DeepLabV3 and CellViT, patch extraction emulating Visium/Xenium layouts, magnification inference, and batch-effect mitigation (ComBat, Harmony, MNN).
Support for advanced analysis: Enabling downstream applications such as foundation model benchmarking (“HEST-Benchmark”), biomarker/morphology-gene exploration, and multimodal representation learning.
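
Magnification inference, listed among the preprocessing utilities above, usually means mapping a slide's physical pixel size to a nominal objective power. The sketch below assumes the common convention that 40x corresponds to about 0.25 µm/px, 20x to about 0.5 µm/px, and 10x to about 1.0 µm/px; the helper names and the reference table are illustrative assumptions, not HEST-Library's implementation.

```python
import math

# Nominal objective magnifications and the pixel sizes (microns per pixel)
# conventionally associated with them. Assumed reference values; real
# scanners deviate slightly from these.
REFERENCE_MPP = {40.0: 0.25, 20.0: 0.5, 10.0: 1.0, 5.0: 2.0}


def infer_magnification(mpp: float) -> float:
    """Return the nominal magnification whose reference pixel size is closest to `mpp`.

    Hypothetical helper. Distances are measured on a log scale because
    standard magnifications are spaced by factors of two.
    """
    if mpp <= 0:
        raise ValueError("pixel size must be positive")
    return min(REFERENCE_MPP, key=lambda mag: abs(math.log(mpp / REFERENCE_MPP[mag])))


def downsample_factor(mpp: float, target_mag: float) -> float:
    """Factor by which level-0 pixels must be shrunk to reach `target_mag` (hypothetical)."""
    return REFERENCE_MPP[target_mag] / mpp
```

For example, a slide scanned at 0.2425 µm/px is classified as 40x, and `downsample_factor(0.25, 20.0)` returns 2.0, so a 224-pixel patch at 20x spans 448 level-0 pixels.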

2. Architecture and Module Organization

The HEST-Library structure is organized under the top-level hest namespace with modular subcomponents, facilitating both end-to-end workflows and granular task execution. The principal submodules and provided APIs are:

| Module | Key functions/classes | Purpose |
|---|---|---|
| `hest.io` | `download_hest`, `list_samples`, `load_sample` | Data access and sample loading |
| `hest.core` | `HestSample`, `to_anndata`, `to_pyramidal_tiff`, etc. | Core sample representation |
| `hest.preprocess` | `align_visium`, `tissue_segmentation`, `tile_patches`, `normalize_expression`, `compute_spatial_stats` | Preprocessing and spatial statistics |
| `hest.batch` | `plot_batch_effect`, `correct_batch_effect` | Batch-effect exploration and correction |
| `hest.benchmark` | `run_hest_benchmark`, `BenchmarkResult` | Foundation model evaluation |
| `hest.utils` | `find_spot_under_patch`, `visualize_overlay` | Utility functions |

Section 4 and Appendix Figure A1 of Jaume et al. (2024) provide a full schematic of these modules and their interactions.

3. Principal Functionalities and API Patterns

HEST-Library exposes a high-level API for typical spatial omics workflows:

Sample enumeration and download: Retrieve metadata (list_samples) and download filtered subsets by species, organ, or pathology.

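```python
from pathlib import Path
from hest.io import list_samples, download_hest

meta_df = list_samples()  # metadata for all HEST-1k samples
# Download only human breast invasive ductal carcinoma (IDC) samples
download_hest({'species': 'Homo sapiens', 'organ': 'Breast', 'cancer_type': 'IDC'}, local_dir=Path('/data/hest1k/'))
```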
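
The filter dictionary passed to `download_hest` selects samples whose metadata match every key. A minimal sketch of that matching logic on in-memory records, assuming metadata rows are plain dictionaries; the `filter_samples` helper and the example records are hypothetical and not part of HEST-Library.

```python
from typing import Iterable


def _matches(record: dict, key: str, wanted) -> bool:
    """True if record[key] equals `wanted`, or is one of `wanted` when it is a collection."""
    value = record.get(key)
    if isinstance(wanted, (list, tuple, set)):
        return value in wanted
    return value == wanted


def filter_samples(records: Iterable[dict], criteria: dict) -> list[str]:
    """Return IDs of records that satisfy every criterion (logical AND across keys)."""
    return [
        rec["id"]
        for rec in records
        if all(_matches(rec, key, wanted) for key, wanted in criteria.items())
    ]


# Hypothetical metadata rows for illustration only.
records = [
    {"id": "S1", "species": "Homo sapiens", "organ": "Breast", "cancer_type": "IDC"},
    {"id": "S2", "species": "Homo sapiens", "organ": "Bowel", "cancer_type": "Unknown"},
    {"id": "S3", "species": "Mus musculus", "organ": "Brain", "cancer_type": None},
]
```

`filter_samples(records, {'species': 'Homo sapiens', 'organ': 'Breast'})` returns `['S1']`, and a list value such as `{'organ': ['Brain', 'Bowel']}` selects any of the listed organs.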

Sample loading and inspection: Encapsulated in the HestSample class, integrating WSI objects, AnnData transcriptomics, alignment, contours, and nuclei segmentation:
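```python
from pathlib import Path
from hest.io import load_sample

sample = load_sample('TENX111', data_dir=Path('/data/hest1k/'))
adata = sample.to_anndata()         # spot-level transcriptomics as AnnData
slide = sample.to_pyramidal_tiff()  # WSI as a pyramidal TIFF
```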
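
A pyramidal TIFF stores the slide at several resolutions, each level a downsampled copy of the one below, which is what makes whole-slide viewing scalable. A minimal sketch of the level geometry, assuming 2x downsampling per level, rounding odd sizes up, and stopping once a level fits in a single 256-pixel tile; these conventions are assumptions, and real TIFF writers may differ.

```python
def pyramid_levels(width: int, height: int, tile: int = 256) -> list[tuple[int, int]]:
    """Dimensions (width, height) of each level in a 2x-downsampled image pyramid.

    Level 0 is full resolution; halving (rounding up) continues until the
    whole level fits inside one tile of `tile` x `tile` pixels.
    """
    levels = [(width, height)]
    while width > tile or height > tile:
        width = (width + 1) // 2
        height = (height + 1) // 2
        levels.append((width, height))
    return levels
```

A 1000 x 600 image with 256-pixel tiles yields levels of 1000 x 600, 500 x 300, and 250 x 150.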

Expression normalization and filtering: Total-count and log1p normalization of AnnData; gene filtering via Scanpy.

```python
from hest.preprocess import normalize_expression

# Scale each spot to a common total count, then log1p-transform
adata = normalize_expression(adata, method='total_count')
adata = normalize_expression(adata, method='log1p')
```
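
The two normalization calls above correspond to standard total-count scaling followed by a log1p transform. A minimal pure-Python sketch on a dense spots-by-genes matrix, assuming a target sum of 10,000 counts per spot (Scanpy's `normalize_total` instead uses the median total count when `target_sum` is not given); this is an illustration, not HEST-Library's code.

```python
import math


def normalize_total(counts: list[list[float]], target_sum: float = 1e4) -> list[list[float]]:
    """Scale each spot (row) so its counts sum to `target_sum`.

    Spots with zero total counts are returned as all zeros.
    """
    out = []
    for row in counts:
        total = sum(row)
        scale = target_sum / total if total > 0 else 0.0
        out.append([x * scale for x in row])
    return out


def log1p(matrix: list[list[float]]) -> list[list[float]]:
    """Apply the natural-log transform log(1 + x) element-wise."""
    return [[math.log1p(x) for x in row] for row in matrix]


# Two spots x three genes of raw counts (toy values).
raw = [[10, 30, 60], [0, 5, 5]]
norm = normalize_total(raw)
logged = log1p(norm)
```

Here the first spot becomes [1000, 3000, 6000] before the log transform. Normalizing first removes differences in sequencing depth between spots, so the log-scale values are comparable across spots.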

Tissue and patch extraction: Automatic segmentation and Visium/Xenium-like patch assignment.

```python
from hest.preprocess import tissue_segmentation, tile_patches

mask = tissue_segmentation(sample)                     # tissue vs. background mask
patches = tile_patches(sample, size_px=224, mag=20.0)  # 224 x 224-pixel patches at 20x
```
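
Patch extraction can be understood as laying a regular grid over the tissue mask and keeping tiles that are mostly tissue. A minimal sketch, assuming a binary mask at the same resolution as the patches and a hypothetical default tissue-fraction threshold of 50%; `grid_patches` is an illustrative helper, not the library's `tile_patches`.

```python
def grid_patches(mask: list[list[int]], size: int, min_tissue: float = 0.5) -> list[tuple[int, int]]:
    """Return (row, col) top-left corners of non-overlapping size x size tiles
    whose fraction of tissue pixels (mask == 1) is at least `min_tissue`.

    Partial tiles at the right and bottom edges are skipped.
    """
    height, width = len(mask), len(mask[0])
    corners = []
    for r in range(0, height - size + 1, size):
        for c in range(0, width - size + 1, size):
            tissue = sum(mask[r + i][c + j] for i in range(size) for j in range(size))
            if tissue / (size * size) >= min_tissue:
                corners.append((r, c))
    return corners


# Toy 4x4 mask: tissue occupies mostly the left half.
mask = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
]
```

`grid_patches(mask, 2)` keeps the two left-hand tiles; lowering `min_tissue` to 0.25 also keeps the bottom-right tile, trading more coverage for more background per patch.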

Nuclear feature quantification: Extraction of per-nucleus morphometrics (area, perimeter, eccentricity).

```python
masks, classes = sample.nuclei.load()
# Per-nucleus area for nuclei classified as neoplastic
df_feats = sample.nuclei.compute_features(classes_of_interest=['neoplastic'], features=['area'])
```
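
Per-nucleus morphometrics such as area and perimeter can be computed directly from a segmented nucleus contour. A minimal sketch, assuming each nucleus is a closed polygon of (x, y) vertices in pixel units: the shoelace formula gives the area, and summing edge lengths gives the perimeter. It illustrates the geometry only and is not HEST-Library's `compute_features`.

```python
import math


def polygon_area(vertices: list[tuple[float, float]]) -> float:
    """Area of a simple polygon via the shoelace formula (clockwise or counter-clockwise)."""
    n = len(vertices)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def polygon_perimeter(vertices: list[tuple[float, float]]) -> float:
    """Sum of edge lengths, closing the polygon back to the first vertex."""
    n = len(vertices)
    return sum(math.dist(vertices[i], vertices[(i + 1) % n]) for i in range(n))


# A 4 x 3 pixel rectangle as a toy "nucleus" contour.
contour = [(0, 0), (4, 0), (4, 3), (0, 3)]
```

To report areas in µm², multiply the pixel area by the square of the slide's pixel size in µm/px.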

Spatial-molecular correlation: Quantify gene–morphology relationships (e.g., a Pearson correlation coefficient, PCC, of about 0.47 between GATA3 expression and nuclear area) and spatial statistics (e.g., Moran’s I).
```python
from hest.preprocess import compute_spatial_stats

morans_i = compute_spatial_stats(adata, gene='GATA3', neighbors=8, metric='morans_i')
```
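
Moran's I measures whether a gene's values at neighboring spots are more similar (I > 0) or more dissimilar (I < 0) than expected by chance: I = (N / W) · Σᵢ Σⱼ wᵢⱼ (xᵢ − x̄)(xⱼ − x̄) / Σᵢ (xᵢ − x̄)², where W is the sum of all weights wᵢⱼ. The sketch below assumes a regular grid with binary weights for the 8 surrounding cells, one simple reading of `neighbors=8`; real Visium spots sit on a hexagonal lattice, and this is not HEST-Library's implementation.

```python
def morans_i(grid: list[list[float]]) -> float:
    """Global Moran's I on a 2-D grid with binary 8-neighbour (queen) weights."""
    rows, cols = len(grid), len(grid[0])
    values = [v for row in grid for v in row]
    n = len(values)
    mean = sum(values) / n
    denom = sum((v - mean) ** 2 for v in values)
    if denom == 0:
        raise ValueError("Moran's I is undefined for a constant field")
    num = 0.0
    weight_sum = 0  # W: number of ordered neighbour pairs (all weights are 1)
    for r in range(rows):
        for c in range(cols):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        num += (grid[r][c] - mean) * (grid[rr][cc] - mean)
                        weight_sum += 1
    return (n / weight_sum) * (num / denom)
```

On a 4x4 grid whose left half is 1 and right half is 0 this returns 11/21 ≈ 0.52 (positive autocorrelation); on a 4x4 checkerboard it returns −1/7, because orthogonal neighbors disagree while diagonal neighbors agree.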

Visualization: Overlay gene expression or segmentation masks on WSIs for interpretability.

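```python
from hest.utils import visualize_overlay
coords = adata.obsm['spatial']    # spot centers (assumes the standard AnnData key)
expr = adata.obs_vector('GATA3')  # per-spot expression of one gene
fig = visualize_overlay(slide, coords, expr, cmap='coolwarm', alpha=0.6)
```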
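
An expression overlay with `alpha=0.6` amounts to mapping each value to a color and alpha-blending it over the underlying image pixel: out = α · color + (1 − α) · base. A minimal sketch, assuming a simplified blue-white-red ramp as a stand-in for Matplotlib's `coolwarm` (it does not reproduce that colormap exactly); both helpers are hypothetical.

```python
def diverging_color(value: float, vmin: float, vmax: float) -> tuple[float, float, float]:
    """Map `value` to RGB in [0, 1] on a simple blue-white-red ramp.

    Simplified stand-in for a diverging colormap such as 'coolwarm'.
    """
    if vmax <= vmin:
        raise ValueError("vmax must exceed vmin")
    t = min(max((value - vmin) / (vmax - vmin), 0.0), 1.0)  # clamp to [0, 1]
    if t < 0.5:
        s = t / 0.5  # blue (0, 0, 1) -> white (1, 1, 1)
        return (s, s, 1.0)
    s = (t - 0.5) / 0.5  # white (1, 1, 1) -> red (1, 0, 0)
    return (1.0, 1.0 - s, 1.0 - s)


def blend(base: tuple[float, ...], color: tuple[float, ...], alpha: float) -> tuple[float, ...]:
    """Alpha-composite `color` over `base`: alpha * color + (1 - alpha) * base."""
    return tuple(alpha * c + (1.0 - alpha) * b for c, b in zip(color, base))
```

Blending the highest value (pure red) over a white background at α = 0.6 gives (1.0, 0.4, 0.4), so the tissue stays visible beneath the color.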

4. End-to-End Analytical Workflows

The library supports end-to-end, protocol-driven analyses; Sections 6 and 7 of Jaume et al. (2024) present two exemplar workflows:

Biomarker exploration: Identify histomorphological correlates of expression in carcinoma samples by segmenting nuclei, averaging nuclear features per spot, and correlating these averages with transcript abundance (e.g., PCC ≈ 0.47 between GATA3 expression and nuclear area).
Multimodal representation learning: Construction of paired patch-expression datasets, used to fine-tune vision-language foundation models (e.g., CONCH) with contrastive losses (InfoNCE), enabling subsequent transfer and evaluation on external image cohorts for biomarker classification tasks.
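
The biomarker workflow above averages nuclear features within each spot and then correlates those averages with transcript abundance. A minimal sketch of both steps, assuming nuclei have already been assigned to spot IDs; the helper names and toy numbers are invented for illustration and do not reproduce the GATA3 result reported by Jaume et al. (2024).

```python
import math
from collections import defaultdict


def mean_feature_per_spot(nuclei: list[tuple[str, float]]) -> dict[str, float]:
    """Average a per-nucleus feature (e.g. area) over the nuclei in each spot."""
    sums: dict[str, float] = defaultdict(float)
    counts: dict[str, int] = defaultdict(int)
    for spot_id, value in nuclei:
        sums[spot_id] += value
        counts[spot_id] += 1
    return {spot: sums[spot] / counts[spot] for spot in sums}


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Toy data: (spot_id, nucleus area in um^2) and per-spot expression values.
nuclei = [("s1", 30.0), ("s1", 34.0), ("s2", 40.0), ("s3", 52.0), ("s3", 48.0)]
expression = {"s1": 1.0, "s2": 2.0, "s3": 3.0}

area = mean_feature_per_spot(nuclei)
spots = sorted(area)
r = pearson([area[s] for s in spots], [expression[s] for s in spots])
```

On the toy data the per-spot mean areas are 32, 40, and 50 µm² and r ≈ 0.998; aligning both vectors on the same sorted spot IDs is what keeps the correlation meaningful.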

These workflows require little boilerplate because HEST-Library integrates with AnnData and with common deep learning and visualization frameworks. In Jaume et al. (2024), Section 5 describes the model benchmarking interface (`hest.benchmark`), and Section 6 (Figure 1) illustrates the biomarker studies.
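
The contrastive fine-tuning in the multimodal representation-learning workflow uses an InfoNCE objective: for a batch of N matched (image patch, expression profile) embedding pairs, each patch should score its own profile higher than the other N − 1. A minimal pure-Python sketch of the image-to-expression direction, assuming L2-normalized embeddings and a default temperature of 0.07 (a common choice, not a value taken from the paper); CLIP-style training usually averages this with the expression-to-image direction.

```python
import math


def l2_normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce(img: list[list[float]], expr: list[list[float]], temperature: float = 0.07) -> float:
    """Mean InfoNCE loss where img[i] and expr[i] form the positive pair.

    logits[i][j] = cos(img[i], expr[j]) / temperature; the loss for row i is
    the cross-entropy of a softmax over j with target index i.
    """
    a = [l2_normalize(v) for v in img]
    b = [l2_normalize(v) for v in expr]
    total = 0.0
    for i, ai in enumerate(a):
        logits = [sum(x * y for x, y in zip(ai, bj)) / temperature for bj in b]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]
    return total / len(a)
```

With temperature 1.0 and perfectly aligned orthogonal embeddings the loss equals log(1 + e⁻¹) ≈ 0.313; lowering the temperature sharpens the softmax, so correctly paired embeddings are driven toward near-zero loss while mismatched ones are penalized heavily.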

5. Data Handling, Dependencies, and Performance Considerations

HEST-Library is optimized for high-throughput, interactive, and reproducible workflows:

Data formats: WSIs are read via OpenSlide and converted to pyramidal TIFF; transcript matrices are stored as AnnData objects compatible with scanpy>=1.9.
Alignment pipelines: Spot-to-tissue registration employs YOLOv8 for Visium (“faster_fiducial”) and VALIS for Xenium (Appendix Figure A2).
Preprocessing backends: Tissue and nuclear segmentation utilize DeepLabV3 and CellViT.
Batch correction: ComBat (pycombat), Harmony (harmonypy), and MNN (scanpy.external.pp.mnn_correct) are available for correcting batch effects across samples.
Scalability: Lazy WSI loading via OpenSlide; patch extraction and feature quantification support multiprocessing via num_workers or joblib.Parallel.
Software stack: Dependencies include torch>=1.10, torchvision, ultralytics (YOLOv8), openslide-python, scikit-image, geopandas, scikit-learn, and xgboost.
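
ComBat, listed above, models each gene's values within a batch with a batch-specific location and scale and then removes them. The sketch below keeps only that core location/scale idea: it standardizes one gene within each batch and rescales to the grand mean and the pooled within-batch standard deviation. It omits ComBat's empirical-Bayes shrinkage and covariate handling, so it is an illustrative simplification, not pycombat or HEST-Library's `correct_batch_effect`.

```python
import math
import statistics


def location_scale_adjust(values: list[float], batches: list[str]) -> list[float]:
    """Remove per-batch mean and scale differences for one gene.

    `values` holds the gene's expression across samples and `batches` gives each
    sample's batch label. Batches with zero variance collapse to the grand mean.
    """
    grand_mean = statistics.fmean(values)
    groups: dict[str, list[float]] = {}
    for v, b in zip(values, batches):
        groups.setdefault(b, []).append(v)
    means = {b: statistics.fmean(vs) for b, vs in groups.items()}
    sds = {b: statistics.pstdev(vs) for b, vs in groups.items()}
    # Pooled within-batch variance: squared deviations from each sample's batch mean.
    pooled_sd = math.sqrt(statistics.fmean([(v - means[b]) ** 2 for v, b in zip(values, batches)]))
    adjusted = []
    for v, b in zip(values, batches):
        z = (v - means[b]) / sds[b] if sds[b] > 0 else 0.0
        adjusted.append(grand_mean + z * pooled_sd)
    return adjusted
```

For values [1, 2, 3] in batch "a" and [11, 12, 13] in batch "b", both batches map to approximately [6, 7, 8]: the 10-unit offset between batches is removed while the within-batch ordering is preserved.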

This modularity and lazy loading plausibly suit both interactive exploration and large-scale batch analyses, for computational biologists and machine learning practitioners alike.

6. References, Utility, and Broader Impact

HEST-Library is described in Section 4 (“HEST-Library”) of Jaume et al. (2024), with schematic and pipeline details in Appendix Figures A1–A2. It directly supports research in spatial genomics, digital pathology, and multimodal learning, as demonstrated in the same paper by the HEST-Benchmark (Section 5), the biomarker analyses (Section 6), and the multimodal foundation model experiments (Section 7, Table 7). The resource is fully open, with tutorials and code at https://github.com/mahmoodlab/hest, and serves as a backbone for reproducible research and method development in spatial multi-omics.

References

Jaume, G., et al. (2024). HEST-1k: A Dataset for Spatial Transcriptomics and Histology Image Analysis.
