HEST-Benchmark: Pathology & Quantum Limits
- HEST-Benchmark is a dual-purpose construct defined as both a gene expression prediction evaluation suite in computational pathology and a theoretical framework for quantum shadow tomography.
- In pathology, the benchmark standardizes regression tasks on multi-cancer histopathology images by aligning patch-derived embeddings with gene expressions using state-of-the-art foundation models.
- In quantum information theory, HEST formalizes impossibility results for hyper-efficient shadow tomography under cryptographic assumptions, highlighting critical resource-bound limitations.
The designation "HEST-Benchmark" encompasses two distinct concepts in the current literature: (1) an evaluation suite for gene expression prediction from histopathology images within computational pathology, and (2) a theoretical construct in quantum information theory concerning the feasibility of hyper-efficient shadow tomography. The two usages are unrelated and are treated separately in the primary research literature.
1. HEST-Benchmark in Computational Pathology
HEST-Benchmark (Histopathology–Expression Spatial Transcriptomics Benchmark) provides a standardized, multivariate regression framework for evaluating how effectively fixed tile‐level feature extractors (foundation models) can predict gene expression levels from hematoxylin–eosin (H&E) stained whole-slide images (WSIs). Developed as part of the HEST-1k initiative, HEST-Benchmark leverages paired digital pathology and spatial transcriptomics data to assess cross-modal representation quality in cancer contexts (Jaume et al., 2024).
1.1 Dataset and Task Structure
The benchmark encompasses nine supervised regression tasks, each targeting a primary human cancer or metastatic indication:
| Task ID | Cancer Type | Organ | Patients | Technology |
|---|---|---|---|---|
| 1 | Invasive ductal carcinoma | Breast | 4 | Xenium |
| 2 | Prostate adenocarcinoma | Prostate | 2 | Visium |
| 3 | Pancreatic adenocarcinoma | Pancreas | 3 | Xenium |
| 4 | Melanoma | Skin | 2 | Xenium |
| 5 | Colon adenocarcinoma | Colon | 2 | Xenium |
| 6 | Rectal adenocarcinoma | Rectum | 2 | Visium |
| 7 | Clear cell renal carcinoma | Kidney | 24 | Visium |
| 8 | Lung adenocarcinoma | Lung | 2 | Xenium |
| 9 | Axillary node metastasis | Lymph node | 4 | Visium |
For each indication, the 50 most variable genes across patients are selected as prediction targets. Sample pairing is controlled by aligning ST spots to image patches (224×224 pixels at 20× magnification) drawn from tumor regions, yielding millions of patch–expression pairs in total. Each regression task entails predicting log-normalized expression of these genes from image-derived embeddings.
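The target-construction step described above can be sketched as follows; this is a simplified illustration, and the normalization constant and variance-ranking details are assumptions rather than HEST's exact preprocessing:

```python
import numpy as np

def lognorm_and_select(counts, k=50):
    """Log-normalize raw ST counts and pick the top-k most variable genes.

    counts: (n_spots, n_genes) raw count matrix.
    Returns the log-normalized matrix restricted to the k selected genes,
    plus their column indices.
    """
    # Library-size normalization to a common scale, then log1p.
    libsize = counts.sum(axis=1, keepdims=True)
    norm = counts / np.maximum(libsize, 1) * 1e4
    log_expr = np.log1p(norm)
    # Rank genes by variance across spots; keep the k most variable.
    top_k = np.argsort(log_expr.var(axis=0))[::-1][:k]
    return log_expr[:, top_k], top_k
```

In the benchmark itself, variability would be assessed across patients per indication; the per-spot variance used here is a stand-in for that criterion.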
1.2 Feature Extraction and Embedding Pipeline
Feature extraction employs a variety of state-of-the-art patch encoders, ranging from classical supervised (ResNet-50 ImageNet) to large-scale self-supervised ViT-based foundation models. Each slide or patch generates an embedding, typically formed by concatenating special tokens and pooled representations (e.g., [CLS] token plus mean patch tokens). To ensure comparability, all model outputs are projected via principal component analysis (PCA) into a common 256-dimensional embedding space prior to regression modeling (Jaume et al., 2024, Filiot et al., 27 Jan 2025).
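The embedding pipeline above (concatenate [CLS] with mean-pooled patch tokens, then project to 256 dimensions via PCA) can be sketched as follows; the pooling scheme and PCA fitting scope (per task vs. global) are assumptions for illustration:

```python
import numpy as np
from sklearn.decomposition import PCA

def build_embeddings(cls_tokens, patch_tokens, n_components=256, seed=0):
    """Form patch embeddings as [CLS] ++ mean-pooled tokens, then PCA-project.

    cls_tokens:   (n_patches, d) [CLS] token per patch.
    patch_tokens: (n_patches, n_tokens, d) spatial tokens per patch.
    Returns (n_patches, n_components) PCA-projected features.
    """
    pooled = patch_tokens.mean(axis=1)                       # mean over spatial tokens
    features = np.concatenate([cls_tokens, pooled], axis=1)  # (n_patches, 2d)
    pca = PCA(n_components=n_components, random_state=seed)
    return pca.fit_transform(features)
```

Projecting every encoder into the same 256-dimensional space removes embedding-width as a confound when comparing models of very different sizes.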
1.3 Evaluation Protocols and Metrics
Downstream regression employs a fixed protocol of ℓ2-penalized ridge regression (with prescribed hyperparameters) to predict the gene expression panel from PCA-projected embeddings. The primary metric is the Pearson correlation coefficient computed per gene and averaged across the 50 gene targets per indication:

$$
r_g = \frac{\sum_{i=1}^{N}\,(y_{i,g}-\bar{y}_g)\,(\hat{y}_{i,g}-\bar{\hat{y}}_g)}{\sqrt{\sum_{i=1}^{N}(y_{i,g}-\bar{y}_g)^2}\;\sqrt{\sum_{i=1}^{N}(\hat{y}_{i,g}-\bar{\hat{y}}_g)^2}},
\qquad
r = \frac{1}{50}\sum_{g=1}^{50} r_g,
$$

where $y_{i,g}$ and $\hat{y}_{i,g}$ denote the true and predicted expression of gene $g$ at spot $i$, and $\bar{y}_g$, $\bar{\hat{y}}_g$ are their means over the $N$ spots. Cross-validation is performed with patient-stratified splits (k = number of patients, adjusted for ccRCC).
No task-wise statistical significance tests (e.g., paired t-tests or bootstrap p-values) are reported; results are primarily summarized as average r over folds and tasks.
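A minimal sketch of this evaluation loop, assuming leave-one-patient-out folds and a ridge penalty of α = 1.0 (the benchmark's prescribed hyperparameters may differ):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import LeaveOneGroupOut
from scipy.stats import pearsonr

def evaluate_task(X, Y, patient_ids, alpha=1.0):
    """Patient-stratified CV with ridge regression; score = mean Pearson r.

    X: (n_spots, 256) PCA-projected embeddings.
    Y: (n_spots, 50) log-normalized expression targets.
    """
    fold_scores = []
    for train, test in LeaveOneGroupOut().split(X, Y, groups=patient_ids):
        model = Ridge(alpha=alpha).fit(X[train], Y[train])
        pred = model.predict(X[test])
        # Pearson r per gene on the held-out patient(s), averaged over genes.
        rs = [pearsonr(Y[test][:, g], pred[:, g])[0] for g in range(Y.shape[1])]
        fold_scores.append(float(np.nanmean(rs)))
    return float(np.mean(fold_scores))
```

Freezing the regressor and its hyperparameters means differences in the score reflect embedding quality rather than downstream tuning.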
1.4 Model Comparisons and Benchmark Results
Performance follows a clear scaling trend with foundation-model size and training-data regime. Billion-parameter ViT-Giant models (e.g., H-Optimus-0) achieve the highest average correlations (r ≈ 0.4146), closely followed by advanced medium-scale ViTs and multimodal models (UNIv1.5, Virchow 2). The distilled model H0-mini (ViT-Base, 86M parameters) closes more than half of the performance gap to its 1.1B-parameter teacher, achieving r = 0.4044, and surpasses comparably sized and even much larger models (e.g., Virchow 2, 632M) on the average score (Filiot et al., 27 Jan 2025).
| Model | Parameter Count | Average r (PCA+Ridge) |
|---|---|---|
| H-Optimus-0 | 1.1B | 0.4146 |
| UNIv1.5 | 1.1B | 0.4090 |
| Virchow 2 | 632M | 0.3984 |
| H0-mini | 86M | 0.4044 (third overall) |
Per-task scores vary widely, with the highest observed on melanoma (SKCM: r = 0.6432); this task-difficulty heterogeneity points to biological variability in the strength of morphologic–molecular associations.
1.5 Design Recommendations and Future Advances
HEST-Benchmark’s structure facilitates rigorous, reproducible comparisons of fixed encoder representations, independent of downstream tuning. Recommendations for future development include expanding task and organ coverage, evaluating classification and segmentation settings, applying noise-robust or multitask heads, and addressing inter-cohort batch effects. The benchmark is also positioned to track the scaling behavior of new vision-language and foundation encoders as additional digital pathology/omics datasets and assays emerge (Jaume et al., 2024).
2. HEST in Quantum Information Theory: Hyper-Efficient Shadow Tomography
HEST also denotes "Hyper-Efficient Shadow Tomography," a conceptual framework in quantum state learning. Formally, an $(\varepsilon, \delta)$-HEST algorithm is a shadow-tomography procedure that, given copies of an unknown $n$-qubit state $\rho$ and a class of two-outcome measurements $\mathcal{M} = \{M_1, \dots, M_m\}$, outputs a classical summary enabling simultaneous estimation of each $\mathrm{Tr}(M_i \rho)$ up to accuracy $\varepsilon$ with failure probability at most $\delta$, under sample and computational complexity both polynomial in $n$, $\log m$, $1/\varepsilon$, and $1/\delta$ (Champion et al., 2024).
A weaker notion, WEST (Weakly-Efficient Shadow Tomography), relaxes the estimator to unbounded runtime but requires the sampling to be efficient.
3. Impossibility Theorems for HEST (Quantum Setting)
A principal result is the impossibility of hyper-efficient shadow tomography for general mixed quantum states and arbitrary measurement families under standard cryptographic assumptions. Specifically, the existence of a HEST algorithm for general families of two-outcome measurements would permit a generic break of any collusion-resistant untelegraphable encryption (UTE) scheme, contradicting the foundational security of such cryptosystems (Champion et al., 2024).
Formal impossibility statements:
- If a HEST algorithm with sample and time complexity polynomial in $n$, $\log m$, $1/\varepsilon$, and $1/\delta$ existed, the $t$-copy security of collusion-resistant UTE could be broken.
- No such HEST exists assuming the existence of one-way functions (OWF), CPA-secure secret-key encryption, or even pseudorandom-state generators.
- The same conclusion applies to WEST; even with unbounded post-processing in the quantum random oracle model, efficient shadow-tomography protocols contradict everlasting security of collusion-resistant UTE.
Consequently, for arbitrary POVM families and general mixed states, HEST is infeasible; only protocols with super-polynomial resource requirements are possible.
4. Methodological Foundations and Proof Techniques
The central proof technique reduces a HEST algorithm to a cryptographic break:
- Constructing a mapping from UTE ciphertexts to quantum states, with each decryption key corresponding to a two-outcome measurement;
- Feeding the state and measurement descriptions to a shadow tomography procedure, whose output enables extraction of all message bits, thus violating UTE's intended receiver security.
This reduction extends under both information-theoretic and computational security assumptions, leveraging constructions from one-way secure UTE and bounded-query secret-key cryptography based on pseudorandom-state generators.
5. Implications and Scope for Practical Benchmarking
The no-go theorems for HEST have critical implications:
- No universal, polynomial-time, and polynomial-sample shadow tomography exists for arbitrary measurement families and general mixed states.
- Practical benchmarks for shadow tomography must restrict resource budgets, measurement families (e.g., rank-1 projectors, Pauli observables), or the number of distinct measurement settings to polynomial regimes.
- Efficient classical shadow protocols remain feasible only for limited measurement classes (e.g., random Pauli projections for pure-state estimation), which do not cover universal tomography for all mixed states.
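As a concrete contrast to the no-go results, the restricted setting that remains tractable can be illustrated with a minimal single-qubit classical-shadow estimator using random Pauli-basis measurements; the shot count and target observable below are illustrative choices, not parameters from the source:

```python
import numpy as np

# Pauli eigenbases: columns are the +1 and -1 eigenvectors of X, Y, Z.
BASES = [
    np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2),    # X
    np.array([[1, 1], [1j, -1j]], dtype=complex) / np.sqrt(2),  # Y
    np.eye(2, dtype=complex),                                   # Z
]
I2 = np.eye(2, dtype=complex)

def classical_shadow_estimate(rho, obs, n_shots, seed=0):
    """Estimate Tr(obs @ rho) for a single qubit via random Pauli shadows.

    Each shot: pick a random Pauli basis, sample the measurement outcome,
    and form the inverted-channel snapshot 3|v><v| - I. The average of
    Tr(obs @ snapshot) over shots converges to Tr(obs @ rho).
    """
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_shots):
        basis = BASES[rng.integers(3)]
        # Born-rule outcome probabilities <v_b| rho |v_b>.
        probs = np.real([basis[:, b].conj() @ rho @ basis[:, b] for b in range(2)])
        b = rng.choice(2, p=probs / probs.sum())
        v = basis[:, b]
        snapshot = 3 * np.outer(v, v.conj()) - I2
        total += np.real(np.trace(obs @ snapshot))
    return total / n_shots
```

The inversion formula tensorizes across qubits, but the resulting efficiency depends essentially on the measurement family being restricted to Pauli bases, consistent with the limited-class caveat above.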
For realistic benchmarking, parameter regimes must reflect these fundamental lower bounds: universal HEST cannot serve as a practical benchmark without relaxing assumptions or limiting the scope to computationally or physically tractable ensembles (Champion et al., 2024).
6. Additional Definitions and Related Constructs
The literature also introduces several closely related constructs:
- Weakly-Efficient Shadow Tomography (WEST): Efficient sampling, possibly unbounded post-processing; also ruled out for general POVMs.
- Constructions such as bounded-query SK-NCE from pseudorandom states, untelegraphable secret sharing (UTSS), and untelegraphable functional encryption (UTFE), illustrating broader impacts of UTE and HEST impossibility beyond tomography—particularly in contexts demanding resistance to classical extraction of quantum secrets (Champion et al., 2024).
7. Synthesis and Distinctions across Domains
HEST-Benchmark in computational pathology and HEST in quantum information are contextually distinct: the former is an operational performance benchmark for foundation model evaluation in biomedical imaging; the latter encapsulates a theoretical boundary precluding universally efficient quantum shadow tomography under cryptographic hardness assumptions. Both serve as reference points—one as a measurement tool for cross-modal learning, the other as a theoretical barrier shaping expectations for quantum state prediction protocols (Jaume et al., 2024, Filiot et al., 27 Jan 2025, Champion et al., 2024).