HEST-Benchmark: Pathology & Quantum Limits
- HEST-Benchmark is a dual-purpose construct defined as both a gene expression prediction evaluation suite in computational pathology and a theoretical framework for quantum shadow tomography.
- In pathology, the benchmark standardizes regression tasks on multi-cancer histopathology images by aligning patch-derived embeddings with gene expressions using state-of-the-art foundation models.
- In quantum information theory, HEST formalizes impossibility results for hyper-efficient shadow tomography under cryptographic assumptions, highlighting critical resource-bound limitations.
The designation "HEST-Benchmark" encompasses two distinct concepts in the current literature: (1) an evaluation suite for gene expression prediction from histopathology images within computational pathology, and (2) a theoretical construct in quantum information theory concerning the feasibility of hyper-efficient shadow tomography. The two usages are unrelated and are treated separately in the primary research literature.
1. HEST-Benchmark in Computational Pathology
HEST-Benchmark (Histopathology–Expression Spatial Transcriptomics Benchmark) provides a standardized, multivariate regression framework for evaluating how effectively fixed tile‐level feature extractors (foundation models) can predict gene expression levels from hematoxylin–eosin (H&E) stained whole-slide images (WSIs). Developed as part of the HEST-1k initiative, HEST-Benchmark leverages paired digital pathology and spatial transcriptomics data to assess cross-modal representation quality in cancer contexts (Jaume et al., 2024).
1.1 Dataset and Task Structure
The benchmark encompasses nine supervised regression tasks, each targeting a primary human cancer or metastatic indication:
| Task ID | Cancer Type | Organ | Patients | Technology |
|---|---|---|---|---|
| 1 | Invasive ductal carcinoma | Breast | 4 | Xenium |
| 2 | Prostate adenocarcinoma | Prostate | 2 | Visium |
| 3 | Pancreatic adenocarcinoma | Pancreas | 3 | Xenium |
| 4 | Melanoma | Skin | 2 | Xenium |
| 5 | Colon adenocarcinoma | Colon | 2 | Xenium |
| 6 | Rectal adenocarcinoma | Rectum | 2 | Visium |
| 7 | Clear cell renal carcinoma | Kidney | 24 | Visium |
| 8 | Lung adenocarcinoma | Lung | 2 | Xenium |
| 9 | Axillary node metastasis | Lymph node | 4 | Visium |
For each indication, the 50 most variable genes across patients are selected as prediction targets. Sample pairing is controlled by aligning ST spots to image patches (224×224 pixels at 20× magnification) drawn from tumor regions, yielding millions of patch–expression pairs in total. Each regression task entails predicting log-normalized expression of these genes from image-derived embeddings.
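The target-construction step described above can be sketched as follows; this is a simplified illustration, and the normalization constant and variance-ranking details are assumptions rather than HEST's exact preprocessing:

```python
import numpy as np

def lognorm_and_select(counts, k=50):
    """Log-normalize raw ST counts and pick the top-k most variable genes.

    counts: (n_spots, n_genes) raw count matrix.
    Returns the log-normalized matrix restricted to the k selected genes,
    plus their column indices.
    """
    # Library-size normalization to a common scale, then log1p.
    libsize = counts.sum(axis=1, keepdims=True)
    norm = counts / np.maximum(libsize, 1) * 1e4
    log_expr = np.log1p(norm)
    # Rank genes by variance across spots; keep the k most variable.
    top_k = np.argsort(log_expr.var(axis=0))[::-1][:k]
    return log_expr[:, top_k], top_k
```

In the benchmark itself, variability would be assessed across patients per indication; the per-spot variance used here is a stand-in for that criterion.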
1.2 Feature Extraction and Embedding Pipeline
Feature extraction employs a variety of state-of-the-art patch encoders, ranging from classical supervised (ResNet-50 ImageNet) to large-scale self-supervised ViT-based foundation models. Each slide or patch generates an embedding, typically formed by concatenating special tokens and pooled representations (e.g., [CLS] token plus mean patch tokens). To ensure comparability, all model outputs are projected via principal component analysis (PCA) into a common 256-dimensional embedding space prior to regression modeling (Jaume et al., 2024, Filiot et al., 27 Jan 2025).
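The embedding pipeline above (concatenate [CLS] with mean-pooled patch tokens, then project to 256 dimensions via PCA) can be sketched as follows; the pooling scheme and PCA fitting scope (per task vs. global) are assumptions for illustration:

```python
import numpy as np
from sklearn.decomposition import PCA

def build_embeddings(cls_tokens, patch_tokens, n_components=256, seed=0):
    """Form patch embeddings as [CLS] ++ mean-pooled tokens, then PCA-project.

    cls_tokens:   (n_patches, d) [CLS] token per patch.
    patch_tokens: (n_patches, n_tokens, d) spatial tokens per patch.
    Returns (n_patches, n_components) PCA-projected features.
    """
    pooled = patch_tokens.mean(axis=1)                       # mean over spatial tokens
    features = np.concatenate([cls_tokens, pooled], axis=1)  # (n_patches, 2d)
    pca = PCA(n_components=n_components, random_state=seed)
    return pca.fit_transform(features)
```

Projecting every encoder into the same 256-dimensional space removes embedding-width as a confound when comparing models of very different sizes.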
1.3 Evaluation Protocols and Metrics
Downstream regression employs a fixed protocol of ℓ2-penalized ridge regression (with prescribed hyperparameters) to predict the gene expression panel from PCA-projected embeddings. The primary metric is the Pearson correlation coefficient computed per gene and averaged across the 50 gene targets per indication:

$$
r_g = \frac{\sum_{i=1}^{N}\,(y_{i,g}-\bar{y}_g)\,(\hat{y}_{i,g}-\bar{\hat{y}}_g)}{\sqrt{\sum_{i=1}^{N}(y_{i,g}-\bar{y}_g)^2}\;\sqrt{\sum_{i=1}^{N}(\hat{y}_{i,g}-\bar{\hat{y}}_g)^2}},
\qquad
r = \frac{1}{50}\sum_{g=1}^{50} r_g,
$$

where $y_{i,g}$ and $\hat{y}_{i,g}$ denote the true and predicted expression of gene $g$ at spot $i$, and $\bar{y}_g$, $\bar{\hat{y}}_g$ are their means over the $N$ spots. Cross-validation is performed with patient-stratified splits (k = number of patients, adjusted for ccRCC).
No task-wise statistical significance tests (e.g., paired t-tests or bootstrap p-values) are reported; results are primarily summarized as average r over folds and tasks.
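A minimal sketch of this evaluation loop, assuming leave-one-patient-out folds and a ridge penalty of α = 1.0 (the benchmark's prescribed hyperparameters may differ):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import LeaveOneGroupOut
from scipy.stats import pearsonr

def evaluate_task(X, Y, patient_ids, alpha=1.0):
    """Patient-stratified CV with ridge regression; score = mean Pearson r.

    X: (n_spots, 256) PCA-projected embeddings.
    Y: (n_spots, 50) log-normalized expression targets.
    """
    fold_scores = []
    for train, test in LeaveOneGroupOut().split(X, Y, groups=patient_ids):
        model = Ridge(alpha=alpha).fit(X[train], Y[train])
        pred = model.predict(X[test])
        # Pearson r per gene on the held-out patient(s), averaged over genes.
        rs = [pearsonr(Y[test][:, g], pred[:, g])[0] for g in range(Y.shape[1])]
        fold_scores.append(float(np.nanmean(rs)))
    return float(np.mean(fold_scores))
```

Freezing the regressor and its hyperparameters means differences in the score reflect embedding quality rather than downstream tuning.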
1.4 Model Comparisons and Benchmark Results
Performance follows a clear scaling trend with foundation-model size and training-data regime. Billion-parameter ViT-Giant models (e.g., H-Optimus-0) achieve the highest average correlations (r ≈ 0.4146), closely followed by advanced medium-scale ViTs and multimodal models (UNIv1.5, Virchow 2). The distilled model H0-mini (ViT-Base, 86M parameters) closes more than half of the performance gap to its 1.1B-parameter teacher, achieving r = 0.4044, and surpasses comparably sized and even much larger models (e.g., Virchow 2, 632M) on the average score (Filiot et al., 27 Jan 2025).
| Model | Parameter Count | Average r (PCA+Ridge) |
|---|---|---|
| H-Optimus-0 | 1.1B | 0.4146 |
| UNIv1.5 | 1.1B | 0.4090 |
| Virchow 2 | 632M | 0.3984 |
| H0-mini | 86M | 0.4044 (third overall) |
Per-task scores vary widely, with the highest observed on melanoma (SKCM: r = 0.6432); this task-difficulty heterogeneity points to biological variability in the strength of morphologic–molecular associations.
1.5 Design Recommendations and Future Advances
HEST-Benchmark’s structure facilitates rigorous, reproducible comparisons of fixed encoder representations, independent of downstream tuning. Recommendations for future development include expanding task and organ coverage, evaluating classification and segmentation settings, applying noise-robust or multitask heads, and addressing inter-cohort batch effects. The benchmark is also positioned to track the scaling behavior of new vision-language and foundation encoders as additional digital pathology/omics datasets and assays emerge (Jaume et al., 2024).
2. HEST in Quantum Information Theory: Hyper-Efficient Shadow Tomography
HEST also denotes "Hyper-Efficient Shadow Tomography," a conceptual framework in quantum state learning. Formally, an $(\varepsilon, \delta)$-HEST algorithm is a shadow-tomography procedure that, given copies of an unknown $n$-qubit state $\rho$ and a class of two-outcome measurements $\mathcal{M} = \{M_1, \dots, M_m\}$, outputs a classical summary enabling simultaneous estimation of each $\mathrm{Tr}(M_i \rho)$ up to accuracy $\varepsilon$ with failure probability at most $\delta$, under sample and computational complexity both polynomial in $n$, $\log m$, $1/\varepsilon$, and $1/\delta$ (Champion et al., 2024).
A weaker notion, WEST (Weakly-Efficient Shadow Tomography), relaxes the estimator to unbounded runtime but requires the sampling to be efficient.
3. Impossibility Theorems for HEST (Quantum Setting)
A principal result is the impossibility of hyper-efficient shadow tomography for general mixed quantum states and arbitrary measurement families under standard cryptographic assumptions. Specifically, the existence of a HEST algorithm for general families of two-outcome measurements would permit a generic break of any collusion-resistant untelegraphable encryption (UTE) scheme, contradicting the foundational security of such cryptosystems (Champion et al., 2024).
Formal impossibility statements:
- If a HEST algorithm with sample and time complexity polynomial in $n$, $\log m$, $1/\varepsilon$, and $1/\delta$ existed, the $t$-copy security of collusion-resistant UTE could be broken.
- No such HEST exists assuming the existence of one-way functions (OWF), CPA-secure secret-key encryption, or even pseudorandom-state generators.
- The same conclusion applies to WEST; even with unbounded post-processing in the quantum random oracle model, efficient shadow-tomography protocols contradict everlasting security of collusion-resistant UTE.
Consequently, for arbitrary POVM families and general mixed states, HEST is infeasible; only protocols with super-polynomial resource requirements are possible.
4. Methodological Foundations and Proof Techniques
The central proof technique reduces a HEST algorithm to a cryptographic break:
- Constructing a mapping from UTE ciphertexts to quantum states, with each decryption key corresponding to a two-outcome measurement;
- Feeding the state and measurement descriptions to a shadow tomography procedure, whose output enables extraction of all message bits, thus violating UTE's intended receiver security.
This reduction extends under both information-theoretic and computational security assumptions, leveraging constructions from one-way secure UTE and bounded-query secret-key cryptography based on pseudorandom-state generators.
5. Implications and Scope for Practical Benchmarking
The no-go theorems for HEST have critical implications:
- No universal, polynomial-time, and polynomial-sample shadow tomography exists for arbitrary measurement families and general mixed states.
- Practical benchmarks for shadow tomography must restrict resource budgets, measurement families (e.g., rank-1 projectors, Pauli observables), or the number of distinct measurement settings to polynomial regimes.
- Efficient classical shadow protocols remain feasible only for limited measurement classes (e.g., random Pauli projections for pure-state estimation), which do not cover universal tomography for all mixed states.
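As a concrete contrast to the no-go results, the restricted setting that remains tractable can be illustrated with a minimal single-qubit classical-shadow estimator using random Pauli-basis measurements; the shot count and target observable below are illustrative choices, not parameters from the source:

```python
import numpy as np

# Pauli eigenbases: columns are the +1 and -1 eigenvectors of X, Y, Z.
BASES = [
    np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2),    # X
    np.array([[1, 1], [1j, -1j]], dtype=complex) / np.sqrt(2),  # Y
    np.eye(2, dtype=complex),                                   # Z
]
I2 = np.eye(2, dtype=complex)

def classical_shadow_estimate(rho, obs, n_shots, seed=0):
    """Estimate Tr(obs @ rho) for a single qubit via random Pauli shadows.

    Each shot: pick a random Pauli basis, sample the measurement outcome,
    and form the inverted-channel snapshot 3|v><v| - I. The average of
    Tr(obs @ snapshot) over shots converges to Tr(obs @ rho).
    """
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_shots):
        basis = BASES[rng.integers(3)]
        # Born-rule outcome probabilities <v_b| rho |v_b>.
        probs = np.real([basis[:, b].conj() @ rho @ basis[:, b] for b in range(2)])
        b = rng.choice(2, p=probs / probs.sum())
        v = basis[:, b]
        snapshot = 3 * np.outer(v, v.conj()) - I2
        total += np.real(np.trace(obs @ snapshot))
    return total / n_shots
```

The inversion formula tensorizes across qubits, but the resulting efficiency depends essentially on the measurement family being restricted to Pauli bases, consistent with the limited-class caveat above.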
For realistic benchmarking, parameter regimes must reflect these fundamental lower bounds: universal HEST cannot serve as a practical benchmark without relaxing assumptions or limiting the scope to computationally or physically tractable ensembles (Champion et al., 2024).
6. Additional Definitions and Related Constructs
The literature also introduces several closely related constructs:
- Weakly-Efficient Shadow Tomography (WEST): Efficient sampling, possibly unbounded post-processing; also ruled out for general POVMs.
- Constructions such as bounded-query SK-NCE from pseudorandom states, untelegraphable secret sharing (UTSS), and untelegraphable functional encryption (UTFE), illustrating broader impacts of UTE and HEST impossibility beyond tomography—particularly in contexts demanding resistance to classical extraction of quantum secrets (Champion et al., 2024).
7. Synthesis and Distinctions across Domains
HEST-Benchmark in computational pathology and HEST in quantum information are contextually distinct: the former is an operational performance benchmark for foundation model evaluation in biomedical imaging; the latter encapsulates a theoretical boundary precluding universally efficient quantum shadow tomography under cryptographic hardness assumptions. Both serve as reference points—one as a measurement tool for cross-modal learning, the other as a theoretical barrier shaping expectations for quantum state prediction protocols (Jaume et al., 2024, Filiot et al., 27 Jan 2025, Champion et al., 2024).