Cross-Entropy Benchmark Fidelity

Updated 14 November 2025
  • Cross-Entropy Benchmark Fidelity is a metric that links cross-entropy estimators to true model performance in domains such as quantum circuit sampling, likelihood-free inference, and classification.
  • It utilizes tractable surrogate losses like XEB and enhanced estimators (Alice/Alices) to approximate complex quantities such as likelihood ratios, quantum state fidelity, and classifier risk.
  • Practical recommendations highlight testing alternative losses and incorporating augmented data to mitigate noise effects and spoofing vulnerabilities in both quantum and classical systems.

Cross-entropy benchmark fidelity refers to the empirical and theoretical linkage between cross-entropy-based statistical estimators and the true performance or accuracy (“fidelity”) of models or physical devices, especially in likelihood-free inference, quantum circuit benchmarking, and deep network evaluation. Across machine learning, quantum information, and simulation-based inference, cross-entropy functionals serve as practical proxies for intractable quantities such as likelihood ratios, circuit fidelities, or true classifier risks. Rigorous understanding of how closely cross-entropy metrics track ground-truth performance—i.e., benchmark fidelity—is crucial for interpreting results, designing robust inference procedures, and defending claims of computational or statistical advantage.

1. Mathematical Definitions and Roles of Cross-Entropy Metrics

In statistical learning and inference, cross-entropy functionals provide tractable surrogates for evaluating model fit and discriminatory power. In the context of supervised classification, the (multi-class) cross-entropy loss for a neural network outputting logits f(x) and one-hot label y, after softmax normalization, is

\ell_{\mathrm{CE}}(f(x),y) = -\log p_{y}(x)\,,

where p_{i}(x) = \mathrm{softmax}(f(x))_i. This loss is both a maximum likelihood objective under a multinomial model and a convex surrogate that upper-bounds the 0–1 classification error up to a constant factor.
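
As a concrete, minimal illustration, the following NumPy sketch computes this loss from raw logits (the function names are illustrative, not taken from any cited paper):

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy_loss(logits, label):
    # ell_CE(f(x), y) = -log p_y(x) for a single example with integer label y.
    return -np.log(softmax(logits)[label])

# Example: three classes, true class index 2.
print(cross_entropy_loss(np.array([1.0, 0.5, 2.0]), label=2))
```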

In likelihood-free inference for implicit generative models, cross-entropy-based estimators are used to fit likelihood ratios via binary classification surrogates. The population-level functional is

L[\hat{s}] = -\mathbb{E}_{x \sim \frac{1}{2}p(x|\theta_0)+\frac{1}{2}p(x|\theta_1)} \big[ s(x|\theta_0,\theta_1)\log \hat{s}(x) + (1 - s(x|\theta_0,\theta_1))\log(1-\hat{s}(x)) \big]

with s(x|\theta_0,\theta_1) = p(x|\theta_1)/(p(x|\theta_0) + p(x|\theta_1)). Empirical estimators replace population expectations by sample means over synthetic or augmented data.
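
As a sketch of how this "likelihood-ratio trick" works in practice, the toy example below (Gaussian stand-in simulators and a plain logistic model are assumptions for illustration) fits ŝ by minimizing the empirical cross-entropy and then reads off the likelihood ratio as ŝ/(1 − ŝ):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy implicit simulators: p(x|theta_0) = N(0,1), p(x|theta_1) = N(1,1).
x0 = rng.normal(0.0, 1.0, 5000)                       # class label 0
x1 = rng.normal(1.0, 1.0, 5000)                       # class label 1
x = np.concatenate([x0, x1])
y = np.concatenate([np.zeros_like(x0), np.ones_like(x1)])

# Logistic model s_hat(x) = sigmoid(w*x + b), fit by gradient descent on the
# empirical version of the cross-entropy functional L[s_hat] above.
w, b = 0.0, 0.0
for _ in range(3000):
    s_hat = 1.0 / (1.0 + np.exp(-(w * x + b)))
    g = s_hat - y                                     # gradient of BCE w.r.t. the logit
    w -= 0.1 * np.mean(g * x)
    b -= 0.1 * np.mean(g)

# The population minimizer is s(x) = p(x|theta_1)/(p(x|theta_0)+p(x|theta_1)),
# so p(x|theta_1)/p(x|theta_0) is recovered as s_hat / (1 - s_hat).
x_test = 0.5
s = 1.0 / (1.0 + np.exp(-(w * x_test + b)))
print("estimated ratio at x=0.5:", s / (1 - s))       # exact value is 1.0
```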

In quantum circuit benchmarking, the linear cross-entropy benchmark (XEB) is

\mathrm{XEB}_U(q) = 2^n \sum_x p_U(x)q(x) - 1\,,

with p_U(x) the ideal output distribution and q(x) the empirical distribution; the result is conventionally reported as a normalized score reflecting circuit fidelity.
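
In sample-mean form, the same quantity is estimated as 2^n times the average ideal probability of the observed bitstrings, minus one. A toy sketch follows, in which a Porter-Thomas-style surrogate distribution stands in for a real random circuit (an assumption made purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10
N = 2 ** n

# Surrogate "ideal" distribution p_U with Porter-Thomas-like statistics.
p_ideal = rng.exponential(1.0, N)
p_ideal /= p_ideal.sum()

def linear_xeb(samples, p_ideal):
    # Sample-mean estimator of XEB_U(q) = 2^n * sum_x p_U(x) q(x) - 1.
    return p_ideal.size * p_ideal[samples].mean() - 1.0

ideal_samples = rng.choice(N, size=20_000, p=p_ideal)    # perfect sampler
uniform_samples = rng.integers(0, N, size=20_000)        # fully decohered sampler
print("ideal sampler:  ", linear_xeb(ideal_samples, p_ideal))    # about 1
print("uniform sampler:", linear_xeb(uniform_samples, p_ideal))  # about 0
```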

2. Cross-Entropy Benchmarking: Quantum and Classical Contexts

Quantum Random Circuit Sampling and XEB

In random quantum circuit sampling experiments, XEB is widely used as a proxy for the fidelity F between the ideal circuit output |\psi_U\rangle = U|0^n\rangle and the actual noisy output distribution Q_U(x) = \langle x|\mathcal{E}_U(|0^n\rangle\langle 0^n|)|x\rangle. Under global depolarizing or weakly correlated noise,

F_\mathrm{XEB} \approx F

holds up to small corrections, provided the circuit ensemble scrambles errors rapidly and no substantial time- or space-localized “sinks” exist (Cheng et al., 13 Feb 2025, Gao et al., 2021).
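
A quick numerical check of this relation under a simple global depolarizing model: with probability F the device returns an ideal sample, otherwise a uniformly random bitstring. Both the noise model and the Porter-Thomas surrogate are illustrative assumptions, not the circuits of the cited experiments:

```python
import numpy as np

rng = np.random.default_rng(2)
n, N = 10, 2 ** 10
p_ideal = rng.exponential(1.0, N)
p_ideal /= p_ideal.sum()

F = 0.3                                   # target fidelity of the toy device
shots = 100_000
ideal = rng.choice(N, size=shots, p=p_ideal)
uniform = rng.integers(0, N, size=shots)
noisy = np.where(rng.random(shots) < F, ideal, uniform)

xeb = N * p_ideal[noisy].mean() - 1.0
print(f"F = {F}, estimated XEB = {xeb:.3f}")   # XEB tracks F under this noise model
```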

When these assumptions fail, for instance under spatially or temporally correlated errors or adversarial manipulation, the equivalence breaks down: XEB can then indicate high fidelity even in the absence of genuine quantum coherence.

Alternative Cross-Entropy Metrics in Hamiltonian Simulation

In Hamiltonian-simulation–motivated quantum supremacy experiments, the System Linear Cross Entropy Score (sXES) is defined as

\mathrm{sXES}(U) = \sum_{x \neq 0^n} \pi_U(x) p_\mathrm{exp}(x)

to address sampling nonuniformity in block-encoded simulations. sXES’s complexity-theoretic hardness is conjectured under sXQUATH, an extension of the original XQUATH conjecture for XEB (Tanggara et al., 1 May 2024).
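
A minimal sketch of evaluating this score, assuming \pi_U(x) denotes the ideal output probability and p_\mathrm{exp}(x) the empirical frequency observed in the experiment (this reading of the symbols, and the surrogate ideal distribution, are assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
n, N = 8, 2 ** 8

pi_U = rng.exponential(1.0, N)            # stand-in for the ideal distribution
pi_U /= pi_U.sum()

samples = rng.choice(N, size=10_000, p=pi_U)           # "experimental" bitstrings
p_exp = np.bincount(samples, minlength=N) / samples.size

mask = np.arange(N) != 0                  # exclude the x = 0^n outcome
sxes = float(np.sum(pi_U[mask] * p_exp[mask]))
print("sXES:", sxes)
```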

3. Theoretical Foundations and Limitations

Ergodicity and the Foundation of XEB Fidelity

Recent work formalizes the justification for using cross-entropy metrics as fidelity benchmarks via ergodicity theory (Cheng et al., 13 Feb 2025). For a given post-processing function f(p), the ergodicity condition states that, for a unitary 2t-design ensemble \mathcal{U} and polynomial f,

\mathbb{E}_U[f(P_U(x_0))] \approx \frac{1}{2^n}\sum_x f(P_U(x))

up to O(\sigma_f / \sqrt{2^n}) fluctuations. Violation of this equality in experiment provides a quantifiable measure of noise. Specifically, the “deviation of ergodicity”

\mathrm{DE}_f = \left| \mathbb{E}_U[f(P_U)] - C_f(P_U, Q_U) \right|

where C_f(P_U, Q_U) is the weighted empirical average, directly quantifies loss of fidelity. For f(p) = N^2 p^2, this framework recovers the linear XEB, with F_\mathrm{XEB} \approx F, provided the “error” component of the output is sufficiently weakly correlated with the ideal output (details in (Cheng et al., 13 Feb 2025)).
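
The ergodicity condition itself is easy to check numerically with a Porter-Thomas surrogate for P_U. The sketch below compares the ensemble average at a fixed bitstring with the bitstring average for a single instance; the surrogate distribution and the choice f(p) = N^2 p^2 are assumptions for illustration, and the full DE_f noise estimator of the cited paper is not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(4)
n, N = 10, 2 ** 10
f = lambda p: N**2 * p**2                 # post-processing function from the text

def porter_thomas():
    # Surrogate for the output distribution P_U of one random circuit instance.
    p = rng.exponential(1.0, N)
    return p / p.sum()

# Left side: ensemble average of f(P_U(x_0)) at one fixed bitstring x_0 = 0^n.
lhs = np.mean([f(porter_thomas()[0]) for _ in range(2000)])

# Right side: bitstring average (1/2^n) * sum_x f(P_U(x)) for a single instance.
rhs = np.mean(f(porter_thomas()))

print(lhs, rhs)   # both close to 2 under Porter-Thomas statistics
```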

Breakdown and Classical Spoofing

Cross-entropy benchmark fidelity can be fundamentally compromised in two settings:

  • Shallow Circuit Regime: Works such as (Barak et al., 2020) and (Tanggara et al., 1 May 2024) construct classical sampling strategies that, for sublinear or polylogarithmic-depth random quantum circuits, achieve nontrivial XEB or sXES scores using only local/light-cone simulation. For 2D circuits of depth O(\sqrt{\log n}), such strategies can reach \omega(1) XEB in polynomial time, falsifying the original XQUATH and sXQUATH conjectures in these regimes.
  • Adversarial Noise Models: As demonstrated in (Gao et al., 2021), by introducing concentrated “sinks” (strongly decohered or omitted gates), it is possible to classically generate samples whose XEB values are a substantial fraction (2–12%) of those from the best experimental devices, even when the underlying state fidelity is essentially zero. Because the spoofer's XEB degrades only additively with such errors while the true fidelity decays multiplicatively, classical spoofing can outperform noisy quantum devices as system size grows.

Statistical Mechanics Perspective

Mapping the average XEB and fidelity onto correlated statistical-mechanics models (e.g., Ising-like diffusion–reaction systems) makes explicit the scaling and gap structure governing their decay with circuit depth and noise (Gao et al., 2021). In bona fide quantum hardware under uncorrelated noise, both decay exponentially; in the presence of inhomogeneous noise (sink attacks), however, the scalings separate and XEB no longer certifies any meaningful quantum advantage.

4. Cross-Entropy Fidelity in Simulation-Based Likelihood-Free Inference

In likelihood-free inference, improved cross-entropy estimators (“Alice,” “Alices”) have been shown to yield significantly higher-fidelity benchmarks for likelihood-ratio estimation than standard binary cross-entropy or mean-squared-error–based approaches (Stoye et al., 2018). The key innovation is the use of simulator-provided augmented data: instead of hard binary class labels y_i \in \{0,1\}, one leverages joint likelihood ratios r(x_i, z_i) to compute soft labels s_i = 1/(r(x_i, z_i) + 1), allowing the improved estimator

L_\mathrm{Alice}[\hat{s}] = -\frac{1}{N} \sum_{i=1}^N \big[ s_i \log \hat{s}(x_i) + (1 - s_i)\log(1-\hat{s}(x_i)) \big]

which has dramatically reduced variance and superior sample efficiency. Extension to include joint scores (“Alices”) further reduces variance at small sample sizes, yielding profile-likelihood contours and coverage-calibrated intervals nearly indistinguishable from those of the true likelihood, even under complex detector effects. The choice of cross-entropy functional thus directly governs the practical fidelity—and the trustworthiness—of simulation-based inference benchmarks.
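
A minimal sketch contrasting the hard-label and soft-label ("Alice"-style) estimators on toy Gaussian simulators, where the ratio r is available in closed form; the Gaussian setup and the simple logistic model are assumptions for illustration, not the detector-level setting of the cited paper:

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy simulators: p(x|theta_0) = N(0,1), p(x|theta_1) = N(1,1); here the
# "joint" ratio r = p(x|theta_0)/p(x|theta_1) is known exactly.
def ratio(x):
    return np.exp(0.5 - x)

x = np.concatenate([rng.normal(0.0, 1.0, 200), rng.normal(1.0, 1.0, 200)])
y_hard = np.concatenate([np.zeros(200), np.ones(200)])   # hard labels
y_soft = 1.0 / (ratio(x) + 1.0)                          # Alice soft labels s_i

def fit_logistic(x, targets, steps=5000, lr=0.1):
    # Logistic s_hat(x) = sigmoid(w*x + b), trained on cross-entropy against
    # either hard or soft targets (same gradient formula in both cases).
    w, b = 0.0, 0.0
    for _ in range(steps):
        s = 1.0 / (1.0 + np.exp(-(w * x + b)))
        g = s - targets
        w -= lr * np.mean(g * x)
        b -= lr * np.mean(g)
    return w, b

print("hard labels:", fit_logistic(x, y_hard))   # noisier at this small sample size
print("soft labels:", fit_logistic(x, y_soft))   # close to the exact (w, b) = (1, -0.5)
```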

5. Cross-Entropy Fidelity in Neural Network Classification

Empirical studies across NLP, ASR, and vision tasks show that training deep networks with square loss often achieves benchmark accuracies comparable to, or exceeding, those from cross-entropy, challenging widespread assumptions about the latter’s empirical or theoretical advantage (Hui et al., 2020). In 79% of model–dataset combinations tested, square loss matched or outperformed cross-entropy. Moreover, square loss demonstrated lower variance across random seeds, suggesting more stable optimization. These findings indicate that, at least in modern overparameterized regimes, cross-entropy is not uniquely privileged as a benchmark/fidelity measure.
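
For reference, a schematic of the two training losses side by side: square loss applied directly to the logits against a one-hot target, versus cross-entropy on the softmax probabilities. The class-dependent rescaling constants used in the cited paper are omitted, so this is only an illustrative sketch:

```python
import numpy as np

def cross_entropy(logits, label):
    # -log softmax(logits)[label], computed stably.
    z = logits - logits.max()
    return -(z[label] - np.log(np.exp(z).sum()))

def square_loss(logits, label):
    # Mean-squared error between the logits and a one-hot target.
    onehot = np.zeros_like(logits)
    onehot[label] = 1.0
    return np.mean((logits - onehot) ** 2)

logits = np.array([1.0, 0.5, 2.0])
print("cross-entropy:", cross_entropy(logits, 2))
print("square loss:  ", square_loss(logits, 2))
```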

6. Practical Recommendations and Open Directions

Inference, Simulation, and ML:

  • When joint likelihood ratios (or scores) are available, Alice/Alices cross-entropy estimators should be preferred for high-fidelity simulation-based inference (Stoye et al., 2018).
  • When only hard labels are accessible, binary cross-entropy is the default but yields noisier, less efficient benchmarks.
  • Cross-entropy and square loss should be tested as baselines for classification, rather than assumed to be optimal a priori (Hui et al., 2020).

Quantum Benchmarking:

  • In quantum supremacy or random-circuit sampling, reliance on XEB or sXES as fidelity proxies must be conditioned on circuit depth, scrambling speed, and noise homogeneity.
  • In the presence of shallow circuits or inhomogeneous noise, reported cross-entropy benchmark fidelities can be artificially inflated by classical spoofing (Gao et al., 2021, Tanggara et al., 1 May 2024, Barak et al., 2020).
  • Certification strategies should supplement XEB/sXES with cryptographically verifiable challenges or protocols with #P-hard amplitudes, or use alternative figures of merit (e.g., heavy-output generation, interactive proofs) that are not vulnerable to additive spoofing.

7. Summary Table: Cross-Entropy Benchmark Fidelity Domains

| Domain | Proper Use of XEB/Fidelity | Breakdown Scenario |
|---|---|---|
| Quantum Circuits | Deep, scrambling, weakly correlated noise (Cheng et al., 13 Feb 2025) | Shallow depth / inhomogeneous noise: vulnerable to spoofing (Barak et al., 2020, Gao et al., 2021, Tanggara et al., 1 May 2024) |
| Simulation-based Inference | Simulator-augmented soft labels, Alice/Alices estimators (Stoye et al., 2018) | Use of standard binary labels (high variance, low efficiency) |
| ML Classification | Square loss vs. cross-entropy: both competitive, direct benchmark of accuracy (Hui et al., 2020) | Historical default favoring cross-entropy not justified empirically |

The fidelity of cross-entropy-based benchmarks is highly context-dependent, with rigorously proven equivalence to ground-truth quantities only under strict assumptions regarding noise, depth, and data regime. Recent advances connect these properties to fundamental ergodicity conditions and establish both the power and limits of cross-entropy metrics as proxies for real-world fidelity.
