Ground-truth Agnostic eXplanation Evaluation (AXE)

Updated 3 June 2026

The paper introduces AXE, a framework that evaluates explanation quality based on how well local attributions recover a model’s predictions on-manifold.
For each instance, a kNN classifier is trained on the top-n features named by the local explanation, and the score measures how well that classifier reproduces the model's prediction.
AXE enhances robustness against adversarial fairwashing by comparing explanation methods under realistic data distributions without needing ground-truth annotations.

Ground-truth Agnostic eXplanation Evaluation (AXE) denotes a methodological and conceptual framework for evaluating the quality of machine learning model explanations without recourse to any pre-defined ground-truth explanations, human annotation, or model sensitivity to off-manifold perturbations. AXE addresses the core challenge in Explainable AI (XAI): how to compare, select, and validate explanation methods in realistic settings where "correct" explanations are unobservable, plural, or even fundamentally unknowable, as is typical with high-capacity black-box models and complex data.

1. Motivation and Foundational Principles

The need for AXE arises from two observations: (i) providing "ground-truth" explanations is impractical or subjective for most tasks (e.g., feature attributions in vision and tabular data), and (ii) commonly used evaluation strategies (ground-truth concordance, model sensitivity) fail to satisfy properties essential for meaningful, model-relative explanation evaluation (Rawal et al., 15 May 2025, Rawal et al., 13 Jan 2026, Zhang et al., 2019). AXE formalizes three core principles:

Local Contextualization: The metric for explanation quality must depend on the specific sample under consideration, not just global properties or averages.
Model Relativism: The metric must explicitly depend on the model, distinguishing between distinct models even if their outputs are identical on observed data.
On-Manifold Evaluation: The metric must restrict evaluation to the empirical data manifold, avoiding off-manifold interventions or perturbations that may be scientifically uninterpretable.

Metrics that fail any of these—such as those relying on "ideal" attributions, or that probe models via randomized or adversarial input perturbations—can be fundamentally misled, obscuring genuine differences between explanation methods or between models within a Rashomon set (Rawal et al., 13 Jan 2026).

2. Canonical AXE Methodology

The canonical AXE approach operationalizes explanation evaluation via a predictiveness-based, on-manifold framework:

For each instance $\mathbf x_i$ , obtain a local explanation $\mathbf e_i$ (e.g., feature importances) for model $m$ .
Identify the indices of the top- $n$ features $\mathrm{Top}_n(\mathbf e_i)$ .
Project the entire dataset $\mathcal X$ onto these features, forming $\mathcal X_i^{(n)}$ .
Train a $k$ -nearest neighbor (kNN) classifier $M_i^k$ on the restricted data, using the model's own predictions as labels.
Compute $\hat y_i = M_i^k(\mathbf x_i[\mathrm{Top}_n(\mathbf e_i)])$ .
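Assign an explanation quality score $s_i = \mathbb{1}[\hat y_i = m(\mathbf x_i)]$ , equal to 1 when the surrogate reproduces the model's prediction and 0 otherwise.
The AXE score for the explainer is then the mean predictive accuracy:

$$\mathrm{AXE}_n^k = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}\left[\hat y_i = m(\mathbf x_i)\right]$$

where $N$ is the number of instances in $\mathcal X$ .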

This methodology evaluates whether the features deemed important by the explanation are sufficiently informative to recover the model’s prediction "on-manifold" (Rawal et al., 15 May 2025, Rawal et al., 13 Jan 2026).

Pseudocode

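Input: model m, explainer E, dataset X = {x_1, ..., x_N}, number of features n, number of neighbors k
Output: AXE score

for i = 1, ..., N:
    e_i ← E(m, x_i)                                  (local explanation)
    S_i ← Top_n(e_i)                                 (indices of the top-n features)
    X_i ← X restricted to the features in S_i
    M_i ← kNN with k neighbors fitted on X_i, labeled with the predictions m(X)
    ŷ_i ← M_i(x_i[S_i])
    s_i ← 1 if ŷ_i = m(x_i), else 0
return (1/N) · Σ_i s_i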

This construction is explicitly ground-truth-agnostic: it does not require any reference explanation but uses only the model, its local attributions, and in-manifold data points (Rawal et al., 13 Jan 2026).
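The procedure described in this section can be sketched in dependency-free Python. This is a minimal illustration, not the authors' reference implementation: the function names (`top_n`, `knn_predict`, `axe_score`), ranking by absolute attribution magnitude, squared Euclidean distance, and the `exclude_self` flag (which leaves the query point out of its own neighbor set) are all assumptions that the text does not specify. The toy data and the two hand-written explanations are likewise hypothetical.

```python
from collections import Counter
from typing import Sequence


def top_n(explanation: Sequence[float], n: int) -> list[int]:
    """Indices of the n attributions with the largest magnitude (assumed ranking rule)."""
    ranked = sorted(range(len(explanation)), key=lambda j: -abs(explanation[j]))
    return sorted(ranked[:n])


def knn_predict(train_x, train_y, query, k: int):
    """Majority vote over the k nearest rows, using squared Euclidean distance."""
    nearest = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, query)), idx)
        for idx, row in enumerate(train_x)
    )[:k]
    return Counter(train_y[idx] for _, idx in nearest).most_common(1)[0][0]


def axe_score(X, model_preds, explanations, n: int, k: int,
              exclude_self: bool = True) -> float:
    """Fraction of instances whose model prediction is recovered by a kNN
    surrogate restricted to that instance's top-n explained features."""
    hits = 0
    for i, (x_i, e_i) in enumerate(zip(X, explanations)):
        feats = top_n(e_i, n)
        # Assumption: drop the query row so it cannot vote for itself.
        keep = [j for j in range(len(X)) if not (exclude_self and j == i)]
        train_x = [[X[j][f] for f in feats] for j in keep]
        train_y = [model_preds[j] for j in keep]  # labels are model outputs
        y_hat = knn_predict(train_x, train_y, [x_i[f] for f in feats], k)
        hits += int(y_hat == model_preds[i])
    return hits / len(X)


# Hypothetical toy data: feature 0 drives the model, feature 1 is unrelated.
X = [[float(i if i < 10 else i + 10), float((7 * i) % 20)] for i in range(20)]
model_preds = [int(row[0] >= 15) for row in X]

faithful = [[1.0, 0.0]] * len(X)    # attributes everything to feature 0
misleading = [[0.0, 1.0]] * len(X)  # attributes everything to feature 1

score_faithful = axe_score(X, model_preds, faithful, n=1, k=3)
score_misleading = axe_score(X, model_preds, misleading, n=1, k=3)
print(score_faithful, score_misleading)
```

On this toy data the faithful explanation reaches a score of 1.0, while the explanation that points at the unrelated feature scores strictly lower, because neighbors in that feature carry mixed model predictions.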

3. Comparison to Ground-Truth and Sensitivity-Based Metrics

Traditional metrics for evaluation—such as feature agreement or rank agreement with a pre-defined ground-truth attribution vector, or sensitivity scores (e.g., prediction gap when masking important features)—fail at least one of the three central AXE principles. Ground-truth comparison metrics cannot resolve between models with similar outputs but radically different mechanisms, and can be gamed by adversarial selection within Rashomon sets (i.e., models with near-identical predictive performance but differing internal logic). Sensitivity-based metrics are vulnerable to adversarial attacks exploiting off-manifold artifacts and may fail to flag explanation manipulation ("fairwashing") (Rawal et al., 15 May 2025, Rawal et al., 13 Jan 2026).

Because AXE measures surrogate predictiveness only on empirically observed data, it is robust to these manipulations. In the reported experiments, AXE detects adversarial fairwashing of explanations with 100% accuracy, whereas sensitivity-based and ground-truth-matching metrics are frequently fooled (Rawal et al., 15 May 2025, Rawal et al., 13 Jan 2026).
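The off-manifold concern raised for sensitivity-based metrics can be illustrated with a small sketch. It assumes a simple masking scheme (replace the selected features with a fixed baseline value) and a hypothetical threshold model; the helper names `mask_features`, `prediction_gap`, and `distance_to_data` are illustrative and do not come from the cited papers. The point is only that the masked input used to compute the prediction gap can lie far from every observed data point.

```python
def mask_features(x, idxs, baseline=0.0):
    """Replace the features in idxs with a baseline value (assumed masking scheme)."""
    return [baseline if j in idxs else v for j, v in enumerate(x)]


def prediction_gap(model, x, idxs, baseline=0.0):
    """Absolute change in model output after masking the given features."""
    return abs(model(x) - model(mask_features(x, idxs, baseline)))


def distance_to_data(point, data):
    """Euclidean distance from point to its nearest observed row."""
    return min(
        sum((a - b) ** 2 for a, b in zip(point, row)) ** 0.5 for row in data
    )


# Toy manifold: two perfectly correlated features, so every row has x1 == x0.
data = [[float(v), float(v)] for v in range(1, 11)]


def model(x):
    # Hypothetical model: positive class when the feature sum reaches 10.
    return float(x[0] + x[1] >= 10)


x = data[7]                        # [8.0, 8.0], an observed point
masked = mask_features(x, {0})     # [0.0, 8.0], never observed
gap = prediction_gap(model, x, {0})
on_manifold = distance_to_data(x, data)
off_manifold = distance_to_data(masked, data)
print(gap, on_manifold, off_manifold)
```

Here the prediction gap is 1.0, but it is measured at a point whose nearest observed neighbor is more than five units away, so the score reflects model behavior in a region the data never covers.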

4. Extensions and Alternative Evaluation Protocols

The AXE paradigm has inspired the development of several methodological variants and meta-evaluation frameworks.

Unified Multi-Axis Frameworks: Some works organize evaluation along the axes of generalizability, fidelity, and persuasibility, often embedding the AXE score as one component in hierarchical protocols (Yang et al., 2019). These frameworks support both machine-only and human-in-the-loop scenarios and clarify best practices for aggregation, domain adaptation, and user relevance.
Robustness-Gap Analysis: For image models, AXE may be instantiated via robustness or discrepancy analysis between aligned and misaligned model–explanation pipelines. This quantifies the gap between best-case and worst-case overlap with human-annotated masks, capturing model–explainer vulnerability to adversarial misalignment (Baniecki et al., 2023).
Statistics-Driven Meta-Evaluation: Tools such as MetaQuantus perform "meta-evaluation" of AXE-type quality estimators, measuring their resilience to minor perturbations and their reactivity to disruptive ones through intra- and inter-consistency criteria, without requiring any ground-truth explanations (Hedström et al., 2023).
Synthetic and Semi-Synthetic Benchmarking: Several platforms (such as EXACT) integrate synthetic datasets with analytically known ground-truth masks. These enable algorithmic benchmarking using mass accuracy, rank accuracy, and earth mover's distance (EMD), but their relevance to real-world settings is constrained by the limitations of synthetic-reality alignment (Clark et al., 2024, Amiri et al., 2020, Yalcin et al., 2021).
Model-Intrinsic Perturbative Analysis: Quantitative axes such as objectiveness (bias with respect to Shapley values), completeness, robustness, and mutual verification are introduced to capture explanation fidelity, diversity, and behavioral invariance, again without reference to external annotation (Zhang et al., 2019).
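For the synthetic benchmarks mentioned above, mass accuracy and rank accuracy can be sketched as follows. This follows one common formulation (share of attribution mass inside the ground-truth mask, and share of the top-K attributed features that fall inside the mask, with K equal to the mask size); the exact variants used by the cited platforms may differ. Using absolute attribution values is an assumption, as are the function names.

```python
def mass_accuracy(attributions, mask):
    """Share of total attribution magnitude that falls inside the mask."""
    total = sum(abs(a) for a in attributions)
    if total == 0:
        return 0.0
    inside = sum(abs(a) for a, m in zip(attributions, mask) if m)
    return inside / total


def rank_accuracy(attributions, mask):
    """Share of the top-K attributed features inside the mask, K = mask size."""
    k = sum(1 for m in mask if m)
    if k == 0:
        return 0.0
    ranked = sorted(range(len(attributions)), key=lambda j: -abs(attributions[j]))
    return sum(1 for j in ranked[:k] if mask[j]) / k


# Hypothetical four-feature example with a known ground-truth mask.
attributions = [0.5, 0.25, 0.125, 0.125]
mask = [1, 0, 1, 0]

print(mass_accuracy(attributions, mask))  # 0.625
print(rank_accuracy(attributions, mask))  # 0.5
```

Both scores require a ground-truth mask, which is why they apply to synthetic or semi-synthetic settings and not to the ground-truth-agnostic setting targeted by canonical AXE.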

5. Empirical and Theoretical Evaluation

AXE and its variants have been evaluated on a range of tasks:

Tabular Classification: AXE robustly distinguishes explanation quality across baseline and adversarial models on UCI Credit, COMPAS, and Communities & Crime datasets. LIME and SHAP, when evaluated with AXE, show variable performance but can be systematically ranked; sensitivity-based evaluations are much less informative in these scenarios (Rawal et al., 15 May 2025, Rawal et al., 13 Jan 2026).
Vision and Imaging: In ground-truth-available contexts (e.g., CLEVR-XAI, XAI-TRIS, synthetic brain MRI), conventional XAI methods frequently underperform relative to even basic random or edge-detection baselines when judged by overlap-centric metrics. This suggests that even state-of-the-art post-hoc methods are poorly calibrated to synthetic ground-truth in high-dimensional, noise-rich tasks (Clark et al., 2024, Arras et al., 2020).
Model Disambiguation and Fairwashing: AXE precisely identifies explanation manipulations designed to conceal use of protected features, a context in which standard evaluation frameworks systematically fail (Rawal et al., 15 May 2025, Rawal et al., 13 Jan 2026).
Meta-Evaluation: AXE-type estimators with high intra/inter-consistency and proper reactivity demonstrate statistical robustness when subjected to both minor and disruptive perturbations; this is a necessary minimum standard for any proposed metric (Hedström et al., 2023).

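The following table summarizes how each metric class fares against the three AXE principles:

Metric Class	Principles Violated	Satisfies All AXE Principles
Ground-truth agreement	Model relativism, local contextualization	No
Sensitivity-based	On-manifold evaluation	No
Predictiveness (AXE)	None	Yes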

6. Limitations and Ongoing Challenges

While AXE provides a necessary standard for explanation evaluation, several limitations remain:

Data Modality Restriction: Canonical AXE is most naturally formulated for tabular, finite-dimensional spaces; extensions to multi-class, regression, sequence, or continuous data are non-trivial (Rawal et al., 15 May 2025).
Computational Complexity: The AXE surrogate-fitting protocol for large feature sets can incur substantial computational cost due to the number of kNN models instantiated (Rawal et al., 15 May 2025).
Synthetic Benchmarking Gaps: Synthetic or semi-synthetic datasets with analytically available ground-truth explanations are invaluable for debugging and protocol development, but the transferability of results to real-world data (with ambiguous or subjective explanatory targets) is limited (Clark et al., 2024, Amiri et al., 2020).
Metric Scope: Present metrics prize local feature predictiveness and overlap, but explainability desiderata such as semantic coherence, causal completeness, and user trust may require additional axes (Clark et al., 2024, Yang et al., 2019).
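The computational cost noted above depends on how many distinct surrogate models must be built. One possible mitigation, offered here as an assumption and not as part of the cited method, is to group instances by their top-n feature subset so that each projected dataset is constructed once and shared. The helper name `group_by_feature_subset` is hypothetical.

```python
from collections import defaultdict
from math import comb


def group_by_feature_subset(top_feature_sets):
    """Map each distinct feature subset to the instance indices that share it."""
    groups = defaultdict(list)
    for i, feats in enumerate(top_feature_sets):
        groups[tuple(sorted(feats))].append(i)
    return dict(groups)


# Hypothetical top-2 feature sets for five instances over three features.
top_feature_sets = [[0, 2], [2, 0], [1, 2], [0, 2], [1, 2]]
groups = group_by_feature_subset(top_feature_sets)

# The number of distinct projections is bounded by both the number of
# instances and the number of possible n-subsets of d features.
num_features, n = 3, 2
upper_bound = min(len(top_feature_sets), comb(num_features, n))

print(groups)       # {(0, 2): [0, 1, 3], (1, 2): [2, 4]}
print(upper_bound)  # 3
```

In this example five instances need only two projected datasets. The saving shrinks as the number of features grows, since the number of possible subsets quickly exceeds the number of instances.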

A plausible implication is that a comprehensive evaluation stack will need to combine AXE with human-in-the-loop and causal-inference-based validation.

7. Best Practices and Recommendations

Adopt AXE-Type Predictiveness Metrics First: These should be the baseline for any explanation method prior to human-facing studies or domain deployment.
Disaggregate Metric Profiles: Report multiple metrics (predictiveness, overlap, robustness) per method to ensure transparency regarding strengths and failure modes (Clark et al., 2024).
Adversarial Robustness: Explicitly benchmark explanation methods against fairwashing and Rashomon set adversarial attacks using AXE or equivalent evaluation (Rawal et al., 15 May 2025, Rawal et al., 13 Jan 2026).
Meta-Evaluation Validation: Vet any new quality estimator for both resilience to benign perturbations and reactivity to disruptive changes (Hedström et al., 2023).
Synthetic Dataset Usage: Use synthetic benchmarks to probe and debug new explanation methods, but avoid overinterpreting direct transfer to unconstrained naturalistic domains (Clark et al., 2024).
Incorporation into Hierarchical Frameworks: Locally predictive explanation evaluation is necessary but not sufficient on its own; integrate it within broader tiers of fidelity, persuasibility, and user impact (Yang et al., 2019).

Collectively, Ground-truth Agnostic eXplanation Evaluation, as instantiated by AXE and its derivatives, sets a rigorous, empirically validated, and model-relative foundation for benchmarking and advancing XAI methods in the absence of any agreed-upon ground-truth explanation (Rawal et al., 15 May 2025, Rawal et al., 13 Jan 2026, Zhang et al., 2019, Baniecki et al., 2023, Hedström et al., 2023).