Proof of Efficient Attribution (PoEA)
- Proof of Efficient Attribution (PoEA) is a cryptographic and algorithmic paradigm that ensures efficient, verifiable, and ε-optimal attribution in machine learning.
- It employs interactive PAC protocols and sublinear verifier retrainings to robustly validate computed attributions even in resource-constrained environments.
- Applications include model interpretability, data pricing, fairness, and secure distributed inference, offering scalable and trustworthy ML verification solutions.
Proof of Efficient Attribution (PoEA) is a rigorously defined, algorithmic, and cryptographic paradigm designed to ensure that feature or data attributions in machine learning—especially those derived from computationally intensive methods—are not only computed correctly but can also be checked efficiently and verifiably by resource-constrained parties. PoEA protocols formalize efficiency, completeness, and soundness guarantees in data/model attribution, offering scalable and trustworthy solutions across a broad spectrum of applications, including model interpretability, data pricing, fairness, and distributed AI infrastructure verification.
1. Formal Problem Statement and Theoretical Guarantees
PoEA protocols address the challenge of verifying that an attribution vector—such as a data-influence score, Shapley value, or linear predictor fitted to counterfactual model outputs—is $\varepsilon$-close (in mean squared error or a task-relevant metric) to the optimal attribution, with failure probability at most $\delta$, while requiring only sublinear computational effort by the verifier.
Given:
- A training set $S$ of $n$ examples and an associated function $f : \{0,1\}^n \to \mathbb{R}$ representing a model statistic (e.g., a logit or a test-error differential) that can be evaluated via retraining on the subset of $S$ indicated by a mask $x \in \{0,1\}^n$.
- The objective is to verify a claimed linear datamodel $\hat{\theta} \in \mathbb{R}^n$ without recomputing expensive influence calculations.
Let $\theta^*$ be the optimal attribution vector minimizing the $p$-biased MSE:
$$\theta^* \in \arg\min_{\theta \in \mathbb{R}^n} \; \mathbb{E}_{x \sim \mu_p}\big[(f(x) - \theta^\top x)^2\big].$$
For any candidate $\hat{\theta}$ returned by an untrusted party, PoEA ensures that, with probability at least $1 - \delta$, the sub-optimality error
$$\mathbb{E}_{x \sim \mu_p}\big[(f(x) - \hat{\theta}^\top x)^2\big] - \mathbb{E}_{x \sim \mu_p}\big[(f(x) - \theta^{*\top} x)^2\big]$$
is at most $\varepsilon$ after a number of verifier retrainings that is independent of $n$ (Karchmer et al., 14 Aug 2025).
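The objective above can be estimated directly by Monte Carlo sampling over the Boolean hypercube whenever a retraining oracle is available. The sketch below is illustrative only (not the protocol of Karchmer et al.); the function name, the retraining callable `f`, and all parameters are assumptions of this example. Each oracle call stands in for one retraining.

```python
import numpy as np

def pbiased_mse(theta_hat, f, n, p=0.5, num_samples=200, rng=None):
    """Monte Carlo estimate of the p-biased MSE of a candidate linear
    datamodel `theta_hat` against a retraining statistic `f`, where `f`
    maps a 0/1 subset mask to a scalar (e.g., a test logit of a model
    retrained on that subset). Illustrative sketch only."""
    rng = np.random.default_rng(rng)
    errors = []
    for _ in range(num_samples):
        # Draw a subset mask x ~ mu_p over the Boolean hypercube {0,1}^n.
        x = (rng.random(n) < p).astype(float)
        # Each evaluation of f corresponds to one (expensive) retraining.
        errors.append((f(x) - theta_hat @ x) ** 2)
    return float(np.mean(errors))
```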
2. Interactive PoEA Protocols and PAC Verification
PoEA instantiates a minimally interactive, resource-efficient protocol between a computationally unbounded Prover (P) and a resource-limited Verifier (V), formalizing the verification task in the interactive PAC framework.
Protocol Structure:
- Phase 1: Verifier prepares two challenge sample sets—one for residual estimation and one for mean-squared-error evaluation—by sampling from the Boolean hypercube, using random seeds to fix the retrainings.
- Phase 2: Prover computes the attribution vector $\hat{\theta}$ (using, e.g., empirical influence estimation or datamodeling), trains the models corresponding to the challenge subsets, and returns $\hat{\theta}$ together with the resulting evaluations of $f$.
- Phase 3 (Verifier):
- Spot-checks a random subset of the challenge points: retrains the corresponding models locally to detect Prover deviation.
- Runs a robust degree-2 residual estimator (e.g., Saunshi–Goldwasser 2022) to estimate the error of the best linear fit.
- Computes the mean squared error of $\hat{\theta}$ on the second challenge set using local retrainings and outputs accept if the empirical error is within $\varepsilon$ of the residual estimate; otherwise it aborts.
Guarantees:
- Completeness: An honest Prover is accepted with probability at least $1-\delta$ whenever its attributions are $\varepsilon$-optimal.
- Soundness: A malicious Prover is accepted with probability at most $\delta$ if the sub-optimality exceeds $\varepsilon$.
- Verifier Complexity: a number of local retrainings independent of $n$, plus negligible additional computation (Karchmer et al., 14 Aug 2025).
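To make the Verifier's role concrete, the following sketch combines the spot-check and MSE comparison of Phase 3 under simplifying assumptions (exact agreement up to a tolerance, a caller-supplied residual estimate, and a retraining oracle `f`). It is a hedged illustration rather than the actual protocol of Karchmer et al.

```python
import numpy as np

def verify_attribution(theta_hat, prover_vals, f, spot_masks, mse_masks,
                       residual_estimate, eps, tol=1e-9):
    """Illustrative verifier-side checks (Phase 3). Assumes:
    - prover_vals: dict mapping mask tuples to the Prover's reported f values,
    - f: the Verifier's own retraining oracle (each call = one retraining),
    - spot_masks: masks selected for spot-checking,
    - mse_masks: masks reserved for the mean-squared-error evaluation,
    - residual_estimate: estimate of the best linear fit's error.
    Returns True (accept) or False (abort)."""
    # Spot-check: locally retrain a few masks and compare against the
    # Prover's reported values to detect deviations.
    for x in spot_masks:
        if abs(prover_vals[tuple(x)] - f(x)) > tol:
            return False  # the Prover misreported a retraining result
    # MSE check: evaluate theta_hat on locally retrained masks and accept
    # only if its error is within eps of the estimated optimal residual.
    mse = np.mean([(f(x) - theta_hat @ np.asarray(x)) ** 2 for x in mse_masks])
    return bool(mse <= residual_estimate + eps)
```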
3. Algorithmic Generalizations and Applications
PoEA’s mathematical foundation—linear verification of functions over the Boolean hypercube—broadly encompasses multiple attribution frameworks:
- Any attribution method yielding a linear model (empirical influence, Shapley approximation, representer points) is directly verifiable.
- The core requirement is the ability to spot-check functional evaluations and execute residual estimation over degree-2 Fourier coefficients.
Notable algorithmic instantiations include:
- Least-squares Shapley attribution admits acceleration via block QR decompositions, providing unbiased Monte Carlo estimates of feature attributions for least-squares models with a polynomial speedup over classical Shapley evaluation (Bell et al., 2023).
- Axiomatic multiterm attribution grounded in Shapley–Aumann–Shapley–Shubik theory ensures uniqueness and efficient computation for multilinear functions and permits explicit allocation in economic and network flows (Sun et al., 2011).
Table: Core Attributes of PoEA Protocols
| Guarantee | PoEA Protocols (Karchmer et al., 14 Aug 2025) | LS-Shapley (Bell et al., 2023) | ASS Axiomatic (Sun et al., 2011) |
|---|---|---|---|
| Soundness | PAC | Unbiased Monte Carlo | Unique under five axioms |
| Verifier Complexity | Sublinear retrainings (independent of $n$) | QR/block least-squares solves | Dynamic programming for multilinear functions |
| Class of Attributable Functions | Linear models over the Boolean hypercube | Shapley values for least-squares models | Multilinear and additive functions |
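For the least-squares instantiation, a plain permutation-sampling Monte Carlo estimator conveys the basic idea. The block-QR acceleration of Bell et al. instead reuses matrix factorizations across subsets, which this sketch does not attempt to reproduce; the value function here (training $R^2$ of a subset regression) is an illustrative choice.

```python
import numpy as np

def shapley_ls_mc(X, y, num_permutations=200, rng=None):
    """Permutation-sampling Monte Carlo estimate of Shapley values for
    least-squares feature attribution. The value of a feature subset S is
    the training R^2 of an ordinary least-squares fit restricted to S
    (an illustrative choice of value function)."""
    rng = np.random.default_rng(rng)
    Xc = X - X.mean(axis=0)          # center so R^2 is well defined
    yc = y - y.mean()
    n, d = Xc.shape
    tss = float(np.sum(yc ** 2))

    def value(subset):
        if not subset:
            return 0.0
        Xs = Xc[:, sorted(subset)]
        beta, *_ = np.linalg.lstsq(Xs, yc, rcond=None)
        rss = float(np.sum((yc - Xs @ beta) ** 2))
        return 1.0 - rss / tss       # R^2 of the subset regression

    phi = np.zeros(d)
    for _ in range(num_permutations):
        order = rng.permutation(d)
        chosen, prev = set(), 0.0
        for j in order:
            chosen.add(j)
            cur = value(chosen)
            phi[j] += cur - prev     # marginal contribution of feature j
            prev = cur
    return phi / num_permutations
```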
4. Efficient Model Explanation and Submodular Black-box Attribution
Recent advances extend PoEA principles to black-box and submodular contexts. The LiMA framework defines attribution as a submodular maximization problem, optimizing a structured set function over input regions. Key structural properties—diminishing returns and monotonicity—enable bidirectional greedy algorithms with provable $(1 - 1/e - \epsilon)$-approximation guarantees (Chen et al., 1 Apr 2025). This sharply contrasts with brute-force enumeration and is validated empirically on high-dimensional models.
Faithfulness here is established not by explicit MSE bounds, but via submodular proxy metrics (consistency, collaboration, confidence, and diversity) shown to capture the core attributional signal, with empirical Insertion/Deletion AUC improvements of 30–60% over prior state-of-the-art methods.
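The submodular structure is what makes greedy selection effective. The sketch below shows the standard greedy algorithm for a monotone submodular region-scoring function under a cardinality budget, which attains the classical $(1-1/e)$ guarantee; LiMA's bidirectional variant and its specific scoring terms are not reproduced here, and `regions`, `score`, and `k` are assumptions of this example.

```python
def greedy_region_attribution(regions, score, k):
    """Standard greedy maximization of a set function over candidate input
    regions under a cardinality budget k. If `score` is monotone and
    submodular, greedy selection attains the classical (1 - 1/e)
    approximation to the best k-subset. `regions`, `score`, and `k` are
    assumptions of this sketch (not LiMA's bidirectional variant)."""
    selected, remaining = [], list(regions)
    current = score(selected)
    for _ in range(min(k, len(remaining))):
        # Pick the region with the largest marginal gain in score.
        best_gain, best_region = max(
            ((score(selected + [r]) - current, r) for r in remaining),
            key=lambda pair: pair[0],
        )
        if best_gain <= 0:
            break                     # diminishing returns exhausted
        selected.append(best_region)
        remaining.remove(best_region)
        current += best_gain
    return selected
```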
5. Cryptographic PoEA: Verifiable Attribution in Large-Scale Distributed Inference
PoEA as a cryptographically binding consensus primitive emerges in the context of secure, decentralized inference protocols. As instantiated in Optimistic TEE-Rollups (OTR), PoEA binds each model output to a hardware attestation (e.g., an NVIDIA H100 TEE DCAP quote together with an MRENCLAVE identity), ensuring not only attribution integrity but also model authenticity on-chain (Chan et al., 23 Dec 2025).
- Protocol Overview: After enclave execution, the sequencer publishes a tuple containing the hashed input/output, a cryptographic attestation over those hashes, and a unique enclave measurement.
- Verification: On-chain verification checks (1) validity of attestation under the manufacturer root and (2) that the enclave measurement matches the registered model identity.
- Probabilistic Security: Occasional random zero-knowledge spot-checks ensure that adversaries cannot forge model outputs or downgrade models without an overwhelming probability of detection. The expected cost overhead remains marginal (on the order of \$0.07).
6. Axiomatic Attribution in Combinatorial Settings
For multilinear value functions, $O(N n^2)$ dynamic programming algorithms enable direct computation of the ASS value, permitting practical deployment of PoEA even in combinatorial settings such as advertising auction spend breakdowns, portfolio analysis, and e-commerce funnel attribution.
These results delineate the frontier where efficient, uniquely fair attribution is possible, with necessity and sufficiency grounded in the function class.
7. Practical Impact and Faithfulness Error Bounds
Empirical validation of PoEA instantiations in deep learning and econometrics aligns theoretical guarantees with observed efficiency:
- MFABA demonstrates over 100× speedups in attribution computation.
- Faithfulness error bounds may be statistical ($\varepsilon$-sub-optimality with high probability), submodular ($(1 - 1/e - \epsilon)$ approximation) (Chen et al., 1 Apr 2025), or cryptographic (attestation validity, ZK spot-check sample probability) (Chan et al., 23 Dec 2025).
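As a minimal illustration of the cryptographic flavor of these bounds, the sketch below performs the two on-chain checks described in Section 5—attestation validity over the hashed input/output and an enclave-measurement match—assuming a caller-supplied `verify_quote` predicate. Actual DCAP quote parsing and the OTR contract logic are not reproduced.

```python
import hashlib

def check_poea_attestation(input_bytes, output_bytes, quote,
                           enclave_measurement, registered_measurement,
                           verify_quote):
    """Hedged sketch of the two on-chain checks. Assumes the quote commits
    to report data of the form hash(input) || hash(output), and that
    `verify_quote(quote, report_data) -> bool` validates the quote against
    the hardware manufacturer's attestation root (both supplied by the
    caller; real DCAP parsing and OTR contract logic are not reproduced)."""
    report_data = (hashlib.sha256(input_bytes).digest()
                   + hashlib.sha256(output_bytes).digest())
    # (1) The attestation must be valid and commit to this exact I/O pair.
    if not verify_quote(quote, report_data):
        return False
    # (2) The enclave measurement must match the model identity registered
    #     on-chain, preventing silent model substitution or downgrade.
    return enclave_measurement == registered_measurement
```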
PoEA thus provides the methodological and practical infrastructure necessary for scalable, trustworthy, and fair deployment of attribution in modern machine learning and distributed inference systems.