
ONNXExplainer: Real-Time Shapley Attributions

Updated 2 July 2026
  • ONNXExplainer is a framework-agnostic explainer that computes Shapley-style feature attributions by integrating forward and backward graphs within the ONNX runtime.
  • It employs custom reverse-mode automatic differentiation and DeepLIFT multipliers to approximate Shapley values, reducing computational complexity from O(2^n) to O(|R|).
  • The system uses cache-based optimization for one-shot, real-time deployment, substantially reducing explanation latency compared with traditional methods like SHAP.

ONNXExplainer is a generic, framework-agnostic explainer that computes Shapley-value–style feature attributions for neural networks represented in the ONNX (Open Neural Network Exchange) format. Designed to address the inefficiencies and lack of cross-platform support in existing explainers such as SHAP for TensorFlow and PyTorch, ONNXExplainer introduces custom automatic differentiation, graph-level optimizations, and one-shot deployment, enabling efficient, model-agnostic, and real-time explainability directly within the ONNX runtime ecosystem (Zhao et al., 2023).

1. System Architecture and Workflow

ONNXExplainer orchestrates model explainability by transforming the ONNX model into integrated forward and backward graphs. The principal architecture consists of the following components:

  • Model Loader & Parser: Loads a frozen ONNX computational graph from storage, constructs a forward symbolic graph, and inverts this into a backward graph where each node contains dedicated "flow-in" and "flow-out" slots for tracking gradients.
  • Gradient Engine / Automatic Differentiation: Implements a custom reverse-mode AD mechanism through a single DFS over the backward graph. Supports four gradient propagation types (one-to-one, many-to-one, one-to-many, many-to-many). Local gradients are derived for standard ONNX ops (e.g. MatMul, Conv, Add, Pooling) using exact partial derivatives for linear ops and DeepLIFT multipliers for nonlinear ops.
  • Shapley Calculator: Uses DeepLIFT multipliers to induce efficient Shapley value approximations over a user-defined reference set $R$:

$$\phi \approx \frac{1}{|R|} \sum_{r \in R} M \odot (X - R_r)$$

where $M$ is the computed multiplier matrix, $X$ is the query input, and $R$ is the set of reference inputs.

  • Optimizer / Cache: Precomputes and caches all outputs and intermediate activations $f(R)$, $H(R)$ for $R$ during graph construction. During explanation, only a single forward and backward pass on $X$ is executed, while reference terms are retrieved from the cache.
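The gradient engine described above can be illustrated with a toy example. The following is a minimal sketch, not the ONNXExplainer implementation: it assumes a hypothetical scalar graph (`GRAPH`) with only `add` and `mul` nodes, orders the nodes with one depth-first traversal, and then propagates gradients in reverse. The names `topo_order`, `forward`, and `backward` are invented for this illustration.

```python
from collections import defaultdict

# Hypothetical toy graph: node name -> (op, input names). Not an ONNX graph.
GRAPH = {
    "x": ("input", []),
    "w": ("input", []),
    "p": ("mul", ["x", "w"]),  # many-to-one: two inputs feed one node
    "q": ("add", ["p", "x"]),  # x is reused, so its gradient has two sources
    "y": ("mul", ["q", "q"]),  # y = q * q
}

def topo_order(graph, output):
    """Post-order DFS from the output: inputs appear before their consumers."""
    order, seen = [], set()

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        for parent in graph[name][1]:
            visit(parent)
        order.append(name)

    visit(output)
    return order

def forward(graph, order, feeds):
    """Evaluate every node in topological order."""
    values = dict(feeds)
    for name in order:
        op, ins = graph[name]
        if op == "add":
            values[name] = values[ins[0]] + values[ins[1]]
        elif op == "mul":
            values[name] = values[ins[0]] * values[ins[1]]
    return values

def backward(graph, order, values, output):
    """Reverse-mode pass: walk the order backwards and accumulate gradients."""
    grads = defaultdict(float)
    grads[output] = 1.0
    for name in reversed(order):
        op, ins = graph[name]
        g = grads[name]
        if op == "add":
            grads[ins[0]] += g
            grads[ins[1]] += g
        elif op == "mul":
            a, b = ins
            grads[a] += g * values[b]
            grads[b] += g * values[a]
    return dict(grads)

order = topo_order(GRAPH, "y")
values = forward(GRAPH, order, {"x": 2.0, "w": 3.0})
grads = backward(GRAPH, order, values, "y")
```

Because gradients are accumulated with `+=`, a value consumed by several nodes (here `x` and `q`) receives the sum of all incoming gradient flows, which is the behaviour the one-to-many case requires.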

This workflow allows both inference and explanation to be performed within a single integrated ONNX graph, enabling cross-platform and real-time deployment.
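The averaging step of the Shapley calculator can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions: the function name `average_attributions` is hypothetical, and the multipliers are taken from a linear toy model, where they are simply the weights, rather than computed by a real backward graph.

```python
import numpy as np

def average_attributions(multipliers, x, references):
    """phi = mean over references r of M * (x - R_r).

    multipliers: shape (n,) if shared, or (|R|, n) with one row per reference.
    x: shape (n,). references: shape (|R|, n).
    """
    multipliers = np.asarray(multipliers, dtype=float)
    x = np.asarray(x, dtype=float)
    references = np.asarray(references, dtype=float)
    return np.mean(multipliers * (x - references), axis=0)

# Assumed toy model f(v) = w . v + b; for a linear model the multipliers equal w.
w = np.array([0.5, -2.0, 1.0])
b = 0.25
f = lambda v: v @ w + b

x = np.array([1.0, 2.0, 3.0])
R = np.array([[0.0, 0.0, 0.0],
              [1.0, 1.0, 1.0]])

phi = average_attributions(w, x, R)
```

In this linear case the attributions sum to the difference between the model output on the query and the mean output on the reference set, which is a convenient sanity check for any implementation of the formula.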

2. Shapley Value Approximation and Differentiation Strategy

ONNXExplainer addresses the computational intractability of exact Shapley value calculation:

$$\phi_i(f,x) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|! \, (|N| - |S| - 1)!}{|N|!} \left[ f(x_{S \cup \{i\}}) - f(x_S) \right]$$

by adopting a DeepLIFT-style multiplier approximation. The procedure defines differences from a reference:

  • Difference from reference: $\Delta x_i = x_i - r_i$ for each input feature and $\Delta t = t - t^{0}$ for a target neuron $t$, where $t^{0}$ is its activation on the reference input $r$
  • Contribution: $C_{\Delta x_i \Delta t}$ with $\sum_i C_{\Delta x_i \Delta t} = \Delta t$
  • Multiplier: $m_{\Delta x_i \Delta t} = C_{\Delta x_i \Delta t} / \Delta x_i$
  • Propagation via chain rule: $m_{\Delta x_i \Delta t} = \sum_j m_{\Delta x_i \Delta y_j} \, m_{\Delta y_j \Delta t}$
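As a concrete illustration of multipliers and their chain rule, the sketch below applies the standard DeepLIFT Rescale rule to a hypothetical two-layer ReLU network. The network weights, the function name `deeplift_multipliers`, and the near-zero threshold are assumptions made for this example and are not taken from ONNXExplainer.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def deeplift_multipliers(x, r, W1, b1, w2):
    """Input-to-output multipliers for y = w2 . relu(W1 @ v + b1) + b2."""
    z_x, z_r = W1 @ x + b1, W1 @ r + b1
    dz = z_x - z_r                    # difference in pre-activations
    dh = relu(z_x) - relu(z_r)        # difference in activations
    near_zero = np.abs(dz) < 1e-7
    # Rescale rule: multiplier = dh / dz, falling back to the gradient when dz ~ 0.
    m_relu = np.where(near_zero, (z_x > 0).astype(float),
                      dh / np.where(near_zero, 1.0, dz))
    # Chain rule: sum over hidden units of W1[j, i] * m_relu[j] * w2[j].
    return W1.T @ (m_relu * w2)

W1 = np.array([[1.0, -1.0], [0.5, 2.0]])
b1 = np.array([0.0, -1.0])
w2 = np.array([2.0, -1.0])
b2 = 0.5
net = lambda v: w2 @ relu(W1 @ v + b1) + b2

x = np.array([1.0, 2.0])
r = np.array([0.0, 0.0])
m = deeplift_multipliers(x, r, W1, b1, w2)
contributions = m * (x - r)
```

The contributions sum to `net(x) - net(r)`, which is the summation-to-delta property that makes the multipliers usable as a Shapley-style attribution.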

The final attribution is computed as the average over reference comparisons using the multiplier-masked difference $M \odot (X - R_r)$. This approach reduces complexity from $O(2^n)$ for exact Shapley to $O(|R|)$ for the approximate method.
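The exponential cost of the exact formula becomes visible when it is implemented by brute force. The sketch below is illustrative only: it assumes that an "absent" feature is replaced by the corresponding value of a single reference input `r`, and the function name `exact_shapley` and the toy model `f` are hypothetical.

```python
import itertools
import math
import numpy as np

def exact_shapley(f, x, r):
    """Exact Shapley values by enumerating every subset of the other features."""
    n = len(x)
    phi = np.zeros(n)
    evaluations = 0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            weight = (math.factorial(k) * math.factorial(n - k - 1)
                      / math.factorial(n))
            for subset in itertools.combinations(others, k):
                idx = np.array(subset, dtype=int)
                without_i = r.copy()
                without_i[idx] = x[idx]          # features in S come from x
                with_i = without_i.copy()
                with_i[i] = x[i]                 # additionally switch on feature i
                phi[i] += weight * (f(with_i) - f(without_i))
                evaluations += 2
    return phi, evaluations

# Assumed toy model with one interaction term.
f = lambda v: v[0] * v[1] + 2.0 * v[2]
x = np.array([1.0, 2.0, 3.0])
r = np.zeros(3)

phi, evaluations = exact_shapley(f, x, r)
```

With `n` features this loop performs `n * 2**n` model evaluations, so it is only practical for very small inputs; the multiplier-based approximation replaces the subset enumeration with one term per reference input.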

3. Graph-Level Caching and Computational Optimization

ONNXExplainer introduces a cache-based optimization that precomputes and stores all forward outputs and intermediate activations $f(R)$, $H(R)$ for the reference set. At inference/explanation time:

  • All reference activations are reused through broadcast and lookup within the ONNX computation graph.
  • Only one forward and one backward pass are required per query $X$.
  • Because reference terms come from the cache, the forward and backward cost per explanation no longer includes a fresh pass over every reference input, which substantially lowers both latency and memory consumption relative to methods such as SHAP.
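The caching pattern can be sketched with a small NumPy model. This is a hedged illustration, not the ONNX graph-level mechanism: `TinyNet` and `CachedExplainer` are hypothetical classes, the network is a two-layer ReLU model, and the backward step is written analytically with the Rescale rule instead of traversing a backward graph.

```python
import numpy as np

class TinyNet:
    """Hypothetical two-layer ReLU network that counts forward rows processed."""
    def __init__(self, W1, b1, w2, b2):
        self.W1, self.b1, self.w2, self.b2 = W1, b1, w2, b2
        self.rows_processed = 0

    def forward(self, batch):
        batch = np.atleast_2d(batch)
        self.rows_processed += batch.shape[0]
        z = batch @ self.W1.T + self.b1              # intermediate activations H(.)
        y = np.maximum(z, 0.0) @ self.w2 + self.b2   # outputs f(.)
        return z, y

class CachedExplainer:
    def __init__(self, net, references):
        self.net = net
        self.references = np.atleast_2d(references)
        # One-time pass over the reference set; results are reused afterwards.
        self.z_ref, self.y_ref = net.forward(self.references)

    def explain(self, x):
        z_x, _ = self.net.forward(x)                 # single forward pass on the query
        dz = z_x - self.z_ref                        # broadcasts to (|R|, hidden)
        dh = np.maximum(z_x, 0.0) - np.maximum(self.z_ref, 0.0)
        near_zero = np.abs(dz) < 1e-7
        m_relu = np.where(near_zero, (z_x > 0).astype(float),
                          dh / np.where(near_zero, 1.0, dz))
        M = (m_relu * self.net.w2) @ self.net.W1     # multipliers, shape (|R|, n)
        return np.mean(M * (x - self.references), axis=0)

net = TinyNet(np.array([[1.0, -1.0], [0.5, 2.0]]), np.array([0.0, -1.0]),
              np.array([2.0, -1.0]), 0.5)
R = np.array([[0.0, 0.0], [1.0, 1.0]])
explainer = CachedExplainer(net, R)
rows_after_caching = net.rows_processed
phi = explainer.explain(np.array([1.0, 2.0]))
rows_after_explaining = net.rows_processed
```

After construction, each call to `explain` adds exactly one row to `rows_processed`, regardless of how many references are cached.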

Empirical performance analysis demonstrates that ONNXExplainer can reach up to 500% speedup in explanation latency over SHAP-TensorFlow on models such as VGG19, ResNet50, DenseNet201, and EfficientNetB0, with similar improvements observed for CPU-only scenarios (Zhao et al., 2023).

4. Deployment and Resource Efficiency

ONNXExplainer's one-shot deployment approach packages all necessary components into a single ONNX file containing:

  1. The primary forward computational graph
  2. The custom backward-pass subgraph for gradient/multiplier calculations
  3. A cache subgraph storing reference activations
  4. An explanation output node providing Shapley-style attribution maps

This design allows deployment via ONNX Runtime or compatible backends (e.g. Triton) without the need for TensorFlow/PyTorch APIs or a Python interpreter. The solution is well suited for production-serving pipelines and edge deployment scenarios.

Resource utilization is quantified in Table 1 below, which shows the maximum reference set size $|R|$ that avoids out-of-memory (OOM) errors on V100 GPUs for various models and frameworks:

| Model | ONNX Opt (FP32 / FP16) | TF Opt (FP32 / FP16) | PT Opt (FP32 / FP16) |
|---|---|---|---|
| VGG19 | 86 / 166 | 79 / 149 | 97 / 175 |
| ResNet50 | 182 / 362 | 157 / 242 | 112 / 253 |
| DenseNet201 | 78 / 158 | 60 / 115 | 72 / 127 |
| EfficientNetB0 | 166 / 255 | 154 / 232 | 114 / 266 |

The cache-based approach enables larger reference sets for ONNXExplainer than the corresponding non-optimized baselines.
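One way to read Table 1 is to compute how much larger the reference set can be in FP16 than in FP32. The short script below does only that arithmetic on the values copied from the table; the names `MAX_REFS` and `fp16_gain` are hypothetical helpers for this illustration.

```python
# Values copied from Table 1: maximum reference set size as (FP32, FP16).
MAX_REFS = {
    "VGG19":          {"ONNX": (86, 166),  "TF": (79, 149),  "PT": (97, 175)},
    "ResNet50":       {"ONNX": (182, 362), "TF": (157, 242), "PT": (112, 253)},
    "DenseNet201":    {"ONNX": (78, 158),  "TF": (60, 115),  "PT": (72, 127)},
    "EfficientNetB0": {"ONNX": (166, 255), "TF": (154, 232), "PT": (114, 266)},
}

def fp16_gain(table):
    """Ratio of the FP16 limit to the FP32 limit for every model and framework."""
    return {
        model: {fw: fp16 / fp32 for fw, (fp32, fp16) in row.items()}
        for model, row in table.items()
    }

gains = fp16_gain(MAX_REFS)
for model, row in gains.items():
    print(model, {fw: round(ratio, 2) for fw, ratio in row.items()})
```

Across all entries the ratio lies between roughly 1.5 and 2.3, so halving the precision does not uniformly double the usable reference set.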

5. Implementation Practices and Parallelization

Best practices identified for ONNXExplainer include:

  • Maintaining only a single copy of reference activations in memory, releasing intermediate tensors post-caching
  • Utilizing ONNX Runtime’s thread pools for parallelizing per-node descriptor operations
  • Batching multiple input queries $X$ along the batch axis to leverage efficient backward pass parallelization
  • Segmenting large batches if memory constrained, with reference cache reused across sub-batches
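The batching and segmentation practices above can be sketched as follows. This is a minimal illustration under stated assumptions: `explain_in_chunks` is a hypothetical helper, `explain_batch` stands in for a batched explainer using a linear toy model, and the array `R` plays the role of the shared reference cache.

```python
import numpy as np

def explain_in_chunks(explain_batch, queries, max_batch):
    """Split queries along the batch axis and explain each chunk in turn.

    Assumes at least one query; every chunk reuses the same reference cache.
    """
    queries = np.asarray(queries)
    parts = [explain_batch(queries[start:start + max_batch])
             for start in range(0, len(queries), max_batch)]
    return np.concatenate(parts, axis=0)

# Assumed linear toy model: multipliers are the weights w for every reference.
w = np.array([0.5, -2.0, 1.0])
R = np.array([[0.0, 0.0, 0.0],
              [1.0, 1.0, 1.0]])   # computed once, shared by all sub-batches

def explain_batch(X):
    # (B, 1, n) - (1, |R|, n) -> (B, |R|, n), then average over references.
    return np.mean(w * (X[:, None, :] - R[None, :, :]), axis=1)

X = np.arange(15, dtype=float).reshape(5, 3)
chunked = explain_in_chunks(explain_batch, X, max_batch=2)
full = explain_batch(X)
```

Chunking changes only peak memory, not the result: `chunked` and `full` contain the same attributions.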

These strategies further enhance scalability for both GPU and CPU inference backends. The system currently covers more than 25 ONNX operators, supporting key model classes (e.g. CNNs) but not yet RNNs, loops, or custom ops.

6. Trade-offs, Use Cases, and Limitations

ONNXExplainer’s use of DeepLIFT multipliers in place of exact Shapley values introduces a trade-off, reducing computational complexity to $O(|R|)$ at the expense of approximation fidelity. While a larger reference set improves attribution quality, it increases memory consumption.

This architecture is most appropriate for:

  • Real-time integrated explanation in production pipelines such as fraud detection
  • Edge or mobile deployment scenarios where Python-based frameworks are infeasible
  • Serving environments demanding tight bounds on explanation latency and throughput

Documented limitations include incomplete ONNX operator coverage (notably the operators used by RNNs) and sporadic latency spikes at certain batch sizes under ONNX Runtime. Reference set sampling and feature pruning are noted as possible sources of further speedups. Extending operator coverage and addressing the batch-specific runtime latencies are identified as ongoing and future work.


ONNXExplainer constitutes a substantial advancement in framework-independent neural network explainability via Shapley-style attributions within the ONNX ecosystem, optimizing both computational efficiency and deployment flexibility (Zhao et al., 2023).
