ONNXExplainer: Real-Time Shapley Attributions

Updated 2 July 2026

ONNXExplainer is a framework-agnostic explainer that computes Shapley-style feature attributions by integrating forward and backward graphs within the ONNX runtime.
It employs custom reverse-mode automatic differentiation and DeepLIFT multipliers to approximate Shapley values, reducing computational complexity from O(2^n) to O(|R|).
The system uses cache-based optimization for one-shot, real-time deployment, substantially reducing explanation latency relative to existing explainers such as SHAP.

ONNXExplainer is a generic, framework-agnostic explainer that computes Shapley-value–style feature attributions for neural networks represented in the ONNX (Open Neural Network Exchange) format. Designed to address the inefficiencies and lack of cross-platform support in existing explainers such as SHAP for TensorFlow and PyTorch, ONNXExplainer introduces custom automatic differentiation, graph-level optimizations, and one-shot deployment, enabling efficient, model-agnostic, and real-time explainability directly within the ONNX runtime ecosystem (Zhao et al., 2023).

1. System Architecture and Workflow

ONNXExplainer orchestrates model explainability by transforming the ONNX model into integrated forward and backward graphs. The principal architecture consists of the following components:

Model Loader & Parser: Loads a frozen ONNX computational graph from storage, constructs a forward symbolic graph, and inverts this into a backward graph where each node contains dedicated "flow-in" and "flow-out" slots for tracking gradients.
Gradient Engine / Automatic Differentiation: Implements a custom reverse-mode AD mechanism through a single DFS over the backward graph. Supports four gradient propagation types (one-to-one, many-to-one, one-to-many, many-to-many). Local gradients are derived for standard ONNX ops (e.g. MatMul, Conv, Add, Pooling) using exact partial derivatives for linear ops and DeepLIFT multipliers for nonlinear ops.
Shapley Calculator: Uses DeepLIFT multipliers to induce efficient Shapley value approximations over a user-defined reference set $R$:

$\phi \approx \frac{1}{|R|} \sum_{r \in R} M \odot (X - R_r)$

where $M$ is the computed multiplier matrix, $X$ is the query input, and $R_r$ is the reference input indexed by $r$ in the reference set $R$.

Optimizer / Cache: Precomputes and caches the outputs $f(R)$ and intermediate activations $H(R)$ of the reference set $R$ during graph construction. During explanation, only a single forward and backward pass on $X$ is executed, while reference terms are retrieved from the cache.
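The gradient engine's reverse-mode pass can be illustrated with a toy graph. The following is a minimal sketch in plain Python, not ONNXExplainer's implementation: the `Node` class, the two supported ops, and the single `grad` slot per node are all hypothetical simplifications. It shows a depth-first traversal ordering the nodes, a one-to-many case (`x` feeds two consumers, so its incoming gradients are summed), and a many-to-one case (`add` and `mul` each take two inputs).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical graph node: an op, its input nodes, a value, and a gradient slot."""
    name: str
    op: str                                   # "input", "add", or "mul"
    inputs: list = field(default_factory=list)
    value: float = 0.0
    grad: float = 0.0                         # accumulated incoming gradient


def topo_order(output):
    """Post-order DFS from the output; every node appears after its inputs."""
    seen, order = set(), []

    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent in node.inputs:
            visit(parent)
        order.append(node)

    visit(output)
    return order


def forward(order):
    """Evaluate nodes in topological order."""
    for node in order:
        if node.op == "add":
            node.value = sum(p.value for p in node.inputs)
        elif node.op == "mul":
            node.value = node.inputs[0].value * node.inputs[1].value


def backward(output):
    """Reverse-mode pass: walk the DFS order backwards, pushing gradients to inputs."""
    order = topo_order(output)
    for node in order:
        node.grad = 0.0
    output.grad = 1.0
    for node in reversed(order):
        if node.op == "add":                  # d(a + b)/da = d(a + b)/db = 1
            for parent in node.inputs:
                parent.grad += node.grad
        elif node.op == "mul":                # d(a * b)/da = b, d(a * b)/db = a
            a, b = node.inputs
            a.grad += node.grad * b.value
            b.grad += node.grad * a.value


# out = (x + y) * x, so d(out)/dx = 2x + y and d(out)/dy = x
x = Node("x", "input", value=3.0)
y = Node("y", "input", value=4.0)
s = Node("s", "add", [x, y])
out = Node("out", "mul", [s, x])
forward(topo_order(out))
backward(out)
print(out.value, x.grad, y.grad)  # 21.0 10.0 3.0
```

In this sketch `x.grad` receives one contribution through `s` and one directly from `out`, which is the accumulation that separate "flow-in" slots would track in a fuller design.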

This workflow allows both inference and explanation to be performed within a single integrated ONNX graph, enabling cross-platform and real-time deployment.
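As a concrete illustration of the attribution formula, the sketch below evaluates it with NumPy for a linear model, where the multipliers are simply the weights. The function name, the toy weights, and the choice to store one multiplier row per reference are assumptions made for illustration (in a nonlinear network the multipliers generally differ per reference).

```python
import numpy as np


def shapley_style_attribution(multipliers, x, references):
    """phi = (1/|R|) * sum_r M_r * (x - R_r).

    multipliers: (|R|, n), one row of multipliers per reference (assumed layout)
    x:           (n,) query input
    references:  (|R|, n) reference inputs
    """
    return np.mean(multipliers * (x[None, :] - references), axis=0)


# Toy linear model f(z) = w . z + b; its multipliers equal w for every reference.
w = np.array([2.0, -1.0, 0.5])
b = 0.1


def f(z):
    return z @ w + b


x = np.array([1.0, 2.0, 3.0])
R = np.array([[0.0, 0.0, 0.0],
              [1.0, 1.0, 1.0]])
M = np.tile(w, (len(R), 1))

phi = shapley_style_attribution(M, x, R)

# Attributions sum to the gap between f(x) and the mean reference output.
assert np.isclose(phi.sum(), f(x) - f(R).mean())
```

The closing assertion checks the summation property that makes the attributions interpretable: they account for the full difference between the query output and the average reference output.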

2. Shapley Value Approximation and Differentiation Strategy

ONNXExplainer addresses the computational intractability of exact Shapley value calculation:

$\phi_i(f,x) = \sum_{S \subseteq N \backslash \{i\}} \frac{|S|! (|N| - |S| - 1)!}{|N|!} \; [f(x_{S \cup \{i\}}) - f(x_S)]$

by adopting a DeepLIFT-style multiplier approximation. The procedure defines differences from a reference:

Difference from reference: $\Delta x_i = x_i - r_i$ and $\Delta t = t(x) - t(r)$ for input $x$, reference $r$, and target neuron $t$
Contribution: $C_{\Delta x_i \Delta t}$ with $\sum_i C_{\Delta x_i \Delta t} = \Delta t$
Multiplier: $m_{\Delta x_i \Delta t} = \frac{C_{\Delta x_i \Delta t}}{\Delta x_i}$
Propagation via chain rule: $m_{\Delta x_i \Delta t} = \sum_j m_{\Delta x_i \Delta y_j} \, m_{\Delta y_j \Delta t}$
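The multiplier and chain-rule steps can be traced on a tiny network. The following NumPy sketch assumes the DeepLIFT Rescale rule for ReLU (multiplier = output difference divided by input difference, falling back to the ordinary gradient when the input difference is zero) and a toy network $y = v \cdot \mathrm{relu}(Wx + b)$; the weights and helper names are made up for illustration.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def rescale_multiplier(h_x, h_r, eps=1e-12):
    """Rescale-rule multiplier for ReLU: delta_out / delta_in per hidden unit."""
    d_in = h_x - h_r
    d_out = relu(h_x) - relu(h_r)
    nonzero = np.abs(d_in) > eps
    grad = (h_x > 0).astype(float)            # fallback when delta_in is ~0
    return np.where(nonzero, d_out / np.where(nonzero, d_in, 1.0), grad)


# Toy network: y = v . relu(W x + b)
W = np.array([[1.0, -2.0],
              [0.5, 1.0]])
b = np.array([0.0, -1.0])
v = np.array([1.0, 2.0])


def net(z):
    return v @ relu(W @ z + b)


x = np.array([2.0, 0.5])                      # query input
r = np.array([0.0, 1.0])                      # single reference input

h_x, h_r = W @ x + b, W @ r + b
m_hidden = rescale_multiplier(h_x, h_r)       # multipliers across the ReLU
# Chain rule: linear layers contribute their weights as multipliers.
m_input = W.T @ (m_hidden * v)
contrib = m_input * (x - r)                   # C_i = m_i * delta_x_i

# Summation-to-delta: contributions add up to the output difference.
assert np.isclose(contrib.sum(), net(x) - net(r))
```

Averaging `contrib` over several references would give the reference-averaged attribution used in the final step.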

The final attribution is computed as the average over reference comparisons using the multiplier-masked difference $M \odot (X - R_r)$. This approach reduces complexity from $O(2^n)$ for exact Shapley to $O(|R|)$ for the approximate method.
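To make the exponential cost of the exact formula tangible, here is a brute-force sketch in plain Python that enumerates every subset $S$. It assumes that "absent" features are replaced by a single baseline vector, which is one common convention rather than something the formula itself fixes; the function name and the toy model are illustrative.

```python
from itertools import combinations
from math import factorial


def exact_shapley(f, x, baseline):
    """Exact Shapley values by enumerating all subsets (exponential in len(x))."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):                     # subset sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for S in combinations(others, k):
                # Features in S (and i, for the first call) take the query value;
                # all other features take the baseline value (assumed convention).
                with_i = [x[j] if (j in S or j == i) else baseline[j] for j in range(n)]
                without_i = [x[j] if j in S else baseline[j] for j in range(n)]
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi


def toy_model(z):
    """Features 0 and 1 interact; feature 2 acts alone."""
    return z[0] * z[1] + z[2]


phi = exact_shapley(toy_model, x=[1.0, 2.0, 3.0], baseline=[0.0, 0.0, 0.0])
# The interaction term (worth 2) is split evenly between features 0 and 1.
assert all(abs(a - e) < 1e-9 for a, e in zip(phi, [1.0, 1.0, 3.0]))
```

Each feature requires $2^{n-1}$ subsets and two model evaluations per subset, which is why this enumeration is only feasible for a handful of features.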

3. Graph-Level Caching and Computational Optimization

ONNXExplainer introduces a cache-based optimization that precomputes and stores all forward outputs and intermediate activations $f(R)$, $H(R)$ for the reference set. At inference/explanation time:

All reference activations are reused through broadcast and lookup within the ONNX computation graph.
Only one forward and one backward pass are required per query $X$.
This reduces the forward and backward computational cost from $O(|R|)$ to $O(1)$ per explanation, dramatically lowering both latency and memory consumption relative to methods such as SHAP.
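The caching idea can be sketched outside ONNX with NumPy. The `ReferenceCache` class below is hypothetical and works on a toy network $y = v \cdot \mathrm{relu}(Wx + b)$: reference activations are computed once in the constructor, and each call to `explain` runs the forward computation on the query only, broadcasting against the cached values. It assumes the Rescale rule for the ReLU multipliers.

```python
import numpy as np


class ReferenceCache:
    """Hypothetical cache of H(R) and f(R) for a toy net y = v . relu(W x + b)."""

    def __init__(self, W, b, v, references):
        self.W, self.b, self.v = W, b, v
        self.references = references                  # shape (|R|, n)
        self.h_ref = references @ W.T + b             # H(R): computed once
        self.a_ref = np.maximum(self.h_ref, 0.0)
        self.f_ref = self.a_ref @ v                   # f(R): computed once

    def explain(self, x):
        """Forward and backward work for the query only; references come from the cache."""
        h_x = self.W @ x + self.b                     # forward pass on the query
        a_x = np.maximum(h_x, 0.0)
        d_in = h_x - self.h_ref                       # broadcast against cached H(R)
        d_out = a_x - self.a_ref
        nonzero = np.abs(d_in) > 1e-12
        grad = np.broadcast_to((h_x > 0).astype(float), d_in.shape)
        m_hidden = np.where(nonzero, d_out / np.where(nonzero, d_in, 1.0), grad)
        m_input = (m_hidden * self.v) @ self.W        # backward step, shape (|R|, n)
        phi = np.mean(m_input * (x - self.references), axis=0)
        return a_x @ self.v, phi


W = np.array([[1.0, -2.0],
              [0.5, 1.0]])
b = np.array([0.0, -1.0])
v = np.array([1.0, 2.0])
cache = ReferenceCache(W, b, v, references=np.array([[0.0, 1.0], [1.0, 0.0]]))

# Two different queries reuse the same cached reference activations.
for query in (np.array([2.0, 0.5]), np.array([-1.0, 3.0])):
    y, phi = cache.explain(query)
    assert np.isclose(phi.sum(), y - cache.f_ref.mean())
```

The reference forward pass appears only in `__init__`, so its cost is paid once regardless of how many queries are explained afterwards.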

Empirical performance analysis demonstrates that ONNXExplainer can reach up to 500% speedup in explanation latency over SHAP-TensorFlow on models such as VGG19, ResNet50, DenseNet201, and EfficientNetB0, with similar improvements observed for CPU-only scenarios (Zhao et al., 2023).

4. Deployment and Resource Efficiency

ONNXExplainer's one-shot deployment approach packages all necessary components into a single ONNX file containing: (1) the primary forward computational graph; (2) the custom backward-pass subgraph for gradient and multiplier calculations; (3) a cache subgraph storing reference activations; and (4) an explanation output node providing Shapley-style attribution maps.

This design allows deployment via ONNX Runtime or compatible backends (e.g. Triton) without the need for TensorFlow or PyTorch APIs or a Python interpreter. The solution is well suited to production-serving pipelines and edge deployment scenarios.
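For illustration, serving such a combined file could look like the sketch below, which uses the ONNX Runtime Python bindings (`InferenceSession`, `get_inputs`, `get_outputs`, `run`). Python is used here only for brevity; the same session calls exist in ONNX Runtime's other language bindings. The assumption that the file has a single input and exposes both prediction and attribution outputs is hypothetical, as are the function name and the file name in the usage note.

```python
import numpy as np


def run_explainer_model(model_path, x):
    """Run a combined inference-plus-explanation ONNX file and return all outputs by name.

    Assumes (hypothetically) a single-input model whose outputs include both the
    prediction and the attribution map.
    """
    import onnxruntime as ort                 # imported lazily; needs the onnxruntime package

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    input_name = session.get_inputs()[0].name
    output_names = [o.name for o in session.get_outputs()]
    outputs = session.run(None, {input_name: x.astype(np.float32)})
    return dict(zip(output_names, outputs))
```

A call such as `run_explainer_model("model_with_explainer.onnx", batch)` (file name made up) would then return a dictionary keyed by whatever output names the exported graph defines, with no TensorFlow or PyTorch import involved.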

Resource utilization is quantified in Table 1 below, which reports the maximum reference set size $|R|$ that avoids out-of-memory (OOM) errors on V100 GPUs for various models and frameworks:

Model	ONNX Opt FP32 / FP16	TF Opt FP32 / FP16	PT Opt FP32 / FP16
VGG19	86 / 166	79 / 149	97 / 175
ResNet50	182 / 362	157 / 242	112 / 253
DenseNet201	78 / 158	60 / 115	72 / 127
EfficientNetB0	166 / 255	154 / 232	114 / 266

The cache-based approach enables higher $|R|$ values for ONNXExplainer than corresponding non-optimized baselines.
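One rough way to reason about how precision affects the feasible reference set size is a back-of-envelope estimate. The sketch below assumes, purely for illustration, that the cached reference activations dominate memory and that every reference stores the same number of activation values; the memory budget and activation count are made-up numbers, not measurements.

```python
def max_reference_count(budget_bytes, activations_per_reference, bytes_per_element):
    """Largest |R| whose cached activations fit in the budget (simplified model)."""
    return budget_bytes // (activations_per_reference * bytes_per_element)


budget = 8 * 2**30                  # hypothetical 8 GiB available for the cache
activations = 20_000_000            # hypothetical activation values per reference

fp32_limit = max_reference_count(budget, activations, bytes_per_element=4)
fp16_limit = max_reference_count(budget, activations, bytes_per_element=2)
print(fp32_limit, fp16_limit)       # 107 214
```

Under this simplified model, halving the element size doubles the reference budget. Real limits also depend on weights, workspace buffers, and allocator behavior, so measured ratios can deviate from two.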

5. Implementation Practices and Parallelization

Best practices identified for ONNXExplainer include:

Maintaining only a single copy of reference activations in memory, releasing intermediate tensors post-caching
Utilizing ONNX Runtime’s thread pools for parallelizing per-node descriptor operations
Batching multiple input queries $X$ along the batch axis to leverage efficient backward pass parallelization
Segmenting large batches if memory constrained, with reference cache reused across sub-batches
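The batching and segmentation practices can be sketched with NumPy. The helper names below are hypothetical, and a linear model stands in for the network so that the batched explainer fits in a few lines; for a linear model, averaging $w \odot (x - r)$ over references equals $w \odot (x - \bar{r})$, so the cached quantity is just the reference mean.

```python
import numpy as np


def make_linear_explainer(w, references):
    """Toy batched explainer for f(x) = w . x; the reference term is cached once."""
    ref_mean = references.mean(axis=0)        # cached and shared by every sub-batch

    def explain_batch(queries):               # queries: (batch, n)
        return w * (queries - ref_mean)       # attributions: (batch, n)

    return explain_batch


def explain_in_sub_batches(explain_batch, queries, max_batch):
    """Split queries along the batch axis when memory is tight, then reassemble."""
    parts = [explain_batch(queries[i:i + max_batch])
             for i in range(0, len(queries), max_batch)]
    return np.concatenate(parts, axis=0)


rng = np.random.default_rng(0)
w = rng.normal(size=4)
references = rng.normal(size=(8, 4))
queries = rng.normal(size=(10, 4))

explain_batch = make_linear_explainer(w, references)
full = explain_batch(queries)
chunked = explain_in_sub_batches(explain_batch, queries, max_batch=3)

# Segmenting changes peak memory, not the result.
assert np.allclose(full, chunked)
```

Because the closure holds the cached reference term, each sub-batch reuses it rather than recomputing it.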

These strategies further enhance scalability for both GPU and CPU inference backends. The system currently covers more than 25 ONNX operators, supporting key model classes (e.g. CNNs) but not yet RNNs, loops, or custom ops.

6. Trade-offs, Use Cases, and Limitations

ONNXExplainer’s use of DeepLIFT multipliers in place of exact Shapley values introduces a trade-off, reducing computational complexity to $O(|R|)$ at the expense of approximation fidelity. While a larger reference set improves attribution quality, it increases memory consumption.

This architecture is most appropriate for:

Real-time integrated explanation in production pipelines such as fraud detection
Edge or mobile deployment scenarios where Python-based frameworks are infeasible
Serving environments demanding tight bounds on explanation latency and throughput

Documented limitations include incomplete ONNX operator coverage (notably RNNs) and sporadic latency spikes for certain batch sizes under ONNX Runtime. Reference set sampling and feature pruning are noted as possible sources of further speedup. Extending operator coverage and resolving the batch-specific latency spikes are identified as ongoing and future work.

ONNXExplainer constitutes a substantial advancement in framework-independent neural network explainability via Shapley-style attributions within the ONNX ecosystem, optimizing both computational efficiency and deployment flexibility (Zhao et al., 2023).


References

ONNXExplainer: an ONNX Based Generic Framework to Explain Neural Networks Using Shapley Values (2023)
