Layer-wise Relevance Propagation (LRP)

Updated 2 September 2025
  • Layer-wise Relevance Propagation (LRP) is an explanation method that redistributes prediction scores layer by layer to identify contributions of input components.
  • It employs diverse rules, such as ε-LRP, α/β-LRP, and Taylor decomposition, to ensure interpretability and numerical stability in complex models.
  • LRP is applied in CNNs, NLP, transformers, and other architectures, offering actionable insights for debugging models and enhancing trust in AI applications.

Layer-wise Relevance Propagation (LRP) is a family of explanation algorithms for interpreting complex, non-linear machine learning classifiers—including deep neural networks, Fisher Vector models, and structured data models—by redistributing the prediction score backward through the network's architecture. The core principle is to attribute the model’s output, layer by layer, down to contributions of input components such as image pixels, tokens, features, or neurons. Developed to ensure interpretability, LRP decomposes a prediction into local, typically signed, contributions while maintaining a (generalized) conservation property at each layer.

1. Theoretical Foundations and General Formulation

LRP algorithms propagate “relevance” from output to input based on local redistribution rules. For standard deep neural networks, the most general local rule is

$$R_i^{(l)} = \sum_j \frac{z_{ij}}{z_j}\, R_j^{(l+1)}$$

where $R_j^{(l+1)}$ is the relevance at neuron $j$ in layer $l+1$, $z_{ij}$ is the contribution (e.g., activation-weight product) from neuron $i$ to neuron $j$, and $z_j = \sum_i z_{ij} + b_j$ is the total pre-activation (including the bias). The process is initialized at the output layer by setting the relevance to the model's prediction $f(x)$ for a chosen class.
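
For concreteness, here is a minimal numpy sketch of this generic rule for a single dense layer; the function and variable names are illustrative, not taken from the cited papers, and the unstabilized division is exactly what the variants below address:

```python
import numpy as np

def lrp_dense(a, W, b, R_next):
    """Basic LRP rule for one dense layer.

    a      : (I,) activations entering the layer
    W      : (I, J) weight matrix
    b      : (J,) bias vector
    R_next : (J,) relevance assigned to the layer's outputs
    Returns the (I,) relevance of the layer's inputs.
    """
    z_ij = a[:, None] * W            # contribution of input i to output j
    z_j = z_ij.sum(axis=0) + b       # total pre-activation per output neuron
    # Note: z_j near zero makes this division unstable (see epsilon-LRP below).
    return (z_ij / z_j * R_next).sum(axis=1)
```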

Several variants exist:

  • $\varepsilon$-LRP: Adds a stabilizing term to the denominator to avoid numerical instability when $z_j$ is near zero.
  • $\alpha/\beta$-LRP: Redistributes positive and negative contributions through separate, individually normalized pathways. For example, with $\beta = 1$ (the LRP-$\alpha_2\beta_1$ rule, where $\alpha - \beta = 1$ preserves conservation), the decomposition best matches the supporting and inhibiting decision structure of ReLU-based CNNs.
  • Taylor/Deep Taylor Decomposition: Uses first-order Taylor approximations to handle non-standard nonlinearities; for example, product-type nonlinearities in renormalization layers.
  • R-LRP: A relative formulation that computes relevances without dividing by small values by normalizing with connectivity degrees, preserving conservation up to a constant factor and eliminating the need for tunable hyperparameters (Nyiri et al., 24 Jan 2025).

These rules are tuned for the architecture under consideration and often combined in composite (layer-dependent) strategies for robust attribution (Kohlbrenner et al., 2019).
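
As a hedged illustration of the two most common variants, the sketch below adds the $\varepsilon$ stabilizer and the $\alpha/\beta$ split to the basic dense-layer rule above; names and defaults are illustrative, and $\alpha - \beta = 1$ keeps the redistribution conservative:

```python
import numpy as np

def lrp_epsilon(a, W, b, R_next, eps=1e-6):
    # epsilon-LRP: push the denominator away from zero, preserving its sign.
    z_ij = a[:, None] * W
    z_j = z_ij.sum(axis=0) + b
    z_j = z_j + eps * np.where(z_j >= 0, 1.0, -1.0)
    return (z_ij / z_j * R_next).sum(axis=1)

def lrp_alpha_beta(a, W, b, R_next, alpha=2.0, beta=1.0):
    # alpha/beta-LRP: normalize positive and negative contributions
    # separately; the beta term subtracts the inhibiting share.
    z_ij = a[:, None] * W
    z_pos, z_neg = np.clip(z_ij, 0, None), np.clip(z_ij, None, 0)
    s_pos = z_pos.sum(axis=0) + np.clip(b, 0, None) + 1e-12
    s_neg = z_neg.sum(axis=0) + np.clip(b, None, 0) - 1e-12
    frac = alpha * z_pos / s_pos - beta * z_neg / s_neg
    return (frac * R_next).sum(axis=1)
```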

2. Domain-specific Adaptations and Extensions

LRP permits principled explanation of a wide range of architectures by adapting its rules:

  • Convolutional Neural Networks (CNNs): Relevance is assigned to image pixels, accounting for the local receptive field and nonlinearities. Adaptation to normalization and pooling layers is achieved via fusion techniques or dedicated rules (e.g., Taylor expansion for local renormalization (Binder et al., 2016); fusion with normalization layers for batch normalization (Guillemot et al., 2020)).
  • Bag-of-Features and Fisher Vector Models: For models aggregating local image descriptors, LRP is used to map aggregated relevances onto spatial regions by uniform distribution within local receptive fields, employing stabilization ($\varepsilon = 100$) for numerical robustness (Bach et al., 2016).
  • Natural Language Processing: LRP propagates class outputs to word embeddings to generate token-level or word-level relevances. This produces signed attributions that distinguish supporting and contradictory evidence, outperforming sensitivity analysis in qualitative and quantitative evaluations (Arras et al., 2016).
  • Recurrent Neural Networks and Reservoir Computing: In LSTMs, GRUs, and Echo State Networks (ESNs), LRP introduces specific propagation rules for weighted and multiplicative gates, temporal unfolding to attribute relevance across time, and management of memory decay effects via reservoir leak rate (Arras et al., 2017, Landt-Hayen et al., 2022, Landt-Hayen et al., 2023).
  • Structured/tabular Data: LRP extends to 1D CNNs on structured datasets for sample-wise and holistic explanation, enabling competitive feature subset selection and significantly outpacing LIME/SHAP in computation time (Ullah et al., 2020).
  • Transformer Architectures: LRP-based approaches such as AttnLRP and PA-LRP propagate attribution through softmax attention, feedforward, and positional encoding layers using linearization, Taylor expansion, or symmetry principles to ensure full coverage and conservation of both semantic and positional components (Achtibat et al., 8 Feb 2024, Bakish et al., 2 Jun 2025). Specialized propagation handles matrix operations, positional encodings (both absolute and rotary), and allows holistic attribution over all latent representations.
  • Residual Networks (ResNet): The presence of skip connections in ResNets requires explicit relevance splitting at branch merges, either symmetrically or by ratio-based allocation proportional to the activations of each branch, guaranteeing conservation where the branches recombine (Otsuki et al., 12 Jul 2024); a minimal sketch of such a split follows this list.
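
The following is a minimal sketch of relevance splitting at a residual merge, assuming an elementwise sum of a skip branch and a residual branch; the exact allocation rule is defined in the cited work, so this shows only the general shape:

```python
import numpy as np

def split_residual_relevance(a_skip, a_branch, R_merge, mode="ratio"):
    """Split relevance at a ResNet merge y = a_skip + a_branch.

    'symmetric' halves the relevance per branch; 'ratio' allocates it in
    proportion to each branch's absolute activation. Both modes keep the
    branch relevances summing to R_merge (conservation at the merge).
    """
    if mode == "symmetric":
        return 0.5 * R_merge, 0.5 * R_merge
    denom = np.abs(a_skip) + np.abs(a_branch) + 1e-12
    R_skip = R_merge * np.abs(a_skip) / denom
    return R_skip, R_merge - R_skip
```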

3. Control of Explanation Resolution and Semantics

LRP incorporates the concept of decomposition depth (“mapping influence cut-off”), controlling how far relevance is redistributed before switching to uniform or spatially pooled assignment. For convolutional networks, this enables tuning the spatial granularity of the resulting heatmap—from highly local, pixel-level attribution for deep decompositions to more semantic, region-based heatmaps for early cut-offs. In bag-of-features classifiers, the cut-off is set naturally at the descriptor aggregation stage, producing coarser output (Bach et al., 2016). This tradeoff allows practitioners to select between fine-grained local explanations and global semantic focus.
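
As a toy illustration of an early cut-off, the sketch below stops propagation at a coarse feature map and spreads each unit's relevance uniformly over a square pixel support; the cell size is a stand-in for the actual receptive-field geometry:

```python
import numpy as np

def uniform_cutoff(R_feat, cell=4):
    """Spread each coarse-map relevance value uniformly over its
    cell x cell pixel support; dividing by cell**2 conserves the total."""
    return np.kron(R_feat, np.ones((cell, cell))) / cell**2
```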

4. Empirical Evaluation and Application Domains

LRP has been empirically validated across a spectrum of domains and architectures:

  • Image Recognition: Datasets such as PASCAL VOC 2007 and ImageNet serve as standard benchmarks. LRP heatmaps identify discriminative object parts and allow analysis of semantic focus in prediction.
  • Medical Imaging: In MRI-based Alzheimer's disease classification, LRP heatmaps correlate well with known pathological regions (e.g., temporal lobe, hippocampus), and size-corrected metrics such as relevance density and gain provide quantifiable region-level insights. LRP explanations remain model-dependent and parameter-sensitive (Böhle et al., 2019).
  • Recommendation Systems: Pixel-level LRP explanations in DCNN-based recommendation models facilitate trust and debuggability in product recommendation scenarios (Bharadhwaj, 2018).
  • Tabular Feature Analysis: In credit fraud and churn prediction, LRP identifies dominant features in individual and aggregate explanations, with LRP-driven feature selection supporting competitive or improved downstream model performance (Ullah et al., 2020).
  • Transformer Explainability: AttnLRP, LRP-QViT and PA-LRP frameworks introduce sophisticated layer- and operation-specific attribution rules, producing more faithful attributions for both language and vision models compared to rollout or gradient-based explanations, with holistic coverage including positional encodings (Achtibat et al., 8 Feb 2024, Ranjan et al., 20 Jan 2024, Bakish et al., 2 Jun 2025).

Evaluation metrics include the inside-total relevance ratio, weighted variants, Insertion-Deletion (ID) scores, AUAC/AU-MSE in perturbation studies, segmentation-based precision (e.g., mean IoU), and domain-specific metrics (such as relevance density and gain in neuroimaging).
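
For example, a deletion-style perturbation score can be sketched as follows, with `predict` standing in for the model's class-score function and `baseline` for the occlusion value (both are placeholders); lower areas under the deletion curve indicate more faithful heatmaps:

```python
import numpy as np

def deletion_auc(x, relevance, predict, steps=20, baseline=0.0):
    """Deletion metric: occlude features most-relevant-first and track
    the predicted score; returns the normalized area under the curve."""
    order = np.argsort(relevance.ravel())[::-1]       # most relevant first
    x_run, scores = x.copy().ravel(), [predict(x)]
    chunk = max(1, len(order) // steps)
    for k in range(0, len(order), chunk):
        x_run[order[k:k + chunk]] = baseline
        scores.append(predict(x_run.reshape(x.shape)))
    return np.trapz(scores) / (len(scores) - 1)
```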

5. Practical Considerations, Limitations, and Best Practices

Key best practices have emerged for LRP application:

  • Composite Decomposition: Empirical studies favor composite strategies that select specific LRP rules per layer/operation type over uniform application, yielding more robust, localized, and class-discriminative attributions, especially in deep architectures subject to gradient shattering (Kohlbrenner et al., 2019); a toy composite backward pass is sketched after this list.
  • Numerical Stability: Division by small denominators is a critical challenge in standard LRP; stabilization ($\varepsilon$-rules), relative normalization (R-LRP), or special handling of skip connections address these issues (Nyiri et al., 24 Jan 2025, Otsuki et al., 12 Jul 2024).
  • Parameter Sensitivity and Artifacts: Explanation quality depends on model training, parameter choices, and architecture. Regularization (e.g., L1) can improve focus but may reveal model-inherent artifacts, particularly in CNNs with shared kernels or ESNs with strong memory decay (Landt-Hayen et al., 2023).
  • Extension to Novel Operations: New architectures require additions—e.g., Taylor expansion for product nonlinearities, explicit attention and positional encoding attribution in transformers, relevance splitting for residual branches (Binder et al., 2016, Achtibat et al., 8 Feb 2024, Otsuki et al., 12 Jul 2024, Bakish et al., 2 Jun 2025).
  • Computational Efficiency: LRP can be implemented in a single backward pass, offering speed advantages over perturbation-based methods such as LIME/SHAP in structured data (Ullah et al., 2020), and scalable GPU implementations for large transformer models (Achtibat et al., 8 Feb 2024).
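
A toy composite backward pass in the spirit of (Kohlbrenner et al., 2019), reusing the dense-layer rule sketches from Section 1 and treating convolutions as unrolled dense layers purely for illustration:

```python
def composite_lrp(layers, R_out):
    """layers: list of dicts with keys 'kind' ('conv' or 'dense'),
    activations 'a', weights 'W', and biases 'b', ordered input-to-output.
    Applies alpha/beta-LRP to lower (conv) layers and epsilon-LRP to
    upper (dense) layers, as composite strategies recommend."""
    R = R_out
    for layer in reversed(layers):
        rule = lrp_alpha_beta if layer["kind"] == "conv" else lrp_epsilon
        R = rule(layer["a"], layer["W"], layer["b"], R)
    return R
```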

Current limitations include dependence on the underlying model for biological or semantic fidelity, sensitivity to rule and parameter selection, and, in R-LRP, a conservation property that holds only up to a constant factor rather than as a strict equality. For some tasks (e.g., structured data), the interpretability of LRP-derived features depends on domain characteristics and application context.

6. Impact and Future Directions

LRP provides a rigorous, architecturally adaptable suite of techniques for model interpretation. It is well established in computer vision, medical image analysis, recommendation, NLP, and time series domains. Its methods have informed both scientific validation (e.g., correlating heatmaps with known physiological structures) and practical improvements (e.g., reducing shortcut learning through LRP-optimized training (Bassi et al., 2022)).

Emerging research directions include broader incorporation into the training loop (for robust generalization control via explanation-based losses), refined handling for modern architectures (e.g., self-attention, positional embeddings, residual and normalization layers), and extension to neuron-level or concept-based explanations using network graph traversal and deconvolutional visualization (Bhati et al., 7 Dec 2024). The adaptability of LRP rules for structured or sequence data, transformers, and hybrid models is an ongoing area of active development, as evidenced by state-of-the-art advances in transformer quantization, latent explanation, and positional relevance decomposition (Ranjan et al., 20 Jan 2024, Achtibat et al., 8 Feb 2024, Bakish et al., 2 Jun 2025).

LRP continues to be foundational for trustworthy AI by providing transparent, quantitative, and domain-adaptable explanations that can be evaluated, compared, and used to guide both model debugging and scientific understanding across a wide array of applications.
