
Uncertainty propagation in feed-forward neural network models

Published 27 Mar 2025 in cs.LG and stat.ML | (2503.21059v2)

Abstract: We develop new uncertainty propagation methods for feed-forward neural network architectures with leaky ReLU activation functions subject to random perturbations in the input vectors. In particular, we derive analytical expressions for the probability density function (PDF) of the neural network output and its statistical moments as a function of the input uncertainty and the parameters of the network, i.e., weights and biases. A key finding is that an appropriate linearization of the leaky ReLU activation function yields accurate statistical results even for large perturbations in the input vectors. This can be attributed to the way information propagates through the network. We also propose new analytically tractable Gaussian copula surrogate models to approximate the full joint PDF of the neural network output. To validate our theoretical results, we conduct Monte Carlo simulations and a thorough error analysis on a multi-layer neural network representing a nonlinear integro-differential operator between two polynomial function spaces. Our findings demonstrate excellent agreement between the theoretical predictions and Monte Carlo simulations.

Summary

  • The paper presents analytical and semi-analytical methods for uncertainty propagation in feed-forward neural networks with Leaky ReLU activations.
  • It details techniques to derive probability density functions and statistical moments of network outputs and uses Gaussian copula surrogate models for joint distributions.
  • Validation against Monte Carlo simulations showed excellent agreement, indicating these methods offer a computationally efficient alternative for quantifying NN uncertainty.

The challenge of quantifying and propagating uncertainty through feed-forward neural networks (FNNs) is significant, particularly when input data is subject to random perturbations. Standard methods like Monte Carlo (MC) simulations can be computationally prohibitive, especially for complex networks or high-dimensional inputs. The paper "Uncertainty propagation in feed-forward neural network models" (2503.21059) presents analytical and semi-analytical techniques specifically designed for FNNs employing Leaky Rectified Linear Unit (Leaky ReLU) activation functions. These methods aim to derive the probability density function (PDF) and statistical moments of the network's output based on the characterized input uncertainty and fixed network parameters (weights and biases).

Analytical Uncertainty Propagation Methodology

The core contribution is the derivation of analytical expressions for the PDF of the output of each layer in an FNN, given the PDF of the previous layer's output (or the initial input uncertainty). Consider a single layer $l$ in an FNN. The pre-activation output $\mathbf{z}^{(l)}$ is computed as an affine transformation of the previous layer's activation output $\mathbf{a}^{(l-1)}$:

$$\mathbf{z}^{(l)} = \mathbf{W}^{(l)} \mathbf{a}^{(l-1)} + \mathbf{b}^{(l)}$$

where $\mathbf{W}^{(l)}$ and $\mathbf{b}^{(l)}$ are the weight matrix and bias vector for layer $l$, respectively. The activation output $\mathbf{a}^{(l)}$ is then obtained by applying the activation function element-wise, $\mathbf{a}^{(l)} = \sigma(\mathbf{z}^{(l)})$.
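As a concrete illustration (not taken from the paper), the first two moments pass through this affine step exactly, whatever the input distribution; a minimal numerical check against sampling, with a random placeholder layer:

```python
import numpy as np

# Sketch: for any input distribution, an affine layer z = W a + b
# maps the first two moments exactly.
def affine_moments(mean_a, cov_a, W, b):
    mean_z = W @ mean_a + b   # E[z] = W E[a] + b
    cov_z = W @ cov_a @ W.T   # Cov(z) = W Cov(a) W^T
    return mean_z, cov_z

# Monte Carlo check with a small random layer (placeholder weights).
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 2))
b = rng.normal(size=3)
mean_a = np.array([0.5, -1.0])
cov_a = np.array([[1.0, 0.3], [0.3, 2.0]])

samples = rng.multivariate_normal(mean_a, cov_a, size=200_000)
z = samples @ W.T + b
mean_z, cov_z = affine_moments(mean_a, cov_a, W, b)
assert np.allclose(mean_z, z.mean(axis=0), atol=0.02)
assert np.allclose(cov_z, np.cov(z.T), atol=0.1)
```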

For the Leaky ReLU activation function, defined as:

$$\sigma(z) = \begin{cases} z & \text{if } z > 0 \\ \alpha z & \text{if } z \le 0 \end{cases}$$

where $\alpha$ is a small positive constant (e.g., 0.01), the paper proposes a specific linearization strategy. The key insight is that, even for significant input perturbations, the statistical properties of the output can be accurately captured by approximating the Leaky ReLU function. While the abstract does not state the exact form of the linearization, analytical propagation typically relies on techniques such as Taylor expansion or assuming Gaussian distributions at each layer. Given the claim of deriving the PDF analytically, the natural starting point is the distribution of the affine transformation $\mathbf{z}^{(l)}$: if $\mathbf{a}^{(l-1)}$ follows a known distribution (e.g., Gaussian, or one derived from the previous layer), the distribution of $\mathbf{z}^{(l)}$ follows directly because the map is linear.
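The paper's specific linearization is not reproduced here, but any approximation can be sanity-checked against a known exact result: for a scalar Gaussian pre-activation $z \sim \mathcal{N}(\mu, \sigma^2)$, the mean of the Leaky ReLU output has a closed form via the standard identity $\mathbb{E}[\max(z,0)] = \mu\,\Phi(\mu/\sigma) + \sigma\,\varphi(\mu/\sigma)$ (a textbook fact, not a result quoted from the paper):

```python
import numpy as np
from scipy.stats import norm

# Closed-form mean of a = LeakyReLU(z) for z ~ N(mu, sigma^2), using
# leaky_relu(z) = alpha*z + (1 - alpha)*max(z, 0) and the identity
# E[max(z, 0)] = mu*Phi(mu/sigma) + sigma*phi(mu/sigma).
def leaky_relu_mean(mu, sigma, alpha=0.01):
    t = mu / sigma
    relu_mean = mu * norm.cdf(t) + sigma * norm.pdf(t)
    return alpha * mu + (1 - alpha) * relu_mean

# Monte Carlo check with illustrative parameters.
rng = np.random.default_rng(1)
z = rng.normal(0.3, 1.5, size=1_000_000)
a = np.where(z > 0, z, 0.01 * z)
assert abs(leaky_relu_mean(0.3, 1.5) - a.mean()) < 1e-2
```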

The challenge lies in propagating this distribution through the non-linear activation function $\sigma$. For the Leaky ReLU, the output distribution of $a_i^{(l)} = \sigma(z_i^{(l)})$ can be derived if the PDF of $z_i^{(l)}$, denoted $f_{Z_i^{(l)}}(z)$, is known. The PDF of $A_i^{(l)} = \sigma(Z_i^{(l)})$, denoted $f_{A_i^{(l)}}(a)$, follows from the change of variables formula. Since Leaky ReLU is piecewise linear, $f_{A_i^{(l)}}(a)$ is related to $f_{Z_i^{(l)}}(z)$ over different domains:

$$f_{A_i^{(l)}}(a) = \begin{cases} f_{Z_i^{(l)}}(a) & \text{if } a > 0 \\ f_{Z_i^{(l)}}(a/\alpha) / |\alpha| & \text{if } a \le 0 \end{cases}$$

(assuming $\alpha \neq 0$). With the standard ReLU, the derivation would require careful handling of the probability mass concentrated at $a = 0$; Leaky ReLU avoids this singularity. The paper's claim that an "appropriate linearization" yields accurate results suggests that a moment-matching approach or a simplified functional form is used in practice, rather than propagating the full PDF from the exact piecewise transformation, especially for deeper networks where dependencies between neurons become complex. The accuracy claim, even for large perturbations, implies that the chosen approximation captures the essential statistical behavior induced by the Leaky ReLU's structure.
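For a single neuron with Gaussian pre-activation, this change of variables is easy to implement and sanity-check; the Gaussian assumption and the parameter values below are illustrative, not the paper's:

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

# PDF of a = LeakyReLU(z) for scalar z ~ N(mu, sigma^2), via the
# piecewise change-of-variables rule: f_A(a) = f_Z(a) for a > 0 and
# f_Z(a/alpha)/alpha for a <= 0 (with alpha > 0).
def leaky_relu_output_pdf(a, mu=0.5, sigma=1.0, alpha=0.1):
    f_z = norm(mu, sigma).pdf
    a = np.asarray(a, dtype=float)
    return np.where(a > 0, f_z(a), f_z(a / alpha) / alpha)

# A valid density must integrate to 1; integrate the two pieces
# separately since the negative branch is sharply concentrated near 0.
# Mass outside (-10, 10) is negligible for these parameters.
neg, _ = quad(lambda a: float(leaky_relu_output_pdf(a)), -10, 0)
pos, _ = quad(lambda a: float(leaky_relu_output_pdf(a)), 0, 10)
assert abs(neg + pos - 1.0) < 1e-6
```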

The propagation proceeds layer by layer:

  1. Start with the input layer $\mathbf{a}^{(0)} = \mathbf{x}$, where the PDF of $\mathbf{x}$ is known.
  2. For each layer $l = 1, \dots, L$:
     a. Compute the distribution of the pre-activation $\mathbf{z}^{(l)} = \mathbf{W}^{(l)} \mathbf{a}^{(l-1)} + \mathbf{b}^{(l)}$ from the distribution of $\mathbf{a}^{(l-1)}$. This involves deriving the distribution of linear combinations of potentially dependent random variables from the previous layer.
     b. Compute the distribution of the activation output $\mathbf{a}^{(l)} = \sigma(\mathbf{z}^{(l)})$ using the derived distribution of $\mathbf{z}^{(l)}$ and the properties of the Leaky ReLU function (using either the exact transformation or the validated linearization).
  3. The final output layer $\mathbf{a}^{(L)}$ yields the desired output distribution.
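For comparison, the Monte Carlo baseline that this layer-by-layer analytical scheme is meant to replace is straightforward; the network weights below are random placeholders, not a trained model:

```python
import numpy as np

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)

def forward(x, layers, alpha=0.01):
    # Apply each affine layer followed by the Leaky ReLU activation.
    a = x
    for W, b in layers:
        a = leaky_relu(a @ W.T + b, alpha)
    return a

# Monte Carlo propagation: sample perturbed inputs, push each sample
# through the network, and estimate output statistics from the ensemble.
rng = np.random.default_rng(2)
layers = [(rng.normal(size=(4, 3)), rng.normal(size=4)),
          (rng.normal(size=(2, 4)), rng.normal(size=2))]
x_samples = rng.normal(0.0, 0.2, size=(100_000, 3))  # perturbed inputs
y_samples = forward(x_samples, layers)
mc_mean, mc_cov = y_samples.mean(axis=0), np.cov(y_samples.T)
```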

Derivation of Statistical Moments

Once the analytical form (or a suitable approximation) of the PDF of the network output $\mathbf{a}^{(L)}$ is obtained, denoted $f_{\mathbf{A}^{(L)}}(\mathbf{a})$, the statistical moments can be computed directly via integration. For example, the mean output is:

$$\mathbb{E}[\mathbf{A}^{(L)}] = \int \mathbf{a} \, f_{\mathbf{A}^{(L)}}(\mathbf{a}) \, d\mathbf{a}$$

The covariance matrix $\text{Cov}(\mathbf{A}^{(L)})$ can be computed as:

$$\text{Cov}(\mathbf{A}^{(L)}) = \mathbb{E}\left[(\mathbf{A}^{(L)} - \mathbb{E}[\mathbf{A}^{(L)}])(\mathbf{A}^{(L)} - \mathbb{E}[\mathbf{A}^{(L)}])^T\right] = \mathbb{E}[\mathbf{A}^{(L)}(\mathbf{A}^{(L)})^T] - \mathbb{E}[\mathbf{A}^{(L)}]\,\mathbb{E}[\mathbf{A}^{(L)}]^T$$

where $\mathbb{E}[\mathbf{A}^{(L)}(\mathbf{A}^{(L)})^T] = \int \mathbf{a}\mathbf{a}^T f_{\mathbf{A}^{(L)}}(\mathbf{a}) \, d\mathbf{a}$.
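As a worked scalar example (Gaussian pre-activation and parameter values chosen purely for illustration), these moment integrals can be evaluated numerically against the change-of-variables density and cross-checked by sampling:

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

# Mean and variance of a = LeakyReLU(z), z ~ N(mu, sigma^2), computed
# by integrating against the change-of-variables density.
mu, sigma, alpha = 0.5, 1.0, 0.1
f_z = norm(mu, sigma).pdf
f_a = lambda a: f_z(a) if a > 0 else f_z(a / alpha) / alpha

def moment(k):
    # Integrate each piece separately; mass outside (-10, 10) is negligible.
    lo, _ = quad(lambda a: a**k * f_a(a), -10, 0)
    hi, _ = quad(lambda a: a**k * f_a(a), 0, 10)
    return lo + hi

mean = moment(1)
var = moment(2) - mean**2

# Monte Carlo cross-check.
rng = np.random.default_rng(3)
z = rng.normal(mu, sigma, size=1_000_000)
a = np.where(z > 0, z, alpha * z)
assert abs(mean - a.mean()) < 1e-2
assert abs(var - a.var()) < 1e-2
```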

The paper emphasizes that these moments are analytical functions of the input uncertainty parameters (e.g., mean and variance of the input distribution) and the network parameters ($\mathbf{W}^{(l)}, \mathbf{b}^{(l)}$ for all $l$). This allows for efficient sensitivity analysis and understanding of how different sources of uncertainty influence the output statistics without repeated sampling. The success of the linearization is crucial here: if the derived PDF is complex, computing these integrals may still be challenging, but if the linearization leads to tractable forms (e.g., Gaussian approximations or mixtures), the moment calculations become feasible.

Gaussian Copula Surrogate Models

Propagating the full joint PDF through the network can become analytically intractable or result in extremely complex distributions, especially in higher dimensions. Correlations induced between neuron outputs within a layer, even if inputs were independent, complicate the analysis for subsequent layers. To address this, the paper proposes using Gaussian copula surrogate models.

A copula is a function that links univariate marginal distribution functions to their full multivariate distribution function. A Gaussian copula specifically models the dependence structure using a multivariate Gaussian distribution. The process involves:

  1. Estimate the marginal PDFs for each output neuron $A_i^{(L)}$ using the analytical methods described earlier (or their linearized approximations). Let $F_{A_i^{(L)}}(a_i)$ be the marginal cumulative distribution function (CDF) for the $i$-th output.
  2. Estimate the correlation matrix $\mathbf{R}$ capturing the dependence between the output neurons. This correlation may be estimated from the analytically derived moments (specifically, the covariance matrix) or via limited MC sampling if the analytical derivation is too complex.
  3. Construct the Gaussian copula $C_{\mathbf{R}}^{\text{Gauss}}(\mathbf{u}) = \Phi_{\mathbf{R}}(\Phi^{-1}(u_1), \dots, \Phi^{-1}(u_d))$, where $d$ is the output dimension, $\Phi^{-1}$ is the inverse CDF of the standard normal distribution, and $\Phi_{\mathbf{R}}$ is the CDF of a multivariate normal distribution with zero mean and correlation matrix $\mathbf{R}$.
  4. Approximate the joint CDF of the output $\mathbf{A}^{(L)}$ as $F_{\mathbf{A}^{(L)}}(\mathbf{a}) \approx C_{\mathbf{R}}^{\text{Gauss}}(F_{A_1^{(L)}}(a_1), \dots, F_{A_d^{(L)}}(a_d))$.

The advantage of this approach is its analytical tractability. Once the marginal distributions and the correlation matrix are determined, the Gaussian copula provides a well-defined, easily computable surrogate for the full joint PDF. This surrogate can then be used for downstream tasks like risk assessment or decision-making under uncertainty, preserving the dependencies estimated between outputs while relying on the analytically derived marginals.
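The construction is easy to operationalize as a sampler; a minimal sketch with illustrative normal marginals and an assumed correlation matrix (in practice the marginals would come from the analytical propagation):

```python
import numpy as np
from scipy.stats import norm

# Sample from a Gaussian copula: draw correlated standard normals,
# map to uniforms via the normal CDF, then to each marginal via its
# inverse CDF (probability integral transform).
def sample_gaussian_copula(R, marginal_ppfs, n, rng):
    d = R.shape[0]
    g = rng.multivariate_normal(np.zeros(d), R, size=n)  # correlated N(0, 1)
    u = norm.cdf(g)                                      # uniform in (0, 1)
    return np.column_stack([ppf(u[:, i])
                            for i, ppf in enumerate(marginal_ppfs)])

rng = np.random.default_rng(4)
R = np.array([[1.0, 0.7], [0.7, 1.0]])                # assumed correlation
marginals = [norm(0.0, 1.0).ppf, norm(2.0, 0.5).ppf]  # illustrative marginals
samples = sample_gaussian_copula(R, marginals, 200_000, rng)
# Dependence is approximately reproduced for these normal marginals.
assert abs(np.corrcoef(samples.T)[0, 1] - 0.7) < 0.01
```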

Validation and Numerical Results

The proposed methods were validated against MC simulations. The test case involved an FNN trained to approximate a nonlinear integro-differential operator mapping between two polynomial function spaces. This represents a challenging, high-dimensional function approximation task common in scientific computing and engineering.

The key result highlighted is the "excellent agreement" between the theoretical predictions (analytically derived PDFs, moments, and the copula models) and the results from extensive MC simulations. This agreement reportedly holds even for large perturbations in the input vectors, which supports the claim regarding the effectiveness of the Leaky ReLU linearization strategy. A thorough error analysis was conducted, likely quantifying metrics such as the Kullback-Leibler (KL) divergence between the predicted and MC-estimated PDFs, or the relative error in the computed moments (mean and variance). The strong numerical results suggest that the analytical framework provides a computationally efficient alternative to MC for uncertainty propagation (UP) in FNNs with Leaky ReLU activations, particularly for the operator-approximation task tested.
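The paper's exact error metrics are not listed here, but comparing a predicted density to an MC histogram via KL divergence is a typical ingredient of such an error analysis; a small illustrative sketch (all choices below are assumptions, not the paper's setup):

```python
import numpy as np
from scipy.stats import norm

# Illustrative metric: KL divergence between a histogram estimate of a
# density (from MC samples) and a predicted density, on a common grid.
def kl_divergence_hist(samples, predicted_pdf, bins=200):
    hist, edges = np.histogram(samples, bins=bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    p = hist * np.diff(edges)                    # empirical bin masses
    q = predicted_pdf(centers) * np.diff(edges)  # predicted bin masses
    mask = (p > 0) & (q > 0)
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

rng = np.random.default_rng(5)
samples = rng.normal(0.0, 1.0, size=500_000)
kl = kl_divergence_hist(samples, norm(0.0, 1.0).pdf)
assert kl < 0.01  # near zero when the prediction matches the samples
```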

Implementation Considerations

Implementing the proposed analytical UP methods involves several considerations:

  • Computational Complexity: The primary advantage is avoiding the high computational cost of MC sampling, which scales with the number of samples required for convergence. The analytical method's complexity depends on the network size (number of layers and neurons) and the complexity of the PDF calculations at each layer. Deriving and evaluating the analytical expressions involves matrix multiplications and potentially complex integral evaluations (for moments) or manipulations of PDF/CDF expressions. While potentially intensive for very deep/wide networks, it's independent of the input perturbation magnitude or the required precision level in the same way MC sampling is. The Gaussian copula adds the step of estimating or deriving the correlation matrix.
  • Software Implementation: Implementing this requires careful handling of probability distributions. Libraries like scipy.stats in Python or specialized probabilistic programming frameworks could be useful. Representing and manipulating the potentially complex analytical forms of the PDFs derived after passing through Leaky ReLU layers is a key implementation challenge. Pseudocode for a single layer propagation might look like:

```python
# Runnable sketch of single-layer propagation. For illustration, the
# PDF of a^(l-1) is represented by a Gaussian (mean, covariance) pair,
# which keeps both steps closed-form; the paper's exact piecewise-PDF
# propagation is more involved, and the per-neuron linearization slope
# used below is one plausible choice, not necessarily the paper's.
import numpy as np

def derive_linear_transform_pdf(input_pdf, W, b):
    # PDF of z = W a^(l-1) + b. For a Gaussian input this step is
    # exact: the mean and covariance transform affinely.
    mean_a, cov_a = input_pdf
    return W @ mean_a + b, W @ cov_a @ W.T

def derive_leaky_relu_transform_pdf(z_pdf, alpha):
    # PDF of a = leaky_relu(z), approximated by linearizing the
    # piecewise map per neuron: slope 1 where the mean pre-activation
    # is positive, slope alpha where it is negative.
    mean_z, cov_z = z_pdf
    slope = np.where(mean_z > 0, 1.0, alpha)
    return slope * mean_z, np.outer(slope, slope) * cov_z

def propagate_layer_pdf(input_pdf, W, b, leaky_relu_alpha):
    # 1. PDF of the pre-activation z = W a^(l-1) + b.
    z_pdf = derive_linear_transform_pdf(input_pdf, W, b)
    # 2. PDF of the activation a = leaky_relu(z).
    return derive_leaky_relu_transform_pdf(z_pdf, leaky_relu_alpha)

# Example usage for a two-layer network with placeholder weights:
rng = np.random.default_rng(0)
layers = [(rng.normal(size=(4, 3)), np.zeros(4)),
          (rng.normal(size=(2, 4)), np.zeros(2))]
current_pdf = (np.zeros(3), 0.1 * np.eye(3))  # input mean and covariance
for W, b in layers:
    current_pdf = propagate_layer_pdf(current_pdf, W, b, 0.01)
final_output_mean, final_output_cov = current_pdf
```

  • Applicability and Generalization: The methods are specifically developed for FNNs with Leaky ReLU activations. Extending them to other activation functions (e.g., Sigmoid, Tanh, GeLU) would require deriving equivalent analytical propagation rules, which might be significantly more complex due to their non-linear, non-piecewise-linear nature. Applicability to other architectures like CNNs or RNNs would also require substantial modifications to account for weight sharing, pooling operations, or recurrent connections.
  • Linearization Accuracy: The claim that linearization works well even for large perturbations is significant but might be context-dependent. The specific linearization technique used and its underlying assumptions are critical. Its robustness across different network structures, depths, and data distributions beyond the tested integro-differential operator should be further investigated. The nature of information propagation that makes this linearization effective warrants deeper examination.
  • Gaussian Copula Limitations: While analytically tractable, Gaussian copulas assume a specific dependence structure (elliptical). If the true dependence structure between network outputs is highly non-Gaussian, the surrogate model might provide a poor approximation of the joint behavior, even if the marginals are accurate.
  • Use Cases: This methodology is particularly relevant for applications where quantifying output uncertainty is critical and MC simulation is too slow. Examples include:
    • Physics-informed neural networks (PINNs) solving differential equations with uncertain parameters or boundary conditions.
    • Control systems employing NN controllers where input sensor noise propagates to affect control actions.
    • Financial modeling where input market data uncertainty impacts prediction risk.
    • Safety-critical applications requiring guarantees or bounds on NN output variability.

Conclusion

The work presented in "Uncertainty propagation in feed-forward neural network models" (2503.21059) offers a valuable contribution to uncertainty quantification in neural networks. By deriving analytical expressions for the PDF and moments of FNN outputs with Leaky ReLU activations, and proposing Gaussian copula surrogates for the joint PDF, it provides a potentially computationally efficient alternative to Monte Carlo methods. The reported strong agreement with MC simulations on a complex task, even under large input perturbations, underscores the potential practical utility of the proposed analytical framework, particularly the effectiveness of the chosen linearization strategy for Leaky ReLU within the context of network information propagation. Further investigation into the specific linearization technique and its applicability across diverse network architectures and problems is warranted.
