Activation Perturbation for Exploration (APEX)
- The paper introduces APEX as a probing technique that injects Gaussian noise into hidden activations to interpolate between input-sensitive and model-driven behaviors.
- APEX employs Monte Carlo sampling over multiple noise scales to compute escape noise, providing precise diagnostics for sample regularity, semantic alignment, and backdoor detection.
- By modulating noise levels, APEX enables researchers to gain both local insights and global bias analysis without needing model retraining or ensemble methods.
Activation Perturbation for EXploration (APEX) is an inference-time probing paradigm for neural networks that systematically injects Gaussian noise into hidden activations, while holding both the model input and parameters fixed. APEX is designed to address limitations inherent in input-space and parameter perturbation approaches, providing a direct lens into the structure and regularities encoded in intermediate network representations. By varying the noise scale, APEX enables a controlled transition from sample-dependent, input-sensitive responses to model-driven, input-agnostic behaviors, offering both local and global perspectives on network decision processes (Ren et al., 3 Feb 2026).
1. Formalism and Algorithmic Specification
Consider an $L$-layer feed-forward network with pre-activations and post-activations given by
$$z^{(\ell)} = W^{(\ell)} a^{(\ell-1)} + b^{(\ell)}, \qquad a^{(\ell)} = \phi\bigl(z^{(\ell)}\bigr), \qquad \ell = 1, \dots, L,$$
where $a^{(0)} = x$. APEX introduces additive Gaussian noise to each post-activation at inference:
$$\tilde a^{(\ell)} = \phi\bigl(\tilde z^{(\ell)}\bigr) + \sigma\, \epsilon^{(\ell)}, \qquad \epsilon^{(\ell)} \sim \mathcal N(0, I),$$
for each $\ell = 1, \dots, L-1$ and noise scale $\sigma \ge 0$. The final logits are
$$\tilde z^{(L)} = W^{(L)} \tilde a^{(L-1)} + b^{(L)},$$
and the predicted label is
$$\hat y(x; \sigma, \epsilon) = \arg\max_c \tilde z^{(L)}_c.$$
The empirical output distribution is estimated via $K$ Monte Carlo forward passes:
$$\hat p_c(x; \sigma) = \frac{1}{K} \sum_{k=1}^{K} \mathbf 1\bigl[\hat y(x; \sigma, \epsilon_k) = c\bigr].$$
APEX thus interpolates between sample-dependent ($\sigma \to 0$) and model-dependent ($\sigma \to \infty$) response regimes.
Algorithmically, for each input $x$, each chosen noise scale $\sigma$, and each of $K$ forward passes:
- Independently sample $\epsilon^{(\ell)} \sim \mathcal N(0, I)$ for all $\ell = 1, \dots, L-1$
- Inject $\sigma\, \epsilon^{(\ell)}$ after each layer’s activation in the network
- Record the resulting top-1 class $\hat y_k$
- Aggregate the class counts to yield $\hat p(x; \sigma)$
The escape noise $\sigma^{*}(x)$ for input $x$ is defined as the minimal $\sigma$ at which the original prediction’s probability $\hat p_{\hat y(x;0)}(x; \sigma)$ drops below a fixed threshold $\tau$.
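The loop above can be sketched in a few lines of NumPy. The two-hidden-layer ReLU network, its random weights, and names such as `apex_probe` and `escape_noise` are illustrative stand-ins, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-hidden-layer ReLU MLP with random weights (illustrative stand-in
# for a trained classifier with fixed parameters).
D_IN, D_H, N_CLASSES = 8, 32, 5
W1 = rng.normal(0, 1 / np.sqrt(D_IN), (D_H, D_IN))
W2 = rng.normal(0, 1 / np.sqrt(D_H), (D_H, D_H))
W3 = rng.normal(0, 1 / np.sqrt(D_H), (N_CLASSES, D_H))

def forward(x, sigma=0.0):
    """One APEX forward pass: Gaussian noise of scale sigma is added to
    every post-activation; the input x and the weights stay fixed."""
    a1 = np.maximum(W1 @ x, 0) + sigma * rng.normal(size=D_H)
    a2 = np.maximum(W2 @ a1, 0) + sigma * rng.normal(size=D_H)
    return W3 @ a2  # logits

def apex_probe(x, sigma, K=200):
    """Monte Carlo estimate of the label distribution p_hat(x; sigma)."""
    counts = np.zeros(N_CLASSES)
    for _ in range(K):
        counts[np.argmax(forward(x, sigma))] += 1
    return counts / K

def escape_noise(x, sigmas, tau=0.5, K=200):
    """Smallest sigma on the grid where the clean prediction's probability
    falls below the threshold tau (np.inf if it never does)."""
    y0 = int(np.argmax(forward(x, 0.0)))
    for s in sigmas:
        if apex_probe(x, s, K)[y0] < tau:
            return s
    return np.inf

x = rng.normal(size=D_IN)
p0 = apex_probe(x, 0.0)  # sigma = 0: the prediction is deterministic
sigma_star = escape_noise(x, sigmas=np.linspace(0.1, 20.0, 40))
```

Since each pass is independent, the `K` samples parallelize trivially; the sequential loop here is only for clarity.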
2. Theoretical Framework and Decomposition
At the core of APEX is a decomposition theorem that holds at every layer $\ell \in \{2, \dots, L\}$, for any input $x$ and all $\sigma > 0$:
$$\tilde z^{(\ell)} = \sigma\, g^{(\ell)}(\theta, \epsilon^{(1)}, \dots, \epsilon^{(\ell-1)}) + r^{(\ell)}(x, \sigma, \epsilon),$$
where $g^{(\ell)}$ is a function solely of the network parameters $\theta$ and the sampled noise up to layer $\ell$, while the residual $r^{(\ell)}$ is uniformly bounded in norm. At the output layer,
$$\tilde z^{(L)} = \sigma\, g^{(L)}(\theta, \epsilon) + r^{(L)}(x, \sigma, \epsilon).$$
The prediction simplifies in the large-noise limit:
$$\lim_{\sigma \to \infty} \hat y(x; \sigma, \epsilon) = \arg\max_c\, g^{(L)}_c(\theta, \epsilon).$$
Thus, as $\sigma \to \infty$, predictions become independent of $x$ and depend only on the random features $g^{(L)}(\theta, \epsilon)$. This demonstrates that APEX suppresses input-specific signals and amplifies the structural, representation-level aspects embedded in the model.
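The large-noise limit admits a quick numerical check under the same toy-MLP assumption (random weights, illustrative names): if two different inputs share the same noise draws, their predictions should coincide almost always once $\sigma$ is large, because the shared $\sigma\, g^{(L)}(\theta, \epsilon)$ term dominates the bounded, input-dependent residual:

```python
import numpy as np

rng = np.random.default_rng(1)
D_IN, D_H, C = 8, 32, 5
W1 = rng.normal(0, 1 / np.sqrt(D_IN), (D_H, D_IN))
W2 = rng.normal(0, 1 / np.sqrt(D_H), (D_H, D_H))
W3 = rng.normal(0, 1 / np.sqrt(D_H), (C, D_H))

def noisy_logits(x, sigma, eps1, eps2):
    # Explicit noise arguments so two inputs can share identical draws.
    a1 = np.maximum(W1 @ x, 0) + sigma * eps1
    a2 = np.maximum(W2 @ a1, 0) + sigma * eps2
    return W3 @ a2

def agreement(x1, x2, sigma, K=100):
    """Fraction of shared-noise draws on which two different inputs
    receive the same predicted label."""
    same = 0
    for _ in range(K):
        e1, e2 = rng.normal(size=D_H), rng.normal(size=D_H)
        same += np.argmax(noisy_logits(x1, sigma, e1, e2)) == \
                np.argmax(noisy_logits(x2, sigma, e1, e2))
    return same / K

x1, x2 = rng.normal(size=D_IN), rng.normal(size=D_IN)
agree_large = agreement(x1, x2, sigma=1e4)   # sigma*g term dominates
agree_small = agreement(x1, x2, sigma=1e-3)  # residual r dominates
```

At `sigma=1e4` disagreement requires the top two components of the noise-driven logits to be within $O(1/\sigma)$ of each other, which is vanishingly rare; at small sigma the two inputs simply keep their own clean predictions.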
Input perturbation, $x \mapsto x + \sigma\, \epsilon^{(0)}$, is shown to be a constrained form of activation perturbation: to first order, its induced change at layer $\ell$ is
$$\Delta a^{(\ell)} \approx \sigma\, J^{(\ell)}(x)\, \epsilon^{(0)},$$
where $J^{(\ell)}(x) = \partial a^{(\ell)} / \partial x$ is the Jacobian; input noise thus spans a low-dimensional subspace of the activation space (of dimension at most the input dimension), in contrast to the full-variance, unconstrained perturbations of APEX.
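The rank argument can be verified directly on a toy first layer (sizes and names are illustrative): shifts induced by input noise are confined to the column space of the first weight matrix, while APEX's direct activation noise is full-rank in the hidden space. The noise scale is omitted since scaling does not change rank:

```python
import numpy as np

rng = np.random.default_rng(2)
D_IN, D_H = 4, 32
W1 = rng.normal(size=(D_H, D_IN))
N = 64  # number of independent noise draws

# Layer-1 shifts induced by input-space noise: Delta z1 = W1 @ eps0.
# Every such shift lies in the column space of W1 (dimension <= D_IN).
input_shifts = np.stack([W1 @ rng.normal(size=D_IN) for _ in range(N)])

# Shifts injected directly into the hidden layer (APEX): isotropic in R^{D_H}.
apex_shifts = np.stack([rng.normal(size=D_H) for _ in range(N)])

rank_input = np.linalg.matrix_rank(input_shifts)  # at most D_IN = 4
rank_apex = np.linalg.matrix_rank(apex_shifts)    # generically D_H = 32
```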
3. Probing Regimes and Interpretive Phenomena
There exists a qualitative dichotomy between small- and large-noise regimes:
- Small-Noise Regime ($\sigma \ll 1$): The residual $r^{(\ell)}$ dominates. Predictions remain input-sensitive. In this regime, escape noise $\sigma^{*}$ correlates strongly with sample-regularity metrics such as memorization score and consistency/C-score (Spearman correlations of magnitude up to $0.9$ on ImageNet and CIFAR-100). APEX detects smooth semantic transitions in networks trained on controlled splits, exhibiting monotonic probability transfer that aligns with learned representations.
- Large-Noise Regime ($\sigma \gg 1$): The term $\sigma\, g^{(L)}(\theta, \epsilon)$ becomes dominant, rendering predictions input-agnostic. The network's output converges to a stationary, model-characteristic distribution. This regime exposes global biases: benign models exhibit high-entropy output distributions, whereas backdoored models demonstrate collapse of output probability onto the target class (near-zero entropy).
The framework enables computation of the normalized entropy,
$$\bar H(x; \sigma) = -\frac{1}{\log C} \sum_{c=1}^{C} \hat p_c(x; \sigma)\, \log \hat p_c(x; \sigma),$$
where $C$ is the number of classes, and quantification of target-class concentration $\hat p_t(x; \sigma)$, serving as diagnostics for backdoor detection and capacity-induced bias amplification.
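Both diagnostics need only a sampled label distribution $\hat p$. A minimal sketch follows; the two example distributions are made up for illustration and are not measured values:

```python
import numpy as np

def normalized_entropy(p):
    """Entropy of a label distribution p, normalized to [0, 1] by log C,
    with the convention 0 * log 0 = 0."""
    p = np.asarray(p, dtype=float)
    terms = np.where(p > 0, p * np.log(np.where(p > 0, p, 1.0)), 0.0)
    return float(-terms.sum() / np.log(len(p)))

def target_concentration(p, target):
    """Probability mass the large-noise distribution places on one class."""
    return float(np.asarray(p)[target])

# Hypothetical large-noise label distributions over C = 10 classes:
p_benign = np.full(10, 0.1)   # spread out -> normalized entropy near 1
p_backdoor = np.eye(10)[3]    # collapsed onto target class 3 -> entropy 0
```

A low `normalized_entropy` together with a high `target_concentration` at large sigma is the backdoor signature described above.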
4. Empirical Evaluations and Case Studies
APEX has been systematically validated through distinct probes:
| Probe Type | Quantitative Outcome | Interpretation |
|---|---|---|
| Sample Regularity | Spearman correlation of magnitude up to $0.9$ with memorization score | Effective, lightweight alternative to ensembles |
| Random-Label Models | Average escape noise decreases with more random labeling | Reveals fragmented, non-semantic decision regions |
| Semantic Alignment | Monotonic class transfer under activation noise only | Confirms structure in representation space |
| Backdoor Detection | Target-class mass near $1.0$ (backdoored) vs. below $0.2$ (benign); entropy collapse | Captures global, training-induced bias |
| Model Architecture Sensitivity | Deeper ResNets: stronger probability collapse; ViTs: partial, attenuated collapse | Architecture-dependent bias revelation |
In all cases, input- or parameter-level perturbations fail to exhibit the monotonicity, transition alignment, or diagnostic sharpness furnished by APEX.
5. Computational Complexity and Practical Implementation
Each estimate of $\hat p(x; \sigma)$ requires $K$ full forward passes per input and noise scale; the reported choices of $K$ differ between CIFAR- and ImageNet-scale models. The noise injection itself is a layerwise vector addition, which introduces minimal computational overhead. No model retraining or ensemble construction is necessary; all analysis occurs at inference with fixed weights and is trivially parallelizable over examples, noise scales, and Monte Carlo samples. The choice of the noise-scale grid and of the threshold $\tau$ for metrics such as escape noise allows sensitivity–cost trade-offs.
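One way to realize the parallelism over Monte Carlo samples is to batch the $K$ noisy copies through a single forward pass. A NumPy sketch under the same illustrative toy-network assumptions as before (random weights, hypothetical names):

```python
import numpy as np

rng = np.random.default_rng(3)
D_IN, D_H, C, K = 8, 32, 5, 512
W1 = rng.normal(0, 1 / np.sqrt(D_IN), (D_H, D_IN))
W2 = rng.normal(0, 1 / np.sqrt(D_H), (D_H, D_H))
W3 = rng.normal(0, 1 / np.sqrt(D_H), (C, D_H))

def apex_probe_batched(x, sigma, K=K):
    """All K Monte Carlo samples in one batched forward pass: the input is
    tiled K times and each copy receives an independent noise draw."""
    X = np.tile(x, (K, 1))                                   # (K, D_IN)
    A1 = np.maximum(X @ W1.T, 0) + sigma * rng.normal(size=(K, D_H))
    A2 = np.maximum(A1 @ W2.T, 0) + sigma * rng.normal(size=(K, D_H))
    labels = np.argmax(A2 @ W3.T, axis=1)                    # (K,)
    return np.bincount(labels, minlength=C) / K

x = rng.normal(size=D_IN)
p_hat = apex_probe_batched(x, sigma=1.0)
```

On GPU frameworks the same trick amortizes the per-pass overhead, so the cost of APEX approaches that of a single batched inference per noise scale.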
APEX complements input and parameter perturbation probes—by acting directly on hidden representations, it accesses structural information that cannot be inferred from reachable input space alone.
6. Methodological Distinction and Conceptual Scope
APEX unifies local sample analysis and global model-bias probing within a single, theoretically grounded framework. It admits input perturbation as a constrained, degenerate instance, subsuming prior approaches in expressive power. The ability to interpolate between regimes by modulating $\sigma$ enables fine-grained scrutiny of memorization, regularity, semantic partitioning, and bias-induced collapse, revealing properties inaccessible to traditional probing techniques. The method's lightweight computational profile, combined with its ability to interrogate internal network structure, positions it as an effective probe for model interpretability, robustness diagnostics, and backdoor detection (Ren et al., 3 Feb 2026).