
Activation Perturbation for Exploration (APEX)

Updated 10 February 2026
  • The paper introduces APEX as a probing technique that injects Gaussian noise into hidden activations to interpolate between input-sensitive and model-driven behaviors.
  • APEX employs Monte Carlo sampling over multiple noise scales to compute escape noise, providing precise diagnostics for sample regularity, semantic alignment, and backdoor detection.
  • By modulating noise levels, APEX enables researchers to gain both local insights and global bias analysis without needing model retraining or ensemble methods.

Activation Perturbation for EXploration (APEX) is an inference-time probing paradigm for neural networks that systematically injects Gaussian noise into hidden activations, while holding both the model input and parameters fixed. APEX is designed to address limitations inherent in input-space and parameter perturbation approaches, providing a direct lens into the structure and regularities encoded in intermediate network representations. By varying the noise scale, APEX enables a controlled transition from sample-dependent, input-sensitive responses to model-driven, input-agnostic behaviors, offering both local and global perspectives on network decision processes (Ren et al., 3 Feb 2026).

1. Formalism and Algorithmic Specification

Consider an $L$-layer feed-forward network $f_\theta: \mathbb{R}^d \rightarrow \mathbb{R}^c$ with pre-activations and post-activations given by

$$z_\ell = W_\ell a_{\ell-1} + b_\ell, \qquad a_\ell = \phi(z_\ell) \qquad (a_0 = x)$$

where $\theta = (W, b)$. APEX introduces additive Gaussian noise to each post-activation at inference:

$$\tilde a_\ell(x; \sigma) = \phi(z_\ell(x)) + \sigma \xi_\ell, \qquad \xi_\ell \sim \mathcal{N}(0, I)$$

for each $\ell = 1, \dots, L$ and noise scale $\sigma > 0$. The final logits are

$$s(x; \sigma) = U\,\tilde a_L(x; \sigma) + c$$

and the predicted label is

$$k^*(x; \sigma) = \arg\max_k s_k(x; \sigma)$$

The empirical output distribution is estimated via $T$ Monte Carlo forward passes:

$$\hat P_x(k; \sigma) = \frac{1}{T} \sum_{t=1}^{T} \mathbf{1}\{k^*_t = k\}$$

APEX thus interpolates between sample-dependent ($\sigma \to 0$) and model-dependent ($\sigma \to \infty$) response regimes.

Algorithmically, for each input $x_i$, each chosen noise scale $\sigma_s$, and each of $T$ forward passes:

  • Independently sample $\xi_\ell^t$ for all $\ell$
  • Inject $\sigma_s \xi_\ell^t$ after each layer's activation in the network
  • Record the resulting top-1 class $k^*_t$
  • Aggregate to yield $\hat P_{x_i}(k; \sigma_s)$

The escape noise $\sigma_{\mathrm{escape}}(x_i)$ for input $x_i$ is defined as the minimal $\sigma_s$ at which the original prediction's probability drops below a fixed threshold $\tau$.
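The procedure above can be sketched end to end. The following is a minimal NumPy illustration, not the authors' implementation: the network is a toy one-hidden-layer ReLU model with arbitrary random weights, and `apex_distribution` / `escape_noise` are hypothetical helper names chosen for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a trained model f_theta: one hidden ReLU layer (L = 1),
# so noise is injected after the single post-activation. Shapes are arbitrary.
d, h, c = 4, 16, 3
W1, b1 = rng.normal(size=(h, d)), np.zeros(h)
W2, b2 = rng.normal(size=(c, h)), np.zeros(c)

def apex_distribution(x, sigma, T=500):
    """Estimate P_hat_x(k; sigma) from T noisy forward passes."""
    counts = np.zeros(c)
    for _ in range(T):
        a1 = np.maximum(W1 @ x + b1, 0.0)        # clean post-activation
        a1 = a1 + sigma * rng.normal(size=h)     # inject sigma * xi
        s = W2 @ a1 + b2                         # logits on the noisy activation
        counts[np.argmax(s)] += 1
    return counts / T

def escape_noise(x, sigmas, tau=0.5, T=500):
    """Smallest scanned sigma at which the clean prediction's mass drops below tau."""
    k0 = np.argmax(W2 @ np.maximum(W1 @ x + b1, 0.0) + b2)
    for sigma in sorted(sigmas):
        if apex_distribution(x, sigma, T)[k0] < tau:
            return sigma
    return np.inf  # prediction never escapes over the scanned scales
```

At small $\sigma$ the clean class retains nearly all probability mass, while at very large $\sigma$ the estimated distribution becomes essentially identical across different inputs, matching the sample-dependent to model-dependent interpolation described above.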

2. Theoretical Framework and Decomposition

At the core of APEX is a decomposition theorem applicable for any $x$ with $\|x\| \leq R$ and all $\sigma > 0$:

$$\tilde a_\ell(x; \sigma) = \sigma v_\ell + r_\ell(x; \sigma)$$

where $v_\ell$ is a function solely of the network parameters and the sampled noise up to layer $\ell$, while the residual $r_\ell(x; \sigma)$ is uniformly bounded in norm. At the output layer,

$$s(x; \sigma) = \sigma U v_L + e(x; \sigma), \qquad \|e(x; \sigma)\|_\infty \leq C$$

The prediction simplifies in the large-noise limit:

$$\arg\max_i s_i(x; \sigma) = \arg\max_i \left[ (U v_L)_i + \frac{e_i(x; \sigma)}{\sigma} \right]$$

Thus, as $\sigma \rightarrow \infty$, predictions become independent of $x$ and depend only on the random features $U v_L$. This demonstrates that APEX suppresses input-specific signals and amplifies the structural, representation-level aspects embedded in the model.

Input perturbation, $x \rightarrow x + \varepsilon$, is shown to be a constrained form of activation perturbation, as its induced change at layer $\ell$ is

$$\Delta_\ell(x, \varepsilon) = a_\ell(x + \varepsilon) - a_\ell(x) \approx J_{a_\ell}(x)\, \varepsilon$$

where $J_{a_\ell}(x)$ is the Jacobian of $a_\ell$ with respect to the input; input noise thus spans a low-dimensional subspace of the activation space (of dimension at most $d$, the input dimension), in contrast to the full-variance, unconstrained perturbations of APEX.
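This rank gap is easy to verify numerically. A small illustrative check (not from the paper's code; the layer, widths, and sample counts are arbitrary assumptions): for a linear layer the Jacobian is exactly $W$, so input-noise-induced activation changes lie in a subspace of dimension at most $d$, while direct APEX-style activation noise is full-dimensional.

```python
import numpy as np

rng = np.random.default_rng(1)

d, h = 3, 12                  # input dimension much smaller than hidden width
W = rng.normal(size=(h, d))   # linear layer, so J_{a_1}(x) = W exactly
x = rng.normal(size=d)

# Activation changes induced by 50 random input perturbations epsilon.
deltas = np.stack([W @ (x + rng.normal(size=d)) - W @ x for _ in range(50)])
rank_input = np.linalg.matrix_rank(deltas)      # at most d = 3

# APEX injects unconstrained Gaussian noise directly in activation space.
apex_noise = rng.normal(size=(50, h))
rank_apex = np.linalg.matrix_rank(apex_noise)   # full rank h = 12
```

Fifty input-noise samples still only span a 3-dimensional slice of the 12-dimensional activation space, whereas the same number of APEX noise samples span it fully.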

3. Probing Regimes and Interpretive Phenomena

There exists a qualitative dichotomy between small- and large-noise regimes:

  • Small-Noise Regime ($\sigma \ll 1$): The residual $r_\ell(x; \sigma)$ dominates. Predictions remain input-sensitive. In this regime, escape noise correlates strongly with sample-regularity metrics such as memorization score and consistency/C-score (Spearman's $\rho \approx 0.7$–$0.9$ on ImageNet and CIFAR-100). APEX detects smooth semantic transitions in networks trained on controlled splits, exhibiting monotonic probability transfer that aligns with learned representations.
  • Large-Noise Regime ($\sigma \gg 1$): The term $\sigma v_L$ becomes dominant, rendering predictions input-agnostic. The network's output converges to a stationary, model-characteristic distribution. This regime exposes global biases: benign models exhibit high-entropy output distributions, whereas backdoored models collapse their output probability onto the target class (near-zero entropy).

The framework enables computation of the normalized entropy,

$$H = -\frac{1}{\log c} \sum_k P(k) \log P(k)$$

and quantification of target-class concentration, serving as diagnostics for backdoor detection and capacity-induced bias amplification.
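The normalized entropy above is a one-liner over the empirical distribution; a minimal sketch (`normalized_entropy` is a hypothetical helper name, and the small epsilon for numerical stability at zero probabilities is an implementation assumption):

```python
import numpy as np

def normalized_entropy(p, eps=1e-12):
    """H = -(1/log c) * sum_k p_k log p_k, mapped into [0, 1] for c classes."""
    p = np.asarray(p, dtype=float)
    return float(-(p * np.log(p + eps)).sum() / np.log(p.size))
```

A uniform distribution gives $H \approx 1$ (the benign large-noise signature), while a distribution collapsed onto a single class gives $H \approx 0$ (the backdoor signature).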

4. Empirical Evaluations and Case Studies

APEX has been systematically validated through distinct probes:

| Probe Type | Quantitative Outcome | Interpretation |
| --- | --- | --- |
| Sample Regularity | Spearman's $\rho \approx 0.7$–$0.9$ with memorization score | Effective, lightweight alternative to ensembles |
| Random-Label Models | Average escape noise decreases with more random labeling | Reveals fragmented, non-semantic decision regions |
| Semantic Alignment | Monotonic class transfer under activation noise only | Confirms structure in representation space |
| Backdoor Detection | $\hat P_{\mathrm{target}} \approx 0.90$–$1.0$ vs. $\approx 0.1$–$0.2$ (benign); entropy collapse | Captures global, training-induced bias |
| Model Architecture Sensitivity | Deeper ResNets: stronger probability collapse; ViTs: partial, attenuated collapse | Architecture-dependent bias revelation |

In all cases, input- or parameter-level perturbations fail to exhibit the monotonicity, transition alignment, or diagnostic sharpness furnished by APEX.

5. Computational Complexity and Practical Implementation

Each APEX estimate requires $T$ full forward passes per input; $T = 1{,}000$ is typical for CIFAR and $T = 100$ for ImageNet. The noise injection itself is a layerwise vector addition, which introduces minimal computational overhead. No model retraining or ensemble construction is necessary; all analysis occurs at inference with fixed weights and is trivially parallelizable over both examples and Monte Carlo samples. The choice of noise scale $\sigma$ and threshold $\tau$ for metrics such as escape noise allows sensitivity–cost trade-offs.
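The parallelism over Monte Carlo samples is straightforward to exploit: all $T$ passes can run as one batched matrix product by treating samples as a batch dimension. A hedged NumPy sketch (a hypothetical single-hidden-layer model with arbitrary random weights, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical one-hidden-layer ReLU model; shapes are arbitrary assumptions.
d, h, c = 4, 16, 3
W1, W2 = rng.normal(size=(h, d)), rng.normal(size=(c, h))

def apex_batched(x, sigma, T=1000):
    """Run all T Monte Carlo passes at once, samples along the batch axis."""
    a1 = np.maximum(W1 @ x, 0.0)                    # clean post-activation, (h,)
    noisy = a1 + sigma * rng.normal(size=(T, h))    # broadcast noise to (T, h)
    preds = (noisy @ W2.T).argmax(axis=1)           # (T,) predicted classes
    return np.bincount(preds, minlength=c) / T      # empirical P_hat_x(k; sigma)
```

The same batching applies unchanged on an accelerator, where the $T$ samples simply enlarge the batch dimension of an ordinary forward pass.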

APEX complements input and parameter perturbation probes—by acting directly on hidden representations, it accesses structural information that cannot be inferred from reachable input space alone.

6. Methodological Distinction and Conceptual Scope

APEX unifies local sample analysis and global model bias probing within a single, theoretically grounded framework. It admits input perturbation as a constrained, degenerate instance, subsuming prior approaches in expressive power. The ability to interpolate between regimes by modulating σ\sigma enables fine-grained scrutiny of memorization, regularity, semantic partitioning, and bias-induced collapse, revealing properties inaccessible to traditional probing techniques. The method's lightweight computational profile combined with its ability to interrogate internal network structure positions it as an effective probe for model interpretability, robustness diagnostics, and backdoor detection (Ren et al., 3 Feb 2026).
