Cross-Mode Activation Patching in Neural Decoding
- Cross-mode activation patching is a causal interpretability technique that replaces internal neural activations across vocalized, mimed, and imagined speech modalities.
- It employs coarse-to-fine tracing in convolutional and recurrent layers to identify compact subspaces that critically influence decoding performance.
- Neuron-level interventions and manifold interpolation studies reveal that small, specific neuron subsets drive cross-modal transfer via graded latent representations.
Cross-mode activation patching is a causal mechanistic interpretability technique for probing neural networks trained on multimodal datasets, particularly in the context of brain-to-speech decoding. It systematically substitutes internal activations in a model across distinct input modalities—such as vocalized, mimed, and imagined speech—while holding all other activations and weights fixed. This framework enables researchers to localize, quantify, and characterize how representations supporting cross-modal generalization are encoded within neural architectures and to distinguish whether information is preserved in discrete, localizable subspaces or distributed activity patterns (Maghsoudi et al., 1 Feb 2026).
1. Formal Definition and Methodological Foundations
Let $m \in \{V, M, I\}$ denote modes corresponding to vocalized, mimed, and imagined speech, respectively. For a fixed decoder $f$ with $L$ layers, the activation at layer $\ell$ for input $x$ in mode $m$ is $h_\ell^{(m)}(x)$. The patching operator at layer $\ell$ is formally defined as
$$\mathcal{P}_\ell^{\,m \to m'}\big(h_\ell^{(m')}(x')\big) = h_\ell^{(m)}(x),$$
which replaces mode $m'$'s activations by those from mode $m$ for paired linguistic content.
Given the model split $f = f_{>\ell} \circ f_{\le \ell}$, patched inference is
$$\hat{y}_{\text{patch}} = f_{>\ell}\big(h_\ell^{(m)}(x)\big),$$
with all parameters fixed.
Causal impact is quantified by comparing the patched to the unpatched output:
$$\Delta_\ell = d\big(\hat{y}_{\text{patch}},\, y\big) - d\big(\hat{y},\, y\big),$$
where $d$ is a metric such as negative Pearson correlation coefficient (PCC) or Mel Cepstral Distortion (MCD). Directionality is probed by patching from a higher- to a lower-performing mode (sufficiency: V→I or V→M) and in the reverse direction (necessity).
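The patching operator above can be sketched in a few lines of NumPy. The two-stage decoder, its weights, and the feature shapes below are hypothetical stand-ins (the paper's architecture is not reproduced here); the sketch only illustrates running the bottom half of the model on the donor-mode input and the top half on the swapped activation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-stage decoder f = f_top ∘ f_bottom (random stand-in weights).
W1 = rng.standard_normal((64, 32))   # bottom stage -> 32 hidden units
W2 = rng.standard_normal((32, 80))   # top stage -> 80 mel bins

def f_bottom(x):
    return np.tanh(x @ W1)           # h_ell: activation at the patch layer

def f_top(h):
    return h @ W2                    # decoded mel features

def patched_inference(x_donor, x_target):
    """Replace the target mode's layer-ell activation with the donor's
    (paired linguistic content), then finish the forward pass."""
    h_donor = f_bottom(x_donor)      # h_ell^{(m)}(x), e.g. vocalized
    return f_top(h_donor)            # f_{>ell}(h_ell^{(m)}(x))

def pcc(a, b):
    """Pearson correlation between two flattened outputs."""
    return np.corrcoef(a.ravel(), b.ravel())[0, 1]

# Paired trials of the same sentence in two modes (hypothetical features).
x_voc = rng.standard_normal((1, 64))
x_img = x_voc + 0.3 * rng.standard_normal((1, 64))  # noisier imagined trial

y_ref   = f_top(f_bottom(x_voc))             # reference decode
y_base  = f_top(f_bottom(x_img))             # unpatched imagined decode
y_patch = patched_inference(x_voc, x_img)    # V→I full-layer patch

# Causal impact: compare d(y_patch, y_ref) against d(y_base, y_ref).
delta = pcc(y_patch, y_ref) - pcc(y_base, y_ref)
```

Because the entire layer is swapped here, the patched output coincides with the reference decode; partial (group- or window-level) patches, as in the tracing experiments, interpolate between the two.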
2. Causal Tracing: Identifying Localized Cross-Mode Structure
Coarse-to-fine tracing is employed to specify which internal subspaces are sufficient or necessary for cross-modal information transfer. In the convolutional layer ($\ell_{\text{conv}}$), the $64$ output channels are divided into four contiguous $16$-channel groups ($G_1$–$G_4$), each patched independently. Similarly, the recurrent layer ($\ell_{\text{rnn}}$) of length $T$ is segmented in time into thirds: Early $[0, T/3)$, Mid $[T/3, 2T/3)$, and Late $[2T/3, T)$.
Coarse group patching (Table 4) finds that:
- Sufficiency: Patching a single $16$-channel conv group from vocalized into imagined raises PCC well above the unpatched imagined baseline (statistically significant).
- Necessity: Reverse patching causes PCC to drop sharply and MCD to rise.
Fine sliding-window tracing in the recurrent layer uses $32$-timestep windows (25% of $T$), shifted in $10$-step increments; a narrow band of windows yields most of the cross-mode benefit, demonstrating local temporal specificity.
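The coarse channel-group patches and fine sliding-window patches amount to swapping a slice of the activation tensor while leaving the rest intact. A minimal sketch, assuming a hypothetical $64 \times 128$ (channels × time) activation so that a $32$-step window is 25% of $T$:

```python
import numpy as np

rng = np.random.default_rng(1)
C, T = 64, 128                        # channels, time-steps (32 = 25% of T)

h_voc = rng.standard_normal((C, T))   # donor (vocalized) activation
h_img = rng.standard_normal((C, T))   # target (imagined) activation

def patch_channel_group(h_target, h_donor, group):
    """Coarse patch: swap one contiguous 16-channel group (group = 0..3)."""
    h = h_target.copy()
    lo, hi = 16 * group, 16 * (group + 1)
    h[lo:hi, :] = h_donor[lo:hi, :]
    return h

def patch_time_window(h_target, h_donor, start, width=32):
    """Fine patch: swap a 32-timestep window at the given start offset."""
    h = h_target.copy()
    h[:, start:start + width] = h_donor[:, start:start + width]
    return h

# Sweep all window starts in 10-step increments, as in fine tracing.
starts = range(0, T - 32 + 1, 10)
windows = [patch_time_window(h_img, h_voc, s) for s in starts]

# One coarse group patch (second group, channels 16-31).
h_g2 = patch_channel_group(h_img, h_voc, 1)
```

Each patched tensor would then be pushed through the remaining layers and scored, locating which group or window carries the cross-mode benefit.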
3. Subspace Identification and Causal Scrubbing
Coarse tracing reveals two localized regions mediating cross-mode transfer:
- $S_{\text{conv}}$: a single group of $16$ convolutional channels ($16$-dimensional)
- $S_{\text{rnn}}$: a $32$-step window of recurrent time-steps
Let $M_{\text{conv}}$ denote the binary selection mask for $S_{\text{conv}}$, and $M_{\text{rnn}}$ analogously for $S_{\text{rnn}}$. Subspace projections are defined as
$$\Pi_S(h) = M_S \odot h.$$
Hybrid ("scrubbed") activations are constructed by combining the subspace of interest from the donor mode with the complement from a random example in the target mode:
$$\tilde{h} = M_S \odot h^{(m_{\text{donor}})}(x) \;+\; (1 - M_S) \odot h^{(m_{\text{target}})}(x_{\text{rand}}).$$
Variants include KEEP-Conv (keep only the conv subspace), KEEP-RNN, and KEEP-Combo. Performance is compared to random baselines (RAND-Conv, RAND-RNN) that select the same number of features at random.
Key scrubbing sufficiency results (I←V direction):
| Intervention | PCC | MCD (dB) |
|---|---|---|
| FullPatch | 0.954 | 1.63 |
| KEEP-Conv | 0.666 | 3.13 |
| RAND-Conv | 0.564 | 3.23 |
A compact $16$-dimensional conv subspace suffices for a majority of the benefit, outperforming random selection and demonstrating directional asymmetry in transfer.
4. Neuron-Level Patching and Distributed Coding
Refinement proceeds to individual-neuron and "top-$k$" group interventions. For source mode $s$ and target mode $t$, patching neuron $i$ at all time-steps yields
$$\tilde{h}^{(t)}_{\ell,i}(x) = h^{(s)}_{\ell,i}(x), \qquad \tilde{h}^{(t)}_{\ell,j}(x) = h^{(t)}_{\ell,j}(x) \ \ \text{for } j \neq i.$$
The effect on decoding is measured by the per-neuron change $\Delta_i = \mathrm{PCC}_{\text{patched}} - \mathrm{PCC}_{\text{base}}$.
For top-$k$ patching, neurons are ranked by mean $\Delta_i$; patching the top $k$ jointly produces a "saturation-degradation" curve: PCC rises with $k$, peaks at an intermediate $k$, then falls due to interference.
Key quantitative findings:
- PCC peaks at an intermediate $k$ in both the RNN and the conv layer.
- No single neuron alone produces the full-patch benefit; only a small, specific subset suffices.
Sentence-level coverage indicates that the top-5 RNN neurons, in the V→M case, suffice for a substantial fraction of sentences, with lower coverage in the conv layer, supporting the conclusion that cross-mode transfer is mediated by small neuron subsets, but not by isolated units.
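The ranking-and-sweep loop behind the saturation-degradation curve can be sketched as follows. The per-neuron effects and the scoring function are synthetic stand-ins (a real run would re-decode after each joint patch); the diminishing-returns-plus-interference model is an assumption used only to make the sketch self-contained:

```python
import numpy as np

rng = np.random.default_rng(3)
n_neurons = 32

# Per-neuron single-patch effects Delta_i, averaged over trials (synthetic).
delta = rng.standard_normal(n_neurons) * 0.05

def pcc_after_topk(k, ranked):
    """Stand-in for decoding PCC after jointly patching the top-k neurons.
    Models saturating gains plus an interference penalty (an assumption,
    not the paper's measurement)."""
    gain = np.sum(np.sort(delta[ranked[:k]])[::-1] * 0.9 ** np.arange(k))
    interference = 0.002 * k ** 2
    return 0.70 + gain - interference

ranked = np.argsort(delta)[::-1]               # rank neurons by mean Delta_i
curve = [pcc_after_topk(k, ranked) for k in range(1, n_neurons + 1)]
k_star = int(np.argmax(curve)) + 1             # k at which PCC peaks
```

Sweeping $k$ from 1 to the layer width and locating the peak of this curve is what identifies the compact sufficient subset.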
5. Manifold Structure: Tri-Modal Activation Interpolation
To interrogate the geometry of speech-mode representations, activations at each layer are interpolated as
$$h_\alpha = (1 - \alpha)\, h^{(m_1)} + \alpha\, h^{(m_2)}, \qquad \alpha \in [0, 1].$$
For tri-modal generalization, convex combinations $h_{\alpha} = \alpha_V h^{(V)} + \alpha_M h^{(M)} + \alpha_I h^{(I)}$ are used, where $\alpha_V + \alpha_M + \alpha_I = 1$ and $\alpha_V, \alpha_M, \alpha_I \ge 0$.
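The pairwise path and tri-modal convex combination are straightforward to write down; the activation dimensionality and the random vectors here are stand-ins for real layer activations:

```python
import numpy as np

rng = np.random.default_rng(4)
d = 16                              # hypothetical activation dimensionality

h_V = rng.standard_normal(d)        # vocalized activation
h_M = rng.standard_normal(d)        # mimed activation
h_I = rng.standard_normal(d)        # imagined activation

def interpolate_pair(h_a, h_b, alpha):
    """Two-mode path: h_alpha = (1 - alpha) h_a + alpha h_b."""
    return (1.0 - alpha) * h_a + alpha * h_b

def interpolate_tri(a_V, a_M, a_I):
    """Convex combination over the 2-simplex: nonnegative weights summing to 1."""
    assert a_V >= 0 and a_M >= 0 and a_I >= 0
    assert abs(a_V + a_M + a_I - 1.0) < 1e-9
    return a_V * h_V + a_M * h_M + a_I * h_I

# Sweep alpha along the V→I edge; decoding each h_alpha traces the transition.
path = [interpolate_pair(h_V, h_I, a) for a in np.linspace(0.0, 1.0, 11)]
center = interpolate_tri(1 / 3, 1 / 3, 1 / 3)   # barycenter of the three modes
```

Decoding each interpolated activation and plotting output quality against $\alpha$ is what reveals whether transitions between modes are graded or abrupt.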
Observed effects include:
- Nearly linear V→M and M→I transitions in the convolutional layer suggest graded local coding at mid-level layers.
- RNN interpolation curves are smoother with mild non-linearities, indicating higher-level structure.
- Mimed activations occupy an intermediate point between vocalized and imagined, not forming a discrete regime.
6. Quantitative Highlights and Interpretational Summary
Major experimental outcomes include:
- Full-layer activation patching: V→I (sufficiency) raises PCC from $0.725$ to $0.954$; I→V (necessity) drops PCC from $0.752$ to $0.177$.
- A 16-channel conv block and a 32-step RNN window account for nearly all patching benefit; random subspaces are significantly less effective.
- Peak sufficiency is obtained by patching a small group of RNN or conv neurons, confirming the absence of isolated "magic" units.
All reported differences are statistically significant for the major conditions (held-out sentences, 5-fold cross-validation). The results establish that speech modes lie on a single, graded manifold in latent space, and that cross-mode transfer is driven by compact, layer-specific subspaces rather than widely distributed or individual-neuron activity. Directionality is pronounced: only vocalized representations enable effective transfer to imagined/mimed decoding; reverse patching severely degrades performance (Maghsoudi et al., 1 Feb 2026).