
Cross-Mode Activation Patching in Neural Decoding

Updated 8 February 2026
  • Cross-mode activation patching is a causal interpretability technique that replaces internal neural activations across vocalized, mimed, and imagined speech modalities.
  • It employs coarse-to-fine tracing in convolutional and recurrent layers to identify compact subspaces that critically influence decoding performance.
  • Neuron-level interventions and manifold interpolation studies reveal that small, specific neuron subsets drive cross-modal transfer via graded latent representations.

Cross-mode activation patching is a causal mechanistic interpretability technique for probing neural networks trained on multimodal datasets, particularly in the context of brain-to-speech decoding. It systematically substitutes internal activations in a model across distinct input modalities—such as vocalized, mimed, and imagined speech—while holding all other activations and weights fixed. This framework enables researchers to localize, quantify, and characterize how representations supporting cross-modal generalization are encoded within neural architectures and to distinguish whether information is preserved in discrete, localizable subspaces or distributed activity patterns (Maghsoudi et al., 1 Feb 2026).

1. Formal Definition and Methodological Foundations

Let $m,n\in\{\mathrm{V},\mathrm{M},\mathrm{I}\}$ denote the modes corresponding to vocalized, mimed, and imagined speech, respectively. For a fixed decoder $f$ with $L$ layers, the activation at layer $\ell$ for input $x$ in mode $m$ is $a_\ell^{(m)}(x)\in\mathbb{R}^{d_\ell}$. The patching operator at layer $\ell$ is formally defined as

$$P_{\ell,\,m\rightarrow n}\big(a_\ell^{(m)}\big) \triangleq a_\ell^{(n)},$$

which replaces mode $m$'s activations with those from mode $n$ for paired linguistic content.

Given the model split $f = f_{\mathrm{post}\text{-}\ell} \circ f_{\mathrm{pre}\text{-}\ell}$, patched inference is

$$\hat{y}_{\mathrm{patch}} = f_{\mathrm{post}\text{-}\ell}\big(P_{\ell,\,m\rightarrow n}\big(a_\ell^{(m)}(x)\big)\big)$$

with all parameters fixed.

Causal impact is quantified by comparing the patched to unpatched output:

$$\Delta\mathcal{L}_{m\rightarrow n}^{(\ell)} = \mathcal{L}(\hat{y}_{\mathrm{patch}},y) - \mathcal{L}(f(x),y),$$

where $\mathcal{L}$ is a metric such as negative Pearson correlation coefficient (PCC) or Mel-cepstral distortion (MCD). Directionality is probed by patching from a higher- to a lower-performing mode (sufficiency: $\Delta\mathcal{L}<0$, equivalently $\Delta\mathrm{PCC}>0$) and in the reverse direction (necessity).
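The definitions above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: `f_pre`/`f_post` stand in for the two halves of the decoder split at layer $\ell$, and the loss is taken as negative PCC.

```python
import numpy as np

def pearson_cc(y_hat, y):
    """Pearson correlation coefficient between predicted and reference signals."""
    y_hat, y = np.ravel(y_hat), np.ravel(y)
    return float(np.corrcoef(y_hat, y)[0, 1])

def patched_inference(f_pre, f_post, x_n):
    """Patched forward pass: layer-l activations are taken from the donor
    mode n, then pushed through the unchanged downstream half f_post."""
    a_n = f_pre(x_n)      # donor activations a_l^(n)
    return f_post(a_n)    # all weights held fixed

def delta_loss(f_pre, f_post, x_m, x_n, y):
    """Causal impact: loss(patched) - loss(unpatched), with L = -PCC.
    Negative values indicate the donor activations are sufficient."""
    y_patch = patched_inference(f_pre, f_post, x_n)
    y_plain = f_post(f_pre(x_m))
    return -pearson_cc(y_patch, y) - (-pearson_cc(y_plain, y))
```

In a real decoder, `f_pre` and `f_post` would be the network up to and after layer $\ell$; here any pair of callables works, which keeps the causal logic separate from the architecture.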

2. Causal Tracing: Identifying Localized Cross-Mode Structure

Coarse-to-fine tracing is employed to determine which internal subspaces are sufficient or necessary for cross-modal information transfer. In the convolutional layer ($\ell=\mathrm{conv}$), the 64 output channels are divided into four contiguous 16-channel groups ($g_0$–$g_3$), each patched independently. Similarly, for a recurrent layer ($\ell=\mathrm{rnn}$) of length $T=128$, time is segmented into thirds: Early $[1,43]$, Mid $[44,85]$, and Late $[86,128]$.

Coarse group patching (Table 4) finds that:

  • Sufficiency: Patching group $g_2$ from vocalized into imagined raises PCC ($0.659\rightarrow0.725$) and lowers MCD ($3.00\rightarrow2.88$).
  • Necessity: Reverse patching of $g_2$ drops PCC ($0.932\rightarrow0.752$) and raises MCD ($1.82\rightarrow2.80$).

Fine sliding-window tracing in the recurrent layer uses 32-timestep windows (25% of $T$), shifted in 10-step increments; windows spanning $t\approx 21\ldots84$ yield most of the cross-mode benefit, demonstrating local temporal specificity.
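The two tracing granularities above can be sketched as simple slicing operations on activation tensors. This is an illustrative NumPy sketch under the assumption of a `(T, channels)` activation layout; function names are ours, not the paper's.

```python
import numpy as np

def patch_channel_group(a_target, a_donor, group):
    """Coarse tracing: swap one contiguous 16-channel group (of 64 conv
    channels) from the donor-mode activation into the target-mode one."""
    lo, hi = 16 * group, 16 * (group + 1)
    out = a_target.copy()
    out[..., lo:hi] = a_donor[..., lo:hi]
    return out

def sliding_window_patches(a_target, a_donor, win=32, step=10):
    """Fine tracing: yield (start, patched) pairs for win-step temporal
    windows shifted in `step`-step increments along a (T, d) activation."""
    T = a_target.shape[0]
    for start in range(0, T - win + 1, step):
        out = a_target.copy()
        out[start:start + win, :] = a_donor[start:start + win, :]
        yield start, out
```

Each patched activation would then be pushed through the fixed downstream decoder and scored, exactly as in full-layer patching.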

3. Subspace Identification and Causal Scrubbing

Coarse tracing reveals two localized regions mediating cross-mode transfer:

  • $S_{\mathrm{conv}}$: convolutional channels $[32,48)$ (16-dimensional)
  • $S_{\mathrm{rnn}}$: a $\sim$32-step window of recurrent time steps within $[21,84)$

Let $W_{\mathrm{conv}}\in\{0,1\}^{16\times64}$ be the selection mask for $S_{\mathrm{conv}}$, and $W_{\mathrm{rnn}}$ analogously for $S_{\mathrm{rnn}}$. Subspace projections are defined as

$$\Pi_{\mathrm{conv}}(a) = W_{\mathrm{conv}}\,a,\qquad \bar{\Pi}_{\mathrm{conv}}(a) = (I-W_{\mathrm{conv}})\,a$$

Hybrid (“scrubbed”) activations are constructed by combining the subspace of interest from the donor mode with the complement from a random example in the target mode:

$$\tilde{a}_\ell[m\rightarrow n] = \Pi_\ell\big(a_\ell^{(n)}\big) + \bar{\Pi}_\ell\big(a_\ell^{(\mathrm{rnd})}\big)$$

Variants include KEEP-Conv (keeping only the conv subspace), KEEP-RNN, and KEEP-Combo. Performance is compared to random baselines (RAND-Conv, RAND-RNN) that select the same number of features at random.
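A minimal sketch of how such scrubbed hybrids and their random controls could be built, assuming feature-wise (last-axis) masks; the helper names are illustrative, not from the paper.

```python
import numpy as np

def scrubbed_activation(a_donor, a_rand, keep_idx):
    """KEEP-style hybrid: values on the kept subspace come from the donor
    mode; everything else comes from a random example in the target mode."""
    mask = np.zeros(a_donor.shape[-1], dtype=bool)
    mask[keep_idx] = True
    return np.where(mask, a_donor, a_rand)

def random_subspace(d, k, seed=0):
    """RAND-* control: k feature indices drawn uniformly without
    replacement, matching the size of the identified subspace."""
    rng = np.random.default_rng(seed)
    return rng.choice(d, size=k, replace=False)
```

Comparing `scrubbed_activation(donor, rand, identified_idx)` against the same construction with `random_subspace(d, k)` is what separates a genuinely localized subspace from a generic dimensionality effect.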

Key scrubbing sufficiency results (I←V direction):

| Intervention | PCC | MCD |
|---|---|---|
| FullPatch | 0.954 | 1.63 |
| KEEP-Conv | 0.666 | 3.13 |
| RAND-Conv | 0.564 | 3.23 |

A compact 16-dimensional conv subspace suffices for a majority of the full-patch benefit, outperforming random selection and demonstrating directional asymmetry in transfer.

4. Neuron-Level Patching and Distributed Coding

Refinement proceeds to individual-neuron and “top-$k$” group interventions. For source activations $A_s$ and target activations $A_t\in\mathbb{R}^{T\times d_\ell}$, patching neuron $i$ at all time steps $t$ yields

$$\tilde{A}^{(i)}[t,j] = \begin{cases} A_s[t,i], & j=i \\ A_t[t,j], & \text{otherwise} \end{cases}$$

The effect on decoding is measured by $\Delta\mathrm{PCC}^{(i)}$.

For top-$k$ patching, neurons are ranked by mean $\Delta\mathrm{PCC}^{(i)}$; patching the top $k$ produces a “saturation–degradation” curve: PCC rises, peaks at $k^*$, then falls due to interference.
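The ranking-and-patch step can be sketched as follows; this is an illustrative NumPy fragment (the `score_fn` hook standing in for the full decode-and-score pipeline is our assumption).

```python
import numpy as np

def topk_patch(A_t, A_s, delta_pcc, k):
    """Patch the k neurons with the largest single-neuron effect
    (mean delta-PCC) from source A_s into target A_t, at all time steps."""
    top = np.argsort(delta_pcc)[::-1][:k]
    out = A_t.copy()
    out[:, top] = A_s[:, top]
    return out

def saturation_curve(A_t, A_s, delta_pcc, score_fn, ks):
    """Decoding score as a function of k; in the paper's setting this
    rises, peaks at k*, then degrades as interfering neurons enter."""
    return [score_fn(topk_patch(A_t, A_s, delta_pcc, k)) for k in ks]
```

Sweeping `ks` over, say, 1–64 and plotting `saturation_curve` is what reveals $k^*$ for a given layer.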

Key quantitative findings:

  • Peak $\Delta\mathrm{PCC}\approx +0.04$ at $k\approx20$ in the RNN layer and $k\approx50$ in the conv layer.
  • No single neuron produces $|\Delta\mathrm{PCC}^{(i)}|>0.01$; only a small, specific subset suffices.

Sentence-level coverage indicates that, for the V→M case, the top-5 RNN neurons suffice for up to 35% of sentences, with lower coverage in the conv layer, supporting the conclusion that cross-mode transfer is mediated by small neuron subsets rather than isolated units.

5. Manifold Structure: Tri-Modal Activation Interpolation

To interrogate the geometry of speech mode representations, activations at each layer are interpolated as

$$a_\ell(\alpha;A\rightarrow B) = (1-\alpha)\,a_\ell(A) + \alpha\,a_\ell(B),\qquad \alpha\in[0,1]$$

For tri-modal generalization, convex combinations $a_\ell(\alpha,\beta,\gamma) = \alpha\,a_\ell(V) + \beta\,a_\ell(M) + \gamma\,a_\ell(I)$ with $\alpha+\beta+\gamma=1$ are used.
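Both interpolation schemes are direct convex combinations of activations; a minimal NumPy sketch (function names are ours):

```python
import numpy as np

def interp_pair(a_A, a_B, alpha):
    """Pairwise interpolation: (1 - alpha) * a(A) + alpha * a(B)."""
    return (1.0 - alpha) * a_A + alpha * a_B

def interp_tri(a_V, a_M, a_I, alpha, beta, gamma):
    """Tri-modal convex combination; the weights must sum to 1."""
    assert abs(alpha + beta + gamma - 1.0) < 1e-9
    return alpha * a_V + beta * a_M + gamma * a_I
```

Sweeping `alpha` from 0 to 1 (or the barycentric weights over the simplex) and decoding each interpolated activation produces the PCC/MCD transition curves discussed below.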

Observed effects include:

  • Nearly linear PCC and MCD transitions in the conv layer suggest graded local coding at mid-level layers.
  • RNN interpolation curves are smoother with mild non-linearities, indicating higher-level structure.
  • Mimed activations occupy an intermediate point between vocalized and imagined, not forming a discrete regime.

6. Quantitative Highlights and Interpretational Summary

Major experimental outcomes include:

  • Full-layer activation patching: V→I (sufficiency) raises PCC from 0.725 to 0.954; I→V (necessity) drops PCC from 0.752 to 0.177.
  • A 16-channel conv block and a $\sim$32-step RNN window account for nearly all of the patching benefit; random subspaces are significantly less effective.
  • Peak sufficiency is obtained by patching $k\approx20$ RNN or $k\approx50$ conv neurons, confirming the absence of isolated “magic” units.

All reported differences are statistically significant, with SEM $<0.01$ for major conditions ($N=200$ sentences, 5-fold cross-validation). The results establish that speech modes lie on a single, graded manifold in latent space and that cross-mode transfer is driven by compact, layer-specific subspaces rather than widely distributed or single-neuron activity. Directionality is pronounced: only vocalized representations enable effective transfer to imagined and mimed decoding, while reverse patching severely degrades performance (Maghsoudi et al., 1 Feb 2026).
