Cross-Mode Activation Patching in Neural Decoding
- Cross-mode activation patching is a causal interpretability technique that replaces internal neural activations across vocalized, mimed, and imagined speech modalities.
- It employs coarse-to-fine tracing in convolutional and recurrent layers to identify compact subspaces that critically influence decoding performance.
- Neuron-level interventions and manifold interpolation studies reveal that small, specific neuron subsets drive cross-modal transfer via graded latent representations.
Cross-mode activation patching is a causal mechanistic interpretability technique for probing neural networks trained on multimodal datasets, particularly in the context of brain-to-speech decoding. It systematically substitutes internal activations in a model across distinct input modalities—such as vocalized, mimed, and imagined speech—while holding all other activations and weights fixed. This framework enables researchers to localize, quantify, and characterize how representations supporting cross-modal generalization are encoded within neural architectures and to distinguish whether information is preserved in discrete, localizable subspaces or distributed activity patterns (Maghsoudi et al., 1 Feb 2026).
1. Formal Definition and Methodological Foundations
Let $m \in \{V, M, I\}$ denote modes corresponding to vocalized, mimed, and imagined speech, respectively. For a fixed decoder $f$ with $L$ layers, the activation at layer $\ell$ for input $x$ in mode $m$ is $h_\ell^{(m)}(x)$. The patching operator at layer $\ell$ is formally defined as
$$\mathcal{P}_\ell^{\,m \to m'}\big(h_\ell^{(m')}(x')\big) = h_\ell^{(m)}(x),$$
which replaces mode $m'$'s activations by those from mode $m$ for paired linguistic content.
Given the model split $f = f_{>\ell} \circ f_{\le \ell}$, patched inference is
$$\hat{y}_{\text{patch}} = f_{>\ell}\big(h_\ell^{(m)}(x)\big),$$
with all parameters fixed.
Causal impact is quantified by comparing the patched to the unpatched output:
$$\Delta_\ell = d\big(\hat{y}_{\text{patch}},\, y\big) - d\big(\hat{y},\, y\big),$$
where $d$ is a metric such as negative Pearson correlation coefficient (PCC) or Mel Cepstral Distortion (MCD). Directionality is probed by patching from a higher- to a lower-performing mode (sufficiency: V→I or V→M) and in the reverse direction (necessity).
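The patching operator above can be sketched in a few lines of NumPy. The two-stage decoder, its weights, and the feature shapes below are hypothetical stand-ins (the paper's architecture is not reproduced here); the sketch only illustrates running the bottom half of the model on the donor-mode input and the top half on the swapped activation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-stage decoder f = f_top ∘ f_bottom (random stand-in weights).
W1 = rng.standard_normal((64, 32))   # bottom stage -> 32 hidden units
W2 = rng.standard_normal((32, 80))   # top stage -> 80 mel bins

def f_bottom(x):
    return np.tanh(x @ W1)           # h_ell: activation at the patch layer

def f_top(h):
    return h @ W2                    # decoded mel features

def patched_inference(x_donor, x_target):
    """Replace the target mode's layer-ell activation with the donor's
    (paired linguistic content), then finish the forward pass."""
    h_donor = f_bottom(x_donor)      # h_ell^{(m)}(x), e.g. vocalized
    return f_top(h_donor)            # f_{>ell}(h_ell^{(m)}(x))

def pcc(a, b):
    """Pearson correlation between two flattened outputs."""
    return np.corrcoef(a.ravel(), b.ravel())[0, 1]

# Paired trials of the same sentence in two modes (hypothetical features).
x_voc = rng.standard_normal((1, 64))
x_img = x_voc + 0.3 * rng.standard_normal((1, 64))  # noisier imagined trial

y_ref   = f_top(f_bottom(x_voc))             # reference decode
y_base  = f_top(f_bottom(x_img))             # unpatched imagined decode
y_patch = patched_inference(x_voc, x_img)    # V→I full-layer patch

# Causal impact: compare d(y_patch, y_ref) against d(y_base, y_ref).
delta = pcc(y_patch, y_ref) - pcc(y_base, y_ref)
```

Because the entire layer is swapped here, the patched output coincides with the reference decode; partial (group- or window-level) patches, as in the tracing experiments, interpolate between the two.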
2. Causal Tracing: Identifying Localized Cross-Mode Structure
Coarse-to-fine tracing is employed to specify which internal subspaces are sufficient or necessary for cross-modal information transfer. In the convolutional layer ($\ell_{\text{conv}}$), the $64$ output channels are divided into four contiguous $16$-channel groups ($G_1$–$G_4$), each patched independently. Similarly, the recurrent layer ($\ell_{\text{rnn}}$) of length $T$ is segmented in time into thirds: Early $[0, T/3)$, Mid $[T/3, 2T/3)$, and Late $[2T/3, T)$.
Coarse group patching (Table 4) finds that:
- Sufficiency: Patching a single $16$-channel conv group from vocalized into imagined raises PCC well above the unpatched imagined baseline (statistically significant).
- Necessity: Reverse patching causes PCC to drop sharply and MCD to rise.
Fine sliding-window tracing in the recurrent layer uses $32$-timestep windows (25% of $T$), shifted in $10$-step increments; a narrow band of windows yields most of the cross-mode benefit, demonstrating local temporal specificity.
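The coarse channel-group patches and fine sliding-window patches amount to swapping a slice of the activation tensor while leaving the rest intact. A minimal sketch, assuming a hypothetical $64 \times 128$ (channels × time) activation so that a $32$-step window is 25% of $T$:

```python
import numpy as np

rng = np.random.default_rng(1)
C, T = 64, 128                        # channels, time-steps (32 = 25% of T)

h_voc = rng.standard_normal((C, T))   # donor (vocalized) activation
h_img = rng.standard_normal((C, T))   # target (imagined) activation

def patch_channel_group(h_target, h_donor, group):
    """Coarse patch: swap one contiguous 16-channel group (group = 0..3)."""
    h = h_target.copy()
    lo, hi = 16 * group, 16 * (group + 1)
    h[lo:hi, :] = h_donor[lo:hi, :]
    return h

def patch_time_window(h_target, h_donor, start, width=32):
    """Fine patch: swap a 32-timestep window at the given start offset."""
    h = h_target.copy()
    h[:, start:start + width] = h_donor[:, start:start + width]
    return h

# Sweep all window starts in 10-step increments, as in fine tracing.
starts = range(0, T - 32 + 1, 10)
windows = [patch_time_window(h_img, h_voc, s) for s in starts]

# One coarse group patch (second group, channels 16-31).
h_g2 = patch_channel_group(h_img, h_voc, 1)
```

Each patched tensor would then be pushed through the remaining layers and scored, locating which group or window carries the cross-mode benefit.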
3. Subspace Identification and Causal Scrubbing
Coarse tracing reveals two localized regions mediating cross-mode transfer:
- $S_{\text{conv}}$: a single group of $16$ convolutional channels ($16$-dimensional)
- $S_{\text{rnn}}$: a $32$-step window of recurrent time-steps
Let $M_{\text{conv}}$ denote the binary selection mask for $S_{\text{conv}}$, and $M_{\text{rnn}}$ analogously for $S_{\text{rnn}}$. Subspace projections are defined as
$$\Pi_S(h) = M_S \odot h.$$
Hybrid ("scrubbed") activations are constructed by combining the subspace of interest from the donor mode with the complement from a random example in the target mode:
$$\tilde{h} = M_S \odot h^{(m_{\text{donor}})}(x) \;+\; (1 - M_S) \odot h^{(m_{\text{target}})}(x_{\text{rand}}).$$
Variants include KEEP-Conv (keep only the conv subspace), KEEP-RNN, and KEEP-Combo. Performance is compared to random baselines (RAND-Conv, RAND-RNN) that select the same number of features at random.
Key scrubbing sufficiency results (I←V direction):
| Intervention | PCC | MCD (dB) |
|---|---|---|
| FullPatch | 0.954 | 1.63 |
| KEEP-Conv | 0.666 | 3.13 |
| RAND-Conv | 0.564 | 3.23 |
A compact $16$-dimensional conv subspace suffices for a majority of the benefit, outperforming random selection and demonstrating directional asymmetry in transfer.
4. Neuron-Level Patching and Distributed Coding
Refinement proceeds to individual-neuron and "top-$k$" group interventions. For source mode $s$ and target mode $t$, patching neuron $i$ at all time-steps yields
$$\tilde{h}^{(t)}_{\ell,i}(x) = h^{(s)}_{\ell,i}(x), \qquad \tilde{h}^{(t)}_{\ell,j}(x) = h^{(t)}_{\ell,j}(x) \ \ \text{for } j \neq i.$$
The effect on decoding is measured by the per-neuron change $\Delta_i = \mathrm{PCC}_{\text{patched}} - \mathrm{PCC}_{\text{base}}$.
For top-$k$ patching, neurons are ranked by mean $\Delta_i$; patching the top $k$ jointly produces a "saturation-degradation" curve: PCC rises with $k$, peaks at an intermediate $k$, then falls due to interference.
Key quantitative findings:
- PCC peaks at an intermediate $k$ in both the RNN and the conv layer.
- No single neuron alone produces the full-patch benefit; only a small, specific subset suffices.
Sentence-level coverage indicates that the top-5 RNN neurons, in the V→M case, suffice for a substantial fraction of sentences, with lower coverage in the conv layer, supporting the conclusion that cross-mode transfer is mediated by small neuron subsets, but not by isolated units.
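The ranking-and-sweep loop behind the saturation-degradation curve can be sketched as follows. The per-neuron effects and the scoring function are synthetic stand-ins (a real run would re-decode after each joint patch); the diminishing-returns-plus-interference model is an assumption used only to make the sketch self-contained:

```python
import numpy as np

rng = np.random.default_rng(3)
n_neurons = 32

# Per-neuron single-patch effects Delta_i, averaged over trials (synthetic).
delta = rng.standard_normal(n_neurons) * 0.05

def pcc_after_topk(k, ranked):
    """Stand-in for decoding PCC after jointly patching the top-k neurons.
    Models saturating gains plus an interference penalty (an assumption,
    not the paper's measurement)."""
    gain = np.sum(np.sort(delta[ranked[:k]])[::-1] * 0.9 ** np.arange(k))
    interference = 0.002 * k ** 2
    return 0.70 + gain - interference

ranked = np.argsort(delta)[::-1]               # rank neurons by mean Delta_i
curve = [pcc_after_topk(k, ranked) for k in range(1, n_neurons + 1)]
k_star = int(np.argmax(curve)) + 1             # k at which PCC peaks
```

Sweeping $k$ from 1 to the layer width and locating the peak of this curve is what identifies the compact sufficient subset.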
5. Manifold Structure: Tri-Modal Activation Interpolation
To interrogate the geometry of speech-mode representations, activations at each layer are interpolated as
$$h_\alpha = (1 - \alpha)\, h^{(m_1)} + \alpha\, h^{(m_2)}, \qquad \alpha \in [0, 1].$$
For tri-modal generalization, convex combinations $h_{\alpha} = \alpha_V h^{(V)} + \alpha_M h^{(M)} + \alpha_I h^{(I)}$ are used, where $\alpha_V + \alpha_M + \alpha_I = 1$ and $\alpha_V, \alpha_M, \alpha_I \ge 0$.
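The pairwise path and tri-modal convex combination are straightforward to write down; the activation dimensionality and the random vectors here are stand-ins for real layer activations:

```python
import numpy as np

rng = np.random.default_rng(4)
d = 16                              # hypothetical activation dimensionality

h_V = rng.standard_normal(d)        # vocalized activation
h_M = rng.standard_normal(d)        # mimed activation
h_I = rng.standard_normal(d)        # imagined activation

def interpolate_pair(h_a, h_b, alpha):
    """Two-mode path: h_alpha = (1 - alpha) h_a + alpha h_b."""
    return (1.0 - alpha) * h_a + alpha * h_b

def interpolate_tri(a_V, a_M, a_I):
    """Convex combination over the 2-simplex: nonnegative weights summing to 1."""
    assert a_V >= 0 and a_M >= 0 and a_I >= 0
    assert abs(a_V + a_M + a_I - 1.0) < 1e-9
    return a_V * h_V + a_M * h_M + a_I * h_I

# Sweep alpha along the V→I edge; decoding each h_alpha traces the transition.
path = [interpolate_pair(h_V, h_I, a) for a in np.linspace(0.0, 1.0, 11)]
center = interpolate_tri(1 / 3, 1 / 3, 1 / 3)   # barycenter of the three modes
```

Decoding each interpolated activation and plotting output quality against $\alpha$ is what reveals whether transitions between modes are graded or abrupt.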
Observed effects include:
- Nearly linear V→M and M→I transitions in the convolutional layer suggest graded local coding at mid-level layers.
- RNN interpolation curves are smoother with mild non-linearities, indicating higher-level structure.
- Mimed activations occupy an intermediate point between vocalized and imagined, not forming a discrete regime.
6. Quantitative Highlights and Interpretational Summary
Major experimental outcomes include:
- Full-layer activation patching: V→I (sufficiency) raises PCC from $0.725$ to $0.954$; I→V (necessity) drops PCC from $0.752$ to $0.177$.
- A 16-channel conv block and a 32-step RNN window account for nearly all patching benefit; random subspaces are significantly less effective.
- Peak sufficiency is obtained by patching a small group of RNN or conv neurons, confirming the absence of isolated "magic" units.
All reported differences are statistically significant for the major conditions (held-out sentences, 5-fold cross-validation). The results establish that speech modes lie on a single, graded manifold in latent space, and that cross-mode transfer is driven by compact, layer-specific subspaces rather than widely distributed or individual-neuron activity. Directionality is pronounced: only vocalized representations enable effective transfer to imagined/mimed decoding; reverse patching severely degrades performance (Maghsoudi et al., 1 Feb 2026).