
Emotion-Sensitive Neurons in Neural Models

Updated 13 January 2026
  • Emotion-Sensitive Neurons (ESNs) are defined as neural units displaying selective activation correlated with specific emotion classes, as shown by targeted ablation and steering experiments.
  • They are identified using metrics like Activation Probability, Mean Activation Difference, and Contrastive Activation Selection to ensure precise detection across models.
  • Their manipulation in audio-language and cross-domain frameworks offers actionable insights for interpretability, control, and safety in affective AI systems.

Emotion-Sensitive Neurons (ESNs) are individual neural units whose activation is selectively correlated with particular emotion classes and whose manipulation via causal interventions yields emotion-specific effects on model predictions. ESNs have been formalized in both large audio-LLMs (LALMs) and cross-domain frameworks such as SHArE, where they encode appraisals of core values (e.g., valence, arousal) and participate directly in the ultra-fast emotional judgments essential to both human and artificial cognition (Zhao et al., 6 Jan 2026, Opong-Mensah, 2020).

1. Formal Definition and Mechanistic Role

In LALMs, ESNs are specified as decoder SwiGLU-MLP gate units with positive activations that exhibit selective, emotion-conditioned firing patterns. For an emotion $e$, an ESN is characterized by an activation profile $\{a_{l,n,t}\}_{t}$ such that the probability

\mathrm{LAP}^{(e)}_{l,n} = P^{(e)}_{l,n} = \frac{K^{(e)}_{l,n}}{T_e}

is elevated for one emotion compared to others, where $K^{(e)}_{l,n}$ counts positive firings and $T_e$ is the number of evaluation examples for emotion $e$. Ablating these neurons selectively degrades recognition of their associated emotion ("self-deactivation") far more than recognition of other classes ("cross-deactivation"). Conversely, gain amplification steers outputs toward the target emotion ("steering"). This demonstrates their causal necessity and partial sufficiency for affective decisions in speech-to-text inference (Zhao et al., 6 Jan 2026).
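The LAP statistic above can be computed directly from a matrix of gate activations. The following is a minimal NumPy sketch; the function name, array shapes, and label encoding are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def activation_probability(acts, labels, n_emotions):
    """LAP^(e)_{l,n} = K^(e)_{l,n} / T_e for one layer.

    acts:   [T, N] gate activations (T examples, N neurons).
    labels: [T] integer emotion labels.
    Returns [n_emotions, N] matrix of per-emotion firing probabilities.
    """
    lap = np.zeros((n_emotions, acts.shape[1]))
    for e in range(n_emotions):
        mask = labels == e                       # the T_e examples of emotion e
        lap[e] = (acts[mask] > 0).mean(axis=0)   # K^(e) / T_e per neuron
    return lap
```

A neuron whose row-$e$ entry is high while its other entries stay low is a candidate ESN for emotion $e$.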

In the SHArE framework, ESNs are dynamical units whose membrane voltage dynamics encode not just stimulus classification but valenced appraisal: $\gamma_{i,c} = \eta_{i,c}\,\Delta s_i$, with valence sensitivity $\eta_{i,c}$ and perception depth $\Delta s_i$. These parameters endow the ESN with the ability to compute and transmit emotional judgments at both the connectionist and conductance-based simulation levels (Opong-Mensah, 2020).
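The appraisal $\gamma_{i,c} = \eta_{i,c}\,\Delta s_i$ can be illustrated with a toy unit that accumulates emotional judgments as it fires. This is a minimal sketch under assumed semantics (per-value state accumulation), not the SHArE implementation:

```python
from dataclasses import dataclass

@dataclass
class ESNUnit:
    """Toy SHArE-style unit: carries a valence sensitivity eta_{i,c} per
    core value c; firing with perception depth delta_s emits an appraisal
    gamma_{i,c} = eta_{i,c} * delta_s that updates emotional state."""
    eta: dict           # core value -> valence sensitivity eta_{i,c}
    state: dict = None  # accumulated emotional state per core value

    def fire(self, delta_s):
        if self.state is None:
            self.state = {c: 0.0 for c in self.eta}
        for c, eta_c in self.eta.items():
            self.state[c] += eta_c * delta_s  # gamma_{i,c} = eta * delta_s
        return self.state
```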

2. ESN Identification: Metrics and Algorithms

ESN selection in LALMs relies on four primary neuron-selector metrics:

  • Activation Probability (LAP): Frequency of positive activation for an emotion.
  • Activation Probability Entropy (LAPE): Entropy of emotion-conditioned activation probabilities, quantifying selectivity.
  • Mean Activation Difference (MAD): Difference between the mean activation for $e$ and the mean over all other emotions.

\mathrm{MAD}^{(e)}_{l,n} = M^{(e)}_{l,n} - \bar{M}^{(-e)}_{l,n}

  • Contrastive Activation Selection (CAS): Margin between top and runner-up emotion activation probabilities, yielding ESNs via

\mathrm{CAS}^{(e)}_{l,n} = P^{(e)}_{l,n} - \max_{e' \neq e} P^{(e')}_{l,n}

The top-$k$ neurons by metric score are selected per emotion (Zhao et al., 6 Jan 2026).
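Metric-based top-$k$ selection can be sketched as follows, assuming per-emotion activation probabilities and mean activations have already been computed; the exact LAPE and CAS formulations, tie-breaking, and shapes are illustrative assumptions:

```python
import numpy as np

def select_esns(lap, mean_act, e, k):
    """Rank neurons for emotion e by the four selector metrics.

    lap:      [E, N] per-emotion positive-activation probabilities.
    mean_act: [E, N] per-emotion mean activations.
    Returns a dict of top-k neuron indices per metric.
    """
    E = lap.shape[0]
    others = [c for c in range(E) if c != e]
    # MAD: mean activation for e minus the mean over all other emotions.
    mad = mean_act[e] - mean_act[others].mean(axis=0)
    # LAPE: entropy of each neuron's normalized per-emotion firing profile.
    p = lap / (lap.sum(axis=0, keepdims=True) + 1e-12)
    lape = -(p * np.log(p + 1e-12)).sum(axis=0)
    # CAS: top-vs-runner-up margin, counted only where e is the top class.
    order = np.sort(lap, axis=0)
    cas = np.where(lap.argmax(axis=0) == e, order[-1] - order[-2], -np.inf)
    return {
        "LAP": np.argsort(-lap[e])[:k],
        "LAPE": np.argsort(lape)[:k],   # low entropy = high selectivity
        "MAD": np.argsort(-mad)[:k],
        "CAS": np.argsort(-cas)[:k],
    }
```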

In SHArE, identification is formalized by assigning to each ANN neuron the parameters $\eta_{i,c}$ and $\Delta s_i$, thereby rendering each unit an ESN with explicit valence/arousal sensitivity and correlation coefficients embedded in its activation dynamics (Opong-Mensah, 2020).

3. Inference-Time Intervention and Causal Validation

LALM ESNs are validated by three key interventions:

  • Ablation/Deactivation: Setting all ESNs for emotion $e$ to zero via a binary mask

\tilde{a}_{l,n,t} = \bigl(1 - m^{(e)}_{l,n}\bigr)\, a_{l,n,t}, \qquad m^{(e)}_{l,n} \in \{0, 1\}

causes targeted performance drops for emotion $e$ (up to −14.63% accuracy), with minimal impact on other classes.

  • Targeted Steering (Gain Amplification): Scaling ESN outputs for emotion $e$ by a gain factor $\alpha > 1$,

\tilde{a}_{l,n,t} = \bigl(1 + (\alpha - 1)\, m^{(e)}_{l,n}\bigr)\, a_{l,n,t}

increases recognition of the target emotion by +2.7–3.3 points, demonstrating the sufficiency of the selected neurons (Zhao et al., 6 Jan 2026).

  • Agnostic Injection: Non-specific gain amplification using multiple emotion masks (methods: 2-Pass, Mix, Union), yielding less specificity but quantifying cross-emotion circuit interactions.
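Both ablation and steering can be expressed as elementwise rescalings of the gate activations by a binary ESN mask. The sketch below assumes NumPy arrays and a simple mask-and-scale semantics consistent with the description above, not the paper's exact code:

```python
import numpy as np

def intervene(acts, esn_idx, mode="ablate", alpha=2.0):
    """Inference-time edit of gate activations.

    acts:    [T, N] activations; esn_idx: indices of ESNs for emotion e.
    ablate:  a <- (1 - m) * a           (self-deactivation test)
    steer:   a <- (1 + (alpha-1)*m) * a (targeted gain amplification)
    """
    m = np.zeros(acts.shape[1])  # binary ESN mask m^(e)
    m[esn_idx] = 1.0
    if mode == "ablate":
        return acts * (1.0 - m)
    return acts * (1.0 + (alpha - 1.0) * m)
```

Non-ESN neurons are untouched in both modes, which is what makes the self- vs cross-deactivation comparison meaningful.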

In SHArE, causal interpretability extends to conductance-based models: synaptic conductance is modulated by valence and perception (via $\eta_{i,c}$ and $\Delta s_i$), and spike events trigger judgment functions $\gamma_{i,c} = \eta_{i,c}\,\Delta s_i$, mapping neural voltage trajectories to emotional state updates (Opong-Mensah, 2020).

4. Layer-Wise Distribution and Transfer Properties

ESNs in Qwen2.5-Omni, Kimi-Audio, and Audio Flamingo 3 cluster non-uniformly across the decoder, with maxima in layers 0, 6–8, and 19–22, and sparse representation in central layers. This robustly replicates across architectures and emotionally annotated datasets.

Cross-dataset transfer analyses show that ESNs identified on one corpus yield diagonal self-effect signatures when ablated on another (shared classes only), albeit with reduced magnitude. Emotional specificity is asymmetric: “anger” and “sadness” consistently transfer more strongly than “neutral,” suggesting a shared mechanistic encoding with some dataset-dependent adaptation (Zhao et al., 6 Jan 2026).

5. Experimental Paradigms and Benchmarks

Empirical validation follows a strict protocol:

  • Model Platforms: Qwen2.5-Omni-7B, Kimi-Audio-7B, Audio Flamingo 3 (28-layer decoders, speech input, text output).
  • Benchmark Datasets: IEMOCAP, MELD, MSP-Podcast, five-way emotion labels (anger, joy/happiness, neutral, sadness, frustration/surprise).
  • Data Pools: Identification performed on correctly answered items per-model/emotion (min 200, max 1000), evaluation on balanced held-out sets.
  • Prompting: Multiple-choice speech emotion recognition (SER), randomized map, greedy decoding. Performance statistics confirm highly emotion-selective necessity and sufficiency across methods and datasets, with statistical robustness evidenced by consistent effects across three models (Zhao et al., 6 Jan 2026).

6. Biological and Artificial Generalization: SHArE Perspectives

The SHArE framework generalizes ESNs beyond digital LALMs, embedding them into biological simulations and abstract policy networks:

  • Biological Simulation: ESNs are mapped to real neurons via conductance-based models, with membrane voltage traces providing real-time valence proxies.
  • Artificial Networks: ANN units are treated as ESNs by augmenting them with valence sensitivity $\eta_{i,c}$ and perception depth $\Delta s_i$, enabling sentiment manifolds and emotion-region clustering in latent activation space.
  • Therapy and Machine Motivation: Gradients of the judgment function $\gamma_{i,c}$ enable trajectory design to steer patient ESN states (behavioral intervention) or to imbue artificial agents with synthetic motivational drives analogous to human emotional processes ("hunger," "sociality") (Opong-Mensah, 2020).
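The trajectory-design idea can be sketched as gradient descent on the squared error between the current appraisal $\gamma = \eta\,\Delta s$ and a target value, adjusting perception depth. This is an illustrative toy under assumed scalar dynamics, not the SHArE procedure:

```python
def steer_toward(target_gamma, eta, delta_s, lr=0.1, steps=100):
    """Drive the appraisal gamma = eta * delta_s toward target_gamma.

    Since loss = (gamma - target)^2 and gamma = eta * delta_s, the
    gradient w.r.t. delta_s is 2 * (gamma - target) * eta.
    """
    for _ in range(steps):
        gamma = eta * delta_s
        delta_s -= lr * 2.0 * (gamma - target_gamma) * eta
    return delta_s
```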

7. Implications for Interpretability, Control, and Safety

The existence of compact ESN sets delivers a mechanistic account of affective computation at the neuron level in LALMs and broader neural frameworks. Targeted ESN interventions provide actionable control handles:

  • Interpretability: ESNs elucidate the internal routing of paralinguistic features through selective modulating subspaces (SwiGLU gates).
  • Controllability: Gain amplification can steer agent behavior or conversational tone (e.g., increasing “empathy” by amplifying “sadness” circuits).
  • Safety: Understanding and modulating ESNs offers a pathway to mitigate undesired tone and affective bias and to monitor ethical alignment in emotional AI agents. A plausible implication is that refined ESN-driven interventions could underpin next-generation affectively capable, transparent conversational systems and advance neuroscientific modeling of emotion at single-neuron granularity (Zhao et al., 6 Jan 2026, Opong-Mensah, 2020).
