Empirical Activation Similarity (EAS) Overview
- Empirical Activation Similarity (EAS) is a family of metrics that measures the similarity (typically cosine) between high-dimensional activation vectors to assess statistical alignment in neural systems.
- It is applied across artificial neural networks and cognitive neuroscience to analyze representational specialization and inform model pruning and calibration strategies.
- Empirical studies demonstrate that EAS can enhance model compression efficiency and support dynamic domain sensitivity analysis through time-resolved similarity measures.
Empirical Activation Similarity (EAS) quantifies the statistical alignment or correspondence between high-dimensional activation patterns elicited by different inputs within neural systems, including artificial neural networks and the human brain. EAS metrics have been deployed to measure semantic similarity, to guide model pruning, and to analyze representational specialization across domains and layers. Core instantiations span model-comparison in cognitive neuroscience, gradient-driven attribution in transformer architectures, and angular fidelity loss for deep learning compression. Definitions and protocols vary by field, but EAS typically leverages second-order activation statistics or cosine-based similarity over activation vectors, grounded directly in observed (empirical) activity rather than parametric modeling assumptions.
1. Fundamental Formulations of EAS Across Modalities
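In LLMs, EAS formalizes the cosine similarity between “activation vectors” derived from parameterwise gradients of model outputs, measuring which parameters are influential for a particular input. Given a model output functional $f(x; \theta)$ for input $x$ and parameters $\theta \in \mathbb{R}^{d}$, the per-parameter activation metric is
$$a_i(x) = \left| \theta_i \, \frac{\partial f(x; \theta)}{\partial \theta_i} \right|, \qquad i = 1, \dots, d,$$
where $a(x) = (a_1(x), \dots, a_d(x))$ denotes the $d$-dimensional activation vector. For two inputs $x_1, x_2$, EAS is computed as
$$\mathrm{EAS}(x_1, x_2) = \frac{a(x_1) \cdot a(x_2)}{\|a(x_1)\|_2 \, \|a(x_2)\|_2}.$$
This metric, referred to as LLMDcos, takes values in $[0, 1]$ due to nonnegative elements, with unity indicating maximal overlap in activated parameters (Wang et al., 2024).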
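A minimal sketch of the parameter-space formulation follows. It assumes a toy model $f(x; \theta) = \tanh(\theta \cdot x)$, chosen only because its gradient has a closed form, and takes the activation vector to be the elementwise magnitude of gradient × parameter. The model, the function names `activation_vector` and `eas_cosine`, and the synthetic inputs are illustrative assumptions, not the setup of Wang et al. (2024).

```python
import numpy as np

def activation_vector(theta, x):
    """|theta_i * df/dtheta_i| for the toy model f(x; theta) = tanh(theta @ x)."""
    pre = theta @ x
    grad = (1.0 - np.tanh(pre) ** 2) * x  # analytic gradient w.r.t. theta
    return np.abs(theta * grad)

def eas_cosine(a, b, eps=1e-30):
    """Cosine similarity between two activation vectors."""
    denom = max(np.linalg.norm(a) * np.linalg.norm(b), eps)  # guard against zero norms
    return float(a @ b / denom)

rng = np.random.default_rng(0)
theta = rng.normal(size=8)
x1 = rng.normal(size=8)
x2 = x1 + 0.1 * rng.normal(size=8)  # a small perturbation of x1
x3 = rng.normal(size=8)             # an unrelated input

sim_near = eas_cosine(activation_vector(theta, x1), activation_vector(theta, x2))
sim_far = eas_cosine(activation_vector(theta, x1), activation_vector(theta, x3))
print(f"perturbed input: {sim_near:.3f}, unrelated input: {sim_far:.3f}")
```

Because every entry of an activation vector is nonnegative, both scores fall in $[0, 1]$. For a real LLM the gradient would come from automatic differentiation rather than a closed form.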
In neural data analysis (e.g., MEG studies), EAS characterizes stimulus similarity via Pearson correlation between empirically reduced activation vectors at time $t$:
$$S_{ij}(t) = \mathrm{corr}\big(\mathbf{v}_i(t), \mathbf{v}_j(t)\big),$$
where $\mathbf{v}_i(t)$ is the reduced activation vector for stimulus $i$ and $S_{ij}(t)$ is the time-resolved empirical activation similarity (Wardle et al., 2015). Alternative constructions may use classification-based dissimilarity measures (e.g., from decoding analysis), normalized and inverted to produce similarity scores.
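The time-resolved construction can be sketched as below. The sketch assumes activation patterns are already trial-averaged and dimension-reduced, stored as an array of shape `(n_stimuli, n_features, n_times)`. The array layout, the function name, and the synthetic data are assumptions for illustration.

```python
import numpy as np

def time_resolved_similarity(acts):
    """acts: (n_stimuli, n_features, n_times) -> (n_times, n_stimuli, n_stimuli).

    Each output slice is the Pearson correlation matrix between stimulus
    patterns at one timepoint.
    """
    n_stim, _, n_times = acts.shape
    sims = np.empty((n_times, n_stim, n_stim))
    for t in range(n_times):
        # np.corrcoef treats rows as variables: one row per stimulus pattern
        sims[t] = np.corrcoef(acts[:, :, t])
    return sims

rng = np.random.default_rng(0)
acts = rng.normal(size=(6, 20, 50))  # synthetic stand-in for reduced MEG patterns
S = time_resolved_similarity(acts)
print(S.shape)
```

Each slice `S[t]` is symmetric with a unit diagonal, so only the off-diagonal upper triangle carries information for later comparisons.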
For transformer interpretability, gradient × activation saliency maps define tokenwise and word-group activations, enabling EAS-like explanatory matching of words/phrases between text pairs (Malkiel et al., 2022).
2. EAS in Model Compression and Pruning
Recent pruning strategies exploit EAS to preserve the angular structure of representations during parameter ablation. In the ACE framework, Empirical Activation Similarity measures the cosine fidelity between unpruned and pruned model activations:
$$\mathrm{EAS}(Y, \hat{Y}) = \frac{\langle Y, \hat{Y} \rangle}{\|Y\| \, \|\hat{Y}\|},$$
where $Y$ and $\hat{Y}$ are dense and pruned layer outputs for an $N$-token batch. The pruning score for each connection combines a weight-magnitude × activation-norm factor (CosP) with an activation-variance factor (VarP).
The ACE algorithm prunes weights ranked by this combined score, directly minimizing angular distortion and improving calibration efficiency. Experiments show EAS-informed pruning achieves up to 18% reduction in perplexity and up to 63% reduction in time relative to non-EAS baselines, while requiring as few as 16 tokens of calibration data (2505.21987).
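To make the cosine-fidelity measurement concrete, the sketch below compares the outputs of a dense linear layer with those of a pruned copy. Plain magnitude pruning is used as a stand-in criterion; it is not the ACE score, whose CosP and VarP terms are not reproduced here. The layer sizes, the function names, and the choice to flatten the whole batch into one vector before taking the cosine are assumptions.

```python
import numpy as np

def magnitude_prune(W, sparsity):
    """Zero out the smallest-magnitude fraction of weights (stand-in, not the ACE score)."""
    k = int(sparsity * W.size)
    if k == 0:
        return W.copy()
    threshold = np.sort(np.abs(W), axis=None)[k - 1]
    return np.where(np.abs(W) > threshold, W, 0.0)

def activation_cosine(Y, Y_hat, eps=1e-30):
    """Cosine similarity between dense and pruned outputs, flattened over the batch."""
    num = float(np.sum(Y * Y_hat))
    den = float(np.linalg.norm(Y) * np.linalg.norm(Y_hat))  # Frobenius norms
    return num / max(den, eps)

rng = np.random.default_rng(0)
X = rng.normal(size=(16, 32))  # 16 calibration tokens, 32 input features
W = rng.normal(size=(24, 32))  # dense weight matrix of a linear layer
Y = X @ W.T

eas_by_sparsity = {
    s: activation_cosine(Y, X @ magnitude_prune(W, s).T)
    for s in (0.0, 0.25, 0.5, 0.75)
}
for s, v in eas_by_sparsity.items():
    print(f"sparsity {s:.2f}: EAS {v:.4f}")
```

A per-token average of cosines is an equally plausible reading of the metric; swapping it in only changes `activation_cosine`.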
3. Layerwise and Domain-Sensitivity Analysis Using EAS
EAS provides a lens onto internal specialization, differentiating “universal encoder” layers (high activation similarity across domains) from deep “expert” layers which activate differently for task-specific or cross-domain inputs (Wang et al., 2024). Empirically:
- Within-domain EAS: High for all layers, meaning parameters are consistently co-activated.
- Cross-domain EAS: High for shallow layers, then decays in deep layers, where blocks exhibit representational individuality.
- Peak domain-agnosticity in layer 2; maximal specialization in layers 20–30 (for Llama2-7B).
Averaging EAS matrices over datasets recovers an interpretable domain-task similarity structure, accurately reflecting semantic or procedural overlap between benchmarks.
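The averaging step can be sketched as follows, assuming a sample-by-sample EAS matrix and one domain label per sample. The domain names, the synthetic activation vectors, and the function name `domain_similarity` are hypothetical.

```python
import numpy as np

def domain_similarity(eas, labels):
    """Average a sample-by-sample EAS matrix into a domain-by-domain matrix.

    Assumes every domain has at least two samples, so that within-domain
    blocks still have entries once the self-similarity diagonal is removed.
    """
    labels = np.asarray(labels)
    domains = sorted(set(labels.tolist()))
    out = np.zeros((len(domains), len(domains)))
    for i, di in enumerate(domains):
        for j, dj in enumerate(domains):
            block = eas[np.ix_(labels == di, labels == dj)]
            if i == j:
                # drop each sample's similarity with itself
                block = block[~np.eye(block.shape[0], dtype=bool)]
            out[i, j] = block.mean()
    return domains, out

rng = np.random.default_rng(0)
# Two synthetic domains that activate disjoint halves of a 10-dim vector
base = {"code": np.r_[np.ones(5), np.zeros(5)], "math": np.r_[np.zeros(5), np.ones(5)]}
labels = ["code"] * 3 + ["math"] * 3
vecs = np.stack([np.abs(base[d] + 0.1 * rng.normal(size=10)) for d in labels])
unit = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
eas = unit @ unit.T  # pairwise cosine similarity

domains, D = domain_similarity(eas, labels)
print(domains)
print(np.round(D, 3))
```

In this synthetic setup the diagonal of `D` (within-domain) is larger than the off-diagonal (cross-domain), which is the pattern the domain-task similarity structure is meant to expose.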
4. Empirical Activation Similarity in Cognitive Neuroscience
EAS enables the comparison of neural representations evoked by sensory stimuli. In MEG studies (Wardle et al., 2015), EAS is calculated as correlation similarity between PCA-reduced activation vectors for each stimulus at each timepoint, yielding dynamic similarity matrices. This approach supports representational similarity analysis (RSA) to compare empirical neural geometry to external models (retinotopic, computational, or perceptual). Key empirical findings:
- Early visual cortex representations align with retinotopic models (around 50–80 ms post-stimulus).
- From around 150 ms, EAS with perceptual-similarity models approaches the empirical noise ceiling.
- EAS provides a metric for empirical quantification of perceptual Gestalts via brain-wide activation patterns.
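The model-comparison step of RSA can be sketched as a rank correlation between each time-resolved neural similarity matrix and a fixed model matrix. The sketch assumes Spearman correlation over the off-diagonal upper triangle and uses a simple ranking without tie correction. The synthetic matrices and all function names are illustrative, and no noise ceiling is estimated.

```python
import numpy as np

def upper_triangle(m):
    """Off-diagonal upper-triangle entries of a square similarity matrix."""
    i, j = np.triu_indices(m.shape[0], k=1)
    return m[i, j]

def ranks(v):
    """Ordinal ranks (no tie correction), adequate for continuous-valued similarities."""
    r = np.empty(len(v))
    r[np.argsort(v)] = np.arange(len(v))
    return r

def spearman(a, b):
    """Spearman correlation computed as the Pearson correlation of ranks."""
    return float(np.corrcoef(ranks(a), ranks(b))[0, 1])

def rsa_timecourse(neural_sims, model_sim):
    """Model fit at each timepoint: (n_times, n, n) and (n, n) -> (n_times,)."""
    model_vec = upper_triangle(model_sim)
    return np.array([spearman(upper_triangle(s), model_vec) for s in neural_sims])

rng = np.random.default_rng(0)
n = 8
m = rng.uniform(size=(n, n))
model_sim = (m + m.T) / 2  # synthetic model similarity matrix
# Three synthetic "timepoints" that mix in the model structure with weight w
neural = np.stack([
    w * model_sim + (1 - w) * rng.uniform(size=(n, n)) for w in (0.0, 0.5, 1.0)
])
fit = rsa_timecourse(neural, model_sim)
print(np.round(fit, 3))
```

With real data, `neural` would be the stack of time-resolved similarity matrices and `model_sim` a retinotopic, computational, or perceptual model matrix.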
5. Applications: Pruning, Interpretability, Retrieval, Model Calibration
EAS metrics have demonstrable utility:
- Adaptive Model Pruning: EAS guides unstructured and semi-structured pruning; layerwise pruning ratios are tuned by observed activation density/sparsity (e.g., densest layers pruned less aggressively) (Wang et al., 2024; 2505.21987).
- Calibration Efficiency: EAS-based pruning remains robust with very short calibration sequences, supporting rapid compression.
- Interpretability and Attribution: EAS-inspired saliency and word-pair matching provide token-level explanations for BERT similarity (Malkiel et al., 2022).
- Semantic Similarity: Deep-layer EAS correlates with human judgment on STS-B and SICK, offering embedding-free data relevance signals.
- Monitoring and Robustness: Large rotational changes in activation similarity can signal domain shift or calibration drift in deployment (2505.21987).
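As one possible reading of the monitoring use case, the sketch below compares the mean activation of a live batch against a stored reference batch and flags the batch when their cosine similarity drops below a threshold. The threshold value of 0.9, the use of batch means, and the function names are hypothetical choices, not taken from the cited work.

```python
import numpy as np

def mean_cosine_to_reference(reference, live):
    """Cosine similarity between the mean activations of two batches (rows = samples)."""
    ref_mean = reference.mean(axis=0)
    live_mean = live.mean(axis=0)
    denom = max(np.linalg.norm(ref_mean) * np.linalg.norm(live_mean), 1e-30)
    return float(ref_mean @ live_mean / denom)

def drift_flag(reference, live, threshold=0.9):
    """True when the live batch has rotated away from the reference (threshold is hypothetical)."""
    return mean_cosine_to_reference(reference, live) < threshold

rng = np.random.default_rng(0)
direction = rng.normal(size=64)  # synthetic "in-domain" activation direction
reference = direction + 0.1 * rng.normal(size=(128, 64))
in_domain = direction + 0.1 * rng.normal(size=(128, 64))
shifted = rng.normal(size=64) + 0.1 * rng.normal(size=(128, 64))  # new direction

print(drift_flag(reference, in_domain), drift_flag(reference, shifted))
```

In practice the threshold would need to be calibrated on held-out in-domain traffic, since the natural batch-to-batch variation of the similarity depends on the layer and the batch size.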
6. Implementation Protocols and Assessment
EAS is operationalized as batch-wise or layer-wise cosine similarity between activation vectors. Variants include:
- Parameter-space activation statistics (gradient × parameter; Wang et al., 2024)
- Activation vector correlation (MEG; Wardle et al., 2015)
- Token/word-level saliency (transformers; Malkiel et al., 2022)
- Angular deviation between dense and compressed model activations (ACE; 2505.21987)
Calibration batch size and sequence length are key hyperparameters, and practical settings vary in the number of calibration sequences used. Validation is performed by correlating EAS matrices with external similarity labels or task/domain outcomes, using Spearman or Wilcoxon statistics.
7. Comparative Table: EAS Usage Across Domains
| Field | EAS Formulation | Principal Use |
|---|---|---|
| LLMs/Pruning (2505.21987) | Cosine sim. of dense/pruned acts | Compression, calibration |
| LLMs/Interpretability (Wang et al., 2024) | Cosine sim. of activation vectors | Layer specialization, domain analysis |
| BERT/Interpretation (Malkiel et al., 2022) | Gradient × activation saliency | Token/word attribution |
| Cognitive Neuroscience (Wardle et al., 2015) | Corr. similarity over neural acts | Representational similarity |
The diversity of EAS instantiations reflects the underlying generality of empirical activation geometry as a unifying framework for measuring representational, functional, and semantic similarity in both artificial and biological systems.