Semantic Similarity Heatmaps
- Semantic similarity heatmaps are matrix-style visualizations that encode the degree of semantic similarity between objects using a continuous color scale.
- They integrate both model-based methods (such as AIS, LSA, and diffusion-model approaches) and knowledge-based techniques (like topic maps and indexing vocabularies) to explain complex relationships.
- These heatmaps enhance interpretability and diagnostic accuracy across vision, language, and document applications, aligning computational metrics with human judgments.
Semantic similarity heatmaps are matrix-style visualizations where each cell encodes the degree of semantic similarity between two objects—such as images, words, or documents—using a continuous color map. They serve as analytical, diagnostic, or explanatory tools for quantifying and interpreting underlying relationships in high-dimensional representational spaces. Recent developments have allowed heatmaps not merely to display similarity scores but also to capture cognitive, contextual, and human-aligned notions of similarity, and to reveal the specific features, input regions, or contexts that drive these correspondences. Approaches span vision, language, and information retrieval, encompassing both model-based (e.g., deep neural embeddings, diffusion models) and knowledge-based (e.g., topic maps, indexing vocabularies) paradigms.
1. Alignment-Importance Semantic Similarity Heatmaps
The Alignment Importance Score (AIS) framework (Truong et al., 2024) defines the contribution of each feature map in a deep neural network (DNN) to the alignment between the network's similarity geometry and human similarity assessments. Given a set of images, the human similarity matrix is compared to the model's similarity matrix (built from pairwise cosine similarities in embedding space), and their agreement is quantified by a Spearman correlation $\rho$. The contribution of each feature is assessed by masking it and measuring the drop in alignment, which yields $\mathrm{AIS}_i = \rho - \rho_{\setminus i}$, where $\rho_{\setminus i}$ is the alignment after masking feature $i$.
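As a rough sketch (not the authors' implementation), this masking procedure can be written with NumPy and SciPy, treating each image as a flat feature vector; `pairwise_cosine`, `alignment`, and `ais_scores` are illustrative names:

```python
import numpy as np
from scipy.stats import spearmanr

def pairwise_cosine(X):
    """Cosine similarity between all rows of an (n_items, n_features) matrix."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    return Xn @ Xn.T

def alignment(model_sim, human_sim):
    """Spearman correlation between the upper triangles of two similarity matrices."""
    iu = np.triu_indices_from(model_sim, k=1)
    return spearmanr(model_sim[iu], human_sim[iu])[0]

def ais_scores(embeddings, human_sim):
    """AIS_i = rho(full) - rho(feature i masked): drop in human-model alignment."""
    rho_full = alignment(pairwise_cosine(embeddings), human_sim)
    scores = np.empty(embeddings.shape[1])
    for i in range(embeddings.shape[1]):
        masked = embeddings.copy()
        masked[:, i] = 0.0  # "mask" feature i by zeroing it out
        scores[i] = rho_full - alignment(pairwise_cosine(masked), human_sim)
    return scores
```

If the human matrix coincides with the model's, $\rho = 1$ and every score is nonnegative, since masking can only lower the alignment.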
To render a semantic similarity heatmap for a particular image, the method computes AIS scores at the per-image, per-feature level, then rectifies and normalizes them to form a nonnegative, sum-to-one weighting. These weights are used to linearly combine upsampled activation maps, localizing the “comparison-relevant” image regions. Optionally, spatial smoothing (e.g., Gaussian blur) can be applied.
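Under the same caveat, the rendering step might look as follows, with `ais_heatmap` a hypothetical helper operating on one image's per-feature activation maps:

```python
import numpy as np
from scipy.ndimage import zoom, gaussian_filter

def ais_heatmap(activation_maps, ais, out_size, blur_sigma=None):
    """Combine per-feature activation maps (F, h, w) into one (H, W) heatmap,
    weighted by rectified, sum-to-one-normalized per-image AIS scores."""
    weights = np.maximum(ais, 0.0)         # rectify: keep alignment-helping features
    if weights.sum() > 0:
        weights = weights / weights.sum()  # normalize to a sum-to-one weighting
    H, W = out_size
    F, h, w = activation_maps.shape
    heat = np.zeros((H, W))
    for f in range(F):
        up = zoom(activation_maps[f], (H / h, W / w), order=1)  # linear upsampling
        heat += weights[f] * up
    if blur_sigma is not None:             # optional spatial smoothing
        heat = gaussian_filter(heat, blur_sigma)
    return heat
```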
Empirically, AIS heatmaps highlight semantically diagnostic regions rather than purely visually salient ones. For instance, in category discrimination tasks, they pick out features (monkey body posture, truck wheel-arch) that affect comparative judgments among peers, even if these regions draw less gaze attention from viewers. Precision–recall analyses demonstrate that conventional saliency maps (e.g., TranSalNet) fail to reliably predict AIS-selected regions except in simple cases (e.g., animals), and relative-risk ratios quantify the enrichment of comparison-relevant pixels among saliency hotspots, with values differing markedly across categories (e.g., animals versus vegetables).
AIS-based pruning substantially improves the out-of-sample predictivity of human similarity judgments over the DNN representation, outperforming both full feature sets and global reweighting baselines such as LPIPS. The method generalizes across architectures (ResNet, DenseNet, Barlow-Twins, etc.) and domains (aligning to neural or language representations), and is theoretically grounded as an operationalization of Tversky’s “features of similarity” in complex, learned spaces (Truong et al., 2024).
2. Heatmaps for Word Semantic Similarity via Classification Confusion
To accommodate the asymmetrical, context-dependent, and polysemous character of word meaning, classification confusion-based heatmaps leverage classifier output probabilities to quantify word similarity in context (Zhou et al., 8 Feb 2025). The method processes textual data as follows:
- Extract context embeddings for each word occurrence using a pre-trained contextual encoder (e.g., BERT).
- Train a $K$-class classifier (one class per vocabulary word) to predict word identity from embeddings.
- For every target word $w_i$ and potential confounder $w_j$, average the classifier's predicted probability of $w_j$ over all embeddings whose true label is $w_i$: $\mathrm{Conf}(i, j) = \frac{1}{|E_i|}\sum_{e \in E_i} P(w_j \mid e)$, where $E_i$ is the set of occurrences of $w_i$.
- Optionally symmetrize $\mathrm{Conf}$ (e.g., by averaging it with its transpose) to yield a square similarity matrix $S$.
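The steps above can be condensed into a small sketch, assuming the classifier's posterior probabilities are already available as an array (`confusion_similarity` is an illustrative name, not from the paper):

```python
import numpy as np

def confusion_similarity(probs, labels, n_classes):
    """Conf[i, j]: mean probability the classifier assigns to word j over all
    occurrences whose true label is word i; returned symmetrized."""
    conf = np.zeros((n_classes, n_classes))
    for i in range(n_classes):
        mask = labels == i
        if mask.any():
            conf[i] = probs[mask].mean(axis=0)
    return 0.5 * (conf + conf.T)  # optional symmetrization step
```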
The resulting similarity matrix is mapped to a color scale, with hierarchical clustering or thresholding optionally applied for interpretability. This representation can be stratified by sense clusters or temporal slices to diagnose contextually-induced semantic shifts, as seen in examples tracking “révolution” across historical, political, and technical contexts.
Confusion-based heatmaps are competitive with traditional embedding cosine similarity for human judgment prediction, and are deployable across vocabulary subsets or specialized semantic domains (Zhou et al., 8 Feb 2025).
3. Semantic Heatmaps in Document and Term Space
Semantic similarity heatmaps may also be built using knowledge-structured approaches such as topic maps and controlled indexing vocabularies.
Topic-map–based similarity (Rafi et al., 2013) proceeds by encoding each document into a topic map, then identifying all rooted, label- and order-preserving common subtrees between two documents' topic-tree representations $T(d_1)$ and $T(d_2)$. The resulting similarity score, computed from these shared subtrees, is compiled into an $N \times N$ symmetric similarity matrix for a collection of $N$ documents. Heatmap visualizations, especially after ordering by hierarchical clustering, reveal more sharply defined, semantically coherent groups than co-occurrence or vector-based approaches, particularly for short or noisy documents.
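The clustering-based ordering mentioned here can be sketched generically with SciPy, independent of how the underlying topic-map similarities were computed (`cluster_order` is an illustrative helper):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list
from scipy.spatial.distance import squareform

def cluster_order(sim):
    """Reorder a symmetric similarity matrix by average-linkage clustering of
    1 - similarity, so coherent groups appear as contiguous heatmap blocks."""
    dist = 1.0 - sim
    np.fill_diagonal(dist, 0.0)
    order = leaves_list(linkage(squareform(dist, checks=False), method="average"))
    return sim[np.ix_(order, order)], order
```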
Indexing vocabulary heatmaps (Mutschke et al., 2015) visualize the co-occurrence frequencies among first- and second-order controlled terms as a two-dimensional grid. After normalization to $[0, 1]$, color is assigned such that “hot” cells (values near $1$) indicate mainstream or highly associated subjects, and “cold” cells (values near $0$) reflect niche or weakly connected terms. These maps can be interactively linked to search and recommendation interfaces, supporting exploratory navigation in semantic space.
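A minimal sketch of the counting and normalization step, assuming each document is represented by its list of controlled terms (`cooccurrence_heatmap` is a hypothetical helper):

```python
import numpy as np

def cooccurrence_heatmap(doc_terms, vocab):
    """Count how often two controlled terms index the same document, then
    min-max normalize the counts to [0, 1] for the color scale."""
    idx = {t: i for i, t in enumerate(vocab)}
    C = np.zeros((len(vocab), len(vocab)))
    for terms in doc_terms:  # controlled terms assigned to one document
        ids = [idx[t] for t in terms if t in idx]
        for a in ids:
            for b in ids:
                if a != b:
                    C[a, b] += 1
    lo, hi = C.min(), C.max()
    return (C - lo) / (hi - lo) if hi > lo else C
```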
4. Latent Semantic Analysis and Blurring-Based Heatmaps
Latent Semantic Analysis (LSA) enables semantic similarity heatmaps by projecting a term–document matrix $X$ into a low-rank space via the truncated SVD $X \approx U_k \Sigma_k V_k^{\top}$, where $k \ll \min(m, n)$ (Koeman et al., 2014). Pairwise cosine similarities between the rows of $V_k \Sigma_k$ (documents in the reduced space) form the basis of $S$, a similarity matrix visualized as a heatmap.
Decreasing $k$ progressively “blurs” these similarity structures: for small $k$, intra-cluster similarities converge and inter-cluster distinctions sharpen, exposing major thematic blocks but erasing finer distinctions. LSA heatmaps have been employed to illustrate the emergence and dissolution of latent semantic groupings as compression increases, providing insights into semantic granularity and information loss (Koeman et al., 2014).
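The LSA pipeline is compact enough to sketch directly in NumPy (`lsa_similarity` is an illustrative name); varying `k` reproduces the blurring effect:

```python
import numpy as np

def lsa_similarity(X, k):
    """Truncated SVD of a term-document matrix X (terms x docs); documents are
    rows of V_k * Sigma_k, compared by cosine similarity in the rank-k space."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    docs = (np.diag(s[:k]) @ Vt[:k]).T
    norms = np.linalg.norm(docs, axis=1, keepdims=True)
    docs = docs / np.where(norms == 0, 1.0, norms)  # guard zero-norm documents
    return docs @ docs.T
```

For a block-structured corpus with two disjoint topics, the rank-2 similarity matrix is exactly block diagonal: documents sharing a topic have cosine similarity $1$, documents from different topics similarity $0$.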
5. Similarity Grounding via Diffusion-Model–Induced Image Distributions
Recent approaches quantify semantic similarity between textual expressions by comparing the image distributions they induce under text-conditioned diffusion generative models (Liu et al., 2024). Each prompt specifies a reverse-time stochastic differential equation (SDE) trajectory over latent space, parameterized by a learned score network $s_\theta$.
The semantic distance between two prompts is computed as the Jensen–Shannon divergence between their induced path measures $P_1$ and $P_2$, operationalized as $D_{\mathrm{JS}}(P_1 \,\|\, P_2) = \tfrac{1}{2} D_{\mathrm{KL}}(P_1 \,\|\, M) + \tfrac{1}{2} D_{\mathrm{KL}}(P_2 \,\|\, M)$, with samples drawn from the equal-weight mixture $M$ of the two path distributions over the SDE steps. Monte Carlo estimation (typically up to $5$ trajectories over a modest number of SDE steps) is practical and stable.
The resulting distance matrix is min–max normalized and inverted (optionally via an exponential kernel) to a similarity scale suitable for heatmap visualization. Clustering reveals semantic blocks (e.g., canine vs. cetacean terms, verb classes). Pairwise heatmap entries can be interpreted visually, as the semantics are anchored in generated image trajectories and their score-function differences—providing an explicit explanation for the computed similarity (Liu et al., 2024).
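Because the true path measures are intractable, a faithful example is out of reach here; the following schematic instead estimates the same Jensen–Shannon quantity by Monte Carlo for 1-D Gaussians standing in for the two path distributions, followed by the min–max normalization and inversion step (`js_divergence_mc` and `to_similarity` are illustrative names):

```python
import numpy as np

def log_gauss(x, mu, sigma):
    """Log-density of N(mu, sigma^2)."""
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))

def js_divergence_mc(mu1, mu2, sigma=1.0, n=20000, seed=0):
    """Monte Carlo JS divergence between two Gaussians, sampling from their
    equal-weight mixture m and importance-weighting each KL term."""
    rng = np.random.default_rng(seed)
    comp = rng.integers(0, 2, n)                 # pick a mixture component
    x = rng.normal(np.where(comp == 0, mu1, mu2), sigma)
    lp1, lp2 = log_gauss(x, mu1, sigma), log_gauss(x, mu2, sigma)
    lm = np.logaddexp(lp1, lp2) - np.log(2)      # log m(x)
    w1, w2 = np.exp(lp1 - lm), np.exp(lp2 - lm)  # p_i(x) / m(x)
    return 0.5 * np.mean(w1 * (lp1 - lm)) + 0.5 * np.mean(w2 * (lp2 - lm))

def to_similarity(D):
    """Min-max normalize a distance matrix and invert to a [0, 1] similarity."""
    return 1.0 - (D - D.min()) / (D.max() - D.min())
```

For well-separated distributions the estimate approaches the JS upper bound $\ln 2$, so after inversion such pairs land near the cold end of the similarity scale.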
6. Comparative Perspectives and Interpretive Functions
Semantic similarity heatmaps differ from traditional saliency maps and standard similarity matrices in multiple respects:
- Task alignment: Methods like AIS attribute importance to features optimizing alignment with external similarity judgments (human, neural, linguistic), while saliency maps focus on class-discriminative or visually attended regions (Truong et al., 2024).
- Semantic vs. low-level focus: Cognitive alignment (e.g., AIS, confusion) highlights comparison-relevant features, which may not coincide with perceptual saliency.
- Interpretability: As in topic-map and indexing-vocabulary approaches, heatmaps can be constructed to reflect explicit knowledge structures, offering more transparent semantic groupings (Rafi et al., 2013, Mutschke et al., 2015).
- Explanatory scope: Model-based approaches (AIS, LSA, diffusion-grounded similarity) yield both predictive improvements and interpretive visualizations, illuminating the mechanics of human or model-driven comparison.
Table: Comparison of Semantic Similarity Heatmap Methodologies
| Approach | Target Domain | Similarity Basis |
|---|---|---|
| AIS heatmaps (Truong et al., 2024) | Vision, multimodal | Feature importance to human-model alignment |
| Classification confusion (Zhou et al., 8 Feb 2025) | Language | Classifier confusion on contexts |
| Topic map (Rafi et al., 2013) | Text/documents | Structural subtree isomorphism |
| LSA (Koeman et al., 2014) | Language/documents | SVD-based latent cosine similarity |
| Diffusion path (Liu et al., 2024) | Language-to-image, text | JS divergence on generative paths |
| Index vocab (Mutschke et al., 2015) | Information retrieval | Controlled-term co-occurrence |
These distinctions clarify the selection and configuration of heatmap methodologies for different applications and interpretive objectives.
7. Extensions and Theoretical Implications
Semantic similarity heatmaps can be extended beyond standard visual and linguistic tasks:
- Architectural generalization: AIS and related approaches operate across a range of DNN architectures and self-supervised models (Truong et al., 2024).
- Domain alignment: The underlying similarity measure can target neural, linguistic, or multimodal representational geometries (Truong et al., 2024).
- Cognitive modeling: By grounding heatmap features in empirical perturbation analyses, these techniques instantiate multidimensional scaling and “features of similarity” frameworks within modern neural representations.
- Dynamic and polysemous semantics: Confusion-based and clustering heatmaps enable the analysis of context-dependent meaning change, sense drifts, and category evolution, supporting investigations in cultural analytics (Zhou et al., 8 Feb 2025).
Semantic similarity heatmaps therefore unify representation, prediction, and explanation across a spectrum of technical paradigms, offering rigorous, quantitatively validated frameworks for empirical alignment between computational models and human or application-domain semantics.