Concept Boundary Vector (CBV) Analysis
- Concept Boundary Vector (CBV) is a vectorial construct in latent space that delineates the semantic boundary between two distinct neural activation sets.
- CBVs are derived through a precise method involving boundary pair extraction and gradient-based optimization, offering sharper and more localized interpretability than CAVs.
- Empirical studies show that CBVs yield higher logit influence and more robust adversarial perturbations, enhancing model interpretation and concept attribution.
A Concept Boundary Vector (CBV) is a vectorial construct in the latent space of a neural network, intended to encapsulate the semantic boundary between two distinct concepts as represented by their activation manifolds. Unlike previous approaches that primarily aim to separate concepts in latent space for classification, the CBV is explicitly derived to align with the normals to the decision boundary between the sets of latent activations corresponding to the positive (target) and negative (source) concepts. This geometric focus gives CBVs distinct interpretability and precision characteristics, as demonstrated across empirical, mathematical, and topological analyses (Walker, 2024).
1. Formal Definition and Computational Framework
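Let $f$ be a neural network, and let $f_\ell$ denote the function up to layer $\ell$, producing $d$-dimensional activations. For a concept $C$, let $X_C$ be the set of inputs exhibiting $C$, and $A_C = f_\ell(X_C) \subset \mathbb{R}^d$ its embedding at layer $\ell$. With two concepts $C^+$ (positive) and $C^-$ (negative), $A^+$ and $A^-$ denote their activations, respectively.
A Concept Boundary Vector is defined as the unit vector that best aligns (in cosine similarity) with the set of normals to the decision boundary between $A^+$ and $A^-$. Boundary pairs $(a^+, a^-) \in \mathcal{B} \subseteq A^+ \times A^-$ are selected to straddle the empirical boundary. The normalized difference vectors $n = (a^+ - a^-)/\|a^+ - a^-\|_2$ are collected as the set of boundary normals $\mathcal{N}$. The CBV is obtained by solving an alignment problem of the form
$$v^\ast = \arg\max_{\|v\|_2 = 1} \; \frac{1}{|\mathcal{N}|} \sum_{n \in \mathcal{N}} \cos(v, n).$$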
2. Construction Algorithm
The CBV computation procedure consists of two primary stages: identification of boundary pairs and optimization of the boundary-aligned vector.
Boundary Pair Extraction
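For each positive activation $a^+ \in A^+$, its nearest neighbor in $A^-$ is found and the resulting pair is stored, and vice versa for each negative activation $a^- \in A^-$. Mutual-nearest pairs and pairs where either side is nearest constitute the boundary pair set $\mathcal{B}$.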
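The pairing step can be sketched as follows. This is a minimal illustration, assuming Euclidean distance and brute-force search; the function name `boundary_pairs` and the synthetic arrays are hypothetical and not taken from the source.

```python
import numpy as np

def boundary_pairs(pos, neg):
    """Return sorted index pairs (i, j) such that neg[j] is the nearest
    neighbor of pos[i], or pos[i] is the nearest neighbor of neg[j]."""
    # Pairwise squared Euclidean distances, shape (len(pos), len(neg)).
    d2 = ((pos[:, None, :] - neg[None, :, :]) ** 2).sum(axis=-1)
    nn_of_pos = d2.argmin(axis=1)  # nearest negative for each positive
    nn_of_neg = d2.argmin(axis=0)  # nearest positive for each negative
    pairs = {(i, int(j)) for i, j in enumerate(nn_of_pos)}
    pairs |= {(int(i), j) for j, i in enumerate(nn_of_neg)}
    return sorted(pairs)

# Synthetic stand-ins for the two activation sets (assumption: 8-d embeddings).
rng = np.random.default_rng(0)
pos = rng.normal(loc=+2.0, size=(50, 8))
neg = rng.normal(loc=-2.0, size=(40, 8))
pairs = boundary_pairs(pos, neg)
```

Taking the union of both search directions means every activation appears in at least one pair, and mutual nearest neighbors are counted once. The dense distance matrix is only practical for small sets; larger sets would need chunking or a spatial index.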
Vector Optimization
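For each boundary pair $(a^+, a^-) \in \mathcal{B}$, compute the unit normal $n = (a^+ - a^-)/\|a^+ - a^-\|_2$ and collect these in the set $\mathcal{N}$. Starting with a random unit vector $v$, apply a gradient-based optimizer (e.g., Adam) to minimize the negative mean cosine similarity $\mathcal{L}(v) = -\frac{1}{|\mathcal{N}|}\sum_{n \in \mathcal{N}} \cos(v, n)$, constraining $\|v\|_2 = 1$ after every step.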
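A minimal sketch of the optimization stage, under stated assumptions: plain projected gradient descent is used in place of Adam to keep the example dependency-free, the loss is taken to be the negative mean cosine similarity, and the boundary pairs are synthetic matched rows. The name `fit_boundary_vector` is hypothetical.

```python
import numpy as np

def fit_boundary_vector(normals, steps=500, lr=0.1, seed=0):
    """Projected gradient descent on L(v) = -mean_i <v, n_i> over the unit sphere.

    `normals` is an (m, d) array of unit boundary normals. For unit v and
    unit n_i the inner product equals the cosine similarity.
    """
    rng = np.random.default_rng(seed)
    v = rng.normal(size=normals.shape[1])
    v /= np.linalg.norm(v)            # random unit starting vector
    grad = -normals.mean(axis=0)      # gradient of L; constant since L is linear in v
    for _ in range(steps):
        v = v - lr * grad             # gradient step
        v /= np.linalg.norm(v)        # re-impose ||v|| = 1 after every step
    return v

# Toy matched boundary pairs: row k of pos_pts is paired with row k of neg_pts.
rng = np.random.default_rng(1)
pos_pts = rng.normal(loc=+1.0, size=(30, 6))
neg_pts = rng.normal(loc=-1.0, size=(30, 6))

diff = pos_pts - neg_pts              # points from the negative to the positive side
normals = diff / np.linalg.norm(diff, axis=1, keepdims=True)

cbv = fit_boundary_vector(normals)
mean_alignment = float(np.mean(normals @ cbv))
```

Under this particular mean-cosine loss the optimum coincides with the normalized mean of the normals, which gives a convenient sanity check for any optimizer. The source does not state that its loss has exactly this form, so the check applies to the sketch only.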
3. Comparison with Concept Activation Vectors (CAVs)
Concept Activation Vectors (CAVs) are conventionally obtained by training a linear classifier between the positive activations $A^+$ (labeled $1$) and the negative activations $A^-$ (labeled $0$), with the classifier's weight vector serving as the concept direction. The key distinctions between CAVs and CBVs are:
- Objective: CAVs maximize separation via classification accuracy (cross-entropy); CBVs maximize cosine alignment with local boundary normals.
- Geometric Sensitivity: CAVs disregard the margin location of examples, while CBVs focus on activations closest to the decision boundary.
- Locality: CBVs produce sharper changes in alignment-based loss under vector rotation, indicating a more precise, localized semantic direction.
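Empirically, for the classification loss $\mathcal{L}_{\mathrm{cls}}$ and the similarity loss $\mathcal{L}_{\mathrm{sim}}$, the second derivative with respect to rotation of the vector is larger in magnitude for $\mathcal{L}_{\mathrm{sim}}$ near the optimum, indicating stronger localization for CBV (Walker, 2024).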
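One way to probe this kind of sharpness is a finite-difference estimate of the loss curvature along a rotation. The sketch below is an illustration under assumptions: it uses synthetic unit normals, a mean-cosine similarity loss, and a hypothetical helper `rotation_curvature`. It measures only the similarity loss; the same probe could be applied to a classifier loss for comparison.

```python
import numpy as np

def rotation_curvature(loss, v, u, h=1e-3):
    """Central finite-difference estimate of d^2/dtheta^2 loss(v(theta)) at
    theta = 0, where v(theta) = cos(theta) * v + sin(theta) * u and v, u
    are orthonormal."""
    def rotate(theta):
        return np.cos(theta) * v + np.sin(theta) * u
    return (loss(rotate(h)) - 2.0 * loss(rotate(0.0)) + loss(rotate(-h))) / h**2

rng = np.random.default_rng(0)
# Synthetic stand-in for boundary normals: unit vectors scattered around one axis.
normals = rng.normal(scale=0.3, size=(200, 5)) + np.eye(5)[0]
normals /= np.linalg.norm(normals, axis=1, keepdims=True)

mean_normal = normals.mean(axis=0)
v = mean_normal / np.linalg.norm(mean_normal)  # optimum of the mean-cosine loss

# Random rotation direction, made orthogonal to v by Gram-Schmidt.
u = rng.normal(size=5)
u -= (u @ v) * v
u /= np.linalg.norm(u)

def similarity_loss(w):
    return -float(np.mean(normals @ w))

curvature = rotation_curvature(similarity_loss, v, u)
```

For this loss the curvature at the optimum equals the norm of the mean normal, so tightly clustered boundary normals give a more sharply peaked loss than widely scattered ones.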
4. Empirical Analyses and Metrics
Experiments conducted on MNIST using a ConvNet (2 Conv + ReLU + MaxPool, 64-dim embedding) revealed the following:
- Logit Influence: CBVs yield greater influence on target class logits and more strongly suppress source logits compared to CAVs for nearly all (source, target) concept class pairs.
- Concept Entanglement: CBVs exhibit higher variability across different target transformations, with their cosine similarities reflecting semantic changes, so that transformations toward different targets produce distinct CBVs.
- Concept Algebra: Vector arithmetic on CBVs reconstructs intermediate concepts with a higher success rate and stronger similarity than the same arithmetic on CAVs.
- Adversarial Perturbations: CBV-aligned perturbations switch class labels at smaller perturbation magnitudes than CAV-aligned perturbations, both on boundary activations and, on average, across the negative concept cluster.
- Spatial Attribution: For feature maximization in input space, CBVs yield more focused feature attribution maps than CAVs, as exemplified on digit-to-digit transformations.
- Topological Properties: Persistent homology reveals that CBV point clouds exhibit more robust 1-dimensional loops (higher persistence), and Mapper graph analyses demonstrate that CBV clusters have higher intra-cluster cosine similarity and richer semantic chaining.
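The logit-influence and adversarial-perturbation measurements above can be illustrated with a toy line search. Everything in this sketch is an assumption made for illustration: the two-class linear head `W`, the helper `min_flip_step`, and the two hand-picked directions. It does not reproduce the reported MNIST results.

```python
import numpy as np

def min_flip_step(logits_fn, activation, direction, target, eps_max=2.0, n_grid=2001):
    """Smallest grid step eps for which argmax logits_fn(activation + eps * direction)
    equals `target`; returns None if no grid point up to eps_max flips the label."""
    for eps in np.linspace(0.0, eps_max, n_grid):
        if int(np.argmax(logits_fn(activation + eps * direction))) == target:
            return float(eps)
    return None

# Hypothetical two-class linear head on a 2-d latent space (illustration only).
W = np.eye(2)

def logits_fn(a):
    return W @ a

source_activation = np.array([1.0, 0.0])                 # classified as class 0
boundary_aligned = np.array([-1.0, 1.0]) / np.sqrt(2.0)  # unit normal to the boundary a0 = a1
less_aligned = np.array([0.0, 1.0])                      # raises the class-1 logit only

# Logit influence: change of each logit per unit step along the direction.
influence_aligned = W @ boundary_aligned
influence_less = W @ less_aligned

eps_aligned = min_flip_step(logits_fn, source_activation, boundary_aligned, target=1)
eps_less = min_flip_step(logits_fn, source_activation, less_aligned, target=1)
```

In this toy setting the boundary-normal direction both raises the target logit and lowers the source logit, so it flips the label at a smaller step than the direction that only raises the target logit.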
5. Theoretical Characteristics and Interpretability
CBVs demonstrate higher boundary-faithfulness compared to CAVs by design, integrating local empirical boundary geometry, which enhances sensitivity to decision-surface complexity. Boundary complexity (sum of lifetimes in persistent homology) is negatively correlated with CBV logit influence, with this correlation being stronger for CBVs than for CAVs.
CBV loss landscapes are more sharply peaked; therefore, deviations from the learned direction quickly reduce the boundary-normal alignment. This confers increased invariance and robustness under random rotations. The homogeneity requirement ("A2") is studied via Euclidicity using TARDIS, confirming that CBV logit influence is higher when the difference in Euclidicity between concept boundary and interior is low, and when positive and negative concept interiors have similar Euclidicity.
6. Use Cases and Practical Relevance
CBVs are applicable to a range of tasks:
- Model Interpretation: Quantifying the influence of semantic directions in latent space on output logits, extending concept attribution frameworks such as TCAV.
- Local Concept Attribution: Enabling spatial feature maximization explanations that are sharper than those produced with CAVs.
- Concept Entanglement and Algebra: Mapping and probing the compositional and relational structure underlying learned representations.
- Adversarial Example Generation: Crafting targeted perturbations aligned with semantic boundaries for robustness evaluation.
- Cross-Layer and Dictionary Alignment: In Vision Transformers, CBVs become more sharply aligned at deeper layers and show greater across-layer consistency than CAVs; CBVs also align with unsupervised features from sparse autoencoders, exposing semantic axes.
7. Constraints, Limitations, and Prospective Directions
Constructing CBVs imposes specific computational and algorithmic costs. Boundary pair extraction for activation sets $A^+$ and $A^-$ requires $O(|A^+|\,|A^-|)$ pairwise distance evaluations under brute-force nearest-neighbor search and scales linearly in embedding dimension $d$, with further costs for vector optimization. In deep architectures, boundary pairs become sparse in higher layers, raising concerns about overfitting to limited boundary points; however, empirical results suggest the "A2" homogeneity assumption typically holds.
Current CBV methods employ linear nearest-neighbor boundary pairing; extending to nonlinear boundaries or alternative metrics (e.g., kernelized distances) could broaden CBV applicability, similar to methods for concept activation regions. Future research directions include kernel and nonlinear CBVs, unsupervised boundary detection, integrating causal inner-product concepts, and benchmarking on large-scale vision models and LLMs (Walker, 2024).