Class-Averaged Text Embeddings
- Class-Averaged Text Embeddings are methods that aggregate distributed text features over all instances in a class to create interpretable and discriminative semantic prototypes.
- Various techniques—such as naive averaging, learned class vectors, hyperbolic centroids, and sparse autoencoder-derived conceptual averages—optimize accuracy and interpretability in tasks like classification and ontology alignment.
- These embeddings are applied for prototype-driven classification, multilingual alignment, and bias auditing, achieving up to 94.91% accuracy and improved correlation metrics in empirical evaluations.
Class-averaged text embeddings encode the semantic profile of entire classes—such as categories, topics, document labels, or ontology concepts—by aggregating distributed text or feature representations over all instances within the class. These embeddings function as explicit class prototypes in neural or kernel space, serving as interpretable anchors for classification, semantic alignment, model interpretability, and knowledge integration. Modern approaches encompass naive mean aggregation, learned class vectors, hyperbolic centroids, and sparse autoencoder-derived “conceptual averages,” often leveraging supervised, multilingual, or geometric constraints.
1. Motivation for Class-Averaged Representations
Class-averaged text embeddings address the need for compact, discriminative encodings of class semantics that go beyond token-level or document-level embeddings. While word or paragraph embeddings (e.g., word2vec, Paragraph Vector) capture local lexical or document context, many downstream tasks—such as topic modeling, ontology alignment, or classification—require an explicit class-level summary. Class-averaged embeddings enable:
- Direct similarity computation between text (or features) and classes.
- Efficient prototype-based classification and clustering.
- Enhanced statistical sharing across all documents of a class, which is critical in low-resource or imbalanced regimes.
- Mechanistic interpretability and semantic audit of learned features via correspondence between vector components and human-understandable concepts (Sachan et al., 2015, O'Reilly et al., 19 Aug 2025).
2. Naive Averaging, Class Vectors, and Discriminative Optimization
The most direct approach computes a class centroid by averaging the embeddings of all constituent instances (words, sentences, documents):

$$\mathbf{v}_c = \frac{1}{|D_c|} \sum_{d \in D_c} \mathbf{v}_d,$$

where $D_c$ is the set of documents in class $c$ and $\mathbf{v}_d$ is the embedding of document $d$. While simple, this unsupervised aggregation is susceptible to dilution by atypical sentences or dominant lexical signals.
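As a concrete illustration, a minimal NumPy sketch of naive centroiding and nearest-centroid prediction (function names and the array layout are illustrative, not taken from the cited work):

```python
import numpy as np

def class_centroids(doc_embeddings: np.ndarray, labels: np.ndarray) -> dict:
    """Average document embeddings per class to obtain one prototype per class.

    doc_embeddings: (n_docs, dim) matrix of document vectors.
    labels: (n_docs,) array of class labels.
    """
    centroids = {}
    for c in np.unique(labels):
        centroids[c] = doc_embeddings[labels == c].mean(axis=0)
    return centroids

def predict(doc_embeddings: np.ndarray, centroids: dict) -> np.ndarray:
    """Assign each document to the class whose centroid is most cosine-similar."""
    classes = list(centroids)
    proto = np.stack([centroids[c] for c in classes])            # (n_classes, dim)
    proto = proto / np.linalg.norm(proto, axis=1, keepdims=True)
    docs = doc_embeddings / np.linalg.norm(doc_embeddings, axis=1, keepdims=True)
    return np.array(classes)[np.argmax(docs @ proto.T, axis=1)]
```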
The Class Vectors framework extends the skip-gram paradigm by jointly learning explicit vectors for each class, optimized to predict class-specific word distributions (Sachan et al., 2015). The augmented objective incorporates class-word co-occurrences alongside the usual word-context terms:

$$\max \; \frac{1}{T} \sum_{t=1}^{T} \Big[ \sum_{-k \le j \le k,\, j \neq 0} \log p(w_{t+j} \mid w_t) \;+\; \log p(w_t \mid \mathbf{u}_{c(t)}) \Big],$$

where $\mathbf{u}_{c(t)}$ is the learned vector of the class to which the document containing $w_t$ belongs. Here, $\mathbf{u}_c$ encodes features that are maximally discriminative for its class, addressing the main weaknesses of naive centroids:
- No direct supervision between centroid and class-indicative words.
- Over-representation of common but nonspecific tokens.
- Potential skew induced by heterogeneous document lengths or outliers.
Empirically, class vectors are sharper and more semantically coherent than naive averages; for sentiment analysis benchmarks, they yield classification accuracy that matches or surpasses strong baselines such as CNNs and NB-LR. For instance, norm CV-LR achieved 94.91% on Yelp data, outperforming classic bag-of-words and Doc2Vec approaches (Sachan et al., 2015).
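For concreteness, the following is a minimal NumPy sketch of the class-augmented skip-gram objective with negative sampling; the variable names, sampling scheme, and exact weighting of the class term are illustrative assumptions rather than the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
V, C, dim = 10_000, 4, 100                 # vocabulary size, number of classes, embedding dim
W_in  = rng.normal(0, 0.1, (V, dim))       # input word vectors
W_out = rng.normal(0, 0.1, (V, dim))       # output (context) word vectors
W_cls = rng.normal(0, 0.1, (C, dim))       # one learned vector per class

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_loss(center, context, cls, negatives):
    """Negative-sampling loss for one (center word, context word, class) triple.

    Combines the usual word-to-context term with a class-to-word term that
    ties the class vector to words occurring in documents of that class.
    """
    word_term = -np.log(sigmoid(W_in[center] @ W_out[context]))
    cls_term  = -np.log(sigmoid(W_cls[cls] @ W_out[center]))
    neg_word  = -np.sum(np.log(sigmoid(-W_in[center] @ W_out[negatives].T)))
    neg_cls   = -np.sum(np.log(sigmoid(-W_cls[cls] @ W_out[negatives].T)))
    return word_term + cls_term + neg_word + neg_cls

# One illustrative evaluation: a center word, one of its context words,
# the class of the containing document, and k sampled negatives.
loss = sgns_loss(center=42, context=137, cls=1, negatives=rng.integers(0, V, 5))
```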
3. Geometric Generalization: Hyperbolic Class Centroids
Recent advances in geometric NLP embed tokens or documents in non-Euclidean (specifically hyperbolic) manifolds to better capture hierarchical relationships and long-tailed distributions (Gerek et al., 2022). In these contexts, the concept of an average must be replaced by a suitable centroid operation. The Fréchet mean, which minimizes the sum of squared geodesic distances,

$$\mu = \arg\min_{x \in \mathcal{M}} \sum_{i=1}^{n} d_{\mathcal{M}}(x, x_i)^2,$$

is the intrinsic notion of centroid on a Riemannian manifold $\mathcal{M}$. Exact computation is iterative and computationally demanding; therefore, practical algorithms employ fast, $O(n)$ approximations in the Poincaré ball via Möbius addition and midpoint operations:
- Naive centroid (NC): successive Möbius additions, scaled by $1/n$.
- Linear forward/backward centroid (LFC/LBC): recursive weighted midpoints, order-sensitive.
- Linear average centroid (LAC): midpoint of LFC and LBC.
- Binary tree centroid (BTC): balanced merge tree.
Document or class prototypes are computed by aggregating word (or document) embeddings using these schemes. On Turkish news datasets, LAC and LBC achieved up to 91.9% accuracy, competitive with (or surpassing) Euclidean means. On large English corpora, Euclidean averaging remains dominant, likely due to the relatively shallow hierarchies in such datasets (Gerek et al., 2022).
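The following NumPy sketch shows Möbius addition, Möbius scalar multiplication, and two of the centroid schemes above; the weighting used in the LFC recursion is an illustrative assumption rather than the exact formulation of Gerek et al. (2022):

```python
import numpy as np

def mobius_add(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Mobius addition in the Poincare ball (curvature -1)."""
    xy, xx, yy = np.dot(x, y), np.dot(x, x), np.dot(y, y)
    num = (1 + 2 * xy + yy) * x + (1 - xx) * y
    return num / (1 + 2 * xy + xx * yy)

def mobius_scale(r: float, x: np.ndarray) -> np.ndarray:
    """Mobius scalar multiplication: travel along the geodesic through the origin."""
    nrm = np.linalg.norm(x)
    if nrm == 0:
        return x
    return np.tanh(r * np.arctanh(nrm)) * x / nrm

def naive_centroid(points: np.ndarray) -> np.ndarray:
    """Naive centroid (NC): left-to-right Mobius additions, scaled by 1/n."""
    acc = points[0]
    for p in points[1:]:
        acc = mobius_add(acc, p)
    return mobius_scale(1.0 / len(points), acc)

def linear_forward_centroid(points: np.ndarray) -> np.ndarray:
    """Linear forward centroid (LFC): fold points in order, moving the running
    centroid a fraction 1/(i+1) of the way toward the i-th new point
    (a plausible weighted-midpoint recursion; order-sensitive)."""
    acc = points[0]
    for i, p in enumerate(points[1:], start=1):
        step = mobius_scale(1.0 / (i + 1), mobius_add(-acc, p))
        acc = mobius_add(acc, step)
    return acc
```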
4. Sparse Autoencoders and Multilingual Conceptual Averages
When applied to neural LLM hidden states, averaging embeddings across classes or modalities often entangles syntactic and language-specific variations, reducing semantic purity. O’Reilly et al. introduce a sparse autoencoder (SAE) framework to extract high-level concept activations from the hidden layers of an LLM (Gemma 2B). For each ontology class, natural-language descriptions (in English, French, and Chinese) are passed to the model, and sparse activation codes are extracted (O'Reilly et al., 19 Aug 2025).
The “conceptual average” is formed as follows:
- Extract sparse activations for each class c in each language.
- Compute the intersection of active feature indices present in all language-specific activations, suppressing language-specific variance.
- Form the class’s multilingual average by averaging the nonzero components over the surviving indices:

$$\bar{z}_c[i] = \frac{1}{|L|} \sum_{\ell \in L} z_c^{(\ell)}[i], \qquad i \in \bigcap_{\ell \in L} \operatorname{supp}\big(z_c^{(\ell)}\big),$$

where $z_c^{(\ell)}$ is the sparse activation code for class $c$ in language $\ell$ and $L$ is the set of languages. The resulting code, $\bar{z}_c$, is both interpretable and maximally language-invariant. Evaluation is performed by correlating cosine similarities of these averaged codes against a reference ontology alignment using the point-biserial correlation $r_{pb}$:
- English-only: $r_{pb} = 0.09$
- English + French: $r_{pb} = 0.39$
- English + Chinese: $r_{pb} = 0.33$ (multilingual summary prompts; under-sampled negatives)
Thus, multilingual averaging yielded a 0.30 absolute improvement in correlation over the best single-language baseline. This confirms that conceptual averaging via SAE robustly disentangles semantics from surface form, increasing alignment with true ontological structure and mechanistic interpretability (O'Reilly et al., 19 Aug 2025).
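A minimal sketch of the intersect-and-average step, assuming each class's sparse SAE code is stored as a dense vector that is mostly zero (function and variable names are illustrative):

```python
import numpy as np

def conceptual_average(codes_by_language: list) -> np.ndarray:
    """Multilingual conceptual average of sparse SAE codes for one class.

    Keeps only feature indices active in *every* language (the intersection),
    zeroing everything else, then averages the surviving components.
    """
    active = np.ones_like(codes_by_language[0], dtype=bool)
    for code in codes_by_language:
        active &= code != 0                      # intersection of supports
    avg = np.zeros_like(codes_by_language[0])
    avg[active] = np.mean([c[active] for c in codes_by_language], axis=0)
    return avg

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# Class similarity used for ontology alignment: cosine between the
# conceptual averages of two classes, e.g.
# avg_c1 = conceptual_average([code_en_c1, code_fr_c1, code_zh_c1])
# avg_c2 = conceptual_average([code_en_c2, code_fr_c2, code_zh_c2])
# sim = cosine(avg_c1, avg_c2)
```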
5. Evaluation Metrics and Empirical Performance
Common evaluation protocols for class-averaged text embeddings include:
- Classification accuracy, as in sentiment or topic categorization (Sachan et al., 2015, Gerek et al., 2022).
- Correlation of similarity scores vs. ground-truth alignments (point-biserial correlation, as in ontology alignment) (O'Reilly et al., 19 Aug 2025).
- Feature interpretability, i.e., the degree to which nonzero code entries map to human-interpretable concepts (Neuronpedia, concept dictionaries) (O'Reilly et al., 19 Aug 2025).
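To illustrate the correlation metric from the list above, a short sketch using SciPy's point-biserial correlation on placeholder data:

```python
import numpy as np
from scipy.stats import pointbiserialr

# Cosine similarities between class-pair embeddings (continuous variable)
# and ground-truth alignment labels (1 = aligned pair, 0 = not aligned).
similarities = np.array([0.91, 0.15, 0.78, 0.22, 0.85, 0.05])
aligned      = np.array([1,    0,    1,    0,    1,    0])

r_pb, p_value = pointbiserialr(aligned, similarities)
print(f"point-biserial r = {r_pb:.2f}, p = {p_value:.3f}")
```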
Empirical findings across approaches:
| Approach & Dataset | Metric | Performance |
|---|---|---|
| Norm CV-LR (Yelp) | Accuracy | 94.91% |
| SAE conceptual avg (ontology alignment) | $r_{pb}$ | 0.09 (EN), 0.39 (EN+FR), 0.33 (EN+ZH) |
| LAC/LBC (1150Haber) | Accuracy | 91.9%, 91.65% |
For English sentiment data, class vector LR features are competitive with CNNs. On morphologically rich language data, hyperbolic centroids are at least as effective as Euclidean means, and in multilingual ontology alignment, SAE-based averages deliver the strongest semantic correspondence (Sachan et al., 2015, O'Reilly et al., 19 Aug 2025, Gerek et al., 2022).
6. Interpretability and Downstream Uses
Class-averaged embeddings serve as interpretable, auditable features for:
- Ontology alignment (mapping of categories across languages or datasets) (O'Reilly et al., 19 Aug 2025).
- Bias and safety auditing of LLM concept representations, as the sparsity and intersection steps yield features traceable to actual model neurons (O'Reilly et al., 19 Aug 2025).
- Hybrid reasoning systems that integrate neural and symbolic models, leveraging class prototypes as bridge features.
- Prototype-driven classification, retrieval, and novelty detection tasks (Sachan et al., 2015, Gerek et al., 2022).
By isolating only the features that are invariant across linguistic or context shifts, conceptual averages offer a route to semantic “purification” and mechanistic model inspection at scale.
7. Best Practices and Limitations
Optimal results and interpretability derive from:
- Jointly learning class vectors with word tokens, rather than naive centroiding.
- Appropriately selecting centroid computation schemes (LAC, LBC) when working in hyperbolic spaces or with strongly non-Euclidean corpora (Gerek et al., 2022).
- Enforcing sparsity and intersecting features across modalities/languages to suppress spurious, entangled codes (O'Reilly et al., 19 Aug 2025).
- Choosing evaluation metrics (accuracy, $r_{pb}$) aligned with downstream objectives and dataset properties.
Limitations include:
- Naive averaging can dilute discriminative content when classes are heterogeneous or document lengths vary considerably (Sachan et al., 2015).
- Hyperbolic centroiding, while efficient, is only an approximation to the true Fréchet mean and may underperform on large, flat (Euclidean) datasets (Gerek et al., 2022).
- Multilingual SAE conceptual averaging requires high-quality translations and robust prompt engineering to avoid leakage of linguistic artifacts (O'Reilly et al., 19 Aug 2025).
This suggests that class-averaged text embeddings offer the greatest unique benefit when semantic structure is deep, hierarchical, or language-variant, and when mechanistic interpretability is a priority.