Targeted SAE clustering for sentiment and emotion labels

Identify a combination of natural-language query terms and a top-k latent selection parameter that lets sparse autoencoder (SAE)-based targeted clustering align with ground-truth sentiment and emotion labels on Twitter datasets: SemEval-2017 Task 4 for sentiment and CARER for emotion. Finding such a combination would show whether SAE embeddings can recover these label structures under appropriate filtering.

Background

The paper proposes targeted clustering, which keeps only those SAE embedding dimensions whose labels are semantically related to user-specified keyphrases. While this approach produces meaningful clusters on several datasets, the authors report difficulty in reproducing clusters that match ground-truth sentiment and emotion labels on Twitter benchmarks.
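
The filtering step can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the example latent labels, and the token-overlap relevance score are all hypothetical, and the overlap score stands in for whatever semantic similarity measure the authors actually use. The sketch only shows the role of the two free parameters, the query terms and k.

```python
from typing import List, Sequence


def relevance(query: str, label: str) -> float:
    """Toy relevance score: Jaccard overlap of lowercase tokens.

    Assumption: a stand-in for a real semantic similarity measure.
    """
    q_tokens = set(query.lower().split())
    l_tokens = set(label.lower().split())
    if not q_tokens or not l_tokens:
        return 0.0
    return len(q_tokens & l_tokens) / len(q_tokens | l_tokens)


def select_top_k_latents(
    queries: Sequence[str], latent_labels: Sequence[str], k: int
) -> List[int]:
    """Return indices of the k latents whose labels best match any query."""
    scores = [
        max((relevance(q, label) for q in queries), default=0.0)
        for label in latent_labels
    ]
    # Sort by descending score; break ties by latent index for determinism.
    ranked = sorted(range(len(latent_labels)), key=lambda i: (-scores[i], i))
    return ranked[:k]


def filter_embeddings(
    embeddings: Sequence[Sequence[float]], keep: Sequence[int]
) -> List[List[float]]:
    """Keep only the selected latent dimensions of each embedding row."""
    return [[row[i] for i in keep] for row in embeddings]


# Hypothetical latent labels and activations, for illustration only.
latent_labels = [
    "expressions of joy",
    "sports scores",
    "anger and frustration",
    "weather reports",
]
embeddings = [[1.0, 0.0, 0.0, 2.0], [0.0, 3.0, 4.0, 0.0]]

keep = select_top_k_latents(["joy", "anger"], latent_labels, k=2)
filtered = filter_embeddings(embeddings, keep)
```

The filtered rows would then be passed to a clustering algorithm. Both the query wording and k change which dimensions survive, which is why the open question is framed as a search over both.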

They note that general-purpose dense embeddings also fail to align with these ground truths, whereas task-specific finetuned models perform well. The authors explicitly state that they were unable to find effective query and k settings for the SAE method, leaving open whether any settings exist that would recover the labeled structures.
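
One way to frame the search for such settings is a grid over candidate query sets and k values, scoring each resulting clustering against the ground-truth labels. The sketch below is an illustration under stated assumptions: `purity` is one common agreement measure and not necessarily the metric used in the paper, and `cluster_fn` is a hypothetical callable that runs targeted clustering for a given query set and k and returns one cluster id per example.

```python
from collections import Counter
from itertools import product
from typing import Callable, Hashable, Optional, Sequence, Tuple


def purity(pred: Sequence[Hashable], truth: Sequence[Hashable]) -> float:
    """Fraction of items that carry the majority label of their cluster."""
    if not truth:
        return 0.0
    by_cluster = {}
    for cluster_id, label in zip(pred, truth):
        by_cluster.setdefault(cluster_id, Counter())[label] += 1
    majority_total = sum(c.most_common(1)[0][1] for c in by_cluster.values())
    return majority_total / len(truth)


def grid_search(
    query_sets: Sequence[Tuple[str, ...]],
    ks: Sequence[int],
    cluster_fn: Callable[[Tuple[str, ...], int], Sequence[Hashable]],
    truth: Sequence[Hashable],
) -> Optional[Tuple[float, Tuple[str, ...], int]]:
    """Return (score, queries, k) for the best-scoring setting, or None."""
    best = None
    for queries, k in product(query_sets, ks):
        score = purity(cluster_fn(queries, k), truth)
        if best is None or score > best[0]:
            best = (score, queries, k)
    return best


# Toy stand-in for targeted clustering (hypothetical behaviour):
# k=2 separates the two classes, any other k lumps everything together.
def toy_cluster_fn(queries, k):
    return [0, 0, 1, 1] if k == 2 else [0, 0, 0, 0]


truth = ["pos", "pos", "neg", "neg"]
best = grid_search([("positive", "negative")], [1, 2], toy_cluster_fn, truth)
```

Purity rewards fragmenting the data into many small clusters, so a real search would likely fix the number of clusters to the number of classes or use a chance-corrected measure instead.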

References

For our SAE method, we were unable to find a good combination of queries and k.

— Interpretable Embeddings with Sparse Autoencoders: A Data Analysis Toolkit (2512.10092 - Jiang et al., 10 Dec 2025) in Appendix: Additional Results—Clustering, subsection “Failure to recover ground truth labels for sentiment and emotion clustering”
