Visual Exploration of Feature Relationships in Sparse Autoencoders with Curated Concepts

Published 8 Nov 2025 in cs.CL and cs.LG | (2511.06048v1)

Abstract: Sparse autoencoders (SAEs) have emerged as a powerful tool for uncovering interpretable features in LLMs through the sparse directions they learn. However, the sheer number of extracted directions makes comprehensive exploration intractable. While conventional embedding techniques such as UMAP can reveal global structure, they suffer from limitations including high-dimensional compression artifacts, overplotting, and misleading neighborhood distortions. In this work, we propose a focused exploration framework that prioritizes curated concepts and their corresponding SAE features over attempts to visualize all available features simultaneously. We present an interactive visualization system that combines topology-based visual encoding with dimensionality reduction to faithfully represent both local and global relationships among selected features. This hybrid approach enables users to investigate SAE behavior through targeted, interpretable subsets, facilitating deeper and more nuanced analysis of concept representation in latent space.
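To make the focused-exploration idea concrete, here is a minimal sketch of one possible pipeline: pick a curated subset of features, measure pairwise cosine distance between their directions, and extract a minimum spanning tree as a simple topological summary of how the selected features connect. The abstract does not specify the system's actual topology-based encoding, so the spanning tree is an assumed stand-in, and the feature names, vectors, and function names below are hypothetical toy values rather than anything taken from the paper.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def minimum_spanning_tree(names, vectors):
    """Prim's algorithm on the complete cosine-distance graph over `names`.

    Returns a list of (source, target, distance) edges in the order they
    were added. This is an illustrative stand-in for a topology-based
    encoding, not the method used by the paper.
    """
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        best = None
        for a in names:
            if a not in in_tree:
                continue
            for b in names:
                if b in in_tree:
                    continue
                d = cosine_distance(vectors[a], vectors[b])
                if best is None or d < best[2]:
                    best = (a, b, d)
        edges.append(best)
        in_tree.add(best[1])
    return edges


# Hypothetical toy "SAE directions"; real ones would have thousands of dimensions.
features = {
    "dog": [1.0, 0.1, 0.0],
    "cat": [0.9, 0.2, 0.0],
    "car": [0.0, 1.0, 0.1],
    "truck": [0.1, 0.9, 0.2],
    "unrelated": [0.0, 0.0, 1.0],
}

# Curated concepts: only these features are explored, not the full dictionary.
curated = ["dog", "cat", "car", "truck"]
tree = minimum_spanning_tree(curated, features)

for source, target, distance in tree:
    print(f"{source} -- {target}: {distance:.3f}")
```

In this toy setup the two animal features and the two vehicle features each join through a short edge, with one longer edge bridging the groups. A dimensionality reduction step such as UMAP or PCA could then supply 2D positions for the nodes, while the tree edges preserve the neighborhood relations that a projection alone may distort.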