Train Sparse Autoencoders Efficiently by Utilizing Features Correlation (2505.22255v1)
Abstract: Sparse Autoencoders (SAEs) have demonstrated significant promise in interpreting the hidden states of LLMs by decomposing them into interpretable latent directions. However, training SAEs at scale remains challenging, especially when large dictionary sizes are used. While decoders can leverage sparse-aware kernels for efficiency, encoders still require computationally intensive linear operations with large output dimensions. To address this, we propose KronSAE, a novel architecture that factorizes the latent representation via Kronecker product decomposition, drastically reducing memory and computational overhead. Furthermore, we introduce mAND, a differentiable activation function approximating the binary AND operation, which improves interpretability and performance in our factorized framework.
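To make the factorization concrete, here is a minimal sketch of how a Kronecker-factorized SAE encoder could look: two small projection heads produce pre-activations of sizes m1 and m2, and their pairwise combination yields the full dictionary of m1 × m2 latents, so the dense encoder cost drops from O(d · m1 · m2) to O(d · (m1 + m2)). The class name `KronEncoderSketch`, the choice of m1 = m2, and the product-of-ReLUs form used for the mAND surrogate are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class KronEncoderSketch(nn.Module):
    """Hypothetical sketch of a Kronecker-factorized sparse autoencoder.

    Instead of one dense projection to a dictionary of size m = m1 * m2,
    two small heads emit pre-activations of sizes m1 and m2; their
    outer-product (Kronecker-style) combination gives the m latents.
    """

    def __init__(self, d_model: int, m1: int, m2: int):
        super().__init__()
        self.head_a = nn.Linear(d_model, m1)          # small encoder head
        self.head_b = nn.Linear(d_model, m2)          # small encoder head
        self.decoder = nn.Linear(m1 * m2, d_model)    # full-width decoder

    @staticmethod
    def mand(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        # Assumed differentiable AND surrogate: the latent is active only
        # when both pre-activations are positive (product of ReLUs).
        return F.relu(a) * F.relu(b)

    def forward(self, x: torch.Tensor):
        a = self.head_a(x)   # (batch, m1)
        b = self.head_b(x)   # (batch, m2)
        # Pairwise combination over the feature axes -> (batch, m1, m2),
        # flattened into the full dictionary of m1 * m2 latents.
        z = self.mand(a.unsqueeze(-1), b.unsqueeze(-2)).flatten(start_dim=-2)
        x_hat = self.decoder(z)
        return z, x_hat


if __name__ == "__main__":
    enc = KronEncoderSketch(d_model=768, m1=64, m2=64)  # 4096 latents
    x = torch.randn(8, 768)
    z, x_hat = enc(x)
    print(z.shape, x_hat.shape)  # torch.Size([8, 4096]) torch.Size([8, 768])
```

In this sketch the decoder remains full width, consistent with the abstract's note that decoders can already exploit sparse-aware kernels; only the encoder side is factorized to cut memory and compute.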