Revisiting End-To-End Sparse Autoencoder Training: A Short Finetune Is All You Need (arXiv:2503.17272v2)

Published 21 Mar 2025 in cs.LG

Abstract: Sparse autoencoders (SAEs) are widely used for interpreting LLM activations. A key evaluation metric is the increase in cross-entropy loss between the original model logits and the reconstructed model logits when replacing model activations with SAE reconstructions. Typically, SAEs are trained solely on mean squared error (MSE) when reconstructing precomputed, shuffled activations. Recent work introduced training SAEs directly with a combination of KL divergence and MSE ("end-to-end" SAEs), significantly improving reconstruction accuracy at the cost of substantially increased computation, which has limited their widespread adoption. We propose a brief KL+MSE fine-tuning step applied only to the final 25M training tokens (just a few percent of typical training budgets) that achieves comparable improvements, reducing the cross-entropy loss gap by 20-50%, while incurring minimal additional computational cost. We further find that multiple fine-tuning methods (KL fine-tuning, LoRA adapters, linear adapters) yield similar, non-additive cross-entropy improvements, suggesting a common, easily correctable error source in MSE-trained SAEs. We demonstrate a straightforward method for effectively transferring hyperparameters and sparsity penalties between training phases despite scale differences between KL and MSE losses. While both ReLU and TopK SAEs see significant cross-entropy loss improvements, evaluations on supervised SAEBench metrics yield mixed results, with improvements on some metrics and decreases on others, depending on both the SAE architecture and downstream task. Nonetheless, our method may offer meaningful improvements in interpretability applications such as circuit analysis with minor additional cost.
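To make the objective concrete, below is a minimal PyTorch sketch of the KL+MSE fine-tuning loss the abstract describes. This is not the authors' code: the SparseAutoencoder module, the kl_mse_loss helper, the loss weights, and the toy unembedding layer are all illustrative assumptions, and the sparsity penalty is omitted. The idea is that the SAE reconstruction is patched back into the (frozen) model, and the loss combines the KL divergence from the original next-token distribution to the patched one with the usual MSE reconstruction term.

    import torch
    import torch.nn.functional as F

    class SparseAutoencoder(torch.nn.Module):
        """Plain ReLU SAE; a TopK variant would instead keep only the
        k largest codes per token."""
        def __init__(self, d_model: int, d_hidden: int):
            super().__init__()
            self.encoder = torch.nn.Linear(d_model, d_hidden)
            self.decoder = torch.nn.Linear(d_hidden, d_model)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            codes = torch.relu(self.encoder(x))
            return self.decoder(codes)

    def kl_mse_loss(orig_logits, patched_logits, acts, recon,
                    kl_weight=1.0, mse_weight=1.0):
        # KL(p_orig || p_patched): the divergence from the original
        # model's next-token distribution to the distribution obtained
        # after replacing the activation with the SAE reconstruction.
        kl = F.kl_div(
            F.log_softmax(patched_logits, dim=-1),
            F.log_softmax(orig_logits, dim=-1),
            log_target=True,
            reduction="batchmean",
        )
        mse = F.mse_loss(recon, acts)
        return kl_weight * kl + mse_weight * mse

    # Toy usage with random tensors standing in for real LLM activations.
    # In an actual fine-tune, orig_logits would come from an unmodified
    # forward pass (detached / under torch.no_grad) and patched_logits
    # from a forward pass with the activation hook-swapped for recon.
    d_model, vocab = 64, 100
    sae = SparseAutoencoder(d_model, 4 * d_model)
    acts = torch.randn(8, d_model)
    recon = sae(acts)
    unembed = torch.nn.Linear(d_model, vocab)  # stand-in for the LM head
    loss = kl_mse_loss(unembed(acts), unembed(recon), acts, recon)
    loss.backward()

In the paper's setup, a loss of this shape is applied only as a brief fine-tuning phase over the final ~25M training tokens, after standard MSE-only training on precomputed, shuffled activations; how the weights and sparsity penalty transfer between the two phases is the subject of the hyperparameter-transfer method the abstract mentions.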
