BatchTopK Sparse Autoencoders (2412.06410v1)

Published 9 Dec 2024 in cs.LG, cs.AI, and stat.ML

Abstract: Sparse autoencoders (SAEs) have emerged as a powerful tool for interpreting LLM activations by decomposing them into sparse, interpretable features. A popular approach is the TopK SAE, that uses a fixed number of the most active latents per sample to reconstruct the model activations. We introduce BatchTopK SAEs, a training method that improves upon TopK SAEs by relaxing the top-k constraint to the batch-level, allowing for a variable number of latents to be active per sample. As a result, BatchTopK adaptively allocates more or fewer latents depending on the sample, improving reconstruction without sacrificing average sparsity. We show that BatchTopK SAEs consistently outperform TopK SAEs in reconstructing activations from GPT-2 Small and Gemma 2 2B, and achieve comparable performance to state-of-the-art JumpReLU SAEs. However, an advantage of BatchTopK is that the average number of latents can be directly specified, rather than approximately tuned through a costly hyperparameter sweep. We provide code for training and evaluating BatchTopK SAEs at https://github.com/bartbussmann/BatchTopK

Authors (3)
  1. Bart Bussmann (6 papers)
  2. Patrick Leask (4 papers)
  3. Neel Nanda (50 papers)
Citations (1)

Summary

Evaluating BatchTopK Sparse Autoencoders for Increased Reconstruction Performance

The paper "BatchTopK Sparse Autoencoders" introduces a refined approach to sparse autoencoders (SAEs), specifically focusing on enhancing the TopK SAE framework. Developed by Bart Bussmann, Patrick Leask, and Neel Nanda, this advancement rethinks the conventional mechanism of TopK SAEs by establishing a batch-level allocation system for active latents, rather than maintaining a uniform allocation per individual sample. This innovation allows the allocation of latent nodes based on the batch context, adapting dynamically to the sample complexity, which ultimately enhances reconstruction precision without compromising the sparsity constraints.

Key Contributions and Methodology

The primary contribution is the BatchTopK mechanism, which selects the top activations across the entire batch rather than within each individual sample. The activation function therefore applies a batch-level threshold instead of the fixed per-sample allocation used in standard TopK SAEs. This allows the number of active latents per sample to vary: complex samples receive more latents and simpler samples fewer, while the average number of active latents per sample stays at the specified target.
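To make the mechanism concrete, here is a minimal sketch of the batch-level selection step in PyTorch. It assumes a simple (batch, latents) layout of encoder pre-activations; the function name and details are illustrative rather than the authors' implementation (see the linked repository for that).

```python
import torch

def batch_topk(pre_acts: torch.Tensor, k: int) -> torch.Tensor:
    """Keep only the batch_size * k largest pre-activations across the whole batch.

    pre_acts: (batch_size, num_latents) encoder pre-activations.
    k: target *average* number of active latents per sample.
    Individual samples may end up with more or fewer than k active latents.
    """
    batch_size = pre_acts.shape[0]
    flat = pre_acts.flatten()                      # (batch_size * num_latents,)
    top = torch.topk(flat, k=batch_size * k)       # batch-level selection
    mask = torch.zeros_like(flat)
    mask[top.indices] = 1.0
    return (flat * mask).reshape(pre_acts.shape)   # zero everything below the batch threshold
```

A standard TopK SAE would instead apply `torch.topk` row by row, forcing exactly `k` active latents for every sample; here only the batch-level total is fixed, so the per-sample count floats with sample complexity.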

The efficacy of BatchTopK SAEs was assessed on two popular LLMs, GPT-2 Small and Gemma 2 2B. Across various dictionary sizes and sparsity levels, the experiments show that BatchTopK consistently outperforms standard TopK SAEs on normalized mean squared error (NMSE) and cross-entropy (CE) metrics. The improvements were observed both at fixed numbers of active latents and at particular dictionary sizes, indicating a robust gain attributable to the flexible allocation strategy.
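For reference, a common way to compute the normalized MSE used as a reconstruction metric is to divide the squared reconstruction error by the squared magnitude of the original activations; the exact normalization used in the paper may differ, so treat this as an illustrative convention.

```python
import torch

def normalized_mse(acts: torch.Tensor, recon: torch.Tensor) -> torch.Tensor:
    """Reconstruction error normalized by the scale of the original activations.

    acts, recon: (num_samples, d_model) original and reconstructed activations.
    0 means perfect reconstruction; 1 is roughly as bad as predicting all zeros.
    """
    return ((recon - acts) ** 2).sum() / (acts ** 2).sum()
```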

Overall Performance and Implications

The experimental data support the claim that BatchTopK improves reconstruction relative to both TopK and JumpReLU SAEs, especially with a modest number of active latents or a fixed dictionary size. While BatchTopK consistently improved reconstruction error on GPT-2 Small, its advantage over JumpReLU SAEs on Gemma 2 2B was largely confined to lower sparsity settings.

The implications of these findings extend beyond reconstruction metrics. Because the average number of active latents is specified directly, BatchTopK avoids the costly hyperparameter sweeps needed to tune the sparsity of methods like JumpReLU SAEs, streamlining the tuning process. Furthermore, given its architectural similarity to TopK SAEs, it is plausible that the latents learned by BatchTopK are comparably interpretable, though this aspect was not directly evaluated.
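The direct control over sparsity is easy to see in a toy example: the number of latents kept per sample varies, but the batch average equals the specified k exactly. This reuses the illustrative batch-level selection sketched above and is not the authors' code.

```python
import torch

torch.manual_seed(0)

def batch_topk(pre_acts, k):
    # keep the batch_size * k largest pre-activations across the whole batch
    flat = pre_acts.flatten()
    mask = torch.zeros_like(flat)
    mask[torch.topk(flat, k=pre_acts.shape[0] * k).indices] = 1.0
    return (flat * mask).reshape(pre_acts.shape)

pre_acts = torch.relu(torch.randn(8, 64))   # toy pre-activations: 8 samples, 64 latents
sparse = batch_topk(pre_acts, k=4)
per_sample = (sparse != 0).sum(dim=1)
print(per_sample)                  # varies from sample to sample ...
print(per_sample.float().mean())   # ... but averages exactly 4.0 over the batch
```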

Future Directions

Avenues for further exploration include applying BatchTopK to larger models and more diverse datasets. Future work could examine how the batch-level flexibility handles varying degrees of data complexity, perhaps in combination with guided learning paradigms. Another direction is integrating BatchTopK SAEs into hybrid systems that benefit from dynamic latent allocation, enabling more comprehensive and efficient model interpretability.

The BatchTopK framework exemplifies how nuanced changes in architecture and training methodology can significantly bolster the practical capabilities of SAEs. By aligning the latent activation mechanisms with the complex, variable nature of model input data, BatchTopK heralds an important development in scalable, interpretable machine learning systems.
